I have a very strange problem. I want to remove everything from a
string except letters, numbers, spaces, and certain symbols listed in
the regular expression below:
"blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/-?:().,'+^| ]","")
I expect to get the following string back:
"blahblah ^().,"
but I actually get the following:
"blahblah ^().,<>="
Notice the greater/less than and equals signs are still there!
I did a quick check using this site:
http://www.fileformat.info/tool/regex.htm and I get the same result
back. What's going on here???
colirl - 22 Feb 2006 17:13 GMT
Hi,
I have not actually run the regex with my fixes, but first of all there
are a few problems. Characters like $, |, [, ), \, / and so on are
special cases in regular expressions. If you want to match one of
those literally, you have to precede it with a backslash. So:
\| # Vertical bar
\[ # An open square bracket
\) # A closing parenthesis
\* # An asterisk
\^ # A caret symbol
\/ # A slash
\\ # A backslash
Try this for your ( and ) and see if it makes any difference! I don't
have the time to test the fix, but that's what you need to do for special
characters.
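A quick way to try this out, as a minimal sketch: the class and method names below are made up for illustration. As far as I can tell, in java.util.regex most of these characters only need a backslash outside a character class; inside [...] they are mostly literal already. Note that each regex backslash has to be doubled in Java source.

```java
class MetacharEscapeDemo {
    // Outside a character class, | and * are metacharacters and must be escaped.
    // Java source "\\||\\*" is the regex \||\* : "a literal bar OR a literal star".
    static String removeOutsideClass(String s) {
        return s.replaceAll("\\||\\*", "");
    }

    // Inside a character class the same two characters are literal as written.
    static String removeInsideClass(String s) {
        return s.replaceAll("[|*]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeOutsideClass("a|b*c")); // abc
        System.out.println(removeInsideClass("a|b*c"));  // abc
    }
}
```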
falcon - 22 Feb 2006 17:19 GMT
colirl,
That doesn't seem to work. Besides, it is replacing most of the right
characters with blanks, but for some reason it keeps the relational
symbols (><=).
falcon - 22 Feb 2006 17:29 GMT
Sorry, it does work. I had to move some chars in my regex string
around, but adding backslashes to those characters which have special
meaning apparently was the fix. Thanks colirl!
colirl - 22 Feb 2006 17:44 GMT
so then you get
blahblah ^()--.,
Solution:
"blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/\\-?:().,'+^| ]","")
Enjoy :)
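For anyone who wants to check the difference, here is a minimal sketch comparing the pattern from the original post with one where the hyphen is escaped. The class and method names are hypothetical. In Java source the backslash itself is doubled, so the regex \- is written "\\-".

```java
class HyphenEscapeDemo {
    // Pattern from the original post: "/-?" is read as a range from '/' to '?'.
    static final String UNESCAPED = "[^A-Za-z0-9/-?:().,'+^| ]";

    // Java source "\\-" is the regex \- , i.e. a literal hyphen.
    static final String ESCAPED = "[^A-Za-z0-9/\\-?:().,'+^| ]";

    // Deletes every character that the negated class matches.
    static String strip(String input, String pattern) {
        return input.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        String input = "blahblah !@#$%^&*()--.,<>=";
        System.out.println(strip(input, UNESCAPED)); // blahblah ^().,<>=
        System.out.println(strip(input, ESCAPED));   // blahblah ^()--.,
    }
}
```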
Roedy Green - 24 Feb 2006 13:23 GMT
>Sorry, it does work. I had to move some chars in my regex string
>around, but adding backslashes to those characters which have special
>meaning apparently was the fix. Thanks colirl!
It is pretty hairy, since \ is used by both regexes and string literals
for quoting, so a literal \ becomes \\\\
see http://mindprod.com/jgloss/regex.html#QUOTING
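The double quoting can be seen in a small sketch (class and method names are made up for illustration). The four backslashes in the source become two in the regex, which then match one literal backslash in the input. Pattern.quote is shown as an alternative that avoids the second layer of escaping.

```java
import java.util.regex.Pattern;

class BackslashQuotingDemo {
    // Java source "\\\\" -> regex \\ -> matches a single literal backslash.
    static String dropBackslashes(String input) {
        return input.replaceAll("\\\\", "");
    }

    // Same effect: Pattern.quote turns the one-character string \ into a literal pattern.
    static String dropBackslashesQuoted(String input) {
        return input.replaceAll(Pattern.quote("\\"), "");
    }

    public static void main(String[] args) {
        String text = "a\\b\\c"; // the five characters a \ b \ c
        System.out.println(dropBackslashes(text));       // abc
        System.out.println(dropBackslashesQuoted(text)); // abc
    }
}
```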

Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
colirl - 22 Feb 2006 17:42 GMT
OK, you need to escape the - symbol because, as I said, special
characters need to be escaped.
Try [^A-Za-z0-9/\-?:().,'+^| ] as your expression. It gets rid of
the ><= for me.
Rob Skedgell - 22 Feb 2006 21:20 GMT
> OK, you need to escape the - symbol because, as I said, special
> characters need to be escaped.
>
> try [^A-Za-z0-9/\-?:().,'+^| ] as your expression. gets rid of
> the ><= for me.
Or you can put a '-' unescaped as the last character in a character
class, since it doesn't form part of a range e.g. "[a-z-]" will match
"-". Admittedly the 1.5.0 javadocs for java.util.regex.Pattern at
<http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#cc>
don't make that clear: "inside a character class ... the expression -
becomes a range forming metacharacter."
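A minimal sketch of the trailing-hyphen form, for comparison. The names are hypothetical, and the second pattern is just the original character class with the hyphen moved to the end rather than escaped.

```java
import java.util.regex.Pattern;

class TrailingHyphenDemo {
    // The hyphen sits right before the closing bracket, so it cannot form a range.
    static final String KEEP_WITH_TRAILING_HYPHEN = "[^A-Za-z0-9/?:().,'+^| -]";

    // True if the whole string s matches the regex.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    static String strip(String input) {
        return input.replaceAll(KEEP_WITH_TRAILING_HYPHEN, "");
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-z-]", "-")); // true
        System.out.println(matches("[a-z-]", "q")); // true
        System.out.println(matches("[a-z-]", "5")); // false
        System.out.println(strip("blahblah !@#$%^&*()--.,<>=")); // blahblah ^()--.,
    }
}
```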

Signature
Rob Skedgell <rob+news@nephelococcygia.demon.co.uk>
GnuPG/PGP: 7DA3 1579 C0DD 8748 C05A B984 E2A2 3234 D14B 6DD7
colirl - 22 Feb 2006 22:53 GMT
Rob,
ya, I had forgotten about that one. :).
Jussi Piitulainen - 22 Feb 2006 18:16 GMT
> I have a very strange problem. I want to replace every thing in a
> string except letters, numbers, space, and certain symbols listed in
> the regex expression below
>
> "blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/-?:().,'+^| ]","")
/-? contains :;<=>.
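To see exactly what that range covers, a short sketch (hypothetical names) can list every character from '/' to '?' and confirm that the class [/-?] matches the relational symbols:

```java
import java.util.regex.Pattern;

class RangeContentsDemo {
    // Collects every character from 'from' to 'to', inclusive, by code point order.
    static String charsInRange(char from, char to) {
        StringBuilder sb = new StringBuilder();
        for (char c = from; c <= to; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // True if the single character c is matched by the range class [/-?].
    static boolean inSlashToQuestion(char c) {
        return Pattern.matches("[/-?]", String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(charsInRange('/', '?'));  // /0123456789:;<=>?
        System.out.println(inSlashToQuestion('<'));  // true
        System.out.println(inSlashToQuestion('-'));  // false
    }
}
```

So the digits and the colon were already covered by the range, while the hyphen itself was never in the class.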
falcon - 22 Feb 2006 20:24 GMT
Jussi,
I already fixed the problem, but it's amusing that I missed reading /-?
as a range *from '/'* *to '?'*
Thanks :)