Java Forum / General / February 2006

regex problem: 'greater than' 'less than' and 'equals' not matching!

falcon - 22 Feb 2006 16:58 GMT
I have a very strange problem. I want to replace everything in a
string except letters, numbers, space, and certain symbols listed in
the regex expression below

"blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/-?:().,'+^| ]","")

I expect to get the following string back:
"blahblah ^().,"

but I actually get the following:
"blahblah ^().,<>="

Notice the greater/less than and equals signs are still there!

I did a quick check using this site:
http://www.fileformat.info/tool/regex.htm and I get the same result
back.  What's going on here???
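A minimal reproduction of the call above, for anyone who wants to run it. The class and method names (ReproduceBug, strip) are made up for illustration; the pattern and the input string are copied unchanged from the post, and running it should print the same unexpected result the poster reports.

```java
public class ReproduceBug {
    // The pattern exactly as posted. Inside the brackets, "/-?" is parsed
    // as a range from '/' to '?', not as three separate characters.
    static final String ORIGINAL_PATTERN = "[^A-Za-z0-9/-?:().,'+^| ]";

    // Removes every character that is NOT listed in the bracketed class.
    static String strip(String input) {
        return input.replaceAll(ORIGINAL_PATTERN, "");
    }

    public static void main(String[] args) {
        String input = "blahblah !@#$%^&*()--.,<>=";
        // Prints "blahblah ^().,<>=" : the hyphens are gone, <>= survive.
        System.out.println("\"" + strip(input) + "\"");
    }
}
```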
colirl - 22 Feb 2006 17:13 GMT
Hi,

I have not actually run the regex with my fixes, but first of all there
are a few problems. Characters like $, |, [, ), \, / and so on are
special cases in regular expressions. If you want to match one of
those, then you have to precede it with a backslash. So:

\|        # Vertical bar
\[        # An open square bracket
\)        # A closing parenthesis
\*        # An asterisk
\^        # A caret symbol
\/        # A slash
\\        # A backslash

Try this for your ( and ) and see if it makes any difference! I don't
have the time to test the fix, but that's what you need to do for special
characters.
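A small check of how java.util.regex actually treats some of these characters inside square brackets (the class and method names are illustrative). Within a character class, ( ) | / . * and $ are already ordinary characters, so escaping them there is harmless but not required; the ones that mainly need care inside a class are \, [, ], a leading ^, and a - sitting between two other characters, which is what turns /-? into a range.

```java
import java.util.regex.Pattern;

public class ClassEscapes {
    // True if the whole string s matches the regex.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String sample = "()|/.*$";
        // Unescaped: inside [...] these are plain literal characters.
        System.out.println(matches("[()|/.*$]+", sample));                 // true
        // Escaped: the backslashes are accepted but change nothing here.
        System.out.println(matches("[\\(\\)\\|\\/\\.\\*\\$]+", sample));   // true
        // A hyphen between two characters forms a range, so '<' matches...
        System.out.println(matches("[/-?]", "<"));                          // true
        // ...but an escaped hyphen is literal, and '<' no longer matches.
        System.out.println(matches("[/\\-?]", "<"));                        // false
    }
}
```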
falcon - 22 Feb 2006 17:19 GMT
colirl,
That doesn't seem to work. Besides, it is replacing most of the right
characters with blanks; for some reason it keeps the relational symbols
(><=).
falcon - 22 Feb 2006 17:29 GMT
Sorry, it does work. I had to move some chars in my regex string
around, but adding backslashes to those characters which have special
meaning apparently fixed the problem. Thanks colirl!
colirl - 22 Feb 2006 17:44 GMT
So then you get

blahblah ^()--.,

Solution:
"blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/\\-?:().,'+^| ]","")

Enjoy :)
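The same fix as a complete program (the class name FixedStrip is hypothetical). One wrinkle: "\-" is not a legal escape sequence in a Java string literal, so in source code the backslash itself has to be doubled to "\\-"; the regex engine then sees \- and treats the hyphen as a literal character instead of a range operator.

```java
public class FixedStrip {
    // Java literal "\\-" becomes \- in the pattern: a literal hyphen,
    // so '/', '-' and '?' are three separate characters again.
    static final String FIXED_PATTERN = "[^A-Za-z0-9/\\-?:().,'+^| ]";

    // Removes every character that is NOT listed in the bracketed class.
    static String strip(String input) {
        return input.replaceAll(FIXED_PATTERN, "");
    }

    public static void main(String[] args) {
        // Prints "blahblah ^()--.," : hyphens kept, <>= removed.
        System.out.println("\"" + strip("blahblah !@#$%^&*()--.,<>=") + "\"");
    }
}
```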
Roedy Green - 24 Feb 2006 13:23 GMT
>Sorry, it does work. I had to move some chars in my regex string
>around, but adding backslashes to those characters which have special
>meaning apparently fixed the problem. Thanks colirl!

It is pretty hairy, since \ is used for quoting both by regex and by
string literals, so a literal \ becomes \\\\ in Java source.
See http://mindprod.com/jgloss/regex.html#QUOTING

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
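To make the double escaping concrete, here is a sketch (class and method names are hypothetical) that swaps backslashes and forward slashes. The Java literal "\\\\" is the two-character regex \\, which matches one backslash. Pattern.quote and Matcher.quoteReplacement, both in java.util.regex since Java 5, avoid writing those escapes by hand; the replacement string needs its own quoting because \ and $ are special there too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashQuoting {
    // Java literal "\\\\" -> regex \\ -> matches a single backslash.
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Same result without hand escaping: Pattern.quote makes the text literal.
    static String toForwardSlashesQuoted(String path) {
        return path.replaceAll(Pattern.quote("\\"), "/");
    }

    // In the replacement, a lone backslash is also special, so quote it.
    static String toBackslashes(String path) {
        return path.replaceAll("/", Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        String winPath = "C:\\temp\\regex.txt"; // the text C:\temp\regex.txt
        System.out.println(toForwardSlashes(winPath));       // C:/temp/regex.txt
        System.out.println(toForwardSlashesQuoted(winPath)); // C:/temp/regex.txt
        System.out.println(toBackslashes("a/b/c"));          // a\b\c
    }
}
```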

colirl - 22 Feb 2006 17:42 GMT
OK, you need to escape the - symbol because, as I said, special
characters need to be escaped.

Try [^A-Za-z0-9/\-?:().,'+^| ] as your expression; it gets rid of
the ><= for me.
Rob Skedgell - 22 Feb 2006 21:20 GMT
> OK, you need to escape the - symbol because, as I said, special
> characters need to be escaped.
>
> Try [^A-Za-z0-9/\-?:().,'+^| ] as your expression; it gets rid of
> the ><= for me.

Or you can put a '-' unescaped as the last character in a character
class, since there it doesn't form part of a range; e.g. "[a-z-]" will
match "-". Admittedly the 1.5.0 javadocs for java.util.regex.Pattern at
<http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#cc>
don't make that clear: "inside a character class ... the expression -
becomes a range forming metacharacter."

Rob Skedgell <rob+news@nephelococcygia.demon.co.uk>
GnuPG/PGP: 7DA3 1579 C0DD 8748 C05A  B984 E2A2 3234 D14B 6DD7
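A quick check of Rob's suggestion (the class name HyphenLast is illustrative): with the hyphen moved to just before the closing bracket, the original character list needs no backslash at all, and the output should match colirl's escaped version.

```java
public class HyphenLast {
    // Hyphen placed last, right before ']', so it cannot start a range.
    static final String PATTERN = "[^A-Za-z0-9/?:().,'+^| -]";

    // Removes every character that is NOT listed in the bracketed class.
    static String strip(String input) {
        return input.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("blahblah !@#$%^&*()--.,<>=")); // blahblah ^()--.,
    }
}
```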

colirl - 22 Feb 2006 22:53 GMT
Rob,
ya, I had forgotten about that one. :).
Jussi Piitulainen - 22 Feb 2006 18:16 GMT
> I have a very strange problem. I want to replace everything in a
> string except letters, numbers, space, and certain symbols listed in
> the regex expression below
>
> "blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/-?:().,'+^| ]","")

/-? is a range from '/' to '?', so it contains :;<=> (and the digits).
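Jussi's point can be checked directly by listing every printable ASCII character a one-character class accepts (the helper name matchedAscii is hypothetical). Because '/' is 0x2F and '?' is 0x3F, the unescaped /-? covers '/', the ten digits and :;<=>?, while the escaped form accepts only the three intended characters.

```java
import java.util.regex.Pattern;

public class RangeContents {
    // Collects, in code-point order, every printable ASCII character
    // (space through '~') that the given one-character class matches.
    static String matchedAscii(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 32; c < 127; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matchedAscii("[/-?]"));   // /0123456789:;<=>?
        System.out.println(matchedAscii("[/\\-?]")); // -/?
    }
}
```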
falcon - 22 Feb 2006 20:24 GMT
Jussi,
I already fixed the problem, but it's amusing that I missed seeing /-?
as *from '/'* *to '?'*

Thanks :)


©2009 Advenet LLC