
Matching parentheses with Regular Expressions

James - 04 Jul 2008 02:12 GMT
I'm trying to use a regex to match and replace a word in parentheses.
The regular expression

   private static final Pattern java_proc = Pattern.compile("(java)");

does not work, because the parentheses are treated as a grouping.

Using "\" to mark the parentheses as literal characters does not
work either, and I'm not sure why:

   private static final Pattern java_proc = Pattern.compile("\(java\)");

I searched for and read a related post here, but it did not help.
Either I have a different problem than that poster had, or I just
don't understand the post.

What am I doing wrong?       Thanks, Alan
James - 04 Jul 2008 02:23 GMT
OK, I finally found the advice about using double backslashes in front of
parentheses.  So, now, why won't the following regular expression
pattern compile?

private static final Pattern java_proc = Pattern.compile("\\\\.+\\Process\\(java\\)\\");

The error says:

java.lang.ExceptionInInitializerError
Caused by: java.util.regex.PatternSyntaxException: Unknown character
property name {r} near index 6
\\.+\Process\(java\)\
     ^

This does not make sense to me.

   I'm trying to match text of the form (example):

\\GOLLY\Process(java)\% Processor Time

           Thanks, Alan
Joshua Cranmer - 04 Jul 2008 02:31 GMT
> OK, I finally found the words about using double slashes in front of
> parentheses.  So, now, why won`t the following regular expression
[quoted text clipped - 10 lines]
> \\.+\Process\(java\)\
>       ^

That line is what the regex engine actually sees. Don't forget that '\' is
also a metacharacter in regexes. To match one literal '\', the Java string
literal has to contain '\\\\': the compiler turns that into '\\', which the
regex engine reads as an escaped, literal '\'. In your pattern, '\\Process'
reaches the engine as '\Process', and '\P' introduces a character property,
hence the error. The regex you're probably trying to compile is:
"\\\\{2}.+\\\\Process\\(java\\)\\\\" (the {2} saves you from typing
8 backslashes in a row).
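
As a minimal sketch of the two escaping layers described above, the following prints what the regex engine receives after the Java compiler has processed the string literal. The class name EscapeLayers and the helper matches() are made up for illustration; the pattern and the sample text are the ones quoted in this thread.

```java
import java.util.regex.Pattern;

class EscapeLayers {
    // Layer 1 (Java compiler): every pair of backslashes in the source
    // becomes one backslash in the String.
    // Layer 2 (regex engine): every remaining pair of backslashes matches
    // one literal backslash; \( and \) match literal parentheses.
    static final Pattern JAVA_PROC =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\)\\\\");

    // Hypothetical helper: true if the pattern occurs anywhere in text.
    static boolean matches(String text) {
        return JAVA_PROC.matcher(text).find();
    }

    public static void main(String[] args) {
        // The sample counter path from the thread, written as a Java literal.
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println(JAVA_PROC.pattern()); // the regex after layer 1
        System.out.println(sample);              // the text being searched
        System.out.println(matches(sample));
    }
}
```

Printing pattern() is a cheap way to see the string exactly as the regex engine sees it, which is the same view the PatternSyntaxException message shows.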

Signature

Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

James - 04 Jul 2008 02:44 GMT
Thank you.

  I have one last remaining problem.  The full data I'm working with,
in CSV format, looks like this:

"(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\% Processor Time"

I want to match on

\\GOLLY\Process(java)\

so I can replace it.

   The regular expression

\\\\{2}.+\\\\Process\\(java\\).

matches, but it matches too much of it:

\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\

   How can I get it to only match the part I want?

                   Thanks again, Alan
Joshua Cranmer - 04 Jul 2008 02:52 GMT
>     The regular expression
>
> \\\\{2}.+\\\\Process\\(java\\).
>
> matches, but it matches too much of it:

In that case, you probably want this regex:
\\\\{2}[^\\\\]+\\\\Process\\(java\\)
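
A sketch comparing the greedy pattern with the negated-character-class version on the CSV line from this thread. Assumptions: the class and method names are invented for illustration, and a trailing backslash has been appended to both patterns so that the match ends where the original poster wanted it to end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CounterPathRewrite {
    // Greedy: .+ may run across quotes, commas and backslashes.
    static final Pattern GREEDY =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\)\\\\");
    // Bounded: the host name part may not contain a backslash.
    static final Pattern BOUNDED =
            Pattern.compile("\\\\{2}[^\\\\]+\\\\Process\\(java\\)\\\\");

    // Returns the first match of p in line, or null if there is none.
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Replaces the matched prefix; quoteReplacement keeps any '\' or '$'
    // in the replacement text literal.
    static String replacePrefix(String line, String replacement) {
        return BOUNDED.matcher(line)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String line = "\"(PDH-CSV 4.0) (Eastern Daylight Time)(240)\","
                + "\"\\\\GOLLY\\Memory\\% Committed Bytes In Use\","
                + "\"\\\\GOLLY\\Process(java)\\% Processor Time\"";
        System.out.println(firstMatch(GREEDY, line));   // spans two columns
        System.out.println(firstMatch(BOUNDED, line));  // only the java path
        System.out.println(replacePrefix(line, "java: "));
    }
}
```

The bounded version works because a match can no longer start at the Memory column: after the host name it must find \Process, and the negated class prevents it from skipping ahead over other backslashes.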

shakah - 04 Jul 2008 13:04 GMT
> >     The regular expression
>
[quoted text clipped - 8 lines]
> Beware of bugs in the above code; I have only proved it correct, not
> tried it. -- Donald E. Knuth

FWIW, you could avoid a little of the backslash escape mess
by using single-char character classes, e.g.:
 Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]") ;
 // ...outside of a Java string that'd be [\]{2}[^\]+[\]Process[(]java[)]
Mark Space - 04 Jul 2008 19:36 GMT
>>>     The regular expression
>>> \\\\{2}.+\\\\Process\\(java\\).
[quoted text clipped - 12 lines]
>   // ...outside of a Java string that'd be [\]{2}[^\]+
> [\]Process[(]java[)]

You also might get rid of some of those backslashes by substituting
another character, then using replace() on the string before compiling it.

      final static String PATTERN = "``{2}.+``Process`(java`)";

      String myRegex = PATTERN.replace("`", "\\" );
      System.out.println( myRegex );

Result:

\\{2}.+\\Process\(java\)

It just makes things more readable.  Using `, %, or # in the string and
then replacing that character with backslashes before compiling it as a
regex can save your eyes.

Incidentally, I wonder if Sun could be convinced to add this themselves.
Maybe add a new operator/keyword altogether.  Say # introduces a new
keyword or operator and is followed by its name.  That would let Sun
add new keywords or operators without breaking any existing code.  So #s
might give us new string constants.  Let's say ' then means a string like
a Unix shell single-quoted string, where escaping is ignored.

  String regex = #s'\\{2}.+\\Process\(java\)';

would give that literal string, without the need to escape the
backslashes.  Easier for regex at least.  Other delimiters besides '
could be introduced too.  `, $, @, %, or = might do the same thing, just
using a different character as the string terminator, in case you want a '
to be part of the string.  """ might introduce a "here-is" operator.  Etc.

Just thinking out loud....
Roedy Green - 04 Jul 2008 19:50 GMT
On Fri, 04 Jul 2008 11:36:12 -0700, Mark Space
<markspace@sbc.global.net> wrote, quoted or indirectly quoted someone
who said :

>You also might get rid of some of those backslashes by substituting
>another character, then using replace() on the string before compiling it.

Other ideas:

1. Use Quoter to insert \ quoting, both for regex and Java strings.
see http://mindprod.com/applet/quoter.html

2. implement one or more of my regex student projects
http://mindprod.com/project/regexutility.html
http://mindprod.com/project/regexcomposer.html
http://mindprod.com/project/regexdebugger.html
http://mindprod.com/project/regexproofreader.html

3. use \Q ... \E
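
A small sketch of option 3, assuming the goal is to match text literally. It shows \Q ... \E written by hand and Pattern.quote(), which wraps a string in the same kind of quoting. The class name LiteralQuoting and the helper contains() are hypothetical.

```java
import java.util.regex.Pattern;

class LiteralQuoting {
    // Regex \Q(java)\E: everything between \Q and \E is taken literally.
    // The Java compiler still needs each backslash doubled.
    static final Pattern WITH_QE = Pattern.compile("\\Q(java)\\E");

    // Pattern.quote does the wrapping for an arbitrary string, so only the
    // Java-level escaping remains. The literal text here is the path
    // fragment that starts with a backslash, then Process(java), then
    // another backslash.
    static final Pattern WITH_QUOTE =
            Pattern.compile(Pattern.quote("\\Process(java)\\"));

    // Hypothetical helper: true if p occurs anywhere in text.
    static boolean contains(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println(contains(WITH_QE, sample));
        System.out.println(contains(WITH_QUOTE, sample));
        System.out.println(contains(WITH_QE, "java")); // input has no parentheses
    }
}
```

Quoting only removes the regex layer of escaping. It does not help with parts of the pattern that must stay active, such as the [^\\]+ host name part discussed earlier in the thread.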
Signature


Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Mark Space - 04 Jul 2008 20:05 GMT
> 3. use \Q ... \E

OK, that's cool.  It only works with regex, but it's darn handy for
them.  Thanks!
James - 05 Jul 2008 20:48 GMT
shakah,

  The statement

           Pattern JAVA_PROC = Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]");

compiles but throws an exception at run time:

run:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Unclosed character class near index 30
[\]{2}[^\]+[\]Process[(]java[)]
                             ^

All:  Thank you for your suggestions.
Roedy Green - 05 Jul 2008 21:31 GMT
On Sat, 5 Jul 2008 12:48:44 -0700 (PDT), James
<jalanthomas@verizon.net> wrote, quoted or indirectly quoted someone
who said :

>[\]{2}[^\]+[\]Process[(]java[)]
>                              ^

( and ) both need escapes when they appear outside a character class.
If that is a Java literal, you also need to escape \ both for Java and
for regex.

see http://mindprod.com/jgloss/regex.html#QUOTING

Joshua Cranmer - 05 Jul 2008 21:34 GMT
> Exception in thread "main" java.util.regex.PatternSyntaxException:
> Unclosed character class near index 30
> [\]{2}[^\]+[\]Process[(]java[)]

You still have to escape the backslashes here: as written, each backslash
is escaping the ']' that was meant to close its character class.
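
A sketch of the character-class variant with the backslashes escaped for both layers, next to the version that fails. The class name CharClassEscapes and the helper compiles() are invented for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassEscapes {
    // As posted: the regex engine receives [\] and so on, where each
    // backslash escapes the closing bracket and the class never ends.
    static final String BROKEN = "[\\]{2}[^\\]+[\\]Process[(]java[)]";

    // Inside [...] a backslash is still the escape character, so a literal
    // backslash needs two of them in the regex, four in the Java source.
    // Parentheses inside a class need no escape.
    static final String FIXED = "[\\\\]{2}[^\\\\]+[\\\\]Process[(]java[)]";

    // Hypothetical helper: true if the regex compiles without error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN));
        System.out.println(compiles(FIXED));
        String sample = "\\\\GOLLY\\Process(java)\\% Processor Time";
        System.out.println(Pattern.compile(FIXED).matcher(sample).find());
    }
}
```

With the fix applied, the character classes save escapes only for the parentheses; the backslashes need just as many as in the plain version.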


Stefan Ram - 04 Jul 2008 02:24 GMT
>private static final Pattern java_proc = Pattern.compile("\(java\)");

 private static final Pattern java_proc = Pattern.compile("\\(java\\)");
Arved Sandstrom - 04 Jul 2008 03:22 GMT
>   I`m trying to use regex to match/replace a word in parentheses.
> The regular expression
[quoted text clipped - 15 lines]
>
>      What am I doing wrong?       Thanks, Alan

Double backslash your pattern: "\\(java\\)"

AHS
Roedy Green - 04 Jul 2008 05:23 GMT
On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James
<jalanthomas@verizon.net> wrote, quoted or indirectly quoted someone
who said :

>    private static final Pattern java_proc = Pattern.compile("\(java
>\)");

It gets complicated because you have both Java and regex escape
quoting.

See http://mindprod.com/jgloss/regex.html#QUOTING


Roedy Green - 06 Jul 2008 07:59 GMT
On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James
<jalanthomas@verizon.net> wrote, quoted or indirectly quoted someone
who said :

>   I`m trying to use regex to match/replace a word in parentheses.
>The regular expression

An aside,  you can't use a regex to tell if ( ) are nested and
balanced correctly to arbitrary depth.

For that you need a parser.

See http://mindprod.com/jgloss/parser.html
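
For the balance check alone, a depth counter is about the simplest non-regex approach. The sketch below assumes only '(' and ')' matter and ignores quoting and other bracket types; the class name ParenBalance and the method isBalanced() are hypothetical. Anything that needs the nested structure itself, rather than a yes/no answer, would need a real parser as suggested above.

```java
class ParenBalance {
    // Returns true if every '(' has a matching ')' and no ')' appears
    // before its '('. Works to any nesting depth an int can count.
    static boolean isBalanced(String text) {
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closing parenthesis with nothing open
                }
            }
        }
        return depth == 0; // anything left open is unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("Process(java)"));
        System.out.println(isBalanced("(a(b)c)"));
        System.out.println(isBalanced("(()"));
        System.out.println(isBalanced(")("));
    }
}
```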


