Java Forum / General / July 2008
Matching parentheses with Regular Expressions
James - 04 Jul 2008 02:12 GMT
I'm trying to use regex to match/replace a word in parentheses. The regular expression
private static final Pattern java_proc = Pattern.compile("(java)");
does not work, because parentheses are treated as groupings.
Using "\" to designate the parentheses as literal characters does not work --- not sure why:
private static final Pattern java_proc = Pattern.compile("\(java\)");
I searched for and read a related post here, but it did not help. I seem to be having a different problem than they had. Or I just don't understand the post.
What am I doing wrong? Thanks, Alan
James - 04 Jul 2008 02:23 GMT
OK, I finally found the words about using double slashes in front of parentheses. So, now, why won't the following regular expression pattern compile?
private static final Pattern java_proc = Pattern.compile("\\\\.+\\Process\\(java\\)\\");
The error says:
java.lang.ExceptionInInitializerError
Caused by: java.util.regex.PatternSyntaxException: Unknown character property name {r} near index 6
\\.+\Process\(java\)\
      ^
This does not make sense to me.
I'm trying to match text of the form (example):
\\GOLLY\Process(java)\% Processor Time
Thanks, Alan
Joshua Cranmer - 04 Jul 2008 02:31 GMT
> OK, I finally found the words about using double slashes in front of
> parentheses. So, now, why won't the following regular expression
[quoted text clipped - 10 lines]
> \\.+\Process\(java\)\
>       ^
This is what the regex is seeing. Don't forget that '\' is also a metacharacter in regexes. So matching a '\' with a regex requires you to write '\\\\' in the Java string literal, which causes the regex to see '\\', which is what it uses to match a single '\'. So the regex you're probably trying to compile is:
"\\\\{2}.+\\\\Process\\(java\\)\\\\"
(The {2} is so that you don't have to type in 8 slashes.)
 Signature Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
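A minimal sketch of the two escaping layers described above, using the pattern from this reply and the sample text from the question. The class name EscapeLayers and the helper firstMatch are made up for illustration; only java.util.regex calls from the standard library are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeLayers {
    // Layer 1 (Java compiler): "\\\\" in source becomes two characters, \\.
    // Layer 2 (regex engine): \\ in the pattern means one literal backslash.
    static final Pattern JAVA_PROC =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\)\\\\");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = JAVA_PROC.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shows what the regex engine receives once the compiler has unescaped the literal.
        System.out.println(JAVA_PROC.pattern());
        // The sample text \\GOLLY\Process(java)\% Processor Time, written as a Java literal.
        System.out.println(firstMatch("\\\\GOLLY\\Process(java)\\% Processor Time"));
    }
}
```

Printing pattern() before debugging a PatternSyntaxException is a cheap way to see the string the regex engine is actually parsing.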
James - 04 Jul 2008 02:44 GMT
Thank you.
I have one last remaining problem. The full data I'm working with, in CSV format, looks like this:
"(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\% Processor Time"
I want to match on
\\GOLLY\Process(java)\
so I can replace it.
The regular expression
\\\\{2}.+\\\\Process\\(java\\).
matches, but it matches too much of it:
\\GOLLY\Memory\% Committed Bytes In Use","\\GOLLY\Process(java)\
How can I get it to only match the part I want?
Thanks again, Alan
Joshua Cranmer - 04 Jul 2008 02:52 GMT
> The regular expression
>
> \\\\{2}.+\\\\Process\\(java\\).
>
> matches, but it matches too much of it:
In that case, you probably want this regex:
\\\\{2}[^\\\\]+\\\\Process\\(java\\)
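The difference between the greedy .+ and the negated class [^\\]+ can be seen on the CSV line from the question. This is only a sketch: the class NarrowMatch, its helpers, and the replacement text \\HOST\Process(jvm) are invented for illustration, since the thread never says what the match should be replaced with.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NarrowMatch {
    // The CSV header line from the question, written as a Java literal.
    static final String SAMPLE =
            "\"(PDH-CSV 4.0) (Eastern Daylight Time)(240)\","
            + "\"\\\\GOLLY\\Memory\\% Committed Bytes In Use\","
            + "\"\\\\GOLLY\\Process(java)\\% Processor Time\"";

    // Greedy version: ".+" may swallow backslashes, so the match can start in an earlier field.
    static final Pattern GREEDY =
            Pattern.compile("\\\\{2}.+\\\\Process\\(java\\).");

    // Negated class: the host name part may not contain a backslash.
    static final Pattern NARROW =
            Pattern.compile("\\\\{2}[^\\\\]+\\\\Process\\(java\\)");

    // Hypothetical helper: first match of p in input, or null.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Replaces every narrow match; quoteReplacement keeps backslashes in the
    // replacement text from being read as escapes by replaceAll.
    static String replacePrefix(String input, String replacement) {
        return NARROW.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(GREEDY, SAMPLE));
        System.out.println(firstMatch(NARROW, SAMPLE));
        System.out.println(replacePrefix(SAMPLE, "\\\\HOST\\Process(jvm)"));
    }
}
```

A reluctant quantifier (.+?) would not help here, because the match would still begin at the first pair of backslashes; excluding backslashes from the host name is what forces the match to start in the right field.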
shakah - 04 Jul 2008 13:04 GMT
> > The regular expression
> [quoted text clipped - 8 lines]
> Beware of bugs in the above code; I have only proved it correct, not
> tried it. -- Donald E. Knuth
FWIW, you could avoid a little of the backslash escape mess by using single-char character classes, e.g.:
Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]") ;
// ...outside of a Java string that'd be [\]{2}[^\]+[\]Process[(]java[)]
Mark Space - 04 Jul 2008 19:36 GMT
>>> The regular expression
>>> \\\\{2}.+\\\\Process\\(java\\).
[quoted text clipped - 12 lines]
> // ...outside of a Java string that'd be [\]{2}[^\]+
> [\]Process[(]java[)]
You also might get rid of some of those backslashes by substituting another character, then using replace() on the string before compiling it.
final static String PATTERN = "``{2}.+``Process`(java`)";
String myRegex = PATTERN.replace("`", "\\");
System.out.println(myRegex);
Result:
\\{2}.+\\Process\(java\)
It just makes things more readable. Using `, %, or # in a string, then replacing that character with \ before compiling it as a regex, can save your eyes.
Incidentally, I wonder if Sun could be convinced to add this themselves. Maybe add a new operator/keyword altogether. Say # introduces new keywords or operators, and is followed by the keyword or operator. This would allow Sun to add new keywords or operators without breaking any existing code. So #s might give us new string constants. Let's say ' then means a string like a Unix shell string, where escaping is ignored.
String regex = #s'\\{2}.+\\Process\(java\)';
Would give that literal string, without the need to escape the backslashes. Easier for regex at least. Other types of flags besides ' could be introduced too. `,$,@,%,= might do the same thing, just use a different character as a string terminator, in case you want a ' to be part of the string. """ might introduce a "here-is" operator. Etc.
Just thinking out loud....
Roedy Green - 04 Jul 2008 19:50 GMT
On Fri, 04 Jul 2008 11:36:12 -0700, Mark Space <markspace@sbc.global.net> wrote, quoted or indirectly quoted someone who said :
>You also might get rid of some of those backslashes by substituting
>another character, then using replace() on the string before compiling it.
Other ideas:
1. Use Quoter to insert \ quoting, both for regex and Java strings. see http://mindprod.com/applet/quoter.html
2. implement one or more of my regex student projects
http://mindprod.com/project/regexutility.html
http://mindprod.com/project/regexcomposer.html
http://mindprod.com/project/regexdebugger.html
http://mindprod.com/project/regexproofreader.html
3. use \Q ... \E
 Signature
Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Mark Space - 04 Jul 2008 20:05 GMT
> 3. use \Q ... \E
OK, that's cool. It only works with regex, but it's darn handy for them. Thanks!
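One way the \Q ... \E idea could look in practice is Pattern.quote, which wraps a string in \Q and \E so the regex engine treats it as literal text. The sketch below quotes only the fixed tail of the counter name and keeps the host name part as an ordinary regex; the class name QuotedLiteral, the constant LITERAL_TAIL, and the helper firstMatch are illustrative names, not anything from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedLiteral {
    // The fixed text \Process(java)\ with Java escaping only, no regex escaping.
    static final String LITERAL_TAIL = "\\Process(java)\\";

    // Two backslashes, a host name without backslashes, then the quoted literal tail.
    static final Pattern JAVA_PROC =
            Pattern.compile("\\\\{2}[^\\\\]+" + Pattern.quote(LITERAL_TAIL));

    // Hypothetical helper: first match in input, or null.
    static String firstMatch(String input) {
        Matcher m = JAVA_PROC.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(
                "\"\\\\GOLLY\\Memory\\% Committed Bytes In Use\",\"\\\\GOLLY\\Process(java)\\% Processor Time\""));
    }
}
```

The parentheses and the backslashes inside LITERAL_TAIL need no regex escaping at all, which removes one of the two escaping layers for that part of the pattern.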
James - 05 Jul 2008 20:48 GMT
shakah,
The statement
Pattern JAVA_PROC = Pattern.compile("[\\]{2}[^\\]+[\\]Process[(]java[)]");
compiles but raises an exception at run time:
run:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 30
[\]{2}[^\]+[\]Process[(]java[)]
                              ^
All: Thank you for your suggestions.
Roedy Green - 05 Jul 2008 21:31 GMT
On Sat, 5 Jul 2008 12:48:44 -0700 (PDT), James <jalanthomas@verizon.net> wrote, quoted or indirectly quoted someone who said :
>[\]{2}[^\]+[\]Process[(]java[)]
>                              ^
() both need escapes. If that is a Java literal, you also need to escape \ both for Java and for regex.
see http://mindprod.com/jgloss/regex.html#QUOTING
Joshua Cranmer - 05 Jul 2008 21:34 GMT
> Exception in thread "main" java.util.regex.PatternSyntaxException:
> Unclosed character class near index 30
> [\]{2}[^\]+[\]Process[(]java[)]
You still have to quote the slashes here, since each slash is currently quoting the close of its character class expression.
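A small sketch of the point above: inside a character class the backslash still escapes the next character, so each class needs a regex-level \\, which is "\\\\" in Java source. The class CharClassEscapes and its helpers are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassEscapes {
    // As posted: the regex engine sees [\] and reads \] as an escaped bracket,
    // so the class never closes.
    static final String BROKEN = "[\\]{2}[^\\]+[\\]Process[(]java[)]";

    // With the backslash escaped at the regex level too: the engine sees [\\].
    static final String FIXED = "[\\\\]{2}[^\\\\]+[\\\\]Process[(]java[)]";

    // Hypothetical helper: reports whether the regex text compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: first match of the fixed pattern, or null.
    static String firstMatch(String input) {
        Matcher m = Pattern.compile(FIXED).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN));
        System.out.println(compiles(FIXED));
        System.out.println(firstMatch("\\\\GOLLY\\Process(java)\\% Processor Time"));
    }
}
```

The parentheses in [(] and [)] need no escape inside a character class, so the character-class style saves escaping for the parentheses but not for the backslashes.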
Stefan Ram - 04 Jul 2008 02:24 GMT
>private static final Pattern java_proc = Pattern.compile("\(java\)");
private static final Pattern java_proc = Pattern.compile("\\(java\\)");
Arved Sandstrom - 04 Jul 2008 03:22 GMT
> I'm trying to use regex to match/replace a word in parentheses.
> The regular expression
[quoted text clipped - 15 lines]
>
> What am I doing wrong? Thanks, Alan
Double backslash your pattern: \\(java\\)
AHS
Roedy Green - 04 Jul 2008 05:23 GMT
On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James <jalanthomas@verizon.net> wrote, quoted or indirectly quoted someone who said :
> private static final Pattern java_proc = Pattern.compile("\(java\)");
It gets complicated because you have both Java and regex escape quoting.
See http://mindprod.com/jgloss/regex.html#QUOTING
Roedy Green - 06 Jul 2008 07:59 GMT
On Thu, 3 Jul 2008 18:12:55 -0700 (PDT), James <jalanthomas@verizon.net> wrote, quoted or indirectly quoted someone who said :
> I'm trying to use regex to match/replace a word in parentheses.
>The regular expression
An aside: you can't use a regex to tell if ( ) are nested and balanced correctly to arbitrary depth.
For that you need a parser.
See http://mindprod.com/jgloss/parser.html
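For the balanced-parentheses aside, a full parser is not always required when only one kind of bracket is involved: a depth counter is enough. The sketch below is one such check, not the approach from the linked page; the class name ParenBalance and the method isBalanced are made up for illustration.

```java
public class ParenBalance {
    // Returns true when every '(' is closed by a later ')' and no ')' appears
    // before its matching '('. Other characters are ignored.
    static boolean isBalanced(String text) {
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' with no open '(' before it
                }
            }
        }
        return depth == 0; // anything left open means unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("\\\\GOLLY\\Process(java)\\% Processor Time"));
        System.out.println(isBalanced("f(g(x)"));
    }
}
```

The counter is exactly the unbounded memory that a classical regular expression lacks, which is why nesting to arbitrary depth cannot be checked with a regex alone.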