Java Forum / General / August 2007
Regex is correct but java won't parse it ?
News - 14 Aug 2007 00:03 GMT Hello all,
I want to create a logic class to evaluate simple logical expressions and print their truth table. I am using a regular expression that looks for a pattern commencing with a char and followed by any number of (operator char) groups (for the sake of simplicity only the AND operator "&" is included till I get it working properly).
My regex is [a-b]([&][a-b])*. I know the regex is correct because I have tested it using the regular expression demo from www.regular-expressions.info.
Following is my code stripped to the essentials. As it stands this returns a match even for malformed strings and I cannot see why!
import java.util.regex.*;

public class Logic {
    public static void main(String[] args) {
        StringBuffer strb = new StringBuffer();
        for (int i = 0; i < args.length; i++) {
            strb.append(args[i]); // Add the command line arguments to the StringBuffer
        }
        String str = strb.toString(); // Change to a String so Matcher can use it.
        String regex = new String("[a-z]([&][a-z])*");
        System.out.println(str); // Test print to ensure the string and regex are correct
        System.out.println(regex);
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(regex);
        if (m.find()) {
            System.out.println("Matched");
        } else {
            System.out.println("Not Matched");
        }
    }
}
Any ideas ? Thanks in advance !
Joshua Cranmer - 14 Aug 2007 00:19 GMT
> if (m.find()) {
find() returns true if there exists a substring that matches the expression. For example, your regex will match "3453457a4234456456" because there is an 'a' in the expression. What you want is match().
 Signature Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
Stefan Ram - 14 Aug 2007 00:30 GMT
>> if (m.find()) {
>find() returns true if there exists a substring that matches the expression.
>For example, your regex will match "3453457a4234456456" because there is
>an 'a' in the expression. What you want is match().
Possibly, you meant »matches()« - there seems to be no »match()« method in the class »Matcher«:
http://download.java.net/jdk7/docs/api/java/util/regex/Matcher.html#matches()
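For reference, the difference between the three Matcher methods discussed here can be seen in a minimal sketch. The class name FindVsMatches and its helper methods are made up for illustration; only Pattern.compile, Matcher.find(), Matcher.matches() and Matcher.lookingAt() are actual java.util.regex calls.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // Same pattern as in the thread: a letter followed by any number of "&letter" groups.
    static final Pattern P = Pattern.compile("[a-z]([&][a-z])*", Pattern.CASE_INSENSITIVE);

    // True if some substring of the input matches the pattern.
    static boolean findAnywhere(String s) {
        return P.matcher(s).find();
    }

    // True only if the entire input matches the pattern.
    static boolean matchesAll(String s) {
        return P.matcher(s).matches();
    }

    // True if the input starts with a match (the rest of the input is ignored).
    static boolean matchesPrefix(String s) {
        return P.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        String[] inputs = { "p&q&n", "3453457a4234456456", "p&&q" };
        for (String s : inputs) {
            System.out.println(s
                    + " find=" + findAnywhere(s)
                    + " lookingAt=" + matchesPrefix(s)
                    + " matches=" + matchesAll(s));
        }
    }
}
```

Only "p&q&n" passes matches(); the other two inputs pass find() because each contains at least one letter somewhere.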
Joshua Cranmer - 14 Aug 2007 00:44 GMT
>>> if (m.find()) {
>> find() returns true if there exists a substring that matches the expression.
[quoted text clipped - 5 lines]
>
> http://download.java.net/jdk7/docs/api/java/util/regex/Matcher.html#matches()
Too much JavaScript for me, then.
Alternatively, using the regex "^[a-z]([&+*-][a-z])*$" with find would also work, provided that the string is only one line long.
Interestingly enough, from the URL you provided, you seem to be using JDK 7. What's different from 1.6 (so far)?
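The anchored-regex alternative mentioned in this post could look roughly like the sketch below. The class name AnchoredFind and the helper wellFormed are hypothetical; the pattern string is the one quoted in the post.

```java
import java.util.regex.Pattern;

class AnchoredFind {
    // ^ and $ pin the match to the start and end of the input,
    // so find() can no longer succeed on a substring in the middle.
    static final Pattern ANCHORED = Pattern.compile("^[a-z]([&+*-][a-z])*$");

    static boolean wellFormed(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "a+b*c", "3453457a4234456456", "a&" };
        for (String s : inputs) {
            System.out.println(s + " -> " + wellFormed(s));
        }
    }
}
```

With the anchors in place, find() accepts "a+b*c" but rejects both the digit string and the dangling "a&".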
Stefan Ram - 14 Aug 2007 00:54 GMT
>Interestingly enough, from the URL you provided, you seem to be using
>JDK 7. What's different from 1.6 (so far)?
http://tech.puredanger.com/java7#roundup
http://tech.puredanger.com/java7
Esmond Pitt - 14 Aug 2007 02:10 GMT
> I want to create a logic class to evaluate simple logical expressions
> and print their truth table. I am using a regular expression that looks
> for a pattern commencing with a char and followed by any number of
> (operator char) groups, (for the sake of simplicity only the AND
> operator "&" is included till I get it working properly).
Hold on. The minute you get to handling "|" as well as "&" you will discover that this is not a regular-expression problem, it is a parsing problem. You will need to implement operator precedence, and REs can't do that.
News - 14 Aug 2007 02:51 GMT Hi Esmond, Joshua and Stefan,
Thanks for pointing out to me the difference between .find() and .matches(). It's a big step closer, but .matches() returns false unless I replace the regex with the EXACT string I am searching for, e.g. "[a-z]([&][a-z])*" is replaced with "p&q" and I search on "p&q". I also tried .lookingAt() but still don't get a match. I also tried using the escape sequence \\& in the regex but it made no difference.
Esmond, I will certainly watch out for precedence issues once I get this simple case working! Thanks again. Here is my latest.
import java.util.regex.*;

public class Logic {
    public static void main(String[] args) {
        StringBuffer strb = new StringBuffer();
        for (int i = 0; i < args.length; i++) {
            strb.append(args[i]); // Add the command line arguments to the StringBuffer
        }
        String str = strb.toString(); // Change to a String so Matcher can use it.
        String regex = new String("[a-z]([&][a-z])*");
        System.out.println(str); // Test print to ensure the string and regex are correct
        System.out.println(regex);
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(regex);
        if (m.matches()) {
            System.out.println("Matched");
        } else {
            System.out.println("Not Matched");
        }
    }
}
which when run with "p&q&n" produces
p&q&n
[a-z]([&][a-z])*
Not Matched
Any ideas? There's a beer in it!
Wayne
>> I want to create a logic class to evaluate simple logical expressions and
>> print their truth table. I am using a regular expression that looks for a
[quoted text clipped - 6 lines]
> problem. You will need to implement operator precedence, and REs can't do
> that.
Esmond Pitt - 15 Aug 2007 02:56 GMT
> Esmond, I will certainly watch out for precedence issues once I get this
> simple case working !
Why would you bother to get it working when REs can't do it? You need to build a tokenizer and a parser.
Wayne McDermott - 16 Aug 2007 11:28 GMT
>> Esmond, I will certainly watch out for precedence issues once I get this
>> simple case working !
>
> Why would you bother to get it working when REs can't do it? You need to
> build a tokenizer and a parser.
Howdy Esmond,
The StringTokenizer documentation actually recommends regular expressions be used instead ! See http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
Cheers,
Wayne
Chris Dollin - 16 Aug 2007 12:17 GMT
>>> Esmond, I will certainly watch out for precedence issues once I get this
>>> simple case working !
[quoted text clipped - 7 lines]
> used instead ! See
> http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html
Not a StringTokenizer; a tokeniser, aka lexer, aka lexical analyser, that recognises tokens in the language, not just sequences separated by some character.
If you're going to parse logical expressions, you will very soon go past the stage where regular expressions can do the job, since you'll want to tackle operators with different precedences, and brackets. It is DEAD EASY to write a parser for simple expressions once you have the tokens.
[You can use REs to recognise the tokens relatively easily.]
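To make the tokenizer-plus-parser suggestion concrete, here is a minimal sketch, not anyone's actual code from this thread. It assumes a hypothetical grammar of single-letter variables, "&", "|" and parentheses, with "&" binding tighter than "|". A regex recognises the tokens and a small recursive-descent parser handles precedence and brackets. The class name LogicParser and all of its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogicParser {
    // Assumed token set: one letter, '&', '|', '(' or ')', with optional leading whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\s*([a-zA-Z]|[&|()])");

    private final List<String> tokens;
    private final Map<Character, Boolean> env; // variable assignments
    private int pos = 0;

    private LogicParser(List<String> tokens, Map<Character, Boolean> env) {
        this.tokens = tokens;
        this.env = env;
    }

    // Lexer: the regex recognises one token at a time from the current position.
    static List<String> tokenize(String input) {
        String text = input.trim();
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int i = 0;
        while (i < text.length()) {
            m.region(i, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
            out.add(m.group(1));
            i = m.end();
        }
        return out;
    }

    static boolean evaluate(String input, Map<Character, Boolean> env) {
        LogicParser p = new LogicParser(tokenize(input), env);
        boolean value = p.parseOr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return value;
    }

    private boolean peekIs(String t) {
        return pos < tokens.size() && tokens.get(pos).equals(t);
    }

    // or := and ('|' and)*    (lowest precedence)
    private boolean parseOr() {
        boolean v = parseAnd();
        while (peekIs("|")) {
            pos++;
            boolean rhs = parseAnd();
            v = v | rhs;
        }
        return v;
    }

    // and := primary ('&' primary)*
    private boolean parseAnd() {
        boolean v = parsePrimary();
        while (peekIs("&")) {
            pos++;
            boolean rhs = parsePrimary();
            v = v & rhs;
        }
        return v;
    }

    // primary := letter | '(' or ')'
    private boolean parsePrimary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            boolean v = parseOr();
            if (!peekIs(")")) {
                throw new IllegalArgumentException("Expected ')'");
            }
            pos++;
            return v;
        }
        if (Character.isLetter(t.charAt(0))) {
            Boolean b = env.get(t.charAt(0));
            if (b == null) {
                throw new IllegalArgumentException("Unbound variable: " + t);
            }
            return b;
        }
        throw new IllegalArgumentException("Unexpected token: " + t);
    }

    public static void main(String[] args) {
        Map<Character, Boolean> env = Map.of('p', true, 'q', false, 'r', false);
        System.out.println(evaluate("p|q&r", env));   // true: parsed as p | (q & r)
        System.out.println(evaluate("(p|q)&r", env)); // false: brackets override precedence
    }
}
```

Each precedence level gets its own method, so adding another operator (negation, say) would mean adding one more level rather than reworking a regex.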
 Signature Chris "wrote one two weeks ago" Dollin
Hewlett-Packard Limited registered no: 690597 England, registered office: Cain Road, Bracknell, Berks RG12 1HN
Martin Gregorie - 17 Aug 2007 00:32 GMT
>>>> Esmond, I will certainly watch out for precedence issues once I get this
>>>> simple case working !
[quoted text clipped - 15 lines]
> DEAD EASY to write a parser for simple expressions once you have the
> tokens.
It's even easier to use something like Coco/R, which takes a single input file and generates a Scanner (tokenizer) and a Parser class from it. Even better, the frameworks for these classes are external text files, so you can modify them. For instance, I needed a Scanner that would accept a string to be processed - there was no constructor that would do that but adding one was simple enough. As you'd hope, the Java version of Coco/R is written in Java.
 Signature martin@ | Martin Gregorie gregorie. | Essex, UK org |
Roedy Green - 17 Aug 2007 01:22 GMT
>Why would you bother to get it working when REs can't do it? You need to
>build a tokenizer and a parser.
See http://mindprod.com/jgloss/parser.html
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Roedy Green - 14 Aug 2007 02:56 GMT
>import java.util.regex.*;
>public class Logic {
[quoted text clipped - 15 lines]
>System.out.println("Not Matched"); }
> }
I tidied and commented your code. In doing so the primary error jumped out.

import java.util.regex.*;

public class Logic {
    public static void main(String[] args) {
        final StringBuffer strb = new StringBuffer();
        for (int i = 0; i < args.length; i++) {
            strb.append(args[i]); // Add the command line arguments to the StringBuffer
        }
        final String str = strb.toString(); // Change to a String so Matcher can use it.
        // look for string of the form ---a&a&b&c---
        final String regex = "[a-z]([&][a-z])*";
        System.out.println("command line:" + str); // Test print to ensure the string and regex are correct
        System.out.println("regex:" + regex);
        final Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // scan command string, not the regex.
        final Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println("Matched");
            // add some more printout to see what was matched.
            final int gc = m.groupCount();
            // group 0 is the whole pattern matched,
            // loop runs from 0 to gc, not 0 to gc-1 as is traditional.
            for (int i = 0; i <= gc; i++) {
                System.out.println(i + " : " + m.group(i));
            }
        } else {
            System.out.println("Not Matched");
        }
    }
}
Roedy Green - 14 Aug 2007 03:06 GMT
>My regex is [a-b]([&][a-b])*. I know the regex is correct because I have
>tested it using the regular expression demo from
If by any chance you are trying to find &xxxx; entities, see http://mindprod.com/products.html#ENTITIES for a canned solution.
News - 14 Aug 2007 04:05 GMT Hey Roedy,
There is no ASCII symbol for smacking yourself in the forehead and kicking the cat so all I can say is thanks !!
> >My regex is [a-b]([&][a-b])*. I know the regex is correct because I have
>>tested it using the regular expression demo from
>
> if by any chance you are trying the find &xxxx; entities, see
> http://mindprod.com/products.html#ENTITIES for a canned solution.
Roedy Green - 14 Aug 2007 09:21 GMT
>There is no ASCII symbol for smacking yourself in the forehead and kicking
>the cat so all I can say is thanks!!
At http://mindprod.com/jgloss/regex.html are some code snippets for doing the usual things with regexes. If you start with them, then modify the code, you will likely avoid errors like the one that threw you.
Andrew Thompson - 14 Aug 2007 09:27 GMT ...
>There is no ASCII symbol for smacking yourself in the forehead and kicking >the cat ... Did the cat write the code?
If not, I suggest it more appropriate, if no less violent, to kick the ..entity or being that wrote the code.
 Signature Andrew Thompson http://www.athompson.info/andrew/
Lew - 14 Aug 2007 13:47 GMT News wrote:
> ...
>> There is no ASCII symbol for smacking yourself in the forehead and kicking
>> the cat ...
> Did the cat write the code?
>
> If not, I suggest it more appropriate, if no less violent,
> to kick the ..entity or being that wrote the code.
Maybe they meant "cat" in the beatnik sense, that is, they are going to kick the "cat" that wrote it.
 Signature Lew
bsgama@gmail.com - 24 Aug 2007 15:55 GMT
In this line, Matcher m = p.matcher(regex);, you should pass str, not the regex!
> Hello all, > [quoted text clipped - 33 lines] > > Any ideas ? Thanks in advance !
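The effect of passing the wrong argument to matcher() is easy to check in isolation. The sketch below uses a hypothetical class MatcherTarget with two made-up helpers. It shows why the original find() version reported a match for any input and why the matches() version never did: in both cases the pattern was being applied to its own source text rather than to the command-line string.

```java
import java.util.regex.Pattern;

class MatcherTarget {
    static final String REGEX = "[a-z]([&][a-z])*";

    // True if the whole target matches the regex.
    static boolean wholeMatch(String regex, String target) {
        return Pattern.compile(regex).matcher(target).matches();
    }

    // True if any substring of the target matches the regex.
    static boolean partialMatch(String regex, String target) {
        return Pattern.compile(regex).matcher(target).find();
    }

    public static void main(String[] args) {
        String str = "p&q&n";

        // The bug: the regex text itself is used as the input.
        // find() succeeds because the regex text contains letters such as 'a'.
        System.out.println("find on regex text:    " + partialMatch(REGEX, REGEX));
        // matches() fails because the regex text starts with '['.
        System.out.println("matches on regex text: " + wholeMatch(REGEX, REGEX));

        // The fix: match against the actual input string.
        System.out.println("matches on input:      " + wholeMatch(REGEX, str));
    }
}
```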