Java Forum / General / August 2007

Regex is correct but Java won't parse it?

News - 14 Aug 2007 00:03 GMT
Hello all,

I want to create a logic class to evaluate simple logical expressions and
print their truth table. I am using a regular expression that looks for a
pattern commencing with a char and followed by any number of (operator char)
groups (for the sake of simplicity, only the AND operator "&" is included
until I get it working properly).

My regex is [a-b]([&][a-b])*.  I know the regex is correct because I have
tested it using the regular expression demo from
www.regular-expressions.info .

Following is my code, stripped to the essentials. As it stands, it reports a
match even for malformed strings, and I cannot see why!

import java.util.regex.*;

public class Logic {
    public static void main(String[] args) {
        StringBuffer strb = new StringBuffer();
        for (int i = 0; i < args.length; i++) {
            strb.append(args[i]); // Add the command line arguments to the StringBuffer
        }
        String str = strb.toString(); // Change to a String so Matcher can use it
        String regex = new String("[a-z]([&][a-z])*");
        System.out.println(str); // Test print to ensure the string and regex are correct
        System.out.println(regex);
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(regex);
        if (m.find()) {
            System.out.println("Matched");
        } else {
            System.out.println("Not Matched");
        }
    }
}

Any ideas ? Thanks in advance !
Joshua Cranmer - 14 Aug 2007 00:19 GMT
> if (m.find())  {

find() returns true if there exists a substring that matches the expression.
For example, your regex will match "3453457a4234456456" because there is
an 'a' in the string. What you want is match().

Signature

Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Stefan Ram - 14 Aug 2007 00:30 GMT
>> if (m.find())  {
>find() returns true if there exists a substring that matches the expression.
>For example, your regex will match "3453457a4234456456" because there is
>an 'a' in the string. What you want is match().

 Possibly, you meant »matches()« - there seems to be no »match()« method
 in the class »Matcher«:

http://download.java.net/jdk7/docs/api/java/util/regex/Matcher.html#matches()
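
To make the difference concrete, here is a minimal sketch comparing find(),
matches() and lookingAt() on the pattern from the original post. The class and
helper names (FindVsMatches, found, wholeMatch, prefixMatch) are made up for
illustration and the sample inputs are assumptions, not from the original code.

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // Pattern from the original post: a letter followed by any number of "&letter" groups.
    static final Pattern P = Pattern.compile("[a-z]([&][a-z])*");

    // True if ANY substring of the input matches the pattern.
    static boolean found(String input) {
        return P.matcher(input).find();
    }

    // True only if the ENTIRE input matches the pattern.
    static boolean wholeMatch(String input) {
        return P.matcher(input).matches();
    }

    // True if some PREFIX of the input matches the pattern.
    static boolean prefixMatch(String input) {
        return P.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs: well-formed, digits around one letter, doubled operator.
        String[] inputs = {"p&q&n", "3453457a4234456456", "p&&q"};
        for (String s : inputs) {
            System.out.println(s
                    + "  find=" + found(s)
                    + "  matches=" + wholeMatch(s)
                    + "  lookingAt=" + prefixMatch(s));
        }
    }
}
```

Only matches() rejects both malformed inputs; find() accepts anything that
contains a letter, and lookingAt() accepts anything that starts with one.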
Joshua Cranmer - 14 Aug 2007 00:44 GMT
>>> if (m.find())  {
>> find() returns true if there exists a substring that matches the expression.
[quoted text clipped - 5 lines]
>
> http://download.java.net/jdk7/docs/api/java/util/regex/Matcher.html#matches()

Too much JavaScript for me, then.

Alternatively, using the regex "^[a-z]([&+*-][a-z])*$" with find() would
also work, provided that the string is only one line long.
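
A rough sketch of the anchored variant, in case it helps. The class and method
names (AnchoredFind, isExpression) are hypothetical. The "one line" caveat
shows up in the last two inputs: without Pattern.MULTILINE, "$" matches at the
end of the input and also just before a final line terminator, so a trailing
newline is tolerated while an embedded one is not.

```java
import java.util.regex.Pattern;

public class AnchoredFind {
    // "^" and "$" anchor the match, so find() can no longer succeed on a mere substring.
    static final Pattern P = Pattern.compile("^[a-z]([&+*-][a-z])*$");

    static boolean isExpression(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs chosen for illustration.
        String[] inputs = {"p&q", "p+q*r-s", "3453457a4234456456", "p&q\n", "p&q\nr"};
        for (String s : inputs) {
            // Show newlines visibly so the output stays on one line per input.
            System.out.println("\"" + s.replace("\n", "\\n") + "\" -> " + isExpression(s));
        }
    }
}
```

If a trailing newline must be rejected as well, matches() on the unanchored
pattern is the stricter choice.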

Interestingly enough, from the URL you provided, you seem to be using
JDK 7. What's different from 1.6 (so far)?

Stefan Ram - 14 Aug 2007 00:54 GMT
>Interestingly enough, from the URL you provided, you seem to be using
>JDK 7. What's different from 1.6 (so far)?

http://tech.puredanger.com/java7#roundup
http://tech.puredanger.com/java7
Esmond Pitt - 14 Aug 2007 02:10 GMT
> I want to create a logic class to evaluate simple logical expressions
> and print their truth table. I am using a regular expression that looks
> for a pattern commencing with a char and followed by any number of
> (operator char) groups (for the sake of simplicity, only the AND
> operator "&" is included until I get it working properly).

Hold on. The minute you get to handling "|" as well as "&" you will
discover that this is not a regular-expression problem, it is a parsing
problem. You will need to implement operator precedence, and REs can't
do that.
News - 14 Aug 2007 02:51 GMT
Hi Esmond, Joshua and Stefan,

Thanks for pointing out to me the difference between .find() and .matches().
It's a big step closer, but .matches() returns false unless I replace the
regex with the EXACT string I am searching for, e.g. "[a-z]([&][a-z])*" is
replaced with "p&q" and I search on "p&q". I also tried .lookingAt() but
still don't get a match. I also tried using the escape sequence \\& in the
regex, but it made no difference.

Esmond, I will certainly watch out for precedence issues once I get this
simple case working ! Thanks again. Here is my latest.

import java.util.regex.*;

public class Logic {
    public static void main(String[] args) {
        StringBuffer strb = new StringBuffer();
        for (int i = 0; i < args.length; i++) {
            strb.append(args[i]); // Add the command line arguments to the StringBuffer
        }
        String str = strb.toString(); // Change to a String so Matcher can use it
        String regex = new String("[a-z]([&][a-z])*");
        System.out.println(str); // Test print to ensure the string and regex are correct
        System.out.println(regex);
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(regex);
        if (m.matches()) {
            System.out.println("Matched");
        } else {
            System.out.println("Not Matched");
        }
    }
}

which when run with "p&q&n" produces

p&q&n
[a-z]([&][a-z])*
Not Matched

Any ideas? There's a beer in it!

Wayne

>> I want to create a logic class to evaluate simple logical expressions and
>> print their truth table. I am using a regular expression that looks for a
[quoted text clipped - 6 lines]
> problem. You will need to implement operator precedence, and REs can't do
> that.
Esmond Pitt - 15 Aug 2007 02:56 GMT
> Esmond, I will certainly watch out for precedence issues once I get this
> simple case working !

Why would you bother to get it working when REs can't do it? You need to
build a tokenizer and a parser.
Wayne McDermott - 16 Aug 2007 11:28 GMT
>> Esmond, I will certainly watch out for precedence issues once I get this
>> simple case working !
>
> Why would you bother to get it working when REs can't do it? You need to
> build a tokenizer and a parser.

Howdy Esmond,

The StringTokenizer documentation actually recommends regular expressions be
used instead ! See
http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html

Cheers,

Wayne
Chris Dollin - 16 Aug 2007 12:17 GMT
>>> Esmond, I will certainly watch out for precedence issues once I get this
>>> simple case working !
[quoted text clipped - 7 lines]
> used instead ! See
> http://java.sun.com/j2se/1.4.2/docs/api/java/util/StringTokenizer.html

Not a StringTokenizer; a tokeniser, aka lexer, aka lexical analyser, that
recognises tokens in the language, not just sequences separated by some
character.

If you're going to parse logical expressions, you will very soon go past
the stage where regular expressions can do the job, since you'll want
to tackle operators with different precedences, and brackets. It is
DEAD EASY to write a parser for simple expressions once you have the
tokens.

[You can use REs to recognise the tokens relatively easily.]
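
As a rough illustration of the approach described above (not anyone's actual
code from this thread), here is a minimal sketch: a regex recognises the
tokens, and a small recursive-descent parser handles precedence and brackets.
The grammar, the class name LogicParser, and the choice to evaluate directly
against a map of variable values are all assumptions made for the example.
"&" binds tighter than "|".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogicParser {
    // A token is one lowercase letter, an operator, or a bracket; leading whitespace is skipped.
    private static final Pattern TOKEN = Pattern.compile("\\s*([a-z]|[&|()])");

    private final List<String> tokens = new ArrayList<>();
    private final Map<String, Boolean> env;
    private int pos = 0;

    private LogicParser(String input, Map<String, Boolean> env) {
        this.env = env;
        Matcher m = TOKEN.matcher(input);
        int end = 0;
        // Tokenise: repeatedly match one token at the current position.
        while (end < input.length() && m.region(end, input.length()).lookingAt()) {
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!input.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("Unexpected input at index " + end);
        }
    }

    // Evaluates the expression using the variable values in env.
    static boolean evaluate(String input, Map<String, Boolean> env) {
        LogicParser p = new LogicParser(input, env);
        boolean result = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return result;
    }

    // expr := term ('|' term)*      (lowest precedence)
    private boolean expr() {
        boolean value = term();
        while (accept("|")) {
            boolean rhs = term(); // always parse the right side so its tokens are consumed
            value = value | rhs;
        }
        return value;
    }

    // term := factor ('&' factor)*  (binds tighter than '|')
    private boolean term() {
        boolean value = factor();
        while (accept("&")) {
            boolean rhs = factor();
            value = value & rhs;
        }
        return value;
    }

    // factor := letter | '(' expr ')'
    private boolean factor() {
        if (accept("(")) {
            boolean value = expr();
            if (!accept(")")) {
                throw new IllegalArgumentException("Missing ')'");
            }
            return value;
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String name = tokens.get(pos++);
        Boolean value = env.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unknown or misplaced token: " + name);
        }
        return value;
    }

    // Consumes the next token if it equals the given one.
    private boolean accept(String token) {
        if (pos < tokens.size() && tokens.get(pos).equals(token)) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Boolean> env = new HashMap<>();
        env.put("p", true);
        env.put("q", false);
        env.put("r", false);
        System.out.println("p|q&r   = " + evaluate("p|q&r", env));
        System.out.println("(p|q)&r = " + evaluate("(p|q)&r", env));
    }
}
```

With p true and q, r false, the two expressions in main give different
results, which is exactly the precedence distinction a single flat regex
cannot express. A truth table would call evaluate once per assignment.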

Signature

Chris "wrote one two weeks ago" Dollin

Hewlett-Packard Limited, registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN

Martin Gregorie - 17 Aug 2007 00:32 GMT
>>>> Esmond, I will certainly watch out for precedence issues once I get this
>>>> simple case working !
[quoted text clipped - 15 lines]
> DEAD EASY to write a parser for simple expressions once you have the
> tokens.

It's even easier to use something like Coco/R, which takes a single input
file and generates a Scanner (tokenizer) and a Parser class from it.
Even better, the frameworks for these classes are external text files,
so you can modify them. For instance, I needed a Scanner that would
accept a string to be processed - there was no constructor that would do
that but adding one was simple enough. As you'd hope, the Java version
of Coco/R is written in Java.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Roedy Green - 17 Aug 2007 01:22 GMT
>Why would you bother to get it working when REs can't do it? You need to
>build a tokenizer and a parser.
see http://mindprod.com/jgloss/parser.html
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 14 Aug 2007 02:56 GMT
>import java.util.regex.*;
>public class Logic {
[quoted text clipped - 15 lines]
>System.out.println("Not Matched");  }
> }

I tidied and commented your code. In doing so the primary error jumped
out.

import java.util.regex.*;
public class Logic
  {
  public static void main(String[] args)
     {
     final StringBuffer strb = new StringBuffer();
     for ( int i = 0; i < args.length; i++ )
        {
        // Add the command line arguments to the StringBuffer.
        strb.append(args[i]);
        }
     // Change to a String so Matcher can use it.
     final String str = strb.toString();
     // look for string of the form ---a&a&b&c---
     final String regex = "[a-z]([&][a-z])*";
     // Test print to ensure the string and regex are correct.
     System.out.println("command line:" + str);
     System.out.println("regex:" + regex);
     final Pattern p = Pattern.compile(regex,
                                       Pattern.CASE_INSENSITIVE |
                                       Pattern.UNICODE_CASE);
     // scan command string, not the regex.
     final Matcher m = p.matcher(str);
     if ( m.find() )
        {
        System.out.println("Matched");
        // add some more printout to see what was matched.
        final int gc = m.groupCount();
        // group 0 is the whole pattern matched,
        // so the loop runs from 0 to gc, not 0 to gc-1 as is traditional.
        for ( int i=0; i<=gc; i++ )
           {
           System.out.println( i + " : " + m.group( i ) );
           }
        }
     else
        {
        System.out.println("Not Matched");
        }
     }
  }

Roedy Green - 14 Aug 2007 03:06 GMT
>My regex is [a-b]([&][a-b])*.  I know the regex is correct because I have
>tested it using the regular expression demo from

If by any chance you are trying to find &xxxx; entities, see
http://mindprod.com/products.html#ENTITIES for a canned solution.

News - 14 Aug 2007 04:05 GMT
Hey Roedy,

There is no ASCII symbol for smacking yourself in the forehead and kicking
the cat so all I can say is thanks !!

> >My regex is [a-b]([&][a-b])*.  I know the regex is correct because I have
>>tested it using the regular expression demo from
>
> If by any chance you are trying to find &xxxx; entities, see
> http://mindprod.com/products.html#ENTITIES for a canned solution.
Roedy Green - 14 Aug 2007 09:21 GMT
>There is no ASCII symbol for smacking yourself in the forehead and kicking
>the cat so all I can say is thanks!!
At http://mindprod.com/jgloss/regex.html are some code snippets for
doing the usual things with regexes.  If you start with them, then
modify the code, you will likely avoid errors like the one that threw
you.

Andrew Thompson - 14 Aug 2007 09:27 GMT
...
>There is no ASCII symbol for smacking yourself in the forehead and kicking
>the cat ...

Did the cat write the code?

If not, I suggest it more appropriate, if no less violent,
to kick the ..entity or being that wrote the code.

Signature

Andrew Thompson
http://www.athompson.info/andrew/

Lew - 14 Aug 2007 13:47 GMT
News wrote:
> ...
>> There is no ASCII symbol for smacking yourself in the forehead and kicking
>> the cat ...

> Did the cat write the code?
>
> If not, I suggest it more appropriate, if no less violent,
> to kick the ..entity or being that wrote the code.

Maybe they meant "cat" in the beatnik sense, that is, they are going to kick
the "cat" that wrote it.

Signature

Lew

bsgama@gmail.com - 24 Aug 2007 15:55 GMT
In the line "Matcher m = p.matcher(regex);" you should pass str, not
the regex!
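
A minimal sketch of that fix, for anyone landing here later. The class and
method names (MatcherInput, isWellFormed) are invented for the example; the
pattern and flags are the ones from the original post. It also shows why the
buggy version behaved as it did: the pattern's own source text contains a
letter, so find() on it always succeeds, while matches() on it always fails.

```java
import java.util.regex.Pattern;

public class MatcherInput {
    static final String REGEX = "[a-z]([&][a-z])*";
    static final Pattern P =
            Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Match the pattern against the INPUT string (not against the regex text).
    static boolean isWellFormed(String input) {
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Join the command line arguments, as the original program did.
        StringBuilder sb = new StringBuilder();
        for (String arg : args) {
            sb.append(arg);
        }
        String str = sb.toString();

        System.out.println("input:             " + str);
        System.out.println("matches(input):    " + isWellFormed(str));

        // What the buggy code was really testing: the regex text itself.
        System.out.println("find() on regex:    " + P.matcher(REGEX).find());
        System.out.println("matches() on regex: " + P.matcher(REGEX).matches());
    }
}
```

Run with an argument such as "p&q&n" (quoted, so the shell does not interpret
the ampersands) to check a well-formed expression.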

> Hello all,
>
[quoted text clipped - 33 lines]
>
> Any ideas ? Thanks in advance !

