People,
I need a parser for a simple expression
it must support keywords like AND, NOT and OR and "(" and ")" constructions.
I have been looking into ANTLR and JavaCC but am really curious if
anyone can point me to an easier method.
thx,
Ivo.
Roedy Green - 14 Sep 2007 12:16 GMT
>I have been looking into ANTLR and JavaCC but am really curious if
>anyone can point me to an easier method.
JavaCC is much easier than it looks. Search for an example similar to
what you need, and tinker.
http://mindprod.com/jgloss/javacc.html

Signature
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Stefan Ram - 14 Sep 2007 14:43 GMT
>I need a parser for a simple expression
(If answering to the following post, one should please not
quote all of it, but only the short parapgrahs one directly
refers to.)
In order to interpret or translate an expression (term), it is
decomposed into lexical units (tokens, words), which then are
used by a parser to build symbols and a structured
representation of the input. This representation then might be
evaluated or translated into some other representation.
The syntactial structuring resembles the rules for the
construction of an expression, which often is given by so-
called "productions" of the EBNF (extended Backus-Nauer-Form)
and which sometimes are left-recursive.
When writing a parser, the left-recursive productions sometimes
are a worry to the author, because it is not obvious how to
avoid an infinite recursion. The solution is to rewrite them as
right-recursive productions.
The addition with a binary infix Operator, for example, is
left associative. However, it is simpler to analyze in a
right-associative manner. Therefore, one analyzes the source
using right-associative rules and then creates a result
using a left-associative interpretation.
A left-associative grammar might be, for example, as follows.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral> | <expression> '+' <numeral>.
start symbol: <expression>.
To analyze this using a recursive descent parser one
prefers to use the following grammar.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>[ '+' <expression> ].
start symbol: <expression>.
This can be written using iteration as follows.
<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>{ '+' <numeral> }.
start symbol: <expression>.
However, the product is created in the sense of the
first grammar. Example code follows.
class Scan
{ static String source = "5+4+2)";
static int pos = 0;
static char get(){ return source.charAt( pos++ ); }}
class Parse
{
static int numeral(){ return Scan.get() - '0'; }
static int expression(){ int result = numeral();
while( '+' == Scan.get() )result += numeral();
return result; }}
public class Start
{
public static void main( final String[] args )
{ System.out.println( Parse.expression() ); }}
To be able to parse expressions with higher
priority, the grammar can be extended.
<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral> | <product> '*' <numeral>.
<sum> ::= <product> | <sum> '+' <product>.
start symbol: <sum>.
In iterative notation:
<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral>{ '*' <numeral> }.
<sum> ::= <product>{ '+' <product> }.
start symbol: <sum>.
In Java:
class Scan
{ static String source = "5+4*2)";
static int pos = 0;
static char get( final boolean advance )
{ return source.charAt( advance ? pos++ : pos ); }}
class Parse
{
static int numeral(){ return Scan.get( true ) - '0'; }
static int product(){ int result = numeral();
while( '*' == Scan.get( false )){ Scan.get( true ); result *= numeral(); }
return result; }
static int sum(){ int result = product();
while( '+' == Scan.get( true ))result += product();
return result; }}
public class Start
{
public static void main( final String[] args )
{ System.out.println( Parse.sum() ); }}
Exercises
- What is the output of the above programs?
- Extend the last grammar and the last program so as
to handle subtraction.
- Extend the result of the last exercise in order
to handle division.
- Extend the result of the last exercise so that also
numbers with multiple digits are accepted.
- Extend the result of the last exercise so that also
terms in parentheses are accepted. The input "(2+4)*5)"
should give the result "30".
- Extend the result of the last exercise so that
also a unary minus "-" is recognized.
- Extend the result of the last exercise so that
more operators and functions are recognized.
- Extend the result of the last exercise so that
meaningful error messages are created for all
inputs that to not fulfill the rules of the input
language.
- Extend the result of the last exercise so that the
error messages also show the location where the error
was detected. It should be possible to enter an expression
that spans multiple lines, and an error message should
contain the number of the line where the error was
detected.
See also:
JEP - Java Mathematical Expression Parser
http://www.singularsys.com/jep/
Steven Metsker: Building Parsers with Java.
Addison-Wesley 2001, ISBN 0201719622.
A.W. Appel: Modern Compiler Implementation in Java.
Cambridge University Press 1998, ISBN 0521586542.
derek - 14 Sep 2007 15:01 GMT
have you thought of embedding a scripting language into your application? you would get a complete language instead of just some simple expression support. embedding a language should only take a few lines.
Ivo - 14 Sep 2007 15:14 GMT
> have you thought of embedding a scripting language into your application? you would get a complete language instead of just some simple expression support. embedding a language should only take a few lines.
The company I work for will not allow it.
DSL = allowed as long as the end result is one of the supported
languages (Java or COBOL) haha.
Ivo.
Ivo - 14 Sep 2007 15:03 GMT
> People,
>
[quoted text clipped - 8 lines]
> thx,
> Ivo.
OK here my ANTLR gramar:
===
grammar AndOrNotVoter;
@header {
package test;
}
@lexer::header {package test;}
@members {
/** Map variable name to Integer object holding value */
}
orexpression
: andexpression ('or' andexpression)*
;
andexpression
: notexpression ('and' notexpression)*
;
notexpression
: ('not')? atom
;
atom
: role
| LPAREN orexpression RPAREN
;
role
:'ROLE_LEVEL1'
:'ROLE_LEVEL2'
:'ROLE_LEVEL3'
:'ROLE_LEVEL4'
|'ROLE_EDITOR'
|'ROLE_OWNER'
|'ROLE_ADMIN'
;
LPAREN : '('
;
RPAREN : ')'
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
)+
{ $channel=HIDDEN; }
;
===
I have to evaluate a couple of roles a person can have or may not have
in order to get access to certain data.
So a person that has logged on gets a couple of roles. These roles are
maintained in a class called GrantedAuthorities .
When a certain page is requested I have to check of the granted
Authorities to allow this person access to all or some of the data on
the page.
So there is a custom tag that accepts an expression like;
===example===
(ROLE_OWNER and ROLE_LEVEL4) or ROLE_ADMIN
===/example===
This expression needs to be parsed and compared to the
GrantedAuthorities I get from the logon procedure. I have no idea how to
go on from here... Anyone? I can sure use the help.
It is not like evaluating a mathematical expression where all the data
needed is provided in the expression. In order to give meaning to this
expression I need data from GrantedAuthorities.
How do I go about doing this?
Thanks for your patience
Ivo.
Martin Gregorie - 14 Sep 2007 19:52 GMT
> People,
>
[quoted text clipped - 5 lines]
> I have been looking into ANTLR and JavaCC but am really curious if
> anyone can point me to an easier method.
Have you looked at Coco/R?
http://zoogz/~reference/reference/comp_lang.html

Signature
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
Ivo - 14 Sep 2007 20:56 GMT
>> People,
>>
[quoted text clipped - 9 lines]
>
> http://zoogz/~reference/reference/comp_lang.html
No I haven't. It seems that ANTLR has more support (BTW your link
doesn't seem to work?)
Martin Gregorie - 14 Sep 2007 21:51 GMT
>>> People,
>>>
[quoted text clipped - 12 lines]
> No I haven't. It seems that ANTLR has more support (BTW your link
> doesn't seem to work?)
Here's a correction. The other was a mis-paste that I failed to catch:
http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/
CoCo/R has flavors for several languages and, when I first used it a
couple of years ago, I don't recall any searches finding ANTLR. In
summary, a single .ATR input file generates a Scanner and a Parser class
that includes custom code from the .ATR file. So much, so standard. An
advantage over some other code generators is that the framework files
for the two classes can be modified fairly easily, e.g. to generate code
to parse a String rather than reading its input from a file (I needed
this ability).
If you're C literate and have used lex and yacc then Coco/R is a
complete doddle to use.

Signature
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
Ivo - 14 Sep 2007 23:02 GMT
>>>> People,
>>>>
[quoted text clipped - 28 lines]
> If you're C literate and have used lex and yacc then Coco/R is a
> complete doddle to use.
thanks for the advice.
I have use Lex and Yacc a lot.
I am still going to use ANTLR I think because it has the same kind of
lex and yacc like method of working and it now has a cool IDE for testing.
My problem is still not solved though. My grammar works but now I have
to do something with it. My boolean expressions are dependent on
information that has to be provided before parsing is started. Still
looking into that one.
Help is appreciated
cheerz,
Ivo.
Roedy Green - 15 Sep 2007 15:15 GMT
On Fri, 14 Sep 2007 21:51:46 +0100, Martin Gregorie
<martin@see.sig.for.address> wrote, quoted or indirectly quoted
someone who said :
>If you're C literate and have used lex and yacc then Coco/R is a
>complete doddle to use.
That means "easy"?

Signature
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Martin Gregorie - 15 Sep 2007 23:31 GMT
> On Fri, 14 Sep 2007 21:51:46 +0100, Martin Gregorie
> <martin@see.sig.for.address> wrote, quoted or indirectly quoted
[quoted text clipped - 4 lines]
>
> That means "easy"?
Yep.

Signature
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
Arne Vajhøj - 15 Sep 2007 01:08 GMT
> I need a parser for a simple expression
>
[quoted text clipped - 3 lines]
> I have been looking into ANTLR and JavaCC but am really curious if
> anyone can point me to an easier method.
If you need a parser for a specific grammar then use the other
advise you have received.
If you just need some form of expression evaluation then
look at the script language integration in Java 1.6 and
JavaScript's nice eval function.
Arne
Piotr Kobzda - 15 Sep 2007 03:55 GMT
> I need a parser for a simple expression
>
[quoted text clipped - 3 lines]
> I have been looking into ANTLR and JavaCC but am really curious if
> anyone can point me to an easier method.
If strict validation of your expressions is not required (what I think
is your situation), then the above seems not to be very difficult to
implement without any third-party tools. See the code below (not tested
intensively!).
Regardless of what expression parsing/compiling/processing technique
you'll choose, there is also presented the way I think you could use it
in your code (in this scenario your granted Authorities must simply
implement RoleSet interface).
HTH,
piotr
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
public class Roles {
public interface RoleSet {
boolean contains(String role);
}
public interface CompiledExpression {
boolean matches(RoleSet roles);
}
public static CompiledExpression compile(String expression) {
return new Compiler(new Tokenizer(expression)).compile();
}
static class Tokenizer {
StringTokenizer st;
Tokenizer(String expression) {
st = new StringTokenizer(expression, " ()", true);
}
String nextToken() {
while (st.hasMoreTokens()) {
String token = st.nextToken().trim();
if (token.length() > 0) {
return token;
}
}
return null;
}
}
static class Compiler {
Tokenizer tokenizer;
public Compiler(Tokenizer tokenizer) {
this.tokenizer = tokenizer;
}
public CompiledExpression compile() {
return compile(compile(null));
}
CompiledExpression compile(final CompiledExpression left) {
final String token = tokenizer.nextToken();
if(token == null) {
return left;
}
abstract class NextExpression implements CompiledExpression {
CompiledExpression right;
NextExpression(CompiledExpression right) {
this.right = right;
}
}
if(token.equalsIgnoreCase("not")) {
return compile(new NextExpression(compile(null)) {
public boolean matches(RoleSet roles) {
return ! right.matches(roles);
}
});
}
if(token.equalsIgnoreCase("or")) {
return new NextExpression(compile(compile(null))) {
public boolean matches(RoleSet roles) {
return left.matches(roles) || right.matches(roles);
}
};
}
if(token.equalsIgnoreCase("and")) {
return compile(new NextExpression(compile(null)) {
public boolean matches(RoleSet roles) {
return left.matches(roles) && right.matches(roles);
}
});
}
if(token.equals("(")) {
return compile(new NextExpression(compile(null)) {
public boolean matches(RoleSet roles) {
return right.matches(roles);
}
});
}
if(token.equals(")")) {
return left;
}
return new CompiledExpression() {
public boolean matches(RoleSet roles) {
return roles.contains(token);
}
};
}
}
public static void main(String[] args) throws Exception {
final String expression = "a or b and not c";
final Roles.CompiledExpression ce = Roles.compile(expression);
System.out.println("expression: " + expression);
new RoleSet() {
Set<String> roles = new HashSet<String>();
public boolean contains(String role) {
boolean enabled = roles.contains(role);
System.out.println(
"role " + role + " " + (enabled ? "set" : "not set"));
return enabled;
}
void check() {
System.out.println(roles + " " +
(ce.matches(this) ? "matches" : "don't match"));
}
{ // test...
check();
roles.add("b");
check();
roles.add("c");
check();
roles.add("a");
check();
}
};
}
}
Ivo - 15 Sep 2007 14:20 GMT
>> I need a parser for a simple expression
>>
[quoted text clipped - 157 lines]
> }
> }
Piotr,
You Rock!
Thanks. This was what I am looking for.
I need to refine it a bit but It seems like you made my day.
I have an implementation in ANTLR almost finished and I will probably
finish it just for the practice and the experience but I will probably
use a variant of your version. It eliminates the need of 3rd party stuff
and because this role checking will be done a lot (really a lot) I like
this "lightweight" solution very much.
I had looked into StringTokenizer a bit but it seems not deep enough.
Thanks again.
Ivo.