Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / January 2006

Tip: Looking for answers? Try searching our database.

Boolean query parsing.... what tools, examples, suggestionss

Thread view: 
puzzlecracker - 15 Jan 2006 01:47 GMT
am implementing a fairly simple and  straightforward  text-search (I
display each line that contains required pattern ) that supports
Boolean queries in the following format:

str1 AND str2 NOT str3 -  where not is a unary operation  thus the
following would be equivalent to  str1 AND str ANDNOT str3 by default
unless a user specifies otherwise....

Additionally,  It is left-associative;

I would also like to have parenthesis as well: (str1 AND (str2 OR
str3))

That is it....

Should I write a parser for that followed by walking the AST (abstract
syntax tree) - javacc or antlar? -  or would java regular expressions
suffice? it should  be fast, whereas java regex is known to be slow....

Any suggestions, examples, references would be highly appreciated.

Thx
Luc The Perverse - 15 Jan 2006 02:37 GMT
> str1 AND str2 NOT str3 -  where not is a unary operation  thus the
> following would be equivalent to  str1 AND str ANDNOT str3 by default
> unless a user specifies otherwise....

Either not is a binary operator that means "and not" or it is a unary
operator and the operation you listed is not defined, but rather should be
str1 AND str2 AND NOT str3

--
LTP

:)
puzzlecracker - 15 Jan 2006 04:04 GMT
> > str1 AND str2 NOT str3 -  where not is a unary operation  thus the
> > following would be equivalent to  str1 AND str ANDNOT str3 by default
[quoted text clipped - 8 lines]
>
> :)

Thx.   any  implementable suggestions?
Luc The Perverse - 15 Jan 2006 06:20 GMT
> Thx.   any  implementable suggestions?

Yes.

Parse the string and break it up into operators and test conditions.    Make
the class which holds the condition have a negatable clause, and remove nots
while parsing by just including them in the next condition instance.

In this way you should have

condition operator condition operator . . . operator.

This can be easily tested for.

I'm not entirely convinced that recursion is the best way to handle
parenthesis as I ran into trouble with it trying to make a calculator
program.  I devised a very simple (although perhaps inefficient) method in
which I would search for the first closing parenthesis and then search
backwards till I found an open parenthesis.  This is the first innermost
parenthesis, once it is computed it can be saved as a single operand
(value), or you could allow your operands themselves to be a tree structure,
depending on how much you want to analyze and use them.  Once you have done
this, then just start over.

It depends - do you need a way to store the result, or just need to solve
it.  Recursion is very easy, and I don't remember anymore what trouble I was
having.   So if all you want to do is get the result, making a function
which knows if it is inside a parenthesis is useful.   Make it return
boolean and throw ParseException, and accept the equation from the current
position and tell it if it is inside a parenthesis (so it will know that
hitting an ending parenthesis is not a parse error, and that it needs to
exit)  If it hits an opening parenthesis, then it will recursively call
itself.   The only trick will be remembering where in the string you are.
You could find a way to have the recursive call pass back how many
chatacters were advanced during the operation.   The calling parent
recursive function should "own" the child, so the child shouldn't be making
any changes to the parent's data - as such it would be a violation of
accepted OOP to allow the child to advance a string counter owned or shared
by the parent, even if you know it will work.   This is of course my
opinion.

--
LTP

:)
puzzlecracker - 18 Jan 2006 03:52 GMT
> > Thx.   any  implementable suggestions?
>
[quoted text clipped - 7 lines]
>
> condition operator condition operator . . . operator.
I am not sure what you mean by operator...please explain

> This can be easily tested for.
>
[quoted text clipped - 7 lines]
> depending on how much you want to analyze and use them.  Once you have done
> this, then just start over.

How do you create a tree... I mean what data structure should be used
,advisable?

> It depends - do you need a way to store the result, or just need to solve
> it.  Recursion is very easy, and I don't remember anymore what trouble I was
[quoted text clipped - 3 lines]
> position and tell it if it is inside a parenthesis (so it will know that
> hitting an ending parenthesis is not a parse error, and that it needs to

what are the recursive solutions available?

thx
> exit)  If it hits an opening parenthesis, then it will recursively call
> itself.   The only trick will be remembering where in the string you are.
[quoted text clipped - 10 lines]
>
> :)
Luc The Perverse - 18 Jan 2006 04:02 GMT
>> condition operator condition operator . . . operator.
> I am not sure what you mean by operator...please explain

An operator is like AND, OR, XOR, NAND

*snip*
> How do you create a tree... I mean what data structure should be used
> ,advisable?
*snip*
> what are the recursive solutions available?
>
> thx

Maybe we should start from the beginning? What experience do you have
programming?

Writing any kind of a parser should not be an introductory learning
assignment - unless that is truly what interests you.

Signature

LTP

:)
puzzlecracker - 18 Jan 2006 04:36 GMT
> >> condition operator condition operator . . . operator.
> > I am not sure what you mean by operator...please explain
[quoted text clipped - 19 lines]
>
> :)

:) working professional with 5 years of software dev....  and 0 in
compiler creation!
puzzlecracker - 18 Jan 2006 04:37 GMT
> >> condition operator condition operator . . . operator.
> > I am not sure what you mean by operator...please explain
[quoted text clipped - 17 lines]
> --
> LTP

then what is the difference between operator and condition in your
context?

thanks
Luc The Perverse - 19 Jan 2006 01:01 GMT
>> >> condition operator condition operator . . . operator.
>> > I am not sure what you mean by operator...please explain
[quoted text clipped - 22 lines]
>
> thanks

An operator is any "+ - ! & && AND XOR " etc  in your example it is AND OR
etc.

A condition is something which is true or false.   Such as "true" "1==X"
"b!=c"  etc.

--
LTP

:)
puzzlecracker - 19 Jan 2006 02:11 GMT
---In this way you should have

--condition operator condition operator . . . operator.

I am not sure how this is possible then..... doesn't make much
sense...pls explain. thx
Luc The Perverse - 19 Jan 2006 02:53 GMT
> ---In this way you should have
>
> --condition operator condition operator . . . operator.
>
> I am not sure how this is possible then..... doesn't make much
> sense...pls explain. thx

Please quote what you are replying to.

You should never end in an operator.

Maybe you should try to code it, and then when you run into a specific
problem you can ask us.

Make a base class (can be empty) called Operand

Derive classes Operator and Condition.  Don't include NOT in your first
draft, also don't include parenthesis.  Don't attempt to test actual
conditions, just use TRUE and FALSE.

First do a sanity check, to make sure you start with a condition and you
alternate condition and operator.  Make sure you end with a condition.

In first draft, don't try to parse complex expressions - just one character
for each operand.

T&F|T

Your set will consist of T for true, F for fale & for AND and | for or.

Next you can add whatever you feel like next.   Make the &'s into ANDs, make
it space tolerant.   Add Parenthesis.

Seperately develop your condition tests.  I'm not sure what your conditions
are, but if you have variables 1 through 10 stored in variable names $1 $2
$3 . . .  $10  You could make it something like this

$1 > $3    Your legal variables are $[1 . . 10] and your legal operators for
the expressions are = (or ==) > < <= >=  <> (or !=)

When I haved developped very simple scripting languages (the closest thing I
can relate to through experience) I find it useful to use special characters
to identify literals, variables.  For instance a dollar sign for any kind of
variable, a pound sign for any kind of number.   Also I like to make all my
operators single characters, but this is certainly not a requisite.  But it
makes it SO much easier to parse!   Example

$1<#1.0|$1=#1.0

If you want to keep <= as an operator define a special character for it, and
then just search and replace in string for <= before starting.

The point of this is to be able to immediately identify from a single
character what the current operand is.

--
LTP


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.