Regular Expression Analyzer

radimpe@gmail.com - 17 Oct 2006 23:22 GMT
Hi,

Hope someone can help me with a pet project. I have a huge set of
search terms based on historic searches. I'm working on a search engine
that can do various different types of searches. What I am trying to do
is to see if the historic search terms can determine (through a regular
expression match) which one of the search methods to use.

For example, search type one is an exact match search. I would like to
do an exact match search if a user types in an ISBN book number (e.g.
978-0-306-40615-7). If a user types in a book title "Moby Dick" I would
like to do a wildcard search etc.

Can anybody help me write a Java application that will process the
historic terms to find out if there is a reliable regex for each search
type? Even just a pointer in the right direction would be useful.

Rad.
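
The routing idea in this post can be sketched as below. This is only an illustration, not code from the thread: the class name SearchRouter, the SearchType enum, and the ISBN-13 pattern are all assumptions. The pattern checks only the hyphenated shape of the example above; it does not validate the check digit or the total digit count.

```java
import java.util.regex.Pattern;

/** Hypothetical router: picks a search type from the shape of the query. */
final class SearchRouter {

    enum SearchType { EXACT, WILDCARD }

    // Assumed shape of a hyphenated ISBN-13 such as 978-0-306-40615-7:
    // prefix 978 or 979, three variable-length digit groups, one check digit.
    private static final Pattern ISBN13 =
            Pattern.compile("97[89]-\\d{1,5}-\\d{1,7}-\\d{1,6}-\\d");

    /** Returns EXACT for ISBN-shaped queries and WILDCARD for everything else. */
    static SearchType route(String query) {
        String q = query.trim();
        if (ISBN13.matcher(q).matches()) {
            return SearchType.EXACT;
        }
        return SearchType.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println(route("978-0-306-40615-7")); // ISBN-shaped query
        System.out.println(route("Moby Dick"));         // free-text title
    }
}
```

Further search types would be added as extra patterns checked in order, with the wildcard search as the fallback.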
hiwa - 17 Oct 2006 23:45 GMT
> Hi,
>
[quoted text clipped - 14 lines]
>
> Rad.
Be more specific and I could give advice.
TechBookReport - 18 Oct 2006 11:04 GMT
> Hi,
>
[quoted text clipped - 14 lines]
>
> Rad.

Here's a quick pointer: http://javaalmanac.com/egs/java.util.regex/pkg.html

Signature

TechBookReport Java http://www.techbookreport.com/JavaIndex.html

radimpe@gmail.com - 18 Oct 2006 23:35 GMT
So based on the javaalmanac (great starting point, thanks) examples
I've written the following 'test'

package com.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parsing {

    public static void main(String[] args) {
        // Find every token shaped like: 2 digits, 1-2 non-digits, 4-5 digits
        String inputStr = "11N2222 22NB3333";
        String patternStr = "\\d{2}\\D{1,2}\\d{4,5}";
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(inputStr);

        // group() throws IllegalStateException unless the preceding
        // find() returned true, so only call it inside the loop
        while (matcher.find()) {
            String match = matcher.group();
            System.out.println(match);
        }
    }

}

This does match the patterns I know about. My ultimate aim is to
develop a parser that will, given a set of input, determine what the
"patternStr" should be. Take, for example, the manufacturer part number
on an electronic distributor's website. It should be possible to work
out a number of patterns that cover 90% of the makeup of the
manufacturer part numbers. (I don't expect there to be just one.)

Hope this makes much more sense this time around.

Rad.
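
One possible starting point for the goal described in this post, offered as a rough sketch rather than a known solution: reduce each term to a "shape" regex built from runs of digits, letters, and other characters, then count how often each shape occurs and keep the most frequent shapes until they cover the desired share of terms (for example 90%). The class ShapeTally, its methods, and the three character classes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: tallies the run-length "shape" of each term. */
final class ShapeTally {

    /** Turns a term into a regex for its runs, e.g. "11N2222" -> \d{2}[A-Za-z]{1}\d{4}. */
    static String shapeOf(String term) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < term.length()) {
            String cls = classOf(term.charAt(i));
            int j = i;
            while (j < term.length() && classOf(term.charAt(j)).equals(cls)) {
                j++;
            }
            sb.append(cls).append('{').append(j - i).append('}');
            i = j;
        }
        return sb.toString();
    }

    // Three assumed classes: ASCII digit, ASCII letter, anything else.
    private static String classOf(char c) {
        if (c >= '0' && c <= '9') return "\\d";
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return "[A-Za-z]";
        return "[^A-Za-z0-9]";
    }

    /** Counts how many terms share each shape, in first-seen order. */
    static Map<String, Integer> tally(List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : terms) {
            counts.merge(shapeOf(t), 1, Integer::sum);
        }
        return counts;
    }

    /** Most frequent shapes first, stopping once they cover the given share of terms. */
    static List<String> shapesCovering(List<String> terms, double share) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(tally(terms).entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue()); // stable sort, ties keep order
        List<String> chosen = new ArrayList<>();
        int covered = 0;
        for (Map.Entry<String, Integer> e : entries) {
            if (covered >= share * terms.size()) break;
            chosen.add(e.getKey());
            covered += e.getValue();
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("11N2222", "22N3333", "33N4444", "44NB5555");
        System.out.println(tally(terms));
        System.out.println(shapesCovering(terms, 0.75));
    }
}
```

Exact run lengths make the shapes very specific, so real data would probably need neighbouring shapes merged into ranges such as {1,2} before the list becomes short enough to be useful.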
Robert Klemme - 19 Oct 2006 09:15 GMT
> So based on the javaalmanac (great starting point, thanks) examples
> I've written the following 'test'

<snip/>

> Which does match the patterns I know about. What my ultimate aim is, is
> to develop a parser that will, given a set of input, determine what the
[quoted text clipped - 4 lines]
>
> Hope this makes much more sense this time around

Yes, I think so.  Thanks for the clarification!  I will try to rephrase
in my own words just to make sure I understood you correctly.  So you
are basically searching for an algorithm that, given a set of inputs
(historic searches), partitions that set into like items (i.e. searches
that share some common pattern), and you want to derive this common
pattern from each set.  Then you want to use that to find out which of
these sets of like searches a new search entered by the user belongs to,
and use that knowledge to optimize the current search.

I think, depending on the inputs (historic searches) this could be a
quite difficult task and I am sorry to say that I do not have a simple
answer.  Maybe you should look into text retrieval systems.  They might
contain things like that.

Kind regards

    robert
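
The partition-then-derive procedure restated above could look roughly like the following for simple cases. This is a naive sketch under strong assumptions, not an answer to the general problem: terms are grouped by their sequence of character classes (a "skeleton" such as DLD for digits, letters, digits), each group gets one regex with {min,max} run lengths, and a new query is assigned to the first group whose regex matches. The class PatternGroups and everything in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch: partition terms by skeleton, derive one regex per partition. */
final class PatternGroups {

    // Assumed classes: D = ASCII digit, L = ASCII letter, O = anything else.
    private static char code(char c) {
        if (c >= '0' && c <= '9') return 'D';
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return 'L';
        return 'O';
    }

    private static String regexClass(char code) {
        switch (code) {
            case 'D': return "\\d";
            case 'L': return "[A-Za-z]";
            default:  return "[^A-Za-z0-9]";
        }
    }

    /** Maps each skeleton (e.g. "DLD") to a regex with min/max run lengths. */
    static Map<String, Pattern> derive(List<String> terms) {
        Map<String, int[][]> bounds = new LinkedHashMap<>(); // skeleton -> {min[], max[]}
        for (String term : terms) {
            StringBuilder skeleton = new StringBuilder();
            List<Integer> lengths = new ArrayList<>();
            for (int i = 0; i < term.length(); ) {
                char c = code(term.charAt(i));
                int j = i;
                while (j < term.length() && code(term.charAt(j)) == c) j++;
                skeleton.append(c);
                lengths.add(j - i);
                i = j;
            }
            int n = lengths.size();
            int[][] b = bounds.computeIfAbsent(skeleton.toString(), k -> {
                int[][] fresh = new int[2][n];
                Arrays.fill(fresh[0], Integer.MAX_VALUE);
                return fresh;
            });
            for (int r = 0; r < n; r++) {
                b[0][r] = Math.min(b[0][r], lengths.get(r));
                b[1][r] = Math.max(b[1][r], lengths.get(r));
            }
        }
        Map<String, Pattern> result = new LinkedHashMap<>();
        for (Map.Entry<String, int[][]> e : bounds.entrySet()) {
            String sk = e.getKey();
            StringBuilder re = new StringBuilder();
            for (int r = 0; r < sk.length(); r++) {
                re.append(regexClass(sk.charAt(r))).append('{')
                  .append(e.getValue()[0][r]).append(',')
                  .append(e.getValue()[1][r]).append('}');
            }
            result.put(sk, Pattern.compile(re.toString()));
        }
        return result;
    }

    /** Returns the skeleton of the first group whose regex matches the whole query. */
    static Optional<String> classify(Map<String, Pattern> groups, String query) {
        for (Map.Entry<String, Pattern> e : groups.entrySet()) {
            if (e.getValue().matcher(query).matches()) return Optional.of(e.getKey());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Pattern> groups =
                derive(Arrays.asList("11N2222", "22NB3333", "33XY44444", "978-0-306-40615-7"));
        groups.forEach((skeleton, pattern) -> System.out.println(skeleton + " -> " + pattern));
        System.out.println(classify(groups, "44AB55555"));
    }
}
```

Grouping by skeleton is the weak point: free-text searches such as book titles produce a different skeleton for almost every term, which is one reason the general task is as hard as described above.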
radimpe@gmail.com - 19 Oct 2006 19:30 GMT
> > So based on the javaalmanac (great starting point, thanks) examples
> > I've written the following 'test'
[quoted text clipped - 27 lines]
>
>     robert

I thought it might be the case... Pet projects always sound so simple
when you start them...
Robert Klemme - 18 Oct 2006 20:39 GMT
> Hope someone can help me with a pet project. I have a huge set of
> search terms based on historic searches. I'm working on a search engine
[quoted text clipped - 6 lines]
> 978-0-306-40615-7). If a user types in a book title "Moby Dick" I would
> like to do a wildcard search etc.

How is this connected to the historic searches?  First you say you want
to use the RE against historic searches but now it seems you want to use
it with current searches.

> Can anybody help me write a Java Application that will process the
> historic terms to find out if there is a reliable RegEx for each search
> type. Even just a pointer in the right direction would be useful.

http://www.amazon.com/l/dp/0596528124/

    robert
radimpe@gmail.com - 18 Oct 2006 22:31 GMT
> How is this connected to the historic searches?  First you say you want
> to use the RE against historic searches but now it seems you want to use
> it with current searches.

Apologies. It is linked inasmuch as I could use them to form a
'pattern' of how people are using the search. I essentially have two
sets of data. Firstly there is the base data that I want people to
search on, and secondly there are the historic search terms showing how
people have tried to search. My intention is to be able to form a good
enough pattern from both sets that I can 'direct' the historic search
terms to interrogate the data using the right 'type' of search.


Thanks for the links so far. I've got some reading to catch up with.
IchBin - 19 Oct 2006 09:08 GMT
> Hi,
>
[quoted text clipped - 14 lines]
>
> Rad.

Eclipse has a plugin that does regular expression analysis. I installed
it because I am coding PHP now and that language seems to use regex much
more heavily than Java. Anyway, the plugin is called "QuickREx".

Eclipse update url:
http://www.bastian-bergerhoff.com/eclipse/features.

Eclipse Plugins Site:
http://eclipse-plugins.2y.net/eclipse/plugin_details.jsp?id=964

Can be found here:     
http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/toc.html

Signature

Thanks in Advance...                      http://ichbin.9999mb.com
IchBin, Pocono Lake, Pa, USA              http://weconsultants.phpnet.us
__________________________________________________________________________

'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor,  Regular Guy (1952-)

TechBookReport - 19 Oct 2006 10:46 GMT
>> Hi,
>>
[quoted text clipped - 27 lines]
> Can be found here:      
> http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/toc.html

Thanks for the pointer to QuickREx. Looks very useful indeed.

Pan

Signature

TechBookReport Java http://www.techbookreport.com/JavaIndex.html

radimpe@gmail.com - 19 Oct 2006 19:31 GMT
Thanks. I'll have a look. It may just come in handy.

> > Hi,
> >
[quoted text clipped - 35 lines]
> 'If there is one, Knowledge is the "Fountain of Youth"'
> -William E. Taylor,  Regular Guy (1952-)

