Hi,
Hope someone can help me with a pet project. I have a huge set of
search terms based on historic searches. I'm working on a search engine
that can do various different types of searches. What I am trying to do
is to see if the historic search terms can determine (through a regular
expression match) which one of the search methods to use.
For example, search type one is an exact-match search. I would like to
do an exact-match search if a user types in an ISBN (e.g.
978-0-306-40615-7). If a user types in a book title such as "Moby Dick",
I would like to do a wildcard search, and so on.
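A minimal sketch of that routing idea, assuming only two search types and a hyphenated ISBN-13 like the one above. The class name QueryRouter, the SearchType enum, and the ISBN pattern itself are illustrative assumptions, not part of any existing search engine. The pattern checks the shape only, not the ISBN check digit.

```java
import java.util.regex.Pattern;

class QueryRouter {
    // Hypothetical search types; a real engine would likely have more.
    enum SearchType { EXACT, WILDCARD }

    // Assumed shape of a hyphenated ISBN-13: prefix 978 or 979, then three
    // digit groups and a single check digit, separated by hyphens.
    private static final Pattern ISBN13 =
            Pattern.compile("97[89]-\\d{1,5}-\\d{1,7}-\\d{1,7}-\\d");

    // Returns EXACT when the whole query looks like an ISBN-13, else WILDCARD.
    static SearchType route(String query) {
        String q = query.trim();
        boolean isbnShape = ISBN13.matcher(q).matches()
                && q.replace("-", "").length() == 13; // 13 digits in total
        return isbnShape ? SearchType.EXACT : SearchType.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println(route("978-0-306-40615-7")); // EXACT
        System.out.println(route("Moby Dick"));         // WILDCARD
    }
}
```

Anything that does not match a known shape falls through to the wildcard search, so adding a new search type would mean adding one more pattern ahead of the fallback.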
Can anybody help me write a Java application that will process the
historic terms to find out whether there is a reliable regex for each
search type? Even just a pointer in the right direction would be useful.
Rad.
hiwa - 17 Oct 2006 23:45 GMT
> Hi,
>
[quoted text clipped - 14 lines]
>
> Rad.
Be more specific and I could give advice.
TechBookReport - 18 Oct 2006 11:04 GMT
> Hi,
>
[quoted text clipped - 14 lines]
>
> Rad.
Here's a quick pointer: http://javaalmanac.com/egs/java.util.regex/pkg.html

Signature
TechBookReport Java http://www.techbookreport.com/JavaIndex.html
radimpe@gmail.com - 18 Oct 2006 23:35 GMT
So based on the javaalmanac (great starting point, thanks) examples
I've written the following 'test'
package com.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parsing {
    public static void main(String[] args) {
        // Two sample part numbers separated by a space
        String inputStr = "11N2222 22NB3333";
        // Two digits, one or two non-digits, then four or five digits
        String patternStr = "\\d{2}\\D{1,2}\\d{4,5}";
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(inputStr);
        // Only call group() after find() has returned true;
        // otherwise it throws IllegalStateException
        while (matcher.find()) {
            System.out.println(matcher.group()); // 11N2222, then 22NB3333
        }
    }
}
This does match the patterns I know about. My ultimate aim is to
develop a parser that will, given a set of input, determine what the
"patternStr" should be. One example is the manufacturer part number on
an electronic distributor's website. It should be possible to work out
a small number of patterns that together cover 90% of manufacturer
part numbers. (I don't expect there to be just one.)
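One possible starting point for deriving "patternStr" from sample input is to reduce each term to runs of character classes and merge terms that share the same run sequence. The sketch below is only that, a sketch: the class name ShapeInference, the three character classes, and the sample terms are all assumptions made for illustration, and real part numbers may need finer-grained classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShapeInference {
    // One inferred pattern: a sequence of class runs with min/max run lengths.
    static final class Shape {
        final List<String> classes = new ArrayList<>();
        final List<Integer> min = new ArrayList<>();
        final List<Integer> max = new ArrayList<>();
        int count; // number of sample terms that produced this shape

        String toRegex() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < classes.size(); i++) {
                int lo = min.get(i), hi = max.get(i);
                sb.append(classes.get(i));
                sb.append(lo == hi ? "{" + lo + "}" : "{" + lo + "," + hi + "}");
            }
            return sb.toString();
        }
    }

    // Assumed classes: ASCII digit, ASCII letter, anything else.
    private static String classOf(char c) {
        if (c >= '0' && c <= '9') return "\\d";
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return "[A-Za-z]";
        return "[^A-Za-z0-9]";
    }

    // Groups terms by their sequence of class runs, widening run lengths as needed.
    static Map<String, Shape> infer(List<String> terms) {
        Map<String, Shape> shapes = new LinkedHashMap<>();
        for (String term : terms) {
            List<String> classes = new ArrayList<>();
            List<Integer> lengths = new ArrayList<>();
            for (char c : term.toCharArray()) {
                String cls = classOf(c);
                int last = classes.size() - 1;
                if (last >= 0 && classes.get(last).equals(cls)) {
                    lengths.set(last, lengths.get(last) + 1); // extend current run
                } else {
                    classes.add(cls);                          // start a new run
                    lengths.add(1);
                }
            }
            String key = String.join("", classes);
            Shape s = shapes.get(key);
            if (s == null) {
                s = new Shape();
                s.classes.addAll(classes);
                s.min.addAll(lengths);
                s.max.addAll(lengths);
                shapes.put(key, s);
            } else {
                for (int i = 0; i < lengths.size(); i++) {
                    s.min.set(i, Math.min(s.min.get(i), lengths.get(i)));
                    s.max.set(i, Math.max(s.max.get(i), lengths.get(i)));
                }
            }
            s.count++;
        }
        return shapes;
    }

    public static void main(String[] args) {
        // Hypothetical sample terms, standing in for real historic searches.
        List<String> terms = Arrays.asList("11N2222", "22NB3333", "33X44444", "Moby Dick");
        for (Shape s : infer(terms).values()) {
            System.out.println(s.count + " x " + s.toRegex());
        }
        // First line printed: 3 x \d{2}[A-Za-z]{1,2}\d{4,5}
    }
}
```

Sorting the resulting shapes by count would show which few patterns account for most of the input, which is one way to look for the 90% mentioned above.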
Hope this makes much more sense this time around
Rad.
Robert Klemme - 19 Oct 2006 09:15 GMT
> So based on the javaalmanac (great starting point, thanks) examples
> I've written the following 'test'
<snip/>
> Which does match the patterns I know about. What my ultimate aim is, is
> to develop a parser that will, given a set of input, determine what the
[quoted text clipped - 4 lines]
>
> Hope this makes much more sense this time around
Yes, I think so. Thanks for the clarification! I will try to rephrase
it in my own words just to make sure I understood you correctly. You
are basically looking for an algorithm that, given a set of inputs
(historic searches), partitions that set into groups of like items
(i.e. searches that share some common pattern) and derives the common
pattern of each group. Then you want to use those patterns to find out
which group a new search entered by the user belongs to, and use that
knowledge to optimize the current search.
I think that, depending on the inputs (historic searches), this could
be quite a difficult task, and I am sorry to say that I do not have a
simple answer. Maybe you should look into text retrieval systems. They
might contain things like that.
Kind regards
robert
radimpe@gmail.com - 19 Oct 2006 19:30 GMT
> > So based on the javaalmanac (great starting point, thanks) examples
> > I've written the following 'test'
[quoted text clipped - 27 lines]
>
> robert
I thought it might be the case... Pet projects always sound so simple
when you start them...
Robert Klemme - 18 Oct 2006 20:39 GMT
> Hope someone can help me with a pet project. I have a huge set of
> search terms based on historic searches. I'm working on a search engine
[quoted text clipped - 6 lines]
> 978-0-306-40615-7). If a user types in a book title "Moby Dick" I would
> like to do a wildcard search etc.
How is this connected to the historic searches? First you say you want
to use the RE against historic searches but now it seems you want to use
it with current searches.
> Can anybody help me write a Java Application that will process the
> historic terms to find out if there is a reliable RegEx for each search
> type. Even just a pointer in the right direction would be useful.
http://www.amazon.com/l/dp/0596528124/
robert
radimpe@gmail.com - 18 Oct 2006 22:31 GMT
> How is this connected to the historic searches? First you say you want
> to use the RE against historic searches but now it seems you want to use
> it with current searches.
Apologies. It is linked inasmuch as I could use them to form a
'pattern' of how people are using the search. I essentially have two
sets of data. Firstly, there is the base data that I want people to
search on, and secondly, there are the historic search terms showing
how people have tried to search. My intention is to form a good enough
pattern from both sets that I can 'direct' the historic search terms to
interrogate the data using the right 'type' of search.
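One rough way to judge whether a candidate pattern is "good enough" is to measure what fraction of the historic terms it matches. The sketch below assumes the terms are already loaded into a list. The class name PatternCoverage and the sample data are hypothetical, and the part-number regex is the one from the test program elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternCoverage {
    // Fraction of terms (0.0 to 1.0) that the pattern matches in full.
    static double coverage(Pattern pattern, List<String> terms) {
        if (terms.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String term : terms) {
            if (pattern.matcher(term.trim()).matches()) {
                hits++;
            }
        }
        return (double) hits / terms.size();
    }

    public static void main(String[] args) {
        // Hypothetical historic terms; real data would come from the search log.
        List<String> history =
                Arrays.asList("11N2222", "22NB3333", "Moby Dick", "33X44444");
        Pattern partNumber = Pattern.compile("\\d{2}\\D{1,2}\\d{4,5}");
        // Three of the four sample terms have the part-number shape.
        System.out.println(coverage(partNumber, history));
    }
}
```

Running the same check against the base data as well would show whether terms that match a pattern actually correspond to records the matching search type can find.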
Thanks for the links so far. I've got some reading to catch up with.
IchBin - 19 Oct 2006 09:08 GMT
> Hi,
>
[quoted text clipped - 14 lines]
>
> Rad.
Eclipse has a plugin that does regular expression analysis. I installed
it because I am coding PHP now, and that language seems to use regexes
much more heavily than Java. Anyway, the plugin is called "QuickREx".
Eclipse update url:
http://www.bastian-bergerhoff.com/eclipse/features.
Eclipse Plugins Site:
http://eclipse-plugins.2y.net/eclipse/plugin_details.jsp?id=964
Can be found here:
http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/toc.html

Signature
Thanks in Advance... http://ichbin.9999mb.com
IchBin, Pocono Lake, Pa, USA http://weconsultants.phpnet.us
__________________________________________________________________________
'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
TechBookReport - 19 Oct 2006 10:46 GMT
>> Hi,
>>
[quoted text clipped - 27 lines]
> Can be found here:
> http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/toc.html
Thanks for the pointer to QuickREx. Looks very useful indeed.
Pan

radimpe@gmail.com - 19 Oct 2006 19:31 GMT
Thanks. I'll have a look. It may just come in handy.
> > Hi,
> >
[quoted text clipped - 35 lines]
> 'If there is one, Knowledge is the "Fountain of Youth"'
> -William E. Taylor, Regular Guy (1952-)