Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / February 2007

Tip: Looking for answers? Try searching our database.

Most efficient algorithm for matching

Thread view: 
Timasmith - 13 Feb 2007 02:27 GMT
Suppose I have a list of 10,000 items that you can order, each order
item has an ID and a Name.

If you type in a string and I want to find every item that CONTAINS
that string, what is the most efficient way?

I happen to be using java but I am open to general solutions.

Obviously I could just loop through the list and do a match on each
string but I cant help thinking one could be more efficient thatn
O(n).

A traditional database btree index doesnt help, but I think search
engines must use something fast.

thanks

Tim
Knute Johnson - 13 Feb 2007 02:49 GMT
> Suppose I have a list of 10,000 items that you can order, each order
> item has an ID and a Name.
[quoted text clipped - 14 lines]
>
> Tim

If your data isn't sorted just looping through it is the fastest way.

Signature

Knute Johnson
email s/nospam/knute/

Timasmith - 13 Feb 2007 03:25 GMT
On Feb 12, 9:49 pm, Knute Johnson <nos...@rabbitbrush.frazmtn.com>
wrote:
> Timasmithwrote:
> > Suppose I have a list of 10,000 items that you can order, each order
[quoted text clipped - 24 lines]
>
> - Show quoted text -

Maybe.  What if I constrained the search to say that the string must
have a least 2 characters.  So I guess you could take all 10,000
entries, determine all of the sequences and put those into a
hashtable.  So if I search on chair it would find all the indexed
entries with ch in the them, then look through the results.  If <5% of
the entries have ch in them, I would imagine that would help.
Lew - 13 Feb 2007 03:48 GMT
> On Feb 12, 9:49 pm, Knute Johnson <nos...@rabbitbrush.frazmtn.com>
> wrote:
>> Timasmithwrote:
>>> Suppose I have a list of 10,000 items that you can order, each order
>>> item has an ID and a Name.

Maybe some more outré structure like a trie will help:
<http://en.wikipedia.org/wiki/Trie>
in particular a suffix tree:
<http://en.wikipedia.org/wiki/Suffix_tree>

other structures:
<http://en.wikipedia.org/wiki/List_of_data_structures>

- Lew
Eric Sosman - 13 Feb 2007 03:54 GMT
> On Feb 12, 9:49 pm, Knute Johnson <nos...@rabbitbrush.frazmtn.com>
> wrote:
[quoted text clipped - 26 lines]
> entries with ch in the them, then look through the results.  If <5% of
> the entries have ch in them, I would imagine that would help.

    If you're doing just a few searches, a straight-line pass
over all the data is probably best.

    If you're doing many searches, it may make sense to invest
a fairly large amount of work in initialization to make the
searches themselves run faster.  Here's one way:

    Map<String,Set<Item>> map = new HashMap<String,Set<Item>>();
    for (Item item : itemCollection) {
       for (String key : item.setOfAllKeys()) {
           Set<Item> set = map.get(key);
           if (set == null) {
               set = new Set<Item>();
               map.put(key, set);
           }
           set.add(item);
       }
    }

That is, you have a map that associates each substring with a
set of all the Items having that substring.  Each Item appears
in as many sets as it has distinct substrings.

    It may be desirable to endow setOfAllKeys() with a little
more intelligence than a simple brute-force decomposition would
exhibit.  If one of your Items is "LEFT HANDED MONKEY WRENCH"
there's probably not much to be gained by indexing it under "FT"
and "D M" and "AND" -- yet indexing under "WRENCHES" might be
a good idea, even though it's not a substring of the name.

    See also "The Art of Computer Programming, Volume III:
Sorting and Searching" by D.E. Knuth.  IIRC, one approach he
describes for this kind of search is to start with "LEFT HANDED
MONKEY WRENCH" and enter it in all of the lists LE, EF, FT, ...
DE, ED.  Then given the search key "DED MONK" you would form
the intersection of the lists for DE, ED, ..., NK (keeping the
lists sorted would be helpful, or you could use bit vectors)
and finish by inspecting only those Items in the intersection.

Signature

Eric Sosman
esosman@acm-dot-org.invalid

Knute Johnson - 13 Feb 2007 05:38 GMT
> On Feb 12, 9:49 pm, Knute Johnson <nos...@rabbitbrush.frazmtn.com>
> wrote:
[quoted text clipped - 26 lines]
> entries with ch in the them, then look through the results.  If <5% of
> the entries have ch in them, I would imagine that would help.

With 10,000 items, I doubt very much that you can beat a linear search
by very much with a complicated data structure.  With 10,000,00 I can
see where it would be a different story.  A linear search is pretty easy
to code.  10,000 classes of 1000 bytes only takes up 10mb.  Easy to
store in memory.  10 million items on the other hand would probably have
to be kept on some kind of permanent storage.  Or you might want to use
some real database manager instead and use its searching capabilities.

Signature

Knute Johnson
email s/nospam/knute/

Daniel Pitts - 13 Feb 2007 20:35 GMT
> On Feb 12, 9:49 pm, Knute Johnson <nos...@rabbitbrush.frazmtn.com>
> wrote:
[quoted text clipped - 34 lines]
> entries with ch in the them, then look through the results.  If <5% of
> the entries have ch in them, I would imagine that would help.

You could also look into a third-party indexing library.

Check out: Lucene.
or for a web service: Solr

Lucene was developed by an (ex?) Google employee.
Solr was developed at, and is used extensively at, CNET Networks.
bugbear - 13 Feb 2007 09:57 GMT
> Suppose I have a list of 10,000 items that you can order, each order
> item has an ID and a Name.
>
> If you type in a string and I want to find every item that CONTAINS
> that string, what is the most efficient way?

Just to clarify; you wish to detect your target string (e.g. "har")
within the Name of your item object?

e.g. "harold" is a hit, "harry morse" is a hit, "mr haroldson" is a hit,
"nigel smith" is a miss?

  BugBear
Timasmith - 13 Feb 2007 11:45 GMT
> Timasmithwrote:
> > Suppose I have a list of 10,000 items that you can order, each order
[quoted text clipped - 10 lines]
>
>    BugBear

Correct.
bugbear - 13 Feb 2007 12:27 GMT
>> Timasmithwrote:
>>> Suppose I have a list of 10,000 items that you can order, each order
[quoted text clipped - 8 lines]
>
> Correct.

OK. Any other constraints on alphabet (i.e. number of distinct characters
allowed for use in "Name"), or search targets (whole words?),
number of insertions to the item list compared with number of retrievals (searches)?

  BugBear
Timasmith - 13 Feb 2007 20:20 GMT
> Timasmithwrote:
> >> Timasmithwrote:
[quoted text clipped - 15 lines]
>
>    BugBear

No constraints on alphabet, most of the search targets are single
words from 3-30 characters.

Insertions is rare, the list does not change from day to day,
retreivals is measured in thousands or even tens of thousands per hour.
bugbear - 15 Feb 2007 15:51 GMT
> No constraints on alphabet, most of the search targets are single
> words from 3-30 characters.
>
> Insertions is rare, the list does not change from day to day,
> retreivals is measured in thousands or even tens of thousands per hour.

Given the other info in the thread,
I would try the following hybrid hack.

Concatanate all the Names into a single
array of characters, separated by some
"impossible" character (likely nul) with a second array
containing the offsets within the character "pool"
back to your Items.

Use a fast text search to locate
your target within the pool.

I would suggest:

A Fast Generic Sequence Matching Algorithm
David R. Musser, Gor V. Nishanov
http://citeseer.ist.psu.edu/586484.html

Having all the data in a memory contiguous
pool should speed things along, and the search
should be very fast.

Then use the (sorted, of course) offset array
to turn the offset into Items references.

   BugBear
bugbear - 15 Feb 2007 15:55 GMT
> A Fast Generic Sequence Matching Algorithm
> David R. Musser, Gor V. Nishanov
> http://citeseer.ist.psu.edu/586484.html

Source - not JAVA

http://www.cs.rpi.edu/~musser/gp/gensearch/index.html

  BugBear
Christian - 13 Feb 2007 20:57 GMT
Timasmith schrieb:
>> Timasmithwrote:
>>> Suppose I have a list of 10,000 items that you can order, each order
[quoted text clipped - 10 lines]
>
> Correct.

search for ukkonen's algorithm ...
its an algorithm for substring trees

building is in O(n)
storage needed in O(n)
and searching in O(1)
that may fit your needs ...
Christian - 15 Feb 2007 20:08 GMT
Christian schrieb:
> Timasmith schrieb:
>>> Timasmithwrote:
[quoted text clipped - 19 lines]
> and searching in O(1)
> that may fit your needs ...

if you are not willing to index  you could also try do use

Boyer Moore ... that needs no indexing and should be in average case

O(m +n/m)

with n textlenght and m length of the searched substring

(with substringtrees searching is not in O(1) its only independent from
n which in most cases should be good enough)
Jeff Higgins - 13 Feb 2007 12:35 GMT
> Suppose I have a list of 10,000 items that you can order, each order
> item has an ID and a Name.
[quoted text clipped - 3 lines]
>
> I happen to be using java but I am open to general solutions.

http://lucene.apache.org/java/docs/

> Obviously I could just loop through the list and do a match on each
> string but I cant help thinking one could be more efficient thatn
[quoted text clipped - 6 lines]
>
> Tim
David Segall - 14 Feb 2007 05:56 GMT
>> Suppose I have a list of 10,000 items that you can order, each order
>> item has an ID and a Name.
[quoted text clipped - 5 lines]
>
>http://lucene.apache.org/java/docs/
I don't think so. From the docs "In order to prevent extremely slow
WildcardQueries, a Wildcard term should not start with one of the
wildcards * or ?".
Chris Uppal - 14 Feb 2007 19:10 GMT
> > http://lucene.apache.org/java/docs/
> I don't think so. From the docs "In order to prevent extremely slow
> WildcardQueries, a Wildcard term should not start with one of the
> wildcards * or ?".

I think you must be misinterpretting the Lucene documentation.  Lucene is
specifically /for/ searching large volumes of text.

I suspect that what it is trying to warn of is that it you want to look for the
needle, "needle", in a multiGB haystack, you should ask Lucent to look for
"needle" not for "*needle".

   -- chris
David Segall - 15 Feb 2007 15:11 GMT
>> > http://lucene.apache.org/java/docs/
>> I don't think so. From the docs "In order to prevent extremely slow
[quoted text clipped - 3 lines]
>I think you must be misinterpretting the Lucene documentation.  Lucene is
>specifically /for/ searching large volumes of text.
True, but the OP does not have large volumes of text. He does not
specify the amount but, to me, "a name" implies at most a few hundred
characters.

>I suspect that what it is trying to warn of is that it you want to look for the
>needle, "needle", in a multiGB haystack, you should ask Lucent to look for
>"needle" not for "*needle".
The OP emphasised that "I want to find every item that CONTAINS that
string". He asked for the ability to efficiently look for "*needle".
It is clear from the documentation that Lucene is optimised to look
for "words", that it can cope with "wor*" but that it is not the right
tool for "*ords".
Lew - 15 Feb 2007 16:06 GMT
>>>> http://lucene.apache.org/java/docs/
>>> I don't think so. From the docs "In order to prevent extremely slow
>>> WildcardQueries, a Wildcard term should not start with one of the
>>> wildcards * or ?".

> The OP emphasised that "I want to find every item that CONTAINS that
> string". He asked for the ability to efficiently look for "*needle".
> It is clear from the documentation that Lucene is optimised to look
> for "words", that it can cope with "wor*" but that it is not the right
> tool for "*ords".

But the /search/ term doesn't start with a wildcard, the /search/ term is
"ords" or "needle".

Consider

String s = "Text base in which to search for words.";
boolean containsSearchTerm = (s.indexOf( "ords" ) >= 0);

Note that this finds if the item "CONTAINS that string", but that the /search/
term does not start with a wildcard. The OP was asking for this capability
across secondary storage.

- Lew
David Segall - 16 Feb 2007 12:08 GMT
>>>>> http://lucene.apache.org/java/docs/
>>>> I don't think so. From the docs "In order to prevent extremely slow
[quoted text clipped - 9 lines]
>But the /search/ term doesn't start with a wildcard, the /search/ term is
>"ords" or "needle".
True, but Lucene's /index/ only contains "words".

>Consider
>
>String s = "Text base in which to search for words.";
>boolean containsSearchTerm = (s.indexOf( "ords" ) >= 0);
Exactly! For Lucene, that is a wild card search which must be
performed on _every_ word in the index. The OP would be far better off
using that code on his 10,000 raw strings.

Anyway, our debate over using Lucene has been overtaken by Christians
suggestion to use Ukkonen's algorithm. Apart from the thorny problem
of interpreting multi-word searches, which applies to all the
suggestions, the algorithm seems ideal. An early result of a Google
search contains a lucid explanation and some source code
<http://marknelson.us/1996/08/01/suffix-trees/>.
Lew - 16 Feb 2007 14:25 GMT
> Anyway, our debate over using Lucene has been overtaken by Christians
> suggestion to use Ukkonen's algorithm. Apart from the thorny problem
> of interpreting multi-word searches, which applies to all the
> suggestions, the algorithm seems ideal. An early result of a Google
> search contains a lucid explanation and some source code
> <http://marknelson.us/1996/08/01/suffix-trees/>.

Lew wrote on 2002-02-10:
>> Maybe some more outré structure like a trie will help:
>> <http://en.wikipedia.org/wiki/Trie>
[quoted text clipped - 3 lines]
>> other structures:
>> <http://en.wikipedia.org/wiki/List_of_data_structures>

- Lew
Lew - 16 Feb 2007 23:02 GMT
> Lew wrote on 2002-02-10:

Oops. That was on 2/12, not 2/10.

- Lew
Daniel Pitts - 14 Feb 2007 19:39 GMT
> >> Suppose I have a list of 10,000 items that you can order, each order
> >> item has an ID and a Name.
[quoted text clipped - 9 lines]
> WildcardQueries, a Wildcard term should not start with one of the
> wildcards * or ?".

You would use stemming to index the substrings, rather than wildcards
to search for it.
David Segall - 15 Feb 2007 15:37 GMT
>> >> Suppose I have a list of 10,000 items that you can order, each order
>> >> item has an ID and a Name.
[quoted text clipped - 12 lines]
>You would use stemming to index the substrings, rather than wildcards
>to search for it.
Stemming in Lucene reduces token terms to their "_linguistic_ root
form. e.g. reduces 'fishing' and 'fishes' to 'fish'". It would not
help find 'shing' in 'fishing'.
Chris Uppal - 15 Feb 2007 19:23 GMT
> Stemming in Lucene reduces token terms to their "_linguistic_ root
> form. e.g. reduces 'fishing' and 'fishes' to 'fish'". It would not
> help find 'shing' in 'fishing'.

Ah, I see what you mean now.  Thanks.

   -- chris
Daniel Pitts - 15 Feb 2007 20:45 GMT
> >> >> Suppose I have a list of 10,000 items that you can order, each order
> >> >> item has an ID and a Name.
[quoted text clipped - 16 lines]
> form. e.g. reduces 'fishing' and 'fishes' to 'fish'". It would not
> help find 'shing' in 'fishing'.

Linguistic roots are only ONE implementation of stemming. Its
perfectly acceptible to stem "suffixes".
Chris Uppal - 13 Feb 2007 16:51 GMT
> Suppose I have a list of 10,000 items that you can order, each order
> item has an ID and a Name.
>
> If you type in a string and I want to find every item that CONTAINS
> that string, what is the most efficient way?

Efficient text searching is a /big/ subject.  No, let me be more precise: it's
/HUGE/ subject.

If your list of items is reasonably short, and the items themselves are
reasonably short, then you can do yourself a big favour by ignoring efficiency
and just searching linearly.  A lot depends on the application, but I'm not
convinced that 10K items is enough to justify heavier-weight solutions.

Next up, there are efficient text searching algorithms, such as Boyer-Moore,
and the interesting newish breed of "bit-parallel" or "bitmap" searching
algorithms.  If you have enough text, and if the overhead (in time, space, or
convenience) of maintaining an index is unacceptable, then looking into those
algorithms would be sensible.  Note that there variants on these algorithms
which do approximate matching.  You might look for "agrep" for instance.

Also Wikipeadia has quite a few pages on text searching.
   http://en.wikipedia.org/wiki/Category:Search_algorithms

Lastly, if you need greater speed than such methods can provide (and search
engines /'definitely/ do ;-) then you'd need to use some sort of indexing.

The simplest way to approach that is to ensure that your database has something
like "full text search and indexing" in its feature set -- and use that.

Alternatively, there are off-the-shelf packages for bulk test searching.  Jeff
Higgins has already pointed you towards Apache Lucene, which is one such
package.

   -- chris
S Perryman - 15 Feb 2007 09:00 GMT
>> Suppose I have a list of 10,000 items that you can order, each order
>> item has an ID and a Name.

>> If you type in a string and I want to find every item that CONTAINS
>> that string, what is the most efficient way?

> Efficient text searching is a /big/ subject.  No, let me be more precise:
> it's
> /HUGE/ subject.

> If your list of items is reasonably short, and the items themselves are
> reasonably short, then you can do yourself a big favour by ignoring
> efficiency
> and just searching linearly.  A lot depends on the application, but I'm
> not
> convinced that 10K items is enough to justify heavier-weight solutions.

> Next up, there are efficient text searching algorithms, such as
> Boyer-Moore,
> and the interesting newish breed of "bit-parallel" or "bitmap" searching
> algorithms.

I have heard of the "bitmap" algorithms, but don't know how they compare
to the really good std ones like KMP (Boyer-Moore is subject to degeneracy
to
O(M * N) for certain alphabets while KMP should have fixed overhead
except probably for massively long search patterns - computing the "failure"
function etc) .

Regards,
Steven Perryman
H. S. Lahman - 13 Feb 2007 17:07 GMT
Responding to Timasmith...

> Suppose I have a list of 10,000 items that you can order, each order
> item has an ID and a Name.

The key will be ordering, but for just 10K items one probably doesn't
want to get too elaborate.

> If you type in a string and I want to find every item that CONTAINS
> that string, what is the most efficient way?

I assume you mean in the Name rather than the ID. I also assume that the
data gets stored once but is searched often.

Is there a limit on the lengths of the Names? What is the average
length? The answers to these questions will drive tailoring the
solutions below to trade-off speed vs. effort. Can you define the ID
(e.g., an index for the order the data items are stored)?

> I happen to be using java but I am open to general solutions.
>
> Obviously I could just loop through the list and do a match on each
> string but I cant help thinking one could be more efficient thatn
> O(n).

I would be inclined to make use of a couple of practical
characteristics. In searching ASCII keys most rejections will occur in
the first few characters. (It has been many, many moons since I've done
searches so I forget the actual numbers for the distribution.) Also,
string matching algorithms for a given starting point in each string are
quite efficient.

I think one "cheap" approach would be to keep an ordered list of
digraphs or trigraphs that appear anywhere in the Name. That list gets
updated whenever an item is added/removed from the data. The list would
contain multiple entries consisting of the item ID and the start
position of the digraph/trigraph in the Name. (Maybe the remaining
length if the search keys are likely to be long so that you can
eliminate embedded substrings that would extend beyond the Name length.)

The data itself would be ordered in an array and the "ID" above would be
an index into the storage position (either the original ID or one you
create when storing the data). Your algorithm would find the starting
digraph/trigraph in the list above and "walk" the entries for that
digraph/trigraph. For each one it would do a lookup to get the data and
use a conventional substring matching algorithm for the full search
string at the indicated start point.

There are even less elaborate versions. You can simply keep a list of
data items that contain the starting letter of the search string
somewhere and search just them. Then all you need is to store the data
so you can access items by an index referenced in the starting letter list.

Bottom line: you have a trade-off between how fast you want to be and
how much work you want to put into storing the data items. As a fairly
general rule the search will be in direct proportional to how much space
you have for storing supporting infrastructure like index tables.

For example, you could create 26 trees whose nodes were letters. The
root node in each tree would be one of the possible first letters of the
Name. Each child node would be a character that immediately followed the
parent as the next letter of the Name (pretty much like a B-Tree). Each
node would have a list of items that contained the substring of letters
leading to that node. Then you could create a similar set of trees where
the root nodes were the second letter of the name and so on. The search
algorithm would "walk" the trees in each set from the node with the
proper starting letter. This would be very fast but it would require
substantially more storage space than the original data. [One could get
even more carried away if one uses a digraph or trigraph as the root
node in each tree. B-)]

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl@pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development".  Email
info@pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
Timasmith - 13 Feb 2007 20:50 GMT
> Responding toTimasmith...
>
[quoted text clipped - 79 lines]
> Pathfinder is hiring:http://www.pathfindermda.com/about_us/careers_pos3.php.
> (888)OOA-PATH

So I can point you to a file which is near exactly what I would be
searching.

http://www.fda.gov/cder/ndc/ziptext.zip

It is a list of medications.   There are 50,000 which is a reasonable
representation.  Look at column G in listings.txt.  If you take that
it is 1.5MB in size.

If I load up the items into an ArrayList and then search linearly for
items matching WELLBUTRIN

then do to 1000 searches takes 9-11 seconds.  So 100 searches a second
seems rather fast, maybe I am worrying about nothing.

<pre>
     // build file
     String contents = FileUtil.getTextContents("C:\\temp\\a.txt");
     java.util.StringTokenizer st = new
java.util.StringTokenizer(contents,"\n\r");
     ArrayList<String> list = new ArrayList<String>(50000);
     while (st.hasMoreTokens()) {
        list.add(st.nextToken());
     }

     // start timer
     System.out.println(Timer.getNow());
     String match = "WELLBUTRIN";
     ArrayList<Integer> matches = new ArrayList<Integer>();
     for (int i=0; i<1000; i++) {
        matches.clear();
        for(int j=0; j<list.size(); j++) {
           if (list.get(j).indexOf(match) > -1) {
              matches.add(j);
           }
        }
     }
     //stop timer
     System.out.println(Timer.getNow());
     //System.out.println(matches);
</pre>
Knute Johnson - 14 Feb 2007 00:22 GMT
>> Responding toTimasmith...
>>
[quoted text clipped - 117 lines]
>       //System.out.println(matches);
> </pre>

I'll bet if you do that in a String[] it will be even faster.

Signature

Knute Johnson
email s/nospam/knute/

H. S. Lahman - 14 Feb 2007 17:49 GMT
Responding to Timasmith...

> So I can point you to a file which is near exactly what I would be
> searching.
[quoted text clipped - 10 lines]
> then do to 1000 searches takes 9-11 seconds.  So 100 searches a second
> seems rather fast, maybe I am worrying about nothing.

It depends upon the context.  If you only need the search as a response
to particular interactive user request, then that 10 ms will not be
noticeable compared to the hundreds of milliseconds for the user's
typing. (It will probably take longer just to read the data from disk.)

OTOH, if your software is in the bowels of an HMO's multi-user
application processing 1K pharmacy coverage requests a second, you
probably need to speed things up a tad. [Though I don't know why a
pharmacy would be providing partial keys; maybe they can't read the
doctor's writing. B-)] In fact, then you probably don't want to read the
DB at all for an individual request. You probably want to read the data
into memory at startup and answer the queries from memory. If the data
is only using 1.5 Mb, you could probably have another 5-10 Mb of
supporting data structures for the searches w/o getting into page
faulting trouble.

Bottom line: I think you need to look at the situation and determine
what is acceptable performance and then do any trade-offs from there.

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl@pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development".  Email
info@pathfindermda.com for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
Ron Jeffries - 13 Feb 2007 21:43 GMT
>Suppose I have a list of 10,000 items that you can order, each order
>item has an ID and a Name.
[quoted text clipped - 10 lines]
>A traditional database btree index doesnt help, but I think search
>engines must use something fast.

Before doing anything else, I'd suggest that you code up the simple
search and time it. My guess is it'll fly.

Ron Jeffries
www.XProgramming.com
bugbear - 14 Feb 2007 13:43 GMT
> Before doing anything else, I'd suggest that you code up the simple
> search and time it. My guess is it'll fly.

Agreed in principle, but it depends
what "fast enough" means.

For human interaction 1/100 of a second is good.
For human interaction with multiple users (e.g. webserver)
you might want more 1/1000.

For the evaluation loop inside a genetic algorithm
you might be aiming for micro seconds.

  BugBear
Lew - 14 Feb 2007 20:17 GMT
> For human interaction 1/100 of a second is good.
> For human interaction with multiple users (e.g. webserver)
> you might want more 1/1000.

No reason to be faster for any one transaction if they interleave well.

Where did you come up with the .01s figure? Is there ergonomic research behind it?

- Lew
bugbear - 15 Feb 2007 15:53 GMT
>> For human interaction 1/100 of a second is good.
>> For human interaction with multiple users (e.g. webserver)
[quoted text clipped - 4 lines]
> Where did you come up with the .01s figure? Is there ergonomic research
> behind it?

No ; I'm afraid I made it up, on the grounds that
1/100th is clearly "good enough". Slower might
also be good enough, but I'm confident that 1/100 is OK.

(hell, it's quicker than many people's monitor refresh!)

  BugBear


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.