Now I am confused.
[....]
You are correct, they don't. However, I can still see if they're any
faster at building the list. Currently I am sorting as a separate step so
I can still compare the speed of the various methods. I included the part
about sorting in the original post because I thought it might influence the
method used.
- Mike
> Neither of the suggested solutions would meet this condition. Or would
> they? What am I missing?
Hemal Pandya wrote:
> Now I am confused.
>
[quoted text clipped - 4 lines]
> Neither of the suggested solutions would meet this condition. Or would
> they? What am I missing?
Maybe you missed my comment "Once you have the set of keys, you can sort
it using a Comparator that orders based on the associated count."?
> You need a collection of (word, frequency). When building the
> collection, you want it searchable by the word. When reporting, you
> need it sorted by frequency. One way is to sort the collection after
> the scan. Would that be worse then maintaining two collections during
> scan? I don't know...
A continuously sorted collection would be quite expensive to maintain,
especially considering the statistics stated in the base message: "The
list of source strings can be anywhere from 1 - 20000 strings having
anywhere from 1 - 500 unique terms. The average set will be roughly 300
strings with about 30 unique terms."
Maintaining sorted order requires adjustment of the structure for every
input, 300 times in the average case. A HashMap indexed by string has
one insert for each unique string, 30 times in the average case,
followed by a 30 element sort at the end. The remaining 270 operations
do a HashMap get followed by an add.
The OP did say "I'd then like to sort this data so the items with the
higher counts come first in the list." which suggests to me that
solutions that first collect the data and then sort meet the requirements.
> If I remember right, the K&R C Programming Language book has a sample
> code for word frequency.
I don't think K&R gave enough attention to the java.util package to be a
really good guide to Java programming in this area. :-)
Patricia
Hemal Pandya - 23 Dec 2006 05:20 GMT
[...]
> Maybe you missed my comment "Once you have the set of keys, you can sort
> it using a Comparator that orders based on the associated count."?
Of course. I didn't think it likely that you would not have addressed
it. Perhaps I was thrown off because reading the above to mean to sort
the keys and I was stuck with the through that the collection of pairs
needs to be sorted.
Thank you for the explanation of performance issue.