Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / April 2006

Tip: Looking for answers? Try searching our database.

Comparing elements from one hashset to another

Thread view: 
farah - 30 Mar 2006 13:55 GMT
Hi,

The following code was repeated three times to store 3 files into 3
individual hash sets. I now need to compare file A to file B, i want to
see how many of the words in set A are also on set B. (Each set
contains the same number of words).  Im not sure how to implement this,
can anyone help??

public FileIn() {

try {
// make sure there is a text file in the java directory where the java
// code is saved
in = new BufferedReader(new FileReader("C:\\Program
Files\\Java\\example.txt"));
set = new HashSet();

int Len = 1;
while(Len>0) {
String line = in.readLine();
try {
Len = line.length();
System.out.println(line);
set.add(line);
} catch(NullPointerException npe){
Len = 0; //no more file to read
}
}
Patricia Shanahan - 30 Mar 2006 16:28 GMT
> Hi,
>
[quoted text clipped - 3 lines]
> contains the same number of words).  Im not sure how to implement this,
> can anyone help??

Take a look at the method retainAll that HashSet inherits from
AbstractCollection.

Patricia
Remon van Vliet - 30 Mar 2006 16:35 GMT
> Hi,
>
[quoted text clipped - 24 lines]
> }
> }

Hello, what you want to do is make a union of the two collections (e.g. make
a 3rd collection that has all elements that are contained by both of your
source collections, a .size() on this 3rd collection gives you the answer to
your problem). The fastest way to do this is (using A, B and C to match your
question) :

HashSet C =  new HashSet(A);
A.retainAll(B);
C.size();

This obviously works just as well if you have more than 2 source collections

Remon van Vliet
farah - 30 Mar 2006 19:24 GMT
Im not too sure i understand........how would this enable me to compare
the values from two individual sets? would i be checking for
duplicates??

farah
Oliver Wong - 30 Mar 2006 19:54 GMT
> Im not too sure i understand........how would this enable me to compare
> the values from two individual sets? would i be checking for
> duplicates??

   The retainAll() method essentially performs the intersection operation
on two sets, which is (I think) what you want (i.e. A intersect B = {all x :
(x is an element of A) and (x is an element of B)} ).

   - Oliver
Patricia Shanahan - 30 Mar 2006 20:37 GMT
>> Im not too sure i understand........how would this enable me to compare
>> the values from two individual sets? would i be checking for
[quoted text clipped - 6 lines]
>
>    - Oliver

In particular, this does a non-destructive intersection using clone and
retainAll:

    HashSet ab = (HashSet)a.clone();
    ab.retainAll(b);

The elements of ab are those that appear in both a and b.

Of course, you can always do it manually by iterating over one of the
sets and asking the other set whether it contains each element. But why
write code yourself if it is already there and tested in java.util?

Patricia
Hendrik Maryns - 31 Mar 2006 10:51 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

Patricia Shanahan schreef:

>>> Im not too sure i understand........how would this enable me to compare
>>> the values from two individual sets? would i be checking for
[quoted text clipped - 12 lines]
>     HashSet ab = (HashSet)a.clone();
>     ab.retainAll(b);

This also works if a is declared of type Set (don't code against
implementations...):

Set ab = new HashSet(a);
ab.retainAll(b);

This is what is recommended in Sun's Java tutorials, too.

H.

Signature

Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org

Ed Kirwan - 31 Mar 2006 11:25 GMT
> This also works if a is declared of type Set (don't code against
> implementations...):
[quoted text clipped - 5 lines]
>
> H.

:)

Collection ab = new HashSet(a);

Signature

www.EdmundKirwan.com - Home of The Fractal Class Composition.

Download Fractality, free Java code analyzer:
www.EdmundKirwan.com/servlet/fractal/frac-page130.html

farah - 02 Apr 2006 00:04 GMT
Set ab = new HashSet(a);
ab.retainAll(b);

how can i check that this code works? it compiles but how can i check
that it has done its task?
Roedy Green - 02 Apr 2006 00:56 GMT
>how can i check that this code works? it compiles but how can i check
>that it has done its task?

see the source code at http://mindprod.com/jgloss/hashset.html

It dumps the result set so you can see it works.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

farah - 02 Apr 2006 00:11 GMT
Set ab = new HashSet(a);
ab.retainAll(b);

ok so the code compares the words from two individual sets.....how can
i check the output of this so that it shows me which of the words
within the two sets are identical???

Farah
Roedy Green - 02 Apr 2006 00:57 GMT
>ok so the code compares the words from two individual sets.....how can
>i check the output of this so that it shows me which of the words
>within the two sets are identical???

Again, please see the source code at
http://mindprod.com/jgloss/hashset.html#COMPARING
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

farah - 02 Apr 2006 00:16 GMT
Set ab = new HashSet(a);
ab.retainAll(b);

ok so the code compares the words from two individual sets.....how can
i check that it has identified any words that were contained in BOTH
sets?

(Id like the code to place any words that are contained in both sets
int a seperate store and than return the number of words contained in
that store (this being the total number of identical words that appear
in both sets))

Farah
Roedy Green - 02 Apr 2006 00:59 GMT
>ok so the code compares the words from two individual sets.....how can
>i check that it has identified any words that were contained in BOTH
>sets?

please see http://mindprod.com/jgloss/hashset.html#COMPARISON

Run the code.  Trace it.  It solves your problem three different ways.
All produce the expected result.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Wibble - 02 Apr 2006 04:07 GMT
> Set ab = new HashSet(a);
> ab.retainAll(b);
[quoted text clipped - 9 lines]
>
> Farah

ab.retainAll(b).retainAll(a).size()
farah - 02 Apr 2006 12:22 GMT
 Roedy,

For the retainAll method, a list of fruits that had been entered
individually as strings were used. My code hoever uses info from files
that are read into the program.  Im having trouble modifying my code to
work around this, can you help??

Farah
Hendrik Maryns - 02 Apr 2006 13:00 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

farah uitte de volgende tekst op 04/02/2006 01:22 PM:
>   Roedy,
>
> For the retainAll method, a list of fruits that had been entered
> individually as strings were used. My code hoever uses info from files
> that are read into the program.  Im having trouble modifying my code to
> work around this, can you help??

Not if you give no information.  I mean, do you really plan on doing any
serious programming?  It is very straightforward what happens in the
code we gave you.  If you don't understand it, I see no point in even
trying to cope with file input.

H.
Signature

Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org

Roedy Green - 02 Apr 2006 18:15 GMT
>For the retainAll method, a list of fruits that had been entered
>individually as strings were used. My code hoever uses info from files
>that are read into the program.  Im having trouble modifying my code to
>work around this, can you help??

The only difference will  be how you get the words from the files into
the HashSets..  How you do that depends on the format of the files.

Presuming they are simple text files with many words per line and
punctuation, you might use a StreamTokenizer,
http://mindprod.com/jgloss/streamtokenizer.html StringTokenizer
http://mindprod.com/jgloss/stringtokenizern or  Regex
http://mindprod.com/jgloss/regex.html  split that breaks the words
apart at space or punctuation. You then add each word to HashSet A or
B.  If they are one per line, you can read them with a
BufferedReader.readLine and trim.

To read the file for regex, see
http://mindprod.com/applets/fileio.html

If there are duplicates WITHIN a file you will find out from the
boolean returned by HashSet.add.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Remon van Vliet - 03 Apr 2006 15:51 GMT
Just me or did you just ask the same question three times? Anyway, make a
unit test and check if it produces the expected results (which it will).

Remon van Vliet

> Set ab = new HashSet(a);
> ab.retainAll(b);
[quoted text clipped - 9 lines]
>
> Farah
Roedy Green - 30 Mar 2006 20:38 GMT
On Thu, 30 Mar 2006 17:35:26 +0200, "Remon van Vliet"
<remon@exmachina.nl> wrote, quoted or indirectly quoted someone who
said :

>HashSet C =  new HashSet(A);
>A.retainAll(B);
>C.size();

the retainAll is easiest to code of the three algorithms, but I
suspect that the sort method will be fastest for very large sets,
which is the bulkiest to code.

retainAll has the over head of creating a duplicate set and the
overhead of looking up each element and removing nearly all the
elements.

sort has the overhead of loading the sort class, exporting two arrays,
and sorting.

lookup has the overhead of a  hashed lookup on every element of one of
the sets in the other.

Code for all three algorithms is posted  at
http://mindprod.com/jgloss/hashset.html#COMPARING
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Patricia Shanahan - 30 Mar 2006 20:46 GMT
> On Thu, 30 Mar 2006 17:35:26 +0200, "Remon van Vliet"
> <remon@exmachina.nl> wrote, quoted or indirectly quoted someone who
[quoted text clipped - 20 lines]
> Code for all three algorithms is posted  at
> http://mindprod.com/jgloss/hashset.html#COMPARING

In terms of theoretical complexity, the sort method is O(n log(n)), but
the retainAll method, and its manual equivalent, are O(n), assuming
HashSet lookup is constant time. Of course, for any reasonably small
problem size, the sort method might be faster.

On the other hand, we were not asked for the fastest method, and the
retainAll approach is really easy to code.

Patricia
farah - 30 Mar 2006 22:12 GMT
i think in this instance i might try the easier of the methods.....im
pretty new to java and not very competent with the language
Roedy Green - 30 Mar 2006 20:03 GMT
>The following code was repeated three times to store 3 files into 3
>individual hash sets. I now need to compare file A to file B, i want to
>see how many of the words in set A are also on set B. (Each set
>contains the same number of words).  Im not sure how to implement this,
>can anyone help??

there are two basic ways:

extract the two sets as arrays, sort and compare linearly

Enumerate Set A and look each element up in Set B to see if it exists.

http://mindprod.com/jgloss/hashset.html#COMPARING
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.



Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.