Java Forum / General / June 2005
hashCode for 4 ints?
Jeff - 21 Jun 2005 11:26 GMT Am I misusing HashMap or just need to fix the .hashCode() override for the key object?
The class I use as a key for a HashMap is unique based on a combination of 4 ints. Normally, about 5000 instances are created and persist over time. The maximum possible created is probably max around 40,000.
Since .hashCode() returns an int, how do I create a unique identifier for the 128 bits contained in the 4 ints? Or should I swap to another data structure?
Thanks!
// Here's a simplification of my class that illustrates my problem.
public class IntHashExample { int A, B, C, D;
IntHashExample( int A , int B , int C , int D ) { this.A = A; this.B = B; this.C = C; this.D = D; }
public boolean equals( Object arg0 ) {
IntHashExample other = ( IntHashExample ) arg0;
if ( ( A == other.A ) && ( B == other.B ) && ( C == other.C ) && ( D == other.D ) ) return true; else return false; }
// how do I create a hashcode to lookup the unique combination of the 4 ints? The following obviously wouldn't work, // but hopefully indicates my need public int hashCode() {
long highest = ( long ) A << 96;
long high = ( long ) B << 64;
long low = ( long ) C << 32;
long lowest = ( long ) D;
int code = ( int ) ( highest | high | low | lowest );
return code; } }
 Signature Jeff
Patricia Shanahan - 21 Jun 2005 13:18 GMT ...
> Since .hashCode() returns an int, how do I create a unique identifier for > the 128 bits contained in the 4 ints? Or should I swap to another data > structure? ...
Why do you need the hash code to be a unique identifier?
Patricia
huddler - 21 Jun 2005 15:06 GMT > ... > > Since .hashCode() returns an int, how do I create a unique identifier for [quoted text clipped - 3 lines] > > Why do you need the hash code to be a unique identifier? <Warning: Java Newbie>
If the hash is used as the key in a HashMap, then non-unique hashes would create a side effect to put() where an existing Object in the HashMap is replaced with the new Object. He might not want that to happen.
Patricia Shanahan - 21 Jun 2005 16:36 GMT >>... >> [quoted text clipped - 12 lines] > HashMap is replaced with the new Object. He might not want that to > happen. The "key" is the object itself, not its hash. Replacement happens if, and only if, the two objects are equal.
The hash is used for performance purposes, to reduce the number of objects that HashMap has to look at when testing for equality.
Here's an example program, using the unrecommented, very poor performance, but functionally correct trivial hashCode method:
import java.util.HashMap; import java.util.Map;
public class TrivialHash { String name;
public TrivialHash(String name) { this.name = name; }
public String toString() { return name; }
public boolean equals(Object o) { return (o instanceof TrivialHash && name.equals(((TrivialHash) o).name)); }
public int hashCode() { return 0; }
public static void main(String[] args) { Map map = new HashMap(); map.put(new TrivialHash("A"), "X"); map.put(new TrivialHash("B"), "Y"); System.out.println(new TrivialHash("A").hashCode()); System.out.println(new TrivialHash("B").hashCode()); System.out.println(map.get(new TrivialHash("A"))); System.out.println(map.get(new TrivialHash("B"))); } }
This program prints:
0 0 X Y
demonstrating that equal hash codes do not prevent two key values from being in the HashMap at the same time.
Patricia
huddler - 21 Jun 2005 18:07 GMT Well, as a Java newbie, I'm in a constant state of confusion. In the 1.4.2 doc for the HashMap put() method, it says:
<quote> public Object put(Object key, Object value)
Associates the specified value with the specified key in this map. If the map previously contained a mapping for this key, the old value is replaced. </quote>
So could I trouble you to explain (or point to an explanation) of why your example and the doc SEEM to contradict each other?
http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html#put(java.lang.Obj ect,%20java.lang.Object)
I await humiliation.
Patricia Shanahan - 21 Jun 2005 21:25 GMT > Well, as a Java newbie, I'm in a constant state of confusion. In the > 1.4.2 doc for the HashMap put() method, it says: [quoted text clipped - 14 lines] > > I await humiliation. I aim for enlightenment, rather than humiliation. Of course, I may miss sometimes.
I think you may be misinterpreting the word "key" in the HashMap documentation. It is also possible that the OP shares your confusion.
Look at the argument list. The key is the object, not its hash code.
In my example program, a TrivialHash object is equal to other TrivialHash objects with the same name, and nothing else. The first call uses as key a TrivialHash object with name "A". The second call uses a TrivialHash object with name "B", which is unequal. As far as HashMap is concerned, when I do the second map.put call, the map does not contain a mapping for the key, so it does not need to replace an old value.
HashMap, like most hash tables, assumes that equal keys have equal hash codes, but that some unequal keys may share the same hash code. That is all HashMap is allowed to assume, because of the hashCode definition in the Object javadocs: "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result."
Essentially, a hash table splits the keys into subsets, called buckets, using the hash code. If two keys are equal, they go on the same bucket, so a lookup only has to examine the keys in one bucket. However, two keys in the same bucket, even if they have exactly the same hash, may not be equal. The only way HashMap can be sure two Java objects are equal is to get a true result from o1.equals(o2).
The behavior you seem to be expecting, using the hash code as key and ignoring the object's equals method, does match a real but specialized technique. A "perfect hash" ensures unequal keys have unequal hash codes. It shifts some complexity from the lookup process to the construction of the hash. It may be used, for example, for short lists of keywords that do not change. Do a search for "perfect hash" if you want to learn more.
The Java hashCode function is NOT required to be a perfect hash, and HashMap does not assume it is one.
Patricia
George Cherry - 22 Jun 2005 22:32 GMT >> Well, as a Java newbie, I'm in a constant state of confusion. In the >> 1.4.2 doc for the HashMap put() method, it says: [quoted text clipped - 32 lines] > HashMap, like most hash tables, assumes that equal keys have equal hash > codes, but that some unequal keys may share the same hash code. Yes, part of the contract of a hashCode() is
1. Equal objects have equal hash codes. 2. Unequal objects MAY have the same hash code. (Of course, IDEALLY they don't have the same hash code.)
> That is > all HashMap is allowed to assume, because of the hashCode definition in [quoted text clipped - 5 lines] > using the hash code. If two keys are equal, they go on the same bucket, > so a lookup only has to examine the keys in one bucket. The bucket is often the head of a linked list, I think.
> However, two > keys in the same bucket, even if they have exactly the same hash, may > not be equal. The only way HashMap can be sure two Java objects are > equal is to get a true result from o1.equals(o2). Yes, o1.equals(o2) implies o1.hashCode() == o2.hashCode() BUT o1.hashCode() == o2.hashCode() does not imply o1.equals(o2)
> The behavior you seem to be expecting, using the hash code as key and > ignoring the object's equals method, does match a real but specialized > technique. A "perfect hash" In this world there is no perfection, no permanence, and no self. -- Buddha
> ensures unequal keys have unequal hash > codes. It shifts some complexity from the lookup process to the [quoted text clipped - 6 lines] > > Patricia Great follow-up to your excellent example.
George
George Cherry - 22 Jun 2005 22:18 GMT > The "key" is the object itself, not its hash. Replacement happens if, > and only if, the two objects are equal. [quoted text clipped - 50 lines] > > Patricia Excellent example, Patricia. In the spirit of your example, here is another one:
/* * This program illustrates that if equals() is appropriately * defined, then even a mediocre hashCode() will not prevent a * hash collection from working correctly (although a mediocre * hashCode() will defeat the performance purpose of a hash * collection. */
import java.util.HashSet; import java.util.Set;
public class SillyHashCodeTester { private String name; private int id;
public SillyHashCodeTester(String name, int id) { this.name = name; this.id = id; }
public String toString() { return name + ", " + id; }
public boolean equals(Object o) { return ( o instanceof SillyHashCodeTester && this.name.equals( ((SillyHashCodeTester)o).name ) && this.id == ((SillyHashCodeTester)o).id ); }
//A: Define a silly hashCode(): public int hashCode() { return 42; }
public static void main(String[] args) { Set<SillyHashCodeTester> hashSet; hashSet = new HashSet<SillyHashCodeTester>();
System.out.println( "Set size: " + hashSet.size() ); System.out.println("Add an object to the set."); hashSet.add( new SillyHashCodeTester("abc", 100) ); System.out.println( "Set size: " + hashSet.size() + "\n" );
System.out.println("Add a non-duplicate object to the set."); hashSet.add( new SillyHashCodeTester("abc", 200) ); System.out.println( "Set size: " + hashSet.size() + "\n" );
System.out.println("Try to add a duplicate object to the set."); hashSet.add( new SillyHashCodeTester("abc", 100) ); System.out.println( "Set size: " + hashSet.size() + "\n" );
System.out.println("Try to add a duplicate object to the set."); hashSet.add( new SillyHashCodeTester("abc", 200) ); System.out.println( "Set size: " + hashSet.size() + "\n" );
System.out.println("Add a non-duplicate object to the set."); hashSet.add( new SillyHashCodeTester("xyz", 100) ); System.out.println( "Set size: " + hashSet.size() + "\n" ); } }
/* This program should print:
Set size: 0 Add an object to the set. Set size: 1
Add a non-duplicate object to the set. Set size: 2
Try to add a duplicate object to the set. Set size: 2
Try to add a duplicate object to the set. Set size: 2
Add a non-duplicate object to the set. Set size: 3 */
George
huddler - 22 Jun 2005 03:54 GMT Cripes, I hate when I'm this stupid in public.
Thanks for the discussion, Patricia.
George Cherry - 22 Jun 2005 22:33 GMT > Cripes, I hate when I'm this stupid in public. > > Thanks for the discussion, Patricia. I beg to differ with you, huddler. You are NOT stupid--the stupid persist in their ignorance.
George
Roedy Green - 28 Jun 2005 12:44 GMT >> Since .hashCode() returns an int, how do I create a unique identifier for >> the 128 bits contained in the 4 ints? Or should I swap to another data >> structure? see http://mindprod.com/jgloss/hashcode.html. Just xor them together.
 Signature Bush crime family lost/embezzled $3 trillion from Pentagon. Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video. http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm
Canadian Mind Products, Roedy Green. See http://mindprod.com/iraq.html photos of Bush's war crimes
Thomas Weidenfeller - 28 Jun 2005 14:34 GMT > see http://mindprod.com/jgloss/hashcode.html. Just xor them together. Just XORing the integers can give very bad hash-codes. In the not to unlikely case where all the involved integers are rather small positive(!) numbers the hash is not well spread at all over the entire possible range.
Small positive integers have many of their higher bits set to zero. XORing zeros gives zeros again. A hash calculated from small positive integers vi XORing can never exceed
2^(floor(log2(max(i1, i2, ..., in))) + 1) - 1
But even for positive integers which are not so small any more you end up with bad hashes. E.g. if your largest integer is 10000, you would still never get a hash code which is larger than 2^14 -1 = 16283.
/Thomas
 Signature The comp.lang.java.gui FAQ: ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq http://www.uni-giessen.de/faq/archiv/computer-lang.java.gui.faq/
Thomas Weidenfeller - 21 Jun 2005 13:22 GMT > The class I use as a key for a HashMap is unique based on a combination of 4 > ints. Normally, about 5000 instances are created and persist over time. [quoted text clipped - 3 lines] > the 128 bits contained in the 4 ints? Or should I swap to another data > structure? You don't. Hashes don't have to be unique. It is much better if they are, but they can't under all conditions. That's why you also need to have equals(). The rule is that hashCode() must return the same hash for objects which are equal (and of course of the object is the same), but is allowed to return the same hash also for unequal objects.
An extremely inefficient hash, but still an allowed implementation of hashCode() would be:
public int hashCode() { return 0; }
NO, DON'T DO IT - this is just an example.
So what is a good hash? Entire computer science books can be filled with research about this. I would suggest you start with a simple one, and measure if it is ok for your task. Or, you could try stuff like the following:
Warning:
- This was a quick hack, and I didn't verify if I got the algorithm even remotely right.
- I have also no clue if the algorithm is suitable for your numbers
- It burns a lot more CPU cycles than a simper hash (like just xoring all integers), and might not be worth the trouble at all
static final int FNV1_INIT = 0x811c9dc5; static final int FNV1_PRIME = 0x01000193;
protected int hashFnvInt(int v, int hash) { for(int i = 0; i < 4; i++) {
// equivalent to hash *= FNV1_PRIME // might generate a lot of VM instructions // and not worth the saving of the // multiplication hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24); hash ^= v & 0xFF; v >>= 8; } return hash; }
public int hashCode() { return hashFnvInt(D, hashFnvInt(C, hashFnvInt(B, hashFnvInt(A, FNV1_INIT)))); }
Googel for FNV (Fowler, Noll, Vo) for details.
/Thomas
 Signature The comp.lang.java.gui FAQ: ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|