Java Forum / General / November 2005
hashCode() ?
hack_tick - 29 Nov 2005 08:01 GMT I need to find out whether the hashcode() function actually returns unique value wrt every different object and can it be uswed to identify a single object in an array of more than 1-million objects?
Stefan Schulz - 29 Nov 2005 08:08 GMT > I need to find out whether the hashcode() function actually returns > unique value wrt every different object and can it be uswed to identify > a single object in an array of more than 1-million objects? Most assuredly not. It only guarantees that objects which equals() returns true for have the same hashcode. The following is perfectly legal:
class HashBreaker { public int hashCode(){ return 0; } }
 Signature You can't run away forever, But there's nothing wrong with getting a good head start. --- Jim Steinman, "Rock and Roll Dreams Come Through"
NullBock - 29 Nov 2005 09:21 GMT Short answer: no. Long answer, maybe. ;>
It depends on how you write the object. For instance, consider the class
class UniqueHash { static int id = 0; int hashCode = id++;
public int hashCode() { return hashCode; } }
Objects instantiated from this class are guaranteed to have unique hash codes, and so the following will be true for all instances of a and b:
UniqueHash a,b; //... a.equals(b) == (a.hashCode() == b.hashCode());
And indeed, you could replace the a.equals(b) with a == b for the same results.
Typically, though, such behavior is undesirable and even dangerous. hashCode and equals do different things. It sounds like you need not an array but a HashMap, which isn't significantly slower than an array anyhow.
Walter Gildersleeve Freiburg, Germany
______________________________________________________ http://linkfrog.net URL Shortening Free and easy, small and green.
NullBock - 29 Nov 2005 09:24 GMT Ok, I'm just being anal, but I had to fix my code for multi-threading:
class UniqueHash { static int id = 0; int hashCode = nextHash();
private static synchronized int nextHash() { return id++; }
public int hashCode() { return hashCode; }
}
Walter Gildersleeve Freiburg, Germany
______________________________________________________ http://linkfrog.net URL Shortening Free and easy, small and green.
hack_tick - 29 Nov 2005 10:41 GMT I am not able to make out whats the problem in your class, seems threadsafe, any ideas/suggestions from others?
NullBock - 29 Nov 2005 10:56 GMT the command:
hashCode = id++;
consists of two, non-atomic actions. First, the hashCode field is set to id, then id is incremented is incremented. What happens if one thread yields in the middle of this operation? hashCode will get a stale value of id.
id == 1;
thread one hashCode = id [hashCode == 1] yields thread two hashCode = id [hashCode == 1] id ++ [id == 2] yields thread one id ++ [id == 3]
Then you would have two instances of UniqueHash with the hashCode 1.
Walter Gildersleeve Freiburg, Germany
______________________________________________________ http://linkfrog.net URL Shortening Free and easy, small and green.
hack_tick - 29 Nov 2005 11:11 GMT execuse me for my vary naive knowledge of the Java-language, but isnt the function nextHash() synchronized?
NullBock - 29 Nov 2005 11:15 GMT Exactly--it's the version of the UniqueHash that *is* thread-safe. The original, where I merely set hashCode = id++, was not thread safe.
Walter Gildersleeve Freiburg, Germany
______________________________________________________ http://linkfrog.net URL Shortening Free and easy, small and green.
Roedy Green - 29 Nov 2005 17:50 GMT >consists of two, non-atomic actions. First, the hashCode field is set >to id, then id is incremented is incremented. What happens if one >thread yields in the middle of this operation? hashCode will get a >stale value of id. An hashcode is assigned in the constructor. Only one thread can possibly see an object as it is being constructed.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Thomas Hawtin - 29 Nov 2005 19:50 GMT >>consists of two, non-atomic actions. First, the hashCode field is set >>to id, then id is incremented is incremented. What happens if one [quoted text clipped - 3 lines] > An hashcode is assigned in the constructor. Only one thread can > possibly see an object as it is being constructed. Actually there were two thread-safety bugs in the initial example code, but one wouldn't usually be considered as such.
The important problem is the access to the static field. Two objects could be created at the same time. That could lead to the second thread's constructor run loading a value between the first's load and store.
The second problem is that Java does not guarantee anything remotely like Sequential Consistency. You may create an object and then store a reference to it. But there is not a guarantee that another thread will see it in the same order. Prior to 1.4 you could ensure correct behaviour with client code that recklessly uses objects of your class between threads without correct synchronisation. However you would require a synchronized block in both the constructor (very rare) and the hashCode method. From 1.5 you can just define the field final (and make sure not to leak this from the constructor).
Sun's implementation of Object.hashCode appears to assign the identity hashcode lazily. Presumably to keep the allocation code as short as possible.
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Chris Uppal - 29 Nov 2005 11:01 GMT > Ok, I'm just being anal, but I had to fix my code for multi-threading: Well, if we're nitpicking...
Your nextHash() depends on 32-bit integer arithmetic not wrapping around. That's a plausible assumption for short-lived applications, but may not be valid in general. Not a problem with your code, specifically, it's a certain consquence hashCode() returning an int.
-- chris
NullBock - 29 Nov 2005 11:35 GMT Yah, I thought about that, too. But if you want to have unique hash codes for objects, then you're going to be confronted with integer boundary problems at some point. I guess I could've set id to Integer.MIN_VALUE, which've given more than twice as many possibilities, eh?
'Course, I doubt anyone's going to need more than 2 bil instances anyhow.
Mike Schilling - 29 Nov 2005 15:06 GMT > Yah, I thought about that, too. But if you want to have unique hash > codes for objects, then you're going to be confronted with integer > boundary problems at some point. I guess I could've set id to > Integer.MIN_VALUE, which've given more than twice as many > possibilities, eh? Same number of possibilities, just in a different order. (Integer.MAX_VALUE + 1 = Integer.MIN_VALUE).
Thomas Hawtin - 29 Nov 2005 17:03 GMT > Ok, I'm just being anal, but I had to fix my code for multi-threading: class AlmostUniqueHash {
> static int id = 0; > int hashCode = nextHash(); > > private static synchronized int nextHash() { > return id++; > } From 1.5 you can getter performance performance using java.util.concurrent.atomic.AtomicInteger. OTOH, you might consider me more anal than yourself.
(Modern Sun) ThreadLocal uses a similar tactic to encourage perfect hashing. Its particular hash table implementation works better if hash entries not only avoid clashing, but also avoid being close to one another.
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Chris Uppal - 30 Nov 2005 09:23 GMT > From 1.5 you can getter performance performance using > java.util.concurrent.atomic.AtomicInteger. Do you happen to know what the performance difference is ? How much faster is it than an acquire/release of an uncontended lock ? Or a contended lock ? How well does the performance difference generalise across architectures ?
I could measure it on a uniprocessor Pentium box, but that's none too representative these days, and anyway I want to do other things today...
-- chris
Thomas Hawtin - 30 Nov 2005 12:21 GMT >> From 1.5 you can getter performance performance using >>java.util.concurrent.atomic.AtomicInteger. [quoted text clipped - 5 lines] > I could measure it on a uniprocessor Pentium box, but that's none too > representative these days, and anyway I want to do other things today... I just tried to write a benchmark on my Celeron D box. The AtomicInteger version got optimised away. It's essentially a wrapper around an intrinsic. Presumably the really important benefits are on multithreaded hardware where you want to avoid thread contention as much as possible.
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Chris Uppal - 30 Nov 2005 17:24 GMT > I just tried to write a benchmark on my Celeron D box. The AtomicInteger > version got optimised away. That should be fast enough for some applications ;-)
> Presumably the really important benefits are on multithreaded > hardware where you want to avoid thread contention as much as possible. And perhaps the AtomicInteger stuff can benefit by not having to force a full resynchronisation of local/remote memory.
What I need is a loosely coupled multiprocessor box to test this on. Sadly, I have neither the cash nor the space...
-- chris
zero - 29 Nov 2005 11:00 GMT > I need to find out whether the hashcode() function actually returns > unique value wrt every different object and can it be uswed to identify > a single object in an array of more than 1-million objects? From the Object JavaDoc:
<quote> It is not required that if two objects are unequal according to the equals (java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. </quote>
In other words, a hashCode is not unique. A good hashCode implementation does try its best to be unique, but there is no way to be sure. Just think about it, conceptually there can be an unlimited number of different objects. A hashCode however is an integer, which has at most 2^32 different values.
The hashCode inherited from Object is a "best effort" to create a unique integer. However, for best results you should override hashCode if you intend to use it to identify objects.
 Signature Beware the False Authority Syndrome
Roedy Green - 29 Nov 2005 11:07 GMT >I need to find out whether the hashcode() function actually returns >unique value wrt every different object and can it be uswed to identify >a single object in an array of more than 1-million objects? In general it will not. That is not its function. however Object.hashcode uses the address as its hash, so it is unique.j
You could check by dumping all the hashcodes to a file and using an external sort then looking for dups.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Ian Pilcher - 29 Nov 2005 14:56 GMT > In general it will not. That is not its function. however > Object.hashcode uses the address as its hash, so it is unique.j Actually, it does eventually produce collisions (Sun 1.5.0).
 Signature ======================================================================== Ian Pilcher i.pilcher@comcast.net ========================================================================
Roedy Green - 29 Nov 2005 17:54 GMT On Tue, 29 Nov 2005 08:56:12 -0600, Ian Pilcher <i.pilcher@comcast.net> wrote, quoted or indirectly quoted someone who said :
>Actually, it does eventually produce collisions (Sun 1.5.0). If you held onto all your objects you could get uniqueness.
But this is abuse of hashCode. If you want some unique identifier, assign it yourself, perhaps using a long.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Thomas Hawtin - 29 Nov 2005 20:03 GMT > On Tue, 29 Nov 2005 08:56:12 -0600, Ian Pilcher > <i.pilcher@comcast.net> wrote, quoted or indirectly quoted someone who > said : > >>Actually, it does eventually produce collisions (Sun 1.5.0). ^^i.e. Object.hashCode/System.identityHashCode
> If you held onto all your objects you could get uniqueness. Not true. Holding onto objects does not help much. The code below has references to two objects with the same identity hash code simultaneously.
The address isn't used raw, as that produces hash codes with the three lowest bits zero.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873
class HashClash { public static void main(String[] args) { final Object obj = new Object(); final int target = obj.hashCode(); Object clash; long ct = 0; do { clash = new Object(); ++ct; } while (clash.hashCode() != target && ct<10L*1000*1000*1000L); if (clash.hashCode() == target) { System.out.println(ct+": "+obj+" - "+clash); } else { System.out.println("No clashes found"); } } }
62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Roedy Green - 30 Nov 2005 00:30 GMT On Tue, 29 Nov 2005 20:03:00 +0000, Thomas Hawtin <usenet@tackline.plus.com> wrote, quoted or indirectly quoted someone who said :
>class HashClash { > public static void main(String[] args) { [quoted text clipped - 13 lines] > } >} think this works because obj is considered not living even before you get into the loop, so its address/hashcode are up for grabs to recycle.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Thomas Hawtin - 30 Nov 2005 01:03 GMT > think this works because obj is considered not living even before you > get into the loop, so its address/hashcode are up for grabs to > recycle. The obj reference is used in the println expression.
From a pedantic point of view, it could be removed, all except for a copy of the hash code. One way to guarantee that objects are alive (from 1.5) is to later assign them to a volatile reference. Handy for finalisers. Here is a corrected version of the program:
class HashClash { private static volatile Object dummy; public static void main(String[] args) { final Object obj = new Object(); final int target = obj.hashCode(); Object clash; long ct = 0; do { clash = new Object(); ++ct; } while (clash.hashCode() != target && ct<10L*1000*1000*1000L); if (clash.hashCode() == target) { System.out.println(ct+": "+obj+" - "+clash); } else { System.out.println("No clashes found"); } dummy = obj; dummy = clash; } }
Which gives:
62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Thomas Hawtin - 29 Nov 2005 17:10 GMT >>I need to find out whether the hashcode() function actually returns >>unique value wrt every different object and can it be uswed to identify >>a single object in an array of more than 1-million objects? > > In general it will not. That is not its function. however > Object.hashcode uses the address as its hash, so it is unique.j We did this two or three months ago. Object addresses change on modern JVMs, so that cannot be used as the way hashCode are retained (although it may well have something to do with generation...). It is unfeasible for even a 32-bit JVM to keep track of all allocated hash codes. Therefore you should expect duplicates. Sun's (SE) JVM does not disappoint.
This is not an unusual misunderstanding. I have filed a bug with respect to the Object.hashCode documentation. There is simplistic a program in the comments that may find clashes.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Chris Uppal - 29 Nov 2005 11:08 GMT > I need to find out whether the hashcode() function actually returns > unique value wrt every different object and can it be uswed to identify > a single object in an array of more than 1-million objects? It's not defined to be true, so it isn't.
OTOH, if you are asking whether, /in fact/, it might work, then it /may/ be the case that for a given JVM implementation, and providing you are only using objects with the default hashCode() (or are using the system identity hash explicitly), that no two objects that exist at the same time have the same hash. However that assumption depends on the implementation of the JVM and is quite likely to be invalid.
Don't do it in production code. Don't even do it in throwaway code unless (for your purposes) any errors are tolerable and/or detectable.
-- chris
~Glynne - 29 Nov 2005 18:02 GMT > I need to find out whether the hashcode() function actually returns > unique value wrt every different object and can it be uswed to identify > a single object in an array of more than 1-million objects? The following are the only deductions that can be made about objects and their hashcodes: 1) IF ob1.equals(ob2) THEN hash1==hash2 2) IF hash1 != hash2 THEN !(ob1.equals(ob2))
In particular, note that the converse statements do not hold, i.e. 1') hash1==hash2 does not imply that ob1.equals(ob2) 2') !(ob1.equals(ob2)) does not imply that hash1 != hash2
In light of this, an idiom that I've found useful is to go ahead and use the hashcode to search the array. This will narrow your million objects down to the one that you're looking for....plus zero or more objects whose hashcodes collide with it.
Finally, you can filter the handful of candidates identified by hashcode matching, by using object comparisons to find your one true match.
~Glynne
NullBock - 29 Nov 2005 20:18 GMT Which of course begs the question, why an array, and not a HashMap?
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|