Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / November 2005

Tip: Looking for answers? Try searching our database.

hashCode() ?

Thread view: 
hack_tick - 29 Nov 2005 08:01 GMT
I need to find out whether the hashcode() function actually returns
unique value wrt every different object and can it be uswed to identify
a single object in an array of more than 1-million objects?
Stefan Schulz - 29 Nov 2005 08:08 GMT
> I need to find out whether the hashcode() function actually returns
> unique value wrt every different object and can it be uswed to identify
> a single object in an array of more than 1-million objects?

Most assuredly not. It only guarantees that objects which equals() returns
true for have the same hashcode. The following is perfectly legal:

class HashBreaker {
 public int hashCode(){
   return 0;
 }
}

Signature

You can't run away forever,
But there's nothing wrong with getting a good head start.
          --- Jim Steinman, "Rock and Roll Dreams Come Through"
         

NullBock - 29 Nov 2005 09:21 GMT
Short answer: no.  Long answer, maybe. ;>

It depends on how you write the object.  For instance, consider the
class

class UniqueHash {
 static int id = 0;
 int hashCode = id++;

 public int hashCode() {
   return hashCode;
 }
}

Objects instantiated from this class are guaranteed to have unique hash
codes, and so the following will be true for all instances of a and b:

UniqueHash a,b;
//...
a.equals(b) == (a.hashCode() == b.hashCode());

And indeed, you could replace the a.equals(b) with a == b for the same
results.

Typically, though, such behavior is undesirable and even dangerous.
hashCode and equals do different things.  It sounds like you need not
an array but a HashMap, which isn't significantly slower than an array
anyhow.

Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
NullBock - 29 Nov 2005 09:24 GMT
Ok, I'm just being anal, but I had to fix my code for multi-threading:

class UniqueHash {
 static int id = 0;
 int hashCode = nextHash();

 private static synchronized int nextHash() {
   return id++;
 }

 public int hashCode() {
   return hashCode;
 }

}

Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
hack_tick - 29 Nov 2005 10:41 GMT
I am not able to make out whats the problem in your class, seems
threadsafe, any ideas/suggestions from others?
NullBock - 29 Nov 2005 10:56 GMT
the command:

hashCode = id++;

consists of two, non-atomic actions.  First, the hashCode field is set
to id, then id is incremented is incremented.  What happens if one
thread yields in the middle of this operation?  hashCode will get a
stale value of id.

id == 1;

thread one
hashCode = id [hashCode == 1]
yields
thread two
hashCode = id [hashCode == 1]
id ++ [id == 2]
yields
thread one
id ++ [id == 3]

Then you would have two instances of UniqueHash with the hashCode 1.

Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
hack_tick - 29 Nov 2005 11:11 GMT
execuse me for my vary naive knowledge of the Java-language, but isnt
the function  nextHash() synchronized?
NullBock - 29 Nov 2005 11:15 GMT
Exactly--it's the version of the UniqueHash that *is* thread-safe.  The
original, where I merely set hashCode = id++, was not thread safe.

Walter Gildersleeve
Freiburg, Germany

______________________________________________________
http://linkfrog.net
URL Shortening
Free and easy, small and green.
Roedy Green - 29 Nov 2005 17:50 GMT
>consists of two, non-atomic actions.  First, the hashCode field is set
>to id, then id is incremented is incremented.  What happens if one
>thread yields in the middle of this operation?  hashCode will get a
>stale value of id.

An hashcode is assigned in the constructor. Only one thread can
possibly see an object as it is being constructed.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Thomas Hawtin - 29 Nov 2005 19:50 GMT
>>consists of two, non-atomic actions.  First, the hashCode field is set
>>to id, then id is incremented is incremented.  What happens if one
[quoted text clipped - 3 lines]
> An hashcode is assigned in the constructor. Only one thread can
> possibly see an object as it is being constructed.

Actually there were two thread-safety bugs in the initial example code,
but one wouldn't usually be considered as such.

The important problem is the access to the static field. Two objects
could be created at the same time. That could lead to the second
thread's constructor run loading a value between the first's load and store.

The second problem is that Java does not guarantee anything remotely
like Sequential Consistency. You may create an object and then store a
reference to it. But there is not a guarantee that another thread will
see it in the same order. Prior to 1.4 you could ensure correct
behaviour with client code that recklessly uses objects of your class
between threads without correct synchronisation. However you would
require a synchronized block in both the constructor (very rare) and the
hashCode method. From 1.5 you can just define the field final (and make
sure not to leak this from the constructor).

Sun's implementation of Object.hashCode appears to assign the identity
hashcode lazily. Presumably to keep the allocation code as short as
possible.

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Chris Uppal - 29 Nov 2005 11:01 GMT
> Ok, I'm just being anal, but I had to fix my code for multi-threading:

Well, if we're nitpicking...

Your nextHash() depends on 32-bit integer arithmetic not wrapping around.
That's a plausible assumption for short-lived applications, but may not be
valid in general.  Not a problem with your code, specifically, it's a certain
consquence hashCode() returning an int.

   -- chris
NullBock - 29 Nov 2005 11:35 GMT
Yah, I thought about that, too.  But if you want to have unique hash
codes for objects, then you're going to be confronted with integer
boundary problems at some point.  I guess I could've set id to
Integer.MIN_VALUE, which've given more than twice as many
possibilities, eh?

'Course, I doubt anyone's going to need more than 2 bil instances
anyhow.
Mike Schilling - 29 Nov 2005 15:06 GMT
> Yah, I thought about that, too.  But if you want to have unique hash
> codes for objects, then you're going to be confronted with integer
> boundary problems at some point.  I guess I could've set id to
> Integer.MIN_VALUE, which've given more than twice as many
> possibilities, eh?

Same number of possibilities, just in a different order.  (Integer.MAX_VALUE
+ 1 = Integer.MIN_VALUE).
Thomas Hawtin - 29 Nov 2005 17:03 GMT
> Ok, I'm just being anal, but I had to fix my code for multi-threading:

class AlmostUniqueHash {
>   static int id = 0;
>   int hashCode = nextHash();
>
>   private static synchronized int nextHash() {
>     return id++;
>   }

From 1.5 you can getter performance performance using
java.util.concurrent.atomic.AtomicInteger. OTOH, you might consider me
more anal than yourself.

(Modern Sun) ThreadLocal uses a similar tactic to encourage perfect
hashing. Its particular hash table implementation works better if hash
entries not only avoid clashing, but also avoid being close to one another.

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Chris Uppal - 30 Nov 2005 09:23 GMT
>  From 1.5 you can getter performance performance using
> java.util.concurrent.atomic.AtomicInteger.

Do you happen to know what the performance difference is ? How much faster is
it than an acquire/release of an uncontended lock ?  Or a contended lock ?  How
well does the performance difference generalise across architectures ?

I could measure it on a uniprocessor Pentium box, but that's none too
representative these days, and anyway I want to do other things today...

   -- chris
Thomas Hawtin - 30 Nov 2005 12:21 GMT
>> From 1.5 you can getter performance performance using
>>java.util.concurrent.atomic.AtomicInteger.
[quoted text clipped - 5 lines]
> I could measure it on a uniprocessor Pentium box, but that's none too
> representative these days, and anyway I want to do other things today...

I just tried to write a benchmark on my Celeron D box. The AtomicInteger
version got optimised away. It's essentially a wrapper around an
intrinsic. Presumably the really important benefits are on multithreaded
hardware where you want to avoid thread contention as much as possible.

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Chris Uppal - 30 Nov 2005 17:24 GMT
> I just tried to write a benchmark on my Celeron D box. The AtomicInteger
> version got optimised away.

That should be fast enough for some applications ;-)

> Presumably the really important benefits are on multithreaded
> hardware where you want to avoid thread contention as much as possible.

And perhaps the AtomicInteger stuff can benefit by not having to force a full
resynchronisation of local/remote memory.

What I need is a loosely coupled multiprocessor box to test this on.  Sadly, I
have neither the cash nor the space...

   -- chris
zero - 29 Nov 2005 11:00 GMT
> I need to find out whether the hashcode() function actually returns
> unique value wrt every different object and can it be uswed to identify
> a single object in an array of more than 1-million objects?

From the Object JavaDoc:

<quote>
It is not required that if two objects are unequal according to the equals
(java.lang.Object) method, then calling the hashCode method on each of the
two objects must produce distinct integer results.
</quote>

In other words, a hashCode is not unique.  A good hashCode implementation
does try its best to be unique, but there is no way to be sure.  Just think
about it, conceptually there can be an unlimited number of different
objects.  A hashCode however is an integer, which has at most 2^32
different values.

The hashCode inherited from Object is a "best effort" to create a unique
integer.  However, for best results you should override hashCode if you
intend to use it to identify objects.

Signature

Beware the False Authority Syndrome

Roedy Green - 29 Nov 2005 11:07 GMT
>I need to find out whether the hashcode() function actually returns
>unique value wrt every different object and can it be uswed to identify
>a single object in an array of more than 1-million objects?

In general it will not.  That is not its function. however
Object.hashcode uses the address as its hash, so it is unique.j

You could check by dumping all the hashcodes to a file and using an
external sort then looking for dups.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Ian Pilcher - 29 Nov 2005 14:56 GMT
> In general it will not.  That is not its function. however
> Object.hashcode uses the address as its hash, so it is unique.j

Actually, it does eventually produce collisions (Sun 1.5.0).

Signature

========================================================================
Ian Pilcher                                        i.pilcher@comcast.net
========================================================================

Roedy Green - 29 Nov 2005 17:54 GMT
On Tue, 29 Nov 2005 08:56:12 -0600, Ian Pilcher
<i.pilcher@comcast.net> wrote, quoted or indirectly quoted someone who
said :

>Actually, it does eventually produce collisions (Sun 1.5.0).

If you held onto all your objects you could get uniqueness.

But this is abuse of hashCode. If you want some unique identifier,
assign it yourself, perhaps using a long.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Thomas Hawtin - 29 Nov 2005 20:03 GMT
> On Tue, 29 Nov 2005 08:56:12 -0600, Ian Pilcher
> <i.pilcher@comcast.net> wrote, quoted or indirectly quoted someone who
> said :
>
>>Actually, it does eventually produce collisions (Sun 1.5.0).
            ^^i.e. Object.hashCode/System.identityHashCode

> If you held onto all your objects you could get uniqueness.

Not true. Holding onto objects does not help much. The code below has
references to two objects with the same identity hash code simultaneously.

The address isn't used raw, as that produces hash codes with the three
lowest bits zero.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

class HashClash {
    public static void main(String[] args) {
         final Object obj = new Object();
         final int target = obj.hashCode();
         Object clash;
         long ct = 0;
         do {
             clash = new Object();
             ++ct;
         } while (clash.hashCode() != target && ct<10L*1000*1000*1000L);
         if (clash.hashCode() == target) {
             System.out.println(ct+": "+obj+" - "+clash);
         } else {
             System.out.println("No clashes found");
         }
    }
}

62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Roedy Green - 30 Nov 2005 00:30 GMT
On Tue, 29 Nov 2005 20:03:00 +0000, Thomas Hawtin
<usenet@tackline.plus.com> wrote, quoted or indirectly quoted someone
who said :

>class HashClash {
>     public static void main(String[] args) {
[quoted text clipped - 13 lines]
>     }
>}

think this works because obj is considered not living even before you
get into the loop, so its address/hashcode are up for grabs to
recycle.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Thomas Hawtin - 30 Nov 2005 01:03 GMT
> think this works because obj is considered not living even before you
> get into the loop, so its address/hashcode are up for grabs to
> recycle.

The obj reference is used in the println expression.

From a pedantic point of view, it could be removed, all except for a
copy of the hash code. One way to guarantee that objects are alive (from
1.5) is to later assign them to a volatile reference. Handy for
finalisers. Here is a corrected version of the program:

class HashClash {
    private static volatile Object dummy;
    public static void main(String[] args) {
         final Object obj = new Object();
         final int target = obj.hashCode();
         Object clash;
         long ct = 0;
         do {
             clash = new Object();
             ++ct;
         } while (clash.hashCode() != target && ct<10L*1000*1000*1000L);
         if (clash.hashCode() == target) {
             System.out.println(ct+": "+obj+" - "+clash);
         } else {
             System.out.println("No clashes found");
         }
         dummy = obj;
         dummy = clash;
    }
}

Which gives:

62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Thomas Hawtin - 29 Nov 2005 17:10 GMT
>>I need to find out whether the hashcode() function actually returns
>>unique value wrt every different object and can it be uswed to identify
>>a single object in an array of more than 1-million objects?
>
> In general it will not.  That is not its function. however
> Object.hashcode uses the address as its hash, so it is unique.j

We did this two or three months ago. Object addresses change on modern
JVMs, so that cannot be used as the way hashCode are retained (although
it may well have something to do with generation...). It is unfeasible
for even a 32-bit JVM to keep track of all allocated hash codes.
Therefore you should expect duplicates. Sun's (SE) JVM does not disappoint.

This is not an unusual misunderstanding. I have filed a bug with respect
to the Object.hashCode documentation. There is simplistic a program in
the comments that may find clashes.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Chris Uppal - 29 Nov 2005 11:08 GMT
> I need to find out whether the hashcode() function actually returns
> unique value wrt every different object and can it be uswed to identify
> a single object in an array of more than 1-million objects?

It's not defined to be true, so it isn't.

OTOH, if you are asking whether, /in fact/, it might work, then it /may/ be the
case that for a given JVM implementation, and providing you are only using
objects with the default hashCode() (or are using the system identity hash
explicitly), that no two objects that exist at the same time have the same
hash.  However that assumption depends on the implementation of the JVM and is
quite likely to be invalid.

Don't do it in production code.  Don't even do it in throwaway code unless (for
your purposes) any errors are tolerable and/or detectable.

   -- chris
~Glynne - 29 Nov 2005 18:02 GMT
> I need to find out whether the hashcode() function actually returns
> unique value wrt every different object and can it be uswed to identify
> a single object in an array of more than 1-million objects?

The following are the only deductions that can be made about objects
and their hashcodes:
1)  IF ob1.equals(ob2)  THEN  hash1==hash2
2)  IF hash1 != hash2  THEN  !(ob1.equals(ob2))

In particular, note that the converse statements do not hold, i.e.
1')  hash1==hash2 does not imply that ob1.equals(ob2)
2')  !(ob1.equals(ob2)) does not imply that hash1 != hash2

In light of this, an idiom that I've found useful is to go ahead and
use the hashcode to search the array.  This will narrow your million
objects down to the one that you're looking for....plus zero or more
objects whose hashcodes collide with it.

Finally, you can filter the handful of candidates identified by
hashcode matching, by using object comparisons to find your one true
match.

~Glynne
NullBock - 29 Nov 2005 20:18 GMT
Which of course begs the question, why an array, and not a HashMap?


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.