Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / June 2007

Tip: Looking for answers? Try searching our database.

String and hash value

Thread view: 
Bruintje Beer - 02 Jun 2007 15:44 GMT
Hi,

Is a hashvalue for a string using the function String.hashCode() always
unique.

John
Lew - 02 Jun 2007 16:02 GMT
> Is a hashvalue for a string using the function String.hashCode() always
> unique.

If by "unique" you mean that any distinct String has a distinct hashCode(), it
is not guaranteed.

Have you read the Javadocs for hashCode()?  The API docs are very often a
perfect first place to find such answers.
<http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode()>
> It is /not/ required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

You can examine the particular algorithm for String at
<http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()>

Even without Javadocs it is easy to prove that the hash codes for Strings
cannot be unique for all possible String values.  There are only 2**32 - 1
possible hashCode() results.  Any String longer than two characters has a
larger value set than that.  The sum of the value set counts for Strings of
any length is larger still.

Gotta love those Javadocs.

Signature

Lew

Daniel Dyer - 02 Jun 2007 16:05 GMT
> Hi,
>
> Is a hashvalue for a string using the function String.hashCode() always
> unique.
>
> John

No.  Consider that hasCode returns an int.  An int is a 32-bit value, so  
there are only 2^32 (approx. 4.3 billion) possible values.  There number  
of possible Strings is, for all practical purposes, unlimited.  So it  
would not be possible to write a hashCode implementation that generates a  
different value for each different String.

Dan.

Signature

Daniel Dyer
http//www.uncommons.org

Lew - 02 Jun 2007 16:05 GMT
> Is a hashvalue for a string using the function String.hashCode() always
> unique.

If by "unique" you mean that any distinct String has a distinct hashCode(), it
is not guaranteed.

Have you read the Javadocs for hashCode()?  The API docs are very often a
perfect first place to find such answers.

<http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode()>
> It is /not/ required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.
(emph. orig.)

You can examine the particular algorithm for String at
<http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()>

Even without Javadocs it is easy to prove that the hash codes for Strings
cannot be unique for all possible String values.  There are only 2**32
possible hashCode() results.  Any String longer than two characters has a
larger value set than that.  The sum of the value set counts for Strings of
any length is larger still.

Gotta love those Javadocs.

Signature

Lew

Robert Klemme - 02 Jun 2007 18:53 GMT
> Is a hashvalue for a string using the function String.hashCode() always
> unique.

Hash values are by definition not guaranteed to be unique.

http://en.wikipedia.org/wiki/Hash_function#Properties

Regards

    robert
Tom Hawtin - 02 Jun 2007 20:36 GMT
> Hash values are by definition not guaranteed to be unique.
>
> http://en.wikipedia.org/wiki/Hash_function#Properties

There are hash functions that are guaranteed to be unique.

http://en.wikipedia.org/wiki/Perfect_hashing

For instance, Boolean.hashCode returns a unique hash for each (both)
possible values.

Tom Hawtin
Robert Klemme - 02 Jun 2007 20:47 GMT
>> Hash values are by definition not guaranteed to be unique.
>>
[quoted text clipped - 3 lines]
>
> http://en.wikipedia.org/wiki/Perfect_hashing

Right.  Thanks for refreshing my CS knowledge!

> For instance, Boolean.hashCode returns a unique hash for each (both)
> possible values.

Which doesn't seem too difficult given the size of the domain. :-)

I believe though, that in practice usually imperfect hashing is used -
especially if you consider that Java collections that use hashing
internally cannot know from what domain values will be.

Kind regards

    robert
Tom Hawtin - 02 Jun 2007 20:35 GMT
> Is a hashvalue for a string using the function String.hashCode() always
> unique.

The String returned by Integer.toString(int) for all possible values of
the argument have unique values. However there are String values that
cannot be returned by Integer.toString(int). As hashCode returns an int,
clearly there has to be some clashes somewhere.

Tom Hawtin
Phi - 03 Jun 2007 19:36 GMT
Bruintje Beer schrieb:
> Hi,
>
> Is a hashvalue for a string using the function String.hashCode() always
> unique.
>
> John

Hi

For two references pointing to an "equal()" String, the hash value must
be equal.
But be aware, that the old JVM (1.0.2) used a different algorithm for
"hashCode()" than the new (1.6).
I have forgotten, in which version it realy changed; but if you want to
have a unique hash-value that is unique accross different
Java-platforms, copy the source of a "good" hash code (eg VM 1.6.0) into
your hash-sensitive code.

phi


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.