Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / July 2006

Tip: Looking for answers? Try searching our database.

hash codes

Thread view: 
rmacnak@gmail.com - 26 Jul 2006 23:01 GMT
What propose/benefit does Object.hashCode() serve?  Why would I want to
use this?
VisionSet - 26 Jul 2006 23:14 GMT
> What propose/benefit does Object.hashCode() serve?  Why would I want to
> use this?

It is used to perform fast hashing ie equality checks of objects.
the equals() method is often slow to evaluate, hashcode should provide a
quickly evaluated value such that 2 equal objects always produce the same
hashcode. If 2 equal hashcodes do not however indicate an object is equal,
but rather if this is the case the calling code should also then call
equals().  This reduces hugely the number of times in total equals() is
called on say a large collection of objects.

--
Mike W
rmacnak@gmail.com - 27 Jul 2006 00:50 GMT
> > What propose/benefit does Object.hashCode() serve?  Why would I want to
> > use this?
[quoted text clipped - 9 lines]
> --
> Mike W

Thanks alot, but are there any other purposes besides faster equals()?
I think I heard something about checksums and hashs.
Mark Space - 27 Jul 2006 01:44 GMT
> Thanks alot, but are there any other purposes besides faster equals()?
> I think I heard something about checksums and hashs.

Besides the obvious (using hashcodes in a hashmap to lookup/search
faster), there's Bloom Filters.  Bloom Filters speed computation of a
true/false value when the computation is very expensive.

http://www.beachsoftware.com/bloom/white_paper.html
Simon - 27 Jul 2006 09:22 GMT
VisionSet schrieb:
>> What propose/benefit does Object.hashCode() serve?  Why would I want to
>> use this?
[quoted text clipped - 3 lines]
> quickly evaluated value such that 2 equal objects always produce the same
> hashcode.

Assume two objects are represented by n and m bits, respectively. Then,
obviously, comparison is in O(min(n,m)). Can you give an example where hash code
computation for both objects is in o(min(n,m))?

What you can do, of course, is cache the hash value. Then, hash value
computation often is only a lookup and then is in O(1). Also, if you want to
compare to a remote object, comparing the hash value first can save
communication cost. However, I don't see how you can save computation time.

> If 2 equal hashcodes do not however indicate an object is equal,
> but rather if this is the case the calling code should also then call
> equals().  This reduces hugely the number of times in total equals() is

I agree. So actually, if hash values are cached, comparing hashValue()s should
be the first thing to do inside the equals() method (as opposed to letting the
user do this each time before he uses the equals). Do you know why
String.equals() doen't do this?

Cheers,
Simon
Patricia Shanahan - 27 Jul 2006 14:14 GMT
> VisionSet schrieb:
...
>> If 2 equal hashcodes do not however indicate an object is equal,
>> but rather if this is the case the calling code should also then call
[quoted text clipped - 4 lines]
> user do this each time before he uses the equals). Do you know why
> String.equals() doen't do this?

The hash values are cached, and are pre-checked before calling equals,
in classes such as HashMap.

I'm not sure that calculating and caching the hash values in String
would be a gain. It would depend on the average number of comparisons
per String, and also the lengths of the matching prefixes.

HashMap needs the hash code anyway, so pre-checking is effectively free.

Patricia
Simon - 27 Jul 2006 15:06 GMT
Patricia Shanahan schrieb:
>> VisionSet schrieb:
> ...
[quoted text clipped - 11 lines]
> The hash values are cached, and are pre-checked before calling equals,
> in classes such as HashMap.

Probably I expressed myself incorrectly. I should have said "*since* hash values
are cached..." I'm not asking why String doesn't cache them, but why String
doesn't use them in its equals() method.

> I'm not sure that calculating and caching the hash values in String
> would be a gain. It would depend on the average number of comparisons
> per String, and also the lengths of the matching prefixes.

Let's forget about the prefix length for a second and assume that the strings
differ only in the last character compared. Then, comparison and hash code
computation need, I believe, almost the same computation time. So you gain
something as soon as a String is involved in more than only a few (I would guess
2) comparisons.

Now, assume the strings differ after a short prefix. Then, as you said, you can
easily lose time by computing hash values instead of starting the comparison and
terminating with false quickly. Now you can do several things:

1 - Use the hash value at least if it was computed by some other reason before
(this is free)

2 - Many Strings of length n are created in linear time anyway (e.g. by reading
them from a file or stream, creating them with a StringBuffer, user input, ...).
So you can afford computing the hash value on the fly or after it has been
created and still have linear time. The only way to create a string of length n
in sublinear time I can think of is by substring() etc. E.g. the constructors
String(char[]), String(char[],int,int), and String(String) could call
hashValue() almost for free since they need linear time anyway.

3 - In String.equals() you could compute the hash value as you go along. If the
Strings are equal, you have computed the hash code for free and you can cache
it. If you terminate earlier, you can throw away your intermediate result. You
could still improve this by completing the hash value computation if the
matching prefix has length at least n/2, say. Ok, that doesn't improve the
situation much for adversarial input, but for typical inputs it could be
competitive. And you can also randomize that, and...

What? You think this is getting clumsy...

Well. One could use at least the first option since this one is essentially free.

Cheers,
Simon
Patricia Shanahan - 27 Jul 2006 15:50 GMT
> Patricia Shanahan schrieb:
>>> VisionSet schrieb:
[quoted text clipped - 14 lines]
> are cached..." I'm not asking why String doesn't cache them, but why String
> doesn't use them in its equals() method.

You seem to be ignoring the issue of which class caches hash codes when.

String does not cache hash codes. HashMap, for example, does cache the
hash code for each key in the map, and only calculates the probe key
hash code once during a get.

>> I'm not sure that calculating and caching the hash values in String
>> would be a gain. It would depend on the average number of comparisons
[quoted text clipped - 12 lines]
> 1 - Use the hash value at least if it was computed by some other reason before
> (this is free)

Checking the hash codes costs three comparisons. The condition to
abandon is that the hashes are different and neither of them is zero,
the value used for a hash that has not yet been calculated.

If either hash code has not been calculated, or the strings being
compared are equal, or differ but have equal hash codes, or differ
somewhere in the first few characters, it is a net loss.

Note that the pre-check behavior of HashMap tends to reduce the
probability of comparisons between strings whose hash codes have already
been calculated. While it does force hash code calculation, it never
calls equals for a pair of objects with different hash codes.

It is possible that it might be a gain, but not obvious. It would
require experiments across a wide range of benchmarks.

> 2 - Many Strings of length n are created in linear time anyway (e.g. by reading
> them from a file or stream, creating them with a StringBuffer, user input, ...).
[quoted text clipped - 3 lines]
> String(char[]), String(char[],int,int), and String(String) could call
> hashValue() almost for free since they need linear time anyway.

String(char[]) etc. use System.arraycopy. Although it is also linear
time, for any reasonably long string is it much faster than character by
character copying.

> 3 - In String.equals() you could compute the hash value as you go along. If the
> Strings are equal, you have computed the hash code for free and you can cache
[quoted text clipped - 7 lines]
>
> Well. One could use at least the first option since this one is essentially free.

Nope. If it does any good at all, it costs three comparisons.

Patricia
Simon - 27 Jul 2006 16:20 GMT
> You seem to be ignoring the issue of which class caches hash codes when.
>
> String does not cache hash codes.

From the source of String.java, JDK 1.6.0

   /** Cache the hash code for the string */
   private int hash; // Default to 0

[...]

   public int hashCode() {
    int h = hash;
    if (h == 0) {
       int off = offset;
       char val[] = value;
       int len = count;

           for (int i = 0; i < len; i++) {
               h = 31*h + val[off++];
           }
           hash = h;
       }
       return h;
   }

> It is possible that it might be a gain, but not obvious. It would
> require experiments across a wide range of benchmarks.

ACK.

>> Well. One could use at least the first option since this one is
>> essentially free.
>
> Nope. If it does any good at all, it costs three comparisons.

You wrote in an earlier post:

> HashMap needs the hash code anyway, so pre-checking is effectively free.

I'm just saying: If String.equals() does have the hash code anyway (by chance),
pre-checking is effectively free. Three comparisons aren't all that much are
they? You need 2 comparisons in any iteration of the comparison loop.

Cheers,
Simon
Patricia Shanahan - 27 Jul 2006 19:12 GMT
>> You seem to be ignoring the issue of which class caches hash codes when.
>>
[quoted text clipped - 6 lines]
>
> [...]

Sorry about that, I did check and found it does cache when hashCode is
called, but forgot I had already typed that line.

>> It is possible that it might be a gain, but not obvious. It would
>> require experiments across a wide range of benchmarks.
[quoted text clipped - 12 lines]
> pre-checking is effectively free. Three comparisons aren't all that much are
> they? You need 2 comparisons in any iteration of the comparison loop.

There are two important differences between HashMap and String on this:

1. HashMap must calculate the hash code in order to find the right
bucket, so it never has to test whether the hash code has been
calculated. It only has to do one compare, not three, to know that two
strings have different hash codes. String may nor may not already have
one or both hash codes, so it has to test whether they exist.

2. HashMap is going to have to incur the cost of a method call, at an
absolute minimum, if it does call equals. String is already sitting in
its own equals method, all ready to start comparing characters.

A lot depends on both the probability of an equals call for two String
instances that have different hash codes and both have been calculated,
and the average number of iterations of the comparison loop, given the
strings are different.

Patricia
Patricia Shanahan - 27 Jul 2006 21:28 GMT
...
> I'm just saying: If String.equals() does have the hash code anyway (by chance),
> pre-checking is effectively free. Three comparisons aren't all that much are
> they? You need 2 comparisons in any iteration of the comparison loop.
...

I should perhaps explain why I think comparisons can be a big deal. The
problem is not the comparison itself, which can be just as fast as an
integer add, but the conditional branch that follows.

One of the nastiest, most disruptive things you can do to a processor is
force it to empty its pipeline and begin again.

The processor has to decide what to load next in the pipeline
immediately after loading the branch, many cycles before it executes and
the processor finds out if it is taken or not taken. The processor has
to guess, and if it guesses wrong all the code in the pipeline following
the branch is junk, and must be thrown away. It will have to wait for
the correct instructions to work their way through before it can do any
more useful work.

To reduce the frequency, processor designs incorporate various attempts
to predict the branch direction for frequently executed branches. Branch
prediction depends on the same sort of issues as caching - essentially,
it is an attempt to predict behavior based on recent history.

However, I don't think the branches involved in testing for hash codes
unequal but both non-zero are going to be particularly predictable, so
there is a good chance that at least one of them will be mispredicted.

Again, this is something that can only really be evaluated by
experiment, but the stakes are far higher than adding a few integer
instructions.

Patricia
Thomas Fritsch - 27 Jul 2006 16:27 GMT
> String does not cache hash codes.
I have looked into Sun's "String.java" source (Java1.4.2),
and as I see it, String does cache its hash code:

<quote>
[...]
    /** Cache the hash code for the string */
    private int hash = 0;
[...]
    public int hashCode() {
    int h = hash;
    if (h == 0) {
       int off = offset;
       char val[] = value;
       int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }
[...]
</quote>

> HashMap, for example, does cache the
> hash code for each key in the map, and only calculates the probe key
> hash code once during a get.
Agreed.

Signature

Thomas

cp - 26 Jul 2006 23:30 GMT
The purpose of computing a hash code is that the hash should be quicker to
calculate and compare than computing full object equality. eg.
someObject.equals(otherObejct);
Chris Uppal - 27 Jul 2006 10:06 GMT
> The purpose of computing a hash code is that the hash should be quicker to
> calculate and compare than computing full object equality. eg.
> someObject.equals(otherObejct);

Not necessarily true, nor is it a requirement.  For the hash to be
performance-effective, it only needs to be fast enough that the time saved by
computing it /once/ is greater than the time saved by not having to do /many/
equality comparisons.

   -- chris
Lasse Reichstein Nielsen - 27 Jul 2006 22:41 GMT
> What propose/benefit does Object.hashCode() serve?  Why would I want to
> use this?

On top of the other benefits mentioned, it is also important that
hashCode generates a numeric value from an object. This allows it to
be used in numeric calculations like picking a bucket in HashMap.

You could make a "hash value" from objects that allowed quick
comparisons and was always equal if the objects were equal, but that
was not a number, but that would not sufficient for implementing a
HashMap.

/L
Signature

Lasse Reichstein Nielsen  -  lrn@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
 'Faith without judgement merely degrades the spirit divine.'



Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.