Java Forum / General / June 2005
equals and hashCode
Tom Dyess - 19 Jun 2005 08:49 GMT This is gonna be a tough question to put into words, but here goes:
I understand the purpose of overriding equals() in comparing the data of an object verses the reference, but the requirement to override hashCode() is a little less clear.
Consider this idiom from this reference: http://www.geocities.com/technofundo/tech/java/equalhash.html
"Equal objects must produce the same hash code as long as they are equal, however unequal objects need not produce distinct hash codes."
The simplest way to create an equals and hashCode implementation for an object would be to make a distinct hash code based on the value of the data in the object, then in equals(), simply compare the value of hash codes. Why would I not want to make a distinct hashCode based on the data - i.e. "however unequal objects need not produce distinct hash codes"
Thanks for any information. I'm researching this because we are using Hibernate in our project and want to understand how it can apply.
 Signature Tom Dyess OraclePower.com
Sebastian Scheid - 19 Jun 2005 09:52 GMT > This is gonna be a tough question to put into words, but here goes: > [quoted text clipped - 12 lines] > data in the object, then in equals(), simply compare the value of hash > codes. That may not be as simple as it seems to you. Consider two classes A and B which do not have any semantical interrelation but have similar fields:
class A { int num1; // significant for equals String name; // NOT significant
public boolean equals(Object obj) { if(this == obj) return true; if((obj == null) || (obj.getClass() != this.getClass())) return false;
A a = (A) obj; return num1 == a.num1; }
public int hashCode() { int hash = 7; hash = 31 * hash + num1; return hash; }
}
class B { int num2; // significant for equals Date date; // NOT significant
public boolean equals(Object obj) { if(this == obj) return true; if((obj == null) || (obj.getClass() != this.getClass())) return false;
B b = (B) obj; return num2 == b.num2; }
public int hashCode() { int hash = 7; hash = 31 * hash + num2; return hash; }
}
If instances of A and B have the same value in their num1/num2 field, their hashcode is the same but they are not equal. How do you want to make sure the hashCodes are distinct in this case? And this is a pretty obvious case so you could raise the plea that changing one of the hashCode calculations could solve the problem. But some other hashCode method could return the same value with a complete different computation.
And there is an even simpler example:
class C { int num1; int num2;
equals...
public int hashCode() { int hash = 7; hash = 31 * hash + num1; hash = 31 * hash + num2; return hash; } }
If num1 and num2 are significant fields, that you involve in the calculation of hashCode you cannot distinguish between two instances c1 (num1 = 3, num2 = 4) and c2 (num1 = 4, num2 = 3) with the given hashCode method. There might be a possibility, e.g.
public int hashCode() { int hash = 7; hash = 31 * hash + num1; hash = 32 * hash + num2; return hash; }
but I am not sure if this implementation does really calculate distinct codes. Proofing this could be difficult especially if many fields are involved in the calculation.
> Why would I not want to make a distinct hashCode based on the data - i.e. > "however unequal objects need not produce distinct hash codes" You cannot guaranty distinct hashCodes over several classes. While it is expensive to do that for your own classes it is impossible if you add classes that are not under your control.
Regards Sebastian
Lasse Reichstein Nielsen - 19 Jun 2005 13:34 GMT > class C { > int num1; [quoted text clipped - 13 lines] > of hashCode you cannot distinguish between two instances c1 (num1 = 3, num2 > = 4) and c2 (num1 = 4, num2 = 3) with the given hashCode method. Pedantry: the two examples does give different hash values (((7*31)+3)*31+4) and (((7*31)+4)*31+3) respectively. However the point is valid for other values, e.g., c1(num1 = 3, num2 = 4) c2(num1 = 2, num2 = 35)
> You cannot guaranty distinct hashCodes over several classes. You cannot guarantee distinct hashCodes for even one class, since you can have more than 2^32 different instances of it (given sufficient RAM :) and hashCode() only returns an int.
/L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
Boudewijn Dijkstra - 19 Jun 2005 13:49 GMT > [...] > [quoted text clipped - 15 lines] > of hashCode you cannot distinguish between two instances c1 (num1 = 3, num2 > = 4) and c2 (num1 = 4, num2 = 3) with the given hashCode method. These two particular instances happen to produce different hash codes: 6824 and 6854, respectively. In fact, there seems to be no combination of c1 and c2 with c1.num1==c2.num2&&c1.num2==c2.num1 where they produce the same hash code. But they do for arbitrary combinations, e.g. c1(3,66) and c2(5,4) both produce 6886.
> There might be a possibility, e.g. > [quoted text clipped - 7 lines] > but I am not sure if this implementation does really calculate distinct > codes. No, it doesn't. In fact, if the number of valid combinations between num1 and num2 exceeds 2^31-1, then there is no way of guarantueeing a distinct hash code between distinct values.
> Proofing this could be difficult especially if many fields are involved in > the calculation. Not difficult. Just a matter of feeding the right equation into your integer equation solver.
Patricia Shanahan - 20 Jun 2005 20:45 GMT ...
> No, it doesn't. In fact, if the number of valid combinations between num1 and > num2 exceeds 2^31-1, then there is no way of guarantueeing a distinct hash > code between distinct values. ...
Shouldn't that be 2^32, the number of distinct values of an int? A hash code does not have to be positive.
Patricia
Boudewijn Dijkstra - 22 Jun 2005 21:50 GMT > ... >> No, it doesn't. In fact, if the number of valid combinations between num1 [quoted text clipped - 4 lines] > Shouldn't that be 2^32, the number of distinct values of an int? A hash > code does not have to be positive. My mistake. Damn those pesky signed numers!
Lasse Reichstein Nielsen - 20 Jun 2005 20:49 GMT > In fact, there seems to be no combination of c1 and c2 with > c1.num1==c2.num2&&c1.num2==c2.num1 where they produce the same hash > code. c1.num1 = c1.num2 = c2.num1 = c2.num2 = 0;
(or any other where they are equal, but that's really cheating :)
For it to happen, you need two numbers, a and b, such that a * 31 + b = b * 31 + a (mod 2^32) Ignoring overflow, it would require a * 30 == b * 30, i.e., a==b.
With overflow, we can also succeede with: a = Integer.MIN_VALUE; b = 0;
:) /L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
googmeister@gmail.com - 19 Jun 2005 13:23 GMT > Why would I not want to make a distinct hashCode > based on the data - i.e. "however unequal objects > need not produce distinct hash codes" It might not always be possible since there are only 2^32 possible hashCode values. So you'd be stuck if you tried to do this for String values, since there are many many more than 2^32 possible strings values.
Leon - 19 Jun 2005 18:07 GMT > This is gonna be a tough question to put into words, but here goes: > [quoted text clipped - 13 lines] > not want to make a distinct hashCode based on the data - i.e. "however unequal > objects need not produce distinct hash codes" I disagree. The simplest way is to give the -same- value for every instance.
public int hashCode() { return 0; }
Greetings, Leon.
Wibble - 19 Jun 2005 18:11 GMT > This is gonna be a tough question to put into words, but here goes: > [quoted text clipped - 16 lines] > Thanks for any information. I'm researching this because we are using > Hibernate in our project and want to understand how it can apply. Even when the semantic space is small enough to fit into 32 bits, computing a perfect hash is often expensive. If its cheap for your class, then by all means compute it and compare the hash and type for equalilty. Domains that would meet this criteria would be byte[4] arrays or java.lang.Integer.
Tom Dyess - 19 Jun 2005 20:46 GMT >> This is gonna be a tough question to put into words, but here goes: >> [quoted text clipped - 23 lines] > Domains that would meet this criteria would be byte[4] arrays or > java.lang.Integer. Thanks for all the input. I can see where equals and compareTo would come in handy, but what is a common use for a hashCode? For hash collections? Is it implemented because of the lack of a common global context in clustering scenarios (ie Tomcat)?
 Signature Tom Dyess OraclePower.com
Harald - 19 Jun 2005 21:56 GMT > Thanks for all the input. I can see where equals and compareTo would come in > handy, but what is a common use for a hashCode? For hash collections? Is it Interesting. I seem to be unable write 10 lines of Java without using a HashMap, which obviously requires that the keys have a hash code.-)
Harald.
 Signature ---------------------+--------------------------------------------- Harald Kirsch (@home)| Java Text Crunching: http://www.ebi.ac.uk/Rebholz-srv/whatizit/software
Tom Dyess - 19 Jun 2005 23:12 GMT >> Thanks for all the input. I can see where equals and compareTo would come >> in [quoted text clipped - 5 lines] > > Harald. All of my HashMap usage comes from persistant data from Oralce. I use the primary key as the hash when performing a put() so I can reference it quickly by the key. I also use it for key/value tables.
params.put(rs.getString("par_name"), rs.getString("par_value"));
Does the first arguement override the obejct's internal hashCode implementation?
 Signature Tom Dyess OraclePower.com
Wibble - 20 Jun 2005 01:34 GMT >>>Thanks for all the input. I can see where equals and compareTo would come >>>in [quoted text clipped - 14 lines] > Does the first arguement override the obejct's internal hashCode > implementation? Think harder Tom, and read carefully.
Tom Dyess - 20 Jun 2005 03:13 GMT >>>>Thanks for all the input. I can see where equals and compareTo would >>>>come in [quoted text clipped - 16 lines] >> > Think harder Tom, and read carefully. Umm, lemme think... your a jackass! What do I win?
 Signature Tom Dyess OraclePower.com
Wibble - 20 Jun 2005 12:12 GMT >>>>>Thanks for all the input. I can see where equals and compareTo would >>>>>come in [quoted text clipped - 18 lines] > > Umm, lemme think... your a jackass! What do I win? Your still ignorant and I have a clue, so I guess you lose.
Tom Dyess - 20 Jun 2005 22:08 GMT >>>>>>Thanks for all the input. I can see where equals and compareTo would >>>>>>come in [quoted text clipped - 22 lines] >> > Your still ignorant and I have a clue, so I guess you lose. In my 20 years and 5 languages of programming, I've noticed that the condescending ones are always making up for what's lacking. Answer me this, smart guy, why go through the effort of building a never-perfect hash algorithm to distinctly identify each object that is persistant when I could use a guaranteed unique primary key from my database like a sequence?
 Signature Tom Dyess OraclePower.com
Lasse Reichstein Nielsen - 20 Jun 2005 22:17 GMT > why go through the effort of building a never-perfect hash algorithm > to distinctly identify each object that is persistant when I could > use a guaranteed unique primary key from my database like a > sequence? Because that's not necessarily a hash code. Remember the single requirement of the hashcode method: if a.equals(b) then a.hashCode()==b.hashCode().
If your objects has an equals method that can equate two objects with different unique id's, then the id can't be used as a hash code.
If you just inherit equals() from Object, then there is no reason to make an intelligent hashCode either. Just use Math.random the first time hashCode is called, and store if for later calls.
/L
 Signature Lasse Reichstein Nielsen - lrn@hotpop.com DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html> 'Faith without judgement merely degrades the spirit divine.'
Tom Dyess - 20 Jun 2005 23:32 GMT >> why go through the effort of building a never-perfect hash algorithm >> to distinctly identify each object that is persistant when I could [quoted text clipped - 13 lines] > > /L Lasse,
Thanks for the information. Sounds similar to a COM GUID except can have logic involved. Pretty neat.
Tom
--- Tom Dyess OraclePower.com
Virgil Green - 21 Jun 2005 19:35 GMT >>>>> Thanks for all the input. I can see where equals and compareTo >>>>> would come in [quoted text clipped - 25 lines] > > Umm, lemme think... your a *******! What do I win? That's spelled "you're"
 Signature Virgil
Tom Dyess - 22 Jun 2005 01:38 GMT >>>>>> Thanks for all the input. I can see where equals and compareTo >>>>>> would come in [quoted text clipped - 27 lines] > > That's spelled "you're" Lol. Thanks for noticing, Vergil.
 Signature Tom Dyess OraclePower.com
Tom Dyess - 22 Jun 2005 01:41 GMT > Lol. Thanks for noticing, Vergil. Crap, I spelled that wrong as well, Virgil. Is it tomorrow yet?
--- Tom Dyess OraclePower.com
Virgil Green - 23 Jun 2005 18:29 GMT >> Lol. Thanks for noticing, Vergil. >> [quoted text clipped - 7 lines] > Tom Dyess > OraclePower.com Previous reply sent too soon.
 Signature Virgil
Virgil Green - 23 Jun 2005 18:29 GMT >>>>>>> Thanks for all the input. I can see where equals and compareTo >>>>>>> would come in [quoted text clipped - 33 lines] > > Lol. Thanks for noticing, Vergil. That's spelled "Virgil"
 Signature Virgil
Leon - 19 Jun 2005 22:25 GMT > Thanks for all the input. I can see where equals and compareTo would come in > handy, but what is a common use for a hashCode? For hash collections? Is it > implemented because of the lack of a common global context in clustering > scenarios (ie Tomcat)? With off course an efficient and effective hashcode method, using a HashSet would greatly improve the speed of handling of collections. Here's an example:
/** * answer whether c1 includes all elements of c2 * that is: for each element of c2 there is an equal object in c1 */ public boolean includesAll(Collection c1, Collection c2) {
HashSet hashSet = new HashSet(c1); return hashSet.containsAll(c2);
}
Greetings, Leon.
Andrew McDonagh - 22 Jun 2005 00:45 GMT > This is gonna be a tough question to put into words, but here goes: > [quoted text clipped - 7 lines] > "Equal objects must produce the same hash code as long as they are equal, > however unequal objects need not produce distinct hash codes." As far as the contract for hashCode() goes, the value it returns does not have to be unique. In fact we can not and must not rely upon the hashCode being unique. hashCode is not an identifier of the object
Its worth knowing that its common for two instances of unrelated classes to produce the same hashCode value.
The purpose of hashCode is for hashing efficiency used by HashSet, HashTable, HashMap, etc.
In general, the hashCode is used to determine which bucket the object is placed in, so that fast retrival can occur because we don't have to compare every element within the collection, just every element within the bucket.
here's the HashMap.containsKey() method showing the retrival and comparisons of each entry with the bucket.
*/ public boolean containsKey(Object key) { Object k = maskNull(key); int hash = hash(k); int i = indexFor(hash, table.length); Entry e = table[i]; while (e != null) { if (e.hash == hash && eq(k, e.key)) return true; e = e.next; } return false; }
The advice wrt to overriding equals() is that you should also override hashCode() to ensure the efficiencies of the hashing algorithms.
> The simplest way to create an equals and hashCode implementation for an > object would be to make a distinct hash code based on the value of the data [quoted text clipped - 4 lines] > Thanks for any information. I'm researching this because we are using > Hibernate in our project and want to understand how it can apply. This might work, but is not a good general practise, consider...
Class Person {
private String forename; private String surname;
public Person(String theirName, String theirSurname) { name = theirName; surname = theirSurname; }
public int hashCode() { return foreName.hashCode() + surname.hashCode(); }
public boolean equals(Object o) { Person other = (Person) o; return other.hashCode() = this.hashCode(); }
}
if the created two instances with different values, but reversed..
Person p1 = new Person("Andrew", "John"); Person p2 = new Person("John", "Andrew");
its possible that the hashCode value for both objects would be the same and so if we used the hashCode as an equality comparitor, a false positive would result.
Always, start with creating the equals method, then make the hashCode work.
Chris Uppal - 22 Jun 2005 10:28 GMT > Always, start with creating the equals method, then make the hashCode > work. Agreed, but create an empty hashCode() method when you create the equals() method, or you're likely to forget...
BTW, when you do get round to setting up the hash code remember that you don't have to include every little detail that you'll be comparing in the equals() method. If (this is a slightly strained example, I admit) you had a person with account number, first name, and last name, then it would probably be a waste of time to include the two names in the hash even if the same account number were allowed to be used by more than one person.
-- chris
Tom Dyess - 22 Jun 2005 13:51 GMT > Agreed, but create an empty hashCode() method when you create the equals() > method, or you're likely to forget... Yes, I employ this technique for languages that require manual freeing of objects as well as opening/closing recordsets. Works every time and sure beats searching the code looking for memory leaks cause by constructed objects that weren't freed.
 Signature Tom Dyess OraclePower.com
Andrew McDonagh - 22 Jun 2005 18:25 GMT >>Always, start with creating the equals method, then make the hashCode >>work. > > Agreed, but create an empty hashCode() method when you create the equals() > method, or you're likely to forget... Agreed, I've had a todo on my list to change Eclipse's code generator to do just that when overriding equals().
> BTW, when you do get round to setting up the hash code remember that you don't > have to include every little detail that you'll be comparing in the equals() [quoted text clipped - 4 lines] > > -- chris Also good advice for the op - its easy for us to forget these little snippets eh ;-)
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|