Java Forum / General / August 2007
hashcode calculation for a Collection of objects
Jimmy - 07 Aug 2007 20:18 GMT If 2 Collections contain 2 List of objects, and the type of this object has hashCode() override (hashcode calculation using all fields in this object), can these 2 different List instances compare by using list1.hashCode() == list2.hashCode(); without iterate through each list by comparing each item?
Thanks, Jimmy
Arne Vajhøj - 07 Aug 2007 20:21 GMT > If 2 Collections contain 2 List of objects, and the type of this > object has hashCode() override (hashcode calculation using all fields > in this object), can these 2 different List instances compare by using > list1.hashCode() == list2.hashCode(); without iterate through each > list by comparing each item? Depends on what semantics you want.
The comparison is valid Java, but it returning true does not guarantee identical elements.
Multiple values gets mapped to the same hash code, so same value implies same hash code but same hash code does not imply same value.
Arne
rossum - 07 Aug 2007 22:02 GMT >If 2 Collections contain 2 List of objects, and the type of this >object has hashCode() override (hashcode calculation using all fields [quoted text clipped - 4 lines] >Thanks, >Jimmy Assuming that hashCode() is implemented correctly then equal hash codes is neccessary, but not sufficient for the two lists to be equal.
Basically you are going to have to check that two lists with equal hash codes actually are equal.
If the hash codes are different then you can be sure that the two lists are different.
rossum
Daniel Pitts - 08 Aug 2007 00:41 GMT > If 2 Collections contain 2 List of objects, and the type of this > object has hashCode() override (hashcode calculation using all fields [quoted text clipped - 4 lines] > Thanks, > Jimmy Assuming that the Collection implementations are the same, and List implementations are the same, you can call c1.equals(c2) (that way you don't have to iterate over them yourself)
If that is an expensive operation, then you can use "c1.hashCode() == c2.hashCode() && c1.equals(c2)"
Read <http://java.sun.com/j2se/1.4.2/docs/api/java/util/ Collection.html#equals(java.lang.Object)> for more information about equals() and Collections.
Lew - 08 Aug 2007 01:51 GMT Jimmy wrote:
>> If 2 Collections contain 2 List of objects, and the type of this >> object has hashCode() override (hashcode calculation using all fields >> in this object), can these 2 different List instances compare by using >> list1.hashCode() == list2.hashCode(); without iterate through each >> list by comparing each item?
> Assuming that the Collection implementations are the same, and List > implementations are the same, you can call c1.equals(c2) (that way you > don't have to iterate over them yourself) > > If that is an expensive operation, then you can use "c1.hashCode() == > c2.hashCode() && c1.equals(c2)" This will more than double the time to evaluate equal lists and increase the time for all others, compared to just using equals():
<http://java.sun.com/javase/6/docs/api/java/util/List.html#hashCode()>
> The hash code of a list is defined to be the result of the following calculation: > [quoted text clipped - 4 lines] > hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode()); > } If anything, because equals() doesn't need to perform the multiply-add operations the simpler approach is probably faster always.
Extra points to those who know why List.hashCode() multiplies by the magic number 31.
 Signature Lew
Patricia Shanahan - 08 Aug 2007 02:17 GMT > Jimmy wrote: >>> If 2 Collections contain 2 List of objects, and the type of this [quoted text clipped - 26 lines] > If anything, because equals() doesn't need to perform the multiply-add > operations the simpler approach is probably faster always. Also, equals can stop early. The AbstractList implementation stops without examining any elements if the two references are equal or the other is not a List. Given two distinct List references, it stops at the first inequality or when it runs out of elements in the shorter List.
Patricia
Roedy Green - 08 Aug 2007 01:40 GMT >If 2 Collections contain 2 List of objects, and the type of this >object has hashCode() override (hashcode calculation using all fields >in this object), can these 2 different List instances compare by using >list1.hashCode() == list2.hashCode(); without iterate through each >list by comparing each item? One way to compute a collective hashCode is to xor the individual hashCodes. ^ is the xor operator. Comparing collective hashCodes gives you are rough measure of equality. If the hashCodes don't match, then the collections are definitely different. If the hashCodes DO match then the collections are likely the same, and you can then use another method to verify.
In comparing objects your equals method might first compare hashCodes as a way of quickly discovering inequality based on input from all relevant fields.
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Eric Sosman - 08 Aug 2007 15:30 GMT Roedy Green wrote On 08/07/07 20:40,:
>>If 2 Collections contain 2 List of objects, and the type of this >>object has hashCode() override (hashcode calculation using all fields [quoted text clipped - 4 lines] > One way to compute a collective hashCode is to xor the individual > hashCodes. ^ is the xor operator. [...] The Javadoc for List specifies how hashCode() should be computed. Since List is an interface there is no language-supported way to enforce what the Javadoc requires (nor to detect buggy implementations of it), but the specification is there nonetheless.
Similarly, Set specifies how hashCode() is to be computed -- and calls for a different algorithm than List uses. (List's hashCode() depends on the order of the elements as well as on their hashCodes; Set has no concept of order.)
So: Any two Lists (of any type) containing equal elements in the same order will have equal hashCodes if their hashCode() methods are implemented as called for. Also, any two Sets (of any type) containing equal elements should have equal hashCodes. But equality doesn't hold across different interfaces: A List with elements x,y,z most likely has a different hashCode than a Set of x,y,z.
 Signature Eric.Sosman@sun.com
Olle - 08 Aug 2007 16:27 GMT > So: Any two Lists (of any type) containing equal > elements in the same order will have equal hashCodes if [quoted text clipped - 3 lines] > across different interfaces: A List with elements x,y,z > most likely has a different hashCode than a Set of x,y,z. Just to clairify:
A List can never equal a Set, even if they contain the same elements they are NOT the same (since the Set lacks the order information).
Cut from Collection's JavaDoc:
The general contract for the Object.equals method states that equals must be symmetric (in other words, a.equals(b) if and only if b.equals(a)). The contracts for List.equals and Set.equals state that lists are only equal to other lists, and sets to other sets. Thus, a custom equals method for a collection class that implements neither the List nor Set interface must return false when this collection is compared to any list or set. (By the same logic, it is not possible to write a class that correctly implements both the Set and List interfaces.)
Read more on List and Set if this is not enough proof :-)
/Olle
Daniel Pitts - 09 Aug 2007 00:33 GMT > > So: Any two Lists (of any type) containing equal > > elements in the same order will have equal hashCodes if [quoted text clipped - 24 lines] > > /Olle If you really want to know if the list and set contain exactly the same items, you can use if (list.size() == set.size() && set.containsAll(list)) { } Not I specifically choose set.containsAll(), rather than list.containsAll(), since set implementations tend to be more efficient at this operation.
Patricia Shanahan - 09 Aug 2007 00:58 GMT ...
> If you really want to know if the list and set contain exactly the > same items, you can use [quoted text clipped - 3 lines] > list.containsAll(), since set implementations tend to be more > efficient at this operation. Suppose the list contains "A", "A", and the set contains "A" and "B"?
Testing the other way round would be correct, because the set cannot contain duplicates. Alternatively, create a new HashSet containing the elements of the List and test it for equality to the set.
Patricia
kaldrenon - 09 Aug 2007 13:51 GMT > Alternatively, create a new HashSet containing the > elements of the List and test it for equality to the set. Couldn't this easily break due to sets containing no duplicate? Consider List X == ["A","A","B"] and Set Y == ["A","B"]. If you convert X to a set it will become ["A","B"], will it not?
Unless the conditions of the algorithm guarantee uniqueness in both Collections, converting a List to a Set sounds dangerous and could lead to error.
Patricia Shanahan - 09 Aug 2007 15:19 GMT >> Alternatively, create a new HashSet containing the >> elements of the List and test it for equality to the set. [quoted text clipped - 6 lines] > Collections, converting a List to a Set sounds dangerous and could > lead to error. Remember the problem description this was intended to solve: "If you really want to know if the list and set contain exactly the same items..."
Your X and Y do contain exactly the same items, so the comparison should return true.
Conversion of List to Set is an information-destroying operation, but in this particular case, the information that gets destroyed is exactly the superfluous information that prevents compareTo from getting the right answer. Sometimes, it is good to destroy information.
I would agree with you if the problem were to determine if the list and set contain the same items the same number of times each.
Patricia
kaldrenon - 09 Aug 2007 15:53 GMT > I would agree with you if the problem were to determine if the list and > set contain the same items the same number of times each. This is what I had inferred the problem to be, hence my questioning of your suggestion. But given a situation of only checking /which/ values are in each collection, as opposed to how many of each, you are right that moving a List into a Set is what will be needed.
Christian - 09 Aug 2007 12:35 GMT Roedy Green schrieb:
>> If 2 Collections contain 2 List of objects, and the type of this >> object has hashCode() override (hashcode calculation using all fields [quoted text clipped - 12 lines] > as a way of quickly discovering inequality based on input from all > relevant fields. XOR seems to be not the best idea to do this.. multiplication is probably better.. as with xor you will miss Objects if they are even times in the list.. especially if you have lists of Boolean or other Objects with only a few different hashcodes this would be bad..
Thomas Hawtin - 09 Aug 2007 13:35 GMT > multiplication is probably better.. as with xor you will miss Objects if > they are even times in the list.. You have to be very careful with multiplication. What if a number of the hash values were even?
Tom Hawtin
Lew - 09 Aug 2007 13:57 GMT >> multiplication is probably better.. as with xor you will miss Objects if >> they are even times in the list.. > > You have to be very careful with multiplication. What if a number of the > hash values were even? Lew wrote:
> Extra points to those who know why List.hashCode() multiplies by the magic number 31. (Knuth explained it. Hint: the hash code calculation uses 32-bit integer arithmetic without exceptions for overflow.)
 Signature Lew
Roedy Green - 10 Aug 2007 03:36 GMT >XOR seems to be not the best idea to do this.. >multiplication is probably better.. as with xor you will miss Objects if >they are even times in the list.. the nice properties of XOR are:
1. it does not depend on order of computation.
2. it does not "waste" bits. If you change even one bit in one of the components, the final value will change.
3. it is quick.
4. it does not tend to collapse the range of the digest into a narrower band.
The nasty properties of XOR are:
1. it treats identical pairs of component as if they were not there.
2. it ignores order. A ^ B is the same a B ^ A. for that you want some sort of checksum/digest such as Adlerian. See http://mindprod.com/jgloss/digest.html
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Roedy Green - 10 Aug 2007 04:06 GMT >the nice properties of XOR see http://mindprod.com/jgloss/xor.html http://mindprod.com/jgloss/hashcode.html
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Twisted - 10 Aug 2007 15:19 GMT On Aug 9, 10:36 pm, Roedy Green <see_webs...@mindprod.com.invalid> wrote: [snip excellent discussion of pros and cons of xor in hashCode()]
One place xor would be acceptable would be where there is no chance of duplication. A Set, say, except that the Set interface specifies, and AbstractSet enforces, that the Set hashcode be the sum; nonetheless xor would have worked too if Sun had chosen it. An object whose fields are of different types can xor their hashCodes to get a hashCode for itself safely, if none of these types are related in the type hierarchy other than common descent from Object.
Eric Sosman - 10 Aug 2007 19:09 GMT Roedy Green wrote On 08/09/07 22:36,:
> the nice properties of XOR are: > [...] > 4. it does not tend to collapse the range of the digest into a > narrower band. Equally, it does not expand the range. If you XOR a bunch of bytes, you get one of 256 possible values, never more.
 Signature Eric.Sosman@sun.com
Roedy Green - 24 Aug 2007 06:29 GMT >> 4. it does not tend to collapse the range of the digest into a >> narrower band. > > Equally, it does not expand the range. If you >XOR a bunch of bytes, you get one of 256 possible >values, never more. I said "it does not collapse the range", not that "it expands it". Further, XOR depends on every single bit used to compute it. Change a bit and the result changes.
Various multiply, shift, modulus etc tend to collapse the range.
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Eric Sosman - 24 Aug 2007 17:36 GMT Roedy Green wrote On 08/24/07 01:29,:
>>>4. it does not tend to collapse the range of the digest into a >>>narrower band. [quoted text clipped - 4 lines] > > I said "it does not collapse the range", not that "it expands it". Yes, that's what you said. And you're right, and I did not contradict you.
> Further, XOR depends on every single bit used to compute it. Change a > bit and the result changes. > Various multiply, shift, modulus etc tend to collapse the range. ... but if I compute a 32-bit value, even if two-thirds of its bits are "collapsed" in the sense of being directly computable from the others, my "eleven-bits-of-entropy" hash still gets eight times better scattering than an 8-bit value of whatever pedigree. A computation that yields an 8-bit result is "pre-collapsed," even if it is careful to preserve as many degrees of freedom as it still retains.
It reminds me a little bit of the GIF/JPEG wars, now (happily) mostly a thing of the past. "JPEG is lossy!" shrieked the GIF camp -- and then the JPEG fans pointed out that turning an image into GIF *starts* by discarding two-thirds (typically) of the input. True, it zealously guards the remaining one-third in a way JPEG does not, but the "collapse" has already occurred and cannot be undone.
 Signature Eric.Sosman@sun.com
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|