Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / August 2007

Tip: Looking for answers? Try searching our database.

hashcode calculation for a Collection of objects

Thread view: 
Jimmy - 07 Aug 2007 20:18 GMT
If 2 Collections contain 2 List of objects, and the type of this
object has hashCode() override (hashcode calculation using all fields
in this object), can these 2 different List instances compare by using
list1.hashCode() == list2.hashCode(); without iterate through each
list by comparing each item?

Thanks,
Jimmy
Arne Vajhøj - 07 Aug 2007 20:21 GMT
> If 2 Collections contain 2 List of objects, and the type of this
> object has hashCode() override (hashcode calculation using all fields
> in this object), can these 2 different List instances compare by using
> list1.hashCode() == list2.hashCode(); without iterate through each
> list by comparing each item?

Depends on what semantics you want.

The comparison is valid Java, but it returning true does not
guarantee identical elements.

Multiple values gets mapped to the same hash code, so
same value implies same hash code but same hash code
does not imply same value.

Arne
rossum - 07 Aug 2007 22:02 GMT
>If 2 Collections contain 2 List of objects, and the type of this
>object has hashCode() override (hashcode calculation using all fields
[quoted text clipped - 4 lines]
>Thanks,
>Jimmy
Assuming that hashCode() is implemented correctly then equal hash
codes is neccessary, but not sufficient for the two lists to be equal.

Basically you are going to have to check that two lists with equal
hash codes actually are equal.

If the hash codes are different then you can be sure that the two
lists are different.

rossum
Daniel Pitts - 08 Aug 2007 00:41 GMT
> If 2 Collections contain 2 List of objects, and the type of this
> object has hashCode() override (hashcode calculation using all fields
[quoted text clipped - 4 lines]
> Thanks,
> Jimmy

Assuming that the Collection  implementations are the same, and List
implementations are the same, you can call c1.equals(c2) (that way you
don't have to iterate over them yourself)

If that is an expensive operation, then you can use "c1.hashCode() ==
c2.hashCode() && c1.equals(c2)"

Read <http://java.sun.com/j2se/1.4.2/docs/api/java/util/
Collection.html#equals(java.lang.Object)> for more information about
equals() and Collections.
Lew - 08 Aug 2007 01:51 GMT
Jimmy wrote:
>> If 2 Collections contain 2 List of objects, and the type of this
>> object has hashCode() override (hashcode calculation using all fields
>> in this object), can these 2 different List instances compare by using
>> list1.hashCode() == list2.hashCode(); without iterate through each
>> list by comparing each item?

> Assuming that the Collection  implementations are the same, and List
> implementations are the same, you can call c1.equals(c2) (that way you
> don't have to iterate over them yourself)
>
> If that is an expensive operation, then you can use "c1.hashCode() ==
> c2.hashCode() && c1.equals(c2)"

This will more than double the time to evaluate equal lists and increase the
time for all others, compared to just using equals():

<http://java.sun.com/javase/6/docs/api/java/util/List.html#hashCode()>
> The hash code of a list is defined to be the result of the following calculation:
>
[quoted text clipped - 4 lines]
>       hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
>   }

If anything, because equals() doesn't need to perform the multiply-add
operations the simpler approach is probably faster always.

Extra points to those who know why List.hashCode() multiplies by the magic
number 31.

Signature

Lew

Patricia Shanahan - 08 Aug 2007 02:17 GMT
> Jimmy wrote:
>>> If 2 Collections contain 2 List of objects, and the type of this
[quoted text clipped - 26 lines]
> If anything, because equals() doesn't need to perform the multiply-add
> operations the simpler approach is probably faster always.

Also, equals can stop early. The AbstractList implementation stops
without examining any elements if the two references are equal or the
other is not a List. Given two distinct List references, it stops at the
first inequality or when it runs out of elements in the shorter List.

Patricia
Roedy Green - 08 Aug 2007 01:40 GMT
>If 2 Collections contain 2 List of objects, and the type of this
>object has hashCode() override (hashcode calculation using all fields
>in this object), can these 2 different List instances compare by using
>list1.hashCode() == list2.hashCode(); without iterate through each
>list by comparing each item?

One way to compute a collective hashCode is to xor the individual
hashCodes. ^ is the xor operator.  Comparing collective hashCodes
gives you are rough measure of equality.  If the hashCodes don't
match, then the collections are definitely different. If the hashCodes
DO match then the collections are likely the same, and you can then
use another method to verify.

In comparing objects your equals method might first compare hashCodes
as a way of quickly discovering inequality based on input from all
relevant fields.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Eric Sosman - 08 Aug 2007 15:30 GMT
Roedy Green wrote On 08/07/07 20:40,:

>>If 2 Collections contain 2 List of objects, and the type of this
>>object has hashCode() override (hashcode calculation using all fields
[quoted text clipped - 4 lines]
> One way to compute a collective hashCode is to xor the individual
> hashCodes. ^ is the xor operator.  [...]

   The Javadoc for List specifies how hashCode()
should be computed.  Since List is an interface there
is no language-supported way to enforce what the Javadoc
requires (nor to detect buggy implementations of it),
but the specification is there nonetheless.

   Similarly, Set specifies how hashCode() is to be
computed -- and calls for a different algorithm than
List uses.  (List's hashCode() depends on the order of
the elements as well as on their hashCodes; Set has no
concept of order.)

   So: Any two Lists (of any type) containing equal
elements in the same order will have equal hashCodes if
their hashCode() methods are implemented as called for.
Also, any two Sets (of any type) containing equal elements
should have equal hashCodes.  But equality doesn't hold
across different interfaces: A List with elements x,y,z
most likely has a different hashCode than a Set of x,y,z.

Signature

Eric.Sosman@sun.com

Olle - 08 Aug 2007 16:27 GMT
>     So: Any two Lists (of any type) containing equal
> elements in the same order will have equal hashCodes if
[quoted text clipped - 3 lines]
> across different interfaces: A List with elements x,y,z
> most likely has a different hashCode than a Set of x,y,z.

Just to clairify:

A List can never equal a Set, even if they contain the same elements
they are NOT the same (since the Set lacks the order information).

Cut from Collection's JavaDoc:

The general contract for the Object.equals method states that equals
must be symmetric (in other words, a.equals(b) if and only if
b.equals(a)). The contracts for List.equals and Set.equals state that
lists are only equal to other lists, and sets to other sets. Thus, a
custom equals method for a collection class that implements neither
the List nor Set interface must return false when this collection is
compared to any list or set. (By the same logic, it is not possible to
write a class that correctly implements both the Set and List
interfaces.)

Read more on List and Set if this is not enough proof :-)

/Olle
Daniel Pitts - 09 Aug 2007 00:33 GMT
> >     So: Any two Lists (of any type) containing equal
> > elements in the same order will have equal hashCodes if
[quoted text clipped - 24 lines]
>
> /Olle

If you really want to know if the list and set contain exactly the
same items, you can use
if (list.size() == set.size() && set.containsAll(list)) {
}
Not I specifically choose set.containsAll(), rather than
list.containsAll(), since set implementations tend to be more
efficient at this operation.
Patricia Shanahan - 09 Aug 2007 00:58 GMT
...
> If you really want to know if the list and set contain exactly the
> same items, you can use
[quoted text clipped - 3 lines]
> list.containsAll(), since set implementations tend to be more
> efficient at this operation.

Suppose the list contains "A", "A", and the set contains "A" and "B"?

Testing the other way round would be correct, because the set cannot
contain duplicates. Alternatively, create a new HashSet containing the
elements of the List and test it for equality to the set.

Patricia
kaldrenon - 09 Aug 2007 13:51 GMT
> Alternatively, create a new HashSet containing the
> elements of the List and test it for equality to the set.

Couldn't this easily break due to sets containing no duplicate?
Consider List X == ["A","A","B"] and Set Y == ["A","B"]. If you
convert X to a set it will become ["A","B"], will it not?

Unless the conditions of the algorithm guarantee uniqueness in both
Collections, converting a List to a Set sounds dangerous and could
lead to error.
Patricia Shanahan - 09 Aug 2007 15:19 GMT
>> Alternatively, create a new HashSet containing the
>> elements of the List and test it for equality to the set.
[quoted text clipped - 6 lines]
> Collections, converting a List to a Set sounds dangerous and could
> lead to error.

Remember the problem description this was intended to solve: "If you
really want to know if the list and set contain exactly the
same items..."

Your X and Y do contain exactly the same items, so the comparison should
return true.

Conversion of List to Set is an information-destroying operation, but in
this particular case, the information that gets destroyed is exactly the
superfluous information that prevents compareTo from getting the right
answer. Sometimes, it is good to destroy information.

I would agree with you if the problem were to determine if the list and
set contain the same items the same number of times each.

Patricia
kaldrenon - 09 Aug 2007 15:53 GMT
> I would agree with you if the problem were to determine if the list and
> set contain the same items the same number of times each.

This is what I had inferred the problem to be, hence my questioning of
your suggestion. But given a situation of only checking /which/ values
are in each collection, as opposed to how many of each, you are right
that moving a List into a Set is what will be needed.
Christian - 09 Aug 2007 12:35 GMT
Roedy Green schrieb:

>> If 2 Collections contain 2 List of objects, and the type of this
>> object has hashCode() override (hashcode calculation using all fields
[quoted text clipped - 12 lines]
> as a way of quickly discovering inequality based on input from all
> relevant fields.

XOR seems to be not the best idea to do this..
multiplication is probably better.. as with xor you will miss Objects if
they are even times in the list..
especially if you have lists of Boolean or other Objects with only a few
different hashcodes this would be bad..
Thomas Hawtin - 09 Aug 2007 13:35 GMT
> multiplication is probably better.. as with xor you will miss Objects if
> they are even times in the list..

You have to be very careful with multiplication. What if a number of the
hash values were even?

Tom Hawtin
Lew - 09 Aug 2007 13:57 GMT
>> multiplication is probably better.. as with xor you will miss Objects if
>> they are even times in the list..
>
> You have to be very careful with multiplication. What if a number of the
> hash values were even?

Lew wrote:
> Extra points to those who know why List.hashCode() multiplies by the magic number 31.

(Knuth explained it.  Hint: the hash code calculation uses 32-bit integer
arithmetic without exceptions for overflow.)

Signature

Lew

Roedy Green - 10 Aug 2007 03:36 GMT
>XOR seems to be not the best idea to do this..
>multiplication is probably better.. as with xor you will miss Objects if
>they are even times in the list..

the nice properties of XOR are:

1. it does not depend on order of computation.

2. it does not "waste" bits.  If you change even one bit in one of the
components, the final value will change.

3. it is quick.

4. it does not tend to collapse the range of the digest into a
narrower band.

The nasty properties of XOR are:

1. it treats identical pairs of component as if they were not there.

2. it ignores order.  A ^ B is the same a B ^ A.  for that you want
some sort of checksum/digest such as Adlerian.  See
http://mindprod.com/jgloss/digest.html
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 10 Aug 2007 04:06 GMT
>the nice properties of XOR
see http://mindprod.com/jgloss/xor.html
http://mindprod.com/jgloss/hashcode.html
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Twisted - 10 Aug 2007 15:19 GMT
On Aug 9, 10:36 pm, Roedy Green <see_webs...@mindprod.com.invalid>
wrote:
[snip excellent discussion of pros and cons of xor in hashCode()]

One place xor would be acceptable would be where there is no chance of
duplication. A Set, say, except that the Set interface specifies, and
AbstractSet enforces, that the Set hashcode be the sum; nonetheless
xor would have worked too if Sun had chosen it. An object whose fields
are of different types can xor their hashCodes to get a hashCode for
itself safely, if none of these types are related in the type
hierarchy other than common descent from Object.
Eric Sosman - 10 Aug 2007 19:09 GMT
Roedy Green wrote On 08/09/07 22:36,:

> the nice properties of XOR are:
> [...]
> 4. it does not tend to collapse the range of the digest into a
> narrower band.

   Equally, it does not expand the range.  If you
XOR a bunch of bytes, you get one of 256 possible
values, never more.

Signature

Eric.Sosman@sun.com

Roedy Green - 24 Aug 2007 06:29 GMT
>> 4. it does not tend to collapse the range of the digest into a
>> narrower band.
>
>    Equally, it does not expand the range.  If you
>XOR a bunch of bytes, you get one of 256 possible
>values, never more.

I said "it does not collapse the range", not that "it expands it".
Further, XOR depends on every single bit used to compute it.  Change a
bit and the result changes.

Various multiply, shift, modulus etc tend to collapse the range.

Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Eric Sosman - 24 Aug 2007 17:36 GMT
Roedy Green wrote On 08/24/07 01:29,:

>>>4. it does not tend to collapse the range of the digest into a
>>>narrower band.
[quoted text clipped - 4 lines]
>
> I said "it does not collapse the range", not that "it expands it".

   Yes, that's what you said.  And you're right, and I
did not contradict you.

> Further, XOR depends on every single bit used to compute it.  Change a
> bit and the result changes.
> Various multiply, shift, modulus etc tend to collapse the range.

   ... but if I compute a 32-bit value, even if two-thirds
of its bits are "collapsed" in the sense of being directly
computable from the others, my "eleven-bits-of-entropy" hash
still gets eight times better scattering than an 8-bit value
of whatever pedigree.  A computation that yields an 8-bit
result is "pre-collapsed," even if it is careful to preserve
as many degrees of freedom as it still retains.

   It reminds me a little bit of the GIF/JPEG wars, now
(happily) mostly a thing of the past.  "JPEG is lossy!"
shrieked the GIF camp -- and then the JPEG fans pointed
out that turning an image into GIF *starts* by discarding
two-thirds (typically) of the input.  True, it zealously
guards the remaining one-third in a way JPEG does not, but
the "collapse" has already occurred and cannot be undone.

Signature

Eric.Sosman@sun.com



Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.