> > The most common implementation of Set is HashSet. Why did they create this
> > set backed by a Map, when they could have extended ArrayList and just
> > overridden the add(Object obj) method ?
>
> Because it makes for an interesting question to ask on homework problems?
> It might do. However I'm curious because the HashSet source code uses a Map
> and puts in dummy Object instances as the values. The only thing I can think
> of is that it is quicker to check if a Hash contains a specified object than
> an array backed List.
That's exactly it. The Collections API is very concerned with the order
of complexity assigned to various operations. Using hashing as the
store for a Set makes it much faster to check if a specific value is
already there (amortized constant time, in fact), which is a common
operation for a Set.
What advantage do you see for using a List?

Signature
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
Robert Olofsson - 18 Oct 2003 10:08 GMT
: That's exactly it. The Collections API is very concerned with the order
: of complexity assigned to various operations. Using hashing as the
: store for a Set makes it much faster to check...
: What advantage do you see for using a List?
Less memory use. A hash has an array where it stores Entries,
Each entry has at least two references. A list only has the array
with references to the values.
If memory is the problem it is easy to write a ListSet that uses a
list as a backend. I would think that speed is preferd for nearly all
applications though.
/robo
Chris Smith - 18 Oct 2003 14:26 GMT
> : That's exactly it. The Collections API is very concerned with the order
> : of complexity assigned to various operations. Using hashing as the
[quoted text clipped - 9 lines]
> list as a backend. I would think that speed is preferd for nearly all
> applications though.
I suppose, but it's important to remember that we're talking about
substantially different natures of advantage. The time advantage is a
difference of complexity. The space advantage is a difference of a
constant factor. Hence, the space advantage *might* occasionally matter
for programs that are right on the border of being fast enough. The
time advantage makes a difference in the very nature of problems for
which the structure is appropriate.

Signature
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
Mark Thornton - 18 Oct 2003 15:12 GMT
>>: That's exactly it. The Collections API is very concerned with the order
>>: of complexity assigned to various operations. Using hashing as the
[quoted text clipped - 17 lines]
> time advantage makes a difference in the very nature of problems for
> which the structure is appropriate.
A List is likely to be faster if the expected number of entries is very
small (e.g. ~ 1). I have some code where the maximum number of entries
is 4 (and there only 16 different sets possible).
Mark Thornton
Chris Smith - 18 Oct 2003 15:34 GMT
> A List is likely to be faster if the expected number of entries is very
> small (e.g. ~ 1). I have some code where the maximum number of entries
> is 4 (and there only 16 different sets possible).
Fine. Again, though, only by a constant factor. That's a whole
different ball of wax than a difference in asymptotic complexity.
(Incidentally, it seems offhand like a BitSet might be a better
abstraction for your case, anyway.)

Signature
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
Mark Thornton - 18 Oct 2003 16:02 GMT
>>A List is likely to be faster if the expected number of entries is very
>>small (e.g. ~ 1). I have some code where the maximum number of entries
[quoted text clipped - 5 lines]
> (Incidentally, it seems offhand like a BitSet might be a better
> abstraction for your case, anyway.)
You can't put objects into a BitSet even if the universe contains only 4
objects. Also the number might grow in future so I want to keep the
implementation as an internal detail.
Mark Thornton
Randall R Schulz - 20 Oct 2003 05:47 GMT
Robert,
You're right about the sub-optimality of the Sun HashSet
implementation. In part it is dictated by some of the Map API.
(In particular, the entrySet() method of interface Map.)
For an implementation that is cheaper for most of the Map operations,
check out GNU Trove / Trove4J: <http://trove4j.sourceforge.net/>
Randall Schulz
> : That's exactly it. The Collections API is very concerned with the order
> : of complexity assigned to various operations. Using hashing as the
[quoted text clipped - 11 lines]
>
> /robo
Thomas G. Marshall - 20 Oct 2003 04:36 GMT
Chris Smith <cdsmith@twu.net> coughed up the following:
>> It might do. However I'm curious because the HashSet source code
>> uses a Map and puts in dummy Object instances as the values. The
[quoted text clipped - 7 lines]
> already there (amortized constant time, in fact), which is a common
> operation for a Set.
Yes. To the point, a set contains only one of anything. In order to
enforce that with a List the class would have to iterate through itself on
every add(), to ensure no duplicates.
...[thwack]...