Java Forum / General / February 2007

TreeSet and HashSet

Marcin - 02 Feb 2007 19:40 GMT
Hello

There is a very useful piece of functionality that I think should be implemented in
TreeSet and HashSet:
the method Object get(Object o).
The method should return the same object from the collection as the parameter
object.
In TreeSet the complexity would be O(log n); in HashSet it would be constant.
Without this functionality one must build collections on top of maps, so
an unnecessary and more complex type ends up being used.

What do you think about this?

Regards
Marcin
Daniel Pitts - 02 Feb 2007 20:00 GMT
> Hello
>
[quoted text clipped - 11 lines]
> Regards
> Marcin
What would be the use case for this?
Would it return null if the set didn't contain the object o?
Also, this should be in the Set interface, if anywhere.

What's so hard about using boolean contains(Object o)?

Or are you basically using a Set<SomeTypeThatHasBothKeyAndValue>?
In that case, you SHOULD use a Map. That's the whole point of maps:
you can key on the value.

If you want an AssociativeSet<T>, that's something a little
different from a standard Set:

class AssociativeSet<T> implements Set<T> {
   // Each element is stored as both key and value.
   final Map<T, T> associations;

   public AssociativeSet() {
      associations = new HashMap<T, T>();
   }

   public AssociativeSet(Map<T, T> backingMap) {
      associations = backingMap;
   }

   // Returns the stored element equal to o, or null if there is none.
   public T get(T o) {
     return associations.get(o);
   }
   // TODO: delegate the remaining Set methods to associations.keySet().
}
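
The TODO in the sketch above could be filled in roughly as follows. This is only one possible completion, not the poster's code: it assumes that extending AbstractSet (rather than implementing Set directly) is acceptable, and it assumes add() should keep the instance already stored instead of replacing it.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical completion of the AssociativeSet sketch: every element is
// stored as both key and value, so get() can return the stored instance.
class AssociativeSet<T> extends AbstractSet<T> {
    private final Map<T, T> associations;

    public AssociativeSet() {
        this(new HashMap<T, T>());
    }

    // Pass a TreeMap here for sorted, logarithmic-time behaviour.
    public AssociativeSet(Map<T, T> backingMap) {
        associations = backingMap;
    }

    // Returns the stored element equal to o, or null if there is none.
    public T get(Object o) {
        return associations.get(o);
    }

    @Override
    public boolean add(T o) {
        if (associations.containsKey(o)) {
            return false;           // assumption: keep the instance already stored
        }
        associations.put(o, o);
        return true;
    }

    @Override
    public boolean contains(Object o) {
        return associations.containsKey(o);
    }

    @Override
    public boolean remove(Object o) {
        if (!associations.containsKey(o)) {
            return false;
        }
        associations.remove(o);
        return true;
    }

    @Override
    public Iterator<T> iterator() {
        return associations.keySet().iterator();
    }

    @Override
    public int size() {
        return associations.size();
    }
}
```

Because the map is private, callers cannot create a mapping whose key and value differ. One limitation of this sketch: if null is added as an element, get(null) returns null both when null is present and when it is absent.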
Marcin - 02 Feb 2007 21:05 GMT
>> Hello
>>
[quoted text clipped - 18 lines]
>
> What's so hard about using boolean contains(Object o)?

contains() does not return the object, only the information about whether this
object exists.

> Or, are you basically using a Set<SomeTypeThatHasBothKeyAndValue>?
> In that case, you SHOULD use a Map. That's the whole point of maps:
> you can key on the value.
We can say that almost every class that implements the equals method
has both keys and values, where the keys are the fields used in
equals. But we do not always use maps.
The map concept should be hidden from the developer where it is not needed. It
does not exist directly in this case and should, in my
opinion, be omitted. The set concept is clearer.

>Oh, almost forgot to mention.  HashSet is backed by a HashMap, so
>you're using a map anyway. Why not just use a Map if that's what you
>really want?

When you want to use maps instead of sets you have to create another class
for the key objects. When you are dealing with very large data, the space consumed by
the key objects could be very big, and we have redundant data. A solution like
dividing the class into two classes, a key class and a value class, breaks the class
concept. Why use a more complicated data type when adding a get method is not
hard, given that the set implementations are based on maps?

Marcin
Eric Sosman - 02 Feb 2007 22:11 GMT
Marcin wrote on 02/02/07 16:05:
> [...]
>>Oh, almost forgot to mention.  HashSet is backed by a HashMap, so
[quoted text clipped - 4 lines]
> for key objects, when you are dealing with very huge data, space consumed by
> key object could be very big. [...]

   Note that a Map, like any Collection, only stores
references and not actual object instances.  If you create
an object that has a megabyte key:

    class Thing {
       private byte[] key = new byte[1024*1024];
       ...
    }

... and insert it in a Map as both key and value:

    Thing thing = new Thing();
    map.put(thing, thing);

... there's still only one copy of the megabyte key
floating around.
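
A small check of this point, as a sketch: the Thing class below is a hypothetical stand-in for the one above, with equals and hashCode inherited from Object, and ReferenceDemo is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

class ReferenceDemo {
    // Hypothetical stand-in for the Thing class in the post.
    static final class Thing {
        final byte[] key = new byte[1024 * 1024];
    }

    // Returns true when the entry's key and value are the very same instance,
    // and therefore share the single megabyte array.
    static boolean sharesOneInstance() {
        Thing thing = new Thing();
        Map<Thing, Thing> map = new HashMap<Thing, Thing>();
        map.put(thing, thing);
        Map.Entry<Thing, Thing> entry = map.entrySet().iterator().next();
        return entry.getKey() == thing
                && entry.getValue() == thing
                && entry.getKey().key == entry.getValue().key;
    }
}
```

The map holds two references to one Thing, so the extra cost per entry is the entry object and its references, not a second copy of the array.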

Eric.Sosman@sun.com

Marcin - 02 Feb 2007 23:16 GMT
>    Note that a Map, like any Collection, only stores
> references and not actual object instances.  If you create
[quoted text clipped - 12 lines]
> ... there's still only one copy of the megabyte key
> floating around.

When we want to put new key objects built from chosen fields, the problem exists
with primitive (unwrapped) types like
char, int, long, float, double, etc. But for your solution you're right:
only references need to be stored. But I'm not sure this solution is
safe, because we have references to the same important data from two collections,
the key set and the value collection, so we have to be careful with them. Mapping an object to
itself does not make much sense. For example, what about the method
put when the key already exists? After this operation we have a map where key
and value are different, so we have inconsistent data. I'm not sure
this concept is perfect. For example, in the implementation of sets the Java
developers used a mapping from object to a dummy object instead of
object to the same object; maybe there were some important reasons?

Marcin
Eric Sosman - 03 Feb 2007 02:54 GMT
>>    Note that a Map, like any Collection, only stores
>> references and not actual object instances.  If you create
[quoted text clipped - 16 lines]
> with unwrapped type like
> char, int, long, float, double, etc.

    ... but you wrote about "very huge data, space consumed by
key object could be very big."  If the key is a wrapped primitive,
it isn't "very big."

> But in your solution you're right,
> there is only need to store references. But i'm not sure if this solution is
[quoted text clipped - 3 lines]
> put when the key already exist, after this operation we have a map where key
> and value are different, so we have inconsistent data.

    But what do you provide as the second argument to put()?  If
you always write put(obj,obj) there will never be any kind of
mismatch.  Yes, sometimes put(obj,obj) might replace an existing
oldobj->oldobj mapping -- but if you didn't want that to happen,
you shouldn't have called put() in the first place!
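
One detail of put(obj,obj) may be worth checking before relying on it. As far as I know, java.util.HashMap keeps the key object already in the table when an equal key is put again, and replaces only the value. After the second put the entry's key and value are still equal, but they need not be the same instance. The sketch below uses made-up names (PutReplaceDemo, putTwice) to show this.

```java
import java.util.HashMap;
import java.util.Map;

class PutReplaceDemo {
    // Puts two equal strings as self-mappings and returns
    // { stored key, stored value } of the single resulting entry.
    static String[] putTwice(String first, String second) {
        Map<String, String> map = new HashMap<String, String>();
        map.put(first, first);
        map.put(second, second);   // equal key: the value is replaced, the key object is kept
        Map.Entry<String, String> entry = map.entrySet().iterator().next();
        return new String[] { entry.getKey(), entry.getValue() };
    }
}
```

If identity of the stored object matters, checking containsKey() before calling put(), or removing the old mapping first, avoids the split.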

    As for the set of keys and the set (collection, really) of
values, yes: you must "be careful with them."  But this is no
different from being "careful" with the members of a Set!  If you
add a mutable object to a Set and then mutate it in a way that
violates the Set's requirements, you're already in trouble.  Using
a Map doesn't make things worse in any way I can think of.

> I'm not sure is
> this concept is perfect. For example in the implementation of sets java
> language developers were used mapping from object to dummy object instead of
> object to the same object, maybe there were some important reasons?

    Maybe.  It might be nothing more than a convenience: using a
special object as the "value" makes it easy to allow null as an
actual member of the set.

Eric Sosman
esosman@acm-dot-org.invalid

Marcin - 03 Feb 2007 11:31 GMT
>> But in your solution you're right,
>> there is only need to store references. But i'm not sure if this solution
[quoted text clipped - 13 lines]
> oldobj->oldobj mapping -- but if you didn't want that to happen,
> you shouldn't have called put() in the first place!

You can't use put(obj, obj2), because it doesn't make any sense in this
case. When you have a normal key and value, it makes sense.
What about clients who also want the get functionality?
We have to return Map<T, T> in the interface. How can you be sure that
clients won't call put(obj, obj2)? You have no control over it. You can't
easily throw an exception when this occurs. So there may be more mistakes
because of this. You must also describe this strange mapping in the Javadoc and be
sure that everyone understands it. You can instead make public only a Set and a
get method, and hide the map implementation from the client. But this is
exactly like your own implementation of Set based on Maps, so this is the
same as the "associative set".

>     As for the set of keys and the set (collection, really) of
> values, yes: you must "be careful with them."  But this is no
> different from being "careful" with the members of a Set!  If you
> add a mutable object to a Set and then mutate it in a way that
> violates the Set's requirements, you're already in trouble.  Using
> a Map doesn't make things worse in any way I can think of.

1. You can't guarantee that the client doesn't cause a situation where there
are mappings that make no sense. When we assume that key.equals(value), or key ==
value (when key and value are null), we should
check all situations and throw exceptions when our assumption is violated
by the client, for example (o1, o2) where !o1.equals(o2), or (null,
o1) where null != o1. There are a number of new mistakes, and we have to handle all
of them in some way. How? These new client mistakes don't exist when we use
the "associative set".

Marcin
Eric Sosman - 03 Feb 2007 13:36 GMT
>>> But in your solution you're right,
>>> there is only need to store references. But i'm not sure if this solution
[quoted text clipped - 22 lines]
> sure that everyone understand it. You can instead public only a Set and a
> get method, and hide a map implementation from the client. [...]

    The words you seem to have forgotten are "composition" and
"encapsulation."

    You began with a desire for a Set<T> that provides the extra
operation `T get(T)'.  Very well: The suggestion is that you go
ahead and write a class that implements Set<T> plus the extra
operation, and that the class' implementation would use Map<T,T>
internally.  There is no need to expose the internal Map to the
class' users, nor to allow them to do anything undesirable to it.

Eric Sosman
esosman@acm-dot-org.invalid

Marcin - 03 Feb 2007 14:26 GMT
>>You can instead public only a Set and a
>>get method, and hide a map implementation from the client. But this is
>>exactly like your own implementation of Set based on Maps, so this is the
^^^^^^^^^^^^^^
>>same as "associative set".
>
[quoted text clipped - 7 lines]
> internally.  There is no need to expose the internal Map to the
> class' users, nor to allow them to do anything undesirable to it.

I wrote about that in the last sentence (see above), and Daniel has
mentioned it too. I agree with you.

Thanks for the discussion.

Marcin
Daniel Pitts - 02 Feb 2007 22:50 GMT
> >> Hello
>
[quoted text clipped - 45 lines]
>
> Marcin
For one thing, adding a "get" method is likely to break any existing
implementations of the "Set" interface, including third-party code.

Secondly, it is perfectly okay to have a Map<T, T> for your particular
use case, which is "query-by-example". And, lo and behold, it actually has
a T get(Object o) method!

A Map isn't more complicated to use than a Set. I also gave you code
that will help you extend Set the way you want to, even though I
disagree with the need.
Daniel Pitts - 02 Feb 2007 20:01 GMT
> Hello
>
[quoted text clipped - 11 lines]
> Regards
> Marcin

Oh, almost forgot to mention.  HashSet is backed by a HashMap, so
you're using a map anyway. Why not just use a Map if that's what you
really want?
Mark Rafn - 02 Feb 2007 23:51 GMT
>There is a very useful functionality, that I think should be implemented in
>TreeSet and HashSet that is the method: Object get(Object o).

I echo what others have said: if anywhere, it should go on the Set interface.
I'd call it a convenience, but probably not worth a non-backward-compatible
change (all implementations and subclasses now break).

>The method should return the same object from collection as the parameter
>object.

Well, no.  If it was sane, it would return the object in the collection which
is equal (per equals()) to the parameter object (or null).  This wouldn't
necessarily be the same object as the parameter.

This is important because one use of this would be to normalize equivalent
immutable objects to avoid a bunch of unnecessary copies that are equals() but
not ==.  Currently, you do this with a map whose keys and values are the same.
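
A minimal sketch of that normalization, using a map whose keys and values are the same objects. Canonicalizer and canonical are hypothetical names, and the sketch assumes the pooled objects are immutable with consistent equals() and hashCode().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: hands back one shared instance per group of equal objects.
class Canonicalizer<T> {
    private final Map<T, T> pool = new HashMap<T, T>();

    // Returns the pooled instance equal to candidate, pooling candidate
    // itself if no equal instance has been seen before.
    public T canonical(T candidate) {
        T existing = pool.get(candidate);
        if (existing != null) {
            return existing;
        }
        pool.put(candidate, candidate);
        return candidate;
    }
}
```

Callers replace their own copy with the returned reference, so duplicates that are equals() but not == become unreachable and can be collected.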

If you're willing to live with getting back the parameter object instead of
the object in the collection, you can do
 set.contains(o) ? o : null
anyplace you'd want a get() method, but I can't think of a use for that.

>With lack of this functionality one must implement collections on maps, so
>the unnecessary and more complex type will be used.

The implementations in java.util use maps, so there's not much reason for you
not to do the same.
--
Mark Rafn    dagon@dagon.net    <http://www.dagon.net/>
Marcin - 03 Feb 2007 00:49 GMT
> >There is a very useful functionality, that I think should be implemented
> >in
[quoted text clipped - 5 lines]
> non-backward-compatible
> change (all implementations and subclasses now break).

What about subinterface of set?

>>The method should return the same object from collection as the parameter
>>object.
[quoted text clipped - 4 lines]
> be
> the same object as the parameter.

I mean "equal object", not the same.

> This is important because one use of this would be to normalize equivalent
> immutable objects to avoid a bunch of unnecessary copies that are equal()
[quoted text clipped - 14 lines]
> you
> not to do the same.

You write code that operates on sets. But after some time there is a need
to update some data, and you have a set of new objects that have the same
keys as existing objects in the collection. You can do this in linear time
by using an iterator, so you don't need maps to have get functionality. The
problem is the linear time; it is too much. So in order to achieve
logarithmic time, you must change many lines of code to use maps, even
though
you don't need maps; you only need a faster way to get the object. All these
changes are caused by the lack of a get method.
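
The linear-time lookup described above could look like the following minimal sketch. SetScan and find are hypothetical names, not part of java.util.

```java
import java.util.Set;

class SetScan {
    // O(n) lookup: walks the set and returns the stored element that is
    // equal to probe, or null if no element matches.
    static <T> T find(Set<T> set, Object probe) {
        for (T element : set) {
            if (element == null ? probe == null : element.equals(probe)) {
                return element;
            }
        }
        return null;
    }
}
```

This works on any Set without changing its type, but every call visits up to all n elements, which is the cost being objected to here.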

So when you write code using sets you must consider whether you will need a get
method, for example after some years. You can't use sets anymore because you
are not sure about this.
So sets are useless in most cases, because changing to maps in the
future is an expensive task. And even when you operate with sets in your
code specification, you implement everything with maps.

Marcin
Chris Uppal - 05 Feb 2007 15:24 GMT
> There is a very useful functionality, that I think should be implemented
> in TreeSet and HashSet
> that is the method: Object get(Object o).
> The method should return the same object from collection as the parameter
> object.

I think that's a good idea.

I have evidence on my side too, because the language I normally use has just
that operation (called "find" rather than "get" -- which I think is a better
name, though it could still be improved) on its Sets and Set-like collections.
I find it useful.

For those who say that you can get the same effect with a Map, that's true, but
why should the programmer be forced to use a Map when the ability is
/intrinsically/ something that a Set must be able to provide ?  If Sets can
naturally do it themselves (and they can), and if the operation is useful in
practice (and it is), then it should be part of the Set API.

For those who point out that Java's Sets are implemented as Maps anyway, so
there's no gain in avoiding the Map, I think they are confusing specification
with implementation.  As it happens, Sun currently does implement Sets as Maps,
but that implementation decision (somewhat questionable as it is) should not
inform the design of the /API/.

The other thing I can't understand is why the hashed collections don't have
pluggable implementations.  Seems a blatant oversight to me...

   -- chris
Lew - 06 Feb 2007 01:37 GMT
Marcin wrote:
>> There is a very useful functionality, that I think should be implemented
>> in TreeSet and HashSet
>> that is the method: Object get(Object o).
>> The method should return the same object from collection as the parameter
>> object.

> I think that's a good idea.
>
> I have evidence on my side too, because the language I normally use has just
> that operation (called "find" rather than "get" -- which I think is a better
> name, though it could still be improved) on its Sets and Set-like collections.
> I find it useful.

I am mystified how it helps. What is the difference between

Object obj = o;

and

Object obj = enhancedSet.find( o );
?

- Lew
Mark Rafn - 06 Feb 2007 02:20 GMT
>I am mystified how it helps. What is the difference between
>Object obj = o; and Object obj = enhancedSet.find( o );

The difference is that the second will return the "canonical" instance of o,
where the first leaves you with multiple copies.

In the rather specialized case where you have many or large objects and you'd
rather keep one instance of each than many, and you for some reason don't want
to hide it all in a factory or manager, this can make it easy.

In every case I can think of, it's pretty easy to make a factory or FooSet
that's specific to the data and hides the implementation such that nobody
cares whether the Set interface has this method or the factory just keeps a
HashMap.
--
Mark Rafn    dagon@dagon.net    <http://www.dagon.net/>
Esmond Pitt - 06 Feb 2007 09:50 GMT
You guys need to make up your minds. If you have an object that can be
used to retrieve another object but isn't the 'canonical' object, surely
the first object is a key? which indicates using a Map?
Chris Uppal - 06 Feb 2007 18:09 GMT
> You guys need to make up your minds. If you have an object that can be
> used to retrieve another object but isn't the 'canonical' object, surely
> the first object is a key? which indicates using a Map?

No, the question we want to ask the set to answer is "what object, if any, do
you contain that is equivalent (by your rules) to this one?".

For a multiset (bag, or whatever you call it) the question would be "which
objects do [etc] ?".

I don't think that has any more similarity to a mapping operation than asking
the set /whether/ it contains an object which is equivalent to [etc].  Note
that the inclusion test can itself be phrased as an object->boolean mapping,
but no one suggests that Map<Object, Boolean> makes Set<Object> redundant.

Notice also that the equivalent question "which key, if any, do you contain
[etc]" is also something which could also be asked of Maps -- and is not the
same as asking what value is keyed by that object.  (I see no obvious use for
that particular operation, though -- but maybe that's only because I don't
already have it available).
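
For sorted maps, the "which key do you contain" question can be answered in logarithmic time, assuming Java 6 or later, where TreeMap implements NavigableMap. The sketch below uses hypothetical names (StoredKeyLookup, storedKey). floorKey() returns the greatest key less than or equal to the probe, so when containsKey() is true that key is the stored one that compares equal.

```java
import java.util.TreeMap;

class StoredKeyLookup {
    // Returns the key instance held by the map that compares equal to probe,
    // or null if the map has no such key.
    static <K, V> K storedKey(TreeMap<K, V> map, K probe) {
        return map.containsKey(probe) ? map.floorKey(probe) : null;
    }
}
```

The same idea applies to TreeSet through its floor() method. I am not aware of an equivalent call on HashMap or HashSet.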

   -- chris
Lew - 06 Feb 2007 22:39 GMT
> No, the question we want to ask the set to answer is "what object, if any, do
> you contain that is equivalent (by your rules) to this one?".
[quoted text clipped - 12 lines]
> that particular operation, though -- but maybe that's only because I don't
> already have it available).

Thanks. I see it. Well, I guess Sun can't provide everything we want; they
have to leave a few classes for programmers to write or we'd be out of jobs.

I can see that it'd be easy to implement such a "CanonicalSet" as an
implementor of Set.

- Lew


©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.