Java Forum / General / September 2006
retrieve the key from a map
ralf@rw7.de - 07 Sep 2006 10:11 GMT Hi everyone,
I want to retrieve a key from a map, and I don't find a way to do this. For instance there is the following code:
HashMap m = new HashMap(); String key = new String("k"); m.put(key, "v");
Later I don't have any reference to "key" anymore, other than the reference in the map. But I know, the string is "k". I want to retrieve the instance that is the key in the map.
Any ideas? Ralf.
M.J. Dance - 07 Sep 2006 10:25 GMT > Hi everyone, > [quoted text clipped - 11 lines] > Any ideas? > Ralf. Unless you're using IdentityHashMap (or some other special-purpose implementation of Map), knowing that 'the string is "k"' should be enough. Otherwise use m.keySet() and go from there.
Ed - 07 Sep 2006 10:26 GMT ralf@rw7.de skrev:
> Hi everyone, > > I want to retrieve a key from a map, and I don't find a way to do this. Distasteful, brute-force approach:
import java.util.Map; import java.util.HashMap; import java.util.Iterator;
class Crap {
public static void main(String[] args) { Map map = new HashMap(); map.put("man", "john"); map.put("woman", "jane"); System.out.println("Key for john is: " + getKey(map, "john")); }
private static String getKey(Map map, String value) { for (Iterator i = map.keySet().iterator(); i.hasNext();) { String key = (String)i.next(); if (map.get(key).equals(value)) { return key; } } return "No value found"; } }
.ed
-- www.EdmundKirwan.com - Home of The Fractal Class Composition.
Download Fractality, free Java code analyzer: www.EdmundKirwan.com/servlet/fractal/frac-page130.html
Zaph0d - 07 Sep 2006 13:35 GMT > ... > HashMap m = new HashMap(); The key here is the word "Hash" in HashMap.
A hash algorithm is an algorithm that maps data to a key so that different data will generate different keys. A good hash algorithm means that the keys that are generated are uniformly disributed along the range (int min->int max for example). Also, good hash algorithm should minimize collisions - two different datas having the same key.
Each Object in java has an overrideable hashCode() method which is supposed to return the object hash. An Object hash code is its java reference number ("pointer"). A String hash code is built from the string (you can see the details in String api docs).
A HashMap stores and retreives according the they "key" object hashCode(), and then according to equals(). The process is this: 1. find the "cell" holding all the keys with a specific hashcode. 2. iterate through these keys using equals() with the key object given. 3. if equals returns true, do the 'ok' action (overwrite the key/data pair / retreive the related data). if no equals returns true, do the 'fail' action (create a new key/data pair and store it / return null).
Take String for example - String overrides hashcode and equals. String A equals String B if their characters match. Also, String hashcode is computed based on the characters. That means that it doesn't matter which instance of String you used to store the data, you can retrieve it using a String with the same characters.
If you want to store your own repeateable keys, just build hashCode and equals according to the repeatable data: same data->same hashcode and equals=true (doesn't have to hold the other direction, but every hashcode collision can cost performance).
If you're still confused, please read on Hash maps in a good algorithm site/book.
ralf@rw7.de - 07 Sep 2006 14:17 GMT Sorry to everyone,
probably I simplified my example a bit too much. So here is what I really want;
I want to implement a cache, where the key class is implemented by myself, but the value class is java.util.List. I want to count the hits of this cache, separately for each cache entry. Therefore I want to store the hits-counter at the key class.
class Key { Object value; int hits;
int hashCode() { return value.hashCode(); }
boolean equals(Object o) { return (o instanceof Key) && value.equals(((Key)o).value); } }
class Cache { HashMap<Key, List> store = new HashMap<Key, List>()
void put(Key key, List value) { store.put(key, value); }
List get(Key key) { Key originalKey = store.getKey(key) // does not exist !!!!!! if(originalKey!=null) originalKey.hits++; return store.get(key); } }
As you can see, for the hit counter to work, I need the same (not just equal) key, that was used for insertion into the map. I don't know how to do this - apart from the obvious brute force attack.
Ralf.
Robert Klemme - 07 Sep 2006 14:43 GMT > Sorry to everyone, > [quoted text clipped - 5 lines] > of this cache, separately for each cache entry. Therefore I want to > store the hits-counter at the key class. I'd rather not do that. The reason is: the key must be know outside of your class Cache and thus the key should not carry information which is internal to your cache. Also it may make usage of the Cache class more complicated; for example: if you use String as key then the client must create a StringKeyWithCounts and hand that off to your class (at least the way you do it right now). If you do it internally in class Cache you still pay the overhead of object creation which you need to do the lookup.
Rather do this: create a value class that is internal to your Cache (private static) with a field for the value and a field for the hit counter. See the sample below.
Kind regards
robert
import java.util.HashMap; import java.util.Map;
public class Cache<K, V> {
private Map<K, Val<V>> values = new HashMap<K, Val<V>>();
public void set( K key, V val ) { Val<V> tmp = values.get( key );
if ( tmp == null ) { tmp = new Val<V>(); values.put( key, tmp ); }
tmp.setValue( val ); }
public V get( K key ) { Val<V> tmp = values.get( key );
if ( tmp == null ) { return null; } else { tmp.increment(); return tmp.getValue(); } }
public int getHits( K key ) { Val<V> tmp = values.get( key ); return tmp == null ? 0 : tmp.getCount(); } }
class Val<V> { private V value;
private int count;
public void setValue( V val ) { this.value = val; }
public V getValue() { return value; }
public void increment() { ++count; }
public int getCount() { return count; } }
Ed - 07 Sep 2006 15:07 GMT Robert Klemme skrev:
> public class Cache<K, V> { > [quoted text clipped - 13 lines] > public V get( K key ) { > Val<V> tmp = values.get( key ); Light-years OT:
One of the more ridiculous reasons I don't like generics: it makes code look like a fleet of WW2 bombers droning overhead before they spill their deathly loads on me. All those hard, merciless angle-brackets. Funny how the syntax gives such three-dimensional depth to the code.
Listen! Hear that?
Dddddddddrrrrrrrrroooooooooonnnnnnnnneeeeeeeeeee ...
.ed
-- www.EdmundKirwan.com - Home of The Fractal Class Composition.
Download Fractality, free Java code analyzer: www.EdmundKirwan.com/servlet/fractal/frac-page130.html
Chris Uppal - 08 Sep 2006 08:45 GMT > > private Map<K, Val<V>> values = new HashMap<K, Val<V>>(); [...]
> One of the more ridiculous reasons I don't like generics: it makes code > look like a fleet of WW2 bombers droning overhead before they spill > their deathly loads on me. You have a vivid imagination...
But I shall happily add this to my own "reasons why generics suck" list.
-- chris
Chris Uppal - 07 Sep 2006 14:46 GMT > I want to implement a cache, where the key class is implemented by > myself, but the value class is java.util.List. I want to count the hits > of this cache, separately for each cache entry. Therefore I want to > store the hits-counter at the key class. It would be a lot simpler to keep a separate Map from keys to hit-counts than to try to keep the count "in" the key.
-- chris
Robert Klemme - 07 Sep 2006 15:11 GMT >> I want to implement a cache, where the key class is implemented by >> myself, but the value class is java.util.List. I want to count the hits [quoted text clipped - 3 lines] > It would be a lot simpler to keep a separate Map from keys to hit-counts than > to try to keep the count "in" the key. That's an option. However, I prefer the solution I presented for the following reasons:
- more time efficient (just a single map lookup) - more space efficient (just one hash table) - easier maintenance of consistency (only one map to update)
Depending on the circumstances this can matter or not. Also other solutions might be more appropriate for other requirements.
Kind regards
robert
Chris Uppal - 07 Sep 2006 15:32 GMT [me:]
> > It would be a lot simpler to keep a separate Map from keys to > > hit-counts than to try to keep the count "in" the key. [quoted text clipped - 5 lines] > - more space efficient (just one hash table) > - easier maintenance of consistency (only one map to update) Agreed, but you don't list the downsides: more complicated code, code which does two unrelated things with one operation, code in whch the actual values have a somewhat obscure relationship with the logical values.
Swings and roundabouts...
-- chris
Patricia Shanahan - 07 Sep 2006 16:06 GMT > [me:] >>> It would be a lot simpler to keep a separate Map from keys to [quoted text clipped - 11 lines] > > Swings and roundabouts... and inability to reuse existing code, developed for unrelated applications, that does one, but not both, of the jobs.
The following would need a remove method added, but it should be obvious how to do that.
/* * Created on Aug 14, 2004 */ package utilities;
import java.util.Collections; import java.util.HashMap; import java.util.Map; import java.util.Set;
/** * Utility class for counting instances of equal objects. */ public class Counter {
private Map<Object,Count> data = new HashMap<Object,Count>();
/** * Increment by one the count associated with a specified * key. * @param key */ public void increment(Object key) { Count count = data.get(key); if (count == null) { count = new Count(); data.put(key, count); } count.increment(); }
/** * Get the count associated with a specified key. * @param key The key whose count is required. * @return The number of times increment has been called with * a key equal to this one. */ public int get(Object key) { Count count = data.get(key); if (count == null) { return 0; } else { return count.get(); } }
/** * Get number of unique counted objects * * @return Number of key objects for which increment * has been called. Equal objects are only counted once. */ public int size() { return data.size(); }
/** * Get all the unique counted objects * @return A set containing a refence to the counted objects. */ public Set getKeys() { return Collections.unmodifiableSet(data.keySet()); }
private static class Count { private int val = 0;
private void increment() { val++; }
private int get() { return val; } }
}
Chris Uppal - 08 Sep 2006 08:37 GMT > /* > * Created on Aug 14, 2004 > */ Not only highly intelligent but prescient too !
;-)
-- chris
Robert Klemme - 07 Sep 2006 20:15 GMT > [me:] >>> It would be a lot simpler to keep a separate Map from keys to [quoted text clipped - 5 lines] >> - more space efficient (just one hash table) >> - easier maintenance of consistency (only one map to update) Honestly, I don't understand your points:
> Agreed, but you don't list the downsides: more complicated code, Um, where is this code complicated?
> code which > does two unrelated things with one operation, The requirement that the cache class must count hits automatically means that two unrelated things happen in one method: lookup and counting. Or did you mean something else?
> code in whch the actual values > have a somewhat obscure relationship with the logical values. This is completely encapsulated inside the Cache class. And I also do not see how this is obscure.
Maybe I'm too tired right now to get your point. I'd appreciate it if you elaborate it a bit more. Or you illustrate your point with another solution so everyone can compare both of them.
Kind regards
robert
Patricia Shanahan - 07 Sep 2006 15:30 GMT >> I want to implement a cache, where the key class is implemented by >> myself, but the value class is java.util.List. I want to count the hits [quoted text clipped - 3 lines] > It would be a lot simpler to keep a separate Map from keys to hit-counts than > to try to keep the count "in" the key. If it were going to be kept anywhere in a single map, the hit count should be part of the value, not the key. However, two maps would be simpler and cleaner.
Patricia
Matt Humphrey - 07 Sep 2006 15:45 GMT > Sorry to everyone, > [quoted text clipped - 5 lines] > of this cache, separately for each cache entry. Therefore I want to > store the hits-counter at the key class. <snip code>
I agree with Patricia's assessment and I think you should focus on the value rather than the key--roughly
class CacheEntry { String key List list int hits }
Your cache should simply map String keys to CacheEntry . You can index this with any key, not necessarily the original, but you'll get back your list and hits as well as your original key (if you still happen to need it, which I doubt). You can switch to a more complex key and this needn't change, but if the tight coupling between values and hits bothers you, you'll have to use two separate hash tables (as others have pointed out) nominally for no other reason than to map the source keys into the cache keys.
Matt Humphrey matth@ivizNOSPAM.com http://www.iviz.com/
Patricia Shanahan - 07 Sep 2006 14:21 GMT > Hi everyone, > [quoted text clipped - 8 lines] > reference in the map. But I know, the string is "k". I want to retrieve > the instance that is the key in the map. If you know the String is "k", you can find the specific key instance by getting the keySet, and iterating over it checking for .equals equality to "k".
If you do this frequently, keep a separate hash map that contains each key in m as both key and value:
m_keys.put(key,key)
When you want to do the retrieval:
m_keys.get("k")
However, I have my doubts about a design that depends on overriding .equals in a class, but does not treat instances that equal each other as being equivalent. Note that two .equals equal instances cannot both be keys in the map at the same time.
What are you really trying to achieve?
Patricia
ralf@rw7.de - 08 Sep 2006 13:27 GMT Hi!
Thank you for your ideas. I'd like to summarize it in one mail.
Iterating over keySet is what I meant with "obvious brute force attack". The map is expected to have quite a lot of entries (could easily eat up a quarter of all available VM heap memory or so). In fact, this brute force attack currently used, but of course I want to have an alternative way.
A separate hashmap is not what I want. Then I have two of such big maps in memory instead of just one. That hit counter is a small nice-to-have, which does not justify any significant additional memory footprint or cpu usage.
Wrapping values into another class is also not an option for the same reason. This would require one additional object instance for each cache entry - too expensive for me.
Design - Yes I know, that an equals-method (and hashCode), that does not cover all member fields of the class (it omits field hits in my example) is not a nice design. However, all this is deeply buried within the core of the framework. Neither class Key nor Cache is publicly visible.
Memory Usage is quite important for me. The map is for caching database queries across transactions. All objects in the map are there for a long time, which moves them sooner or later into the old-gen space, which is subject to expensive full-GCs.
Until recently I solved the problem using commons-collections. The maps of commons-collections do provide a protected method getEntry(Object key). Within a subclass I can implement getKey(Object key) with getEntry(key).getKey(). Java Collections do have that getEntry method as well, but this is package private, so I cannot access is via subclassing.
While writing this mail I thought about calling getEntry via reflection, bypassing the package-private-restriction. Really ugly...
:-) Thank you for responses, Ralf.
Robert Klemme - 08 Sep 2006 14:32 GMT > Wrapping values into another class is also not an option for the same > reason. This would require one additional object instance for each > cache entry - too expensive for me. But the same is true for your approach of putting the counter into the key class (unless I am missing something). Memory wise it does not matter whether you wrap the key or the value. And if your basic key is a String for example, then you have an instance of your class Key which holds this string (=> 1 additional instance). There might be other aspects of your code that you did not exhibit yet.
> Design - Yes I know, that an equals-method (and hashCode), that does > not cover all member fields of the class (it omits field hits in my > example) is not a nice design. However, all this is deeply buried > within the core of the framework. Neither class Key nor Cache is > publicly visible. Even then you should care about design. Someone will have to maintain this X months from now and he / she will be thankful if you do.
> Memory Usage is quite important for me. The map is for caching database > queries across transactions. All objects in the map are there for a > long time, which moves them sooner or later into the old-gen space, > which is subject to expensive full-GCs. Is it caching queries or query results? Also, if it's query results, why do you have to cache them across transactions? This might lead to inconsistencies.
Kind regards
robert
ralf@rw7.de - 08 Sep 2006 17:24 GMT > > Wrapping values into another class is also not an option for the same > > reason. This would require one additional object instance for each > > cache entry - too expensive for me. > But the same is true for your approach of putting the counter into the > key class (unless I am missing something). I need my own key class anyway. The key is the database query, which consists of one or more objects, so I need a container class anyway. There is almost no additional cost for putting the counter there. Values are java.util.List, and don't want to implement my own List just for adding the hit counter. Maybe, some day I will use arrays instead of Lists (for saving even more memory), then this will not work anyway.
> Even then you should care about design. Someone will have to maintain > this X months from now and he / she will be thankful if you do. This is just a deal between performance and simplicity. Since I do write a persistence framework which will (hopefully) be reused many time, I tend to prefer performance.
> Is it caching queries or query results? Also, if it's query results, > why do you have to cache them across transactions? This might lead to > inconsistencies. Its caching results. Of course, as long as this cache is activated, one must not modify the database contents bypassing the framework. If this is intended, this cache must be deactivated. For clustering, there will be a special invalidation notification, but this is not yet implemented.
Best regards Ralf.
PS completely OT: I see, you don't use google groups for posting. Which NNTP server do you use? I could not find one, that allows posting.
Robert Klemme - 08 Sep 2006 21:31 GMT >>> Wrapping values into another class is also not an option for the same >>> reason. This would require one additional object instance for each [quoted text clipped - 5 lines] > consists of one or more objects, so I need a container class anyway. > There is almost no additional cost for putting the counter there. Compared to the overhead you pay for the list and the objects that make up your values the overhead of an additional object that encapsulates values and counts seems negligible. This smells a bit like premature optimization.
> Its caching results. Of course, as long as this cache is activated, one > must not modify the database contents bypassing the framework. If this > is intended, this cache must be deactivated. For clustering, there will > be a special invalidation notification, but this is not yet > implemented. Just out of curiosity: what features does your persistence framework provide that existing frameworks do not?
> PS completely OT: > I see, you don't use google groups for posting. Which NNTP server do > you use? I could not find one, that allows posting. http://news.individual.net/
Pretty reliable and a lot of groups but comes with a small cost. I was annoyed with trouble I had with free newsservers so I switched over.
Kind regards robert
ralf@rw7.de - 10 Sep 2006 07:13 GMT > Compared to the overhead you pay for the list and the objects that make > up your values the overhead of an additional object that encapsulates > values and counts seems negligible. This smells a bit like premature > optimization. May be. I did not yet do any real memory consumption test. I probably will do really soon. But until then I think, every object counts - at least for objects in the global cache (that will go into old-gen) and for numbers of objects growing with the size of the cache.
I'm also a bit annoyed, because the functionality I need is neither difficult to implement (for the JCF) nor ugly design. My way of storing the hits counter may be ugly, but providing the getKey functionality is not. It's a one-line-method, and it has just been forgotten.
Not that I want to do any JCF-bashing. In fact I think, JCF is one of the most well designed libraries ever written for Java. And Java is one of the best (general purpose) programming languages ever implemented.
> Just out of curiosity: what features does your persistence framework > provide that existing frameworks do not? I really like static type safety. Within my framework, its a compile-time error to assign a double value to a persistent field of type string. Its also a compile-time error to compare a string field to a double value (or a double field) within a database query. Same with using the "like" operator on a double field. And of course, queries for customer objects return a Collection<Customer>. Removing a persistent field or class from the schema gives you compile-time errors, whereever you used it (including database queries of course, not just getters/setters). Renaming a persistent field or class using a modern Java IDE really renames all references to that field, including those in queries. No more "column not found"-SQLExceptions. I *hate* writing database queries in some plain text language.
Second I don't like writing XML (or any other external, non-java) description of the persistent schema. Within COPE (did I mention the name of the framework already :-) ), everything is specified within your Java source code. Everything. Persistent classes, fields, constraints, queries.
And I want support for migrating the database to new versions of the application without loss of data. For instance, if you add a new persistent field to the schema you must not forget to add a new column to the database table. For COPE there is a helper application telling you which column (or table or constraint) is missing (or not used any longer) and lets you create (drop) it with a single click. I really missed this in my projects, because it's another typical source of that "column not found"-SQLExceptions.
And I don't want to waste my time remembering, which column type can store a string up to 400 characters most efficiently with that particular database I'm using right now. Was it "varchar(400)" or "varchar2(400)" or "text" or "tinytext" or just "string" ? Grmpf. COPE does that for you.
Ooops. Got a bit lengthy :-) . And of course the cache will be the most effifcient on earth when it's ready, but it's not yet. :-)
There is also a website, with some explanation:
http://cope.sourceforge.net/
> http://news.individual.net/ > > Pretty reliable and a lot of groups but comes with a small cost. I was > annoyed with trouble I had with free newsservers so I switched over. Thanks a lot. I just registered.
Best regards, Ralf.
Robert Klemme - 11 Sep 2006 10:47 GMT >> Compared to the overhead you pay for the list and the objects that make >> up your values the overhead of an additional object that encapsulates [quoted text clipped - 5 lines] > least for objects in the global cache (that will go into old-gen) and > for numbers of objects growing with the size of the cache. Btw, did you also consider using a LRU cache? It's pretty easy to implement and has little run time overhead.
>> Just out of curiosity: what features does your persistence framework >> provide that existing frameworks do not? > > I really like static type safety. [snip] I think this is dealt with by existing implementations of OOR mappings.
> Second I don't like writing XML (or any other external, non-java) > description of the persistent schema. Within COPE (did I mention the > name of the framework already :-) ), everything is specified within > your Java source code. Everything. Persistent classes, fields, > constraints, queries. EJB 3.0 and Hibernate can also use Java 1.5 annotations.
> And I want support for migrating the database to new versions of the > application without loss of data. For instance, if you add a new [quoted text clipped - 4 lines] > missed this in my projects, because it's another typical source of that > "column not found"-SQLExceptions. I'm not sure what EJB and Hibernate do about this but since this is such a common scenario I bet they have solved this already.
> And I don't want to waste my time remembering, which column type can > store a string up to 400 characters most efficiently with that > particular database I'm using right now. Was it "varchar(400)" or > "varchar2(400)" or "text" or "tinytext" or just "string" ? Grmpf. COPE > does that for you. Again, EJB and Hibernate deal with that, too. There is also db4o which also has nice querying features IIRC: http://www.db4o.com/
> Ooops. Got a bit lengthy :-) . > And of course the cache will be the most effifcient on earth when it's > ready, but it's not yet. :-) Certainly. :-) Sorry if I'm sounding discouraging but I have yet to see the feature that makes your work stand out against existing solutions.
> There is also a website, with some explanation: > > http://cope.sourceforge.net/ Thanks for that link!
>> http://news.individual.net/ >> >> Pretty reliable and a lot of groups but comes with a small cost. I was >> annoyed with trouble I had with free newsservers so I switched over. > > Thanks a lot. I just registered. No sweat. Have fun!
Kind regards
robert
Chris Uppal - 08 Sep 2006 15:32 GMT > Until recently I solved the problem using commons-collections. The maps > of commons-collections do provide a protected method getEntry(Object > key). Within a subclass I can implement getKey(Object key) with > getEntry(key).getKey(). Java Collections do have that getEntry method > as well, but this is package private, so I cannot access is via > subclassing. If your application is such that you have to consider memory use so carefully, then it is is certainly worth your while creating your own implementation of Map which has all the features you need and lacks ones you don't. (E.g you probably wouldn't want the object count overhead of using the Entry objects, but would just have two or three parallel arrays to hold keys, values, and counts).
-- chris
ralf@rw7.de - 08 Sep 2006 17:07 GMT > If your application is such that you have to consider memory use so carefully, > then it is is certainly worth your while creating your own implementation of > Map which has all the features you need and lacks ones you don't. (E.g you > probably wouldn't want the object count overhead of using the Entry objects, > but would just have two or three parallel arrays to hold keys, values, and > counts). I want to switch to pcj.sourceforge.net somewhere in the future. pcj does open hashing, so there are no Entry-objects anyway. Unfortunately pcj still does not support an equivalent to LinkedHashMap, so I still have to use Java Collections.
If pcj supported a LinkedHashMap and a getKey()-function, then this would perfectly fit my requirements. So I rather will extend pcj somewhere in the future then implementing my own hashmap.
Ralf.
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|