Java Forum / General / July 2006
what's better way to store a million keys in mem?
John_Woo - 17 Jul 2006 15:11 GMT Hi,
A application, needs to check whether a user already logined, by looking at static var in memory.
I don't have good idea. what's doing is, use Hashmap <not sure Hashtable is better interms of inserting/removing/finding> to store ID <string> -- key and status <character> -- value.
question: what the better way to implement this goal, it should support up to 1 million of keys, in terms of high inserting/removing/finding, and happened more frequently.
-- Thanks John Toronto
dsjoblom@abo.fi - 17 Jul 2006 16:09 GMT > Hi, > [quoted text clipped - 9 lines] > million of keys, in terms of high inserting/removing/finding, and > happened more frequently. If you *must* keep track of users by keeping user data in memory, this is probably one of the quickest solutions. But it will eat up a lot of memory.
If we assume an overhead of 8 bytes per object and 4 byte references, a HashMap will use /at least/ 28 bytes per entry and that does not include any data you put in. So if we assume your keys and values are say 50 bytes on average, a HashMap will use /at least/ around 80 megabytes for 1 million entries. And that can probably be safely rounded up to about 100 MB. If you run a 64-bit machine, you are even worse off. While this may not sound too much, all of this memory is memory that the main part of your application can't use.
HashMaps (by design) are also notorious for thrashing the CPU cache, because of bad locality of data, and this is made even more painful in java, because everything in the map is *also* a reference which augments the problem.
I'd give a real database a try for this. There is of course a bit of overhead, but you use a lot less memory and I guess it will also be faster. Of course, you should benchmark and compare any solutions suggested.
Regards, Daniel Sjöblom
Mark Space - 17 Jul 2006 16:29 GMT >> Hi, >> [quoted text clipped - 13 lines] > is probably one of the quickest solutions. But it will eat up a lot of > memory. And it won't be stored in memory anyway. Every system now-a-days uses virtual memory. Which means that memory that isn't used much is off-load to disk, and only loaded back when needed. So your data will be stored on disk anyway.
A database is a good idea. Maybe check out file locking as an alternative. For example, have a list of users and passwords in a file. Maybe call it /etc/passwd or something. Each time a user logs in, lock the first byte of his/her entry in the /etc/passwd file. This may take more memory and time than a Java hash map but who knows, sometimes these low level system operations are highly optimized so, on the other hand, you also might come out ahead.
Off the top of my head, I think the other way to do this is to write a very efficient program in C, then let Java talk to it over a socket or something.
Final thoughts: one million users is A LOT. I mean, REALLY A LOT. There are only 40,000 or so user ports on a Unix system with TCP/IP. Have you very carefully investigated the requirements of this system you are building? I'm starting to seem more than just "a database" or "use C." I think you'll need to build out some kind of load balancing array, and that's non-trivial. Even a couple thousand users logged in at any given time is going to stress out a single system greatly. Better think this through carefully if it's important...
Mark Space - 17 Jul 2006 16:35 GMT >>> Hi, >>> [quoted text clipped - 23 lines] > Maybe call it /etc/passwd or something. Each time a user logs in, lock > the first byte of his/her entry in the /etc/passwd file. This may take Oops, John I think I got confuzzled with your previous post. Ignore the stuff about locking users. Everything else about one million users still applies I think, at least in principle.
> Off the top of my head, I think the other way to do this is to write a > very efficient program in C, then let Java talk to it over a socket or [quoted text clipped - 8 lines] > time is going to stress out a single system greatly. Better think this > through carefully if it's important... Vincent Cantin - 17 Jul 2006 20:12 GMT > I'd give a real database a try for this. There is of course a bit of > overhead, but you use a lot less memory and I guess it will also be > faster. Of course, you should benchmark and compare any solutions > suggested. I also think that you should use a database. You can use a ORM (Object-Relation Mapping) like the javax.persistence API of JEE5 or hibernate directly, for example.
steve - 17 Jul 2006 23:02 GMT > Hi, > [quoted text clipped - 14 lines] > John > Toronto this will not scale with a single CPU, you will need multiple cpu's and servers , if you tend to have a million users. and if you use multiple servers you cannot store the keys in memory, unless you share the memory between processors. so therefore you are looking at some sort of database or shared disk storage. something like oracle , will allow you to have multiple physical servers & cpu accessing the same database, as if it is a single entity, you can even partition the database into areas of say 100,000 so that the loading on the diskdrives is partitioned and spread evenly between the different cpu's.
Steve
John_Woo - 18 Jul 2006 01:01 GMT > > Hi, > > [quoted text clipped - 26 lines] > > Steve Thanks Steve, that sounds. Currently we have multi-cpus, may use multi-server later, but still no database designed for holding the login-status.
Can u tell more about how to use shared disk in java, how to test it?
-- John
Thomas Hawtin - 18 Jul 2006 14:11 GMT > A application, needs to check whether a user already logined, by > looking at static var in memory. [quoted text clipped - 7 lines] > million of keys, in terms of high inserting/removing/finding, and > happened more frequently. If you arrange your IDs to be sequential, then you could use an AtomicIntegerArray. That would have little overhead over the minimum possible memory requirement. If the IDs are not sequential, the a linear probing hash map would make sense. Either find an open source implementation or write your own. A lot easier and faster than using C or a database.
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Chris Uppal - 18 Jul 2006 14:46 GMT > If you arrange your IDs to be sequential, then you could use an > AtomicIntegerArray. That would have little overhead over the minimum > possible memory requirement. If the IDs are not sequential, the a linear > probing hash map would make sense. Either find an open source > implementation or write your own. A lot easier and faster than using C > or a database. I think the latter would be better even if the IDs are sequential -- the whole million are unlikely to be logged in at once.
-- chris
Thomas Hawtin - 18 Jul 2006 15:13 GMT >> If you arrange your IDs to be sequential, then you could use an >> AtomicIntegerArray. That would have little overhead over the minimum [quoted text clipped - 5 lines] > I think the latter would be better even if the IDs are sequential -- the whole > million are unlikely to be logged in at once. If it's only one status byte per user, then we are only talking 1 MB - tiny. Using AtomicIntegerArray in that way is relatively easy (so long as you know what you are doing) and removes problems related to locking.
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Chris Uppal - 19 Jul 2006 09:16 GMT [me:]
> > I think the latter would be better even if the IDs are sequential -- > > the whole million are unlikely to be logged in at once. > > If it's only one status byte per user, then we are only talking 1 MB - > tiny. Or even less if you use 1-bit per user. That way you can run the whole system on the CTO's mobile phone ;-)
-- chris
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|