Java Forum / General / April 2007
Best database for implementing a cache
vj - 30 Mar 2007 19:42 GMT Hi all,
I need to implement a cache of images that ae being fetched from a remote server. So I am in need of a very fast file based database that can fulfill my needs. The database should be embeddable with my application and should store all its data in a file. I have explored apache derby and hsqldb but they are probably overkill for my problem since i do not need ACID transaction support moreover all data will be stored in a single table rather than a single table. Any suggestion which database might server me well.
Thanks,
VJ
Eric Sosman - 30 Mar 2007 20:06 GMT vj wrote On 03/30/07 14:42,:
> Hi all, > > I need to implement a cache of images that ae being fetched from a > remote server. So I am in need of a very fast file based database that > can fulfill my needs. How fast is the connection to the remote server? How much faster than that connection must your database be in order to qualify as "very fast?" ;-)
> The database should be embeddable with my > application and should store all its data in a file. I have explored > apache derby and hsqldb but they are probably overkill for my problem > since i do not need ACID transaction support moreover all data will be > stored in a single table rather than a single table. A distinction of considerable subtlety ...
> Any suggestion which database might server me well. If I'm asking a question whose answer is obvious to everyone else, I apologize, but: What do you want the database to do for you? The only requirement you've mentioned is that you want to deposit the images in it "very fast," but you haven't said what you want to do with them afterwards. Do you need to retrieve the images in "as received" condition, or do you need to pluck out bits and pieces of transformed image data? Do you need to do OCR on the images and index them by all the words that are recognized? Do you need to do face recognition, fingerprint matching, ...?
If all you need is rapid writing, you're in luck: Every Unix variant comes with an absolutely free, VERY fast, write-only database called /dev/null ;-)
 Signature Eric.Sosman@sun.com
vj - 30 Mar 2007 23:04 GMT > If I'm asking a question whose answer is obvious to > everyone else, I apologize, but: What do you want the [quoted text clipped - 7 lines] > that are recognized? Do you need to do face recognition, > fingerprint matching, ...? Sorry i think that i failed to mark the word cache in bold letter. May i opine that would also be obvious to every one else here that a cache server simply stores frequently accessed data & usually it does not transforms it in any way.
Lew - 31 Mar 2007 01:01 GMT > Sorry i think that i failed to mark the word cache in bold letter. May > i opine that would also be > obvious to every one else here that a cache server simply stores > frequently accessed data & > usually it does not transforms it in any way. Not hardly - it wasn't all that obvious to me, anyway. And even knowing what something "usually" does is no help at all in knowing what you, particularly, want.
May I opine that snide remarks to people who are trying to help you might have an unfortunate effect on that help?
So how about you help people who want to help you?
-- Lew
vj - 31 Mar 2007 08:16 GMT I apologize for my sneeky comments for what seems as an honest reply , unfortunately misinterpreted as sarcastic. Surely i am indeed greateful to you all becuase you have bailed me out countless times but honestly the reply posted to my query was not what i was expecting.
Eric Sosman - 31 Mar 2007 14:03 GMT >> If I'm asking a question whose answer is obvious to >> everyone else, I apologize, but: What do you want the [quoted text clipped - 13 lines] > frequently accessed data & > usually it does not transforms it in any way. Fine, but you still haven't explained what this <b>cache</b> is supposed to do. (Well, you <i>have</i> explained, but saying that it "simply stores frequently accessed data" is obviously wrong: if it were right, you'd have already followed my earlier suggestion to use /dev/null.)
So: What search criterion or criteria is or are used to retrieve things from this <b>cache</b>? Can the <b>cache</b> expand indefinitely, or is some kind of housecleaning policy envisioned (and if so, what kind)? And how much faster than your network connection must the <b>cache</b> be in order to meet your requirement of "very fast?"
No one can become a successful programmer without the ability to describe the problems he or she is trying to solve. More often than not, getting the problem description right is the hardest part of doing the job.
 Signature Eric Sosman esosman@acm-dot-org.invalid
vj - 01 Apr 2007 13:08 GMT > >> If I'm asking a question whose answer is obvious to > >> everyone else, I apologize, but: What do you want the [quoted text clipped - 35 lines] > Eric Sosman > esos...@acm-dot-org.invalid Ok let me describe the problem that i am working on in detail.
I am building a Map server that provides images from google maps database. Now i went through the javascript at maps.google.com and was able to reverse engineer the urls and query formats that they use. The good ( or the bad) thing is that they provide map image in block pieces. So i am building another http server that will provide the clients with these blocks using a comparatively very simple api and with a side effect of exploiting a cache of these block images in order to provide quicker service.
So since fetching these blocks will be much faster from a local server rather than a remote server, i want to implement this cache. The client application will request appropriate blocks from the server and display them in a pre described format. target platforms for client apps are .Net and probably a webapp making use of AJAX.
I hope that it will make thing lot clearer then they previously were. And would further strengten my point that i just need to save the images and provide them back as it is rather then doing any transformation on the server which is why i initially used the term <strong> cache </strong>.
Patrick May - 01 Apr 2007 17:04 GMT > Ok let me describe the problem that i am working on in detail. > [quoted text clipped - 13 lines] > platforms for client apps are .Net and probably a webapp making use > of AJAX. Let me preface this with the full disclosure that I work for a company (GigaSpaces) that produces a commercial version of the solution I'm going to suggest. I am not responding to push our product, but because I believe the underlying technology is a good fit for your requirements.
The underlying technology is Jini (http://www.jini.org), and more specifically the JavaSpaces service. A JavaSpace makes an ideal cache for your image blocks, allowing them to be served at in-memory speeds from your local server. The API is very simple, consisting of the four core methods write(), read(), take(), and notify(). Pre-processing of your image blocks to the required formats is easily accomplished through the use of the standard JavaSpaces master-worker pattern.
The capabilities of Jini in general and JavaSpaces in particular are not limited to caching, but they do fill that role well. Blitz (http://www.dancres.org/blitz/) is an excellent open source implementation and the commercial implementation I mentioned above is available at http://www.gigaspaces.com.
Regards,
Patrick
------------------------------------------------------------------------ S P Engineering, Inc. | Large scale, mission-critical, distributed OO | systems design and implementation. pjm@spe.com | (C++, Java, Common Lisp, Jini, middleware, SOA)
Chris Uppal - 02 Apr 2007 11:27 GMT > I am building a Map server that provides images from google maps > database. Now i went through the javascript at maps.google.com and was [quoted text clipped - 4 lines] > with a side effect of exploiting a cache of these block images in > order to provide quicker service. Legal question: have you checked this scheme with a real lawyer ? It sounds very dodgy to me. http://maps.google.com/intl/en_ALL/help/terms_local.html
> So since fetching these blocks will be much faster from a local server > rather than a remote server, i want to implement this cache. Why not just put a caching proxy server between yourself and maps.google.com ? That sounds easier than reinventing the wheel, and may work better when it comes things like expiring cache entries in a timely manner.
-- chris
Daniel Pitts - 30 Mar 2007 20:14 GMT > Hi all, > [quoted text clipped - 10 lines] > > VJ Why not just write then to the disk directly? Most File Systems can be treated as a form a database, and it seems like thats kind of what you want anyway.
vj - 30 Mar 2007 22:59 GMT > Why not just write then to the disk directly? Most File Systems can be > treated as a form a database, and it seems like thats kind of what you > want anyway. Yes i am writing the image files directly to the filesystem as it is. However i need to limit the size of the cache so that it may not grow arbitarily large. that would require me to maintain access count metric for evicting existing stored images. Since filesystem itself would not let me assiciate this info with the image I tought of a utilizing a database that would store the access count as well as all the available images in the cache.
> How fast is the connection to the remote server? How > much faster than that connection must your database be in > order to qualify as "very fast?" ;-) The remote server is situated in another continent and will be connected via a measly 100 Kbps isdn. The average image size is abt 1 Mb that would theoretically take 80 seconds to download. The clients are connected to the server via 1000Mbps lan that can receive this image in 0.008 seconds. So for the database connection that would qualify as 'very fast' should let me acheive this theoretical limits as close as possible ;-)
Arne Vajhøj - 31 Mar 2007 02:11 GMT >> Why not just write then to the disk directly? Most File Systems can be >> treated as a form a database, and it seems like thats kind of what you [quoted text clipped - 7 lines] > utilizing a database that would store the access count as well as all > the available images in the cache. What about keeping files on disk, having meta data in memory and simply rebuilding meta data at startup (list all files on disk and start with zero counters) ?
Arne
vj - 31 Mar 2007 08:36 GMT > >> Why not just write then to the disk directly? Most File Systems can be > >> treated as a form a database, and it seems like thats kind of what you [quoted text clipped - 13 lines] > > Arne Certainly thats a good idea. How aboute mainting a hashtable and a priority queue. The hashtable will store all the meta data associated with a file and the priority queue will implement file aviction policy. as soon as a a file is hit i will increase the hit count. On a cache miss i will fetch the image , create an entry in the hash table and push an entry in the queue. if the queue size if creater than a certain treshhold i will pop the element from its head (with the lowest access count) delete its corrosponding file / hashtable entry.
This seems to be a good idea but i have a small doubt. Does propirity queue dynamicaly orders elements. I mean since the priority queue is implemented as a heap in java hence element ordring only takes place when elements are added. This will create problems as because if i update access count in hashtable then they wont be reflected in the structure of the queue. Hence when i pop element from its head for eviction it might not be the one with minimum access count.
What do you think, Any other data structure that we can use for fast lookup of elements
Arne Vajhøj - 01 Apr 2007 00:53 GMT > Certainly thats a good idea. How aboute mainting a hashtable and a > priority queue. The hashtable will store all the meta data associated [quoted text clipped - 15 lines] > What do you think, Any other data structure that we can use for fast > lookup of elements What about: HastMape, key=filename, value=(file content, list with access times) cleanup code triggered either by time or every n'th access iterates over HashMap and: - remove entries that have too few or too old accesses - trim list with access times for too old accesses for entries kept ?
Arne
vj - 01 Apr 2007 12:58 GMT > > Certainly thats a good idea. How aboute mainting a hashtable and a > > priority queue. The hashtable will store all the meta data associated [quoted text clipped - 25 lines] > > Arne but wont the iternation over the hashmap be slow. O(n) ...
Arne Vajhøj - 01 Apr 2007 18:00 GMT >> What about: >> HastMape, key=filename, value=(file content, list with access times) [quoted text clipped - 5 lines] > > but wont the iternation over the hashmap be slow. O(n) ... It is O(n). Slow is a relative concept.
When we are talking about network IO and disk IO then any access to in memory data structures are usually insignificant.
And remember the iteration is only done every X minute or for every Y lookup.
Arne
steve - 01 Apr 2007 22:37 GMT >>> What about: >>> HastMape, key=filename, value=(file content, list with access times) [quoted text clipped - 16 lines] > > Arne then do it in another thread.
vj - 02 Apr 2007 08:43 GMT > >>> What about: > >>> HastMape, key=filename, value=(file content, list with access times) [quoted text clipped - 18 lines] > > then do it in another thread. ok i agree that iterating over the hash map will be faster than the Network & disk i/o . But still if i could avoid it will be better.
Arne Vajhøj - 03 Apr 2007 01:45 GMT > ok i agree that iterating over the hash map will be faster than the > Network & disk i/o . But still if i could avoid it will be better. Only if you could do it without adding code and complexity to your project.
Adding code and complexity for optimizations that are not measurable is bad practice.
Arne
alexandre_paterson@yahoo.fr - 02 Apr 2007 17:45 GMT ...
> Certainly thats a good idea. How about mainting a hashtable and a > priority queue. The hashtable will store all the meta data associated [quoted text clipped - 6 lines] > What do you think, Any other data structure that we can use for fast > lookup of elements I may be misunderstanding but aren't you simply looking for an LRU / MRU cache? (Least Recently Used / Most Recently Used cache) If that's the case, you simply use a LinkedHashMap and override one method to have the exact "aviction policy" [sic] you want. This is well detailed in LinkedHashMap's JavaDocs.
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|