A lot to comment on.
1. When I say huge data structures I mean a lot of data structures of
varying size, from very small up to a few megabytes for the largest. By
"a lot" I mean 1000s of such separate structures.
2. I don't see how weak (or other) references can help in this case.
As long as something references them they stay in memory, and once
nothing references them they are removed (by the gc) anyway. What I
want is to load and unload from memory as needed.
3. When I say some are accessed frequently I mean every 30 seconds, and
this applies mainly to the small data structures. The larger structures
are accessed every 10-15 minutes.
4. I tried zipping the larger data structures, which are accessed less
frequently, and it caused the OS to get stuck in IO wait while
zipping and unzipping (gzip input/output streams).
5. Serialization and deserialization are very expensive. Is there any
clean way to avoid them?
Hi zeus,
> A lot to comment on.
> 1. When I say huge data structures I mean a lot of data structures of
> varying size, from very small up to a few megabytes for the largest. By
> "a lot" I mean 1000s of such separate structures.
OK. A "good" solution should be able to deal with this.
> 2. I don't see how weak (or other) references can help in this case.
> As long as something references them they stay in memory, and once
> nothing references them they are removed (by the gc) anyway. What I
> want is to load and unload from memory as needed.
Well, AFAIK there *are* ways to get notified when a WeakRef is
'removed' (cleared). IIRC, ReferenceQueue does this (but working with
finalize, which works much better than it is said to, may work as well).
One caveat: by the time a reference shows up in the queue, its object is
already gone, so the object has to be on disk before that happens. So
you store the object to disk, keep only a WeakRef to it, and when
someone tries to access it after the WeakRef has been cleared, you
reload it. I think this should work!
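
A rough sketch of the idea, not a finished implementation. The class
DiskBackedStore, its methods, and the use of keys as file names are all
made up for illustration. Since a reference only reaches the
ReferenceQueue after its object has been cleared, the sketch writes each
object to disk when it is stored and reloads it on demand. It uses
WeakReference as discussed here; SoftReference, which is cleared only
under memory pressure, may be the better fit for a cache.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store: every value lives on disk, memory holds weak references only. */
class DiskBackedStore<V extends Serializable> {

    /** Weak reference that remembers its key, so a cleared entry can be identified. */
    private static final class KeyedRef<T> extends WeakReference<T> {
        final String key;
        KeyedRef(String key, T value, ReferenceQueue<T> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Path dir;  // assumption: keys are safe to use as file names in dir
    private final Map<String, KeyedRef<V>> live = new HashMap<>();
    private final ReferenceQueue<V> cleared = new ReferenceQueue<>();
    int reloads = 0;         // how often a value had to be read back from disk

    DiskBackedStore(Path dir) { this.dir = dir; }

    /** Writes the value to disk first, then keeps only a weak reference to it. */
    synchronized void put(String key, V value) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(dir.resolve(key))))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        live.put(key, new KeyedRef<>(key, value, cleared));
    }

    /** Returns the in-memory value if it is still there, otherwise reloads it from disk. */
    synchronized V get(String key) {
        purge();
        KeyedRef<V> ref = live.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null && Files.exists(dir.resolve(key))) {
            value = load(key);
            live.put(key, new KeyedRef<>(key, value, cleared));
        }
        return value;
    }

    /** Drops the in-memory reference explicitly; the copy on disk stays. */
    synchronized void evict(String key) { live.remove(key); }

    /** Removes map entries whose referents the gc has already cleared. */
    private void purge() {
        Reference<? extends V> ref;
        while ((ref = cleared.poll()) != null) {
            live.remove(((KeyedRef<?>) ref).key, ref);
        }
    }

    @SuppressWarnings("unchecked")
    private V load(String key) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(dir.resolve(key))))) {
            reloads++;
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The price of this design is one write per put, even for objects that
never get cleared. Whether that is acceptable depends on how often the
structures change.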
> 3. When I say some are accessed frequently I mean every 30 seconds, and
> this applies mainly to the small data structures. The larger structures
> are accessed every 10-15 minutes.
It should not be a big problem to "count" how often an object is
accessed and to suspend to disk primarily those objects that are
accessed infrequently.
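
The counting could look roughly like this. AccessTracker and its method
names are invented for the example, and plain hit counts are only one
possible measure (timestamps of the last access would be another).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: counts accesses per key and reports the least used keys. */
class AccessTracker {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Call this every time the structure stored under key is accessed. */
    synchronized void recordAccess(String key) {
        hits.merge(key, 1, Integer::sum);
    }

    /** Up to n keys, least accessed first; these are the candidates for suspension to disk. */
    synchronized List<String> coldest(int n) {
        List<String> keys = new ArrayList<>(hits.keySet());
        keys.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(a), hits.get(b));
            return byCount != 0 ? byCount : a.compareTo(b);  // ties broken by key name
        });
        return new ArrayList<>(keys.subList(0, Math.min(n, keys.size())));
    }

    /** Starts a new counting window, e.g. once per hour. */
    synchronized void reset() {
        hits.clear();
    }
}
```

With the access pattern described above (small structures every 30
seconds, large ones every 10-15 minutes), the large structures would end
up at the front of coldest() after a short while.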
> 4. I tried zipping the larger data structures, which are accessed less
> frequently, and it caused the OS to get stuck in IO wait while
> zipping and unzipping (gzip input/output streams).
That problem should not occur when using this self-implemented solution.
> 5. Serialization and deserialization are very expensive. Is there any
> clean way to avoid them?
You can implement your own methods for doing this, but I cannot say
how much faster this would be than Java's built-in serialization.
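
As an illustration of "your own methods", here is a hand-written binary
format using DataOutputStream and DataInputStream. The Sample class and
its fields are invented for the example; your structures will look
different. No claim about speed is intended, you would have to measure
it against ObjectOutputStream on your real data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical data structure with a hand-written binary format. */
final class Sample {
    final int id;
    final double[] values;

    Sample(int id, double[] values) {
        this.id = id;
        this.values = values;
    }

    /** Layout: id (4 bytes), element count (4 bytes), then 8 bytes per value. */
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(8 + 8 * values.length);
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(id);
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads back exactly what toBytes() wrote. */
    static Sample fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int id = in.readInt();
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return new Sample(id, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The format carries no class names or field descriptors, which keeps it
compact, but it also means you have to handle versioning yourself if the
structure ever changes.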
Ciao,
Ingo