We have an application with, say, 2560 megs of heap space used.
We are currently running 1.4.2 with HotSpot, using the concurrent GC for the
tenured generation: 128 meg eden, no survivor spaces. We are not using
the parallel eden collector, since our boxes have two hyperthreaded Xeon
processors, which doesn't seem to be enough to make the overhead of the
parallel algorithm worthwhile... it is much, much slower.
The application handles a high request rate and has a fairly low
tolerance for GC pauses (a few hundred ms is OK).
Most of the tenured generation is used for long-lived data that seldom
changes. The eden is collected every five seconds or so under peak load;
virtually everything in it is normally objects created during request
processing that are thrown away and never tenured. Small survivor spaces
with a low tenuring threshold might be nice for the request objects that
happen to be live when the eden GC runs, to avoid tenuring them, but a
command-line argument parsing bug in the current 1.4.2 makes that
impossible right now.
So far so good. Now I want to add a ~500 meg cache for data lookups
that have to be done across the network. This cache has a fairly high
lookup rate, say 50,000 lookups/sec, and a 90% hit rate. Its purpose is
to reduce the number of network queries; it isn't just an object pool or
some such that is extraneous and easily eliminated.
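The numbers above imply a steady replacement rate. Here is a quick sketch of the arithmetic. It assumes the cache is full, so every miss evicts one entry and inserts a fresh one, and it uses the ~5 second eden interval from the peak-load figure. `CacheChurn` and its methods are hypothetical names.

```java
// Hypothetical helper for back-of-the-envelope cache churn numbers.
final class CacheChurn {
    private CacheChurn() {}

    // Misses per second = lookups/s * (1 - hit rate). Each miss is one
    // network query and, once the cache is full, one replacement.
    static long missesPerSecond(long lookupsPerSecond, double hitRate) {
        return Math.round(lookupsPerSecond * (1.0 - hitRate));
    }

    // New cache entries allocated between two young (eden) collections.
    static long entriesPerYoungGc(long missesPerSecond, double secondsBetweenGcs) {
        return Math.round(missesPerSecond * secondsBetweenGcs);
    }
}
```

With 50,000 lookups/sec and a 90% hit rate this gives about 5,000 misses/sec. That works out to roughly 25,000 new cache entries per eden collection, each one referenced from the long-lived cache structure. With no survivor spaces, any of them still live at the next young collection is promoted straight into the tenured generation.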
My concern is that with a standard cache replacement policy, this would
seem to result in a large number of tenured-to-eden cross-generational
references as the data in the cache gets replaced. We already see eden
GCs taking up to 5 seconds or so at times when we update some of the
large data structures that normally sit in the tenured generation.
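To make the concern concrete, here is a minimal sketch of a standard LRU cache built on `java.util.LinkedHashMap`. It uses the access-ordered constructor plus `removeEldestEntry`, both available since 1.4. The class name `LruCache` is hypothetical, and the sketch uses Java 5 generics; under 1.4.2 the type parameters would have to be dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical conventional LRU cache; capacity is counted in entries, not bytes.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() moves the entry to the tail, so the head
        // of the linked list is always the least recently used entry.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after put(); evicts the LRU entry once over capacity.
        return size() > maxEntries;
    }
}
```

On a miss, the caller fetches the value over the network and calls `put()`. The new map entry and its value are freshly allocated in eden, but they get linked into a map whose table and neighbouring entries are long-lived and most likely already tenured. That is exactly the tenured-to-eden reference pattern described above.

In access order, a `get()` hit also relinks the entry to the tail, writing references into neighbouring entries. `LinkedHashMap` is not thread-safe, so concurrent use needs external locking, e.g. `Collections.synchronizedMap`.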
Does anyone have any suggestions about other approaches here?
In particular:
- could a more specialized cache replacement algorithm be used to minimize
this overhead? Any references to work in this area?
- are there any other VMs that offer different GC strategies that may work
better here? I haven't noticed anything in 1.5 that changes any of this much.
- what if we did our own management of the memory used for the cache, so
that it is allocated once and always lives in the tenured generation?
Then we would have to worry about the details of what lives where. It
seems horrible to have to consider this, but there seem to be limited
options.
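The first and third ideas can be combined. Lay the cache out as a fixed set of slots allocated once at startup, and use CLOCK (second-chance) replacement within each set. CLOCK is a well-known approximation of LRU, not something proposed in this thread.

This is a sketch under stated assumptions: keys are `long` ids and values are serialized into byte records no larger than a fixed slot size. `SlotCache` and its methods are hypothetical, and it uses only 1.4-level language features.

```java
// Hypothetical fixed-footprint cache. Every array is allocated once in the
// constructor, so after startup the structure sits in the tenured generation;
// get()/put() write only primitives and bytes into it, never new references.
// Assumes long keys and byte-record values. Not thread-safe: callers must lock.
class SlotCache {
    private final int sets, ways, recordSize;
    private final long[] keys;          // key held by each slot
    private final boolean[] used;       // slot holds a valid entry
    private final boolean[] referenced; // CLOCK "second chance" bit
    private final int[] lengths;        // payload length per slot
    private final byte[] data;          // one arena of slots * recordSize bytes
    private final int[] hand;           // clock hand, one per set

    SlotCache(int sets, int ways, int recordSize) {
        this.sets = sets;
        this.ways = ways;
        this.recordSize = recordSize;
        int slots = sets * ways;
        keys = new long[slots];
        used = new boolean[slots];
        referenced = new boolean[slots];
        lengths = new int[slots];
        data = new byte[slots * recordSize]; // must stay under 2 GB (int index)
        hand = new int[sets];
    }

    // Each key maps to one set of `ways` slots (set-associative layout).
    private int setOf(long key) {
        int h = (int) (key ^ (key >>> 32));
        h ^= h >>> 16;
        return (h & 0x7fffffff) % sets;
    }

    /** Copies the cached record into out; returns its length, or -1 on a miss. */
    int get(long key, byte[] out) {
        int base = setOf(key) * ways;
        for (int i = base; i < base + ways; i++) {
            if (used[i] && keys[i] == key) {
                referenced[i] = true; // a hit only sets a boolean
                System.arraycopy(data, i * recordSize, out, 0, lengths[i]);
                return lengths[i];
            }
        }
        return -1;
    }

    /** Stores a record, evicting within the key's set by CLOCK if it is full. */
    void put(long key, byte[] value, int length) {
        if (length > recordSize) {
            throw new IllegalArgumentException("record larger than slot");
        }
        int set = setOf(key);
        int base = set * ways;
        int slot = -1;
        for (int i = base; i < base + ways && slot < 0; i++) {
            if (used[i] && keys[i] == key) slot = i; // overwrite in place
        }
        for (int i = base; i < base + ways && slot < 0; i++) {
            if (!used[i]) slot = i;                  // take a free slot
        }
        while (slot < 0) {                           // CLOCK sweep
            int i = base + hand[set];
            hand[set] = (hand[set] + 1) % ways;
            if (referenced[i]) {
                referenced[i] = false;               // spend its second chance
            } else {
                slot = i;                            // victim found
            }
        }
        keys[slot] = key;
        used[slot] = true;
        referenced[slot] = false; // new entries start without a second chance
        lengths[slot] = length;
        System.arraycopy(value, 0, data, slot * recordSize, length);
    }
}
```

The trade-off is that every hit copies bytes out and the caller has to decode them. That creates short-lived garbage in eden instead of long-lived references from the tenured generation; whether it is actually cheaper is something to measure. An arena from `ByteBuffer.allocateDirect` (NIO, since 1.4) would move the bytes off the Java heap altogether.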
Any comments or references would be appreciated...
Chris Uppal - 16 May 2004 11:06 GMT
> My concern is that with a standard cache replacement policy, this would
> seem to result in a large number of tenured --> eden cross generational
> references as the data in the cache gets replaced. We already have
> issues with eden GC taking up to 5 seconds or so at times when we do
> update some of the large data structures that normally end up sitting in
> the tenured generation.
Just a thought, but it should be easy to test: can you just make the cache
immutable and do the "replacement" by making a modified copy of the whole
thing? It might even pay for itself in reduced lock contention, without
considering cache effects.
-- chris
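The immutable-cache suggestion can be sketched as a copy-on-write cache. Readers go through a `volatile` reference to an unmodifiable snapshot and never lock. A writer copies the whole map, applies a batch of replacements, and swaps the reference.

The class and method names are hypothetical. The batching is also an assumption: copying a large map once per miss would be far too expensive at 5,000 misses/sec. The code relies on Java 5 generics and on the Java 5 memory model's `volatile` semantics for safe publication.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write cache: lock-free reads of an immutable snapshot,
// replacements applied to a full copy that is then published in one store.
class CopyOnWriteCache<K, V> {
    private volatile Map<K, V> snapshot = Collections.<K, V>emptyMap();

    V get(K key) {
        return snapshot.get(key); // no locking on the read path
    }

    // Applies a batch of removals and additions in one copy, so the copy
    // cost is paid once per batch rather than once per miss.
    synchronized void replaceAll(Map<K, V> additions, Iterable<K> removals) {
        Map<K, V> next = new HashMap<K, V>(snapshot);
        for (K k : removals) {
            next.remove(k);
        }
        next.putAll(additions);
        snapshot = Collections.unmodifiableMap(next); // publish the new snapshot
    }
}
```

The fresh copy's table and entries are all young objects pointing at existing values. The only new reference stored into a possibly tenured object is the single `snapshot` field write. The cost moves to copying the map and, when a snapshot outlives an eden collection, promoting it whole. Which effect dominates is what the suggested test would show.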