We've got an app which is running slow, and a CPU profiler shows that 90% of
the time is being spent in java.io.RandomAccessFile.seek(). This is
mystifying, because we are reading 2K pages sequentially out of about 30
files, so there should be at maximum 30 seeks. At 10ms per seek, it should
take 300ms, but it actually is taking 2 seconds.
Is there any possible way for Java to measure the number of disk seeks and
when they're happening?
If not, are there any operating-system level tools that will do it? We're on
Win XP. It wouldn't hurt to have a tool that would do it on Linux as well.
****
On a related question: if I go through the hassle of learning about NIO and
selectors for disk access, is it possible that multiple threads, all
contending for different pages on disk randomly, can have their requests
scheduled intelligently? Can NIO notify me when a page has been read into
memory? My guess is that NIO can do this, but I'd hate to blow a week on it
to find out that it can't.
Oliver Wong - 25 Oct 2005 22:28 GMT
> We've got an app which is running slow, and a CPU profiler shows that 90%
> of the time is being spent in java.io.RandomAccessFile.seek(). This is
> mystifying, because we are reading 2K pages sequentially out of about 30
> files, so there should be at maximum 30 seeks. At 10ms per seek, it should
> take 300ms, but it actually is taking 2 seconds.
Wow. Your application domain is such that 1.7 seconds makes a
difference? I'm curious as to what it is.
> Is there any possible way for Java to measure the number of disk seeks and
> when they're happening?
I suspect not, as that sounds like it'd be getting "too close to the
metal", so to speak. Some devices running Java don't even have disks (e.g.
cell phones).
> If not, are there any operating-system level tools that will do it? We're
> on Win XP. It wouldn't hurt to have a tool that would do it on Linux as
> well.
Have you considered creating a RAM disk to speed up read-access?
- Oliver
sconeek - 26 Oct 2005 01:11 GMT
you can use a win based freeware app called filemon. it basically tells
you which programs are writing or reading to the HDD. it should give
you some idea of where your time is being taken and by which file, as
it could be something else completely. hope this helps.
Roedy Green - 26 Oct 2005 04:57 GMT
>We've got an app which is running slow, and a CPU profiler shows that 90% of
>the time is being spent in java.io.RandomAccessFile.seek().
That I find odd. I can easily see a lot of elapsed time being spent
in there, but what the heck could chew up CPU time?
Sounds like something might be calling seek totally unnecessarily over
and over.
In the olden days you could in the debugger insert some code into
method and have it log call stack dumps so you could analyse where it
was getting called from most frequently.
Surely there must be sexy tools around today to do that for you.
See http://mindprod.com/profiler.html
I suppose you could put your own wrapper around RandomAccessFile to
monitor seek usage. See http://mindprod.com/jgloss/trace.html
Perhaps you could get your debugger to stop on every 149th seek and
just get an intuitive idea where the seek calls are coming from.
Or just count the seeks and see if that sounds plausible.
have a look at all the places you call seek, and see if you really
need to.
If you get really desperate, try reimplementing the RandomAccessFile
stuff with nio, possibly even using memory mapped files.
Another possibility is to negotiate with me to solve this for you for
a fee. You don't pay unless I get you an X% speedup. I would need all
the code to experiment with on my own machine.

Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
Roedy Green - 26 Oct 2005 04:59 GMT
>This is
>mystifying, because we are reading 2K pages sequentially out of about 30
>files, so there should be at maximum 30 seeks.
If you are reading sequentially, really, then sequential access can
give better performance with potential for lookahead buffering.

Signature
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
Missaka Wijekoon - 26 Oct 2005 07:20 GMT
> We've got an app which is running slow, and a CPU profiler shows that 90% of
> the time is being spent in java.io.RandomAccessFile.seek(). This is
[quoted text clipped - 7 lines]
> If not, are there any operating-system level tools that will do it? We're on
> Win XP. It wouldn't hurt to have a tool that would do it on Linux as well.
Do you see the same problem on Linux?
If you can run it on Linux, do an "strace -p <pid of jvm>". What system
calls does the JVM make when it is thrashing? Try using vmstat to look
at disk activity.
What other processes are running or contending for disk and/or cpu
usage? How much free memory does the machine have? How does this
compare to the size of the JVM?
-Misk