Java Forum / General / May 2006
Dump complete java VM state as core dump (not via OS) possible?
halfdog - 10 May 2006 14:07 GMT
Hi everyone,
I've a problem debugging an application.
Background: Sometimes my application reaches a very unlikely state, which at the moment results in an error message. The stack trace alone is of little value, since this state is caused by the interaction of more than one thread. The state is resolved by throwing an exception, and the program continues normally.
Goal: If I reach this state, I want to suspend the application, dump the complete state of all java threads, objects, ... (complete java memory core dump) to analyse it later.
Question: Is there a possibility to generate such dumps?
Just thinking:
One possibility would be to send a signal from the VM to some external program (e.g. a UDP packet); this program attaches a standard OS-level debugger (e.g. gdb under Linux), dumps the core, resumes the app, and detaches. But I guess reconstructing the internal VM state from a complete dump is rather hard, isn't it?
I've looked at jdb to see whether it has some java-core-dump functions, but it seems it does not. Are there alternative implementations, or helper scripts that make jdb loop over all java object addresses and dump them?
Does anyone know about the jdb remote debugging interface? Is the protocol public, so that I could implement such a feature myself?
stixwix - 10 May 2006 14:29 GMT
> Goal: If I reach this state, I want to suspend the application, dump
> the complete state of all java threads, objects, ... (complete java
> memory core dump) to analyse it later.
>
> Question: Is there a possibility to generate such dumps?
If it stays in this state long enough for human intervention then you can do kill -3 [process number] under Linux I think.
Andy
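A thread dump similar to the one kill -3 prints can also be produced from inside the application at the moment the bad state is detected, with no human intervention. The following is a minimal sketch, assuming Java 5 or later; the class name ThreadDumper and the output format are hypothetical, and only Thread.getAllStackTraces() is a standard API. It captures stack traces only, not object contents.

```java
import java.util.Map;

/** Hypothetical helper: builds a textual dump of all live threads from inside the VM. */
final class ThreadDumper {

    /** Returns one block per live thread: name and state, followed by its stack frames. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        // Thread.getAllStackTraces() (Java 5+) takes a snapshot of every live thread.
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real application this call would sit where the unlikely state is detected.
        System.err.print(dumpAllThreads());
    }
}
```

The snapshot is taken quickly compared with attaching an external debugger, so it may fit within a tight client timeout, but it shows where each thread is, not what data it holds.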
Razvan - 10 May 2006 14:35 GMT
> Goal: If I reach this state, I want to suspend the application, dump
> the complete state of all java threads, objects, ... (complete java
> memory core dump) to analyse it later.
Just curious: what tools would you use to analyze that?
I have never used very advanced debugging techniques. For me, "System.out.println()" and Java exceptions are more than enough. Whenever I had a complex issue I just analyzed the whole algorithm with extra attention. I may lose several hours just thinking, but in the end it has always paid off for me. The truth is that I am too lazy to use more advanced debugging techniques.
Regards, Razvan
http://www.mihaiu.name/2004/sun_java_scjp_310_035/
halfdog - 10 May 2006 15:07 GMT
stixwix wrote:
> If it stays in this state long enough for human intervention then you
> can do kill -3 [process number] under Linux I think.
> Andy
There are two problems: First, the state is not reproducible; it can occur at any time of day or night. Secondly, the server is contacted by various xml-rpc clients. If they do not get a response within [http-client-timeout] (120sec) they fall out of sync and the connected clients will report errors. So halting the system for longer than 1min is out of the question.
(Apart from that: with gdb --pid [processID] you can attach to any running process, then call "generate-core-file x.core", "quit" and you have a core dump to analyse without killing the process)
> Just curios ? What tools would you use to analyze that ?
I heard that there is a tool to analyse OS process core dumps from java VMs and reconstruct some of the java object state information. I do not know how to use it or how good the data reconstruction would be.
> I never used very advanced debugging techniques. For me,
> "System.out.println()" and Java exceptions are more than enough.
> Whenever I had a complex issue I just analyzed the whole algorithm with
> extra attention. I may loose several hours just thinking but it the end
> it always paid out for me. The truth is that I am too lazy to use more
> advanced debugging techniques.
I also used these very frequently until I had some strange debugging problems to solve:
1: Very rare race condition: There was no way to reproduce the bug on demand; it occurred at a frequency of about 1:1,000,000. A test program calling the methods could make them fail when running long enough.
Problem: Attaching any loggers or System.out calls changed the timing of the program (possibly through thread scheduling when doing IO ops), so that it was never possible to get the error. Removing the logging output made it reappear.
Solution: Added many unneeded synchronized{} blocks, so that the error resulted in a deadlock, which was debuggable.
Now it was a deadlock problem, which is also impossible to debug with System.out, but with a debugger attached it is possible to see which threads wait for which monitors. Afterwards you can fix the buggy code.
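Once a race has been turned into a deadlock like this, the waiting threads and monitor owners can also be listed from inside the VM rather than through an attached debugger. A minimal sketch, assuming Java 5 or later: the class name DeadlockReporter and the report format are hypothetical, while ThreadMXBean.findMonitorDeadlockedThreads() and ThreadInfo are part of java.lang.management.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: reports threads that are deadlocked on object monitors. */
final class DeadlockReporter {

    /** Returns a report of monitor-deadlocked threads, or an empty string if none are found. */
    static String reportMonitorDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findMonitorDeadlockedThreads(); // null when no deadlock exists
        if (ids == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread terminated between the two calls
            }
            out.append(info.getThreadName())
               .append(" waits for ").append(info.getLockName())
               .append(" held by ").append(info.getLockOwnerName()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }
}
```

A watchdog thread could call reportMonitorDeadlocks() periodically and log any non-empty result; that polling policy is an assumption of this sketch, not something described in the thread.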
Razvan Mihaiu - 10 May 2006 15:44 GMT
Just some suggestions:
1. Take the system out of the production environment. I mean replicate the system somewhere else. I cannot see you working calmly and efficiently on a system that cannot be down more than 1 minute.
2. Insert as many "System.out" statements as possible. Sometimes they can help even in case of deadlocks: for example, you expected a certain statement to be printed but it wasn't. This could be a good indication that a deadlock occurred.
I agree - in case of a complex system, this is a nightmare.
3. Create a test application that also parses the log file of your application. When a certain error occurs, instruct the test application to stop. Also keep a log file for the test application itself. You need to know the exact steps that were performed right before the error.
Now, it all depends on how much time you can spend on this. On most bugs you cannot spend too much time - so what I told you might still be impracticable.
Just out of curiosity: how many threads are you speaking about ?
http://www.mihaiu.name/2004/sun_java_scjp_310_035/
halfdog - 10 May 2006 16:29 GMT
> 1. Take the system out of the production environment. I mean replicate
> the system somewhere else. I cannot see you working calmly and
> efficiently on a system that cannot be down more than 1 minute.
The main difficulty is: the errors are not easily reproducible, but may cause severe trouble in the future. Since I'm a perfectionist, I want to fix them after a single occurrence. All errors that I could reproduce are already fixed.
I have a test system and test programs which I run in indefinite loops, but they only produce some errors (or do you have a test program for the case: client DNS mapping changes during transaction, or https certificate expires while reading data?)
The remaining errors have no clear cause, e.g:
"Resource already in use": The application detects that a resource is still in use so it cannot open it, and the client call fails (no exception, ..). As far as I know, there should be no lock on the resource. Which other thread still holds the lock? Or is there no lock owner at all, and it is just an error in the locking system itself?
> 2. Insert as many "System.out" statements as possible. Sometime they
> can help even in case of deadlocks - for example you expected a certain
> statement to be printed but it wasn't. This could be a good indication
> that a deadlock occurred.
The difficulty is: one thread can detect when the problem has occurred, but has nothing to do with it (see the resource-in-use example), so writing log output in this thread does not make much sense. So all other threads have to produce the logging output (they already do; with level=ALL I get about 100kb/s of log info on the server when running high-throughput tests, and the client produces about 800kb in a 30sec transaction). When I have a clue about an error, I work through it, but it is out of the question to enable logging on the production site.
> Just out of curiosity: how many threads are you speaking about ?
With no load 20 (timeout checkers, cache optimizers), then 1 per client and service (if a client needs 3 services in parallel, which is normal, it will start 3 threads).
Razvan Mihaiu - 10 May 2006 20:59 GMT
> The main difficulty is: the errors are not easily replicable, but may
> cause severe troubles in future. Since i'm an perfectionist, I want to
> fix them after a single occurence. All errors that I could replicate
> are already fixed.
If you can do that, please post your technique here. I will surely follow this thread in the future.
Maybe with the tools that you mention you can do something like this. Keep the group informed.
> With no load 20 (timeout checkers, cache optimizers), then 1 per client
> and service (if a client needs 3 services in parallel - which is
> normal, it will start 3 threads).
I have to admit, I have not developed such a complex network application in Java.
Regards, Razvan
halfdog - 11 May 2006 10:14 GMT
I did some more searching and found some methods in the class java.lang.Runtime:
void traceInstructions(boolean on) Enables/Disables tracing of instructions.
void traceMethodCalls(boolean on) Enables/Disables tracing of method calls.
After my "bad event", I'll enable these for some seconds to capture at least some of the other (live) threads, but their data still stays invisible.
It seems that there is no automated way to capture VM state data for debugging. If someone has one, please let me know!
Chris Uppal - 11 May 2006 15:09 GMT
> void traceInstructions(boolean on) Enables/Disables tracing
> of instructions.
> void traceMethodCalls(boolean on) Enables/Disables tracing of
> method calls.
I don't think either of them does anything.
> It seems that there is no automated way to capture vm state data for
> debugging, if someone has one, pls let me know!
You /might/ be able to find something here
http://java.sun.com/j2se/1.5.0/docs/tooldocs/index.html#manage
In particular there's a link to a trouble shooting guide:
http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf
Lastly, the -Xrunhprof option to the java command has some options which /might/ be relevant. Try java -Xrunhprof:help for a start, and then try Google for more explanations
-- chris
halfdog - 11 May 2006 17:18 GMT
Wow, thanks, you put me on the right track. Your link pointed to the j2se tools, which I had never heard of until now, and there it was: ''jsadebugd''. The tool can attach to a running java VM, or to a core dump of a VM, and presents the result via RMI.
> ps aux | grep java
xxx 21849 0.0 13.1 278344 51012 pts/5 S May10 0:07 home/xxx/external_data/java/jdk1.5.0_06/bin/java -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/home/xxx/var/tomcat/devel/conf/logging.properties -Xdebug -Xrunjdwp:transport=dt_socket,address=localhost:33333,server=y,suspend=n -Djava.endorsed.dirs=/home/xxx/external_data/java/apache-tomcat-5.5.12/common/endorsed -classpath :/home/fiedler/external_data/java/apache-tomcat-5.5.12/bin/bootstrap.jar:/home/xxx/external_data/java/apache-tomcat-5.5.12/bin/commons-logging-api.jar -Dcatalina.base=/home/xxx/var/tomcat/devel -Dcatalina.home=/home/xxx/external_data/java/apache-tomcat-5.5.12 -Djava.io.tmpdir=/home/xxx/var/tomcat/devel/temp org.apache.catalina.startup.Bootstrap start
> gdb --pid 21849
Attaching to process 21849
(gdb) generate-core-file java.core
Saved corefile java.core
(gdb) quit
> jsadebugd /home/xxx/external_data/java/jdk1.5.0_06/bin/java java.core DebugServer
Attaching to core java.core from executable /home/xxx/external_data/java/jdk1.5.0_06/bin/java and starting RMI services, please wait...
Debugger attached and RMI services started.
## Now open another console, print a stack trace
> jstack DebugServer@localhost
Attaching to remote server DebugServer@localhost, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0_06-b05
Thread t@ 22418: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=474 (Interpreted frame)
 - xxxxxxxxxxxx.GenericCallDispatcher$CallDispatcherWorkerThread.run() @bci=12, line=388 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)
..... and so on
Currently I'm looking for the RMI interface specification for jsadebugd, to see whether more inspection methods are available. jdb and this server seem to be incompatible, so for now this is just a small step.
PS: Bad(?) luck, Windows users: it seems that this ultra-chaotic, scratch-your-nose-with-your-feet geek tool is only available for Linux.
Chris Uppal - 14 May 2006 10:02 GMT
> PS: Bad? luck Windows users, seems that this ultra-chaotic, scratch
> your nose with your feets-geek tool is only available for linux
Thanks for the follow-up. It does sound like a rather, um, baroque way of getting at the data you need. But if it works, it works....
-- chris
halfdog - 16 May 2006 11:26 GMT
With the help of a guy from javasoft I managed to do it all: debug a dump of a java VM:
$ cat << END > gdb.commands
> gcore tomcat.core
> detach
> quit
> END
$ gdb --pid 26003 -x gdb.commands
$ jsadebugd ..../jdk1.5.0_06/bin/java tomcat.core DebugServer &
$ jdb -connect sun.jvm.hotspot.jdi.SADebugServerAttachingConnector:debugServerName=DebugServer@localhost
You can inspect all the data (threads, stack frames, local variables, objects) just as if you had attached to a live VM; only operations that modify or resume the VM (step, continue, set...) do not work, because a VM core dump is dead.
This is a really strange way to debug a java application, I never thought that it could work that way.
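The gdb step above could also be launched automatically by a small helper process when the application signals the bad state, along the lines of the idea in the first post. A minimal sketch, assuming gdb is on the PATH and allowed to attach to the target process; the class CoreDumpLauncher and its method names are hypothetical, and only the gdb commands themselves (gcore, detach, quit, --pid, -x) come from the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs gdb in batch mode to dump a core of a running process. */
final class CoreDumpLauncher {

    /** Writes the gdb command file used in the thread: gcore, detach, quit. */
    static Path writeGdbScript(Path dir, String coreFileName) throws IOException {
        Path script = dir.resolve("gdb.commands");
        String body = "gcore " + coreFileName + "\ndetach\nquit\n";
        Files.write(script, body.getBytes(StandardCharsets.UTF_8));
        return script;
    }

    /** Builds the argument list for: gdb --pid PID -x SCRIPT */
    static List<String> gdbCommand(long pid, Path script) {
        return Arrays.asList("gdb", "--pid", Long.toString(pid), "-x", script.toString());
    }

    /** Runs gdb with dir as working directory and returns its exit code. */
    static int dumpCore(long pid, Path dir, String coreFileName)
            throws IOException, InterruptedException {
        Path script = writeGdbScript(dir, coreFileName);
        Process gdb = new ProcessBuilder(gdbCommand(pid, script))
                .directory(dir.toFile()) // a relative core file name is resolved here
                .inheritIO()
                .start();
        return gdb.waitFor();
    }
}
```

The target VM is stopped while gdb writes the core, so the time the dump takes counts against any client timeout; the resulting file can then be passed to jsadebugd as shown above.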