Java Forum / Virtual Machine / May 2004
Profiling / Instruction count
user@domain.invalid - 03 May 2004 21:36 GMT Hi
I'm doing an optimization project for a language that compiles to the JVM, and I'd like to know whether there is an easy way to measure the gain from my optimizations, i.e. to compare the two compilations with and without optimizations. How is this usually done? I don't simply want to measure time at the OS level, since that would also include the startup time of the JVM. I thought of writing a Java class, a sort of class loader, that calls the "main" method of my tests and measures the time before and after. But this would still measure things like garbage collection. The other idea I was playing around with is to instrument the code and count the different JVM instructions. Implementing this is rather tedious, though, since there are quite a lot of VM instructions. One would also need a model of how much time each JVM instruction takes. Does such a model exist at all, given that it is specific to a particular JVM? Does anyone have a suggestion?
-Urs
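A minimal sketch of the harness idea from the post above, assuming the test program is available as a class with a static main method. MainTimer and SampleWorkload are hypothetical names, and System.nanoTime only exists from Java 1.5 on; on an older VM, System.currentTimeMillis() would take its place. It does not remove garbage collection from the measurement; it only warms the VM first and keeps the fastest of several runs.

```java
import java.lang.reflect.Method;

// Illustrative only: MainTimer and SampleWorkload are hypothetical names.
final class MainTimer {

    /** Invokes the static main(String[]) of cls once; returns elapsed nanoseconds. */
    static long timeMain(Class<?> cls, String[] args) {
        try {
            Method main = cls.getMethod("main", String[].class);
            long start = System.nanoTime();
            main.invoke(null, (Object) args); // static method, so no receiver
            return System.nanoTime() - start;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not run main of " + cls.getName(), e);
        }
    }

    /** Runs untimed warm-up calls, then returns the fastest of the timed runs. */
    static long bestOf(Class<?> cls, String[] args, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            timeMain(cls, args); // lets classes load and call sites resolve
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            System.gc(); // only a hint; a collection may still land inside a run
            best = Math.min(best, timeMain(cls, args));
        }
        return best;
    }
}

// Stand-in for a compiled test program.
final class SampleWorkload {
    static int calls;
    static long result;

    public static void main(String[] args) {
        calls++;
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += i % 7;
        }
        result = sum;
    }
}
```

A call such as `MainTimer.bestOf(SampleWorkload.class, new String[0], 3, 10)` would be made once for the optimized and once for the unoptimized compilation, ideally in separate JVM runs so that one does not warm the VM for the other.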
Urs Keller - 03 May 2004 21:39 GMT So much for my mail settings ... :-)
Urs Keller - 03 May 2004 23:15 GMT I found out that the Jikes RVM supports Instruction Counters: http://www-124.ibm.com/developerworks/oss/jikesrvm/userguide/HTML/userguide_74.html Does anyone know of a model on how different instructions relate to each other in terms of execution time? This model combined with the instruction counters, would make a pretty good performance measurement.
-Urs
> Hi
> Since I'm doing an optimization project for a language, which compiles
[quoted text clipped - 13 lines]
>
> -Urs
Mark Bottomley - 04 May 2004 03:51 GMT
> I found out that the Jikes RVM supports Instruction Counters: http://www-124.ibm.com/developerworks/oss/jikesrvm/userguide/HTML/userguide_74.html
> Does anyone know of a model on how different instructions relate to each
> other in terms of execution time? This model combined with the
[quoted text clipped - 19 lines]
> >
> > -Urs
Optimization and optimization measurement on most JVMs bear little relationship to any optimizations performed by transforming the bytecodes in a class file. Such changes are swamped by the gains from a JIT or AOT compiler. JITs typically speed up execution by an order of magnitude, and the only places JITs aren't found are usually small platforms. Most Java compilers generate bytecode that is not very optimal; they prefer to be correct and debuggable. They know that the JIT will improve the areas that are "hot", and by using execution path monitoring the JIT has more information to do a better job of compiling the bytecode to native code than an optimizing compiler that has just the source code or just the bytecode.
There are also major variations between JVMs based on the CPU (e.g. even different Pentium and AMD chips will differ in execution rates) and based on how they implement the interesting and typically time consuming bytecodes (e.g. new, invokes and get/putfield/static).
It would be helpful to identify what JVM/platform you are targeting and what type of programs you are trying to optimize, e.g. database, text manipulation, numerical, etc. (and the starting language, if you wish to mention it).
Mark...
Urs Keller - 04 May 2004 06:27 GMT Hi
Suppose there isn't a JIT or AOT compiler available on my platform. Since this is an optimization project, it still makes sense to assume this. And you don't want to pass plainly stupid code to the JIT; I guess it wouldn't optimize away all the things you could do statically.
-Urs
>>I found out that the Jikes RVM supports Instruction Counters:
>
[quoted text clipped - 46 lines]
>
> Mark...
Mark Bottomley - 05 May 2004 01:53 GMT
> Hi
>
[quoted text clipped - 6 lines]
> >
> >>I found out that the Jikes RVM supports Instruction Counters: http://www-124.ibm.com/developerworks/oss/jikesrvm/userguide/HTML/userguide_74.html
> >>Does anyone know of a model on how different instructions relate to each
> >>other in terms of execution time? This model combined with the
[quoted text clipped - 42 lines]
> >
> > Mark...
The types of bytecode you choose to generate will have a major impact on how the VM performs. I ask because if you are doing mostly math, then the gains will probably be directly proportional to the number of bytecodes, unless your platform has difficulties with multiply/divide. If you are using objects and frequently abandoning them, then new and GCs will dominate. If you are writing highly factored code, then method invocations could dominate.
You can calibrate any given VM with looping code and the internal timers. The first thing to remember is to disable any JIT/AOT, usually with a command line parameter like -Xint (interpreter only). You can generate the class files manually using one of several free bytecode assemblers like Jasmin. You should by preference use the internal Java timers, but remember to "warm" the VM by running everything at least once, so that all relevant classes are loaded and method invocations have been resolved and you're not measuring start-up time. That is, don't just make other calls to the same routines, but run the exact same bytecodes: some VMs resolve each bytecode individually and some resolve a central entry like the constant pool, so unless you know the internal details of the VM, be careful.
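A rough sketch of such a calibration loop, written in Java source rather than Jasmin, so the exact bytecodes are whatever the Java compiler emits (for `acc *= 31` that is roughly a load, a push, an imul and a store). The class name BytecodeCalibration is hypothetical, System.nanoTime assumes Java 1.5 or later, and the figures only mean something when the program is run with the JIT disabled, e.g. `java -Xint BytecodeCalibration`.

```java
// Illustrative only: a rough per-operation calibration, meant to be run
// with the JIT disabled (for example: java -Xint BytecodeCalibration).
final class BytecodeCalibration {
    static int sink; // results are stored here so the loops have a visible effect

    /** Baseline loop: loop control plus one integer add per iteration. */
    static long timeBaseline(int n) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    /** Same loop with one extra multiply statement per iteration. */
    static long timeWithMultiply(int n) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
            acc *= 31; // the operation being calibrated
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    /** Estimated nanoseconds per extra multiply statement, best of several rounds. */
    static double nanosPerMultiply(int n, int rounds) {
        timeBaseline(n);      // warm-up: run the exact code that will be measured
        timeWithMultiply(n);
        long bestBase = Long.MAX_VALUE;
        long bestMul = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            bestBase = Math.min(bestBase, timeBaseline(n));
            bestMul = Math.min(bestMul, timeWithMultiply(n));
        }
        return (double) (bestMul - bestBase) / n;
    }

    public static void main(String[] args) {
        System.out.println(nanosPerMultiply(5_000_000, 5) + " ns per multiply statement");
    }
}
```

Subtracting the baseline removes the cost of the loop itself, and taking the best of several rounds reduces the influence of background activity. With a JIT enabled the difference can come out close to zero or even negative, which is why the interpreter-only switch matters here.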
You probably need to calibrate only a small fraction of the bytecodes, as few are likely to be used and many are rare in the wild. E.g. all the iconst_x bytecodes are probably identical, all the if_x are probably identical, all the if_icmpx are probably equal, all logic and/or/xor are probably identical, and dup_x instructions are extremely rare in the wild and typically not worth profiling. aload_x/astore_x should be calibrated separately from iload_x/istore_x, as some GC implementations use the aload/astore instructions to implement read and/or write blocks, so they may differ from iload/istore, which just move integers between local variables and the stack.
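The grouping above, combined with per-instruction counts such as an instruction counter would produce, can be sketched as a simple weighted sum. Everything here is an assumption for illustration: the class name InstructionCostModel, the group names, and above all the cost figures, which would have to come from calibrating the particular VM.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: groups bytecodes that are assumed to cost the same and
// turns instruction counts into an estimated total cost.
final class InstructionCostModel {

    /** Maps a bytecode mnemonic to the calibration group assumed to share one cost. */
    static String groupOf(String mnemonic) {
        if (mnemonic.startsWith("iconst_")) return "iconst";
        if (mnemonic.startsWith("if_icmp")) return "if_icmp";
        if (mnemonic.startsWith("if_acmp")) return "if_acmp";
        if (mnemonic.startsWith("if")) return "if"; // ifeq, ifne, iflt, ifnull, ...
        if (mnemonic.equals("iand") || mnemonic.equals("ior") || mnemonic.equals("ixor")) {
            return "ilogic";
        }
        if (mnemonic.startsWith("aload")) return "aload";   // kept apart from iload
        if (mnemonic.startsWith("astore")) return "astore"; // kept apart from istore
        if (mnemonic.startsWith("iload")) return "iload";
        if (mnemonic.startsWith("istore")) return "istore";
        return mnemonic; // anything else forms its own group
    }

    /** Sums count * calibrated cost over all counted instructions. */
    static double estimate(Map<String, Long> counts, Map<String, Double> costPerGroup) {
        double total = 0.0;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            String group = groupOf(e.getKey());
            Double cost = costPerGroup.get(group);
            if (cost == null) {
                throw new IllegalArgumentException("no calibrated cost for group " + group);
            }
            total += cost * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("iconst_0", 2L);
        counts.put("iconst_1", 3L);
        counts.put("imul", 4L);

        Map<String, Double> costs = new HashMap<>(); // placeholder figures, not measurements
        costs.put("iconst", 1.0);
        costs.put("imul", 2.5);

        System.out.println("estimated cost: " + estimate(counts, costs));
    }
}
```

A linear model like this ignores object allocation, GC and invocation overhead, which are likely to dominate for object-heavy or highly factored code, so it is at best a first approximation for arithmetic-heavy programs on an interpreter-only VM.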
If you tell us the type of programs you are writing, I could be more helpful.
You can also try downloading the Sable research VM from McGill Univ. and look at the implementation of the individual byte codes.
Mark...
glen herrmannsfeldt - 07 May 2004 22:54 GMT
> Since I'm doing an optimization project for a language, which compiles
> to the JVM, I like to know, if there is an easy way to measure the gain
[quoted text clipped - 4 lines]
> method of my tests and measures time, before and after. But still this
> would also measure things like garbage collection etc.
I have done timing on x86 machines with a native method to return the value of the x86 time stamp counter (rdtsc instruction).
There are some questions related to how rdtsc actually works, and even more in the interaction between it and JVM, with or without JIT, but it can be done and sometimes helps.
On most x86 machines it is two instructions, rdtsc and ret. A few more statements to convince the assembler to assemble it.
-- glen
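The Java side of such a native method might look like the sketch below. The class name CycleCounter and the library name "rdtsc" are assumptions made for illustration; the native half (the rdtsc and ret instructions plus whatever glue JNI needs) is not shown. So that the class still works where no such library is installed, it falls back to System.nanoTime, which returns nanoseconds rather than cycles.

```java
// Illustrative only: Java-side wrapper for a hypothetical native rdtsc library.
final class CycleCounter {
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary("rdtsc"); // hypothetical library name
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // library missing or loading not permitted
        }
    }

    // Would be implemented in native code by reading the time stamp counter.
    private static native long rdtsc();

    /** True when readings come from the native counter rather than the fallback. */
    static boolean usingNative() {
        return NATIVE_AVAILABLE;
    }

    /**
     * Returns the time stamp counter if the native library was loaded,
     * otherwise System.nanoTime() (nanoseconds, not cycles).
     */
    static long ticks() {
        return NATIVE_AVAILABLE ? rdtsc() : System.nanoTime();
    }
}
```

Typical use is to read `CycleCounter.ticks()` before and after a small section of code and subtract; the cost of the JNI call itself is part of every reading and would need to be measured and subtracted for very short sections.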
Roedy Green - 08 May 2004 00:32 GMT
>There are some questions related to how rdtsc actually works, and
>even more in the interaction between it and JVM, with or without JIT,
>but it can be done and sometimes helps.
Just google RDTSC and you will learn more than you wanted to know.
The Pentium CPU has a counter that increments by one on every clock cycle. It gets set to 0 on boot. This gives you a very fine-grained clock for measuring relative time, though it is not suitable for measuring wall clock time, since the crystal that runs the CPU is not particularly well calibrated, and you don't know what it is calibrated to without an experiment to compare RDTSC with the traditional timers.
Whether the cpu is working for you, or somebody else, or idling in a loop waiting for some task to come ready, it is humming away. I don't think there is any situation where the Pentium goes into a wait state like the old 360s where it would stop incrementing.
You can use it to accurately measure something of very short duration, such as a single JNI call.
-- Canadian Mind Products, Roedy Green. Coaching, problem solving, economical contract programming. See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
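The comparison experiment mentioned above can be sketched as follows, assuming the tick counter is reachable from Java as a LongSupplier (for instance a wrapper around a native rdtsc method). TickCalibration is a hypothetical name, and System.nanoTime stands in for the "traditional timer".

```java
import java.util.function.LongSupplier;

// Illustrative only: estimates how fast a tick counter runs by comparing it
// against System.nanoTime over a short busy-wait interval.
final class TickCalibration {

    /** Returns the estimated number of ticks per second of tickSource. */
    static double ticksPerSecond(LongSupplier tickSource, long sampleMillis) {
        if (sampleMillis <= 0) {
            throw new IllegalArgumentException("sampleMillis must be positive");
        }
        long sampleNanos = sampleMillis * 1_000_000L;
        long wallStart = System.nanoTime();
        long tickStart = tickSource.getAsLong();
        long wallNow;
        do {
            wallNow = System.nanoTime(); // busy-wait so the CPU stays on this task
        } while (wallNow - wallStart < sampleNanos);
        long tickEnd = tickSource.getAsLong();
        long wallEnd = System.nanoTime();
        return (tickEnd - tickStart) * 1e9 / (wallEnd - wallStart);
    }
}
```

The two clocks are not read at exactly the same instant, so the result carries a small error that shrinks as the sample interval grows; repeating the measurement and comparing results gives a feel for how stable the estimate is.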
glen herrmannsfeldt - 08 May 2004 22:04 GMT
>>There are some questions related to how rdtsc actually works, and
>>even more in the interaction between it and JVM, with or without JIT,
>>but it can be done and sometimes helps.
> just google RDTSC and you will learn more than you wanted to know.
I have never worried about it, but some have said that there are problems with using it. Considering that the processors can do out of order execution, it might be that you can't accurately measure single instructions, but you can't do that for Java, anyway.
It isn't perfect, but it can be used, and is sometimes useful for timing small sections of code.
RDTSC returns its value in EDX and EAX, just the registers that most compilers use to return 64 bit function values in.
-- glen
Roedy Green - 09 May 2004 03:08 GMT
>I have never worried about it, but some have said that there are
>problems with using it. Considering that the processors can do
>out of order execution, it might be that you can't accurately
>measure single instructions, but you can't do that for Java,
>anyway.
It does not really matter if they do out of order execution. It is repeatable. The count you get is both an accurate count of ticks of time and of CPU work. Some instructions take differing numbers of cycles depending on the operands.
The problem, though, is that there is so much other stuff going on in the background in your OS that can throw off your measurements, since it shows up in the counts too.
-- Canadian Mind Products, Roedy Green. Coaching, problem solving, economical contract programming. See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
glen herrmannsfeldt - 12 May 2004 09:01 GMT (snip regarding RDTSC)
> It does not really matter if they do out of order execution. It is > repeatable. the count you get is both an accurate count of ticks of > time, and cpu work. Some instructions take differing numbers of > cycles depending on the operands.
> The problem is though there is so much other stuff going on in the
> background in your OS that can throw of your measures, since they show
> up in the counts too.
I agree.
I mentioned RDTSC in another post, likely in another newsgroup, some time ago, and someone replied that the times wouldn't be accurate, even without background events. It might be that they aren't, but they are close enough for what I wanted to use them for.
Because of those comments, I wanted to slightly qualify my suggestions about using RDTSC. Like all tools, it should be used carefully, but I do think it can be useful.
-- glen