Java Forum / Virtual Machine / May 2004

Profiling  / Instruction count

user@domain.invalid - 03 May 2004 21:36 GMT
Hi
Since I'm doing an optimization project for a language that compiles
to the JVM, I'd like to know whether there is an easy way to measure the
gain from my optimizations, i.e. to compare the two compilations with and
without optimizations. How is this done? I don't simply want to measure
time at the OS level, since that would also include the startup time of
the JVM. I thought of writing a Java class, a sort of class loader, that
calls the "main" method of my tests and measures the time before and
after. But this would still measure things like garbage collection.
The other idea I was playing around with is to instrument the code and
count the different JVM instructions. Implementing this is rather
tedious, though, since there are quite a few VM instructions. One would
also need a model of how much time each JVM instruction takes. Does such
a model exist at all, given that it is specific to a particular JVM? Does
anyone have a suggestion?

-Urs
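
A minimal sketch of the harness idea described above: load the test class first, then time only the call to its "main" method from inside the JVM. The names MainTimer, timeMain and SampleWorkload are hypothetical, and System.nanoTime() is assumed to be available (it arrived with Java 5; System.currentTimeMillis() is the coarser alternative). As the post notes, this still includes garbage collection and other VM activity in the measured time.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical harness; class and method names are illustrative only.
final class MainTimer {
    private MainTimer() {}

    /** Invokes className.main(args) reflectively and returns the elapsed nanoseconds. */
    static long timeMain(String className, String[] args) {
        try {
            Class<?> cls = Class.forName(className);            // load and initialize before timing
            Method main = cls.getMethod("main", String[].class);
            long start = System.nanoTime();
            main.invoke(null, (Object) args);                    // static method, so no receiver
            return System.nanoTime() - start;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not run " + className, e);
        }
    }

    public static void main(String[] argv) {
        // Usage: java MainTimer [class with a main method] [args...]
        String target = argv.length > 0 ? argv[0] : SampleWorkload.class.getName();
        String[] rest = argv.length > 1
                ? Arrays.copyOfRange(argv, 1, argv.length)
                : new String[0];
        System.out.println(target + ": " + timeMain(target, rest) + " ns");
    }
}

// Stand-in for a compiled test program; only here so the sketch runs on its own.
final class SampleWorkload {
    static int runs = 0;

    public static void main(String[] args) {
        runs++;
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        if (sum < 0) {
            throw new AssertionError("unexpected overflow"); // uses the result
        }
    }
}
```

Running the same target once with and once without the optimizations gives two numbers to compare; the JVM startup cost stays outside the timed region.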
Urs Keller - 03 May 2004 21:39 GMT
So much for my mail settings ... :-)
Urs Keller - 03 May 2004 23:15 GMT
I found out that the Jikes RVM supports Instruction Counters:
http://www-124.ibm.com/developerworks/oss/jikesrvm/userguide/HTML/userguide_74.html
Does anyone know of a model of how different instructions relate to each
other in terms of execution time? Such a model, combined with the
instruction counters, would make a pretty good performance measurement.

-Urs

> Hi
> Since I'm doing an optimization project for a language, which compiles
[quoted text clipped - 13 lines]
>
> -Urs
Mark Bottomley - 04 May 2004 03:51 GMT
> I found out that the Jikes RVM supports Instruction Counters:
> http://www-124.ibm.com/developerworks/oss/jikesrvm/userguide/HTML/userguide_74.html
> Does anyone know of a model on how different instructions relate to each
> other in terms of execution time? This model combined with the
[quoted text clipped - 19 lines]
> >
> > -Urs

Optimization and optimization measurement on most JVMs bear little
relationship to any optimizations performed by transforming the bytecodes
in a class file. These changes are swamped by the gains from JIT or AOT
compilation. JITs typically speed up execution by an order of magnitude.
JITs are usually absent only on small platforms. Most Java compilers
generate bytecode that is far from optimal; they prefer to be correct and
debuggable. They know that the JIT will improve the areas that are
"hot", and by using execution path monitoring the JIT has more information
to do a better job of compiling the bytecode to native code than an
optimizing compiler with just the source code or just the bytecode.

There are also major variations between JVMs based on the CPU
(e.g. even different Pentium and AMD chips will differ in execution rates)
and based on how they implement the interesting and typically time
consuming bytecodes (e.g. new, invokes and get/putfield/static).

It would be helpful to identify what JVM/platform you are targeting and
what type of programs you are trying to optimize, e.g. database, text
manipulation, numerical, etc. (and the starting language if you wish to
mention it).

Mark...
Urs Keller - 04 May 2004 06:27 GMT
Hi

Suppose there isn't a JIT or AOT compiler available on my platform.
Since this is an optimization project, it still makes sense to assume
this. And you don't want to pass plainly stupid code to the JIT. I guess
it wouldn't optimize away all the things you could do statically.

-Urs

>>I found out that the Jikes RVM supports Instruction Counters:
>
[quoted text clipped - 46 lines]
>
> Mark...
Mark Bottomley - 05 May 2004 01:53 GMT
> Hi
>
[quoted text clipped - 6 lines]
>
> >>I found out that the Jikes RVM supports Instruction Counters:
> >>http://www-124.ibm.com/developerworks/oss/jikesrvm/userguide/HTML/userguide_74.html

> >>Does anyone know of a model on how different instructions relate to each
> >>other in terms of execution time? This model combined with the
[quoted text clipped - 42 lines]
> >
> > Mark...

The types of bytecode you choose to generate will have a major impact on
how the VM performs. I ask because if you are doing mostly math, then the
gains will probably be directly proportional to the number of bytecodes,
unless your platform has difficulties with multiply/divide. If you are
using objects and frequently abandoning them, then new and GC will
dominate. If you are writing highly factored code, then method
invocations could dominate.

You can calibrate any given VM with looping code and the
internal timers. The things to remember are to disable any JIT/AOT, usually
with a command line parameter like -Xint (interpreter only); you
can generate the class files manually using one of several free bytecode
assemblers like Jasmin. You should by preference use the internal Java
timers, but remember to "warm" the VM by running everything at least once
to make sure that all relevant classes are loaded and method invocations
have been resolved, so you're not measuring start-up time. That is, don't
just make other calls to the same routines, but run the exact same
bytecodes: some VMs resolve each bytecode individually and some VMs
resolve a central entry like the constant pool, so unless you know the
internal details of the VM, be careful.
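
A minimal sketch of such a calibration loop, assuming it is launched with the JIT disabled (for example java -Xint LoopCalibration). The class name, the kernel and the iteration count are hypothetical; in practice the kernel would be the bytecode sequence being calibrated, possibly assembled by hand. The warm-up call runs the very same method as the timed call, so class loading and resolution happen before the clock starts.

```java
// Hypothetical calibration loop; names and iteration counts are illustrative only.
final class LoopCalibration {
    static volatile int sink; // results are stored here so the loops are not dead code

    private LoopCalibration() {}

    /** The code being calibrated: one integer multiply and one add per iteration. */
    static int kernel(int iterations) {
        int acc = 1;
        for (int i = 0; i < iterations; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    /** Warms the VM with the same bytecodes, then times a second identical run. */
    static double nanosPerIteration(int iterations) {
        sink = kernel(iterations);                 // warm-up: resolves everything the timed run needs
        long start = System.nanoTime();
        sink = kernel(iterations);                 // timed run
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Run as: java -Xint LoopCalibration
        System.out.println(nanosPerIteration(5_000_000) + " ns per iteration");
    }
}
```

The figure includes the loop overhead itself, so it is best read relative to a second kernel that differs only in the instruction of interest.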

You probably need to calibrate only a small fraction of the bytecodes, as
few are likely to be used and many are rare in the wild. E.g. all the
iconst_x bytecodes are probably identical, all the if_x are probably
identical, all the if_icmpx are probably equal, all logic and/or/xor are
probably identical, and dup_x instructions are extremely rare in the wild
and typically not worth profiling. aload_x/astore_x should be calibrated
separately from iload_x/istore_x, as some GC implementations use the
aload/astore instructions to implement read and/or write barriers, so they
may differ from iload/istore, which just move integers between local
variables and the stack.
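
The grouping above suggests a simple weighted-sum model: estimated cost = sum over classes of (instruction count x calibrated weight). A minimal sketch follows, with hypothetical names (BytecodeCostModel, costClass, estimate) and with the weights left entirely to the caller, since they have to come from calibrating one specific VM. The grouping rules only mirror the examples in the paragraph and are not a complete classification of the instruction set.

```java
import java.util.Map;

// Hypothetical cost model; the grouping follows the examples in the text only.
final class BytecodeCostModel {
    private BytecodeCostModel() {}

    /** Maps a mnemonic to the calibration class whose weight it is assumed to share. */
    static String costClass(String m) {
        if (m.startsWith("iconst_")) return "iconst";
        if (m.startsWith("if_icmp")) return "if_icmp";
        if (m.startsWith("if_acmp")) return "if_acmp";
        if (m.equals("ifnull") || m.equals("ifnonnull")) return "ifnull";
        if (m.startsWith("if")) return "if";                       // ifeq, ifne, iflt, ...
        if (m.equals("iand") || m.equals("ior") || m.equals("ixor")) return "ilogic";
        if (m.startsWith("aload") || m.startsWith("astore")) return "aref";    // kept apart from ints
        if (m.startsWith("iload") || m.startsWith("istore")) return "ilocal";
        return m;                                                  // everything else stands alone
    }

    /** Sums count * weight over all counted mnemonics; weights are keyed by cost class. */
    static double estimate(Map<String, Integer> counts, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String cls = costClass(e.getKey());
            Double w = weights.get(cls);
            if (w == null) {
                throw new IllegalArgumentException("no calibrated weight for class " + cls);
            }
            total += e.getValue() * w;
        }
        return total;
    }
}
```

Feeding the instruction counts of the optimized and unoptimized compilations through estimate with the same weights gives two comparable figures; the model ignores caching and similar effects, so it is only a rough guide for an interpreter-only VM.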

If you tell us the type of programs you are writing, I could be more
helpful.

You can also try downloading the Sable research VM from McGill Univ.
and look at the implementation of the individual byte codes.

Mark...
glen herrmannsfeldt - 07 May 2004 22:54 GMT
> Since I'm doing an optimization project for a language, which compiles
> to the JVM, I like to know, if there is an easy way to measure the gain
[quoted text clipped - 4 lines]
> method of my tests and measures time, before and after. But still this
> would also measure things like garbage collection etc.

I have done timing on x86 machines with a native method to return
the value of the x86 time stamp counter (rdtsc instruction).

There are some questions related to how rdtsc actually works, and
even more in the interaction between it and JVM, with or without JIT,
but it can be done and sometimes helps.

On most x86 machines it is two instructions, rdtsc and ret.  A few
more statements to convince the assembler to assemble it.

-- glen
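
A minimal sketch of the Java side of such a native method. The class name TickCounter and the library name "rdtsc" are assumptions; the native library has to be written and built separately, with a body that is essentially the rdtsc instruction followed by a return, as described above. So that the sketch runs without the library, it falls back to System.nanoTime(), which counts nanoseconds and not clock cycles.

```java
// Hypothetical JNI wrapper; the library name "rdtsc" and all identifiers are assumptions.
final class TickCounter {
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private TickCounter() {}

    private static boolean tryLoad() {
        try {
            System.loadLibrary("rdtsc");   // a separately built native library
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;                  // library missing: use the fallback timer
        }
    }

    /** Implemented in native code; expected to return the x86 time stamp counter. */
    private static native long rdtsc();

    /** True when the native library was found and ticks() reads the time stamp counter. */
    static boolean isNative() {
        return NATIVE_AVAILABLE;
    }

    /** Time stamp counter if available, otherwise System.nanoTime() as a stand-in. */
    static long ticks() {
        return NATIVE_AVAILABLE ? rdtsc() : System.nanoTime();
    }

    public static void main(String[] args) {
        long start = ticks();
        long end = ticks();
        System.out.println((isNative() ? "cycles" : "nanoseconds") + " between two reads: " + (end - start));
    }
}
```

The difference between two back-to-back reads gives the overhead of the call itself, which is worth subtracting when timing very short sections.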
Roedy Green - 08 May 2004 00:32 GMT
>There are some questions related to how rdtsc actually works, and
>even more in the interaction between it and JVM, with or without JIT,
>but it can be done and sometimes helps.

Just google RDTSC and you will learn more than you wanted to know.

The Pentium CPU has a counter that increments by one on every clock
cycle.  It gets set to 0 on boot.  This gives you a very fine-grained
clock for measuring relative time, though it is not suitable for
measuring wall clock time, since the crystal that runs the CPU is not
particularly well calibrated and you don't know what it is calibrated to
without an experiment comparing RDTSC with traditional timers.

Whether the cpu is working for you, or somebody else, or idling in a
loop waiting for some task to come ready, it is humming away.  I don't
think there is any situation where the Pentium goes into a wait state
like the old 360s where it would stop incrementing.

You can use it to measure something of very short duration accurately,
such as a single JNI call.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
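
The comparison experiment mentioned above can be sketched as follows: read the tick counter and a conventional timer at both ends of a fixed window and divide. The names TickRateEstimator and ticksPerSecond are hypothetical, and the tick source is passed in as a parameter because the thread does not define one; here System.nanoTime() serves as the conventional timer.

```java
import java.util.function.LongSupplier;

// Hypothetical calibration experiment; all identifiers are illustrative only.
final class TickRateEstimator {
    private TickRateEstimator() {}

    /** Estimates how many ticks of tickSource elapse per second of System.nanoTime(). */
    static double ticksPerSecond(LongSupplier tickSource, long windowNanos) {
        long n0 = System.nanoTime();
        long t0 = tickSource.getAsLong();
        long n1;
        do {
            n1 = System.nanoTime();        // busy-wait until the window has elapsed
        } while (n1 - n0 < windowNanos);
        long t1 = tickSource.getAsLong();
        return (t1 - t0) * 1e9 / (n1 - n0);
    }

    public static void main(String[] args) {
        // With nanoTime as its own tick source the result should be close to 1e9.
        System.out.println(ticksPerSecond(System::nanoTime, 50_000_000L) + " ticks/s");
    }
}
```

With a real cycle counter as the tick source, the result is an estimate of the clock rate, which is what is needed to convert cycle counts into wall clock time.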
glen herrmannsfeldt - 08 May 2004 22:04 GMT
>>There are some questions related to how rdtsc actually works, and
>>even more in the interaction between it and JVM, with or without JIT,
>>but it can be done and sometimes helps.

> just google RDTSC and you will learn more than you wanted to know.

I have never worried about it, but some have said that there are
problems with using it.  Considering that the processors can do
out of order execution, it might be that you can't accurately
measure single instructions, but you can't do that for Java,
anyway.

It isn't perfect, but it can be used, and is sometimes useful
for timing small sections of code.

RDTSC returns its value in EDX and EAX, just the registers that
most compilers use to return 64 bit function values in.

-- glen
Roedy Green - 09 May 2004 03:08 GMT
>I have never worried about it, but some have said that there are
>problems with using it.  Considering that the processors can do
>out of order execution, it might be that you can't accurately
>measure single instructions, but you can't do that for Java,
>anyway.

It does not really matter if they do out of order execution.  It is
repeatable.  The count you get is both an accurate count of ticks of
time and of CPU work.  Some instructions take differing numbers of
cycles depending on the operands.

The problem, though, is that there is so much other stuff going on in
the background in your OS that can throw off your measurements, since it
shows up in the counts too.
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
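
One common way to reduce the effect of background activity, not proposed in the thread itself, is to repeat the measurement and report the minimum or the median: interference can only add time, so the smallest sample is the least disturbed one. A minimal sketch with hypothetical names (RepeatedTiming, sample, min, median), using System.nanoTime() as the timer:

```java
import java.util.Arrays;

// Hypothetical helper for repeated measurements; all identifiers are illustrative only.
final class RepeatedTiming {
    private RepeatedTiming() {}

    /** Times task the given number of times and returns the elapsed nanoseconds, sorted ascending. */
    static long[] sample(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        Arrays.sort(elapsed);
        return elapsed;
    }

    /** Smallest sample of an ascending array: the run with the least interference. */
    static long min(long[] sorted) {
        return sorted[0];
    }

    /** Middle sample of an ascending array (the upper middle one for even lengths). */
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        long[] times = sample(() -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new AssertionError("unexpected overflow"); // uses the result
            }
        }, 21);
        System.out.println("min " + min(times) + " ns, median " + median(times) + " ns");
    }
}
```

A large gap between the minimum and the median is itself a hint that the machine was busy during the measurement.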
glen herrmannsfeldt - 12 May 2004 09:01 GMT
(snip regarding RDTSC)

> It does not really matter if they do out of order execution.  It is
> repeatable.  The count you get is both an accurate count of ticks of
> time and of CPU work.  Some instructions take differing numbers of
> cycles depending on the operands.

> The problem, though, is that there is so much other stuff going on in
> the background in your OS that can throw off your measurements, since it
> shows up in the counts too.

I agree.

I mentioned RDTSC in another post, likely in another newsgroup, some
time ago, and someone replied that the times wouldn't be accurate,
even without background events.  It might be that they aren't,
but they are close enough for what I wanted to use them for.

Because of those comments, I wanted to slightly qualify my
suggestions about using RDTSC.   Like all tools, it should be
used carefully, but I do think it can be useful.

-- glen

