Java Forum / Virtual Machine / September 2003
Java compiler with inline bytecode?
Will Clark - 03 Sep 2003 16:52 GMT Does anyone know of a java compiler that allows inline "java assembly mnemonics" to be included in amongst the standard java code?
Similar to the inline asm keyword in C/C++ compilers...
Cheers :o)
Will
Roedy Green - 03 Sep 2003 17:55 GMT >Does anyone know of a java compiler that allows inline "java assembly >mnemonics" to be included in amongst the standard java code? > >Similar to the inline asm keyword in C/C++ compilers... Since Java is multiplatform, for which platform would the inline assembler apply? The way to do that is to write some JNI in C/C++ and use the inline assembler features of that language.
Perhaps you meant inline byte code. There is not as much call for it since you can generate pretty well anything you want with high level Java anyway. There is not much you can do in the way of optimisation. However if you wanted to do that, see http://mindprod.com/jgloss/jasm.html. That is still not inline.
-- Canadian Mind Products, Roedy Green. Coaching, problem solving, economical contract programming. See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Will Clark - 03 Sep 2003 20:47 GMT Yes, indeed what I mean is inline bytecode (as in what java compiles to) because, while there may be little that can be achieved in terms of optimisation writing directly in bytecode instead of the more structured java language, I have a couple of really tight loops that can really benefit from being coded directly in bytecode instead of compiled. Java does tend to do things a long way round when it comes to certain complex repetitive manipulations.
Also, there are various shortcomings to the java language itself which can be overcome by using jasmin (or whatever) and hand-coding your own routine.
I currently have various routines coded like this, and because I cannot spend the time recoding 2000 lines of java into bytecode (or for that matter trawl through the disassembled code) I have a specialised class containing all the hand-coded functions which is then assembled using Jasmin.
It would be nice (as a feature) to have a Java compiler that also allows inline assembly (like C/C++ compilers do). It would also allow my specialised functions to be part of the main class file, rather than an appendage (which is not tidy).
It would still be cross-platform as the java assembly mnemonics are cross-platform in themselves.
Perhaps when I get a moment, I'll have to write my own! :o)
Will
> >Does anyone know of a java compiler that allows inline "java assembly > >mnemonics" to be included in amongst the standard java code? [quoted text clipped - 15 lines] > Coaching, problem solving, economical contract programming. > See http://mindprod.com/jgloss/jgloss.html for The Java Glossary. Will Clark - 03 Sep 2003 21:13 GMT (reposted because the original seems to have disappeared into the ether!)
Yes, indeed what I mean is inline bytecode (as in what java compiles to) because, while there may be little that can be achieved in terms of optimisation writing directly in bytecode instead of the more structured java language, I have a couple of really tight loops that can really benefit from being coded directly in bytecode instead of compiled. Java does tend to do things a long way round when it comes to certain complex repetitive manipulations.
Also, there are various shortcomings to the java language itself which can be overcome by using jasmin (or whatever) and hand-coding your own routine.
I currently have various routines coded like this, and because I cannot spend the time recoding 2000 lines of java into bytecode (or for that matter trawl through the disassembled code) I have a specialised class containing all the hand-coded functions which is then assembled using Jasmin.
It would be nice (as a feature) to have a Java compiler that also allows inline assembly (like C/C++ compilers do). It would also allow my specialised functions to be part of the main class file, rather than an appendage (which is not tidy).
It would still be cross-platform as the java assembly mnemonics are cross-platform in themselves.
Perhaps when I get a moment, I'll have to write my own! :o)
Will
> >Does anyone know of a java compiler that allows inline "java assembly > >mnemonics" to be included in amongst the standard java code? [quoted text clipped - 15 lines] > Coaching, problem solving, economical contract programming. > See http://mindprod.com/jgloss/jgloss.html for The Java Glossary. Mark Bottomley - 04 Sep 2003 00:22 GMT Will:
The problem with inlining bytecodes is that it can lead to unverifiable code very easily. The normal solution is to let the JIT compile the hot code as it will do a better job (faster than any bytecode manipulations) unless you are on a platform sans JIT. JITs are probably why the java compilers are poor at performing optimizations - they are generally unnecessary and optimized code may be sufficiently obscure as to make the JIT design more difficult.
Mark...
> (reposted because the original seems to have disappeared into the ether!) > [quoted text clipped - 45 lines] > > Coaching, problem solving, economical contract programming. > > See http://mindprod.com/jgloss/jgloss.html for The Java Glossary. Roedy Green - 04 Sep 2003 00:28 GMT > The problem with inlining bytecodes is that it can lead to unverifiable >code very easily. Do you mean? 1. you can inadvertently generate illegal code, code that if allowed to run would get things like stack underflow?
2. The programs you can generate are too complicated. The byte code verifier can't decide if they are safe or not?
3. That you can generate legal code that would run safely, but that looks bad to the byte code verifier?
-- Canadian Mind Products, Roedy Green. Coaching, problem solving, economical contract programming. See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Mark Bottomley - 04 Sep 2003 01:22 GMT Roedy:
I would assume that the java compiler with the byte code inlining would do some preliminary checking for valid bytecodes and their parameters (constant pool references and relative branch offsets).
As for errors, I mean 1 and 3. - you can easily generate bytecode sequences that are unverifiable. The java verifier does not accept all correct programs - only the verifiable subset of correct programs. The sequences may be illegal in that you do any of the things illegal in java class files including stack underflow, exceeding max stack depth for a method, accessing non-existant local variables, accessing halves of long/double local variables, accessing local variables or stack elements as incorrect types e.g. iadd two object references, creating a loop where the stack is not equivalent on different execution paths, and many other verification limitations. Verifiers do not have trouble with complicated code, but the restrictions on valid code may be challenging to the average Java programmer, many of whom have not seen bytecodes.
The solution would be to execute the class with -noverify on the command line, but you give up the protections that many people desire from Java.
Mark...
> > The problem with inlining bytecodes is that it can lead to unverifiable > >code very easily. [quoted text clipped - 14 lines] > Coaching, problem solving, economical contract programming. > See http://mindprod.com/jgloss/jgloss.html for The Java Glossary. Roedy Green - 04 Sep 2003 02:56 GMT > The solution would be to execute the class with -noverify on the command >line, but you give up the protections that many people desire from Java. Yes, it is very like coding in forth, without a safety net.
What you more likely going to do is generate byte codes from some high level information, not actually hand code each routine in byte code.
Applications I thought would be interesting for this treatment: 1. parser 2. regex implementation 3. spreadsheet bean.
-- Canadian Mind Products, Roedy Green. Coaching, problem solving, economical contract programming. See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Chris Smith - 04 Sep 2003 14:18 GMT > Also, there are various shortcomings to the java language itself which can > be overcome by using jasmin (or whatever) and hand-coding your own routine. [quoted text clipped - 3 lines] > trawl through the disassembled code) I have a specialised class containing > all the hand-coded functions which is then assembled using Jasmin. I'm very interested as to what you found to to be faster or easier using bytecode than plain Java.
The generally accepted wisdom is that there's not much that can be done to optimize bytecode instructions... Sun used to have a -O option on their compiler to attempt bytecode optimizations, but they disabled it when they discovered that the generated code actually ran slower than "unoptimized" bytecode. So if you've found a way to optimize bytecode that really gives a performance improvement, that would be unusual. I'd be interested to see how you've done the benchmark, and what code provides those results.
My worry is that you've looked at generated bytecode and made the assumption that there's a vaguely linear relationship between the number of bytecode instructions in a code segment. That's generally not even remotely true... since bytecode is recompiled into an optimized native machine code before it's used, this exercise would be like determining the efficiency of a C program by counting the lines of code.
Truly optimizing bytecode would require that you understand the optimization and structure recognition heuristics of your JIT compiler, and try to write bytecode that's more easily recognizable. Since JITs are tweaked to work well with common Java language compilers like javac and jikes, they are by definition at the leading edge of writing efficient bytecode. As I mentioned above, Sun actually managed to achieve greater performance by disabling their bytecode optimization so that they would generate more plain, repetitive, and predictable code that the JIT can more easily recognize and classify as a structural component. You'd have to find some unintentional quirk of the JIT compiler to beat these vanilla compilers.
I'm also having trouble imagining any kind of functional "shortcoming" in Java that would make it easier to express an algorithm in verifiable bytecode. And bytecode certainly doesn't make anything more readable. So what exactly do you mean?
> It would be nice (as a feature) to have a Java compiler that also allows > inline assembly (like C/C++ compilers do). It would also allow my > specialised functions to be part of the main class file, rather than an > appendage (which is not tidy). Well, first I'll say that because of the portability aspects of Java and the desires of Sun in this regard, it's conventional to be very picky about defining what constitutes Java code. Specifically, the hypothetical language that you propose above is not Java. You could call is JavASM or something like that...
I see no reason that this would be impossible. You've have to find some way to express the impact of this bytecode segment on the method metadata (for example, stack depth), but that shouldn't be too hard: you could adopt a Jasmin-like syntax, or (better) you could have your compiler do a sort of pre-validation to determine max stack depth and ensure that there isn't a stack underflow within the code that could cause it to view contents placed on the stack by the Java code sections. You'd probably have the compiler run a verifier against the block anyway, since I'd consider it bad form for a compiler to agree to output invalid bytecode. You'd want each "asm" block to verify separately so as to avoid dependencies on the Java bytecode generation of the intervening Java code segments, so if you were hoping for 'goto' between two ASM blocks, you probably want to give that up.
Other than that, yeah it seems feasible. I still don't understand why (other than as an interesting toy) you would want such a thing.
 Signature www.designacourse.com The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Will Clark - 04 Sep 2003 16:23 GMT Hi Chris,
Thankyou for your long reply. It is certainly true that I can be mistaken for assuming that the number of lines of bytecode in some way correlates to the speed at which it executes. However, in a few (and it is a few) cases, it can be true. While these optimisations do little on their own, the combined effect of calling such an algorithm using these (and other such) optimisations over 10,000,000 times can be quite substantial. These optimisations, to be fair, that I have written do not do miracles with the speed, but will however reduce the overall time by up to half a minute - bearing in mind it can take up to 10 to complete a series of computations! Any optimisation at all is welcome, at this level!
The optimisations/customisations have a couple of characteristics:
1. In some places, it is merely to choose the correct function that runs... for example
switch (test) { case 1: a = method1(b,c,d); break; case 2: a = method2(b,c,d); break; case 3: a = method3(b,c,d); break; }
As you can see, the end result and the calling parameters are the same, just the function called is based on test... therefore, the optimisation basically places all the parameters on the stack, then calls the correct function based on test, and then at the end stores the result. Effectively, the switch functionality is on the method name alone.
You may say, "sure I understand how that may optimise the classfile in terms of size, but not speed?" but for some reason (perhaps some different JIT optimisation is used) it does run that little bit faster.
2. The next one is a little contentious! It involves calling two functions sequentially where the parameters are the same. Now Java sometimes does an optimisation that duplicates the parameter on the stack so that it isn't loaded onto the stack again for the second function. Sometimes it does not (and I haven't particularily had the time to pin down why and when!). But suffice to say, this is the second of the main characteristics.
3. The above one can be extended to include cases where the after the first function, the parameter is modified, but the second function wants to use the original parameter (ie unmodified). Now Java will allow something like:
int a = param; function1(a); param += function3(); function2(a,param);
but this uses extra local variables (which is slower than duplicating the value on the stack and using it again).
4. Finally, and most dubious of all!, using different invoke instructions to bypass the inheritance rules of the Java language. I posted this in another message to java.lang.programmer on 27/08/2003 under the title "Re: quick inheritance question".
Not all these necessarily are great optimisations, but they do help keep the code size down (memory limitations!) and improve the performance slightly.
Its all a bit convoluted! I'd love to show you all the code, and take you through it, but I don't have the priviledge of being able to do that :o)
Thanks for your comments tho'... I'll let you know if I do end up creating my own java-esque compiler!
Will
> > Also, there are various shortcomings to the java language itself which can > > be overcome by using jasmin (or whatever) and hand-coding your own routine. [quoted text clipped - 67 lines] > Other than that, yeah it seems feasible. I still don't understand why > (other than as an interesting toy) you would want such a thing. Roedy Green - 04 Sep 2003 17:24 GMT > int a = param; > function1(a); [quoted text clipped - 3 lines] >but this uses extra local variables (which is slower than duplicating the >value on the stack and using it again). This I would expect any decent optimising compiler catch on its own since there is no way "a" could have changed between the two uses.
You might like to experiment with Jet on your Java code and byte code to see if it makes any difference. See http://mindprod.com/jgloss/jet.html. There is a free version. The only catch is you are not allowed to distribute code generated by it. -- Canadian Mind Products, Roedy Green. Coaching, problem solving, economical contract programming. See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Michael Amling - 04 Sep 2003 17:50 GMT > 2. The next one is a little contentious! It involves calling two functions > sequentially where the parameters are the same. Now Java sometimes does an > optimisation that duplicates the parameter on the stack so that it isn't > loaded onto the stack again for the second function. Sometimes it does not > (and I haven't particularily had the time to pin down why and when!). But > suffice to say, this is the second of the main characteristics. If the argument values depend only on constants and local variables, this is safe. But if the argument values depend on nonfinal instance or static variables, then the compiler should not assume they have the same values for the second call.
> 3. The above one can be extended to include cases where the after the first > function, the parameter is modified, but the second function wants to use [quoted text clipped - 4 lines] > param += function3(); > function2(a,param); Assuming "param" is local, or at least not visible to functions 1, 2 or 3, you can recode this as function1(param); function2(param, param+=function3()); and the compiler might recognize a dup opportunity for this code that it doesn't for the above.
> but this uses extra local variables (which is slower than duplicating the > value on the stack and using it again). [quoted text clipped - 3 lines] > message to java.lang.programmer on 27/08/2003 under the title "Re: quick > inheritance question". Using final classes can make the JITC's work easier, too, nicht?
--Mike Amling
Chris Smith - 06 Sep 2003 04:57 GMT > > 2. The next one is a little contentious! It involves calling two functions > > sequentially where the parameters are the same. Now Java sometimes does an [quoted text clipped - 7 lines] > static variables, then the compiler should not assume they have the same > values for the second call. ... unless it can prove that there are no monitor enter/exit instructions in the first method, and that the first method does not itself modify the fields. Of course, this requires some knowledge about the called code, and that's the kind of optimization that a dynamically recompiling JIT compiler can do, but which is not feasible in writing static bytecode as part of the source.
 Signature www.designacourse.com The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Mark Bottomley - 05 Sep 2003 00:48 GMT Will:
There are several ways to improve the Java bytecodes by changing the way the Java is written. Simplifications such as using only static methods instead of instance methods will improve method invocation speed slightly. Also on most systems, if a switch statement has very cases (system dependent, but around 3-5), it is often more efficient to code using if/then/else. Switch statements that use contiguous cases that translate into tableswitch bytecodes are generally faster (Usually constant time) than sparse cases that translate into lookupswitch bytecodes (time is either O(n) or O(log2 n) depending on the internal VM implementation). Sparse switches can have their average execution time improved if the lower indices are more comonly called than the higher indices as they are sorted inside the bytecode. It is also faster in general to avoid invocations, if speed is of the essence, by repeating code and inlining methods in Java. This avoids parameter pushing, invocation frame setup and teardown and several internal runtime limit checks and calculations, and return value (if any) copying. Invocations are relatively expensive, even after JITing. Inlining is trading some of the readability/maintainability of modularity for speed, but that is your choice. Java class files support up to 64k bytes of code per method, so there is plenty of room.
Although you haven't mentioned arrays, if you are doing anything 10,000,000 times you probably are using arrays. If you are using multi-dimensional arrays, you can have invisible overhead. Also watch that you are not allocating and abandoning objects in the calls as this will lead to garbage collections in the background.
Optimization in general is best applied to only a very small section of the code. (80/20 rule, 90/10 rule whatever ratio one believes to be accurate). I hope you are using one of the profiling tools to determine where your effort is best applied.
Mark...
> Hi Chris, > [quoted text clipped - 79 lines] > > > > I'm very interested as to what you found to to be faster or easier using
> > bytecode than plain Java. > > [quoted text clipped - 58 lines] > > Other than that, yeah it seems feasible. I still don't understand why > > (other than as an interesting toy) you would want such a thing. Chris Smith - 06 Sep 2003 05:10 GMT > Thankyou for your long reply. It is certainly true that I can be mistaken > for assuming that the number of lines of bytecode in some way correlates to > the speed at which it executes. However, in a few (and it is a few) cases, > it can be true. Sure, it could occasionally be true. I'd caution you to be careful with the techniques you use for benchmarking, though. Remember that a dynamic (or "mixed-mode") JIT compiler like Sun's Hotspot will generally start out by interpreting everything, and you will see very obvious performance differences there that match your expectations. It's not until a section of code is identified as performance-critical that its optimal performance is visible.
A couple things you say below lead me to believe that you're either trying to make assumptions about performance from reading bytecode or using a benchmarking technique that reflects interpreted performance, either instead of or in addition to compiled performance. The most obvious is an implication that using an additional local variable will have an impact on performance.
I'm also very suspicious that your optimizations regarding setting up stack frames for method calls will cause serious performance problems on platforms where there are non-stack-based or mixed stack/register calling conventions, or where method frame setup is assisted by CPU instructions. They make it more difficult for the JIT to recognize stack operations that relate to method frames (as opposed to operand stack ops that should be converted to operands for CPU instructions), and generate better method calling code.
I'm not trying to convince you to give up optimizing, but I am trying to convince you to be careful that you are actually getting the results you want. Especially when you're dealing with something as complex as JIT performance, it's easy to see just the things that match your expectations and neglect the counter-evidence that's staring you in the face.
 Signature www.designacourse.com The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
pete kirkham - 06 Sep 2003 18:42 GMT > The most > obvious is an implication that using an additional local variable will > have an impact on performance. When mucking around with optimising string concats, converting something like: int LENGTH = 70000; StringBuffer a = new StringBuffer(LENGTH);
for (int i=0; i<LENGTH; i++) { a.append('~'); }
so that the loop reused the return value of append on the stack, instead of popping it and loading the variable again, slowed it down even though the bytecode to do had fewer instructions in the loop.
0 ldc #2 <Integer 70000> 2 istore_1 3 new #3 <Class java.lang.StringBuffer> 6 dup 7 invokespecial #4 <Method java.lang.StringBuffer()> 10 astore_2 then 11 iconst_0 12 istore_3 13 goto 26 16 aload_2 17 bipush 126 19 invokevirtual #5 <Method java.lang.StringBuffer append(char)> 22 pop 23 iinc 3 1 26 iload_3 27 iload_1 28 if_icmplt 16 or 11 aload_2 12 iconst_0 13 istore_3 14 goto 25 17 bipush 126 19 invokevirtual #5 <Method java.lang.StringBuffer append(char)> 22 iinc 3 1 25 iload_3 26 iload_1 27 if_icmplt 17
So, based on the timings I got running these two (i posted them a few months back), the additional variable access in this case allows the JIT compiler to optimise better.
Pete
Wee Jin Goh - 07 Sep 2003 14:12 GMT > Sure, it could occasionally be true. I'd caution you to be careful with > the techniques you use for benchmarking, though. Remember that a [quoted text clipped - 3 lines] > until a section of code is identified as performance-critical that its > optimal performance is visible. Which is why when benchmarking, you don't want HotSpot to invoke the interpreter at all, but you want it to start running once everything is compiled. To do this, you add the flag -Xbatch when running your benchmark app. This forces HotSpot to compile everything, before running. Thus, you get slower startup times, but the benefit is, you don't get the effects of HotSpot's 'warm-up' time.
Regards, Wee Jin
Stuart D. Gathman - 10 Sep 2003 23:05 GMT >> Also, there are various shortcomings to the java language itself which >> can be overcome by using jasmin (or whatever) and hand-coding your own [quoted text clipped - 11 lines] > The generally accepted wisdom is that there's not much that can be done > to optimize bytecode instructions... Sun used to have a -O option on Some of us don't use the JIT. It is great for certain applications, but the startup and memory costs swamp the benefits for our applications. When most of your code is executed very few times, and there are few tight loops, the JIT just gets in the way. Business logic is like that.
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|