Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / Virtual Machine / September 2003

Tip: Looking for answers? Try searching our database.

Java compiler with inline bytecode?

Thread view: 
Will Clark - 03 Sep 2003 16:52 GMT
Does anyone know of a java compiler that allows inline "java assembly
mnemonics" to be included in amongst the standard java code?

Similar to the inline asm keyword in C/C++ compilers...

Cheers :o)

Will
Roedy Green - 03 Sep 2003 17:55 GMT
>Does anyone know of a java compiler that allows inline "java assembly
>mnemonics" to be included in amongst the standard java code?
>
>Similar to the inline asm keyword in C/C++ compilers...

Since Java is multiplatform, for which platform would the inline
assembler apply? The way to do that is to write some JNI in C/C++ and
use the inline assembler features of that language.

Perhaps you meant inline byte code.  There is not as much call for it
since you can generate pretty well anything you want with high level
Java anyway. There is not much you can do in the way of optimisation.
However if you wanted to do that, see
http://mindprod.com/jgloss/jasm.html.  That is still not inline.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Will Clark - 03 Sep 2003 20:47 GMT
Yes, indeed what I mean is inline bytecode (as in what java compiles to)
because, while there may be little that can be achieved in terms of
optimisation writing directly in bytecode instead of the more structured
java language, I have a couple of really tight loops that can really benefit
from being coded directly in bytecode instead of compiled. Java does tend to
do things a long way round when it comes to certain complex repetitive
manipulations.

Also, there are various shortcomings to the java language itself which can
be overcome by using jasmin (or whatever) and hand-coding your own routine.

I currently have various routines coded like this, and because I cannot
spend the time recoding 2000 lines of java into bytecode (or for that matter
trawl through the disassembled code) I have a specialised class containing
all the hand-coded functions which is then assembled using Jasmin.

It would be nice (as a feature) to have a Java compiler that also allows
inline assembly (like C/C++ compilers do). It would also allow my
specialised functions to be part of the main class file, rather than an
appendage (which is not tidy).

It would still be cross-platform as the java assembly mnemonics are
cross-platform in themselves.

Perhaps when I get a moment, I'll have to write my own! :o)

Will

> >Does anyone know of a java compiler that allows inline "java assembly
> >mnemonics" to be included in amongst the standard java code?
[quoted text clipped - 15 lines]
> Coaching, problem solving, economical contract programming.
> See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Will Clark - 03 Sep 2003 21:13 GMT
(reposted because the original seems to have disappeared into the ether!)

Yes, indeed what I mean is inline bytecode (as in what java compiles to)
because, while there may be little that can be achieved in terms of
optimisation writing directly in bytecode instead of the more structured
java language, I have a couple of really tight loops that can really benefit
from being coded directly in bytecode instead of compiled. Java does tend to
do things a long way round when it comes to certain complex repetitive
manipulations.

Also, there are various shortcomings to the java language itself which can
be overcome by using jasmin (or whatever) and hand-coding your own routine.

I currently have various routines coded like this, and because I cannot
spend the time recoding 2000 lines of java into bytecode (or for that matter
trawl through the disassembled code) I have a specialised class containing
all the hand-coded functions which is then assembled using Jasmin.

It would be nice (as a feature) to have a Java compiler that also allows
inline assembly (like C/C++ compilers do). It would also allow my
specialised functions to be part of the main class file, rather than an
appendage (which is not tidy).

It would still be cross-platform as the java assembly mnemonics are
cross-platform in themselves.

Perhaps when I get a moment, I'll have to write my own! :o)

Will

> >Does anyone know of a java compiler that allows inline "java assembly
> >mnemonics" to be included in amongst the standard java code?
[quoted text clipped - 15 lines]
> Coaching, problem solving, economical contract programming.
> See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Mark Bottomley - 04 Sep 2003 00:22 GMT
Will:

   The problem with inlining bytecodes is that it can lead to unverifiable
code very easily. The normal solution is to let the JIT compile the hot code
as it will do a better job (faster than any bytecode manipulations) unless
you are on a platform sans JIT. JITs are probably why the java compilers are
poor at performing optimizations - they are generally unnecessary and
optimized code may be sufficiently obscure as to make the JIT design more
difficult.

Mark...

> (reposted because the original seems to have disappeared into the ether!)
>
[quoted text clipped - 45 lines]
> > Coaching, problem solving, economical contract programming.
> > See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Roedy Green - 04 Sep 2003 00:28 GMT
>    The problem with inlining bytecodes is that it can lead to unverifiable
>code very easily.

Do you mean?

1. you can inadvertently generate illegal code, code that if allowed
to run would get things like stack underflow?

2. The programs you can generate are too complicated.  The byte code
verifier can't decide if they are safe or not?

3. That you can generate legal code that would run safely, but that
looks bad to the byte code verifier?

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Mark Bottomley - 04 Sep 2003 01:22 GMT
Roedy:

   I would assume that the java compiler with the byte code inlining would
do some preliminary checking for valid bytecodes and their parameters
(constant pool references and relative branch offsets).

   As for errors, I mean 1 and 3. - you can easily generate bytecode
sequences that are unverifiable. The java verifier does not accept all
correct programs - only the verifiable subset of correct programs. The
sequences may be illegal in that you do any of the things illegal in java
class files including stack underflow, exceeding max stack depth for a
method, accessing non-existant local variables, accessing halves of
long/double local variables, accessing local variables or stack elements as
incorrect types e.g. iadd two object references, creating a loop where the
stack is not equivalent on different execution paths, and many other
verification limitations. Verifiers do not have trouble with complicated
code, but the restrictions on valid code may be challenging to the average
Java programmer, many of whom have not seen bytecodes.

   The solution would be to execute the class with -noverify on the command
line, but you give up the protections that many people desire from Java.

Mark...

> >    The problem with inlining bytecodes is that it can lead to unverifiable
> >code very easily.
[quoted text clipped - 14 lines]
> Coaching, problem solving, economical contract programming.
> See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Roedy Green - 04 Sep 2003 02:56 GMT
>    The solution would be to execute the class with -noverify on the command
>line, but you give up the protections that many people desire from Java.

Yes, it is very like coding in forth, without a safety net.

What you more likely going to do is generate byte codes from some high
level information, not actually hand code each routine in byte code.

Applications I thought would be interesting for this treatment:
1. parser
2. regex implementation
3. spreadsheet bean.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Chris Smith - 04 Sep 2003 14:18 GMT
> Also, there are various shortcomings to the java language itself which can
> be overcome by using jasmin (or whatever) and hand-coding your own routine.
[quoted text clipped - 3 lines]
> trawl through the disassembled code) I have a specialised class containing
> all the hand-coded functions which is then assembled using Jasmin.

I'm very interested as to what you found to to be faster or easier using
bytecode than plain Java.

The generally accepted wisdom is that there's not much that can be done
to optimize bytecode instructions... Sun used to have a -O option on
their compiler to attempt bytecode optimizations, but they disabled it
when they discovered that the generated code actually ran slower than
"unoptimized" bytecode.  So if you've found a way to optimize bytecode
that really gives a performance improvement, that would be unusual.  I'd
be interested to see how you've done the benchmark, and what code
provides those results.

My worry is that you've looked at generated bytecode and made the
assumption that there's a vaguely linear relationship between the number
of bytecode instructions in a code segment.  That's generally not even
remotely true... since bytecode is recompiled into an optimized native
machine code before it's used, this exercise would be like determining
the efficiency of a C program by counting the lines of code.

Truly optimizing bytecode would require that you understand the
optimization and structure recognition heuristics of your JIT compiler,
and try to write bytecode that's more easily recognizable.  Since JITs
are tweaked to work well with common Java language compilers like javac
and jikes, they are by definition at the leading edge of writing
efficient bytecode.  As I mentioned above, Sun actually managed to
achieve greater performance by disabling their bytecode optimization so
that they would generate more plain, repetitive, and predictable code
that the JIT can more easily recognize and classify as a structural
component.  You'd have to find some unintentional quirk of the JIT
compiler to beat these vanilla compilers.

I'm also having trouble imagining any kind of functional "shortcoming"
in Java that would make it easier to express an algorithm in verifiable
bytecode.  And bytecode certainly doesn't make anything more readable.  
So what exactly do you mean?

> It would be nice (as a feature) to have a Java compiler that also allows
> inline assembly (like C/C++ compilers do). It would also allow my
> specialised functions to be part of the main class file, rather than an
> appendage (which is not tidy).

Well, first I'll say that because of the portability aspects of Java and
the desires of Sun in this regard, it's conventional to be very picky
about defining what constitutes Java code.  Specifically, the
hypothetical language that you propose above is not Java.  You could
call is JavASM or something like that...

I see no reason that this would be impossible.  You've have to find some
way to express the impact of this bytecode segment on the method
metadata (for example, stack depth), but that shouldn't be too hard: you
could adopt a Jasmin-like syntax, or (better) you could have your
compiler do a sort of pre-validation to determine max stack depth and
ensure that there isn't a stack underflow within the code that could
cause it to view contents placed on the stack by the Java code sections.  
You'd probably have the compiler run a verifier against the block
anyway, since I'd consider it bad form for a compiler to agree to output
invalid bytecode.  You'd want each "asm" block to verify separately so
as to avoid dependencies on the Java bytecode generation of the
intervening Java code segments, so if you were hoping for 'goto' between
two ASM blocks, you probably want to give that up.

Other than that, yeah it seems feasible.  I still don't understand why
(other than as an interesting toy) you would want such a thing.

Signature

www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

Will Clark - 04 Sep 2003 16:23 GMT
Hi Chris,

Thankyou for your long reply. It is certainly true that I can be mistaken
for assuming that the number of lines of bytecode in some way correlates to
the speed at which it executes. However, in a few (and it is a few) cases,
it can be true. While these optimisations do little on their own, the
combined effect of calling such an algorithm using these (and other such)
optimisations over 10,000,000 times can be quite substantial. These
optimisations, to be fair, that I have written do not do miracles with the
speed, but will however reduce the overall time by up to half a minute -
bearing in mind it can take up to 10 to complete a series of computations!
Any optimisation at all is welcome, at this level!

The optimisations/customisations have a couple of characteristics:

1. In some places, it is merely to choose the correct function that runs...
for example

   switch (test)
       {
       case 1:    a = method1(b,c,d); break;
       case 2:    a = method2(b,c,d); break;
       case 3:    a = method3(b,c,d); break;
       }

As you can see, the end result and the calling parameters are the same, just
the function called is based on test... therefore, the optimisation
basically places all the parameters on the stack, then calls the correct
function based on test, and then at the end stores the result. Effectively,
the switch functionality is on the method name alone.

You may say, "sure I understand how that may optimise the classfile in terms
of size, but not speed?" but for some reason (perhaps some different JIT
optimisation is used) it does run that little bit faster.

2. The next one is a little contentious! It involves calling two functions
sequentially where the parameters are the same. Now Java sometimes does an
optimisation that duplicates the parameter on the stack so that it isn't
loaded onto the stack again for the second function. Sometimes it does not
(and I haven't particularily had the time to pin down why and when!). But
suffice to say, this is the second of the main characteristics.

3. The above one can be extended to include cases where the after the first
function, the parameter is modified, but the second function wants to use
the original parameter (ie unmodified). Now Java will allow something like:

   int a = param;
   function1(a);
   param += function3();
   function2(a,param);

but this uses extra local variables (which is slower than duplicating the
value on the stack and using it again).

4. Finally, and most dubious of all!, using different invoke instructions to
bypass the inheritance rules of the Java language. I posted this in another
message to java.lang.programmer on 27/08/2003 under the title "Re: quick
inheritance question".

Not all these necessarily are great optimisations, but they do help keep the
code size down (memory limitations!) and improve the performance slightly.

Its all a bit convoluted! I'd love to show you all the code, and take you
through it, but I don't have the priviledge of being able to do that :o)

Thanks for your comments tho'... I'll let you know if I do end up creating
my own java-esque compiler!

Will

> > Also, there are various shortcomings to the java language itself which can
> > be overcome by using jasmin (or whatever) and hand-coding your own routine.
[quoted text clipped - 67 lines]
> Other than that, yeah it seems feasible.  I still don't understand why
> (other than as an interesting toy) you would want such a thing.
Roedy Green - 04 Sep 2003 17:24 GMT
>    int a = param;
>    function1(a);
[quoted text clipped - 3 lines]
>but this uses extra local variables (which is slower than duplicating the
>value on the stack and using it again).

This I would expect any decent optimising compiler catch on its own
since there is no way "a" could have changed between the two uses.

You might like to experiment with Jet on your Java code and byte code
to see if it makes any difference.  See
http://mindprod.com/jgloss/jet.html.  There is a free version.  The
only catch is you are not allowed to distribute code generated by it.
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Michael Amling - 04 Sep 2003 17:50 GMT
> 2. The next one is a little contentious! It involves calling two functions
> sequentially where the parameters are the same. Now Java sometimes does an
> optimisation that duplicates the parameter on the stack so that it isn't
> loaded onto the stack again for the second function. Sometimes it does not
> (and I haven't particularily had the time to pin down why and when!). But
> suffice to say, this is the second of the main characteristics.

 If the argument values depend only on constants and local variables,
this is safe. But if the argument values depend on nonfinal instance or
static variables, then the compiler should not assume they have the same
values for the second call.

> 3. The above one can be extended to include cases where the after the first
> function, the parameter is modified, but the second function wants to use
[quoted text clipped - 4 lines]
>     param += function3();
>     function2(a,param);

  Assuming "param" is local, or at least not visible to functions 1, 2
or 3, you can recode this as
    function1(param);
    function2(param, param+=function3());
and the compiler might recognize a dup opportunity for this code that it
doesn't for the above.

> but this uses extra local variables (which is slower than duplicating the
> value on the stack and using it again).
[quoted text clipped - 3 lines]
> message to java.lang.programmer on 27/08/2003 under the title "Re: quick
> inheritance question".

  Using final classes can make the JITC's work easier, too, nicht?

--Mike Amling
Chris Smith - 06 Sep 2003 04:57 GMT
> > 2. The next one is a little contentious! It involves calling two functions
> > sequentially where the parameters are the same. Now Java sometimes does an
[quoted text clipped - 7 lines]
> static variables, then the compiler should not assume they have the same
> values for the second call.

... unless it can prove that there are no monitor enter/exit
instructions in the first method, and that the first method does not
itself modify the fields.  Of course, this requires some knowledge about
the called code, and that's the kind of optimization that a dynamically
recompiling JIT compiler can do, but which is not feasible in writing
static bytecode as part of the source.

Signature

www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

Mark Bottomley - 05 Sep 2003 00:48 GMT
Will:

   There are several ways to improve the Java bytecodes by changing the way
the Java is written. Simplifications such as using only static methods
instead of instance methods will improve method invocation speed slightly.
Also on most systems, if a switch statement has very cases (system
dependent, but around 3-5), it is often more efficient to code using
if/then/else. Switch statements that use contiguous cases that translate
into tableswitch bytecodes are generally faster (Usually constant time) than
sparse cases that translate into lookupswitch bytecodes (time is either O(n)
or O(log2 n) depending on the internal VM implementation). Sparse switches
can have their average execution time improved if the lower indices are more
comonly called than the higher indices as they are sorted inside the
bytecode.  It is also faster in general to avoid invocations, if speed is of
the essence, by repeating code and inlining methods in Java. This avoids
parameter pushing, invocation frame setup and teardown and several internal
runtime limit checks and calculations, and return value (if any) copying.
Invocations are relatively expensive, even after JITing.  Inlining is
trading some of the readability/maintainability of  modularity for speed,
but that is your choice. Java class files support up to 64k bytes of code
per method, so there is plenty of room.

Although you haven't mentioned arrays, if you are doing anything 10,000,000
times you probably are using arrays. If you are using multi-dimensional
arrays, you can have invisible overhead. Also watch that you are not
allocating and abandoning objects in the calls as this will lead to garbage
collections in the background.

Optimization in general is best applied to only a very small section of the
code. (80/20 rule, 90/10 rule whatever ratio one believes to be accurate). I
hope you are using one of the profiling tools to determine where your effort
is best applied.

Mark...

> Hi Chris,
>
[quoted text clipped - 79 lines]
> >
> > I'm very interested as to what you found to to be faster or easier using

> > bytecode than plain Java.
> >
[quoted text clipped - 58 lines]
> > Other than that, yeah it seems feasible.  I still don't understand why
> > (other than as an interesting toy) you would want such a thing.
Chris Smith - 06 Sep 2003 05:10 GMT
> Thankyou for your long reply. It is certainly true that I can be mistaken
> for assuming that the number of lines of bytecode in some way correlates to
> the speed at which it executes. However, in a few (and it is a few) cases,
> it can be true.

Sure, it could occasionally be true.  I'd caution you to be careful with
the techniques you use for benchmarking, though.  Remember that a
dynamic (or "mixed-mode") JIT compiler like Sun's Hotspot will generally
start out by interpreting everything, and you will see very obvious
performance differences there that match your expectations.  It's not
until a section of code is identified as performance-critical that its
optimal performance is visible.

A couple things you say below lead me to believe that you're either
trying to make assumptions about performance from reading bytecode or
using a benchmarking technique that reflects interpreted performance,
either instead of or in addition to compiled performance.  The most
obvious is an implication that using an additional local variable will
have an impact on performance.

I'm also very suspicious that your optimizations regarding setting up
stack frames for method calls will cause serious performance problems on
platforms where there are non-stack-based or mixed stack/register
calling conventions, or where method frame setup is assisted by CPU
instructions.  They make it more difficult for the JIT to recognize
stack operations that relate to method frames (as opposed to operand
stack ops that should be converted to operands for CPU instructions),
and generate better method calling code.

I'm not trying to convince you to give up optimizing, but I am trying to
convince you to be careful that you are actually getting the results you
want.  Especially when you're dealing with something as complex as JIT
performance, it's easy to see just the things that match your
expectations and neglect the counter-evidence that's staring you in the
face.

Signature

www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

pete kirkham - 06 Sep 2003 18:42 GMT
> The most
> obvious is an implication that using an additional local variable will
> have an impact on performance.

When mucking around with optimising string concats, converting something
like:
 int LENGTH = 70000;
 StringBuffer a = new StringBuffer(LENGTH);

 for (int i=0; i<LENGTH; i++) {
   a.append('~');
 }

so that the loop reused the return value of append on the stack, instead
of popping it and loading the variable again, slowed it down even though
the bytecode to do had fewer instructions in the loop.

   0 ldc #2 <Integer 70000>
   2 istore_1
   3 new #3 <Class java.lang.StringBuffer>
   6 dup
   7 invokespecial #4 <Method java.lang.StringBuffer()>
  10 astore_2
then
  11 iconst_0
  12 istore_3
  13 goto 26
  16 aload_2
  17 bipush 126
  19 invokevirtual #5 <Method java.lang.StringBuffer append(char)>
  22 pop
  23 iinc 3 1
  26 iload_3
  27 iload_1
  28 if_icmplt 16
or
  11 aload_2
  12 iconst_0
  13 istore_3
  14 goto 25
  17 bipush 126
  19 invokevirtual #5 <Method java.lang.StringBuffer append(char)>
  22 iinc 3 1
  25 iload_3
  26 iload_1
  27 if_icmplt 17

So, based on the timings I got running these two (i posted them a few
months back), the additional variable access in this case allows the JIT
compiler to optimise better.

Pete
Wee Jin Goh - 07 Sep 2003 14:12 GMT
> Sure, it could occasionally be true.  I'd caution you to be careful with
> the techniques you use for benchmarking, though.  Remember that a
[quoted text clipped - 3 lines]
> until a section of code is identified as performance-critical that its
> optimal performance is visible.

Which is why when benchmarking, you don't want HotSpot to invoke the
interpreter at all, but you want it to start running once everything is
compiled. To do this, you add the flag -Xbatch when running your benchmark
app. This forces HotSpot to compile everything, before running. Thus, you
get slower startup times, but the benefit is, you don't get the effects of
HotSpot's 'warm-up' time.

Regards,
Wee Jin
Stuart D. Gathman - 10 Sep 2003 23:05 GMT
>> Also, there are various shortcomings to the java language itself which
>> can be overcome by using jasmin (or whatever) and hand-coding your own
[quoted text clipped - 11 lines]
> The generally accepted wisdom is that there's not much that can be done
> to optimize bytecode instructions... Sun used to have a -O option on

Some of us don't use the JIT.  It is great for certain applications, but
the startup and memory costs swamp the benefits for our applications.
When most of your code is executed very few times, and there are few
tight loops, the JIT just gets in the way.  Business logic is like that.


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.