Java Forum / General / May 2006
Writing custom compiler
daniel.w.gelder@gmail.com - 14 May 2006 04:52 GMT
Hi,
I'm in the process of writing a custom compiler for my own language that will target the JVM. I'm just getting started but I've got a ClassInfo file successfully streamed. I have a question for anyone who knows.
Apparently I have to define my own constructor, even if it doesn't do anything. I seem to need to define an <init> method with signature "()V", otherwise myClass.newInstance() throws InstantiationException.
Anyway, there's nothing in <init> except a return statement. So I get this exception:
java.lang.VerifyError: (class: Dan, method: <init> signature: ()V) Constructor must call super() or this()
So apparently I have to call Object.<init>() too. Makes sense, but I thought you could never call <init> yourself. Is there a trick here?
Thanks. Dan
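For readers following along, here is a minimal sketch of the two facts in this post, written in plain Java rather than bytecode. "()V" is the JVM descriptor for "takes no arguments, returns void", and Class.newInstance() fails with InstantiationException when a class has no no-argument constructor. The class and method names (NoArgCtorDemo, WithDefault, OnlyIntCtor, tryNewInstance) are made up for illustration, and the sketch assumes Java 8 or later.

```java
import java.lang.invoke.MethodType;

final class NoArgCtorDemo {
    // javac supplies a no-argument constructor for this class.
    static final class WithDefault { }

    // Declaring any constructor suppresses the compiler-supplied one.
    static final class OnlyIntCtor {
        OnlyIntCtor(int ignored) { }
    }

    // Descriptor of a method that takes nothing and returns void.
    static String noArgVoidDescriptor() {
        return MethodType.methodType(void.class).toMethodDescriptorString();
    }

    // Returns "ok", or the simple name of the exception newInstance() threw.
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated since Java 9
    static String tryNewInstance(Class<?> type) {
        try {
            type.newInstance();
            return "ok";
        } catch (InstantiationException | IllegalAccessException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(noArgVoidDescriptor());
        System.out.println(tryNewInstance(WithDefault.class));
        System.out.println(tryNewInstance(OnlyIntCtor.class));
    }
}
```

In source-level Java the compiler writes the no-argument <init> for you. A hand-rolled class file gets no such help, which is why the constructor has to be emitted explicitly.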
Kent Paul Dolan - 14 May 2006 08:30 GMT
[I don't have much knowledge to offer. Instead, let me play potted plant here and see if it helps.]
> Apparently I have to define my own constructor,
> even if it doesn't do anything.
That shouldn't be the case: if you don't explicitly define a parameterless void constructor, the system creates a default one for you which calls super() and returns. In fact, creating such a constructor and making it private so it can't be invoked is one frequently seen trick to prevent the system from supplying such a default constructor unbeknownst to you and having it invoked where you had no such intention.
> I seem to need to define an <init> method with signature
> "()V", otherwise myClass.newInstance() throws InstantiationException.
1) Are those angle brackets a literal part of the name "<init>"? Is that some template parameter naming, or what?
2) What does the "V" in "()V" mean? "Void return?"
> Anyway, there's nothing in <init> except a return statement. So I get
> this exception:
> java.lang.VerifyError: (class: Dan, method: <init> signature: ()V)
> Constructor must call super() or this()
Is that a compile time error, or a run time error? It looks like a runtime invocation of MyClass.init() has encountered a problem with a constructor for MyClass being missing, but as noted above, one should be created (and then invoked) by default if you don't prevent that happening.
> So apparently I have to call Object.<init>() too. Makes sense, but I
> thought you could never call <init> yourself. Is there a trick here?
I'm still confused by those angle brackets, but it isn't "init" that you are being told to call, you are being told, at a default invocation of MyClass.init() at startup time, that you haven't yet instantiated an object from MyClass, thus there is no object whose instance (as opposed to static) method init() can be invoked.
A common pattern is to have your main routine inherit from applet, to instance applet in main(...), then to invoke init() from that instance.
class MyClass extends applet
{
    private static MyClass mc = null;

    public MyClass() // constructor
    {
        super();
    }

    void main(...) // must exist in some class of your app
    {
        mc = new MyClass();
        mc.init(); // yes, you _can_ call init() yourself
        // ... do more stuff
    }

    void init()
    {
        // initialize stuff for instance object mc of MyClass.
    }
}
Or something vaguely like that. Until I figure out how to boot Debian Linux using grub on and from an external drive (to leave the internal drive's MS-Windows garbage unmolested) on my replacement laptop, I'm temporarily out of the Java business.
FWIW
xanthian.
daniel.w.gelder@gmail.com - 14 May 2006 09:25 GMT
Yeeeeeaaahhhkaayyyy.....try re-reading my first sentence. :-)
Thanks anyway though for replying. Dan
Patricia Shanahan - 14 May 2006 14:13 GMT
> [I don't have much knowledge to offer. Instead, let me
> play potted plant here and see if it helps.]
[quoted text clipped - 11 lines]
> you and having it invoked where you had no such
> intention.
...
I think Daniel is using bytecode, rather than Java, as his target language, so there will be things a Java compiler would do automatically that he needs to do explicitly.
However, this does suggest a procedure for solving his problem:
1. Write a Java class with no specified superclass and no constructor declaration.
2. Compile it.
3. Examine the bytecode. See what the compiler generates to represent the default constructor. It will contain a call to the Object constructor.
4. Make the new compiler generate the same thing.
Patricia
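A small sketch of the procedure above, for anyone who wants to try it. The class names Empty and InspectEmpty are placeholders. The reflection calls only confirm that the compiler put exactly one no-argument constructor into the class file. To see the actual instructions (step 3), compile the file and run `javap -c Empty`.

```java
import java.lang.reflect.Constructor;

// Step 1: no superclass named, no constructor declared.
class Empty { }

final class InspectEmpty {
    // How many constructors the compiler placed in Empty.class.
    static int constructorCount() {
        return Empty.class.getDeclaredConstructors().length;
    }

    // Parameter count of the (only) compiler-supplied constructor.
    static int firstConstructorParameterCount() {
        return Empty.class.getDeclaredConstructors()[0].getParameterCount();
    }

    public static void main(String[] args) {
        for (Constructor<?> c : Empty.class.getDeclaredConstructors()) {
            System.out.println(c);
        }
        // `javap -c Empty` typically shows a constructor body along these
        // lines (the constant pool index may differ):
        //   0: aload_0
        //   1: invokespecial #1   // Method java/lang/Object."<init>":()V
        //   4: return
    }
}
```

Those three instructions are what a custom compiler would need to reproduce for step 4.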
Chris Uppal - 14 May 2006 09:36 GMT
> Apparently I have to define my own constructor, even if it doesn't do
> anything.
Yup. There has to be a constructor or there's nothing there for other code to call.
> I seem to need to define an <init> method with signature
> "()V", otherwise myClass.newInstance() throws InstantiationException.
That's correct.
> So apparently I have to call Object.<init>() too. Makes sense, but I
> thought you could never call <init> yourself. Is there a trick here?
"Rules change in the reaches"[*]. I.e. this is bytecode land -- a high-level, mostly-dynamic, mostly-OO programming language with an interesting hybrid static/dynamic type system. The rules you learned in Java are only a rough approximation to the rules which apply here.
In this case you are correct. You have to supersend <init> (or use some other flavour of <init>). BTW, if you do supersend then it has to be an invokespecial instruction, invokevirtual isn't allowed here.
The JVM spec is irritatingly incomplete, occasionally ambiguous, and even within those limits, not especially well-written, but it does cover this stuff. You should probably read the whole thing at least once (if you haven't already). Much of it is a non-normative (and largely irrelevant) rehashing of the JLS. Resign yourself to the idea that you are going to be reading the /other/ bits over and over again ;-)
-- chris
[*] Or, if you prefer, "You're not in Kansas anymore". Or even, "Welcome to the /real/ world" ;-)
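As a rough illustration of the supersend described in this post, the sketch below emits the five bytes of a minimal constructor body: push `this`, invokespecial the superclass <init>, return. The opcode values are the standard ones for aload_0, invokespecial and return. The class name DefaultCtorBody and the constant pool index passed to emit() are hypothetical: in a real class file the index must refer to a CONSTANT_Methodref entry for java/lang/Object.<init> with descriptor ()V, and the enclosing Code attribute would need max_stack and max_locals of at least 1.

```java
final class DefaultCtorBody {
    static final int ALOAD_0 = 0x2a;       // push local variable 0 ("this")
    static final int INVOKESPECIAL = 0xb7; // non-virtual call, required for <init>
    static final int RETURN = 0xb1;        // return from a void method

    // Emits: aload_0; invokespecial #methodrefIndex; return
    static byte[] emit(int methodrefIndex) {
        return new byte[] {
            (byte) ALOAD_0,
            (byte) INVOKESPECIAL,
            (byte) (methodrefIndex >>> 8), // index high byte
            (byte) methodrefIndex,         // index low byte
            (byte) RETURN
        };
    }

    // Formats the bytes as lowercase hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Index 1 is only an assumption about where the Methodref lives.
        System.out.println(toHex(emit(1)));
    }
}
```

Swapping invokespecial for invokevirtual in this sequence is what the verifier would reject, as noted above.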
daniel.w.gelder@gmail.com - 14 May 2006 10:17 GMT
I seem to have gotten it at last. I'm kind of surprised how much actual bytecode javac always had to make for
public class Test { }
It uses a lot more space than the original file, that's for sure. Oh well. Time to optimize my compiler frontend and do a little coding.
Dan
Mike Schilling - 16 May 2006 08:05 GMT
> Hi,
>
> I'm in the process of writing a custom compiler for my own language
> that will target the JVM. I'm just getting started but I've got a
> ClassInfo file successfully streamed. I have a question for anyone who
> knows.
I know (obviously) nothing about your language, but I'm wondering:
Might it be easier to use Java as an intermediate language? That is, generate Java from your language and then use javac to compile that?
As Chris points out, the JVM spec is irritatingly incomplete about the precise requirements for bytecode, while the JLS and assorted other books are far better at explaining Java. And should you run into trouble, you'll have a much easier time debugging your generated Java than debugging bytecode directly.
Chris Uppal - 16 May 2006 17:12 GMT
> Might it be easier to use Java as an intermediate language? That is,
> generate Java from your language and then use javac to compile that?
Or maybe an intermediate level technology like Javassist.
Just mentioning options; personally I'd pop a beer and get stuck right into the bytecode ;-)
-- chris
daniel.w.gelder@gmail.com - 17 May 2006 02:04 GMT
I've already popped several beers and quite a lot of coffee beans too on it!
Actually I tried using Java as an intermediate language first, calling into sun.tools.javac. It worked, but it was really shockingly inefficient in a lot of ways and I lost interest.
Bytecode, while tricky, is at least a challenge.
Dan
dimitar - 16 May 2006 23:11 GMT
In addition to the JVM spec, you can also check Bill Venners's "Inside the JVM". It's out of print, but you might find a copy in your library.
Dimitar
daniel.w.gelder@gmail.com - 20 May 2006 23:59 GMT
Now I'm getting deep. I have a question to anyone who knows: what is the real difference between local variables and the operand stack in the JVM? Both exist only within a method frame. Operations push and pop only from the operand stack, granted, but it seems like the 'dup' and 'swap' commands are entirely sufficient to compile optimized code, given a non-naive compiler. After all, if you know you'll need the results of an operation more than once, just 'dup' it the first time. If it's not in the right place, 'swap' it. Right?
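To make the question concrete, here is a toy model of a method frame with an int-only operand stack and a local variable array. It is not the JVM: the class ToyFrame, its Op enum and the three sample programs are invented for illustration, and only a handful of instructions are modelled. It computes (x + y) * (x + y) once by reusing the sum with dup and once by spilling it to a local, and uses swap to reorder two operands before a subtraction. (In the real instruction set, swap applies only to single-slot values, and there are further variants such as dup_x1 and dup2.)

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ToyFrame {
    enum Op { ILOAD, ISTORE, DUP, SWAP, IADD, ISUB, IMUL }

    static final class Insn {
        final Op op;
        final int arg; // local variable index; unused by pure stack ops
        Insn(Op op, int arg) { this.op = op; this.arg = arg; }
    }

    static Insn insn(Op op) { return new Insn(op, 0); }
    static Insn insn(Op op, int arg) { return new Insn(op, arg); }

    // (x + y) * (x + y), keeping the sum on the stack and duplicating it.
    static final Insn[] WITH_DUP = {
        insn(Op.ILOAD, 0), insn(Op.ILOAD, 1), insn(Op.IADD),
        insn(Op.DUP), insn(Op.IMUL)
    };

    // Same computation, storing the sum in local 2 and loading it twice.
    static final Insn[] WITH_LOCAL = {
        insn(Op.ILOAD, 0), insn(Op.ILOAD, 1), insn(Op.IADD),
        insn(Op.ISTORE, 2), insn(Op.ILOAD, 2), insn(Op.ILOAD, 2), insn(Op.IMUL)
    };

    // y - x, with the operands pushed in the "wrong" order and then swapped.
    static final Insn[] WITH_SWAP = {
        insn(Op.ILOAD, 0), insn(Op.ILOAD, 1), insn(Op.SWAP), insn(Op.ISUB)
    };

    // Runs the program and returns whatever is left on top of the stack.
    static int run(Insn[] code, int... locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn i : code) {
            switch (i.op) {
                case ILOAD:  stack.push(locals[i.arg]); break;
                case ISTORE: locals[i.arg] = stack.pop(); break;
                case DUP:    stack.push(stack.peek()); break;
                case SWAP: { int a = stack.pop(); int b = stack.pop();
                             stack.push(a); stack.push(b); break; }
                case IADD: { int b = stack.pop(); int a = stack.pop();
                             stack.push(a + b); break; }
                case ISUB: { int b = stack.pop(); int a = stack.pop();
                             stack.push(a - b); break; }
                case IMUL: { int b = stack.pop(); int a = stack.pop();
                             stack.push(a * b); break; }
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(run(WITH_DUP, 3, 4, 0));
        System.out.println(run(WITH_LOCAL, 3, 4, 0));
        System.out.println(run(WITH_SWAP, 3, 4));
    }
}
```

Both encodings of the square give the same answer. The dup version is two instructions shorter and needs no extra local slot. Once a value is needed deeper in the stack than dup and swap can reach, a local is the simpler tool.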
Mike Schilling - 21 May 2006 02:14 GMT
> Now I'm getting deep. I have a question to anyone who knows: what is
> the real difference between local variables and the operand stack in
> the JVM?
There isn't one, really. I vaguely recall reading a paper by a .NET advocate saying that MSIL is superior to Java bytecode because MSIL does make that distinction while bytecode doesn't (I don't remember why this was supposed to be an advantage.)
Chris Uppal - 21 May 2006 11:44 GMT
> Operations push and pop
> only from the operand stack, granted, but it seems like the 'dup' and
> 'swap' commands are entirely sufficient to compile optimized code,
> given a non-naive compiler
Remember that the JVM bytecode instructions will be translated into real machine operations. If that machine doesn't have an operation stack (or if the JITer -- if any -- doesn't use it) then stack twiddling instructions will be converted into variable-to-variable, or register-to-register, movements. It might be harder or even impossible for the JITer to optimise such code.
Also, don't forget that the operand stack is cleared whenever an exception is thrown.
OTOH, don't go overboard in avoiding stack twiddling -- after all those instructions are there and they are intended to be used. It's probably best to take the output of javac as a guide to how much use to make of the stack.
-- chris
Chris Smith - 21 May 2006 17:23 GMT
> Remember that the JVM bytecode instructions will be translated into real
> machine operations. If that machine doesn't have an operation stack (or if the
> JITer -- if any -- doesn't use it) then stack twiddling instructions will be
> converted into variable-to-variable, or register-to-register, movements. It
> might be harder or even impossible for the JITer to optimise such code.
With a few exceptions, though, optimizers don't optimize native machine code. They optimize intermediate representations. My guess is that an aggressive optimizer does the following:
1. Break up the method into basic blocks.
2. Convert all local variables AND the operand stack into SSA form with temporary variables.
3. Optimize and generate code (including register allocation) from the SSA representation.
So I believe that it wouldn't make much difference in code quality whether data is stored in local variable slots or on the operand stack, because the two will be indistinguishable after the conversion to SSA form. Both dividing the code into basic blocks and writing the SSA representation are relatively cheap (linear in the method length), so this may even be the process for quick optimizations as well.
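A toy sketch of step 2, under heavy assumptions: one straight-line basic block, int-only instructions written as strings, method arguments named arg0, arg1 and so on, and no phi functions. The class StackToTemps and its input format are invented here, and nothing below is claimed to match what any real JIT does. The idea is to run the bytecode symbolically, keeping names of values on the stack instead of the values themselves, so that every computed result gets a fresh temporary that is assigned exactly once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StackToTemps {
    // Converts one basic block of toy bytecode into single-assignment statements.
    static List<String> convert(String... code) {
        Deque<String> stack = new ArrayDeque<>();      // names of values, not values
        Map<Integer, String> locals = new HashMap<>(); // current name held by each local
        List<String> out = new ArrayList<>();
        int nextTemp = 0;
        for (String insn : code) {
            String[] p = insn.split(" ");
            switch (p[0]) {
                case "iload": {
                    int n = Integer.parseInt(p[1]);
                    // A local that was never stored to is assumed to be an argument.
                    stack.push(locals.getOrDefault(n, "arg" + n));
                    break;
                }
                case "istore":
                    locals.put(Integer.parseInt(p[1]), stack.pop());
                    break;
                case "dup":
                    stack.push(stack.peek());
                    break;
                case "swap": {
                    String a = stack.pop(), b = stack.pop();
                    stack.push(a);
                    stack.push(b);
                    break;
                }
                case "iadd":
                case "imul": {
                    String b = stack.pop(), a = stack.pop();
                    String t = "t" + nextTemp++;
                    out.add(t + " = " + a + (p[0].equals("iadd") ? " + " : " * ") + b);
                    stack.push(t);
                    break;
                }
                case "ireturn":
                    out.add("return " + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // (x + y) * (x + y) via dup, then via a local variable.
        System.out.println(convert("iload 0", "iload 1", "iadd", "dup", "imul", "ireturn"));
        System.out.println(convert("iload 0", "iload 1", "iadd", "istore 2",
                                   "iload 2", "iload 2", "imul", "ireturn"));
    }
}
```

In this toy, loads, stores, dup and swap emit nothing. They only move names around, so the dup-based and local-based encodings of the same expression produce identical output. That is the sense in which the two storage areas become indistinguishable after conversion.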
Where this does make a difference is in the size of the bytecode, and I suspect that was part of the reason for the design choices. When Java code was supposed to be transferred via IR beams between set-top cable television boxes, code size was important. Arguably (though to a lesser extent), it is so again with J2ME.
> It's probably best to
> take the output of javac as a guide to how much use to make of the stack.
To get the best optimization possible, it's probably best to take the output of javac as a guide for as much as you possibly can. There has undoubtedly been much work done to make the JIT in most major virtual machines work as well as possible with common types of code that are written by javac. The same optimizations aren't likely to happen with someone's one-use code generator. :)
www.designacourse.com The Easiest Way To Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Mike Schilling - 21 May 2006 17:41 GMT
> To get the best optimization possible, it's probably best to take the
> output of javac as a guide for as much as you possibly can. There has
> undoubtedly been much work done to make the JIT in most major virtual
> machines work as well as possible with common types of code that are
> written by javac. The same optimizations aren't likely to happen with
> someone's one-use code generator. :)
And this becomes automatic if your compiler outputs Java. But I repeat myself :-)
Chris Uppal - 22 May 2006 09:47 GMT
> With a few exceptions, though, optimizers don't optimize native machine
> code. They optimize intermediate representations.
Agreed.
> 2. Convert all local variables AND the operand stack into SSA form with
> temporary variables.
My reason for suspecting that overuse of the stack might impede optimisation is that the semantics of stack operations are richer than just register moves -- for instance an /ordering/ of data items on the stack is implied, even when the algorithm itself doesn't make use of that ordering. If the optimisation phase attempts to preserve that layout, then it'll have more work to do than if the same algorithm had been expressed as movements between (unordered) "registers". I assume that you are right in thinking that an SSA analysis is capable of removing "stackness" from bytecode which passes verification (I know very little about SSA). I would be just a little hesitant in going from that to assuming that <some unknown JVM> does do so in practice -- especially for non-idiomatic bytecode.
-- chris
Roedy Green - 21 May 2006 21:32 GMT
>Now I'm getting deep. I have a question to anyone who knows: what is
>the real difference between local variables and the operand stack in
[quoted text clipped - 4 lines]
>results of an operation more than once, just 'dup' it the first time.
>If it's not in the right place, 'swap' it. Right?
the stack has:
1. return value where to carry on when the method ends.
2. local variables.
3. temporaries needed to evaluate expressions
Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Roedy Green - 21 May 2006 21:35 GMT
On Sun, 21 May 2006 20:32:22 GMT, Roedy Green <my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or indirectly quoted someone who said :
>the stack has:
>1. return value where to carry on when the method ends.
>2. local variables.
>3. temporaries needed to evaluate expressions
and 1.5 parameters passed to this method.
Mike Schilling - 22 May 2006 00:32 GMT
> On Sun, 21 May 2006 20:32:22 GMT, Roedy Green
> <my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
[quoted text clipped - 6 lines]
>
> and 1.5 parameters passed to this method.
I've seen methods with one parameter and methods with two parameters, but I've never seen a method with 1.5 parameters.
Roedy Green - 23 May 2006 04:42 GMT
On Sun, 21 May 2006 23:32:54 GMT, "Mike Schilling" <mscottschilling@hotmail.com> wrote, quoted or indirectly quoted someone who said :
>>>1. return value where to carry on when the method ends.
>>>2. local variables.
[quoted text clipped - 4 lines]
>I've seen methods with one parameter and methods with two parameters, but
>I've never seen a method with 1.5 parameters.
I just meant to insert a point between 1 and 2.
Mike Schilling - 23 May 2006 07:09 GMT
> On Sun, 21 May 2006 23:32:54 GMT, "Mike Schilling"
> <mscottschilling@hotmail.com> wrote, quoted or indirectly quoted
[quoted text clipped - 10 lines]
>
> i just meant to insert a point between 1 and 2.
I know, Roedy, I was just making a joke.
Roedy Green - 23 May 2006 19:51 GMT
On Tue, 23 May 2006 06:09:57 GMT, "Mike Schilling" <mscottschilling@hotmail.com> wrote, quoted or indirectly quoted someone who said :
>I know, Roedy, I was just making a joke.
I could not tell through your poker face.
Roedy Green - 21 May 2006 21:34 GMT
> but it seems like the 'dup' and
>'swap' commands are entirely sufficient to compile optimized code,
>given a non-naive compiler. After all, if you know you'll need the
>results of an operation more than once, just 'dup' it the first time.
>If it's not in the right place, 'swap' it. Right?
FORTH is a stack based machine similar to the JVM.
In FORTH besides SWAP and DUP you have other operators, most notably PICK to let you get at any element arbitrarily deep in the stack. The JVM does not have nearly as many stack operators as FORTH, but it does let you do stack relative addressing which gives you pick. It also lets you do frame relative addressing to let you access the locals and parms with fixed offsets.