
Java Forum / General / May 2006


Writing custom compiler

daniel.w.gelder@gmail.com - 14 May 2006 04:52 GMT
Hi,

I'm in the process of writing a custom compiler for my own language
that will target the JVM. I'm just getting started but I've got a
ClassInfo file successfully streamed. I have a question for anyone who
knows.

Apparently I have to define my own constructor, even if it doesn't do
anything. I seem to need to define an <init> method with signature
"()V", otherwise myClass.newInstance() throws InstantiationException.

Anyway, there's nothing in <init> except a return statement. So I get
this exception:

java.lang.VerifyError: (class: Dan, method: <init> signature: ()V)
Constructor must call super() or this()

So apparently I have to call Object.<init>() too. Makes sense, but I
thought you could never call <init> yourself. Is there a trick here?

Thanks.
Dan
Kent Paul Dolan - 14 May 2006 08:30 GMT
[I don't have much knowledge to offer. Instead, let me
play potted plant here and see if it helps.]

> Apparently I have to define my own constructor,
> even if it doesn't do anything.

That shouldn't be the case: if you don't explicitly
define a parameterless void constructor, the system
creates a default one for you which calls super()
and returns. In fact, creating such a constructor
and making it private so it can't be invoked is one
frequently seen trick to prevent the system from
supplying such a default constructor unbeknownst to
you and having it invoked where you had no such
intention.

> I seem to need to define an <init> method with signature
> "()V", otherwise myClass.newInstance() throws InstantiationException.

1) Are those angle brackets a literal part of the name "<init>"?
  Is that some template parameter naming, or what?

2) What does the "V" in "()V" mean? "Void return?"

> Anyway, there's nothing in <init> except a return statement. So I get
> this exception:

> java.lang.VerifyError: (class: Dan, method: <init> signature: ()V)
> Constructor must call super() or this()

Is that a compile time error, or a run time error? It looks
like a runtime invocation of MyClass.init() has encountered
a problem with a constructor for MyClass being missing, but
as noted above, one should be created (and then invoked) by
default if you don't prevent that happening.

> So apparently I have to call Object.<init>() too. Makes sense, but I
> thought you could never call <init> yourself. Is there a trick here?

I'm still confused by those angle brackets, but it isn't "init"
that you are being told to call, you are being told, at a default
invocation of MyClass.init() at startup time, that you haven't
yet instantiated an object from MyClass, thus there is no object
whose instance (as opposed to static) method init() can be invoked.

A common pattern is to have your main class inherit from
Applet, to instantiate it in main(...), then to invoke init()
from that instance.

import java.applet.Applet;

class MyClass extends Applet
{
 private static MyClass mc = null;

 public MyClass() // constructor
 {
   super();
 }
 public static void main(String[] args) // must exist in some class of your app
 {
   mc = new MyClass();
   mc.init(); // yes, you _can_ call init() yourself
   // ... do more stuff
 }
 @Override
 public void init() // must be public, because Applet.init() is public
 {
   // initialize stuff for instance object mc of MyClass.
 }
}

Or something vaguely like that. Until I figure out how
to boot Debian Linux using grub on and from an external
drive (to leave the internal drive's MS-Windows garbage
unmolested) on my replacement laptop, I'm temporarily
out of the Java business.

FWIW

xanthian.
daniel.w.gelder@gmail.com - 14 May 2006 09:25 GMT
Yeeeeeaaahhhkaayyyy.....try re-reading my first sentence. :-)

Thanks anyway though for replying.
Dan
Patricia Shanahan - 14 May 2006 14:13 GMT
> [I don't have much knowledge to offer. Instead, let me
> play potted plant here and see if it helps.]
[quoted text clipped - 11 lines]
> you and having it invoked where you had no such
> intention.
...

I think Daniel is using bytecode, rather than Java, as his target
language, so there will be things a Java compiler would do automatically
that he needs to do explicitly.

However, this does suggest a procedure for solving his problem:

1. Write a Java class with no specified superclass and no constructor
declaration.

2. Compile it.

3. Examine the bytecode. See what the compiler generates to represent
the default constructor. It will contain a call to the Object constructor.

4. Make the new compiler generate the same thing.

Patricia
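
Patricia's four steps can be tried with a few lines of Java. This is only a rough sketch: the names Empty, DefaultCtorPeek and findDefaultCtorBody are made up for illustration, and the byte scan is deliberately crude. It looks for the pattern aload_0, invokespecial, two index bytes, return, and could in principle match unrelated bytes elsewhere in the file.

```java
import java.io.IOException;
import java.io.InputStream;

// Steps 1 and 2: a class with no declared superclass and no constructor,
// compiled together with this file.
class Empty { }

// Hypothetical helper (not from the thread) for step 3.
class DefaultCtorPeek {
    static final int ALOAD_0 = 0x2A, INVOKESPECIAL = 0xB7, RETURN = 0xB1;

    /** Offset of the first "aload_0; invokespecial #n; return" byte pattern, or -1. */
    static int findDefaultCtorBody(byte[] b) {
        for (int i = 0; i + 4 < b.length; i++) {
            if ((b[i] & 0xFF) == ALOAD_0
                    && (b[i + 1] & 0xFF) == INVOKESPECIAL
                    && (b[i + 4] & 0xFF) == RETURN) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Assumes Empty.class sits next to this class on the class path.
        try (InputStream in = Empty.class.getResourceAsStream("Empty.class")) {
            if (in == null) {
                System.out.println("Empty.class not found on the class path");
                return;
            }
            byte[] bytes = in.readAllBytes();
            System.out.println("class file size: " + bytes.length + " bytes");
            System.out.println("constructor body at offset: " + findDefaultCtorBody(bytes));
        }
    }
}
```

For real work, running javap -c on the compiled class gives a properly decoded listing of the same constructor and is the better tool for step 3.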
Chris Uppal - 14 May 2006 09:36 GMT
> Apparently I have to define my own constructor, even if it doesn't do
> anything.

Yup.  There has to be a constructor or there's nothing there for other code to
call.

> I seem to need to define an <init> method with signature
> "()V", otherwise myClass.newInstance() throws InstantiationException.

That's correct.

> So apparently I have to call Object.<init>() too. Makes sense, but I
> thought you could never call <init> yourself. Is there a trick here?

"Rules change in the reaches"[*].  I.e. this is bytecode land -- a high-level,
mostly-dynamic, mostly-OO programming language with an interesting hybrid
static/dynamic type system.  The rules you learned in Java are only a rough
approximation to the rules which apply here.

In this case you are correct.  You have to supersend <init> (or use some other
flavour of <init>).  BTW, if you do supersend then it has to be an
invokespecial instruction, invokevirtual isn't allowed here.

The JVM spec is irritatingly incomplete, occasionally ambiguous, and even
within those limits, not especially well-written, but it does cover this stuff.
You should probably read the whole thing at least once (if you haven't
already).  Much of it is a non-normative (and largely irrelevant) rehashing of
the JLS.  Resign yourself to the idea that you are going to be reading the
/other/ bits over and over again ;-)

   -- chris

[*] Or, if you prefer, "You're not in Kansas anymore".  Or even, "Welcome to
the /real/ world" ;-)
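
What Chris describes can be checked end to end without a full compiler. The sketch below follows the class-file layout from the JVM spec and writes a minimal class by hand, once with the invokespecial call to Object.<init> and once with only a return. The names HandWrittenClass, build and instantiate are made up for illustration. Targeting class-file version 49, so that no StackMapTable attribute is needed, is an assumption of this example and not something from the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class HandWrittenClass {
    /** Bytes of "public class <name>" with a single public no-arg constructor. */
    static byte[] build(String name, boolean callSuper) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);     // magic
            out.writeShort(0);            // minor version
            out.writeShort(49);           // major version (assumed: 49, i.e. Java 5)
            out.writeShort(10);           // constant_pool_count = entries + 1
            out.writeByte(7);  out.writeShort(2);                    // #1 Class -> #2
            out.writeByte(1);  out.writeUTF(name);                   // #2 Utf8, this class
            out.writeByte(7);  out.writeShort(4);                    // #3 Class -> #4
            out.writeByte(1);  out.writeUTF("java/lang/Object");     // #4 Utf8
            out.writeByte(1);  out.writeUTF("<init>");               // #5 Utf8
            out.writeByte(1);  out.writeUTF("()V");                  // #6 Utf8
            out.writeByte(1);  out.writeUTF("Code");                 // #7 Utf8
            out.writeByte(10); out.writeShort(3); out.writeShort(9); // #8 Methodref Object.<init>
            out.writeByte(12); out.writeShort(5); out.writeShort(6); // #9 NameAndType <init>:()V
            out.writeShort(0x0021);       // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);            // this_class
            out.writeShort(3);            // super_class
            out.writeShort(0);            // interfaces_count
            out.writeShort(0);            // fields_count
            out.writeShort(1);            // methods_count
            out.writeShort(0x0001);       // method access: ACC_PUBLIC
            out.writeShort(5);            // method name: <init>
            out.writeShort(6);            // method descriptor: ()V
            out.writeShort(1);            // method attributes_count
            byte[] code = callSuper
                    ? new byte[] {0x2A, (byte) 0xB7, 0, 8, (byte) 0xB1} // aload_0; invokespecial #8; return
                    : new byte[] {(byte) 0xB1};                         // return only
            out.writeShort(7);            // attribute name: Code
            out.writeInt(12 + code.length); // attribute_length
            out.writeShort(1);            // max_stack
            out.writeShort(1);            // max_locals
            out.writeInt(code.length);    // code_length
            out.write(code);
            out.writeShort(0);            // exception_table_length
            out.writeShort(0);            // Code attributes_count
            out.writeShort(0);            // class attributes_count
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Defines the class in a throwaway loader and calls its no-arg constructor. */
    static Object instantiate(String name, byte[] bytes) {
        ClassLoader loader = new ClassLoader() {
            @Override protected Class<?> findClass(String n) {
                return defineClass(n, bytes, 0, bytes.length);
            }
        };
        try {
            return loader.loadClass(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate("Dan", build("Dan", true)).getClass());
        try {
            instantiate("Dan", build("Dan", false));
        } catch (VerifyError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

On a HotSpot JVM the second attempt should be rejected with a VerifyError much like the one Dan quoted, though the exact message text may differ between JVM versions.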
daniel.w.gelder@gmail.com - 14 May 2006 10:17 GMT
I seem to have gotten it at last. I'm kind of surprised how much actual
bytecode javac always had to make for

public class Test {
}

It uses a lot more space than the original file, that's for sure. Oh
well. Time to optimize my compiler frontend and do a little coding.

Dan
Mike Schilling - 16 May 2006 08:05 GMT
> Hi,
>
> I'm in the process of writing a custom compiler for my own language
> that will target the JVM. I'm just getting started but I've got a
> ClassInfo file successfully streamed. I have a question for anyone who
> knows.

I know (obviously) nothing about your language, but I'm wondering:

Might it be easier to use Java as an intermediate language?  That is,
generate Java from your language and then use javac to compile that?

As Chris points out, the JVM spec is irritatingly incomplete about the
precise requirements for bytecode, while the JLS and assorted other books
are far better at explaining Java.  And should you run into trouble, you'll
have a much easier time debugging your generated Java than debugging
bytecode directly.
Chris Uppal - 16 May 2006 17:12 GMT
> Might it be easier to use Java as an intermediate language?  That is,
> generate Java from your language and then use javac to compile that?

Or maybe an intermediate level technology like Javassist.

Just mentioning options; personally I'd pop a beer and get stuck right into the
bytecode ;-)

   -- chris
daniel.w.gelder@gmail.com - 17 May 2006 02:04 GMT
I've already popped several beers and quite a lot of coffee beans too
on it!

Actually I tried using Java as an intermediate language first, calling
into sun.tools.javac. It worked, but it was really shockingly
inefficient in a lot of ways and I lost interest.

Bytecode, while tricky, is at least a challenge.

Dan
dimitar - 16 May 2006 23:11 GMT
In addition to the JVM spec, you can also check Bill Venners's "Inside
the JVM". It's out of print, but you might find a copy in your library.

Dimitar
daniel.w.gelder@gmail.com - 20 May 2006 23:59 GMT
Now I'm getting deep. I have a question to anyone who knows: what is
the real difference between local variables and the operand stack in
the JVM? Both exist only within a method frame. Operations push and pop
only from the operand stack, granted, but it seems like the 'dup' and
'swap' commands are entirely sufficient to compile optimized code,
given a non-naive compiler. After all, if you know you'll need the
results of an operation more than once, just 'dup' it the first time.
If it's not in the right place, 'swap' it. Right?
Mike Schilling - 21 May 2006 02:14 GMT
> Now I'm getting deep. I have a question to anyone who knows: what is
> the real difference between local variables and the operand stack in
> the JVM?

There isn't one, really.  I vaguely recall reading a paper by a .NET
advocate saying that MSIL is superior to Java bytecode because MSIL does
make that distinction while bytecode doesn't.  (I don't remember why this was
supposed to be an advantage.)
Chris Uppal - 21 May 2006 11:44 GMT
> Operations push and pop
> only from the operand stack, granted, but it seems like the 'dup' and
> 'swap' commands are entirely sufficient to compile optimized code,
> given a non-naive compiler

Remember that the JVM bytecode instructions will be translated into real
machine operations.  If that machine doesn't have an operation stack (or if the
JITer -- if any -- doesn't use it) then stack twiddling instructions will be
converted into variable-to-variable, or register-to-register, movements.  It
might be harder or even impossible for the JITer to optimise such code.

Also, don't forget that the operand stack is cleared whenever an exception is
thrown.

OTOH, don't go overboard in avoiding stack twiddling -- after all those
instructions are there and they are intended to be used.  It's probably best to
take the output of javac as a guide to how much use to make of the stack.

   -- chris
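
To make the dup-versus-local question concrete, here is a small Java method with the bytecode one would expect in comments. The listing reflects what javac typically emits for this shape of code; it is worth confirming with javap -c, since details can vary between compilers. The class and method names are made up for illustration.

```java
class SquareOfSum {
    // Typical javac output for viaLocal:
    //   iload_0; iload_1; iadd; istore_2    // t = a + b, parked in local slot 2
    //   iload_2; iload_2; imul; ireturn     // t reloaded twice
    //
    // A hand-written equivalent that keeps t on the operand stack only:
    //   iload_0; iload_1; iadd; dup; imul; ireturn
    static int viaLocal(int a, int b) {
        int t = a + b;
        return t * t;
    }

    public static void main(String[] args) {
        System.out.println(viaLocal(2, 3)); // prints 25
    }
}
```

The dup version is two instructions shorter, which is the kind of saving Dan is asking about. Whether it runs any faster after JIT compilation is a separate question.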
Chris Smith - 21 May 2006 17:23 GMT
> Remember that the JVM bytecode instructions will be translated into real
> machine operations.  If that machine doesn't have an operation stack (or if the
> JITer -- if any -- doesn't use it) then stack twiddling instructions will be
> converted into variable-to-variable, or register-to-register, movements.  It
> might be harder or even impossible for the JITer to optimise such code.

With a few exceptions, though, optimizers don't optimize native machine
code.  They optimize intermediate representations.  My guess is that an
aggressive optimizer does the following:

1. Break up the method into basic blocks.

2. Convert all local variables AND the operand stack into SSA form with
temporary variables.

3. Optimize and generate code (including register allocation) from the
SSA representation.

So I believe that it wouldn't make much difference in code quality
whether data is stored in local variable slots or on the operand stack,
because the two will be indistinguishable after the conversion to SSA
form.  Dividing the code into basic blocks and writing the SSA
representation are both relatively cheap (linear in the method length), so
this may even be the process for quick optimizations as well.

Where this does make a difference is in the size of the bytecode, and I
suspect that was part of the reason for the design choices.  When Java
code was supposed to be transferred via IR beams between set-top cable
television boxes, code size was important.  Arguably (though to a lesser
extent), it is so again with J2ME.

> It's probably best to
> take the output of javac as a guide to how much use to make of the stack.

To get the best optimization possible, it's probably best to take the
output of javac as a guide for as much as you possibly can.  There has
undoubtedly been much work done to make the JIT in most major virtual
machines work as well as possible with common types of code that are
written by javac.  The same optimizations aren't likely to happen with
someone's one-use code generator. :)

Signature

www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
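
The claim that locals and the operand stack become indistinguishable after conversion can be illustrated with a toy converter. This is a sketch under heavy assumptions, not how any particular JIT works: it handles a single basic block only (so no phi functions), supports a handful of int opcodes, and all names in it (StackToTemps, convert, the arg/t naming scheme) are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StackToTemps {
    /** Rewrites straight-line stack code as assignments to fresh temporaries. */
    static List<String> convert(String... ops) {
        Deque<String> stack = new ArrayDeque<>();     // names of values on the operand stack
        Map<String, String> locals = new HashMap<>(); // local slot -> name of value stored there
        List<String> out = new ArrayList<>();
        int next = 0;
        for (String op : ops) {
            String[] p = op.split(" ");
            switch (p[0]) {
                case "iload":   // no code emitted: just remember which value is meant
                    stack.push(locals.getOrDefault(p[1], "arg" + p[1]));
                    break;
                case "istore":  // the slot now names the same value
                    locals.put(p[1], stack.pop());
                    break;
                case "dup":     // same name twice, no copy emitted
                    stack.push(stack.peek());
                    break;
                case "swap": {
                    String top = stack.pop(), below = stack.pop();
                    stack.push(top);
                    stack.push(below);
                    break;
                }
                case "iadd":
                case "imul": {
                    String right = stack.pop(), left = stack.pop();
                    String t = "t" + next++;
                    out.add(t + " = " + left + (p[0].equals("iadd") ? " + " : " * ") + right);
                    stack.push(t);
                    break;
                }
                case "ireturn":
                    out.add("return " + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unsupported op: " + op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // (a + b) squared, once through a local variable and once through dup.
        System.out.println(convert("iload 0", "iload 1", "iadd", "istore 2",
                                   "iload 2", "iload 2", "imul", "ireturn"));
        System.out.println(convert("iload 0", "iload 1", "iadd", "dup", "imul", "ireturn"));
    }
}
```

Both sequences come out as the same three statements, because istore, iload, dup and swap only rename values and emit nothing. That is the sense in which the choice between a local slot and the stack disappears once the code is in this form.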

Mike Schilling - 21 May 2006 17:41 GMT
> To get the best optimization possible, it's probably best to take the
> output of javac as a guide for as much as you possibly can.  There has
> undoubtedly been much work done to make the JIT in most major virtual
> machines work as well as possible with common types of code that are
> written by javac.  The same optimizations aren't likely to happen with
> someone's one-use code generator. :)

And this becomes automatic if your compiler outputs Java.  But I repeat
myself :-)
Chris Uppal - 22 May 2006 09:47 GMT
> With a few exceptions, though, optimizers don't optimize native machine
> code.  They optimize intermediate representations.

Agreed.

> 2. Convert all local variables AND the operand stack into SSA form with
> temporary variables.

My reason for suspecting that overuse of the stack might impede optimisation is
that the semantics of stack operations are richer than just register moves --
for instance an /ordering/ of data items on the stack is implied, even when the
algorithm itself doesn't make use of that ordering.  If the optimisation phase
attempts to preserve that layout, then it'll have more work to do than if the
same algorithm had been expressed as movements between (unordered) "registers".
I assume that you are right in thinking that an SSA analysis is capable of
removing "stackness" from bytecode which passes verification (I know very
little about SSA).  I would be just a little hesitant in going from that to
assuming that <some unknown JVM> does do so in practice -- especially for
non-idiomatic bytecode.

   -- chris
Roedy Green - 21 May 2006 21:32 GMT
>Now I'm getting deep. I have a question to anyone who knows: what is
>the real difference between local variables and the operand stack in
[quoted text clipped - 4 lines]
>results of an operation more than once, just 'dup' it the first time.
>If it's not in the right place, 'swap' it. Right?

the stack has:
1. return value where to carry on when the method ends.
2. local variables.
3. temporaries needed to evaluate expressions
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Roedy Green - 21 May 2006 21:35 GMT
On Sun, 21 May 2006 20:32:22 GMT, Roedy Green
<my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
indirectly quoted someone who said :

>the stack has:
>1. return value where to carry on when the method ends.
>2. local variables.
>3. temporaries needed to evaluate expressions

and 1.5 parameters passed to this method.

Mike Schilling - 22 May 2006 00:32 GMT
> On Sun, 21 May 2006 20:32:22 GMT, Roedy Green
> <my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or
[quoted text clipped - 6 lines]
>
> and 1.5 parameters passed to this method.

I've seen methods with one parameter and methods with two parameters, but
I've never seen a method with 1.5 parameters.
Roedy Green - 23 May 2006 04:42 GMT
On Sun, 21 May 2006 23:32:54 GMT, "Mike Schilling"
<mscottschilling@hotmail.com> wrote, quoted or indirectly quoted
someone who said :

>>>1. return value where to carry on when the method ends.
>>>2. local variables.
[quoted text clipped - 4 lines]
>I've seen methods with one parameter and methods with two parameters, but
>I've never seen a method with 1.5 parameters.

i just meant to insert a point between 1 and 2.

Mike Schilling - 23 May 2006 07:09 GMT
> On Sun, 21 May 2006 23:32:54 GMT, "Mike Schilling"
> <mscottschilling@hotmail.com> wrote, quoted or indirectly quoted
[quoted text clipped - 10 lines]
>
> i just meant to insert a point between 1 and 2.

I know, Roedy, I was just making a joke.
Roedy Green - 23 May 2006 19:51 GMT
On Tue, 23 May 2006 06:09:57 GMT, "Mike Schilling"
<mscottschilling@hotmail.com> wrote, quoted or indirectly quoted
someone who said :

>I know, Roedy, I was just making a joke.

I could not tell through your poker face.

Roedy Green - 21 May 2006 21:34 GMT
> but it seems like the 'dup' and
>'swap' commands are entirely sufficient to compile optimized code,
>given a non-naive compiler. After all, if you know you'll need the
>results of an operation more than once, just 'dup' it the first time.
>If it's not in the right place, 'swap' it. Right?

FORTH is a stack based machine similar to the JVM.

In FORTH besides SWAP and DUP you have other operators, most notably
PICK to let you get at any element arbitrarily deep in the stack. The
JVM does not have nearly as many stack operators as FORTH, but it does
let you do stack relative addressing which gives you pick. It also
lets you do frame relative addressing to let you access the locals and
parms with fixed offsets.






©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.