Java Forum / General / March 2006

Performance Q: java hotspot vs. native code

Twisted - 14 Mar 2006 20:05 GMT
In each of these cases, would optimized C run substantially faster
than Java (HotSpot or another JIT VM)?

* A number-crunching algorithm with a tight loop and a large number of
  iterations (from thousands potentially up to millions, or more) using
  doubles.

* Ditto, but with the C code using arrays of uints and carries to
  effect high-precision fixed-point math, and the Java code using
  BigDecimals.

* Ditto, but with roll-your-own Java BigDecimal-alikes using arrays and
  math.

If there's an even higher performance option (short of compile-to-FPGA
:)) for the high-precision cases, please let me know about that as
well. (I know when it gets up into the 500+ digits it can be faster to
use FFT for the multiplies -- O(n log n) vs O(n^2). I'll cross that one
when I come to it.)
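
For the high-precision cases, the carry-propagating array arithmetic described above can be sketched in Java roughly as follows. This is only an illustration: the class `LimbMath`, the method `addInPlace`, and the little-endian layout of 32-bit limbs are assumptions, not code from anyone in this thread.

```java
// Hypothetical helper: a number is stored as little-endian 32-bit limbs in an int[].
final class LimbMath {
    private LimbMath() {}

    /** Adds b into a in place (both the same length); returns the carry out of the top limb. */
    static int addInPlace(int[] a, int[] b) {
        long carry = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask with 0xFFFFFFFFL so each int is treated as an unsigned 32-bit value.
            long sum = (a[i] & 0xFFFFFFFFL) + (b[i] & 0xFFFFFFFFL) + carry;
            a[i] = (int) sum;     // the low 32 bits stay in this limb
            carry = sum >>> 32;   // the overflow bit moves to the next limb
        }
        return (int) carry;
    }

    public static void main(String[] args) {
        int[] a = {0xFFFFFFFF, 0x00000001};
        int[] b = {0x00000001, 0x00000000};
        int carry = addInPlace(a, b);
        System.out.println(Integer.toHexString(a[1]) + " "
                + Integer.toHexString(a[0]) + " carry=" + carry);
    }
}
```

Java has no unsigned int type, so the masking into a long stands in for what C does directly with uints. Whether a JIT turns this into code as tight as the C equivalent is exactly the open question of the thread.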
tom fredriksen - 14 Mar 2006 22:38 GMT
> In each of these two cases, would optimized C run substantially faster
> than Java (hotspot or other JIT VM)?

You would be hard pressed to find anything that runs faster than C. The
reason is simple: C is a low-abstraction language, only slightly more
abstract than assembler. This means it produces code close to the
platform, with few runtime hindrances. Any interpreted language, by
contrast, will be slowed down by active runtime checks and interpreter
operations. This particularly applies to languages with higher
abstraction levels than C, which is quite a few these days, e.g. Perl,
Java, Ruby, Lisp etc. Even natively compiled Java code would still be
slower, as it still needs runtime checks and so forth.

Of course the algorithm is another of the most important factors, but if
it's the same in both implementations then C wins for the aforementioned
reasons. The only way any other language might win is if it has an
algorithmic enhancer which changes the code to a better algorithm than
the one you have programmed, but that is not likely to happen. (The only
case I can think of where this might be true is, e.g., Java's regexp
engine being faster than a similar regexp engine used in C, but that
comes down to algorithm again.)

/tom
Twisted - 15 Mar 2006 03:34 GMT
This even applies to basic math calcs, without e.g. arrays (and thus
bounds-checking) and objects (dynamic dispatch, null pointer checking)?
tom fredriksen - 15 Mar 2006 09:08 GMT
> This even applies to basic math calcs, without e.g. arrays (and thus
> bounds-checking) and objects (dynamic dispatch, null pointer checking)?

I don't have a definitive answer for that, because it depends on a few
things.

The first question is whether the code is absolutely free of any support
methods or mechanisms in the language that need, e.g., runtime checks
and controls.

The second is the runtime environment.
If it's interpreted, then most likely yes, at least because of the
interpreter operations.
If it's compiled to native code, then it might happen.

If it's only basic calcs with native data types, then I suggest you make
a prototype in both languages and compare them. Just make sure the code
is equivalent, otherwise some language feature might make a difference.

For the sake of it I will try to put together a prototype test in both
languages and give it a go. I will post it here; please do so as well, so
that we have two different operations to compare with respect to speed.

/tom
tom fredriksen - 15 Mar 2006 23:16 GMT
>> This even applies to basic math calcs, without e.g. arrays (and thus
>> bounds-checking) and objects (dynamic dispatch, null pointer checking)?
>
> For the sake of it I will try to put together a prototype test in both
> languages and give it a go. I will post it here; please do so as well, so
> that we have two different operations to compare with respect to speed.

I made a test which performs a simplified in_cksum calculation on a 64KB
packet in a loop. It is done in both C and Java (with -client, -server
and gcj, the GCC Java compiler).

The results were the following:

java -client  11.85   (java 1.5.0_04)
java -server  11.90
gcj           12.26   (gcc 3.3.2 on linux 2.6.3)
C integer     11.01
C float        7.23

So my advice would be to use C if you need absolute speed, but if you
can accept a little reduction, then java might be ok if you are sticking
with pure native datatypes and operations.

Since the in_cksum operation is entirely an integer operation the C
float version is not really interesting (but I made a mistake to begin
with so I thought it was a valid result)

The code is as follows:

(The C code has to be slightly changed, otherwise the array
initialisation routine would have used float operations instead, leading
to the whole program being float operations, with invalid results.)

/***** JAVA *****/

import java.util.Random;

public class Cksum
{
    public static void main(String args[])
    {
        Random rand = new Random();
        int total = 0;
        int count = 65500;
        int data[] = new int[count];

        // Fill the "packet" with random non-negative values.
        for(int c=0; c<count; c++) {
            data[c] = rand.nextInt(2000000000);
        }

        // Timed section: sum the array 50000 times (int overflow is ignored).
        long startTime = System.currentTimeMillis();
        for(int d=0; d<50000; d++) {
            for(int c=0; c<count; c++) {
                total += data[c];
            }
        }
        long endTime = System.currentTimeMillis();

        System.out.println("Elapsed time (ms): " + (endTime - startTime));
        System.out.println("Total: " + total);
    }
}

/***** C *****/

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
    unsigned int total = 0;
    int count = 65500;
    unsigned int data[count];
    unsigned int data2[count];

    for(int c=0; c<count; c++) {
/*         data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */
        data2[c] = data[c];
    }

    struct timeval start_time;
    struct timeval end_time;
   
    gettimeofday(&start_time, NULL);

    for(int d=0; d<50000; d++) {
        for(int c=0; c<count; c++) {
            total += data[c];
        }
    }
    gettimeofday(&end_time, NULL);

    double t1=(start_time.tv_sec*1000)+(start_time.tv_usec/1000.0);
    double t2=(end_time.tv_sec*1000)+(end_time.tv_usec/1000.0);

    printf("Elapsed time (ms): %.6lf\n", t2-t1);
    printf("Total: %u\n", total);

    for(int c=0; c<100; c++) {
        printf("data2: %u  ", data2[c]);
    }
    printf("\n");
   
    return(0);
}
Roedy Green - 16 Mar 2006 02:06 GMT
>java -client  11.85    (java 1.5.0_04)
>java -server  11.90
>gcj         12.26    (gcc 3.3.2 on linux 2.6.3)
>C integer     11.01   
>C float         7.23

Have you posted the code for your benchmark? I would like to try it
with Jet.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

tom fredriksen - 16 Mar 2006 09:43 GMT
>> java -client  11.85    (java 1.5.0_04)
>> java -server  11.90
[quoted text clipped - 4 lines]
> Have you posted the code for your benchmark? I would like to try it
> with Jet.

What do you mean? It's at the end of the post.

/tom
Roedy Green - 16 Mar 2006 19:15 GMT
>/*         data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */

I don't get it. What are you comparing?  The algorithms are not even
close.  and you have the code commented out.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

tom fredriksen - 16 Mar 2006 19:34 GMT
>> /*         data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */
>
> I don't get it. What are you comparing?  The algorithms are not even
> close.  and you have the code commented out.

What do you mean? I am comparing the following.

C:

    for(int d=0; d<50000; d++) {
        for(int c=0; c<count; c++) {
            total += data[c];
        }
    }

Java:

    for(int d=0; d<50000; d++) {
        for(int c=0; c<count; c++) {
            total += data[c];
        }
    }

All the other stuff appears outside the loop, so it is irrelevant. In
the C code I have to use a different initialisation method to be sure I
get integer operations that are comparable, but it should not affect the
measurement. That's the only difference. Just so you know, the algorithm
is a simplified Internet checksum as used in TCP/IP, which does not take
overflow into account; it's just a simple test that performs some
"credible" math operation.

/tom
Roedy Green - 16 Mar 2006 19:59 GMT
>public class Cksum

I ran it on my machine:

with Jet
Elapsed time (ms): 4641
Total: 476899872

with Java 1.6 client
Elapsed time (ms): 11297
Total: -1699311728

I would change the benchmark to
Random rand = new Random(149);
to give repeatable results. Then that total would verify the algorithm
worked.

That makes Jet the clear winner, far faster than C.

That is because Jet does loop unravelling.
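
The seeding suggestion can be illustrated with a small sketch. The class `SeededChecksum` and the method `total` are hypothetical names; only the seed value 149 and the `nextInt(2000000000)` call come from the posts above. Because `java.util.Random` produces a fixed sequence for a given seed, two runs (or two compilers) can be checked against each other by comparing the printed total.

```java
import java.util.Random;

// Hypothetical illustration: a seeded generator makes the checksum repeatable.
final class SeededChecksum {
    private SeededChecksum() {}

    /** Sums `count` pseudo-random values drawn from a generator with the given seed. */
    static int total(long seed, int count) {
        Random rand = new Random(seed);   // same seed -> same sequence on every run
        int total = 0;
        for (int c = 0; c < count; c++) {
            total += rand.nextInt(2000000000);   // int overflow wraps, as in the benchmark
        }
        return total;
    }

    public static void main(String[] args) {
        // Both lines print the same number, so differing totals would signal a bug.
        System.out.println("Total: " + total(149, 65500));
        System.out.println("Total: " + total(149, 65500));
    }
}
```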

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

tom fredriksen - 17 Mar 2006 00:11 GMT
>> public class Cksum
>
[quoted text clipped - 14 lines]
>
> That makes Jet the clear winner, far faster than C.

I consider using Jet to be cheating. If I had used other techniques to
enhance the C code, or had access to some C optimisers, I am sure I could
make it go faster as well. But I was trying to make an equal
implementation/compilation comparison.

Some comments:
- Did you try it with Java 1.5, which is production grade compared to
  1.6, and with the -server option as well?
- Where are the numbers for the C implementation on your machine?
  Otherwise you need to tell us what kind of machine you are using.

/tom
Roedy Green - 17 Mar 2006 01:02 GMT
>I consider using Jet is cheating, if I had used other techniques to
>enhance the C code or had access to some C optimisers I am sure I could
>make it go faster as well. But I was trying to make an equal
>implementation/compilation comparison.

It is not cheating if the result is the same and there are no
conditions under which the code does not work.  It is simply using a
superior compiler.  If you want to compare languages it is silly
comparing them with less than the best compilers.  You are then
artificially skewing the result by which inept compilers you choose.

The best compilers are limited only by the theoretical constraints of
the language.  The not so hot ones have all sorts of limitations that
have nothing whatever to do with the language.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

tom fredriksen - 17 Mar 2006 01:08 GMT
>> I consider using Jet is cheating, if I had used other techniques to
>> enhance the C code or had access to some C optimisers I am sure I could
[quoted text clipped - 10 lines]
> the language.  The not so hot ones have all sorts of limitation
> nothing whatever to do with the language.

But where are your numbers for the C version and what cpu are you
running it on?

/tom
Roedy Green - 17 Mar 2006 03:38 GMT
>But where are your numbers for the C version and what cpu are you
>running it on?

You don't need them.  All you need is the ratio of Jet to Java
-client.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

tom fredriksen - 17 Mar 2006 10:44 GMT
>> But where are your numbers for the C version and what cpu are you
>> running it on?
>
> you don't need them.  All you need in the ratio of Jet to Java
> -client.

Yes, if it had been me running it on my machine, but since it is running
on a different machine with different software than I used to test it, it
does matter. E.g. Java 6 is not yet as fast as Java 5, it might contain
other runtime enhancements than Java 5, and so on.

In any case, you mentioned Jet is using loop unrolling, which is a
technique that boils down to algorithm enhancing, so if the C code had
been using loop unrolling too, the results would be different.

You have to compare an apple with an apple, not with a sugar-coated
apple. My point is this: code enhancements can be categorised, and you
cannot use enhancements from a different category because it
significantly changes the comparison. Then it's not about comparing
languages, it's about comparing different algorithms.

To perform proper comparison and measurements, all tests must run under
the same or similar conditions. You cannot mix and switch as you desire
and then make a claim.

/tom
Chris Uppal - 17 Mar 2006 11:28 GMT
> To perform proper comparison and measurements all test must run under
> same or similar conditions, you can not mix and switch as you desire and
> then make a claim.

This is true.  But don't take it too far; if one implementation strategy makes
certain types of automatic optimisation possible which are impossible to apply
with another strategy, then the advantages of using those optimisations are
legitimately part of the comparison.  They don't turn it into an apples vs.
oranges comparison.

E.g. if a JITing JVM can detect the availability of Intel SMP instructions, and
dynamically choose to generate code which utilises them, then that is a
legitimate advantage over another implementation (of Java, C, or anything else)
which uses static compilation, and therefore does not generate comparable code.

Another example, also hypothetical.  If the semantics of a language such as C
are such that the compiler cannot perform automatic loop unrolling, whereas the
semantics of another language are such that the compiler /can/ spot some
opportunities unaided, then it's perfectly legitimate to compare the two
implementations directly.

   -- chris
tom fredriksen - 17 Mar 2006 12:51 GMT
>> To perform proper comparison and measurements all test must run under
>> same or similar conditions, you can not mix and switch as you desire and
[quoted text clipped - 10 lines]
> legitimate advantage over another implementation (of Java, C, or anything else)
> which uses static compilation, and therefore does not generate comparable code.

If the claim is "compile the fastest code you can possibly get in C and
Java" then yes, you are right, but then you are discussing which language
has come further along in its development of optimised code.
Sort of like comparing a Ferrari to a Koenigsegg.

It is another matter to do a test but limit what one language can use
and not limit what the other language can use. One car must be a Ford
Mondeo or similar, but the other car can be a Ferrari or similar if it
wants. Then you are not comparing speeds of comparable items.

> Another example, also hypothetical.  If the semantics of a language such as C
> are such that the compiler cannot perform automatic loop unrolling, whereas the
> semantics of another language are such that the compiler /can/ spot some
> opportunities unaided, then it's perfectly legitimate to compare the two
> implementations directly.

Since loop unrolling and SMP systems make the test enter an entirely
different class of performance, you cannot use those techniques unless
both tests are using them. Otherwise it's not a race, it's a slaughter :)

When setting out on a project to perform a test, a statement of what the
test is to perform must be decided. After that is done, it must be made
sure that the test is objective and comparative based on the premise of
the test. I am not claiming that my test absolutely adheres to those two
criteria, basically because it's an informal test, but I did try to make
it relatively comparable. But there are of course too many variables in
a proper test for me to undertake now, because you would have to
classify all compiler and programming techniques etc. and decide which
are applicable for the test to be objective, and so on.

/tom
Roedy Green - 17 Mar 2006 20:47 GMT
>If the claim is "compile the fastest code you possibly get in C and
>Java" then yes you are right, but then you are discussing which language
[quoted text clipped - 5 lines]
>Mondeo or similar, but the other car can be a Ferrari or similar if it
>wants. Then you are not comparing speeds of comparable items.

If you introduce handicaps, YOU are rigging the outcome. You are not
really measuring anything objective. You are tricking people into
accepting your test as an objective measure of merit.

What counts is which performs best in the real world.  Your job is to
make the test as reflective as possible of the real world, not to make
decisions on which optimisation techniques count as valid, unless for
some reason a technique could not actually be used in the real world.

That is why, for example, you make the tests add and print results so
the optimiser can't discard code in the test, which it could not do in
the real world.  You do that by making the test more realistic, not by
disqualifying an optimiser.

.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

tom fredriksen - 17 Mar 2006 21:25 GMT
> What counts is which performs best in the real world.  Your job is to
> make the test as reflective as possible of the real world, not to make
> decisions on which optimisation techniques count as valid, unless for
> some reason a technique could not actually be used in the real world.

That would have been true if the point of the test was "get the best
performance you can get out of these two languages", but it was not. It
was an informal comparison to chart the landscape.

> That is why, for example, you make the tests add and print results so
> the optimiser can't discard code in the test, which it could not do in
> the real world.  

That has nothing to do with rigging the test; it helps set up a
comparable test, and you know it. Stick to the facts, not what suits
your arguments.

> You do that by making the test more realistic, not by
> disqualifying an optimiser.

Of course it is entirely possible to implement another test which does
exhibit such behaviour. Please do so then; I have accomplished what I
wanted. If you want something else then feel free to do so, or not.

/tom
Scott Ellsworth - 20 Mar 2006 21:20 GMT
> > What counts is which performs best in the real world.  Your job is to
> > make the test as reflective as possible of the real world, not to make
[quoted text clipped - 4 lines]
> performance you can get of these two languages", but it was not it was
> an informal comparison to chart the landscape.

Right, and part of the landscape is the available tool set.

BEA's JRockit has not been ported to the Mac, so my interest in it is
minimal.  GCC is on my platform, so my interest in it, especially with a
reasonable optimization set, is high.

Similarly, someone on a platform where Jet works is going to be
interested in it, while on an unsupported platform, it does them little
good.  It is not part of the landscape that they want charted.

So, whether _you_ find JRockit, Jet, or GCC with certain optimizations
on useful for your purposes, it is still a valid comparison for some
potential users.  It lets them chart their landscape.

Scott

Signature

Scott Ellsworth
scott@alodar.nospam.com
Java and database consulting for the life sciences

Roedy Green - 17 Mar 2006 20:35 GMT
>To perform proper comparison and measurements all test must run under
>same or similar conditions, you can not mix and switch as you desire and
>then make a claim.

The result I was talking about was a factor of 4 faster. No fine
detail is going to change that.

You remind me of a skinny kid named Ritchie Dowrey at whose house we
used to play football.  He owned the football.  On every play he had
a "new rule" that always favoured his team.  Nobody knew enough to
challenge him.

The theme was echoed in the movie Can Hieronymus Merkin Ever Forget
Mercy Humppe and Find True Happiness?

You are making your rules up on the fly to generate your desired
result. You are behaving like a religious fanatic distorting the
evidence to produce a predecided conclusion.

Look at this from a practical point of view.  You don't really care
HOW a compiler gets its speed, all you care about is does it do the
calculations faster. Therefore I dismiss your talk of the compiler
"cheating".
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

tom fredriksen - 17 Mar 2006 21:18 GMT
> The result I was talking about was a factor of 4 faster. No fine
> detail is going to change that.

Now you are spreading FUD, microsoft style.

> You are making your rules up on the fly to generate your desired
> result. You are behaving like a religious fanatic distorting the
> evidence to produce a predecided conclusion.

Enough with the personal characterisations! It makes you look like a
fanatic desperately trying to convince everybody you are right.

> Look at this from a practical point of view.  You don't really care
> HOW a compiler gets its speed, all you care about is does it do the
> calculations faster. Therefore I dismiss your talk of the compiler
> "cheating".

That's your prerogative. It still does not give you a statistically
sound or objective result, because you are controlling the results.
I am not saying the measurements I am doing are perfect, just that they
are fairer than yours.

But if you are convinced you are right, you can prove it by doing the
following.

- Implement loop unrolling and use a C optimiser, then run the tests
  again, then post the details of the code and optimiser used.
- Post the measurement numbers of both C tests.

If you cannot do that, you cannot prove fairness. You have nothing to
lose because the Jet version is, according to you, superior anyway.

/tom
Thomas Hawtin - 17 Mar 2006 20:58 GMT
> java -client  11.85    (java 1.5.0_04)
> java -server  11.90
> gcj          12.26    (gcc 3.3.2 on linux 2.6.3)
> C integer     11.01  
> C float          7.23

I rewrote the Java version of the microbenchmark to be more realistic
and conventional. My results

1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
  (using commented out code: 5849.171000)

So what can we conclude? After a start-up penalty, Sun's current Server
HotSpot is much faster than C++. No. Microbenchmarks can help your
understanding of how a particular compiler behaves. They are useless at
determining the goodness of performance across languages.

Tom Hawtin

class Checksum {
    private static int core(int[] data) {
        int count = data.length;
        int total = 0;

        for (int d=0; d<50000; d++) {
            for (int c=0; c<count; c++) {
                total += data[c];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        int count = 65500;
        int[] data = new int[count];

        for (int c=0; c<count; c++) {
            data[c] = rand.nextInt(2000000000);
        }
        for (int run=0; run<4; ++run) {
            long startTime = System.currentTimeMillis();
            int total = core(data);
            long endTime = System.currentTimeMillis();

            System.out.println("Elapsed time (ms): " + (endTime - startTime));
            System.out.println("Total: " + total);
        }
    }
}
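
The start-up penalty visible in the -server numbers suggests separating warm-up runs from measured runs. The following is a minimal sketch of that idea, assuming Java 8 or later for the lambda; `WarmupTimer` and `medianNanos` are hypothetical names, and reporting the middle sample is one choice among several, not something proposed in the thread.

```java
import java.util.Arrays;

// Hypothetical timing helper: discard warm-up runs, then report the middle sample.
final class WarmupTimer {
    private WarmupTimer() {}

    /** Runs the task warmupRuns times untimed, then measuredRuns (>= 1) times timed. */
    static long medianNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();   // gives the JIT a chance to compile the hot loop
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measuredRuns / 2];
    }

    public static void main(String[] args) {
        int[] data = new int[65500];
        Arrays.fill(data, 7);
        int[] sink = new int[1];   // written by the task so the sum is not dead code
        long ns = medianNanos(() -> {
            int total = 0;
            for (int d = 0; d < 1000; d++) {
                for (int c = 0; c < data.length; c++) {
                    total += data[c];
                }
            }
            sink[0] = total;
        }, 2, 5);
        System.out.println("Median time (ms): " + ns / 1_000_000.0);
        System.out.println("Total: " + sink[0]);
    }
}
```

This does not make a microbenchmark representative of real programs; it only keeps the compilation cost out of the reported figure.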
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Chris Uppal - 19 Mar 2006 15:43 GMT
> So what can we conclude? After a start-up penalty, Sun's current Server
> HotSpot is much faster than C++. No. Microbenchmarks can help your
> understanding of how a particular compiler behaves. They are useless at
> determining the goodness of performance across languages.

I got interested enough to reproduce Thomas's tests with a number of C++
compilers.

gcc running with -O3, and no other optimisation settings (life's too short even
to read the man page!).

MS VC6, in "Release" mode, plus telling it to optimise for speed only, and to
generate code targeting the "Pentium Pro" (the most modern target available).

MS VS 2003 in default "Release" mode.  Note that this includes array overrun
checking by default (presumably Tom considers this necessary for an apples to
apples comparison -- although I don't).

MS VS 2003 in "Release" mode, plus telling it to generate code for a Pentium 4,
and turning on all the other relevant-looking optimisations.

Java -client and -server.  In both cases JDK 1.5.0

Results are:

gcc -O3          5458   5177   5278   5187
vc6 +opt         7020   6850   6759   6850
vs2003           3555   3385   3465   3385
vs2003 +opt      3635   3385   3385   3465
java -client    13770  13610  13699  13620
java -server    11456   3485   3365   3385

In all cases running on a 1.5 GHz Celeron box.  I haven't attempted to explore
what would happen running the same code on different chips (especially AMD).

What can we conclude ?  Well, provided we remember that this is only one very,
very, specific test, and that other apparently similar tests might give very
different results, I think it's obvious...

   -- chris
Twisted - 21 Mar 2006 21:14 GMT
At this point, it's looking like java -server is comparable to C++ with
reasonably up-to-date stuff and integer math in a tight loop.

What about floating point math (say, a few adds and a couple mults) in
a similar loop? How does it perform on different chips? Say (and
someone in this group probably has access to each of these)
-- Latest 32-bit Intel offering
-- AMD Athlon same clock speed
-- Athlon 64, same speed again
-- dual core? (double the data length if you can make it use both
cores; if you can't, report that fact.)
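
A floating-point counterpart to the integer test could look like the sketch below. It is an assumption about what "a few adds and a couple mults" might mean, not a benchmark anyone in the thread ran; `FloatLoop` and `core` are hypothetical names, and no timings are claimed for it.

```java
// Hypothetical floating-point variant of the tight-loop test.
final class FloatLoop {
    private FloatLoop() {}

    /** Per element: two multiplies and three adds (including the accumulation). */
    static double core(double[] data, int repeats) {
        double acc = 0.0;
        for (int d = 0; d < repeats; d++) {
            for (int c = 0; c < data.length; c++) {
                double x = data[c];
                acc += x * x + 0.5 * x + 1.0;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        double[] data = new double[65500];
        for (int c = 0; c < data.length; c++) {
            data[c] = c / 65500.0;   // deterministic input, so runs are comparable
        }
        long start = System.currentTimeMillis();
        double result = core(data, 5000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Elapsed time (ms): " + elapsed);
        System.out.println("Result: " + result);   // printed so the loop cannot be discarded
    }
}
```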

And what exactly is Jet? I know, I know, google it, but somehow I doubt
page after page of aeronautical Web sites will be enlightening in this
instance.

--
I am the terror that flaps in the net!
I am the elusive window handle leak two hours before it's due to ship!
I am TWISTED!
Roedy Green - 21 Mar 2006 22:51 GMT
>And what exactly is Jet? I know, I know, google it, but somehow I doubt
>page after page of aeronautical Web sites will be enlightening in this
>instance.

see http://mindprod.com/jgloss/jet.html
and http://mindprod.com/jgloss/aot.html
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Twisted - 22 Mar 2006 10:00 GMT
Ugh. They want you to pay money? Even for noncommercial/freeware
development, open source, personal use, etc.???

Forget it. Especially as Sun's HotSpot with -server seems to perform
the same as native C, and will be far more portable.

--
I am the terror that flaps in the net!
I am the broken build that dies without a stack trace!
I am TWISTED!
Roedy Green - 23 Mar 2006 01:09 GMT
On Fri, 17 Mar 2006 19:58:54 +0000, Thomas Hawtin
<usenet@tackline.plus.com> wrote, quoted or indirectly quoted someone
who said :

>1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
>1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
>1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
>1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
>g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
>   (using commented out code: 5849.171000)

here are my results on Win2K.

java 1.6 -client                11016  11046  11032  11047
java jdk1.6.0\bin -server       12781  12766   6516   6500
java jdk1.6.0\jre\bin -server   12391  12453   6500   6500
Jet 4.1                          4656   4656   4656   4657

So Jet is faster than Hotspot by a  factor of 2.7 to start and by 1.4
after HotSpot warms up.

Here is the key to Jet's speed: it unravelled the inner loop to handle
an odd/even pair in one iteration.

L10:
       add     ebx, 16(eax, esi, 4) ; bypass 16 bytes of overhead
       add     ebx, 20(eax, esi, 4) ; indexing by 4-byte groups
       add     esi, 2
       cmp     esi, ecx
       jl      L10

The unravelling likely does more than cut your cmp/jmp overhead in
half. It gives the pipeline a little extra time to get the second
operand ready.
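
What the unrolled inner loop computes can be written out by hand in Java. This is a sketch of the transformation only, not Jet's actual output; `UnrolledSum` and its methods are hypothetical names. Because int addition wraps and is associative, both versions return the same total.

```java
// Hypothetical illustration of unrolling the inner loop by a factor of two.
final class UnrolledSum {
    private UnrolledSum() {}

    /** The loop as written in the benchmark: one element per iteration. */
    static int sumRolled(int[] data) {
        int total = 0;
        for (int c = 0; c < data.length; c++) {
            total += data[c];
        }
        return total;
    }

    /** Two elements per iteration, halving the number of compare-and-branch steps. */
    static int sumUnrolledBy2(int[] data) {
        int total = 0;
        int n = data.length;
        int c = 0;
        for (; c + 1 < n; c += 2) {
            total += data[c];
            total += data[c + 1];
        }
        if (c < n) {
            total += data[c];   // leftover element when the length is odd
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println(sumRolled(data) + " " + sumUnrolledBy2(data));
    }
}
```

A JIT may or may not apply this on its own, so hand-unrolling in source is not guaranteed to change the generated code.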
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Twisted - 23 Mar 2006 11:43 GMT
Is there an open source equivalent?

--
I am the terror that flaps in the net!
I am the tiny kitten that pees in your shoe!
I am TWISTED!
Roedy Green - 23 Mar 2006 21:12 GMT
>Is there an open source equivalent?

There are only two AOT compilers left standing. See
http://mindprod.com/jgloss/aot.html
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Thomas Hawtin - 15 Mar 2006 12:56 GMT
> This even applies to basic math calcs, without e.g. arrays (and thus
> bounds-checking) and objects (dynamic dispatch, null pointer checking)?

Bounds checking is exceptionally cheap. It's a register-register compare
and an untaken conditional. It can be hoisted out of inner loops, but
because it's such a cheap operation there isn't an enormous benefit.

Dynamic dispatch. A decent performing JVM will inline methods. It's not
as if virtual functions often cause problems in C++ anyway.

Null pointer checking is similarly simple. Mostly it's a case of letting
the memory management unit trap the page fault.

Probably the worst thing is object's memory layout. The inability to
keep one object within the memory allocated to another. Think Complex[].
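
The `Complex[]` remark can be made concrete with a sketch. In Java an array of objects holds references, so each element is a separate heap object, whereas a C array of structs stores the values inline. A common workaround is to keep the components in a primitive array. The names `ComplexLayout`, `Complex`, `sumRe` and `sumReInterleaved` are hypothetical, and the interleaved layout is one assumption among several possible ones.

```java
// Hypothetical illustration of the two layouts for an array of complex numbers.
final class ComplexLayout {
    private ComplexLayout() {}

    /** One heap object per value; a Complex[] stores references to these. */
    static final class Complex {
        final double re;
        final double im;

        Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }
    }

    /** Follows one reference per element before reading the field. */
    static double sumRe(Complex[] values) {
        double sum = 0.0;
        for (Complex value : values) {
            sum += value.re;
        }
        return sum;
    }

    /** Same data stored inline as re0, im0, re1, im1, ... in a single double[]. */
    static double sumReInterleaved(double[] reIm) {
        double sum = 0.0;
        for (int i = 0; i < reIm.length; i += 2) {
            sum += reIm[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        Complex[] objects = {new Complex(1.0, 2.0), new Complex(3.0, 4.0)};
        double[] interleaved = {1.0, 2.0, 3.0, 4.0};
        System.out.println(sumRe(objects) + " " + sumReInterleaved(interleaved));
    }
}
```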

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Roedy Green - 15 Mar 2006 20:59 GMT
On Wed, 15 Mar 2006 11:56:55 +0000, Thomas Hawtin
<usenet@tackline.plus.com> wrote, quoted or indirectly quoted someone
who said :

>Bounds checking is exceptionally cheap. It's a register-register compare
>and an untaken conditional. It can be hoisted out of inner loops, but
>because it's such a cheap operation there isn't a enormous benefit.

Since the element sizes of Java arrays are always powers of two, you can
get the address offset from the index by a simple shift.  Some hardware
architectures even give you that shift for free.  In languages where
you can have arrays of objects rather than arrays of references, you
have to do a full multiply. There it becomes really important to
convert the multiply to an add each time through the loop, which chews
up some of your precious registers.

As a side effect of this sort of hoisting you can eliminate the bounds
checks. They are built in to the loop termination check.
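
The shift-versus-multiply point is plain arithmetic and can be sketched as follows. `ElementOffset` and its methods are hypothetical names, and the 16-byte header in the usage example is only an assumption borrowed from the Jet listing elsewhere in this thread; real JVM object layouts vary.

```java
// Hypothetical illustration of computing an element's byte offset within an array.
final class ElementOffset {
    private ElementOffset() {}

    /** Power-of-two element size: the index is scaled with a shift. */
    static long offsetByShift(int index, int log2ElementSize, int headerBytes) {
        return headerBytes + ((long) index << log2ElementSize);
    }

    /** Arbitrary element size (e.g. an inline struct): a full multiply is needed. */
    static long offsetByMultiply(int index, int elementSize, int headerBytes) {
        return headerBytes + (long) index * elementSize;
    }

    public static void main(String[] args) {
        // 4-byte ints after an assumed 16-byte header: both forms agree.
        System.out.println(offsetByShift(3, 2, 16) + " " + offsetByMultiply(3, 4, 16));
        // A 12-byte element has no shift equivalent.
        System.out.println(offsetByMultiply(3, 12, 16));
    }
}
```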
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.


