Java Forum / General / March 2006
Performance Q: java hotspot vs. native code
Twisted - 14 Mar 2006 20:05 GMT In each of these two cases, would optimized C run substantially faster than Java (hotspot or other JIT VM)?
* A number-crunching algorithm with a tight loop and a large number of iterations (from thousands potentially up to millions, or more) using doubles.
* Ditto, but with the C code using arrays of uints and carries to effect a high precision fixed-point math, and the Java code using BigDecimals.
* Ditto, but with roll-your-own Java BigDecimal-alikes using arrays and math.
If there's an even higher performance option (short of compile-to-FPGA :)) for the high-precision cases, please let me know about that as well. (I know when it gets up into the 500+ digits it can be faster to use FFT for the multiplies -- O(n log n) vs O(n^2). I'll cross that one when I come to it.)
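For readers unfamiliar with the "arrays and carries" approach mentioned in the question, here is a minimal sketch in Java. It is not code from the thread: the class name LimbMath, the method addInPlace and the little-endian limb order are assumptions made for illustration. It shows only addition; multiplication (schoolbook or FFT-based) is a separate matter.

```java
// Hypothetical helper, not from the thread: fixed-width unsigned addition
// over little-endian 32-bit limbs (limb 0 is the least significant).
final class LimbMath {
    /** Adds b into a in place and returns the carry out of the top limb (0 or 1). */
    static int addInPlace(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("limb counts differ");
        }
        long carry = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask so each int is treated as an unsigned 32-bit value.
            long sum = (a[i] & 0xFFFFFFFFL) + (b[i] & 0xFFFFFFFFL) + carry;
            a[i] = (int) sum;    // low 32 bits stay in this limb
            carry = sum >>> 32;  // overflow bit moves on to the next limb
        }
        return (int) carry;
    }
}
```

Java has no unsigned int type, so the sketch widens each limb to long before adding. A C version would use unsigned int directly, which is one of the differences the question is asking about.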
tom fredriksen - 14 Mar 2006 22:38 GMT > In each of these two cases, would optimized C run substantially faster > than Java (hotspot or other JIT VM)? You would be hard pressed to find anything that runs faster than C. The reason for this is simple: C is a low-abstraction language, only slightly more abstracted than assembler. This means it produces code close to the platform with few runtime hindrances. Any interpreted language, by contrast, will be slowed by active runtime checks and interpreter operations. This particularly applies to languages with higher abstraction levels than C, of which there are quite a few these days, e.g. Perl, Java, Ruby, Lisp etc. Even natively compiled Java code would still be slower, as it still needs runtime checks and so forth.
Of course the algorithm is another of the most important factors, but if it's the same in both implementations then C wins for the aforementioned reasons. The only way any other language might win is if the language has an algorithmic enhancer which changes the code to an algorithm better than the one you have programmed, but that is not likely to happen. (The only case I can think of where this might be true is, for example, that Java's regexp engine is faster than a similar regexp engine used in C, but that comes down to algorithm again.)
/tom
Twisted - 15 Mar 2006 03:34 GMT This even applies to basic math calcs, without e.g. arrays (and thus bounds-checking) and objects (dynamic dispatch, null pointer checking)?
tom fredriksen - 15 Mar 2006 09:08 GMT > This even applies to basic math calcs, without e.g. arrays (and thus > bounds-checking) and objects (dynamic dispatch, null pointer checking)? I don't have a definitive answer for that, because it depends on a few issues.
The first question is whether the code is absolutely free of any support methods or mechanisms in the language that need, for example, runtime checks and controls.
The second is the runtime environment. If it's interpreted, then most likely, at least because of the interpreter operations. If it's compiled to native code, then it might happen.
If it's only basic calcs with native data types, then I suggest you make a prototype in both languages and compare them; just make sure the code is equivalent, otherwise some language feature might make a difference.
For the sake of it I will try to figure out a prototype test in both languages and give it a go. I will post it here; please do so as well, so we have two different operations to compare wrt speed.
/tom
tom fredriksen - 15 Mar 2006 23:16 GMT >> This even applies to basic math calcs, without e.g. arrays (and thus >> bounds-checking) and objects (dynamic dispatch, null pointer checking)? > > For the sake of it I will try to figure out a prototype test in both > language, and give it a go I will post it here, please do so aswell as > we could have two different operations to compare wrt speed. I made a test which performs a simplified in_cksum calculation on a 64KB packet in a loop. It is done in both C and Java (with -server, and with gcj, the gcc Java compiler).
The results were the following:
java -client   11.85  (java 1.5.0_04)
java -server   11.90
gcj            12.26  (gcc 3.3.2 on linux 2.6.3)
C integer      11.01
C float         7.23
So my advice would be to use C if you need absolute speed, but if you can accept a little reduction, then java might be ok if you are sticking with pure native datatypes and operations.
Since the in_cksum operation is entirely an integer operation the C float version is not really interesting (but I made a mistake to begin with so I thought it was a valid result)
The code is as follows:
(The C code has to be slightly changed, otherwise the array initialisation routine would have used float operations instead, leading to the whole program being float operations, with invalid results.)
/***** JAVA *****/
import java.util.Random;

public class Cksum {
    public static void main(String args[]) {
        Random rand = new Random();
        int total = 0;
        int count = 65500;
        int data[] = new int[count];

        for(int c=0; c<count; c++) {
            data[c] = rand.nextInt(2000000000);
        }
        long startTime = System.currentTimeMillis();
        for(int d=0; d<50000; d++) {
            for(int c=0; c<count; c++) {
                total += data[c];
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Elapsed time (ms): " + (endTime - startTime));
        System.out.println("Total: " + total);
    }
}
/***** C *****/
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char *argv[]) {
    unsigned int total = 0;
    int count = 65500;
    unsigned int data[count];
    unsigned int data2[count];

    for(int c=0; c<count; c++) {
        /* data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */
        data2[c] = data[c];
    }

    struct timeval start_time;
    struct timeval end_time;
    gettimeofday(&start_time, NULL);

    for(int d=0; d<50000; d++) {
        for(int c=0; c<count; c++) {
            total += data[c];
        }
    }
    gettimeofday(&end_time, NULL);

    double t1=(start_time.tv_sec*1000)+(start_time.tv_usec/1000.0);
    double t2=(end_time.tv_sec*1000)+(end_time.tv_usec/1000.0);

    printf("Elapsed time (ms): %.6lf\n", t2-t1);
    printf("Total: %u\n", total);

    for(int c=0; c<100; c++) {
        printf("data2: %u ", data2[c]);
    }
    printf("\n");
    return(0);
}
Roedy Green - 16 Mar 2006 02:06 GMT >java -client 11.85 (java 1.5.0_04) >java -server 11.90 >gcj 12.26 (gcc 3.3.2 on linux 2.6.3) >C integer 11.01 >C float 7.23 Have you posted the code for your benchmark? I would like to try it with Jet.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
tom fredriksen - 16 Mar 2006 09:43 GMT >> java -client 11.85 (java 1.5.0_04) >> java -server 11.90 [quoted text clipped - 4 lines] > have you posted the code for your benchmark. I would like to try it > with Jet. What do you mean? It's in the post at the end.
/tom
Roedy Green - 16 Mar 2006 19:15 GMT >/* data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */ I don't get it. What are you comparing? The algorithms are not even close, and you have the code commented out.
tom fredriksen - 16 Mar 2006 19:34 GMT >> /* data[c]=1.0+(unsigned int) (2000000000.0*rand()/(RAND_MAX+1.0)); */ > > I don't get it. What are you comparing? The algorithms are not even > close. and you have the code commented out. What do you mean? I am comparing the following.
C:
for(int d=0; d<50000; d++) { for(int c=0; c<count; c++) { total += data[c]; } }
Java:
for(int d=0; d<50000; d++) { for(int c=0; c<count; c++) { total += data[c]; } }
All the other stuff appears outside the loop, so it is irrelevant. In the C code I have to use a different initialisation method to be sure I get integer operations that are comparable, but it should not affect the measurement. That's the only difference. Just so you know, the algorithm is a simplified internet checksum as used in TCP/IP, which does not take overflow into account; it's just a simple test that performs some "credible" math operation.
/tom
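For contrast with the simplified loop above, here is a sketch of how the Internet checksum usually handles overflow, following the ones' complement scheme described in RFC 1071. This is not what the benchmark measures; the class name InternetChecksum and its method are assumptions made for illustration.

```java
// Hypothetical illustration, not the thread's benchmark code.
final class InternetChecksum {
    /** Ones' complement checksum over 16-bit big-endian words, as in RFC 1071. */
    static int checksum(byte[] data) {
        long sum = 0;
        int i = 0;
        // Sum the data as 16-bit big-endian words.
        for (; i + 1 < data.length; i += 2) {
            sum += ((data[i] & 0xFF) << 8) | (data[i + 1] & 0xFF);
        }
        // An odd trailing byte is padded on the right with a zero byte.
        if (i < data.length) {
            sum += (data[i] & 0xFF) << 8;
        }
        // Fold the carries above bit 15 back into the low 16 bits.
        while ((sum >>> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        // The checksum is the ones' complement of the folded sum.
        return (int) (~sum & 0xFFFF);
    }
}
```

The folding step is the part the benchmark leaves out: the benchmark simply lets a 32-bit total wrap around, which is fine for timing an add-heavy loop but is not a valid checksum.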
Roedy Green - 16 Mar 2006 19:59 GMT >public class Cksum I ran it on my machine:
with Jet Elapsed time (ms): 4641 Total: 476899872
with Java 1.6 client Elapsed time (ms): 11297 Total: -1699311728
I would change the benchmark to Random rand = new Random(149); to give repeatable results. Then that total would verify the algorithm worked.
That makes Jet the clear winner, far faster than C. That is because Jet does loop unravelling.
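The fixed-seed suggestion in this post can be sketched as follows. The class name RepeatableCksum and the method total are hypothetical; only the seed value 149 and the bound 2000000000 come from the thread. The point is that java.util.Random produces the same sequence for the same seed, so the printed total becomes a check that every implementation summed the same data.

```java
import java.util.Random;

// Hypothetical illustration of seeding the generator for repeatable totals.
final class RepeatableCksum {
    /** Draws count values from a seeded generator and sums them with int wraparound. */
    static int total(long seed, int count) {
        Random rand = new Random(seed);
        int total = 0;
        for (int c = 0; c < count; c++) {
            total += rand.nextInt(2000000000);
        }
        return total;
    }

    public static void main(String[] args) {
        // Both calls print the same value because the seed is fixed.
        System.out.println("Total: " + total(149L, 65500));
        System.out.println("Total: " + total(149L, 65500));
    }
}
```

Note that this only makes Java runs comparable with each other. A C version would need the same generator algorithm, or the same data loaded from a file, to produce a matching total.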
tom fredriksen - 17 Mar 2006 00:11 GMT >> public class Cksum > [quoted text clipped - 14 lines] > > That makes Jet the clear winner, far faster than C. I consider using Jet is cheating, if I had used other techniques to enhance the C code or had access to some C optimisers I am sure I could make it go faster as well. But I was trying to make an equal implementation/compilation comparison.
Some comments:
- Did you try it with Java 1.5, which is production grade compared to 1.6, and with the server option as well?
- Where are the numbers for the C implementation on your machine? Otherwise you need to tell us what kind of machine you are using.
/tom
Roedy Green - 17 Mar 2006 01:02 GMT >I consider using Jet is cheating, if I had used other techniques to >enhance the C code or had access to some C optimisers I am sure I could >make it go faster as well. But I was trying to make an equal >implementation/compilation comparison. It is not cheating if the result is the same and there are no conditions under which the code does not work. It is simply using a superior compiler. If you want to compare languages it is silly comparing them with less than the best compilers. You are then artificially skewing the result by which inept compilers you choose.
The best compilers are limited only by the theoretical constraints of the language. The not so hot ones have all sorts of limitations that have nothing whatever to do with the language.
tom fredriksen - 17 Mar 2006 01:08 GMT >> I consider using Jet is cheating, if I had used other techniques to >> enhance the C code or had access to some C optimisers I am sure I could [quoted text clipped - 10 lines] > the language. The not so hot ones have all sorts of limitation > nothing whatever to do with the language. But where are your numbers for the C version and what cpu are you running it on?
/tom
Roedy Green - 17 Mar 2006 03:38 GMT >But where are your numbers for the C version and what cpu are you >running it on? You don't need them. All you need is the ratio of Jet to Java -client.
tom fredriksen - 17 Mar 2006 10:44 GMT >> But where are your numbers for the C version and what cpu are you >> running it on? > > you don't need them. All you need in the ratio of Jet to Java > -client. Yes, if it had been me running it on my machine, but since it is running on a different machine with different software than I used to test it, it does matter. For example, Java 6 is not yet as fast as Java 5, it might contain other runtime enhancements than Java 5, and so on.
In any case you mentioned Jet is using loop unrolling, which is a technique that boils down to algorithm enhancement, so if the C code had been using loop unrolling too, the results would be different.
You have to compare an apple with an apple, not with a sugar-coated apple. My point is this: code enhancements can be categorised, and you cannot use enhancements from a different category because it significantly changes the comparison. Then it's not about comparing languages, it's about comparing different algorithms.
To perform proper comparison and measurements all test must run under same or similar conditions, you can not mix and switch as you desire and then make a claim.
/tom
Chris Uppal - 17 Mar 2006 11:28 GMT > To perform proper comparison and measurements all test must run under > same or similar conditions, you can not mix and switch as you desire and > then make a claim. This is true. But don't take it too far; if one implementation strategy makes certain types of automatic optimisation possible which are impossible to apply with another strategy, then the advantages of using those optimisations are legitimately part of the comparison. They don't turn it into an apples vs. oranges comparison.
E.g. if a JITing JVM can detect the availability of Intel SMP instructions, and dynamically choose to generate code which utilises them, then that is a legitimate advantage over another implementation (of Java, C, or anything else) which uses static compilation, and therefore does not generate comparable code.
Another example, also hypothetical. If the semantics of a language such as C are such that the compiler cannot perform automatic loop unrolling, whereas the semantics of another language are such that the compiler /can/ spot some opportunities unaided, then it's perfectly legitimate to compare the two implementations directly.
-- chris
tom fredriksen - 17 Mar 2006 12:51 GMT >> To perform proper comparison and measurements all test must run under >> same or similar conditions, you can not mix and switch as you desire and [quoted text clipped - 10 lines] > legitimate advantage over another implementation (of Java, C, or anything else) > which uses static compilation, and therefore does not generate comparable code. If the claim is "compile the fastest code you possibly get in C and Java" then yes you are right, but then you are discussing which language has come further along in their development of optimised code. Sort of like comparing a Ferrari to a Koenigsegg car.
It is another matter to do a test but limit what one language can use and not limit what the other language can use. One car must be a Ford Mondeo or similar, but the other car can be a Ferrari or similar if it wants. Then you are not comparing speeds of comparable items.
> Another example, also hypothetical. If the semantics of a language such as C > are such that the compiler cannot perform automatic loop unrolling, whereas the > semantics of another language are such that the compiler /can/ spot some > opportunities unaided, then it's perfectly legitimate to compare the two > implementations directly. Since loop unrolling and smp systems make the test enter an entirely different class of performance, you cannot use those techniques unless both tests are using them. Otherwise it's not a race, it's a slaughter :)
When setting out on a project to perform a test, a statement of what the test is to perform must be decided. After that is done, it must be made sure that the test is objective and comparative based on the premise of the test. I am not claiming that my test absolutely adheres to those two criteria, basically because it's an informal test, but I did try to make it relatively comparable. But there are of course too many variables in a proper test for me to undertake now, because you would have to classify all compiler and programming techniques etc. and decide which are applicable for the test to be objective, and so on.
/tom
Roedy Green - 17 Mar 2006 20:47 GMT >If the claim is "compile the fastest code you possibly get in C and >Java" then yes you are right, but then you are discussing which language [quoted text clipped - 5 lines] >Mondeo or similar, but the other car can be a Ferrari or similar if it >wants. Then you are not comparing speeds of comparable items. If you introduce handicaps, YOU are rigging the outcome. You are not really measuring anything objective. You are tricking people into accepting your test as an objective measure of merit.
What counts is which performs best in the real world. Your job is to make the test as reflective as possible of the real world, not to make decisions on which optimisation techniques count as valid, unless for some reason a technique could not actually be used in the real world.
That is why, for example, you make the tests add and print results so the optimiser can't discard code in the test, which it could not do in the real world. You do that by making the test more realistic, not by disqualifying an optimiser.
tom fredriksen - 17 Mar 2006 21:25 GMT > What counts is which performs best in the real world. Your job is to > make the test as reflective as possible of the real world, not to make > decisions on which optimisation techniques count as valid, unless for > some reason a technique could not actually be used in the real world. That would have been true if the point of the test was "get the best performance you can get of these two languages", but it was not; it was an informal comparison to chart the landscape.
> That is why, for example, you make the tests add and print results so > the optimiser can't discard code in the test, which it could not do in > the real world. That has nothing to do with rigging the test; it helps set up a comparable test, and you know it. Stick to the facts, not what suits your arguments.
> You do that by making the test more realistic, not by > disqualifying an optimiser. Of course it is entirely possible to implement another test which does exhibit such behaviour. Please do so then; I have accomplished what I wanted. If you want something else then feel free to do so or not.
/tom
Scott Ellsworth - 20 Mar 2006 21:20 GMT > > What counts is which performs best in the real world. Your job is to > > make the test as reflective as possible of the real world, not to make [quoted text clipped - 4 lines] > performance you can get of these two languages", but it was not it was > an informal comparison to chart the landscape. Right, and part of the landscape is the available tool set.
BEA's JRockit has not been ported to the Mac, so my interest in it is minimal. GCC is on my platform, so my interest in it, especially with a reasonable optimization set, is high.
Similarly, someone on a platform where Jet works is going to be interested in it, while on an unsupported platform, it does them little good. It is not part of the landscape that they want charted.
So, whether _you_ find JRockit, Jet, or GCC with certain optimizations on useful for your purposes, it is still a valid comparison for some potential users. It lets them chart their landscape.
Scott
 Signature Scott Ellsworth scott@alodar.nospam.com Java and database consulting for the life sciences
Roedy Green - 17 Mar 2006 20:35 GMT >To perform proper comparison and measurements all test must run under >same or similar conditions, you can not mix and switch as you desire and >then make a claim. The result I was talking about was a factor of 4 faster. No fine detail is going to change that.
You remind me of a skinny kid named Ritchie Dowrey at whose house we used to play football. He owned the football. On every play he had a "new rule" that always favoured his team. Nobody knew enough to challenge him.
The theme was echoed in the movie Can Hieronymus Merkin Ever Forget Mercy Humppe and Find True Happiness?
You are making your rules up on the fly to generate your desired result. You are behaving like a religious fanatic distorting the evidence to produce a predecided conclusion.
Look at this from a practical point of view. You don't really care HOW a compiler gets its speed, all you care about is does it do the calculations faster. Therefore I dismiss your talk of the compiler "cheating".
tom fredriksen - 17 Mar 2006 21:18 GMT > The result I was talking about was a factor of 4 faster. No fine > detail is going to change that. Now you are spreading FUD, microsoft style.
> You are making your rules up on the fly to generate your desired > result. You are behaving like a religious fanatic distorting the > evidence to produce a predecided conclusion. Enough with the personal characterisations! It makes you look like a fanatic desperately trying to convince everybody you are right.
> Look at this from a practical point of view. You don't really care > HOW a compiler gets its speed, all you care about is does it do the > calculations faster. Therefore I dismiss your talk of the compiler > "cheating". That's your prerogative. It still does not give you a statistically sound or objective result, because you are controlling the results. I am not saying the measurements I am doing are perfect, just that they are fairer than yours.
But if you are convinced you are right, you can prove it by doing the following.
- Implement loop unrolling and use a C optimiser, then run the tests again, and post the details of the code and optimiser used.
- Post the measurement numbers of both C tests.
If you cannot do that, you cannot prove fairness. You have nothing to lose because the Jet version is, according to you, superior anyway.
/tom
Thomas Hawtin - 17 Mar 2006 20:58 GMT > java -client 11.85 (java 1.5.0_04) > java -server 11.90 > gcj 12.26 (gcc 3.3.2 on linux 2.6.3) > C integer 11.01 > C float 7.23 I rewrote the Java version of the microbenchmark to be more realistic and conventional. My results
1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
 (using commented out code: 5849.171000)
So what can we conclude? After a start up penalty, Sun's current Server HotSpot is much faster than C++. No. Microbenchmarks can help your understanding of how a particular compiler behaves. They are useless at determining the goodness of performance across languages.
Tom Hawtin
class Checksum {
    private static int core(int[] data) {
        int count = data.length;
        int total = 0;

        for (int d=0; d<50000; d++) {
            for (int c=0; c<count; c++) {
                total += data[c];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        java.util.Random rand = new java.util.Random();
        int count = 65500;
        int[] data = new int[count];

        for (int c=0; c<count; c++) {
            data[c] = rand.nextInt(2000000000);
        }
        for (int run=0; run<4; ++run) {
            long startTime = System.currentTimeMillis();
            int total = core(data);
            long endTime = System.currentTimeMillis();

            System.out.println("Elapsed time (ms): " + (endTime - startTime));
            System.out.println("Total: " + total);
        }
    }
}
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
Chris Uppal - 19 Mar 2006 15:43 GMT > So what can we conclude? After a start up penalty, Sun's current Server > HotSpot is much faster the C++. No. Microbenchmarks can help your > understanding of how a particular compiler behaves. They are useless at > determining the goodness of performance across languages. I got interested enough to reproduce Thomas's tests with a number of C++ compilers.
gcc running with -O3, and no other optimisation settings (life's too short even to read the man page!).
MS VC6, in "Release" mode, plus telling it to optimise for speed only, and to generate code targetting the "Pentium Pro" (the most modern target available).
MS VS 2003 in default "Release" mode. Note that this includes array overrun checking by default (presumably Tom considers this necessary for an apples to apples comparison -- although I don't).
MS VS 2003 in "Release" mode, plus telling it to generate code for a Pentium 4, and turning on all the other relevant-looking optimisations.
Java -client and -server. In both cases JDK 1.5.0
Results are:
gcc -O3        5458   5177   5278   5187
vc6 +opt       7020   6850   6759   6850
vs2003         3555   3385   3465   3385
vs2003 +opt    3635   3385   3385   3465
java -client  13770  13610  13699  13620
java -server  11456   3485   3365   3385
In all cases running on a 1.5 GHz Celeron box. I haven't attempted to explore what would happen running the same code on different chips (especially AMD).
What can we conclude ? Well, provided we remember that this is only one very, very, specific test, and that other apparently similar tests might give very different results, I think it's obvious...
-- chris
Twisted - 21 Mar 2006 21:14 GMT At this point, it's looking like java -server is comparable to C++ with reasonably up-to-date stuff and integer math in a tight loop.
What about floating point math (say, a few adds and a couple mults) in a similar loop? How does it perform on different chips? Say (and someone in this group probably has access to each of these)
-- Latest 32-bit Intel offering
-- AMD Athlon same clock speed
-- Athlon 64, same speed again
-- dual core? (double the data length if you can make it use both cores; if you can't, report that fact.)
And what exactly is Jet? I know, I know, google it, but somehow I doubt page after page of aeronautical Web sites will be enlightening in this instance.
-- I am the terror that flaps in the net! I am the elusive window handle leak two hours before it's due to ship! I am TWISTED!
Roedy Green - 21 Mar 2006 22:51 GMT >And what exactly is Jet? I know, I know, google it, but somehow I doubt >page after page of aeronautical Web sites will be enlightening in this >instance. see http://mindprod.com/jgloss/jet.html and http://mindprod.com/jgloss/aot.html
Twisted - 22 Mar 2006 10:00 GMT Ugh. They want you to pay money? Even for noncommercial/freeware development, open source, personal use, etc.???
Forget it. Especially as Sun's HotSpot with -server seems to perform same as native C, and will be far more portable.
-- I am the terror that flaps in the net! I am the broken build that dies without a stack trace! I am TWISTED!
Roedy Green - 23 Mar 2006 01:09 GMT On Fri, 17 Mar 2006 19:58:54 +0000, Thomas Hawtin <usenet@tackline.plus.com> wrote, quoted or indirectly quoted someone who said :
>1.5.0_06-b05, Client: 12216, 12182, 12174, 12186
>1.5.0_06-b05, Server: 11079, 4210, 4207, 4231
>1.6.0-beta2-b76, Client: 10203, 10191, 10208, 10247
>1.6.0-beta2-b76, Server: 12675, 12668, 5484, 5491
>g++ (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8), -O3: 6647.844000
> (using commented out code: 5849.171000)
Here are my results on Win2K.
java 1.6 -client                11016  11046  11032  11047
java jdk1.6.0\bin] -server      12781  12766   6516   6500
Java jdk1.6.0\jre\bin -server   12391  12453   6500   6500
Jet 4.1                          4656   4656   4656   4657
So Jet is faster than Hotspot by a factor of 2.7 to start and by 1.4 after HotSpot warms up.
Here is the key to Jet's speed: it unravelled the inner loop to handle an odd/even pair in one iteration.
L10: add ebx, 16(eax, esi, 4) ; bypass 16 bytes of overhead
     add ebx, 20(eax, esi, 4) ; indexing by 4-byte groups
     add esi,2
     cmp esi,ecx
     jl L10
The unraveling likely does more than cut your cmp/jmp overhead in half. It gives the pipeline a little extra time to get the second operand ready.
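To make the transformation concrete, here is a source-level sketch in Java of what unrolling the inner loop by two looks like. It is a hand-written illustration, not Jet's output: the class name UnrolledSum and both method names are hypothetical, and the leftover handling for odd lengths is an addition that the assembly above does not show (the benchmark's count of 65500 is even).

```java
// Hypothetical source-level illustration of unrolling by two.
final class UnrolledSum {
    /** Straightforward loop, one element per iteration. */
    static int sumSimple(int[] data) {
        int total = 0;
        for (int c = 0; c < data.length; c++) {
            total += data[c];
        }
        return total;
    }

    /** Hand-unrolled by two: one compare-and-branch per pair of elements. */
    static int sumUnrolledByTwo(int[] data) {
        int total = 0;
        int c = 0;
        int limit = data.length - 1;
        for (; c < limit; c += 2) {
            total += data[c];
            total += data[c + 1];
        }
        if (c < data.length) {
            total += data[c]; // leftover element when the length is odd
        }
        return total;
    }
}
```

Both methods return the same total for any input, since int addition wraps identically in either order. Whether writing the unrolled form by hand helps or hurts on a given JVM is something only measurement can show, because a JIT may already unroll the simple loop on its own.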
Twisted - 23 Mar 2006 11:43 GMT Is there an open source equivalent?
-- I am the terror that flaps in the net! I am the tiny kitten that pees in your shoe! I am TWISTED!
Roedy Green - 23 Mar 2006 21:12 GMT >Is there an open source equivalent? There are only two AOT compilers left standing. See http://mindprod.com/jgloss/aot.html
Thomas Hawtin - 15 Mar 2006 12:56 GMT > This even applies to basic math calcs, without e.g. arrays (and thus > bounds-checking) and objects (dynamic dispatch, null pointer checking)? Bounds checking is exceptionally cheap. It's a register-register compare and an untaken conditional. It can be hoisted out of inner loops, but because it's such a cheap operation there isn't an enormous benefit.
Dynamic dispatch. A decent performing JVM will inline methods. It's not as if virtual functions often cause problems in C++ anyway.
Null pointer checking is similarly simple. Mostly it's a case of letting the memory management unit trap the page fault.
Probably the worst thing is object's memory layout. The inability to keep one object within the memory allocated to another. Think Complex[].
Tom Hawtin
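The "Think Complex[]" remark can be illustrated with a small Java sketch. All names here (ComplexLayout, Complex, pack, sumRe, sumRePacked) are hypothetical. A Complex[] in Java holds references to separately allocated objects, whereas a C array of structs stores the values contiguously; a common workaround is to pack the fields into one primitive array.

```java
// Hypothetical illustration of the memory layout point; not code from the thread.
final class ComplexLayout {
    /** One heap object per value; a Complex[] stores references to these. */
    static final class Complex {
        final double re;
        final double im;
        Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }
    }

    /** Sums the real parts by following one reference per element. */
    static double sumRe(Complex[] values) {
        double total = 0.0;
        for (Complex z : values) {
            total += z.re;
        }
        return total;
    }

    /** Packs the values into one contiguous array: [re0, im0, re1, im1, ...]. */
    static double[] pack(Complex[] values) {
        double[] packed = new double[values.length * 2];
        for (int i = 0; i < values.length; i++) {
            packed[2 * i] = values[i].re;
            packed[2 * i + 1] = values[i].im;
        }
        return packed;
    }

    /** Sums the real parts from the packed form, with no references to follow. */
    static double sumRePacked(double[] packed) {
        double total = 0.0;
        for (int i = 0; i < packed.length; i += 2) {
            total += packed[i];
        }
        return total;
    }
}
```

The packed form trades readability for locality: the loop touches one contiguous block of memory instead of one object per element. How much that matters in practice depends on the JVM, the allocator and the cache behaviour of the workload.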
Roedy Green - 15 Mar 2006 20:59 GMT On Wed, 15 Mar 2006 11:56:55 +0000, Thomas Hawtin <usenet@tackline.plus.com> wrote, quoted or indirectly quoted someone who said :
>Bounds checking is exceptionally cheap. It's a register-register compare >and an untaken conditional. It can be hoisted out of inner loops, but >because it's such a cheap operation there isn't a enormous benefit. Since the sizes of Java's array elements are always powers of two, you can get the address offset from the index by a simple shift. Some hardware architectures even give you that shift for free. In languages where you can have arrays of objects rather than arrays of references, you have to do a full multiply. There it becomes really important to convert the multiply to an add each time through the loop, which chews up some of your precious registers.
As a side effect of this sort of hoisting you can eliminate the bounds checks. They are built in to the loop termination check.