Java Forum / General / July 2006
Arrays vs Buffers
daniel.w.gelder@gmail.com - 10 Jul 2006 05:29 GMT As an old-tymey C and C++ programmer I find Java very nice for implementing algorithms in, with its garbage collection and array bounds checking. However, as said old-tymey programmer, I end up wanting to challenge time and space by building a super optimized version of said algorithm.
I find Java not very good with this, as the array bounds checking seems to be a tremendous bottleneck when you're trying to write close to the metal. The requirement that arrays and objects initialize to clear memory is no slowdown since I can just use the factory model and cache the memory for such things.
I looked into Buffers, and they seem to be designed for the purpose of making reads and writes more deterministic. Do any JVMs actually use the properties of buffers to let the outputted JIT code run directly? Or would that be asking too much.
HOW do I get close to the metal in Java? Or at least pretend to?
Thanks. Dan
Stefan Ram - 10 Jul 2006 05:43 GMT Daniel W. Gelder writes:
>[How] do I get close to the metal in Java? »There have been numerous improvements made to the HotSpot Server VM's byte-code to native-code compiler. Some of the new optimizations include:
· Array bounds check elimination«
http://java.sun.com/j2se/1.4.2/performance.guide.html
See
http://google.to/search?q=%22array+bounds+check+elimination%22
Some Java-implementations (like maybe gcj) also might offer switches to turn off checks.
Robert Klemme - 10 Jul 2006 06:41 GMT > As an old-tymey C and C++ programmer I find Java very nice for > implementing algorithms in, with its garbage collection and array > bounds checking. However, as said old-tymey programmer, I end up > wanting to challenge time and space by building a super optimized > version of said algorithm. Are you trying to optimize to see how fast Java can go or do you have actual performance problems?
> I find Java not very good with this, as the array bounds checking seems > to be a tremendous bottleneck when you're trying to write close to the > metal. As Stefan pointed out already, JVM's nowadays do some optimizations in this area. Since you write "seems" I assume at the moment you did not yet have evidence that backs this.
> The requirement that arrays and objects initialize to clear > memory is no slowdown since I can just use the factory model and cache > the memory for such things. Actually you cannot cache memory in Java but you can cache objects. However, you might be surprised that this *may* actually have adversary effects depending on your algorithm. Java's memory handling and GC is quite advanced and optimized for dealing with a large percentage of short lived objects. Unfortunately I don't have my Java bookmarks around but with a bit of research you'll find those articles at Sun's and IBM's web site.
> I looked into Buffers, and they seem to be designed for the purpose of > making reads and writes more deterministic. Do any JVMs actually use > the properties of buffers to let the outputted JIT code run directly? > Or would that be asking too much. I'm sorry, I'm not sure I understand you completely here. Can you rephrase? To try an answer: Buffer is part of NIO which is intended for optimizing IO speed. I can't see a direct relation to the JVM running JIT compiled code here. This code doesn't even need to reside on disk to be run. Also, we're talking about to layers here, Java and the JVM implementation. Although of course certain features of the JVM are used to implement Java features I'm sure there's rarely a direct connection, and especially won't the JVM use Java classes for it's work - rather the other way round.
> HOW do I get close to the metal in Java? Or at least pretend to? JNI is always an option if you really need the speed.
Kind regards
robert
Chris Uppal - 10 Jul 2006 12:36 GMT > As an old-tymey C and C++ programmer I find Java very nice for > implementing algorithms in, with its garbage collection and array > bounds checking. However, as said old-tymey programmer, I end up > wanting to challenge time and space by building a super optimized > version of said algorithm. And presumably the single biggest thing you consider is cache (and perhaps pipeline) behaviour ?
> I find Java not very good with this, as the array bounds checking seems > to be a tremendous bottleneck when you're trying to write close to the > metal. Leaving aside the possibility, mentioned by other, that the bounds checking is often optimised out, have you considered how much overhead it would add even if all the checks were left in ? Taking into account the fact that the bounds will be in L1 cache at worst, that the index will be in register, and that you are running on a machine with pipelined execution and with speculative execution (I'm assuming a modern P-class chip); the bounds check may actually be free.
> The requirement that arrays and objects initialize to clear > memory is no slowdown since I can just use the factory model and cache > the memory for such things. It may be worth caching large arrays if your code doesn't require the initialise-to-zero semantics. I wouldn't bother with pooling other objects for two reasons -- one is that it interferes with GC and other optimisations, so it may cause your code to slow down. The other is that I suspect that eliminating redundant field initialisation code is one of the optimisations the JIT performs.
-- chris
daniel.w.gelder@gmail.com - 13 Jul 2006 07:24 GMT > > The requirement that arrays and objects initialize to clear > > memory is no slowdown since I can just use the factory model and cache [quoted text clipped - 6 lines] > redundant field initialisation code is one of the optimisations the JIT > performs. I'm doing graphics programming, not "business logic". Right now I'm deep in an algorithm to compare and operate on multiple lists of multiple polygons with multiple holes going to arbitrary depths. Should I use Linked Lists? Or arrays? Or both? Why? Where? Actually, the simplest way is to put the data in a two-dimensional array that has its own linked-list logic. Stuff like this was what C was designed for.
Given this two-dimensional array, how do I avoid the overhead of having the JITter check its bounds every time? Well, I'm obviously going to have to create a one-dimensional array and use the old [row*width+column] access trick, since if I let Java do the [][] thing, it's going to be checking for a null pointer in the row field every time, don't want that.
Next problem: the jitter may be able to analyze a function like this:
void function(int[] array, int n) { for (int i = 0; i < n; i++) { something(array[i]); } }
...and see that the only one real check needed, specifically whether "int[] array" is null and of length >= n, can be moved outside the loop. Goodie. But what about this?
void function(int[] array, int n) { for (int i = 0; i < n; i++) { something(array, i); } }
The check can be moved outside, but does the something() call have to do the same check if *it* accesses array[i]?
Is there an obscure "contract" anywhere designed to enumerate the conditions where the array needs to be rechecked?
As for JNI, J-N-O. I'd rather just run j2c and never even look back.
Dan
Chris Uppal - 13 Jul 2006 09:48 GMT > I'm doing graphics programming, not "business logic". I don't think anyone replying in this thread had jumped to any conclusions at all about what kind of programming you were doing.
> Right now I'm > deep in an algorithm to compare and operate on multiple lists of > multiple polygons with multiple holes going to arbitrary depths. Should > I use Linked Lists? Or arrays? Or both? Why? Where? Actually, the > simplest way is to put the data in a two-dimensional array that has its > own linked-list logic. Stuff like this was what C was designed for. Fair enough. And probably a good idea if the dataset is large or the access is performance critical.
> Given this two-dimensional array, how do I avoid the overhead of having > the JITter check its bounds every time? I doubt whether that overhead is significant in relation of the overhead of an extra level of indirection on (potentially) every array access.
> Well, I'm obviously going to > have to create a one-dimensional array and use the old > [row*width+column] access trick, since if I let Java do the [][] thing, > it's going to be checking for a null pointer in the row field every > time, don't want that. I doubt whether that overhead is significant in relation of the overhead of an extra level of indirection on (potentially) every array access.
I.e. (and on the continuing assumption that this is performance-critical code) I think you are doing the right thing, but for the wrong reasons.
> Next problem: the jitter may be able to analyze a function [...] > But what about this? [quoted text clipped - 6 lines] > } > }
> The check can be moved outside, but does the something() call have to > do the same check if *it* accesses array[i]? No idea. Probably. Unless the call to something() gets inlined.
I repeat: have you any evidence to suggest that these checks matter /in practise/ ? Maybe it does, I don't know. I would expect (admittedly with no more evidence than you have yet adduced ;-) that the effect would be trivial at worst.
> Is there an obscure "contract" anywhere designed to enumerate the > conditions where the array needs to be rechecked? Nope. Or rather, yes -- every array access has to be bounds checked (but, of course, implementations are allowed to cheat if they can get away with it undetectably).
-- chris
daniel.w.gelder@gmail.com - 14 Jul 2006 03:10 GMT > I repeat: have you any evidence to suggest that these checks matter /in > practise/ ? Maybe it does, I don't know. I would expect (admittedly with no > more evidence than you have yet adduced ;-) that the effect would be trivial at > worst. I went ahead and tried that...unfortunately I can't get gcvt() to work right in stdlib...maybe someone knows a better way to get the time out.
Thx
********* OK, here's the Java file: ***********
public class TestClass { int nextPointer(int[] array, int point) { return array[point]; } public TestClass() { int[] array = new int[3000]; for (int i = 0; i < 3000; i++) { array[i] = (int)(Math.random() * 3000d) % 3000; }
for (int times = 0; times < 10; times++) // interp vs. comp { long start = System.currentTimeMillis(); int pointer = 0; for (int i = 0; i < 10000000; i++) { pointer = nextPointer(array, pointer); } System.out.println(System.currentTimeMillis() - start); } } }
********** And here's a C++ file: **********
#include <stdio.h> #include <stdlib.h> #include <time.h>
int nextPointer(int* array, int point); int nextPointer(int* array, int point) { return array[point]; }
int main(void) { printf("Hello World, this is CodeWarrior!\n");
int* array = new int[3000]; for (int i = 0; i < 3000; i++) { array[i] = (int)(rand()) % 3000; }
for (int times = 0; times < 10; times++) // interp vs. comp { clock_t start = clock();
int pointer = 0; for (int i = 0; i < 10000000; i++) { pointer = nextPointer(array, pointer); }
printf(gcvt(difftime(clock(), start) * (double)1000, 5, new char[100])); printf("\n"); } return 0; }
Chris Uppal - 14 Jul 2006 14:51 GMT > I went ahead and tried that...unfortunately I can't get gcvt() to work > right in stdlib...maybe someone knows a better way to get the time out. I changed your code around a bit, increased the nuimber of loops in the inner loop, and re-arranged the code to allow the JIT to work without dependng on on-stack replacement. I also added a case where I explicitly coded a bound-check in addition to whatever the compiler provided for me (I won't post code here for reasons which will become apparent).
Comparing the following compilers: gcc is the MinGW version of gcc 3.4.2 running at -O3 (no other optimisation-related options). msvc is the C++ compiler from MS Visual Studio running with all time-releted optimisations turned on, bounds checking turned off, and optimising for a P4 processor. java is the "server" JVM from JDK 1.5.0_6
I got times (averaged over the last 10 of 20 runs -- which allows the JIT to get its act together): gcc: 278.4 138.2 msvc: 210.3 0.0 java: 332.6 209.2
The first column is the time for the loop with explicit bounds checks, the second column is the time without explict bounds checks.
Some things to note.
One is that I was surprised that only MSVC managed to optimise out the useless loop -- normally java -server is inconveniently good at optimising away benchmark code. However, it is clear that we need a better benchmark.
Another is that the (hypothetical) two consecutive sets of bounds checks in the java version don't get conflated into one check, nor made to vanish by the processors internal caching and pipelining.
There's another important point -- in this loop, we are not really doing anything /with/ the data, and the next array access is dependent on the value retreived in the previous step. This will not allow the processors pipeline to work correctly, and is also extremely unrepresentative.
So I added a couple of lines of code to sum each value as it was pulled out of the array, and then return the overall (meaningless) sum to the test harness where it was used in a conditional. That not only keeps the optimiser honest, but also provides something for the processors arithmetic units to do while it is waiting for the next element of the array to be pulled out of (probably) L1 cache. Fortunately, the test array is small enough that we don't need to worry about the random access into the array providing a seriously false load to the L1 and L2 caches, or the benchmark would be completely useless.
I will append the C++ and Java code to the end of this message.
With the changes, the times I saw were: gcc: 272.4 207.2 msvc: 342.6 205.2 java: 380.6 211.3
(Again, that's the average of the last 10 runs -- I'll append the raw data to this message too)
Notice:
There is very little difference between the runtimes for the three compilers when there are no explicit bounds checks.
Java is still not managing to fold the explicit and implicit bounds checks together, and neither is the processor managing to do so. I find this very surprising. Maybe the JIT is emitting code to tell the CPU to expect the explicit tests to fail, and thus buggering up the speculative execution that it would otherwise be able to do. The MS compiler seems as if it might be doing something similar, but GNU isn't. But that is pure (and not very willl informed, to be honest) speculation, so who knows...
Java without explicit bounds checks is about as fast in this test as in the previous one, thus supporting my feeling that the previous version wasn't making representative use of the CPUs pipelining and out-of-order execution.
So, broadly speaking, I think the effect of Java's bounds checking can be characterised as somewhere between trivial and zero. Of course, any such assertion must also note that this is only one micro-benchmark. I have attempted to make it at least minimally realistic, but for real purposes you would need to measure real code working on real data, with real access patterns.
-- chris
============== TestClass.java =============-- public class TestClass { int nextPointer(int[] array, int point) { return array[point]; }
int timeitWithoutCheck(int[] array) { int pointer = 0; int sum = 0; for (int i = 0; i < 100000000; i++) { pointer = nextPointer(array, pointer); sum += pointer; } return sum; // meaningless }
int timeitWithCheck(int[] array, int size) { int pointer = 0; int sum = 0; for (int i = 0; i < 100000000; i++) { if (pointer < 0) return 0; if (pointer >= size) return 0; pointer = nextPointer(array, pointer); sum += pointer; } return sum; // meaningless }
public static void main(String[] args) { TestClass test = new TestClass();
int[] array = new int[3000]; for (int i = 0; i < 3000; i++) array[i] = (int)(Math.random() * 3000d) % 3000;
for (int times = 0; times < 20; times++) // interp vs. comp { long start;
start = System.currentTimeMillis(); int s1 = test.timeitWithCheck(array, 3000); long with = System.currentTimeMillis() - start;
start = System.currentTimeMillis(); int s2 = test.timeitWithoutCheck(array); long without = System.currentTimeMillis() - start;
System.out.println(with + "\t" + without); if (s1 != s2) System.out.println("more fun fooling the optimiser"); } } } ============ TestClass.cpp ================== #include <stdio.h> #include <stdlib.h> #include <sys/timeb.h>
int nextPointer(int *array, int point) { return array[point]; }
int timeitWithoutCheck(int *array) { int pointer = 0; int sum = 0; for (int i = 0; i < 100000000; i++) { pointer = nextPointer(array, pointer); sum += pointer; } return sum; // meaningless }
int timeitWithCheck(int *array, int size) { int pointer = 0; int sum = 0; for (int i = 0; i < 100000000; i++) { if (pointer < 0) return 0; if (pointer >= size) return 0; pointer = nextPointer(array, pointer); sum += pointer; } return sum; // meaningless }
long currentTimeMillis() { struct timeb times;
ftime(×); return times.time * 1000 + times.millitm; }
int main(int argc, char **argv) { int* array = new int[3000]; for (int i = 0; i < 3000; i++) array[i] = (int)(rand()) % 3000;
for (int times = 0; times < 20; times++) // interp vs. comp { long start;
start = currentTimeMillis(); int s1 = timeitWithCheck(array, 3000); long with = currentTimeMillis() - start;
start = currentTimeMillis(); int s2 = timeitWithoutCheck(array); long without = currentTimeMillis() - start;
printf("%ld\t%ld\n", with, without); if (s1 != s2) printf("more fun fooling the optimiser\n"); } } ================ Raw data ===================== GCC -O3: 450 211 270 261 310 200 281 200 280 201 280 200 271 210 270 211 280 200 281 200 271 210 270 210 270 210 271 210 270 201 280 200 281 200 270 211 270 210 271 210
MS VS2003 + all time opts 521 210 340 201 340 210 341 200 341 200 341 200 440 201 350 201 340 210 341 210 341 200 350 201 350 200 341 210 341 200 341 210 340 211 340 200 341 210 341 210
JAVA -server: 471 220 371 210 371 210 381 200 370 211 370 211 380 200 381 200 381 200 381 210 421 250 381 200 380 211 370 210 381 200 381 200 381 210 371 210 370 211 370 211 ======================================
daniel.w.gelder@gmail.com - 15 Jul 2006 21:14 GMT I'm pretty impressed...
But why is every Java chat room so slow? Bad programming? Ah well. :_)
Thanks so much for helping me out.
Dan
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|