Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / July 2006

Tip: Looking for answers? Try searching our database.

Arrays vs Buffers

Thread view: 
daniel.w.gelder@gmail.com - 10 Jul 2006 05:29 GMT
As an old-tymey C and C++ programmer I find Java very nice for
implementing algorithms in, with its garbage collection and array
bounds checking. However, as said old-tymey programmer, I end up
wanting to challenge time and space by building a super optimized
version of said algorithm.

I find Java not very good with this, as the array bounds checking seems
to be a tremendous bottleneck when you're trying to write close to the
metal. The requirement that arrays and objects initialize to clear
memory is no slowdown since I can just use the factory model and cache
the memory for such things.

I looked into Buffers, and they seem to be designed for the purpose of
making reads and writes more deterministic. Do any JVMs actually use
the properties of buffers to let the outputted JIT code run directly?
Or would that be asking too much.

HOW do I get close to the metal in Java? Or at least pretend to?

Thanks.
Dan
Stefan Ram - 10 Jul 2006 05:43 GMT
Daniel W. Gelder writes:
>[How] do I get close to the metal in Java?

     »There have been numerous improvements made to the HotSpot
     Server VM's byte-code to native-code compiler. Some of the
     new optimizations include:

       · Array bounds check elimination«

http://java.sun.com/j2se/1.4.2/performance.guide.html

 See

http://google.to/search?q=%22array+bounds+check+elimination%22

 Some Java-implementations (like maybe gcj) also might
 offer switches to turn off checks.
Robert Klemme - 10 Jul 2006 06:41 GMT
> As an old-tymey C and C++ programmer I find Java very nice for
> implementing algorithms in, with its garbage collection and array
> bounds checking. However, as said old-tymey programmer, I end up
> wanting to challenge time and space by building a super optimized
> version of said algorithm.

Are you trying to optimize to see how fast Java can go or do you have
actual performance problems?

> I find Java not very good with this, as the array bounds checking seems
> to be a tremendous bottleneck when you're trying to write close to the
> metal.

As Stefan pointed out already, JVM's nowadays do some optimizations in
this area.  Since you write "seems" I assume at the moment you did not
yet have evidence that backs this.

> The requirement that arrays and objects initialize to clear
> memory is no slowdown since I can just use the factory model and cache
> the memory for such things.

Actually you cannot cache memory in Java but you can cache objects.
However, you might be surprised that this *may* actually have adversary
effects depending on your algorithm.  Java's memory handling and GC is
quite advanced and optimized for dealing with a large percentage of
short lived objects.  Unfortunately I don't have my Java bookmarks
around but with a bit of research you'll find those articles at Sun's
and IBM's web site.

> I looked into Buffers, and they seem to be designed for the purpose of
> making reads and writes more deterministic. Do any JVMs actually use
> the properties of buffers to let the outputted JIT code run directly?
> Or would that be asking too much.

I'm sorry, I'm not sure I understand you completely here.  Can you
rephrase?  To try an answer: Buffer is part of NIO which is intended for
optimizing IO speed.  I can't see a direct relation to the JVM running
JIT compiled code here.  This code doesn't even need to reside on disk
to be run.  Also, we're talking about to layers here, Java and the JVM
implementation.  Although of course certain features of the JVM are used
to implement Java features I'm sure there's rarely a direct connection,
and especially won't the JVM use Java classes for it's work - rather the
other way round.

> HOW do I get close to the metal in Java? Or at least pretend to?

JNI is always an option if you really need the speed.

Kind regards

    robert
Chris Uppal - 10 Jul 2006 12:36 GMT
> As an old-tymey C and C++ programmer I find Java very nice for
> implementing algorithms in, with its garbage collection and array
> bounds checking. However, as said old-tymey programmer, I end up
> wanting to challenge time and space by building a super optimized
> version of said algorithm.

And presumably the single biggest thing you consider is cache (and perhaps
pipeline) behaviour ?

> I find Java not very good with this, as the array bounds checking seems
> to be a tremendous bottleneck when you're trying to write close to the
> metal.

Leaving aside the possibility, mentioned by other, that the bounds checking is
often optimised out, have you considered how much overhead it would add even if
all the checks were left in ?   Taking into account the fact that the bounds
will be in L1 cache at worst, that the index will be in register, and that you
are running on a machine with pipelined execution and with speculative
execution (I'm assuming a modern P-class chip); the bounds check may actually
be free.

> The requirement that arrays and objects initialize to clear
> memory is no slowdown since I can just use the factory model and cache
> the memory for such things.

It may be worth caching large arrays if your code doesn't require the
initialise-to-zero semantics.  I wouldn't bother with pooling other objects for
two reasons -- one is that it interferes with GC and other optimisations, so it
may cause your code to slow down.  The other is that I suspect that eliminating
redundant field initialisation code is one of the optimisations the JIT
performs.

   -- chris
daniel.w.gelder@gmail.com - 13 Jul 2006 07:24 GMT
> > The requirement that arrays and objects initialize to clear
> > memory is no slowdown since I can just use the factory model and cache
[quoted text clipped - 6 lines]
> redundant field initialisation code is one of the optimisations the JIT
> performs.

I'm doing graphics programming, not "business logic". Right now I'm
deep in an algorithm to compare and operate on multiple lists of
multiple polygons with multiple holes going to arbitrary depths. Should
I use Linked Lists? Or arrays? Or both? Why? Where? Actually, the
simplest way is to put the data in a two-dimensional array that has its
own linked-list logic. Stuff like this was what C was designed for.

Given this two-dimensional array, how do I avoid the overhead of having
the JITter check its bounds every time? Well, I'm obviously going to
have to create a one-dimensional array and use the old
[row*width+column] access trick, since if I let Java do the [][] thing,
it's going to be checking for a null pointer in the row field every
time, don't want that.

Next problem: the jitter may be able to analyze a function like this:

void function(int[] array, int n)
{
 for (int i = 0; i < n; i++)
  {
     something(array[i]);
  }
}

...and see that the only one real check needed, specifically whether
"int[] array" is null and of length >= n, can be moved outside the
loop. Goodie. But what about this?

void function(int[] array, int n)
{
 for (int i = 0; i < n; i++)
  {
     something(array, i);
  }
}

The check can be moved outside, but does the something() call have to
do the same check if *it* accesses array[i]?

Is there an obscure "contract" anywhere designed to enumerate the
conditions where the array needs to be rechecked?

As for JNI, J-N-O. I'd rather just run j2c and never even look back.

Dan
Chris Uppal - 13 Jul 2006 09:48 GMT
> I'm doing graphics programming, not "business logic".

I don't think anyone replying in this thread had jumped to any conclusions at
all about what kind of programming you were doing.

> Right now I'm
> deep in an algorithm to compare and operate on multiple lists of
> multiple polygons with multiple holes going to arbitrary depths. Should
> I use Linked Lists? Or arrays? Or both? Why? Where? Actually, the
> simplest way is to put the data in a two-dimensional array that has its
> own linked-list logic. Stuff like this was what C was designed for.

Fair enough.  And probably a good idea if the dataset is large or the access is
performance critical.

> Given this two-dimensional array, how do I avoid the overhead of having
> the JITter check its bounds every time?

I doubt whether that overhead is significant in relation of the overhead of an
extra level of indirection on (potentially) every array access.

> Well, I'm obviously going to
> have to create a one-dimensional array and use the old
> [row*width+column] access trick, since if I let Java do the [][] thing,
> it's going to be checking for a null pointer in the row field every
> time, don't want that.

I doubt whether that overhead is significant in relation of the overhead of an
extra level of indirection on (potentially) every array access.

I.e. (and on the continuing assumption that this is performance-critical code)
I think you are doing the right thing, but for the wrong reasons.

> Next problem: the jitter may be able to analyze a function [...]
> But what about this?
[quoted text clipped - 6 lines]
>    }
> }

> The check can be moved outside, but does the something() call have to
> do the same check if *it* accesses array[i]?

No idea.  Probably.  Unless the call to something() gets inlined.

I repeat: have you any evidence to suggest that these checks matter /in
practise/ ?  Maybe it does, I don't know.   I would expect (admittedly with no
more evidence than you have yet adduced ;-) that the effect would be trivial at
worst.

> Is there an obscure "contract" anywhere designed to enumerate the
> conditions where the array needs to be rechecked?

Nope.  Or rather, yes -- every array access has to be bounds checked (but, of
course, implementations are allowed to cheat if they can get away with it
undetectably).

   -- chris
daniel.w.gelder@gmail.com - 14 Jul 2006 03:10 GMT
> I repeat: have you any evidence to suggest that these checks matter /in
> practise/ ?  Maybe it does, I don't know.   I would expect (admittedly with no
> more evidence than you have yet adduced ;-) that the effect would be trivial at
> worst.

I went ahead and tried that...unfortunately I can't get gcvt() to work
right in stdlib...maybe someone knows a better way to get the time out.

Thx

********* OK, here's the Java file: ***********

public class TestClass {
    int nextPointer(int[] array, int point)
        {
            return array[point];
        }
    public TestClass()
        {
        int[] array = new int[3000];
        for (int i = 0; i < 3000; i++)
            {
                array[i] = (int)(Math.random() * 3000d) % 3000;
            }

        for (int times = 0; times < 10; times++)    // interp vs. comp
            {
                long start = System.currentTimeMillis();
                int pointer = 0;
                for (int i = 0; i < 10000000; i++)
                    {
                        pointer = nextPointer(array, pointer);
                    }
                System.out.println(System.currentTimeMillis() - start);
            }
        }
}

********** And here's a C++ file: **********

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int nextPointer(int* array, int point);
int nextPointer(int* array, int point)
    {
        return array[point];
    }

int main(void)
{
    printf("Hello World, this is CodeWarrior!\n");

    int* array = new int[3000];
    for (int i = 0; i < 3000; i++)
        {
            array[i] = (int)(rand()) % 3000;
        }

    for (int times = 0; times < 10; times++)    // interp vs. comp
        {
            clock_t start = clock();

            int pointer = 0;
            for (int i = 0; i < 10000000; i++)
                {
                    pointer = nextPointer(array, pointer);
                }

            printf(gcvt(difftime(clock(), start) * (double)1000, 5, new
char[100]));
            printf("\n");
        }
       
    return 0;
}
Chris Uppal - 14 Jul 2006 14:51 GMT
> I went ahead and tried that...unfortunately I can't get gcvt() to work
> right in stdlib...maybe someone knows a better way to get the time out.

I changed your code around a bit, increased the nuimber of loops in the inner
loop, and re-arranged the code to allow the JIT to work without dependng on
on-stack replacement.  I also added a case where I explicitly coded a
bound-check in addition to whatever the compiler provided for me (I won't post
code here for reasons which will become apparent).

Comparing the following compilers:
   gcc is the MinGW version of gcc 3.4.2 running at -O3 (no
       other optimisation-related options).
   msvc is the C++ compiler from MS Visual Studio running
       with all time-releted optimisations turned on, bounds checking
       turned off, and optimising for a P4 processor.
   java is the "server" JVM from JDK 1.5.0_6

I got times (averaged over the last 10 of 20 runs -- which allows the JIT to
get its act together):
   gcc:               278.4    138.2
   msvc:             210.3    0.0
   java:               332.6    209.2

The first column is the time for the loop with explicit bounds checks, the
second column is the time without explict bounds checks.

Some things to note.

One is that I was surprised that only MSVC managed to optimise out the useless
loop -- normally java -server is inconveniently good at optimising away
benchmark code.  However, it is clear that we need a better benchmark.

Another is that the (hypothetical) two consecutive sets of bounds checks in the
java version don't get conflated into one check, nor made to vanish by the
processors internal caching and pipelining.

There's another important point -- in this loop, we are not really doing
anything /with/ the data, and the next array access is dependent on the value
retreived in the previous step.  This will not allow the processors pipeline to
work correctly, and is also extremely unrepresentative.

So I added a couple of lines of code to sum each value as it was pulled out of
the array, and then return the overall (meaningless) sum to the test harness
where it was used in a conditional.  That not only keeps the optimiser honest,
but also provides something for the processors arithmetic units to do while it
is waiting for the next element of the array to be pulled out of (probably) L1
cache.  Fortunately, the test array is small enough that we don't need to worry
about the random access into the array providing a seriously false load to the
L1 and L2 caches, or the benchmark would be completely useless.

I will append the C++ and Java code to the end of this message.

With the changes, the times I saw were:
   gcc:            272.4    207.2
   msvc:          342.6    205.2
   java:            380.6    211.3

(Again, that's the average of the last 10 runs -- I'll append the raw data to
this message too)

Notice:

There is very little difference between the runtimes for the three compilers
when there are no explicit bounds checks.

Java is still not managing to fold the explicit and implicit bounds checks
together, and neither is the processor managing to do so.  I find this very
surprising.  Maybe the JIT is emitting code to tell the CPU to expect the
explicit tests to fail, and thus buggering up the speculative execution that it
would otherwise be able to do.  The MS compiler seems as if it might be doing
something similar, but GNU isn't.  But that is pure (and not very willl
informed, to be honest) speculation, so who knows...

Java without explicit bounds checks is about as fast in this test as in the
previous one, thus supporting my feeling that the previous version wasn't
making representative use of the CPUs pipelining and out-of-order execution.

So, broadly speaking, I think the effect of Java's bounds checking can be
characterised as somewhere between trivial and zero.  Of course, any such
assertion must also note that this is only one micro-benchmark.  I have
attempted to make it at least minimally realistic, but for real purposes you
would need to measure real code working on real data, with real access
patterns.

   -- chris

============== TestClass.java =============--
public class TestClass
{
int
nextPointer(int[] array, int point)
{
 return array[point];
}

int
timeitWithoutCheck(int[] array)
{
 int pointer = 0;
 int sum = 0;
 for (int i = 0; i < 100000000; i++)
 {
  pointer = nextPointer(array, pointer);
  sum += pointer;
 }
 return sum; // meaningless
}

int
timeitWithCheck(int[] array, int size)
{
 int pointer = 0;
 int sum = 0;
 for (int i = 0; i < 100000000; i++)
 {
  if (pointer < 0) return 0;
  if (pointer >= size) return 0;
  pointer = nextPointer(array, pointer);
  sum += pointer;
 }
 return sum; // meaningless
}

public static void
main(String[] args)
{
 TestClass test = new TestClass();

 int[] array = new int[3000];
 for (int i = 0; i < 3000; i++)
  array[i] = (int)(Math.random() * 3000d) % 3000;

 for (int times = 0; times < 20; times++) // interp vs. comp
 {
  long start;

  start = System.currentTimeMillis();
  int s1 = test.timeitWithCheck(array, 3000);
  long with = System.currentTimeMillis() - start;

  start = System.currentTimeMillis();
  int s2 = test.timeitWithoutCheck(array);
  long without = System.currentTimeMillis() - start;

  System.out.println(with + "\t" + without);
  if (s1 != s2)
   System.out.println("more fun fooling the optimiser");
 }
}
}
============ TestClass.cpp ==================
#include <stdio.h>
#include <stdlib.h>
#include <sys/timeb.h>

int
nextPointer(int *array, int point)
{
return array[point];
}

int
timeitWithoutCheck(int *array)
{
int pointer = 0;
int sum = 0;
for (int i = 0; i < 100000000; i++)
{
 pointer = nextPointer(array, pointer);
 sum += pointer;
}
return sum; // meaningless
}

int
timeitWithCheck(int *array, int size)
{
int pointer = 0;
int sum = 0;
for (int i = 0; i < 100000000; i++)
{
 if (pointer < 0) return 0;
 if (pointer >= size) return 0;
 pointer = nextPointer(array, pointer);
 sum += pointer;
}
return sum; // meaningless
}

long
currentTimeMillis()
{
struct timeb times;

ftime(&times);
return times.time * 1000 + times.millitm;
}

int
main(int argc, char **argv)
{
int* array = new int[3000];
for (int i = 0; i < 3000; i++)
array[i] = (int)(rand()) % 3000;

for (int times = 0; times < 20; times++) // interp vs. comp
{
 long start;

 start = currentTimeMillis();
 int s1 = timeitWithCheck(array, 3000);
 long with = currentTimeMillis() - start;

 start = currentTimeMillis();
 int s2 = timeitWithoutCheck(array);
 long without = currentTimeMillis() - start;

 printf("%ld\t%ld\n", with, without);
 if (s1 != s2)
  printf("more fun fooling the optimiser\n");
}
}
================ Raw data =====================
GCC -O3:
450     211
270     261
310     200
281     200
280     201
280     200
271     210
270     211
280     200
281     200
271     210
270     210
270     210
271     210
270     201
280     200
281     200
270     211
270     210
271     210

MS VS2003 + all time opts
521     210
340     201
340     210
341     200
341     200
341     200
440     201
350     201
340     210
341     210
341     200
350     201
350     200
341     210
341     200
341     210
340     211
340     200
341     210
341     210

JAVA -server:
471     220
371     210
371     210
381     200
370     211
370     211
380     200
381     200
381     200
381     210
421     250
381     200
380     211
370     210
381     200
381     200
381     210
371     210
370     211
370     211
======================================
daniel.w.gelder@gmail.com - 15 Jul 2006 21:14 GMT
I'm pretty impressed...

But why is every Java chat room so slow? Bad programming? Ah well. :_)

Thanks so much for helping me out.

Dan


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.