
Java Forum / Virtual Machine / March 2006


Optimal x86-32 Sun Hotspot code generation?

Adam Warner - 24 Mar 2006 02:14 GMT
Hi all,

I'm trying to create the fastest way to cast a Java long to an int while
preserving array bounds checking. This is the approach I suspect should be
optimal:

   public final static int toIntIndex(long index) {
       int high=(int) (index>>>32);
       if (high!=0) throw new ArrayIndexOutOfBoundsException();
       return (int) index;
   }

If the long index is non-negative and in int range, high will be 0. If the
long index is negative, high will be non-zero. If the long index is between
2^31 and 2^32-1 inclusive, it will pass this test, but the cast to int yields
a negative value that is still caught by Java's own array bounds checking.

I believe the unsigned right shift by 32 should permit the test to be
conducted upon the 32 most significant bits of the 64-bit value, that is
no shift should actually be performed on 32-bit platforms.
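
A minimal sketch of the cases described above, for anyone who wants to try them. toIntIndex is the method as posted; LongIndexDemo, get and describe are hypothetical names added only for illustration, and nothing here says anything about the code HotSpot generates.

```java
// Sketch only: LongIndexDemo, get and describe are illustrative names, not part of any library.
final class LongIndexDemo {
    // As posted: reject any index whose upper 32 bits are non-zero.
    static int toIntIndex(long index) {
        int high = (int) (index >>> 32);
        if (high != 0) throw new ArrayIndexOutOfBoundsException();
        return (int) index;
    }

    // Index an int[] with a long; the array access itself checks the low 32 bits.
    static int get(int[] array, long index) {
        return array[toIntIndex(index)];
    }

    // Reports whether an access with the given long index succeeds.
    static String describe(int[] array, long index) {
        try {
            get(array, index);
            return "ok";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        int[] data = new int[8];
        System.out.println(describe(data, 3L));       // in range
        System.out.println(describe(data, 8L));       // high word 0, rejected by the array access
        System.out.println(describe(data, -1L));      // high word is all ones, rejected by toIntIndex
        System.out.println(describe(data, 1L << 31)); // high word 0, but the cast gives a negative int
        System.out.println(describe(data, 1L << 32)); // high word is 1, rejected by toIntIndex
    }
}
```

Only the first access should succeed; the others are rejected either by the high-word test or by the ordinary int bounds check.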

I've induced the Sun Mustang b75 HotSpot debug server JIT to compile
toIntIndex and this is the generated assembly:

{method}
- klass: {other class}
- method holder:     'LongIndex'
- constants:         0x0693c688{constant pool}
- access:            0x81000019  public static final
- name:              'toIntIndex'
- signature:         '(J)I'
- max stack:         3
- max locals:        3
- size of params:    2
- method size:       22
- vtable index:      -2
- code size:         21
- code start:        0xb0e45210
- code end (excl):   0xb0e45225
- method data:       0xb0e47a58
- checked ex length: 0
- linenumber start:  0xb0e45225
- localvar length:   0
#
#  int ( long, half )
#
#r063 ESP+20: parm 0: long
#r062 ESP+16: parm 0: long
# -- Old ESP -- Framesize: 16 --
#r061 ESP+12: return address
#r060 ESP+ 8: pad2, in_preserve
#r059 ESP+ 4: pad2, in_preserve
#r058 ESP+ 0: pad2, in_preserve
#
abababab   N1: #        B1 <- B3 B2  Freq: 6.66667
abababab
000   B1: #     B3 B2 <- BLOCK HEAD IS JUNK   Freq: 6.66667
000     # stack bang
       PUSHL  EBP
       SUB    ESP,8    # Create frame
00e     MOV    ECX,[ESP + #16]
       MOV    EBX,[ESP + #20]
016     MOV    ECX.lo,ECX.hi
       SHR    ECX.lo,#32-32
       XOR    ECX.hi,ECX.hi
01a     MOV    ECX,ECX.lo
01a     TEST   ECX,ECX
01c     Jne,s  B3  P=0.000000 C=4.466667
01c
01e   B2: #     N1 <- B1  Freq: 4.46666
01e     MOV    ECX,[ESP + #16]
       MOV    EBX,[ESP + #20]
026     MOV    EAX,ECX.lo
028     ADD    ESP,8    # Destroy frame
       POPL   EBP
       TEST   PollPage,EAX     ! Poll Safepoint

032     RET
032
033   B3: #     N1 <- B1  Freq: 1e-06
033     MOV    ECX,#-67
038     NOP    # Pad for loops and calls
039     NOP    # Pad for loops and calls
03a     NOP    # Pad for loops and calls
03b     CALL,static  wrapper for: uncommon_trap
       # LongIndex::toIntIndex @ bci:10  L0=_ L1=_ L2=_
       #
040     INT3   ; ShouldNotReachHere
040

This of course is the non-inlined version of toIntIndex. I don't
understand some of the disassembly syntax (.hi, .lo?) but it at least
appears clear that a redundant "SHR ECX.lo,#32-32" instruction is being
generated. I'd appreciate confirmation my reasoning is correct/this is an
actual inefficiency before filing any report with Sun.

Regards,
Adam

Brendan - 24 Mar 2006 11:22 GMT
Hi,

Does this thing have an optimizer that you forgot to turn on?

The stack frame is a waste of time, they've inserted padding in code
that should never run, the branch prediction is wrong (forward branches
are assumed to be taken), the register usage and chosen instructions
are a joke, etc.

   <some alignment here if you like>
convertSignedLongToUnsignedInt:
   cmp dword [esp+8],0      ; high 32 bits of the long argument
   je .withinBounds         ; high word zero: index fits in 32 bits
   MOV    ECX,#-67
   CALL,static  wrapper for: uncommon_trap
   INT3   ; ShouldNotReachHere

   <some alignment here if you like>
.withinBounds:
   mov eax,[esp+4]
   TEST   PollPage,EAX     ! Poll Safepoint  ;Don't know what this is meant to do! :-)
   ret

Cheers,

Brendan

Adam Warner - 25 Mar 2006 00:21 GMT
> Hi,
>
[quoted text clipped - 4 lines]
> are assumed to be taken), the register usage and chosen instructions are
> a joke, etc.

I now realise it's a Catch 22. The undocumented option
-XX:+PrintOptoAssembly "is not final ASM code but it's very close":
<http://www.javalobby.org/java/forums/m91938827.html>

But this undocumented option is only available in the fastdebug builds. I
remember reading somewhere that Sun does not have legal permission to
distribute the disassembler with their release products. Thus one can only
disassemble code generated by these builds:
<http://blogs.sun.com/roller/page/kto?entry=mustang_jdk_6_0_fastdebug>

"So using a fastdebug build might provide some information you wouldn't
get from running a product build. It is slower, but no where near as slow
as a debug build. The optimization isn't as high as with the product
build, but since the assert checking and debug code exists in these
builds, the code isn't the same anyway."

This explains the redundant stack frame and likely invalidates any
inference one can make about the quality of release build assembly code.
I apologise for not appreciating this earlier.

Regards,
Adam

Chris Uppal - 24 Mar 2006 11:58 GMT
> I believe the unsigned right shift by 32 should permit the test to be
> conducted upon the 32 most significant bits of the 64-bit value, that is
> no shift should actually be performed on 32-bit platforms.

I'm somewhat puzzled by this sentence.  I may well be misunderstanding you but
it sounds as if you assume that an int is 64-bit on a 64-bit platform or
possibly that a long is 32-bit on a 32-bit platform.  That's not the case: ints
are 32-bit, and longs 64-bit, on every platform.

   -- chris

Adam Warner - 24 Mar 2006 23:56 GMT
>> I believe the unsigned right shift by 32 should permit the test to be
>> conducted upon the 32 most significant bits of the 64-bit value, that
[quoted text clipped - 4 lines]
> platform or possibly that a long is 32-bit on a 32-bit platform.  That's
> not the case: ints are 32-bit, and longs 64-bit, on every platform.

I had a mental model of the long being transferred in two 32-bit registers
on a 32-bit platform. Let's call the registers H and L and write the long
as HL. In a higher level language to obtain the 32 most significant bits
of the long HL one could unsigned shift the long right by 32 and perhaps
cast the result to 32 bits. But at the lower level I was hoping the
compiler would say "let's just return the value of H".

Whether the value of H could be returned without shifting on a 64-bit
platform could depend upon whether the architecture permits 64-bit
registers to be accessed as independent 32-bit registers (which is why I
made that qualification).
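
The H and L model can be written down at the Java level as a small sketch. LongHalves, high, low and join are hypothetical names used only for illustration; the point is that the shift-and-cast expression is, by definition, the upper half of the long, whatever instructions a given JIT emits for it.

```java
// Sketch only: LongHalves and its methods are illustrative names.
final class LongHalves {
    // H: the 32 most significant bits, obtained by unsigned shift and narrowing cast.
    static int high(long value) {
        return (int) (value >>> 32);
    }

    // L: the 32 least significant bits, obtained by a narrowing cast alone.
    static int low(long value) {
        return (int) value;
    }

    // Reassemble HL; the mask stops sign extension of L from overwriting H.
    static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long value = 0x0000000100000002L;
        System.out.println(Integer.toHexString(high(value)));
        System.out.println(Integer.toHexString(low(value)));
        System.out.println(join(high(value), low(value)) == value);
    }
}
```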

Regards,
Adam

Roedy Green - 25 Mar 2006 01:17 GMT
>I had a mental model of the long being transferred in two 32-bit registers
>on a 32-bit platform. Let's call the registers H and L and write the long
>as HL. In a higher level language to obtain the 32 most significant bits
>of the long HL one could unsigned shift the long right by 32 and perhaps
>cast the result to 32 bits. But at the lower level I was hoping the
>compiler would say "let's just return the value of H".

Yes, at least Jet does just that.

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Adam Warner - 25 Mar 2006 04:23 GMT
>>I had a mental model of the long being transferred in two 32-bit registers
>>on a 32-bit platform. Let's call the registers H and L and write the long
[quoted text clipped - 4 lines]
>
> Yes, at least Jet does just that.

Thanks Roedy, that's great to know! It looks like I will be able to build
relatively efficient long index bounds checking upon the JVM. By only
checking the H bits are zero the L check remains with the JVM (it's not
duplicated).
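
As a rough sketch of that idea, the class below compares a high-word-only check against an explicit full range check. LongIndexedInts and its methods are hypothetical names for illustration, and this is only one way such a wrapper might look; it makes no claim about the relative speed of the two accessors.

```java
// Sketch only: LongIndexedInts is an illustrative wrapper, not an existing API.
final class LongIndexedInts {
    private final int[] elements;

    LongIndexedInts(int length) {
        elements = new int[length];
    }

    // Tests only the high word; the array access supplies the check on the low word.
    int get(long index) {
        if ((int) (index >>> 32) != 0) throw new ArrayIndexOutOfBoundsException();
        return elements[(int) index];
    }

    // Explicit full range check for comparison; the array access then checks again.
    int getWithFullCheck(long index) {
        if (index < 0 || index >= elements.length) throw new ArrayIndexOutOfBoundsException();
        return elements[(int) index];
    }

    // True when both accessors make the same accept/reject decision for index.
    boolean sameDecision(long index) {
        return accepts(index, false) == accepts(index, true);
    }

    private boolean accepts(long index, boolean fullCheck) {
        try {
            if (fullCheck) getWithFullCheck(index); else get(index);
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LongIndexedInts ints = new LongIndexedInts(8);
        long[] samples = {0L, 7L, 8L, -1L, 1L << 31, (1L << 32) - 1, 1L << 32};
        for (long sample : samples) {
            System.out.println(sample + " same decision: " + ints.sameDecision(sample));
        }
    }
}
```

Both accessors accept and reject the same indices; the difference is only in how much of the check is written by hand.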

Regards,
Adam

Grumble - 24 Mar 2006 17:17 GMT
> I'm trying to create the fastest way to cast a Java long to an int while
> preserving array bounds checking. This is the approach I suspect should be
[quoted text clipped - 14 lines]
> conducted upon the 32 most significant bits of the 64-bit value, that is
> no shift should actually be performed on 32-bit platforms.

For what it's worth, out of curiosity, I wrote a similar function in C.

#include <stdint.h>
void abort(void);
int32_t foo(int64_t index)
{
 int32_t high = (uint64_t)index >> 32;
 if (high != 0) abort();
 return index;
}

for which gcc-3.4.4 -O2 generates the following code.

_foo:
    pushl    %ebp
    movl    %esp, %ebp
    subl    $8, %esp
/*
What for? Stack alignment?
Why won't it go away with -mpreferred-stack-boundary=4 ??
*/
    movl    12(%ebp), %edx
    movl    8(%ebp), %eax
    testl    %edx, %edx
    jne    L4
    leave
    ret
L4:
    call    _abort

and gcc-3.4.4 -Os -fomit-frame-pointer generates the following code.

_foo:
    cmpl    $0, 8(%esp)
    movl    4(%esp), %eax
    je    L2
    call    _abort
L2:
    ret

(I'd switch je to jne and exchange call _abort and ret.)

Skarmander - 24 Mar 2006 19:04 GMT
>> I'm trying to create the fastest way to cast a Java long to an int while
>> preserving array bounds checking. This is the approach I suspect should be
[quoted text clipped - 34 lines]
> /*
> What for? Stack alignment?

Yes. In particular, the Pentiums and in particular SSE do not like data
that's not royally aligned.

> Why won't it go away with -mpreferred-stack-boundary=4 ??

Because -mpreferred-stack-boundary is the base 2 logarithm of the number of
bytes to align to, not the actual number of bytes. In this case, you've
asked for a stack alignment of 16 bytes, which is the default. Try
-mpreferred-stack-boundary=2.
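
The arithmetic, as a tiny sketch (StackBoundary and bytesFor are hypothetical names used only for illustration; the code just evaluates 2 to the power of the option value):

```java
// Sketch only: StackBoundary.bytesFor is an illustrative helper, not part of gcc or the JDK.
final class StackBoundary {
    // The option value n requests alignment to 2^n bytes.
    static int bytesFor(int preferredStackBoundary) {
        return 1 << preferredStackBoundary;
    }

    public static void main(String[] args) {
        System.out.println("=4 -> " + bytesFor(4) + " bytes");
        System.out.println("=2 -> " + bytesFor(2) + " bytes");
    }
}
```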

S.

Eric Albert - 25 Mar 2006 11:03 GMT
> >> I'm trying to create the fastest way to cast a Java long to an int while
> >> preserving array bounds checking. This is the approach I suspect should be
[quoted text clipped - 44 lines]
> asked for a stack alignment of 16 bytes, which is the default. Try
> -mpreferred-stack-boundary=2.

As far as I know, Mac OS X is the only widely used x86 operating system
to use 16-byte stack alignment by default for 32-bit.  Everyone else
uses 4-byte alignment.  For 64-bit, though, the AMD64 ABI requires
16-byte stack alignment.

-Eric


Eric Albert         ejalbert@cs.stanford.edu
http://outofcheese.org/

Skarmander - 25 Mar 2006 20:58 GMT
<snip>
>>> _foo:
>>>     pushl    %ebp
[quoted text clipped - 14 lines]
> to use 16-byte stack alignment by default for 32-bit.  Everyone else
> uses 4-byte alignment.

Well, it's true that, say, Windows doesn't *need* 16-byte alignment, but
recent gccs use 16-byte alignment by default for x86-32. This does often
raise eyebrows, but there seems to be some truth to the defense that those
extra bytes are a small price to pay for avoiding the risk of performance
loss when the alignment is necessary (for SSE and friends). The Pentium 3
and 4 allegedly like 16-byte alignment better as well, even without SSE
(I've never tested any of this, mind you).

S.

Eric Albert - 26 Mar 2006 09:24 GMT
> <snip>
> >>> _foo:
[quoted text clipped - 24 lines]
> and 4 allegedly like 16-byte alignment better as well, even without SSE
> (I've never tested any of this, mind you).

Ah; you're completely right about gcc.  I'd missed that it used
-mpreferred-stack-boundary=4 by default when not using -Os.  The
difference in Apple's gcc is that -mpreferred-stack-boundary=4 is also
set for -Os, since the system's ABI requires it.

-Eric


Eric Albert         ejalbert@cs.stanford.edu
http://outofcheese.org/




©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.