Java Forum / General / November 2005
JNI question: where is the 'jstring' defined?
NOBODY - 28 Oct 2005 01:44 GMT
Hi,
I'm trying to read the char[] of a String from JNI code. I got the jchar* from the GetStringChars(... &isCopy) for now, and of course it works.
But the isCopy is JNI_TRUE. So performance cost is knocking at the door...
I don't intend to violate the jstring object received, but I would love to read its actual jchar* (with the offset and size attributes of String, of course). I just can't find the stupid .h or .c file in the JVM source code.
Any idea where to look? I'm trying to write a simple string transform (it creates an encoded/decoded string).
Thanks.
Gordon Beaton - 28 Oct 2005 07:47 GMT
> I don't intend to violate the jstring object received, But I would love to
> read its actual jchar* (with the offset and size attribute of String of
> course). I just can't find the stupid .h or .c file in the jvm source code.

jstring is a pointer to an opaque datatype, and is likely just one of several aliases for jobject. If you really want to look inside it, you need to find its "real" definition from the source code of your specific JVM. If it's even available at all it will be a separate download from the JDK itself.
/gordon
 Signature [ do not email me copies of your followups ] g o r d o n + n e w s @ b a l d e r 1 3 . s e
Roedy Green - 28 Oct 2005 09:21 GMT
>Any idea where to look for?
>I'm trying to write a simple string transform (creates an encoded/decoded
>string).

When you run javah it will generate a file something like this:
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class com_mindprod_pcclock_PCClock */

#ifndef _Included_com_mindprod_pcclock_PCClock
#define _Included_com_mindprod_pcclock_PCClock
#ifdef __cplusplus
extern "C" {
#endif
/* Inaccessible static: UTC */
/*
 * Class:     com_mindprod_pcclock_PCClock
 * Method:    nativeSetClock
 * Signature: (IIIIIII)I
 */
JNIEXPORT jint JNICALL Java_com_mindprod_pcclock_PCClock_nativeSetClock
  (JNIEnv *, jobject, jint, jint, jint, jint, jint, jint, jint);

#ifdef __cplusplus
}
#endif
#endif
I think you can begin your searches in jni.h
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Gordon Beaton - 28 Oct 2005 09:30 GMT
> I think you can begin your searches in jni.h

Unfortunately for him, the relevant datatypes are completely opaque, i.e. they are not publicly defined in the header files that come with Sun's JDK, and I suspect the same is true of other common JDKs.
/gordon
Roedy Green - 28 Oct 2005 10:29 GMT
>Unfortunately for him, the relevant datatypes are completely opaque,
>i.e. they are not publicly defined in the header files that come with
>Suns JDK, and I suspect the same is true of other common JDKs.

That suggests Sun is telling you the structures are free to change at any time without warning. If you trace them and crack the structure, your code may not work on any other release.
Chris Uppal - 28 Oct 2005 11:45 GMT
> I'm trying to read the char[] of a String from JNI code.
> I got the jchar* from the GetStringChars(... &isCopy) for now, and of
> course it works.
>
> But the isCopy is JNI_TRUE. So performance cost is knocking at the door...

If you aren't already bothered by the cost of calling a JNI method from Java (or a Java method from JNI) then I doubt if the cost of the copy is going to bother you much. I'd guess that the string would have to be several thousand characters long before the cost of the copy was greater than the overhead of a JNI call.

> I don't intend to violate the jstring object received, But I would love to
> read its actual jchar* (with the offset and size attribute of String of
> course). I just can't find the stupid .h or .c file in the jvm source
> code.

That information is not available (as Gordon has already said).
If you /really want/ to violate encapsulation, write fragile implementation-dependent code, etc, etc, then you could access the String's internal char[] value, int size, and int offset variables directly from JNI. But then you'd have to use GetCharArrayElements() and there's no reason to suppose that would be any quicker....
-- chris
NOBODY - 29 Oct 2005 14:39 GMT
>> I'm trying to read the char[] of a String from JNI code.
>> I got the jchar* from the GetStringChars(... &isCopy) for now, and of
[quoted text clipped - 8 lines]
> have to be several thousand characters long before the cost of the
> copy was greater than the overhead of a JNI call.

I made many tests before asking. The use cases are short vs long strings (5 vs ~100 chars) and strings that need escaping vs strings that do not (reserved chars are quotes, CR, LF, TAB, BS, \, \z: you can guess that looks like SQL escaping.... But don't get me sidetracked on prepared statements! That is another game!)
The encoder scans the string first to find reserved chars (and counts the extra space it would require). If there are none, it returns the original jstring. Super fast. Otherwise, it mallocs a new jchar* buffer of the extended size and loops again to copy & escape the input chars.
Here are the results. JNI call overhead is below 10% with ~100 char strings (and most of our strings are long, usually above 100 chars).

Note: the Java version uses StringBuffer.append(char) on every char, tested in a switch/case to find whether it is reserved, so the Java version never returns the original string. This is because I would have needed about 8 calls to String.indexOf(reservedchar) just to figure out whether it needs escaping. Pulling the char[] out of the String (to iterate the same way I do in C) is not an option (its cost is as prohibitive as that of GetStringChars).
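The Java variant described in the note was not posted, so the following is only a guess at its shape: a single pass that appends every char to a StringBuffer and prefixes reserved chars with a backslash. The class name, the exact reserved set, and the backslash-prefix output form are assumptions made for illustration.

```java
final class SinglePassEscaper {
    // Hypothetical reconstruction of the single-pass approach; not the poster's code.
    static String escape(String input) {
        StringBuffer out = new StringBuffer(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            switch (ch) {
                case '\'': case '"': case '\\':
                case '\n': case '\r': case '\t':
                case 0x08:   // BS
                case 0x1A:   // Control-Z, assumed to be what "\z" refers to
                    out.append('\\');   // assumed output form: backslash, then the char itself
                    break;
                default:
                    break;
            }
            out.append(ch);
        }
        // Builds the result from the buffer even when nothing was escaped.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("hello"));
        System.out.println(escape("hel'lo"));
    }
}
```

Because this version copies every char unconditionally, it pays the copying cost even for inputs that contain no reserved chars, which would be consistent with the "no escaping" Java timings being relatively high.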
(for 1'000'000 calls)

Short string that does not require escaping: "hello"
  java: avg 2.99 us
  jni:  avg 7.56 us (2.53x slower)

Short string that requires escaping: "hel'lo"
  java: avg 3.69 us
  jni:  avg 21.9 us (5.9x slower)

(from here, 100'000 calls)

Long string no escaping: "aaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccc ccddddddddddddddddddeeeeeeeeeeeeeeeeeeeee"
  Java: avg 44.9 us
  JNI:  avg 17.2 us (2.61x FASTER)

Long string + escaping: "aaaaa'aaaaaaaa'aaaaaaaa'bbbbbbbb'bbbbbbbbbb'bbbbbbbbb'ccccccccc'cccccccc cc'ccccccc'ddddddd'dddd'ddddddd'eeeee'eeeeeeeeee'eeeeee"
  Java: avg 54.0 us
  JNI:  avg 59.0 us (1.09x slower)
So, the benefit of JNI lies in the fact that it scales better than Java for long strings, which compensates for its base call cost. That cost is the JNI overhead plus the GetStringChars() of the original jstring, which is what I'm trying to eliminate (that encoding is the hotspot of our app, invoked about a minimum of 3000 times per second).
I have the JVM source. That's where I searched for the text 'jstring'. It's too much to read, of course, so I'm asking those who might already know.
Thanks.

>> I don't intend to violate the jstring object received, But I would
>> love to read its actual jchar* (with the offset and size attribute of
[quoted text clipped - 10 lines]
>
> -- chris

Chris Uppal - 30 Oct 2005 13:27 GMT
> The use case are short vs long strings (5 vs ~100) and strings that need
> escaping vs strings that do not need escaping (reserved chars are quotes,
> CR, LF, TAB, BS, \, \z: you can guess that looks like sql escaping....

I'm not sure what you mean by \z so I've ignored it in my tests (below).

> But don't get me side tracked on prepared statement! that is another
> game!)

OK, I'll skip the lecture on security, but if you are going to execute these strings as SQL it still raises a couple of technical points. (1) since you can obviously trust the supplier of this data, the supplier is presumably under your control; if so then it might be easier for them to pre-escape the data for you. (2) the cost of string copies, etc, will be /tiny/ compared with the cost of parsing SQL, let alone executing it.

> The encoder scans the strings first to find reserved chars (and counts
> the extra space it would require). If none, return the original jstring.
> Super fast. Otherwise, it malloc a new jchar* of the extended dimension
> and loops again to copy & escape the input chars.
>
> Here are the results.

[snipped]

Thanks. Interesting numbers.
It seems that you are using an inefficient Java implementation, and an oddly slow machine. My own experiments run more than an order of magnitude faster. I don't know what kind of kit you are using, but I presume this is intended to run on a server class machine. I would expect that to be a lot faster than my laptop; even if it didn't have a faster clock speed (afaik, clock speed is not a major factor for most servers) it still would have high spec-ed memory, large caches, and high bandwidth between memory and CPU -- which is just what this test needs.
I'm running JDK 1.5 on a 1.5 GHz WinXP laptop, using the -server flag and allowing the JITer time to warm up before measuring. The JNI code was compiled with MS VS.net 2003 with default optimisations for "release" mode.
I'll append my Java code at the end of this message. My JNI code is very similar (I can post that too if you want). It uses the same algorithm as the Java code; the only essential difference is that the JNI code has a small optimisation to avoid the overhead of malloc()-ing a temporary buffer when copying small strings. That saves around 0.1 usecs.
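The measurement approach described above (let the JIT warm up, then average over many runs) can be sketched roughly as follows. This is not the harness used for the numbers in this thread; the class name, the run counts, and the stand-in workload are assumptions, and a dedicated benchmarking harness would normally give more trustworthy figures.

```java
import java.util.function.UnaryOperator;

final class EscapeBench {
    // Written after the timed loop so the JIT cannot treat the calls as dead code.
    static volatile int sink;

    /** Average nanoseconds per call of f on input, measured after a warm-up phase. */
    static double avgNanos(UnaryOperator<String> f, String input,
                           int warmupRuns, int timedRuns) {
        int acc = 0;
        // Warm-up: give the JIT a chance to compile the hot path before timing.
        for (int i = 0; i < warmupRuns; i++) {
            acc += f.apply(input).length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            acc += f.apply(input).length();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / timedRuns;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; substitute the escaper under test.
        UnaryOperator<String> standIn = s -> s.replace("'", "\\'");
        double nanos = avgNanos(standIn, "hel'lo", 200_000, 1_000_000);
        System.out.println("avg ns/call: " + nanos);
    }
}
```

Timing the whole loop and dividing by the run count avoids calling the clock once per iteration, which would otherwise dominate measurements in the sub-microsecond range.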
(All the following data averaged over at least 10'000'000 runs)

> Short string that do not require escaping: "hello"
> java: avg 2.99 us
> jni: avg 7.56 us (2.53x slower)

For me:
  Java: 0.06 us
  JNI:  0.47 us

> Short string that requires escaping: "hel'lo"
> java: avg 3.69 us
> jni: avg 21.9 us (5.9x slower)

For me:
  Java: 0.24 us
  JNI:  0.90 us

> Long string no escaping:
> "aaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccc
> ccddddddddddddddddddeeeeeeeeeeeeeeeeeeeee"
> Java: avg 44.9 us
> JNI: avg 17.2 us (2.61x FASTER)

For me:
  Java: 0.58 us
  JNI:  0.92 us

> Long string + escaping:
> "aaaaa'aaaaaaaa'aaaaaaaa'bbbbbbbb'bbbbbbbbbb'bbbbbbbbb'ccccccccc'cccccccc
> cc'ccccccc'ddddddd'dddd'ddddddd'eeeee'eeeeeeeeee'eeeeee"
>
> Java: avg 54.0 us
> JNI: avg 59.0 us (1.09x slower)

For me:
  Java: 3.07 us
  JNI:  2.92 us
The above tests are a bit ad-hoc. More careful testing gives (note that the following are in /nano/seconds):
                        overhead (ns)   per char (ns)
  Java (no escaping)          23               5
  JNI  (no escaping)         438               5
  Java (10% escaped)         102              22
  JNI  (10% escaped)         812              21
(Note that all the strings used for the JNI test fell within the scope of my malloc() optimisation, longer strings would have made JNI relatively slower)
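The overhead and per-char figures can be read as a simple linear cost model, time(n) = overhead + perChar * n. The sketch below assumes that linear form holds (the thread does not say how the figures were fitted) and uses hypothetical class and method names.

```java
final class CostModel {
    /** Estimated time in nanoseconds for a string of the given length. */
    static long estimateNanos(long overheadNs, long perCharNs, long length) {
        return overheadNs + perCharNs * length;
    }

    /**
     * Smallest length at which variant B is no slower than variant A,
     * or -1 if B never catches up because its per-char cost is not lower.
     */
    static long breakEvenLength(long overheadA, long perCharA,
                                long overheadB, long perCharB) {
        long gap = overheadB - overheadA;
        if (gap <= 0) {
            return 0;               // B is already no slower at length 0
        }
        if (perCharB >= perCharA) {
            return -1;              // parallel or diverging lines never meet
        }
        long slope = perCharA - perCharB;
        return (gap + slope - 1) / slope;   // ceiling division
    }

    public static void main(String[] args) {
        // Figures taken from the measurements in this post (nanoseconds).
        System.out.println(estimateNanos(23, 5, 100));          // Java, no escaping
        System.out.println(estimateNanos(438, 5, 100));         // JNI, no escaping
        System.out.println(breakEvenLength(23, 5, 438, 5));     // no escaping
        System.out.println(breakEvenLength(102, 22, 812, 21));  // 10% escaped
    }
}
```

With these inputs the model gives 523 ns versus 938 ns for a 100-char string without escaping, and no break-even for that case. Taken literally, the 10%-escaped figures would meet at 710 chars, but that rests entirely on a 1 ns/char difference, which may well be within measurement noise.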
You'll see that in the case where no copying needs to be done, the Java and JNI versions run at the same speed, they only differ in the fixed overhead of a JNI call. A very similar observation holds for the case where 10% of characters have to be escaped. So there seems to be no benefit in using JNI even for very long strings, since the curves don't cross. That's on my machine/JVM combo, of course, other machines may differ.

> That encoding is the hotspot of our app, invoked
> about a minimum of 3000 times per second).

Since this machine can process even the slowest of your four example inputs in around 3 usecs, it would be able to handle that workrate at under 1% CPU load (3 usecs x 3000/sec is 9 msecs of work per second). Maybe you should replace your server with a two-year-old "ultra-portable" laptop like mine ;-)

Come to think of it, even with the times you posted, and even if the entire workload were the slowest case, that workrate would still only give about a 15% CPU load... Is your profiler lying to you?
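The load estimates here are just per-call time multiplied by call rate. A small worked version, assuming a single core and that the per-call averages from the thread are representative (the class and method names are made up for illustration):

```java
import java.util.Locale;

final class CpuLoad {
    /** Percentage of one CPU consumed by calls of the given cost at the given rate. */
    static double loadPercent(double microsPerCall, double callsPerSecond) {
        // microseconds of work per second, divided by 1'000'000, times 100
        return microsPerCall * callsPerSecond / 10_000.0;
    }

    public static void main(String[] args) {
        // 3 us per call (slowest case measured on the laptop) at 3000 calls/s
        System.out.println(String.format(Locale.ROOT, "%.1f%%", loadPercent(3.0, 3000)));
        // 59 us per call (slowest case in the originally posted timings) at 3000 calls/s
        System.out.println(String.format(Locale.ROOT, "%.1f%%", loadPercent(59.0, 3000)));
    }
}
```

That works out to roughly 0.9% of one CPU for the 3 usec case and roughly 17.7% for the 59 usec case, so neither set of timings by itself looks like a saturated CPU at 3000 calls per second.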
-- chris
======== code ===========
public String escape(String input) {
    int length = input.length();
    int count = length;
    for (int i = 0; i < length; i++) {
        switch (input.charAt(i)) {
            case '\'':
            case '"':
            case '\\':
            case '\n':
            case '\r':
            case '\t':
            case 0x08:
                count++;
        }
    }

    if (count == length)
        return input;

    char[] b = new char[count];
    int pos = 0;
    for (int i = 0; i < length; i++) {
        char ch = input.charAt(i);
        switch (ch) {
            case '\'':
            case '"':
            case '\\':
            case '\n':
            case '\r':
            case '\t':
            case 0x08:
                b[pos++] = '\\';
        }
        b[pos++] = ch;
    }

    return new String(b);
}
NOBODY - 30 Oct 2005 17:56 GMT
Thanks for your generous assistance. I have not seen such attention for a long time. My comments below.

> I'm not sure what you mean by \z so I've ignored it in my tests
> (below).

I believe it is EOL (end of line, or EOF, end of file, it is ascii 26 anyway).

> (1) since you can obviously trust the supplier of this data, the
> supplier is presumably under your control; if so then it might be
> easier for them to pre-escape the data for you. (2) the cost of
> string copies, etc, will be /tiny/ compared with the cost of parsing
> SQL, let alone executing it.

For (1), the SQL assembly is under our control, yes. But some of the sensitive strings are user-provided and so need escaping to prevent SQL injection (the security reason you assumed). For (2), I have little concern for the code I cannot optimize (mysqld).

> It seems that you are using an inefficient Java implementation, and an
> oddly slow machine. My own experiments run more than an order of
> magnitude faster.

Don't worry about the JVM/machine. It is JDK 1.4.2_03 on a crappy dual P2. But the test was also run on a dual Opteron 2.2 GHz with 4 GB of DDR RAM at 333 MHz, and the proportions were similar, which is the important thing.

> [...]

So, in a nutshell: JNI won't make it faster in most cases. Got the point.

> Come to think of it, even with the times you posted, and even if the
> entire workload were the slowest case, that workrate would still only
> give about a 15% CPU load... Is your profiler lying to you ?

I'm not escaping such simple strings. They were only for test purposes. We are escaping strings from 1 to 32k chars, and I mentioned that we are processing a minimum of 3000/s. In some cases, we could see bursts of 20000/s (of course, that is on our dual Opteron...). Don't pay attention to any absolute values or to the provisioning of the CPU.
Your Java code very much reflects my C code. Interesting to note that my brain stopped working at the sight of another String.charAt(), which led me to believe that I needed many indexOf() calls...
Hehehe. Thanks a lot, Chris!
Chris Uppal - 31 Oct 2005 10:25 GMT
> > I'm not sure what you mean by \z so I've ignored it in my tests
> > (below).
>
> I believe it is EOL (end of line, or EOF, end of file, it is ascii 26
> anyway).

Control-Z. Used to be used as an end-of-file indicator by certain primitive OSes that were too dumb even to know how long their own files were...

BTW, you perhaps also ought to check for 0x0 since some tools might take that as end-of-string while others wouldn't. And that kind of disagreement over the meaning of a string can be a godsend for a cracker.

> For (1), the sql assembly is under our control, yes. But some of the
> sensitive strings are user provided strings and so needs escaping to
> prevent sql injection (security reason you assumed).

Just a thought, but if only some of the strings are "untrusted" (and if you know which ones they are), can you save time by only escaping those?

> > It seems that you are using an inefficient Java implementation, and an
> > oddly slow machine. My own experiments run more than an order of
[quoted text clipped - 3 lines]
> p2. But the test was run on a dual opteron 2.2 ghz with 4 gigs DDR ram
> 333 mhz, and the proportion were similar, which is the important.

Hmm, that /should/ be faster than my laptop. Puzzling....

> I'm not escaping such simple strings. They were only for test purpose.
> We are escaping strings from 1 to 32k chars, and I mentionned that we are
> processing a minimum of 3000/s. In some cases, we could see bursts of
> 20000/s. (of course, that is on our dual opteron...)

Ah, yes. 50 usec / job doesn't give you a lot of time to play with.
-- chris
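Following the suggestion to treat Control-Z (0x1A) and 0x0 as reserved too, the two-pass Java escaper posted earlier in the thread could be extended along these lines. This is a sketch only: the class and helper names are hypothetical, and the output form (a backslash followed by the raw char) simply mirrors that earlier code. Whether a particular database accepts that form for control characters, rather than a letter-based escape sequence, is an assumption that would need checking.

```java
final class ExtendedEscaper {
    /** Reserved set from the thread, plus Control-Z and NUL. */
    static boolean isReserved(char ch) {
        switch (ch) {
            case '\'': case '"': case '\\':
            case '\n': case '\r': case '\t':
            case 0x08:   // BS
            case 0x1A:   // Control-Z
            case 0x00:   // NUL
                return true;
            default:
                return false;
        }
    }

    static String escape(String input) {
        int length = input.length();
        int count = length;
        // First pass: count how many extra chars the escapes would need.
        for (int i = 0; i < length; i++) {
            if (isReserved(input.charAt(i))) {
                count++;
            }
        }
        if (count == length) {
            return input;   // nothing to escape: hand back the same object
        }
        // Second pass: copy, prefixing each reserved char with a backslash.
        char[] b = new char[count];
        int pos = 0;
        for (int i = 0; i < length; i++) {
            char ch = input.charAt(i);
            if (isReserved(ch)) {
                b[pos++] = '\\';
            }
            b[pos++] = ch;
        }
        return new String(b);
    }

    public static void main(String[] args) {
        System.out.println(escape("hel'lo"));
    }
}
```

Pulling the reserved set into one predicate keeps the counting pass and the copying pass in agreement, which matters because the output buffer is sized exactly from the first pass.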
NOBODY - 01 Nov 2005 05:59 GMT
>> > It seems that you are using an inefficient Java implementation, and
>> > an oddly slow machine. My own experiments run more than an order
[quoted text clipped - 6 lines]
>
> Hmm, that /should/ be faster than my laptop. Puzzling....

I really meant 'proportions'. On the dual Opteron (although only 1 CPU is used, I guess), everything is about 20x faster (can't remember the exact ratio).
Thanks again.
Roedy Green - 29 Oct 2005 15:45 GMT
On Fri, 28 Oct 2005 11:45:25 +0100, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :

>If you /really want/ to violate encapsulation, write fragile
>implementation-dependent code, etc, etc, then you could access the String's
>internal char[] value, int size, and int offset variables directly from JNI.
>But then you'd have to use GetCharArrayElements() and there's no reason to
>suppose that would be any quicker....

If there were a generic, safe way to do that, surely Sun would have used it. Even if you figure out a way, it will surely have a major catch.
NOBODY - 30 Oct 2005 17:21 GMT
> On Fri, 28 Oct 2005 11:45:25 +0100, "Chris Uppal"
> <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly
[quoted text clipped - 8 lines]
> if there were a generic, safe way to do that, surely Sun would have
> used it.

Sun would use their own implementation secrets if they had to provide such a custom encoder, I agree. But they didn't have to, and their implementation of jstring is clearly made obscure to prevent tampering with in-memory objects. (Honestly, there are CPU instructions that can help implement String.indexOf() and .equals() in much faster ways: on Intel, SSTOS and SSCANS if I recall correctly, which sadly are not on the P4 anymore for some reason, but could be implemented in gate-array electronics and run 1024-char loops in 1 CPU clock... but now I'm drifting off topic!)

> Even if you figure out a way, it will surely have a major catch.

I cannot see one, since the jstring object is the mirror of an immutable object, the JNI ref is valid and used only in that stack frame, and it is only read. Thanks anyway. But still, where is the 'jstring' .c file?!!
:-)

Roedy Green - 31 Oct 2005 03:55 GMT
>there are CPU instructions that can help implement
>String.indexOf() and .equals() in much faster ways (on intels, SSTOS,
>SSCANS if I recall correctly, which sadly are not anymore on P4 for some
>reasons, but could be implemented in gate-array electronic and run 1024
>char loops in 1 cpu clock... but now I'm drifting off topic!).

I would be astounded if those instructions disappeared. All kinds of code would stop working. What you may have read is that they are not implemented in hardware, but in microcode, so are not as fast as the equivalent longhand mov instructions.
NOBODY - 01 Nov 2005 05:52 GMT
>>there are CPU instructions that can help implement
>>String.indexOf() and .equals() in much faster ways (on intels, SSTOS,
[quoted text clipped - 7 lines]
> implemented in hardware, but in microcode, so are not as fast as the
> equivalent longhand mov instructions.

Sounds familiar. I think you are right. My brain's memory lacks parity bits...! But yeah, I think I remember something about it not being a 1-clock operation anymore...