Java Forum / General / December 2007

JNI Unicode String puzzle

Roedy Green - 18 Dec 2007 01:37 GMT
If you call JNI GetStringChars in C++, just what do you get?  An array
of TCHARs?  A null-terminated TCHAR string?

If there is no null, is there some idiomatic way to convert to
null-terminated?

Do you insert it on the Java side?
Or do you have to allocate a buffer, copy, and plop a null on the end?

It seems odd there would not be something built-in to handle this.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Arne Vajhøj - 18 Dec 2007 02:50 GMT
> If you do JNI GetStringChars in C++, just what do you get?  an array
> of TCHARS?  A null terminated TCHAR string?

It returns jchar* and jchar is unsigned short, so you should
get a TCHAR array (assuming _UNICODE).

GetStringUTFChars returns NULL terminated, so I would
expect GetStringChars to be the same.

> You do insert it on the Java side?
> Do you have to allocate a buffer, copy and plop a null??

What ?

Arne

Roedy Green - 18 Dec 2007 02:59 GMT
>GetStringUTFChars returns NULL terminated, so I would
>expect GetStringChars to be the same.

the docs don't mention the null.

Arne Vajhøj - 18 Dec 2007 03:02 GMT
>> GetStringUTFChars returns NULL terminated, so I would
>> expect GetStringChars to be the same.
>
> the docs don't mention the null.

Have you googled it? See:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4616318

It seems it is not null-terminated.

Arne

Roedy Green - 18 Dec 2007 03:30 GMT
>It returns jchar* and jchar is unsigned short, so you should
>get a TCHAR array (assuming _UNICODE).
>
>GetStringUTFChars returns NULL terminated, so I would
>expect GetStringChars to be the same.

I have been reading Sheng Liang's book.  He says definitely that
GetStringChars can't be trusted to return a null-terminated string.

Some program listings on the net suggest GetStringUTFChars
does automatically append a null.

This makes sense.  For 16-bit chars it probably points you to the original,
which has no trailing null.  For 8-bit it has to construct a new string, so
it might as well append the null for you.

Roedy Green - 18 Dec 2007 05:38 GMT
On Tue, 18 Dec 2007 01:37:24 GMT, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>If you do JNI GetStringChars in C++, just what do you get?  an array
>of TCHARS?  A null terminated TCHAR string?

Mystery solved.  

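GetStringChars (16-bit) does not terminate with null.  You must copy the
characters into your own buffer and add one yourself, for example with
wcsncpy_s, taking the length from GetStringLength.

GetStringUTFChars (8-bit) does terminate with null.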
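
The copy-and-terminate step can be sketched roughly as below. This is only an illustration under stated assumptions: utf16_unit is a stand-in for jchar (which jni.h declares as a 16-bit unsigned type) so the snippet compiles without jni.h, and dup_terminated is a hypothetical helper name, not part of JNI.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for JNI's jchar (assumption: a 16-bit unsigned code unit). */
typedef uint16_t utf16_unit;

/*
 * Copy `len` UTF-16 code units from `src`, which is NOT assumed to be
 * terminated, into a freshly allocated buffer and append a zero unit.
 * Returns NULL if allocation fails; the caller frees the result with free().
 */
static utf16_unit *dup_terminated(const utf16_unit *src, size_t len)
{
    utf16_unit *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;
    if (len > 0)
        memcpy(dst, src, len * sizeof *dst);
    dst[len] = 0;   /* the terminator the source buffer does not provide */
    return dst;
}
```

In real JNI code, len would come from GetStringLength and src from GetStringChars, with ReleaseStringChars called once the copy has been made.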

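The generic-text names in the Windows and C runtime headers do not map to
the 16-bit Unicode functions (they quietly degrade to 8-bit mode) unless
you define BOTH (UNICODE for the Windows headers, _UNICODE for the C
runtime headers):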

#define UNICODE
#define _UNICODE

I had forgotten what a nightmare C++'s deeply nested typedefs are, with a
dozen aliases for every actual type.  YUCCH!

It became clear with sizeof dumps.
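
A sizeof dump of the kind mentioned could look something like this sketch. It is portable C rather than the actual Windows code: utf16_unit is again a stand-in for jchar, and dump_sizes and wchar_matches_utf16 are hypothetical names. The printed values depend on the platform, since wchar_t is 16 bits on Windows but wider on many other systems.

```c
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Stand-in for jchar (assumption: a 16-bit unsigned code unit). */
typedef uint16_t utf16_unit;

/* Returns 1 when wchar_t is the same size as a 16-bit code unit, so a
 * buffer of one can be handed to an API expecting the other; 0 otherwise. */
static int wchar_matches_utf16(void)
{
    return sizeof(wchar_t) == sizeof(utf16_unit);
}

/* Print the sizes that matter when mixing JNI strings with wide-char APIs. */
static void dump_sizes(void)
{
    printf("sizeof(char)       = %zu\n", sizeof(char));
    printf("sizeof(wchar_t)    = %zu\n", sizeof(wchar_t));
    printf("sizeof(utf16_unit) = %zu\n", sizeof(utf16_unit));
    printf("wchar_t usable as a 16-bit unit: %s\n",
           wchar_matches_utf16() ? "yes" : "no");
}
```

On Windows the same idea would be applied to TCHAR, WCHAR and jchar after including windows.h, tchar.h and jni.h.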

Lew - 18 Dec 2007 05:42 GMT
> I had forgotten what a nightmare C++ deeply nested typedefs with a
> dozen aliases for every actual type are.  YUCCH!

This resonates with what I've been saying about the dangers of adding
something like 'typedef' to Java.  For some reason people object to adding all
the extra type-safe decorations, such as in complicated generics declarations,
and they imagine that a 'typedef' will make life easier.

As anyone who's had to delve into another's source can tell you, things that
favor the original writer don't always favor a later reader of that source.
Shortcut idioms that hide too much, as 'typedef' can, do not always facilitate
maintenance of the program.

Signature

Lew

Zig - 20 Dec 2007 07:40 GMT
> On Tue, 18 Dec 2007 01:37:24 GMT, Roedy Green
> <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
> someone who said :
>
>> If you do JNI GetStringChars in C++, just what do you get?  an array
>> of TCHARS?  A null terminated TCHAR string?

I'll assume you're writing for Windows.

> Mystery solved.
>
> GetStringChars (16-bit)  does not terminate with null.  You must use
> wcsncpy_s to provide one.
>
> GetStringUTFChars  (8-bit) does terminate with null.

In one of my standard includes for my Windows JNI projects, I have this
function:

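/* Copies a Java string into a zero-terminated wide string allocated from hHeap.
 * Returns NULL if jstr is NULL or if allocation fails (a Java exception is
 * then pending). The caller releases the result with HeapFree. */
LPWSTR GetSzwStringCharsFromHeap(JNIEnv * env, HANDLE hHeap, jstring jstr)
{
    LPWSTR lpwResult=NULL;
    jsize jStrLen;

    if (jstr==NULL)
        goto finished;
    jStrLen=(*env)->GetStringLength(env, jstr);

    /* One extra WCHAR for the terminator; HEAP_ZERO_MEMORY already sets it to 0. */
    lpwResult=HeapAlloc(hHeap, HEAP_ZERO_MEMORY, ((SIZE_T)jStrLen+1)*sizeof(WCHAR));
    if (lpwResult==NULL)
    {
        /* HeapAlloc does not call SetLastError, so report out-of-memory explicitly.
         * fireJavaExceptionForSystemErrorCode is a helper not shown here. */
        fireJavaExceptionForSystemErrorCode(env, ERROR_NOT_ENOUGH_MEMORY);
        goto finished;
    }
    /* Copy the UTF-16 code units directly; jchar and WCHAR are both 16-bit on Windows. */
    (*env)->GetStringRegion(env, jstr, 0, jStrLen, (jchar *)lpwResult);

finished:
    return lpwResult;
}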

(Callers should use (*env)->ExceptionCheck(env) to see if this function  
actually succeeded).

If there is a more conventional approach, I'd love to hear it. Using
GetStringRegion to copy data to the native buffer once seems like it
should be more efficient than allocating a non-terminated buffer and a
terminated buffer.

> C++ Unicode 16-bit functions do not work (quietly degrade to 8-bit
> mode)  unless you define BOTH:
>
> #define UNICODE
> #define _UNICODE

I try to avoid using LPTSTR and TCHAR wherever possible, and instead favor  
LPWSTR and WCHAR. Most Windows functions are declared as

#ifdef UNICODE
#define SomeFunction SomeFunctionW
#else
#define SomeFunction SomeFunctionA
#endif

(With the exception that functions new for Vista / Windows 2008 are  
generally UNICODE only)

So I explicitly call SomeFunctionW, which sidesteps the global
UNICODE definitions.
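
As a rough, portable sketch of that pattern: DemoLengthA, DemoLengthW, DemoLength and wide_length_example are hypothetical names invented for illustration, not real Windows functions. Real code would call an actual ...W function from windows.h in the same way.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical pair of functions following the Windows A/W naming scheme. */
static size_t DemoLengthA(const char *s)    { return strlen(s); }
static size_t DemoLengthW(const wchar_t *s) { return wcslen(s); }

/* The generic name resolves to one variant, depending on the UNICODE macro. */
#ifdef UNICODE
#define DemoLength DemoLengthW
#else
#define DemoLength DemoLengthA
#endif

/* Calling the W variant by name behaves the same whether or not UNICODE
 * is defined, as long as the argument is a wide (L"...") string. */
static size_t wide_length_example(void)
{
    return DemoLengthW(L"java");
}
```

Only callers of the generic name DemoLength are affected by the macro; explicit A and W calls never change meaning.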

Isn't the UNICODE declaration supposed to be set by the C compiler's  
environment when it's in Unicode mode (which to me would suggest the  
compiler will compile "xyz" the same as L"xyz")? Since <jni.h> expects  
method & type signatures to be supplied as char* , it seems like switching  
the compiler to the full-blown Unicode mode would then break when you  
attempt to make JNI calls of the form:

(*env)->FindClass(env, "java/lang/Object");

Anyway, as some of this is speculation and my experimentation with such
settings is minimal, I'd be curious how your mileage goes.
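
On the literal question, a small sketch may help. UNICODE and _UNICODE are ordinary preprocessor macros that the headers test; defining them does not change how the compiler treats string literals, so "xyz" stays an array of char and only L"xyz" is wide. The function names below are hypothetical and exist only for this illustration.

```c
#define UNICODE    /* the macros discussed in the thread; they are only */
#define _UNICODE   /* preprocessor symbols, not a compiler mode         */

#include <stddef.h>
#include <wchar.h>

/* A plain literal is still an array of char, so it fits the const char*
 * parameters that JNI functions such as FindClass expect. */
static const char *class_name(void)
{
    return "java/lang/Object";
}

/* Bytes per element of a narrow literal. */
static size_t narrow_unit_size(void)
{
    return sizeof("xyz"[0]);
}

/* Bytes per element of a wide literal; only the L prefix makes it wide. */
static size_t wide_unit_size(void)
{
    return sizeof(L"xyz"[0]);
}
```

So a call such as (*env)->FindClass(env, "java/lang/Object") should keep compiling with both macros defined, because its argument is still a narrow string.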

For what it's worth though, if you just use the "W" functions and avoid  
the TCHAR abstraction, the rest seems to fall into place.

> I had forgotten what a nightmare C++ deeply nested typedefs with a
> dozen aliases for every actual type are.  YUCCH!
>
> It came clear with sizeof dumps.

Hope that was interesting or useful,

-Zig

