Java Forum / Tools / September 2005

differing sizes of wchar_t

Henry Townsend - 25 Sep 2005 18:56 GMT
I hope I'm missing something obvious here ...

My app reads text data over a socket. It's actually done via libcurl
(http://curl.haxx.se) but that doesn't matter here; the point is that I
am presented with a void * pointing to a block of raw data and a long
giving the number of bytes in the block, and that's it.

However, since my app comprises both server and client I (the client)
know the data is line-oriented text. All it needs to do is break it into
lines and print them to stdout, skipping certain lines. This is not
hard, using a bunch of <string.h> routines like strchr and strstr, plus
fputs(). I have this all working - when the text is 8-bit ASCII.

But for full generality I'd like the server to deliver text in the UCS-2
charset (Unicode). I figured handling this on the client side would be a
simple matter of transposing char to wchar_t and strlen() to wcslen(),
etc. But it turns out that wchar_t on my platform (and on many,
including Linux and Solaris) is 4 bytes wide. So I've got a stream of
16-bit characters from the server, and mechanisms for handling 8- and
32-bit character streams on the client!

Is there a common/elegant solution here? I could allocate a buffer twice
as big as the incoming data and promote to 4-byte chars before operating
on it but that would be inelegant to say the least. Not to mention the
platforms where wchar_t is 2 bytes. I guess I could pass sizeof(wchar_t)
to the server and have it respond with 2- or 4-byte data based on that,
but that would mean a doubling of bandwidth consumption. What do people
usually do about this "impedance mismatch"?

Thanks,
Henry Townsend

Roedy Green - 25 Sep 2005 21:38 GMT
>I guess I could pass sizeof(wchar_t)
>to the server and have it respond with 2- or 4-byte data based on that,
>but that would mean a doubling of bandwidth consumption. What do people
>usually do about this "impedance mismatch"?

How about telling C you have a stream of bytes, then taking the UTF-16
apart yourself?  There was a long discussion here about how UTF-16 is
encoded.
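
For illustration, here is a minimal sketch of taking the UTF-16 apart by hand in C. It treats the incoming block as plain bytes and produces 32-bit code points, so it does not depend on sizeof(wchar_t) at all. The function name utf16be_decode and the big-endian byte order are assumptions made for this example, not something stated in the thread; if the server sends little-endian data, swap the two bytes in read_u16_be.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit unit from a byte buffer.
 * Big-endian (network) byte order is an assumption of this sketch. */
static uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Decode nbytes of UTF-16BE from src into out, which has room for
 * out_cap code points. Returns the number of code points written.
 * Unpaired surrogates become U+FFFD; a trailing odd byte is ignored.
 * (utf16be_decode is a hypothetical helper, not a library function.)
 */
size_t utf16be_decode(const void *src, size_t nbytes,
                      uint32_t *out, size_t out_cap)
{
    const unsigned char *p = src;
    size_t i = 0, n = 0;

    while (i + 1 < nbytes && n < out_cap) {
        uint32_t u = read_u16_be(p + i);
        i += 2;

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: valid only if a low surrogate follows. */
            uint32_t lo = 0;
            if (i + 1 < nbytes)
                lo = read_u16_be(p + i);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                u = 0xFFFD;
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            u = 0xFFFD;
        }
        out[n++] = u;
    }
    return n;
}
```

The caller would pass the void * and byte count it got from libcurl straight in. If the data really is UCS-2 (no surrogates), every two input bytes yield exactly one code point, so an output buffer of nbytes / 2 elements is always enough.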

See http://mindprod.com/jgloss/utf.html

Is there anything in your C libraries equivalent to Java's UTF-8
encodings, something that will give you an array of 32-bit chars from an
8-bit stream?
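
Standard C does offer mbstowcs(), but its behaviour depends on the current locale, so it is not a guaranteed UTF-8 decoder. As a rough sketch of the alternative, a hand-rolled decoder that turns an 8-bit UTF-8 stream into 32-bit code points might look like the following. The name utf8_decode is made up for this example, and the sketch deliberately skips some validation (it does not reject overlong forms or encoded surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode nbytes of UTF-8 from src into out, which has room for
 * out_cap code points. Returns the number of code points written.
 * Malformed or truncated sequences become U+FFFD. Overlong forms and
 * encoded surrogates are NOT rejected in this sketch.
 * (utf8_decode is a hypothetical helper, not a library function.)
 */
size_t utf8_decode(const void *src, size_t nbytes,
                   uint32_t *out, size_t out_cap)
{
    const unsigned char *p = src;
    size_t i = 0, n = 0;

    while (i < nbytes && n < out_cap) {
        unsigned char b = p[i++];
        uint32_t cp;
        size_t extra, k;

        /* The lead byte tells us how many continuation bytes follow. */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { out[n++] = 0xFFFD; continue; }

        /* Each continuation byte (10xxxxxx) contributes 6 bits. */
        for (k = 0; k < extra && i < nbytes && (p[i] & 0xC0) == 0x80; k++)
            cp = (cp << 6) | (p[i++] & 0x3F);

        out[n++] = (k == extra) ? cp : 0xFFFD;
    }
    return n;
}
```

Since UTF-8 never produces more code points than input bytes, an output buffer of nbytes elements is always large enough.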

I did not catch whether this data is mixed binary/text or pure text.

If mixed, you might use LEDataStream to prepare files that look like C
structures.  It encodes the strings as counted UTF-8.  You could take
those apart yourself.

I think the key to this is realising the encoding is not a big deal.

Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.