
Java Forum / General / March 2008


String.charAt() returns wrong char

column.column@gmail.com - 22 Mar 2008 21:17 GMT
I need to have a byte (or an array of bytes) for some reason and I want to
store it temporarily in a String. Unfortunately String.charAt returns bad
characters when the byte a > 127. Why?

int aa = 0x92;
byte a = (byte) aa; // a becomes -110, because a byte holds -128...127.
                    // Anyway, the bit layout is the same.

byte[] aaa = new byte[] {a};
String ggg = new String(aaa); // creating the String

a = (byte) ggg.charAt(0); // a becomes 25 - why?

Thank You
Mark Space - 22 Mar 2008 22:13 GMT
> I need to have a byte (or an array of bytes) for some reason and I want to
> store it temporarily in a String. Unfortunately String.charAt returns bad
[quoted text clipped - 10 lines]
>
> Thank You

Probably the string is trying to interpret the byte as Unicode...
Eric Sosman - 22 Mar 2008 22:16 GMT
> I need to have a byte (or an array of bytes) for some reason and I want to
> store it temporarily in a String. Unfortunately String.charAt returns bad
[quoted text clipped - 8 lines]
>
> a=(byte) ggg.charAt(0); // a becomes 25 - why?

    Short answer: Because chars are not bytes.

    Longer answer: When you construct a String from an array
of bytes, the bytes are decoded as representations of the
platform's default character set.  On my machine (which may
be using the same encoding as yours, because we get the same
final result), the array "new byte[] { -110 }" decodes to a
String whose single character has the code 8217 or \u2019,
a Unicode right single quotation mark.  When you convert this
char to a byte by chopping away the high-order half, you're
left with 25.  Other systems might give you different results.
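    The arithmetic can be checked with a small sketch. It assumes the
default encoding in question was windows-1252 (the thread never confirms
this; it is simply an encoding that maps 0x92 to \u2019), and it names the
charset explicitly so the result does not depend on the machine. The class
and method names are made up for illustration.

```java
import java.nio.charset.Charset;

class Cp1252Demo {
    /** Decodes a single byte with the named charset and returns the resulting char. */
    static char decodeSingleByte(byte b, String charsetName) {
        return new String(new byte[] { b }, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        byte original = (byte) 0x92;                     // -110 as a signed byte
        // Assumption: windows-1252 stands in for the poster's platform default.
        char c = decodeSingleByte(original, "windows-1252");
        System.out.println((int) c);                     // 8217, i.e. \u2019
        System.out.println((byte) c);                    // 25: the cast keeps only the low 8 bits (0x19)
        System.out.println(Charset.defaultCharset());    // what this JVM uses when no charset is named
    }
}
```

    On a JVM whose default charset is something else, the no-argument
"new String(bytes)" form would give a different char and therefore a
different byte after the cast.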

    Your plan to store an array of "raw bytes" as a String
is flawed: Strings are not arrays, and they are made up not
of bytes but of chars.  Why do you think you need to do it?


Eric Sosman
esosman@ieee-dot-org.invalid

column.column@gmail.com - 23 Mar 2008 09:53 GMT
But maybe it is possible to create a String that is not in Unicode format, but
in single-byte coded characters? I found one more strange thing: my
serial communication class sends the String to the COM port as needed,
and the character is 0x92. That means there is a method that converts a
String to bytes in the right way.

> column.col...@gmail.com wrote:
> > I need to have a byte (or an array of bytes) for some reason and I want to
[quoted text clipped - 29 lines]
> Eric Sosman
> esos...@ieee-dot-org.invalid
Lew - 23 Mar 2008 11:17 GMT
(please do not top-post)

Eric Sosman wrote:
>>      Longer answer: When you construct a String from an array
>> of bytes, the bytes are decoded as representations of the
[quoted text clipped - 9 lines]
>> is flawed: Strings are not arrays, and they are made up not
>> of bytes but of chars.  Why do you think you need to do it?

> But maybe it is possible to create a String that is not in Unicode format, but
> in single-byte coded characters?

No.

One can create a String /from/ single-byte encoded characters, by specifying
the encoding for the conversion.  The String itself will always comprise
16-bit-encoded characters.
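
A minimal sketch of that point, assuming windows-1252 as the single-byte
encoding (the thread does not say which one is actually in use; the class
and method names are invented for the example). The String holds the 16-bit
value 0x2019, yet encoding it again with the same charset gives back the
original byte, which is one plausible reason the serial class still puts
0x92 on the wire.

```java
import java.nio.charset.Charset;

class ExplicitCharsetRoundTrip {
    /** Decodes raw bytes with cs, then encodes the String again with the same cs. */
    static byte[] roundTrip(byte[] raw, Charset cs) {
        String s = new String(raw, cs);   // bytes -> 16-bit chars
        return s.getBytes(cs);            // 16-bit chars -> bytes
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is only an example of a single-byte encoding.
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] raw = { (byte) 0x92 };

        String s = new String(raw, cp1252);
        System.out.println(Integer.toHexString(s.charAt(0)));      // 2019: a 16-bit char, not 0x92

        byte[] back = roundTrip(raw, cp1252);
        System.out.println(Integer.toHexString(back[0] & 0xFF));   // 92: the original byte again
    }
}
```

This only round-trips bytes that the chosen charset actually maps to a
character; it is not a general container for arbitrary binary data.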


Lew

rossum - 23 Mar 2008 11:27 GMT
>I need to have a byte (or an array of bytes) for some reason and I want to
>store it temporarily in a String. Unfortunately String.charAt returns bad
[quoted text clipped - 10 lines]
>
>Thank You
There are ways to encode raw bytes as strings.  Have you tried hex
(=Base16) encoding or Base64 encoding?  Both of those will reversibly
convert between raw bytes and printable strings.

If you need the charAt() function for the string format then hex is
probably better because the mapping between bytes and character
positions is much simpler than with Base64.
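
For the Base64 route, a rough sketch using java.util.Base64. That class
only exists in Java 8 and later, so it is an assumption about the runtime
rather than something available when this thread was written; the wrapper
class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    /** Raw bytes -> printable Base64 text. */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Base64 text -> the original raw bytes. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0x92, 0x00, 0x7F, (byte) 0xFF };
        String text = encode(raw);                            // printable ASCII only
        System.out.println(text);
        System.out.println(Arrays.equals(raw, decode(text))); // true: the conversion is reversible
    }
}
```

With Base64 every 3 bytes become 4 characters, so charAt(i) does not line
up with byte i. With hex, byte i always sits at characters 2*i and 2*i+1
(when no separator is used).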

rossum
column.column@gmail.com - 24 Mar 2008 11:45 GMT
> If you need the charAt() function for the string format then hex is
> probably better because the mapping between bytes and character
> positions is much simpler than with Base64.

Do you mean I must pass a charsetName when creating the String? I found the
following charsets using Charset.availableCharsets(), but there is no Base16:

{Big5=Big5, Big5-HKSCS=Big5-HKSCS, EUC-JP=EUC-JP, EUC-KR=EUC-KR,
GB18030=GB18030, GB2312=GB2312, GBK=GBK, IBM-Thai=IBM-Thai,
IBM00858=IBM00858, IBM01140=IBM01140, IBM01141=IBM01141,
IBM01142=IBM01142, IBM01143=IBM01143, IBM01144=IBM01144,
IBM01145=IBM01145, IBM01146=IBM01146, IBM01147=IBM01147,
IBM01148=IBM01148, IBM01149=IBM01149, IBM037=IBM037, IBM1026=IBM1026,
IBM1047=IBM1047, IBM273=IBM273, IBM277=IBM277, IBM278=IBM278,
IBM280=IBM280, IBM284=IBM284, IBM285=IBM285, IBM297=IBM297,
IBM420=IBM420, IBM424=IBM424, IBM437=IBM437, IBM500=IBM500,
IBM775=IBM775, IBM850=IBM850, IBM852=IBM852, IBM855=IBM855,
IBM857=IBM857, IBM860=IBM860, IBM861=IBM861, IBM862=IBM862,
IBM863=IBM863, IBM864=IBM864, IBM865=IBM865, IBM866=IBM866,
IBM868=IBM868, IBM869=IBM869, IBM870=IBM870, IBM871=IBM871,
IBM918=IBM918, ISO-2022-CN=ISO-2022-CN, ISO-2022-JP=ISO-2022-JP,
ISO-2022-JP-2=ISO-2022-JP-2, ISO-2022-KR=ISO-2022-KR,
ISO-8859-1=ISO-8859-1, ISO-8859-13=ISO-8859-13,
ISO-8859-15=ISO-8859-15, ISO-8859-2=ISO-8859-2, ISO-8859-3=ISO-8859-3,
ISO-8859-4=ISO-8859-4, ISO-8859-5=ISO-8859-5, ISO-8859-6=ISO-8859-6,
ISO-8859-7=ISO-8859-7, ISO-8859-8=ISO-8859-8, ISO-8859-9=ISO-8859-9,
JIS_X0201=JIS_X0201, JIS_X0212-1990=JIS_X0212-1990, KOI8-R=KOI8-R,
KOI8-U=KOI8-U, Shift_JIS=Shift_JIS, TIS-620=TIS-620, US-ASCII=US-ASCII,
UTF-16=UTF-16, UTF-16BE=UTF-16BE, UTF-16LE=UTF-16LE,
UTF-32=UTF-32, UTF-32BE=UTF-32BE, UTF-32LE=UTF-32LE, UTF-8=UTF-8,
windows-1250=windows-1250, windows-1251=windows-1251,
windows-1252=windows-1252, windows-1253=windows-1253,
windows-1254=windows-1254, windows-1255=windows-1255,
windows-1256=windows-1256, windows-1257=windows-1257,
windows-1258=windows-1258, windows-31j=windows-31j,
x-Big5-Solaris=x-Big5-Solaris, x-euc-jp-linux=x-euc-jp-linux,
x-EUC-TW=x-EUC-TW, x-eucJP-Open=x-eucJP-Open, x-IBM1006=x-IBM1006,
x-IBM1025=x-IBM1025, x-IBM1046=x-IBM1046, x-IBM1097=x-IBM1097,
x-IBM1098=x-IBM1098, x-IBM1112=x-IBM1112, x-IBM1122=x-IBM1122,
x-IBM1123=x-IBM1123, x-IBM1124=x-IBM1124, x-IBM1381=x-IBM1381,
x-IBM1383=x-IBM1383, x-IBM33722=x-IBM33722, x-IBM737=x-IBM737,
x-IBM834=x-IBM834, x-IBM856=x-IBM856, x-IBM874=x-IBM874,
x-IBM875=x-IBM875, x-IBM921=x-IBM921, x-IBM922=x-IBM922,
x-IBM930=x-IBM930, x-IBM933=x-IBM933, x-IBM935=x-IBM935,
x-IBM937=x-IBM937, x-IBM939=x-IBM939, x-IBM942=x-IBM942,
x-IBM942C=x-IBM942C, x-IBM943=x-IBM943, x-IBM943C=x-IBM943C,
x-IBM948=x-IBM948, x-IBM949=x-IBM949, x-IBM949C=x-IBM949C,
x-IBM950=x-IBM950, x-IBM964=x-IBM964, x-IBM970=x-IBM970,
x-ISCII91=x-ISCII91, x-ISO-2022-CN-CNS=x-ISO-2022-CN-CNS,
x-ISO-2022-CN-GB=x-ISO-2022-CN-GB, x-iso-8859-11=x-iso-8859-11,
x-JIS0208=x-JIS0208, x-JISAutoDetect=x-JISAutoDetect, x-Johab=x-Johab,
x-MacArabic=x-MacArabic, x-MacCentralEurope=x-MacCentralEurope,
x-MacCroatian=x-MacCroatian, x-MacCyrillic=x-MacCyrillic,
x-MacDingbat=x-MacDingbat, x-MacGreek=x-MacGreek,
x-MacHebrew=x-MacHebrew, x-MacIceland=x-MacIceland,
x-MacRoman=x-MacRoman, x-MacRomania=x-MacRomania,
x-MacSymbol=x-MacSymbol, x-MacThai=x-MacThai,
x-MacTurkish=x-MacTurkish, x-MacUkraine=x-MacUkraine,
x-MS950-HKSCS=x-MS950-HKSCS, x-mswin-936=x-mswin-936, x-PCK=x-PCK,
x-UTF-16LE-BOM=x-UTF-16LE-BOM, X-UTF-32BE-BOM=X-UTF-32BE-BOM,
X-UTF-32LE-BOM=X-UTF-32LE-BOM, x-windows-50220=x-windows-50220,
x-windows-50221=x-windows-50221, x-windows-874=x-windows-874,
x-windows-949=x-windows-949, x-windows-950=x-windows-950,
x-windows-iso2022jp=x-windows-iso2022jp}
rossum - 24 Mar 2008 12:36 GMT
>> If you need the charAt() function for the string format then hex is
>> probably better because the mapping between bytes and character
>> positions is much simpler than with Base64.
>
>Do you mean I must pass a charsetName when creating the String? I found the
>following charsets using Charset.availableCharsets(), but there is no Base16:
Base16 is another name for Hex.  It only uses 16 characters
0123456789ABCDEF or 0123456789abcdef.  Each byte is translated into
two characters.

This is the code I use:

 /**
  * Converts a byte array into a hex string: "EB 33 0F 7E".  
  * The string uses uppercase with leading zeros and spaces
  * for separators.
  *
  * @param inBytes The byte array to convert.
  * @return A hex string with spaces for separators.
  */
 public static String asHex(byte[] inBytes) {
   final String separator = " ";
   final char leadingZero = '0';
   StringBuilder sb = new StringBuilder(inBytes.length * 3);
   for (int i = 0; i < inBytes.length; ++i) {
     if (i > 0) { sb.append(separator); }
     if (inBytes[i] >= 0 && inBytes[i] < 0x10) {
       sb.append(leadingZero);
     } // end if
     sb.append(Integer.toHexString(inBytes[i] & 0xFF));
   } // end for
   return sb.toString().toUpperCase();
 } // end asHex(byte[])

You may wish to remove the separator so your output looks more like
"EB330F7E".

I leave it up to you to do the reverse conversion from the string back
to bytes.
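
One possible sketch of that reverse conversion, not necessarily how the
poster would write it. The name fromHex and the choice to simply ignore
spaces are assumptions made for this example, so both "EB 33 0F 7E" and
"EB330F7E" are accepted.

```java
class HexDecode {
    /**
     * Parses a hex string such as "EB 33 0F 7E" or "EB330F7E" into bytes.
     * Spaces are ignored; upper and lower case digits are both accepted.
     */
    static byte[] fromHex(String hex) {
        String digits = hex.replace(" ", "");
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + hex);
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; ++i) {
            // Byte i is always encoded by the two characters at 2*i and 2*i+1.
            int hi = Character.digit(digits.charAt(2 * i), 16);
            int lo = Character.digit(digits.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit in: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex("EB 33 0F 7E");
        System.out.println(bytes.length);      // 4
        System.out.println(bytes[0] & 0xFF);   // 235, i.e. 0xEB
    }
}
```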

rossum
Roedy Green - 24 Mar 2008 13:09 GMT
>Do you mean I must pass a charsetName when creating the String? I found the
>following charsets using Charset.availableCharsets(), but there is no Base16:

See http://mindprod.com/jgloss/base64.html
Base64 is not one of the supported encodings.
I don't think hex is either.


Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Mark Space - 24 Mar 2008 20:18 GMT
>> If you need the charAt() function for the string format then hex is
>> probably better because the mapping between bytes and character
>> positions is much simpler than with Base64.
>
> Do you mean I must pass a charsetName when creating the String? I found the
> following charsets using Charset.availableCharsets(), but there is no Base16:

Here is my question:

Why use Strings at all?  Byte arrays are ideal for IO, just send the
array to the serial port you want.
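
A small sketch of what that could look like. No particular serial library
is implied: the code assumes only that the port can be reached through a
plain java.io.OutputStream, and a ByteArrayOutputStream stands in for the
real port so the example runs anywhere. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class RawByteWrite {
    /** Writes the frame unchanged to any OutputStream; no String or charset is involved. */
    static void send(OutputStream port, byte[] frame) throws IOException {
        port.write(frame);
        port.flush();
    }

    /** Hypothetical stand-in for a real port: captures what was written so it can be inspected. */
    static byte[] loopback(byte[] frame) {
        ByteArrayOutputStream fakePort = new ByteArrayOutputStream();
        try {
            send(fakePort, frame);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fakePort.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sent = loopback(new byte[] { (byte) 0x92, 0x01, 0x02 });
        System.out.println(Integer.toHexString(sent[0] & 0xFF)); // 92: the byte arrives untouched
    }
}
```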

If you are doing some text processing, there are methods that take
byte[] and convert large amounts of text quickly.  Yes, you still need a
Charset for this.

(Can you tell us what charset you are using?  What character is 0x92
anyway?  You haven't even told us yet.)
Roedy Green - 23 Mar 2008 20:18 GMT
>I need to have a byte (or an array of bytes) for some reason and I want to
>store it temporarily in a String. Unfortunately String.charAt returns bad
>characters when the byte a > 127. Why?

There are scores of ways of converting bytes to String.  See
http://mindprod.com/jgloss/encoding.html

You want something quick, mindless and reversible, e.g. prepend a 0
byte.

ISO-8859-1 will do.
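
A sketch of why ISO-8859-1 fits, on the reading that "prepend a 0 byte"
means each byte becomes a 16-bit char whose high byte is 0. The class and
method names are invented for the example, and StandardCharsets requires
Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    /** Each byte becomes the char with the same unsigned value (0x00..0xFF). */
    static String bytesToString(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Inverse of bytesToString, valid as long as no char is above 0xFF. */
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; ++i) {
            all[i] = (byte) i;                       // every possible byte value
        }
        String s = bytesToString(all);
        System.out.println((int) s.charAt(0x92));    // 146: the char value equals the unsigned byte value
        System.out.println((byte) s.charAt(0x92));   // -110: so the cast gives back the original byte
        System.out.println(Arrays.equals(all, stringToBytes(s))); // true
    }
}
```

The String must not be altered by anything that introduces chars above
0xFF, because getBytes would then substitute a replacement byte for them.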

If you want something compact, see
http://mindprod.com/jgloss/armouring.html


Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

EJP - 25 Mar 2008 03:05 GMT
> I need to have a byte (or an array of bytes) for some reason and I want to
> store it temporarily in a String.

Why? That's where your problem is. String is not a container for binary
data.

