
convert Java unicode escape to utf8

Jeff Higgins - 06 Jul 2007 05:03 GMT
Hi,
How can I convert a String containing a
Java Unicode escape sequence to a String
containing the equivalent UTF8 representation?

For instance "\u4f55" -> "e4bd95"

Thanks,
Jeff Higgins
SadRed - 06 Jul 2007 05:40 GMT
> Hi,
> How can I convert a String containing a
[quoted text clipped - 5 lines]
> Thanks,
> Jeff Higgins

See Unicode standard documentation.
This might be handy for UTF-8 encoding:
http://homepage1.nifty.com/algafield/core0.html
bugbear - 06 Jul 2007 09:46 GMT
> Hi,
> How can I convert a String containing a
> Java Unicode escape sequence to a String
> containing the equivalent UTF8 representation?
>
> For instance "\u4f55" -> "e4bd95"

You mean a string containing the hex representation
of the UTF-8 bytes that encode the string?

Or do you mean a byte array containing UTF-8 bytes?

In Java, a String contains chars, which are
UTF-16 code units.

So a string never contains a "unicode escape sequence",
it merely contains a character. It is the compiler
which turns the escape sequence in your source code
into a "true" string.
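
To make the compiler's role concrete, here is a small sketch (the class name EscapeDemo is made up for this example). It compares a literal whose escape the compiler translates with one whose backslash is doubled, so the escape survives as six ordinary characters.

```java
final class EscapeDemo {
    public static void main(String[] args) {
        // The compiler translates the escape, so this String holds one char.
        String compiled = "\u4f55";
        // The doubled backslash keeps the escape as plain text:
        // backslash, u, 4, f, 5, 5.
        String literal = "\\u4f55";

        System.out.println(compiled.length());                       // 1
        System.out.println(literal.length());                        // 6
        System.out.println(Integer.toHexString(compiled.charAt(0))); // 4f55
    }
}
```

Which of the two strings you actually have at run time decides which of the two questions above applies.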

  BugBear
bugbear - 06 Jul 2007 12:38 GMT
>> Hi,
>> How can I convert a String containing a
[quoted text clipped - 7 lines]
>
> Or do you mean a byte array containing utf-8 bytes?

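// needs java.io.ByteArrayOutputStream, java.io.OutputStreamWriter and
// java.nio.charset.Charset; write() and flush() can throw IOException
String str = "\u4f55";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Charset cs1 = Charset.forName("UTF-8");
OutputStreamWriter osw = new OutputStreamWriter(baos, cs1);
osw.write(str);
osw.flush(); // the writer buffers; without this, baos may still be empty
byte[] want = baos.toByteArray();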

(neither compiled nor tested)

  BugBear
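
If the goal is the hex string from the original question, a shorter route than a Writer is String.getBytes. The sketch below assumes Java 7 or later (for StandardCharsets); the class and method names Utf8Hex and toUtf8Hex are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Hex {
    /** Returns the UTF-8 encoding of s as lowercase hex, two digits per byte. */
    static String toUtf8Hex(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder(utf8.length * 2);
        for (byte b : utf8) {
            // Mask to 0..255 first, then pad to two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf8Hex("\u4f55")); // e4bd95
    }
}
```

On older JVMs, s.getBytes("UTF-8") does the same job but declares UnsupportedEncodingException.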
Roedy Green - 06 Jul 2007 13:03 GMT
On Fri, 6 Jul 2007 00:03:52 -0400, "Jeff Higgins"
<oohiggins@yahoo.com> wrote, quoted or indirectly quoted someone who
said :

>For instance "\u4f55" -> "e4bd95"

If by that \u4f55 you mean a single 16-bit char, you just have to
write to a Writer specifying UTF-8 as your encoding.  See
http://mindprod.com/applets/fileio.html for sample code.

If by that \u4f55 you mean six 8-bit ASCII characters, native2ascii
will convert it to other encodings.  See
http://mindprod.com/jgloss/native2asciiexe.html and
http://mindprod.com/jgloss/encoding.html
for details.
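
For the second case, where the six ASCII characters arrive at run time (read from a file, say), the escape can also be decoded in code rather than with a command-line tool. This is only a sketch: UnicodeUnescape and unescape are hypothetical names, and unlike the compiler it ignores doubled backslashes and runs of several 'u' characters.

```java
final class UnicodeUnescape {
    /**
     * Replaces each backslash-u escape (exactly four hex digits) in the
     * input with the char it names; everything else is copied unchanged.
     */
    static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("\\u", i) && i + 6 <= in.length()
                    && isHex(in, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(in.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** True if every char of s in [from, to) is a hex digit. */
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String sixChars = "\\u4f55";              // six ASCII characters
        String oneChar = unescape(sixChars);      // one 16-bit char
        System.out.println(oneChar.length());     // 1
    }
}
```

The resulting one-char String can then be written through a UTF-8 Writer as described in the first case.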

--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Roedy Green - 06 Jul 2007 13:04 GMT
On Fri, 6 Jul 2007 00:03:52 -0400, "Jeff Higgins"
<oohiggins@yahoo.com> wrote, quoted or indirectly quoted someone who
said :

>How can I convert a String containing a
>Java Unicode escape sequence to a String
>containing the equivalent UTF8 representation?
>
>For instance "\u4f55" -> "e4bd95"

If for some reason you wanted to roll your own utility, the code for
UTF-8 reading and writing is at http://mindprod.com/jgloss/utf.html

The code is primarily to help you understand the format.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Jeff Higgins - 07 Jul 2007 03:00 GMT
> Hi,
> How can I convert a String containing a
[quoted text clipped - 5 lines]
> Thanks,
> Jeff Higgins

Ok,
Thanks, everyone, for the generous responses.
SadRed, for the pointer to the UTF-8 definition:
I found it kind of hard to follow at first, but
now that I've found some code to follow along
with, it's making more sense. Bugbear, for the
NIO example: as you can see, I struggle with basic
IO, and now I need to understand wrapping and flipping.
And Roedy, whose excellent mindprod site has been
a continuing source of enlightenment: thanks.

Anyway,
for anyone else who read my OP and was
only able to shake their head in amazement at
its utter incomprehensibility, here is what I
had \really\ hoped to accomplish.

How to encode a Unicode scalar value in UTF8?

public class Encode
{
 public static void main(String[] args)
 {
   int[] intArray = {0x4f55};
   byte[] byteArray = encode(intArray);
   for(byte b : byteArray)
   {
     System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
   }
 }
}

prints e4bd95

where encode(int[]) is a method described at:
<http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html>
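
Since the encode(int[]) method itself is not reproduced in this thread, here is a rough sketch of what such a method could look like. It is not the code from the linked article; the class name Utf8Encoder and the decision to reject surrogates and out-of-range values with an exception are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;

final class Utf8Encoder {
    /** Encodes Unicode scalar values as UTF-8. Illustrative sketch only. */
    static byte[] encode(int[] codePoints) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int cp : codePoints) {
            if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                throw new IllegalArgumentException(
                        "not a Unicode scalar value: 0x" + Integer.toHexString(cp));
            }
            if (cp < 0x80) {            // 1 byte:  0xxxxxxx
                out.write(cp);
            } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                    // 4 bytes: 11110xxx and three 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

For 0x4f55 the third branch applies and yields the bytes e4, bd, 95, which matches the output shown above.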
Hendrik Maryns - 11 Jul 2007 11:54 GMT
Jeff Higgins schreef:
>> Hi,
>> How can I convert a String containing a
[quoted text clipped - 43 lines]
> where encode(int[]) is a method described at:
> <http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html>

Ok, I found out what the & 0xff is for, but would you mind explaining
to me why you add 0x100?

H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
http://aouw.org
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
Jeff Higgins - 11 Jul 2007 16:03 GMT
> Jeff Higgins schreef:
>>> Hi,
[quoted text clipped - 47 lines]
> Ok, I found out what the & 0xff is for, but mind explaining me why you
> do + 0x100?

Well, quite frankly because Roedy Green told me to. Or rather showed
the technique \somewhere\ on his mindprod site. I can't find it now. :(

Boiled down, the code that produced the result follows.
I have no idea how it works, except that it seems to produce the desired
result.
Now you have caused me to have to twiddle bits until I understand.

Thanks,
JH

public class Test
{
 public static void main(String[] args)
 {
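   int in = 0x4f55;
   byte[] out = new byte[3];
   // Three-byte UTF-8 form (U+0800..U+FFFF): 1110xxxx 10xxxxxx 10xxxxxx
   out[0] = (byte)(in >> 12 | 0xE0);        // top 4 bits
   out[1] = (byte)(in >> 6 & 0x3F | 0x80);  // middle 6 bits
   out[2] = (byte)(in & 0x3F | 0x80);       // low 6 bits
   for(byte b : out)
   {
     // Parenthesize the mask: + binds tighter than &, so b & 0xff + 0x100
     // would mask with 0x1ff and print wrong digits for any byte below 0x80.
     System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
   }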
   }
 }
}
Jeff Higgins - 11 Jul 2007 17:30 GMT
>> Jeff Higgins schreef:
>>> How to encode a Unicode scalar value in UTF8?
[quoted text clipped - 23 lines]
> Well, quite frankly because Roedy Green told me to. Or rather showed
> the technique \somewhere\ on his mindprod site. I can't find it now. :(

OK,
Wish I could find it on mindprod site, but can't.
Must have served another purpose.
This works.

System.out.println(Integer.toString((b & 0xff),16));

> Boiled down, the code that produced the result follows.
> I have no idea how it works, except that it seems to produce the desired
[quoted text clipped - 20 lines]
>  }
> }
Roedy Green - 12 Jul 2007 02:55 GMT
On Wed, 11 Jul 2007 11:03:39 -0400, "Jeff Higgins"
<oohiggins@yahoo.com> wrote, quoted or indirectly quoted someone who
said :

>> Ok, I found out what the & 0xff is for, but mind explaining me why you
>> do + 0x100?
>
>Well, quite frankly because Roedy Green told me to. Or rather showed
>the technique \somewhere\ on his mindprod site. I can't find it now. :(

It is a trick for forcing lead zeroes.
see http://mindprod.com/jgloss/hex.html
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Jeff Higgins - 12 Jul 2007 03:17 GMT
> Jeff Higgins wrote: I can't find it now. :(
>
> It is a trick for forcing lead zeroes.
> see http://mindprod.com/jgloss/hex.html

thx
jh
Thomas Fritsch - 11 Jul 2007 18:13 GMT
Hendrik Maryns schrieb:
> Jeff Higgins schreef:
[...]
>>  int[] intArray = {0x4f55};
>>  byte[] byteArray = encode(intArray);
>>  for(byte b : byteArray)
>>  {
>>    System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
>>  }
[...]
> Ok, I found out what the & 0xff is for, but mind explaining me why you
> do + 0x100?
I think it is for inserting the leading "0" for each byte less than
0x10, which would be missing otherwise.

For example: Suppose b = 4
Then
  Integer.toString((b & 0xff), 16)  gives "4",
which is not what you want. You want "04".
The missing leading "0" is produced by the tricky +0x100 and substring(1)
  Integer.toString((b & 0xff) + 0x100, 16)               gives "104"
  Integer.toString((b & 0xff) + 0x100, 16).substring(1)  gives "04"
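
The same padding can be had from a format string. The sketch below puts the two approaches side by side; HexPad, viaOffset and viaFormat are names invented for this example.

```java
final class HexPad {
    /** The +0x100 / substring(1) trick discussed in this thread. */
    static String viaOffset(byte b) {
        // (b & 0xff) is 0..255, so adding 0x100 always gives three hex
        // digits "1xx"; substring(1) then drops the leading "1".
        // The parentheses matter: + binds tighter than & in Java.
        return Integer.toString((b & 0xff) + 0x100, 16).substring(1);
    }

    /** Same result using a format string with zero padding. */
    static String viaFormat(byte b) {
        return String.format("%02x", b & 0xff);
    }

    public static void main(String[] args) {
        byte[] samples = {4, (byte) 0xe4, 0, (byte) 0xff};
        for (byte b : samples) {
            System.out.println(viaOffset(b) + " " + viaFormat(b));
        }
    }
}
```

Both columns should agree for every byte value; the format version is arguably easier to read, while the offset trick avoids the cost of parsing a format string in a tight loop.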

Thomas


