Hi,
How can I convert a String containing a
Java Unicode escape sequence to a String
containing the equivalent UTF8 representation?
For instance "\u4f55" -> "e4bd95"
Thanks,
Jeff Higgins
SadRed - 06 Jul 2007 05:40 GMT
> Hi,
> How can I convert a String containing a
[quoted text clipped - 5 lines]
> Thanks,
> Jeff Higgins
See Unicode standard documentation.
This might be handy for UTF-8 encoding:
http://homepage1.nifty.com/algafield/core0.html
bugbear - 06 Jul 2007 09:46 GMT
> Hi,
> How can I convert a String containing a
> Java Unicode escape sequence to a String
> containing the equivalent UTF8 representation?
>
> For instance "\u4f55" -> "e4bd95"
You mean a string containing the hex representation
for the UTF-8 bytes encoding of the string?
Or do you mean a byte array containing utf-8 bytes?
In Java, a string contains "characters" which are
UTF-16.
So a string never contains a "unicode escape sequence",
it merely contains a character. It is the compiler
which turns the escape sequence in your source code
into a "true" string.
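A quick way to see the difference, as a sketch only (the class and field names are made up for illustration): the first literal is resolved by the compiler into one character, while the second escapes the backslash and so really does hold six ASCII characters at run time.

```java
class EscapeDemo {
    // The compiler resolves this escape, so the string holds one char, U+4F55.
    static final String COMPILED = "\u4f55";
    // The doubled backslash is a literal backslash, so this holds six chars.
    static final String LITERAL = "\\u4f55";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());                        // 1
        System.out.println(LITERAL.length());                         // 6
        System.out.println(Integer.toHexString(COMPILED.charAt(0)));  // 4f55
    }
}
```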
BugBear
bugbear - 06 Jul 2007 12:38 GMT
>> Hi,
>> How can I convert a String containing a
[quoted text clipped - 7 lines]
>
> Or do you mean a byte array containing utf-8 bytes?
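import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

String str = "\u4f55";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Charset cs1 = Charset.forName("UTF-8");
OutputStreamWriter osw = new OutputStreamWriter(baos, cs1);
osw.write(str);
osw.flush(); // the writer buffers encoded bytes; flush before reading baos
byte[] want = baos.toByteArray();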
(neither compiled nor tested)
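If the goal is the hex string from the original question, a shorter route may be String.getBytes with an explicit charset. This is a sketch, not anyone's posted code: Utf8Hex and toUtf8Hex are hypothetical names, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class Utf8Hex {
    // Encodes s as UTF-8 and returns the bytes as lowercase hex, two digits each.
    static String toUtf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes do not print as ffffffe4 and the like.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf8Hex("\u4f55")); // e4bd95
    }
}
```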
BugBear
Roedy Green - 06 Jul 2007 13:03 GMT
On Fri, 6 Jul 2007 00:03:52 -0400, "Jeff Higgins"
<oohiggins@yahoo.com> wrote, quoted or indirectly quoted someone who
said :
>For instance "\u4f55" -> "e4bd95"
If by that \u4f55 you mean a single 16-bit char, you just have to
write to a Writer specifying UTF-8 as your encoding. See
http://mindprod.com/applets/fileio.html for sample code.
If by that \u4f55 you mean 6 8-bit ASCII characters, native2ascii
will convert it to other encodings. See
http://mindprod.com/jgloss/native2asciiexe.html and
http://mindprod.com/jgloss/encoding.html
for details.
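For a single escape held as six characters at run time, parsing it by hand is also possible. A minimal sketch under that assumption (UnescapeDemo and parseEscape are hypothetical names, and only the four-hex-digit form is handled):

```java
class UnescapeDemo {
    // Turns the six-character text backslash, 'u', four hex digits
    // into the single char that it names.
    static char parseEscape(String escape) {
        if (escape.length() != 6 || !escape.startsWith("\\u")) {
            throw new IllegalArgumentException(
                "expected backslash, 'u' and four hex digits");
        }
        // parseInt throws NumberFormatException if the digits are not hex.
        return (char) Integer.parseInt(escape.substring(2), 16);
    }

    public static void main(String[] args) {
        char c = parseEscape("\\u4f55");
        System.out.println(Integer.toHexString(c)); // 4f55
    }
}
```

The resulting char can then be written out in UTF-8 like any other character.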
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Roedy Green - 06 Jul 2007 13:04 GMT
On Fri, 6 Jul 2007 00:03:52 -0400, "Jeff Higgins"
<oohiggins@yahoo.com> wrote, quoted or indirectly quoted someone who
said :
>How can I convert a String containing a
>Java Unicode escape sequence to a String
>containing the equivalent UTF8 representation?
>
>For instance "\u4f55" -> "e4bd95"
If for some reason you wanted to roll your own utility, the code for
UTF-8 reading and writing is at http://mindprod.com/jgloss/utf.html
The code is primarily to help you understand the format.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Jeff Higgins - 07 Jul 2007 03:00 GMT
> Hi,
> How can I convert a String containing a
[quoted text clipped - 5 lines]
> Thanks,
> Jeff Higgins
Ok,
Thanks everyone for the generous responses.
SadRed for the pointer to the UTF8 definition.
I found it kind of hard to follow at first, but
now that I've found some code to follow along
with, it's making more sense. Bugbear for the
NIO example; as you can see, I struggle with basic
IO, and now I need to understand wrapping and flipping.
And Roedy whose excellent mindprod site has been
a continuing source of enlightenment, Thanks.
Anyway,
for anyone else who read my OP and was
only able to shake their head in amazement at
its utter incomprehensibility, here is what I
had \really\ hoped to accomplish.
How to encode a Unicode scalar value in UTF8?
public class Encode
{
    public static void main(String[] args)
    {
        int[] intArray = {0x4f55};
        byte[] byteArray = encode(intArray);
        for(byte b : byteArray)
        {
            System.out.print(Integer.toString((b & 0xff) + 0x100,
                16).substring(1));
        }
    }
}
prints e4bd95
where encode(int[]) is a method described at:
<http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html>
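For readers who cannot reach that article, here is a rough sketch of what such an encode(int[]) might look like. It is not the code from the article: it is written from the standard UTF-8 bit layout, and the class name Utf8Encoder is made up.

```java
import java.io.ByteArrayOutputStream;

class Utf8Encoder {
    // Encodes each Unicode scalar value as 1 to 4 UTF-8 bytes.
    static byte[] encode(int[] codePoints) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int cp : codePoints) {
            // Surrogates and values above U+10FFFF are not scalar values.
            if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                throw new IllegalArgumentException(
                    "not a Unicode scalar value: " + Integer.toHexString(cp));
            }
            if (cp < 0x80) {                       // 0xxxxxxx
                out.write(cp);
            } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                               // 11110xxx plus three 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

With this in place, encode(new int[] {0x4f55}) yields the three bytes e4, bd, 95.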
Hendrik Maryns - 11 Jul 2007 11:54 GMT
Jeff Higgins schreef:
>> Hi,
>> How can I convert a String containing a
[quoted text clipped - 43 lines]
> where encode(int[]) is a method described at:
> <http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html>
Ok, I found out what the & 0xff is for, but mind explaining me why you
do + 0x100?
H.
- --
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
http://aouw.org
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
Jeff Higgins - 11 Jul 2007 16:03 GMT
> Jeff Higgins schreef:
>>> Hi,
[quoted text clipped - 47 lines]
> Ok, I found out what the & 0xff is for, but mind explaining me why you
> do + 0x100?
Well, quite frankly because Roedy Green told me to. Or rather showed
the technique \somewhere\ on his mindprod site. I can't find it now. :(
Boiled down, the code that produced the result follows.
I have no idea how it works, except that it seems to produce the desired
result.
Now you have caused me to have to twiddle bits until I understand.
Thanks,
JH
public class Test
{
    public static void main(String[] args)
    {
        int in = 0x4f55;
        byte[] out = new byte[3];
        out[0] = (byte)(in >> 12 | 0xE0);
        out[1] = (byte)(in >> 6 & 0x3F | 0x80);
        out[2] = (byte)(in & 0x3F | 0x80);
        for(byte b : out)
        {
            // + binds tighter than &, so (b & 0xff) needs its own parentheses
            System.out.print(Integer.toString((b & 0xff) + 0x100,
                16).substring(1));
        }
    }
}
Jeff Higgins - 11 Jul 2007 17:30 GMT
>> Jeff Higgins schreef:
>>> How to encode a Unicode scalar value in UTF8?
[quoted text clipped - 23 lines]
> Well, quite frankly because Roedy Green told me to. Or rather showed
> the technique \somewhere\ on his mindprod site. I can't find it now. :(
OK,
Wish I could find it on mindprod site, but can't.
Must have served another purpose.
This works.
System.out.println(Integer.toString((b & 0xff),16));
> Boiled down, the code that produced the result follows.
> I have no idea how it works, except that it seems to produce the desired
[quoted text clipped - 20 lines]
> }
> }
Roedy Green - 12 Jul 2007 02:55 GMT
On Wed, 11 Jul 2007 11:03:39 -0400, "Jeff Higgins"
<oohiggins@yahoo.com> wrote, quoted or indirectly quoted someone who
said :
>> Ok, I found out what the & 0xff is for, but mind explaining me why you
>> do + 0x100?
>
>Well, quite frankly because Roedy Green told me to. Or rather showed
>the technique \somewhere\ on his mindprod site. I can't find it now. :(
It is a trick for forcing lead zeroes.
see http://mindprod.com/jgloss/hex.html
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Jeff Higgins - 12 Jul 2007 03:17 GMT
> Jeff Higgins wrote: I can't find it now. :(
>
> It is a trick for forcing lead zeroes.
> see http://mindprod.com/jgloss/hex.html
thx
jh
Thomas Fritsch - 11 Jul 2007 18:13 GMT
Hendrik Maryns schrieb:
> Jeff Higgins schreef:
[...]
>> int[] intArray = {0x4f55};
>> byte[] byteArray = encode(intArray);
>> for(byte b : byteArray)
>> {
>> System.out.print(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
>> }
[...]
> Ok, I found out what the & 0xff is for, but mind explaining me why you
> do + 0x100?
I think it is for inserting the leading "0" for each byte less than
0x10, which would be missing otherwise.
For example: Suppose b = 4
Then
Integer.toString((b & 0xff), 16) gives "4",
which is not what you want. You want "04".
The missing leading "0" is produced by the tricky +0x100 and substring(1)
Integer.toString((b & 0xff) + 0x100, 16) gives "104"
Integer.toString((b & 0xff) + 0x100, 16).substring(1) gives "04"
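To try the three variants side by side, something like the following should do. The class and method names are invented for this sketch. The last method shows why the inner parentheses matter: + binds tighter than &, so without them the mask becomes 0x1ff and small values lose their padding.

```java
class HexPad {
    // No padding: bytes below 0x10 come out as a single digit.
    static String naive(byte b) {
        return Integer.toString(b & 0xff, 16);
    }

    // Adding 0x100 forces three digits; substring(1) drops the leading "1".
    static String padded(byte b) {
        return Integer.toString((b & 0xff) + 0x100, 16).substring(1);
    }

    // Parsed as b & (0xff + 0x100), that is b & 0x1ff: no padding for small b.
    static String unparenthesized(byte b) {
        return Integer.toString(b & 0xff + 0x100, 16);
    }

    public static void main(String[] args) {
        byte b = 4;
        System.out.println(naive(b));            // 4
        System.out.println(padded(b));           // 04
        System.out.println(unparenthesized(b));  // 4
    }
}
```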

Thomas