Java Forum / General / March 2006
platform's default charset ?
gk - 30 Jan 2006 10:14 GMT
What is the platform's default charset?
String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
try {
    byte[] utf8Bytes = original.getBytes("UTF8");
    byte[] defaultBytes = original.getBytes();
    String roundTrip = new String(utf8Bytes, "UTF8");
    String defaultTrip = new String(defaultBytes);
    System.out.println("roundTrip = " + roundTrip); // output-1
    System.out.println("defaultTrip = " + defaultTrip); // output-2
} catch (java.io.UnsupportedEncodingException e) {
    // Needed because getBytes(String) and new String(byte[], String) declare this checked exception.
    e.printStackTrace();
}
QUESTION :
Why are output-1 and output-2 the same?
REASON OF THIS QUESTION :
String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
This is a Unicode string, and it looks like "AêñüC".
How can the second output (output-2) be the same as output-1?
output-2 has been encoded and decoded using the "platform's default charset", because I used
byte[] defaultBytes = original.getBytes();
and
String defaultTrip = new String(defaultBytes);
for the output-2
(My system is Windows XP.) So how can that produce the same output as output-1, which uses UTF-8?
Do you mean that Windows XP supports UTF-8, so it picks UTF-8 by default?
On which platform would these two outputs, output-1 and output-2, not be the same?
Is it Linux? Solaris? Where would the two outputs differ?
thank you
Thomas Weidenfeller - 30 Jan 2006 12:25 GMT
> what is platform's default charset ?
Charset.defaultCharset()
> How could the second output output-2 produces the same output as
> output-1 ?
Why do you think they should be different at all? You start with the same Unicode string. Then you convert it into two (possibly different) byte representations. Then you convert the byte representations with the correct *matching reverse operation* back to two Unicode strings.
The version where you use the UTF-8 byte encoding can't fail. It is made to represent Unicode characters, and you provide Unicode characters for a start. From Java's point of view it is even a very trivial operation, since the VM uses a modified UTF-8 encoding internally, so there isn't much to do when converting to a UTF-8 byte sequence.
The only way the version which uses the platform's default encoding could fail would be if the platform's encoding could not represent a particular character in a platform-specific byte sequence. In that case you wouldn't get a full round trip conversion for such characters. This is, however, very unlikely, since you chose Unicode characters which are all well within the Latin 1 range. Latin 1 is the second most common character encoding after seven-bit ASCII, and many character encodings encompass Latin 1 in one way or another (the first 256 Unicode characters are actually the Latin 1 characters).
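A minimal sketch of the failure case described above, assuming Java 7 or later for StandardCharsets (which postdates this thread). US-ASCII stands in for a default charset that cannot represent the accented characters; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class RoundTripDemo {
    // Encode with the given charset, then decode with the same charset.
    static String roundTrip(String s, Charset cs) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's replacement bytes instead of throwing.
        byte[] bytes = s.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String original = "A\u00ea\u00f1\u00fcC";
        // UTF-8 and Latin 1 can both represent every character here.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));      // true
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1).equals(original)); // true
        // US-ASCII cannot, so the three accented characters are lost.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII));                    // A???C
    }
}
```

The point is only that a round trip is lossless when the charset can represent every character, and lossy otherwise.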
/Thomas
 Signature The comp.lang.java.gui FAQ: ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq http://www.uni-giessen.de/faq/archiv/computer-lang.java.gui.faq/
opalpa@gmail.com opalinski from opalpaweb - 30 Jan 2006 13:23 GMT
"The Java 2 platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes" (http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html)
> From Java's point of view it is even a very trivial operation,
> since the VM uses a modified UTF-8 encoding internally
When one talks about Java using a modified UTF-8, it normally refers to Java encoding UTF-8 a little differently than most implementations. http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8_in_Java
Java uses UTF-16 interanally.
Opalinski opalpa@gmail.com http://www.geocities.com/opalpaweb/
Alex Buell - 30 Jan 2006 13:25 GMT
On 30 Jan 2006 05:23:00 -0800 "opalpa@gmail.com opalinski from opalpaweb" <opalpa@gmail.com> waved a wand and this message magically appeared:
> Java uses UTF-16 interanally.
"inter-anally"? Teehee.
 Signature http://www.munted.org.uk
"Honestly, what can I possibly say to get you into my bed?" - Anon.
Roedy Green - 30 Jan 2006 14:44 GMT
On 30 Jan 2006 05:23:00 -0800, "opalpa@gmail.com opalinski from opalpaweb" <opalpa@gmail.com> wrote, quoted or indirectly quoted someone who said :
>Java uses UTF-16 interanally.
Was that a typo or a Freudian slip or a slur?
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
opalpa@gmail.com opalinski from opalpaweb - 30 Jan 2006 20:52 GMT
me> Java uses UTF-16 interanally.
Alex> "inter-anally"? Teehee.
Roedy> was that a typo or a Freudian slip or a slur?
Too many message windows to too many sexpartners. All this simultanallity; poor linear mind gets vexed.
Lol.
Cheers.
Chris Uppal - 30 Jan 2006 14:44 GMT
> the VM uses a modified UTF-8 encoding internally, so there isn't
> much to do when converting to a UTF-8 byte sequence.
This is almost certainly untrue for any given JVM. It's true that some of the /external interfaces/ to the JVM, notably JNI and the classfile format, do use the modified version of UTF-8, but that in no way constrains, or (probably) reflects, the internal representation of Java Strings.
If we are talking about the Sun implementations, then Strings are represented (quite explicitly at Java level) as char[] arrays which hold Unicode data represented as UTF-16 sequences of 16-bit integers. Of course, there might be other versions of the platform which have different implementations of String. I suppose it's not impossible that one of them could use byte[] arrays in not-actually-UTF-8 format, but I find it hard to imagine a convincing motivation.
BTW, converting Sun's bastardised imitation of UTF-8 into real UTF-8 is /not/ trivial. Converting not-actually-UTF-8 into UTF-8 involves (logically) the same steps as converting not-actually-UTF-8 to UTF-16, decoding that to Unicode, and finally encoding that as UTF-8.
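To make the difference concrete, here is a small sketch comparing real UTF-8 with the modified form that DataOutputStream.writeUTF produces. It assumes Java 7 or later (StandardCharsets, try-with-resources); the class and helper names are invented for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class ModifiedUtf8Demo {
    // Bytes written by DataOutputStream.writeUTF, minus its 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Space-separated upper-case hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = "\0";               // U+0000
        String astral = "\uD83D\uDE00";  // U+1F600, a surrogate pair in UTF-16

        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));    // 00
        System.out.println(hex(modifiedUtf8(nul)));                       // C0 80

        System.out.println(hex(astral.getBytes(StandardCharsets.UTF_8))); // F0 9F 98 80
        System.out.println(hex(modifiedUtf8(astral)));                    // ED A0 BD ED B8 80
    }
}
```

The two encodings agree for most text, but U+0000 and characters outside the Basic Multilingual Plane come out differently, which is why converting between them is not just a byte copy.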
-- chris
Roedy Green - 30 Jan 2006 21:25 GMT
On Mon, 30 Jan 2006 14:45:13 -0000, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :
> Of course, there might be
>other versions of the platform which have different implementations of String.
>I suppose it's not impossible that one of them could use byte[] arrays in
>not-actually-UTF-8 format, but I find it hard to imagine a convincing
>motivation.
To index and process strings you need them in 16 bit form. However, for storage of strings not actively being processed I could imagine some sort of caching scheme that converts them to UTF-8 for more compact storage. All string handling functions would have to be aware of the two formats and automatically unpack Strings when accessed for anything other than referencing the string as a whole.
gk - 31 Jan 2006 05:32 GMT
> > what is platform's default charset ?
>
> Charset.defaultCharset()
This does not exist.
Look here:
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html
Chris Uppal - 31 Jan 2006 10:48 GMT
> > Charset.defaultCharset()
> this does not exists .
It's new in 1.5.
-- chris
Roedy Green - 31 Jan 2006 13:45 GMT
On Tue, 31 Jan 2006 10:48:17 -0000, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :
>> > Charset.defaultCharset()
>> this does not exists .
>
>It's new in 1.5.
Prior to that you had to look at a System property. It might even have been restricted to signed applets. See http://mindprod.com/jgloss/encoding.html I should have it all documented there.
Piotr Kobzda - 13 Mar 2006 14:57 GMT
> On Tue, 31 Jan 2006 10:48:17 -0000, "Chris Uppal"
> <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly
[quoted text clipped - 10 lines]
> http://mindprod.com/jgloss/encoding.html I should have it all
> documented there.
A less restrictive alternative to querying System properties is:
String defaultEncodingName = new java.io.OutputStreamWriter(System.out).getEncoding();
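For comparison, the three lookups mentioned in this thread can be put side by side. This is only a sketch with invented class and method names; Charset.defaultCharset() needs Java 1.5 or later, and OutputStreamWriter.getEncoding() may report a historical name such as "Cp1252" or "UTF8" rather than the canonical one.

```java
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of any library.
class DefaultCharsetDemo {
    // Java 1.5+: ask the Charset API directly.
    static String viaCharsetApi() {
        return Charset.defaultCharset().name();
    }

    // Older approach: read the system property (may be blocked in a sandbox).
    static String viaSystemProperty() {
        return System.getProperty("file.encoding");
    }

    // A writer created without an explicit encoding uses the default charset.
    // It is deliberately not closed, since that would close System.out.
    static String viaWriter() {
        return new OutputStreamWriter(System.out).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset(): " + viaCharsetApi());
        System.out.println("file.encoding property:   " + viaSystemProperty());
        System.out.println("OutputStreamWriter:       " + viaWriter());
    }
}
```

The printed names depend on the machine and JVM, so no expected output is given.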
Regards, piotr
gk - 31 Jan 2006 05:45 GMT
> The only way the version which uses the platform's default encoding
> could fail would be if the platform's encoding could not represent a
[quoted text clipped - 5 lines]
> encompass Latin 1 in one way or the other (the first 256 Unicode
> characters are actually the Latin 1 characters).
I'm a bit confused.
Do you mean that the default character set for every platform is "Unicode"?
I ask because the doc says:
String(byte[] bytes) Constructs a new String by decoding the specified array of bytes using the platform's default charset.
So when I do the reverse conversion without specifying an encoding, the default charset is used, and it may produce different strings on different platforms.
Do you mean that all platforms use UTF-8 as their default character set?
Do you mean that when I called String defaultTrip = new String(defaultBytes); UTF-8 was used? How could that be possible? Linux may use one encoding as its default and Solaris another, so this would produce different strings. Even if those platforms support UTF-8, how would UTF-8 be chosen by default when I have not named it in the constructor? So they are bound to produce different results, aren't they?
I don't have other platforms, so I can't test this elsewhere.
I only tried it on Windows XP.
It is still confusing.
Please explain.
And who knows what the default charset of other platforms is? So this might produce different strings there.
gk - 31 Jan 2006 05:56 GMT
I discovered this:
import java.nio.charset.Charset;

class StringTest {
    public static void main(String[] args) {
        String defaultEncodingName = System.getProperty("file.encoding");
        System.out.println(defaultEncodingName);
    }
}
output:
=====
Cp1252
SO, my platform supports only Cp1252 encoding.
According to DOC >>
byte[] getBytes() Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
AND
String(byte[] bytes) Constructs a new String by decoding the specified array of bytes using the platform's default charset.
and according to my code here,
byte[] defaultBytes = original.getBytes();
String defaultTrip = new String(defaultBytes);
they should work with the platform's default charset, which is "Cp1252" (my discovery).
Note that this is not Unicode!
But when I printed
System.out.println("defaultTrip = " + defaultTrip);
it prints Unicode! Shouldn't this have printed some other odd-looking string?
Roedy Green - 31 Jan 2006 13:46 GMT
>SO, my platform supports only Cp1252 encoding.
Unless you specifically ask for something else. That is just the default for Readers/Writers and String <=> byte[] conversion.
See http://mindprod.com/jgloss/encoding.html
Roedy Green - 31 Jan 2006 20:01 GMT
On Tue, 31 Jan 2006 13:46:47 GMT, Roedy Green <my_email_is_posted_on_my_website@munged.invalid> wrote, quoted or indirectly quoted someone who said :
>>SO, my platform supports only Cp1252 encoding.
>
>unless you specifically ask for something else. That is just the
>default for Readers/Writer and String <=> byte[] conversion.
>
>See http://mindprod.com/jgloss/encoding.html
See http://mindprod.com/jgloss/fileio.html for how to specify a different encoding for Reader/Writer.
See http://mindprod.com/jgloss/conversion for how to specify a different one for String <=> byte[] conversion.
gk - 01 Feb 2006 06:49 GMT
Here are some points I have noted from your comments:
1) Java strings are simply chars; we can think of them as Unicode chars.
So String str = "one big string" is a bunch of Unicode chars.
2) No encoding is involved when we talk about Strings. Encoding comes into the picture when we do the String <=> byte[] conversion.
3) We can use any encoding to encode this bunch of Unicode chars into a byte[] array. If that encoding recognises these Unicode chars, then we are safe, because when we convert back there will be no loss of data.
4) It is always suggested to use UTF-8 when converting to byte[] and back.
BUT, I am not comfortable when I run this code of Roedy Green's (http://mindprod.com/jgloss/conversion):
String s = "abc";
// string -> byte[]
byte[] b = s.getBytes( "8859_1" /* encoding */ );
// byte[] -> String
String t = new String( b, "Cp1252" /* encoding */ );
This code prints t="abc"!
See, we encoded the string with "8859_1" and decoded it with "Cp1252", and we got the original string back.
I also tried:
String s = "abc";
// string -> byte[]
byte[] b = s.getBytes( "windows-1250" /* encoding */ );
// byte[] -> String
String t = new String( b, "Cp1252" /* encoding */ );
System.out.println(t);
Again I got t="abc".
There is no loss of data.
So this means each encoding recognises the other encodings, and that is why they are able to convert back.
But this is not good. It is not expected that one encoding would be recognised by another! If that happens, anybody can hack any binary document written in an unknown encoding. The thief need not know whether the owner encoded the file in UTF-8, "8859_1", "Cp1252", "windows-1250", and so on, because the thief knows the encodings are brothers and recognise each other, so he could decode with any of them.
P.S.: MIND IT..... I am talking about cryptography, but here in this example we are losing the meaning of the word "encoding".
gk - 01 Feb 2006 06:55 GMT
Sorry, I meant that I am NOT talking about cryptography and the different versions of encoding.
I am talking about these simple charset encodings.
Roedy Green - 01 Feb 2006 10:08 GMT
>sorry, i meant ...i am NOT talking abot Cryptrography and the
>different versions of encoding.
>
>i am talking about these simple charset encoding .
So am I.
Chris Uppal - 01 Feb 2006 08:11 GMT
> so, this means, each encoding recognises other encoding.....and thats
> why they are able to revert back.
Not quite. Your argument is sensible but what you don't (yet ;-) know is that all or nearly all character encodings overlap for a certain range of characters. Specifically, the printable ASCII characters have the same numerical values in CP1252, ISO8859-1, and nearly all other character encodings (including ASCII). What's more the Unicode assigned code-points (numbers to you and me) for those characters are the same too.
So the String "abc" contains the chars with numerical values 0x61 0x62 0x63. If we translate that to bytes using ISO8859-1 then we will get bytes with values 0x61 0x62 0x63. But don't let that mislead you: outside that limited range (essentially the printable characters in the range 32-126) things become very different.
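The overlap, and what happens outside it, can be seen by dumping the bytes. This is a sketch only, using the three charsets every JVM is required to support (the windows-1250 and Cp1252 charsets from the earlier post agree with them for "abc", but are not guaranteed to be present everywhere). It assumes Java 7 or later for StandardCharsets, and the class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class OverlapDemo {
    // Space-separated upper-case hex dump of the encoded form of s.
    static String hexOf(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Encode with one charset and (wrongly) decode with another.
    static String crossDecode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Printable ASCII: identical bytes in all three encodings.
        System.out.println(hexOf("abc", StandardCharsets.US_ASCII));   // 61 62 63
        System.out.println(hexOf("abc", StandardCharsets.ISO_8859_1)); // 61 62 63
        System.out.println(hexOf("abc", StandardCharsets.UTF_8));      // 61 62 63

        // One accented character (e with circumflex, U+00EA): the encodings diverge.
        System.out.println(hexOf("\u00ea", StandardCharsets.ISO_8859_1)); // EA
        System.out.println(hexOf("\u00ea", StandardCharsets.UTF_8));      // C3 AA

        // Mixing the two up yields two wrong characters instead of one right one.
        String garbled = crossDecode("\u00ea", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2
    }
}
```

So mismatched encodings only appear to "recognise each other" as long as the text stays inside the shared ASCII range.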
In a way that overlap is very handy. It means that if someone sends me an old-fashioned, 8-bit, text file (not Unicode) written in English then the chances are that I'll be able to read it without me having to try to find out what codepage the author used to create it. Which is a good thing because (a) there's a good chance that the author hasn't got the faintest idea what a code-page /is/ let alone which one s/he used to create the file, and (b) I don't want to mess around trying to change code-page. Unfortunately, that only works for text using the restricted range of characters. As soon as you start using accented characters, or characters from non-English orthographies, the whole thing breaks down and life becomes very awkward. Which is what Unicode is /intended/ to avoid.
But in a way, it's a very Bad Thing too. Because of the overlap, it's very hard (at least for people handling mostly English text) to see when they've made a mistake with their programming. Or when they've carelessly, or sloppily, made assumptions about the code-page in use. It would be nice to have (perhaps as part of the standard JDK) a debugging Charset which mapped Unicode data to some sort of recognisable gibberish -- case-inverted or even "rot13" would do. For all I know, there could be one there already, and I've missed it...
-- chris
Thomas Hawtin - 01 Feb 2006 09:45 GMT
> But in a way, it's a very Bad Thing too. Because of the overlap, it's very
> hard (at least for people handling mostly English text) to see when they've
[quoted text clipped - 4 lines]
> "rot13" would do. For all I know, there could be one there already, and I've
> missed it...
UTF-16LE should more or less fit the bill. Perhaps UTF-16BE would work better with single characters (not entirely sure what happens with a single byte), although it is more common.
export LANG=tr_TR,UTF-16LE
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
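A quick sketch of why UTF-16LE makes a misused charset visible even for plain English text: every ASCII character picks up a zero byte, so decoding the bytes with an 8-bit charset no longer gives back the original. It uses explicit charsets rather than changing the platform default, assumes Java 7 or later for StandardCharsets, and the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class Utf16DebugDemo {
    // Space-separated upper-case hex dump of s encoded as UTF-16LE.
    static String utf16leHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_16LE)) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Encode as UTF-16LE, then decode as if the bytes were Latin 1.
    static String misreadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_16LE), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf16leHex("abc"));               // 61 00 62 00 63 00
        System.out.println(misreadAsLatin1("abc").length()); // 6
        System.out.println(misreadAsLatin1("abc").equals("abc")); // false
    }
}
```

The misread string contains a NUL character after each letter, which is easy to spot in a hex dump or a length check.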
Chris Uppal - 01 Feb 2006 11:44 GMT
[me:]
> > It would be nice to have (perhaps as part of the standard JDK) a
> > debugging Charset which mapped Unicode data to some sort of
[quoted text clipped - 3 lines]
> UTF-16LE should more or less fit the bill. [...]
> export LANG=tr_TR,UTF-16LE
That's a thought. Not too sure about those NUL bytes though (haven't tried it yet).
BTW, for anyone who's interested, I rummaged around the Web a little and found a rot13 Charset, and the corresponding CharsetProvider, at the website for Ron Hitchens's "Java NIO" book (which I haven't read). The website is
http://www.javanio.info/
the code (which is /not/ free for commercial use) is in:
filearea/bookexamples/unpacked/com/ronsoft/books/nio/charset
under the above root. See the files:
RonsoftCharsetProvider.java
Rot13Charset.java
The first of those files provides sketchy instructions for installing the new Charset; note that the instructions contain a typo; the filename
META-INF/services/java.nio.charsets.spi.CharsetProvider
should be
META-INF/services/java.nio.charset.spi.CharsetProvider
(no 's' on the end of charset).
-- chris
Roedy Green - 01 Feb 2006 12:39 GMT
On Wed, 1 Feb 2006 11:46:07 -0000, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :
>BTW, for anyone who's interested, I rummaged around the Web a little and found
>a rot13 Charset, and the corresponding CharsetProvider, at the website for Ron
>Hitchens's "Java NIO" book (which I haven't read). The website is
> http://www.javanio.info/
If you feel up to rolling your own, the instructions for how to do it are at http://mindprod.com/jgloss/encoding.html#ROLLYOUROWN
It is a bunch of mindless housekeeping BS plus writing a decodeLoop and encodeLoop method to interconvert byte[] <=> char[]
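As a rough idea of what that housekeeping looks like, here is a self-contained sketch of a rot13 debugging Charset. It is not the code from the book or from the page linked above; the class name and the charset name "X-ROT13-DEBUG" are made up, it handles ASCII only, and it is used by passing the Charset object directly (registering it through a CharsetProvider is not shown). It assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical debugging charset: ASCII with the letters rotated by 13 places.
final class Rot13DebugCharset extends Charset {
    Rot13DebugCharset() {
        super("X-ROT13-DEBUG", null); // made-up name, no aliases
    }

    // rot13 for ASCII letters; every other value passes through unchanged.
    static int rot13(int c) {
        if (c >= 'a' && c <= 'z') return 'a' + (c - 'a' + 13) % 26;
        if (c >= 'A' && c <= 'Z') return 'A' + (c - 'A' + 13) % 26;
        return c;
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof Rot13DebugCharset || "US-ASCII".equals(cs.name());
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    int b = in.get(in.position()) & 0xFF; // peek without consuming
                    if (b > 0x7F) return CoderResult.malformedForLength(1);
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    in.get(); // consume the byte
                    out.put((char) rot13(b));
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public CharsetEncoder newEncoder() {
        return new CharsetEncoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                while (in.hasRemaining()) {
                    char c = in.get(in.position()); // peek without consuming
                    if (c > 0x7F) return CoderResult.unmappableForLength(1);
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    in.get(); // consume the char
                    out.put((byte) rot13(c));
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    // What the encoded bytes look like when read back as plain ASCII.
    static String encodedAsAscii(String s) {
        return new String(s.getBytes(new Rot13DebugCharset()), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encodedAsAscii("Hello, charset!")); // Uryyb, punefrg!
    }
}
```

Because the mapping is its own inverse, decoding the scrambled bytes with the same charset restores the text, while any code path that quietly uses a different charset shows up as gibberish.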
Roedy Green - 01 Feb 2006 10:16 GMT
On Wed, 1 Feb 2006 08:11:56 -0000, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :
> Which is a good thing because (a)
>there's a good chance that the author hasn't got the faintest idea what a
>code-page /is/ let alone which one s/he used to create the file, and (b) I
>don't want to mess around trying to change code-page.
And the encoding used is NOT embedded at the head of the document the way you might imagine it would be handled. The receiver just has to KNOW what encoding it is.
This reminds me back in the early 80s I wrote one of the first electronic medical billing programs for doctors for whom this was a complete novelty and status symbol. On a demo, one doctor was horrified, "You mean you have to TYPE; it doesn't just KNOW?"
Another doctor was furious at my incompetence when he discovered that he would lose keying when he rebooted his machine in the middle of data entry. I tried to explain that he should not reboot. There was no need to. He replied that he simply LIKED rebooting and he was not about to change his nervous habit.
Roedy Green - 01 Feb 2006 10:16 GMT
On Wed, 1 Feb 2006 08:11:56 -0000, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :
>It would be nice to
>have (perhaps as part of the standard JDK) a debugging Charset which mapped
>Unicode data to some sort of recognisable gibberish -- case-inverted or even
>"rot13" would do. For all I know, there could be one there already, and I've
>missed it...
What do you do with this?
Chris Uppal - 01 Feb 2006 14:05 GMT
[me:]
> > It would be nice to
> > have (perhaps as part of the standard JDK) a debugging Charset
> > which mapped Unicode data to some sort of recognisable gibberish --
> > case-inverted or even "rot13" would do.[...]
>
> what do you do with this?
The problem for me, and I think for other programmers, is that you can't /see/ when something is happening using the wrong Charset. Since I'm only an English speaker, the only sample text I can read uses English characters throughout, and so if I use a wrong Charset there won't be any obvious differences (as "gk" found). So I'd like to be able to either set the default Charset to something that is instantly recognisable if it gets used when I'm not expecting it, or explicitly use my debugging charset, so that I can follow the data through and see that it is used everywhere that I intend.
Just a debugging aid. I'd have little use for it if I were -- say -- Korean.
It would probably be helpful as a teaching tool too (although I am not a teacher), since it would emphasise the difference between the character sequences in String (or similar) and the byte sequences produced by encoding -- differences that can be lost on those whose native language is ASCII-compatible.
-- chris
Roedy Green - 01 Feb 2006 21:22 GMT
On 01 Feb 2006 14:05:25 GMT, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :
>The problem for me, and I think for other programmers, is that you
>can't /see/ when something is happening using the wrong Charset. Since
[quoted text clipped - 5 lines]
>use my debugging charset, so that I can follow the data through and see
>that it is used everywhere that I intend.
A very simple one might convert char 's' -> byte 'f', or you could simply write one that implements some ligatures, see http://mindprod.com/jgloss/ligature.html, to give an early American look to the page.
It then becomes a fully legit Charset you might use in real life. It can piggy back on any other charset adding ligaturisation to it.
See http://mindprod.com/encoding.html#ROLLYOUROWN
for how to proceed. Even a newbie could tackle this one.
ozgwei - 03 Feb 2006 12:36 GMT
> The problem for me, and I think for other programmers, is that you
> can't /see/ when something is happening using the wrong Charset. Since
[quoted text clipped - 14 lines]
> produced by encoding -- differences that can be lost on those who's
> native language is ASCII-compatible.
Have you tried EBCDIC? The encoding name is Cp1047. But I don't know whether it is available in JVMs other than IBM's...
Roedy Green - 03 Feb 2006 22:46 GMT
>Have you tried EBCDIC? The encoding name is Cp1047. But I don't know
>whether it is available in JVMs other than IBM's...
There are scores of national variants for EBCDIC.
Check out my chart at http://mindprod.com/jgloss/encoding.html for which ones are supported.
Chris Uppal - 04 Feb 2006 12:55 GMT
[me:]
> > The problem for me, and I think for other programmers, is that you
> > can't see when something is happening using the wrong Charset. [...]
>
> Have you tried EBCDIC? The encoding name is Cp1047. But I don't know
> whether it is available in JVMs other than IBM's...
Thanks for the suggestion.
java -Dfile.encoding=Cp1047 my.test.Application
produces satisfying gibberish ;-)
(Actually it's probably /too/ gibberishish, Thomas's suggested UTF16 works a little better.)
-- chris
Chris Uppal - 31 Jan 2006 11:04 GMT
> bit confused.
I'm not certain, but I /think/ that you might be misunderstanding the relationship between Strings and Charsets.
A String has /no/ Charset, and is not associated with any particular byte encoding. (Technically this is only true if you are using the right APIs, but it is close enough to being true to be a good approximation to start from[*]). That's to say a String contains pure Unicode data, not in any encoding, just pure characters. (Compare the way that an int contains pure integer data, separate from any encoding as big-endian or little-endian, or anything else). A Charset is only involved when you need to convert a String to bytes (or the other way around) in order to communicate with external systems or save the data to file.
So, in your original example, after
String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
you have a String, original, which contains pure Unicode.
If you now do:
byte[] utf8Bytes = original.getBytes("UTF8");
then you have the original data encoded as UTF-8. And later:
String roundTrip = new String(utf8Bytes, "UTF8");
gives you a new String containing pure Unicode data, assembled by decoding the UTF-8 bytes. Since UTF-8 is (by design) capable of encoding any Unicode data, no information will have been lost, and roundTrip will be the same as original.
When you do the same using the platform-default Charset:
byte[] defaultBytes = original.getBytes();
String defaultTrip = new String(defaultBytes);
the only thing that is different is that you are using a different Charset. So, if that Charset happens to be capable of encoding every character in the original String, no data will have been lost and defaultTrip will be the same as original. If you had used any Unicode characters in original which could /not/ be encoded in the platform default Charset then the round trip would have failed (defaultTrip would not equal original). Since the platform default Charset is machine-specific, that means that you don't really know what's going to happen when you convert Strings into byte[] arrays using it -- which is why using the platform default Charset is usually a bad idea.
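One way to find out in advance whether a given charset can represent a String is to ask its encoder. The following is a sketch under the assumption of Java 7 or later (for StandardCharsets); the class and method names are invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class CanEncodeDemo {
    // True if every character of s can be encoded in the given charset.
    static boolean canEncodeAll(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String original = "A\u00ea\u00f1\u00fcC";

        System.out.println(canEncodeAll(original, StandardCharsets.UTF_8));      // true
        System.out.println(canEncodeAll(original, StandardCharsets.ISO_8859_1)); // true
        System.out.println(canEncodeAll(original, StandardCharsets.US_ASCII));   // false

        // The answer for the platform default depends on the machine.
        Charset platformDefault = Charset.defaultCharset();
        System.out.println(platformDefault + ": " + canEncodeAll(original, platformDefault));
    }
}
```

Naming the charset explicitly in getBytes and in the String constructor sidesteps the question, since the result then no longer depends on the machine.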
But the important thing to realise is that Strings don't have Charsets. Charsets are only used when converting Strings to byte sequences.
-- chris
([*] We can talk more about that approximation, if you want, but it is best to get the current confusion cleared up first)
Roedy Green - 30 Jan 2006 12:53 GMT
>what is platform's default charset ?
See http://mindprod.com/jgloss/encoding.html for how to find out. Oddly it is a secret for unsigned Applets.
Roedy Green - 30 Jan 2006 12:54 GMT
> byte[] utf8Bytes = original.getBytes("UTF8");
> byte[] defaultBytes = original.getBytes();
> String roundTrip = new String(utf8Bytes, "UTF8");
> String defaultTrip = new String(defaultBytes);
Try dumping out the byte encodings. That will solve your mystery.
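A sketch of such a dump, with a made-up class name and assuming Java 7 or later for StandardCharsets. ISO-8859-1 is used as a stand-in for Cp1252 because it is guaranteed to be available and assigns the same byte values to these three accented characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class ByteDumpDemo {
    // Space-separated upper-case hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "A\u00ea\u00f1\u00fcC";

        // UTF-8 needs two bytes for each accented character.
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));      // 41 C3 AA C3 B1 C3 BC 43
        // Latin 1 needs one byte each.
        System.out.println(hex(original.getBytes(StandardCharsets.ISO_8859_1))); // 41 EA F1 FC 43
        // Platform default: the result depends on the machine.
        System.out.println(hex(original.getBytes()));
    }
}
```

The byte arrays differ, yet each one decodes back to the same String when the matching charset is used, which is why both outputs in the original program look identical.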