Java Forum / GUI / May 2005

Charset problem when converting from UTF-8 on OS X

al schmid - 19 May 2005 11:00 GMT
Hello gurus

I read an XML doc over an internet connection. The XML comes encoded in
UTF-8, and with the values I do the following:

private String fromUTF8(String str) {
    try {
        return new String(str.getBytes("ISO8859_1"), "UTF-8");
    } catch (Exception e) {
        return str;
    }
}

The problem is that when the app is run on OS X, all the special chars
of the German and French languages (e.g. é, ö, ô) get replaced by wrong
characters (mostly "??").
How can I make the app display the right chars (the umlauts etc.)?
What charset do I need to convert the UTF-8 Strings to? I tried leaving
them in UTF-8 and converting to UTF-16; neither worked.

Thanks a lot!
al
A. Bolmarcich - 19 May 2005 15:32 GMT
> Hello gurus
>
[quoted text clipped - 9 lines]
>
> }

How do you convert the bytes sent over the internet connection to a
String passed as an argument to fromUTF8?

That conversion may be using the default encoding, which does not
necessarily convert every possible byte value to a char value.
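
A minimal sketch (not from the thread) of what a lossy default conversion looks like. It assumes modern Java (java.nio.charset.StandardCharsets) and uses US-ASCII only as a stand-in for a platform default that has no character for some byte values; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetDemo {
    // Decodes raw bytes with the given charset, the way a Reader would.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Mirrors the poster's fromUTF8: Latin-1 bytes reinterpreted as UTF-8.
    static String fromUTF8(String str) {
        return new String(str.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // This is the charset a Reader uses when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());

        // US-ASCII has no character for bytes >= 0x80, so each byte is
        // replaced by U+FFFD and the original byte value is lost.
        String lossy = decode(utf8, StandardCharsets.US_ASCII);

        // U+FFFD cannot be encoded as ISO-8859-1, so getBytes emits '?'.
        System.out.println(fromUTF8(lossy));
    }
}
```

Once the decoder has thrown away the byte values, no later conversion can bring them back, which matches the "??" symptom described in the question.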
al schmid - 20 May 2005 08:31 GMT
I use the Element.getText() method of dom4j.

(...)
y = curUserElement.elementIterator("user_mid");
curUserMidElement = (Element) y.next();
fromUTF8(curUserUidElement.getText());

Why would it work on Windows but not on OS X?
thanks!
al
A. Bolmarcich - 23 May 2005 21:47 GMT
> I use the Element.getText() method of dom4j.
>
> (...)
> y = curUserElement.elementIterator("user_mid");
> curUserMidElement = (Element) y.next();
> fromUTF8(curUserUidElement.getText());

That does not fully answer my question of how do you convert the bytes
sent over the internet connection to a String passed as an argument to
fromUTF8.  In your original posting you wrote: "I read an XML doc over
an internet connection."

Although you described how you get a String from a DOM document, you
have not described how you get the bytes sent over the internet
connection into the DOM document.

> Why would it work on Windows but not on OS X?

Chances are there is a default byte-to-character conversion being done
that does not have a character for all the byte values sent over the
internet connection.  Without knowing the details of this conversion,
I can't give a detailed answer of why the results are different under
Microsoft Windows and Apple OS X.

Also, the expression

 new String(str.getBytes("ISO8859_1"), "UTF-8")

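in your fromUTF8 function will return a useful value only if each char
in str holds exactly one byte of the original UTF-8 data (as happens
when the bytes were decoded as ISO8859_1) or if the characters in str
are all US-ASCII.  Not every sequence of ISO8859_1 bytes is a valid
sequence of UTF-8 bytes.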
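
A small sketch (not from the thread) of when that expression does and does not recover the text. It assumes modern Java; the class name and sample strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Same expression as in fromUTF8, with Charset constants instead of names.
    static String roundTrip(String str) {
        return new String(str.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Case 1: plain US-ASCII survives unchanged.
        System.out.println(roundTrip("abc"));

        // Case 2: the UTF-8 bytes of "é" (0xC3 0xA9) were decoded one byte
        // per char as ISO-8859-1, giving U+00C3 U+00A9. The round trip
        // turns those two chars back into the single char U+00E9.
        System.out.println(roundTrip("\u00c3\u00a9").equals("\u00e9"));

        // Case 3: a String that already holds a correct "é" becomes the
        // single byte 0xE9, which is not valid UTF-8 on its own, so the
        // result is no longer "é".
        System.out.println(roundTrip("\u00e9").equals("\u00e9"));
    }
}
```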

> thanks!
> al
al schmid - 25 May 2005 16:20 GMT
Sorry, it seems I didn't understand you.

Okay. On the server side I have a PHP script which encodes the XML via
utf8_encode($str) before echoing it.

On the client side I fetch the XML with org.apache.commons.httpclient.
XMLString is the string which is afterwards converted into a
dom4j Document (Document doc =
DocumentHelper.parseText(this.XMLString);)

String line = "";
HttpClient client = new HttpClient();
GetMethod method = new GetMethod(this.url);
client.executeMethod(method);
currentProgress = 20;

BufferedReader br = new BufferedReader(
        new InputStreamReader(method.getResponseBodyAsStream()));

do {
    line = br.readLine();
    this.XMLString += line;
} while (line != null);

method.releaseConnection();
A. Bolmarcich - 25 May 2005 19:25 GMT
> Sorry, it seems I didn't understand you.
>
[quoted text clipped - 14 lines]
>             BufferedReader br = new BufferedReader(new
> InputStreamReader(method.getResponseBodyAsStream()));

This answers my question.  This InputStreamReader constructor uses the
default byte to character encoding.  You can use

    BufferedReader br = new BufferedReader(
            new InputStreamReader(method.getResponseBodyAsStream(), "ISO8859_1"));

and the rest of your program should work as-is.
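
A self-contained sketch (not from the thread) of the explicit-charset approach. A ByteArrayInputStream stands in for method.getResponseBodyAsStream(), the class and helper names are made up, and modern Java is assumed. It also shows a second option that the reply does not mention: decoding as UTF-8 directly, in which case the fromUTF8 step is not needed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // Reads the whole stream as text with the given charset,
    // never with the platform default.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            // Check for null before appending, so the text "null"
            // is never added at end of stream.
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // The poster's helper: Latin-1 chars reinterpreted as UTF-8 bytes.
    static String fromUTF8(String str) {
        return new String(str.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical response body, UTF-8 encoded as the PHP script would send it.
        byte[] body = "<user>Z\u00fcrich</user>".getBytes(StandardCharsets.UTF_8);

        // Option A (the reply's suggestion): ISO-8859-1 keeps every byte, fromUTF8 repairs it.
        String viaLatin1 = fromUTF8(readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1));

        // Option B: decode as UTF-8 up front and skip fromUTF8.
        String direct = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);

        System.out.println(viaLatin1.equals(direct));
    }
}
```

Option A works because ISO-8859-1 assigns a character to every byte value, so nothing is lost before fromUTF8 runs.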

>             do  {
>                 line = br.readLine();
>                 this.XMLString += line;
>             } while (line != null);

Instead of reading the entire response from the server into a String, it
would be better to have a DOM parser parse the InputStream rather than a
String that you build from the InputStream.
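
A rough sketch (not from the thread) of handing the InputStream straight to a parser. To stay self-contained it uses the JDK's built-in javax.xml.parsers DOM parser rather than dom4j, and a ByteArrayInputStream stands in for the HTTP response stream; the class name, method name and sample document are made up. With dom4j, SAXReader.read(InputStream) plays the same role.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class ParseStreamDemo {
    // Parses XML from raw bytes; the parser picks the encoding from the
    // XML declaration (or defaults to UTF-8), not from the platform default.
    static String rootText(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical server response, UTF-8 encoded.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<user_mid>Gr\u00f6\u00dfe</user_mid>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println(rootText(new ByteArrayInputStream(body)));
    }
}
```

Because the parser sees the bytes themselves, no intermediate String, Reader charset or fromUTF8 call is involved.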
al schmid - 27 May 2005 09:05 GMT
Now it works. Thanks a lot for your help!
al

