Java Forum / GUI / September 2005
UTF-16 encoded HTML
Stephan Zimmermann - 31 Aug 2005 14:16 GMT Hi all, I have an HTML file, encoded in UTF-16, which my web browsers display very well. But when I load the same URL into a JEditorPane, I only see some ugly boxes and other junk. <code> JEditorPane ep = new JEditorPane(fileURL); </code>
I also tried reading the content of the file through a FileInputStream, which gives me a nice view of the HTML code. <code> InputStreamReader in = new InputStreamReader(new FileInputStream(fileURL.getFile()), "UTF-16"); ep.read(in, "text/html; charset=UTF-16"); </code>
Is there a way to display a UTF-16 encoded HTML file? Or is there another (easy) way to show (formatted) information in multiple languages without using different encodings?
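One plausible explanation for the boxes, offered only as an assumption, is that somewhere along the way the UTF-16 bytes get decoded with a single-byte charset. The sketch below (class and method names are hypothetical) shows what that does to a small piece of HTML: decoding with the correct charset restores the text, while ISO-8859-1 turns the byte order mark and the zero high byte of every ASCII character into stray characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Mismatch {
    /** Encodes with Java's "UTF-16" charset, which writes a big-endian BOM (FE FF) first. */
    static byte[] encodeUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    /** Decodes bytes with the given charset, right or wrong. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "<p>\u65e5\u672c\u8a9e</p>"; // 10 chars: markup plus three kanji
        byte[] bytes = encodeUtf16(original);           // 2-byte BOM + 2 bytes per char = 22 bytes

        String right = decode(bytes, StandardCharsets.UTF_16);
        // A single-byte charset maps every byte to its own char, so the BOM and the
        // zero high byte in front of each ASCII character become visible junk
        String wrong = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length());         // 22
    }
}
```

If the text shown in the pane looks like the `wrong` string, the problem is decoding rather than HTML support.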
Andrew Thompson - 31 Aug 2005 14:38 GMT > I have a HTML file, encoded in UTF-16, which my webbrowsers can display very > well. Is it valid HTML? <http://validator.w3.org/>
> email-to: s.zimmermann@bm....onik.com BTW - this is not some sort of help desk, so if you could not be bothered coming back to the group to read replies, please don't bother posting in the first place.
 Signature Andrew Thompson physci.org 1point1c.org javasaver.com lensescapes.com athompson.info "If no one out there understands, start your own revolution and cut out the middle man." Billy Bragg 'Waiting For The Great Leap Forward'
Thomas Fritsch - 31 Aug 2005 14:40 GMT Stephan Zimmermann wrote:
> Hi all, > I have a HTML file, encoded in UTF-16, which my webbrowsers can display very [quoted text clipped - 15 lines] > (easy) way to show (formatted) Information in multiple languages without > using different encodings? Does your HTML file have a line <meta http-equiv="Content-Type" content="text/html; charset=UTF-16"> in its header?
 Signature "Thomas:Fritsch$ops:de".replace(':','.').replace('$','@')
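If the file is decoded by your own Reader, the meta charset line is a hint the parser no longer needs, and Swing's HTML parser may react to it by throwing a ChangedCharSetException (an IOException) instead of finishing the document. A minimal sketch, assuming the text has already been decoded: it parses into an HTMLEditorKit document with the document property "IgnoreCharsetDirective" set, which HTMLEditorKit consults before acting on a charset directive. The class name HtmlFromReader is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlFromReader {
    /** Parses already-decoded HTML into a Swing HTML document. */
    static Document parse(Reader in) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Reader has already turned bytes into chars, so skip any
        // <meta http-equiv="Content-Type" ... charset=...> directive
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(in, doc, 0);
        return doc;
    }

    public static void main(String[] args) throws IOException, BadLocationException {
        String html = "<html><head><meta http-equiv=\"Content-Type\""
                + " content=\"text/html; charset=UTF-16\"></head>"
                + "<body><p>\u65e5\u672c\u8a9e</p></body></html>";
        Document doc = parse(new StringReader(html));
        System.out.println(doc.getText(0, doc.getLength()).trim());
    }
}
```

For the file in question the Reader would be `new InputStreamReader(new FileInputStream(file), "UTF-16")`, and the resulting document could be shown with `ep.setContentType("text/html")` followed by `ep.setDocument(doc)`.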
Stephan Zimmermann - 31 Aug 2005 15:27 GMT > Does your HTML file have a line > <meta http-equiv="Content-Type" content="text/html; charset=UTF-16"> > in its header? Yes, it does (now), and it is also valid HTML (now), but the problem remains. Firefox and other browsers can display it, while my JEditorPane only shows rubbish.
Andrew Thompson - 31 Aug 2005 15:52 GMT >> Does your HTML file have a line >> <meta http-equiv="Content-Type" content="text/html; charset=UTF-16"> [quoted text clipped - 3 lines] > same. Firefox or s.th. can disply it, while my JEditorPane only shows some > rubbish. According to this, Unicode was not included until HTML 4.0. <http://www.w3.org/International/O-HTML-charset.html>
Since JEditorPane is stuck rendering HTML 3.2, it might mean it does not, and will never, render Unicode correctly.
BTW -
> Email-To: s.zimmermann@bmelektronik.com You seem to have either ignored my advice not to request follow-ups by email, or you are not reading the group.
Good luck.
 Signature Andrew Thompson physci.org 1point1c.org javasaver.com lensescapes.com athompson.info "A simple prop to occupy my time.." R.E.M. 'The One I Love'
Roedy Green - 01 Sep 2005 06:45 GMT >Since JEditorPane is stuck rendering HTML 3.2, it might mean >it does not, and will never, render Unicode correctly. Haven't they always worked with Unicode Strings, not encoded byte arrays?
You could help partition the problem by generating a bit of the awkward HTML from a String literal and seeing how JEditorPane deals with that. First you want to find out whether the problem lies in getting the String or in rendering it.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Again taking new Java programming contracts.
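Roedy's partitioning test could look roughly like this sketch (class and method names are hypothetical). The HTML comes from a String literal written with Unicode escapes, so neither the file nor its encoding is involved; if the Japanese characters survive into the parsed document text, decoding is not the problem, and any boxes on screen point at rendering, most likely the font.

```java
import javax.swing.JEditorPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

public class LiteralHtmlCheck {
    /** Loads HTML from a String (no file, no charset) and returns the text Swing parsed out of it. */
    static String parsedText(String html) throws BadLocationException {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html");
        pane.setText(html);
        Document doc = pane.getDocument();
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws BadLocationException {
        // Japanese written as Unicode escapes so the encoding of this source file cannot interfere
        String html = "<html><body><p>\u65e5\u672c\u8a9e \u30ab\u30ca</p></body></html>";
        String text = parsedText(html);
        // true means the String made it into the document intact
        System.out.println(text.contains("\u65e5\u672c\u8a9e"));
    }
}
```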
Oliver Wong - 31 Aug 2005 15:47 GMT > Hi all, > I have a HTML file, encoded in UTF-16, which my webbrowsers can display [quoted text clipped - 17 lines] > (easy) way to show (formatted) Information in multiple languages without > using different encodings? Did you try reading it via the InputStreamReader (as in your second example) and then passing the string you read in to the JEditorPane?
- Oliver
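Oliver's suggestion, sketched under a few assumptions: the file is known to be UTF-16, it is small enough to read in one go, and the names LoadHtmlAsString, readHtml and show are made up for illustration. The "IgnoreCharsetDirective" document property keeps Swing's parser from reacting to a meta charset tag, since the String is already decoded.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.swing.JEditorPane;

public class LoadHtmlAsString {
    /** Reads the whole file, decoding it with the charset we know it uses. */
    static String readHtml(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    /** Shows already-decoded HTML in an editor pane. */
    static void show(JEditorPane pane, String html) {
        pane.setContentType("text/html"); // installs an HTMLEditorKit and a fresh HTMLDocument
        // The String is already decoded, so skip any <meta ... charset=...> tag
        pane.getDocument().putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        pane.setText(html);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LoadHtmlAsString <utf16-html-file>");
            return;
        }
        JEditorPane pane = new JEditorPane();
        show(pane, readHtml(Paths.get(args[0]), StandardCharsets.UTF_16));
        // ... then put the pane into a JScrollPane/JFrame as usual
    }
}
```

Java's UTF-16 decoder consumes the byte order mark, so the String starts with the markup itself.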
Roedy Green - 01 Sep 2005 04:49 GMT >InputStreamReader in = new InputStreamReader(new >FileInputStream(fileURL.getFile()),"UTF-16"); >ep.read(in,"text/html; charset=UTF-16"); You want to find the culprit:
1. the file
2. your understanding of the file format
3. reading the file
4. rendering the file
I suggest dumping out some of the text you read, perhaps in hex, so you can verify it is precisely what you expect.
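A minimal way to do that dump, assuming the text has already been read into a String (the helper name is hypothetical): print each char as its UTF-16 code unit in hex. For example, NUL characters (U+0000) between ASCII letters, or U+00FE U+00FF at the very start, are a strong hint that UTF-16 bytes were decoded with a single-byte charset.

```java
public class HexDump {
    /** Formats each char as U+XXXX so stray NULs, BOMs or mojibake stand out. */
    static String toCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCodeUnits("<p>\u65e5</p>"));
        // U+003C U+0070 U+003E U+65E5 U+003C U+002F U+0070 U+003E
    }
}
```

For a large file, dumping only the first hundred or so chars is usually enough.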
I also suggest peeking at the file itself with a hex editor to make sure it is truly UTF-16 and not UTF-8.
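Alongside a hex editor, the first few bytes can be checked programmatically for a byte order mark. This sketch (names hypothetical) only recognises the UTF-8 and UTF-16 marks; a file without a BOM may still be UTF-16, so a null result proves nothing on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomSniffer {
    /** Names the encoding announced by a byte order mark, or returns null if there is none. */
    static String bomEncoding(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null; // no BOM: the file may still be UTF-16, so inspect the bytes by eye
    }

    /** Reads at most n bytes from the start of a file. */
    static byte[] head(Path file, int n) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[n];
            int total = 0;
            int read;
            while (total < n && (read = in.read(buf, total, n - total)) != -1) {
                total += read;
            }
            return Arrays.copyOf(buf, total);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java BomSniffer <file>");
            return;
        }
        System.out.println(bomEncoding(head(Paths.get(args[0]), 4)));
    }
}
```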
It is a rare HTML file that is better encoded with UTF-16 than UTF-8.
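Roedy's point is easy to check with byte counts. ASCII markup takes one byte per character in UTF-8 and two in UTF-16, while CJK characters from the Basic Multilingual Plane take three bytes in UTF-8 and two in UTF-16, so UTF-16 only comes out smaller for text that is mostly CJK with little markup. A small sketch (names hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class EncodedSize {
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Java's UTF_16 encoder also writes a 2-byte byte order mark
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        String markup = "<p class=\"note\">Hello</p>"; // 25 ASCII chars
        String japanese = "\u65e5\u672c\u8a9e";         // 3 kanji
        System.out.println(utf8Bytes(markup) + " vs " + utf16Bytes(markup));     // 25 vs 52
        System.out.println(utf8Bytes(japanese) + " vs " + utf16Bytes(japanese)); // 9 vs 8
    }
}
```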
You might read http://mindprod.com/jgloss/font.html on the limitations of AWT and Swing components in displaying exotic characters.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Again taking new Java programming contracts.
Stephan Zimmermann - 01 Sep 2005 10:07 GMT Thanks a lot for the replies, I think we are getting closer. Let me summarise where we are:
- My HTML file is valid HTML 4.01
- The HTML renderer is only aware of HTML 3.2
- UTF-16 support is only available in HTML 4.0 and later
- The file I use really is UTF-16
--> that's bad :-(
So, after doing my Unicode homework, I tried a UTF-8 encoded file instead, which gave a much better result: <code> JEditorPane ep = new JEditorPane(fileURL); </code> lets me read in, and correctly render, any English, other Latin-script or even Russian text. But when it comes to some Asian (Japanese) text, my JEditorPane shows those nasty little boxes again. I think it is 'only' a font problem now. Would you agree, and can anyone suggest a solution?
Best regards, Stephan
P.S.: Is the 'email to...' thing solved now?
Thomas Fritsch - 01 Sep 2005 12:15 GMT Stephan Zimmermann wrote:
> Thanks a lot for the replies, I think we are getting closer. Let me > summarise where we are: [quoted text clipped - 14 lines] > think it is 'only' a font problem now. Would you agree with that, and can > anyone suggest a solution? BTW: You can and should check first whether your font supports Asian chars, independent of all that HTML/JEditorPane/encoding stuff: <code> JTextField textField = new JTextField(); textField.setText("\u4e00\u4e01\u4e02"); // 3 Chinese chars </code>
 Signature "Thomas:Fritsch$ops:de".replace(':','.').replace('$','@')
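Thomas's check can also be done without looking at the screen, using Font.canDisplayUpTo, which returns -1 when the font can display every character of the string and otherwise the index of the first one it cannot. A sketch with hypothetical class and method names; the answer depends on the fonts installed on the machine that runs it.

```java
import java.awt.Font;
import javax.swing.JTextField;

public class FontCoverage {
    /** Index of the first char the font cannot display, or -1 if it can display them all. */
    static int firstMissing(Font font, String text) {
        return font.canDisplayUpTo(text);
    }

    public static void main(String[] args) {
        String sample = "\u4e00\u4e01\u4e02";   // the three CJK ideographs from Thomas's test
        Font font = new JTextField().getFont(); // the font a plain JTextField would use
        int missing = firstMissing(font, sample);
        if (missing == -1) {
            System.out.println(font.getFontName() + " can display the sample");
        } else {
            System.out.println(font.getFontName() + " cannot display char " + missing);
        }
    }
}
```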
Stephan Zimmermann - 01 Sep 2005 13:17 GMT > You can and should check first, whether your font supports Asian > chars, indepedent of all that HTML/JEditorPane/encoding stuff: > JTextField textField = new JTextField(); > textField.setText("\u4e00\u4e01\u4e02"); // 3 chinese chars This does not look too good: the standard font won't display Asian characters (neither your Chinese nor my Japanese). If I understand correctly, the font problem is not limited to making it work on my development box; it is also about making it work on other boxes and platforms, because the font system belongs to AWT and therefore depends on the OS and the fonts installed on it, right? So I cannot guarantee that what looks good here will look good there, even if I style my HTML correctly with CSS. And in the end that leads to the question of where to find appropriate fonts and how to use them.
Oliver Wong - 01 Sep 2005 15:41 GMT > This does not look too good, the standard font won't display Asian > characters (nither your Chinese nor my Japanese). [quoted text clipped - 8 lines] > And after all it leads to the question where to find and how to use > appropriate fonts. I don't know where offhand, but I think there are some freely available fonts with Japanese characters in them. Check their licensing agreements and, if possible, package one with your program, then tell Java to load and use that font file specifically.
- Oliver
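A sketch of Oliver's idea, under stated assumptions: the font file ships as a classpath resource (the path "/fonts/some-japanese-font.ttf" is hypothetical, so substitute a real, suitably licensed font), and the editor pane should use that font wherever the HTML does not style text explicitly. Font.createFont reads the TrueType data, GraphicsEnvironment.registerFont (Java 6 and later) makes the family available by name, and the JEditorPane.HONOR_DISPLAY_PROPERTIES client property asks the HTML view to use the component's font. The code itself needs Java 7 or later.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.awt.GraphicsEnvironment;
import java.io.IOException;
import java.io.InputStream;
import javax.swing.JEditorPane;

public class BundledFont {
    /** Loads a TrueType font from a stream; falls back if the stream is missing or not a font. */
    static Font loadOrFallback(InputStream in, float size, Font fallback) {
        if (in == null) {
            return fallback;
        }
        try (InputStream stream = in) {
            Font base = Font.createFont(Font.TRUETYPE_FONT, stream);
            // Make the family visible to name-based lookups such as CSS font-family
            GraphicsEnvironment.getLocalGraphicsEnvironment().registerFont(base);
            return base.deriveFont(size);
        } catch (FontFormatException | IOException e) {
            return fallback;
        }
    }

    /** Makes the HTML view fall back to the component font instead of its stylesheet default. */
    static void useFont(JEditorPane pane, Font font) {
        pane.putClientProperty(JEditorPane.HONOR_DISPLAY_PROPERTIES, Boolean.TRUE);
        pane.setFont(font);
    }

    public static void main(String[] args) {
        // Hypothetical resource: a Japanese font bundled with the application
        InputStream in = BundledFont.class.getResourceAsStream("/fonts/some-japanese-font.ttf");
        Font font = loadOrFallback(in, 14f, new Font(Font.DIALOG, Font.PLAIN, 14));
        JEditorPane pane = new JEditorPane("text/html", "<p>\u65e5\u672c\u8a9e</p>");
        useFont(pane, font);
    }
}
```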
Roedy Green - 02 Sep 2005 08:40 GMT >lets me read in - and render correctly - any English or other >Latin-something or even russian text. but when it comes to some Asian >(Japan) textparts, my JEditorPane shows these nasty little boxes again. I >think it is 'only' a font problem now. Would you agree with that, and can >anyone suggest a solution? You can sort that out with FontShower 2.0, which displays:
<code>
+ "\u30b0" // Katakana Japanese
+ "\u3041" // Hiragana Japanese
</code>
as part of its font sample test.
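A rough programmatic version of that font sample test, as a sketch with hypothetical names: list the installed font families whose plain style can display both sample characters. The result depends entirely on the machine it runs on, which is exactly Stephan's portability concern; logical fonts such as Dialog may appear in the list if the platform maps them to a font with Japanese coverage.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

public class JapaneseFontFinder {
    // Katakana letter GU and Hiragana small letter A, Roedy's two sample characters
    static final String SAMPLE = "\u30b0\u3041";

    /** Returns the installed font families whose plain style can display every char of sample. */
    static List<String> familiesThatCanDisplay(String sample) {
        List<String> result = new ArrayList<>();
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames();
        for (String family : families) {
            Font font = new Font(family, Font.PLAIN, 12);
            if (font.canDisplayUpTo(sample) == -1) {
                result.add(family);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(familiesThatCanDisplay(SAMPLE));
    }
}
```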
It looks as though many Asian fonts don't map their characters properly to Unicode, but instead map them over the top of the Roman alphabet.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Again taking new Java programming contracts.