Java Forum / GUI / September 2005

UTF-16 encoded HTML

Stephan Zimmermann - 31 Aug 2005 14:16 GMT
Hi all,
I have an HTML file, encoded in UTF-16, which my web browsers can display very
well. But when I load the same URL into JEditorPane, I only see some ugly
boxes and other trash.
<code>
JEditorPane ep = new JEditorPane(fileURL);
</code>

I also tried reading the content of the file through a FileInputStream, which
leads to a nice view of the HTML code.
<code>
InputStreamReader in = new InputStreamReader(
        new FileInputStream(fileURL.getFile()), "UTF-16");
ep.read(in, "text/html; charset=UTF-16");
</code>

Is there a way to display a UTF-16 encoded HTML file? Or is there another
(easy) way to show (formatted) information in multiple languages without
using different encodings?
Andrew Thompson - 31 Aug 2005 14:38 GMT
> I have an HTML file, encoded in UTF-16, which my web browsers can display very
> well.

Is it valid HTML?  <http://validator.w3.org/>

> email-to: s.zimmermann@bm....onik.com

BTW - this is not some sort of help desk, so if you could
not be bothered coming back to the group to read replies,
please don't bother posting in the first place.

Signature

Andrew Thompson
physci.org 1point1c.org javasaver.com lensescapes.com athompson.info
"If no one out there understands, start your own revolution and cut out the
middle man."
Billy Bragg 'Waiting For The Great Leap Forward'

Thomas Fritsch - 31 Aug 2005 14:40 GMT
Stephan Zimmermann wrote:

> Hi all,
> I have an HTML file, encoded in UTF-16, which my web browsers can display very
[quoted text clipped - 15 lines]
> (easy) way to show (formatted) information in multiple languages without
> using different encodings?
Does your HTML file have a line
<meta http-equiv="Content-Type" content="text/html; charset=UTF-16">
in its header?

Signature

"Thomas:Fritsch$ops:de".replace(':','.').replace('$','@')

Stephan Zimmermann - 31 Aug 2005 15:27 GMT
> Does your HTML file have a line
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-16">
> in its header?

Yes, it does (now), and it is also valid HTML (now), but the problem stays the
same. Firefox and the like can display it, while my JEditorPane only shows some
rubbish.
Andrew Thompson - 31 Aug 2005 15:52 GMT
>> Does your HTML file have a line
>> <meta http-equiv="Content-Type" content="text/html; charset=UTF-16">
[quoted text clipped - 3 lines]
> same. Firefox and the like can display it, while my JEditorPane only shows some
> rubbish.

According to this, Unicode was not included until HTML 4.0.
<http://www.w3.org/International/O-HTML-charset.html>

Since JEditorPane is stuck rendering HTML 3.2, it might mean
it does not, and will never, render Unicode correctly.

BTW -
> Email-To: s.zimmermann@bmelektronik.com

You seem to have either ignored my advice not to request
follow-ups by email, or you are not reading the group.

Good luck.

Signature

Andrew Thompson
physci.org 1point1c.org javasaver.com lensescapes.com athompson.info
"A simple prop to occupy my time.."
R.E.M. 'The One I Love'

Roedy Green - 01 Sep 2005 06:45 GMT
>Since JEditorPane is stuck rendering HTML 3.2, it might mean
>it does not, and will never, render Unicode correctly.

Haven't Swing components always worked with Unicode Strings, not encoded byte
arrays?

You could help partition the problem by generating a bit of the
awkward HTML with a String literal and see how JEditorPane deals with
that. You want first to find out if the problem is getting the String
or rendering the String.

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.

Oliver Wong - 31 Aug 2005 15:47 GMT
> Hi all,
> I have an HTML file, encoded in UTF-16, which my web browsers can display
[quoted text clipped - 17 lines]
> (easy) way to show (formatted) information in multiple languages without
> using different encodings?

   Did you try reading it via the InputStreamReader (as per the second
example), and then passing the resulting string to the JEditorPane?

   - Oliver
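
One way to act on this suggestion is to decode the bytes yourself and hand Swing's HTML parser characters instead of bytes. The following is only a minimal sketch, assuming the file really is UTF-16 (with a byte order mark, or big-endian without one). The class name `Utf16HtmlLoader` is made up for this example. The `"IgnoreCharsetDirective"` document property asks the parser to skip the `<meta ... charset=...>` tag, since the decoding has already been done by the reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

/** Hypothetical helper for illustration; not part of Swing. */
final class Utf16HtmlLoader {

    /** Decodes the stream as UTF-16 and parses it into an HTML document. */
    static Document load(InputStream in) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The reader below already fixes the encoding, so the parser
        // should not react to a charset given in a <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_16)) {
            kit.read(reader, doc, 0);
        }
        return doc;
    }
}
```

The result could then be shown with `ep.setContentType("text/html")` followed by `ep.setDocument(Utf16HtmlLoader.load(fileURL.openStream()))`. If the pane still shows boxes after that, the decoding is probably not the culprit and the font becomes the next suspect.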
Roedy Green - 01 Sep 2005 04:49 GMT
>InputStreamReader in = new InputStreamReader(new
>FileInputStream(fileURL.getFile()),"UTF-16");
>ep.read(in,"text/html; charset=UTF-16");

You want to find the culprit:
1. the file
2. your understanding of the file format
3. reading the file
4. rendering the file

I suggest dumping out some of the text you read, perhaps in hex, so
you can verify it is precisely what you expect.

I also suggest peeking at the file itself with a hex editor to make
sure it is truly UTF-16 not UTF-8.
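
If no hex editor is at hand, the same check can be sketched in a few lines of Java. This is an illustration only: `EncodingProbe` and its method names are invented for the example, and it looks only at the byte order mark (BOM), so a file without a BOM is reported as "unknown" even though it may still be UTF-16 or UTF-8.

```java
/** Hypothetical helper for illustration; it only inspects the byte order mark. */
final class EncodingProbe {

    /** Returns "UTF-8", "UTF-16BE", "UTF-16LE", or "unknown" when no BOM is found. */
    static String guessFromBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return "unknown";
    }

    /** Formats at most limit bytes as two-digit hex values separated by spaces. */
    static String hex(byte[] bytes, int limit) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(limit, bytes.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Feeding it the result of `java.nio.file.Files.readAllBytes(...)` for the file in question and printing `hex(bytes, 16)` shows the first bytes. In a UTF-16 file, plain ASCII markup such as `<html>` appears with a `00` byte next to every character, which a UTF-8 file would not have.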

It is a rare HTML file that is better encoded with UTF-16 than UTF-8.

You might read http://mindprod.com/jgloss/font.html
on the limitations of AWT and Swing components to display exotic
characters.

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.

Stephan Zimmermann - 01 Sep 2005 10:07 GMT
Thanks a lot for the replies, I think we are getting closer. Let me
summarise where we are:
- My HTML file is valid HTML 4.01
- The HTML renderer is only aware of HTML 3.2
- UTF-16 support is only available in HTML >= 4.0
- The file I use is really UTF-16
--> that's bad :-(

So, after doing my Unicode homework, I tried a UTF-8 encoded
file, which gave me a much better result:
<code>
JEditorPane ep = new JEditorPane(fileURL);
</code>
lets me read in - and render correctly - any English or other
Latin-something or even Russian text. But when it comes to some Asian
(Japanese) text parts, my JEditorPane shows these nasty little boxes again. I
think it is 'only' a font problem now. Would you agree with that, and can
anyone suggest a solution?

Best regards, Stephan

P.S.: Is the 'email to...' thing solved now?
Thomas Fritsch - 01 Sep 2005 12:15 GMT
Stephan Zimmermann wrote:
> Thanks a lot for the replies, I think we are getting closer. Let me
> summarise where we are:
[quoted text clipped - 14 lines]
> think it is 'only' a font problem now. Would you agree with that, and can
> anyone suggest a solution?
BTW: You can and should first check whether your font supports Asian
chars, independent of all that HTML/JEditorPane/encoding stuff:
  JTextField textField = new JTextField();
  textField.setText("\u4e00\u4e01\u4e02");  // 3 Chinese chars
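
A slightly more systematic version of that check might ask the font directly which characters it cannot show, instead of looking for boxes by eye. The sketch below is only an illustration: the class `FontCoverage` and its methods are made up for this example, while `Font.canDisplay(int)` is the standard AWT call.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper for illustration; not part of AWT or Swing. */
final class FontCoverage {

    /** Returns the code points of text that the given predicate rejects, in order. */
    static List<Integer> missing(IntPredicate canDisplay, String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!canDisplay.test(codePoint)) {
                result.add(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return result;
    }

    /** Returns the code points of text for which the font reports no glyph. */
    static List<Integer> missingIn(Font font, String text) {
        return missing(codePoint -> font.canDisplay(codePoint), text);
    }
}
```

Calling `FontCoverage.missingIn(textField.getFont(), "\u4e00\u4e01\u4e02")` gives an empty list when the font reports glyphs for all three characters. Anything left in the list is a character that will most likely show up as a box.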

Signature

"Thomas:Fritsch$ops:de".replace(':','.').replace('$','@')

Stephan Zimmermann - 01 Sep 2005 13:17 GMT
> You can and should first check whether your font supports Asian
> chars, independent of all that HTML/JEditorPane/encoding stuff:
>    JTextField textField = new JTextField();
>    textField.setText("\u4e00\u4e01\u4e02");  // 3 chinese chars

This does not look too good: the standard font won't display Asian
characters (neither your Chinese nor my Japanese).
If I see it correctly, the font problem is not limited to 'making it
work' on my development box; it is also about making it work on other
boxes/platforms, because the font system is AWT and therefore depends on the
OS and the installed fonts I am running, right? So I cannot guarantee
that what looks good here will look good there, even if I style my HTML
correctly with CSS?
And in the end it leads to the question of where to find and how to use
appropriate fonts.
Oliver Wong - 01 Sep 2005 15:41 GMT
> This does not look too good: the standard font won't display Asian
> characters (neither your Chinese nor my Japanese).
[quoted text clipped - 8 lines]
> And after all it leads to the question where to find and how to use
> appropriate fonts.

   I don't know where off the top of my head, but I think there are some
freely available fonts with Japanese characters in them. Check their
licensing agreements, and if possible, package them with your program, then
tell Java to specifically load and use that font file.
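
Loading a font file shipped with the application could look roughly like this. It is a sketch under assumptions, not a tested recipe for any particular font: the class name `BundledFont` and any resource path such as `/fonts/japanese.ttf` are placeholders, while `Font.createFont` and the `JEditorPane.HONOR_DISPLAY_PROPERTIES` client property are standard.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import javax.swing.JEditorPane;

/** Hypothetical helper for illustration; the resource path is supplied by the caller. */
final class BundledFont {

    /** Loads a TrueType font from the classpath and scales it to the given size. */
    static Font load(String resourcePath, float size)
            throws IOException, FontFormatException {
        try (InputStream in = BundledFont.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new FileNotFoundException("Font resource not found: " + resourcePath);
            }
            return Font.createFont(Font.TRUETYPE_FONT, in).deriveFont(size);
        }
    }

    /** Makes an HTML pane use the component font where the markup names none. */
    static void applyTo(JEditorPane pane, Font font) {
        pane.putClientProperty(JEditorPane.HONOR_DISPLAY_PROPERTIES, Boolean.TRUE);
        pane.setFont(font);
    }
}
```

With a font file actually placed on the classpath, `BundledFont.applyTo(ep, BundledFont.load("/fonts/japanese.ttf", 14f))` would be the intended use. Shipping the font this way avoids depending on whatever happens to be installed on the target OS.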

   - Oliver
Roedy Green - 02 Sep 2005 08:40 GMT
>lets me read in - and render correctly - any English or other
>Latin-something or even Russian text. But when it comes to some Asian
>(Japanese) text parts, my JEditorPane shows these nasty little boxes again. I
>think it is 'only' a font problem now. Would you agree with that, and can
>anyone suggest a solution?

That you can sort out with FontShower 2.0 which displays:

 + "\u30b0" // Katakana Japanese
 + "\u3041" // Hiragana Japanese

as part of its font sample test.

It looks as though many Asian fonts don't map properly to Unicode, but
rather map on top of the Roman alphabet.

 
 
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.