Hi:
Consider this file, saved to disk as utf-8, no BOM.
---------------------------------------------------
public class x
{
    public static void main (String args[])
    {
        System.out.println("\u0222");
    }
}
--------------------------------------------------------
By the way, unicode 0x0222 looks like a funky eight --> Ȣ
You may not see it in this news post because of your
newsreader, but that doesn't matter.
While compiling I've tried all of:
javac x.java
javac -encoding utf-8 x.java
javac -encoding utf8 x.java
javac -encoding UTF-8 x.java
javac -encoding UTF8 x.java
Using:
JDK 1.5, on both linux and osx (same problem)
If you run this (regardless of how you compile it), you
will see '?' instead of the proper unicode character
(regardless of output device; even if you output to a
unicode-capable terminal that can properly render
0x0222, you still see '?').
Am I missing something or is this like the biggest most
retarded bug ever ?
--j
Real Gagnon - 14 Aug 2007 03:13 GMT
java <javadesigner@yahoo.com> wrote in news:f9r0uc$c7tf$1
@netnews.upenn.edu:
> If you run this (regardless of how you compile it), you
> will see '?' instead of the proper unicode character
> (regardless of output device, even if you output to a
> unicode capable terminal that can properly render
> 0x0222, you still see '?'
Try to run it with :
java -Dfile.encoding=UTF8 x
Bye.

Signature
Real Gagnon from Quebec, Canada
* Java, Javascript, VBScript and PowerBuilder code snippets
* http://www.rgagnon.com/howto.html
* http://www.rgagnon.com/bigindex.html
java - 14 Aug 2007 03:45 GMT
> Try to run it with :
> java -Dfile.encoding=UTF8 x
Ok, I tried that and that solved the problem.
But why ?
javac -encoding UTF8 x.java --> x.class
Now, shouldn't x.class be entirely self contained ? It's not
java source anymore.
So why do I have to set this property ? Is it because
the PrintStream (System.out) uses this "file.encoding"
property internally ?
Background:
This becomes tricky when I have differently encoded web pages
(say jsp's) on the server at the same time (all of which print
debugging messages using System.out)
-j
Real Gagnon - 14 Aug 2007 12:34 GMT
java <javadesigner@yahoo.com> wrote in news:f9r50k$c3sp$1
@netnews.upenn.edu:
> This becomes tricky when I have differently encoded web pages
> (say jsp's) on the server at the same time (all of which print
> debugging messages using System.out)
The "file.encoding" trick may be fine for a small console program, but
probably not for a server. You may want to use a special PrintStream instead.
See http://www.rgagnon.com/javadetails/java-0046.html
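A minimal sketch of that idea, not necessarily what the linked page shows: wrap the target stream in a PrintStream that is told explicitly to encode as UTF-8, so the output no longer depends on "file.encoding". The class name Utf8Console and the helper utf8Stream are made up for this illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Hypothetical helper: returns a PrintStream that always encodes text
    // as UTF-8, independent of the JVM's default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // The character is written as the bytes 0xC8 0xA2,
        // followed by the platform line separator.
        out.println("\u0222");
    }
}
```

Whether the glyph actually shows up still depends on the terminal decoding the bytes as UTF-8 and having a font for the character.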
Bye.

Juha Laiho - 19 Aug 2007 12:22 GMT
java <javadesigner@yahoo.com> said:
>> Try to run it with :
>> java -Dfile.encoding=UTF8 x
[quoted text clipped - 11 lines]
>the PrintStream (System.out) uses this "file.encoding"
>property internally ?
That is because the JVM runtime does attempt to find out what
character encoding the environment outside the JVM uses, and
apparently in your environment it gets a native character set
of something other than UTF-8.
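To see what the runtime actually detected, something like the following small sketch can be run in the same environment as the failing program. The class name ShowDefaultEncoding and the method describe are invented for this example; the two calls it makes are standard (Charset.defaultCharset() needs Java 5 or later). The printed values depend entirely on the environment.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {
    // Hypothetical helper: reports the encoding-related settings the JVM
    // picked up from its environment (or from -Dfile.encoding=...).
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", default charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once plainly and once with -Dfile.encoding=UTF8 should show whether the property changes what the JVM uses.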
So, even if you have funky UTF-8 characters in your source,
Java may be able to print them out in environments with some
other native character encoding, if that other encoding
happens to have a code point for the same character glyph.
For example, source code with UTF-8 may contain the byte
sequence [0xc3, 0xa4], signifying lower-case a-diaeresis
character glyph. Now, if that source code is compiled
properly, letting the compiler know that the source is in UTF-8
character set, and subsequently the code is run in an environment
with ISO-8859-1 character set, the program will output just
one byte, 0xE4. Also, if the same code is run in an environment
configured for plain US-ASCII character set, it will output
only a question mark (as the US-ASCII character set does not have
a glyph for the a-diaeresis character).
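The three cases above can be reproduced without touching the console at all, by encoding the same string to bytes in each character set. This is only a sketch: the class name EncodeDemo and the helper hex are invented here, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // Hypothetical helper: encodes the text in the given charset and
    // formats the resulting bytes as lower-case hex, e.g. "c3 a4".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aDiaeresis = "\u00e4";
        System.out.println("UTF-8:      " + hex(aDiaeresis, StandardCharsets.UTF_8));      // c3 a4
        System.out.println("ISO-8859-1: " + hex(aDiaeresis, StandardCharsets.ISO_8859_1)); // e4
        // US-ASCII cannot represent the character, so the encoder
        // substitutes its replacement byte, '?' (0x3f).
        System.out.println("US-ASCII:   " + hex(aDiaeresis, StandardCharsets.US_ASCII));   // 3f
    }
}
```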

Signature
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
Thomas Fritsch - 14 Aug 2007 04:09 GMT
java schrieb:
> Consider this file, saved to disk as utf-8, no BOM.
> ---------------------------------------------------
[quoted text clipped - 5 lines]
> }
> }
[...]
> While compiling I've tried all of:
>
[quoted text clipped - 14 lines]
>
> Am I missing something
Aaahm, yes.
Your *source* contains only harmless ASCII characters.
Remember, the characters \, u, 0 and 2 are all in the range
0x0020...0x007F, where ASCII is identical to UTF-8. Therefore all your
effort to make the compiler understand UTF-8 is pointless. (sorry)
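A quick way to convince yourself, sketched with invented names (EscapeDemo, SOURCE_LINE, sameInAsciiAndUtf8) and assuming Java 7 or later for StandardCharsets: the source line as it sits on disk encodes to the same bytes in US-ASCII and UTF-8, while the single character the compiler produces from the escape does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EscapeDemo {
    // The source line as stored on disk: a backslash, the letter u and
    // four digits, not the character U+0222 itself.
    static final String SOURCE_LINE = "System.out.println(\"\\u0222\");";

    // Hypothetical helper: true when the text encodes to identical bytes
    // in US-ASCII and in UTF-8.
    static boolean sameInAsciiAndUtf8(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.US_ASCII),
                             text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(sameInAsciiAndUtf8(SOURCE_LINE)); // true

        // After compilation the escape has become one char, U+0222.
        String compiled = "\u0222";
        System.out.println(compiled.length());                        // 1
        System.out.println(Integer.toHexString(compiled.charAt(0)));  // 222
        System.out.println(sameInAsciiAndUtf8(compiled));             // false
    }
}
```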
Your problem is not a compile-problem (javac), but a runtime-problem
(java). Real Gagnon already told how to parametrize java to use UTF-8.
But even that might not solve your problem, if the font used by your
terminal doesn't contain a rendering for the 0x0222 character.
By the way: Even my "Arial Unicode MS" font, which contains all of the
greek, cyrillic, armenian, chinese etc characters, has no renderings in
the range 0x0220..0x024F.
> or is this like the biggest most
> retarded bug ever ?

Signature
Thomas
Roedy Green - 17 Aug 2007 21:05 GMT
You are confused between the
encoding of the Java source, and the
encoding you want for the console output.
To configure the encoding of your Java source, see
http://mindprod.com/jgloss/encoding.html#SOURCE
To configure the default encoding
of your console and files, see
http://mindprod.com/jgloss/encoding.html#CONSOLE
P.S. "x" is not a suitable class name.
Classes should begin with a capital letter.

Signature
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com