> Hello Everybody:
> As we all know,FileReader and FileWriter are both character stream
> classes.
Yes!
> When I use FileReader to read a text file which combines letters
> and Chinese Characters coding in ANSI's ascii.
No, you don't. Chinese simply cannot be encoded in ASCII. Maybe your text
file is encoded in UTF-8 (see below).
> I know that each letter
> holds one byte disk space to store while every Chinese Characters
> occupies two.When that file has been read,it prints on the monitor
> screen totally corresponds with it's content!
There is already a misconception here:
(1) It is correct that ASCII needs one byte per character, because
ASCII can only encode the characters from 0x0000 to 0x007F (into
bytes 0x00 .. 0x7F), nothing more.
(2) ASCII simply cannot encode the Chinese chars (roughly 0x4E00 .. 0x9FFF).
The key is to understand that there is a difference between *byte*
streams (InputStream, OutputStream) and *char* streams (Reader, Writer).
A byte is in range 0x00..0xFF, a char is in range 0x0000..0xFFFF.
Files are always sequences of bytes, but in your Java code you want to
deal with chars. Therefore Java has to do a translation between byte
streams and char streams, which is called "encoding" or "decoding".
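To make the byte/char translation concrete, here is a minimal sketch. The class and method names are made up for illustration, and UTF-8 is chosen only as an example encoding:

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encoding: chars (inside the program) -> bytes (as stored in a file).
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: bytes (as stored in a file) -> chars (inside the program).
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'A' followed by the Chinese character U+4E2D.
        String text = "A\u4E2D";
        byte[] bytes = encode(text);
        System.out.println(text.length() + " chars, " + bytes.length + " bytes");
        System.out.println("round trip ok: " + decode(bytes).equals(text));
    }
}
```

The number of chars and the number of bytes differ, which is exactly why a Reader needs to know the encoding.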
Unfortunately there are many different encodings. "ASCII" is
just one of them; others are "ISO-8859-1", "UTF-16", "UTF-8" and many more.
Some encodings ("UTF-8", "UTF-16") are able to encode all possible 65536
chars into bytes. Some others can encode only a subset of chars into
bytes (ASCII: only chars from 0x0000 to 0x007F, ISO-8859-1: only chars
from 0x0000 to 0x00FF). "UTF-16" always encodes 1 char into 2 bytes
(plus an optional 2-byte byte-order mark at the start of the data).
"UTF-8" encodes 1 char into 1, 2 or 3 bytes (depending on the char).
You can find more info and more links at
<http://mindprod.com/jgloss/encoding.html>
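The sizes mentioned above can be checked with a small sketch like the following. The class and helper names are hypothetical, and UTF-16BE is used instead of plain "UTF-16" so that no byte-order mark is added to the byte count:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the given charset produces for the text.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Whether the charset can represent every char of the text.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String latin = "A";         // U+0041
        String chinese = "\u4E2D";  // a Chinese character, U+4E2D
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            for (String s : new String[] {latin, chinese}) {
                String code = String.format("U+%04X", (int) s.charAt(0));
                if (canEncode(s, cs)) {
                    System.out.println(cs + ": " + code + " -> " + byteCount(s, cs) + " byte(s)");
                } else {
                    System.out.println(cs + ": " + code + " cannot be encoded");
                }
            }
        }
    }
}
```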
> Now,here is my question:How does JVM identify one byte letter and two
> byte Chinese Character?
*You* tell it which encoding algorithm will be used. For example you can
write:
FileReader fr = new FileReader("text.txt", "UTF-8");
When you write:
FileReader fr = new FileReader("text.txt");
that actually means
FileReader fr = new FileReader("text.txt",
System.getProperty("file.encoding"));
If you choose the wrong encoding (for example: if you choose "UTF-16",
but your input file is actually encoded with "UTF-8"), then your program
will simply produce the wrong characters.
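A rough sketch of what a wrong encoding does, working on an in-memory byte array rather than a file. The class and method names are invented for this example, and UTF-16BE stands in for "UTF-16" to keep the byte order fixed:

```java
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as UTF-16BE.
    static String decodeUtf8BytesAsUtf16(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_16BE);
    }

    // Prints each char as a hex code so the result does not depend on the console encoding.
    static String toCodes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "A\u4E2D";  // UTF-8 bytes: 41 E4 B8 AD
        String garbled = decodeUtf8BytesAsUtf16(original);
        System.out.println("original: " + toCodes(original));
        System.out.println("garbled:  " + toCodes(garbled));
    }
}
```

No exception is thrown here: the four bytes are simply paired up into two unrelated chars, which is why such a mistake often goes unnoticed until someone looks at the output.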
> Here is my program demo:
> import java.io.*;
[quoted text clipped - 20 lines]
> 这是一个测试文件!
> There are totally 31 characters in this file!
No, files always contain *bytes*, not *chars*.
Chars only occur within your Java program.

Thomas
Thomas Fritsch - 08 May 2007 15:50 GMT
> [...]
>> Now,here is my question:How does JVM identify one byte letter and two
[quoted text clipped - 7 lines]
> FileReader fr = new FileReader("text.txt",
> System.getProperty("file.encoding"));
Sorry, the above was wrong.
There is no constructor FileReader(String fileName, String encoding).
Hence there is no way to explicitly specify an encoding with FileReader.
When you write:
new FileReader("text.txt");
that essentially means
new InputStreamReader(new FileInputStream("text.txt"))
which in turn means
new InputStreamReader(new FileInputStream("text.txt"),
    System.getProperty("file.encoding"))
Therefore I would strongly recommend *not* using FileReader at all.
Instead use for example:
new InputStreamReader(new FileInputStream("text.txt"),
    "UTF-8")
so that the encoding you get is really the encoding you want.
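As a fuller sketch of that recommendation, the following reads a whole file with an explicit encoding. The class name, the method names and the temporary file are assumptions made for the example; StandardCharsets and try-with-resources need Java 7 or later. (Newer Java versions, from Java 11 on, also offer FileReader constructors that take a Charset, but the InputStreamReader form works everywhere.)

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWithCharset {
    // Reads the whole file as text, decoding its bytes explicitly as UTF-8.
    static String readUtf8(String fileName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Demo helper: writes the text to a temporary UTF-8 file and reads it back.
    static String writeThenRead(String text) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp.toString());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "A\u4E2D";
        String readBack = writeThenRead(text);
        System.out.println("read back " + readBack.length() + " chars, equal: "
                + readBack.equals(text));
    }
}
```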

Thomas
> Hello Everybody:
> As we all know,FileReader and FileWriter are both character stream
> classes.When I use FileReader to read a text file which combines letters
> and Chinese Characters coding in ANSI's ascii.
Chinese characters cannot be encoded in ASCII.
Some links to get you started in the wonderful world of international
character sets:
http://czyborra.com/
http://www.i18nguy.com/unicode/codepages.html
http://www.unicode.org/
http://www.faqs.org/rfcs/rfc2044.html
Cheers
GRB

---------------------------------------------------------------------
Greg R. Broderick usenet200705@blackholio.dyndns.org
A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------