I use String.split to parse a string which is read from a file. The
delimiter for parsing is the character 0xFC. This works fine under
Windows but it fails under Linux.
I also tried StringTokenizer and got the same problem.
My code is as follows:
private static char[] cLevel1 = { '\u00fc' };
private static String delimiterMultiValueLevel1 = new String( cLevel1 );
...
BufferedReader f = new BufferedReader(new FileReader(args[1]));
String data = f.readLine();
String[] splitted = data.split( delimiterMultiValueLevel1 );
Under Windows, splitted.length equals the expected number of fields
separated by 0xFC.
Under Linux, splitted.length is always 1.
I also tried this:
private static char[] cLevel1 = { (char) 0xFC };
and got the same result: it works under Windows and fails under Linux.
Any advice is really appreciated.
DT
On 30 Jan 2006 13:14:45 -0800, "dt1649651@yahoo.com"
<dt1649651@yahoo.com> wrote, quoted or indirectly quoted someone who
said :
>I use String.split to parse a string which is read from a file. The
>delimiter for parsing is the character 0xFC. This works fine under
>Windows but it fails under Linux.
You have a file of bytes, right, and if you peek with a hex editor you
can see the 0xFC sitting there.
When you process, you are not working with bytes anymore. Your
FileReader did a conversion using the DEFAULT ENCODING from bytes to
char.
Your algorithm depends on that translation taking the byte 0xfc and
converting it to the character 0x00fc. That happens for some
encodings. However some others translate it somewhere else.
The point is your data files are not actually encoded with the default
encoding, but with some particular encoding, perhaps ISO-8859-1 that
has the nice property of mapping 0xfc to 0x00fc.
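To see the effect in isolation, here is a minimal sketch that decodes the single byte 0xFC under two encodings. The class and method names are made up for illustration, and StandardCharsets assumes Java 7 or later (on older JVMs, Charset.forName("ISO-8859-1") would do the same job). ISO-8859-1 maps the byte straight to the char 0x00fc, while a lone 0xFC is not valid UTF-8, so the decoder substitutes a replacement character and the delimiter is gone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode the single byte 0xFC using the given charset.
    static String decodeFc(Charset cs) {
        byte[] raw = { (byte) 0xFC };
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        String latin1 = decodeFc(StandardCharsets.ISO_8859_1);
        String utf8 = decodeFc(StandardCharsets.UTF_8);

        // FileReader uses this charset when none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1 -> U+" + Integer.toHexString(latin1.charAt(0)));
        System.out.println("UTF-8      -> U+" + Integer.toHexString(utf8.charAt(0)));
    }
}
```

Running this on both machines and comparing the "default charset" line should show whether the two platforms really disagree about the default.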
So the practical solutions are:
1. Use a char like tab that won't be molested in pretty well any
encoding.
2. Use an explicit encoding on your Reader on all platforms. See
http://mindprod.com/jgloss/encoding.html
http://mindprod.com/applets/fileio.html
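
Option 2 could be sketched roughly like this. It assumes the file really is ISO-8859-1, which is only a guess from the symptoms, and the class and method names are hypothetical. The delimiter goes through Pattern.quote because String.split treats its argument as a regular expression; that is harmless for 0x00fc but protects you if the delimiter ever changes to something like '|'.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.regex.Pattern;

class ExplicitEncodingSplit {
    // Read the first line of the file, decoding its bytes with the given
    // charset instead of the platform default, then split on the delimiter.
    static String[] splitFirstLine(String path, Charset cs, char delimiter)
            throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), cs))) {
            String line = in.readLine();
            if (line == null) {
                return new String[0]; // empty file
            }
            // Quote the delimiter so it is matched literally, not as a regex.
            return line.split(Pattern.quote(String.valueOf(delimiter)));
        }
    }
}
```

A call such as splitFirstLine(args[1], Charset.forName("ISO-8859-1"), '\u00fc') should then behave the same on every platform, since the default encoding no longer enters into it.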

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
dt1649651@yahoo.com - 31 Jan 2006 20:31 GMT
Hi Roedy,
Thanks for your detailed reply and your helpful URLs.
After reading all that encoding info, I ended up with the following
changes, and the program now works well under both Windows and Linux.
1. Specify the encoding in the delimiter
String delimiterMultiValueLevel1 = new String( cLevel1, "ISO8859_1");
2. Apply the encoding to the file reading.
FileInputStream fi = new FileInputStream( args[1]);
InputStreamReader s = new InputStreamReader( fi, "ISO8859_1");
BufferedReader f = new BufferedReader(s);
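
A note on change 1, for anyone copying it: as far as I know, java.lang.String has no constructor that takes a char[] plus a charset name; the charset-name overloads take a byte[]. So the line above presumably used a byte[] in the real program, which is an assumption on my part. The sketch below, with made-up names, shows the two ways of building the delimiter. A char such as '\u00fc' is already a Unicode value, so no encoding is involved there; only raw bytes need one.

```java
import java.nio.charset.StandardCharsets;

class DelimiterDemo {
    // Built from a char: no charset needed, the char is already Unicode.
    static String fromChar() {
        return String.valueOf('\u00fc');
    }

    // Built from a raw byte: the charset says how to interpret 0xFC.
    static String fromByte() {
        byte[] raw = { (byte) 0xFC };
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Both routes should yield the same one-character delimiter.
        System.out.println(fromChar().equals(fromByte()));
    }
}
```

If that reading is right, change 2 is the one doing the real work, and the original char-based delimiter would have been fine as it was.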
Thanks again,
DT