Java Forum / General / January 2006

different default encoding ?

dt1649651@yahoo.com - 30 Jan 2006 21:14 GMT
I use String.split to parse a string which is read from a file. The
delimiter for parsing is the character 0xFC. This works fine under
Windows but it fails under Linux.

I also tried StringTokenizer and got the same problem.

My code is as follows :
private static char [] cLevel1 = { '\u00fc' };

private static String delimiterMultiValueLevel1 = new String( cLevel1 );
...

BufferedReader f = new BufferedReader(new FileReader(args[1]));
String data = f.readLine();
String[]  splitted = data.split( delimiterMultiValueLevel1 );

Under Windows, splitted.length equals the expected number of fields
separated by 0xFC.
Under Linux, splitted.length is always 1.

I also tried this:
private static char [] cLevel1 = { (char) 0xFC };
and got the same result: it works under Windows and fails under Linux.

Any advice is really appreciated.

DT
Roedy Green - 30 Jan 2006 22:37 GMT
On 30 Jan 2006 13:14:45 -0800, "dt1649651@yahoo.com"
<dt1649651@yahoo.com> wrote, quoted or indirectly quoted someone who
said :

>I use String.split to parse a string which is read from a file. The
>delimiter for parsing is the character 0xFC. This works fine under
>Windows but it fails under Linux.

You have a file of bytes, right, and if you peek with a hex editor you
can see the 0xFC sitting there.

When you process, you are not working with bytes anymore. Your
FileReader did a conversion using the DEFAULT ENCODING from bytes to
char.

Your algorithm depends on that translation taking the byte 0xfc and
converting it to the character 0x00fc.  That happens for some
encodings. However some others translate it somewhere else.

The point is that your data files are not actually encoded with the
default encoding, but with some particular encoding, perhaps ISO-8859-1,
which has the nice property of mapping 0xfc to 0x00fc.

So the practical solutions are:
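
The effect described above can be reproduced without a file. The following is a minimal sketch, not the original poster's program: the class name `DecodeDemo` and the sample bytes are made up for illustration. It decodes the same bytes once as ISO-8859-1 and once as UTF-8 (a common default on Linux, though the actual default depends on the system) and counts the fields found by `split`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode raw bytes with the given charset, then count fields separated by U+00FC.
    public static int fieldCount(byte[] raw, Charset cs) {
        String decoded = new String(raw, cs);
        return decoded.split("\u00fc").length;
    }

    public static void main(String[] args) {
        // Illustrative data: three fields separated by the byte 0xFC.
        byte[] raw = { 'a', (byte) 0xFC, 'b', (byte) 0xFC, 'c' };

        // What FileReader would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());

        // ISO-8859-1 maps byte 0xFC to char U+00FC, so the split finds the delimiter.
        System.out.println("ISO-8859-1 fields: "
                + fieldCount(raw, StandardCharsets.ISO_8859_1));

        // A lone 0xFC is not valid UTF-8, so it never decodes to U+00FC
        // and the whole line comes back as a single field.
        System.out.println("UTF-8 fields:      "
                + fieldCount(raw, StandardCharsets.UTF_8));
    }
}
```

Printing `Charset.defaultCharset()` on both machines is a quick way to confirm whether the platforms really differ.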

1.  Use a char like tab that won't be molested in pretty well any
encoding.

2. Use an explicit encoding on your Reader on all platforms.  See
http://mindprod.com/jgloss/encoding.html
http://mindprod.com/applets/fileio.html
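
Option 1 can be checked with a small sketch like the one below, assuming the data format may be changed to use tab as the delimiter. The class name `TabDelimiterDemo` and the sample bytes are illustrative. It only covers ASCII-compatible encodings; a multi-byte encoding such as UTF-16 would still need the explicit encoding of option 2.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TabDelimiterDemo {
    // Decode the bytes with the given charset and count tab-separated fields.
    public static int fieldCount(byte[] raw, Charset cs) {
        return new String(raw, cs).split("\t").length;
    }

    public static void main(String[] args) {
        // Illustrative data: three fields separated by the tab byte 0x09.
        byte[] raw = { 'a', '\t', 'b', '\t', 'c' };

        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };

        // Tab is in the ASCII range, which all three charsets decode identically.
        for (Charset cs : charsets) {
            System.out.println(cs + ": " + fieldCount(raw, cs) + " fields");
        }
    }
}
```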

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

dt1649651@yahoo.com - 31 Jan 2006 20:31 GMT
Hi Roedy,

Thanks for your detailed reply and your helpful URLs.
After reading all that encoding information, I ended up with the
following changes, and the program now works well under both Windows
and Linux.

1. Specify the encoding in the delimiter
    String delimiterMultiValueLevel1 = new String( cLevel1, "ISO8859_1");

2. Apply the encoding to the file reading.
    FileInputStream fi = new FileInputStream( args[1]);
    InputStreamReader s = new InputStreamReader( fi, "ISO8859_1");
    BufferedReader f = new BufferedReader(s);

Thanks again,

DT
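
For reference, the two changes above can be combined into one self-contained sketch. This is not DT's actual program: the class name `SplitLatin1File`, the method name, and the temporary sample file are illustrative, and it assumes Java 7 or later for `StandardCharsets` and try-with-resources. Two details are worth noting. `String` has no constructor that takes a `char[]` together with a charset name (only the `byte[]` one does), so step 1 presumably used a byte array. In any case a `'\u00fc'` literal is already a Unicode char and needs no encoding; only the reader does.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class SplitLatin1File {
    // U+00FC written as a char escape; no charset is involved for a source literal.
    private static final String DELIMITER = "\u00fc";

    // Read the first line of the file as ISO-8859-1 and split it on U+00FC.
    public static String[] readFirstLineFields(String fileName) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), StandardCharsets.ISO_8859_1))) {
            String line = in.readLine();
            if (line == null) {
                return new String[0]; // empty file
            }
            // Pattern.quote keeps the delimiter literal, since split takes a regex.
            return line.split(Pattern.quote(DELIMITER));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: three fields separated by the byte 0xFC.
        Path tmp = Files.createTempFile("fc-demo", ".dat");
        Files.write(tmp, new byte[] { 'x', (byte) 0xFC, 'y', (byte) 0xFC, 'z', '\n' });

        String[] fields = readFirstLineFields(tmp.toString());
        System.out.println(fields.length + " fields");

        Files.delete(tmp);
    }
}
```

Because the charset is named explicitly, the result no longer depends on the platform default.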

