Java Forum / First Aid / September 2005
output ascii text file
oleth - 25 Sep 2005 17:59 GMT
Hello everyone, I am trying to write output to a text file, but I am not sure whether the results I get are normal.
I am trying to write in ASCII format so that the file can be read with MS Notepad. I have tried 3 small test programs (they follow) using FileWriter, BufferedWriter, PrintStream and DataOutputStream. The problem is that when I open the file written with FileWriter or BufferedWriter, I only get a line of IIIIIIII... (not the letter "i", but another character similar to "i" without the dot over it, the one you get for ASCII codes 1 to 5). When I use DataOutputStream I see what I ought to see, but the problem is that it writes 2-digit chars and every character is followed by a space.
When I use PrintStream all is fine. Is this the only way?
Roedy Green - 25 Sep 2005 21:40 GMT
>I try to write out in ASCII format in order to be able to read with MS
>notepad. I have followed these 3 different tryout small programms (they
[quoted text clipped - 7 lines]
>is that is writes 2-digit chars and every character is followed by a
>space
DataOutputStream emits binary. This is not intended to be human-readable. See http://mindprod.com/jgloss/binary.hml http://mindprod.com/jgloss/binaryformat.html
If you want something readable use a Writer or PrintWriter.
See http://mindprod.com/applets/fileio.html for how.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Again taking new Java programming contracts.
oleth - 26 Sep 2005 09:49 GMT
> DataOutputStream emits binary. This is not intended to be
> human-readable. See http://mindprod.com/jgloss/binary.hml
[quoted text clipped - 6 lines]
> Canadian Mind Products, Roedy Green.
> http://mindprod.com Again taking new Java programming contracts.
I forgot to mention that I don't write strings, only separate chars. So I use:
FileWriter write(int)
BufferedWriter write(int)
PrintStream write(int)
DataOutputStream writeChar(char)
In the documentation it says they write characters.
Roedy Green - 26 Sep 2005 11:22 GMT
>In the documentation it says they write characters
Those are 16-bit chars. Your editor is expecting 8-bit chars. In any case use a Writer. You can then easily flip your encoding.
See http://mindprod.com/jgloss/encoding.html
Try looking at your file with notepad. It might make sense of it, especially if you start with an endian marker.
See http://mindprod.com/jgloss/utf.html
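To make the "use a Writer and flip your encoding" advice concrete, here is a minimal sketch. It is not code from the thread: the class name FlipEncodingDemo and the encode helper are invented for illustration, and the output goes to an in-memory buffer instead of a file so the byte counts are easy to inspect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
public class FlipEncodingDemo {

    /** Writes text through a Writer using the given charset and returns the raw bytes. */
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "abc";
        // Same characters, same Writer code; only the charset argument changes.
        System.out.println("US-ASCII: " + encode(text, StandardCharsets.US_ASCII).length + " bytes");
        System.out.println("UTF-16BE: " + encode(text, StandardCharsets.UTF_16BE).length + " bytes");
    }
}
```

With an 8-bit encoding each of the three letters takes one byte; with a 16-bit encoding each takes two, which is what an editor expecting 8-bit text stumbles over.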
oleth - 26 Sep 2005 12:49 GMT
> >In the documentation it says they write characters
>
> those are 16 bit chars. Your editor is expecting 8-bit chars. In any
> case use a Writer. You can then easily flip your encoding.
>
> See http://mindprod.com/jgloss/encoding.html
Hi Roedy,
I read what you proposed and you are right. I will stick with the Writer classes and yes, with OutputStreamWriter you can write in the encoding you want. But the problem remains: Notepad doesn't seem to recognize it at all (I get the IIIIII stuff). I tried GNOME's gedit and it said "can't recognize the encoding of the file. Maybe you try to edit binary file". On the other hand, the file size is 256 bytes, which I guess is the right size.
Here is the sample code I use, in case it helps:
import java.io.*;
import java.nio.charset.*;

public class WriteFile_encoding {

    public static void main(String[] args) throws IOException {
        String filename = new String("textfile.txt");
        char tempChar = 'a';
        char myChar[] = new char[1];
        int k;

        try {
            OutputStream outputFOS = new FileOutputStream(filename);
            OutputStream outputBOS = new BufferedOutputStream(outputFOS);
            OutputStreamWriter outputOSW = new OutputStreamWriter(outputBOS, "US-ASCII");

            // check to see what encoding is used
            System.out.println(outputOSW.getEncoding());

            for (k = 0; k <= 255; k++) {
                tempChar = (char) k;
                myChar[0] = (char) k;

                System.out.println("(" + k + ") " + tempChar);
                //outputOSW.write(k); // when I use that I get the same...
                outputOSW.write(myChar, 0, 1);
            }
            outputOSW.close();
        } catch (FileNotFoundException e) {
            // file could not be opened
            System.out.println("Unable to open file: " + filename);
        } catch (IOException e) {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}
oleth - 26 Sep 2005 13:03 GMT
I did some tests and it seems things are clearing up a little. I read the file (the output from the code in the previous post). I don't know how, but its encoding is set to Cp1253 (Windows Greek). I use Windows XP English, but as a Greek user I had added Greek support.
So I guess something is wrong with my code when writing the file.
Here is the code to read the file:
import java.io.*;
import java.nio.charset.*;

public class ReadFile_encoding {

    public static void main(String[] args) throws IOException {
        String filename = new String("textfile.txt");

        try {
            FileReader inFile = new FileReader(filename);
            BufferedReader inFileBR = new BufferedReader(inFile);

            InputStream inputFIS = new FileInputStream(filename);
            InputStream inputBIS = new BufferedInputStream(inputFIS);
            InputStreamReader temp_inputISR = new InputStreamReader(inputBIS);

            InputStreamReader inputISR = new InputStreamReader(inputBIS, temp_inputISR.getEncoding());

            System.out.println(inputISR.getEncoding());

            int i = 0;
            char tempChar = 'a';

            while ((i = inputISR.read()) != -1) {
                tempChar = (char) i;
                System.out.println("(" + i + ") " + tempChar);
            }

            inFileBR.close();
        } catch (FileNotFoundException e) {
            // file could not be opened
            System.out.println("Unable to open file: " + filename);
        } catch (IOException e) {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}
Oliver Wong - 26 Sep 2005 19:06 GMT
>I did some tests and it seems its clearing up a little. I read the file
> (the output from the code in the previous post). I dont know how, but
> its encoding is set to Cp1253 (Windows Greek).
> I use windows XP English but as a greek user I had added greek support.
I don't know how to solve your encoding problem, but I just wanted to comment on this line:
> String filename = new String("textfile.txt");
Is there a reason why you wrote this, instead of:
<code> String filename = "textfile.txt"; </code>
- Oliver
Roedy Green - 26 Sep 2005 21:25 GMT
>So I guess something is wrong with my code when writting the file.
Let me repeat my original advice. For code to read and write files please consult http://mindprod.com/applets/fileio.html
It will show you how to do it in binary or various encodings.
You don't write in one format and read in another.
The encodings you might find of interest are:
UTF-8 = 8-bit Unicode
Windows-1253 = 8-bit MS Greek
UTF-16 = 16-bit Unicode
Windows-1252 = Latin1 Windows default
If you want to experiment with writing 16-bit Unicode in binary with a DataOutputStream, write some 16-bit Unicode using a Writer first, then compare your two outputs using a hex viewer, so you can see what you are doing wrong.
See http://mindprod.com/jgloss/hex.html
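The comparison suggested above can also be done in memory rather than with a hex viewer. The following is only a sketch under assumptions: the class and method names are hypothetical, and UTF-16BE is chosen for the Writer because DataOutputStream.writeChar emits the high byte first.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original thread.
public class CompareBinaryAndWriter {

    /** Binary output: writeChar emits two bytes per char, high byte first. */
    public static byte[] withDataOutputStream(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            for (char c : text.toCharArray()) {
                out.writeChar(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Text output: the same chars sent through a Writer that encodes as UTF-16BE. */
    public static byte[] withWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_16BE)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Formats bytes the way a hex viewer would show them. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] binary = withDataOutputStream("Hi");
        byte[] text = withWriter("Hi");
        System.out.println("DataOutputStream: " + hex(binary));
        System.out.println("Writer UTF-16BE:  " + hex(text));
        System.out.println("identical: " + Arrays.equals(binary, text));
    }
}
```

Neither output carries a byte order mark, which is one reason an editor may not recognize such a file as 16-bit Unicode.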
oleth - 27 Sep 2005 09:51 GMT
> >So I guess something is wrong with my code when writting the file.
>
> Let me repeat my original advice. For code to read and write files
> please consult http://mindprod.com/applets/fileio.html
> It will show you how to do it in binary or various encodings.
Hi, I went there, read the page and then used the applet to produce the read/write code. I used the options:
sequential file | write | unbuffered | Locale encoding chars
sequential file | read | unbuffered | Locale encoding chars
I integrated (copy-paste) the code that the applet displayed into my code, using FileOutputStream - OutputStreamWriter - PrintWriter and FileInputStream - InputStreamReader.
But 2 problems remain:
1) When I write the file, I still can't read it with Notepad or gedit. Nano reads it. The first 31 are not recognized (expected). The last 126 are just question marks (?).
2) The hex editor reveals that chars 125-255 are all just 3F (the same).
Roedy Green - 27 Sep 2005 19:12 GMT
>But 2 problems remain:
>1) when I write the file, I still can't read it with notepad or gedit.
>Nano reads it. The first 31 are not recognized (expected). The last 126
>are just questionmarks (?)
>2) the hex editor reveals that chars 125-255 are all just 3F (the same).
You have three problems. You cannot do file I/O in an Applet without signing it. See http://mindprod.com/jgloss/applets.html http://mindprod.com/jgloss/signedapplets.html
What encoding did you specify for your write? Look at the list at http://mindprod.com/jgloss/encoding.html to make sure you used the proper name and that your encoding is supported.
Please show the code you used modified from FileIO to do your output. The devil is in the details.
It sounds like you chose an 8-bit encoding that does not support the Unicode characters you tried to display.
An example would be if you tried to take Unicode Chinese characters and display them in 8-bit Cp863 French Canadian DOS.
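The 3F bytes reported earlier fit this explanation: US-ASCII defines only codes 0 to 127, and Java's encoder substitutes a question mark (hex 3F) for anything it cannot map. A minimal sketch, with a hypothetical class name and using String.getBytes for brevity instead of the OutputStreamWriter from the thread:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
public class AsciiReplacementDemo {

    /** Encodes the char codes 0..255, in order, with the given charset. */
    public static byte[] encodeCodes0To255(Charset charset) {
        char[] chars = new char[256];
        for (int k = 0; k <= 255; k++) {
            chars[k] = (char) k;
        }
        // getBytes replaces unmappable characters with the charset's replacement byte.
        return new String(chars).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] ascii = encodeCodes0To255(StandardCharsets.US_ASCII);
        byte[] latin1 = encodeCodes0To255(StandardCharsets.ISO_8859_1);
        System.out.printf("US-ASCII:   code 127 -> %02X, code 128 -> %02X%n",
                ascii[127] & 0xFF, ascii[128] & 0xFF);
        System.out.printf("ISO-8859-1: code 127 -> %02X, code 128 -> %02X%n",
                latin1[127] & 0xFF, latin1[128] & 0xFF);
    }
}
```

Both arrays are 256 bytes long, so the file size alone does not reveal that the upper half was replaced.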
Andrew Thompson - 28 Sep 2005 08:56 GMT
>>But 2 problems remain:
>>1) when I write the file, I still can't read it with notepad or gedit.
[quoted text clipped - 3 lines]
>
> you have three problems. You cannot do file i/o in an Applet
When I first read oleth's reply, I made the same mistake.
The 'applet' that oleth refers to is your applet. I think oleth means they are copy/pasting the code from the text area of your FileI/O *applet* into their *application* source.
oleth - 28 Sep 2005 09:02 GMT
> >But 2 problems remain:
> >1) when I write the file, I still can't read it with notepad or gedit.
[quoted text clipped - 5 lines]
> signing it. See http://mindprod.com/jgloss/applets.html
> http://mindprod.com/jgloss/signedapplets.html
I don't use an applet in my code. I referred to the applet on the page http://mindprod.com/applets/fileio.html which "generates" I/O code.
> What encoding did you specify for your write? Look at the list at
> http://mindprod.com/jgloss/encoding.html to make sure you used the
> proper name and that your encoding is supported.
>
> Please show the code you used modified from FileIO to do your output.
> The devil is in the details.
That's absolutely true... I might be making some kind of stupid mistake somewhere that I can't find. I use "US-ASCII", which I found in both Sun's documentation and the page you provided. I would like to thank you for all your effort to help!
Here's the code:
/********************** write file **********************/
import java.io.*;
//import java.nio.charset.*;

public class WriteFile_encoding2 {

    public static void main(String[] args) throws IOException {
        String filename = new String("textfile");
        char tempChar = 'a';
        int k;

        try {
            // O P E N
            FileOutputStream fos = new FileOutputStream(filename);
            OutputStreamWriter eosw = new OutputStreamWriter(fos, "US-ASCII");

            PrintWriter prw = new PrintWriter(eosw, false /* auto flush on println */);

            // check to see what encoding is used
            System.out.println(eosw.getEncoding());

            for (k = 0; k <= 255; k++) {
                tempChar = (char) k;

                System.out.println("(" + k + ") " + tempChar);

                // W R I T E
                prw.write(k);
                prw.flush();
            }

            // C L O S E
            prw.close();
        } catch (FileNotFoundException e) {
            // file could not be opened
            System.out.println("Unable to open file: " + filename);
        } catch (IOException e) {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}
/********************** read file **********************/
import java.io.*;
import java.nio.charset.*;

public class ReadFile_encoding2 {

    public static void main(String[] args) throws IOException {
        String filename = new String("textfile");

        try {
            // O P E N
            FileInputStream fis = new FileInputStream(filename);
            InputStreamReader eisr = new InputStreamReader(fis, "US-ASCII");

            System.out.println(eisr.getEncoding());

            int aChar;
            char myChar = 'a';

            while ((aChar = eisr.read()) != -1) {
                // R E A D
                myChar = (char) aChar;
                System.out.println("(" + aChar + ") " + myChar);
            }

            // C L O S E
            eisr.close();
        } catch (FileNotFoundException e) {
            // file could not be opened
            System.out.println("Unable to open file: " + filename);
        } catch (IOException e) {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}
Roedy Green - 28 Sep 2005 10:08 GMT
>I dont use an applet in my code. I refered to the applet of the page
>http://mindprod.com/applets/fileio.html which "generates" I/O code.
That's good to hear. One headache out of the way.
Roedy Green - 28 Sep 2005 10:23 GMT
>"US-ASCII"
US-ASCII is an impoverished character set. It is about as basic as you can get. See http://mindprod.com/jgloss/ascii.html for the symbols it supports.
Try UTF-8 or Windows-1252 if you expect to see anything but basic unaccented letters, digits and typewriter punctuation.
Bit of history here. I remember sitting down with Vern Dettwiler (later founder of MacDonald Dettwiler). He was trying to come up with a 6-bit code that we would use at UBC. He was debating the merits of basing it on ASCII, which was the first time I had ever heard the term. He was bouncing ideas off various people about what they thought our code should look like.
My home tube-based machine used Friden Flexowriter 6-bit paper tape. TTYs used 8-bit paper tape code. Punch cards used 12-bits per column. Today we have Unicode which was 16 bits and has already grown to 32.
Every machine I worked on in the early days had its own local or proprietary encoding.
ASCII and Unicode were great leaps forward, more political coups than technological ones.
oleth - 28 Sep 2005 11:59 GMT
> >"US-ASCII"
>
[quoted text clipped - 21 lines]
> ASCII and Unicode were great leaps forward, more political coups than
> technological ones.
OK, the issue is settled now. If I use US-ASCII, I can write only the first 125 characters; the rest are written as a question mark "?" (63). If I use ISO-8859-1 or UTF-8 (the Sun docs say they are both standard charsets) I get all the characters 0-255.
Notepad, though, still can't read the UTF-8 file. I open it and it shows only the "IIII" stuff. The funny thing is that when I go to "Save As", Notepad on its own recognizes that the file is Unicode and sets the "Encoding" to Unicode by itself, but to no avail :) Never mind, I simply won't use Notepad. WordPad, Nano and gedit display the file just fine...
I greatly appreciate your help, Roedy. Thanks again for all your time and effort :)
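For readers wondering about file sizes with these two encodings, here is a small sketch (hypothetical class and method names, not code from the thread). ISO-8859-1 stores every code from 0 to 255 in one byte, while UTF-8 uses one byte for codes 0 to 127 and two bytes for codes 128 to 255.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
public class EncodedSizeDemo {

    /** Number of bytes the given charset needs for a single char. */
    public static int encodedLength(char c, Charset charset) {
        return String.valueOf(c).getBytes(charset).length;
    }

    /** Total size of a file holding the char codes 0..255 once each. */
    public static int totalBytes(Charset charset) {
        int total = 0;
        for (int k = 0; k <= 255; k++) {
            total += encodedLength((char) k, charset);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + totalBytes(StandardCharsets.ISO_8859_1) + " bytes");
        System.out.println("UTF-8:      " + totalBytes(StandardCharsets.UTF_8) + " bytes");
    }
}
```

The UTF-8 total works out to 128 x 1 + 128 x 2 = 384 bytes, against 256 bytes for ISO-8859-1.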
Oliver Wong - 28 Sep 2005 16:30 GMT
> Notepad though, still can't read the UTF8. I open it and it has only
> the "IIII "stuff. The funny thing is that when I go to "Save As" the
> file, alone notepad understand it's Unicode and sets by itself the
> "Encoding" to unicode, but in no vain :)
> Nevermind, I simply won't use notepad. Wordpad, Nano and gedit see the
> file just fine...
I was told by a friend (but did not verify independently) that Microsoft Notepad's implementation of Unicode encodings is broken. Something about byte order marks being misinterpreted.
- Oliver
Roedy Green - 29 Sep 2005 03:02 GMT
> I was told by a friend (but did not verify independently) that Microsoft
>Notepad's implementation of Unicode encodings are broken. Something about
>byte order marks being misinterpreted.
Notepad on my Win2k machine is fine. It just has no way to deal with unmarked UTF-8. It won't take user hints. It insists on seeing the marker.
oleth - 29 Sep 2005 08:36 GMT
> > I was told by a friend (but did not verify independently) that Microsoft
> >Notepad's implementation of Unicode encodings are broken. Something about
[quoted text clipped - 6 lines]
> Canadian Mind Products, Roedy Green.
> http://mindprod.com Again taking new Java programming contracts.
You are right. When the file is saved as "UTF-8" (from Notepad), it begins with EF BB BF.
I also note that it changes the values written (with a hex viewer you see completely different stuff in it). The file size also changes from 384 bytes to 575. I tried to figure out what encoding Notepad is using, but to no avail. I wrote files with Java as UTF-16BE, UTF-16LE and UTF-16, but no... The files Java produces are completely different from what Notepad saves.
In fact, when I use UTF-16 it sees "Unicode big endian" (in Save As). The LE and BE files are interpreted as ANSI...
If I want the bytes written to remain unaltered I have to use the "Unicode" encoding. It adds FF FE at the beginning and voila! The file is the same apart from the 2 bytes at the beginning.
Did I mention that when I read with Notepad the file saved by Notepad as "Unicode" or "UTF-8", I only get the IIIIII stuff? ...
As far as WordPad is concerned, when I open the file written by Java as UTF-8 it displays it correctly, but if you try to save the file as "Unicode Text Document", it adds FF FE and doubles its size from 384 to 776 bytes.
I am new to encodings and there might be a logical explanation, but it seems like magic... I should have changed the post's subject to "output magic tricks with basic text editors" :)
For the record, the Notepad version is: 5.1 (Build 2600.xpsp_sp2_rtm.040803-2158: Service Pack 2)
Roedy Green - 29 Sep 2005 09:19 GMT
> The files java produces are completely different from what
>notepad saves
When you save-as in notepad notice that you can select the format.
Roedy Green - 29 Sep 2005 09:33 GMT
>As far as concern Wordpad, when I open the file written by java as UTF8
>it sees it right, but if you try to save the file as "Unicode Text
>Document", it adds FF FE and it doubles its size from 384 to 776 bytes.
FF FE is "UnicodeLittle" 16 bit, little-endian, marked. I'd think you should also be able to read it with "UTF-16".
Note the table at http://mindprod.com/jgloss/utf.html of markers for identifying encodings and the table of encodings at http://mindprod.com/jgloss/encoding.html
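A short sketch of reading FF FE marked data with "UTF-16", as suggested above. The class and method names are hypothetical. Java's UTF-16 decoder uses a leading byte order mark to pick the byte order and does not return it as part of the text, whereas the UTF-16LE decoder leaves the mark in the result as the character U+FEFF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
public class MarkedUtf16Demo {

    /** Decodes with "UTF-16": the marker selects the byte order and is consumed. */
    public static String decodeWithUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    /** Decodes with "UTF-16LE": the marker survives as a leading U+FEFF char. */
    public static String decodeWithUtf16Le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // FF FE marker followed by "Hi" in little-endian 16-bit code units.
        byte[] marked = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00};
        System.out.println(decodeWithUtf16(marked));
        System.out.println(decodeWithUtf16Le(marked).length());
    }
}
```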
Roedy Green - 29 Sep 2005 09:45 GMT
>I wrote files with java as UTF-16BE, UTF-16LE and UTF-16,
>but no... The files java produces are completely different from what
>notepad saves
Try writing in Java with "UTF-8" encoding. The file you create will be missing the proper EF BB BF header.
You can glue one on this way:
Create a 3-byte file with a hex editor containing just EF BB BF and save it as head.txt (or use a Java FileOutputStream to create it).
Create your Java file in java.txt as you are now, with UTF-8 encoding.
copy /b head.txt + java.txt notepad.txt
Now examine notepad.txt in Notepad. It should be happy.
Now do the concatenation in Java with a FileInputStream reading raw bytes into a byte array and then writing the header, then the bytes. Alternatively, you can write the encoded file into a ByteArrayOutputStream, then dump the 3 bytes and that into a FileOutputStream. See http://mindprod.com/applets/fileio.html for how.
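The concatenation described above could look roughly like this in Java. It is a sketch only: the class name, the withMarker and prependMarker helpers, and the use of java.nio.file.Files instead of FileInputStream/FileOutputStream are choices made for brevity, not something the thread prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original thread.
public class PrependUtf8Marker {

    /** The three raw bytes that mark a file as UTF-8. */
    private static final byte[] UTF8_MARKER = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the marker followed by the already-encoded UTF-8 body. */
    public static byte[] withMarker(byte[] utf8Body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8Body.length + 3);
        out.write(UTF8_MARKER, 0, UTF8_MARKER.length);
        out.write(utf8Body, 0, utf8Body.length);
        return out.toByteArray();
    }

    /** File version of the "copy /b" step: reads raw bytes, writes header plus bytes. */
    public static void prependMarker(Path source, Path target) throws IOException {
        Files.write(target, withMarker(Files.readAllBytes(source)));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] marked = withMarker(body);
        System.out.println(body.length + " bytes before, " + marked.length + " bytes after");
    }
}
```

A call such as prependMarker(Paths.get("java.txt"), Paths.get("notepad.txt")) would mirror the head.txt steps; the bytes are copied raw, so no encoder gets a chance to alter the marker.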
What is the matter with Sun? They support every encoding under the Sun but don't let you create files in the main interchange encoding that anyone should ever use nowadays -- marked UTF-8.
Roedy Green - 29 Sep 2005 03:01 GMT
>Notepad though, still can't read the UTF8.
Notepad wants a marker, EF BB BF, on the front of its UTF-8 files. Does the file you create have that marker? You can check with a hex viewer.
See http://mindprod.com/jgloss/hex.html
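If no hex viewer is at hand, the same check can be sketched in a few lines of Java. The class and method names here are hypothetical, and the demo works on byte arrays; for a real file the bytes could come from Files.readAllBytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
public class Utf8MarkerCheck {

    /** True if the data begins with the bytes EF BB BF. */
    public static boolean startsWithUtf8Marker(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // What OutputStreamWriter with "UTF-8" produces for plain text: no marker.
        byte[] unmarked = "abc".getBytes(StandardCharsets.UTF_8);
        // The same text with the three marker bytes in front.
        byte[] marked = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x61, 0x62, 0x63};
        System.out.println("unmarked: " + startsWithUtf8Marker(unmarked));
        System.out.println("marked:   " + startsWithUtf8Marker(marked));
    }
}
```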
Quoting from http://mindprod.com/encoding.html under UTF-8:
8-bit encoded Unicode. née UTF8. Optional marker on front of file: EF BB BF for reading. Unfortunately, OutputStreamWriter does not automatically insert the marker on writing. Notepad can't read the file without this marker.
Now the question is, how do you get that marker in there? That is a trickier question than it first appears. You can't just emit the bytes EF BB BF since they will be encoded and changed! I don't have a simple answer off the top of my head. Does anyone?
oleth - 29 Sep 2005 08:41 GMT
Oops, mistake. I replied one post back. I meant to reply to your last post but I hit the wrong Reply link...
Roedy Green - 29 Sep 2005 15:55 GMT
>Now the question is, how do you get that marker in there? That is a
>trickier question than it first appears. You can't just emit the bytes
>EF BB BF since they will be encoded and changed! I don't have a
>simple answer off the top of my head. Does anyone?
BONG BONG BONG -- sound of head hitting screen. The solution to this problem is ever so much simpler than I gave earlier.
You can't just emit the bytes EF BB BF since they will be encoded and changed. However, the solution is quite simple: prw.write( '\ufeff' ); at the head of the file. This will be encoded as EF BB BF.
Similarly for UTF-16, to put the byte order mark in at the head of the file use prw.write( '\ufeff' ); This will be encoded as FE FF.
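Here is a minimal sketch of that trick, writing to memory so the resulting bytes can be printed. The class and helper names are hypothetical. One caveat worth labelling: the FE FF result applies to a big-endian encoder such as "UTF-16BE" (with "UTF-16LE" the same character comes out as FF FE), and Java's plain "UTF-16" encoder already writes FE FF on its own, so an explicit write there would add a second mark.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original thread.
public class BomViaWriterDemo {

    /** Writes U+FEFF and then the text through a PrintWriter in the given charset. */
    public static byte[] writeWithBom(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter prw = new PrintWriter(new OutputStreamWriter(sink, charset))) {
            prw.write('\ufeff'); // the encoder turns this char into the marker bytes
            prw.write(text);
        }
        return sink.toByteArray();
    }

    /** Formats bytes the way a hex viewer would show them. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + hex(writeWithBom("a", StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE: " + hex(writeWithBom("a", StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE: " + hex(writeWithBom("a", StandardCharsets.UTF_16LE)));
    }
}
```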