Java Forum / First Aid / September 2005

output ascii text file

oleth - 25 Sep 2005 17:59 GMT
Hello everyone,
I am trying to write output to a text file, but I am not sure whether
the results I get are normal.

I am trying to write in ASCII format so that I can read the file with MS
Notepad. I have tried these 3 small test programs (they follow) using
FileWriter, BufferedWriter, PrintStream and DataOutputStream.
The problem is that when I open the file written with FileWriter or
BufferedWriter, I only get a line of IIIIIIII... (not the letter "i",
but a similar character without the dot over it, the one you get for
ASCII codes 1 to 5).
When I use DataOutputStream I see what I ought to see, but it writes
two-byte chars and every character is followed by a space.

When I use PrintStream all is fine. Is this the only way?
Roedy Green - 25 Sep 2005 21:40 GMT
>I try to write out in ASCII format in order to be able to read with MS
>notepad. I have followed these 3 different tryout small programms (they
[quoted text clipped - 7 lines]
>is that is writes 2-digit chars and every character is followed by a
>space

DataOutputStream emits binary. This is not intended to be
human-readable. See http://mindprod.com/jgloss/binary.hml
http://mindprod.com/jgloss/binaryformat.html
If you want something readable use a Writer or PrintWriter.

see http://mindprod.com/applets/fileio.html
for how.
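
A minimal sketch of the binary-versus-text difference, assuming Java 8 or later (the class and method names are hypothetical): DataOutputStream.writeChar emits each char as two raw bytes, high byte first, while a Writer encodes chars through a charset. For plain Latin letters the high byte is zero, which is presumably the "space" that shows up after every character.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BinaryVsText {

    // DataOutputStream.writeChar writes each char as two bytes, high byte first.
    public static byte[] viaDataOutputStream(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (char c : s.toCharArray()) {
                out.writeChar(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // A Writer encodes chars with a named charset; ISO-8859-1 gives one byte per char.
    public static byte[] viaWriter(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.ISO_8859_1)) {
            out.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(viaDataOutputStream("Hi").length); // 4 bytes: 00 48 00 69
        System.out.println(viaWriter("Hi").length);           // 2 bytes: 48 69
    }
}
```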

Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.

oleth - 26 Sep 2005 09:49 GMT
> DataOutputStream emits binary. This is not intended to be
> human-readable. See http://mindprod.com/jgloss/binary.hml
[quoted text clipped - 6 lines]
> Canadian Mind Products, Roedy Green.
> http://mindprod.com Again taking new Java programming contracts.

I forgot to mention that I don't write strings, only separate chars. So
I use

FileWriter write(int)
BufferedWriter write(int)
PrintStream write(int)
DataOutputStream writeChar(char)

In the documentation it says they write characters.
Roedy Green - 26 Sep 2005 11:22 GMT
>In the documentation it says they write characters

Those are 16-bit chars. Your editor is expecting 8-bit chars. In any
case, use a Writer. You can then easily flip your encoding.

See http://mindprod.com/jgloss/encoding.html

Try looking at your file with Notepad. It might make sense of it,
especially if you start with an endian marker.

See http://mindprod.com/jgloss/utf.html
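
A small sketch of "flipping the encoding", assuming Java 8 or later (the class and method names are hypothetical): the same two-character string goes through an OutputStreamWriter four times, with only the charset argument changed, and the resulting bytes are printed in hex.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FlipEncoding {

    // Encodes text through a Writer; only the charset argument changes between runs.
    public static byte[] encode(String text, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "A\u00e9"; // 'A' followed by e-acute
        Charset[] charsets = {
            StandardCharsets.US_ASCII,   // 41 3F (e-acute is replaced by '?')
            StandardCharsets.ISO_8859_1, // 41 E9
            StandardCharsets.UTF_8,      // 41 C3 A9
            StandardCharsets.UTF_16      // FE FF 00 41 00 E9 (big-endian, marker added)
        };
        for (Charset cs : charsets) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(text, cs)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(cs.name() + ": " + hex.toString().trim());
        }
    }
}
```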

oleth - 26 Sep 2005 12:49 GMT
> >In the documentation it says they write characters
>
> those are 16 bit chars.  Your editor is expecting 8-bit chars. In any
> case use a Writer.  You can then easily flip your encoding.
>
> See http://mindprod.com/jgloss/encoding.html

Hi Roedy,

I read what you proposed and you are right. I'll stick with the Writer
classes, and yes, with OutputStreamWriter you can specify whatever
encoding you want. But the problem remains...
Notepad doesn't seem to recognize it at all (I get the IIIIII stuff). I
tried GNOME's gedit and it said "can't recognize the encoding of
the file. Maybe you try to edit binary file"
On the other hand, the file size is 256 bytes, which I guess is the
right size.

Here is the sample code I use in case it helps:

import java.io.*;
import java.nio.charset.*;

public class WriteFile_encoding{

    public static void main ( String[] args) throws IOException
    {
        String filename = new String("textfile.txt");
        char tempChar = 'a';
        char myChar[] = new char[1];
        int k;

        try
        {
            OutputStream outputFOS = new FileOutputStream(filename);
            OutputStream outputBOS = new BufferedOutputStream(outputFOS);
            OutputStreamWriter outputOSW =
                    new OutputStreamWriter(outputBOS, "US-ASCII");

            // check to see what encoding is used
            System.out.println( outputOSW.getEncoding() );

            for(k=0; k<=255; k++)
            {
                tempChar = (char) k;
                myChar[0] = (char) k;

                System.out.println("("+ k +") " + tempChar);
                //outputOSW.write(k); // when I use that I get the same...
                outputOSW.write(myChar, 0, 1);

            }
            outputOSW.close();
        }
        catch(FileNotFoundException e)
        {
            //file could not be opened
            System.out.println("Unable to open file: " + filename);
        }
        catch(IOException e)
        {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}
oleth - 26 Sep 2005 13:03 GMT
I did some tests and it seems it's clearing up a little. I read the file
(the output from the code in the previous post). I don't know how, but
its encoding is set to Cp1253 (Windows Greek).
I use Windows XP English, but as a Greek user I had added Greek support.

So I guess something is wrong with my code when writing the file.

Here is the code to read the file:

import java.io.*;
import java.nio.charset.*;

public class ReadFile_encoding {

    public static void main ( String[] args) throws IOException
    {
        String filename = new String("textfile.txt");

        try
        {
            FileReader inFile = new FileReader(filename);
            BufferedReader inFileBR = new BufferedReader(inFile );

            InputStream inputFIS = new FileInputStream(filename);
            InputStream inputBIS = new BufferedInputStream(inputFIS);
            InputStreamReader temp_inputISR =
                    new InputStreamReader(inputBIS);

            InputStreamReader inputISR =
                    new InputStreamReader(inputBIS, temp_inputISR.getEncoding());

            System.out.println(inputISR.getEncoding() );

            int i=0;
            char tempChar = 'a';

            while( ( i=inputISR.read() )!=-1 )
            {
                tempChar = (char)    i;
                System.out.println( "(" + i + ") " + tempChar );
            }

            inputISR.close();
            inFileBR.close();
        }
        catch(FileNotFoundException e)
        {
            //file could not be opened
            System.out.println("Unable to open file: " + filename);
        }
        catch(IOException e)
        {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}
Oliver Wong - 26 Sep 2005 19:06 GMT
>I did some tests and it seems its clearing up a little. I read the file
> (the output from the code in the previous post). I dont know how, but
> its encoding is set to Cp1253 (Windows Greek).
> I use windows XP English but as a greek user I had added greek support.

   Don't know how to solve your encoding problem, but I just wanted to
comment on this line:

> String filename = new String("textfile.txt");

   Is there a reason why you wrote this, instead of:

<code>
String filename = "textfile.txt";
</code>
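
A quick illustration of why the literal is enough (the class and method names are made up for this sketch): new String(...) always allocates a fresh copy with the same characters, so it adds an extra object without changing the value.

```java
public class StringLiteralDemo {

    public static final String LITERAL = "textfile.txt";

    // new String(...) always returns a newly allocated copy of its argument.
    public static String copyOfLiteral() {
        return new String(LITERAL);
    }

    public static void main(String[] args) {
        String copy = copyOfLiteral();
        System.out.println(LITERAL.equals(copy)); // true: same characters
        System.out.println(LITERAL == copy);      // false: two distinct objects
    }
}
```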

   - Oliver
Roedy Green - 26 Sep 2005 21:25 GMT
>So I guess something is wrong with my code when writting the file.

Let me repeat my original advice. For code to read and write files
please consult http://mindprod.com/applets/fileio.html

It will show you how to do it in binary or various encodings.

You don't write in one format and read in another.

The encodings you might find of interest are:
UTF-8 = 8 bit Unicode
Windows-1253 = 8-bit MS Greek
UTF-16 = 16 bit Unicode
Windows-1252 = Latin1 Windows default
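
To illustrate "you don't write in one format and read in another", here is a minimal sketch, assuming Java 8 or later and a writable temp directory (the class and method names are hypothetical). It writes text to a temp file with one charset and reads it back with another, using the same OutputStreamWriter/InputStreamReader classes as the code in this thread.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SameEncodingBothWays {

    // Writes text with writeCs, reads it back with readCs, and returns what was read.
    public static String roundTrip(String text, Charset writeCs, Charset readCs) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), writeCs)) {
                    w.write(text);
                }
                StringBuilder sb = new StringBuilder();
                try (Reader r = new InputStreamReader(Files.newInputStream(file), readCs)) {
                    int c;
                    while ((c = r.read()) != -1) {
                        sb.append((char) c);
                    }
                }
                return sb.toString();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(text.equals(same));  // true
        System.out.println(text.equals(mixed)); // false: the e-acute came back as two chars
        System.out.println(mixed.length());     // 5
    }
}
```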

If you want to experiment with writing 16-bit Unicode in binary with a
DataOutputStream, write some 16-bit Unicode using a Writer first, then
compare your two outputs using a hex viewer, so you can see what you
are doing wrong.

See http://mindprod.com/jgloss/hex.html
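
If no hex viewer is at hand, a few lines of Java can dump a file's bytes. This is a minimal sketch, assuming Java 7 or later; the class name and the output layout (offset, then 16 bytes per row) are my own choices, not a standard tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexDump {

    // Formats bytes as rows of 16 two-digit hex values, each row prefixed by its offset.
    public static String hexDump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                if (i > 0) {
                    sb.append('\n');
                }
                sb.append(String.format("%08X:", i));
            }
            sb.append(String.format(" %02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexDump textfile.txt
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hexDump(data));
    }
}
```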

oleth - 27 Sep 2005 09:51 GMT
> >So I guess something is wrong with my code when writting the file.
>
> Let me repeat my original advice. For code to read and write files
> please consult http://mindprod.com/applets/fileio.html
> It will show you how to do it in binary or various encodings.

Hi,
I went there, read the page and then used the applet to produce the
read/write code. I used the options

sequential file | write | unbuffered | Locale encoding chars
sequential file | read | unbuffered | Locale encoding chars

I integrated the code (copy-paste) that the applet displayed into my
code, using
FileOutputStream - OutputStreamWriter - PrintWriter and
FileInputStream - InputStreamReader

But 2 problems remain:
1) When I write the file, I still can't read it with Notepad or gedit.
Nano reads it. The first 31 are not recognized (expected). The last 126
are just question marks (?).
2) The hex editor reveals that chars 125-255 are all just 3F (the same).
Roedy Green - 27 Sep 2005 19:12 GMT
>But 2 problems remain:
>1) when I write the file, I still can't read it with notepad or gedit.
>Nano reads it. The first 31 are not recognized (expected). The last 126
>are just questionmarks (?)
>2) the hex editor reveals that chars 125-255 are all just 3F (the same).

You have three problems. You cannot do file I/O in an Applet without
signing it. See http://mindprod.com/jgloss/applets.html
http://mindprod.com/jgloss/signedapplets.html

What encoding did you specify for your write?  Look at the list at
http://mindprod.com/jgloss/encoding.html to make sure you used the
proper name and that your encoding is supported.
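
One way to check the name from Java itself is to ask java.nio.charset.Charset. A minimal sketch (the helper name is hypothetical): Charset.isSupported tells you whether the running JVM knows the name, and Charset.forName(...).name() maps an alias such as "latin1" to its canonical name.

```java
import java.nio.charset.Charset;

public class CharsetNames {

    // Returns the canonical name for a charset name or alias, or null if this JVM lacks it.
    public static String canonicalName(String name) {
        return Charset.isSupported(name) ? Charset.forName(name).name() : null;
    }

    public static void main(String[] args) {
        String[] names = {"US-ASCII", "latin1", "UTF8", "No-Such-Charset"};
        for (String name : names) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```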

Please show the code you used modified from FileIO to do your output.
The devil is in the details.

It sounds like you chose an 8-bit encoding that does not support the
Unicode characters you tried to display.

An example would be if you tried to take Unicode Chinese characters
and display them in 8-bit Cp863 French Canadian DOS.
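
Whether an encoding supports the characters you write can be checked with CharsetEncoder.canEncode. This sketch (hypothetical class and method names) counts how many of the chars 0 to 255 that oleth writes each charset can represent; US-ASCII manages only 128 of them.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CoverageCheck {

    // Counts how many of the chars 0..255 the named charset can encode.
    public static int encodableBelow256(String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        int count = 0;
        for (int k = 0; k <= 255; k++) {
            if (encoder.canEncode((char) k)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            System.out.println(name + ": " + encodableBelow256(name) + " of 256");
        }
    }
}
```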

Andrew Thompson - 28 Sep 2005 08:55 GMT
>>But 2 problems remain:
>>1) when I write the file, I still can't read it with notepad or gedit.
[quoted text clipped - 3 lines]
>
> you have three problems.  You cannot do file i/o in an Applet

When I first read oleth's reply, I made the same mistake.

The 'applet' that oleth refers to is your applet.
I think oleth means they are copy/pasting the code from
the text area of your FileI/O *applet* into their
*application* source.
oleth - 28 Sep 2005 09:02 GMT
> >But 2 problems remain:
> >1) when I write the file, I still can't read it with notepad or gedit.
[quoted text clipped - 5 lines]
> signing it.  See http://mindprod.com/jgloss/applets.html
> http://mindprod.com/jgloss/signedapplets.html

I don't use an applet in my code. I referred to the applet on the page
http://mindprod.com/applets/fileio.html, which "generates" I/O code.

> What encoding did you specify for your write?  Look at the list at
> http://mindprod.com/jgloss/encoding.html to make sure you used the
> proper name and that your encoding is supported.
>
> Please show the code you used modified from FileIO to do your output.
> The devil is in the details.
That's absolutely true... I might be making some kind of stupid mistake
somewhere that I can't find.
I use "US-ASCII", which I found in both Sun's documentation and
the page you provided.
I would like to thank you for all your effort to help!

Here's the code:

/**********************
write file
**********************/

import java.io.*;
//import java.nio.charset.*;

public class WriteFile_encoding2{

    public static void main ( String[] args) throws IOException
    {
        String filename = new String("textfile");
        char tempChar = 'a';
        int k;

        try
        {
            // O P E N
            FileOutputStream fos = new FileOutputStream( filename  );
            OutputStreamWriter eosw = new OutputStreamWriter( fos,"US-ASCII" );

            PrintWriter prw = new PrintWriter( eosw, false /* auto flush on println */ );

            // check to see what encoding is used
            System.out.println( eosw.getEncoding() );

            for(k=0; k<=255; k++)
            {
                tempChar = (char) k;

                System.out.println("("+ k +") " + tempChar);

                // W R I T E
                prw.write( k );
                prw.flush();

            }

            // C L O S E
            prw.close();
        }
        catch(FileNotFoundException e)
        {
            //file could not be opened
            System.out.println("Unable to open file: " + filename);
        }
        catch(IOException e)
        {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}

/**********************
read file
**********************/

import java.io.*;
import java.nio.charset.*;

public class ReadFile_encoding2 {

    public static void main ( String[] args) throws IOException
    {
        String filename = new String("textfile");

        try
        {
            // O P E N
            FileInputStream fis = new FileInputStream( filename);
            InputStreamReader eisr = new InputStreamReader( fis,"US-ASCII" );

            System.out.println(eisr.getEncoding() );

            int aChar;
            char myChar='a';

            while( ( aChar=eisr.read() )!=-1 )
            {
                // R E A D
                myChar= (char) aChar;
                System.out.println( "(" + aChar + ") " + myChar);
            }

            // C L O S E
            eisr.close();
        }
        catch(FileNotFoundException e)
        {
            //file could not be opened
            System.out.println("Unable to open file: " + filename);
        }
        catch(IOException e)
        {
            // the file could not be read or closed
            System.out.println("Unable to read or close file: " + filename);
        }
    }
}
Roedy Green - 28 Sep 2005 10:08 GMT
>I dont use an applet in my code. I refered to the applet of the page
>http://mindprod.com/applets/fileio.html which "generates" I/O code.

That's good to hear. One headache out of the way.

Roedy Green - 28 Sep 2005 10:23 GMT
>"US-ASCII"

US-ASCII is an impoverished character set.  It is about as basic as
you can get. See http://mindprod.com/jgloss/ascii.html
for the symbols it supports.

Try UTF-8  or Windows-1252 if you expect to see anything but basic
unaccented letters, digits and typewriter punctuation.
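
A minimal sketch, assuming Java 8 or later (hypothetical class and method names), that mimics oleth's loop: it writes the chars 0 to 255 through an OutputStreamWriter and inspects the bytes. With US-ASCII every char from 128 up is replaced by '?' (0x3F). With UTF-8 the same 256 chars come out as 384 bytes, since chars 128 to 255 take two bytes each, which matches the 384-byte file size reported later in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiReplacement {

    // Writes chars 0..255 one at a time through an OutputStreamWriter, as oleth's program does.
    public static byte[] writeAll256(Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            for (int k = 0; k <= 255; k++) {
                w.write(k);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // US-ASCII: 256 bytes, 129 '?' (char 63 itself plus 128 replacements)
        // ISO-8859-1: 256 bytes, 1 '?'
        // UTF-8: 384 bytes, 1 '?'
        for (Charset cs : new Charset[] {StandardCharsets.US_ASCII,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8}) {
            byte[] out = writeAll256(cs);
            int questionMarks = 0;
            for (byte b : out) {
                if (b == '?') {
                    questionMarks++;
                }
            }
            System.out.println(cs.name() + ": " + out.length + " bytes, "
                    + questionMarks + " of them '?' (0x3F)");
        }
    }
}
```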

Bit of history here. I remember sitting down with Vern Detwiler
(later founder of Macdonald Detwiler). He was trying to come up with
a 6-bit code that we would use at UBC. He was debating the merits of
basing it on ASCII, which was the first time I had ever heard the
term. He was bouncing ideas off various people about what they thought
our code should look like.

My home tube-based machine used Friden Flexowriter 6-bit paper tape.
TTYs used 8-bit paper tape code. Punch cards used 12 bits per column.
Today we have Unicode, which was 16 bits and has already grown to 32.

Every machine I worked on in the early days had its own local or
proprietary encoding.

ASCII and Unicode were great leaps forward, more political coups than
technological ones.

oleth - 28 Sep 2005 11:59 GMT
> >"US-ASCII"
>
[quoted text clipped - 21 lines]
> ASCII and Unicode were great leaps forward, more political coups than
> technological ones.

OK, the issue is settled now.
If I use US-ASCII, I can write only the first 128 characters (0-127);
the rest are written as question marks "?" (63).
If I use ISO-8859-1 or UTF-8 (the Sun docs say they are both standard
charsets), I get all the characters 0-255.

Notepad, though, still can't read the UTF-8 file. I open it and it has
only the "IIII" stuff. The funny thing is that when I go to "Save As",
Notepad on its own recognizes that the file is Unicode and sets the
"Encoding" to Unicode, but in vain :)
Never mind, I simply won't use Notepad. WordPad, Nano and gedit see the
file just fine...

I greatly appreciate your help, Roedy. Thanks again for all your time
and effort :)
Oliver Wong - 28 Sep 2005 16:30 GMT
> Notepad though, still can't read the UTF8. I open it and it has only
> the "IIII "stuff. The funny thing is that when I go to "Save As" the
> file, alone notepad understand it's Unicode and sets by itself the
> "Encoding" to unicode, but in no vain :)
> Nevermind, I simply won't use notepad. Wordpad, Nano and gedit see the
> file just fine...

   I was told by a friend (but did not verify independently) that Microsoft
Notepad's implementation of Unicode encodings is broken. Something about
byte order marks being misinterpreted.

   - Oliver
Roedy Green - 29 Sep 2005 03:02 GMT
>    I was told by a friend (but did not verify independently) that Microsoft
>Notepad's implementation of Unicode encodings are broken. Something about
>byte order marks being misinterpreted.

Notepad on my Win2k machine is fine. It's just that it has no way to
deal with unmarked UTF-8. It won't take user hints; it insists on
seeing the marker.

oleth - 29 Sep 2005 08:36 GMT
> >    I was told by a friend (but did not verify independently) that Microsoft
> >Notepad's implementation of Unicode encodings are broken. Something about
[quoted text clipped - 6 lines]
> Canadian Mind Products, Roedy Green.
> http://mindprod.com Again taking new Java programming contracts.

You are right. When I save as "UTF-8" from Notepad, the file starts
with EF BB BF.

I also notice that it changes the values written (in the hex editor you
see completely different content). The file size also changes from 384
bytes to 575. I tried to figure out what encoding Notepad is using, but
in vain. I wrote files with Java as UTF-16BE, UTF-16LE and UTF-16,
but no... The files Java produces are completely different from what
Notepad saves.

In fact, when I use UTF-16, Notepad sees it as "Unicode big endian" (in
the Save As dialog). The LE and BE files are interpreted as ANSI...

If I want the bytes written to remain unaltered, I have to use the
"Unicode" encoding. It adds FF FE at the beginning and voilà! The file
is the same apart from the 2 bytes at the beginning.

Did I mention that when I read with Notepad the file saved by Notepad
as "Unicode" or "UTF-8", I only get the IIIIII stuff?...

As far as WordPad is concerned, when I open the file written by Java as
UTF-8 it displays it correctly, but if you try to save the file as
"Unicode Text Document", it adds FF FE and doubles its size from 384 to
776 bytes.

I am new to encodings and there might be a logical explanation, but it
seems like magic... I should have changed the post's subject to
"output magic tricks with basic text editors" :)

For the record, the Notepad version is: 5.1 (Build
2600.xpsp_sp2_rtm.040803-2158: Service Pack 2)
Roedy Green - 29 Sep 2005 09:19 GMT
> The files java produces are completely different from what
>notepad saves

When you save-as in notepad notice that you can select the format.

Roedy Green - 29 Sep 2005 09:33 GMT
>As far as concern Wordpad, when I open the file written by java as UTF8
>it sees  it right, but if you try to save the file as "Unicode Text
>Document", it adds FF FE and it doubles its size from 384 to 776 bytes.

FF FE marks "UnicodeLittle": 16-bit, little-endian, marked. I'd think
you should also be able to read it with "UTF-16".

Note the table at http://mindprod.com/jgloss/utf.html of markers for
identifying encodings and the table of encodings at
http://mindprod.com/jgloss/encoding.html
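
A small sketch to back up the "UTF-16" suggestion (the class and method names are hypothetical): Java's "UTF-16" decoder reads a leading byte order mark to choose the byte order, defaults to big-endian when there is none, and leaves the mark out of the decoded text.

```java
import java.nio.charset.StandardCharsets;

public class ReadMarkedUtf16 {

    // Decodes with "UTF-16": a leading FF FE or FE FF picks the byte order,
    // big-endian is assumed without one, and the mark is not part of the result.
    public static String decodeUtf16(byte[] data) {
        return new String(data, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] littleEndianMarked = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00};
        byte[] bigEndianUnmarked = {0x00, 0x48, 0x00, 0x69};
        System.out.println(decodeUtf16(littleEndianMarked)); // Hi
        System.out.println(decodeUtf16(bigEndianUnmarked));  // Hi
    }
}
```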

Roedy Green - 29 Sep 2005 09:45 GMT
>I wrote files with java as UTF-16BE, UTF-16LE and UTF-16,
>but no... The files java produces are completely different from what
>notepad saves

Try writing in Java with "UTF-8" encoding. The file you create will be
missing the proper EF BB BF header.

You can glue one on this way:

Create a 3-byte file with a hex editor containing just EF BB BF and
save it as head.txt (or use a Java FileOutputStream to create it).

Create your Java output in java.txt as you do now, with UTF-8 encoding.

copy /b head.txt + java.txt notepad.txt

Now examine notepad.txt in Notepad. It should be happy.

Now do the concatenation in Java: read the raw bytes of java.txt with a
FileInputStream into a byte array, then write the header followed by
those bytes. See http://mindprod.com/applets/fileio.html
for how. Alternatively, you can encode the text into a
ByteArrayOutputStream, then dump the 3 header bytes followed by its
contents into a FileOutputStream. See http://mindprod.com/applets/fileio.html
for how.
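
A minimal sketch of the ByteArrayOutputStream variant, assuming Java 8 or later (the class and method names are hypothetical): the text is encoded into memory first, and the three marker bytes are written raw, so no encoder gets a chance to change them. The output name notepad.txt is borrowed from the copy /b example.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class PrependUtf8Marker {

    private static final byte[] UTF8_MARKER = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Encodes the text as UTF-8 into memory.
    public static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(encoded, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return encoded.toByteArray();
    }

    // Returns the 3 raw marker bytes followed by the encoded text.
    public static byte[] markedUtf8(String text) {
        byte[] body = encodeUtf8(text);
        byte[] all = new byte[UTF8_MARKER.length + body.length];
        System.arraycopy(UTF8_MARKER, 0, all, 0, UTF8_MARKER.length);
        System.arraycopy(body, 0, all, UTF8_MARKER.length, body.length);
        return all;
    }

    public static void main(String[] args) throws IOException {
        try (OutputStream out = new FileOutputStream("notepad.txt")) {
            out.write(markedUtf8("caf\u00e9"));
        }
    }
}
```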

What is the matter with Sun? They support every encoding under the Sun
but don't let you create files in the main interchange encoding that
anyone should ever use nowadays -- marked UTF-8.

Roedy Green - 29 Sep 2005 03:01 GMT
>Notepad though, still can't read the UTF8.

Notepad wants the marker EF BB BF on the front of its UTF-8 files.
Does the file you create have that marker? You can check with a hex
viewer.

See http://mindprod.com/jgloss/hex.html
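
The hex-viewer check can also be done in code. A minimal sketch, assuming Java 7 or later (the class and method names are hypothetical), that reports whether a file starts with EF BB BF:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HasUtf8Marker {

    // True if the data begins with the UTF-8 marker EF BB BF.
    public static boolean startsWithUtf8Marker(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HasUtf8Marker textfile.txt
        // Reads the whole file, which is fine for small text files like these.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(startsWithUtf8Marker(data) ? "marked UTF-8" : "no EF BB BF marker");
    }
}
```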

Quoting from http://mindprod.com/encoding.html under UTF-8:

8-bit encoded Unicode. née UTF8. Optional marker on front of file: EF
BB BF for reading. Unfortunately, OutputStreamWriter does not
automatically insert the marker on writing. Notepad can't read the
file without this marker.

Now the question is, how do you get that marker in there?  That is a
trickier question than it first appears. You can't just emit the bytes
EF BB BF since they will be encoded and changed!  I don't have a
simple answer off the top of my head.  Does anyone?
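
To see why emitting those bytes as chars goes wrong, this sketch (hypothetical class and method names, Java 8 or later) writes the three chars U+00EF, U+00BB and U+00BF through a UTF-8 OutputStreamWriter. Each of them is a two-byte sequence in UTF-8, so six bytes come out instead of the intended three.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class NaiveMarker {

    // Encodes chars to bytes through a UTF-8 OutputStreamWriter.
    public static byte[] encodeUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Three chars whose code values happen to be 0xEF, 0xBB and 0xBF.
        byte[] naive = encodeUtf8("\u00ef\u00bb\u00bf");
        StringBuilder hex = new StringBuilder();
        for (byte b : naive) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // C3 AF C2 BB C2 BF
    }
}
```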

oleth - 29 Sep 2005 08:41 GMT
Oops, mistake. I replied one post back. I meant to reply to your last
post, but I hit the wrong Reply link...
Roedy Green - 29 Sep 2005 15:55 GMT
>Now the question is, how do you get that marker in there?  That is a
>trickier question than it first appears. You can't just emit the bytes
>EF BB BF since they will be encoded and changed!  I don't have a
>simple answer off the top of my head.  Does anyone?

BONG BONG BONG -- sound of head hitting screen.  The solution to this
problem is ever so much simpler than I gave earlier.

You can't just emit the bytes EF BB BF since they will be encoded and
changed. However, the solution is quite simple. prw.write( '\ufeff' );
at the head of the file. This will be encoded as EF BB BF.

Similarly for UTF-16BE, to put the byte order mark in at the head of the
file, use prw.write( '\ufeff' ); this will be encoded as FE FF. (Java's
plain "UTF-16" encoder already writes FE FF on its own.)
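
A minimal sketch of the '\ufeff' approach, assuming Java 8 or later (the class and method names are hypothetical), using a PrintWriter over an OutputStreamWriter as in oleth's code. The encoder turns U+FEFF into whatever the marker looks like in the chosen charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriteWithMarker {

    // Writes the byte order mark character first, then the text.
    // The charset decides which bytes the mark becomes.
    public static byte[] withMarker(String text, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter prw = new PrintWriter(new OutputStreamWriter(bytes, cs), false);
        prw.write('\ufeff');
        prw.write(text);
        prw.close(); // flushes the encoder into the byte array
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // UTF-8:    EF BB BF, then the text
        // UTF-16BE: FE FF, then the text
        // UTF-16:   the encoder already emits FE FF itself, so the mark appears twice
        try (FileOutputStream out = new FileOutputStream("textfile.txt")) {
            out.write(withMarker("caf\u00e9", StandardCharsets.UTF_8));
        }
    }
}
```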