Java Forum / General / January 2006

Fell Swoop I/O

Roedy Green - 14 Jan 2006 05:52 GMT
Writing or reading a byte[] in one fell swoop to write or read a file
should be extremely efficient.  In theory, the bytes could go straight
from your array to the hard disk controller.

I wonder if that is indeed true for unbuffered files, or whether they
are copied some sub-chunk size at a time. Has anyone peeked under the
hood or done some experiments to deduce what happens from timings?

Encoding, though, even when you have a 1-1 char > byte encoding,
requires Java to allocate some sort of transparent intermediate byte
buffer, even for unbuffered Writers.  How does Java decide how big to
make it?  Does it make it big enough to contain the entire String?

Has anyone peeked under the hood or experimented?

A practical way of asking this question is:

Is it better to write an entire file unbuffered or write an entire file
with a buffer?  If buffered, what is a reasonable buffer size? Making
it too big causes more frequent GC.  Making it too small causes more
physical i/os.

Here is a place where I would like tweakers: you could write your code
and let the tweaker optimiser AT THE CLIENT SITE home in on the optimum
settings for the client's platform.

see http://mindprod.com/jgloss/tweakable.html
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

NOBODY - 14 Jan 2006 07:07 GMT
> Writing or reading a byte[] in one fell swoop to write or read a file
> should be extremely efficient.  In theory, the bytes could go straight
[quoted text clipped - 17 lines]
> it too big causes more frequent GC.  Making it too small causes more
> physical i/os.

Let's think about what Sun has done since '95...
FileOutputStream has 2 native methods:
       write(int)
       write(byte[], off, len)
that thousands of classes depend on.
Even the NIO channels are slower, from what I have heard, since they were
designed for selectable channels and locks, not so much for performance.
So, yeah, safe to say it is fast enough.

Optimal byte[] buffer size comes from one thing: TESTING.

Keep it a multiple of your cluster size to be friendly, and trust HDD
controllers and I/O schedulers to pull at least the cluster size with all
sorts of 'read' or 'write' prediction, exploiting the disk cache.

Understand that 2 long writes at the same time on a single HDD will make
its head jump all over and drop to much less than just half the
performance. Your tests could be biased if you are swapping or there is
other disk activity.

The largest chunk possible, to reduce the i/o scheduling pieces and
reassembly and hope the i/o scheduler will thank you for a big contiguous
array of bytes.
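
The "TESTING" advice above can be tried with a small harness like the one below. It is only a sketch: the class name ChunkTimer, the method writeInChunks, the 64 MiB total and the list of chunk sizes are all made up for illustration, and it assumes Java 7 or later (try-with-resources). It times writes into the OS file cache, not to the platter, so treat the numbers as relative hints only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical harness; ChunkTimer and writeInChunks are illustrative names.
final class ChunkTimer {

    /**
     * Writes totalBytes zero bytes to f through an unbuffered
     * FileOutputStream, at most chunkSize bytes per call.
     * Returns the number of write() calls made.
     */
    static int writeInChunks(File f, long totalBytes, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] chunk = new byte[chunkSize];
        int calls = 0;
        try (FileOutputStream out = new FileOutputStream(f)) {
            long remaining = totalBytes;
            while (remaining > 0) {
                int n = (int) Math.min(chunk.length, remaining);
                out.write(chunk, 0, n); // one call into the stream per chunk
                remaining -= n;
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        long total = 64L * 1024 * 1024; // arbitrary test size (assumption)
        int[] sizes = {512, 4096, 65536, 1 << 20, 16 << 20};
        for (int size : sizes) {
            File f = File.createTempFile("chunktimer", ".tmp");
            try {
                long start = System.nanoTime();
                int calls = writeInChunks(f, total, size);
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println(size + " bytes/chunk: " + calls + " writes, " + ms + " ms");
            } finally {
                f.delete();
            }
        }
    }
}
```

Run it several times on an otherwise idle machine; the first pass also pays for JIT warm-up.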
Roedy Green - 14 Jan 2006 07:41 GMT
>The largest chunk possible, to reduce the i/o scheduling pieces and
>reassembly and hope the i/o scheduler will thank you for a big contiguous
>array of bytes.

There are some complications to the traditional wisdom.

1. Java's buffering can be inserted at various layers. Only the lowest
layer offers any help for I/O.

2. Java does encoding transformations. This implies hidden buffers of
which you have no control.

I need to do some experiments, but I think the fastest way to read a
file of chars will be:

1. find the length in bytes. This is not necessarily the length in
chars.

2. read the entire file in one read (buffered or unbuffered?) onto a
byte[].

3. use a new String which has a built in encoding conversion.
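
A minimal sketch of the three steps above, assuming Java 7 or later and a file small enough to fit in a single byte[]. The names FellSwoopRead and readAll are invented for illustration; the read loop is there because a single read() call is not guaranteed to fill the array.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; FellSwoopRead and readAll are illustrative names.
final class FellSwoopRead {

    static String readAll(File file, Charset charset) throws IOException {
        // Step 1: length in bytes (not necessarily the length in chars).
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("file too large for one byte[]: " + length);
        }
        byte[] bytes = new byte[(int) length];

        // Step 2: read the whole file, unbuffered, into the array.
        try (FileInputStream in = new FileInputStream(file)) {
            int filled = 0;
            while (filled < bytes.length) {
                int n = in.read(bytes, filled, bytes.length - filled);
                if (n < 0) {
                    throw new EOFException("file shrank while reading");
                }
                filled += n;
            }
        }

        // Step 3: let the String constructor do the encoding conversion.
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new File(args[0]), StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars");
    }
}
```

Whether this beats a buffered Reader is exactly the thing that needs measuring; the sketch only shows that the approach is a few lines of code.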

Andrey Kuznetsov - 14 Jan 2006 13:18 GMT
>>The largest chunk possible, to reduce the i/o scheduling pieces and
>>reassembly and hope the i/o scheduler will thank you for a big contiguous
[quoted text clipped - 4 lines]
> 1. Java's buffering can be inserted at various layers. Only the lowest
> layer offers any help for I/O.

Roedy,

just put Unified I/O in the lowest layer and forget about performance.

I remember that you asked me about a tutorial.

I don't have one yet, but I can give you some advice:

The Unified I/O interface looks just like that of RandomAccessFile (with
some extras).

The important thing is RandomAccessFactory.

It has the following methods:

RandomAccess create();
RandomAccessRO createRO();
RandomAccessBuffer createBuffered();
RandomAccessBufferRO createBufferedRO();

(RO means read only)

That was the difficult part.

The easy part is that you can create an InputStream from a RandomAccessRO
or an OutputStream from a RandomAccess
and use it as usual without changing your code.
See com.imagero.uio.io.RandomAccessInputStream
and com.imagero.uio.io.RandomAccessOutputStream.

Signature

Andrey Kuznetsov
http://uio.imagero.com Unified I/O for Java
http://reader.imagero.com Java image reader
http://jgui.imagero.com Java GUI components and utilities

NOBODY - 14 Jan 2006 16:50 GMT
> RandomAccess create();
> RandomAccessRO createRO();
> RandomAccessBuffer createBuffered();
> RandomAccessBufferRO createBufferedRO();

Simpler: knowing that a seek on a RAF will move the FD with it,
you can reposition buffered streams on it. Here:
(I was too lazy to implement the DataInput and DataOutput, but you get
the point)

-----------

import java.io.*;

public class SuperRAF {
   
    public final RandomAccessFile raf;
    public final MyBIS bis;
    public final BufferedOutputStream bos;
    public final DataInputStream dis;
    public final DataOutputStream dos;
   
    public SuperRAF(RandomAccessFile raf, int bufsize) throws IOException {
        this.raf = raf;
        bis = new MyBIS(new FileInputStream(raf.getFD()), bufsize);
        bos = new BufferedOutputStream(new FileOutputStream(raf.getFD()),
                bufsize);
        dis = new DataInputStream(bis);
        dos = new DataOutputStream(bos);
    }
   
   
    public void flush() throws IOException {
        bos.flush();
    }
   
    public void seek(long pos) throws IOException {
        bos.flush();
        bis.clear();
        raf.seek(pos);
    }
   
    //=======
   
    static class MyBIS extends BufferedInputStream {
        MyBIS(InputStream is, int size) {
            super(is, size);
        }
       
        MyBIS(InputStream is) {
            super(is);
        }
       
        void clear() {
            super.count = 0;
            super.markpos = -1;
            super.pos = 0;
            super.marklimit = 0;
            //super.buf = don't waste that
        }
    }
   
}
Andrey Kuznetsov - 14 Jan 2006 17:32 GMT
> Simpler: knowing that a seek on a RAF will move the FD with it,
> you can reposition buffered streams on it.

oh yes, and with raf.seek(0) you can just rewind your IS.

NOBODY - 14 Jan 2006 16:13 GMT
>>The largest chunk possible, to reduce the i/o scheduling pieces and
>>reassembly and hope the i/o scheduler will thank you for a big
[quoted text clipped - 4 lines]
> 1. Java's buffering can be inserted at various layers. Only the lowest
> layer offers any help for I/O.

To me a simple file output stream is the closest to the I/O chunk.
Just do your buffering yourself if layers of uncontrolled buffering scare
you. But you did say you had files, not streams, so you control how they
are read.

My i/o test:

-----
import java.io.File;
import java.io.FileOutputStream;

public class IOSizer {
    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("_IOSizer_", ".tmp", new File("."));
        f.deleteOnExit();
        FileOutputStream fos = new FileOutputStream(f);
        try {
            fos.write(1);
            fos.write(new byte[Integer.parseInt(args[0])]);
        } finally {
            fos.close();
        }
    }
}

---- and trace system write calls ----
/usr/bin/strace -x -e write java IOSizer 33333

[...]
write(5, "\x01", 1)                     = 1
write(5, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 33333) = 33333
[...]

> 2. Java does encoding transformations. This implies hidden buffers of
> which you have no control.

If your first stream is a BufferedInputStream (over a FileInputStream) with
a buffer size of your choice, the only other buffers are supposed to be a
few bytes long, most probably reused, sized for the longest sequence in
the charset. As far as I know, for UTF-8 that is at most 6 bytes (31-bit
payload), or 4 bytes for the 21-bit Unicode range.

> I need to do some experiments, but I think the fastest way to read a
> file of chars will be:
[quoted text clipped - 6 lines]
>
> 3. use a new String which has a built in encoding conversion.

How were you intending to read a single string otherwise? :-/
But if you can process your HTML in chunks (tabs, spaces, and all you
mentioned), you can probably just use a BufferedReader over an
InputStreamReader over the BufferedInputStream. Read a pack of lines
(say 200, or until you reach a string length threshold), and process it
in smaller pieces, keeping a stateful engine of where you are (open
tags and such annoying things).
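
The reader stack and the "pack of lines" idea described above could look roughly like this. It is a sketch assuming Java 8 or later (for Consumer and lambdas); BatchedLines, process and all the parameter names are invented for illustration, and the stateful tag-tracking engine is left to the handler.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper; BatchedLines and process are illustrative names.
final class BatchedLines {

    /**
     * Reads lines from raw and hands them to handler in batches. A batch
     * is flushed once it holds maxLines lines or at least maxChars chars.
     * Returns the number of batches delivered. The caller closes raw.
     */
    static int process(InputStream raw, Charset charset, int bufSize,
                       int maxLines, int maxChars,
                       Consumer<List<String>> handler) throws IOException {
        // BufferedReader over InputStreamReader over BufferedInputStream.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new BufferedInputStream(raw, bufSize), charset));
        List<String> batch = new ArrayList<>();
        int chars = 0;
        int batches = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            batch.add(line);
            chars += line.length();
            if (batch.size() >= maxLines || chars >= maxChars) {
                handler.accept(batch);
                batches++;
                batch = new ArrayList<>(); // fresh list; handler may keep the old one
                chars = 0;
            }
        }
        if (!batch.isEmpty()) { // deliver the final partial batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

A call might look like `BatchedLines.process(new FileInputStream(file), StandardCharsets.UTF_8, 65536, 200, 16384, lines -> parse(lines))`, where `parse` stands for whatever stateful processing the application needs.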
Thomas Hawtin - 14 Jan 2006 18:08 GMT
> Writing or reading a byte[] in one fell swoop to write or read a file
> should be extremely efficient.  In theory, the bytes could go straight
> from your array to the hard disk controller.

Almost certainly the biggest overhead here is going to be with the disc
drive: depending on circumstances, the seek time, or the transfer time for
long files. If buffering causes a spike in memory usage, there could
possibly be other problems.

There will be at least one additional copy for your operating system's
file cache. Also, you aren't going to want your byte[] pinned while the
file system blocks; directly allocated ByteBuffers may be a win (for the
careful, or carefree).

> Is it better to write an entire file unbuffered or write an entire file
> with a buffer?  If buffered, what is a reasonable buffer size? Making
> it too big causes more frequent GC.  Making it too small causes more
> physical i/os.

I suspect there is a huge middle ground, where the exact size doesn't
matter.

Memory mapping is another way to go.
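
For reference, a memory-mapped read of a text file might be sketched as follows. This assumes Java 7 or later (java.nio.file) and a file under 2 GB, since a single mapping is limited to Integer.MAX_VALUE bytes; MappedRead and readMapped are invented names. Whether it is actually faster than a plain read is something to measure, not assume.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; MappedRead and readMapped are illustrative names.
final class MappedRead {

    static String readMapped(Path path, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("too large to map in one piece: " + size);
            }
            // Map the whole file read-only; the OS pages it in on demand.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Decode straight from the mapping, with no intermediate byte[].
            return charset.decode(mapped).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = readMapped(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars");
    }
}
```

Note that the mapping stays valid after the channel is closed and is only released when the buffer is garbage collected, which on some platforms can delay deleting the file.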

Tom Hawtin
Signature

Unemployed English Java programmer
http://jroller.com/page/tackline/

Dimitri Maziuk - 14 Jan 2006 18:42 GMT
Roedy Green sez:
> Writing or reading a byte[] in one fell swoop to write or read a file
> should be extremely efficient.  In theory, the bytes could go straight
> from your array to the hard disk controller.

There are a couple of buffering stages involved even before
the data gets to JVM:

1. HD read and writes are done in chunks (> 1 byte, configurable
on some systems).

2. Assuming a single disk, the slowest part of file copy process
is positioning disk head to write to destination file and then
re-positioning it back to read from the source. So OS and/or HD
controller buffer I/O requests and schedule them for optimal head
movement.

3. File data is buffered by OS (size depends on OS, available RAM,
number of open files, etc.)

(Now add concurrent I/O requests coming from multiple processes
on a time-sharing system to the mix.)

4. Then the data gets to the JVM, which may (or may not) do still more
buffering.

5. Finally, you code yet another buffer -- your byte[] -- on
top of all that.

In theory, if you could read the entire file into byte[] and
then write the entire thing out, it should be the fastest:
let JVM, OS, and hardware optimize the actual disk I/O. In
practice you seldom have enough RAM for that.

In practice, with all that stuff going on behind the scenes
(that you have no control over), I wouldn't worry about it
at all: code what makes sense for your application. I tend
to use buffered readers when I need line-based reads -- not
because it's supposed to be faster but because I need readLine().

Dima
Signature

Q276304 - Error Message: Your Password Must Be at Least 18770 Characters
and Cannot Repeat Any of Your Previous 30689 Passwords           -- RISKS 21.37

Andrey Kuznetsov - 14 Jan 2006 18:58 GMT
> In practice, with all that stuff going on behind the scenes
> (that you have no control over), I wouldn't worry about it
> at all: code what makes sense for your application. I tend
> to use buffered readers when I need line-based reads -- not
> because it's supposed to be faster but because I need readLine().

For small files you can safely ignore buffering.
For huge files buffering can significantly speed up I/O.

Raymond DeCampo - 14 Jan 2006 18:51 GMT
> Is it better to write an entire file unbuffered or write an entire file
> with a buffer?  If buffered, what is a reasonable buffer size? Making
> it too big causes more frequent GC.  Making it too small causes more
> physical i/os.

Roedy,

What is your reasoning behind saying that a large buffer causes more
frequent garbage collection?

Thanks,
Ray

Signature

This signature intentionally left blank.

Roedy Green - 14 Jan 2006 19:16 GMT
On Sat, 14 Jan 2006 18:51:55 GMT, Raymond DeCampo
<nospam@twcny.rr.com> wrote, quoted or indirectly quoted someone who
said :

>What is your reasoning behind saying that a large buffer causes more
>frequent garbage collection?

Imagine a case where you had 1000 files each 100 bytes long and you
allocated 64K buffers.  You will fill up RAM faster than had you used
no buffering or 100 byte buffers.
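
The arithmetic behind that example, as a tiny sketch. It assumes a fresh buffer is allocated for every file and becomes garbage afterwards; reusing a single buffer across files would avoid the churn. BufferChurn and totalAllocated are invented names. With 64K buffers the 1000 files cost 65,536,000 bytes of short-lived allocations, against 100,000 bytes with 100-byte buffers.

```java
// Hypothetical illustration; BufferChurn and totalAllocated are made-up names.
final class BufferChurn {

    /** Total bytes allocated if every file gets its own fresh buffer. */
    static long totalAllocated(int fileCount, int bufferSize) {
        return (long) fileCount * bufferSize;
    }

    public static void main(String[] args) {
        long big = totalAllocated(1000, 64 * 1024); // 64K buffer per file
        long small = totalAllocated(1000, 100);     // buffer sized to the file
        System.out.println("64K buffers:      " + big + " bytes");
        System.out.println("100-byte buffers: " + small + " bytes");
    }
}
```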

Raymond DeCampo - 14 Jan 2006 23:51 GMT
> On Sat, 14 Jan 2006 18:51:55 GMT, Raymond DeCampo
> <nospam@twcny.rr.com> wrote, quoted or indirectly quoted someone who
[quoted text clipped - 6 lines]
> allocated 64K buffers.  You will fill up RAM faster than had you used
> no buffering or 100 byte buffers.

I see; I thought you meant in the case where there was one buffer and I
could not imagine how that applied.

Ray



