Java Forum / General / March 2006
What happens to the buffer?
gk - 30 Mar 2006 06:34 GMT
    byte[] buffer = new byte[512];
    int read;
    while ((read = in.read(buffer)) > 0) {
        out.write(buffer, 0, read);
    }
In the first iteration, the buffer is filled with bytes.
What happens in the next iteration?
Is the buffer first cleared and then filled afresh,
OR
is the buffer overwritten with the new incoming bytes?
Patricia Shanahan - 30 Mar 2006 06:43 GMT
> byte[] buffer = new byte[512];
> int read;
[quoted text clipped - 11 lines]
>
> the buffer is overwritten with the new incoming bytes ?
If "in" is an InputStream reference, the InputStream javadoc covers this in detail.
Patricia
gk - 30 Mar 2006 06:50 GMT
> If "in" is an InputStream
Yes, you are right.
The javadoc says:
"public int read(byte[] b) throws IOException
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
If b is null, a NullPointerException is thrown. If the length of b is zero, then no bytes are read and 0 is returned; otherwise, there is an attempt to read at least one byte. If no byte is available because the stream is at end of file, the value -1 is returned; otherwise, at least one byte is read and stored into b.
The first byte read is stored into element b[0], the next one into b[1], and so on. The number of bytes read is, at most, equal to the length of b. Let k be the number of bytes actually read; these bytes will be stored in elements b[0] through b[k-1], leaving elements b[k] through b[b.length-1] unaffected.
If the first byte cannot be read for any reason other than end of file, then an IOException is thrown. In particular, an IOException is thrown if the input stream has been closed.
The read(b) method for class InputStream has the same effect as:
read(b, 0, b.length)
"
My question is:
What happens in the next iteration?
Is the buffer first cleared and then filled afresh,
OR
is the buffer overwritten with the new incoming bytes?
The javadoc does not answer this question.
Patricia Shanahan - 30 Mar 2006 06:54 GMT
>>If "in" is an InputStream
> [quoted text clipped - 43 lines]
>
> javadoc does not answer this question.
Yes it does, because it does not say "This is the behavior for the first iteration only". The material you quoted applies to every call to InputStream's read with a byte buffer, regardless of whether it is the first call with that buffer or not.
Patricia
Oliver Wong - 30 Mar 2006 18:36 GMT [post re-ordered]
> my question is
> [quoted text clipped - 7 lines]
>
> javadoc does not answer this question.
It does:
> Let k be the number of bytes actually read; these bytes
> will be stored in elements b[0] through b[k-1], leaving elements b[k]
> through b[b.length-1] unaffected.
- Oliver
Roedy Green - 30 Mar 2006 07:00 GMT
>in the first iteration, the buffer is filled up with bytes.
> [quoted text clipped - 5 lines]
>
>the buffer is overwritten with the new incoming bytes ?
Why would it matter? You know how many bytes there are when you are done. If you are just curious, have a look at src.zip and, failing that, the Sun source code. See http://mindprod.com/jgloss/jdk.html
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Thomas Schodt - 30 Mar 2006 08:25 GMT
> byte[] buffer
a byte array reference "buffer"
> byte[] buffer = new byte[512];
is assigned to reference a (new) byte array of 512 bytes. Initially these bytes will all contain 0 (ASCII NUL).
> int read;
> while ((read=in.read(buffer)) >0) {
Here between 1 and 512 bytes of the byte array referenced by "buffer" are "filled" with byte values, starting from the start of the byte array.
> out.write(buffer, 0, read);
> }
[quoted text clipped - 8 lines]
>
> the buffer is overwritten with the new incoming bytes ?
Between 1 and 512 bytes of the byte array referenced by "buffer" are "filled" with byte values, starting from the start of the byte array.
Maybe you read about nio ByteBuffer and you are confusing the two?
gk - 30 Mar 2006 10:43 GMT
I am more confused by those answers.
Let me explain the problem more clearly.
Say, in the first iteration, there were 10 bytes in the stream (because it is streaming, and the bytes might arrive slowly).
So 10 bytes are read by the read() method and go into the buffer.
The write method uses this byte buffer.
Now, in the second iteration, say there are 25 bytes in the stream.
So 25 bytes are read by the read() method and go into the buffer.
But in the first iteration the buffer had 10 bytes... what will happen to those 10 bytes now?
Will they be cleared first, and then the 25 bytes placed,
OR
will the buffer be completely overwritten with the 25 new bytes?
> > byte[] buffer
> [quoted text clipped - 33 lines]
> Maybe you read about nio ByteBuffer
> and you are confusing the two?
Chris Uppal - 30 Mar 2006 11:00 GMT
> the buffer would be completely overwritten with these new coming 25
> bytes ?
That's correct. The first 25 bytes of the buffer would be overwritten. The 10 bytes from the previous read() would be lost.
(How could the second call to read() "know" that a previous call had put 10 bytes into the buffer ? And even if it did know, why should it care? Presumably if the programmer hadn't wanted to overwrite the existing data, then s/he would have used the longer form of read() which takes an argument to say where in the buffer to start writing.)
-- chris
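The "longer form of read()" mentioned above can be sketched as follows. This is only an illustration, assuming a hypothetical helper named fill and a SequenceInputStream standing in for data that arrives in two chunks; neither appears in the original code. Passing the current fill position as the offset makes each read land after the earlier data instead of on top of it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

public class AppendingRead {
    /**
     * Hypothetical helper: keeps reading into buffer, starting each read at
     * the current fill position, until the buffer is full or the stream ends.
     * Returns the number of bytes stored.
     */
    static int fill(InputStream in, byte[] buffer) throws IOException {
        int filled = 0;
        while (filled < buffer.length) {
            // read(b, off, len) stores at most len bytes starting at b[off]
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n == -1) {
                break; // end of stream
            }
            filled += n;
        }
        return filled;
    }

    public static void main(String[] args) throws IOException {
        // Two small in-memory streams stand in for data arriving in two chunks.
        InputStream in = new SequenceInputStream(
                new ByteArrayInputStream(new byte[] {1, 2, 3}),
                new ByteArrayInputStream(new byte[] {4, 5}));
        byte[] buffer = new byte[8];
        int filled = fill(in, buffer);
        System.out.println(filled + " bytes kept in the buffer");
    }
}
```

With the short form read(buffer), the second chunk would have been stored from buffer[0] again.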
gk - 30 Mar 2006 11:30 GMT
What should the size of the buffer be?
Is byte[] buffer = new byte[512]; ENOUGH?
Suppose that at some point a huge number of bytes (say 1000 bytes) arrives at once.
Then what will happen? The buffer can't accept more than 512 bytes... will the additional 1000 - 512 = 488 bytes still be in the stream, or will they be lost?
The reason I ask is that some people use
    byte[] buffer = new byte[256];
    byte[] buffer = new byte[512];
    byte[] buffer = new byte[1024];
Which one is good?
Or is anything OK? Does it really matter? Is the coder responsible for choosing the size of the byte array?
Gordon Beaton - 30 Mar 2006 11:48 GMT
> what should be the size of buffer ?
> [quoted text clipped - 17 lines]
> or anything is ok . does it matter really ? does the coder
> responsible for choosing the size of the byte ?
Each time you call read(), the new bytes are written at the start of the buffer unless you tell read() to do otherwise. If there were already some data in the buffer from a previous read, it will be overwritten with the new data.
Also, read() will never read more than the number of bytes you request, or the length of the buffer if you don't specify. Note that read() can and often will return *fewer* bytes than you request, so you need to check the return value.
Any bytes you don't read will wait nicely in the stream until you choose to read them.
So you can decide to read as much or as little as you want each time, and can choose an appropriate buffer size. Normally it's more efficient to read a lot of data each time and in powers of two, but depending on your application you may want to read less.
/gordon
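A small sketch of the 1000-byte case, assuming an in-memory ByteArrayInputStream as a stand-in for the real stream (a network or file stream may hand back smaller chunks). The helper name readSizes is made up for illustration. It records what each read() call returns, which shows that the bytes beyond the first 512 stay in the stream for the next call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class LeftoverBytes {
    /**
     * Hypothetical helper: reads the stream with a fixed-size buffer and
     * records the value returned by each read() call.
     * Assumes bufferSize > 0 (a zero-length buffer makes read() return 0 forever).
     */
    static List<Integer> readSizes(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        List<Integer> sizes = new ArrayList<>();
        int read;
        while ((read = in.read(buffer)) != -1) {
            sizes.add(read); // never more than buffer.length
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[1000]);
        System.out.println(readSizes(in, 512)); // prints [512, 488]
    }
}
```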
 Signature [ do not email me copies of your followups ] g o r d o n + n e w s @ b a l d e r 1 3 . s e
Patricia Shanahan - 30 Mar 2006 15:57 GMT
...
> So you can decide to read as much or as little as you want each time,
> and can choose an appropriate buffer size. Normally it's more efficient
> to read a lot of data each time and in powers of two, but depending on
> your application you may want to read less.
Why the preference for powers of two?
Patricia
Remon van Vliet - 30 Mar 2006 16:09 GMT
> ...
>> So you can decide to read as much or as little as you want each time,
[quoted text clipped - 5 lines]
>
> Patricia
Because it looks cool, I think, but other than that I can think of exactly zero reasons to make buffers a power of 2.
Gordon Beaton - 30 Mar 2006 16:32 GMT
> Why the preference for powers of two?
When reading from a stream that maps to a file, I believe that read efficiency is improved by aligning reads to OS buffer sizes and ultimately file system or NFS block sizes. AFAIK all of these are normally powers of 2.
I suppose if you're reading from a TCP stream, then multiples of MSS bytes might be more appropriate.
Superstition? Maybe.
/gordon
Remon van Vliet - 30 Mar 2006 16:40 GMT
>> Why the preference for powers of two?
> [quoted text clipped - 9 lines]
>
> /gordon
Even if that were so, you'd need to know the actual buffer sizes of said OS to get a noticeable improvement, and there's a fair chance that even then the difference is negligible. It's often more useful to adjust the buffer size to something sensible for said application. All that said, I always allocate general purpose buffers to be a power of 2... there's something appealing about the numbers 512 and 4096... maybe I'm just weird.
Oliver Wong - 30 Mar 2006 18:38 GMT
> ...
>> So you can decide to read as much or as little as you want each time,
[quoted text clipped - 3 lines]
>
> Why the preference for powers of two?
Because it has always been done that way. Do not question your elders!
- Oliver
Roedy Green - 30 Mar 2006 18:56 GMT
>> So you can decide to read as much or as little as you want each time,
>> and can choose an appropriate buffer size. Normally it's more efficient
>> to read a lot of data each time and in powers of two, but depending on
>> your application you may want to read less.
>
>Why the preference for powers of two?
Physical I/O is done in terms of some power of two, often 512 bytes. If your buffer is a nice multiple, physical I/O can go directly to it.
Chris Smith - 30 Mar 2006 19:03 GMT
> ...
> > So you can decide to read as much or as little as you want each time,
[quoted text clipped - 3 lines]
>
> Why the preference for powers of two?
Because they are more exciting. Many developers, myself included, are quite reluctant to give up this remaining connection to our industry's more technical past.
Seriously, I can't think of a reason.
 Signature www.designacourse.com The Easiest Way To Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
Roedy Green - 30 Mar 2006 23:06 GMT
>Seriously, I can't think of a reason.
I am very surprised so many people did not immediately say:
If your buffer is not a multiple of 512, then the OS is going to have to allocate its own buffer to read the multiple of 512 and copy the bytes. If your buffer is a multiple of 512, there is a good chance it can do the I/O directly into your buffer. Physical I/O is done in terms of disk sectors.
Patricia Shanahan - 31 Mar 2006 03:23 GMT
>>Seriously, I can't think of a reason.
> [quoted text clipped - 5 lines]
> can do the I/O directly into your buffer. Physical i/o is done in
> terms of disk sectors.
I've done that sort of thing where I knew enough about disk transfer sizes and alignment requirements, and that the system would use it that way.
However, even java.nio buffers are only claimed to be suitable for direct I/O if they are allocated by the ByteBuffer allocateDirect factory method. Are you sure the JVM does direct I/O to ordinary byte arrays?
Patricia
Roedy Green - 31 Mar 2006 04:07 GMT
> Are you sure the JVM does direct I/O to ordinary byte
>arrays?
The JVM likely has nothing to do with it. At the OS level you tell the OS to deliver N bytes from offset X in the file to offset Y in your buffer.
If the OS is clever, it has prefetched those bytes and copies them to your buffer.
A long time ago hardware insisted on reading blocks and plopping them at 512-byte boundaries. I don't think that is still so, but I think it may still be so that physical buffers need to be multiples of 512. In Jet, buffers are automatically aligned on paragraph (16-byte) boundaries.
In benchmarking, you must watch out that you don't reuse the same file, since any reasonably decent OS will soon cache it. Once it is cached, magic buffer sizes would no longer apply, at least until we get special hardware for copying pages around.
It is not just Windows you are talking about, but ancient old OSes like IBM's that support Java.
James McGill - 31 Mar 2006 02:39 GMT
> Why the preference for powers of two?
Systems that live close to the architecture often find performance benefits, if not hard requirements, to align things on boundaries. It's quite possible that you might find low-level disc seeks that are not capable of seeking to an arbitrary address, but instead, deal in offsets from some block boundary -- and that will invariably be divided into some power of two.
But in this case, it's not at all clear, if it's even defined, whether it matters, or if there's any performance implication at all, or if the compiler or bytecode machine aligns them for you anyway, or if it would be more efficient to use a prime number instead of a power of two, or anything else about it. It's not a common thing to divide a buffer by two, or to arrange buffers for best fit in a larger "power-of-two" block, or to deal separately with "high and low half-buffers", or anything of this nature.
It appears this is a historical idiom, not of the language, but of the programmers. But it's hardly coincidental. Everything digital is organized in finite quantities, every resource being bounded by some power of two.
Maybe the next generation will revisit the merits of this whole "binary" thing, and something better will emerge. When it does, do you think we will have to throw away everything we know about discrete math?
In the meantime, I'll bet a dollar it does not matter whether you make your buffers 2000, 2047, 2048, or 2049 bytes. (And I'll gladly pay up if someone can show me metrics that show otherwise!)
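One crude way to collect numbers for a bet like this is sketched below. It is an illustration only, under loud assumptions: the data source is an in-memory array (swap in a FileInputStream on a real, uncached file to say anything about disk I/O), there is no JIT warm-up or repetition, and the class and method names are made up. It times the same copy loop with the four buffer sizes named above and claims nothing about which one wins.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferSizeTiming {
    /** Copies in to out using a buffer of the given size; returns bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 16 MB of zeros in memory stands in for a real input.
        byte[] data = new byte[16 * 1024 * 1024];
        for (int size : new int[] {2000, 2047, 2048, 2049}) {
            long start = System.nanoTime();
            copy(new ByteArrayInputStream(data), new ByteArrayOutputStream(data.length), size);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println(size + " byte buffer: " + micros + " us");
        }
    }
}
```

A single run like this is noisy; repeated runs in varying order would be needed before drawing any conclusion.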
Roedy Green - 31 Mar 2006 03:22 GMT
On Thu, 30 Mar 2006 18:39:19 -0700, James McGill <jmcgill@cs.arizona.edu> wrote, quoted or indirectly quoted someone who said :
>In the meantime, I'll bet a dollar it does not matter whether you make
>your buffers 2000, 2047, 2048, or 2049 bytes. (And I'll gladly pay up
>if someone can show me metrics that show otherwise!)
On the other paw, you might as well use powers of two since you have no evidence that avoiding them is better. Those are the natural size containers programmers think in.
Are you sure that even nio does not like buffers that are multiples of page frames? It seems highly unlikely.
If I am going to take you up on your bet, I want to find out in advance what you would consider "cheating". If I find even one OS where it matters do I get your dollar?
And how much of a percentage speed gain from using magic multiples do I have to get for it to count as faster?
Am I allowed to use Jet, Java 1.6, -server?
James McGill - 31 Mar 2006 05:27 GMT
> If I find even one OS
> where it matters do I get your dollar?
If it's BSDI you only get a Canadian Dollar. If it's SCO, you owe me a dollar and shame on you for having a SCO box :-)
Chris Uppal - 30 Mar 2006 14:05 GMT
> Then what will happen ? the buffer cant accept more than 512 bytes
> .....will the additional bytes 1000-512 = 488 will still be in the
> stream ? or they will be lost ?
The extra bytes remain in the stream until you are ready to read them.
> the reason is, some people use
> [quoted text clipped - 3 lines]
>
> which one is good ?
It doesn't matter very much. In theory the larger the buffer the higher the potential speed, but in practice I just choose a number like 4096 and don't worry about it.
The reason it can be faster is that IF each call to read() ends up calling the similar function in the underlying OS, then there's a certain fixed overhead per call. So the more data you read in one call, the less the overhead when averaged over all the bytes you read.
Note that that doesn't apply if you are using a BufferedInputStream (or something like it) because it does the buffering for you. That way you can read tiny little chunks at a time (or even single bytes at a time) with very little effect on performance.
-- chris
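The per-call overhead argument and the BufferedInputStream remark can be illustrated with a rough sketch. CountingStream is a hypothetical helper written for this example (it is not part of java.io), and an in-memory stream stands in for the real source. It counts how many read calls reach the underlying stream when the caller reads one byte at a time, with and without a BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedReads {
    /** Hypothetical helper: counts the read calls that reach the wrapped stream. */
    static class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the stream a single byte at a time; returns the number of bytes seen. */
    static int drainByteByByte(InputStream in) throws IOException {
        int total = 0;
        while (in.read() != -1) {
            total++;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        CountingStream raw = new CountingStream(new ByteArrayInputStream(new byte[1000]));
        drainByteByByte(raw);
        System.out.println("unbuffered: " + raw.calls + " underlying read calls");

        CountingStream wrapped = new CountingStream(new ByteArrayInputStream(new byte[1000]));
        drainByteByByte(new BufferedInputStream(wrapped));
        System.out.println("buffered:   " + wrapped.calls + " underlying read calls");
    }
}
```

When the underlying stream is a file or socket, each of those underlying calls may turn into an OS call, which is where the fixed overhead comes from.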
Thomas Schodt - 30 Mar 2006 12:21 GMT
gk wrote (edited):
> in the first iteration, there were 10 byte values in the stream.
> (because its streaming and bytes might come slowly slowly)
[quoted text clipped - 15 lines]
> 10 byte values were stored in the byte array.
> what will happen to those 10 byte values now ?
They are lost.
> are those cleared first and then 25 byte values would be placed.
>
> OR
>
> the buffer would be completely overwritten with these new coming 25
> bytes ?
What is the difference? Is there a difference? Not that I know of.
Re 'clear' - Are you asking if read() first stores zeroes in all the bytes of the byte array? No, it does not.
Re 'new bytes' - Are you asking if the byte array referenced by "buffer" is replaced with a new byte array? No, it is not. "buffer" still references the same byte array only now some of the bytes of the byte array have new values.
What are you asking?
Patricia Shanahan - 30 Mar 2006 15:32 GMT
> more confused with those answers.
> [quoted text clipped - 6 lines]
>
> write method uses this byte buffer.
"The first byte read is stored into element b[0], the next one into b[1], and so on. The number of bytes read is, at most, equal to the length of b. Let k be the number of bytes actually read; these bytes will be stored in elements b[0] through b[k-1], leaving elements b[k] through b[b.length-1] unaffected."
Assume all 10 bytes are read by the first call, k is 10 and b is your byte buffer. Following the read call, elements 0 through 9 of your buffer contain the 10 bytes of read data. Elements 10 through 511 still contain whatever they contained before the read call.
> Now, in the second iteration say, there is 25 bytes in the stream
>
> so, so 25 bytes is read by the read() method and going to the
> buffer.
"The first byte read is stored into element b[0], the next one into b[1], and so on. The number of bytes read is, at most, equal to the length of b. Let k be the number of bytes actually read; these bytes will be stored in elements b[0] through b[k-1], leaving elements b[k] through b[b.length-1] unaffected."
Assume all bytes are read by the second call, k is 25 and b is your byte buffer. Following the read call, elements 0 through 24 of your buffer contain the 25 bytes of read data. Elements 25 through 511 still contain whatever they contained before the read call.
The repetition of the quote is deliberate. That paragraph tells you what happens to each element of your buffer, regardless of whether it is the first read call or the millionth read call.
> but in the first iteration buffer had 10 bytes ....what will happen
> to those 10 bytes now ?
Suppose you had written:
    buffer[0] = 7;
followed some time later by
    buffer[0] = 23;
What happens to the 7? That is what happens to the first 10 elements when you do a read that gets at least 10 bytes of data.
> does those will be cleared off first and then 25 bytes would be placed.
I don't see any reason for a prior clear operation, rather than just writing the new data over the old.
Patricia
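The behaviour described in this thread can be checked with a short sketch. It assumes chunked arrival can be simulated with nested SequenceInputStreams, which read from one substream at a time; the class name, helper names and byte values are made up for illustration. Three reads of 10, 25 and then 5 bytes go into the same 512-byte buffer, each chunk marked with its own value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;

public class OverwriteDemo {
    /** Builds an array of the given length with every element set to value. */
    static byte[] filled(int length, int value) {
        byte[] a = new byte[length];
        Arrays.fill(a, (byte) value);
        return a;
    }

    /** Performs three reads (10, 25, then 5 bytes) into one buffer and returns it. */
    static byte[] readThreeChunks() throws IOException {
        InputStream in = new SequenceInputStream(
                new ByteArrayInputStream(filled(10, 1)),
                new SequenceInputStream(
                        new ByteArrayInputStream(filled(25, 2)),
                        new ByteArrayInputStream(filled(5, 3))));
        byte[] buffer = new byte[512];
        in.read(buffer); // 10 bytes of value 1 land in buffer[0..9]
        in.read(buffer); // 25 bytes of value 2 overwrite buffer[0..24]
        in.read(buffer); // 5 bytes of value 3 overwrite buffer[0..4] only
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = readThreeChunks();
        // buffer[0] comes from the third read, buffer[5] is left over from the
        // second read, and buffer[25] was never written, so it is still 0.
        System.out.println(buffer[0] + " " + buffer[5] + " " + buffer[25]);
    }
}
```

Nothing is cleared between calls: elements past the end of the latest read simply keep whatever an earlier read (or the initial zero fill) left there, which is why the copy loop must use the returned count.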