>or way too small for a huge file.
> temporary buffer (but I bet the implementation messes up somewhere).
Tom Hawtin wrote:
>>> FileReader fileReader = new FileReader(file);
>>> CharBuffer charBuffer = CharBuffer.allocate((int)file.length());
>> This could allocate a buffer three times too large,
> Going over Javadocs... could you elaborate?
Because Strings and chars are encoded, as are files. UTF-8, for example, uses
one to four bytes per code point (one to three for anything in the Basic
Multilingual Plane), depending on the character.
I'm not sure about how Tom arrived at three times as large but I can easily
see how the CharBuffer could be twice as large as the file data. CharBuffers
are allocated at two bytes per character. A file encoding that uses 8 bits
per character will only fill half such a buffer. I'm guessing that Tom is
familiar with some combination of encoding schemes that would have the
CharBuffer wind up three times too large for the file.
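One way to see where a factor of three could come from, assuming the file is UTF-8 (the thread never says which encoding is in play): CharBuffer.allocate takes its capacity in chars, so what matters is how many chars the file decodes to compared with its length in bytes. The class and method names below are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: compares the char count of a String with the number
// of bytes it occupies when encoded as UTF-8.
final class EncodedLengths {

    // Number of bytes the text would occupy in a UTF-8 encoded file.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "aaa",                  // ASCII letters: 1 byte each in UTF-8
            "\u00e9\u00e9\u00e9",   // e-acute: 2 bytes each in UTF-8
            "\u20ac\u20ac\u20ac"    // euro sign: 3 bytes each in UTF-8
        };
        for (String s : samples) {
            System.out.println(s.length() + " chars, " + utf8Length(s) + " bytes");
        }
    }
}
```

For the last sample the file would be 9 bytes long but decode to 3 chars, so a buffer sized from file.length() would hold three times as many chars as needed.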
>> or way too small for a huge file.
If the file uses a multibyte encoding with lots of characters that require
more than two bytes each.
>>> fileReader.read(charBuffer);
>> This does not necessarily read all that could be read. Should be in a
[quoted text clipped - 5 lines]
> Is this a sufficient loop?
> while(fileReader.ready()){fileReader.read(charBuffer);}
No. You'll have to fill the buffer, flip() it, read it to store or process
the data, then clear() it and repeat. I haven't played with java.nio much but
if I erred here someone should step up and correct me pretty quickly.
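A minimal sketch of such a loop, assuming the goal is simply to collect everything a Reader delivers into a String. The class name, method name and buffer size are invented for the example; it resets the buffer with clear() before each further read so the whole capacity is available again.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.CharBuffer;

// Illustrative sketch: drains a Reader through a fixed-size CharBuffer.
final class CharBufferReadLoop {

    static String readAll(Reader reader, int bufferChars) throws IOException {
        if (bufferChars <= 0) {
            throw new IllegalArgumentException("bufferChars must be positive");
        }
        CharBuffer buffer = CharBuffer.allocate(bufferChars);
        StringBuilder out = new StringBuilder();
        // read(CharBuffer) returns -1 only at end of stream; any single call
        // may deliver fewer chars than the buffer has room for.
        while (reader.read(buffer) != -1) {
            buffer.flip();      // switch from filling to draining
            out.append(buffer); // appends the chars between position and limit
            buffer.clear();     // make the whole buffer writable again
        }
        return out.toString();
    }
}
```

With a file one would typically pass something like new InputStreamReader(new FileInputStream(file), "UTF-8"), which names the encoding explicitly instead of relying on the platform default that FileReader uses.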
<http://java.sun.com/developer/technicalArticles/releases/nio/index.html>
<http://www.javaworld.com/javaworld/jw-09-2001/jw-0907-merlin.html>
GIYF.

Signature
Lew
Jeff Higgins - 25 May 2007 15:06 GMT
> Tom Hawtin wrote:
>>>> FileReader fileReader = new FileReader(file);
[quoted text clipped - 4 lines]
>
> Because Strings and Chars are encoded, as are files. ...
OK, chars are not bytes. (int)file.length() not a good choice here.
>>> or way too small for a huge file.
if file.length() > Integer.MAX_VALUE file == huge file
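A small sketch of that check, with a hypothetical helper name. The (int) cast keeps only the low 32 bits of the long, so an oversized length silently turns into a wrong or negative size rather than failing at the cast.

```java
// Illustrative sketch: refuses lengths that do not fit in an int instead of
// letting the (int) cast silently truncate them.
final class BufferSize {

    static int checkedBufferSize(long fileLength) {
        if (fileLength > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "file too large for a single buffer: " + fileLength + " bytes");
        }
        return (int) fileLength;
    }

    public static void main(String[] args) {
        long huge = 3000000000L;
        // The unchecked cast wraps around to a negative number.
        System.out.println((int) huge);
        System.out.println(checkedBufferSize(9612544L));
    }
}
```

A negative size handed to CharBuffer.allocate raises IllegalArgumentException, and new byte[negative] raises NegativeArraySizeException, so the unchecked version fails later and less clearly.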
>>>> fileReader.read(charBuffer);
>>> This does not necessarily read all that could be read. Should be in a
[quoted text clipped - 10 lines]
> java.nio much but if I erred here someone should step up and correct me
> pretty quickly.
Going back over Javadocs -- silly condition.
> <http://java.sun.com/developer/technicalArticles/releases/nio/index.html>
> <http://www.javaworld.com/javaworld/jw-09-2001/jw-0907-merlin.html>
Thanks for the pointers. I read the javaworld article, very interesting.
> GIYF.
GIGR The Google is a great resource.
Back to the OP which caught my eye, and to Tom's response,
"One byte at a time. Not going to be fast."
OK, scratch the CharBuffer solution. Now my latest solution:
[snippet]
startBlock = System.currentTimeMillis();
for (int i = 0; i < 10; i++)
{
    File file = new File("file.9612544.bytes");
    byte[] a = new byte[(int)file.length()];
    FileInputStream fis = new FileInputStream(file);
    fis.read(a);
    String str = new String(a,"US-ASCII");
    fis.close();
}
endBlock = System.currentTimeMillis();

startLoop = System.currentTimeMillis();
for (int i = 0; i < 10; i++)
{
    File file = new File("file.9612544.bytes");
    byte[] a = new byte[(int)file.length()];
    FileInputStream fis = new FileInputStream(file);
    int n;
    int c = 0;
    while ((n = fis.read()) != -1)
    {
        a[0] = (byte)n;
    }
    String str = new String(a,"US-ASCII");
    fis.close();
}
endLoop = System.currentTimeMillis();
Block 1547
Loop 287750
Thanks,
appreciate the OP
and all the comments.
Jeff Higgins
Knute Johnson - 27 May 2007 03:33 GMT
>> Tom Hawtin wrote:
>>>>> FileReader fileReader = new FileReader(file);
[quoted text clipped - 45 lines]
> FileInputStream fis = new FileInputStream(file);
> fis.read(a);
This may or may not read as many bytes as the length of the array a and
is therefore guaranteed not to work every time. See the docs.
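What the docs call for is a loop that keeps reading until the array is full. A sketch follows, with a made-up class name; it works on any InputStream, since a single read call may return fewer bytes than requested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch: keeps calling read until the target array is full.
final class ReadFully {

    static void readFully(InputStream in, byte[] target) throws IOException {
        int offset = 0;
        while (offset < target.length) {
            // Ask only for the bytes still missing; read may return fewer.
            int n = in.read(target, offset, target.length - offset);
            if (n == -1) {
                throw new EOFException(
                    "stream ended after " + offset + " of " + target.length + " bytes");
            }
            offset += n;
        }
    }
}
```

java.io.DataInputStream.readFully(byte[]) offers the same behaviour out of the box if wrapping the stream is acceptable.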
> String str = new String(a,"US-ASCII");
> fis.close();
[quoted text clipped - 11 lines]
> {
> a[0] = (byte)n;
a[c++] = (byte)n;
> }
> String str = new String(a,"US-ASCII");
[quoted text clipped - 9 lines]
> and all the comments.
> Jeff Higgins

Signature
Knute Johnson
email s/nospam/knute/
Arne Vajhøj - 27 May 2007 03:50 GMT
>> File file = new File("file.9612544.bytes");
>> byte[] a = new byte[(int)file.length()];
[quoted text clipped - 3 lines]
> This may or may not read as many bytes as the length of the array a and
> is therefore guaranteed not to work every time. See the docs.
s/guaranteed not/not guaranteed/w
Arne
Jeff Higgins - 04 Jul 2007 04:48 GMT
>>> Again, I'm sorry but I haven't been able to figure out what might
>>> cause read(charBuffer) to not read all that could be read?
[quoted text clipped - 15 lines]
>>
>> GIYF.
<http://mindprod.com:80/jgloss/readeverything.html>
> Again, I'm sorry but I haven't been able to figure out what might
> cause read(charBuffer) to not read all that could be read?
The fact that the Javadoc specifically says so?
Jeff Higgins - 04 Jul 2007 14:14 GMT
>> Again, I'm sorry but I haven't been able to figure out what might
>> cause read(charBuffer) to not read all that could be read?
>
> The fact that the Javadoc specifically says so?
:-) Yup, it is what it is.
Better for me to focus on what rather than why.
Patricia Shanahan - 04 Jul 2007 16:54 GMT
>>> Again, I'm sorry but I haven't been able to figure out what might
>>> cause read(charBuffer) to not read all that could be read?
>> The fact that the Javadoc specifically says so?
>
> :-) Yup, it is what it is.
> Better for me to focus on what rather than why.
I think the "why" is because part of the file may be buffered in memory.
Disk reads are always in fixed block sizes, and the data required to
fill the program buffer may cross block boundaries.
Suppose some, but not all, of the data for the read call is already in
memory. The system could make you wait many milliseconds for a physical
read to let it fill your buffer. It is often more efficient to let you
get on with processing the data that is already available, in parallel
with a physical read to get more data. For example, the read call may be
being issued by a BufferedReader doing a readLine, and it can return
data to its caller as soon as it has a whole line.
Patricia
Roedy Green - 04 Jul 2007 19:38 GMT
>For example, the read call may be
>being issued by a BufferedReader doing a readLine, and it can return
>data to its caller as soon as it has a whole line.
Even though we did double buffering and the like back in the days of
16K machines, I don't think java.io itself is that smart. I don't
think it is clever enough to read ahead into another buffer while
processing the previous one, or to let you start processing lines
before the I/O completes.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Jeff Higgins - 04 Jul 2007 20:45 GMT
>>>> Again, I'm sorry but I haven't been able to figure out what might
>>>> cause read(charBuffer) to not read all that could be read?
[quoted text clipped - 14 lines]
> being issued by a BufferedReader doing a readLine, and it can return
> data to its caller as soon as it has a whole line.
LOL :-) What us noobs won't go through to gain a little understanding!
Yup, during the course of this discussion I spent a good bit of energy
exploring some of the issues you describe. Mostly what I took away from it
was:
When using the basic IO facilities I should be concentrating on what I'm
hoping to accomplish and not how the JVM is fetching bytes from whatever
physical medium.
What caused most of my confusion, I suppose, was the fact that I didn't have
a real use-case in mind for this exploration. The OP wanted to know how to
read the contents of a file into a String, and I immediately reacted by
trying to find a solution to that problem when I may well have been better
off asking "What am I hoping to accomplish here?". When given the advice,
"This does not necessarily read all that could be read. Should be in a
loop.", and after having consulted the javadocs, my next question should
probably have been "Ok, now what?" instead of "Well, why not?".
Anyway, it's been a pleasant line of inquiry, and fun.
Thanks for the response, much appreciated.
JH