I just did a test, and my Java code now runs as fast as the C++ code
(93 seconds, everything included; I basically use a "stopwatch" for it
because that is all I need, so 1 or 2 seconds of miscount is possible).
In case other people are interested, here is a brief description of
how I do the fast file read (for plain ASCII in my case):
1) Read the file using an InputStream, reading 32K of data into
a byte[] buffer each time.
2) Write my own "readLine()" method, which scans the byte[] buffer
and returns a new byte[] for each line.
3) Write my own "split(char c)" method, which breaks one byte[] into
several byte[] arrays.
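For readers who want to try this, here is a minimal Java sketch of steps 1 to 3. It is not the poster's actual code, and it rests on assumptions of my own: ASCII input, '\n' line endings (a trailing '\r' is dropped), and a hypothetical class name ByteLineReader. The post does not say how a line that straddles two 32K reads is handled; this sketch moves the unfinished tail to the front of the buffer before the next read, and doubles the buffer if a single line fills it completely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reads ASCII lines as byte[] without creating Strings.
class ByteLineReader {
    private final InputStream in;
    private byte[] buf;
    private int pos = 0;   // index of the first unconsumed byte in buf
    private int limit = 0; // number of valid bytes in buf

    ByteLineReader(InputStream in) {
        this(in, 32 * 1024); // 32K reads, as in step 1
    }

    ByteLineReader(InputStream in, int bufSize) {
        if (bufSize <= 0) throw new IllegalArgumentException("bufSize must be > 0");
        this.in = in;
        this.buf = new byte[bufSize];
    }

    // Step 2: returns the next line as a new byte[] without its '\n'
    // (a trailing '\r' is also dropped), or null at end of stream.
    byte[] readLine() throws IOException {
        int scan = pos;
        while (true) {
            for (; scan < limit; scan++) {
                if (buf[scan] == '\n') {
                    byte[] line = copyLine(pos, scan);
                    pos = scan + 1;
                    return line;
                }
            }
            int scanned = scan - pos; // bytes of the partial line already checked
            if (!fill()) {
                if (pos == limit) return null;       // nothing left
                byte[] line = copyLine(pos, limit); // last line had no '\n'
                pos = limit;
                return line;
            }
            scan = pos + scanned; // pos is 0 after fill(); resume where we stopped
        }
    }

    // Step 3: splits one line on an ASCII separator; empty fields are kept,
    // so "a,,b" yields three fields.
    static List<byte[]> split(byte[] line, char sep) {
        List<byte[]> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length; i++) {
            if (line[i] == sep) {
                fields.add(Arrays.copyOfRange(line, start, i));
                start = i + 1;
            }
        }
        fields.add(Arrays.copyOfRange(line, start, line.length));
        return fields;
    }

    // Moves the unconsumed tail to the front of buf (doubling buf if a single
    // line fills it completely), then reads more data. Returns false at EOF.
    private boolean fill() throws IOException {
        int remaining = limit - pos;
        if (remaining == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2);
        }
        System.arraycopy(buf, pos, buf, 0, remaining);
        pos = 0;
        limit = remaining;
        int n = in.read(buf, limit, buf.length - limit);
        if (n < 0) return false;
        limit += n;
        return true;
    }

    // Copies buf[start, end) into a new array, dropping a trailing '\r'.
    private byte[] copyLine(int start, int end) {
        if (end > start && buf[end - 1] == '\r') end--;
        return Arrays.copyOfRange(buf, start, end);
    }
}
```

Typical use is a loop that calls readLine() until it returns null and passes each line to ByteLineReader.split(line, ','). The constructor that takes a buffer size exists only so that small buffers (and therefore lines crossing a refill) are easy to test.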
If you want to hash these byte[] arrays, write a string class around them
that provides the hash and other functions. Try not to convert them to
Java's String, which will be slow.
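A plain byte[] cannot be used directly as a HashMap key, because Java arrays inherit identity-based equals() and hashCode() from Object. Below is a minimal sketch of the kind of wrapper class described above. The name ByteString is hypothetical, and the sketch assumes the wrapped array is never modified afterwards, since the hash is computed once in the constructor.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper that lets a byte[] act as a HashMap/HashSet key.
// Assumes the wrapped array is never modified after construction.
final class ByteString {
    private final byte[] bytes;
    private final int hash; // computed once, reused by every lookup

    ByteString(byte[] bytes) {
        this.bytes = bytes;
        this.hash = Arrays.hashCode(bytes);
    }

    int length() { return bytes.length; }

    byte byteAt(int i) { return bytes[i]; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ByteString)) return false;
        ByteString other = (ByteString) o;
        // Cheap hash comparison first, then the full byte-by-byte check.
        return hash == other.hash && Arrays.equals(bytes, other.bytes);
    }

    @Override
    public int hashCode() { return hash; }

    // For debugging output only; the point is to avoid String on the hot path.
    @Override
    public String toString() { return new String(bytes, StandardCharsets.US_ASCII); }
}
```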
Thanks to everyone on this group. :-)
William Brogden - 12 Feb 2006 16:39 GMT
> I just did a test, and my Java code now runs as fast as the C++ code
> (93 seconds, everything included; I basically use a "stopwatch" for it
> 2) Write my own "readLine()" method, which scans the byte[] buffer
> and returns a new byte[] for each line.
Why a new byte[] when all you need is a start index and count? (Of course
that depends on keeping the initial buffer around.)
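Bill's suggestion can be sketched as a scanner that reports each line as a start index and a length into the existing buffer, so no per-line array is allocated. LineScanner is a hypothetical name, and the sketch assumes the whole chunk sits in one byte[] that is not overwritten while its lines are in use. With 32K reads, a line that crosses the end of a chunk would still need to be carried over or copied.

```java
// Hypothetical zero-copy line scanner: instead of allocating a new byte[]
// per line, it reports where each line starts and how long it is.
// Assumes buf is not modified while its lines are in use.
final class LineScanner {
    private final byte[] buf;
    private final int limit;
    private int pos;        // where the next line begins
    private int lineStart;  // start index of the current line
    private int lineLength; // byte count of the current line (no '\n' or '\r')

    LineScanner(byte[] buf, int limit) {
        this.buf = buf;
        this.limit = limit;
    }

    // Advances to the next line; returns false when the buffer is exhausted.
    boolean next() {
        if (pos >= limit) return false;
        int i = pos;
        while (i < limit && buf[i] != '\n') i++;
        int end = i;
        if (end > pos && buf[end - 1] == '\r') end--; // tolerate "\r\n"
        lineStart = pos;
        lineLength = end - pos;
        pos = i + 1; // skip the '\n' (or step past limit on the last line)
        return true;
    }

    int start() { return lineStart; }

    int length() { return lineLength; }
}
```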
> 3) Write my own "split(char c)" method, which breaks one byte[] into
> several byte[] arrays.
Same question as above: the object holding the index and count could
calculate a hash code when it is created.
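Along the same lines, here is a sketch of a slice object that holds a reference to the buffer plus an index and count, and computes its hash code once in the constructor. ByteSlice and its split() method are hypothetical names. The hash uses the same formula as java.util.Arrays.hashCode(byte[]), so a slice hashes the same as a copied array with the same contents, and equals() compares the bytes in place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of buf[start, start + count): no bytes are copied, and
// the hash code is computed once, when the slice is created.
final class ByteSlice {
    private final byte[] buf;
    private final int start;
    private final int count;
    private final int hash;

    ByteSlice(byte[] buf, int start, int count) {
        this.buf = buf;
        this.start = start;
        this.count = count;
        int h = 1; // same formula as java.util.Arrays.hashCode(byte[])
        for (int i = start; i < start + count; i++) h = 31 * h + buf[i];
        this.hash = h;
    }

    int length() { return count; }

    @Override
    public int hashCode() { return hash; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ByteSlice)) return false;
        ByteSlice s = (ByteSlice) o;
        if (hash != s.hash || count != s.count) return false;
        for (int i = 0; i < count; i++) {
            if (buf[start + i] != s.buf[s.start + i]) return false;
        }
        return true;
    }

    // Splits buf[start, start + count) on an ASCII separator into slices
    // that all share the original buffer. Empty fields are kept.
    static List<ByteSlice> split(byte[] buf, int start, int count, char sep) {
        List<ByteSlice> fields = new ArrayList<>();
        int fieldStart = start;
        int end = start + count;
        for (int i = start; i < end; i++) {
            if (buf[i] == sep) {
                fields.add(new ByteSlice(buf, fieldStart, i - fieldStart));
                fieldStart = i + 1;
            }
        }
        fields.add(new ByteSlice(buf, fieldStart, end - fieldStart));
        return fields;
    }
}
```

Slices work as HashMap keys only while the underlying buffer is not refilled; keys that must outlive the buffer need their bytes copied first.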
> If you want to hash these byte[] arrays, write a string class around them
> that provides the hash and other functions. Try not to convert them to
> Java's String, which will be slow.
Very true!
-------------
Bill