....
> > I need to download large CSV files into String objects for processing.
> > Unfortunately my download routine seems to be exceptionally slow.
....
> > ...takes several minutes.
>
> Your code iterates through the file, reading a line at a time.
...
> ... Java can read the
> entire file as a single read, about 100 times faster
> than it could read the file line by line.
Just to test that theory with a URL connection,
I tried the following method on a 735 KB file.
****************************************
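import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.UnknownHostException;
import java.util.Date;

public static String getContent(URL url)
{
    String s = "";
    StringBuffer sb = new StringBuffer("");
    long t1, t2, t3, t4;
    // Timing 1: read the resource line by line.
    t1 = (new Date()).getTime();
    try
    {
        URLConnection urlCon = url.openConnection();
        BufferedReader br = new BufferedReader(
            new InputStreamReader(urlCon.getInputStream()));
        String line = "";
        while ((line = br.readLine()) != null)
        {
            // A StringBuffer never equals a String, so test its length instead.
            if (sb.length() == 0) { sb.append(line); }
            else { sb.append("\n" + line); }
        }
        br.close();
    }
    catch(UnknownHostException e) { System.out.println(e); }
    catch(IOException e) { System.out.println(e); }
    t2 = (new Date()).getTime();
    // Timing 2: one bulk read into a byte array.
    t3 = (new Date()).getTime();
    try
    {
        URLConnection urlCon = url.openConnection();
        InputStream is = urlCon.getInputStream();
        // available() reports only the bytes readable without blocking,
        // not the size of the resource, so this single read() may
        // return just part of the file.
        byte b1[] = new byte[is.available()];
        int sz = is.read(b1);
        // Convert only the bytes that were actually read.
        if (sz >= 0) s = new String(b1, 0, sz);
        is.close();
    }
    catch(UnknownHostException e) { System.out.println(e); }
    catch(IOException e) { System.out.println(e); }
    t4 = (new Date()).getTime();
    String message = "Times:\nLine: \t" + (t2-t1) + "\nFile: \t" + (t4-t3);
    System.out.println( message );
    return message;
}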
****************************************
The results are..
Times:
Line: 140
File: 50
Well, not 100 times faster (scratches head, maybe I
was using a String as well) but almost 3 times faster..
--
Andrew Thompson
* http://www.PhySci.org/ PhySci software suite
* http://www.1point1C.org/ 1.1C - Superluminal!
* http://www.AThompson.info/andrew/ personal site
Andrew Thompson - 08 Dec 2003 06:49 GMT
...
> Just to test that theory with a URL connection,
> I tried the following method on a 735 KB file.
......
> The results are..
> Times:
> Line: 140
> File: 50
Those numbers were impressive, no?
Would be more impressive if my method
had been _reading_ the _entire_ file.
Which it was not!
I only noticed later when I started returning the
file contents rather than just the time differences..
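For anyone following along, here is a minimal sketch of why the single read came up short. available() only says how many bytes can be read without blocking, and one read() call may return fewer bytes than the stream will eventually deliver. The class ReadAllDemo, the TrickleStream wrapper and the 1460-byte chunk size are all made up for illustration; TrickleStream just stands in for a network stream that hands over data a packet at a time. The loop in readAll() is one common way to get everything, not necessarily the fastest.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadAllDemo {

    /** Hypothetical stand-in for a network stream: at most 'chunk' bytes per read. */
    static class TrickleStream extends FilterInputStream {
        private final int chunk;

        TrickleStream(InputStream in, int chunk) {
            super(in);
            this.chunk = chunk;
        }

        @Override
        public int available() throws IOException {
            // Report only what is "ready now", never more than one chunk.
            return Math.min(chunk, super.available());
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliver at most one chunk per call, like a socket often does.
            return super.read(b, off, Math.min(len, chunk));
        }
    }

    /** The pattern from the post: size the buffer with available(), read once. */
    static int readOnce(InputStream is) throws IOException {
        byte[] buf = new byte[is.available()];
        return is.read(buf);
    }

    /** Keep reading until read() returns -1, the only reliable end-of-stream signal. */
    static byte[] readAll(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = is.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100000];
        int once = readOnce(new TrickleStream(new ByteArrayInputStream(data), 1460));
        int all = readAll(new TrickleStream(new ByteArrayInputStream(data), 1460)).length;
        // Prints: single read: 1460 bytes, loop: 100000 bytes
        System.out.println("single read: " + once + " bytes, loop: " + all + " bytes");
    }
}
```

With a real URLConnection the numbers will vary from run to run, but the shape of the problem is the same: the "File" timing above measured a partial read.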
Anthony Borla - 08 Dec 2003 07:48 GMT
> ....
<SNIP>
> Just to test that theory with a URL connection,
> I tried the following method on a 735 KB file.
Well done, Andrew - an object lesson in practical programming!
It's often forgotten by those new to programming, or to a particular
programming language, that programming is a *practical* art, and that
experimentation plays a key role in formulating solutions.
Put simply, if the programmer is not sure about something, or can find
little or no relevant information on it, editor and compiler should be
wielded to whip up some test code and try ideas out. The worst that can
happen is that the ideas don't pan out; on the upside, something new is
learned, and the problem may well be solved.
Cheers,
Anthony Borla
Andrew Thompson - 08 Dec 2003 08:03 GMT
> <SNIP>
..
> > Just to test that theory with a URL connection,
> > I tried the following method on a 735 KB file.
..
> Well done, Andrew - an object lesson in practical programming!
>
> It's often forgotten by those new to programming, or to a particular
> programming language, that programming is a *practical* art, and that
> experimentation plays a key role in formulating solutions.
It's lucky I 'tested' it further later. :)
Pike - 08 Dec 2003 20:13 GMT
Thanks Andrew, and to everyone else who's contributed to this thread.
It's so kind of you to assist me with my problem.
I did some doodling with the code last night, and got something pretty
close to your solution (but it's slower, so I won't waste precious web
space by posting it). However, it didn't have the is.available() bit
and was thus only marginally faster than the line-by-line reading
method!
Thanks again!
Pike.
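Since the original goal was to get a whole CSV file into a String, here is a rough sketch of a bulk read that does not rely on is.available() at all. It reads through a Reader with a fixed char buffer and an explicit charset. The class name StreamToString, the method readToString, the 8192-char buffer and the choice of UTF-8 are assumptions for illustration; the real files may use a different encoding, and a ByteArrayInputStream stands in for the URL stream so the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamToString {

    /** Reads the stream to its end in buffer-sized chunks and returns the decoded text. */
    static String readToString(InputStream is, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        // try-with-resources closes the reader (and the underlying stream).
        try (Reader r = new InputStreamReader(is, cs)) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for urlCon.getInputStream(); the CSV content is made up.
        byte[] csv = "id,name\n1,alpha\n2,beta\n".getBytes(StandardCharsets.UTF_8);
        String text = readToString(new ByteArrayInputStream(csv), StandardCharsets.UTF_8);
        System.out.println(text.length() + " chars read");
    }
}
```

Unlike readLine(), this keeps the original line terminators, so nothing has to be glued back together with "\n" afterwards.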