Java Forum / General / December 2003

Optimising the downloading of a large csv file into a string

Pike - 07 Dec 2003 17:25 GMT
Hi,

I need to download large CSV files into String objects for processing.
Unfortunately my download routine seems to be exceptionally slow.  I
believe it's because of the following line

ret+="\n" + line;

If I download the csv files into Excel via Internet Explorer the
transfer takes a few seconds, but using the method below takes several
minutes.

Does anyone know how I can make the download method faster?  I can't
find any java methods which will download the whole file in one go.

Thanks,

import java.io.*;
import java.net.*;

public class download {
   
   public static String download(String filename) {
       String ret="";
   URL javacodingURL = null;
    try {
           javacodingURL = new URL(filename);
   }catch(MalformedURLException e){
           // Malformed URL
           System.out.println("Error in given URL");
           return ret;
   }
       
   try {
           URLConnection connection = javacodingURL.openConnection();
           BufferedReader br = new BufferedReader(new
             InputStreamReader(connection.getInputStream()));
           String line = "";
           while ((line = br.readLine()) != null)
               if(ret.equals("")){
                   ret=line;
               }else{
                   ret+="\n" + line;
               }
           br.close();
  }catch(UnknownHostException e){
           System.out.println("Unknown Host");
           return ret;
  }catch(IOException e){
           System.out.println("Error in opening URLConnection, Reading or Writing");
           return ret;
  }
  return ret;
 }// end download method
}// end download class
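
Editor's note: the suspicion about `ret+="\n" + line;` can be checked with simple arithmetic. Each `+=` creates a new String holding everything read so far, so the total number of characters copied grows with the square of the line count. The sketch below is a simplified cost model, not a benchmark; the class and method names are hypothetical, and it assumes every line has the same length.

```java
final class ConcatCost {
    /**
     * Characters written into freshly created Strings when a result is built
     * with repeated `ret += "\n" + line` (simplified model, equal-length lines).
     */
    static long charsCopiedByConcat(int lines, int lineLength) {
        long copied = 0;
        long length = lineLength;        // the first line is assigned, not concatenated
        for (int i = 1; i < lines; i++) {
            length += 1 + lineLength;    // "\n" plus the next line
            copied += length;            // the new String holds everything read so far
        }
        return copied;
    }

    /**
     * Characters appended when the same result is built in one growable buffer.
     * Ignores the occasional internal array growth, which adds only a constant factor.
     */
    static long charsAppendedByBuffer(int lines, int lineLength) {
        if (lines == 0) {
            return 0;
        }
        return (long) lines * lineLength + (lines - 1);
    }
}
```

Under this model, 10,000 lines of 80 characters cost on the order of four billion copied characters with concatenation, against fewer than a million appended characters with a buffer.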
Tor Iver Wilhelmsen - 07 Dec 2003 17:53 GMT
> Does anyone know how I can make the download method faster?  I can't
> find any java methods which will download the whole file in one go.

You should build up the result using a StringBuffer.

             StringBuffer buf = new StringBuffer();

>             String line = "";
>             while ((line = br.readLine()) != null)

               buf.append(line).append('\n');

>             br.close();
>    }catch(UnknownHostException e){
[quoted text clipped - 5 lines]
>             return ret;
>    }

             // Remove trailing \n (only if something was read)
             if (buf.length() > 0) buf.setLength(buf.length()-1);
             ret = buf.toString();

>    return ret;
>   }// end download method
> }// end download class
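
Editor's note: the suggestion above can be put together as one self-contained method. This is a minimal sketch, not the poster's exact code. It uses StringBuilder, the unsynchronized counterpart of StringBuffer added in Java 5, and a flag in place of trimming the trailing newline afterwards. The class and method names are hypothetical, and the method takes any Reader so it can be tried without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class LineJoiner {
    /** Reads every line from the reader and joins them with '\n', with no trailing newline. */
    static String readAll(Reader source) throws IOException {
        StringBuilder buf = new StringBuilder();
        BufferedReader br = new BufferedReader(source);
        try {
            boolean first = true;
            String line;
            while ((line = br.readLine()) != null) {
                if (!first) {
                    buf.append('\n');   // separator goes before every line except the first
                }
                buf.append(line);
                first = false;
            }
        } finally {
            br.close();                 // closes the wrapped reader too
        }
        return buf.toString();
    }
}
```

For the original problem the argument would be something like `new InputStreamReader(connection.getInputStream())`.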
Chris Uppal - 07 Dec 2003 19:42 GMT
>     try {
>             URLConnection connection = javacodingURL.openConnection();
>             BufferedReader br = new BufferedReader(new
>               InputStreamReader(connection.getInputStream()));

Besides using a StringBuffer as has already been suggested, you should put the
buffering "as close" to the raw input stream as possible.  I.e. something like:

       try {
             URLConnection connection = javacodingURL.openConnection();
              Reader reader = new InputStreamReader(
                                  new BufferedInputStream(
                                      connection.getInputStream()));
               ....

Otherwise the InputStreamReader will be reading tiny little chunks from the
(presumably) unbuffered InputStream created by the URLConnection.

   -- chris
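
Editor's note: a sketch of the layering described above, with the types spelled out. Byte-level buffering needs a BufferedInputStream, since BufferedReader only wraps a Reader. The class name, method name and explicit charset parameter are assumptions added for illustration. Whether the byte-level buffer helps is worth measuring: the InputStreamReader documentation notes that it may already read ahead from the underlying stream.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

final class ReaderChain {
    /**
     * Wraps a raw byte stream so that bytes are buffered next to the source,
     * decoded with an explicit charset, and exposed line by line.
     */
    static BufferedReader open(InputStream raw, Charset charset) {
        InputStream bufferedBytes = new BufferedInputStream(raw);   // buffer closest to the raw stream
        return new BufferedReader(new InputStreamReader(bufferedBytes, charset));
    }
}
```

The outer BufferedReader is there only to provide `readLine()`; closing it closes the whole chain.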
Andrew Thompson - 07 Dec 2003 23:52 GMT
> Hi,
>
[quoted text clipped - 7 lines]
> transfer takes a few seconds, but using the method below takes several
> minutes.

Your code iterates through the file, reading a line at a time.

I tried that in some code so I could update a progress bar.

I gave it up when I realised that Java can read the
entire file as a single read, about 100 times faster
than it could read the file line by line.

--
Andrew Thompson
* http://www.PhySci.org/ PhySci software suite
* http://www.1point1C.org/ 1.1C - Superluminal!
* http://www.AThompson.info/andrew/ personal site
Andrew Thompson - 08 Dec 2003 02:06 GMT
....
> > I need to download large CSV files into String objects for processing.
> >  Unfortunately my download routine seems to be exceptionally slow.
....
> > ...takes several minutes.
>
> Your code iterates through the file, reading a line at a time.
...
> ... Java can read the
> entire file as a single read, about 100 times faster
> than it could read the file line by line.

Just to test that theory with an URL connection,
I tried the following method on a 735Kb file.

****************************************
public static String getContent(URL url)
{
String s = "";
StringBuffer sb = new StringBuffer("");

long t1, t2, t3, t4;

t1 = (new Date()).getTime();
try
{
 URLConnection urlCon = url.openConnection();
 BufferedReader br = new BufferedReader(
   new InputStreamReader(urlCon.getInputStream()));
 String line = "";
 while ((line = br.readLine()) != null)
 // StringBuffer.equals("") is always false, so test the length instead
 if (sb.length() == 0) { sb.append(line); }
 else { sb.append('\n').append(line); }
 br.close();
}
catch(UnknownHostException e) { System.out.println(e); }
catch(IOException e) { System.out.println(e); }
t2 = (new Date()).getTime();

t3 = (new Date()).getTime();
try
{
 URLConnection urlCon = url.openConnection();
 InputStream is = urlCon.getInputStream();
 byte b1[] = new byte[is.available()];
 int sz = is.read(b1);
 if (sz>=0) s = new String(b1);
}
catch(UnknownHostException e) { System.out.println(e); }
catch(IOException e) { System.out.println(e); }
t4 = (new Date()).getTime();

String message = "Times:\nLine: \t" + (t2-t1) + "\nFile: \t" + (t4-t3);
System.out.println( message );

return message;
}
****************************************

The results are..
Times:
Line:     140
File:     50

Well, not 100 times faster (scratches head, maybe I
was using a String as well) but almost 3 times faster..

--
Andrew Thompson
* http://www.PhySci.org/ PhySci software suite
* http://www.1point1C.org/ 1.1C - Superluminal!
* http://www.AThompson.info/andrew/ personal site
Andrew Thompson - 08 Dec 2003 06:49 GMT
...
> Just to test that theory with an URL connection,
> I tried the following method on a 735Kb file.
......
> The results are..
> Times:
> Line: 140
> File: 50

Those numbers were impressive, no?

Would be more impressive if my method
had been _reading_ the _entire_ file.
Which it was not!

I only noticed later when I started returning the
file contents rather than just the time differences..

--
Andrew Thompson
* http://www.PhySci.org/ PhySci software suite
* http://www.1point1C.org/ 1.1C - Superluminal!
* http://www.AThompson.info/andrew/ personal site
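
Editor's note: the reason the "File" timing did not cover the entire file is that `available()` reports only how many bytes can be read without blocking, and a single `read` may return fewer bytes than requested. A minimal sketch of reading a stream to the end follows; it loops until `read` returns -1. The class name, method name and charset parameter are hypothetical, and the caller is assumed to own and close the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class StreamSlurper {
    /** Reads the stream until end of input and returns the decoded text. Does not close the stream. */
    static String readFully(InputStream in, Charset charset) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[8192];
        int n;
        // read() returns however many chars are ready, and -1 only at end of input
        while ((n = reader.read(chunk)) != -1) {
            out.append(chunk, 0, n);
        }
        return out.toString();
    }
}
```

This keeps the block-at-a-time speed of the second test while still returning the complete content, line breaks included.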
Anthony Borla - 08 Dec 2003 07:48 GMT
> ....

<SNIP>

> Just to test that theory with an URL connection,
> I tried the following method on a 735Kb file.

Well done, Andrew - an object lesson in practical programming !

It's often forgotten by those new to programming, or a particular
programming language, that programming is a *practical* art, and that
experimentation plays a key role in formulating solutions.

Put simply, if the programmer is not sure about something, or can find
little, or no relevant information on it, editor and compiler should be
wielded to whip up a little test code and try ideas out. The worst that can
happen is that the ideas don't pan out; on the upside, something new is
learned, and the problem solved.

Cheers,

Anthony Borla
Andrew Thompson - 08 Dec 2003 08:03 GMT
> <SNIP>
..
> > Just to test that theory with an URL connection,
> > I tried the following method on a 735Kb file.
..
> Well done, Andrew - an object lesson in practical programming !
>
> It's often forgotten by those new to programming, or a particular
> programming language, that programming is a *practical* art, and that
> experimentation plays a key role in formulating solutions.

It's lucky I 'tested' it further later.    :)
Pike - 08 Dec 2003 20:13 GMT
Thanks Andrew, and to everyone else who's contributed to this thread.
It's so kind of you to assist me with my problem.

I did some doodling with the code last night, and got something pretty
close to your solution (but it's slower so I won't waste precious web
space by posting it).  However, it didn't have the is.available() bit
and was thus only marginally faster than the Line by line reading
method!!!

Thanks again!

Pike.
Stephen Ostermiller - 19 Dec 2003 17:07 GMT
Try using this CSV Parser:
http://ostermiller.org/utils/ExcelCSV.html

String[][] values = com.Ostermiller.util.ExcelCSVParser.parse(
   new InputStreamReader(
       javacodingURL.openConnection().getInputStream()
   )
);

It should deal with your problems for you.  It does buffering, it does
efficient string creation.  Plus it is only one line of code.

Stephen

