
Java Forum / General / November 2006


speeding up URLConnection reading

mark - 04 Nov 2006 17:16 GMT
Hello,

I want to read the content of some webpages and make some string
comparisons with them (e.g. check if there is some text in them, use some
regular expressions, etc.).

StringBuilder htmlCode = new StringBuilder();
URL url = new URL(fileName);
URLConnection conn = url.openConnection();
conn.connect();
BufferedReader dis = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
String inputLine = "";
for (;;) {
    inputLine = dis.readLine();
    if (inputLine == null) break;
    htmlCode.append(inputLine);
}

It works, but it is very, very slow compared to a browser. Do you know
any way to speed it up?

Regards, mark
Daniel Pitts - 05 Nov 2006 00:12 GMT
> Hello,
>
[quoted text clipped - 19 lines]
>
> Regards, mark

Don't use a buffered reader, as you don't need to read it one line at a
time.

       final URL url = new URL(adjustUrl(page));
       final HttpURLConnection connection =
               (HttpURLConnection) url.openConnection();

       connection.setRequestMethod(method);
       connection.connect();
       try {
           final InputStream is = connection.getInputStream();
           final Reader reader = new InputStreamReader(is);
           final char[] buf = new char[1024];
           int read;
           final StringBuffer sb = new StringBuffer();
           while((read = reader.read(buf)) > 0) {
               sb.append(buf, 0, read);
           }
       } finally {
           connection.disconnect();
       }
mark - 10 Nov 2006 01:21 GMT
Hello,

> Don't use a buffered reader, as you don't need to read it one line at a
> time.

Thank you. That sped things up, although compared to a web browser it
is still not fast enough. Do you know any other trick that could help me
here? Thanks!

Regards, mark
EJP - 10 Nov 2006 07:25 GMT
> Thank you. It's speed up the speed, although comparing to webbrowser it
> is still not enough. Do you know any other trick which could help me
> here? Thanks!

Raise that buffer from 1024 to 16384.
mark - 10 Nov 2006 17:19 GMT
Hello,

> Raise that buffer from 1024 to 16384.

Thank you. I did it, but there was still no big improvement. I also tried
Jakarta HttpClient, and it does improve the performance. The problem is
that it is still unsatisfactory: it fetched the websites (I am going
through a lot of pages at once) in 10 minutes, while my friend's script
in Visual Basic did it in 3 minutes. So the difference is big, too big :(.

GetMethod httpget = new GetMethod(fileName);
httpget.setDoAuthentication(false);
httpget.getParams().setParameter("http.connection.stalecheck", false);
httpget.getParams().setParameter("http.protocol.expect-continue", false);
try {
    httpclient.executeMethod(httpget);
    Reader reader = new InputStreamReader(
            httpget.getResponseBodyAsStream(), httpget.getResponseCharSet());
    char[] buf = new char[131072];
    int read;
    while ((read = reader.read(buf)) > 0) {
        htmlCode.append(buf, 0, read);
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    httpget.releaseConnection();
}
return htmlCode.toString();

Any ideas how I could greatly improve it (is it possible in
Java)?

Regards, mark
Daniel Pitts - 10 Nov 2006 18:01 GMT
> Hello,
>
[quoted text clipped - 30 lines]
>
> Regards, mark

Multithread it: if you're downloading more than one thing, do them in
parallel.
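
A minimal sketch of the parallel approach suggested above, assuming a fixed-size thread pool from java.util.concurrent. The names ParallelFetch, fetchAll and fetcher are hypothetical and only for illustration; fetcher stands in for whatever per-page download code is already in use, and the sketch uses Java 8 lambdas for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: downloads several pages concurrently.
final class ParallelFetch {

    /**
     * Runs fetcher once per URL on a pool of the given size and returns
     * the page bodies keyed by URL, in the order the URLs were supplied.
     */
    static Map<String, String> fetchAll(List<String> urls,
                                        Function<String, String> fetcher,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every download first so they overlap in time.
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                pending.add(pool.submit(task));
            }
            // Then collect the results; get() blocks until each one is done.
            Map<String, String> bodies = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                bodies.put(urls.get(i), pending.get(i).get());
            }
            return bodies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for downloads", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a download failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size is something to experiment with rather than a fixed recommendation. If a single Commons HttpClient 3.x instance is shared between the worker threads, it would also need a MultiThreadedHttpConnectionManager, since the default connection manager is meant for single-threaded use.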
su_dang@hotmail.com - 10 Nov 2006 19:24 GMT
> Hello,
>
[quoted text clipped - 30 lines]
>
> Regards, mark

You might want to put in some timing statements to see how long it takes
to establish the connection and how long it takes to read the content.

Su Dang
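
One way such timing statements might look, as a rough sketch based on System.nanoTime(). FetchTiming, StreamOpener and measure are hypothetical names, not part of any library; the opener is whatever code produces the response stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: separates "time to get a stream" from "time to drain it".
final class FetchTiming {

    /** Hypothetical hook that opens the response stream. */
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    final long openMillis;  // time until the stream was available
    final long readMillis;  // time spent reading the body to the end
    final long bytesRead;   // total body size in bytes

    private FetchTiming(long openMillis, long readMillis, long bytesRead) {
        this.openMillis = openMillis;
        this.readMillis = readMillis;
        this.bytesRead = bytesRead;
    }

    static FetchTiming measure(StreamOpener opener) {
        long t0 = System.nanoTime();
        try (InputStream in = opener.open()) {
            long t1 = System.nanoTime();
            long total = 0;
            byte[] buf = new byte[16384];
            // Drain the body without decoding, so only I/O is measured here.
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
            long t2 = System.nanoTime();
            return new FetchTiming((t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a plain URLConnection it could be called as FetchTiming.measure(() -> new URL(fileName).openConnection().getInputStream()). Note that getInputStream() sends the request and waits for the response headers, so the first figure covers connection setup plus server latency, and the second covers only the transfer of the body.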
EJP - 11 Nov 2006 00:34 GMT
> Any ideas how could I greatly improve its quality (is it possible in
> java)??

You could get rid of the Reader and use an InputStream. But I think
you're up against some network connectivity thing really.
mark - 11 Nov 2006 10:53 GMT
Hello,

> You could get rid of the Reader and use an InputStream. But I think
> you're up against some network connectivity thing really.

I have just made some measurements, and the most time-consuming part is
getting the message into the string. I am actually using:

StringBuilder str = new StringBuilder();
char[] b = new char[32678];
Reader reader = new InputStreamReader(
method.getResponseBodyAsStream(), method.getResponseCharSet());
for (int n; (n = reader.read(b)) != -1;) str.append(b, 0, n);
String answer = str.toString();

Is it possible to make it faster? (All the chars are just standard
ASCII text, so there is no need to worry about UTF, etc.)
EJP - 12 Nov 2006 04:38 GMT
> Is it possible to make it faster (all the chars are just a standard
> ascii text so there is no need to take care about utf, etc.).

Like I said, you could use an InputStream instead of the Reader.
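
A minimal sketch of what reading through an InputStream could look like, assuming the body really is plain ASCII as stated earlier in the thread. RawBody and readAscii are hypothetical names; the raw bytes are collected first and decoded once at the end instead of going through a Reader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a response body as bytes, then decodes it in one step.
final class RawBody {

    static String readAscii(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream(32768);
            byte[] buf = new byte[16384];
            // read() returns -1 at end of stream; copy each chunk as it arrives.
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            // Assumption: pure ASCII content. Any non-ASCII byte would be
            // decoded as the replacement character U+FFFD.
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the HttpClient code from earlier in the thread, the call would be along the lines of RawBody.readAscii(method.getResponseBodyAsStream()). Whether this makes a measurable difference depends on where the time is actually going, so it is worth timing before and after.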
Chris Uppal - 13 Nov 2006 08:26 GMT
> I have just made some measurements and the most time consuming is
> getting the message into the string. I am actually using:
[quoted text clipped - 5 lines]
> for (int n; (n = reader.read(b)) != -1;) str.append(b, 0, n);
> String answer = str.toString();

I find it /very/ hard to believe that decoding ASCII-valued binary data into
ASCII-valued string data is slower than transmitting that data across a
network.  I think you must have mis-measured somehow.

   -- chris





©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.