
Java Forum / General / November 2006

infinite loop with http requests

yawnmoth - 20 Nov 2006 17:37 GMT
I'm trying to write something that'll let me output the contents of a
given webpage while skipping over the headers.  Since I'm trying to
learn raw HTTP, I'm using Sockets and not URL.

Anyway, the header of an HTTP response ends when you have "\r\n\r\n".
BufferedReader's readLine treats that as two lines since it considers
"\r\n" to be a line terminating character.  Since it also strips off
the line terminating characters, readLine should return the second line
as "".
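
A minimal sketch of that behaviour, using a made-up response held in a string instead of a live socket (the class name ReadLineDemo and the response text are assumptions for illustration only):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineDemo {
    // Splits text the way BufferedReader.readLine() does and returns the lines.
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical response text, not real server output.
        String response =
                "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
        for (String line : lines(response)) {
            // The line between the headers and the body prints with length 0.
            System.out.println("'" + line + "' length=" + line.length());
        }
    }
}
```

The blank line that ends the headers comes back as a zero-length string, and the end of the stream comes back as null, so the two cases have to be told apart.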

Per that, I've written a program that will loop, continuously, until ""
is encountered.  Unfortunately, "" never appears to be encountered and
thus I have an infinite loop.

Here's my code:

import java.net.*;
import java.io.*;

public class HttpRequestor
{
  public static void main(String[] args) {
     try {
        Socket sock = new Socket("www.google.com", 80);
        String httpRequest = "GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n";
        sock.getOutputStream().write(httpRequest.getBytes());
        BufferedReader text = new BufferedReader(new InputStreamReader(sock.getInputStream()));

        String line, output = "";
        while (text.readLine() != "");
        while ((line = text.readLine()) != null) {
           System.out.println("\r\n'"+URLEncoder.encode(line)+"'\r\n");
        }
     }
     catch (Exception e) {
        e.printStackTrace();
     }
  }
}

To confirm that I was indeed getting "" back from readLine, I wrote the
following:

import java.net.*;
import java.io.*;

public class HttpRequestor
{
  public static void main(String[] args) {
     try {
        Socket sock = new Socket("www.google.com", 80);
        String httpRequest = "GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n";
        sock.getOutputStream().write(httpRequest.getBytes());
        BufferedReader text = new BufferedReader(new InputStreamReader(sock.getInputStream()));

        String line, output = "";
        while ((line = text.readLine()) != null) {
           System.out.println("\r\n'"+URLEncoder.encode(line)+"'\r\n");
        }
     }
     catch (Exception e) {
        e.printStackTrace();
     }
  }
}

This shows that "" is indeed being returned by readLine.  So why
doesn't the while loop in the first program terminate when "" is
received?

Any insights would be appreciated - thanks!
Robert Klemme - 20 Nov 2006 17:41 GMT
> I'm trying to write something that'll let me output the contents of a
> given webpage while skipping over the headers.  Since I'm trying to
[quoted text clipped - 71 lines]
> doesn't the while loop in the first program terminate when "" is
> received?

Because you compare strings with == (identity) instead of with equals()
(equivalence).

    robert
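
The distinction Robert describes can be sketched as follows. The class and method names are made up for illustration; new String("") stands in for any string created at run time, which is typically what readLine hands back.

```java
class StringCompareDemo {
    // True only when both references point to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True when both strings contain the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // "new" always creates a distinct object, even for an empty string.
        String fresh = new String("");
        System.out.println("same object:   " + sameObject(fresh, ""));
        System.out.println("same contents: " + sameContents(fresh, ""));
    }
}
```

Two empty strings can therefore be equal in content while still being different objects, which is why a != "" test can stay true even for a line that is empty.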
yawnmoth - 20 Nov 2006 18:27 GMT
> <snip>
> Because you compare strings with == (identity) instead of with equals()
> (equivalence).
That was it - thanks! :)
Oliver Wong - 20 Nov 2006 17:55 GMT
> I'm trying to write something that'll let me output the contents of a
> given webpage while skipping over the headers.  Since I'm trying to
> learn raw HTTP, I'm using Sockets and not URL.

[snip most of the code]
>         Socket sock = new Socket("www.google.com", 80);

   I recommend against using Google as your test server. Google does some
funky stuff when it detects that Java is connecting to it, which may give
you unexpected results.

   - Oliver
Daniel Pitts - 20 Nov 2006 18:22 GMT
> > I'm trying to write something that'll let me output the contents of a
> > given webpage while skipping over the headers.  Since I'm trying to
[quoted text clipped - 8 lines]
>
>     - Oliver

Good suggestion except for two things. First, he isn't using Java's URL API,
which is what's responsible for setting the User-Agent string. Second,
you can override the User-Agent string, and Google couldn't possibly
know the difference.
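
With a raw socket nothing sets a User-Agent unless the request text includes one. A rough sketch of building such a request by hand follows; the class name, the host example.test, and the agent string MyTestClient/0.1 are all placeholders invented for this example.

```java
class RawRequestBuilder {
    // Builds a minimal HTTP/1.0 GET request with an explicit User-Agent header.
    // Every header line ends in CRLF, and an extra CRLF ends the header block.
    static String get(String host, String path, String userAgent) {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: " + userAgent + "\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        // Placeholder host and agent string, chosen only for the demo.
        System.out.print(get("example.test", "/", "MyTestClient/0.1"));
    }
}
```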

In any case, the OP's problem is that he is comparing the result of
readLine() with != "", when he should use line.equals(""), or better yet
line.length() == 0.

HTH,
Daniel.
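
A possible corrected version of the header-skipping loop, sketched against an in-memory response so it runs without a network connection. HeaderSkipper and its methods are names invented here. The loop also checks for null: readLine returns null at end of stream, and null != "" is true as well, so the original loop could not end even after the server closed the connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class HeaderSkipper {
    // Consumes lines up to and including the blank line that ends the headers.
    // Returns false if the stream ended before a blank line was seen.
    static boolean skipHeaders(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.length() == 0) {
                return true;
            }
        }
        return false;
    }

    // Returns the body of a response held in a string, one '\n' per line.
    static String body(String response) {
        try (BufferedReader reader = new BufferedReader(new StringReader(response))) {
            StringBuilder body = new StringBuilder();
            if (skipHeaders(reader)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical response text used only for the demo.
        String response = "HTTP/1.0 200 OK\r\nServer: test\r\n\r\nline one\r\nline two\r\n";
        System.out.print(body(response));
    }
}
```

The same skipHeaders method should work unchanged on a BufferedReader wrapped around a socket's input stream.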
Chris Uppal - 20 Nov 2006 19:29 GMT
> >     I recommend against using Google as your test server. Google does
> > some funky stuff when it detects that Java is connecting to it, which
> > may give you unexpected results.
[...]
> Good suggestion except for two things. First, he isn't using Java's URL API,
> which is what's responsible for setting the User-Agent string. Second,
> you can override the User-Agent string, and Google couldn't possibly
> know the difference.

I agree with Oliver's advice.  Google is perfectly at liberty to treat requests
differently depending on how they /appear/ to have been submitted.

If I were them I would group requests into at least three categories: ones that
appear to be legit (as far as we can tell from the various meta-info in a
request); those that appear to come from frequently abused clients (such as the
Java stuff); and those where we can't tell much.   I would be less aggressive
about -- say -- shutting off an over-eager client IP address if the requests
appeared to be from a normal browser than if they appeared to come from
uncontrolled code.  And I'd put the "can't tell" ones somewhere in the middle.

But the bottom line is not that Google /can/ treat requests differently
depending on apparently immaterial meta stuff, but that it /does/ do so --
which makes it a very poor example domain for a beginner (to HTTP) to test
against.

   -- chris
Daniel Pitts - 20 Nov 2006 20:29 GMT
> > >     I recommend against using google as your test server. Google does
> > > some funky stuff when it detects that Java is connecting to it, which
[quoted text clipped - 22 lines]
>
>     -- chris

Okay, while my point was that you can "trick" Google into thinking that
it is probably a legit client, your point is well taken.

I suppose a good way to learn HTTP is to set up a web server in your own
development environment (such as Apache, Resin, etc.), and use it
instead of a third-party website. That way you also have control over
the content being produced.

- Daniel.
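
One way to try the local-server idea without installing Apache or Resin is the small HTTP server bundled with the JDK (com.sun.net.httpserver, Java 6 and later). That is a substitution made for this sketch, not what Daniel proposed, and the class name, timeout, and content are assumptions. It starts a throwaway server on a free loopback port, sends the same kind of raw HTTP/1.0 request as the original program, skips the headers, and returns the body.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LocalServerDemo {
    // Serves `content` at "/" from a temporary local server and fetches it
    // back over a raw socket, returning the response body without headers.
    static String fetchFromLocalServer(String content) {
        try {
            byte[] payload = content.getBytes(StandardCharsets.UTF_8);
            InetAddress loopback = InetAddress.getLoopbackAddress();
            // Port 0 asks the operating system for any free port.
            HttpServer server = HttpServer.create(new InetSocketAddress(loopback, 0), 0);
            server.createContext("/", exchange -> {
                exchange.sendResponseHeaders(200, payload.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(payload);
                }
            });
            server.start();
            try (Socket sock = new Socket(loopback, server.getAddress().getPort())) {
                sock.setSoTimeout(5000); // fail instead of blocking forever
                String request = "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n";
                OutputStream out = sock.getOutputStream();
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(sock.getInputStream(), StandardCharsets.UTF_8));
                String line;
                // Skip header lines until the blank line or end of stream.
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    // header line ignored
                }
                StringBuilder body = new StringBuilder();
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
                return body.toString();
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchFromLocalServer("hello from localhost"));
    }
}
```

Because the server and its content are under your control, any surprise in the output points at the client code and not at the remote site's behaviour.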


©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.