Java Forum / General / October 2006

Browser versus Java URLConnection

little_mm@ntlworld.com - 04 Oct 2006 15:54 GMT
Hi All

Perhaps someone knows the answer to this problem. I open a connection
to a URL and read lines one at a time from the URL using an
InputStreamReader and a BufferedReader:

               // Open connection to URL
               URLConnection conn = pageURL.openConnection();
               conn.setReadTimeout(timeout);
               conn.setConnectTimeout(timeout);
               conn.setUseCaches(false);
               InputStream pageStream = conn.getInputStream();
               // No charset argument: bytes are decoded with the platform default encoding
               BufferedReader reader = new BufferedReader(new InputStreamReader(pageStream));

               String line;
               StringBuffer pageBuffer = new StringBuffer();
               // readLine() strips line terminators, and none are appended below
               while ((line = reader.readLine()) != null)
               {
                   System.out.println(line);
                   pageBuffer.append(line);
               }
               return pageBuffer.toString();

However, the actual text I get back from the URL is different from that
saved out of a browser from the same URL. In particular, the browser
saves £ characters, whereas the lines read in Java are missing
these characters altogether. Also, some of the characters have actually
been deleted in the Java lines. I have tried passing different character
encodings as the second argument of the InputStreamReader constructor;
this has virtually no effect, except that UTF-16 returns a large number
of "?" characters in the stream. The Content-Type header of the page
says it is ISO-8859-1, but passing that encoding name to the
InputStreamReader changes nothing in the Java code: the £ symbol is
still missing.
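One way to take the guesswork out of the encoding is to decode with whatever charset the server actually declares in Content-Type, and to read the whole body rather than line by line, since readLine() silently drops line terminators. The sketch below does both; the class and method names (PageFetcher, charsetFrom, readAll, fetch) are hypothetical, and falling back to ISO-8859-1 when no charset is declared is an assumption based on the HTTP/1.1 default for text types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;

public class PageFetcher {

    /** Returns the charset parameter of a Content-Type value, or the fallback if absent or unusable. */
    static String charsetFrom(String contentType, String fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String cs = p.substring(8).trim().replace("\"", "");
                    try {
                        if (Charset.isSupported(cs)) {
                            return cs;
                        }
                    } catch (IllegalArgumentException badName) {
                        // Illegal charset name: fall through to the fallback.
                    }
                }
            }
        }
        return fallback;
    }

    /** Reads everything, keeping the line terminators that readLine() would drop. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    static String fetch(URL pageURL, int timeout) throws IOException {
        URLConnection conn = pageURL.openConnection();
        conn.setReadTimeout(timeout);
        conn.setConnectTimeout(timeout);
        conn.setUseCaches(false);
        // Assumption: ISO-8859-1 when the server declares no charset.
        String charset = charsetFrom(conn.getContentType(), "ISO-8859-1");
        try (InputStream in = conn.getInputStream();
             Reader reader = new InputStreamReader(in, charset)) {
            return readAll(reader);
        }
    }
}
```

If the £ is still missing after this, the encoding is probably not the culprit and the bytes themselves differ.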

In the browser, if I change the character encoding to "UTF-8", the
£ symbol is still displayed properly. In other words, it looks like
I am receiving different data from the server depending on whether
I use the browser or the code. I'm not sure whether it has anything
to do with the encoding; I'm just guessing.
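To test the "different data" theory directly, it can help to look at the raw bytes before any decoding happens and count the usual ways a pound sign is transmitted: a lone byte 0xA3 (ISO-8859-1 or windows-1252), the two bytes C2 A3 (UTF-8), or the ASCII entity `&pound;`. Running the same count over the browser's saved file shows whether the two really differ. This is a minimal sketch; PoundProbe, download and countPoundForms are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PoundProbe {

    /** Downloads the response body as raw bytes, with no character decoding. */
    static byte[] download(URL url, int timeout) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeout);
        conn.setReadTimeout(timeout);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Returns {lone 0xA3 count, UTF-8 C2 A3 count, "&pound;" count}. */
    static int[] countPoundForms(byte[] body) {
        int latin1 = 0, utf8 = 0;
        for (int i = 0; i < body.length; i++) {
            if ((body[i] & 0xFF) == 0xA3) {
                if (i > 0 && (body[i - 1] & 0xFF) == 0xC2) {
                    utf8++;   // C2 A3: UTF-8 encoding of U+00A3
                } else {
                    latin1++; // lone A3: ISO-8859-1 / windows-1252 encoding
                }
            }
        }
        // The entity is plain ASCII, so an ISO-8859-1 view of the bytes is safe to search.
        String text = new String(body, StandardCharsets.ISO_8859_1);
        int entity = 0;
        for (int i = text.indexOf("&pound;"); i >= 0; i = text.indexOf("&pound;", i + 1)) {
            entity++;
        }
        return new int[] {latin1, utf8, entity};
    }

    public static void main(String[] args) throws IOException {
        byte[] body = download(URI.create(args[0]).toURL(), 10000);
        int[] c = countPoundForms(body);
        System.out.println("lone 0xA3: " + c[0] + ", UTF-8 C2 A3: " + c[1] + ", &pound;: " + c[2]);
    }
}
```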

Thanks,
Nubs.
Andrew Thompson - 04 Oct 2006 16:29 GMT
...
> Perhaps someone knows the answer to this problem. I open a connection
> to a URL ...

What URL (specifically)?

> ...However, the actual text I get back from the URL is different from that
> saved out of a browser ...

What browser (make, version, OS - specifically)?

Is the saved text identical to the text shown when
you 'view source' in the 'a browser'?

Andrew T.
little_mm@ntlworld.com - 04 Oct 2006 16:33 GMT
Thanks for the response Andrew.

URL: http://www.net-a-porter.com/Shop/Shop/Shoes/All?pageNumber=0

Browser: Mozilla Firefox, but same effect in IE6, OS: Windows XP.

Yes, I think view source and save page are identical, although I
haven't checked byte-for-byte.

Nubs.

> ...
> > Perhaps someone knows the answer to this problem. I open a connection
[quoted text clipped - 11 lines]
>
> Andrew T.
Chris Uppal - 04 Oct 2006 17:18 GMT
> Perhaps someone knows the answer to this problem. I open a connection
> to a URL and read lines one at a time from the URL using a
> InputStreamReader and a BufferedReader:
[...]
> However, the actual text I get back from the URL is different from that
> saved out of a browser from the same URL. Particularly, the browser
> saves £ characters, whereas the lines read in Java are missing
> these characters altogether. Also, some of the characters have actually
> been deleted in the Java lines.

Maybe the website is using something like the Accept-Language: field in the
request to decide what currency (etc) to send back.  I don't know what the Java
HTTP client will send in that field by default, but it is unlikely to be
'en-GB' which is what my browser would send.

I just tried it myself, but -- most unfortunately -- the site has just stopped
responding.  I /do/ hope my little experiment didn't kill it...

   -- chris
little_mm@ntlworld.com - 04 Oct 2006 17:22 GMT
> > Perhaps someone knows the answer to this problem. I open a connection
> > to a URL and read lines one at a time from the URL using a
[quoted text clipped - 15 lines]
>
>     -- chris

Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way? For example, how do you change the Accept-Language
field?

Thanks,
Nubs.
Tor Iver Wilhelmsen - 04 Oct 2006 19:35 GMT
> Hi Chris - thanks for the response. So, question: how do you mimic the
> browser's HTTP requests precisely, so that a website generally behaves
> in the same way? For example, how do you change the Accept-Language
> field?

Look at URLConnection.setRequestProperty().
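A minimal sketch of that suggestion: set browser-style headers with setRequestProperty() before the connection is opened (calling it after connecting throws IllegalStateException). The header values below are placeholders, not what Firefox or IE actually sends; to mimic a particular browser closely, copy its real headers. BrowserLikeRequest, BROWSER_HEADERS and applyBrowserHeaders are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class BrowserLikeRequest {

    // Placeholder values only: replace with the headers your browser really sends.
    static final Map<String, String> BROWSER_HEADERS;
    static {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("User-Agent", "Mozilla/5.0 (placeholder)");
        h.put("Accept", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8");
        h.put("Accept-Language", "en-GB,en;q=0.8");
        BROWSER_HEADERS = Collections.unmodifiableMap(h);
    }

    /** Must run before connect()/getInputStream(). */
    static void applyBrowserHeaders(URLConnection conn) {
        for (Map.Entry<String, String> e : BROWSER_HEADERS.entrySet()) {
            conn.setRequestProperty(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = URI.create(args[0]).toURL().openConnection();
        applyBrowserHeaders(conn);
        // Reads back the value set above; the connection is not open yet.
        System.out.println("Accept-Language: " + conn.getRequestProperty("Accept-Language"));
        // getContentType() opens the connection and sends the request.
        System.out.println("Content-Type received: " + conn.getContentType());
    }
}
```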
little_mm@ntlworld.com - 04 Oct 2006 22:12 GMT
> > Hi Chris - thanks for the response. So, question: how do you mimic the
> > browser's HTTP requests precisely, so that a website generally behaves
> > in the same way? For example, how do you change the Accept-Language
> > field?
>
> Look at URLConnection.setRequestProperty().

OK, many thanks Iver.
Chris Uppal - 05 Oct 2006 11:39 GMT
> Hi Chris - thanks for the response. So, question: how do you mimic the
> browser's HTTP requests precisely, so that a website generally behaves
> in the same way?

I see that Tor has already answered.  I want to add that their server is back
up this morning, and I've just tried again (it stayed up this time!).  The bad
news is that changing the Accept-Language field to, say, "da" made no
difference -- it still sent back a page where the price of the first boot was
&pound; <some jaw-droppingly large number>.  So that was a red herring, I'm
afraid.

   -- chris
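Chris's test suggests the page carries the pound sign as the HTML entity `&pound;`, i.e. seven ASCII characters that a browser renders as £. If that holds, Java receives the entity text, not the character, and it has to be decoded explicitly. The sketch below is deliberately tiny: EntityDecoder and its table are hypothetical, it covers only a handful of named entities and no numeric references such as `&#163;`, and a proper HTML parser would be the more robust choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityDecoder {

    // Small illustrative table; real pages may use many more entities.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&pound;", "\u00A3");
        ENTITIES.put("&euro;", "\u20AC");
        ENTITIES.put("&nbsp;", "\u00A0");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        // Last, so that "&amp;pound;" decodes to the literal text "&pound;".
        ENTITIES.put("&amp;", "&");
    }

    /** Replaces the named entities in the table with their characters. */
    static String decode(String html) {
        String s = html;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(decode("Price: &pound;1,250"));
    }
}
```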


©2008 Advenet LLC
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.