Java Forum / General / April 2006

URL connections

Ben - 11 Apr 2006 15:56 GMT
I'm writing a web crawler. So far it parses correctly, connects and gets
the response code just fine. The problem is that every time I try to
access a .htm link it throws a FileNotFoundException.

Is the .htm extension not supported by the URL class?
If not, is there a class that will allow me to connect to a .htm link to
get the response code?

thanks for the help
Ben
Ben - 11 Apr 2006 16:11 GMT
> I'm writing a web crawler. So far it parses correctly, connects and gets
> the response code just fine. The problem is that every time I try to
[quoted text clipped - 6 lines]
> thanks for the help
> Ben

The same happens for the .php extension.
Chris Uppal - 11 Apr 2006 16:35 GMT
> Is the .htm extension not supported by the URL class?

The extension shouldn't make any difference at all to whether the URL can be
read.  You almost certainly have a different problem.  Possibly an URL-encoding
issue, or a relative URL issue.  Stick some tracing in (or use Ethereal) to
find out exactly what you are asking the server for.

   -- chris
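
A minimal sketch of the kind of tracing suggested above, assuming the crawler holds its links as strings. The class name UrlTrace and the method describe are made up for illustration; only the java.net.URL getters are real API. Printing each component separately makes a stray character in the path easier to spot than printing the whole URL.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of the original poster's crawler.
final class UrlTrace {
    // Breaks a URL string into the parts java.net.URL will use for the request.
    static String describe(String spec) {
        try {
            URL u = new URL(spec);
            return "protocol=" + u.getProtocol()
                    + " host=" + u.getHost()
                    + " port=" + u.getPort()      // -1 means "use the protocol default"
                    + " path=" + u.getPath()
                    + " query=" + u.getQuery();   // null when there is no query string
        } catch (MalformedURLException e) {
            return "malformed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The two URLs quoted in this thread; no network access is needed.
        System.out.println(describe("http://www.studentprograms.vt.edu"));
        System.out.println(describe("http://www.studentprograms.vt.edu/index.php"));
    }
}
```

This only shows what the URL object contains. To see what actually goes over the wire, a packet capture as suggested above is still the more reliable check.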
Ben - 11 Apr 2006 17:02 GMT
>>Is the .htm extension not supported by the URL class?
>
[quoted text clipped - 4 lines]
>
>     -- chris

Thanks, I'm actually already tracking that one down...
but without success thus far
VisionSet - 11 Apr 2006 17:29 GMT
> I'm writing a web crawler. So far it parses correctly, connects and gets
> the response code just fine. The problem is that every time I try to
[quoted text clipped - 3 lines]
> if not is there a class that will allow me to connect to a .htm link to
> get the response code?

You aren't dealing with files surely?
So the exception sounds reasonable.

--
Mike W
Ben - 11 Apr 2006 17:40 GMT
>>I'm writing a web crawler. So far it parses correctly, connects and gets
>>the response code just fine. The problem is that every time I try to
[quoted text clipped - 9 lines]
> --
> Mike W

Not exactly sure what you mean, I need to open the source code from
every page I access and parse it to find the links, then check the
links, find out if I need to parse more pages, and so on till it's done.

So in that way I am dealing with files, what I don't understand is why
I'm getting the exception. My URL is correct, it works in a browser but
when I try to get the response code I get a 404 with a FileNotFound
exception.

Ben
VisionSet - 11 Apr 2006 17:53 GMT
> Not exactly sure what you mean, I need to open the source code from
> every page I access and parse it to find the links, then check the
[quoted text clipped - 4 lines]
> when I try to get the response code I get a 404 with a FileNotFound
> exception.

Unless I misunderstand, you are programmatically accessing a page available
on the net and parsing the contents over a URLConnection.  Unless you save
the page locally I don't see any files involved from your side of the wire.

--
Mike W
Ben - 11 Apr 2006 18:07 GMT
>>Not exactly sure what you mean, I need to open the source code from
>>every page I access and parse it to find the links, then check the
[quoted text clipped - 11 lines]
> --
> Mike W

Correct, I don't save the pages locally, but here is a weird instance of
my error:

HttpURLConnection con = (HttpURLConnection)desti.openConnection();
int response = con.getResponseCode();
checking.setStatusCode( response );

using that last piece of code, if my desti URL is created like this:

desti = new URL("http://www.studentprograms.vt.edu");

it works fine, it connects, parses and everything, but if I create the
same URL (in reality) like this:

desti = new URL("http://www.studentprograms.vt.edu/index.php");

it will give me the FileNotFound exception, but those two URLs are equal,
they point to the same file...

Any pointers?

thanks Ben
VisionSet - 11 Apr 2006 18:56 GMT
...
> but if I create the
> same URL (in reality) like this:
[quoted text clipped - 5 lines]
>
> Any pointers?

Stop referring to them as files. They are not; they are resources from your and
your code's perspective.

The URL constructor does not throw a FileNotFoundException so you need to
brush up your stack trace cause location skills!

What line is it stating as the cause of the FNF exception?  There lies your
problem.

--
Mike W
Ben - 11 Apr 2006 19:08 GMT
> ...
>
[quoted text clipped - 19 lines]
> --
> Mike W

It's thrown on the line:

InputStream in = page.openStream();

where page is a URL object
Ben
Chris Uppal - 11 Apr 2006 19:40 GMT
> using that last piece of code, if my desti URL is created like this:
>
[quoted text clipped - 6 lines]
>
> it [404s]

Are those two URLs /exactly/ the ones that fail for you ?  If so then there's
something odd going on, probably with your setup somewhere.  I can fetch both
of them, using a program written at the pure TCP level (no HTTP libraries to
confuse the issue), with no problems at all.

Assuming that the URLs you quoted are not in fact exactly the problematic ones,
then I suggest that you fire up Ethereal (you did mention that you had that, I
think) and then, using both your simple test program[*] and the browser of your
choice, attempt to fetch both.  That will give you a precise record of exactly
what each HTTP client was asking for in each case (and also of any associated
gumph like acceptable encodings, languages, and whatnot).  That will give you
the best pointer you are ever going to get into what's going wrong.

Come to that, do the same thing even if the above URLs /are/ exactly the
problematic ones.

   -- chris

([*] you do /have/ a simple test program, I assume....)
Ben - 11 Apr 2006 19:59 GMT
>>using that last piece of code, if my desti URL is created like this:
>>
[quoted text clipped - 26 lines]
>
> ([*] you do /have/ a simple test program, I assume....)

the ones that fail are the ones that have an .(extension).

The two URLs I posted earlier are some of the ones I'm testing on;
the first one works but not the second. Yet they are
essentially the same.

Ben
Arvind - 11 Apr 2006 23:51 GMT
Umm it is really interesting that you can't get the inputstream at all
- Can you get the stack trace here... ?

--
Arvind
Chris Uppal - 12 Apr 2006 10:06 GMT
> the ones that fail are the ones that have an .(extension).

OK, let's get this straight.

Have you got a small test program which exhibits the problem ?  (it should not
take more than a couple of minutes to write and shouldn't be more than a
handful of lines long). If not then you are just wasting time (ours as well as
your own).

Have you used Ethereal (or similar) to watch what happens when you try to
download those URLs ?  If not then, again, you are just wasting time.

   -- chris
Ben - 12 Apr 2006 13:11 GMT
>>the ones that fail are the ones that have an .(extension).
>
[quoted text clipped - 9 lines]
>
>     -- chris

Thanks for the help, I managed to track down the error.
But yes I had test code, I use the TDD method to develop code, so I
always have test code, even before I have actual code.

Somewhere deep in my program I was adding a '/' at the end of the URL so I
could have an easier time constructing absolute URLs from relative URLs. I
just needed to change that, and construct my URLs using the
new URL(String protocol, String host, String file) constructor instead
of the new URL(String spec) one.

I guess that little bit of code made the URL invalid, but when I
invoked the toString method to check it, it looked just fine.

Thanks for the help again,
Ben
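
To illustrate why the trailing '/' matters, here is a small sketch that resolves a relative link against a page URL without appending anything to it. Note that it swaps in java.net.URI.resolve rather than the three-argument URL constructor mentioned above; LinkResolver and the link name about.htm are made up for the example.

```java
import java.net.URI;

// Hypothetical helper, not the original poster's fix.
final class LinkResolver {
    // Resolves a link found on pageUrl; absolute links are returned unchanged.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        // Page URL as written: the last path segment is replaced by the link.
        System.out.println(resolve("http://www.studentprograms.vt.edu/index.php", "about.htm"));
        // Same page with a '/' appended: index.php is now treated as a directory.
        System.out.println(resolve("http://www.studentprograms.vt.edu/index.php/", "about.htm"));
    }
}
```

The first call gives a link next to index.php, while the second gives one underneath it, which is a different resource as far as the server is concerned. One caveat: URI.resolve is known to behave oddly when the base URL has an empty path (no '/' after the host), so that case would need separate handling.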


