Java Forum / General / January 2006
Understanding HTTP
Roedy Green - 12 Jan 2006 10:34 GMT
I am seeing a great many errors of this form in my HTTP server error log:
[Thu Jan 12 00:57:41 2006] [error] [client 205.214.208.5] unable to access file "../jgloss/rs232c.html" in parsed file net:/com/mindprod/www/bgloss/cables.html, referer: http://www.google.com/search?q=cabling+computer+case&hl=en&lr=&start=10&sa=N
It is as if URLs beginning with ../ give people trouble. They all seem to work fine for me using Opera, Firefox, Mozilla, Netscape and IE.
Is there something not quite kosher about a URL beginning ../?
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Andrea Desole - 12 Jan 2006 13:40 GMT
> I am seeing a great many errors of this form in my http server error
> log:
[quoted text clipped - 8 lines]
>
> Is there something not quite kosher about a URL beginning ../?
I don't think so. And I don't seem to have problems with my Firefox either. But, looking at the error message, doesn't it look more like the server is trying to access it, and not the client?
Roedy Green - 12 Jan 2006 17:27 GMT
On Thu, 12 Jan 2006 14:40:27 +0100, Andrea Desole <news@desole.demon.NOSPAMPLEASE.nl> wrote, quoted or indirectly quoted someone who said :
>I don't think so. And I don't seem to have problems with my Firefox either.
>But, looking at the error message, doesn't it look more like the server
>is trying to access it, and not the client?
Perhaps under some circumstance the server loses track of the absolute position, so that relative URLs don't work.
The server must somehow track that for each user? How would a browser resolve it without a knowledge of the file structure? Yet how could a server keep track of every user's current position indefinitely?
Andrea Desole - 13 Jan 2006 09:29 GMT
> Perhaps under some circumstance the server loses track of the absolute
> position, so that relative URLs don't work.
>
> The server must somehow track that for each user? How would a browser
> resolve it without a knowledge of the file structure? Yet how could a
> server keep track of every user's current position indefinitely?
The server doesn't have to. It's up to the client to build the URL from the information the server gives, and then request the file at that URL. As far as I remember, it's perfectly legal for the server to give a relative URL. So when the client loads the page http://www.mindprod.com/bgloss/cables.html, and that page contains a link to ../jgloss/rs232c.html, the client itself should know, when the link is clicked, that it has to request http://www.mindprod.com/jgloss/rs232c.html. The client knows this because it builds the URL relative to the location of the page it has just loaded. The server, for its part, doesn't care anymore once cables.html has been delivered.
This is one of the reasons why I would say that the problem is more likely on the server. The value of the referer makes me think so too. I have the impression that what happened in that case is the following:
- someone went on Google and found a link to your page
- they requested the page from your server
- for an unknown reason, the server seems to parse the page and try to access the page pointed to by the link
So the message in the log was probably written *before* the client actually got any page from your server.
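For readers who want to see the client-side resolution described above in code, here is a minimal sketch using the JDK's java.net.URI. The class and method names (RelativeLinkDemo, resolveLink) are made up for this illustration; the two URLs are the ones from the thread.

```java
import java.net.URI;

// Hypothetical helper: mimics what a browser does when it meets a relative href.
class RelativeLinkDemo {

    /** Resolves an href found in a page against the URL the page was fetched from. */
    static URI resolveLink(String pageUrl, String href) {
        // URI.resolve merges the reference with the base path and removes "../" segments.
        return URI.create(pageUrl).resolve(href);
    }

    public static void main(String[] args) {
        URI target = resolveLink("http://mindprod.com/bgloss/cables.html",
                                 "../jgloss/rs232c.html");
        // prints http://mindprod.com/jgloss/rs232c.html
        System.out.println(target);
    }
}
```

The server never sees the "../" in this flow: the client only asks for the already-resolved location.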
Roedy Green - 13 Jan 2006 10:29 GMT
On Fri, 13 Jan 2006 10:29:41 +0100, Andrea Desole <news@desole.demon.NOSPAMPLEASE.nl> wrote, quoted or indirectly quoted someone who said :
>This is one of the reasons why I would say that the problem is more
>likely on the server.
The ISP of course blames it on a broken browser that is not building proper URLs. It might not be a real browser, but somebody's home-grown spidering program or the like.
I will pass your comment on.
Andrea Desole - 13 Jan 2006 10:39 GMT
> ISP of course blames it on a broken browser that is not building
> proper URLs. It might not be a real browser, but somebody's home
> grown spidering program or the like.
That's not impossible, but still not very likely, I guess. I think you can find out by logging the browser type. And, in case someone is trying to be identified as another browser, I would look at the IP addresses. I don't think there are many people using spiders, so you shouldn't have many different addresses in the log with that error. You can also test it from a machine with a known address, so that you can check the log to see whether you get the error.
> I will pass your comment on.
I don't claim any responsibility :-)
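One way to act on the suggestion of checking the IP addresses is to tally the client addresses that appear in these error lines. The following is only a sketch: the class name ErrorLogTally and the regular expression are assumptions based on the single log line quoted at the top of the thread, and the real log may contain other variants.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log tally; the pattern is inferred from the one line quoted in this thread.
class ErrorLogTally {

    // Group 1: client address. Group 2: the file the server could not access.
    private static final Pattern LINE = Pattern.compile(
            "\\[client ([^\\]]+)\\] unable to access file \"([^\"]+)\"");

    /** Counts how many matching error lines each client address produced. */
    static Map<String, Integer> countByClient(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "[Thu Jan 12 00:57:41 2006] [error] [client 205.214.208.5] "
                + "unable to access file \"../jgloss/rs232c.html\" in parsed file "
                + "net:/com/mindprod/www/bgloss/cables.html, referer: "
                + "http://www.google.com/search?q=cabling+computer+case&hl=en&lr=&start=10&sa=N";
        System.out.println(countByClient(Arrays.asList(line)));
    }
}
```

Many distinct addresses with the same error would point away from a single home-grown spider and towards the server itself.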
Oliver Wong - 13 Jan 2006 21:26 GMT
> On Thu, 12 Jan 2006 14:40:27 +0100, Andrea Desole
> <news@desole.demon.NOSPAMPLEASE.nl> wrote, quoted or indirectly quoted
[quoted text clipped - 11 lines]
> resolve it without a knowledge of the file structure? Yet how could a
> server keep track of every user's current position indefinitely?
When an HTTP client connects to an HTTP server, one of the operations the client may perform is a GET operation. This is the operation typically used to retrieve a web page. The GET operation takes one parameter, which is some identifier for the resource to get. There is no concept of "absolute" or "relative" path in this case, and the identifier does not even need to reflect a path in any sense (though in most implementations, the identifier is a path to the desired HTML file).
A web browser does not only act as an HTTP client, but also as an HTML parser and renderer. HTML does define a concept of a relative URL. It is up to the browser to note where the HTML file was retrieved from, and from there, to translate the relative URL into an "absolute" path that is valid for a GET operation.
It is not uncommon for the webserver to do some semi-complex manipulation on the argument to GET to determine what file to retrieve. For example, the server might respond to a request like "GET ~owong/foo.html" by loading the file located at "/usr/home/owong/wwwroot/foo.html".
Similarly, it might respond to a request like "GET me_a_random_number" by randomly generating a number, and returning that number, and not actually access the file system at all.
- Oliver
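To make the GET operation concrete, here is a rough sketch of the text a client would send once it has an absolute URL in hand. In HTTP/1.1 a request sent directly to the origin server normally carries the path (plus any query string) on the request line and the host name in a separate Host header. The class and method names are invented for this example, and it assumes the default port and no proxy.

```java
import java.net.URI;

// Hypothetical sketch: builds the request text for an already-resolved absolute URL.
class GetRequestSketch {

    /** Returns the HTTP/1.1 request a client could send for the given absolute URL. */
    static String buildGet(URI absolute) {
        String path = absolute.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // an empty path is requested as "/"
        }
        String query = absolute.getRawQuery();
        String target = (query == null) ? path : path + "?" + query;
        // Assumption: default port, so the Host header carries only the host name.
        return "GET " + target + " HTTP/1.1\r\n"
                + "Host: " + absolute.getHost() + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(buildGet(URI.create("http://mindprod.com/jgloss/rs232c.html")));
    }
}
```

Nothing in that request says where the link came from apart from an optional Referer header, which is why the server has no need to track each user's "current position".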
Chris Uppal - 12 Jan 2006 13:57 GMT
> Is there something not quite kosher about a URL beginning ../?
More than "not quite kosher", any client that asks a server for a non-absolute URL is completely, totally, broken. An HTTP GET (etc) must specify the full, absolute, URL, or the request is (a) illegal, (b) meaningless.
OTOH, it's not obvious from your log example that the client is actually asking for a relative URL. The wording makes it sound more like the server /itself/ is trying, and failing, to resolve a relative URL in your cables.html page. I have no idea why it might be doing that.
-- chris
Roedy Green - 12 Jan 2006 17:31 GMT
On 12 Jan 2006 13:57:31 GMT, "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> wrote, quoted or indirectly quoted someone who said :
>OTOH, it's not obvious from your log example that the client is
>actually asking for a relative URL. The wording makes it sound more
>like the server /itself/ is trying, and failing, to resolve a relative
>URL in your cables.html page. I have no idea why it might be doing
>that.
My web pages are full of relative URLs. That is how I manage to use the same HTML to navigate a local copy or a web copy.
Perhaps what is happening is the browser is supposed to resolve the ../ to an absolute URL, but some browsers (or programs sniffing my website) fail to.
In that case, I need not sweat it. The user has faulty software.
Dave Glasser - 12 Jan 2006 14:20 GMT
Roedy Green <my_email_is_posted_on_my_website@munged.invalid> wrote on Thu, 12 Jan 2006 10:34:52 GMT in comp.lang.java.programmer:
>I am seeing a great many errors of this form in my http server error
>log:
[quoted text clipped - 6 lines]
>It is as if URLS beginning with ../ give people trouble. They all seem
>to work fine for me using Opera, Firefox, Mozilla, Netscape and IE.
That error is occurring on your server, not on the client. It looks like the webserver is checking the validity of local links in the page as it serves the page. If you look at that page, you'll see next to that link "[an error occurred while processing this directive]". It looks like it's getting confused because you have the actual directory "bgloss" mapped to the virtual directory "jgloss." The server is looking for /com/mindprod/www/jgloss/rs232c.html on your disk and not finding it. It looks like a server bug, since the server presumably knows about the mapping.
The fix would be to make the link read href="rs232c.html".
 Signature Check out QueryForm, a free, open source, Java/Swing-based front end for relational databases.
http://qform.sourceforge.net
If you're a musician, check out RPitch Relative Pitch Ear Training Software.
http://rpitch.sourceforge.net
Roedy Green - 12 Jan 2006 17:39 GMT
>That error is occurring on your server, not on the client. It looks
>like the webserver is checking the validity of local links in the page
[quoted text clipped - 7 lines]
>
>The fix would be to make the link read href="rs232c.html".
Hmm. In the URL http://mindprod.com/bgloss/cables.html there is href="../jgloss/rs232c.html", which references the file http://mindprod.com/jgloss/rs232c.html
The first URL is mapped to net:/com/mindprod/www/bgloss/cables.html, the second to net:/com/mindprod/www/jgloss/rs232c.html
There is not supposed to be any direct mapping between the files.
I can't see why the server would even be interested in rs232c.html unless the user clicked the link.
It looks like a Google search. Perhaps Google accelerator is chasing links and not resolving them properly.
mgungora - 13 Jan 2006 15:05 GMT
Just curious... If I browse straight to "http://mindprod.com/bgloss/cables.html" (not from Google), I see "[an error occurred while processing this directive]" beside the "More Than You Wanted To Know About RS232C" link. What is that? Are you building the page dynamically?
Roedy Green - 13 Jan 2006 20:55 GMT
>Just curious... If I browse straight to
>"http://mindprod.com/bgloss/cables.html" (not from google), I see "[an
>error occurred while processing this directive]." besides the "More
>Than You Wanted To Know About RS232C" link. What is that? Are you
>building the page dynamically?
I think I understand what is going on now: a limitation in Novell SSI.
The HTML I post on the website looks like this:
<span class="essay">See the essay <a href="../jgloss/rs232c.html">More Than You Wanted To Know About RS232C</a><span class="updated"> <!--#FLASTMOD FILE="../jgloss/rs232c.html"--></span>.</span>
The #FLASTMOD is a Novell SSI command to insert the date of the file, so that there is an indication of how old the essay is. To process that dynamically, SSI has to look up that file. So the Novell server is chasing ../ links, and falling on its nose.
The link itself is fine. It is just the decoration of the date that is screwing up.
My solution will be to generate these dates with static macros instead. So the code I type will look like
<!-- macro Updated ../jgloss/rs232c.html -->
and, before the web page is uploaded, it will generate
<span class="updated">2006-01-13</span>
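A static macro pass of the kind described here could look roughly like the sketch below. This is not Roedy's actual macro tool: the class name UpdatedMacro, the regular expression, and the lookup function passed in as lastModified are all assumptions made for illustration. In a real build step the lookup would presumably read the file's modification time from disk.

```java
import java.time.LocalDate;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-upload pass: replaces each "Updated" macro comment with a dated span.
class UpdatedMacro {

    // Matches e.g. <!-- macro Updated ../jgloss/rs232c.html --> and captures the path.
    private static final Pattern MACRO =
            Pattern.compile("<!--\\s*macro\\s+Updated\\s+(\\S+)\\s*-->");

    /** Expands every macro in html, asking lastModified for the date of each referenced file. */
    static String expand(String html, Function<String, LocalDate> lastModified) {
        Matcher m = MACRO.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // LocalDate.toString() yields the ISO form, e.g. 2006-01-13.
            String span = "<span class=\"updated\">" + lastModified.apply(m.group(1)) + "</span>";
            m.appendReplacement(out, Matcher.quoteReplacement(span));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "<!-- macro Updated ../jgloss/rs232c.html -->";
        // The fixed date stands in for a real file-timestamp lookup.
        System.out.println(expand(source, path -> LocalDate.of(2006, 1, 13)));
    }
}
```

Because the date is baked in before upload, the server no longer has to resolve ../ paths at request time, at the cost of regenerating the page whenever the referenced file changes.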
Luc The Perverse - 14 Jan 2006 00:38 GMT
>>Just curious... If I browse straight to
>>"http://mindprod.com/bgloss/cables.html" (not from google), I see "[an
[quoted text clipped - 24 lines]
> and, before webpage is uploaded, it will generate
> <span class="updated">2006-01-13</span>
One bad thing about running a "unique" OS is that no one can help you when you run into an obscure problem.
-- LTP
:)