
Java Forum / General / November 2007


finding subdirectories from starting URL

Alan - 12 Nov 2007 00:20 GMT
I want to find subdirectories from a starting URL.  For example,
if I start with http://www.someplace.net, I want to be able to find
the subdirectories there, e.g.:

http://www.someplace.net/documentation/
http://www.someplace.net/about/
http://www.someplace.net/images/

   Are there Java methods that facilitate this?

Thanks, Alan
Daniel Pitts - 12 Nov 2007 00:32 GMT
>      I want to find subdirectories from a starting URL.  For example,
> if I start with http://www.someplace.net, I want to be able to find
[quoted text clipped - 7 lines]
>
> Thanks, Alan

There is no easy way to do that unless someplace.net gives you a
listing page.  Generally, you need either direct access to the disk or
access to an FTP account on that machine; otherwise you have to crawl
the web site and parse out the URLs.

Note, this is not a limitation of Java, but simply a result of the way
HTTP works.

There are plenty of web-crawling libraries/programs out there; I suggest
you Google for them.
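
The crawl-and-parse step described above could look roughly like the sketch below. It assumes the page HTML has already been fetched into a String, and it uses a simple regular expression rather than a real HTML parser, so it will miss unusual markup. The class and method names (SameSiteLinks, sameHostLinks) are made up for illustration.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SameSiteLinks {
    // Matches href="..." or href='...' inside an <a> tag (case-insensitive).
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /**
     * Resolves every href in the HTML against baseUrl and keeps only
     * links on the same host. baseUrl should include a path, e.g. end
     * with "/", because URI.resolve misbehaves on an empty base path.
     */
    public static Set<String> sameHostLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI resolved;
            try {
                resolved = base.resolve(m.group(1).trim());
            } catch (IllegalArgumentException e) {
                continue; // skip hrefs that are not valid URIs
            }
            String host = resolved.getHost();
            if (host != null && host.equalsIgnoreCase(base.getHost())) {
                links.add(resolved.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"about/\">About</a>"
            + "<a href=\"/images/\">Images</a>"
            + "<a href=\"http://other.example/x\">Elsewhere</a>";
        for (String link : sameHostLinks("http://www.someplace.net/", html)) {
            System.out.println(link);
        }
    }
}
```

A crawler would repeat this for each page it fetches, keeping a set of URLs already visited. Note that this only finds URLs the site actually links to; it cannot discover unlinked directories.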

Good luck,
Daniel

Signature

Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Sherman Pendley - 12 Nov 2007 04:40 GMT
>>      I want to find subdirectories from a starting URL.  For example,
>> if I start with http://www.someplace.net, I want to be able to find
[quoted text clipped - 15 lines]
> Note, this is not a limitation of Java, but simply a result of the way
> http works.

In particular, note that documentation, about, and images may not even be
directories at all. A content-management system could use them as keys
into a database of managed documents, for example, or as category keywords
that are used to dynamically assemble a list of documents in the specified
category.

sherm--

Signature

WV News, Blogging, and Discussion: http://wv-www.com
Cocoa programming in Perl: http://camelbones.sourceforge.net

Andrew Thompson - 12 Nov 2007 03:11 GMT
>I want to find subdirectories from a starting URL.  For example,
>if I start with http://www.someplace.net, I want to be able to find
>the subdirectories there, ..

Why?  What business is that of yours?
What does this ability offer to the end user?

Signature

Andrew Thompson
http://www.athompson.info/andrew/

Roedy Green - 12 Nov 2007 03:15 GMT
>http://www.someplace.net/documentation/
>http://www.someplace.net/about/
>http://www.someplace.net/images/
>
>    Are there Java methods that facilitate this?

In general, no. It is considered confidential information.  Sometimes a
server will give you a directory listing in HTML if you give it a URL
of a directory without an index.html file in it.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Lew - 12 Nov 2007 04:45 GMT
>> http://www.someplace.net/documentation/
>> http://www.someplace.net/about/
[quoted text clipped - 5 lines]
> server will give you a directory listing in HTML if you give it a URL
> of a directory without an index.html file in it.

In addition, many directories on the server hard drive, while they are
subdirectories of the web document directory, will not be accessible to public
clients.  A classic example is the WEB-INF/ directory tree in Java EE apps,
but Apache .htaccess can also restrict directories.

Signature

Lew

Andrew Thompson - 12 Nov 2007 06:11 GMT
>>http://www.someplace.net/documentation/
...
>>    Are there Java methods that facilitate this?
>
>In general, no. It is considered confidential information.  Sometimes a
>server will give you a directory listing in HTML if you give it a URL
>of a directory without an index.html file in it.

An ironic side note is that I'd like my server to *allow*
automatic 'directory listing' for dirs with no index.html,
but cannot figure out how to achieve it using the arcane
..control panel ..thingy the host offers.

If directory indexing *is* turned on, it is a perhaps tedious
but mundane task to parse the resulting HTML, looking for
links to sub-dirs and resources.
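
A minimal sketch of that parsing step, assuming the listing HTML is already in a String and that the server marks subdirectories with relative hrefs ending in "/" (common for auto-generated listings, but not guaranteed). The names ListingLinks and subdirectoryLinks are hypothetical, and a regular expression stands in for a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingLinks {
    // Matches href="..." or href='...' inside an <a> tag (case-insensitive).
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns relative hrefs that end in '/' and do not point at the parent. */
    public static List<String> subdirectoryLinks(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            // Assumption: entries of the listed directory are relative links.
            boolean relative = !href.contains("://") && !href.startsWith("/");
            if (relative && href.endsWith("/") && !href.startsWith("..")) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"../\">Parent Directory</a>"
            + "<a href=\"documentation/\">documentation/</a>"
            + "<a href=\"about/\">about/</a>"
            + "<a href=\"readme.txt\">readme.txt</a>";
        System.out.println(subdirectoryLinks(html)); // prints [documentation/, about/]
    }
}
```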

Signature

Andrew Thompson
http://www.athompson.info/andrew/

Chris ( Val ) - 12 Nov 2007 06:33 GMT
> >>http://www.someplace.net/documentation/
> ..
[quoted text clipped - 8 lines]
> but cannot figure out how to achieve it using the arcane
> ..control panel ..thingy the host offers.

[snip]

If it is an IIS web server, you will usually find a checkbox
in the IIS server properties, or, from memory, you can even get
to it by right-clicking on the virtual directory and editing
the properties.

Tomcat for example has the following setting in its web.xml file:

   <init-param>
       <param-name>listings</param-name>
       <param-value>true</param-value>
   </init-param>

But I think it's only good whilst developing.

--
Chris
Alan - 12 Nov 2007 14:37 GMT
Thanks for the information.  I think I shall just follow href
links instead of finding directories.

Thanks, Alan
Andrew Thompson - 12 Nov 2007 16:02 GMT
>Thanks for the information.  I think I shall just follow href
>links instead of finding directories.

In that case, as Daniel mentioned, search "Web crawler"/
"web crawling".  There have been some interesting discussions
about crawlers in these groups, across the ages.  As I vaguely
recall there was a source posted for one by Mr Omar Khan
..ahh yes.
<http://groups.google.com/group/comp.lang.java.programmer/msg/df4a6f43d57e3e6a>

But please (please, please) respect the directions of the
site's robots.txt (if it has one).
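
As a rough illustration of honoring robots.txt, the sketch below collects the Disallow prefixes that apply to all crawlers ("User-agent: *") and checks a path against them. It assumes the robots.txt body has already been downloaded, and it deliberately ignores Allow lines, wildcards, and crawler-specific groups, so treat it as a starting point only. RobotsCheck and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {
    /** Collects Disallow prefixes from the "User-agent: *" groups. */
    public static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean appliesToAll = false;
        boolean lastWasAgent = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#'); // strip comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (field.equalsIgnoreCase("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                if (!lastWasAgent) appliesToAll = false;
                if (value.equals("*")) appliesToAll = true;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                if (field.equalsIgnoreCase("disallow") && appliesToAll
                        && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    /** True if the path does not start with any disallowed prefix. */
    public static boolean isAllowed(String path, List<String> disallowed) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        List<String> rules = disallowedPrefixes(robots);
        System.out.println(isAllowed("/about/", rules));
        System.out.println(isAllowed("/private/notes.html", rules));
    }
}
```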

Signature

Andrew Thompson
http://www.athompson.info/andrew/





