
Java Forum / General / July 2006


Getting all pages in a website

kkrish - 18 Jul 2006 07:33 GMT
hi all,
       Is it possible to read all the HTML pages in a website, given
the website address? If so, how do I get the pages? Should I search for
all the "href" links, and will that be sufficient? I am new to Java and JSP.
Thanks in advance.
Krishna.V.J
Ingo R. Homann - 18 Jul 2006 08:25 GMT
Hi,

> hi all,
>         Is it possible to read all the HTML pages in a website, given
> the website address? If so, how do I get the pages? Should I search for
> all the "href" links, and will that be sufficient? I am new to Java and JSP.
> Thanks in advance.
> Krishna.V.J

There are several versions of a program called "wget" that do
exactly that. (Note that this has nothing to do with Java...)

Ciao,
Ingo
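If you want to stay in Java anyway, one option is to drive wget from a Java program. A minimal sketch, assuming GNU wget is installed and on the PATH; the class name WgetMirror and the example URL are hypothetical. `--mirror` turns on recursive download with timestamping, `--no-parent` keeps wget from climbing above the starting directory, and `--convert-links` rewrites links in the saved copies so they can be browsed offline.

```java
import java.io.IOException;
import java.util.List;

public class WgetMirror {
    // Builds the wget command line that mirrors siteUrl into targetDir.
    static List<String> buildCommand(String siteUrl, String targetDir) {
        return List.of("wget",
                "--mirror",          // recursive, infinite depth, timestamping
                "--no-parent",       // never ascend above the starting directory
                "--convert-links",   // make saved pages link to the local copies
                "--directory-prefix=" + targetDir,
                siteUrl);
    }

    // Runs wget, waits for it to finish and returns its exit code (0 means success).
    static int mirror(String siteUrl, String targetDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(siteUrl, targetDir))
                .inheritIO()         // show wget's progress output in this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical placeholder address.
        int exit = mirror("http://www.example.com/", "mirror");
        System.out.println("wget exited with code " + exit);
    }
}
```

Note that wget is still bound by the same limits as any crawler: it only finds pages that are linked from pages it has already downloaded.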
kkrish - 18 Jul 2006 11:15 GMT
Hi,
   Thanks. Is it impossible to capture website pages in Java? If it is
possible, how should I proceed?
Philipp Leitner - 18 Jul 2006 12:19 GMT
> Hi,
>     Thanks. Is it impossible to capture website pages in Java? If it is
> possible, how should I proceed?

Of course it is not /impossible/, but I don't know of any standard
library that does it (which means that you would have to implement
the functionality yourself). That may be a little annoying, depending
on your project size, but it is not a huge problem, I guess.
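The first building block of a do-it-yourself version is downloading a single page. A minimal sketch, assuming Java 11 or later for `java.net.http.HttpClient`; the class name PageFetcher and the example URL are hypothetical. It follows ordinary HTTP 3xx redirects, but not redirects done in JavaScript.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class PageFetcher {
    // One shared client; Redirect.NORMAL follows HTTP redirects except https -> http.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Downloads one page and returns its body as text; a non-2xx status becomes an IOException.
    static String fetch(URI page) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(page)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        int status = response.statusCode();
        if (status < 200 || status >= 300) {
            throw new IOException("HTTP " + status + " for " + page);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical placeholder address; pass a real one as the first argument.
        URI page = URI.create(args.length > 0 ? args[0] : "http://www.example.com/");
        String html = fetch(page);
        System.out.println("Downloaded " + html.length() + " characters from " + page);
    }
}
```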

Just one side note: searching for '<a href' will generally not be enough;
there are also plenty of other redirects out there (JavaScript, for
example).

/philipp
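To illustrate the '<a href' approach and its limits, here is a rough link extractor. It is a sketch, assuming a regular expression is good enough for the pages in question (a real HTML parser is more robust); the class name LinkExtractor is hypothetical. It resolves relative links against the page's own address with `java.net.URI.resolve`, drops `#fragments`, and keeps only http/https links, so `mailto:` and `javascript:` links are skipped. Links that JavaScript builds at run time remain invisible to it, which is exactly the problem described above.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Matches href="..." or href='...' on any tag (a, area, link, ...), case-insensitively.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the distinct absolute http/https links in html, in order of first appearance.
    static Set<URI> extractLinks(URI base, String html) {
        Set<URI> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI resolved;
            try {
                resolved = base.resolve(m.group(2).trim()); // relative -> absolute
            } catch (IllegalArgumentException e) {
                continue; // not a valid URI, e.g. contains spaces
            }
            String scheme = resolved.getScheme();
            if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
                continue; // mailto:, javascript:, ftp:, ...
            }
            // Drop the #fragment: it points into a page, not to a different page.
            String s = resolved.toString();
            int hash = s.indexOf('#');
            links.add(URI.create(hash >= 0 ? s.substring(0, hash) : s));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"about.html\">About</a> <A HREF='/contact#top'>Contact</A>"
                + " <a href=\"mailto:info@example.com\">Mail</a>";
        extractLinks(URI.create("http://www.example.com/docs/index.html"), html)
                .forEach(System.out::println);
    }
}
```

Running `main` prints `http://www.example.com/docs/about.html` and `http://www.example.com/contact`; the `mailto:` link is dropped.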
Oliver Wong - 18 Jul 2006 17:13 GMT
> Just one side note: searching for '<a href' will generally not be enough;
> there are also plenty of other redirects out there (JavaScript, for
> example).

   There may also be "secret" pages that aren't linked from anywhere else.
There may be password-protected web pages. There may be dynamically generated
web pages that depend on the IP address of the request (for example, if
the request comes from 127.0.0.1, the "super-administrator" page is shown;
otherwise the "normal user" page is shown). With dynamically generated web
pages, there could be infinitely many pages. So in general, no, it's not
possible to get all pages in a website.

   - Oliver
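Given these limits, a crawler can only aim for "all pages reachable by links, up to some bound". A minimal breadth-first sketch under that assumption: it follows only links on the starting host, never visits the same URI twice, and stops after `maxPages` pages, so an endlessly generated site cannot keep it running forever. The class BoundedCrawler and the in-memory site are hypothetical; the page fetcher is passed in as a function so the sketch runs without a network (a real one could wrap an HTTP client). Pages the fetcher cannot return, such as password-protected ones, are simply skipped.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedCrawler {
    // Rough href="..." / href='...' matcher; JavaScript-generated links are not seen.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Breadth-first crawl from start. Follows only links on start's host, visits each
    // URI at most once, and stops after maxPages successfully fetched pages.
    // fetcher returns a page's HTML, or Optional.empty() if it cannot be retrieved.
    static List<URI> crawl(URI start, int maxPages, Function<URI, Optional<String>> fetcher) {
        List<URI> fetched = new ArrayList<>();
        Set<URI> seen = new HashSet<>();
        Queue<URI> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty() && fetched.size() < maxPages) {
            URI page = queue.remove();
            Optional<String> html = fetcher.apply(page);
            if (html.isEmpty()) {
                continue; // missing or password-protected: skip it
            }
            fetched.add(page);
            Matcher m = HREF.matcher(html.get());
            while (m.find()) {
                URI link;
                try {
                    // Resolve relative links and drop any #fragment.
                    String s = page.resolve(m.group(2).trim()).toString();
                    int hash = s.indexOf('#');
                    link = URI.create(hash >= 0 ? s.substring(0, hash) : s);
                } catch (IllegalArgumentException e) {
                    continue; // not a valid URI
                }
                // Same host only; this also excludes mailto: and javascript: links.
                if (start.getHost() != null && start.getHost().equalsIgnoreCase(link.getHost())
                        && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory site so the sketch runs without network access.
        Map<URI, String> site = Map.of(
                URI.create("http://www.example.com/index.html"),
                "<a href=\"a.html\">A</a> <a href=\"http://elsewhere.example.org/\">off-site</a>",
                URI.create("http://www.example.com/a.html"),
                "<a href=\"b.html\">B</a> <a href=\"index.html#top\">home</a>",
                URI.create("http://www.example.com/b.html"),
                "no links here");
        List<URI> pages = crawl(URI.create("http://www.example.com/index.html"), 100,
                uri -> Optional.ofNullable(site.get(uri)));
        pages.forEach(System.out::println);
    }
}
```

Running `main` prints the index page, `a.html` and `b.html` in breadth-first order; the off-site link is ignored, and the `#top` link back to the index page is recognized as already seen.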
TechBookReport - 18 Jul 2006 16:18 GMT
> Hi,
>     Thanks. Is it impossible to capture website pages in Java? If it is
> possible, how should I proceed?

You might find the following article useful:
http://www.developer.com/java/other/article.php/1573761


TechBookReport Java - http://www.techbookreport.com/JavaIndex.html


