Java Forum / General / November 2007

Java program that reads websites to download from a file

Generic Usenet Account - 19 Nov 2007 13:53 GMT
Some time back I wrote a simple Java program that reads a list of URLs
from a file and stores their contents on the local file system.  I
have no problems with normal (i.e. html) pages, but I am not able to
download asp files.  They are stored as zero-length files.

I would greatly appreciate it if someone could suggest a way out.

-- Bhat

My source code follows:

/////////////////// Source code begin /////////////////////

// This program reads a list of URLs to access and store on the local
// file system from a file.  The name of the file is passed as the
// first command line argument.  Each URL is on a separate line.
// Lines beginning with the '#' character are treated as blanks and
// are skipped.
//
import java.io.*;
import java.net.*;
import java.security.*;

class WebsiteLoader
{
 public static char replaceChar = '~';

 public static void main(String argv[]) throws IOException
 {
   // The following two lines were suggested by the following website:
   // http://www.javaworld.com/javaworld/javatips/jw-javatip96.html
   // They help in suppressing the java.net.MalformedURLException
   System.setProperty("java.protocol.handler.pkgs",
                      "com.sun.net.ssl.internal.www.protocol");
   Security.addProvider(new com.sun.net.ssl.internal.ssl.Provider());

   BufferedReader br;
   String origName;

   if(argv.length != 0)
   {
     br = new BufferedReader(new FileReader(argv[0]));

     // Read URLs from the file.  Skip blank lines and lines beginning
     // with the '#' character.
     for(;;)
     {
       origName = br.readLine();
       if(origName == null)
         break;

       origName = origName.trim();

       if(origName.length() == 0)
         continue;

       if(origName.charAt(0) == '#')
         continue;

       URL url = new URL(origName);
       if(url == null)
         continue;

       BufferedReader bufRdr = new BufferedReader(
           new InputStreamReader(url.openStream()));

       // The name of the file to which the website contents are written
       // is derived from the URL by substituting the following characters
       // with some "non-offending" character:
       //  \,/,:,*,?,",<,>,|

       String modName = origName;
       modName = modName.replace('\\', replaceChar);
       modName = modName.replace('/', replaceChar);
       modName = modName.replace(':', replaceChar);
       modName = modName.replace('*', replaceChar);
       modName = modName.replace('?', replaceChar);
       modName = modName.replace('"', replaceChar);
       modName = modName.replace('<', replaceChar);
       modName = modName.replace('>', replaceChar);
       modName = modName.replace('|', replaceChar);

       FileWriter fWriter = new FileWriter(modName);

       System.out.println("Writing contents of " + origName + " to " +
                          "the following file: " + modName);
       for(;;)
       {
         String thisLine = bufRdr.readLine();
         if(thisLine == null)
           break;

         fWriter.write(thisLine);
       }
     }
   }
 }
}
Arne Vajhøj - 20 Nov 2007 02:41 GMT
> Some time back I wrote a simple Java program that reads a list of URLs
> from a file and stores their contents on the local file system.  I
> have no problems with normal (i.e. html) pages, but I am not able to
> download asp files.  They are stored as zero-length files.
>
> I would greatly appreciate it if someone could suggest a way out.

>         BufferedReader bufRdr = new BufferedReader(
>             new InputStreamReader(url.openStream()));

Your code should not need to distinguish between static HTML
and the various server-side scripting technologies (ASP, PHP, JSP,
ASP.NET).

An ASP page may check the browser type (the User-Agent header) and
reject your request, or it may expect a cookie or something similar.
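
To try out the browser-type and cookie ideas, the request headers can be set before the stream is opened. The following is only a rough sketch: the class name BrowserLikeFetch, the helper prepare, and the User-Agent string are made up for illustration, and whether the server in question actually looks at these headers is an assumption that has to be tested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class BrowserLikeFetch {
    // Hypothetical browser-like identifier; substitute the string a real browser sends.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; WebsiteLoader)";

    // Builds a connection with request headers set. Nothing is sent over the
    // network until the caller asks for the input stream.
    static URLConnection prepare(String address, String cookie) {
        try {
            URLConnection conn = URI.create(address).toURL().openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            if (cookie != null) {
                // Assumption: the page wants a cookie that was obtained some other way.
                conn.setRequestProperty("Cookie", cookie);
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java BrowserLikeFetch <url> <output-file>");
            return;
        }
        URLConnection conn = prepare(args[0], null);
        // try-with-resources closes the stream; Files.copy writes the bytes unchanged.
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get(args[1]), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Copying bytes inside try-with-resources also guarantees that the output is fully written and closed, which the posted loop never does for its FileWriter.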

You will need to experiment a bit to find out what is
causing the problem.
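
One possible way to run such an experiment is to print what the server actually sends back for the failing URL. This is a minimal sketch, not a definitive tool: the class ResponseProbe and its describe helper are hypothetical names, and it assumes the URL is http or https so the connection can be cast to HttpURLConnection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;
import java.util.Map;

class ResponseProbe {
    // Formats the status code and headers as text. Kept free of I/O so the
    // formatting can be checked without a network connection.
    static String describe(int status, Map<String, List<String>> headers) {
        StringBuilder sb = new StringBuilder("status=" + status + "\n");
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if (e.getKey() == null) {
                continue; // HttpURLConnection stores the status line under a null key
            }
            sb.append(e.getKey()).append(": ")
              .append(String.join(", ", e.getValue())).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ResponseProbe <url>");
            return;
        }
        // Assumption: args[0] is an http or https URL; other schemes fail the cast.
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        // Show redirects (e.g. to a login page) instead of silently following them.
        conn.setInstanceFollowRedirects(false);
        System.out.print(describe(conn.getResponseCode(), conn.getHeaderFields()));
        conn.disconnect();
    }
}
```

A 3xx status with a Location header, a Set-Cookie header, or a 403 would each point to a different cause, so comparing the output for a working HTML page and a failing ASP page is a reasonable first step.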

Arne