Java Forum / General / April 2006


parse HTML

VitaminB - 25 Apr 2006 17:21 GMT
Hello,

I want to parse an HTML document to get the URLs of all frames in a
frameset. I get a NullPointerException at the System.out.println...

Thanks a lot for your help.

Regards,
Marcus

##################
Java Code:
##################

URL urlobj = new URL(str);

HttpURLConnection uc = null;
uc = (HttpURLConnection)urlobj.openConnection();
uc.setUseCaches(false);
DataInputStream is = new DataInputStream(uc.getInputStream());

HTMLEditorKit hKit = new HTMLEditorKit();
HTMLDocument hDoc = new HTMLDocument();
hKit.read(is, hDoc, 0);
HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);

AttributeSet attSet = it.getAttributes();
String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
System.out.println(s);

##################
Example HTML page:
##################

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<html>
<head>

<script language="JavaScript" type="text/javascript">
<!--
self._domino_name = "_Main";
// -->
</script>
</head>

<frameset cols="45%,55%">

<frame
src="/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm">

<frameset rows="1*,1*">

<frame src="/Test/HET/PerformanceTestDB.nsf/DocsInserted?OpenView">

<frame name="docPreviewFrame"
src="/Test/HET/PerformanceTestDB.nsf/select?OpenForm">
</frameset>
</frameset>
</html>
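A minimal sketch of one way to collect every frame URL, assuming the page parses with the Swing HTML parser bundled with the JDK. The class name FrameLinks, the method frameSources and the SAMPLE constant are made up for illustration; SAMPLE is a trimmed copy of the example page above. The loop relies on HTMLDocument.Iterator starting out positioned on the first matching element and on isValid() turning false once the matches run out, so it checks isValid() instead of calling next() up front. The null check guards the lookup that throws in the posted code when no FRAME element is current.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class FrameLinks {

    // Trimmed copy of the example page from the post (script block left out).
    static final String SAMPLE = "<html><head></head><frameset cols=\"45%,55%\">"
            + "<frame src=\"/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm\">"
            + "<frameset rows=\"1*,1*\">"
            + "<frame src=\"/Test/HET/PerformanceTestDB.nsf/DocsInserted?OpenView\">"
            + "<frame name=\"docPreviewFrame\" src=\"/Test/HET/PerformanceTestDB.nsf/select?OpenForm\">"
            + "</frameset></frameset></html>";

    /** Parses HTML from the reader and returns the SRC of every FRAME, in document order. */
    static List<String> frameSources(Reader in) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        // createDefaultDocument() returns an HTMLDocument set up for this kit.
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(in, doc, 0);

        List<String> sources = new ArrayList<>();
        // The iterator is already on the first FRAME (if there is one);
        // isValid() becomes false when no further FRAME is found.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.FRAME); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object src = (attrs == null) ? null : attrs.getAttribute(HTML.Attribute.SRC);
            if (src != null) {
                sources.add(src.toString());
            }
        }
        return sources;
    }

    public static void main(String[] args) throws Exception {
        for (String src : frameSources(new StringReader(SAMPLE))) {
            System.out.println(src);
        }
    }
}
```

The same loop works with a Reader wrapped around the HTTP stream instead of a StringReader.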
Amfur Kilnem - 25 Apr 2006 17:32 GMT
> AttributeSet attSet = it.getAttributes();
> String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
> System.out.println(s);

attSet.getAttribute must've returned null.
VitaminB - 25 Apr 2006 18:27 GMT
Oliver Wong - 25 Apr 2006 19:27 GMT
> I want to parse an HTML document to get the URLs of all frames in a
> frameset. I get a NullPointerException at the System.out.println...
[...]

> ##################
> Java Code:
[quoted text clipped - 15 lines]
> String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
> System.out.println(s);

   I don't see how you could have gotten an NPE from the
System.out.println statement. Are you sure you didn't get it from the line
above, or possibly somewhere else? See the section titled "If you get an
error message, repeat it exactly." at
http://riters.com/JINX/index.cgi/Suggestions_20for_20Asking_20Questions_20on_20Newsgroups


   - Oliver
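To illustrate the point, a small sketch (the class and method names are made up) showing that PrintStream.println(String) prints the text null for a null reference instead of throwing, while calling a method on a null AttributeSet does throw NullPointerException. A null AttributeSet is what HTMLDocument.Iterator.getAttributes() appears to return when the iterator is not positioned on any FRAME element; treat that as an assumption about the JDK implementation rather than documented behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;

public class NpeLocationDemo {

    /** Returns what println writes for a null String (line separator trimmed). */
    static String printNullString() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buf, true);
        String s = null;
        out.println(s); // println(String) substitutes the text "null"; no exception
        return buf.toString().trim();
    }

    /** Returns true if looking up SRC on a null AttributeSet throws an NPE. */
    static boolean lookupOnNullThrows() {
        AttributeSet attSet = null; // stands in for a null result from it.getAttributes()
        try {
            attSet.getAttribute(HTML.Attribute.SRC);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("println with a null String printed: " + printNullString());
        System.out.println("getAttribute on a null AttributeSet threw NPE: " + lookupOnNullThrows());
    }
}
```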
VitaminB - 25 Apr 2006 19:35 GMT
OK, I reworked my code and now get another exception, and again I
don't know why.

Here is the stack trace:
javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:198)
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:401)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1875)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1910)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:2076)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:135)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:107)
    at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:262)
    at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:163)
    at Stress.urlRequest(Stress.java:76)
    at Stress.run(Stress.java:40)

Here is the code:

    public long[] urlRequest(String str) {
        Cal starttime = new Cal();
        long[] read = new long[2];
        try {
            int c = 0;
            byte[] rc = new byte[1024];
            URL urlobj = new URL(str);

            HTTPRequest request = new HTTPRequest(str, user, pass);
            DataInputStream is = new DataInputStream(request.get());

            HTMLEditorKit hKit = new HTMLEditorKit();
            HTMLDocument hDoc = new HTMLDocument();
            hKit.read(is, hDoc, 0);

            HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);
            it.next();
            AttributeSet attSet = it.getAttributes();
            String s = (String) attSet.getAttribute(HTML.Attribute.SRC);
            System.out.println(s);

            //System.out.println(attSet.getAttributeCount());

            while ((c = is.read(rc)) != -1) {
                read[0] = read[0] + c;
            }
            Cal endtime = new Cal();
            read[1] = endtime.getTimeInMillis() - starttime.getTimeInMillis();
            return read;
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        return read;
    }

}
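A sketch of how the second version might be fixed, under a few stated assumptions. The stack trace shows the parser aborting in handleEmptyTag, which is where the Swing DocumentParser throws ChangedCharSetException on a META tag that declares a charset; setting the "IgnoreCharsetDirective" document property before reading tells HTMLEditorKit.read to skip that check. The body is fetched once into a byte array so it can be both counted and parsed; in the posted code hKit.read() has most likely already consumed the stream, so the byte-counting loop afterwards would see end-of-stream immediately. The sketch also drops the it.next() call made before the first getAttributes(), since the iterator already starts on the first FRAME and that call skips it. FrameFetch, readAll, frameSources and PAGE are hypothetical names, the page is assumed to be ISO-8859-1, and a byte array stands in for the HTTP response because the thread's HTTPRequest and Cal classes are not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class FrameFetch {

    // Hypothetical page with a charset META tag, the kind that triggers ChangedCharSetException.
    static final String PAGE = "<html><head>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
            + "</head><frameset cols=\"45%,55%\">"
            + "<frame src=\"/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm\">"
            + "</frameset></html>";

    /** Reads the whole stream once; the array length is the byte count. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    /** Parses the buffered body and returns the SRC of every FRAME. */
    static List<String> frameSources(byte[] body) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Skip the charset check that otherwise throws ChangedCharSetException.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        // Assumption: ISO-8859-1; real code would take the charset from the Content-Type header.
        kit.read(new InputStreamReader(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1), doc, 0);

        List<String> sources = new ArrayList<>();
        // No it.next() before the first read: the iterator already sits on the first FRAME.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.FRAME); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object src = (attrs == null) ? null : attrs.getAttribute(HTML.Attribute.SRC);
            if (src != null) {
                sources.add(src.toString());
            }
        }
        return sources;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for request.get(); the thread's HTTPRequest class is not shown.
        InputStream response = new ByteArrayInputStream(PAGE.getBytes(StandardCharsets.ISO_8859_1));
        byte[] body = readAll(response);
        System.out.println("bytes read: " + body.length);
        System.out.println("frames: " + frameSources(body));
    }
}
```

Timing can still wrap the readAll() call as in the original method; the point of buffering is that counting and parsing no longer compete for the same stream.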


©2008 Advenet LLC