
Java Forum / General / March 2006

Parsing html

stixwix - 14 Mar 2006 11:26 GMT
What are people's favourite ways of doing this?
I tried TagSoup but have little experience of XML and can't find any
decent docs on the XPath bit.
The following prints the title of the document (a basic HTML file) as expected:

    URL url = new URL("file:///c:\\tmp\\test.htm");
    Parser p = new Parser();
    SAX2DOM sax2dom = new SAX2DOM();
    p.setContentHandler(sax2dom);
    p.parse(new InputSource(url.openStream()));
    Node doc = sax2dom.getDOM();
    String titlePath = "/html:html/html:head/html:title";
    XObject title = XPathAPI.eval(doc,titlePath);
    System.out.println("Title is '"+title+"'");

However, changing the titlePath to the following doesn't give the text
from the body tag:

    String titlePath = "/html:html/html:body";

I would eventually like to be able to parse HTML comments into my Java
program as well.

Thanks,
Andy
jcsnippets.atspace.com - 14 Mar 2006 12:05 GMT
"stixwix" <andywickson@gmail.com> wrote in news:1142331986.119994.88230
@j52g2000cwj.googlegroups.com:

> What are people's favourite ways of doing this?
> I tried TagSoup but have little experience of XML and can't find any
[quoted text clipped - 21 lines]
> Thanks,
> Andy

If you're going to parse HTML files, have a look at
http://sourceforge.net/projects/htmlparser - very easy to use, samples
included.

Best regards,

JayCee

http://jcsnippets.atspace.com
a collection of source code, tips and tricks

deadlycow21@gmail.com - 14 Mar 2006 17:49 GMT
> "stixwix" <andywickson@gmail.com> wrote in news:1142331986.119994.88230
> @j52g2000cwj.googlegroups.com:
[quoted text clipped - 35 lines]
> http://jcsnippets.atspace.com
> a collection of source code, tips and tricks

As posted above, http://sourceforge.net/projects/htmlparser is a good
one.
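
For the XPath part of the original question, here is a minimal sketch that uses only the JDK's built-in javax.xml APIs rather than TagSoup or Xalan's XPathAPI. It assumes the input is already well-formed XHTML (in the original post, TagSoup is what turns real-world HTML into such a tree). The class name, sample markup and helper methods are made up for illustration. It binds the html prefix to the XHTML namespace, reads the string value of the title and body elements, and collects comment nodes with the comment() node test.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XhtmlXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample input; assumed to be well-formed XHTML already.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
      + "<head><title>Test page</title></head>"
      + "<body><!-- first note --><p>Hello <b>world</b></p>"
      + "<!-- second note --></body></html>";

    // Parse the markup into a namespace-aware DOM (comments are kept by default).
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // Create an XPath evaluator that maps the "html" prefix to the XHTML namespace.
    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "html" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singletonList("html").iterator()
                    : Collections.<String>emptyIterator();
            }
        });
        return xpath;
    }

    // String value of the first node matched by expr: for an element this is
    // the concatenation of all descendant text nodes (comments are excluded).
    static String evalString(String xhtml, String expr) {
        try {
            return newXPath().evaluate(expr, parse(xhtml));
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath expression: " + expr, e);
        }
    }

    // Trimmed text of every comment node in the document, in document order.
    static List<String> comments(String xhtml) {
        try {
            NodeList nodes = (NodeList) newXPath()
                .evaluate("//comment()", parse(xhtml), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("comment lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Title is '" + evalString(SAMPLE, "/html:html/html:head/html:title") + "'");
        System.out.println("Body text is '" + evalString(SAMPLE, "/html:html/html:body") + "'");
        System.out.println("Comments: " + comments(SAMPLE));
    }
}
```

This does not diagnose why the body path returned nothing in the original TagSoup/Xalan setup; it only shows what the same two paths yield through the standard API once the prefix is bound to the XHTML namespace.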

