Java Forum / General / December 2006


extract data from xhtml

Damo_Suzuki - 07 Dec 2006 16:36 GMT
Hi,
I am in the process of extracting data from an HTML document. I used
JTidy to convert it to XHTML. Now that I have the XHTML, how can I
extract data from it? Say I wanted to extract a node with the tag
<h2 class="r">.......</h2>: does anyone know how, or have sample code to
achieve this? I've been banging my head against a brick wall for a few
days now trying to do this.
Thanks
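A minimal sketch of the extraction step in Java, assuming the tidied page is already available as a String (the class name `H2ClassRXPath`, the method `findHeadings`, and the sample markup are hypothetical). It parses the XHTML with the JDK's `DocumentBuilder` and selects every `<h2 class="r">` with the XPath expression `//h2[@class='r']`:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class H2ClassRXPath {

    // Returns the trimmed text content of every <h2 class="r"> element in the XHTML.
    static List<String> findHeadings(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Left namespace-unaware on purpose: with setNamespaceAware(true), elements in
        // the default XHTML namespace would not match the unprefixed XPath "//h2".
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//h2[@class='r']", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<h2 class=\"r\">First result</h2>"
                + "<h2>Not a result</h2>"
                + "<h2 class=\"r\">Second result</h2>"
                + "</body></html>";
        System.out.println(findHeadings(xhtml)); // [First result, Second result]
    }
}
```

The factory is kept namespace-unaware because tidied XHTML usually declares the default XHTML namespace on `<html>`; with namespace awareness switched on, `//h2` would match nothing unless a `NamespaceContext` and a prefix were also set up. If the tidied output keeps its DOCTYPE, the parser may also try to fetch the XHTML DTD over the network.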
Flo 'Irian' Schaetz - 07 Dec 2006 16:53 GMT
> I am in the process of extracting data from an HTML document. I used
> JTidy to convert it to XHTML. Now that I have the XHTML, how can I
> extract data from it?

As a valid XHTML document is well-formed XML, you should be able to parse
it with either a DOM parser or a SAX parser. Searching for them in Google
should bring up plenty of examples of how to use them.
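For comparison, a minimal SAX sketch of the same extraction, which streams through the document instead of building a tree. It assumes the XHTML is held in a String, and the names `H2ClassRSax`, `H2Handler` and `findHeadings` are hypothetical. It matches on the qualified name `qName` because the default `SAXParserFactory` is not namespace-aware, and it counts nested elements so that text inside a child such as `<a>` is kept:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class H2ClassRSax {

    // Collects the text of <h2 class="r"> elements as SAX events stream past.
    static final class H2Handler extends DefaultHandler {
        final List<String> headings = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a matching <h2>
        private int depth;             // nesting depth of child elements inside it

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (current != null) {
                depth++; // a child element such as <a> inside the heading
            } else if ("h2".equals(qName) && "r".equals(atts.getValue("class"))) {
                current = new StringBuilder();
                depth = 0;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;
            }
            if (depth > 0) {
                depth--; // closing a child element, still inside the heading
            } else {
                headings.add(current.toString().trim());
                current = null;
            }
        }
    }

    static List<String> findHeadings(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        H2Handler handler = new H2Handler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.headings;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><h2 class=\"r\"><a href=\"#\">Linked</a> result</h2>"
                + "<h2 class=\"x\">Skip me</h2></body></html>";
        System.out.println(findHeadings(xhtml)); // [Linked result]
    }
}
```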

Flo
Damo_Suzuki - 07 Dec 2006 22:08 GMT
Hi,
Now that it's in XHTML, can I use DocumentBuilder to extract data from
it? I don't want to write the XHTML to a file. My code looks like this:

      tidy.parse(in, System.out);

      DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
      domFactory.setNamespaceAware(true);
      DocumentBuilder builder = domFactory.newDocumentBuilder();
      Document doc = builder.parse(XXXXXXXXXX);

In the parse method, 'in' is the file I want to extract data from. It's
fetched straight off the web, "JTidied", and output to the console. Can
I somehow use this as the parameter where all the X's are in the
DocumentBuilder parse method?
Thanks
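One way to answer this, sketched under assumptions: pass `tidy.parse` a `ByteArrayOutputStream` instead of `System.out`, then hand the captured bytes to `builder.parse` through a `ByteArrayInputStream`, so nothing is written to a file. To keep the example runnable without JTidy, `fakeTidyParse` is a hypothetical stand-in that writes fixed XHTML to the stream it is given; `TidyToDom` and `firstResultHeading` are likewise illustrative names. It also assumes the tidied bytes use the encoding named in the XML declaration (UTF-8 when there is none):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TidyToDom {

    // Hypothetical stand-in for tidy.parse(in, out): it just writes fixed XHTML
    // to the stream it is given, so this sketch runs without JTidy.
    static void fakeTidyParse(OutputStream out) throws IOException {
        String xhtml = "<html><body><h2 class=\"r\">Tidied heading</h2></body></html>";
        out.write(xhtml.getBytes(StandardCharsets.UTF_8));
    }

    static Document parseTidiedOutput() throws Exception {
        // Capture the tidied XHTML in memory instead of sending it to System.out.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        fakeTidyParse(buffer); // with JTidy this would be: tidy.parse(in, buffer);

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        // Feed the captured bytes back in; this is what goes where the X's are.
        return builder.parse(new ByteArrayInputStream(buffer.toByteArray()));
    }

    // Returns the text of the first <h2 class="r">, or null if there is none.
    static String firstResultHeading(Document doc) {
        // getElementsByTagName matches the tag name "h2" whether or not a
        // default namespace is declared.
        NodeList h2s = doc.getElementsByTagName("h2");
        for (int i = 0; i < h2s.getLength(); i++) {
            Element h2 = (Element) h2s.item(i);
            if ("r".equals(h2.getAttribute("class"))) {
                return h2.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstResultHeading(parseTidiedOutput())); // Tidied heading
    }
}
```

If the tidied markup should still appear on the console, `buffer.toString("UTF-8")` can be printed before parsing.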


©2008 Advenet LLC
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.