> I am in the process of extracting data from an HTML document. I used
> JTidy to convert it to XHTML. Now that I have the XHTML, how can I
> extract data from it?
As a valid XHTML document is well-formed XML, you should be able to parse
it with either a DOM parser (DocumentBuilder) or a SAX parser (SAXParser).
Searching Google for them should bring up plenty of examples of how to use them.
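
For illustration, here is a minimal sketch of the DOM route using only the JDK's own XML classes. The class name XhtmlLinkExtractor, the method extractLinks and the sample markup are made up for this example, and the entity resolver is an assumption: it simply stops the parser from trying to download the XHTML DTD named in the DOCTYPE (note that named entities such as &nbsp; would then be undefined).

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlLinkExtractor {

    /** Returns the href attribute of every <a> element in the given XHTML text. */
    static List<String> extractLinks(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Assumption: we do not want to fetch the external DTD over the network,
        // so every external entity is resolved to an empty document.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        // Parse straight from memory; no temporary file is involved.
        Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        // Walk all <a> elements and collect their href values.
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            hrefs.add(((Element) anchors.item(i)).getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input standing in for real tidied output.
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
              + "<head><title>Sample</title></head>"
              + "<body><p><a href=\"one.html\">One</a> and "
              + "<a href=\"two.html\">Two</a></p></body></html>";
        System.out.println(extractLinks(xhtml));
    }
}
```

A SAX parser would suit very large pages better, since it does not build the whole tree in memory, but for pulling a few values out of a single page the DOM is usually easier to work with.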
Flo
Damo_Suzuki - 07 Dec 2006 22:08 GMT
Hi,
Now that it's in XHTML, can I use DocumentBuilder to extract data from it?
I don't want to write the XHTML to a file. My code looks like this:
tidy.parse(in, System.out);
DocumentBuilderFactory domFactory =
DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(XXXXXXXXXX);
In the parse method, 'in' is the file I want to extract data from. It's
fetched straight off the web, "JTidied" and output to the console. Can
I somehow use this output as the parameter where all the X's are in the
DocumentBuilder parse method?
Thanks
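
One possible way to do this without a file, sketched under assumptions: write the tidied output into a ByteArrayOutputStream instead of System.out, then hand the captured bytes to DocumentBuilder.parse through a ByteArrayInputStream. In the sketch below, the class InMemoryXhtmlParse and the method writeXhtml are hypothetical; writeXhtml only stands in for the tidy.parse(in, ...) call so that the example runs without JTidy on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class InMemoryXhtmlParse {

    /** Hypothetical stand-in for the JTidy step: writes XHTML to the given stream. */
    static void writeXhtml(OutputStream out) throws IOException {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Example page</title></head>"
                + "<body><p>Hello</p></body></html>";
        out.write(xhtml.getBytes(StandardCharsets.UTF_8));
    }

    /** Captures the written XHTML in memory and parses it into a DOM Document. */
    static Document parseFromMemory() throws Exception {
        // Collect the output in a buffer instead of printing it to the console.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeXhtml(buffer); // with JTidy this would be: tidy.parse(in, buffer);

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();

        // The captured bytes take the place of the X's in builder.parse(...).
        return builder.parse(new ByteArrayInputStream(buffer.toByteArray()));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseFromMemory();
        String title = doc.getElementsByTagName("title").item(0).getTextContent();
        System.out.println(title);
    }
}
```

If the tidied output carries a DOCTYPE, the parser may try to download the DTD it names, so an EntityResolver on the builder may be needed as well.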