Hi!
I have a problem: my class parses HTML documents pretty well, but if the
document contains <html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en"> it returns an error. Why?
import javax.swing.text.html.*;
import javax.swing.text.*;

public class HTMLParse extends HTMLEditorKit.ParserCallback {
    int begin = 0; // number of <p> start tags seen so far
    int end = 0;   // number of </strong> end tags seen so far

    // The parser reports problems (unknown tags, attributes, ...) here.
    //public void handleError(String errorMsg, int pos) {
    //    System.out.println("An error occurred: " + errorMsg);
    //    System.exit(2); // if we wanted to quit on a tag problem, but we don't
    //}

    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrSet, int pos) {
        if (tag == HTML.Tag.P) {
            begin++;
        }
    }

    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.STRONG) {
            end++;
        }
    }

    // Print text that follows the 3rd, 4th or 5th <p> start tag (up to the 6th).
    public void handleText(char[] data, int pos) {
        if (begin == 3 || begin == 4 || begin == 5) {
            System.out.println(data);
        }
    }
}
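To see what the parser actually complains about, a callback can record the messages passed to handleError instead of exiting. The sketch below is a hypothetical test harness (the class XhtmlParseDemo, its RecordingCallback and the sample document are illustrative, not part of the original code). It drives the same parser through javax.swing.text.html.parser.ParserDelegator, prints whatever errors are reported for the XHTML root element, and shows whether the <p> tags and their text still come through.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class XhtmlParseDemo {

    // Hypothetical callback: records errors rather than calling System.exit,
    // and counts <p> start tags / collects text so we can see what survived.
    static class RecordingCallback extends HTMLEditorKit.ParserCallback {
        final List<String> errors = new ArrayList<>();
        final List<String> texts = new ArrayList<>();
        int paragraphs = 0;

        @Override
        public void handleError(String errorMsg, int pos) {
            errors.add(errorMsg + " (at " + pos + ")");
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.P) {
                paragraphs++;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            texts.add(new String(data));
        }
    }

    // Runs the Swing HTML parser over a string; 'true' = ignore any charset
    // declared inside the document.
    static RecordingCallback parse(String html) throws IOException {
        RecordingCallback cb = new RecordingCallback();
        new ParserDelegator().parse(new StringReader(html), cb, true);
        return cb;
    }

    public static void main(String[] args) throws IOException {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">"
                + "<head><title>t</title></head>"
                + "<body><p>first</p><p>second</p></body></html>";
        RecordingCallback cb = parse(xhtml);
        for (String e : cb.errors) {
            System.out.println("parser error: " + e);
        }
        System.out.println("paragraphs: " + cb.paragraphs);
        System.out.println("text: " + cb.texts);
    }
}
```

If the messages turn out to be harmless attribute complaints, leaving handleError commented out (or only logging, as here) may be enough; if the structure itself is misread, converting the document first is the safer route.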
Tom Hawtin - 02 Apr 2007 00:30 GMT
> I have a problem: my class parses HTML documents pretty well, but if the
> document contains <html xmlns="http://www.w3.org/1999/xhtml"
> xml:lang="en" lang="en"> it returns an error. Why?
>
> import javax.swing.text.html.*;
The Swing HTML parser is ancient (and basic); it is built around the
HTML 3.2 DTD. I believe modern XHTML (the XML format) will give it problems.
You could use something like JTidy (google it) to reformat the document
as old-school HTML. Alternatively, a short SAX handler could remove the
XMLisms (such as writing <x/> instead of <x></x>).
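A minimal sketch of the SAX idea, using only the JDK's built-in parser. It assumes the input is well-formed XML; the class name XhtmlToHtml, the list of void elements and the choice to drop every xmlns/xml:* attribute are illustrative assumptions, not a standard recipe. Because the external DTD is replaced by an empty one (to avoid fetching it from w3.org), XHTML named entities such as &nbsp; would not resolve and would need extra handling.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlToHtml {

    // Assumed list of elements that old-style HTML writes without an end tag.
    private static final Set<String> VOID =
            Set.of("br", "hr", "img", "meta", "link", "input", "area", "base", "col", "param");

    static class Handler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Don't download the XHTML DTD; hand the parser an empty one.
            return new InputSource(new StringReader(""));
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                String name = atts.getQName(i);
                // Drop the XML-only attributes (xmlns, xmlns:*, xml:lang, ...).
                if (name.equals("xmlns") || name.startsWith("xmlns:") || name.startsWith("xml:")) {
                    continue;
                }
                out.append(' ').append(name).append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // <x/> arrives as start + end; write </x> unless x is a void element.
            if (!VOID.contains(qName)) {
                out.append("</").append(qName).append('>');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(escape(new String(ch, start, length)));
        }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static String convert(String xhtml) throws Exception {
        // The default factory is not namespace-aware, so xmlns shows up as a plain attribute.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Handler h = new Handler();
        parser.parse(new InputSource(new StringReader(xhtml)), h);
        return h.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">"
                + "<body><p>a<br/>b</p></body></html>";
        System.out.println(convert(xhtml));
    }
}
```

The resulting string could then be handed to the Swing parser through a StringReader. Set.of needs Java 9 or later.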
Tom Hawtin