>> Pattern spanBR =
>> Pattern.compile("(<span[^>]*?>).*?(</span){0}.*?(<newline/>).*?(<span){0}.*?(</span>)",
[quoted text clipped - 12 lines]
>
> I have to move <br> up recursively out of every span; any suggestions?
Give up on regular expressions and use a context-free-grammar-based
parser instead. See http://java-source.net/open-source/html-parsers
- Oliver
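As a rough illustration of the parser-based route, here is a minimal sketch that uses only the JDK's built-in DOM parser rather than any of the libraries on the linked page. It assumes the input is a well-formed XHTML fragment with lowercase tag names (real-world HTML would need one of the lenient parsers instead), and the names DomBreakHoister, hoistBreaks and findBrInsideSpan are made up for this example. Each pass lifts one <br> a single level out of its enclosing span by splitting that span in two, and repeats until no <br> has a span as its parent.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; assumes well-formed XHTML input.
class DomBreakHoister {

    // Returns the first <br> whose direct parent is a <span>, or null if none is left.
    static Element findBrInsideSpan(Document doc) {
        NodeList brs = doc.getElementsByTagName("br");
        for (int i = 0; i < brs.getLength(); i++) {
            Node parent = brs.item(i).getParentNode();
            if (parent != null && "span".equals(parent.getNodeName())) {
                return (Element) brs.item(i);
            }
        }
        return null;
    }

    // Wraps the fragment in a <root> element, hoists every <br>, and serializes the result.
    static String hoistBreaks(String fragment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            Element br;
            while ((br = findBrInsideSpan(doc)) != null) {
                Node span = br.getParentNode();
                Node outer = span.getParentNode();
                // Copy of the span (attributes only) that receives everything after the <br>.
                Node tail = span.cloneNode(false);
                while (br.getNextSibling() != null) {
                    tail.appendChild(br.getNextSibling());
                }
                Node after = span.getNextSibling();
                outer.insertBefore(br, after);            // <br> now sits right after the span
                if (tail.hasChildNodes()) {
                    outer.insertBefore(tail, after);      // second half follows the <br>
                }
                if (!span.hasChildNodes()) {
                    outer.removeChild(span);              // drop a span left empty by the split
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hoistBreaks(
                "<span class=\"a\"><span class=\"b\">one<br/>two</span></span>"));
    }
}
```

Because the tree is edited directly, the nesting depth does not matter, and the returned string still carries the artificial <root> wrapper.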
skajotde - 16 Jul 2006 10:57 GMT
> Give up on regular expressions and use a context-free-grammar-based
> parser instead. See http://java-source.net/open-source/html-parsers
>
> - Oliver
Yes, that's not a bad solution. At the moment I'm using this code:
// needs java.util.regex.Matcher and java.util.regex.Pattern
// catch tags such as <br style="font-weight: bold;"/>
Pattern badBR = Pattern.compile("<br.*?>",
    Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherBR = badBR.matcher(html);
html = matcherBR.replaceAll("<newline/>");
// remove empty spans
Pattern emptySpan = Pattern.compile("<span[^>]*?></span>",
    Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherSpan = emptySpan.matcher(html);
html = matcherSpan.replaceAll("");
// move <newline/> out of the span so that it sits between two spans
Pattern spanBR = Pattern.compile(
    "(<span[^>]*?>)([^<>]*?)(<newline/>)([^<>]*?)(</span>)",
    Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherSpanBR = spanBR.matcher(html);
int numLoop = 0;
while (matcherSpanBR.find()) {
    html = matcherSpanBR.replaceAll("$1$2$5$3$1$4$5");
    matcherSpan.reset(html);
    // remove empty spans once more
    html = matcherSpan.replaceAll("");
    matcherSpanBR.reset(html);
    numLoop++;
    // max 3 levels of nesting
    if (numLoop > 3) break;
}
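For anyone who wants to try the steps above in isolation, here is the same sequence wrapped in a small self-contained class. This is only a sketch: the class name SpanBreakMover, the method name moveBreaks and the sample input are invented for the example, while the three patterns and the replacement string are the ones from the snippet. The pass limit of 4 mirrors the numLoop > 3 check, which lets the loop body run at most four times.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex steps from the post.
class SpanBreakMover {
    private static final int FLAGS = Pattern.DOTALL | Pattern.CASE_INSENSITIVE;

    // Any <br ...> tag, with or without attributes.
    private static final Pattern BAD_BR = Pattern.compile("<br.*?>", FLAGS);
    // A span with nothing between its opening and closing tag.
    private static final Pattern EMPTY_SPAN = Pattern.compile("<span[^>]*?></span>", FLAGS);
    // A span that holds only text and one <newline/> placeholder.
    private static final Pattern SPAN_BR = Pattern.compile(
            "(<span[^>]*?>)([^<>]*?)(<newline/>)([^<>]*?)(</span>)", FLAGS);

    static String moveBreaks(String html) {
        // Step 1: replace every <br> variant with the <newline/> placeholder.
        html = BAD_BR.matcher(html).replaceAll("<newline/>");
        // Step 2: remove empty spans.
        html = EMPTY_SPAN.matcher(html).replaceAll("");
        // Step 3: split each matching span around its <newline/>, then clean up again.
        for (int pass = 0; pass < 4 && SPAN_BR.matcher(html).find(); pass++) {
            html = SPAN_BR.matcher(html).replaceAll("$1$2$5$3$1$4$5");
            html = EMPTY_SPAN.matcher(html).replaceAll("");
        }
        return html;
    }

    public static void main(String[] args) {
        System.out.println(moveBreaks(
                "<span class=\"a\">one<br style=\"font-weight: bold;\"/>two</span>"));
    }
}
```

Note that the result still contains the <newline/> placeholder; turning it back into <br> is left to the caller.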
I hope it's sufficient (my bug was resolved).
Thanks for the help.
Cheers
Kamil