Java Forum / General / July 2006

regex - extract <br> before span

skajotde@gmail.com - 14 Jul 2006 12:37 GMT
Hi all

I'd like to extract the <br> out of a span.

example:

<span style="text-decoration: underline;"><span style="text-decoration:
underline;">ff<br>ff</span></span>

to:

<span style="text-decoration: underline;"><span style="text-decoration:
underline;">ff</span></span><br><span style="text-decoration:
underline;"><span style="text-decoration: underline;">ff</span></span>

* the br is now outside of the span

Pattern spanBR =
Pattern.compile("(<span[^>]*?>).*?(</span){0}.*?(<newline/>).*?(<span){0}.*?(</span>)",
Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

Matcher matcherSpanBR = spanBR.matcher(html);

while (matcherSpanBR.matches()) {
        html =  matcherSpanBR.replaceAll("$1$2$5$1$3$1$4$5");
}

My question is how to say "a part of the text that does not contain </span>,
between <span and <newline/>" and capture that part in a group (<newline/> is
what my <br> becomes after a first conversion).

Cheers
Kamil
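
One common way to express "text that contains no span tag" in java.util.regex is a negative lookahead applied before every character. The sketch below is only an illustration, not code from the thread: the class name SpanNewlineSplitter and the method splitInnermost are made up, and it assumes the <br> tags were already turned into <newline/>.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class SpanNewlineSplitter {
    // Consumes one character at a time, but only while no "<span" or
    // "</span" starts at the current position. The outer parentheses
    // capture the consumed text as a group.
    private static final String NO_SPAN_TAG = "((?:(?!</?span\\b).)*?)";

    // Groups: 1 = opening span, 2 = text before, 3 = <newline/>,
    //         4 = text after, 5 = closing span.
    private static final Pattern SPAN_BR = Pattern.compile(
            "(<span[^>]*>)" + NO_SPAN_TAG + "(<newline/>)" + NO_SPAN_TAG + "(</span>)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Splits every span that directly contains a <newline/> into two spans
    // with the <newline/> between them.
    static String splitInnermost(String html) {
        return SPAN_BR.matcher(html).replaceAll("$1$2$5$3$1$4$5");
    }

    public static void main(String[] args) {
        String in = "<span style=\"a\"><span style=\"b\">ff<newline/>ff</span></span>";
        // prints <span style="a"><span style="b">ff</span><newline/><span style="b">ff</span></span>
        System.out.println(splitInnermost(in));
    }
}
```

Note that this only splits the innermost span. The <newline/> still sits inside the outer span afterwards, and running the same pattern again does not move it further, because the outer span now contains span tags before the <newline/>.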
skajotde@gmail.com - 14 Jul 2006 12:46 GMT
> Pattern spanBR =
> Pattern.compile("(<span[^>]*?>).*?(</span){0}.*?(<newline/>).*?(<span){0}.*?(</span>)",
> Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

Before that, my pattern looked like:

"(<span[^>]*?>)(.*?)(<newline/>)(.*?)(</span>)"

But this pattern matches:

<span style="text-decoration: underline;"></span>
<br /><span style="text-decoration: underline;"><span
style="text-decoration: underline;">ff<br>ff</span></span><br />* Some
test: <span style="font-weight: bold;"> Some test</span>

I have to move the <br> up recursively out of all the nested spans. Any suggestions?
Oliver Wong - 14 Jul 2006 15:11 GMT
>> Pattern spanBR =
>> Pattern.compile("(<span[^>]*?>).*?(</span){0}.*?(<newline/>).*?(<span){0}.*?(</span>)",
[quoted text clipped - 12 lines]
>
> I have to move the <br> up recursively out of all the nested spans. Any suggestions?

   Give up on regular expressions and use a parser based on a context-free
grammar instead. See http://java-source.net/open-source/html-parsers

   - Oliver
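
To give a rough idea of why tracking nesting helps, here is a minimal hand-rolled sketch. It is not one of the parsers listed at the link above and not a real HTML parser: it only splits the input into tags and text, keeps a stack of the open span tags, and on every <br> closes all of them, emits the <br>, and reopens them. The names BrHoister and hoistBr are made up, and it assumes properly nested spans and no ">" inside attribute values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; assumes well-formed, properly nested <span> tags.
class BrHoister {
    // Splits the input into tags, text runs, and stray '<' characters.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<]+|<");
    private static final Pattern SPAN_OPEN = Pattern.compile("(?is)<span(\\s[^>]*)?>");
    private static final Pattern SPAN_CLOSE = Pattern.compile("(?i)</span\\s*>");
    private static final Pattern BR = Pattern.compile("(?is)<br(\\s[^>]*)?/?>");

    static String hoistBr(String html) {
        // Opening tags of the spans we are currently inside, innermost first.
        Deque<String> open = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String tok = m.group();
            if (SPAN_OPEN.matcher(tok).matches()) {
                open.push(tok);
                out.append(tok);
            } else if (SPAN_CLOSE.matcher(tok).matches()) {
                if (!open.isEmpty()) {
                    open.pop();
                }
                out.append(tok);
            } else if (BR.matcher(tok).matches() && !open.isEmpty()) {
                // Close every open span, emit the <br>, then reopen the
                // spans from the outermost to the innermost.
                for (int i = 0; i < open.size(); i++) {
                    out.append("</span>");
                }
                out.append(tok);
                for (Iterator<String> it = open.descendingIterator(); it.hasNext();) {
                    out.append(it.next());
                }
            } else {
                out.append(tok);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<span style=\"text-decoration: underline;\">"
                + "<span style=\"text-decoration: underline;\">ff<br>ff</span></span>";
        System.out.println(hoistBr(in));
    }
}
```

For the example from the first post this produces the requested result, with the <br> between two copies of the nested spans. A <br> directly after an opening span leaves an empty span behind, so a cleanup step for empty spans may still be wanted.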
skajotde - 16 Jul 2006 10:57 GMT
>     Give up on regular expressions and use a parser based on a context-free
> grammar instead. See http://java-source.net/open-source/html-parsers
>
>     - Oliver

Yes, that's not a bad solution. At the moment I'm using this code:

// catch every <br> variant, e.g. <br style="font-weight: bold;"/>,
// and replace it with a <newline/> placeholder
Pattern badBR = Pattern.compile("<br.*?>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherBR = badBR.matcher(html);
html = matcherBR.replaceAll("<newline/>");

// remove empty spans
Pattern emptySpan = Pattern.compile("<span[^>]*?></span>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherSpan = emptySpan.matcher(html);
html = matcherSpan.replaceAll("");

// move <newline/> out of a span so that it sits between two spans
Pattern spanBR = Pattern.compile(
        "(<span[^>]*?>)([^<>]*?)(<newline/>)([^<>]*?)(</span>)",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher matcherSpanBR = spanBR.matcher(html);
int numLoop = 0;
while (matcherSpanBR.find()) {
    html = matcherSpanBR.replaceAll("$1$2$5$3$1$4$5");
    matcherSpan.reset(html);
    // remove empty spans once more
    html = matcherSpan.replaceAll("");
    matcherSpanBR.reset(html);
    numLoop++;
    // max 3 nesting levels
    if (numLoop > 3) break;
}

I hope it's sufficient (my bug was resolved).

Thanks for the help

Cheers
Kamil

