Java Forum / General / February 2006


Replace characters on html using regex????

AjalaDeveloper - 24 Feb 2006 23:21 GMT
Hello people, I'm trying to do something that should be very easy (I think!).
I'm using a Java regular expression to remove // in some HTML code. In the
example below, I would like to replace // with /, but only where it
isn't next to http:
------------------------------------------------------------------------

<img src="http://example.com//img/img.jpg">
------------------------------------------------------------------------

My problem with this example is that I don't want to remove the //
that comes right after http:

I'd like to get: <img src="http://example.com/img/img.jpg">. I'm trying
with the following Java code:
------------------------------------------------------------------------

completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
------------------------------------------------------------------------

I'm getting as result this: <img src="http:/example.com/img/img.jpg">.
I don't want to lose a / in http://

Please help me! Thank you
lewmania942@yahoo.fr - 25 Feb 2006 19:49 GMT
> I would like to replace // with /, but only where it isn't
> next to http:

> <img src="http://example.com//img/img.jpg">

> completeHtml = completeHtml.replaceAll("(?!http://)//*","/");

Nobody has proposed a "good" way to do it with a regexp (if such a
thing exists), so I propose a quick and dirty solution:

completeHtml = completeHtml.replaceAll("http://",
"http:///").replaceAll("//","/");

Note that I'm not for (nor against) using such code: I'm just
proposing a solution.

Hope it helps :)
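
For reference, here is a minimal runnable sketch of the two-pass trick above. The class and method names are made up for illustration. The first pass pads every "http://" to "http:///", so that the second pass, which collapses each "//" into "/", brings it back to "http://".

```java
class TwoPassSlashes {
    // Hypothetical helper name, not from the original post.
    static String collapse(String html) {
        return html.replaceAll("http://", "http:///") // pad the scheme separator
                   .replaceAll("//", "/");            // collapse every "//" pair
    }

    public static void main(String[] args) {
        String in = "<img src=\"http://example.com//img/img.jpg\">";
        // prints <img src="http://example.com/img/img.jpg">
        System.out.println(collapse(in));
    }
}
```

One caveat with this sketch: "https://" does not contain "http://", so it is not padded and ends up as "https:/".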
Alan Moore - 26 Feb 2006 01:12 GMT
>Hello people, I'm trying to do something very easy (I think so!). I'm
>using Java regular expression to remove-> // in an html code, for
[quoted text clipped - 20 lines]
>
>Please help me! Tk you

What you want is a lookbehind, not a lookahead:

 completeHtml = completeHtml.replaceAll("(?<!http:)//","/");
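
To spell out why the original pattern fails: in "(?!http://)//*" the lookahead is tested at the position of the slash itself, where the upcoming text can never start with "http://", so the check always passes. The "//*" part (one slash followed by zero or more slashes) then matches the slashes of "http://" as well. A lookbehind inspects the text before the match instead. A small sketch, with a made-up class name, might look like this:

```java
class LookbehindSlashes {
    // (?<!http:) is a negative lookbehind: "//" matches only when the
    // characters immediately before it are not "http:".
    static String collapse(String html) {
        return html.replaceAll("(?<!http:)//", "/");
    }

    public static void main(String[] args) {
        String in = "<img src=\"http://example.com//img/img.jpg\">";
        // prints <img src="http://example.com/img/img.jpg">
        System.out.println(collapse(in));
    }
}
```

As written, this still turns "https://" into "https:/". If other schemes matter, a looser lookbehind such as "(?<!:)//" could be tried, on the assumption that a "//" right after a colon is always a scheme separator.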
Rob Skedgell - 26 Feb 2006 14:58 GMT
> Hello people, I'm trying to do something very easy (I think so!). I'm
> using Java regular expression to remove-> // in an HTML code, for
> example in the next example, I would like to replace // with /, but
> only those which aren't next to http:

------------------------------------------------------------------------

> <img src="http://example.com//img/img.jpg">

------------------------------------------------------------------------

> I have problems in this example, just because I don't want to remove
> //
[quoted text clipped - 3 lines]
> I'd like to get: <img src="http://example.com/img/img.jpg">. I'm trying
> with the next java code:

------------------------------------------------------------------------

> completeHtml = completeHtml.replaceAll("(?!http://)//*","/");

------------------------------------------------------------------------

> I'm getting as result this: <img src="http:/example.com/img/img.jpg">.
> I don't want to lose a / in http://

Note that an SGML DOCTYPE declaration may also contain double slashes
that you will want to preserve. It may look something like this:

<!DOCTYPE HTML PUBLIC
          "-//W3C//DTD HTML 4.01 Transitional//EN"
          "http://www.w3.org/TR/REC-html401-19991224/loose.dtd">

or

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

A quick and dirty way to avoid changing these //s might be to skip the
first few (say 3-5) lines, or everything between the "<!DOCTYPE" and
its closing ">". Of course, if the HTML documents concerned don't have
a DOCTYPE declaration, there's no need to worry about this.
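
A rough sketch of the second idea, leaving everything between "<!DOCTYPE" and its closing ">" untouched, could look like the following. The class and method names are hypothetical, the search for "<!DOCTYPE" is case-sensitive, and the sketch assumes the declaration contains no ">" before its end (so no internal subset).

```java
class DoctypeAwareSlashes {
    static String collapse(String html) {
        int start = html.indexOf("<!DOCTYPE");
        int end = (start < 0) ? -1 : html.indexOf('>', start);
        if (start < 0 || end < 0) {
            return fix(html); // no complete DOCTYPE: process everything
        }
        // Keep the declaration verbatim; fix only the text around it.
        return fix(html.substring(0, start))
             + html.substring(start, end + 1)
             + fix(html.substring(end + 1));
    }

    // Collapse "//" into "/" unless it directly follows "http:".
    static String fix(String s) {
        return s.replaceAll("(?<!http:)//", "/");
    }

    public static void main(String[] args) {
        String in = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
                  + "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapse(in));
    }
}
```

With this input the DOCTYPE line should come back unchanged while the double slash in the image path is reduced to one.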


Rob Skedgell <rob+news@nephelococcygia.demon.co.uk>
GnuPG/PGP: 7DA3 1579 C0DD 8748 C05A  B984 E2A2 3234 D14B 6DD7



©2009 Advenet LLC   Privacy Policy - Terms of Use