Hello people, I'm trying to do something very easy (I think so!). I'm
using a Java regular expression to remove // in HTML code. In the
example below, I would like to replace // with /, but only where it
isn't next to http:
------------------------------------------------------------------------------------
<img src="http://example.com//img/img.jpg">
------------------------------------------------------------------------------------
I have problems with this example because I don't want to remove the //
that comes right after http:
I'd like to get: <img src="http://example.com/img/img.jpg">. I'm trying
with the following Java code:
------------------------------------------------------------------------------------
completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
------------------------------------------------------------------------------------
This is the result I get: <img src="http:/example.com/img/img.jpg">.
I don't want to lose a / in http://
Please help me! Thank you
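Why the lookahead version produces "http:/": a negative lookahead such as "(?!http://)" only inspects the text to the right of the current position. At the first slash after "http:", the text ahead is "//example.com...", which does not start with "http://", so the lookahead succeeds and "//*" (a slash followed by zero or more slashes, i.e. any run of slashes) consumes both slashes. The minimal sketch below reproduces the reported output; the class and method names are only illustrative.

```java
import java.util.regex.Pattern;

class LookaheadDemo {

    // The regex from the original post. The negative lookahead only looks
    // forward, so it never sees the "http:" that precedes the slashes.
    static final Pattern ORIGINAL = Pattern.compile("(?!http://)//*");

    static String collapse(String html) {
        // "//*" is "/" followed by zero or more "/", i.e. any run of slashes,
        // so every run (including the one in "http://") becomes a single "/".
        return ORIGINAL.matcher(html).replaceAll("/");
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapse(html));
        // prints <img src="http:/example.com/img/img.jpg">
    }
}
```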
lewmania942@yahoo.fr - 25 Feb 2006 19:49 GMT
> I would like to replace // with /, but only those which aren't
> next to http:
> <img src="http://example.com//img/img.jpg">
> completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
Nobody has proposed a "good" way to do it with a regexp (if such a
thing exists), so I propose a quick and dirty solution:
completeHtml = completeHtml.replaceAll("http://", "http:///").replaceAll("//", "/");
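Here is the two-step replacement above wrapped in a small runnable class (the names are illustrative). The first call turns "http://" into "http:///"; the second call replaces "//" pairs left to right without overlapping, so "http:///" shrinks back to "http://" while "com//img" becomes "com/img". One caveat worth knowing: only "http://" is protected this way, so "https://" would still be reduced to "https:/".

```java
class TwoStepReplace {

    // Quick-and-dirty approach from the reply above:
    // 1. add a third slash to every "http://" so it survives step 2,
    // 2. replace each "//" with "/" (matches are found left to right and
    //    never overlap, so "http:///" collapses back to "http://").
    static String collapse(String html) {
        return html.replaceAll("http://", "http:///").replaceAll("//", "/");
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapse(html));
        // prints <img src="http://example.com/img/img.jpg">

        // "https://" is not matched by step 1, so it loses a slash.
        System.out.println(collapse("https://a//b"));
        // prints https:/a/b
    }
}
```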
Note that I'm neither for nor against using such code; I'm just
proposing a solution.
Hope it helps :)
Alan Moore - 26 Feb 2006 01:12 GMT
>Hello people, I'm trying to do something very easy (I think so!). I'm
>using a Java regular expression to remove // in HTML code. In the
>[...]
>
>Please help me! Thank you
What you want is a lookbehind, not a lookahead:
completeHtml = completeHtml.replaceAll("(?<!http:)//","/");
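A negative lookbehind "(?<!http:)" checks the text to the left of the current position, which is exactly where "http:" sits. The sketch below runs Alan's regex on the example and also shows a variant that is not from the thread, "(?<!https?:)/{2,}", offered as an assumption-labelled option: it also protects "https://" and collapses a run of three or more slashes in one match (Java accepts the optional "s" inside a lookbehind because its maximum length is bounded). Names are illustrative.

```java
import java.util.regex.Pattern;

class LookbehindDemo {

    // Alan Moore's fix: a negative lookbehind checks the text to the LEFT of
    // the current position, so a "//" directly preceded by "http:" is skipped.
    static final Pattern HTTP_ONLY = Pattern.compile("(?<!http:)//");

    // Hypothetical variant (not from the thread): also skip "https:" and
    // collapse runs of two or more slashes in a single match.
    static final Pattern HTTP_OR_HTTPS = Pattern.compile("(?<!https?:)/{2,}");

    static String collapse(Pattern p, String html) {
        return p.matcher(html).replaceAll("/");
    }

    public static void main(String[] args) {
        String img = "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapse(HTTP_ONLY, img));
        // prints <img src="http://example.com/img/img.jpg">

        String link = "<a href=\"https://example.com///x\">";
        System.out.println(collapse(HTTP_ONLY, link));
        // prints <a href="https:/example.com//x">
        System.out.println(collapse(HTTP_OR_HTTPS, link));
        // prints <a href="https://example.com/x">
    }
}
```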
Rob Skedgell - 26 Feb 2006 14:58 GMT
> Hello people, I'm trying to do something very easy (I think so!). I'm
> using a Java regular expression to remove // in HTML code. In the
> example below, I would like to replace // with /, but only where it
> isn't next to http:
------------------------------------------------------------------------------------
> <img src="http://example.com//img/img.jpg">
------------------------------------------------------------------------------------
> I have problems with this example because I don't want to remove the //
> [...]
> I'd like to get: <img src="http://example.com/img/img.jpg">. I'm trying
> with the following Java code:
------------------------------------------------------------------------------------
> completeHtml = completeHtml.replaceAll("(?!http://)//*","/");
------------------------------------------------------------------------------------
> This is the result I get: <img src="http:/example.com/img/img.jpg">.
> I don't want to lose a / in http://
You should also note that the SGML DOCTYPE declaration may contain
double slashes that you want to preserve, for example:
<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/REC-html401-19991224/loose.dtd">
or
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
A quick and dirty way to avoid changing the //s there might be to skip
the first few lines (say 3-5), or everything between "<!DOCTYPE" and its
closing ">". Of course, if the HTML documents concerned don't have a
DOCTYPE declaration, there's no need to worry about this.
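One possible way to act on the second suggestion (leave everything between "<!DOCTYPE" and its closing ">" untouched) is to let a single regex match either a whole DOCTYPE declaration or a double slash, and rewrite only the latter. This is a sketch under stated assumptions: the declaration contains no ">" before its end (true for the two examples above), and it reuses the "(?<!http:)//" lookbehind from the earlier reply. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoctypeSafeCollapse {

    // Group 1 matches a complete DOCTYPE declaration; it may span several
    // lines because [^>] also matches newlines. The second alternative is
    // the lookbehind fix: a "//" not preceded by "http:".
    // CASE_INSENSITIVE covers "<!doctype" and also makes "HTTP:" count.
    static final Pattern DOCTYPE_OR_SLASHES = Pattern.compile(
            "(<!DOCTYPE[^>]*>)|(?<!http:)//", Pattern.CASE_INSENSITIVE);

    static String collapse(String html) {
        Matcher m = DOCTYPE_OR_SLASHES.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copy a DOCTYPE through unchanged; otherwise replace "//" with "/".
            String replacement = m.group(1) != null
                    ? Matcher.quoteReplacement(m.group(1))
                    : "/";
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html\n"
                + "PUBLIC \"-//W3C//DTD XHTML 1.1//EN\"\n"
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
                + "<img src=\"http://example.com//img/img.jpg\">";
        System.out.println(collapse(html));
    }
}
```

Matching the declaration as its own alternative means the scan jumps over it in one step, so the "-//W3C//DTD" slashes are never considered for replacement.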

Rob Skedgell <rob+news@nephelococcygia.demon.co.uk>
GnuPG/PGP: 7DA3 1579 C0DD 8748 C05A B984 E2A2 3234 D14B 6DD7