Hi All,
I have a question regarding Regexp. The string that I need to change
is:
href="http://www.mysite.com/test1.html" ... href="/test2.html" ...
and this is what I would like to get after the replaceAll:
href="http://www.mysite.com/test1.html" ...
href="http://www.mysite.com/test2.html" ...
In other words, match all occurrences of href=" that are not followed by
the http:// sequence.
I did look in the docs but could not figure out how to exclude a
string. Any ideas?
Regards,
Sebastian
Oliver Wong - 18 Jul 2006 20:01 GMT
> Hi All,
>
[quoted text clipped - 13 lines]
> I did look in the docs but could not figure out how to exclude a
> string. Any ideas?
Your example doesn't match your specification. If you were to match all
occurrences of href=" that are not followed by the http:// sequence, with the
input:
<input>
href="http://www.mysite.com/test1.html" ... href="/test2.html" ...
</input>
you'd get one match:
<output>
<match>href="</match>
</output>
You also mention a "replaceAll" but you don't say what you're replacing, and
with what.
Perhaps it'd help if you specified the goal, and not the method.
Are you trying to change all relative URLs in an HTML document to absolute
URLs?
- Oliver
Ben - 21 Jul 2006 21:01 GMT
>> Hi All,
>>
[quoted text clipped - 37 lines]
>
> - Oliver
If you're trying to replace relative URLs with absolute ones, look at the
java.net.URL class; one of its constructors does just that. Something like:
URL absolute = new URL(referenceURL, relative);
where referenceURL is a URL and relative is a String.
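A minimal sketch of that constructor in use. The page address
"http://www.mysite.com/index.html" and the helper name toAbsolute are
assumptions made up for illustration; only the hrefs come from the
original question.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeToAbsolute {

    // Hypothetical helper: resolves href against the URL of the page it
    // appeared on, using the URL(URL context, String spec) constructor.
    static String toAbsolute(String pageUrl, String href) {
        try {
            URL context = new URL(pageUrl);
            return new URL(context, href).toString();
        } catch (MalformedURLException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Not a valid URL: " + href, e);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.mysite.com/index.html"; // assumed page URL
        System.out.println(toAbsolute(page, "/test2.html"));
        System.out.println(toAbsolute(page, "http://www.mysite.com/test1.html"));
    }
}
```

An href that is already absolute comes back unchanged, so no regex is
needed to tell the two cases apart. On recent JDKs the URL constructors
are deprecated, and URI.resolve is the usual alternative there.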
Ben
John Maline - 18 Jul 2006 20:09 GMT
> In other words, match all occurrences of href=" that are not followed by
> the http:// sequence.
A pattern like "href=\"(?!http://).*" matches href=" only where it is
not immediately followed by "http://". Depending on how you apply the
pattern (matches() needs the whole input to match; find() and
replaceAll() do not), you've got to be sure to actually match the stuff
the lookahead only inspected (as I do with the ".*").
The java.util.regex.Pattern doc on writing a pattern can be tough to
read. Maybe that's unavoidable; regular expressions can be tough. The (?!X)
construct is mentioned as a "zero-width negative lookahead" under
Special constructs. By zero-width, they mean it doesn't actually
consume any characters. It just asserts that at the current point in
the match, we must not be looking at X.
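Applied to the replaceAll from the original question, this might look
like the sketch below. The class and method names are made up for
illustration, and the trailing ".*" is left off on purpose: it is greedy
and would swallow the rest of the line, including any later href.

```java
import java.util.regex.Matcher;

class HrefFix {

    // Hypothetical helper: inserts base after every href=" that is not
    // already followed by http://. The lookahead consumes nothing, so
    // only the href=" prefix itself is replaced.
    static String absolutize(String html, String base) {
        String replacement = "href=\"" + Matcher.quoteReplacement(base);
        return html.replaceAll("href=\"(?!http://)", replacement);
    }

    public static void main(String[] args) {
        String in = "href=\"http://www.mysite.com/test1.html\" ... href=\"/test2.html\" ...";
        System.out.println(absolutize(in, "http://www.mysite.com"));
    }
}
```

This assumes every relative href starts with "/" and that absolute ones
use http:// only; an https:// link or a path like "test2.html" would
need a wider lookahead or a different approach.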
Cheers!
John