I was writing a quick-and-dirty regex to search HTML text and pull out the
source URL from IMG tags. I first tried:
Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"([^\"]*)\"");
(I know that this pattern makes all kinds of unwarranted assumptions about
the HTML, but that's another topic.) The problem I was having was that
although this pattern matches, it only results in one capture group: group
0. I was expecting the parens after src= to give me the URL in capture group
1, but no such luck. It's only when I double the parens:
Pattern p = Pattern.compile("<img (?:[^>]* )?src=\"(([^\"]*))\"");
that the src value is captured.
So my question is: why do I need to double the parens?
Thanks,
Ted Hopp
Jussi Piitulainen - 13 Nov 2006 11:10 GMT
> I was writing a quick-and-dirty regex to search html text and
> pull out the source url from IMG tags. I first tried:
>
> So my question is: why do I need to double the parens?
You don't need to double the parens. You need to provide a
short program that demonstrates the problem. The following is
longer than needed, but it fails to fail in the way that you
describe: it has single parens in the pattern, accesses group
1, and prints here.be.it/1 and here.be.it/2 as expected:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Roska {
    public static void main(String[] args) {
        // Two sample fragments: one with extra attributes before src, one without.
        String t1 = "left <img stuff src=\"here.be.it/1\" etc.>";
        String t2 = " then left <img src=\"here.be.it/2\" etc.>";
        // Single parens around the src value: this is capture group 1.
        Pattern p = Pattern
            .compile("<img (?:[^>]* )?src=\"([^\"]*)\"");
        Matcher m = p.matcher(t1 + t2);
        // find() must succeed before group(1) can be read.
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
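
One more way to check the group numbering is to ask the matcher how many capturing groups each pattern has. The following is only a sketch: the class name GroupCountSketch and the helper firstSrc are made up for illustration and are not from the original post. It relies on Matcher.groupCount(), which counts capturing groups only (the (?:...) group is not counted and group 0 is not included).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountSketch {
    // Hypothetical helper: returns the first src value found, or null if none.
    static String firstSrc(Pattern p, String html) {
        Matcher m = p.matcher(html);
        // group(1) is only valid after a successful find().
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "left <img stuff src=\"here.be.it/1\" etc.>";
        // The single-paren pattern from the question.
        Pattern single = Pattern.compile("<img (?:[^>]* )?src=\"([^\"]*)\"");
        // The double-paren variant from the question.
        Pattern doubled = Pattern.compile("<img (?:[^>]* )?src=\"(([^\"]*))\"");

        // Number of capturing groups in each pattern.
        System.out.println(single.matcher(html).groupCount());
        System.out.println(doubled.matcher(html).groupCount());

        // Group 1 of each pattern on the same input.
        System.out.println(firstSrc(single, html));
        System.out.println(firstSrc(doubled, html));
    }
}
```

If the single-paren pattern reports one capturing group and group 1 holds the src value, the pattern itself is not the problem, and the cause is more likely in how the matcher is being used (for example, reading group(1) without a successful find() first).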