I am trying to extract all URLs from a particular page, but without
success.
java.util.regex.Pattern p = Pattern.compile("<a href=\"http://(.*)\">", Pattern.MULTILINE);
java.util.regex.Matcher m = p.matcher(strhtmpage);
while ( m.find() )
{
    System.out.println( "LINKS: " + m.group(1) );
}
lordy - 07 Aug 2006 02:19 GMT
> I am trying to extract all URLs from a particular page, but without
> success.
[quoted text clipped - 6 lines]
> System.out.println( "LINKS: " + m.group(1) );
> }
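Your ".*" is greedy by default, so it runs on to the last "\">" on the
line instead of stopping at the first one. You want a reluctant
quantifier (".*?"). Or use something like [^"]* instead (which will be
more efficient, since it needs less backtracking).
Read the java.util.regex.Pattern Javadoc or perlre to understand greedy
regexps and all will become clear.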
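To make the difference concrete, here is a minimal sketch comparing the three forms on one line of HTML. The class name LinkExtractor, the helper extractLinks, and the sample a.example / b.example URLs are made up for illustration; only the first pattern comes from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // The pattern from the question: ".*" grabs as much as it can.
    public static final String GREEDY = "<a href=\"http://(.*)\">";
    // Reluctant quantifier: ".*?" grabs as little as it can.
    public static final String RELUCTANT = "<a href=\"http://(.*?)\">";
    // Negated character class: cannot run past the closing quote at all.
    public static final String NEGATED = "<a href=\"http://([^\"]*)\">";

    // Hypothetical helper: collects group(1) of every match of regex in html.
    public static List<String> extractLinks(String regex, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Two links on one line, which is where the greedy form goes wrong.
        String html = "<a href=\"http://a.example/\">A</a> "
                    + "<a href=\"http://b.example/x\">B</a>";

        // One oversized match spanning both tags.
        System.out.println("GREEDY:    " + extractLinks(GREEDY, html));
        // Two matches: a.example/ and b.example/x
        System.out.println("RELUCTANT: " + extractLinks(RELUCTANT, html));
        System.out.println("NEGATED:   " + extractLinks(NEGATED, html));
    }
}
```

Pattern.MULTILINE is left out here because it only changes how ^ and $ behave, so it has no effect on this pattern. All three forms still expect the tag to be written exactly as <a href="http://..."> with nothing else inside it; for real-world pages an HTML parser is likely the safer choice.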
Lordy