> Create a file and add the following lines:-
>
[quoted text clipped - 6 lines]
>
> The expected result is "pattern" is only found in line 1.
This doesn't work:
(?<!%.{0,100})pattern
But this does for the example
(?<!%\s{0,100})pattern
Note that
(?<!%.*)pattern
leads to an error message saying that the look-behind pattern does not have an
obvious maximum length.
This could mean
1) You found a bug in Eclipse
2) You found a bug in Java std lib
3) Our understanding of the RE engine's lookbehind mechanism is incomplete
Kind regards
robert
Alan Moore - 05 Jun 2005 21:45 GMT
>This doesn't work:
>
[quoted text clipped - 18 lines]
>
>3) Our understanding of the RE engine's lookbehind mechanism is incomplete
It's #2. Lookbehind means that the enclosed subexpression matches
starting at some position before the current match position and ending
AT the current match position. The way it's implemented, the
subexpression is allowed to match BEYOND the current match position.
That's what's happening with the first regex above: the ".{0,100}"
matches all the way to the end of the line, then that position is
compared to the current match position. They don't line up, so the
subexpression fails and the negative lookbehind incorrectly succeeds.
In fact, quantifiers don't really work at all. I thought the regex
above would work if a reluctant quantifier were used:
(?<!%.{0,100}?)pattern
...but it doesn't. It just tries to match the dot zero times, fails
and gives up. Whether the subexpression matches too much or too
little, it never goes back to try matching a different amount.
I'll go ahead and file the bug report if nobody has any objection.
Chris - 06 Jun 2005 03:04 GMT
I don't object as long as you point out it's the negative lookbehind that
isn't working. The original regex is "(?<!%).*string" which is to say
find all lines that contain "string" that doesn't have a "%" anywhere
in the line before the "string".
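For what it's worth, here is a small Java sketch of why "(?<!%).*string" cannot express that intent. The lookbehind only inspects the single character immediately before the position where the match starts, and the engine is free to start the match at the beginning of the line, where nothing precedes it. The class and method names are made up for illustration and are not from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexDemo {
    // The regex quoted above, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("(?<!%).*string");

    // Returns the text of the first match in the line, or null if there is none.
    static String firstMatch(String line) {
        Matcher m = ORIGINAL.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // At offset 0 there is no preceding character, so (?<!%) succeeds
        // and ".*" is then free to consume the "%" itself.
        System.out.println(firstMatch("% string"));
        System.out.println(firstMatch("a string"));
    }
}
```

So the "%" ends up inside the match rather than being rejected by the lookbehind, which is why the commented-out line is still reported.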
Alan Moore - 06 Jun 2005 06:55 GMT
>I don't object as long as you point out it's the negative lookbehind that
>isn't working. The original regex is "(?<!%).*string" which is to say
>find all lines that contain "string" that doesn't have a "%" anywhere
>in the line before the "string".
The only difference between positive and negative lookbehind is
whether a match is treated as success or failure. They're both
supposed to exhaust all possibilities trying to find the match but, as
you've discovered, they don't. Below is a test case I wrote up. All
four regexes should match "foo1", "foo2" and "foo3", but only the
fourth one does, and it's a hack.
import java.util.regex.*;

public class Test
{
  public static void main(String[] args)
  {
    String str =
      "%foo1\n%bar foo2\n%bar foo3\n%blahblah foo4\nfoo5";
    String[] rgxs = { "(?<=%.{0,5})foo\\d",
                      "(?<=%.{0,5}?)foo\\d",
                      "(?<=%.{0,5}\\b)foo\\d",
                      "foo\\d(?<=%.{0,5}foo\\d)" };
    for (int i=0; i<rgxs.length; i++)
    {
      Pattern p = Pattern.compile(rgxs[i]);
      Matcher m = p.matcher(str);
      System.out.println();
      System.out.println(p.pattern());
      while (m.find())
      {
        System.out.println(m.group());
      }
    }
  }
}
Robert Klemme - 06 Jun 2005 10:09 GMT
> I don't object as long as you point it's the negative lookbehind that
> isn't working. The original regex is "(?<!%).*string" which is to say
> find all lines that contain "string" that doesn't have a "%" anywhere
> in the line before the "string".
Btw, this one should also do the job if you just want to omit a single
char:
^[^%]*pattern
Kind regards
robert
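A minimal sketch of how the "^[^%]*pattern" suggestion could be applied line by line in Java. It avoids lookbehind entirely, so it does not depend on how the engine handles quantifiers inside a lookbehind. The class name, method name and sample text are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UncommentedMatch {
    // From the start of the line, allow only non-'%' characters before "pattern".
    static final Pattern NO_PERCENT_BEFORE = Pattern.compile("^[^%]*pattern");

    // Returns the lines in which "pattern" occurs with no '%' before it.
    static List<String> linesWithUncommentedPattern(String text) {
        List<String> hits = new ArrayList<>();
        // Each line is tested on its own because [^%] would also match a newline.
        for (String line : text.split("\n")) {
            if (NO_PERCENT_BEFORE.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "some pattern here\n"
                    + "% pattern commented out\n"
                    + "code % pattern in trailing comment";
        for (String line : linesWithUncommentedPattern(text)) {
            System.out.println(line);
        }
    }
}
```

With this sample text only the first line should be reported, which is the result the original question asked for.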
Chris - 08 Jun 2005 05:17 GMT
Robert,
Thanks, that worked.
Chris - 06 Jun 2005 03:06 GMT
A more generic case would be to find some artifact that's *not* commented
out!
Alan Moore - 06 Jun 2005 01:01 GMT
>> Create a file and add the following lines:-
>>
[quoted text clipped - 21 lines]
>Leads to the error message, that the look behind pattern does not have an
>obvious max length.
Try this: pattern(?<=%.{0,100}pattern)
Note that "pattern" has to have an obvious maximum length.
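A rough Java sketch of that idea: match "pattern" first, then let the lookbehind reach back across it, so the subexpression ends with a literal that pins it to the current position. The negated variant with (?<!...) is an assumption added here to cover the original "not after %" question and is not something posted in the thread. The class and method names are likewise invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchThenLookBack {
    // The workaround from the post: "pattern" preceded by '%' on the same line.
    static final Pattern AFTER_PERCENT =
        Pattern.compile("pattern(?<=%.{0,100}pattern)");

    // Assumed negated variant: "pattern" NOT preceded by '%' on the same line.
    static final Pattern NOT_AFTER_PERCENT =
        Pattern.compile("pattern(?<!%.{0,100}pattern)");

    // Collects the start offset of every match of p in text.
    static List<Integer> matchStarts(Pattern p, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "pattern\n% pattern\nfoo % bar pattern";
        System.out.println(matchStarts(AFTER_PERCENT, text));
        System.out.println(matchStarts(NOT_AFTER_PERCENT, text));
    }
}
```

Because "." does not match a line terminator by default, the lookbehind cannot reach a "%" on an earlier line. The {0,100} bound still limits how far back on the same line a "%" will be noticed.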