
Michael Powe michael@trollope.org Naugatuck CT USA
War is a sociological safety valve that cleverly diverts popular
hatred for the ruling classes into a happy occasion to mutilate or
kill foreign enemies. - Ernest Becker
> >>>>> "Jussi" == Jussi Piitulainen writes:
>
> [...]
> In my test, it happens everywhere -- the regexp fails when there's
> nothing there and when there's text there.
Right, except I would say _nowhere_ rather than everywhere. If (?!\S+)
matches, \] does not. If \] matches, (?!\S+) does not.
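Here is a small sketch of that "nowhere". It assumes \[(?!\S+)\] as a reduction of the bracket part of the expression under discussion (the full pattern is not shown here). The class and method names are made up for illustration. find() returns false for every input, including the empty pair, because the lookahead and \] make contradictory demands on the same character.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is an assumed reduction of the
// bracket part of the expression in the thread, not the original.
class NegativeLookaheadDemo {
    // "[", then "the next char must NOT begin a run of non-space", then "]".
    // But "]" is itself non-space, so the two conditions always conflict.
    static final Pattern NEG = Pattern.compile("\\[(?!\\S+)\\]");

    static boolean anywhere(String input) {
        return NEG.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "[]", "[x]", "[ ]", "a[]b" };
        for (String s : inputs) {
            // false for every input
            System.out.println("\"" + s + "\" -> " + anywhere(s));
        }
    }
}
```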
> Jussi> Positive lookahead:
>
> [...]
> The reason for my testing was because the regexp fails to match the
> case where there is nothing between the brackets. Note that the
I thought that was the case that succeeded. That pattern is just like
(.*)\[\](.*) with the extra condition that the part of the input
matched by \](.*) must also begin with a match of \S+, which it does,
since the ] is there.
Are you sure you understand that a lookahead always consumes an empty
string? So your whole pattern can only match an empty pair of
brackets, [], with the two groups on either side of it.
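A sketch of the positive lookahead case. It assumes the pattern has the shape (.*)\[(?=\S+)\](.*) described above, and PositiveLookaheadDemo and describe are hypothetical names. Only input containing an empty pair of brackets matches, and the two groups pick up whatever is on either side.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo; the pattern is the assumed shape discussed above.
class PositiveLookaheadDemo {
    // The lookahead checks the text after "[" without consuming any of it,
    // so "\]" must still match the very next character.
    static final Pattern POS = Pattern.compile("(.*)\\[(?=\\S+)\\](.*)");

    // Returns "group1|group2" when the whole input matches, else null.
    static String describe(String input) {
        Matcher m = POS.matcher(input);
        return m.matches() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(describe("foo[]bar"));  // lookahead sees "]bar"; matches
        System.out.println(describe("foo[x]bar")); // "\]" cannot match "x"; null
    }
}
```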
> In the real-world case that led me to examine the lookahead option,
> I had a regexp matching a long string (9 group captures) that failed
> when one of the expected groups, inside a bracket pair, was empty.
> \\S+ does not match inside [] and thus caused the whole regex to
> fail.
\S matches the right bracket, and eats it, too. (?=\S+) also matches
the right bracket but doesn't eat it.
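The difference between eating and not eating the ] shows up in Matcher.end(). Here is a sketch with hypothetical names, run on the input [] from the quoted case:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing where each match ends.
class ConsumeDemo {
    // End offset of the first match of regex in input, or -1 if none.
    static int endOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "[]";
        // \S matches the ']' and consumes it: the match ends at offset 2.
        System.out.println(endOfFirstMatch("\\[\\S", input));
        // (?=\S) checks the ']' without consuming it: the match ends at 1.
        System.out.println(endOfFirstMatch("\\[(?=\\S)", input));
        // \S+ eats the ']', leaving nothing for \] to match: no match, -1.
        System.out.println(endOfFirstMatch("\\[\\S+\\]", input));
    }
}
```

The last call is the failure from the quoted text. Once \S+ has taken the ], nothing is left for \].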
Nine groups sounds rather complicated. Do you need to do it all in one
expression?
> I would like to see a useful, nontrivial application of lookahead.
> It doesn't appear to me that there is one.
I think there is a candidate in the other post I made (this morning, I
believe), where someone wanted to split a certain file at each
<?xml...> thingamajig in it.
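For the splitting case, a lookahead lets String.split cut in front of each declaration without removing it. This is a minimal sketch, not the code from that other post. It assumes each declaration starts with the literal text <?xml, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for splitting at XML declarations.
class XmlSplitDemo {
    // The lookahead matches the empty string just before "<?xml", so the
    // declaration stays at the start of each piece instead of being cut out.
    static List<String> splitAtXmlDecl(String text) {
        List<String> pieces = new ArrayList<String>();
        for (String piece : text.split("(?=<\\?xml)")) {
            // Older JDKs can yield an empty leading piece when the text
            // starts with "<?xml"; drop empty pieces either way.
            if (!piece.isEmpty()) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        String text = "<?xml version=\"1.0\"?><a/><?xml version=\"1.0\"?><b/>";
        for (String piece : splitAtXmlDecl(text)) {
            System.out.println(piece);
        }
    }
}
```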
(Which reminds me: you might consider non-greedy patterns like .*?,
since each .* also tries to eat the bracket pairs, and that can lead to
results that feel unintuitive.)
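Here is a sketch of that difference on a made-up input with two bracket pairs. GreedyDemo and groups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo comparing greedy and reluctant groups.
class GreedyDemo {
    // Returns "g1|g2|g3" if the whole input matches, else null.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        String input = "a[1]b[2]c";
        // Greedy: the first .* runs to the end and backs off only as far as
        // the LAST '[', so it swallows the first bracket pair: a[1]b|2|c
        System.out.println(groups("(.*)\\[(.*)\\](.*)", input));
        // Reluctant: the first .*? stops at the FIRST '[': a|1|b[2]c
        System.out.println(groups("(.*?)\\[(.*?)\\](.*)", input));
    }
}
```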
> And the negative lookahead just appears broken.
Let me contrive an example of sorts: a maximal digit sequence that is
not immediately preceded or followed by a ., a -, or an e.
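import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonLook {
    public static void main(String[] args) {
        // Lookbehind: not preceded by '.', 'e', '-' or another digit.
        // \d++: possessive, so the digit run is never shortened.
        // Lookahead: not followed by '.', 'e' or '-'.
        Matcher m = Pattern
            .compile("(?<![.e\\-\\d])\\d++(?![.e\\-])")
            .matcher("pi 3.14 314e-2 1024 e 2.7 27e-1 31415926");
        while (m.find()) {
            System.out.println(m.group(0));
        }
    }
}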
OK, I had to throw in a lookbehind, a possessive quantifier in \d++,
and a \d inside the lookbehind. This does not eat the preceding or
following character, and it matches even where there is no following
character at all. It seems to work: on that input it prints 1024 and
31415926.
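To check that each ingredient is needed, here is a variant harness. It runs the pattern as posted next to two weakened versions on the same input (NonLookVariants and findAll are hypothetical names). With a plain \d+ the engine backs off to a shorter digit run that is followed by a digit. Without \d in the lookbehind, a match can start in the middle of a number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness comparing the posted pattern with weaker variants.
class NonLookVariants {
    static final String INPUT = "pi 3.14 314e-2 1024 e 2.7 27e-1 31415926";

    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // As posted: [1024, 31415926]
        System.out.println(findAll("(?<![.e\\-\\d])\\d++(?![.e\\-])", INPUT));
        // Plain \d+ gives back digits ("31" from 314e-2, "2" from 27e-1):
        // [31, 1024, 2, 31415926]
        System.out.println(findAll("(?<![.e\\-\\d])\\d+(?![.e\\-])", INPUT));
        // No \d in the lookbehind, so "4" in 3.14 matches: [4, 1024, 31415926]
        System.out.println(findAll("(?<![.e\\-])\\d++(?![.e\\-])", INPUT));
    }
}
```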
> Jussi> (Javadoc for 1.4.2 was not too helpful here, so I
> Jussi> experimented a bit, never having used these myself.)
>
> I actually have Habibi's book, _Java Regular Expressions_, but IMO
> it is not very useful if you already have good knowledge of regex.
Does it explain what (?>X) does? Sun's doc says it matches "X, as an
independent, non-capturing group". I have no idea what an independent
group is. (I know that I'm not looking at the latest documentation.)
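As far as I understand it, (?>X) is what other regex flavors call an atomic group. Once X has matched, the engine never backtracks into it to try a shorter or different match of X. The possessive X++ is shorthand for (?>X+), so the \d++ above behaves the same way. Here is a small sketch (the class name is hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo of ordinary vs atomic (independent) groups.
class AtomicGroupDemo {
    public static void main(String[] args) {
        String input = "aaab";
        // Ordinary group: a+ gives back one 'a' so the trailing "ab" matches.
        System.out.println(Pattern.matches("(?:a+)ab", input)); // true
        // Atomic group: a+ keeps all of "aaa", only "b" is left for "ab".
        System.out.println(Pattern.matches("(?>a+)ab", input)); // false
        // Possessive quantifier, same behavior as the atomic group.
        System.out.println(Pattern.matches("a++ab", input));    // false
    }
}
```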
...
> Ironically, Habibi criticizes perl's conditional construct in regex,
> and it is exactly that construct that I need in the case described
> here.
There are likely to be other ways.
If your problem is that a pair of brackets in your input may contain
an empty string that you need to match, then you need to match an
empty string there. There is no way around that.
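One way to do that is a negated character class with *, which is allowed to match the empty string. This sketch assumes the bracket contents never contain a ] themselves. EmptyBracketDemo and contents are hypothetical names, and this is not the original nine-group expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: capture bracket contents, including empty ones.
class EmptyBracketDemo {
    // [^\]]* : any run of non-']' characters, including the empty run.
    static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]*)\\]");

    // Contents of the first bracket pair, "" for [], or null if none.
    static String contents(String input) {
        Matcher m = BRACKETED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("\"" + contents("x[]y") + "\"");    // ""
        System.out.println("\"" + contents("x[abc]y") + "\""); // "abc"
        System.out.println(contents("xy"));                    // null
    }
}
```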