Java Forum / General / November 2005
boolean endsWith(String s, Pattern pattern)
lepikhin@gmail.com - 07 Nov 2005 14:44 GMT Anybody hear about the
boolean endsWith(String s, Pattern pattern)
implementations?
lepikhin@gmail.com - 07 Nov 2005 14:46 GMT PS. I know about "something$"-like regexp
Roedy Green - 07 Nov 2005 15:43 GMT >boolean endsWith(String s, Pattern pattern) The source code for all the regex stuff is there for you to look at in src.zip.
IIRC it compiles Patterns to commands to a state machine interpreter, not byte code.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Rhino - 07 Nov 2005 18:53 GMT > Anybody hear about the > > boolean endsWith(String s, Pattern pattern) > > implementations? Since a boolean can only contain the boolean values 'true' and 'false', how could it possibly contain a String? Why would anyone write a method to search for Strings within a boolean?
Rhino
Roedy Green - 08 Nov 2005 04:14 GMT On Mon, 7 Nov 2005 13:53:09 -0500, "Rhino" <no.offline.contact.please@nospam.com> wrote, quoted or indirectly quoted someone who said :
>> boolean endsWith(String s, Pattern pattern) >> [quoted text clipped - 3 lines] >could it possibly contain a String? Why would anyone write a method to >search for Strings within a boolean? huh?
I think he is asking does string s end with given pattern yes or no.
he can't make it an instance method of String since String is final. He left out the word "static", though I suppose it could be implemented on some unrelated object.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
lepikhin@gmail.com - 08 Nov 2005 12:40 GMT > I think he is asking does string s end with given pattern yes or no. Yes. Sorry for ambiguous problem statement.
Probmlem with "something$"-like regexps is, that you need to look through the whole string. It's very slow.
Roedy Green - 08 Nov 2005 13:34 GMT >Probmlem with "something$"-like regexps is, that you need to look >through the whole string. It's very slow. it depends on how clever the Pattern compiler is. A smart one might scan the string backwards.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
lepikhin@gmail.com - 08 Nov 2005 14:36 GMT > it depends on how clever the Pattern compiler is. A smart one might > scan the string backwards. Unfortunately Java Pattern compiler (java.util.regex.Pattern) is stupid. I've checked it.
Roedy Green - 08 Nov 2005 14:48 GMT >> it depends on how clever the Pattern compiler is. A smart one might >> scan the string backwards. > >Unfortunately Java Pattern compiler (java.util.regex.Pattern) is >stupid. I've checked it. In that case, if this were a bottleneck, you could help the regex along by chopping off all but the last N characters.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Chris Uppal - 08 Nov 2005 16:01 GMT > > it depends on how clever the Pattern compiler is. A smart one might > > scan the string backwards. > > Unfortunately Java Pattern compiler (java.util.regex.Pattern) is > stupid. I've checked it. Construct your regexp backwards, reverse the string, and look for a match at the beginning of that...
-- chris
lepikhin@gmail.com - 09 Nov 2005 12:17 GMT > Construct your regexp backwards It's difficult, for some reasons.
Thanks anyway.
John C. Bollinger - 10 Nov 2005 12:42 GMT In an earlier post, lepikhin@gmail.com wrote:
>Unfortunately Java Pattern compiler (java.util.regex.Pattern) is >stupid. I've checked it. And in a different one:
>>Construct your regexp backwards > > It's difficult, for some reasons. My apologies, but does that mean you're stupid, too? I mean, you criticize the regex compiler as stupid for not being able to do a job equivalent to the one you say is too difficult for *you*. Regexes do not easily reverse; as far as I know, there is no general algorithm for "reversing" a regex, so it is unsurprising that Pattern doesn't know how to do it.
You started out asking how to determine whether a string ends with a match for a particular pattern. The usual approach is to prepend ".*" to the pattern and match the whole string with it.
 Signature John Bollinger jobollin@indiana.edu
Chris Uppal - 11 Nov 2005 17:29 GMT > Regexes do > not easily reverse; as far as I know, there is no general algorithm for > "reversing" a regex, so it is unsurprising that Pattern doesn't know how > to do it. Why do you say that ? Am I missing something, or are you talking about the difficulty of reversing (some of) the Perl-ish extensions to the classic regexp concept ?
It seems (to me) that reversing a classic regexp is straightforward using the following (rather redundant) transformations on the parsed AST representation of the regexp:
reverse( c ) -> c reverse( . ) -> . reverse( [a-b] ) -> [a-b] reverse( R1 | R2 ) -> reverse(R1) | reverse(R2) reverse( R1R2 ) -> reverse(R2) reverse(R1) reverse( ( R ) ) -> ( reverse(R) ) reverse( R * ) -> reverse(R) * reverse( R + ) -> reverse(R) + reverse( ^ R) -> reverse(R) $ reverse( R $ ) -> ^ reverse(R)
(where a, b, and c are characters and R, R1, and R1 are regexps). I have no idea what some of the Perl-ish extensions would even mean if reversed, but "classic" regexps are good enough for me, and perhaps for most (legitimate) applications.
-- chris
John C. Bollinger - 12 Nov 2005 02:54 GMT >>Regexes do >>not easily reverse; as far as I know, there is no general algorithm for [quoted text clipped - 4 lines] > difficulty of reversing (some of) the Perl-ish extensions to the classic regexp > concept ? I am talking about reversing regexes expressed in the language supported by java.util.regex.Pattern, a language even more feature-laden than Perl's regex language.
> It seems (to me) that reversing a classic regexp is straightforward using the > following (rather redundant) transformations on the parsed AST representation [quoted text clipped - 15 lines] > "classic" regexps are good enough for me, and perhaps for most (legitimate) > applications. Your argument for how that particular regex language can be reversed is convincing, but you don't have to add much to the language to make the task a lot harder. How about reluctant quantifiers, for instance? We had a different thread in the past week where the regex engine's leftmost matching behavior with respect to reluctant quantifiers produced a result that it would be tricky to reproduce working from the other end. (This is a regex feature also supported by Perl, and one with some good uses.)
It might be possible (though nontrivial) to solve the problem with reluctant quantifiers, but what about greedy ones? Those are too closely tied to details of the engine's matching behavior to admit reversal, I think.
 Signature John Bollinger jobollin@indiana.edu
Roedy Green - 12 Nov 2005 03:44 GMT On Fri, 11 Nov 2005 21:54:41 -0500, "John C. Bollinger" <jobollin@indiana.edu> wrote, quoted or indirectly quoted someone who said :
>I am talking about reversing regexes expressed in the language supported >by java.util.regex.Pattern, a language even more feature-laden than >Perl's regex language. A optimisation does not need to handle all cases. It is perfectly legit to cherry pick the easy ones to optimise.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Chris Uppal - 14 Nov 2005 10:37 GMT > > Why do you say that ? Am I missing something, or are you talking about > > the difficulty of reversing (some of) the Perl-ish extensions to the [quoted text clipped - 3 lines] > by java.util.regex.Pattern, a language even more feature-laden than > Perl's regex language. Right. I tend to ignore the non-classical extensions, myself.
> Your argument for how that particular regex language can be reversed is > convincing, but you don't have to add much to the language to make the > task a lot harder. How about reluctant quantifiers, for instance? It seems to me that the greedy/reluctant distinction is only about handling ambiguous "parses". That's to say that a switching between greedy and reluctant never changes the language that the regexp recognises, but only changes the way that sub-regexps are assigned to substrings in the input sequence. (I'm assuming that the regexp is being used to make a simple Boolean judgement "does this string match?", the streaming mode is obviously different). For instance, given the pattern: /a*a*/ and the input sequence: aaaa there are various possible parses: ()(aaaa) (a)(aaa) (aa)(aa) (aaa)(a) ()(aaaa) and the choice amongst them is determined by the greediness of the matches. I haven't been able to prove it (possibly because I lack a precise definition of "reluctant", possibly because I can't think straight), but I conjecture that inverting the greediness of each subexpression while reversing the pattern would produce the "same" parse of the reversed string.
I have no idea about the possessive qualifier. But then I don't care either -- it's an aesthetic disaster, a theoretic horror, and I hope (and expect) never to see any legitimate use for it.
Beyond those cases, the only non-classical features of the Java regexp package (I think) are: The wide palette of character classes, which can be seen as syntactic sugar for character ranges, or even for "raw" alternation, and which reverse trivially. The various end-of-word, end-of-line, etc, pseudo-characters. The pseudo-characters present a problem if the set is not closed under reversal, I don't know whether it is (CR-LF, anyone ?). And the backref feature, which obviously doesn't reverse.
-- chris
John C. Bollinger - 15 Nov 2005 03:17 GMT >>Your argument for how that particular regex language can be reversed is >>convincing, but you don't have to add much to the language to make the [quoted text clipped - 7 lines] > judgement "does this string match?", the streaming mode is obviously > different). You're right. I was thinking more about assigning parts of the matched string to groups the same way, and not as much about the overall "does this string match?", which is the most important aspect of the question the OP asked (somewhere back there in the distance...).
[...]
> I conjecture that > inverting the greediness of each subexpression while reversing the pattern > would produce the "same" parse of the reversed string. Now that would be a tidy result. A little prodding at it didn't make a counterexample fall out, but I'm too lazy to try to really prove it.
> I have no idea about the possessive qualifier. But then I don't care either -- > it's an aesthetic disaster, a theoretic horror, and I hope (and expect) never > to see any legitimate use for it. Do I sense a mild distaste for the feature? :^) In truth, I agree with you. I raised the issue only because I think they're a certain killer of general-purpose reversal of arbitrary Java regex patterns. Perhaps in itself that is reflective of the possessive quantifiers' perversion.
> Beyond those cases, the only non-classical features of the Java regexp package > (I think) are: The wide palette of character classes, which can be seen as [quoted text clipped - 3 lines] > closed under reversal, I don't know whether it is (CR-LF, anyone ?). And the > backref feature, which obviously doesn't reverse. As I consider it, I think perhaps even a backreference is reversible by exchanging the places of the last reference to each group and its referrent, and fixing up the group indices in the references. I don't see a pseudo-character for a CR/LF pair, but such a combination is recognized as a single line terminator when not in UNIX_LINES mode. I think that could be handled by use of the available individual metacharacters for CR and LF. If I'm right (now) then that leaves the possessive quantifiers as the only non-reversible feature. With that, you have established your point very well indeed.
 Signature John Bollinger jobollin@indiana.edu
Chris Uppal - 16 Nov 2005 10:31 GMT > > I have no idea about the possessive qualifier. But then I don't care > > either -- it's an aesthetic disaster, a theoretic horror, and I hope [quoted text clipped - 5 lines] > regex patterns. Perhaps in itself that is reflective of the possessive > quantifiers' perversion. That's a good way to think of it. Should I ever write a regexp reverser, I shall throw a FeatureInsupportableException in such cases ;-)
BTW, there's also the zero-width-look{ahead/behind} operator. That, if I've understood it correctly (Sun, understandably don't care to admit what it actually does) can be handled trivially in the case where the lookahead/behind is a fixed string, otherwise it's another job for FeatureInsupportableException I fear.
-- chris
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|