Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / November 2005

Tip: Looking for answers? Try searching our database.

boolean endsWith(String s, Pattern pattern)

Thread view: 
lepikhin@gmail.com - 07 Nov 2005 14:44 GMT
Anybody hear about the

boolean endsWith(String s, Pattern pattern)

implementations?
lepikhin@gmail.com - 07 Nov 2005 14:46 GMT
PS. I know about "something$"-like regexp
Roedy Green - 07 Nov 2005 15:43 GMT
>boolean endsWith(String s, Pattern pattern)
The source code for all the regex stuff is there for you to look at in
src.zip.

IIRC it compiles Patterns to commands to a state machine interpreter,
not byte code.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Rhino - 07 Nov 2005 18:53 GMT
> Anybody hear about the
>
> boolean endsWith(String s, Pattern pattern)
>
> implementations?

Since a boolean can only contain the boolean values 'true' and 'false', how
could it possibly contain a String? Why would anyone write a method to
search for Strings within a boolean?

Rhino
Roedy Green - 08 Nov 2005 04:14 GMT
On Mon, 7 Nov 2005 13:53:09 -0500, "Rhino"
<no.offline.contact.please@nospam.com> wrote, quoted or indirectly
quoted someone who said :

>> boolean endsWith(String s, Pattern pattern)
>>
[quoted text clipped - 3 lines]
>could it possibly contain a String? Why would anyone write a method to
>search for Strings within a boolean?
huh?

I think he is asking does string s end with given pattern yes or no.

he can't make it an instance method of String since String is final.
He left out the word "static", though I suppose it could be
implemented on some unrelated object.

Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

lepikhin@gmail.com - 08 Nov 2005 12:40 GMT
> I think he is asking does string s end with given pattern yes or no.
Yes. Sorry for ambiguous problem statement.

Probmlem with "something$"-like regexps is, that you need to look
through the whole string. It's very slow.
Roedy Green - 08 Nov 2005 13:34 GMT
>Probmlem with "something$"-like regexps is, that you need to look
>through the whole string. It's very slow.

it depends on how clever the Pattern compiler is.  A smart one might
scan the string backwards.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

lepikhin@gmail.com - 08 Nov 2005 14:36 GMT
> it depends on how clever the Pattern compiler is.  A smart one might
> scan the string backwards.

Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
stupid. I've checked it.
Roedy Green - 08 Nov 2005 14:48 GMT
>> it depends on how clever the Pattern compiler is.  A smart one might
>> scan the string backwards.
>
>Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
>stupid. I've checked it.

In that case, if this were a bottleneck, you could help the regex
along by chopping off all but the last N characters.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Chris Uppal - 08 Nov 2005 16:01 GMT
> > it depends on how clever the Pattern compiler is.  A smart one might
> > scan the string backwards.
>
> Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
> stupid. I've checked it.

Construct your regexp backwards, reverse the string, and look for a match at
the beginning of that...

   -- chris
lepikhin@gmail.com - 09 Nov 2005 12:17 GMT
> Construct your regexp backwards
It's difficult, for some reasons.

Thanks anyway.
John C. Bollinger - 10 Nov 2005 12:42 GMT
In an earlier post, lepikhin@gmail.com wrote:
>Unfortunately Java Pattern compiler (java.util.regex.Pattern) is
>stupid. I've checked it.

And in a different one:
>>Construct your regexp backwards
>
> It's difficult, for some reasons.

My apologies, but does that mean you're stupid, too?  I mean, you
criticize the regex compiler as stupid for not being able to do a job
equivalent to the one you say is too difficult for *you*.  Regexes do
not easily reverse; as far as I know, there is no general algorithm for
"reversing" a regex, so it is unsurprising that Pattern doesn't know how
to do it.

You started out asking how to determine whether a string ends with a
match for a particular pattern.  The usual approach is to prepend ".*"
to the pattern and match the whole string with it.

Signature

John Bollinger
jobollin@indiana.edu

Chris Uppal - 11 Nov 2005 17:29 GMT
> Regexes do
> not easily reverse; as far as I know, there is no general algorithm for
> "reversing" a regex, so it is unsurprising that Pattern doesn't know how
> to do it.

Why do you say that ?  Am I missing something, or are you talking about the
difficulty of reversing (some of) the Perl-ish extensions to the classic regexp
concept ?

It seems (to me) that reversing a classic regexp is straightforward using the
following (rather redundant) transformations on the parsed AST representation
of the regexp:

   reverse( c ) -> c
   reverse( . ) -> .
   reverse( [a-b] ) -> [a-b]
   reverse( R1 | R2 ) -> reverse(R1) | reverse(R2)
   reverse( R1R2 ) -> reverse(R2) reverse(R1)
   reverse( ( R ) ) -> ( reverse(R) )
   reverse( R * ) -> reverse(R) *
   reverse( R + ) -> reverse(R) +
   reverse( ^ R) -> reverse(R) $
   reverse( R $ ) -> ^ reverse(R)

(where a, b,  and c are characters and R, R1, and R1 are regexps).  I have no
idea what some of the Perl-ish extensions would even mean if reversed, but
"classic" regexps are good enough for me, and perhaps for most (legitimate)
applications.

   -- chris
John C. Bollinger - 12 Nov 2005 02:54 GMT
>>Regexes do
>>not easily reverse; as far as I know, there is no general algorithm for
[quoted text clipped - 4 lines]
> difficulty of reversing (some of) the Perl-ish extensions to the classic regexp
> concept ?

I am talking about reversing regexes expressed in the language supported
by java.util.regex.Pattern, a language even more feature-laden than
Perl's regex language.

> It seems (to me) that reversing a classic regexp is straightforward using the
> following (rather redundant) transformations on the parsed AST representation
[quoted text clipped - 15 lines]
> "classic" regexps are good enough for me, and perhaps for most (legitimate)
> applications.

Your argument for how that particular regex language can be reversed is
convincing, but you don't have to add much to the language to make the
task a lot harder.  How about reluctant quantifiers, for instance?  We
had a different thread in the past week where the regex engine's
leftmost matching behavior with respect to reluctant quantifiers
produced a result that it would be tricky to reproduce working from the
other end.  (This is a regex feature also supported by Perl, and one
with some good uses.)

It might be possible (though nontrivial) to solve the problem with
reluctant quantifiers, but what about greedy ones?  Those are too
closely tied to details of the engine's matching behavior to admit
reversal, I think.

Signature

John Bollinger
jobollin@indiana.edu

Roedy Green - 12 Nov 2005 03:44 GMT
On Fri, 11 Nov 2005 21:54:41 -0500, "John C. Bollinger"
<jobollin@indiana.edu> wrote, quoted or indirectly quoted someone who
said :

>I am talking about reversing regexes expressed in the language supported
>by java.util.regex.Pattern, a language even more feature-laden than
>Perl's regex language.

A optimisation does not need to handle all cases. It is perfectly
legit to cherry pick the easy ones to optimise.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Chris Uppal - 14 Nov 2005 10:37 GMT
> > Why do you say that ?  Am I missing something, or are you talking about
> > the difficulty of reversing (some of) the Perl-ish extensions to the
[quoted text clipped - 3 lines]
> by java.util.regex.Pattern, a language even more feature-laden than
> Perl's regex language.

Right.  I tend to ignore the non-classical extensions, myself.

> Your argument for how that particular regex language can be reversed is
> convincing, but you don't have to add much to the language to make the
> task a lot harder.  How about reluctant quantifiers, for instance?

It seems to me that the greedy/reluctant distinction is only about handling
ambiguous "parses".  That's to say that a switching between greedy and
reluctant never changes the language that the regexp recognises, but only
changes the way that sub-regexps are assigned to substrings in the input
sequence.  (I'm assuming that the regexp is being used to make a simple Boolean
judgement "does this string match?", the streaming mode is obviously
different).  For instance, given the pattern:
   /a*a*/
and the input sequence:
   aaaa
there are various possible parses:
   ()(aaaa)
   (a)(aaa)
   (aa)(aa)
   (aaa)(a)
   ()(aaaa)
and the choice amongst them is determined by the greediness of the matches.  I
haven't been able to prove it (possibly because I lack a precise definition of
"reluctant", possibly because I can't think straight), but I conjecture that
inverting the greediness of each subexpression while reversing the pattern
would produce the "same" parse of the reversed string.

I have no idea about the possessive qualifier.  But then I don't care either --
it's an aesthetic disaster, a theoretic horror, and I hope (and expect) never
to see any legitimate use for it.

Beyond those cases, the only non-classical features of the Java regexp package
(I think) are: The wide palette of character classes, which can be seen as
syntactic sugar for character ranges, or even for "raw" alternation, and which
reverse trivially.  The various end-of-word, end-of-line, etc,
pseudo-characters.  The pseudo-characters present a problem if the set is not
closed under reversal, I don't know whether it is (CR-LF, anyone ?). And the
backref feature, which obviously doesn't reverse.

   -- chris
John C. Bollinger - 15 Nov 2005 03:17 GMT
>>Your argument for how that particular regex language can be reversed is
>>convincing, but you don't have to add much to the language to make the
[quoted text clipped - 7 lines]
> judgement "does this string match?", the streaming mode is obviously
> different).

You're right.  I was thinking more about assigning parts of the matched
string to groups the same way, and not as much about the overall "does
this string match?", which is the most important aspect of the question
the OP asked (somewhere back there in the distance...).

[...]

>                                                   I conjecture that
> inverting the greediness of each subexpression while reversing the pattern
> would produce the "same" parse of the reversed string.

Now that would be a tidy result.  A little prodding at it didn't make a
counterexample fall out, but I'm too lazy to try to really prove it.

> I have no idea about the possessive qualifier.  But then I don't care either --
> it's an aesthetic disaster, a theoretic horror, and I hope (and expect) never
> to see any legitimate use for it.

Do I sense a mild distaste for the feature? :^)
In truth, I agree with you.  I raised the issue only because I think
they're a certain killer of general-purpose reversal of arbitrary Java
regex patterns.  Perhaps in itself that is reflective of the possessive
quantifiers' perversion.

> Beyond those cases, the only non-classical features of the Java regexp package
> (I think) are: The wide palette of character classes, which can be seen as
[quoted text clipped - 3 lines]
> closed under reversal, I don't know whether it is (CR-LF, anyone ?). And the
> backref feature, which obviously doesn't reverse.

As I consider it, I think perhaps even a backreference is reversible by
exchanging the places of the last reference to each group and its
referrent, and fixing up the group indices in the references.  I don't
see a pseudo-character for a CR/LF pair, but such a combination is
recognized as a single line terminator when not in UNIX_LINES mode.  I
think that could be handled by use of the available individual
metacharacters for CR and LF.  If I'm right (now) then that leaves the
possessive quantifiers as the only non-reversible feature.  With that,
you have established your point very well indeed.

Signature

John Bollinger
jobollin@indiana.edu

Chris Uppal - 16 Nov 2005 10:31 GMT
> > I have no idea about the possessive qualifier.  But then I don't care
> > either -- it's an aesthetic disaster, a theoretic horror, and I hope
[quoted text clipped - 5 lines]
> regex patterns.  Perhaps in itself that is reflective of the possessive
> quantifiers' perversion.

That's a good way to think of it.  Should I ever write a regexp reverser, I
shall throw a FeatureInsupportableException in such cases ;-)

BTW, there's also the zero-width-look{ahead/behind} operator.  That, if I've
understood it correctly (Sun, understandably don't care to admit what it
actually does)  can be handled trivially in the case where the lookahead/behind
is a fixed string, otherwise it's another job for FeatureInsupportableException
I fear.

   -- chris


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.