Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / February 2007

Tip: Looking for answers? Try searching our database.

Parens Matching

Thread view: 
Ken Kast - 16 Feb 2007 03:34 GMT
Does anyone have a regular expression pattern that would include a test for
balanced parens of arbitrary nestedness?
IchBin - 16 Feb 2007 04:01 GMT
> Does anyone have a regular expression pattern that would include a test for
> balanced parens of arbitrary nestedness?

Is arbitrary 'nestedness' some new type of epistemological terminology?

Sorry, just could not stop myself.

Signature

Thanks in Advance...           http://weconsultants.prophp.org
IchBin, Pocono Lake, Pa, USA   http://ichbinquotations.awardspace.com
______________________________________________________________________
'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)

Ken Kast - 16 Feb 2007 04:54 GMT
Nope. Something I made up in order to make the message not so dry.

>> Does anyone have a regular expression pattern that would include a test
>> for balanced parens of arbitrary nestedness?
>
> Is arbitrary 'nestedness' some new type of epistemological terminology?
>
> Sorry, just could not stop myself.
Michael - 16 Feb 2007 07:12 GMT
> Does anyone have a regular expression pattern that would include a test for
> balanced parens of arbitrary nestedness?

No.  No one does.

There's a well known proof in Computer Science theory that it is not
possible to create a regular expression for balanced parentheses.

Look up "the pumping lemma" for details.

Michael
Mark Jeffcoat - 16 Feb 2007 12:41 GMT
>> Does anyone have a regular expression pattern that would include a test for
>> balanced parens of arbitrary nestedness?
[quoted text clipped - 3 lines]
> There's a well known proof in Computer Science theory that it is not
> possible to create a regular expression for balanced parentheses.

Michael is quite right. Of course, that proof is silent
on the subject of creating a regular expression that accepts
balanced strings with at most Long.MAX_VALUE open parens.
(And really, most texts will have fewer than half that many.)

While you're waiting for the program you wrote to write that
regular expression to finish, you can pass the time with a
counter that increments on '(', and decrements on ')'.

Signature

Mark Jeffcoat
Austin, TX

Alex Hunsley - 17 Feb 2007 12:21 GMT
>>> Does anyone have a regular expression pattern that would include a test for
>>> balanced parens of arbitrary nestedness?
[quoted text clipped - 6 lines]
> on the subject of creating a regular expression that accepts
> balanced strings with at most Long.MAX_VALUE open parens.

Although in theory you could actually count this many with a single long:

  Long.MAX_VALUE - Long.MIN_VALUE + 1

I have a feeling this expression is equal to:

  - (2 * Long.MIN_VALUE)
Tim Slattery - 16 Feb 2007 14:15 GMT
>> Does anyone have a regular expression pattern that would include a test for
>> balanced parens of arbitrary nestedness?
[quoted text clipped - 3 lines]
>There's a well known proof in Computer Science theory that it is not
>possible to create a regular expression for balanced parentheses.

OTOH, it would be pretty simple to write a loop that cycles through a
string and keeps track of left and right parens. Add 1 to the counter
when you find a left paren, subtract one for a right paren. If the
counter ever goes negative, throw an error. If it's not zero at the
end of the string, throw an error.

--  
Tim Slattery
Slattery_T@bls.gov
http://members.cox.net/slatteryt
Michael - 16 Feb 2007 19:14 GMT
> >There's a well known proof in Computer Science theory that it is not
> >possible to create a regular expression for balanced parentheses.
[quoted text clipped - 4 lines]
> counter ever goes negative, throw an error. If it's not zero at the
> end of the string, throw an error.

I guess I should have spelled it out further - you can't match
balanced parentheses *with a regular expression.*  Of course, there
are other ways to do it.  Either write the kind or program above, or
write a recursive decent parser, (or use a regular expression that
only matches up to N balanced parentheses) etc.  Basically, balanced
parentheses aren't a regular grammar, they're a context free grammar,
so you need a more powerful tool.  In particular, they're a grammar
somthing like this:
   Expression -> nothing | ( Expression )

Sorry if I caused more confusion that I removed.  It was late.

Michael
Alex Hunsley - 17 Feb 2007 01:27 GMT
>>> Does anyone have a regular expression pattern that would include a test for
>>> balanced parens of arbitrary nestedness?
[quoted text clipped - 8 lines]
> counter ever goes negative, throw an error. If it's not zero at the
> end of the string, throw an error.

Although even this isn't correct, in theory... due to the finite memory
of the computer, it is possible to give it a sequence that should pass
(the brackets match), but won't pass, because the computer runs out of
memory. Ok, a little silly, but...

> --  
> Tim Slattery
> Slattery_T@bls.gov
> http://members.cox.net/slatteryt
Lew - 17 Feb 2007 04:36 GMT
> Although even this isn't correct, in theory... due to the finite memory
> of the computer, it is possible to give it a sequence that should pass
> (the brackets match), but won't pass, because the computer runs out of
> memory. Ok, a little silly, but...

Not so silly, really. It points to a valuable principle of workaday
programming, nay, engineering overall. When the theory says there is no
perfect algorithm, the practice suggests to use an imperfect one. You code to
handle the realistic cases and discontinuously reject inputs that exceed the
algorithm's ability, like the memory-hog example.

This way of thinking also leads one to jump out of "What regular expression?"
into "What algorithm?"

- Lew
Alex Hunsley - 17 Feb 2007 12:19 GMT
>> Although even this isn't correct, in theory... due to the finite memory
>> of the computer, it is possible to give it a sequence that should pass
[quoted text clipped - 9 lines]
> This way of thinking also leads one to jump out of "What regular
> expression?" into "What algorithm?"

Good points. I have also touched on this subject with a reply I just
wrote to a post from Daniel. In Comp Sci. proper, when you're talking
about bracket matching, you want a machine that solves it for *any*
input. So in that sense, it can't be done by a normal real world machine...

lex

> - Lew
Chris Uppal - 17 Feb 2007 17:15 GMT
> Although even this isn't correct, in theory... due to the finite memory
> of the computer, it is possible to give it a sequence that should pass
> (the brackets match), but won't pass, because the computer runs out of
> memory. Ok, a little silly, but...

You can do it in constant space for arbitrary length inputs /if/ you are
allowed to use the input as working space too -- just replace '()' by '' until
there's nothing left.

A more elegant method would be to keep the unclosed-bracket count in unbounded
integer representation (BigInteger-style), stored in (overwriting) the portion
of the input sequence which has already been scanned.

   -- chris
Alex Hunsley - 18 Feb 2007 20:08 GMT
>> Although even this isn't correct, in theory... due to the finite memory
>> of the computer, it is possible to give it a sequence that should pass
[quoted text clipped - 4 lines]
> allowed to use the input as working space too -- just replace '()' by '' until
> there's nothing left.

Yay! That's the Turing Machine way. Overwrite the tape...
There was something perversely satisfying about 'writing' Turing
machines at uni to solve problems like multiplication of two numbers (in
binary). The first and original "virtual machine"!

I love the idea that it wouldn't be hard to build an actual (finite!)
Turing machine out of bits of something or other, and have toyed with
that idea at various times.

> A more elegant method would be to keep the unclosed-bracket count in unbounded
> integer representation (BigInteger-style), stored in (overwriting) the portion
> of the input sequence which has already been scanned.
Alex Hunsley - 17 Feb 2007 01:26 GMT
>> Does anyone have a regular expression pattern that would include a test for
>> balanced parens of arbitrary nestedness?
[quoted text clipped - 5 lines]
>
> Look up "the pumping lemma" for details.

That thang always made me think of either pumping lemons or pumping
lemmings.

The matched parenthesis problem can be done with a non-deterministic
state machine, but not with a deterministic one.... hence no regular
expression can do it.
Daniel Pitts - 17 Feb 2007 01:44 GMT
> >> Does anyone have a regular expression pattern that would include a test for
> >> balanced parens of arbitrary nestedness?
[quoted text clipped - 12 lines]
> state machine, but not with a deterministic one.... hence no regular
> expression can do it.

Hmm, I don't think your definition is correct.
Regex isn't always implementedwith a DSM, it might use NDA, which
still doesn't solve it.

In either case, the following is a deterministic state machine solves
the problem.

private static boolean isBalancedCount(String string) {
       int parens = 0;
       for (char c : string.toCharArray()) {
           if (c == '(') {
               ++parens;
           }
           if (c == ')' && --parens < 0) {
               return false;
           }
       }
       return parens == 0;
   }
Alex Hunsley - 17 Feb 2007 12:16 GMT
>>>> Does anyone have a regular expression pattern that would include a test for
>>>> balanced parens of arbitrary nestedness?
[quoted text clipped - 12 lines]
> Regex isn't always implementedwith a DSM, it might use NDA, which
> still doesn't solve it.

What do you mean by NDA? Non-deterministic automaton?

> In either case, the following is a deterministic state machine solves
> the problem.
[quoted text clipped - 11 lines]
>         return parens == 0;
>     }

Nope, it doesn't solve the problem. The int type in Java has finite
capacity - see Integer.MAX_VALUE - so more accurately this a finite
deterministic state machine. I'm actually trying to remember if
'deterministic state machine' usually implies the 'finite' part in
common usage. A look at wikipedia seems to suggest that, but I wasn't
looking at it for long...

So, anyway, I can think of an input that causes the above code to fail -
and hence problem not solved.

To give a simplified example of this, suppose we have a Java where
Integer.MAX_VALUE = 1, and .MIN_VALUE = -2. So an int can have 4
distinct values for int. Assuming this, would would your code save about
the following inputs?

(((( - this would be a false positive
(((((((( - another false positive
((())) - this would be a false negative

Similarly, your code above in the world of real Java would give a false
positive for the input:

   '(' * [(Integer.MAX_VALUE - Integer.MIN_VALUE + 1) * n]

where n is any positive integer.

Basically, true parens matching is scuppered, because of the need for
infinite memory!

lex
Daniel Pitts - 18 Feb 2007 18:58 GMT
> >>>> Does anyone have a regular expression pattern that would include a test for
> >>>> balanced parens of arbitrary nestedness?
[quoted text clipped - 61 lines]
>
> lex

(((( and (((((((( both give the correct output.
((())) also results in the correct output. Hmm, are you sure you
analyzed my algorithm correctly?

As for the case of:
  '(' * [(Integer.MAX_VALUE - Integer.MIN_VALUE + 1) * n]

You're talking about at least a 4 gigabyte string.  As a mater of
practicality, I find that type of input to be unreasonable, but it
would be simple enough to change the code to hand off to a different
algorithm:

private static boolean isBalancedCount(String string) {
   int parens = 0;
   for (char c : string.toCharArray()) {
       if (c == '(' && parens++ == Integer.MAX_INT) {
           // We can't handle the trueth, but isBalancedRe can!
           return isBalancedRe(string);
       }
       if (c == ')' && --parens < 0) {
           return false;
       }
   }
   return parens == 0;
}
Alex Hunsley - 18 Feb 2007 19:52 GMT
>> To give a simplified example of this, suppose we have a Java where
>> Integer.MAX_VALUE = 1, and .MIN_VALUE = -2. So an int can have 4
[quoted text clipped - 20 lines]
> ((())) also results in the correct output. Hmm, are you sure you
> analyzed my algorithm correctly?

I wasn't talking about your code running under normal Java, I was
talking about a pretend one with very small Integer.MAX_VALUE etc. To
quote myself: "To give a simplified example of this, suppose we have a
Java where ... [stuff] ... assuming this, what would your code [say]
about the following inputs?" I was making a simplification to make a
more accessible toy example where it was easy to play with the cases
where it would produce the wrong answers for 'extreme' cases.

> As for the case of:
>    '(' * [(Integer.MAX_VALUE - Integer.MIN_VALUE + 1) * n]
[quoted text clipped - 3 lines]
> would be simple enough to change the code to hand off to a different
> algorithm:

I think we're coming back to the issue mentioned before about theory
versus practicality: I do know that your code works for all practical
purposes - it's correct for all practical purposes - as opposed to the
computer science theoretic sense in which bracket matching can't be done
(by a regular expression or by a FSA or by a finite computer).

I'm just being too theoretic for my own good, is all....

lex
Oliver Wong - 21 Feb 2007 17:21 GMT
>>> The matched parenthesis problem can be done with a non-deterministic
>>> state machine, but not with a deterministic one.... hence no regular
>>> expression can do it.

   I disagree, because a proof (by construction) exists which shows that
any non-deterministic state machine can be converted to a determinisitc
state machine. If the NDSM has K states, then DSM will have 2^K states. If
you allow for infinite NDSM, it only seems fair that you also allow for
infinite DSMs.

>> Hmm, I don't think your definition is correct.
>> Regex isn't always implementedwith a DSM, it might use NDA, which
>> still doesn't solve it.
>
> What do you mean by NDA? Non-deterministic automaton?

Probably. I'll come back to this in a moment later in this post.

>> In either case, the following is a deterministic state machine solves
>> the problem.
[quoted text clipped - 18 lines]
> usage. A look at wikipedia seems to suggest that, but I wasn't looking at
> it for long...

   The usually, when someone talks about deterministic automatons and
non-deterministic automatons, they mean finite ones. So the usual
abbreviations are DFA and NDFA. We can, of course, talk about infinite
automatons, but there seems to be less literature on that topic (perhaps
because it's equivalent to, but more complex than, some other machine, e.g.
a push down stack machine or a Turing Machine or something, I don't know).

   So if you're talking about finite DAs and Nodes, then they cannot solve
the "balance parentheses" problem. If you're talking about infinite ones,
then if a NDA can solve it, so can a DA. And if a NDA cannot solve it,
neither can a DA. The two are equivalent in power.

   The interests in NDAs is that they take up less "code" to represent
(recall that a K state NDA translates into a 2^K state DA), *and* (and
here's the part that makes them interesting to study nowadays) a quantum
computer can run NDAs "natively", whereas a classical computer would have to
emulate the NDA by converting it (either implicitly or explicitly) into a
DA, and gain that 2^K performance penalty factor.

   - Oliver
Daniel Pitts - 21 Feb 2007 18:03 GMT
> >>> The matched parenthesis problem can be done with a non-deterministic
> >>> state machine, but not with a deterministic one.... hence no regular
[quoted text clipped - 57 lines]
>
>     - Oliver

Its important to say that an NDA with K states has an equivalent DA
with at most 2^K states. The set of all NDA contains the set of all
Deterministic Automota. Determinism is a property of the stateful
automata.

For example, if you have the simple regex "abcd", the NDA and DA
produce the same graph, since it is only possible to be in one state
at a time. (s)->a->b->c->d->(e)

Whereas "a[bc]d" produces the NDA
 (s)->a
     / \
    b   c
     \ /
      d->(e)
and the DA
(s)->a->b and c->d->(e)

The DA actually has fewer "states" than the NDA inthis case (as b and
c are combined)
Christian - 21 Feb 2007 19:03 GMT
Daniel Pitts schrieb:

>>>>> The matched parenthesis problem can be done with a non-deterministic
>>>>> state machine, but not with a deterministic one.... hence no regular
[quoted text clipped - 72 lines]
> The DA actually has fewer "states" than the NDA inthis case (as b and
> c are combined)

for me important would be to say that java's regexp seem not to be
equivalent to regexp in science (which have the same power as DFA or NFA
or epsilon-FA)

javas regexp allow you to backreference something you mached earlier
which might make them as powerful as a ContexFree-grammar/PDA (may be I
don't know, just a guess)

so it seems for me currently not proven that you can't do this parens
matching...
but this seems to be rather scientific intrest...there are better ways
to do it...
Oliver Wong - 21 Feb 2007 19:09 GMT
[long discussion snipped]

> Daniel Pitts schrieb:
>>
>> Its important to say that an NDA with K states has an equivalent DA
>> with at most 2^K states.

Right.

>> The set of all NDA contains the set of all
>> Deterministic Automota. Determinism is a property of the stateful
>> automata.

Right, that's what I intended to convey.

[...]

> for me important would be to say that java's regexp seem not to be
> equivalent to regexp in science (which have the same power as DFA or NFA
> or epsilon-FA)

   Most modern languages's regexp implementation seem to be derived from
Perl's regexp implementation, which was already more powerful than
"classical" regular expressions, DFAs and NFAs.

   - Oliver
Gordon Beaton - 22 Feb 2007 09:16 GMT
> Most modern languages's regexp implementation seem to be derived
> from Perl's regexp implementation, which was already more powerful
> than "classical" regular expressions, DFAs and NFAs.

Most Perl regular expressions can be rewritten using "classical"
regexps, IIUC it's only backreferences that can't.

On the other hand, the Perl (Java, Python, Ruby, etc) regexp
implementation is extremely *slow*:

 http://swtch.com/~rsc/regexp/regexp1.html

/gordon

Signature

[ don't email me support questions or followups ]
g o r d o n  +  n e w s  @  b a l d e r 1 3 . s e

Chris Uppal - 22 Feb 2007 13:22 GMT
> On the other hand, the Perl (Java, Python, Ruby, etc) regexp
> implementation is extremely *slow*:
>
>   http://swtch.com/~rsc/regexp/regexp1.html

Thank you!   I had come across that page recently, and was just wondering how I
was going to find it again to mention here...

(One caveat: he says that the NFA-based approaches are "little-known" -- that
seems something of an overstatement to me.  I doubt if there are many people
with an interest in the /implementation/ of regexps who don't know about
NFA/DFA construction, at least in general terms.  If all he means is that most
/users/ of Perl-style regexps don't realise that there's a performance price
for the complicated hacks, then I have no disagreement.)

   -- chris
Gordon Beaton - 22 Feb 2007 13:34 GMT
> Thank you! I had come across that page recently, and was just
> wondering how I was going to find it again to mention here...

For me it was exactly the opposite - I saw it on the erlang-questions
list just yesterday and had to figure out where I had seen *this*
thread in order to post it...

/gordon

Signature

[ don't email me support questions or followups ]
g o r d o n  +  n e w s  @  b a l d e r 1 3 . s e

Oliver Wong - 22 Feb 2007 15:09 GMT
> For example, if you have the simple regex "abcd", the NDA and DA
> produce the same graph, since it is only possible to be in one state
[quoted text clipped - 11 lines]
> The DA actually has fewer "states" than the NDA inthis case (as b and
> c are combined)

   I was thinking about these comments a bit more, and I realized I don't
really understand your notation. In particular, I have no idea how to read
your DA. What you call an NDA looks like a DA to me. That is,

(s)->a
   / \
  b   c
   \ /
    d->(e)

looks like a *deterministic* finite automaton, in that there's only ever at
most 1 possible transition you can take given the next input token and the
current state. To make it an NDA, you'd probably need something like:

(s)->a
   /|\
  / | \
 b  b  c
 |  |  |
 h  i  j
  \ | /
   \|/
    d->(e)

(which implements the regexp: "a(bh|bi|cj)d")

where if you're at the top state, and you see a b, you don't know whether to
take the left one or the middle one.

   But even then, I think (but cannot recall precisely) that the way the
terms NFA and DFA were defined to me, every DFA is considered to also be an
NFA (and so perhaps the N should stand for "not necessarily deterministic"
rather than "non-determistic"), but not every NFA is also a DFA.

   - Oliver
Chris Uppal - 22 Feb 2007 16:18 GMT
>     I was thinking about these comments a bit more, and I realized I don't
> really understand your notation. In particular, I have no idea how to read
> your DA.

I think Daniel's notation is non-standard, but it does make seem to make sense.
What he's doing is removing the states themselves from the picture, leaving
only the trails of input words that have been recognised.

So in:

> Whereas "a[bc]d" produces the NDA
>  (s)->a
>      / \
>     b   c
>      \ /
>       d->(e)

the first 'a' is not a state label, but marks the fact that we have accepted an
'a'.  The actual state we are in has no name (in this notation) but is sort of
hidden "just after" the label 'a'.  From that state we can transition by
accepting 'b' to an anonymous state hidden behind the label 'b', or can
transition by accepting 'c' to the anonymous state hidden behind the 'c' label.
And from either of those states we can accept 'd' to go to the corresponding
state, and since that is a final state, we have matched the input.  And the NFA
does indeed recognise a(b|c)d.

I don't think the notation covers all possible cases, for all it is nice and
clear for the cases it /does/ cover.  The problems comes when one state has two
or more incoming edges with different labels (including epsilon transitions):

(S0)---x---> (S1)
|            |
a            y
|            |
v            |
(S2)---b--->[S3]

(I hope my ASCII art is legible).  S0 is the start state, from which we can
transition via 'x' to S1, or via 'a' to S2.  From S1 and S2 we can transition
via 'b' or 'y' (respectively) to S3, the accepting state.   Thus it recognises
(ab|xy).  I don't think that can be represented in Daniel's notation without
changing the definition of the NFA itself.

I don't know whether every regexp can be represented as /some/ NFA in Daniel's
notation.  I haven't been able to think of a counter example yet, although the
NFA is sometimes more awkward than one which can be written in the classic
diagram format.

>     But even then, I think (but cannot recall precisely) that the way the
> terms NFA and DFA were defined to me, every DFA is considered to also be
> an NFA (and so perhaps the N should stand for "not necessarily
> deterministic" rather than "non-determistic"), but not every NFA is also
> a DFA.

It depends on how you see it.  Every DFA corresponds trivially to an NFA with
the same structure, so in that sense the DFAs are a subset of the NFAs.  But
the formal definition of a DFA is different from that of a NFA, since it has a
/single/' state at any one time rather than a set of states.  And of course
type Set<State> is not assignable to, or from, type State.

   -- chris
Daniel Pitts - 22 Feb 2007 18:49 GMT
On Feb 22, 8:18 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
THIS.org> wrote:
> >     I was thinking about these comments a bit more, and I realized I don't
> > really understand your notation. In particular, I have no idea how to read
[quoted text clipped - 3 lines]
> What he's doing is removing the states themselves from the picture, leaving
> only the trails of input words that have been recognised.

It is indeed non-standard.  I was trying to represent transitions more
than states, and was trying to do so concisely.

> So in:
>
[quoted text clipped - 35 lines]
> NFA is sometimes more awkward than one which can be written in the classic
> diagram format.
I wasn't trying to create an all-purpose notation, I doubt it works
for all regexp, but it might.

> >     But even then, I think (but cannot recall precisely) that the way the
> > terms NFA and DFA were defined to me, every DFA is considered to also be
[quoted text clipped - 7 lines]
> /single/' state at any one time rather than a set of states.  And of course
> type Set<State> is not assignable to, or from, type State.
In Java, that may be true, but in implementing a NFA-to-DFA converter,
you may choose to create a class StateSet implements State, since
every State in the DFA results from a set of states in the NDA.

Daniel.
Chris Uppal - 23 Feb 2007 17:28 GMT
>     The interests in NDAs is that they take up less "code" to represent
> (recall that a K state NDA translates into a 2^K state DA), *and* (and
> here's the part that makes them interesting to study nowadays) a quantum
> computer can run NDAs "natively", whereas a classical computer would have
> to emulate the NDA by converting it (either implicitly or explicitly)
> into a DA, and gain that 2^K performance penalty factor.

BTW: As far as I can tell, the quantum theorists are claiming that a quantum
FSA can recognise more than just the regular languages.

(Personally, I think that if that claim is true, and if it has physical
meaning, then all it means is that the things claimed to be quantum analogues
of FSAs are not actually analogous.)

   -- chris
Oliver Wong - 23 Feb 2007 19:57 GMT
>>     The interests in NDAs is that they take up less "code" to represent
>> (recall that a K state NDA translates into a 2^K state DA), *and* (and
[quoted text clipped - 13 lines]
> analogues
> of FSAs are not actually analogous.)

   I've never heard of these quantum FSAs, but they're probably not what
I was referring to in my quantum-computer-natively-running-NDA statement.

   Usually, when a classical computer tries to run an NDA, it either
converts it to an equivalent DFA (and thus risks getting an explosion of
states), or it keeps a set of all possible states the NDA could be in. For
example, if the classical computer thinks that the current possible set of
states that the NDA could be in is {s6}, and the NDA graph looks something
like...

           ,--a--->(s7)-->...
          /
...--->(s6)----a--->(s8)-->...
          \
           `--b--->(s9)-->...

   ... then upon encountering the token "a", the classical
single-processor computer will consume that token, and then update the set
of possible states to be {s7, s8}. Then it needs to look at the next
token, and compare that to the successors of s7, and iteratively, look at
the successors of s8. And you need to do this for each token in the input
string you're trying to accept. So the overall running time is something
like O(n*l), where n is the number of states, and l is the length of the
input. With multi-processors, you can start to do these computations in
parallel, but you're limited by the number of processors: a P-processor
classical computer could only analyze P states at a time, and in the worst
case NDA, you might not have any idea what state you're in at all, and so
you'd need as many processors as there are nodes the graph. P is constant
with respect to the problem size, and so the running time is still
O(n*l/P) == O(n*l).

   Quantum physics is nowhere near my expertise, but my understanding is
that with a quantum computer, there's some sort of natural fit between
"having a set of possible states the NDA could be in" and "have a set of
possible states a qubit could be in". You would just go through each token
of the input, transition by transition, until you reach the end, and then
you "observe" the qubit, revealing finally what state you ended up in (and
whether or not it is an accepting state). The running time becomes O(l),
directly proportional to the size of the input string, and unaffected by
the size of the graph.

   - Oliver
Chris Uppal - 17 Feb 2007 17:19 GMT
> The matched parenthesis problem can be done with a non-deterministic
> state machine, but not with a deterministic one.... hence no regular
> expression can do it.

I would understand "non-deterministic state machine" to mean the
non-deterministic variant of finite state automaton).  And the deterministic
and non-deterministic versions of FSAs are equivalent in power.  Do you use the
term to mean something other form of automaton ?

   -- chris
Alex Hunsley - 18 Feb 2007 19:59 GMT
>> The matched parenthesis problem can be done with a non-deterministic
>> state machine, but not with a deterministic one.... hence no regular
[quoted text clipped - 3 lines]
> non-deterministic variant of finite state automaton).  And the deterministic
> and non-deterministic versions of FSAs are equivalent in power.

Are you sure about that? As far as I can recall, a non-deterministic FSA
(as that is what I am meaning) can solve the bracket matching problem,
but a deterministic one can't. Put it this way - how would you design a
deterministic FSA that did bracket matching on arbitrary length input?
You can't, if it is going to remain finite... And it's easy enough to do
with a non-deterministic FSA.

Is it that *some* non-deterministic FSA can be 'unfolded' into
deterministic ones, but not all, perhaps?

lex

> Do you use the
> term to mean something other form of automaton ?

No, FSA is right, that's basically what I'm thinking of...
Patricia Shanahan - 18 Feb 2007 21:34 GMT
>>> The matched parenthesis problem can be done with a non-deterministic
>>> state machine, but not with a deterministic one.... hence no regular
[quoted text clipped - 9 lines]
> You can't, if it is going to remain finite... And it's easy enough to do
> with a non-deterministic FSA.

How do you do it with a non-deterministic FSA?

> Is it that *some* non-deterministic FSA can be 'unfolded' into
> deterministic ones, but not all, perhaps?

No, ANY non-deterministic FSA can be unfolded into a deterministic one.

The simple, brute force, approach is to create a DFSA whose states are
the sets of states of the NDFSA. It has a transition from x to y on
input symbol s if, and only if, y is the set of possible states of the
NDFSA after input s when in some state in x.

The DFSA's start state is the set of start states of the NDFSA. Its
accept states are those states that contain at least one of the NDFSA's
accept states.

The DFSA tracks the set of states the NDFSA could reach, from any start
state, given the input so far.

Patricia
Alex Hunsley - 19 Feb 2007 10:19 GMT
>>>> The matched parenthesis problem can be done with a non-deterministic
>>>> state machine, but not with a deterministic one.... hence no regular
[quoted text clipped - 12 lines]
>
> How do you do it with a non-deterministic FSA?

Here's one I just sketched. Does it look right?

http://farm1.static.flickr.com/136/395180058_ea4d66f451.jpg?v=0

But I can't see how this can be made into a DFSA because from what I
sketched for that you end up with a big 'ladder' of states that
stretches off to infinity (hence non finite).

>> Is it that *some* non-deterministic FSA can be 'unfolded' into
>> deterministic ones, but not all, perhaps?
[quoted text clipped - 12 lines]
> The DFSA tracks the set of states the NDFSA could reach, from any start
> state, given the input so far.

Thanks for the reminder of all that!
lex
Chris Uppal - 19 Feb 2007 13:26 GMT
> Here's one I just sketched. Does it look right?
>
> http://farm1.static.flickr.com/136/395180058_ea4d66f451.jpg?v=0

I don't think that does what you intend.  For instance, if it sees input string
")" then it'll end up in states {A,B}.  The ')' will take it from the start
state, A, to state A, and also (non-deterministically) to state B.  Since A is
accepting that means it will accept the input ")" -- which is rather noticeably
not a string of balanced, properly nested, brackets ;-)

> But I can't see how this can be made into a DFSA because from what I
> sketched for that you end up with a big 'ladder' of states that
> stretches off to infinity (hence non finite).

It is equivalent to a DFA with two states, I'll call 'em S0 and S1.
S0 is the start state and is accepting.
There are four transitions as follows (I hope the notation is self-evident):
   S0 -- ) --> S0
   S0 -- ( --> S1
   S1 -- ( --> S1
   S1 -- ) --> S0

That DFA was generated and minimised mechanically, btw.  It is always possible
to transform an NFA into an equivalent DFA, and what's more it can be done by a
fairly simple algorithm -- no human insight required ;-)   The key point is
that, since the  NFA has -- by definition -- a finite set of states, the set of
/sets/ of states is also finite.  And by constructing a DFA with a state
corresponding to each (non-empty) set of NFA states, you can translate the NFA
into (typically rather bigger) DFA.  If you'd like more details then you could
start with the Wikipeadia article:
   http://en.wikipedia.org/wiki/Powerset_construction
(I haven't read it, so I don't know how clear, or even correct, it is, but it
does have convincing-looking pictures ;-)   Or try Google:
   nfa dfa construction

The algorithm yields a 3-state FSA with 6 transitions, and then minimising that
produces the above.

   -- chris
Patricia Shanahan - 19 Feb 2007 14:32 GMT
>>>>> The matched parenthesis problem can be done with a non-deterministic
>>>>> state machine, but not with a deterministic one.... hence no regular
[quoted text clipped - 16 lines]
>
> http://farm1.static.flickr.com/136/395180058_ea4d66f451.jpg?v=0

It accepts any string whose last character is ")", regardless of whether
the parentheses are balanced or not. There is an edge for ")" leading to
the accepting state A from both itself and B.

> But I can't see how this can be made into a DFSA because from what I
> sketched for that you end up with a big 'ladder' of states that
> stretches off to infinity (hence non finite).

Each rung on the ladder can be represented by a set of states of the
NDFSA, and the next set of states depends only on the current set of
states and the input.

Patricia
Christian - 19 Feb 2007 15:32 GMT
Patricia Shanahan schrieb:

>>>>>> The matched parenthesis problem can be done with a non-deterministic
>>>>>> state machine, but not with a deterministic one.... hence no regular
[quoted text clipped - 26 lines]
>> sketched for that you end up with a big 'ladder' of states that
>> stretches off to infinity (hence non finite).

the power set of any finite set is finite!
http://en.wikipedia.org/wiki/Powerset_construction
basically the same for any automat or machine ... non deterministic and
deterministic allways have the same power ..
even on a touring machine ...  non-deterministic may just be faster...

> Each rung on the ladder can be represented by a set of states of the
> NDFSA, and the next set of states depends only on the current set of
> states and the input.
>
> Patricia
Daniel Pitts - 17 Feb 2007 01:14 GMT
> Does anyone have a regular expression pattern that would include a test for
> balanced parens of arbitrary nestedness?

Two ways:

public class NestedParens {
   public static void main(String[] args) {
       String[] strings = {
               "Test case 1",
               "(Test case 2)",
               "((Test) case ) 3",
               ")test case 4 (",
       };
       for (String string: strings) {
           System.out.println(string +
                   " is " + (isBalancedRe(string)
                           ?
                           "Balanced" : "Mis-matched"));
       }

       for (String string: strings) {
           System.out.println(string +
                   " is " + (isBalancedCount(string)
                           ?
                           "Balanced" : "Mis-matched"));
       }
   }

   public static boolean isBalancedRe(String string) {
       String last;
       do {
           last = string;
           string = string.replaceAll("\\([^()]*\\)", "");
       } while (!last.equals(string));
       return !string.matches(".*[()].*");
   }

   private static boolean isBalancedCount(String string) {
       int parens = 0;
       for (char c : string.toCharArray()) {
           switch (c) {
               case ')':
                   if (--parens < 0) {
                       return false;
                   }
                   break;
               case '(':
                   ++parens;
           }
       }
       return parens == 0;
   }
}
Daniel Pitts - 17 Feb 2007 01:28 GMT
On Feb 16, 5:14 pm, "Daniel Pitts" <googlegrou...@coloraura.com>
wrote:

> > Does anyone have a regular expression pattern that would include a test for
> > balanced parens of arbitrary nestedness?
[quoted text clipped - 50 lines]
>
> }

Or, a little more concisely:
   public static boolean isBalancedRe(String string) {
       while (string.matches(".*\\([^()]*\\).*")) {
           string = string.replaceAll("\\([^()]*\\)", "");
       }
       return !string.matches(".*[()].*");
   }

   private static boolean isBalancedCount(String string) {
       int parens = 0;
       for (char c : string.toCharArray()) {
           if (c == '(') {
               ++parens;
           }
           if (c == ')' && --parens < 0) {
               return false;
           }
       }
       return parens == 0;
   }
...

Or, to get down right esoteric:
   private static boolean isBalancedCount(String string) {
       int parens = 0;
       for (char c : string.toCharArray()) {
           if ((c == '(' && ++ parens < 0) || c==')' && --parens < 0)
{
               return false;
           }
       }
       return parens == 0;
   }

CafeBabe points to those who know why that works the way it does.
Chris Uppal - 17 Feb 2007 17:21 GMT
>     private static boolean isBalancedCount(String string) {
>         int parens = 0;
[quoted text clipped - 6 lines]
>         return parens == 0;
>     }

I have rarely seen code I liked less.  Well done ;-)

   -- chris
Alex Hunsley - 19 Feb 2007 10:41 GMT
> Or, to get down right esoteric:
>     private static boolean isBalancedCount(String string) {
[quoted text clipped - 9 lines]
>
> CafeBabe points to those who know why that works the way it does.

I saw CafeBabe scavenging some fields for DeadBeef last night! Poor babe....
lex
Ken Kast - 17 Feb 2007 03:48 GMT
Absolutely never thought I would generate so much conversation. I need to
sit down with a glass of wine and digest this.

I probably should have been more transparent in my original post. I'm not
trying to test a string for balanced parens. I need it to be a match
termination condition.  In other words, I have a fairly complex pattern. So
it tells when a match is done. I need to say the match is done if the
character pattern is played out or the match will play out before I balance
parens.

I first wrote this in C#. .Net has extended RegExp that lets you swap named
groups on condition. This effectively lets you model a stack. So you push
left parens on the stack and pop them with right ones. Now I want to
implement the same app in Java. I can certainly assume a low level nesting,
on the order of 3 or 4, so running out of memory or running out of time
aren't concerns. I haven't thought it through, but I could probably do it
brute force by iterating thru a set of patterns. That would solve 6 sigma's
worth of real world situations. I was hoping for something more elegant.

Thanks to everyone for your thoughts.  I certainly learned how lemmings have
influenced computer science.

Ken

> Does anyone have a regular expression pattern that would include a test
> for balanced parens of arbitrary nestedness?
Chris Uppal - 17 Feb 2007 17:27 GMT
> I probably should have been more transparent in my original post. I'm not
> trying to test a string for balanced parens. I need it to be a match
> termination condition.  In other words, I have a fairly complex pattern.
> So it tells when a match is done. I need to say the match is done if the
> character pattern is played out or the match will play out before I
> balance parens.

It sounds to me as if this is a mis-application of regular expressions (IMO
nearly all applications of regular expressions in "real" programming languages
/are/ mis-applications).

Would it be easier to use bracket matching to delimit the sub-sequence you want
with real code and then (if regexes still seem like a good idea) use regexes to
pick it apart ?

Alternatively, use regexes to "tokenise" the input (identifying (, ), and the
other stuff you are interested in) and then use real code to traverse the
tokens (words) using bracket-counting to keep track of where you are in the
input stream.

> I first wrote this in C#. .Net has extended RegExp that lets you swap
> named groups on condition.

<shudder/>

   -- chris


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.