Java Forum / General / January 2006

finite state automaton

Roedy Green - 23 Dec 2005 05:37 GMT
Consider a simple finite state automaton to parse property files.

They look like this:
# a comment
keyword=value

I want to categorise each fragment of text as either comment, keyword
or value.  Now throw in a complication. Inside any of those three
things might be literals of the form \uffff

I find myself creating all kinds of rinky dink mechanisms to handle
the literals.  I wondered if there is a clean way to do it.

There are two problems.

1) It is clumsy to invent three literal states, one for in-comment, one
for in-keyword and one for in-value, just so the automaton can remember
what it was doing.  Yet the whole idea of a finite state automaton is
that the memory of the system is supposed to be encapsulated in the state.

2) You leave the literal state based on a count, not the presence of
some delimiter.  I could create 5 states to mark progress down the
literal, but this seems a bit nuts.

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Raymond DeCampo - 31 Dec 2005 20:46 GMT
> Consider a simple finite state automaton to parse property files.
>
[quoted text clipped - 19 lines]
> some delimiter.  I could create 5 states to mark progress down the
> literal, but this seems a bit nuts.

Roedy,

Why not run the property file through a pre-processor to handle escape
sequences, similar to what javac does?  After all, the standard property
file format supports \\ and \ followed by a line break for line
continuation and who knows what else....

HTH,
Ray


XML is the programmer's duct tape.

Stefan Ram - 31 Dec 2005 21:11 GMT
>Why not run the property file through a pre-processor to handle
>escape sequences, similar to what javac does?

 You mean a preprocessor like

native2ascii -reverse

 See

http://download.java.net/jdk6/docs/tooldocs/windows/native2ascii.html
Stefan Ram - 31 Dec 2005 21:22 GMT
Raymond DeCampo <nospam@twcny.rr.com> was quoting:
>>I want to categorise each fragment of text as either comment, keyword
>>or value.  Now throw in a complication. Inside any of those three
>>things might be literals of the form \uffff
>>I find myself creating all kinds of rinky dink mechanisms to handle
>>the literals.  I wondered if there is a clean way to do it.

 The clean way is a scanner with two layers:

 The first layer converts each \u-Sequence to a code point.

 The second layer then reads code points supplied by the first
 layer and does not have to care about the \u-sequences
 anymore.
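
 [Editor's sketch] The two-layer idea might look roughly like the Java
 below. It is only an illustration under simplifying assumptions, not
 anyone's actual code from this thread: the class and method names
 (TwoLayerScanner, decodeUnicodeEscapes, classify) are made up, only
 '#' comments and '=' separators are handled, and line continuations and
 doubled backslashes are ignored. Note that decoding first means an
 escaped '=' would be treated as a separator, which a real properties
 loader would not do.

```java
import java.util.ArrayList;
import java.util.List;

class TwoLayerScanner {
    enum Kind { COMMENT, KEY, VALUE }

    // True if a backslash, the letter u and four hex digits start at index i.
    static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Layer 1: replace each six-character escape with the char it denotes.
    static String decodeUnicodeEscapes(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            if (isEscapeAt(in, i)) {
                out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(in.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Layer 2: a tiny state machine that never sees an escape sequence.
    static List<String> classify(String line) {
        String text = decodeUnicodeEscapes(line);
        List<String> out = new ArrayList<>();
        if (text.startsWith("#")) {
            out.add(Kind.COMMENT + ":" + text);
            return out;
        }
        Kind state = Kind.KEY;
        StringBuilder cur = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (state == Kind.KEY && c == '=') {
                out.add(state + ":" + cur);   // key is complete
                cur.setLength(0);
                state = Kind.VALUE;
            } else {
                cur.append(c);
            }
        }
        out.add(state + ":" + cur);
        return out;
    }

    public static void main(String[] args) {
        // prints [KEY:key, VALUE:vAl]
        System.out.println(classify("k\\u0065y=v\\u0041l"));
    }
}
```

 The second layer needs only the states for comment, key and value; the
 counting of hex digits lives entirely in the first layer.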
Raymond DeCampo - 01 Jan 2006 01:21 GMT
> Raymond DeCampo <nospam@twcny.rr.com> was quoting:
>
[quoted text clipped - 11 lines]
>   layer and does not have to care about the \u-sequences
>   anymore.

Gee, thanks for replying to my post, removing my contribution, removing
the OP's name making it seem as if I wrote what the OP did to the casual
observer, and then re-stating my idea.  That was really helpful.

Ray


XML is the programmer's duct tape.

Roedy Green - 02 Jan 2006 16:26 GMT
On Sat, 31 Dec 2005 20:46:42 GMT, Raymond DeCampo
<nospam@twcny.rr.com> wrote, quoted or indirectly quoted someone who
said :

>Why not run the property file through a pre-processor to handle escape
>sequences, similar to what javac does?  After all, the standard property
>file format supports \\ and \ followed by a line break for line
>continuation and who knows what else....

I considered that, but I wanted to display the file literally.  If the
file contained embedded \uxxxx characters in binary, I wanted to
display them differently from ones properly encoded with \uxxxx.

I have since solved the problem with a kludge: a lookahead that handles
the entire sequence as if it were a single char from the overall state
machine's point of view.
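
[Editor's sketch] A lookahead of this kind could be sketched in Java as
follows. This is a guess at the general shape, not the code behind the
page linked below: the names (LookaheadTokenizer, tokenize, flush) are
hypothetical, and only '#' comments and '=' separators are handled. The
escape is kept in its literal form and tagged with the state it occurred
in, so it can be displayed differently without adding extra states.

```java
import java.util.ArrayList;
import java.util.List;

class LookaheadTokenizer {
    enum State { COMMENT, KEY, VALUE }

    // Lookahead: does a backslash, u and four hex digits start at index i?
    static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Emit the pending plain text, if any, tagged with the current state.
    static void flush(List<String> out, State state, StringBuilder cur) {
        if (cur.length() > 0) {
            out.add(state + ":" + cur);
            cur.setLength(0);
        }
    }

    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        State state = line.startsWith("#") ? State.COMMENT : State.KEY;
        StringBuilder cur = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            if (isEscapeAt(line, i)) {
                // Consume all six chars as one unit; the state does not change.
                flush(out, state, cur);
                out.add("ESCAPE_IN_" + state + ":" + line.substring(i, i + 6));
                i += 6;
                continue;
            }
            char c = line.charAt(i);
            if (state == State.KEY && c == '=') {
                flush(out, state, cur);
                state = State.VALUE;
            } else {
                cur.append(c);
            }
            i++;
        }
        flush(out, state, cur);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("k\\u0065y=v"));
    }
}
```

Because the whole escape is swallowed in one step, the machine needs
neither a separate literal state per context nor a counter.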

You can see it working at http://mindprod.com/jgloss/properties.html


Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Stefan Ram - 01 Jan 2006 01:30 GMT
Roedy Green <my_email_is_posted_on_my_website@munged.invalid>
might have written, quoted or indirectly quoted something like:
>I want to categorise each fragment of text as either comment, keyword
>or value. Now throw in a complication. Inside any of those three
>things might be literals of the form \uffff
>I find myself creating all kinds of rinky dink mechanisms to handle
>the literals. I wondered if there is a clean way to do it.

 The clean way is a scanner with two layers:

 The first layer converts each \u-Sequence to a code point.

 The second layer then reads code points supplied by the first
 layer and does not have to care about the \u-sequences
 anymore.

