Java Forum / General / September 2007

Properly encoding "Project Gutenburg 1913 Webster Unabridged Dictionary".

Daniel Pitts - 20 Sep 2007 06:03 GMT
So, I've spent all day working on this. Funfun...

Back story: Project Gutenburg creates free ebooks from content that is
now in the public domain, including the "1913 Webster Unabridged
Dictionary".  The problem with this particular work (pgw050*.txt) is
that it uses a very "odd" character set and an almost-XML markup (it
may be valid SGML, but I wouldn't bank on it).

It's partly DOS extended ASCII and partly some proprietary character
codes.

My goal:
I'd like to get this into a form that is easily processed by a
program.  I think the best way to do this is to put it into a robust
XML format.  This would involve cleaning up the markup to make it
valid XML, as well as processing some of the character codes into
nicer forms.  I've already written a program that will read the
original texts and re-encode the files as UTF-8, using appropriate
character substitution when possible.
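Daniel doesn't post his re-encoder, so what follows is only a minimal Java sketch of the idea, assuming that bytes below 0x80 are plain ASCII and that every higher byte is looked up in a substitution table. The class name `WebsterReencoder` and the table are hypothetical; the single entry shown, 0xD8, is the double-vertical-bar marker from webfont.asc quoted later in the thread, mapped here to U+2016 as one plausible choice. Unmapped bytes become U+FFFD so they are easy to search for afterwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch: re-encodes a byte stream that mixes 7-bit ASCII with
// file-specific "extended" codes into UTF-8 via a substitution table.
class WebsterReencoder {
    // Deliberately incomplete, assumed table: byte value -> replacement text.
    // 0xD8 is the double-vertical-bar marker documented in webfont.asc;
    // U+2016 DOUBLE VERTICAL LINE is one plausible stand-in for it.
    static final Map<Integer, String> SUBSTITUTIONS = Map.of(0xD8, "\u2016");

    // Unmapped high bytes become U+FFFD REPLACEMENT CHARACTER, easy to search for.
    static final String UNKNOWN = "\uFFFD";

    static String decode(byte[] raw) {
        StringBuilder out = new StringBuilder(raw.length);
        for (byte b : raw) {
            int code = b & 0xFF;              // bytes are signed in Java
            if (code < 0x80) {
                out.append((char) code);      // plain ASCII passes through
            } else {
                out.append(SUBSTITUTIONS.getOrDefault(code, UNKNOWN));
            }
        }
        return out.toString();
    }

    static byte[] toUtf8(byte[] raw) {
        return decode(raw).getBytes(StandardCharsets.UTF_8);
    }
}
```

Reading the file with `Files.readAllBytes` and writing `toUtf8(...)` back out with `Files.write` would complete the round trip; the SGML-like tags pass through this step untouched and still need separate handling.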

At this point, I'm not sure if I'd be better off converting their
custom "entities" into the equivalent UTF-8 encoded characters, or if
it would be better to convert all entities and non-standard characters
into some sort of XML encoded entities.

Anyone have suggestions on what would be the most useful way to go?
Hunter Gratzner - 20 Sep 2007 09:14 GMT
> So, I've spent all day working on this. Funfun...
>
> Back story: Project Gutenburg

It's Gutenberg, not Gutenburg.

> create free ebooks from content that is
> now in the public domain, including the "1913 Webster Unabridged
> Dictionary".  The problem with this particular work (pgw050*.txt), is

Thanks for not providing a link to the file, so we are saved from
having to have a look at it.
Daniel Pitts - 20 Sep 2007 15:25 GMT
> > So, I've spent all day working on this. Funfun...
>
> > Back story: Project Gutenburg
>
> It's Gutenberg, not Gutenburg.
I actually knew that, but my fingers decided to do what they wanted,
not what I wanted :-)

> > create free ebooks from content that is
> > now in the public domain, including the "1913 Webster Unabridged
> > Dictionary".  The problem with this particular work (pgw050*.txt), is
>
> Thanks for not providing a link to the file, so we are saved from
> having to have a look at it.

Ah, indeed.

Thanks for the constructive response.

Jeff Higgins provided the link in a reply: <http://www.gutenberg.org/dirs/etext96/pgw050ab.txt>
Thanks Jeff!

Thanks,
Daniel.
Jeff Higgins - 20 Sep 2007 14:39 GMT
> So, I've spent all day working on this. Funfun...
>
[quoted text clipped - 15 lines]
> original texts, and re-encode the files as UTF-8, using appropriate
> character substitution when possible.

Whew. After a quick read of webfont.asc and tagset.web I can feel
your pain.  I think the main problem here is that the typesetter's /style/
conveys so much information.  For instance:

216    d8      Ø     <par/ double vertical bar (short length; the long
              length is the graphics character 186)
              This precedes words marked with a double vertical bar in
              the original dictionary, signifying that the word was
              adopted directly into English without modification of
              the spelling.

For myself, I suppose the question would be:  Do I want my
/program/ to understand and/or act upon the fact that character
code 0xd8 signifies the above, or is it strictly for a /human/ reader's
consumption?  If the former, an XML tag would probably be appropriate;
if the latter, an appropriate glyph may be sufficient.

<http://www.gutenberg.org/dirs/etext96/pgw050ab.txt>

> At this point, I'm not sure if I'd be better off converting their
> custom "entities" into the equivalent UTF-8 encoded characters, or if
> it would be better to convert all entities and non-standard characters
> into some sort of XML encoded entities.
>
> Anyone have suggestions on what would be the most useful way to go?
Jeff Higgins - 20 Sep 2007 21:36 GMT
>> So, I've spent all day working on this. Funfun...
>>
[quoted text clipped - 3 lines]
>> that it uses a very "odd" character set, and an almost-xml markup (it
>> may be valid SGML, but I wouldn't bank on it)

Another thought strikes me.  Have you looked at any of the many
"dictionary markup" languages already out there?  Have you seen
the GNU CIDE?
http://www.ibiblio.org/webster/
Daniel Pitts - 20 Sep 2007 22:23 GMT
> >> So, I've spent all day working on this. Funfun...
>
[quoted text clipped - 7 lines]
> "dictionary markup" languages already out there?  Have you seen
> the GNU CIDE? http://www.ibiblio.org/webster/

Heh, same source material, but it looks like more care was taken in
the translation to *machine readable* format.  I'll check it out.
Thanks for the pointer. (Searching for Public Domain Dictionary
doesn't turn up as many relevant hits as it should :-) )
Daniel Pitts - 20 Sep 2007 21:43 GMT
> > So, I've spent all day working on this. Funfun...
>
[quoted text clipped - 32 lines]
> consumption?  If the former probably an XML tag would be appropriate,
> if the latter maybe an appropriate glyph is sufficient.

Thanks for the reply.  My main goal is to retain as much semantic
meaning as possible for the program to understand. So if I understand
your point, I should convert it to XML tags to maintain that
information...

This brings up a related point.  In XML, can "&blah;" entities have
semantic meaning associated with them? Or are they only replacements
for otherwise difficult-to-represent characters?  That makes a
difference between using &directlyAdopted; and <directly-adopted/>
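One way to see the difference is to parse both variants with the JDK's DOM parser. As I understand the XML 1.0 rules on physical structures, a general entity is a substitution mechanism: the parser replaces `&directlyAdopted;` with its replacement text, so by default the application only sees the resulting character, whereas an empty element survives as a node a program can query. The class name and the sample entry below are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityVersusElement {
    // Entity variant: the internal DTD maps &directlyAdopted; to U+2016.
    static final String WITH_ENTITY =
            "<!DOCTYPE entry [<!ENTITY directlyAdopted \"&#x2016;\">]>"
            + "<entry>&directlyAdopted;<hw>Abattoir</hw></entry>";

    // Element variant: the marker is a node that a program can look for.
    static final String WITH_ELEMENT =
            "<entry><directly-adopted/><hw>Abattoir</hw></entry>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // true is already the default: entity references are replaced
            // by their replacement text while the document is built.
            factory.setExpandEntityReferences(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    public static void main(String[] args) {
        Document withEntity = parse(WITH_ENTITY);
        // Only the replacement character survives; the name "directlyAdopted" is gone.
        System.out.println(withEntity.getDocumentElement().getTextContent());

        Document withElement = parse(WITH_ELEMENT);
        // The marker is still queryable after parsing; prints 1.
        System.out.println(withElement.getElementsByTagName("directly-adopted").getLength());
    }
}
```

So if the program is supposed to act on "directly adopted", an element (or an attribute) carries that meaning through parsing, while an entity mostly helps whoever is typing the file.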

> <http://www.gutenberg.org/dirs/etext96/pgw050ab.txt>
>
[quoted text clipped - 4 lines]
>
> > Anyone have suggestions on what would be the most useful way to go?

Thanks,
Daniel.
Jeff Higgins - 20 Sep 2007 22:11 GMT
> Daniel Pitts wrote:
> > So, I've spent all day working on this. Funfun...
[quoted text clipped - 33 lines]
> consumption?  If the former probably an XML tag would be appropriate,
> if the latter maybe an appropriate glyph is sufficient.

> Thanks for the reply.  My main goal is to retain as much semantic
> meaning as possible for the program to understand. So if I understand
> your point, I should convert it to XML tags to maintain that
> information...
>
> This brings up a related point.  In XML, can "&blah;" entities have
> semantic meaning associated with them? Or are they only replacements
> for otherwise difficult-to-represent characters?  That makes a
> difference between using &directlyAdopted; and <directly-adopted/>

Well, if you're asking me personally, I'd have to say I'm no XML expert,
and the best I could do is point you to the appropriate part
of the spec, sorry.

<http://www.w3.org/TR/2006/REC-xml-20060816/#sec-physical-struct>

> <http://www.gutenberg.org/dirs/etext96/pgw050ab.txt>
>
[quoted text clipped - 4 lines]
>
> > Anyone have suggestions on what would be the most useful way to go?

> Thanks,
> Daniel.
Roedy Green - 20 Sep 2007 18:24 GMT
On Thu, 20 Sep 2007 05:03:36 -0000, Daniel Pitts
<googlegroupie@coloraura.com> wrote, quoted or indirectly quoted
someone who said :

>At this point, I'm not sure if I'd be better off converting their
>custom "entities" into the equivalent UTF-8 encoded characters, or if
>it would be better to convert all entities and non-standard characters
>into some sort of XML encoded entities.

Perhaps the way to go is to devise a font that renders these odd
characters correctly.  Then the text could be easily manipulated
programmatically with tiny mods to existing software. Then you could
even publish it as a PDF document.

Your problem then becomes political, talking some skilled type
designer into donating her skills in return for some exposure.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 20 Sep 2007 18:29 GMT
On Thu, 20 Sep 2007 17:24:26 GMT, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>Your problem then becomes political, talking some skilled type
>designer into donating her skills in return for some exposure.

If you have some high res scans of the original text, your job is not
designing a font, but the much easier job of "stealing" the font from
the original samples.  I looked into a similar problem circa 1990 to
"steal" Chinese fonts from hand painted fonts on mechanical optical
typesetters.  The tools were primitive -- interactively defining
Bezier curves with Adobe tools.

There are people who will create you a font from a sample of your
handwriting or printing for a nominal charge.  Perhaps one of them has
the tools and skills to solve your problem.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

RedGrittyBrick - 21 Sep 2007 10:43 GMT
> On Thu, 20 Sep 2007 05:03:36 -0000, Daniel Pitts
> <googlegroupie@coloraura.com> wrote, quoted or indirectly quoted
[quoted text clipped - 12 lines]
> Your problem then becomes political, talking some skilled type
> designer into donating her skills in return for some exposure.

The purpose of a dictionary is semantic. The actual glyphs are
comparatively unimportant. The intellectual accomplishment does not lie
mainly in the choice of symbols.

If you want to reproduce the beautiful typography of the original, use
high quality image scans.

Otherwise I'd translate the glyphs to something semantically or visually
close in the Unicode character set.

I think I'd try for a purely semantic markup in XML, then create a
stylesheet that renders it in XHTML (say) and introduces
glyphs and fonts as close to the original as possible. That way, if
Unicode is ever extended to include some of the odd characters used in
the original, you only have to amend the stylesheet.

So I'd represent the "double vertical bar" as an attribute of a tag.
e.g. <word spelling="adopted"> The stylesheet could insert a glyph
visually close to "double vertical bar".
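A minimal sketch of that idea, assuming a hypothetical `<word spelling="adopted">` markup: the XSLT 1.0 stylesheet, run through the JDK's `javax.xml.transform` API, is the only place the double-bar glyph appears. It renders plain text instead of XHTML to keep the example short, and the element and attribute names are illustrative, not taken from any existing dictionary schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AdoptedWordRenderer {
    // XSLT 1.0 stylesheet: the glyph lives here, not in the data, so a better
    // character can be swapped in later without touching the dictionary XML.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            // Directly adopted words get a double vertical line in front.
            + "<xsl:template match=\"word[@spelling='adopted']\">"
            + "\u2016<xsl:value-of select='.'/></xsl:template>"
            // Every other word is copied through as plain text.
            + "<xsl:template match='word'><xsl:value-of select='.'/></xsl:template>"
            + "</xsl:stylesheet>";

    static String render(String entryXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(entryXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

The more specific pattern `word[@spelling='adopted']` has a higher default priority than plain `word`, so marked words take the first template and everything else falls through to the second.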

In particular, I'd translate markup like "<universbold>" into
<exposition> or <shape-description> or something. I'm pretty sure
Webster didn't compose his dictionary with LaserJet fonts in mind :-)
Daniel Pitts - 21 Sep 2007 16:31 GMT
On Sep 21, 2:43 am, RedGrittyBrick <redgrittybr...@spamweary.foo>
wrote:
> > On Thu, 20 Sep 2007 05:03:36 -0000, Daniel Pitts
> > <googlegrou...@coloraura.com> wrote, quoted or indirectly quoted
[quoted text clipped - 36 lines]
> <exposition> or <shape-description> or something. I'm pretty sure
> Webster didn't compose his dictionary with LaserJet fonts in mind :-)

Heh. He probably was using a BubbleJet :-)

But seriously.  I'd like to keep the original intent (the
transcriber's, not necessarily Webster's), and then in a later stage
of the processing, convert it to the more semantic meaning, and
probably ignore the rendering of that information.   My personal use-
case actually only cares about the relationships between words, and
the part of speech.   For instance, I'd like to be able to recognize
Ran, Run, and Runs as different tenses of the same word, and Leaf/
Leaves as different inflections of the same word.
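For the Ran/Run/Runs and Leaf/Leaves case, one simple structure is an index from every inflected form to its headword(s). This sketch assumes the forms have already been pulled out of the dictionary entries; the class and method names are hypothetical, and nothing here parses GCIDE or the Gutenberg text. Because one form can belong to several headwords, each form maps to a set of lemmas.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InflectionIndex {
    // A surface form can belong to several headwords ("leaves" -> leaf, leave),
    // so each form maps to a set of lemmas rather than a single one.
    private final Map<String, Set<String>> formToLemmas = new HashMap<>();

    void addEntry(String lemma, List<String> inflectedForms) {
        String key = normalize(lemma);
        register(key, key);                       // the headword is its own form
        for (String form : inflectedForms) {
            register(normalize(form), key);
        }
    }

    Set<String> lemmasOf(String word) {
        Set<String> lemmas = formToLemmas.get(normalize(word));
        return lemmas == null ? Set.of() : Collections.unmodifiableSet(lemmas);
    }

    // True if the two words could be forms of the same headword.
    boolean mayBeSameWord(String a, String b) {
        Set<String> shared = new HashSet<>(lemmasOf(a));
        shared.retainAll(lemmasOf(b));
        return !shared.isEmpty();
    }

    private void register(String form, String lemma) {
        formToLemmas.computeIfAbsent(form, f -> new TreeSet<>()).add(lemma);
    }

    private static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT);
    }
}
```

With `run` registered alongside `ran`, `runs` and `running`, `mayBeSameWord("Ran", "runs")` is true; registering both `leaf` and `leave` with `leaves` makes `lemmasOf("leaves")` return both, which is exactly the kind of ambiguity the rest of this thread worries about.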

Actually, that's not quite my "ultimate" goal.  The ultimate goal is to
create an English imperative-sentence parser to use in a text
adventure game.  I just figured I might as well do something useful
for the community while I'm at it (in this case, semanticizing the
dictionary), although it appears that gcide_xml may have already done
what I wanted to do.
John W. Kennedy - 22 Sep 2007 04:10 GMT
> Actually, thats not quite my "ultimate" goal.  The ultimate goal is to
> create an English Imperative Sentence parser to use in a text
> adventure game.

I cannot find that you have ever participated in rec.arts.int-fiction.
Assuming this to be true, then it is highly likely you have no idea of
what you are getting into. Most fundamentally, you can't do a useful I-F
parser (assuming that, by "parser", you mean more than a mere lexer)
unless it is integrated with the world model. And you're also going to
have to create a descriptive language and a compiler for it.

Please study Inform 6, Inform 7 (they are completely different), TADS 2,
TADS 3, Hugo, and Adrift, and then see if A) you really have anything
new to contribute to the state of the art, and B) you have the time to
produce it. I would estimate that any new system offering a significant
improvement on existing tools should take about ten man-years to do from
scratch. You'll also probably need at least two collaborators, a test
writer, and a documentation writer. At a minimum, don't try to create
your own tests; you need a dedicated adversary, because this problem
domain is rife with edge and corner cases.

--
John W. Kennedy
"The whole modern world has divided itself into Conservatives and
Progressives. The business of Progressives is to go on making mistakes.
The business of the Conservatives is to prevent the mistakes from being
corrected."
  -- G. K. Chesterton

Daniel Pitts - 22 Sep 2007 05:26 GMT
> > Actually, thats not quite my "ultimate" goal.  The ultimate goal is to
> > create an English Imperative Sentence parser to use in a text
> > adventure game.
>
> I cannot find that you have ever participated in rec.arts.int-fiction.
Indeed, I have not.
> Assuming this to be true, then it is highly likely you have no idea of
> what you are getting into. Most fundamentally, you can't do a useful I-F
> parser (assuming that, by "parser", you mean more than a mere lexer)
> unless it is integrated with the world model. And you're also going to
> have to create a descriptive language and a compiler for it.
Actually, my plan is to describe the world model with Java objects
(hence this being a Java group).

> Please study Inform 6, Inform 7 (they are completely different), TADS 2,
> TADS 3, Hugo, and Adrift, and then see if A) you really have anything
> new to contribute to the state of the art, and B) you have the time to
> produce it.
A) If I don't have anything worthwhile to contribute, at least I'll
have gained knowledge. This isn't about bettering existing tools and
platforms, but about bettering myself.  I will take a look at those
you suggested, but I'll probably continue on with my project anyway.
I do have *some* experience working on a Lima M.U.D.

> I would estimate that any new system offering a significant
> improvement on existing tools should take about ten man-years to do from
> scratch. You'll also probably need at least two collaborators, a test
> writer, and a documentation writer. At a minimum, don't try to create
> your own tests; you need a dedicated adversary, because this problem
> domain is rife with edge and corner cases.
Agreed. The part that I find the most difficult to model, parse, and
query is the complex relationships that can occur amongst several
objects.  It's easy enough to say that a bowl is on a table, but what
about an apple between the banana and the orange in the bowl on the
wooden table?
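One hedged way to store the apple/banana/orange/bowl/table example is as a list of facts, where `BETWEEN` takes two reference objects and `IN`/`ON` take one. This is only a sketch: names are plain strings, the `Preposition` set is hypothetical, and it covers storing and querying relationships, not parsing the sentence that describes them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SceneGraph {
    // Hypothetical vocabulary; a real model would need many more relations.
    enum Preposition { IN, ON, BETWEEN }

    // One stored fact, e.g. (apple, BETWEEN, [banana, orange]).
    static final class Fact {
        final String subject;
        final Preposition preposition;
        final List<String> objects;

        Fact(String subject, Preposition preposition, List<String> objects) {
            this.subject = subject;
            this.preposition = preposition;
            this.objects = objects;
        }
    }

    private final List<Fact> facts = new ArrayList<>();

    void assertFact(String subject, Preposition preposition, String... objects) {
        facts.add(new Fact(subject, preposition, List.of(objects)));
    }

    // True if a stored fact matches; BETWEEN ignores the order of its objects.
    boolean holds(String subject, Preposition preposition, String... objects) {
        for (Fact f : facts) {
            if (!f.subject.equals(subject) || f.preposition != preposition) {
                continue;
            }
            boolean same = preposition == Preposition.BETWEEN
                    ? new HashSet<>(f.objects).equals(new HashSet<>(List.of(objects)))
                    : f.objects.equals(List.of(objects));
            if (same) {
                return true;
            }
        }
        return false;
    }

    // Follows IN/ON links upward: apple -> bowl -> table. The seen-set stops
    // the walk if the stored facts ever describe a containment loop.
    List<String> supportChain(String thing) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = thing;
        while (seen.add(current)) {
            String next = null;
            for (Fact f : facts) {
                if (f.subject.equals(current) && f.preposition != Preposition.BETWEEN) {
                    next = f.objects.get(0);
                    break;
                }
            }
            if (next == null) {
                break;
            }
            chain.add(next);
            current = next;
        }
        return chain;
    }
}
```

After storing apple IN bowl, apple BETWEEN banana and orange, and bowl ON table, `supportChain("apple")` yields `[bowl, table]`. The cabinet, glass-door and vehicle cases raised later in the thread would need more than this flat fact list.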

Every journey starts with but a footstep.  It may take 10 man-years to
complete, but if I don't start on my own, I'll never know. I'm 26, so
if this is a project that takes me until I'm 36, I'll still be young
enough to enjoy the results.   In any case, if this DOES get to a
point where I think it might become something useful to the community,
I'm sure I will be able to find plenty of collaborators.

Thanks for the pointers both to the existing projects, and to the raif
group.  I'm sure I will find it invaluable as I go on.

Cheers,
Daniel.
Patricia Shanahan - 22 Sep 2007 06:10 GMT
...
> Agreed. The part that I find the most difficult to model, parse, and
> query is the complex relationships that can occur amongst several
> objects.  It's easy enough to say that a bowl in on a table, but what
> about an apple between the banana and the orange in the bowl on the
> wooden table.

I think there are far more basic issues. Here's a classic example of the
context-sensitivity of the English language: "Time flies like an arrow.".

If it is advice from a senior researcher to a junior researcher in an
entomology lab, "time" is a verb, "flies" is a noun, and "like an arrow"
modifies how to go about timing flies.

If it is a comment on how fast time seems to go by, "time" is a noun,
"flies" is a verb, and "like an arrow" modifies how time flies.

Patricia
RedGrittyBrick - 22 Sep 2007 12:56 GMT
> ...
>> Agreed. The part that I find the most difficult to model, parse, and
[quoted text clipped - 12 lines]
> If it is a comment on how fast time seems to go by, "time" is a noun,
> "flies" is a verb, and "like an arrow" modifies how time flies.

Time flies like an arrow.
Fruit flies like a banana.
- Groucho Marx

--
RGB

Daniel Pitts - 22 Sep 2007 18:32 GMT
> ...
>
[quoted text clipped - 15 lines]
>
> Patricia

I actually have a plan for how to handle context, but that particular
sentence is not imperative in the second sense that you provided.
Since I'm narrowing the scope of sentence types down to imperatives,
that helps eliminate _some_ ambiguous situations.   Indeed, most
languages (including programming languages) are somewhat context-sensitive.

For example, the Java "sentence":
s+=10;

could mean "Increase the int 's' by 10.", or "append '10' to the
String 's'".  It could even be an error if "s" isn't numeric or a
String.
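Daniel's `s+=10;` example can be made concrete. In Java the declared type of `s` decides what the compound assignment means, and for narrow numeric types it even includes an implicit cast back to that type. The class and method names below are made up for the demonstration.

```java
class CompoundAssignmentDemo {
    static int addToInt(int s) {
        s += 10;          // numeric addition
        return s;
    }

    static String addToString(String s) {
        s += 10;          // string concatenation: "x" becomes "x10"
        return s;
    }

    static byte addToByte(byte s) {
        s += 10;          // compiles: the result is cast back to byte,
        return s;         // so 120 + 10 = 130 wraps around to -126
    }

    // boolean flag = false;
    // flag += 10;    // would not compile: + is undefined for boolean and int
}
```

The compiler resolves all of this from the declarations in scope, which is what makes the Java "sentence" easy to disambiguate compared with "Get the other key".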

The only reason that isn't considered a problem in Java is that it's
"easy" to determine the context of a statement (scoping rules are
specific and well-defined).  On the other hand, "Get the other key"
depends on context that would be harder to model in a computer,
especially after a few interactions...

"You see a red key and a blue key."
Look at the red key
"The key is red."
Look at the other key
"The other key is blue."
Get the other key.  <-- Does other point to the other other key, or to
the original other key?

It's been my experience with interactive fiction that the sentence
interpreters tend to need you to be very specific.  I'm sure there are
some out there that have forms of context handling, but I want to
experiment on my own to see how I would go about it.

Initially, I think contextual information will have to be provided by
the world-view designer, with a little help about the "obvious"
context.  Eventually, if the imperative sentence parser becomes good
enough, I would consider expanding its scope so that the parser
understood other types of sentences, and could glean information about
the current context simply from the descriptions involved.
John W. Kennedy - 22 Sep 2007 20:57 GMT
> ....
>> Agreed. The part that I find the most difficult to model, parse, and
>> query is the complex relationships that can occur amongst several
>> objects.  It's easy enough to say that a bowl in on a table, but what
>> about an apple between the banana and the orange in the bowl on the
>> wooden table.

> I think there are far more basic issues. Here's a classic example of the
> context-sensitivity of the English language: "Time flies like an arrow.".

> If it is advice from a senior researcher to a junior researcher in an
> entymology lab, "time" is a verb, "flies" is a noun, and "like an arrow"
> modifies how to go about timing flies.

> If it is a comment on how fast time seems to go by, "time" is a noun,
> "flies" is a verb, and "like an arrow" modifies how time flies.

And if it is an observation by a surrealist, "time" is an adjective,
"flies" is a noun, "like" is a verb, and "an arrow" is the direct object.

Here's a worse one: "It's a pretty little girls school". I count six
parsings.

--
John W. Kennedy
"I want everybody to be smart. As smart as they can be. A world of
ignorant people is too dangerous to live in."
  -- Garson Kanin. "Born Yesterday"

Stefan Ram - 22 Sep 2007 21:23 GMT
>And if it is an observation by an surrealist, "time" is an adjective,
>"flies" is a noun, "like" is a verb, and "an arrow" is the direct object.

     »[I]n an analysis of a set of 891 sentences
     ranging in length from 1 to 25 words, a team led by
     Kathryn Baker found an average of 27 possible ways to
     parse each sentence.«

http://scienceblogs.com/cognitivedaily/2006/12/machine_translation_taking_a_q.php

     »"Time flies like an arrow" --

     1. Time proceeds as quickly as an arrow proceeds.
        (the intended reading)

     2. Measure the speed of flies in the same way that
        you measure the speed of an arrow.

     3. Measure the speed of flies in the same way that
        an arrow measures the speed of flies.

     4. Measure the speed of flies that resemble an arrow.

     5. Flies of a particular kind, time-flies,
        are fond of an arrow.«

 »The Language Instinct«, Steven Pinker
Lew - 22 Sep 2007 21:24 GMT
> Here's a worse one: "It's a pretty little girls school". I count six
> parsings.

I trust none of them involve the possessive of "girl", singular or plural.
That would involve the appropriate placement of apostrophe.

--
Lew

John W. Kennedy - 22 Sep 2007 22:29 GMT
>> Here's a worse one: "It's a pretty little girls school". I count six
>> parsings.
>
> I trust none of them involve the possessive of "girl", singular or
> plural. That would involve the appropriate placement of apostrophe.

No, I'm not counting that; if we were looking at the spoken form,
however, we could, which would give even more readings.

--
John W. Kennedy
"But now is a new thing which is very old--
that the rich make themselves richer and not poorer,
which is the true Gospel, for the poor's sake."
  -- Charles Williams.  "Judgement at Chelmsford"

Lew - 22 Sep 2007 22:45 GMT
>>> Here's a worse one: "It's a pretty little girls school". I count six
>>> parsings.
[quoted text clipped - 4 lines]
> No, I'm not counting that; if we were looking at the spoken form,
> however, we could, which would give even more readings.

<http://www.phrases.org.uk/bulletin_board/48/messages/808.html>

Given the messiness of human input, one might well have to disregard niceties
of punctuation to arrive at the intended input.

--
Lew
"The world needs a computer that does what we want instead of what we tell it
to do."

John W. Kennedy - 22 Sep 2007 20:42 GMT
> Agreed. The part that I find the most difficult to model, parse, and
> query is the complex relationships that can occur amongst several
> objects.  It's easy enough to say that a bowl in on a table, but what
> about an apple between the banana and the orange in the bowl on the
> wooden table.

You're still looking at the purely linguistic problems. But there's more
to it than that. For example, what about a cabinet with a closed door,
but which also has a flat surface on top? What if the door is made of
glass? What if it's made of smoky glass, but there's a switch that can
turn on an interior light? All these things have to be handled by the
world model, but -- they also drag in your parser's disambiguator.

--
John W. Kennedy
"Sweet, was Christ crucified to create this chat?"
  -- Charles Williams.  "Judgement at Chelmsford"

Daniel Pitts - 23 Sep 2007 01:17 GMT
> > Agreed. The part that I find the most difficult to model, parse, and
> > query is the complex relationships that can occur amongst several
[quoted text clipped - 13 lines]
> "Sweet, was Christ crucified to create this chat?"
>    -- Charles Williams.  "Judgement at Chelmsford"

Actually, the parser can give a set of all possible parsings, and the
model could determine which makes the most sense based on the current
context.

Yes, the world model is an important part of the interactive fiction.
It's also the easier part to handle, in my opinion.  The reason it's
easier is that you can limit the world model in ways that you can't
limit what the human will type (without giving them an express set of
allowable inputs).  When you come across an ambiguous statement, you
can do one of several things, including asking for clarification or
making a best guess based on current context.
John W. Kennedy - 23 Sep 2007 03:04 GMT
> Actually, the parser can give a set of all possible parsings, and the
> model could determine which makes the most sense based on the current
> context.

Then you are ruling out the ability to do:

   > Take the box.
   Which box do you mean? The red box or the blue box?

...which has been regarded as bare-minimum practice for decades.

> Yes, the world model is an important part of the interactive fiction.
> Its also the easier part to handle in my opinion.

It, too, has nasty possibilities that I suspect you've not yet
considered. Can the player, while seated in a vehicle, reach out and
take an object from the surrounding environment? Have you complete
insurance against putting A inside (or on top of) B while B is inside
(or on top of) A? And don't forget combinatorial explosion.

> The reason its
> easier is that you can limit the world model in ways that you can't
> limit what the human will type (without given them an express set of
> allowable inputs).

Sure, but go too far, and you'll be damned for mimetic failure.

--
John W. Kennedy
"The first effect of not believing in God is to believe in anything...."
  -- Emile Cammaerts, "The Laughing Prophet"

Daniel Pitts - 23 Sep 2007 17:21 GMT
> > Actually, the parser can give a set of all possible parsings, and the
> > model could determine which makes the most sense based on the current
[quoted text clipped - 4 lines]
>     > Take the box.
>     Which box do you mean? The red box or the blue box?

Are you sure I'm ruling that out?
If there is an equal probability of the user meaning either the red
box or the blue box, then I could easily present that question.   If
the user then replies with "The first one", or "Red", or
"Either" (etc...), then the contextual information will give the
interpreter enough information to figure out what the user really
meant.
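The approach described here (score each candidate from context, and ask only when the top scores tie) might look roughly like this. The scores are assumed to come from some context model that isn't shown; `Disambiguator` and `resolve` are hypothetical names, and returning a plain `String` for both outcomes is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class Disambiguator {
    private static final double TIE_TOLERANCE = 1e-9;

    // candidateScores: assumed "how likely did the player mean this" values,
    // produced elsewhere from the current context. Returns the single best
    // candidate, or a clarifying question when several are tied.
    static String resolve(String noun, Map<String, Double> candidateScores) {
        if (candidateScores.isEmpty()) {
            return "You see no " + noun + " here.";
        }
        double best = Double.NEGATIVE_INFINITY;
        for (double score : candidateScores.values()) {
            best = Math.max(best, score);
        }
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Double> e : candidateScores.entrySet()) {
            if (best - e.getValue() <= TIE_TOLERANCE) {
                top.add(e.getKey());
            }
        }
        if (top.size() == 1) {
            return top.get(0);
        }
        top.sort(null);   // alphabetical, so wording doesn't depend on map order
        return "Which " + noun + " do you mean? The " + String.join(" or the ", top) + "?";
    }
}
```

With equal scores for "red box" and "blue box", `resolve("box", scores)` returns "Which box do you mean? The blue box or the red box?"; if "red box" scores higher, it simply returns "red box". A fuller version would return a dedicated result type and remember the pending question so that a follow-up such as "Red" can be resolved against it.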
> ...which has been regarded as bare-minimum practice for decades.
>
[quoted text clipped - 6 lines]
> insurance against putting A inside (or on top of) B while B is inside
> (or on top of) A? And don't forget combinatorial explosion.
I think I handled that by:
if (a.inReachOfPlayer()) { ... }

and in the "add(Relationship relationship, Thing thing)" method, I
check to see if thing's relationship tree includes this already.
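The `add(Relationship relationship, Thing thing)` method itself isn't shown, so this is only a guess at what "check to see if thing's relationship tree includes this already" could look like. The `Relationship` values and the single-container model are assumptions: each thing is in or on at most one other thing, and a move is refused if the destination is already inside (or on top of) the thing being moved, directly or indirectly.

```java
import java.util.ArrayList;
import java.util.List;

class Thing {
    // Assumed relationship kinds; the thread does not say which ones exist.
    enum Relationship { IN, ON }

    final String name;
    private Thing container;               // what this thing is in or on, if anything
    private Relationship relationToContainer;
    private final List<Thing> contents = new ArrayList<>();

    Thing(String name) {
        this.name = name;
    }

    // True if this thing is `other`, or sits somewhere inside/on top of it.
    boolean isWithin(Thing other) {
        for (Thing t = this; t != null; t = t.container) {
            if (t == other) {
                return true;
            }
        }
        return false;
    }

    // Puts `thing` in or on this one. Refuses the move if this thing is already
    // (directly or indirectly) inside or on `thing`, which would form a cycle.
    boolean add(Relationship relationship, Thing thing) {
        if (isWithin(thing)) {
            return false;                  // also covers thing == this
        }
        if (thing.container != null) {
            thing.container.contents.remove(thing);
        }
        thing.container = this;
        thing.relationToContainer = relationship;
        contents.add(thing);
        return true;
    }

    Relationship relationToContainer() {
        return relationToContainer;
    }
}
```

Putting a bag in a box succeeds, after which putting the box in that bag returns `false`. Walking up a single parent chain keeps the check linear in nesting depth; a model in which one object can rest on two supports at once would need a graph search instead.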

> > The reason its
> > easier is that you can limit the world model in ways that you can't
> > limit what the human will type (without given them an express set of
> > allowable inputs).
>
> Sure, but go too far, and you'll be damned for mimetic failure.
What is mimetic failure? I've never heard that term.

Anyway, why are you so convinced that I haven't got the engineering
capability to come up with solutions for these problems?  Have any of
these concerns of yours been proven impossible to resolve, or
just difficult to resolve?  I'm not a junior programmer; I've been
engineering software for 18 years.  If this were my first project, I'd
probably be doomed to failure as you've suggested, but it's much more (I
concede, not 100%) likely to succeed given my experience.

And what's the harm in trying?

I do thank you for your interest in ensuring that I don't (waste my
time? fail? why are you pointing these out?). I assure you that the
whole thing is just for the learning experience anyway.  Even if my
effort produces naught, the project wouldn't have failed.

If you are interested in discussing the intricacies of text-based user
interaction with me, I'd be pleased to continue our conversation, but
I'd appreciate it if you tried to alter your tone.  It feels like you're
assuming I couldn't have thought about things before you pointed them
out to me.

Thanks,
Daniel.
Lew - 23 Sep 2007 17:46 GMT
> If you are interested in discussing the intricacies of text-based user
> interaction with me, I'd be pleased to continue our conversation, but
> I'd appreciate it if you try to alter your tone.  It feels like your
> assuming I couldn't have thought about things before you point them
> out to me.

I wish people would stop being tone-of-voice police.  This is Usenet, in a
group that is designed for free-wheeling consideration of Java technical
issues.  JWK had some points to draw to your attention.  You should jump off
the high horse of personal aggrievement and consider his points simply on
their merits.  His points were on topic, technical and designed to elucidate
issues introduced from your posts.  That suffices.  He owes you no more.  He
owes me no more, anyway.

I suggest that you get over it.

--
Lew

Daniel Pitts - 23 Sep 2007 18:04 GMT
>   > If you are interested in discussing the intricacies of text-based user
>
[quoted text clipped - 9 lines]
> their merits.  His points were on topic, technical and designed to elucidate
> issues introduced from your posts.
Hey, don't worry, I'm not Ed or Twisted, I don't get offended quite so
easily.  I responded to the technical aspect of his post without
resorting to ad hominem.  I'm just asking that he doesn't make
assumptions about my abilities in his posts.  Don't get me wrong, I do
appreciate his bringing up the technical challenges that await me on
this project.

In any case, I've asked once, and if the response isn't up to my
"emotional standards" ;), I'll simply drop the thread.

> That suffices.  He owes you no more.  He
> owes me no more, anyway.
He might not owe you or me, but *I* owe it to myself to ask for a
little respect.  If I feel like I won't get that respect, I'll take
your following suggestion to heart (as I intended to from the start).

> I suggest that you get over it.
>
> --
> Lew

Thanks,
Not a troll,
Daniel.
Lew - 23 Sep 2007 18:13 GMT
> Thanks,
> Not a troll,
> Daniel.

No, you are most emphatically not a troll.  You are truly one of the White
Hats here.

As the subject myself of actual, direct /ad hominem/ attacks in these hallowed
halls, I know how difficult it can be to put up with disrespect.

I support your right to ask for respect.  I also point out that JWK isn't
writing for you alone, but for all the viewers who might not have thought
about all the implications that you have.  JWK supports everyone when he
elucidates those issues.

Also, don't expect him to be telepathic.  How can he know of what you have
thought?  He exercises due diligence by bringing up points that he
"/suspect[s]/ you've not yet considered." (Emphasis mine.)

I offer that asking for respect here is pointless.  Ask for information,
knowledge or guidance.  Let self-respect suffice.

How about we all stop expecting people to coddle our namby-pambiness and just
deal with the content of messages?

--
Lew

Daniel Pitts - 23 Sep 2007 19:28 GMT
> > Thanks,
> > Not a troll,
[quoted text clipped - 23 lines]
> --
> Lew

F***ing idiot!

(Just kidding!)

I don't think that asking for respect is pointless, but expecting it
might be.  It's dangerous to confuse a request with an
expectation.  I requested respect, but I don't expect it.

Perhaps what I should have asked for was to continue this conversation
with JWK out of the public eye, and to discuss the topic at the
level I'm capable of, rather than at the LCD of all of cljp.  :-)

Although, I agree with you that answers here should be at the level
that benefits the wider audience.  It makes me wonder though, perhaps
a more advanced-topic forum is desirable.  Or perhaps cljp should be
reclaimed, and all the basic->intermediate topics could be shifted to
cljh.

Or, maybe my questions are more domain specific, and I need to move my
conversation into the appropriate group. For this thread, perhaps, as
JWK suggested, rec.arts.int-fiction, or perhaps rec.games.int-fiction
would be more appropriate.

To J.W.K.  Would you be interested in continuing this discussion
through e-mail? (Don't use the address I have here, it won't go
through)

Thanks,
Daniel.
Andrew Thompson - 23 Sep 2007 19:56 GMT
..
>...perhaps cljp should be
>reclaimed, and all the basic->intermediate topics could be shifted to
>cljh.

I agree fully.  c.l.j.h. is a group well designed for beginners
where (perhaps unproductively excessive) politeness is
expected.  I invite anybody that feels the slightest bit
'fragile' to post there, and stop wasting the bandwidth of
c.l.j.p. posters with such dross.

--

Andrew Thompson
http://www.athompson.info/andrew/

John W. Kennedy - 24 Sep 2007 03:56 GMT
> To J.W.K.  Would you be interested in continuing this discussion
> through e-mail? (Don't use the address I have here, it won't go
> through)

Honestly, you'd be better off with real experts. I've been programming
since 1965, my wife and I were beta testers for Infocom from 1984 on,
and I've been involved in post-Infocom IF software since the early 90s
(mainly after-the-fact OS/2 support for Infocom and a Java servlet that
could execute most Infocom games on cellphones via WAP -- in case you
don't know, Infocom games ran on a virtual machine), but there are
people way more knowledgeable than I am, people that I look up to in
this field the way that I look up to people like Jane Austen, Kálmán
Imre, or Joe Straczynski in theirs.

I know enough to know that developing an IF parser is like herding cats;
I don't claim to be a cat herder myself. I'm only getting involved in
this because, as far as I know, I'm the only one in CLJP who's dipped a
toe in this pool at all -- and I've seen people crash and burn.
--

John W. Kennedy
"The bright critics assembled in this volume will doubtless show, in
their sophisticated and ingenious new ways, that, just as /Pooh/ is
suffused with humanism, our humanism itself, at this late date, has
become full of /Pooh./"
  -- Frederick Crews.  "Postmodern Pooh", Preface

Daniel Pitts - 24 Sep 2007 04:51 GMT
> > To J.W.K.  Would you be interested in continuing this discussion
> > through e-mail? (Don't use the address I have here, it won't go
[quoted text clipped - 21 lines]
> become full of /Pooh./"
>    -- Frederick Crews.  "Postmodern Pooh", Preface

Indeed, you do seem to be the most knowledgeable on this topic in this
group.  Perhaps I should seek a mentor in raif then.  I do have
experience building parsers.  As a matter of fact, I've created some
sophisticated parsers by hand, rather than relying on a tool.

Anyway, enough about my random wanderings as a programmer.  I
downloaded Inform 7 today, and I've been playing with it all day.  So
far I'm impressed, but not overwhelmed.  I find it easier to model my
world with code rather than natural language, but I'm sure that I'll
get the hang of this eventually.

Thanks for your help JWK.
Daniel.
John W. Kennedy - 24 Sep 2007 20:20 GMT
> I find it easier to model my
> world with code rather than natural language, but I'm sure that I'll
> get the hang of this eventually.

I am myself not at all sure about the natural-language aspect of Inform
7 (horrid memories of supporting COBOL), but it embodies by far the most
powerful "calculus of IF", so to speak, that I'm aware of.

Anyway, good luck!
--

John W. Kennedy
If Bill Gates believes in "intelligent design", why can't he apply it to
Windows?

John W. Kennedy - 23 Sep 2007 20:24 GMT
>>> Actually, the parser can give a set of all possible parsings, and the
>>> model could determine which makes the most sense based on the current
[quoted text clipped - 11 lines]
> interpreter enough information to figure out what the user really
> meant.

But now, you see, you've entangled the world model with the parser
again. It really can't be avoided.

>> ...which has been regarded as bare-minimum practice for decades.

>>> Yes, the world model is an important part of the interactive fiction.
>>> It's also the easier part to handle in my opinion.

>> It, too, has nasty possibilities that I suspect you've not yet
>> considered. Can the player, while seated in a vehicle, reach out and
>> take an object from the surrounding environment? Have you complete
>> insurance against putting A inside (or on top of) B while B is inside
>> (or on top of) A? And don't forget combinatorial explosion.

> I think I handled that by:
> if (a.inReachOfPlayer()) { ... }

That is only to say that you can solve the problem by solving it. How do
you define inReachOfPlayer() when there may be arbitrary container
objects surrounding a and/or the player? (And remember, by the way, that
a modern system has to allow for player-ness to move from one character
to another.)
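To make JWK's question concrete, here is a minimal sketch, assuming a hypothetical model in which every `Thing` has exactly one parent (room, container, supporter, or vehicle) and a container is simply open or closed. An object counts as in reach when neither the path from the actor up to their nearest common enclosure nor the path from that enclosure down to the object crosses a closed container. The actor is passed in rather than hard-coded as "the player", so player-ness can move between characters. Transparency, darkness, and author-defined exceptions are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class Reach {
    // Hypothetical world-model object: each Thing has at most one parent.
    static class Thing {
        final String name;
        Thing parent;            // the room, container, supporter, or vehicle holding it
        boolean closedContainer; // true if this is a container that is currently closed

        Thing(String name) {
            this.name = name;
        }

        // This thing followed by each enclosure around it, innermost first.
        List<Thing> ancestry() {
            List<Thing> chain = new ArrayList<>();
            for (Thing t = this; t != null; t = t.parent) {
                chain.add(t);
            }
            return chain;
        }
    }

    // True if 'actor' can touch 'target'. The actor is a parameter, not a global
    // "player", so player-ness can move from one character to another.
    static boolean inReach(Thing actor, Thing target) {
        List<Thing> up = actor.ancestry();
        List<Thing> down = target.ancestry();
        Thing common = null;
        for (Thing t : up) {
            if (down.contains(t)) {
                common = t;
                break;
            }
        }
        if (common == null) {
            return false; // not in the same world tree at all
        }
        return !crossesClosed(up, common) && !crossesClosed(down, common);
    }

    // True if a closed container lies strictly between the chain's start and 'common'.
    private static boolean crossesClosed(List<Thing> chain, Thing common) {
        Thing self = chain.get(0);
        for (Thing t : chain) {
            if (t == common) {
                return false;
            }
            if (t != self && t.closedContainer) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Thing garage = new Thing("garage");
        Thing car = new Thing("car");
        car.parent = garage;
        Thing player = new Thing("player");
        player.parent = car;
        Thing wrench = new Thing("wrench");
        wrench.parent = garage;

        System.out.println(inReach(player, wrench)); // true: the car is open
        car.closedContainer = true;
        System.out.println(inReach(player, wrench)); // false: the closed car is in the way
    }
}
```

Closing the car flips the answer because the car becomes a closed boundary between the player and the garage floor, while the player can still touch the car itself.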

> and in the "add(Relationship relationship, Thing thing)" method, I
> check to see if thing's relationship tree includes this already.

Nope. An object cannot contain another object that contains it, but an
NPC can be friendly with another NPC that is friendly with it. And, on
the other hand, you've forgotten the cabinet with a shelf on top.
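JWK's distinction can be sketched by keeping containment and social relations in separate structures. Containment is a tree (each thing has one parent), so `add` rejects a move whenever the destination already lies within the thing being moved; friendship is a plain symmetric set, where cycles are harmless. The `Placement` tag lets one object hold some things `IN` it and others `ON` it, a simplification of the cabinet-with-a-shelf case (an alternative is a separate shelf object attached to the cabinet). All names here are hypothetical; this is not Daniel's actual `add(Relationship relationship, Thing thing)` code.

```java
import java.util.HashSet;
import java.util.Set;

public class WorldTree {
    // How a thing is held by its parent: inside it, or on top of it.
    enum Placement { IN, ON }

    static class Thing {
        final String name;
        Thing parent;
        Placement placement;
        final Set<Thing> friends = new HashSet<>(); // social relation: cycles are fine

        Thing(String name) {
            this.name = name;
        }

        // True if this thing is 'other' or lies inside/on 'other' at any depth.
        boolean isWithin(Thing other) {
            for (Thing t = this; t != null; t = t.parent) {
                if (t == other) {
                    return true;
                }
            }
            return false;
        }

        // Puts 'thing' in or on this one, refusing any move that would make a cycle.
        void add(Placement how, Thing thing) {
            if (isWithin(thing)) {
                throw new IllegalArgumentException("cannot put " + thing.name + " "
                        + how.name().toLowerCase() + " " + name + ": "
                        + name + " is already within " + thing.name);
            }
            thing.parent = this;
            thing.placement = how;
        }

        // Friendship is symmetric, and mutual friendship is not an error.
        void befriend(Thing other) {
            friends.add(other);
            other.friends.add(this);
        }
    }

    public static void main(String[] args) {
        Thing cabinet = new Thing("cabinet");
        Thing vase = new Thing("vase");
        Thing cup = new Thing("cup");
        cabinet.add(Placement.ON, vase); // on the shelf on top
        cabinet.add(Placement.IN, cup);  // inside the same cabinet

        try {
            vase.add(Placement.IN, cabinet); // would put the cabinet inside its own contents
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        Thing alice = new Thing("Alice");
        Thing bob = new Thing("Bob");
        alice.befriend(bob); // a two-way relation: allowed
        System.out.println(bob.friends.contains(alice));
    }
}
```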

>>> The reason it's
>>> easier is that you can limit the world model in ways that you can't
>>> limit what the human will type (without giving them an express set of
>>> allowable inputs).

>> Sure, but go too far, and you'll be damned for mimetic failure.

> What is mimetic failure? I've never heard that term.

From the American Heritage Dictionary:
mimesis, noun: The imitation or representation of aspects of the
sensible world, especially human actions, in literature and art.

> Anyway, why are you so convinced that I haven't got the engineering
> capability to come up with solutions for these problems?

I'm not. I'm just warning you that you're tackling an intrinsically hard
problem that experts have been working on for decades, and that if you
don't familiarize yourself with the state of the art, you're going to
lay a big, fat egg.

> ...

> It feels like you're
> assuming I couldn't have thought about things before you point them
> out to me.

I am assuming only that you are not prodigiously more gifted than anyone
else who has ever tried this -- and that group includes the founders of
Infocom, who were graduates of the MIT Artificial Intelligence
Laboratory, and Graham Nelson, the leading contemporary theorist and the
creator of Inform and Inform 7, who lectures on mathematics at Oxford
University and is also a published poet.

I cannot recommend too strongly that you acquaint yourself with
rec.arts.int-fiction and some modern IF development systems. Inform 7
(<URL:http://www.inform-fiction.org>) is still in beta, but is probably
the most advanced.
--

John W. Kennedy
"When a man contemplates forcing his own convictions down another man's
throat, he is contemplating both an unchristian act and an act of
treason to the United States."
  -- Joy Davidman, "Smoke on the Mountain"

Ed Kirwan - 24 Sep 2007 22:09 GMT
snipski
>> I would estimate that any new system offering a significant
>> improvement on existing tools should take about ten man-years to do from
>> scratch.
snip

> Every journey starts with but a footstep.  It may take 10 man years to
> complete, but if I don't start on my own, I'll never know.

I like that, because I've been there: and I eventually found a stall selling
big, "I've failed," tee-shirts, just like John W. said I would. I bought
myself a nice, bright green one. (See, "Violentia," below.)

Almost every step towards this particular failure, however, was rewarding,
and I carry both lessons and lesions with me still (if only all failures
yielded such insights). I've often thought that IF is the perfect
environment in which to cut one's OO-teeth (not that you are, Daniel)
because you can get by with a little and add sophistication until the cows
come home. In short, it's so damn extensible. It's worth doing as a
code-structuring exercise alone, just don't ever expect to see a
finish-line.

FWIW, you're going to meet the Visitor. I'm sure you've met him before, but
in IF, he's the biggest, meanest bruiser you've ever seen. And he's in a
bad mood.

Murderously bad ...

--

.ed

www.EdmundKirwan.com - Home of The Fractal Class Composition

Daniel Pitts - 24 Sep 2007 23:58 GMT
> snipski
> >> I would estimate that any new system offering a significant
[quoted text clipped - 17 lines]
> code-structuring exercise alone, just don't ever expect to see a
> finish-line.
No project or product is finished until it's end-of-lifed.  And at that
point, it's only finished in the sense of its mortality. :-)  I'm glad
someone else sees my point of view on this.

> FWIW, you're going to meet the Visitor. I'm sure you've met him before, but
> in IF, he's the biggest, meanest bruiser you've ever seen. And he's in a
> bad mood.
Do you mean the Visitor pattern? Or is this some reference to
something I don't yet know?

> Murderously bad ...
*gulp* :-)

> --
> .ed
>
> www.EdmundKirwan.com - Home of The Fractal Class Composition

Thanks,
Daniel.
Ed Kirwan - 25 Sep 2007 06:15 GMT
snip
> No project or product is finished until it's end-of-lifed.  And at that
> point, it's only finished in the sense of its mortality. :-)  I'm glad
[quoted text clipped - 4 lines]
>> in a bad mood.
> Do you mean the Visitor pattern?

I do indeed.

At least if you take the simplistic Verb, Noun, Adverb, Adjective (etc.)
approach that I took, because an action will depend on which type of each
of these is involved.

For example, "Take sword," will have a different outcome from, "Take water,"
and it's the nature of the verb-object/noun-object interaction that defines
this different outcome. I found my verbs constantly visiting my nouns to
find out what to do next.
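Ed's "verbs visiting nouns" is double dispatch: the response depends on the concrete types of both the verb and the noun. A minimal sketch of the classic Visitor pattern for "take sword" versus "take water", with hypothetical class names and canned responses standing in for real game logic:

```java
public class VerbVisitorDemo {
    // Each noun accepts a verb and calls back the overload for its own type.
    interface Noun {
        String accept(Verb verb);
    }

    // One overload per noun class; each verb decides what happens to each noun.
    interface Verb {
        String visit(Sword sword);
        String visit(Water water);
    }

    static final class Sword implements Noun {
        public String accept(Verb verb) { return verb.visit(this); }
    }

    static final class Water implements Noun {
        public String accept(Verb verb) { return verb.visit(this); }
    }

    // "Take": the outcome depends on the concrete noun, selected by double dispatch.
    static final class Take implements Verb {
        public String visit(Sword sword) { return "You take the sword."; }
        public String visit(Water water) { return "The water runs through your fingers."; }
    }

    static final class Drink implements Verb {
        public String visit(Sword sword) { return "That would hurt."; }
        public String visit(Water water) { return "Refreshing."; }
    }

    public static void main(String[] args) {
        Noun[] nouns = { new Sword(), new Water() };
        Verb take = new Take();
        for (Noun n : nouns) {
            System.out.println(n.accept(take));
        }
    }
}
```

The bruiser shows up as the model grows: every new noun class needs a new `visit` overload on every verb, and every new verb must answer for every noun, which is one face of the combinatorial explosion JWK warned about.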

--

.ed

www.EdmundKirwan.com - Home of The Fractal Class Composition

Daniel Pitts - 25 Sep 2007 16:27 GMT
> snip
> > No project or product is finished until its end-of-lifed.  And at that
[quoted text clipped - 21 lines]
>
> www.EdmundKirwan.com - Home of The Fractal Class Composition

My plan is actually to define a grammar that will parse the input
sentence into possible parse trees, and then figure out from the world model
which of those parse trees makes the most sense (or whether I'd have to
ask for clarification).
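A minimal sketch of that last step, assuming the grammar has already produced candidate readings (reduced here to a hypothetical `Parse` holding a verb and an object name) and that the world model can score each one; the only signal used is whether the object is reachable. The best-scoring reading wins, and a tie at the top becomes a clarification question. This illustrates the idea only; it is not how any particular IF system ranks parses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

public class ParseRanker {
    // One candidate reading of the player's input (hypothetical, heavily simplified).
    static final class Parse {
        final String verb;
        final String object;

        Parse(String verb, String object) {
            this.verb = verb;
            this.object = object;
        }
    }

    // Returns every candidate that shares the highest plausibility score.
    static List<Parse> bestParses(List<Parse> candidates, ToIntFunction<Parse> plausibility) {
        List<Parse> best = new ArrayList<>();
        int bestScore = Integer.MIN_VALUE;
        for (Parse p : candidates) {
            int score = plausibility.applyAsInt(p);
            if (score > bestScore) {
                bestScore = score;
                best.clear();
                best.add(p);
            } else if (score == bestScore) {
                best.add(p);
            }
        }
        return best;
    }

    // Acts on a unique winner, or asks the player to disambiguate a tie.
    static String respond(List<Parse> candidates, Set<String> reachable) {
        List<Parse> best = bestParses(candidates, p -> reachable.contains(p.object) ? 1 : 0);
        if (best.isEmpty()) {
            return "I didn't understand that.";
        }
        if (best.size() == 1) {
            return best.get(0).verb + " " + best.get(0).object;
        }
        List<String> options = new ArrayList<>();
        for (Parse p : best) {
            options.add("the " + p.object);
        }
        return "Which do you mean, " + String.join(" or ", options) + "?";
    }

    public static void main(String[] args) {
        List<Parse> readings = List.of(new Parse("take", "brass key"), new Parse("take", "iron key"));
        // Only the brass key is reachable, so the world model settles it.
        System.out.println(respond(readings, Set.of("brass key")));
        // Both keys are reachable: a tie, so ask.
        System.out.println(respond(readings, Set.of("brass key", "iron key")));
    }
}
```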

I'll probably want to use the visitor pattern to visit the objects in
my world model though.
Jeff Higgins - 21 Sep 2007 15:16 GMT
> So, ...

I must thank you for posting this article.  After having read
your post I spent some time browsing the WWW on the subject
and found a lot of interesting stuff.  Here are links to two things
that I found particularly interesting.

I rediscovered the Princeton University WordNet project.
<http://wordnet.princeton.edu/>

And through that link discovered a most wonderful (free)
dictionary utility program for the Windows platform:

WordWeb 5 for Windows
<http://wordweb.info/free/>

This program allows me to place my mouse cursor over a word
in any other program and with a CTRL + right click bring up
a useful dictionary/thesaurus already opened to the word under
the cursor!! How neat! I've tried it in my newsreader "Outlook"
and in IE7 and OpenOffice Writer, even Eclipse. How's it do that?

Anyway, this is not a commercial advertisement; I am not
in any way associated with the above-mentioned organizations.

Thanks,
JH

