Hello,
I have a task to recognize two languages: English and Lithuanian.
The first thing, I think, is to search for specific letters, and also for
words such as: are, to, into, the, etc. Am I going the right way? :)
Any ideas how to do that?
Gordon Beaton - 01 Jul 2005 10:33 GMT
> The first thing, I think, is to search for specific letters, and also for
> words such as: are, to, into, the, etc. Am I going the right way? :)
>
> Any ideas how to do that?
Yes, I believe that searching for "stop words" in each of the
languages is a simple and reasonably accurate method.
This kind of problem is often discussed in comp.ai.nat-lang, so you
might want to ask there too.
There are various lists of stop words available on the web,
such as this one: http://meta.wikimedia.org/wiki/Stop_word_list
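
A minimal sketch of the stop-word idea in Python, combined with the letter check from the question. The word lists here are tiny illustrative samples chosen for this example, not taken from the page linked above, and the names guess_language, ENGLISH_STOP_WORDS, LITHUANIAN_STOP_WORDS and LITHUANIAN_LETTERS are hypothetical. Real lists would need to be much longer, and the simple "one point per hit" scoring is an assumption, not a tuned method.

```python
import re

# Illustrative samples only (assumption): real stop-word lists are far longer.
ENGLISH_STOP_WORDS = {"the", "are", "to", "into", "and", "of", "is", "in", "that", "it"}
LITHUANIAN_STOP_WORDS = {"ir", "yra", "kad", "su", "bet", "tai", "kaip", "ar", "ne", "per"}

# Letters with diacritics that appear in Lithuanian but not in plain English text.
LITHUANIAN_LETTERS = set("ąčęėįšųūž")


def guess_language(text):
    """Return "english", "lithuanian", or "unknown" for the given text."""
    lowered = text.lower()
    # Runs of letters only (Unicode-aware, so "aš" stays one word).
    words = re.findall(r"[^\W\d_]+", lowered)

    # One point per stop-word occurrence.
    english_score = sum(word in ENGLISH_STOP_WORDS for word in words)
    lithuanian_score = sum(word in LITHUANIAN_STOP_WORDS for word in words)

    # One extra point per Lithuanian-specific letter.
    lithuanian_score += sum(ch in LITHUANIAN_LETTERS for ch in lowered)

    if english_score > lithuanian_score:
        return "english"
    if lithuanian_score > english_score:
        return "lithuanian"
    return "unknown"  # tie, or no evidence either way


print(guess_language("The cat is in the house and it wants to go into the garden"))
print(guess_language("Aš esu studentas ir mokausi universitete, bet tai yra sunku"))
```

Very short inputs may contain no stop words at all, which is why the sketch returns "unknown" on a tie instead of guessing.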
/gordon

Andrius Klimavičius - 04 Jul 2005 12:28 GMT