Java Forum / General / July 2005

Recognizing language

Andrius Klimavičius - 01 Jul 2005 09:35 GMT
Hello,

I have a task to distinguish between two languages: English and Lithuanian.
My first idea is to search for language-specific letters, and also for
common words such as: are, to, into, the, etc. Am I going the right way? :)

Any ideas on how to do that?
Gordon Beaton - 01 Jul 2005 10:33 GMT
> My first idea is to search for language-specific letters, and also for
> common words such as: are, to, into, the, etc. Am I going the right way? :)
>
> Any ideas on how to do that?

Yes, I believe that searching for "stop words" in each of the
languages is a simple and reasonably accurate method.

This kind of problem is often discussed in comp.ai.nat-lang, so you
might want to ask there too.

There are various lists of stop words available on the web,
such as this one: http://meta.wikimedia.org/wiki/Stop_word_list
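
For illustration, here is a minimal sketch of the idea, combining the stop-word count with the letter check from the question. The class name LanguageGuesser, the method guess, and the two tiny word lists are assumptions made up for this example; a real program would load much longer lists, such as the ones linked above. Non-ASCII letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class LanguageGuesser {
    // Tiny illustrative samples only (assumption); real stop-word lists are much longer.
    private static final Set<String> ENGLISH = new HashSet<>(Arrays.asList(
            "the", "are", "to", "into", "and", "of", "is", "in", "that", "it"));

    // "i\u0161" is "is" with s-caron; "\u012F" is i-ogonek (the word for "into").
    private static final Set<String> LITHUANIAN = new HashSet<>(Arrays.asList(
            "ir", "yra", "kad", "bet", "tai", "su", "i\u0161", "apie", "kaip", "\u012F"));

    // Lowercase letters used in Lithuanian but not in English:
    // a-ogonek, c-caron, e-ogonek, e-dot, i-ogonek, s-caron, u-ogonek, u-macron, z-caron.
    private static final String LITHUANIAN_LETTERS =
            "\u0105\u010D\u0119\u0117\u012F\u0161\u0173\u016B\u017E";

    /** Returns "English", "Lithuanian", or "unknown" when the scores tie. */
    public static String guess(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        int english = 0;
        int lithuanian = 0;

        // Split on runs of non-letter characters (Unicode-aware) and count stop words.
        for (String word : lower.split("[^\\p{L}]+")) {
            if (ENGLISH.contains(word)) {
                english++;
            }
            if (LITHUANIAN.contains(word)) {
                lithuanian++;
            }
        }

        // Each Lithuanian-specific letter counts as extra evidence for Lithuanian.
        for (int i = 0; i < lower.length(); i++) {
            if (LITHUANIAN_LETTERS.indexOf(lower.charAt(i)) >= 0) {
                lithuanian++;
            }
        }

        if (english == lithuanian) {
            return "unknown";
        }
        return english > lithuanian ? "English" : "Lithuanian";
    }

    public static void main(String[] args) {
        System.out.println(guess("The words are put into the list."));
        System.out.println(guess("Tai yra mano namas ir sodas."));
    }
}
```

Very short inputs may contain no stop words and no special letters at all, which is why this sketch returns "unknown" on a tie rather than guessing.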

/gordon

[  do not email me copies of your followups  ]
g o r d o n + n e w s @  b a l d e r 1 3 . s e

Andrius Klimavičius - 04 Jul 2005 12:28 GMT

