Java Forum / General / January 2006

Unknown file format - read problem

Grzegorz Stasica - 17 Jan 2006 22:13 GMT
hi,

I have a dictionary that apparently works only on Windows. I'd like to use
its database (a single file) and write an application so that I can use it
from Linux as well. The problem is that I've never "hacked" this kind of
thing and have no idea where to start. The file itself is about 65MB. How
can I work out the file structure so that I can read valid data from it?
Sashi - 17 Jan 2006 22:37 GMT
How is this "one file" organized? Is it a simple text file (I guess not,
or else you wouldn't have posted)? Is it tightly integrated into the
application that uses it? Is it a compressed file? Is it a bit file? A
byte file?
If I were you, I'd first put that file on my Linux box, open it with
Java, print out the first 5kB and see what comes out.
Play around with it and it might make sense.
Sashi
> hi,
>
[quoted text clipped - 3 lines]
> idea where to start from. The file itself is about 65MB. What is the way to
> understand the file structure in order to read valid data.
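
A minimal sketch of the "print out the first 5kB" idea, assuming plain Java with no extra libraries. The class name HexPreview and the hexdump-style layout (offset, hex bytes, printable ASCII) are illustrative choices, not anything the dictionary format is known to require.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexPreview {

    /** Formats the first 'length' bytes as lines of: offset, 16 hex bytes, printable ASCII. */
    static String dump(byte[] data, int length) {
        StringBuilder out = new StringBuilder();
        for (int offset = 0; offset < length; offset += 16) {
            StringBuilder hex = new StringBuilder();
            StringBuilder ascii = new StringBuilder();
            for (int i = 0; i < 16; i++) {
                if (offset + i < length) {
                    int b = data[offset + i] & 0xFF;   // treat the byte as unsigned
                    hex.append(String.format("%02x ", b));
                    ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    hex.append("   ");                 // pad a short final line
                }
            }
            out.append(String.format("%08x  %s |%s|%n", offset, hex, ascii));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[5 * 1024];            // first 5 kB only
        int read = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int n;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (read < buffer.length
                    && (n = in.read(buffer, read, buffer.length - read)) > 0) {
                read += n;
            }
        }
        System.out.print(dump(buffer, read));
    }
}
```

Run it as `java HexPreview <file>`. Readable words in the ASCII column suggest uncompressed text; a recognizable signature in the first few bytes may point to a known container or compression format.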
hiwa - 18 Jan 2006 01:28 GMT
I'd use good old hexdump and analyze the file format.
Then, extract its pure text part.
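
For extracting the text part, a rough Java equivalent of the Unix strings tool might look like the sketch below. It assumes the text is stored as single-byte printable ASCII; a dictionary using UTF-16 or a national code page would need a different printable test. The class name ExtractStrings and the minimum run length of 4 are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ExtractStrings {

    static boolean isPrintable(int b) {
        return b >= 0x20 && b < 0x7F;
    }

    /** Collects every run of at least minLength consecutive printable ASCII bytes. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte raw : data) {
            int b = raw & 0xFF;                        // treat the byte as unsigned
            if (isPrintable(b)) {
                run.append((char) b);
            } else {
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0);                      // a non-printable byte ends the run
            }
        }
        if (run.length() >= minLength) {               // flush a run that reaches end of data
            found.add(run.toString());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // A 65MB file fits in memory on most machines; stream it instead if that is a concern.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String s : extract(data, 4)) {
            System.out.println(s);
        }
    }
}
```

If this prints almost nothing recognizable, the text is probably compressed or stored in another encoding.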
Roedy Green - 18 Jan 2006 03:34 GMT
On Tue, 17 Jan 2006 23:13:17 +0100, "Grzegorz Stasica"
<gstasica@poczta.onet.pl> wrote, quoted or indirectly quoted someone
who said :

>I've dictionary which apparently works only on Win. I'd like to use its
>database (one file) and write some application so I could use it from linux
>as well. The problem is that I've never "hacked" these things and have no
>idea where to start from. The file itself is about 65MB. What is the way to
>understand the file structure in order to read valid data.

Dictionaries typically use some quite sophisticated compression
techniques.  Basically they are going to try to avoid storing a
duplicate of the string
"responsibil"
in responsible
and responsibility.
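
One well-known technique of this kind is front coding, where each word in a sorted list stores only the length of the prefix it shares with the previous word plus the remaining suffix. The sketch below is an illustration only: nothing here says this particular dictionary uses it, and the class name FrontCoding and the "length:suffix" text encoding are made up for the example. For the two words above, the shared prefix works out to "responsib".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FrontCoding {

    static int commonPrefixLength(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /** Encodes a sorted word list as "sharedPrefixLength:suffix" entries. */
    static List<String> encode(List<String> sortedWords) {
        List<String> encoded = new ArrayList<>();
        String previous = "";
        for (String word : sortedWords) {
            int shared = commonPrefixLength(previous, word);
            encoded.add(shared + ":" + word.substring(shared));
            previous = word;
        }
        return encoded;
    }

    /** Rebuilds the words; each entry needs the previously decoded word. */
    static List<String> decode(List<String> encoded) {
        List<String> words = new ArrayList<>();
        String previous = "";
        for (String entry : encoded) {
            int colon = entry.indexOf(':');
            int shared = Integer.parseInt(entry.substring(0, colon));
            String word = previous.substring(0, shared) + entry.substring(colon + 1);
            words.add(word);
            previous = word;
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("responsibility", "responsible");
        List<String> encoded = encode(words);
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Because each entry depends on the one before it, a file stored this way will not show most words in full in a hex dump, which is one reason such formats are hard to read by eye.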

If you have software to build custom dictionaries, that will make it
easier to decode, since you can build dictionaries with only one or
two words to study.

Unlike ordinary compression, dictionaries have to be used in
compressed form.
Signature

Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.

Gordon Beaton - 18 Jan 2006 07:49 GMT
> Basically they are going to try to avoid storing a
> duplicate of the string
>  "responsibil"
> in responsible
> and responsibility.

I would certainly hope that any dictionary I use doesn't assume that
"responsibil" is part of "responsible", regardless of how they choose
to store the words.

/gordon

Signature

[  do not email me copies of your followups  ]
g o r d o n + n e w s @  b a l d e r 1 3 . s e


