Java Forum / General / October 2006

Random Access File as Set

carlbernardi@gmail.com - 24 Oct 2006 01:50 GMT
Hi,

I need to be able to use the Random Access File class as a Set class so
there will be no duplicate entries.  I thought about building a Random
Access File that scans itself and won't add a similar entry but I have
a few million unique entries of which many could be similar.

Thanks,

Carl
John W. Kennedy - 24 Oct 2006 01:58 GMT
> Hi,
>
> I need to be able to use the Random Access File class as a Set class so
> there will be no duplicate entries.  I thought about building a Random
> Access File that scans itself and won't add a similar entry but I have
> a few million unique entries of which many could be similar.

Before there was SQL, and before there were B-trees, there were hash
files. You're basically asking the first question asked by the first
programmer who, nearly half a century ago, designed the first IBM RAMAC
305 disk file.

John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"

Matt Humphrey - 24 Oct 2006 02:02 GMT
> Hi,
>
> I need to be able to use the Random Access File class as a Set class so
> there will be no duplicate entries.  I thought about building a Random
> Access File that scans itself and won't add a similar entry but I have
> a few million unique entries of which many could be similar.

With a million entries this really sounds like it should be a database.
I've built file-based object sets by using an index that contains the hash
code and points to the address in the Random Access File where the object is
found, but I was dealing with only tens of thousands.  What are you storing
in the file--arbitrary objects or strings or numbers?  Are you expecting to
delete items from the file? How big are the objects and are they easy to
test for equality?  Can you make a truly unique hash code?
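
A minimal sketch of the index idea described above, not the poster's actual code. It assumes the entries are strings short enough for writeUTF (under 64 KB encoded), that nothing is ever deleted, and that the index lives in memory and is rebuilt by one scan when the file is opened. The class name IndexedFileSet and its methods are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical file-backed set: records live in the file, the index lives in memory.
class IndexedFileSet implements AutoCloseable {
    private final RandomAccessFile file;
    // hash code -> file offsets of every record that has that hash code
    private final Map<Integer, List<Long>> index = new HashMap<>();

    IndexedFileSet(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
        // Assumption: rebuild the index by scanning the existing records once.
        long pos = 0;
        long end = file.length();
        while (pos < end) {
            file.seek(pos);
            String record = file.readUTF();
            remember(record.hashCode(), pos);
            pos = file.getFilePointer();
        }
    }

    private void remember(int hash, long offset) {
        index.computeIfAbsent(hash, k -> new ArrayList<>()).add(offset);
    }

    boolean contains(String entry) throws IOException {
        List<Long> offsets = index.get(entry.hashCode());
        if (offsets == null) {
            return false; // no record with this hash code, so no disk access needed
        }
        // Hash codes are not unique, so confirm against the stored record.
        for (long offset : offsets) {
            file.seek(offset);
            if (file.readUTF().equals(entry)) {
                return true;
            }
        }
        return false;
    }

    // Appends the entry unless an equal one is already stored; mirrors Set.add.
    boolean add(String entry) throws IOException {
        if (contains(entry)) {
            return false;
        }
        long offset = file.length();
        file.seek(offset);
        file.writeUTF(entry);
        remember(entry.hashCode(), offset);
        return true;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Only entries whose hash codes collide cost a disk read, which is why the questions about equality tests and hash quality matter. With a few million entries the in-memory index itself grows to tens of megabytes or more, so this is closer to the tens-of-thousands case than to a real database.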

Matt Humphrey matth@ivizNOSPAM.com  http://www.iviz.com/
Mark Rafn - 24 Oct 2006 18:00 GMT
>I need to be able to use the Random Access File class as a Set class so
>there will be no duplicate entries.  I thought about building a Random
>Access File that scans itself and won't add a similar entry but I have
>a few million unique entries of which many could be similar.

How big is your dataset?  Don't overlook the idea of just dumping it all into
a HashSet or TreeSet in memory (and deciding which to use will be useful even
if you decide to go disk-based).  Modern systems with multiple GiB of RAM can
handle things that would have been lunacy a few years ago.  
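
As a small illustration of the in-memory route, assuming the entries are strings (the original post does not say what they are), both standard set classes reject duplicates through the return value of add. The class and method names below are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class InMemorySetDemo {
    // Adds each entry to the set and returns how many were rejected as duplicates.
    static int addAllCountingDuplicates(Set<String> set, String... entries) {
        int duplicates = 0;
        for (String entry : entries) {
            // Set.add returns false when an equal element is already present.
            if (!set.add(entry)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String[] entries = {"alpha", "beta", "alpha", "gamma", "beta"};

        // HashSet: expected constant-time add/contains, no defined iteration order.
        Set<String> hashed = new HashSet<>();
        int rejected = addAllCountingDuplicates(hashed, entries);
        System.out.println(rejected + " duplicates, " + hashed.size() + " unique");

        // TreeSet: logarithmic add/contains, iterates in sorted order.
        Set<String> sorted = new TreeSet<>();
        addAllCountingDuplicates(sorted, entries);
        System.out.println(sorted);
    }
}
```

The choice carries over to disk: a HashSet corresponds to a hashed file, a TreeSet to a B-tree style structure that also gives ordered traversal.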

If you intend to scale to many gigs, then on-disk solutions are needed.  Look
into Sleepycat (Berkeley DB) or some other disk hashing or tree storage system.
Don't try to write it yourself unless you're doing it as a learning project and
don't mind making bunches of mistakes.
--
Mark Rafn    dagon@dagon.net    <http://www.dagon.net/>

