Java Forum / General / June 2006

Adding and deleting files in an atomic way

Chris - 25 Jun 2006 03:10 GMT
I need to be able to have multiple processes add and delete files in a
directory in a completely atomic way. I can't quite figure out how to do it.

I have three independent processes. One adds small files to a directory
(the "Adder"). Another merges the small files into big ones, and then
deletes the small ones (the "Merger"). A third simply reads the files in
a random way (the "Reader"). The Reader is heavily multi-threaded --
lots of reading might be going on simultaneously.

The files are read-only. There will be a maximum of a few hundred files
at any given time.

These different functions may or may not be running in the same JVM. It
is possible that multiple JVMs will be hitting the same directory,
possibly ones running on different machines all hitting a shared drive.

When a Reader thread starts to read files, the list of files must not
change until it's done. The file-reading process takes at most a second
or two. If the Merger wants to delete a file during that time, it must wait.

The Adder process must be able to notify the Reader and Merger processes
that a new file has been added. The Merger must be able to notify the
Reader that the current list of files has changed, so that the next time
the Reader starts a new thread, it uses the most current list.

I'm guessing that I might be able to do all this by having a plain text
file in the directory that lists the "current" files, and just have the
Adder and Merger processes put an exclusive file lock on it whenever the
list needs to change. The Adder and Merger can create any new files with
a .tmp extension, and then rename them in a very fast operation to make
them live.
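
A minimal sketch of this idea, assuming Java 7 or later for java.nio.file
(newer than this thread). The class name ManifestDirectory and the file
names "current.list" and "current.lock" are hypothetical. Whether an
atomic rename and a FileLock behave as needed on a shared network drive
depends on the file system, so treat this as a starting point rather than
a guarantee.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class ManifestDirectory {
    // A FileLock only excludes other processes; threads inside this JVM
    // are serialized with an ordinary monitor instead.
    private static final Object JVM_LOCK = new Object();

    private final Path dir;
    private final Path lockFile;  // hypothetical name: current.lock
    private final Path manifest;  // hypothetical name: current.list

    ManifestDirectory(Path dir) {
        this.dir = dir;
        this.lockFile = dir.resolve("current.lock");
        this.manifest = dir.resolve("current.list");
    }

    /** Writes the data under a .tmp name, renames it live, then lists it. */
    void add(String name, byte[] data) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Files.write(tmp, data); // slow part happens outside the lock
        synchronized (JVM_LOCK) {
            try (FileChannel ch = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = ch.lock()) { // exclusive; blocks other processes
                Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
                List<String> names = new ArrayList<>(readManifest());
                names.add(name);
                // Rewrite the list under a temp name and swap it in, so a
                // Reader sees either the old list or the new one. Replacing
                // an existing target with ATOMIC_MOVE is implementation
                // specific according to the Files.move documentation.
                Path next = dir.resolve("current.list.tmp");
                Files.write(next, names, StandardCharsets.UTF_8);
                Files.move(next, manifest, StandardCopyOption.ATOMIC_MOVE);
            }
        }
    }

    /** Returns the current list of live file names (empty if none yet). */
    List<String> readManifest() throws IOException {
        if (!Files.exists(manifest)) {
            return new ArrayList<>();
        }
        return Files.readAllLines(manifest, StandardCharsets.UTF_8);
    }
}
```

A Reader thread would call readManifest() once when it starts and work
from that snapshot. This sketch does not address the crash and
delete-while-reading questions raised in the next paragraph.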

I haven't figured out how to handle it, though, if the system crashes
while the Merger is renaming or deleting files, or how to prevent files
from being deleted while the Reader is using them (how do we know when
the various Reader threads have finished with a file?). I'm hoping that
I won't need to implement some kind of transaction log with commit/rollback.

Any thoughts appreciated.
Andrey Kuznetsov - 25 Jun 2006 03:46 GMT
>I need to be able to have multiple processes add and delete files in a
>directory in completely atomic way. Can't quite figure out how to do it.
[quoted text clipped - 11 lines]
> possible that multiple JVMs will be hitting the same directory, possibly
> ones running on different machines all hitting a shared drive.

If they run on the same machine they could communicate through a socket;
for different machines you will need some kind of server.

Andrey

http://uio.imagero.com Unified I/O for Java
http://reader.imagero.com Java image reader
http://jgui.imagero.com Java GUI components and utilities

Gijs Peek - 25 Jun 2006 08:47 GMT
> I need to be able to have multiple processes add and delete files in a
> directory in completely atomic way. Can't quite figure out how to do it.
[quoted text clipped - 37 lines]
>
> Any thoughts appreciated.

I think you are dealing with the producer/consumer problem here. There is
lots of information on that problem on the web. It is usually solved using
semaphores (supported in Java by the class java.util.concurrent.Semaphore
since Java 1.5).
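
As a rough illustration, here is a counting semaphore used as a gate so
that a file-list change waits for active Reader threads. The class name
FileListGate and the MAX_READERS bound are assumptions made for the
sketch. Note that java.util.concurrent.Semaphore only coordinates threads
inside one JVM, so it does not cover the multi-process case by itself.

```java
import java.util.concurrent.Semaphore;

final class FileListGate {
    // Assumed upper bound on concurrent Reader threads.
    private static final int MAX_READERS = 64;

    // Fair ordering keeps a waiting Merger from being starved by Readers.
    private final Semaphore permits = new Semaphore(MAX_READERS, true);

    /** A Reader holds one permit for the duration of its read. */
    void read(Runnable readFiles) throws InterruptedException {
        permits.acquire();
        try {
            readFiles.run();
        } finally {
            permits.release();
        }
    }

    /** The Adder or Merger takes every permit, so it waits for all Readers. */
    void changeFileList(Runnable change) throws InterruptedException {
        permits.acquire(MAX_READERS);
        try {
            change.run();
        } finally {
            permits.release(MAX_READERS);
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```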
steve - 25 Jun 2006 11:47 GMT
>> I need to be able to have multiple processes add and delete files in a
>> directory in completely atomic way. Can't quite figure out how to do it.
[quoted text clipped - 42 lines]
> semaphores (supported in java in the class java.util.concurrent.Semaphore
> since java 1.5).

If it's across different processes or different JVMs, the way I currently
do this is with a database (because it is already there). As each stage
progresses I set a flag (actually a counter), and each process only looks
for records whose flags fall within a certain range.

The advantage of doing this is that if a process dies or does not
complete, the file is not flagged in the database as needing action by
the next process.

Also, the system can be restarted without loss of position or status of
the files under processing.

As you are running different processes (on different machines), you will
need some sort of client/server system. But be careful what you store in
RAM, and ensure that the system can recover automatically should anything
crash, blow up, lose power, or suffer any other such problem.

Steve
Filip Larsen - 25 Jun 2006 11:39 GMT
> I need to be able to have multiple processes add and delete files in a
> directory in completely atomic way. Can't quite figure out how to do it.
[quoted text clipped - 13 lines]
> the various Reader threads have finished with a file?). I'm hoping that
> I won't need to implement some kind of transaction log with commit/rollback.

If you associate a unique file name pattern with *all* the different
processes or states that require exclusive access to a file, you can
signal ownership by file name and acquire or hand over ownership using
rename, and you can skip using file locks. This obviously requires that
all processes cooperate and that file renames are atomic and
detectable on the file systems you are using. This way, file renames
work much like acquiring semaphores.

For instance, if your Merger somehow decides it wants to delete file1, it
will periodically try to rename it to file1.delete and only continue
with deleting after a successful rename. The Merger component can keep
the list of files that should be deleted but which it does not yet own in
a persistent list. After a crash the Merger can restore that list and
also delete all files that match *.delete.
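
The Merger side of this could be sketched as follows, assuming Java 7 or
later for java.nio.file. The class and method names are made up for the
example, and the persistent list of pending deletes is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class MergerDeleter {
    /**
     * Tries once to claim the file by renaming it to name.delete and then
     * deletes it. Returns false if the claim failed, in which case the
     * caller keeps the file on its pending list and retries later.
     */
    static boolean tryDelete(Path file) throws IOException {
        Path claimed = file.resolveSibling(file.getFileName() + ".delete");
        try {
            Files.move(file, claimed, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            // Base name is missing (someone else owns it or it is gone),
            // or the file system refused an atomic rename.
            return false;
        }
        Files.deleteIfExists(claimed);
        return true;
    }

    /** After a crash: finish deletes that were claimed but not completed. */
    static int recover(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> claimed = Files.newDirectoryStream(dir, "*.delete")) {
            for (Path p : claimed) {
                if (Files.deleteIfExists(p)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```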

When a Reader wants to read a file it likewise tries to rename the file
in a loop, and only when the rename succeeds can it read the file. If the
rename fails it probably also has to check that the file has not been
deleted (i.e. "file1.*" should match something). When the Reader is done
reading it renames the file back to its base name, signaling that the
file is free to use.
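
A sketch of the Reader side, again assuming Java 7 or later. The names
ReaderClaim, acquire, and release, and the retry parameters, are
assumptions for illustration. The presence check is best effort, since
another process can rename the file between the failed move and the
directory listing. Note that this scheme gives one Reader exclusive use
of a file at a time.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ReaderClaim {
    /**
     * Tries to rename file to file.owner. Returns the claimed path, or
     * null if the file stayed owned by someone else for all attempts.
     * Rethrows NoSuchFileException if the file appears to be deleted.
     */
    static Path acquire(Path file, String owner, int attempts, long sleepMillis)
            throws IOException, InterruptedException {
        Path claimed = file.resolveSibling(file.getFileName() + "." + owner);
        for (int i = 0; i < attempts; i++) {
            try {
                return Files.move(file, claimed, StandardCopyOption.ATOMIC_MOVE);
            } catch (NoSuchFileException e) {
                if (!stillPresent(file)) {
                    throw e; // neither "file1" nor any "file1.*" exists
                }
            }
            Thread.sleep(sleepMillis);
        }
        return null;
    }

    /** Renames the claimed file back to its base name. */
    static void release(Path claimed, Path file) throws IOException {
        Files.move(claimed, file, StandardCopyOption.ATOMIC_MOVE);
    }

    /** True if the base name or any owned variant of it is in the directory. */
    static boolean stillPresent(Path file) throws IOException {
        String base = file.getFileName().toString();
        Path dir = file.toAbsolutePath().getParent();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, p -> {
                String n = p.getFileName().toString();
                return n.equals(base) || n.startsWith(base + ".");
            })) {
            return matches.iterator().hasNext();
        }
    }
}
```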

If you have a producer that hands over files to a consumer you can have
the producer rename the file to signal that the file is ready for
consumption. If you only have one consumer then it can assume it owns
files renamed for it, but if you have multiple consumers they should
each use a unique rename to acquire the file first.

If you have processes that can crash and never come back, you may have
to use a clean-up process that detects files owned by such crashed
processes. One way could be to have each process regularly stamp a
unique file, e.g. "process.reader1", so that all files owned by reader1
("*.reader1") can be released if the stamp gets too old. This is similar
to other lease systems used in distributed systems. You can even include
the lease time in the stamp file's contents, file name, or timestamp if
you need different lease periods.
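
One possible shape for the stamp file and the clean-up pass, assuming
Java 8 or later. The names LeaseCleaner, heartbeat, and releaseExpired
are hypothetical, and the sketch compares the stamp's modification time
with the local clock, which assumes the machines sharing the drive have
reasonably synchronized clocks.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class LeaseCleaner {
    /** Called regularly by the owner: refreshes its process.owner stamp. */
    static void heartbeat(Path dir, String owner) throws IOException {
        Path stamp = dir.resolve("process." + owner);
        try {
            Files.createFile(stamp);
        } catch (FileAlreadyExistsException alreadyThere) {
            // Normal case after the first heartbeat.
        }
        Files.setLastModifiedTime(stamp, FileTime.fromMillis(System.currentTimeMillis()));
    }

    /**
     * If the owner's stamp is older than the lease, renames every
     * "name.owner" file back to "name". Returns the number released.
     */
    static int releaseExpired(Path dir, String owner, Duration lease) throws IOException {
        String stampName = "process." + owner;
        Path stamp = dir.resolve(stampName);
        long ageMillis = System.currentTimeMillis()
                - Files.getLastModifiedTime(stamp).toMillis();
        if (ageMillis <= lease.toMillis()) {
            return 0; // owner still looks alive
        }
        String suffix = "." + owner;
        // Collect first so the directory is not modified while iterating.
        List<Path> owned = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, p -> {
                String n = p.getFileName().toString();
                return n.endsWith(suffix) && !n.equals(stampName);
            })) {
            for (Path p : entries) {
                owned.add(p);
            }
        }
        for (Path p : owned) {
            String n = p.getFileName().toString();
            Path base = p.resolveSibling(n.substring(0, n.length() - suffix.length()));
            Files.move(p, base, StandardCopyOption.ATOMIC_MOVE);
        }
        return owned.size();
    }
}
```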

Regards,
Filip Larsen

Mark Space - 25 Jun 2006 19:29 GMT
> I need to be able to have multiple processes add and delete files in a
> directory in completely atomic way. Can't quite figure out how to do it.
>
> I have three independent processes. One adds small files to a directory

This is a classic Reader-Writer problem.  Your Reader process(es) is a
Reader, and your Adder and Merger are both writers.

You should implement this, imho, with system level file locking.  NIO
gives you access to the system file locking mechanisms through
java.nio.channels.FileChannel (lock() and tryLock()) and FileLock.

If you are going to allow *completely* different processes to access
these files, you need to think about user privileges too.  For example,
if someone comes along after you and starts opening and modifying these
files in a fourth process (say, a Perl script somewhere) you need to
make sure you can lock their script out when you need to (or wait
indefinitely for a lock until the script is done).

If you are going to be the only one accessing these files, then you
could adopt some conventions instead of asserting complete control over
the whole file.  For example, a common "flag" to lock a file is to just
lock the first byte.  This indicates the file is in use,
and no one else should use it.
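
A minimal sketch of the first-byte convention with NIO, assuming Java 7
or later for FileChannel.open. The class and method names are made up
for the example. On many platforms these locks are advisory, so this
only works if every process follows the same convention.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FirstByteLock {
    /**
     * Runs the action while holding an exclusive lock on byte 0 of the
     * file. Returns false without running it if the byte is already
     * locked by another process or by another thread in this JVM.
     */
    static boolean runIfUnlocked(Path file, Runnable action) throws IOException {
        // An exclusive lock needs a channel that is open for writing.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock;
            try {
                lock = ch.tryLock(0L, 1L, false); // position 0, size 1, exclusive
            } catch (OverlappingFileLockException heldInThisJvm) {
                return false;
            }
            if (lock == null) {
                return false; // held by another process
            }
            try {
                action.run();
                return true;
            } finally {
                lock.release();
            }
        }
    }
}
```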

You might be able to "lock" a whole subdirectory, which *might* prevent
new files from being created.  I'm not sure about this though, check NIO
and your OS.

You may have to modify your expectations a bit to use file locking, but
I think you'll have a more robust solution at the end.

> I haven't figured out how to handle it, though, if the system crashes
> while the Merger is renaming or deleting files, or how to prevent files
> from being deleted while the Reader is using them (how do we know when
> the various Reader threads have finished with a file?). I'm hoping that
> I won't need to implement some kind of transaction log with
> commit/rollback.

I think you'll have to create some sort of journal or log, and roll back
or proceed forward during recovery.  This will have to be integrated
into the start-up of the system, or maybe integrated into the start-up
of the application you are building.  I can't really give you more help
here, sorry; I don't know offhand of any auto-recovery type objects for
Java.
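
To make the journal idea concrete, here is one very small roll-forward
sketch, assuming Java 7 or later. Everything in it is an assumption made
for illustration: the class name MergeJournal, the journal file name
"merge.journal", and the rule that the journal's existence is the commit
point. It omits forcing data to disk and does not clean up a stale .tmp
file left by a crash before the journal was written.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class MergeJournal {
    static final String JOURNAL = "merge.journal"; // hypothetical name

    /** Merges the small files into bigName, then removes the small files. */
    static void merge(Path dir, List<String> smallNames, String bigName) throws IOException {
        // 1. Build the merged file under a temporary name.
        Path tmp = dir.resolve(bigName + ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            for (String small : smallNames) {
                Files.copy(dir.resolve(small), out);
            }
        }
        // 2. Record the intent. First line: merged file. Rest: files to delete.
        List<String> lines = new ArrayList<>();
        lines.add(bigName);
        lines.addAll(smallNames);
        Path journalTmp = dir.resolve(JOURNAL + ".tmp");
        Files.write(journalTmp, lines, StandardCharsets.UTF_8);
        Files.move(journalTmp, dir.resolve(JOURNAL), StandardCopyOption.ATOMIC_MOVE);
        // 3. Carry out the intent with the same code used after a crash.
        recover(dir);
    }

    /** Call at start-up: finishes a merge whose journal was written. */
    static void recover(Path dir) throws IOException {
        Path journal = dir.resolve(JOURNAL);
        if (!Files.exists(journal)) {
            return; // nothing was committed
        }
        List<String> lines = Files.readAllLines(journal, StandardCharsets.UTF_8);
        String bigName = lines.get(0);
        Path tmp = dir.resolve(bigName + ".tmp");
        if (Files.exists(tmp)) {
            Files.move(tmp, dir.resolve(bigName), StandardCopyOption.ATOMIC_MOVE);
        }
        for (String small : lines.subList(1, lines.size())) {
            Files.deleteIfExists(dir.resolve(small)); // safe to repeat
        }
        Files.delete(journal);
    }
}
```

Each step in recover() can be repeated safely, so a crash during
recovery is handled by running recovery again at the next start-up.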

