> witht...@gmail.com wrote:
> > Hello all,
[quoted text clipped - 34 lines]
>
> Patricia
Patricia,
Two files would be considered identical if their contents are
identical. Here's one example of what I would like to accomplish:
Suppose we have a file AFILE.dat on the partition mounted at
/mnt/adrive. The program first becomes aware of the file by the user
informing it of /mnt/adrive/AFILE.dat.
Sometime after the program terminates, the user decides to copy
AFILE.dat to some other partition, for example, /other. The Unix path
to the file is now /other/AFILE.dat (although it could just as easily
be /other/RENAMED.dat, as far as the program should care).
The next time the program runs, is there any way to determine that the
file that used to be at /mnt/adrive/AFILE.dat is now at
/other/AFILE.dat?
The Windows API call that I linked seems to offer a way to accomplish
this by hooking custom code into the system's copy command. The
hooked-in code could update the program's data structure with the new
path.
Do other operating systems offer methods like this, let alone ones
that would be easily accessible to Java?
Thanks for your input,
-Jason
Eric Sosman - 19 Feb 2007 19:47 GMT
> [...]
> Two files would be considered identical if their contents are
> identical.
So, for example, if there are two hundred forty-seven
zero-length files at various places in your file system,
they are all the same file?
> Here's one example of what I would like to accomplish:
>
[quoted text clipped - 6 lines]
> to the file is now /other/AFILE.dat (although it could just as easily
> be /other/RENAMED.dat, as far as the program should care).
You said "copy" (which is what motivated my semi-rhetorical
question above). After the copy, the file system holds two
files with different complete paths but identical content. Why
should your program care? What are you trying to do with these
identical-content copies that requires you to find both -- or
"all" -- of them?
> The next time the program runs, is there any way to determine that the
> file that used to be at /mnt/adrive/AFILE.dat is now at
> /other/AFILE.dat?
... and at /mnt/adrive/AFILE.dat as well, perhaps. I know
of no reasonable[*] way to accomplish this.
[*] You might, at startup, scan every accessible file in the
file system and check for duplications. Computing a checksum for
every file and entering them all in a Map keyed on the checksum
might make this feasible, but it still wouldn't be reasonable.
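The checksum-and-Map scan described in the footnote could be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the class name ChecksumIndex, its method names, and the choice of SHA-256 are all assumptions, and equal checksums still only suggest, rather than prove, equal content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper: groups the regular files under a directory by checksum.
final class ChecksumIndex {

    /** Returns the SHA-256 digest of a file's contents as a hex string. */
    static String checksum(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Maps each checksum to every regular file under root that has it.
     * Unreadable directories surface as exceptions; a real scan would
     * need to decide how to skip them.
     */
    static Map<String, List<Path>> index(Path root) throws IOException {
        Map<String, List<Path>> byChecksum = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    byChecksum.computeIfAbsent(checksum(p), k -> new ArrayList<>()).add(p);
                }
            }
        }
        return byChecksum;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChecksumIndex <directory>");
            return;
        }
        // Print only the checksums shared by more than one file.
        index(Path.of(args[0])).forEach((sum, files) -> {
            if (files.size() > 1) {
                System.out.println(sum + " " + files);
            }
        });
    }
}
```

Every file has to be read in full, which is why this is feasible on a small tree but not a reasonable thing to do to a whole file system at startup. Note also that all zero-length files land in the same group.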
> The Windows API call that I linked seems to offer a way to accomplish
> this by hooking custom code into the system's copy command. The
> hooked-in code could update the program's data structure with the new
> path.
Windows can invoke these hooks even when your program isn't
running?
> Do other operating systems offer methods like this, let alone ones
> that would be easily accessible to Java?
I've never heard of a file system with this property. It would
sort of vitiate the entire notion of a "backup copy," wouldn't it?

Eric Sosman
esosman@acm-dot-org.invalid
Patricia Shanahan - 19 Feb 2007 20:09 GMT
>> witht...@gmail.com wrote:
>>> Hello all,
[quoted text clipped - 33 lines]
> Two files would be considered identical if their contents are
> identical. Here's one example of what I would like to accomplish:
Under that definition, I don't think you can depend on tracking copies
and moves. There will be situations in which two files have identical
contents without any local copying relationship.
For example, the same program may be checked out more than once from a
version repository on a different system.
For the identical contents problem I would focus on calculating hashes,
and comparing them. If two files have identical hashes, there is at
least a possibility that they have identical content, and you can
consider either more detailed hashing or direct comparison.
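As a rough sketch of that hash-then-compare idea: check sizes first, then a cheap hash, and only fall back to a direct byte comparison when both match. The class name ContentCompare, its method names, and the use of CRC32 as the cheap hash are assumptions made for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper: decides whether two files have identical contents.
final class ContentCompare {

    /** Cheap 32-bit hash of a file's contents; collisions are possible. */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Direct byte-by-byte comparison, used to confirm a hash match. */
    static boolean sameBytes(Path a, Path b) throws IOException {
        try (InputStream x = new BufferedInputStream(Files.newInputStream(a));
             InputStream y = new BufferedInputStream(Files.newInputStream(b))) {
            int cx;
            int cy;
            do {
                cx = x.read();
                cy = y.read();
                if (cx != cy) {
                    return false; // differing byte, or one file ended first
                }
            } while (cx != -1);
            return true; // both streams ended together with no difference
        }
    }

    /** Size check, then hash check, then the definitive comparison. */
    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        if (crc32(a) != crc32(b)) {
            return false;
        }
        return sameBytes(a, b);
    }
}
```

For a single pair of files the hash step saves nothing, since both files are read anyway. It pays off when the program stores each file's size and hash between runs and only needs to read candidates whose stored values match.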
Patricia
a249@mailinator.com - 19 Feb 2007 21:04 GMT
> Sometime after the program terminates, the user decides to copy
> AFILE.dat to some other partition, for example, /other.
That's the user's God-given right and really not your business. Don't
give the user a file of his own if he isn't supposed to do with it
whatever he pleases. Put the data in a database, or put the data in a
file outside the user's home directory that requires elevated rights to
work with. Then give only the application, not the user, the necessary
rights to work with the database or file.
> The Windows API call that I linked seems to offer a way to accomplish
> this by hooking custom code into the system's copy command.
That scheme will break in hundreds of ways. It will, for example,
break the moment the user uses a tool which doesn't use that particular
copy command. All it takes is one tool that copies the data by
individual read/write operations and deletes the original after the
copy is complete. It will also break if the user decides to put it in
a ZIP archive, delete the original, then unpack the archive in some
other directory. It will break if the user creates a backup,
deletes the original, and then restores the backup in some other
directory.