> It is easy for files which are binary equal or can somehow be matched
> line by line with some diff-like code, but other than that I couldn't
> really find anything in Java or otherwise.
If you had a way of chunking the file, e.g. sentences in a text file
or lines in a CSV file, then you could compute a hashCode for each
"sentence".
You could then process your two files and create a list of hash codes
for each. Sort each list, then compare them, counting matches. That
gives you a rough idea of how many sentences they have in common and
how many are unique to each. Compute the ratio of common sentences to
total unique sentences. Ignore collisions (two sentences, either the
same or different, in the same file producing the same hash code).
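
The hash, sort and count steps might look roughly like this in Java.
This is only a sketch: the class and method names (ChunkSimilarity,
sortedHashes, similarity) are made up for illustration, it assumes the
caller has already chunked each file into a String[], and it takes
"total" to mean the number of entries in the merged sorted lists, so a
repeated sentence counts once per occurrence.

```java
import java.util.Arrays;

final class ChunkSimilarity {

    /** Hash every chunk with String.hashCode and return the hashes sorted ascending. */
    static int[] sortedHashes(String[] chunks) {
        int[] hashes = new int[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            // Assumption: surrounding whitespace is not significant.
            hashes[i] = chunks[i].trim().hashCode();
        }
        Arrays.sort(hashes);
        return hashes;
    }

    /** Ratio of matched hashes to all entries in the merged lists, from 0.0 to 1.0. */
    static double similarity(String[] chunksA, String[] chunksB) {
        int[] a = sortedHashes(chunksA);
        int[] b = sortedHashes(chunksB);
        int i = 0, j = 0, common = 0, total = 0;
        // Merge-style walk over the two sorted lists; collisions are ignored.
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                common++;   // present in both files
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;        // only in file A
            } else {
                j++;        // only in file B
            }
            total++;
        }
        // Whatever is left over in either list has no partner in the other.
        total += (a.length - i) + (b.length - j);
        return total == 0 ? 1.0 : (double) common / total;
    }
}
```

For a line-oriented file you could pass in text.split("\n") for each
file. Two files holding the same lines in a different order come out
as 1.0, which is the point of sorting the hashes first.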
It is rude to ask questions in one group with followup to another.
I was thinking of some logic like this for creating delta files that
could efficiently transmit changes to text files that have mainly been
reordered.
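
One way such a delta might work, purely as a sketch: describe the new
file as a sequence of "copy line N of the old file" references, with
literal text only for lines the old file does not contain. The class
ReorderDelta and the "C"/"L" op encoding below are invented for
illustration and are not an existing format. A HashMap keyed on the
line text stands in for the hash list, so collisions get resolved by
equals() rather than ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReorderDelta {

    /** Encode newLines as "C<index>" copies from oldLines or "L<text>" literals. */
    static List<String> encode(List<String> oldLines, List<String> newLines) {
        // Remember the first position at which each old line occurs.
        Map<String, Integer> indexOf = new HashMap<>();
        for (int i = 0; i < oldLines.size(); i++) {
            indexOf.putIfAbsent(oldLines.get(i), i);
        }
        List<String> delta = new ArrayList<>();
        for (String line : newLines) {
            Integer idx = indexOf.get(line);
            // A known line costs only a short reference; an unknown one is sent whole.
            delta.add(idx != null ? "C" + idx : "L" + line);
        }
        return delta;
    }

    /** Rebuild the new file from the old lines plus the delta ops. */
    static List<String> apply(List<String> oldLines, List<String> delta) {
        List<String> result = new ArrayList<>();
        for (String op : delta) {
            if (op.charAt(0) == 'C') {
                result.add(oldLines.get(Integer.parseInt(op.substring(1))));
            } else {
                result.add(op.substring(1));
            }
        }
        return result;
    }
}
```

When the file has mostly been reordered, nearly every op is a short
copy reference, so the delta stays small however far the lines moved.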

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com