Java Forum / General / February 2007
Hashing function different values on different OS ?
Lawrence - 17 Feb 2007 05:21 GMT
Hi all, I use a simple function to create a hash of a file using SHA for a utility I'm writing.
The function is here:

    public static String digest(File file) throws FileNotFoundException, IOException, NoSuchAlgorithmException {
        MessageDigest sha;
        sha = MessageDigest.getInstance("sha");
        DigestInputStream din = new DigestInputStream(new BufferedInputStream(new FileInputStream(file)), sha);
        while (din.read() != -1) {}
        din.close();
        return sha.digest().toString();
    }
I send a file over a network (LAN) between a Mac and a Windows computer, both using my application. I sent zip files, mp3s, jpegs, bmps, txt, tiff, gif, and videos and it all worked perfectly, but the resulting hash is different for the same file. How weird is that? Maybe the name of the file matters? It shouldn't.
Luc The Perverse - 17 Feb 2007 06:34 GMT
> Hi all, I use a simple function to create a hash of a file using sha
> for
[quoted text clipped - 21 lines]
> outcoming hash is different for the same file.
> How weird is that ?Maybe the name of the file matters ?It shouldn't.
If you weren't using Java I'd say it could be an endian problem.
Test small file and dump hex to screen and compare it.
I'm intrigued ;)
-- LTP
:) Richter~9.6 - 17 Feb 2007 10:10 GMT
> Hi all, I use a simple function to create a hash of a file using sha
> for
[quoted text clipped - 21 lines]
> outcoming hash is different for the same file.
> How weird is that ?Maybe the name of the file matters ?It shouldn't.
Have you tried zipping up the contents before moving it and unzipping it on the target machine?
Regards, Richard
Alex Hunsley - 17 Feb 2007 12:56 GMT
> Hi all, I use a simple function to create a hash of a file using sha
> for
[quoted text clipped - 17 lines]
> I send a file over a network (LAN) between a mac and a windows
> computer, both using my application.
Like Luc, I was suspecting endian problems for a moment, but Java's standard streams assume network byte order (big endian), so Java operating at both ends should match up OK. Could it be something to do with how MessageDigest may be doing any seeding?
lex
> I sent zip files, mp3s, jpegs, bmps, txt, tiff, gif, and videos and it
> all worked perfectly, but the
> outcoming hash is different for the same file.
> How weird is that ?Maybe the name of the file matters ?It shouldn't.
Eric Sosman - 17 Feb 2007 13:53 GMT
> Hi all, I use a simple function to create a hash of a file using sha
> for
[quoted text clipped - 21 lines]
> outcoming hash is different for the same file.
> How weird is that ?Maybe the name of the file matters ?It shouldn't.
Have you examined the way you "send the file" over the network? Note that Mac and Windows use different conventions to mark the ends of lines in text files, so "the same" text will be represented by different byte sequences on the two machines. Transport mechanisms like FTP make the conversion automatically, so you may not have noticed it happening.
 Signature Eric Sosman esosman@acm-dot-org.invalid
Paul Tomblin - 17 Feb 2007 14:54 GMT
In a previous article, Eric Sosman <esosman@acm-dot-org.invalid> said:
>network? Note that Mac and Windows use different conventions
>to mark the ends of lines in text files, so "the same" text
>will be represented by different byte sequences on the two
>machines. Transport mechanisms like FTP make the conversion
>automatically, so you may not have noticed it happening.
Just to expand on that a bit, if you transfer using FTP and tell it that the file is ASCII, it will convert the ends of lines, and if you tell it that it's binary it won't. Some FTP clients auto-detect what you're sending and set the binary/ASCII flag correctly, but many don't, and if you send a binary file without telling it that it's binary, it will end up badly corrupted.
 Signature Paul Tomblin <ptomblin@xcski.com> http://blog.xcski.com/ The way NT mounts filesystems is something I'd expect to find in a barnyard or on a stock-breeding farm. -- Mike Andrews
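The line-ending point can be checked directly. The sketch below hashes the same two lines of text once with LF and once with CRLF line endings; the class name `LineEndingDigest`, the helper `sha1Hex` and the sample strings are made up for illustration. Since the byte sequences differ, the digests differ as well.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class LineEndingDigest {
    // Returns the SHA-1 digest of the given bytes as lowercase hex.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same text, different line-ending conventions (hypothetical sample data).
        byte[] lf = "line one\nline two\n".getBytes(StandardCharsets.US_ASCII);
        byte[] crlf = "line one\r\nline two\r\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(sha1Hex(lf));
        System.out.println(sha1Hex(crlf));
        System.out.println(sha1Hex(lf).equals(sha1Hex(crlf))); // false
    }
}
```

A transfer that rewrites line endings therefore changes the hash even though the text looks the same in an editor.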
Lawrence - 17 Feb 2007 17:27 GMT
To answer your question, let me explain. I transfer the file using my own Java program: I read simple chunks of bytes and save them to new files. Since both client and server are in Java and written by me, I believe there shouldn't be any endian problem of any sort.

In the end the program is pretty simple. I make a hash code and send it along with some other info such as file name and file size. The client then connects back and requests the file by sending the hash; I look the file up in a HashMap and send it in chunks of bytes. If a chunk is not completely filled by the InputStream, I write only the data that was actually read, on both client and server. When the transfer is complete, the client checks that the received file has the same hash the server initially stated. This is always false, for any file type. But I tried many types, including dmg disk images, rar files, jpegs, videos and zips, and they all work afterwards.

I'm going to send a very small file and check the hex dumps on both sides. Will let you know ..
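The chunked transfer described here, writing only the bytes that were actually read, could be sketched roughly as follows. This is not the poster's code, which was never shown; the class name `ChunkCopy`, the method `copy` and the 4096-byte buffer size are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ChunkCopy {
    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096]; // buffer size chosen arbitrarily
        long total = 0;
        int n;
        // read() may fill only part of the buffer; n is the count actually read.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // write only the bytes that were read
            total += n;
        }
        out.flush();
        return total;
    }
}
```

Writing `out.write(chunk)` instead would append stale bytes from the previous chunk whenever a read comes back short, which corrupts the received file.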
> In a previous article, Eric Sosman <esos...@acm-dot-org.invalid> said:
> [quoted text clipped - 10 lines]
> you send a binary file without telling it that it's binary, it will end up
> badly corrupted.
Lawrence - 17 Feb 2007 17:49 GMT
> To answer your question let me explain.
> I transfer the file using my own java program, I use simple chunks of
Sorry for the bad quoting before. I just opened the transferred file on both sides with a hex editor, and the two copies are equal. So the problem is in the function. For a file containing the 4 characters "CIAO", hex [ 43 49 41 4F ], the hash on the Mac is [B@425743. For the same file on the Windows machine it is [B@472d48.
Done again on the Mac it is [B@238016. Done again on the Windows machine it is [B@3ae941.
I don't understand .. how is this possible?
Maybe there is something wrong with converting an array of bytes to a string? That is the return statement in the method I posted.
Thanks folks
Mike Schilling - 17 Feb 2007 20:17 GMT
>> To answer your question let me explain.
>> I transfer the file using my own java program, I use simple chunks of
[quoted text clipped - 12 lines]
>
> I don't understand .. how is this possible ?
"[B@425743" means "This is a byte array, and it's object number 425743 in the JVM". It doesn't say anything about the contents of the byte array. I presume it comes from code like

    byte[] barr;
    System.out.println(barr.toString());

Try something like

    for (int i = 0; i < barr.length; i++) {
        System.out.print(Integer.toHexString(barr[i] & 0xFF));
        System.out.print(", ");
    }

to see what the byte array contains.
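To make the "[B@..." output concrete: arrays inherit `Object.toString()`, which returns the class name ("[B" for `byte[]`), an "@", and the object's hash code in hex. For an array that hash code is an identity hash unrelated to the contents. The small sketch below (class and method names are illustrative) shows two arrays with equal contents printing differently by default, while `Arrays.toString` and `Arrays.equals` look at the bytes themselves.

```java
import java.util.Arrays;

class ArrayToStringDemo {
    // Rebuilds by hand what byte[].toString() returns.
    static String defaultToString(byte[] arr) {
        return arr.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(arr));
    }

    public static void main(String[] args) {
        byte[] a = {0x43, 0x49, 0x41, 0x4F}; // "CIAO"
        byte[] b = {0x43, 0x49, 0x41, 0x4F}; // same contents, different object

        // Identity-based; the hex part varies between objects and between runs.
        System.out.println(a.toString());
        System.out.println(b.toString());

        // Content-based views of the same arrays.
        System.out.println(Arrays.toString(a)); // [67, 73, 65, 79]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```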
Lothar Kimmeringer - 17 Feb 2007 18:08 GMT
> return sha.digest().toString();
byte[].toString doesn't work the way you think. You have to do something like this:

    byte[] digest = sha.digest();
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < digest.length; i++) {
        if ((digest[i] & 0xff) < 16) {
            sb.append("0");
        }
        sb.append(Integer.toHexString(digest[i] & 0xff));
        sb.append(" ");
    }
    return sb.toString();
I wrote this by hand without checking for errors, so the correct result might be different.
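Putting the pieces together, a corrected version of the original method might look like the sketch below. It is one possible arrangement rather than the code the poster ended up with: the class name `FileDigest` and the helper `toHex` are made up, the algorithm is requested under its standard name "SHA-1" instead of "sha", the hex digits are emitted without separating spaces, and the file is read in blocks rather than byte by byte.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileDigest {
    // Encodes each byte as exactly two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            int v = b & 0xff;        // unsigned value 0..255
            if (v < 16) {
                sb.append('0');      // pad single-digit values
            }
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    // Returns the SHA-1 digest of the file's bytes as a hex string.
    static String digest(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha.update(buf, 0, n); // feed only the bytes actually read
            }
        }
        return toHex(sha.digest());
    }
}
```

Because the result depends only on the file's bytes, the same file should give the same string on the Mac and on the Windows machine.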
BTW: When reading or writing binary data, don't use streams or Readers/Writers that convert data, such as PrintStream or InputStreamReader/OutputStreamWriter.
Regards, Lothar
 Signature Lothar Kimmeringer E-Mail: spamfang@kimmeringer.de PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)
Always remember: The answer is forty-two, there can only be wrong questions!
Lawrence - 17 Feb 2007 18:29 GMT
On Feb 17, 6:08 pm, Lothar Kimmeringer <news200...@kimmeringer.de> wrote:
> > return sha.digest().toString();
> [quoted text clipped - 14 lines]
> I wrote this by hand without checking for errors, so the
> correct result might be different.
Cool. I thought that converting an array to a string would always return the same value, but I forgot that arrays are objects, and that toString on an object involves other things such as its reference. I will test your code (but I need to look again at the shift operators and bitwise AND) :P
Lew - 17 Feb 2007 22:49 GMT Lothar Kimmeringer wrote:
>> sb.append(Integer.toHexString(digest[i] & 0xff));
> I need to have a look back to shift operator and bit wise and) :P
This use of the operator & is called "masking", and the int operand 0xff in this example is a "mask".
Only the bits in the other operand that match position with the 1s in the mask will make it through to the result. The rest are masked out, as with a resist in a circuit-board etching.
In the given example, the lowest byte of digest[i] will show up in the lowest byte of the argument to toHexString(), masked in by the 0xff; the upper bytes of the argument will all be zeroed. This has an effect of ensuring a positive argument to toHexString().
- Lew
Lawrence - 18 Feb 2007 10:28 GMT
[SNIP]
> the upper bytes
> of the argument will all be zeroed. This has an effect of ensuring a positive
> argument to toHexString().
[SNIP]
> - Lew
Wait, I thought something different. Hex represents at most 16 different combinations per digit, so two hex digits represent 256 combinations: 8 bits, 1 byte. Then it does some kind of implicit conversion, applying a bitwise AND with 0xFF, which is like a bit string of eight 1s. The result should be a number (what, hex or int or even a byte?) that, if it is smaller than 16, will have only one digit; therefore a 0 is added in front of the hex digit.
Am I wrong?
Lothar Kimmeringer - 18 Feb 2007 10:53 GMT
> Hex rappresent at most 16 different combinations per digit, so two hex
> digit rappresent 256 combination
Hex represents a value in base 16. One "digit" can therefore represent numbers from 0 to 15. How many "combinations" can be represented depends on the bit length. Integer (used here) can hold 32 bits, so a hex number can be up to 8 hex digits (aka nibbles) long.
> , 8 bits, 1 byte.
> Then it does some kind of implicit conversion applying and bit wise
> operation between
> 0xFF which is like a bit string of 8 1s.
The mask is there to convert the signed byte to an unsigned int value. Alternatively you can do a digest[i] + (digest[i] < 0 ? 256 : 0); but that makes it much harder to read and understand what is intended.
If you don't do this kind of thing and just call Integer.toHexString((int) digest[i]); a stored value of e.g. 255 will lead to the hex value ffffffff being returned. Why? If you assign 255 (0xff) to a byte, which is signed, the value will be -1 afterwards (that's what the bit pattern 0xff represents in a byte). If you just cast it to an int, the value is still -1; there are just more bits set (0xffffffff).
The construct (digest[i] & 0xff) tells the VM to widen digest[i] to int (0xffffffff) and do a bitwise AND with the value 0xff. The result is 0x000000ff, which is the same value that was set originally.
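The effect is easy to see with the byte 0xff itself. The sketch below (class and method names are illustrative) prints the sign-extended and the masked result side by side, and includes the addition-based alternative mentioned above for comparison.

```java
class MaskDemo {
    // Unsigned value of a byte via masking.
    static int viaMask(byte b) {
        return b & 0xff;
    }

    // The same result via the conditional addition.
    static int viaAddition(byte b) {
        return b + (b < 0 ? 256 : 0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xff; // bit pattern 11111111, which a signed byte reads as -1

        System.out.println(b);                               // -1
        System.out.println(Integer.toHexString(b));          // ffffffff (sign-extended)
        System.out.println(Integer.toHexString(viaMask(b))); // ff
        System.out.println(viaAddition(b));                  // 255
    }
}
```

On Java 8 and later, `Byte.toUnsignedInt(b)` performs the same conversion and states the intent in its name.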
> The result should be an number (what, hex or int or even a byte)
> that if is smaller than 16 means it will be of only one digit,
> therefore
> a 0 is added in front of the hex digit.
That's the first check. Alternatively the if-statement can be if(digest[i] >= 0 && digest[i] < 16) but again this is harder to read and understand two weeks later.
In C you would just use "unsigned byte" (I know byte doesn't exist in C, but I don't want to confuse things by starting to use char here). In Java you always have to do this kind of thing when handling unsigned data with signed types.
Regards, Lothar
Lew - 18 Feb 2007 13:06 GMT
Lawrence wrote:
>> Hex rappresent at most 16 different combinations per digit, so two hex
>> digit rappresent 256 combination
Don't confuse a numeric value with the String representation of that value.
>> Then it does some kind of implicit conversion applying and bit wise
>> operation between
>> 0xFF which is like a bit string of 8 1s.
0xff is a number, equal to 255. It is 32 bits long, not 8. The top 24 bits are 0.
>> The result should be an number (what, hex or int or even a byte)
In this case, an int. 0xff is an int, digest[i] is no wider than an int, so the result of & is an int.
>> that if is smaller than 16 means it will be of only one digit,
>> therefore
Digits only apply to the String form. The int form is always four bytes long.
>> a 0 is added in front of the hex digit.
In the String representation only.
You need to study types and numeric operations in Java.
- Lew
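The distinction between the numeric value and its String form can be illustrated with the value 10. In the sketch below (class and method names are made up), the int is the same throughout; only the text produced from it gains a leading zero. `String.format("%02x", ...)` is used here as a shorthand for the explicit "smaller than 16" check.

```java
class HexPadding {
    // Two-digit, zero-padded lowercase hex text for one byte.
    static String twoDigits(byte b) {
        return String.format("%02x", b & 0xff);
    }

    public static void main(String[] args) {
        int value = 0x0a; // the literal's leading zero does not change the number

        System.out.println(value);                      // 10
        System.out.println(Integer.toHexString(value)); // a
        System.out.println(twoDigits((byte) value));    // 0a
        System.out.println(Integer.parseInt("0a", 16)); // 10
    }
}
```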
Lawrence - 18 Feb 2007 20:12 GMT
[SNIP]
> You need to study types and numeric operations in Java.
>
> - Lew
I do. Thank you a lot, all of you. At least now I understand how it does it; I hate it when I don't.
Lawrence
Lew - 18 Feb 2007 21:57 GMT
Lew wrote:
>> You need to study types and numeric operations in Java.
> I do.
> Thank you a lot, all of you.
> At least now I understand how it does it, I hate when I don't.
I apologize. I should have phrased that advice, "The reasons for this behavior are in the definitions of (numeric) types and numeric operations in Java."
In a nutshell, binary numeric operations perform unary and binary operand promotion at various points. Literals like '0xff' have the virtue of representing positive int values while looking an awful lot like unsigned byte values. This makes them ideal to mask (signed) narrow values into positive wider ones.
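A small sketch of what that promotion does to a negative byte, with illustrative class and method names: plain widening keeps the sign, while masking with the int literal 0xff yields a non-negative int. Boxing the masked result shows that its type is int rather than byte.

```java
class PromotionDemo {
    // Widening conversion byte -> int preserves the numeric value (sign-extends).
    static int widen(byte b) {
        return b;
    }

    // b is promoted to int before the AND, so the result is an int in 0..255.
    static int mask(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80; // -128 as a signed byte

        System.out.println(widen(b));                      // -128
        System.out.println(Integer.toHexString(widen(b))); // ffffff80
        System.out.println(mask(b));                       // 128
        System.out.println(Integer.toHexString(mask(b)));  // 80

        Object boxed = b & 0xff; // boxes to Integer because the expression is an int
        System.out.println(boxed.getClass().getSimpleName()); // Integer

        // byte sum = b + b; // would not compile: b + b is an int
    }
}
```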
Some view Java's snubbing of unsigned bytes as a flaw. That's as may be, but it is a reality for good or ill.
In the world of implicit conversions, be very, very aware.
Gird your loins and venture into the world of unadorned truth in the Java Language Specification (JLS).
<http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>
Integer literals: <http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.1>
The integer bitwise operators: <http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#5233>
Numeric promotions: <http://java.sun.com/docs/books/jls/third_edition/html/conversions.html#5.6>
- Lew