Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / May 2007

Tip: Looking for answers? Try searching our database.

MD5 algorithm

Thread view: 
Christoph Burschka - 14 May 2007 11:38 GMT
I'm trying to implement the MD5 algorithm. (Note: I'm not looking for an
existing MD5 library; this is an exercise). It compiles and returns
something that looks like a hash superficially, but that's clearly the
wrong result. Even if there were a Unicode/Ascii problem, the empty
string should return the correct result (since it is always padded to
one 1-bit and 511 0-bits), but it doesn't.

Also, several strings (say, "" and "The") return the same incorrect hash
- but by dumping the 128-bit hash in each iteration, I've found only
very little in common between the intermediate steps, only the final one
is completely identical.

This is the function, as implemented from the pseudocode on Wikipedia.
http://en.wikipedia.org/wiki/MD5#Pseudocode

--------------------------------------------------------

Constants:

int[4] s = {
        0x67452301,
        0xEFCDAB89,
        0x98BADCFE,
        0x10325476
    };   
int[64] shift= {
    7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,
  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,
  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,
  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21
             };

int[64] k;

  for (int i=0;i<64;i++)
  {
    k[i]=(int)(Math.abs(Math.sin(i+1))*1073741824*4);
  }
   

private static int[] processFin(int[] message)
{
  int[] result=s.clone(); // initialize
  // message is already padded, just split:
  int blocks=message.length/16;
  for (int i=0;i<blocks;i++)
  { // for each 16-int block
    int[] state=result.clone(); // copy state
    for (int j=0;j<64;j++)
      {
        int  f,g,temp;
        if  (j<16)
        {
          f= state[1] & state[2]  | ~state[1]  &  state[3];
          g=j;
        }
        else  if  (j<32)
        {
          f=state[3]  &  state[1] | ~state[3] & state[2];
          g=(5*j+1)%16;
        }
        else  if  (j<48)
        {
          f=state[1]^state[2]^state[3];
          g=(3*j+5)%16;
        }
        else
        {
          f=state[2]^ (state[1] | ~state[3]);
          g=(7*j)%16;
        }
        temp=state[3];
        state[3]=state[2];
        state[2]=state[1];
        state[1]=(state[0]+f+k[j]+message[i*16+g])  <<  shift[j];
        state[0]=temp;
      }
      for(int  j=0;j<4;j++)  result[i]+=state[i];
    }
    return  result;
}

---------------------------------------------
The hex string returned for both "" and "The" is
3c00a101efcdab8998badcfe10325476.

I just can't see what's wrong with the code - all parts of the
pseudocode description seem to be implemented exactly as they should
be... unless the signed ints don't wrap modulo 2^32 as unsigned ones
should. But they do for all examples I could think of, and anyway, that
shouldn't cause those hashes to be exactly the same...

--cb
sebbaer - 14 May 2007 12:03 GMT
Her i have found some intressting webpages about it. Maybe it helps.

http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html
Christoph Burschka - 14 May 2007 12:40 GMT
sebbaer schrieb:
> Her i have found some intressting webpages about it. Maybe it helps.
>
> http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

If I had found something that helped me in the first 10 Google results,
I wouldn't have had to ask here, now would I? :)

The only thing I could use would be some kind of step-by-step
walkthrough for the MD5 calculation of a short string, so I could check
it against the intermediate steps in my own calculation

--cb
Roedy Green - 14 May 2007 14:21 GMT
On Mon, 14 May 2007 13:40:38 +0200, Christoph Burschka
<christoph.burschka@rwth-aachen.de> wrote, quoted or indirectly quoted
someone who said :

>The only thing I could use would be some kind of step-by-step
>walkthrough for the MD5 calculation of a short string, so I could check
>it against the intermediate steps in my own calculation

Can you find Sun's code?  Do you have an assembler level bebugger than
will let you single step through the code?
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Roedy Green - 14 May 2007 15:14 GMT
On Mon, 14 May 2007 13:40:38 +0200, Christoph Burschka
<christoph.burschka@rwth-aachen.de> wrote, quoted or indirectly quoted
someone who said :

>The only thing I could use would be some kind of step-by-step
>walkthrough for the MD5 calculation of a short string, so I could check
>it against the intermediate steps in my own calculation
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Oliver Wong - 14 May 2007 17:29 GMT
> sebbaer schrieb:
>> Her i have found some intressting webpages about it. Maybe it helps.
[quoted text clipped - 3 lines]
> If I had found something that helped me in the first 10 Google results,
> I wouldn't have had to ask here, now would I? :)

   You'd be surprised...

   - Oliver
Filip Larsen - 14 May 2007 16:29 GMT
> private static int[] processFin(int[] message)
> {
[quoted text clipped - 3 lines]
>   for (int i=0;i<blocks;i++)
>   { // for each 16-int block

It seems you are skipping the last 16 element in messages. Perhaps the
"i < blocks" condition should have been "i <= blocks"?

Regards,
Signature

Filip Larsen

Christoph Burschka - 14 May 2007 22:24 GMT
>> private static int[] processFin(int[] message)
>> {
[quoted text clipped - 8 lines]
>
> Regards,

Mh... you had me there for a few moments. But no, the message is already padded
to a multiple of 16, so blocks is exactly the right number of blocks (if the
message has 32 elements, blocks will be 2, and the loop will go from block #0 to
block #1).

If i is the number of the block, the messages in the block are message[16*i ..
16*i+15]. Since blocks is exactly the message length divided by 16, the loop I
have now goes from message[0] to message[(message.length / 16 -1) * 16 +15], or
message[message.length-1]. If I changed the boundary condition to i<=blocks, I'd
be running out of the array boundary.

--cb
Filip Larsen - 14 May 2007 23:12 GMT
> Mh... you had me there for a few moments. But no, the message is already padded
> to a multiple of 16, so blocks is exactly the right number of blocks (if the
> message has 32 elements, blocks will be 2, and the loop will go from block #0 to
> block #1).

Ok, hard to discount that without seeing the padding code.

>  for(int  j=0;j<4;j++)  result[i]+=state[i];

Next bug candidate is this line, where it looks like the arrays should
have been indexed by j and not i.

Regards,
Signature

Filip Larsen

Christoph Burschka - 15 May 2007 01:06 GMT
>> Mh... you had me there for a few moments. But no, the message is
>> already padded
[quoted text clipped - 12 lines]
>
> Regards,

Wow... that thing is literally crawling with bugs. Unfortunately, that still
didn't do the trick - now the result is different again, but the hashes are
still wrong and also nearly identical:

[]: 0112a34159cdab8981f7dcfea9eb2476
[ ]: 0112a34159cdab8981f7dcfea9eb2476
[The]: 8112a34159cdab8981f7dcfea9eb2476
[ttr]: 8112a34159cdab8981f7dcfea9eb2476
[ t t]: 7512a34159cdab8981f7dcfea9eb2476
[The quick brown fox jumps over the lazy dog]: f3b76b417dcdab89cf30dcfe97eb2476

I think I'll sleep on it and look at the code again when I'm more awake. Thanks
very much for the help! :)

--
cb
rossum - 14 May 2007 16:44 GMT
>I'm trying to implement the MD5 algorithm. (Note: I'm not looking for an
>existing MD5 library; this is an exercise). It compiles and returns
[quoted text clipped - 12 lines]
>
>--------------------------------------------------------

[snip]

>private static int[] processFin(int[] message)
>{
>   int[] result=s.clone(); // initialize
>   // message is already padded, just split:
Have you implemented the padding correctly?  To quote RFC 1321:
"3.1 Step 1. Append Padding Bits

  The message is "padded" (extended) so that its length (in bits) is
  congruent to 448, modulo 512. That is, the message is extended so
  that it is just 64 bits shy of being a multiple of 512 bits long.
  Padding is always performed, even if the length of the message is
  already congruent to 448, modulo 512.

  Padding is performed as follows: a single "1" bit is appended to
  the message, and then "0" bits are appended so that the length in
  bits of the padded message becomes congruent to 448, modulo 512. In
  all, at least one bit and at most 512 bits are appended.

3.2 Step 2. Append Length

  A 64-bit representation of b (the length of the message before the
  padding bits were added) is appended to the result of the previous
  step. In the unlikely event that b is greater than 2^64, then only
  the low-order 64 bits of b are used. (These bits are appended as
  two 32-bit words and appended low-order word first in accordance
  with the previous conventions.)

  At this point the resulting message (after padding with bits and
  with b) has a length that is an exact multiple of 512 bits.
  Equivalently, this message has a length that is an exact multiple
  of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the
  resulting message, where N is a multiple of 16."

Note the endianness of the 64 bit message length field, which is not
clearly explained in the Wikipedia article - two big-endian 32 bit
words in little-endian order.  This is not a standard byte order and
you will need to implement it carefully.

It is also necessary to get all endian issues resolved correctly
within the algorithm itself - they can often be a source of problems
when implementing cryptographic algorithms.

RFC 1321 (http://tools.ietf.org/html/rfc1321) also includes a
canonical implementation of MD5 in C, which might be helpful to read.
It also contains a few more Test Vectors than the Wikipedia article.

rossum
Christoph Burschka - 14 May 2007 22:37 GMT
> Note the endianness of the 64 bit message length field, which is not
> clearly explained in the Wikipedia article - two big-endian 32 bit
[quoted text clipped - 10 lines]
>
> rossum

Whoops. I didn't get the little-endian thing. So far I just padded my int[]
array to a length of 14 mod 16, then added the length value (long) as two int
words in normal big-endian order.

So what I should do is just swap the order of the two ints that I split the long
value into, leaving the ints themselves unchanged.

Mh. But that can't be my only mistake. Consider the zero-length example - ""
will always be padded to 0x8000[...]00, and the unpadded length, 0, is the same
regardless of endianness. Still, this message gives me the wrong result.

Not even mentioning that I'm somewhat skeptic of the possibility that the order
of two 32bit words at the end of the message could so badly break the hashing
function as to give it the same hash for two random short messages... but I'll
fix this and see what happens.

--cb
Christoph Burschka - 14 May 2007 23:25 GMT
>> Note the endianness of the 64 bit message length field, which is not
>> clearly explained in the Wikipedia article - two big-endian 32 bit
[quoted text clipped - 28 lines]
>
> --cb

Whoops again. There's a lot more wrong with my padding than I thought. I appear
to be adding the message length some 8 words before the actual end, and I don't
even know how I managed to do that.

Also, the initialization of my sine table is messed up - casting from double to
int will apparently just set too-high values to INT_MAX_VALUE. But if I cast the
sine to int before multiplying it by 2^32, well... it just ends up at 0, of
course. I'll need to do something else here...

Aha! Casting first to long to get a whole number, then to int, appears to give
me something closer to what I want - the long->int conversion wraps instead of
setting to INT_MAX_VALUE. Whether the result is also the correct one remains to
be seen...

--cb
Christoph Burschka - 15 May 2007 10:02 GMT
Mh... several more problems fixed. For example, I missed another
Endianness reversal: The words of the message itself are processed with
the lowest byte first.

My fix doesn't look too pretty, but I think it's the fastest it gets
without heavily optimizing:

    private static int littleEndian(int in)
    {
        int out=0;
        if (in<0)
        {
            in-=Integer.MIN_VALUE;
            out+=128;
        }
        for (int i=0;i<4;i++)
        {
            out+=in%256*Math.pow(256,3-i);
            in/=256;
        }
        return out;
    }

Also, I missed a crucial part of the algorithm:

        b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b

And finally, it appears that the bitwise shift operator isn't doing what
I thought it would - no rotation; bits to the left are removed, 0s are
added to the right. Too bad - I'd hoped I could use the built-in <<
operator; now I need a new function for it. (i=i*2+(i<0?1:0)) is
reasonably elegant for this, however.

Oh, and the final problem: I forgot to reverse the Endianness of the
hash result after calculation.

With all this, I am now finally getting the correct result for the
zero-length message (d41d8cd98f00b204e9800998ecf8427e). I'm still
getting incorrect results for actual messages, but they have nothing in
common and I can be reasonably sure that's because of incorrect padding
or character encoding, not the algorithm.

--
cb


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.