The MD5 Hash
Notes
- message digest algorithm
- Produced in 1991 due to problems discovered with MD4
- Ron Rivest
- RFC 1321 is a reference
- I am checking this against the Wikipedia article.
- And an arbitrary implementation on GitHub.
- This will take a message and produce a 128 bit HASH
- It is used for password hashes and checksums
- Discuss checksums
- If the hash for a file is unique, the checksum will validate that file
- It is common to get a file hash when you download software
- See the Fedora 43 Download Site
- If you are interested, there is a SEED lab on MD5 Collisions
- This lab allows you to build two executables with the same signature.
- And this will allow you to attack a machine by adding malicious code to a program that you redistribute.
- I am afraid that this lab is a little outside the scope of this class.
- But if you wish, I will set it up so we can run it.
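A minimal sketch of the checksum use described above, using Python's standard `hashlib` module. The helper names `md5_hex` and `matches_published` are made up for illustration, and the sample bytes stand in for a downloaded file.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hex characters (128 bits)."""
    return hashlib.md5(data).hexdigest()


def matches_published(data: bytes, published_hex: str) -> bool:
    """Compare a computed digest against a published checksum string."""
    return md5_hex(data) == published_hex.strip().lower()


if __name__ == "__main__":
    # Stand-in for the contents of a downloaded file.
    contents = b"The quick brown fox jumps over the lazy dog"
    print(md5_hex(contents))
```

If even one bit of the file changes, the digest changes, which is why a matching checksum is taken as evidence that the download is intact. As the notes go on to say, this guards against accidental corruption, not against a deliberate MD5 collision.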
- It is NOT cryptographically safe
- In 1996 a collision was found in the compression function.
- In 2005 researchers were able to create fake documents with valid checksums
- By 2008 it was completely broken.
- But it is valid for our study.
- Other algorithms use similar approaches.
- This algorithm operates at the bit level
- Binary digIT, {0,1}
- 8 bits are a byte
- The size of a word varies from machine to machine, but here a word is 32 bits.
- Bit level operations
- Addition (modulo 2^32, so any carry out of the top bit is discarded)
- circular shift left
- Example: 10011 << 2 = 01110 (the two bits shifted out on the left wrap around to the right)
- Bit wise and, or, xor, complement
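The bit-level operations listed above can be sketched in Python as follows. Python integers are unbounded, so each result is masked back to the word width. The names `rotl`, `add32`, and `not32` are illustrative and do not come from the RFC.

```python
MASK32 = 0xFFFFFFFF  # keeps results inside one 32 bit word


def rotl(value: int, shift: int, width: int = 32) -> int:
    """Circular left shift: bits that fall off the left re-enter on the right."""
    mask = (1 << width) - 1
    shift %= width
    value &= mask
    return ((value << shift) | (value >> (width - shift))) & mask


def add32(a: int, b: int) -> int:
    """Addition modulo 2**32 (the carry out of bit 31 is dropped)."""
    return (a + b) & MASK32


def not32(a: int) -> int:
    """Bitwise complement restricted to 32 bits."""
    return ~a & MASK32


if __name__ == "__main__":
    # The 5 bit example from the notes: 10011 rotated left by 2.
    print(format(rotl(0b10011, 2, width=5), "05b"))
    # and, or, xor are the ordinary Python operators.
    print(format(0b1100 & 0b1010, "04b"),
          format(0b1100 | 0b1010, "04b"),
          format(0b1100 ^ 0b1010, "04b"))
```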
- Messages to encode are in binary representation
- Messages are padded so that the length is 448 bits modulo 512, that is, 64 bits short of a multiple of 512.
- Then the last 64 bits are used to store the message length.
- Consider "Hello World!"
- This is 12 characters long
- Each character is 8 bits long
- So the message is 12x8 = 96 bits long.
- It is padded with 448-96 = 352 bits: a 1 followed by 351 zeros
- Then the length, 96, is appended as a 64 bit binary value, low-order byte first
- Hello World! 1 0...0 01100000 0...0
- The "Hello World!" is actually
- ASCII: 72 101 108 108 111 32 87 111 114 108 100 33
- Hex: 48 65 6C 6C 6F 20 57 6F 72 6C 64 21
- Binary: 01001000 01100101 01101100 01101100 01101111 00100000 01010111 01101111 01110010 01101100 01100100 00100001
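The padding of "Hello World!" worked through above can be sketched in Python as follows. The function name `md5_pad` is made up for illustration. It works on whole bytes, so the single 1 bit plus seven zeros become the byte 0x80.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes (512 bits), MD5 style."""
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"                       # a 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # zeros up to 448 mod 512 bits
    return padded + struct.pack("<Q", bit_length)    # 64 bit length, low byte first


if __name__ == "__main__":
    message = b"Hello World!"
    print(list(message))                              # ASCII codes
    print(" ".join(f"{b:02X}" for b in message))      # hex
    print(" ".join(f"{b:08b}" for b in message))      # binary
    block = md5_pad(message)
    print(len(block) * 8, "bits after padding")
```

For the 12 byte message this gives 12 message bytes, one 0x80 byte, 43 zero bytes, and 8 length bytes, which is 64 bytes or one 512 bit chunk.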
- Define
- Four 32 bit values A,B,C,D
- These are initialized to the arbitrary constants
- A = 0x67452301
- B = 0xefcdab89
- C = 0x98badcfe
- D = 0x10325476
- See line 76 here.
- We don't really care about the values, we just need a constant starting point.
- int K [64] (T in the RFC) is constructed from the sine table.
- K[i] = floor(2^32 × |sin(i + 1)|) for i = 0 ... 63, where the argument of sin is in radians.
- See line 362 here.
- Again, we really don't care about the values, we just need some constant bit patterns for scrambling the bits
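The sine-derived constants can be generated rather than typed in. This is a small sketch, assuming the usual definition floor(2^32 × |sin(i + 1)|) with a zero-based index i.

```python
import math

# 64 constants: the integer part of 2**32 * |sin(i + 1)|, i in radians.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

if __name__ == "__main__":
    # Print the first four entries in the same hex style as A, B, C, D.
    for i in range(4):
        print(f"K[{i}] = 0x{K[i]:08x}")
```

Any implementation that hard-codes the table should agree with these values. The first entry is the familiar 0xd76aa478.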
- Functions: F, G, H, I (or F[0] - F[3], or FF, FG, FH, FI, depending on the context).
- Define S[4][4] to be a set of shift amounts
- S[0]: 7, 12, 17, 22
- S[1]: 5, 9, 14, 20
- S[2]: 4, 11, 16, 23
- S[3]: 6, 10, 15, 21
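The notes name the four round functions without writing them out. The sketch below uses the definitions given in RFC 1321, together with the shift table above. The list name `ROUND` is made up for illustration and plays the part of F[0] ... F[3].

```python
MASK32 = 0xFFFFFFFF


def F(b: int, c: int, d: int) -> int:
    """Round 0: if a bit of b is set take that bit of c, else that bit of d."""
    return ((b & c) | (~b & d)) & MASK32


def G(b: int, c: int, d: int) -> int:
    """Round 1: if a bit of d is set take that bit of b, else that bit of c."""
    return ((b & d) | (c & ~d)) & MASK32


def H(b: int, c: int, d: int) -> int:
    """Round 2: bitwise parity of the three inputs."""
    return (b ^ c ^ d) & MASK32


def I(b: int, c: int, d: int) -> int:
    """Round 3: c xor (b or not d)."""
    return (c ^ (b | ~d)) & MASK32


ROUND = [F, G, H, I]  # ROUND[j] is the function used in pass j

# S[j] holds the four shift amounts that pass j cycles through.
S = [
    [7, 12, 17, 22],
    [5, 9, 14, 20],
    [4, 11, 16, 23],
    [6, 10, 15, 21],
]
```

Each function mixes three words using only and, or, xor, and complement, so it costs a handful of machine instructions per step.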
- Split the message into 512 bit chunks and apply the following algorithm (all additions are modulo 2^32):
- initialize A,B,C,D to the starting values
- for each 512 bit chunk in the message
  - ASave = A, BSave = B, CSave = C, DSave = D
  - split the 512 bit chunk into sixteen 32 bit words, M[0] ... M[15]
  - for j ← 0 to 3
    - for i ← 0 to 15
      - F = F[j](B,C,D) + A + K[16j + i] + M[g], where g = i, (5i + 1) mod 16, (3i + 5) mod 16, 7i mod 16 for j = 0, 1, 2, 3
      - A = D
      - D = C
      - C = B
      - B = B + (F <<< S[j][i mod 4]), where <<< is a circular shift left
  - A = ASave + A, B = BSave + B, C = CSave + C, D = DSave + D
- return A append B append C append D (each word written low-order byte first)
- The outer loop processes the entire message 512 bits at a time.
- The algorithm makes four passes over the block. (the j loop)
- Then it processes each word in the block (the i loop)
- Note that in the body of the i loop
- D → A
- C → D
- B → C
- A is permuted, shifted and combined with the original message to become B
- (From user Surachit, CC BY-SA 3.0)
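Putting the pieces together, the loop described in these notes can be sketched in Python and checked against the standard library. This is a teaching sketch, not a reference implementation: the names `md5_sketch`, `ROUND`, `WORD_INDEX`, and `rotl32` are made up for illustration, and the word order within each pass follows RFC 1321.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK32 for i in range(64)]
S = [[7, 12, 17, 22], [5, 9, 14, 20], [4, 11, 16, 23], [6, 10, 15, 21]]
ROUND = [
    lambda b, c, d: (b & c) | (~b & d),  # F
    lambda b, c, d: (b & d) | (c & ~d),  # G
    lambda b, c, d: b ^ c ^ d,           # H
    lambda b, c, d: c ^ (b | ~d),        # I
]
# Which message word M[g] is used at step i of pass j.
WORD_INDEX = [
    lambda i: i,
    lambda i: (5 * i + 1) % 16,
    lambda i: (3 * i + 5) % 16,
    lambda i: (7 * i) % 16,
]


def rotl32(x: int, s: int) -> int:
    """Circular left shift of a 32 bit word."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def md5_sketch(message: bytes) -> str:
    """Return the MD5 digest of message as a hex string."""
    # Padding: a 1 bit, zeros up to 448 mod 512 bits, then the 64 bit length.
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack("<Q", bit_length)

    a, b, c, d = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    for offset in range(0, len(data), 64):
        # Sixteen little-endian 32 bit words M[0] ... M[15].
        m = struct.unpack("<16I", data[offset:offset + 64])
        a_save, b_save, c_save, d_save = a, b, c, d
        for j in range(4):
            for i in range(16):
                f = (ROUND[j](b, c, d) + a + K[16 * j + i]
                     + m[WORD_INDEX[j](i)]) & MASK32
                a, d, c = d, c, b  # D -> A, C -> D, B -> C
                b = (b + rotl32(f, S[j][i % 4])) & MASK32
        # Fold this chunk's result into the running state.
        a = (a_save + a) & MASK32
        b = (b_save + b) & MASK32
        c = (c_save + c) & MASK32
        d = (d_save + d) & MASK32
    return struct.pack("<4I", a, b, c, d).hex()


if __name__ == "__main__":
    text = b"Hello World!"
    print(md5_sketch(text))
    print(md5_sketch(text) == hashlib.md5(text).hexdigest())
```

Comparing against `hashlib.md5` on a few inputs, including one longer than 64 bytes so that more than one chunk is processed, is a quick way to catch mistakes in the word order or shift table.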