Hash Functions in TheAlgorithms/Python
Six hash algorithms from a 13-byte checksum to 256-bit cryptographic digests: each trading reliability for speed differently.
What you will learn
- How MD5 applies four rounds of bitwise mixing operations to 512-bit blocks
- How SHA-1 expands each block to 80 words and applies four conditional mixing functions
- How SHA-256 uses 64 rounds of sigma and majority functions to compress each block
- How Adler-32 maintains two running sums for a fast but weaker checksum
- How the Luhn algorithm validates credit card numbers by doubling alternate digits
- How DJB2 produces a hash using a magic seed and a bit-shift-plus-add accumulation
Prerequisites
- Comfort reading Python loops and integer arithmetic
- Basic familiarity with bitwise operations (AND, OR, XOR, shift)
- No prior knowledge of hash function internals required
MD5: four rounds of bitwise mixing per block
hashes/md5.py:402MD5 processes each 512-bit block in 64 rounds, selecting a different bitwise mixing function for each group of 16 rounds.
MD5 is now considered cryptographically broken for collision resistance, but its structure is a clear example of how iterative block hash functions work. Each 512-bit block is processed through 64 rounds with four state variables a, b, c, d. The round function f changes every 16 rounds, cycling through XOR and AND combinations of the state variables. The round index determines which word from the block (g) feeds into the mixing step, and a precomputed constant derived from sin(i) provides nonlinearity. Each round rotates and adds the mixed value into b. After all 64 rounds the block's result is added back to the initial state. For a full tour of MD5's padding, constants, and reformatting steps, see the deeper tour.
MD5 processes each 512-bit block in 64 rounds of bitwise mixing, using four different functions across four groups of 16 rounds to create diffusion.
# Process bit string in chunks, each with 16 32-char words
for block_words in get_block_words(bit_string):
a = a0
b = b0
c = c0
d = d0
# Hash current chunk
for i in range(64):
if i <= 15:
# f = (b & c) | (not_32(b) & d) # Alternate definition for f
f = d ^ (b & (c ^ d))
g = i
elif i <= 31:
# f = (d & b) | (not_32(d) & c) # Alternate definition for f
f = c ^ (d & (b ^ c))
g = (5 * i + 1) % 16
elif i <= 47:
f = b ^ c ^ d
g = (3 * i + 5) % 16
else:
f = c ^ (b | not_32(d))
g = (7 * i) % 16
f = (f + a + added_consts[i] + block_words[g]) % 2**32
a = d
d = c
c = b
b = sum_32(b, left_rotate_32(f, shift_amounts[i]))SHA-1: expand to 80 words, then compress with four conditions
hashes/sha1.py:100SHA-1 expands each 512-bit block to 80 words via XOR and rotation, then compresses those 80 words through four conditional mixing functions.
SHA-1 operates on a five-variable state (a, b, c, d, e) and processes each 512-bit block in 80 rounds. Before the main loop, expand_block XORs four earlier words and left-rotates by 1 to extend the original 16 words to 80, increasing the diffusion of each input bit across more rounds. The compression loop then selects one of four mixing functions based on the round index: IF, XOR, MAJ, and XOR again. Each round rotates a by 5, adds the mixing result plus a round constant, and shifts the other variables down. The result is masked to 32 bits with & 0xFFFFFFFF. SHA-1 is also now considered weak against collision attacks, but its structure directly influenced SHA-256. For full SHA-1 analysis see the deeper tour.
SHA-1 expands each block to 80 words before compression, increasing the number of input bits that influence each output bit across 80 rounds.
for block in self.blocks:
expanded_block = self.expand_block(block)
a, b, c, d, e = self.h
for i in range(80):
if 0 <= i < 20:
f = (b & c) | ((~b) & d)
k = 0x5A827999
elif 20 <= i < 40:
f = b ^ c ^ d
k = 0x6ED9EBA1
elif 40 <= i < 60:
f = (b & c) | (b & d) | (c & d)
k = 0x8F1BBCDC
elif 60 <= i < 80:
f = b ^ c ^ d
k = 0xCA62C1D6
a, b, c, d, e = (
self.rotate(a, 5) + f + e + k + expanded_block[i] & 0xFFFFFFFF,
a,
self.rotate(b, 30),
c,
d,
)SHA-256: sigma functions and majority vote per round
hashes/sha256.py:159SHA-256 compresses 64 rounds per block using sigma rotation functions, a choice function, and a majority function over the eight-word state.
SHA-256 maintains eight 32-bit state variables (a, b, c, d, e, f, g, h) and processes 64 rounds per block. The compression step in each round computes two intermediate values. temp1 mixes h, three rotations of e, the choice function ch (which selects bits from f or g based on e), the round constant, and the expanded word. temp2 combines three rotations of a and the majority function maj over (a, b, c). The state variables then shift down one position, with a and e receiving the new mixed values. All arithmetic is modulo 2^32. The initial hash values and round constants in SHA-256 are the fractional parts of the square roots and cube roots of the first 8 and 64 primes respectively.
SHA-256's strength comes from two independent mixing paths per round: the sigma-1 choice path into e and the sigma-0 majority path into a, applied 64 times per block.
# Compression
s1 = self.ror(e, 6) ^ self.ror(e, 11) ^ self.ror(e, 25)
ch = (e & f) ^ ((~e & (0xFFFFFFFF)) & g)
temp1 = (
h + s1 + ch + self.round_constants[index] + words[index]
) % 0x100000000
s0 = self.ror(a, 2) ^ self.ror(a, 13) ^ self.ror(a, 22)
maj = (a & b) ^ (a & c) ^ (b & c)
temp2 = (s0 + maj) % 0x100000000
h, g, f, e, d, c, b, a = (
g,
f,
e,
((d + temp1) % 0x100000000),
c,
b,
a,
((temp1 + temp2) % 0x100000000),
)Adler-32: two running sums, fast but not collision-resistant
hashes/adler32.py:25Adler-32 maintains two accumulators: A sums character ordinals and B sums all prior A values, both modulo 65521.
Adler-32 was designed for speed over reliability. The module docstring names its trade-off explicitly: it is faster than a CRC of the same length but less reliable than Fletcher-32. The algorithm uses two 16-bit accumulators. a starts at 1 (not 0, because 0 would not change on a sequence of null bytes) and adds each character's ordinal value modulo 65521. b accumulates the running sum of all a values, also modulo 65521. The choice of 65521 is deliberate: it is the largest prime that fits in 16 bits, which maximises the period before the modulo wraps. The final result packs both accumulators into a 32-bit integer by shifting b left 16 bits and ORing with a. The doctest confirms adler32('Algorithms') produces 363791387.
Adler-32's speed comes from only two accumulators updated per byte; its weakness is that short messages with the same byte sum can collide.
a = 1
b = 0
for plain_chr in plain_text:
a = (a + ord(plain_chr)) % MOD_ADLER
b = (b + a) % MOD_ADLER
return (b << 16) | aLuhn: validate a card number by doubling alternate digits
hashes/luhn.py:20Luhn doubles every other digit from the right, subtracts 9 if the doubled value exceeds 9, and checks that the total is divisible by 10.
The Luhn algorithm is not a hash function in the cryptographic sense. It is a checksum designed to detect single-digit transcription errors in credit card numbers. The last digit is the check digit and is separated before the loop. The remaining digits are reversed so that processing from left to right after reversal is equivalent to processing right to left from the original number. Every even-indexed digit in the reversed list (which is every odd position from the right in the original) is doubled. If the doubled value exceeds 9, subtracting 9 is equivalent to summing the two digits of the result. The total must be divisible by 10 for the number to be valid. The doctest confirms that only 79927398713 passes among 10 nearby candidates.
Luhn's doubling step detects single-digit errors and transpositions in card numbers; it is a check digit scheme, not a hash function.
check_digit: int
_vector: list[str] = list(string)
__vector, check_digit = _vector[:-1], int(_vector[-1])
vector: list[int] = [int(digit) for digit in __vector]
vector.reverse()
for i, digit in enumerate(vector):
if i & 1 == 0:
doubled: int = digit * 2
if doubled > 9:
doubled -= 9
check_digit += doubled
else:
check_digit += digit
return check_digit % 10 == 0DJB2: a magic seed and shift-plus-add per character
hashes/djb2.py:32DJB2 starts at 5381 and updates the accumulator as hash times 33 plus the character ordinal, masked to 32 bits at the end.
DJB2 is a non-cryptographic hash function credited to Dan Bernstein. Its entire implementation fits in four lines, yet it produces surprisingly good distribution across common string inputs. The seed 5381 is described in the module comment as magic: it is odd, prime, and deficient, which together produce better avalanche properties than most arbitrary starting values. The accumulation step (hash_value << 5) + hash_value is equivalent to multiplying by 33 using only a shift and an add. Each character's ordinal is added to the result. The final & 0xFFFFFFFF masks to 32 bits without performing a modulo, which is faster. The module also mentions an XOR variant where ord(x) is XORed rather than added, which Bernstein now prefers.
DJB2's speed comes from replacing multiplication with a bit-shift-and-add; its distribution quality comes from the carefully chosen seed 5381 and multiplier 33.
hash_value = 5381
for x in s:
hash_value = ((hash_value << 5) + hash_value) + ord(x)
return hash_value & 0xFFFFFFFFYou've walked through 6 key areas of the The Algorithms - Python codebase.
Continue: Data Structures in TheAlgorithms/Python → Browse all projectsCreate code tours for your project
Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.
Install Intraview Free