The Algorithms - Python Python TheAlgorithms/Python

Hash Functions in TheAlgorithms/Python

Six hash algorithms from a 13-byte checksum to 256-bit cryptographic digests: each trading reliability for speed differently.

6 stops ~12 min Verified 2026-05-05

What you will learn

How MD5 applies four rounds of bitwise mixing operations to 512-bit blocks
How SHA-1 expands each block to 80 words and applies four conditional mixing functions
How SHA-256 uses 64 rounds of sigma and majority functions to compress each block
How Adler-32 maintains two running sums for a fast but weaker checksum
How the Luhn algorithm validates credit card numbers by doubling alternate digits
How DJB2 produces a hash using a magic seed and a bit-shift-plus-add accumulation

Prerequisites

Comfort reading Python loops and integer arithmetic
Basic familiarity with bitwise operations (AND, OR, XOR, shift)
No prior knowledge of hash function internals required

1 / 6

MD5: four rounds of bitwise mixing per block

hashes/md5.py:402

MD5 processes each 512-bit block in 64 rounds, selecting a different bitwise mixing function for each group of 16 rounds.

MD5 is now considered cryptographically broken for collision resistance, but its structure is a clear example of how iterative block hash functions work. Each 512-bit block is processed through 64 rounds with four state variables a, b, c, d. The round function f changes every 16 rounds, cycling through XOR and AND combinations of the state variables. The round index determines which word from the block (g) feeds into the mixing step, and a precomputed constant derived from sin(i) provides nonlinearity. Each round rotates and adds the mixed value into b. After all 64 rounds the block's result is added back to the initial state. For a full tour of MD5's padding, constants, and reformatting steps, see the deeper tour.

Key takeaway

MD5 processes each 512-bit block in 64 rounds of bitwise mixing, using four different functions across four groups of 16 rounds to create diffusion.

    # Process bit string in chunks, each with 16 32-char words
    for block_words in get_block_words(bit_string):
        a = a0
        b = b0
        c = c0
        d = d0

        # Hash current chunk
        for i in range(64):
            if i <= 15:
                # f = (b & c) | (not_32(b) & d)     # Alternate definition for f
                f = d ^ (b & (c ^ d))
                g = i
            elif i <= 31:
                # f = (d & b) | (not_32(d) & c)     # Alternate definition for f
                f = c ^ (d & (b ^ c))
                g = (5 * i + 1) % 16
            elif i <= 47:
                f = b ^ c ^ d
                g = (3 * i + 5) % 16
            else:
                f = c ^ (b | not_32(d))
                g = (7 * i) % 16
            f = (f + a + added_consts[i] + block_words[g]) % 2**32
            a = d
            d = c
            c = b
            b = sum_32(b, left_rotate_32(f, shift_amounts[i]))

2 / 6

SHA-1: expand to 80 words, then compress with four conditions

hashes/sha1.py:100

SHA-1 expands each 512-bit block to 80 words via XOR and rotation, then compresses those 80 words through four conditional mixing functions.

SHA-1 operates on a five-variable state (a, b, c, d, e) and processes each 512-bit block in 80 rounds. Before the main loop, expand_block XORs four earlier words and left-rotates by 1 to extend the original 16 words to 80, increasing the diffusion of each input bit across more rounds. The compression loop then selects one of four mixing functions based on the round index: IF, XOR, MAJ, and XOR again. Each round rotates a by 5, adds the mixing result plus a round constant, and shifts the other variables down. The result is masked to 32 bits with & 0xFFFFFFFF. SHA-1 is also now considered weak against collision attacks, but its structure directly influenced SHA-256. For full SHA-1 analysis see the deeper tour.

Key takeaway

SHA-1 expands each block to 80 words before compression, increasing the number of input bits that influence each output bit across 80 rounds.

        for block in self.blocks:
            expanded_block = self.expand_block(block)
            a, b, c, d, e = self.h
            for i in range(80):
                if 0 <= i < 20:
                    f = (b & c) | ((~b) & d)
                    k = 0x5A827999
                elif 20 <= i < 40:
                    f = b ^ c ^ d
                    k = 0x6ED9EBA1
                elif 40 <= i < 60:
                    f = (b & c) | (b & d) | (c & d)
                    k = 0x8F1BBCDC
                elif 60 <= i < 80:
                    f = b ^ c ^ d
                    k = 0xCA62C1D6
                a, b, c, d, e = (
                    self.rotate(a, 5) + f + e + k + expanded_block[i] & 0xFFFFFFFF,
                    a,
                    self.rotate(b, 30),
                    c,
                    d,
                )

3 / 6

SHA-256: sigma functions and majority vote per round

hashes/sha256.py:159

SHA-256 compresses 64 rounds per block using sigma rotation functions, a choice function, and a majority function over the eight-word state.

SHA-256 maintains eight 32-bit state variables (a, b, c, d, e, f, g, h) and processes 64 rounds per block. The compression step in each round computes two intermediate values. temp1 mixes h, three rotations of e, the choice function ch (which selects bits from f or g based on e), the round constant, and the expanded word. temp2 combines three rotations of a and the majority function maj over (a, b, c). The state variables then shift down one position, with a and e receiving the new mixed values. All arithmetic is modulo 2^32. The initial hash values and round constants in SHA-256 are the fractional parts of the square roots and cube roots of the first 8 and 64 primes respectively.

Key takeaway

SHA-256's strength comes from two independent mixing paths per round: the sigma-1 choice path into e and the sigma-0 majority path into a, applied 64 times per block.

                # Compression
                s1 = self.ror(e, 6) ^ self.ror(e, 11) ^ self.ror(e, 25)
                ch = (e & f) ^ ((~e & (0xFFFFFFFF)) & g)
                temp1 = (
                    h + s1 + ch + self.round_constants[index] + words[index]
                ) % 0x100000000
                s0 = self.ror(a, 2) ^ self.ror(a, 13) ^ self.ror(a, 22)
                maj = (a & b) ^ (a & c) ^ (b & c)
                temp2 = (s0 + maj) % 0x100000000

                h, g, f, e, d, c, b, a = (
                    g,
                    f,
                    e,
                    ((d + temp1) % 0x100000000),
                    c,
                    b,
                    a,
                    ((temp1 + temp2) % 0x100000000),
                )

4 / 6

Adler-32: two running sums, fast but not collision-resistant

hashes/adler32.py:25

Adler-32 maintains two accumulators: A sums character ordinals and B sums all prior A values, both modulo 65521.

Adler-32 was designed for speed over reliability. The module docstring names its trade-off explicitly: it is faster than a CRC of the same length but less reliable than Fletcher-32. The algorithm uses two 16-bit accumulators. a starts at 1 (not 0, because 0 would not change on a sequence of null bytes) and adds each character's ordinal value modulo 65521. b accumulates the running sum of all a values, also modulo 65521. The choice of 65521 is deliberate: it is the largest prime that fits in 16 bits, which maximises the period before the modulo wraps. The final result packs both accumulators into a 32-bit integer by shifting b left 16 bits and ORing with a. The doctest confirms adler32('Algorithms') produces 363791387.

Key takeaway

Adler-32's speed comes from only two accumulators updated per byte; its weakness is that short messages with the same byte sum can collide.

    a = 1
    b = 0
    for plain_chr in plain_text:
        a = (a + ord(plain_chr)) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a

5 / 6

Luhn: validate a card number by doubling alternate digits

hashes/luhn.py:20

Luhn doubles every other digit from the right, subtracts 9 if the doubled value exceeds 9, and checks that the total is divisible by 10.

The Luhn algorithm is not a hash function in the cryptographic sense. It is a checksum designed to detect single-digit transcription errors in credit card numbers. The last digit is the check digit and is separated before the loop. The remaining digits are reversed so that processing from left to right after reversal is equivalent to processing right to left from the original number. Every even-indexed digit in the reversed list (which is every odd position from the right in the original) is doubled. If the doubled value exceeds 9, subtracting 9 is equivalent to summing the two digits of the result. The total must be divisible by 10 for the number to be valid. The doctest confirms that only 79927398713 passes among 10 nearby candidates.

Key takeaway

Luhn's doubling step detects single-digit errors and transpositions in card numbers; it is a check digit scheme, not a hash function.

    check_digit: int
    _vector: list[str] = list(string)
    __vector, check_digit = _vector[:-1], int(_vector[-1])
    vector: list[int] = [int(digit) for digit in __vector]

    vector.reverse()
    for i, digit in enumerate(vector):
        if i & 1 == 0:
            doubled: int = digit * 2
            if doubled > 9:
                doubled -= 9
            check_digit += doubled
        else:
            check_digit += digit

    return check_digit % 10 == 0

6 / 6

DJB2: a magic seed and shift-plus-add per character

hashes/djb2.py:32

DJB2 starts at 5381 and updates the accumulator as hash times 33 plus the character ordinal, masked to 32 bits at the end.

DJB2 is a non-cryptographic hash function credited to Dan Bernstein. Its entire implementation fits in four lines, yet it produces surprisingly good distribution across common string inputs. The seed 5381 is described in the module comment as magic: it is odd, prime, and deficient, which together produce better avalanche properties than most arbitrary starting values. The accumulation step (hash_value << 5) + hash_value is equivalent to multiplying by 33 using only a shift and an add. Each character's ordinal is added to the result. The final & 0xFFFFFFFF masks to 32 bits without performing a modulo, which is faster. The module also mentions an XOR variant where ord(x) is XORed rather than added, which Bernstein now prefers.

Key takeaway

DJB2's speed comes from replacing multiplication with a bit-shift-and-add; its distribution quality comes from the carefully chosen seed 5381 and multiplier 33.

    hash_value = 5381
    for x in s:
        hash_value = ((hash_value << 5) + hash_value) + ord(x)
    return hash_value & 0xFFFFFFFF

Your codebase next

Create code tours for your project

Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.

Install Intraview Free