The Algorithms - Python Python TheAlgorithms/Python

How SHA-1 Works

Pad the message, split into 512-bit blocks, expand each block to 80 words, run 80 rounds of bit operations, accumulate five 32-bit words into a 160-bit digest.

6 stops ~20 min Verified 2026-05-05

What you will learn

Why SHA-1 processes fixed-size 512-bit blocks and what padding accomplishes
How the five initialization constants encode the starting hash state in hexadecimal
What left rotation does to a 32-bit word and why the SHA-1 round function needs it
How expand_block derives 80 scheduled words from 16 input words via XOR and rotate-by-1
Why the four round functions produce different f and k values for each 20-iteration band
How the final_hash method accumulates per-block results into the five 32-bit words
Why SHA-1 is deprecated for new uses after the SHAttered collision attack of 2017

Prerequisites

Comfort reading Python bitwise operators (<<, >>, |, ^, &, ~)
Familiarity with hexadecimal notation
No prior knowledge of cryptographic hash functions required

The story behind SHA-1

SHA-1 (Secure Hash Algorithm 1) was published by the National Institute of Standards and Technology in 1995 as FIPS 180-1. It was designed by the National Security Agency and produces a 160-bit message digest from an input of any length. The design builds on MD4 and MD5, both by Ron Rivest, extending the digest length from 128 bits to 160 bits to increase collision resistance.

For two decades SHA-1 was the dominant hash function in digital certificates, code signing, and version control. Git uses SHA-1 for commit identifiers, not for security, but for content addressing. TLS 1.1 and earlier used SHA-1 in certificate signatures. PGP used it as the default hash through the mid-2000s.

The theoretical weakness was known since 2005, when Xiaoyun Wang and colleagues published a collision attack requiring 2^69 operations rather than the 2^80 brute-force baseline. The practical collision arrived in 2017 as the SHAttered attack, published by Marc Stevens, Elie Bursztein, Pierre Karpman, Ange Albertini, and Yarik Markov from Google and CWI Amsterdam. They produced two different PDF files with identical SHA-1 hashes, definitively breaking the collision resistance property. SHA-1 is now deprecated for certificate signatures and any new cryptographic use. The replacement is the SHA-2 family (SHA-256, SHA-384, SHA-512). This implementation is explicitly for education.

1 / 6

Module Docstring: The Algorithm in Plain English

hashes/sha1.py:1

The module docstring explains the full pipeline in prose: pad, split into 512-bit blocks, expand, compress, accumulate into a 160-bit buffer.

Before any Python appears, the module docstring describes the SHA-1 pipeline in the order the code implements it: pad the message, split into 64-byte (512-bit) blocks, expand each block, compress it, add the result to a running 160-bit buffer, return the final state. That seven-step prose is the roadmap for reading the class below. Lines 11 through 16 explain the one-way property: computing a hash from a message is fast; finding the message from a hash is computationally infeasible. That property is what makes hash functions useful for password storage, content addressing, and digital signatures. The reference URL on line 26 links to an illustrated walkthrough that many learners find helpful alongside reading the code.

Key takeaway

The docstring states the full seven-step pipeline before any code. Reading it first makes the class structure predictable: one method per pipeline stage.

"""
Implementation of the SHA1 hash function and gives utilities to find hash of string or
hash of text from a file. Also contains a Test class to verify that the generated hash
matches what is returned by the hashlib library

Usage: python sha1.py --string "Hello World!!"
       python sha1.py --file "hello_world.txt"
       When run without any arguments, it prints the hash of the string "Hello World!!
       Welcome to Cryptography"

SHA1 hash or SHA1 sum of a string is a cryptographic function, which means it is easy
to calculate forwards but extremely difficult to calculate backwards. What this means
is you can easily calculate the hash of a string, but it is extremely difficult to know
the original string if you have its hash. This property is useful for communicating
securely, send encrypted messages and is very useful in payment systems, blockchain and
cryptocurrency etc.

The algorithm as described in the reference:
First we start with a message. The message is padded and the length of the message
is added to the end. It is then split into blocks of 512 bits or 64 bytes. The blocks
are then processed one at a time. Each block must be expanded and compressed.
The value after each compression is added to a 160-bit buffer called the current hash
state. After the last block is processed, the current hash state is returned as
the final hash.

Reference: https://deadhacker.com/2006/02/21/sha-1-illustrated/
"""

2 / 6

The Class and the Five Initialization Constants

hashes/sha1.py:34

The SHA1Hash class stores data and h, where h is the five 32-bit initialization constants defined in FIPS 180-1.

The class doctest at lines 37 through 39 pins the complete pipeline in one expression: pass a bytestring, call final_hash(), expect a 40-character hex string. That single line is both the usage example and the regression test that CI runs on every push. The __init__ method stores the input data and initializes self.h to the five constants defined in FIPS 180-1. Those constants are the fractional parts of the square roots of the first five primes, which is why they appear to be arbitrary hex numbers but are actually deterministic: 0x67452301 is the integer part of sqrt(2) * 2^32 modulo 2^32. The docstring's note that 0x is Python hexadecimal notation is aimed at learners encountering hex literals for the first time.

Key takeaway

The five initialization constants are FIPS-specified and deterministic. The class doctest pins the full pipeline: bytes in, 40-hex-char digest out.

class SHA1Hash:
    """
    Class to contain the entire pipeline for SHA1 hashing algorithm
    >>> SHA1Hash(bytes('Allan', 'utf-8')).final_hash()
    '872af2d8ac3d8695387e7c804bf0e02c18df9e6e'
    """

    def __init__(self, data):
        """
        Initiates the variables data and h. h is a list of 5 8-digit hexadecimal
        numbers corresponding to
        (1732584193, 4023233417, 2562383102, 271733878, 3285377520)
        respectively. We will start with this as a message digest. 0x is how you write
        hexadecimal numbers in Python
        """
        self.data = data
        self.h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

3 / 6

rotate: Left Bit Rotation on 32-Bit Words

hashes/sha1.py:52

Left bit rotation: shift n left by b, OR the wrapped high bits back into the low end, then mask to 32 bits.

Left rotation is the primitive that SHA-1 uses in both the message schedule and the round function. The formula on line 59 is the standard two-shift implementation: shift n left by b bits to move bits toward the high end, shift n right by 32 - b to retrieve the bits that wrapped, then OR the two halves together. The final mask & 0xFFFFFFFF discards anything above 32 bits, which Python requires because its integers are arbitrary-precision and do not overflow. Without the mask, large inputs would accumulate extra high bits. The doctest confirms rotate(12, 2) returns 48: 12 in binary is 1100; shifting left by 2 gives 110000, which is 48. A rotate by 2 of a number under 2^30 is multiplication by 4.

Key takeaway

Left rotation = (n << b) | (n >> (32-b)), masked to 32 bits. The mask is required because Python integers have no natural overflow.

    @staticmethod
    def rotate(n, b):
        """
        Static method to be used inside other methods. Left rotates n by b.
        >>> SHA1Hash('').rotate(12,2)
        48
        """
        return ((n << b) | (n >> (32 - b))) & 0xFFFFFFFF

4 / 6

padding and split_blocks: Preparing the Input

hashes/sha1.py:61

padding appends a 1-bit, zero bytes, and the 64-bit message length to reach a multiple of 512 bits. split_blocks slices the result into 64-byte chunks.

SHA-1 requires the message to be a multiple of 512 bits before processing. The padding rule is fixed: append the byte 0x80 (a 1 bit followed by seven 0 bits), pad with zero bytes until 8 bytes remain before the next 512-bit boundary, then write the original message length in bits as a big-endian 64-bit integer. Line 65 computes the zero padding length with (63 - (len(self.data) + 8) % 64), which ensures the padded message reaches exactly one block boundary. Line 66 packs the original bit length with struct.pack(">Q", 8 * len(self.data)), where > is big-endian and Q is unsigned 64-bit. The split_blocks method then slices the padded bytestring into 64-byte chunks, one per 512-bit block.

Key takeaway

Padding appends 0x80, then zeros, then the 64-bit big-endian message length. The result is always a multiple of 64 bytes. split_blocks slices it into one 64-byte list per block.

    def padding(self):
        """
        Pads the input message with zeros so that padded_data has 64 bytes or 512 bits
        """
        padding = b"\x80" + b"\x00" * (63 - (len(self.data) + 8) % 64)
        padded_data = self.data + padding + struct.pack(">Q", 8 * len(self.data))
        return padded_data

    def split_blocks(self):
        """
        Returns a list of bytestrings each of length 64
        """
        return [
            self.padded_data[i : i + 64] for i in range(0, len(self.padded_data), 64)
        ]

5 / 6

expand_block: 16 Words to 80 Words

hashes/sha1.py:77

expand_block unpacks a 64-byte block into 16 big-endian 32-bit words, then derives 64 more by XOR-and-rotate-1 over earlier words.

The message schedule expands 16 input words into 80 scheduled words, one per round in the compression function. Line 83 unpacks the 64-byte block as 16 big-endian unsigned 32-bit integers (>16L) and appends 64 zeros as placeholders. Lines 84 and 85 fill positions 16 through 79: each word is the XOR of the word three positions back, eight back, fourteen back, and sixteen back, then rotated left by 1. The rotation by 1 was added in SHA-1 over the original SHA-0 precisely to improve avalanche behavior and close a weakness that allowed constructing collisions in SHA-0. The commented-out @staticmethod on line 77 is a leftover from an earlier draft; the method uses self.rotate, so it cannot be static.

Key takeaway

Sixteen input words expand to 80 via XOR of four predecessors then rotate-left-1. That rotate distinguishes SHA-1 from SHA-0 and was the specific fix for a known weakness.

    # @staticmethod
    def expand_block(self, block):
        """
        Takes a bytestring-block of length 64, unpacks it to a list of integers and
        returns a list of 80 integers after some bit operations
        """
        w = list(struct.unpack(">16L", block)) + [0] * 64
        for i in range(16, 80):
            w[i] = self.rotate((w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]), 1)
        return w

6 / 6

final_hash: The Compression Loop and the Digest

hashes/sha1.py:88

For each block: copy h to five working variables, run 80 rounds across four bands, then add the results back to h.

The compression function runs 80 rounds split into four 20-round bands. Rounds 0 through 19 use f = (b & c) | ((~b) & d), the IF function, and k = 0x5A827999. Rounds 20 through 39 use XOR of three words. Rounds 40 through 59 use the majority function. Rounds 60 through 79 return to XOR. Each band has a fixed constant k derived from the square root of 2, 3, 5, and 10 respectively. Line 116 applies the round update: rotate a left by 5, add f, add e, add k, add the scheduled word, mask to 32 bits, then shift the five working variables left by one position. After all 80 rounds, line 123 adds the working variables back into self.h with modular addition. The format string on line 130 serializes five 32-bit words as ten hex digits each into a 40-character digest.

Key takeaway

Eighty rounds, four bands, one add-back per block. The format string at the end packs five 32-bit words into a 40-character hex digest.

    def final_hash(self):
        """
        Calls all the other methods to process the input. Pads the data, then splits
        into blocks and then does a series of operations for each block (including
        expansion).
        For each block, the variable h that was initialized is copied to a,b,c,d,e
        and these 5 variables a,b,c,d,e undergo several changes. After all the blocks
        are processed, these 5 variables are pairwise added to h ie a to h[0], b to h[1]
        and so on. This h becomes our final hash which is returned.
        """
        self.padded_data = self.padding()
        self.blocks = self.split_blocks()
        for block in self.blocks:
            expanded_block = self.expand_block(block)
            a, b, c, d, e = self.h
            for i in range(80):
                if 0 <= i < 20:
                    f = (b & c) | ((~b) & d)
                    k = 0x5A827999
                elif 20 <= i < 40:
                    f = b ^ c ^ d
                    k = 0x6ED9EBA1
                elif 40 <= i < 60:
                    f = (b & c) | (b & d) | (c & d)
                    k = 0x8F1BBCDC
                elif 60 <= i < 80:
                    f = b ^ c ^ d
                    k = 0xCA62C1D6
                a, b, c, d, e = (
                    self.rotate(a, 5) + f + e + k + expanded_block[i] & 0xFFFFFFFF,
                    a,
                    self.rotate(b, 30),
                    c,
                    d,
                )
            self.h = (
                self.h[0] + a & 0xFFFFFFFF,
                self.h[1] + b & 0xFFFFFFFF,
                self.h[2] + c & 0xFFFFFFFF,
                self.h[3] + d & 0xFFFFFFFF,
                self.h[4] + e & 0xFFFFFFFF,
            )
        return ("{:08x}" * 5).format(*self.h)

Your codebase next

Create code tours for your project

Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.

Install Intraview Free