The Algorithms - Python Python TheAlgorithms/Python

How Heapsort Works

Build a max-heap over the array, then repeatedly swap the root to the back and restore the heap. O(n log n) worst case with no extra memory.

6 stops ~16 min Verified 2026-05-05
What you will learn
  • Why J. W. J. Williams chose a binary heap as the sorting engine in 1964 and what Robert W. Floyd's heapify improvement added
  • How the binary-heap property maps to a plain Python list using the 2i+1 and 2i+2 index formulas
  • How heapify compares a node against its two children and swaps downward to restore the heap property
  • Why the build phase runs from n//2-1 down to 0 rather than from 0 up
  • How the extraction phase shrinks the heap boundary on each iteration to sort in place
  • Why heapsort is not stable and how that affects its use compared with mergesort or Timsort
Prerequisites
  • Comfort reading Python with index arithmetic
  • Familiarity with recursion in a helper function
  • No prior knowledge of heapsort required
The story behind heapsort

J. W. J. Williams published heapsort in 1964 in Communications of the ACM alongside his description of the binary heap data structure. The algorithm solved a standing problem: how to guarantee O(n log n) sorting in the worst case without requiring extra memory. Quicksort had been published three years earlier by Tony Hoare, but quicksort degrades to O(n squared) on certain inputs unless a random pivot is used. Williams sidestepped that entirely by organizing the array as a max-heap and extracting the maximum element one at a time.

That same year, Robert W. Floyd published an improved build phase. The naive approach inserts elements one by one from scratch, costing O(n log n) for the build alone. Floyd showed that starting from the bottom of the tree and sifting each node down costs O(n) total for the build, because most nodes are near the leaves and require very few comparisons. That Floyd heapify is what you see in this file.

Heapsort is in-place and O(n log n) worst case, two properties quicksort cannot guarantee simultaneously. The trade-off is that heapsort is not stable and its memory-access pattern is cache-unfriendly, so in practice Timsort and introsort beat it on real hardware even when their worst-case bounds are weaker.

1 / 6

Module Docstring: Short and Direct

sorts/heap_sort.py:1

A one-sentence docstring names the algorithm. There are no run instructions here because the __main__ block carries them.

Most algorithm files in this repository open with a paragraph-length docstring that names the algorithm, describes its behavior, and lists the two invocation commands. This file skips the description and the commands and states the algorithm name only. That brevity is intentional: the algorithm needs no prose framing at the module level because the two functions below are each self-documented with docstrings and doctests. When a file is short enough to read in two screens, a one-sentence module docstring is sufficient. Compare with sorts/insertion_sort.py, which adds a four-sentence paragraph explaining the shift-back-until-in-order mechanic; that extra prose is warranted there because the algorithm's motion is less familiar than the heap property. Here the heapify function's doctest does the explanatory work.

Key takeaway

Module docstrings can be one sentence when the functions below carry their own doctests and docstrings. Length is earned by complexity.

"""
A pure Python implementation of the heap sort algorithm.
"""
2 / 6

heapify: Signature and Doctest

sorts/heap_sort.py:6

heapify takes an array, an index, and a heap-size boundary. Two doctest steps show a root node sifting down twice.

The heap_size parameter is the key design choice in this signature. By accepting the boundary as an argument rather than using len(unsorted) directly, the function can operate on a shrinking region of the same list during the extraction phase. That is what makes heapsort in-place: the sorted elements accumulate at the back of the array while heap_size decreases, and heapify never touches the sorted suffix. The two-step doctest makes the sift-down visible. Starting from [1, 4, 3, 5, 2], the first call finds 1 at the root is smaller than its children (4 and 3), so it swaps 1 down and the root becomes 4; a second heapify is needed because 4 is smaller than 5. After two calls the heap root holds the maximum value.

Key takeaway

The heap_size parameter lets heapify ignore the sorted suffix at the back of the list, which is what makes the extraction phase in-place.

def heapify(unsorted: list[int], index: int, heap_size: int) -> None:
    """
    :param unsorted: unsorted list containing integers numbers
    :param index: index
    :param heap_size: size of the heap
    :return: None
    >>> unsorted = [1, 4, 3, 5, 2]
    >>> heapify(unsorted, 0, len(unsorted))
    >>> unsorted
    [4, 5, 3, 1, 2]
    >>> heapify(unsorted, 0, len(unsorted))
    >>> unsorted
    [5, 4, 3, 1, 2]
    """
    largest = index
3 / 6

heapify: The Sift-Down Logic

sorts/heap_sort.py:20

Three comparisons: left child index 2i+1, right child index 2i+2, then recurse on the subtree rooted at the swapped position.

A binary heap stored in a list places the left child of node at index i at index 2i+1 and the right child at 2i+2. That is the only formula this function needs to navigate the heap. Lines 21 and 22 compute both child indices. Lines 23 and 26 each check whether the child exists (index within heap_size) and whether it is larger than the current candidate for the largest position. After both checks, largest points to whichever of the three nodes (current, left child, right child) holds the maximum value. If that is not the root, line 30 swaps the root and the larger child, then line 31 recurses downward. The recursion terminates when the node is already the largest in its subtree or when it reaches a leaf. Each level of the heap has at most one comparison, so heapify costs O(log n).

Key takeaway

Left child is at 2i+1, right at 2i+2. Three comparisons find the largest; one swap and one recursive call push the smaller value down.

    largest = index
    left_index = 2 * index + 1
    right_index = 2 * index + 2
    if left_index < heap_size and unsorted[left_index] > unsorted[largest]:
        largest = left_index

    if right_index < heap_size and unsorted[right_index] > unsorted[largest]:
        largest = right_index

    if largest != index:
        unsorted[largest], unsorted[index] = (unsorted[index], unsorted[largest])
        heapify(unsorted, largest, heap_size)
4 / 6

heap_sort: Build Phase

sorts/heap_sort.py:34

The function doctest pins four cases. The build phase iterates from n//2-1 down to 0, calling heapify on each non-leaf node.

The four doctests cover duplicates, the empty list, negative values, and an eleven-element mixed list including both a very large and a very small integer. The build phase on lines 51 through 53 is Robert W. Floyd's O(n) improvement. The loop starts at n//2 - 1, which is the index of the last non-leaf node in a zero-indexed binary heap. Leaf nodes (indices from n//2 to n-1) are already trivially valid heaps of size one, so heapifying them would be wasted work. Going from n//2-1 down to 0 ensures that when heapify is called at any index, both its children already satisfy the heap property, so the sift-down finds the correct position in O(log n) passes. The total build cost is O(n) because most nodes are near the bottom of the tree and their sift paths are short.

Key takeaway

Starting at n//2-1 and going down is Floyd's O(n) build: leaves are already heaps, so only non-leaf nodes need sifting.

def heap_sort(unsorted: list[int]) -> list[int]:
    """
    A pure Python implementation of the heap sort algorithm

    :param collection: a mutable ordered collection of heterogeneous comparable items
    :return: the same collection ordered by ascending

    Examples:
    >>> heap_sort([0, 5, 3, 2, 2])
    [0, 2, 2, 3, 5]
    >>> heap_sort([])
    []
    >>> heap_sort([-2, -5, -45])
    [-45, -5, -2]
    >>> heap_sort([3, 7, 9, 28, 123, -5, 8, -30, -200, 0, 4])
    [-200, -30, -5, 0, 3, 4, 7, 8, 9, 28, 123]
    """
    n = len(unsorted)
    for i in range(n // 2 - 1, -1, -1):
        heapify(unsorted, i, n)
5 / 6

heap_sort: Extraction Phase

sorts/heap_sort.py:54

Swap the root (the maximum) to the back, shrink the heap by one, restore the heap property. Repeat until the array is sorted.

After the build phase, unsorted[0] holds the maximum element of the entire array. The extraction loop swaps it to position i (the last unsorted position), then calls heapify(unsorted, 0, i) with the heap size reduced by one. That i in the heapify call is what keeps the freshly placed maximum out of the heap: heapify never considers indices at or beyond its heap_size argument. After n-1 iterations the loop ends when only one element remains at index 0, and since every larger element has already been placed at its correct position, the full array is sorted ascending. The sorted elements grow from right to left; no extra memory is allocated. Each iteration costs O(log i), and summing from i = n-1 down to 1 gives O(n log n) total.

Key takeaway

Each iteration places the current maximum at the end, shrinks the heap boundary, and restores the heap. The sorted region grows in the same array; no extra allocation is needed.

    for i in range(n - 1, 0, -1):
        unsorted[0], unsorted[i] = unsorted[i], unsorted[0]
        heapify(unsorted, 0, i)
    return unsorted
6 / 6

The __main__ Guard: doctest and Conditional Driver

sorts/heap_sort.py:60

Direct execution runs doctest.testmod(), then reads input only if the user provides it, printing via an f-string assignment expression.

Two details here differ from the bubble_sort.py and merge_sort.py drivers. First, the import of doctest is deferred inside the if __name__ block rather than at the top of the file, which keeps the module lighter when it is imported rather than run directly. Second, the input is guarded by if user_input: on line 65, so pressing Enter without typing anything exits cleanly instead of producing a ValueError from int(item) on an empty string. The print(f"{heap_sort(unsorted) = }") idiom on line 67 uses Python's debug f-string format (= suffix), which prints both the expression and its value as heap_sort(unsorted) = [...]. That is more informative than a bare print when a contributor is checking output during development.

Key takeaway

The conditional input guard prevents a ValueError on an empty Enter. The f-string = suffix prints both the call expression and the result for clearer debugging output.

if __name__ == "__main__":
    import doctest

    doctest.testmod()
    user_input = input("Enter numbers separated by a comma:\n").strip()
    if user_input:
        unsorted = [int(item) for item in user_input.split(",")]
        print(f"{heap_sort(unsorted) = }")
Your codebase next

Create code tours for your project

Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.

Install Intraview Free