The Algorithms - Python Python TheAlgorithms/Python

Navigating TheAlgorithms/Python: A Tour of the 43 Categories

Sample one canonical algorithm from each major category (a sort, a search, a graph traversal, a DP class, a cipher) and end on the autoapi_dirs list that names every category.

7 stops ~18 min Verified 2026-05-05
What you will learn
  • How DIRECTORY.md groups every algorithm under one of 43 category headings
  • What an idiomatic sort file looks like: insertion_sort with a Comparable Protocol and six doctests
  • How searches/binary_search.py guards inputs and returns -1 on miss
  • How graphs/breadth_first_search.py uses a queue, a visited set, and an adjacency dict
  • How dynamic_programming/fibonacci.py memoizes incrementally on a class instance
  • How ciphers/caesar_cipher.py pairs encrypt and decrypt around a shared alphabet
  • How pyproject.toml's autoapi_dirs lists every algorithm directory by name
Prerequisites
  • Comfort reading short Python files with type hints
  • No prior knowledge of any specific algorithm required
1 / 7

DIRECTORY.md: How the 43 Categories Are Laid Out

DIRECTORY.md:1

DIRECTORY.md is a flat list of H2 headings, one per algorithm directory, with one bullet per file. Reading it is reading the architecture.

The repository's table of contents is also its module map. Each ## heading names a top-level directory; each bullet is one algorithm file. Audio Filters has three. Backtracking has 21. The full file enumerates 43 such categories, ranging from one-file directories like blockchain/ to 309-file directories like project_euler/. The flatness is deliberate. There is no core/, no common/, no nested abstractions. To find the binary-search implementations you scroll to ## Searches; to find Dijkstra's algorithm you scroll to ## Graphs. The remaining stops in this tour walk one canonical file from each of the most-trafficked categories so you can see how the consistency holds across the catalog.

Key takeaway

Forty-three H2 headings, one per algorithm directory. The category structure is the architecture; reading the catalog tells you where every algorithm lives.


## Audio Filters
  * [Butterworth Filter](audio_filters/butterworth_filter.py)
  * [Iir Filter](audio_filters/iir_filter.py)
  * [Show Response](audio_filters/show_response.py)

## Backtracking
  * [All Combinations](backtracking/all_combinations.py)
  * [All Permutations](backtracking/all_permutations.py)
  * [All Subsequences](backtracking/all_subsequences.py)
  * [Coloring](backtracking/coloring.py)
  * [Combination Sum](backtracking/combination_sum.py)
  * [Crossword Puzzle Solver](backtracking/crossword_puzzle_solver.py)
  * [Generate Parentheses](backtracking/generate_parentheses.py)
  * [Generate Parentheses Iterative](backtracking/generate_parentheses_iterative.py)
  * [Hamiltonian Cycle](backtracking/hamiltonian_cycle.py)
  * [Knight Tour](backtracking/knight_tour.py)
  * [Match Word Pattern](backtracking/match_word_pattern.py)
  * [Minimax](backtracking/minimax.py)
  * [N Queens](backtracking/n_queens.py)
  * [N Queens Math](backtracking/n_queens_math.py)
  * [Power Sum](backtracking/power_sum.py)
  * [Rat In Maze](backtracking/rat_in_maze.py)
  * [Sudoku](backtracking/sudoku.py)
  * [Sum Of Subsets](backtracking/sum_of_subsets.py)
  * [Word Break](backtracking/word_break.py)
  * [Word Ladder](backtracking/word_ladder.py)
  * [Word Search](backtracking/word_search.py)
2 / 7

sorts/insertion_sort.py: An Idiomatic Sort

sorts/insertion_sort.py:27

The signature uses PEP 695 type-parameter syntax bound to a Comparable Protocol, and the docstring carries six doctests for varied input shapes.

Sorts is the canonical category, and insertion_sort.py is its smallest readable example. The signature uses PEP 695 type-parameter syntax (def insertion_sort[T: Comparable]) bound to a Comparable Protocol declared earlier in the file, which means the function works on any mutable sequence whose items support <. The docstring is the test suite. Pytest with --doctest-modules runs every >>> example in this block on every push: empty list, negative integers, ASCII letters, and two random 100-element samples that compare against Python's built-in sorted(). The actual loop is six lines below this docstring. Browse sorts/ for 50 more files in this exact shape.

Key takeaway

Sorts cluster tightly around this shape: a typed signature, a Comparable bound, a docstring that doubles as the test suite, and a short loop body.

def insertion_sort[T: Comparable](collection: MutableSequence[T]) -> MutableSequence[T]:
    """A pure Python implementation of the insertion sort algorithm

    :param collection: some mutable ordered collection with heterogeneous
    comparable items inside
    :return: the same collection ordered by ascending

    Examples:
    >>> insertion_sort([0, 5, 3, 2, 2])
    [0, 2, 2, 3, 5]
    >>> insertion_sort([]) == sorted([])
    True
    >>> insertion_sort([-2, -5, -45]) == sorted([-2, -5, -45])
    True
    >>> insertion_sort(['d', 'a', 'b', 'e', 'c']) == sorted(['d', 'a', 'b', 'e', 'c'])
    True
    >>> import random
    >>> collection = random.sample(range(-50, 50), 100)
    >>> insertion_sort(collection) == sorted(collection)
    True
    >>> import string
    >>> collection = random.choices(string.ascii_letters + string.digits, k=100)
    >>> insertion_sort(collection) == sorted(collection)
    True
    """
3 / 7

searches/binary_search.py: Input Guard, Return -1 on Miss

searches/binary_search.py:180

binary_search validates ascending order, returns the index on hit, returns -1 on miss. Doctests cover both outcomes.

Searches is the next-most-trafficked category, and binary search is its signature algorithm. Two patterns from the contribution checklist appear here at once. First, the function rejects bad input with a Python exception: raise ValueError("sorted_collection must be sorted in ascending order") after a pairwise scan that costs O(n) but guarantees the precondition holds. Second, the doctests cover the contract end to end: a hit at index 0, a hit at the last index, a hit in the middle, and a miss that returns -1. The loop itself uses left + (right - left) // 2 instead of (left + right) // 2 to avoid integer overflow, a defensive habit copied from the classic CLRS textbook.

Key takeaway

Search functions follow the same pattern: validate inputs, raise ValueError on violations, return a sentinel value (here -1) when the target is absent.

def binary_search(sorted_collection: list[int], item: int) -> int:
    """Pure implementation of a binary search algorithm in Python

    Be careful collection must be ascending sorted otherwise, the result will be
    unpredictable

    :param sorted_collection: some ascending sorted collection with comparable items
    :param item: item value to search
    :return: index of the found item or -1 if the item is not found

    Examples:
    >>> binary_search([0, 5, 7, 10, 15], 0)
    0
    >>> binary_search([0, 5, 7, 10, 15], 15)
    4
    >>> binary_search([0, 5, 7, 10, 15], 5)
    1
    >>> binary_search([0, 5, 7, 10, 15], 6)
    -1
    """
    if any(a > b for a, b in pairwise(sorted_collection)):
        raise ValueError("sorted_collection must be sorted in ascending order")
    left = 0
    right = len(sorted_collection) - 1

    while left <= right:
        midpoint = left + (right - left) // 2
        current_item = sorted_collection[midpoint]
        if current_item == item:
            return midpoint
        elif item < current_item:
            right = midpoint - 1
        else:
            left = midpoint + 1
    return -1
4 / 7

graphs/breadth_first_search.py: Queue, Visited Set, Adjacency Dict

graphs/breadth_first_search.py:40

BFS uses a Queue, a visited set, and an adjacency dict. The doctest exercises a small directed graph with cycles.

Graphs is one of the largest categories at 64 files, and BFS is its smallest readable traversal. The state is three named structures: a visited set, a queue from the standard library's queue.Queue, and the adjacency map self.vertices that the enclosing Graph class maintains. The loop is the textbook BFS: dequeue, scan adjacent vertices, enqueue any that have not been visited yet. The doctest at the top is what makes this readable as a self-contained file. It builds a four-vertex graph with a cycle (vertex 3 has a self-loop), runs BFS from vertex 2, and asserts the visited set is [0, 1, 2, 3] after sorting. Reading the doctest tells you what the function does without scrolling.

Key takeaway

Graph traversals lean on three structures: a visited set, a frontier (queue for BFS, stack for DFS), and an adjacency map the enclosing Graph class owns.

    def bfs(self, start_vertex: int) -> set[int]:
        """
        >>> g = Graph()
        >>> g.add_edge(0, 1)
        >>> g.add_edge(0, 1)
        >>> g.add_edge(0, 2)
        >>> g.add_edge(1, 2)
        >>> g.add_edge(2, 0)
        >>> g.add_edge(2, 3)
        >>> g.add_edge(3, 3)
        >>> sorted(g.bfs(2))
        [0, 1, 2, 3]
        """
        # initialize set for storing already visited vertices
        visited = set()

        # create a first in first out queue to store all the vertices for BFS
        queue: Queue = Queue()

        # mark the source node as visited and enqueue it
        visited.add(start_vertex)
        queue.put(start_vertex)

        while not queue.empty():
            vertex = queue.get()

            # loop through all adjacent vertex and enqueue it if not yet visited
            for adjacent_vertex in self.vertices[vertex]:
                if adjacent_vertex not in visited:
                    queue.put(adjacent_vertex)
                    visited.add(adjacent_vertex)
        return visited
5 / 7

dynamic_programming/fibonacci.py: Class-Based Memoization

dynamic_programming/fibonacci.py:1

The Fibonacci class stores the sequence on the instance and extends it incrementally on each call. Doctests pin the contract.

Dynamic programming is the third-largest category at 51 files. Most DP solutions in the repository use a function with a memoization dict; this one uses a class so the cache survives across calls. self.sequence starts at [0, 1] and grows on demand. Each call to get(index) computes the gap with a walrus assignment ((difference := index - (len(self.sequence) - 2))) and appends only the missing terms. A second call to get on the same instance is therefore O(1) for already-computed indices. The two doctests pin the shape of the return value: a list of every Fibonacci number up to but not including the index. Browse dynamic_programming/ for knapsack, longest common subsequence, edit distance, and matrix chain order.

Key takeaway

DP files keep state explicit. This one stores the sequence on the instance; others use a dict argument or functools.cache. The pattern is always the same: avoid recomputation.

"""
This is a pure Python implementation of Dynamic Programming solution to the fibonacci
sequence problem.
"""


class Fibonacci:
    def __init__(self) -> None:
        self.sequence = [0, 1]

    def get(self, index: int) -> list:
        """
        Get the Fibonacci number of `index`. If the number does not exist,
        calculate all missing numbers leading up to the number of `index`.

        >>> Fibonacci().get(10)
        [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
        >>> Fibonacci().get(5)
        [0, 1, 1, 2, 3]
        """
        if (difference := index - (len(self.sequence) - 2)) >= 1:
            for _ in range(difference):
                self.sequence.append(self.sequence[-1] + self.sequence[-2])
        return self.sequence[:index]
6 / 7

ciphers/caesar_cipher.py: An Encrypt and Decrypt Pair

ciphers/caesar_cipher.py:71

The encrypt loop walks the input one character at a time, shifts the alphabet index by the key, and wraps modulo the alphabet length.

Ciphers is one of the more eclectic categories at 48 files, ranging from Caesar to RSA to Enigma. The Caesar implementation here is the smallest example of the pattern that holds across the directory: every cipher ships an encrypt and a decrypt function with matching signatures. The encrypt loop walks input_string character by character, looks up each character's index in the configured alphabet, shifts that index by key, and wraps with % len(alpha) so the shift loops back from z to a. Characters outside the alphabet (spaces, punctuation) pass through unchanged. The decrypt function below this slice mirrors the same loop with subtraction instead of addition. Real cryptography lives at ciphers/rsa_cipher.py; this file is the textbook entry point.

Key takeaway

Cipher files come in encrypt/decrypt pairs. Both functions, plus the doctests, fit in one short file because Caesar is one shift operation.

    # Set default alphabet to lower and upper case english chars
    alpha = alphabet or ascii_letters

    # The final result string
    result = ""

    for character in input_string:
        if character not in alpha:
            # Append without encryption if character is not in the alphabet
            result += character
        else:
            # Get the index of the new key and make sure it isn't too large
            new_key = (alpha.index(character) + key) % len(alpha)

            # Append the encoded character to the alphabet
            result += alpha[new_key]

    return result


def decrypt(input_string: str, key: int, alphabet: str | None = None) -> str:
7 / 7

pyproject.toml: autoapi_dirs Lists Every Category by Name

pyproject.toml:188

The Sphinx config enumerates every algorithm directory in autoapi_dirs. That list is the source of truth for what counts as a category.

The categories listed above are not arbitrary; they are listed by name in pyproject.toml. The autoapi_dirs array is what the Sphinx documentation build reads to know which directories to autodoc, and it doubles as the single source of truth for what counts as an algorithm category in the repository. The slice above shows the first 30 entries through machine_learning; the full list continues through maths, matrix, networking_flow, neural_network, other, physics, project_euler, quantum, scheduling, searches, sorts, strings, and web_programming. Pick any name on this list and you have a directory full of files in the same shape as the ones in the previous six stops. The marquee algorithm tours land in follow-up commits.

Key takeaway

The 43 categories are pinned in pyproject.toml's autoapi_dirs. Pick a category, open its directory, and every file follows the typed + docstring + doctest shape.

[tool.sphinx-pyproject]
copyright = "2014, TheAlgorithms"
autoapi_dirs = [
  "audio_filters",
  "backtracking",
  "bit_manipulation",
  "blockchain",
  "boolean_algebra",
  "cellular_automata",
  "ciphers",
  "computer_vision",
  "conversions",
  "data_compression",
  "data_structures",
  "digital_image_processing",
  "divide_and_conquer",
  "dynamic_programming",
  "electronics",
  "file_transfer",
  "financial",
  "fractals",
  "fuzzy_logic",
  "genetic_algorithm",
  "geodesy",
  "geometry",
  "graphics",
  "graphs",
  "greedy_methods",
  "hashes",
  "knapsack",
  "linear_algebra",
  "linear_programming",
  "machine_learning",
Your codebase next

Create code tours for your project

Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.

Install Intraview Free