The Algorithms - Python Python TheAlgorithms/Python

How Binary Search Works

Halve the search space on every step. The 1946 algorithm that still trips up professionals on overflow and off-by-one errors.

6 stops ~16 min Verified 2026-05-05
What you will learn
  • Why comparing at the midpoint halves the remaining candidates on every iteration
  • How mid = low + (high - low) // 2 avoids integer overflow compared to (low + high) // 2
  • How the iterative binary_search function validates sort order before searching
  • How binary_search_by_recursion encodes the same invariant with a shrinking left/right window
  • How binary_search_std_lib delegates to bisect.bisect_left for a one-statement lookup
  • Why the file benchmarks four implementations and what the runtime order reveals
  • How the doctest pins four cases: hit at index 0, hit at last index, interior hit, miss returning -1
Prerequisites
  • Comfort reading Python while loops
  • Familiarity with recursion and the call stack
  • No prior knowledge of binary search required
The story behind binary search

Binary search is older than stored-program computers. John Mauchly described the idea in 1946 in The Moore School Lectures, a foundational course on ENIAC. The algorithm was clear enough to state in a sentence: compare the target to the middle element, discard the half that cannot contain it, repeat. What turned out to be surprisingly difficult was writing a correct implementation.

The first published binary search appeared in 1946. The first bug-free binary search for an arbitrary n took until 1962, when D. H. Lehmer published one. Jon Bentley noted in his 1986 book Programming Pearls that when he asked professional programmers to write a binary search from memory, about 90 percent produced a version with at least one bug, most commonly an off-by-one error in the boundary update or an integer overflow in the midpoint calculation. The overflow comes from computing (low + high) // 2 in a language with bounded integers: if both bounds are large, their sum wraps. The safe form is low + (high - low) // 2, and Python avoids the issue entirely with arbitrary-precision integers. The insight is still worth knowing because the code you read in C, Java, or Rust uses bounded arithmetic.

1 / 6

Module Header: Four Implementations in One File

searches/binary_search.py:1

The shebang, docstring, and imports establish that this file collects several binary search variants under one module, all runnable via doctest or the __main__ block.

Most algorithm files in this repository hold one function. This one holds four distinct binary search implementations. The module docstring declares the reason: this is a survey file, not a single-algorithm stub. The shebang on line 1 makes it directly executable on Unix. Two imports follow: bisect from the standard library, which the binary_search_std_lib variant delegates to, and pairwise from itertools, which the primary implementation uses to validate sort order before searching. Both imports appear at the top because the file loads every variant, not only the primary one. The docstring names the two ways to exercise the code: python3 -m doctest -v for the contract tests, and python3 binary_search.py for the benchmark and interactive session at the bottom.

Key takeaway

A four-variant survey file begins with a docstring that names the class of algorithm, two imports, and two run commands.

#!/usr/bin/env python3

"""
Pure Python implementations of binary search algorithms

For doctests run the following command:
python3 -m doctest -v binary_search.py

For manual testing run:
python3 binary_search.py
"""

import bisect
from itertools import pairwise
2 / 6

bisect_left: The Midpoint Formula That Avoids Overflow

searches/binary_search.py:17

bisect_left finds the first insertion point in a sorted array. The midpoint formula avoids integer overflow in bounded-integer languages like C or Java.

The midpoint formula on line 50 is the line that has tripped up programmers for decades. Writing (lo + hi) // 2 looks correct, but in languages with 32-bit integer arithmetic both bounds can overflow the type when summed, producing a negative number and an out-of-range index. The safe form lo + (hi - lo) // 2 computes the same value but keeps the intermediate result bounded by the collection size. Python uses arbitrary-precision integers so the overflow never occurs here, but the formula is worth knowing because virtually every production binary search in C, Java, or Rust needs it. The loop body applies the classic two-branch invariant: if the midpoint value is strictly less than the target, the answer must lie in the upper half, so advance lo; otherwise narrow hi down. When the pointers meet, lo is the insertion point.

Key takeaway

lo + (hi - lo) // 2 is the overflow-safe midpoint. Python does not need it, but every C or Java binary search does.

def bisect_left(
    sorted_collection: list[int], item: int, lo: int = 0, hi: int = -1
) -> int:
    """
    Locates the first element in a sorted array that is larger or equal to a given
    value.

    It has the same interface as
    https://docs.python.org/3/library/bisect.html#bisect.bisect_left .

    :param sorted_collection: some ascending sorted collection with comparable items
    :param item: item to bisect
    :param lo: lowest index to consider (as in sorted_collection[lo:hi])
    :param hi: past the highest index to consider (as in sorted_collection[lo:hi])
    :return: index i such that all values in sorted_collection[lo:i] are < item and all
        values in sorted_collection[i:hi] are >= item.

    Examples:
    >>> bisect_left([0, 5, 7, 10, 15], 0)
    0
    >>> bisect_left([0, 5, 7, 10, 15], 6)
    2
    >>> bisect_left([0, 5, 7, 10, 15], 20)
    5
    >>> bisect_left([0, 5, 7, 10, 15], 15, 1, 3)
    3
    >>> bisect_left([0, 5, 7, 10, 15], 6, 2)
    2
    """
    if hi < 0:
        hi = len(sorted_collection)

    while lo < hi:
        mid = lo + (hi - lo) // 2
        if sorted_collection[mid] < item:
            lo = mid + 1
        else:
            hi = mid

    return lo
3 / 6

binary_search: The Primary Iterative Implementation

searches/binary_search.py:180

binary_search validates ascending order with pairwise, then searches with two converging pointers. Returns the index on hit, -1 on miss.

The guard on line 200 is the contribution checklist in action. Before searching, any(a > b for a, b in pairwise(sorted_collection)) scans every adjacent pair in one pass, and raises ValueError if any inversion exists. The cost is O(n) upfront, which doubles the work on very small inputs but makes the precondition explicit rather than leaving silently wrong results. The search loop on lines 205 through 213 converges two pointers from opposite ends: left starts at zero, right at the last index, and each iteration either returns the matching index or narrows the window by half. The loop terminates when left exceeds right, at which point no match exists and the function returns -1. The four doctests on lines 191 through 198 pin all four outcomes: hit at the first position, hit at the last position, interior hit, and a clean miss.

Key takeaway

Validate sort order before searching, then converge two pointers. Return the index on hit, -1 when the window collapses without a match.

def binary_search(sorted_collection: list[int], item: int) -> int:
    """Pure implementation of a binary search algorithm in Python

    Be careful collection must be ascending sorted otherwise, the result will be
    unpredictable

    :param sorted_collection: some ascending sorted collection with comparable items
    :param item: item value to search
    :return: index of the found item or -1 if the item is not found

    Examples:
    >>> binary_search([0, 5, 7, 10, 15], 0)
    0
    >>> binary_search([0, 5, 7, 10, 15], 15)
    4
    >>> binary_search([0, 5, 7, 10, 15], 5)
    1
    >>> binary_search([0, 5, 7, 10, 15], 6)
    -1
    """
    if any(a > b for a, b in pairwise(sorted_collection)):
        raise ValueError("sorted_collection must be sorted in ascending order")
    left = 0
    right = len(sorted_collection) - 1

    while left <= right:
        midpoint = left + (right - left) // 2
        current_item = sorted_collection[midpoint]
        if current_item == item:
            return midpoint
        elif item < current_item:
            right = midpoint - 1
        else:
            left = midpoint + 1
    return -1
4 / 6

binary_search_by_recursion: The Same Invariant, Stack-Encoded

searches/binary_search.py:320

The recursive variant encodes the search window as call-stack frames. When the right pointer falls below the left, the item is absent.

The iterative and recursive versions express the same invariant differently. Where the loop used while left <= right as its continuation condition, the recursive version checks if right < left: return -1 as its base case on line 347. Both say the same thing: when the search window is empty, the item is absent. The midpoint formula on line 350 is identical to the one in bisect_left. The two recursive calls on lines 355 and 357 each pass a narrowed window, discarding either the right or left half depending on which side the target must fall. Each call adds a frame to the call stack, so the depth is O(log n). The doctest requires callers to pass explicit left and right bounds, which is why the examples include 0, 4; the defaults handle the first call automatically.

Key takeaway

The recursive base case (right < left) is the exact termination condition of the iterative loop, encoded as a function return instead of a loop exit.

def binary_search_by_recursion(
    sorted_collection: list[int], item: int, left: int = 0, right: int = -1
) -> int:
    """Pure implementation of a binary search algorithm in Python by recursion

    Be careful collection must be ascending sorted otherwise, the result will be
    unpredictable
    First recursion should be started with left=0 and right=(len(sorted_collection)-1)

    :param sorted_collection: some ascending sorted collection with comparable items
    :param item: item value to search
    :return: index of the found item or -1 if the item is not found

    Examples:
    >>> binary_search_by_recursion([0, 5, 7, 10, 15], 0, 0, 4)
    0
    >>> binary_search_by_recursion([0, 5, 7, 10, 15], 15, 0, 4)
    4
    >>> binary_search_by_recursion([0, 5, 7, 10, 15], 5, 0, 4)
    1
    >>> binary_search_by_recursion([0, 5, 7, 10, 15], 6, 0, 4)
    -1
    """
    if right < 0:
        right = len(sorted_collection) - 1
    if list(sorted_collection) != sorted(sorted_collection):
        raise ValueError("sorted_collection must be sorted in ascending order")
    if right < left:
        return -1

    midpoint = left + (right - left) // 2

    if sorted_collection[midpoint] == item:
        return midpoint
    elif sorted_collection[midpoint] > item:
        return binary_search_by_recursion(sorted_collection, item, left, midpoint - 1)
    else:
        return binary_search_by_recursion(sorted_collection, item, midpoint + 1, right)
5 / 6

binary_search_std_lib: Delegation to bisect

searches/binary_search.py:217

bisect.bisect_left returns the insertion point; a bounds check converts that into an index-or-minus-one return, matching the API of the hand-coded versions.

The standard library provides binary search through bisect.bisect_left, but its return value is an insertion point rather than a hit-or-miss index. That mismatch is why this wrapper exists. Line 239 calls bisect.bisect_left, which returns the leftmost position where item can be inserted while keeping the list sorted. That position is a hit only if it is within bounds and the value there equals the target. Lines 240 through 241 encode that two-part check: if the index is at the end of the collection, the item exceeds every element and is absent; if the element at the index does not match, the item is absent. Otherwise return the index. The sort-order validation on line 237 compares the full list against sorted(), an O(n log n) check rather than the O(n) pairwise scan used by binary_search.

Key takeaway

bisect.bisect_left returns an insertion point, not a match index. A two-part bounds-and-value check converts it to the expected hit-or-minus-one API.

def binary_search_std_lib(sorted_collection: list[int], item: int) -> int:
    """Pure implementation of a binary search algorithm in Python using stdlib

    Be careful collection must be ascending sorted otherwise, the result will be
    unpredictable

    :param sorted_collection: some ascending sorted collection with comparable items
    :param item: item value to search
    :return: index of the found item or -1 if the item is not found

    Examples:
    >>> binary_search_std_lib([0, 5, 7, 10, 15], 0)
    0
    >>> binary_search_std_lib([0, 5, 7, 10, 15], 15)
    4
    >>> binary_search_std_lib([0, 5, 7, 10, 15], 5)
    1
    >>> binary_search_std_lib([0, 5, 7, 10, 15], 6)
    -1
    """
    if list(sorted_collection) != sorted(sorted_collection):
        raise ValueError("sorted_collection must be sorted in ascending order")
    index = bisect.bisect_left(sorted_collection, item)
    if index != len(sorted_collection) and sorted_collection[index] == item:
        return index
    return -1
6 / 6

The __main__ Block: Benchmarks and an Interactive Session

searches/binary_search.py:407

The __main__ block runs doctests, benchmarks all four implementations on a 1000-element range, then accepts user input for a live search.

Most algorithm files in this repository carry a minimal __main__ block that reads from stdin and prints one result. This file goes further because there are four implementations to compare. The searches tuple defined at line 399 lists them fastest to slowest by the file's comment. The benchmark on lines 418 through 425 runs each implementation 5,000 times on a range(1000) collection, searching for the middle element. Running python3 binary_search.py prints the timing for all four, which illustrates concretely why delegating to bisect wins: it calls into a C extension rather than a pure Python loop. The interactive session at the bottom accepts a comma-separated list from stdin, sorts it (using the same sorted() call the binary_search_std_lib validation uses), and searches for a user-provided target. The sorted() call on line 428 means the validation in the search function always passes.

Key takeaway

The benchmark loop runs all four implementations on the same input and prints wall-clock timings. binary_search_std_lib wins because it calls a C extension.

if __name__ == "__main__":
    import doctest
    import timeit

    doctest.testmod()
    for search in searches:
        name = f"{search.__name__:>26}"
        print(f"{name}: {search([0, 5, 7, 10, 15], 10) = }")  # type: ignore[operator]

    print("\nBenchmarks...")
    setup = "collection = range(1000)"
    for search in searches:
        name = search.__name__
        print(
            f"{name:>26}:",
            timeit.timeit(
                f"{name}(collection, 500)", setup=setup, number=5_000, globals=globals()
            ),
        )

    user_input = input("\nEnter numbers separated by comma: ").strip()
    collection = sorted(int(item) for item in user_input.split(","))
    target = int(input("Enter a single number to be found in the list: "))
    result = binary_search(sorted_collection=collection, item=target)
    if result == -1:
        print(f"{target} was not found in {collection}.")
    else:
        print(f"{target} was found at position {result} of {collection}.")
Your codebase next

Create code tours for your project

Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.

Install Intraview Free