How Counting Sort Works
No comparisons, no pivots. Count key frequencies, prefix-sum them, then place each element at its exact output index in one backward pass.
What you will learn
- Why counting sort avoids comparisons by treating element values as array indices
- How coll_max and coll_min determine the counting array's length, bounding it to the key range k
- How the forward count pass tallies each value's frequency in O(n)
- How the prefix-sum pass transforms raw frequencies into final output positions
- How the backward placement pass writes each element to its exact position and decrements to preserve stability
- How counting_sort_string lifts integer counting sort onto character ordinals in one comprehension
- Why counting sort is O(n + k) and when k dominates to make it impractical
Prerequisites
- Comfort reading Python list comprehensions and the range() function
- Familiarity with the idea of an array index as a position
- No prior knowledge of counting sort required
The story behind counting sort
Harold Seward described counting sort in his 1954 MIT master's thesis on information classification. The central observation is that comparison-based sorting has an information-theoretic lower bound of O(n log n): to distinguish all n! orderings of n items, a comparison tree needs at least log-base-2 of n! leaves, which grows as n log n. Counting sort sidesteps that bound entirely by treating element values as array indices rather than things to compare. The cost shifts from comparisons to key range: if the values span a range of k, you allocate a counting array of size k, scan the input once, and reconstruct the sorted output from the counts.
Seward's algorithm is most useful when k is small relative to n: sorting one million exam scores from 0 to 100 is O(n + 100) = O(n), where comparison sorts cost O(n log n). Counting sort is also the inner loop of LSD radix sort, which applies it once per digit position with k fixed at 10 (decimal digits) or 256 (bytes). The stability of counting sort, achieved here by iterating the placement loop in reverse, is what makes radix sort correct: digits processed at earlier positions must retain their relative order through later passes.
Module Docstring: Run Instructions and Two Commands
sorts/counting_sort.py:1The module docstring names the algorithm and gives the two invocation commands every algorithm file in this repository carries.
Counting sort occupies a different conceptual category from quicksort or mergesort, and a reader needs that context before the code. The module docstring names it a "counting sort algorithm" rather than a "comparison sort," which is the first signal. Two doctest commands appear instead of one because some systems expose a python binary pointing to Python 2 and others only ship python3; both are listed so a contributor can use either. The manual run command at line 8 invokes the __main__ block at the bottom of the file. This shape (name, doctest command, manual command) is the same shape as sorts/merge_sort.py and sorts/radix_sort.py and every other sort in the category.
Two doctest commands handle both Python 2 and Python 3 binary names. The shape is identical across every algorithm file in sorts/.
"""
This is pure Python implementation of counting sort algorithm
For doctests run following command:
python -m doctest -v counting_sort.py
or
python3 -m doctest -v counting_sort.py
For manual testing run:
python counting_sort.py
"""Signature, Doctest, and the Contract
sorts/counting_sort.py:12The signature takes an untyped collection. Three doctests pin ascending output, empty input, and negatives as the contract.
The signature is deliberately untyped compared to the Protocol-bound generics in sorts/insertion_sort.py. Counting sort only works on integer keys (or values that can serve as array indices), so typing it with a Comparable Protocol would mislead: you could pass floats or strings and get a runtime error on the index arithmetic. The doctest at line 18 proves duplicates (two 2s) land in the right positions. The doctest at line 20 proves the empty-collection guard works. The doctest at line 22 proves negative integers work, which requires the coll_min offset in the counting array rather than a naive zero-based index. These three examples are the same three shapes the contribution checklist asks for: a typical case, the empty case, and an edge case involving unusual values.
Three doctests cover duplicates, empty input, and negative integers. Negatives are the non-obvious case: they require offsetting the counting array index by coll_min.
def counting_sort(collection):
"""Pure implementation of counting sort algorithm in Python
:param collection: some mutable ordered collection with heterogeneous
comparable items inside
:return: the same collection ordered by ascending
Examples:
>>> counting_sort([0, 5, 3, 2, 2])
[0, 2, 2, 3, 5]
>>> counting_sort([])
[]
>>> counting_sort([-2, -5, -45])
[-45, -5, -2]
"""Empty Guard and Range Discovery
sorts/counting_sort.py:25The empty guard returns early. Then max and min determine the counting array length, which equals the key range k.
Before counting anything, the algorithm needs to know two facts about the input. The empty guard on line 26 handles the degenerate case early, because max([]) raises ValueError and would crash the lines below without this check. Lines 31 and 32 scan the collection for its maximum and minimum values in two O(n) passes. Those two values determine the counting array length on line 35: coll_max + 1 - coll_min allocates exactly enough buckets to hold every distinct key, with bucket 0 representing coll_min and bucket k-1 representing coll_max. For the doctest case [-2, -5, -45], coll_min is -45, coll_max is -2, and the counting array needs 44 buckets. Without the offset, indexing by the raw value would produce a negative index and a runtime error.
The counting array length is the key range k, not the collection length. The coll_min offset shifts negative or non-zero-based keys into valid indices.
# if the collection is empty, returns empty
if collection == []:
return []
# get some information about the collection
coll_len = len(collection)
coll_max = max(collection)
coll_min = min(collection)
# create the counting array
counting_arr_length = coll_max + 1 - coll_min
counting_arr = [0] * counting_arr_lengthCount Pass and Prefix-Sum Pass
sorts/counting_sort.py:38The count pass tallies each value's frequency. The prefix-sum pass converts raw counts into final output positions for each key.
Two sequential passes transform the counting array from a frequency table into a position map. The count pass on lines 39 and 40 walks every element once and increments the bucket for that key. After this pass, counting_arr[i] holds how many times the value coll_min + i appears in the input. The prefix-sum pass on lines 44 and 45 accumulates those counts left to right: after the pass, counting_arr[i] holds the total number of elements with value at most coll_min + i. That total is exactly the 1-indexed output position of the last occurrence of value coll_min + i. The count pass on lines 39-40 is O(n); the prefix-sum pass on lines 44-45 is O(k) where k is the counting array length. Combined with the O(n) count pass over the input, the two together give the O(n + k) total complexity.
After the prefix-sum pass, counting_arr[i] is the 1-indexed output position of the last occurrence of value coll_min + i. That position drives the placement loop.
# count how much a number appears in the collection
for number in collection:
counting_arr[number - coll_min] += 1
# sum each position with it's predecessors. now, counting_arr[i] tells
# us how many elements <= i has in the collection
for i in range(1, counting_arr_length):
counting_arr[i] = counting_arr[i] + counting_arr[i - 1]Backward Placement Pass: Stable Output
sorts/counting_sort.py:47Iterating in reverse preserves input order for equal keys, making the sort stable. Each placed element decrements its bucket to avoid overwriting.
The placement loop is the step that distinguishes a stable counting sort from a naive one. ordered on line 48 is a fresh output array the same length as the input. The loop on line 52 walks the input backward, from the last element to the first. For each element, line 53 reads the prefix-sum position from counting_arr (subtracting 1 because the positions are 1-indexed), writes the element there, and line 54 decrements the bucket. The backward iteration is what makes the sort stable: two equal elements in the input will both look up the same bucket, but because the last one is processed first, it lands at the higher position, and the earlier one lands one slot below it, preserving their original relative order. LSD radix sort depends on this stability to accumulate orderings across digit positions correctly.
The backward iteration is not a style choice. It is what makes the sort stable. LSD radix sort is correct only because this loop preserves relative order for equal keys.
# create the output collection
ordered = [0] * coll_len
# place the elements in the output, respecting the original order (stable
# sort) from end to begin, updating counting_arr
for i in reversed(range(coll_len)):
ordered[counting_arr[collection[i] - coll_min] - 1] = collection[i]
counting_arr[collection[i] - coll_min] -= 1
return orderedcounting_sort_string and the __main__ Driver
sorts/counting_sort.py:59counting_sort_string lifts integer sorting onto character ordinals in one comprehension. The __main__ block asserts the string variant then exercises the integer sort interactively.
counting_sort_string on line 59 shows counting sort generalizing to any domain whose elements map to integers. The implementation is a single expression: convert the string's characters to ordinal integers with ord(c), sort the integers, convert back to characters with chr(i), and join. No new algorithm needed. The one doctest on line 61 verifies that the characters of "thisisthestring" land in ASCII order, which a reader can hand-check by noting that e (101) precedes g (103) and so on. The __main__ block asserts this doctest explicitly before the interactive driver, so any drift in the chr/ord round-trip fails immediately when the script is run directly. Browse sorts/ for other sort files that extend their primary function to strings in the same pattern.
ord(c) and chr(i) lift integer counting sort onto characters without any new algorithm. The assert in __main__ catches drift on direct invocation.
def counting_sort_string(string):
"""
>>> counting_sort_string("thisisthestring")
'eghhiiinrsssttt'
"""
return "".join([chr(i) for i in counting_sort([ord(c) for c in string])])
if __name__ == "__main__":
# Test string sort
assert counting_sort_string("thisisthestring") == "eghhiiinrsssttt"
user_input = input("Enter numbers separated by a comma:\n").strip()
unsorted = [int(item) for item in user_input.split(",")]
print(counting_sort(unsorted))You've walked through 6 key areas of the The Algorithms - Python codebase.
Continue: How Linear Search Works → Browse all projectsCreate code tours for your project
Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.
Install Intraview Free