Data Structures in TheAlgorithms/Python
Seven fundamental data structures side by side: the pointer chain, the key-ordered tree, the self-balancing tree, the heap, the hash table, the union-find, and the prefix tree.
What you will learn
- How a singly linked list node stores data and a reference to the next node
- How a BST insert walk compares at each node to find the correct leaf position
- How AVL tree right rotation rewires three pointers and recomputes heights
- How max_heapify sinks a node by comparing against children and swapping
- How a hash table maps a key to an index and resolves collisions with linear probing
- How disjoint set union and find use rank and path compression to flatten trees
- How a trie insert and find walk a character-keyed dictionary per level
Prerequisites
- Comfort reading Python classes and instance attributes
- Familiarity with the concept of pointers or references
- No prior knowledge of any specific data structure required
Linked list: a node holds data and a pointer to the next node
data_structures/linked_list/singly_linked_list.py:8A linked list node is a two-field object: one for the stored value and one for the reference to the next node.
A singly linked list is the simplest chain structure: each node contains a value and a pointer to the next node. The @dataclass decorator removes the need to write __init__ by hand; the two field declarations on lines 22 and 23 define both the constructor signature and the instance attributes. data accepts Any type, so the same node structure works for integers, strings, or arbitrary objects. next_node defaults to None, meaning a newly created node is by default the tail of the list. The four doctests cover the expected repr format for different data types. The linked list's power is O(1) insertion at the head; its cost is O(n) access to any specific index because there is no random access. All other operations in the LinkedList class depend on this two-field building block.
A linked list node is exactly two fields: the stored value and a pointer to the next node. Everything else the list does is built on top of these two.
@dataclass
class Node:
"""
Create and initialize Node class instance.
>>> Node(20)
Node(20)
>>> Node("Hello, world!")
Node(Hello, world!)
>>> Node(None)
Node(None)
>>> Node(True)
Node(True)
"""
data: Any
next_node: Node | None = None
def __repr__(self) -> str:BST insert: walk left or right until a leaf is found
data_structures/binary_tree/binary_search_tree.py:170BST insert traverses from root to a leaf by comparing the new value against each node, then attaches the new node at the first empty child slot.
A binary search tree maintains a single invariant: every value in a node's left subtree is smaller, and every value in its right subtree is larger. Insert enforces this invariant by starting at the root and walking down: if the new value is smaller than the current node, go left; otherwise go right. The walk continues until an empty child slot is found. The BST's search performance depends on its height: O(log n) in the average case on random input, but O(n) in the worst case when values arrive in sorted order and the tree degrades to a linked list. That worst case is why balanced variants like AVL trees exist. The new_node.parent = parent_node assignment on line 193 records the parent reference, which is needed for deletion operations.
BST insert enforces the ordering invariant by walking left or right at each node until an empty slot is found. The tree height determines search cost.
def __insert(self, value) -> None:
"""
Insert a new node in Binary Search Tree with value label
"""
new_node = Node(value) # create a new Node
if self.empty(): # if Tree is empty
self.root = new_node # set its root
else: # Tree is not empty
parent_node = self.root # from root
if parent_node is None:
return
while True: # While we don't get to a leaf
if value < parent_node.value: # We go left
if parent_node.left is None:
parent_node.left = new_node # We insert the new node in a leaf
break
else:
parent_node = parent_node.left
elif parent_node.right is None:
parent_node.right = new_node
break
else:
parent_node = parent_node.right
new_node.parent = parent_nodeAVL tree: right rotation rewires three pointers to restore balance
data_structures/binary_tree/avl_tree.py:87Right rotation promotes the left child to the parent position, moves the left child's right subtree to the demoted parent's left, and updates heights.
AVL trees are a self-balancing BST variant. After every insert or delete, the tree checks the height difference between subtrees (the balance factor) at each ancestor. If any node has a balance factor outside the range of negative one to positive one, a rotation restores balance. Right rotation handles the case where the left subtree is too deep. The docstring's ASCII art shows the transformation: node A (the unbalanced root) is demoted to A's right child, and B (A's left child) is promoted to take A's place. B's right subtree moves to A's left, so the BST ordering property is preserved. The three pointer updates on lines 101-102 are the entire structural change; lines 103-106 recompute the heights of both affected nodes. Height is always computed as the maximum of the two children's heights plus one.
AVL rotation is three pointer assignments plus two height updates. The ASCII diagram in the docstring shows exactly which subtree moves where.
def right_rotation(node: MyNode) -> MyNode:
r"""
A B
/ \ / \
B C Bl A
/ \ --> / / \
Bl Br UB Br C
/
UB
UB = unbalanced node
"""
print("left rotation node:", node.get_data())
ret = node.get_left()
assert ret is not None
node.set_left(ret.get_right())
ret.set_right(node)
h1 = my_max(get_height(node.get_right()), get_height(node.get_left())) + 1
node.set_height(h1)
h2 = my_max(get_height(ret.get_right()), get_height(ret.get_left())) + 1
ret.set_height(h2)
return retHeap: max_heapify sinks a node to its correct level
data_structures/heap/heap.py:111max_heapify finds the largest among a parent and its two children, swaps if a child is larger, then recurses on the swapped position.
A max-heap is a complete binary tree stored in a flat array where every parent is at least as large as its children. max_heapify is the function that repairs a single violation of that property. Given the index of a potentially out-of-place element, it checks both children. The index of the largest among the three (parent and two children) is recorded in violation. If the largest is not the parent, the two values swap, and max_heapify recurses on the position the parent moved to. That recursion continues until the element settles into a position where it is at least as large as both children. The entire heap is built by calling max_heapify from the last internal node up to the root. The class doctest shows that inserting into a heap of 11 integers produces a correctly ordered max-heap.
max_heapify is O(log n) because it sinks a value at most one level per call, and the heap has at most log n levels.
def max_heapify(self, index: int) -> None:
"""
correct a single violation of the heap property in a subtree's root.
It is the function that is responsible for restoring the property
of Max heap i.e the maximum element is always at top.
"""
if index < self.heap_size:
violation: int = index
left_child = self.left_child_idx(index)
right_child = self.right_child_idx(index)
# check which child is larger than its parent
if left_child is not None and self.h[left_child] > self.h[violation]:
violation = left_child
if right_child is not None and self.h[right_child] > self.h[violation]:
violation = right_child
# if violation indeed exists
if violation != index:
# swap to fix the violation
self.h[violation], self.h[index] = self.h[index], self.h[violation]
# fix the subsequent violation recursively if any
self.max_heapify(violation)Hash table: hash by modulo, probe linearly on collision
data_structures/hashing/hash_table.py:56The hash function maps a key to an index via modulo the table size; collisions are resolved by linear probing to the next available slot.
The hash function is the most consequential single line in a hash table implementation: key % self.size_table maps any integer key to an index in the backing array. The doctest makes the collision problem visible: both 10 and 20 hash to index 0 in a size-5 table. When a collision occurs, the abstract method _collision_resolution handles it; the concrete implementation uses linear probing, which increments the index by one until an empty slot is found. The doctest in _collision_resolution shows that inserting 17, 18, and 99 into a size-3 table assigns 17 to index 2, 18 to index 0, and 99 to index 1 after linear probing. Load factor controls when the table resizes; this implementation triggers at 75% occupancy by default. The bloom filter marquee tour covers a related hash-based structure.
A hash function with modulo maps keys to indices in O(1), but collisions are inevitable when keys outnumber slots; linear probing resolves them at the cost of clustering.
def hash_function(self, key):
"""
Generates hash for the given key value
Examples:
Creating HashTable with size 5
>>> ht = HashTable(5)
>>> ht.hash_function(10)
0
>>> ht.hash_function(20)
0
>>> ht.hash_function(4)
4
>>> ht.hash_function(18)
3
>>> ht.hash_function(-18)
2
>>> ht.hash_function(18.5)
3.5
>>> ht.hash_function(0)
0
>>> ht.hash_function(-0)
0
"""
return key % self.size_tableDisjoint set: union by rank, find with path compression
data_structures/disjoint_set/disjoint_set.py:14Disjoint set uses rank to keep trees flat during union and path compression to shorten lookup chains during find.
Disjoint set (union-find) solves the problem of tracking which elements belong to the same group. The key insight is that it does not need to store the full group membership list; it only needs to answer whether two elements share a root. make_set initializes each node as its own root with rank zero. union_set first finds the roots of both nodes, then attaches the lower-rank root under the higher-rank one. This union-by-rank heuristic keeps trees shallow. find_set walks up to the root, and the single assignment on line 47, x.parent = find_set(x.parent), implements path compression: every node on the path is directly wired to the root after the first lookup. Together, union-by-rank and path compression bring the amortised cost per operation to nearly O(1), the inverse-Ackermann function.
Union-by-rank keeps trees short; path compression flattens them further on every find. Together they give near-O(1) amortised operations.
def make_set(x: Node) -> None:
"""
Make x as a set.
"""
# rank is the distance from x to its' parent
# root's rank is 0
x.rank = 0
x.parent = x
def union_set(x: Node, y: Node) -> None:
"""
Union of two sets.
set with bigger rank should be parent, so that the
disjoint set tree will be more flat.
"""
x, y = find_set(x), find_set(y)
if x == y:
return
elif x.rank > y.rank:
y.parent = x
else:
x.parent = y
if x.rank == y.rank:
y.rank += 1
def find_set(x: Node) -> Node:
"""
Return the parent of x
"""
if x != x.parent:
x.parent = find_set(x.parent)
return x.parentTrie: insert and find walk a character-keyed dictionary per level
data_structures/trie/trie.py:9A trie node stores a dictionary of child nodes keyed by character; insert walks and creates nodes per character; find walks and returns False at any missing character.
A trie stores strings by breaking each word into individual characters and sharing common prefixes. Each TrieNode holds a dictionary mapping each character to the next TrieNode, and an is_leaf flag marking whether a complete word ends at this node. Insert walks the character sequence, creating a new node for any character not yet present, and sets is_leaf = True at the final character. Find walks the same path: any missing character returns False immediately, and reaching the end of the word returns is_leaf rather than True, which correctly distinguishes the word car from the prefix ca of card. The docstring notes the O(n squared) space complexity of a basic trie, though in practice tries with shared prefixes are far more compact than storing words individually.
Trie insert and find both walk one dictionary level per character. The is_leaf flag distinguishes a complete word from a shared prefix.
class TrieNode:
def __init__(self) -> None:
self.nodes: dict[str, TrieNode] = {} # Mapping from char to TrieNode
self.is_leaf = False
def insert_many(self, words: list[str]) -> None:
"""
Inserts a list of words into the Trie
:param words: list of string words
:return: None
"""
for word in words:
self.insert(word)
def insert(self, word: str) -> None:
"""
Inserts a word into the Trie
:param word: word to be inserted
:return: None
"""
curr = self
for char in word:
if char not in curr.nodes:
curr.nodes[char] = TrieNode()
curr = curr.nodes[char]
curr.is_leaf = True
def find(self, word: str) -> bool:
"""
Tries to find word in a Trie
:param word: word to look for
:return: Returns True if word is found, False otherwise
"""
curr = self
for char in word:
if char not in curr.nodes:
return False
curr = curr.nodes[char]
return curr.is_leafYou've walked through 7 key areas of the The Algorithms - Python codebase.
Continue: Conversion Algorithms in TheAlgorithms/Python → Browse all projectsCreate code tours for your project
Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.
Install Intraview Free