Neural Networks in TheAlgorithms/Python
Five files tracing how a network learns: from a single-neuron forward pass to backpropagation, multi-layer weight matrices, and the activation functions that make it possible.
What you will learn
- How a single-neuron forward pass iterates a weight update using the sigmoid derivative as an error scaling factor
- How the DenseLayer class stores weights and bias as NumPy matrices and applies them during forward propagation
- How back_propagation computes gradients with respect to the weights and bias and updates them in place
- How TwoHiddenLayerNeuralNetwork wires three weight matrices to connect the input layer, two hidden layers, and one output layer
- How ReLU clips negative activations to zero using a single NumPy maximum call
- How the Swish activation multiplies the input by its own sigmoid, producing a smooth non-monotonic function
Prerequisites
- Comfort with NumPy matrix operations and Python classes
- Basic understanding of what a loss function and a learning rate are
simple_neural_network.py: one neuron, one weight, forward-propagate for a fixed number of iterations
neural_network/simple_neural_network.py:28
A single-neuron network updates its weight by multiplying the prediction error by the sigmoid derivative, then stepping the weight in the corrective direction.
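This file reduces a neural network to its smallest readable form: one neuron, one weight, no hidden layers. The weight starts as a random odd integer between 1 and 199 inclusive (2 * randint(1, 100) - 1), cast to a float.

Each iteration runs three steps before updating the weight:

- Forward pass. layer_1 is the sigmoid of the weight multiplied by INITIAL_VALUE, which plays the role of the single input.
- Error. layer_1_error measures how far the output is from the target. The target is divided by 100 because a sigmoid only produces values in (0, 1), and the final output is multiplied by 100 on return.
- Delta. layer_1_delta multiplies that error by the sigmoid derivative. The call sigmoid_function(layer_1, True) receives the sigmoid output itself, so the derivative takes the output form layer_1 * (1 - layer_1). This shrinks the correction step when the output sits in the flat tails of the curve.

The weight then moves by that delta multiplied by INITIAL_VALUE, which is the derivative of the pre-activation with respect to the weight. Each update is therefore a gradient-descent step on the squared error (target - output)^2 / 2, with an implicit learning rate of 1.

After 450,000 iterations, the output reliably falls in the range (31, 33) when the expected value is 32. After only 1,000 iterations it does not, as the second doctest confirms.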
A single-neuron network needs four statements per iteration: a forward pass through the sigmoid, an error computation, a delta scaled by the sigmoid derivative, and a weight update.
def forward_propagation(expected: int, number_propagations: int) -> float:
    """Return the value found after the forward propagation training.
    >>> res = forward_propagation(32, 450_000)  # Was 10_000_000
    >>> res > 31 and res < 33
    True
    >>> res = forward_propagation(32, 1000)
    >>> res > 31 and res < 33
    False
    """
    # Random weight
    weight = float(2 * (random.randint(1, 100)) - 1)
    for _ in range(number_propagations):
        # Forward propagation
        layer_1 = sigmoid_function(INITIAL_VALUE * weight)
        # How much did we miss?
        layer_1_error = (expected / 100) - layer_1
        # Error delta
        layer_1_delta = layer_1_error * sigmoid_function(layer_1, True)
        # Update weight
        weight += INITIAL_VALUE * layer_1_delta
    return layer_1 * 100
DenseLayer: forward propagation applies the stored weight matrix and bias
neural_network/back_propagation_neural_network.py:69
A DenseLayer's forward pass is a matrix multiply of the weight matrix with the input, minus bias, passed through the activation function.
The back-propagation framework in this file organizes a neural network as a list of DenseLayer objects. Each layer owns a weight matrix initialized with small random values and a bias column vector with one entry per unit. The input layer is a special case: it passes data through unchanged.

Every other layer computes np.dot(self.weight, self.xdata) - self.bias. This is the affine transformation that maps the previous layer's output (a column vector) to this layer's pre-activation values. The bias is subtracted rather than added, so it acts as a threshold. This is only a sign convention, since a learned bias can take either sign.

The result is passed to self.activation, which defaults to sigmoid when no activation is given at construction. Both self.wx_plus_b and self.output are cached because the backward pass needs them to compute gradients. The BPNN class's train method chains calls to forward_propagation across all layers before computing the loss.
A DenseLayer's forward pass is one line of math: np.dot(weight, input) - bias, then apply the activation function. Both intermediate values are cached for the backward pass.
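To see the shapes involved, here is a hypothetical, stripped-down layer. MiniDenseLayer is not a repository class. It follows the same convention:

- column-vector inputs
- weights shaped (units, previous_units)
- a per-unit bias column that is subtracted
- sigmoid activation
- cached intermediates

The normal(0, 0.5) initialization is an assumption for illustration. The loop at the end chains layers the way the text describes train doing, for a single sample.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))


class MiniDenseLayer:
    """Hypothetical layer mirroring the forward pass in the excerpt."""

    def __init__(self, units: int, back_units: int, rng: np.random.Generator) -> None:
        # One row of weights per unit; one bias entry per unit (column vector).
        self.weight = rng.normal(0, 0.5, (units, back_units))
        self.bias = rng.normal(0, 0.5, (units, 1))
        self.activation = sigmoid

    def forward(self, xdata: np.ndarray) -> np.ndarray:
        self.xdata = xdata
        # Same sign convention as the excerpt: the bias is subtracted.
        self.wx_plus_b = self.weight @ xdata - self.bias
        self.output = self.activation(self.wx_plus_b)
        return self.output


rng = np.random.default_rng(0)
layers = [MiniDenseLayer(5, 3, rng), MiniDenseLayer(2, 5, rng)]
x = rng.normal(size=(3, 1))  # one sample as a column vector with 3 features
for layer in layers:  # feed each layer's output into the next
    x = layer.forward(x)
print(x.shape)  # (2, 1)
```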
def forward_propagation(self, xdata):
    self.xdata = xdata
    if self.is_input_layer:
        # input layer
        self.wx_plus_b = xdata
        self.output = xdata
        return xdata
    else:
        self.wx_plus_b = np.dot(self.weight, self.xdata) - self.bias
        self.output = self.activation(self.wx_plus_b)
        return self.output
TwoHiddenLayerNeuralNetwork: three weight matrices connect four layers
neural_network/two_hidden_layers_neural_network.py:11
Three weight matrices are initialized with shapes that encode the node counts at each layer transition: input-to-hidden-1, hidden-1-to-hidden-2, hidden-2-to-output.
The weight matrix shapes in this constructor encode the entire network topology:

- input_layer_and_first_hidden_layer_weights is shaped (input_features, 4), so the first hidden layer has 4 nodes regardless of how many input features the data has.
- first_hidden_layer_and_second_hidden_layer_weights is (4, 3), connecting 4 nodes to 3.
- second_hidden_layer_and_output_layer_weights is (3, 1), collapsing 3 hidden nodes to a single output.

Reading these three shapes in order tells you the full architecture without running the code.

rng.random draws each weight uniformly from [0, 1). Random initialization breaks symmetry: if every weight started at the same value, all nodes in a layer would compute identical outputs, receive identical gradients, and learn the same feature. The predicted_output field, set further down the same constructor, starts as an array of zeros with the same shape as output_array.
The three weight matrix shapes directly encode the network topology; reading their dimensions tells you the node counts at every layer without running the code.
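A quick shape trace makes this concrete. It assumes samples are stacked as rows, which the (input_features, 4) shape implies. Using sigmoid at every layer, the data, and the names w1, w2 and w3 are all illustrative. The sketch also checks the symmetry argument: with a constant first weight matrix, every unit in the first hidden layer produces the same column.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))


rng = np.random.default_rng(0)
inputs = rng.random((8, 3))  # hypothetical data: 8 samples, 3 features

# Same shapes as the constructor: (features, 4), (4, 3), (3, 1).
w1 = rng.random((inputs.shape[1], 4))
w2 = rng.random((4, 3))
w3 = rng.random((3, 1))

hidden_1 = sigmoid(inputs @ w1)  # (8, 4)
hidden_2 = sigmoid(hidden_1 @ w2)  # (8, 3)
prediction = sigmoid(hidden_2 @ w3)  # (8, 1), one value per sample
print(hidden_1.shape, hidden_2.shape, prediction.shape)  # (8, 4) (8, 3) (8, 1)

# Symmetry check: with identical weights, all 4 hidden units compute the same thing.
same = sigmoid(inputs @ np.full((3, 4), 0.5))
print(np.allclose(same, same[:, :1]))  # True
```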
class TwoHiddenLayerNeuralNetwork:
    def __init__(self, input_array: np.ndarray, output_array: np.ndarray) -> None:
        """
        This function initializes the TwoHiddenLayerNeuralNetwork class with random
        weights for every layer and initializes predicted output with zeroes.
        input_array : input values for training the neural network (i.e training data) .
        output_array : expected output values of the given inputs.
        """
        # Input values provided for training the model.
        self.input_array = input_array
        # Random initial weights are assigned where first argument is the
        # number of nodes in previous layer and second argument is the
        # number of nodes in the next layer.
        # Random initial weights are assigned.
        # self.input_array.shape[1] is used to represent number of nodes in input layer.
        # First hidden layer consists of 4 nodes.
        rng = np.random.default_rng()
        self.input_layer_and_first_hidden_layer_weights = rng.random(
            (self.input_array.shape[1], 4)
        )
        # Random initial values for the first hidden layer.
        # First hidden layer has 4 nodes.
        # Second hidden layer has 3 nodes.
        self.first_hidden_layer_and_second_hidden_layer_weights = rng.random((4, 3))
        # Random initial values for the second hidden layer.
        # Second hidden layer has 3 nodes.
        # Output layer has 1 node.
        self.second_hidden_layer_and_output_layer_weights = rng.random((3, 1))
ReLU: clip negative activations to zero with a single NumPy call
neural_network/activation_functions/rectified_linear_unit.py:18
ReLU maps negative inputs to zero and leaves positive inputs unchanged, introducing nonlinearity without saturating for positive inputs.
ReLU (Rectified Linear Unit) became the dominant activation function in deep networks largely because it mitigates the vanishing gradient problem that plagues sigmoid and tanh. When a sigmoid's output is near 0 or near 1, its derivative is nearly zero. Even at its peak (input 0), the sigmoid derivative is only 0.25, so gradients multiplied through many sigmoid layers shrink toward nothing. ReLU's derivative for positive inputs is exactly 1, so gradients pass through active units without shrinking.

The tradeoff is that a unit whose input is always negative always has a zero derivative and stops learning. This is the so-called dead neuron problem.

The implementation is one line: np.maximum(0, vector) takes the element-wise maximum of 0 and each input, zeroing negatives and leaving positives intact. The doctest confirms that input [-1, 0, 5] returns [0, 0, 5].
ReLU is np.maximum(0, vector): one line that zeroes negative inputs and leaves positives unchanged, letting gradients flow through active units without saturation.
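Here is a small sketch of ReLU next to its derivative, together with the depth argument above. relu_derivative and sigmoid_derivative are hypothetical helpers, not functions from the file. Treating ReLU's derivative at exactly 0 as 0 is a convention assumed here.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0, x)


def relu_derivative(x: np.ndarray) -> np.ndarray:
    # Assumed convention: the (undefined) derivative at exactly 0 is taken as 0.
    return (x > 0).astype(float)


def sigmoid_derivative(x: np.ndarray) -> np.ndarray:
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)


x = np.array([-1.0, 0.0, 5.0])
print(relu(x))  # [0. 0. 5.]
print(relu_derivative(x))  # [0. 0. 1.]

# Product of per-layer derivatives across 10 layers, at each function's best case.
depth = 10
print(sigmoid_derivative(np.zeros(depth)).prod())  # 0.25 ** 10, about 9.5e-07
print(relu_derivative(np.full(depth, 2.0)).prod())  # 1.0
```

A single negative input contributes a factor of 0 to the ReLU product. That is the dead-neuron tradeoff in miniature.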
def relu(vector: list[float]):
    """
    Implements the relu function
    Parameters:
        vector (np.array,list,tuple): A numpy array of shape (1,n)
        consisting of real values or a similar list,tuple
    Returns:
        relu_vec (np.array): The input numpy array, after applying
        relu.
    >>> vec = np.array([-1, 0, 5])
    >>> relu(vec)
    array([0, 0, 5])
    """
    # compare two arrays and then return element-wise maxima.
    return np.maximum(0, vector)
Swish: multiply input by its own sigmoid to get a smooth non-monotonic activation
neural_network/activation_functions/swish.py:33
Swish multiplies each input by its own sigmoid value, producing a smooth function that allows small negative outputs for negative inputs.
ReLU is piecewise linear and non-differentiable at zero, and that kink can cause optimization difficulties in some architectures. Swish, introduced in a 2017 Google Brain paper linked in the module docstring, addresses this by returning vector times sigmoid(vector).

For large positive inputs, sigmoid approaches 1, so Swish is approximately linear. For large negative inputs, sigmoid approaches 0, so Swish approaches 0. In between, Swish dips below zero: it is about -0.269 at input -1 (matching the first doctest) and reaches its minimum of about -0.278 near input -1.28. That dip is what makes the function non-monotonic, and it gives the network a richer signal than ReLU's hard zero clamp.

The file also defines a parameterized swish function that multiplies the input by sigmoid(beta times vector), where beta is a trainable scaling parameter. This generalization is reported to further improve performance on some tasks. Both variants are implemented and doctested in the file.
Swish is vector times sigmoid(vector): one multiplication that produces a smooth, shallow negative trough for moderately negative inputs instead of ReLU's hard clamp.
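To locate the trough numerically, the sketch below evaluates x * sigmoid(x) on a fine grid and takes the minimum. silu is a local stand-in written for this example, not the file's function. The grid search is an illustrative shortcut, not an exact solver.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))


def silu(x: np.ndarray) -> np.ndarray:
    # Swish / SiLU: the input gated by its own sigmoid.
    return x * sigmoid(x)


xs = np.linspace(-5, 5, 100_001)  # grid with step 1e-4
ys = silu(xs)
i = np.argmin(ys)
print(f"{xs[i]:.2f} {ys[i]:.2f}")  # -1.28 -0.28
print(silu(np.array([-1.0])))  # [-0.26894142]
```

The minimum sits where the derivative s(x) + x * s(x) * (1 - s(x)) vanishes, that is where x = -1 / (1 - s(x)). That equation has no closed-form solution, which is why a numeric search is used here.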
def sigmoid_linear_unit(vector: np.ndarray) -> np.ndarray:
    """
    Implements the Sigmoid Linear Unit (SiLU) or swish function
    Parameters:
        vector (np.ndarray): A numpy array consisting of real values
    Returns:
        swish_vec (np.ndarray): The input numpy array, after applying swish
    Examples:
    >>> sigmoid_linear_unit(np.array([-1.0, 1.0, 2.0]))
    array([-0.26894142,  0.73105858,  1.76159416])
    >>> sigmoid_linear_unit(np.array([-2]))
    array([-0.23840584])
    """
    return vector * sigmoid(vector)
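The parameterized variant described above can be sketched the same way. This minimal sketch assumes beta is passed as a plain float. The function and argument names are illustrative and may not match the file's, and nothing here trains beta.

The sketch shows the two limits that make beta interesting:

- beta = 0 turns the gate into a constant 0.5, giving a scaled linear function.
- A large beta pushes the gate toward a step, so the output approaches ReLU.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))


def swish(x: np.ndarray, beta: float) -> np.ndarray:
    # beta scales only the input to the sigmoid gate, not the outer factor.
    return x * sigmoid(beta * x)


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(swish(x, 1.0))  # identical to SiLU
print(swish(x, 0.0))  # sigmoid(0) = 0.5 everywhere, so this is x / 2
print(swish(x, 50.0))  # gate is nearly a step function: close to ReLU
```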