From Perceptrons to Deep Networks
Neural networks are loosely inspired by the structure of the human brain: interconnected nodes (neurons) that pass signals to one another. Despite the biological metaphor, they are essentially parameterized mathematical functions that learn to approximate complex relationships in data.
A Minimal Neural Network with Python
Let's build a simple neural network from scratch using NumPy. This will help demystify what happens beneath the abstraction layers of deep learning frameworks.
import numpy as np


class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights randomly and biases to zero
        self.W1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def relu(self, x):
        return np.maximum(0, x)

    def softmax(self, x):
        # Subtract the row-wise max before exponentiating for numerical stability
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / exp_x.sum(axis=1, keepdims=True)

    def forward(self, X):
        # Cache intermediate values; backward() reuses them
        self.z1 = X @ self.W1 + self.b1
        self.a1 = self.relu(self.z1)
        self.z2 = self.a1 @ self.W2 + self.b2
        self.output = self.softmax(self.z2)
        return self.output

    def backward(self, X, y, output, lr=0.01):
        # y must be one-hot encoded, with the same shape as output
        m = X.shape[0]
        dz2 = output - y  # Gradient of softmax + cross-entropy
        dW2 = (self.a1.T @ dz2) / m
        db2 = dz2.sum(axis=0, keepdims=True) / m
        # Propagate through W2, then through ReLU (derivative is 1 where z1 > 0, else 0)
        dz1 = (dz2 @ self.W2.T) * (self.z1 > 0)
        dW1 = (X.T @ dz1) / m
        db1 = dz1.sum(axis=0, keepdims=True) / m

        # Gradient descent step
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
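The class defines a forward and a backward pass but no training loop. Below is a minimal sketch of one way to drive it. The helper names `one_hot`, `cross_entropy`, and `train` are hypothetical (they are not part of the class above), and the loop assumes full-batch gradient descent on one-hot targets.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Convert integer labels of shape (m,) to one-hot rows of shape (m, num_classes)."""
    encoded = np.zeros((labels.shape[0], num_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


def cross_entropy(probs, y_onehot, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    # eps guards against log(0) when a predicted probability underflows to zero
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))


def train(model, X, y_onehot, epochs=100, lr=0.1):
    """Full-batch gradient descent; returns the loss measured before each update.

    `model` is any object with forward(X) and backward(X, y, output, lr) methods.
    """
    losses = []
    for _ in range(epochs):
        output = model.forward(X)
        losses.append(cross_entropy(output, y_onehot))
        model.backward(X, y_onehot, output, lr=lr)
    return losses
```

With the class above, a call might look like `losses = train(SimpleNeuralNetwork(2, 8, 2), X, one_hot(labels, 2))`, where `X` and `labels` are your own arrays. If training is working, the recorded losses should trend downward.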
Understanding the Flow
Forward Pass
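Data flows through the network layer by layer. Each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function. Popular choices include ReLU, sigmoid, and tanh. In the network above, the hidden layer uses ReLU and the output layer uses softmax to turn raw scores into class probabilities.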
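To make the weighted-sum-plus-bias step concrete, here is a small worked example for one layer with two neurons. The function name `dense_forward` and all of the numbers are made up for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dense_forward(x, W, b, activation):
    """One layer: weighted sum of inputs, plus bias, then the activation."""
    z = x @ W + b
    return z, activation(z)


x = np.array([[1.0, -2.0, 0.5]])        # one sample with 3 features
W = np.array([[0.2, -0.5],
              [0.4,  0.1],
              [-0.6, 0.3]])              # 3 inputs feeding 2 neurons
b = np.array([[0.1, 1.0]])               # one bias per neuron

z, a_relu = dense_forward(x, W, b, relu)
_, a_sigmoid = dense_forward(x, W, b, sigmoid)
_, a_tanh = dense_forward(x, W, b, np.tanh)

print(z)       # pre-activations, approximately [[-0.8, 0.45]]
print(a_relu)  # ReLU zeroes the negative entry, approximately [[0.0, 0.45]]
```

The same pre-activation `z` gives different outputs depending on the activation: ReLU clips negatives to zero, sigmoid squashes values into (0, 1), and tanh into (-1, 1).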
Backward Pass (Backpropagation)
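The network calculates how much each weight contributed to the error, then nudges each weight in the direction opposite to its gradient, scaled by the learning rate. The gradients come from the chain rule of calculus, applied layer by layer from the output back to the input. This is what makes training deep networks tractable.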
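A common way to build trust in a hand-derived gradient is to compare it against finite differences. The sketch below checks the claim used in `backward` that, for softmax followed by cross-entropy, the gradient with respect to the pre-activations is `output - y` (divided by the batch size when the loss is averaged). The function names and the random test data are illustrative assumptions.

```python
import numpy as np


def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / exp_z.sum(axis=1, keepdims=True)


def loss(z, y_onehot):
    """Mean cross-entropy of softmax(z) against one-hot targets."""
    return -np.mean(np.sum(y_onehot * np.log(softmax(z)), axis=1))


def analytic_grad(z, y_onehot):
    """Closed-form gradient of the mean loss with respect to z."""
    return (softmax(z) - y_onehot) / z.shape[0]


def numeric_grad(z, y_onehot, h=1e-6):
    """Central finite differences, perturbing one entry of z at a time."""
    grad = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[idx] += h
        z_minus[idx] -= h
        grad[idx] = (loss(z_plus, y_onehot) - loss(z_minus, y_onehot)) / (2 * h)
    return grad


rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))            # 4 samples, 3 classes
y = np.eye(3)[[0, 2, 1, 2]]            # one-hot targets for classes 0, 2, 1, 2

max_diff = np.max(np.abs(analytic_grad(z, y) - numeric_grad(z, y)))
print(max_diff)  # should be tiny if the analytic gradient is right
```

The same check can be extended to `dW1`, `dW2`, and the biases by perturbing those parameters instead of `z`.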
Using a Framework: TensorFlow/Keras
Once you understand the mechanics, frameworks like TensorFlow make it much easier to scale up:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# X_train, y_train, X_test, y_test are assumed to be loaded already:
# inputs of shape (n, 784) and integer class labels in the range 0-9
model.fit(X_train, y_train, epochs=10, batch_size=32,
          validation_data=(X_test, y_test))
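The Keras snippet expects inputs with 784 features per sample. That shape suggests 28x28 grayscale images such as MNIST, though the article does not say which dataset is used, so treat that as an assumption. Under that assumption, a minimal preprocessing sketch (the helper name `prepare_images` is hypothetical) could look like this:

```python
import numpy as np


def prepare_images(images):
    """Flatten (n, 28, 28) uint8 images to (n, 784) float32 values in [0, 1]."""
    flat = images.reshape(images.shape[0], -1)   # (n, 28, 28) -> (n, 784)
    return flat.astype("float32") / 255.0        # scale pixel values from 0-255 to 0-1


# Stand-in data with the assumed shape; replace with real images
fake_images = np.random.randint(0, 256, size=(8, 28, 28), dtype=np.uint8)
X_demo = prepare_images(fake_images)
print(X_demo.shape, X_demo.dtype)
```

One way to obtain real arrays of this shape is `tf.keras.datasets.mnist.load_data()`, which returns `(x_train, y_train), (x_test, y_test)` with integer labels, matching the `sparse_categorical_crossentropy` loss used above.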
Key Takeaways
- Neural networks learn by adjusting weights through backpropagation
- Activation functions introduce non-linearity, enabling the network to model complex patterns
- Frameworks abstract away the math, but understanding it builds intuition
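The point about non-linearity can be checked directly. Without an activation function, two stacked linear layers are equivalent to a single linear layer, so depth adds nothing. The sketch below uses random weights purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W1, b1 = rng.normal(size=(4, 6)), rng.normal(size=(1, 6))
W2, b2 = rng.normal(size=(6, 3)), rng.normal(size=(1, 3))

# Two linear layers with no activation in between
two_linear = (X @ W1 + b1) @ W2 + b2

# The equivalent single layer: multiply the weights, fold the biases together
W = W1 @ W2
b = b1 @ W2 + b2
one_linear = X @ W + b

collapses = np.allclose(two_linear, one_linear)

# Inserting ReLU between the layers breaks the equivalence
with_relu = np.maximum(0, X @ W1 + b1) @ W2 + b2
differs = not np.allclose(with_relu, one_linear)

print(collapses, differs)
```

The first comparison holds for any weights, since it is just matrix algebra. The second depends on at least one hidden pre-activation being negative, which is the typical case.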
Conclusion
Building a neural network from scratch reveals the elegance of the process. The math is straightforward, but the results can be extraordinary. From there, stacking more layers, adding regularization, and tuning hyperparameters unlocks the full power of deep learning. Your first network is just the beginning.