ReLU Activation Function in Neural Networks
1. Introduction
The Rectified Linear Unit (ReLU) is one of the most widely used activation functions in modern neural networks.
Its primary purpose is to introduce non-linearity into the network while maintaining computational efficiency.
Mathematically, ReLU is defined as:
( f(x) = \max(0, x) )
2. How ReLU Works
- If ( x > 0 ) → ( f(x) = x ) (the value passes through unchanged).
- If ( x \leq 0 ) → ( f(x) = 0 ) (the neuron's output is switched off).
| Input (x) | Output (f(x)) |
|---|---|
| -3 | 0 |
| -0.7 | 0 |
| 0 | 0 |
| 1.2 | 1.2 |
| 5 | 5 |
Graphically, ReLU is flat at zero for all negative inputs and a straight line with slope 1 for positive inputs, giving a hinge shape at the origin.
3. Role of ReLU in Neural Networks
Without activation functions, a neural network behaves as a single linear transformation no matter how many layers it has, because a composition of linear maps is itself linear.
ReLU breaks this collapse and allows the network to learn complex, nonlinear mappings, as the short sketch below illustrates.
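To make this concrete, here is a small NumPy sketch (shapes and values are arbitrary, chosen only for illustration) showing that two stacked linear layers with no activation reduce to one linear map, while inserting ReLU between them breaks that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))    # batch of 4 inputs with 5 features
W1 = rng.normal(size=(5, 8))   # weights of the first "layer"
W2 = rng.normal(size=(8, 3))   # weights of the second "layer"

# Two linear layers with no activation in between...
two_layers = (x @ W1) @ W2
# ...are equivalent to a single linear layer with weights W1 @ W2.
one_layer = x @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))  # True

# Inserting ReLU between the layers breaks this equivalence.
with_relu = np.maximum(0, x @ W1) @ W2
print(np.allclose(with_relu, one_layer))   # False (in general)
```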
Advantages:
- Non-linearity: Enables the network to capture complex patterns.
- Efficient computation: Just a threshold at zero.
- Better gradient flow: mitigates the vanishing gradient problem seen with sigmoid/tanh, since the gradient is exactly 1 for positive inputs.
- Sparse activation: Many outputs are zero, reducing computation and helping regularization.
4. Variants of ReLU
4.1 Leaky ReLU
Instead of outputting zero for negative values, Leaky ReLU allows a small negative slope:
( f(x) = x ) if ( x > 0 ), else ( f(x) = \alpha x )
Where ( \alpha ) is a small constant (e.g., ( \alpha = 0.01 )).
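A minimal NumPy sketch of this definition (the function name and default slope here are only illustrative):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for positive inputs, alpha * x for negative inputs
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, -0.5, 0.0, 1.2])))  # negative values are scaled by alpha
```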
4.2 Parametric ReLU (PReLU)
Similar to Leaky ReLU, but the slope ( a ) is learned during training:
( f(x) = x ) if ( x > 0 ), else ( f(x) = a x )
Where ( a ) is a trainable parameter.
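PyTorch exposes this as nn.PReLU, whose slope is a learnable parameter; a small sketch (0.25 is PyTorch's default initial slope):

```python
import torch
import torch.nn as nn

prelu = nn.PReLU(init=0.25)  # a single learnable slope, initialized to 0.25
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(prelu(x))  # negative inputs are scaled by the learned slope
```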
4.3 Exponential Linear Unit (ELU)
ELU smooths the curve for negative inputs:
( f(x) = x ) if ( x > 0 ), else ( f(x) = \alpha (e^{x} - 1) )
Where ( \alpha ) is a positive constant (commonly ( \alpha = 1 )).
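A minimal NumPy sketch of ELU (the function name is ours; ( \alpha = 1 ) is used as the common default):

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for positive inputs, alpha * (e^x - 1) for negative inputs;
    # exp is taken of min(x, 0) only, to avoid overflow warnings for large positive x
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

print(elu(np.array([-3.0, -0.5, 0.0, 1.2])))
```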
4.4 GELU (Gaussian Error Linear Unit)
A smoother alternative to ReLU, often used in Transformers:
( \text{GELU}(x) = x \cdot \Phi(x) )
Where ( \Phi(x) ) is the standard Gaussian CDF.
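A sketch of this exact formula, using SciPy's error function to compute ( \Phi(x) ); deep learning frameworks also ship it directly (e.g., PyTorch's nn.GELU):

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
    phi = 0.5 * (1.0 + erf(x / np.sqrt(2.0)))
    return x * phi

print(gelu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
```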
5. Python Code Examples
5.1 Basic ReLU Implementation
```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Example
x = np.array([-3, -0.5, 0, 1, 2])
print(relu(x))  # Output: [0. 0. 0. 1. 2.]
```
5.2 Using ReLU in PyTorch
```python
import torch
import torch.nn as nn

# Example: single linear layer followed by ReLU
model = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU()
)

x = torch.tensor([[-1.0, 0.5, 2.0, -0.3, 4.0]])
output = model(x)
print(output)
```
5.3 Using Leaky ReLU in Keras
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(Dense(64, input_dim=10))
model.add(LeakyReLU(alpha=0.01))
model.add(Dense(1, activation='sigmoid'))
```
6. Limitations of ReLU
- Dying ReLU Problem: if a neuron's pre-activation becomes negative for every input, its gradient is zero, so the neuron can get permanently stuck outputting 0 and stops learning.
- No activation for negative inputs can limit representational power.
- Possible solutions:
- Use Leaky ReLU, PReLU, or ELU.
- Careful weight initialization (e.g., He initialization; see the sketch below).
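As a rough sketch of these mitigations in PyTorch (layer sizes are arbitrary), the snippet below combines Leaky ReLU with He/Kaiming initialization:

```python
import torch.nn as nn

layer = nn.Linear(128, 64)
# He/Kaiming initialization, matched to the Leaky ReLU slope
nn.init.kaiming_normal_(layer.weight, a=0.01, nonlinearity='leaky_relu')
nn.init.zeros_(layer.bias)

model = nn.Sequential(
    layer,
    nn.LeakyReLU(negative_slope=0.01),  # negative inputs keep a small gradient
)
```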
7. Comparison of Activation Functions
| Activation | Formula | Pros | Cons |
|---|---|---|---|
| ReLU | ( \max(0, x) ) | Fast, simple, avoids vanishing gradient | Dying ReLU |
| Leaky ReLU | ( x ) if ( x > 0 ), else ( \alpha x ) | Fixes dying neurons | Slightly more compute |
| PReLU | ( x ) if ( x > 0 ), else ( a x ) (learnable ( a )) | Flexible | Risk of overfitting |
| ELU | ( x ) if ( x > 0 ), else ( \alpha (e^{x} - 1) ) | Smooth negative side, better mean activations | Slightly slower |
| GELU | ( x \cdot \Phi(x) ) | Smooth, Gaussian-based; used in Transformers | More complex to compute |
