ReLU Activation Function in Neural Networks

1. Introduction

The Rectified Linear Unit (ReLU) is one of the most widely used activation functions in modern neural networks.
Its primary purpose is to introduce non-linearity into the network while maintaining computational efficiency.

Mathematically, ReLU is defined as:

f(x) = \max(0, x)

2. How ReLU Works

  • If x > 0, then f(x) = x (the input passes through unchanged).
  • If x ≤ 0, then f(x) = 0 (the neuron is inactive).
Input (x)    Output f(x)
-3           0
-0.7         0
0            0
1.2          1.2
5            5

Equivalently, ReLU can be written piecewise:

f(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ x & \text{if } x > 0 \end{cases}

3. Role of ReLU in Neural Networks

Without activation functions, a neural network collapses to a single linear transformation of its input, no matter how many layers it has.
ReLU introduces the non-linearity that lets the network learn complex, nonlinear mappings.
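
To see this concretely, here is a minimal NumPy sketch (the matrix shapes are arbitrary): two stacked linear layers collapse into a single linear map, while placing ReLU between them breaks the equivalence.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

# Two stacked linear layers are equivalent to one linear map W2 @ W1.
two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True

# Inserting ReLU between the layers makes the composition nonlinear.
nonlinear = W2 @ np.maximum(0, W1 @ x)
print(np.allclose(nonlinear, one_layer))   # False in general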

Advantages:

  1. Non-linearity: Enables the network to capture complex patterns.
  2. Efficient computation: Just a threshold at zero.
  3. Better gradient flow: the gradient is 1 for all positive inputs, so ReLU avoids the saturation-driven vanishing gradients of sigmoid and tanh.
  4. Sparse activation: many outputs are exactly zero, which reduces computation and acts as a mild regularizer (see the sketch below).
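
As a rough sketch of points 3 and 4: the (sub)gradient of ReLU is 1 for positive inputs and 0 otherwise, and on zero-mean inputs roughly half of the activations come out exactly zero.

import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Subgradient of ReLU: 1 where the input is positive, 0 elsewhere.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]

# Sparsity: for zero-mean random inputs, about half of the outputs are zero.
z = np.random.default_rng(0).normal(size=10_000)
print(np.mean(relu(z) == 0))  # roughly 0.5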

4. Variants of ReLU

4.1 Leaky ReLU

Instead of outputting zero for negative values, Leaky ReLU allows a small slope:

f(x) = \begin{cases} \alpha x & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}

where α is a small constant (e.g., α = 0.01).
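
A minimal NumPy sketch of Leaky ReLU, assuming the common default α = 0.01:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Positive values pass through; negative values are scaled by alpha.
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 1.0, 2.0])
print(leaky_relu(x))  # -3 -> -0.03, -0.5 -> -0.005; positives unchanged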


4.2 Parametric ReLU (PReLU)

Similar to Leaky ReLU, but ( \alpha ) is learned during training:

f(x) = \begin{cases} a x & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}

where a is a trainable parameter.
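
In PyTorch this is available directly as nn.PReLU, whose negative slope is a learnable parameter updated by the optimizer; a minimal sketch:

import torch
import torch.nn as nn

# The negative slope "a" is a trainable parameter (default initial value 0.25).
prelu = nn.PReLU(num_parameters=1, init=0.25)

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])
print(prelu(x))                  # negative inputs scaled by the current slope (0.25 here)
print(list(prelu.parameters()))  # the slope appears in parameters(), so it gets trained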


4.3 Exponential Linear Unit (ELU)

ELU smooths the curve for negative inputs:

f(x) = \begin{cases} \alpha (e^x - 1) & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}
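
A minimal NumPy sketch of ELU, assuming the common default α = 1.0:

import numpy as np

def elu(x, alpha=1.0):
    # Smoothly saturates toward -alpha for large negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-3.0, -0.5, 0.0, 1.0, 2.0])
print(elu(x))  # approx [-0.95, -0.39, 0., 1., 2.]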

4.4 GELU (Gaussian Error Linear Unit)

A smoother alternative to ReLU, often used in Transformers:

\text{GELU}(x) = x \cdot \Phi(x)

where Φ(x) is the standard Gaussian CDF.
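
A minimal NumPy sketch of the exact GELU, writing Φ in terms of the error function (frameworks often use a faster tanh-based approximation instead):

import numpy as np
from math import erf, sqrt

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF expressed via erf.
    phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
    return x * phi(x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(gelu(x))  # approx [-0.05, -0.15, 0., 0.35, 1.95]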


5. Python Code Examples

5.1 Basic ReLU Implementation

import numpy as np

def relu(x):
    return np.maximum(0, x)

# Example
x = np.array([-3, -0.5, 0, 1, 2])
print(relu(x))  # Output: [0.  0.  0.  1.  2.]

5.2 Using ReLU in PyTorch

import torch
import torch.nn as nn

# Example: single layer with ReLU
model = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU()
)

x = torch.tensor([[-1.0, 0.5, 2.0, -0.3, 4.0]])
output = model(x)
print(output)

5.3 Using Leaky ReLU in Keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(Dense(64, input_dim=10))
model.add(LeakyReLU(alpha=0.01))
model.add(Dense(1, activation='sigmoid'))

6. Limitations of ReLU

  • Dying ReLU problem: a neuron whose pre-activation is negative for every input outputs 0, receives zero gradient, and can stop learning entirely.
  • No activation for negative inputs can limit representational power.
  • Possible mitigations (a sketch follows this list):
  1. Use Leaky ReLU, PReLU, or ELU.
  2. Use careful weight initialization (e.g., He/Kaiming initialization).
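
As a sketch of the second mitigation, He (Kaiming) initialization, shown here in PyTorch, scales the initial weights for ReLU-shaped nonlinearities so that pre-activations are less likely to end up always negative.

import torch
import torch.nn as nn

layer = nn.Linear(256, 128)

# He / Kaiming initialization is designed for ReLU-like activations.
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
nn.init.zeros_(layer.bias)

model = nn.Sequential(layer, nn.ReLU())
x = torch.randn(32, 256)
print((model(x) > 0).float().mean())  # fraction of active units, roughly 0.5 here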

7. Summary of ReLU Variants

Activation  | Formula                           | Pros                                           | Cons
ReLU        | max(0, x)                         | Fast, simple, avoids vanishing gradients       | Dying ReLU
Leaky ReLU  | x if x ≥ 0, else αx               | Fixes dying neurons                            | Slightly more compute
PReLU       | x if x ≥ 0, else ax (a learnable) | Flexible                                       | Risk of overfitting
ELU         | x if x ≥ 0, else α(e^x − 1)       | Smooth negative side, better mean activations  | Slightly slower
GELU        | x · Φ(x)                          | Smooth, Gaussian-based; used in Transformers   | More complex to compute
