Residual Network, Skip Connection Network
Residual networks (ResNets) utilize skip connections to enhance the training of deep neural
networks, allowing for better gradient flow and mitigating issues like vanishing gradients.
What are Residual Networks?
Residual Networks, or ResNets, are a type of deep learning architecture designed to facilitate
the training of very deep neural networks. They were introduced to address the degradation
problem, where adding more layers to a network does not necessarily improve performance
and can even worsen it. ResNets achieve this by incorporating skip connections, which allow
the model to learn residual mappings instead of direct transformations.
Architecture of Residual Networks
A typical residual block consists of a few stacked layers (e.g., convolutional layers) and a
shortcut connection that performs an element-wise addition with the output of these layers.
The output of a residual block can be mathematically represented as:
y = F(x) + x
where F(x) is the output of the stacked layers and x is the input to the block. Because the
shortcut is an identity mapping, it introduces no extra parameters and costs only one
element-wise addition, which is negligible next to the convolutions.
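The formula above can be sketched in a few lines of plain Python. This is only an illustration: the helper names (relu, linear, residual_block) and the weight values are made up for the example, and a matrix-vector product stands in for the convolutional layers.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Matrix-vector product; a stand-in for a convolutional layer (assumption)."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Compute y = F(x) + x, where F is two stacked layers."""
    f_x = linear(w2, relu(linear(w1, x)))      # F(x): output of the stacked layers
    return [f + xi for f, xi in zip(f_x, x)]   # shortcut: element-wise addition


# Illustrative values only.
x = [1.0, -2.0, 3.0]
w1 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = residual_block(x, w1, w2)
print(y)  # [1.5, -2.0, 4.5]
```

Note that the shortcut itself has no weights; all learnable parameters sit inside F.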
Applications and Impact
ResNets have revolutionized deep learning, particularly in computer vision tasks such as
image classification and object detection. The architecture allows for the construction of
networks with hundreds or even thousands of layers, achieving state-of-the-art results on
benchmark datasets like ImageNet.
In summary, residual networks with skip connections are a powerful architectural pattern
that enhances the training of deep neural networks by improving gradient flow, simplifying
learning, and preserving information across layers. This has made them a foundational
component in modern deep learning applications.
1.1 Introduction to Deep Networks Deep learning involves stacking multiple layers to learn
complex features. However, as networks become deeper, they encounter a significant hurdle
known as the Vanishing Gradient Problem. This problem makes it increasingly difficult to
train very deep architectures because the model stops learning effectively.
1.2 Mathematical Root of the Problem The issue arises during the backpropagation phase
of training.
To update weights, the network calculates the derivative of the loss function with
respect to each weight using the chain rule.
This calculation involves the multiplication of derivatives across many layers:
dLoss/dW = Deriv_1 × Deriv_2 × ⋯ × Deriv_n
If these derivatives are small (e.g., values between 0 and 1), multiplying them
repeatedly results in an exponentially smaller value.
Eventually, the gradient becomes so small (close to zero) that the weights are barely
updated, stalling the training process.
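A quick numeric check makes the shrinkage concrete. The sketch below assumes every layer contributes the same derivative, 0.25 (the largest value the sigmoid derivative can take); real networks have varying derivatives, so this is an illustration rather than a model of any specific network.

```python
def chain_rule_product(derivatives):
    """Multiply per-layer derivatives, as the chain rule does in backpropagation."""
    product = 1.0
    for d in derivatives:
        product *= d
    return product


# Assumption: each layer contributes a derivative of 0.25.
for num_layers in (5, 20, 50):
    gradient = chain_rule_product([0.25] * num_layers)
    print(f"{num_layers:>2} layers -> gradient factor {gradient:.3e}")
```

With 50 such layers the factor drops below 1e-30, so a weight update scaled by it is effectively zero.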
1.3 Impact on Performance When the gradient vanishes, the network cannot converge to an
optimal solution, leading to poor accuracy despite the increased depth. ResNet was
specifically designed to allow gradients to flow through very deep networks without
disappearing.
The ResNet Solution — Residual Learning
2.1 The Concept of Skip Connections The core innovation of ResNet is the Skip Connection
(or shortcut connection). Instead of every layer directly learning the desired underlying
mapping, ResNet allows the input to "skip" one or more layers and be added directly to the
output of those layers.
2.2 The Residual Function In a traditional network, the layers try to learn a function H(x). In
ResNet, we reformulate this:
Let x be the input to a set of layers.
Let F(x) be the residual function learned by those layers.
The final output y is defined as: y=F(x)+x.
This means the layers only have to learn the difference (the "residual") between the desired
output and the input: F(x)=y−x, where y plays the role of the mapping H(x).
2.3 Intuition Behind Residuals The logic is that it is easier for a network to learn to drive a
residual to zero than to learn an identity mapping from scratch. If the network determines
that the current layer isn't adding value, it can simply learn to make F(x) = 0, allowing the
input to pass through unchanged (the identity), which preserves the performance of the
shallower model.
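The difference between the two formulations shows up clearly when the weights are all zero. In the sketch below, plain_block and residual_block are hypothetical helpers with a single linear layer standing in for the stacked layers; the point is only that F(x) = 0 gives the identity in the residual case and destroys the signal in the plain case.

```python
def plain_block(x, weights):
    """A plain layer: the output is whatever the weights produce, H(x)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def residual_block(x, weights):
    """A residual layer: the weights produce F(x), and x is added back."""
    f_x = plain_block(x, weights)
    return [f + xi for f, xi in zip(f_x, x)]


x = [0.7, -1.2, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]  # the layer "adds no value"

print(plain_block(x, zero_weights))     # the input is lost
print(residual_block(x, zero_weights))  # the input passes through unchanged
```

For the plain block to pass x through, its weights would have to form an exact identity matrix, which is a harder target to reach than all zeros.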
Architectural Building Blocks — The Identity Block
3.1 Definition of the Identity Block The Identity Block is the standard building block used in
ResNet when the input dimensions match the output dimensions.
3.2 Structure of the Block A typical identity block (as seen in ResNet-50) consists of three
convolutional layers:
1. 1x1 Convolution: Used to reduce the number of channels (bottleneck).
2. 3x3 Convolution: Used to capture spatial features.
3. 1x1 Convolution: Used to restore the channel dimensions to the original size.
3.3 The Shortcut Path In this block, the shortcut path is a "straight wire" that carries the
input directly to the end of the block.
Because the dimensions of the input and the output of the convolutional path are
identical, they can be added together element-wise without any modification.
This addition is performed before the final ReLU activation function.
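A rough weight count shows why the bottleneck layout is attractive. The sketch below assumes a block with 256 channels reduced to 64 inside the bottleneck (widths chosen for illustration) and compares it with two full-width 3x3 convolutions; biases and batch normalization are ignored, and the function names are made up for the example.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Number of weights in a square convolution (no bias, no batch norm)."""
    return kernel_size * kernel_size * in_channels * out_channels


def bottleneck_params(channels, reduced):
    """1x1 reduce -> 3x3 at reduced width -> 1x1 restore."""
    return (conv_params(1, channels, reduced)
            + conv_params(3, reduced, reduced)
            + conv_params(1, reduced, channels))


def full_width_params(channels):
    """Two 3x3 convolutions at full width, for comparison."""
    return 2 * conv_params(3, channels, channels)


print(bottleneck_params(256, 64))  # 69632
print(full_width_params(256))      # 1179648
```

Under these assumptions the bottleneck needs roughly 17 times fewer weights, because the expensive 3x3 convolution runs on 64 channels instead of 256.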
Architectural Building Blocks — The Convolutional Block
4.1 When Dimensions Mismatch In many parts of the network, we need to reduce
the spatial size of the image (height and width) while increasing the number of
filters. When the output size of the convolutional layers differs from the input size, a
simple identity shortcut is not possible because we cannot add two matrices of
different shapes.
4.2 The Convolutional Shortcut To resolve this, ResNet uses a Convolutional Block
(often represented by a dotted line in diagrams).
In this block, the shortcut path contains its own 1x1 convolutional layer.
This 1x1 convolution is responsible for transforming the dimensions of the input x
(adjusting height, width, and depth) so that they match the output of the main
convolutional path.
Typically, this is achieved by using a stride of 2 in the 1x1 convolution to halve the
spatial dimensions.
4.3 Balancing the Network By using these blocks, ResNet can effectively "resize" the
data in the shortcut path, ensuring that the addition F(x)+x remains mathematically
valid even as the data flows through deeper, more complex layers.
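The resizing done by the shortcut can be sketched without any library. The example below is a minimal illustration: tensors are nested lists in [channels][height][width] order, conv1x1 and add are hypothetical helpers, the weights are arbitrary, and a tensor of ones stands in for the output F(x) of the main path.

```python
def conv1x1(x, weights, stride):
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o.

    With stride 2, only every second row and column is kept.
    """
    rows = range(0, len(x[0]), stride)
    cols = range(0, len(x[0][0]), stride)
    return [
        [[sum(w * x[c][i][j] for c, w in enumerate(w_out)) for j in cols]
         for i in rows]
        for w_out in weights
    ]


def add(a, b):
    """Element-wise addition of two tensors with identical shapes."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))


# Input: 2 channels of size 4x4 (channel 0 is all 1.0, channel 1 is all 2.0).
x = [[[float(c + 1)] * 4 for _ in range(4)] for c in range(2)]
# Stand-in for F(x): the main path has halved the size and doubled the channels.
main_path = [[[1.0] * 2 for _ in range(2)] for _ in range(4)]
# Shortcut weights: 2 input channels -> 4 output channels (arbitrary values).
w_shortcut = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

shortcut = conv1x1(x, w_shortcut, stride=2)
y = add(main_path, shortcut)
print(shape(x), shape(shortcut), shape(y))  # (2, 4, 4) (4, 2, 2) (4, 2, 2)
```

Adding x directly to main_path would fail here, since (2, 4, 4) and (4, 2, 2) do not line up; the 1x1 convolution is what makes the shapes agree.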
Practical Implementation & Layer Specifics
5.1 Initial Pre-processing Layers Before entering the residual blocks, the input image (e.g., a
150×150×3 color image) passes through initial layers to reduce its size:
Conv1: A 7×7 filter with a stride of 2 and padding of 3. This reduces the spatial dimensions
significantly (e.g., from 150 down to 75).
Max Pooling: A 3×3 window with a stride of 2 further reduces the size (e.g., from 75 to 38,
which assumes a padding of 1).
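The sizes quoted above follow from the usual output-size formula, floor((size + 2·padding − kernel) / stride) + 1. The sketch below applies it to both layers; the pooling padding of 1 is an assumption (it is what makes 75 become 38 rather than 37), and conv_output_size is a name invented for this example.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


after_conv1 = conv_output_size(150, kernel=7, stride=2, padding=3)
# Assumption: the max-pooling layer uses a padding of 1.
after_pool = conv_output_size(after_conv1, kernel=3, stride=2, padding=1)
print(after_conv1, after_pool)  # 75 38
```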
5.2 Increasing Filter Depth As the image size decreases spatially, the number of filters
(channels) increases to capture more complex features:
The network might start with 64 filters.
Subsequent stages increase this to 128, 256, and 512 filters.
In a ResNet-50 architecture, these filters are often expanded by a factor of 4 at the end of a
block (e.g., a block with 512 filters might output 2048 channels).
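The channel counts per stage can be tabulated directly from the numbers above. This is a small sketch, with EXPANSION and the variable names chosen for the example.

```python
EXPANSION = 4  # factor applied by the last 1x1 convolution of each block

stage_filters = [64, 128, 256, 512]
stage_outputs = [filters * EXPANSION for filters in stage_filters]

for stage, (filters, outputs) in enumerate(zip(stage_filters, stage_outputs), start=1):
    print(f"stage {stage}: {filters} filters -> {outputs} output channels")
```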
5.3 Summary of Operations
Identity Block: Used when Input Size = Output Size.
Convolutional Block: Used when Input Size ≠ Output Size; uses a 1x1 Conv
in the shortcut.
Global Average Pooling: Used at the end of the network to reduce the feature maps
to a single vector for classification.
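Global average pooling is simple enough to write out by hand. In the sketch below, feature maps are nested lists in [channels][height][width] order and the values are invented for illustration; each channel collapses to its mean, giving one number per channel.

```python
def global_average_pool(feature_maps):
    """Average each [height][width] map down to a single value per channel."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 0.0], [0.0, 8.0]],  # channel 1
]
vector = global_average_pool(feature_maps)
print(vector)  # [2.5, 2.0]
```

The resulting vector has as many entries as there are channels, regardless of the spatial size of the maps, which is why it can feed a fixed-size classifier.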
Slide 1: Title Slide
Title: Deep Residual Learning for Image Recognition
Subtitle: Overcoming the Vanishing Gradient Problem with ResNet-50
Presenter: [Your Name]
Slide 2: The Problem: Vanishing Gradients
Issue: As networks become deeper, they encounter the "Vanishing Gradient
Problem".
Mechanism: During backpropagation, small derivatives are multiplied across many
layers using the chain rule.
Mathematical Root: dLoss/dW = Deriv_1 × Deriv_2 × … × Deriv_n
Impact: Gradients become so small that weights are barely updated, stalling training.
Slide 3: The Solution: Residual Learning
Innovation: Introduction of Skip Connections (shortcut connections).
Residual Function: Instead of learning H(x), the layers learn the residual F(x) = y - x.
Output Formula: y = F(x) + x.
Logic: It is easier for a network to drive a residual to zero than to learn an identity
mapping from scratch.
Slide 4: ResNet-50 Block: The Identity Block
Usage: Applied when input dimensions match output dimensions.
Bottleneck Structure:
  1x1 Conv: Reduces channel count (bottleneck).
  3x3 Conv: Captures spatial features.
  1x1 Conv: Restores channel dimensions.
Shortcut: An element-wise addition of input x to the output of the layers.
Slide 5: ResNet-50 Block: The Convolutional Block
Usage: Applied when dimensions mismatch (e.g., halving spatial size while increasing
filters).
The Shortcut: Contains its own 1x1 convolutional layer.
Function: Transforms input x to match the height, width, and depth of the main path.
Slide 6: Architecture and Performance
Pre-processing: Uses a 7x7 filter (Stride 2) and Max Pooling to reduce image size
early.
Depth Expansion: Filter depth increases through stages (64 -> 128 -> 256 -> 512).
Global Average Pooling: Used at the end to prepare data for classification.
2. Word Document Content (Copy for Report)
Technical Summary: Residual Networks (ResNets)
1. Introduction Residual Networks (ResNets) utilize skip connections to enhance the training
of deep neural networks. They were introduced to address the "degradation problem,"
where adding more layers to a network worsens performance rather than improving it.
2. The Vanishing Gradient Problem Training deep architectures is difficult because gradients
can vanish during backpropagation. Since derivatives are multiplied across layers, small
values result in an exponentially smaller gradient, preventing weights from updating
effectively.
3. Residual Learning Mechanism The core innovation of ResNet is the Skip Connection,
which allows the input to "skip" layers and be added to the output. Mathematically, if x is
the input and F(x) is the residual function, the output y is y = F(x) + x. Because the shortcut
is an identity mapping, it introduces no extra parameters and costs only one element-wise
addition.
4. Building Blocks
Identity Block: Used when dimensions are identical. It employs a bottleneck design
(1x1, 3x3, and 1x1 convolutions) to manage computational cost.
Convolutional Block: Used when dimensions change. A 1x1 convolution is added to
the shortcut path to resize the input so it can be added to the output.
5. Impact ResNets have revolutionized computer vision tasks such as image classification and
object detection. They allow for networks with hundreds or thousands of layers, achieving
state-of-the-art results on datasets like ImageNet.