
Convolution Types and Activation Functions

The document outlines various convolution types used in CNNs, including Standard, Dilated, Transposed, Separable, Grouped, Pointwise, Causal, and Deformable convolutions, each with distinct characteristics and applications. It also discusses the importance of nonlinearity in CNNs, detailing different activation functions like Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, and Softmax, along with their advantages and disadvantages. The overall focus is on enhancing feature extraction and improving model performance in tasks such as image generation and object detection.

Uploaded by

successtrbtet
Copyright
© All Rights Reserved

1. Standard Convolution

 The basic operation: a filter (kernel) slides across the input.
 At each position, the filter computes a weighted sum over its receptive field.
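To make the weighted sum concrete, here is a minimal plain-Python sketch of a single-channel 2D convolution with no padding and no bias. `conv2d` is a hypothetical helper, not a library API. Like most deep-learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d(x, w, stride=1):
    """Single-channel 2D convolution, 'valid' (no padding), no bias."""
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(w), len(w[0])
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    y = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum over the k_h x k_w receptive field at this position
            y[i][j] = sum(w[a][b] * x[i * stride + a][j * stride + b]
                          for a in range(k_h) for b in range(k_w))
    return y

x = [[1, 2, 3, 0],
     [0, 1, 2, 3],
     [3, 0, 1, 2],
     [2, 3, 0, 1]]
w = [[1, 1],
     [1, 1]]
print(conv2d(x, w))            # [[4, 8, 8], [4, 4, 8], [8, 4, 4]]
print(conv2d(x, w, stride=2))  # [[4, 8], [8, 4]]
```

With the all-ones 2×2 kernel each output is the sum of one 2×2 window of `x`; `stride=2` skips every other window and yields a 2×2 output.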

2. Dilated (Atrous) Convolution

 Introduces gaps ("dilations") between kernel elements.
 Increases receptive field without increasing parameters.
 Useful in segmentation tasks (e.g., DeepLab).
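A 1D sketch of the dilation idea, assuming a "valid" convolution (no padding); the helper names are hypothetical. The same three weights cover 3 input samples at dilation 1 and 5 samples at dilation 2, following the span formula d·(K - 1) + 1.

```python
def receptive_field(kernel_size, dilation):
    # Effective span of a dilated kernel: d * (K - 1) + 1 input samples
    return dilation * (kernel_size - 1) + 1

def dilated_conv1d(x, w, dilation=1):
    """Valid 1D dilated convolution: y[i] = sum_k w[k] * x[i + dilation * k]."""
    span = receptive_field(len(w), dilation)
    return [sum(w[k] * x[i + dilation * k] for k in range(len(w)))
            for i in range(len(x) - span + 1)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1, 1, 1]                                # the same 3 weights in both calls
print(dilated_conv1d(x, w, dilation=1))      # [6, 9, 12, 15, 18, 21]
print(dilated_conv1d(x, w, dilation=2))      # [9, 12, 15, 18]
print(receptive_field(3, 1), receptive_field(3, 2))  # 3 5
```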

3. Transposed Convolution (Deconvolution / Fractionally Strided)

 Upsampling variant of convolution.
 Used for generating larger feature maps (e.g., in decoders, GANs).
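One way to see how a transposed convolution upsamples: each input value is multiplied by the kernel and "stamped" onto the output at stride-spaced positions, with overlaps summed. This is a plain-Python sketch without padding, and `conv_transpose1d` is a hypothetical name; in a real layer the kernel `w` would be learned, while here it is fixed for illustration.

```python
def conv_transpose1d(x, w, stride=2):
    """1D transposed convolution without padding.

    Each input value x[i] scatters x[i] * w into the output starting at
    position i * stride; overlapping contributions are summed.
    Output length: (len(x) - 1) * stride + len(w).
    """
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out

print(conv_transpose1d([1, 2, 3], [1, 1, 1], stride=2))  # [1, 1, 3, 2, 5, 3, 3]
```

Three inputs become seven outputs, matching (3 - 1)·2 + 3. Positions 2 and 4 each receive contributions from two neighbouring inputs, which is where the overlaps are summed.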

4. Separable Convolutions

 Spatially Separable: Breaks a 2D kernel into two 1D kernels applied one after the other (e.g., 3×3 → 3×1 followed by 1×3).
 Depthwise Separable: Splits convolution into two steps:
   1. Depthwise convolution (one filter per input channel).
   2. Pointwise convolution (1×1 across channels).
 Used in MobileNet for efficiency.
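The efficiency claims can be checked with simple arithmetic. The sketch below (hypothetical helper names, biases ignored) first shows a spatially separable kernel: applying a 3×1 column kernel and then a 1×3 row kernel is equivalent to one 3×3 kernel equal to their outer product, here the Sobel-x edge filter. Only kernels of this rank-1 form can be factored exactly. It then compares weight counts for a 3×3 convolution from 32 to 64 channels in standard versus depthwise separable form; the channel counts are illustrative assumptions.

```python
def outer(col, row):
    """3x1 column kernel followed by 1x3 row kernel == their outer product (3x3)."""
    return [[c * r for r in row] for c in col]

def standard_params(k, c_in, c_out):
    return k * k * c_in * c_out          # one k x k x c_in filter per output channel

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in             # one k x k filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution mixing channels
    return depthwise + pointwise

# Spatially separable: Sobel-x factors into [1, 2, 1] (3x1) and [1, 0, -1] (1x3)
print(outer([1, 2, 1], [1, 0, -1]))          # [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Standard vs. depthwise separable, 3x3 kernel, 32 -> 64 channels (biases ignored)
print(standard_params(3, 32, 64))            # 18432
print(depthwise_separable_params(3, 32, 64)) # 2336
```

That is roughly 2.3k instead of 18.4k weights, about 8× fewer, which is the kind of saving MobileNet relies on.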

5. Grouped Convolution

 Input channels are split into groups, and each group is convolved separately.
 Reduces parameters and computation (by a factor equal to the number of groups).
 Used in ResNeXt and AlexNet.
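A sketch of how grouping cuts the weight count, assuming a k×k kernel and that both channel counts divide evenly by `groups` (biases ignored; helper names are hypothetical). Each filter sees only its own group's input channels, so the weight count drops by a factor equal to the number of groups. Setting `groups` equal to the number of channels gives the depthwise convolution used in separable convolutions.

```python
def grouped_conv_params(k, c_in, c_out, groups):
    """Weights in a k x k grouped convolution (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0, "channels must divide evenly"
    # Each of the c_out filters only sees c_in / groups input channels
    return k * k * (c_in // groups) * c_out

def channel_groups(c_in, groups):
    """Which input channels each group convolves over."""
    size = c_in // groups
    return [list(range(g * size, (g + 1) * size)) for g in range(groups)]

print(channel_groups(8, 4))                  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(grouped_conv_params(3, 64, 128, 1))    # 73728 (standard convolution)
print(grouped_conv_params(3, 64, 128, 32))   # 2304  (32x fewer)
print(grouped_conv_params(3, 64, 64, 64))    # 576   (groups = channels: depthwise)
```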

6. Pointwise Convolution (1×1 Conv)

 A convolution with kernel size = 1.
 Used for channel mixing and dimensionality reduction.
 Core part of Inception modules.
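A 1×1 convolution is a per-pixel weighted sum across channels, i.e., the same small matrix applied at every spatial position. The plain-Python sketch below uses nested lists in an assumed [channels][height][width] layout and a hypothetical `pointwise_conv` helper; it reduces 3 channels to 2 while leaving the 1×2 spatial size unchanged.

```python
def pointwise_conv(x, w):
    """1x1 convolution.

    x: input of shape [C_in][H][W]; w: weights of shape [C_out][C_in].
    Each output pixel is a weighted sum over the input channels at that
    same pixel, so the spatial size is unchanged and only channels are mixed.
    """
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(w))]

x = [[[1, 2]], [[3, 4]], [[5, 6]]]   # 3 channels, 1x2 spatial
w = [[1, 0, 0],                      # output channel 0 copies input channel 0
     [0, 1, 1]]                      # output channel 1 sums channels 1 and 2
print(pointwise_conv(x, w))          # [[[1, 2]], [[8, 10]]]
```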

7. Causal Convolution

 Ensures output at time t depends only on input at time ≤ t.
 Used in temporal models like WaveNet.
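A minimal sketch of causal convolution: pad the sequence with zeros on the left only, so each output reads just the current and earlier samples. Here `w[0]` multiplies the current sample and `w[k]` the sample k·dilation steps back; that indexing convention and the helper name are assumptions. The `dilation` parameter is included because WaveNet stacks causal convolutions with increasing dilation to reach further into the past.

```python
def causal_conv1d(x, w, dilation=1):
    """y[t] = sum_k w[k] * x[t - dilation * k], with x treated as 0 before t = 0.

    Left-padding with dilation * (len(w) - 1) zeros keeps the output the same
    length as the input and guarantees no future sample is read.
    """
    pad = dilation * (len(w) - 1)
    xp = [0] * pad + list(x)
    return [sum(w[k] * xp[t + pad - dilation * k] for k in range(len(w)))
            for t in range(len(x))]

x = [1, 2, 3, 4]
print(causal_conv1d(x, [1, 1]))          # [1, 3, 5, 7]  (x[t] + x[t-1])

# Changing a future sample never changes earlier outputs
x_future = [1, 2, 3, 100]
print(causal_conv1d(x_future, [1, 1]))   # [1, 3, 5, 103]
```

Editing the last input changes only the last output, which is exactly the "time ≤ t" property.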

8. Deformable Convolution

 Learns offsets for its sampling positions instead of using a fixed grid.
 Improves handling of geometric transformations (e.g., object detection).
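The sketch below computes one output of a 3×3 deformable convolution in plain Python, following $y(p) = \sum_k w_k \, x(p + p_k + \Delta p_k)$, where $p_k$ is the fixed 3×3 grid and $\Delta p_k$ the per-tap offset. Because offsets are fractional, values are read with bilinear interpolation. In an actual layer the offsets are predicted by an extra convolution and learned; here they are passed in directly, and all function names are hypothetical.

```python
import math

def bilinear(x, r, c):
    """Sample 2D list x at fractional (r, c); positions outside the image count as 0."""
    height, width = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    value = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < height and 0 <= cc < width:
                weight = (1 - abs(r - rr)) * (1 - abs(c - cc))
                value += weight * x[rr][cc]
    return value

def deformable_conv_at(x, w, p, offsets):
    """One output of a 3x3 deformable convolution centred at p = (row, col).

    offsets[n] = (dr, dc) is the shift for kernel tap n. In a real layer a
    separate convolution predicts these; here they are simply given.
    """
    taps = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]   # fixed grid p_k
    total = 0.0
    for n, (a, b) in enumerate(taps):
        dr, dc = offsets[n]
        total += w[a + 1][b + 1] * bilinear(x, p[0] + a + dr, p[1] + b + dc)
    return total

x = [[4 * i + j for j in range(4)] for i in range(4)]   # 4x4 ramp image
w = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(deformable_conv_at(x, w, (1, 1), [(0, 0)] * 9))    # 45.0: zero offsets = standard conv
print(deformable_conv_at(x, w, (1, 1), [(0, 0.5)] * 9))  # 49.5: every tap shifted half a column
```

With zero offsets the result equals an ordinary 3×3 convolution at that position; shifting every tap by half a column samples between pixels.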
In short:

 Standard = basic
 Dilated = bigger receptive field
 Transposed = upsampling
 Separable (depthwise/pointwise) = efficiency
 Grouped = channel grouping
 Causal = time-series
 Deformable = adaptive receptive field

Variant | Operation / Formula | Key Idea | Use Case
Standard Convolution | $y_i = \sum_k w_k \, x_{i+k}$ | Sliding kernel over input, weighted sum | Feature extraction in CNNs
Dilated (Atrous) | $y_i = \sum_k w_k \, x_{i + d \cdot k}$ | Inserts gaps (dilation rate $d$) in the kernel | Semantic segmentation (DeepLab)
Transposed (Deconv) | Reverse (transpose) of the standard convolution | Spreads each input value over the output grid; learns upsampling | Image generation, decoders, GANs
Spatially Separable | 2D kernel → two 1D kernels | Factorizes the kernel (e.g., 3×3 → 3×1 followed by 1×3) | Reduces computation
Depthwise Separable | Depthwise conv + pointwise (1×1) conv | Convolution per channel, then mixing across channels | MobileNet, efficient CNNs
Grouped Convolution | Split input channels into groups | Convolve each group separately | AlexNet, ResNeXt
Pointwise (1×1) | Kernel size = 1×1 | Mixes channels, dimensionality reduction | Inception modules
Causal Convolution | $y_t = \sum_{k \ge 0} w_k \, x_{t-k}$ | Depends only on current and past inputs ($x_{\le t}$) | Time series, WaveNet
Deformable Convolution | $y(p) = \sum_k w_k \, x(p + p_k + \Delta p_k)$ | Learns offsets $\Delta p_k$ for the sampling positions $p_k$ | Object detection, dense prediction

NONLINEARITY (ACTIVATION FUNCTIONS) IN CNNs:

 Convolution is a linear operation, so after each convolution layer an activation function introduces nonlinearity; this lets the CNN approximate nonlinear decision boundaries.

 Without nonlinearity, multiple convolution layers would collapse into a single linear transformation → the CNN would behave like a single linear classifier.
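The collapse argument can be checked numerically. Stacking two 1D convolutions (no nonlinearity, biases omitted) gives exactly the same output as a single convolution whose kernel is the two kernels convolved together; inserting a ReLU between them breaks that equivalence. With biases the stack would still reduce to a single affine map. The helpers and numbers are illustrative assumptions.

```python
def corr1d(x, w):
    """Valid 1D convolution (cross-correlation), no bias."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def compose_kernels(a, b):
    """Kernel of a single layer equivalent to applying a, then b."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for m, bm in enumerate(b):
            c[j + m] += aj * bm
    return c

def relu(v):
    return [max(0, t) for t in v]

x = [3, -1, 4, -1, 5, -9, 2, 6]
a, b = [1, -2], [2, 1]

two_linear_layers = corr1d(corr1d(x, a), b)
one_layer = corr1d(x, compose_kernels(a, b))
print(two_linear_layers == one_layer)   # True: the stack collapses

with_relu = corr1d(relu(corr1d(x, a)), b)
print(with_relu == one_layer)           # False: ReLU prevents the collapse
```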
Flow of Nonlinearity in CNN

1. Input image → Convolution layer (linear feature extraction)
2. Activation (ReLU, etc.) → Nonlinearity
3. Pooling → Downsampling
4. Stack multiple layers (conv + activation)
5. Fully connected + Softmax for final prediction
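The flow above can be traced with a toy forward pass. This sketch uses a 1D, single-channel signal instead of an image to stay short; every weight is a made-up illustrative value and the helper names are hypothetical.

```python
import math

def conv1d(x, w, b=0.0):
    """Valid 1D convolution (cross-correlation) plus bias: the linear step."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, t) for t in v]

def max_pool1d(v, size=2):
    # Non-overlapping windows; a trailing partial window is dropped
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]

def dense(v, weights, bias):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(weights, bias)]

def softmax(z):
    m = max(z)
    exps = [math.exp(t - m) for t in z]
    total = sum(exps)
    return [e / total for e in exps]

x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0]   # toy 1D input, length 7
h = conv1d(x, [1.0, -1.0])                  # 1. convolution -> length 6
h = relu(h)                                 # 2. nonlinearity
h = max_pool1d(h, 2)                        # 3. pooling -> length 3
# 4. a deeper model would repeat conv + activation here
logits = dense(h, [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.1])  # 5. fully connected
probs = softmax(logits)                     # 5. softmax -> class probabilities
print(h, logits, probs)
```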

Activation Function | Advantages | Disadvantages
Sigmoid | Smooth output between 0 and 1 (probability-like); historically well understood | Vanishing gradient (small updates in deep layers); not zero-centered → slower convergence
Tanh | Output between -1 and 1 (zero-centered); stronger gradients than sigmoid | Still suffers from vanishing gradients; slower than ReLU
ReLU | Very fast to compute; reduces the vanishing gradient problem; sparse activation (only neurons with positive input fire) | Dead neuron problem (neurons can get stuck outputting 0 permanently); not smooth (not differentiable) at 0
Leaky ReLU | Fixes the dead neuron issue (small slope α for negative inputs); works better than ReLU in some tasks | Extra parameter α to tune; slightly more compute
ELU | Smooth curve for negative values; can converge faster than ReLU; mean activations closer to 0 → helps training | More computationally expensive than ReLU (needs an exponential for negative inputs)
Softmax (output layer) | Converts raw scores into a probability distribution; good for classification | Not used in hidden layers; can overflow with very large inputs (subtract the maximum score before exponentiating)
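Each activation in the table is a one-line formula; the plain-Python sketch below writes them out along with the sigmoid and tanh derivatives. The derivatives illustrate the vanishing-gradient entry: the sigmoid's slope never exceeds 0.25, so a product of ten such factors is below 1e-6, while tanh's slope reaches 1 at zero. The softmax subtracts the maximum score before exponentiating, which is the usual fix for the instability noted in the table. The default α values (0.01 for Leaky ReLU, 1.0 for ELU) are common choices, assumed here.

```python
import math

def sigmoid(x):
    # Split on sign so math.exp never receives a large positive argument
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # at most 0.25, reached at x = 0

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2    # at most 1.0, reached at x = 0

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x  # small slope keeps negative inputs alive

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)  # smooth, saturates at -alpha

def softmax(z):
    m = max(z)                         # shift so the largest exponent is exp(0) = 1
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Vanishing gradient: ten chained sigmoid slopes, each at its maximum of 0.25
print(d_sigmoid(0.0) ** 10)            # 9.5367431640625e-07
print(d_tanh(0.0))                     # 1.0
print(relu(-3.0), leaky_relu(-3.0), elu(-3.0))

# math.exp(1000.0) would raise OverflowError; the shifted version does not
print(softmax([1000.0, 1001.0, 1002.0]))
```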
