
Deep Learning

The document compares three deep learning architectures: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. CNNs excel in processing spatial data like images, RNNs are designed for sequential data but struggle with long-term dependencies, while Transformers utilize self-attention for better context understanding and parallelization. Each architecture has distinct strengths, weaknesses, and use cases, making them suitable for different types of data and tasks.

Uploaded by

livilivi3898
Copyright
© All Rights Reserved

Comparison of Deep Learning Modules

Introduction to CNNs vs. RNNs vs. Transformers in Deep Learning

Convolutional Neural Networks (CNNs) are specialized for grid-structured data such as images, using
shared filters to learn local spatial features so that a pattern can be detected wherever it appears
(translation invariance). Recurrent Neural Networks (RNNs) were designed for sequential data: they
process it step by step and carry a hidden state forward as a short-term memory, but they often fail
to model long-term dependencies. The Transformer architecture removes this sequential processing by
relying on the Self-Attention Mechanism to weigh the global context of every element simultaneously,
which enables better long-range dependency capture and faster training through parallelization.

Convolutional Neural Networks (CNNs)


A specialized network for processing structured grid data such as images (2D) or signals (1D). It
uses convolutional layers to learn spatial hierarchies of features, from edges to textures to
shapes.

Key Uses
Computer Vision: Image Classification, Object Detection, Image Segmentation.

Strength
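Translation Invariance: Can detect a feature regardless of its position in the image, because the
same filter weights are applied at every location (strictly, convolution is translation-equivariant,
and pooling adds invariance). Parameter Efficiency: Weight sharing keeps the parameter count of a
convolutional layer independent of image size. Excellent at capturing local spatial patterns.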
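
To make the parameter-efficiency point concrete, the following minimal sketch counts the parameters of one convolutional layer and one fully connected layer applied to the same input. The sizes (a 224x224 RGB image, 64 output units, 3x3 kernels) and the helper names are illustrative assumptions, not values from the text.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights of every filter plus one bias per output channel.

    The count does not depend on the image height or width,
    because the same filters are reused at every position.
    """
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def dense_params(in_features, out_features):
    """One weight per (input, output) pair plus one bias per output."""
    return out_features * (in_features + 1)


# Hypothetical input: a 224x224 image with 3 color channels.
height, width, channels = 224, 224, 3

conv_count = conv2d_params(in_channels=channels, out_channels=64, kernel_size=3)
dense_count = dense_params(in_features=height * width * channels, out_features=64)

print("conv layer parameters: ", conv_count)
print("dense layer parameters:", dense_count)
```

Under these assumptions the convolutional layer needs 1,792 parameters while the fully connected layer needs 9,633,856, which is the practical meaning of weight sharing.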

Weakness
Poor at modelling sequential or temporal dependencies. Limited ability to capture long-range
global context without very deep stacks.

Use Case Fit


Best for data where proximity (locality) and spatial structure are the most important factors.

Example
Identifying a cat in a photo: The CNN learns local features (whiskers, ears) and combines them into
higher-level representations (face, body) regardless of where the cat is positioned in the frame.
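
The "regardless of position" behavior can be sketched in one dimension. The example below is a simplified illustration in plain Python, not a full CNN: the signals, the two-tap edge kernel, and the function name `conv1d` are assumptions chosen for readability. It slides one shared kernel over two signals that contain the same pattern at different positions.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: apply the same kernel at every position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical filter that responds to a rising step (0 followed by 1).
edge_kernel = [-1, 1]

signal_a = [0, 0, 1, 1, 0, 0, 0, 0]
signal_b = [0, 0, 0, 0, 0, 1, 1, 0]  # same pattern, shifted right by 3

resp_a = conv1d(signal_a, edge_kernel)
resp_b = conv1d(signal_b, edge_kernel)

# The peak response moves with the pattern (equivariance) ...
peak_a = resp_a.index(max(resp_a))
peak_b = resp_b.index(max(resp_b))

# ... while a global max over positions ignores where it occurred (invariance).
pooled_a = max(resp_a)
pooled_b = max(resp_b)

print(peak_a, peak_b, pooled_a, pooled_b)
```

The peak location shifts by the same three positions as the input pattern, and the pooled value is identical for both signals.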

Recurrent Neural Networks (RNNs)

A network designed for processing sequential data like text, speech, or time-series. They have a
recurrent connection that allows information from the previous step (via a hidden state/memory) to be
carried forward, making the current prediction dependent on past inputs.
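
The recurrence described above can be sketched with a single scalar hidden state. This is a minimal illustration, assuming the common update h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); the weight values and the function name `rnn_forward` are hypothetical, and a real RNN would use vectors and learned weight matrices.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar RNN over a sequence and return every hidden state.

    The weights are fixed, made-up values; in practice they are learned.
    """
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Two sequences that differ only in their first element.
states_a = rnn_forward([1.0, 0.0, 0.0])
states_b = rnn_forward([0.0, 0.0, 0.0])

print(states_a[-1], states_b[-1])
```

The final hidden states differ even though the last two inputs are identical, which shows the earlier input being carried forward through the hidden state.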

Key Uses

Sequence Modelling: Simple Time-Series Forecasting, basic Language Modelling, and early Machine
Translation. (Often replaced by LSTMs/GRUs due to limitations).

Strength

Excels at processing and understanding temporal dependencies in sequential data. The concept of a
hidden state provides a form of "memory."

Weakness

Vanishing/Exploding Gradient Problem: Struggles to learn or remember long-term dependencies
(information far back in the sequence). No Parallelization: Must process data one step at a time,
making training slow.
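
A rough numerical sketch of the vanishing-gradient effect follows. It assumes a scalar tanh RNN with a made-up recurrent weight and zero input, and multiplies the per-step derivatives d h_t / d h_{t-1} = w_h * (1 - h_t^2) along the sequence, which is what backpropagation through time does. The function name and constants are illustrative only.

```python
import math


def gradient_through_time(num_steps, w_h, h0=0.5):
    """Return d h_T / d h_0 for a scalar tanh RNN with zero input."""
    h = h0
    grad = 1.0
    for _ in range(num_steps):
        h = math.tanh(w_h * h)
        # Chain rule: each step contributes one factor w_h * (1 - h_t^2).
        grad *= w_h * (1.0 - h * h)
    return grad


short_range = gradient_through_time(num_steps=5, w_h=0.5)
long_range = gradient_through_time(num_steps=50, w_h=0.5)

print("gradient over 5 steps: ", short_range)
print("gradient over 50 steps:", long_range)
```

Because every factor here is smaller than 1 in magnitude, the product shrinks exponentially with the number of steps, so the signal from distant inputs barely influences learning. When the per-step factors are larger than 1 in magnitude, the same product grows instead, which is the exploding case.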

Use Case Fit

Suitable for short sequences or real-time streaming data where processing must be sequential.

Example

Predictive text/Auto-completion (simple case): Given the words "I love to eat fresh," the RNN
uses the hidden state from the previous words to predict that the next word is "fruit" (or similar).

Transformers

An architecture introduced in 2017 (in the paper "Attention Is All You Need") that also handles
sequential data. It replaces recurrence entirely with the Self-Attention Mechanism, which lets each
element weigh the importance of every other element in the sequence, regardless of position.
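
The weighting step can be sketched as scaled dot-product self-attention. The version below is deliberately simplified and should be read as an illustration under stated assumptions: it uses plain Python lists, a single head, and sets queries, keys, and values equal to the input (Q = K = V = x), whereas a real Transformer applies learned projections first. The token vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head self-attention with Q = K = V = x (no learned projections).

    x is a list of n token vectors, each of length d.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d = len(x[0])
    outputs, weights = [], []
    for query in x:
        # Compare this token with every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in x
        ]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append(
            [sum(w_j * value[i] for w_j, value in zip(w, x)) for i in range(d)]
        )
    return outputs, weights


# Hypothetical 3-token sequence with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)

for row in weights:
    print([round(v, 3) for v in row])
```

Each row of `weights` sums to 1 and covers every position in the sequence, and no row depends on the result of another row, which is why the computation parallelizes well.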

Key Uses

State-of-the-Art NLP and Beyond: Large Language Models (GPT, BERT), Machine Translation, Text
Summarization, and increasingly Computer Vision (Vision Transformers/ViTs).

Strength

Long-Range Dependency Capture: Self-Attention considers the entire context at once. High
Parallelization: Eliminates the sequential bottleneck of RNNs, drastically speeding up training on
GPUs/TPUs.

Weakness

Computationally Expensive: The self-attention mechanism has $O(n^2)$ time and memory complexity
with respect to sequence length ($n$), making it resource-heavy for very long sequences.
Data-Hungry: Typically requires massive datasets to train effectively.
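
The quadratic growth can be checked with simple arithmetic. This sketch counts only the entries of one attention score matrix (one score per pair of tokens) for a single head; the sequence lengths and the function name are illustrative assumptions.

```python
def attention_score_entries(seq_len):
    """One score for every (query token, key token) pair."""
    return seq_len * seq_len


for n in (512, 1024, 2048):
    print(n, "tokens ->", attention_score_entries(n), "attention scores")

# Doubling the sequence length quadruples the number of scores.
ratio = attention_score_entries(1024) / attention_score_entries(512)
print("growth when doubling n:", ratio)
```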

Use Case Fit

Best for tasks requiring a deep understanding of global context and long-range dependencies,
especially when massive data and compute are available.

Example

Machine Translation: Translating a long, complex sentence by allowing the model to simultaneously
look at every word in the source sentence to determine the best translation for any single word.

CNNs vs. RNNs vs. Transformers in Deep Learning

| Feature | Convolutional Neural Network (CNN) | Recurrent Neural Network (RNN) | Transformer |
|---|---|---|---|
| Primary Data Type | Spatial Data (Images, Grids) | Sequential Data (Text, Time-Series) | Sequential Data (Text, Time-Series) |
| Core Mechanism | Convolutional Layers (shared weights to extract local spatial features like edges and shapes) | Recurrent Connections (processes data token-by-token, uses a hidden state for memory) | Self-Attention Mechanism (calculates relationships between all tokens simultaneously) |
| Handling of Context | Local (Each neuron sees only a small, neighboring region of the input) | Sequential/Short-to-Medium Term (Struggles with very long-range dependencies due to the Vanishing Gradient Problem) | Global/Long-Range (Considers all tokens in the sequence at every step, making it excellent for long-term dependencies) |
| Parallelization | High (Convolution operation can be done in parallel across the image) | Low (Must process the sequence one step after the other) | High (Self-attention is matrix multiplication, enabling massive parallelization during training) |
| Primary Use Cases | Image Classification, Object Detection, Image Segmentation | Simple Time Series Prediction, basic Speech Recognition (often superseded by LSTMs/Transformers) | Machine Translation, Large Language Models (LLMs), Generative AI |

Supervised vs. Unsupervised Deep Learning (Learning Paradigm)

| Feature | Supervised Deep Learning | Unsupervised Deep Learning |
|---|---|---|
| Training Data | Labeled Data (Input is paired with a correct/desired output, or "ground truth," e.g., an image of a cat labeled "cat") | Unlabeled Data (Only input data is provided; no corresponding output labels) |
| Goal | Prediction and Classification (Learn a mapping function from input ($X$) to output ($Y$)) | Pattern Discovery and Representation Learning (Discover hidden structures, groupings, or features within the data) |
| Common Tasks | Image Classification, Regression (predicting continuous values like house price), Sentiment Analysis | Clustering (e.g., K-Means, grouping similar customers), Dimensionality Reduction (e.g., Autoencoders), Generative Modeling (e.g., GANs) |
| Model Evaluation | Objective (Uses clear metrics like Accuracy, Precision, Recall, Mean Squared Error, based on the known labels) | Subjective/Exploratory (Evaluation is harder; often uses internal metrics like cluster cohesion or requires human interpretation) |
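
The "objective" supervised metrics named in the table can be computed directly once ground-truth labels are known. The sketch below is a minimal illustration for binary labels; the label lists are made-up data and the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

None of these quantities can be computed without `y_true`, which is why evaluation in the unsupervised setting has to fall back on internal measures or human judgment.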
