Unit 2

The document discusses the principles of learning in deep networks, focusing on backpropagation and gradient descent techniques. It outlines the training process, advantages, and challenges of backpropagation, as well as the differences between batch, stochastic, and mini-batch gradient descent. Stochastic Gradient Descent is highlighted as a fast and efficient method for training neural networks, particularly for large datasets.

Uploaded by

Tamil Ravee

UNIT II

LEARNING IN DEEP NETWORKS

Back propagation training, Learning the weights, Chain rule, Stochastic gradient descent, Sigmoid units and vanishing gradient, Rectified Linear Unit (ReLU) and its variants - Cross entropy for classification and activation, Batch learning.
Backpropagation (Backward Propagation of Errors) is a supervised learning algorithm used to train neural networks by minimizing the loss function.
It computes the gradients of the loss with respect to the weights and biases using the chain rule of calculus; an optimizer such as gradient descent then uses these gradients to update the parameters iteratively.
2. Purpose of Backpropagation
• Reduce the difference between predicted output and actual output
• Enable learning in multi-layer (deep) networks
• Optimize model parameters efficiently
3. Why Backpropagation is Important
• Efficient weight update: computes gradients for all parameters in a single backward pass.
• Scalability: works for deep and complex architectures.
• Automated learning: the network adjusts its weights automatically to reduce error.
• Foundation of deep learning: enables training of CNNs, RNNs, Transformers, etc.
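Training Process Overview
• Training with backpropagation consists of two main phases: the forward pass and the backward pass.
Forward Pass
• The input is passed through the network layer by layer to produce a prediction, and the loss is computed from that prediction.
Backward Pass (Core of Backpropagation)
• The error is propagated from the output layer back toward the input layer.
• Gradients are computed using the chain rule.
• Each weight is adjusted based on its contribution to the error.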
Iterative Learning
• Forward pass → Error calculation → Backward pass → Weight update
• This cycle is repeated over many epochs
• Training continues until the loss stops decreasing or a set number of epochs is reached
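The forward pass → error calculation → backward pass → weight update cycle can be sketched on a very small network. This is only an illustration under stated assumptions: the network (one input, one sigmoid hidden unit, one linear output), the squared-error loss, the starting parameters, the learning rate, and the function name train_tiny_network are all hypothetical and do not come from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_tiny_network(x, y, epochs=200, lr=0.1):
    """Train a 1-input, 1-hidden-unit, 1-output network on a single example.

    Returns the list of loss values, one per epoch.
    """
    # Hypothetical starting parameters, chosen arbitrarily for illustration.
    w1, b1, w2, b2 = 0.5, 0.0, -0.5, 0.0
    losses = []
    for _ in range(epochs):
        # 1. Forward pass: input -> hidden (sigmoid) -> output (linear).
        h = sigmoid(w1 * x + b1)
        y_hat = w2 * h + b2

        # 2. Error calculation: squared-error loss.
        loss = 0.5 * (y_hat - y) ** 2
        losses.append(loss)

        # 3. Backward pass: chain rule from the output back to the input.
        d_y_hat = y_hat - y          # dL/dy_hat
        d_w2 = d_y_hat * h           # dL/dw2
        d_b2 = d_y_hat               # dL/db2
        d_h = d_y_hat * w2           # dL/dh
        d_z = d_h * h * (1.0 - h)    # dL/dz, using sigmoid'(z) = h * (1 - h)
        d_w1 = d_z * x               # dL/dw1
        d_b1 = d_z                   # dL/db1

        # 4. Weight update: step against the gradient.
        w1 -= lr * d_w1
        b1 -= lr * d_b1
        w2 -= lr * d_w2
        b2 -= lr * d_b2
    return losses


losses = train_tiny_network(x=1.0, y=1.0)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}")
```

Real training would loop over many examples rather than one; a single example is used here only to keep every gradient visible.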
Advantages                      | Challenges / Limitations
1. […] to implement             | 1. […] Gradient Problem
2. […] gradient computation     | 2. […] Gradient Problem
3. […] with deep networks       | 3. […] in complex networks
4. […] generalization ability   | 4. […] to learning rate
5. […] to large datasets        | 5. […] differentiable functions
• Example of Back Propagation
BACK PROPAGATION
Chain Rule:
• A neural network is a computational graph made of many connected functions.
The chain rule is used to compute how the loss changes with respect to each
weight and bias by multiplying derivatives through these functions.
• Backpropagation is the systematic application of the chain rule to efficiently
calculate gradients, which are used to update the network parameters. Thus, the
chain rule is the core mathematical principle behind learning in deep neural
networks.
• Without the chain rule, the gradients for the hidden layers could not be computed efficiently, so gradient-based training of deep models would not be practical.
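As a small worked case of the chain rule, consider a single sigmoid neuron with a squared-error loss. The neuron, the loss, and the symbols x, y, w, b, z and ŷ are assumptions chosen for illustration; they are not taken from the slides.

```latex
\begin{aligned}
z &= w x + b, \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad L = \tfrac{1}{2}(\hat{y} - y)^2 \\
\frac{\partial L}{\partial w}
  &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}
   = (\hat{y} - y)\,\hat{y}(1 - \hat{y})\,x \\
\frac{\partial L}{\partial b}
  &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial b}
   = (\hat{y} - y)\,\hat{y}(1 - \hat{y})
\end{aligned}
```

Each factor is the local derivative of one function in the graph. The shared factor (ŷ − y) ŷ(1 − ŷ) needs to be computed only once and can be reused for both parameters, which is where the efficiency of backpropagation comes from.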
Learning the Weights:
• “Learning the weights” means automatically finding the best values of the
weights and biases of a neural network so that its predictions become accurate.
• Learning the weights is achieved by:
Backpropagation (to compute gradients) + Stochastic Gradient Descent (to
update parameters).
• In short, backpropagation supplies the gradient of the loss for every weight, and SGD uses those gradients to adjust the weights so that the neural network makes better predictions.
Gradient Descent
• Gradient Descent is an optimization algorithm used in deep learning to
train a neural network by minimizing the loss (error) of the model.
• It helps the model learn by adjusting weights and biases so that predictions
become more accurate.
Why do we need Gradient Descent?
• In deep learning, the model makes predictions and calculates an error using a loss function.
• The goal is to reduce this error as much as possible.
• Gradient Descent searches for the parameter values that minimize this error.
How does Gradient Descent work?
• The neural network makes a prediction.
• The error (loss) is calculated.
• The gradient (slope of the error curve) is computed using the chain rule of calculus.
• The weights are updated in the opposite direction of the gradient.
• This process repeats until the error becomes very small.
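The loop described above can be sketched on a one-parameter problem. The loss f(w) = (w − 3)², the starting point, the learning rate and the function names are assumptions picked for illustration, not part of the slides.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # move opposite to the slope
    return w


# Hypothetical loss f(w) = (w - 3)^2, whose minimum is at w = 3.
def loss(w):
    return (w - 3.0) ** 2


def loss_grad(w):
    return 2.0 * (w - 3.0)   # derivative of the loss above


w_final = gradient_descent(loss_grad, w0=0.0)
print(w_final, loss(w_final))
```

With this learning rate each step shrinks the distance to the minimum by a constant factor, so w approaches 3. A learning rate that is too large would make the steps overshoot instead.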
Batch Gradient Descent
• In Batch Gradient Descent, the gradient is calculated using the entire training dataset.
• How it works
• The model processes all training samples.
• The total loss is computed.
• The gradient is calculated using all data.
• Weights are updated once per epoch.
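Symbols used
η (eta): learning rate
Δ (uppercase delta): change in a quantity, e.g. Δw is the change in a weight
δ (lowercase delta): error term propagated backward through a layer
∇ (nabla / del operator): gradient
∂: partial derivative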
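With the symbols listed above, one batch gradient descent step can be written as follows. This is the standard textbook form, given as a sketch: the names w for the parameters, N for the number of training samples, f for the model and ℓ for the per-sample loss are assumptions, not taken from the slides.

```latex
\begin{aligned}
L(w) &= \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(x_i; w),\, y_i\big) \\
\Delta w &= -\eta \, \nabla_w L(w) \\
w &\leftarrow w + \Delta w
\end{aligned}
```

Because L(w) averages over all N samples, every sample must be processed before a single update can be made, which is why there is only one update per epoch.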
Stochastic Gradient Descent (SGD)
• In SGD, the gradient is computed using only one data sample at a time.
• How it works
• Pick one data point.
• Compute loss and gradient.
• Update weights immediately.
• Repeat for all data points.
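The per-sample update can be written compactly as below. The notation is an assumption for illustration (w for the parameters, η for the learning rate, f for the model, ℓ for the per-sample loss, N for the dataset size), and drawing the index uniformly at random is one common convention rather than something the slides specify.

```latex
\begin{aligned}
i &\sim \mathrm{Uniform}\{1, \dots, N\} \\
w &\leftarrow w - \eta \, \nabla_w \ell\big(f(x_i; w),\, y_i\big) \\
\mathbb{E}_i\Big[\nabla_w \ell\big(f(x_i; w),\, y_i\big)\Big]
  &= \frac{1}{N} \sum_{j=1}^{N} \nabla_w \ell\big(f(x_j; w),\, y_j\big)
\end{aligned}
```

The last line states that, under uniform sampling, the single-sample gradient equals the full-dataset gradient on average, so each step is a noisy but unbiased estimate of the full gradient step.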
Mini-Batch Gradient Descent
• This is a combination of Batch GD and SGD.
It uses a small group of samples (mini-batch) for each update.
• Typical mini-batch sizes: 32, 64, 128
• How it works
• Divide dataset into small batches.
• Compute gradient for one batch.
• Update weights.
• Repeat for all batches.
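The four steps above can be sketched for a linear model. Everything specific here is an assumption for illustration: the model y = w·x + b, the mean-squared-error loss, the 20-point noise-free dataset, the batch size of 4 (smaller than the typical sizes above, only because the dataset is tiny), and the function name minibatch_gd.

```python
import random


def minibatch_gd(xs, ys, batch_size, epochs=500, lr=0.1, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on mean squared error.

    Returns (w, b, number_of_updates).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)   # new sample order each epoch
        # Divide the dataset into small batches.
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Compute the gradient for one batch, averaged over its samples.
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            grad_w /= len(batch)
            grad_b /= len(batch)
            # Update the weights once per batch.
            w -= lr * grad_w
            b -= lr * grad_b
            updates += 1
    return w, b, updates


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b, updates = minibatch_gd(xs, ys, batch_size=4)
print(w, b, updates)
```

In this sketch, setting batch_size to 1 gives one update per sample, and setting it to len(xs) gives one update per epoch, so the same loop covers all three variants.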
Comparison Table
Feature          | Batch GD     | SGD       | Mini-Batch GD
Data used        | Full dataset | 1 sample  | Small batch
Speed            | Slow         | Very fast | Fast
Stability        | Very stable  | Noisy     | Balanced
Memory           | High         | Low       | Medium
Used in practice | Rare         | Sometimes | Most common
Why is Mini-Batch preferred in Deep Learning?
Modern deep learning (CNNs, RNNs, Transformers) uses Mini-Batch Gradient Descent because it:
• Works well with GPUs, which process the samples of a batch in parallel
• Is fast and stable
• Handles big datasets
Stochastic Gradient Descent (SGD) in Deep Learning
• Stochastic Gradient Descent (SGD) is one of the most important optimization
algorithms used to train neural networks. It updates the model’s weights using
one training example at a time instead of using the whole dataset.
• SGD is widely used in deep learning because it is fast, memory-efficient, and
effective for large datasets.
• Why is it called “Stochastic”?
• The word stochastic means random. In SGD, one randomly chosen data point is used at a time to compute the gradient and update the weights.
• The updates are therefore noisy, but this noise can help the optimizer escape local minima.
How SGD works
• Suppose we have a dataset with 100,000 samples.
• Instead of waiting to process all 100,000 samples before each update (as in Batch Gradient Descent), SGD:
  • Takes one sample
  • Computes the loss
  • Finds the gradient
  • Updates the weights immediately
  • Moves to the next sample
• With one update per sample, this happens 100,000 times in one training cycle (epoch).
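The per-sample loop above can be sketched as follows. The details are assumptions for illustration: a one-parameter model y = w·x, a squared-error loss, a 10-sample noise-free dataset in place of the large one described above, and the function name sgd.

```python
import random


def sgd(xs, ys, epochs=50, lr=0.1, seed=0):
    """Fit y = w * x with SGD, using one sample per update.

    Returns (w, number_of_updates).
    """
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)               # visit the samples in random order
        for i in order:                  # take one sample
            err = w * xs[i] - ys[i]      # error behind the loss err ** 2
            grad = 2.0 * err * xs[i]     # gradient of err ** 2 w.r.t. w
            w -= lr * grad               # update the weight immediately
            updates += 1                 # then move to the next sample
    return w, updates


# Hypothetical noise-free data generated from y = 3x.
xs = [k / 10 for k in range(1, 11)]
ys = [3.0 * x for x in xs]
w, updates = sgd(xs, ys)
print(w, updates)
```

The number of updates equals the number of samples times the number of epochs, which is the point of the comparison with Batch Gradient Descent.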
Advantages of SGD                | Disadvantages of SGD
1. […] fast for large datasets   | 1. […] value fluctuates
2. […] less memory               | 2. […] not move smoothly
3. […] escape local minima       | 3. […] careful learning rate tuning
4. […] well for online learning  |
Summary
Stochastic Gradient Descent is a fast and powerful algorithm that updates neural network weights
one data point at a time, allowing efficient learning for large-scale deep learning problems.
