Stochastic Gradient Descent in Deep Learning

Stochastic Gradient Descent (SGD) is a key algorithm in Deep Learning that optimizes objective functions by using a single sample or mini-batch at each step, making it suitable for large datasets. Unlike traditional Gradient Descent, which can be computationally intensive, SGD reduces the number of calculations significantly, allowing for efficient handling of big data and quick updates when new data is available. The document outlines the basic principles of SGD, its advantages over standard Gradient Descent, and the iterative process involved in optimizing model parameters.

Uploaded by

poojamani133

DEEP LEARNING

Stochastic Gradient Descent

Rajesh A
Stochastic Gradient Descent

▪ Stochastic Gradient Descent (SGD) is one of the core algorithms used in Deep
Learning
▪ SGD is based on Gradient Descent; Gradient Descent is an iterative optimization
process that searches for an objective function's optimum value (Min/Max)

▪ A gradient measures how much the error changes with respect to a change in
each weight. You can also think of a gradient as the slope of a function
▪ In mathematical terms, a gradient is the vector of partial derivatives of a
function with respect to its inputs
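
To make the "vector of partial derivatives" idea concrete, here is a minimal sketch that approximates a gradient numerically with central differences. The function `bowl` and the helper `numerical_gradient` are hypothetical names chosen for illustration; they do not come from the slides.

```python
def numerical_gradient(f, point, eps=1e-6):
    """Approximate the gradient of f at `point` using central differences."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps      # nudge only the i-th input upwards
        down[i] -= eps    # and downwards
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def bowl(w):
    # Hypothetical error surface: error = w0^2 + 3 * w1^2
    return w[0] ** 2 + 3 * w[1] ** 2


gradient_at_point = numerical_gradient(bowl, [1.0, 2.0])
print(gradient_at_point)  # close to the analytic partial derivatives [2.0, 12.0]
```

Each entry of the result is the slope of the error along one weight while the other weight is held fixed.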
Gradient Descent

▪ A dataset of 3 items of weight and height. Given a particular weight, we want
to predict the height
▪ Simple Linear Regression Model
▪ There could be multiple models possible; which is the best fit model?
▪ This simple model can be represented as

H = I + sW

▪ What are the values of Intercept (I) and Slope (s) for the best fit linear
regression model?
Gradient Descent – Loss Function

▪ Identify the Loss Function for the proposed model

▪ There are various loss functions like Sum of Squared Error (SSE), Mean Squared
Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R² Error…
Loss function = Sum of Squared Residuals

$$\text{Loss function} = (y_1 - y_{p1})^2 + (y_2 - y_{p2})^2 + (y_3 - y_{p3})^2$$

where y_i is the observed height and y_pi is the height predicted by the model

▪ The model with the least sum of squared residuals (Loss Function value) is the
most suitable model
▪ This is an optimization (minimization) problem
▪ The computation requirement for this is huge. For N data points and P parameters
to evaluate (here I and s), the number of computations is (N x P). More features
make the computation more expensive
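
As a rough illustration of comparing candidate models by their sum of squared residuals, the sketch below evaluates two lines on three (weight, height) pairs. The data values and the two candidate (I, s) pairs are assumptions made up for this example; the slides do not list the actual numbers.

```python
# Hypothetical data: three (weight, height) pairs, not taken from the slides.
weights = [0.5, 2.3, 2.9]
heights = [1.4, 1.9, 3.2]


def predict(intercept, slope, weight):
    """The slide's model: H = I + s * W."""
    return intercept + slope * weight


def sum_squared_residuals(intercept, slope):
    """Loss = sum over all data points of (observed - predicted)^2."""
    return sum(
        (h - predict(intercept, slope, w)) ** 2
        for w, h in zip(weights, heights)
    )


loss_a = sum_squared_residuals(0.0, 1.0)    # candidate line H = W
loss_b = sum_squared_residuals(0.95, 0.64)  # candidate line H = 0.95 + 0.64 W
print(loss_a, loss_b)  # the second candidate has the smaller loss
```

The candidate with the smaller loss is the better fit of the two; gradient descent automates the search for the (I, s) pair with the smallest loss.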
Gradient Descent – Basic Idea

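▪ Take long steps (descent) when the slope (Gradient) is high
▪ As the slope (gradient) comes close to zero, take smaller steps
▪ Helps to decrease the computational complexity

[Figure: loss L plotted against slope s and against intercept I]

$$b = a - \gamma \nabla f(a)$$

b – Next value of attribute (I or s)
a – Current value of attribute (I or s)
γ – Learning Rate
∇f(a) – Gradient of the function being minimized (here the loss L), evaluated at a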
Gradient Descent - example

…
Gradient Descent – Basic Idea…

The gradient descent update for an n-dimensional function f(x) at a given point
p_n is

$$p_{n+1} = p_n - \gamma \nabla f(p_n)$$

▪ Gradient Descent can be very sensitive to the learning rate (γ)

▪ In practice, a reasonable learning rate can be determined automatically by
starting large and getting smaller with each step
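
The sensitivity to the learning rate can be seen on a toy problem. The sketch below runs the update rule on f(x) = x², whose gradient is 2x; the function, the starting point, and the two learning rates are assumptions chosen only to show one run that converges and one that diverges.

```python
def descend(learning_rate, start=1.0, steps=20):
    """Run gradient descent on f(x) = x^2, whose gradient is 2x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * (2 * x)  # p_next = p - gamma * gradient
    return x


small = descend(0.1)  # each step multiplies x by 0.8, so x shrinks towards 0
large = descend(1.1)  # each step multiplies x by -1.2, so x grows in magnitude
print(small, large)
```

With γ = 0.1 the iterate approaches the minimum at 0, while with γ = 1.1 every step overshoots further than the last.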
Gradient Descent

Step 1: Take the derivative of the loss function with respect to each attribute (feature) in it

Step 2: Pick random initial values for the attributes (features)

Step 3: Plug the current attribute values into the derivatives (Gradient)

Step 4: Calculate the step size for each attribute: (slope x γ)

Step 5: Calculate the new value for each attribute = current value – step size

Repeat Steps 3 to 5 until the step size becomes very small ( < threshold ) or the
number of iterations becomes very large ( > limit )
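
The five steps above can be sketched for the model H = I + sW with a sum of squared residuals loss. This is a minimal sketch under stated assumptions: the three data points, the learning rate, the threshold, and the iteration limit are all hypothetical values picked for illustration.

```python
import random

# Hypothetical data: three (weight, height) pairs, not taken from the slides.
weights = [0.5, 2.3, 2.9]
heights = [1.4, 1.9, 3.2]


def gradients(intercept, slope):
    """Step 1: derivatives of the sum of squared residuals w.r.t. I and s."""
    pairs = list(zip(weights, heights))
    d_intercept = sum(-2 * (h - (intercept + slope * w)) for w, h in pairs)
    d_slope = sum(-2 * w * (h - (intercept + slope * w)) for w, h in pairs)
    return d_intercept, d_slope


def gradient_descent(learning_rate=0.01, threshold=1e-6,
                     max_iterations=100_000, seed=0):
    rng = random.Random(seed)
    # Step 2: random starting values for the attributes
    intercept, slope = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(max_iterations):
        d_i, d_s = gradients(intercept, slope)        # Step 3
        step_i = learning_rate * d_i                  # Step 4
        step_s = learning_rate * d_s
        intercept, slope = intercept - step_i, slope - step_s  # Step 5
        if max(abs(step_i), abs(step_s)) < threshold:
            break  # step size is very small, so stop
    return intercept, slope


fitted_intercept, fitted_slope = gradient_descent()
print(fitted_intercept, fitted_slope)
```

Note that every iteration touches every data point in both derivatives, which is what becomes expensive on large datasets.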
Stochastic Gradient Descent

▪ Scenario: 23,000 genes are used to predict if a person has a particular disease,
and there are 1,000,000 samples.
▪ This would need 23,000 derivatives:
$$\frac{d\,\text{Loss Function}}{d(\text{gene}_1)}, \; \ldots, \; \frac{d\,\text{Loss Function}}{d(\text{gene}_{23{,}000})}$$

▪ In each derivative, there are 1,000,000 terms

▪ 23 billion terms to be calculated in each step
▪ It is common to take 1000 steps
▪ A total of 23 trillion (23,000,000,000,000) terms

For Big Data, Gradient Descent is SLOW, which motivates Stochastic Gradient Descent
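
The term counts in this scenario are plain multiplication, which the short sketch below works through. It also shows, for comparison, the count when only one sample is used per step; the variable names are illustrative.

```python
genes = 23_000         # one derivative per gene
samples = 1_000_000    # one term per sample inside each derivative
steps = 1_000          # a typical number of gradient descent steps

terms_per_step = genes * samples
total_terms = terms_per_step * steps
one_sample_total = genes * 1 * steps  # one sample per step instead of all

print(f"{terms_per_step:,}")    # 23,000,000,000
print(f"{total_terms:,}")       # 23,000,000,000,000
print(f"{one_sample_total:,}")  # 23,000,000
```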

Stochastic Gradient Descent…
▪ Stochastic Gradient Descent randomly picks one sample for each step
▪ In this example, SGD reduces the number of terms by a factor of 3. If we had
1,000,000 samples, SGD would reduce the number of terms by a factor of 1,000,000

▪ Stochastic Gradient Descent is especially useful when there are redundancies
in the data
▪ In this example, we have 12 data points, but there is redundancy: the points
form 3 clusters
▪ SGD reduces the number of terms by a factor of 12
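
A minimal sketch of the one-sample-per-step idea, again for the model H = I + sW. The data, learning rate, step count, and starting values are assumptions for illustration. Because a fixed learning rate is used, the estimates are expected to hover near the least-squares fit rather than settle on it exactly.

```python
import random

# Hypothetical data: three (weight, height) pairs, not taken from the slides.
weights = [0.5, 2.3, 2.9]
heights = [1.4, 1.9, 3.2]


def sgd(learning_rate=0.01, steps=20_000, seed=0):
    rng = random.Random(seed)
    intercept, slope = 0.0, 1.0  # arbitrary starting values
    for _ in range(steps):
        i = rng.randrange(len(weights))  # randomly pick ONE sample
        w, h = weights[i], heights[i]
        residual = h - (intercept + slope * w)
        # Each derivative now has a single term instead of one per sample
        intercept -= learning_rate * (-2 * residual)
        slope -= learning_rate * (-2 * w * residual)
    return intercept, slope


sgd_intercept, sgd_slope = sgd()
print(sgd_intercept, sgd_slope)
```

Each step costs the same regardless of how many samples the dataset holds, which is where the reduction in terms comes from.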
Stochastic Gradient Descent…

▪ The strict definition of Stochastic Gradient Descent is to use only one sample
per step
▪ It is more common to select a small subset of the data, or mini-batch, for each
step, for better results while keeping the number of terms at a manageable level

▪ SGD can handle new data very efficiently

▪ We can just take another step for the attribute value estimates without
having to start from scratch
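
The mini-batch variant and the "just take another step" property can be sketched together. Everything specific here is an assumption for illustration: the data, the new sample, the batch size, and the choice to average the gradient over the batch rather than sum it.

```python
import random


def minibatch_step(params, batch, learning_rate):
    """One update using the gradient averaged over a mini-batch."""
    intercept, slope = params
    n = len(batch)
    d_i = sum(-2 * (h - (intercept + slope * w)) for w, h in batch) / n
    d_s = sum(-2 * w * (h - (intercept + slope * w)) for w, h in batch) / n
    return intercept - learning_rate * d_i, slope - learning_rate * d_s


def train(params, data, batch_size=2, steps=5_000, learning_rate=0.01, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # small random subset per step
        params = minibatch_step(params, batch, learning_rate)
    return params


# Hypothetical (weight, height) pairs, not taken from the slides.
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]
params = train((0.0, 1.0), data)

# New data arrives: continue from the current estimates, not from scratch.
data.append((1.8, 2.1))  # hypothetical new sample
params = train(params, data, steps=500, seed=1)
print(params)
```

The second call to `train` starts from the already fitted parameters, so only a few hundred extra steps are spent absorbing the new sample.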
Stochastic Gradient Descent

▪ Stochastic Gradient Descent is just like Gradient Descent, except it only looks at one
sample or a mini-batch for each step
▪ Gradient Descent may not be computationally feasible on Big Data. Stochastic
Gradient Descent can handle Big Data (a large number of features and a large number
of data points)
▪ Stochastic Gradient Descent can update the model efficiently when new data is available
