Module 2

The document provides an overview of regression models, particularly focusing on linear regression and logistic regression, as foundational concepts for understanding deep learning. It explains the training process of neural networks, including defining architectures, handling data, specifying loss functions, and training models using techniques like gradient descent. Additionally, it introduces the perceptron as a basic unit for decision-making in neural networks, illustrating its application in both binary classification and logical functions.

Uploaded by

f20220549

Deep Learning

CS F425
BITS Pilani Prof. Pratik Narang
Department of CSIS
Pilani Campus

Regression models
Introduction
• Before we venture into deep learning, we need to cover the basics of neural
network training.

• Here, we will understand the entire training process, including:

o Defining simple neural network architectures
o Handling data
o Specifying a loss function
o Training the model

o Classic statistical learning techniques such as linear and softmax regression can
be thought of as linear neural networks.
Linear Regression
• Regression refers to a set of methods for modeling the relationship
between one or more independent variables and a dependent variable.
o The purpose of regression is most often to characterize the relationship
between the inputs and outputs.
o Machine learning, on the other hand, is most often concerned with prediction.

o We can use regression whenever we want to predict a numerical value.

o Predicting prices (of homes, stocks, etc.)
o Predicting length of stay (for patients in the hospital)
o Demand forecasting (for retail sales)
Linear Regression
• Linear regression flows from a few simple assumptions:
• The relationship between the independent variables 𝒙 and the
dependent variable 𝑦 is linear, i.e., 𝑦 can be expressed as a weighted
sum of the elements in 𝒙, given some noise on the observations.
• We will use $n$ to denote the number of examples. We index the data
examples by $i$, denoting each input as $\mathbf{x}^{(i)}$ and the corresponding label as $y^{(i)}$.

Basic Elements of Linear Regression
• Linear Model
• Loss Function
• Analytic Solution
• Gradient Descent
• Making Predictions with the Learned Model
Linear regression: linear model
• The linearity assumption says that the target (price) can be expressed as a
weighted sum of the features (area and age):
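$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$
• $w_{\mathrm{area}}$ and $w_{\mathrm{age}}$ are called weights, and $b$ is called a bias (also called an offset or intercept).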
• The weights determine the influence of each feature on our prediction.
• The bias just says what value the predicted price should take when all of the
features take value 0.
• The equation above is an affine transformation of input features, which is
characterized by a linear transformation of features via weighted sum, combined
with a translation via the added bias.
Linear regression: linear model
• Given a dataset, our goal is to choose the weights 𝒘 and the bias 𝑏 such that, on
average, the predictions made by our model best fit the true observations.

• Models whose output prediction is determined by the affine transformation of
input features are linear models.
Linear regression: linear model
• In machine learning, we usually work with high-dimensional datasets.
• When our inputs consist of $d$ features, we express our prediction $\hat{y}$ as
$$\hat{y} = w_1 x_1 + \dots + w_d x_d + b$$
• Collecting all features into a vector $\mathbf{x} \in \mathbb{R}^d$ and all weights into a vector $\mathbf{w} \in \mathbb{R}^d$,
we can express our model using a dot product:
$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b$$
Here the vector $\mathbf{x}$ corresponds to the features of a single data example.
• We refer to the features of our entire dataset of $n$ examples via the matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$,
where $\mathbf{X}$ contains one row for every example and one column for every feature.
• For a collection of features $\mathbf{X}$, the predictions $\hat{\mathbf{y}} \in \mathbb{R}^n$ can be expressed via the
matrix-vector product:
$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{w} + b$$
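The matrix-vector form of the linear model can be sketched in a few lines of NumPy. This is only an illustration: the feature values, weights, and bias below are made-up numbers (not learned parameters), and the function name `predict` is an assumption introduced here. The scalar bias is added to every entry of the product by broadcasting.

```python
import numpy as np

# Hypothetical toy data: n = 3 houses, d = 2 features (area, age).
X = np.array([[100.0, 5.0],
              [80.0, 20.0],
              [120.0, 1.0]])
w = np.array([2.0, -0.5])  # illustrative weights, not learned values
b = 10.0                   # illustrative bias

def predict(X, w, b):
    """Return y_hat = Xw + b; the scalar b is broadcast over all n examples."""
    return X @ w + b

y_hat = predict(X, w, b)   # one prediction per row of X
```

Each entry of `y_hat` equals the dot product of the corresponding row of `X` with `w`, plus `b`, which is the single-example formula applied row by row.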
Linear regression: linear model
• Given the features of a training dataset 𝑿 and corresponding (known) labels 𝑦, the goal
of linear regression is to find the weight vector 𝒘 and the bias term 𝑏 such that, given
features of a new data example sampled from the same distribution as 𝑿, the new
example’s label will (in expectation) be predicted with the lowest error.

• We would not expect to find a real-world dataset of $n$ examples where $y^{(i)}$ exactly
equals $\mathbf{w}^\top \mathbf{x}^{(i)} + b$ for all $1 \le i \le n$.
o Thus, even when we are confident that the underlying relationship is linear, we will
incorporate a noise term to account for such errors.
• Before searching for the best parameters (or model parameters) 𝒘 and 𝑏, we need two
more things:
1. A quality measure for some given model.
2. A procedure for updating the model to improve its quality.
Linear regression: loss function
• To think about how to fit data with our model, we need to determine a measure
of fitness.
• The loss function quantifies the distance between the real and predicted value of
the target.
o The loss will be a non-negative number where smaller values are better.
o Perfect predictions incur a loss of 0.
• The most popular loss function in regression problems is the squared error:
$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
• The constant $\frac{1}{2}$ makes no real difference but is notationally convenient: it cancels out
when we take the derivative of the loss.
Linear regression: loss function
• The empirical error is only a function of the model parameters.

• Consider the example below where we plot a regression problem for
a one-dimensional case.
• Large differences between estimates $\hat{y}^{(i)}$ and observations $y^{(i)}$ lead to
even larger contributions to the loss, due to the quadratic dependence.
Linear regression: loss function
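• To measure the quality of a model on the entire dataset of $n$ examples, we
average (or equivalently, sum) the losses on the training set:

$$L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2$$

• When training the model, we want to find parameters $(\mathbf{w}^*, b^*)$ that
minimize the total loss across all training examples:

$$\mathbf{w}^*, b^* = \operatorname*{argmin}_{\mathbf{w}, b} L(\mathbf{w}, b)$$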
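As a rough sketch of how the per-example squared error and the averaged training loss might be computed, the snippet below uses a tiny made-up dataset; the names `squared_loss` and `mean_loss` are assumptions introduced for illustration.

```python
import numpy as np

def squared_loss(y_hat, y):
    """Per-example loss: 0.5 * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2

def mean_loss(X, y, w, b):
    """Average of the per-example losses over the n rows of X."""
    return squared_loss(X @ w + b, y).mean()

# Hypothetical one-dimensional data generated by y = 2x (no noise).
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

perfect = mean_loss(X, y, w=np.array([2.0]), b=0.0)  # exact fit
worse = mean_loss(X, y, w=np.array([1.0]), b=0.0)    # errors of -1, -2, -3
```

The exact fit incurs zero loss, while the poorer fit is dominated by its largest error, which reflects the quadratic dependence described above.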
Linear regression: analytic solution
• Linear regression can be solved analytically by applying a simple formula:
• Subsume the bias $b$ into the parameter $\mathbf{w}$ by appending a column consisting of all ones to the
design matrix.
• Then our prediction problem is to minimize $\|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$.
• The loss surface has just one critical point, and it corresponds to the minimum of the loss over the
entire domain (provided $\mathbf{X}^\top \mathbf{X}$ is invertible).
• Taking the derivative of the loss with respect to $\mathbf{w}$ and setting it equal to zero yields the
analytic solution:

$$\mathbf{w}^* = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}$$

• The requirement of an analytic solution is so restrictive that it would exclude all exciting
aspects of deep learning.
• Simple problems like linear regression may admit analytic solutions, but we should not
get used to such good fortune!
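The analytic solution can be tried out on synthetic data. The sketch below assumes made-up "true" parameters (2.0, -3.4 and 4.2) and a small amount of Gaussian noise, none of which come from the slides. Instead of forming the inverse explicitly, it solves the equivalent linear system $(\mathbf{X}^\top \mathbf{X})\mathbf{w} = \mathbf{X}^\top \mathbf{y}$, which is usually the numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from an assumed linear model plus small noise.
n, d = 200, 2
X = rng.normal(size=(n, d))
true_w, true_b = np.array([2.0, -3.4]), 4.2
y = X @ true_w + true_b + 0.01 * rng.normal(size=n)

# Subsume the bias into w by appending a column of ones to the design matrix.
X_aug = np.hstack([X, np.ones((n, 1))])

# Solve (X^T X) w = X^T y; the last entry of w_star plays the role of b.
w_star = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
```

With this little noise, `w_star` should land very close to the assumed parameters, with the bias recovered as its final component.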
Linear regression: gradient descent
• In cases where we cannot solve the models analytically, we can still train models
effectively in practice.
• The key technique for optimizing nearly any deep learning model is called
gradient descent.
• Gradient descent iteratively reduces the error by updating the parameters in the
direction that incrementally lowers the loss function.
[Figure: loss plotted against the value of a weight; gradient descent moves from the starting point to the point of convergence.]

Linear regression: gradient descent
• The most naïve application of gradient descent consists of taking the derivative
of the loss function, which is an average of the losses computed on every single
example in the dataset.
o This can be extremely slow: we must pass over the entire dataset before making a single update.
o Thus, we instead sample a random minibatch of examples every time we need to compute the
update. This variant is called minibatch stochastic gradient descent.
• In each iteration:
1. We first randomly sample a minibatch B consisting of a fixed number of training examples.
2. We then compute the derivative (gradient) of the average loss on the minibatch with
regard to the model parameters.
3. Finally, we multiply the gradient by a predetermined positive value η and subtract the
resulting term from the current parameter values.
Linear regression: gradient descent
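• We can express the update mathematically:
$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} l^{(i)}(\mathbf{w}, b)$$
• $\mathbf{w}$ is the weight vector, $b$ is the bias, $\eta$ is a predetermined positive
value (the learning rate), the cardinality $|\mathcal{B}|$ is the number of
examples in each minibatch (the batch size), and $\partial_{(\mathbf{w}, b)} l^{(i)}(\mathbf{w}, b)$
is the partial derivative of the loss on the $i$th example with respect to the parameters.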

• In summary:
1. Randomly initialize the values of the model parameters.
2. Iteratively sample random minibatches from the data.
3. Update the parameters in the direction of the negative gradient.
Linear regression: gradient descent
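• We can write this out explicitly as follows:

$$\mathbf{w} \leftarrow \mathbf{w} - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \partial_{\mathbf{w}} l^{(i)}(\mathbf{w}, b) = \mathbf{w} - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \mathbf{x}^{(i)}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)$$
$$b \leftarrow b - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \partial_{b} l^{(i)}(\mathbf{w}, b) = b - \frac{\eta}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)$$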

• The values of the batch size and learning rate are manually pre-
specified and not typically learned through model training.
o These parameters that are tunable but not updated in the training loop are
called hyperparameters.
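The update rules above can be sketched as a short training loop. This is a minimal illustration under stated assumptions, not the course's implementation: the function name `minibatch_sgd`, the hyperparameter values, and the synthetic data are all made up here, and minibatches are drawn by shuffling the data once per epoch, which is one common way to sample them.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.03, batch_size=10, num_epochs=50, seed=0):
    """Minibatch SGD for linear regression with the squared-error loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)  # random initialization
    b = 0.0
    for _ in range(num_epochs):
        order = rng.permutation(n)      # shuffle, then slice into minibatches
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]       # w^T x + b - y per example
            grad_w = X[idx].T @ err / len(idx)  # average of x * err
            grad_b = err.mean()                 # average of err
            w = w - lr * grad_w                 # step against the gradient
            b = b - lr * grad_b
    return w, b

# Hypothetical synthetic data with assumed parameters w = (2, -3.4), b = 4.2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.4]) + 4.2 + 0.01 * rng.normal(size=200)

w_hat, b_hat = minibatch_sgd(X, y)
```

Here `lr` and `batch_size` are the hyperparameters: they are chosen by hand before training and are never changed by the loop itself.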
Linear regression: gradient descent
• Linear regression happens to be a learning problem where there is only one minimum
over the entire domain.
• For more complicated models, like deep networks, the loss surfaces contain many
minima.
• Deep learning practitioners seldom struggle to find parameters that minimize the loss on
training sets.
• The more formidable task is to find parameters that will achieve
low loss on data that we have not seen before, a challenge called generalization.
Making Predictions with the Learned Model
• Given the learned linear regression model $\hat{\mathbf{w}}^\top \mathbf{x} + \hat{b}$, we can estimate
the price of a new house given its area $x_1$ and age $x_2$.
o Estimating targets given features is commonly called prediction or
inference.

Logistic Regression
Regression vs. Classification
• Regression estimates a continuous value
• Classification predicts a discrete category
Logistic regression
• This is just like linear regression, except that the values $y$ we want to
predict take on only a small number of discrete values.
• For now, we will focus on the binary classification problem in which $y$
can take on only two values: 0 and 1.
• For instance, if we are trying to build a spam classifier for email, then
$\mathbf{x}^{(i)}$ may be some features of a piece of email, and $y^{(i)}$ may be 1 if it is a
piece of spam mail, and 0 otherwise.
Logistic regression
• We could approach the classification problem ignoring the fact that y
is discrete-valued, and use our old linear regression algorithm to try
to predict y given x.
• However, this method performs very poorly.
• Intuitively, it also doesn’t make sense to consider $\hat{y}$ values larger than
1 or smaller than 0 when we know that $y \in \{0, 1\}$.
Logistic regression
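• Let’s change the form of our hypothesis for the prediction $\hat{y}$.
• We will choose
$$\hat{y} = \sigma(\theta^\top \mathbf{x})$$
• where $\theta$ is the parameter vector and
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
• is called the logistic function or the sigmoid function.

[Figure: the logistic/sigmoid function]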
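A small sketch of the logistic hypothesis, assuming the standard sigmoid $\sigma(z) = 1/(1 + e^{-z})$. The parameter vector `theta`, the feature vector `x`, and the 0.5 decision threshold are illustrative assumptions rather than values from the slides.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """y_hat = sigmoid(theta^T x), read as an estimate of P(y = 1 | x)."""
    return sigmoid(theta @ x)

# Hypothetical parameters and features for a single example.
theta = np.array([1.5, -2.0])
x = np.array([1.0, 0.25])

y_hat = predict_proba(theta, x)   # theta^T x = 1.0 here
label = int(y_hat >= 0.5)         # threshold at 0.5 for a hard 0/1 decision
```

Unlike the linear model, the output can never leave the interval between 0 and 1, which matches the requirement that $y \in \{0, 1\}$.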

Perceptron
Perceptron
• A perceptron takes several binary inputs, $x_1, x_2, \ldots$,
and produces a single binary output.
• How is the output computed? We introduce weights, $w_1, w_2, \ldots$,
which express the importance of each input. The output is 0 if the weighted sum
$\sum_j w_j x_j$ is at most some threshold value, and 1 if it exceeds the threshold.
• By varying the weights and the threshold, we can
get different models of decision-making.
• A perceptron helps make decisions by weighing up
evidence!
Perceptron
• A complex network of perceptrons could make quite subtle decisions!

• The first layer of perceptrons is making three very simple decisions by weighing
the input evidence.
• The perceptrons in the second layer make decisions by weighing the results from
the first layer, and so can make decisions at a more complex and abstract level.
• More complex decisions can be made by the perceptron in the third layer.
• In this way, a many-layer network of perceptrons can engage in sophisticated
decision making.
Perceptron – notations
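• Let's make the notation uniform!
• Instead of comparing the weighted sum to a threshold,
$$\text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$$
• We will use the dot product $w \cdot x = \sum_j w_j x_j$ and a bias $b \equiv -\text{threshold}$:
$$\text{output} = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$$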
Perceptron – more uses!
• Another way perceptrons can be used is to compute the elementary
logical functions, such as AND, OR, and NAND
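As an illustration, a perceptron that outputs 1 when $w \cdot x + b > 0$ (with the bias $b$ standing in for the negative of the threshold) can reproduce these gates. The weights and biases below are one possible choice among many, picked here for the sketch; the names `perceptron` and `GATES` are likewise assumptions.

```python
import numpy as np

def perceptron(x, w, b):
    """Return 1 if w.x + b > 0, otherwise 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# One possible (weights, bias) choice per gate; many others work equally well.
GATES = {
    "AND":  (np.array([1.0, 1.0]), -1.5),
    "OR":   (np.array([1.0, 1.0]), -0.5),
    "NAND": (np.array([-2.0, -2.0]), 3.0),
}

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
truth_tables = {
    name: [perceptron(np.array(x), w, b) for x in inputs]
    for name, (w, b) in GATES.items()
}
```

Each entry of `truth_tables` lists the gate's output for the inputs 00, 01, 10, 11 in that order. Only the weights and bias differ between gates; the unit itself is unchanged.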
Perceptron
• The computational universality of perceptrons does not imply that
perceptrons are merely a new type of NAND gate!

• We want to devise learning algorithms which can automatically tune
the weights and biases of a network of artificial neurons.

• Instead of explicitly laying out a circuit of NAND and other gates, our
neural networks can simply learn to solve problems, sometimes for
problems where it would be extremely difficult to directly design a
conventional circuit.
Sigmoid neurons
• How can we devise learning algorithms for a neural
network?
• Let’s say we have a network of perceptrons, and
we want to use it to learn to solve some problem.
• To see how learning might work, suppose we make
a small change in some weight (or bias) in the
network.
• What we want: this small change in weight should
cause only a small corresponding change in the
output from the network.

• This property will make learning possible. [WHY?]

Sigmoid neurons
• However… perceptrons won’t work that way.
• A small change in weights/bias of any single perceptron in the
network may cause the output of that perceptron to completely flip,
say from 0 to 1.

• To overcome this, we introduce sigmoid neurons: similar to
perceptrons, but modified so that small changes in their weights and
bias cause only a small change in their output.
Sigmoid neurons
• The sigmoid neuron has inputs $x_1, x_2, \ldots$. But instead of being just 0
or 1, these inputs can also take on any value between 0 and 1.
• The sigmoid neuron has weights for each input, $w_1, w_2, \ldots$, and an
overall bias, $b$. But the output is not 0 or 1. Instead, it's $\sigma(w \cdot x + b)$,
where $\sigma$ is called the sigmoid function, and is defined by:
$$\sigma(z) \equiv \frac{1}{1 + e^{-z}}$$
• So, the output of a sigmoid neuron with inputs $x_1, x_2, \ldots$, weights
$w_1, w_2, \ldots$, and bias $b$ is:
$$\frac{1}{1 + \exp\left(-\sum_j w_j x_j - b\right)}$$
Sigmoid function
• This shape is a smoothed-out version of a step function.
• It's the smoothness of the $\sigma$ function that is the crucial fact, not its
detailed form.
• The smoothness of $\sigma$ means that small changes $\Delta w_j$ in the weights
and $\Delta b$ in the bias will produce a small change $\Delta\text{output}$ in the
output from the neuron.
• Calculus tells us that $\Delta\text{output}$ is well approximated by
$$\Delta\text{output} \approx \sum_j \frac{\partial\,\text{output}}{\partial w_j}\Delta w_j + \frac{\partial\,\text{output}}{\partial b}\Delta b$$
• where the sum is over all the weights, $w_j$, and $\partial\,\text{output}/\partial w_j$ and $\partial\,\text{output}/\partial b$ denote partial
derivatives of the output with respect to $w_j$ and $b$, respectively.
• $\Delta\text{output}$ is a linear function of the changes $\Delta w_j$ and $\Delta b$ in the weights and bias.
• This linearity makes it easy to choose small changes in the weights and biases to achieve any
desired small change in the output.
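The linear approximation can be checked numerically for a single sigmoid neuron. The sketch below relies on the standard identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which is not stated on the slides, and every numeric value in it is a made-up illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(w, b, x):
    """Output of a sigmoid neuron: sigma(w.x + b)."""
    return sigmoid(w @ x + b)

# Hypothetical neuron and input.
w = np.array([0.5, -1.0])
b = 0.2
x = np.array([0.8, 0.3])
out = neuron_output(w, b, x)

# Partial derivatives via sigma'(z) = sigma(z) * (1 - sigma(z)):
# d(out)/d(w_j) = slope * x_j and d(out)/d(b) = slope.
slope = out * (1.0 - out)

# Small, arbitrary changes to the weights and the bias.
dw = np.array([0.01, -0.02])
db = 0.005

linear_estimate = slope * (x @ dw) + slope * db
actual_change = neuron_output(w + dw, b + db, x) - out
```

For changes this small, `linear_estimate` and `actual_change` agree closely. A perceptron has no such property: its output either stays fixed or flips outright.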
Neural networks
“Feedforward” neural networks
• We studied neural networks where the output from one layer is used
as input to the next layer.
• Such networks are called feedforward neural networks.
• This means there are no loops in the network – information is always
fed forward, never fed back.
Neural Networks – notations
• We use $w^l_{jk}$ to denote the weight for the connection from the $k$th neuron in the
$(l-1)$th layer to the $j$th neuron in the $l$th layer.
• We use a similar notation for the network's biases and activations.
Explicitly, we use $b^l_j$ for the bias of the $j$th neuron in the $l$th layer. And
we use $a^l_j$ for the activation of the $j$th neuron in the $l$th layer.
Neural Networks – notations
• So, the activation $a^l_j$ of the $j$th neuron in the $l$th layer is related to the
activations in the $(l-1)$th layer by the equation:
$$a^l_j = \sigma\left(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\right)$$
• where the sum is over all neurons $k$ in the $(l-1)$th layer.

Neural Networks – notations
• We rewrite this expression in matrix form.
• We define a weight matrix $w^l$ for each layer $l$, where the entries of
the weight matrix $w^l$ are the weights connecting to the $l$th layer of
neurons. That is, the entry in the $j$th row and $k$th column is $w^l_{jk}$.
• Similarly, for each layer $l$, we define a bias vector $b^l$, where the
components of the bias vector are just the values $b^l_j$, one component
for each neuron in the $l$th layer.
• We define an activation vector $a^l$ whose components are the
activations $a^l_j$.
• Finally:
$$a^l = \sigma(w^l a^{l-1} + b^l)$$
• Sometimes written as: $a^l = \sigma(z^l)$ and $z^l \equiv w^l a^{l-1} + b^l$, where $z^l$ is the weighted input to layer $l$ and $\sigma$ is applied elementwise.
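The matrix form translates almost directly into code. The sketch below assumes a hypothetical network with layer sizes 3, 4 and 2 and randomly drawn weights and biases; the helper name `feedforward` is also an assumption introduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Apply a^l = sigma(w^l a^(l-1) + b^l) layer by layer."""
    for w, b in zip(weights, biases):
        z = w @ a + b    # weighted input z^l
        a = sigmoid(z)   # activation a^l, applied elementwise
    return a

# Hypothetical layer sizes: 3 inputs, 4 hidden neurons, 2 outputs.
sizes = [3, 4, 2]
rng = np.random.default_rng(0)
# w^l has one row per neuron in layer l and one column per neuron in layer l-1.
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in sizes[1:]]

output = feedforward(np.array([0.1, 0.5, 0.9]), weights, biases)
```

Storing $w^l$ with rows indexed by $j$ and columns by $k$ is what lets each layer be a single matrix-vector product.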
Exercise
• Compute (write the equation) for the weighted input and activations
for each layer
Computing the activations
Neural Networks – learning
• Training/testing data!
• Each training input x is a 28×28 = 784-dimensional vector
• Each entry in the vector represents the grey value for a single pixel in
the image.
• We'll denote the corresponding desired output by y=y(x), where y is a
10-dimensional vector.
• Example: if a particular training image, $x$, depicts a 6, then
$y(x) = (0, 0, 0, 0, 0, 0, 1, 0, 0, 0)^T$ is the desired output from the network.
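A brief sketch of how an input image and its desired output might be represented. The all-zero image is a placeholder rather than real data, and the helper name `one_hot` is an assumption.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Desired output y(x): 1 at the index of the digit, 0 elsewhere."""
    y = np.zeros(num_classes)
    y[digit] = 1.0
    return y

# Placeholder 28x28 greyscale image, flattened into a 784-dimensional vector.
image = np.zeros((28, 28))
x = image.reshape(784)

y = one_hot(6)   # desired output for an image depicting a 6
```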
Neural Networks – learning
• We need an algorithm which lets us find weights and biases so that the output
from the network approximates y(x) for all training inputs x.
• To quantify how well we're achieving this goal, we define a cost function
$$C(w, b) \equiv \frac{1}{2n}\sum_x \|y(x) - a\|^2$$
• Here, $w$ denotes the collection of all weights in the network, $b$ all the biases, $n$ is
the total number of training inputs, $a$ is the vector of outputs from the network
when $x$ is input, and the sum is over all training inputs $x$.
• $C$ is called the quadratic cost function; it is also known as the mean squared error (MSE).
• The aim of our training algorithm will be to minimize the cost C(w,b) as a function
of the weights and biases.
• We can use gradient descent. It can be viewed as a way of taking small steps in
the direction which does the most to immediately decrease C.
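A minimal sketch of the quadratic cost, assuming the usual form $C = \frac{1}{2n}\sum_x \|y(x) - a\|^2$ and that the network outputs and desired outputs are stacked into arrays with one row per training input. The function name and the toy arrays are made up for illustration.

```python
import numpy as np

def quadratic_cost(outputs, targets):
    """C = 1/(2n) * sum over training inputs of ||y(x) - a||^2."""
    n = len(targets)
    return np.sum((targets - outputs) ** 2) / (2 * n)

# Two hypothetical training examples with desired digits 6 and 2.
targets = np.eye(10)[[6, 2]]

cost_perfect = quadratic_cost(targets.copy(), targets)       # outputs match
cost_zeros = quadratic_cost(np.zeros_like(targets), targets)  # all-zero outputs
```

The cost is zero only when the network's output equals the desired output for every training input, and it grows as the outputs drift away from the targets.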
Neural Networks – learning
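• We use gradient descent to find the weights $w_k$ and biases $b_l$ which
minimize the cost.
• Gradient descent update rule, in terms of $w$ and $b$, with learning rate $\eta$:
$$w_k \rightarrow w_k' = w_k - \eta\frac{\partial C}{\partial w_k}$$
$$b_l \rightarrow b_l' = b_l - \eta\frac{\partial C}{\partial b_l}$$
• By repeatedly applying this update rule we can "roll down the hill", and
hopefully find a minimum of the cost function.
• We compute derivatives for gradient descent using “backpropagation”.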
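To see the "rolling down the hill" behaviour in isolation, the sketch below applies the update rule to a toy cost with a single weight and a single bias, $C(w, b) = (w - 3)^2 + (b + 1)^2$. This cost is an assumption chosen because its gradient can be written by hand; in a real network the derivatives would come from backpropagation.

```python
def gradient_descent(grad, w, b, eta=0.1, steps=100):
    """Repeatedly apply w -> w - eta * dC/dw and b -> b - eta * dC/db."""
    for _ in range(steps):
        dC_dw, dC_db = grad(w, b)
        w, b = w - eta * dC_dw, b - eta * dC_db
    return w, b

def toy_grad(w, b):
    """Gradient of the toy cost C(w, b) = (w - 3)^2 + (b + 1)^2."""
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)

w_min, b_min = gradient_descent(toy_grad, w=0.0, b=0.0)
```

Starting from the origin, the parameters move steadily toward the minimum at $w = 3$, $b = -1$. With a learning rate that is too large the steps would overshoot instead.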
Thank you!

Content adapted from: Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015
