Unit 1: Introduction to Deep Learning
Basic Terminology in Deep Learning: Weights, Biases, Neurons,
Activation Functions, Training a Neural Network, Forward Pass,
Loss Functions (MSE, Cross-Entropy), Backpropagation and Gradient Descent
Weights
• Numerical parameters in a neural network.
• Represent the strength/importance of input
features.
• Each input is multiplied by a weight before being
passed to the neuron.
Equation:
$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$
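The weighted sum can be sketched in a few lines of Python. This is a minimal illustration, not a full neuron: the function name `weighted_sum` and the input, weight, and bias values are assumptions chosen only for the example.

```python
def weighted_sum(inputs, weights, bias):
    """Return z = w1*x1 + w2*x2 + ... + wn*xn + b for one neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.5

z = weighted_sum(x, w, b)
print(z)  # 0.5*1 - 1.0*2 + 2.0*3 + 0.5 = 5.0
```

A larger weight (here 2.0 on the third input) makes that input dominate the sum, while a negative weight pushes the sum down.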
Why are Weights Important?
• Decide how much influence each input has on the
output.
• Adjusted during training using backpropagation +
gradient descent.
• Well-tuned weights → more accurate predictions.
Real-Life Analogy of Weights
Cooking Recipe Analogy
• Inputs = ingredients (flour, sugar, salt, oil).
• Weights = how much of each ingredient you add.
• Bias = the “extra spice” you always add no matter
what.
• Output = the taste of the dish.
• Training = trying, tasting, and adjusting the ingredient amounts.
Case Study Analogy of Weights:
Online Movie Recommendation
• Inputs = user’s past viewing history (action, comedy,
drama, thriller).
• Weights = how much importance the system gives to each
genre.
i. Action → +0.8
ii. Drama → +0.6
iii. Comedy → +0.2
iv. Thriller → -0.4 (user dislikes thrillers, so negative weight).
• Bias = general trend (e.g., festive season → push family
movies).
• Output = recommended movie list.
Bias
• Bias is a trainable parameter in a neural network.
• It is added to the weighted sum of inputs before
applying the activation function.
• Mathematical form:
$z = (w_1 x_1 + w_2 x_2 + \dots + w_n x_n) + b$
where $b$ = bias.
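Purpose of Bias
• Lets the neuron produce a non-zero output even when all inputs
are zero, so the model can fit data that does not pass through
the origin.
• Gives the network an offset that is independent of the inputs,
so it does not depend on the weights alone.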
Example
• Without bias: the neuron's weighted sum always passes through
the origin (0, 0).
• With bias: the weighted sum can shift and better match
real-world data.
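The effect of the bias at zero input can be checked directly. The sketch below assumes a single-input linear neuron with a hypothetical weight of 2.0 and a hypothetical bias of 1.5; the function name `linear_neuron` is introduced only for this example.

```python
def linear_neuron(x, w, b=0.0):
    """Single-input linear neuron: z = w * x + b."""
    return w * x + b


w = 2.0  # hypothetical weight

# With no bias, a zero input always gives a zero output,
# whatever the weight is.
no_bias = linear_neuron(0.0, w)

# With a bias, the output at zero input equals the bias itself.
with_bias = linear_neuron(0.0, w, b=1.5)  # hypothetical bias

print(no_bias, with_bias)
```

Changing `w` never changes `no_bias`, which is why the weights alone cannot model a non-zero baseline.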
Case Study Analogy of Bias
House Price Prediction
• Inputs = square footage, number of rooms, location score.
• Weights = importance of each input.
• Bias = base price of the house (even an empty plot has
some cost).
• Without bias → a house with 0 sq. ft. and 0 rooms gets a
price of 0 (not realistic).
• With bias → the model accounts for base land value, permits, etc.
Neurons
• A neuron is the basic unit of a neural network.
• It takes multiple inputs, applies weights, adds a bias, passes
the result through an activation function, and produces an
output.
Mathematical Representation:
$y = f(w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b)$
where:
$x_i$ = inputs
$w_i$ = weights
$b$ = bias
$f$ = activation function
$y$ = output
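A single neuron of this form can be sketched as below. The choice of sigmoid for $f$ and all numeric values are assumptions made for illustration; any other activation function could be passed in instead.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute y = f(w1*x1 + ... + wn*xn + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical inputs, weights, and bias.
# z = 0.5*1.0 + (-0.5)*0.0 + (-0.5) = 0.0, and sigmoid(0) = 0.5
y = neuron([1.0, 0.0], [0.5, -0.5], -0.5)
print(y)
```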
Purpose of Neuron:
• Captures patterns and relationships in data.
• Multiple neurons together form layers, which
combine to create deep networks.
Analogy (Real-Life Example) of Neurons
Light Bulb Analogy
• Inputs = switches connected to the bulb.
• Weights = thickness/quality of the wires (how much current
passes).
• Bias = small backup battery to ensure minimum current.
• Activation function = ON/OFF threshold of the bulb (does it
glow or not?).
• Output = brightness of the bulb.
Case Study Analogy of Neurons
Email Spam Detection
• Inputs = words in the email (e.g., “discount”, “lottery”,
“urgent”).
• Weights = importance of each word (e.g., “lottery” → +0.9,
“hello” → +0.1).
• Bias = base tendency (the system might flag suspicious emails
even if only a few spam words are present).
• Activation function = decision boundary (spam or not spam).
Activation Function
• An activation function decides whether a neuron should “fire”
or not.
• It introduces non-linearity into the model.
• Without activation functions, neural networks would just be
linear models, no matter how many layers you stack.
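To see why stacked layers without activation functions stay linear, consider two layers with weight matrices $W_1$, $W_2$ and bias vectors $b_1$, $b_2$ (symbols introduced here only for illustration):

```latex
\begin{aligned}
h &= W_1 x + b_1 \\
y &= W_2 h + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)
\end{aligned}
```

The result has the same form as a single layer with weights $W_2 W_1$ and bias $W_2 b_1 + b_2$, so the second layer adds no expressive power unless a non-linear function is applied in between.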
Role in Neural Networks
• Maps the weighted sum of inputs into an output.
• Helps the network learn complex patterns (curves, images,
language, etc.).
• Controls the range of output values (e.g., 0–1, -1 to +1).
Types of Activation Functions
1. Step Function: Outputs 0 or 1 based on a threshold; the basic
perceptron activation.
2. Sigmoid (Logistic): Smoothly maps inputs to the range (0, 1);
good for probabilities.
3. Tanh (Hyperbolic Tangent): Maps inputs to (-1, 1); a
zero-centered activation.
4. ReLU (Rectified Linear Unit): Outputs max(0, x); widely used for
hidden layers.
5. Leaky ReLU: Keeps a small non-zero slope for x < 0 to avoid dead
neurons.
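The five activation functions can be written out in plain Python as a rough sketch. The threshold of 0 for the step function and the slope `alpha=0.01` for Leaky ReLU are common defaults assumed here, not values given in these notes.

```python
import math


def step(x, threshold=0.0):
    """Step: 1 if x reaches the threshold, else 0."""
    return 1.0 if x >= threshold else 0.0


def sigmoid(x):
    """Sigmoid: smooth squashing into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Tanh: smooth squashing into (-1, 1), zero-centered."""
    return math.tanh(x)


def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x > 0, alpha * x otherwise (alpha is assumed)."""
    return x if x > 0 else alpha * x


# Compare the outputs on a few sample inputs.
for name, f in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh),
                ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(name, [round(f(v), 4) for v in (-2.0, 0.0, 2.0)])
```

Note that this simple `sigmoid` can overflow for very large negative inputs; it is meant for small illustrative values only.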
Training a Neural Network
Training a Neural Network is the process of teaching a neural
network to learn patterns in data by adjusting its weights and
biases using a training dataset. The goal is to minimize the error
(loss) between the predicted output and the actual output.
Key Components of Training
1. Input Data: Features used by the network (e.g., pixel values for an
image).
2. Weights & Biases: Parameters that are adjusted during training.
3. Activation Functions: Introduce non-linearity and allow the network to
model complex relationships.
4. Loss Function: Measures how far the predictions are from actual outputs
(e.g., MSE, Cross-Entropy).
5. Optimizer: Algorithm to adjust weights/biases (e.g., Gradient Descent).
6. Learning Rate: Step size for updating weights.
Step-by-Step Process of Training
Step 1: Forward Pass
• Inputs are fed through the network.
• Each neuron computes a weighted sum +
bias.
• Activation function is applied to generate
output.
• The network produces predictions.
Step 2: Compute Loss
• Compare predictions with actual outputs using a loss
function.
• Examples:
1. MSE (Mean Squared Error) for regression
2. Cross-Entropy Loss for classification
Step 3: Backpropagation
• Compute gradients of the loss w.r.t. weights and biases using
the chain rule.
• Determines how much each weight contributed to the
error.
Step 4: Weight & Bias Update
Use an optimizer (e.g., Gradient Descent) to update weights
and biases:
$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}$
where $\eta$ = learning rate, $L$ = loss
Step 5: Repeat
Repeat forward pass → loss → backpropagation → weight update
for many epochs until the loss stops decreasing.
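The five steps can be put together in a small training loop. The sketch below assumes the simplest possible network, one linear neuron trained with MSE, and toy data generated from y = 2x + 1; the data, learning rate, and epoch count are all hypothetical choices for illustration.

```python
# Toy data from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0   # initial parameters
lr = 0.05         # learning rate (eta)
epochs = 2000
n = len(xs)

for epoch in range(epochs):
    # Step 1: forward pass (one linear neuron, identity activation)
    preds = [w * x + b for x in xs]

    # Step 2: compute the MSE loss
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n

    # Step 3: backpropagation, i.e. gradients of the loss via the chain rule
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n

    # Step 4: gradient descent update
    w -= lr * grad_w
    b -= lr * grad_b
    # Step 5: the loop repeats for the next epoch

print(round(w, 3), round(b, 3))
```

With these settings the learned parameters approach w = 2 and b = 1, the values used to generate the data.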
Easy-to-Understand Analogy of Training a Neural Network
Analogy: Learning to Bake a Cake
• Input ingredients: Flour, sugar, eggs = Input features
• Recipe instructions: Network structure & activation functions
• Taste test: Loss function measures how good the cake is
• Adjust ingredients: Backpropagation adjusts flour/sugar/eggs =
updating weights
• Keep baking: Repeat until cake tastes perfect = Training for many
epochs
Forward Pass
The Forward Pass is the process in which input data is passed
through the network layer by layer. Each neuron computes its
output using weights, biases, and an activation function, and
the network finally produces a predicted output.
It is the first step of every training iteration.
Step-by-Step Process of Forward Pass
1. Input Layer: Receives raw input features (e.g., pixel values,
numerical data).
2. Hidden Layers Computation:
Each neuron computes the weighted sum of inputs + bias:
$z = \sum_{i=1}^{n} w_i x_i + b$
Apply the activation function to introduce non-linearity:
$a = f(z)$
3. Output Layer:
• Receives outputs from the last hidden layer
• Computes weighted sum + bias
• Applies activation (e.g., softmax for classification, linear for
regression)
• Produces the final prediction
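The three stages can be sketched for a tiny network with 2 inputs, 3 hidden neurons, and 2 output classes. The layer sizes, all parameter values, and the choice of ReLU for the hidden layer are assumptions made for this example; the helper names `dense` and `forward` are hypothetical.

```python
import math


def relu(z):
    return max(0.0, z)


def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Weighted sum + bias for every neuron; weights[j] belongs to neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x, params):
    W1, b1, W2, b2 = params
    hidden = [relu(z) for z in dense(x, W1, b1)]  # hidden layer: a = f(z)
    logits = dense(hidden, W2, b2)                # output layer: weighted sum
    return softmax(logits)                        # output activation


# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 2 classes.
params = (
    [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]],
    [0.0, 0.1, -0.1],
    [[0.6, -0.2, 0.3], [-0.3, 0.8, 0.5]],
    [0.05, -0.05],
)

probs = forward([1.0, 2.0], params)
print(probs)
```

For a regression network, the softmax call would be replaced by a linear (identity) output.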
Easy-to-Understand Analogy of Forward Pass
Analogy: Assembly Line Factory
1. Raw materials (input data) enter the assembly line (input layer).
2. Each worker (neuron) processes materials based on instructions
(weights and biases).
3. Activation functions adjust the output at each step.
4. The product (predicted output) comes out at the end of the line (output
layer).
Loss Functions (MSE, Cross-Entropy)
A loss function is a mathematical function that measures how far
the predicted output of a neural network is from the actual
target.
• It is also called cost function or objective function.
• The goal of training is to minimize the loss by adjusting
weights and biases.
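Types of Loss Functions
1. Mean Squared Error (MSE)
Definition: Measures the average squared difference between
predicted and actual values. Commonly used for regression
problems.
Formula:
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
where $y_i$ = actual value, $\hat{y}_i$ = predicted value, $n$ = number of samples.
• Easy Analogy of MSE: Guessing the weight of apples
If you guess 100 g and the actual weight is 120 g, the squared
difference is (100 − 120)² = 400.
Average the squared differences over all guesses to get the overall error.
• Range: 0 → ∞ (0 means perfect prediction)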
• Real-Life Applications:
[Link] house prices
[Link] stock prices
[Link] prediction
[Link] crop yield
[Link]-based medical measurements
• When to Use: Use MSE for regression problems where output
is continuous.
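MSE, the average of the squared differences between actual and predicted values, can be computed as in the sketch below. The first call reuses the apple guess from the analogy; the second list of values is hypothetical.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of (actual - predicted)^2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# One guess: actual 120 g, guessed 100 g -> (120 - 100)^2 = 400
single = mse([120.0], [100.0])

# Several guesses (hypothetical values): (400 + 25 + 0) / 3
several = mse([120.0, 150.0, 90.0], [100.0, 155.0, 90.0])

print(single, several)
```

The 20 g miss contributes 400 while the 5 g miss contributes only 25, which is what "penalizes large errors heavily" means in practice.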
2. Cross-Entropy Loss (Log Loss)
• Definition: Measures the difference between the predicted
probability distribution and the actual distribution. Commonly
used for classification problems.
• Formula (Binary Classification):
$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
where $y_i \in \{0, 1\}$ = actual label, $\hat{y}_i$ = predicted probability of class 1.
• Easy Analogy: Choosing the correct door in a game show
1. If you assign a 90% chance to a door and it turns out to be the
wrong door, the penalty is high.
2. Encourages the network to assign high probability to the correct
answer.
• Range: 0 → ∞ (0 means perfect prediction)
• Real-Life Applications of Cross-Entropy Loss:
1. Image classification (cats vs dogs vs other animals)
2. Handwritten digit recognition (MNIST)
3. Spam detection (email spam/not spam)
4. Diagnosis of multi-class medical conditions
5. Sentiment analysis (positive, negative, neutral)
• When to Use Cross-Entropy Loss: Use Cross-Entropy for
classification problems where outputs are probabilities.
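Binary cross-entropy averages -[y·log(p) + (1 − y)·log(1 − p)] over the samples, where y is the true label (0 or 1) and p is the predicted probability of class 1. The sketch below is one way to compute it; the clipping constant `eps` is an assumption added to avoid log(0), and the probabilities are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# True label is 1 in both cases (hypothetical predictions).
confident_right = binary_cross_entropy([1], [0.9])  # -ln(0.9), about 0.105
confident_wrong = binary_cross_entropy([1], [0.1])  # -ln(0.1), about 2.303

print(confident_right, confident_wrong)
```

Giving only 10% to the correct class costs roughly twenty times more than giving it 90%, which matches the game-show intuition of a high penalty for confident mistakes.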
Short Summary of Loss Functions
1. MSE → regression, continuous outputs, penalizes large errors
heavily.
2. Cross-Entropy → classification, probabilistic outputs,
penalizes wrong predictions logarithmically.
3. Loss functions are essential for gradient-based learning, because the
optimizer uses loss gradients to update weights.
Backpropagation
Backpropagation (short for backward propagation of
errors) is the process of calculating the gradient of the loss
function with respect to each weight and bias of a neural
network, so that these parameters can be updated to reduce the loss.
• It allows the network to learn from its mistakes.
• Works in conjunction with gradient descent to
minimize loss.
Step-by-Step Process of Backpropagation
1. Forward Pass: Compute the predicted output for the input
data.
2. Compute Loss: Compare the prediction with the actual output using
a loss function.
3. Backward Pass (Backpropagation):
• Compute the gradient of the loss w.r.t. each weight and bias using the
chain rule.
• Determines how much each parameter contributed to the
error.
4. Weight & Bias Update:
$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}$, $b_{\text{new}} = b_{\text{old}} - \eta \, \frac{\partial L}{\partial b}$
5. Repeat: Perform for all training examples
(batch/mini-batch/epoch) until the loss is minimized.
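The chain rule in the backward pass can be traced by hand for a single sigmoid neuron. The sketch below assumes a squared-error loss on one training example and hypothetical parameter values; it also compares the result with a finite-difference estimate as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_fn(w, b, x, y):
    """Squared error of one sigmoid neuron on one example."""
    a = sigmoid(w * x + b)
    return (a - y) ** 2


def backprop(w, b, x, y):
    """Gradients of the loss w.r.t. w and b via the chain rule."""
    z = w * x + b              # forward: weighted sum
    a = sigmoid(z)             # forward: activation
    dL_da = 2.0 * (a - y)      # dL/da for L = (a - y)^2
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dL_dz = dL_da * da_dz      # chain rule
    return dL_dz * x, dL_dz    # dz/dw = x, dz/db = 1


# Hypothetical parameters and a single training example.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backprop(w, b, x, y)

# Finite-difference check: nudge each parameter and watch the loss.
h = 1e-6
num_w = (loss_fn(w + h, b, x, y) - loss_fn(w - h, b, x, y)) / (2 * h)
num_b = (loss_fn(w, b + h, x, y) - loss_fn(w, b - h, x, y)) / (2 * h)

print(grad_w, num_w)
print(grad_b, num_b)
```

Both gradients are negative here, so a gradient descent update would increase w and b and move the prediction toward the target of 1.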
Easy-to-Understand Analogy for Backpropagation
Analogy: Learning to Shoot Arrows
1. First, you shoot an arrow (forward pass).
2. You see where it lands compared to the target (loss
computation).
3. You calculate how much to adjust your aim and angle
(backpropagation).
4. Adjust and shoot again (weight update).
5. Repeat until you hit the target consistently.
Case Study / Real-Life Example of Backpropagation
Handwritten Digit Recognition (MNIST Dataset):
1. Input: 28x28 pixel image of a digit.
2. Forward Pass: Compute predicted probabilities using the
current weights.
3. Loss Computation: Cross-Entropy loss between predicted
probabilities and actual label.
4. Backpropagation: Compute gradients of the loss w.r.t. all weights
and biases.
5. Weight Update: Adjust weights to reduce error.
6. Repeat: For all images over multiple epochs until high
accuracy is achieved.
Summary of Backpropagation
• Backpropagation is the core learning mechanism in neural
networks.
• Uses the chain rule to efficiently calculate gradients.
• Works hand-in-hand with gradient descent.
• Without backpropagation, the network cannot learn from its
mistakes.
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize
the loss function of a neural network by iteratively updating the
weights and biases in the direction of the steepest decrease of
the loss.
• It is the core method for training neural networks.
• Helps the network learn optimal parameters.
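Step-by-Step Process of Gradient Descent
1. Initialize Weights & Biases: Usually randomly.
2. Forward Pass: Compute the predicted output.
3. Compute Loss: Measure the difference between the prediction and
the actual output.
4. Compute Gradients: Using backpropagation, calculate
$\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$.
5. Update Parameters: Move in the opposite direction of
the gradient:
$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}$, $b_{\text{new}} = b_{\text{old}} - \eta \, \frac{\partial L}{\partial b}$
6. Repeat: Continue over all training examples for multiple
epochs until the loss is minimized.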
Easy-to-Understand Analogy of Gradient Descent
Analogy: Hiking Down a Mountain to Find the Lowest Point
• You are on a mountain (loss function graph).
• Your goal is to reach the valley (minimum loss).
• You look at the slope (gradient) and take a step downhill
(update weights).
• Repeat until you reach the bottom (minimum loss).
Variants of Gradient Descent
1. Batch Gradient Descent: Uses the entire dataset to compute
gradients.
2. Stochastic Gradient Descent (SGD): Updates weights after
each training sample.
3. Mini-batch Gradient Descent: Uses a small batch of samples
for each update (combines benefits of batch & SGD).
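In code, the three variants can be seen as one loop that differs only in how many samples feed each update. The sketch below assumes a linear model fitted with MSE on toy data from y = 2x + 1; the function name, learning rate, epoch count, and data are hypothetical choices for illustration.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.02, epochs=3000, seed=0):
    """Fit y ~ w*x + b with MSE; batch_size selects the variant."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of the squared error over this batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i]
                         for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i])
                         for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data from y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]

batch = gradient_descent(xs, ys, batch_size=len(xs))  # 1 update per epoch
sgd = gradient_descent(xs, ys, batch_size=1)          # 6 updates per epoch
mini = gradient_descent(xs, ys, batch_size=2)         # 3 updates per epoch

print(batch, sgd, mini)
```

On this noise-free data all three settings approach w = 2, b = 1; on real data the smaller batches give noisier but cheaper updates.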
Real-Life Applications of Gradient Descent
[Link] neural networks for image recognition
[Link] stock prices (regression)
[Link] language processing tasks (text classification,
sentiment analysis)
[Link] systems
[Link] and control systems
When to Use Gradient Descent?
1. Used for training neural networks to minimize loss.
2. Choice of learning rate and batch size depends on dataset
size and network complexity.
3. SGD or mini-batch is preferred for large datasets due to
efficiency.
Summary of Gradient Descent
• Gradient Descent is like guided trial and error: each step uses
the gradient to decide which way to adjust the parameters.
• Learning rate controls step size; too high → may overshoot,
too low → slow convergence.
• Works hand-in-hand with Backpropagation.
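The effect of the learning rate is easy to observe on a one-parameter example. The sketch below assumes the hypothetical loss L(w) = w², whose gradient is 2w and whose minimum is at w = 0; the three learning rates are chosen only to show the three behaviours.

```python
def descend(lr, steps=20, w0=1.0):
    """Run gradient descent on L(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # w_new = w_old - lr * dL/dw
    return w


too_low = descend(0.001)  # tiny steps: w has barely moved from 1.0
good = descend(0.1)       # steady progress toward the minimum at 0
too_high = descend(1.1)   # each step overshoots 0 and lands further away

print(too_low, good, too_high)
```

Each update multiplies w by (1 − 2·lr), so the run converges only when that factor has magnitude below 1; for this particular loss that means lr < 1.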