
Deep Learning Basics: Weights & Biases

Unit 1 introduces deep learning concepts including weights, biases, neurons, activation functions, and the training process of neural networks. It explains how these components work together to minimize loss through techniques like backpropagation and gradient descent. Real-life analogies are provided to illustrate these concepts, making them more relatable and easier to understand.

Uploaded by tejashirurkar78
Copyright: © All Rights Reserved

Unit 1: Introduction to Deep Learning
Basic Terminology in Deep Learning: Weights, Biases, Neurons, Activation Functions, Training a Neural Network, Forward Pass, Loss Functions (MSE, Cross-Entropy), Backpropagation and Gradient Descent
Weights
• Numerical parameters in a neural network.
• Represent the strength/importance of input features.
• Each input is multiplied by a weight before being passed to the neuron.

Equation:
$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$
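As a minimal sketch of the weighted-sum equation, the snippet below computes z for one neuron in plain Python. The function name `weighted_sum` and the numeric values are illustrative assumptions, not taken from the slides.

```python
def weighted_sum(inputs, weights, bias):
    """Return z = w1*x1 + w2*x2 + ... + wn*xn + b for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Illustrative values only (hypothetical, not from the slides).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.25

z = weighted_sum(x, w, b)  # 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 + 0.25
print(z)
```

A negative weight (here -1.0) lowers z as its input grows, while a larger positive weight makes its input count for more.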
Why are Weights Important?
• Decide how much influence each input has on the output.
• Adjusted during training using backpropagation + gradient descent.
• Well-tuned weights → more accurate predictions.
Real-Life Analogy of Weights
Cooking Recipe Analogy
• Inputs = ingredients (flour, sugar, salt, oil).
• Weights = how much of each ingredient you add.
• Bias = the “extra spice” you always add no matter what.
• Output = the taste of the dish.
• Training = trying, tasting, and adjusting the ingredients.
Case Study Analogy of Weights
Online Movie Recommendation
• Inputs = user’s past viewing history (action, comedy, drama, thriller).
• Weights = how much importance the system gives to each genre.
i. Action → +0.8
ii. Drama → +0.6
iii. Comedy → +0.2
iv. Thriller → -0.4 (user dislikes thrillers, so negative weight).
• Bias = general trend (e.g., festive season → push family movies).
• Output = recommended movie list.
Bias
• Bias is a trainable parameter in a neural network.
• It is added to the weighted sum of inputs before applying the activation function.
• Mathematical form:
$z = (w_1 x_1 + w_2 x_2 + \dots + w_n x_n) + b$
where $b$ = bias.
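Purpose of Bias
• Lets the neuron produce a non-zero output even when all inputs are zero, so the model can fit data better.
• Gives the model an offset that does not depend on the inputs, so the weights alone do not have to account for every shift in the data.
Example
• Without bias: the neuron's weighted sum always passes through the origin (0,0).
• With bias: the neuron can shift away from the origin and better match real-world data.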
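The effect of the bias can be seen in a small sketch with one input and one weight. The helper `linear_neuron` and the values 2.0 and 5.0 are hypothetical choices for illustration, and no activation function is applied here.

```python
def linear_neuron(x, w, b=0.0):
    """Weighted sum for a single input: z = w * x + b (no activation)."""
    return w * x + b


w = 2.0  # hypothetical weight

# Without a bias the output at x = 0 is forced to 0 (the line passes through the origin).
print(linear_neuron(0.0, w))

# With a bias the whole line is shifted, so x = 0 can map to a non-zero output.
print(linear_neuron(0.0, w, b=5.0))
```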
Case Study Analogy of Bias
House Price Prediction
• Inputs = square footage, number of rooms, location score.
• Weights = importance of each input.
• Bias = base price of the house (even an empty plot has some cost).
• Without bias → a house with 0 sq. ft. and 0 rooms would be priced at 0 (not realistic).
• With bias → model accounts for base land value, permits,
Neurons
A neuron is the basic unit of a neural network.
• It takes multiple inputs, applies weights, adds a bias, passes the result through an activation function, and produces an output.
Mathematical Representation:
$y = f(w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b)$
where:
$x_i$ = inputs
$w_i$ = weights
$b$ = bias
$f$ = activation function
$y$ = output
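A single neuron can be sketched as a weighted sum followed by an activation function f. The sigmoid used here is an assumed choice of f, and the function names and input values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, used here as one possible activation f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute y = f(w1*x1 + ... + wn*xn + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values: z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0
y = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(y)
```

Passing a different function as `activation` changes f without touching the weighted-sum step.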
Purpose of Neuron:
• Captures patterns and relationships in data.
• Multiple neurons together form layers, which combine to create deep networks.
Analogy (Real-Life Example) of Neurons
Light Bulb Analogy
• Inputs = switches connected to the bulb.
• Weights = thickness/quality of the wires (how much current passes).
• Bias = small backup battery to ensure minimum current.
• Activation function = ON/OFF threshold of the bulb (does it glow or not?).
• Output = brightness of the bulb.
Case Study Analogy of Neurons
Email Spam Detection
• Inputs = words in the email (e.g., “discount”, “lottery”, “urgent”).
• Weights = importance of each word (e.g., “lottery” → +0.9, “hello” → +0.1).
• Bias = base tendency (the system might flag a suspicious email even if only a few spam words are present).
• Activation function = decision boundary (spam or not spam).
Activation Function
• An activation function decides whether a neuron should “fire” or not.
• It introduces non-linearity into the model.
• Without activation functions, neural networks would just be linear models, no matter how many layers you stack.
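The claim about stacked layers can be checked in one dimension: two linear layers in a row equal a single linear layer with combined parameters, while a non-linearity between them breaks that equivalence. The parameter values are hypothetical, and ReLU (max(0, z)) is used as one common choice of non-linearity.

```python
def linear(x, w, b):
    return w * x + b


def relu(z):
    return max(0.0, z)


# Hypothetical parameters for two one-neuron layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    # Layer 2 applied to layer 1, with no activation in between.
    return linear(linear(x, w1, b1), w2, b2)


def collapsed(x):
    # The same mapping as ONE linear layer: w = w2*w1, b = w2*b1 + b2.
    return linear(x, w2 * w1, w2 * b1 + b2)


def with_relu(x):
    # A non-linearity between the layers; no single linear layer matches this.
    return linear(relu(linear(x, w1, b1)), w2, b2)


for x in (-2.0, 0.0, 1.5):
    print(x, two_linear_layers(x), collapsed(x), with_relu(x))
```

For every x the first two columns agree, whereas the third differs wherever the first layer's output is negative.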

Role in Neural Networks
• Maps the weighted sum of inputs into an output.
• Helps the network learn complex patterns (curves, images, language, etc.).
• Controls the range of output values (e.g., 0–1, -1 to +1).
Types of Activation Functions
1. Step Function: Outputs 0 or 1 based on threshold; basic perceptron activation.
2. Sigmoid (Logistic): Smoothly maps inputs to range (0, 1); good for probabilities.
3. Tanh (Hyperbolic Tangent): Maps inputs to (-1, 1); zero-centered activation.
4. ReLU (Rectified Linear Unit): Outputs max(0, x); widely used for hidden layers.
5. Leaky ReLU: Allows a small negative slope for x < 0 to avoid dead neurons.
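The five activation functions listed above can be sketched in a few lines of plain Python. The threshold of 0 for the step function and the slope of 0.01 for Leaky ReLU are assumed defaults, not values from the slides.

```python
import math


def step(x, threshold=0.0):
    """Outputs 1 at or above the threshold, otherwise 0 (threshold is an assumption)."""
    return 1.0 if x >= threshold else 0.0


def sigmoid(x):
    """Maps x to (0, 1). Not hardened against overflow for very negative x."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Maps x to (-1, 1), centered at zero."""
    return math.tanh(x)


def relu(x):
    """Outputs max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but keeps a small slope for x < 0 (slope value is an assumption)."""
    return x if x > 0 else slope * x


for x in (-2.0, 0.0, 2.0):
    print(x, step(x), round(sigmoid(x), 4), round(tanh(x), 4), relu(x), leaky_relu(x))
```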
Training a Neural Network
Training a neural network is the process of teaching it to learn patterns in data by adjusting its weights and biases using a training dataset. The goal is to minimize the error (loss) between the predicted output and the actual output.
Key Components of Training
1. Input Data: Features used by the network (e.g., pixel values for an image).
2. Weights & Biases: Parameters that are adjusted during training.
3. Activation Functions: Introduce non-linearity and allow the network to model complex relationships.
4. Loss Function: Measures how far the predictions are from actual outputs (e.g., MSE, Cross-Entropy).
5. Optimizer: Algorithm to adjust weights/biases (e.g., Gradient Descent).
6. Learning Rate: Step size for updating weights.
Step-by-Step Process of Training
Step 1: Forward Pass
• Inputs are fed through the network.
• Each neuron computes a weighted sum + bias.
• Activation function is applied to generate output.
• The network produces predictions.
Step 2: Compute Loss
• Compare predictions with actual outputs using a loss function.
• Examples:
1. MSE (Mean Squared Error) for regression
2. Cross-Entropy Loss for classification
Step 3: Backpropagation
• Compute gradients of loss w.r.t. weights and biases using the chain rule.
• Determines how much each weight contributed to the error.
Step 4: Weight & Bias Update
Use an optimizer (e.g., Gradient Descent) to update weights and biases:
$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}$
where $\eta$ = learning rate, $L$ = loss.
Step 5: Repeat
Repeat forward pass → loss → backpropagation → weight update for many epochs until loss is minimized.
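The five steps can be sketched end to end for the simplest possible model: one linear neuron (prediction = w*x + b) trained with MSE. The toy data, the learning rate, and the number of epochs are hypothetical choices, and the gradients are derived by hand for this one model rather than by a general backpropagation routine.

```python
# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0  # initial parameters
eta = 0.05       # learning rate (hypothetical choice)


def mse(w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mse(w, b)

for epoch in range(2000):
    n = len(xs)
    # Step 1: forward pass
    preds = [w * x + b for x in xs]
    # Steps 2-3: the loss is MSE; its gradients w.r.t. w and b follow from the chain rule
    grad_w = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / n
    # Step 4: update  w_new = w_old - eta * dL/dw  (same for b)
    w -= eta * grad_w
    b -= eta * grad_b
    # Step 5: repeat for the next epoch

final_loss = mse(w, b)
print(w, b, initial_loss, final_loss)
```

On this noise-free data the parameters approach w = 2 and b = 1, and the loss falls from its initial value toward 0.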
Easy-to-Understand Analogy of Training a Neural Network
Analogy: Learning to Bake a Cake
• Input ingredients: Flour, sugar, eggs = Input features
• Recipe instructions: Network structure & activation functions
• Taste test: Loss function measures how good the cake is
• Adjust ingredients: Backpropagation adjusts flour/sugar/eggs = updating weights
• Keep baking: Repeat until cake tastes perfect = Training for many epochs
Forward Pass
The Forward Pass is the process in a neural network where input data is passed through each layer, each neuron computes its output using weights, biases, and activation functions, and finally the network produces a predicted output.
It is the first step in training.
Step-by-Step Process of Forward Pass
1. Input Layer: Receives raw input features (e.g., pixel values, numerical data).
2. Hidden Layers Computation:
Each neuron computes the weighted sum of inputs + bias:
$z = \sum_{i=1}^{n} w_i x_i + b$
Apply the activation function to introduce non-linearity:
$a = f(z)$
3. Output Layer:
• Receives outputs from the last hidden layer
• Computes weighted sum + bias
• Applies activation (e.g., softmax for classification, linear for regression)
• Produces the final prediction
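The three stages above can be sketched for a tiny classifier with two inputs, one hidden layer of two neurons, and two output classes. The layer sizes, the parameter values, the `params` dictionary layout, and the use of ReLU in the hidden layer are all assumptions made for illustration.

```python
import math


def relu(z):
    return max(0.0, z)


def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """weights[j] holds the weights of neuron j; returns each neuron's z."""
    return [sum(w * x for w, x in zip(ws, inputs)) + b
            for ws, b in zip(weights, biases)]


def forward(x, params):
    z1 = dense(x, params["W1"], params["b1"])    # hidden layer: weighted sum + bias
    a1 = [relu(z) for z in z1]                   # hidden layer: activation a = f(z)
    z2 = dense(a1, params["W2"], params["b2"])   # output layer: weighted sum + bias
    return softmax(z2)                           # output layer: softmax for classification


# Hypothetical parameters.
params = {
    "W1": [[0.5, -0.5], [1.0, 1.0]],
    "b1": [0.0, -1.0],
    "W2": [[1.0, 0.0], [0.0, 1.0]],
    "b2": [0.0, 0.0],
}

probs = forward([2.0, 1.0], params)
print(probs)
```

For a regression output, the final `softmax` call would be dropped so that the output layer stays linear.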
Easy-to-Understand Analogy of Forward Pass
Analogy: Assembly Line Factory
1. Raw materials (input data) enter the assembly line (input layer).
2. Each worker (neuron) processes materials based on instructions (weights and biases).
3. Activation functions adjust the output at each step.
4. The product (predicted output) comes out at the end of the line (output layer).
Loss Functions (MSE, Cross-Entropy)
A loss function is a mathematical function that measures how far the predicted output of a neural network is from the actual target.
• It is also called cost function or objective function.
• The goal of training is to minimize the loss by adjusting weights and biases.
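Types of Loss Functions
1. Mean Squared Error (MSE)
• Definition: Measures the average squared difference between predicted and actual values. Commonly used for regression problems.
• Formula: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
• Easy Analogy of MSE: Guessing the weight of apples. If you guess 100g and the actual weight is 120g, the squared difference is (120 − 100)² = 400. Average the squared differences over all guesses to see the overall error.
• Range: 0 → ∞ (0 means perfect prediction)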
• Real-Life Applications:
[Link] house prices
[Link] stock prices
[Link] prediction
[Link] crop yield
[Link]-based medical measurements

• When to Use: Use MSE for regression problems where the output is continuous.
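A minimal sketch of MSE, using the apple-weight guess from the analogy (guess 100g, actual 120g). The second apple (actual 150g, guess 160g) is a made-up value added only to show the averaging.

```python
def mse(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# One guess: (120 - 100)^2 = 400
one_guess = mse([120.0], [100.0])

# Two guesses: (400 + 100) / 2 = 250 (second apple is hypothetical)
two_guesses = mse([120.0, 150.0], [100.0, 160.0])

print(one_guess, two_guesses)
```

Because errors are squared, the 20g miss contributes four times as much as the 10g miss, which is why MSE penalizes large errors heavily.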
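2. Cross-Entropy Loss (Log Loss)
• Definition: Measures the difference between the predicted probability distribution and the actual distribution. Commonly used for classification problems.
• Formula (Binary Classification): $L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$, where $y_i \in \{0, 1\}$ is the actual label and $\hat{y}_i$ is the predicted probability of class 1.
• Easy Analogy: Choosing the correct door in a game show
1. If you assign a 90% chance to a door and it turns out to be the wrong one, the penalty is high.
2. This pushes the network to assign high probability to the correct answer.
• Range: 0 → ∞ (0 means perfect prediction)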
• Real-Life Applications of Cross-Entropy Loss:
1. Image classification (cats vs dogs vs other animals)
2. Handwritten digit recognition (MNIST)
3. Spam detection (email spam/not spam)
4. Diagnosis (multi-class medical conditions)
5. Sentiment analysis (positive, negative, neutral)
• When to Use Cross-Entropy Loss: Use Cross-Entropy for classification problems where outputs are probabilities.
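A minimal sketch of the standard binary cross-entropy, showing how a confident wrong prediction is penalized far more than a confident right one. The clipping constant `eps` is an assumption added to avoid log(0); it is not part of the slides.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# The model says 90% for class 1.
confident_right = binary_cross_entropy([1], [0.9])  # true label is 1
confident_wrong = binary_cross_entropy([0], [0.9])  # true label is 0

print(confident_right, confident_wrong)
```

The wrong-but-confident case gives a loss of about 2.30, against about 0.11 for the right one, which matches the game-show intuition.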
Short Summary of Loss Functions
1. MSE → regression, continuous outputs, penalizes large errors heavily.
2. Cross-Entropy → classification, probabilistic outputs, penalizes confident wrong predictions heavily (logarithmic penalty).
3. Loss functions are essential for gradient-based learning, because the optimizer uses loss gradients to update weights.
Backpropagation
Backpropagation (short for backward propagation of errors) is the process of calculating the gradient of the loss function with respect to each weight and bias of a neural network, so that these parameters can then be updated.
• It allows the network to learn from its mistakes.
• Works in conjunction with gradient descent to minimize loss.
Step-by-Step Process of Backpropagation
1. Forward Pass: Compute the predicted output for the input data.
2. Compute Loss: Compare prediction with actual output using a loss function.
3. Backward Pass (Backpropagation):
• Compute gradient of loss w.r.t. each weight and bias using the chain rule.
• Determines how much each parameter contributed to the error.
4. Weight & Bias Update: $w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}$ (and likewise for each bias $b$), where $\eta$ = learning rate.
5. Repeat: Perform for all training examples (batch/mini-batch/epoch) until loss is minimized.
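The backward pass can be sketched for a single sigmoid neuron with a squared-error loss, where the chain rule is short enough to write out by hand. The model, the loss, and all numeric values are assumptions chosen for illustration; the result is compared with a numerical (finite-difference) gradient as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of one sigmoid neuron on one example."""
    a = sigmoid(w * x + b)
    return (a - y) ** 2


def gradients(w, b, x, y):
    # Forward pass
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: dL/dw = dL/da * da/dz * dz/dw
    dL_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz    # (dL/dw, dL/db), since dz/dw = x and dz/db = 1


# Hypothetical parameter and data values.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = gradients(w, b, x, y)

# Numerical check with central differences.
h = 1e-6
num_w = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
num_b = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)

print(grad_w, num_w)
print(grad_b, num_b)
```

In a multi-layer network the same idea is applied layer by layer, reusing each layer's dL/dz to compute the gradients of the layer before it.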
Easy-to-Understand Analogy for Backpropagation
Analogy: Learning to Shoot Arrows
1. First, you shoot an arrow (forward pass).
2. You see where it lands compared to the target (loss computation).
3. You calculate how much to adjust your aim and angle (backpropagation).
4. Adjust and shoot again (weight update).
5. Repeat until you hit the target consistently.
Case Study / Real-Life Example of Backpropagation
Handwritten Digit Recognition (MNIST Dataset):
1. Input: 28x28 pixel image of a digit.
2. Forward Pass: Compute predicted probabilities using the current weights.
3. Loss Computation: Cross-Entropy loss between predicted probabilities and actual label.
4. Backpropagation: Compute gradients of loss w.r.t. all weights and biases.
5. Weight Update: Adjust weights to reduce error.
6. Repeat: For all images over multiple epochs until high accuracy is achieved.
Summary of Backpropagation
• Backpropagation is the core learning mechanism in neural networks.
• Uses the chain rule to efficiently calculate gradients.
• Works hand-in-hand with gradient descent.
• Without backpropagation, the network cannot learn from its mistakes.
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function of a neural network by iteratively updating the weights and biases in the direction of the steepest decrease of the loss.
• It is the core method for training neural networks.
• Helps the network learn optimal parameters.
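Step-by-Step Process of Gradient Descent
1. Initialize Weights & Biases: Usually randomly.
2. Forward Pass: Compute predicted output.
3. Compute Loss: Measure difference between prediction and actual output.
4. Compute Gradients: Using backpropagation, calculate $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial b}$.
5. Update Parameters: Move in the opposite direction of the gradient:
$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}$, $b_{\text{new}} = b_{\text{old}} - \eta \, \frac{\partial L}{\partial b}$
6. Repeat: Continue over all training examples for multiple epochs until loss is minimized.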
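The update rule can be sketched on a one-parameter toy loss, L(w) = (w − 3)², whose minimum is at w = 3. The loss function, the starting point, and the learning rate are hypothetical choices; in a real network the gradient would come from backpropagation instead of a hand-written formula.

```python
def loss(w):
    return (w - 3.0) ** 2


def grad(w):
    # dL/dw for the toy loss above
    return 2.0 * (w - 3.0)


w = 0.0    # step 1: initial parameter (fixed here instead of random)
eta = 0.1  # learning rate (hypothetical choice)
history = [loss(w)]

for _ in range(100):
    w -= eta * grad(w)       # move against the gradient
    history.append(loss(w))  # track the loss after each update

print(w, history[0], history[-1])
```

Each step moves w a fraction of the way toward 3, so the recorded loss shrinks from 9 toward 0.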
Easy-to-Understand Analogy of Gradient Descent
Analogy: Hiking Down a Mountain to Find the Lowest Point
• You are on a mountain (loss function graph).
• Your goal is to reach the valley (minimum loss).
• You look at the slope (gradient) and take a step downhill (update weights).
• Repeat until you reach the bottom (minimum loss).
Variants of Gradient Descent
1. Batch Gradient Descent: Uses the entire dataset to compute gradients.
2. Stochastic Gradient Descent (SGD): Updates weights after each training sample.
3. Mini-batch Gradient Descent: Uses a small batch of samples for each update (combines benefits of batch & SGD).
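The three variants differ only in how many samples feed each update, which a single `batch_size` parameter can express. The sketch below assumes a one-neuron linear model with MSE, made-up data from y = 2x + 1, and a hypothetical `train` helper; it is meant to show the structure, not to compare speed.

```python
import random


def train(xs, ys, batch_size, eta=0.05, epochs=500, seed=0):
    """Fit prediction = w*x + b with MSE; batch_size selects the variant."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients averaged over the current batch only
            grad_w = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2.0 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= eta * grad_w
            b -= eta * grad_b
    return w, b


# Toy data from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

batch_fit = train(xs, ys, batch_size=len(xs))  # batch: one update per epoch
sgd_fit = train(xs, ys, batch_size=1)          # stochastic: one update per sample
mini_fit = train(xs, ys, batch_size=2)         # mini-batch: one update per 2 samples

print(batch_fit, sgd_fit, mini_fit)
```

On this tiny noise-free dataset all three settings end up near w = 2, b = 1; on large datasets the smaller batches give many more updates per pass over the data.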
Real-Life Applications of Gradient Descent
1. Training neural networks for image recognition
2. Predicting stock prices (regression)
3. Natural language processing tasks (text classification, sentiment analysis)
4. Recommendation systems
5. Robotics and control systems
When to Use Gradient Descent?
1. Used for training neural networks to minimize loss.
2. Choice of learning rate and batch size depends on dataset size and network complexity.
3. Stochastic or mini-batch gradient descent is preferred for large datasets due to efficiency.
Summary of Gradient Descent
• Gradient Descent is like learning by trial and error.
• Learning rate controls step size; too high → may overshoot, too low → slow convergence.
• Works hand-in-hand with Backpropagation.
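The effect of the learning rate can be sketched by running the same number of gradient descent steps with three step sizes on a toy loss L(w) = (w − 3)². The loss, the helper `run`, and the three learning rates are hypothetical choices for illustration.

```python
def run(eta, steps=20, w0=0.0):
    """Run gradient descent on L(w) = (w - 3)^2 and return the final w."""
    w = w0
    for _ in range(steps):
        w -= eta * 2.0 * (w - 3.0)  # dL/dw = 2 * (w - 3)
    return w


too_small = run(0.001)   # tiny steps: barely moves from the start
reasonable = run(0.1)    # steady progress toward the minimum at w = 3
too_large = run(1.1)     # every step overshoots further than the last

print(too_small, reasonable, too_large)
```

With the same 20 steps, the small rate is still close to the starting point, the moderate rate is close to 3, and the large rate has moved far away from the minimum.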
