DL Micro
Q3) What is a Neuron? Explain the structure of a Biological Neuron

A neuron is the basic structural and functional unit of the nervous system. It is a specialized cell that receives information, processes it, and transmits signals to other neurons, muscles, or glands in the form of electrical and chemical impulses. Neurons are responsible for communication within the brain and throughout the body.

Structure of a Biological Neuron:

1. Cell Body (Soma)
The cell body contains the nucleus and other organelles. It controls all activities of the neuron like metabolism, growth, and signal processing. It also integrates the incoming signals.

2. Dendrites
Dendrites are branch-like structures connected to the cell body. Their main function is to receive signals (inputs) from other neurons and carry them towards the cell body.

3. Axon
The axon is a long, tube-like structure that carries nerve impulses away from the cell body to other neurons or target cells. It acts as the output pathway of the neuron.

4. Myelin Sheath
The axon is often covered by a fatty insulating layer called the myelin sheath. It increases the speed of signal transmission by insulating the axon and reducing signal loss.

5. Nodes of Ranvier
These are small gaps between myelin sheath segments on the axon. They help in faster transmission of impulses through a process called saltatory conduction.

6. Axon Terminals (Synaptic Terminals)
These are the end branches of the axon. They release neurotransmitters to communicate with the next neuron through a junction called the synapse.

Q4) Explain the Architecture of a Feed Forward Neural Network with necessary conventions

A Feed Forward Neural Network (FFNN) is the simplest type of Artificial Neural Network, in which information flows only in one direction, i.e., from the input layer to the output layer through the hidden layer(s). There is no feedback connection or loop in this network. It is mainly used for classification and prediction tasks.

Architecture of Feed Forward Neural Network:

1. Input Layer
This is the first layer of the network. It receives the input data/features from the external environment. Each neuron in the input layer represents one input feature. The input layer only passes data forward and does not perform computation.

2. Hidden Layer(s)
Hidden layers are placed between the input and output layers. These layers perform the actual processing and learning. Each neuron in a hidden layer receives weighted inputs, applies an activation function, and passes its output to the next layer. A network can have one or more hidden layers.

3. Output Layer
This is the last layer of the network. It produces the final output of the neural network. The output depends on the problem type:
• For classification: output may be class labels or probabilities
• For regression: output may be a numeric value

Necessary conventions in FFNN:
• Weights (w): Each connection between neurons has a weight. It represents the strength of the connection.
• Bias (b): Each neuron has a bias value which helps to shift the activation function.
• Net input:
  net = Σ (xi * wi) + b
• Activation function (f): Converts net input into output. Common functions are sigmoid, ReLU, and tanh.
  output = f(net)
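The FFNN conventions of Q4 (net input, bias, activation, one-way flow) can be sketched as a plain-Python forward pass. This is a minimal illustration assuming a tiny 2-3-1 network whose weights are hypothetical, hand-picked numbers, not a trained model; `dense_layer` and `feed_forward` are illustrative helper names.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    return max(0.0, z)

def dense_layer(x, weights, biases, f):
    """One fully connected layer: weights[j][i] connects input i to neuron j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        net = sum(xi * wi for xi, wi in zip(x, w_row)) + b  # net = Σ(xi * wi) + b
        outputs.append(f(net))                               # output = f(net)
    return outputs

def feed_forward(x, layers):
    """Pass x through each (weights, biases, activation) layer in order; no feedback."""
    for weights, biases, f in layers:
        x = dense_layer(x, weights, biases, f)
    return x

# Hypothetical 2-3-1 network: 2 inputs, 3 ReLU hidden neurons, 1 sigmoid output
hidden = ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1], relu)
output = ([[1.0, -1.0, 0.5]], [0.2], sigmoid)
y = feed_forward([1.0, 2.0], [hidden, output])
print(y)  # roughly [0.1545]
```

The input layer does no computation here: the input vector is simply handed to the first `dense_layer`, matching the description above.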
Q5) What is Back Propagation Learning? Explain the algorithm / weight update

Back Propagation Learning is a supervised learning algorithm used to train multi-layer feed forward neural networks. In this method, the error between the actual output and the desired output is calculated at the output layer and then propagated backward through the hidden layers to update the weights. The main aim is to minimize the total error by adjusting weights and biases.

Back Propagation Algorithm (Steps):

1. Initialize Weights and Biases
Set all weights and biases to small random values.

2. Forward Pass (Forward Propagation)
Give the input vector to the network and compute the output layer result by passing data through the hidden layers using activation functions.

3. Calculate Error
Find the difference between target output (T) and actual output (Y).
Error for one output neuron:
E = 1/2 (T − Y)²

4. Backward Pass (Error Propagation)
Compute error gradients starting from the output layer back to the hidden layers using the chain rule.

5. Weight Update
Update weights using the gradient descent method to reduce error.

Weight Update Rule:
For weight w between neurons:
w(new) = w(old) + Δw
Where,
Δw = η × δ × x
η = learning rate (small positive constant)
x = input to the neuron
δ = error term (gradient)

For Output Layer Neuron:
δ = (T − Y) × f'(net)

For Hidden Layer Neuron:
δ = f'(net) × Σ (δ(next layer) × w)

Bias Update:
b(new) = b(old) + η × δ

Q6) How are weights updated at the output layer of a Multilayer Neural Network?

In a Multilayer Neural Network, weights at the output layer are updated using the backpropagation learning rule. The main goal is to reduce the error between the target output and the actual output by adjusting the weights using gradient descent.

Steps for weight update at output layer:

1. Calculate net input to output neuron
For an output neuron \( k \):
\( net_k = \sum_j w_{jk} \, y_j + b_k \)
Where,
\( w_{jk} \) = weight from hidden neuron \( j \) to output neuron \( k \)
\( y_j \) = output of hidden neuron \( j \)
\( b_k \) = bias of output neuron \( k \)

2. Calculate actual output
\( o_k = f(net_k) \)
where \( f \) is the activation function (sigmoid/tanh/ReLU).

3. Compute error at output neuron
Error difference:
\( e_k = t_k - o_k \)
where \( t_k \) is the target output.

4. Calculate delta (error term) for output neuron
\( \delta_k = (t_k - o_k) \, f'(net_k) \)

5. Update output layer weights
Weight update rule:
\( \Delta w_{jk} = \eta \, \delta_k \, y_j \)
New weight:
\( w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk} \)

6. Update output layer bias
\( \Delta b_k = \eta \, \delta_k \)
\( b_k(\text{new}) = b_k(\text{old}) + \Delta b_k \)
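The output-layer steps of Q5 and Q6 can be traced numerically for one sigmoid output neuron. This sketch assumes a sigmoid activation (so \( f'(net) = o(1 - o) \)), a learning rate of 0.5, and made-up hidden outputs, weights, and target; `update_output_neuron` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_output_neuron(y_hidden, w, b, t, eta=0.5):
    """One backpropagation step for a single sigmoid output neuron k.

    y_hidden: outputs y_j of the hidden layer
    w: weights w_jk from each hidden neuron j to this output neuron
    b: bias b_k, t: target t_k, eta: learning rate η
    Returns the updated weights, updated bias, and the output o_k before the update.
    """
    net = sum(wj * yj for wj, yj in zip(w, y_hidden)) + b          # net_k
    o = sigmoid(net)                                                 # o_k = f(net_k)
    delta = (t - o) * o * (1.0 - o)                                  # δ_k = (t_k - o_k) f'(net_k)
    new_w = [wj + eta * delta * yj for wj, yj in zip(w, y_hidden)]  # w_jk + η δ_k y_j
    new_b = b + eta * delta                                          # b_k + η δ_k
    return new_w, new_b, o

y_hidden = [0.6, 0.9]
w, b, t = [0.4, -0.3], 0.1, 1.0
for step in range(3):
    w, b, o = update_output_neuron(y_hidden, w, b, t)
    error = 0.5 * (t - o) ** 2   # E = 1/2 (T - Y)^2 for the output seen in this step
    print(step, round(o, 4), round(error, 5))
```

Because the target is above the output and the hidden outputs are positive, every δ is positive, so each step nudges the weights and bias upward and the error shrinks.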
Q7) What is an Autoencoder? Explain Undercomplete + Sparse Autoencoders

An Autoencoder is a type of Artificial Neural Network used for unsupervised learning. It is mainly used to learn an efficient data representation (encoding) by compressing the input data into a smaller form and then reconstructing the same input at the output. The goal of an autoencoder is to minimize the reconstruction error between input and output.

Basic Structure of Autoencoder:
1. Encoder: Converts input data into a compressed representation (latent vector).
2. Bottleneck / Latent Space: The compressed hidden layer representation.
3. Decoder: Reconstructs the original input from the latent vector.

1) Undercomplete Autoencoder:
An undercomplete autoencoder has a hidden layer (latent layer) with fewer neurons than the input layer. This forces the network to compress the data and learn the most important features.
Key points:
• Hidden layer size < Input layer size
• Learns compact and meaningful representation
• Used for dimensionality reduction (like PCA) and feature extraction
Example: Input 100 features → Hidden 20 neurons → Output 100 features

2) Sparse Autoencoder:
A sparse autoencoder may have a hidden layer size equal to or greater than the input size, but it applies a sparsity constraint so that only a few neurons are active at a time. This helps the model learn useful patterns and avoid copying the input directly.
Key points:
• Uses sparsity regularization (L1 regularization or KL divergence)
• Most hidden neurons output near 0 (inactive)

Q8) Explain the Architecture / Layers of CNN and their functions

CNN (Convolutional Neural Network) is a deep learning model mainly used for image processing, pattern recognition, and computer vision tasks. CNN automatically extracts important features from input images using different layers and performs classification or detection.

Architecture / Layers of CNN and their functions:

1. Input Layer
This layer takes the input image in the form of pixels.
Example: 32×32×3 image (width × height × channels).

2. Convolution Layer
This is the main layer of CNN. It applies filters/kernels on the input image to extract features like edges, corners, textures, and patterns. The output of this layer is called a feature map.
Function: Feature extraction, with far fewer parameters than fully connected networks.

3. Activation Layer (ReLU)
After convolution, an activation function is applied, mostly ReLU (Rectified Linear Unit).
ReLU(x) = max(0, x)
Function: Adds non-linearity and improves learning speed.

4. Pooling Layer (Subsampling Layer)
Pooling reduces the size of feature maps by taking the maximum or average value in a region.
Types: Max Pooling, Average Pooling
Function: Reduces computation, helps reduce overfitting, and keeps important features.

5. Fully Connected Layer (FC Layer)
After feature extraction, the output is flattened into a vector and passed to fully connected layers.
Function: Performs final classification based on extracted features.

6. Output Layer
This is the final layer that gives the result.
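For Q7, the KL-divergence sparsity constraint can be sketched as a penalty over the average activation of each hidden unit. Assumptions: hidden activations lie in (0, 1) (e.g. sigmoid units), the target sparsity `rho = 0.05` is a hypothetical value, and the penalty would be added to the reconstruction error with some weight β that is also a free choice. The per-unit term is \( \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \).

```python
import math

def kl_sparsity_penalty(activations, rho=0.05):
    """Sum over hidden units j of KL(rho || rho_hat_j).

    activations: list of samples, each a list of hidden-unit outputs in (0, 1)
    rho: target average activation (hypothetical default)
    """
    m = len(activations)
    n_hidden = len(activations[0])
    penalty = 0.0
    for j in range(n_hidden):
        rho_hat = sum(sample[j] for sample in activations) / m   # average activation of unit j
        penalty += (rho * math.log(rho / rho_hat)
                    + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))
    return penalty

sparse = [[0.04, 0.06], [0.06, 0.04]]   # units mostly "off"
dense = [[0.5, 0.6], [0.4, 0.5]]        # units often "on"
print(kl_sparsity_penalty(sparse), kl_sparsity_penalty(dense))
```

The penalty is zero when every unit's average activation equals ρ and grows as units fire more often, which is what pushes most hidden neurons toward 0.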
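For Q8, the convolution → ReLU → pooling pipeline can be sketched on a tiny single-channel image. This is an illustration, not a library implementation: stride 1, no padding, and 2×2 non-overlapping max pooling are assumed, and the 5×5 image and 3×3 vertical-edge kernel are made up. Like CNN libraries, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Single-channel 2-D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    fmap = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(s)
        fmap.append(row)
    return fmap

def relu_map(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling (stride = size); leftover rows/columns are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

image = [[0, 0, 1, 1, 1]] * 5          # dark left, bright right: a vertical edge
edge_kernel = [[-1, 0, 1]] * 3         # hypothetical vertical-edge filter
fmap = relu_map(conv2d_valid(image, edge_kernel))
pooled = max_pool(fmap)
print(fmap)     # [[3.0, 3.0, 0.0], [3.0, 3.0, 0.0], [3.0, 3.0, 0.0]]
print(pooled)   # [[3.0]]
```

The feature map responds strongly only where the kernel straddles the edge, and pooling keeps that strong response while shrinking the map.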
Q9) Advantages of CNN over Multilayer Perceptron / Dense Network

CNN (Convolutional Neural Network) has many advantages over Multilayer Perceptron (MLP) / Dense Networks, especially for image and spatial data processing. The main advantages are:

1. Automatic Feature Extraction
CNN automatically learns important features like edges, shapes, and textures from images using convolution filters, whereas an MLP has no built-in way to detect local patterns and often relies on manually extracted features.

2. Fewer Parameters (Weight Sharing)
CNN uses shared weights (the same filter is applied across the image), so it needs fewer parameters compared to dense networks, where every neuron connects to all inputs. This reduces memory and computation.

3. Handles Spatial Information Better
CNN preserves spatial relationships between pixels (nearby pixels are related), while an MLP flattens the image into a 1D vector and loses spatial structure.

4. Translation Invariance
Due to pooling and convolution, CNN can recognize objects even if they shift slightly in the image. MLP is not good at handling shifted images.

5. Better Performance for Images
CNN gives higher accuracy in image classification, object detection, and recognition tasks because it is specially designed for image data.

6. Less Overfitting
Because CNN has fewer parameters and uses pooling/dropout, it reduces overfitting compared to MLP, especially when training on large images.

Q10) What is Normalization? Explain Batch Normalization

Normalization is a technique used in machine learning and deep learning to scale and transform data (or activations) into a standard range. It helps to improve training speed, stability, and accuracy by reducing variations in values. Normalization also helps prevent problems like slow convergence and vanishing/exploding gradients.

Batch Normalization (BN):
Batch Normalization is a normalization technique applied inside neural networks during training. It normalizes the output (activations) of a layer for each mini-batch, so that the mean becomes 0 and the variance becomes 1. This helps the network train faster and more reliably.

Working of Batch Normalization:
For a mini-batch of m activations \( x_1, \dots, x_m \):

1. Compute batch mean:
\( \mu = \frac{1}{m} \sum_{i=1}^{m} x_i \)

2. Compute batch variance:
\( \sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)^2 \)

3. Normalize:
\( \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \)
where \( \epsilon \) is a small constant to avoid division by zero.

4. Scale and shift (learnable parameters):
\( y_i = \gamma \hat{x}_i + \beta \)
where \( \gamma \) and \( \beta \) are trainable parameters.

Advantages of Batch Normalization:
1. Speeds up training and convergence.
2. Reduces internal covariate shift (stabilizes learning).
3. Allows higher learning rates.
4. Reduces vanishing/exploding gradient problems.
5. Acts like regularization and reduces overfitting.
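For Q9, the weight-sharing advantage can be checked with a quick parameter count. The layer sizes (a dense layer of 100 neurons versus a convolution layer of 16 filters of size 3×3) are hypothetical choices; the 32×32×3 input is the example size used in Q8.

```python
def dense_params(n_inputs, n_neurons):
    # every neuron has one weight per input plus one bias
    return n_inputs * n_neurons + n_neurons

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    # each filter is shared across all image positions: k_h * k_w * C weights + 1 bias
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

flat_inputs = 32 * 32 * 3                   # 3072 values after flattening the image
print(dense_params(flat_inputs, 100))       # 307300
print(conv_params(3, 3, 3, 16))             # 448
```

The dense count grows with the image size, while the convolution count depends only on kernel size, channels, and number of filters.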
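For Q10, the four Batch Normalization steps can be sketched for a single feature over one mini-batch. This assumes training-mode statistics (the running averages used at inference time are omitted) and a hypothetical ε of 1e-5.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch Normalization of one feature over a mini-batch (training-mode statistics)."""
    m = len(batch)
    mu = sum(batch) / m                                        # 1. batch mean
    var = sum((x - mu) ** 2 for x in batch) / m                # 2. batch variance (divide by m)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]   # 3. normalize
    return [gamma * xh + beta for xh in x_hat]                 # 4. scale and shift

out = batch_norm([2.0, 4.0, 6.0, 8.0])
print(out)   # mean close to 0, variance close to 1
```

With γ = 1 and β = 0 the output has zero mean and (almost exactly, because of ε) unit variance; learning other values of γ and β lets the network undo the normalization where that helps.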
Q11) Explain Recurrent Neural Network (RNN) / Working of RNN

Recurrent Neural Network (RNN) is a type of neural network specially designed to process sequential data, where the current output depends on previous information. Unlike Feed Forward Neural Networks, RNN has feedback connections, so it can store past information in the form of a hidden state (memory). RNN is mainly used for time series data, speech recognition, language translation, and text processing.

Working of RNN:

1. Sequence Input
RNN takes input data in sequence form, such as words in a sentence or values in a time series. The input at each time step is represented as \( x_t \).

2. Hidden State (Memory)
RNN maintains a hidden state \( h_t \) which stores information from previous time steps. This hidden state acts as memory.

3. Recurrence Relation
At each time step, the hidden state is updated using the current input and the previous hidden state:
\( h_t = f(W_h h_{t-1} + W_x x_t + b) \)
Where,
\( W_h \) = weight matrix for previous hidden state
\( W_x \) = weight matrix for input
\( b \) = bias
\( f \) = activation function (tanh/ReLU)

4. Output Generation
The output at time step \( t \) is calculated using the hidden state:
\( y_t = g(W_y h_t) \)
Where \( W_y \) is the weight matrix for the output and \( g \) is the activation function (softmax/sigmoid).

5. Same Weights for All Time Steps
RNN uses the same weights at each time step, which reduces parameters and helps in learning sequential patterns.

6. Advantages of RNN:
• Works well for sequential and time-dependent data

Q12) Vanishing Gradient Problem + Challenges in Gradient Descent

The Vanishing Gradient Problem and challenges in Gradient Descent are major issues in training deep neural networks. They affect learning speed, accuracy, and convergence of the model.

Vanishing Gradient Problem:
The vanishing gradient problem occurs when gradients become very small during backpropagation in deep neural networks. While updating weights, the gradient values shrink as they move backward from the output layer to earlier layers. Due to very small gradients, weight updates become almost zero, so early layers learn very slowly or stop learning.

Main reasons:
1. Use of activation functions like sigmoid and tanh, which produce small derivatives for large inputs.
2. Deep networks with many layers multiply small gradients repeatedly, making them even smaller.

Effects:
• Slow training of deep networks
• Early layers cannot learn important features
• Poor accuracy for long sequence learning in RNNs

Solutions:
• Use ReLU activation function
• Use Batch Normalization
• Use better weight initialization (Xavier/He initialization)
• Use LSTM/GRU in RNNs to handle long dependencies

Challenges in Gradient Descent:
1. Slow Convergence
If the learning rate is too small, gradient descent takes many iterations to reach the minimum.
2. Overshooting / Divergence
If the learning rate is too large, it may skip the minimum point and training becomes unstable.
3. Local Minima and Saddle Points
Gradient descent may get stuck in local minima or at saddle points, where the gradient is close to zero.
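For Q11, the recurrence \( h_t = f(W_h h_{t-1} + W_x x_t + b) \) and output \( y_t = g(W_y h_t) \) can be sketched with plain lists. This assumes tanh for \( f \), sigmoid for \( g \), 1 input feature, 2 hidden units, 1 output, and hypothetical weight values; `matvec` and `rnn_forward` are illustrative helper names.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, W_x, W_h, W_y, b, h0):
    """Run a simple RNN over a sequence, reusing the same weights at every time step.

    h_t = tanh(W_h h_{t-1} + W_x x_t + b),  y_t = sigmoid(W_y h_t)
    """
    h = h0
    hs, ys = [], []
    for x_t in xs:
        pre = [a + c + bi for a, c, bi in zip(matvec(W_h, h), matvec(W_x, x_t), b)]
        h = [math.tanh(p) for p in pre]                            # new hidden state (memory)
        y = [1.0 / (1.0 + math.exp(-z)) for z in matvec(W_y, h)]   # output at step t
        hs.append(h)
        ys.append(y)
    return hs, ys

W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
W_y = [[1.0, -1.0]]
b = [0.0, 0.0]
hs, ys = rnn_forward([[1.0], [0.0], [0.0]], W_x, W_h, W_y, b, h0=[0.0, 0.0])
print(hs[1])   # nonzero even though x_2 = 0: information carried from step 1
```

The second and third inputs are zero, yet the hidden state stays nonzero, which is the "memory" behaviour described in step 2.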
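For Q12, the "multiplying small gradients" reason can be made concrete. The sigmoid derivative is at most 0.25, so backpropagating through n sigmoid layers multiplies the gradient by at most \( 0.25^n \) (for unit weights). This sketch assumes one neuron per layer, the same weight and net input in every layer, and is only meant to show the trend.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # largest value is 0.25, at z = 0

def gradient_scale(n_layers, net=0.0, weight=1.0):
    """Chain-rule factor after passing back through n_layers sigmoid layers,
    assuming one neuron per layer with identical weight and net input."""
    factor = 1.0
    for _ in range(n_layers):
        factor *= weight * sigmoid_derivative(net)
    return factor

for n in (1, 5, 10, 20):
    print(n, gradient_scale(n))
```

Even in this best case the factor drops below 1e-12 after 20 layers, and saturated neurons (large |net|) make it shrink faster. A ReLU unit with positive input has derivative 1, which is why the notes list ReLU as a solution.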
Q15) Explain the working of a Deep Learning model

Deep Learning is a subset of Machine Learning that uses multi-layer Artificial Neural Networks to learn complex patterns from large datasets. A deep learning model works by passing input data through multiple layers, extracting features automatically, and producing output such as a classification or prediction.

Working of a Deep Learning model:

1. Input Layer
The model takes input data such as image pixels, text features, or numerical values. This input is given to the first layer of the neural network.

2. Forward Propagation
The input data moves forward through the hidden layers. Each neuron calculates a weighted sum of inputs and adds a bias:
net = Σ(wi × xi) + b
Then an activation function (ReLU, sigmoid, tanh) is applied to generate the output for the next layer.

3. Feature Learning in Hidden Layers
Hidden layers automatically learn features from data.
• Early layers learn simple features (edges, shapes)
• Deeper layers learn complex features (objects, patterns)

Q16) State and explain key differences between Machine Learning and Deep Learning

Machine Learning (ML) and Deep Learning (DL) are both used to build intelligent systems, but Deep Learning is a subset of Machine Learning that uses deep neural networks with many layers. The key differences are as follows:

1. Definition
Machine Learning is a method where machines learn from data using algorithms like regression, decision tree, SVM, etc.
Deep Learning is a type of ML that uses multi-layer neural networks to learn complex patterns automatically.

2. Feature Extraction
In ML, features are mostly extracted manually by humans (feature engineering is required).
In DL, features are automatically learned from raw data using hidden layers.

3. Data Requirement
ML works well even with small to medium datasets.
DL requires a large amount of data for better performance.

4. Computational Power
ML can work on normal computers with less computation.
DL requires high computation power like GPUs/TPUs due to complex neural networks.
Q17) Explain the architecture of Multilayer Perceptron (MLP)

Multilayer Perceptron (MLP) is a type of Artificial Neural Network (ANN) that consists of multiple layers of neurons. It is a feed-forward neural network, meaning data flows only in one direction, from input to output. MLP is widely used for classification and regression problems.

Architecture of MLP:

1. Input Layer
The input layer receives the input features from the dataset. Each neuron in this layer represents one input attribute. The input layer only forwards the input values to the next layer and does not perform computation.

2. Hidden Layer(s)
Between the input and output layers, one or more hidden layers are present. These layers perform the actual processing and learning.
Each neuron in a hidden layer calculates the weighted sum of inputs and adds a bias:
net = Σ(wi × xi) + b
Then an activation function (ReLU, sigmoid, tanh) is applied to generate the output.
Hidden layers help the MLP learn complex patterns and non-linear relationships.

3. Output Layer
The output layer produces the final result of the network.
• For binary classification: sigmoid activation is used
• For multi-class classification: softmax activation is used
• For regression: linear activation is used

Connections in MLP:
• Each neuron in one layer is fully connected to all neurons in the next layer.
• Each connection has a weight, and each neuron has a bias.

Training of MLP:
MLP is trained using the backpropagation algorithm. The network output is compared with the target output, the error is calculated, and weights are updated to reduce the error.

Q18) Design a neuron for AND and OR operation

A neuron (Perceptron) can be designed to implement basic logic gates like AND and OR by choosing suitable weights and bias. The neuron uses a weighted sum and a step activation function.

Neuron model:
net = w1·x1 + w2·x2 + b
Output y = 1 if net ≥ 0, else y = 0

1) Neuron design for AND operation
Truth table (AND):
(0,0)→0
(0,1)→0
(1,0)→0
(1,1)→1
Choose: w1 = 1, w2 = 1, b = −1.5
Check:
x1=0,x2=0 → net = 0+0−1.5 = −1.5 → y=0
x1=0,x2=1 → net = 0+1−1.5 = −0.5 → y=0
x1=1,x2=0 → net = 1+0−1.5 = −0.5 → y=0
x1=1,x2=1 → net = 1+1−1.5 = 0.5 → y=1
So, this neuron performs AND correctly.

2) Neuron design for OR operation
Truth table (OR):
(0,0)→0
(0,1)→1
(1,0)→1
(1,1)→1
Choose: w1 = 1, w2 = 1, b = −0.5
Check:
x1=0,x2=0 → net = 0+0−0.5 = −0.5 → y=0
x1=0,x2=1 → net = 0+1−0.5 = 0.5 → y=1
x1=1,x2=0 → net = 1+0−0.5 = 0.5 → y=1
x1=1,x2=1 → net = 1+1−0.5 = 1.5 → y=1
So, this neuron performs OR correctly.
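The Q18 designs can be verified by running the step neuron over the full truth table. The weights and biases below are exactly the ones chosen in the notes; `step_neuron` is an illustrative helper name.

```python
def step_neuron(x1, x2, w1, w2, b):
    net = w1 * x1 + w2 * x2 + b        # net = w1·x1 + w2·x2 + b
    return 1 if net >= 0 else 0        # step activation

AND = dict(w1=1, w2=1, b=-1.5)
OR = dict(w1=1, w2=1, b=-0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", step_neuron(x1, x2, **AND), "OR:", step_neuron(x1, x2, **OR))
```

Only the bias differs between the two gates: it sets how many active inputs are needed before the net input reaches 0.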
Q19) Discuss the implementation of AND gate using McCulloch Pitts neuron model

McCulloch Pitts (M-P) neuron is the earliest mathematical model of an artificial neuron. It works as a binary threshold unit where inputs and output are only 0 or 1. It produces output 1 only when the weighted sum of inputs reaches a fixed threshold value.

M-P Neuron Model:
For inputs \( x_1, x_2 \) and weights \( w_1, w_2 \):
Net input = \( w_1 x_1 + w_2 x_2 \)
Output \( y = 1 \) if Net input ≥ Threshold (θ), otherwise \( y = 0 \)

Implementation of AND gate using M-P neuron:
Truth table of AND gate:
(0,0) → 0
(0,1) → 0
(1,0) → 0
(1,1) → 1

Choose weights and threshold:
Let \( w_1 = 1 \), \( w_2 = 1 \)
Threshold θ = 2
So, Net input = \( x_1 + x_2 \)

Output rule:
If \( x_1 + x_2 \ge 2 \) → \( y = 1 \)
Else → \( y = 0 \)

Working:
1. x1=0, x2=0 → Net = 0 → y = 0
2. x1=0, x2=1 → Net = 1 → y = 0
3. x1=1, x2=0 → Net = 1 → y = 0
4. x1=1, x2=1 → Net = 2 → y = 1

Q20) Explain XOR implementation using NAND, OR and AND in neural networks

XOR (Exclusive OR) is a logic operation in which the output is 1 only when the inputs are different. XOR cannot be implemented using a single perceptron because it is not linearly separable. Therefore, XOR is implemented using a multi-layer neural network (2-layer perceptron) by combining NAND, OR, and AND gates.

XOR Truth Table:
(0,0) → 0
(0,1) → 1
(1,0) → 1
(1,1) → 0

XOR using NAND, OR and AND:
The XOR function can be expressed as:
XOR = (A OR B) AND (A NAND B)

Neural Network Implementation (2-layer):

Hidden Layer (Layer 1):
1. Neuron 1 performs OR operation:
H1 = A OR B
2. Neuron 2 performs NAND operation:
H2 = A NAND B

Output Layer (Layer 2):
3. Neuron 3 performs AND operation on H1 and H2:
Y = H1 AND H2

So the final output becomes:
Y = (A OR B) AND (A NAND B)

Working with inputs:
1. A=0, B=0: OR = 0, NAND = 1 → AND(0,1) = 0
2. A=0, B=1: OR = 1, NAND = 1 → AND(1,1) = 1
3. A=1, B=0: OR = 1, NAND = 1 → AND(1,1) = 1
4. A=1, B=1: OR = 1, NAND = 0 → AND(1,0) = 0
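The Q19 AND neuron and the Q20 XOR network can both be sketched with one threshold unit. The AND weights and θ = 2 come from Q19; the OR threshold (θ = 1) and the NAND unit (weights −1, −1 with θ = −1) are illustrative choices not given in the notes. Negative weights go beyond the classic M-P model, which handles inhibition through separate inhibitory inputs, so the NAND unit here is a generalized threshold unit.

```python
def mp_neuron(inputs, weights, theta):
    """Threshold unit: output 1 if the weighted sum reaches theta, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

def AND(a, b):
    return mp_neuron((a, b), (1, 1), theta=2)     # Q19: w1 = w2 = 1, θ = 2

def OR(a, b):
    return mp_neuron((a, b), (1, 1), theta=1)     # fires if at least one input is 1

def NAND(a, b):
    return mp_neuron((a, b), (-1, -1), theta=-1)  # fires unless both inputs are 1

def XOR(a, b):
    h1 = OR(a, b)        # hidden neuron 1
    h2 = NAND(a, b)      # hidden neuron 2
    return AND(h1, h2)   # output neuron 3

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "XOR:", XOR(a, b))
```

The XOR function is never computed by a single unit: it only appears once the two hidden outputs are combined by the output neuron, mirroring the 2-layer design above.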
Q21) Describe the architecture of CNN

CNN (Convolutional Neural Network) is a deep learning architecture mainly used for image processing, pattern recognition, and computer vision applications. CNN is designed to automatically learn features from input images by using convolution and pooling operations, and then perform classification using fully connected layers.

Architecture of CNN:

1. Input Layer
This layer takes the input image as pixel values. The image is represented in the form of width × height × channels (example: 64×64×3 for an RGB image).

2. Convolution Layer
This is the most important layer of CNN. It applies filters (kernels) on the input image to extract features such as edges, corners, and textures. The output of convolution is called a feature map.

3. Activation Layer (ReLU)
After convolution, an activation function like ReLU is applied to introduce non-linearity. It helps the network learn complex patterns and improves training speed.

4. Pooling Layer
Pooling reduces the size of feature maps by selecting maximum or average values from small regions. It reduces computation and helps in preventing overfitting. Common types are Max Pooling and Average Pooling.

Q22) Max Pooling vs Average Pooling: Pros and Cons

Pooling is an important layer in CNN used to reduce the size of feature maps and decrease computation. The two common pooling techniques are Max Pooling and Average Pooling. Both perform down-sampling but in different ways.

Max Pooling:
Max pooling selects the maximum value from a pooling window (example 2×2). It keeps the strongest feature present in that region.

Pros of Max Pooling:
1. Keeps the most important and strongest features like edges and textures.
2. Works well for image classification tasks.
3. Provides better translation invariance (a small shift in the image does not affect the output much).
4. Reduces computation and overfitting effectively.

Cons of Max Pooling:
1. It may lose some useful information because only the maximum value is kept.
2. Sensitive to noise, because a noisy high value may be selected.

Average Pooling:
Average pooling takes the average of all values in a pooling window. It keeps the overall information of that region.
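For Q22, the difference between the two pooling types shows up clearly on one small feature map. This sketch assumes 2×2 windows with stride 2; the feature map values are made up, with a single large value (9) that could be a strong edge response or a noisy spike.

```python
def pool2x2(fmap, mode="max"):
    """Non-overlapping 2x2 pooling with stride 2 on a list-of-lists feature map."""
    out = []
    for r in range(0, len(fmap) - 1, 2):
        row = []
        for c in range(0, len(fmap[0]) - 1, 2):
            window = [fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1]]
            row.append(max(window) if mode == "max" else sum(window) / 4)
        out.append(row)
    return out

fmap = [
    [1, 0, 2, 2],
    [0, 9, 2, 2],   # the 9 may be a strong feature or a noisy spike
    [1, 1, 0, 0],
    [1, 1, 0, 4],
]
print(pool2x2(fmap, "max"))   # [[9, 2], [1, 4]]
print(pool2x2(fmap, "avg"))   # [[2.5, 2.0], [1.0, 1.0]]
```

Max pooling passes the 9 through unchanged (good if it is a real feature, bad if it is noise), while average pooling dilutes it with its neighbours and keeps the overall level of each region.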
Q23) Explain different optimization techniques with advantages and disadvantages

Optimization techniques are used in Machine Learning and Deep Learning to minimize the loss function and update model weights efficiently. Different optimizers improve training speed, stability, and accuracy.

1) Gradient Descent (Batch Gradient Descent)
In this method, weights are updated using the complete dataset.
Advantages: Stable convergence and accurate gradient direction.
Disadvantages: Slow for large datasets and high memory requirement.

2) Stochastic Gradient Descent (SGD)
SGD updates weights using one training sample at a time.
Advantages: Faster updates and works well for large datasets.
Disadvantages: Noisy updates, may oscillate and not converge smoothly.

3) Mini-Batch Gradient Descent
It updates weights using a small batch of data (like 32 or 64 samples).
Advantages: Faster than batch GD and more stable than SGD.
Disadvantages: Requires proper batch size selection for best performance.

Q24) What is Cross Entropy Loss?

Cross Entropy Loss is a widely used loss function in Machine Learning and Deep Learning for classification problems. It measures how much the predicted probability distribution differs from the actual (true) class distribution. The goal is to minimize this loss so that the predicted probabilities become closer to the correct labels.

In classification, the model outputs probabilities for each class using activation functions like Sigmoid (binary) or Softmax (multi-class). Cross entropy loss gives a higher penalty when the model predicts a wrong class with high confidence.

Binary Cross Entropy (for 2 classes):
For target \( y \) (0 or 1) and predicted probability \( \hat{y} \):
\( \text{Loss} = -\left[ y \log(\hat{y}) + (1-y) \log(1-\hat{y}) \right] \)

Categorical Cross Entropy (for multi-class):
\( \text{Loss} = -\sum_i y_i \log(\hat{y}_i) \)
where \( y_i \) is the actual class label (one-hot) and \( \hat{y}_i \) is the predicted probability.

Advantages of Cross Entropy Loss:
1. Works well for classification tasks.
2. Provides faster and stable learning with probability outputs.
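For Q23, one function can cover all three variants, since they differ only in how many samples feed each update. This is a minimal sketch assuming a one-feature linear model \( y \approx w x + b \) with squared-error loss, noise-free toy data generated from w = 2, b = 1, a learning rate of 0.1, and no shuffling (samples are visited in a fixed order to keep it deterministic).

```python
def train_linear(xs, ys, batch_size, lr=0.1, epochs=500):
    """Fit y ≈ w*x + b by minimizing mean squared error.

    batch_size = len(xs) -> batch gradient descent
    batch_size = 1       -> stochastic gradient descent (SGD)
    in between           -> mini-batch gradient descent
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            bx, by = xs[start:start + batch_size], ys[start:start + batch_size]
            m = len(bx)
            # gradients of (1/m) Σ (w*x + b - y)^2 over this batch
            grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(bx, by)) / m
            grad_b = sum(2 * (w * x + b - y) for x, y in zip(bx, by)) / m
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 4 for i in range(8)]      # 0.0, 0.25, ..., 1.75
ys = [2 * x + 1 for x in xs]        # true w = 2, b = 1
for bs in (len(xs), 4, 1):          # batch GD, mini-batch, SGD
    print(bs, train_linear(xs, ys, batch_size=bs))
```

Batch GD makes one update per epoch, SGD makes eight, and the mini-batch version makes two; on real, noisy data the single-sample updates would also wander around the minimum, which is the oscillation listed as SGD's disadvantage.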
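For Q24, both cross entropy formulas can be sketched directly. The small clipping constant `eps` (to avoid log(0)) and the example probabilities and logits are assumptions added for illustration; `softmax` subtracts the largest logit first, a standard trick for numerical stability.

```python
import math

def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = min(max(y_hat, eps), 1 - eps)          # clip to avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def softmax(logits):
    m = max(logits)                                # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(y_onehot, y_hat, eps=1e-12):
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, y_hat))

print(binary_cross_entropy(1, 0.9))   # confident and correct: small loss (about 0.105)
print(binary_cross_entropy(1, 0.1))   # confident and wrong: large loss (about 2.303)
probs = softmax([2.0, 1.0, 0.1])
print(categorical_cross_entropy([1, 0, 0], probs))
```

With a one-hot target, the categorical loss reduces to \( -\log \) of the probability assigned to the true class, so only that class's prediction matters.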
Q25) Define and explain: Cross Entropy Loss, Local Gradient, Backpropagated Gradient

In deep learning, training a neural network requires calculating the error (loss) and updating weights using gradients. Cross Entropy Loss and gradients (local + backpropagated) are important concepts in backpropagation.

1) Cross Entropy Loss
Cross Entropy Loss is a loss function mainly used for classification problems. It measures the difference between the actual class label and the predicted probability output. Lower cross entropy means better prediction.
For multi-class classification:
\( \text{Loss} = -\sum_i y_i \log(\hat{y}_i) \)
where \( y_i \) is the actual label (one-hot) and \( \hat{y}_i \) is the predicted probability.
It gives a high penalty if the model predicts the wrong class with high confidence.

2) Local Gradient
Local gradient is the gradient calculated at a particular neuron or layer using its own activation function. It shows how the output of that neuron changes with respect to its input (net value).
Example: If the activation is sigmoid, the local gradient is:
\( f'(net) = f(net)\,(1 - f(net)) \)
The local gradient is used in backpropagation to compute the error term (delta) at that layer.

3) Backpropagated Gradient
Backpropagated gradient is the gradient that is passed backward from the output layer to the hidden layers during backpropagation. It represents how much a hidden layer neuron contributes to the final error.
For a hidden layer neuron:
\( \delta = f'(net) \times \sum (\delta_{next} \cdot w) \)
It helps to update the weights of earlier layers by using error information from later layers.

Q26) What is Overfitting and Underfitting? How to resolve them in neural networks?

Overfitting and underfitting are common problems in training neural networks. They affect the performance of the model on new (unseen) data.

Overfitting:
Overfitting occurs when a neural network learns the training data too well, including noise and unnecessary details. As a result, it gives very high accuracy on training data but poor accuracy on testing/validation data.
Main reason: the model is too complex or trained too long on limited data.

How to resolve overfitting:
1. Use more training data or data augmentation.
2. Apply regularization techniques like L1/L2 regularization.
3. Use a Dropout layer to randomly deactivate neurons during training.
4. Use Early stopping to stop training when validation loss increases.
5. Reduce model complexity (fewer layers/neurons).
6. Use Batch Normalization to stabilize training.

Underfitting:
Underfitting occurs when the model is too simple and cannot learn the patterns of the training data properly. It gives low accuracy on both training and testing data.
Main reason: insufficient training or low model capacity.

How to resolve underfitting:
1. Increase model complexity (more layers/neurons).
2. Train for more epochs (increase training time).
3. Use better feature extraction and preprocessing.
4. Reduce regularization if it is too strong.
5. Use advanced models like CNN/RNN for complex data.
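For Q25, the sigmoid local gradient can be checked against a numerical (central-difference) derivative, and the backpropagated delta formula can be evaluated for one hidden neuron. The step size `h`, the sample net values, and the downstream deltas and weights are hypothetical numbers chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_gradient_sigmoid(net):
    s = sigmoid(net)
    return s * (1.0 - s)                       # f'(net) = f(net)(1 - f(net))

def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)     # central difference

def hidden_delta(net_j, deltas_next, weights_to_next):
    """Backpropagated error term: δ_j = f'(net_j) * Σ_k δ_k * w_jk."""
    return local_gradient_sigmoid(net_j) * sum(d * w for d, w in zip(deltas_next, weights_to_next))

for net in (-2.0, 0.0, 1.5):
    print(net, local_gradient_sigmoid(net), numerical_derivative(sigmoid, net))
print(hidden_delta(0.0, deltas_next=[0.2, -0.1], weights_to_next=[0.5, 0.3]))   # 0.25 * 0.07
```

The local gradient depends only on the neuron's own activation, while the summed term carries the error information arriving from the next layer; their product is the hidden neuron's delta.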
Q27) Discuss techniques for handling overfitting issues during deep learning

Overfitting is a common problem in deep learning where the model performs very well on training data but gives poor performance on new or test data. This happens because the model learns noise and unnecessary patterns from the training data. To handle overfitting, the following techniques are used:

1. Increase Training Data / Data Augmentation
Using more data improves generalization. Data augmentation creates new training samples by rotation, flipping, cropping, etc., especially for images.

2. Dropout Technique
Dropout randomly disables some neurons during training, so the network does not depend on specific neurons. This reduces overfitting and improves generalization.

3. Regularization (L1 and L2)
Regularization adds a penalty on large weights to the loss function.
• L1 makes weights sparse
• L2 keeps weight values small and discourages overly complex models

4. Early Stopping
Training is stopped when validation loss starts increasing, even if training loss is still decreasing. This prevents the model from over-learning the training data.

5. Batch Normalization
Batch normalization stabilizes learning by normalizing layer outputs. It also acts as a regularizer and reduces overfitting.

6. Reduce Model Complexity
Using fewer layers, fewer neurons, or pruning unnecessary parameters helps avoid overfitting.

7. Cross Validation
Using k-fold cross validation helps in selecting a model that performs well on different data samples.

Q28) Explain the concept of the Layer by Layer Pretraining mechanism

Layer by Layer Pretraining is a training technique used in deep neural networks where the network is trained one layer at a time instead of training all layers together from the beginning. This method was mainly used in early deep learning models to improve training performance and reduce problems like vanishing gradients.

Concept of Layer by Layer Pretraining:
1. First, the first hidden layer is trained using input data in an unsupervised way (commonly using Autoencoders or Restricted Boltzmann Machines). This layer learns basic features from the data.
2. After training the first layer, its weights are fixed and the output of this layer is used as input for the next hidden layer.
3. Then, the second hidden layer is trained similarly to learn higher-level features.
4. This process is repeated for all hidden layers, so each layer learns features step-by-step from lower level to higher level.
5. After pretraining all layers, the complete network is fine-tuned using supervised learning (backpropagation) with labeled data to improve final accuracy.

Advantages of Layer by Layer Pretraining:
1. Helps in better weight initialization and faster convergence.
2. Reduces the vanishing gradient problem in deep networks.
3. Improves learning when labeled data is limited.
4. Helps deep networks learn meaningful features gradually.
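For Q27, the early stopping rule can be sketched as a scan over per-epoch validation losses. The `patience` parameter (how many epochs without improvement to tolerate before stopping) is a common convention assumed here, not something stated in the notes, and the validation curve is made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs; the weights saved at best_epoch would be kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical curve: improves, then rises as the model starts to overfit
val = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]
print(early_stopping_epoch(val))   # (5, 3)
```

A patience above 1 avoids stopping on a single noisy uptick in validation loss; the model from the best epoch is the one kept.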
Q29) State some applications of Deep Learning

Deep Learning is widely used in many real-world fields because it can learn complex patterns from large datasets. Some important applications of Deep Learning are:

1. Image Recognition and Classification
Used in face recognition, object detection, medical image analysis, and CCTV surveillance.

2. Natural Language Processing (NLP)
Used in language translation, chatbots, sentiment analysis, and text summarization.

3. Speech Recognition
Used in voice assistants like Siri, Alexa, Google Assistant and speech-to-text systems.

4. Self-Driving Cars
Used for lane detection, traffic sign recognition, obstacle detection, and decision making.

5. Healthcare
Used for disease prediction, cancer detection from scans, and personalized treatment.

6. Recommendation Systems
Used by YouTube, Netflix, and Amazon for recommending videos, movies, and products.

7. Fraud Detection and Cyber Security
Used to detect unusual patterns in transactions and identify cyber attacks.

Q30) State some applications of Machine Learning and Deep Learning

Machine Learning (ML) and Deep Learning (DL) are widely used in many real-world applications to make systems intelligent, automated, and accurate. Some important applications are:

Applications of Machine Learning:

1. Spam Email Detection
ML classifies emails as spam or non-spam based on patterns.

2. Recommendation Systems
Used in e-commerce for recommending products based on user interest.

3. Fraud Detection
Used in banking to detect suspicious transactions and fraud activities.

4. Medical Diagnosis
ML helps in predicting diseases using patient data and reports.

5. Stock Market and Sales Prediction
ML is used for forecasting market trends and business sales.

Applications of Deep Learning:

1. Image Recognition
Used for face recognition, object detection, and medical image analysis.

2. Speech Recognition
Used in voice assistants like Alexa, Siri, and speech-to-text systems.

3. Natural Language Processing (NLP)
Used in chatbots, translation, sentiment analysis, and text summarization.

4. Self-Driving Cars
Used for detecting lanes, vehicles, pedestrians, and decision making.

5. Video and Content Recommendation
Used by YouTube and Netflix for personalized recommendations.
Q31) Explain Face Recognition using a Deep Learning approach

Face recognition using Deep Learning is a technique where a neural network automatically learns facial features from images and identifies or verifies a person. Deep learning provides high accuracy because it can extract complex patterns like the eyes, nose shape, distances between facial points, and overall face structure.

Working of Face Recognition using Deep Learning:

1. Face Detection
First, the face is detected in an image or video frame. A deep learning detector like MTCNN, or a classical detector like Haar Cascade, can locate the face region.

2. Preprocessing
The detected face is cropped and resized to a fixed size. Steps like normalization, alignment (straightening the face), and noise removal are done to improve accuracy.

3. Feature Extraction using CNN
A Convolutional Neural Network (CNN) is used to extract important facial features. The CNN converts the face image into a feature vector (embedding) which represents unique face characteristics.
Example models: FaceNet, VGG-Face, DeepFace.

4. Face Embedding Comparison
The generated embedding is compared with the stored embeddings in the database. Similarity is measured using distance methods like Euclidean distance or cosine similarity.

5. Recognition / Classification
If the distance between embeddings is small, the face is matched and identified. Otherwise, it is treated as an unknown person.

Advantages of Deep Learning Face Recognition:
1. High accuracy compared to traditional methods.
2. Works well even with different lighting, angles, and expressions.
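Steps 4 and 5 of Q31 (embedding comparison and the match / unknown decision) can be sketched once the CNN has produced embeddings. The 3-dimensional toy embeddings, the names in the database, and the distance threshold of 0.6 are all hypothetical; real models produce much longer vectors (FaceNet, for example, uses 128-dimensional embeddings), and the threshold has to be tuned for the model in use.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def identify(query, database, threshold=0.6):
    """Return the name of the closest stored embedding, or "unknown" if even the
    closest one is farther than `threshold` (Euclidean distance)."""
    best_name, best_dist = "unknown", float("inf")
    for name, emb in database.items():
        d = euclidean_distance(query, emb)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else "unknown"

db = {"alice": [0.1, 0.9, 0.2], "bob": [0.8, 0.1, 0.5]}   # hypothetical stored embeddings
print(identify([0.12, 0.88, 0.25], db))   # alice
print(identify([-0.9, -0.9, -0.9], db))   # unknown
```

Cosine similarity can be used instead of Euclidean distance; in that case a larger value means a closer match, so the comparison and threshold are reversed.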