
DL Micro

The document discusses various machine learning concepts, including the differences between supervised and unsupervised learning, Bayes Rule for classification, and the structure of biological neurons. It also covers the architecture of feed-forward neural networks, backpropagation learning, autoencoders, convolutional neural networks (CNNs), and the advantages of CNNs over multilayer perceptrons. Additionally, it explains normalization techniques such as batch normalization and the challenges of the vanishing gradient problem in recurrent neural networks.

Uploaded by

sauravjarhad6


Q1) Distinguish between Supervised and Unsupervised Learning / Machine Learning approach

Supervised Learning and Unsupervised Learning are two major approaches of Machine Learning. Their main differences are:

1. Training Data
• Supervised Learning: Uses labeled data (input + correct output).
• Unsupervised Learning: Uses unlabeled data (only input data, no output).

2. Goal
• Supervised Learning: To predict the output for new data.
• Unsupervised Learning: To discover hidden patterns or structure in data.

3. Learning Method
• Supervised Learning: Learns a mapping function X → Y using known answers.
• Unsupervised Learning: Groups or organizes data based on similarity without known answers.

4. Types of Problems
• Supervised Learning: Mainly used for Classification and Regression.
• Unsupervised Learning: Mainly used for Clustering and Association.

5. Examples
• Supervised Learning: Email spam detection, house price prediction, disease prediction.
• Unsupervised Learning: Customer segmentation, grouping similar documents, market basket analysis.

6. Algorithms
• Supervised Learning: Linear Regression, Decision Tree, SVM, Naive Bayes, KNN.

Q2) Explain Bayes Rule for classification / Bayes classifier

Bayes Rule is a probability theorem used in Machine Learning to classify an unknown data sample based on prior knowledge. It gives the probability of a class once some features have been observed.

Bayes Theorem Formula:
$$P(C|X) = \frac{P(X|C) \times P(C)}{P(X)}$$
Where:
• $P(C|X)$ = Posterior Probability (probability of class C after seeing data X)
• $P(X|C)$ = Likelihood (probability of data X given class C)
• $P(C)$ = Prior Probability (probability of class C before seeing data)
• $P(X)$ = Evidence (probability of data X)

Bayes Classifier (Bayesian Classification)
A Bayes classifier assigns the input X to the class having the highest posterior probability.

Decision Rule:
Choose class $C_i$ if
$$P(C_i|X) > P(C_j|X) \quad \text{for all } j \neq i$$

Naive Bayes Classifier
Naive Bayes is a popular Bayes classifier that assumes all features are conditionally independent of each other given the class:
$$P(C|X) = P(C|x_1, x_2, \ldots, x_n) \propto P(C) \times P(x_1|C) \times P(x_2|C) \times \cdots \times P(x_n|C)$$
Because $P(X)$ is the same for every class, it can be ignored when comparing classes, which is why "∝" is used.

Example (Simple)
In spam detection:
• Class = {Spam, Not Spam}
• Features = {contains “free”, “win”, “offer”}
Using Bayes rule, we calculate the probability of Spam and of Not Spam, then choose the class with the higher probability.
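Below is a minimal sketch of the Naive Bayes spam example from Q2, written in pure Python. The six training emails, the add-one (Laplace) smoothing, and the helper names `train`, `posteriors` and `classify` are illustrative assumptions and do not come from the original notes. The three binary features are the ones listed in the example.

```python
from collections import defaultdict

# Hypothetical toy training data: (set of words in the email, class label).
TRAIN = [
    ({"free", "win"}, "Spam"),
    ({"free", "offer"}, "Spam"),
    ({"win", "offer", "free"}, "Spam"),
    ({"meeting"}, "Not Spam"),
    ({"offer"}, "Not Spam"),
    ({"report", "meeting"}, "Not Spam"),
]
FEATURES = ["free", "win", "offer"]  # binary features: does the email contain the word?


def train(data):
    """Estimate priors P(C) and likelihoods P(word present | C) with add-one smoothing."""
    class_counts = defaultdict(int)
    word_counts = defaultdict(lambda: defaultdict(int))
    for words, label in data:
        class_counts[label] += 1
        for f in FEATURES:
            if f in words:
                word_counts[label][f] += 1
    priors = {c: n / len(data) for c, n in class_counts.items()}
    likelihoods = {
        c: {f: (word_counts[c][f] + 1) / (class_counts[c] + 2) for f in FEATURES}
        for c in class_counts
    }
    return priors, likelihoods


def posteriors(words, priors, likelihoods):
    """P(C | X) proportional to P(C) * prod P(x_i | C), normalised by the evidence P(X)."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for f in FEATURES:
            p = likelihoods[c][f]
            score *= p if f in words else 1 - p  # feature present or absent
        scores[c] = score
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}


def classify(words, priors, likelihoods):
    """Bayes decision rule: choose the class with the highest posterior."""
    post = posteriors(words, priors, likelihoods)
    return max(post, key=post.get)


priors, likelihoods = train(TRAIN)
print(classify({"free", "win"}, priors, likelihoods))  # Spam
```

The sketch normalises by the evidence only so that the printed posteriors sum to 1. The decision itself would be the same without that step.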

Q3) What is Neuron? Explain structure of Biological Neuron

A neuron is the basic structural and functional unit of the nervous system. It is a specialized cell that receives information, processes it, and transmits signals to other neurons, muscles, or glands in the form of electrical and chemical impulses. Neurons are responsible for communication within the brain and throughout the body.

Structure of a Biological Neuron:
1. Cell Body (Soma)
The cell body contains the nucleus and other organelles. It controls all activities of the neuron, such as metabolism, growth, and signal processing. It also integrates the incoming signals.
2. Dendrites
Dendrites are branch-like structures connected to the cell body. Their main function is to receive signals (inputs) from other neurons and carry them towards the cell body.
3. Axon
The axon is a long, tube-like structure that carries nerve impulses away from the cell body to other neurons or target cells. It acts as the output pathway of the neuron.
4. Myelin Sheath
The axon is often covered by a fatty insulating layer called the myelin sheath. It increases the speed of signal transmission by preventing signal loss.
5. Nodes of Ranvier
These are small gaps between myelin sheath segments on the axon. They help in faster transmission of impulses through a process called saltatory conduction.
6. Axon Terminals (Synaptic Terminals)
These are the end branches of the axon. They release neurotransmitters to communicate with the next neuron through a junction called a synapse.

Q4) Explain Architecture of Feed Forward Neural Network with necessary conventions

A Feed Forward Neural Network (FFNN) is the simplest type of Artificial Neural Network, in which information flows only in one direction, i.e., from the input layer to the output layer through the hidden layer(s). There is no feedback connection or loop in this network. It is mainly used for classification and prediction tasks.

Architecture of Feed Forward Neural Network:
1. Input Layer
This is the first layer of the network. It receives the input data/features from the external environment. Each neuron in the input layer represents one input feature. The input layer only passes data forward and does not perform computation.
2. Hidden Layer(s)
Hidden layers are placed between the input and output layers. These layers perform the actual processing and learning. Each neuron in a hidden layer receives weighted inputs, applies an activation function, and passes its output to the next layer. A network can have one or more hidden layers.
3. Output Layer
This is the last layer of the network. It produces the final output of the neural network. The output depends on the problem type:
• For classification: output may be class labels or probabilities
• For regression: output may be a numeric value

Necessary conventions in FFNN:
• Weights (w): Each connection between neurons has a weight. It represents the strength of the connection.
• Bias (b): Each neuron has a bias value, which helps to shift the activation function.
• Net input:
net = Σ (xi * wi) + b
• Activation function (f): Converts net input into output. Common functions are sigmoid, ReLU, and tanh.
output = f(net)
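The conventions from Q4 (weights, bias, net input, activation) can be sketched as a forward pass in pure Python. The 2-3-1 layer sizes, the weight values and the function names are hypothetical. They are chosen only to show information flowing from input to output with no feedback.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer.

    weights[j][i] is the weight from input i to neuron j; biases[j] is neuron j's bias.
    Each neuron computes net = sum(x_i * w_i) + b and outputs f(net).
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        net = sum(x * w for x, w in zip(inputs, w_row)) + b
        outputs.append(activation(net))
    return outputs


def forward(x, layers):
    """Pass x through each (weights, biases, activation) layer in order; no feedback."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical 2-3-1 network: 2 input features, 3 ReLU hidden neurons, 1 sigmoid output.
LAYERS = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, 0.2], relu),
    ([[1.0, -1.0, 0.5]], [0.0], sigmoid),
]

print(forward([1.0, 2.0], LAYERS))  # roughly [0.13]
```

The input layer does not appear as a separate step. This matches the note that it only passes the features forward.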
Q5) What is Back Propagation Learning? Explain algorithm / weight update

Back Propagation Learning is a supervised learning algorithm used to train multi-layer feed forward neural networks. In this method, the error between the actual output and the desired output is calculated at the output layer. It is then propagated backward through the hidden layers to update the weights. The main aim is to minimize the total error by adjusting weights and biases.

Back Propagation Algorithm (Steps):
1. Initialize Weights and Biases
Set all weights and biases to small random values.
2. Forward Pass (Forward Propagation)
Give the input vector to the network and compute the output layer result by passing data through the hidden layers using activation functions.
3. Calculate Error
Find the difference between the target output (T) and the actual output (Y).
Error for one output neuron:
E = 1/2 (T − Y)²
4. Backward Pass (Error Propagation)
Compute error gradients starting from the output layer back to the hidden layers using the chain rule.
5. Weight Update
Update weights using the gradient descent method to reduce error.
Steps 2 to 5 are repeated over the training data until the error becomes acceptably small.

Weight Update Rule:
For a weight w between two neurons:
w(new) = w(old) + Δw
Where,
Δw = η × δ × x
η = learning rate (small positive constant)
x = input carried by that connection
δ = error term (gradient) of the receiving neuron
For Output Layer Neuron:
δ = (T − Y) × f'(net)
For Hidden Layer Neuron:
δ = f'(net) × Σ (δ(next layer) × w)
Bias Update:
b(new) = b(old) + η × δ

Q6) How are weights updated at the output layer of a Multilayer Neural Network?

In a Multilayer Neural Network, weights at the output layer are updated using the backpropagation learning rule. The main goal is to reduce the error between the target output and the actual output by adjusting the weights using gradient descent.

Steps for weight update at output layer:
1. Calculate net input to output neuron
For an output neuron k:
$net_k = \sum_j w_{jk} \cdot y_j + b_k$
Where,
$w_{jk}$ = weight from hidden neuron j to output neuron k
$y_j$ = output of hidden neuron j
$b_k$ = bias of output neuron k
2. Calculate actual output
$o_k = f(net_k)$
where f is the activation function (sigmoid/tanh/ReLU).
3. Compute error at output neuron
Error difference: $e_k = t_k - o_k$
where $t_k$ is the target output.
4. Calculate delta (error term) for output neuron
$\delta_k = (t_k - o_k) \cdot f'(net_k)$
5. Update output layer weights
Weight update rule: $\Delta w_{jk} = \eta \cdot \delta_k \cdot y_j$
New weight: $w_{jk}(new) = w_{jk}(old) + \Delta w_{jk}$
6. Update output layer bias
$\Delta b_k = \eta \cdot \delta_k$
$b_k(new) = b_k(old) + \Delta b_k$
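To make Q5 and Q6 concrete, here is a minimal sketch of one backpropagation update. The network is tiny: 2 inputs, 2 sigmoid hidden neurons and 1 sigmoid output, trained on E = 1/2 (T − Y)². The starting weights, the learning rate η = 0.5 and the training sample are hypothetical. For the sigmoid, f'(net) = f(net)(1 − f(net)), so each δ can be computed from the neuron's own output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, V, b_hidden, W, b_out):
    """Return hidden outputs y_j and the single network output o."""
    y = [sigmoid(sum(v * xi for v, xi in zip(V[j], x)) + b_hidden[j]) for j in range(len(V))]
    o = sigmoid(sum(w * yj for w, yj in zip(W, y)) + b_out)
    return y, o


def error(x, t, params):
    _, o = forward(x, *params)
    return 0.5 * (t - o) ** 2  # E = 1/2 (T - Y)^2


def train_step(x, t, V, b_hidden, W, b_out, eta=0.5):
    """One backpropagation update for a 2-input, 2-hidden, 1-output sigmoid network."""
    y, o = forward(x, V, b_hidden, W, b_out)
    # Output neuron: delta_k = (t_k - o_k) * f'(net_k), with f'(net_k) = o(1 - o) for sigmoid.
    delta_k = (t - o) * o * (1 - o)
    # Hidden neurons: delta_j = f'(net_j) * delta_k * w_jk (uses the OLD output weights).
    delta_h = [yj * (1 - yj) * delta_k * wj for yj, wj in zip(y, W)]
    # Output layer (Q6 steps 5-6): w_jk += eta * delta_k * y_j and b_k += eta * delta_k.
    W = [wj + eta * delta_k * yj for wj, yj in zip(W, y)]
    b_out = b_out + eta * delta_k
    # Hidden layer: v_ij += eta * delta_j * x_i and b_j += eta * delta_j.
    V = [[v + eta * dj * xi for v, xi in zip(V[j], x)] for j, dj in enumerate(delta_h)]
    b_hidden = [bj + eta * dj for bj, dj in zip(b_hidden, delta_h)]
    return V, b_hidden, W, b_out


# Hypothetical starting parameters (V, b_hidden, W, b_out) and one sample x = [1, 0], t = 1.
params = ([[0.15, 0.20], [0.25, 0.30]], [0.35, 0.35], [0.40, 0.45], 0.60)
x, t = [1.0, 0.0], 1.0
before = error(x, t, params)
params = train_step(x, t, *params)
after = error(x, t, params)
print(before, after)  # the error on this sample should be lower after the update
```

Since Δw = η × δ × x points opposite to the gradient of E, a small enough η should lower the error on the sample. A finite-difference estimate of ∂E/∂w is a simple way to check the deltas.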

Q7) What is Autoencoder? Explain Undercomplete + Sparse Autoencoder

An Autoencoder is a type of Artificial Neural Network used for unsupervised learning. It is mainly used to learn an efficient data representation (encoding). It does this by compressing the input data into a smaller form and then reconstructing the same input at the output. The goal of an autoencoder is to minimize the reconstruction error between input and output.

Basic Structure of Autoencoder:
1. Encoder: Converts input data into a compressed representation (latent vector).
2. Bottleneck / Latent Space: The compressed hidden layer representation.
3. Decoder: Reconstructs the original input from the latent vector.

1) Undercomplete Autoencoder:
An undercomplete autoencoder has a hidden layer (latent layer) with fewer neurons than the input layer. This forces the network to compress the data and learn the most important features.
Key points:
• Hidden layer size < Input layer size
• Learns a compact and meaningful representation
• Used for dimensionality reduction (like PCA) and feature extraction
Example: Input 100 features → Hidden 20 neurons → Output 100 features

2) Sparse Autoencoder:
A sparse autoencoder may have a hidden layer size equal to or greater than the input size. However, it applies a sparsity constraint so that only a few neurons are active at a time. This helps the model learn useful patterns and avoid copying the input directly.
Key points:
• Uses sparsity regularization (L1 regularization or KL divergence)
• Most hidden neurons output near 0 (inactive)

Q8) Explain Architecture / Layers of CNN and their functions

CNN (Convolutional Neural Network) is a deep learning model mainly used for image processing, pattern recognition, and computer vision tasks. A CNN automatically extracts important features from input images using different layers and then performs classification or detection.

Architecture / Layers of CNN and their functions:
1. Input Layer
This layer takes the input image in the form of pixels.
Example: 32×32×3 image (width × height × channels).
2. Convolution Layer
This is the main layer of a CNN. It applies filters/kernels to the input image to extract features like edges, corners, textures, and patterns. The output of this layer is called a feature map.
Function: Feature extraction, with far fewer parameters than fully connected networks.
3. Activation Layer (ReLU)
After convolution, an activation function is applied, mostly ReLU (Rectified Linear Unit).
ReLU(x) = max(0, x)
Function: Adds non-linearity and improves learning speed.
4. Pooling Layer (Subsampling Layer)
Pooling reduces the size of feature maps by taking the maximum or average value in a region.
Types: Max Pooling, Average Pooling
Function: Reduces computation, helps reduce overfitting, and keeps important features.
5. Fully Connected Layer (FC Layer)
After feature extraction, the output is flattened into a vector and passed to fully connected layers.
Function: Performs final classification based on the extracted features.
6. Output Layer
This is the final layer that gives the result.
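For Q7's sparse autoencoder, the sparsity constraint is usually added to the reconstruction error as a penalty term. The sketch assumes sigmoid hidden units, so activations lie strictly between 0 and 1. It also assumes a target sparsity ρ = 0.05 and a penalty weight β = 3.0. These values and the helper names are illustrative assumptions. The sketch covers both the KL-divergence penalty and the L1 alternative mentioned in the key points.

```python
import math


def mean_activations(hidden_batch):
    """rho_hat_j: average activation of hidden unit j over a batch of examples."""
    n = len(hidden_batch)
    return [sum(h[j] for h in hidden_batch) / n for j in range(len(hidden_batch[0]))]


def kl_sparsity(hidden_batch, rho=0.05):
    """Sum over hidden units of KL(rho || rho_hat_j); 0 when every unit averages rho.

    Assumes sigmoid hidden units, so every rho_hat_j lies strictly between 0 and 1.
    """
    total = 0.0
    for r in mean_activations(hidden_batch):
        total += rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
    return total


def l1_sparsity(hidden_batch, lam=1e-3):
    """Alternative penalty: lam times the sum of absolute hidden activations."""
    return lam * sum(abs(a) for h in hidden_batch for a in h)


def sparse_ae_loss(inputs, reconstructions, hidden_batch, beta=3.0, rho=0.05):
    """Squared reconstruction error (averaged over the batch) + beta * KL penalty."""
    recon = sum((x - r) ** 2
                for xs, rs in zip(inputs, reconstructions)
                for x, r in zip(xs, rs)) / len(inputs)
    return recon + beta * kl_sparsity(hidden_batch, rho)


# Hypothetical hidden activations for a batch of 2 examples and 3 hidden units.
dense_codes = [[0.6, 0.5, 0.7], [0.4, 0.5, 0.3]]        # most units active
sparse_codes = [[0.05, 0.02, 0.9], [0.03, 0.06, 0.01]]  # mostly near 0
print(kl_sparsity(dense_codes), kl_sparsity(sparse_codes))  # dense codes are penalised more
```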

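The convolution and ReLU layers described in Q8 can be illustrated on a single-channel image, using a stride of 1 and no padding. Like most CNN libraries, the sketch computes cross-correlation, meaning the kernel is not flipped. The 4×5 image and the vertical-edge kernel are hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as CNN layers compute it), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kernel-sized window whose top-left corner is (i, j).
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map


def relu_map(fmap):
    """Apply ReLU(x) = max(0, x) element-wise to a feature map."""
    return [[max(0, v) for v in row] for row in fmap]


# Hypothetical grayscale image: a bright vertical stripe in columns 2 and 3.
IMAGE = [
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
]
# Hypothetical 3x3 filter that responds where intensity rises from left to right.
KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

fmap = conv2d(IMAGE, KERNEL)
print(fmap)            # [[27, 27, -27], [27, 27, -27]]
print(relu_map(fmap))  # [[27, 27, 0], [27, 27, 0]]
```

The same 9 kernel weights are reused at every position. The negative response at the falling edge is zeroed by ReLU.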
Q9) Advantages of CNN over Multilayer Perceptron / Dense Network

CNN (Convolutional Neural Network) has many advantages over Multilayer Perceptron (MLP) / Dense Networks, especially for image and spatial data processing. The main advantages are:

1. Automatic Feature Extraction
A CNN automatically learns important features like edges, shapes, and textures from images using convolution filters. An MLP applied to images, by contrast, usually relies on manually extracted features.
2. Fewer Parameters (Weight Sharing)
A CNN uses shared weights (the same filter is applied across the whole image), so it needs fewer parameters than a dense network, where every neuron connects to all inputs. This reduces memory and computation.
3. Handles Spatial Information Better
A CNN preserves spatial relationships between pixels (nearby pixels are related). An MLP flattens the image into a 1D vector and loses the spatial structure.
4. Translation Invariance
Due to pooling and convolution, a CNN can recognize objects even if they shift slightly in the image. An MLP is not good at handling shifted images.
5. Better Performance for Images
CNNs give higher accuracy in image classification, object detection, and recognition tasks because they are specially designed for image data.
6. Less Overfitting
Because a CNN has fewer parameters and uses pooling/dropout, it overfits less than an MLP, especially when training on large images.

Q10) What is Normalization? Explain Batch Normalization

Normalization is a technique used in machine learning and deep learning to scale and transform data (or activations) into a standard range. It helps to improve training speed, stability, and accuracy by reducing variations in values. Normalization also helps avoid problems like slow convergence and vanishing/exploding gradients.

Batch Normalization (BN):
Batch Normalization is a normalization technique applied inside neural networks during training. It normalizes the output (activations) of a layer for each mini-batch so that the mean becomes 0 and the variance becomes 1. This helps the network train faster and more reliably.

Working of Batch Normalization:
For a mini-batch of m activations $x_1, \ldots, x_m$:
1. Compute batch mean:
$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$
2. Compute batch variance:
$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2$
3. Normalize:
$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$
where $\epsilon$ is a small constant to avoid division by zero.
4. Scale and shift (learnable parameters):
$y_i = \gamma \hat{x}_i + \beta$
where $\gamma$ and $\beta$ are trainable parameters.
At inference time, running averages of $\mu$ and $\sigma^2$ collected during training are used instead of mini-batch statistics.

Advantages of Batch Normalization:
1. Speeds up training and convergence.
2. Reduces internal covariate shift (stabilizes learning).
3. Allows higher learning rates.
4. Reduces vanishing/exploding gradient problems.
5. Acts like regularization and reduces overfitting.
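A little arithmetic checks the weight-sharing argument in Q9. The sketch counts weights plus biases for two hypothetical layers. The first is a dense layer of 64 neurons on a flattened RGB image. The second is a convolution layer with 64 filters of size 3×3. The two layers do not compute the same thing. The point is only how each count scales with image size.

```python
def dense_params(height, width, channels, units):
    """Weights + biases of a fully connected layer on a flattened image."""
    inputs = height * width * channels
    return inputs * units + units


def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights + biases of a conv layer; each filter is shared across all image positions."""
    return kernel_h * kernel_w * in_channels * filters + filters


for size in (32, 224):
    print(f"{size}x{size}x3 image: dense(64) = {dense_params(size, size, 3, 64):,} params, "
          f"conv 3x3 x 64 filters = {conv_params(3, 3, 3, 64):,} params")
```

The dense count grows with the number of pixels (196,672 for 32×32×3, 9,633,856 for 224×224×3). The convolution count stays at 1,792 whatever the image size.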

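The four Batch Normalization steps in Q10 can be sketched for a single feature over one mini-batch. The sketch assumes training-mode statistics only, so there are no running averages for inference. It also assumes scalar γ and β and ε = 1e-5. A real BN layer applies the same computation separately to every feature or channel.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch, then scale by gamma and shift by beta."""
    m = len(batch)
    mu = sum(batch) / m                                       # 1. batch mean
    var = sum((x - mu) ** 2 for x in batch) / m               # 2. batch variance (divide by m)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]  # 3. normalise
    return [gamma * xh + beta for xh in x_hat]                # 4. scale and shift


out = batch_norm([2.0, 4.0, 6.0, 8.0])
print([round(v, 3) for v in out])  # [-1.342, -0.447, 0.447, 1.342]
```

With γ = 1 and β = 0 the output has mean 0 and variance very close to 1 (not exactly 1, because of ε). Other values of γ and β let the network undo the normalization if that helps.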
Q11) Explain Recurrent Neural Network (RNN) / Working of RNN

A Recurrent Neural Network (RNN) is a type of neural network specially designed to process sequential data, where the current output depends on previous information. Unlike Feed Forward Neural Networks, an RNN has feedback connections, so it can store past information in the form of a hidden state (memory). RNNs are mainly used for time series data, speech recognition, language translation, and text processing.

Working of RNN:
1. Sequence Input
An RNN takes input data in sequence form, such as words in a sentence or values in a time series. The input at each time step is represented as $x_t$.
2. Hidden State (Memory)
An RNN maintains a hidden state $h_t$, which stores information from previous time steps. This hidden state acts as memory.
3. Recurrence Relation
At each time step, the hidden state is updated using the current input and the previous hidden state:
$h_t = f(W_h \cdot h_{t-1} + W_x \cdot x_t + b)$
Where,
$W_h$ = weight matrix for previous hidden state
$W_x$ = weight matrix for input
$b$ = bias
$f$ = activation function (tanh/ReLU)
4. Output Generation
The output at time step t is calculated from the hidden state:
$y_t = g(W_y \cdot h_t)$
where $W_y$ is the output weight matrix and $g$ is an activation function (softmax/sigmoid).
5. Same Weights for All Time Steps
An RNN uses the same weights at each time step, which reduces the number of parameters and helps in learning sequential patterns.
6. Advantages of RNN:
• Works well for sequential and time-dependent data
• Maintains memory of previous inputs

Q12) Vanishing Gradient Problem + Challenges in Gradient Descent

The Vanishing Gradient Problem and the challenges in Gradient Descent are major issues in training deep neural networks. They affect the learning speed, accuracy, and convergence of the model.

Vanishing Gradient Problem:
The vanishing gradient problem occurs when gradients become very small during backpropagation in deep neural networks. The gradient values shrink as they move backward from the output layer to the earlier layers. Because these gradients are so small, weight updates become almost zero, so the early layers learn very slowly or stop learning.
Reasons:
1. Use of activation functions like sigmoid and tanh, which produce small derivatives for large inputs.
2. Deep networks with many layers multiply small gradients repeatedly, making them even smaller.
Effects:
• Slow training of deep networks
• Early layers cannot learn important features
• Poor accuracy for long sequence learning in RNNs
Solutions:
• Use ReLU activation function
• Use Batch Normalization
• Use better weight initialization (Xavier/He initialization)
• Use LSTM/GRU in RNNs to handle long dependencies

Challenges in Gradient Descent:
1. Slow Convergence
If the learning rate is too small, gradient descent takes many iterations to reach the minimum.
2. Overshooting / Divergence
If the learning rate is too large, it may skip over the minimum point and training becomes unstable.
3. Local Minima and Saddle Points
Gradient descent may get stuck in local minima or saddle points.
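The recurrence in Q11 can be sketched as a loop over time steps that reuses the same weights at every step. The layer sizes and weight values are hypothetical. The sketch takes f to be tanh and g to be the sigmoid, and assumes h_0 starts at zero.

```python
import math


def matvec(M, v):
    """Matrix-vector product with plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(xs, W_h, W_x, b, W_y, h0):
    """Run h_t = tanh(W_h h_{t-1} + W_x x_t + b) and y_t = sigmoid(W_y h_t) over a sequence.

    The same W_h, W_x, b and W_y are reused at every time step.
    """
    h = h0
    hidden_states, outputs = [], []
    for x_t in xs:
        pre = [a + c + bi for a, c, bi in zip(matvec(W_h, h), matvec(W_x, x_t), b)]
        h = [math.tanh(p) for p in pre]                          # new hidden state (memory)
        y = [1.0 / (1.0 + math.exp(-z)) for z in matvec(W_y, h)]  # output at this step
        hidden_states.append(h)
        outputs.append(y)
    return hidden_states, outputs


# Hypothetical sizes: 1 input feature, 2 hidden units, 1 output.
W_h = [[0.5, -0.3], [0.2, 0.4]]
W_x = [[1.0], [-1.0]]
b = [0.0, 0.1]
W_y = [[1.0, 1.0]]
sequence = [[1.0], [0.5], [-1.0]]  # x_1, x_2, x_3
hs, ys = rnn_forward(sequence, W_h, W_x, b, W_y, h0=[0.0, 0.0])
print(ys)
```

Feeding the same input twice in a row gives two different hidden states. The second state also carries h_{t−1}, which is the "memory" described above.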

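Two small experiments illustrate Q12. The first multiplies the per-layer chain-rule factor w × f'(z) for a sigmoid network. For illustration only, it assumes every layer has weight 1 and pre-activation 0, where the sigmoid derivative reaches its maximum of 0.25. The second runs plain gradient descent on the hypothetical loss f(x) = x² with three learning rates: too small, reasonable, and too large.

```python
import math


def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # largest value is 0.25, at z = 0


def gradient_factor(num_layers, weight=1.0, z=0.0):
    """Product of chain-rule factors (weight * f'(z)) from the output back through num_layers.

    Illustrative assumption: every layer has the same weight and pre-activation z.
    """
    factor = 1.0
    for _ in range(num_layers):
        factor *= weight * sigmoid_derivative(z)
    return factor


def gradient_descent(lr, steps=20, x0=5.0):
    """Minimise the hypothetical loss f(x) = x^2 (gradient 2x) with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x


for n in (1, 5, 10, 20):
    print(f"{n:2d} sigmoid layers -> gradient factor {gradient_factor(n):.2e}")
for lr in (0.01, 0.4, 1.1):  # too small, reasonable, too large
    print(f"lr={lr}: x after 20 steps = {gradient_descent(lr):.4g}")
```

With weight 1 the factor is 0.25ⁿ, so after 10 layers it is already below 10⁻⁶. In the second experiment each step multiplies x by (1 − 2η). This explains the slow shrinkage for η = 0.01 and the growing oscillation for η = 1.1.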
Q13) Explain steps involved in Machine Learning: Preprocessing, Segmentation, Feature Extraction

In Machine Learning, raw data cannot be used directly for training because it may contain noise, missing values, or unwanted information. Therefore, before ML algorithms are applied, the data is prepared through important steps like preprocessing, segmentation, and feature extraction.

1) Preprocessing
Preprocessing is the first step, where raw data is cleaned and transformed into a format suitable for machine learning. It improves data quality and model accuracy.
Main tasks in preprocessing are:
• Data cleaning: removing noise, duplicates, and incorrect values
• Handling missing values: filling missing data using the mean/median or removing rows
• Normalization/Standardization: scaling data into a fixed range such as 0 to 1 (normalization) or to zero mean and unit variance (standardization)
• Encoding: converting categorical data into numeric form (Label encoding, One-hot encoding)
• Data transformation: converting data into the required format

2) Segmentation
Segmentation means dividing the data into meaningful parts or regions so that analysis becomes easier. It is mainly used in image processing and pattern recognition.
Examples:
• Image segmentation: separating an object from the background
• Text segmentation: splitting sentences into words or tokens
• Customer segmentation: dividing customers into groups based on behavior
Purpose of segmentation:
• Focus on important regions of data
• Reduce complexity and improve accuracy

3) Feature Extraction
Feature extraction is the process of selecting or …

Q14) What are the different steps used in a typical Deep Learning model?

A typical Deep Learning model is developed by following a sequence of steps, from data collection to final deployment. These steps help in building an accurate and efficient model.

Steps used in a typical Deep Learning model:
1. Data Collection
Collect a large amount of data from sources like images, text, sensors, or databases. Deep learning requires big datasets for better training.
2. Data Preprocessing
Clean and prepare data by removing noise, handling missing values, normalizing, resizing images, tokenizing text, etc. This improves model performance.
3. Data Splitting
Divide the dataset into:
• Training set (for learning)
• Validation set (for tuning)
• Testing set (for final evaluation)
4. Model Selection / Designing Architecture
Choose the deep learning model type, such as ANN, CNN, RNN, LSTM, etc. Decide the number of layers, neurons, activation functions, and parameters.
5. Model Training
Train the model on the training data using forward propagation and backpropagation. Weights are updated using optimizers like Gradient Descent, Adam, etc.
6. Hyperparameter Tuning
Adjust the learning rate, batch size, number of epochs, dropout, and optimizer settings to improve accuracy on the validation set.
7. Model Evaluation
Evaluate model performance using test data and metrics like accuracy, precision, recall, F1-score, and loss.
8. Model Deployment
Deploy the trained model into real-world applications using cloud, mobile apps, or web systems.
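Several of the preprocessing tasks in Q13 fit in a few lines of pure Python. The sketch makes three assumptions: missing values are marked with `None`, standardization uses population statistics, and categories are sorted before one-hot encoding. The sample data is made up.

```python
import statistics


def fill_missing_with_mean(values):
    """Handling missing values: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Normalization: rescale values into the fixed range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: shift to mean 0 and scale to (population) standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def one_hot(labels):
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(labels))
    return categories, [[1 if label == c else 0 for c in categories] for label in labels]


ages = fill_missing_with_mean([20, None, 40, 30])  # [20, 30, 40, 30]
print(min_max_scale(ages))                          # [0.0, 0.5, 1.0, 0.5]
print(one_hot(["red", "green", "red"]))             # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
```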

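The data-splitting step in Q14 can be sketched as shuffle-then-slice. The 70/15/15 proportions and the fixed seed are assumptions for illustration. The three parts are disjoint, so the test set stays unseen until the final evaluation.

```python
import random


def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```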
Q15) Explain the working of Deep Learning model

Deep Learning is a subset of Machine Learning that uses multi-layer Artificial Neural Networks to learn complex patterns from large datasets. A deep learning model passes input data through multiple layers, extracts features automatically, and produces an output such as a classification or prediction.

Working of a Deep Learning model:
1. Input Layer
The model takes input data such as image pixels, text features, or numerical values. This input is given to the first layer of the neural network.
2. Forward Propagation
The input data moves forward through the hidden layers. Each neuron calculates a weighted sum of its inputs and adds a bias:
net = Σ(wi × xi) + b
Then an activation function (ReLU, sigmoid, tanh) is applied to generate the output for the next layer.
3. Feature Learning in Hidden Layers
Hidden layers automatically learn features from data.
• Early layers learn simple features (edges, shapes)
• Deeper layers learn complex features (objects, patterns)
4. Output Layer
The last layer produces the final output.
• For classification: softmax/sigmoid gives class probability
• For regression: gives numeric output
5. Loss / Error Calculation
The predicted output is compared with the actual target output using a loss function such as:
• Mean Squared Error (MSE) for regression
• Cross Entropy loss for classification
6. Backpropagation
The error is propagated backward from the output layer to the input layer. Gradients are …

Q16) State and explain key differences between Machine Learning and Deep Learning

Machine Learning (ML) and Deep Learning (DL) are both used to build intelligent systems. Deep Learning is a subset of Machine Learning that uses deep neural networks with many layers. The key differences are as follows:

1. Definition
Machine Learning is a method where machines learn from data using algorithms like regression, decision tree, SVM, etc.
Deep Learning is a type of ML that uses multi-layer neural networks to learn complex patterns automatically.
2. Feature Extraction
In ML, features are mostly extracted manually by humans (feature engineering is required).
In DL, features are automatically learned from raw data using hidden layers.
3. Data Requirement
ML works well even with small to medium datasets.
DL requires a large amount of data for better performance.
4. Computational Power
ML can work on normal computers with less computation.
DL requires high computation power like GPUs/TPUs due to complex neural networks.
5. Model Complexity
ML models are simpler and easier to understand.
DL models are complex and sometimes act like a “black box”.
6. Accuracy and Performance
ML gives good performance for structured data.
DL gives higher accuracy for unstructured data like images, audio, video, and text.
7. Training Time
ML training is faster.
DL training takes more time due to many layers and parameters.

Q17) Explain architecture of Multilayer Perceptron (MLP)

A Multilayer Perceptron (MLP) is a type of Artificial Neural Network (ANN) that consists of multiple layers of neurons. It is a feed-forward neural network, meaning data flows only in one direction, from input to output. MLPs are widely used for classification and regression problems.

Architecture of MLP:
1. Input Layer
The input layer receives the input features from the dataset. Each neuron in this layer represents one input attribute. The input layer only forwards the input values to the next layer and does not perform computation.
2. Hidden Layer(s)
One or more hidden layers are present between the input and output layers. These layers perform the actual processing and learning.
Each neuron in a hidden layer calculates a weighted sum of its inputs and adds a bias:
net = Σ(wi × xi) + b
Then an activation function (ReLU, sigmoid, tanh) is applied to generate the output.
Hidden layers help the MLP learn complex patterns and non-linear relationships.
3. Output Layer
The output layer produces the final result of the network.
• For binary classification: sigmoid activation is used
• For multi-class classification: softmax activation is used
• For regression: linear activation is used

Connections in MLP:
• Each neuron in one layer is fully connected to all neurons in the next layer.
• Each connection has a weight, and each neuron has a bias.

Training of MLP:
An MLP is trained using the backpropagation algorithm. The network output is compared with the target output, the error is calculated, and the weights are updated to reduce the error.

Q18) Design neuron for AND and OR operation

A neuron (Perceptron) can be designed to implement basic logic gates like AND and OR by choosing suitable weights and bias. The neuron uses a weighted sum and a step activation function.

Neuron model:
net = w1·x1 + w2·x2 + b
Output y = 1 if net ≥ 0, else y = 0

1) Neuron design for AND operation
Truth table (AND):
(0,0)→0
(0,1)→0
(1,0)→0
(1,1)→1
Choose: w1 = 1, w2 = 1, b = −1.5
Check:
x1=0,x2=0 → net = 0+0−1.5 = −1.5 → y=0
x1=0,x2=1 → net = 0+1−1.5 = −0.5 → y=0
x1=1,x2=0 → net = 1+0−1.5 = −0.5 → y=0
x1=1,x2=1 → net = 1+1−1.5 = 0.5 → y=1
So, this neuron performs AND correctly.

2) Neuron design for OR operation
Truth table (OR):
(0,0)→0
(0,1)→1
(1,0)→1
(1,1)→1
Choose: w1 = 1, w2 = 1, b = −0.5
Check:
x1=0,x2=0 → net = 0+0−0.5 = −0.5 → y=0
x1=0,x2=1 → net = 0+1−0.5 = 0.5 → y=1
x1=1,x2=0 → net = 1+0−0.5 = 0.5 → y=1
x1=1,x2=1 → net = 1+1−0.5 = 1.5 → y=1
So, this neuron performs OR correctly.
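The AND and OR neurons designed in Q18 can be checked exhaustively with a step-activation perceptron. The weights and biases are exactly those chosen in the notes. The function names are illustrative.

```python
def step_neuron(x1, x2, w1, w2, b):
    """y = 1 if net = w1*x1 + w2*x2 + b >= 0, else 0 (step activation)."""
    net = w1 * x1 + w2 * x2 + b
    return 1 if net >= 0 else 0


def and_neuron(x1, x2):
    return step_neuron(x1, x2, w1=1, w2=1, b=-1.5)


def or_neuron(x1, x2):
    return step_neuron(x1, x2, w1=1, w2=1, b=-0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"({x1},{x2}) AND={and_neuron(x1, x2)} OR={or_neuron(x1, x2)}")
```

Only the bias differs between the two gates. It sets how many active inputs are needed before the net input reaches 0.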
Q19) Discuss implementation of AND gate using McCulloch Pitts neuron model

The McCulloch Pitts (M-P) neuron is the earliest mathematical model of an artificial neuron. It works as a binary threshold unit whose inputs and output are only 0 or 1. It produces output 1 only when the weighted sum of its inputs reaches a fixed threshold value.

M-P Neuron Model:
For inputs $x_1, x_2$ and weights $w_1, w_2$:
Net input = $w_1 x_1 + w_2 x_2$
Output $y = 1$ if Net input ≥ Threshold (θ), otherwise $y = 0$

Implementation of AND gate using M-P neuron:
Truth table of AND gate:
(0,0) → 0
(0,1) → 0
(1,0) → 0
(1,1) → 1
Choose weights and threshold:
Let $w_1 = 1$, $w_2 = 1$
Threshold θ = 2
So, Net input = $x_1 + x_2$
Output rule:
If $x_1 + x_2 \geq 2$ → $y = 1$
Else → $y = 0$
Working:
1. ($x_1=0$, $x_2=0$) → Net = 0 → y = 0
2. ($x_1=0$, $x_2=1$) → Net = 1 → y = 0
3. ($x_1=1$, $x_2=0$) → Net = 1 → y = 0
4. ($x_1=1$, $x_2=1$) → Net = 2 → y = 1

Q20) Explain XOR implementation using NAND, OR and AND in neural networks

XOR (Exclusive OR) is a logic operation whose output is 1 only when the inputs are different. XOR cannot be implemented using a single perceptron because it is not linearly separable. Therefore, XOR is implemented using a multi-layer neural network (2-layer perceptron) that combines NAND, OR, and AND gates.

XOR Truth Table:
(0,0) → 0
(0,1) → 1
(1,0) → 1
(1,1) → 0

XOR using NAND, OR and AND:
The XOR function can be expressed as:
XOR = (A OR B) AND (A NAND B)

Neural Network Implementation (2-layer):
Hidden Layer (Layer 1):
1. Neuron 1 performs OR operation:
H1 = A OR B
2. Neuron 2 performs NAND operation:
H2 = A NAND B
Output Layer (Layer 2):
3. Neuron 3 performs AND operation on H1 and H2:
Y = H1 AND H2
So the final output becomes:
Y = (A OR B) AND (A NAND B)

Working with inputs:
1. A=0, B=0
OR = 0, NAND = 1 → AND(0,1)=0
2. A=0, B=1
OR = 1, NAND = 1 → AND(1,1)=1
3. A=1, B=0
OR = 1, NAND = 1 → AND(1,1)=1
4. A=1, B=1
OR = 1, NAND = 0 → AND(1,0)=0
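The two-layer XOR network of Q20 can be checked with threshold units like the M-P neuron in Q19. The AND unit uses the Q19 weights with θ = 2, and the OR unit uses θ = 1. For NAND, the sketch assumes negative weights (−1, −1) with θ = −1.5. That is a perceptron-style choice rather than the original M-P formulation with inhibitory inputs.

```python
def threshold_unit(inputs, weights, theta):
    """Binary threshold neuron: output 1 if the weighted sum reaches theta, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0


def or_unit(a, b):
    return threshold_unit((a, b), (1, 1), theta=1)


def nand_unit(a, b):
    # Assumed negative weights: -a - b drops below -1.5 only when a = b = 1.
    return threshold_unit((a, b), (-1, -1), theta=-1.5)


def and_unit(a, b):
    return threshold_unit((a, b), (1, 1), theta=2)  # the AND neuron from Q19


def xor_network(a, b):
    h1 = or_unit(a, b)       # hidden neuron 1
    h2 = nand_unit(a, b)     # hidden neuron 2
    return and_unit(h1, h2)  # output neuron


for a in (0, 1):
    for b in (0, 1):
        print(f"A={a}, B={b} -> XOR={xor_network(a, b)}")
```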

Q21) Describe the architecture of CNN

A CNN (Convolutional Neural Network) is a deep learning architecture mainly used for image processing, pattern recognition, and computer vision applications. It is designed to learn features from input images automatically by using convolution and pooling operations. It then performs classification using fully connected layers.

Architecture of CNN:
1. Input Layer
This layer takes the input image as pixel values. The image is represented in the form width × height × channels (example: 64×64×3 for an RGB image).
2. Convolution Layer
This is the most important layer of a CNN. It applies filters (kernels) to the input image to extract features such as edges, corners, and textures. The output of convolution is called a feature map.
3. Activation Layer (ReLU)
After convolution, an activation function like ReLU is applied to introduce non-linearity. It helps the network learn complex patterns and improves training speed.
4. Pooling Layer
Pooling reduces the size of feature maps by selecting maximum or average values from small regions. It reduces computation and helps prevent overfitting. Common types are Max Pooling and Average Pooling.
5. Flattening Layer
The feature maps obtained after convolution and pooling are converted into a one-dimensional vector. This vector is used as input to the fully connected layers.
6. Fully Connected Layer (Dense Layer)
This layer connects all neurons and performs the final decision making based on the extracted features. It learns the relationship between features and output classes.

Q22) Max Pooling vs Average Pooling: Pros and Cons

Pooling is an important layer in CNNs, used to reduce the size of feature maps and decrease computation. The two common pooling techniques are Max Pooling and Average Pooling. Both perform down-sampling, but in different ways.

Max Pooling:
Max pooling selects the maximum value from a pooling window (example 2×2). It keeps the strongest feature present in that region.
Pros of Max Pooling:
1. Keeps the most important and strongest features, like edges and textures.
2. Works well for image classification tasks.
3. Provides better translation invariance (a small shift in the image does not affect it much).
4. Reduces computation and overfitting effectively.
Cons of Max Pooling:
1. It may lose some useful information because only the maximum value is kept.
2. Sensitive to noise, because a noisy high value may be selected.

Average Pooling:
Average pooling takes the average of all values in a pooling window. It keeps the overall information of that region.
Pros of Average Pooling:
1. Preserves more general information of the feature map.
2. Less sensitive to noise compared to max pooling.
3. Useful when a smooth representation is required.
Cons of Average Pooling:
1. May weaken important features because it averages strong signals with weak signals.
2. Not as effective as max pooling for highlighting key features.
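The difference between the two pooling methods in Q22 shows up on a small example. The sketch uses non-overlapping 2×2 windows with stride 2 on a hypothetical 4×4 feature map. Its bottom-right window contains one large value.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Down-sample a 2-D feature map with max or average pooling."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


# Hypothetical feature map; the 9 could be a genuine strong feature or a noise spike.
FMAP = [
    [1, 3, 2, 0],
    [4, 8, 1, 1],
    [0, 2, 9, 1],
    [1, 1, 1, 1],
]
print(pool2d(FMAP, mode="max"))  # [[8, 2], [2, 9]]
print(pool2d(FMAP, mode="avg"))  # [[4.0, 1.0], [1.0, 3.0]]
```

Max pooling passes the 9 through unchanged, whether it is a real feature or noise. Average pooling dilutes it to 3.0, which matches the pros and cons listed above.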

Q23) Explain different optimization techniques with advantages and disadvantages

Optimization techniques are used in Machine Learning and Deep Learning to minimize the loss function and update model weights efficiently. Different optimizers improve training speed, stability, and accuracy.

1) Gradient Descent (Batch Gradient Descent)
In this method, weights are updated using the gradient computed over the complete dataset.
Advantages: Stable convergence and an accurate gradient direction.
Disadvantages: Slow for large datasets and high memory requirement.

2) Stochastic Gradient Descent (SGD)
SGD updates weights using one training sample at a time.
Advantages: Faster updates and works well for large datasets.
Disadvantages: Noisy updates; it may oscillate and not converge smoothly.

3) Mini-Batch Gradient Descent
It updates weights using a small batch of data (for example, 32 or 64 samples).
Advantages: Faster than batch GD and more stable than SGD.
Disadvantages: Requires a proper choice of batch size for best performance.

4) Momentum Optimizer
Momentum adds a fraction of the previous update to the current one, so learning speeds up in directions where the gradient stays consistent.
Advantages: Faster convergence and reduced oscillations.
Disadvantages: Requires tuning the momentum parameter and may overshoot.

5) AdaGrad (Adaptive Gradient)
AdaGrad adapts the learning rate of each parameter based on the sum of its past squared gradients.
Advantages: Good for sparse data and improves learning for rare features.
Disadvantages: The effective learning rate keeps decreasing over time, which can slow training too much.

6) RMSProp
RMSProp fixes the AdaGrad problem by using a moving (exponentially decaying) average of squared gradients instead of their sum.
Advantages: Works well for non-stationary problems and gives faster convergence.

Q24) What is Cross Entropy Loss?
Cross Entropy Loss is a widely used loss function in Machine Learning and Deep Learning for classification problems. It measures how much the predicted probability distribution differs from the actual (true) class distribution. The goal is to minimize this loss so that the predicted probabilities move closer to the correct labels.

In classification, the model outputs probabilities for each class using activation functions like Sigmoid (binary) or Softmax (multi-class). Cross entropy loss gives a higher penalty when the model predicts a wrong class with high confidence.

Binary Cross Entropy (for 2 classes):
For target \(y\) (0 or 1) and predicted probability \(\hat{y}\):
\[ \text{Loss} = -\left[\, y \log(\hat{y}) + (1-y)\log(1-\hat{y}) \,\right] \]

Categorical Cross Entropy (for multi-class):
\[ \text{Loss} = -\sum_{i} y_i \log(\hat{y}_i) \]
where \(y_i\) is the actual class label (one-hot) and \(\hat{y}_i\) is the predicted probability of class \(i\).

Advantages of Cross Entropy Loss:
1. Works well for classification tasks.
2. Provides faster and more stable learning with probability outputs.
3. Strongly penalizes confident incorrect predictions, which improves accuracy.
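Both cross entropy formulas translate directly into a few lines of Python. This is a sketch, not a library implementation; the clipping constant `EPS` is an assumption added so that log(0) is never evaluated, and the sample probabilities are made up.

```python
import math
from typing import Sequence

EPS = 1e-12  # assumed clipping constant so that log(0) is never evaluated

def binary_cross_entropy(y: int, y_hat: float) -> float:
    """-[y*log(y_hat) + (1 - y)*log(1 - y_hat)] for a single sample."""
    y_hat = min(max(y_hat, EPS), 1.0 - EPS)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

def categorical_cross_entropy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """-sum_i y_i*log(y_hat_i), with y_true one-hot and y_pred summing to 1."""
    return -sum(t * math.log(max(p, EPS)) for t, p in zip(y_true, y_pred))

print(round(binary_cross_entropy(1, 0.9), 4))   # 0.1054  confident and correct: small loss
print(round(binary_cross_entropy(1, 0.1), 4))   # 2.3026  confident and wrong: large loss
print(round(categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]), 4))  # 0.2231
```

Because the label is one-hot, the categorical loss reduces to \(-\log\) of the probability given to the true class, here \(-\log(0.8)\).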

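The update rules of the Q23 optimizers can be compared on a toy problem. This sketch assumes a one-parameter loss \(f(w) = (w-3)^2\) with an exact gradient, so batch, stochastic and mini-batch gradient descent collapse into the same rule here (in practice they differ only in how many samples are used to estimate the gradient). The learning rates and decay constants (`lr`, `beta`, `rho`, `eps`) are illustrative values, not tuned recommendations.

```python
import math

def grad(w: float) -> float:
    """Gradient of the toy loss f(w) = (w - 3)^2; its minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w: float, lr: float = 0.1, steps: int = 200) -> float:
    for _ in range(steps):
        w -= lr * grad(w)                       # plain step against the gradient
    return w

def momentum(w: float, lr: float = 0.1, beta: float = 0.9, steps: int = 200) -> float:
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)                  # velocity remembers previous update directions
        w -= lr * v
    return w

def adagrad(w: float, lr: float = 0.5, eps: float = 1e-8, steps: int = 200) -> float:
    g_sum = 0.0
    for _ in range(steps):
        g = grad(w)
        g_sum += g * g                          # SUM of squared gradients only grows,
        w -= lr * g / (math.sqrt(g_sum) + eps)  # so the effective step keeps shrinking
    return w

def rmsprop(w: float, lr: float = 0.05, rho: float = 0.9, eps: float = 1e-8, steps: int = 200) -> float:
    g_avg = 0.0
    for _ in range(steps):
        g = grad(w)
        g_avg = rho * g_avg + (1 - rho) * g * g  # moving AVERAGE forgets old gradients
        w -= lr * g / (math.sqrt(g_avg) + eps)
    return w

for name, opt in [("GD", gradient_descent), ("Momentum", momentum),
                  ("AdaGrad", adagrad), ("RMSProp", rmsprop)]:
    print(f"{name:<9} w = {opt(0.0):.4f}")
```

All four move \(w\) from 0 toward 3. With a constant learning rate, RMSProp may hover slightly around the minimum rather than settle exactly, because its step size stays roughly proportional to `lr`.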
Q25) Define and explain: Cross Entropy Loss, Local Gradient, Backpropagated Gradient
In deep learning, training a neural network requires calculating the error (loss) and updating the weights using gradients. Cross Entropy Loss and gradients (local and backpropagated) are important concepts in backpropagation.

1) Cross Entropy Loss
Cross Entropy Loss is a loss function mainly used for classification problems. It measures the difference between the actual class label and the predicted probability output. Lower cross entropy means better prediction.
For multi-class classification:
\[ \text{Loss} = -\sum_{i} y_i \log(\hat{y}_i) \]
where \(y_i\) is the actual label (one-hot) and \(\hat{y}_i\) is the predicted probability.
It gives a high penalty if the model predicts a wrong class with high confidence.

2) Local Gradient
The local gradient is the gradient calculated at a particular neuron or layer using its own activation function. It shows how the output of that neuron changes with respect to its input (net value).
Example: if the activation is sigmoid, the local gradient is:
\[ f'(net) = f(net)\,\bigl(1 - f(net)\bigr) \]
The local gradient is used in backpropagation to compute the error term (delta) at that layer.

3) Backpropagated Gradient
The backpropagated gradient is the gradient passed backward from the output layer to the hidden layers during backpropagation. It represents how much a hidden layer neuron contributes to the final error.
For hidden layer neuron \(j\), where \(k\) runs over the neurons of the next layer and \(w_{jk}\) is the weight from \(j\) to \(k\):
\[ \delta_j = f'(net_j) \sum_{k} \delta_k \, w_{jk} \]
It helps to update the weights of earlier layers by using error information from later layers.

Q26) What is Overfitting and Underfitting? How can they be resolved in neural networks?
Overfitting and underfitting are common problems in training neural networks. They affect the performance of the model on new (unseen) data.

Overfitting:
Overfitting occurs when a neural network learns the training data too well, including noise and unnecessary details. As a result, it gives very high accuracy on training data but poor accuracy on testing/validation data.
Main reason: the model is too complex or is trained too long on limited data.

How to resolve overfitting:
1. Use more training data or data augmentation.
2. Apply regularization techniques like L1/L2 regularization.
3. Use a Dropout layer to randomly deactivate neurons during training.
4. Use early stopping to stop training when validation loss starts increasing.
5. Reduce model complexity (fewer layers/neurons).
6. Use Batch Normalization to stabilize training.

Underfitting:
Underfitting occurs when the model is too simple and cannot properly learn the patterns of the training data. It gives low accuracy on both training and testing data.
Main reason: insufficient training or low model capacity.

How to resolve underfitting:
1. Increase model complexity (more layers/neurons).
2. Train for more epochs (increase training time).
3. Use better feature extraction and preprocessing.
4. Reduce regularization if it is too strong.
5. Use advanced models like CNN/RNN for complex data.
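The local gradient, the backpropagated gradient and the resulting weight gradients from Q25 can be traced on a very small network. This sketch assumes 2 inputs, 2 sigmoid hidden units, 1 sigmoid output and binary cross entropy; all weights, inputs and function names are hypothetical. It relies on the standard result that for a sigmoid output with cross entropy, \(\partial \text{Loss}/\partial net_{out} = \hat{y} - y\). A central finite-difference check is included as a common way to confirm backpropagation code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(x: Vector, W1: Matrix, b1: Vector, w2: Vector, b2: float) -> Tuple[Vector, float]:
    """Inputs -> sigmoid hidden layer -> one sigmoid output unit."""
    net_h = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j] for j in range(len(W1))]
    h = [sigmoid(n) for n in net_h]
    net_o = sum(w2[j] * h[j] for j in range(len(h))) + b2
    return h, sigmoid(net_o)

def bce(y: int, y_hat: float) -> float:
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def backprop(x, y, W1, b1, w2, b2):
    """Return gradients of the loss w.r.t. W1, b1, w2, b2 for one sample."""
    h, y_hat = forward(x, W1, b1, w2, b2)
    # Output error term: with sigmoid + binary cross entropy, dLoss/dnet_out = y_hat - y.
    delta_o = y_hat - y
    grad_w2 = [delta_o * hj for hj in h]
    # Hidden error terms: local gradient f'(net_j) = h_j * (1 - h_j) times the
    # backpropagated gradient; with one output neuron the sum has a single term.
    delta_h = [h[j] * (1 - h[j]) * delta_o * w2[j] for j in range(len(h))]
    grad_W1 = [[delta_h[j] * x[i] for i in range(len(x))] for j in range(len(h))]
    return grad_W1, delta_h, grad_w2, delta_o  # bias gradients equal the deltas

def numeric_grad_W1(x, y, W1, b1, w2, b2, j, i, eps=1e-6):
    """Central finite difference for W1[j][i], used only to check backprop."""
    def loss_with(shift: float) -> float:
        W = [row[:] for row in W1]
        W[j][i] += shift
        return bce(y, forward(x, W, b1, w2, b2)[1])
    return (loss_with(eps) - loss_with(-eps)) / (2 * eps)

x, y = [0.5, -1.0], 1
W1 = [[0.1, 0.4], [-0.3, 0.2]]
b1 = [0.0, 0.1]
w2 = [0.7, -0.5]
b2 = 0.05

gW1, gb1, gw2, gb2 = backprop(x, y, W1, b1, w2, b2)
print(gW1[0][1], numeric_grad_W1(x, y, W1, b1, w2, b2, j=0, i=1))  # should agree closely
```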

Q27) Discuss techniques for handling overfitting issues during deep learning
Overfitting is a common problem in deep learning where the model performs very well on training data but poorly on new or test data. This happens because the model learns noise and unnecessary patterns from the training data. To handle overfitting, the following techniques are used:

1. Increase Training Data / Data Augmentation
Using more data improves generalization. Data augmentation creates new training samples by rotation, flipping, cropping, etc., especially for images.

2. Dropout Technique
Dropout randomly disables some neurons during training, so the network does not depend on specific neurons. This reduces overfitting and improves generalization.

3. Regularization (L1 and L2)
Regularization adds a penalty on large weights to the loss function.
• L1 makes weights sparse
• L2 shrinks weight values and discourages overly complex models

4. Early Stopping
Training is stopped when validation loss starts increasing, even if training loss is still decreasing. This prevents the model from over-learning the training data.

5. Batch Normalization
Batch normalization stabilizes learning by normalizing layer outputs. It also has a mild regularizing effect, which can reduce overfitting.

6. Reduce Model Complexity
Using fewer layers, fewer neurons, or pruning unnecessary parameters helps avoid overfitting.

7. Cross Validation
Using k-fold cross validation helps in selecting a model that performs well on different data samples.

Q28) Explain the concept of the Layer by Layer Pretraining mechanism
Layer by Layer Pretraining is a training technique used in deep neural networks where the network is trained one layer at a time instead of training all layers together from the beginning. This method was mainly used in early deep learning models to improve training performance and reduce problems like vanishing gradients.

Concept of Layer by Layer Pretraining:
1. First, the first hidden layer is trained on the input data in an unsupervised way (commonly using Autoencoders or Restricted Boltzmann Machines). This layer learns basic features from the data.
2. After training the first layer, its weights are fixed and the output of this layer is used as input for the next hidden layer.
3. Then, the second hidden layer is trained similarly to learn higher-level features.
4. This process is repeated for all hidden layers, so each layer learns features step-by-step from lower level to higher level.
5. After pretraining all layers, the complete network is fine-tuned using supervised learning (backpropagation) with labeled data to improve final accuracy.

Advantages of Layer by Layer Pretraining:
1. Helps in better weight initialization and faster convergence.
2. Reduces the vanishing gradient problem in deep networks.
3. Improves learning when labeled data is limited.
4. Helps deep networks learn meaningful features gradually.
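Three of the overfitting techniques from Q27 (dropout, L2 regularization and early stopping) fit in a few lines of pure Python. This is an illustrative sketch: the function names, the dropout rate `p`, the penalty weight `lam`, the `patience` value and the validation-loss list are all assumed, not taken from the notes. The dropout shown is the common "inverted" variant, which rescales during training so that nothing changes at test time.

```python
import random
from typing import List, Sequence

def dropout(activations: Sequence[float], p: float, training: bool, rng: random.Random) -> List[float]:
    """Inverted dropout: during training, zero each unit with probability p and
    scale survivors by 1 / (1 - p) so the expected activation stays the same.
    At test time the activations pass through unchanged."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l2_penalty(weights: Sequence[float], lam: float) -> float:
    """L2 term added to the loss: lam * sum(w^2); large weights cost more."""
    return lam * sum(w * w for w in weights)

def early_stopping_epoch(val_losses: Sequence[float], patience: int) -> int:
    """Return the epoch whose weights would be kept (lowest validation loss),
    stopping once `patience` epochs pass without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, training=True, rng=rng))   # some units zeroed, the rest doubled
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, training=False, rng=rng))  # unchanged at test time
print(l2_penalty([0.5, -1.0, 2.0], lam=0.01))
val = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(val, patience=3))  # 3
```

In the validation curve above the loss bottoms out at epoch 3 and then rises for three epochs, so training stops and the epoch-3 weights are kept.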

Q29) State some applications of Deep Learning
Deep Learning is widely used in many real-world fields because it can learn complex patterns from large datasets. Some important applications of Deep Learning are:

1. Image Recognition and Classification
Used in face recognition, object detection, medical image analysis, and CCTV surveillance.

2. Natural Language Processing (NLP)
Used in language translation, chatbots, sentiment analysis, and text summarization.

3. Speech Recognition
Used in voice assistants like Siri, Alexa, Google Assistant and speech-to-text systems.

4. Self-Driving Cars
Used for lane detection, traffic sign recognition, obstacle detection, and decision making.

5. Healthcare
Used for disease prediction, cancer detection from scans, and personalized treatment.

6. Recommendation Systems
Used by YouTube, Netflix, Amazon for recommending videos, movies, and products.

7. Fraud Detection and Cyber Security
Used to detect unusual patterns in transactions and identify cyber attacks.

Q30) State some applications of Machine Learning and Deep Learning
Machine Learning (ML) and Deep Learning (DL) are widely used in many real-world applications to make systems intelligent, automated, and accurate. Some important applications are:

Applications of Machine Learning:
1. Spam Email Detection
ML classifies emails as spam or non-spam based on patterns.
2. Recommendation Systems
Used in e-commerce for recommending products based on user interest.
3. Fraud Detection
Used in banking to detect suspicious transactions and fraud activities.
4. Medical Diagnosis
ML helps in predicting diseases using patient data and reports.
5. Stock Market and Sales Prediction
ML is used for forecasting market trends and business sales.

Applications of Deep Learning:
1. Image Recognition
Used for face recognition, object detection, and medical image analysis.
2. Speech Recognition
Used in voice assistants like Alexa, Siri, and speech-to-text systems.
3. Natural Language Processing (NLP)
Used in chatbots, translation, sentiment analysis, and text summarization.
4. Self-Driving Cars
Used for detecting lanes, vehicles, pedestrians, and decision making.
5. Video and Content Recommendation
Used by YouTube and Netflix for personalized recommendations.

Q31) Explain Face Recognition using Deep
Learning approach
Face recognition using Deep Learning is a
technique where a neural network automatically
learns facial features from images and identifies
or verifies a person. Deep learning provides high
accuracy because it can extract complex patterns
like eyes, nose shape, distance between facial
points, and overall face structure.
Working of Face Recognition using Deep
Learning:
1. Face Detection
First, the face is detected in an image or
video frame. Detectors such as MTCNN (a
deep learning model) or the classical Haar
Cascade method can locate the face region.
2. Preprocessing
The detected face is cropped and resized
to a fixed size. Steps like normalization,
alignment (straightening face), and noise
removal are done to improve accuracy.
3. Feature Extraction using CNN
A Convolutional Neural Network (CNN) is
used to extract important facial features.
The CNN converts the face image into a
feature vector (embedding) which
represents unique face characteristics.
Example models: FaceNet, VGG-Face,
DeepFace.
4. Face Embedding Comparison
The generated embedding is compared
with the embeddings stored in a database.
Similarity is measured using Euclidean
distance or cosine similarity.
5. Recognition / Classification
If the distance between the embeddings is
below a chosen threshold, the face is
matched and identified. Otherwise, it is
treated as an unknown person.
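Steps 4 and 5 can be sketched in pure Python, assuming the CNN has already produced the embeddings. The 4-dimensional vectors, the names in `gallery`, the function names and the threshold 0.6 are made-up illustrative values; real models such as FaceNet output much longer vectors, and the threshold has to be tuned on validation data.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_normalize(v: Sequence[float]) -> list:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def identify(query: Sequence[float], gallery: Dict[str, Sequence[float]],
             threshold: float) -> Tuple[Optional[str], float]:
    """Return (name, distance) of the closest stored embedding, or (None, distance)
    when even the closest one is farther than `threshold` (unknown person)."""
    q = l2_normalize(query)
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        # For unit-length vectors, distance^2 = 2 - 2 * cosine similarity,
        # so ranking by either measure gives the same match.
        d = euclidean(q, l2_normalize(emb))
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist

gallery = {
    "alice": [0.9, 0.1, 0.3, 0.2],
    "bob":   [0.1, 0.8, 0.2, 0.5],
}
print(identify([0.88, 0.12, 0.28, 0.22], gallery, threshold=0.6))  # matched to "alice"
print(identify([-0.5, -0.4, 0.9, -0.3], gallery, threshold=0.6))   # None: unknown person
```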
Advantages of Deep Learning Face Recognition:
1. High accuracy compared to traditional
methods.
2. Works well even with different lighting,
angles, and expressions.
