Deep Learning: Unit - 2 (One Shot)
1. History of Deep Learning
Definition: Deep learning is a subset of ML that focuses on algorithms inspired by the structure and
function of the brain, known as Artificial Neural Networks (ANNs).
Timeline:
o 1943: First mathematical model of an artificial neuron proposed by McCulloch and Pitts.
o 1950s - 60s: Initial neural network models (e.g., perceptrons) were introduced by Frank
Rosenblatt.
o 1980s: Backpropagation algorithm was popularized (Rumelhart, Hinton and Williams, 1986),
reviving interest in neural networks.
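o 1990s: Neural network architectures matured (e.g., LeNet for digit recognition, LSTM for
sequences), laying the groundwork for deep neural networks.
o 2006: Deep Belief Networks (DBNs) were introduced by Hinton and colleagues, marking the start
of the modern deep learning era.
o 2012: AlexNet won the ImageNet challenge, showing the potential of deep learning in image
recognition tasks.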
o 2014: GANs (Generative Adversarial Networks) were introduced.
o 2015 - present: Widespread adoption of deep learning across various industries (e.g.,
NLP, automation, autonomous vehicles, etc.).
2. Probabilistic Theory of Deep Learning
The probabilistic theory of deep learning applies probabilistic methods to model uncertainty and deal
with incomplete or noisy data.
Key Aspects:
1. Bayesian Neural Networks: These incorporate probability distributions over the weights,
leading to more robust models.
2. Gaussian Processes: A probabilistic method used to define distributions over functions.
3. Likelihood Estimation: Models are trained to maximize the likelihood of observing the training
data.
4. Markov Chain Monte Carlo (MCMC): Used to sample from distributions when direct
calculation is difficult.
5. Maximum Likelihood Estimation (MLE): A method for estimating parameters of a probabilistic
model.
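To make maximum likelihood estimation concrete, here is a minimal sketch for the simplest case, fitting a Gaussian to a handful of numbers. The data values and the function names (gaussian_log_likelihood, gaussian_mle) are made up for illustration; a neural network would maximize a likelihood of the same kind by gradient descent rather than in closed form.

```python
import math

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. data under a Normal(mu, sigma^2) model."""
    n = len(data)
    sq_err = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq_err / (2 * sigma ** 2)

def gaussian_mle(data):
    """Closed-form maximum likelihood estimates of the mean and standard deviation."""
    n = len(data)
    mu = sum(data) / n
    # The MLE of the variance divides by n (not n - 1).
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)

data = [2.1, 1.9, 2.4, 2.0, 1.6]  # made-up observations
mu_hat, sigma_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_hat, sigma_hat)
# Any other parameter choice should give a lower log-likelihood.
other = gaussian_log_likelihood(data, mu_hat + 0.5, sigma_hat)
print(mu_hat, sigma_hat, best > other)
```

Moving either parameter away from the estimates lowers the log-likelihood, which is what "maximize the likelihood of observing the training data" means in practice.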
Applications: Image recognition, Speech recognition, Text generation, Autonomous driving.
Advantages: Handles uncertainty well; can make predictions even with missing data.
Disadvantages: Computationally expensive; requires a large amount of data for effective
training.
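The MCMC idea listed under Key Aspects can be sketched with a random-walk Metropolis sampler, one of the simplest MCMC methods. This is an illustrative sketch only: the target (a standard normal), the step size, the seed and the function names are assumptions chosen so the result is easy to check, not part of any particular library.

```python
import math
import random

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log-density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

def log_std_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x

samples = metropolis(log_std_normal, start=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be roughly 0 and 1 for this target
```

Only ratios of densities are needed, so the sampler works even when the normalizing constant is unknown. That is the situation where direct calculation is difficult.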
3. Backpropagation and Regularization
Backpropagation
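It is the algorithm used to train neural networks in supervised learning: it computes the gradient of
the error with respect to every weight using the chain rule, so that the weights can be adjusted to
minimize the error.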
Steps:
1. Input the data.
2. Feedforward pass.
3. Error computation.
4. Backpropagation of error.
5. Update of weights and biases.
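The five steps above can be sketched for a tiny network with two inputs, two sigmoid hidden units and one sigmoid output. The initial weights, the single training example, the learning rate and the name train_step are all made up for illustration; practical code would use a library with automatic differentiation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(params, x, y, lr=0.5):
    """Run one backpropagation step and return the loss before the update."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]

    # Step 2: feedforward pass.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)

    # Step 3: error computation (squared error for one example).
    loss = 0.5 * (y_hat - y) ** 2

    # Step 4: backpropagate the error with the chain rule.
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)  # dLoss/dz at the output
    delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]

    # Step 5: update weights and biases by gradient descent.
    for j in range(len(h)):
        W2[j] -= lr * delta_out * h[j]
        b1[j] -= lr * delta_hid[j]
        for i in range(len(x)):
            W1[j][i] -= lr * delta_hid[j] * x[i]
    params["b2"] = b2 - lr * delta_out
    return loss

params = {
    "W1": [[0.15, 0.20], [0.25, 0.30]],  # hidden-layer weights (made up)
    "b1": [0.35, 0.35],
    "W2": [0.40, 0.45],                  # output-layer weights (made up)
    "b2": 0.60,
}
x, y = [0.05, 0.10], 0.01  # Step 1: input the data (one made-up example)
losses = [train_step(params, x, y) for _ in range(200)]
print(losses[0], losses[-1])
```

The hidden-layer deltas are computed before any weight is changed, so every gradient refers to the same forward pass.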
Regularization
These techniques are used to prevent overfitting by adding a penalty term to the loss function.
Types:
1. L1 Regularization (Lasso): Adds a penalty proportional to the sum of the absolute values of
the weights.
2. L2 Regularization (Ridge): Adds a penalty proportional to the sum of the squared weights.
Advantages: Prevents overfitting; improves generalization of deep neural networks; helps in
reducing variance in large models.
Disadvantages: Increases computational complexity; may underfit if not tuned properly.
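As a small sketch of how the two penalties enter the loss, the functions below add an L1 or L2 term to a given data loss. The names (l1_penalty, l2_penalty, regularized_loss), the strength parameter lam and the weight values are hypothetical, chosen only for illustration.

```python
def l1_penalty(weights, lam):
    """L1 (Lasso) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 (Ridge) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Total loss = data loss + penalty term."""
    if kind == "l1":
        return data_loss + l1_penalty(weights, lam)
    return data_loss + l2_penalty(weights, lam)

weights = [0.5, -1.5, 2.0]  # made-up weights
print(regularized_loss(1.0, weights, lam=0.1, kind="l1"))
print(regularized_loss(1.0, weights, lam=0.1, kind="l2"))
```

A larger lam pushes the weights harder toward zero, which is why a poorly tuned value can cause underfitting.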
4. Batch Normalization & VC Dimensions
Batch Normalization
It normalizes the activations of each layer in a neural network to improve training speed and stability.
Working:
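1. Normalize each layer's activations by subtracting the mini-batch mean and dividing by the
mini-batch standard deviation.
2. Scale and shift the normalized output by learnable parameters.
3. During training, use the statistics of the current mini-batch; during testing, use running
averages of the mean and variance collected during training.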
Advantages: Reduces internal covariate shift; speeds up training; helps mitigate overfitting.
Disadvantages: May not work well with small batch sizes; adds extra computational overhead.
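The normalize, scale and shift steps can be sketched for a single feature as follows. The names gamma and beta for the learnable scale and shift, the small constant eps for numerical stability, and the use of stored running statistics at test time follow common practice; they are assumptions here, not something these notes specify.

```python
import math

def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    out = [gamma * z + beta for z in normalized]
    return out, mean, var

def batch_norm_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """At test time, reuse statistics stored during training."""
    return gamma * (x - running_mean) / math.sqrt(running_var + eps) + beta

batch = [1.0, 2.0, 3.0, 4.0]  # made-up activations for one feature
out, mean, var = batch_norm_train(batch, gamma=2.0, beta=0.5)
print(out, mean, var)
print(batch_norm_inference(2.5, mean, var, gamma=2.0, beta=0.5))
```

After the transform the batch has mean beta and standard deviation close to gamma, whatever the scale of the incoming activations.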
VC Dimensions (Vapnik-Chervonenkis Dimension)
It is a measure of the capacity of a statistical model, defined as the maximum number of points that
the model can shatter, that is, classify correctly under every possible assignment of labels.
Steps:
1. Shattering: A dataset is said to be shattered if the model can perfectly classify all
possible combinations of labels for the data points.
2. VC Dimensions: The highest number of points that can be shattered defines the VC
dimension.
Applications: Understanding model complexity, used in regularization techniques, helps in
model selection, determines generalization ability.
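Shattering can be checked by brute force for a very simple model class. The sketch below uses 1-D threshold classifiers (label +1 on one side of a threshold and -1 on the other, in either orientation). This model class and the function names are chosen purely for illustration, and the points are assumed to be distinct.

```python
from itertools import product

def threshold_predict(x, t, sign):
    """1-D threshold classifier: `sign` above t, `-sign` at or below t."""
    return sign if x > t else -sign

def can_shatter(points):
    """Return True if threshold classifiers realize every labeling of the points."""
    pts = sorted(points)
    # Candidate thresholds: below all points, between neighbours, above all points.
    thresholds = [pts[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    thresholds += [pts[-1] + 1.0]
    for labels in product([-1, 1], repeat=len(points)):
        realized = any(
            all(threshold_predict(x, t, s) == y for x, y in zip(points, labels))
            for t in thresholds
            for s in (-1, 1)
        )
        if not realized:
            return False
    return True

print(can_shatter([1.0, 2.0]))       # two points
print(can_shatter([1.0, 2.0, 3.0]))  # three points
```

Two points can be shattered, but three cannot because the labeling (+1, -1, +1) needs two thresholds. The same argument holds for any three distinct points, so the VC dimension of this class is 2.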
5. Deep Neural Networks vs. Shallow Networks
Feature             Deep Neural Networks                  Shallow Networks
Layers              Multiple hidden layers.               One or two hidden layers only.
Capacity            Higher capacity for complex tasks.    Lower capacity.
Training Time       Longer training time.                 Shorter training time.
Feature Extraction  Hierarchical feature extraction.      Limited feature extraction.
Cost                High computation cost.                Low computation cost.
Regularization      Usually needs regularization.         Often needs little regularization.
Non-linearity       Can model complex non-linearities.    Limited ability to model non-linearity.
Problem Type        Better for complex problems.          Better for simple problems.
Interpretability    Hard to interpret.                    Easier to interpret.
Convergence         Slower convergence.                   Faster convergence.
6. CNN, GAN, and Semi-supervised Learning
CNN (Convolutional Neural Network)
Specialized deep learning models designed for processing structured grid data, like images.
Layers:
1. Convolution Layer: Applies filters to the input image to detect features, producing
feature maps.
2. Pooling Layer: Performs downsampling to reduce dimensionality.
3. Flatten Layer: Reshapes the pooled feature maps into a one-dimensional vector.
4. Fully Connected Layer: Connects every neuron to every neuron in the next layer (also
called dense layer) and outputs probabilities for classification.
Applications: Image recognition, Object detection, Video classification, Medical image
analysis.
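The convolution, pooling and flatten operations can be sketched on a tiny image in plain Python. The 5x5 image and the 2x2 vertical-edge filter are made up for illustration, and real CNNs learn their filters during training rather than using hand-written ones.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows or columns that do not fit are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    """Turn a 2-D feature map into a 1-D vector for a dense layer."""
    return [v for row in fmap for v in row]

image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # hypothetical 2x2 edge filter

features = conv2d(image, vertical_edge)  # 4x4 feature map
pooled = max_pool(features)              # 2x2 map after pooling
vector = flatten(pooled)                 # length-4 vector
print(features, pooled, vector)
```

The feature map is non-zero only where the image changes from dark to bright, and pooling keeps that response while shrinking the map.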
GAN (Generative Adversarial Networks)
A class of ML models where two networks (Generator and Discriminator) are trained together,
competing against each other.
Steps:
1. Generator: Generates fake data based on random input (noise).
2. Discriminator: Tries to differentiate between real and generated data.
3. Both networks improve over time as they try to outsmart each other.
Applications: Image generation, Style transfer, Video generation, Data augmentation.
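The competition between the two networks is usually expressed through their loss functions. The sketch below computes those losses from the discriminator's output probabilities, using binary cross-entropy and the commonly used non-saturating generator loss. The function names and the example probabilities are made up, and the networks themselves are omitted.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability."""
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))

def discriminator_loss(d_real, d_fake):
    """D wants real samples scored near 1 and generated samples near 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss: G wants D to score its samples near 1."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# Made-up discriminator outputs for a batch of real and generated samples.
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

The two losses pull in opposite directions on the same numbers: a low d_fake score reduces the discriminator's loss and raises the generator's.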
Semi-supervised Learning
Uses both labeled and unlabeled data to train models, providing a middle ground between supervised
and unsupervised learning.
Example: Improving image recognition using a few labeled images and many unlabeled ones.
Advantages: Reduces the need for labeled data; can improve performance when labeled data
is scarce.
Disadvantages: Requires careful selection of unlabeled data; difficult to ensure accuracy
without proper labeling.
Applications: Image classification with limited data, Speech recognition, NLP, Medical
diagnosis.
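One simple way to combine labeled and unlabeled data is self-training with pseudo-labels. The sketch below uses a nearest-centroid classifier on made-up 2-D points. The function names and data are hypothetical, and a practical version would normally keep only pseudo-labels the model is confident about.

```python
def centroid(points):
    """Mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_class(point, centroids):
    """Label of the closest class centroid."""
    return min(centroids, key=lambda label: sq_dist(point, centroids[label]))

def self_train(labeled, unlabeled, rounds=3):
    """labeled: list of (point, label) pairs; unlabeled: list of points."""
    centroids = {}
    for _ in range(rounds):
        by_class = {}
        for point, label in labeled:
            by_class.setdefault(label, []).append(point)
        if centroids:
            # After the first round, pseudo-label the unlabeled points
            # with the current model and add them to the training set.
            for point in unlabeled:
                by_class[nearest_class(point, centroids)].append(point)
        centroids = {label: centroid(pts) for label, pts in by_class.items()}
    return centroids

labeled = [([0.0, 0.0], "a"), ([4.0, 4.0], "b")]               # few labeled points
unlabeled = [[1.0, 1.0], [0.5, 1.5], [3.0, 3.5], [3.5, 3.0]]   # many unlabeled points
centroids = self_train(labeled, unlabeled)
print(centroids)
print(nearest_class([1.2, 1.0], centroids))
```

The unlabeled points move each centroid toward the middle of its cluster, which is how unlabeled data can help when labeled examples are scarce.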