
Module 4 - Autoencoders (AE) & Variational Autoencoders (VAE)

The document provides an overview of Autoencoders (AE) and Variational Autoencoders (VAE), detailing their architecture, types, and applications. It explains the key components such as the encoder, decoder, and loss functions, as well as various types of autoencoders including undercomplete, overcomplete, denoising, sparse, and variational. Additionally, it discusses regularization techniques to prevent overfitting and improve feature learning.
Copyright
© All Rights Reserved

MODULE 4 - AUTOENCODERS (AE) & VARIATIONAL AUTOENCODERS (VAE)
Prepared from your uploaded lecture slides and notes.

1. Introduction to Autoencoders
An Autoencoder (AE) is a special type of feedforward neural network used for:
● feature learning
● dimensionality reduction
● data compression
● denoising
Goal:
● reconstruct the input at the output layer.

Basic Structure of Autoencoder


An autoencoder contains:
1. Encoder
2. Latent Representation (Hidden Layer)
3. Decoder

Encoder
The encoder converts the input into a compressed representation.
Formula:
h=g(Wx+b)
Where:
● x = input
● W = encoder weights
● b = encoder bias
● g = activation function
● h = hidden representation

Decoder
The decoder reconstructs the original input from the hidden representation.
Formula:
\hat{x}=f(W^{*}h+c)
Where:
● \hat{x} = reconstructed input
● W^* = decoder weights
● c = decoder bias
● f = output activation function

Working of Autoencoder
Step 1
Input data fed to encoder.
Step 2
Encoder compresses input.
Step 3
Hidden layer stores important features.
Step 4
Decoder reconstructs original data.
Step 5
Loss calculated between original and reconstructed data.
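The five steps above can be sketched as a single forward pass. This is a minimal, dependency-free Python sketch under assumed choices: sigmoid as the encoder activation g, a linear decoder (f = identity), and hand-picked weights W, W_star, b, c that are purely illustrative (nothing is trained here).

```python
import math


def sigmoid(a):
    """Logistic function, used here as the encoder activation g (an assumption)."""
    return 1.0 / (1.0 + math.exp(-a))


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def encode(x, W, b):
    """Steps 1-3: h = g(Wx + b)."""
    return [sigmoid(a + b_i) for a, b_i in zip(matvec(W, x), b)]


def decode(h, W_star, c):
    """Step 4: x_hat = f(W* h + c) with f = identity (linear decoder)."""
    return [a + c_i for a, c_i in zip(matvec(W_star, h), c)]


def reconstruction_error(x, x_hat):
    """Step 5: mean squared error between input and reconstruction."""
    return sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x)) / len(x)


# Hypothetical weights: 4-dimensional input, 2-dimensional hidden layer.
W = [[0.5, -0.2, 0.1, 0.0],
     [0.0, 0.3, -0.4, 0.2]]
b = [0.0, 0.1]
W_star = [[1.0, 0.0],
          [0.0, 1.0],
          [0.5, 0.5],
          [-0.5, 0.5]]
c = [0.0, 0.0, 0.0, 0.0]

x = [1.0, 0.0, 0.5, 0.2]
h = encode(x, W, b)
x_hat = decode(h, W_star, c)
loss = reconstruction_error(x, x_hat)
print(len(h), len(x_hat), round(loss, 4))
```

Training would then adjust W, b, W_star and c to reduce this loss.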

2. Types of Autoencoders
1. Undercomplete Autoencoder
2. Overcomplete Autoencoder
3. Denoising Autoencoder
4. Sparse Autoencoder
5. Variational Autoencoder

3. Undercomplete Autoencoder
Condition:
dim(h) < dim(x)
Meaning:
the hidden layer is smaller than the input layer.

Advantages
1. Learns compressed representation
2. Removes redundancy
3. Similar to PCA

Important Point
If the reconstruction is still accurate despite the smaller hidden layer:
● the hidden representation must have captured the important characteristics of the data.

Relation with PCA


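Undercomplete AE behaves similarly to:
Principal Component Analysis (PCA)
Both perform dimensionality reduction.
Difference:
● PCA is linear
● Autoencoder can learn non-linear features (when its activations are non-linear)
With a linear encoder and decoder trained on MSE, an undercomplete AE learns the same subspace as PCA (on mean-centered data).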

4. Overcomplete Autoencoder
Condition:
dim(h) \ge dim(x)
Meaning:
the hidden layer is equal to or larger than the input layer.
Problem
The model may simply copy the input straight through:
x \rightarrow h \rightarrow \hat{x}
This is called:
Identity Mapping

Disadvantage
It may not learn meaningful features.
Needs:
Regularization
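To see why an unregularized overcomplete AE can learn nothing useful, the sketch below hand-picks (hypothetical, not learned) weights for a linear encoder with dim(h) = 4 > dim(x) = 3. The encoder copies x into the first three hidden units and the decoder reads them back, so the reconstruction is perfect while h is just x with a padded zero.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def copy_through_overcomplete(x):
    """Linear encoder/decoder (no activation, zero bias) that implements identity mapping."""
    n = len(x)
    # Encoder W: (n + 1) x n, an identity matrix with an extra all-zero row.
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n + 1)]
    # Decoder W*: n x (n + 1), simply reads back the first n hidden units.
    W_star = [[1.0 if i == j else 0.0 for j in range(n + 1)] for i in range(n)]
    h = matvec(W, x)
    x_hat = matvec(W_star, h)
    return h, x_hat


x = [0.3, -1.2, 2.5]
h, x_hat = copy_through_overcomplete(x)
loss = sum((a - b) ** 2 for a, b in zip(x_hat, x))
print(h, x_hat, loss)
```

Zero loss here says nothing about the quality of h, which is why the regularizers below are needed.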

5. Choice of Activation Functions

For Binary Inputs


Best decoder activation:
Logistic/Sigmoid Function
Because its output remains between 0 and 1, matching the range of binary inputs.

Sigmoid Formula
\sigma(x)=\frac{1}{1+e^{-x}}

For Real Valued Inputs


Use:
Linear Activation
Formula:
\hat{x}=W^{*}h+c
Reason:
real-valued inputs are unbounded, so the output should not be squashed into a fixed range such as (0, 1).
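The effect of the two output choices can be checked with a tiny sketch. The values of h, W_star and c are hypothetical; the point is only that the sigmoid output is squashed into (0, 1), which suits binary inputs, while the linear output keeps the raw value, which suits real-valued inputs.

```python
import math


def decoder_output(h, W_star, c, activation):
    """Compute f(W* h + c) for f = 'sigmoid' or f = 'linear'."""
    pre = [sum(w * h_j for w, h_j in zip(row, h)) + c_i
           for row, c_i in zip(W_star, c)]
    if activation == "sigmoid":
        return [1.0 / (1.0 + math.exp(-a)) for a in pre]
    if activation == "linear":
        return pre
    raise ValueError("activation must be 'sigmoid' or 'linear'")


h = [2.0]
W_star = [[3.0], [-4.0]]
c = [0.0, 0.0]
print(decoder_output(h, W_star, c, "sigmoid"))  # both values lie strictly in (0, 1)
print(decoder_output(h, W_star, c, "linear"))
```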

6. Loss Function of Autoencoder


Goal:
make reconstructed output close to original input.

Mean Squared Error (MSE)


L=\frac{1}{m}\sum_{i=1}^{m}(\hat{x}_i-x_i)^2

Matrix Form (sum of squared errors for a single input vector x):
L=(\hat{x}-x)^T(\hat{x}-x)
Objective
Minimize reconstruction error.
Training done using:
Backpropagation
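Backpropagation is the chain rule applied from the loss back through the decoder and then the encoder. The sketch below assumes the simplest case, a linear undercomplete AE with a single hidden unit on made-up 2-D data, and writes the gradients out by hand; the learning rate, step count and initial weights are arbitrary choices, and a real model would normally rely on an automatic-differentiation library instead.

```python
def train_linear_ae(data, lr=0.01, steps=200):
    """Gradient descent on a 2 -> 1 -> 2 linear autoencoder with squared error.

    Forward:  h = w . x,  x_hat = v * h
    Backward: gradients of the squared error obtained with the chain rule.
    """
    w = [0.1, 0.0]  # encoder weights (initial values chosen arbitrarily)
    v = [0.1, 0.0]  # decoder weights
    history = []
    m = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        total = 0.0
        for x in data:
            # Forward pass
            h = w[0] * x[0] + w[1] * x[1]
            x_hat = [v[0] * h, v[1] * h]
            err = [x_hat[j] - x[j] for j in range(2)]
            total += err[0] ** 2 + err[1] ** 2
            # Backward pass: dL/dx_hat, then dL/dh, then weight gradients
            d_xhat = [2.0 * e for e in err]
            d_h = v[0] * d_xhat[0] + v[1] * d_xhat[1]
            for j in range(2):
                grad_v[j] += d_xhat[j] * h
                grad_w[j] += d_h * x[j]
        history.append(total / m)  # average loss before this update
        for j in range(2):
            w[j] -= lr * grad_w[j] / m
            v[j] -= lr * grad_v[j] / m
    return w, v, history


# Hypothetical 2-D data lying close to the direction (1, 1).
data = [(2.0, 1.9), (1.0, 1.1), (-1.5, -1.4), (-0.5, -0.6)]
w, v, history = train_linear_ae(data)
print(history[0], history[-1])  # loss at the start vs. after training
```

Because the data lie close to one direction, the single hidden unit can capture most of it, in line with the PCA connection above.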

7. Regularization in Autoencoders
Regularization prevents:
● overfitting
● identity mapping
Especially important in overcomplete autoencoders.

L2 Regularization
Objective Function:
L=\frac{1}{m}\sum(\hat{x}-x)^2+\lambda ||\theta||^2
Where:
● \lambda = regularization parameter (controls the strength of the penalty)
● \theta = network weights

Advantages
1. Better generalization
2. Reduces overfitting
3. Controls weight magnitude
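The regularized objective can be computed directly. This is a minimal sketch that assumes all weights are flattened into one list θ. Whether biases belong in the penalty is a modeling choice these notes do not fix, and the numbers are illustrative only.

```python
def l2_regularized_loss(x, x_hat, weights, lam):
    """Reconstruction MSE plus lambda * ||theta||^2 (theta = flat list of weights)."""
    mse = sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x)) / len(x)
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


x = [1.0, 0.0]
x_hat = [0.5, 0.5]
weights = [1.0, 2.0]  # hypothetical flattened network weights
print(l2_regularized_loss(x, x_hat, weights, lam=0.0))  # plain MSE
print(l2_regularized_loss(x, x_hat, weights, lam=0.1))  # MSE + weight penalty
```

Larger λ trades reconstruction accuracy for smaller weights.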

Tied Weights
Another regularization technique:
W^* = W^T
Meaning:
the decoder weights are the transpose of the encoder weights.

Advantages
1. Fewer parameters
2. Better learning
3. Reduced complexity
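Tying weights means the decoder reuses the encoder's matrix instead of learning its own. The minimal sketch below uses a hypothetical 3 x 5 encoder matrix W. Counting parameters shows the saving; biases are left out for simplicity.

```python
def transpose(M):
    """Return the transpose of matrix M (list of rows)."""
    return [list(col) for col in zip(*M)]


def count_params(*matrices):
    """Total number of weight entries across the given matrices."""
    return sum(len(M) * len(M[0]) for M in matrices)


# Hypothetical encoder weights: 3 hidden units, 5 inputs.
W = [[0.1 * (i + j) for j in range(5)] for i in range(3)]

W_star_tied = transpose(W)  # decoder weights W* = W^T (5 x 3), no new parameters
untied = count_params(W, W_star_tied)  # separate W and W* would each be trained
tied = count_params(W)                 # only W is trained when weights are tied
print(untied, tied)
```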

8. Denoising Autoencoder (DAE)


A DAE intentionally corrupts the input during training.
Goal:
reconstruct original clean input.

Working
Step 1
Add noise to input.
Step 2
Feed noisy input to encoder.
Step 3
Decoder reconstructs clean input.

Corrupted Input:
\tilde{x}
Original Input:
x

Noise Addition
One method:
P(\tilde{x}_{ij}=0 \mid x_{ij})=q
Meaning:
each input component is set to zero with probability q (masking noise).

Another Method
Gaussian noise:
\tilde{x}=x+N(0,1)
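Both corruption methods are easy to write down. The sketch below assumes the masking decision is made independently for each component and that Gaussian noise is added independently per component. The input vector and the choice of q = 0.4 are illustrative.

```python
import random


def masking_noise(x, q, rng):
    """Set each component to 0 with probability q, i.e. P(x~_ij = 0 | x_ij) = q."""
    return [0.0 if rng.random() < q else xi for xi in x]


def gaussian_noise(x, rng, sigma=1.0):
    """Add independent N(0, sigma^2) noise to each component; sigma = 1 matches N(0, 1)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


rng = random.Random(0)  # fixed seed so runs are repeatable
x = [0.9, 0.1, 0.7, 0.4, 1.0]
x_masked = masking_noise(x, q=0.4, rng=rng)
x_gauss = gaussian_noise(x, rng)
print(x_masked)
print(x_gauss)
# Training pairs are (corrupted input, clean target): the loss compares
# decoder(encoder(x_masked)) against x, not against x_masked.
```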

Why Denoising Helps?


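Because the corrupted input differs from the clean target, the model cannot simply copy its input; it must learn the actual structure and patterns of the data to undo the corruption.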

Advantages of DAE
1. Robust feature learning
2. Better generalization
3. Noise removal
4. Improved feature extraction

Applications of DAE
1. Image denoising
2. Speech enhancement
3. Music separation
4. Pattern recognition

Important Observation
As corruption increases:
● filters become more meaningful
● but too much corruption lowers reconstruction quality.

9. Sparse Autoencoder
A sparse AE encourages hidden neurons to remain inactive most of the time.

Idea
Neuron activation should mostly remain near 0.

Average Activation
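\hat{\rho}_l=\frac{1}{m}\sum_{i=1}^{m}h_l(x_i)
Where:
● \hat{\rho}_l = average activation of hidden neuron l over the m training examples
● h_l(x_i) = activation of hidden neuron l for input x_i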

Sparsity Constraint
Desired sparsity:
\rho \approx 0
Typically:
\rho = 0.005
Constraint: \hat{\rho}_l \approx \rho for every hidden neuron l.

Sparsity Penalty
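\Omega(\theta)=\sum_{l=1}^{k}\left[\rho\log\frac{\rho}{\hat{\rho}_l}+(1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_l}\right]
Where:
● k = number of hidden neurons
● each term is the KL divergence between Bernoulli(\rho) and Bernoulli(\hat{\rho}_l)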
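Both sparsity formulas translate directly into code. This minimal sketch assumes sigmoid hidden units, so every average activation lies strictly between 0 and 1 (the logarithms are undefined otherwise). The activation matrix H is made up for illustration. In training, Ω(θ) would be added to the reconstruction loss, usually scaled by a weighting hyperparameter, which is an assumption not fixed by these notes.

```python
import math


def average_activation(H):
    """rho_hat_l = (1/m) * sum_i h_l(x_i); H is an m x k list of hidden activations."""
    m = len(H)
    k = len(H[0])
    return [sum(h[l] for h in H) / m for l in range(k)]


def sparsity_penalty(rho, rho_hat):
    """Sum over hidden neurons of KL(Bernoulli(rho) || Bernoulli(rho_hat_l)).

    Requires 0 < rho < 1 and 0 < rho_hat_l < 1 (e.g. sigmoid activations).
    """
    return sum(rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
               for r in rho_hat)


# Hypothetical sigmoid activations: m = 4 examples, k = 3 hidden neurons.
H = [[0.010, 0.60, 0.02],
     [0.005, 0.70, 0.01],
     [0.020, 0.50, 0.03],
     [0.010, 0.80, 0.02]]
rho_hat = average_activation(H)
print(rho_hat)
print(sparsity_penalty(0.005, rho_hat))  # the neuron averaging 0.65 contributes most
```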

Advantages of Sparse AE
1. Learns meaningful features
2. Better representation learning
3. Avoids trivial copying

10. Variational Autoencoder (VAE)


VAE is an advanced generative autoencoder.
Difference:
Instead of mapping the input to a fixed vector,
the VAE maps the input to a:
Probability Distribution

Goal of VAE
1. Learn latent distribution
2. Generate new samples

Architecture of VAE
1. Encoder
2. Latent Distribution
3. Sampling
4. Decoder

Encoder in VAE
Encoder predicts:
● Mean (\mu)
● Variance (\Sigma)
of the latent distribution Q_\theta(z|x).

Assumption
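The prior over the latent variables is a standard Gaussian distribution:
P(z)=N(0,I)
The encoder's distribution Q_\theta(z|x)=N(\mu,\Sigma) is a Gaussian whose mean and variance depend on the input.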

Latent Variable
z is sampled from the learned distribution Q_\theta(z|x).
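One common way to draw this sample goes beyond these notes, so treat it as an assumption: take a diagonal covariance Σ = diag(σ²), let the encoder output log σ² for numerical stability, and compute z = μ + σ · ε with ε ~ N(0, I). This is the reparameterization trick, which lets gradients flow through μ and σ. The μ and log σ² values below are hypothetical encoder outputs.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(42)
mu = [0.5, -1.0]       # hypothetical encoder mean
log_var = [0.0, -2.0]  # hypothetical encoder log-variance (sigma = 1.0 and about 0.37)
z = sample_latent(mu, log_var, rng)
print(z)
```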

Decoder in VAE
The decoder generates a reconstructed sample from the latent variable z, modelling P_\phi(x|z).

VAE Objective
Learn:
1. Compression
2. Generation

Loss Function of VAE


Two parts:
1. Reconstruction Loss
2. KL Divergence Loss

VAE Loss Function


L_i(\theta,\phi)=-E_{z\sim Q_\theta(z|x_i)}[\log P_\phi(x_i|z)]+KL(Q_\theta(z|x_i)\,||\,P(z))
KL Divergence
Measures the difference between:
● the learned distribution Q_\theta(z|x_i) (encoder output)
● the prior P(z)=N(0,I)
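For the common choice Q_θ(z|x_i) = N(μ, diag(σ²)) and P(z) = N(0, I), the KL term has the closed form ½ Σ_j (σ_j² + μ_j² - 1 - log σ_j²). The sketch below computes it. It also assumes binary inputs with a sigmoid (Bernoulli) decoder, as in Section 5, and estimates the reconstruction term -log P_φ(x_i|z) from a single sampled z as a binary cross-entropy. All inputs are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def bernoulli_nll(x, x_hat, eps=1e-7):
    """-log P(x | z) for binary x when the decoder outputs probabilities x_hat."""
    total = 0.0
    for xi, p in zip(x, x_hat):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(p) + (1.0 - xi) * math.log(1.0 - p)
    return total


def vae_loss(x, x_hat, mu, log_var):
    """L_i = reconstruction term + KL term (one-sample estimate of the expectation)."""
    return bernoulli_nll(x, x_hat) + kl_to_standard_normal(mu, log_var)


x = [1.0, 0.0, 1.0]        # hypothetical binary input
x_hat = [0.9, 0.2, 0.7]    # hypothetical decoder output for one sampled z
mu, log_var = [0.3, -0.1], [-0.2, 0.1]
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: encoder output equals the prior
print(vae_loss(x, x_hat, mu, log_var))
```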

Why KL Divergence?
Prevents the encoder from memorizing each input as an isolated point in latent space.
Acts as:
Regularizer

Without Reconstruction Loss


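The KL term alone pulls every input's distribution onto the prior N(0,I), so the latent code carries no information about the input and the decoder cannot reconstruct it.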

Without KL Divergence
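The encoder may cheat by learning very narrow (nearly deterministic) distributions placed far apart, leaving gaps in the latent space, so points sampled from the prior decode poorly.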

With Both Terms


Model:
● reconstructs correctly
● learns smooth latent space

Applications of VAE
1. Image generation
2. Face generation
3. Data augmentation
4. Drug discovery
5. Anomaly detection

Autoencoder vs Variational Autoencoder


Autoencoder | Variational Autoencoder
Learns fixed encoding | Learns probability distribution
Mainly compression | Compression + generation
Deterministic | Probabilistic
Cannot generate diverse samples easily | Can generate new samples
Simpler architecture | More complex

Advantages of Autoencoders
1. Dimensionality reduction
2. Noise removal
3. Feature learning
4. Data compression

Applications of Autoencoders
1. Image compression
2. Recommendation systems
3. Medical imaging
4. Fraud detection
5. Feature extraction

EXAM READY 5 MARKER ANSWERS

Q1. Explain Autoencoder Architecture.


Answer
An autoencoder is a feedforward neural network used to reconstruct input data.
Architecture contains:
1. Encoder
2. Hidden layer
3. Decoder
Encoder compresses input:
h=g(Wx+b)
Decoder reconstructs input:
\hat{x}=f(W^{*}h+c)
The network minimizes reconstruction loss between input and output.
Applications:
● compression
● denoising
● feature learning

Q2. Differentiate Undercomplete and Overcomplete Autoencoders.
Answer
Undercomplete AE | Overcomplete AE
dim(h) < dim(x) | dim(h) \ge dim(x)
Learns compressed features | May learn identity mapping
Better representation learning | Needs regularization
Similar to PCA | Risk of overfitting
Undercomplete autoencoders are generally more useful for feature extraction.

Q3. Explain Denoising Autoencoder.


Answer
A denoising autoencoder reconstructs clean data from noisy input.
Steps:
1. Add noise to input.
2. Feed noisy data to encoder.
3. Decoder reconstructs original data.
Noise example:
\tilde{x}=x+N(0,1)
Advantages:
1. Robust learning
2. Better generalization
3. Noise removal
Applications:
● image denoising
● speech enhancement

Q4. Explain Sparse Autoencoder.


Answer
A sparse autoencoder encourages hidden neurons to remain inactive most of the time.
Goal:
maintain sparse hidden representation.
Average activation:
\hat{\rho}_l=\frac{1}{m}\sum_{i=1}^{m}h_l(x_i)
Sparsity penalty is added to loss function.
Advantages:
1. Meaningful features
2. Better feature extraction
3. Prevents trivial copying

Q5. Explain Variational Autoencoder (VAE).


Answer
VAE is a generative model that learns probability distributions instead of fixed encodings.
Components:
1. Encoder
2. Latent distribution
3. Sampling
4. Decoder
Encoder predicts:
● mean
● variance
Loss function:
L_i(\theta,\phi)=-E_{z\sim Q_\theta(z|x_i)}[\log P_\phi(x_i|z)]+KL(Q_\theta(z|x_i)\,||\,P(z))
Advantages:
1. Generates new samples
2. Smooth latent space
3. Strong generative capability
Applications:
● image generation
● anomaly detection

Q6. Explain Regularization in Autoencoders.
Answer
Regularization prevents overfitting and identity mapping.
Methods:
1. L2 regularization
2. Tied weights
3. Denoising
4. Sparsity constraints
L2 regularization:
L=\frac{1}{m}\sum(\hat{x}-x)^2+\lambda ||\theta||^2
Advantages:
1. Better generalization
2. Reduced overfitting
3. Improved feature learning

Q7. Explain Loss Function of VAE.


Answer
VAE loss contains two terms:
1. Reconstruction Loss
2. KL Divergence
Loss function:
L_i(\theta,\phi)=-E_{z\sim Q_\theta(z|x_i)}[\log P_\phi(x_i|z)]+KL(Q_\theta(z|x_i)\,||\,P(z))
Reconstruction loss:
ensures accurate output reconstruction.
KL divergence:
keeps the latent distribution close to the standard normal prior N(0,I).
Together they help generate meaningful and diverse samples.

These notes are based on your uploaded lecture materials for Autoencoders
and Variational Autoencoders.
