DL Lecture 18 Autoencoders

The document discusses autoencoders, a type of neural network architecture used for dimensionality reduction and data compression by encoding input data into a lower-dimensional space and then reconstructing it. It highlights the advantages of autoencoders over traditional methods like PCA, especially in handling non-linear data, and details their structure, applications, and various types such as denoising and variational autoencoders. Additionally, it covers the importance of hyperparameters and loss functions in optimizing autoencoder performance.

Uploaded by

realpokemonfan29

AUTOENCODERS

Understanding Dimensionality Reduction


❑Dimensionality reduction is an
essential technique that helps simplify
high-dimensional data while retaining
the most critical features.
❑The transformation maps data from a high-dimensional space to a low-dimensional space while retaining meaningful properties of the original data.
❑This helps in dealing with the curse of dimensionality.
Popular Methods
❑Principal Component Analysis (PCA)
❑t-SNE
❑Linear Discriminant Analysis (LDA)
PCA overview
PCA reduces the original data representation by transforming it into a low-dimensional space.
PCA expresses the data in terms of principal components, ordered so that PC1 retains more information (variance) than PC2.

Keeping only the leading components significantly reduces the data size and so mitigates the curse of dimensionality.
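The idea can be sketched with NumPy's SVD. This is a minimal illustration rather than the lecture's own code: the toy data, the variable names, and the choice to keep a single component are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data: the second feature is roughly twice the first plus noise.
x1 = rng.normal(size=200)
X = np.column_stack([x1, 2.0 * x1 + 0.1 * rng.normal(size=200)])

# Centre the data, then take the SVD; the rows of Vt are the principal directions.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each component (PC1 first).
explained = S**2 / np.sum(S**2)

# Keep only PC1: every 2-D point is now described by a single number.
Z = Xc @ Vt[:1].T                 # shape (200, 1)
X_approx = Z @ Vt[:1] + mean      # mapped back to the original 2-D space
```

Because the singular values come back in descending order, `explained[0]` is never smaller than `explained[1]`, which is the sense in which PC1 retains more information than PC2.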
Issue with PCA

Suppose the data has a non-linear structure.
In this case PCA fails to transform the data effectively, resulting in a meaningless transformation.

[Figure: the data before PCA and after PCA]
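One way to see this failure is with points on a circle, which are described by a single angle but do not lie along any straight line. The sketch below is an assumed toy example, not the dataset from the slide's figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points on a unit circle: intrinsically one-dimensional (the angle), but curved.
angle = rng.uniform(0.0, 2.0 * np.pi, size=1000)
X = np.column_stack([np.cos(angle), np.sin(angle)])

mean = X.mean(axis=0)
Xc = X - mean
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)   # variance share of PC1 and PC2

# Project onto PC1 and map back: the circle collapses onto a line segment.
X_approx = (Xc @ Vt[:1].T) @ Vt[:1] + mean
error = np.mean(np.sum((X - X_approx) ** 2, axis=1))
```

Here the two components carry nearly equal shares of the variance, so dropping either one discards roughly half of the structure, and the reconstruction error stays large.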


Tackling non-linear data reduction

For non-linear data representations, researchers introduced several methods, such as:
❑Non-linear PCA / kernel PCA
❑Autoencoders

[Figure: the data before and after the transformation]
Here PCA fails to capture the non-linear relationship, which an autoencoder can capture.
Introduction to Autoencoders
❑Introduced in the 1980s.
❑They are a type of neural network architecture designed to efficiently compress (encode) input data down to its essential features, then reconstruct (decode) the original input from this compressed representation.
❑The most traditional application was dimensionality reduction.
❑They are a form of unsupervised learning.
Applications

Today, autoencoders are primarily used for data compression.

Image compression
Image Reconstruction
Generative tasks
Understanding autoencoders

An autoencoder consists of three parts:
❑Encoder
❑Bottleneck layer
❑Decoder
Encoder
In a convolutional autoencoder, the encoder is a set of convolutional blocks followed by pooling modules that compress the input to the model into a compact representation.
The encoder compresses the input data (whether it comes from the train, validation, or test set) into an encoded representation that is typically several orders of magnitude smaller than the input data.
Bottleneck

The bottleneck (or “code”) contains the most compressed representation of the input: it is both the output layer of the encoder network and the input layer of the decoder network.
Decoder
The decoder comprises hidden layers
with a progressively larger number of
nodes that decompress (or decode) the
encoded representation of data,
ultimately reconstructing the data back to
its original, pre-encoding form.
Checking efficiency of Autoencoder
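To gauge the efficiency of an autoencoder, we calculate the reconstruction error (loss).
For an encoder g with parameters φ and a decoder f with parameters θ, a common choice is the mean squared error:
$$L(\theta, \phi) = \frac{1}{n} \sum_{i=1}^{n} \left\lVert x^{(i)} - f_\theta\left(g_\phi\left(x^{(i)}\right)\right) \right\rVert^2$$
where:
φ (phi): parameters of the encoder
θ (theta): parameters of the decoder
We sum up the squared differences between each original image x^{(i)} and its reconstructed image f_θ(g_φ(x^{(i)})).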
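As a rough sketch, the reconstruction loss can be computed as below. The single-layer encoder and decoder, the tanh activation, and all names are assumptions for illustration; the sketch takes the encoder to be g with parameters φ and the decoder to be f with parameters θ, following the composition f_θ(g_φ(x)).

```python
import numpy as np

def encoder(x, W_enc, b_enc):
    """g_phi: map inputs to the code with one tanh layer."""
    return np.tanh(x @ W_enc + b_enc)

def decoder(z, W_dec, b_dec):
    """f_theta: map the code back to input space with one linear layer."""
    return z @ W_dec + b_dec

def reconstruction_loss(x, phi, theta):
    """Mean over samples of the squared error between x and f_theta(g_phi(x))."""
    x_hat = decoder(encoder(x, *phi), *theta)
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

rng = np.random.default_rng(0)
n, d, k = 16, 784, 8                                      # samples, input size, code size
x = rng.random((n, d))
phi = (0.01 * rng.normal(size=(d, k)), np.zeros(k))       # encoder parameters
theta = (0.01 * rng.normal(size=(k, d)), np.zeros(d))     # decoder parameters
loss = reconstruction_loss(x, phi, theta)
```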
Selecting model architecture in
autoencoder
The model architecture of an autoencoder is chosen based on the task and the type of data. We can build an autoencoder using any neural network, such as:
❑CNN-based architecture
❑RNN-based architecture
❑Or a simple vanilla feed-forward NN.

However, designing an autoencoder requires choosing certain hyperparameters.


Autoencoder Hyperparameters
❑Code size - The size of the bottleneck determines how much the data is compressed. The code size can also be used as a regularization term: adjusting the code size is one way to counter overfitting or underfitting.
❑Number of layers - The depth of the autoencoder is measured by the number of layers in the encoder and decoder. More depth provides greater complexity, while less depth provides greater processing speed.
❑Number of nodes per layer - Generally, the number of nodes (or “neurons”) decreases with each encoder layer, reaches a minimum at the bottleneck, and increases with each layer of the decoder. The number of neurons may also vary with the nature of the input data: for example, an autoencoder dealing with large images would require more neurons than one dealing with smaller images.
❑Loss function - When training an autoencoder, the loss function, which measures the reconstruction loss between the output and the input, is used to optimize the model weights through gradient descent during backpropagation. The ideal loss function depends on the task the autoencoder will be used for.
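The first three hyperparameters together fix the shape of the network. A small helper like the following (a hypothetical function written for this illustration, assuming the decoder mirrors the encoder) lists the node count of every layer:

```python
def layer_sizes(input_dim, code_size, hidden):
    """Return the node count of every layer, from input to reconstruction.

    `hidden` lists the encoder's hidden-layer widths; the decoder mirrors them.
    """
    encoder = [input_dim, *hidden, code_size]
    decoder = list(reversed(encoder))[1:]   # mirror, without repeating the code layer
    return encoder + decoder

# 784 inputs, two hidden layers on each side, and a bottleneck of 8 nodes.
sizes = layer_sizes(input_dim=784, code_size=8, hidden=[128, 32])
# sizes == [784, 128, 32, 8, 32, 128, 784]
```

The widths shrink towards the bottleneck and grow again afterwards; changing `code_size` or the length of `hidden` corresponds to changing the code size or the depth.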
Working
Suppose we have an input image of size (28 x 28).
The image needs to be flattened before feeding it to the neural network.
Working
❖The flattened image has shape (784,).
❖This is passed to the encoder.
❖The output of the encoder is then fed to the bottleneck, or latent space, which should be a reduced version of the input.
❖For instance, if the number of nodes in the latent space is 8, we have compressed an image of 784 values down to just 8 values.
Working
❖The decoder network then tries to recreate the original (28 x 28) input image from the compressed state in the bottleneck.
❖Once the image is reconstructed, it is compared with the original image: the difference between the two is computed as the loss, which can then be minimized.
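The whole pipeline (flatten, encode to 8 values, decode, compute the loss, minimize it) can be sketched in NumPy. To keep the example short it assumes a purely linear autoencoder with no biases or activations, random stand-in images, and plain gradient descent with a hand-picked learning rate; a real model would normally use a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((32, 28, 28))          # hypothetical batch of 28 x 28 images
x = images.reshape(len(images), -1)        # flatten: (32, 784)

code_size = 8
W_enc = 0.01 * rng.normal(size=(784, code_size))
W_dec = 0.01 * rng.normal(size=(code_size, 784))

def mse(a, b):
    """Mean squared difference over all pixels."""
    return np.mean((a - b) ** 2)

lr = 0.1
losses = []
for step in range(300):
    z = x @ W_enc                          # bottleneck: (32, 8)
    x_hat = z @ W_dec                      # reconstruction: (32, 784)
    losses.append(mse(x_hat, x))

    # Gradients of the MSE with respect to both weight matrices.
    grad_out = 2.0 * (x_hat - x) / x.size
    grad_W_dec = z.T @ grad_out
    grad_W_enc = x.T @ (grad_out @ W_dec.T)

    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

# Reshape the final reconstructions back into images.
reconstruction = (x @ W_enc @ W_dec).reshape(-1, 28, 28)
```

The list `losses` records the reconstruction loss at every step, so it can be inspected to confirm that training is reducing it.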
Using reconstruction loss for anomaly
detection!
Another application of autoencoders is detecting outliers (anomalies).
Example
Suppose an autoencoder is trained on a dataset of 1000 data points with a two-dimensional circle distribution.
After training, the reconstruction error for each data point is calculated using MSE (mean squared error). Assume the threshold for outlier detection is set at the mean plus 1.5 times the standard deviation of the reconstruction errors.
❑Mean reconstruction error (MSE): 0.007 (hypothetical value)
❑Standard deviation of reconstruction errors: 0.003 (hypothetical value)
To detect outliers
Threshold Calculation:
❑Threshold = Mean + 1.5 × Standard deviation
❑Threshold = 0.007 + 1.5 × 0.003
❑Threshold = 0.0115

Identifying Outliers
Data points with reconstruction errors greater than 0.0115 would be considered outliers.
In other words, a data point is flagged as an outlier when its reconstruction error is greater than the threshold.
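The thresholding rule is easy to express in code. The sketch below uses only the Python standard library; the list of reconstruction errors is made up for illustration, and the function names are hypothetical.

```python
import statistics

def outlier_threshold(errors, k=1.5):
    """Threshold = mean + k * standard deviation of the reconstruction errors."""
    return statistics.mean(errors) + k * statistics.pstdev(errors)

def find_outliers(errors, k=1.5):
    """Return the indices whose reconstruction error exceeds the threshold."""
    threshold = outlier_threshold(errors, k)
    return [i for i, e in enumerate(errors) if e > threshold]

# Hypothetical per-point reconstruction errors; the last point reconstructs badly.
errors = [0.006, 0.007, 0.008, 0.007, 0.006, 0.008, 0.007, 0.030]
outliers = find_outliers(errors)   # indices of the flagged points
```

With these values only the last point is flagged. The multiplier `k` controls how strict the rule is: a larger value flags fewer points.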
Variants of Autoencoders
❑Denoising Autoencoder
❑Sparse Autoencoder
❑Variational Autoencoder
Denoising Autoencoder
A denoising autoencoder works on a partially corrupted input and is trained to recover the original, undistorted image. This is an effective way to prevent the network from simply copying the input, so that it learns the underlying structure and important features of the data.
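A minimal sketch of the training signal, assuming Gaussian noise as the corruption and pixel values in [0, 1] (both are choices made for this example; masking pixels is another common corruption):

```python
import numpy as np

def corrupt(x, noise_std=0.2, rng=None):
    """Add Gaussian noise and clip back to the valid pixel range [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    return np.clip(x + noise_std * rng.normal(size=x.shape), 0.0, 1.0)

def denoising_loss(autoencoder, x_clean, rng=None):
    """Reconstruct from the corrupted input, but score against the clean input."""
    x_noisy = corrupt(x_clean, rng=rng)
    x_hat = autoencoder(x_noisy)
    return np.mean((x_hat - x_clean) ** 2)

rng = np.random.default_rng(0)
x_clean = rng.random((4, 784))

def identity(x):
    """A stand-in network that just copies its input."""
    return x

copy_loss = denoising_loss(identity, x_clean, rng=rng)
```

Because the target is the clean image, a network that only copies its (noisy) input is left with a non-zero loss, which is why plain copying is no longer a good solution.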
Sparse Autoencoder
In a standard autoencoder, the network learns to encode and decode data without any constraints on the hidden layer's activations. Sparse autoencoders modify this behavior by adding a sparsity constraint, which allows only a small number of hidden units to be active at a time. This encourages the network to discover more meaningful and interpretable features.
Sparse Autoencoder
The sparsity constraint can be enforced through various techniques:

1. L1 Regularization: Adds a penalty proportional to the absolute values of the hidden activations.

2. KL Divergence: Measures the difference between the average activation of the hidden neurons and a target sparsity level.
The objective function for a sparse autoencoder can be expressed as follows:
$$L = \lVert X - \hat{X} \rVert^2 + \lambda \cdot \mathrm{Penalty}(s)$$
where:
•X: Input data.
•X̂: Reconstructed output.
•λ: Regularization parameter.
•Penalty(s): A function that penalizes deviations from sparsity, often implemented using KL-divergence.
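The sparse objective can be sketched as follows. The function names, the target sparsity `rho`, and the value of `lam` are assumptions for illustration; both penalties here act on the hidden activations `h`, and the KL version assumes those activations lie in (0, 1), as with sigmoid units.

```python
import numpy as np

def l1_penalty(h):
    """Mean absolute activation of the hidden units."""
    return np.mean(np.abs(h))

def kl_penalty(h, rho=0.05, eps=1e-8):
    """KL divergence between the target sparsity rho and each unit's mean activation."""
    rho_hat = np.clip(h.mean(axis=0), eps, 1 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_loss(x, x_hat, h, lam=1e-3, penalty=kl_penalty):
    """Reconstruction error plus lambda times the sparsity penalty."""
    return np.mean(np.sum((x - x_hat) ** 2, axis=1)) + lam * penalty(h)

h_sparse = np.full((10, 4), 0.05)   # every unit already at the target sparsity
h_dense = np.full((10, 4), 0.50)    # every unit active half of the time
```

The KL penalty is zero for `h_sparse` and positive for `h_dense`, so units that are active too often raise the loss.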
Variational Autoencoder (VAE)
❖Used for data generation.
❖The basic idea behind the VAE is that instead of mapping an input to a fixed vector, the input is mapped to a distribution.
❖The primary difference between an AE and a VAE is that the bottleneck of the VAE is continuous and is replaced by two separate vectors: one representing the means of the distribution, and the other representing the standard deviations of the distribution.
Variational Autoencoder (VAE)
❖The loss function of the VAE is defined by two terms: the reconstruction loss and a regularizer, which is essentially the KL divergence between the encoder’s distribution and the prior over the latent space.
Variational Autoencoder (VAE)
❖KL divergence:
Kullback-Leibler divergence (D_KL for short) is a measure of how one probability distribution differs from another. For discrete probability distributions P and Q, the KL divergence between P and Q is defined as:
$$D_{KL}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$
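For discrete distributions given as lists of probabilities, the definition translates directly into code. This is a small standard-library sketch; the two example distributions are made up.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x)), in nats.

    Terms with P(x) == 0 contribute nothing; Q(x) must be > 0 wherever P(x) > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
```

The divergence is zero only when the two distributions match, and it is not symmetric: `d_pq` and `d_qp` differ.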
Working
A variational autoencoder uses KL-divergence as part of its loss function; the goal is to minimize the difference between an assumed distribution and the original distribution of the dataset.

Suppose we have a latent variable z and we want to generate the observation x from it. To do so, we first need to infer z from x; in other words, we want to calculate the posterior p(z|x).
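Working
We can do this using Bayes' rule:
$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}$$

However, calculating p(x) can be quite difficult, because it requires integrating p(x|z) p(z) over all possible values of z.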
Hence, we approximate p(z|x) by a tractable distribution q(z|x). To make q(z|x) a good approximation of p(z|x), we minimize the KL-divergence, which measures how different two distributions are:
$$\min_{q} \; D_{KL}\left(q(z \mid x) \parallel p(z \mid x)\right)$$
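The step from this minimization to an objective that avoids p(x) can be sketched as follows. This is the standard derivation rather than something recovered from the slides; it assumes q(z|x) is the approximate posterior and p(z) is the prior over z, and it expands p(z|x) with Bayes' rule.

```latex
\begin{aligned}
D_{KL}\big(q(z \mid x) \parallel p(z \mid x)\big)
  &= \mathbb{E}_{q(z \mid x)}\big[\log q(z \mid x) - \log p(z \mid x)\big] \\
  &= \mathbb{E}_{q(z \mid x)}\big[\log q(z \mid x) - \log p(x \mid z) - \log p(z)\big] + \log p(x) \\
  &= D_{KL}\big(q(z \mid x) \parallel p(z)\big)
     - \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] + \log p(x)
\end{aligned}
```

Since log p(x) does not depend on q, minimizing the left-hand side is the same as maximizing E_q[log p(x|z)] minus D_KL(q(z|x) ‖ p(z)).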
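Working
By simplifying, the above minimization problem is equivalent to the following maximization problem:
$$\max_{q} \; \mathbb{E}_{q(z \mid x)}\left[\log p(x \mid z)\right] - D_{KL}\left(q(z \mid x) \parallel p(z)\right)$$

The first term represents the reconstruction likelihood, and the other term ensures that our learned distribution q is similar to the true prior distribution p.
Thus our total loss consists of two terms: one is the reconstruction error and the other is the KL-divergence loss:
$$L = -\mathbb{E}_{q(z \mid x)}\left[\log p(x \mid z)\right] + D_{KL}\left(q(z \mid x) \parallel p(z)\right)$$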
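A minimal NumPy sketch of these two terms is given below. It rests on several assumptions that the slides do not state: the encoder outputs a diagonal Gaussian, the prior is a standard normal, the reconstruction term is a squared error, and the encoder predicts the log-variance rather than the standard deviation (a common parameterization for numerical convenience). All function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction error plus KL term, averaged over the batch."""
    reconstruction = np.sum((x - x_hat) ** 2, axis=1)
    return np.mean(reconstruction + kl_to_standard_normal(mu, log_var))

rng = np.random.default_rng(0)
mu = np.zeros((3, 2))        # encoder means for a batch of 3, latent size 2
log_var = np.zeros((3, 2))   # log-variances (sigma = 1 everywhere)
z = reparameterize(mu, log_var, rng)
```

When the encoder's output already matches the prior (zero mean, unit variance), the KL term vanishes and only the reconstruction error remains.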
VAEs are widely used for synthetic data generation.
