Deep Learning
Dr. Arundhati Das
© Usage of these slides on any media without permission of Dr. Arundhati Das is strictly prohibited.
Image Courtesy: Internet
Module III: Autoencoders: Unsupervised
Learning
3.1 Introduction, Linear Autoencoder, Undercomplete
Autoencoder, Overcomplete Autoencoders, Regularization in
Autoencoders
3.2 Denoising Autoencoders, Sparse Autoencoders,
Contractive Autoencoders
3.3 Application of Autoencoders: Image Compression
Introduction: Auto-encoders
• An autoencoder is a special type of deep feedforward neural
network which does the following:
• Encodes its input x into a hidden representation h
• Decodes this hidden representation into a reconstruction x’
of the input
• The model is trained to minimize a loss function which
ensures that x’ is close to x
• An autoencoder contains an encoder and a decoder. The
prefix “auto” (self) reflects that the network learns to
encode and reconstruct its own input.
• The basic idea behind autoencoders is to encode the input
data into a lower-dimensional representation (called the
latent space) and then decode it back into the original
format, with the objective of minimizing the reconstruction
error.
• An autoencoder is an unsupervised learning algorithm that
applies backpropagation, setting the target values to be equal
to the inputs.
• Applications: dimensionality reduction, data compression, and
data reconstruction (e.g. data denoising).
[Figure: x → h → x’ (input, hidden representation, reconstruction)]
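The encode and decode steps described on this slide can be written compactly. This is only a sketch in assumed notation: the weights W and W*, the biases b and c, and the number of training examples m are not given in the slides; g and f are taken to be the encoder and decoder activation functions, and squared error is just one possible choice of loss.

```latex
h = g(W x + b), \qquad
x' = f(W^{*} h + c), \qquad
\min_{W,\, W^{*},\, b,\, c} \; \frac{1}{m} \sum_{i=1}^{m} \lVert x'_i - x_i \rVert^2
```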
Architecture and components of auto-encoders (encoder and decoder)
• Autoencoders are simple networks whose target output
is their own input.
• Their goal is to learn how to reconstruct the
input data.
• The first part of the network is what we refer to
as the Encoder.
• It receives the input and encodes it in a latent
space of a lower dimension.
• The second part (the Decoder) takes that latent vector
and decodes it in order to reconstruct the original
input.
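The encoder and decoder roles above can be sketched as a single forward pass. This is a minimal illustration, not a trained model: the layer sizes, the tanh activation, the random weights, and the names `encode` and `decode` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_latent = 8, 3  # hypothetical sizes; the latent space is smaller

# Randomly initialised (untrained) parameters, for illustration only.
W_enc = rng.normal(scale=0.1, size=(n_latent, n_input))
b_enc = np.zeros(n_latent)
W_dec = rng.normal(scale=0.1, size=(n_input, n_latent))
b_dec = np.zeros(n_input)


def encode(x):
    """Encoder: map the input to the lower-dimensional latent vector h."""
    return np.tanh(W_enc @ x + b_enc)


def decode(h):
    """Decoder: map the latent vector back to the input dimension."""
    return W_dec @ h + b_dec


x = rng.normal(size=n_input)
h = encode(x)        # compressed representation, shape (3,)
x_rec = decode(h)    # reconstruction, same shape as x
reconstruction_error = np.mean((x_rec - x) ** 2)
```

Training would adjust the four parameter arrays so that `reconstruction_error` becomes small across the dataset.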
Architecture: encoder, latent space, decoder
• ENCODER:
• Encoding is achieved by the encoder part of the
network, which has a decreasing number of hidden
units in each layer.
• In this way, the encoder is forced to pick up only the
most significant and representative features of the
data.
• For example, in a convolutional autoencoder this can be
implemented with a series of pooling layers, each one
reducing the dimensions of the data.
• LATENT SPACE:
• Thus, the encoder transforms the high-dimensional input
into a lower-dimensional latent space, where the input is
represented in compressed form.
• The latent vector in the middle is crucial, as it is
a compressed representation of the input.
• It enables applications such as compression and
dimensionality reduction.
• DECODER:
• The decoder uses the latent vector to reconstruct the
input, or to produce a slightly different or cleaner
version of it. This gives rise to applications in data
denoising and data augmentation.
Components of auto-encoders
• An autoencoder comprises four components: the encoder, the
latent space, the decoder and the loss function.
• 1. Encoder: The input data is first passed through an encoder
network, which consists of one or more layers of neurons.
These layers progressively reduce the dimensionality of the
data, creating a compressed representation (latent space) of
the input. The last layer of the encoder typically has fewer
neurons than the input layer, forcing it to capture essential
features and patterns in the data.
• 2. Latent Space: The latent space holds the compressed,
lower-dimensional representation of the input data.
This representation should ideally capture the most salient
features of the data. It is also known as the bottleneck or code.
• 3. Decoder: The compressed representation is then passed through a decoder network, which aims to
reconstruct the original input from the compressed representation. Like the encoder, the decoder network
consists of one or more layers, and the final layer's output should match the input data's dimensions.
• 4. Loss Function: The performance of the autoencoder is evaluated using a loss function, which quantifies how well
the reconstructed output matches the input. Common loss functions include mean squared error (MSE) or cross-
entropy, depending on the nature of the data.
• A smaller loss means the autoencoder is reconstructing the data more accurately.
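The two loss functions named above can be sketched in plain Python. The function names, the clipping constant `eps`, and the toy vectors are assumptions for illustration; cross-entropy is shown in its binary form, which presumes inputs and reconstructions in [0, 1].

```python
import math


def mse(x, x_rec):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def binary_cross_entropy(x, x_rec, eps=1e-12):
    """Binary cross-entropy; assumes values of x and x_rec lie in [0, 1]."""
    total = 0.0
    for a, b in zip(x, x_rec):
        b = min(max(b, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= a * math.log(b) + (1.0 - a) * math.log(1.0 - b)
    return total / len(x)


x = [0.0, 1.0, 1.0, 0.0]                  # toy binary input
close_rec = [0.1, 0.9, 0.8, 0.2]          # reconstruction near the input
far_rec = [0.6, 0.4, 0.5, 0.7]            # reconstruction far from the input

print(mse(x, close_rec), mse(x, far_rec))
print(binary_cross_entropy(x, close_rec), binary_cross_entropy(x, far_rec))
```

Under both losses the closer reconstruction scores lower, which is the sense in which a smaller loss indicates a better reconstruction.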
Undercomplete Autoencoder (dim(h) < dim(x))
• It is an autoencoder where the hidden layer has fewer units than the input
layer. The model compresses the input data into a lower-dimensional space
and then attempts to reconstruct the original input from this compressed
representation.
• If we are able to reconstruct x perfectly from h, then h can be termed a
loss-free encoding of x, meaning h captures all the characteristics of x.
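A small experiment can illustrate near loss-free encoding in an undercomplete autoencoder. This is a sketch under strong assumptions: the toy data lie exactly on a 2-D subspace of a 4-D space, the autoencoder is linear with no biases, and the learning rate and step count are hand-picked. The gradients are written out by hand, with the inputs themselves used as targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 points in 4-D lying exactly on a 2-D subspace.
basis = np.array([[1.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0]]) / np.sqrt(2.0)
X = rng.normal(size=(200, 2)) @ basis            # shape (200, 4)

n_input, n_hidden = 4, 2                          # undercomplete: 2 < 4
W_enc = rng.normal(scale=0.1, size=(n_input, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_input))


def loss(X, W_enc, W_dec):
    """Mean (over samples) squared reconstruction error."""
    X_rec = X @ W_enc @ W_dec
    return np.mean(np.sum((X_rec - X) ** 2, axis=1))


initial_loss = loss(X, W_enc, W_dec)

lr = 0.05
for _ in range(3000):
    H = X @ W_enc                  # hidden codes, shape (200, 2)
    E = H @ W_dec - X              # reconstruction error; the target is X itself
    grad_dec = 2.0 * H.T @ E / len(X)
    grad_enc = 2.0 * X.T @ E @ W_dec.T / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = loss(X, W_enc, W_dec)
print(initial_loss, final_loss)
```

Because the data here truly have only two degrees of freedom, a two-unit code can in principle reconstruct them exactly; real data rarely satisfy this, so some reconstruction error usually remains.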
Overcomplete Autoencoder (dim(h) > dim(x))
• It is an autoencoder where the hidden layer has more units than the
input layer. The model expands the input data into a higher-
dimensional space, which allows for a potentially richer and more
detailed representation of the data.
• Overcomplete autoencoders do make sense, but only when
combined with constraints that stop them from just copying the
input. With proper regularization, they can capture more nuanced,
high-dimensional structure in the data than an undercomplete
one.
• One such constraint is sparsity: encourage only a few neurons in
the latent vector to be active for any given input.
• This forces the network to learn a distributed, compressed-like
representation even though the hidden layer is large.
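One common way to encourage few active neurons is to add an L1 penalty on the hidden activations to the reconstruction loss. The sketch below assumes that particular penalty (other choices, such as a KL-divergence term, exist); the function name, the weight `lam`, and the toy vectors are hypothetical.

```python
def sparse_autoencoder_loss(x, x_rec, h, lam=0.1):
    """Reconstruction MSE plus an L1 sparsity penalty on the latent vector h."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    sparsity = sum(abs(v) for v in h)
    return reconstruction + lam * sparsity


x = [1.0, 0.0, 2.0]
x_rec = [1.0, 0.0, 2.0]                         # both codes reconstruct perfectly

# Overcomplete latent vectors (6 units for a 3-dimensional input).
sparse_code = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0]    # one active neuron
dense_code = [0.4, 0.3, 0.5, 0.2, 0.6, 0.4]     # every neuron active

print(sparse_autoencoder_loss(x, x_rec, sparse_code))
print(sparse_autoencoder_loss(x, x_rec, dense_code))
```

With equal reconstruction quality, the code with fewer active neurons receives the lower total loss, so training is pushed towards sparse codes rather than a trivial copy of the input.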
Auto-encoder advantages while doing data compression
• For data compression, autoencoders are often preferred over PCA.
• PCA makes one stringent but powerful assumption: linearity. It assumes
the structure of the dataset is linear, which is often not the case for
real-life datasets.
• PCA is linear because it can only represent data transformations as linear
combinations of the original features.
• However, an autoencoder can learn non-linear transformations with a non-linear
activation function and multiple layers.
• An autoencoder can also make use of pre-trained layers from another model,
applying transfer learning to enhance the encoder/decoder.
Autoencoder vs PCA
Autoencoder:
1. A type of neural network trained to learn an efficient compressed representation (encoding) of the input data, and then reconstruct it.
2. Can model non-linear relationships between features (if using non-linear activations like ReLU, Sigmoid, Tanh).
3. Requires training using backpropagation to minimize reconstruction error.
4. Produces learned features (codes) in the bottleneck layer.
5. More computationally expensive (gradient descent, multiple epochs).
PCA:
1. A linear statistical method that projects data into a lower-dimensional space while preserving as much variance as possible.
2. Always linear, transforms data using orthogonal basis vectors (principal components).
3. No training, computed directly using eigendecomposition or SVD.
4. Produces principal components (uncorrelated features) ranked by variance explained.
5. Less computationally expensive (closed-form solution).
Note: A linear autoencoder with no activation function and mean squared error loss will learn the same subspace
as PCA.
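The wording "same subspace" (rather than "same components") in the note can be illustrated with NumPy. This sketch trains nothing: it builds PCA from the SVD, views it as a linear encoder/decoder pair, and then constructs a second linear autoencoder by mixing the code with an arbitrary invertible matrix `A`. The toy data, `k`, and `A` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy correlated data (assumption), centred because PCA assumes zero mean.
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)

k = 2
# PCA via SVD: the rows of Vt are the principal directions.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T                                   # shape (5, k)

# PCA viewed as a linear autoencoder: encoder V_k, decoder V_k^T.
codes_pca = X @ V_k
X_rec_pca = codes_pca @ V_k.T

# Another linear autoencoder spanning the same subspace: mix the code
# with an invertible k x k matrix A and undo the mixing in the decoder.
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])
W_enc = V_k @ A
W_dec = np.linalg.inv(A) @ V_k.T
codes_ae = X @ W_enc
X_rec_ae = codes_ae @ W_dec

print(np.allclose(X_rec_pca, X_rec_ae))          # same reconstructions
print(np.allclose(codes_pca, codes_ae))          # different codes
```

Both models give identical reconstructions, and hence identical MSE, while their codes differ. A linear autoencoder trained with MSE is therefore only pinned down to the principal subspace, not to the ordered, orthogonal principal components themselves.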
Choice of Activation functions
• Choice of f(xi) and g(xi)