Module # 3
Autoencoders: Unsupervised Learning
The need for autoencoding stems from its ability to solve
various practical problems in machine learning and data
science, such as dimensionality reduction, denoising,
feature extraction, anomaly detection, and data
compression. Additionally, autoencoders can be used for
pretraining deep networks, generating new data (in the case
of VAEs), and capturing complex, non-linear relationships in
data. They are highly versatile tools with broad applications
across a range of domains, including computer vision, natural
language processing, time-series analysis, and more.
Introduction
► Autoencoders are a specific type of feedforward neural network
trained so that the target output is the same as the input.
► They compress the input into a lower-dimensional code and then
reconstruct the output from this representation.
► The code is a compact “summary” or “compression” of the input,
also called the latent-space representation.
► An autoencoder consists of 3 components: encoder, code and
decoder.
► The encoder compresses the input and produces the code; the
decoder then reconstructs the input using only this code.
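The three components above can be sketched as a single forward pass. This is only an illustrative sketch in NumPy: the layer sizes, the tanh activation, and the names `encode` and `decode` are assumptions, not something the slides specify, and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, code_dim = 8, 3  # hypothetical sizes, with code_dim < input_dim

# Randomly initialised (untrained) weights, used only to illustrate the shapes
W_enc = rng.normal(scale=0.1, size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(scale=0.1, size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)

def encode(x):
    """Encoder: map the input to the lower-dimensional code."""
    return np.tanh(x @ W_enc + b_enc)

def decode(code):
    """Decoder: reconstruct the input using only the code."""
    return code @ W_dec + b_dec

x = rng.normal(size=(5, input_dim))  # a batch of 5 inputs
code = encode(x)                     # shape (5, 3): the latent-space representation
x_hat = decode(code)                 # shape (5, 8): the reconstruction
reconstruction_error = np.mean((x_hat - x) ** 2)
```

Training would adjust the weights to reduce `reconstruction_error`; here the value is large simply because nothing has been learned yet.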
► To build an autoencoder we need 3 things: an encoding method,
decoding method, and a loss function to compare the output with the
target.
► Autoencoders are mainly a dimensionality reduction (or compression)
algorithm with a couple of important properties:
► Data-specific: Autoencoders are only able to meaningfully compress
data similar to what they have been trained on. Since they learn
features specific to the given training data, they differ from a
standard data compression algorithm like gzip. So we can’t expect an
autoencoder trained on handwritten digits to compress landscape
photos.
► Lossy: The output of the autoencoder will not be exactly the same as
the input; it will be a close but degraded representation. If you want
lossless compression, they are not the way to go.
► Unsupervised: To train an autoencoder we don’t need to do anything
fancy, just throw the raw input data at it. Autoencoders are
considered an unsupervised learning technique since they don’t need
explicit labels to train on. But to be more precise they are
self-supervised because they generate their own labels from the
training data.
Training Autoencoders
► When you're building an autoencoder, there are a few things to keep in
mind.
► First, the code or bottleneck size is the most critical hyperparameter to
tune. It decides how strongly the data is compressed, and it
can also act as a regularisation term.
► Secondly, the number of layers matters. A greater depth increases model
capacity, while a shallower network is faster to train and run.
► Thirdly, you should pay attention to how many nodes you use per layer.
The number of nodes typically decreases with each subsequent layer of the
encoder, as the input to each layer becomes smaller, and increases again
layer by layer in the decoder.
► Finally, two commonly used reconstruction losses are the MSE loss and
the L1 loss.
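The two reconstruction losses and the shrinking-then-growing layer widths mentioned above can be illustrated as follows. This is a small sketch, not a prescribed setup: the function names, the toy arrays, and the layer widths in `layer_sizes` are hypothetical.

```python
import numpy as np

def mse_loss(x, x_hat):
    """Mean squared error: penalises large deviations more heavily."""
    return float(np.mean((x - x_hat) ** 2))

def l1_loss(x, x_hat):
    """Mean absolute error (L1): grows linearly with the deviation."""
    return float(np.mean(np.abs(x - x_hat)))

x = np.array([[0.0, 1.0, 2.0, 3.0]])
x_hat = np.array([[0.0, 1.0, 2.0, 5.0]])  # one element is off by 2

print(mse_loss(x, x_hat))  # 1.0
print(l1_loss(x, x_hat))   # 0.5

# Hypothetical layer widths: shrink towards the code, then expand again
layer_sizes = [784, 128, 32, 128, 784]
code_size = min(layer_sizes)  # the bottleneck, here 32
```

The single large error contributes 4 to the squared sum but only 2 to the absolute sum, which is why MSE reacts more strongly to outliers than L1.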
Types of Autoencoders
► Linear
► Undercomplete
► Overcomplete
► Sparse
► Contractive
► Denoising
Linear Autoencoders
► A linear autoencoder is a type of autoencoder that uses only linear
transformations. In other words, the encoder and decoder are
composed of only linear layers.
► The advantage of using a linear autoencoder is that it is
computationally efficient and can be trained on large datasets.
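A linear autoencoder is small enough to train with plain gradient descent in NumPy. The sketch below is one possible setup under stated assumptions: the synthetic data, the learning rate, the step count, and all variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 200 samples in 5 dimensions
# that actually lie on a 2-dimensional linear subspace.
n_samples, input_dim, code_dim = 200, 5, 2
X = rng.normal(size=(n_samples, code_dim)) @ rng.normal(size=(code_dim, input_dim))
X -= X.mean(axis=0)  # centre each feature
X /= X.std()         # rescale so the overall variance is 1

# Encoder and decoder are single linear maps with no activation function
W_enc = rng.normal(scale=0.1, size=(input_dim, code_dim))
W_dec = rng.normal(scale=0.1, size=(code_dim, input_dim))

learning_rate = 0.01
losses = []
for step in range(5000):
    code = X @ W_enc               # linear encoding
    residual = code @ W_dec - X    # reconstruction minus target (the input itself)
    losses.append(np.sum(residual ** 2) / n_samples)

    # Gradients of the mean squared reconstruction error
    grad_dec = 2.0 / n_samples * code.T @ residual
    grad_enc = 2.0 / n_samples * X.T @ residual @ W_dec.T
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec
```

Because the toy data is exactly two-dimensional and the code has two units, the reconstruction loss should fall far below its starting value; on real data a linear model can only capture linear structure.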
Undercomplete Autoencoders
► Copying the input to the output may sound useless, but we are
typically not interested in the output of the decoder.
► Instead, we hope that training the autoencoder to perform the input
copying task will result in the network taking on useful properties.
► One way to obtain useful features from the autoencoder is to
constrain the code to have a smaller dimension than the input.
► An autoencoder whose code dimension is less than the input
dimension is called undercomplete.
► Learning an undercomplete representation forces the autoencoder to
capture the most salient features of the training data.
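The undercomplete setting can be written compactly. The notation here is an assumption borrowed from common textbook usage, since the slides define no symbols: $f$ is the encoder, $g$ the decoder, $h$ the code, and $L$ a reconstruction loss such as MSE.

```latex
h = f(x), \qquad \hat{x} = g(h), \qquad
\min_{f,\,g} \; L\bigl(x,\; g(f(x))\bigr)
\quad \text{with} \quad \dim(h) < \dim(x)
```

The dimension constraint is what prevents the network from simply passing every input through unchanged.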
Overcomplete Autoencoders
► Overcomplete is a case in which the hidden code has dimension
greater than the input.
► In these cases, even a linear encoder and linear decoder can
learn to copy the input to the output without learning anything
useful about the data distribution.
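The copying problem is easy to demonstrate by hand. In the sketch below (sizes and values are hypothetical), a linear encoder pads the input with zeros and a linear decoder drops the padding, so the reconstruction is perfect even though nothing about the data has been learned.

```python
import numpy as np

input_dim, code_dim = 3, 5  # hypothetical sizes, with code_dim > input_dim

# Encoder pads the input with zeros; decoder discards the padding
W_enc = np.eye(input_dim, code_dim)  # shape (3, 5)
W_dec = np.eye(code_dim, input_dim)  # shape (5, 3)

x = np.array([[1.0, -2.0, 0.5]])
code = x @ W_enc      # [[ 1.  -2.   0.5  0.   0. ]]
x_hat = code @ W_dec  # identical to x

print(np.array_equal(x_hat, x))  # True
```

Zero reconstruction error is therefore not evidence of a useful representation when the code is larger than the input.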
Regularized Autoencoders
► Regularized autoencoders provide the ability to choose the code
dimension and the capacity of the encoder and decoder based on the
complexity of the distribution to be modeled.
► Rather than limiting the model capacity by keeping the encoder and
decoder shallow and the code size small, regularized autoencoders
use a loss function that encourages the model to have other
properties besides the ability to copy its input to its output.
► These other properties include sparsity of the representation,
smallness of the derivative of the representation, and robustness to
noise or to missing inputs.
► A regularized autoencoder can be nonlinear and overcomplete but
still learn something useful about the data distribution even if the
model capacity is great enough to learn a trivial identity function.
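In general form, a regularized autoencoder adds a penalty to the reconstruction loss. The symbols are assumptions following common textbook notation ($f$ encoder, $g$ decoder, $h = f(x)$ the code, $\lambda$ a weighting hyperparameter), not notation taken from the slides.

```latex
\min_{f,\,g} \; L\bigl(x,\; g(f(x))\bigr) \;+\; \lambda\, \Omega(h, x)
```

Different choices of $\Omega$ give the properties listed above, for example a penalty on the size of the activations for sparsity, or a penalty on the derivatives of $h$ with respect to $x$ for a contractive model.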
Denoising Autoencoders
► Keeping the code layer small compels the autoencoder to learn an
intelligent representation of the data.
► There is another way to force the autoencoder to learn useful
features, which is adding random noise to its inputs and making it
recover the original noise-free data.
► This way the autoencoder can’t simply copy the input to its output
because the input also contains random noise.
► We are asking it to subtract the noise and produce the underlying
meaningful data. This is called a denoising autoencoder.
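The key detail is that the noisy version goes in while the clean version serves as the target. A minimal sketch, assuming additive Gaussian noise and an MSE loss (the slides prescribe neither), with hypothetical function names:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x_clean, noise_std=0.3):
    """Return a noisy copy of the input (additive Gaussian noise)."""
    return x_clean + rng.normal(scale=noise_std, size=x_clean.shape)

def denoising_loss(reconstruct, x_clean, noise_std=0.3):
    """MSE between the reconstruction of the noisy input and the clean target."""
    x_noisy = corrupt(x_clean, noise_std)
    x_hat = reconstruct(x_noisy)
    return float(np.mean((x_hat - x_clean) ** 2))

x_clean = np.ones((4, 6))  # toy "clean" data

# A model that merely copies its input is penalised for passing the noise through
copy_loss = denoising_loss(lambda x: x, x_clean)

# A hypothetical perfect denoiser for this toy data reaches zero loss
perfect_loss = denoising_loss(lambda x: np.ones_like(x), x_clean)
```

Here `reconstruct` stands in for the full encoder-decoder pair; in practice fresh noise is usually drawn each time a training example is presented.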
Dr. Tatwadarshi P. N.
Denoising Autoencoders
► Advantages:
► This type of autoencoder can extract important features and reduce
the noise or the useless features.
► Denoising autoencoders can be used as a form of data augmentation:
the restored images can serve as augmented data, thus generating
additional training samples.
► Disadvantages:
► Selecting the right type and level of noise to introduce can be
challenging and may require domain knowledge.
► The denoising process can result in the loss of some information that is
needed from the original input. This loss can impact the accuracy of the
output.
Sparse Autoencoders
► We introduced two ways to force the autoencoder to learn useful
features: keeping the code size small and denoising autoencoders.
The third method is using regularization.
► We can regularize the autoencoder by using a sparsity constraint such
that only a fraction of the nodes would have nonzero values, called
active nodes.
► In particular, we add a penalty term to the loss function such that
only a fraction of the nodes become active.
► This forces the autoencoder to represent each input as a combination
of a small number of nodes, and requires it to discover interesting
structure in the data.
► This method works even if the code size is large, since only a small
subset of the nodes will be active at any time.
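One common way to write such a penalty is an L1 term on the code activations. The slides do not say which penalty is intended (a KL-divergence term on average activations is another well-known option), so the sketch below is an assumption, as are the function name and the `sparsity_weight` value.

```python
import numpy as np

def sparse_autoencoder_loss(x, x_hat, code, sparsity_weight=1e-3):
    """Reconstruction MSE plus an L1 penalty on the code activations."""
    reconstruction = np.mean((x - x_hat) ** 2)
    sparsity_penalty = np.mean(np.abs(code))  # small when few nodes are active
    return float(reconstruction + sparsity_weight * sparsity_penalty)

x = np.array([[1.0, 2.0]])
x_hat = np.array([[1.0, 2.0]])  # perfect reconstruction, so only the penalty differs

dense_code = np.array([[0.5, 0.5, 0.5, 0.5]])   # every node active
sparse_code = np.array([[1.0, 0.0, 0.0, 0.0]])  # a single active node

dense_loss = sparse_autoencoder_loss(x, x_hat, dense_code)
sparse_loss = sparse_autoencoder_loss(x, x_hat, sparse_code)
```

With equal reconstruction quality, the code with a single active node receives the lower loss, which is the pressure that drives the network towards sparse representations.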
Sparse Autoencoders
► Advantages:
► The sparsity constraint in sparse autoencoders helps in filtering out
noise and irrelevant features during the encoding process.
► These auto-encoders often learn important and meaningful features
due to their emphasis on sparse activations.
► Disadvantages:
► The choice of hyperparameters plays a significant role in the
performance of this autoencoder. Different inputs should result in the
activation of different nodes of the network.
► Applying the sparsity constraint increases computational
complexity.
Contractive Autoencoders
► The contractive autoencoder was proposed by researchers at the
University of Montreal in 2011 in the paper Contractive Auto-Encoders:
Explicit Invariance During Feature Extraction.
► The idea is to make the learned representation robust to small
changes in the training inputs.
► To achieve this robustness, which basic autoencoders lack,
the authors proposed adding another penalty term to the loss
function of the autoencoder.
► This penalty term is the squared Frobenius norm of the Jacobian of the
encoder activations with respect to the input.
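In the cited paper the penalty is the squared Frobenius norm of the Jacobian of the encoder activations with respect to the input. For a single sigmoid encoder layer this has a simple closed form, sketched below; the sizes, the weights, and the function names are hypothetical, and the finite-difference Jacobian is included only as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, code_dim = 4, 3  # hypothetical sizes
W = rng.normal(size=(code_dim, input_dim))
b = np.zeros(code_dim)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    """Single-layer sigmoid encoder: h = sigmoid(W x + b)."""
    return sigmoid(W @ x + b)

def contractive_penalty(x):
    """Squared Frobenius norm of dh/dx for the sigmoid encoder, in closed form."""
    h = encode(x)
    # Row j of the Jacobian is h_j * (1 - h_j) * W[j, :]
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))

def numerical_jacobian(x, eps=1e-6):
    """Central finite-difference Jacobian, used only to check the closed form."""
    J = np.zeros((code_dim, input_dim))
    for i in range(input_dim):
        step = np.zeros(input_dim)
        step[i] = eps
        J[:, i] = (encode(x + step) - encode(x - step)) / (2 * eps)
    return J

x = rng.normal(size=input_dim)
closed_form = contractive_penalty(x)
finite_diff = float(np.sum(numerical_jacobian(x) ** 2))
```

During training this value, scaled by a weighting hyperparameter, would be added to the reconstruction loss, so that the code changes little when the input is perturbed slightly.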
Applications of Autoencoders
► 1. File Compression: A primary use of autoencoders is reducing
the dimensionality of input data, which is commonly referred to as file
compression. Autoencoders work with many kinds of data, such as images,
video, and audio, and the compressed representation can be shared and
viewed faster than the original file.
► 2. Image De-noising: Autoencoders are also used for noise removal
(image de-noising). What makes them a good choice for
de-noising is that they do not require any human interaction: once trained
on a given kind of data, they can reproduce that data with less noise than the
original image.
► 3. Image Transformation: Autoencoders are also used for image
transformations, a task typically classified under GAN (Generative
Adversarial Network) models. Using these, we can transform B/W images to
colored ones and vice versa, up-sample and down-sample the input
data, etc.