0% found this document useful (0 votes)

5 views26 pages

Dimensionality Reduction in Machine Learning

Dimensionality reduction is a technique used to reduce the number of features in a dataset while preserving important information, addressing issues like overfitting and the curse of dimensionality in machine learning. It includes methods such as feature selection and feature extraction, with techniques like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) being commonly used. The process is crucial for improving model performance and visualization, but care must be taken as it can also lead to the loss of useful information.

Uploaded by

klalol2121

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views26 pages

Dimensionality Reduction in Machine Learning

Uploaded by

klalol2121

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

UNIT-4

Dimensionality reduction

Dimensionality reduction is a technique used to reduce the number of features in a dataset

while retaining as much of the important information as possible. In other words, it is a
process of transforming high-dimensional data into a lower-dimensional space that still
preserves the essence of the original data.
In machine learning, high-dimensional data refers to data with a large number of features or
variables. The curse of dimensionality is a common problem in machine learning, where the
performance of the model deteriorates as the number of features increases. This is because
the complexity of the model increases with the number of features, and it becomes more
difficult to find a good solution. In addition, high-dimensional data can also lead to
overfitting, where the model fits the training data too closely and does not generalize well to
new data.

Dimensionality reduction can help to mitigate these problems by reducing the complexity of
the model and improving its generalization performance. There are two main approaches to
dimensionality reduction: feature selection and feature extraction.
Feature Selection:
Feature selection involves selecting a subset of the original features that are most relevant
to the problem at hand. The goal is to reduce the dimensionality of the dataset while
retaining the most important features. There are several methods for feature selection,
including filter methods, wrapper methods, and embedded methods. Filter methods rank
the features based on their relevance to the target variable, wrapper methods use the
model performance as the criteria for selecting features, and embedded methods combine
feature selection with the model training process.

Feature Extraction:
Feature extraction involves creating new features by combining or transforming the original
features. The goal is to create a set of features that captures the essence of the original data
in a lower-dimensional space. There are several methods for feature extraction, including
principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed
stochastic neighbor embedding (t-SNE). PCA is a popular technique that projects the original
features onto a lower-dimensional space while preserving as much of the variance as
possible.

Why is Dimensionality Reduction important in Machine Learning and

Predictive Modeling?
An intuitive example of dimensionality reduction can be discussed through a simple e-mail
classification problem, where we need to classify whether the e-mail is spam or not. This can
involve a large number of features, such as whether or not the e-mail has a generic title, the
content of the e-mail, whether the e-mail uses a template, etc. However, some of these
features may overlap. In another condition, a classification problem that relies on both
humidity and rainfall can be collapsed into just one underlying feature, since both of the
aforementioned are correlated to a high degree. Hence, we can reduce the number of
features in such problems. A 3-D classification problem can be hard to visualize, whereas a

2-D one can be mapped to a simple 2-dimensional space, and a 1-D problem to a simple line.
The below figure illustrates this concept, where a 3-D feature space is split into two 2-D
feature spaces, and later, if found to be correlated, the number of features can be reduced
even further.

Components of Dimensionality Reduction

There are two components of dimensionality reduction:
 Feature selection: In this, we try to find a subset of the original set of variables, or
features, to get a smaller subset which can be used to model the problem. It usually
involves three ways:
1. Filter
2. Wrapper
3. Embedded
 Feature extraction: This reduces the data in a high dimensional space to a lower
dimension space, i.e. a space with lesser no. of dimensions.
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)
Dimensionality reduction may be both linear and non-linear, depending upon the method
used. The prime linear method, called Principal Component Analysis, or PCA, is discussed
below.

Principal Component Analysis

This method was introduced by Karl Pearson. It works on the condition that while the data in
a higher dimensional space is mapped to data in a lower dimension space, the variance of the
data in the lower dimensional space should be maximum.

It involves the following steps:

 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large
fraction of variance of the original data.
Hence, we are left with a lesser number of eigenvectors, and there might have been some data
loss in the process. But, the most important variances should be retained by the remaining
eigenvectors.

Important points:
 Dimensionality reduction is the process of reducing the number of features in a
dataset while retaining as much information as possible.
This can be done to reduce the complexity of a model, improve the performance of a
learning algorithm, or make it easier to visualize the data.
 Techniques for dimensionality reduction include: principal component analysis
(PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA).
 Each technique projects the data onto a lower-dimensional space while preserving
important information.
 Dimensionality reduction is performed during pre-processing stage before building a
model to improve the performance
 It is important to note that dimensionality reduction can also discard useful
information, so care must be taken when applying these techniques.

Linear Discriminant Analysis

Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis or
Discriminant Function Analysis, is a dimensionality reduction technique primarily utilized
in supervised classification problems. It facilitates the modeling of distinctions between
groups, effectively separating two or more classes. LDA operates by projecting features from
a higher-dimensional space into a lower-dimensional one. In machine learning, LDA serves
as a supervised learning algorithm specifically designed for classification tasks, aiming to
identify a linear combination of features that optimally segregates classes within a dataset.

For example, we have two classes and we need to separate them efficiently. Classes
can have multiple features. Using only a single feature to classify them may result in
some overlapping as shown in the below figure. So, we will keep on increasing the
number of features for proper classification.

Assumptions of LDA
LDA assumes that the data has a Gaussian distribution and that
the covariance matrices of the different classes are equal. It also assumes that the data
is linearly separable, meaning that a linear decision boundary can accurately classify
the different classes.
Suppose we have two sets of data points belonging to two different classes that we
want to classify. As shown in the given 2D graph, when the data points are plotted on
the 2D plane, there’s no straight line that can separate the two classes of data points
completely. Hence, in this case, LDA (Linear Discriminant Analysis) is used which
reduces the 2D graph into a 1D graph in order to maximize the separability between
the two classes.
Linearly Separable Dataset

Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new axis and
projects data onto a new axis in a way to maximize the separation of the two
categories and hence, reduces the 2D graph into a 1D graph.
Two criteria are used by LDA to create a new axis:
1. Maximize the distance between the means of the two classes.
2. Minimize the variation within each class.
The perpendicular distance between the line and points

In the above graph, it can be seen that a new axis (in red) is generated and plotted in
the 2D graph such that it maximizes the distance between the means of the two
classes and minimizes the variation within each class. In simple terms, this newly
generated axis increases the separation between the data points of the two classes.
After generating this new axis using the above-mentioned criteria, all the data points
of the classes are plotted on this new axis and are shown in the figure given below.

But Linear Discriminant Analysis fails when the mean of the distributions are shared,
as it becomes impossible for LDA to find a new axis that makes both classes linearly
separable. In such cases, we use non-linear discriminant analysis.
How does LDA work?
LDA works by projecting the data onto a lower-dimensional space that maximizes the
separation between the classes. It does this by finding a set of linear discriminants that
maximize the ratio of between-class variance to within-class variance. In other words,
it finds the directions in the feature space that best separates the different classes of da

Mathematical Intuition Behind LDA

Factor analysis
Factor analysis, a method within the realm of statistics and part of the general linear model
(GLM), serves to condense numerous variables into a smaller set of factors. By doing so, it
captures the maximum shared variance among the variables and condenses them into a
unified score, which can subsequently be utilized for further [Link] analysis operates
under several assumptions: linearity in relationships, absence of multicollinearity among
variables, inclusion of relevant variables in the analysis, and genuine correlations between
variables and factors. While multiple methods exist, principal component analysis stands out
as the most prevalent approach in practice.
Factorial analysis serves several purposes and objectives in statistical analysis:
1. Dimensionality Reduction: Factor analysis helps in reducing the number of variables
under consideration by identifying a smaller number of underlying factors that explain
the correlations or covariances among the observed variables. This simplification can
make the data more manageable and easier to interpret.
2. Identifying Latent Constructs: It allows researchers to identify latent constructs or
underlying factors that may not be directly observable but are inferred from patterns
in the observed data. These latent constructs can represent theoretical concepts, such
as personality traits, attitudes, or socioeconomic status.
3. Data Summarization: By condensing the information from multiple variables into a
smaller set of factors, factor analysis provides a more concise summary of the data
while retaining as much relevant information as possible.
4. Hypothesis Testing: Factor analysis can be used to test hypotheses about the
underlying structure of the data. For example, researchers may have theoretical
expectations about how variables should be related to each other, and factor analysis
can help evaluate whether these expectations are supported by the data.
5. Variable Selection: It aids in identifying which variables are most important or
relevant for explaining the underlying factors. This can help in prioritizing variables
for further analysis or for developing more parsimonious models.
6. Improving Predictive Models: Factor analysis can be used as a preprocessing step to
improve the performance of predictive models by reducing multicollinearity among
predictors and capturing the shared variance among variables more efficient.
Types of Factor Analysis
There are two main types of Factor Analysis used in data science:
1. Exploratory Factor Analysis (EFA)
Exploratory Factor Analysis (EFA) is used to uncover the underlying structure of a set of
observed variables without imposing preconceived notions about how many factors there are
or how the variables are related to each factor. It explores complex interrelationships among
items and aims to group items that are part of unified concepts or constructs.
 Researchers do not make a priori assumptions about the relationships among factors,
allowing the data to reveal the structure organically.
 Exploratory Factor Analysis (EFA) helps in identifying the number of factors needed
to account for the variance in the observed variables and understanding the
relationships between variables and factors.
2. Confirmatory Factor Analysis (CFA)
Confirmatory Factor Analysis (CFA) is a more structured approach that tests specific
hypotheses about the relationships between observed variables and latent factors based on
prior theoretical knowledge or expectations. It uses structural equation modeling techniques
to test a measurement model, wherein the observed variables are assumed to load onto
specific factors.
 Confirmatory Factor Analysis (CFA) assesses the fit of the hypothesized model to the
actual data, examining how well the observed variables align with the proposed factor
structure.
 This method allows for the evaluation of relationships between observed variables
and unobserved factors, and it can accommodate measurement error.
 Researchers hypothesize the relationships between variables and factors before
conducting the analysis, and the model is tested against empirical data to determine its
validity.



 Independent Component Analysis (ICA) is a statistical and computational

technique used in machine learning to separate a multivariate signal into its
independent non-Gaussian components. The goal of ICA is to find a linear
transformation of the data such that the transformed data is as close to
being statistically independent as possible.
 The heart of ICA lies in the principle of statistical independence. ICA
identify components within mixed signals that are statistically independent
of each other.
 Statistical Independence Concept:
 It is a probability theory that if two random variables X and Y are
statistically independent. The joint probability distribution of the pair is
equal to the product of their individual probability distributions, which
means that knowing the outcome of one variable does not change the
probability of the other outcome. 









Assumptions in ICA
1. The first assumption asserts that the source signals (original signals) are statistically
independent of each other.
2. The second assumption is that each source signal exhibits non-Gaussian distributions.
Mathematical Representation of Independent Component Analysis

The observed random vector is representing the observed data with m

components. The hidden components are represented by the random vector
where n is the number of hidden sources.
Linear Static Transformation
The observed data X is transformed into hidden components S using a linear static
transformation representation by the matrix W.

Here, W = transformation matrix.

The goal is to transform the observed data x in a way that the resulting hidden components
are independent. The independence is measured by some function The task is to
find the optimal transformation matrix W that maximizes the independence of the hidden
components.
Cocktail Party Problem
Consider Cocktail Party Problem or Blind Source Separation problem to understand the
problem which is solved by independent component analysis.
Problem: To extract independent sources’ signals from a mixed signal composed of the
signals from those sources.
Given: Mixed signal from five different independent sources.
Aim: To decompose the mixed signal into independent sources:
 Source 1
 Source 2
 Source 3
 Source 4
 Source 5
Solution: Independent Component Analysis

Here, there is a party going into a room full of people. There is ‘n’ number of speakers in that
room, and they are speaking simultaneously at the party. In the same room, there are also ‘n’
microphones placed at different distances from the speakers, which are recording ‘n’
speakers’ voice signals. Hence, the number of speakers is equal to the number of
microphones in the room.
Now, using these microphones’ recordings, we want to separate all the ‘n’ speakers’ voice
signals in the room, given that each microphone recorded the voice signals coming from each
speaker of different intensity due to the difference in distances between them.
Decomposing the mixed signal of each microphone’s recording into an independent source’s
speech signal can be done by using the machine learning technique, independent component
analysis.

where, X1, X2, …, Xn are the original signals present in the mixed signal and Y1, Y2, …, Yn
are the new features and are independent components that are independent of each other.
Locally Linear Embedding
LLE(Locally Linear Embedding) is an unsupervised approach designed to transform data
from its original high-dimensional space into a lower-dimensional representation, all while
striving to retain the essential geometric characteristics of the underlying non-linear feature
structure. LLE operates in several key steps:
 Firstly, it constructs a nearest neighbors graph to capture these local relationships.
Then, it optimizes weight values for each data point, aiming to minimize the
reconstruction error when expressing a point as a linear combination of its neighbors.
This weight matrix reflects the strength of connections between points.
 Next, LLE computes a lower dimensional representation of the data by
finding eigenvectors of a matrix derived from the weight matrix. These eigenvectors
represent the most relevant directions in the reduced space. Users can specify the
desired dimensionality for the output space, and LLE selects the top eigenvectors
accordingly.
As an illustration, consider a Swiss roll dataset, which is inherently non-linear in its high-
dimensional space. LLE, in this case, works to project this complex structure onto a lower-
dimensional plane, preserving its distinctive geometric properties throughout the
transformation process."
Mathematical Implementation of LLE Algorithm
The key idea of LLE is that locally, in the vicinity of each data point, the data lies
approximately on a linear subspace. LLE attempts to unfold or unroll the data while
preserving these local linear relationships.

Where:
 xi represents the i-th data point.
 wij are the weights that minimize the reconstruction error for data point xi using its
neighbors.
It aims to find a lower-dimensional representation of data while preserving local
relationships. The mathematical expression for LLE involves minimizing the reconstruction
error of each data point by expressing it as a weighted sum of its k nearest neighbors'
contributions. This optimization is subject to constraints ensuring that the weights sum to 1
for each data point. Locally Linear Embedding (LLE) is a dimensionality reduction technique
used in machine learning and data analysis. It focuses on preserving local relationships
between data points when mapping high-dimensional data to a lower-dimensional space.
Here, we will explain the LLE algorithm and its parameters.
Locally Linear Embedding Algorithm
The LLE algorithm can be broken down into several steps:
 Neighborhood Selection: For each data point in the high-dimensional space, LLE
identifies its k-nearest neighbors. This step is crucial because LLE assumes that each
data point can be well approximated by a linear combination of its neighbors.
 Weight Matrix Construction: LLE computes a set of weights for each data point to
express it as a linear combination of its neighbors. These weights are determined in
such a way that the reconstruction error is minimized. Linear regression is often used
to find these weights.
 Global Structure Preservation: After constructing the weight matrix, LLE aims to
find a lower-dimensional representation of the data that best preserves the local linear
relationships. It does this by seeking a set of coordinates in the lower-dimensional
space for each data point that minimizes a cost function. This cost function evaluates
how well each data point can be represented by its neighbors.
 Output Embedding: Once the optimization process is complete, LLE provides the
final lower-dimensional representation of the data. This representation captures the
essential structure of the data while reducing its dimensionality.
Parameters in LLE Algorithm
LLE has a few parameters that influence its behavior:
 k (Number of Neighbors): This parameter determines how many nearest neighbors
are considered when constructing the weight matrix. A larger k captures more global
relationships but may introduce noise. A smaller k focuses on local relationships but
can be sensitive to outliers. Selecting an appropriate value for k is essential for the
algorithm's success.
 Dimensionality of Output Space: You can specify the dimensionality of the lower-
dimensional space to which the data will be mapped. This is often chosen based on
the problem's requirements and the trade-off between computational complexity and
information preservation.
 Distance Metric: LLE relies on a distance metric to define the proximity between
data points. Common choices include Euclidean distance, Manhattan distance, or
custom-defined distance functions. The choice of distance metric can impact the
results.
 Regularization (Optional): In some cases, regularization terms are added to the cost
function to prevent overfitting. Regularization can be useful when dealing with noisy
data or when the number of neighbors is high.
 Optimization Algorithm (Optional): LLE often uses optimization techniques
like Singular Value Decomposition (SVD) or eigenvector methods to find the lower-
dimensional representation. These optimization methods may have their own
parameters that can be adjusted.
LLE (Locally Linear Embedding) represents a significant advancement in structural
analysis, surpassing traditional density modeling techniques like local PCA or mixtures of
factor analyzers. The limitation of density models lies in their inability to consistently
establish a set of global coordinates capable of embedding observations across the entire
structural manifold. Consequently, they prove inadequate for tasks such as generating low-
dimensional projections of the original dataset. These models excel only in identifying linear
features, as depicted in the image below. However, they fall short in capturing intricate
curved patterns, a capability inherent to LLE.

Isomap
A nonlinear dimensionality reduction method used in data analysis and machine learning is
called isomap, short for isometric mapping. Isomap was developed to maintain the inherent
geometry of high-dimensional data as a substitute for conventional techniques like Principal
Component Analysis (PCA). Isomap creates a low-dimensional representation, usually a two-
or three-dimensional map, by focusing on the preservation of pairwise distances between data
points.
This technique works especially well for extracting the underlying structure from large,
complex datasets, like those from speech recognition, image analysis, and biological systems.
Finding patterns and insights in a variety of scientific and engineering domains is made
possible by Isomap's capacity to highlight the fundamental relationships found in data.
Isomap
An understanding and representation of complicated data structures are crucial for the field of
machine learning. To achieve this, Manifold Learning, a subset of unsupervised learning, has
a significant role to play. Among the manifold learning techniques, ISOMAP (Isometric
Mapping) stands out for its prowess in capturing the intrinsic geometry of high-dimensional
data. In the case of situations in which linear methods are lacking, they have proved
particularly efficient.
ISOMAP is a flexible tool that seamlessly blends multiple learning and dimensionality
reduction intending to obtain more detailed knowledge of the underlying structure of data.
This article takes a look at ISOMAP's inner workings and sheds light on its parameters,
functions, and proper implementation with SkLearn.
Isometric mapping is an approach to reduce the dimensionality of machine learning.
Manifold Learning
To understand the underlying structure of complex data, Manifold Learning is
an unsupervised method of learning. Basically, it is aimed at capturing the inherent
characteristics of High Definition Datasets and representing them from a less dimensional
space. Multiple learning allows the discovery of nonlinear relationships hidden within data,
which is a valuable asset in different applications compared to linear techniques.
Isometric Mapping Concept
The idea of an Isometric Map, which aims to preserve pairwise distance between points, is
central to ISOMAP. In doing so, it seeks to achieve low dimensionality representation for the
data while at the same time keeping geodesic distances as shortest possible along the curving
edge of the data manifold. This is particularly important in situations where the underlying
structure has been broken or folded, since traditional methods such as PCA are not able to
take these nuances into account.
Relation between Geodesic Distances and Euclidean Distances
Understanding the distinction between equatorial and elliptic distances is of vital importance
for ISOMAP. The geodesic distance considers the shortest path along the curved surface of
the manifold, as opposed to Euclidean distances which are measured by measuring straight
Line distances in the input space. In order to provide a more precise representation of the
data's internal structure, ISOMAP exploits these quantum distances.
Working of ISOMAP
 Calculate the pairwise distances: The algorithm starts by calculating the Euclidean
distances between the data points.
 Find nearest neighbors according to these distances: For each data point, its k
nearest neighbor is determined by that distance.
 Create a neighborhood plot: the edges of each point are aligned with their closest
neighbors, which creates a diagram that represents the data's regional structure.
 Calculate geodesic distances: The Floyd algorithm sorts through all the pairs of data
points in a neighborhood graph and finds the most distant paths. geodesic distances
are represented by these shortest paths.
 Perform dimensional reduction: Classical Multi Scaling MDS is used for geodesic
distance matrices that result in low dimensional embedding of data.


The output of the above code
 This sample of code illustrates how to apply the dimensionality
reduction method Isomap to a dataset of S curves. Plotting the original 3D
data next to the reduced 2D data for visualization follows the generation of
S-curve data with 3D coordinates using Isomap. The fundamental
connections between data points in a lower-dimensional space are
preserved by the Isomap transformation, which captures the underlying
geometric structure. With the inherent patterns in the data still intact, the
resultant visualization shows how effective Isomap is at unfolding the S-
curve structure in a more manageable 2D representation.

Evolutionary Learning Genetic Algorithms

Genetic Algorithms(GAs) are adaptive heuristic search algorithms that belong
to the larger part of evolutionary algorithms. Genetic algorithms are based on
the ideas of natural selection and genetics. These are intelligent exploitation of
random searches provided with historical data to direct the search into the
region of better performance in solution space. They are commonly used to
generate high-quality solutions for optimization problems and search
problems.
Genetic algorithms simulate the process of natural selection which means
those species that can adapt to changes in their environment can survive and
reproduce and go to the next generation. In simple words, they simulate
“survival of the fittest” among individuals of consecutive generations to solve a
problem. Each generation consists of a population of individuals and each
individual represents a point in search space and possible solution. Each
individual is represented as a string of character/integer/float/bits. This string is
analogous to the Chromosome.
Genetic Algorithm in Machine Learning
A genetic algorithm is an adaptive heuristic search algorithm inspired by
"Darwin's theory of evolution in Nature." It is used to solve optimization
problems in machine learning. It is one of the important algorithms as it helps
solve complex problems that would take a long time to solve.
Genetic Algorithms are being widely used in different real-world applications,
for example, Designing electronic circuits, code-breaking, image processing,
and artificial creativity.
In this topic, we will explain Genetic algorithm in detail, including basic
terminologies used in Genetic algorithm, how it works, advantages and
limitations of genetic algorithm, etc.
What is a Genetic Algorithm?
Before understanding the Genetic algorithm, let's first understand basic
terminologies to better understand this algorithm:
o Population: Population is the subset of all possible or probable solutions,
which can solve the given problem.
o Chromosomes: A chromosome is one of the solutions in the population
for the given problem, and the collection of gene generate a
chromosome.
o Gene: A chromosome is divided into a different gene, or it is an element
of the chromosome.
o Allele: Allele is the value provided to the gene within a particular
chromosome.
o Fitness Function: The fitness function is used to determine the
individual's fitness level in the population. It means the ability of an
individual to compete with other individuals. In every iteration,
individuals are evaluated based on their fitness function.
o Genetic Operators: In a genetic algorithm, the best individual mate to
regenerate offspring better than parents. Here genetic operators play a
role in changing the genetic composition of the next generation.
o Selection
o After calculating the fitness of every existent in the population, a
selection process is used to determine which of the individualities in the
population will get to reproduce and produce the seed that will form the
coming generation.
o Types of selection styles available
o Roulette wheel selection
o Event selection
o Rank- grounded selection
o So, now we can define a genetic algorithm as a heuristic search
algorithm to solve optimization problems. It is a subset of evolutionary
algorithms, which is used in computing. A genetic algorithm uses genetic
and natural selection concepts to solve optimization problems.
1. Initialization

The process of a genetic algorithm starts by generating the set of individuals,

which is called population. Here each individual is the solution for the given
problem. An individual contains or is characterized by a set of parameters called
Genes. Genes are combined into a string and generate chromosomes, which is
the solution to the problem. One of the most popular techniques for
initialization is the use of random binary strings.

2. Fitness Assignment
Fitness function is used to determine how fit an individual is? It means the
ability of an individual to compete with other individuals. In every iteration,
individuals are evaluated based on their fitness function. The fitness function
provides a fitness score to each individual. This score further determines the
probability of being selected for reproduction. The high the fitness score, the
more chances of getting selected for reproduction.
3. Selection
The selection phase involves the selection of individuals for the reproduction of
offspring. All the selected individuals are then arranged in a pair of two to
increase reproduction. Then these individuals transfer their genes to the next
generation.
There are three types of Selection methods available, which are:
o Roulette wheel selection
o Tournament selection
o Rank-based selection
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction
step. In this step, the genetic algorithm uses two variation operators that are
applied to the parent population. The two operators involved in the
reproduction phase are given below:
o Crossover: The crossover plays a most significant role in the
reproduction phase of the genetic algorithm. In this process, a crossover
point is selected at random within the genes. Then the crossover
operator swaps genetic information of two parents from the current
generation to produce a new individual representing the offspring.

The genes of parents are exchanged among themselves until the

crossover point is met. These newly generated offspring are added to the
population. This process is also called or crossover. Types of crossover
styles available:
o One point crossover
o Two-point crossover
o Livery crossover
o Inheritable Algorithms crossover
o Mutation
The mutation operator inserts random genes in the offspring (new child)
to maintain the diversity in the population. It can be done by flipping
some bits in the chromosomes.
Mutation helps in solving the issue of premature convergence and
enhances diversification. The below image shows the mutation process:
Types of mutation styles available,
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation

5. Termination
After the reproduction phase, a stopping criterion is applied as a base for
termination. The algorithm terminates after the threshold fitness solution is
reached. It will identify the final solution as the best solution in the population.
General Workflow of a Simple Genetic Algorithm
GENETIC OPERATOR
In machine learning, particularly in genetic algorithms (GAs), genetic operators
are the mechanisms that simulate biological processes of evolution, such as
selection, crossover (recombination), and mutation. These operators are
responsible for creating new solutions (offspring) from the existing population
(individuals). Here's a more detailed explanation of each genetic operator and
its role in machine learning:
1. Selection (Reproduction)
Selection is the process of choosing which individuals (solutions) in the
population will pass on their genetic material to the next generation. The key
idea is that individuals with higher fitness are more likely to be selected to
reproduce, which mimics the process of natural selection in biology (survival of
the fittest).
Types of Selection:
 Roulette Wheel Selection: This is also known as fitness proportionate
selection. Each individual is assigned a probability of selection based on
its fitness. The fitter the individual, the higher its chance of being
selected. The selection process is similar to spinning a roulette wheel
where the size of each slice is proportional to the individual’s fitness.
 Tournament Selection: A set number of individuals are randomly chosen
from the population, and the one with the highest fitness in that subset
is selected. This method is easier to implement and can help maintain
diversity in the population.
 Rank-Based Selection: Individuals are sorted based on their fitness, and
selection is done based on their rank rather than absolute fitness values.
This helps avoid issues where a small number of individuals with very
high fitness dominate the population.
2. Crossover (Recombination)
Crossover is the genetic operator that combines the genetic material (solution
parameters) of two parent individuals to create one or more offspring. This
operator simulates biological reproduction, where offspring inherit traits from
both parents.
How Crossover Works:
 Two parent solutions (chromosomes) are chosen based on their fitness.
 A crossover point is randomly selected within the chromosomes. The
parts of the chromosome before and after the crossover point are
swapped between the two parents to create new offspring.
Types of Crossover:
 Single-Point Crossover: A random crossover point is selected, and the
chromosomes of the two parents are exchanged at that point.
 Multi-Point Crossover: Multiple crossover points are chosen, and the
genetic material of the parents is exchanged at all these points, leading
to more complex offspring.
 Uniform Crossover: Rather than a single crossover point, genes from
each parent are chosen randomly to form the offspring. This allows more
varied recombination.
Purpose of Crossover: The goal of crossover is to combine the best traits of
parents, allowing the offspring to inherit superior qualities from both parents
and potentially explore new regions of the solution space.
3. Mutation
Mutation introduces small random changes in an individual’s genetic code
(solution). This operator mimics the genetic mutations seen in biology, where
random alterations in DNA can lead to new traits. Mutation is crucial for
maintaining diversity within the population, preventing premature
convergence, and helping the algorithm escape local optima.
How Mutation Works:
 For each offspring, a certain probability (mutation rate) is applied to each
gene (parameter in the solution).
 If mutation occurs, the value of that gene is altered randomly. The
mutation could involve changing the value of a parameter, flipping a bit,
or adjusting a solution's configuration.
Types of Mutation:
 Bit-flip Mutation: In binary representation of solutions, a random bit (0
or 1) is flipped.
 Random Resetting: For continuous values, the gene value may be
replaced by a random value within a given range.
 Gaussian Mutation: A small value is added to the gene based on a
Gaussian (normal) distribution, allowing small, continuous changes.
Purpose of Mutation: Mutation allows genetic algorithms to explore new parts
of the solution space and maintain diversity. Without mutation, a population
could become too uniform, limiting the algorithm’s ability to find novel
solutions.
4. Elitism
While not always considered a "genetic operator" in the traditional sense,
elitism is a strategy where the best individual (or a few top individuals) from
the current generation is guaranteed to survive and move to the next
generation. This ensures that the best solutions are not lost due to randomness
in selection or crossover.
Purpose of Elitism: Elitism helps maintain high-quality solutions over
generations by preventing the best solutions from being overwritten. It can
accelerate convergence and ensure that the algorithm doesn't lose its best-
found solution.
5. Replacement (Generation Gap)
This operator determines how offspring replace individuals in the population.
There are different strategies for replacement:
 Generational Replacement: The entire population is replaced by the
offspring after each generation.
 Steady-State Replacement: Only a few individuals are replaced each
generation, ensuring that part of the old population persists.
 Hyperparameter tuning: Optimizing the parameters (like learning rate,
number of layers) of machine learning models.
 Neural architecture search: Evolving the structure of neural networks to
find the best Selection is the process by which individuals (candidate
solutions) are chosen to reproduce based on their fitness. The idea is
that the fittest individuals are more likely to produce offspring, which
increases the chance of improving the population in subsequent
generations.

Common questions

Genetic algorithms solve optimization problems by mimicking biological evolution processes using genetic operators. The selection operator chooses individuals based on their fitness, allowing those with higher fitness to reproduce and pass their traits onto the next generation . Crossover combines genetic material from two parent individuals to create new offspring, exploring new solutions by inheriting features from both parents . Mutation introduces random changes to individuals, maintaining genetic diversity in the population and preventing premature convergence by exploring new areas of the solution space . Together, these operators iteratively improve upon solutions to find optimal or near-optimal solutions to complex problems.

Linear dimensionality reduction techniques assume that the data lies in a linear subspace and aim to project the data into a lower-dimensional space while preserving linear properties. Examples include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Nonlinear dimensionality reduction techniques, such as Isomap and Locally Linear Embedding (LLE), do not assume linearity, thereby capturing more complex data structures and preserving the intrinsic geometry of the data, suitable for capturing intricate, non-linear relationships .

Feature selection involves selecting a subset of the original features that are most relevant to the problem, using methods like filter, wrapper, and embedded techniques. It retains the original features deemed important . In contrast, feature extraction transforms the original features to create new ones that capture the essence of the data in a lower-dimensional space, commonly using methods like PCA and LDA . By creating new features, feature extraction often yields a completely different set of dimensions that still represent the original data effectively.

Dimensionality reduction techniques are used to reduce the complexity of a model by decreasing the number of features while retaining as much important information as possible. This helps in mitigating the 'curse of dimensionality,' which occurs when the performance of the model deteriorates due to a large number of features . By doing so, dimensionality reduction can improve model generalization performance, reduce the risk of overfitting, and make data visualization simpler and more insightful, especially when mapping high-dimensional data into a lower-dimensional space .

The 'curse of dimensionality' refers to the challenges that arise when dealing with high-dimensional data, where the performance of a model can deteriorate as dimensions increase because of overfitting and increased complexity . Feature selection mitigates this effect by choosing only the most relevant features, thus reducing the number of dimensions without significant information loss . Feature extraction, on the other hand, transforms the original set of features into a new lower-dimensional set that still represents the core information of the dataset, preserving the essence while reducing complexity . Both approaches reduce dimensionality and improve model performance by simplifying the dataset.

Dimensionality reduction techniques help handle overfitting by simplifying the model, reducing the number of features, and thus lowering the complexity that could lead to overfitting . In doing so, they enable the model to generalize better to new data by focusing on the most significant data features. However, the trade-offs involve the potential loss of important information, as some features critical to capturing the data characteristics may be discarded . Careful selection and extraction balance these benefits and drawbacks to ensure maximal retention of essential data characteristics while reducing complexity.

Principal Component Analysis (PCA) achieves dimensionality reduction by projecting high-dimensional data onto a lower-dimensional space that retains the maximum variance. The key steps involve constructing a covariance matrix from the data, computing the eigenvectors and eigenvalues of this matrix, and using the eigenvectors that correspond to the largest eigenvalues to form a new feature space . This process ensures that the principal components capture the most significant variability in the data, thus reducing the dimensionality while preserving critical information .

Locally Linear Embedding (LLE) is preferred over PCA or LDA in scenarios where the data exhibits complex, non-linear structures and a global linear approach may not effectively capture the data's intrinsic geometry. LLE is adept at identifying intricate, curved patterns in data, which traditional linear methods like PCA and LDA might miss . It is particularly useful when the preservation of local relationships is crucial for understanding the data structure, and when the data contains outliers that could dominate the projection if not handled appropriately .

In Locally Linear Embedding (LLE), choosing a smaller value of 'k' focuses on capturing more local relationships, which can help in identifying the inherent structure in data, but may make the algorithm more sensitive to outliers and noise . Conversely, a larger 'k' might capture more global structures at the expense of local details, potentially smoothing over fine structures in the data. The choice of 'k' can significantly affect LLE's performance, as it determines the balance between local and global data representation, impacting the algorithm's ability to uncover the true manifold structure .

Isomap differs from traditional PCA by focusing on preserving the inherent geometry of high-dimensional data, particularly the pairwise distances between data points, which it maps to a lower-dimensional space. Unlike PCA, which assumes linear relationships, Isomap is designed for nonlinear dimensionality reduction, capturing the data's intrinsic manifold structure . Isomap is more effective than PCA for datasets that exhibit complex, nonlinear relationships, such as those found in speech recognition, image analysis, and biological systems, where preserving non-linear geometric relations is crucial .

Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
19 pages
Dimensionality Reduction Techniques in ML
No ratings yet
Dimensionality Reduction Techniques in ML
41 pages
Dimensionality Reduction & LDA Explained
No ratings yet
Dimensionality Reduction & LDA Explained
15 pages
Unit-IV Machine Learning
No ratings yet
Unit-IV Machine Learning
35 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
17 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
84 pages
Understanding Linear Discriminant Analysis
No ratings yet
Understanding Linear Discriminant Analysis
21 pages
ML Unit 4
No ratings yet
ML Unit 4
20 pages
Lecture 5 1
No ratings yet
Lecture 5 1
46 pages
ML Unit4
No ratings yet
ML Unit4
13 pages
ML Unit - IV
No ratings yet
ML Unit - IV
28 pages
ML - Pca - Lda - SVD
No ratings yet
ML - Pca - Lda - SVD
12 pages
Discriminant Analysis Overview
No ratings yet
Discriminant Analysis Overview
26 pages
Dimensionality Reduction Techniques in ML
No ratings yet
Dimensionality Reduction Techniques in ML
34 pages
Dimensionality Reduction Techniques in ML
No ratings yet
Dimensionality Reduction Techniques in ML
76 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
8 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
59 pages
Understanding Dimensionality Reduction Techniques
No ratings yet
Understanding Dimensionality Reduction Techniques
13 pages
Dimensionality Reduction & LDA Techniques
No ratings yet
Dimensionality Reduction & LDA Techniques
28 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
80 pages
ML Unit 4
No ratings yet
ML Unit 4
51 pages
ML Unit4 Total
No ratings yet
ML Unit4 Total
64 pages
Unit III - DR and KMeans
No ratings yet
Unit III - DR and KMeans
27 pages
Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
33 pages
Unit-4 ML Full
No ratings yet
Unit-4 ML Full
21 pages
Dimensionality Reduction Techniques in ML
No ratings yet
Dimensionality Reduction Techniques in ML
50 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
23 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
14 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
27 pages
Importance of Dimensionality Reduction
No ratings yet
Importance of Dimensionality Reduction
19 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
21 pages
Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
5 pages
Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
5 pages
PCA vs LDA for Dimensionality Reduction
No ratings yet
PCA vs LDA for Dimensionality Reduction
5 pages
Feature Selection and Dimensionality Reduction
No ratings yet
Feature Selection and Dimensionality Reduction
26 pages
Dimensionality Reduction - Genetic Algorithm
No ratings yet
Dimensionality Reduction - Genetic Algorithm
53 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
27 pages
Dimensionality Reduction Techniques Overview
No ratings yet
Dimensionality Reduction Techniques Overview
32 pages
Dimensionality Reduction Techniques in ML
No ratings yet
Dimensionality Reduction Techniques in ML
46 pages
Unit 04 Notes ML
No ratings yet
Unit 04 Notes ML
35 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
20 pages
Understanding Dimensionality Reduction Techniques
No ratings yet
Understanding Dimensionality Reduction Techniques
27 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
27 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
44 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
13 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
66 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
29 pages
Understanding Dimensionality Reduction Techniques
No ratings yet
Understanding Dimensionality Reduction Techniques
109 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
9 pages
Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
30 pages
Dimensionality Reduction Group Project Presentation
No ratings yet
Dimensionality Reduction Group Project Presentation
20 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
23 pages
Dimensionality Reduction Technologyy
No ratings yet
Dimensionality Reduction Technologyy
18 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
65 pages
ML Module 6
No ratings yet
ML Module 6
29 pages
Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
100 pages
LDA and PCA in Dimensionality Reduction
No ratings yet
LDA and PCA in Dimensionality Reduction
12 pages
Dimensionality Reduction @unit-1 ADS
No ratings yet
Dimensionality Reduction @unit-1 ADS
9 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
25 pages
Density-Based Classification Methods
No ratings yet
Density-Based Classification Methods
25 pages
Edge Detection Techniques in Image Processing
No ratings yet
Edge Detection Techniques in Image Processing
48 pages
Discriminant and Conjoint Analysis Overview
No ratings yet
Discriminant and Conjoint Analysis Overview
12 pages
Discriminant Analysis Overview
No ratings yet
Discriminant Analysis Overview
7 pages
Chaos-Based Micro-Instruction Set Security
No ratings yet
Chaos-Based Micro-Instruction Set Security
14 pages
Brand Classifications by Origin Analysis
No ratings yet
Brand Classifications by Origin Analysis
9 pages
PCA and LDA for Dimensionality Reduction
No ratings yet
PCA and LDA for Dimensionality Reduction
5 pages
Assessing Childhood Vertical Jump Skills
No ratings yet
Assessing Childhood Vertical Jump Skills
18 pages
Taxonomy of Cyclopes Anteaters
No ratings yet
Taxonomy of Cyclopes Anteaters
36 pages
B.Tech CSE Semester 7 & 8 Scheme
No ratings yet
B.Tech CSE Semester 7 & 8 Scheme
53 pages
PCA and LDA in Face Recognition Techniques
No ratings yet
PCA and LDA in Face Recognition Techniques
2 pages
EEG Analysis for Attention State Detection
No ratings yet
EEG Analysis for Attention State Detection
4 pages
Iris Dataset: Logistic Regression Analysis
No ratings yet
Iris Dataset: Logistic Regression Analysis
24 pages
Compensatory Growth Effects on Fish Scales
No ratings yet
Compensatory Growth Effects on Fish Scales
6 pages
Statistical Methods in Data Mining
No ratings yet
Statistical Methods in Data Mining
21 pages
Understanding Web Content Mining Techniques
No ratings yet
Understanding Web Content Mining Techniques
38 pages
Machine Learning Concepts and Techniques
No ratings yet
Machine Learning Concepts and Techniques
25 pages
Factor Analysis in Multivariate Data Analysis
No ratings yet
Factor Analysis in Multivariate Data Analysis
11 pages
Marriott's Market Segmentation Strategy
No ratings yet
Marriott's Market Segmentation Strategy
64 pages
Exercises On Fisher Discriminant Analysis
No ratings yet
Exercises On Fisher Discriminant Analysis
2 pages
Naïve Bayes Classifier in Pattern Recognition
No ratings yet
Naïve Bayes Classifier in Pattern Recognition
17 pages
Advanced Market Research Techniques
No ratings yet
Advanced Market Research Techniques
10 pages
Value Effect in Indian Stock Market
No ratings yet
Value Effect in Indian Stock Market
25 pages
Applied Data Science: ML Classification Techniques
No ratings yet
Applied Data Science: ML Classification Techniques
70 pages
Lithofacies, and Hydraulic Flow Units
No ratings yet
Lithofacies, and Hydraulic Flow Units
11 pages
Learn-One-Rule & Decision Trees Explained
No ratings yet
Learn-One-Rule & Decision Trees Explained
40 pages
Terahertz Technology in Biomedicine
No ratings yet
Terahertz Technology in Biomedicine
22 pages
Testbank for Multivariate Statistics
No ratings yet
Testbank for Multivariate Statistics
13 pages
Canonical Analysis for Ecological Ordination
No ratings yet
Canonical Analysis for Ecological Ordination
16 pages

Dimensionality Reduction in Machine Learning

Uploaded by

Dimensionality Reduction in Machine Learning

Uploaded by

UNIT-4

Dimensionality reduction is a technique used to reduce the number of features in a dataset

Why is Dimensionality Reduction important in Machine Learning and

Components of Dimensionality Reduction

Principal Component Analysis

It involves the following steps:

Linear Discriminant Analysis

Mathematical Intuition Behind LDA

 Independent Component Analysis (ICA) is a statistical and computational

The observed random vector is representing the observed data with m

Here, W = transformation matrix.

Evolutionary Learning Genetic Algorithms

The process of a genetic algorithm starts by generating the set of individuals,

The genes of parents are exchanged among themselves until the

Common questions

Explain how genetic algorithms use operators like selection, crossover, and mutation to solve optimization problems.

What are the main differences between the linear and nonlinear dimensionality reduction techniques, and can you give examples of each?

How do feature selection and feature extraction differ in their approaches to dimensionality reduction?

What are the primary reasons for using dimensionality reduction techniques in machine learning, particularly in relation to model performance and data visualization?

What role does the 'curse of dimensionality' play in the necessity for dimensionality reduction, and how can feature selection and extraction mitigate its effects?

How do dimensionality reduction techniques contribute to handling overfitting in machine learning models, and what trade-offs might be involved?

How does Principal Component Analysis (PCA) achieve dimensionality reduction, and what are the key steps involved in its implementation?

In what scenarios might Locally Linear Embedding (LLE) be preferred over other dimensionality reduction techniques like PCA or LDA, and why?

What are the implications of choosing a smaller or larger value of 'k' in locally linear embedding (LLE), and how does it affect the algorithm’s performance?

How does Isomap differ from traditional PCA in terms of its approach to dimensionality reduction, and in what type of datasets is Isomap more effective?

You might also like