0% found this document useful (0 votes)

14 views18 pages

Unit V

The document discusses various dimensionality reduction techniques, including Principal Component Analysis (PCA), Non-Negative Matrix Factorization (NMF), and manifold learning, highlighting their methodologies, advantages, and disadvantages. PCA transforms correlated features into uncorrelated components to simplify datasets, while NMF decomposes matrices into non-negative factors for feature extraction. Additionally, it covers clustering algorithms like K-Means, Agglomerative Clustering, and DBSCAN, explaining their processes and applications in data analysis.

Uploaded by

birundha2024software

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views18 pages

Unit V

Uploaded by

birundha2024software

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

UNIT V

Principal Component Analysis (PCA)

PCA (Principal Component Analysis) is a dimensionality reduction technique and helps us to
reduce the number of features in a dataset while keeping the most important information. It
changes complex datasets by transforming correlated features into a smaller set of uncorrelated
components.

It helps us to remove redundancy, improve computational efficiency and make data easier to
visualize and analyze.
How Principal Component Analysis Works
PCA uses linear algebra to transform data into new features called principal components. It
finds these by calculating eigenvectors (directions) and eigenvalues (importance) from the
covariance matrix. PCA selects the top components with the highest eigenvalues and projects
the data onto them simplify the dataset.
Imagine you’re looking at a messy cloud of data points like stars in the sky and want to simplify
it. PCA helps you find the "most important angles" to view this cloud so you don’t miss the big
patterns. Here’s how it works step by step:
Step 1: Standardize the Data
Different features may have different units and scales like salary vs. age. To compare them
fairly PCA first standardizes the data by making each feature have:
• A mean of 0
• A standard deviation of 1
𝑋−𝜇
𝑍=
𝜎
where:
• 𝜇 is the mean of independent features 𝜇 = {𝜇1 , 𝜇2 , ⋯ , 𝜇𝑚 }
• 𝜎 is the standard deviation of independent features 𝜎 = {𝜎1 , 𝜎2 , ⋯ , 𝜎𝑚 }
Step 2: Calculate Covariance Matrix
Next PCA calculates the covariance matrix to see how features relate to each other whether
they increase or decrease together. The covariance between two features 𝑥1 and 𝑥2 is:
𝑛
ˉ )(𝑥2𝑖 − 𝑥2
∑𝑖=1(𝑥 1𝑖 − 𝑥1 ˉ )
𝑐𝑜𝑣(𝑥1, 𝑥2) =
𝑛−1
Where:
• 𝑥ˉ1 𝑎𝑛𝑑𝑥ˉ2 are the mean values of features 𝑥1 𝑎𝑛𝑑𝑥2
• 𝑛 is the number of data points
The value of covariance can be positive, negative or zeros.
Step 3: Find the Principal Components
PCA identifies new axes where the data spreads out the most:
• 1st Principal Component (PC1): The direction of maximum variance (most spread).
• 2nd Principal Component (PC2): The next best direction, perpendicular to PC1 and
so on.
These directions come from the eigenvectors of the covariance matrix and their importance is
measured by eigenvalues. For a square matrix A an eigenvector X (a non-zero vector) and its
corresponding eigenvalue λ satisfy:
𝐴𝑋 = 𝜆𝑋
This means:
• When A acts on X it only stretches or shrinks X by the scalar λ.
• The direction of X remains unchanged hence eigenvectors define "stable directions"
of A.
Eigenvalues help rank these directions by importance.
Step 4: Pick the Top Directions & Transform Data
After calculating the eigenvalues and eigenvectors PCA ranks them by the amount of
information they capture. We then:
1. Select the top k components that capture most of the variance like 95%.
2. Transform the original dataset by projecting it onto these top components.
This means we reduce the number of features (dimensions) while keeping the important
patterns in the data.
Transform this 2D dataset into a 1D representation while preserving as much variance as
possible.
In the above image the original dataset has two features "Radius" and "Area" represented by
the black axes. PCA identifies two new directions: PC₁ and PC₂ which are the principal
components.
• These new axes are rotated versions of the original ones. PC₁ captures the maximum
variance in the data meaning it holds the most information while PC₂ captures the
remaining variance and is perpendicular to PC₁.
• The spread of data is much wider along PC₁ than along PC₂. This is why PC₁ is chosen
for dimensionality reduction. By projecting the data points (blue crosses) onto PC₁ we
effectively transform the 2D data into 1D and retain most of the important structure and
patterns.
Advantages of Principal Component Analysis
1. Multicollinearity Handling: Creates new, uncorrelated variables to address issues
when original features are highly correlated.
2. Noise Reduction: Eliminates components with low variance enhance data clarity.
3. Data Compression: Represents data with fewer components reduce storage needs and
speeding up processing.
4. Outlier Detection: Identifies unusual data points by showing which ones deviate
significantly in the reduced space.
Disadvantages of Principal Component Analysis
1. Interpretation Challenges: The new components are combinations of original
variables which can be hard to explain.
2. Data Scaling Sensitivity: Requires proper scaling of data before application or results
may be misleading.
3. Information Loss: Reducing dimensions may lose some important information if too
few components are kept.
4. Assumption of Linearity: Works best when relationships between variables are linear
and may struggle with non-linear data.
5. Computational Complexity: Can be slow and resource-intensive on very large
datasets.
6. Risk of Overfitting: Using too many components or working with a small dataset
might lead to models that don't generalize well.

Non-Negative Matrix Factorization

Non-Negative Matrix Factorization (NMF) is a technique used to break down large dataset

into smaller meaningful parts while ensuring that all values remain non-negative. This helps

in extracting useful features from data and making it easier to analyze and process it.

Matrix Decomposition and Representation in NMF

For a matrix A of dimensions 𝑚 × 𝑛 where each element is ≥ 0 NMF factorizes it into two

matrices 𝑊 and 𝐻 with dimensions 𝑚 × 𝑘 and 𝑘 × 𝑛 respectively where both matrices

contain only non-negative elements:

𝐴𝑚×𝑛 ≈ 𝑊𝑚×𝑘 𝐻𝑘×𝑛

where:

• 𝐴 : Original input matrix (a linear combination of W and H)

• 𝑊: Feature matrix (basis components)

• 𝐻: Coefficient matrix (weights associated with W)

• 𝑘: Rank (dimensionality of the reduced representation where 𝑘 ≤ min⁡(𝑚, 𝑛)

NMF helps to identify hidden patterns in data by assuming that each data point can be

represented as a combination of fundamental features found in 𝑊.

Intuition Behind NMF

The goal of NMF is to simplify complex data into a smaller set of meaningful patterns. By

choosing a lower dimension k the decomposition highlights essential features while

ignoring noise.

• Each data point (column in 𝐴) is approximated as a combination of non-negative feature

vectors in 𝑊.

• This method assumes that data consists of meaningful parts that add up to form the

whole.

For example in facial recognition NMF can break down an image into basic facial features

such as eyes, nose and mouth. The 𝑊matrix contains these key features while the 𝐻matrix

defines how strongly each image is composed of these features.

Working of NMF

NMF decomposes a data matrix 𝐴 into two smaller matrices 𝑊and 𝐻using an iterative

optimization process that minimizes reconstruction error:

1. Initialization: Start with random non-negative values for 𝑊and 𝐻.

2. Iterative Update: Modify 𝑊and 𝐻to minimize the difference between 𝐴 and 𝑊 × 𝐻.

3. Stopping Criteria: The process stops when:

• The reconstruction error stabilizes.

• A set number of iterations is reached.

Common optimization techniques for NMF include:

• Multiplicative Update Rules: Ensures non-negativity by iteratively

adjusting 𝑊and 𝐻.

• Alternating Least Squares (ALS): Solves for 𝑊while keeping 𝐻fixed and vice versa,

in an alternating manner.

Real-life Example

Let us consider some real-life examples to understand the working of the NMF algorithm.

Let's take a case of image processing.

• Suppose we have an input image having pixels that form matrix A.

• Using NMF we factorize it into two matrices one containing the facial feature set

[Matrix W]

• Other contains the importance of each facial feature in the input image i.e. the weights

[Matrix H]. (As shown in below image)

NMF in Image Processing

Applications of NMF

NMF has a wide range of applications including:

• Image Processing: Feature extraction in facial recognition and object detection.

• Text Mining and NLP Task: Topic modeling by decomposing a document-term

matrix into key topics.

• Spectral Data Analysis: Identifying hidden patterns in sound, medical signals and

chemical spectra.

• Bioinformatics: Gene expression analysis for identifying molecular patterns in

biological data.

Manifold Learning
Manifold learning is a type of unsupervised learning technique that focuses on the nonlinear
structure of data. The goal is to reduce the dimensionality of the data while preserving the
relationships or structures within it. It is particularly useful when dealing with high-
dimensional data where linear methods like PCA might not be appropriate.
In manifold learning, the idea is to capture the underlying manifold, or surface, on which the
data resides. This underlying manifold is often of much lower dimensionality than the original
space, and capturing it can help in visualizing, understanding, and analyzing the data.
Common manifold learning techniques include:
1. t-SNE (t-Distributed Stochastic Neighbor Embedding): This technique is
particularly good at creating a two or three-dimensional map of the data, where similar
items are close together.
2. Isomap: This method works by estimating the geodesic or ‘curved’ distance between
points on the manifold.
3. Locally Linear Embedding (LLE): LLE seeks to preserve local relationships within
the data and can unfold twisted manifolds.
4. Spectral Embedding: This method uses the eigenvalues of a similarity matrix to
project the data into a lower-dimensional space.
Manifold learning can be applied to various domains such as image processing, speech
recognition, and bioinformatics. It’s particularly useful for visualizing complex data in a way
that highlights intrinsic relationships.
While manifold learning methods are powerful, they are not without their challenges. They can
be sensitive to noise, tuning parameters might be tricky, and they might be computationally
expensive for very large datasets.

Clustering
1. K-Means Clustering Algorithm
K-Means is a widely used clustering algorithm that partitions data points into K clusters
based on their similarity. The algorithm works by iteratively updating the cluster
centroids until convergence is achieved.
K-Means algorithm Steps:
1. Choose the number of clusters, K.
2. Randomly initialize K centroids.

3. Assign each data point to the nearest centroid.

4. Recalculate the centroid of each cluster.

5. Repeat steps 3 and 4 until convergence is achieved.

The distance metric used to determine the nearest centroid can be Euclidean,
Manhattan, or any other distance metric of choice.
Advantages of K-Means clustering algorithm :
▪ Easy to implement and interpret
▪ Fast and efficient for large datasets
▪ Works well with spherical clusters
Disadvantages of K-Means clustering algorithm :
▪ Assumes equal cluster sizes and variances
▪ Sensitive to initial centroid positions
▪ Can converge to local optima
Agglomerative Clustering in Machine Learning
Agglomerative clustering is a hierarchical clustering algorithm that starts with each data point
as its own cluster and iteratively merges the closest clusters until a stopping criterion is reached.
It is a bottom-up approach that produces a dendrogram, which is a tree-like diagram that shows
the hierarchical relationship between the clusters. The algorithm can be implemented using the
scikit-learn library in Python.
Agglomerative Clustering Algorithm
To group similar data points into clusters based on their proximity, Agglomerative Clustering
is used which is a type of hierarchical clustering. It follows a bottom-up approach, where each
data point starts as its own cluster and gradually merges with others based on similarity.
• The merging continues until all points form a single cluster or a set number of clusters
remain.
• It uses distance metrics like Euclidean or Manhattan distance to measure similarity.
• The process is often visualized using a dendrogram, which shows the hierarchy of
cluster formation.
• Common linkage methods include single, complete, average and ward linkage.
Animal Categorization Tree
Workflow
Lets dicuss step by step how it works:

1. Start with all points separate:

• Treat each data point as its own cluster like A, B, C, ...
• Initially, you have n clusters for n data points.
2. Compute pairwise distances:
• Calculate the distance between every pair of clusters.
• Common choices include Euclidean, Manhattan or Cosine distance.
• Store these values in a distance matrix.
3. Merge the nearest clusters:
• Identify the two clusters that are closest based on the chosen linkage method such as
single, complete, average or Ward linkage.
• Combine them into a single new cluster.
4. Update distances:
• Recalculate the distances between the newly formed cluster and all remaining clusters.
• Use the same linkage rule to ensure consistency.
5. Repeat the process:
• Continue merging clusters and updating distances iteratively.
• Stop when you reach a predefined number of clusters (k) or a distance threshold.
6. Visualize the results:
• Create a dendrogram to visualize how clusters merged at each step.
• Choose a suitable cut on the dendrogram to obtain the final cluster groups.

DBSCAN Clustering
DBSCAN is a density-based clustering algorithm that groups data points that are closely
packed together and marks outliers as noise based on their density in the feature space. It
identifies clusters as dense regions in the data space separated by areas of lower density. Unlike
K-Means or hierarchical clustering which assumes clusters are compact and spherical,
DBSCAN perform well in handling real-world data irregularities such as:
• Arbitrary-Shaped Clusters: Clusters can take any shape not just circular or convex.
• Noise and Outliers: It effectively identifies and handles noise points without assigning
them to any cluster.
Data Sets with clustering algorithms
The figure above shows a data set with clustering algorithms: K-Means and Hierarchical
handling compact, spherical clusters with varying noise tolerance while DBSCAN manages
arbitrary-shaped clusters and noise handling.
Key Parameters in DBSCAN
1. eps: This defines the radius of the neighborhood around a data point. If the distance between
two points is less than or equal to eps they are considered neighbors. A common method to
determine eps is by analyzing the k-distance graph. Choosing the right eps is important:
• If eps is too small most points will be classified as noise.
• If eps is too large clusters may merge and the algorithm may fail to distinguish between
them.
2. MinPts: This is the minimum number of points required within the eps radius to form a
dense region. A general rule of thumb is to set MinPts >= D+1 where D is the number of
dimensions in the dataset.
For most cases a minimum value of MinPts = 3 is recommended.
How Does DBSCAN Work?
DBSCAN works by categorizing data points into three types:
1. Core points which have a sufficient number of neighbors within a specified radius
(eplison)
2. Border points which are near core points but lack enough neighbors to be core points
themselves
3. Noise points which do not belong to any cluster.
By iteratively expanding clusters from core points and connecting density-reachable points,
DBSCAN forms clusters without relying on rigid assumptions about their shape or size.

DBSCAN Algorithm
Steps in the DBSCAN Algorithm
1. Identify Core Points: For each point in the dataset count the number of points within
its eps neighborhood. If the count meets or exceeds MinPts mark the point as a core
point.
2. Form Clusters: For each core point that is not already assigned to a cluster create a
new cluster. Recursively find all density-connected points i.e points within
the eps radius of the core point and add them to the cluster.
3. Density Connectivity: Two points a and b are density-connected if there exists a chain
of points where each point is within the eps radius of the next and at least one point in
the chain is a core point. This chaining process ensures that all points in a cluster are
connected through a series of dense regions.
4. Label Noise Points: After processing all points any point that does not belong to a
cluster is labeled as noise.
Comparison of Clustering Algorithms
Evaluating Clustering Algorithms
Clustering metrics are used to evaluate all clustering algorithms. Since clustering is
unsupervised, accuracy isn’t straightforward like in classification. Instead, we rely on
evaluation [Link] us take a look at some of the most commonly used clustering metrics:
1. Silhouette Score
The Silhouette Score is a way to measure how good the clusters are in a dataset. It helps us
understand how well the data points have been grouped. The score ranges from -1 to 1.
• A score close to 1 means a point fits really well in its group (cluster) and is far from
other groups.
• A score close to 0 means the point is on the border between two clusters.
• A score close to -1 means the point might be in the wrong cluster.
Silhouette Score (S) for a data point i is calculated as:
𝑏(𝑖) − 𝑎(𝑖)
𝑆(𝑖) =
𝑚𝑎𝑥(𝑎(𝑖), 𝑏(𝑖))
where,
• a(i) is the average distance from i to other data points in the same cluster.
• b(i) is the smallest average distance from i to data points in a different cluster.
2. Davies-Bouldin Index
The Davies-Bouldin Index (DBI) helps us measure how good the clustering is in a dataset. It
looks at how tight each cluster is (compactness), and how far apart the clusters are (separation).
• Lower DBI = better, clearer clusters
• Higher DBI = messy, overlapping clusters
A lower score is better, because it means:
• Points in the same cluster are close to each other.
• Different clusters are far apart from one another.
Davies-Bouldin Index (DB) is calculated as:
1 𝑘 𝑅𝑖𝑖 + 𝑅𝑗𝑗
𝐷𝐵 = ∑ max⁡𝑗≠𝑖 ( )
𝑘 𝑖=1 𝑅𝑖𝑗

where,
• k is the total number of clusters.
• 𝑅𝑖𝑖 is the compactness of cluster i.
• 𝑅𝑗𝑗 is the compactness of cluster j.

• 𝑅𝑖𝑗 is the dissimilarity (distance) between cluster i and cluster j.

3. Calinski-Harabasz Index (Variance Ratio Criterion)

The Calinski-Harabasz Index measures how good the clusters are in a dataset.
It looks at:
• How close the points are inside each cluster?
• How far apart the clusters are?
A higher score is better, as it means the clusters are tight and well-separated. It helps
determine the ideal number of clusters.
Calinski-Harabasz Index (CH) is calculated as:
𝐵 𝑁−𝐾
𝐶𝐻 = ×
𝑊 𝐾−1
where,
• B is the sum of squares between clusters.
• W is the sum of squares within clusters.
• N is the total number of data points.
• K is the number of clusters.
Calculating between group sum of squares (B)
𝐾
𝐵=∑ 𝑛𝑘 ×∣∣ 𝐶𝑘 –𝐶 ∣∣2
𝑘=1

where,
• 𝑛𝑘 is the number of observation in cluster 'k'
• 𝐶𝑘 is the centroid of cluster 'k'
• C is the centroid of the dataset
• K is number of clusters
Calculating within the group sum of squares (W)
𝑛𝑘
𝑊=∑ ∣∣ 𝑋𝑖𝑘 –𝐶𝑘 ∣∣2
𝑖=1

where,
• 𝑛𝑘 is the number of observation in cluster 'k'
• 𝑋𝑖𝑘 is the i-th observation of cluster 'k'
• 𝐶𝑘 is the centroid of cluster 'k'
4. Adjusted Rand Index (ARI)
The Adjusted Rand Index (ARI) helps us measure how accurate a clustering result is by
comparing it to the true labels (ground truth).
It checks how well the pairs of points are grouped:
• Are the same pairs together in both the real and predicted clusters?
• Are different pairs also kept apart correctly?
The score ranges from -1 to 1:
• 1 means perfect match - the clustering is exactly right.
• 0 means random guess - no better than chance.
• Below 0 means worse than random - very poor clustering.
Adjusted Rand Index (ARI) is calculated as:
(𝑅𝐼−𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑𝑅𝐼 )
𝐴𝑅𝐼 =
(𝑚𝑎𝑥(𝑅𝐼) − 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑𝑅𝐼 )
where,
• RI is the Rand Index.
• 𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑𝑅𝐼 is the expected value of the Rand Index.
5. Mutual Information (MI)
Mutual Information measures how much two variables are related or connected. In clustering,
it compares how much the true cluster labels match with the predicted labels. It shows
how much knowing about one variable helps us predict the other. The more agreement there is,
the higher the score.
• Higher values mean better agreement between the clusters.
• Zero means no agreement at all.
MI between true labels Y and predicted labels Z is calculated as:

𝑝(𝑦𝑖 , 𝑧𝑗 )
𝑀𝐼(𝑦, 𝑧) = ∑ ∑ 𝑝( 𝑦𝑖 , 𝑧𝑗 ) ⋅ log⁡ ( )
𝑗 𝑝(𝑦𝑖 ) ⋅ 𝑝(𝑧𝑗 )
𝑖

where,
• 𝑦𝑖 is a true label.
• 𝑧𝑖 is a predicted label.
• 𝑝(𝑦𝑖 , 𝑧𝑖 ) is the joint probability of 𝑦𝑖 and 𝑧𝑗 .

• 𝑝(𝑦𝑖 ) and 𝑝(𝑧𝑖 ) are the marginal probabilities.

These clustering metrics help in evaluating the quality and performance of clustering
algorithms, allowing for informed decisions when selecting the most suitable clustering
solution for a given dataset.
Summary of Clustering Methods
Clustering in machine learning groups unlabeled data points into clusters based on similarity,
enabling pattern discovery without supervision. Common methods fall into categories like
partitioning, hierarchical, density-based, and model-based approaches.
Partitioning Methods
These divide data into a predefined number of clusters (k) using centroids. K-Means initializes
k centroids, assigns points to the nearest, and iteratively updates centroids until convergence;
it's efficient for spherical clusters but sensitive to initialization and outliers. K-Medoids (e.g.,
PAM) uses actual data points as centers (medoids) for robustness to noise.
Hierarchical Methods
These build a tree of clusters without specifying k upfront. Agglomerative (bottom-up) starts
with each point as a cluster and merges closest pairs; divisive (top-down) splits a single cluster
recursively. They produce dendrograms for visualization but scale poorly to large datasets.
Density-Based Methods
These identify clusters as dense regions separated by sparse areas, handling arbitrary shapes
and noise. DBSCAN defines core points by neighborhood density (eps, minPts parameters)
and expands clusters from them; OPTICS and HDBSCAN extend it for varying densities.
Model-Based Methods
These assume data follows a mixture of probability distributions, like Gaussian Mixture
Models (GMM), which use EM algorithm to fit parameters and soft-assign points via
probabilities. Useful for overlapping clusters but computationally intensive.

Method
Key Algorithms Strengths Weaknesses
Type

K-Means, K- Needs k predefined, assumes

Partitioning Fast, scalable
Medoids spherical clusters

No k needed,
Hierarchical Agglomerative O(n²) time
hierarchical view

Density- DBSCAN,
Handles noise/shapes Parameter-sensitive
Based OPTICS

Probabilistic
Model-Based GMM Assumes distributions
assignments

Common questions

In text mining, NMF is commonly used for topic modeling by decomposing a document-term matrix into matrices that highlight key topics and how documents are composed of these topics . This allows the extraction of interpretable features, improving insights into textual data. In image processing, NMF is employed to break down images into basic features and their combinations, such as identifying dominant features like edges or parts of a face (eyes, nose). This feature extraction enhances pattern recognition, classification, and compression tasks, facilitating interpreting and processing high-dimensional image data.

PCA contributes to reducing multicollinearity by transforming the original correlated features into a smaller set of uncorrelated components called principal components. Each principal component is a linear combination of the original features and explains a portion of the variance in the data while being uncorrelated with others . This transformation mitigates the risk of multicollinearity, which is advantageous in regression analyses as it improves model stability and interpretability . The implication for data preprocessing is that PCA enables the selection of fewer features while retaining most variance, thus simplifying models and improving computational efficiency.

DBSCAN is particularly effective for discovering clusters of arbitrary shapes because it identifies clusters as densely connected regions and can distinguish between noise and clusters without assuming any shape . It groups data points based on density connectivity, making it robust to irregularities . Hierarchical clustering, specifically agglomerative methods, builds clusters by merging data points based on similarity measures like distances, often facing challenges with clusters of arbitrary shapes as it assumes a more globular structure until all points are exhausted . While hierarchical clustering provides a dendrogram that visualizes clustering relationships at multiple levels, it may not handle irregularly shaped clusters as effectively as DBSCAN.

One potential disadvantage of applying PCA is the challenge of interpretability since principal components are linear combinations of the original variables, making it difficult to understand what they represent . Additionally, PCA can result in information loss if too few components are retained, potentially discarding important nuances present in the original data . This trade-off between dimensionality reduction and information retention needs careful consideration.

Hierarchical clustering visualizes cluster formation through a dendrogram, which represents the order in which clusters are merged (in agglomerative clustering) or split (in divisive clustering) based on similarity or distance metrics . Each dendrogram branch shows at what point two clusters were combined, allowing observation of hierarchical relationships among data points. This is beneficial because it provides insights into the data structure at multiple levels, allowing decision-makers to choose the optimal number of clusters through cut-off thresholds on the dendrogram, based on specific needs or insights . It also aids in understanding data relationships and hierarchy without prespecifying the number of clusters.

K-Means assumes that the data is partitioned into spherical clusters of roughly equal size, leading it to perform best in datasets that approximately meet these conditions . It uses a simple distance measure, typically Euclidean, to assign points to clusters, which makes it sensitive to outliers and initial centroid positions. On the other hand, Gaussian Mixture Models (GMM) assume the data follows a mixture of Gaussian distributions, allowing for clusters of varying shapes, sizes, and densities . GMM uses a probabilistic approach, assigning membership probabilities rather than fixed assignments. This flexibility makes GMM more applicable to real-world data where cluster boundaries may not be spherical or where clusters overlap.

The silhouette score provides a measure of how well-separated clusters are and how cohesive individual clusters remain, with values ranging from -1 to 1 . A high silhouette score indicates well-defined clusters, which helps in evaluating the effectiveness of the clustering algorithm. However, relying solely on the silhouette score can be misleading if the dataset contains clusters of varying densities or if the algorithm itself is suitable for only certain types of clusters . Thus, while helpful, the silhouette score should be used alongside other evaluation metrics to get a comprehensive understanding of clustering effectiveness.

In Principal Component Analysis, eigenvectors define the directions of the new feature space, while eigenvalues quantify the amount of variance each eigenvector captures. During PCA, the covariance matrix of the standardized data is used to calculate eigenvectors and corresponding eigenvalues . The eigenvectors with the highest eigenvalues are selected as principal components because they account for the greatest variance in the data, thus effectively simplifying the dataset without losing significant information . This selection of top k components enables dimensional reduction by transforming the original data into the selected eigenvectors' space.

NMF might outperform PCA in datasets where only additive relationships make sense and when data is inherently non-negative, such as image processing or text mining . Since NMF only allows for non-negative factors, it ensures all components can be interpreted as meaningful parts of the whole, like parts of an image or word topics in text mining, whereas PCA, relying on linear transformations, might generate eigenvectors with negative values that are harder to interpret .

In DBSCAN, the 'eps' parameter defines the radius for neighborhood searches, while 'MinPts' sets the minimum number of data points required to form a dense region. A small 'eps' leads to many points being incorrectly classified as noise, whereas a large 'eps' might merge distinct clusters . Meanwhile, 'MinPts' influences the sensitivity of cluster formation, with higher values requiring denser regions to form clusters . Optimizing these involves analyzing the k-distance graph to find an 'elbow' point for 'eps' and using domain knowledge or rules of thumb like MinPts = D+1, where D is the data dimensionality, to set 'MinPts'. Proper tuning improves cluster discovery and enhances handling of noise and density variation.

Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
27 pages
PCA and Cluster Analysis Overview
No ratings yet
PCA and Cluster Analysis Overview
14 pages
Clustering & Dimensionality Reduction Techniques
No ratings yet
Clustering & Dimensionality Reduction Techniques
15 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
3 pages
4 - Dimensionality Reduction PCA
No ratings yet
4 - Dimensionality Reduction PCA
19 pages
Dimensionality Reduction and Principal Component Analysis
No ratings yet
Dimensionality Reduction and Principal Component Analysis
4 pages
Dimensionality Reduction Techniques Overview
No ratings yet
Dimensionality Reduction Techniques Overview
32 pages
Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
30 pages
Unsupervised Learning: Dimensionality Reduction
No ratings yet
Unsupervised Learning: Dimensionality Reduction
30 pages
Unit 4 Notes
No ratings yet
Unit 4 Notes
16 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
34 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
11 pages
Dimensionality Reduction @unit-1 ADS
No ratings yet
Dimensionality Reduction @unit-1 ADS
9 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
70 pages
PCA for Dimensionality Reduction
No ratings yet
PCA for Dimensionality Reduction
17 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
41 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
8 pages
Principal Component Analysis Overview
No ratings yet
Principal Component Analysis Overview
19 pages
16 Dimensionality Reduction
No ratings yet
16 Dimensionality Reduction
32 pages
Principal Component Analysis Explained
No ratings yet
Principal Component Analysis Explained
7 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
102 pages
Understanding Principal Component Analysis
100% (1)
Understanding Principal Component Analysis
18 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
28 pages
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
39 pages
Feature Selection vs. Dimensionality Reduction
No ratings yet
Feature Selection vs. Dimensionality Reduction
18 pages
IML2
No ratings yet
IML2
15 pages
Principal Component Analysis Explained
No ratings yet
Principal Component Analysis Explained
3 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
8 pages
Dimensionality Reduction Techniques
No ratings yet
Dimensionality Reduction Techniques
59 pages
Feature Reduction & Selection Techniques
No ratings yet
Feature Reduction & Selection Techniques
41 pages
PCA for Dimensionality Reduction Guide
No ratings yet
PCA for Dimensionality Reduction Guide
21 pages
Understanding Dimensionality Reduction Techniques
No ratings yet
Understanding Dimensionality Reduction Techniques
123 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
8 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
21 pages
PCA in Machine Learning Overview
No ratings yet
PCA in Machine Learning Overview
63 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
14 pages
PCA: Dimensionality Reduction Explained
No ratings yet
PCA: Dimensionality Reduction Explained
28 pages
Dimensionality Reduction Techniques in ML
No ratings yet
Dimensionality Reduction Techniques in ML
18 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
13 pages
Data Reduction and Visualization Techniques
No ratings yet
Data Reduction and Visualization Techniques
22 pages
PCA Lecture Notes: Comprehensive Guide
No ratings yet
PCA Lecture Notes: Comprehensive Guide
22 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
21 pages
PCA For Visualization SeminarReport
No ratings yet
PCA For Visualization SeminarReport
12 pages
Unit-4 ML Full
No ratings yet
Unit-4 ML Full
21 pages
Dimensionality Reduction in Machine Learning
No ratings yet
Dimensionality Reduction in Machine Learning
15 pages
Principal Component Analysis Overview
No ratings yet
Principal Component Analysis Overview
16 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
26 pages
PCA Lab: A Beginner's Guide
No ratings yet
PCA Lab: A Beginner's Guide
5 pages
Unsupervised Dimensionality Reduction Techniques
No ratings yet
Unsupervised Dimensionality Reduction Techniques
81 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
10 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
48 pages
Beginner's Guide to Principal Component Analysis
No ratings yet
Beginner's Guide to Principal Component Analysis
6 pages
Data Preprocessing Techniques in Mining
No ratings yet
Data Preprocessing Techniques in Mining
32 pages
PCA Cheat Sheet for Dimensionality Reduction
No ratings yet
PCA Cheat Sheet for Dimensionality Reduction
2 pages
Fisher Discriminant Analysis Overview
No ratings yet
Fisher Discriminant Analysis Overview
15 pages
Understanding Principal Component Analysis
No ratings yet
Understanding Principal Component Analysis
4 pages
PCA Implementation in Python
No ratings yet
PCA Implementation in Python
11 pages
Web Technology - III Unit
No ratings yet
Web Technology - III Unit
28 pages
FCA - Unit 2
No ratings yet
FCA - Unit 2
18 pages
FCA - Unit 1
No ratings yet
FCA - Unit 1
16 pages
VB Script
No ratings yet
VB Script
10 pages
NEW
No ratings yet
NEW
18 pages
Oa Lab2
No ratings yet
Oa Lab2
1 page
JavaLab III
No ratings yet
JavaLab III
3 pages
Unit-3 Supervised Learning
No ratings yet
Unit-3 Supervised Learning
30 pages
pLM4ACE: A Protein Language Model Based Predictor For Antihypertensive Peptide Screening
No ratings yet
pLM4ACE: A Protein Language Model Based Predictor For Antihypertensive Peptide Screening
9 pages
UMAP for Single-Cell Data Visualization
No ratings yet
UMAP for Single-Cell Data Visualization
10 pages
Dimensionality Reduction Techniques Explained
No ratings yet
Dimensionality Reduction Techniques Explained
34 pages
Applied Soft Computing Journal: Yu-Pei Huang, Meng-Feng Yen
No ratings yet
Applied Soft Computing Journal: Yu-Pei Huang, Meng-Feng Yen
14 pages
Thyroid Dataset Analysis Techniques
No ratings yet
Thyroid Dataset Analysis Techniques
56 pages
Factor Analysis Techniques in Deep Learning
No ratings yet
Factor Analysis Techniques in Deep Learning
119 pages
Data Science: Techniques for Data Reduction
No ratings yet
Data Science: Techniques for Data Reduction
28 pages
Unit2 ANN
No ratings yet
Unit2 ANN
83 pages
Machine Learning for Financial Distress Prediction
No ratings yet
Machine Learning for Financial Distress Prediction
33 pages
IT Job Interview Preparation Guide
No ratings yet
IT Job Interview Preparation Guide
11 pages
Data Mining Exam Solutions for B9BA103
No ratings yet
Data Mining Exam Solutions for B9BA103
7 pages
A Global Geometric Framework For Nonlinear Dimensionality Reduction
No ratings yet
A Global Geometric Framework For Nonlinear Dimensionality Reduction
8 pages
Gait Analysis for Gender Classification
No ratings yet
Gait Analysis for Gender Classification
6 pages
What Are Higher-Order Networks?: Christian Bick Elizabeth Gross Heather A. Harrington Michael T. Schaub
No ratings yet
What Are Higher-Order Networks?: Christian Bick Elizabeth Gross Heather A. Harrington Michael T. Schaub
46 pages
Job-Resume Matching with Deep Learning
No ratings yet
Job-Resume Matching with Deep Learning
14 pages
Dimensionality Reduction in Chemoinformatics
No ratings yet
Dimensionality Reduction in Chemoinformatics
18 pages
Machine Learning Concepts and Techniques
No ratings yet
Machine Learning Concepts and Techniques
11 pages
JNTUK R20 Machine Learning Notes
No ratings yet
JNTUK R20 Machine Learning Notes
23 pages
Exam-Ready Answers for ML Modules 4-6
No ratings yet
Exam-Ready Answers for ML Modules 4-6
11 pages
Gaussian Mixture Model Overview
No ratings yet
Gaussian Mixture Model Overview
56 pages
Cryo-EM Image Processing Foundation Model
No ratings yet
Cryo-EM Image Processing Foundation Model
25 pages
Unsupervised Learning & Data Preprocessing
No ratings yet
Unsupervised Learning & Data Preprocessing
13 pages
Ear Recognition 2025
No ratings yet
Ear Recognition 2025
22 pages
Manifold Learning Techniques Overview
No ratings yet
Manifold Learning Techniques Overview
15 pages
Distinct Variation Pattern Discovery Using Alternating Nonlinear Principal Component Analysis
No ratings yet
Distinct Variation Pattern Discovery Using Alternating Nonlinear Principal Component Analysis
11 pages
XAI for Targeted Protein Degradation
No ratings yet
XAI for Targeted Protein Degradation
8 pages
Benchmarking Preprocessing and Integration Methods in Single-Cell Genomics
No ratings yet
Benchmarking Preprocessing and Integration Methods in Single-Cell Genomics
12 pages
Gibbs-BERTopic: Enhanced Topic Modeling
No ratings yet
Gibbs-BERTopic: Enhanced Topic Modeling
13 pages
CAGE: Constrained Graph Embedding Model
No ratings yet
CAGE: Constrained Graph Embedding Model
15 pages
Multivariate Analysis in Marketing Research
No ratings yet
Multivariate Analysis in Marketing Research
7 pages

Unit V

Uploaded by

Unit V

Uploaded by

UNIT V

Principal Component Analysis (PCA)

Non-Negative Matrix Factorization

Matrix Decomposition and Representation in NMF

matrices 𝑊 and 𝐻 with dimensions 𝑚 × 𝑘 and 𝑘 × 𝑛 respectively where both matrices

contain only non-negative elements:

𝐴𝑚×𝑛 ≈ 𝑊𝑚×𝑘 𝐻𝑘×𝑛

• 𝐴 : Original input matrix (a linear combination of W and H)

• 𝑊: Feature matrix (basis components)

• 𝐻: Coefficient matrix (weights associated with W)

• 𝑘: Rank (dimensionality of the reduced representation where 𝑘 ≤ min⁡(𝑚, 𝑛)

represented as a combination of fundamental features found in 𝑊.

choosing a lower dimension k the decomposition highlights essential features while

• Each data point (column in 𝐴) is approximated as a combination of non-negative feature

defines how strongly each image is composed of these features.

optimization process that minimizes reconstruction error:

1. Initialization: Start with random non-negative values for 𝑊and 𝐻.

3. Stopping Criteria: The process stops when:

• The reconstruction error stabilizes.

Common optimization techniques for NMF include:

• Multiplicative Update Rules: Ensures non-negativity by iteratively

Let's take a case of image processing.

• Suppose we have an input image having pixels that form matrix A.

[Matrix H]. (As shown in below image)

NMF in Image Processing

NMF has a wide range of applications including:

• Image Processing: Feature extraction in facial recognition and object detection.

• Text Mining and NLP Task: Topic modeling by decomposing a document-term

matrix into key topics.

• Bioinformatics: Gene expression analysis for identifying molecular patterns in

3. Assign each data point to the nearest centroid.

4. Recalculate the centroid of each cluster.

5. Repeat steps 3 and 4 until convergence is achieved.

1. Start with all points separate:

• 𝑅𝑖𝑗 is the dissimilarity (distance) between cluster i and cluster j.

3. Calinski-Harabasz Index (Variance Ratio Criterion)

• 𝑝(𝑦𝑖 ) and 𝑝(𝑧𝑖 ) are the marginal probabilities.

K-Means, K- Needs k predefined, assumes

Common questions

What are common applications of Non-Negative Matrix Factorization (NMF) in real-world data analysis, particularly in text mining and image processing?

What are common applications of Non-Negative Matrix Factorization (NMF) in real-world data analysis, particularly in text mining and image processing?

How does Principal Component Analysis (PCA) contribute to reducing multicollinearity in datasets, and what are the implications of this for data preprocessing?

How does Principal Component Analysis (PCA) contribute to reducing multicollinearity in datasets, and what are the implications of this for data preprocessing?

Compare the approaches of DBSCAN and hierarchical clustering in handling clusters of arbitrary shapes, and discuss their effectiveness.

Compare the approaches of DBSCAN and hierarchical clustering in handling clusters of arbitrary shapes, and discuss their effectiveness.

What are the potential disadvantages of applying Principal Component Analysis (PCA) to a dataset, particularly in terms of interpretability and information loss?

What are the potential disadvantages of applying Principal Component Analysis (PCA) to a dataset, particularly in terms of interpretability and information loss?

How does hierarchical clustering visualize the progression of cluster formations, and what are the practical benefits of this approach?

How does hierarchical clustering visualize the progression of cluster formations, and what are the practical benefits of this approach?

Discuss the differences in assumptions between K-Means and Gaussian Mixture Models regarding data distribution and cluster shapes, and their implications for real-world data.

Discuss the differences in assumptions between K-Means and Gaussian Mixture Models regarding data distribution and cluster shapes, and their implications for real-world data.

What are the advantages and potential pitfalls of using silhouette score to evaluate clustering algorithms?

What are the advantages and potential pitfalls of using silhouette score to evaluate clustering algorithms?

Explain the role of eigenvectors and eigenvalues in Principal Component Analysis and how they influence the dimensionality reduction process.

Explain the role of eigenvectors and eigenvalues in Principal Component Analysis and how they influence the dimensionality reduction process.

In what types of datasets might Non-Negative Matrix Factorization (NMF) outperform Principal Component Analysis (PCA), and why?

In what types of datasets might Non-Negative Matrix Factorization (NMF) outperform Principal Component Analysis (PCA), and why?

How does the choice of 'eps' and 'MinPts' parameters in DBSCAN affect its clustering performance, and what strategies can be employed to optimize these settings?

How does the choice of 'eps' and 'MinPts' parameters in DBSCAN affect its clustering performance, and what strategies can be employed to optimize these settings?

You might also like