Clustering Algorithms Overview

Clustering is an unsupervised machine learning technique that groups similar data points into clusters to reveal inherent patterns. Applications include customer segmentation, anomaly detection, image compression, and document clustering. Common approaches to clustering include centroid-based, density-based, distribution-based, and hierarchical methods, with evaluation metrics like Silhouette Score and Davies-Bouldin Index used to assess clustering quality.

Uploaded by

Samuel Yawson
Copyright
© All Rights Reserved

CLUSTERING ALGORITHMS
INTRODUCTION.
Machine Learning Algorithms

Supervised Learning Unsupervised Learning Reinforcement Learning

© 2024
INTRODUCTION.

• Clustering is an unsupervised machine learning technique used to group similar data points into clusters.

• The goal is to organize a dataset into meaningful structures based on the inherent patterns and similarities in the data.

APPLICATIONS OF CLUSTERING.
• Customer Segmentation: Grouping customers based on purchasing behavior for personalized marketing.

• Anomaly Detection: Identifying unusual patterns or outliers in data (e.g., fraud detection).

• Image Compression: Grouping similar pixels in images to reduce the size of the image while retaining quality.

• Document Clustering: Grouping similar documents for topic modeling or summarization.

COMMON APPROACHES.

01 Centroid-based

02 Density-based

03 Distribution-based

04 Hierarchical-based

K-MEANS CLUSTERING.
How it works:
1. Choose the number of clusters (k).
2. Randomly initialize k centroids.
3. Assign each data point to the nearest centroid.
4. Update the centroids based on the points in each cluster (each centroid becomes the mean of its assigned points).
5. Repeat steps 3-4 until convergence (no change in centroids).
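
The steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function names (`kmeans`, `squared_distance`), the `max_iters` and `seed` parameters, the choice of k data points as initial centroids, and the toy dataset are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Step 2: randomly pick k distinct data points as the initial centroids
    # (one common initialization; an assumption of this sketch).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iters):
        # Step 3: assign each point to the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping is meaningful, not the label values themselves.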

EVALUATION METRICS.
• Silhouette Score: Measures how well data points are clustered by comparing cohesion (within-cluster similarity) and separation (between-cluster dissimilarity). Scores range from -1 to 1, with higher scores indicating better-defined clusters.
• Davies-Bouldin Index: Assesses the compactness and separation of clusters by comparing intra-cluster and inter-cluster distances. Lower scores suggest better clustering.
• Calinski-Harabasz Index: Evaluates the ratio of between-cluster variance to within-cluster variance, with higher scores indicating compact and well-separated clusters.
• Adjusted Rand Index (ARI): Compares the clustering results to ground truth labels, correcting for random chance. A value of 1 indicates perfect agreement, values near 0 indicate agreement no better than random labeling, and negative values indicate worse-than-chance agreement.
• Mutual Information (MI): Quantifies how well the clustering corresponds to known labels. Higher MI scores indicate stronger alignment between predicted and true clusters.
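
To make the difference between internal and external metrics concrete, the sketch below computes one of each from their standard definitions: the Silhouette Score (needs only the points and the predicted labels) and the Adjusted Rand Index (needs ground truth labels). The function names, the toy points, and the two example labelings are assumptions chosen for illustration; in practice a library implementation would normally be used instead.

```python
import math
from collections import Counter


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # Cohesion a: mean distance to the other members of the same cluster
        # (the distance from p to itself contributes 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Separation b: smallest mean distance to any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def adjusted_rand_index(true_labels, pred_labels):
    """ARI from pair counts over the contingency table."""
    n = len(true_labels)
    pairs = Counter(zip(true_labels, pred_labels))
    same_both = sum(math.comb(c, 2) for c in pairs.values())
    same_true = sum(math.comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(math.comb(c, 2) for c in Counter(pred_labels).values())
    expected = same_true * same_pred / math.comb(n, 2)
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (max_index - expected)


# Hypothetical toy data: two tight groups and two candidate labelings.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
truth = [0, 0, 0, 1, 1, 1]
good = [1, 1, 1, 0, 0, 0]  # same grouping as truth, label names swapped
bad = [0, 1, 0, 1, 0, 1]   # mixes the two groups

print(silhouette_score(points, good), silhouette_score(points, bad))
print(adjusted_rand_index(truth, good), adjusted_rand_index(truth, bad))
```

The `good` labeling scores an ARI of 1 even though its label names are swapped relative to `truth`, because ARI compares groupings rather than label values. The `bad` labeling yields a negative ARI on this data.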

DEMO.
Tools Used

Live Demo
