CLUSTERING ALGORITHMS
INTRODUCTION.
Machine Learning Algorithms
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
© 2024
INTRODUCTION.
• Clustering is an unsupervised machine learning
technique used to group similar data points into
clusters.
• The goal is to organize a dataset into meaningful
structures based on the inherent patterns and
similarities in the data.
APPLICATIONS OF CLUSTERING.
• Customer Segmentation: Grouping customers based on
purchasing behavior for personalized marketing.
• Anomaly Detection: Identifying unusual patterns or outliers
in data (e.g., fraud detection).
• Image Compression: Grouping similar pixels (e.g., by color) and
storing one representative value per group, which reduces the
size of the image while keeping it visually close to the original.
• Document Clustering: Grouping similar documents for topic
modeling or summarization.
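
As a rough illustration of the image-compression idea, the sketch below maps each pixel to the nearest of a few cluster centers and stores only the index of that center. The pixel values and the three-entry palette are made-up assumptions; in practice the palette would come from running a clustering algorithm on the image's pixels.

```python
from math import ceil, log2

def quantize(pixels, palette):
    """Map each grayscale pixel to the index of its nearest palette value."""
    return [
        min(range(len(palette)), key=lambda j: abs(p - palette[j]))
        for p in pixels
    ]

# Hypothetical 8-bit grayscale pixels (values 0-255).
pixels = [12, 15, 10, 200, 205, 198, 100, 103]
# Hypothetical cluster centers, assumed to have been found beforehand.
palette = [12, 101, 201]

indices = quantize(pixels, palette)            # what would be stored per pixel
reconstructed = [palette[i] for i in indices]  # lossy approximation of the image
bits_per_pixel = ceil(log2(len(palette)))      # index width, versus 8 bits originally

print(indices, reconstructed, bits_per_pixel)
```

The saving comes from storing a short index per pixel plus the small palette. The reconstruction is lossy, so the number of clusters trades file size against fidelity.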
COMMON APPROACHES.
01 Centroid-based
02 Density-based
03 Distribution-based
04 Hierarchical-based
K-MEANS CLUSTERING.
How it works:
1. Choose the number of clusters (k).
2. Randomly initialize k centroids.
3. Assign each data point to the nearest
centroid.
4. Update the centroids based on the
points in each cluster.
5. Repeat steps 3-4 until convergence (no
change in centroids).
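
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `k_means`, the `seed` and `max_iter` parameters, the choice of k random data points as initial centroids, and the toy data are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 2: use k randomly chosen data points as the initial centroids
    # (one common choice; other initialization schemes exist).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

Because the initial centroids are random, different seeds can lead to different final clusters. A common remedy is to run the algorithm several times and keep the result with the smallest total within-cluster distance.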
EVALUATION METRICS.
• Silhouette Score: Measures how well data points are clustered by comparing cohesion
(within-cluster similarity) and separation (between-cluster dissimilarity). Scores range
from -1 to 1, with higher scores indicating better-defined clusters.
• Davies-Bouldin Index: Assesses the compactness and separation of clusters by averaging,
over all clusters, the worst-case ratio of within-cluster scatter to between-cluster
distance. Lower scores (the minimum is 0) indicate better clustering.
• Calinski-Harabasz Index: Evaluates the ratio of between-cluster variance to
within-cluster variance, with higher scores indicating compact and well-separated clusters.
• Adjusted Rand Index (ARI): Compares the clustering results to ground-truth labels,
correcting for chance agreement. A score of 1 indicates a perfect match, scores near 0
indicate agreement no better than random labeling, and negative scores indicate
agreement worse than chance.
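• Mutual Information (MI): Quantifies how much information the cluster assignments share
with known labels. Higher MI scores indicate stronger alignment between predicted and true
clusters. The upper bound of raw MI depends on the data, so normalized or adjusted variants
(NMI, AMI) are often used when comparing results.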
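
To make two of these metrics concrete, here is a small sketch of the Silhouette Score (which needs no ground truth) and the Adjusted Rand Index (which does). The function names, the toy points, and the example labelings are assumptions for illustration. The silhouette function assumes Euclidean distance and at least two clusters whose points are not all identical, and the ARI function assumes at least two samples.

```python
from collections import Counter
from math import comb, dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster (cohesion).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of another cluster (separation).
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def adjusted_rand_index(true_labels, pred_labels):
    """Rand index over pairs of samples, corrected for chance agreement."""
    n = len(true_labels)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivially identical
    return (sum_cells - expected) / (max_index - expected)

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
true_labels = [0, 0, 0, 1, 1, 1]
good_labels = [1, 1, 1, 0, 0, 0]   # same grouping, different label names
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

good_silhouette = silhouette_score(points, good_labels)
bad_silhouette = silhouette_score(points, bad_labels)
good_ari = adjusted_rand_index(true_labels, good_labels)
bad_ari = adjusted_rand_index(true_labels, bad_labels)
print(good_silhouette, bad_silhouette, good_ari, bad_ari)
```

Both metrics ignore the label names themselves: only which points are grouped together matters, so the relabeled grouping scores an ARI of 1.0 while the mixed grouping falls below zero.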
DEMO.
Tools Used
Live Demo