Clustering Techniques in Data Mining

DATA MINING

PRESENTATION ON
CLUSTERING IN DATA MINING
AN OVERVIEW OF TECHNIQUES AND APPLICATIONS
BY
HARSH ROY (22/13071)
DEFINITION AND SCOPE
Definition: Clustering is the process of
grouping a set of objects such that objects in
the same group (cluster) are more similar to
each other than to those in other groups.
Importance: Helps uncover hidden patterns in
data, often used for exploratory data analysis.
Use Cases:
Market segmentation.
Document clustering for information
retrieval.
Image and video processing.
TYPES OF CLUSTERING
Partitioning Methods:
Example: K-means.
Divides data into non-overlapping subsets.
Hierarchical Methods:
Example: Agglomerative clustering.
Creates a tree-like structure (dendrogram).
Density-Based Methods:
Example: DBSCAN.
Groups data based on density of points.
K-MEANS CLUSTERING
Algorithm Overview:
a. Initialize K cluster centroids randomly.
b. Assign each data point to the nearest
centroid.
c. Recalculate centroids as the mean of
assigned points.
d. Repeat steps b and c until the centroids stabilize (assignments no longer change).
Strengths:
Simple and fast for large datasets.
Weaknesses:
Sensitive to the choice of K, to the initial centroids, and to outliers.
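Steps a to d above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, and the sample points are assumptions made for the example, and the initial centroids are drawn from the data points themselves, which is one common way to initialize randomly.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """K-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step a: pick K data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step b: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step d: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step c: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
```

Because the result depends on the starting centroids, a common practice is to run the algorithm several times with different seeds and keep the run with the lowest within-cluster distance.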
HIERARCHICAL CLUSTERING
Types:
Agglomerative: Start with individual points, merge clusters.
Divisive: Start with one cluster, split iteratively.
Representation:
Dendrogram shows hierarchical relationships.
Strengths:
No need to specify the number of clusters in advance.
Weaknesses:
Computationally expensive for large datasets (standard agglomerative algorithms need at least quadratic time in the number of points).
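The agglomerative approach can be sketched as below. This is a naive illustration that assumes single linkage (the distance between two clusters is the distance between their closest members); the slides do not name a linkage criterion, and the function name `agglomerative`, the parameter `num_clusters`, and the sample points are hypothetical. The returned merge history holds the information a dendrogram would plot, and `num_clusters` only chooses where to cut it.

```python
import math

def agglomerative(points, num_clusters=1):
    """Naive single-linkage agglomerative clustering (illustrative sketch).

    Returns the final clusters as lists of point indices, plus the merge
    history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
clusters, merges = agglomerative(points, num_clusters=2)
```

Recomputing every pairwise linkage on each pass is what makes this version slow, which illustrates the weakness listed above.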
DENSITY-BASED CLUSTERING
DBSCAN Overview:
Groups points that are closely packed together based on
a specified density threshold.
Points in low-density regions are labeled as noise.
Applications:
Identifying irregularly shaped clusters.
Outlier detection.
Challenges:
Requires setting appropriate parameters: ε (the neighborhood radius) and
MinPts (the minimum number of points within ε for a region to count as dense).
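The DBSCAN idea described above can be sketched as follows. This is a simplified illustration with a brute-force neighbor search; the names `dbscan`, `eps`, `min_pts`, and `NOISE`, as well as the sample points and parameter values, are assumptions for the example. A point counts itself among its own neighbors here.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (sketch)."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical data: a dense square, a dense triangle, and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these values the isolated point at (50, 50) has too few neighbors and is labeled as noise, which is how DBSCAN doubles as an outlier detector.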
CLUSTER EVALUATION METRICS

Internal Metrics:
Silhouette Score: Measures how similar an object is to its
own cluster vs. the nearest other cluster; ranges from -1 to 1, higher is better.
Cohesion: Measures intra-cluster similarity.
External Metrics:
Purity: Fraction of points that belong to the majority true class of
their assigned cluster.
Normalized Mutual Information (NMI): Measures shared
information between true and predicted clusters.
Challenges:
No universal metric; depends on the data and goals.
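The metrics above can be computed with a few lines of plain Python. The sketch below is illustrative only: the function names and the sample data are assumptions, the silhouette uses Euclidean distance and needs at least two clusters, and the NMI is normalized by the arithmetic mean of the two entropies, which is one of several conventions in use.

```python
import math
from collections import Counter

def silhouette_score(points, labels):
    """Mean silhouette over all points (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def purity(true_labels, predicted_labels):
    """Fraction of points in the majority true class of their cluster."""
    by_cluster = {}
    for t, c in zip(true_labels, predicted_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(true_labels)

def nmi(true_labels, predicted_labels):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    def entropy(labels):
        return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())
    t_counts, p_counts = Counter(true_labels), Counter(predicted_labels)
    joint = Counter(zip(true_labels, predicted_labels))
    mi = sum(
        (c / n) * math.log(c * n / (t_counts[t] * p_counts[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(true_labels) + entropy(predicted_labels)) / 2
    return mi / denom if denom > 0 else 1.0

# Hypothetical data: two tight pairs, with one point mislabeled in the ground truth.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
predicted = [0, 0, 1, 1]
truth = ["a", "a", "b", "a"]
sil = silhouette_score(points, predicted)
pur = purity(truth, predicted)
score = nmi(truth, predicted)
```

The silhouette needs only the data and the predicted labels, while purity and NMI also need ground-truth labels, which is the practical difference between internal and external metrics.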
STRENGTHS AND WEAKNESSES OF CLUSTERING METHODS
K-Means:
Strength: Efficient and simple.
Weakness: Assumes roughly spherical clusters of similar size.
Hierarchical:
Strength: Captures nested clusters.
Weakness: Computationally intensive.
DBSCAN:
Strength: Detects clusters of arbitrary
shape.
Weakness: Struggles when clusters have widely varying
densities, since a single ε and MinPts apply to the whole dataset.
APPLICATIONS OF CLUSTERING
Real-World Applications:
Customer Segmentation:
Group customers based on purchasing behavior.
Image Segmentation:
Cluster pixels for object identification in images.
Climate Data Analysis:
Identify regions with similar weather patterns.
Fraud Detection:
Flag transactions that fall outside the clusters of normal behavior as potentially fraudulent.
CONCLUSION
Summary:
Clustering is a versatile tool in data mining.
Various algorithms cater to different data characteristics.
Future Trends:
Development of more scalable and adaptive clustering methods.
Integration with deep learning techniques.
Closing Statement:
"Clustering continues to play a pivotal role in making sense of
complex datasets."
THANK YOU!
harshroydsc71@[Link]

HARSH ROY ( 22/13071 )
