K-means Clustering Explained

Clustering is a data mining technique that groups similar data objects into clusters based on their attributes, with applications in various fields such as economics, marketing, and medicine. The document discusses basic clustering methods, particularly focusing on k-means clustering, which involves partitioning data into k clusters based on proximity to centroids. Limitations of k-means include challenges with clusters of varying sizes, densities, shapes, and the presence of outliers.

Uploaded by

lammoor.26

CHAPTER 8

Clustering
• Clustering
• K-means Clustering

Clustering
• Clustering is the process of grouping a set of data objects
into multiple groups or clusters so that objects within a cluster
have high similarity, but are very dissimilar to objects in other
clusters.

• Dissimilarities and similarities are assessed based on the
attribute values describing the objects and often involve
distance measures.

• Clustering as a data mining tool has its roots in many
application areas such as biology, security, business
intelligence, and Web search.

Clustering
In many fields there are obvious benefits to be had from
grouping together similar objects. For example:
• In an economics application, we might be interested in finding
countries whose economies are similar.
• In a marketing application, we might wish to find clusters of
customers with similar buying behavior.
• In a medical application, we might wish to find clusters of
patients with similar symptoms.
• In a document retrieval application, we might wish to find
clusters of documents with related content or topics.
• In business intelligence, we might wish to organize a large
number of customers into groups, where customers within a
group share strongly similar characteristics.

What is Cluster Analysis?

• Finding groups of objects such that the objects in a group
will be similar (or related) to one another and different from
(or unrelated to) the objects in other groups

[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized.]
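The idea of small distances inside a cluster and large distances between clusters can be checked numerically. The sketch below is only an illustration: the two point groups are hypothetical toy data, and the helper names `mean_intra_distance` and `mean_inter_distance` are not taken from the slides.

```python
from itertools import combinations, product
from math import dist

def mean_intra_distance(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical toy data: two tight, well-separated groups.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

intra_a = mean_intra_distance(group_a)
inter_ab = mean_inter_distance(group_a, group_b)
print(intra_a < inter_ab)  # a good grouping keeps intra small and inter large
```

For a reasonable grouping the intra-cluster average should come out much smaller than the inter-cluster average.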

Clustering
When each object is described by the values of just two attributes,
we can represent the objects as points in a two-dimensional
space (a plane).

Clustering
It is usually easy to visualize clusters in two
dimensions.
The points seem to fall naturally into four groups
as shown by the curves drawn surrounding sets of
points.

Clustering
• However, there is frequently more than one
possibility.

Clustering
• What is a natural grouping among these objects?

Basic Clustering Methods

The major fundamental clustering methods can be
classified into the following categories:

• Hierarchical algorithms:
• A set of nested clusters organized as a hierarchical tree

• Partitional algorithms:
• A division of data objects into non-overlapping subsets (clusters)
such that each data object is in exactly one subset

Basic Clustering Methods

Method: General Characteristics

Hierarchical methods
• Clustering is a hierarchical decomposition (i.e., multiple levels).
• Cannot correct erroneous merges or splits.
• Permit clusters to have subclusters.

Partitioning methods
• Find mutually exclusive clusters of round shape.
• Distance-based.
• May use mean or medoid (etc.) to represent cluster center.
• Effective for small- to medium-size data sets.
• Division of the set of data objects into non-overlapping subsets.

Basic Clustering Methods

Partitioning methods

• Given a set of n objects, a partitioning method constructs
k partitions of the data, where each partition represents a
cluster and n ≥ k. That is, it divides the data into k groups
such that each group must contain at least one object.

• In other words, partitioning methods conduct one-level
partitioning on data sets.

• The basic partitioning methods typically adopt exclusive
cluster separation. That is, each object must belong to
exactly one group.

Partitioning methods

❑ Most partitioning methods are distance-based.

❑ Given k, the number of partitions to construct, a partitioning method
creates an initial partitioning.

❑ It then uses an iterative relocation technique that attempts to improve
the partitioning by moving objects from one group to another.

❑ The general criterion of a good partitioning is that objects in the same
cluster are “close” or related to each other, whereas objects in different
clusters are “far apart” or very different. There are various kinds of
other criteria for judging the quality of partitions.

K-Means

K-means Clustering
There are many methods of clustering. The most commonly
used algorithms are: k-means clustering and hierarchical
clustering.

k-means clustering is an exclusive clustering algorithm. Each object
is assigned to precisely one of a set of clusters.

For this method of clustering, we start by deciding how many
clusters we would like to form from the data.
We call this value k.

The value of k is generally a small integer, such as 2, 3, 4 or 5, but
may be larger.

K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest
centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple

Algorithm k-means
The k-means algorithm clusters n objects, based on their
attributes, into k partitions, where k < n.
Steps
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the n objects by
assigning them to the nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the
memberships found above are correct.
5. If none of the n objects changed membership in the last
iteration, exit. Otherwise go to step 3.
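The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the `initial_centroids` and `max_iterations` parameters, the rule that an empty cluster keeps its old center, and the toy points are all choices made for this sketch.

```python
import random
from math import dist

def mean_point(points):
    """Attribute-wise average of a non-empty list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))

def k_means(points, k, initial_centroids=None, seed=0, max_iterations=100):
    """Return (centroids, labels) produced by the basic k-means loop."""
    # Step 2: use the given centers, or pick k of the objects at random.
    if initial_centroids is None:
        centroids = random.Random(seed).sample(points, k)
    else:
        centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iterations):
        # Step 3: assign every object to its nearest cluster center.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Step 5: stop as soon as no object changed membership.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centroids[j] = mean_point(members)
    return centroids, labels

# Hypothetical toy data: two obvious groups, so k = 2 (step 1).
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
              (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = k_means(toy_points, k=2,
                            initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
```

Passing `initial_centroids` makes a run repeatable. Leaving it out falls back to a seeded random choice among the objects.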

K-means Clustering – Details

• Initial centroids are often chosen randomly.
• Clusters produced vary from one run to another.
• The centroid is (typically) the mean of the points in the cluster.
• ‘Closeness’ is measured by Euclidean distance, cosine
similarity, correlation, etc.
• K-means will converge for common similarity measures
mentioned above.
• Most of the convergence happens in the first few iterations.
• Often the stopping condition is changed to ‘Until relatively few points
change clusters’
• Complexity is O( n * K * I * d )
• n = number of points, K = number of clusters,
• I = number of iterations, d = number of attributes
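The relaxed stopping condition, ‘until relatively few points change clusters’, could be tested as below. This is only a sketch: the function name `few_points_changed`, the default `tolerance` of 1% and the label lists are assumptions made for illustration.

```python
def few_points_changed(old_labels, new_labels, tolerance=0.01):
    """True when at most `tolerance` (a fraction) of the points switched cluster."""
    changed = sum(1 for old, new in zip(old_labels, new_labels) if old != new)
    return changed / len(new_labels) <= tolerance

# Hypothetical cluster labels from two consecutive iterations.
previous = [0, 0, 1, 1, 2, 2, 2, 0, 1, 1]
current = [0, 0, 1, 1, 2, 2, 2, 0, 1, 2]  # one of the ten points moved

stop_loose = few_points_changed(previous, current, tolerance=0.10)
stop_strict = few_points_changed(previous, current, tolerance=0.05)
print(stop_loose, stop_strict)
```

A larger tolerance ends the loop earlier, at the price of a slightly less settled clustering.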

Importance of Choosing Initial Centroids

[Figure: six scatter plots of y against x, one per iteration, labelled Iteration 1 to Iteration 6.]
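The effect of the starting centroids can be reproduced on a small example. The sketch below runs the same basic loop from two different starts and compares the results. The toy points, both starting sets and the use of the sum of squared distances to the assigned centroid as a quality score are assumptions of this sketch. The slides do not name a score.

```python
from math import dist

def run_k_means(points, initial_centroids, max_iterations=100):
    """Run basic k-means from the given starting centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no membership changed: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

def sum_squared_error(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical toy data: three pairs of points along a line.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0),
          (6.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
spread_start = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]   # one start per pair
crowded_start = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]   # two starts in one pair

spread_result = run_k_means(points, spread_start)
crowded_result = run_k_means(points, crowded_start)
spread_sse = sum_squared_error(points, *spread_result)
crowded_sse = sum_squared_error(points, *crowded_result)
print(spread_sse, crowded_sse)
```

With the crowded start, two centroids stay stuck on the first pair and the third has to cover the other four points, so that run ends with a much larger error. A common remedy is to run the algorithm from several starts and keep the best result.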

Distance Measure between Objects

It is first necessary to decide on a way of measuring the
distance between two points.
As for nearest neighbour classification, a measure commonly
used when clustering is the Euclidean distance.
We will assume that all attribute values are continuous.
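As a small illustration, the Euclidean distance between two points with continuous attribute values can be computed as follows. The function name `euclidean_distance` and the sample points are chosen for this sketch only.

```python
from math import sqrt

def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of attributes."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical points forming a 3-4-5 right triangle with the origin.
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
print(d)
```

The same function works for any number of attributes, not only two.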


Centroid
We need to introduce the notion of the ‘centre’ of a
cluster, generally called its centroid.
Assuming that we are using Euclidean distance or
something similar as a measure, we define the
centroid of a cluster to be the point for which each
attribute value is the average of the values of the
corresponding attribute for all the points in the cluster.
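The definition translates directly into code: average each attribute separately over the points of the cluster. The function name `centroid` and the three sample points below are hypothetical.

```python
def centroid(points):
    """Attribute-wise mean of a non-empty collection of points."""
    if not points:
        raise ValueError("a centroid needs at least one point")
    return tuple(sum(values) / len(points) for values in zip(*points))

# Hypothetical cluster of three two-attribute points.
c = centroid([(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)])
print(c)
```

Note that the result need not coincide with any actual point of the cluster.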


K-means Clustering Algorithm: Example

We will use the k-means algorithm to cluster 16 objects,
each with two attributes, x and y.

K-means Clustering Algorithm: Example

Step 1: Set initial k and initial centroids
We will assume that we have chosen k = 3 and that
three points have been selected to be the locations
of the initial three centroids.
These initial (arbitrary) centroids are shown below.

K-means Clustering Algorithm: Example

Step 2: Calculate the Euclidean distance
Calculate the Euclidean distance of each of the 16
points from the three centroids.
For example, the distance of the first point (6.8, 12.6)
from the first centroid (3.8, 9.9) is simply

$$\sqrt{(6.8 - 3.8)^2 + (12.6 - 9.9)^2} = \sqrt{16.29} \approx 4.0$$

Step 3: Assign objects to clusters

The column ‘cluster’ indicates the centroid closest to
each point and thus the cluster to which it should be
assigned.
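Steps 2 and 3 can be sketched as one small function that measures the distance to every centroid and returns the closest one. The point (6.8, 12.6) and the centroid (3.8, 9.9) are the values quoted above. The list `example_centroids` is hypothetical and is not the set of centroids used in this example.

```python
from math import dist

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    distances = [dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]

# Values quoted in the text: first point and first centroid.
first_point = (6.8, 12.6)
first_centroid = (3.8, 9.9)
d1 = dist(first_point, first_centroid)

# Hypothetical centroids, for illustration only.
example_centroids = [(0.0, 0.0), (7.0, 12.0), (20.0, 20.0)]
cluster_index, distance = nearest_centroid(first_point, example_centroids)
print(round(d1, 1), cluster_index)
```

Applying `nearest_centroid` to each of the 16 points would fill in the ‘cluster’ column.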

K-means Clustering Algorithm: Example

Step 4: Recalculate the centroids of each cluster

• We next calculate the centroids of the three clusters using the x and
y values of the objects currently assigned to each one. For example,
for Centroid 3:

$$x = \frac{6.0 + 6.2 + 7.6}{3} = 6.6$$

$$y = \frac{19.9 + 18.5 + 17.4}{3} = 18.6$$
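The averages for the third centroid can be recomputed from the listed coordinates. In this sketch the pairing of the x and y values into three points, taken in the order in which they are listed, is an assumption. The variable names are illustrative.

```python
# Objects currently assigned to cluster 3 (x and y values paired in listed order).
cluster_3 = [(6.0, 19.9), (6.2, 18.5), (7.6, 17.4)]

# Each coordinate of the centroid is the mean of that attribute over the cluster.
centroid_x = sum(x for x, _ in cluster_3) / len(cluster_3)
centroid_y = sum(y for _, y in cluster_3) / len(cluster_3)

print(round(centroid_x, 1), round(centroid_y, 1))
```

The other two centroids are obtained in the same way from the objects assigned to clusters 1 and 2.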

K-means Clustering Algorithm: Example

• The three centroids have all been moved by
the assignment process, but the movement of
the third one is less than for the other two.

K-means Clustering Algorithm: Example

[Table: distances d1, d2 and d3 of each object from the three centroids, and the assigned cluster.]
Repeat Step 2 & Step 3: Calculate
the Euclidean distance from the new
centroids, then assign objects to
clusters.
This gives the revised set of
clusters shown in Figure 14.11.
From now on the centroids are
‘imaginary points’
corresponding to the ‘centre’ of
each cluster, not actual points
within the clusters.
In fact only one point has moved.
The object at (8.3, 6.9) has moved
from cluster 2 to cluster 1.

K-means Clustering Algorithm: Example

Repeat Step 4: recalculate the positions of the three
centroids.


K-means Clustering Algorithm: Example

The first two centroids have moved a little, but the
third has not moved at all.
We assign the 16 objects to clusters once again, as
below.

K-means Clustering Algorithm: Example

These are the same clusters as before.
Their centroids will be the same as
those from which the clusters were
generated.
Hence the termination condition of the
k-means algorithm, ‘repeat ... until the
centroids no longer move’, has been
met and these are the final clusters
produced by the algorithm.

Limitations of K-means
• K-means has problems when clusters are of
differing
• Sizes
• Densities
• Non-round shapes

• K-means has problems when the data contains
outliers. Data cleaning is very important to avoid this
problem.
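The sensitivity to outliers follows from the centroid being a mean. The sketch below, on hypothetical toy data, shows how far a single extreme point drags the centroid of an otherwise compact cluster. The function name and all coordinates are assumptions made for illustration.

```python
def centroid(points):
    """Attribute-wise mean of a non-empty list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))

# Hypothetical compact cluster, then the same cluster with one outlier added.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
with_outlier = cluster + [(50.0, 50.0)]

clean_center = centroid(cluster)
shifted_center = centroid(with_outlier)
print(clean_center, shifted_center)
```

The shifted centroid lies far from every one of the original four points. This is why removing or treating outliers before clustering tends to help.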

End of the Chapter

Thanks
