Overview of Cluster Analysis Methods

Clustering in data mining

Uploaded by

Bhavani Viswa

CLUSTERING

Cluster Analysis
Types of Data in Cluster Analysis
A Categorization of Major Clustering Methods

Cluster Analysis
 A cluster is a collection of data objects that are:
 Similar to one another within the same cluster,
 Dissimilar to the objects in other clusters.

 Cluster analysis finds similarities between data according to the characteristics found in the data and groups similar data objects into clusters.

 Cluster analysis is used to form groups or clusters of similar records based on several measurements made on these records.

 Clustering is unsupervised learning, i.e., there are no predefined classes.

 Cluster analysis has been applied in many areas, including astronomy, archaeology, medicine, chemistry, education, psychology, linguistics and sociology.
Applications of Clustering
 Medical image databases
 Pattern recognition
 Spatial data analysis
 Detecting spatial clusters, or supporting other spatial mining tasks
 Image processing
 Economic science
 WWW
Requirements of Clustering in Data Mining

 The following are typical requirements of clustering in data mining:

 Scalability.
 Ability to deal with different types of attributes.
 Discovery of clusters with arbitrary shape.
 Ability to deal with noisy data.
 Minimal requirements for domain knowledge to
determine input parameters.
 Insensitivity to the order of input records.
 High dimensionality.
 Constraint-based clustering.
 Interpretability and usability.
TYPES OF DATA IN CLUSTER ANALYSIS

 Types of data in cluster analysis are:

 Interval-scaled variables. Example: salary, height.

 Binary variables. Example: gender (Male/Female), has_cancer (True/False).

 Nominal (categorical) variables. Example: religion (Christian, Muslim, Buddhist, Hindu, etc.).

 Ordinal variables. Example: military rank (soldier, sergeant, captain, etc.).

 Ratio-scaled variables. Example: population growth (1, 10, 100, 1000, ...).

 Variables of mixed types.

CATEGORISATION OF MAJOR CLUSTERING METHODS

 Clustering methods can be divided into two main groups:
 Hierarchical methods
 Partitioning methods

 There are three additional main categories:
 Density-based methods
 Model-based methods
 Grid-based methods
Partitioning Methods
Definition: Given a database of n objects, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k <= n.

It classifies the data into k groups that satisfy the following requirements:
1. Each group must contain at least one object.
2. Each object must belong to exactly one group.

Partitioning algorithms:
K-means
K-medoids
K-Means Partitioning
 Each cluster is represented by the mean value of the objects in the cluster. Hence, it is known as a centroid-based technique.

Working method:
 First, it randomly selects k of the objects, each of which initially represents a cluster mean.

 Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean.

 Then, it computes the new mean for each cluster.

 This process is repeated until the criterion function converges.
K-Means Clustering
The k-means algorithm is implemented as follows:
Input: number of clusters k, and a database of n objects
Output: a set of k clusters that minimizes the squared error

Choose k objects as the initial cluster centers
Repeat
    (Re)assign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster
    Update the cluster means
Until no change
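
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `k_means`, the `seed` and `max_iter` parameters, and the choice of squared Euclidean distance as the similarity measure are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Return (centers, labels) after running the assign/update loop."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # choose k objects as initial centers
    labels = None
    for _ in range(max_iter):
        # (Re)assign each object to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:         # until no change
            break
        labels = new_labels
        # Update the cluster means; an empty cluster keeps its old center.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = k_means(points, k=2)
print(labels)
```

On this toy data the first three points and the last three points end up in separate clusters. Which cluster receives index 0 depends on the random initial choice.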

K-Means Clustering Method
[Figure: k-means example with K=2, both axes running from 0 to 10. Arbitrarily choose K objects as initial cluster centers, assign each object to the most similar center, update the cluster means, and reassign; updating and reassigning are repeated.]

K-Means Method

Strength: Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
Comment: Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
Weaknesses:
Applicable only when the mean is defined, which is a problem for categorical data
The number of clusters, k, must be specified in advance
Unable to handle noisy data and outliers
Not suitable for discovering clusters with non-convex shapes

Variations of the K-Means Method
A few variants of k-means differ in:
 Selection of the initial k means
 Dissimilarity calculations
 Strategies to calculate cluster means
Handling categorical data: k-modes
 Replacing means of clusters with modes
 Using new dissimilarity measures to deal with categorical objects
 A mixture of categorical and numerical data: k-prototype method
Expectation Maximization
 Assigns objects to clusters based on the probability of membership
Scalability of k-means
 Regions of data are identified as compressible, discardable, or to be maintained in main memory
 Clustering features
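
As a rough illustration of the k-modes idea listed above, the two ingredients that replace the mean and the numeric distance can be written as follows. The simple matching dissimilarity used here is one common choice and is an assumption of this sketch. The function names are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(members):
    """Take the most frequent category of each attribute as the cluster 'center'."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


members = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(members)
print(mode)                                           # ('red', 'small')
print(matching_dissimilarity(mode, ("blue", "small")))  # 1
```

Plugging these two functions into the k-means loop in place of the mean and the distance gives a clustering procedure for categorical data.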

Problem of the K-Means Method
The k-means algorithm is sensitive to outliers
 An object with an extremely large value may substantially distort the distribution of the data.

K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster.
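
A small one-dimensional example makes the difference concrete. The data values are made up for illustration, and the medoid is taken here as the member with the smallest total absolute distance to all other members.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def medoid(values):
    """Return the member with the smallest total distance to all members."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


values = [1, 2, 3, 4, 100]   # 100 is an outlier
print(mean(values))          # 22.0
print(medoid(values))        # 3
```

The single outlier drags the mean far away from the bulk of the data, while the medoid stays at an actual, centrally located object.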
K-Medoids Clustering Method
PAM (Partitioning Around Medoids)
 Starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering
 All pairs are analyzed for replacement
 PAM works effectively for small data sets, but does not scale well for large data sets

CLARA
CLARANS

K-Medoids
Input: k, and a database of n objects
Output: A set of k clusters
Method:
 Arbitrarily choose k objects as the initial medoids
 Repeat
    Assign each remaining object to the cluster with the nearest medoid
    Randomly select a non-medoid object o_random
    Compute the total cost S of swapping a medoid o_j with o_random
    If S < 0, swap o_j with o_random to form the new set of k medoids
 Until no change
Working principle: Minimize the sum of the dissimilarities between each object and its corresponding reference point. That is, an absolute-error criterion is used:
E = \sum_{j=1}^{k} \sum_{p \in C_j} |p - o_j|
where p is an object in cluster C_j and o_j is the medoid of C_j.
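
The swap-based search can be sketched as below. To make "until no change" well defined, this sketch tries every (medoid, non-medoid) pair in turn instead of drawing o_random at random, which is closer to PAM. The function names, the Manhattan distance, and taking the first k points as initial medoids are assumptions of the example.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids, dist):
    """Absolute-error criterion E: each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def k_medoids(points, k, dist=manhattan):
    """Return (clusters keyed by medoid, final value of E)."""
    medoids = list(points[:k])                 # arbitrary initial medoids
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:                            # until no swap lowers E
        improved = False
        for j in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:j] + [candidate] + medoids[j + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:                # swap cost S = cost - best < 0
                    medoids, best, improved = trial, cost, True
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
    return clusters, best


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
clusters, error = k_medoids(points, k=2)
print(sorted(clusters), error)   # [(1, 1), (8, 8)] 4
```

Every accepted swap strictly lowers E, so the loop must terminate. As with k-means, the result may be a local rather than a global optimum.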

K-Medoids
 Case 1: p currently belongs to medoid o_j. If o_j is replaced by o_random as a medoid and p is closest to one of the other medoids o_i (i ≠ j), then p is reassigned to o_i.
 Case 2: p currently belongs to medoid o_j. If o_j is replaced by o_random as a medoid and p is closest to o_random, then p is reassigned to o_random.

 Case 3: p currently belongs to medoid o_i (i ≠ j). If o_j is replaced by o_random as a medoid and p is still closest to o_i, then the assignment does not change.
 Case 4: p currently belongs to medoid o_i (i ≠ j). If o_j is replaced by o_random as a medoid and p is closest to o_random, then p is reassigned to o_random.

K-Medoids

 After each reassignment, the difference in absolute error E is calculated. The total cost of swapping is the sum of the costs incurred by all non-medoid objects.
 If the total cost is negative, o_j is replaced with o_random, as E will be reduced.

K-medoids Algorithm

Hierarchical Methods
Agglomerative hierarchical clustering:
Each object initially represents a cluster of its own. Then clusters are successively merged until the desired cluster structure is obtained.

Divisive hierarchical clustering:
All objects initially belong to one cluster. Then the cluster is divided into sub-clusters, which are successively divided into their own sub-clusters. This process continues until the desired cluster structure is obtained.
Hierarchical Methods

 Hierarchical clustering methods can be further divided according to the manner in which the similarity measure is calculated:

 Single-link clustering (also called the connectedness, the minimum method or the nearest neighbour method)

 Complete-link clustering (also called the diameter, the maximum method or the furthest neighbour method)

 Average-link clustering (also called the group-average method or UPGMA)
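
The three linkage criteria differ only in how the distance between two clusters is derived from the pairwise distances of their members: the minimum, the maximum, or the average. A minimal sketch, assuming Euclidean distance and hypothetical function names:

```python
import math
from itertools import product


def single_link(a, b):
    """Distance between the closest pair of members (minimum method)."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Distance between the farthest pair of members (maximum method)."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_link(a, b):
    """Average distance over all pairs of members."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


a = [(0, 0), (1, 0)]
b = [(3, 0), (5, 0)]
print(single_link(a, b))    # 2.0
print(complete_link(a, b))  # 5.0
print(average_link(a, b))   # 3.5
```

An agglomerative method would repeatedly merge the two clusters for which the chosen linkage distance is smallest.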
Density-Based Clustering Methods

 These methods are used to discover clusters with arbitrary shape. There are three methods:

 DBSCAN: grows clusters according to a density-based connectivity analysis.

 OPTICS: produces a cluster ordering obtained from a wide range of parameter settings.

 DENCLUE: clusters objects based on a set of density distribution functions.
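
To give a feel for how DBSCAN grows a cluster through density-based connectivity, here is a compact sketch. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a core point) follow the usual description of the algorithm but do not appear in the slides. The brute-force neighbour search and the label -1 for noise are choices made for this example.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                  # noise for now; may become a border point
            continue
        cluster += 1                        # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:   # j is also a core point: keep growing
                queue.extend(reachable)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))   # [0, 0, 0, 1, 1, 1, -1]
```

Because clusters are grown from neighbour to neighbour, they can take any shape, and isolated points such as (50, 50) are reported as noise.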
Grid-Based Clustering
 It uses a multiresolution grid data structure.
 It quantizes the object space into a finite number of cells that form a grid structure.
 All clustering operations are performed on that grid structure.

 Approaches to grid-based clustering:

 STING: explores statistical information stored in the grid cells.

 WaveCluster: clusters objects using a wavelet transform method.

 CLIQUE: represents a grid- and density-based approach for clustering in high-dimensional data space.
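
The quantization step common to these approaches can be illustrated as follows. This is only a sketch of mapping objects to cells and counting them. The names `grid_cell_counts`, `cell_size` and `min_count` are hypothetical, and the density threshold is an assumption rather than part of any specific algorithm named above.

```python
import math
from collections import Counter


def grid_cell_counts(points, cell_size):
    """Map each point to the integer index of its grid cell and count points per cell."""
    return Counter(tuple(math.floor(x / cell_size) for x in p) for p in points)


points = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.1), (3.2, 3.9)]
counts = grid_cell_counts(points, cell_size=1.0)

min_count = 2   # assumed threshold for calling a cell "dense"
dense_cells = [cell for cell, n in counts.items() if n >= min_count]
print(dense_cells)   # [(0, 0)]
```

Subsequent clustering work then operates on the cells and their counts, so its cost depends on the number of cells instead of the number of objects.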
STING: STatistical Information Grid

 In this technique, the spatial area is divided into rectangular cells.

 Each cell at a high level is partitioned to form a number of cells at the next lower level.

 Statistical information regarding the attributes in each grid cell is precomputed and stored.

 These statistical parameters are useful for query processing.
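
One way to picture the precomputed statistics is to store a small summary per bottom-level cell and derive each higher-level cell from its children. The particular statistics kept here (count, mean, min, max), the dictionary layout and the function names are assumptions for illustration. Cells are assumed to be non-empty.

```python
def cell_stats(values):
    """Precompute summary statistics for one non-empty bottom-level grid cell."""
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


def parent_stats(children):
    """Derive a higher-level cell's statistics from its child cells."""
    count = sum(c["count"] for c in children)
    return {
        "count": count,
        "mean": sum(c["mean"] * c["count"] for c in children) / count,
        "min": min(c["min"] for c in children),
        "max": max(c["max"] for c in children),
    }


low_level = [cell_stats([1, 2, 3]), cell_stats([10, 20])]
top = parent_stats(low_level)
print(top["count"], top["min"], top["max"])   # 5 1 20
```

A query can then be answered from the stored summaries, descending to lower levels only for the cells that look relevant.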
WaveCluster

 It is a multiresolution clustering algorithm.

 It first summarizes the data by imposing a multidimensional grid structure onto the data space.

 It then uses a wavelet transformation to transform the original feature space and find dense regions in the transformed space.
