0% found this document useful (0 votes)

20 views12 pages

Unit 5 (Part 2)

Uploaded by

gowri283u

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views12 pages

Unit 5 (Part 2)

Uploaded by

gowri283u

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Unsupervised Learning Techniques: Clustering, Hierarchical Clustering, Partitional

clustering: K-Means Clustering

Clustering:
Clustering is an unsupervised machine learning technique used to group similar data points
together without using labelled data. It helps discover hidden patterns or natural groupings in
datasets by placing similar data points into the same cluster.

The clustering technique can be widely used in various tasks. Some most common uses of
this technique are:

 Market Segmentation
 Statistical data analysis
 Social network analysis
 Image segmentation
 Anomaly detection

Types of Clustering
1. Hard Clustering
Hard clustering assigns each data point to exactly one cluster. A data point cannot belong to
multiple clusters, making the grouping clear and easy to interpret.
 Each data point belongs to only one cluster
 No overlap between clusters
If customers are divided into two clusters, each customer belongs completely to either Cluster
1 or Cluster 2. A customer cannot belong to both clusters at the same time.

2. Soft Clustering
Soft clustering allows a data point to belong to multiple clusters with different probabilities.
Instead of assigning a strict cluster, it gives a degree of membership to each cluster.
Example
A data point may belong 70% to Cluster 1 and 30% to Cluster 2, indicating that it shares
characteristics with both groups
Clustering Methods/ Types of Clustering
Clustering methods can be classified on the basis of how they form clusters
1. Centroid based Clustering
Centroid based clustering groups data points around central points called centroids. Each
cluster is represented by the average of its points and data points are assigned to the nearest
centroid.
Algorithms:
 K-means
 K-medoids
2. Density based Clustering
The density-based clustering method tends to connect the highly-dense areas into clusters,
and then the algorithm forms the arbitrarily shaped distributions as long as it can connect the
dense region. This algorithm achieves this by identifying different clusters in the dataset and
connecting the areas of high densities into clusters with each other. The algorithm then
divides the dense areas in data space from each other into sparser areas.

Distribution model-based clustering:

Within the distribution model-based clustering method, the algorithm divides the data on the
basis of the probability of how a dataset belongs to a particular distribution. The algorithm
then performs grouping by assuming some distributions commonly Gaussian Distribution.
Hierarchical Clustering
We can make use of Hierarchical clustering as an alternative for the partitioned clustering as
there is no requirement of pre-specifying the number of clusters that we need to specify
beforehand.
In this technique, the algorithm divides the dataset into clusters to create a tree-like structure,
which is referred to as a dendrogram. The algorithm can then select the observations or any
number of clusters by cutting the tree at the correct level. The most widely-known example of
this method is the Agglomerative Hierarchical algorithm.

Fuzzy Clustering
Fuzzy clustering is a type of soft clustering method in which a data object may belong to one
or more than one group or cluster. Each dataset consists of a set of membership coefficients,
which tends to depend on the degree of membership to be in a cluster. The Fuzzy C-means
algorithm is a common example of this type of clustering; it is sometimes even referred to as
the Fuzzy k-means algorithm.
Hierarchical Clustering:
Hierarchical clustering is an unsupervised learning algorithm used to group similar data
points into clusters. It builds a multilevel hierarchy of clusters by either merging smaller
clusters into larger ones (agglomerative) or dividing a large cluster into smaller ones
(divisive). This results in a tree-like structure known as a dendrogram.
A dendrogram is a visual representation that illustrates the arrangement of clusters and their
relationships with each other. The height of the branches in a dendrogram represents the
distance or dissimilarity at which clusters merge. Lower heights indicate clusters joined at
smaller distances, thus representing higher similarity.
Hierarchical clustering is particularly useful when the number of clusters is
not known beforehand.

Dendrograms:

A dendrogram is a tree-like structure that visualizes the process of

hierarchical clustering. Each level of the tree represents a merge or split
operation, and the height of the branches represents the distance (or
dissimilarity) at which clusters were joined.
it's important because:
 Allows visual inspection of clusters at different levels.
 By "cutting" the dendrogram at a certain height, you can choose the
number of clusters that best fits the data
Linkage criteria

Linkage criteria determine how distances between clusters are calculated

during the merging process.

The most common linkage functions are as follows:

1. Single linkage defines the distance between two clusters as the smallest pairwise
distance between elements from each cluster.

2. Complete linkage defines the distance between two clusters as the largest pointwise
distance.

3. Average linkage defines the cluster distance as the average pointwise distance.

4. Centroid linkage defines the cluster distance as the point distance between the cluster
means.
q

Types of Hierarchical Clustering

The two main types of hierarchical clustering are agglomerative (bottom-

up) and divisive (top-down).
1. Agglomerative (Bottom-Up)
Agglomerative clustering starts with each data point as an individual cluster and iteratively
merges the closest pair of clusters until only one cluster remains or until a stopping condition
is met (like a desired number of clusters).
This method is also called bottom-up because it starts from the bottom (individual data
points) and builds up to the top (final cluster).
Step-by-step guide:
1. Treat each data point as a singleton cluster.
2. Compute the pairwise distances between all clusters.
3. Merge the two clusters with the smallest distance.
4. Update the distance matrix to reflect the new set of clusters.
5. Repeat steps 2–4 until all points belong to a single cluster.
Divisive (Top-Down):
Divisive clustering starts with all data points in a single cluster and recursively splits them
into smaller clusters. The process continues until each data point is in its own individual
cluster.
It is also known as a top-down approach, as it starts at the top (single cluster) and breaks it
down into smaller clusters.
Step-by-step guide:
1. Start with one large cluster containing all data points.
2. Split the cluster into two that are as different as possible.
3. Reapply the splitting process recursively to each resulting cluster.

K-Means:
K-Means is an unsupervised learning algorithm used to find groups, or clusters, within data.
Here, the data is unlabelled, l. The algorithm’s goal is to discover hidden patterns in the data

so that similar points are grouped together. K mean is a clustering algorithm that is used to
classify unlabeled data into groups/clusters based on similarity.
Here K defines the number of pre-defined clusters that need to be created in the process if
K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of distances between the data point and their
corresponding clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into k-number of
clusters, and repeats the process until it does not find the best clusters. The value of k should
be predetermined in this algorithm.
K-Mean Algorithm :
1. Plot Data
2. Select the number K to decide the number of clusters.
3. Select random K points or centroids. (It can be other from the input dataset).
4. Assign each data point to their closest centroid, which will form the predefined K
clusters.
5. Repeat the fourth step, which means reassigning each data point to the new closest
centroid of each cluster.

Example: We have a dataset of six points in 2D space:

We will cluster these points into K=2 clusters.

Step-by-Step Solution:
Step 1: Initialization
● Choose K=2
● Randomly select two initial centroids. Let's choose points A (1, 2) and D (8, 8) as the
initial centroids.
Step 2: Assignment Step
Calculate the distance between each point and the centroids. We'll use Euclidean distance.
Distance from each point to Centroid 1 (A: (1, 2)):
assign each point to the nearest centroid:
● Points A, B, and C are closest to Centroid 1.
● Points D, E, and F are closest to Centroid 2.
Step 3: Update Step
Calculate the new centroids:
New Centroid 1 (mean of points A, B, C):
Step 4: Repeat Assignment Step
Reassign each point to the nearest centroid:
1. Distance from each point to New Centroid 1 (2, 3):

Assign each point to the nearest centroid:

● Points A, B, and C are closest to Centroid 1.
● Points `D E and F are closest to Centroid 2.
The assignments have not changed, indicating convergence.
Final Clusters:
● Cluster 1: Contains points A, B, C with centroid (2, 3).
● Cluster 2: Contains points D, E, F with centroid (9, 9).

Example-2: cluster the following 8 points into 3 clusters A1(2,10), A2(2,5), A3(8,4),
A4(5,8), A6(7,5),A7(6,4), A8(1,2), A8(4,5)

Clustering Techniques in Machine Learning
No ratings yet
Clustering Techniques in Machine Learning
49 pages
Unsupervised Learning: Clustering Methods
No ratings yet
Unsupervised Learning: Clustering Methods
60 pages
Unsupervised Learning: Clustering Methods
No ratings yet
Unsupervised Learning: Clustering Methods
92 pages
Unsupervised Learning: Clustering Methods
No ratings yet
Unsupervised Learning: Clustering Methods
23 pages
Introduction to Clustering in ML
No ratings yet
Introduction to Clustering in ML
11 pages
Clustering Techniques in Unsupervised Learning
No ratings yet
Clustering Techniques in Unsupervised Learning
45 pages
Unsupervised Learning: Clustering & RL
No ratings yet
Unsupervised Learning: Clustering & RL
13 pages
K-Means Clustering Explained in Python
No ratings yet
K-Means Clustering Explained in Python
28 pages
Understanding Clustering in Machine Learning
No ratings yet
Understanding Clustering in Machine Learning
10 pages
DWDM Unit-Iv
No ratings yet
DWDM Unit-Iv
18 pages
Clustering Techniques in Machine Learning
No ratings yet
Clustering Techniques in Machine Learning
48 pages
Unsupervised Learning in Machine Learning
No ratings yet
Unsupervised Learning in Machine Learning
51 pages
Unsupervised Learning in Machine Learning
No ratings yet
Unsupervised Learning in Machine Learning
99 pages
Unsupervised Learning and Clustering Techniques
No ratings yet
Unsupervised Learning and Clustering Techniques
59 pages
Understanding Clustering Techniques
No ratings yet
Understanding Clustering Techniques
34 pages
Unsupervised Learning Algorithms Explained
No ratings yet
Unsupervised Learning Algorithms Explained
15 pages
Understanding Clustering Techniques in ML
No ratings yet
Understanding Clustering Techniques in ML
34 pages
Unsupervised Learning: Clustering Techniques
No ratings yet
Unsupervised Learning: Clustering Techniques
23 pages
3,4,5 Module AIML 4th ME
No ratings yet
3,4,5 Module AIML 4th ME
126 pages
Clustering Techniques in Machine Learning
No ratings yet
Clustering Techniques in Machine Learning
13 pages
Unit IV Clustering
No ratings yet
Unit IV Clustering
116 pages
Understanding Unsupervised Learning Techniques
No ratings yet
Understanding Unsupervised Learning Techniques
9 pages
Unsupervised Learning: Clustering Techniques
No ratings yet
Unsupervised Learning: Clustering Techniques
46 pages
UNIT
No ratings yet
UNIT
47 pages
Unsupervised Learning Overview and Techniques
No ratings yet
Unsupervised Learning Overview and Techniques
41 pages
Understanding Clustering in Data Analysis
No ratings yet
Understanding Clustering in Data Analysis
16 pages
Unsupervised Learning in Machine Learning
No ratings yet
Unsupervised Learning in Machine Learning
31 pages
Unsupervised Learning in Python: Clustering
No ratings yet
Unsupervised Learning in Python: Clustering
19 pages
Understanding Clustering in Machine Learning
No ratings yet
Understanding Clustering in Machine Learning
9 pages
Understanding Cluster Analysis Techniques
No ratings yet
Understanding Cluster Analysis Techniques
11 pages
Unsupervised Learning: Clustering Explained
No ratings yet
Unsupervised Learning: Clustering Explained
10 pages
Clustering Techniques in Machine Learning
No ratings yet
Clustering Techniques in Machine Learning
91 pages
Clustering Techniques in Data Mining
No ratings yet
Clustering Techniques in Data Mining
91 pages
K-Means Clustering in Unsupervised Learning
No ratings yet
K-Means Clustering in Unsupervised Learning
13 pages
Unsupervised Machine Learning Overview
No ratings yet
Unsupervised Machine Learning Overview
18 pages
ML Mod3
No ratings yet
ML Mod3
95 pages
Unsupervised Learning and Clustering Techniques
No ratings yet
Unsupervised Learning and Clustering Techniques
10 pages
Clustering Techniques in Unsupervised Learning
No ratings yet
Clustering Techniques in Unsupervised Learning
42 pages
Unsupervised Clustering Techniques Explained
No ratings yet
Unsupervised Clustering Techniques Explained
64 pages
Hierarichal Unit 4
No ratings yet
Hierarichal Unit 4
5 pages
Cluster Analysis Basics Explained
No ratings yet
Cluster Analysis Basics Explained
29 pages
Understanding K-Means Clustering
No ratings yet
Understanding K-Means Clustering
24 pages
Unsupervised Learning: Clustering Basics
No ratings yet
Unsupervised Learning: Clustering Basics
20 pages
Understanding Clustering Methods in Data Analysis
No ratings yet
Understanding Clustering Methods in Data Analysis
18 pages
23AD1401 Machine Learning Unit 4
No ratings yet
23AD1401 Machine Learning Unit 4
42 pages
Clustering Techniques in Machine Learning
No ratings yet
Clustering Techniques in Machine Learning
9 pages
Unsupervised Learning: Clustering Techniques
No ratings yet
Unsupervised Learning: Clustering Techniques
38 pages
Unsupervised Learning Techniques Guide
No ratings yet
Unsupervised Learning Techniques Guide
133 pages
Clustering Techniques Explained
No ratings yet
Clustering Techniques Explained
32 pages
Clustering Techniques in Machine Learning
No ratings yet
Clustering Techniques in Machine Learning
34 pages
Decision Trees and Clustering Techniques
No ratings yet
Decision Trees and Clustering Techniques
48 pages
Clustering Methods in Machine Learning
No ratings yet
Clustering Methods in Machine Learning
45 pages
Unsupervised Learning: Clustering Methods
No ratings yet
Unsupervised Learning: Clustering Methods
12 pages
Lecture04 Unsupervised
No ratings yet
Lecture04 Unsupervised
81 pages
Clustering Approaches in Machine Learning
No ratings yet
Clustering Approaches in Machine Learning
44 pages
Unsupervised Learning: Clustering Techniques
No ratings yet
Unsupervised Learning: Clustering Techniques
82 pages
25 - Unsupervised Learning
No ratings yet
25 - Unsupervised Learning
47 pages
Understanding Clustering Techniques in ML
No ratings yet
Understanding Clustering Techniques in ML
26 pages
Understanding Clustering in Unsupervised Learning
No ratings yet
Understanding Clustering in Unsupervised Learning
19 pages
PE I LTR Dated 16.01.2026
No ratings yet
PE I LTR Dated 16.01.2026
2 pages
ABAP Development - Displaying A DMS PDF Document..in WD
100% (1)
ABAP Development - Displaying A DMS PDF Document..in WD
3 pages
Java Generics Explained: Methods & Classes
No ratings yet
Java Generics Explained: Methods & Classes
11 pages
Java Program to Reverse an Array
No ratings yet
Java Program to Reverse an Array
3 pages
04PC Parameter Back-Up and Restore Procedure Ver 1 - 2
100% (1)
04PC Parameter Back-Up and Restore Procedure Ver 1 - 2
32 pages
Python Programming Assignment Guide
No ratings yet
Python Programming Assignment Guide
2 pages
Gpelab A Matlab Toolbox For Computing Stationary Solutions and Dynamics of Gross-Pitaevskii Equations (Gpe)
No ratings yet
Gpelab A Matlab Toolbox For Computing Stationary Solutions and Dynamics of Gross-Pitaevskii Equations (Gpe)
122 pages
BCT (Unit 4)
No ratings yet
BCT (Unit 4)
31 pages
Computer Science Compression Guide
No ratings yet
Computer Science Compression Guide
13 pages
Hexapod Spider Robot Project Overview
No ratings yet
Hexapod Spider Robot Project Overview
15 pages
Project Classes and Types Overview
No ratings yet
Project Classes and Types Overview
4 pages
Binary Code Conversion Techniques
No ratings yet
Binary Code Conversion Techniques
9 pages
ICPC 2024 North America Championship Problems
No ratings yet
ICPC 2024 North America Championship Problems
28 pages
MDM SWAYAM Course Details Configured On ERP For Credit Transfer 2022-26 (2025-26 Passing Out Batch) 13 Feb 2026
No ratings yet
MDM SWAYAM Course Details Configured On ERP For Credit Transfer 2022-26 (2025-26 Passing Out Batch) 13 Feb 2026
2 pages
SLS Admission Declaration AY 2024-25
No ratings yet
SLS Admission Declaration AY 2024-25
2 pages
Problem Solving Techniques in C
100% (5)
Problem Solving Techniques in C
4 pages
Hibernate Annotations: Reference Guide
No ratings yet
Hibernate Annotations: Reference Guide
25 pages
The Hiball Tracker: High-Performance Wide-Area Tracking For Virtual and Augmented Environments
No ratings yet
The Hiball Tracker: High-Performance Wide-Area Tracking For Virtual and Augmented Environments
11 pages
Char vs Varchar in SQL Explained
No ratings yet
Char vs Varchar in SQL Explained
3 pages
PRINCE2 Agile Chapters 1 To 3 PDF
100% (2)
PRINCE2 Agile Chapters 1 To 3 PDF
19 pages
NS2 Simulation: TCP/UDP Scenarios
No ratings yet
NS2 Simulation: TCP/UDP Scenarios
29 pages
Desktop Support Engineer Interview Questions and Answers - ToughNickel
No ratings yet
Desktop Support Engineer Interview Questions and Answers - ToughNickel
5 pages
Project Final Year On Password Management System
25% (4)
Project Final Year On Password Management System
52 pages
CIBIL Process Overview for CRMnext
No ratings yet
CIBIL Process Overview for CRMnext
85 pages
Convert IPv4 Addresses to Binary Lab
No ratings yet
Convert IPv4 Addresses to Binary Lab
5 pages
Beginner DSA Questions in Python
No ratings yet
Beginner DSA Questions in Python
28 pages
SAN Storage Management Solutions Explained
No ratings yet
SAN Storage Management Solutions Explained
25 pages
Binary Tree Construction and Analysis
No ratings yet
Binary Tree Construction and Analysis
15 pages
Samsung M332X Series Firmware Update Guide
No ratings yet
Samsung M332X Series Firmware Update Guide
3 pages
Minimum Spanning Tree
No ratings yet
Minimum Spanning Tree
17 pages

Unit 5 (Part 2)

Uploaded by

Unit 5 (Part 2)

Uploaded by

Unsupervised Learning Techniques: Clustering, Hierarchical Clustering, Partitional

clustering: K-Means Clustering

Distribution model-based clustering:

A dendrogram is a tree-like structure that visualizes the process of

Linkage criteria determine how distances between clusters are calculated

The most common linkage functions are as follows:

Types of Hierarchical Clustering

The two main types of hierarchical clustering are agglomerative (bottom-

Example: We have a dataset of six points in 2D space:

We will cluster these points into K=2 clusters.

Assign each point to the nearest centroid:

You might also like