Unsupervised Learning Techniques: Clustering, Hierarchical Clustering, Partitional
clustering: K-Means Clustering
Clustering:
Clustering is an unsupervised machine learning technique used to group similar data points
together without using labelled data. It helps discover hidden patterns or natural groupings in
datasets by placing similar data points into the same cluster.
The clustering technique can be widely used in various tasks. Some most common uses of
this technique are:
Market Segmentation
Statistical data analysis
Social network analysis
Image segmentation
Anomaly detection
Types of Clustering
1. Hard Clustering
Hard clustering assigns each data point to exactly one cluster. A data point cannot belong to
multiple clusters, making the grouping clear and easy to interpret.
Each data point belongs to only one cluster
No overlap between clusters
If customers are divided into two clusters, each customer belongs completely to either Cluster
1 or Cluster 2. A customer cannot belong to both clusters at the same time.
2. Soft Clustering
Soft clustering allows a data point to belong to multiple clusters with different probabilities.
Instead of assigning a strict cluster, it gives a degree of membership to each cluster.
Example
A data point may belong 70% to Cluster 1 and 30% to Cluster 2, indicating that it shares
characteristics with both groups
Clustering Methods/ Types of Clustering
Clustering methods can be classified on the basis of how they form clusters
1. Centroid based Clustering
Centroid based clustering groups data points around central points called centroids. Each
cluster is represented by the average of its points and data points are assigned to the nearest
centroid.
Algorithms:
K-means
K-medoids
2. Density based Clustering
The density-based clustering method tends to connect the highly-dense areas into clusters,
and then the algorithm forms the arbitrarily shaped distributions as long as it can connect the
dense region. This algorithm achieves this by identifying different clusters in the dataset and
connecting the areas of high densities into clusters with each other. The algorithm then
divides the dense areas in data space from each other into sparser areas.
Distribution model-based clustering:
Within the distribution model-based clustering method, the algorithm divides the data on the
basis of the probability of how a dataset belongs to a particular distribution. The algorithm
then performs grouping by assuming some distributions commonly Gaussian Distribution.
Hierarchical Clustering
We can make use of Hierarchical clustering as an alternative for the partitioned clustering as
there is no requirement of pre-specifying the number of clusters that we need to specify
beforehand.
In this technique, the algorithm divides the dataset into clusters to create a tree-like structure,
which is referred to as a dendrogram. The algorithm can then select the observations or any
number of clusters by cutting the tree at the correct level. The most widely-known example of
this method is the Agglomerative Hierarchical algorithm.
Fuzzy Clustering
Fuzzy clustering is a type of soft clustering method in which a data object may belong to one
or more than one group or cluster. Each dataset consists of a set of membership coefficients,
which tends to depend on the degree of membership to be in a cluster. The Fuzzy C-means
algorithm is a common example of this type of clustering; it is sometimes even referred to as
the Fuzzy k-means algorithm.
Hierarchical Clustering:
Hierarchical clustering is an unsupervised learning algorithm used to group similar data
points into clusters. It builds a multilevel hierarchy of clusters by either merging smaller
clusters into larger ones (agglomerative) or dividing a large cluster into smaller ones
(divisive). This results in a tree-like structure known as a dendrogram.
A dendrogram is a visual representation that illustrates the arrangement of clusters and their
relationships with each other. The height of the branches in a dendrogram represents the
distance or dissimilarity at which clusters merge. Lower heights indicate clusters joined at
smaller distances, thus representing higher similarity.
Hierarchical clustering is particularly useful when the number of clusters is
not known beforehand.
Dendrograms:
A dendrogram is a tree-like structure that visualizes the process of
hierarchical clustering. Each level of the tree represents a merge or split
operation, and the height of the branches represents the distance (or
dissimilarity) at which clusters were joined.
it's important because:
Allows visual inspection of clusters at different levels.
By "cutting" the dendrogram at a certain height, you can choose the
number of clusters that best fits the data
Linkage criteria
Linkage criteria determine how distances between clusters are calculated
during the merging process.
The most common linkage functions are as follows:
1. Single linkage defines the distance between two clusters as the smallest pairwise
distance between elements from each cluster.
2. Complete linkage defines the distance between two clusters as the largest pointwise
distance.
3. Average linkage defines the cluster distance as the average pointwise distance.
4. Centroid linkage defines the cluster distance as the point distance between the cluster
means.
q
Types of Hierarchical Clustering
The two main types of hierarchical clustering are agglomerative (bottom-
up) and divisive (top-down).
1. Agglomerative (Bottom-Up)
Agglomerative clustering starts with each data point as an individual cluster and iteratively
merges the closest pair of clusters until only one cluster remains or until a stopping condition
is met (like a desired number of clusters).
This method is also called bottom-up because it starts from the bottom (individual data
points) and builds up to the top (final cluster).
Step-by-step guide:
1. Treat each data point as a singleton cluster.
2. Compute the pairwise distances between all clusters.
3. Merge the two clusters with the smallest distance.
4. Update the distance matrix to reflect the new set of clusters.
5. Repeat steps 2–4 until all points belong to a single cluster.
Divisive (Top-Down):
Divisive clustering starts with all data points in a single cluster and recursively splits them
into smaller clusters. The process continues until each data point is in its own individual
cluster.
It is also known as a top-down approach, as it starts at the top (single cluster) and breaks it
down into smaller clusters.
Step-by-step guide:
1. Start with one large cluster containing all data points.
2. Split the cluster into two that are as different as possible.
3. Reapply the splitting process recursively to each resulting cluster.
K-Means:
K-Means is an unsupervised learning algorithm used to find groups, or clusters, within data.
Here, the data is unlabelled, l. The algorithm’s goal is to discover hidden patterns in the data
so that similar points are grouped together. K mean is a clustering algorithm that is used to
classify unlabeled data into groups/clusters based on similarity.
Here K defines the number of pre-defined clusters that need to be created in the process if
K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of distances between the data point and their
corresponding clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into k-number of
clusters, and repeats the process until it does not find the best clusters. The value of k should
be predetermined in this algorithm.
K-Mean Algorithm :
1. Plot Data
2. Select the number K to decide the number of clusters.
3. Select random K points or centroids. (It can be other from the input dataset).
4. Assign each data point to their closest centroid, which will form the predefined K
clusters.
5. Repeat the fourth step, which means reassigning each data point to the new closest
centroid of each cluster.
Example: We have a dataset of six points in 2D space:
We will cluster these points into K=2 clusters.
Step-by-Step Solution:
Step 1: Initialization
● Choose K=2
● Randomly select two initial centroids. Let's choose points A (1, 2) and D (8, 8) as the
initial centroids.
Step 2: Assignment Step
Calculate the distance between each point and the centroids. We'll use Euclidean distance.
Distance from each point to Centroid 1 (A: (1, 2)):
assign each point to the nearest centroid:
● Points A, B, and C are closest to Centroid 1.
● Points D, E, and F are closest to Centroid 2.
Step 3: Update Step
Calculate the new centroids:
New Centroid 1 (mean of points A, B, C):
Step 4: Repeat Assignment Step
Reassign each point to the nearest centroid:
1. Distance from each point to New Centroid 1 (2, 3):
Assign each point to the nearest centroid:
● Points A, B, and C are closest to Centroid 1.
● Points `D E and F are closest to Centroid 2.
The assignments have not changed, indicating convergence.
Final Clusters:
● Cluster 1: Contains points A, B, C with centroid (2, 3).
● Cluster 2: Contains points D, E, F with centroid (9, 9).
Example-2: cluster the following 8 points into 3 clusters A1(2,10), A2(2,5), A3(8,4),
A4(5,8), A6(7,5),A7(6,4), A8(1,2), A8(4,5)