Machine Learning
1
UNIT - III
UNIT III 9 Hrs
Unsupervised Learning: Clustering: Introduction to clustering, K-
means clustering, K-Mode Clustering, Hierarchal clustering, Anomaly
Detection, Association: Apriori Algorithm, F-P Growth Algorithm, Eclat
Algorithm, Ensemble method: Bootstrap Aggregation, Boosting,
Gradient Boosting Machines, Stacking.
2
Unsupervised Learning
Introduction
• In machine learning, the problem of unsupervised learning is that of trying to find
hidden structure in unlabeled data.
• Since the examples given to the learner are unlabeled, there is no error or reward
signal to evaluate the goodness of a potential solution. This distinguishes
unsupervised from supervised learning.
• Unsupervised learning is defined as the task performed by algorithms that learn
from a training set of unlabeled or unannotated examples, using the features of the
inputs to categorize them according to some geometric or statistical criteria.
• Unsupervised learning encompasses many techniques that seek to summarize and
explain key features or structures of the data. 3
Unsupervised Learning
Introduction
• Many methods employed in unsupervised learning are based on data mining methods
used to preprocess data.
• Most unsupervised learning techniques can be summarized as those that tackle the
following four groups of problems.
• Clustering:
• Has as a goal to partition the set of examples into groups.
• Dimensionality reduction:
• Aims to reduce the dimensionality of the data. Here, we encounter techniques such as Principal
Component Analysis (PCA), independent component analysis, and nonnegative matrix factorization.
• Outlier detection:
• Has as a purpose to find unusual events (e.g., a malfunction), that distinguish part of the data from
the rest according to certain criteria.
• Novelty detection:
• Deals with cases when changes occur in the data (e.g., in streaming data). 4
Unsupervised Learning
Clustering
• The most common unsupervised task is clustering.
• Clustering is a process of grouping similar objects together; i.e., to partition
unlabeled examples into disjoint subsets of clusters, such that:
• Examples within a cluster are similar (in this case, we speak of high intraclass
similarity).
• Examples in different clusters are different (in this case, we speak of low
interclass similarity).
• When we denote data as similar and dissimilar, we should define a measure for this
similarity/dissimilarity.
• Note:
• Grouping similar data together can help in discovering new categories in an
unsupervised manner, even when no sample category labels are provided.
5
Unsupervised Learning
Types of Clustering
6
K-Means Clustering Algorithm
What is K-Means Clustering ?
• K-Means clustering is the most popular unsupervised learning algorithm.
• It is used when we have unlabeled data which is data without defined categories or
groups.
• The algorithm follows an easy or simple way to classify a given data set through a
certain number of clusters.
• K-Means algorithm works iteratively to assign each data point to one of K groups
based on the features that are provided. Data points are clustered based on feature
similarity.
7
K-Means Clustering Algorithm
What is K-Means Clustering ?
• K-Means Clustering groups the unlabeled dataset into different clusters. Here K
defines the number of pre-defined clusters that need to be created in the process,
as if K=2, there will be two clusters, and for K=3, there will be three clusters, and so
on.
“It is an iterative algorithm that divides the unlabeled dataset into k different clusters
in such a way that each dataset belongs only one group that has similar properties”.
• It allows us to cluster the data into different groups and a convenient way to
discover the categories of groups in the unlabeled dataset on its own without the
need for any training.
• It is a centroid-based algorithm, where each cluster is associated with a centroid.
The main aim of this algorithm is to minimize the sum of distances between the
data point and their corresponding clusters. 8
K-Means Clustering Algorithm
What is K-Means Clustering ?
• The algorithm takes the unlabeled dataset as input, divides the dataset into k-
number of clusters, and repeats the process until it does not find the best clusters.
The value of k should be predetermined in this algorithm.
• The k-means clustering algorithm mainly performs two tasks:
• Determines the best value for K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. Those data points which are near to the
particular k-center, create a cluster.
• Hence each cluster has datapoints with some commonalities, and it is
away from other clusters.
9
K-Means Clustering Algorithm
How does the K-Means Algorithm Work
10
K-Means Clustering Algorithm
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select random K points or centroids. (It can be other from the input
dataset).
Step-3: Assign each data point to their closest centroid, which will form the
predefined K clusters.
Step-4: Calculate the variance and place a new centroid of each cluster.
Step-5: Repeat the third steps, which means reassign each datapoint to the
new closest centroid of each cluster.
Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
Step-7: The model is ready.
11
K-Means Clustering Algorithm
How to choose the value of "K number of clusters" in K-means Clustering?
• The performance of the K-means clustering algorithm depends upon highly
efficient clusters that it forms. But choosing the optimal number of clusters is a big
task. There are some different ways to find the optimal number of clusters, but
here we are discussing the most appropriate method to find the number of
clusters or value of K. The method is given below:
12
K-Means Clustering Algorithm
Elbow Method
• The Elbow method is one of the most popular ways to find the optimal number of
clusters. This method uses the concept of WCSS value. WCSS stands for Within
Cluster Sum of Squares, which defines the total variations within a cluster. The
formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS= ∑Pi in Cluster1 distance(Pi C1)2 +∑Pi in Cluster2distance(Pi C2)2+∑Pi in CLuster3 distance(Pi C3)2
• In the above formula of WCSS,
• ∑Pi in Cluster1 distance(Pi C1)2: It is the sum of the square of the distances between
each data point and its centroid within a cluster1 and the same for the other two
terms.
13
K-Means Clustering Algorithm
Elbow Method
• To measure the distance between data points and centroid, we can use any
method such as Euclidean distance or Manhattan distance.
• To find the optimal value of clusters, the elbow method follows the below
steps:
• It executes the K-means clustering on a given dataset for different K
values (ranges from 1-10).
• For each value of K, calculates the WCSS value.
• Plots a curve between calculated WCSS values and the number of clusters
K.
• The sharp point of bend or a point of the plot looks like an arm, then that
point is considered as the best value of K.
14
K-Means Clustering Algorithm
Elbow Method
• Since the graph shows the sharp bend, which looks like an elbow, hence it is known
as the elbow method. The graph for the elbow method looks like the below image:
15
K-Means Clustering Algorithm
Applications of clustering
• K-Means clustering is the most common unsupervised machine
learning algorithm.
• It is widely used for many applications which include-
• Image segmentation
• Customer segmentation
• Species clustering
• Anomaly detection
• Clustering languages
16
Hierarchical Clustering Algorithm
• Hierarchical clustering is another unsupervised machine learning
algorithm, which is used to group the unlabeled datasets into a cluster
and also known as hierarchical cluster analysis or HCA.
• In this algorithm, we develop the hierarchy of clusters in the form of a
tree, and this tree-shaped structure is known as the dendrogram.
• Sometimes the results of K-means clustering and hierarchical
clustering may look similar, but they both differ depending on how they
work. As there is no requirement to predetermine the number of
clusters as we did in the K-Means algorithm.
17
Hierarchical Clustering Algorithm
• The hierarchical clustering technique has two approaches:
[Link]: Agglomerative is a bottom-up approach, in which the
algorithm starts with taking all data points as single clusters and
merging them until one cluster is left.
[Link]: Divisive algorithm is the reverse of the agglomerative
algorithm as it is a top-down approach.
18
Hierarchical Clustering Algorithm
Why hierarchical clustering?
• As we already have other clustering algorithms such as K-Means
Clustering, then why we need hierarchical clustering?
• We have seen in the K-means clustering that there are some challenges with
this algorithm, which are a predetermined number of clusters, and it always
tries to create the clusters of the same size.
• To solve these two challenges, we can opt for the hierarchical clustering
algorithm because, in this algorithm, we don't need to have knowledge about
the predefined number of clusters.
19
Hierarchical Clustering Algorithm
Agglomerative Hierarchical clustering
• The agglomerative hierarchical clustering algorithm is a popular
example of HCA.
• To group the datasets into clusters, it follows the bottom-up
approach. It means, this algorithm considers each dataset as a single
cluster at the beginning, and then start combining the closest pair of
clusters together.
• It does this until all the clusters are merged into a single cluster that
contains all the datasets.
• This hierarchy of clusters is represented in the form of 20 the
Hierarchical Clustering Algorithm
How the Agglomerative Hierarchical clustering Work?
The working of the AHC algorithm can be explained using the below
steps:
Step-1: Create each data point as a single cluster. Let's say there are N
data points, so the number of clusters will also be N.
21
Hierarchical Clustering Algorithm
How the Agglomerative Hierarchical clustering Work?
Step-2: Take two closest data points or clusters and merge them to form
one cluster. So, there will now be N-1 clusters..
22
Hierarchical Clustering Algorithm
How the Agglomerative Hierarchical clustering Work?
Step-3: Again, take the two closest clusters and merge them together to
form one cluster. There will be N-2 clusters.
23
Hierarchical Clustering Algorithm
How the Agglomerative Hierarchical clustering Work?
Step-4: Repeat Step 3 until only one cluster left. So, we will get the
following clusters. Consider the below images:
24
Hierarchical Clustering Algorithm
How the Agglomerative Hierarchical clustering Work?
• Step-5: Once all the clusters are combined into one big cluster,
develop the dendrogram to divide the clusters as per the problem.
Note:
To better understand hierarchical clustering, it is advised to have a look
on k-means clustering
25
Hierarchical Clustering Algorithm
Measure for the distance between two clusters
• As we have seen, the closest distance between the two clusters is
crucial for the hierarchical clustering.
• There are various ways to calculate the distance between two clusters,
and these ways decide the rule for clustering.
• These measures are called Linkage methods. Some of the popular
linkage methods are given below:
26
Hierarchical Clustering Algorithm
1. Single Linkage:
It is the Shortest Distance between the closest points of the clusters.
Consider the below image:
27
Hierarchical Clustering Algorithm
2. Complete Linkage:
• It is the farthest distance between the two points of two different
clusters. It is one of the popular linkage methods as it forms tighter
clusters than single-linkage.
28
Hierarchical Clustering Algorithm
3. Average Linkage:
• It is the linkage method in which the distance between each pair of
datasets is added up and then divided by the total number of datasets
to calculate the average distance between two clusters.
• It is also one of the most popular linkage methods.
29
Hierarchical Clustering Algorithm
4. Centroid Linkage:
• It is the linkage method in which the distance between the centroid of
the clusters is calculated. Consider the below image:
• From the above-given approaches, we can apply any of them
according to the type of problem or business requirement. 30
Hierarchical Clustering Algorithm
Woking of Dendrogram in Hierarchical clustering
• The dendrogram is a tree-like structure that is mainly used to store
each step as a memory that the HC algorithm performs.
• In the dendrogram plot, the Y-axis shows the Euclidean distances
between the data points, and the x-axis shows all the data points of the
given dataset. The working of the dendrogram can be explained using
the below diagram:
31
Hierarchical Clustering Algorithm
Woking of Dendrogram in Hierarchical clustering
• In the above diagram, the left part is showing how clusters are created
in agglomerative clustering, and the right part is showing the
corresponding dendrogram. 32
Hierarchical Clustering Algorithm
Woking of Dendrogram in Hierarchical clustering
• As we have discussed above, firstly, the datapoints P2 and P3
combine together and form a cluster, correspondingly a dendrogram is
created, which connects P2 and P3 with a rectangular shape. The
hight is decided according to the Euclidean distance between the data
points.
• In the next step, P5 and P6 form a cluster, and the corresponding
dendrogram is created. It is higher than of previous, as the Euclidean
distance between P5 and P6 is a little bit greater than the P2 and P3.
• Again, two new dendrograms are created that combine P1, P2, and P3
in one dendrogram, and P4, P5, and P6, in another dendrogram.
• At last, the final dendrogram is created that combines all the data
points together.
33
K-means Vs Hierarchical
34
K-means Vs Hierarchical
35
K-Mode Clustering Algorithm
• K-Modes clustering is one of the unsupervised Machine Learning
algorithms that is used to cluster categorical variables.
• Unlike traditional clustering algorithms that use distance metrics, K-
Modes works by identifying the modes or most frequent values within
each cluster to determine its centroid.
• K-Modes is ideal for clustering categorical data such as customer
demographics, market segments, or survey responses.
• It is a powerful tool for data analysts and scientists to gain insights into
their data and make informed decisions.
36
K-Mode Clustering Algorithm
K-Modes vs K-Means
• K-Means uses mathematical measures (distance) to cluster continuous
data. The lesser the distance, the more similar our data points are.
Centroids are updated by Means.
• But for categorical data points, we cannot calculate the distance. So
we go for K-Modes algorithm. It uses the dissimilarities(total
mismatches) between the data points. The lesser the dissimilarities the
more similar our data points. It uses Modes instead of means.
37
K-Mode Clustering Algorithm
How Does the K-Modes Algorithm Work?
• Unlike K-Means clustering methods, we need to upfront specify the K.
[Link] K observations at random and use them as leaders/clusters.
[Link] the dissimilarities and assign each observation to its closest
cluster.
[Link] new modes for the clusters.
[Link] 2–3 steps until there are is no re-assignment required
38
K-Mode Clustering Algorithm
Example
Imagine we have a dataset that has the information about hair color, eye color, and
skin color of persons. We aim to group them based on the available information
we have the sample data now. Let us proceed by defining the number
39
of
clusters(K)=3
K-Mode Clustering Algorithm
Example
Step 1: Pick K observations at random and use them as leaders/clusters
I am choosing P1, P7, P8 as leaders/clusters
40
K-Mode Clustering Algorithm
Example
Step 2: Calculate the dissimilarities(no. of mismatches) and assign each observation
to its closest cluster
• Iteratively compare the cluster data points to each of the observations. Similar data
points give 0, dissimilar data points give 1.
41
K-Mode Clustering Algorithm
Example
Step 2: Calculate the dissimilarities(no. of mismatches) and assign each observation
to its closest cluster
• Iteratively compare the cluster data points to each of the observations. Similar data
points give 0, dissimilar data points give 1.
• Comparing leader/Cluster P1 to
the observation
• P1 gives 0 dissimilarities.
42
K-Mode Clustering Algorithm
Example
• Comparing leader/cluster P1 to
the observation P2 gives 3(1+1+1)
dissimilarities.
43
K-Mode Clustering Algorithm
Example
• Likewise, calculate all the
dissimilarities and put them in
a matrix as shown below and
assign the observations to
their closest cluster(cluster
that has the least
dissimilarity)
44
K-Mode Clustering Algorithm
Example
• After step 2, the observations P1, P2, P5 are assigned to cluster 1; P3, P7 are
assigned to Cluster 2; and P4, P6, P8 are assigned to cluster 3.
• Note:
• If all the clusters have the same dissimilarity with an observation, assign to any
cluster randomly.
• In our case, the observation P2 has 3 dissimilarities with all the leaders. I
randomly assigned it to Cluster 1.
45
K-Mode Clustering Algorithm
Example
Step 3: Define new modes for the clusters
• Mode is simply the most observed value.
• Mark the observations according to the cluster they belong to. Observations of
Cluster 1 are marked in Yellow, Cluster 2 are marked in Brick red, and Cluster 3
are marked in Purple.
46
• Considering one cluster at a time, for each feature, look for the Mode and update
K-Mode Clustering Algorithm
Example
• Cluster 1 observations(P1, P2, P5) has brunette as the most observed hair color,
amber as the most observed eye color, and fair as the most observed skin color.
Note: If you observe the same occurrence of values, take the mode randomly. In our
case, the observations of Cluster 3(P3, P7) have one occurrence of brown, fair skin
color. I randomly chose brown as the mode.
• Below are our new leaders after the update.
• Obtained new leaders; Repeat steps 2–4 47
K-Mode Clustering Algorithm
Example
• After obtaining the new leaders, again calculate the dissimilarities between the
observations and the newly obtained leaders.
48
K-Mode Clustering Algorithm
Example
• Comparing Cluster 1 to the observation P1 gives 1 dissimilarity.
49
K-Mode Clustering Algorithm
Example
• Comparing Cluster 1 to the observation P2 gives 2 dissimilarities.
• Likewise, calculate all the dissimilarities and put them in a matrix. Assign each
observation to its closest cluster..
• The observations P1, P2, P5 are assigned to Cluster 1; P3, P7 are assigned to
Cluster 2; and P4, P6, P8 are assigned to Cluster 3.
50
• We stop here as we see there is no change in the assignment of observations.
Association Rule Learning
• Association rule learning is a type of unsupervised learning technique
that checks for the dependency of one data item on another data item
and maps accordingly so that it can be more profitable.
• It tries to find some interesting relations or associations among the
variables of dataset.
• It is based on different rules to discover the interesting relations
between variables in the database.
• The association rule learning is one of the very important concepts of
machine learning, and it is employed in Market Basket analysis, Web
usage mining, continuous production, etc.
51
Association Rule Learning
• Here market basket analysis is a technique used by the various big retailer
to discover the associations between items.
• We can understand it by taking an example of a supermarket, as in a
supermarket, all products that are purchased together are put together.
• For example, if a customer buys bread, he most likely can also buy butter,
eggs, or milk, so these products are stored within a shelf or mostly nearby.
52
Association Rule Learning
• Consider the below diagram:
• Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
53
Association Rule Learning
How does Association Rule Learning work?
• Association rule learning works on the concept of If and Else Statement, such as if
A then B.
• Here the If element is called antecedent, and then statement is called as
Consequent. These types of relationships where we can find out some association
or relation between two items is known as single cardinality.
• It is all about creating rules, and if the number of items increases, then cardinality
also increases accordingly. So, to measure the associations between thousands of
data items, there are several metrics.
54
Association Rule Learning
How does Association Rule Learning work?
• These metrics are given below:
• Support
• Confidence
• Lift
Support
• Support is the frequency of A or how frequently an item appears in the dataset. It is
defined as the fraction of the transaction T that contains the itemset X. If there are
X datasets, then for transactions T, it can be written as:
55
Association Rule Learning
How does Association Rule Learning work?
Confidence
• Confidence indicates how often the rule has been found to be true. Or how often
the items X and Y occur together in the dataset when the occurrence of X is
already given. It is the ratio of the transaction that contains X and Y to the number
of records that contain X.
Lift
• It is the strength of any rule, which can be defined as below formula:
56
Association Rule Learning
How does Association Rule Learning work?
• It is the ratio of the observed support measure and expected support if X and Y are
independent of each other.
• It has three possible values:
• If Lift= 1: The probability of occurrence of antecedent and consequent is
independent of each other.
• Lift>1: It determines the degree to which the two item-sets are dependent to
each other.
• Lift<1: It tells us that one item is a substitute for other items, which means one
item has a negative effect on another.
57
Association Rule Learning
Types of Association Rule Learning
Association rule learning can be divided into three algorithms:
1. Apriori Algorithm
• This algorithm uses frequent datasets to generate association rules. It is designed
to work on the databases that contain transactions.
• This algorithm uses a breadth-first search and Hash Tree to calculate the itemset
efficiently.
• It is mainly used for market basket analysis and helps to understand the products
that can be bought together.
• It can also be used in the healthcare field to find drug reactions for patients. 58
Association Rule Learning
Types of Association Rule Learning
2. F-P Growth Algorithm
• The F-P growth algorithm stands for Frequent Pattern, and it is the improved
version of the Apriori Algorithm.
• It represents the database in the form of a tree structure that is known as a
frequent pattern or tree.
• The purpose of this frequent tree is to extract the most frequent patterns.
59
Association Rule Learning
Types of Association Rule Learning
3. Eclat Algorithm
• Eclat algorithm stands for Equivalence Class Transformation.
• This algorithm uses a depth-first search technique to find frequent itemsets in a
transaction database.
• It performs faster execution than Apriori Algorithm.
60
Association Rule Learning
Applications of Association Rule Learning
• It has various applications in machine learning and data mining. Below are some
popular applications of association rule learning:
• Market Basket Analysis: It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to
determine the association between items.
• Medical Diagnosis: With the help of association rules, patients can be cured
easily, as it helps in identifying the probability of illness for a particular disease.
• Protein Sequence: The association rules help in determining the synthesis of
artificial Proteins.
• It is also used for the Catalog Design and Loss-leader Analysis and many more
other applications. 61
Unit – III (Functions) 62