APEX INSTITUTE OF TECHNOLOGY
COMPUTER SCIENCE &
ENGINEERING
Bachelor of Engineering (Computer Science)
Generative and Discriminative Analysis, Gaussian
Discriminant Analysis (GDA), Naïve Bayes Analysis
20CSF-286
Prof. (Dr.) Paras Chawla (E5653)
Unit 1 : Machine Learning
Outline
• Naïve Bayes Analysis.
• Generative Models
• Discriminative Models
• Gaussian Discriminant Analysis (GDA)
Course Objectives
S. No. Objectives
1 Understand the concept of learning in computer
science.
2 Compare and contrast different paradigms for
learning (supervised, unsupervised, etc.)
3 Design experiments to evaluate and compare
different machine learning techniques on real-world
problems.
Course Outcomes
S. No. Outcomes
CO1 Understand the various key paradigms of Machine
Learning
CO2 Familiar with the mathematical and statistical
techniques used in Machine Learning
CO3 Implement a wide variety of learning algorithms
including well-studied methods for classification,
regression and clustering.
CO4 Analyze methods to evaluate learning models
generated from data.
CO5 Ability to evaluate machine learning models for
solving practical problems.
Bayesian Classifier
• A statistical classifier
Performs probabilistic prediction, i.e., predicts class membership probabilities
• Foundation
Based on Bayes’ Theorem.
• Assumptions
The classes are mutually exclusive and exhaustive.
The attributes are independent given the class.
• Called “Naïve” classifier because of these assumptions.
Empirically proven to be useful.
Scales very well.
Air-Traffic Data
Days      Season  Fog     Rain    Class
Weekday   Spring  None    None    On Time
Weekday   Winter  None    Slight  On Time
Weekday   Winter  None    None    On Time
Holiday   Winter  High    Slight  Late
Saturday  Summer  Normal  None    On Time
Weekday   Autumn  Normal  None    Very Late
Holiday   Summer  High    Slight  On Time
Sunday    Summer  Normal  None    On Time
Weekday   Winter  High    Heavy   Very Late
Weekday   Summer  None    Slight  On Time
Saturday  Spring  High    Heavy   Cancelled
Weekday   Summer  High    Slight  On Time
Weekday   Winter  Normal  None    Late
Weekday   Summer  High    None    On Time
Weekday   Winter  Normal  Heavy   Very Late
Saturday  Autumn  High    Slight  On Time
Weekday   Autumn  None    Heavy   On Time
Holiday   Spring  Normal  Slight  On Time
Weekday   Spring  Normal  None    On Time
Weekday   Spring  Normal  Heavy   On Time
Air-Traffic Data
• In this database, there are four attributes
A = [ Day, Season, Fog, Rain] with 20 tuples.
• The categories of classes are:
C= [On Time, Late, Very Late, Cancelled]
• Given this knowledge of the data and classes, we are to find the most likely classification
for any unseen instance, for example:
Weekday Winter High None ???
• A classification technique should map this tuple to the most probable class.
Bayesian Classifier
• In many applications, the relationship between the attributes set and the class variable is
non-deterministic.
• In other words, a test instance cannot be assigned to a class label with certainty.
• In such a situation, the classification can be achieved probabilistically.
• The Bayesian classifier is an approach for modelling probabilistic relationships between
the attribute set and the class variable.
• More precisely, the Bayesian classifier uses Bayes’ Theorem of Probability for classification.
• Before discussing the Bayesian classifier, we should have a quick look at the
Theory of Probability and then Bayes’ Theorem.
Bayes’ Theorem of Probability
Simple Probability
Definition: Simple Probability
If there are n elementary events associated with a random experiment and m
of them are favorable to an event A, then the probability of occurrence of A is
$P(A) = \frac{m}{n}$
Simple Probability
• Suppose, A and B are any two events and P(A), P(B) denote the probabilities that the events A
and B will occur, respectively.
• Mutually Exclusive Events:
Two events are mutually exclusive, if the occurrence of one precludes the occurrence of the other.
Example: Tossing a coin (two events)
Rolling a die (six events)
Can you give an example of two events that are not mutually exclusive?
Hint: Tossing two identical coins; weather (sunny, foggy, warm)
Simple Probability
• Independent events: Two events are independent if the occurrence of one does not alter the
probability of occurrence of the other.
Example: Tossing a coin and rolling a die together.
(How many events are here?)
Can you give an example where an event is dependent on one or more other event(s)?
Hint: Receiving a message (A) through a communication channel (B)
over a computer (C); rain and dating.
Joint Probability
Definition : Joint Probability
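If P(A) and P(B) are the probabilities of two events, then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
If A and B are mutually exclusive, then $P(A \cap B) = 0$
If A and B are independent events, then $P(A \cap B) = P(A) \cdot P(B)$
Thus, for mutually exclusive events, $P(A \cup B) = P(A) + P(B)$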
Conditional Probability
Definition 8.2: Conditional Probability
If events are dependent, then their probability is expressed by conditional
probability. The probability that A occurs given that B has occurred is denoted by $P(A|B)$.
Suppose A and B are two events associated with a random experiment. The
probability of A under the condition that B has already occurred, with $P(B) \neq 0$, is given by
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
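The definitions of joint and conditional probability can be checked by enumerating a small sample space. The sketch below assumes a single fair die; the events `A` (even roll) and `B` (roll greater than 3) are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Sample space of one fair six-sided die (all outcomes equally likely).
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) = m / n: favorable outcomes over all elementary outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}   # hypothetical event: the roll is even
B = {4, 5, 6}   # hypothetical event: the roll is greater than 3

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)          # joint probability P(A and B)
p_a_or_b = prob(A | B)           # probability of the union P(A or B)
p_a_given_b = p_a_and_b / p_b    # conditional probability P(A|B)

# Inclusion-exclusion agrees with direct enumeration of the union.
assert p_a_or_b == p_a + p_b - p_a_and_b
print(p_a_and_b, p_a_or_b, p_a_given_b)  # 1/3 2/3 2/3
```

Here P(A|B) = 2/3 differs from P(A) = 1/2, so these two events are dependent.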
Conditional Probability
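$P(A \cap B) = P(A) \cdot P(B|A)$, if $P(A) \neq 0$
or
$P(A \cap B) = P(B) \cdot P(A|B)$, if $P(B) \neq 0$
For three events A, B and C:
$P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)$
For n events A1, A2, …, An, if all events are mutually independent:
$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \cdot P(A_2) \cdots P(A_n)$
Note:
$P(A|B) = 0$ if events are mutually exclusive
$P(A|B) = P(A)$ if A and B are independent
$P(A|B) \cdot P(B) = P(B|A) \cdot P(A)$ otherwise, since $P(A \cap B) = P(B \cap A)$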
Conditional Probability
Generalization of Conditional Probability:
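$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A) \cdot P(A)}{P(B)}$
∵ $P(A \cap B) = P(B|A) \cdot P(A) = P(A|B) \cdot P(B)$
By the law of total probability: $P(B) = P(B|A) \cdot P(A) + P(B|\bar{A}) \cdot P(\bar{A})$, where $\bar{A}$ denotes the complement of A.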
Conditional Probability
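In general, for n mutually exclusive and exhaustive events $A_1, A_2, \dots, A_n$:
$P(A_i|B) = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{j=1}^{n} P(B|A_j) \cdot P(A_j)}$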
Total Probability
Definition 8.3: Total Probability
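Let $E_1, E_2, \dots, E_n$ be n mutually exclusive and exhaustive events associated with a random
experiment. If A is any event which occurs with $E_1$ or $E_2$ or … or $E_n$, then
$P(A) = P(E_1) \cdot P(A|E_1) + P(E_2) \cdot P(A|E_2) + \dots + P(E_n) \cdot P(A|E_n)$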
Total Probability: An Example
Example 8.3
A bag contains 4 red and 3 black balls. A second bag contains 2 red and 4 black balls. One bag is
selected at random. From the selected bag, one ball is drawn. What is the probability that the ball
drawn is red?
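This problem can be answered using the concept of Total Probability.
$E_1$ = Selecting bag I
$E_2$ = Selecting bag II
A = Drawing the red ball
Thus, $P(A) = P(E_1) \cdot P(A|E_1) + P(E_2) \cdot P(A|E_2) = \frac{1}{2} \cdot \frac{4}{7} + \frac{1}{2} \cdot \frac{2}{6} = \frac{19}{42} \approx 0.452$
where $P(A|E_1) = 4/7$ = Probability of drawing a red ball when the first bag has been chosen
and $P(A|E_2) = 2/6$ = Probability of drawing a red ball when the second bag has been chosen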
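The two-bag example can be reproduced with exact fractions. This is a minimal sketch; the helper name `total_probability` is an illustrative assumption, and each bag is taken to be chosen with probability 1/2 as the example states.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum_i P(E_i) * P(A | E_i) over a partition E_1..E_n."""
    assert sum(priors) == 1, "the events E_i must be exhaustive"
    return sum(p * l for p, l in zip(priors, likelihoods))

priors = [Fraction(1, 2), Fraction(1, 2)]          # P(E1), P(E2): bag chosen at random
red_given_bag = [Fraction(4, 7), Fraction(2, 6)]   # P(A|E1), P(A|E2)

p_red = total_probability(priors, red_given_bag)
print(p_red, round(float(p_red), 3))  # 19/42 0.452
```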
Reverse Probability
Example 8.3:
A bag (Bag I) contains 4 red and 3 black balls. A second bag (Bag II) contains 2 red and 4 black
balls. One bag is selected at random and one ball is drawn from it. The ball is found to be red.
What is the probability that the ball was drawn from Bag I?
Here,
$E_1$ = Selecting bag I
$E_2$ = Selecting bag II
A = Drawing the red ball
We are to determine $P(E_1|A)$. Such a problem can be solved using Bayes' theorem of probability.
Bayes’ Theorem
Theorem 8.4: Bayes’ Theorem
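Let $E_1, E_2, \dots, E_n$ be n mutually exclusive and exhaustive events associated with a random
experiment. If A is any event which occurs with $E_1$ or $E_2$ or … or $E_n$, then
$P(E_i|A) = \frac{P(E_i) \cdot P(A|E_i)}{\sum_{j=1}^{n} P(E_j) \cdot P(A|E_j)}$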
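Applied to the reverse-probability question about the two bags (given that the drawn ball is red, which bag did it come from?), Bayes' theorem can be sketched as follows. The function name `bayes_posteriors` is an illustrative assumption.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return P(E_i | A) for every E_i, given P(E_i) and P(A | E_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(E_i) * P(A|E_i)
    evidence = sum(joint)                                  # P(A), by total probability
    return [j / evidence for j in joint]

priors = [Fraction(1, 2), Fraction(1, 2)]          # P(E1), P(E2)
red_given_bag = [Fraction(4, 7), Fraction(2, 6)]   # P(A|E1), P(A|E2)

posteriors = bayes_posteriors(priors, red_given_bag)
print(posteriors)  # [Fraction(12, 19), Fraction(7, 19)]
```

So P(E1|A) = 12/19 ≈ 0.63: observing a red ball raises the probability of Bag I above its prior of 1/2.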
Prior and Posterior Probabilities
P(A) and P(B) are called prior probabilities.
P(A|B) and P(B|A) are called posterior probabilities.
Example 8.6: Prior versus Posterior Probabilities
[Table: ten paired observations of X and Y; the Y column reads A, A, B, A, B, A, B, B, B, A]
This table shows that the event Y has two outcomes, namely A and B,
and depends on another event X with several outcomes.
Case 1: Suppose we do not have any information about the event X.
Then, from the given sample space, we can calculate P(Y = A) = 5/10 = 0.5
Case 2: Now suppose we want to calculate a conditional probability of the form P(X = x | Y = A);
for the outcome x considered in this example, it is 0.4.
The latter is the conditional or posterior probability, whereas the
former is the prior probability.
Naïve Bayesian Classifier
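Suppose Y is a class variable and X = {$X_1, X_2, \dots, X_n$} is a set of attributes,
with each instance of X labelled by an instance of Y.
INPUT (X)                      CLASS (Y)
$X_1$   $X_2$   …   $X_n$      Y
…       …       …   …          …
…       …       …   …          …
The classification problem can then be expressed as the posterior probability
$P(Y = y_i \mid (X_1 = x_1) \text{ AND } (X_2 = x_2) \text{ AND } \dots (X_n = x_n))$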
Naïve Bayesian Classifier
The Naïve Bayesian classifier calculates this posterior probability using Bayes’ theorem, which
is as follows.
From Bayes’ theorem on conditional probability, we have
$P(Y|X) = \frac{P(X|Y) \cdot P(Y)}{P(X)}$
where,
$P(X) = \sum_{i=1}^{k} P(X|Y = y_i) \cdot P(Y = y_i)$
Note:
$P(X)$ is called the evidence (also the total probability) and it is a constant for a given instance.
The posterior probability P(Y|X) is therefore
proportional to $P(X|Y) \cdot P(Y)$, where P(X|Y) is the class-conditional probability.
Thus, P(Y|X) can be taken as a measure of Y given X:
$P(Y|X) \propto P(X|Y) \cdot P(Y)$
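The "naïve" assumption stated earlier (attributes are independent given the class) is what makes P(X|Y) easy to estimate. Written out, with $x_j$ denoting the observed value of attribute $X_j$ (notation assumed here for illustration), the class-conditional probability factorizes and the posterior becomes proportional to a simple product:

```latex
P(X = x \mid Y = y) = \prod_{j=1}^{n} P(X_j = x_j \mid Y = y)
\quad\Longrightarrow\quad
P(Y = y \mid X = x) \;\propto\; P(Y = y) \prod_{j=1}^{n} P(X_j = x_j \mid Y = y)
```

Each factor can be estimated by counting attribute values within a class, so no joint table over all attribute combinations is needed.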
Naïve Bayesian Classifier
• Suppose, for a given instance of X, say x = ($X_1 = x_1$) and … and ($X_n = x_n$).
• There are any two posterior probabilities, namely $P(Y = y_i | X = x)$ and $P(Y = y_j | X = x)$.
• If $P(Y = y_i | X = x) > P(Y = y_j | X = x)$, then we say that $y_i$ is stronger than $y_j$ for the instance X = x.
• The strongest $y_i$ is the classification for the instance X = x.
Naïve Bayesian Classifier
Example: With reference to the Air-Traffic Data set given earlier, let us
tabulate the class-conditional probabilities of every attribute value and the prior probability of every class.
                                       Class
Attribute  Value     On Time        Late          Very Late     Cancelled
Day        Weekday   9/14 = 0.64    1/2 = 0.5     3/3 = 1       0/1 = 0
           Saturday  2/14 = 0.14    0/2 = 0       0/3 = 0       1/1 = 1
           Sunday    1/14 = 0.07    0/2 = 0       0/3 = 0       0/1 = 0
           Holiday   2/14 = 0.14    1/2 = 0.5     0/3 = 0       0/1 = 0
Season     Spring    4/14 = 0.29    0/2 = 0       0/3 = 0       1/1 = 1
           Summer    6/14 = 0.43    0/2 = 0       0/3 = 0       0/1 = 0
           Autumn    2/14 = 0.14    0/2 = 0       1/3 = 0.33    0/1 = 0
           Winter    2/14 = 0.14    2/2 = 1       2/3 = 0.67    0/1 = 0
Fog        None      5/14 = 0.36    0/2 = 0       0/3 = 0       0/1 = 0
           High      4/14 = 0.29    1/2 = 0.5     1/3 = 0.33    1/1 = 1
           Normal    5/14 = 0.36    1/2 = 0.5     2/3 = 0.67    0/1 = 0
Rain       None      6/14 = 0.43    1/2 = 0.5     1/3 = 0.33    0/1 = 0
           Slight    6/14 = 0.43    1/2 = 0.5     0/3 = 0       0/1 = 0
           Heavy     2/14 = 0.14    0/2 = 0       2/3 = 0.67    1/1 = 1
Prior Probability    14/20 = 0.70   2/20 = 0.10   3/20 = 0.15   1/20 = 0.05
Naïve Bayesian Classifier
Instance:
Weekday Winter High Heavy ???
Case 1: Class = On Time : (14/20) × (9/14) × (2/14) × (4/14) × (2/14) ≈ 0.0026
Case 2: Class = Late : (2/20) × (1/2) × (2/2) × (1/2) × (0/2) = 0.0000
Case 3: Class = Very Late : (3/20) × (3/3) × (2/3) × (1/3) × (2/3) ≈ 0.0222
Case 4: Class = Cancelled : (1/20) × (0/1) × (0/1) × (1/1) × (1/1) = 0.0000
Case 3 is the strongest; hence the classification is Very Late.
Naïve Bayesian Classifier
Algorithm: Naïve Bayesian Classification
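Input: Given a set of k mutually exclusive and exhaustive classes C = {$c_1, c_2, \dots, c_k$}, which
have prior probabilities P(C1), P(C2),….. P(Ck).
There is an n-attribute set A = {$A_1, A_2, \dots, A_n$}, which for a given instance has values $A_1 = a_1$, $A_2 = a_2$, ….., $A_n = a_n$.
Step: For each $c_i \in C$, calculate the class-conditional scores, i = 1,2,…..,k:
$p_i = P(C_i) \times \prod_{j=1}^{n} P(A_j = a_j \mid C_i)$
$p_x = \max\{p_1, p_2, \dots, p_k\}$
Output: $C_x$ is the classification
Note: $\sum_i p_i \neq 1$ in general, because the $p_i$ are not probabilities but values proportional to the posterior probabilities.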
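The algorithm above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not a reference one: the function names `train` and `scores` are hypothetical, probabilities are estimated by plain counting over the 20 air-traffic tuples listed earlier, and no smoothing is applied, so an attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter
from fractions import Fraction

# The 20 air-traffic tuples: Day, Season, Fog, Rain, Class.
DATA = [
    "Weekday,Spring,None,None,On Time", "Weekday,Winter,None,Slight,On Time",
    "Weekday,Winter,None,None,On Time", "Holiday,Winter,High,Slight,Late",
    "Saturday,Summer,Normal,None,On Time", "Weekday,Autumn,Normal,None,Very Late",
    "Holiday,Summer,High,Slight,On Time", "Sunday,Summer,Normal,None,On Time",
    "Weekday,Winter,High,Heavy,Very Late", "Weekday,Summer,None,Slight,On Time",
    "Saturday,Spring,High,Heavy,Cancelled", "Weekday,Summer,High,Slight,On Time",
    "Weekday,Winter,Normal,None,Late", "Weekday,Summer,High,None,On Time",
    "Weekday,Winter,Normal,Heavy,Very Late", "Saturday,Autumn,High,Slight,On Time",
    "Weekday,Autumn,None,Heavy,On Time", "Holiday,Spring,Normal,Slight,On Time",
    "Weekday,Spring,Normal,None,On Time", "Weekday,Spring,Normal,Heavy,On Time",
]
rows = [tuple(line.split(",")) for line in DATA]

def train(rows):
    """Count classes and (attribute index, value, class) co-occurrences."""
    class_counts = Counter(row[-1] for row in rows)
    cond_counts = Counter()
    for *attrs, cls in rows:
        for j, value in enumerate(attrs):
            cond_counts[(j, value, cls)] += 1
    return class_counts, cond_counts

def scores(instance, class_counts, cond_counts):
    """p_i = P(C_i) * prod_j P(A_j = a_j | C_i) for every class C_i."""
    n = sum(class_counts.values())
    result = {}
    for cls, count in class_counts.items():
        p = Fraction(count, n)                                # prior P(C_i)
        for j, value in enumerate(instance):
            p *= Fraction(cond_counts[(j, value, cls)], count)  # P(A_j = a_j | C_i)
        result[cls] = p
    return result

class_counts, cond_counts = train(rows)
s = scores(("Weekday", "Winter", "High", "Heavy"), class_counts, cond_counts)
prediction = max(s, key=s.get)   # class with the largest score p_x
print(prediction)  # Very Late
```

Because the scores are only proportional to the posteriors, they can be divided by their sum if actual probabilities are needed.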
Discriminative & Generative Models
• Machine learning models can be classified into discriminative and generative models.
• A discriminative model models the decision boundary between the classes, whereas a generative model explicitly
models the actual distribution of each class.
Discriminative & Generative Models
• The discriminative model is used particularly for supervised machine learning. Also called a
conditional model, it learns the boundaries between classes or labels in a dataset.
• The ultimate goal of discriminative models is to separate one class from another.
• Types of discriminative models in machine learning include: Logistic Regression, Support Vector
Machine, Decision Tree, Random Forest.
• Generative models are a class of statistical models that can generate new data instances. These models are
also used in unsupervised machine learning to perform tasks such as probability and likelihood estimation,
modelling data points, and distinguishing between classes using these probabilities.
• Generative models estimate the joint probability p(x, y) and rely on Bayes' theorem to obtain p(y|x).
• Examples of generative models are Naïve Bayes (and Bayesian networks generally), Hidden Markov
models, and Linear Discriminant Analysis (LDA), which is also used as a dimensionality reduction technique.
Discriminative & Generative Models
A Generative Model learns the joint probability distribution p(x,y). It
predicts the conditional probability with the help of Bayes Theorem. A
Discriminative model learns the conditional probability distribution p(y|x).
Both of these models are generally used in supervised learning problems.
Note:
Joint Probability
Joint probability is the likelihood of more than one event occurring at the same time P(A and B).
Conditional Probability
The conditional probability of an event B is the probability that the event will occur given the knowledge
that an event A has already occurred. It is denoted by P(B|A).
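The link between the two model families can be written in one line. A generative model that has learned p(x, y) (equivalently p(x|y) and p(y)) recovers the conditional distribution that a discriminative model learns directly; the summation index $y'$ over all class labels is notation assumed here:

```latex
p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}
```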
Discriminative Vs Generative Models
• Discriminative models have the advantage of being more robust to outliers, unlike the
generative models.
• However, one major drawback is a misclassification problem, i.e., wrongly classifying a
data point.
• Another key difference between these two types of models is that while a generative
model focuses on explaining how the data was generated, a discriminative model focuses
on predicting labels of the data.
Gaussian Discriminant Analysis
Gaussian Discriminant Analysis is a Generative Learning Algorithm and
in order to capture the distribution of each class, it tries to fit a Gaussian
Distribution to every class of the data separately.
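A minimal sketch of this idea, restricted to a single real-valued feature so that only the standard library is needed. The toy data, the function names `fit_gda` and `predict`, and the use of a separate variance per class are assumptions made for illustration; with several features, each class would instead get a mean vector and a covariance matrix.

```python
import math
from collections import defaultdict

def fit_gda(xs, ys):
    """Fit one Gaussian per class: returns {label: (prior, mean, variance)}."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    params = {}
    for label, values in groups.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # maximum-likelihood variance
        params[label] = (len(values) / len(xs), mean, var)
    return params

def log_gaussian(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(params, x):
    """Pick the class maximizing log P(y) + log p(x | y) (Bayes' rule, evidence dropped)."""
    return max(params, key=lambda c: math.log(params[c][0])
               + log_gaussian(x, params[c][1], params[c][2]))

# Hypothetical toy data: class 0 clusters near 1.0, class 1 near 3.0.
xs = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

params = fit_gda(xs, ys)
print(predict(params, 1.05), predict(params, 2.8))  # 0 1
```

The model is generative in the sense used above: it estimates p(x|y) and p(y) for each class and only then applies Bayes' theorem to classify.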
Classification
Performance Measures
Confusion Matrix
Focus on the predictive capability of a model
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL  Class=Yes    a           b
CLASS   Class=No     c           d
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
Accuracy Metric
Ratio of true positives and true negatives to the sum of true positives, true negatives,
false negatives, and false positives
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL  Class=Yes    a (TP)      b (FN)
CLASS   Class=No     c (FP)      d (TN)
$\text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$
Limitation of Accuracy
Consider a 2-class problem
Number of Class 0 examples = 9990
Number of Class 1 examples = 10
If the model predicts every example to be class 0, accuracy is 9990/10000 = 99.9 %
Hence, accuracy is misleading because the model does not detect any class 1 example
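The numbers in this example can be checked directly. The sketch below treats class 1 as the positive class and also reports recall, TP / (TP + FN), which is not defined in the slides and is added here only as one common way to expose the problem.

```python
def accuracy(tp, fn, fp, tn):
    """(TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + fn + fp + tn)

def recall(tp, fn):
    """Fraction of actual positives that the model detects."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# A model that predicts class 0 for all 10000 examples (class 1 = positive).
tp, fn = 0, 10      # all 10 class 1 examples are missed
fp, tn = 0, 9990    # all 9990 class 0 examples are correct

print(accuracy(tp, fn, fp, tn))  # 0.999
print(recall(tp, fn))            # 0.0
```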
Cost Matrix
Cost matrix takes weights into account
                     PREDICTED CLASS
C(i|j)               Class=Yes     Class=No
ACTUAL  Class=Yes    C(Yes|Yes)    C(No|Yes)
CLASS   Class=No     C(Yes|No)     C(No|No)
C(i|j): Cost of classifying a class j example as class i
$\text{Weighted Accuracy} = \frac{w_1 a + w_4 d}{w_1 a + w_2 b + w_3 c + w_4 d}$
Computing Cost of Classification
Cost Matrix
                  PREDICTED CLASS
C(i|j)            +        -
ACTUAL   +        -1       100
CLASS    -        1        0
Model M1
                  PREDICTED CLASS
                  +        -
ACTUAL   +        150      40
CLASS    -        60       250
Accuracy = 80%, Cost = 3910
Model M2
                  PREDICTED CLASS
                  +        -
ACTUAL   +        250      45
CLASS    -        5        200
Accuracy = 90%, Cost = 4255
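The accuracy and cost of both models follow from multiplying each confusion-matrix cell by the matching cost-matrix cell. A short sketch, with dictionary keys of the form (actual, predicted) chosen here as an illustrative representation:

```python
# Cost matrix from the slide, keyed by (actual class, predicted class).
cost = {("+", "+"): -1, ("+", "-"): 100, ("-", "+"): 1, ("-", "-"): 0}

# Confusion matrices of the two models, same key convention.
m1 = {("+", "+"): 150, ("+", "-"): 40, ("-", "+"): 60, ("-", "-"): 250}
m2 = {("+", "+"): 250, ("+", "-"): 45, ("-", "+"): 5, ("-", "-"): 200}

def accuracy(confusion):
    """Correct predictions (diagonal cells) over all predictions."""
    correct = confusion[("+", "+")] + confusion[("-", "-")]
    return correct / sum(confusion.values())

def total_cost(confusion, cost):
    """Sum of count * cost over every (actual, predicted) cell."""
    return sum(confusion[cell] * cost[cell] for cell in confusion)

print(accuracy(m1), total_cost(m1, cost))  # 0.8 3910
print(accuracy(m2), total_cost(m2, cost))  # 0.9 4255
```

M2 is more accurate but also more costly, because each of its 45 false negatives is charged 100.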
Cost vs. Accuracy
Count
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL  Class=Yes    a           b
CLASS   Class=No     c           d
Cost
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL  Class=Yes    p           q
CLASS   Class=No     q           p
Cost is a linear function of accuracy if
1. C(Yes|No) = C(No|Yes) = q
2. C(Yes|Yes) = C(No|No) = p
N = a + b + c + d
Accuracy = (a + d)/N
Cost = p (a + d) + q (b + c)
     = p (a + d) + q (N - a - d)
     = q N - (q - p)(a + d)
     = N [q - (q - p) × Accuracy]
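The identity Cost = N [q - (q - p) × Accuracy] can be verified numerically. The counts below reuse model M1 from the previous example; the symmetric costs p = 1 and q = 5 are hypothetical values chosen only to satisfy the two conditions.

```python
from fractions import Fraction

def cost_direct(a, b, c, d, p, q):
    """Cost summed cell by cell: p for correct cells, q for errors."""
    return p * (a + d) + q * (b + c)

def cost_from_accuracy(a, b, c, d, p, q):
    """Same cost computed from accuracy alone: N * [q - (q - p) * Accuracy]."""
    n = a + b + c + d
    acc = Fraction(a + d, n)
    return n * (q - (q - p) * acc)

a, b, c, d = 150, 40, 60, 250   # counts of model M1
p, q = 1, 5                     # hypothetical symmetric costs

print(cost_direct(a, b, c, d, p, q), cost_from_accuracy(a, b, c, d, p, q))  # 900 900
```

With q > p, a higher accuracy always means a lower cost, so the two measures rank models identically under these conditions.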