Bayesian Learning and Naïve Bayes Classifier


+

Bayesian Learning

Dr. Megha Ummat

+
Elements of Probability

◼ A random experiment is one whose outcome is not predictable with certainty in advance.
◼ The set of all possible outcomes is known as the sample space S.
◼ A sample space is discrete if it consists of a finite (or countably infinite) set of outcomes; otherwise it is continuous.
+
Elements of Probability

◼ Any subset E of S is an event.
◼ Events are sets, and we can talk about their complement, intersection, union, and so forth.
+
Probability

◼ Probability can be interpreted as a frequency.
◼ When an experiment is repeated many times under exactly the same conditions, for any event E, the proportion of trials in which the outcome is in E approaches some constant value.
◼ This constant limiting frequency is the probability of the event, and we denote it as P(E).
+
Bayesian Learning

◼ Bayesian learning is a probabilistic approach to inference.
◼ Given a set of training data T, we are interested in the best hypothesis h from the hypothesis space H.
◼ The best hypothesis can be considered equivalent to the most probable hypothesis.
+
Bayesian Learning

◼ For training data T, if we have any previous knowledge about the probabilities of the various hypotheses in H, we can estimate the probability of the best hypothesis with the aid of Bayes theorem.
+
Bayes Theorem
◼ P(h): prior probability of h; denotes any previous knowledge about the chance that h is correct.
◼ P(T): prior probability of T; the probability that data such as T will be observed.
◼ P(T|h): probability of observing T, given that h holds.
◼ P(h|T): reflects the confidence that h holds after T has been observed. Also known as the posterior probability.

$$P(h \mid T) = \frac{P(T \mid h)\, P(h)}{P(T)}$$
+
Bayes Theorem Example 1
▪ Given:
◼ A doctor knows that meningitis causes stiff neck
50% of the time
◼ Prior probability of any patient having meningitis
is 1/50,000
◼ Prior probability of any patient having stiff neck
is 1/20

◼ If a patient has a stiff neck, what is the probability that he/she has meningitis?

$$P(M \mid S) = \frac{P(S \mid M)\, P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
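
The arithmetic of this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and its parameter names are illustrative choices, not something defined in the slides.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes theorem: P(h|T) = P(T|h) * P(h) / P(T)."""
    return likelihood * prior / evidence


# Meningitis example: P(S|M) = 0.5, P(M) = 1/50000, P(S) = 1/20
p_m_given_s = posterior(likelihood=0.5, prior=1 / 50000, evidence=1 / 20)
print(round(p_m_given_s, 6))  # 0.0002
```

Even though meningitis causes a stiff neck half of the time, the posterior stays small because the prior P(M) is tiny compared with P(S).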
+
Bayes Theorem Example 2
◼ In Denmark, 51% of the adults are males. One adult is randomly selected for a survey involving credit card usage.
◼ The selected survey subject was found smoking a cigar. It is known that 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars.
◼ Find the probability that the selected subject is a male.
+
Bayes Theorem Example 2

◼ Solution

M → Male, M’ → Not a Male
C → Cigar Smoker, C’ → Not a Cigar Smoker

Based on the given information:

P(M) = 0.51, P(M’) = 0.49
P(C|M) = 0.095, P(C|M’) = 0.017

$$P(M \mid C) = \frac{P(C \mid M)\, P(M)}{P(C)}$$
+
Bayes Theorem Example 2

P(C) = P(M) * P(C|M) + P(M’) * P(C|M’)

Given

P(M) = 0.51, P(M’) = 0.49
P(C|M) = 0.095, P(C|M’) = 0.017

$$P(M \mid C) = \frac{P(C \mid M)\, P(M)}{P(C \mid M)\, P(M) + P(C \mid M')\, P(M')} = \frac{0.095 \times 0.51}{0.095 \times 0.51 + 0.017 \times 0.49} = 0.853$$
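
The same calculation, with the evidence P(C) expanded by the law of total probability, can be sketched in Python. The helper name `posterior_binary` is an assumption made for illustration; it covers only the case of a hypothesis and its complement.

```python
def posterior_binary(prior_h: float, lik_h: float, lik_not_h: float) -> float:
    """P(h|e) when the evidence is expanded over h and its complement h'."""
    # Law of total probability: P(e) = P(h) P(e|h) + P(h') P(e|h')
    evidence = prior_h * lik_h + (1 - prior_h) * lik_not_h
    return prior_h * lik_h / evidence


# Cigar example: P(M) = 0.51, P(C|M) = 0.095, P(C|M') = 0.017
p_male_given_cigar = posterior_binary(0.51, 0.095, 0.017)
print(round(p_male_given_cigar, 3))  # 0.853
```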
+
Using Bayes Theorem for
Classification
◼ X → Set of Attributes, Y → Class Variable
◼ Given a record with attributes (X1, X2,…, Xd)
◼ Goal is to predict class Y
◼ The relationship can be captured probabilistically by
using P(Y|X)
◼ Specifically, we want to find the value of Y that
maximizes P(Y| X1, X2,…, Xd )
◼ This conditional probability is also known as posterior
probability for Y, as opposed to prior probability P(Y)
◼ Can we estimate P(Y| X1, X2,…, Xd ) directly from
data?
+
Using Bayes Theorem for
Classification
◼ During the training phase, we need to learn the
posterior probabilities P(Y|X) for every
combination of X and Y based on information
gathered from the training data.
◼ By knowing these probabilities, a test record X’ can be classified by finding the class Y’ that maximizes the posterior probability P(Y’|X’).
◼ Given a test record X and a binary class variable with labels “A” and “B”, we need to compute the posterior probabilities P(A|X) and P(B|X) based on the information available in the training data.
+
Using Bayes Theorem for
Classification
◼ If P(A|X) > P(B|X), then the record is classified as “A”, otherwise as “B”.
◼ Estimating the posterior probabilities accurately for every possible combination of class label and attribute value is a difficult problem because it requires a very large training set, even for a moderate number of attributes.
+
Using Bayes Theorem for
Classification
◼ The Bayes theorem is useful because it allows us to express the posterior probability in terms of the prior probability P(Y), the class-conditional probability P(X|Y), and the evidence P(X).

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
+
Bayesian Classifiers
◼ In many applications, the relationship between the class variable and the attribute set is non-deterministic.
◼ This situation may arise due to the presence of noisy data or of certain confounding factors that affect classification but are not included in the analysis.
◼ E.g.: Predicting heart disease on the basis of a person’s diet and workout frequency. Confounding factors such as heredity, excessive smoking, or alcohol abuse may also cause heart disease, leading to a non-deterministic relationship.
+
Conditional Independence

X is said to be conditionally independent of Y given Z if P(X|Y,Z) = P(X|Z)

Example: Arm length and reading skills

– A young child has shorter arms and more limited reading skills than an adult
– If age is fixed, there is no apparent relationship between arm length and reading skills
– Arm length and reading skills are conditionally independent given age
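
The arm-length example can be illustrated numerically. The sketch below builds a small joint distribution in which arm length (X) and reading skill (Y) each depend only on age group (Z), then checks that P(X|Y,Z) = P(X|Z) while X and Y are still dependent when age is ignored. All probabilities are made-up values chosen for illustration; none of them come from the slides.

```python
from itertools import product

# Hypothetical numbers: Z = age group, X = 1 if long arms, Y = 1 if reads well
p_z = {"child": 0.3, "adult": 0.7}
p_long_arm = {"child": 0.1, "adult": 0.9}     # P(X=1 | Z)
p_reads_well = {"child": 0.2, "adult": 0.95}  # P(Y=1 | Z)

# Joint built as P(Z) P(X|Z) P(Y|Z), i.e. X and Y are independent given Z
joint = {}
for z, x, y in product(p_z, (0, 1), (0, 1)):
    px = p_long_arm[z] if x else 1 - p_long_arm[z]
    py = p_reads_well[z] if y else 1 - p_reads_well[z]
    joint[(z, x, y)] = p_z[z] * px * py


def prob(pred):
    """Probability of the event described by pred(z, x, y)."""
    return sum(p for key, p in joint.items() if pred(*key))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda z, x, y: event(z, x, y) and given(z, x, y)) / prob(given)


# With age fixed, knowing Y does not change the probability of X
x_given_yz = cond(lambda z, x, y: x == 1, lambda z, x, y: y == 1 and z == "child")
x_given_z = cond(lambda z, x, y: x == 1, lambda z, x, y: z == "child")

# With age ignored, X and Y are clearly related
x_given_y = cond(lambda z, x, y: x == 1, lambda z, x, y: y == 1)
x_marginal = prob(lambda z, x, y: x == 1)

print(x_given_yz, x_given_z)
print(x_given_y, x_marginal)
```

The first pair of values agrees (up to floating-point rounding), while the second pair differs noticeably: age explains the apparent link between arm length and reading skill.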
+
Using Bayes Theorem for
Classification
◼ Compute the posterior probability P(Y | X1, X2, …, Xd) using the Bayes theorem:

$$P(Y \mid X_1, X_2, \ldots, X_d) = \frac{P(X_1, X_2, \ldots, X_d \mid Y)\, P(Y)}{P(X_1, X_2, \ldots, X_d)}$$

◼ Since P(X1, X2, …, Xd) is the same for every value of Y, it can be ignored when comparing classes.
◼ Maximize the posterior probability: choose the Y that maximizes P(Y | X1, X2, …, Xd).
◼ This is equivalent to choosing the value of Y that maximizes P(X1, X2, …, Xd | Y) P(Y).
◼ How to estimate P(X1, X2, …, Xd | Y)?
+
Naïve Bayes Classifier
◼ A naïve Bayes classifier estimates the class-conditional probability by assuming that the attributes are conditionally independent, given the class label Y.
◼ Assume independence among attributes Xi when the class is given:
◼ P(X1, X2, …, Xd | Yj) = P(X1 | Yj) P(X2 | Yj) … P(Xd | Yj)
◼ Now we can estimate P(Xi | Yj) for all Xi and Yj combinations from the training data.
◼ A new point is classified to Yj if the product P(Yj) * P(X1 | Yj) * … * P(Xd | Yj) is maximal.
+
Naïve Bayes Classifier

◼ With a conditional independence assumption, instead of computing the class-conditional probability for every combination of X, we only have to estimate the conditional probability of each Xi, given Y.
◼ This approach is more practical because it does not require a very large training set to obtain a good estimate of the probability.
◼ To classify a test record, the naïve Bayes classifier computes the posterior probability for each class Y:

$$P(Y \mid \mathbf{X}) = \frac{P(Y) \prod_{i=1}^{d} P(X_i \mid Y)}{P(\mathbf{X})}$$
+
Example
◼ Given a test record X = (Ease of Use: Easy; Quality: Bad)
◼ Using Bayes theorem with the conditional independence assumption:

P(X|Yes) = P(Ease of Use=Easy|Yes) * P(Quality=Bad|Yes)

P(X|No) = P(Ease of Use=Easy|No) * P(Quality=Bad|No)

Product   Ease of Use   Quality   Satisfied?
P1        Easy          Good      Yes
P2        Moderate      Good      No
P3        Difficult     Bad       No
P4        Difficult     Good      No
P5        Moderate      Bad       Yes
+
Estimate Probabilities from Data
◼ Class: P(Y) = Nc/N, where Nc is the number of training instances in class Y and N is the total number of training instances
◼ E.g., P(No) = 3/5, P(Yes) = 2/5
◼ For attributes: P(Xi | Yk) = |Xik| / Nc
◼ where |Xik| is the number of instances having attribute value Xi and belonging to class Yk, and Nc is the number of instances in class Yk
◼ Examples:
◼ P(Quality=Good|No) = 2/3
◼ P(Ease of Use=Moderate|Yes) = 1/2

Product   Ease of Use   Quality   Satisfied?
P1        Easy          Good      Yes
P2        Moderate      Good      No
P3        Difficult     Bad       No
P4        Difficult     Good      No
P5        Moderate      Bad       Yes
+
Naïve Bayes from Example Data
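◼ Given a test record X = (Ease of Use: Easy; Quality: Bad)

P(X|Yes) = P(Ease of Use=Easy|Yes) * P(Quality=Bad|Yes) = 1/2 * 1/2 = 1/4

P(X|No) = P(Ease of Use=Easy|No) * P(Quality=Bad|No) = 0/3 * 1/3 = 0

As P(X|Yes) * P(Yes) = 1/4 * 2/5 = 0.1 > P(X|No) * P(No) = 0,

therefore P(Yes|X) > P(No|X)
Class Prediction → Yes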
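
The counting procedure behind this prediction can be sketched in Python using the five-row product table. The function name `naive_bayes_scores` and the tuple layout of `records` are illustrative assumptions. The code computes the unnormalized score P(Y) * P(X1|Y) * P(X2|Y) for each class, without any smoothing.

```python
from collections import Counter

# (Ease of Use, Quality, Satisfied?) for products P1..P5
records = [
    ("Easy", "Good", "Yes"),
    ("Moderate", "Good", "No"),
    ("Difficult", "Bad", "No"),
    ("Difficult", "Good", "No"),
    ("Moderate", "Bad", "Yes"),
]


def naive_bayes_scores(records, test):
    """Return P(Y) * prod_i P(X_i | Y) for every class label Y."""
    n = len(records)
    class_counts = Counter(row[-1] for row in records)
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / n  # prior P(Y) = Nc / N
        for i, value in enumerate(test):
            # |Xik|: rows of this class whose i-th attribute equals the test value
            matches = sum(1 for row in records if row[-1] == label and row[i] == value)
            score *= matches / n_c
        scores[label] = score
    return scores


scores = naive_bayes_scores(records, ("Easy", "Bad"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The score for No is exactly zero because no unsatisfied product was easy to use, which is the situation that smoothed estimates are meant to handle.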
+
Characteristics of Naïve Bayes
Classifier
◼ Apart from providing posterior probability estimates, naïve Bayes classifiers attempt to capture the underlying mechanism behind the generation of data instances belonging to every class. Thus, they can be used for predictive as well as descriptive insights.
◼ If the attributes are conditionally independent of each other, naïve Bayes can easily compute class-conditional probabilities even in high-dimensional settings. This makes it a simple and effective classification technique that can be used in diverse problems such as text classification.
+
Characteristics of Naïve Bayes
Classifier
◼ They are robust to isolated noise points because such points are averaged out when estimating conditional probabilities from data.
◼ Naïve Bayes classifiers can handle missing values in the training data by ignoring the affected example during training. They can also handle missing values in a test instance by using only the non-missing attribute values while computing the posterior probabilities.
◼ They are robust to irrelevant attributes.

+
Characteristics of Naïve Bayes
Classifier
◼ Correlated attributes can degrade the performance of naïve Bayes, as the conditional independence assumption no longer holds for such attributes.
◼ In such cases, use other techniques such as Bayesian belief networks.
+
Practice
Record   A   B   C   Class
1        0   0   0   +
2        0   0   1   -
3        0   1   1   -
4        0   1   1   -
5        0   0   1   +
6        1   0   1   +
7        1   0   1   -
8        1   0   1   -
9        1   1   1   +
10       1   0   1   +

a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|-), P(B|-), P(C|-).
b) Use the estimates of the conditional probabilities from the previous question to predict the class label for a test sample (A=0, B=1, C=0) using the naïve Bayes approach.
+
Estimate Probabilities from Data
◼ For continuous attributes:
– Discretization: partition the range into bins
◆ Replace continuous value with bin value
◆ Attribute changed from continuous to ordinal
– Probability density estimation:
◆ Assume attribute follows a normal distribution
◆ Use data to estimate parameters of distribution (e.g., mean and standard deviation)
◆ Once probability distribution is known, use it to estimate the conditional probability P(Xi|Y)
◆ The distribution is estimated by mean and variance
+
Estimate Probabilities from Data

Tid   Refund          Marital Status   Taxable Income   Evade
      (categorical)   (categorical)    (continuous)     (class)
1     Yes             Single           125K             No
2     No              Married          100K             No
3     No              Single           70K              No
4     Yes             Married          120K             No
5     No              Divorced         95K              Yes
6     No              Married          60K              No
7     Yes             Divorced         220K             No
8     No              Single           85K              Yes
9     No              Married          75K              No
10    No              Single           90K              Yes

◼ Normal distribution:

$$P(X_i \mid Y_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\left(-\frac{(X_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$

– One for each (Xi, Yj) pair

◼ For (Income, Class=No):
– If Class=No
◆ sample mean = 110
◆ sample variance = 2975

$$P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)} \exp\left(-\frac{(120 - 110)^2}{2(2975)}\right) = 0.0072$$
+
Example of Naïve Bayes Classifier
Given a Test Record:

X = (Refund = No, Divorced, Income = 120K)

Naïve Bayes Classifier:

P(Refund = Yes | No) = 3/7

For Taxable Income:

If class = No: sample mean = 110
sample variance = 2975
If class = Yes: sample mean = 90
sample variance = 25
+
Example of Naïve Bayes Classifier
Given a Test Record:

X = (Refund = No, Divorced, Income = 120K)

◼ P(X | No) = P(Refund=No | No) × P(Divorced | No) × P(Income=120K | No)
= 4/7 × 1/7 × 0.0072 = 0.0006

◼ P(X | Yes) = P(Refund=No | Yes) × P(Divorced | Yes) × P(Income=120K | Yes)
= 1 × 1/3 × 1.2 × 10^-9 = 4 × 10^-10

Since P(X|No)P(No) > P(X|Yes)P(Yes)
Therefore P(No|X) > P(Yes|X)
=> Class = No
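
The whole calculation for this test record can be sketched end to end in Python, combining frequency counts for the categorical attributes with a normal density for income. The names `data`, `gaussian_pdf`, and `score` are illustrative assumptions, and no smoothing is applied.

```python
import math
from statistics import mean, variance

# (Refund, Marital Status, Taxable Income in K, Evade) for Tid 1..10
data = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def score(label, refund, status, income):
    """Unnormalized posterior P(X | label) * P(label) under naive Bayes."""
    rows = [r for r in data if r[3] == label]
    n_c = len(rows)
    prior = n_c / len(data)
    p_refund = sum(r[0] == refund for r in rows) / n_c
    p_status = sum(r[1] == status for r in rows) / n_c
    incomes = [r[2] for r in rows]
    p_income = gaussian_pdf(income, mean(incomes), variance(incomes))
    return prior * p_refund * p_status * p_income


scores = {label: score(label, "No", "Divorced", 120) for label in ("No", "Yes")}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The score for No is several orders of magnitude larger than the score for Yes, mainly because an income of 120K lies far from the Yes-class mean of 90 relative to its small variance of 25.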
+
Handling Zero Conditional
Probabilities
◼ If P(Marital Status = Divorced | No) is zero instead of 1/7, then a data instance with attribute set X = (Refund = Yes, Marital Status = Divorced, Income = 120K) will have the following class-conditional probabilities:
◼ P(X|No) = 3/7 × 0 × 0.0072 = 0
◼ P(X|Yes) = 0 × 1/3 × 1.2 × 10^-9 = 0
As both class-conditional probabilities are zero, naïve Bayes will not be able to classify the instance.
+
Handling Zero Conditional
Probabilities
◼ To address this issue, we need to adjust the conditional probability estimates using the following alternate estimates:

Laplace estimate:

$$P(X_i = c \mid y) = \frac{n_c + 1}{n + v}$$

m-estimate:

$$P(X_i = c \mid y) = \frac{n_c + m p}{n + m}$$

n: number of training instances belonging to class y
n_c: number of training instances with Xi = c and Y = y
v: total number of values Xi can take
p: an initial (prior) estimate of P(Xi = c | y)
m: a hyper-parameter that indicates our confidence in using p when the observed fraction n_c/n is unreliable, for example because too few training instances are available
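
Both estimates are simple enough to sketch directly. The example applies them to P(Refund = Yes | Yes), which is 0/3 in the tax table (n_c = 0, n = 3, and Refund takes v = 2 values). The function names and the choices m = 3 and p = 1/2 are assumptions made only for illustration.

```python
def laplace_estimate(n_c: int, n: int, v: int) -> float:
    """Laplace estimate: (n_c + 1) / (n + v)."""
    return (n_c + 1) / (n + v)


def m_estimate(n_c: int, n: int, m: float, p: float) -> float:
    """m-estimate: (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)


# P(Refund = Yes | Yes): no class-Yes record has Refund = Yes
raw = 0 / 3
laplace = laplace_estimate(n_c=0, n=3, v=2)
m_est = m_estimate(n_c=0, n=3, m=3, p=0.5)  # m and p chosen for illustration

print(raw, laplace, m_est)  # 0.0 0.2 0.25
```

With m = v and p = 1/v the m-estimate reduces to the Laplace estimate, so the Laplace estimate can be read as an m-estimate with a uniform prior over the attribute values.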
+
Exercise

◼ In Bayes theorem, prior probabilities that have been revised with the help of newly available information are termed

A) independent probabilities

B) posterior probabilities

C) interior probabilities

D) dependent probabilities
+
Exercise

◼ Suppose the fraction of undergraduate students who smoke is 15% and the fraction of graduate students who smoke is 23%. If one-fifth of the college students are graduate students and the rest are undergraduates, what is the probability that a student who smokes is a graduate student?
+
Solution

P(S|UG) = 0.15, P(S|G) = 0.23, P(G) = 0.2, P(UG) = 0.8.

We want to compute P(G|S).

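According to Bayes theorem,

$$P(G \mid S) = \frac{P(S \mid G)\, P(G)}{P(S \mid UG)\, P(UG) + P(S \mid G)\, P(G)} = \frac{0.23 \times 0.2}{0.15 \times 0.8 + 0.23 \times 0.2} = 0.277$$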
