Naïve Bayes Classifier Overview

The document discusses Bayesian Theory and Naïve Bayes Classifiers, which are statistical classifiers based on Bayes' Theorem that predict class membership probabilities. It explains the principles of Bayes' Theorem, the process of classification using maximum posteriori probabilities, and the assumptions of conditional independence in Naïve Bayes classifiers. Additionally, it highlights the advantages and disadvantages of using Naïve Bayes in practical applications.

Uploaded by

nihar44203

Bayesian Theory & Naïve Bayes Classifiers

Course 4232: Machine Learning

Dept. of Computer Science

Faculty of Science and Technology
Bayesian Classifier
• A statistical classifier: it performs probabilistic prediction, i.e., it predicts class membership probabilities.
• Foundation: based on Bayes’ Theorem.
• Performance: a basic Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.
• Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.
Bayes’ Theorem: Basics

• Bayes’ Theorem:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

• Let X be a data sample (“evidence”) whose class label is unknown.
• Let H be the hypothesis that X belongs to class C.
• Classification is to determine P(H|X), the posteriori probability: the probability that the hypothesis holds given the observed data sample X.
• P(H) (prior probability): the initial probability of the hypothesis.
  - E.g., X will buy a computer, regardless of age, income, …
• P(X): the probability that the sample data is observed.
• P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds.
  - E.g., given that X will buy a computer, the probability that X is aged 31..40 with medium income.
Prediction Based on Bayes’ Theorem

• Given training data X, the posteriori probability of a hypothesis H, P(H|X), follows Bayes’ theorem:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

• Informally, this can be viewed as

posteriori = likelihood × prior / evidence

• Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for all k classes.
• Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost.
Classification Is to Derive the Maximum Posteriori

• Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn).
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X).
• This can be derived from Bayes’ theorem:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$

• Since P(X) is constant for all classes, only

$$P(X \mid C_i)\,P(C_i)$$

needs to be maximized.
Applying Bayes’ Rule: A Basic Example

A doctor knows that the disease meningitis causes the patient to have a stiff neck, say, 70% of the time. The doctor also knows some unconditional facts: the prior probability that a patient has meningitis is 1/50,000, and the prior probability that any patient has a stiff neck is 1%.

Letting s be the proposition that the patient has a stiff neck and m be the proposition that the patient has meningitis, we have

P(s|m) = 0.7, P(m) = 1/50,000, P(s) = 0.01
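Bayes’ rule then gives P(m|s) = P(s|m) P(m) / P(s). The short sketch below simply evaluates that expression with the three numbers stated in the example; the function name `posterior` and the variable names are illustrative choices, not part of the original slides.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

p_s_given_m = 0.7      # P(s|m): stiff neck given meningitis
p_m = 1 / 50_000       # P(m): prior probability of meningitis
p_s = 0.01             # P(s): prior probability of a stiff neck

p_m_given_s = posterior(p_s_given_m, p_m, p_s)
print(f"P(m|s) = {p_m_given_s:.4f}")  # prints "P(m|s) = 0.0014"
```

Even though a stiff neck is strongly associated with meningitis, the posterior stays small (roughly 1 in 700) because the prior P(m) is so low.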
Bayesian Methods
• Learning and classification methods based on probability theory.
• Bayes’ theorem plays a critical role in probabilistic learning and classification.
• Uses the prior probability of each category, given no information about an item.
• Categorization produces a posterior probability distribution over the possible categories, given a description of an item.
Bayes Classifiers
• Assumption: the training set consists of instances of different classes cj, described as conjunctions of attribute values.
• Task: classify a new instance d, based on a tuple of attribute values (x1, x2, …, xn), into one of the classes cj ∈ C.
• Key idea: assign the most probable class c_MAP using Bayes’ Theorem.

$$
\begin{aligned}
c_{MAP} &= \operatorname*{argmax}_{c_j \in C} P(c_j \mid x_1, x_2, \ldots, x_n) \\
&= \operatorname*{argmax}_{c_j \in C} \frac{P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)}{P(x_1, x_2, \ldots, x_n)} \\
&= \operatorname*{argmax}_{c_j \in C} P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)
\end{aligned}
$$
Naïve Bayes Classifier

• A simplifying assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes):

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$

• This greatly reduces the computation cost: only the class distribution needs to be counted.
• If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D).
• If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ:

$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

and

$$P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$$
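As a rough illustration of the continuous-valued case, the sketch below evaluates the Gaussian density g(x, μ, σ) and estimates μ and σ from the values of one attribute within one class. The helper names (`gaussian`, `fit_gaussian`) and the age values are hypothetical, and using the sample standard deviation (dividing by n − 1) is an assumption; the slides do not say which estimator is intended.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of one class's values."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)  # assumption: n - 1 divisor
    return mu, math.sqrt(var)

# Hypothetical ages of the tuples belonging to one class Ci (illustrative only).
ages_in_class = [25.0, 32.0, 38.0, 41.0, 44.0]
mu, sigma = fit_gaussian(ages_in_class)

# Plays the role of P(x_k | C_i) for a new tuple with age 35.
likelihood = gaussian(35.0, mu, sigma)
```

Note that `gaussian` returns a density value rather than a probability; that is fine for classification because only the comparison between classes matters.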
Parameter Estimation
• P(cj)
  - Can be estimated from the frequency of classes in the training examples.
• P(x1, x2, …, xn | cj)
  - O(|X|^n · |C|) parameters.
  - Could only be estimated if a very, very large number of training examples were available.
• Independence assumption: attribute values are conditionally independent given the target value (naïve Bayes):

$$P(x_1, x_2, \ldots, x_n \mid c_j) = \prod_{i} P(x_i \mid c_j)$$

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i} P(x_i \mid c_j)$$
Naïve Bayes Classifier: Training Dataset

Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
Naïve Bayes Classifier: An Example

• P(Ci):
  P(buys_computer = “yes”) = 9/14 = 0.643
  P(buys_computer = “no”) = 5/14 = 0.357
• Compute P(X|Ci) for each class:
  P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
  P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
  P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
  P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
  P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
  P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
  P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
  P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30, income = medium, student = yes, credit_rating = fair)
  P(X|Ci):
  P(X | buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(X | buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
  P(X|Ci) × P(Ci):
  P(X | buys_computer = “yes”) × P(buys_computer = “yes”) = 0.028
  P(X | buys_computer = “no”) × P(buys_computer = “no”) = 0.007
• Since 0.028 > 0.007, X is assigned to class buys_computer = “yes”.
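The hand calculation can be reproduced with a few lines of counting code. The following is a minimal sketch rather than a reference implementation: it estimates every probability as a plain relative frequency from the 14 training tuples, exactly as in the example, and the names `train` and `score` are illustrative. The age range is written as the plain string "31...40".

```python
from collections import Counter, defaultdict

# Training tuples: (age, income, student, credit_rating, buys_computer).
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def train(rows):
    """Count classes and, per class, how often each attribute value occurs."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for *attrs, label in rows:
        for k, value in enumerate(attrs):
            value_counts[(label, k)][value] += 1
    return class_counts, value_counts

def score(x, class_counts, value_counts):
    """Return P(X|Ci) * P(Ci) for every class Ci, using relative frequencies."""
    total = sum(class_counts.values())
    result = {}
    for label, n_label in class_counts.items():
        s = n_label / total  # P(Ci)
        for k, value in enumerate(x):
            s *= value_counts[(label, k)][value] / n_label  # P(xk | Ci)
        result[label] = s
    return result

class_counts, value_counts = train(ROWS)
x = ("<=30", "medium", "yes", "fair")
scores = score(x, class_counts, value_counts)
prediction = max(scores, key=scores.get)  # class with the largest score
```

Rounded to three decimals, the two scores agree with the 0.028 and 0.007 obtained by hand, and the predicted class is "yes".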
What Is a Discriminant Function?
For a classification problem, define for each class a function g_i(x), i = 1, …, K, such that we choose Ci if

$$g_i(x) = \max_k g_k(x)$$

K = 2 Classes
Dichotomizer (K = 2) vs. polychotomizer (K > 2)

$$g(x) = g_1(x) - g_2(x)$$

$$\text{choose } \begin{cases} C_1 & \text{if } g(x) > 0 \\ C_2 & \text{otherwise} \end{cases}$$

Log odds:

$$\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)}$$
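Discriminant Functions

$$\text{choose } C_i \text{ if } g_i(x) = \max_k g_k(x), \quad i = 1, \ldots, K$$

where g_i(x) can be taken as any of the following (R(α_i | x) denotes the risk of choosing Ci):

$$g_i(x) = \begin{cases} -R(\alpha_i \mid x) \\ P(C_i \mid x) \\ p(x \mid C_i)\,P(C_i) \end{cases}$$

K decision regions R_1, …, R_K:

$$R_i = \{\, x \mid g_i(x) = \max_k g_k(x) \,\}$$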
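Any monotonically increasing transformation of p(x|Ci) P(Ci) selects the same class, so a common variant (not shown in the slides) uses the logarithm as the discriminant, which turns the product of likelihoods into a sum. The sketch below assumes all probabilities are strictly positive and reuses the fractions from the buys_computer example; `discriminant` and `choose` are illustrative names.

```python
import math

def discriminant(prior, likelihoods):
    """g_i(x) = log P(C_i) + sum_k log P(x_k | C_i); inputs must be > 0."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

def choose(g_values):
    """Return the class label whose discriminant value is largest."""
    return max(g_values, key=g_values.get)

g = {
    "yes": discriminant(9 / 14, [2 / 9, 4 / 9, 6 / 9, 6 / 9]),
    "no": discriminant(5 / 14, [3 / 5, 2 / 5, 1 / 5, 2 / 5]),
}
chosen = choose(g)
```

Exponentiating each g value recovers the original products, so the decision is unchanged: the class with the largest discriminant is still buys_computer = “yes”.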

Properties
• Estimating P(xi|cj) instead of P(x1, x2, …, xn|cj) greatly reduces the number of parameters (and the data sparseness).
• The learning step in Naïve Bayes consists of estimating P(xi|cj) and P(cj) based on the frequencies in the training data.
• An unseen instance is classified by computing the class that maximizes the posterior.
• When conditional independence is satisfied, Naïve Bayes corresponds to MAP classification.
Maximum A Posteriori
• Based on Bayes’ Theorem, we can compute the Maximum A Posteriori (MAP) hypothesis for the data.
• We are interested in the best hypothesis from some space H given observed training data D.

$$
\begin{aligned}
h_{MAP} &= \operatorname*{argmax}_{h \in H} P(h \mid D) \\
&= \operatorname*{argmax}_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} \\
&= \operatorname*{argmax}_{h \in H} P(D \mid h)\,P(h)
\end{aligned}
$$

H: the set of all hypotheses.
Note that we can drop P(D), as the probability of the data is constant (and independent of the hypothesis).
Desirable Properties of Bayes Classifier
• Incrementality: with each training example, the prior and the likelihood can be updated dynamically, which makes the classifier flexible and robust to errors.
• Combines prior knowledge and observed data: the prior probability of a hypothesis is multiplied by the probability of the training data given the hypothesis.
• Probabilistic hypothesis: outputs not only a classification, but a probability distribution over all classes.
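To obtain the probability distribution over classes mentioned in the last point, the unnormalized scores P(X|Ci) P(Ci) can be divided by their sum, which plays the role of P(X). The sketch below does this for the rounded scores from the buys_computer example; `normalize` is an illustrative name.

```python
def normalize(scores):
    """Turn scores P(X|Ci) * P(Ci) into posteriors P(Ci|X)."""
    evidence = sum(scores.values())  # P(X) = sum over i of P(X|Ci) * P(Ci)
    return {label: s / evidence for label, s in scores.items()}

# Rounded values taken from the worked example.
scores = {"yes": 0.028, "no": 0.007}
posteriors = normalize(scores)
```

With these rounded inputs the posteriors come out at about 0.8 for “yes” and 0.2 for “no”; the ranking of the classes is the same as before normalization.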
Naïve Bayes Classifier: Comments
• Advantages
  - Easy to implement
  - Good results obtained in most cases
• Disadvantages
  - Assumes class-conditional independence, which can cause a loss of accuracy
  - In practice, dependencies exist among variables
    E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
  - Dependencies among these cannot be modeled by a Naïve Bayes classifier
