Bayesian Classification Methods Overview

The document discusses Bayesian classification methods, including the Naïve Bayes classifier and Bayesian belief networks, highlighting their principles, advantages, and limitations. It explains the application of Bayes' theorem in classification, the importance of conditional probabilities, and the challenges of dependency among variables. Additionally, it outlines scenarios for training Bayesian networks and the computational aspects involved.

Uploaded by

Nedia Ben Ammar

Università degli Studi di Milano

Master Degree in Computer Science

Information Management course
Teacher: Alberto Ceselli

Lecture 19: 10/12/2015

Data Mining:
Concepts and Techniques
(3rd ed.)

— Chapter 8, 9 —
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign &
Simon Fraser University
©2011 Han, Kamber & Pei. All rights reserved.

Classification methods
• Classification: Basic Concepts
• Decision Tree Induction
• Bayes Classification Methods
• Support Vector Machines
• Model Evaluation and Selection
• Rule-Based Classification
• Techniques to Improve Classification Accuracy: Ensemble Methods

Bayesian Classification: Why?
• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
• Foundation: based on Bayes’ theorem
• Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers
• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
• Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Bayesian Classification Rationale: conditional probability
Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

income   student   credit      buys
high     no        fair        no
high     no        excellent   no
high     no        fair        yes
medium   no        fair        yes
low      yes       fair        yes
low      yes       excellent   no
low      yes       excellent   yes
medium   no        fair        no
low      yes       fair        yes
medium   yes       fair        yes
medium   yes       excellent   yes
medium   no        excellent   yes
high     yes       fair        yes
medium   no        excellent   no

P(C1)?
P(C1|student = yes)?

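The two questions on this slide can be answered by counting rows. The sketch below is only an illustration: the helper name `prob` and the tuple layout are assumptions, not part of the original slides, and the rows are copied from the table above.

```python
from fractions import Fraction

# Rows of the slide's table: (income, student, credit, buys)
ROWS = [
    ("high", "no", "fair", "no"),
    ("high", "no", "excellent", "no"),
    ("high", "no", "fair", "yes"),
    ("medium", "no", "fair", "yes"),
    ("low", "yes", "fair", "yes"),
    ("low", "yes", "excellent", "no"),
    ("low", "yes", "excellent", "yes"),
    ("medium", "no", "fair", "no"),
    ("low", "yes", "fair", "yes"),
    ("medium", "yes", "fair", "yes"),
    ("medium", "yes", "excellent", "yes"),
    ("medium", "no", "excellent", "yes"),
    ("high", "yes", "fair", "yes"),
    ("medium", "no", "excellent", "no"),
]


def prob(event, given=lambda row: True):
    """Relative frequency of `event` among the rows that satisfy `given`."""
    pool = [row for row in ROWS if given(row)]
    return Fraction(sum(1 for row in pool if event(row)), len(pool))


# P(C1): fraction of all tuples with buys = yes
p_c1 = prob(lambda r: r[3] == "yes")
# P(C1 | student = yes): same event, restricted to the student rows
p_c1_given_student = prob(lambda r: r[3] == "yes", given=lambda r: r[1] == "yes")

print(p_c1, p_c1_given_student)
```

Conditioning simply shrinks the pool of rows that are counted, which is why the estimate becomes unreliable once few rows match the condition.
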
Bayesian Classification Rationale
• Let P(Ci|X) be the conditional probability of observing class Ci given that the attribute values of an element are X
• Final aim: obtaining (an estimate of) P(Ci|X) for each i and for each X (the classification model is the set of these values)
• P(Ci|X) = P(Ci ∩ X) / P(X)
• How to compute P(X)?
    • We would need a sufficient number of elements in the training set whose attribute values are X
    • … and therefore some elements for each possible combination of the attribute values (unrealistic)
• How to compute P(Ci ∩ X)? Same problems

Bayesian Theorem: Basics
• Let X be an evidence (data sample) whose class label is unknown
• Let H be a hypothesis on the class X belongs to (say, a “potential” class)
• Classification is to find P(H|X), the a posteriori probability: the probability that the hypothesis holds given the observed data sample X
• We can estimate:
    • P(H) (a priori probability), an initial “blind” probability
      E.g., X buys a computer, regardless of age, income
    • P(X): probability that a certain data sample is observed
    • P(X|H) (likelihood), the probability of observing the sample X, given that the hypothesis H holds

Bayesian classification: defs

age     income   student   credit_rating   PC
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

• Evidence X = (age = 31…40; income = medium; student = no; rating = excellent)
• Hypothesis H = (PC = yes)
• A priori Probability: P(H) = 9/14
• Likelihood: P(X|H) = 1/9
• A posteriori Probability: P(H|X) = ???

Bayesian Theorem
• Given training data X, the posteriori probability of a hypothesis H, P(H|X), follows the Bayes theorem

  $$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}$$

• Informally, this can be written as
  posteriori = likelihood × priori / evidence
• Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for all the k classes
• Practical difficulty: requires initial knowledge of many probabilities, at significant computational cost

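As a worked instance of the theorem, the sketch below plugs in the values from the "defs" slide, P(H) = 9/14 and P(X|H) = 1/9. The evidence term P(X) = 1/14 is not stated on the slides; it is an assumption obtained by counting, since exactly one of the 14 training tuples has age 31…40, medium income, student = no and excellent rating. The function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence


p_h = Fraction(9, 14)         # a priori probability, from the slide
p_x_given_h = Fraction(1, 9)  # likelihood, from the slide
p_x = Fraction(1, 14)         # evidence: one matching tuple out of 14 (counted, not on the slide)

p_h_given_x = posterior(p_x_given_h, p_h, p_x)
print(p_h_given_x)
```

The posterior comes out as 1 only because a single training tuple matches X, which illustrates the practical difficulty noted above: estimating P(X) and P(X|H) directly needs many examples for every attribute combination.
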
Bayesian Classification
• Let D be a training set of tuples and their associated class labels; each tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X)
• This can be derived from Bayes’ theorem

  $$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

• Since P(X) is constant for all classes, only

  $$P(C_i \mid X) \propto P(X \mid C_i)\, P(C_i)$$

  needs to be maximized (Maximum A Posteriori method)

The “Optimal” Bayesian Classifier
• From a theoretical point of view, the Bayesian MAP classifier is optimal: no classifier can achieve a smaller error rate
• In order to compute

  $$P(C_i \mid X) \propto P(X \mid C_i)\, P(C_i)$$

  we need
    • $P(C_i)$ → “easy”: just scan the DB once
    • $P(X \mid C_i)$ → if we have k classes and m attributes, each taking n possible values: $k \cdot n^m$ probability values!

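To get a feel for how quickly this count grows, the following sketch evaluates k·n^m for a few attribute counts. The concrete numbers (k = 2 classes, n = 3 values per attribute) are illustrative assumptions, not figures from the slides.

```python
def full_joint_table_size(k, n, m):
    """Number of P(X|Ci) entries for k classes and m attributes with n values each."""
    return k * n ** m


# Hypothetical setting: 2 classes, 3 values per attribute, growing attribute count
sizes = {m: full_joint_table_size(k=2, n=3, m=m) for m in (4, 10, 20)}
for m, size in sizes.items():
    print(f"m = {m:2d} attributes -> {size} probability values")
```

Even with only two classes the table grows exponentially in the number of attributes, so it cannot be estimated from a training set of realistic size.
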
Derivation of Naïve Bayes Classifier
• A simplifying assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes):

  $$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \dots \times P(x_n \mid C_i)$$

• This greatly reduces the computation cost: only counts the class distribution (k*n*m probabilities, in the notation of the previous slide: k classes, m attributes, n values per attribute)
• If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk for Ak divided by |Ci, D| (# of tuples of Ci in D)
• If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ

  $$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$$

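For the continuous case, a minimal sketch of the Gaussian likelihood could look as follows. The ages used to fit μ and σ are hypothetical, the function names are illustrative, and the sample standard deviation (dividing by N − 1) is one possible choice; the slides do not say which estimator is intended.

```python
import math


def gaussian(x, mu, sigma):
    """Density g(x, mu, sigma) of a normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def fit_gaussian(values):
    """Estimate mean and sample standard deviation from one class's attribute values."""
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
    return mu, math.sqrt(variance)


# HYPOTHETICAL ages of the tuples belonging to one class Ci
ages_in_class = [25, 30, 35, 40, 45]
mu, sigma = fit_gaussian(ages_in_class)

# P(age = 33 | Ci) is approximated by the density at 33
likelihood = gaussian(33, mu, sigma)
print(mu, sigma, likelihood)
```

The value returned is a density rather than a probability, which is acceptable here because only the comparison between classes matters.
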
Training a Naïve Bayesian Classifier (example)

age     income   student   credit_rating   PC
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

Training:
• P(PC = yes) = 9/14
• P(PC = no) = 5/14
• P(age = “<=30” | PC = yes) = 2/9
• P(age = “<=30” | PC = no) = 3/5
• P(incm. = “med” | PC = yes) = 4/9
• P(incm. = “med” | PC = no) = 2/5
• P(student = “yes” | PC = yes) = 6/9
• P(student = “yes” | PC = no) = 1/5
• P(credit = “fair” | PC = “yes”) = 6/9
• P(credit = “fair” | PC = “no”) = 2/5
• P( all other combinations ) …

Using:
• X = (“<=30”; “med”; “yes”; “fair”)
• P(X|PC = yes) → P(age = “<=30” | PC = yes) * P(incm. = “med” | PC = yes) * P(student = “yes” | PC = yes) * P(credit = “fair” | PC = “yes”) → 0.044
• P(X|PC = no) → 0.019
• P(PC = yes | X) → π*P(X|PC = yes)*P(PC = yes) → π*0.028
• P(PC = no | X) → π*P(X|PC = no)*P(PC = no) → π*0.007
  (π = 1/P(X) is the same normalizing constant for both classes)
PREDICT “PC = yes”!!!

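The training and prediction steps of this example can be reproduced with a short counting routine. This is a minimal sketch for categorical attributes only, without any smoothing; the function names `train` and `scores` are illustrative and not taken from the slides.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Training tuples from the slide: (age, income, student, credit_rating, PC)
DATA = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def train(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row in rows:
        for i, value in enumerate(row[:-1]):
            value_counts[(i, row[-1])][value] += 1
    return class_counts, value_counts


def scores(x, class_counts, value_counts):
    """Unnormalized posteriors P(X|Ci) * P(Ci) under the naive assumption."""
    total = sum(class_counts.values())
    result = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, total)  # prior P(Ci)
        for i, value in enumerate(x):
            score *= Fraction(value_counts[(i, label)][value], n_label)
        result[label] = score
    return result


class_counts, value_counts = train(DATA)
x = ("<=30", "medium", "yes", "fair")
s = scores(x, class_counts, value_counts)
prediction = max(s, key=s.get)
print({label: round(float(v), 3) for label, v in s.items()}, prediction)
```

The scores are left unnormalized because the common factor 1/P(X) does not change which class attains the maximum.
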
Avoiding the Zero-Probability Problem
• Naïve Bayesian prediction requires each conditional probability to be non-zero. Otherwise, the predicted probability will be zero

  $$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$$

• Ex. Suppose a dataset with 1000 tuples: income = low (0), income = medium (990), and income = high (10)
• Use Laplacian correction (or Laplacian estimator)
    • Adding 1 to each case
      Prob(income = low) = 1/1003
      Prob(income = medium) = 991/1003
      Prob(income = high) = 11/1003

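The correction above can be written as a small helper. This is a sketch only: the function name `laplace_estimates` and the `alpha` parameter (the pseudo-count added to each case, 1 on the slide) are illustrative assumptions.

```python
from fractions import Fraction


def laplace_estimates(counts, alpha=1):
    """Add `alpha` to every count, then normalize by the enlarged total."""
    total = sum(counts.values()) + alpha * len(counts)
    return {value: Fraction(count + alpha, total) for value, count in counts.items()}


# Counts from the slide's example: 1000 tuples in total
income_counts = {"low": 0, "medium": 990, "high": 10}
smoothed = laplace_estimates(income_counts)

for value, p in smoothed.items():
    print(f"Prob(income = {value}) = {p}")
```

The denominator becomes 1003 because one pseudo-tuple is added for each of the three income values, so the estimates still sum to 1.
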
Naïve Bayesian Classifier: Comments
• Advantages
    • Easy to implement and computationally efficient
    • Good results obtained in most of the cases
• Disadvantages
    • Assumption: class conditional independence, therefore loss of accuracy
    • Practically, dependencies exist among variables
      E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
    • Dependencies among these cannot be modeled by the Naïve Bayesian Classifier
• How to deal with these dependencies? → Bayesian Belief Networks

Bayesian Belief Networks
• Bayesian belief networks (also known as Bayesian networks, probabilistic networks): allow class conditional independencies between subsets of variables
• A (directed acyclic) graphical model of causal relationships
    • Represents dependency among the variables
    • Gives a specification of joint probability distribution

Bayesian Belief Networks
• Nodes: random variables
• Links: dependency
• X and Y are the parents of Z, and Y is the parent of P
• No direct dependency (no link) between Z and P
• Has no loops/cycles

[Figure: directed acyclic graph with edges X → Z, Y → Z, Y → P]

Bayesian Belief Network: An Example

[Figure: Bayesian Belief Network with nodes FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea]

CPT: Conditional Probability Table for variable LungCancer:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.8       0.5        0.7        0.1
~LC     0.2       0.5        0.3        0.9

The CPT shows the conditional probability for each possible combination of its parents

Derivation of the probability of a particular combination of values of X, from CPT:

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(x_i))$$

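The factorization can be tried on the sub-network formed by FamilyHistory, Smoker and LungCancer. The LungCancer CPT below is the one from the slide. The priors `P_FH` and `P_S` are hypothetical numbers chosen for illustration, and treating FH and S as parentless nodes is likewise an assumption, since the extracted text does not list the edges of the figure.

```python
from itertools import product

# CPT from the slide: P(LC = True | FH, S), keyed by (FH, S)
P_LC = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.7,
    (False, False): 0.1,
}

# HYPOTHETICAL priors, not given in the slides; FH and S assumed to have no parents
P_FH = 0.1
P_S = 0.3


def bernoulli(p_true, value):
    """Probability of a boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(fh, s, lc):
    """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S)."""
    return bernoulli(P_FH, fh) * bernoulli(P_S, s) * bernoulli(P_LC[(fh, s)], lc)


# The joint must sum to 1 over all eight value combinations
total = sum(joint(fh, s, lc) for fh, s, lc in product([True, False], repeat=3))

# Marginal P(LC = True), obtained by summing out the parents
p_lc = sum(joint(fh, s, True) for fh, s in product([True, False], repeat=2))

print(joint(True, True, True), total, p_lc)
```

Each factor is read from the table of the node it belongs to, so the full joint distribution never has to be stored explicitly.
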
Training Bayesian Networks: Several Scenarios
• Scenario 1: Given both the network structure and all variables observable: compute only the CPT entries
• Scenario 2: Network structure known, some variables hidden: gradient descent (greedy hill-climbing) method, i.e., search for a solution along the steepest descent of a criterion function
    • Weights are initialized to random probability values
    • At each iteration, it moves towards what appears to be the best solution at the moment, without backtracking
    • Weights are updated at each iteration and converge to a local optimum

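Scenario 1 reduces to counting: with a known structure and fully observed data, each CPT entry is a relative frequency. The sketch below assumes a node LC with parents FH and S, as in the earlier example; the records are hypothetical and the function name `estimate_cpt` is illustrative. Scenario 2 (hidden variables) is not covered by this sketch.

```python
from collections import Counter


def estimate_cpt(records, child, parents):
    """Estimate P(child | parents) as relative frequencies over fully observed records."""
    joint_counts = Counter()
    parent_counts = Counter()
    for record in records:
        key = tuple(record[p] for p in parents)
        parent_counts[key] += 1
        joint_counts[(key, record[child])] += 1
    return {
        (key, value): count / parent_counts[key]
        for (key, value), count in joint_counts.items()
    }


# HYPOTHETICAL fully observed records (not from the slides)
records = [
    {"FH": True, "S": True, "LC": True},
    {"FH": True, "S": True, "LC": True},
    {"FH": True, "S": True, "LC": False},
    {"FH": False, "S": True, "LC": True},
    {"FH": False, "S": False, "LC": False},
    {"FH": False, "S": False, "LC": False},
]

cpt = estimate_cpt(records, child="LC", parents=("FH", "S"))
for (parent_values, lc), p in sorted(cpt.items(), reverse=True):
    print(f"P(LC = {lc} | FH, S = {parent_values}) = {p:.3f}")
```

Parent combinations that never occur in the data get no entry at all, so in practice a smoothing step such as the Laplacian correction would be needed.
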
Training Bayesian Networks: Several Scenarios
• Scenario 3: Network structure unknown, all variables observable: search through the model space to reconstruct network topology
• Scenario 4: Unknown structure, all hidden variables: no good algorithms known for this purpose
• D. Heckerman. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed. MIT Press, 1999.

Bayesian Belief Networks: Comments
• Advantages
    • Computationally heavier than naïve classifier, but still tractable
    • Handle (approximating) dependencies
    • Very good results (provided a meaningful network is designed & tuned)
• Disadvantages
    • Need expert problem knowledge or external mining algorithms for designing the network
