Bayesian Learning and Decision Theory

The document discusses Bayesian learning, focusing on Bayesian Decision Theory (BDT) and its applications in machine learning, such as the Naïve Bayes Classifier and Bayesian Belief Networks. It explains key concepts like prior and posterior probabilities, the Bayes Optimal Classifier, and the EM Algorithm for learning with unobservable variables. The document emphasizes the practical utility of Bayesian methods in various domains, including medical diagnosis and text classification.

Uploaded by

code.alentech

LESSON 7

BAYESIAN LEARNING

Bayesian learning is a method of calculating probabilities for hypotheses.

It is one of the most practical approaches to certain types of learning problems.

Bayesian Decision Theory

Bayesian Decision Theory (BDT) came long before Version Spaces, Decision Tree Learning and
Neural Networks. It was studied in the field of Statistical Theory and, more specifically, in the
field of Pattern Recognition.

Bayesian Decision Theory is the basis of important learning schemes such as the Naïve Bayes
Classifier, Learning Bayesian Belief Networks and the EM (Expectation Maximization)
Algorithm.

The BDT is also useful as it provides a framework within which many non-Bayesian classifiers
can be studied.

Bayes Theorem

Bayes theorem can be written as

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

Where:

P(h) = prior probability of hypothesis h.

P(D) = prior probability of training data D.

P(h|D) = probability of h given D.

P(D|h) = probability of D given h.

Goal: To determine the most probable hypothesis, given the data D plus any initial knowledge
about the prior probabilities of the various hypotheses in H.

P(h): the prior probability of a hypothesis h. It reflects background knowledge available before
any data is observed. If no such information is available, a uniform distribution is used.

P(D): the probability that this sample of the data is observed, with no knowledge of which
hypothesis holds.

P(D|h): The probability of observing the sample D, given hypothesis h

P(h|D): The posterior probability of h. The probability of h given that D has been observed.

Why Bayesian?

It provides practical learning algorithms, for example Naïve Bayes, because prior knowledge
and observed data can be combined.

It is also a generative (model-based) approach, which offers a useful conceptual framework:

- E.g. sequences could also be classified, based on a probabilistic model specification

- In fact, any kind of object can be classified, based on a probabilistic model specification

Bayes Optimal Classifier

One great advantage of Bayesian Decision Theory is that it gives a lower bound on the
classification error that can be obtained for a given problem.

Bayes Optimal Classification: The most probable classification of a new instance is obtained
by combining the predictions of all hypotheses, weighted by their posterior probabilities:

$$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$$

where V is the set of all the values a classification can take and vj is one possible such
classification.
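The weighted vote above can be sketched in a few lines of Python. This is only an illustration: the three hypotheses, their posteriors (0.4, 0.3, 0.3) and their predictions are hypothetical numbers chosen for this sketch, and `bayes_optimal` is a helper name introduced here, not something from the lesson.

```python
from collections import defaultdict

def bayes_optimal(posteriors, predictions, values):
    """Return the value v maximising sum_h P(v|h) * P(h|D), plus all scores."""
    score = defaultdict(float)
    for h, p_h in posteriors.items():
        for v in values:
            # P(v|h) is 0 when hypothesis h never predicts v
            score[v] += predictions[h].get(v, 0.0) * p_h
    best = max(values, key=lambda v: score[v])
    return best, dict(score)

# Hypothetical posteriors P(h|D) and per-hypothesis predictions P(v|h)
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": {"+": 1.0}, "h2": {"-": 1.0}, "h3": {"-": 1.0}}

best, score = bayes_optimal(posteriors, predictions, ["+", "-"])
map_h = max(posteriors, key=posteriors.get)  # single most probable hypothesis
print(best, score, map_h)
```

With these numbers the single most probable hypothesis (h1) predicts "+", while the combined vote favours "-", which shows that the Bayes optimal classification can differ from the prediction of the MAP hypothesis.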

Unfortunately, the Bayes Optimal Classifier is usually too costly to apply, which motivates the
Naïve Bayes Classifier.

Example

Does the patient have cancer or not?

A patient takes a lab test and the result comes back positive. It is known that the test returns a
correct positive result in only 98% of the cases and a correct negative result in only 97% of the
cases. Furthermore, only 0.008 of the entire population has this disease.

1. What is the probability that this patient has cancer?

2. What is the probability that he does not have cancer?

3. What is the diagnosis?

MAP Learner

For each hypothesis h in H, calculate the posterior probability

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

Output the hypothesis $h_{MAP}$ with the highest posterior probability

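Choosing Hypotheses

Maximum Likelihood hypothesis:

$$h_{ML} = \arg\max_{h \in H} P(D \mid h)$$

Generally we want the most probable hypothesis given the training data. This is the maximum a
posteriori hypothesis:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$

Useful observation: it does not depend on the denominator P(D).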

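Now we compute the diagnosis.

To find the Maximum Likelihood hypothesis, we evaluate P(D|h) for the data D, which is the
positive lab test, and choose the hypothesis (diagnosis) that maximises it:

P(+|cancer) = 0.98 and P(+|¬cancer) = 1 − 0.97 = 0.03, so the Maximum Likelihood diagnosis
is cancer.

To find the Maximum A Posteriori hypothesis, we evaluate P(D|h)P(h) for the data D, which is
the positive lab test, and choose the hypothesis (diagnosis) that maximises it. This is the same as
choosing the hypothesis that gives the higher posterior probability:

P(+|cancer)P(cancer) = 0.98 × 0.008 = 0.00784 and P(+|¬cancer)P(¬cancer) = 0.03 × 0.992 =
0.02976, so the Maximum A Posteriori diagnosis is no cancer.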
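The three questions of the cancer example can be checked numerically. The sketch below uses only the figures given in the example (98% correct positives, 97% correct negatives, prior 0.008); the function name `posterior_distribution` and the dictionary keys are illustrative choices made for this sketch.

```python
def posterior_distribution(priors, likelihoods):
    """Apply Bayes theorem: P(h|D) = P(D|h) P(h) / sum_h' P(D|h') P(h')."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())  # P(D), by the law of total probability
    return {h: j / evidence for h, j in joint.items()}

priors = {"cancer": 0.008, "no_cancer": 0.992}
# Probability of a positive test under each hypothesis
likelihood_pos = {"cancer": 0.98, "no_cancer": 1 - 0.97}

post = posterior_distribution(priors, likelihood_pos)
h_ml = max(likelihood_pos, key=likelihood_pos.get)  # ignores the prior
h_map = max(post, key=post.get)                     # uses the prior

print(post, h_ml, h_map)
```

Under these figures the posterior probability of cancer given a positive test is roughly 0.21, so the MAP diagnosis is "no cancer" even though the maximum likelihood diagnosis is "cancer". The difference comes entirely from the small prior.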

Some Results from the Analysis of Learners in a Bayesian Framework

If P(h)=1/|H| and if P(D|h)=1 if D is consistent with h, and 0 otherwise, then every hypothesis in
the version space resulting from D is a MAP hypothesis.

Under certain assumptions regarding noise in the data, minimizing the mean squared error (what
common neural nets do) corresponds to computing the maximum likelihood hypothesis.

When using a certain representation for hypotheses, choosing the smallest hypotheses
corresponds to choosing MAP hypotheses (an attempt at justifying Occam’s razor).

Naïve Bayes Classifier

An important, simple special case of a Bayes optimal classifier, where

– hypothesis = classification

– all attributes are independent given the class

All the attributes belong to the same class.

What can we do if our data d has several attributes?

Naïve Bayes assumption: Attributes that describe data instances are conditionally independent
given the classification hypothesis

It is a simplifying assumption. Obviously it may be violated in reality, but in spite of that it
works well in practice.

The Bayesian classifier that uses the Naïve Bayes assumption and computes the MAP hypothesis
is called the Naïve Bayes classifier.

It is one of the most practical learning methods.

It has been successfully applied in the following areas:

- Medical diagnosis

- Text classification

Example

Given the following Training data (Play Tennis)

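The Naïve Bayes solution is to classify any new datum instance $x = (a_1, \dots, a_T)$ as:

$$h_{NB} = \arg\max_{h} P(h)\,P(x \mid h) = \arg\max_{h} P(h) \prod_{t=1}^{T} P(a_t \mid h)$$

To do this, we need to estimate the parameters from the training examples:

- For each target value (hypothesis) h: an estimate of P(h)

- For each attribute value $a_t$ of each datum instance: an estimate of $P(a_t \mid h)$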

Based on the examples in the table, classify the following datum x:

x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

That is, we need to decide whether to play tennis or not.

Computation:

x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

P(Outlook=Sunny|Play=Yes) = 2/9

P(Temperature=Cool|Play=Yes) = 3/9

P(Humidity=High|Play=Yes) = 3/9

P(Wind=Strong|Play=Yes) = 3/9

P(Play=Yes) = 9/14

P(Outlook=Sunny|Play=No) = 3/5

P(Temperature=Cool|Play=No) = 1/5

P(Humidity=High|Play=No) = 4/5

P(Wind=Strong|Play=No) = 3/5

P(Play=No) = 5/14

P(Yes|x’)

∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes)

= 2/9 × 3/9 × 3/9 × 3/9 × 9/14

≈ 0.0053

P(No|x’)

∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No)

= 3/5 × 1/5 × 4/5 × 3/5 × 5/14

≈ 0.0206

Since P(Yes|x’) < P(No|x’), the new datum x’ is labelled “No”.
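The computation above can be reproduced with a short Python sketch. The training table did not survive in this copy of the lesson, so the rows below are an assumption: they are the 14-example PlayTennis data set widely used in textbooks, chosen because its counts reproduce every fraction listed above (2/9, 3/9, 3/5, 9/14, and so on). The function names `train` and `score` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# ASSUMPTION: the standard textbook PlayTennis table (the lesson's own table is missing).
# Columns: Outlook, Temperature, Humidity, Wind, Play
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def train(rows):
    """Count class frequencies and (attribute index, value, class) frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    cond_counts = Counter()
    for *attrs, label in rows:
        for i, value in enumerate(attrs):
            cond_counts[(i, value, label)] += 1
    return class_counts, cond_counts

def score(x, label, class_counts, cond_counts):
    """Unnormalised Naive Bayes score: P(label) * prod_t P(a_t | label)."""
    total = sum(class_counts.values())
    s = Fraction(class_counts[label], total)
    for i, value in enumerate(x):
        s *= Fraction(cond_counts[(i, value, label)], class_counts[label])
    return s

class_counts, cond_counts = train(ROWS)
x_new = ("Sunny", "Cool", "High", "Strong")
scores = {c: score(x_new, c, class_counts, cond_counts) for c in class_counts}
prediction = max(scores, key=scores.get)
print({c: float(s) for c, s in scores.items()}, prediction)
```

Exact fractions are used so the result can be compared directly with the hand computation. The estimates are plain relative frequencies with no smoothing, so an attribute value never seen with a class would give that class a score of zero.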

Naive Bayes Classifier is a simple but effective Bayesian classifier for vector data (i.e. data with
several attributes) that assumes that attributes are independent given the class.

Let each instance x of a training set D be described by a conjunction of n attribute values
$\langle a_1, a_2, \dots, a_n \rangle$ and let f(x), the target function, be such that $f(x) \in V$, a finite set.

Bayesian Approach:

$$\begin{aligned}
v_{MAP} &= \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \dots, a_n) \\
&= \arg\max_{v_j \in V} \frac{P(a_1, a_2, \dots, a_n \mid v_j)\,P(v_j)}{P(a_1, a_2, \dots, a_n)} \\
&= \arg\max_{v_j \in V} P(a_1, a_2, \dots, a_n \mid v_j)\,P(v_j)
\end{aligned}$$

Naïve Bayesian Approach: We assume that the attribute values are conditionally independent
given the classification, so that $P(a_1, a_2, \dots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$ (and not too large a data set is
required). Naïve Bayes Classifier:

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$

Bayesian Belief Networks

The Bayes Optimal Classifier is often too costly to apply.

The Naïve Bayes Classifier uses the conditional independence assumption to defray these costs.
However, in many cases, such an assumption is overly restrictive.

Bayesian belief networks provide an intermediate approach which allows stating conditional
independence assumptions that apply to subsets of the variables.

Conditional Independence

We say that X is conditionally independent of Y given Z if the probability distribution governing
X is independent of the value of Y given a value for Z,

i.e., $(\forall x_i, y_j, z_k)\; P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$ or, more compactly, $P(X \mid Y, Z) = P(X \mid Z)$.

This definition can be extended to sets of variables as well: we say that the set of variables
$X_1 \dots X_l$ is conditionally independent of the set of variables $Y_1 \dots Y_m$ given the set of variables
$Z_1 \dots Z_n$ if

$$P(X_1 \dots X_l \mid Y_1 \dots Y_m, Z_1 \dots Z_n) = P(X_1 \dots X_l \mid Z_1 \dots Z_n)$$
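The definition can be checked numerically on a small joint distribution. In the sketch below all probabilities are hypothetical: three binary variables are built so that X and Y each depend only on Z, and the code then tests P(X|Y,Z) = P(X|Z) for every combination of values. The helper names `prob` and `is_cond_independent` are introduced for this illustration only.

```python
from itertools import product

# Hypothetical parameters: Z is a common cause of X and Y.
p_z = {0: 0.7, 1: 0.3}
p_x1_given_z = {0: 0.2, 1: 0.9}  # P(X=1 | Z=z)
p_y1_given_z = {0: 0.5, 1: 0.1}  # P(Y=1 | Z=z)

def bern(p_one, value):
    """Probability of a binary value when P(value=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

# Full joint table P(X, Y, Z) = P(Z) P(X|Z) P(Y|Z)
joint = {
    (x, y, z): p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}

def prob(**fixed):
    """Marginal probability of the event given as keyword arguments x=, y=, z=."""
    total = 0.0
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total

def is_cond_independent(tol=1e-12):
    """Check P(X=x | Y=y, Z=z) == P(X=x | Z=z) for all value combinations."""
    for x, y, z in product((0, 1), repeat=3):
        lhs = prob(x=x, y=y, z=z) / prob(y=y, z=z)
        rhs = prob(x=x, z=z) / prob(z=z)
        if abs(lhs - rhs) > tol:
            return False
    return True

print(is_cond_independent())
print(prob(x=1, y=1), prob(x=1) * prob(y=1))  # differ: X and Y are dependent without Z
```

In this construction X and Y are conditionally independent given Z but not independent of each other once Z is ignored, which is the situation the Naïve Bayes assumption describes for attributes and the class.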

Representation in Bayesian Belief Networks

Associated with each node is a conditional probability table, which specifies the conditional
distribution for the variable, given its immediate parents in the graph

Each node is asserted to be conditionally independent of its non-descendants, given its
immediate parents.
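Together, these two properties mean that the joint probability of a full assignment is the product, over all nodes, of each node's conditional probability given its parents. A minimal sketch of that representation follows. The three-node network (Rain, Sprinkler, GrassWet) and all of its probabilities are hypothetical values chosen for illustration, and the `NETWORK` layout is just one possible encoding.

```python
from itertools import product

# Each node maps to (parents, CPT), where the CPT maps a tuple of parent values
# to P(node=True | parents). Nodes are listed with parents before children.
NETWORK = {
    "Rain": ((), {(): 0.2}),
    "Sprinkler": (("Rain",), {(True,): 0.01, (False,): 0.4}),
    "GrassWet": (
        ("Sprinkler", "Rain"),
        {(True, True): 0.99, (True, False): 0.9, (False, True): 0.8, (False, False): 0.0},
    ),
}

def joint_probability(assignment):
    """P(assignment) = product over nodes of P(node value | parent values)."""
    prob = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

names = list(NETWORK)
total = sum(
    joint_probability(dict(zip(names, values)))
    for values in product((True, False), repeat=len(names))
)
example = joint_probability({"Rain": True, "Sprinkler": False, "GrassWet": True})
print(total, example)
```

The tables here hold 1 + 2 + 4 = 7 numbers, the same as the 2^3 − 1 needed for an unrestricted joint over three binary variables, because this small graph is fully connected. The saving appears in larger networks where each node has only a few parents.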

Inference in Bayesian Belief Networks

A Bayesian Network can be used to compute the probability distribution for any subset of
network variables given the values or distributions for any subset of the remaining variables.

Unfortunately, exact inference of probabilities for an arbitrary Bayesian Network is known to be
NP-hard in general.

In theory, approximate techniques (such as Monte Carlo Methods) can also be NP-hard, though
in practice, many such methods have been shown to be useful.
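To make the contrast between exact and approximate inference concrete, the sketch below answers one query, P(Rain | GrassWet = True), in two ways: by enumerating every assignment, and by rejection sampling, which is one simple Monte Carlo method. The network and its probabilities are hypothetical, and rejection sampling is used here only as an example of the family, not as the method the lesson prescribes.

```python
import random
from itertools import product

# Hypothetical network: Rain -> Sprinkler, and {Sprinkler, Rain} -> GrassWet.
ORDER = ["Rain", "Sprinkler", "GrassWet"]  # parents always precede children
PARENTS = {"Rain": (), "Sprinkler": ("Rain",), "GrassWet": ("Sprinkler", "Rain")}
CPT = {  # P(node=True | parent values)
    "Rain": {(): 0.2},
    "Sprinkler": {(True,): 0.01, (False,): 0.4},
    "GrassWet": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0},
}

def p_true(node, assignment):
    return CPT[node][tuple(assignment[p] for p in PARENTS[node])]

def joint(assignment):
    prob = 1.0
    for node in ORDER:
        p = p_true(node, assignment)
        prob *= p if assignment[node] else 1.0 - p
    return prob

def exact_query(target, evidence):
    """P(target=True | evidence) by summing the joint over all assignments."""
    num = den = 0.0
    for values in product((True, False), repeat=len(ORDER)):
        a = dict(zip(ORDER, values))
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(a)
            den += p
            if a[target]:
                num += p
    return num / den

def rejection_sample_query(target, evidence, n, rng):
    """Estimate P(target=True | evidence) from n forward samples."""
    kept = hits = 0
    for _ in range(n):
        a = {}
        for node in ORDER:  # sample each node given its already-sampled parents
            a[node] = rng.random() < p_true(node, a)
        if all(a[k] == v for k, v in evidence.items()):
            kept += 1  # keep only samples that agree with the evidence
            hits += a[target]
    return hits / kept

exact = exact_query("Rain", {"GrassWet": True})
approx = rejection_sample_query("Rain", {"GrassWet": True}, 50_000, random.Random(0))
print(exact, approx)
```

Enumeration visits 2^n assignments for n binary variables, which is what makes exact inference impractical on large networks. The sampling estimate costs a fixed number of samples but is only approximate, and it wastes every sample that disagrees with the evidence.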

Learning Bayesian Belief Networks

3 Cases:

1. The network structure is given in advance and all the variables are fully observable in the
training examples. ==> Trivial Case: just estimate the conditional probabilities.

2. The network structure is given in advance but only some of the variables are observable in the
training data. ==> Similar to learning the weights for the hidden units of a Neural Net: Gradient
Ascent Procedure

3. The network structure is not known in advance. ==> Use a heuristic search or constraint-based
technique to search through potential structures.

The EM Algorithm: Learning with unobservable relevant variables.

Example: Assume that data points have been generated from a mixture of k distinct Gaussians,
each chosen with uniform probability and all with the same known variance. The problem is to
output a hypothesis $h = \langle \mu_1, \mu_2, \dots, \mu_k \rangle$ that describes the means of each of the k distributions.
In particular, we are looking for a maximum likelihood hypothesis for these means.

We extend the problem description as follows: for each point $x_i$, there are k hidden variables
$z_{i1}, \dots, z_{ik}$ such that $z_{il} = 1$ if $x_i$ was generated by normal distribution l and $z_{iq} = 0$ for all $q \neq l$.

An arbitrary initial hypothesis $h = \langle \mu_1, \mu_2, \dots, \mu_k \rangle$ is chosen.

The EM Algorithm iterates over two steps:

Step 1 (Estimation, E): Calculate the expected value $E[z_{ij}]$ of each hidden variable $z_{ij}$, assuming
that the current hypothesis $h = \langle \mu_1, \mu_2, \dots, \mu_k \rangle$ holds.

Step 2 (Maximization, M): Calculate a new maximum likelihood hypothesis
$h' = \langle \mu_1', \mu_2', \dots, \mu_k' \rangle$, assuming the value taken on by each hidden variable $z_{ij}$ is its expected
value $E[z_{ij}]$ calculated in step 1. Then replace the hypothesis $h = \langle \mu_1, \mu_2, \dots, \mu_k \rangle$ by the new
hypothesis $h' = \langle \mu_1', \mu_2', \dots, \mu_k' \rangle$ and iterate.
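The two steps can be sketched in plain Python for one-dimensional data. This is a minimal illustration under the stated assumptions (uniform mixture, shared known variance): the E step weights each point by the Gaussian density under each current mean, and the M step replaces each mean by the weighted average of the points. The function name `em_gaussian_means`, the synthetic data, the initial means and the fixed iteration count are all choices made for this sketch.

```python
import math
import random

def em_gaussian_means(xs, initial_means, sigma, iterations=50):
    """Estimate the means of k equal-weight Gaussians with known std dev sigma."""
    means = list(initial_means)
    for _ in range(iterations):
        # E step: E[z_ij] is proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2))
        expected_z = []
        for x in xs:
            weights = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
            total = sum(weights)
            expected_z.append([w / total for w in weights])
        # M step: mu_j = sum_i E[z_ij] x_i / sum_i E[z_ij]
        means = [
            sum(z[j] * x for z, x in zip(expected_z, xs)) / sum(z[j] for z in expected_z)
            for j in range(len(means))
        ]
    return means

# Synthetic data (hypothetical): two Gaussians with true means 0 and 5, sigma = 1
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]

estimated = sorted(em_gaussian_means(data, initial_means=[1.0, 4.0], sigma=1.0))
print(estimated)
```

A fixed number of iterations keeps the sketch short; a fuller version would stop when the means change by less than a tolerance. EM converges to a local maximum of the likelihood, so the result can depend on the initial hypothesis.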

The EM Algorithm can be applied to more general problems
