0% found this document useful (0 votes)

12 views59 pages

Bayesian Learning in Machine Learning

Module 4 covers Bayesian Learning, introducing Bayes theorem and its application in machine learning, including concepts like Maximum a Posteriori (MAP) and Maximum Likelihood (ML) hypotheses. It discusses the advantages and challenges of Bayesian methods, such as the need for prior probabilities and computational costs. The module also explores practical examples, including medical diagnosis and probabilistic predictions in various scenarios.

Uploaded by

nidhimaa1820

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views59 pages

Bayesian Learning in Machine Learning

Uploaded by

nidhimaa1820

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

ARTIFICIAL INTELLIGENCE

AND MACHINE LEARNING

18CS71

Module 4
Module 4

Bayesian Learning: Introduction, Bayes theorem,

Bayes theorem and concept learning, ML
and LS error hypothesis, ML for predicting, MDL
principle, Bates optimal classifier, Gibbs
algorithm, Navie Bayes classifier, BBN, EM Algorithm
Texbook1: Chapter 6
Bayesian Learning: Introduction
• Bayesian methods provide the basis for probabilistic learning methods that
accommodate (and require) knowledge about the prior probabilities of
alternative hypotheses and about the probability of observing various data given
the hypothesis.
• Bayesian methods allow assigning a posterior probability to each candidate
hypothesis, based on these assumed priors and the observed data.
Bayesian learning methods are relevant to study of machine learning for two
different reasons.
• First,Bayesian learning algorithms that calculate explicit probabilities for
hypotheses, such as the naive Bayes classifier, are among the most practical
approaches to certain types of learning problems
• The second reason is that they provide a useful perspective for
understanding many learning algorithms that do not explicitly manipulate
probabilities.
Features of Bayesian Learning Methods
• Each observed training example can incrementally decrease or increase the
estimated probability that a hypothesis is correct. This provides a more flexible
approach to learning than algorithms that completely eliminate a hypothesis if it
is found to be inconsistent with any single example
• Prior knowledge can be combined with observed data to determine the final
probability of a hypothesis. In Bayesian learning, prior knowledge is provided by
asserting (1) a prior probability for each candidate hypothesis, and (2) a
probability distribution over observed data for each possible hypothesis.
• Bayesian methods can accommodate hypotheses that make probabilistic
predictions
• New instances can be classified by combining the predictions of multiple
hypotheses, weighted by their probabilities.
• Even in cases where Bayesian methods prove computationally intractable, they
can provide a standard of optimal decision making against which other practical
methods can be measured.
Practical difficulty in applying Bayesian methods

• One practical difficulty in applying Bayesian methods is that they

typically require initial knowledge of many probabilities. When
these probabilities are not known in advance they are often
estimated based on background knowledge, previously available
data, and assumptions about the form of the underlying
distributions.
• A second practical difficulty is the significant computational cost
required to determine the Bayes optimal hypothesis in the general
case. In certain specialized situations, this computational cost can
be significantly reduced.
BAYES THEOREM
• Bayes theorem provides a way to calculate the probability of a
hypothesis based on its prior probability, the probabilities of
observing various data given the hypothesis, and the observed data
itself.
• Notations
• P(h) prior probability of h, reflects any background knowledge
about the chance that h is correct
• P(D) prior probability of D, probability that D will be observed
• P(D|h) probability of observing D given a world in which h holds
• P(h|D) posterior probability of h, reflects confidence that h holds
after D has been observed
Bayes theorem is the cornerstone of Bayesian learning methods
because it provides a way to calculate the posterior probability
P(h|D), from the prior probability P(h), together with P(D) and
P(D(h).

P(h|D) increases with P(h) and with P(D|h) according to Bayes

theorem.
P(h|D) decreases as P(D) increases, because the more probable it is
that D will be observed independent of h, the less evidence D
provides in support of h.
Maximum a Posteriori (MAP) Hypothesis
• In many learning scenarios, the learner considers some set of candidate
hypotheses H and is interested in finding the most probable hypothesis h ∈
H given the observed data D. Any such maximally probable hypothesis is
called a maximum a posteriori (MAP) hypothesis.
• Bayes theorem to calculate the posterior probability of each candidate
hypothesis is hMAP is a MAP hypothesis provided

• P(D) can be dropped, because it is a constant independent of h

Maximum Likelihood (ML) Hypothesis

In some cases, it is assumed that every hypothesis in H is equally probable a priori

(P(hi) = P(hj) for all hi and hj in H).
In this case the below equation can be simplified and need only consider the term
P(D|h) to find the most probable hypothesis.

P(D|h) is often called the likelihood of the data D given h, and any hypothesis that
maximizes P(D|h) is called a maximum likelihood (ML) hypothesis
The avaiiable data is from a particular laboratory test with two
possible outcomes: $ (positive) and 8 (negative). We have prior
knowledge that over the entire population of people only .008 have
this disease. Furthermore, the lab test is only an imperfect indicator
of the disease. The test returns a correct positive result in only 98% of
the cases in which the disease is actually present and a correct
negative result in only 97% of the cases in which the disease is not
present. In other cases, the test returns the opposite result.

Suppose we now observe a new patient for whom the lab test returns
a positive result. Should we diagnose the patient as having cancer or
not?
Example
Consider a medical diagnosis problem in which there are two alternative hypotheses
• The patient has a particular form of cancer (denoted by cancer)
• The patient does not (denoted by ¬ cancer)

The available data is from a particular laboratory with two possible outcomes: +
(positive) and - (negative)
• Suppose a new patient is observed for whom the lab test returns a positive (+)
result.
• Should we diagnose the patient as having cancer or not?
Chance that Team 0 wins: (Y0)→95%
Probability that Team 0 wins: P(Y0)→0.95
Probability that Team 1 wins: P(Y1)→(1-0.95)= 0.05

Probability that Team 1 hosts the match & wins: (75%)-P(X1|Y1)→0.75

P(Y0|X1) = 1- 0.1162 = 0.8838

Team 0 is more probable of Winning than team 1

Another Method:
P(Y1|X1) = (P(X1|Y1) * P (Y1))
= 0.0375
P(Y0|X1) = (P(X1|Y0) * P (Y0))
= 0.285

Team 0 is more probable of Winning than team 1

Let us assume dangerous fire are rare
(1%) but smoke is fairly common (10%)
due to barbeques and 90% of the
dangerous fires makes smoke. Find the
probability of dangerous fire when there
is smoke
Let us assume dangerous fire are rare (1%) but smoke is fairly
common (10%) due to barbeques and 90% of the dangerous
fires makes smoke. Find the probability of dangerous fire
when there is smoke
BAYES T H EO R E M A N D C O N C E P T L EA R N I N G

What is the relationship between Bayes theorem and

the problem of concept learning?

Since Bayes theorem provides a principled way to calculate

the posterior probability of each hypothesis given the
training data, and can use it as the basis for a straightforward
learning algorithm that calculates the probability for each
possible hypothesis, then outputs the most probable.
Brute-Force Bayes Concept Learning

We can design a straightforward concept learning algorithm to output the maximum

a posteriori hypothesis, based on Bayes theorem, as follows:
In order specify a learning problem for the BRUTE-FORCE MAP LEARNING
algorithm we must specify what values are to be used for P(h) and for P(D|h) ?

Lets choose P(h) and for P(D|h) to be consistent with the following assumptions:
• The training data D is noise free (i.e., di = c(xi))
• The target concept c is contained in the hypothesis space H
• We have no a priori reason to believe that any hypothesis is more probable than any other.
What values should we specify for P(h)?
• Given no prior knowledge that one hypothesis is more likely than another, it is
reasonable to assign the same prior probability to every hypothesis h in H.
• Assume the target concept is contained in H and require that these prior
probabilities sum to 1.
What choice shall we make for P(D|h)?

• P(D|h) is the probability of observing the target values D = (d1 . . .dm) for the
fixed set of instances (x1 . . . xm), given a world in which hypothesis h holds
• Since we assume noise-free training data, the probability of observing
classification di given h is just 1 if di = h(xi) and 0 if di # h(xi). Therefore,
Given these choices for P(h) and for P(D|h) we now have a fully-defined
problem for the above BRUTE-FORCE MAP LEARNING algorithm.

In a first step, we have to determine the probabilities for P(h|D)

To summarize, Bayes theorem implies that the posterior probability P(h|D)
under our assumed P(h) and P(D|h) is

where |VSH,D| is the number of hypotheses from H consistent with D

The below figure shows the evaluation of posterior probabilities. Initially all
hypotheses have the same probability (in figure 6.1 a). As training data
accumulates (Figures 6.1 b and 6. lc), the posterior probability for inconsistent
hypotheses becomes zero while the total probability summing to one is
shared equally among the remaining consistent hypotheses. The above
analysis implies that under our choice for P(h) and P(Dlh), every consistent
hypothesis has posterior probability (1 /I V SH, I), and every inconsistent
hypothesis has posterior probability 0. Every consistent hypothesis is,
therefore, a MAP hypothesis.
MAP Hypotheses and Consistent Learners

• Learning algorithm is a consistent learner provided it outputs a hypothesis

that commits zero errors over the training examples.
• Using Bayes theorem and posterior probability, we can conclude that every
consistent learner outputs a MAP hypothesis, if we assume a uniform prior
probability distribution over H (i.e., P(hi) = P(hj) for all i, j), and if we assume
deterministic, noise free training data (i.e., P(D Ih) = 1 if D and h are
consistent, and 0 otherwise).
• SECOND IA PORTIONS WILL START FROM NEXT SLIDE
MAXIMUM LIKELIHOOD AND LEAST-SQUARED
ERROR HYPOTHESES

Consider the problem of learning a continuous-valued target function such as neural

network learning, linear regression, and polynomial curve fitting

A straightforward Bayesian analysis will show that under certain assumptions any
learning algorithm that minimizes the squared error between the output hypothesis
predictions and the training data will output a maximum likelihood (ML) hypothesis
Learning A Continuous-Valued Target Function

• Learner L considers an instance space X and a hypothesis space H consisting of some class of real-
valued functions defined over X, i.e., (∀ h ∈ H)[ h : X → R] and training examples of the form
<xi,di>
• The problem faced by L is to learn an unknown target function f : X → R
• A set of m training examples is provided, where the target value of each example is corrupted by
random noise drawn according to a Normal probability distribution with zero mean (di = f(xi) + ei)
• Each training example is a pair of the form (xi ,di ) where di = f (xi ) + ei .
– Here f(xi) is the noise-free value of the target function and ei is a random variable representing
the noise.
–It is assumed that the values of the ei are drawn independently and that they are distributed
according to a Normal distribution with zero mean.
• The task of the learner is to output a maximum likelihood hypothesis, or, equivalently, a MAP
hypothesis assuming all hypotheses are equally probable a priori.
Learning A Linear Function

• The target function f corresponds to the solid

line.
• The training examples (xi , di ) are assumed to
have Normally distributed noise ei with zero
mean added to the true target value f (xi ).
• The dashed line corresponds to the hypothesis
hML with least-squared training error, hence the
maximum likelihood hypothesis.
• Notice that the maximum likelihood hypothesis is
not necessarily identical to the correct
hypothesis, f, because it is inferred from only a
limited sample of noisy training data
Normal Probability Distribution (Gaussian Distribution)

A Normal distribution is a smooth, bell-shaped distribution that can be completely

characterized by its mean μ and its standard deviation σ
Using the previous
definition of h we have ML

Assuming training examples are mutually independent given h, we can write P(D|h) as the product of
the various (di|h)

Given the noise ei obeys a Normal distribution with zero mean and unknown variance σ2 , each di
must also obey a Normal distribution around the true targetvalue f(xi). Because we are writing the
expression for P(D|h), we assume h is the correct description of f. Hence, µ = f(xi) = h(xi)
MAXIMUM LIKELIHOOD HYPOTHESES FOR
PREDICTING PROBABILITIES
Consider the setting in which we wish to learn a nondeterministic (probabilistic)
function f : X → {0, 1}, which has two discrete output values.

We want a function approximator whose output is the probability that f(x) = 1

In other words , learn the target function
f ’ : X → [0, 1] such that f ’ (x) = P(f(x) = 1)

How can we learn f' using a neural network?

Use of brute force way would be to first collect the observed frequencies of 1's and
0's for each possible value of x and to then train the neural network to output the
target frequency for each x.

Deepak D, Asst. Prof., Dept. of CSE, Canara Engg. College 41

What criterion should we optimize in order to find a maximum likelihood
hypothesis for f' in this setting?

• First obtain an expression for P(D|h)

• Assume the training data D is of the form D = {(x , d ) . . . (x
1 1 m, dm)}, where di is the
observed 0 or 1 value for f (xi).
• Both x and d as random variables, and assuming that each training example is
i i
drawn independently, we can write P(D|h) as

Applying the product rule

The probability P(di|h, xi)

Re-express it in a more mathematically manipulable form, as

Equation (4) to substitute for P(di |h, xi) in Equation (5) to obtain
We write an expression for the maximum likelihood hypothesis

The last term is a constant independent of h, so it can be dropped

It easier to work with the log of the likelihood, yielding

Equation (7) describes the quantity that must be maximized in order to obtain the maximum
likelihood hypothesis in our current problem setting
Gradient Search to Maximize Likelihood in a Neural Net
Derive a weight-training rule for neural network learning that seeks to maximize G(h, D) using
gradient ascent
• The gradient of G(h, D) is given by the vector of partial derivatives of G(h, D) with respect to the
various network weights that define the hypothesis h represented by the learned network
• In this case, the partial derivative of G(h, D) with respect to weight wjk from input k to unit j is
Suppose our neural network is constructed from a single layer of sigmoid units. Then,

where xijk is the kth input to unit j for the ith training example, and d(x) is the derivative of the sigmoid
squashing function.
Finally, substituting this expression into Equation (1), we obtain a simple expression for the
derivatives that constitute the gradient
Because we seek to maximize rather than minimize P(D|h),
we perform gradient ascent rather than gradient descent
search. On each iteration of the search the weight vector is
adjusted in the direction of the gradient, using the weight
update rule

where η is a small positive constant that determines the step size of the i gradient ascent search
It is interesting to compare this weight-update rule to the weight-update rule used by the
BACKPROPAGATION algorithm to minimize the sum of squared errors between predicted and
observed network outputs.
The BACKPROPAGATION update rule for output unit weights, re-expressed using our current
notation, is
MINIMUM DESCRIPTION LENGTH PRINCIPLE
• A Bayesian perspective on Occam’s razor
• Motivated by interpreting the definition of hMAP in the light of basic concepts from information
theory.

which can be equivalently expressed in terms of maximizing the log2

or alternatively, minimizing the negative of this quantity

• This equation can be interpreted as a statement that short hypotheses are preferred, assuming a
particular representation scheme for encoding hypotheses and data
• Consider the problem of designing a code to transmit messages drawn at random
• i is the message
• The probability of encountering message i is pi
• Interested in the most compact code; that is, interested in the code that minimizes
the
expected number of bits we must transmit in order to encode a message drawn at
random
• To minimize the expected code length we should assign shorter codes to messages
that are more probable
• Shannon and Weaver (1949) showed that the optimal code (i.e., the code that
minimizes the expected message length) assigns - log, pi bitst to encode message
i.
• The number of bits required to encode message i using code C as the description
length
of message i with respect to C, which we denote by Lc(i).
Interpreting the equation

• -log2P(h): the description length of h under the optimal encoding for the hypothesis space H
LCH (h) = −log2P(h), where CH is the optimal code for hypothesis space H.
• -log2P(D | h): the description length of the training data D given hypothesis h, under the
optimal encoding fro the hypothesis space H: LCH (D|h) = −log2P(D| h) , where C D|h is the
optimal code for describing data D assuming that both the sender and receiver know the
hypothesis h.

Rewrite Equation (1) to show that hMAP is the hypothesis h that minimizes the sum given by the
description length of the hypothesis plus the description length of the data given the hypothesis.

where CH and CD|h are the optimal encodings for H and for D given h
The Minimum Description Length (MDL) principle recommends choosing the hypothesis that
minimizes the sum of these two description lengths of equ.

Minimum Description Length principle:

Where, codes C1 and C2 to represent the hypothesis and the data given the hypothesis

The above analysis shows that if we choose C1 to be the optimal encoding of hypotheses CH, and if
we choose C2 to be the optimal encoding CD|h, then hMDL = hMAP
Application to Decision Tree Learning
Apply the MDL principle to the problem of learning decision trees from some training data.
What should we choose for the representations C1 and C2 of hypotheses and data?
• For C1: C1 might be some obvious encoding, in which the description length grows with the
number of nodes and with the number of edges
• For C2: Suppose that the sequence of instances (x1 . . .xm) is already known to both the transmitter
and receiver, so that we need only transmit the classifications (f (x1) . . . f (xm)).
Now if the training classifications (f (x1) . . .f(xm)) are identical to the predictions of the
hypothesis, then there is no need to transmit any information about these examples. The
description length of the classifications given the hypothesis ZERO
If examples are misclassified by h, then for each misclassification we need to transmit a message
that identifies which example is misclassified as well as its correct classification
The hypothesis hMDL under the encoding C1 and C2 is just the one that minimizes the sum of these
description lengths.
• MDL principle provides a way for trading off hypothesis complexity for the number of errors
committed by the hypothesis
• MDL provides a way to deal with the issue of overfitting the data.
• Short imperfect hypothesis may be selected over a long perfect hypothesis.
BAYES OPTIMAL CLASSIFIER

• The Bayes optimal classifier combines the predictions of all alternative

hypotheses, weighted by their posterior probabilities, to calculate the most
probable classification of each new instance
BAYES OPTIMAL CLASSIFIER
GIBBS ALGORITHM
THE EM ALGORITHM

• The EM algorithm (Dempster et al. 1977), a widely used

approach to learning in the presence of unobserved
variables.
• The EM algorithm can be used even for variables whose
value is never directly observed, provided the general
form of the probability distribution governing these
variables is known.
• Estimating Means of k Gaussians
• Consider a problem in which the data D is a set of
instances generated by a probability distribution that is
a mixture of k distinct Normal distributions.
The above fig is for the case where k = 2 and where the instances are the
points shown along the x axis.
Each instance is generated using a two-step process.
First, one of the k Normal distributions is selected at random.
Second, a single random instance xi is generated according to this
selected distribution.
This process is repeated to generate a set of data points as shown in the
figure.
• To simplify our discussion, we consider the special case where the selection of the
single Normal distribution at each step is based on choosing each with uniform
probability, where each of the k Normal distributions has the same variance a2, and
where a2 is known. The learning task is to output a hypothesis h = (FI, . . . pk) that
describes the means of each of the k distributions.

Bayesian Learning and Hypothesis Evaluation
No ratings yet
Bayesian Learning and Hypothesis Evaluation
36 pages
Bayesian Learning Methods Explained
No ratings yet
Bayesian Learning Methods Explained
26 pages
Unit 4 ML Shashi
No ratings yet
Unit 4 ML Shashi
40 pages
Understanding Bayesian Learning Methods
No ratings yet
Understanding Bayesian Learning Methods
130 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
65 pages
Understanding Bayesian Learning Methods
No ratings yet
Understanding Bayesian Learning Methods
24 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
24 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
24 pages
Bayesian Learning Methods Overview
No ratings yet
Bayesian Learning Methods Overview
44 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
81 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
30 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
70 pages
Bayesian Concept Learning Overview
No ratings yet
Bayesian Concept Learning Overview
60 pages
Introduction to Bayesian Learning Theory
No ratings yet
Introduction to Bayesian Learning Theory
178 pages
Bayesian Learning in AI and ML
No ratings yet
Bayesian Learning in AI and ML
30 pages
Understanding Bayes' Theorem in Learning
No ratings yet
Understanding Bayes' Theorem in Learning
22 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
27 pages
Understanding Bayesian Learning Methods
No ratings yet
Understanding Bayesian Learning Methods
50 pages
Ai&ml Unit 3 (A)
No ratings yet
Ai&ml Unit 3 (A)
18 pages
Bayesian Methods in Machine Learning
No ratings yet
Bayesian Methods in Machine Learning
36 pages
Ai&Ml - Module 4 Final
No ratings yet
Ai&Ml - Module 4 Final
24 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
12 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
19 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
41 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
24 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
54 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
25 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
73 pages
Understanding Bayesian Learning Methods
No ratings yet
Understanding Bayesian Learning Methods
42 pages
Understanding Bayesian Learning Methods
No ratings yet
Understanding Bayesian Learning Methods
10 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
24 pages
Bayesian Learning and Hypothesis Testing
No ratings yet
Bayesian Learning and Hypothesis Testing
36 pages
ML Module4 Part1
No ratings yet
ML Module4 Part1
90 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
44 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
24 pages
Understanding Bayesian Learning Methods
No ratings yet
Understanding Bayesian Learning Methods
14 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
25 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
60 pages
@vtudeveloper - in ML Mod 4
No ratings yet
@vtudeveloper - in ML Mod 4
11 pages
Unit 3 17156657902872
No ratings yet
Unit 3 17156657902872
16 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
25 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
14 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
19 pages
Chapter 8 ML
No ratings yet
Chapter 8 ML
14 pages
Bayesian Learning and Inference Techniques
No ratings yet
Bayesian Learning and Inference Techniques
26 pages
MLnotes Module 4
No ratings yet
MLnotes Module 4
30 pages
Understanding Naive Bayes Classification
No ratings yet
Understanding Naive Bayes Classification
12 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
34 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
25 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
27 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
51 pages
Understanding Bayes' Theorem in Learning
No ratings yet
Understanding Bayes' Theorem in Learning
31 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
18 pages
AI Neural Networks and Bayesian Learning
No ratings yet
AI Neural Networks and Bayesian Learning
46 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
45 pages
Bayesian Statistics and Inference Techniques
No ratings yet
Bayesian Statistics and Inference Techniques
57 pages
Bayesian Learning in Machine Learning
No ratings yet
Bayesian Learning in Machine Learning
31 pages
Bayes Theorem in Machine Learning Concepts
No ratings yet
Bayes Theorem in Machine Learning Concepts
49 pages
Understanding Bayesian Learning Concepts
No ratings yet
Understanding Bayesian Learning Concepts
14 pages
Life of Saint Francis of Assisi
No ratings yet
Life of Saint Francis of Assisi
4 pages
Yukiko Shirai: ESL & Japanese Instructor Resume
100% (1)
Yukiko Shirai: ESL & Japanese Instructor Resume
2 pages
Anh Văn 2
0% (1)
Anh Văn 2
125 pages
Discovery of Oyu Tolgoi Case Study of Mineral and Geological Exploration Sergei Diakov Ebook & Testbank
No ratings yet
Discovery of Oyu Tolgoi Case Study of Mineral and Geological Exploration Sergei Diakov Ebook & Testbank
289 pages
S1 English Exam Coverage 2024-2025
No ratings yet
S1 English Exam Coverage 2024-2025
5 pages
DML Operations in SQL Laboratory Report
No ratings yet
DML Operations in SQL Laboratory Report
5 pages
Thematic Index of the Popol Vuh
No ratings yet
Thematic Index of the Popol Vuh
4 pages
Class 5 Maths Pattern Worksheet
No ratings yet
Class 5 Maths Pattern Worksheet
3 pages
Grade 10 English Lesson Plan
No ratings yet
Grade 10 English Lesson Plan
2 pages
SEO Best Practices Guide
No ratings yet
SEO Best Practices Guide
8 pages
Alibaba ACA Cloud Exam Prep Guide
No ratings yet
Alibaba ACA Cloud Exam Prep Guide
12 pages
ESL Workbook: Learning Vegetables
No ratings yet
ESL Workbook: Learning Vegetables
25 pages
Unit 11 He Was A Journalist
No ratings yet
Unit 11 He Was A Journalist
7 pages
Audio Signal Processing Simulation
No ratings yet
Audio Signal Processing Simulation
13 pages
The Role of Poetry in Society
No ratings yet
The Role of Poetry in Society
10 pages
Types of Binary Trees Explained
No ratings yet
Types of Binary Trees Explained
27 pages
Baudelaire's 19th-Century Poetic Alchemy
No ratings yet
Baudelaire's 19th-Century Poetic Alchemy
4 pages
Web-ADI Excel Setup and Error Solutions
100% (1)
Web-ADI Excel Setup and Error Solutions
18 pages
AI&ML Internal Assessment Results 2025
No ratings yet
AI&ML Internal Assessment Results 2025
5 pages
Integers and Their Operations: Grade 7 Module
No ratings yet
Integers and Their Operations: Grade 7 Module
10 pages
Weekend Activities: Affirmative & Negative
No ratings yet
Weekend Activities: Affirmative & Negative
2 pages
Seventy-Two Magical Alphabets
No ratings yet
Seventy-Two Magical Alphabets
7 pages
Educator Performance Assessment Guide
No ratings yet
Educator Performance Assessment Guide
22 pages
Explication of "The Lady of Shalott"
No ratings yet
Explication of "The Lady of Shalott"
6 pages
PPF Account Statement 2017-2025
No ratings yet
PPF Account Statement 2017-2025
3 pages
Uct Ebe CPD Introduction To Spatial Design Updated Nov
No ratings yet
Uct Ebe CPD Introduction To Spatial Design Updated Nov
6 pages
Understanding Song Structure and Purpose
No ratings yet
Understanding Song Structure and Purpose
5 pages
Grade 2 Lesson Plan: Demonstrative Pronouns
No ratings yet
Grade 2 Lesson Plan: Demonstrative Pronouns
5 pages
A Late Ptolemaics Statue of Hathor From Her Temple at Dendereh
No ratings yet
A Late Ptolemaics Statue of Hathor From Her Temple at Dendereh
14 pages
Min Stack Implementation in Java
No ratings yet
Min Stack Implementation in Java
3 pages

Bayesian Learning in Machine Learning

Uploaded by

Bayesian Learning in Machine Learning

Uploaded by

ARTIFICIAL INTELLIGENCE

AND MACHINE LEARNING

Bayesian Learning: Introduction, Bayes theorem,

• One practical difficulty in applying Bayesian methods is that they

P(h|D) increases with P(h) and with P(D|h) according to Bayes

• P(D) can be dropped, because it is a constant independent of h

In some cases, it is assumed that every hypothesis in H is equally probable a priori

Probability that Team 1 hosts the match & wins: (75%)-P(X1|Y1)→0.75

P(Y0|X1) = 1- 0.1162 = 0.8838

Team 0 is more probable of Winning than team 1

Team 0 is more probable of Winning than team 1

What is the relationship between Bayes theorem and

Since Bayes theorem provides a principled way to calculate

We can design a straightforward concept learning algorithm to output the maximum

In a first step, we have to determine the probabilities for P(h|D)

where |VSH,D| is the number of hypotheses from H consistent with D

• Learning algorithm is a consistent learner provided it outputs a hypothesis

Consider the problem of learning a continuous-valued target function such as neural

• The target function f corresponds to the solid

A Normal distribution is a smooth, bell-shaped distribution that can be completely

We want a function approximator whose output is the probability that f(x) = 1

How can we learn f' using a neural network?

Deepak D, Asst. Prof., Dept. of CSE, Canara Engg. College 41

• First obtain an expression for P(D|h)

Applying the product rule

Re-express it in a more mathematically manipulable form, as

The last term is a constant independent of h, so it can be dropped

It easier to work with the log of the likelihood, yielding

which can be equivalently expressed in terms of maximizing the log2

or alternatively, minimizing the negative of this quantity

Minimum Description Length principle:

• The Bayes optimal classifier combines the predictions of all alternative

• The EM algorithm (Dempster et al. 1977), a widely used

You might also like