Importance of Bayesian Methods in ML

Uploaded by

Manali Patel

6.

Bayesian Concept Learning

Subject: Machine Learning (3170724)
Faculty: Dr. Ami Tusharkant Choksi
Associate professor, Computer Engineering Department,
Navyug Vidyabhavan Trust
[Link] College of Engineering and Technology,
Surat, Gujarat State, India.
Website: [Link]

Dr. Ami T. Choksi @CKPCET Machine Learning (3170724)

Contents
- Importance of Bayesian methods,
- Bayesian theorem,
- Bayes’ theorem and concept learning,
- Bayesian Belief Network

05 hours

CO-3 Evaluate the various Supervised Learning algorithms using appropriate Dataset.


Introduction
● Bayes’ theorem was introduced by the 18th-century mathematician Thomas Bayes.
● He developed the foundational mathematical principles, known as Bayesian
methods, which describe the probability of events, and more importantly, how
probabilities should be revised when there is additional information available.


WHY BAYESIAN METHODS ARE IMPORTANT?
● Bayesian learning algorithms, like the naive Bayes classifier, are highly practical
approaches to certain types of learning problems as they can calculate explicit
probabilities for hypotheses.
● In many cases, they are competitive with, or even outperform, other
learning algorithms, including decision tree and neural network algorithms.
● Bayesian classifiers use a simple idea that the training data are utilized to
calculate an observed probability of each class based on feature values.
● When the same classifier is used later for unclassified data, it uses the observed
probabilities to predict the most likely class for the new features.
● Applying the observations from the training data can also be thought of
as applying our prior knowledge or prior belief about the probability of an outcome,
so that the prediction is more likely to match the actual, real-life outcome.
● This simple concept is expressed by Bayes’ rule and is used to train a machine
learning model.
● Some of the real-life uses of Bayesian classifiers are as follows:
○ Text-based classification such as spam or junk mail filtering, author identification, or
topic categorization
○ Medical diagnosis, such as estimating the probability that a new patient has a disease
given a set of observed symptoms
○ Network security, such as detecting illegal intrusions or anomalies in computer networks
● A strength of Bayesian classifiers is that they utilize all available parameters,
whereas other methods may ignore the features that have weak effects.
● Bayesian classifiers assume that even if a few individual parameters each have a
small effect on the outcome, the collective effect of those parameters could be quite
large. For such learning tasks, the naive Bayes classifier is most effective.
Features of Bayesian learning methods
● Prior knowledge of the candidate hypothesis is combined with the observed data
for arriving at the final probability of a hypothesis.
● So, the two important components are:
○ the prior probability of each candidate hypothesis
○ the probability distribution over the observed data set for each possible hypothesis.
● The Bayesian approach to learning is more flexible than the other approaches
because each observed training pattern can increase or decrease the estimated
probability of a hypothesis, whereas most other algorithms tend to eliminate a
hypothesis as soon as it is inconsistent with a single training pattern.

● Bayesian methods can perform better than the other methods while validating
the hypotheses that make probabilistic predictions.
○ For example, when starting a new software project, on the basis of the demographics of the
project, we can predict the probability of encountering challenges during execution of the project.
● With Bayesian methods, it is possible to classify new
instances by combining the predictions of multiple hypotheses, weighted by
their respective probabilities.
● In some cases, when Bayesian methods cannot compute the outcome
deterministically, they can still provide a standard of optimal decision making
against which the performance of other methods can be measured.
● Bayesian methods depend on the probabilities of the hypotheses in the hypothesis
set. If these probabilities are not known in advance, they are estimated from
background knowledge.


BAYES’ THEOREM
● Concept learning: how a child starts to learn meaning of new words, e.g. ‘ball’.
● The child is provided with positive examples of ‘objects’ which are ‘ball’.
● At first, the child may be confused with many different colours, shapes and sizes
of the balls and may also get confused with some objects which look similar to
ball, like a balloon or a globe.
● The child’s parent continuously feeds her positive examples like ‘that is a ball’,
‘this is a green ball’, ‘bring me that small ball’, etc.
● Negative examples such as ‘this is a non-ball’ are seldom used for such concept
teaching, but when the child points to a balloon and says it is a ball, the parent
may clear up the confusion by saying ‘that is not a ball’.


Ball (figure: three example balls)
○ Ball 1: Shape: round; Color: red; Size: small; Material: Rubber
○ Ball 2: Shape: round; Color: red; Size: big; Material: Rubber like Plastic
○ Ball 3: Shape: round; Color: red, blue, white, yellow, orange; Size: small; Material: Plastic

● But it is observed that the learning is most influenced through positive examples
rather than through negative examples, and the expectation is that the child will
be able to identify the object ‘ball’ from a wide variety of objects and different
types of balls kept together once the concept of a ball is clear to her.
● We can extend this example to explain how we can expect machines to learn
through the feeding of positive examples, which forms the basis for concept
learning
● Let us relate concept learning to the Bayesian model.
● Learning the ‘meaning of a word’ is equivalent to learning a concept using binary
classification.
● Let us define a concept set C and a corresponding function f(k). We define
f(k) = 1 when k is within the set C and f(k) = 0 otherwise.
● Our aim is to learn the indicator function f that defines which elements are
within the set C. So, by using the function f, we will be able to classify the
element either inside or outside our concept set.
● In Bayes’ theorem, we will learn how to use standard probability calculus to
determine the uncertainty about the function f, and we can validate the
classification by feeding positive examples.
● Bayes’ theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

● where A and B are conditionally related events and P(A|B) denotes the
probability of event A occurring when event B has already occurred.

● Let us assume that we have a training data set D where we have noted some
observed data. Our task is to determine the best hypothesis in space H by using
the knowledge of D.
● We need to understand three quantities: (1) the prior probability, (2) the posterior
probability, and (3) the likelihood.
● The prior knowledge or belief about the probabilities of various hypotheses in H
is called Prior in context of Bayes’ theorem.
● For example, if we have to determine whether a particular type of tumour is
malignant for a patient, the prior knowledge of such tumours becoming
malignant can be used to validate our current hypothesis and is a prior
probability or simply called Prior.

● Let us introduce a few notations to explain the concepts. We will assume that
○ P(h) is the initial (prior) probability of the hypothesis ‘h’ that the patient has a malignant
tumour, before the malignancy test result (the so-called training data) and the correctness of
the test process are taken into account.
○ Similarly, P(T) is the prior probability that the training data will be observed or, in this case, the
probability of positive malignancy test results.
○ P(T|h) is the probability of observing data T in a space where ‘h’ holds true, which
means the probability of the test results showing a positive value when the tumour is actually
malignant.


Posterior probability
● The probability that a particular hypothesis holds for a data set, obtained by
updating the Prior with the observed data, is called the posterior probability or
simply Posterior.
● In the previous example, the probability of the hypothesis that the patient has a
malignant tumour, given the malignancy test result and the correctness of the test,
is a posterior probability. In our notation, we are interested in finding P(h|T),
the probability that the hypothesis holds true given the observed training data T.
● So, the prior probability P(h), which represents the probability of the hypothesis
independent of the training data (Prior), now gets refined with the introduction
of influence of the training data as P(h|T).
● Bayes’ theorem combines the prior, the likelihood and the posterior:

$$P(h \mid T) = \frac{P(T \mid h)\,P(h)}{P(T)}$$


Bayes’ theorem
● From the above equation, we can deduce that P(h|T) increases as P(h) and
P(T|h) increase, and also as P(T) decreases.
● The simple explanation is that the more probable it is that T occurs
independently of h, the less support T provides for h.
● A common question in machine learning problems is to find the most
probable hypothesis h from a set of hypotheses H (h∈H) given the observed
training data T.
● This maximally probable hypothesis is called the maximum a posteriori (MAP)
hypothesis. By using Bayes’ theorem, we can identify the MAP hypothesis from
the posterior probability of each candidate hypothesis:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid T) = \arg\max_{h \in H} \frac{P(T \mid h)\,P(h)}{P(T)}$$

● and as P(T) is a constant independent of h, in this case, we can write

$$h_{MAP} = \arg\max_{h \in H} P(T \mid h)\,P(h)$$

● In certain machine learning problems, we can further simplify the above equation. If
every hypothesis in H is equally probable a priori, i.e. P(hi) = P(hj) for all hi and hj
in H (as in flipping a fair coin, where P(head) = P(tail)), then we can determine the
most probable hypothesis from the probability P(T|h) only.
● P(T|h) is called the likelihood of data T given h, and any hypothesis that
maximizes P(T|h) is called the maximum likelihood (ML) hypothesis, hML:

$$h_{ML} = \arg\max_{h \in H} P(T \mid h)$$


[Figures: conceptual and mathematical representation of Bayes’ theorem and the
relationship of Prior, Posterior and Likelihood.]


Example
● Let us take the example of malignancy identification in a particular patient’s
tumour as an application for Bayes rule.
● We will calculate how the prior knowledge of the percentage of cancer cases in a
sample population and probability of the test result being correct influence the
probability outcome of the correct diagnosis.
● We have two alternative hypotheses: (1) a particular tumour is of malignant
type and (2) a particular tumour is of non-malignant type. The prior knowledge
available is: (1) only 0.5% of the population has this kind of tumour which is malignant,
and (2) the laboratory report is not perfectly accurate: it correctly detects
malignancy when it is present in only 98% of cases, and it correctly reports that
malignancy is not present in only 97% of cases.


Example: Solution
● This means the test misses a real malignant tumour in 2% of the cases, and raises a false
alarm (predicts malignancy when there is none) in 3% of the cases.
● Let us denote Malignant Tumour = MT,
● Positive Lab Test = PT,
● Negative Lab Test = NT
● h1 = the particular tumour is of malignant type = MT in our example
● h2 = the particular tumour is not malignant type = !MT in our example
● P(MT) = 0.005
● P(!MT) = 0.995
● P(PT|MT) = 0.98
● P(NT|MT) = 0.02
● P(NT|!MT) = 0.97
● P(PT|!MT) = 0.03
● So, for the new patient, if the laboratory test report shows positive result, let us see if
we should declare this as the malignancy case or not:

$$P(PT \mid h_1)\,P(h_1) = P(PT \mid MT)\,P(MT) = 0.98 \times 0.005 = 0.0049$$

$$P(PT \mid h_2)\,P(h_2) = P(PT \mid !MT)\,P(!MT) = 0.03 \times 0.995 = 0.02985$$

● Dividing each product by P(PT) = 0.0049 + 0.02985 = 0.03475 gives P(h1|PT) ≈ 0.141
and P(h2|PT) ≈ 0.859.
● As P(h2|PT) is higher than P(h1|PT), it is clear that the hypothesis h2 has more
probability of being true. So, hMAP = h2 = !MT.
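
The calculation can be checked with a short Python sketch. It assumes that the stated 98% and 97% figures are the test's sensitivity P(PT|MT) and specificity P(NT|!MT); the function name `posterior_given_positive` is illustrative only. It also contrasts the ML hypothesis (likelihood only) with the MAP hypothesis (likelihood times prior).

```python
def posterior_given_positive(prior: float, sensitivity: float, specificity: float) -> dict:
    """Posterior over {MT, !MT} after a positive lab test (PT)."""
    p_pt_given_mt = sensitivity              # P(PT|MT)
    p_pt_given_not_mt = 1.0 - specificity    # P(PT|!MT), the false-alarm rate
    joint_mt = p_pt_given_mt * prior                   # P(PT|MT) P(MT)
    joint_not_mt = p_pt_given_not_mt * (1.0 - prior)   # P(PT|!MT) P(!MT)
    evidence = joint_mt + joint_not_mt                 # P(PT) by total probability
    return {"MT": joint_mt / evidence, "!MT": joint_not_mt / evidence}


post = posterior_given_positive(prior=0.005, sensitivity=0.98, specificity=0.97)
h_map = max(post, key=post.get)

# The ML hypothesis ignores the prior and compares P(PT|h) only.
likelihood = {"MT": 0.98, "!MT": 1.0 - 0.97}
h_ml = max(likelihood, key=likelihood.get)

print(post, "MAP:", h_map, "ML:", h_ml)
```

Under these assumptions the ML hypothesis is MT while the MAP hypothesis is !MT, which shows how strongly the small prior P(MT) influences the decision.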

● This indicates that even though the posterior probability of malignancy is significantly
higher than its prior probability of 0.005, the probability of this patient not having
malignancy is still higher on the basis of the prior knowledge.
● Also, it should be noted that through Bayes’ theorem, we identified the
probability of one hypothesis being higher than the other hypothesis, and we did
not completely accept or reject the hypothesis by this theorem.
● Furthermore, there is very high dependency on the availability of the prior data
for successful application of Bayes’ theorem.


BAYES’ THEOREM AND CONCEPT LEARNING
● One simplistic view of concept learning is that if we feed the machine with
the training data, it can calculate the posterior probability of each
hypothesis and output the most probable one. This is called the brute-force
Bayesian learning algorithm.
● It is also observed that the consistency of this algorithm in providing the most
probable hypothesis is comparable to that of other algorithms.


Brute-force Bayesian algorithm
● The MAP (maximum a posteriori) hypothesis can be used to design a simple
learning algorithm called the brute-force MAP learning algorithm.
● Let us assume that the learner considers a finite hypothesis space H in which the
learner will try to learn some target concept c: X → {0,1}, where X is the instance space
corresponding to H. The sequence of training examples is {(x1, t1), (x2, t2), …, (xm,
tm)}, where xi is an instance of X and ti is the target value of xi, defined as ti =
c(xi).
● Without affecting the algorithm, we can assume that the sequence
of instances {x1, …, xm} is held fixed, so that the training data reduce to the sequence
of target values T = {t1, …, tm}.
● To find the hypothesis with the highest posterior probability, we use Bayes’ theorem as
discussed earlier. First, calculate the posterior probability of each hypothesis h in H:

$$P(h \mid T) = \frac{P(T \mid h)\,P(h)}{P(T)}$$

● Then, output the hypothesis hMAP with the highest posterior probability:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid T)$$


[Figure: illustration of the question ‘Buy computer or not?’ (buys_computer), using the
attributes student and credit_rating.]

● Let us try to connect the concept learning problem with the problem of
identifying the hMAP.
● The probability distributions P(h) and P(T|h) must be chosen to reflect our
prior knowledge of the learning task. A few important assumptions are
made as follows:
● 1. The training data or target sequence T is noise free, which means that it is a
direct function of X only (i.e. ti = c(xi))
● 2. The concept c lies within the hypothesis space H
● 3. Each hypothesis is equally probable and independent of each other.
● On the basis of assumption 3, we can say that each hypothesis h within the space
H has equal prior probability, and also because of assumption 2, we can say that
these prior probabilities sum up to 1. So, we can write

$$P(h) = \frac{1}{|H|} \quad \text{for all } h \in H$$
● P(T|h) is the probability of observing the target values ti in the fixed set of
instances {x1, …, xm} in the space where h holds true and describes the concept
c correctly.
● Using assumption 1, we can say that if T is consistent with h, then the probability
of data T given the hypothesis h is 1 and is 0 otherwise:

$$P(T \mid h) = \begin{cases} 1 & \text{if } t_i = h(x_i) \text{ for all } t_i \in T \\ 0 & \text{otherwise} \end{cases}$$

● Using Bayes’ theorem to identify the posterior probability:

$$P(h \mid T) = \frac{P(T \mid h)\,P(h)}{P(T)}$$

● For the cases when h is inconsistent with the training data T, we get

$$P(h \mid T) = \frac{0 \times P(h)}{P(T)} = 0$$

● when h is consistent with T,

$$P(h \mid T) = \frac{1 \times \frac{1}{|H|}}{P(T)} = \frac{1}{|H|\,P(T)}$$

● Now, if we define the subset of the hypothesis space H which is consistent
with T as HD, then by using the total probability equation, we get

$$P(T) = \sum_{h_i \in H_D} 1 \times \frac{1}{|H|} + \sum_{h_i \notin H_D} 0 \times \frac{1}{|H|} = \frac{|H_D|}{|H|}$$

● So, with our set of assumptions about P(h) and P(T|h), we get the posterior probability
P(h|T) as

$$P(h \mid T) = \begin{cases} \frac{1}{|H_D|} & \text{if } h \text{ is consistent with } T \\ 0 & \text{otherwise} \end{cases}$$

● where |HD| is the number of hypotheses from the space H which are consistent with target data
set T.
● The interpretation of this evaluation is that initially, each hypothesis has equal probability
and, as we introduce the training data, the posterior probability of inconsistent hypotheses
becomes zero and the total probability that sums up to 1 is distributed equally among
the consistent hypotheses in the set. So, under this condition, each consistent hypothesis is a
MAP hypothesis with posterior probability 1/|HD|.
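
A minimal Python sketch of brute-force MAP learning under the three assumptions above (noise-free data, target concept in H, uniform prior). The hypothesis space of threshold concepts and the training examples are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def brute_force_map(hypotheses, xs, ts):
    """Return the posterior P(h|T) of every named hypothesis in a finite space H."""
    prior = Fraction(1, len(hypotheses))  # uniform prior P(h) = 1/|H|
    # Noise-free data: P(T|h) is 1 if h reproduces every target value, else 0.
    likelihood = {
        name: int(all(h(x) == t for x, t in zip(xs, ts)))
        for name, h in hypotheses.items()
    }
    evidence = sum(lik * prior for lik in likelihood.values())  # P(T) = |HD|/|H|
    if evidence == 0:
        raise ValueError("no hypothesis is consistent with the training data")
    return {name: lik * prior / evidence for name, lik in likelihood.items()}


# Hypothetical hypothesis space: threshold concepts "x >= k" for k = 0..5.
H = {f"x>={k}": (lambda x, k=k: int(x >= k)) for k in range(6)}
xs = [1, 4, 5]   # fixed instances
ts = [0, 1, 1]   # target values t_i = c(x_i)

posterior = brute_force_map(H, xs, ts)
consistent = [name for name, p in posterior.items() if p > 0]
print(consistent, posterior[consistent[0]])
```

Every consistent hypothesis receives the same posterior 1/|HD| and every inconsistent hypothesis receives 0, matching the result derived above. The cost grows with |H|, which is why this approach is practical only for small hypothesis spaces.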


Concept of consistent learners
● From the discussion, we understand the behaviour of a general class of learners
that we call consistent learners.
● Learners that output a hypothesis which commits zero errors over the training
data are called consistent learners.
● If the training data is noise free and deterministic (i.e. P(T|h) = 1 if T and h are
consistent and 0 otherwise) and
if there is a uniform prior probability distribution over H (so, P(hm) = P(hn) for all
m, n), then every consistent learner outputs a MAP hypothesis.
● An important application of this conclusion is that Bayes’ theorem can
characterize the behaviour of learning algorithms even when the algorithm does
not explicitly manipulate the probability.
● As it can help to identify the optimal distributions of P(h) and P(T|h) under which
the algorithm outputs the MAP hypothesis, the knowledge can be used to
characterize the assumptions under which the algorithms behave optimally.
● The theorem can be used with the same effectiveness for noisy training data and
additional assumptions about the probability distribution governing the noise.
Bayes optimal classifier
● The Bayes optimal classifier is a probabilistic model that makes the most probable
prediction for a new data instance, using the training data and the space of
hypotheses.
● Why the Bayes optimal classifier: it can be shown that, of all classifiers, the Bayes
optimal classifier has the lowest probability of misclassifying
an observation, i.e. the lowest probability of error. So if we know the posterior
distribution, then using the Bayes optimal classifier is as good as it gets.

● To illustrate the concept, let us assume three hypotheses h1 , h2 , and h3 in the
hypothesis space H. Let the posterior probability of these hypotheses be 0.4, 0.3,
and 0.3, respectively.
● There is a new instance x, which is classified as true by h1, but false by h2 and h3.
● Then the most probable classification of the new instance (x) can be obtained by
combining the predictions of all hypotheses weighted by their corresponding
posterior probabilities.
● Denoting a possible classification of the new instance as ci from the set C, the
probability P(ci|T) that the correct classification for the new instance is ci is

$$P(c_i \mid T) = \sum_{h_j \in H} P(c_i \mid h_j)\,P(h_j \mid T)$$

● The optimal classification is the one for which P(ci|T) is maximum:

$$\arg\max_{c_i \in C} \sum_{h_j \in H} P(c_i \mid h_j)\,P(h_j \mid T)$$
● So, extending the example, the set of possible outcomes for the new instance x is
within the set C = {True, False} and
● P(h1 | T) = 0.4, P(False | h1 ) = 0, P(True | h1 ) = 1
● P(h2 | T) = 0.3, P(False | h2 ) = 1, P(True | h2) = 0
● P(h3 | T) = 0.3, P(False | h3 ) = 1, P(True | h3 ) = 0
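● Then,

$$\sum_{h_j \in H} P(\text{True} \mid h_j)\,P(h_j \mid T) = 1 \times 0.4 + 0 \times 0.3 + 0 \times 0.3 = 0.4$$

$$\sum_{h_j \in H} P(\text{False} \mid h_j)\,P(h_j \mid T) = 0 \times 0.4 + 1 \times 0.3 + 1 \times 0.3 = 0.6$$

● So the optimal classification of x is False, even though the MAP hypothesis h1 classifies it as True.
● This method maximizes the probability that the new instance is classified
correctly when the available training data, hypothesis space and the prior
probabilities of the hypotheses are known. This is thus also called Bayes optimal
classifier.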
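
The three-hypothesis example can be reproduced with a short Python sketch. The function name `bayes_optimal` and the dictionary layout are illustrative choices; the numbers are the posteriors and predictions listed above.

```python
def bayes_optimal(posteriors, predictions, classes):
    """Weight each hypothesis' prediction P(c|h) by its posterior P(h|T)."""
    scores = {
        c: sum(predictions[h][c] * p for h, p in posteriors.items())
        for c in classes
    }
    return max(scores, key=scores.get), scores


posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}       # P(h|T)
predictions = {                                      # P(c|h)
    "h1": {"True": 1.0, "False": 0.0},
    "h2": {"True": 0.0, "False": 1.0},
    "h3": {"True": 0.0, "False": 1.0},
}

label, scores = bayes_optimal(posteriors, predictions, ["True", "False"])
h_map = max(posteriors, key=posteriors.get)
print(label, scores, "MAP hypothesis:", h_map)
```

The combined vote picks False, while the single most probable hypothesis h1 alone would have predicted True.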
Next Lecture


Naïve Bayes classifier


Bayes’ Theorem: Basics
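■ Total probability Theorem (for mutually exclusive and exhaustive events A1, …, AM):

$$P(B) = \sum_{i=1}^{M} P(B \mid A_i)\,P(A_i)$$

■ Bayes’ Theorem:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
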
■ Let X be a data sample (“evidence”): class label is unknown

■ Let H be a hypothesis that X belongs to class C
■ Classification is to determine P(H|X), (i.e., posteriori probability): the probability
that the hypothesis holds given the observed data sample X
■ P(H) (prior probability): the initial probability
■ E.g., X will buy computer, regardless of age, income, …

■ P(X): probability that sample data is observed

■ P(X|H) (likelihood): the probability of observing the sample X, given that the
hypothesis holds
■ E.g., Given that X will buy computer, the prob. that X is 31..40, medium
income
Naïve Bayes classifier

■ 1. A prior probability of hypothesis h or P(h): This is the
probability of an event or hypothesis before the
evidence is observed.
■ 2. A posterior probability of h or P(h|D): This is the
probability of an event after the evidence is observed
within the population D.

Naïve Bayes Classifier
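■ A simplifying assumption: attributes are conditionally
independent given the class (i.e., no dependence relation between attributes):

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$

■ This greatly reduces the computation cost: only the class
distribution needs to be counted
■ If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk
for Ak divided by |Ci, D| (# of tuples of Ci in D)
■ If Ak is continuous-valued, P(xk|Ci) is usually computed based on a
Gaussian distribution with a mean μ and standard deviation σ

$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

and P(xk|Ci) is

$$P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$$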
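
For a continuous-valued attribute, the Gaussian likelihood can be sketched in Python as below. The function name `gaussian_pdf` and the mean and standard deviation in the usage line are hypothetical values, not taken from the slides; in practice they would be estimated from the tuples of class Ci.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and std. deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical class statistics for an 'age' attribute within one class Ci.
likelihood = gaussian_pdf(35.0, mu=38.0, sigma=12.0)
print(likelihood)
```

The returned value is a density rather than a probability, but it can be used in place of P(xk|Ci) when comparing classes.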

Naïve Bayes Classifier: Training Dataset

Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data to be classified: X = (age <= 30, income = medium, student = yes,
credit_rating = fair)
Naïve Bayes Classifier: An Example
■ P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
■ Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
■ X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
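
The arithmetic above can be reproduced with a few lines of Python. This sketch starts from the class priors and conditional probabilities stated in the example rather than from the raw training table; the dictionary keys and the function name `naive_bayes_scores` are illustrative.

```python
from math import prod

# Class priors P(Ci) and conditional probabilities P(xk|Ci) from the example.
priors = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit_rating=fair": 6 / 9},
    "no": {"age<=30": 3 / 5, "income=medium": 2 / 5,
           "student=yes": 1 / 5, "credit_rating=fair": 2 / 5},
}
x = ["age<=30", "income=medium", "student=yes", "credit_rating=fair"]


def naive_bayes_scores(x, priors, cond):
    """Return P(X|Ci) * P(Ci) for each class under the independence assumption."""
    return {c: priors[c] * prod(cond[c][a] for a in x) for c in priors}


scores = naive_bayes_scores(x, priors, cond)
predicted = max(scores, key=scores.get)
print({c: round(s, 3) for c, s in scores.items()}, predicted)
```

The scores are proportional to the posteriors because the common denominator P(X) is dropped, which does not change which class is largest.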
One more numerical example

Classify the fruit X = {Yellow, Sweet, Long}

Fruit    Yellow  Sweet  Long  Total
Mango       350    450     0    650
Banana      400    300   350    400
Others       50    100    50    150
Total       800    850   400   1200


P(A|B) = P(B|A).P(A)/P(B)
(i) Mango
P(X|Mango) = P(Y|M)*P(S|M)*P(L|M)
P(Y|M) = P(M|Y).P(Y)/P(M) = 350/650 = 0.54
P(S|M) = 450/650 = 0.69
P(L|M) = 0/650 = 0
P(X|Mango) = 0.54 * 0.69 * 0 = 0

(ii) Banana
P(X|Banana) = P(Y|B)*P(S|B)*P(L|B)
P(Y|B) = 400/400 = 1
P(S|B) = 300/400 = 0.75
P(L|B) = 350/400 = 0.875
P(X|B) = 1 * 0.75 * 0.875 = 0.66

(iii) Others
P(Y|O) = 50/150 = 0.33
P(S|O) = 100/150 = 0.67
P(L|O) = 50/150 = 0.33
P(X|O) = (50/150) * (100/150) * (50/150) = 0.074

P(X|Mango) = 0, P(X|B) = 0.66, P(X|O) = 0.074

Maximum probability is for P(X|B). So, Fruit is Banana
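
The same calculation can be done directly from the counts in the fruit table. This Python sketch computes the likelihood P(X|fruit) as on the slides and, as an additional step not shown on the slides, multiplies it by the class prior P(fruit) = Total/1200; the variable names are illustrative.

```python
# Counts from the fruit table: how many fruits of each kind have each property.
counts = {
    "Mango": {"Yellow": 350, "Sweet": 450, "Long": 0, "Total": 650},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 400},
    "Others": {"Yellow": 50, "Sweet": 100, "Long": 50, "Total": 150},
}
features = ["Yellow", "Sweet", "Long"]
n = sum(row["Total"] for row in counts.values())  # total number of fruits

likelihood = {}  # P(X|fruit) under the naive independence assumption
score = {}       # P(X|fruit) * P(fruit), proportional to the posterior
for fruit, row in counts.items():
    lik = 1.0
    for f in features:
        lik *= row[f] / row["Total"]
    likelihood[fruit] = lik
    score[fruit] = lik * row["Total"] / n

best = max(score, key=score.get)
print(likelihood, score, best)
```

Both the likelihood alone and the prior-weighted score select Banana here. Mango is ruled out entirely because a single zero count (no long mangoes) makes the whole product zero.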


Applications of Naïve Bayes classifier


References
1. Why bayes optimal classifier, [Link]
optimal-classifier/

All The Math You Missed - But Need To Know For Graduate School
100% (36)
All The Math You Missed - But Need To Know For Graduate School
417 pages
Python Machine Learning Projects Guide
100% (17)
Python Machine Learning Projects Guide
135 pages
Python Machine Learning For Beginners Ebook Final
100% (11)
Python Machine Learning For Beginners Ebook Final
305 pages
Calculus Better Explained PDF
94% (16)
Calculus Better Explained PDF
76 pages
Why Machines Learn: The Math Behind AI
67% (3)
Why Machines Learn: The Math Behind AI
151 pages
Mathematics For Engineers I Basic Calculus
100% (14)
Mathematics For Engineers I Basic Calculus
403 pages
The Art of Problem Solving Intermediate Algebra
88% (41)
The Art of Problem Solving Intermediate Algebra
720 pages
Python 3 Cheat Sheet
94% (51)
Python 3 Cheat Sheet
2 pages
PolyInfo Database for ML in Polymers
No ratings yet
PolyInfo Database for ML in Polymers
19 pages
SYN Flood Detection with ML & SNORT
No ratings yet
SYN Flood Detection with ML & SNORT
26 pages
IoT & AI for Vehicle Emission Prediction
No ratings yet
IoT & AI for Vehicle Emission Prediction
19 pages
Integrating Two-Tier Optimization Algorithm With Convolutional Bi-LSTM Model For Robust Anomaly Detection in Autonomous Vehicles
No ratings yet
Integrating Two-Tier Optimization Algorithm With Convolutional Bi-LSTM Model For Robust Anomaly Detection in Autonomous Vehicles
14 pages
Machine Learning Question Bank 2024-25
No ratings yet
Machine Learning Question Bank 2024-25
16 pages
Empirical Study of Bagging and Boosting
No ratings yet
Empirical Study of Bagging and Boosting
30 pages
Brain Stroke Detection with ML Techniques
No ratings yet
Brain Stroke Detection with ML Techniques
25 pages
AI in Criminal Justice: A Systematic Review
No ratings yet
AI in Criminal Justice: A Systematic Review
20 pages
Machine Learning Course Report
No ratings yet
Machine Learning Course Report
22 pages
AI-Driven Power Electronics Design Innovations
No ratings yet
AI-Driven Power Electronics Design Innovations
29 pages
ECG Arrhythmia Detection with Ensemble Learning
No ratings yet
ECG Arrhythmia Detection with Ensemble Learning
6 pages
Network Intrusion Detection Techniques
No ratings yet
Network Intrusion Detection Techniques
88 pages
Data Analytics: Machine Learning Insights
No ratings yet
Data Analytics: Machine Learning Insights
22 pages
Sentiment Analysis Using Naive Bayes
No ratings yet
Sentiment Analysis Using Naive Bayes
28 pages
Hybrid ML Model for Fair Credit Decisions
No ratings yet
Hybrid ML Model for Fair Credit Decisions
21 pages
Crop Yield Prediction Framework
No ratings yet
Crop Yield Prediction Framework
10 pages
Responsible AI Insights for Data Scientists
No ratings yet
Responsible AI Insights for Data Scientists
3 pages
Predicting Student CGPA with Random Forest
No ratings yet
Predicting Student CGPA with Random Forest
17 pages
Machine Learning for Credit Card Fraud Detection
No ratings yet
Machine Learning for Credit Card Fraud Detection
9 pages
Data Science with Python Guide
No ratings yet
Data Science with Python Guide
149 pages
Ensemble Classifier Techniques Explained
No ratings yet
Ensemble Classifier Techniques Explained
43 pages
Credit Card Fraud Detection via Stacking
No ratings yet
Credit Card Fraud Detection via Stacking
4 pages
DL Unit-2
No ratings yet
DL Unit-2
20 pages
AI Learning Methods Overview
No ratings yet
AI Learning Methods Overview
8 pages
Machine Learning Weather Prediction Model
No ratings yet
Machine Learning Weather Prediction Model
30 pages
2023-2024 Machine Learning Project Titles
No ratings yet
2023-2024 Machine Learning Project Titles
8 pages
Dashboard for Classifier Metrics Analysis
No ratings yet
Dashboard for Classifier Metrics Analysis
7 pages
Jait 2023 0167
No ratings yet
Jait 2023 0167
7 pages
Project of Anandadeep Bala
No ratings yet
Project of Anandadeep Bala
23 pages
Classification & Regression Question Bank
No ratings yet
Classification & Regression Question Bank
180 pages

● In certain machine learning problems, we can further simplify the above equation if

● Using Bayes’ theorem to identify the posterior probability
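
For reference, the posterior probability follows from the standard form of Bayes' theorem. The sketch below writes h for a hypothesis and X for the observed data, matching the symbols used elsewhere in these slides; the slide's own equation is not reproduced here, so treat this as the textbook statement rather than the exact formula shown in the lecture.

```latex
P(h \mid X) = \frac{P(X \mid h)\, P(h)}{P(X)}
```

Here P(h) is the prior probability of h, P(X | h) is the likelihood of the data under h, and P(X) is the probability of observing the data.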

● Now, if we define a subset of the

● The optimal classification is for which

■ Let X be a data sample (“evidence”) whose class label is unknown

■ P(X): probability that the sample data is observed

■ A prior probability of hypothesis h or P(h): This is the

Data to be classified: X = (age <=30, Income = medium, Student = yes

Find Fruit={Yellow, Sweet, Long}

P(X|Mango)=0, P(X|B) = 0.65, P(X|O) = 0.072
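
The final classification step can be sketched as follows: each class is scored by P(X|class) × P(class), and the class with the highest score is chosen. The likelihoods are the values stated above, with the class labels kept exactly as written (Mango, B, O). The priors are not given in this part of the slides, so equal priors are assumed purely for illustration, and the function name `map_class` is hypothetical.

```python
def map_class(likelihoods, priors):
    """Return the class maximizing P(X|class) * P(class), plus all scores."""
    # Unnormalized posterior for each class; P(X) is the same for every
    # class, so it can be ignored when only the argmax is needed.
    scores = {c: likelihoods[c] * priors[c] for c in likelihoods}
    best = max(scores, key=scores.get)
    return best, scores


# Likelihoods P(X|class) for X = {Yellow, Sweet, Long}, as stated above.
likelihoods = {"Mango": 0.0, "B": 0.65, "O": 0.072}

# ASSUMPTION: equal priors, for illustration only. In practice the priors
# would be estimated from the class frequencies in the training table.
priors = {c: 1 / len(likelihoods) for c in likelihoods}

best, scores = map_class(likelihoods, priors)
print(best)
```

Because P(X|Mango) is 0, Mango is ruled out regardless of its prior. Under the equal-prior assumption the comparison reduces to the likelihoods alone, so class B is selected; with unequal priors the outcome between B and O would depend on the actual prior values.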
