Understanding Bayesian Learning Methods

Bayesian Learning methods in machine learning utilize explicit probabilities for hypotheses, making them practical for various learning problems, such as classification. The Naïve Bayes Classifier, a supervised learning algorithm based on Bayes' theorem, assumes feature independence and is effective in text classification. However, it faces challenges like the zero-frequency problem and the violation of independence assumptions, which can be addressed through techniques like Laplace smoothing.

Uploaded by

siddhantbhagat002

Bayesian Learning

Why use Bayesian learning methods in machine learning?

- Bayesian learning algorithms calculate explicit probabilities for hypotheses.

- These methods are among the most practical approaches to certain types of learning problems.
E.g., the Bayes classifier calculates the probability of each hypothesis.

- They provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.
Features of Bayesian Learning
• Instead of eliminating a hypothesis outright, each observed training example incrementally increases or decreases the estimated probability that the hypothesis is correct.

• Bayesian methods can accommodate hypotheses that make probabilistic predictions.

• New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
• Even when Bayesian methods prove computationally intractable, they provide a standard of optimal decision making against which other practical methods can be measured.

• Prior knowledge can be combined with observed data to determine the final probability of a hypothesis.
Issues in Bayesian Methods

1. They require initial knowledge of many probabilities. When these are not known in advance, they are estimated from background knowledge, previously available data, and assumptions about the form of the underlying distributions.

2. The computational cost of determining the Bayes optimal hypothesis is high.
- In some special cases, this cost can be significantly reduced.
What is Bayes Theorem

• In probability theory, it relates the conditional and marginal probabilities of two events.

• It gives the conditional probability of an event A given that event B has already occurred.

• In learning, it is used to calculate the probability of a hypothesis given the observed data.

• It allows the predicted probability of an event to be updated as new information arrives.
Bayes Theorem
• It provides a way to determine the best hypothesis from some space H, that is, the one with the highest probability given the observed training data and any initial knowledge about the prior probabilities of the hypotheses.

• It calculates the probability of a hypothesis from its prior probability, the probability of observing the data given the hypothesis, and the probability of the observed data itself.
• Given a hypothesis h and data D that bears on the hypothesis:

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$

• P(h): prior probability of h, the probability that h is true before the training data is observed.

• P(D): marginal probability of D, also called the evidence: the probability of observing D without reference to any particular hypothesis.

• P(D|h): likelihood, the conditional probability of observing data D in a world where hypothesis h holds.

• P(h|D): posterior probability, the conditional probability that h is true after the training data D has been seen.
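As a small illustration of the formula above, the sketch below computes a posterior for a binary hypothesis. The function name `posterior` and all the numbers are hypothetical, chosen only to show the arithmetic; P(D) is obtained from the law of total probability over h and not-h.

```python
def posterior(prior_h: float, lik_d_given_h: float, lik_d_given_not_h: float) -> float:
    """Return P(h|D) via Bayes' theorem for a binary hypothesis."""
    # P(D) = P(D|h) P(h) + P(D|not h) P(not h)   (law of total probability)
    evidence = lik_d_given_h * prior_h + lik_d_given_not_h * (1.0 - prior_h)
    return lik_d_given_h * prior_h / evidence


# Hypothetical inputs: P(h) = 0.01, P(D|h) = 0.9, P(D|not h) = 0.05
p = posterior(0.01, 0.9, 0.05)
print(round(p, 4))
```

Even with a high likelihood P(D|h), the small prior keeps the posterior well below one half, which is the kind of update the theorem formalizes.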
Maximum A Posteriori (MAP) Hypothesis
• Given a set of candidate hypotheses H and the observed data D, we are interested in finding the most probable hypothesis h ∈ H.
• Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.
• Using Bayes Theorem, we can compute the MAP hypothesis for the data:

$$
\begin{aligned}
h_{MAP} &\equiv \underset{h \in H}{\operatorname{argmax}}\; P(h \mid D) \\
        &= \underset{h \in H}{\operatorname{argmax}}\; \frac{P(D \mid h)\,P(h)}{P(D)} \\
        &= \underset{h \in H}{\operatorname{argmax}}\; P(D \mid h)\,P(h)
\end{aligned}
$$

P(D) is dropped in the last step because it is a constant independent of h.

H: the set of all candidate hypotheses.
Maximum Likelihood

• Maximum likelihood is a special case of MAP in which all candidate hypotheses have the same prior probability.

• Assume that all hypotheses are equally probable a priori, i.e., P(h_i) = P(h_j) for all h_i, h_j in H.

• This is called assuming a uniform prior. It simplifies computing the MAP hypothesis:

$$h_{ML} = \underset{h \in H}{\operatorname{argmax}}\; P(D \mid h)$$

• P(h) is dropped because it is the same constant for every hypothesis.

• The hypothesis that maximizes P(D|h) is called the maximum likelihood (ML) hypothesis.
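To make the difference between the MAP and maximum likelihood hypotheses concrete, here is a minimal sketch over a three-element hypothesis space. The hypothesis names, priors, and likelihoods are hypothetical values picked so that the two criteria disagree.

```python
# Hypothetical hypothesis space with priors P(h) and likelihoods P(D|h).
priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}

# MAP: maximize P(D|h) * P(h); P(D) is the same for every h, so it is dropped.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML: maximize P(D|h) alone, which coincides with MAP under a uniform prior.
h_ml = max(priors, key=lambda h: likelihoods[h])

print(h_map, h_ml)
```

Here the non-uniform prior pulls the MAP choice away from the hypothesis that fits the data best; with equal priors both criteria would pick the same hypothesis.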
Desirable Properties of Bayes Classifier

• Incrementality: with each training example, the prior and the likelihood can be updated dynamically, which makes the classifier flexible and robust to errors.

• Combines prior knowledge and observed data: the prior probability of a hypothesis is multiplied by the probability of the training data given the hypothesis.

• Probabilistic hypotheses: it outputs not only a classification but a probability distribution over all classes.
Naïve Bayes Classifier Algorithm

• It is a supervised learning algorithm based on Bayes theorem and used for solving classification problems.

• It is mainly used in text classification, which involves high-dimensional training data.

• It is a probabilistic classifier: it predicts on the basis of the probability that an object belongs to each class.

The name Naïve Bayes combines two words, Naïve and Bayes:
• Naïve: it assumes that the occurrence of a certain feature is independent of the occurrence of the other features.
• Bayes: it relies on Bayes Theorem.
Naïve Bayes Classifier Example
Suppose we have a dataset of animal details and a corresponding target variable "Pet". The task is to classify animals as pet or non-pet using the given feature set.
Assumptions of Naive Bayes
• All the variables are independent. That is, knowing that the animal is a Dog tells us nothing about whether its "Size" is Medium or Small.

• All the predictors have an equal effect on the outcome. That is, the animal being a dog does not carry more weight than any other feature in deciding whether we can pet it.
To solve this problem, we follow the steps below:
• Convert the given dataset into frequency tables.
• Generate a likelihood table by finding the probabilities of the given features.
• Use Bayes theorem to calculate the posterior probability.
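The three steps above can be sketched in a few lines of Python. The rows below are a hypothetical toy dataset invented for this sketch, not the table from the slides, and the helper names `likelihood` and `predict_proba` are likewise assumptions. No smoothing is applied, so an unseen feature value would give a zero likelihood.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (animal, size, color) -> pet label. NOT the slide's table.
rows = [
    (("Dog", "Medium", "Black"), "Yes"),
    (("Dog", "Big", "White"), "No"),
    (("Cat", "Small", "White"), "Yes"),
    (("Cow", "Big", "White"), "Yes"),
    (("Rat", "Medium", "Black"), "No"),
    (("Cow", "Medium", "Black"), "Yes"),
    (("Cow", "Big", "Brown"), "No"),
    (("Cat", "Medium", "Brown"), "Yes"),
]

# Step 1: frequency tables for the classes and for each (feature, class) pair.
class_counts = Counter(label for _, label in rows)
feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
for features, label in rows:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1


def likelihood(i, value, label):
    # Step 2: P(x_i | y) as a relative frequency (no smoothing).
    return feature_counts[(i, label)][value] / class_counts[label]


def predict_proba(features):
    # Step 3: unnormalized posterior P(y) * prod_i P(x_i | y), then normalize.
    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)
        for i, value in enumerate(features):
            score *= likelihood(i, value, label)
        scores[label] = score
    total = sum(scores.values())  # assumes at least one class has a nonzero score
    return {label: s / total for label, s in scores.items()}


probs = predict_proba(("Cow", "Medium", "Black"))
print(probs)
```

Storing counts rather than probabilities keeps the tables easy to update when new rows arrive, which matches the incremental flavor of Bayesian learning.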
We need to find P(xi|yj) for each xi in X and each yj in Y.
We also need the class prior probabilities P(y), which are calculated in the table below. For example, P(Pet Animal = NO) = 6/14.
Now suppose the test instance is (Cow, Medium, Black).

Probability of petting the animal:

And the probability of not petting the animal:

By the axioms of probability,
P(Yes|Test) + P(No|Test) = 1
so we normalize the results:

We see that P(Yes|Test) > P(No|Test), so the prediction for whether we can pet this animal is "Yes".
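Written out, the normalization step divides each unnormalized score by the sum of both. The symbols s_Yes and s_No are shorthand introduced here (they do not appear on the slides) for the two products P(y) ∏ P(x_i|y) evaluated on the test instance:

$$
P(\text{Yes} \mid \text{Test}) = \frac{s_{\text{Yes}}}{s_{\text{Yes}} + s_{\text{No}}},
\qquad
P(\text{No} \mid \text{Test}) = \frac{s_{\text{No}}}{s_{\text{Yes}} + s_{\text{No}}}
$$

Because both scores are divided by the same positive quantity, normalization never changes which class has the larger probability.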
Advantages of Naïve Bayes Classifier

• It does not require large amounts of training data.

• It can be used for binary as well as multi-class classification.
• It often performs well in multi-class prediction compared with other algorithms.
• It tends to converge more quickly than discriminative models.
• It scales well with the number of data points and predictors.
• It can handle both continuous and categorical data.
• It is relatively insensitive to irrelevant features, and it often works well even when its independence assumption does not strictly hold.
Disadvantages of Naïve Bayes Classifier

• Naive Bayes has trouble with the "zero-frequency problem": a categorical value that never occurs with a given class in the training data is assigned zero probability. A smoothing method such as Laplace smoothing overcomes this problem.

• It assumes that all features are independent, so it cannot learn relationships or model dependencies among features.
Relevant Issues

• Violation of the independence assumption

• Zero conditional probability problem
Violation of Independence Assumption
• Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes.

• This assumption is called class conditional independence.

• It is made to simplify the computations involved and, in this sense, is considered "naive."
Improvement
• Bayesian belief networks are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes.

• Bayesian belief networks can also be used for classification.

Zero Conditional Probability Problem
• If a given class and feature value never occur together in the training set, the frequency-based probability estimate will be zero.

• This is problematic because, when the probabilities are multiplied, the zero wipes out all the information in the other probabilities.

• It is therefore often desirable to incorporate a small-sample correction in all probability estimates so that no probability is ever exactly zero.
Correction
• To eliminate zeros, we use add-one (Laplace) smoothing, which simply adds 1 to each count.
Example
Suppose that,
• for the class buys_computer = yes in some training database, D, containing 1000 tuples,
• we have 0 tuples with income = low,
• 990 tuples with income = medium, and
• 10 tuples with income = high.
• The probabilities of these events, without the Laplacian correction, are 0, 0.990 (from 990/1000), and 0.010 (from 10/1000), respectively.
• Using the Laplacian correction for the three quantities, we pretend that we have 1 more tuple for each income-value pair. In this way, we instead obtain the following probabilities (rounded to three decimal places):

1/1003 = 0.001, 991/1003 = 0.988, and 11/1003 = 0.011

• The "corrected" probability estimates are close to their "uncorrected" counterparts, yet the zero probability value is avoided.
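The correction in this example can be reproduced with a short function. This is a minimal sketch: the name `laplace_smooth` and the `alpha` parameter are assumptions made for illustration, with `alpha=1` corresponding to add-one smoothing; the counts are the ones from the income example.

```python
def laplace_smooth(counts, alpha=1):
    """Add `alpha` to every count and renormalize (alpha=1 is add-one smoothing)."""
    total = sum(counts.values()) + alpha * len(counts)
    return {value: (c + alpha) / total for value, c in counts.items()}


# Income counts for the class buys_computer = yes, from the example.
income_counts = {"low": 0, "medium": 990, "high": 10}
smoothed = laplace_smooth(income_counts)
for value, p in smoothed.items():
    print(value, round(p, 3))
```

The denominator grows by one per distinct income value (1000 + 3 = 1003), which keeps the smoothed estimates summing to 1.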
Types of Naïve Bayes Model

Optimal Naive Bayes

• Optimal Naive Bayes selects the class that has the greatest posterior probability.
• As the name suggests, it is optimal, but it has to go through all the possibilities, which is very slow and time-consuming.

Gaussian Naive Bayes

• It is a straightforward algorithm used when the attributes are continuous.
• The attributes are assumed to follow a Gaussian (normal) distribution within each class.
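For a continuous attribute, the Gaussian variant replaces the frequency-based likelihood with a normal density evaluated at the observed value. The sketch below assumes per-class means and standard deviations have already been estimated; the function name `gaussian_pdf` and the numbers in `stats` are hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Normal density N(x; mean, std^2), used as P(x_i | y) for a continuous attribute."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical per-class statistics for one attribute: class -> (mean, std).
stats = {"yes": (5.0, 1.0), "no": (8.0, 2.0)}
x = 6.0
likelihoods = {label: gaussian_pdf(x, m, s) for label, (m, s) in stats.items()}
print(likelihoods)
```

These densities are then multiplied with the class priors and the likelihoods of the other attributes, exactly as in the categorical case.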

Bernoulli Naive Bayes

• Bernoulli Naive Bayes is useful for data with binary (boolean) attributes.
• The attributes take values such as yes or no, useful or not, granted or rejected.
• It is popular for document classification, where each attribute records whether a word occurs in the document.

Multinomial Naive Bayes

• The Multinomial Naïve Bayes classifier is used when the data follows a multinomial distribution.
• It is primarily used for document classification problems.
• The features for this type are the frequencies of the words in the document.
Application of Naïve Bayes Classifier
Bayesian Network
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."

It simplifies the representation of probabilistic relationships between random variables.

It is also called a Bayes network, belief network, decision network, or Bayesian model.
• Edges in the directed graph represent conditional dependence between variables.

• These relationships make it possible to conduct inference on the random variables in the graph.

• These networks satisfy the local Markov property, which allows the joint distribution to be factored into a smaller form. This reduces the amount of computation needed in larger networks.

• The task of prediction is to calculate a probability distribution over one or more variables whose values we want to know, given prior information or evidence about some other variables.
How to create a Bayesian Network?
To create a Bayesian network, the following need to be defined:

1. The variables that exist in the problem to be solved, including the main variable of interest.

2. The conditional relationships between all the variables, i.e., the structure of the network.

3. The probability distributions for each variable, i.e., the probability rules for the relationships between variables.
Example:
Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too. Sophia, on the other hand, likes to listen to loud music, so she sometimes fails to hear the alarm. Here we would like to compute the probability of the burglar alarm sounding.

Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and David and Sophia have both called Harry.
Solution:
The Bayesian network for the above problem is given below.

• The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend only on the alarm.

• The network encodes our assumptions that the neighbors do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
List of all events occurring in this network:

Burglary (B)
Earthquake(E)
Alarm(A)
David Calls(D)
Sophia calls(S)
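
Given the structure described in the solution (B and E are the parents of A, and A is the only parent of D and S), the joint probability asked for in the problem factorizes as shown below. The conditional probability tables needed to evaluate each factor are not reproduced in this text, so no numeric value is given here.

$$
P(S, D, A, \neg B, \neg E) = P(S \mid A)\,P(D \mid A)\,P(A \mid \neg B, \neg E)\,P(\neg B)\,P(\neg E)
$$

Each factor conditions a variable only on its parents in the graph, which is the simplification that the local Markov property provides.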
What are Bayesian networks used for?

1. Medical diagnosis
They can be used to identify the probable disease a patient is suffering from, based on the observed symptoms. A doctor notes the symptoms and enters them into a program, which computes the probabilities of multiple diseases given those symptoms.

2. Testing hypotheses
Bayesian networks help in understanding the causal relationships between various features. They can be used to determine whether the effect of a new feature is desirable.

3. Environmental modeling
These networks can be used to model animal population trends, with particular attention paid to environmental stressors.

4. Forecasting traffic
Bayesian networks can be used to forecast traffic flows and learn from them.
Markov Model

Markov Chain models:

• A Markov chain is a model that tells us something about the probabilities of sequences of random states/variables.
• A Markov chain makes the very strong assumption that, to predict the future of the sequence, all that matters is the current state.
• The states before the current state have no impact on the future except via the current state.

• It can be viewed as a finite state machine with probabilistic state transitions.

• Below are the specified components of Markov Chains:

Say you have a sequence. Something like this:

Sunny, Rainy, Cloudy, Cloudy, Sunny, Sunny, Sunny, Rainy

So, the weather for any given day can be in any of the three states.


Now using the data that we have, we can construct the following
state diagram with the labelled probabilities.
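
One simple way to obtain transition probabilities from the sequence above is to count transitions between consecutive days and normalize each row. This is a sketch of that maximum-likelihood estimate only; the values on the slide's state diagram are not reproduced in this text and may differ, and the variable names are assumptions.

```python
from collections import Counter, defaultdict

sequence = ["Sunny", "Rainy", "Cloudy", "Cloudy", "Sunny", "Sunny", "Sunny", "Rainy"]

# Count transitions between consecutive days.
transition_counts = defaultdict(Counter)
for today, tomorrow in zip(sequence, sequence[1:]):
    transition_counts[today][tomorrow] += 1

# Normalize each row so the outgoing probabilities of a state sum to 1.
transition_probs = {
    state: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
    for state, counts in transition_counts.items()
}

for state, row in transition_probs.items():
    print(state, row)
```

With only seven transitions the estimates are coarse, and transitions that never occur in the sequence (for example Rainy to Sunny) are simply absent from the table.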


To compute the probability of any given day's weather given N previous observations, we use the Markov property.

The Markov property states that the distribution of a future state depends only on the current state; none of the earlier states has any further impact on the future.
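
In symbols, writing X_t for the state on day t (notation introduced here, not taken from the slides), the Markov property and the resulting factorization of a whole sequence read:

$$
P(X_{t+1} = x \mid X_1, \dots, X_t) = P(X_{t+1} = x \mid X_t)
$$

$$
P(X_1, \dots, X_n) = P(X_1) \prod_{t=2}^{n} P(X_t \mid X_{t-1})
$$

The second line follows from applying the first to every term of the chain rule, so only the initial distribution and the one-step transition probabilities are needed.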
What are Hidden Markov Models?
• A Hidden Markov Model (HMM) is a probabilistic model that consists of a sequence of hidden states, each of which generates an observation.
• The hidden states are usually not directly observable, and the goal of an HMM is to estimate the sequence of hidden states from a sequence of observations.
• An HMM is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
• A Markov process is a memoryless process: given the present state, the past and future states are independent.
• The Markov assumption is that a hidden state depends only on the previous hidden state.
• Mathematically, the probability of being in a state at time t depends only on the state at time t-1.
A hidden Markov model consists of five important components:
1. Initial probability distribution: a probability distribution over the states at time t=0 (the initial hidden state).
2. One or more hidden states.

3. Transition probability distribution: a transition probability matrix in which each a_ij represents the probability of moving from hidden state i to hidden state j.

4. A sequence of observations.

5. Emission probabilities: a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation o_t being generated from a state i.
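To show how the five components fit together, the sketch below computes the probability of an observation sequence with the forward algorithm, a standard technique that is not described on the slides. The two hidden states, the observation symbols 1 to 3, and every probability are hypothetical values chosen for illustration.

```python
# Hypothetical two-state HMM; all numbers are illustrative assumptions.
states = ["Hot", "Cold"]                                  # hidden states
initial = {"Hot": 0.8, "Cold": 0.2}                       # initial distribution
transition = {"Hot": {"Hot": 0.6, "Cold": 0.4},           # a_ij
              "Cold": {"Hot": 0.5, "Cold": 0.5}}
emission = {"Hot": {1: 0.2, 2: 0.4, 3: 0.4},              # P(observation | state)
            "Cold": {1: 0.5, 2: 0.4, 3: 0.1}}


def forward(observations):
    """Return P(observations) by summing over all hidden state paths."""
    # alpha[s] = P(o_1..o_t, state at time t = s)
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            j: sum(alpha[i] * transition[i][j] for i in states) * emission[j][obs]
            for j in states
        }
    return sum(alpha.values())


print(forward([3, 1, 3]))
```

The recursion reuses the previous step's sums, so its cost grows linearly with the sequence length instead of enumerating every possible hidden path.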
HMM Example-
Markov Networks Vs Bayesian Networks
• Bayesian networks are probabilistic graphical models that represent sets of random variables and their conditional dependencies by means of directed acyclic graphs (DAGs).

• A Markov network, or undirected graphical model, is a set of random variables having a Markov property described by an undirected graph.

• Bayesian networks are directed and acyclic, whereas Markov networks are undirected and may contain cycles; that is, Markov networks can also represent cyclic dependencies.
