Bayesian Inference
Math & Stat for Data Science
Graduate School of Data Science
Seoul National University
Bayesian Inference
• Up to now, the statistical methods we have discussed are
frequentist (or classical) methods
• Frequentist methods are based on the following:
• Probability: limiting relative frequencies.
• Probabilities are objective properties of the real world.
• Parameters are fixed, unknown constants.
• Because they are not fluctuating, no useful probability statements
can be made about parameters.
• Statistical procedures should be designed to have well-defined
long-run frequency properties.
• For example, a 95 percent confidence interval should trap the true
value of the parameter with limiting frequency at least 95 percent.
Bayesian Inference
• Probability describes degree of belief, not limiting
frequency
• Can make probability statements about lots of things, not just
data which are subject to random variation.
• Can make probability statements about parameters
• Even though they are fixed constants.
• Can make inferences about a parameter 𝜃 by producing
a probability distribution for 𝜃.
• Inferences (e.g., point estimates and interval estimates) may
then be extracted from this distribution.
Bayesian Inference
• Bayesian inference is subjective
• This is controversial
• A non-informative prior can partly address this
• The Bayesian approach provides a natural framework for
updating prior beliefs
• Computationally challenging
• Fast computational approaches (MCMC, variational
inference, etc.) have been developed
Bayesian Inference
• Bayes' theorem
• For continuous variables, use density functions
• For IID samples, the likelihood is a product over observations
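In the notation used on the following slides, these three statements take their standard textbook forms. The events $A$, $B$ and the prior density $f(\theta)$ are symbols assumed here for illustration.

```latex
% Bayes' theorem for events A and B with P(B) > 0
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

% Continuous case: replace probabilities with densities
f(\theta \mid x) = \frac{f(x \mid \theta)\, f(\theta)}
                        {\int f(x \mid \theta)\, f(\theta)\, d\theta}

% IID sample x_1, ..., x_n: the likelihood factorizes
f(\theta \mid x_1, \dots, x_n)
  = \frac{f(\theta) \prod_{i=1}^{n} f(x_i \mid \theta)}
         {\int f(\theta) \prod_{i=1}^{n} f(x_i \mid \theta)\, d\theta}
```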
Bayesian Inference
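• Notation: $X^n = (X_1, \dots, X_n)$ and $x^n = (x_1, \dots, x_n)$
• Posterior distribution:
$f(\theta \mid x^n) = \frac{L_n(\theta)\, f(\theta)}{c_n} \propto L_n(\theta)\, f(\theta)$
• Normalizing constant (partition function):
$c_n = \int L_n(\theta)\, f(\theta)\, d\theta$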
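• The posterior can be used to estimate the parameter
• The posterior mean, $\bar{\theta}_n = \int \theta\, f(\theta \mid x^n)\, d\theta$, is a common point estimate
• Posterior interval: find a, b such that
$\int_a^b f(\theta \mid x^n)\, d\theta = 1 - \alpha$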
Bayesian Inference
• Example: Let $X_1, \dots, X_n \sim \text{Bernoulli}(p)$ and
$p \sim \text{Uniform}(0,1)$
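A short derivation of the posterior for this example, assuming $S = \sum_i x_i$ denotes the number of successes (the symbol $S$ is used that way later in these slides):

```latex
% Likelihood of the observed Bernoulli sample
L_n(p) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i} = p^{S} (1-p)^{n-S},
\qquad S = \sum_{i=1}^{n} x_i

% Uniform prior f(p) = 1 on (0,1), so the posterior is proportional to the likelihood
f(p \mid x^n) \propto f(p)\, L_n(p) = p^{S} (1-p)^{n-S}

% This is the kernel of a Beta density
p \mid x^n \sim \text{Beta}(S + 1,\; n - S + 1)
```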
Bayesian Inference
• Posterior Mean
• Posterior Interval
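For the Bernoulli example with a Uniform(0,1) prior, the posterior is Beta(S+1, n−S+1), so both summaries can be computed directly. The sketch below is illustrative only: the function names are hypothetical, the interval is assumed to be the equal-tail one (probability α/2 cut from each side), the quantiles are approximated on a grid rather than computed exactly, and n = 10, S = 4 are taken from the example used later in these slides.

```python
import math

def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, evaluated on the log scale for stability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def beta_quantile(q, a, b, grid=20000):
    """Approximate the q-quantile by accumulating the density with the midpoint rule."""
    step = 1.0 / grid
    cdf = 0.0
    for i in range(grid):
        midpoint = (i + 0.5) * step
        cdf += beta_pdf(midpoint, a, b) * step
        if cdf >= q:
            return (i + 1) * step
    return 1.0

def uniform_prior_summary(n, s, alpha=0.05):
    """Posterior mean and equal-tail interval under a Uniform(0,1) prior."""
    a, b = s + 1, n - s + 1            # posterior is Beta(S+1, n-S+1)
    mean = a / (a + b)                 # equals (S+1)/(n+2)
    lower = beta_quantile(alpha / 2, a, b)
    upper = beta_quantile(1 - alpha / 2, a, b)
    return mean, (lower, upper)

mean, (lo, hi) = uniform_prior_summary(n=10, s=4)
print(f"posterior mean = {mean:.4f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

The posterior mean (S+1)/(n+2) lies between the sample proportion S/n and the prior mean 1/2, which is one way to see how the prior pulls the estimate.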
Bayesian Inference
• Example: Let $X_1, \dots, X_n \sim \text{Bernoulli}(p)$ and
$p \sim \text{Beta}(\alpha, \beta)$
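The Beta prior is conjugate to the Bernoulli likelihood, so the posterior can be read off from the kernel. A sketch of the derivation, again assuming $S = \sum_i x_i$:

```latex
% Prior kernel times likelihood
f(p \mid x^n) \propto p^{\alpha - 1} (1-p)^{\beta - 1} \cdot p^{S} (1-p)^{n-S}
              = p^{\alpha + S - 1} (1-p)^{\beta + n - S - 1}

% Hence the posterior is again a Beta distribution
p \mid x^n \sim \text{Beta}(\alpha + S,\; \beta + n - S)

% Posterior mean as a weighted average of the sample proportion and the prior mean
E[p \mid x^n] = \frac{\alpha + S}{\alpha + \beta + n}
  = \frac{n}{\alpha + \beta + n} \cdot \frac{S}{n}
  + \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \frac{\alpha}{\alpha + \beta}
```

Setting $\alpha = \beta = 1$ recovers the Uniform(0,1) case.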
Bayesian Inference
Simulation
• In many situations, it is difficult to derive the posterior
distribution analytically
• Use simulation
• Generate $\theta_1, \dots, \theta_B \sim f(\theta \mid x^n)$
• Use the simulated $\theta$ values for point estimation and the
posterior interval
• MCMC is commonly used
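A minimal sketch of the simulation idea, under the simplifying assumption that the posterior is a Beta distribution that can be sampled directly (the Bernoulli model with a Beta prior). When no such closed form exists, the draws would come from MCMC instead. The function names, the number of draws, and the seed are illustrative choices.

```python
import random

def simulate_posterior(n, s, prior_a=1.0, prior_b=1.0, draws=20000, seed=0):
    """Draw theta_1, ..., theta_B from the Beta posterior of a Bernoulli model."""
    rng = random.Random(seed)
    a, b = prior_a + s, prior_b + n - s
    return [rng.betavariate(a, b) for _ in range(draws)]

def summarize(samples, alpha=0.05):
    """Posterior mean and equal-tail interval estimated from the draws."""
    ordered = sorted(samples)
    count = len(ordered)
    mean = sum(ordered) / count
    lower = ordered[int((alpha / 2) * count)]           # empirical alpha/2 quantile
    upper = ordered[int((1 - alpha / 2) * count) - 1]   # empirical 1 - alpha/2 quantile
    return mean, lower, upper

samples = simulate_posterior(n=10, s=4)
post_mean, post_lo, post_hi = summarize(samples)
print(f"mean = {post_mean:.3f}, 95% interval = ({post_lo:.3f}, {post_hi:.3f})")
```

The sample mean of the draws approximates the posterior mean, and the empirical quantiles approximate the endpoints of the posterior interval; both improve as the number of draws grows.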
Non-informative prior
• The prior can have a great impact on the subsequent
estimation
• Example:
• Suppose 10 Bernoulli random variables are observed with S = 4 successes
• Consider two priors
• Uniform(0,1) (= Beta(1,1))
• Beta(1, 20)
[Figure: Prior/Posterior, n=10, S=4. Top row: Beta(1,1) prior and its posterior. Bottom row: Beta(1,20) prior and its posterior. Axes: xrange (0 to 1) vs. density.]
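The effect of the two priors can also be checked numerically. The sketch below uses the conjugate update for a Bernoulli likelihood with a Beta(a, b) prior, posterior Beta(a + S, b + n − S). The helper names are hypothetical.

```python
def beta_posterior(prior_a, prior_b, n, s):
    """Conjugate update: Beta(a, b) prior with s successes in n Bernoulli trials."""
    return prior_a + s, prior_b + n - s

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

n, s = 10, 4
priors = {"Beta(1,1)": (1, 1), "Beta(1,20)": (1, 20)}

results = {}
for name, (a0, b0) in priors.items():
    a, b = beta_posterior(a0, b0, n, s)
    results[name] = (a, b, beta_mean(a, b))
    print(f"{name}: posterior Beta({a},{b}), posterior mean = {beta_mean(a, b):.3f}")
```

With only 10 observations, the Beta(1,20) prior, which concentrates its mass near 0, pulls the posterior mean well below the sample proportion 0.4, whereas the uniform prior leaves it close to 0.4.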
Flat Prior
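• One approach is to use a constant as the prior
• Ex. $f(\theta) = c$
• Since $\int_{-\infty}^{\infty} f(\theta)\, d\theta = \infty$, this is not a proper probability
distribution (it is an improper prior)
• However, we can still carry out Bayesian inference:
$f(\theta \mid x^n) \propto f(\theta)\, L_n(\theta) \propto L_n(\theta)$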
Flat Prior
Example: Suppose $X \sim N(\theta, \sigma^2)$ and $f(\theta) = c$. Posterior?
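A sketch of the answer, assuming $\sigma^2$ is known and a single observation $x$ is available:

```latex
% Flat prior: the posterior is proportional to the likelihood, viewed as a function of theta
f(\theta \mid x) \propto f(\theta)\, f(x \mid \theta)
  \propto \exp\!\left( -\frac{(x - \theta)^2}{2\sigma^2} \right)

% As a function of theta this is a normal kernel centered at x
\theta \mid x \sim N(x, \sigma^2)
```

Although the prior is improper, the posterior here is a proper distribution.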
Flat Prior
• One drawback of the flat prior is that it is not
transformation invariant
• Ex. $X \sim \text{Bernoulli}(p)$
• Use the flat prior $f(p) = 1$
• Now let $\psi = \log\frac{p}{1-p}$; the density of $\psi$ is then
$f_\Psi(\psi) = \frac{e^{\psi}}{(1 + e^{\psi})^2}$
• This is not a flat prior in terms of $\psi$
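The lack of invariance can be checked by simulation. If p is Uniform(0,1), then ψ = log(p/(1−p)) follows the standard logistic distribution, with CDF 1/(1+e^{−ψ}) and density e^{ψ}/(1+e^{ψ})², which is far from constant. The sketch below compares the empirical CDF of simulated ψ values with the logistic CDF; the sample size, seed, and evaluation points are arbitrary illustrative choices.

```python
import math
import random

def logit(p):
    """Log-odds transform psi = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def logistic_cdf(psi):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-psi))

def psi_density(psi):
    """Density of psi induced by a Uniform(0,1) prior on p."""
    return math.exp(psi) / (1.0 + math.exp(psi)) ** 2

rng = random.Random(0)
draws = []
while len(draws) < 50000:
    p = rng.random()
    if p > 0.0:  # random() can return exactly 0.0, where the logit is undefined
        draws.append(logit(p))

def empirical_cdf(t):
    """Fraction of simulated psi values that are <= t."""
    return sum(1 for psi in draws if psi <= t) / len(draws)

for t in (-2.0, 0.0, 1.0):
    print(t, round(empirical_cdf(t), 3), round(logistic_cdf(t), 3), round(psi_density(t), 3))
```

The density column changes with ψ, so a prior that is flat in p is informative on the log-odds scale.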
Jeffreys’ Prior
• Use $f(\theta) \propto I(\theta)^{1/2}$
• $I(\theta)$ is the Fisher information
• This prior is transformation invariant
• For more details, see the posted lecture note (from
Duke)
Jeffreys’ Prior
• Example: X ~ Bernoulli (p)
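A sketch of the calculation for this example, using the Fisher information of a single Bernoulli observation and, for the posterior, $S$ successes in $n$ trials:

```latex
% Log-likelihood of one observation and its second derivative
\log f(x \mid p) = x \log p + (1 - x) \log(1 - p),
\qquad
\frac{\partial^2}{\partial p^2} \log f(x \mid p) = -\frac{x}{p^2} - \frac{1 - x}{(1 - p)^2}

% Fisher information, using E[X] = p
I(p) = -E\!\left[ \frac{\partial^2}{\partial p^2} \log f(X \mid p) \right]
     = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)}

% Jeffreys' prior is the Beta(1/2, 1/2) distribution
f(p) \propto I(p)^{1/2} = p^{-1/2} (1 - p)^{-1/2}
\quad\Longrightarrow\quad p \sim \text{Beta}(1/2,\, 1/2)

% Posterior after observing S successes in n trials
p \mid x^n \sim \text{Beta}(S + 1/2,\; n - S + 1/2)
```

For $n = 10$ and $S = 4$ this gives a Beta(4.5, 6.5) posterior.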
[Figure: Prior/Posterior, n=10, S=4]
Bayesian Testing
• Put a prior on H0 and on the parameter $\theta$, and
then compute $P(H_0 \mid X^n)$
• Consider
Bayesian Testing
With P(H0) = P(H1)=1/2
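A sketch of the posterior probability of the null, assuming the point-null setup used in the example that follows: $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$, with prior density $f(\theta)$ on $\theta$ under $H_1$. The symbols $\theta_0$ and $f(\theta)$ are assumptions made for illustration.

```latex
% Bayes' theorem applied to the two hypotheses
P(H_0 \mid X^n = x^n)
  = \frac{f(x^n \mid \theta_0)\, P(H_0)}
         {f(x^n \mid \theta_0)\, P(H_0) + P(H_1) \int f(x^n \mid \theta)\, f(\theta)\, d\theta}

% With P(H_0) = P(H_1) = 1/2 the prior weights cancel
P(H_0 \mid X^n = x^n)
  = \frac{L_n(\theta_0)}{L_n(\theta_0) + \int L_n(\theta)\, f(\theta)\, d\theta}
```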
Bayesian Testing
• Example: Suppose 10 Bernoulli random variables are observed with S = 4
successes. Let H0: p = 0.7. With Jeffreys' prior, what is $P(H_0 \mid X^{10})$?
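One way to work this example numerically is sketched below. It assumes the alternative is H1: p ≠ 0.7 with Jeffreys' prior Beta(1/2, 1/2) on p under H1, and P(H0) = P(H1) = 1/2. Under H1 the marginal likelihood of the observed sequence is B(S + 1/2, n − S + 1/2) / B(1/2, 1/2), where B is the Beta function. The function names are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def posterior_prob_h0(n, s, p0, prior_a=0.5, prior_b=0.5, prior_h0=0.5):
    """P(H0 | data) for H0: p = p0 versus H1: p ~ Beta(prior_a, prior_b)."""
    # Likelihood of the observed sequence under the point null
    lik_h0 = p0 ** s * (1 - p0) ** (n - s)
    # Marginal likelihood under H1: integral of p^s (1-p)^(n-s) against the Beta prior
    marg_h1 = math.exp(log_beta(prior_a + s, prior_b + n - s) - log_beta(prior_a, prior_b))
    numerator = lik_h0 * prior_h0
    return numerator / (numerator + marg_h1 * (1 - prior_h0))

prob_h0 = posterior_prob_h0(n=10, s=4, p0=0.7)
print(f"P(H0 | data) = {prob_h0:.3f}")
```

The binomial coefficient is omitted because it appears in both the numerator and the denominator and cancels.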
Bayes factors
• The most popular approach to Bayesian hypothesis testing is the
Bayes factor
$K = \frac{P(X^n \mid H_0)}{P(X^n \mid H_1)}$
• With P(H0) = P(H1) = 1/2, the Bayes factor equals the posterior odds:
$K = \frac{P(X^n \mid H_0)}{P(X^n \mid H_1)} = \frac{P(H_0 \mid X^n)}{P(H_1 \mid X^n)}$
Bayes factors
• Interpretation
Kass and Raftery (1995)
Wikipedia: [Link]
Bayes factors
• Example: Suppose 10 Bernoulli random variables are observed with S = 4
successes. Let H0: p = 0.7. With Jeffreys' prior, what is the Bayes factor K?
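A worked sketch of this example, assuming the ratio is taken as $H_0$ over $H_1$, that $H_1: p \neq 0.7$ carries Jeffreys' prior Beta(1/2, 1/2) on $p$, and writing $B(\cdot,\cdot)$ for the Beta function:

```latex
% Numerator: likelihood of the observed sequence under H_0: p = 0.7
P(X^{10} \mid H_0) = 0.7^{4} \cdot 0.3^{6} \approx 1.75 \times 10^{-4}

% Denominator: marginal likelihood under H_1, integrating over the Beta(1/2, 1/2) prior
P(X^{10} \mid H_1)
  = \int_0^1 p^{4} (1-p)^{6}\, \frac{p^{-1/2} (1-p)^{-1/2}}{B(1/2,\, 1/2)}\, dp
  = \frac{B(4.5,\, 6.5)}{B(0.5,\, 0.5)} \approx 2.94 \times 10^{-4}

% Bayes factor
K = \frac{P(X^{10} \mid H_0)}{P(X^{10} \mid H_1)} \approx 0.60
```

Since $K < 1$, the data favor $H_1$ slightly, and with equal prior probabilities $P(H_0 \mid X^{10}) = K/(1+K) \approx 0.37$.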
Bayesian Testing
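• Bayes factors and posterior probability
$K = \frac{P(X^n \mid H_0)}{P(X^n \mid H_1)}$
• The Bayesian approach can give a different answer from
the frequentist approach
• An improper prior cannot be used: the marginal likelihood is then
defined only up to an arbitrary constant, so K is not well defined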
Summary
• Frequentist Inference vs Bayesian Inference
• Bayesian Method
• Prior
• Posterior
• Estimation of Posterior Distribution
• Non-informative prior
• Flat
• Jeffreys’
• Bayesian Testing