
Bayesian Statistics and Bayes' Theorem

This document introduces Bayesian statistics and Bayes' theorem. It provides examples of how to calculate posterior distributions given prior distributions and likelihood functions. The key steps are to specify the prior distribution, determine the likelihood function from the data, calculate the posterior as the product of the prior and likelihood, and identify the resulting posterior distribution. Choosing a conjugate prior that results in the same distribution family for the posterior is discussed.

Uploaded by

Y L

Chapter 2

Bayesian Statistics
MA501, Statistics for Insurance

1
Introductory Example
There are 5 unlabeled bags on the table. 4 of those bags are of type A and contain
equal numbers of white and black counters. 1 of the bags is of type B and contains
only white counters.

[Figure: a Type A bag and a Type B bag]
You are asked to choose a bag and then pick a counter from that bag (without
seeing its contents). You pull out a white counter. Which type of bag did it most
likely come from?
P(Bag A) = 4/5, P(Bag B) = 1/5
P(White|Bag A) = 1/2, P(White|Bag B) = 1
P(White) = P(White and Bag A) + P(White and Bag B)
= P(White|Bag A)P(Bag A) + P(White|Bag B)P(Bag B)
= 1/2 × 4/5 + 1 × 1/5 = 3/5
P(Bag A|White) = P(White and Bag A) / P(White)
= P(White|Bag A)P(Bag A) / P(White) = (1/2 × 4/5) / (3/5) = 2/3
P(Bag B|White) = 1 – 2/3 = 1/3
Answer = Bag A
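The bag calculation can be checked with exact fractions. This is a minimal sketch; the dictionary names are illustrative and not part of the original slides.

```python
from fractions import Fraction

# Prior probability of picking each bag type (4 of the 5 bags are type A).
prior = {"A": Fraction(4, 5), "B": Fraction(1, 5)}
# Probability of drawing a white counter from each bag type.
p_white_given = {"A": Fraction(1, 2), "B": Fraction(1)}

# Law of total probability: P(White).
p_white = sum(p_white_given[bag] * prior[bag] for bag in prior)

# Bayes' theorem: P(bag | White) for each bag type.
posterior = {bag: p_white_given[bag] * prior[bag] / p_white for bag in prior}

print(p_white)         # 3/5
print(posterior["A"])  # 2/3
print(posterior["B"])  # 1/3
```

Although a type B bag always yields a white counter, type A is still the more likely source because four of the five bags are type A.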
2
2.1 Bayes’ Theorem
Problem: How do you “turn round” a conditional probability? How do
you find P(B|A) from P(A|B)?
Solution: Suppose that a sample space S admits a partition:
\[ S = \bigcup_{i=1}^{k} B_i, \qquad B_i \cap B_j = \emptyset, \quad i \neq j \]
and that P(Bi) ≠ 0, i = 1,...,k. Then for any event A in S with P(A) ≠ 0:
\[ P(B_r \mid A) = \frac{P(A \mid B_r)\,P(B_r)}{\sum_{i=1}^{k} P(A \mid B_i)\,P(B_i)} \]
(for the proof see CT6 notes)

3
2.1 Bayes’ Theorem
Example
1. A shoe shop sells shoes from three manufacturers. 50% of the
shoes are made by manufacturer 1, 30% by manufacturer 2 and the
rest are made by manufacturer 3. 1% of the shoes made by
manufacturer 1 are faulty, 2% of the shoes made by manufacturer
2 are faulty and 5% of the shoes made by manufacturer 3 are
faulty. A shoe is found to be faulty. What is the probability that it comes
from:
i) manufacturer 1
ii) manufacturer 2
iii) manufacturer 3

4
2.1 Bayes’ Theorem
Example
• Let B1 be the event that the shoe was made by manufacturer 1, and similarly B2 and B3.
• Let A be the event that the shoe is found to be faulty.
• P(B1) = 0.5, P(B2) = 0.3, P(B3) = 1 – 0.5 – 0.3 = 0.2
• P(A|B1) = 0.01, P(A|B2) = 0.02, P(A|B3) = 0.05
Bayes' Theorem:
\[ P(B_r \mid A) = \frac{P(A \mid B_r)\,P(B_r)}{\sum_{i} P(A \mid B_i)\,P(B_i)} \]
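The slide stops at the formula, so the following is only a sketch of how the three probabilities could be evaluated from the figures above. The function name `bayes_posterior` is a hypothetical helper, not something from the course notes.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(B_r | A) for every r, given lists of P(B_r) and P(A | B_r)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A and B_r)
    total = sum(joint)                                    # P(A)
    return [j / total for j in joint]

priors = [0.5, 0.3, 0.2]         # share of shoes from manufacturers 1, 2, 3
likelihoods = [0.01, 0.02, 0.05]  # fault rate for each manufacturer

posteriors = bayes_posterior(priors, likelihoods)
for r, p in enumerate(posteriors, start=1):
    print(f"P(B{r} | A) = {p:.3f}")
# P(B1 | A) = 0.238
# P(B2 | A) = 0.286
# P(B3 | A) = 0.476
```

Here P(A) = 0.005 + 0.006 + 0.010 = 0.021, so the three answers are 5/21, 6/21 and 10/21.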

5
2.1 Bayes’ Theorem
Why Bayes?
Statistics can be divided into two areas: classical (or frequentist)
statistics and Bayesian statistics. Bayesian statistics is based on Bayes'
Theorem.
Consider a random sample X = (X1,X2,...,Xn) from a population
with density or probability function f(x;θ) where θ is unknown.
We want to infer θ.
Classical Statistics:
– θ is a fixed unknown constant
– Only the sample information, X, is used
For example, MLE and the method of moments
Bayesian Statistics:
– θ is regarded as a random variable with a prior distribution.
– Not only the sample information but also non-sample
information from other sources is used.
[Portrait: Thomas Bayes (c. 1702 - 1761)]
6
2.1 Bayes’ Theorem
Why Bayes?
An advantage of Bayesian statistics is that it allows us to make
use of any additional information that may be available.

An example of a use in insurance:


Suppose an insurance company is reviewing its premium rates
for a particular type of policy. During this process it has
results from other insurers available, as well as from its own
policyholders. These additional data might contain a lot of
information, which should not be ignored.

7
2.2 Prior and Posterior Distributions
• We require an estimate of the parameter θ.
• θ is assumed to be a random variable, so has a distribution, f(θ).
This distribution is called the prior distribution, and can make
use of any prior knowledge of θ.
• We then collect the data, X = (X1,X2,...,Xn), which is a random
sample from a population with density or probability function
f(X|θ).
• We want to find the distribution of θ given the data, f(θ|X), which
is called the posterior distribution.

• Note that f(X|θ), the joint density of the sample values, is also
known as the likelihood, hence:
• POSTERIOR ∝ LIKELIHOOD × PRIOR
8
2.2 Prior and Posterior Distributions
Basic Steps
• Step 1 – select a suitable prior distribution
Write down the prior distribution of the unknown
parameter
• Step 2 – determine the likelihood function
Write down the likelihood from the observations
• Step 3 – determine the posterior
POSTERIOR ∝ LIKELIHOOD × PRIOR
• Step 4 – identify the posterior distribution
Either look for a standard distribution that has the same
algebraic form as the posterior,
or integrate out the unknown parameter to find the
normalising constant (not covered by this course).

9
2.2 Prior and Posterior Distributions Examples
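2. Suppose that X follows a Poisson distribution with rate λ and that
λ has an exponential distribution with mean 1/100. We observe a
sample x = (1,5,3,10). What is the posterior distribution of λ?
FB p7: Poisson dist:
\[ p(x) = \frac{e^{-\mu}\mu^{x}}{x!} \]
Step 1: Prior: exponential with mean 1/100, i.e. rate 100:
\[ f(\lambda) = 100\,e^{-100\lambda} \propto e^{-100\lambda} \]
Step 2: Likelihood:
\[ L(\mathbf{x} \mid \lambda) = \prod_{i=1}^{4} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \propto e^{-4\lambda}\lambda^{\sum x_i} = e^{-4\lambda}\lambda^{19}, \qquad \sum x_i = 1 + 5 + 3 + 10 = 19 \]
Any terms multiplying the whole expression that do not contain λ can be
removed, as we are only interested in λ.

Step 3: Calc. Post: Post ∝ likelihood × prior
\[ f(\lambda \mid \mathbf{x}) \propto e^{-4\lambda}\lambda^{19} \times e^{-100\lambda} = \lambda^{19} e^{-104\lambda} \]

Step 4: Identify the posterior distribution. A Gamma(α,β) density is
proportional to $x^{\alpha-1}e^{-\beta x}$; here x = λ:
α − 1 = 19 → α = 20
β = 104
Gamma(20,104)
10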
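As a rough numerical check of the four steps for the Poisson example, the posterior can be evaluated on a grid without recognising its family. This is a sketch under stated assumptions: the grid range (0, 1) and step size are arbitrary choices, and all names are illustrative.

```python
import math

data = [1, 5, 3, 10]
prior_rate = 100.0  # exponential prior with mean 1/100

def log_unnormalised_posterior(lam):
    """log(likelihood x prior), dropping terms that do not involve lam."""
    log_prior = -prior_rate * lam
    log_likelihood = sum(x * math.log(lam) - lam for x in data)
    return log_prior + log_likelihood

# Midpoint grid on (0, 1); the posterior mass beyond 1 is negligible here.
step = 1e-4
grid = [(i + 0.5) * step for i in range(10000)]
weights = [math.exp(log_unnormalised_posterior(lam)) for lam in grid]

norm = sum(weights)  # plays the role of the normalising constant
grid_mean = sum(lam * w for lam, w in zip(grid, weights)) / norm
grid_mode = grid[weights.index(max(weights))]

print(round(grid_mean, 3))  # close to 20/104
print(round(grid_mode, 3))  # close to 19/104
```

The grid mean and mode agree with the mean α/β and mode (α − 1)/β of a Gamma(20, 104) distribution, which supports the identification made in Step 4.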
2.2 Prior and Posterior Distributions Examples
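3. Suppose that X follows a Binomial distribution Bin(10,p) and that p
has a beta distribution Beta(2,3). We observe a sample
x = (5,4,6,3,7). What is the posterior distribution of p?
FB: X ~ Bin(n,p):
\[ f(x) = \binom{n}{x} p^{x}(1-p)^{n-x} \]
Step 1: Prior: Beta(2,3)
\[ f(p) \propto p^{2-1}(1-p)^{3-1} = p\,(1-p)^{2} \]

Step 2: Likelihood: Σx_i = 5 + 4 + 6 + 3 + 7 = 25 successes in 5 × 10 = 50 trials
\[ L(\mathbf{x} \mid p) = \prod_{i=1}^{5} \binom{10}{x_i} p^{x_i}(1-p)^{10-x_i} \propto p^{25}(1-p)^{25} \]

Step 3: Calc. Posterior: Posterior ∝ likelihood × prior
\[ f(p \mid \mathbf{x}) \propto p^{25}(1-p)^{25} \times p\,(1-p)^{2} = p^{26}(1-p)^{27} \]

Step 4: Identify the posterior distribution.
FB: Beta(α,β) density:
\[ f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \propto x^{\alpha-1}(1-x)^{\beta-1} \]
Beta (with x = p): α − 1 = 26 → α = 27, β − 1 = 27 → β = 28
Beta(27, 28)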
11
2.2 Prior and Posterior Distributions
Selecting the prior distribution
• In practice, there are two barriers to the application of Bayesian
methods. One is the posterior calculation and the other is the
selection of the prior.
• The prior needs to be from a family suitable for representing our
prior knowledge. Each distribution in that family must have a
range compatible with the possible values of the parameter.
• For example, for the binomial distribution Bin(m,p), the prior
on the parameter p cannot be modelled by a Gamma
distribution. This is because p has the range [0,1] while the
Gamma distribution has the range [0,∞). But the prior could be
modelled by a Beta distribution, as this has the range [0,1].
• If for a given likelihood the posterior distribution belongs to
the same family as the prior distribution, the prior distribution
is known as a conjugate prior. E.g. both posterior and prior
follow a gamma distribution.
12
2.2 Prior and Posterior Distributions
Selecting the prior distribution - Example
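4. Suppose that x = (x1, x2, …, xm) is a random sample from a
Bernoulli distribution with parameter p (i.e. X ~ Bin(1,p)). What
family of distributions for the prior would be conjugate, i.e. give a
posterior distribution from the same family?
FB: Bin dist (Bernoulli: n = 1):
\[ p(x) = \binom{n}{x} p^{x}(1-p)^{n-x} \]
\[ L(\mathbf{x} \mid p) = \prod_{i=1}^{m} p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{m-\sum x_i} \]
\[ \text{post} \propto \text{lik} \times \text{prior} = p^{\sum x_i}(1-p)^{m-\sum x_i} \times f(p) \]
This has the form $p^{A}(1-p)^{B}$ whenever the prior f(p) has that form too.
Beta dist:
\[ f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \propto x^{\alpha-1}(1-x)^{\beta-1} \]
Answer = Beta distribution is conjugate.
\[ \text{post} \propto p^{\sum x_i}(1-p)^{m-\sum x_i} \times p^{\alpha-1}(1-p)^{\beta-1} = p^{\sum x_i+\alpha-1}(1-p)^{m-\sum x_i+\beta-1} \]
Beta(Σx_i + α, m − Σx_i + β)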
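The conjugate update amounts to adding the number of successes to the first Beta parameter and the number of failures to the second. The sketch below assumes this reading; `beta_binomial_update` is a hypothetical helper, and the Bernoulli sample is made-up illustrative data.

```python
def beta_binomial_update(alpha, beta, observations, trials_per_obs=1):
    """Posterior Beta parameters for Bin(trials_per_obs, p) data
    under a Beta(alpha, beta) prior on p."""
    successes = sum(observations)
    failures = trials_per_obs * len(observations) - successes
    return alpha + successes, beta + failures

# Bernoulli sample (trials_per_obs = 1), illustrative data only.
print(beta_binomial_update(2, 3, [1, 0, 1, 1, 0]))  # (5, 5)

# Bin(10, p) sample with a Beta(2, 3) prior, as in the binomial example.
print(beta_binomial_update(2, 3, [5, 4, 6, 3, 7], trials_per_obs=10))  # (27, 28)
```

Setting `trials_per_obs=1` gives the Bernoulli case derived above. The second call reproduces the Beta(27, 28) posterior of the Bin(10, p) example.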
13
2.2 Prior and Posterior Distributions
Selecting the prior distribution
• In some cases it may be useful to use a prior which assumes that
the unknown parameter can take on any value, for example if there
is no prior information.
• A prior that assumes that the unknown parameter is equally likely to
take on any value is called an uninformative prior.
• The uniform distribution is an uninformative prior, as it gives
equal probability to all possibilities.
• Example: Suppose we have a sample from the binomial
distribution with probability p. What would be a suitable
uninformative prior?
Range of p is [0,1], therefore an uninformative prior is U(0,1)
• Note that if a parameter takes on the range (−∞,∞), U(−∞,∞) does not
make sense because the pdf of this distribution would be 0 everywhere.
We can get round this problem by using U(−N,N) for large N, which has pdf
1/(2N).
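As a worked illustration of why a uniform prior leaves the likelihood unchanged, consider n observations from Bin(m,p) with a U(0,1) prior. The notation n, m and Σx_i is assumed here for the number of observations, the number of trials per observation and the total number of successes.

```latex
\[
f(p) = 1 \ \ (0 \le p \le 1)
\quad\Longrightarrow\quad
f(p \mid \mathbf{x}) \propto p^{\sum x_i}(1-p)^{nm-\sum x_i} \times 1
\]
\[
\text{so } p \mid \mathbf{x} \sim \mathrm{Beta}\!\left(\textstyle\sum x_i + 1,\ nm - \sum x_i + 1\right).
\]
```

This is the same as using a Beta(1,1) prior, since the Beta(1,1) density is constant on [0,1].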
14
2.2 Prior and Posterior Distributions
Combinations of Likelihoods and Prior
Likelihood (obs: X1,…,Xn)    Prior             Posterior
Poisson(λ)                   λ ~ U(0,∞)        Gamma(Σx + 1, n)
Poisson(λ)                   λ ~ Exp(β)        Gamma(Σx + 1, n + β)
Poisson(λ)                   λ ~ Gamma(α,β)    Gamma(Σx + α, n + β)
Exp(λ)                       λ ~ U(0,∞)        Gamma(n + 1, Σx)
Exp(λ)                       λ ~ Exp(β)        Gamma(n + 1, Σx + β)
Exp(λ)                       λ ~ Gamma(α,β)    Gamma(n + α, Σx + β)
Gamma(k,λ) (k known)         λ ~ U(0,∞)        Gamma(nk + 1, Σx)
Gamma(k,λ) (k known)         λ ~ Exp(β)        Gamma(nk + 1, Σx + β)
Gamma(k,λ) (k known)         λ ~ Gamma(α,β)    Gamma(nk + α, Σx + β)
Weibull(c,γ) (γ known)       c ~ U(0,∞)        Gamma(n + 1, Σx^γ)
Weibull(c,γ) (γ known)       c ~ Exp(β)        Gamma(n + 1, Σx^γ + β)
Weibull(c,γ) (γ known)       c ~ Gamma(α,β)    Gamma(n + α, Σx^γ + β)
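Two of the Gamma-posterior combinations can be sketched as parameter updates. The parameter names `alpha` and `beta` (prior shape and rate) are assumptions made for this illustration, as are the function names and the exponential sample.

```python
def gamma_posterior_poisson(alpha, beta, data):
    """Gamma(alpha, beta) prior with a Poisson sample:
    posterior is Gamma(sum(x) + alpha, n + beta)."""
    return alpha + sum(data), beta + len(data)

def gamma_posterior_exponential(alpha, beta, data):
    """Gamma(alpha, beta) prior with an exponential sample:
    posterior is Gamma(n + alpha, sum(x) + beta)."""
    return alpha + len(data), beta + sum(data)

# An exponential prior with rate 100 is Gamma(1, 100), so this
# reproduces the Poisson example with sample (1, 5, 3, 10).
print(gamma_posterior_poisson(1, 100, [1, 5, 3, 10]))  # (20, 104)

# Illustrative exponential sample (made-up data).
print(gamma_posterior_exponential(2, 3, [0.5, 1.5, 2.0]))  # (5, 7.0)
```

The exponential-prior rows of the table are the special case `alpha = 1` of the Gamma-prior rows.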

15
2.2 Prior and Posterior Distributions
Combinations of Likelihoods and Prior
Likelihood (obs: X1,…,Xn)        Prior             Posterior
N(μ,σ²) (σ² known)               μ ~ U(−∞,∞)       N(Σx/n, σ²/n)
N(μ,σ²) (σ² known)               μ ~ N(μ₀,σ₀²)     N(μ*, σ*²), where
                                                   μ* = (Σx/σ² + μ₀/σ₀²) / (n/σ² + 1/σ₀²),
                                                   σ*² = 1 / (n/σ² + 1/σ₀²)
LogN(μ,σ²) (σ² known)            μ ~ U(−∞,∞)       N({Σlog(x)}/n, σ²/n)
Bin(m,p) (m known)               p ~ U(0,1)        Beta(Σx + 1, nm – Σx + 1)
Bin(m,p) (m known)               p ~ Beta(α,β)     Beta(Σx + α, nm – Σx + β)
Geo(p)                           p ~ U(0,1)        Beta(n + 1, Σx + 1)
Geo(p)                           p ~ Beta(α,β)     Beta(n + α, Σx + β)
NegBin(k,p) Type 2 (k known)     p ~ U(0,1)        Beta(nk + 1, Σx + 1)
NegBin(k,p) Type 2 (k known)     p ~ Beta(α,β)     Beta(nk + α, Σx + β)
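For the normal likelihood with known variance and a normal prior on the mean, the standard conjugate result can be sketched as below. The prior parameter names `prior_mean` and `prior_var` are assumptions for this illustration, and the data are made up.

```python
def normal_posterior(prior_mean, prior_var, data, sigma2):
    """Posterior (mean, variance) of mu for N(mu, sigma2) data with
    sigma2 known and a N(prior_mean, prior_var) prior on mu."""
    n = len(data)
    precision = n / sigma2 + 1 / prior_var  # posterior precision
    mean = (sum(data) / sigma2 + prior_mean / prior_var) / precision
    return mean, 1 / precision

# Illustrative data: three observations with known variance 1.
print(normal_posterior(0.0, 1.0, [1.0, 2.0, 3.0], 1.0))  # (1.5, 0.25)

# A very diffuse prior behaves like the U(-inf, inf) row: N(sum(x)/n, sigma2/n).
mean, var = normal_posterior(0.0, 1e12, [1.0, 2.0, 3.0], 1.0)
print(round(mean, 6), round(var, 6))  # 2.0 0.333333
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, so more data or a vaguer prior moves it towards the sample mean.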

16
2.2 Prior and Posterior Distributions
Identifying Posterior Distributions
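• Suppose we have found the posterior distribution of θ, f(θ|x), up to a constant of proportionality.
• Gamma: a Gamma(α,β) density is proportional to $\theta^{\alpha-1}e^{-\beta\theta}$, θ > 0.

• Posterior follows the gamma distribution Gamma(A+1,B) if
\[ f(\theta \mid \mathbf{x}) \propto \theta^{A} e^{-B\theta}, \quad \theta > 0 \]
for any A > −1 and B > 0.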

• Example: what is the Posterior distribution if

Posterior is Gamma(12,20)

• Example: what is the Posterior distribution if

Posterior is
17
2.2 Prior and Posterior Distributions
Identifying Posterior Distributions
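• Beta: a Beta(α,β) density is proportional to $\theta^{\alpha-1}(1-\theta)^{\beta-1}$, 0 < θ < 1.

• Posterior follows the beta distribution Beta(A+1,B+1) if
\[ f(\theta \mid \mathbf{x}) \propto \theta^{A}(1-\theta)^{B}, \quad 0 < \theta < 1 \]
for any A > −1 and B > −1.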
NB Posterior is never a Binomial distribution.

• Example: what is the Posterior distribution if

Posterior is Beta(24,16)

• Example: what is the Posterior distribution if

Posterior is
18
2.2 Prior and Posterior Distributions
Identifying Posterior Distributions
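• Normal: a N(μ,σ²) density is proportional to $\exp\{-(\theta-\mu)^2/(2\sigma^2)\}$.

• Posterior follows the normal distribution N(A,B²) if
\[ f(\theta \mid \mathbf{x}) \propto \exp\left\{-\frac{(\theta - A)^2}{2B^2}\right\} \]
for any A and any B > 0.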

• Example: what is the Posterior distribution if


• ?
Posterior is N(10,9)

• For the possible combinations in this course, Posterior
distributions will only be Gamma, Beta or Normal
distributions.
19
2.3 The loss function

• In Chapter 1 for statistical games we defined the Bayes risk as
E[R(d(·), θ)] = E{E[l(d(x), θ)]} and wanted to obtain the decision
that minimises this risk. This required a loss function, l(d(x), θ).
• Similarly, to be able to obtain an estimator of θ a loss function
must first be specified.
• Here the estimator is denoted by g(x) and L{g(x)} denotes the loss function.
• A loss function should be zero when the estimation is exactly
correct (i.e. g(x) = θ), and should be positive and not decrease as
g(x) gets further away from θ.
• A Bayesian estimator is found by minimising the expected
posterior loss:
\[ \mathrm{EPL} = E[L(g(\mathbf{x}))] = \int L(g(\mathbf{x}))\, f(\theta \mid \mathbf{x})\, d\theta \]

20
2.3 The loss function
Three different types of loss functions:

Quadratic loss: L(g(x)) = [g(x) − θ]²

Absolute error loss: L(g(x)) = |g(x) − θ|

All-or-nothing (0/1) loss: L(g(x)) = 0 if g(x) = θ, and 1 if g(x) ≠ θ

21
2.3 The loss function
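For quadratic loss the EPL is
\[ \mathrm{EPL} = \int (g - \theta)^2 f(\theta \mid \mathbf{x})\, d\theta \]
where (g − θ)² is the loss function and f(θ|x) is the posterior.
The minimum is at:
\[ \frac{d}{dg}\,\mathrm{EPL} = \int 2(g - \theta)\, f(\theta \mid \mathbf{x})\, d\theta = 0 \]
using d/dg (g − θ)² = 2(g − θ). Since ∫ f(θ|x) dθ = 1 and E(y) = ∫ y f(y) dy, this gives
\[ g = \int \theta\, f(\theta \mid \mathbf{x})\, d\theta = E(\theta \mid \mathbf{x}) \]
To check this is a minimum we differentiate the EPL a second time:
\[ \frac{d^2}{dg^2}\,\mathrm{EPL} = 2\int f(\theta \mid \mathbf{x})\, d\theta = 2 > 0 \]
The Bayesian estimator under quadratic loss is the mean of the
posterior distribution.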
22
2.3 The loss function
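For absolute loss the EPL is
\[ \mathrm{EPL} = \int |g - \theta|\, f(\theta \mid \mathbf{x})\, d\theta = \int_{-\infty}^{g} (g - \theta)\, f(\theta \mid \mathbf{x})\, d\theta + \int_{g}^{\infty} (\theta - g)\, f(\theta \mid \mathbf{x})\, d\theta \]
The minimum is at:
\[ \frac{d}{dg}\,\mathrm{EPL} = \int_{-\infty}^{g} f(\theta \mid \mathbf{x})\, d\theta - \int_{g}^{\infty} f(\theta \mid \mathbf{x})\, d\theta = 0 \]
i.e. P(θ ≤ g | x) = P(θ ≥ g | x) = 1/2,
which specifies the median.
(To check this is a minimum we differentiate the EPL a second time; details in
CT6 notes.)
The Bayesian estimator under absolute loss is the median of the
posterior distribution.
The Bayesian estimator under all-or-nothing loss is the mode
of the posterior distribution. (See CT6 notes for proof)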
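The three results can be illustrated numerically by minimising the expected posterior loss over a grid of candidate estimates. This sketch uses a made-up discrete posterior so that the all-or-nothing loss is well defined without a limiting argument. The distribution and all names are hypothetical.

```python
# Hypothetical discrete posterior for theta: value -> probability.
posterior = {0: 0.10, 1: 0.30, 2: 0.25, 3: 0.20, 10: 0.15}

def expected_posterior_loss(g, loss):
    """EPL of the estimate g under the given loss function."""
    return sum(prob * loss(g, theta) for theta, prob in posterior.items())

def quadratic(g, theta):
    return (g - theta) ** 2

def absolute(g, theta):
    return abs(g - theta)

def all_or_nothing(g, theta):
    return 0.0 if g == theta else 1.0

candidates = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0

best_quadratic = min(candidates, key=lambda g: expected_posterior_loss(g, quadratic))
best_absolute = min(candidates, key=lambda g: expected_posterior_loss(g, absolute))
best_zero_one = min(candidates, key=lambda g: expected_posterior_loss(g, all_or_nothing))

posterior_mean = sum(theta * prob for theta, prob in posterior.items())

print(best_quadratic, round(posterior_mean, 6))  # 2.9 2.9 (the mean)
print(best_absolute)                             # 2.0 (the median)
print(best_zero_one)                             # 1.0 (the mode)
```

For this skewed posterior the three estimators differ noticeably: the mean is pulled towards the outlying value 10, while the median and mode are not.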
23
2.3 The loss function. Examples
2. (Continued) Suppose that X follows a Poisson distribution with
rate λ and that λ has an exponential distribution with mean 1/100.
We observe a sample x = (1,5,3,10). What is the posterior
distribution of λ? Find the Bayesian estimator of λ under
quadratic and all-or-nothing loss. (You may assume that the mode
of Gamma(α,β) is (α – 1)/β; see moodle for how to find the mode.)
From slide 10 the posterior is Gamma(20, 104).
From FB: X ~ Gamma(α,β): E(X) = α/β
Quadratic loss: mean of post. = 20/104 = 0.192
All-or-nothing loss: mode of post. = (20 – 1)/104 = 0.183

3. (Continued) Suppose that X follows a Binomial distribution
Bin(10,p) and that p has a beta distribution Beta(2,3). We observe
a sample x = (5,4,6,3,7). What is the posterior distribution of p?
Find the Bayesian estimator of p under quadratic loss. From slide
11 the posterior is Beta(27, 28).
From FB: X ~ Beta(α,β): E(X) = α/(α+β)
Quadratic loss: mean of post. = 27/(27+28) = 0.49
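The point estimates in these two examples can be reproduced with a few helper functions. This is a sketch; the function names are illustrative, and the Gamma is parameterised by shape and rate as assumed in the examples.

```python
from fractions import Fraction

def gamma_mean(alpha, beta):
    """Mean of Gamma(alpha, beta) with rate beta."""
    return Fraction(alpha, beta)

def gamma_mode(alpha, beta):
    """Mode of Gamma(alpha, beta); valid for alpha >= 1."""
    return Fraction(alpha - 1, beta)

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return Fraction(alpha, alpha + beta)

print(round(float(gamma_mean(20, 104)), 3))  # 0.192 (quadratic loss, example 2)
print(round(float(gamma_mode(20, 104)), 3))  # 0.183 (all-or-nothing loss, example 2)
print(round(float(beta_mean(27, 28)), 2))    # 0.49  (quadratic loss, example 3)
```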
24
Summary
• Bayes Theorem:
\[ P(B_r \mid A) = \frac{P(A \mid B_r)\,P(B_r)}{\sum_i P(A \mid B_i)\,P(B_i)}, \qquad P(A) = \sum_i P(A \mid B_i)\,P(B_i) \]

• Classical Statistics: θ is a fixed unknown constant. Use sample data to estimate θ.

• Bayesian Statistics: θ is regarded as a random variable with a prior distribution.
Uses both data and prior information.

• POSTERIOR ∝ LIKELIHOOD × PRIOR

• A conjugate prior is a prior that belongs to the same family of distributions as the
posterior

• expected posterior loss is
\[ \mathrm{EPL} = E[L(g(\mathbf{x}))] = \int L(g(\mathbf{x}))\, f(\theta \mid \mathbf{x})\, d\theta \]

• quadratic loss L(g(x)) = [g(x) − θ]² (mean of the posterior distribution)
(need to know proof)
• absolute loss L(g(x)) = |g(x) − θ| (median of the posterior distribution)
(need to know proof – except checking minimum)
• all-or-nothing loss L(g(x)) = 0 if g(x) = θ, 1 if g(x) ≠ θ (mode of the posterior distribution)

25
Example Exam Question
(Q5 2007) An insurer models the number of claims, N, in one month from a
particular type of policy with the distribution: P(N = 0) = θ, P(N = 1) = 1 − θ.
Prior beliefs on the parameter θ are represented by a beta distribution with density
function f(θ) = 2(1 − θ), 0 < θ < 1.
There are a total of 12 claims on this policy over an 18-month period. The claims
are assumed to arise independently.
(i) Derive the posterior distribution for θ.
(ii) Determine the Bayesian estimate of θ under all-or-nothing loss. [Hint: the
mode of the Beta distribution is (α − 1)/(α + β − 2)] [7 marks]

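(i) Step 1: Prior: f(θ) = 2(1 − θ) ∝ (1 − θ)

Step 2: Likelihood: n = 18, ΣN_i = 12, distribution is binary (p = 1 − θ)
\[ L \propto (1-\theta)^{12}\,\theta^{6} \]

Step 3: Calc. Posterior: Posterior ∝ likelihood × prior
\[ f(\theta \mid \mathbf{N}) \propto (1-\theta)^{12}\,\theta^{6} \times (1-\theta) = \theta^{6}(1-\theta)^{13} \]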

26
Step 4: Identify the posterior distribution

FB: X ~ Beta(α,β):
\[ f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \propto x^{\alpha-1}(1-x)^{\beta-1} \]
α − 1 = 6, β − 1 = 13
Beta(7,14)
(ii) The Bayesian estimate under all-or-nothing loss is the mode of
the posterior distribution.
\[ \frac{\alpha - 1}{\alpha + \beta - 2} = \frac{7 - 1}{7 + 14 - 2} = \frac{6}{19} = 0.316 \]
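The exam answer can be checked with the same conjugate bookkeeping. This is a sketch: the variable names are illustrative, and it relies on the model's assumption that each month has either 0 or 1 claims.

```python
from fractions import Fraction

months, claims = 18, 12
no_claim_months = months - claims  # months with N = 0, each with probability theta

# f(theta) = 2(1 - theta) is the Beta(1, 2) density.
prior_alpha, prior_beta = 1, 2

# Each N = 0 month contributes a factor theta, each claim a factor (1 - theta).
post_alpha = prior_alpha + no_claim_months
post_beta = prior_beta + claims

# Mode of Beta(alpha, beta): (alpha - 1) / (alpha + beta - 2).
mode = Fraction(post_alpha - 1, post_alpha + post_beta - 2)

print(post_alpha, post_beta)          # 7 14
print(mode, round(float(mode), 3))    # 6/19 0.316
```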

27
Homework

• Read chapter 2 of CT6 notes and go through exercises.

• Go through Posterior Proofs pdf on moodle

• Further reading: Appendix C of Boland (2007)

28
