Chapter 2
Bayesian Statistics
MA501, Statistics for Insurance
Introductory Example
There are 5 unlabelled bags on the table. Four of those bags are of type A and contain
equal numbers of white and black counters. One of the bags is of type B and contains
only white counters.
[Figure: a Type A bag and a Type B bag]
You are asked to choose a bag at random and then pick a counter from that bag (without
seeing its contents). You pull out a white counter. Which type of bag did it most
likely come from?
P(Bag A) = 4/5 P(Bag B) = 1/5
P(White|Bag A) = ½ P(White|Bag B) = 1
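$$P(\text{White}) = P(\text{White and Bag A}) + P(\text{White and Bag B}) = P(\text{White}\mid\text{Bag A})\,P(\text{Bag A}) + P(\text{White}\mid\text{Bag B})\,P(\text{Bag B}) = \tfrac12\cdot\tfrac45 + 1\cdot\tfrac15 = \tfrac35$$
$$P(\text{Bag A}\mid\text{White}) = \frac{P(\text{White and Bag A})}{P(\text{White})} = \frac{P(\text{White}\mid\text{Bag A})\,P(\text{Bag A})}{P(\text{White})} = \frac{\tfrac12\cdot\tfrac45}{\tfrac35} = \frac23$$
$$P(\text{Bag B}\mid\text{White}) = 1 - \tfrac23 = \tfrac13$$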
Answer = Bag A
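The arithmetic above can be checked with exact fractions. Below is a minimal Python sketch using the standard-library `fractions` module; the helper name `posterior` is hypothetical, and the inputs are just the priors and likelihoods given in the example.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(bag | white) for each bag type, plus P(white), via Bayes' theorem."""
    # Joint probabilities P(White and Bag) = P(White | Bag) * P(Bag)
    joint = {k: likelihoods[k] * priors[k] for k in priors}
    # Law of total probability: P(White) is the sum of the joint probabilities
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}, total

priors = {"A": Fraction(4, 5), "B": Fraction(1, 5)}
likelihoods = {"A": Fraction(1, 2), "B": Fraction(1)}
post, p_white = posterior(priors, likelihoods)
print(p_white)                   # 3/5
print(post["A"], post["B"])      # 2/3 1/3
print(max(post, key=post.get))   # A
```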
2.1 Bayes’ Theorem
Problem: How do you “turn round” a conditional probability? How do
you find P(B|A) from P(A|B)?
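Solution: Suppose that a sample space S admits a partition:
$$S = \bigcup_{i=1}^{k} B_i, \qquad B_i \cap B_j = \emptyset, \quad i \neq j,$$
and that P(Bi) > 0, i = 1, ..., k. Then for any event A in S with P(A) > 0:
$$P(B_r \mid A) = \frac{P(A \mid B_r)\,P(B_r)}{\sum_{i=1}^{k} P(A \mid B_i)\,P(B_i)}$$
(For the proof, see the CT6 notes.)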
Example
1. A shoe shop sells shoes from three manufacturers. 50% of the
shoes are made by manufacturer 1, 30% by manufacturer 2 and the
rest are made by manufacturer 3. 1% of the shoes made by
manufacturer 1 are faulty, 2% of the shoes made by manufacturer
2 are faulty and 5% of the shoes made by manufacturer 3 are
faulty. A shoe is found to be faulty. What is the probability that it
comes from:
i) manufacturer 1
ii) manufacturer 2
iii) manufacturer 3
Example
• Let B1 be the event that the shoe was made by manufacturer 1, …
• Let A be the event that the shoe is found to be faulty.
• P(B1) = 0.5, P(B2) = 0.3, P(B3) = 1 – 0.5 – 0.3 = 0.2
• P(A|B1) = 0.01, P(A|B2) = 0.02, P(A|B3) = 0.05
Bayes' Theorem:
$$P(B_r \mid A) = \frac{P(A \mid B_r)\,P(B_r)}{\sum_i P(A \mid B_i)\,P(B_i)}$$
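Applying the formula gives P(A) = 0.005 + 0.006 + 0.010 = 0.021, so P(B1|A) = 5/21 ≈ 0.238, P(B2|A) = 6/21 ≈ 0.286 and P(B3|A) = 10/21 ≈ 0.476. A minimal Python sketch of the same calculation follows; the function name `bayes` is hypothetical and simply evaluates Bayes' Theorem over a finite partition.

```python
def bayes(priors, likelihoods):
    """Posterior P(B_r | A) for a partition B_1, ..., B_k, plus P(A)."""
    # Denominator: P(A) = sum_i P(A | B_i) P(B_i)
    p_a = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / p_a for l, p in zip(likelihoods, priors)], p_a

priors = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3)
likelihoods = [0.01, 0.02, 0.05]  # P(A | B1), P(A | B2), P(A | B3)
post, p_a = bayes(priors, likelihoods)
print(round(p_a, 3))                # 0.021
print([round(q, 3) for q in post])  # [0.238, 0.286, 0.476]
```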
Why Bayes?
Statistics can be divided into two areas: classical (or frequentist)
and Bayesian statistics. Bayesian statistics is based on Bayes'
Theorem.
Consider a random sample X = (X1, X2, ..., Xn) from a population
with density or probability function f(x; θ), where θ is unknown.
We want to make inferences about θ.
Classical Statistics:
– θ is a fixed, unknown constant.
– Only the sample information, X, is used.
Examples: maximum likelihood estimation (MLE) and the method of moments.
Bayesian Statistics:
– θ is regarded as a random variable with a prior distribution.
– Not only the sample information but also non-sample
information from other sources is used.
[Figure: Thomas Bayes (c. 1702 - 1761)]
An advantage of Bayesian statistics is that it allows us to make
use of any additional information that may be available.
An example of its use in insurance:
Suppose an insurance company is reviewing its premium rates
for a particular type of policy. During this process it may have
results from other insurers as well as data from its own
policyholders. These additional data might contain a lot of
information, which should not be ignored.
2.2 Prior and Posterior Distributions
• We require an estimate of the parameter θ.
• θ is assumed to be a random variable, so it has a distribution, f(θ).
This distribution is called the prior distribution, and it can make
use of any prior knowledge of θ.
• We then collect the data, X = (X1, X2, ..., Xn), which is a random
sample from a population with density or probability function
f(x|θ).
• We want to find the distribution of θ given the data, f(θ|X), which
is called the posterior distribution.
• Note that f(X|θ), the joint density of the sample values, is also
known as the likelihood. By Bayes' Theorem
$$f(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta)\, f(\theta)}{f(\mathbf{x})},$$
and the denominator f(x) does not depend on θ, hence:
• POSTERIOR ∝ LIKELIHOOD × PRIOR
Basic Steps
• Step 1 – select a suitable prior distribution
Write down the prior distribution of the unknown
parameter
• Step 2 – determine the likelihood function
Write down the likelihood from the observations
• Step 3 – determine the posterior
POSTERIOR ∝ LIKELIHOOD × PRIOR
• Step 4 – identify the posterior distribution
Either look for a standard distribution that has the same
algebraic form as the posterior,
or integrate out the unknown parameter to find the
normalising constant (not covered in this course).
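Step 4 mentions that the normalising constant could instead be found by integrating out the parameter. Purely as an illustration, the following Python sketch approximates that integral numerically with composite Simpson's rule; the kernel p²(1 − p) on [0, 1] is an arbitrary illustrative choice (not from the notes), and the helper name `simpson` is hypothetical.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# Unnormalised posterior kernel (illustrative): p^2 (1 - p) on [0, 1]
kernel = lambda p: p ** 2 * (1 - p)
const = simpson(kernel, 0.0, 1.0)            # integral of the kernel
posterior_pdf = lambda p: kernel(p) / const  # now integrates to 1
print(round(const, 6))  # 0.083333
```

The constant is 1/12, so the normalised density is 12p²(1 − p), which is the Beta(3, 2) density since Γ(5)/(Γ(3)Γ(2)) = 12. Recognising the standard form, as in the first option of Step 4, avoids the integral altogether.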
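Examples
2. Suppose that X follows a Poisson distribution with rate λ and that λ
has an exponential distribution with mean 1/100. We observe a
sample x = (1, 5, 3, 10). What is the posterior distribution of λ?
Step 1: Prior: λ ~ Exp(100), so $f(\lambda) = 100e^{-100\lambda} \propto e^{-100\lambda}$
Step 2: Likelihood: FB p7, Poisson distribution: $p(x) = \frac{e^{-\mu}\mu^{x}}{x!}$, so
$$L(\lambda) = \prod_{i=1}^{4} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \propto e^{-4\lambda}\lambda^{\sum x_i} = \lambda^{19}e^{-4\lambda}, \qquad \sum x_i = 1 + 5 + 3 + 10 = 19$$
Any terms multiplying the whole expression that do not contain λ can be removed, as we are only interested in λ.
Step 3: Calculate the posterior: posterior ∝ likelihood × prior
$$f(\lambda \mid \mathbf{x}) \propto \lambda^{19}e^{-4\lambda} \times e^{-100\lambda} = \lambda^{19}e^{-104\lambda}$$
Step 4: Identify the posterior distribution. The Gamma(α, β) density is proportional to $x^{\alpha-1}e^{-\beta x}$; with x = λ:
α − 1 = 19 ⇒ α = 20, β = 104
Posterior: Gamma(20, 104)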
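Because the prior and posterior here are both gamma distributions, the four steps reduce to a parameter update. A minimal Python sketch, assuming the rate parametrisation Gamma(α, β) with density proportional to λ^(α−1)e^(−βλ), and using the fact that the exponential prior with mean 1/100 is Exp(100) = Gamma(1, 100); the function name `poisson_gamma_update` is hypothetical.

```python
def poisson_gamma_update(alpha, beta, data):
    """Posterior Gamma(alpha + sum(x), beta + n) for a Poisson rate
    with a Gamma(alpha, beta) prior (beta is a rate parameter)."""
    return alpha + sum(data), beta + len(data)

# Exp(100) prior (mean 1/100) is the same as Gamma(1, 100)
alpha_post, beta_post = poisson_gamma_update(1, 100, [1, 5, 3, 10])
print(alpha_post, beta_post)  # 20 104
```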
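3. Suppose that X follows a Binomial distribution Bin(10, p) and that p
has a beta distribution Beta(2, 3). We observe a sample
x = (5, 4, 6, 3, 7). What is the posterior distribution of p?
FB: X ~ Bin(n, p): $f(x) = \binom{n}{x} p^{x}(1-p)^{n-x}$
Step 1: Prior: $f(p) \propto p^{2-1}(1-p)^{3-1} = p(1-p)^{2}$
Step 2: Likelihood: the 5 observations each have n = 10 trials and Σx_i = 25, so
$$L(p) \propto p^{25}(1-p)^{50-25} = p^{25}(1-p)^{25}$$
Step 3: Calculate the posterior: posterior ∝ likelihood × prior
$$f(p \mid \mathbf{x}) \propto p^{25}(1-p)^{25} \times p(1-p)^{2} = p^{26}(1-p)^{27}$$
Step 4: Identify the posterior distribution. FB: X ~ Beta(α, β):
$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} \propto x^{\alpha-1}(1-x)^{\beta-1}$$
Beta (with x = p): α − 1 = 26 ⇒ α = 27, β − 1 = 27 ⇒ β = 28
Posterior: Beta(27, 28)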
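In the same way, a Beta(α, β) prior combined with n observations from Bin(m, p) updates to Beta(α + Σx, β + nm − Σx). A minimal Python sketch reproducing Example 3; the function name `binomial_beta_update` is hypothetical.

```python
def binomial_beta_update(alpha, beta, data, m):
    """Posterior Beta(alpha + sum(x), beta + n*m - sum(x)) for the
    success probability p of Bin(m, p) with a Beta(alpha, beta) prior."""
    s, n = sum(data), len(data)
    return alpha + s, beta + n * m - s

print(binomial_beta_update(2, 3, [5, 4, 6, 3, 7], m=10))  # (27, 28)
```

With a U(0, 1) prior, which is Beta(1, 1), the same data would give Beta(26, 26).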
Selecting the prior distribution
• In practice, there are two barriers to the application of Bayesian
methods: calculating the posterior and selecting the prior.
• The prior needs to come from a family of distributions suited to our
prior knowledge. In particular, each distribution in that family must
have a range compatible with the possible values of the parameter.
• For example, for the binomial distribution Bin(m, p), the prior
on the parameter p cannot be modelled by a Gamma
distribution, because p has the range [0, 1] while the
Gamma distribution has the range [0, ∞). The prior could instead be
modelled by a Beta distribution, as this has the range [0, 1].
• If, for a given likelihood, the posterior distribution belongs to
the same family as the prior distribution, the prior distribution
is known as a conjugate prior. For example, with a Poisson likelihood
a gamma prior gives a gamma posterior.
Selecting the prior distribution - Example
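4. Suppose that x = (x1, x2, …, xm) is a random sample from a
Bernoulli distribution with parameter p (i.e. X ~ Bin(1, p)). What
family of prior distributions would result in a conjugate prior and
posterior distribution?
FB: Bin dist: $p(x) = \binom{n}{x}p^{x}(1-p)^{n-x}$; Bernoulli: n = 1
$$L(\mathbf{x} \mid p) = \prod_{i=1}^{m} p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{m-\sum x_i}$$
$$\text{post} \propto \text{lik} \times \text{prior} = p^{\sum x_i}(1-p)^{m-\sum x_i} \times f(p)$$
For the posterior to keep the form $p^{A}(1-p)^{B}$, the prior must also have the form $f(p) \propto p^{a}(1-p)^{b}$.
Beta dist: $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} \propto x^{\alpha-1}(1-x)^{\beta-1}$
Answer: the Beta distribution is conjugate.
$$\text{post} \propto p^{\sum x_i}(1-p)^{m-\sum x_i} \times p^{\alpha-1}(1-p)^{\beta-1} = p^{\sum x_i+\alpha-1}(1-p)^{m-\sum x_i+\beta-1}$$
Posterior: Beta(Σx_i + α, m − Σx_i + β)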
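One practical consequence of conjugacy is that the data can be processed one observation at a time: each Beta posterior becomes the prior for the next observation, and the end result matches the batch formula Beta(Σx_i + α, m − Σx_i + β). A minimal Python sketch; the Bernoulli sample, the Beta(2, 3) prior and the function name `beta_bernoulli_update` are all hypothetical choices for illustration.

```python
def beta_bernoulli_update(alpha, beta, x):
    """Update a Beta(alpha, beta) prior with one Bernoulli observation x in {0, 1}."""
    return alpha + x, beta + (1 - x)

data = [1, 0, 0, 1, 1, 0, 1]  # hypothetical Bernoulli sample
a, b = 2.0, 3.0               # hypothetical Beta(2, 3) prior
for x in data:
    # Each posterior becomes the prior for the next observation
    a, b = beta_bernoulli_update(a, b, x)

# Batch result: Beta(sum(x) + alpha, m - sum(x) + beta)
batch = (sum(data) + 2.0, len(data) - sum(data) + 3.0)
print((a, b), batch)  # (6.0, 6.0) (6.0, 6.0)
```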
Selecting the prior distribution
• In some cases it may be useful to use a prior which assumes that
the unknown parameter can take any value, for example if there
is no prior information.
• A prior that assumes that the unknown parameter is equally likely to
take any value is called an uninformative prior.
• The uniform distribution is an uninformative prior, as it gives
equal weight to all possible values.
• Example: Suppose we have a sample from the binomial
distribution with probability p. What would be a suitable
uninformative prior?
The range of p is [0, 1], therefore an uninformative prior is U(0, 1).
• Note that if a parameter takes values in the range (−∞, ∞), U(−∞, ∞) does not
make sense, because its pdf would be the limit of 1/(2N) as N → ∞, which is 0 everywhere.
We can get round this problem by using U(−N, N) for large N, which has pdf
1/(2N) on (−N, N).
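With a U(0, 1) prior the prior density is constant, so the posterior is simply proportional to the likelihood. The Python sketch below illustrates this numerically by normalising the binomial likelihood on a fine grid; it reuses the data of Example 3 as an assumption, and grid approximation is a numerical technique chosen for illustration, not part of the course. The result should agree with Beta(Σx + 1, nm − Σx + 1) = Beta(26, 26), whose mean is 0.5.

```python
import math

data, m = [5, 4, 6, 3, 7], 10  # sample and Bin(m, p) size borrowed from Example 3
s, n = sum(data), len(data)

def log_lik(p):
    """Binomial log-likelihood in p, up to an additive constant."""
    return s * math.log(p) + (n * m - s) * math.log(1 - p)

# Midpoint grid on (0, 1); the U(0, 1) prior density is 1, so posterior ∝ likelihood
K = 20000
grid = [(i + 0.5) / K for i in range(K)]
w = [math.exp(log_lik(p) - log_lik(0.5)) for p in grid]  # rescaled to avoid underflow
total = sum(w)
post_mean = sum(p * wi for p, wi in zip(grid, w)) / total
print(round(post_mean, 4))  # 0.5
```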
Combinations of Likelihoods and Priors
Observations: X1, …, Xn; Σx denotes x1 + … + xn.

| Likelihood | Prior | Posterior |
|---|---|---|
| Poisson(λ) | λ ~ U(0, ∞) | Gamma(Σx + 1, n) |
| Poisson(λ) | λ ~ Exp(β) | Gamma(Σx + 1, n + β) |
| Poisson(λ) | λ ~ Gamma(α, β) | Gamma(Σx + α, n + β) |
| Exp(λ) | λ ~ U(0, ∞) | Gamma(n + 1, Σx) |
| Exp(λ) | λ ~ Exp(β) | Gamma(n + 1, Σx + β) |
| Exp(λ) | λ ~ Gamma(α, β) | Gamma(n + α, Σx + β) |
| Gamma(k, λ) (k known) | λ ~ U(0, ∞) | Gamma(nk + 1, Σx) |
| Gamma(k, λ) (k known) | λ ~ Exp(β) | Gamma(nk + 1, Σx + β) |
| Gamma(k, λ) (k known) | λ ~ Gamma(α, β) | Gamma(nk + α, Σx + β) |
| Weibull(c, γ) (γ known) | c ~ U(0, ∞) | Gamma(n + 1, Σx^γ) |
| Weibull(c, γ) (γ known) | c ~ Exp(β) | Gamma(n + 1, Σx^γ + β) |
| Weibull(c, γ) (γ known) | c ~ Gamma(α, β) | Gamma(n + α, Σx^γ + β) |
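All twelve gamma-posterior rows can be read as one rule if the prior is written as Gamma(α, β): Exp(β) is Gamma(1, β), and the improper U(0, ∞) prior behaves formally like α = 1, β = 0. A minimal Python sketch of that observation; the function name and its `likelihood` strings are hypothetical, and the Weibull row assumes the parametrisation f(x) = cγx^(γ−1)e^(−cx^γ).

```python
def gamma_posterior(likelihood, data, alpha=1.0, beta=0.0, known=None):
    """Posterior Gamma(shape, rate) from the gamma-posterior table.

    A Gamma(alpha, beta) prior covers all three prior columns: U(0, inf)
    formally corresponds to alpha=1, beta=0, and Exp(beta) is Gamma(1, beta).
    `known` is the known shape k (Gamma) or the known gamma (Weibull).
    """
    n = len(data)
    if likelihood == "poisson":
        return sum(data) + alpha, n + beta
    if likelihood == "exp":
        return n + alpha, sum(data) + beta
    if likelihood == "gamma":    # Gamma(k, lambda) with k = known
        return n * known + alpha, sum(data) + beta
    if likelihood == "weibull":  # Weibull(c, gamma) with gamma = known
        return n + alpha, sum(x ** known for x in data) + beta
    raise ValueError(f"unknown likelihood: {likelihood}")

# Example 2: Poisson data with an Exp(100) = Gamma(1, 100) prior
print(gamma_posterior("poisson", [1, 5, 3, 10], alpha=1, beta=100))  # (20, 104)
```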
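Combinations of Likelihoods and Priors (continued)
Observations: X1, …, Xn; Σx denotes x1 + … + xn.

| Likelihood | Prior | Posterior |
|---|---|---|
| N(μ, σ²) (σ² known) | μ ~ U(−∞, ∞) | N(Σx/n, σ²/n) |
| N(μ, σ²) (σ² known) | μ ~ N(μ0, σ0²) | N((μ0/σ0² + Σx/σ²)/(1/σ0² + n/σ²), 1/(1/σ0² + n/σ²)) |
| LogN(μ, σ²) (σ² known) | μ ~ U(−∞, ∞) | N(Σlog(x)/n, σ²/n) |
| Bin(m, p) (m known) | p ~ U(0, 1) | Beta(Σx + 1, nm − Σx + 1) |
| Bin(m, p) (m known) | p ~ Beta(α, β) | Beta(Σx + α, nm − Σx + β) |
| Geo(p) | p ~ U(0, 1) | Beta(n + 1, Σx + 1) |
| Geo(p) | p ~ Beta(α, β) | Beta(n + α, Σx + β) |
| NegBin(k, p) Type 2 (k known) | p ~ U(0, 1) | Beta(nk + 1, Σx + 1) |
| NegBin(k, p) Type 2 (k known) | p ~ Beta(α, β) | Beta(nk + α, Σx + β) |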
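For a N(μ0, σ0²) prior on the mean of a N(μ, σ²) sample with σ² known, the posterior is normal with precision (inverse variance) 1/σ0² + n/σ², and its mean is (μ0/σ0² + Σx/σ²) divided by that precision. A minimal Python sketch of this update; the function name `normal_normal_update` is hypothetical. Letting σ0² become very large mimics the flat U(−∞, ∞) prior and recovers N(Σx/n, σ²/n).

```python
def normal_normal_update(mu0, sigma0_sq, sigma_sq, data):
    """Posterior (mean, variance) for a normal mean with known variance
    sigma_sq and a N(mu0, sigma0_sq) prior."""
    n = len(data)
    precision = 1 / sigma0_sq + n / sigma_sq  # posterior precision
    mu_n = (mu0 / sigma0_sq + sum(data) / sigma_sq) / precision
    return mu_n, 1 / precision

print(normal_normal_update(0.0, 1.0, 1.0, [2.0]))  # (1.0, 0.5)
# A very diffuse prior approaches the flat-prior row: N(mean of x, sigma^2 / n)
print(normal_normal_update(0.0, 1e12, 4.0, [1.0, 2.0, 3.0]))
```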
Identifying Posterior Distributions
• Suppose we have found the posterior distribution of θ, f(θ|x).
• Gamma:
• The posterior follows the gamma distribution Gamma(A + 1, B) if
$f(\theta \mid \mathbf{x}) \propto \theta^{A}e^{-B\theta}$, θ > 0, for constants A > −1 and B > 0.
• Example: what is the Posterior distribution if
Posterior is Gamma(12,20)
• Example: what is the Posterior distribution if
Posterior is
• Beta:
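• The posterior follows the beta distribution Beta(A + 1, B + 1) if
$f(\theta \mid \mathbf{x}) \propto \theta^{A}(1-\theta)^{B}$, 0 < θ < 1, for constants A > −1 and B > −1.
NB: the posterior is never a Binomial distribution, even when the likelihood is binomial.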
• Example: what is the Posterior distribution if
Posterior is Beta(24,16)
• Example: what is the Posterior distribution if
Posterior is
• Normal:
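• The posterior follows the normal distribution N(A, B²) if
$f(\theta \mid \mathbf{x}) \propto \exp\!\left(-\frac{(\theta - A)^{2}}{2B^{2}}\right)$ for constants A and B > 0.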
• Example: what is the Posterior distribution if
• ?
Posterior is N(10,9)
• For the possible combinations in this course, Posterior
distributions will only be Gamma, Beta or Normal
distributions.
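The three identification rules can be written as small helpers. For the normal case the posterior kernel often appears as exp(aθ² + bθ); completing the square gives mean −b/(2a) and variance −1/(2a). A minimal Python sketch; all function names are hypothetical, and the example kernels (θ^11 e^(−20θ), θ^23(1 − θ)^15, and exp(−θ²/18 + 10θ/9)) are illustrative choices that happen to give Gamma(12, 20), Beta(24, 16) and N(10, 9).

```python
def identify_gamma(A, B):
    """theta^A * exp(-B * theta) is the kernel of Gamma(A + 1, B)."""
    return ("Gamma", A + 1, B)

def identify_beta(A, B):
    """theta^A * (1 - theta)^B is the kernel of Beta(A + 1, B + 1)."""
    return ("Beta", A + 1, B + 1)

def identify_normal(a, b):
    """exp(a * theta^2 + b * theta) with a < 0 is the kernel of N(mean, var);
    completing the square gives mean = -b / (2a) and var = -1 / (2a)."""
    if a >= 0:
        raise ValueError("need a < 0 for a normal kernel")
    return ("Normal", -b / (2 * a), -1 / (2 * a))

print(identify_gamma(11, 20))  # ('Gamma', 12, 20)
print(identify_beta(23, 15))   # ('Beta', 24, 16)
print(identify_normal(-1 / 18, 10 / 9))  # mean close to 10, variance close to 9
```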
2.3 The loss function
• In Chapter 1, for statistical games, we defined the Bayes risk as
E[R(d(·), θ)] = E{E[l(d(X), θ)]} and wanted to obtain the decision
function that minimises this risk. This required a loss function, l(d(x), θ).
• Similarly, to obtain an estimator, a loss function
must first be specified.
• Here the estimator is denoted by g(x) and L(g(x)) denotes the loss function.
• A loss function should be zero when the estimate is exactly
correct (i.e. g(x) = θ), and should be positive and non-decreasing as
g(x) gets further away from θ.
• A Bayesian estimator is found by minimising the expected
posterior loss:
$$\text{EPL} = E[L(g(\mathbf{x}))] = \int L(g(\mathbf{x}))\, f(\theta \mid \mathbf{x})\, d\theta$$
Three different types of loss function:
Quadratic loss: L(g(x)) = [g(x) − θ]²
Absolute error loss: L(g(x)) = |g(x) − θ|
All-or-nothing (0/1) loss: L(g(x)) = 0 if g(x) = θ, 1 if g(x) ≠ θ
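For quadratic loss, writing g for g(x), the EPL is
$$\text{EPL} = \int (g - \theta)^{2} f(\theta \mid \mathbf{x})\, d\theta,$$
where (g − θ)² is the loss function and f(θ|x) is the posterior.
The minimum is at:
$$\frac{d}{dg}\text{EPL} = \int 2(g - \theta)\, f(\theta \mid \mathbf{x})\, d\theta = 0 \;\Rightarrow\; g = \int \theta\, f(\theta \mid \mathbf{x})\, d\theta = E(\theta \mid \mathbf{x}),$$
using $\frac{d}{dg}(g-\theta)^{2} = 2(g-\theta)$, $\int f(\theta \mid \mathbf{x})\,d\theta = 1$ and $E(Y) = \int y f(y)\,dy$.
To check that this is a minimum, we differentiate EPL a second time: $\frac{d^{2}}{dg^{2}}\text{EPL} = 2 > 0$.
The Bayesian estimator under quadratic loss is the mean of the
posterior distribution.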
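The result can be checked numerically: estimate the EPL by averaging the quadratic loss over draws from the posterior, then search over candidate estimates. A minimal Monte Carlo sketch in Python, assuming the posterior Gamma(20, 104) of Example 2; note that the standard-library `random.gammavariate(alpha, beta)` takes a shape and a *scale*, so the scale is 1/104. The seed, number of draws and search grid are arbitrary choices.

```python
import random

random.seed(1)
alpha, beta = 20, 104  # posterior Gamma(20, 104) from Example 2 (beta is a rate)

# random.gammavariate(shape, scale) uses a scale parameter, so scale = 1/beta
draws = [random.gammavariate(alpha, 1 / beta) for _ in range(20_000)]

def epl_quadratic(g):
    """Monte Carlo estimate of the expected posterior loss E[(g - theta)^2 | x]."""
    return sum((g - t) ** 2 for t in draws) / len(draws)

# Search the candidate estimates 0.150, 0.151, ..., 0.250
grid = [i / 1000 for i in range(150, 251)]
best = min(grid, key=epl_quadratic)
print(best, round(alpha / beta, 4))  # best lies close to the posterior mean 0.1923
```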
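For absolute loss, the EPL is
$$\text{EPL} = \int_{-\infty}^{g} (g - \theta)\, f(\theta \mid \mathbf{x})\, d\theta + \int_{g}^{\infty} (\theta - g)\, f(\theta \mid \mathbf{x})\, d\theta$$
The minimum is at:
$$\frac{d}{dg}\text{EPL} = \int_{-\infty}^{g} f(\theta \mid \mathbf{x})\, d\theta - \int_{g}^{\infty} f(\theta \mid \mathbf{x})\, d\theta = 0 \;\Rightarrow\; P(\theta \le g \mid \mathbf{x}) = \tfrac12,$$
which specifies the median.
(To check that this is a minimum, we differentiate EPL a second time; details are in the CT6 notes.)
The Bayesian estimator under absolute loss is the median of the
posterior distribution.
The Bayesian estimator under all-or-nothing loss is the mode
of the posterior distribution. (See the CT6 notes for the proof.)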
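A similar numerical check works for the other two loss functions, again assuming the posterior Gamma(20, 104) of Example 2. In the Python sketch below, the Monte Carlo EPL under absolute loss is minimised near the sample median, and for all-or-nothing loss the posterior density (through its log-kernel) is maximised on a fine grid, which should land near the mode (α − 1)/β. Seeds, sample sizes and grids are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)
alpha, beta = 20, 104  # posterior Gamma(20, 104) from Example 2 (beta is a rate)
draws = [random.gammavariate(alpha, 1 / beta) for _ in range(10_001)]

def epl_absolute(g):
    """Monte Carlo estimate of the expected posterior loss E[|g - theta| | x]."""
    return sum(abs(g - t) for t in draws) / len(draws)

grid = [i / 1000 for i in range(150, 251)]
best_abs = min(grid, key=epl_absolute)
print(best_abs, round(statistics.median(draws), 4))  # both near the posterior median

# All-or-nothing loss: maximise the posterior density via its log-kernel
# (alpha - 1) * log(lam) - beta * lam on a fine grid
def log_kernel(lam):
    return (alpha - 1) * math.log(lam) - beta * lam

fine = [i / 100_000 for i in range(1, 100_000)]
mode = max(fine, key=log_kernel)
print(mode, (alpha - 1) / beta)
```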
Examples
2. (Continued) Suppose that X follows a Poisson distribution with
rate λ and that λ has an exponential distribution with mean 1/100.
We observe a sample x = (1, 5, 3, 10). What is the posterior
distribution of λ? Find the Bayesian estimator of λ under
quadratic and all-or-nothing loss. (You may assume that the mode
of Gamma(α, β) is (α − 1)/β; see Moodle for how to find the mode.)
From Example 2 above, the posterior is Gamma(20, 104).
From FB: X ~ Gamma(α, β): E(X) = α/β
Quadratic loss: mean of posterior = 20/104 = 0.192
All-or-nothing loss: mode of posterior = (20 − 1)/104 = 0.183
3. (Continued) Suppose that X follows a Binomial distribution
Bin(10, p) and that p has a beta distribution Beta(2, 3). We observe
a sample x = (5, 4, 6, 3, 7). What is the posterior distribution of p?
Find the Bayesian estimator of p under quadratic loss.
From Example 3 above, the posterior is Beta(27, 28).
From FB: X ~ Beta(α, β): E(X) = α/(α + β)
Quadratic loss: mean of posterior = 27/(27 + 28) = 0.491
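The estimates in both examples come straight from the posterior parameters. A minimal Python sketch; the function names are hypothetical, the gamma formulas assume the rate parametrisation used above, and the beta mode formula (α − 1)/(α + β − 2) requires α, β > 1.

```python
def gamma_estimates(alpha, beta):
    """Posterior mean and mode of Gamma(alpha, beta) (rate parametrisation)."""
    return alpha / beta, (alpha - 1) / beta

def beta_estimates(alpha, beta):
    """Posterior mean and mode of Beta(alpha, beta), assuming alpha, beta > 1."""
    return alpha / (alpha + beta), (alpha - 1) / (alpha + beta - 2)

mean2, mode2 = gamma_estimates(20, 104)  # Example 2
mean3, _ = beta_estimates(27, 28)        # Example 3
print(round(mean2, 3), round(mode2, 3), round(mean3, 3))  # 0.192 0.183 0.491
```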
Summary
• Bayes' Theorem: $P(B_r \mid A) = \dfrac{P(A \mid B_r)\,P(B_r)}{P(A)}$, where $P(A) = \sum_i P(A \mid B_i)\,P(B_i)$
• Classical Statistics: θ is a fixed unknown constant. Sample data are used to estimate θ.
• Bayesian Statistics: θ is regarded as a random variable with a prior distribution.
Uses both data and prior information.
• POSTERIOR ∝ LIKELIHOOD × PRIOR
• A conjugate prior is a prior that belongs to the same family of distributions as the
posterior.
• Expected posterior loss: $\text{EPL} = E[L(g(\mathbf{x}))] = \int L(g(\mathbf{x}))\, f(\theta \mid \mathbf{x})\, d\theta$
• Quadratic loss L(g(x)) = [g(x) − θ]² (mean of the posterior distribution)
(need to know proof)
• Absolute loss L(g(x)) = |g(x) − θ| (median of the posterior distribution)
(need to know proof, except checking the minimum)
• All-or-nothing loss L(g(x)) = 0 if g(x) = θ, 1 if g(x) ≠ θ (mode of the posterior distribution)
Example Exam Question
(Q5 2007) An insurer models the number of claims, N, in one month from a
particular type of policy using the distribution P(N = 0) = θ, P(N = 1) = 1 − θ.
Prior beliefs on the parameter θ are represented by a beta distribution with density
function f(θ) = 2(1 − θ), 0 < θ < 1.
There are a total of 12 claims on this policy over an 18-month period. The claims
are assumed to arise independently.
(i) Derive the posterior distribution for θ.
(ii) Determine the Bayesian estimate of θ under all-or-nothing loss. [Hint: the
mode of the Beta(α, β) distribution is (α − 1)/(α + β − 2).] [7 marks]
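(i) Step 1: Prior: f(θ) = 2(1 − θ) ∝ (1 − θ)
Step 2: Likelihood: n = 18 months and ΣN_i = 12. Each month is binary (Bernoulli with p = 1 − θ), so 12 months have a claim and 6 have none:
$$L(\theta) = (1-\theta)^{12}\theta^{6}$$
Step 3: Calculate the posterior: posterior ∝ likelihood × prior
$$f(\theta \mid \mathbf{x}) \propto (1-\theta)^{12}\theta^{6} \times (1-\theta) = \theta^{6}(1-\theta)^{13}$$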
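Step 4 – Identify the posterior distribution
FB: X ~ Beta(α, β):
$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} \propto x^{\alpha-1}(1-x)^{\beta-1}$$
α − 1 = 6 ⇒ α = 7, β − 1 = 13 ⇒ β = 14
Posterior: Beta(7, 14)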
(ii) The Bayesian estimate under all-or-nothing loss is the mode of
the posterior distribution:
$$\hat{\theta} = \frac{\alpha - 1}{\alpha + \beta - 2} = \frac{7 - 1}{7 + 14 - 2} = \frac{6}{19} = 0.316$$
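The exam solution follows the same Beta/Bernoulli update as Example 4: the Beta(1, 2) prior (density 2(1 − θ)) gains one unit in its first parameter for each claim-free month and one unit in its second for each month with a claim. A minimal Python sketch; the function name `exam_posterior` is hypothetical.

```python
def exam_posterior(months, claims, a0=1, b0=2):
    """Posterior Beta for theta = P(N = 0), starting from the Beta(1, 2)
    prior with density 2(1 - theta): claim-free months add to the first
    parameter, months with a claim add to the second."""
    no_claim = months - claims
    return a0 + no_claim, b0 + claims

a, b = exam_posterior(18, 12)
mode = (a - 1) / (a + b - 2)    # mode of Beta(a, b), valid for a, b > 1
print((a, b), round(mode, 3))   # (7, 14) 0.316
```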
Homework
• Read Chapter 2 of the CT6 notes and work through the exercises.
• Go through the Posterior Proofs PDF on Moodle.
• Further reading: Appendix C of Boland (2007).