BETA-BINOMIAL – BINOMIAL JOINT DISTRIBUTION OF THE SAMPLE (“LIKELIHOOD”), BETA PRIOR
D. Uniform Prior, Binomial Likelihood:
1. Suppose we take a random sample from a Bernoulli distribution with
parameter p. The joint distribution of the sample (the “likelihood” function) is:
$$f_n(\mathbf{x}\mid p)=\prod_{i=1}^{n} f(x_i\mid p)=p^{x_1}(1-p)^{1-x_1}\,p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n}=p^{y}(1-p)^{n-y},\qquad y=\sum_{i=1}^{n}x_i$$
2. Now, suppose our prior distribution of p is simply Uniform on 0 to 1; that is:
$$\xi(p)=\begin{cases}1, & 0<p<1\\ 0, & \text{otherwise}\end{cases}$$
3. Hence the joint distribution of the sample and p is
$$h(x_1,x_2,\ldots,x_n,p)=f_n(x_1,x_2,\ldots,x_n\mid p)\,\xi(p)$$
Or simply, for 0 < p < 1:
$$h(\mathbf{x},p)=f_n(\mathbf{x}\mid p)\,\xi(p)=p^{y}(1-p)^{n-y},\qquad y=\sum_{i=1}^{n}x_i$$
4. The marginal distribution of the sample is:
$$g_n(x_1,\ldots,x_n)=\int_0^1 h(x_1,\ldots,x_n,p)\,dp=\int_0^1 p^{y}(1-p)^{n-y}\,dp=\frac{\Gamma(y+1)\,\Gamma(n-y+1)}{\Gamma(n+2)}$$
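This result follows from the form of the Beta density:
$$f(x\mid\alpha,\beta)=\begin{cases}\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}, & 0<x<1,\ \alpha>0,\ \beta>0\\[4pt] 0, & \text{otherwise}\end{cases}$$
Because this density integrates to 1, $\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx=\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. The integral above is the case α = y + 1 and β = n − y + 1.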
5. So the posterior distribution is:
$$\xi(p\mid x_1,\ldots,x_n)=\frac{f_n(x_1,\ldots,x_n\mid p)\,\xi(p)}{g_n(x_1,\ldots,x_n)}=\frac{\Gamma(n+2)}{\Gamma(y+1)\,\Gamma(n-y+1)}\,p^{y}(1-p)^{n-y}$$
This is a Beta distribution with parameters:
$$\alpha=y+1=\sum_{i=1}^{n}x_i+1\quad\text{and}\quad\beta=n-y+1=n-\sum_{i=1}^{n}x_i+1$$
6. The expected value and variance of the Beta distribution are:
$$E(X)=\frac{\alpha}{\alpha+\beta}\quad\text{and}\quad\mathrm{VAR}(X)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$
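7. Hence, the Bayesian estimates of the mean and variance are:
$$\hat p=\frac{\sum_{i=1}^{n}x_i+1}{n+2}\quad\text{and}\quad\mathrm{VAR}(\hat p)=\frac{\left(\sum_{i=1}^{n}x_i+1\right)\left(n-\sum_{i=1}^{n}x_i+1\right)}{(n+2)^2(n+3)}$$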
8. And the MLE estimates of the mean and variance are:
$$\hat p_{mle}=\frac{\sum_{i=1}^{n}x_i}{n}\quad\text{and}\quad\mathrm{VAR}_{mle}(\hat p)=\frac{\hat p_{mle}(1-\hat p_{mle})}{n}$$
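The uniform-prior steps 1 to 8 can be checked numerically. The sketch below is a Python standard-library sketch under stated assumptions, not the author's method. It assumes a small made-up sample `x` with 8 trials and 5 successes. It first evaluates the likelihood as a product. It then compares the closed-form marginal with a midpoint-rule integral and confirms that likelihood × prior / marginal matches the Beta(y + 1, n − y + 1) density. Finally, it compares the Bayes and MLE estimates. The names `likelihood`, `beta_pdf` and `posterior` are illustrative only.

```python
import math

# Hypothetical 0/1 sample, used only for illustration.
x = [1, 0, 1, 1, 0, 1, 0, 1]
n, y = len(x), sum(x)          # n = 8 trials, y = 5 successes

def likelihood(p):
    """Step 1: product over the sample of p^x_i (1-p)^(1-x_i)."""
    out = 1.0
    for xi in x:
        out *= p**xi * (1 - p)**(1 - xi)
    return out

def beta_pdf(p, a, b):
    """Beta(a, b) density for 0 < p < 1."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p**(a - 1) * (1 - p)**(b - 1)

# Midpoint grid on (0, 1) for numerical integration.
steps = 100_000
grid = [(k + 0.5) / steps for k in range(steps)]

# Step 4: marginal g_n(x) under the Uniform(0, 1) prior, closed form vs. numeric.
marginal_closed = math.gamma(y + 1) * math.gamma(n - y + 1) / math.gamma(n + 2)
marginal_numeric = sum(likelihood(p) for p in grid) / steps

def posterior(p):
    """Step 5: likelihood * prior / marginal, with prior xi(p) = 1 on (0, 1)."""
    return likelihood(p) * 1.0 / marginal_closed

# Steps 6-7: Bayes estimates from the Beta(y + 1, n - y + 1) mean and variance.
a, b = y + 1, n - y + 1
p_bayes = a / (a + b)
var_bayes = a * b / ((a + b)**2 * (a + b + 1))

# Numerical posterior mean and variance, as a check on the step-6 formulas.
post = [posterior(p) for p in grid]
mean_numeric = sum(p * w for p, w in zip(grid, post)) / steps
var_numeric = sum(p * p * w for p, w in zip(grid, post)) / steps - mean_numeric**2

# Step 8: MLE and its variance.
p_mle = y / n
var_mle = p_mle * (1 - p_mle) / n

print("marginal:", marginal_closed, marginal_numeric)
print("posterior at 0.5:", posterior(0.5), beta_pdf(0.5, a, b))
print("Bayes:", p_bayes, var_bayes, "numeric:", mean_numeric, var_numeric)
print("MLE:  ", p_mle, var_mle)
```

For this sample, the Bayes estimate is 6/10 = 0.6 and the MLE is 5/8 = 0.625. The uniform prior pulls the estimate slightly toward 1/2.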
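Note that, as the sample size increases, $\hat p_{bayes}\to\hat p_{mle}$, since
$$\hat p_{bayes}-\hat p_{mle}=\frac{y+1}{n+2}-\frac{y}{n}=\frac{n-2y}{n(n+2)},$$
which is at most $1/(n+2)$ in absolute value.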
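This is also true of the variances. To see this, divide the numerator and denominator by $n^2$; that is:
$$\mathrm{VAR}(\hat p)=\frac{\left(\sum_{i=1}^{n}x_i+1\right)\left(n-\sum_{i=1}^{n}x_i+1\right)}{(n+2)^2(n+3)}=\frac{\left(\hat p_{mle}+\frac{1}{n}\right)\left(1-\hat p_{mle}+\frac{1}{n}\right)}{\left(1+\frac{4}{n}+\frac{4}{n^2}\right)(n+3)}$$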
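So, as the sample size increases, the 1/n terms vanish and the denominator behaves like n. Provided $0<\hat p_{mle}<1$, this gives
$$\frac{\mathrm{VAR}_{bayes}(\hat p)}{\mathrm{VAR}_{mle}(\hat p)}\to 1,$$
even though both variances themselves shrink to 0.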
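To see this convergence numerically, the sketch below holds the MLE fixed at a hypothetical value of 0.3 and lets n grow. For each n it prints the gap between the two means and the ratio of the Bayes (uniform-prior) variance to the MLE variance. The value 0.3, the grid of sample sizes, and the helper name `compare` are illustrative assumptions.

```python
def compare(n, y):
    """Gap between the means and ratio of variances for y successes in n trials."""
    p_mle = y / n
    p_bayes = (y + 1) / (n + 2)
    var_bayes = (y + 1) * (n - y + 1) / ((n + 2)**2 * (n + 3))
    var_mle = p_mle * (1 - p_mle) / n
    return p_bayes - p_mle, var_bayes / var_mle

results = []
for n in (10, 100, 1_000, 10_000, 100_000):
    y = 3 * n // 10                 # keeps the MLE at exactly 0.3
    gap, ratio = compare(n, y)
    results.append((n, gap, ratio))
    print(f"n={n:>7}  mean gap={gap:+.2e}  variance ratio={ratio:.5f}")
```

With the MLE fixed at 0.3, the mean gap equals 0.4/(n + 2). The variance ratio climbs toward 1 from below.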
D. Conjugate Priors (Part 1) – Binomial Joint Distribution of the Sample (“Likelihood function”) and Beta Prior Distribution – Bayesian Computation With R Example of Beta-Binomial
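As before, the joint distribution of the sample (“likelihood function”) is:
$$f_n(\mathbf{x}\mid p)=\prod_{i=1}^{n} f(x_i\mid p)=p^{x_1}(1-p)^{1-x_1}\,p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n}=p^{y}(1-p)^{n-y},\qquad y=\sum_{i=1}^{n}x_i$$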
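and the prior distribution of p is Beta(α, β):
$$\xi(p)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{\alpha-1}(1-p)^{\beta-1},\qquad 0<p<1,\ \alpha,\beta>0$$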
Recall that the joint distribution of the sample and p is equal to the product of the joint
distribution of the sample (“likelihood function”) and the prior distribution of p:
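$$h(x_1,x_2,\ldots,x_n,p)=f_n(\mathbf{x}\mid p)\,\xi(p)=p^{y}(1-p)^{n-y}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{\alpha-1}(1-p)^{\beta-1}=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{y+\alpha-1}(1-p)^{n-y+\beta-1}$$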
To get the marginal distribution of the sample we need to integrate out p.
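$$\begin{aligned}
g_n(\mathbf{x})&=\int_0^1\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{y+\alpha-1}(1-p)^{n-y+\beta-1}\,dp\\
&=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\cdot\frac{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(n+\alpha+\beta)}\int_0^1\frac{\Gamma(n+\alpha+\beta)}{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}\,p^{y+\alpha-1}(1-p)^{n-y+\beta-1}\,dp\\
&=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\cdot\frac{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(n+\alpha+\beta)}
\end{aligned}$$
The last integral equals 1 because its integrand is the Beta(y + α, n − y + β) density.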
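The closed-form marginal can be coded directly. The sketch below works on the log scale with `math.lgamma`, which avoids overflow of the Gamma function when n is large. It checks the result against a midpoint-rule integral of likelihood × prior. The data (n = 20, y = 14), the prior parameters (α = 2, β = 3), and the helper names are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b), where B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def marginal(n, y, alpha, beta):
    """Closed form: [Gamma(a+b)/(Gamma(a)Gamma(b))] * Gamma(y+a)Gamma(n-y+b)/Gamma(n+a+b)."""
    return math.exp(log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta))

def marginal_numeric(n, y, alpha, beta, steps=100_000):
    """Midpoint-rule integral over (0, 1) of likelihood times the Beta(alpha, beta) prior."""
    prior_const = math.exp(-log_beta(alpha, beta))
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) / steps
        prior = prior_const * p**(alpha - 1) * (1 - p)**(beta - 1)
        total += p**y * (1 - p)**(n - y) * prior
    return total / steps

n, y, alpha, beta = 20, 14, 2.0, 3.0   # hypothetical data and prior parameters
print(marginal(n, y, alpha, beta), marginal_numeric(n, y, alpha, beta))
```

With α = β = 1 (the uniform prior), `marginal(n, y, 1, 1)` reduces to Γ(y + 1)Γ(n − y + 1)/Γ(n + 2).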
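And the posterior distribution is:
$$\xi(p\mid\mathbf{x})=\frac{h_n(\mathbf{x},p)}{g_n(\mathbf{x})}=\frac{\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,p^{y+\alpha-1}(1-p)^{n-y+\beta-1}}{\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\cdot\dfrac{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(n+\alpha+\beta)}}=\frac{\Gamma(n+\alpha+\beta)}{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}\,p^{y+\alpha-1}(1-p)^{n-y+\beta-1}$$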
This is a Beta distribution with α* = y + α and β* = n − y + β, so the posterior is:
$$\xi(p\mid\mathbf{x})=\frac{\Gamma(\alpha^*+\beta^*)}{\Gamma(\alpha^*)\Gamma(\beta^*)}\,p^{\alpha^*-1}(1-p)^{\beta^*-1}$$
The mean of the posterior is:
$$E(p\mid\mathbf{x})=\hat p=\frac{\alpha^*}{\alpha^*+\beta^*}=\frac{y+\alpha}{\alpha+\beta+n}$$
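Because the prior is conjugate, the update reduces to adding counts. The sketch below does three things. It implements that update with a hypothetical `update` helper. It checks at a few points that likelihood × prior / marginal matches the Beta(α*, β*) density. It reports the posterior mean. The example values (α = 2, β = 2, n = 20, y = 14) are made up for illustration.

```python
import math

def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(p, a, b):
    """Beta(a, b) density for 0 < p < 1, evaluated on the log scale."""
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta(a, b))

def update(alpha, beta, n, y):
    """Conjugate update: returns (alpha*, beta*) = (y + alpha, n - y + beta)."""
    return y + alpha, n - y + beta

def posterior_by_bayes(p, alpha, beta, n, y):
    """Likelihood times Beta(alpha, beta) prior, divided by the marginal g_n(x)."""
    marginal = math.exp(log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta))
    return p**y * (1 - p)**(n - y) * beta_pdf(p, alpha, beta) / marginal

alpha, beta, n, y = 2, 2, 20, 14        # hypothetical prior and data
a_star, b_star = update(alpha, beta, n, y)
for p in (0.3, 0.6, 0.9):
    print(p, posterior_by_bayes(p, alpha, beta, n, y), beta_pdf(p, a_star, b_star))

post_mean = a_star / (a_star + b_star)  # (y + alpha) / (alpha + beta + n)
print("posterior mean:", post_mean, "prior mean:", alpha / (alpha + beta), "MLE:", y / n)
```

Here the posterior mean 16/24 = 2/3 lies between the prior mean 0.5 and the MLE 0.7. In general, (y + α)/(α + β + n) is a weighted average of α/(α + β) and y/n, with weight (α + β)/(α + β + n) on the prior mean.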
If we take a second sample and use the posterior as our new prior, then
$$\xi_2(p)=\xi_1(p\mid\mathbf{x}_1)=\frac{\Gamma(\alpha^*+\beta^*)}{\Gamma(\alpha^*)\Gamma(\beta^*)}\,p^{\alpha^*-1}(1-p)^{\beta^*-1}$$
and the joint distribution (“likelihood function”) of the second sample is:
$$f_{n_2}(\mathbf{x}_2\mid p)=p^{y_2}(1-p)^{n_2-y_2}$$
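where the subscript gives the sample number. The posterior is the Beta distribution
$$\xi_2(p\mid\mathbf{x}_2)=\frac{\Gamma(\alpha^{**}+\beta^{**})}{\Gamma(\alpha^{**})\Gamma(\beta^{**})}\,p^{\alpha^{**}-1}(1-p)^{\beta^{**}-1}$$
where
$$\alpha^{**}=y_2+\alpha^*=y_2+y_1+\alpha\quad\text{and}\quad\beta^{**}=n_2-y_2+\beta^*=n_2-y_2+n_1-y_1+\beta=n_1+n_2-y_1-y_2+\beta$$
and
$$E(p\mid\mathbf{x}_1,\mathbf{x}_2)=\hat p=\frac{\alpha^{**}}{\alpha^{**}+\beta^{**}}=\frac{y_1+y_2+\alpha}{\alpha+\beta+n_1+n_2}$$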
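As the total sample size gets large, α and β become negligible. After m samples, the posterior mean approaches the MLE:
$$E(p\mid\mathbf{x}_1,\ldots,\mathbf{x}_m)=\frac{\sum_{k=1}^{m}y_k+\alpha}{\alpha+\beta+\sum_{k=1}^{m}n_k}\approx\frac{\sum_{k=1}^{m}y_k}{\sum_{k=1}^{m}n_k}=\frac{y}{n}=\hat p_{mle}$$
where y and n are the total number of successes and the total sample size over all m samples.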
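The sketch below illustrates this convergence with m identical hypothetical samples, each with n_k = 10 trials and y_k = 3 successes, under a hypothetical Beta(2, 5) prior. It uses exact fractions, so the printed gap is not affected by rounding. The MLE stays at 0.3 while the posterior mean approaches it as m grows.

```python
from fractions import Fraction

def posterior_mean_and_mle(samples, alpha, beta):
    """Posterior mean (sum y_k + alpha) / (alpha + beta + sum n_k) and pooled MLE."""
    n_total = sum(n_k for n_k, _ in samples)
    y_total = sum(y_k for _, y_k in samples)
    return Fraction(y_total + alpha, alpha + beta + n_total), Fraction(y_total, n_total)

for m in (1, 10, 100, 1000):
    post, mle = posterior_mean_and_mle([(10, 3)] * m, 2, 5)
    print(f"m={m:>4}  posterior mean={float(post):.6f}  MLE={float(mle):.6f}  gap={float(post - mle):+.2e}")
```

For these inputs the gap is exactly −1/(10(10m + 7)), which shrinks to zero as m grows.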