
Beta-Binomial Distribution Explained

Copyright © All Rights Reserved

BETA-BINOMIAL – BINOMIAL JOINT DISTRIBUTION OF THE SAMPLE (“LIKELIHOOD”), BETA PRIOR

D. Uniform Prior, Binomial Likelihood:

1. Suppose we take a random sample $X_1, X_2, \dots, X_n$ from a Bernoulli distribution with parameter $p$. The joint distribution of the sample (the “likelihood” function) is:

$$f_n(\mathbf{x} \mid p) = \prod_{i=1}^{n} f(x_i \mid p) = p^{x_1}(1-p)^{1-x_1}\, p^{x_2}(1-p)^{1-x_2} \cdots p^{x_n}(1-p)^{1-x_n} = p^{y}(1-p)^{n-y}, \qquad y = \sum_{i=1}^{n} x_i$$
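A minimal Python sketch of the identity above, assuming a hypothetical 0/1 sample: it multiplies the individual Bernoulli probabilities and checks that the product matches the closed form in y and n. The function names are illustrative, not from the original notes.

```python
import math

def bernoulli_likelihood_product(xs, p):
    """Product of the individual Bernoulli pmfs p^x (1-p)^(1-x)."""
    result = 1.0
    for x in xs:
        result *= p**x * (1 - p) ** (1 - x)
    return result

def bernoulli_likelihood_closed_form(xs, p):
    """Closed form p^y (1-p)^(n-y), where y is the number of successes."""
    n = len(xs)
    y = sum(xs)
    return p**y * (1 - p) ** (n - y)

sample = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical data: n = 8, y = 5
for p in (0.2, 0.5, 0.7):
    # Both forms should agree up to floating-point rounding.
    assert math.isclose(bernoulli_likelihood_product(sample, p),
                        bernoulli_likelihood_closed_form(sample, p))
```

Because the closed form depends on the data only through y and n, y is all the later posterior calculations need.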

2. Now, suppose our prior distribution of p is simply Uniform on 0 to 1; that is:

$$\xi(p) = \begin{cases} 1, & 0 < p < 1 \\ 0, & \text{otherwise} \end{cases}$$

3. Hence the joint distribution of the sample and p is

$$h(x_1, x_2, \dots, x_n, p) = f_n(x_1, x_2, \dots, x_n \mid p)\,\xi(p)$$

Or simply:

$$h(\mathbf{x}, p) = f_n(\mathbf{x} \mid p)\,\xi(p) = p^{y}(1-p)^{n-y}, \qquad 0 < p < 1, \quad y = \sum_{i=1}^{n} x_i$$

4. The marginal distribution of the sample is:

$$g_n(x_1, \dots, x_n) = \int_0^1 h(x_1, \dots, x_n, p)\,dp = \int_0^1 p^{y}(1-p)^{n-y}\,dp = \frac{\Gamma(y+1)\,\Gamma(n-y+1)}{\Gamma(n+2)}$$

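This result follows from the form of the Beta distribution, whose density is:

$$f(x \mid \alpha, \beta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, & 0 < x < 1,\ \alpha > 0,\ \beta > 0 \\ 0, & \text{otherwise} \end{cases}$$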

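Here $\alpha = y + 1$ and $\beta = n - y + 1$. Because the density integrates to 1, $\int_0^1 p^{\alpha-1}(1-p)^{\beta-1}\,dp = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$, which is exactly the marginal above.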
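To check the marginal numerically, the sketch below compares the Gamma-function formula with a midpoint-rule approximation of the integral. The sample (n = 8, y = 5), the step count, and the function names are assumptions for illustration.

```python
import math

def marginal_uniform_prior(n, y):
    """g_n(x) = Γ(y+1) Γ(n-y+1) / Γ(n+2) for y successes in n trials."""
    return math.gamma(y + 1) * math.gamma(n - y + 1) / math.gamma(n + 2)

def marginal_numeric(n, y, steps=100_000):
    """Midpoint-rule approximation of the integral of p^y (1-p)^(n-y) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += p**y * (1 - p) ** (n - y)
    return total * h

n, y = 8, 5  # hypothetical sample
print(marginal_uniform_prior(n, y), marginal_numeric(n, y))  # both ≈ 1/504
```

For integer arguments the formula is y!(n − y)!/(n + 1)!, which for n = 8, y = 5 is 5!·3!/9! = 1/504.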

5. The posterior distribution is therefore:

$$\xi(p \mid x_1, \dots, x_n) = \frac{f_n(x_1, \dots, x_n \mid p)\,\xi(p)}{g_n(x_1, \dots, x_n)} = \frac{\Gamma(n+2)}{\Gamma(y+1)\,\Gamma(n-y+1)}\, p^{y}(1-p)^{n-y}$$

This is a Beta distribution with parameters:

$$\alpha = y + 1 = \sum_{i=1}^{n} X_i + 1 \quad\text{and}\quad \beta = n - y + 1 = n - \sum_{i=1}^{n} X_i + 1$$

6. The expected value and variance of the Beta distribution are:

$$E(X) = \frac{\alpha}{\alpha+\beta} \quad\text{and}\quad \mathrm{VAR}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2\,(\alpha+\beta+1)}$$
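The two moment formulas can be written as a small helper; using `fractions.Fraction` keeps the results exact. This is a sketch, and the name `beta_mean_var` is hypothetical.

```python
from fractions import Fraction

def beta_mean_var(alpha, beta):
    """Mean α/(α+β) and variance αβ/((α+β)^2 (α+β+1)) of a Beta(α, β) variable."""
    a, b = Fraction(alpha), Fraction(beta)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

print(beta_mean_var(1, 1))  # uniform prior: mean 1/2, variance 1/12
```

With α = β = 1 (the uniform prior of step 2) it returns mean 1/2 and variance 1/12, the familiar moments of the Uniform(0, 1) distribution.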

7. Hence the Bayesian estimator of $p$ (the posterior mean) and its variance are:

$$\hat p = \frac{\sum_{i=1}^{n} X_i + 1}{n+2} \quad\text{and}\quad \mathrm{VAR}(\hat p) = \frac{\left(\sum_{i=1}^{n} X_i + 1\right)\left(n - \sum_{i=1}^{n} X_i + 1\right)}{(n+2)^2\,(n+3)}$$

8. The MLE of $p$ and its variance are:

$$\hat p_{\text{mle}} = \frac{\sum_{i=1}^{n} X_i}{n} \quad\text{and}\quad \mathrm{VAR}_{\text{mle}}(\hat p) = \frac{\hat p\,(1-\hat p)}{n}$$

Note that, as the sample size increases:

$$\hat p_{\text{bayes}} \to \hat p_{\text{mle}}$$

This is also true of the variances. To see this, divide the numerator and denominator by $n^2$; that is:

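$$\mathrm{VAR}(\hat p) = \frac{\left(\sum_{i=1}^{n} X_i + 1\right)\left(n - \sum_{i=1}^{n} X_i + 1\right)}{(n+2)^2\,(n+3)} = \frac{\left(\hat p_{\text{mle}} + \frac{1}{n}\right)\left(1 - \hat p_{\text{mle}} + \frac{1}{n}\right)}{\left(n + 4 + \frac{4}{n}\right)\left(1 + \frac{3}{n}\right)}$$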

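So, as the sample size increases, the $1/n$ terms vanish: the numerator approaches $\hat p_{\text{mle}}(1 - \hat p_{\text{mle}})$ and the denominator behaves like $n$. Therefore

$$\mathrm{VAR}_{\text{bayes}}(\hat p) \to \mathrm{VAR}_{\text{mle}}(\hat p)$$

in the sense that their ratio tends to 1 (provided $\hat p_{\text{mle}}$ stays away from 0 and 1).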
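The convergence of the variances can be checked numerically. The sketch below holds the observed proportion at a hypothetical 30% (y = 3n/10, rounded down), lets n grow, and prints the ratio of the Bayes variance from step 7 to the MLE variance from step 8. The function names are illustrative.

```python
def bayes_var_uniform_prior(n, y):
    """Posterior variance of p under a Uniform(0, 1) prior (step 7)."""
    return (y + 1) * (n - y + 1) / ((n + 2) ** 2 * (n + 3))

def mle_var(n, y):
    """Estimated MLE variance p_hat (1 - p_hat) / n with p_hat = y / n (step 8)."""
    p_hat = y / n
    return p_hat * (1 - p_hat) / n

# Keep the observed proportion fixed at 30% and let n grow.
for n in (10, 100, 1_000, 10_000, 100_000):
    y = 3 * n // 10
    ratio = bayes_var_uniform_prior(n, y) / mle_var(n, y)
    print(n, round(ratio, 5))
```

For n = 10 the ratio is about 0.81; by n = 100,000 it differs from 1 only in the fifth decimal place, matching the limit argument above.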

D. Conjugate Priors (Part 1) – Binomial Joint Distribution of the Sample (“Likelihood Function”) and Beta Prior Distribution – Bayesian Computation With R Example of Beta-Binomial

$$f_n(\mathbf{x} \mid p) = \prod_{i=1}^{n} f(x_i \mid p) = p^{x_1}(1-p)^{1-x_1}\, p^{x_2}(1-p)^{1-x_2} \cdots p^{x_n}(1-p)^{1-x_n} = p^{y}(1-p)^{n-y}, \qquad y = \sum_{i=1}^{n} x_i$$

$$\xi(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}, \qquad 0 < p < 1,\ \alpha, \beta > 0$$

Recall that the joint distribution of the sample and $p$ is equal to the product of the joint distribution of the sample (“likelihood function”) and the prior distribution of $p$:

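$$h(x_1, x_2, \dots, x_n, p) = f_n(\mathbf{x} \mid p)\,\xi(p) = p^{y}(1-p)^{n-y}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{y+\alpha-1}(1-p)^{n-y+\beta-1}$$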

To get the marginal distribution of the sample we need to integrate out p.

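$$
\begin{aligned}
g_n(\mathbf{x}) &= \int_0^1 \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{y+\alpha-1}(1-p)^{n-y+\beta-1}\,dp \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \cdot \frac{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(n+\alpha+\beta)} \int_0^1 \frac{\Gamma(n+\alpha+\beta)}{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}\, p^{y+\alpha-1}(1-p)^{n-y+\beta-1}\,dp \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \cdot \frac{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(n+\alpha+\beta)}
\end{aligned}
$$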
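A sketch of the marginal with a general Beta(α, β) prior. It works on the log scale with `math.lgamma`, since `math.gamma` overflows for large arguments, and cross-checks the closed form against a midpoint-rule integral. The prior values, the sample, and the function names are assumptions; the numerical check is only reliable for α, β ≥ 1, where the integrand is bounded.

```python
import math

def log_marginal_beta_prior(n, y, alpha, beta):
    """log g_n(x) = log[Γ(α+β)/(Γ(α)Γ(β)) · Γ(y+α)Γ(n-y+β)/Γ(n+α+β)]."""
    return (math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
            + math.lgamma(y + alpha) + math.lgamma(n - y + beta)
            - math.lgamma(n + alpha + beta))

def marginal_numeric_beta_prior(n, y, alpha, beta, steps=100_000):
    """Midpoint-rule approximation of the integral of h(x, p) over p in (0, 1)."""
    const = math.exp(math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta))
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += p ** (y + alpha - 1) * (1 - p) ** (n - y + beta - 1)
    return const * total * h

# Hypothetical Beta(2, 3) prior and a sample with y = 5 successes in n = 8 trials.
print(math.exp(log_marginal_beta_prior(8, 5, 2, 3)),
      marginal_numeric_beta_prior(8, 5, 2, 3))  # both ≈ 1/462
```

With α = β = 1 the formula reduces to the uniform-prior marginal (1/504 for n = 8, y = 5); with α = 2, β = 3 the exact value is 1/462.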

The posterior distribution is:

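$$\xi(p \mid \mathbf{x}) = \frac{h_n(\mathbf{x}, p)}{g_n(\mathbf{x})} = \frac{\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{y+\alpha-1}(1-p)^{n-y+\beta-1}}{\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \cdot \dfrac{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}{\Gamma(n+\alpha+\beta)}} = \frac{\Gamma(n+\alpha+\beta)}{\Gamma(y+\alpha)\,\Gamma(n-y+\beta)}\, p^{y+\alpha-1}(1-p)^{n-y+\beta-1}$$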

This is a Beta distribution with $\alpha^* = y + \alpha$ and $\beta^* = n - y + \beta$, so the posterior is:

$$\xi(p \mid \mathbf{x}) = \frac{\Gamma(\alpha^*+\beta^*)}{\Gamma(\alpha^*)\,\Gamma(\beta^*)}\, p^{\alpha^*-1}(1-p)^{\beta^*-1}$$

The mean of the posterior is:

$$E(X) = \hat p = \frac{\alpha^*}{\alpha^* + \beta^*} = \frac{y + \alpha}{\alpha + \beta + n}$$
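The conjugate update itself is just addition. A minimal sketch, with a hypothetical Beta(2, 3) prior and a sample of y = 5 successes in n = 8 trials; the function names are illustrative.

```python
def conjugate_update(alpha, beta, n, y):
    """Beta(α, β) prior + y successes in n trials -> Beta(α + y, β + n - y) posterior."""
    return alpha + y, beta + n - y

def beta_mean(alpha, beta):
    """Mean α / (α + β) of a Beta(α, β) distribution."""
    return alpha / (alpha + beta)

alpha_star, beta_star = conjugate_update(2, 3, n=8, y=5)
print(alpha_star, beta_star, beta_mean(alpha_star, beta_star))  # posterior Beta(7, 6), mean 7/13
```

Setting α = β = 1 recovers the uniform-prior results of the previous section: Beta(y + 1, n − y + 1) with mean (y + 1)/(n + 2).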

If we take a second sample and use this posterior as our new prior, then:

$$\xi_2(p) = \xi_1(p \mid \mathbf{x}_1) = \frac{\Gamma(\alpha^*+\beta^*)}{\Gamma(\alpha^*)\,\Gamma(\beta^*)}\, p^{\alpha^*-1}(1-p)^{\beta^*-1}$$

and the joint distribution of the second sample (the “likelihood function”) is:

$$f_{n_2}(\mathbf{x}_2 \mid p) = p^{y_2}(1-p)^{n_2-y_2}$$

where the subscript gives the sample number. The posterior is the Beta distribution

$$\xi_2(p \mid \mathbf{x}_2) = \frac{\Gamma(\alpha^{**}+\beta^{**})}{\Gamma(\alpha^{**})\,\Gamma(\beta^{**})}\, p^{\alpha^{**}-1}(1-p)^{\beta^{**}-1}$$

where

$$\alpha^{**} = y_2 + \alpha^* = y_2 + y_1 + \alpha \quad\text{and}$$
$$\beta^{**} = n_2 - y_2 + \beta^* = n_2 - y_2 + n_1 - y_1 + \beta = n_1 + n_2 - y_1 - y_2 + \beta$$

and

$$E(X) = \hat p = \frac{\alpha^{**}}{\alpha^{**} + \beta^{**}} = \frac{y_1 + y_2 + \alpha}{\alpha + \beta + n_1 + n_2}$$

As the total sample size grows, this converges to the MLE:

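$$E(X) \approx \frac{\sum_{k=1}^{m} y_k}{\sum_{k=1}^{m} n_k} = \frac{y}{n} = \hat p_{\text{mle}}$$

where $m$ is the number of samples, $y$ is the total number of successes, and $n$ is the total sample size.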
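A simulation sketch of this convergence: it draws 200 hypothetical samples of 50 Bernoulli(0.3) trials each with a fixed seed, pools the first m of them, and compares the posterior mean with the pooled MLE. The true p, the Beta(2, 3) prior, the sample sizes, and the function name are all assumptions.

```python
import random

def pooled_estimates(samples, alpha, beta):
    """Posterior mean and MLE of p after pooling samples given as (n_k, y_k) pairs."""
    n = sum(n_k for n_k, _ in samples)
    y = sum(y_k for _, y_k in samples)
    bayes = (y + alpha) / (alpha + beta + n)
    mle = y / n
    return bayes, mle

rng = random.Random(0)   # fixed seed so the run is reproducible
true_p = 0.3             # hypothetical true success probability
alpha, beta = 2, 3       # hypothetical Beta prior
samples = []
for _ in range(200):     # 200 samples of n_k = 50 trials each
    n_k = 50
    y_k = sum(rng.random() < true_p for _ in range(n_k))
    samples.append((n_k, y_k))

for m in (1, 10, 200):
    bayes, mle = pooled_estimates(samples[:m], alpha, beta)
    print(m, round(bayes, 4), round(mle, 4), abs(bayes - mle))
```

The gap can be written exactly as |α − (α + β) p̂_mle| / (α + β + n), which is at most max(α, β)/(α + β + n), so it shrinks like 1/n whatever the data.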
