Understanding Binomial Distribution

The document provides an overview of the binomial distribution, explaining its characteristics, applications, and the mathematical formulas used to calculate probabilities. It discusses random variables, Bernoulli trials, and includes practical examples such as the treatment of kidney cancer to illustrate how to apply binomial probability. Additionally, it covers the expected value and variance for binomial distributions, along with practice problems for further understanding.

Binomial Distribution

Probability distributions
• We use probability distributions because they fit lots of data in the real world.

[Figure: histogram of height (cm) of Hypericum cumulicola, 1996. Mean = 35.3, Std. Dev = 14.76, N = 713.]


Random Variables

Variable: measurable characteristic

Random Variable: variable that can have different outcomes of an experiment, determined by chance

Examples:
•X = outcome of roll of a die,
•Y = outcome of a coin toss,
•Z = height
Random Variables

• Types:

– Discrete: Bernoulli, Binomial, Poisson

– Continuous: Exponential, Normal


Random Variables - Bernoulli
When outcomes of experiment are binary

Dichotomous (Bernoulli): X = 0 or 1

P(X=1) = p

P(X=0) = 1-p

e.g. Heads, Tails


True, False
Success, Failure
The Binomial Distribution
Bernoulli Random Variables

• Imagine a simple trial with only two possible outcomes


– Success (S)
– Failure (F)

• Examples
– Toss of a coin (heads or tails)
– Sex of a newborn (male or female)
– Survival of an organism in a region (live or die)
The Binomial Distribution
Overview

• Suppose that the probability of success is p

• What is the probability of failure?


– q=1–p

• Examples
– Toss of a coin (S = head): p = 0.5 ⇒ q = 0.5
– Roll of a die (S = 1): p = 0.1667 ⇒ q = 0.8333
– Fertility of a chicken egg (S = fertile): p = 0.8 ⇒ q = 0.2
The Binomial Distribution
Overview

• Imagine that a trial is repeated n times

• Examples
– A coin is tossed 5 times
– A die is rolled 25 times
– 50 chicken eggs are examined

• Assume p remains constant from trial to trial and that the trials are statistically independent of each other
The Binomial Distribution
Overview

• What is the probability of obtaining x successes in n trials?

• Example
– What is the probability of obtaining 2 heads from a coin
that was tossed 5 times?

P(HHTTT) = (1/2)^5 = 1/32


The Binomial Distribution
Overview

• But there are more possibilities:

HHTTT HTHTT HTTHT HTTTH


THHTT THTHT THTTH
TTHHT TTHTH
TTTHH

P(2 heads) = 10 × 1/32 = 10/32
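
The count of 10 arrangements can be checked by brute force. The following is a minimal sketch that lists every sequence of 5 tosses and keeps those with exactly 2 heads; the variable names are illustrative and not part of the original slides.

```python
from itertools import product

# Enumerate all 2**5 equally likely sequences of 5 fair coin tosses.
sequences = ["".join(s) for s in product("HT", repeat=5)]

# Keep only the sequences that contain exactly 2 heads.
two_heads = [s for s in sequences if s.count("H") == 2]

n_total = len(sequences)
n_two_heads = len(two_heads)
prob_two_heads = n_two_heads / n_total  # each sequence has probability 1/32

print(n_two_heads, "of", n_total, "sequences ->", prob_two_heads)
```

Enumeration only works for small n, since the number of sequences doubles with every extra toss.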


The Binomial Distribution
Overview

• In general, if trials result in a series of successes and failures,

FFSFFFFSFSFSSFFFFFSF…

then the probability of x successes in that order is

P(x) = q ⋅ q ⋅ p ⋅ q ⋅ …
     = p^x ⋅ q^{n – x}
The Binomial Distribution
Overview

• However, if order is not important, then

P(x) = \frac{n!}{x!(n – x)!} p^x ⋅ q^{n – x}

where \frac{n!}{x!(n – x)!} is the number of ways to obtain x successes in n trials, and i! = i ⋅ (i – 1) ⋅ (i – 2) ⋅ … ⋅ 2 ⋅ 1
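
As a rough illustration, the formula translates almost directly into code. This sketch assumes Python's standard `math.comb` for the number of ways to choose x successes out of n trials; the function name `binomial_pmf` is a hypothetical helper, not something defined in the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p                               # probability of failure
    return comb(n, x) * p**x * q**(n - x)   # (number of orderings) * p^x * q^(n-x)

# The coin example: 2 heads in 5 tosses of a fair coin.
print(binomial_pmf(2, 5, 0.5))
```

Summing `binomial_pmf(x, n, p)` over x = 0, …, n should give 1 (up to floating-point rounding), which is a quick sanity check on any implementation.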
Binomial Probability Distribution
A binomial random variable X is defined to be the number
of “successes” in n independent trials where
P(“success”) = p is constant.
Notation: X ~ BIN(n,p)
In the definition above notice the following conditions
need to be satisfied for a binomial experiment:
1. There is a fixed number of n trials carried out.
2. The outcome of a given trial is either a “success”
or “failure”.
3. The probability of success (p) remains constant
from trial to trial.
4. The trials are independent, the outcome of a trial is
not affected by the outcome of any other trial.
Binomial Distribution
• If X ~ BIN(n, p), then
P(X = x) = \binom{n}{x} p^x (1 − p)^{n − x} = \frac{n!}{x!(n − x)!} p^x (1 − p)^{n − x},  x = 0, 1, ..., n.
• where
n! = n × (n − 1) × (n − 2) × ... × 1, also 0! = 1 and 1! = 1
\binom{n}{x} = "n choose x" = the number of ways to obtain x "successes" in n trials.
P("success") = p
Binomial Distribution
• If X ~ BIN(n, p), then
P(X = x) = \binom{n}{x} p^x (1 − p)^{n − x} = \frac{n!}{x!(n − x)!} p^x (1 − p)^{n − x},  x = 0, 1, ..., n.
• E.g. when n = 3 and p = .50 there are 8 possible
equally likely outcomes (e.g. flipping a coin)
SSS SSF SFS FSS SFF FSF FFS FFF
X=3 X=2 X=2 X=2 X=1 X=1 X=1 X=0
P(X=3)=1/8, P(X=2)=3/8, P(X=1)=3/8, P(X=0)=1/8
• Now let’s use the binomial probability formula instead…
Binomial Distribution
• If X ~ BIN(n, p), then
P(X = x) = \binom{n}{x} p^x (1 − p)^{n − x} = \frac{n!}{x!(n − x)!} p^x (1 − p)^{n − x},  x = 0, 1, ..., n.
• E.g. when n = 3, p = .50 find P(X = 2)
\binom{3}{2} = \frac{3!}{2!(3 − 2)!} = \frac{3!}{2!1!} = \frac{3 ⋅ 2 ⋅ 1}{(2 ⋅ 1) ⋅ 1} = 3 ways: SSF, SFS, FSS
P(X = 2) = \binom{3}{2} (.5)^2 (.5)^{3 − 2} = 3(.5^2)(.5^1) = .375 or 3/8
Example: Treatment of Kidney Cancer
• Suppose we have n = 40 patients who will be
receiving an experimental therapy which is
believed to be better than current treatments
which historically have had a 5-year survival
rate of 20%, i.e. the probability of 5-year
survival is
p = .20.
• Thus the number of patients out of 40 in our
study surviving at least 5 years has a binomial
distribution, i.e. X ~ BIN(40,.20).
Example: Treatment of Kidney Cancer

• Suppose that using the new treatment we find
that 16 out of the 40 patients survive at least
5 years past diagnosis.
• Q: Does this result suggest that the new
therapy has a better 5-year survival rate than
the current, i.e. is the probability that a
patient survives at least 5 years greater than
.20 or a 20% chance when treated using the
new therapy?
What do we consider in answering the
question of interest?
We essentially ask ourselves the following:
• If we assume that new therapy is no better
than the current what is the probability we
would see these results by chance variation
alone?

• More specifically what is the probability of
seeing 16 or more successes out of 40 if the
success rate of the new therapy is .20 or 20%
as well?
Example: Treatment of Kidney Cancer
• X ~ BIN(40,.20), find the probability that exactly 16
patients survive at least 5 years.
P(X = 16) = \binom{40}{16} (.20)^{16} (.80)^{24} = .001945
• Keep in mind that we need to find the probability of
having 16 or more patients surviving at least 5 yrs.
Example: Treatment of Kidney Cancer
• So we actually need to find:
P(X ≥ 16) = P(X = 16) + P(X = 17) + … + P(X = 40)
  P(X = 16) = \binom{40}{16} (.20)^{16} (.80)^{24} = .001945
+ P(X = 17) = \binom{40}{17} (.20)^{17} (.80)^{23} = .000686
+ …
+ P(X = 40) = \binom{40}{40} (.20)^{40} (.80)^{0} ≈ 0
= .002936
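
Adding 25 terms by hand is tedious, so a short loop is a natural way to check the total. This is a sketch only: `binomial_pmf` is a hypothetical helper built on the standard `math.comb`, and the variable names are not from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 40, 0.20  # 40 patients, historical 5-year survival rate of 20%

p_exactly_16 = binomial_pmf(16, n, p)
# Upper tail: 16, 17, ..., 40 survivors.
p_16_or_more = sum(binomial_pmf(x, n, p) for x in range(16, n + 1))

print(round(p_exactly_16, 6), round(p_16_or_more, 6))
```

The two rounded values can be compared with the .001945 and .002936 quoted in the example.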
Conclusion
• Because it is highly unlikely (p = .0029) that
we would see this many successes in a group
of 40 patients if the new method had the
same probability of success as the current
method, we have to make a choice, either …
A) we have obtained a very rare result by dumb
luck.
OR
B) our assumption about the success rate of the
new method is wrong and in actuality the new
method has a better than 20% 5-year survival
rate making the observed result more
plausible.
All probability distributions are characterized by an
expected value and a variance:

If X follows a binomial distribution with
parameters n and p: X ~ Bin (n, p)
Then:
µ_x = E(X) = np
σ_x^2 = Var(X) = np(1 − p)
σ_x = SD(X) = \sqrt{np(1 − p)}

Note: the variance will always lie between 0 and .25n, because
p(1 − p) reaches its maximum at p = .5, where p(1 − p) = .25
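
One way to convince yourself of these formulas is to compute the mean and variance directly from the probabilities and compare with np and np(1 − p). The sketch below does this for the BIN(40, .20) example; `binomial_moments` is an illustrative name introduced here, not part of the slides.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """Mean and variance computed directly from the binomial probabilities."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, var

n, p = 40, 0.20
mean, var = binomial_moments(n, p)

# Compare with the closed forms np and np(1 - p).
print(mean, n * p)
print(var, n * p * (1 - p))
print(sqrt(var))
```

The direct sums and the closed forms should agree up to floating-point rounding.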
Characteristics of Bernoulli
distribution
For Bernoulli (n=1)
E(X) = p
Var (X) = p(1-p)
Things that follow a binomial
distribution…

Cohort study (or cross-sectional):


– The number of exposed individuals in your sample that
develop the disease
– The number of unexposed individuals in your sample that
develop the disease
Case-control study:
– The number of cases that have had the exposure
– The number of controls that have had the exposure
Practice problems
• 1. You are performing a cohort study. If the
probability of developing disease in the exposed
group is .05 for the study duration, then if you
sample (randomly) 500 exposed people, how many
do you expect to develop the disease? Give a margin
of error (+/- 1 standard deviation) for your estimate.

• 2. What’s the probability that at most 10 exposed
people develop the disease?
Answer
1. You are performing a cohort study. If the probability of
developing disease in the exposed group is .05 for the study
duration, then if you sample (randomly) 500 exposed people,
how many do you expect to develop the disease? Give a
margin of error (+/- 1 standard deviation) for your estimate.

X ~ binomial (500, .05)


E(X) = 500 (.05) = 25
Var(X) = 500 (.05) (.95) = 23.75
StdDev(X) = square root (23.75) = 4.87
∴25 ± 4.87
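
The same arithmetic as a short sketch, using the closed forms E(X) = np and Var(X) = np(1 − p); the variable names are illustrative.

```python
from math import sqrt

n, p = 500, 0.05  # 500 exposed people, disease probability .05

expected = n * p              # E(X) = np
variance = n * p * (1 - p)    # Var(X) = np(1 - p)
std_dev = sqrt(variance)      # SD(X) = sqrt(np(1 - p))

print(f"{expected:.0f} +/- {std_dev:.2f} (variance {variance:.2f})")
```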
Answer
2. What’s the probability that at most 10 exposed subjects develop
the disease?

This is asking for a CUMULATIVE PROBABILITY: the probability of 0 getting the
disease or 1 or 2 or 3 or 4 or up to 10.

P(X≤10) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + … + P(X=10)
= \binom{500}{0} (.05)^0 (.95)^{500} + \binom{500}{1} (.05)^1 (.95)^{499} + \binom{500}{2} (.05)^2 (.95)^{498} + ... + \binom{500}{10} (.05)^{10} (.95)^{490} < .01
Pascal’s Triangle Trick
You’ll rarely calculate the binomial by hand. However, it is good to
know how to …

Pascal’s Triangle Trick for calculating binomial coefficients


Recall from math in your past that Pascal’s Triangle is used to get
the coefficients for binomial expansion…
For example, to expand: (p + q)^5
The powers follow a set pattern: p^5 + p^4 q^1 + p^3 q^2 + p^2 q^3 + p^1 q^4 + q^5
But what are the coefficients?
– Use Pascal’s Magic Triangle…
Pascal’s Triangle

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1

• Edges are all 1’s.
• Add the two numbers in the row above to get the number below, e.g.: 3+1=4; 5+10=15.
• To get the coefficients for expanding to the 5th power, use the row that starts 1 5.

(p + q)^5 = 1p^5 + 5p^4 q^1 + 10p^3 q^2 + 10p^2 q^3 + 5p^1 q^4 + 1q^5
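
The "add the two numbers above" rule can be sketched as a short function. `pascal_rows` is an illustrative name; the final check against `math.comb` is added here to show that the triangle entries are the binomial coefficients.

```python
from math import comb

def pascal_rows(n_rows):
    """Build the first n_rows rows of Pascal's triangle."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries above it.
        inner = [a + b for a, b in zip(prev, prev[1:])]
        rows.append([1] + inner + [1])  # edges are all 1's
    return rows

rows = pascal_rows(8)
for row in rows:
    print(*row)

# Row n (counting the top row as row 0) holds the coefficients of (p + q)^n.
print(rows[5] == [comb(5, x) for x in range(6)])
```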


The Binomial Distribution
Overview
[Figure: five bar charts of binomial probabilities for x = 0, 1, ..., 5, with panel titles Bin(0.1, 5), Bin(0.3, 5), Bin(0.5, 5), Bin(0.7, 5), Bin(0.9, 5)]
Poisson distribution
In a Binomial distribution, if n is large and p is small, the distribution is
approximately a Poisson distribution with mean m = np.

Ex 1.
A sample of size 1000 is drawn from the population.
Then X ~ Bin (1000, p)
n = 1000 large, p = 0.01 small
m = np = 1000 × 0.01 = 10
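
To get a feel for how close the approximation is, one can compare the two sets of probabilities side by side. This is a sketch for the n = 1000, p = 0.01 example; the function names and the choice of x values are illustrative.

```python
from math import comb, exp, factorial

n, p = 1000, 0.01
m = n * p  # Poisson mean

def binom_pmf(x):
    """Exact binomial probability P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    """Poisson probability e^(-m) m^x / x! with mean m = np."""
    return exp(-m) * m**x / factorial(x)

for x in (0, 5, 10, 15, 20):
    print(x, round(binom_pmf(x), 5), round(poisson_pmf(x), 5))
```

The two columns should be close but not identical; the approximation generally improves as n grows and p shrinks with np held fixed.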
Practice problem
A manufacturer, who produces medicine bottles, finds that 0.1% of the
bottles are defective. The bottles are packed in boxes containing 500
bottles. A drug manufacturer buys 100 boxes from the producer of
bottles. Using Poisson distribution, find how many boxes will contain
at least two defective bottles.
Answer
Let X be the Poisson variate, “the number of defective bottles in a box”.
Here, the number of bottles in a box is n = 500,
and the probability of a bottle being defective is
p = 0.1% = 0.1/100 = 0.001
so that
m = np = 500 × 0.001 = 0.5
Using Poisson distribution, we have
P[X = x] = \frac{e^{−m} m^x}{x!} = \frac{e^{−0.5} (0.5)^x}{x!}
Therefore, the probability that a box contains at least two defective bottles
= P[X ≥ 2] = 1 − P[X < 2] = 1 − (P[X = 0] + P[X = 1])
= 1 − \frac{e^{−0.5} (0.5)^0}{0!} − \frac{e^{−0.5} (0.5)^1}{1!}
= 1 − e^{−0.5} (1 + 0.5) = 1 − 0.6065 × 1.5 = 0.09025

Hence, the expected number of boxes containing at least two defective bottles
= N ⋅ P[X ≥ 2] = (100)(0.09025) = 9.025, i.e. about 9 boxes.
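
The same calculation as a short sketch, using the unrounded value of e^(−0.5) rather than 0.6065, so the last digits differ slightly from the hand calculation. Variable names are illustrative.

```python
from math import exp

n_bottles = 500      # bottles per box
p_defective = 0.001  # 0.1% of bottles are defective
n_boxes = 100

m = n_bottles * p_defective  # Poisson mean per box

# P[X >= 2] = 1 - P[X = 0] - P[X = 1] = 1 - e^(-m) * (1 + m)
p_at_least_two = 1 - exp(-m) * (1 + m)
expected_boxes = n_boxes * p_at_least_two

print(round(p_at_least_two, 4), round(expected_boxes, 2))
```

Either way the answer rounds to about 9 boxes out of 100.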
