Prob & Statistics

The document defines joint probability distributions, including both discrete and continuous cases, and explains their properties and examples. It covers concepts such as marginal distributions, conditional probabilities, independence, covariance, and provides solved examples for clarity. Key points include the relationship between joint distributions and independence, as well as the calculation of expectations and variances.

Uploaded by

vineshkotwani2

JOINT DISTRIBUTION & TYPES OF DISTRIBUTIONS

GHULAM MUSTAFA

February 20, 2026


Definition of Joint Probability Distribution

Definition
A joint probability distribution gives the probability that two or
more random variables take particular values (or fall in particular
ranges) simultaneously. It provides a complete description of the
probabilistic relationship between random variables defined on the
same sample space.
Joint Probability Mass Function (PMF)
For two discrete random variables X and Y:
Joint PMF

\[ p(x, y) = P(X = x \text{ and } Y = y) \]

Properties:
▶ $p(x, y) \ge 0$ for all $(x, y)$
▶ $\sum_x \sum_y p(x, y) = 1$

Example
For rolling two fair dice:
\[ p(x, y) = \frac{1}{36} \quad \text{for all } x, y \in \{1, 2, 3, 4, 5, 6\} \]
Example: Rolling Two Fair Dice
Solution:
▶ Let X = outcome of first die, Y = outcome of second die. The sample space has 36 equally likely outcomes.
▶ Joint Probability Mass Function (PMF):
\[ p_{X,Y}(x, y) = P(X = x, Y = y) = \frac{1}{36}, \quad x, y = 1, \dots, 6. \]
▶ Marginal PMFs:
\[ p_X(x) = \sum_{y=1}^{6} p_{X,Y}(x, y) = \sum_{y=1}^{6} \frac{1}{36} = \frac{6}{36} = \frac{1}{6}, \quad x = 1, \dots, 6. \]
Similarly, $p_Y(y) = \frac{1}{6}$ for $y = 1, \dots, 6$. So each die is uniformly distributed.
▶ Independence: Since
\[ p_{X,Y}(x, y) = \frac{1}{36} = \frac{1}{6} \cdot \frac{1}{6} = p_X(x)\, p_Y(y) \quad \text{for all } x, y, \]
X and Y are independent.
▶ Some probabilities:
▶ $P(X + Y = 7) = \frac{6}{36} = \frac{1}{6}$ (pairs: (1,6),(2,5),(3,4),(4,3),(5,2),(6,1)).
▶ $P(X + Y \le 4) = \frac{6}{36} = \frac{1}{6}$ (pairs: (1,1),(1,2),(1,3),(2,1),(2,2),(3,1)).
▶ $P(X = 1 \mid X + Y = 7) = \dfrac{P(X = 1, X + Y = 7)}{P(X + Y = 7)} = \dfrac{1/36}{6/36} = \dfrac{1}{6}$.
Expectation and Variance:
\[ E[X] = E[Y] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5, \]
\[ E[X + Y] = E[X] + E[Y] = 7, \]
\[ \operatorname{Var}(X) = \operatorname{Var}(Y) = \frac{35}{12}, \]
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) = \frac{35}{6} \quad \text{(by independence).} \]
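The two-dice calculations can be checked by brute-force enumeration. The following is a minimal sketch in Python (the names `joint`, `prob` and the other variables are illustrative, not part of the lecture); it uses exact fractions so the results can be compared directly with the values above.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint PMF of two fair dice: every ordered pair (x, y) has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Marginals: sum the joint PMF over the other variable.
p_x = {x: sum(joint[(x, y)] for y in faces) for x in faces}
p_y = {y: sum(joint[(x, y)] for x in faces) for y in faces}

# Independence: the joint must equal the product of the marginals everywhere.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for (x, y) in joint)


def prob(event):
    """Probability of the set of outcomes (x, y) for which event(x, y) is true."""
    return sum(p for (x, y), p in joint.items() if event(x, y))


p_sum_7 = prob(lambda x, y: x + y == 7)
p_sum_le_4 = prob(lambda x, y: x + y <= 4)
p_x1_given_7 = prob(lambda x, y: x == 1 and x + y == 7) / p_sum_7

# Moments of X and of the sum S = X + Y, computed from the definitions.
mean_x = sum(x * p for x, p in p_x.items())
var_x = sum((x - mean_x) ** 2 * p for x, p in p_x.items())
mean_s = sum((x + y) * p for (x, y), p in joint.items())
var_s = sum((x + y - mean_s) ** 2 * p for (x, y), p in joint.items())

print(p_sum_7, p_sum_le_4, p_x1_given_7, mean_x, var_x, mean_s, var_s)
```

Computing Var(X + Y) directly from the joint PMF, rather than by adding the two variances, gives an independent check of the additivity used in the last line of the example.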
Joint Probability Density Function (PDF)
For two continuous random variables X and Y:
Joint PDF
\[ f(x, y) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1 \]

For any region A in the xy-plane:
\[ P[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy \]

Example
For a uniform distribution over a unit square:
\[ f(x, y) = 1 \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \]
Example 3: Continuous Bivariate Distribution

Joint pdf:
\[ f_{X,Y}(x, y) = \begin{cases} 2e^{-x}e^{-y}, & 0 < x < y, \\ 0, & \text{otherwise.} \end{cases} \]

The support is restricted to $0 < x < y$ so that the density integrates to 1; over the whole quadrant $x > 0,\ y > 0$ the same expression would integrate to 2.
\[ \int_0^\infty \int_x^\infty 2e^{-x}e^{-y}\,dy\,dx = \int_0^\infty 2e^{-2x}\,dx = 1. \]

Marginal of X:
\[ f_X(x) = \int_x^\infty 2e^{-x}e^{-y}\,dy = 2e^{-x} \cdot e^{-x} = 2e^{-2x}, \quad x > 0. \]

Marginal of Y:
\[ f_Y(y) = \int_0^y 2e^{-x}e^{-y}\,dx = 2e^{-y}\,(1 - e^{-y}), \quad y > 0. \]

Independence check:
\[ f_X(x)\, f_Y(y) = 4e^{-2x}e^{-y}\,(1 - e^{-y}) \ne 2e^{-x}e^{-y}. \]

Thus X and Y are not independent. The non-rectangular support $0 < x < y$ already rules out independence.


Summary

▶ The joint distribution contains all information about the pair (X, Y).
▶ Marginal distributions are obtained by summing/integrating the joint over the other variable(s).
▶ If the joint factorises into the product of the marginals, then X and Y are independent.
▶ Examples illustrate the mechanics for both discrete and continuous cases.
Conditional Probability and Independence

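Conditional Distribution
\[ f(x \mid y) = \frac{f(x, y)}{f_Y(y)} \quad \text{(continuous, where } f_Y(y) > 0\text{)} \]
\[ p(x \mid y) = \frac{p(x, y)}{p_Y(y)} \quad \text{(discrete, where } p_Y(y) > 0\text{)} \]

Independence
X and Y are independent if and only if, for all x and y:

\[ f(x, y) = f_X(x) \cdot f_Y(y) \quad \text{(continuous)} \]

\[ p(x, y) = p_X(x) \cdot p_Y(y) \quad \text{(discrete)} \]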


Solved Example 1

Problem: Roll a fair die twice. Let X = first roll, Y = second roll.
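Solution:
\[ p(x, y) = \frac{1}{36} \quad \forall x, y \in \{1, 2, \dots, 6\} \]
\[ P(X = 3, Y = 5) = \frac{1}{36} \]
\[ P(X \le 2, Y \ge 5) = \frac{4}{36} = \frac{1}{9} \]
\[ p_X(x) = \sum_{y=1}^{6} \frac{1}{36} = \frac{1}{6} \quad \text{for all } x \]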
Solved Example 2

Problem: Given $f(x, y) = 3x$ for $0 \le y \le x \le 1$.

Marginals:
\[ f_X(x) = \int_0^x 3x\,dy = 3x^2, \quad 0 < x < 1 \]
\[ f_Y(y) = \int_y^1 3x\,dx = \frac{3}{2}(1 - y^2), \quad 0 < y < 1 \]

Conditional:
\[ f(x \mid y) = \frac{3x}{\frac{3}{2}(1 - y^2)} = \frac{2x}{1 - y^2}, \quad y \le x \le 1 \]
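Solved Example 2 (Continued)

Compute $P(X \le \tfrac{1}{2},\ \tfrac{1}{4} < Y < \tfrac{3}{4})$. Because $Y \le X \le \tfrac{1}{2}$ on the support, y only ranges over $(\tfrac{1}{4}, \tfrac{1}{2})$:

\[ P = \int_{y=1/4}^{1/2} \int_{x=y}^{1/2} 3x\,dx\,dy \]
\[ = \int_{1/4}^{1/2} \frac{3}{2}\left(\frac{1}{4} - y^2\right) dy \]
\[ = \frac{3}{2}\left[\frac{y}{4} - \frac{y^3}{3}\right]_{1/4}^{1/2} \]
\[ = \frac{5}{128} \]
Independence check: $f_X(x)\, f_Y(y) = \frac{9}{2} x^2 (1 - y^2) \ne 3x = f(x, y)$.
Therefore, X and Y are not independent.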
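As a numerical sanity check on the value 5/128, the double integral can be approximated with a nested midpoint rule. This is a rough sketch only; the helper `midpoint_integral` and the step count `n=200` are assumptions chosen for illustration.

```python
def midpoint_integral(f, a, b, n=200):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def joint_pdf(x, y):
    """f(x, y) = 3x on the triangle 0 <= y <= x <= 1, zero elsewhere."""
    return 3.0 * x if 0.0 <= y <= x <= 1.0 else 0.0


def inner(y):
    # For a fixed y, x runs from y up to 1/2 (the event X <= 1/2).
    return midpoint_integral(lambda x: joint_pdf(x, y), y, 0.5)


# Y is limited to (1/4, 1/2): beyond 1/2 the region X <= 1/2, Y <= X is empty.
prob = midpoint_integral(inner, 0.25, 0.5)

print(prob, 5 / 128)
```

The inner integrand is linear in x and the outer one is quadratic in y, so the midpoint rule converges quickly here and the two printed numbers should agree to several decimal places.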
What is a Marginal Distribution?

Definition (Marginal Distribution)
The probability distribution of a single random variable obtained by
summing (discrete) or integrating (continuous) the joint
distribution over the other variable(s).

▶ Discrete:
\[ p_X(x) = \sum_y p_{X,Y}(x, y), \qquad p_Y(y) = \sum_x p_{X,Y}(x, y). \]

▶ Continuous:
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx. \]
Example 1: Discrete Joint Table

Given the joint PMF:

        Y = 1   Y = 2   Y = 3
X = 0   0.1     0.2     0.1
X = 1   0.3     0.1     0.2

Marginal of X:

\[ p_X(0) = 0.1 + 0.2 + 0.1 = 0.4, \qquad p_X(1) = 0.3 + 0.1 + 0.2 = 0.6. \]

Marginal of Y:

\[ p_Y(1) = 0.1 + 0.3 = 0.4, \qquad p_Y(2) = 0.2 + 0.1 = 0.3, \qquad p_Y(3) = 0.1 + 0.2 = 0.3. \]

Check: $\sum_x p_X(x) = 1$, $\sum_y p_Y(y) = 1$.
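Summing a joint table along rows and columns is mechanical, so it is easy to express in code. The sketch below stores the table from this example as a dictionary keyed by (x, y); the function name `marginals` is illustrative.

```python
# Joint PMF from the table: keys are (x, y), values are probabilities.
joint = {
    (0, 1): 0.1, (0, 2): 0.2, (0, 3): 0.1,
    (1, 1): 0.3, (1, 2): 0.1, (1, 3): 0.2,
}


def marginals(joint_pmf):
    """Return (p_x, p_y) by summing the joint PMF over the other variable."""
    p_x, p_y = {}, {}
    for (x, y), p in joint_pmf.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y


p_x, p_y = marginals(joint)
print(p_x)  # marginal of X
print(p_y)  # marginal of Y
```

Because the values are floating-point, comparisons against 0.4, 0.6 and so on should use a small tolerance rather than exact equality.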
Example 2: Continuous Joint PDF
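Joint PDF:
\[ f_{X,Y}(x, y) = \frac{3}{2}(x^2 + y^2), \quad 0 < x < 1,\ 0 < y < 1. \]
Marginal of X:
\[ f_X(x) = \int_0^1 \frac{3}{2}(x^2 + y^2)\,dy \]
\[ = \frac{3}{2}\left[x^2 y + \frac{y^3}{3}\right]_{y=0}^{1} \]
\[ = \frac{3}{2}\left(x^2 + \frac{1}{3}\right) = \frac{3}{2}x^2 + \frac{1}{2}, \quad 0 < x < 1. \]
Marginal of Y (by symmetry):
\[ f_Y(y) = \frac{1}{2} + \frac{3}{2}y^2, \quad 0 < y < 1. \]

Check: $\int_0^1 f_X(x)\,dx = \int_0^1 \left(\frac{3}{2}x^2 + \frac{1}{2}\right) dx = 1$ (similarly for $f_Y$).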
Solved Example (Continued)

Verify sum = 1:

\[ 0.18 + 0.18 + 0.064 + 0.096 + 0.12 + 0.08 + 0.12 + 0.16 = 1.00 \]

\[ P(X = 0) = 0.18 + 0.18 + 0.064 + 0.096 = 0.52 \]

\[ P(X = 1, Y = 0) = 0.12 + 0.08 = 0.20 \]

\[ P(Z = 1 \mid X = 0, Y = 1) = \frac{P(X = 0, Y = 1, Z = 1)}{P(X = 0, Y = 1)} = \frac{0.096}{0.064 + 0.096} = \frac{0.096}{0.16} = 0.6 \]
What is Covariance?

Conceptual Definition
Covariance measures the linear relationship between two random
variables. It indicates how two variables change together.

Mathematical Definition:

\[ \operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] \]

where $\mu_X = E[X]$ and $\mu_Y = E[Y]$.

Computational Formula:

\[ \operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y] \]
Interpreting Covariance

▶ Cov(X , Y ) > 0: positive linear relationship

▶ Cov(X , Y ) < 0: negative linear relationship

▶ Cov(X , Y ) = 0: uncorrelated (no linear relationship)

Key Properties:
▶ Cov(X , Y ) = Cov(Y , X ) (symmetry)

▶ Cov(X , X ) = Var(X )

▶ Bilinearity: Cov(aX + b, cY + d) = ac Cov(X , Y )

▶ Var(X + Y ) = Var(X ) + Var(Y ) + 2 Cov(X , Y )

▶ Independence ⇒ zero covariance (converse not true)


Solved Example 1 (Discrete)

Joint PMF Table:

X \ Y   0     1     2
0       0.1   0.2   0.1
1       0.2   0.1   0.1
2       0.0   0.1   0.1

Step 1: Marginals

\[ p_X(0) = 0.4, \quad p_X(1) = 0.4, \quad p_X(2) = 0.2 \]

\[ p_Y(0) = 0.3, \quad p_Y(1) = 0.4, \quad p_Y(2) = 0.3 \]

Step 2: Means

\[ E[X] = 0.8, \qquad E[Y] = 1.0 \]
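The slides stop at the means for this table. The remaining steps, E[XY] and the covariance, follow from the computational formula Cov(X, Y) = E[XY] − E[X]E[Y]. The sketch below carries them out; the function name `covariance_from_joint` is illustrative and the result is computed here, not quoted from the lecture.

```python
# Joint PMF from the table: keys are (x, y), values are probabilities.
joint = {
    (0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.1,
    (1, 0): 0.2, (1, 1): 0.1, (1, 2): 0.1,
    (2, 0): 0.0, (2, 1): 0.1, (2, 2): 0.1,
}


def covariance_from_joint(joint_pmf):
    """Return (E[X], E[Y], E[XY], Cov(X, Y)) for a discrete joint PMF."""
    e_x = sum(x * p for (x, y), p in joint_pmf.items())
    e_y = sum(y * p for (x, y), p in joint_pmf.items())
    e_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
    return e_x, e_y, e_xy, e_xy - e_x * e_y


e_x, e_y, e_xy, cov = covariance_from_joint(joint)
print(e_x, e_y, e_xy, cov)
```

Only the four cells with both x and y nonzero contribute to E[XY], which gives E[XY] = 0.9 and a small positive covariance of about 0.1.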
Covariance: Definition and Example
Definition
For two random variables X and Y:
\[ \operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] \]

\[ \operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y] \]

Simple Example: Joint distribution of X and Y:

        Y = 0   Y = 1
X = 0   0.2     0.3
X = 1   0.1     0.4

P(X = 0) = 0.5, P(X = 1) = 0.5, P(Y = 0) = 0.3, P(Y = 1) = 0.7
E[X] = 0 · 0.5 + 1 · 0.5 = 0.5, E[Y] = 0 · 0.3 + 1 · 0.7 = 0.7.
Expected product E[XY]: only the cell X = 1, Y = 1 has a nonzero product, so
E[XY] = (1 · 1) · 0.4 = 0.4.
Covariance:
Cov(X, Y) = E[XY] − E[X]E[Y] = 0.4 − 0.5 · 0.7 = 0.05.
Example (Continuous)
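Joint PDF:

\[ f(x, y) = x + y, \quad 0 \le x \le 1,\ 0 \le y \le 1 \]

Step 1: Marginal Densities

\[ f_X(x) = \int_0^1 (x + y)\,dy = x + \frac{1}{2}, \quad 0 \le x \le 1 \]
\[ f_Y(y) = \int_0^1 (x + y)\,dx = y + \frac{1}{2}, \quad 0 \le y \le 1 \]
Step 2: Means
\[ E[X] = \int_0^1 x\left(x + \frac{1}{2}\right) dx = \frac{1}{3} + \frac{1}{4} = \frac{7}{12} \]
By symmetry, $E[Y] = \frac{7}{12}$.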
Example (Continued)
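Step 3: Compute E[XY]
\[ E[XY] = \int_0^1 \int_0^1 xy(x + y)\,dy\,dx = \int_0^1 \int_0^1 (x^2 y + x y^2)\,dy\,dx \]

Inner integral:
\[ \int_0^1 (x^2 y + x y^2)\,dy = \frac{x^2}{2} + \frac{x}{3} \]
Outer integral:
\[ \int_0^1 \left(\frac{x^2}{2} + \frac{x}{3}\right) dx = \frac{1}{6} + \frac{1}{6} = \frac{1}{3} \]

Step 4: Covariance
\[ \operatorname{Cov}(X, Y) = \frac{1}{3} - \left(\frac{7}{12}\right)^2 = \frac{1}{3} - \frac{49}{144} = -\frac{1}{144} \]

Weak negative linear relationship.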


What is a Linear Combination?

A linear combination of random variables X1 , X2 , . . . , Xn is

Y = a1 X1 + a2 X2 + · · · + an Xn + b

where a1 , a2 , . . . , an and b are constants.

Examples:
▶ Portfolio return: P = 0.6A + 0.4B

▶ Difference of two measurements: D = X − Y

▶ Weighted sum: Z = 2X − 3Y + 1
Expected Value is Linear

For any constants ai and b:

E [Y ] = a1 E [X1 ] + a2 E [X2 ] + · · · + an E [Xn ] + b

Special Cases:

E [X + Y ] = E [X ] + E [Y ]
E [X − Y ] = E [X ] − E [Y ]
E [aX + b] = aE [X ] + b

This holds regardless of independence.


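Variance Formula

\[ \operatorname{Var}(Y) = \sum_{i=1}^{n} a_i^2 \operatorname{Var}(X_i) + 2 \sum_{1 \le i < j \le n} a_i a_j \operatorname{Cov}(X_i, X_j) \]

If the variables are independent ($\operatorname{Cov}(X_i, X_j) = 0$ for $i \ne j$):

\[ \operatorname{Var}(Y) = a_1^2 \operatorname{Var}(X_1) + a_2^2 \operatorname{Var}(X_2) + \cdots + a_n^2 \operatorname{Var}(X_n) \]

Important: Covariance terms can increase or decrease the variance.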
Solved Example

Problem: Portfolio: 60% in Stock A, 40% in Stock B.
E[A] = 8%, E[B] = 12%, Var(A) = 0.04, Var(B) = 0.09. Independent returns.

Solution:

\[ P = 0.6A + 0.4B \]
\[ E[P] = 0.6(8\%) + 0.4(12\%) = 4.8\% + 4.8\% = 9.6\% \]
\[ \operatorname{Var}(P) = (0.6)^2(0.04) + (0.4)^2(0.09) = 0.36 \times 0.04 + 0.16 \times 0.09 = 0.0144 + 0.0144 = 0.0288 \]

Thus, expected return 9.6%, variance 0.0288.


Solved Example 2

Problem: E[X] = 5, E[Y] = 3, Var(X) = 4, Var(Y) = 9, Cov(X, Y) = 2. Find the mean and variance of Z = 2X − 3Y + 1.
Solution:

\[ E[Z] = 2(5) - 3(3) + 1 = 10 - 9 + 1 = 2 \]
\[ \operatorname{Var}(Z) = 2^2(4) + (-3)^2(9) + 2(2)(-3)(2) = 4 \times 4 + 9 \times 9 + (-24) = 16 + 81 - 24 = 73 \]

So E[Z] = 2 and Var(Z) = 73.
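Both solved examples apply the same two formulas, so they can be wrapped in one helper. The sketch below takes the coefficients, the constant, the means and a full covariance matrix (variances on the diagonal); the name `linear_combination_moments` is illustrative. Summing a_i a_j Cov(X_i, X_j) over all ordered pairs (i, j) is equivalent to the variance formula above, because each off-diagonal pair appears twice.

```python
def linear_combination_moments(coeffs, const, means, cov):
    """Mean and variance of Y = sum(a_i * X_i) + b.

    cov is the covariance matrix of the X_i: cov[i][i] = Var(X_i) and
    cov[i][j] = Cov(X_i, X_j).
    """
    n = len(coeffs)
    mean = sum(coeffs[i] * means[i] for i in range(n)) + const
    # The double sum over all (i, j) counts each pair i < j twice.
    var = sum(coeffs[i] * coeffs[j] * cov[i][j]
              for i in range(n) for j in range(n))
    return mean, var


# Solved Example 2: Z = 2X - 3Y + 1 with Cov(X, Y) = 2.
mean_z, var_z = linear_combination_moments(
    [2, -3], 1, [5, 3], [[4, 2], [2, 9]])

# Portfolio example: P = 0.6A + 0.4B with independent returns.
mean_p, var_p = linear_combination_moments(
    [0.6, 0.4], 0.0, [0.08, 0.12], [[0.04, 0.0], [0.0, 0.09]])

print(mean_z, var_z, mean_p, var_p)
```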


Definition of a Bernoulli Trial
Bernoulli Trial
A random experiment with exactly two outcomes:
▶ Success (S) with probability p

▶ Failure (F ) with probability 1 − p

The associated random variable X is:

\[ X = \begin{cases} 1, & \text{success,} \\ 0, & \text{failure.} \end{cases} \]

Probability mass function:

\[ P(X = 1) = p, \qquad P(X = 0) = 1 - p. \]

Key properties:

\[ E[X] = p, \qquad \operatorname{Var}(X) = p(1 - p). \]


Example: Free Throws
Example
A basketball player makes a free throw with probability 0.7
(success) and misses with probability 0.3 (failure). Assume each
free throw is independent.
(a) Define the Bernoulli random variable.
\[ X = \begin{cases} 1, & \text{makes the shot,} \\ 0, & \text{misses.} \end{cases} \]

Here p = 0.7.
(b) Probability of success:

P(X = 1) = p = 0.7.

(c) Probability of failure:

P(X = 0) = 1 − p = 0.3.
Example: Free Throws (continued)

(d) Expected number of successes in one trial:

E [X ] = p = 0.7.

(e) Variance:

Var(X ) = p(1 − p) = 0.7 × 0.3 = 0.21.


Binomial Distribution

Definition
The binomial distribution counts how many times a specific
event (a “success”) happens when you repeat the same simple
experiment a fixed number of times, and each repetition is
independent and has the same chance of success.
Intuition: Flip a coin several times. The binomial distribution tells
you the probability of getting, say, exactly 3 heads in 5 flips.

Formula: For n trials, success probability p,

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \dots, n. \]
Example 1: Flipping a Fair Coin

Problem: Flip a fair coin 5 times. Probability of exactly 3 heads?

Solution:

\[ n = 5, \quad p = 0.5, \quad k = 3 \]
\[ P(X = 3) = \binom{5}{3} (0.5)^3 (0.5)^2 = \binom{5}{3} (0.5)^5 \]
\[ \binom{5}{3} = 10, \qquad (0.5)^5 = \frac{1}{32} \]
\[ P = 10 \times \frac{1}{32} = \frac{10}{32} = \frac{5}{16} = 0.3125 \]
Thus, a 31.25% chance.
Example 2: Guessing on a Test

Problem: 10 questions, each with 4 choices. Random guessing. Probability exactly 4 correct?
Solution:

\[ n = 10, \quad p = 0.25, \quad k = 4 \]
\[ P(X = 4) = \binom{10}{4} (0.25)^4 (0.75)^6 \]
\[ \binom{10}{4} = 210, \qquad (0.25)^4 = 0.00390625, \qquad (0.75)^6 \approx 0.17797 \]
\[ 210 \times 0.00390625 = 0.8203125 \]
\[ 0.8203125 \times 0.1779785 \approx 0.146 \]

About 14.6% chance.
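The binomial formula translates directly into a few lines of Python using `math.comb` from the standard library. This is a minimal sketch; the function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Example 1: exactly 3 heads in 5 flips of a fair coin.
p_three_heads = binomial_pmf(3, 5, 0.5)

# Example 2: exactly 4 correct out of 10 questions with 4 choices each.
p_four_correct = binomial_pmf(4, 10, 0.25)

# The PMF sums to 1 over k = 0, ..., n.
total = sum(binomial_pmf(k, 10, 0.25) for k in range(11))

print(p_three_heads, p_four_correct, total)
```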


Multinomial Distribution

Definition
The multinomial distribution is like the binomial, but instead of
just two outcomes (success/failure), there are several possible
outcomes. It gives the probability of getting specific counts for
each outcome in a fixed number of independent trials.

Intuition: Roll a die multiple times. The multinomial distribution
tells you the probability of getting, say, exactly two of each number
in 12 rolls.

Formula: For n trials, m categories with probabilities $p_1, \dots, p_m$ (where $x_1 + \cdots + x_m = n$):

\[ P(X_1 = x_1, \dots, X_m = x_m) = \frac{n!}{x_1! \cdots x_m!}\, p_1^{x_1} \cdots p_m^{x_m}. \]
Example 1: Fair Die Rolled 6 Times

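Problem: Roll a fair die 6 times. Probability each face appears exactly once?
Solution:
\[ n = 6, \quad m = 6, \quad p_i = \frac{1}{6}, \quad x_i = 1 \text{ for all } i \]
\[ P = \frac{6!}{1!\,1!\,1!\,1!\,1!\,1!} \left(\frac{1}{6}\right)^6 = 720 \times \frac{1}{46656} = \frac{720}{46656} = \frac{5}{324} \approx 0.01543 \]
About 1.54% chance.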
Example 2: Sampling Voters
Problem: 30% prefer A, 50% B, 20% C. Sample 5 people. Probability exactly 2 A, 2 B, 1 C?
Solution:

\[ n = 5, \quad p_A = 0.3, \quad p_B = 0.5, \quad p_C = 0.2 \]
\[ x_A = 2, \quad x_B = 2, \quad x_C = 1 \]
\[ P = \frac{5!}{2!\,2!\,1!} \times (0.3)^2 (0.5)^2 (0.2)^1 \]
\[ \frac{5!}{2!\,2!\,1!} = \frac{120}{4} = 30 \]
\[ (0.3)^2 = 0.09, \quad (0.5)^2 = 0.25, \quad (0.2)^1 = 0.2 \]
\[ 0.09 \times 0.25 = 0.0225, \quad 0.0225 \times 0.2 = 0.0045 \]
\[ 30 \times 0.0045 = 0.135 \]

Thus, 13.5% probability.
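The multinomial probability can be computed the same way for any number of categories. The sketch below assumes the counts sum to the number of trials and uses only the standard library; the function name `multinomial_pmf` is illustrative.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_m = x_m) with n = sum(counts) trials."""
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_m!), kept as an integer.
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    return coeff * prod(p ** x for p, x in zip(probs, counts))


# Example 1: each face exactly once in 6 rolls of a fair die.
p_each_face_once = multinomial_pmf([1] * 6, [1 / 6] * 6)

# Example 2: 2 for A, 2 for B, 1 for C in a sample of 5 voters.
p_voters = multinomial_pmf([2, 2, 1], [0.3, 0.5, 0.2])

print(p_each_face_once, p_voters)
```

With two categories, `multinomial_pmf([k, n - k], [p, 1 - p])` reduces to the binomial probability, which matches the description of the multinomial as a generalisation of the binomial.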


Hypergeometric Distribution

Definition
The hypergeometric distribution gives the probability of
obtaining exactly k successes in n draws without replacement
from a finite population of size N that contains exactly K
successes.
Intuition: Drawing cards from a deck without replacement. Each
draw changes the composition, so the probabilities are not
constant.
Key difference from binomial: Binomial assumes
independent draws with replacement (constant probability);
hypergeometric assumes no replacement.
Mathematical Formula

\[ P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} \]
where:
▶ N = population size

▶ K = number of successes in the population

▶ n = sample size (drawn without replacement)

▶ k = number of successes in the sample

▶ $\binom{a}{b}$ = binomial coefficient

Example 1: Cards

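Problem: From a standard deck (52 cards), draw 5 cards without replacement. What is the probability of exactly 2 aces?
Solution:

\[ N = 52, \quad K = 4, \quad n = 5, \quad k = 2 \]
\[ P = \frac{\binom{4}{2} \binom{48}{3}}{\binom{52}{5}} \]
\[ \binom{4}{2} = 6, \qquad \binom{48}{3} = 17296, \qquad \binom{52}{5} = 2598960 \]
\[ P = \frac{6 \times 17296}{2598960} = \frac{103776}{2598960} \approx 0.03993 \]
About 3.99% chance.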
Example 2: Defective Components

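Problem: A batch of 20 components contains 3 defectives. Randomly select 5 without replacement. Probability of exactly 1 defective?
Solution:

\[ N = 20, \quad K = 3, \quad n = 5, \quad k = 1 \]
\[ P = \frac{\binom{3}{1} \binom{17}{4}}{\binom{20}{5}} \]
\[ \binom{3}{1} = 3, \qquad \binom{17}{4} = 2380, \qquad \binom{20}{5} = 15504 \]
\[ P = \frac{3 \times 2380}{15504} = \frac{7140}{15504} \approx 0.4605 \]
About 46.05% chance.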
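The hypergeometric formula is a ratio of three binomial coefficients, so it maps onto `math.comb` directly. A minimal sketch follows; the function name `hypergeometric_pmf` is illustrative, and the range check returns 0 for values of k that cannot occur.

```python
from math import comb


def hypergeometric_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement
    from a population of N items containing K successes."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # k is outside the possible range
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Example 1: exactly 2 aces in a 5-card hand.
p_two_aces = hypergeometric_pmf(2, 52, 4, 5)

# Example 2: exactly 1 defective among 5 drawn from 20 (3 defective).
p_one_defective = hypergeometric_pmf(1, 20, 3, 5)

print(p_two_aces, p_one_defective)
```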
Negative Binomial Distribution

Definition
The negative binomial distribution models the number of trials
needed to achieve a fixed number of successes in a sequence of
independent and identical Bernoulli trials.

Intuition: Flip a coin until you get 3 heads. The number of flips
required (including the last head) is a negative binomial random
variable.

Key difference from binomial: Binomial fixes the number of
trials and counts successes; negative binomial fixes the number of
successes and counts trials.
Mathematical Formula

\[ P(X = x) = \binom{x-1}{r-1} p^r (1 - p)^{x-r}, \quad x = r, r + 1, r + 2, \dots \]

where:
▶ r = desired number of successes

▶ p = probability of success on each trial

▶ x = total number of trials (including the last success)

▶ $\binom{x-1}{r-1}$ counts ways to arrange the first r − 1 successes among the first x − 1 trials.

Mean: $E[X] = \dfrac{r}{p}$
Variance: $\operatorname{Var}(X) = \dfrac{r(1-p)}{p^2}$
Example 1: Getting 3 Heads

Problem: A fair coin is flipped until 3 heads appear. What is the probability that exactly 5 flips are needed?
Solution:

\[ r = 3, \quad p = 0.5, \quad x = 5 \]
\[ P(X = 5) = \binom{5-1}{3-1} (0.5)^3 (0.5)^2 = \binom{4}{2} (0.5)^5 \]
\[ \binom{4}{2} = 6, \qquad (0.5)^5 = \frac{1}{32} \]
\[ P = 6 \times \frac{1}{32} = \frac{6}{32} = \frac{3}{16} = 0.1875 \]
There is an 18.75% chance.
Example 2: Making 2 Shots

Problem: A basketball player makes 70% of free throws. She shoots until she makes 2. What is the probability she takes exactly 4 shots?
Solution:

\[ r = 2, \quad p = 0.7, \quad x = 4 \]
\[ P(X = 4) = \binom{4-1}{2-1} (0.7)^2 (0.3)^2 = \binom{3}{1} (0.49)(0.09) \]
\[ \binom{3}{1} = 3, \qquad 0.49 \times 0.09 = 0.0441 \]
\[ P = 3 \times 0.0441 = 0.1323 \]

About 13.23% chance.
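The negative binomial PMF, in the "number of trials until the r-th success" form used here, can be sketched as follows. The function name `negative_binomial_pmf` is illustrative; note that some libraries instead count failures before the r-th success, which shifts the variable by r.

```python
from math import comb


def negative_binomial_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x (x = r, r + 1, ...)."""
    if x < r:
        return 0.0  # at least r trials are needed for r successes
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)


# Example 1: third head on the 5th flip of a fair coin.
p_five_flips = negative_binomial_pmf(5, 3, 0.5)

# Example 2: second made free throw on the 4th shot, p = 0.7.
p_four_shots = negative_binomial_pmf(4, 2, 0.7)

print(p_five_flips, p_four_shots)
```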


Geometric Distribution

Definition
The geometric distribution models the number of trials needed
to get the first success in a sequence of independent and identical
Bernoulli trials.

Intuition: Flip a coin until you get heads. The number of flips
required follows a geometric distribution.

Key properties:
▶ Trials are independent, each with success probability p.

▶ We stop at the first success.

▶ Possible values: k = 1, 2, 3, . . .
Mathematical Formula

\[ P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \dots \]

Mean: $E[X] = \dfrac{1}{p}$
Variance: $\operatorname{Var}(X) = \dfrac{1-p}{p^2}$
Memoryless property: The geometric distribution is the only
discrete distribution that is memoryless:

\[ P(X > m + n \mid X > m) = P(X > n) \]


Example 1: First Head on the 4th Flip

Problem: A fair coin is flipped until heads appears. Find the probability that the first head occurs on the 4th flip.
Solution:

\[ p = 0.5, \quad k = 4 \]
\[ P(X = 4) = (1 - p)^{4-1} p = (0.5)^3 \times 0.5 = 0.125 \times 0.5 = 0.0625 \]

Thus, there is a 6.25% chance.


Example 2: First 6 on the 3rd Roll

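Problem: A fair die is rolled until a 6 appears. Find the probability that it takes exactly 3 rolls.
Solution:
\[ p = \frac{1}{6}, \quad 1 - p = \frac{5}{6}, \quad k = 3 \]
\[ P(X = 3) = \left(\frac{5}{6}\right)^2 \times \frac{1}{6} = \frac{25}{36} \times \frac{1}{6} = \frac{25}{216} \approx 0.1157 \]

About 11.57% chance.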
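The geometric PMF and the memoryless property can both be checked numerically. The sketch below uses the tail probability P(X > n) = (1 − p)^n, which holds because X > n means the first n trials all failed; the function names and the values m = 4, n = 3 are illustrative.

```python
def geometric_pmf(k, p):
    """P(X = k): first success on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p


def geometric_tail(n, p):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n


# Example 1: first head on the 4th flip of a fair coin.
p_fourth_flip = geometric_pmf(4, 0.5)

# Example 2: first 6 on the 3rd roll of a fair die.
p_third_roll = geometric_pmf(3, 1 / 6)

# Memoryless property: P(X > m + n | X > m) = P(X > n).
m, n, p = 4, 3, 1 / 6
conditional_tail = geometric_tail(m + n, p) / geometric_tail(m, p)
unconditional_tail = geometric_tail(n, p)

print(p_fourth_flip, p_third_roll, conditional_tail, unconditional_tail)
```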


Poisson Distribution

Definition
The Poisson distribution models the number of times an event
occurs in a fixed interval of time or space, given that these events
happen with a known constant average rate and independently of
each other.

Intuition: Count emails per hour, calls per minute, or defects per
meter – all can be modeled with Poisson.

Key properties:
▶ Events occur independently.

▶ Average rate λ is constant.

▶ No two events occur simultaneously.


Mathematical Formula

\[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \dots \]

Mean: $E[X] = \lambda$
Variance: $\operatorname{Var}(X) = \lambda$ (equal mean and variance)

Conditions for Poisson:
▶ The probability of an event in a small interval is proportional to the length of the interval.
▶ Events in disjoint intervals are independent.
Example 1: Calls per Minute

Problem: A call center receives 5 calls per minute on average. Find the probability of exactly 3 calls in a minute.
Solution:

\[ \lambda = 5, \quad k = 3 \]
\[ P(X = 3) = \frac{e^{-5} \cdot 5^3}{3!} = \frac{e^{-5} \cdot 125}{6} \]
Using $e^{-5} \approx 0.0067379$:
\[ P \approx \frac{0.0067379 \times 125}{6} = \frac{0.8422375}{6} \approx 0.14037 \]
Thus, about 14.04% chance.
Example 2: Typos per Page

Problem: A book averages 2 typos per page. Find the probability a given page has no typos.
Solution:

\[ \lambda = 2, \quad k = 0 \]
\[ P(X = 0) = \frac{e^{-2} \cdot 2^0}{0!} = e^{-2} \]
$e^{-2} \approx 0.135335$, so about 13.53% chance.
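The Poisson PMF needs only `exp` and `factorial` from the standard library. A minimal sketch, with the illustrative function name `poisson_pmf`:

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
    return exp(-lam) * lam ** k / factorial(k)


# Example 1: exactly 3 calls in a minute at an average of 5 per minute.
p_three_calls = poisson_pmf(3, 5)

# Example 2: no typos on a page averaging 2 typos per page.
p_no_typos = poisson_pmf(0, 2)

print(p_three_calls, p_no_typos)
```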
Binomial Distribution
X ∼ Binomial(n, p) counts the number of successes in n
independent Bernoulli trials, each with success probability p.
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \dots, n. \]

Mean and Variance

\[ E[X] = np, \qquad \operatorname{Var}(X) = np(1 - p). \]

Example 1: Toss a fair coin 10 times. Let X = number of heads. Here n = 10, p = 0.5. Then

\[ E[X] = 10 \cdot 0.5 = 5, \qquad \operatorname{Var}(X) = 10 \cdot 0.5 \cdot 0.5 = 2.5. \]

\[ P(X = 6) = \binom{10}{6} (0.5)^6 (0.5)^4 = 210 \cdot (0.5)^{10} \approx 0.205. \]

Example 2: A multiple-choice test has 5 questions, each with 4
options. Random guessing gives p = 0.25, n = 5.

\[ E[X] = 5 \cdot 0.25 = 1.25, \qquad \operatorname{Var}(X) = 5 \cdot 0.25 \cdot 0.75 = 0.9375. \]

\[ P(X = 2) = \binom{5}{2} (0.25)^2 (0.75)^3 = 10 \cdot 0.0625 \cdot 0.421875 \approx 0.2637. \]



Poisson Distribution
Definition
X ∼ Poisson(λ) models the number of events occurring in a fixed
interval, with average rate λ > 0.

\[ P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots \]

Mean and Variance

\[ E[X] = \lambda, \qquad \operatorname{Var}(X) = \lambda. \]

Example 1: Emails arrive at an average rate of 5 per hour. Let X = number of emails in one hour, λ = 5.
\[ E[X] = 5, \qquad \operatorname{Var}(X) = 5, \qquad P(X = 3) = e^{-5} \frac{5^3}{3!} = e^{-5} \frac{125}{6} \approx 0.1404. \]
Example 2: A book has an average of 0.2 typos per page. For one
page, λ = 0.2.
Geometric Distribution (Trials until first success)
Definition
X ∼ Geometric(p) counts the number of independent Bernoulli
trials needed to obtain the first success, where each trial has
success probability p.

\[ P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \dots \]

Mean and Variance

\[ E[X] = \frac{1}{p}, \qquad \operatorname{Var}(X) = \frac{1 - p}{p^2}. \]

Example 1: Roll a fair die until a 6 appears. $p = \frac{1}{6}$.

\[ E[X] = \frac{1}{1/6} = 6, \qquad \operatorname{Var}(X) = \frac{5/6}{(1/6)^2} = \frac{5/6}{1/36} = 30. \]
\[ P(\text{first 6 on 3rd roll}) = \left(1 - \frac{1}{6}\right)^2 \cdot \frac{1}{6} = \left(\frac{5}{6}\right)^2 \cdot \frac{1}{6} = \frac{25}{216} \approx 0.1157. \]
Example 2: A shooter hits the target with probability 0.8 per
Hypergeometric Distribution
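Definition
X ∼ Hypergeometric(N, K, n) counts the number of successes in a
sample of size n drawn without replacement from a population of
size N containing K successes.
\[ P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}, \quad \max(0, n - (N - K)) \le k \le \min(n, K). \]

Mean and Variance

\[ E[X] = n \frac{K}{N}, \qquad \operatorname{Var}(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{N - n}{N - 1}. \]

Example 1: Draw 5 cards from a standard deck (52 cards, 4 aces). N = 52, K = 4, n = 5.
\[ E[X] = 5 \cdot \frac{4}{52} = \frac{20}{52} \approx 0.3846, \qquad \operatorname{Var}(X) = 5 \cdot \frac{4}{52} \cdot \frac{48}{52} \cdot \frac{47}{51} = \frac{5 \cdot 4 \cdot 48 \cdot 47}{52 \cdot 52 \cdot 51} \approx 0.3272. \]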
Negative Binomial Distribution (Trials until r -th success)
Definition
X ∼ Negative Binomial(r, p) counts the number of independent
Bernoulli trials needed to obtain r successes, each with success
probability p.
\[ P(X = k) = \binom{k-1}{r-1} p^r (1 - p)^{k-r}, \quad k = r, r + 1, \dots \]

Mean and Variance

\[ E[X] = \frac{r}{p}, \qquad \operatorname{Var}(X) = \frac{r(1 - p)}{p^2}. \]

Example 1: Roll a fair die until the 3rd time a 6 appears. r = 3, $p = \frac{1}{6}$.
\[ E[X] = \frac{3}{1/6} = 18, \qquad \operatorname{Var}(X) = \frac{3 \cdot (5/6)}{(1/6)^2} = 3 \cdot \frac{5}{6} \cdot 36 = 90. \]
END OF LECTURE
