Prob & Statistics
DISTRIBUTIONS
GHULAM MUSTAFA
Definition
A joint probability distribution describes the probability of two or
more random variables occurring simultaneously. It provides a
complete description of the probabilistic relationship between
multiple random variables defined on the same sample space.
Joint Probability Mass Function (PMF)
For two discrete random variables X and Y:
Joint PMF
$$p(x, y) = P(X = x,\, Y = y)$$
Properties:
▶ $p(x, y) \ge 0$ for all $(x, y)$
▶ $\sum_{x} \sum_{y} p(x, y) = 1$
Example
For rolling two fair dice:
$$p(x, y) = \frac{1}{36} \quad \text{for all } x, y \in \{1, 2, 3, 4, 5, 6\}$$
Example: Rolling Two Fair Dice
Solution:
▶ Let X = outcome of first die, Y = outcome of second die.
The sample space has 36 equally likely outcomes.
▶ Joint Probability Mass Function (PMF):
$$p_{X,Y}(x, y) = P(X = x,\, Y = y) = \frac{1}{36}, \quad x, y = 1, \dots, 6.$$
▶ Marginal PMFs:
$$p_X(x) = \sum_{y=1}^{6} p_{X,Y}(x, y) = \sum_{y=1}^{6} \frac{1}{36} = \frac{6}{36} = \frac{1}{6}, \quad x = 1, \dots, 6.$$
Similarly, $p_Y(y) = \frac{1}{6}$ for $y = 1, \dots, 6$. So each die is
uniformly distributed.
▶ Independence: Since
$$p_{X,Y}(x, y) = \frac{1}{36} = \frac{1}{6} \cdot \frac{1}{6} = p_X(x)\, p_Y(y) \quad \text{for all } x, y,$$
X and Y are independent.
▶ Some probabilities:
▶ $P(X + Y = 7) = \frac{6}{36} = \frac{1}{6}$ (pairs: (1,6),(2,5),(3,4),(4,3),(5,2),(6,1)).
▶ $P(X + Y \le 4) = \frac{6}{36} = \frac{1}{6}$ (pairs: (1,1),(1,2),(1,3),(2,1),(2,2),(3,1)).
▶ $P(X = 1 \mid X + Y = 7) = \frac{P(X = 1,\, X + Y = 7)}{P(X + Y = 7)} = \frac{1/36}{6/36} = \frac{1}{6}$.
Expectation and Variance:
$$E[X] = E[Y] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5,$$
$$E[X + Y] = E[X] + E[Y] = 7,$$
$$\mathrm{Var}(X) = \mathrm{Var}(Y) = \frac{35}{12},$$
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) = \frac{35}{6} \quad \text{(by independence)}.$$
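The two-dice results above can be checked by brute-force enumeration. The following is a minimal sketch using exact fractions; the helper name `prob` and the variable names are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two fair dice: every ordered pair (x, y) has probability 1/36.
faces = range(1, 7)
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def prob(event):
    """Probability of the set of outcomes (x, y) for which event(x, y) is true."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

# Marginal PMFs, obtained by summing the joint PMF over the other variable.
p_x = {x: sum(joint[(x, y)] for y in faces) for x in faces}
p_y = {y: sum(joint[(x, y)] for x in faces) for y in faces}

# Independence: the joint PMF factors into the product of the marginals.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in joint)

p_sum_7 = prob(lambda x, y: x + y == 7)
p_sum_le_4 = prob(lambda x, y: x + y <= 4)
p_x1_given_sum_7 = prob(lambda x, y: x == 1 and x + y == 7) / p_sum_7

# Moments of the sum S = X + Y, computed directly from the joint PMF.
mean_sum = sum((x + y) * p for (x, y), p in joint.items())
var_sum = sum((x + y - mean_sum) ** 2 * p for (x, y), p in joint.items())

print(p_sum_7, p_sum_le_4, p_x1_given_sum_7, mean_sum, var_sum, independent)
```

Working with `Fraction` keeps every probability exact, so the comparisons with 1/6 and 35/6 do not suffer from rounding.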
Joint Probability Density Function (PDF)
For two continuous random variables X and Y :
Joint PDF
$$f(x, y) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$$
Example
For a uniform distribution over a unit square:
$f(x, y) = 1$ for $0 \le x \le 1$, $0 \le y \le 1$
Example 3: Continuous Bivariate Distribution
Joint pdf:
$$f_{X,Y}(x, y) = \begin{cases} e^{-x} e^{-y}, & x > 0,\ y > 0, \\ 0, & \text{otherwise.} \end{cases}$$
Marginal of X:
$$f_X(x) = \int_0^{\infty} e^{-x} e^{-y}\, dy = e^{-x} \int_0^{\infty} e^{-y}\, dy = e^{-x} \cdot 1 = e^{-x}, \quad x > 0.$$
By symmetry, $f_Y(y) = e^{-y}$, $y > 0$.
Independence check:
$$f_X(x)\, f_Y(y) = e^{-x} e^{-y} = f_{X,Y}(x, y) \quad \text{for all } x, y,$$
so X and Y are independent.
Conditional Distribution
$$f(x \mid y) = \frac{f(x, y)}{f_Y(y)} \quad \text{(continuous)}$$
$$p(x \mid y) = \frac{p(x, y)}{p_Y(y)} \quad \text{(discrete)}$$
Independence
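X and Y are independent if and only if, for all x and y:
$$p(x, y) = p_X(x)\, p_Y(y) \ \text{(discrete)}, \qquad f(x, y) = f_X(x)\, f_Y(y) \ \text{(continuous)}.$$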
Problem: Roll a fair die twice. Let X = first roll, Y = second roll.
Solution:
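$$p(x, y) = \frac{1}{36} \quad \forall\, x, y \in \{1, 2, \dots, 6\}$$
$$P(X = 3,\, Y = 5) = \frac{1}{36}$$
$$P(X \le 2,\, Y \ge 5) = \frac{4}{36} = \frac{1}{9}$$
$$p_X(x) = \sum_{y=1}^{6} \frac{1}{36} = \frac{1}{6} \quad \text{for all } x$$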
Solved Example 2
Conditional:
$$f(x \mid y) = \frac{3x}{\frac{3}{2}(1 - y^2)} = \frac{2x}{1 - y^2}, \quad y \le x \le 1$$
Solved Example 2 (Continued)
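$$\begin{aligned}
P &= \int_{y=1/4}^{1/2} \int_{x=y}^{1/2} 3x\, dx\, dy \\
&= \int_{1/4}^{1/2} \frac{3}{2}\left(\frac{1}{4} - y^2\right) dy \\
&= \frac{3}{2}\left[\frac{y}{4} - \frac{y^3}{3}\right]_{1/4}^{1/2} \\
&= \frac{5}{128}
\end{aligned}$$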
Independence check: $f_X(x)\, f_Y(y) = \frac{9}{2} x^2 (1 - y^2) \ne 3x = f(x, y)$
Therefore, X and Y are not independent.
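The value 5/128 can be sanity-checked numerically. This is a rough sketch using a nested midpoint rule; the helper `midpoint_integral`, the grid sizes, and the lower bound y ≥ 0 on the support are assumptions made for illustration.

```python
def midpoint_integral(g, a, b, n=200):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def joint_pdf(x, y):
    """Joint density f(x, y) = 3x on y <= x <= 1 (y >= 0 is assumed), else 0."""
    return 3.0 * x if 0.0 <= y <= x <= 1.0 else 0.0

def inner(y):
    # Integrate f(x, y) over x from y to 1/2 for a fixed y.
    return midpoint_integral(lambda x: joint_pdf(x, y), y, 0.5)

# Outer integral over y from 1/4 to 1/2.
p = midpoint_integral(inner, 0.25, 0.5)
print(p, 5 / 128)
```

The two printed numbers should agree to several decimal places; a finer grid reduces the remaining discretization error.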
What is a Marginal Distribution?
▶ Discrete:
$$p_X(x) = \sum_{y} p_{X,Y}(x, y), \qquad p_Y(y) = \sum_{x} p_{X,Y}(x, y).$$
▶ Continuous:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx.$$
Example 1: Discrete Joint Table
         Y = 1   Y = 2   Y = 3
X = 0     0.1     0.2     0.1
X = 1     0.3     0.1     0.2
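Marginal of X:
$p_X(0) = 0.1 + 0.2 + 0.1 = 0.4$, $p_X(1) = 0.3 + 0.1 + 0.2 = 0.6$.
Marginal of Y:
$p_Y(1) = 0.1 + 0.3 = 0.4$, $p_Y(2) = 0.2 + 0.1 = 0.3$, $p_Y(3) = 0.1 + 0.2 = 0.3$.
Verify sum = 1:
$p_X(0) + p_X(1) = 0.4 + 0.6 = 1$ and $p_Y(1) + p_Y(2) + p_Y(3) = 0.4 + 0.3 + 0.3 = 1$.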
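The row and column sums of the joint table can be computed mechanically. A minimal sketch follows, with the table entered as exact fractions; the function name `marginal` and the dictionary layout are illustrative assumptions.

```python
from fractions import Fraction

# Joint PMF from the table, stored as {(x, y): probability} with exact fractions.
joint = {
    (0, 1): Fraction(1, 10), (0, 2): Fraction(2, 10), (0, 3): Fraction(1, 10),
    (1, 1): Fraction(3, 10), (1, 2): Fraction(1, 10), (1, 3): Fraction(2, 10),
}

def marginal(joint_pmf, axis):
    """Sum the joint PMF over the other variable; axis 0 gives X, axis 1 gives Y."""
    result = {}
    for outcome, p in joint_pmf.items():
        key = outcome[axis]
        result[key] = result.get(key, 0) + p
    return result

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
total = sum(joint.values())

# X and Y are independent only if every cell equals the product of its marginals.
independent = all(p == p_x[x] * p_y[y] for (x, y), p in joint.items())
print(p_x, p_y, total, independent)
```

For this table the independence test fails: for instance the cell (0, 1) holds 0.1, while the product of its marginals is 0.4 × 0.4 = 0.16.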
$$P(Z = 1 \mid X = 0,\, Y = 1) = \frac{P(X = 0,\, Y = 1,\, Z = 1)}{P(X = 0,\, Y = 1)} = \frac{0.096}{0.064 + 0.096} = \frac{0.096}{0.16} = 0.6$$
What is Covariance?
Conceptual Definition
Covariance measures the linear relationship between two random
variables. It indicates how two variables change together.
Mathematical Definition:
$$\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y]$$
Interpreting Covariance
Key Properties:
▶ $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$ (symmetry)
▶ $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
Step 1: Marginals
E [X ] = 0.8, E [Y ] = 1.0
Covariance: Definition and Example
Definition
For two random variables X and Y :
$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$
$$\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y]$$
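$$f(x, y) = x + y, \quad 0 \le x \le 1,\ 0 \le y \le 1$$
Inner integral:
$$\int_0^1 (x^2 y + x y^2)\, dy = \frac{x^2}{2} + \frac{x}{3}$$
Outer integral:
$$\int_0^1 \left(\frac{x^2}{2} + \frac{x}{3}\right) dx = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$
so $E[XY] = \frac{1}{3}$.
Step 4: Covariance
With $E[X] = E[Y] = \frac{7}{12}$:
$$\mathrm{Cov}(X, Y) = \frac{1}{3} - \left(\frac{7}{12}\right)^2 = \frac{1}{3} - \frac{49}{144} = -\frac{1}{144}$$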
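The covariance for the density f(x, y) = x + y on the unit square can be reproduced exactly, because every integral involved is an integral of a monomial. The sketch below assumes a polynomial representation of the density; the dictionary `density` and the function `expect_monomial` are illustrative names, not part of the lecture.

```python
from fractions import Fraction

# Joint density f(x, y) = x + y on the unit square, written as a polynomial:
# {(a, b): c} stands for the term c * x**a * y**b.
density = {(1, 0): Fraction(1), (0, 1): Fraction(1)}

def expect_monomial(i, j, pdf=density):
    """Exact E[X**i * Y**j] for a polynomial density on [0, 1] x [0, 1].

    Uses the fact that the integral of x**a * y**b over the unit square
    equals 1 / ((a + 1) * (b + 1)).
    """
    return sum(
        c * Fraction(1, (a + i + 1) * (b + j + 1))
        for (a, b), c in pdf.items()
    )

total_mass = expect_monomial(0, 0)   # equals 1 for a valid density
e_x = expect_monomial(1, 0)
e_y = expect_monomial(0, 1)
e_xy = expect_monomial(1, 1)
cov_xy = e_xy - e_x * e_y
print(total_mass, e_x, e_y, e_xy, cov_xy)
```

The small negative covariance means that X and Y have a weak tendency to move in opposite directions.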
$$Y = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + b$$
Examples:
▶ Portfolio return: P = 0.6A + 0.4B
▶ Weighted sum: Z = 2X − 3Y + 1
Expected Value is Linear
Special Cases:
$$E[X + Y] = E[X] + E[Y]$$
$$E[X - Y] = E[X] - E[Y]$$
$$E[aX + b] = aE[X] + b$$
$$\mathrm{Var}(Y) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) + 2 \sum_{1 \le i < j \le n} a_i a_j\, \mathrm{Cov}(X_i, X_j)$$
P = 0.6A + 0.4B
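$$E[P] = 0.6(8\%) + 0.4(12\%) = 4.8\% + 4.8\% = 9.6\%$$
Assuming A and B are uncorrelated, so the covariance term vanishes:
$$\mathrm{Var}(P) = (0.6)^2 (0.04) + (0.4)^2 (0.09) = 0.36 \times 0.04 + 0.16 \times 0.09 = 0.0144 + 0.0144 = 0.0288$$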
$$E[Z] = 2(5) - 3(3) + 1 = 10 - 9 + 1 = 2$$
$$\mathrm{Var}(Z) = 2^2 (4) + (-3)^2 (9) + 2(2)(-3)(2) = 4 \times 4 + 9 \times 9 + (-24) = 16 + 81 - 24 = 73$$
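The mean and variance formulas for a linear combination translate directly into a short function. This is a sketch under stated assumptions: the function name `linear_combination_moments` is hypothetical, the inputs for Z (means 5 and 3, variances 4 and 9, covariance 2) are read off from the worked numbers above, and the portfolio is taken to have zero covariance between A and B.

```python
def linear_combination_moments(coeffs, means, cov, intercept=0.0):
    """Mean and variance of Y = sum(a_i * X_i) + intercept.

    cov is the full covariance matrix: cov[i][i] = Var(X_i) and
    cov[i][j] = Cov(X_i, X_j).
    """
    n = len(coeffs)
    mean = sum(a * m for a, m in zip(coeffs, means)) + intercept
    # The double sum over all (i, j) counts each pair i < j twice, which
    # reproduces the factor 2 in front of the covariance terms.
    var = sum(coeffs[i] * coeffs[j] * cov[i][j] for i in range(n) for j in range(n))
    return mean, var

# Z = 2X - 3Y + 1 with E[X] = 5, E[Y] = 3, Var(X) = 4, Var(Y) = 9, Cov(X, Y) = 2.
mean_z, var_z = linear_combination_moments([2, -3], [5, 3], [[4, 2], [2, 9]], intercept=1)

# P = 0.6A + 0.4B with means 8% and 12%, variances 0.04 and 0.09, zero covariance.
mean_p, var_p = linear_combination_moments(
    [0.6, 0.4], [0.08, 0.12], [[0.04, 0.0], [0.0, 0.09]]
)
print(mean_z, var_z, mean_p, var_p)
```

Note that the intercept shifts the mean but has no effect on the variance.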
P(X = 1) = p, P(X = 0) = 1 − p.
Key properties:
Here p = 0.7.
(b) Probability of success:
P(X = 1) = p = 0.7.
P(X = 0) = 1 − p = 0.3.
Example: Free Throws (continued)
E [X ] = p = 0.7.
(e) Variance:
$\mathrm{Var}(X) = p(1 - p) = 0.7 \times 0.3 = 0.21$.
Definition
The binomial distribution counts how many times a specific
event (a “success”) happens when you repeat the same simple
experiment a fixed number of times, and each repetition is
independent and has the same chance of success.
Intuition: Flip a coin several times. The binomial distribution tells
you the probability of getting, say, exactly 3 heads in 5 flips.
n = 5, p = 0.5, k = 3
$$P(X = 3) = \binom{5}{3} (0.5)^3 (0.5)^2 = \binom{5}{3} (0.5)^5$$
$$\binom{5}{3} = 10, \qquad (0.5)^5 = \frac{1}{32}$$
$$P = 10 \times \frac{1}{32} = \frac{10}{32} = \frac{5}{16} = 0.3125$$
Thus, a 31.25% chance.
Example 2: Guessing on a Test
n = 10, p = 0.25, k = 4
$$P(X = 4) = \binom{10}{4} (0.25)^4 (0.75)^6$$
$$\binom{10}{4} = 210, \qquad (0.25)^4 = 0.00390625, \qquad (0.75)^6 \approx 0.17797$$
$$210 \times 0.00390625 = 0.8203125$$
$$0.8203125 \times 0.1779785 \approx 0.146$$
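Both binomial examples can be reproduced with a few lines of code. A minimal sketch, assuming the standard binomial PMF; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_three_heads = binomial_pmf(3, 5, 0.5)      # exactly 3 heads in 5 flips
p_four_correct = binomial_pmf(4, 10, 0.25)   # exactly 4 correct guesses out of 10

# The PMF sums to 1 over k = 0, ..., n.
total = sum(binomial_pmf(k, 10, 0.25) for k in range(11))
print(p_three_heads, round(p_four_correct, 4), total)
```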
Definition
The multinomial distribution is like the binomial, but instead of
just two outcomes (success/failure), there are several possible
outcomes. It gives the probability of getting specific counts for
each outcome in a fixed number of independent trials.
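The multinomial probability can be sketched in code using the standard formula n!/(k_1! ... k_m!) · p_1^{k_1} ... p_m^{k_m}, which is not written out in these notes. The function name `multinomial_pmf` and the example numbers are hypothetical and serve only as an illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = k_1, ..., X_m = k_m) for n = sum(counts) independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Number of distinct orderings of the n trials that give these counts.
    arrangements = factorial(n)
    for k in counts:
        arrangements //= factorial(k)
    return arrangements * prod(p**k for p, k in zip(probs, counts))

# Hypothetical illustration: 4 trials with outcome probabilities 0.5, 0.3, 0.2.
p_example = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])

# With only two outcomes the formula reduces to the binomial PMF.
p_binomial_case = multinomial_pmf([3, 2], [0.5, 0.5])
print(p_example, p_binomial_case)
```

The second call matches the coin example: exactly 3 heads and 2 tails in 5 fair flips.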
Definition
The hypergeometric distribution gives the probability of
obtaining exactly k successes in n draws without replacement
from a finite population of size N that contains exactly K
successes.
Intuition: Drawing cards from a deck without replacement. Each
draw changes the composition, so the probabilities are not
constant. Key difference from binomial: Binomial assumes
independent draws with replacement (constant probability);
hypergeometric assumes no replacement.
Mathematical Formula
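$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$
where:
▶ N = population size
▶ K = number of successes in the population
▶ n = number of draws
▶ k = number of successes among the draws
▶ $\binom{a}{b}$ = binomial coefficient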
Example 1: Cards
N = 52, K = 4, n = 5, k=2
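$$P = \frac{\binom{4}{2} \binom{48}{3}}{\binom{52}{5}}$$
$$\binom{4}{2} = 6, \qquad \binom{48}{3} = 17296, \qquad \binom{52}{5} = 2598960$$
$$P = \frac{6 \times 17296}{2598960} = \frac{103776}{2598960} \approx 0.03993$$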
About 3.99% chance.
Example 2: Defective Components
N = 20, K = 3, n = 5, k=1
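$$P = \frac{\binom{3}{1} \binom{17}{4}}{\binom{20}{5}}$$
$$\binom{3}{1} = 3, \qquad \binom{17}{4} = 2380, \qquad \binom{20}{5} = 15504$$
$$P = \frac{3 \times 2380}{15504} = \frac{7140}{15504} \approx 0.4605$$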
About 46.05% chance.
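The hypergeometric examples can be verified with the standard library. A minimal sketch; the function name `hypergeometric_pmf` and the variable names are illustrative assumptions.

```python
from math import comb

def hypergeometric_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement
    from a population of N items that contains K successes."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # impossible count for these parameters
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p_cards = hypergeometric_pmf(2, N=52, K=4, n=5)       # Example 1: Cards
p_defective = hypergeometric_pmf(1, N=20, K=3, n=5)   # Example 2: Defective Components

# The PMF sums to 1 over all feasible k.
total = sum(hypergeometric_pmf(k, N=20, K=3, n=5) for k in range(6))
print(round(p_cards, 5), round(p_defective, 4), total)
```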
Negative Binomial Distribution
Definition
The negative binomial distribution models the number of trials
needed to achieve a fixed number of successes in a sequence of
independent and identical Bernoulli trials.
Intuition: Flip a coin until you get 3 heads. The number of flips
required (including the last head) is a negative binomial random
variable.
$$P(X = x) = \binom{x-1}{r-1} p^r (1 - p)^{x-r}, \quad x = r, r+1, r+2, \dots$$
where:
▶ r = desired number of successes
▶ $\binom{x-1}{r-1}$ counts ways to arrange the first r − 1 successes among the first x − 1 trials.
Mean: $E[X] = \frac{r}{p}$
Variance: $\mathrm{Var}(X) = \frac{r(1-p)}{p^2}$
Example 1: Getting 3 Heads
r = 3, p = 0.5, x = 5
$$P(X = 5) = \binom{5-1}{3-1} (0.5)^3 (0.5)^2 = \binom{4}{2} (0.5)^5$$
$$\binom{4}{2} = 6, \qquad (0.5)^5 = \frac{1}{32}$$
$$P = 6 \times \frac{1}{32} = \frac{6}{32} = \frac{3}{16} = 0.1875$$
There is an 18.75% chance.
Example 2: Making 2 Shots
r = 2, p = 0.7, x = 4
$$P(X = 4) = \binom{4-1}{2-1} (0.7)^2 (0.3)^2 = \binom{3}{1} (0.49)(0.09)$$
$$\binom{3}{1} = 3, \qquad 0.49 \times 0.09 = 0.0441$$
$$P = 3 \times 0.0441 = 0.1323$$
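The two negative binomial examples, and the mean r/p, can be checked as follows. A minimal sketch using the number-of-trials form of the distribution described above; the function name `negative_binomial_pmf` and the truncation point of the sum are illustrative assumptions.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x, with success probability p per trial."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

p_third_head_on_flip_5 = negative_binomial_pmf(5, 3, 0.5)
p_second_shot_on_attempt_4 = negative_binomial_pmf(4, 2, 0.7)

# Truncated mean: summing x * P(X = x) far into the tail approaches r / p = 6.
approx_mean = sum(x * negative_binomial_pmf(x, 3, 0.5) for x in range(3, 400))
print(p_third_head_on_flip_5, p_second_shot_on_attempt_4, approx_mean)
```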
Definition
The geometric distribution models the number of trials needed
to get the first success in a sequence of independent and identical
Bernoulli trials.
Intuition: Flip a coin until you get heads. The number of flips
required follows a geometric distribution.
Key properties:
▶ Trials are independent, each with success probability p.
▶ Possible values: k = 1, 2, 3, . . .
Mathematical Formula
$$P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \dots$$
Mean: $E[X] = \frac{1}{p}$
Variance: $\mathrm{Var}(X) = \frac{1-p}{p^2}$
Memoryless property: The geometric distribution is the only
discrete distribution that is memoryless:
$$P(X > m + n \mid X > m) = P(X > n), \quad m, n = 0, 1, 2, \dots$$
p = 0.5, k=4
$$P(X = 4) = (1 - p)^{4-1} p = (0.5)^3 \times 0.5 = 0.125 \times 0.5 = 0.0625$$
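The geometric example and the memoryless property can be checked numerically. A minimal sketch; the function names and the values m = 3, n = 2, p = 0.3 used for the memoryless check are illustrative assumptions.

```python
def geometric_pmf(k, p):
    """P(X = k): first success on trial k, for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

def geometric_tail(n, p):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

p_first_head_on_flip_4 = geometric_pmf(4, 0.5)

# Memoryless check with illustrative values m = 3, n = 2 and p = 0.3:
# P(X > m + n | X > m) = P(X > m + n) / P(X > m) should equal P(X > n).
m, n, p = 3, 2, 0.3
conditional_tail = geometric_tail(m + n, p) / geometric_tail(m, p)
print(p_first_head_on_flip_4, conditional_tail, geometric_tail(n, p))
```

Having already seen m failures does not change the distribution of the remaining waiting time, which is why the last two printed values agree.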
Definition
The Poisson distribution models the number of times an event
occurs in a fixed interval of time or space, given that these events
happen with a known constant average rate and independently of
each other.
Intuition: Count emails per hour, calls per minute, or defects per
meter – all can be modeled with Poisson.
Key properties:
▶ Events occur independently.
$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \dots$$
Mean: $E[X] = \lambda$
Variance: $\mathrm{Var}(X) = \lambda$ (equal mean and variance)
λ = 5, k=3
$$P(X = 3) = \frac{e^{-5} \cdot 5^3}{3!} = \frac{e^{-5} \cdot 125}{6}$$
Using $e^{-5} \approx 0.0067379$:
$$P \approx \frac{0.0067379 \times 125}{6} = \frac{0.8422375}{6} \approx 0.14037$$
Thus, about 14.04% chance.
Example 2: Typos per Page
λ = 2, k=0
$$P(X = 0) = \frac{e^{-2} \cdot 2^0}{0!} = e^{-2}$$
$e^{-2} \approx 0.135335$, so about 13.53% chance.
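Both Poisson examples, and the fact that the mean equals λ, can be checked with the standard library. A minimal sketch; the function name `poisson_pmf` and the truncation of the sum at k = 59 are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): exp(-lam) * lam**k / k!."""
    if k < 0:
        return 0.0
    return exp(-lam) * lam**k / factorial(k)

p_three_events = poisson_pmf(3, 5)   # lambda = 5, k = 3
p_no_typos = poisson_pmf(0, 2)       # lambda = 2, k = 0

# Truncated mean: the terms beyond k = 59 are negligible for lambda = 5.
approx_mean = sum(k * poisson_pmf(k, 5) for k in range(60))
print(round(p_three_events, 5), round(p_no_typos, 6), approx_mean)
```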
Binomial Distribution
X ∼ Binomial(n, p) counts the number of successes in n
independent Bernoulli trials, each with success probability p.
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \dots, n.$$
$$E[X] = np, \qquad \mathrm{Var}(X) = np(1 - p).$$
$$P(X = 6) = \binom{10}{6} (0.5)^6 (0.5)^4 = 210 \cdot (0.5)^{10} \approx 0.205.$$
Example 2: A multiple-choice test has 5 questions, each with 4
options. Random guessing gives p = 0.25, n = 5.
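Poisson:
$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots$$
$$E[X] = \lambda, \qquad \mathrm{Var}(X) = \lambda.$$
Geometric:
$$P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \dots$$
$$E[X] = \frac{1}{p}, \qquad \mathrm{Var}(X) = \frac{1-p}{p^2}.$$
Negative binomial:
$$E[X] = \frac{r}{p}, \qquad \mathrm{Var}(X) = \frac{r(1-p)}{p^2}.$$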
END OF LECTURE