
Discrete Random Variables Explained

This document discusses discrete random variables, their properties, and related probability distributions, including binomial, Poisson, and hypergeometric distributions. It covers concepts such as probability mass functions, cumulative distribution functions, expectation, variance, and the relationship between them. Additionally, it includes practice questions to reinforce understanding of these concepts.

Uploaded by

yathei2007112

Analysis of Economic Data

Topic 3
Discrete Random Variables
Xinyao Qiu

HKU Faculty of Business and Economics


The University of Hong Kong
Logistics
• Attendance
• Assignment I answer key
• Assignment II (due Oct 5, 23:59, Sunday)
Plan of This Lecture
• Random Variables
• Probability Distributions for Discrete Random
Variables
• Properties of Discrete Random Variables
• Binomial Distribution
• Poisson Distribution
• Hypergeometric Distribution
• Jointly Distributed Discrete Random Variables
Random Variables
Random Variables
• A random variable (r.v.) is a variable that takes on numerical
values realized by the outcomes in the sample space
generated by a random experiment.
– Mathematically, a random variable is a function from the sample space $S$ to $\mathbb{R}$
– Conventionally, we use capital letters, such as 𝑋, to denote the
random variable and corresponding lowercase letter 𝑥 to denote a
possible value
• A random variable is a discrete random variable if it can take
on no more than a countable number of values.
• A random variable is a continuous random variable if it can
take any value in an interval.
– for any two values, there is some third value that lies between them
Practice Questions
Are the following random variables discrete or continuous?
1. The number of customers visiting a small store in a day
– discrete
2. The number of claims on a medical insurance policy in a particular
week
– discrete
3. The rate of return on a portfolio in a year
– continuous
4. The highest temperature in a day
– continuous
5. * The number of customers visiting McDonald’s around the world
in a day
– Better treated as continuous
• For practical purposes, we treat a random variable as discrete when probability statements about its individual possible outcomes are meaningful.
• All other random variables are regarded as continuous.
– e.g., $35,276.21 and $35,276.22: the difference between adjacent discrete values is of no importance, so we model the variable as continuous
• For continuous random variables we can assign probabilities only to a range of values. The probability that 𝑋 is precisely equal to 𝑥 is zero.

[Diagram: Random Variables branch into Discrete Random Variable and Continuous Random Variable]
Probability Distributions
for Discrete Random Variables
Probability Distribution Function
• The probability distribution function, $p(x)$, of a discrete random variable $X$ represents the probability that $X$ takes the value $x$, as a function of $x$. That is,
$$p(x) = P(X = x) \quad \text{for all values of } x$$
– such that:
• $0 \le p(x) \le 1$ for any value of $x$
• The individual probabilities sum to 1, that is, $\sum_x p(x) = 1$, where the notation indicates summation over all possible values of $x$
• Sometimes the probability distribution of a discrete random variable is called the probability mass function (PMF)
– Note that $\{X = x\}$ must be an event; otherwise $P(X = x)$ is not well defined.
Notation: I will use $p(x)$ and $p$ for the PMF,
- rather than $P(x)$ and $P$ as in the textbook,
- to avoid confusion with the probability symbol $P$.
Example 4.1: Number of Product Sales
Cumulative Distribution Function
• The cumulative distribution function (CDF), $F(x_0)$, of a discrete or continuous random variable $X$ represents the probability that $X$ does not exceed the value $x_0$, as a function of $x_0$. That is,
$$F(x_0) = P(X \le x_0)$$
– where the function is evaluated at all values of $x_0$
– such that:
• $0 \le F(x_0) \le 1$ for every number $x_0$
• if $x_0$ and $x_1$ are two numbers with $x_0 < x_1$, then $F(x_0) \le F(x_1)$
– $p(\cdot)$ and $F(\cdot)$ are the probabilistic counterparts of the histogram and ogive in Topic 1
Relationship between PMF and CDF for Discrete r.v.
• We can show that
$$F(x_0) = \sum_{x \le x_0} p(x)$$
– where the notation implies that summation is over all possible values of $x$ that are less than or equal to $x_0$
– From the definition of the CDF:
i. $0 \le F(x_0) \le 1$ for all $x_0$
ii. if $x_0 < x_1$, then $F(x_0) \le F(x_1)$ (i.e. $F(\cdot)$ is a (weakly) increasing function)
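The relationship between the PMF and the CDF can be sketched in a few lines of Python. This is only an illustration: the PMF used here (number of heads in two fair coin tosses) and the variable names are assumptions chosen for the example, not part of the slide.

```python
from itertools import accumulate

# Illustrative PMF (assumed for this sketch): number of heads in two fair coin tosses.
# The values must be listed in increasing order for the running sum to be the CDF.
values = [0, 1, 2]
pmf = [0.25, 0.50, 0.25]

# F(x0) is the running sum of p(x) over all x <= x0.
cdf = list(accumulate(pmf))

for x, p, F in zip(values, pmf, cdf):
    print(f"x={x}  p(x)={p:.2f}  F(x)={F:.2f}")
```

The running sum never decreases and ends at 1, which matches properties i and ii.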
Practice Question
Which of the following is true about a probability distribution?
A) The sum of probabilities of all possible outcomes must not equal 1.
B) The representation must be graphed, not tabular or algebraic.
C) The probability of each outcome must be between 0 and 1, inclusive.
D) The outcomes do not need to be mutually exclusive.

Answer: C

This is how the other choices could have been correct:


A) The sum of probabilities of all possible outcomes must equal 1.
B) The representation could be graphed, tabular or algebraic.
D) The outcomes have to be mutually exclusive.
Properties of
Discrete Random Variables
Expectation
• The pmf contains all the information about the properties of a discrete r.v., but it is desirable to have some summary measures of the pmf's characteristics.
• The expected value (expectation, or mean), $E[X]$, of a discrete random variable $X$ is defined as
$$E[X] = \mu_X = \sum_x x\,p(x)$$
– where the notation indicates that the summation extends over all possible values of $x$ (i.e. over the support of $X$)
– weighted average (weighted by what?)
• Use probability language here: think of $E[X]$ in terms of relative frequencies, weighting each possible value of $x$ by its probability
Practice Question
Q1: Roll a die once. What is the expected outcome?

$$E[X] = \sum_x x\,p(x) = \frac{1}{6}\sum_{i=1}^{6} i = \frac{21}{6} = 3.5$$
Q2: Toss 2 coins, let x = # of heads. Compute expected value of x.

Step 1: Compute the probability of each possible outcome.
x    p(x)
0    .25
1    .50
2    .25

Step 2: Compute the expected value.
$$E[X] = \sum_x x\,p(x) = 0 \times 0.25 + 1 \times 0.5 + 2 \times 0.25 = 1$$


Variance
• The expectation of the squared deviations about the mean, $(X - \mu_X)^2$, is called the variance, given by
$$\sigma_X^2 = E[(X - \mu_X)^2] = \sum_x (x - \mu_X)^2\,p(x)$$
– Alternative form: $\sigma_X^2 = E[X^2] - \mu_X^2 = \sum_x x^2 p(x) - \mu_X^2$ (can be derived from the definition above)
• The standard deviation, $\sigma_X$, is the positive square root of the variance
• Same as the population variance and standard deviation in Topic 1
Variance (derivation of alternative form)

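$$\begin{aligned}
\sigma_X^2 &= E[(X - \mu_X)^2] \\
&= \sum_x (x - \mu_X)^2\,p(x) \\
&= \sum_x x^2 p(x) - 2\mu_X \sum_x x\,p(x) + \mu_X^2 \sum_x p(x) \\
&= E[X^2] - 2\mu_X E[X] + \mu_X^2 \\
&= E[X^2] - 2\mu_X^2 + \mu_X^2 \\
&= E[X^2] - \mu_X^2
\end{aligned}$$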
Practice Question
Q: Toss 2 coins, let x = # of heads. Compute the standard deviation of x. (recall from two slides ago $E[X] = 1$)

x    p(x)
0    .25
1    .50
2    .25

$$\sigma_X = \sqrt{\sum_x (x - \mu_X)^2\,p(x)} = \sqrt{(0-1)^2 \times 0.25 + (1-1)^2 \times 0.5 + (2-1)^2 \times 0.25} = \sqrt{0.5} \approx 0.707$$
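The two-coin calculation can be checked numerically. The sketch below computes the mean, the variance by both the definition and the alternative form, and the standard deviation; the dictionary representation and variable names are illustrative choices, not the lecturer's.

```python
import math

# PMF of the number of heads in two coin tosses, as tabulated above.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

# E[X] = sum of x * p(x)
mean = sum(x * p for x, p in pmf.items())

# Variance from the definition: E[(X - mu)^2]
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Variance from the alternative form: E[X^2] - mu^2
var_alt = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

sd = math.sqrt(var_def)
print(mean, var_def, var_alt, round(sd, 3))
```

Both variance formulas give the same number, as the derivation of the alternative form requires.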
Expectation of Functions of a R.V.
• Let $X$ be a discrete random variable with probability distribution $p(x)$, and let $g(X)$ be some function of $X$. Then the expected value, $E[g(X)]$, of that function is defined as follows:
$$E[g(X)] = \sum_x g(x)\,p(x)$$
– E.g. $X$ can be the time to complete a contract, and $g(x)$ is the cost when the completion time is $x$. We are interested in the expected cost.
• In general, $E[g(X)] \ne g(E[X])$ unless $g$ is linear
– How about $g(X) = X^2$?
– $E[g(X)] - g(E[X]) = E[X^2] - \mu_X^2 = \sigma_X^2 \ne 0$ (whenever $X$ is not a constant)
– i.e. $E[g(X)] \ne g(E[X])$ when $g(X) = X^2$
Mean and Variance of Linear Functions of Random Variables

• Let $a$ and $b$ be any constants. Then $E[a] = a$ and $Var(a) = 0$. Similarly for $b$.
• Let $X$ be a random variable with mean $\mu_X$ and variance $\sigma_X^2$. Define the random variable $Y$ as $a + bX$. Then the mean and variance of $Y$ are
$$\mu_Y \equiv E[Y] = E[a + bX] = a + bE[X] \equiv a + b\mu_X$$
$$\sigma_Y^2 \equiv Var(Y) = Var(a + bX) = b^2\,Var(X) \equiv b^2\sigma_X^2$$
$$\sigma_Y = \sqrt{Var(Y)} = |b|\,\sigma_X$$
The proof is similar to the earlier slide on the alternative form of the variance.
[assignment II]
Some Special Linear Functions for 𝐘 = 𝒂 + 𝒃𝑿

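• If $b = 0$, i.e. $Y = a$, then $E[a] = a$ and $Var(a) = 0$

• If $a = 0$, i.e. $Y = bX$, then $E[bX] = bE[X]$ and $Var(bX) = b^2\,Var(X)$ (the variance is the same as in the case where $a$ is a general constant)
• If $a = -\frac{\mu_X}{\sigma_X}$ and $b = \frac{1}{\sigma_X}$, i.e. $Y = \frac{X - \mu_X}{\sigma_X}$ is the z-score of $X$, then
$$E[Y] = E\left[\frac{X - \mu_X}{\sigma_X}\right] = E\left[\frac{X}{\sigma_X} - \frac{\mu_X}{\sigma_X}\right] = 0$$
and
$$Var(Y) = Var\left(\frac{X - \mu_X}{\sigma_X}\right) = \frac{Var(X)}{\sigma_X^2} = 1$$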
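The z-score result can be verified on a concrete PMF. The sketch below standardizes the two-coin variable used earlier in the lecture and recomputes its mean and variance; the helper names `mean` and `variance` are hypothetical and introduced only for this example.

```python
import math

def mean(pmf):
    """E[X] for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2] for a PMF given as {value: probability}."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Number of heads in two coin tosses.
pmf_x = {0: 0.25, 1: 0.50, 2: 0.25}
mu_x = mean(pmf_x)
sigma_x = math.sqrt(variance(pmf_x))

# Y = a + bX with a = -mu_x / sigma_x and b = 1 / sigma_x (the z-score of X).
# The probabilities stay attached to the transformed values.
pmf_y = {(x - mu_x) / sigma_x: p for x, p in pmf_x.items()}

print(mean(pmf_y), variance(pmf_y))
```

Up to floating-point rounding, the standardized variable has mean 0 and variance 1.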
Practice Question
1. The listing of all possible outcomes of an experiment and their
corresponding probability of occurrence is called a(n):
A) random variable.
B) objective probability.
C) probability distribution.

2. Consider the following probability distribution. Which of the following is true?
x     0     1     2     3     4     5     6
p(x)  0.07  0.19  0.23  0.17  0.16  0.14  0.04
A) P(X > 3) = 0.51 (false: the correct value is 0.34)
B) P(2 ≤ X ≤ 5) = 0.33 (false: the correct value is 0.70)
C) P(X ≥ 3) = 0.51 (true)
D) P(X < 6) = 1 (false: the correct value is 0.96)
[Diagram: Probability Distributions. Discrete Probability Distributions (Ch. 4): Binomial, Poisson, Hypergeometric. Continuous Probability Distributions (Ch. 5): Uniform, Normal, Exponential.]
Binomial Distribution
Bernoulli Distribution
The Bernoulli random variable is a random variable taking only two values, 0 and 1, labeled as "failure" and "success" in one trial.
• Let $p_B$ denote the probability of success, so that the probability of failure is $1 - p_B$. The Bernoulli distribution, $X \sim Bernoulli(p_B)$, is

$$p(0) = 1 - p_B \quad \text{and} \quad p(1) = p_B$$

• The mean and variance of a $Bernoulli(p_B)$ r.v. $X$ are

$$\mu_X = E[X] = \sum_x x\,p(x) = 0\,(1 - p_B) + 1\,(p_B) = p_B$$

$$\sigma_X^2 = E[(X - \mu_X)^2] = \sum_x (x - \mu_X)^2\,p(x) = (0 - p_B)^2 (1 - p_B) + (1 - p_B)^2\,p_B = p_B(1 - p_B)$$

• $\sigma_X^2$ achieves its maximum when $p_B = 0.5$; $\sigma_X^2$ achieves its minimum when $p_B = 0$ or $1$
[Link]

Determining feature: 1 trial, 2 outcomes


Binomial Distribution
• Suppose that a random experiment can result
in two possible mutually exclusive and
collectively exhaustive outcomes, “success: 1”
and “failure: 0”
• $P(1) = p_B = 0.6$
• $n = 4$ independent trials are carried out
• What is the probability of $x = 3$ successes in $n = 4$ independent trials?
• We're looking for all possible "paths" where three successes happen and one failure happens.
• Number of paths: $C_3^4 = 4$
– Success, Success, Success, Failure
– Success, Success, Failure, Success
– Success, Failure, Success, Success
– Failure, Success, Success, Success
• If the prob. of success = 0.7, the prob. of each path is $0.7 \times 0.7 \times 0.7 \times 0.3$
• The prob. of 3 successes in 4 independent trials: $4 \times 0.7 \times 0.7 \times 0.7 \times 0.3$
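The path-counting argument can be reproduced by brute force. The sketch below enumerates every success/failure sequence of length 4, keeps those with exactly 3 successes, and multiplies the count by the probability of one path; it uses the success probability 0.7 from the last two bullets, and the variable names are illustrative.

```python
from itertools import product
from math import comb

n, x, p = 4, 3, 0.7  # trials, successes, success probability (0.7 as in the last two bullets)

# Every sequence of 1 (success) and 0 (failure) of length n with exactly x successes.
paths = [seq for seq in product([1, 0], repeat=n) if sum(seq) == x]

# Each such path has the same probability: p^x * (1 - p)^(n - x).
path_prob = p ** x * (1 - p) ** (n - x)

total = len(paths) * path_prob
print(len(paths), comb(n, x), round(total, 4))
```

The enumeration finds the same 4 paths listed above, and the count agrees with `math.comb(4, 3)`.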
Binomial Distribution
• The number of sequences with $x$ successes in $n$ independent trials is
o $C_x^n = \frac{n!}{x!\,(n-x)!}$
o where $n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$

• The binomial random variable $X$ is the number of successes in $n$ independent trials of a $Bernoulli(p_B)$ random variable, denoted as $X \sim Binomial(n, p_B)$

• The event "$x$ successes resulting from $n$ independent trials" can occur in $C_x^n$ mutually exclusive ways, and the probability of any one sequence with $x$ successes is $p_B^x (1 - p_B)^{n-x}$
Binomial Distribution
Suppose that a random experiment can result in two possible mutually exclusive and collectively exhaustive outcomes, "success" and "failure", and that $p_B$ is the probability of a success in a single trial. If $n$ independent trials are carried out, the distribution of the number of resulting successes, $x$, is called the binomial distribution. Its probability distribution function for the binomial random variable $X = x$ is as follows:

$$p(x \mid n, p_B) = \frac{n!}{x!\,(n-x)!}\; p_B^x (1 - p_B)^{n-x}$$

– Mean: $\mu_X = E[X] = np_B$
– Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = np_B(1 - p_B)$
• More discussion of this under jointly distributed r.v.'s
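A minimal sketch of the binomial PMF, with a numerical check of the mean and variance formulas. The function name `binomial_pmf` and the parameter values (n = 4, p = 0.7, taken from the earlier path-counting slide) are choices made for this illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, 0.7
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the PMF.
mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(round(sum(pmf), 6), round(mean, 6), round(var, 6))
print(n * p, n * p * (1 - p))  # closed-form mean and variance for comparison
```

The probabilities sum to 1, and the directly computed mean and variance agree with $np_B$ and $np_B(1 - p_B)$ up to rounding.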
Binomial Distribution
Before using the binomial, the specific situation
must be analyzed to determine if the following
occur:
• The application involves several trials, each of
which has only two outcomes: yes or no, on or
off, success or failure (i.e. each trial is a Bernoulli
r.v.)
• The probability of the outcome is the same for
each trial
• The probability of the outcome on one trial does
not affect the probability of other trials
Practice Question
What is the probability of 3 successes in 5 independent trials with
success rate of 10%?
• Step 1: figure out which random variable this is.
• Step 2: figure out the key parameters of this r.v.
– Binomial: $n = 5$ independent trials, $x = 3$ successes, and $p_B = 0.1$
• Step 3: rewrite the question in statistical language.
– Known: this r.v. follows a binomial distribution
– Also known: in general, the binomial distribution has the following probability formula
• $p(x \mid n, p_B) = \frac{n!}{x!\,(n-x)!}\; p_B^x (1 - p_B)^{n-x}$
– Want: $p(x \mid n, p_B) = p(3 \mid 5, 0.1)$
• Step 4: compute the desired probability
– $p(3 \mid 5, 0.1) = \frac{5!}{3!\,(5-3)!}\,(0.1)^3 (1 - 0.1)^{5-3} = 0.0081$
Practice Question
Suppose that you are in charge of marketing airline seats for a major carrier. Four days before the flight date you have 16 seats remaining on the plane. You know from past data that 80% of the people who purchase tickets in this time period will actually show up for the flight.
– Assume everyone who purchased a ticket before this period will show up, so we don't consider those passengers in this question.
Assume the number of people who purchase tickets in this time period and end up showing up follows a binomial distribution. If you sell 20 extra tickets, what is the probability that you will
1. have 16 people show up?
2. have at least 1 empty seat?
3. overbook the flight?
Practice Question
Step 1: (translate English to statistics)
Define this binomial distribution
• # of people who show up out of 20 → $X$
• 80% show up → $p_B = P(\text{show up}) = 0.8$
• You sell 20 extra tickets → $n = 20$
• Probability that 16 people show up → $P(X = 16)$
• Probability of at least 1 empty seat → $P(X \le 15)$
• Probability of overbooking the flight → $P(X > 16)$
Practice Question
• Probability that 16 people show up
– $P(X = 16) = p(x \mid n, p_B) = p(16 \mid 20, 0.8) = \frac{20!}{16!\,4!}\,(0.8)^{16} (0.2)^4$
– $P(X = 16) = 0.2181994$
• Probability of at least 1 empty seat
– $P(X \le 15) = P(X = 15) + P(X = 14) + \dots + P(X = 1) + P(X = 0)$
– $P(X \le 15) = 1 - P(X = 16) - P(X = 17) - P(X = 18) - P(X = 19) - P(X = 20)$
– $P(X \le 15) = 0.3703617$
• Probability of overbooking the flight
– $P(X > 16) = 0.4114489$
– The probability of overbooking is high, so you probably shouldn't be selling 20 extra tickets.

• The airline management must then weigh the cost of overbooking against the cost of empty seats that generate no revenue. Airlines analyze data to determine the number of seats that should be sold at reduced rates to maximize profit from each flight.
(*)How do we plot these in R?
(*) Binomial Distribution | R Tutorial
[Link]
distribution
Poisson Distribution
$$p(x \mid \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}$$

You work for an Apple store and manage the genius bar. Based
on your experience, customers without appointments arrive at
your genius bar at an average rate of 2 every hour.
– Assume these walk-in visits are independent, with a
constant arrival rate.
– Normally you only have two genius bar spots (which are
not reserved) available each hour. You want to know the
probability that you will have more than two walk-in visits
in one hour.
• We can solve this question using Poisson distribution
• # of walk-in visits in one hour: $X \sim Poisson(\lambda)$
• an average rate of 2 every hour: $\lambda = 2$
• use the PMF $p(x) = \frac{e^{-\lambda}\lambda^x}{x!}$
– $P(X = 0) = \frac{e^{-2}\,2^0}{0!} = e^{-2}$
– $P(X = 1) = \frac{e^{-2}\,2^1}{1!} = 2e^{-2}$
– $P(X = 2) = \frac{e^{-2}\,2^2}{2!} = 2e^{-2}$
– $P(X > 2) = 1 - P(X \le 2) = 1 - 5e^{-2} \approx 0.323$
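The Genius Bar calculation can be checked with a short sketch of the Poisson PMF. The function name `poisson_pmf` and the variable names are assumptions made for this illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2  # average number of walk-in visits per hour

# P(X <= 2) = P(0) + P(1) + P(2), then take the complement.
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))
p_more_than_2 = 1 - p_at_most_2

print(round(p_more_than_2, 4))
```

The result agrees with the closed form $1 - 5e^{-2}$.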
Poisson Distribution
• The random variable $X$ is said to follow the Poisson distribution, $X \sim Poisson(\lambda)$, if it has the probability distribution
$$p(x \mid \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad \text{for } x = 0, 1, 2, \dots$$
– where
– $p(x \mid \lambda)$: the probability of $x$ successes over a given time or space, given $\lambda$
– $\lambda$: the expected number of successes per time or space unit, $\lambda > 0$
– $e$: the mathematical constant approximated by 2.71828. It is the base of natural logarithms, called Euler's number
• The mean and variance are:
$$\mu_X = E[X] = \lambda$$
$$\sigma_X^2 = E[(X - \mu_X)^2] = \lambda$$
• The sum of independent Poisson random variables is also a Poisson random variable, e.g. the sum of $K$ independent $Poisson(\lambda)$ r.v.'s is a $Poisson(K\lambda)$ r.v.
Poisson Distribution
• Two important applications of the Poisson
distribution in the modern global economy are the
probability of failures in complex systems and the
probability of defective products in large
production runs of several hundred thousand to a
million units.
• The Poisson distribution has been found to be
particularly useful in waiting line problems.
– As a store manager, how to balance long lines (too few
checkout lines, losing customers) and idle customer
service associates (too many lines, resulting waste)?
Graph of Poisson Probabilities
Poisson Distribution Shape
• The shape of the Poisson Distribution depends on
the parameter 𝜆
Data Science & Statistics Tutorial: The Poisson Distribution
[Link] (5:08)
Poisson Distribution Approximation
• Intuitively, the Poisson r.v. is the limit of the binomial r.v. as $p_B \to 0$ and $n \to \infty$. If $np_B \to \lambda$, which specifies the average number of occurrences (successes) for a particular time (and/or space), then the binomial distribution converges to the Poisson distribution
• When $n$ is large and $np_B$ is of only moderate size (preferably $np_B \le 7$), the binomial distribution can be approximated by $Poisson(np_B)$.
Poisson and Binomial Distribution
• We can derive the Poisson distribution directly from the Binomial distribution in the limit as $p_B \to 0$ and $n \to \infty$
• With these limits, the parameter $\lambda = np_B$ is a constant that specifies the average number of occurrences (successes) for a particular time and/or space
• The Poisson distribution can be used to approximate the binomial probabilities when
– the number of trials $n$ is large
– the probability $p_B$ is small, generally such that $\lambda = np_B \le 7$
– E.g., an insurance company will hold a large number of life policies on individuals of any particular age, and the probability that a single policy will result in a claim during the year is very low. Here, we have a binomial distribution with large $n$ and small $p_B$
Poisson and Binomial Distribution
Practice Question
An analyst predicted that 2% of all small corporations would file for bankruptcy in the coming year. For a random sample of 100 small corporations, estimate the probability that at least 3 will file for bankruptcy in the next year, assuming that the analyst's prediction is correct. Find this probability using two different methods and compare your results. (hint: use Binomial and Poisson distributions)

Step 1: translate from English to statistics

The question asks us, in statistical language, to find $P(X \ge 3)$ when:

– $X \sim Binomial(n, p_B)$, where $n = 100$, $p_B = 0.02$
– $X \sim Poisson(\lambda)$, where $\lambda = np_B = 2$

Step 2: use the CDF to compute the probabilities

$$P(X \ge 3) = 1 - P(X < 3) = 1 - P(X \le 2)$$

With Poisson: $P(X \ge 3) = 1 - P(X \le 2) = 1 - 5e^{-2} \approx 0.323$

With Binomial: $P(X \ge 3) = 1 - P(X \le 2) \approx 0.323$ (check the interim steps on your own)

Step 3: conclude. The probability that at least 3 corporations would file for bankruptcy is 0.323; the Poisson and Binomial distributions give similar estimates.
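The interim steps for both methods can be sketched as follows. The lecture does this comparison in R; this version uses Python's standard library, and the function and variable names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 100, 0.02
lam = n * p  # Poisson parameter matched to the binomial mean

# P(X >= 3) = 1 - P(X <= 2) under each model.
p_binomial = 1 - sum(binomial_pmf(x, n, p) for x in range(3))
p_poisson = 1 - sum(poisson_pmf(x, lam) for x in range(3))

print(round(p_binomial, 5), round(p_poisson, 5))
```

With large $n$ and small $p_B$, the two answers differ only in the fifth decimal place here; for smaller $n$ or larger $p_B$ the gap widens.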
(*) Approximation result using R. Sometimes the approximation is not this good.
Hypergeometric Distribution
Hypergeometric Distribution
• Let's say we have to form a committee of 5 professors. We have 9 professors to choose from. Out of the 9, three are Finance professors and six are Econ professors.
– What's the probability of having 3 Econ professors on the committee?
– Can we use the Binomial, with "success" defined as choosing an Econ professor and a "success rate" of 6/9?
• We should use the hypergeometric distribution, since the success rate is not constant in this case.
• If the Binomial distribution can be viewed as arising from random sampling with replacement from a population of size $N$, of which $S$ are successes and $\frac{S}{N} = p_B$, then the Hypergeometric distribution models the number of successes from random sampling without replacement.
Hypergeometric Distribution
• A r.v. with this distribution is denoted as $X \sim Hypergeometric(n, N, S)$
• The Binomial assumes items are drawn independently, with the probability of selecting an item being constant. This assumption is more easily met if a small sample is drawn without replacement from a large population (e.g. $N > 10000$ and $\frac{n}{N} < 1\%$).
• When we draw from a small population, the probability of selecting an item changes with each selection, because the number of remaining items changes.
• In this case, the Hypergeometric distribution is used for situations similar to the binomial, with the important exception that sample observations are not replaced in the population when sampling from a "small population."
Practice Question
3 different computers are checked from 10 in the department. 4
of the 10 computers have illegal software loaded. What is the
probability that 2 of the 3 selected computers have illegal
software loaded?
$N = 10$, $n = 3$
$S = 4$, $x = 2$
$N - S = 6$, $n - x = 1$

$$P(X = 2) = \frac{C_x^S\, C_{n-x}^{N-S}}{C_n^N} = \frac{C_2^4\, C_1^6}{C_3^{10}} = \frac{(6)(6)}{120} = 0.3$$

The probability that 2 of the 3 selected computers have illegal software loaded is 0.30, or 30%.
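A minimal sketch of the hypergeometric PMF, applied to the computer example above and to the committee question from the start of this section. The function name `hypergeom_pmf` and its argument order are choices made for this example.

```python
from math import comb

def hypergeom_pmf(x, n, N, S):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items, S of which are successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

# Computers: N = 10, S = 4 with illegal software, sample n = 3, want x = 2.
p_computers = hypergeom_pmf(2, n=3, N=10, S=4)

# Committee: N = 9 professors, S = 6 Econ, committee of n = 5, want x = 3 Econ.
p_committee = hypergeom_pmf(3, n=5, N=9, S=6)

print(round(p_computers, 4), round(p_committee, 4))
```

The first value reproduces the 0.30 above; the second answers the committee question, which the slides pose without working out.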
Practice Question
A company receives a shipment of 20 items. Because
inspection of each individual item is expensive, it has a policy
of checking a random sample of 6 items from such a
shipment, and if no more than 1 sampled item is defective,
the remainder will not be checked. What is the probability
that a shipment of 5 defective items will not be subjected to
additional checking?
• "success": an item is defective
• # of defective items in the sample: $X \sim Hypergeometric(n, N, S)$
• shipment of 20 items → $N = 20$
• checking a random sample of 6 items → $n = 6$
• $S = 5$ of the 20 items are defective
• Probability that the shipment is not checked further: $P(0) + P(1)$
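The slide stops at $P(0) + P(1)$. One way to finish the computation is sketched below: it tabulates the whole hypergeometric PMF for the shipment and adds the first two entries. The function name and the list-based layout are assumptions of this example.

```python
from math import comb

def hypergeom_pmf(x, n, N, S):
    """P(X = x) when sampling n items without replacement from N items, S of them successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

N, n, S = 20, 6, 5  # shipment size, sample size, defective items in the shipment

# PMF over every feasible number of defectives in the sample (0..5).
pmf = [hypergeom_pmf(x, n, N, S) for x in range(min(n, S) + 1)]

# The shipment escapes further checking when at most 1 sampled item is defective.
p_no_further_check = pmf[0] + pmf[1]

print([round(px, 4) for px in pmf], round(p_no_further_check, 4))
```

Under these parameters the shipment passes without further checking a little over half the time (about 0.5165).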
Jointly Distributed Discrete Random
Variables
Bivariate Discrete R.V.s
Joint Probability Distributions
• We can use a bivariate probability distribution to model the relationship between univariate r.v.'s
• $p(x, y)$ is a straightforward extension of joint probabilities in Topic 2, where $X = x$ and $Y = y$ are two events with $x$ and $y$ indexing them.
• From the Topic 2 probability postulates:
• $0 \le p(x, y) \le 1$
• $\sum_x \sum_y p(x, y) = 1$
Bivariate Discrete R.V.s
Marginal Probability Distributions
Conditional Probability Distribution
The conditional probability distribution of the random variable $Y$ expresses the probability that $Y$ takes the value $y$ when the value $x$ is specified for $X$.
$$p(y \mid x) = \frac{p(x, y)}{p(x)}$$

Similarly, the conditional probability function of $X$, given $Y = y$, is:

$$p(x \mid y) = \frac{p(x, y)}{p(y)}$$
Independence of Bivariate R.V.’s
Conditional Mean and Variance
The conditional mean and variance are computed as
$$\mu_{Y|X=x_0} = E[Y \mid X = x_0] = \sum_y y\,p(y \mid x_0)$$
$$\sigma^2_{Y|X=x_0} = E[(Y - \mu_{Y|X=x_0})^2 \mid X = x_0] = \sum_y (y - \mu_{Y|X=x_0})^2\,p(y \mid x_0)$$
• For any constants $a$ and $b$, $W = a + bY$ has
o $\mu_{W|X=x_0} = a + b\,\mu_{Y|X=x_0}$
o $\sigma^2_{W|X=x_0} = b^2\,\sigma^2_{Y|X=x_0}$
Practice Question
Suppose that you consider investing money in two stocks,
XXX and YYY, whose return can only take four possible
values. Let 𝑋 and 𝑌 be random variables of possible
percent returns for each stock.
1. Find the marginal probabilities
2. Determine if XXX and YYY are independent
3. Find the means and variances of both 𝑋 and 𝑌

1. $P(X = 0) = \sum_y p(0, y) = 0.0625 + 0.0625 + 0.0625 + 0.0625 = 0.25$. Do the same for the other seven marginal probabilities.

2. To test independence, check whether $p(x, y) = p(x)\,p(y)$ for all possible pairs of values of $x$ and $y$. Here $p(x, y) = 0.0625$ for all pairs, and $p(x) = 0.25$ and $p(y) = 0.25$ for all values of $x$ and $y$, so $p(x, y) = 0.0625 = (0.25)(0.25) = p(x)\,p(y)$. Independent.

3. $\mu_X = E[X] = \sum_x x\,p(x) = 0(0.25) + 0.05(0.25) + 0.1(0.25) + 0.15(0.25) = 0.075$. Similarly,
$\sigma_X^2 = Var(X) = \sum_x (x - \mu_X)^2\,p(x) = (0 - 0.075)^2(0.25) + (0.05 - 0.075)^2(0.25) + (0.1 - 0.075)^2(0.25) + (0.15 - 0.075)^2(0.25) = 0.003125$.
$\mu_Y = 0.075$. $\sigma_Y^2 = 0.003125$.
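The three parts of this answer can be sketched in code. The joint probability table did not survive in this copy of the slides, so the sketch assumes what the worked answer implies: both returns take the values 0, 0.05, 0.10 and 0.15, and every pair has joint probability 0.0625. The values for $Y$ in particular are an assumption, chosen to match the stated $\mu_Y$ and $\sigma_Y^2$.

```python
from itertools import product
from math import isclose

x_values = [0.0, 0.05, 0.10, 0.15]
y_values = [0.0, 0.05, 0.10, 0.15]  # assumed: same support as X

# Assumed joint PMF: every (x, y) pair equally likely.
joint = {(x, y): 0.0625 for x, y in product(x_values, y_values)}

# 1. Marginals: sum the joint PMF over the other variable.
p_x = {x: sum(joint[(x, y)] for y in y_values) for x in x_values}
p_y = {y: sum(joint[(x, y)] for x in x_values) for y in y_values}

# 2. Independence: p(x, y) must equal p(x) * p(y) for every pair.
independent = all(isclose(joint[(x, y)], p_x[x] * p_y[y]) for x, y in joint)

# 3. Mean and variance of X from its marginal PMF.
mu_x = sum(x * p for x, p in p_x.items())
var_x = sum((x - mu_x) ** 2 * p for x, p in p_x.items())

print(p_x, independent, round(mu_x, 6), round(var_x, 6))
```

The independence check has to pass for every pair; a single failing pair is enough to conclude dependence.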
Mean and Variance of (Linear) Functions
• For a function $g(X, Y)$ of $(X, Y)$, its mean $E[g(X, Y)]$ is defined as
$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p(x, y)$$
• For a linear function of $(X, Y)$, call it $W$: $W = aX + bY$
– $\mu_W := E[W] = a\mu_X + b\mu_Y$
– $\sigma_W^2 := Var(W) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_{XY}$

• E.g. $W$ is the total revenue of two products, with $X, Y$ being sales and $a, b$ being prices
Mean and Variance of (Linear) Functions
• For a linear function of $(X, Y)$, call it $W$: $W = aX + bY$
– $\mu_W := E[W] = a\mu_X + b\mu_Y$
– $\sigma_W^2 := Var(W) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_{XY}$

• If $a = b = 1$, then $E[X + Y] = E[X] + E[Y]$
– i.e. the mean of the sum is the sum of the means.
• If $a = 1$ and $b = -1$, then $E[X - Y] = E[X] - E[Y]$
– i.e. the mean of the difference is the difference of the means.
• If $a = b = 1$ and $\sigma_{XY} = 0$, then $Var(X + Y) = Var(X) + Var(Y)$
– i.e. the variance of the sum is the sum of the variances.
• If $a = 1$, $b = -1$ and $\sigma_{XY} = 0$, then $Var(X - Y) = Var(X) + Var(Y)$
– i.e. the variance of the difference is the sum of the variances.
Mean and Variance of (Linear) Functions

These results can be extended to linear combinations of many random variables.
Jointly Distributed Discrete r.v.: Covariance

• Let $X$ and $Y$ be discrete random variables with means $\mu_X$ and $\mu_Y$
• The expected value of $(X - \mu_X)(Y - \mu_Y)$ is called the covariance between $X$ and $Y$
• For discrete random variables
$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x, y)$$

• An equivalent expression is
$$Cov(X, Y) = E[XY] - \mu_X\mu_Y = \sum_x \sum_y xy\,p(x, y) - \mu_X\mu_Y$$
Jointly Distributed Discrete r.v.: Correlation

• The correlation between $X$ and $Y$ is:

$$\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X\,\sigma_Y}$$
• $-1 \le \rho \le 1$
• $\rho = 0$ ⟹ no linear relationship between $X$ and $Y$
• $\rho > 0$ ⟹ positive linear relationship between $X$ and $Y$
» when $X$ is high (low) then $Y$ is likely to be high (low)
» $\rho = +1$ ⟹ perfect positive linear dependency
• $\rho < 0$ ⟹ negative linear relationship between $X$ and $Y$
» when $X$ is high (low) then $Y$ is likely to be low (high)
» $\rho = -1$ ⟹ perfect negative linear dependency
Covariance and Independence
• The covariance measures the strength of the linear relationship between two variables
• If two random variables are statistically independent, the covariance between them is 0
– The converse is not necessarily true

• E.g. suppose $X$ is distributed as $p(-1) = \frac{1}{4}$, $p(0) = \frac{1}{2}$ and $p(1) = \frac{1}{4}$. Let $Y = X^2$. Then $Cov(X, Y) = 0$. But $X$ and $Y$ are not independent. Why?
– Not independent: because $X$ determines $Y$.
– $p(x = -1, y = 1) = \frac{1}{4}$, whereas $p(x = -1) = \frac{1}{4}$ and $p(y = 1) = \frac{1}{2}$, so $p(x = -1)\,p(y = 1) = \frac{1}{8} \ne \frac{1}{4}$
– $E[X] = 0$ and $E[Y] = \frac{1}{2}$ (check on your own)
– The joint distribution of $(X, Y)$ is $p(-1, 1) = \frac{1}{4}$, $p(0, 0) = \frac{1}{2}$ and $p(1, 1) = \frac{1}{4}$. So $E[XY] = 0$, so $Cov(X, Y) = E[XY] - E[X]E[Y] = 0$
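The example can be verified exactly using rational arithmetic. The sketch below builds the joint distribution of $(X, X^2)$, computes the covariance, and tests the independence condition on every $(x, y)$ pair; the variable names are illustrative.

```python
from fractions import Fraction

# Marginal PMF of X.
p_x = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

# Y = X^2, so all the joint probability mass sits on the pairs (x, x^2).
joint = {(x, x * x): p for x, p in p_x.items()}

# Marginal PMF of Y, obtained by summing the joint PMF over x.
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, Fraction(0)) + p

e_x = sum(x * p for x, p in p_x.items())
e_y = sum(y * p for y, p in p_y.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y

# Independence needs p(x, y) = p(x) p(y) on the full grid, including pairs with zero mass.
independent = all(
    joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y] for x in p_x for y in p_y
)

print(e_x, e_y, e_xy, cov, independent)
```

The covariance is exactly zero while the independence check fails, which is the point of the example: zero covariance rules out a linear relationship, not dependence in general.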
Practice Question
Suppose that you consider investing money in two
stocks, XXX and YYY, whose return can only take four
possible values. Let 𝑋 and 𝑌 be random variables of
possible percent returns for each stock.
1. Find the covariance and correlation of 𝑋 and 𝑌
2. You decide to get a portfolio of 40% of XXX and
60% of YYY. Find the mean and variance of your
portfolio.

1. $X$ and $Y$ are independent, so both the covariance and the correlation are 0.

2. $W = 0.4X + 0.6Y$. $E[W] = 0.4E[X] + 0.6E[Y] = 0.075$. $Var(W) = Var(0.4X + 0.6Y) = 0.4^2\,Var(X) + 0.6^2\,Var(Y) + 2(0.4)(0.6)\,Cov(X, Y) = 0.001625 < 0.003125$
Intuition behind the investment rule: diversification of assets ⟹ lowers variance (hence lowers risk).
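The portfolio numbers follow directly from the linear-combination formulas. The sketch below plugs in the means, variances and covariance found in the earlier parts of this question; the variable names are illustrative.

```python
# Inputs from the earlier parts of the question.
mu_x, mu_y = 0.075, 0.075
var_x, var_y = 0.003125, 0.003125
cov_xy = 0.0  # X and Y are independent

a, b = 0.4, 0.6  # portfolio weights on XXX and YYY

# E[aX + bY] = a*mu_x + b*mu_y
mean_w = a * mu_x + b * mu_y

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
var_w = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy

print(round(mean_w, 6), round(var_w, 6), var_w < var_x)
```

The portfolio keeps the same expected return as either stock alone but has roughly half the variance.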
