Discrete Random Variables Explained
Topic 3
Discrete Random Variables
Xinyao Qiu
Random variables: discrete random variables and continuous random variables
Probability Distributions
for Discrete Random Variables
Probability Distribution Function
• The probability distribution function, $p(x)$, of a
discrete random variable $X$ represents the probability
that $X$ takes the value $x$, as a function of $x$. That is
$p(x) = P(X = x)$ for all values of $x$
– such that:
• $0 \le p(x) \le 1$ for any value of $x$
• The individual probabilities sum to 1, that is, $\sum_x p(x) = 1$, where
the notation indicates summation over all possible values of $x$
• Sometimes the probability distribution of a discrete
random variable is called the probability mass function
(PMF)
– Note that $\{X = x\}$ must be an event; otherwise $P(X = x)$ is
not well defined.
Notation: I will use $p(x)$ and $p$ for the PMF,
- rather than $P(x)$ and $P$ as in the textbook,
- to avoid confusion with the probability symbol $P$.
Example 4.1: Number of Product Sales
Cumulative Distribution Function
• The cumulative distribution function (CDF),
$F(x_0)$, of a discrete or continuous random
variable $X$, represents the probability that $X$ does
not exceed the value $x_0$, as a function of $x_0$. That
is
$F(x_0) = P(X \le x_0)$
– where the function is evaluated at all values of $x_0$
– such that:
• $0 \le F(x_0) \le 1$ for every number $x_0$
• if $x_0$ and $x_1$ are two numbers with $x_0 < x_1$, then $F(x_0) \le F(x_1)$
– $p(\cdot)$ and $F(\cdot)$ are probabilistic counterparts of
the histogram and ogive in Topic 1
We can show that
$F(x_0) = \sum_{x \le x_0} p(x)$
• where the notation implies that summation is over all
possible values of $x$ that are less than or equal to $x_0$
Relationship between PMF and CDF for Discrete r.v.
$F(x_0) = \sum_{x \le x_0} p(x)$
– where the function is evaluated at all values of $x_0$
– From the definition of the CDF:
i. $0 \le F(x_0) \le 1 \;\; \forall x_0$
ii. if $x_0 < x_1$, then $F(x_0) \le F(x_1)$ (i.e. $F(\cdot)$ is a
(weakly) increasing function)
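(*) A minimal Python sketch of the PMF-to-CDF relationship. The PMF used here is the two-coin-toss distribution (number of heads) that appears later in these notes; the function name `cdf_from_pmf` is only illustrative.

```python
from itertools import accumulate

# PMF of X = number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}


def cdf_from_pmf(pmf):
    """Return {x0: F(x0)}, where F(x0) sums p(x) over all x <= x0."""
    xs = sorted(pmf)
    running_totals = accumulate(pmf[x] for x in xs)
    return dict(zip(xs, running_totals))


cdf = cdf_from_pmf(pmf)
print(cdf)  # {0: 0.25, 1: 0.75, 2: 1.0}
```

The running total never decreases and ends at 1, which is exactly properties i and ii of the CDF.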
Practice Question
Which of the following is true about a probability distribution?
A) The sum of probabilities of all possible outcomes must not equal 1.
B) The representation must be graphed, not tabular or algebraic.
C) The probability of each outcome must be between 0 and 1, inclusive.
D) The outcomes do not need to be mutually exclusive.
Answer: C
$E[X] = \sum_x x\,p(x) = \frac{1}{6} \sum_{i=1}^{6} i = \frac{21}{6} = 3.5$
Q2: Toss 2 coins, let x = # of heads. Compute expected value of x.
$\sigma_X^2 = E[(X - \mu_X)^2]$
$= \sum_x (x - \mu_X)^2 p(x)$
$= \sum_x x^2 p(x) - 2\mu_X \sum_x x\,p(x) + \mu_X^2 \sum_x p(x)$
$= E[X^2] - 2\mu_X E[X] + \mu_X^2$
$= E[X^2] - 2\mu_X^2 + \mu_X^2$
$= E[X^2] - \mu_X^2$
Practice Question
Q: Toss 2 coins, let x = # of heads. Compute the standard deviation
of x. (recall from two slides ago 𝐸 𝑋 = 1)
x    P(x)
0    0.25
1    0.50
2    0.25
• $\sigma_X = \sqrt{\sum_x (x - \mu_X)^2 P(x)}$
$= \sqrt{(0 - 1)^2 \times 0.25 + (1 - 1)^2 \times 0.5 + (2 - 1)^2 \times 0.25}$
$= \sqrt{0.5} = 0.707$
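(*) The same numbers can be checked with a few lines of Python. This is only a sketch for the two-coin table above; it computes the variance both from the definition and from the shortcut $E[X^2] - \mu_X^2$.

```python
import math

# PMF of X = number of heads in two fair coin tosses (table above).
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * p for x, p in pmf.items())

# Variance from the definition E[(X - mu)^2].
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Variance from the shortcut E[X^2] - mu^2.
second_moment = sum(x ** 2 * p for x, p in pmf.items())
var_shortcut = second_moment - mean ** 2

sd = math.sqrt(var_definition)
print(mean, var_definition, var_shortcut, round(sd, 3))  # 1.0 0.5 0.5 0.707
```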
Expectation of Functions of a R.V.
• Let $X$ be a discrete random variable with probability
distribution $p(x)$, and let $g(X)$ be some function of $X$.
Then the expected value, $E[g(X)]$, of that function is
defined as follows:
$E[g(X)] = \sum_x g(x)\,p(x)$
– E.g. $X$ can be the time to complete a contract, and $g(x)$ is
the cost when the completion time is $x$. We are
interested in knowing the expected cost.
• In general, $E[g(X)] \ne g(E[X])$ unless $g(x)$ is linear
– How about $g(X) = X^2$?
– $E[g(X)] - g(E[X]) = E[X^2] - \mu^2 = \sigma^2 \ne 0$ (whenever $X$ is not constant)
– i.e. $E[g(X)] \ne g(E[X])$ when $g(X) = X^2$
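(*) A small numerical illustration, assuming the two-coin PMF from the earlier practice question. The linear function $3 + 2x$ is a made-up example (its constants are not from the slides); the helper name `expect` is also just illustrative.

```python
# PMF of X = number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}


def expect(g, pmf):
    """E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())


def square(x):
    return x ** 2


def linear(x):
    # Hypothetical linear function with a = 3, b = 2.
    return 3 + 2 * x


mean = expect(lambda x: x, pmf)

# For g(x) = x^2 the gap E[g(X)] - g(E[X]) equals the variance (0.5 here).
gap_square = expect(square, pmf) - square(mean)

# For a linear g the gap vanishes.
gap_linear = expect(linear, pmf) - linear(mean)

print(gap_square, gap_linear)
```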
Mean and Variance of Linear Functions of Random Variables
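For $Y = a + bX$ with constants $a$ and $b$:
$\mu_Y \equiv E[Y] = E[a + bX] = a + bE[X] \equiv a + b\mu_X$
$\sigma_Y^2 \equiv Var(Y) = Var(a + bX) = b^2 \sigma_X^2$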
Binomial Distribution
Probability distributions covered: discrete (Binomial, Poisson, Hypergeometric) and continuous (Uniform, Normal, Exponential)
Bernoulli Distribution
The Bernoulli random variable is a random variable taking only two values, 0
and 1, labeled as "failure" and "success" in one trial.
• Let $p_0$ denote the probability of success, so that the probability of failure is
$1 - p_0$. The Bernoulli distribution, $X \sim Bernoulli(p_0)$, is
$p(0) = 1 - p_0$ and $p(1) = p_0$
$\mu_X = E[X] = \sum_x x\,p(x) = 0(1 - p_0) + 1(p_0) = p_0$
$\sigma_X^2 = E[(X - \mu_X)^2] = (0 - p_0)^2 (1 - p_0) + (1 - p_0)^2 p_0 = p_0 (1 - p_0)$
• $\sigma_X^2$ achieves its maximum when $p_0 = 0.5$; $\sigma_X^2$ achieves its minimum when
$p_0 = 0$ or $1$
$p(x \mid n, p_0) = \frac{n!}{x!\,(n - x)!}\, p_0^x (1 - p_0)^{n - x}$
– Mean: $\mu_X = E[X] = n p_0$
– Variance: $\sigma_X^2 = E[(X - \mu_X)^2] = n p_0 (1 - p_0)$
• More discussion on this on jointly distributed r.v.’s
Binomial Distribution
Before using the binomial, the specific situation
must be analyzed to determine if the following
occur:
• The application involves several trials, each of
which has only two outcomes: yes or no, on or
off, success or failure (i.e. each trial is a Bernoulli
r.v.)
• The probability of the outcome is the same for
each trial
• The probability of the outcome on one trial does
not affect the probability of other trials
Practice Question
What is the probability of 3 successes in 5 independent trials with
success rate of 10%?
• Step 1: figure out which random variable this is.
• Step 2: figure out the key parameters of this r.v.
– Binomial: $n = 5$ independent trials, $x = 3$ successes, and $p_0 = 0.1$
• Step 3: rewrite the question in statistical language.
– Known: this r.v. follows a binomial distribution
– Also known: in general, the binomial distribution has the following
probability formula
• $p(x \mid n, p_0) = \frac{n!}{x!\,(n - x)!}\, p_0^x (1 - p_0)^{n - x}$
– Want: $p(x \mid n, p_0) = p(3 \mid 5, 0.1)$
• Step 4: compute the desired probability
– $p(3 \mid 5, 0.1) = \frac{5!}{3!\,(5 - 3)!}\, 0.1^3 (1 - 0.1)^{5 - 3} = 0.0081$
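(*) Step 4 can be reproduced with a short Python sketch. The function name `binomial_pmf` is illustrative; `math.comb` supplies the binomial coefficient. The last two lines also check the mean $np_0$ and variance $np_0(1 - p_0)$ numerically.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 5, 0.1
prob_three = binomial_pmf(3, n, p)
print(round(prob_three, 4))  # 0.0081

# Mean and variance computed directly from the PMF.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
```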
Practice Question
Suppose that you are in charge of marketing airline seats for a
major carrier. Four days before the flight date you have 16 seats
remaining on the plane. You know from past experience data
that 80% of the people that purchase tickets in this time period
will actually show up for the flight.
– Assume all people that purchase tickets before this will show up, so
we don’t consider these people in this question.
Assume the number of people who purchase tickets in this time
period and end up showing up follows a binomial distribution. If
you sell 20 extra tickets, what is the probability that you will
1. have 16 people show up?
2. have at least 1 empty seat?
3. overbook the flight?
Practice Question
Step 1: (translate English to statistics)
Define this binomial distribution
• # of people who show up out of 20 → $X$
• 80% show up → $p_0 = P(\text{show up}) = 0.8$
• You sell 20 extra tickets → $n = 20$
• Probability that 16 people show up → $P(X = 16)$
• Probability of at least 1 empty seat → $P(X \le 15)$
• Probability of overbooking the flight → $P(X > 16)$
Practice Question
• Probability that 16 people show up
– $P(X = 16) = p(x \mid n, p_0) = p(16 \mid 20, 0.8) = \frac{20!}{16!\,4!}\, 0.8^{16}\, 0.2^{4}$
– $P(X = 16) = 0.2181994$
• Probability of at least 1 empty seat
– $P(X \le 15) = P(X = 15) + P(X = 14) + \cdots + P(X = 1) + P(X = 0)$
– $P(X \le 15) = 1 - P(X = 16) - P(X = 17) - P(X = 18) - P(X = 19) - P(X = 20)$
– $P(X \le 15) = 0.3703517$
• Probability of overbooking the flight
– $P(X > 16) = 0.4114489$
– The probability of overbooking is high, so you probably shouldn’t be selling 20
extra tickets.
• The airline management must then weigh the cost of overbooking against the cost of
empty seats that generate no revenue. Airlines analyze data to determine the number of
seats that should be sold at reduced rates to maximize profit from each flight.
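(*) A minimal Python sketch of the three airline probabilities, assuming $X \sim Binomial(20, 0.8)$ as defined in Step 1. The function and variable names are illustrative only.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p, seats = 20, 0.8, 16

# 1. Exactly 16 people show up.
p_exactly_16 = binomial_pmf(seats, n, p)

# 2. At least one empty seat: 15 or fewer show up.
p_empty_seat = sum(binomial_pmf(x, n, p) for x in range(seats))

# 3. Overbooked: 17 or more show up.
p_overbooked = sum(binomial_pmf(x, n, p) for x in range(seats + 1, n + 1))

print(round(p_exactly_16, 4), round(p_empty_seat, 4), round(p_overbooked, 4))
# 0.2182 0.3704 0.4114
```

The three events cover every possible value of $X$, so the three probabilities add up to 1.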
(*) How do we plot these in R?
(*) Binomial Distribution | R Tutorial
Poisson Distribution
$p(x) = \frac{e^{-\lambda} \lambda^x}{x!}$ for $x = 0, 1, 2, \ldots$, where $\lambda$ is the expected number of occurrences in the given interval
You work for an Apple store and manage the genius bar. Based
on your experience, customers without appointments arrive at
your genius bar at an average rate of 2 every hour.
– Assume these walk-in visits are independent, with a
constant arrival rate.
– Normally you only have two genius bar spots (which are
not reserved) available each hour. You want to know the
probability that you will have more than two walk-in visits
in one hour.
• We can solve this question using Poisson distribution
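(*) A minimal Python sketch of this calculation, assuming the number of walk-in visits per hour is Poisson with $\lambda = 2$. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)


lam = 2  # average walk-in arrivals per hour

# P(X > 2) = 1 - P(X = 0) - P(X = 1) - P(X = 2)
p_more_than_two = 1 - sum(poisson_pmf(x, lam) for x in range(3))
print(round(p_more_than_two, 4))  # 0.3233
```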
Step 3: conclude – the probability that at least 3 corporations would file for bankruptcy is 0.323; both Poisson
and Binomial distribution give similar estimates.
(*) Approximation result using R. Sometimes the
approximation is not this good.
Hypergeometric Distribution
Hypergeometric Distribution
• Let’s say we have to form a committee of 5 professors. We have
9 professors to choose from. Out of the 9, three of them are
Finance professors, and six are Econ professors.
– What’s the probability of having 3 Econ professors on the committee?
– Can we use Binomial? --- “success” is defined as having an Econ professor,
the “success rate” is 6/9?
• We should use the hypergeometric distribution since the success
rate is not constant in this case.
• If the Binomial distribution is viewed as arising from random
sampling with replacement from a population of size $N$, of which $S$
are successes so that $S/N$ is the success probability, then the Hypergeometric
distribution models the number of successes from random
sampling without replacement.
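(*) To see how much the choice matters for the committee example, here is a Python sketch comparing the two models. It uses the standard hypergeometric PMF, $P(x) = C^{S}_{x} C^{N-S}_{n-x} / C^{N}_{n}$, with $N = 9$ professors, $S = 6$ Econ professors and $n = 5$ committee seats; the function names are illustrative.

```python
from math import comb


def hypergeom_pmf(x, n, N, S):
    """P(X = x) successes when drawing n items without replacement
    from a population of N items that contains S successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p) (sampling with replacement)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Probability of exactly 3 Econ professors on the committee.
p_hyper = hypergeom_pmf(3, n=5, N=9, S=6)   # 60 / 126
p_binom = binomial_pmf(3, n=5, p=6 / 9)     # 80 / 243

print(round(p_hyper, 4), round(p_binom, 4))  # 0.4762 0.3292
```

With such a small population the binomial answer is noticeably off, which is why sampling without replacement needs the hypergeometric model here.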
Hypergeometric Distribution
• A r.v. with this distribution is denoted as
𝑋~𝐻𝑦𝑝𝑒𝑟𝑔𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐(𝑛, 𝑁, 𝑆)
• The Binomial assumes items are drawn independently, with the
probability of selecting an item being constant. This assumption
is more easily met if a small sample is drawn w/o replacement from a
large population (e.g. $N > 10000$ and $n/N < 1\%$).
• When we draw from a small population, the probability of
selecting an item is changing with each selection, because the
number of remaining items is changing.
• In this case, the Hypergeometric distribution is used for
situations similar to the binomial with the important exception
that sample observations are not replaced in the population
when sampling from a “small population.”
Practice Question
3 different computers are checked from 10 in the department. 4
of the 10 computers have illegal software loaded. What is the
probability that 2 of the 3 selected computers have illegal
software loaded?
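$N = 10$, $n = 3$
$S = 4$, $x = 2$
$N - S = 6$, $n - x = 1$
$P(X = 2) = \frac{C^{S}_{x}\, C^{N-S}_{n-x}}{C^{N}_{n}} = \frac{C^{4}_{2}\, C^{6}_{1}}{C^{10}_{3}} = \frac{6 \times 6}{120} = 0.30$
where $C^{N}_{n} = \frac{N!}{n!\,(N - n)!}$ is the number of ways to choose $n$ items from $N$.
The probability that 2 of the 3 selected computers have illegal software loaded is
0.30, or 30%.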
Practice Question
A company receives a shipment of 20 items. Because
inspection of each individual item is expensive, it has a policy
of checking a random sample of 6 items from such a
shipment, and if no more than 1 sampled item is defective,
the remainder will not be checked. What is the probability
that a shipment of 5 defective items will not be subjected to
additional checking?
• “success”: a sampled item is defective
• # of defective items in the sample: $X \sim Hypergeometric(n, N, S)$
• shipment of 20 items → $N = 20$
• checking a random sample of 6 items → $n = 6$
• $S = 5$ of the 20 items are defective
• Probability that the shipment is not checked further: $P(0) + P(1)$
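(*) The slides stop at the setup, so here is a hedged Python sketch that finishes the arithmetic under exactly those parameters ($N = 20$, $n = 6$, $S = 5$). The function name `hypergeom_pmf` is illustrative; the last line re-checks the earlier computer example as a sanity test.

```python
from math import comb


def hypergeom_pmf(x, n, N, S):
    """P(X = x) successes when drawing n items without replacement
    from a population of N items that contains S successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


# Shipment is not checked further if 0 or 1 sampled items are defective.
p_zero = hypergeom_pmf(0, n=6, N=20, S=5)   # 5005 / 38760
p_one = hypergeom_pmf(1, n=6, N=20, S=5)    # 15015 / 38760
p_no_further_check = p_zero + p_one

print(round(p_no_further_check, 4))  # 0.5165

# Sanity check against the computer example: N = 10, n = 3, S = 4, x = 2.
p_computers = hypergeom_pmf(2, n=3, N=10, S=4)  # 36 / 120 = 0.30
```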
Jointly Distributed Discrete Random
Variables
Bivariate Discrete R.V.s
Joint Probability Distributions
• We can use a bivariate probability distribution to model the relationship between
univariate r.v.’s
$P(x \mid y) = \frac{P(x, y)}{P(y)}$
Independence of Bivariate R.V.’s
Conditional Mean and Variance
The conditional mean and variance are computed as
$\mu_{Y|X=x_0} = E[Y \mid X = x_0] = \sum_y y\,P(y \mid x_0)$
$\sigma^2_{Y|X=x_0} = E[(Y - \mu_{Y|X=x_0})^2 \mid X = x_0] = \sum_y (y - \mu_{Y|X=x_0})^2 P(y \mid x_0)$
• For any constants $a$ and $b$, $W = a + bY$ has
o $\mu_{W|X=x_0} = a + b\,\mu_{Y|X=x_0}$
o $\sigma^2_{W|X=x_0} = b^2 \sigma^2_{Y|X=x_0}$
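(*) A Python sketch of these formulas. The joint distribution below is made up purely for illustration (it is not from the slides), as are the constants $a = 2$, $b = 3$ and the function names.

```python
def conditional_pmf(joint, x0):
    """Return {y: P(y | x0)} from a joint PMF keyed by (x, y)."""
    p_x0 = sum(p for (x, _y), p in joint.items() if x == x0)
    return {y: p / p_x0 for (x, y), p in joint.items() if x == x0}


def mean_and_variance(pmf):
    """Mean and variance of a univariate PMF given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    variance = sum((v - mean) ** 2 * p for v, p in pmf.items())
    return mean, variance


# Hypothetical joint PMF P(x, y); the probabilities sum to 1.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

# Conditional distribution, mean and variance of Y given X = 0.
cond_y = conditional_pmf(joint, x0=0)
mu_y, var_y = mean_and_variance(cond_y)

# W = a + bY, conditional on X = 0 (hypothetical constants).
a, b = 2, 3
cond_w = {a + b * y: p for y, p in cond_y.items()}
mu_w, var_w = mean_and_variance(cond_w)

print(mu_y, var_y, mu_w, var_w)
```

Computing the moments of $W$ directly from its conditional distribution gives the same values as $a + b\mu_{Y|X=x_0}$ and $b^2\sigma^2_{Y|X=x_0}$.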
Practice Question
Suppose that you consider investing money in two stocks,
XXX and YYY, whose return can only take four possible
values. Let 𝑋 and 𝑌 be random variables of possible
percent returns for each stock.
1. Find the marginal probabilities
2. Determine if XXX and YYY are independent
3. Find the means and variances of both 𝑋 and 𝑌
3. $\mu_X = E[X] = \sum_x x\,p(x) = 0(0.25) + 0.05(0.25) + 0.1(0.25) + 0.15(0.25) = 0.075$. Similarly,
$\sigma_X^2 = Var(X) = \sum_x (x - \mu_X)^2 p(x) = (0 - 0.075)^2 (0.25) + (0.05 - 0.075)^2 (0.25) + (0.1 - 0.075)^2 (0.25) + (0.15 - 0.075)^2 (0.25) = 0.003125$.
$\mu_Y = 0.075$. $\sigma_Y^2 = 0.003125$.
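(*) The arithmetic in part 3 can be checked with a short Python sketch. It uses only the marginal distribution of $X$ that appears in the calculation above (returns 0, 0.05, 0.10, 0.15, each with probability 0.25); variable names are illustrative.

```python
# Marginal distribution of X taken from the calculation above.
returns = [0.0, 0.05, 0.10, 0.15]
probs = [0.25, 0.25, 0.25, 0.25]

mean_x = sum(x * p for x, p in zip(returns, probs))
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(returns, probs))

print(round(mean_x, 6), round(var_x, 6))  # 0.075 0.003125
```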
Mean and Variance of (Linear) Functions
• For a function of $(X, Y)$, $g(X, Y)$, its mean $E[g(X, Y)]$ is defined
as
$E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p(x, y)$
• For a linear function of $(X, Y)$, let’s call it $W$: $W = aX + bY$
– $\mu_W := E[W] = a\mu_X + b\mu_Y$
– $\sigma_W^2 := Var(W) = a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2ab\,\sigma_{XY}$
• If $a = b = 1$, then $E[X + Y] = E[X] + E[Y]$
– i.e. the mean of the sum is the sum of the means.
• If $a = 1$ and $b = -1$, then $E[X - Y] = E[X] - E[Y]$
– i.e. the mean of the difference is the difference of the means.
• If $a = b = 1$, and $\sigma_{XY} = 0$, then $Var(X + Y) = Var(X) + Var(Y)$
– i.e. the variance of the sum is the sum of the variances.
• If $a = 1$, $b = -1$, and $\sigma_{XY} = 0$, then $Var(X - Y) = Var(X) + Var(Y)$
– i.e. the variance of the difference is the sum of the variances.
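(*) A Python sketch that checks the two formulas for $W = aX + bY$ numerically. The joint PMF and the weights $a = 0.4$, $b = 0.6$ are hypothetical values chosen for illustration, and `expect` is an illustrative helper name.

```python
def expect(g, joint):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


# Hypothetical joint PMF p(x, y); the probabilities sum to 1.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
a, b = 0.4, 0.6  # hypothetical weights

mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)
var_x = expect(lambda x, y: (x - mu_x) ** 2, joint)
var_y = expect(lambda x, y: (y - mu_y) ** 2, joint)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)

# Mean and variance of W computed directly from the joint distribution.
mu_w_direct = expect(lambda x, y: a * x + b * y, joint)
var_w_direct = expect(lambda x, y: (a * x + b * y - mu_w_direct) ** 2, joint)

# Mean and variance of W from the formulas.
mu_w_formula = a * mu_x + b * mu_y
var_w_formula = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy

print(mu_w_direct, mu_w_formula, var_w_direct, var_w_formula)
```

The covariance term matters: dropping $2ab\,\sigma_{XY}$ only gives the right variance when $\sigma_{XY} = 0$.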
Mean and Variance of (Linear) Functions
• An equivalent expression is
$Cov(X, Y) = E[XY] - \mu_X \mu_Y = \sum_x \sum_y x\,y\,p(x, y) - \mu_X \mu_Y$
Jointly Distributed Discrete r.v.: Correlation
$\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
• $-1 \le \rho \le 1$
• $\rho = 0$ ⟹ no linear relationship between X and Y
• $\rho > 0$ ⟹ positive linear relationship between X and Y
» when X is high (low) then Y is likely to be high (low)
» $\rho = +1$ ⟹ perfect positive linear dependency
• $\rho < 0$ ⟹ negative linear relationship between X and Y
» when X is high (low) then Y is likely to be low (high)
» $\rho = -1$ ⟹ perfect negative linear dependency
Covariance and Independence
• The covariance measures the strength of the linear relationship between
two variables
• If two random variables are statistically independent, the covariance
between them is 0
– The converse is not necessarily true
• E.g. suppose $X$ is distributed as $p(-1) = \frac{1}{4}$, $p(0) = \frac{1}{2}$ and $p(1) = \frac{1}{4}$. Let $Y = X^2$.
Then $Cov(X, Y) = 0$. But $X$ and $Y$ are not independent. Why?
– Not independent: because $X$ determines $Y$.
– $p(x = -1, y = 1) = \frac{1}{4}$, $p(x = -1) = \frac{1}{4}$, $p(y = 1) = \frac{1}{2}$, so $p(x = -1, y = 1) \ne p(x = -1)\,p(y = 1) = \frac{1}{8}$
– $E[X] = 0$ and $E[Y] = \frac{1}{2}$ (check on your own)
– Joint distribution of $(X, Y)$ is $p(-1, 1) = \frac{1}{4}$, $p(0, 0) = \frac{1}{2}$ and $p(1, 1) = \frac{1}{4}$. So $E[XY] = 0$,
so $Cov(X, Y) = E[XY] - E[X]E[Y] = 0$
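(*) The example can be verified exactly with Python's `fractions` module. This is a small sketch of the check, with illustrative variable names; it builds the joint distribution from $Y = X^2$, computes the covariance, and then tests the independence condition $p(x, y) = p(x)\,p(y)$ on every pair.

```python
from fractions import Fraction

# Marginal distribution of X from the example.
p_x = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

# Y = X^2, so the joint PMF puts all mass on the points (x, x^2).
joint = {(x, x ** 2): p for x, p in p_x.items()}

# Marginal distribution of Y.
p_y = {}
for (_x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

e_x = sum(x * p for (x, _y), p in joint.items())
e_y = sum(y * p for (_x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov = e_xy - e_x * e_y

# Independence requires p(x, y) = p(x) p(y) for every pair (x, y).
independent = all(
    joint.get((x, y), 0) == p_x[x] * p_y[y] for x in p_x for y in p_y
)

print(cov, independent)  # 0 False
```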
Practice Question
Suppose that you consider investing money in two
stocks, XXX and YYY, whose return can only take four
possible values. Let 𝑋 and 𝑌 be random variables of
possible percent returns for each stock.
1. Find the covariance and correlation of 𝑋 and 𝑌
2. You decide to get a portfolio of 40% of XXX and
60% of YYY. Find the mean and variance of your
portfolio.