Understanding Discrete Random Variables
Notation.
The result of a statistical experiment is called a random variable. The observations of a
discrete random variable are quantitative discrete data (as discussed in Chapter 1) which
could be put in a list.
These values do not have to be positive, nor do they have to be whole numbers. Usually, but
not always, the list is limited to just a few values.
Here are some examples of discrete random variables:
o the number of thunderstorms occurring in a particular location over the course of one
year,
Random variables are denoted by capital letters, usually chosen from the end of the alphabet
(e.g. X, Y or Z). Observed values of a random variable are denoted by the corresponding
lower case letter (i.e. x, y or z).
Hence we make probability statements such as P(X = x), which should be read as “the
probability that the random variable X takes the particular value x”.
x          1    2    3    4    5    6
P(X = x)   1/6  1/6  1/6  1/6  1/6  1/6
Example 4.2
The following table gives the probability distribution for the discrete random variable X which
is an “insurance firm’s projected profit (in $1000’s)”.
Example 4.3
A psychologist determined that the number of sessions required to gain the trust of a new
patient was either 1, 2 or 3. Let Y be the random variable indicating the number of sessions
required to gain the patient’s trust. The following probability distribution has been proposed:
$$P(Y = y) = \frac{y}{6}, \quad (y = 1, 2, 3).$$
Is this probability function valid? Explain.
What is the probability that it takes exactly two sessions to gain the patient’s trust?
What is the probability that it takes at least two sessions to gain a patient’s trust?
This is given by P(Y ≥ 2) = P(Y = 2) + P(Y = 3) = 1/3 + 1/2 = 5/6.
For a discrete random variable X, P(X ≤ x) is called the cumulative distribution function
for X.
Example 4.4
Let X be “the number obtained when rolling a fair six-sided die”. For whole numbers x with
0 ≤ x ≤ 6, the cumulative distribution function is given by
$$P(X \le x) = \frac{x}{6}.$$
For example, P(X ≤ 3) = P(X = 3) + P(X = 2) + P(X = 1) = 3/6 = 1/2.
Example 4.5
A couple plans to have children until they get a girl, but they agree they will not have more
than three children even if they are all boys. Assume each child is equally likely to be a boy
or a girl.
(i) Let the random variable X denote the number of children the couple has. Calculate the
probability distribution of X.
(ii) Let the random variable Y denote the number of boys the couple has. Calculate the
probability distribution of Y .
Expectation.
The observed value of a random variable is likely to change each time the experiment is
repeated.
We could repeat the experiment n times, and take the sample mean x̄ of these n observed
values. As n becomes large, the sample mean x̄ tends to the expectation or expected value
of the random variable, and can be thought of as being “the long-term average value”.
Before looking at the formal definition, let’s discuss this just using our intuition or “common
sense”:
(Open the link to the video in a new tab or new window - usually done by using right-click.)
64 CHAPTER 4. DISCRETE RANDOM VARIABLES
Definition 4.6
The expectation or expected value of a random variable X is denoted E[X] and is defined as
$$E[X] = \sum_x x \, P(X = x).$$
Example 4.7
Let X be “the number obtained when rolling a fair six-sided die”. Then
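$$E[X] = \sum_x x \, P(X = x) = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = \frac{7}{2}.$$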
Note that the expected value is not necessarily a value that can be obtained in the experiment.
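As a quick check of Example 4.7, the following Python sketch computes E[X] exactly from the probability distribution and compares it with the average of many simulated rolls. The function name `expectation`, the dictionary representation of the distribution, and the number of simulated rolls are illustrative choices, not part of these notes.

```python
from fractions import Fraction
import random

def expectation(dist):
    """Return E[X] = sum of x * P(X = x) for a dict {x: P(X = x)}."""
    return sum(x * p for x, p in dist.items())

# Fair six-sided die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))  # 7/2

# Average of many simulated rolls (seed fixed so the run is reproducible).
rng = random.Random(0)
n = 100_000
rolls = [rng.randint(1, 6) for _ in range(n)]
sample_mean = sum(rolls) / n
print(sample_mean)  # close to 3.5, although 3.5 itself can never be rolled
```

Using exact fractions avoids rounding, so the first printed value can be compared directly with the 7/2 obtained by hand.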
Exercise 4.8
If you roll two fair six-sided dice, what is the expected value of the difference between the
two numbers obtained?
The following video covers this exercise in detail. But you could try it yourself first.
Recall that in Example 4.5 we calculated the probability distribution for the couple planning
their family of boys and girls. In this video we look at the resulting expected values:
Variance.
We have seen above that we can think of the expected value as the “long-term average/mean”
x̄ if we repeat an experiment over and over again, or collect larger and larger samples.
In the same way we can think of the variance of a random variable as a “long-term version”
of the sample variance.
Just like the sample variance introduced in Chapter 1, the variance of a random variable is a
measure of the variability of the random variable, i.e. a measure for how much the random
variable differs from its expected value.
Definition 4.9
The variance of a random variable X is denoted Var[X] and is defined as
$$\mathrm{Var}[X] = E[X^2] - \{E[X]\}^2,$$
where
$$E[X^2] = \sum_x x^2 \, P(X = x).$$
In this video we briefly discuss how to read these formulas and how they are connected to the
formula for the sample variance:
Example 4.10
Let X be “the number obtained when rolling a fair six-sided die”. Then
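$$E[X^2] = \sum_x x^2 \, P(X = x) = 1^2 \times \frac{1}{6} + 2^2 \times \frac{1}{6} + 3^2 \times \frac{1}{6} + 4^2 \times \frac{1}{6} + 5^2 \times \frac{1}{6} + 6^2 \times \frac{1}{6} = \frac{91}{6},$$
and
$$\mathrm{Var}[X] = E[X^2] - \{E[X]\}^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}.$$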
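The arithmetic in Example 4.10 can be checked with a short Python sketch. The helper names `expectation` and `variance` are hypothetical and chosen for illustration. The code simply applies Var[X] = E[X²] − {E[X]}² to the die distribution.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """Return E[g(X)] for a dict {x: P(X = x)}; g defaults to the identity."""
    return sum(g(x) * p for x, p in dist.items())

def variance(dist):
    """Return Var[X] = E[X^2] - (E[X])^2."""
    mean = expectation(dist)
    return expectation(dist, lambda x: x * x) - mean ** 2

# Fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die, lambda x: x * x))  # 91/6
print(variance(die))                      # 35/12
```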
Fact 4.11
(i) One can show that
$$E[X^2] - \{E[X]\}^2 = E\left[\{X - E[X]\}^2\right],$$
which shows that Var[X] is a measure for how much the random variable X is expected
to differ from E[X].
(ii) A variance of zero implies the random variable takes a single value with probability 1.
Example 4.12
You are invited to throw a fair six-sided die. You will pay £5 to roll the die, and you will
receive twice the value (in £) of the number showing on the die after it is rolled (e.g. if you
roll a 3, you will receive £6). Define the random variable X to be the number showing after
the die is rolled. Define the random variable Y to be the profit you make when you roll the
die. So Y = 2X − 5.
The probability distribution of Y is shown in the table below.
x             1     2     3     4     5     6     Total
P(X = x)      1/6   1/6   1/6   1/6   1/6   1/6   1
y = 2x − 5    −3    −1    1     3     5     7
P(Y = y)      1/6   1/6   1/6   1/6   1/6   1/6   1
yP(Y = y)     −3/6  −1/6  1/6   3/6   5/6   7/6   2
y²            9     1     1     9     25    49
y²P(Y = y)    9/6   1/6   1/6   9/6   25/6  49/6  47/3
Hence E[Y] = Σ_y yP(Y = y) = 2 and E[Y²] = Σ_y y²P(Y = y) = 47/3. We find that
$$\mathrm{Var}[Y] = E[Y^2] - \{E[Y]\}^2 = \frac{47}{3} - 2^2 = \frac{35}{3}.$$
However, we did not actually need to go through all of those calculations! We have Y = 2X − 5,
and we know that E[X] = 7/2 and Var[X] = 35/12. Using the formulas given above, we have
$$E[Y] = 2 \times E[X] - 5 = 2 \times \frac{7}{2} - 5 = 2,$$
$$\mathrm{Var}[Y] = 2^2 \times \mathrm{Var}[X] = 4 \times \frac{35}{12} = \frac{35}{3},$$
which are exactly the same as the figures we worked out above!
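Both routes in Example 4.12 can be compared in a few lines of Python. This is only a sketch: the helper `mean_and_variance` and the variable names are illustrative assumptions, and the shortcut lines use the same rules applied in the example, E[Y] = 2E[X] − 5 and Var[Y] = 2²Var[X].

```python
from fractions import Fraction

def mean_and_variance(dist):
    """Return (E[X], Var[X]) for a dict {x: P(X = x)}."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    return mean, second_moment - mean ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}
# Distribution of the profit Y = 2X - 5: same probabilities, shifted values.
profit = {2 * x - 5: p for x, p in die.items()}

mean_x, var_x = mean_and_variance(die)
mean_y, var_y = mean_and_variance(profit)

print(mean_y, var_y)                   # long route: 2 35/3
print(2 * mean_x - 5, 2 ** 2 * var_x)  # shortcut:   2 35/3
```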
Example 4.13
Let the random variable X be “the amount claimed (in $) on a car insurance policy in a single
year”. The probability distribution for X is shown in the table below.
We could simply use the formula for E[X] and Var[X] using the values in the table. But
to make the calculations easier, avoiding having to deal with the large numbers, we define
a random variable Y = X/1000, calculate E[Y ] and Var[Y ], and then use Fact 4.11(iii) and
X = 1000Y to get E[X] and Var[X].
Let’s first organise the calculations in a table:
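Hence E[Y] = Σ_y yP(Y = y) = 0.166 and E[Y²] = Σ_y y²P(Y = y) = 0.5964. We find that
$$\mathrm{Var}[Y] = E[Y^2] - \{E[Y]\}^2 = 0.5964 - 0.166^2 = 0.568844.$$
Since X = 1000Y,
$$E[X] = 1000 \times E[Y] = 166, \quad \mathrm{Var}[X] = 1000^2 \times \mathrm{Var}[Y] = 568844.$$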
The insurance firm wants to make a profit of $100 on each annual car insurance policy it sells.
How much should it charge for a policy?
The insurance firm can expect to pay out E[X] = $166 on an annual car insurance policy.
Therefore it must charge $166+$100 = $266 for an annual policy in order to make an expected
profit of $100.
Exercise 4.14
(We will go through this one in the lecture.)
A commuter must pass through five traffic lights on her way to work and will have to stop at
each one that is red. She estimates the probability distribution for the number of red lights,
R, to be
r 0 1 2 3 4 5
P(R = r) 0.05 0.25 0.35 0.15 0.15 0.05
Table 4.6: Probability distribution for R.
Find the expectation and variance of the number of red lights the commuter is expected to
meet on her way to work.
We will now look at three specific types of discrete probability distributions, the uniform
distribution, the binomial distribution, and the Poisson distribution.
The following video shows a proof of these facts, but you can skip this if you want.
Example 4.17
If X is the random variable “the number obtained when rolling a fair six-sided die”, then X
follows a discrete uniform distribution where the discrete set of possible values are 1, 2, . . . , 6,
each of which is equally likely to occur.
From Examples 4.7 and 4.10 we know that E[X] = 7/2 and Var[X] = 35/12. Using the above with
n = 6 we find
$$E[X] = \frac{1}{2}(6 + 1) = \frac{7}{2}, \quad \mathrm{Var}[X] = \frac{1}{12}(6^2 - 1) = \frac{1}{12}(36 - 1) = \frac{35}{12}.$$
The results agree!
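The same agreement can be checked for other values of n. The sketch below assumes the closed forms used in Example 4.17, namely E[X] = (n + 1)/2 and Var[X] = (n² − 1)/12 for a discrete uniform distribution on 1, 2, …, n, and compares them with a direct calculation. The function name `uniform_moments` is an illustrative choice.

```python
from fractions import Fraction

def uniform_moments(n):
    """Mean and variance of the discrete uniform distribution on 1..n, computed directly."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    var = sum(x * x * p for x in range(1, n + 1)) - mean ** 2
    return mean, var

for n in (4, 6, 10):
    mean, var = uniform_moments(n)
    # Compare the direct calculation with the closed forms.
    assert mean == Fraction(n + 1, 2)
    assert var == Fraction(n * n - 1, 12)
    print(n, mean, var)
```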
Example 4.18
A technician repairs computers in the Leeds area. The length of time T taken for a repair
will be either 1, 2, 3 or 4 hours (each of which is equally likely to occur).
T will follow a discrete uniform distribution on the whole numbers 1, 2, 3, 4. Hence its
probability function is
$$P(T = t) = \frac{1}{4}, \quad (t = 1, 2, 3, 4).$$
What is the expected time taken to repair a computer, E[T ], and the variance in the time
taken to repair a computer, Var[T ]?
The next type of discrete probability distribution we will look at is the Binomial Distribution:
Binomial Distribution.
The Binomial Distribution is used to model statistical experiments of the following kind:
(i) A fixed number n of “trials” is performed.
(ii) Each trial is either a success (S) or a failure (F).
(iii) The probability p of a success is the same for each trial.
(iv) All trials are independent.
(v) The random variable X is the number of successful trials.
Note that the word “success” does not necessarily refer to something desirable or good; the
words “success” and “failure” are just used to distinguish between the two possible outcomes
of each trial.
As a shorthand we write X ∼ Bin(n, p) where ∼ means “is distributed as”.
Another example:
Let X =“number of girls in a family of four children”. Then X ∼ Bin(n = 4, p = 1/2).
Example 4.19
Let X =“number of sixes in four rolls of a fair die”. So X ∼ Bin(n = 4, p = 1/6).
What is P(X = 2)?
For any two of the trials to be successful (to give a 6), any one of the following could have
occurred:
SSFF, SFSF, SFFS, FSSF, FSFS, FFSS.
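So there are
$$\binom{4}{2} = \frac{4!}{2!\,2!} = 6$$
outcomes leading to X = 2 successes.
Since the trials are independent, the probability of each of the 6 outcomes is
$$\frac{1}{6} \times \frac{1}{6} \times \frac{5}{6} \times \frac{5}{6} = \frac{25}{1296},$$
and so the probability that there are X = 2 successful outcomes is
$$P(X = 2) = \binom{4}{2} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^2 = 6 \times \frac{25}{1296} = 0.116.$$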
Fact 4.20
Generalising the argument above, we find that for X ∼ Bin(n, p), the probability P(X = x)
of x successes is given by
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad (\text{for } x = 0, 1, 2, \ldots, n).$$
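The formula in Fact 4.20 translates directly into code. The following Python sketch uses the standard-library function `math.comb` for the binomial coefficient. The function name `binomial_pmf` is an illustrative choice. It reproduces the value from Example 4.19 and checks that the probabilities sum to 1.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 4.19: number of sixes in four rolls of a fair die.
print(round(binomial_pmf(2, 4, 1 / 6), 3))  # 0.116

# The probabilities over x = 0, 1, ..., n add up to 1 (up to rounding error).
total = sum(binomial_pmf(x, 4, 1 / 6) for x in range(5))
print(abs(total - 1) < 1e-12)  # True
```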
Example 4.21
Let X =“number of heads in five tosses of a coin”.
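What is P(X = 3)? Here X ∼ Bin(n = 5, p = 1/2).
$$P(X = 3) = \binom{5}{3} \left(\frac{1}{2}\right)^3 \left(1 - \frac{1}{2}\right)^{5-3} = 10 \times \left(\frac{1}{2}\right)^3 \times \left(\frac{1}{2}\right)^2 = 0.3125.$$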
In the following video we look at the formula in the context of this example more closely:
Example 4.22
Let X =“number of sixes in four rolls of a fair die”. What is the probability P(X ≥ 1) that
we see at least one 6?
Here X ∼ Bin(n = 4, p = 1/6). So, using the fact that for any event A we have P(A) = 1 − P(A′),
$$P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{4}{0} \left(\frac{1}{6}\right)^0 \left(1 - \frac{1}{6}\right)^{4-0} = 1 - 1 \times 1 \times \left(\frac{5}{6}\right)^4 = 1 - 0.482 = 0.518.$$
Example 4.23
Let X =“number of sixes in four rolls of a fair die”.
Here X ∼ Bin(n = 4, p = 1/6).
$$E[X] = 4 \times \frac{1}{6} = 0.67, \quad \mathrm{Var}[X] = 4 \times \frac{1}{6} \times \left(1 - \frac{1}{6}\right) = 0.556.$$
Example 4.24
A financial adviser sells six small loans in a typical week. From past experience, he believes
the probability that a client will default on a loan is 0.15. What is the probability that more
than two of the six clients will default on a loan? What assumptions are we making?
Let Y denote the number of clients who default on a loan. We have Y ∼ Bin(6, 0.15). Hence
$$P(Y = y) = \binom{6}{y} \, 0.15^y \, 0.85^{6-y}, \quad (y = 0, 1, \ldots, 6).$$
We require P(Y > 2). Substituting into the above formula gives the following table.
y 0 1 2 3 4 5 6
P(Y = y) 0.37715 0.39933 0.17618 0.04145 0.00549 0.00039 0.00001
Hence P(Y > 2) = 0.04145 + 0.00549 + 0.00039 + 0.00001 = 0.04734. So there is roughly a
5% chance that more than 2 clients will default on their loans.
We have used a binomial distribution as a model. This means we have assumed that clients
default on their loans independently of one another (this may not be true, for example
if there is a stock market crash). We have also assumed that each client has the same
probability of defaulting (this is unlikely to be true – some clients will be a “higher risk”
than others).
What is the expected number of clients to default and what is the variance in the number
of defaulting clients? We could use the formulas from above, but as an exercise, let us work
from first principles.
y 0 1 2 3 4 5 6 Sum
P(Y = y) 0.37715 0.39933 0.17618 0.04145 0.00549 0.00039 0.00001 1
yP(Y = y) 0 0.39933 0.35235 0.12436 0.02195 0.00194 0.00007 0.9
y² 0 1 4 9 16 25 36
y²P(Y = y) 0 0.39933 0.70471 0.37308 0.08778 0.00968 0.00041 1.575
Hence E[Y] = Σ_y yP(Y = y) = 0.9 and E[Y²] = Σ_y y²P(Y = y) = 1.575. We find that
$$\mathrm{Var}[Y] = E[Y^2] - \{E[Y]\}^2 = 1.575 - 0.9^2 = 0.765.$$
Hence the expected number of clients to default is 0.9, and the variance in the number of
clients to default is 0.765.
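The tables in Example 4.24 can be reproduced with a short Python sketch. It builds the probabilities for Y ∼ Bin(6, 0.15) and then works from first principles, as in the example. The variable names are illustrative, and small differences in the last decimal place compared with the tables are due to rounding.

```python
from math import comb

n, p = 6, 0.15

# P(Y = y) for y = 0, 1, ..., 6.
pmf = [comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(n + 1)]

# P(Y > 2) = P(Y = 3) + P(Y = 4) + P(Y = 5) + P(Y = 6).
prob_more_than_two = sum(pmf[3:])

# Expectation and variance from first principles.
mean = sum(y * q for y, q in enumerate(pmf))
var = sum(y * y * q for y, q in enumerate(pmf)) - mean ** 2

print(round(prob_more_than_two, 5))    # about 0.04734
print(round(mean, 3), round(var, 3))   # about 0.9 and 0.765
```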
Exercise 4.25
(We will go through this one in the lecture. You can also try it by yourself.)
A certain tennis player makes a successful serve 70% of the time. Assume each serve is
independent of the others and he serves six times. Determine the probabilities of the following
events.
The third type of discrete probability distribution we will cover is the Poisson Distribution:
(a) for any instant of time or point in space the probability of an event occurring is the
same, and
(b) only one or no event can occur at any instant of time or point in space.
(ii) Car crashes occurring at a road junction over a certain period of time.
Fact 4.26
Suppose events occur in time or space as a Poisson process with an average rate of µ events
over a fixed time period or fixed amount of space T . Let the random variable X be the
number of the events occurring in the time period or amount of space T . Then X has a
Poisson distribution with the following probability function:
$$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}, \quad (x = 0, 1, 2, \ldots).$$
As a shorthand we write X ∼ Poisson(µ).
In short, this says that, if on average there are µ events occurring over the period (or space)
T, then the probability that there are x events occurring over the period T is given by
µ^x e^(−µ) / x!.
Note that the probability function involves the number e ≈ 2.71828. The number e refers
to Euler’s number, named after the mathematician Leonhard Euler. Most calculators have a
button marked e^x.
Here are some graphs illustrating the Poisson probability distribution for various values of µ:
Example 4.27
Major medical emergencies at a hospital occur at a rate of 0.1 per year. Assuming they
occur as a Poisson process, calculate the probability that there are exactly 3 major medical
emergencies in a 10 year period.
Let X denote the number of major medical emergencies that occur in a 10 year period. Since
the rate of medical emergencies per year is 0.1, the rate of emergencies over T = 10 years is
10 × 0.1 = 1, i.e. an average of one medical emergency over 10 years.
So we have X ∼ Poisson(µ = 1).
[Note that µ is the rate over the time period that the random variable X refers to, which
can be different from the one that is given. In this example we are given the rate per year, but
the random variable we are interested in refers to a 10-year period. So we needed to convert
the given rate accordingly.]
We require P(X = 3). So
$$P(X = 3) = \frac{1^3 e^{-1}}{3!} = \frac{0.3679}{6} = 0.0613.$$
So the probability that there are exactly 3 major medical emergencies in a 10-year period is
about 0.06.
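The calculation in Example 4.27, including the conversion of the yearly rate to a 10-year rate, can be sketched in Python as follows. The function name `poisson_pmf` and the variable names are illustrative choices.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu)."""
    return mu ** x * exp(-mu) / factorial(x)

rate_per_year = 0.1
years = 10
mu = rate_per_year * years  # rate converted to the 10-year period

print(round(poisson_pmf(3, mu), 4))  # 0.0613
```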
E[X] = µ, Var[X] = µ.
In the example above, X ∼ Poisson(µ = 1). The expected number of major medical emergencies
per 10 years is E[X] = 1, and the variance in the number of major medical emergencies
per 10 years is Var[X] = 1.
Example 4.28
Phone calls arrive at a mean rate of 48 calls per hour at the reservation desk for Regional
Airways. Assuming calls occur as a Poisson process, compute the probability of receiving
three calls in a five-minute interval of time.
Let X denote the number of calls in a 5-minute period. As the mean rate of calls per hour
is 48, the mean rate of calls in a 5-minute period is µ1 = 48 × 1/12 = 4, because there are twelve
5-minute time intervals in an hour (12 × 5 = 60).
[Note how we had to convert the given rate to the appropriate rate for the time interval we are
interested in.]
So X ∼ Poisson(4) and
$$P(X = 3) = \frac{4^3 e^{-4}}{3!} = \frac{0.0183 \times 64}{6} = 0.1954.$$
Hence the probability of receiving three calls in a five-minute interval is approximately 0.20.
Note that this is the probability of receiving exactly three calls, no more and no less. We
might be more interested in the probability of receiving more than a specific number of calls:
Now determine the probability of receiving more than 2 calls in 15 minutes.
Let Y denote the number of calls in a 15 minute period. The mean rate of calls in a 15 minute
period is µ2 = 48 × 1/4 = 12, as there are four 15-minute time intervals in an hour.
(Or, µ2 = 3 × 4 = 12, as there are three 5-minute time intervals in 15 minutes, using the rate
µ1 from before.)
Hence Y ∼ Poisson(12). We calculate P(Y > 2) using
P(Y > 2) = 1 − P(Y ≤ 2) = 1 − P(Y = 2) − P(Y = 1) − P(Y = 0).
Hence
$$\begin{aligned}
P(Y > 2) &= 1 - \frac{12^2 e^{-12}}{2!} - \frac{12^1 e^{-12}}{1!} - \frac{12^0 e^{-12}}{0!} \\
&= 1 - 0.000442 - 0.000074 - 0.0000061 \\
&= 0.999478.
\end{aligned}$$
So the probability of receiving more than 2 calls in 15 minutes is about 99.9%.
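Both parts of Example 4.28 follow the same pattern: convert the hourly rate to the interval of interest, then evaluate the Poisson probabilities. The Python sketch below illustrates this. The helper names `poisson_pmf` and `poisson_tail` are hypothetical and chosen for illustration only.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu)."""
    return mu ** x * exp(-mu) / factorial(x)

def poisson_tail(k, mu):
    """P(X > k), computed as 1 - P(X <= k)."""
    return 1 - sum(poisson_pmf(x, mu) for x in range(k + 1))

rate_per_hour = 48
mu_5min = rate_per_hour * 5 / 60    # mean number of calls in 5 minutes
mu_15min = rate_per_hour * 15 / 60  # mean number of calls in 15 minutes

print(round(poisson_pmf(3, mu_5min), 4))    # 0.1954
print(round(poisson_tail(2, mu_15min), 6))  # 0.999478
```

Computing the tail through the complement keeps the sum finite, since P(Y > 2) would otherwise involve infinitely many terms.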
What are the expectation and variance of the number of calls in a 15-minute period?
The expected number of calls is E[Y ] = 12 and the variance in the number of calls is Var[Y ] =
12.
What are the limitations of the model? We have assumed that calls occur as a Poisson process.
This assumes that calls to the desk are made independently of one another. This seems
reasonable as people will be calling from all over the country, and are not likely to be able
to influence each other's decisions. We are also assuming that the probability of calling
is constant through time. This is less reasonable: there are likely to be “busy periods”
during the day (e.g. at lunch time – when people take a break from work, or at 5pm – when
they return from work).
The following example involves a Poisson process over a certain space, rather than over time.
The following video explains the example in more detail:
Example 4.29
Flaws in a particular kind of metal sheeting occur at an average rate of one per 10 sq ft. What
is the probability that a 5-by-8-foot sheet has more than two flaws?
To solve this, let the random variable X be the number of flaws in the 5-by-8-foot sheet.
The area of the 5-by-8-foot sheet is 5 × 8 = 40 ft². So there are on average 4 × 1 = 4 flaws in a
5-by-8-foot sheet. Therefore X ∼ Poisson(µ = 4), and
$$P(X > 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)] = 1 - e^{-4}\left(1 + \frac{4}{1} + \frac{4^2}{2}\right) = 1 - \frac{13}{e^4} = 0.7619.$$
So the probability that a 5-by-8-foot sheet has more than two flaws is about 76.2%.
Exercise 4.30
In a certain area fires occur at a rate of one every 12 years. Assuming the number of fires in
that area can be modelled by a Poisson distribution,
(i) what is the probability that no fire will occur in the area in the next 20 years?
(ii) what is the probability that at least two fires will occur in the area in the next 10 years?
The following video discusses the solution, but try it yourself first:
!! Warning !! The final answer in this video is incorrect. It should be 20.3%.
Exercise 4.31
(We will go through this one in the lecture. You can also try it by yourself.)
In a certain published book of 520 pages, 390 typographical errors occur. What is the prob-
ability that one page, selected randomly by the printer as a sample, will be free from errors?
In the following section we discuss how the Poisson distribution can be used to approximate
the binomial distribution:
5! = 120,
10! = 3628800,
20! = 2.43 × 10^18,
40! = 8.16 × 10^47,
80! = too large for many calculators!
Fact 4.32
Suppose X is a binomially distributed discrete random variable, i.e. X ∼ Bin(n, p), where n
is large (say n > 50) and p is small (say p < 0.1). Then
X ≈ Poisson(µ = np).
Example 4.33
Suppose Y ∼ Bin(n = 60, p = 0.02). What is P(Y = 1)? By definition
$$P(Y = 1) = \binom{60}{1} \, 0.02^1 \, 0.98^{59} = 0.3644.$$
Since n > 50 and p < 0.1, we have Y ≈ Poisson(µ), with µ = 60 × 0.02 = 1.2. Hence:
$$P(Y = 1) \approx \frac{e^{-1.2} \, 1.2^1}{1!} = 0.3614,$$
and we see that the approximation is reasonably accurate!
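The comparison in Example 4.33 can be repeated numerically. The following Python sketch computes the exact binomial probability and the Poisson approximation side by side. The variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 60, 0.02
mu = n * p  # mean of the approximating Poisson distribution

# Exact binomial probability P(Y = 1).
exact = comb(n, 1) * p ** 1 * (1 - p) ** (n - 1)

# Poisson approximation to P(Y = 1).
approx = mu ** 1 * exp(-mu) / factorial(1)

print(round(exact, 4), round(approx, 4))  # 0.3644 0.3614
print(round(abs(exact - approx), 4))      # size of the approximation error
```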
Example 4.34
Past experience suggests that the probability a peach will show signs of mildew on arrival
at market is 0.004. Occasionally, if storage conditions are faulty, the probability a peach
will show signs of mildew can be much higher than this. Assuming that the conditions of
individual fruit are independent of one another, and that the probability a peach will show
signs of mildew on arrival at market is 0.004, determine the probability that a carton of
250 individually packed peaches contains more than three that show signs of mildew. What
conclusions would you draw if a randomly chosen carton was found to contain five mildewed
peaches?
Let W denote the number of mildewed peaches in a carton of 250. If storage conditions are
not faulty, W ∼ Bin(250, 0.004). We require P(W > 3). To calculate this we use
P(W > 3) = 1 − P(W ≤ 3).
Since n = 250 > 50 and p = 0.004 < 0.1, the Poisson approximation is appropriate with
µ = 250 × 0.004 = 1. Hence
$$P(W = w) \approx \frac{1^w e^{-1}}{w!}, \quad (w = 0, 1, 2, \ldots).$$
Summarising the calculations in a table gives
w 0 1 2 3 Total
P(W = w) 0.3679 0.3679 0.1839 0.0613 P(W ≤ 3) = 0.9810
Table 4.9: Calculations.
Hence P(W > 3) = 1 − 0.9810 = 0.019. If storage conditions are not faulty then there is
less than a 2% chance that more than three of the 250 peaches will have mildew. Hence if
a randomly sampled carton was found to have five mildewed peaches, this would be strong
evidence to suggest that storage conditions were at fault!
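As a rough check on Example 4.34, the Python sketch below computes P(W > 3) both with the Poisson approximation used above and with the exact binomial distribution Bin(250, 0.004). The exact binomial calculation is an addition for comparison and is not part of the worked example. The variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 250, 0.004
mu = n * p  # mean of the approximating Poisson distribution

# Poisson approximation: P(W > 3) = 1 - P(W <= 3).
poisson_tail = 1 - sum(mu ** w * exp(-mu) / factorial(w) for w in range(4))

# Exact binomial value of the same probability, for comparison.
exact_tail = 1 - sum(comb(n, w) * p ** w * (1 - p) ** (n - w) for w in range(4))

print(round(poisson_tail, 3))  # 0.019
print(round(exact_tail, 3))    # close to the Poisson value
```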
Exercise 4.35
The probability of contracting tuberculosis (TB) is small, with a probability of 0.0005 for
each person in a given year. Suppose a particular small town has a population of 8000 people.
(i) What is the expected number of new cases in the town next year?
(ii) Use the Poisson approximation to estimate the probability that there will be at least
one new case in the town next year.
Try this one yourself first, before watching the video with the solution: