Chapter 1: Discrete Random Variables
Introduction
Probability is an essential branch of mathematics, with applications in fields such as
physics, engineering, and economics. This booklet is intended for second-year
students in preparatory classes for economics and management sciences, and aims to
provide a deep understanding of the fundamental concepts of probability. Mastery of these
notions is crucial for future academic and professional success.
This course is structured around four main chapters, each dealing with a key aspect of
probability: discrete random variables, continuous random variables, discrete vectors, and
continuous vectors. Each chapter will cover definitions, fundamental concepts, and
important theorems, with a particular emphasis on practical applications and problem-
solving methods.
The first chapter is devoted to discrete random variables. You will learn about probability
and distribution functions, as well as the concepts of expectation and variance. We will also
examine common distributions such as the binomial law, the geometric law, and the Poisson
law, which are essential for understanding discrete phenomena.
The second chapter focuses on continuous random variables. We will study probability
densities, cumulative distribution functions, and moments. Common continuous
distributions, such as the normal, exponential, and uniform laws, will be analyzed in detail.
These notions are essential for modeling and analyzing continuous phenomena in various
contexts.
The third chapter addresses discrete vectors, that is, pairs of discrete random variables. You
will learn how to work with joint and marginal distribution functions, as well as the
concepts of covariance, correlation, and independence. These notions will help you
understand the relationships between several discrete random variables and analyze
complex phenomena.
The fourth chapter focuses on continuous vectors, that is, pairs of continuous random
variables. We will explore joint density functions, marginal distributions, and the concepts
of covariance and correlation in the continuous context. This chapter will generalize
previously discussed notions and apply them to continuous variables.
Definitions and General Concepts
1.1 Probability Spaces
Here, we will develop skills to represent random experiments such as tossing a coin,
drawing balls from an urn, or shuffling a deck of cards. The results of these experiments, by
definition, are uncertain and can vary from one trial to another, which does not seem very
compatible with the mathematical language we are used to. Therefore, it will be necessary
to make some effort to obtain a precise description. Before continuing, ask yourself
how you would use familiar mathematical objects (sets, functions, etc.) to represent
randomness, of which we all have an intuitive understanding. The theory presented below
will then appear all the more elegant.
The probability space is the essential element that allows us to correctly formulate a
random experiment. It consists of three elements: a sample space Ω, a σ-algebra ℱ, and a
probability measure P (sometimes simply called probability). We will now spend some time
studying each of these notions.
1.2 Sample Space
To describe a random experiment, it is necessary to begin by defining all possible outcomes.
The set of all these outcomes is called the sample space and is denoted by Ω.
The size of this set is not limited: it can be finite, countably infinite, or uncountably infinite.
It should be emphasized that the selection of the sample space is not unique—there are
various reasonable ways to describe the same situation. Here are some examples:
• Tossing a die: Ω = {1, 2, 3, 4, 5, 6}.
• Two successive dice throws: Ω = {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6}.
• Tossing a coin: Ω = {Heads, Tails}.
1.3 σ-Algebras
Our random experiment will produce an outcome ω ∈ Ω, and we will want to know whether
this outcome falls into a particular subset A ⊆ Ω of our sample space. Once the sample space
Ω is specified, ...
Our next task is to create a list of all events A ⊆ Ω that may be of interest to us. The set of
these events will be called a sigma-algebra, and will be denoted by ℱ. Thus, a sigma-algebra
ℱ is a collection or a family of subsets of Ω.
Therefore, ℱ ⊆ P(Ω).
To define a sigma-algebra, it is necessary to satisfy certain stability properties.
1.3.1 Definition Let Ω be a set and ℱ a collection of subsets of Ω.
We say that ℱ is a sigma-algebra (σ-algebra) on Ω if:
1. Ω ∈ ℱ
2. For all A ∈ ℱ, the complement A̅ ∈ ℱ (complement property)
3. For any countable family (Aᵢ)ᵢ≥1 of elements of ℱ, ⋃₍ᵢ₌₁₎^∞ Aᵢ ∈ ℱ (countable union property)
Remark 1.3.1 Recall that a countable set is one that is in bijection with a subset of ℕ (or
equivalently, can be injected into ℕ). Thus, in this course, a countable set may be either
finite or countably infinite.
Example 1.3.1
Let ℱ = {∅, A, A̅, Ω}, with A ⊆ Ω.
Is ℱ a sigma-algebra?
1. The first condition is satisfied, since Ω ∈ ℱ.
2. The second condition is satisfied since ∅̅ = Ω ∈ ℱ, Ω̅ = ∅ ∈ ℱ, A̅ ∈ ℱ, and (A̅)̅ = A ∈ ℱ.
3. The third condition is also satisfied: any union of elements of ℱ is again one of ∅, A, A̅, Ω (for example, A ∪ A̅ = Ω ∈ ℱ).
Since all three conditions are verified, we conclude that ℱ is a sigma-algebra.
Example 1.3.2
If ℱ = {∅, A, Ω}, with A different from ∅ and Ω, the complement of A, denoted A̅, does not belong to ℱ.
Therefore, ℱ is not a sigma-algebra.
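On a finite sample space, the three conditions of the definition can be checked mechanically. The following is a minimal sketch, not part of the original course: the function name `is_sigma_algebra` and the event A = {2, 4, 6} on a die roll are illustrative assumptions. Because Ω is finite here, closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the three conditions for a family of subsets of a finite set omega.

    On a finite omega, countable unions reduce to finite unions, so closure
    under pairwise unions is enough.
    """
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if omega not in family:                              # condition 1
        return False
    if any(omega - a not in family for a in family):     # condition 2: complements
        return False
    # condition 3: unions (pairwise closure suffices for a finite family)
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4, 5, 6}      # one roll of a die
A = {2, 4, 6}                   # hypothetical event: "the result is even"
A_bar = omega - A

print(is_sigma_algebra(omega, [set(), A, A_bar, omega]))  # family of Example 1.3.1
print(is_sigma_algebra(omega, [set(), A, omega]))         # family of Example 1.3.2
```

The first family passes all three checks, while the second fails at the complement check, in line with the two examples.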
Remark 1.3.2
The power set P(Ω) is the largest sigma-algebra on Ω.
1.3.2 Definition Let (Ω, ℱ, P) be a probability space. A real-valued random variable (r.v.) is
any mapping
X: Ω → ℝ
ω ↦ X(ω)
satisfying: for all x ∈ ℝ, the preimage X⁻¹(]−∞, x]) ∈ ℱ,
where
X⁻¹(]−∞, x]) = {ω ∈ Ω : X(ω) ∈ ]−∞, x]} = {ω ∈ Ω : X(ω) ≤ x}.
Example 1.3.3
We toss a coin twice and observe the number of heads obtained.
X denotes the number of heads obtained.
1. Determine Ω and the values taken by X.
2. Show that X is a real random variable on Ω equipped with the sigma-algebra ℱ = P(Ω).
Solution:
1. Ω = {HH, HT, TH, TT}
| ω | HH | HT | TH | TT |
|----|----|----|----|----|
| X(ω) | 2 | 1 | 1 | 0 |
Thus, X(Ω) = {0, 1, 2}.
2. We must show that, for all x ∈ ℝ, X⁻¹(]−∞, x]) = {ω ∈ Ω : X(ω) ≤ x} ∈ P(Ω).
Case 1: x < 0 ⇒ X⁻¹(]−∞, x]) = ∅ ∈ P(Ω)
Case 2: 0 ≤ x < 1 ⇒ X⁻¹(]−∞, x]) = {ω ∈ Ω : X(ω) = 0} = {TT} ∈ P(Ω)
Case 3: 1 ≤ x < 2 ⇒ X⁻¹(]−∞, x]) = {TT, HT, TH} ∈ P(Ω)
Case 4: x ≥ 2 ⇒ X⁻¹(]−∞, x]) = Ω ∈ P(Ω)
Conclusion: X is a real random variable on (Ω, ℱ).
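The enumeration in this example can be reproduced mechanically. The sketch below assumes outcomes are encoded as two-letter strings such as "HT" (an encoding chosen here for illustration). It lists Ω, evaluates X as the number of heads, and computes the preimage {ω : X(ω) ≤ x} for one test point per case.

```python
from itertools import product

# Sample space for two tosses: each outcome is a string such as "HT".
omega = ["".join(t) for t in product("HT", repeat=2)]

def X(outcome):
    """Number of heads in the outcome."""
    return outcome.count("H")

def preimage(x):
    """Return the event {w in omega : X(w) <= x} as a set."""
    return {w for w in omega if X(w) <= x}

support = sorted({X(w) for w in omega})
print(omega)      # ['HH', 'HT', 'TH', 'TT']
print(support)    # [0, 1, 2]

# One test point in each of the four cases x < 0, 0 <= x < 1, 1 <= x < 2, x >= 2.
for x in (-1, 0.5, 1.5, 2):
    print(x, sorted(preimage(x)))
```

Every preimage is a subset of Ω, hence an element of P(Ω), which is the property required of a random variable when ℱ = P(Ω).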
Remark 1.3.3 In general, the set of values taken by the random variable X is denoted X(Ω)
and is called the **support** of X.
1.4 Probability Law
1.4.1 Definition The probability law of X is the function Pₓ: X(Ω) → [0, 1] defined by:
For all x ∈ X(Ω): Pₓ(x) = P(X = x) = P(X⁻¹(x))
and P(X⁻¹(x)) = P{ω ∈ Ω : X(ω) = x}.
1.4.2 Definition Let X be a real random variable on Ω. If its support X(Ω) is a countable
subset of ℝ, then X is called a **discrete random variable**.
1.4.3 Proposition
If X and Y are two real discrete random variables on (Ω, ℱ), then:
a) For all real numbers a, b: aX + bY is a discrete random variable.
b) X × Y is a discrete random variable.
c) sup(X, Y) and inf(X, Y) are discrete random variables.
1.4.4 Definition
An indicator function is a function defined with respect to a subset A of real numbers, which
can take only two values:
the value 1 if the variable of the function belongs to A, and the value 0 otherwise.
Notation
For all fixed A ⊆ ℝ, we define the function:
1ₐ: ℝ → {0, 1}
x ↦ 1ₐ(x) = 1{x ∈ A} =
1 if x ∈ A
0 if x ∉ A
Note that if A = ℝ = (−∞, +∞), then 1ₐ(x) = 1 for all x ∈ ℝ.
Example 1.4.2
Let
f(x) =
0 if x < −1
(x + 1)/4 if −1 ≤ x < 3
1 if x ≥ 3
Then, f(x) = (x + 1)/4 · 1{−1 ≤ x < 3}(x) + 1{ x ≥ 3}(x).
Properties: for all A, B ⊆ ℝ and all x ∈ ℝ,
• 1_{Aᶜ}(x) = 1 − 1_A(x)
• 1_{A∩B}(x) = 1_A(x) · 1_B(x)
• 1_{A∪B}(x) = 1_A(x) + 1_B(x) − 1_{A∩B}(x)
1.5 Cumulative Distribution Function
1.5.1 Definition
Let X be a random variable on (Ω, ℱ, P).
The cumulative distribution function (CDF) of X is the function Fₓ: ℝ → [0, 1] defined by:
For all x ∈ ℝ, Fₓ(x) = P(X ≤ x). When X is discrete, Fₓ(x) = Σ P(X = k), where the sum runs over all k ∈ X(Ω) with k ≤ x.
Example:
| x | −3 | −2 | 0 | 1 |
|---|---|---|---|---|
| P(X = x) | 0.4 | 0.3 | 0.2 | 0.1 |
Case 1: x < −3 ⇒ Fₓ(x) = 0
Case 2: −3 ≤ x < −2 ⇒ Fₓ(x) = P(X = −3) = 0.4
Case 3: −2 ≤ x < 0 ⇒ Fₓ(x) = P(X = −3) + P(X = −2) = 0.7
Case 4: 0 ≤ x < 1 ⇒ Fₓ(x) = P(X = −3) + P(X = −2) + P(X = 0) = 0.9
Case 5: x ≥ 1 ⇒ Fₓ(x) = P(X = −3) + P(X = −2) + P(X = 0) + P(X = 1) = 1
We can write:
Fₓ(x) =
0 if x < −3
0.4 if −3 ≤ x < −2
0.7 if −2 ≤ x < 0
0.9 if 0 ≤ x < 1
1 if x ≥ 1
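The case-by-case computation above amounts to summing the probabilities of all values not exceeding x. A minimal sketch of that rule, using exact fractions and the probability law of this example (the names `pmf` and `cdf` are illustrative choices):

```python
from fractions import Fraction

# Probability law from the example: value -> P(X = value)
pmf = {
    -3: Fraction(4, 10),
    -2: Fraction(3, 10),
    0: Fraction(2, 10),
    1: Fraction(1, 10),
}

def cdf(x):
    """F_X(x) = P(X <= x): sum the probabilities of all values k <= x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

# One or two test points in each of the five cases.
for x in (-4, -3, -2.5, -1, 0, 0.5, 1, 10):
    print(x, cdf(x))
```

The printed values are constant between consecutive points of the support and jump at −3, −2, 0 and 1, which is the step shape described by the piecewise formula.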
1.6 Properties of the Cumulative Distribution Function (CDF)
1. Fₓ(x) is non-decreasing on ℝ ⇔ for all x, y ∈ ℝ, x < y ⇒ Fₓ(x) ≤ Fₓ(y).
2. Fₓ(x) is right-continuous at every point of ℝ and admits a left-hand limit:
For all a ∈ ℝ, lim₍ₓ→a⁺₎ Fₓ(x) = Fₓ(a).
3. lim₍ₓ→−∞₎ Fₓ(x) = 0 and lim₍ₓ→+∞₎ Fₓ(x) = 1.
1.7 Probability Associated with an Interval
Let X be a random variable and Fₓ(x) its cumulative distribution function.
For all a, b ∈ ℝ with a < b, we have:
• P(a < X ≤ b) = Fₓ(b) − Fₓ(a)
• P(a ≤ X < b) = Fₓ(b⁻) − Fₓ(a⁻), where Fₓ(a⁻) = lim₍ₓ→a⁻₎ Fₓ(x)
• P(a < X < b) = Fₓ(b⁻) − Fₓ(a)
• P(a ≤ X ≤ b) = Fₓ(b) − Fₓ(a⁻)
• P(X = a) = Fₓ(a) − Fₓ(a⁻)
• P(X < a) = P(X ≤ a) − P(X = a) = Fₓ(a⁻)
• P(X > a) = 1 − P(X ≤ a) = 1 − Fₓ(a)
Example 1.7.1
Let X be a random variable with cumulative distribution function
Fₓ(x) =
0 if x < 0
(1/3) + (x/3) if 0 ≤ x < 1
1 if x ≥ 1
Compute: P(1/2 < X ≤ 3/2), P(0 < X < 5), P(X > 3/4), P(X = 0)
Solution:
• P(1/2 < X ≤ 3/2) = F(3/2) − F(1/2) = 1 − (1/3 + 1/(2×3)) = 1/2
• P(0 < X < 5) = F(5−) − F(0) = 1 − 1/3 = 2/3
• P(X > 3/4) = 1 − P(X ≤ 3/4) = 1 − F(3/4) = 1 − (1/3 + 3/(4×3)) = 5/12
• P(X = 0) = F(0) − F(0−) = 1/3 − 0 = 1/3
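These four results can be double-checked with exact arithmetic. In the sketch below, the left-hand limit F(x−) is written out by hand from the piecewise formula rather than computed numerically; the function names `F` and `F_left` are illustrative.

```python
from fractions import Fraction

def F(x):
    """CDF of Example 1.7.1."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(1, 3) + Fraction(x) / 3
    return Fraction(1)

def F_left(x):
    """Left-hand limit F(x-), derived by hand from the same piecewise formula."""
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return Fraction(1, 3) + Fraction(x) / 3
    return Fraction(1)

p1 = F(Fraction(3, 2)) - F(Fraction(1, 2))   # P(1/2 < X <= 3/2)
p2 = F_left(5) - F(0)                        # P(0 < X < 5)
p3 = 1 - F(Fraction(3, 4))                   # P(X > 3/4)
p4 = F(0) - F_left(0)                        # P(X = 0)
print(p1, p2, p3, p4)                        # 1/2 2/3 5/12 1/3
```

Note that F jumps at 0 (from 0 to 1/3) and at 1 (from 2/3 to 1), so this X has point masses at 0 and 1 in addition to a continuous part on ]0, 1[.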
1.7.1 Definition
A random variable X is said to be **discrete** if the set X(Ω) is finite or countably infinite,
and its cumulative distribution function is a **step function**.
Using the cumulative distribution function, we can determine the probability law of X:
P(X = a) = Fₓ(a) − Fₓ(a⁻).
If Fₓ is continuous at a, then P(X = a) = 0.
1.8 Probability Law of a Discrete Random Variable
1.8.1 Definition
The probability law is the function p(x) defined as:
p(x) =
P(X = x) if x ∈ X(Ω)
0 otherwise
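This probability function satisfies the following properties:
• p(x) ≥ 0 for all x ∈ ℝ
• Σ₍ₓ∈X(Ω)₎ p(x) = 1
Example: suppose X takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4 (for instance, the number of heads in two tosses of a fair coin).
Then, the cumulative distribution function (CDF) of X is:
Fₓ(x) =
0 if x < 0
1/4 if 0 ≤ x < 1
3/4 if 1 ≤ x < 2
1 if x ≥ 2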
And if we want to determine the probability law of X from the cumulative distribution
function, we have:
X(Ω) = {0, 1, 2}
P(X = 0) = F(0) − F(0−) = 1/4 − 0 = 1/4
P(X = 1) = F(1) − F(1−) = 3/4 − 1/4 = 1/2
P(X = 2) = F(2) − F(2−) = 1 − 3/4 = 1/4
1.9 Moments, Mathematical Expectation, and Variance of a Discrete Random Variable
1.9.1 Definition
Let X be a discrete random variable. We say that X admits a moment of order k (k ∈ ℕ*) if:
Σ₍ₓ∈X(Ω)₎ xᵏ · P(X = x) is absolutely convergent.
Then we have:
E(Xᵏ) = Σ₍ₓ∈X(Ω)₎ xᵏ · P(X = x)
If k = 1, we obtain the **mean** or **expectation** of X.
• The **variance** of the random variable X is the second-order moment of (X − E(X)):
V(X) = E[(X − E(X))²] = Σ₍ₓ∈X(Ω)₎ (x − E(X))² P(X = x) = E(X²) − [E(X)]²
• The quantity σ(X) = √V(X) is called the **standard deviation** of the random variable X.
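As an illustration of these formulas, the sketch below computes the mean, the variance (by both expressions) and the standard deviation of a small probability law. The law used is the one from the CDF example of Section 1.5, reused here as an assumption for illustration; the helper name `moment` is also illustrative.

```python
from fractions import Fraction
from math import sqrt

# Illustrative probability law: value -> P(X = value)
pmf = {
    -3: Fraction(4, 10),
    -2: Fraction(3, 10),
    0: Fraction(2, 10),
    1: Fraction(1, 10),
}

def moment(k):
    """Moment of order k: sum of x**k * P(X = x) over the support."""
    return sum(x**k * p for x, p in pmf.items())

mean = moment(1)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - E(X))^2]
var_shortcut = moment(2) - mean**2                                 # E(X^2) - [E(X)]^2
std = sqrt(var_definition)

print(mean, var_definition, var_shortcut)   # -17/10 201/100 201/100
```

Both expressions of the variance give the same exact value, which is the identity V(X) = E(X²) − [E(X)]² on this particular law.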
1.10 Generating Function
Let X be a discrete random variable taking only non-negative integer values (X(Ω) ⊂ ℕ).
1.10.1 Definition
The **probability generating function** (PGF) of X is defined by:
Gₓ(t) = E(tˣ) = Σ₍ₖ₌₀₎^∞ tᵏ P(X = k) = P(X = 0) + tP(X = 1) + … , t ∈ [−1, 1]
Properties:
1. Gₓ(0) = P(X = 0)
2. Gₓ(1) = 1
3. P(X = k) = Gₓ⁽ᵏ⁾(0) / k!, for all k ≥ 0
4. If E(Xᵏ) exists, then:
Gₓ⁽ᵏ⁾(1) = E[X(X − 1)(X − 2)…(X − k + 1)] — called the **factorial moment** of order k
of X.
In particular:
Gₓ′(1) = E(X)
Gₓ″(1) = E[X(X − 1)] = E(X²) − E(X) ⇒ E(X²) = Gₓ″(1) + E(X)
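For a variable with finite support, Gₓ is a polynomial, so its derivatives can be computed term by term. The sketch below checks the properties above on a hypothetical law on {0, 1, 2} with probabilities 1/4, 1/2, 1/4; the names `G` and `G_deriv` are illustrative.

```python
from fractions import Fraction
from math import factorial

# Hypothetical law on {0, 1, 2}: p[k] = P(X = k)
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

def G(t):
    """PGF: G_X(t) = sum over k of t**k * P(X = k)."""
    return sum(pk * t**k for k, pk in enumerate(p))

def G_deriv(order, t):
    """Derivative of the given order of the polynomial G_X, term by term."""
    total = Fraction(0)
    for k, pk in enumerate(p):
        if k >= order:
            falling = factorial(k) // factorial(k - order)  # k(k-1)...(k-order+1)
            total += pk * falling * t ** (k - order)
    return total

mean = G_deriv(1, 1)                      # G'(1) = E(X)
second_moment = G_deriv(2, 1) + mean      # E(X^2) = G''(1) + E(X)
variance = second_moment - mean**2
recovered = [G_deriv(k, 0) / factorial(k) for k in range(len(p))]  # property 3

print(G(0), G(1), mean, variance)         # 1/4 1 1 1/2
```

The list `recovered` coincides with `p`, which illustrates that the generating function determines the probability law.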
1.11 Moment Generating Function
Computing the moments of order k for a random variable X can quickly become tedious.
However, there is a way to obtain all moments from a single function, called the **moment
generating function (MGF)**, denoted Mₓ(t), defined by:
Mₓ(t) = E(eᵗˣ) = Σ₍ₓ∈X(Ω)₎ eᵗˣ P(X = x)
The n-th order moment of the random variable X is given by:
E(Xⁿ) = Mₓ⁽ⁿ⁾(0),
where Mₓ⁽ⁿ⁾(0) denotes the n-th derivative of Mₓ evaluated at 0.
**Example:** Determine the moment generating function of the random variable X when:
1. X follows a Bernoulli distribution with parameter p.
2. X follows a Binomial distribution B(n, p).
**Solution:**
1. X ~ B(p), P(X = 0) = 1 − p = q, P(X = 1) = p
Mₓ(t) = E(eᵗˣ) = Σ₍ₓ∈X(Ω)₎ eᵗˣ P(X = x)
= e⁰·q + eᵗ·p = q + p eᵗ
2. X ~ B(n, p), P(X = x) = Cₙˣ pˣ (1 − p)ⁿ⁻ˣ
Mₓ(t) = E(eᵗˣ) = Σ₍ₓ∈X(Ω)₎ eᵗˣ P(X = x)
= Σ₍ₓ₌₀₎ⁿ eᵗˣ Cₙˣ pˣ (1 − p)ⁿ⁻ˣ
= Σ₍ₓ₌₀₎ⁿ Cₙˣ (p eᵗ)ˣ (1 − p)ⁿ⁻ˣ
= (p eᵗ + q)ⁿ (by the binomial theorem)
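The closed form can be sanity-checked numerically. The sketch below compares the sum from the definition with (p eᵗ + q)ⁿ for a few values of t, and approximates Mₓ′(0) by a central finite difference to compare it with E(X) = np. The parameters n = 10, p = 0.3 and the step h are arbitrary choices made for illustration.

```python
from math import comb, exp

def mgf_sum(t, n, p):
    """M_X(t) from the definition: a sum over the support {0, ..., n}."""
    return sum(exp(t * x) * comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(n + 1))

def mgf_closed(t, n, p):
    """Closed form (p e^t + q)^n obtained in the solution."""
    return (p * exp(t) + (1 - p)) ** n

n, p = 10, 0.3      # hypothetical parameters
h = 1e-5            # step of the finite difference

for t in (-1.0, 0.0, 0.5, 1.0):
    print(t, mgf_sum(t, n, p), mgf_closed(t, n, p))

# Central difference for the first derivative at 0; should be close to n * p.
first_moment = (mgf_closed(h, n, p) - mgf_closed(-h, n, p)) / (2 * h)
print(first_moment, n * p)
```

A finite difference only approximates the derivative, so the agreement with np holds up to a small numerical error rather than exactly.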
1.12 Common Discrete Distributions
1.12.1 Uniform Distribution:
A discrete random variable X is said to follow a **uniform distribution** on {1, 2, …, n} when it takes
each of these values with the same probability:
P(X = k) = 1/n, for k = 1, …, n
E(X) = (n + 1)/2, V(X) = (n² − 1)/12
Example
The roll of a six-sided die follows a uniform distribution with the following probability law:
|x|1|2|3|4|5|6|
|---|---|---|---|---|---|---|
| P(X = x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
E(X) = (6 + 1)/2 = 3.5, V(X) = (6² − 1)/12 = 35/12 ≈ 2.92
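The mean and variance of the die can also be computed directly from the probability law and compared with the closed-form expressions (n + 1)/2 and (n² − 1)/12. A short sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

n = 6
pmf = {k: Fraction(1, n) for k in range(1, n + 1)}   # P(X = k) = 1/n

mean = sum(k * p for k, p in pmf.items())
variance = sum(k**2 * p for k, p in pmf.items()) - mean**2

print(mean, variance, float(variance))
print(Fraction(n + 1, 2), Fraction(n**2 - 1, 12))    # closed-form expressions
```

Working with `Fraction` avoids rounding, so the comparison with the closed forms is exact.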
1.12.2 Bernoulli Distribution
A random variable X follows a **Bernoulli distribution** when the experiment has only two
possible outcomes such as “success or failure,” “true or false,” “heads or tails,” etc.
Success is represented by the event {X = 1} and failure by {X = 0}.
X(Ω) = {0, 1}
P(X = 0) = 1 − p = q , P(X = 1) = p , p+q=1
We write: X ~ B(p), p ∈ [0, 1]
E(X) = p and V(X) = p(1 − p) = pq
**Example:** Roll a die once and let X = 1 if a 6 is rolled, X = 0 otherwise.
p: probability of rolling a 6.
1 − p: probability of rolling another number.
Thus, X ~ B(1/6), with X(Ω) = {0, 1}.
P(X = 0) = 1 − 1/6 = 5/6 , P(X = 1) = 1/6
E(X) = Σ₍ₓ∈X(Ω)₎ xP(X = x) = 0·(5/6) + 1·(1/6) = 1/6
E(X²) = Σ₍ₓ∈X(Ω)₎ x²P(X = x) = 0²·(5/6) + 1²·(1/6) = 1/6
V(X) = E(X²) − [E(X)]² = 1/6 − (1/36) = 5/36 = (1/6)(5/6) = p·q
1.12.3 Binomial Distribution
The **binomial distribution** is the law of the number of successes in a series of n
Bernoulli trials with the following properties:
• Each experiment has two outcomes (success, failure).
• The repeated experiments are identical and independent (successive draws with
replacement).
X: denotes the number of successes in a sequence of n trials.
We write:
X ~ B(n, p), with X(Ω) = {0, …, n}
P(X = k) = Cₙᵏ pᵏ (1 − p)ⁿ⁻ᵏ, for k = 0, …, n
E(X) = np and V(X) = np(1 − p) = npq
**Example:**
A coin is tossed 10 times.
Let X be the number of heads obtained in these 10 tosses.
Compute the probability of obtaining exactly 3 heads.
n = 10, p = 1/2
X ~ B(10, 1/2)
X(Ω) = {0, …, 10}
P(X = k) = C₁₀ᵏ (1/2)ᵏ (1/2)¹⁰⁻ᵏ
P(X = 3) = C₁₀³ (1/2)³ (1/2)⁷ = 0.117
E(X) = 10 × (1/2) = 5, V(X) = 10 × (1/2) × (1/2) = 2.5
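The numbers in this example can be reproduced from the probability mass function. The following sketch uses `math.comb` for the binomial coefficient; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
prob_3 = binomial_pmf(3, n, p)     # 120 / 1024

# Mean and variance recomputed from the law, to compare with np and npq.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum(k**2 * binomial_pmf(k, n, p) for k in range(n + 1)) - mean**2

print(round(prob_3, 3), mean, variance)
```

Summing the law over k = 0, …, n should give 1, which is a quick check that the function is a valid probability mass function.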
1.12.4 Geometric Distribution
Suppose we repeat a Bernoulli trial and we are interested in the number X of times the
experiment must be repeated to obtain the **first success**.
If the repetitions are independent (draws with replacement) and each has the same
probability p of success, then X follows a **Geometric Distribution** with parameter p.
We write: X ~ G(p)
X(Ω) = {1, 2, …} = ℕ*
P(X = k) = p qᵏ⁻¹, for k ∈ ℕ*
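E(X) = 1/p, V(X) = q/p², and for every integer x ≥ 1, F(x) = P(X ≤ x) = 1 − qˣ.
Indeed, F(x) = Σ₍ₖ₌₁₎ˣ p qᵏ⁻¹ = (p/q) Σ₍ₖ₌₁₎ˣ qᵏ, and
Σ₍ₖ₌₁₎ⁿ qᵏ = q(1 − qⁿ) / (1 − q)
Thus,
F(x) = (p/q) × [q(1 − qˣ) / (1 − q)] = 1 − qˣ since 1 − q = p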
**Example:**
We have a key ring with 5 identical keys. In the dark, we try to open a lock without paying
attention to which key we try each time. Knowing that only one key fits, what is the
probability of using the correct key on the 10th attempt?
X ~ G(1/5), X(Ω) = {1, 2, …} = ℕ*
P(X = k) = (1/5) × (4/5)ᵏ⁻¹
P(X = 10) = (1/5) × (4/5)⁹ ≈ 0.027
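A short numerical check of this example, assuming as in the text that each attempt picks one of the 5 keys independently of the previous ones. The sketch also compares the CDF obtained by summing the law with the closed form 1 − qˣ; function names are illustrative.

```python
def geometric_pmf(k, p):
    """P(X = k) = p * q**(k - 1) for k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)

def geometric_cdf(x, p):
    """P(X <= x) for an integer x >= 1, summed term by term."""
    return sum(geometric_pmf(k, p) for k in range(1, x + 1))

p = 1 / 5
q = 1 - p

prob_10 = geometric_pmf(10, p)
print(round(prob_10, 4))                      # 0.0268
print(geometric_cdf(10, p), 1 - q**10)        # the two values should agree
```

The second line illustrates F(x) = 1 − qˣ: the probability that the lock opens within 10 attempts is the complement of 10 consecutive failures.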
1.12.5 Hypergeometric Distribution
Let an urn contain N balls, of which N₁ are white and N₂ are black. We draw n balls
**without replacement** (n ≤ N).
Let X denote the number of white balls obtained among the n draws.
We set p = N₁ / N (probability of drawing a white ball).
X(Ω) = {max(0, n − N₂), …, min(n, N₁)}
and we write X ~ H(N, n, p)
The probability mass function is given by:
P(X = k) = [C_{N₁}^{k} × C_{N₂}^{n−k}] / C_{N}^{n} for all k ∈ X(Ω)
E(X) = np, V(X) = npq × [(N − n) / (N − 1)]
**Example:**
A machine has produced 500 items, of which 5 are defective. We randomly select 20 items.
Assume the selection is done **without replacement**.
1. Determine the mean and variance of the number of defective items.
2. What is the probability that none of the 20 items is defective?
Given:
N = 500, N₁ = 5, N₂ = 495, n = 20, p = 5/500
X(Ω) = {max(0, n − N₂), …, min(n, N₁)} = {0, 1, …, 5}, since n − N₂ < 0 and min(20, 5) = 5
X ~ H(N, n, p) = H(500, 20, 5/500)
P(X = k) = [C₅ᵏ · C₄₉₅²⁰⁻ᵏ] / C₅₀₀²⁰
1. E(X) = n p = 20 × (5/500) = 0.2
V(X) = n p q × [(N − n) / (N − 1)]
= 20 × (5/500) × (495/500) × (480/499) ≈ 0.19
2. P(X = 0) = [C₅⁰ × C₄₉₅²⁰⁻⁰] / C₅₀₀²⁰ = C₄₉₅²⁰ / C₅₀₀²⁰ ≈ 0.81
Remark: if the draws were done with replacement, we would have
X ~ B(20, 5/500), X(Ω) = {0, 1, 2, …, 20}
P(X = 0) = C₂₀⁰ (5/500)⁰ (1 − 5/500)²⁰ = (0.99)²⁰ ≈ 0.82
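The two probabilities of drawing no defective item, without and with replacement, can be computed side by side. A minimal sketch using `math.comb`; the function name `hypergeom_pmf` and its argument order are illustrative choices.

```python
from math import comb

def hypergeom_pmf(k, N, N1, n):
    """P(X = k): n draws without replacement from N items, N1 of them marked."""
    return comb(N1, k) * comb(N - N1, n - k) / comb(N, n)

N, N1, n = 500, 5, 20
N2 = N - N1
p = N1 / N

support = range(max(0, n - N2), min(n, N1) + 1)          # {0, 1, ..., 5}

p0_without = hypergeom_pmf(0, N, N1, n)                  # without replacement
p0_with = (1 - p) ** n                                   # binomial B(20, 5/500)
mean = sum(k * hypergeom_pmf(k, N, N1, n) for k in support)

print(round(p0_without, 4), round(p0_with, 4), round(mean, 4))
```

The two values are close because the sample (20 items) is small compared with the population (500 items), so removing drawn items barely changes the composition of the lot.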
1.12.6 Poisson Distribution (Law of Rare Events)
The Poisson distribution is used to model the number of occurrences of an event within a
given interval of time or region of space.
Examples:
• Number of fatal accidents per week.
• Number of bacteria in a water sample.
• Number of printing errors per page in a book.
We write: X ~ P(λ), λ > 0, X(Ω) = ℕ
P(X = k) = (e^(−λ) λᵏ) / k!, k ∈ ℕ
E(X) = V(X) = λ
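The formula for P(X = k) and the identity E(X) = V(X) = λ can be checked numerically. In the sketch below, the rate λ = 3 is a hypothetical value chosen for illustration, and the infinite sums over ℕ are truncated at k = 99, which is an approximation (the neglected tail is negligible for this λ).

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = exp(-lam) * lam**k / k! for k = 0, 1, 2, ..."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0                 # hypothetical rate, for illustration only
ks = range(100)           # truncation of the infinite support

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum(k**2 * poisson_pmf(k, lam) for k in ks) - mean**2

print(total, mean, variance)   # approximately 1, lam, lam
```

For much larger values of λ, computing `lam**k` and `factorial(k)` separately may overflow or lose precision; a log-space formulation would be a safer design in that case.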
Example:
On average, two work accidents occur each year in a school.
What is the probability that in a given year there are exactly five accidents?