0% found this document useful (0 votes)
12 views21 pages

Introduction to Probability Concepts

course-berestycki

Uploaded by

Hoa Nguyễn
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views21 pages

Introduction to Probability Concepts

course-berestycki

Uploaded by

Hoa Nguyễn
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Introduction to Probability

Nathanaël Berestycki
Lecture notes1 - Michaelmas 2007

These notes are a list of some of the definitions and results presented
in the course. Since they are free from any motivating exposition or
examples, and since no proofs are given for any of the theorems, these
notes should be used only as a reference to supplement the recom-
mended texts. The class material consists, ultimately, of what was
done in class and in the exmaple sheets. In what follows, phrases en-
closed in brackets [ . . . ] are digressive or slightly outside of the syllabus
of the course, but are included for completeness. A table of notation
is in the appendix.

1. Basic tools
1.1. Probability spaces.

Definition 1.1. Let Ω be a set. A sigma-field on Ω is a non-empty


set F of subsets of Ω such that
(1) if A ∈ F then Ac ∈ F, S
(2) if A1 , A2 , . . . ∈ F then ∞
i=1 Ai ∈ F.
[The terms sigma-field and sigma-algebra are interchangeable.]

Definition 1.2. Let Ω be a set and let F be a sigma-field on Ω. A


probablity measure on F is a function P : F → [0, 1] such that
(1) if A1 , A2 , . . . ∈ F are disjoint then P( ∞
S P∞
i=1 Ai ) = i=1 P(Ai ),
(2) P(Ω) = 1.
[The terms disjoint and mutually exclusive are interchangeable and
refer to events A and B such that A ∩ B = ∅.]

Definition 1.3. Let Ω be a set, F a sigma-field on Ω, and P a proba-


bility measure on F. The triple (Ω, F, P) is called a probability space.
The set Ω is called the sample space, and an element of Ω is called
an outcome. A subset of Ω which is an element of F is called an event.
Let A ∈ F be an event. If P(A) = 1 then A is called an almost
sure event, and if P(A) = 0 then A is called a null event. [The phrase
“almost surely” is often abbreviated a.s.]

1
Version of November 28th, 2007
1
1.2. Random variables and distribution functions.
Definition 1.4. Let (Ω, F, P) be a probability space. A random vari-
able is a function X : Ω → R such that the set {ω ∈ Ω : X(ω) ≤ t} is
an element of F for all t ∈ R.
Let A be a subset of R, and let X be a random variable. We use the
notation {X ∈ A} to denote the set {ω ∈ Ω : X(ω) ∈ A}. A random
variable X is said to take values in a subset S ⊆ R if X ∈ S almost
surely.
The distribution function of X is the function FX : R → [0, 1] defined
by
FX (t) = P(X ≤ t)
for all t ∈ R.
[A distribution function is called defective if either limt↑∞ FX (t) < 1
or limt↓−∞ FX (t) > 0. Unless otherwise indicated, all random vari-
ables considered here are assumed to have non-defective distribution
functions.]
The law of X is the probability measure ν such that
ν([a, b]) = P(a ≤ X ≤ b)
for all a ≤ b. The statement “the random variable X has law ν” is
written X ∼ ν. [The measure ν is defined on theBorel sigma-field
B, which is defined as the smallest sigma-field on R containing the
intervals [a, b] for all a ≤ b.]
Definition 1.5. Let A be an event in Ω. The indicator function of the
event A is the random variable 1A : Ω → {0, 1} defined by

1 if ω ∈ A
1A (ω) =
0 if ω ∈ Ac
for all ω ∈ Ω.
Definition 1.6. A random variable X is called discrete if X takes
values in a countable set.
If X is discrete, the function pX : R → [0, 1] defined by pX (t) =
P(X = t) is called the mass function of X.
Definition 1.7. Let X is a discrete random variable taking values in
N with mass function pX .
The random variable X is called
• Bernoulli with parameter p if
pX (0) = 1 − p and pX (1) = p.
where 0 ≤ p ≤ 1.
2
• binomial with parameters n and p, written X ∼ bin(n, p), if
 
n k
pX (k) = p (1 − p)n−k for all k = 0, 1, . . . , n
k
where n ∈ N and 0 ≤ p ≤ 1.
• Poisson with parameter λ if
λk −λ
pX (k) = e for all k = 0, 1, 2, . . .
k!
where λ ≥ 0.
• geometric with parameter p if
pX (k) = p(1 − p)k−1 for all k = 1, 2, 3, . . .
where 0 ≤ p ≤ 1.
[In the above formulae, the convention 00 = 1 is used. If X is geometric
with parameter p = 0, then X = ∞ almost surely.]
In practice, a random variable is specified by its law. We almost
never need to construct explicitly the probability space (Ω, F, P) asso-
ciated with it.
Definition 1.8. Let FX be the distribution of a random variable X.
The random variable X is continuous if and only if FX is a continuous
function.
The random variable X is absolutely continuous if and only if there
exists a positive function fX : R → [0, ∞) such that
Z t
FX (t) = fX (s)ds
−∞
for all t ∈ R, in which case the function fX is called the density function
of X.
Definition 1.9. Let X is a continuous random variable with density
function fX .
The random variable X is called
• uniform on the interval (a, b), written X ∼ unif(a, b), if
1
fX (t) = for all a < t < b
b−a
for some a < b.
• normal with mean µ and variance σ 2 , written X ∼ N (µ, σ 2 ), if
1 (x − µ)2 
fX (t) = √ exp − for all t ∈ R
2πσ 2σ 2
for some µ ∈ R and σ 2 > 0.
3
• exponential with rate λ, written X ∼ exp(λ), if
fX (t) = λe−λt for all t ≥ 0
for some λ > 0.
• Cauchy if
1
fX (t) = for all t ∈ R.
π(1 + t2 )
1.3. Expectations and variances.
Definition 1.10. Let X be a random variable on (Ω, F, P). The ex-
pected value of X is denoted by E(X) and is given by the integral
Z
E(X) = X(ω)P(dω).

The above integral is defined in the following cases:
• if X ≥ 0 almost surely.
• if either E(X + ) or E(X − ) is finite, in which case E(X) =
E(X + ) − E(X − ).
A random variable X is integrable if E|X| < ∞ and is square-integrable
if E(X 2 ) < ∞. The terms expected value, expectation, and mean are
interchangeable.
The variance of an integrable random variable X, written Var(X),
is
Var(X) = E(X 2 ) − E(X)2 .
The covariance of square-integrable random variable X and Y , writ-
ten Cov(X, Y ), is
Cov(X, Y ) = E(XY ) − E(X)E(Y ).
If neither X or Y is almost surely constant, then their correlation,
written ρ(X, Y ), is
Cov(X, Y )
ρ(X, Y ) = .
Var(X)1/2 Var(Y )1/2
Random variables X and Y are called uncorrelated if Cov(X, Y ) = 0.
Theorem 1.11. Let the function g : R → R be such that g(X) is
integrable.
If X is a discrete random variable with probability mass function pX
taking values in a countable set S then
X
E(g(X)) = g(t) pX (t).
t∈S
4
If X is an absolutely continuous integrable random variable with den-
sity function fX then
Z ∞
E(g(X)) = g(t) fX (t) dt.
−∞

Theorem 1.12 (Cauchy–Schwarz inequality). Let X and Y square-


integrable random variables. Then
E(XY )2 ≤ E(X 2 )E(Y 2 )
with equality if and only if aX = bY almost surely, for some constants
a, b ∈ R.
1.4. Conditional probability and expectation, independence.
Definition 1.13. Let B be an event with P(B) > 0. The conditional
probability of an event A given B, written P(A|B), is
P(A ∩ B)
P(A|B) = .
P(B)
The conditional expectation of X given B, written E(X|B), is
E(X1B )
E(X|B) = .
P(B)
Theorem 1.14 (The law of total S∞probability). Let B1 , B2 , . . . be dis-
joint, non-null events such that i=1 Bi = Ω. Then

X
P(A) = P(A|Bi )P(Bi )
i=1

for all events A.


Definition 1.15. Let X and Y be random variables. The function
FX,Y : R2 → [0, 1] defined by
FX,Y (s, t) = P(X ≤ s, Y ≤ t)
is called their joint distribution function.
If both X and Y are discrete random variables, then the function
pX,Y : R2 → [0, 1] defined by
pX,Y (s, t) = P(X = s, Y = t)
is called their joint mass function. The conditional mass function of
X given Y = t, where pY (t) > 0, is defined as
pX,Y (s, t)
pX|Y (s|t) = P(X = s|Y = t) = .
pY (t)
5
If there exists a function fX,Y : R2 → [0, ∞) such that
Z s Z t
FX,Y (s, t) = fX,Y (u, v) du dv
u=−∞ v=−∞
then X and Y are said to be jointly absolutely continuous and fX,Y is
called their joint density function. The conditional density function of
X given Y = t, where fY (t) > 0, is defined as
1 fX,Y (s, t)
fX|Y (s|t) = lim P(|X − s| < δ |Y − t| < ) = .
δ↓0,↓0 δ fY (t)
Theorem 1.16. Let the function g : R2 → R be such that g(X, Y ) is
integrable.
If X and Y are discrete and taking values in a countable set S then
X
E(g(X, Y )) = g(s, t) pX,Y (s, t).
s,t∈S

If X and Y are jointly absolutely continuous then


Z ∞Z ∞
E(g(X, Y )) = g(s, t) fX,Y (s, t) ds dt.
−∞ −∞

Definition 1.17. Random variables X and Y are jointly normal with


2
means µX and µY , variances σX and σY2 , and correlation ρ, written
     2

X µX σX σX σY ρ
∼N ,
Y µY σX σY ρ σY2
if the joint density function is
1 1
fX,Y (s, t) = p exp(− Q(s, t))
2πσX σY 1 − ρ2 2
where
 
1 s − µX 2 s − µX  t − µY  t − µY 2
Q(s, t) = − 2ρ +
1 − ρ2 σX σX σY σY
Definition 1.18. The conditional expectation of X given Y = t, writ-
ten E(X|Y = t), is defined by either Definition 1.13, if P(Y = t) > 0,
or by the formula
E(X1{|Y −t|<} )
E(X|Y = t) = lim
↓0 P(|Y − t| < )

if P(Y = t) = 0.
For fixed random variable X and Y , let h : R → R be the function
defined by h(t) = E(X|Y = t). The conditional expectation of X given
Y , written E(X|Y ), is
E(X|Y ) = h(Y ).
6
Theorem 1.19. Let the function g : R → R be such that g(X) is
integrable.
If X and Y are discrete taking values in S and pY (t) > 0, then
X
E(g(X)|Y = t) = g(s) pX|Y (s, t).
s∈S

If X and Y are jointly absolutely continuous and if fY (t) > 0 then


Z ∞
E(g(X)|Y = t) = g(s) fX|Y (s, t) ds.
−∞

Theorem 1.20 (The law of iterated expectation). Let X and Y be


random variables and suppose X is integrable. Then
E(X) = E(E(X|Y )).
Definition 1.21. Let A1 , A2 , . . . be events. If
\ Y
P( Ai ) = P(Ai )
i∈I i∈I

for every finite subset I ⊂ N then the events are said to be independent.
Random variables X1 , X2 , . . . are called independent if the events
{X1 ≤ t1 }, {X2 ≤ t2 }, . . . are independent. [The phrase “independent
and identically distributed” is often abbreviated i.i.d.]
Theorem 1.22. If X and Y are discrete and independent random
variables then
pX,Y (s, t) = pX (s)pY (t).
If X and Y are jointly absolutely continuous and independent random
variables then
fX,Y (s, t) = fX (s)fY (t).
If X and Y are independent and integrable, then
E(XY ) = E(X)E(Y ).
1.5. Probability inequalities.
Theorem 1.23 (Markov’s inequality). Let X be a nonnegative random
variable with E(X) < ∞. Then
E(X)
P(X ≥ A) ≤
A
for all A > 0.
7
Corollary 1.24 (Chebychev’s inequality). Let X be a random variable
with E(X) = µ and Var(X) = σ 2 < ∞. Then
σ2
P(|X − µ| ≥ A) ≤
A2
for all A > 0.
1.6. Generating and characteristic functions.
Definition 1.25. Let X be a random variable on (Ω, F, P) taking
values in {0, 1, 2, . . .}. The moment generating function of X is the
function GX : [0, 1] → [0, 1] defined by
GX (t) = E(tX )
for all t ∈ (0, 1] and GX (0) = P(X = 0).
Theorem 1.26. Let GX be the moment generating function of X.
Then
1 (n)
P(X = n) = GX (0)
n!
(n)
for all n ∈ N, where GX (0) denotes the n-th derivative of GX evalu-
tated at 0.
Definition 1.27. Let X be a random variable on (Ω, F, P). The
Laplace transform of X is the function MX : R → R ∪ {∞} defined by
MX (t) = E(etX )
for all t ∈ R.
Theorem 1.28. Let MX be the Laplace transform of a random variable
X, and suppose there exists an  > 0 such that MX (t) < ∞ for all
− < t < . Then
(n)
E(X n ) = MX (0)
(n)
for all n ∈ N, where MX (0) denotes the n-th derivative of MX eva-
lutated at 0. (The number µn = E(X n ) is called the n-th moment of
X.)
Definition 1.29. The characteristic function of a real-valued random
variable X is the function φX : R → C defined by
φX (t) = E(eitX )

for all t ∈ R, where i = −1.
Theorem 1.30 (Uniqueness of generating and characteristic func-
tions). Let X and Y be real-valued random variables with distribution
functions FX and FY .
8
• Let φX and φY be the characteristic functions of X and Y . Then
φX (t) = φY (t) for all t ∈ R
if and only if
FX (t) = FY (t) for all t ∈ R.
• Let X and Y be valued in N with moment generating functions
GX and GY . Then
GX (t) = GY (t) for all t ∈ [0, 1]
if and only if
FX (t) = FY (t) for all t ∈ R.
• Let MX and MY be the Laplace transforms of X and Y . If there
exists an  > 0 such that
MX (t) = MY (t) < ∞ for all t ∈ (−, )
then
FX (t) = FY (t) for all t ∈ R.
Theorem 1.31. Inversion formula. Let X be a random variable with
characteristic function ϕ. Then for any a < b
Z T −ita
1 e − e−itb 1
lim ϕ(t)dt = P(a < X < b)+ (P(X = a)+P(X = b))
T →∞ 2π −T it 2
2. Fundamental probability results
Definition 2.1. Let x1 , x2 , . . . be a sequence of real numbers. The
limit superior is defined by
lim sup xn = inf sup xn
n↑∞ N ∈N n≥N

and limit inferior by


lim inf xn = sup inf xn .
n↑∞ N ∈N n≥N

Equivalently, if x ∈ R then

{n ∈ N : xn > x + } is finite
x = lim sup xn ⇔ for all  > 0
n↑∞ {n ∈ N : xn > x − } is infinite
and

{n ∈ N : xn < x + } is infinite
x = lim inf xn ⇔ for all  > 0
n↑∞ {n ∈ N : xn < x − } is finite
If lim supn↑∞ xn = lim inf n↑∞ xn = x then the sequence x1 , x2 , . . . is
convergent and with limit x = limn↑∞ xn .
9
Definition 2.2 (Modes of convergence). Let X1 , X2 , . . . and X be ran-
dom variables.
• Xn → X almost surely if P(Xn → X) = 1
• Xn → X in Lp , for p ≥ 1, if E|X|p < ∞ and E|Xn − X|p → 0
• Xn → X in probability if P(|Xn − X| > ) → 0 for all  > 0
• Xn → X in distribution if FXn (t) → FX (t) for all points t ∈ R
of continuity of FX
Theorem 2.3. The following implications hold:

Xn → X almost surely 
or ⇒ Xn → X in probability ⇒ Xn → X in distribution
Xn → X in Lp , p ≥ 1 
Furthermore, if r ≥ p ≥ 1 then Xn → X in Lr ⇒ Xn → X in Lp .
Definition 2.4. Let A1 , A2 , . . . be events. The term eventually is de-
fined by [ \
{An eventually} = An
N ∈N n≥N
and infinitely often by
\ [
{An infinitely often} = An .
N ∈N n≥N

[The phrase “infinitely often” is often abbreviated i.o.]


Theorem 2.5 (The first Borel-Cantelli lemma). Let A1 , A2 , . . . be a
sequence of events. If

X
P(An ) < ∞
n=1
then P(An infinitely often) = 0.
Theorem 2.6 (The second Borel-Cantelli lemma). Let A1 , A2 , . . . be a
sequence of independent events. If
X∞
P(An ) = ∞
n=1

then P(An infinitely often) = 1.


Theorem 2.7 (A sufficient condition for almost sure convergence). Let
X1 , X2 , . . . and X be random variables. If
X∞
P(|Xn − X| > ) < ∞
n=1
for all  > 0 then Xn → X almost surely.
10
Theorem 2.8 (Monotone convergence theorem). Let X1 , X2 , . . . be
positive random variables with Xn ≤ Xn+1 almost surely for all n ≥ 1,
and let X = supn∈N Xn . Then Xn → X almost surely and
E(Xn ) → E(X).
Theorem 2.9 (Fatou’s lemma). Let X1 , X2 , . . . be positive random
variables. Then
E(lim inf Xn ) ≤ lim inf E(Xn ).
n↑∞ n↑∞

Theorem 2.10 (Dominated convergence theorem). Let X1 , X2 , . . . and


X be random variables such that Xn → X almost surely. If E(supn≥1 |Xn |) <
∞ then
E(Xn ) → E(X).
Definition 2.11. Let X1 , X2 , . . . be a collection of random variables,
and let Sn = X1 +. . .+Xn . The collection of random variables satisfies a
weak law of large numbers if Sn /n → µ in probability for some constant
µ. The collection satisfies a strong law of large numbers if Sn /n → µ
almost surely.
Theorem 2.12 (A weak law of large numbers). Let X1 , X2 , . . . be in-
dependent and indentically distrubuted with
N P(|Xi | > N ) → 0 and E(X1{|Xi |≤N } ) → µ as N ↑ ∞
for some µ ∈ R then
X1 + . . . + Xn
→ µ in probability.
n
Theorem 2.13 (A strong law of large numbers). Let X1 , X2 , . . . be
independent and identically distributed integrable random variables with
common mean E(Xi ) = µ. Then
X1 + . . . + Xn
→ µ almost surely.
n
Theorem 2.14 (Lévy’s continuity theorem). Let X1 , X2 , . . . and X be
random variables with characteristic functions φ1 , φ2 , . . . and φ respec-
tively. The following are equivalent:
• Xn → X in distribution
• φn (t) → φ(t) for all t ∈ R
Theorem 2.15 (Central limit theorem). Let X1 , X2 , . . . be independent
and identically distributed with E(Xi ) = µ and Var(Xi ) = σ 2 for each
i = 1, 2, . . ., and let
X1 + . . . + Xn − nµ
Zn = √ .
σ n
11
Then Zn → Z in distribution, where Z ∼ N (0, 1).

3. Markov chains
Definition 3.1. A stochastic process is a collection (Xi )i∈I of random
variables. If the index set I is N, then the stochastic process is called
discrete time and is denoted by (Xn )n≥0 . If the index set I is [0, ∞),
then the stochastic process is called continuous time and is denoted by
(Xt )t≥0 .
3.1. Discrete time Markov chains.
Definition 3.2. Let (Xn )n≥0 be a discrete time stochastic process tak-
ing values in a countable set S. The process (Xn )n≥0 is called a Markov
chain if
P(Xn = in |X0 = i0 , . . . , Xn−1 = in−1 ) = P(Xn = in |Xn−1 = in−1 )
for each n ≥ 1, and s0 , . . . , sn ∈ S. The set S is called the state space of
the Markov chain, and an elements of S is called a state. The Markov
chain is called time homogeneous if
P(Xn = j|Xn−1 = i) = P(X1 = j|X0 = i)
for all n ≥ 1 and all states i, j ∈ S.
Throughout these notes, all Markov chains are assumed to be time
homogeneous.
Definition 3.3. Let (Xn )n≥0 be a Markov chain on a state space S.
The numbers pij = P(X1 = j|X0 = i) for i, j ∈ S are called the one-
step transition probabilities, and pij (n) = P(Xn = j|X0 = i) for n ≥ 0
the n-step transition probabilities.
The |S| × |S| matrix P = (pij )i,j∈S is called the transition matrix.
We use the notation
Pi (A) = P(A|X0 = i)
for any event A ∈ F, and
Ei (Z) = E(Z|X0 = i)
for any random variable Z for which the conditional expectation can
be defined.
Theorem 3.4 (Chapman-Kolmogorov equations). Let pij (n) for i, j ∈
S and n ≥ 0 denote the n-step transition probabilities of a Markov
chain with state space S. Then
X
pik (m + n) = pij (m)pjk (n)
j∈S
12
for all i, k ∈ S and m, n ≥ 0. In particular, the n-step transition
probabilities are given by
pij (n) = (P n )ij
where P is the transition matrix of the Markov chain.
Definition 3.5. A |S| × |S|P matrix P = (pij )i,j∈S is called stochastic
if pij ≥ 0 for all i, j ∈ S and j∈S pij = 1 for all i ∈ S. In particular, a
matrix is stochastic if and only if it is the transition matrix of a Markov
chain.
Theorem 3.6. Let P be an d × d matrix where d < ∞. Let λ1 , . . . , λd
be the eigenvalues of P . If the eigenvalues are distinct, then there exists
(k)
complex numbers aij for i, j, k ∈ {1, . . . , d} such that
d
X
n (k)
(P )ij = aij λnk
k=1

for all n ∈ N. More generally, if the eigenvalues are not necessarily dis-
(k)
tinct, then there exists polynomials aij : N → C for i, j, k ∈ {1, . . . , d}
of degree less than the multiplicity of the eigenvalue λk such that
d
X
n (k)
(P )ij = aij (n)λnk
k=1

for all n ∈ N.
Theorem 3.7. Let (Xn )n≥0 be a Markov chain on a state space S, and
let A ⊆ S. Define the random variable H A valued in N ∪ {∞} by
H A = inf{n ≥ 0 : Xn ∈ S}.
[The standard convention inf ∅ = ∞ is used throughout.] For each state
i ∈ S let hA A A
i = Pi (H < ∞). Then (hi )i∈S is the minimal non-negative
solution to 
A 1 if i ∈ A
hi = P A
p h
i∈S ij j otherwise
Definition 3.8. Let pij (n) for i, j ∈ S and n ≥ 0 denote the n-step
transition probabilities of a Markov chain with state space S.
States i leads to state j, written i → j, if there exists an n ≥ 0 such
that pij (n) > 0. States i and j communicate, written i ↔ j, if i → j
and j → i. The communicating class containing a state i is the largest
subset C ⊆ S with the property that if state j is in C then states i and
j communicate.
13
A communicating class C is called closed it has the property if i ∈ C
and i → j then j ∈ C. Otherwise, C is called open if there exists a
state i in C and a state j not in C such that i leads to j.
A Markov chain is irreducible if all states in S communicate.
Definition 3.9. A state i ∈ S is called recurrent if
Pi (Xn = i infinitely often) = 1,
and called transient if
Pi (Xn = i infinitely often) < 1.
Let C ⊆ S be a communicating class of the Markov chain. The class
is called a transient class if every state i ∈ C is transient, and the class
is called a recurrent class if every state i ∈ C is recurrent.
[The terms recurrent and persistent are interchangeable.]
Definition 3.10. Let (Xn )n≥0 be a Markov chain with state space S.
A stopping time T is a random variable taking values in N∪{∞} having
the property that for each n ∈ N the event {T = n} is determined by
X0 , . . . , Xn } in the sense that there exists a function f : S n → {0, 1}
such that
1{T =n} = f (X0 , . . . , Xn ).
Theorem 3.11 (The strong Markov property). Let (Xn )n≥0 be a Markov
chain with state space S and let T be a stopping time. Then conditional
on the events {T < ∞} and XT = i, the random process (XT +n )n≥0 is
a Markov chain starting at i, independent of X0 , . . . , XT .
Theorem 3.12. Let (Xn )n≥0 be a Markov chain with state space S.
For each state i ∈ S define the random variable Ti taking values in
N ∪ {∞} by
Ti = inf{n ≥ 1 : Xn = i}.
The following are equivalent:
• State
P∞ i is recurrent.
• n=1 pii (n) = ∞.
• Pi (Ti < ∞) = 1
Moreover, the following are equivalent:
• State i is transient.
• Pi (Xn = i infinitely often) = 0.
• Pi (T
P i < ∞) < 1.
∞ 1
• n=1 pii (n) = 1−Pi (Ti <∞) < ∞.
14
Definition 3.13. Let (Xn )n≥0 be a Markov chain and let Ti = inf{n ≥
1 : Xn = i}. The state i is called positive recurrent if Ei (Ti ) < ∞ and is
called null recurrent otherwise. [The terms positive and non-null are
interchangeable in the context of recurrent Markov chains.]
Theorem 3.14 (Recurrence, transience, and positive recurrence are
class properties). Let i and j be communicating states of a Markov
chain. Then i is recurrent if and only if j is recurrent. Equivalently, i
is transient if and only if j is transient. Also, i is positive recurrent if
and only if j is positive recurrent.
Definition 3.15. Let P be the transition matrix of a Markov chain.
An invariant Pdistribution is a row vector π = (πi )i∈S such that πi ≥ 0
for all i ∈ S, i∈S πi = 1 and
πP = π.
[The terms invariant distribution and stationary distribution are inter-
changeable.]
Theorem 3.16. Let (Xn )n≥0 be an irreducible Markov chain with tran-
sition matrix P . Then (Xn )n≥0 is positive recurrent if an only if there
exists an invariant distribution for P . If the Markov chain is positive
recurrent then the invariant distribution is unique and is given by the
formula
1
πi =
Ei (Ti )
where Ti = inf{n ≥ 1 : Xn = i}.
Theorem 3.17. Let (Xn )n≥0 be an irreducible recurrent Markov chain
on S with transition matrix P . Let k ∈ S be a fixed state and let
Tk = inf{n ≥ 1 : Xn = k}. For each i ∈ S let
Tk
X
(k)
γi = Ek 1{Xn =i}
n=1

be the expected number of times, conditional starting from state k, the


chain visits state i before returning to k, and let γ (k) be the row vector
(k)
(γi )i∈S . Then γ (k) is satisfies the equation
γ (k) P = γ (k)
(k)
with γk = 1. If (Xn )n≥0 is positive recurrent with invariant distribu-
tion π then
(k) πi
γi = .
πk
15
Definition 3.18. Let (pij (n))i,j∈S be the n-step transition probabilities
for a Markov chain. The period di of the state i is given by
di = gcd{n ≥ 1 : pii (n) > 0}.
A state i is aperiodic if di = 1.
A communicating class C is has period d if d = di for all i ∈ C. A
communicating class is aperiodic if it has period d = 1.
Theorem 3.19 (Periods are class properties). If i and j are com-
municating states of a Markov chain, then they have the same period
di = dj .
Definition 3.20. A Markov chain is ergodic if it is irreducible, positive
recurrent, and aperiodic.
Theorem 3.21 (Ergodic theorems). Let (Xn )n≥0 be an ergodic Markov
chain on S with n-step transition probabilites pij (n) and let Ti = inf{n ≥
1 : Xn = i}. Let π be the unique invariant distribution.

pij (n) → πj
for all i, j ∈ S where π.

n
1X
1{Xk =j} → πj almost surely
n k=0
for all j ∈ S, independently of the distribution of X0 .
3.2. Continuous time Markov chains.
Definition 3.22. Let (Xt )t≥0 be a continuous time stochastic pro-
cess taking values in a countable set S such that t 7→ Xt (ω) is right-
continuous for almost all ω ∈ Ω. Then (Xt )t≥0 is a Markov chain if
P(Xtn = in |Xt0 = i0 . . . , Xtn−1 = in−1 ) = P(Xtn = in |Xtn−1 = in−1 )
for each n ≥ 1, and i0 , . . . , in ∈ S. The Markov chain is called time
homogeneous if
P(Xt = j|Xs = i) = P(Xt−s = j|X0 = i)
for all 0 ≤ s ≤ t and all states i, j ∈ S.
As before, all Markov chains in these notes are time homogeneous.
Notation 3.23. Let (Xt )t≥0 be a Markov chain on a state space S.
Let pij (t) = Pi (Xt = j) for i, j ∈ S denote the transition probabilities,
and let P (t) = (pij (t))i,j∈S denote the |S| × |S| transition matrix.
16
Theorem 3.24. The collection (P (t))t≥0 of transition matrices for a
continuous time Markov chain has the following properties:
• For all t ≥ 0 the matrix P (t) is stochastic.
• P (0) = I where I is the |S| × |S| identity matrix
• The Chapman-Kolmogorov equation P (s P+ t) = P (s)P (t) holds
for s, t ≥ 0. Equivalently, pik (s + t) = j∈S pij (s)pjk (t) for all
i, k ∈ S.
Definition 3.25. The collection (P (t))t≥0 of transition matrices is uni-
form if the following conditions hold:
• There exist finite constants gij ≥ 0 such that
pij (t)
lim = gij
t↓0 t
for all i 6= j.
• There exist finite constants gii ≤ 0 such that
X
gii = − gij
j6=i

for all i.
• inf i gii > −∞
The matrix G = (gij )i,j∈S is called the generator of the Markov process.
All Markov chains considered here have uniform transition matrices.
Theorem 3.26. Let (Xt )t≥0 be a Markov chain with generator G, and
for each state i, let
Ui = inf{t > 0, Xt 6= i}.
Then conditional on X0 = i, the random variable Ui is exponential with
rate −gii ; that is
Pi (Ui > t) = egii t .
Furthermore,
gij
Pi (XU = j) = .
−gii
Theorem 3.27. Let G be the generator of a Markov chain with tran-
sition matrices (P (t))t≥0 . Then for all t ≥ 0
P
• j∈S gij = 0, or in matrix notation, G1 = 0 where 1 =
(1, 1, . . .)
• P (t) = etG = ∞ tn Gn
P
n=0 n!
• (Kolmogorov’s forward equation) P 0 (t) = P (t)G
• (Kolmogorov’s backward equation) P 0 (t) = GP (t)
17
where P 0 (t) denotes the matrix (p0ij (t))i,j∈S and p0ij denotes the deriva-
tive of pij .
Definition 3.28. An invariant distribution for a Markov chain with
transition matrices (P (t))t≥0 is a row vector π = (πi )i∈S such that
πP (t) = π for all t ≥ 0.
Theorem 3.29. Let G be the generator of a Markov chain. Then a
row vector π is an invariant distribution if and only if
πG = 0.
Definition 3.30. Let (pij (t))i,j∈S be the transition probabilities of a
Markov chain. The Markov chain is irreducible if pij (t) > 0 for all
t > 0.
Theorem 3.31. Let (pij (t))i,j∈S be the transition probabilities of an
irreducible Markov chain. If there exists an invariant distribution π,
the invariant distribution is unique and
pij (t) → πj as t ↑ ∞
for all i, j ∈ S . Otherwise, if no invariant distribution exists, then
pij (t) → 0.
4. Martingales
Theorem 4.1. Let X be an integrable random variable defined on the
probability space (Ω, F, P), and let G ⊆ F be a sigma-field of F. Then
there exists an integrable G-measureable random variable Y such that
E(1G Y ) = E(1G X)
for all G ∈ G. Furthermore, if there exists another G-measureable
random variable Y 0 such that E(1G Y 0 ) = E(1G X) for all G ∈ G, then
Y = Y 0 almost surely.
Definition 4.2. Let X be an integrable random variable and let G ⊆ F
be a sigma-field. The conditional expectation of X given G, written
E(X|G), is a G-measurable random variable with the property that
E [1G E(X|G)] = E(1G X)
for all G ∈ G.
Theorem 4.3. Let X and Y be integrable random variables defined on
(Ω, F, P), let H ⊆ G ⊆ F be sigma-algebras, and a, b ∈ R be constants.
• E(X|{∅, Ω}) = E(X)
• E(X|F) = X.
• E(aX + bY |G) = aE(X|G) + bE(Y |G)
18
• If X ≥ 0 almost surely, then E(X|G) ≥ 0 almost surely
• E[E(X|G)|H] = E[E(X|H)|G] = E(X|H)
• If Y is independent of G (the events {Y ≤ t} and G are inden-
dent for each t ∈ R and G ∈ G) then E(Y |G) = E(Y )
• If Y is G-measurable (the events {Y ≤ t} are in G for each
t ∈ R) then E(XY |G) = Y E(X|G)
Theorem 4.4. Let X be an integrable random variables defined on
(Ω, F, P).
• Let
S B1 , B2 , . . . be a sequence of disjoint non-null events with
n Bn = Ω. Let G be the smallest sigma-field containing {B1 , B2 , . . . , ...}.
Then
E(X1Bn )
E(X|G) = if ω ∈ Bn .
P(Bn )
• Let Y be a random variable on (Ω, F, P) and let G be the small-
est sigma-field containing the events {Y ≤ t} for all t ∈ R.
Then
E(X|G) = E(X|Y )
where the right side is defined by Definition 1.18.
Definition 4.5. A filtration (Fn )n≥0 on Ω is a collection of sigma-fields
on Ω such that
F0 ⊆ F1 ⊆ F2 ⊆ . . . .
Definition 4.6. A martingale relative to a filtration (Fn )n≥0 is a sto-
chastic process with the following properties:
• E|Mn | < ∞ for all n ≥ 0
• E(Mn+1 |Fn ) = Mn for all n ≥ 0.
Theorem 4.7 (The L2 martingale convergence theorem). Let (Mn )n≥0
be a martingale relative to a filtration (Fn )n≥0 . If supn≥0 E(Sn2 ) < ∞
then there exists a random variable M such that E(M 2 ) < ∞ and
Mn → M in L2 . Furthermore Mn = E(M |Fn ).
Definition 4.8. A stopping time for a filtration (Fn )n≥0 is a random
variable T : Ω → N ∪ {∞} with the property that {T = n} ∈ Fn for
each n ≥ 0.
Theorem 4.9 (The optional sampling theorem). Let (Mn )n≥0 be a
martingale relative to a filtration (Fn )n≥0 , let T a bounded stopping
time (there exists a real constant C > 0 such that T (ω) < C for almost
all ω ∈ Ω.) Then
E(MT ) = E(M0 ).
19
Theorem 4.10 (The optional stopping theorem). Let (Mn )n≥0 be a
martingale relative to a filtration (Fn )n≥0 , let T be almost surely finite
stopping time. (that is P(T < ∞) = 1.) If there is C > 0 such that
|Mn | ≤ C if n ≤ τ
then
E(MT ) = E(M0 ).
Theorem 4.11 (The optional stopping theorem). Let (Mn )n≥0 be a
martingale relative to a filtration (Fn )n≥0 , let T be almost surely finite
stopping time. (that is P(T < ∞) = 1.) If supn≥0 E(Mn2 ) < ∞ then
E(MT ) = E(M0 ).
As an application:

Pn 4.12. Let (Sn , n ≥ 0) be simple random walk on , i.e.,


Theorem
Sn = i=1 Xi with Xi = ±1 with probability 1/2, i.i.d. Let a, b ≥ 1
and let τa = inf{n ≥ 1 : Xn = a}. Then
b
P(τa ≤ τ−b ) =
a+b
and if τ = τa ∧ τ−b , then
E(τ ) = ab

20
R the set of real numbers
R the set of extended real numbers {−∞} ∪ R ∪ {∞}
N the set of natural numbers {0, 1, 2, . . .}
C the set of complex numbers
Z the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
P(Ω) the set of all subsets of Ω, called the power set of Ω
Ac the complement of the set A
A⊂B A is a proper subset of B
A⊆B A is a subset of B, and the possibility that A = B is allowed

FX the distribution function of a random variable X


pX the mass function of a discrete random variable X
fX the density function of an absolutely continuous random variable X
FX,Y the joint distribution function of X and Y
pX,Y the joint mass function of X and Y
fX,Y the joint density function of X and Y
pX|Y the conditional mass function of X given Y
fX|Y the conditional density of X given Y
GX the probability generating function of X
MX the moment generating function of X
φX the characteristic function of X

E(X) the expected value of the random variable X


Var(X) the variance of X
Cov(X, Y ) the covariance of X and Y
E(X|B) the conditional expectation of X given the event B
E(X|Y = t) the conditional expectation of X given the (possibly null) event Y = t
E(X|Y ) the conditional expectation of X given the random variable Y
Ei (Z) the conditional expection E(Z|X0 = i), where (Xn )n≥0 is a Markov chain
E(X|G) the conditional expectation of X given the sigma-field G
a+ max{a, 0}
a− max{−a, 0}

X∼ν the random variable X is distributed as the probability measure ν


1A the indicator function of the event A
N (µ, σ 2 ) the normal probability law with mean µ and variance σ 2
bin(n, p) the binomial probability law with parameters n and p
exp(λ) the exponential probability law with parameter λ
unif(a, b) the uniform probability law on the interval (a, b)

lim supn↑∞ xn the limit superior of the sequence x1 , x2 , . . .


lim inf n↑∞ xn the limit inferior of the sequence x1 , x2 , . . .
Lp the set of random 21variables X with E|X|p < ∞

gcd(A) the greatest common divisor of the set A ⊆ N


Table 1. Notation

You might also like