Stochastic Processes: Set Theory Basics
Stochastic Processes: Set Theory Basics
Lecture 2: Preliminaries
1
ther finite nor countably infinite is uncountable. A set is countable if it
is not uncountable.2 Given an indexed family of sets {Ab ; b ∈ β}, where
β is an arbitrary non-empty index set, the Cartesian or direct product
of these sets is Πb∈β Ab which contains all functions f : β → b∈β Ab
S
such that f (b) ∈ Ab for all b ∈ β. The union of these sets, denoted
b∈β Ab , contains exactly those x contained by some Ab . The intersec-
S
by all Ab ’s. Two sets are disjoint if they have an empty intersection.
A collection {Ab ; b ∈ β} of subsets of X is called a partition of X if
b∈β Ab = X and ∀a, b ∈ β, a 6= b ⇒ Aa Ab = ∅. DeMorgan’s law
S T
says that
[ A b ]c = Acb , [ Ab ]c = Acb .
[ \ \ [
(1)
b∈β b∈β b∈β b∈β
2
if
(i) ∀x, y ∈ A, x 6= y, either xCy or yCx but never xCx;
(ii) ∀x, y, z ∈ A, xCy, yCz ⇒ xCz.
Given an order relation C on A and given two distinct elements of A,
say aCb, the set {z ∈ A : aCz, zCb} is an open interval. From now on,
use > to represent an order relation. Define ≥ as such that ∀x, y ∈ X,
x ≥ y if it is not true that y > x. (Thus ≥ is another binary relation on
X.) Suppose that (X, >) is an ordered set and A ⊂ X. We say a ∈ A
is the largest element of A if ∀b ∈ A, a ≥ b. The smallest element is
defined analogously. We say that A is bounded above in X if for some
x ∈ X, we have x ≥ a for all a ∈ A, and in this case, x is an upper
bound of A. Similarly we can define bounded below and lower bounds.
If there exists in X a largest element in the set of lower bounds of A,
denoted inf A, we call it the greatest lower bound (or the infimum) of
A. Similarly we can define the least upper bound (or the supremum)
of A, and denote it by sup A. An ordered set (X, >) is said to have the
least upper bound property if each of its nonempty subsets has a least
upper bound in X.
3
if F c is open. The closure of G ⊂ X is the smallest (with respect to
the order relation “inclusion” on X) closed set containing G, and the
interior of F is the largest open set contained in F . A point x ∈ X is
a limit point of A ⊂ X if for all e > 0, B(x, e) A {x}c 6= ∅. The set
T T
4
(i) kxk = 0 ⇒ x = 0 (that is, only the zero vector has zero length);
(ii) kxk ≥ 0, ∀x ∈ L (that is, the length of any vector cannot be nega-
tive);
(iii) kaxk = |a|kxk for all x ∈ L and all a ∈ R; and
(iv) kx + yk ≤ kxk + kyk, ∀x, y ∈ L (that is, the triangular inequality
must hold).
Note that k · k induces a metric d on L by
6. The following “Projection Theorem” holds true for all Hilbert spaces:
Suppose L is a Hilbert space and M ⊂ L is itself a closed subspace
given the same norm k · k. Then, for each x ∈ L, there exists a unique
x̂ ∈ M such that
5
as fn for all n ∈ Z+ . We say that l ∈ R is a limit of the sequence {xn }
if for all > 0 there is an N () ∈ Z+ such that
10. In this section we define supremum, infimum, limit superior, and limit
inferior.
3
Hereafter we abuse the notation {xn ; n ∈ Z+ } to let it denote both the function
x : Z+ → R itself and its image set {x(n); n ∈ Z+ }.
6
Proof Since {an } is increasing and bounded above, by the axiom
of continuity it converges to some a ∈ R. Of course, a ≥ an , ∀n.
Similarly, for some b ∈ R, bn → b, with b ≤ bn , ∀n. Since
limn→∞ (bn − an ) = 0, we have a = b, which is contained in every
In ≡ [an , bn ]. Suppose there is another a0 ∈ ∩n In . Without loss
of generality, suppose a > a0 . Then, (a0 , a) ⊂ In for all n, and
hence bn − an ≥ (a0 − a) > 0 for all n, contradicting the fact that
limn→∞ (bn − an ) = 0.
• (Least Upper Bound Property of R) Suppose that A is a non-
empty subset of R, and A is bounded above by some u ∈ R. Let
U be the set of upper bounds of A. Then, there exists a unique
member of U , denoted sup A, which is the minimal element of U ,
also written as min U . That is, any non-empty subset of R that
is bounded above has a least upper bound. (Similarly, any non-
empty subset of R that is bounded below has a greatest lower
bound.) Moreover, for all > 0, there exists some a ∈ A such
that a > sup A − .
Proof As A is nonempty, let x ∈ A, and relabel a1 = x, b1 = u.
If u ∈ A, then u = sup A obviously; or else, take x+u 2
. Now
either x < x+u2
∈ U c
or u > x+u
2
∈ U . In the former case, label
x+u
a2 = 2 and b2 = u; and in the latter case, label a2 = x and
b2 = x+u2
. In either case, there will be an element of A which
is greater than or equal to a2 , and b2 remains an upper bound
of A. Continuing this way, either we have bn ∈ A for some n so
that bn = sup A, or we have an infinite sequence of nested closed
intervals {[an , bn ]; n ∈ Z+ } of which the lengths converge to zero.
In the latter case, by the nested intervals theorem, there must be a
unique number limn→∞ bn which is an upper bound of A (because
each and every bn is), and if v < limn→∞ bn then v < an for some
n, and hence there is some element of A greater than v, showing
that v cannot be an upper bound of A. That is, limn→∞ bn is the
desired least upper bound of A in the latter case. This shows that
sup A exists.
Finally, consider the last assertion. Suppose instead that there
exists some > 0 such that for all a ∈ A with a + ≤ sup A. This
would mean that sup A − is also an upper bound of A, which is
7
less than sup A, a contradiction!4
• Define the extended real line R ≡ R {+∞, −∞}. Define sup ∅ =
S
8
Proof First recall that if inf A = z, then (i) a ∈ A ⇒ a ≥ z;
and (ii) e > 0 ⇒ a < z + e for some a ∈ A. In fact the converse
is also true. Now we claim that u = limxn , if and only if (i)
e > 0 ⇒ xk < u + e for all k ≥ some n; and (ii) given e > 0
and n, we have xk > u − e for some k ≥ n. To this end, define
yn ≡ supk≥n xk ≥ u for all n. Given n and e > 0, there exists
some k ≥ n such that xk ≥ yn − e ≥ u − e, which is (ii). On the
other hand, given e > 0 there exists some n such that yn < u + e,
which implies that for all k ≥ n, xk < u + e. Thus we obtain (i).5
Now note that inf k≥n xk ≤ supk≥n xk , and hence limxn ≤ limxn .
It is obvious that6 lim(−xn ) = −limxn and that xn converges in
the extended real line if and only if the upper and the lower limits
coincide.7
Now to prove these inequalities, we observe that given n,
sup xk + sup yk ≥ xk + yk , ∀k ≥ n;
k≥n k≥n
that is, supk≥n xk + supk≥n yk is one upper bound of the set {xk +
yk ; k ≥ n}. By definition, any upper bound is greater than or
equal to the least upper bound, and hence we conclude that
Now, note that {an }, {bn }, and {cn } are all decreasing sequences
converging on the extended real line,8 and term by term, an +bn ≥
5
Similarly, l is the lower limit of {xn } if and only if (i) given e > 0 and n, xk < l + e
for some k ≥ n; and (ii) given e > 0 there is some n such that xk > l − e for all k ≥ n.
6
Recall that sup{−x : x ∈ A ⊂ R} = − inf A.
7
Note that e > 0 ⇒ l − e < xk < u + e for all k ≥ some n. If u = l, this is identical to
the definition that {xn } converges to l. On the other hand, it is easy to show that for a
convergent sequence, the upper and the lower limit must be the same.
8
Note that
sup xk ≥ xj , ∀j ≥ n
k≥n
⇒ sup xk ≥ xj , ∀j ≥ n + 1
k≥n
9
cn , we have
lim an + lim bn ≥ lim cn ,
or equivalently,
Immediately, we have
= limxn + limyn .
The rest inequalities can be analogously proved.
10
{yn } from the original sequence {xn }, and correspondingly a sequence
of nested closed intervals {In } such that the length of In converges to
zero and yn ∈ In for all n ∈ Z+ . By the nested intervals theorem, yn
must converge to the unique point contained in n In . This finishes the
T
proof.
11
continuity of f , lim f (an ) = f (lim an ) = f (p) = f (lim bn ) = lim f (bn ).
As by construction, for all n, f (an ) ≤ y ≤ f (bn ), it follows that f (p) =
y.
12
showing that {xn ; n ∈ Z+ } is Cauchy.
Next suppose that {xn ; n ∈ Z+ } is Cauchy. Then pick any e > 0, and
there must exist N (e) ∈ Z+ such that
m, n ≥ N (e) ⇒ |xm − xn | < e,
or equivalently,
m ≥ N (e) ⇒ |xm − xN (e) | < e ⇒ xN (e) − e < xm < xN (e) + e.
It follows that for all i ∈ Z+ ,
|xi | ≤ max(|x1 |, |x2 |, · · · , |xN (e) |, |xN (e) − e|, |xN (e) + e|) ≡ M.
This implies that the sequence {xn ; n ∈ Z+ } is contained in the closed
interval [−M, M ], and hence by the Bolzano-Weierstrass theorem it has
a subsequence {xh(i) ; i ∈ Z+ } converging to some x0 ∈ [−M, M ]. Note
that the increasing function h : Z+ → Z+ is such that limi↑+∞ h(i) =
+∞, and hence for each N ∈ Z+ there exists M (N ) ∈ Z+ such that
i > M (N ) ⇒ h(i) > N.
Now given any e > 0, there exists N (e) ∈ Z+ such that
e
h(i) > N (e) ⇒ |xh(i) − x0 | < ,
2
and given the same e > 0, there exists L(e) ∈ Z+ such that
e
l, m > L(e) ⇒ |xm − xl | < .
2
Let
K(e) ≡ max(L(e), M (N (e)), N (e)),
and we have, given e > 0,
i > K(e) ⇒ |xi − x0 | < |xi − xh(i) | + |xh(i) − x0 | < e,
which, by definition, shows that
lim xi = x0 ,
i→∞
13
16. (Heine-Borel Theorem) If G is a family of open intervals covering
the closed interval [a, b], then a finite sub-family of G also covers [a, b].
Equivalently, [a, b] is compact.
Proof Suppose G is a collection of open intervals covering the closed
interval [a, b]. Suppose that it takes an infinite number of elements of G
to cover [a, b] ≡ I1 and we shall demonstrate a contradiction. Note that
it takes an infinite number of elements of G to cover either [a, a+b 2
] or
a+b a+b a+b
[ 2 , b]. Let I2 be the subinterval between [a, 2 ] and [ 2 , b] that takes
an infinite number of elements in G to cover. Now turn the spotlight
from I1 to I2 and and chop I2 into two subintervals with equal length.
Now it takes an infinite number of elements of G to cover one of the two
subintervals (call it I3 !). Repeating this argument, we obtain a sequence
of nested closed intervals {In ; n ∈ Z+ } with limn→+∞ (bn −an ) = 0, such
that it takes an infinite number of elements of G to cover each In . There
is, however, one unique point x ∈ n In , and there must be some G ∈ G
T
· · · x1k
x11 x12
x21 x22 · · · x2k
Nk = .. .. . .
. . · · · ..
xk1 xk2 · · · xkk
A matrix N is negative definite if and only if for all k = 1, 2, · · · , n,
(−1)k |Nk | > 0, where |Nk | stands for the determinant of Nk . If P
is positive definite, then −P is negative definite, and since | − Pk | =
14
(−1)k |Pk |, an equivalent condition for P to be positive definite is that
for all k = 1, 2, · · · , n, |Pk | > 0. The equivalent conditons for semi-
definite matrices are more complicated; see Lecture 1, sections 30-31.
2
which will be referred to as the gradient of f . Let D 2 f : Rn → Rn be
the matrix function
∂2f ∂2f ∂2f
∂x1 ∂x1 ∂x1 ∂x2
··· ∂x1 ∂xn
∂2f ∂2f ∂2f
···
∂x2 ∂x1 ∂x2 ∂x2 ∂x2 ∂xn
D2 f = ,
.. .. .. ..
. . . .
∂2f ∂2f ∂2f
∂xn ∂x1 ∂xn ∂x2
··· ∂xn ∂xn
21. Functions to appear in this and the next sections will be twice differ-
entiable. A necessary condition for x∗ ∈ Rn to solve (P1)
max f (x)
x∈Rn
15
is that Df (x∗ ) = 0. This condition is also sufficient if f is a concave
function. Consider the following maximization program (P2):
max f (x)
x∈Rn
subject to
∀i = 1, 2, · · · , m, gi (x) = 0,
where m < n, f is concave, and each gi : Rn → R is affine. An
equivalence condition (Lagrange Theorem) for x∗ to solve (P2) is the
existence of m constants (called Lagrange multipliers) π1 , π2 , · · · , πm
such that (i) ∀i = 1, 2, · · · , m, gi (x) = 0; and (ii)
m
πi Dgi (x∗ ) = Df (x∗ ).
X
i=1
∂g ∂g
Suppose that given a solution (x∗ , y ∗ ), either ∂x or ∂y is non-zero.
Suppose the latter is true. Then by the implicit function theorem, we
can represent y = h(x) locally, with
∂g
h0 (x) = − ∂x
∂g .
∂y
16
which means
∂f ∂f
∂x ∂y
∂g = ∂g ≡ π.
∂x ∂y
Df = πDg.
max f (x)
x∈Rn
subject to
∀i = 1, 2, · · · , m, gi (x) ≥ 0,
where f and each gi : Rn → R are concave. An equivalence condition
(Kuhn-Tucker Theorem) for x∗ to solve (P2), in case the set of vectors
{Dgi (x∗ ) : gi (x∗ ) = 0} is linearly independent, is the existence of m
positive constants (called Lagrange multipliers) π1 , π2 , · · · , πm such that
(i)
m
πi Dgi (x∗ ) = Df (x∗ );
X
i=1
23. The linear space RN can be made into a normed linear space. The
N N
qPR is defined as follows: If x = (x1 , x2 , · · · , xN ) ∈ R ,
usual norm for
N 2 9
then kxk ≡ i=1 xi ; cf. section 4.
9
Consider the set M of all m × n matrices. This is a linear space. Define for all A ∈ M
q
kAk = tr(AA0 ).
17
24. For all x = (x1 , x2 , · · · , xN ) and y = (y1 , y2 , · · · , yN ) belonging to RN ,
define qthe dot product x · y ≡ N i=1 xi yi . Moreover, let d(x, y) ≡ kx −
P
PN 2
yk ≡ i=1 (xi − yi ) represent the distance between x and y. This
function d : R ×R → R+ is a metric (verify it!). The pair (RN , d) is
N N
a metric space. Given d(·, ·), the set B(x, e) = {y ∈ RN : d(x, y) < e}
is called an open ball for all x ∈ RN and all e > 0. An open set in
RN is a union of open balls. The collection of all open sets in RN is
called the standard topology of RN . A subset F of RN is closed if its
complement, denoted F c , is open.
26. Let p ∈ RN with 0 < kpk and α ∈ R. The following set is called a
hyperplane in RN :
H(p, α) ≡ {x : p · x = α, x ∈ RN }.
18
30. Given m points a1 , a2 , · · · , am ∈ RN , the polyhedral cone generated by
these points is the set K ≡ { m i=1 ti ai : ti ∈ R+ }.
P
limx→y f (x) = inf sup f (x), limx→y f (x) = sup inf f (x). (4)
δ>0 0<|x−y|<δ δ>0 0<|x−y|<δ
19