0% found this document useful (0 votes)
8 views19 pages

Stochastic Processes: Set Theory Basics

The document provides an overview of foundational concepts in stochastic processes, including sets, functions, binary relations, metric spaces, and linear spaces. It discusses the properties of sets, types of functions, and relations such as equivalence and order relations, as well as the definitions and properties of metric spaces and linear spaces. Additionally, it introduces Banach and Hilbert spaces, emphasizing their completeness and specific properties like the Projection Theorem.

Uploaded by

chen79how
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views19 pages

Stochastic Processes: Set Theory Basics

The document provides an overview of foundational concepts in stochastic processes, including sets, functions, binary relations, metric spaces, and linear spaces. It discusses the properties of sets, types of functions, and relations such as equivalence and order relations, as well as the definitions and properties of metric spaces and linear spaces. Additionally, it introduces Banach and Hilbert spaces, emphasizing their completeness and specific properties like the Projection Theorem.

Uploaded by

chen79how
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Stochastic Processes

Lecture 2: Preliminaries

1. (Sets and Functions) A well-defined collection of objects (called ele-


ments) is a set. A set of sets is usually referred to as a collection, and
a set of collections is usually referred to as a family. A set contains
nothing is called an empty set and denoted ∅. A set contains exactly
one element is a singleton. If one has a universe set X in mind (so
that all sets discussed are subsets of X), then define the complement
of A ⊂ X to be the set containing exactly those x contained by X but
not by A. The complement of A is denoted by Ac . If given two sets
A and B, x ∈ A ⇒ x ∈ B, then we say A ⊂ B (A is a subset of B;
B a superset of A), and if further there exists y ∈ B, and y ∈ Ac ,
then we say A is a proper subset of B. Two sets A and B are equal,
if A ⊂ B and B ⊂ A. Given two sets A and B, an assigning rule f
is a (single-valued) function from A into B if ∀a ∈ A, there is b ∈ B
such that f (a) = b. In this case, we write f : A → B, and A and
B are called the domain and range of f respectively. If A is itself a
collection of sets, then we say that f is a set function. If instead B is
a collection of sets then f is a multi-valued function (or a correspon-
dence). Given f : A → B, the sets f (C) = {f (a) : a ∈ C ⊂ A} and
f −1 (D) = {a ∈ A : f (a) ∈ D ⊂ B} are called the image of C and
the pre-image of D under f . The function f : A → B is surjective if
B ⊂ f (A), injective if x, y ∈ A, x 6= y ⇒ f (x) 6= f (y), and bijective
if it is both surjective ane injective. A bijective function is also called
a one-to-one correspondence. A bijective function f : A → B has an
inverse function f −1 : B → A which assigns for each b ∈ B “the”
a ∈ A that satisfies f (a) = b. (Depending on the context, the notation
should not create confusion with the pre-image operator.) Two sets
A and B are of the same dimensionality (cardinality) if we can find
a one-to-one correspondence between them. A set A is finite if it is
empty or there is a one-to-one correspondence between A and the set
{1, 2, · · · , n} for some n ∈ Z+ , where Z+ stands for the set of strictly
positive integers.1 A set A is countably infinite (denumerable) if there
is a one-to-one correspondence between A and Z+ . A set which is nei-
1
The notation Z+ is taken from James R. Munkres, 1975, Topology: A First Course,
New Jersey: Prentice-Hall. It represents the set of natural numbers. We shall interchange-

1
ther finite nor countably infinite is uncountable. A set is countable if it
is not uncountable.2 Given an indexed family of sets {Ab ; b ∈ β}, where
β is an arbitrary non-empty index set, the Cartesian or direct product
of these sets is Πb∈β Ab which contains all functions f : β → b∈β Ab
S

such that f (b) ∈ Ab for all b ∈ β. The union of these sets, denoted
b∈β Ab , contains exactly those x contained by some Ab . The intersec-
S

tion of these sets, denoted b∈β Ab contains exactly those x contained


T

by all Ab ’s. Two sets are disjoint if they have an empty intersection.
A collection {Ab ; b ∈ β} of subsets of X is called a partition of X if
b∈β Ab = X and ∀a, b ∈ β, a 6= b ⇒ Aa Ab = ∅. DeMorgan’s law
S T

says that
[ A b ]c = Acb , [ Ab ]c = Acb .
[ \ \ [
(1)
b∈β b∈β b∈β b∈β

Denote by 2X the collection containing all subsets of X, which will be


referred to as the power set of X. The following facts are well known.
A countable union or a finite product of countable sets is countable. A
infinite product of countable sets need not be countable; in particular,
the power set of Z+ is uncountable.

2. (Binary Relations) A binary relation on a set A is a subset C of the


Cartesian product A × A. If (x, y) ∈ C, we write xCy. Two relations
which are more relevant to our subsequent analyses are equivalence
relation and order relation. A relation C is an equivalence relation if
(i) ∀x ∈ A, xCx;
(ii) ∀x, y ∈ A, xCy ⇒ yCx;
(iii) ∀x, y, z ∈ A, xCy, yCz ⇒ xCz.
It can be verified that an equivalence relation on A gives rise to a
collection of mutually disjoint subsets of which the union is A; this
collection of subsets is a partition of A. A relation is an order relation
ably use the notation N to stand for the same set, as in Kai Lai Chung, 1974, A Course
in Probability Theory.
2
The following three statements are apparently equivalent: (i) A is countable; (ii) there
is a surjective function f : Z+ → A; (iii) there is an injective function g : A → Z+ . Because
of (ii) a subset of a countable set is apparently countable. Because of (iii), Z2+ is countable:
simply define g(n, m) = 2n 3m . Now since the set of positive rationals has a one-to-one
correspondence with Z2+ , the former is countable. The following three statements are also
equivalent, but less apparently: (a) A is infinite; (b) there exists an injective function
f : Z+ → A; (c) there exists a bijective function of A with a proper subset of itself.

2
if
(i) ∀x, y ∈ A, x 6= y, either xCy or yCx but never xCx;
(ii) ∀x, y, z ∈ A, xCy, yCz ⇒ xCz.
Given an order relation C on A and given two distinct elements of A,
say aCb, the set {z ∈ A : aCz, zCb} is an open interval. From now on,
use > to represent an order relation. Define ≥ as such that ∀x, y ∈ X,
x ≥ y if it is not true that y > x. (Thus ≥ is another binary relation on
X.) Suppose that (X, >) is an ordered set and A ⊂ X. We say a ∈ A
is the largest element of A if ∀b ∈ A, a ≥ b. The smallest element is
defined analogously. We say that A is bounded above in X if for some
x ∈ X, we have x ≥ a for all a ∈ A, and in this case, x is an upper
bound of A. Similarly we can define bounded below and lower bounds.
If there exists in X a largest element in the set of lower bounds of A,
denoted inf A, we call it the greatest lower bound (or the infimum) of
A. Similarly we can define the least upper bound (or the supremum)
of A, and denote it by sup A. An ordered set (X, >) is said to have the
least upper bound property if each of its nonempty subsets has a least
upper bound in X.

3. (Metric Spaces) Given any non-empty set X a function d : X × X →


R+ (where R+ is the set of non-negative real numbers) is a metric if
(i) d(x, y) = 0 ⇔ x = y, ∀x, y ∈ X;
(ii) d(x, y) = d(y, x) ≥ 0, ∀x, y ∈ X;
(iii) d(x, y) + d(y, z) ≥ d(x, z).
The pair (X, d) is a metric space. If (i) is replaced by
(i)0 d(x, x) = 0, ∀x ∈ X,
then d and (X, d) are referred to respectively as a pseudo-metric and
a pseudo-metric space. A function x : Z+ → X is called a sequence
in (X, d). It is said to converge to z ∈ X if for all real  > 0 there
exists n() ∈ Z+ such that n ≥ n() ⇒ d(x(n), z) < . A sequence
is Cauchy if for all e > 0, there exists n(e) ∈ Z+ such that m, n ≥
n(e) ⇒ d(x(m), x(n)) < e. Given two metric spaces (X, d) and (Y, d0 ), a
function f : X → Y is continuous if {f (x(n))} is a convergent sequence
in (Y, d0 ) whenever {x(n)} is a convergent sequence in (X, d). Given
(X, d), for all z ∈ X and e ∈ R++ , the subsets B(z, e) = {x ∈ X :
d(x, z) < e} are called open balls in X. A subset G of X is open if it
can be represented as a union of open balls. A subset F of X is closed

3
if F c is open. The closure of G ⊂ X is the smallest (with respect to
the order relation “inclusion” on X) closed set containing G, and the
interior of F is the largest open set contained in F . A point x ∈ X is
a limit point of A ⊂ X if for all e > 0, B(x, e) A {x}c 6= ∅. The set
T T

of limit points of A, denoted A0 , is the derived set of A. A point x ∈ A


is an isolated point if for some e > 0, B(x, e) A {x}c = ∅. The
T T

closed subset A of X is perfect if it contains no isolated points. The


boundary of A is the intersection of the closure of A and the closure
of Ac in (X, d). The collection τ of all the open sets on X is called
a topology on X. The pair (X, τ ) is called a topological space. The
topological space (X, τ ) (or the topology τ ) is called Hausdorff if for all
x, y ∈ X, x 6= y, there exists open sets Gx , Gy ∈ τ such that x ∈ Gx ,
y ∈ Gy , and Gx Gy = ∅. Thus every metric space is Hausdorff. (Note
T

that a pseudo-metric also induces a topology on X in essentially the


same way, but the resulting topological space need not be Hausdorff;
for details of topological spaces, see the Note on Topology.)

4. (Linear Spaces) A set V is a (real) linear space on R, where R is


the set of real numbers, if it is equipped with and closed under op-
erations of vector addition and scalar multiplication together with a
number of well-defined operational (such as commutative and associa-
tive) laws. Elements in V are called vectors. Given a finite number of
elements x1 , x2 , · · · , xn in V , an element y ∈ V is a linear combination
of these elements if there exist real numbers a1 , a2 , · · · , an such that
y = ni=1 ai xi . A subset B of V is said to span V if all vectors in V
P

are linear combinations of elements of B. If no proper subset of B can


span B, then B is linearly independent, and in this case B is a basis
for V if B spans V . The cardinality of (the number of elements in)
a basis (any basis) is called the linear space V ’s dimension. V may
be finite-dimenional or infinite-dimensional depending on whether the
cardinality of B is finite or infinite. A subset A ⊂ V is convex if for
all λ ∈ [0, 1] and for all x, y ∈ A, λx + (1 − λ)y ∈ A. A norm of a
real linear space L is a measure of “lengths” of vectors in L. For ex-
ample, R2 is a real linear space q endowed with the following norm: For
2
all x = (x1 , x2 ) ∈ R , kxk ≡ x21 + x22 . For general real linear spaces
L, a norm k · k is formally defined as a function with L as domain and
R+ as range which satisfies the following 4 properties:

4
(i) kxk = 0 ⇒ x = 0 (that is, only the zero vector has zero length);
(ii) kxk ≥ 0, ∀x ∈ L (that is, the length of any vector cannot be nega-
tive);
(iii) kaxk = |a|kxk for all x ∈ L and all a ∈ R; and
(iv) kx + yk ≤ kxk + kyk, ∀x, y ∈ L (that is, the triangular inequality
must hold).
Note that k · k induces a metric d on L by

d(x, y) = kx − yk, ∀x, y ∈ L.

It is easy to see that d satisfies all the defining properties of a metric.


Thus a normed (real) linear space can always be regarded as a metric
space with the norm-induced metric, although a metric space need
not even be a linear space. A pseudo-norm is a function from L into
R+ that satisfies the above defining properties (ii)-(iv) of a norm. A
pseudo-norm induces a pseudo-metric via the above same definition.
(From now on we shall consider only real linear spaces, and whenever
we mention a linear space it will be understood that the scalar field is
R.)

5. A normed (real) linear space L is a Banach space if every Cauchy se-


quence {xn } ⊂ L converges to some element of L. In other words,
Banach spaces are “complete normed linear spaces.” We are interested
in a special kind of Banach spaces called “Hilbert spaces.” These are
those Banach spaces that satisfy the following “Parallelogram Equal-
ity”: for all x, y ∈ L,

kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 . (2)

6. The following “Projection Theorem” holds true for all Hilbert spaces:
Suppose L is a Hilbert space and M ⊂ L is itself a closed subspace
given the same norm k · k. Then, for each x ∈ L, there exists a unique
x̂ ∈ M such that

kx − x̂k ≤ kx − mk, ∀m ∈ M. (3)

7. (Real Functions and Sequences) If A = Z+ and B = R, then


f : A → B is called a real (real number) sequence, and we write f (n)

5
as fn for all n ∈ Z+ . We say that l ∈ R is a limit of the sequence {xn }
if for all  > 0 there is an N () ∈ Z+ such that

n ∈ Z+ , n ≥ N () ⇒ |xn − l| < .

In this case we say that {xn } converges to l and {xn } a convergent


sequence.3 The limit of a sequence so defined is unique. Given a se-
quence {xn } in R, define limxn = inf n supk≥n xk and limxn = supn inf k≥n xk ,
which will be shown to always exist in R {−∞, +∞}, and they are
S

equal iff {xn } converges; see section 10 below. A set I ⊂ R is an in-


terval if it is not a singleton (a one-point set) and if x, y ∈ I, x < y
implies that there exists some z ∈ I such that x < z < y. We also
write R = (−∞, +∞). A function g : I → R, where I is some interval,
is continuous if for all x ∈ I and all {xn } converging to x in I, {g(xn )}
is a sequence converging to g(x). The derivative of g(·) at x ∈ I, if it
exists (it is a finite number), is the (common) limit of g(xxnn)−g(x)
−x
for all
{xn } converging to x. If the derivative of g is defined for all x ∈ I,
written g 0 (x), we say that g is differentiable on I. In this case, g 0 (·)
itself is a mapping from I into R. If g 0 is a continuous function, we say
that g is continuously differentiable on I.

8. A subset A ⊂ R is bounded above by u ∈ R, if for all x ∈ A, x ≤ u. In


this case, u is called an upper bound of the set A. Lower bounds are
similarly defined. The set A is bounded if it has an upper and a lower
bounds.

9. (Axiom of Continuity) If the real sequence {xn } is increasing (i.e.


xn+1 ≥ xn , for all n ∈ Z+ ), and if xn ≤ M ∈ R for all n ∈ Z+ , then
{xn } converges to some l ≤ M and xn ≤ l for all n ∈ Z+ .

10. In this section we define supremum, infimum, limit superior, and limit
inferior.

• (Nested Interval Theorem) Suppose that {[an , bn ]; Z+ } is a se-


quence of closed intervals such that for all n, [an+1 , bn+1 ] ⊂ [an , bn ]
and limn→+∞ (bn − an ) = 0. Then, n∈Z+ [an , bn ] is a singleton.
T

3
Hereafter we abuse the notation {xn ; n ∈ Z+ } to let it denote both the function
x : Z+ → R itself and its image set {x(n); n ∈ Z+ }.

6
Proof Since {an } is increasing and bounded above, by the axiom
of continuity it converges to some a ∈ R. Of course, a ≥ an , ∀n.
Similarly, for some b ∈ R, bn → b, with b ≤ bn , ∀n. Since
limn→∞ (bn − an ) = 0, we have a = b, which is contained in every
In ≡ [an , bn ]. Suppose there is another a0 ∈ ∩n In . Without loss
of generality, suppose a > a0 . Then, (a0 , a) ⊂ In for all n, and
hence bn − an ≥ (a0 − a) > 0 for all n, contradicting the fact that
limn→∞ (bn − an ) = 0.
• (Least Upper Bound Property of R) Suppose that A is a non-
empty subset of R, and A is bounded above by some u ∈ R. Let
U be the set of upper bounds of A. Then, there exists a unique
member of U , denoted sup A, which is the minimal element of U ,
also written as min U . That is, any non-empty subset of R that
is bounded above has a least upper bound. (Similarly, any non-
empty subset of R that is bounded below has a greatest lower
bound.) Moreover, for all  > 0, there exists some a ∈ A such
that a > sup A − .
Proof As A is nonempty, let x ∈ A, and relabel a1 = x, b1 = u.
If u ∈ A, then u = sup A obviously; or else, take x+u 2
. Now
either x < x+u2
∈ U c
or u > x+u
2
∈ U . In the former case, label
x+u
a2 = 2 and b2 = u; and in the latter case, label a2 = x and
b2 = x+u2
. In either case, there will be an element of A which
is greater than or equal to a2 , and b2 remains an upper bound
of A. Continuing this way, either we have bn ∈ A for some n so
that bn = sup A, or we have an infinite sequence of nested closed
intervals {[an , bn ]; n ∈ Z+ } of which the lengths converge to zero.
In the latter case, by the nested intervals theorem, there must be a
unique number limn→∞ bn which is an upper bound of A (because
each and every bn is), and if v < limn→∞ bn then v < an for some
n, and hence there is some element of A greater than v, showing
that v cannot be an upper bound of A. That is, limn→∞ bn is the
desired least upper bound of A in the latter case. This shows that
sup A exists.
Finally, consider the last assertion. Suppose instead that there
exists some  > 0 such that for all a ∈ A with a +  ≤ sup A. This
would mean that sup A −  is also an upper bound of A, which is

7
less than sup A, a contradiction!4
• Define the extended real line R ≡ R {+∞, −∞}. Define sup ∅ =
S

−∞ and inf ∅ = +∞. If A ⊂ R is unbounded above (there exists


no upper bound for A in R), then define sup A = +∞. If A ⊂ R
is unbounded below (there exists no lower bound for A in R), then
define inf A = −∞. In this way, all subsets of R have well-defined
suprema and infima in R.
• Suppose that {xn ; n ∈ Z+ } is a real sequence. Recall from section
7 its limit inferior
limxn ≡ sup inf xk
n∈Z+ k≥n

and its limit superior

limxn ≡ inf sup xk .


n∈Z+ k≥n

Both limits exist in extended real line.


Proof Note that the real sequence yn ≡ supk≥n xk is decreasing;
see below for a proof. If it is bounded below, then it converges to
some l ∈ R, which one can easily verify to be inf n yn ; or else, we
define its infimum by −∞. Thus limit superior exists in extended
real line. The claim for limit inferior can be analogously proved.
• For any two real sequences {xn } and {yn },

limxn + limyn ≤ lim(xn + yn ) ≤ limxn + limyn

≤ lim(xn + yn ) ≤ limxn + limyn .


4
It follows that we can find a sequence {yn ; n ∈ Z+ } in A such that limn→∞ yn = sup A:
for each n ∈ Z+ , there exists yn ∈ A such that
1 1
sup A − < yn ≤ sup A < sup A + ,
n n
or
1
d(sup A, yn ) < .
n
This also implies that sup A is a limit point of the set A when < is endowed with the
T for each e > 0, there exists some y(e) ∈ A \ {sup A} such that y(e) ∈
usual topology:
B(sup A, e) A. The same conclusion applies to inf A also.

8
Proof First recall that if inf A = z, then (i) a ∈ A ⇒ a ≥ z;
and (ii) e > 0 ⇒ a < z + e for some a ∈ A. In fact the converse
is also true. Now we claim that u = limxn , if and only if (i)
e > 0 ⇒ xk < u + e for all k ≥ some n; and (ii) given e > 0
and n, we have xk > u − e for some k ≥ n. To this end, define
yn ≡ supk≥n xk ≥ u for all n. Given n and e > 0, there exists
some k ≥ n such that xk ≥ yn − e ≥ u − e, which is (ii). On the
other hand, given e > 0 there exists some n such that yn < u + e,
which implies that for all k ≥ n, xk < u + e. Thus we obtain (i).5
Now note that inf k≥n xk ≤ supk≥n xk , and hence limxn ≤ limxn .
It is obvious that6 lim(−xn ) = −limxn and that xn converges in
the extended real line if and only if the upper and the lower limits
coincide.7
Now to prove these inequalities, we observe that given n,

sup xk + sup yk ≥ xk + yk , ∀k ≥ n;
k≥n k≥n

that is, supk≥n xk + supk≥n yk is one upper bound of the set {xk +
yk ; k ≥ n}. By definition, any upper bound is greater than or
equal to the least upper bound, and hence we conclude that

sup xk (≡ an ) + sup yk (≡ bn ) ≥ sup(xk + yk ) ≡ cn .


k≥n k≥n k≥n

Now, note that {an }, {bn }, and {cn } are all decreasing sequences
converging on the extended real line,8 and term by term, an +bn ≥
5
Similarly, l is the lower limit of {xn } if and only if (i) given e > 0 and n, xk < l + e
for some k ≥ n; and (ii) given e > 0 there is some n such that xk > l − e for all k ≥ n.
6
Recall that sup{−x : x ∈ A ⊂ R} = − inf A.
7
Note that e > 0 ⇒ l − e < xk < u + e for all k ≥ some n. If u = l, this is identical to
the definition that {xn } converges to l. On the other hand, it is easy to show that for a
convergent sequence, the upper and the lower limit must be the same.
8
Note that
sup xk ≥ xj , ∀j ≥ n
k≥n

⇒ sup xk ≥ xj , ∀j ≥ n + 1
k≥n

⇒ sup xk is one upper bound of {xj ; j ≥ n + 1}


k≥n

9
cn , we have
lim an + lim bn ≥ lim cn ,
or equivalently,

limxn + limyn ≥ lim(xn + yn ).

Immediately, we have

lim(xn + yn ) + lim(−yn ) ≥ limxn ,

which implies that

lim(xn + yn ) ≥ limxn − lim(−yn )

= limxn + limyn .
The rest inequalities can be analogously proved.

11. (Bolzano-Weierstrass Theorem) If h : Z+ → Z+ is increasing and


x : Z+ → R, then {x(h(n))} is called a subsequence of {x(n)}. Every
bounded sequence in R contains a convergent subsequence.
Proof In case that the sequence only takes a finite number of values
in R (that is, the image of the function x : Z+ → R is a finite set), it is
easy to see that the sequence contains a constant subsequence, which
converges trivially. Thus we consider the case where the sequence con-
tains an infinite number of terms taking distinct values. By assumption,
{xn ; n ∈ Z+ } is bounded, and hence for all n ∈ Z+ , xn ∈ [a, b] for some
a, b ∈ R with b > a. It follows that an infinite number of distinct
values must be contained in either [a, a+b
2
] or [ a+b
2
, b]. We pick one term
from the subinterval that contains an infinite number of distinct terms,
and call it y1 . Keep chopping that subinterval into two, and repeat the
procedure to pick y2 . Continuing this way, we can create a subsequence
⇒ sup xk ≥ the least upper bound of {xj ; j ≥ n + 1}
k≥n

⇒ an = sup xk ≥ sup xk = an+1


k≥n k≥n+1

⇒ {an ; n ∈ Z+ } is a decreasing sequence.

10
{yn } from the original sequence {xn }, and correspondingly a sequence
of nested closed intervals {In } such that the length of In converges to
zero and yn ∈ In for all n ∈ Z+ . By the nested intervals theorem, yn
must converge to the unique point contained in n In . This finishes the
T

proof.

12. A function f : A ⊂ R → R is continuous at x ∈ A if for all  > 0,


there exists δ(x, ) > 0 such that

∀y ∈ A, |x − y| < δ(x, ) ⇒ |f (x) − f (y)| < .

The function is uniformly continuous at x if δ can be chosen to be


independent of x. We say f is a (uniformly) continuous function on A
if it is (uniformly) continuous at x for all x ∈ A. If f : [a, b] → R is
continuous on [a, b], then it is uniformly continuous on [a, b].
Proof Define A ≡ [a, b]. Suppose that f : A → R is continuous but not
uniformly continuous. Then there exists e > 0 such that for all n, we
can find a pair (xn , yn ) such that |xn − yn | < n1 but |f (xn ) − f (yn )| > e.
Consider a convergent subsequence of {xn }, say converging to x0 , then
by continuity of f , the corresponding subsequence of functional values
also converges to f (x0 ). Since the term-by-term distance between {yn }
and {xn } vanishes in the limit, by continuity of f , the functional values
of the corresponding subsequence of {yn } also converges to f (x0 ), which
is a contradiction.

13. (Intermediate Value Theorem) If f : I → R is continuous where I


is an interval, then f (I) is either a singleton or an interval.
Proof Suppose that f (I) contains two distinct elements z, x ∈ R. We
show that f (I) is an interval. Note that there must exist a, b ∈ I such
that a 6= b and f (a) = z, f (b) = x. Without loss of generality, assume
that a < b and z < x. Pick any y ∈ (z, x), and we shall show that
y ∈ f (I), or equivalently, there exists some element p ∈ I such that
f (p) = y. Define I1 = [a, b] = [a1 , b1 ] and since I is an interval, I1 ⊂ I.
Consider f ( a+b
2
). If it equals y, we are done; or else, if f ( a+b
2
) > y, then
a+b a+b
we define [a, 2 ] = I2 ; otherwise define [ 2 , b] = I2 , where I2 = [a2 , b2 ].
In this way, we create a sequence of nested closed intervals of which the
lengths converge to zero. Now there is a single point in each of these
intervals, called, say, p. It is obvious that lim an = lim bn = p, and by

11
continuity of f , lim f (an ) = f (lim an ) = f (p) = f (lim bn ) = lim f (bn ).
As by construction, for all n, f (an ) ≤ y ≤ f (bn ), it follows that f (p) =
y.

14. (Extreme Value Theorem) If f : [a, b] → R is continuous then


f ([a, b]) is either a singleton or a bounded closed interval. In particular,
there exist m, M ∈ I such that for all x ∈ I, f (m) ≤ f (x) ≤ f (M ).
Proof We first show that f ([a, b]) is bounded. Suppose instead that
for all n there exists xn ∈ [a, b] with f (xn ) > n (the case where for all
n there exists xn ∈ [a, b] with f (xn ) < −n can be dealt with similarly),
and we shall demonstrate a contradiction. Note that the Bolzano-
Weierstrass theorem implies that the bounded sequence {xn } has a
convergent subsequence converging to, say x0 , and by continuity of f ,
the corresponding sequence {f (xn )} of functional values would converge
to f (x0 ), and hence it cannot grow unboundedly, a contradiction.
Next we show that f attains its supremum and infimum on [a, b]. Sup-
pose instead that for all x ∈ [a, b], f (x) < M ≡ sup f (A). But then,
the function H(x) ≡ M −f1 (x) is continuous on [a, b] and hence H([a, b])
must be bounded: for all x ∈ [a, b], 0 < H(x) ≤ M 0 . This implies
that f ([a, b]) has an upper bound M − M1 0 which is smaller than M , a
contradiction.

15. (Cauchy Criterion for Convergence) A sequence {xn } is Cauchy


if for each e > 0, there exists a positive integer Ne ∈ Z+ such that
m, n ∈ Z+ , m, n > Ne implies that |xm − xn | < e. A convergent
sequence in an arbitrary metric space must be Cauchy. In R, every
Cauchy sequence converges; i.e. R endowed with the norm kxk = |x|
is a Banach space.
Proof First we show that a convergent sequence {xn ; n ∈ Z+ } is
Cauchy. Suppose that limn→∞ xn = x0 . By definition, for all e > 0
there must be N (e) ∈ Z+ such that
e e
m, n > N (e) ⇒ |xm − x0 | < , |xn − x0 | < .
2 2
It follows from the trangular inequality that

m, n > N (e) ⇒ |xn − xm | < |xn − x0 | + |x0 − xm | < e,

12
showing that {xn ; n ∈ Z+ } is Cauchy.
Next suppose that {xn ; n ∈ Z+ } is Cauchy. Then pick any e > 0, and
there must exist N (e) ∈ Z+ such that
m, n ≥ N (e) ⇒ |xm − xn | < e,
or equivalently,
m ≥ N (e) ⇒ |xm − xN (e) | < e ⇒ xN (e) − e < xm < xN (e) + e.
It follows that for all i ∈ Z+ ,
|xi | ≤ max(|x1 |, |x2 |, · · · , |xN (e) |, |xN (e) − e|, |xN (e) + e|) ≡ M.
This implies that the sequence {xn ; n ∈ Z+ } is contained in the closed
interval [−M, M ], and hence by the Bolzano-Weierstrass theorem it has
a subsequence {xh(i) ; i ∈ Z+ } converging to some x0 ∈ [−M, M ]. Note
that the increasing function h : Z+ → Z+ is such that limi↑+∞ h(i) =
+∞, and hence for each N ∈ Z+ there exists M (N ) ∈ Z+ such that
i > M (N ) ⇒ h(i) > N.
Now given any e > 0, there exists N (e) ∈ Z+ such that
e
h(i) > N (e) ⇒ |xh(i) − x0 | < ,
2
and given the same e > 0, there exists L(e) ∈ Z+ such that
e
l, m > L(e) ⇒ |xm − xl | < .
2
Let
K(e) ≡ max(L(e), M (N (e)), N (e)),
and we have, given e > 0,
i > K(e) ⇒ |xi − x0 | < |xi − xh(i) | + |xh(i) − x0 | < e,
which, by definition, shows that
lim xi = x0 ,
i→∞

and hence {xn ; n ∈ Z+ } is a convergent sequence.

13
16. (Heine-Borel Theorem) If G is a family of open intervals covering
the closed interval [a, b], then a finite sub-family of G also covers [a, b].
Equivalently, [a, b] is compact.
Proof Suppose G is a collection of open intervals covering the closed
interval [a, b]. Suppose that it takes an infinite number of elements of G
to cover [a, b] ≡ I1 and we shall demonstrate a contradiction. Note that
it takes an infinite number of elements of G to cover either [a, a+b 2
] or
a+b a+b a+b
[ 2 , b]. Let I2 be the subinterval between [a, 2 ] and [ 2 , b] that takes
an infinite number of elements in G to cover. Now turn the spotlight
from I1 to I2 and and chop I2 into two subintervals with equal length.
Now it takes an infinite number of elements of G to cover one of the two
subintervals (call it I3 !). Repeating this argument, we obtain a sequence
of nested closed intervals {In ; n ∈ Z+ } with limn→+∞ (bn −an ) = 0, such
that it takes an infinite number of elements of G to cover each In . There
is, however, one unique point x ∈ n In , and there must be some G ∈ G
T

with x ∈ G. But with limn→+∞ (bn − an ) = 0, there must exist m such


that In ⊂ G for all n ≥ m, so that the single member G of G suffices
to cover all these In ’s, a contradiction to the assumption that all these
In ’s take an infinite number of elements in G to cover.
17. A square matrix Nn×n is negative semi-definite if for all an×1 , a0 N a ≤ 0.
A square matrix P is positive semi-definite if −P is negative semi-
definite. A square matrix Nn×n is negative definite, if for all an×1 6=
0n×1 , a0 N a < 0. A square matrix P is positive definite if −P is negative.
For example, the indentity matrix is positive definite. If P is positive
semi-definite (definite), then for all i = 1, 2, · · · , n, its (i, i) element is
positive (strictly positive).
18. Given Nn×n with its (i, j) element being xij , define for k = 1, 2, · · · , n,

· · · x1k
 
x11 x12

 x21 x22 · · · x2k 

Nk =  .. .. . .
. . · · · ..
 
 
xk1 xk2 · · · xkk
A matrix N is negative definite if and only if for all k = 1, 2, · · · , n,
(−1)k |Nk | > 0, where |Nk | stands for the determinant of Nk . If P
is positive definite, then −P is negative definite, and since | − Pk | =

14
(−1)k |Pk |, an equivalent condition for P to be positive definite is that
for all k = 1, 2, · · · , n, |Pk | > 0. The equivalent conditons for semi-
definite matrices are more complicated; see Lecture 1, sections 30-31.

19. Consider a twice differentiable function f : Rn → R. Let the Df :


Rn → Rn be the vector function
∂f
 
∂x1

 ∂f 

∂x2
Df = .. ,
 


 . 

∂f
∂xn

2
which will be referred to as the gradient of f . Let D 2 f : Rn → Rn be
the matrix function
 ∂2f ∂2f ∂2f 
∂x1 ∂x1 ∂x1 ∂x2
··· ∂x1 ∂xn
∂2f ∂2f ∂2f
···
 
 
∂x2 ∂x1 ∂x2 ∂x2 ∂x2 ∂xn
D2 f = ,
 
 .. .. .. ..

 . . . . 

∂2f ∂2f ∂2f
∂xn ∂x1 ∂xn ∂x2
··· ∂xn ∂xn

which will be referred to as the Hessian of f .

20. A function f : Rn → R is (strictly) concave, if for all x, y ∈ Rn and


all λ ∈ [0, 1], f (λx + (1 − λ)y) ≥ (>)λf (x) + (1 − λ)f (y). A function
is convex if its negative is concave. A function is affine if it is both
concave and convex. An affine function is linear if it passes through
the origin.
Theorem 1 A twice differentiable function f : Rn → R is concave
(strictly concave) if D 2 f is a negative semi-definite (definite) matrix at
each and every (x1 , x2 , · · · , xn ) ∈ Rn .

21. Functions to appear in this and the next sections will be twice differ-
entiable. A necessary condition for x∗ ∈ Rn to solve (P1)

max f (x)
x∈Rn

15
is that Df (x∗ ) = 0. This condition is also sufficient if f is a concave
function. Consider the following maximization program (P2):

max f (x)
x∈Rn

subject to
∀i = 1, 2, · · · , m, gi (x) = 0,
where m < n, f is concave, and each gi : Rn → R is affine. An
equivalence condition (Lagrange Theorem) for x∗ to solve (P2) is the
existence of m constants (called Lagrange multipliers) π1 , π2 , · · · , πm
such that (i) ∀i = 1, 2, · · · , m, gi (x) = 0; and (ii)
m
πi Dgi (x∗ ) = Df (x∗ ).
X

i=1

For example, let m = 1, n = 2. In this case, (P2) is rewritten as

max f (x, y) s.t. g(x, y) = 0.


(x,y)∈R2

∂g ∂g
Suppose that given a solution (x∗ , y ∗ ), either ∂x or ∂y is non-zero.
Suppose the latter is true. Then by the implicit function theorem, we
can represent y = h(x) locally, with
∂g
h0 (x) = − ∂x
∂g .
∂y

That is, in a neighborhood of (x∗ , y ∗ ), we can reproduce the solution


by
max f (x, h(x)),
x

which is an unconstrained problem with a single control variable. It is


necessary that
df (x, h(x))
|[x=x∗ ] = 0,
dx
or, at (x∗ , h(x∗ )),
∂f ∂f 0
+ h = 0,
∂x ∂y

16
which means
∂f ∂f
∂x ∂y
∂g = ∂g ≡ π.
∂x ∂y

One way to reproduce this necessary condition is to form the La-


grangian L(x, y, π) = f (x, y) − πg(x, y), and require that conditions
(i) and (ii) stated above both hold. These are: (i) g(x, y) = 0; and (ii)

Df = πDg.

Here is a familiar application: f (x, y) = xa y 1−a , 0 < a < 1, and


g(x, y) = px + qy − I = 0.

22. Consider the following maximization program (P3):

max f (x)
x∈Rn

subject to
∀i = 1, 2, · · · , m, gi (x) ≥ 0,
where f and each gi : Rn → R are concave. An equivalence condition
(Kuhn-Tucker Theorem) for x∗ to solve (P2), in case the set of vectors
{Dgi (x∗ ) : gi (x∗ ) = 0} is linearly independent, is the existence of m
positive constants (called Lagrange multipliers) π1 , π2 , · · · , πm such that
(i)
m
πi Dgi (x∗ ) = Df (x∗ );
X

i=1

and (ii) (complementary slackness) for all i = 1, 2, · · · , m, πi gi (x∗ ) = 0.

23. The linear space RN can be made into a normed linear space. The
N N
qPR is defined as follows: If x = (x1 , x2 , · · · , xN ) ∈ R ,
usual norm for
N 2 9
then kxk ≡ i=1 xi ; cf. section 4.

9
Consider the set M of all m × n matrices. This is a linear space. Define for all A ∈ M
q
kAk = tr(AA0 ).

Determine if k · k is a well-defined norm on M .

17
24. For all x = (x1 , x2 , · · · , xN ) and y = (y1 , y2 , · · · , yN ) belonging to RN ,
define qthe dot product x · y ≡ N i=1 xi yi . Moreover, let d(x, y) ≡ kx −
P
PN 2
yk ≡ i=1 (xi − yi ) represent the distance between x and y. This
function d : R ×R → R+ is a metric (verify it!). The pair (RN , d) is
N N

a metric space. Given d(·, ·), the set B(x, e) = {y ∈ RN : d(x, y) < e}
is called an open ball for all x ∈ RN and all e > 0. An open set in
RN is a union of open balls. The collection of all open sets in RN is
called the standard topology of RN . A subset F of RN is closed if its
complement, denoted F c , is open.

25. Given A ⊂ RN , the interior of A, denoted Int(A), is the largest open


set contained in A; and the closure of A, denoted A, is the smallest
closed set containing A. A set A is open if Int(A) = A, and it is closed
if A = A. These just repeat our earlier general definitions for metric
spaces.

26. Let p ∈ RN with 0 < kpk and α ∈ R. The following set is called a
hyperplane in RN :

H(p, α) ≡ {x : p · x = α, x ∈ RN }.

27. A hyperplane H(p, α) ⊂ RN is closed and convex. A hyperplane divides


RN into two half spaces H1 = {x : p · x ≥ α, x ∈ RN } and H2 = {x :
p · x ≤ α, x ∈ RN }, and the latter are both closed and convex.

28. Theorem 2 (von Neumann and Morgenstern, 1947) Let X ⊂ RN


be nonempty, closed, and convex. Let x ∈ X c . Then, (i) there exists
a ∈ X such that d(x, a) > 0 and for all y ∈ X, d(y, x) ≥ d(a, x); and
(ii) there exists hyperplane H(p, α) such that X ⊂ H1 and x ∈ H1c . If
X is not closed and x ∈ X c , then there exists H(p, p · x) with such that
X ⊂ H1 .

29. Theorem 3 (Minkowski) Let X, Y ⊂ RN be disjoint, nonempty, and


convex. Then there exists H(p, α) such that X ⊂ H1 and Y ⊂ H2 .
Moreover, if Int(X) represents the interior of X, then Int(X) ⊂ H2c .
Moreover, if X is closed and Y is compact (or the other way around),
then either α > p · y for all y ∈ Y or α < p · x for all x ∈ X.

18
30. Given m points a1 , a2 , · · · , am ∈ RN , the polyhedral cone generated by
these points is the set K ≡ { m i=1 ti ai : ti ∈ R+ }.
P

31. (Farkas’ Lemma) Suppose a1 , a2 , · · · , am , b ∈ RN and b 6= 0. More-


over, suppose that b · x ≥ 0 whenever x ∈ RN and ai · x ≥ 0 for
all i. Then b is contained in the polyhedral cone K generated by
a1 , a2 , · · · , am .
32. A real function f : R → R is a step function if there exists a finite
collection of intervals which constitutes one partition of R, such that
the functional value of f does not vary on the interior of any given
partition set. A related concept is simple functions, which cannot be
defined without introducing measure theory first.
33. A function f : I → R is called an extended real-valued function on I,
where I ⊂ R is an interval and R = R {+∞, −∞} is the extended
S

real line. Define for y ∈ Int(I)

limx→y f (x) = inf sup f (x), limx→y f (x) = sup inf f (x). (4)
δ>0 0<|x−y|<δ δ>0 0<|x−y|<δ

An extended real-valued function f is lower (respectively, upper) semi-


continuous at y if f (y) 6= −∞ and f (y) ≤ limx→y f (x) (respectively,
f (y) 6= +∞ and f (y) ≥ limx→y f (x)). Such a function is continuous if
and only if it is both upper and lower semi-continuous at each point in
domain. A step function is lower (respectively, upper) semi-continuous
if at most it jumps down (respectively, up) at the boundary of two con-
secutive partition sets in the domain (that defines this step function).
34. Consider the function g : I → <, where I ⊂ <. At each x ∈ Int(I), we
can always define the 4 Dini’s derivates in the extended real line:
g(x + h) − g(x) g(x) − g(x − h)
D + g(x) = limh↓0 , D − g(x) = limh↓0 ,
h h
(5)
g(x + h) − g(x) g(x) − g(x − h)
D+ g(x) = limh↓0 , D− g(x) = limh↓0 .
h h
It can be verified that g 0 (x) exists if and only if the above 4 derivates
are finite and equal.

19

You might also like