Analysis 2 Notes
Analysis 2 Notes
December 1, 2023
Chapter 0. Introduction to the module Analysis II, Term I, Page i
Contents
Chapter 1
Differentiation in higher
dimensions
Q = {p/q | p ∈ Z, q ∈ Z \ {0}},
and the set of real numbers R. The set of real numbers is obtained as the completion
of Q. We may add, multiply and subtract elements of R, and we can divide by
elements of R \ {0}. Note that some authors use the notation N to denote the set
{0, 1, 2, . . . }, but we will omit 0 from this set.
On R we have a notion of ordering ≤, so that we may say whether a real number is
greater than, less than or equal to another. Moreover, R satisfies the completeness
axiom, that is, if A ⊂ R is non-empty and bounded above, then A has a least upper
bound. The standard notation for the least upper bound of A is sup(A).
The third property in the above list is called the triangle inequality for the mod-
ulus function.
x = x1 , . . . , xn , y = y1, . . . , yn ,
x + y = x1 + y 1 , . . . , xn + y n .
λx = λx1 , . . . , λxn .
is defined as
n
X
1 n 1 n
(x , . . . , x ), (y , . . . , y ) = xi y i .
i=1
Using the inner product, we may define the length, or norm, function
k·k : Rn → [0, ∞)
as
hx, xi = hx, xi1/2 .
p
kxk =
Note that the inner product of two vectors is a real number, not a vector.
The norm function on Rn has the following properties:
The third property in the above list is called the triangle inequality for the norm
on Rn .
Remark 1.1. As we shall see later, these properties can be used in an abstract
fashion to define more general “normed vector spaces”. The norm gives us a use-
ful notion of “distance” between two points, that is, the distance from x to y is
given by kx − yk. Notice that if n = 1 we have |·| = k·k, and we will use either
interchangeably in this case.
Exercise 1.1. (a) Show that the inner product satisfies the following properties:
for all x, y, and z in Rn and all a ∈ R,
kxk − kyk ≤ kx − yk
x0 , x1 , x2 , . . . ,
if the following holds: For every ǫ > 0, there exists N ∈ N such that for all i ≥ N
we have
kxi − xk < ǫ.
We then write:
xi → x, as i → ∞,
or
lim xi = x.
i→∞
One may compare the above definition to the one for convergence of a sequence
of real numbers. Indeed, this notion is intimately related to convergence of real
numbers, as stated in the next lemma.
xki → xk , as i → ∞.
so we deduce
√
kxi − xk ≤ n max xki − xk < ǫ.
k=1,...,n
kxi − xk < ǫ.
max y k ≤ kyk .
k=1,...,n
lim xi = x, lim yi = y.
i→∞ i→∞
deduce that
kxi k → kxk as i → ∞.
ai xi → ax, as i → ∞.
or
But this is very restrictive and does not capture the same level of generality of
intervals in dimension one. The domains of maps in higher dimensions may appear
in many forms. Due to this, we present a class of subsets of Rn , called open sets.
For x ∈ Rn and the real number r > 0, the open ball of radius r about x is
defined as the set
Br (x) = {y ∈ Rn : kx − yk < r} .
That is, Br (x) consists of all points in Rn which are at distance less than r from
x. We sometimes denote the open ball Br (x) by B(x, r). Both notations are widely
used in mathematics.
Definition 1.2. A set U ⊆ Rn is called open in Rn , if for every x ∈ U there exists
r > 0 such that Br (x) ⊆ U .
In other words, about any point in an open set we can find a small ball which is
entirely contained in the set. Note that in this definition, the radius of the ball is
allowed to depend on x. See Figure 1.1.4.
We may compare the above definition with the definition of open sets in R you
saw in Analysis I. Recall that a set I ⊆ R is open in R, if for every x ∈ I, there is
δ > 0 such that (x − δ, x + δ) ⊆ I. This definition is consistent with the one we have
given in Rn , since in R1 , Bδ (x) = (x − δ, x + δ).
Example 1.1. The ball B1 (0) is open in Rn . To see this, suppose x ∈ B1 (0), so
that kxk < 1. Let r = (1 − kxk)/2. We need to show that Br (x) ⊆ B1 (0). To that
end, let y ∈ Br (x) be an arbitrary point. Using the triangle inequality for the norm
in Rn , we have
1 − kxk 1 + kxk
kyk = ky − x + xk ≤ ky − xk + kxk < r + kxk = + kxk < < 1.
2 2
This means that y ∈ B1 (0).
b b
Figure 1.1: An open set in R2 in cyan, and some balls inside it. The radius of the
ball depends on the location of the point.
Observe that in the above example, one can replace 1 with any other positive
real number, and the result is still valid. That is, for every δ > 0, the set Bδ (0) is
open in Rn . Similarly, one can also replace 0 with any y ∈ Rn . Thus, in general, for
any y ∈ Rn and any δ > 0, Bδ (y) is open in Rn .
(a) Rn ?
(b) ∅?
kxk ≤ r.
Exercise 1.6. (a) Show that if U1 and U2 are open sets in Rn , then U1 ∪ U2 and
U1 ∩ U2 are open in Rn .
Uα is open in Rn .
S
(i) Show that the set α∈I
T
(ii) Give an example showing that α∈I Uα need not be open.
Remark 1.2. It is worth noting that the notion of open sets in Rn relies on the
length function k·k we have on Rn . As we shall see in the next chapter, one can
consider functions (called metric) with similar properties on a wide range of other
sets (such as the set of all continuous functions from [0, 1] to R or the set of all
sequences in [0, 1], etc). These lead to notions of open sets on such sets. We will
look into this in the next chapter.
1.2 Continuity
Last year, you learned about the notion of continuity for functions from R (or subsets
of R) to R. In this section we revisit those definitions and upgrade them to higher
dimensions. In fact, the definitions we shall give are almost identical: the only thing
that changes is that we use the appropriate “norm” for the domain and range.
Remark 1.3. The words “function” and “map” are not identical. For f : X → Y ,
we use the word “function” when the target space Y is the real numbers or the
complex numbers (or in general a field). Otherwise, we use the word “map”. Of
course it is correct to refer to f : X → R as a map, but it is uncommon to refer to
f : X → Y as a function, when Y is not a set of numbers where one can not add
and multiply elements. On the other hand, it is common in analysis and geometry
to see expressions like, “let f be a function on X”, which means that f : X → R or
f : X → C. In those cases, the target space is understood from the context.
Thus we can take δ = ǫ and we have satisfied the criteria for continuity of f at p.
M = max kΛ(ej )k .
j=1,...,n
We note that,
n
X
kΛ(x) − Λ(p)k = kΛ(x − p)k = Λ ej (x − p)j
j=1
n
X
= (x − p)j Λ(ej )
j=1
n
X
≤ (x − p)j Λ(ej )
j=1
Xn
≤ (x − p)j kΛ(ej )k
j=1
n
X
≤M (x − p)j
j=1
Thus, if we take δ = ǫ/(2M n), then for any x with 0 < kx − pk < δ, we have
ǫ
kΛ(x) − Λ(p)k < M n < ǫ,
2M n
so Λ is continuous.
so we may take δ = ǫ and we have satisfied the condition for continuity. Obviously
the same argument shows that all of the coordinate maps (i.e. the map taking x to
xk ) are continuous.
Note that in the above definition, we do not allow x = p. With this notion of a
limit in hand, we can give the definition of continuity more compactly as:
Then
f (x) F
lim = .
x→p g(x) G
Proof. (i) Fix an arbitrary ǫ > 0. Since limx→p f (x) = F , we know that there
exists δ1 > 0 such that for every x ∈ A with 0 < kx − pk < δ1 ,
ǫ
|f (x) − F | < .
2
Similarly, there exists δ2 > 0 such that for every x ∈ A with 0 < kx − pk < δ2 ,
ǫ
|g(x) − G| < .
2
Define δ = min{δ1 , δ2 }. Evidently δ > 0. For every x ∈ A with 0 < kx − pk <
δ, by the triangle inequality, we have
(ii) Fix an arbitrary ǫ > 0, and assume without loss of generality that ǫ < 3 (Why
can see assume this?). Since limx→p f (x) = F , we know that there exists
δ1 > 0 such that for every x ∈ A with 0 < kx − pk < δ1 ,
ǫ
|f (x) − F | < .
3(1 + |G|)
Similarly, there exists δ2 > 0 such that for every x ∈ A with 0 < kx − pk < δ2 ,
ǫ
|g(x) − G| < .
3(1 + |F |)
(iii) Given the previous part, it suffices to show that if limx→p g(x) = G with G 6= 0,
then
1 1
lim = .
x→p g(x) G
Fix an arbitrary ǫ > 0. Since limx→p g(x) = G, we know that there exist
δ1 > 0 such that for every x ∈ A with 0 < kx − pk < δ1 ,
ǫ |G|2
|g(x) − G| < .
2
Also, since G 6= 0, G/2 > 0, and hence, there is δ2 > 0 such that for every
x ∈ A with 0 < kx − pk < δ2 ,
|G|
|g(x) − G| < .
2
By the triangle inequality, this implies that
|G| |G|
|g(x)| = |g(x) − G + G| ≥ |G| − |g(x) − G| > |G| − = .
2 2
1 1 1 1 ǫ |G|2 1 2
− = |G − g(x)| · · < · · = ǫ.
g(x) G |G| |g(x)| 2 |G| |G|
This completes the proof.
(i) f + g is continuous at p.
(ii) f g is continuous at p.
f
(iii) If, furthermore g(p) 6= 0, then is continuous at p.
g
Exercise 1.7. Assume that A is an open set in Rn and f : A → Rm . Show
that limx→p f (x) = F , if and only if, for any sequence (xi )∞
i=0 in A \ {p} with
limi→∞ xi = p,
lim f (xi ) = F.
i→∞
Exercise 1.8. (a) Show that the map f : R → Rn defined as f (x) = (x, 0, . . . , 0)
is continuous on R.
With the above results, one can build many continuous maps from Rn to Rm .
For example,
(x1 , x2 ) 7→ sin(x1 x2 ), cos(x2 ) ,
1
x − x2 x 3
1 2 3
(x , x , x ) 7→ ,e .
1 + (x2 )2
Exercise 1.9 (*). (a) Suppose f : Rn → Rm is continuous on Rn , and suppose
U ⊂ Rm is open. Show that:
f −1 (U ) := {x ∈ Rn : f (x) ∈ U }
is open.
f (x) − f (p)
lim
x→p x−p
f (x)
x
p
about how the tangent is characterised. Any (non-vertical) straight line passing
through (p, f (p)) is the graph of the affine map
Aλ : x 7→ λ(x − p) + f (p)
for some λ ∈ R. Let’s consider the difference between f and such an affine map
|f (x) − Aλ (x)|
lim = 0.
x→p |x − p|
This is a stronger statement than (1.6) because it tells us that f (x) − Aλ (x) is going
to zero faster than |x − p|, as x → p. We make this informal discussion more precise
in the following lemma.
|f (x) − Aλ (x)|
lim = 0.
x→p |x − p|
so that
|f (x) − Aλ (x)| f (x) − f (p)
lim =0 ⇐⇒ lim = λ.
x→p |x − p| x→p x−p
The expression on the right-hand side of the above equation is the definition of
differentiability of f at p.
We may rewrite
Note that some authors refer to the derivative of a map as total derivative, or
differential. We shall refer to that as derivative.
It is often useful to have the following equivalent characterisation of differentia-
bility in higher dimensions: f : Ω → Rm is differentiable at p ∈ Ω if and only if
there exists Λ ∈ L(Rn ; Rm ) such that
kf (p + h) − f (p) − Λ[h]k
lim = 0.
h→0 khk
Proof. Since
kf (p + h) − f (p) − Λ[h]k
lim = 0,
h→0 khk
we must have
lim kf (p + h) − f (p) − Λ[h]k = 0.
h→0
On the other hand, since linear maps are continuous, see Example 1.4, we obtain
Thus,
kf (p + h) − f (p) − B(h)k
lim = lim 0 = 0.
h→0 khk h→0
f (x) = kxk2
Df (p)[h] = 2 hp, hi , ∀h ∈ Rn .
From the properties of the inner product in Exercise 1.1-(a), we can see that the
map h 7→ 2hp, hi is a linear map.
We note that
so that
kf (p + h) − f (p) − 2 hp, hik
lim = lim khk = 0.
h→0 khk h→0
As a matrix, we have that Df (p) = 2p, where p is viewed as a row vector with
n components (this is in line with our convention that a 1 × n matrix maps Rn to
R1 ). So the Jacobian is a row vector for this map.
Since each f j is differentiable at p, the left hand side of the above equation tends to
0, as h → 0. And since the left hand side of the equation is non-negative, it must
tend to 0, as h → 0. Notice here that the expression Df (p) [h] means applying the
linear map Df (p) to the one dimensional vector h, which gives us an element of Rm .
Implicitly in the discussion above, we’ve assumed that Df (p), if it exists, must
be unique. Of course, this is something that we need to prove.
kΛ[e] − Λ′ [e]k
Λ[αj e] Λ′ [αj e]
= −
αj αj
kΛ[αj e] − Λ′ [αj e]k
= lim
j→∞ kαj ek
k−f (p + αj e) + f (p) + Λ[αj e] + f (p + αj e) − f (p) − Λ′ [αj e]k
= lim
j→∞ kαj ek
kf (p + αj e) − f (p) − Λ[αj e]k kf (p + αj e) − f (p) − Λ′ [αj e]k
≤ lim + lim
j→∞ kαj ek j→∞ kαj ek
= 0.
For the last equality in the above equation we have used that αj e → 0 as j → ∞.
By the above equation, for any unit vector e we have Λ[e] = Λ′ [e], which implies
that (as linear maps) Λ = Λ′ .
Df (p) = id,
f : (x, y) 7→ x2 + y 2 ,
Exercise 1.12. One might hope that the derivative can be calculated by finding
f (x) − f (p)
lim .
x→p kx − pk
By considering the example of Exercise 1.10 or otherwise, show that this limit may
not always exist, even if f is differentiable at p.
f ◦g
g f
Ω Ω′ Rl
Dg Df
Df (g(p)) ◦ Dg(p)
ψ(y)
0 = lim , (1.8)
y→q ky − qk
On the other hand, we recall from Example 1.4 that there is a real number M
such that
kA(x)k ≤ M kxk , ∀x ∈ Rn .
Since B is linear, and hence continuous by Example 1.4, we have that
B(φ(x)) φ(x) φ(x)
lim = lim B = B lim = 0.
x→p kx − pk x→p kx − pk x→p kx − pk
Fix an arbitrary ǫ > 0. It follows from (1.8) that there exists δ > 0 such that
for y ∈ Ω′ with ky − qk < δ we have
kψ(y)k
<ǫ
ky − qk
which implies
kψ(y)k < ǫ ky − qk .
On the other hand, since g is continuous, there exists δ1 such that if x ∈ Ω with
kx − pk < δ1 then
kg(x) − g(p)k = kg(x) − qk < δ.
Thus, for every x ∈ Ω with kx − pk < δ1 , we have
On the other hand, in Example 1.8, we saw that the map f (x) = kxk2 is differen-
tiable at every point in Rm with derivative Df (q)[h] = 2 hq, hi. We have k = f ◦ g
on (a, b). Thus, by the chain rule, the map k is differentiable at p, with derivative
Thus, the Jacobian of k at p is the one by one matrix with real entry
2 hg(p), Dg(p)i = 2g1 (p)(g 1 )′ (p) + 2g2 (p)(g2 )′ (p) · · · + 2g m (p)(g m )′ (p).
f ◦ g(x) = x, ∀ x ∈ Ω.
g ◦ f (x) = x, ∀ x ∈ Ω′ .
Show that
Df (g(p)) = (Dg(p))−1 .
P (x, y) = xy
DP (p) = (η ξ) .
(c) Show that the map F : Rn → R defined as F (z) = f (z)g(z), for all z ∈ Rn , is
differentiable at q, with derivative
in time. Suppose we start at the origin 0 ∈ R3 and travel along the curve t 7→ vt,
for some fixed v ∈ R3 , that is we move along a straight line with velocity v passing
through the origin at time 0. We can record the temperature of our surroundings as
a function of time, θ(t) and we will find θ(t) = f (vt). Suppose we ask what the rate
of change of temperature is at t = 0. This will of course be θ ′ (0). Now, we notice
that we can write:
θ =f ◦V
where V is the linear map V : R → R3 given by V (t) = vt. Now, we can use the
chain rule to calculate θ ′ (0) = Dθ(0) and we find:
θ ′ (0) = Df (0)[v].
This gives us a nice interpretation of the derivative Df (0). When we apply Df (0) to
a vector v, we find the rate of change of f at 0 as we travel along a line with velocity
v. More generally, we can consider travelling along the line given by V (t) = p + tv
for some p ∈ R3 . Then at t = 0, we are passing through the point p ∈ R3 . Setting
θ(t) = f (p + tv), We call the quantity:
In other words, we can find any directional derivative at p, provided we know the
three numbers:
∂f
Di f (p) = (p), i = 1, 2, 3.
∂ei
called the partial derivatives of f at p. Equivalently, these can be defined as
f (p + tei ) − f (p)
Di f (p) := lim .
t→0 t
v3
so that the Jacobian of f at p is given by
Df (p) = D1 f (p) D2 f (p) D3 f (p) .
D3 f (p)
which is called the gradient of f at p, and with this notation
We can extend all of these notions to more general range and domains, which
leads us to the following definition.
then
Di f 1 (p)
..
Di f (p) =
. .
m
Di f (p)
Proof. Let {ei } be the canonical basis for Rn . For any v ∈ Rn , we write v =
Pn i
i=1 v ei . Then by the linearity of Df (p) we have:
" n # n n
X X X
i
Df (p)[v] = Df (p) v ei = v i Df (p) [ei ] = v i Di f (p).
i=1 i=1 i=1
P
n i D f 1 (p)
i=1 v i
..
=
.
Pn i m
i=1 v Di f (p)
D1 f 1 (p) . . . Dn f 1 (p) v1
.. .. .. .
..
=
. . .
m m
D1 f (p) . . . Dn f (p) v n
This allows us to restate the chain rule in terms of the partial derivatives of the
functions.
In the one dimensional case, we often use the derivative to search for turning
points, i.e. maxima and minima, since a differentiable function will have vanishing
derivative at a local maximum or minimum. A similar result holds in the higher
dimensional case.
Df (p) = 0.
Proof. Pick v ∈ Rn . Since Ω is open, there exists ǫ > 0 such that p + tv ∈ Ω for
t ∈ (−ǫ, ǫ). Consider the function gv : (−ǫ, ǫ) → R defined as
gv (t) = f (p + tv).
Since v was arbitrary, we have that Df (p) = 0. A similar argument deals with the
case where p is a minimum.
g(x, y, z) = x + y + z.
1
x2 + y 2 ,
First note that this function is continuous at the origin. Since |xy| ≤ 2
we have that for p = (x, y) 6= (0, 0):
1p 2
|f (p)| ≤ x + y2 ,
2
so that
lim f (p) = 0.
p→0
f (th) − f (0) t2 /2 1
= 2 = ,
t t 2
which contradicts the differentiability of f at the origin. Thus, even though the
partial derivatives exist for this function, the function is not differentiable.
Away from the origin, the function is a composition of smooth functions so is
differentiable. We can calculate the partial derivatives at a point p = (x, y) 6= (0, 0)
and we find
y x2 y y3
D1 f (p) = p − 3 = 3 ,
x2 + y 2 (x2 + y 2 ) 2 (x2 + y 2 ) 2
and by symmetry:
x3
D2 f (p) = 3 .
(x2 + y 2 ) 2
We claim that the function g : R2 \ {0} → R given by
x3
g(x, y) = 3
(x2 + y 2 ) 2
has no limit as p = (x, y) converges to (0, 0). To see this, let p = (r cos θ, r sin θ) for
some r ∈ (0, ∞), θ ∈ [0, 2π). Then
g(p) = cos3 θ,
As it happens, the fact that the partial derivatives are not continuous in a
neighbourhood of the origin is the only barrier to differentiability there.
x 7→ Di f (x)
(*) Proof. Since Ω is open, there exists r > 0 such that Br (p) ⊂ Ω. Suppose
h ∈ Br (0) has components hi , so that h = ni=1 hi ei . We consider
P
n
!
X
i
f (p + h) − f (p) = f p + h ei − f (p)
i=1
n n−1
! !
X X
=f p+ hi ei −f p+ hi ei
i=1 i=1
n−1 n−2
! !
X X
i i
+f p+ h ei −f p+ h ei
i=1 i=1
+ ...
+ f (p + h1 e1 ) − f (p).
Let’s consider a typical line in the right hand side of the above equation, that is,
k k−1
! !
X X
f p+ hi ei − f p + hi ei = f (q + hk ek ) − f (q),
i=1 i=1
Pk−1 i
where k ∈ {1, . . . , n} and q = p+ i=1 h ei . Now, applying the mean value theorem
to the function g(t) = f (q + tek ), which is differentiable by assumption, there exists
s ∈ − hk , hk such that:
f (q + hk ek ) − f (q) = hk Dk f (q + sek ) = hk Dk f (p + ck ),
Pk−1 i
where ck = i=1 h ei +sek . One has to consider separately the cases hk > 0, hk < 0
and hk = 0. Now, note that since |s| ≤ hk , we have
kck k ≤ khk .
Putting this together, we conclude that there exists c1 , . . . , cn ∈ Rn with kck k ≤ khk
such that
n
X
f (p + h) − f (p) = hk Dk f (p + ck ).
k=1
so that
n
!1
f (p + h) − f (p) − nk=1 hk Dk f (p)
P 2
2
X
≤ |Dk f (p + ck ) − Dk f (p)| .
khk
k=1
n
!1
f (p + h) − f (p) − nk=1 hk Dk f (p) ǫ2
P 2
X
< = ǫ.
khk n
k=1
(c) f (x, y) = x5 y 2 .
The map f is differentiable at 0 with derivative equal to the constant linear map
Λ = 0. To see this, note that
0.5
-0.5
-1
-0.5
0.5
Figure 1.6: The image of the maps f and g in Example 1.13. The differentiability
at (0, 0) depends on “how fast” we pass through the point (0, 0).
The maps k and g have the same image, that is, they map the interval (−1, +1)
to the same curve, which is the graph of the function t 7→ t3 on the interval (−1, +1).
However, k is differentiable at 0, but g is not differentiable at 0, as we show below.
We claim that the derivative of the map k at 0 is equal to the linear map
Λ(h) = (h, 0). To see this, note that
kk(0 + h) − k(0) − Λ[h]k (h, h3 ) − (h, 0)
lim = lim
h→0 khk h→0 khk
(0, h3 )
= lim
h→0 khk
|h|3
= lim = 0.
h→0 |h|
Let Λ(1) = (a, b) ∈ R2 , for some real constants a and b in R. It follows that for
every h ∈ R we have
Therefore,
= lim (h−2/3 − a, 1 − b)
h→0
In the last line of the above equation we have used that k·k is a continuous function,
so we may interchange the limit and the norm. Now recall that kyk = 0, if and only
if y = 0. Thus we must have
lim h−2/3 − a = ∞.
h→0
Df : Ω → L(Rn ; Rm )
p 7→ Df (p).
Thus, DDf (p) takes an n-vector to an (m × n) matrix. The above notation may
appear complicated, but you have already seen some examples of maps in the right
hand of the above equation. For example, the map h 7→ hh, ·i is an element of
L(Rn ; L(Rn ; R1 )), that is, for every h ∈ Rn , the map u 7→ hh, ui is a linear map
from Rn to R1 .
In terms of our definition of derivative, DDf (p) is a linear map L ∈ L(Rn ; L(Rn ; Rm ))
such that the following holds
kDf (x) − Df (p) − L[x − p]k
lim = 0.
x→p kx − pk
Note that in the above equation, the norm on the numerator is the norm k·k on
Rmn and the norm on the denominator is k·k on Rn .
Obviously, we can generalise this to consider a map which is k−times differ-
entiable. In practice, the condition of k−times differentiable at a point can be
difficult to establish. However, if f : Ω → Rm is k times differentiable with all those
derivatives continuous, we say f is k-times continuously differentiable.
Assume that f = (f 1 , f 2 , . . . , f m ). We know from the previous results that if f
is differentiable at p ∈ Ω, the partial derivative maps
Di f j : Ω → R
x 7→ Di f j (x).
f : (x, y) = x3 + y 3 + 5x2 y.
This is differentiable at each point p = (x, y) ∈ R2 , and the partial derivatives are
and
D2 f (x, y) = 3y 2 + 5x2 ,
and differentiate them. The second partial derivatives are thus
D1 D1 f (p) = 6x + 10y
D2 D1 f (p) = 10x
D1 D2 f (p) = 10x
D2 D2 f (p) = 6y
Notice that
D2 D1 f (p) = D1 D2 f (p).
This is a coincidence!
Df (p) = 2pA
(b) Find
Hess f (p).
f : (x, y, z) = xy 2 + x2 + xzey .
(i) Compute the first and second partial derivatives. Observe the properties of
the second partial derivative.
(ii) Write the terms of the Taylor expansion of f at zero up to and including the
second-order terms.
(iii) Without computation, write the same Taylor expansion up to and including
the fourth-order terms.
and
2y (xy 3 −x3 y )
3y2 x−x3 − if (x, y) 6= (0, 0)
x2 +y 2 (x2 +y 2 )2
D2 f (x, y) =
0 if (x, y) = (0, 0).
Show that both of these functions are continuous at (0, 0).
D2 D1 f (0) 6= D1 D2 f (0)
where the sum is taken over all multi-indices α = (α1 , . . . , αn ) with |α| ≤ k − 1 and
the remainder term is given by:
X hα
Rk (p, h) = D α f (x)
α!
|α|=k
g(t) = f (p + th).
By the chain rule, this function is k-times differentiable on the interval (−1−ǫ, 1+ǫ),
and [0, 1] ⊂ (−1 − ǫ, 1 + ǫ), so by one dimensional Taylor’s theorem we have
g ′′ (0) g(k−1) (0)
g(1) = g(0) + g ′ (0) + + ... + + Rk ,
2! (k − 1)!
where
g(k) (ξ)
Rk =,
k!
for some ξ ∈ (0, 1). We will be done if we can show that for j = 0, 1, . . . k we have
X hα
g(j) (t) = j! D α f (p + th). (1.12)
α!
|α|=j
This is certainly true for j = 0. Suppose it’s true for some j ≥ 0. Then we have
n
X X hα
g (j+1) (t) = hl Dl j! D α f (p + th)
α!
l=1 |α|=j
n
X X hα hl
= j! Dl D α f (p + th)
α!
l=1 |α|=j
Clearly, the right-hand side of the above equation is a sum of terms proportional to
hβ D β f (p + th) where |β| = j + 1. Suppose β = (β1 , . . . , βn ), then the coefficient of
the term proportional to hβ D β f (p + th) is
j! j! j!
+ + ... +
(β1 − 1)!β2 ! · · · βn ! β1 !(β2 − 1)! · · · βn ! β1 !β2 ! · · · (βn − 1)!
(j + 1)! (j + 1)!
= = ,
β1 !β2 ! · · · βn ! β!
by a result from combinatorics (you do not need to verify this). Thus we have
n X α l
(j+1)
X h h
g (t) = j! Dl D α f (p + th)
α!
l=1 |α|=j
X hβ
= (j + 1)! D β f (p + th)
β!
|β|=j+1
By induction we conclude that (1.12) holds for all j = 0, . . . , k and the result
follows.
a) Compute the degree 1 and degree 2 Taylor polynomial of f near the point
(x0 , y0 ) = (0, π/2) and use those to approximate the value of f at (x1 , y1 ) =
(0, π/2 + 1/4). Compare your results with the values you obtain from a cal-
culator.
b) How precise is the degree 1 approximation in the closed ball of radius 1/4
around (x0 , y0 ). Find a rigorous upper bound for the approximation error.
(i) f : U → V is a bijection,
Recall that since the Jacobian Df (q) is an n × n matrix, the statement that it
is invertible is equivalent to the statement that det Df (q) 6= 0.
f (x, y) = (x + y + 5xy, y − x2 )
Evidently, both of these maps are continuous from R2 to R2 . Thus, by Theorem 1.12,
f is differentiable at every point in R2 . Moreover, by Theorem 1.9, the Jacobian of
f at (x, y) ∈ R2 is given by the matrix
!
1 + 5y 1 + 5x
Df (x, y) = .
−2x 1
It is worth noting that obtaining an explicit formula for the inverse map is not easy,
and hence the derivative of the inverse map is out of reach using the direct approach.
Determine the set of points in R2 such that f is invertible near those points, and
compute the derivative of the inverse map.
f 1 (x1 , x2 , . . . , xn ) = y 1 ,
f 2 (x1 , x2 , . . . , xn ) = y 2 ,
..
.
f n (x1 , x2 , . . . , xn ) = y n .
is continuously differentiable, and DF at (x10 , x20 , . . . , xn0 ) is invertible, then for all
values of y 1 , y 2 , . . . y n sufficiently close enough to y01 , y02 , . . . y0n the above system of
equations has a unique solution. Indeed the solution is given by the inverse of the
map F .
For example, by the previous example, we conclude that for a and b close to 0,
the equations
x + y + 5xy = 0
y − x2 = 0
f ∗ g = f ◦ g.
(f ∗ g) ∗ h = (f ◦ g) ◦ h = f ◦ (g ◦ h) = f ∗ (g ∗ h).
f ∗ id = f ◦ id = f, id ∗f = id ◦f = f.
exy sin(x2 − y 2 + x) = 0
2 +y
ex cos(x2 + y 2 ) = 1
admits the solution (x, y) = (0, 0). Prove that there exists ε > 0 such that for all
(ξ, η) with ξ 2 + η 2 < ε2 , the perturbed system of equations
exy sin(x2 − y 2 + x) = ξ
2 +y
ex cos(x2 + y 2 ) = 1 + η
has a solution (x(ξ, η), y(ξ, η)) which depends continuously on (ξ, η).
f 1 (x1 , x2 , . . . , xn ) = y 1 ,
f 2 (x1 , x2 , . . . , xn ) = y 2 ,
..
.
f m (x1 , x2 , . . . , xn ) = y m .
x2 + y 2 − 1 = 0.
F (x, y) = x2 + y 2 − 1
F (x, y) = 0.
Suppose (a, b) satisfies F (a, b) = 0, and a 6= 1, −1. Then there is an open interval
A containing a and an open interval B containing b with the property that for each
x ∈ A there is a unique y ∈ B such that F (x, y) = 0. This permits us to define
a map g : A → B by g(x) = y, so that F (x, g(x)) = 0. We can think of this as
√
‘locally solving for y in terms of x’. If b > 0 then g(x) = 1 − x2 . For the problem
at hand, there is in fact another number b1 such that F (a, b1 ) = 0. Associated to
this point there is an open interval B1 containing b1 and a map g1 : A → B1 such
√
that F (x, g1 (x)) = 0. (If b > 0, then b1 < 0 and g1 (x) = − 1 − x2 ). Both g, g1 are
differentiable. See Figure 1.7.
In contrast when a = ±1 we must have b = 0 in order to have a2 + b2 = 1.
Assume that a = +1. There are no open sets A ⊂ R containing a and B ⊂ R
containing b satisfying the following property
This is because, since B is open, there is δ > 0 such that (−δ, δ) ⊂ B. Now, for
√
every x ∈ A close enough to a = 1, there are two points ± 1 − x2 that belong to B.
Of course one might wish to rectify this problem with choosing A as an interval of
√
the (1−c, 1], and B an interval of the from [0, 1 − c2 ) so that for every x ∈ A there
is a unique y ∈ B satisfying x2 + y 2 = 1. But, when we go to higher dimensions, it
is not clear what is the correct analogue of the intervals of the form [z, w) or (z, w].
(a,b)
B
x
A
B1
(a,b1)
(ii) D2 F (x′ , y ′ ) 6= 0.
Then, there are open sets A ⊂ R and B ⊂ R with x′ ∈ A and y ′ ∈ B, and a map
f : A → B such that
Roughly speaking, the above theorem states that for each solution x0 , y0 of the
equation
F (x, y) = 0,
the nearby solutions x, y of the above equation, look like the graph of a map from
x unknown to the y unknown.
Exercise 1.25. For each of the following equations determine at which points one
cannot find a function y = f (x) which describes the graph in this neighbourhood.
Sketch the graphs.
(a)
1 3
y − 2y + x = 1
3
(b)
2x2 + 4xy + y 2 = 3x + 4y
(a) Show that this system of equations (implicitly) defines a function y = f (x)
with f (1) = 1.
(c) Find an explicit formula for f and check your result from b).
(x, y) 7→ D2 F (x, y)
By the property in Step 1 we note that h′ (y) = D2 F (x′ , y) > 0, for all y ∈ (y ′ −
δ, y ′ + δ). This implies that h is strictly increasing on the interval (y ′ − δ, y ′ + δ).
As h(y ′ ) = 0, we must have h(y ′ − δ) < 0 and h(y ′ + δ) > 0.
By the above paragraph, F (x′ , y ′ − δ) < 0 and F (x′ , y ′ + δ) > 0. Since F is
continuous, there is δ′ > 0 such that F is negative on (x′ − δ′ , x′ + δ′ ) × {y ′ − δ},
and is positive on (x′ − δ′ , x′ + δ′ ) × {y ′ + δ}.
Step 3. For every x ∈ (x′ − δ′ , x′ + δ′ ), there is a unique y ∈ (y ′ − δ, y ′ + δ) such
that F (x, y) = 0.
Fix an arbitrary x ∈ (x′ − δ′ , x′ + δ′ ), and consider the map g : [y ′ − δ, y ′ + δ] → R
defined as
g(y) = F (x, y).
The map g is continuous on [y ′ − δ, y ′ + δ], with g(y ′ ) = F (x, y ′ − δ) < 0 and
g(y ′ + δ) = F (x, y ′ + δ) > 0. By the intermediate value theorem, there must be
y ∈ [y ′ − δ, y ′ + δ] such that g(y) = 0. So F (x, y) = 0.
On the other hand, since g′ (y) = D2 F (x, y) > 0 for all y ∈ [y ′ − δ, y ′ + δ], g is
strictly increasing on [y ′ − δ, y ′ + δ]. This implies that there is a unique point in
(y ′ − δ, y ′ + δ) where g becomes 0. This proves the uniqueness.
With the above argument, we can introduce A = (x′ − δ′ , x′ + δ′ ) and B =
(y ′ − δ, y ′ + δ).
Inverse Function Theorem implies the Implicit Function Theorem: Assume that f
satisfies the assumptions in Theorem 1.17. We define a new map
F : Ω × Ω′ → Rn × Rm
as
F (x, y) = (x, f (x, y)).
Here I is the n × n identity matrix, M is the matrix in Theorem 1.17, and N is the
m × n matrix with components:
Dj f i (p) ,
1 ≤ i ≤ m, 1 ≤ j ≤ n.
Since det M 6= 0, we must have det DF (p) 6= 0. Note that F (a, b) = (a, 0). There-
fore, we can apply the Inverse Function Theorem to deduce the existence of open
sets U ⊂ Ω × Ω′ and V ⊂ Rn × Rm with (a, b) ∈ U , (a, 0) ∈ V such that F : U → V
has a continuously differentiable inverse h : V → U . By shrinking U , if necessary,
we can assume that U = A × B for some open sets A ⊂ Ω and B ⊂ Ω′ .
Note that the map h must be of the form h(x, y) = (x, k(x, y)) for some con-
tinuously differentiable map k (since F has this form). Let π : Rn × Rm → Rm
be the projection map π(x, y) = y. Then f = π ◦ F . Now, by the associativity of
composition of maps,
F : Rn × Ω → Rn
defined as
F (y, x) = y − f (x).
F (p) = 0.
Dn+j F i (p), 1 ≤ i, j ≤ n
is −Df (q). So, by the assumption in inverse function theorem, the above matrix
is invertible. Therefore, by the Implicit Function Theorem, there is an open set
U ⊂ Rn and B ⊂ Ω with f (q) ∈ A and q ∈ B, and a map g : A → B such that
Chapter 2
• how much time does it take to walk from my apartment to the maths depart-
ment,
• how long does it take to travel from South Kensington tube station to Cam-
bridge by public transport,
• how much does the cheapest public transport from South Kensington tube
station to Heathrow airport cost,
What should be the correct way of defining “distance” in more general settings.
From the above examples we can see that the notion of distance should be a function
of two variables, that is, we give it two elements. There has been a long historical
development on this question, with various properties proposed and refined. Here
we present the outcome of those developments, and define what is now standard.
d: X ×X →R
(M1) for all x and y in X we have d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;
Remark 2.1. The triangle inequality in Euclidean spaces has a rather simple in-
terpretation. That is, in any triangle, the length of each side is bounded from above
by the sum of the lengths of the other two sides. In an arbitrary set, triangles may
not make sense. But the interpretation still makes sense, and is the reason behind
requiring condition M3. We think of d(x, y) as “the length of the shortest way from
x to y”. So the length of the shortest way from x to y should be bounded from above
by the length of the shortest way from x to y passing through z. See Figure 2.1.
On the other hand, property M1 tells us that the metric “separates” points. That
is, the distance between distinct points is strictly positive.
b y
x b
z
Definition 2.2. By a metric space we mean a pair of a set and a metric on that
set. That is often denoted as M = (X, d), where X is a set, and d : X × X → R is
a metric. We refer to M as the metric space. The elements of X are called points.
Given two points x and y in X, the real number d(x, y) is called the distance
between x and y with respect to the metric d.
In the above definition, when it is clear what metric is involved, we simply refer
to d(x, y) as the distance between x and y.
It is customary to use the same notation for M and X, that is, the metric space
X = (X, d).
Remark 2.2. The reason that we refer to the elements of X as points, is because
we would like a unified approach to all metric spaces. That is, to present statements
and proofs so that it applies to a variety of settings. We understand that when
X = R, then elements of X are numbers, when X = Rn , the elements of X are
vectors, and when X is the set of all 5 × 5 matrices, then each element of X is a
matrix. We refer to all those elements as points in X.
d1 (x, y) = |x − y|.
From the properties of the modulus function, see Section 1.1.1, we immediately see
that d1 satisfies the properties M1, M2, and M3. For example, for M2, we see that
By the properties of the norm function on Rn , see Section 1.1.2, d2 satisfies the
properties M1, M2, and M3 in Definition 2.1. For example, to see property M3,
we note that for every x, y, and z in Rn , by the triangle inequality for the norm
function, we have
b y
|x2 − y 2 |
x b b
|x1 − y 1 |
We need to verify that the properties M1, M2 and M3 in Definition 2.1 hold.
M1: Fix arbitrary x = (x1 , x2 , . . . , xn ) and y = (y 1 , y 2 , . . . , y n ) in Rn . Since the
modulus function only produces non-negative values, for every j = 1, 2, . . . , n, we
have |xj − y j | ≥ 0. Thus,
n
X
d1 (x, y) = |xj − y j | ≥ 0.
j=1
then, for all j = 1, 2, . . . , n, we must have |xj − y j | = 0 (because each of the numbers
in the above sum is non-negative). By the first property of the modulus function,
this implies that for all j = 1, 2, . . . , n, we have xj = y j . Hence, x = y.
M2: For every x = (x1 , x2 , . . . , xn ) and y = (y 1 , y 2 , . . . , y n ) in Rn , we have
n
X n
X
j j
d1 (x, y) = |x − y | = |y j − xj | = d1 (y, x).
j=1 j=1
In the first line of the above equation we have used the triangle inequality for the
modulus function n times (i.e. |xj − y j | ≤ |xj − z j | + |z j − y j |, for j = 1, 2, . . . , n).
Intuitively, the metric d1 on R2 means that we are only allowed to travel along
horizontal and vertical directions to go from x ∈ R2 to y ∈ R2 . See Figure 2.2.
Exercise 2.1. Let X = Rn and define the function d∞ : Rn × Rn → R as
We note that f ≥ g on [a, b]. Also, since g is only discontinuous at two points (finite
number of points is ok), it is integrable on [a, b]. Moreover,
Z b Z b
f (t) dt ≥ g(t) dt ≥ δ · h/2 > 0.
a a
Note that since c ∈ [a, b], the length of the interval [c − δ, c + δ] ∩ [a, b] is at least δ,
with the minimum length happening when c = a or c = b.
Exercise 2.3. Assume that a < b are real numbers, and h : (a, b) → (0, ∞) be a
continuous function. For x and y in (a, b), we define
Z max{x,y}
dh (x, y) = h(t) dt.
min{x,y}
Intuitively, in the above exercise, the function h determines the cost of travelling
from x to y.
g(x, y) = |x − y|2 .
Any pair of points a and b in S 1 divides the circle S 1 into two arcs. We assume
the convention that the end points of the arcs are included in the arcs (this does
not make any difference when calculating the arc length). We define d(a, b) as the
length of the shortest arc between a and b. When the points a and b are antipodal
(diametrically opposite of one another), the shortest arc is not unique, but those
arcs have the same length. Thus, the function d : S 1 × s1 → R is well-defined.
M1: The length of any arc is non-negative, and when the end points are distinct,
the length is strictly positive.
M2: Since the shortest arc between two points does not depend on the order
at which we choose the end points, M2 holds as well. When the end points lie on
opposite sides, the shortest arc is not unique, but the length is unique. So in that
case we have symmetry as well.
M3: Let θ1 , θ2 and θ3 be arbitrary points on S 1 . If the points θ1 , θ2 and θ3 are
not pairwise disjoint, then we obviously have
That is because, if θ1 = θ3 , the left hand side of the above inequality is 0, and the
right hand side is non-negative by the definition of metric. Also, if θ2 ∈ {θ1 , θ3 }, the
value on the left hand side also appears on the right hand side of the inequality, with
the other term on the right hand side non-negative. So we may assume that the
points θ1 , θ2 , and θ3 are pairwise disjoint. Let ℓi,j denote the shortest arc between
θi and θj , for i and j in {1, 2, 3}. We consider few cases:
(iv) neither of the cases (i)-(iii) holds. Then, ℓ1,2 ∪ ℓ1,3 ∪ ℓ2,3 = S 1 , and hence
b
θ1
θ2 b
ℓ2,3 b
O
θ3
b
Figure 2.3: The circle of radius 1 about 0 in R2 , and the distance of arc length.
All the examples of metrics we have seen so far are on the of real numbers and
Euclidean spaces. But the purpose of giving an axiomatic definition of metric is
to generalises analysis. Here are few examples of metric spaces which shows the
generality of this notion.
Example 2.5. Let E be a finite set, and let P(E) denote the set of all subsets of
E. Given A ∈ P(E), we define Card(A) as the number of elements in A. Also, for
A and B in P(E), we define the symmetric difference of A and B as
A∆B = (A \ B) ∪ (B \ A).
is a metric on P(E).
This metric is called the Hamming metric, and plays important role in informa-
tion theory and cryptography.
You can see that this is a metric on X. In this metric all distinct points lie at
distance 1 from each other (you may wish to imagine this for some sets). The
metric ddisc is called the discrete metric.
x = (x1 , x2 , . . . , ), y = (y 1 , y 2 , . . . , ), z = (z 1 , z 2 , . . . ).
The right hand side of the above equation is a constant independent of j. Thus,
for all j ≥ 1, |xj − y j | is bounded from above by that constant. Therefore, their
supremum must be bounded by that constant. That is,
Assume that a and b are real numbers with a < b. Define the set
C([a, b]) = f : [a, b] → R | f : [a, b] → R is continuous.
Since f and g are continuous on [a, b], they are bounded so there exists k1 and k2 in R
such that for all t ∈ [a, b], |f (t)| ≤ k1 and |g(t)| ≤ k2 . Therefore, d∞ (f, g) ≤ k1 + k2 ,
so d∞ is well defined on C([a, b]).
As in the previous example, one can see that d∞ is a metric on C([a, b]). This
is called the supremum metric, or the uniform metric.
The function d1 is a metric on C([a, b]). To see this, first note that since the modulus
of a continuous function is a continuous function, the integral in the above definition
is defined.
M1: For every f and g in X, and every t ∈ [a, b], |f (t) − g(t)| ≥ 0. Thus,
Z b
d1 (f, g) = |f (t) − g(t)| dt ≥ 0.
a
M3: Let f , g and h be continuous functions on [a, b]. For all t ∈ [a, b], by the
triangle inequality for the modulus function, we have
|f (t) − g(t)| = (f (t) − h(t)) + (h(t) − g(t)) ≤ |f (t) − h(t)| + |h(t) − g(t)|.
which gives us
d1 (f, g) ≤ d1 (f, h) + d1 (h, g).
We have already seen many examples of metric spaces. There are some ways
to define new metric spaces using other metric spaces. We present two approaches
below.
Example 2.10. Consider the Euclidean metric space (R, d1 ). We may restrict this
metric to the set of rational numbers Q ⊂ R. Also, d1 induces a metric on the set
of integers Z ⊂ R.
Similarly, since Zn ⊂ Rn and Qn ⊂ Rn , we may restrict any of the metrics d1 ,
d2 , and d∞ onto those sets.
That is, the set of all ordered pairs (x1 , x2 ) such that x1 ∈ X1 and x2 ∈ X2 .
Definition 2.4. Let (X1 , d1 ) and (X2 , d2 ) be two metric spaces. We may use the
metrics d1 and d2 to define a metric on X1 × X2 . For example,
Each of the above functions from (X1 × X2 ) × (X1 × X2 ) to R is a metric. For each
of the above metrics d, the metric space (X1 × X2 , d) is called a product metric
spaces.
On any vector space (V, k·k) we have a natural notion of metric coming from the
norm function. We present this in the next lemma.
dkk (u, v) = ku − vk
is a metric on V .
Proof. Property M1 comes from the property N1 of the norm function, that is,
dkk (v, w) = kv − wk ≥ 0.
Also,
dkk (v, w) = 0 ⇐⇒ kv − wk = 0 ⇐⇒ v − w = 0 ⇐⇒ v = w.
Property M3 comes from the property N3 for the norm. That is because
Some of the examples we already seen are normed vector spaces. For example,
the distance d2 on Rn comes from the norm k·k on Rn .
(x1 , x2 , . . . , xn ) 1
= |x1 | + |x2 | + · · · + |xn |,
(x1 , x2 , . . . , xn ) ∞
= max{|x1 |, |x2 |, . . . , |xn |}.
One can easily see that these functions satisfy the three properties for the norm
function. These norms induce the metrics d1 and d∞ on Rn , respectively.
Assume that a < b are real numbers, and let C([a, b]) denote the set of all
continuous functions f : [a, b] → R. For f and g in C([a, b]), we define f + g as
the function (f + g)(x) = f (x) + g(x), for all x ∈ [a, b]. Also, for λ ∈ R, and
f ∈ C([a, b]), we define (λf )(x) = λf (x). These operations make C([a, b]) a vector
space on R. This vector space has infinite dimensions, since the functions x 7→ x,
x 7→ x2 , x 7→ x3 , . . . , are linearly independent.
Exercise 2.6. Assume that a < b are real numbers. Show that each of the following
functions is a norm on C([a, b]):
(i)
Z b
kf k1 = |f (t)| dt
a
(ii)
kf k∞ = max |f (t)|
t∈[a,b]
(iii)
Z b 1/2
2
kf k2 = |f (t)| dt
a
Remark 2.3. The norm k·k1 on C([a, b]) is called the l1 -norm, k·k2 on C([a, b])
is called the l2 -norm, and the norm k·k∞ on C([a, b]) is called the l∞ -norm, or
supremum norm. The metric induced from k·k1 on C([a, b]) is the d1 metric we
presented in Example 2.9 and the metric induced from k·k∞ on C([a, b]) is the d∞
metric we presented in Example 2.8.
You can learn more about these spaces in the modules Lebesgue Measure and
Integration, and Functional Analysis.
It is not true that every metric on a vector space comes from a norm. You can
show this by the following exercise.
Exercise 2.7. Show that if V is a vector space, and k·k : V → R is a norm function,
then for any v ∈ V , we must have dkk (0, 2v) = 2 dkk (0, v). Conclude that there is
no norm function on R2 which induced the discrete metric ddisc on R2 .
As we shall see in later sections, the notion of metric allows us to develop analysis
on general metric spaces. It is remarkable that such a simple notion can lead to a
huge volume of mathematical theory. Of all the properties of a function which
makes it a metric, the triangle inequality is the non-trivial one. It is worth taking a
moment to build intuition about that property. The following exercise helps you to
achieve that.
Definition 2.6. Consider a metric space (X, d), a point x ∈ X, and a real number
ǫ > 0. The ball of radius ǫ centred at x is the set of all points x′ ∈ X satisfying
d(x, x′ ) < ǫ. In other words,
(ii) In (Rn , d2 ), for every a ∈ Rn and ǫ > 0, Bǫ (a) consists of all the points inside
a hypersphere.
(iv) Let I = [0, 1] ⊂ R, and dI denote the induced metric on I from d1 on R. Then,
in (R, d1 ), we have
In (I, dI ), we have
In (I, dI ),
B1/2 (1/2) = B1/2 (1/2, I, dI ) = {x ∈ I | dI (x, 1/2) < 1/2} = (0, 1).
(v) In (X, ddisc ), where X is a non-empty set, and ddisc is the discrete metric, for
every x ∈ X and ǫ > 0 we have the following.
If ǫ ≤ 1, then
Bǫ (x) = {x′ ∈ X | ddisc (x, x′ ) < ǫ} = {x}.
If ǫ > 1,
Bǫ (x) = {x′ ∈ X | ddisc (x, x′ ) < ǫ} = X.
This consists of all continuous functions g : [a, b] → R such that the graph of
g lies between the graphs of f − ǫ and f + ǫ.
Figure 2.4: Figure on the left hand side shows Bǫ (0, R2 , d1 ), the figure in the middle
shows Bǫ (0, R2 , d2 ), and the figure on the right hand side shows Bǫ (0, R2 , d∞ ).
b b
a b
(i) Show that if ǫ < δ, then Bǫ (x) ⊆ Bδ (x). By example, show that the equality
may hold even if ǫ < δ.
Definition 2.7. Let (X, d) be a metric space, and U ⊆ X. We say that U is open
in (X, d), if for every u ∈ U , there is δ > 0 such that Bδ (u) ⊆ U .
Lemma 2.3. Let (X, d) be a metric space. For every x ∈ X and ǫ > 0, the ball
Bǫ (x) is open in X.
Proof. Fix an arbitrary y ∈ Bǫ (x). Let δ = ǫ − d(x, y). Since y ∈ Bǫ (x), we have
d(x, y) < ǫ, and hence δ > 0.
Let z ∈ Bδ (y) be an arbitrary point. By the triangle inequality of the metric,
Due to the above lemma, Bǫ (x) is also called an open ball of radius ǫ about x.
Lemma 2.4. In any metric space (X, d), the empty set and the set X are open.
Proof. To see that the empty set is open, we need to show that for every x in the
empty set, there is δ > such that Bδ (x) is contained in the empty set. Since there is
no such x in the empty set to begin with, for logical reasons this statement is true.
So the empty set is open.
On the other hand, for every x ∈ X, we have B1 (x) ⊂ X. That is because of
the definition of the ball. Thus we can take ǫ = 1, in the criterion for open sets.
Note that the definition of open set in a metric space (X, d) depends on both
the metric d and the underlying set X. We make this clear in the next example.
Example 2.13. Consider the discrete metric ddisc on R, that is (R, ddisc ). In this
space, any subset of R is open. To see that let U be an arbitrary subset of R, and
let u ∈ U be an arbitrary point. We let δ = 1/2, and note that B1/2 (u) = {u} ⊂ U .
This shows that U is open. But in the metric space (R, d1 ) it is not true that
every subset of R is open. For example, the set with single element {1} is open in
(R, ddisc ), but it is not open in (R, d1 ).
On the other hand, let I = [0, 1] ⊂ R, and let dI be the induced metric on [0, 1]
from d1 on R. The set [0, 1/2) is not open in (R, d1 ) (the definition does not hold
for the point 0 ∈ [0, 1/2)). But [0, 1/2) is open in ([0, 1], dI ). To show the latter
property, let x ∈ [0, 1/2). If x ∈ (0, 1/2), we define δ = min{x, 1/2 − x}, and see
that δ > 0 and
Bδ (0, I, dI ) = {x′ ∈ [0, 1/2) | dI (0, x′ ) < δ} = [0, 1/4) ⊂ [0, 1/2).
According to the definition of open sets, this shows that [0, 1/2) is open in ([0, 1], dI ).
Lemma 2.5. Let X = (X, d) be a metric space. The union of any number of (finite,
countable, uncountable) open sets in X is an open set in X.
Proof. Assume that Gα ⊆ X is open, for all α in a set I. Let x ∈ ∪α∈I Gα . Then
there exists some α0 ∈ I such that x ∈ Gα0 . Since Gα0 is an open set, there exists
δ > 0 such that Bδ (x) ⊂ Gα0 . This implies that Bδ (x) ⊆ ∪α∈I Gα .
Lemma 2.6. Let X = (X, d) be a metric space. The intersection of any finite
number of open sets in X is an open set in X.
The statement in the above lemma is not necessarily true if we drop the hy-
pothesis of finiteness. For example, as we saw in Exercise 2.9, in the metric space
(Rn , d2 ), we have ∩∞ 2
n=1 B1/n (x) = {x}. And the set {x} is not open in (R , d2 ).
We have already seen that there may be many metrics on a given set. For
example, we have metrics d1 , d2 , and d∞ on Rn . The definition of open set in a
metric space depends on the metric. So a priori, for each of these metrics on Rn , we
may have different open sets. This seems to be cumbersome, but can be alleviated
by the following definition.
Definition 2.8. Let d1 and d2 be metrics on a set X. The metrics d1 and d2 are
called topologically equivalent, if the following property holds. For every U ⊆ X,
U is open in (X, d1 ) if and only if U is open in (X, d2 ).
Notice the similarly between the above definition and the definition of conver-
gence of sequences in Euclidean spaces.
Example 2.14. In the metric space (R, d1 ) the sequence (1/n)n≥1 converges. That
is because 0 ∈ R, and for every ǫ > 0 we can choose an integer N > 1/ǫ, so that for
all n ≥ N we have d1 (1/n, 0) = 1/n < ǫ.
Now let I = (0, 1), and dI be the induced metric on I from d1 . In the metric
space (I, dI ), the sequence (1/n)n≥1 does not converge. That is because there is no
x ∈ (0, 1) satisfying the criterion for the convergence. Assume in the contrary that
there is such an x ∈ (0, 1). We choose ǫ = x/2 > 0, and for every N ∈ N, we choose
n ≥ max{N, 2/x}. Then,
Exercise 2.11. Let (X, ddisc ) be a discrete metric space, and (xn )n≥1 be a sequence
in X. Then, (xn )n≥1 converges in (X, ddisc ) if and only if the sequence (xn )n≥1 is
eventually constant.
Lemma 2.7. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X. If the
sequence (xn )n≥1 converges in (X, d), then its limit is unique.
Proof. Let us assume that there are two points x and y in X such that the sequence
(xn )n≥1 converges to. Fix an arbitrary ǫ > 0. Since the sequence converges to x,
there is N1 ∈ N such that for all n ≥ N1 , we have d(xn , x) < ǫ. Similarly, since
the sequence converges to y, there is N2 ∈ N such that for all n ≥ N2 , we have
d(xn , y) < ǫ. Now, let n = max{N1 , N2 }. We have
By property M1 of metrics, d(x, y) ≥ 0, and since ǫ > 0 was arbitrary, the above
inequality shows that d(x, y) = 0. Then, by property M1 of the metrics, we conclude
that x = y.
Exercise 2.12. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X.
Prove that the sequence (xn )n≥1 converges to x ∈ X if and only if, for every open
set U in (X, d) with x ∈ U , there is N ∈ N such that for all n ≥ N , we have xn ∈ U .
When it is clear what metric is involved, we may simple say that V is closed in
X. For example, when a metric is not specified on Rn , it is assumed that it is the
Euclidean metric d2 . Thus, when we say that E is closed in Rn , we mean that E is
closed in (Rn , d2 ).
Example 2.15. Consider real numbers a < b. The set [a, b] is closed in (R1 , d1 ).
That is because if (xn )n≥ is a sequence in [a, b] which converges to x in R, then we
have a ≤ xn ≤ b, and hence a ≤ limn→∞ xn ≤ b. This implies that x ∈ [a, b].
The intervals (a, b) and (a, b] are not closed in (R1 , d1 ). That is because
b−a
a+ , n≥2
n
is a sequence in (a, b] which converges to a in (R, d1 ), but a does not belong (a, b].
On the other hand, let I = (0, 1) and dI be the induced metric on I from
(R, d1 ). Then the set V = (0, 1/2] is closed in ((0, 1), dI ). To see this, assume that
(xn )n≥1 is a sequence in (0, 1/2] which converges in ((0, 1), d I ). By the definition of
convergence in ((0, 1), d I ), the limit of the sequence must be in (0, 1). However, since
the sequence belongs to (0, 1/2], its limit is at most 1/2. Thus, the limit belongs to
(0, 1/2].
Exercise 2.13. Let (X, ddisc ) be a discrete metric space. Then every set in X is
closed.
Note that open is not the opposite of closed. If a set is not open, it does not
mean that it is closed. For example, the set (1, 2] is neither open or closed in (R, d1 ).
There are sets that are both open and closed, as we shall see in a moment.
Proof. First assume that V is closed. Assume in the contrary that X \V is not open.
Then, there is x ∈ X \V , such that for all δ > 0, Bδ (x) * X \V . Equivalently, for all
δ > 0, Bδ (x) ∩ V 6= ∅. In particular, for each n ∈ N, we let δ = 1/n , and conclude
that there is a point xn ∈ Bδ (x) ∩ V . This process generates a sequence (xn )n∈N
in V . The sequence (xn )n∈N converges to x in (X, d), because xn ∈ Bδ (x) implies
that d(xn , x) < 1/n. But the limit x does not belong to V , which contradicts V is
closed.
Some authors define the notion of closed sets using the equivalence form in the
above theorem. That is, a set is closed, if its complement is open. Then, they prove
(as in the proof of the above theorem) that if a set is closed, it contains the limit of
any convergent sequence in that set.
(i) the intersection of any number (finite, countable or uncountable) of closed sets
in (X, d) is a closed set in (X, d),
(ii) the union of any finite number of closed sets in (X, d) is a closed set in (X, d).
we conclude that X \ (∩α∈I Fα ) is open. Using Theorem 2.9 again, we conclude that
∩α∈I Fα is closed in X. This proves part (i) of the lemma.
The proof for part (ii) is similar, except that one uses Lemma 2.6 instead of
Lemma 2.5.
It is also possible to give a proof of the above lemma, directly using the definition
of closed sets in Definition 2.10.
(ii) The point x is called an isolated point of V , if there exists δ > 0 such that
V ∩ Bδ (x) = {x}. In other words, there is a δ-neighbourhood of x which does
not contain any point of V except x.
(iv) The point x is called a boundary point of V , if for every δ > 0 we have
Bδ (x) ∩ V 6= ∅ and Bδ (x) \ V 6= ∅. In other words, x is a boundary point of
V , if every δ-neighbourhood of x meets both V and the complement of V .
Note that in items (i) and (ii), any interior point and any isolated point of V
is an element of V . But the limit point and the boundary point of a set V are not
necessarily elements of V .
Example 2.16. Consider the Euclidean metric space (R2 , d2 ), and the set
[
V = (x, y) ∈ R2 | k(x, y)k ≤ 1, x ≥ 0 (x, y) ∈ R2 | k(x, y)k < 1, x < 0 .
Figure 2.6: The solid black arc is part of V , but the dotted arc is not part of V .
You can see that (x, y) is an interior point of V if and only if k(x, y)k < 1. The
set V has no isolated points. The point (x, y) is a limit point of V if and only if
k(x, y)k ≤ 1. The point (x, y) is a boundary point of V if and only if k(x, y)k = 1.
Verify these statement.
V = {1/n | n ∈ N}.
Then, V has no interior point. Every point in V is an isolated point of V . The point
0 is the only limit point of V . Every point in V is a boundary point of V . But also
the point 0, which is not in V , is a boundary point of V .
If V = (0, 1] ∪ {2} in R1 , then a is an interior point of V if and only if a ∈ (0, 1).
The point 2 is the only isolated point of V . The point a is a limit point of V if and
only if a ∈ [0, 1]. A point a is a boundary point of V if and only if a ∈ {0, 1, 2}.
(i) The interior of V is defined as the set of all v ∈ V such that v is an interior
point of V . The interior of V is often denoted as V ◦ .
(ii) The closure of V is the union of V and all the limit points of V . The closure
of V is often denoted as V .
(iii) The boundary of V is the set of all v ∈ X such that v is a boundary point of
V . The boundary of the set V is often denoted as ∂V .
Example 2.19. Let V = (0, 1]∪ {2} in (R1 , d1 ). Then, V ◦ = (0, 1), V = [0, 1]∪ {2},
∂V = {0, 1, 2}.
Exercise 2.14. Let (X, d) be a metric space, and V be a subset of X. Show that
the set V is closed if and only if V = V .
Exercise 2.15. Let V and W be subsets of a metric space (X, d). The following
properties hold:
(i) if V ⊂ W , then V ◦ ⊂ W ◦ ,
(ii) if V ⊂ W , then V ⊂ W ,
Exercise 2.16. Let V and W be subsets of a metric space (X, d). Prove that
V ∪ W = V ∪ W.
(V ∪ W )◦ 6= V ◦ ∪ W ◦ .
Proof. Assume that there is a sequence of points, say (xn )n≥1 , in V \ {x} which
converges to x. We need to show that for every δ > 0, Bδ (x) ∩ V contains an
element of V other than x. Fix an arbitrary δ > 0. Because the sequence (xn )n≥1
converges to x, there is N ∈ N such that for all n ≥ N , xn ∈ Bδ (x). As the
sequence lies in V \ {x}, we conclude that xN is distinct from x, and xN ∈ Bδ (x).
This completes the proof.
Now assume that x is a limit point of V . For each n ∈ N, the number δn = 1/n
is strictly positive. So, by the definition of limit points, B1/n (x) ∩ V contains an
element different from x. Let xn be such an element. This process generates a
sequence (xn )n≥1 in V \ X. We do not know that the points in the sequence x1 , x2 ,
x3 , . . . are distinct points. But this does not matter for us. The sequence (xn )n≥1
converges to x since d(xn , x) < 1/n.
• We say that the metric space (X, d) is separable, if there is a countable set
which is dense in X.
Example 2.20. In the metric space (R1 , d1 ), the set Q is countable and dense. So
(R1 , d1 ) is separable.
In the metric space (Rn , d2 ) the set of all vectors with rational coordinates is
countable and dense in Rn .
(C([a, b]), d∞ ) is separable. You can see an elementary, but rather long, proof of
this classic theorem in the Principles of Analysis by Rudin.
On the other hand, one can approximate any continuous function on [0, 1] by a
(series) function of the form
∞
X
an cos(2πnx) + bn sin(2πnx),
n=1
where an and bn are real numbers. So they also form a dense subset of (C([0, 1]), d∞ ).
Functions of the above form are called Fourier series. There is an entire module
called “Fourier Analysis and the Theory of Distributions” devoted to the properties
of such functions.
Example 2.21.* Recall the space of all bounded sequences in R with the supremum
metric d∞ . This metric space is not separable. To see that, Let E denote the set
of all sequences of 0s and 1s (i.e. 00111010101010 . . . ). You have already seen in
Analysis I that E is uncountable.
Note that the d∞ distance between any two distinct elements of E is equal to
1. Then, for distinct elements e1 and e2 in E, B1/2 (e1 ) ∩ B1/2 (e2 ) = ∅. So any
dense subset needs to have at least one element from each such ball, but there are
an uncountable number of such balls. Hence, the dense subset can not be countable.
For a given n ∈ N , the set of elements m ∈ M such that f (m) = n is called the
pre-image of n. This should be denoted as f −1 ({n}), but abusing the notation,
it is often denoted as f −1 (n). For any set B ⊆ N , the pre-image of B, is defined
(and denoted) as
f −1 (B) = {m ∈ M | f (m) ∈ B}.
(i) We say that f is continuous at x ∈ X, if for every ǫ > 0 there is δ > 0 such
that for every x′ ∈ X satisfying dX (x′ , x) < δ we have
Proof. Let us first assume that f is continuous, and fix an arbitrary open set U in
A2 . Take any x ∈ f −1 (U ), then f (x) ∈ U . As U is open in A2 , there is ǫ > 0 such
that Bǫ (f (x)) ⊂ U . As f is continuous ∃δ > 0 such that f (Bδ (x)) ⊂ Bǫ (f (x)) ⊂ U .
Therefore Bδ (x) ⊂ f −1 (U ). Since x ∈ f −1 (U ) was arbitrary, we deduce that f −1 (U )
is open.
Now assume that the pre-image of any open set is an open set. Let x ∈ A1 and
ǫ > 0 be arbitrary elements. Consider the open set Bǫ (f (x)). By the assumption,
f −1 (Bǫ (f (x))) is open. Because x ∈ f −1 (Bǫ (f (x))), there must be δ > 0 such that
is a closed set. The above set is the pre-image of the closed set (−∞, −1]. Since f
is continuous, by the above exercise, the pre-image of (−∞, −1] must be a closed
set. For the same reason, the set
is an open set.
By exercise 2.17, we can easily verify the closed or openness of many sets in
Euclidean spaces. For example,
is an open set.
(i) f is continuous at x ∈ X,
(ii) for any sequence (xn )n≥1 in X which converges to some x ∈ X, the sequence
(f (xn ))n≥1 converges to f (x) in (Y, dY ).
Proof. The proof is identical to the proof of this statement for higher dimensional
Euclidean spaces. One only needs to replace the metric d2 with the metrics dX and
dY is suitable places.
Exercise 2.18. Recall that the set of all continuous functions from [0, 1] to R is
denoted by C([0, 1]). We also defined the metrics d1 and d∞ . Consider the map
Φ : C([0, 1]) → R,
defined as
Φ(f ) = f (1/2).
(i) Is the map Φ from the metric space (C([0, 1]), d ∞ ) to (R, d1 ) continuous?
(ii) Is the map Φ from the metric space (C([0, 1]), d 1 ) to (R, d1 ) continuous?
(iii) Is the map Φ from the metric space (C([0, 1]), d 2 ) to (R, d1 ) continuous?
Exercise 2.19. Consider the metric spaces X = (R, d1 ) and Y = (R, ddisc ). Show
that the map f (x) = x from X to Y is not continuous. Show that the map g(x) = x
from Y to X is continuous.
(i) show that the sequence (fn )n≥1 in C([0, 1]) converges to f in the metric space
(C([0, 1], d1 ).
(ii) show that the sequence (fn )n≥1 in C([0, 1]) does not converge to f in the metric
space (C([0, 1], d∞ ).
is not continuous.
(ii) Two metric spaces (X1 , d1 ) and (X2 , d2 ) are called homeomorphic, if there
is a homeomorphism from X1 to X2 .
Example 2.23. The sets (−∞, ∞) and (−1, 1) with respect to the metric d1 on R1
are homeomorphic. For example the map f (x) = arctan(x) is a homeomorphism
between these two sets.
(i) We say that f is Lipschitz, if there is a constant M > 0 such that for all x1
and x2 in X, we have
(ii) We say that f is bi-Lipschitz, if there are constant M1 > 0 and M2 > 0 such
that for all x1 and x2 in X, we have
Example 2.24. Let (S 1 , d) be the metric space from Example 2.4, that is S 1 is
the circle of radius 1 and d is the arc length between two points on S 1 . Recall that
every point on S 1 is equal to (cos(θ), sin(θ)), for a unique θ ∈ [0, 2π). For every
α ∈ [0, 2π] we can consider the rotation by α on S 1 , which may be defined as
Remark 2.5. As we will be using the word “set” and "subset" very often in this
section, we will use the words “collection” and “class” to mean “set”, and “subcol-
lection” and “subclass” to mean “subset”. So instead of saying “consider the set of
all subsets of R such that .... ”, we may prefer to say “consider the collection of all
subsets of R such that ... ”.
Example 2.25. Let A be an arbitrary set, and τ = {∅, A}. It is easy to see that
τ satisfies the three properties T1, T2, and T3 in the definition of topology. The
collection τ is called the coarse topology on A.
Example 2.26. Let A be an arbitrary set, and let τ be the collection of all subsets
of A. Evidently, τ satisfies the three properties T1, T2, and T3. In this topology,
every subset of A is open. This topology on A is called the discrete topology.
Example 2.27. Let A = {a, b}, where a and b are the letters “a” and “b” (so they
are distinct), and let
τ = {∅, {a, b}, {b}} .
It is easy to see that τ satisfies T1, T2, and T3, so it is a topology on A. The
only open sets in this topology are the empty set, A and the set {b}. So A is the
only open set containing a, and hence any open set containing a also contains b.
The collection τ is called the Sierpinski topology, and the pair (A, τ ) is called
the Sierpinski topological space. Note that this topology is not equal to the coarse
topology, and also it is not equal to the discrete topology.
Example 2.28. Let A = R and let τ be the collection of all subsets of R of the
form (a, +∞) for some a ∈ R ∪ {+∞, −∞}. Here we assume that (+∞, +∞) is the
empty set. You can verify that this collection satisfies the properties T1, T2, and
T3, so τ is a topology on R. This is called the order topology on R.
That is each set in τ is either empty, or its complement has a finite number of
elements. You can see that this set satisfies the properties T1, T2, and T3. This
topology on X is called the co-finite topology.
The following example shows that the topological spaces are, in a sense, gener-
alisation of metric spaces.
Example 2.30. Let (X, d) be a metric space, and let τ be the collection of all
open sets in (X, d). By Lemma 2.4, the empty set and the whole set X are open,
so they belong to τ . This shows that T1 holds. By Lemma 2.5, the union of any
arbitrary number of open sets in (X, d) is open, so property T2 holds. Similarly, by
Lemma 2.6, the intersection of any finite number of open sets in (X, d) is open in
(X, d). Hence property T3 holds. Therefore, τ is a topology on X.
The topology τ on X is called the induced topology from the metric d.
The topology induced on Rn from the metric d2 is called the Euclidean topol-
ogy on Rn . We note that since the metrics d1 , d2 and d∞ are equivalent, they all
induce the same metric on Rn .
As we explained in the above example, every metric on X naturally induces a
topology on X (the induced topology). But, this is not a reversible process. First of
all, distinct metrics on a set X may induce the same topology on X. For example,
if d1 and d2 are topologically equivalent metrics on X, then they induce the same
topology. Therefore, we cannot associate a unique metric to each topology. One
might ask whether for every topology τ on X, there is a metric d on X which induces
τ on X. For example, you can verify that the discrete topology on X is induced from
the discrete metric on X. We say that a topological space (X, τ ) is metrisable, if
there is a metric on X which induces the topology τ .
Exercise 2.22. Consider a discrete metric space (X, ddisc ), that is ddisc is a discrete
metric on X. Show that ddisc induces the discrete topology on X.
There are standard approaches to define new topologies using old ones. We
explain two of these approaches below.
τY = {U ∩ Y | U ∈ τ }.
Example 2.32. Assume that (X, τ ) and (Y, µ) are two topological spaces. Consider
the product set
X × Y = {(x, y) | x ∈ X, y ∈ Y }.
Let τ ∗ µ be the collection of all sets Ω ⊆ X × Y such that for every (x, y) ∈ Ω, there
are Ux ∈ τ and Vy ∈ µ such that x ∈ Ux , y ∈ Vy and Ux × Vy ⊆ Ω.
One can see that the collection τ ∗ µ is a topology on X × Y . This is called the
product topology on X × Y .
To define a topology on X × Y , one might wish to simply take the sets of the
form U × V , such that U ∈ τ and V ∈ µ. By the next exercise, you can see that
this does not work in general.
Exercise 2.24. Let τEucl be the Euclidean topology on R, that is τEucl is the
collection of all open sets in (R, d1 ). Show that the collection
{U × V | U ∈ τEucl , V ∈ τEucl }.
It follows from the above definition that the interior of any set is a subset of that
set. That is, if S ⊆ A, then S ◦ ⊆ S.
Exercise 2.25. Let (A, τ ) be a topological space, and let S and T be subsets of A.
The following properties hold:
(i) if S ⊂ T then S ◦ ⊂ T ◦ ,
Example 2.34. Let (A, τ ) be a topological space, with τ the coarse topology on
A. Then any sequence in A is convergent, and converges to any element in A.
On the other hand, if τ is the discrete topology on A, then a sequence (xn )n∈N
is convergent if and only if, the sequence is eventually constant.
You can shows that τ is a topology on A. The space (A, τ ) is not Hausdorff, since
b and c cannot be separated. The only open set in A which contains c is {a, b, c},
and that set also contains b.
Exercise 2.26. Let (X, d) be a metric space, and let τ be the topology on X
induced from the metric d. Show that (X, τ ) is a Hausdorff topological space.
Theorem 2.15. Let (A, τ ) be a Hausdorff topological space, and let (xn )n∈N be a
sequence in A. If the sequence (xn )n∈N converges in (A, τ ), then its limit is unique.
Proof. Assume in the contrary that there are distinct points x and y in A such that
Because (A, τ ) is a Hausdorff space, there are open sets Gx and Gy such that x ∈ Gx ,
y ∈ Gy , and Gx ∩ Gy = ∅. Since the sequence xn converges to x, there is Nx ∈ N
such that for all n ≥ Nx , we have x ∈ Gx . Similarly, there is Ny ∈ N such that for
all n ≥ Ny we have xn ∈ Gy . Now, for n = max{Nx , Ny }, we have xn ∈ Gx and
xn ∈ gy . This contradicts Gx ∩ Gy = ∅.
Definition 2.22. Let (A, τ ) be a topological space, and let V ⊆ A. We say that V
is closed in (A, τ ), if A \ V is open in (A, τ ). That is, V is closed in (A, τ ) if and
only if A \ V ∈ τ .
Lemma 2.16. Let (A, τ ) be a topological space. Then, the empty set and the set A
are closed in (A, τ ). Moreover, we have
(i) the intersection of any number of (finite, countable, uncountable) closed sets
in (A, τ ) is a closed set in (A, τ ),
(ii) the union of any finite number of closed sets in (A, τ ) is a closed set in (A, τ ).
Proof. This follows from Definition 2.22, and the properties T1, T2, and T3 of
topology, by taking complements. See the proof of Lemma 2.10 for a similar argu-
ment.
Lemma 2.17. Let (A, τ ) be a Hausdorff topological space, and a ∈ A. Prove that
the set {a} is a closed set.
Proof. For any b ∈ A with b 6= a, there are open sets Gb and Ga such that b ∈ Gb ,
a ∈ Ga , and Ga ∩ Gb = ∅. Then, by Definition 2.22, A \ Gb is a closed set. By
Lemmas 2.16, the intersection
\
(A \ Gb )
b∈A\{a}
is a closed set. Since for every b ∈ A \ {a}, A \ Gb contains a and does nor contain
b, the above intersection is equal to {a}. This completes the proof.
Example 2.36. Let τ be the Sierpinski topology on A = {a, b}. The constant
sequence b, b, b, b, . . . converges to the point a (and also to b) in this topology. That
is because, the only open set in (A, τ ) which contains a is A. Obviously, all points
in the sequence belongs to {b} ⊂ A. This implies that the closure of the set {b} is
A.
Lemma 2.18. Let (A, τ ) be a topological space, and assume that S and T are subsets
of A. The following properties hold:
(i) if S ⊂ T , then S ⊂ T ,
Remark 2.7. One can take the statement in part (ii) of the above lemma as the
definition of closed sets in a topological space. In other words, V is closed, if it
contains all the limit points of V . This is in the spirit of how we defined closed sets
in metric spaces, but it is not identical to that. If a set V is closed in a topological
space (A, τ ), in particular, for any sequence in V which converges to some point in
A, the limit of the sequence must belong to V . That is because the limit of the
sequence in V belongs to the limit set of V . However, one has to note that limits of
sequences are not necessarily unique in topological spaces. By considering the limit
points of the set we avoid discussing the notion of Hausdorff property when defining
closed sets.
Proof of Lemma 2.18. Part (i): Let x ∈ S be an arbitrary point. Any neighbour-
hood of x contains a point in S and hence a point in T . This implies that x ∈ T .
Part (ii): First assume that S is closed. Let us suppose in the contrary that
S 6= S. Then there is x ∈ S such that x ∈ / S. This implies that x ∈ A \ S. Since
S is closed, then A \ S is open. This open set contains x, but does not contain any
element of S. So x cannot be a limit point of S, which is a contradiction.
Now assume that S = S. Assume in the contrary that S is not closed. Then A\S
is not open. This implies that there exists x ∈ A \ S such that any neighbourhood
Gx of x is not contained in A \ S. Thus, Gx contains an element of S. This implies
that x ∈ S, and hence x ∈ S. This is a contradiction.
Note that the continuity of f in the above definition does not just depend on f
but also on the topologies on X and Y . This is illustrated in the next example.
Theorem 2.19. Let (X, τX ) and (Y, τY ) be two topological spaces. Then, f : X → Y
is continuous if and only if the pre-image of any closed set in Y is closed in X.
Theorem 2.20. Let (X, τX ), (Y, τY ) and (Z, τZ ) be topological spaces, and assume
that f : X → Y and g : Y → Z are continuous. Then, g ◦ f : X → Z is continuous.
Lemma 2.21. Let (X, τX ) and (Y, τY ) be topological spaces, and y ∈ Y . The
constant map f : X → Y defined as f (x) = y, for all x ∈ X, is continuous.
Since the empty set and the whole set are open in any topology, we conclude that
f −1 (U ) is open in X. Because U was arbitrary, we conclude that f is continuous.
(ii) the topological spaces (X, τX ) and (Y, τY ) are called topologically equiva-
lent, or homeomorphic), if there is a homeomorphism from X to Y .
(i) the sets [a, b] and [0, 1] are homeomorphic, by the map x 7→ (x − a)/(b − a)
from [a, b] to [0, 1],
(ii) the sets (a, b) and (0, 1) are homeomorphic, by the map x 7→ (x − a)/(b − a),
(iii) the sets (−∞, +∞) = R and (−1, 1) are homeomorphic, by the map x 7→
tan(πx/2),
(iv) the sets (0, +∞) and (0, 1) are homeomorphic by the map x 7→ x/(x + 1).
(v) the sets (−∞, +∞) and (0, +∞) are homeomorphic, by the map x 7→ ex .
(vi) the sets [0, 1) and (0, 1] are homeomorphic by the map x 7→ −x + 1.
Exercise 2.27. Assume that the topological spaces (X, τX ) and (Y, τY ) are topo-
logically equivalent. Then, (X, τX ) is Hausdorff if and only if (Y, τY ) is Hausdorff.
From here onward, we will only study metric spaces, as they are fairly general
and capture almost all settings you will come across in mathematics. However,
we will present most of the definitions, statements and proofs using open sets in
the metric space. Thus, most definitions, statements, and proofs can be readily
presented for topological spaces, replacing open sets in the metric with elements of
the topology.
2.3 Compactness
2.3.1 Compactness by covers
Definition 2.26. Let (X, d) be a metric space, and Y ⊆ X.
(iii) An open cover R for Y is called a finite cover, if the number of elements in
R is finite.
R1 = {(−n, n) | n ∈ N}
{(−2n, 2n) | n ∈ N}
The key concept we aim to study in this section is the following definition.
This definition may appear strange at this point, but by the end of this section,
it will be clear how important it is.
Example 2.40. In the metric space (R, d1 ), the set R, and the open interval (0, 1)
are not compact.
In order to show this, we need to present an open cover which does not have a
finite sub-cover. The collection R1 for R introduced in the above example does not
have a finite sub-cover for R. That is because, for any finite sub-cover, say
the point max{n1 , n2 , . . . , nk } does not belong to the above cover, but it belongs
to R. In fact, if n = max{n1 , n2 , . . . , nk }, then the union of the sets in the above
collection is equal to (−n, n), which clearly does not cover R.
Here is another open cover for R, which has no finite sub-cover
{(n − 1, n + 1) | n ∈ Z}.
For the interval (0, 1), we can consider the open cover
{(1/n, 1) | n ∈ N, n ≥ 2}.
This cover does not have a finite sub-cover. Suppose in the contrary that there is a
finite sub-cover of the above collection, say
Define m = max{n1 , n2 , . . . , nk }. We note that 1/m ∈ (0, 1) but 1/m does not
belong to any of the sets in the above collection. Thus, the above finite collection
does not cover (0, 1).
Another example is given by
1 1
, n ∈ N, n ≥ 2 .
n+1 n−1
To see that the above collection is an open cover, we note that for every r ∈ (0, 1),
1/r > 1. Thus, we can choose an integer n ≥ 2 such that 1/r ∈ (n − 1, n + 1). This
implies that r ∈ (1/(n + 1), 1/(n − 1)). By a similar argument, you can show that
this cover does not have a finite sub-cover.
Exercise 2.28. Consider the metric space (R, d1 ), and assume that a and b are
real numbers with a < b. Show that all of the intervals (a, b], [a, b), [a, +∞), and
(−∞, b] are not compact.
Example 2.41. Let (X, d) be a metric space, and assume that Y is a subset of X
with a finite number of elements. Then Y is compact.
To see this, let R be an arbitrary open cover of Y . Since Y only has finite
number of elements, and each of those elements belongs to one set in R, those finite
number of elements in R cover Y . Thus, R has a finite sub-cover.
Example 2.42. In the metric space (R, d1 ) the set Q ∩ [0, 1] is not compact.
√
To see this, let α be an irrational number in [0, 1] (for example, α = 2/2). We
can consider the open cover
Obviously, each set (−∞, α − 1/n) ∪ (α + 1/n, +∞) is open in R, and the above
collection covers Q ∩ [0, 1]. But there is no finite sub-cover of the above cover for
Q ∩ [0, 1].
Exercise 2.29. Show that if A and B are compact subsets of a metric space (X, d),
then A ∪ B is a compact set.
{(x, y) ∈ R2 | x2 + y 2 < 1}
As you may have noted from the definition of compactness, and the above ex-
amples, it is easier to show that a non-compact set is not compact than to show
that a compact set is compact. To show that a set is not compact, it suffices to give
one open cover which has no finite sub-cover. But to show that a set is compact,
we need to show that any open cover has a finite sub-cover. To deal with any open
cover brings a level of sophistication.
Proposition 2.22. Let a and b be real numbers with a ≤ b. In the metric space
(R, d1 ), the closed interval [a, b] is compact.
Proof. Let R be an arbitrary open cover for [a, b]. Let us consider the set
I = s ∈ [a, b] there is a finite sub-cover of R for [a, s] .
The set I is not empty, as it contains a. That is because, there is one element in R
which covers the interval [a, a] = {a}. Also, the set I is bounded from above, since
I ⊆ [a, b]. Thus, by the completeness of the set of real numbers, I has a supremum.
Let t = sup(I). Note that since I ⊆ [a, b], we have t ∈ [a, b]. First we show that
t = b.
Assume that t = a. Since R is an open cover for [a, b], there is an open set
U ∈ R such that a ∈ U . As U is an open set in R, there is δ > 0 such that
(−δ, +δ) = Bδ (a) ⊆ U . By choosing δ smaller, if necessary, we may assume that
δ < b − a. Now, the collection {U } is a finite sub-cover of R for [a, δ/2]. Thus,
δ/2 ∈ I, contradicting sup(I) = a.
Assume that t ∈ (a, b). Since R is an open cover for [a, b] there is U ∈ R such
that t ∈ U . As U ∩ (a, b) is an open set, and t ∈ U ∩ (a, b), there is δ > 0 such that
(t − δ, t + δ) ⊆ U ∩ (a, b). By the definition of supremum, there must be s ∈ I such
that s ∈ (t − δ, t]. As s ∈ I, [a, s] can be covered by a finite sub-cover of R, say
C ⊆ R. Now the collection C ∪ {U } is a finite sub-cover of R, and it covers the set
to the set). Again, since R is an open cover for [a, b] there is an open set U ∈ R
such that b ∈ U . As U is an open set, there is δ > 0 such that (b − δ, b + δ) ⊆ U .
By the definition of supremum, there must be s ∈ I such that s ∈ (b − δ, b]. As
s ∈ I, [a, s] can be covered by a finite sub-cover of R, say C ⊆ R. Now the collection
C ∪ {U } is a finite sub-cover of R, and it covers the set [a, b] ⊆ [a, s] ∪ (b − δ, b + δ).
This shows that there is a sub-cover of R for [a, b].
Therefore,
k
!
\ [
Br (z) ∩ Y ⊂ Br (z) Bri (yi ) = ∅.
i=1
Theorem 2.25. Let (X, dX ) and (Y, dY ) be metric spaces, and consider the product
space X × Y with any of the metrics d from Definition 2.4. If X and Y are compact,
then (X × Y, d) is compact.
Thus, there is a finite sub-cover of R for X × Y . This completes the proof in this
case.
Now assume that R is an arbitrary open cover for X×Y . For each (x, y) ∈ X×Y ,
there is an open set Wxy in R such that (x, y) ∈ Wx,y . Let us choose an open set Uxy
in X and an open set Vxy in Y such that x ∈ Uxy , y ∈ Vxy , and Uxy × Vxy ⊆ Wxy .
The collection of all such open sets Uxy × Vxy , for all x ∈ X and y ∈ Y , is an open
cover of X × Y . By the above proof, there is a finite sub-cover of this cover, say
Uxi yj × Vxi yj for (i, j) ∈ I, which covers X × Y .
For each (i, j) ∈ I, there is Wxi yj ∈ R such that Uxi yj ×Vxi yj ⊆ Wxi yj . Therefore,
{Wxi yj | (i, j) ∈ I}, is a finite sub-cover of R for X × Y . This completes the
proof.
Corollary 2.26. The set [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] in the Euclidean space Rn
is compact.
The above results show us how difficult it is to prove that a given set compact
set is compact. One may imagine how difficult it can be to deal with an unusual set
in Rn . Thus, it is important to have some criteria which can be verified easily, and
imply compactness. In the remaining of this section we aim to introduce few such
criteria.
Exercise 2.32. Let (X, d) be a non-empty metric space, and let Z ⊆ X. Show
that Z is bounded if and only if there is x ∈ X and r ∈ R such that Z ⊆ Br (x).
Proof. Fix an arbitrary x ∈ X and consider the open cover R = {Bn (x) | n ∈ N}.
As X is compact, there is a finite sub-cover of R which covers X. Let Bni (x), for
i = 1, 2, . . . , k, be those finite sets. Define m = max1≤i≤k ni . We have
Theorem 2.28 (Heine Borel). Consider the Euclidean metric space (Rn , d2 ), and
let X ⊆ Rn . Then, X is compact if and only if X is closed and bounded.
Proof. Let us first assume that X is compact. By Lemma 2.27, X is bounded, and
by Theorem 2.24, X is closed.
Now assume that X is closed and bounded. Since X is bounded, there is N ∈ N
such that C ⊆ [−N, N ]n . By Corollary 2.26, the set [−N, N ]n is compact. Thus,
X is a closed set in a compact set. By Proposition 2.23, that implies that X is
compact.
Exercise 2.33. Consider the set R with the discrete metric ddisc . The set (0, 1) is
closed and bounded in (R, ddisc ), but it is not compact.
Exercise 2.34. Let (X, d) be a metric space, and assume that Vn , for n ≥ 1, be a
nest of non-empty closed sets in X.
Example 2.43. The metric space (R, d1 ) is not sequentially compact. For example,
the sequence (n)n≥1 does not have any sub-sequence which converges in (R, d1 ).
Consider (0, 1) ⊆ R, and let d be the induced metric from d1 on (0, 1). The metric
space ((0, 1), d) is not sequentially compact. To see this, consider the sequence
(1/n)n≥1 . This sequence belongs to (0, 1), and converges to 0 in the metric space
(R, d1 ). So any subsequence of this sequence also converges to 0 in (R, d1 ). But,
since 0 ∈/ (0, 1), this sequence has no sub-sequence which converges in ((0, 1), d).
Lemma 2.29. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X. Then,
(xn )n≥1 has a sub-sequence which converges to an element in X if and only if there
is x ∈ X such that for every ǫ > 0, there are infinitely many i satisfying xi ∈ Bǫ (x).
Proof. First assume that (xn )n≥1 has a sub-sequence which converges to x ∈ X.
Let (xni )i≥1 be a sub-sequence which converges to x. Fix an arbitrary ǫ > 0. By
the definition of convergence, there is N ∈ N such that for all i ≥ N , we have
xni ∈ Bǫ (x). This shows that there are infinitely many n such that xn ∈ Bǫ (x).
Proof. Suppose in the contrary that X is not sequentially compact. Then, there is a
sequence (xn )n≥1 in X which has no convergent sub-sequence. Therefore, for every
x ∈ X, there is no subsequence of this sequence which converges x. Thus, using
Lemma 2.29, for every x ∈ X, there is ǫx > 0 such that only for finitely many n we
have xn ∈ Bǫx (x).
Let Ux = Bǫx (x). Then, the collection
{Ux | x ∈ X}
Proof. Let (xn )n≥1 be a bounded sequence in Rm . Then, there is M > 0 such that
for all n ≥ 1, we have kxn k ≤ M . Since [−M, M ]m is compact in the Euclidean met-
ric, by Theorem 2.30, ([−M, M ]n , d2 ) is sequentially compact. Therefore, (xn )n≥1
has a convergent subsequence.
Note that in the proof of Theorem 2.30 we did not say that there are finitely
many xn in each Uxi , but we say that xn ∈ Uni for finitely many n. This is important
since, the sequence xn may be constant, or there may be infinitely many entries in
the sequence which are the same.
The opposite implication in Theorem 2.30 is also true. But the proof requires
some technical steps, which we break into few optional exercises.
Exercise 2.36.* Let (X, d) be a sequentially compact metric space. Show that X
is separable, that is, there is a countable dense set in X.
The statement of the above theorem is not optional, but its proof is optional.
Proof.* Let R be an arbitrary open cover for X. By Exercise 2.37, we can extract
a countable sub-cover of R, say {V1 , V2 , V3 , . . . }
Suppose that there is no finite sub-cover of {V1 , V2 , . . . } for X. Then for every
n ≥ 1, {V1 , ..., Vn } does not cover X. Hence, for each n we can choose xn ∈ X
such that xn ∈ / ∪ni=1 Vi . In particular, xn ∈/ Vi for every i ≤ n. This implies that
only finitely many entries of the sequence lie in each Vi . This generates a sequence
(xn )n≥1 in X.
Since X is sequentially compact, we can find a convergent sub-sequence, say
(xnj )j≥1 . Suppose this converges to x ∈ X. Then, since {V1 , V2 , . . . } is a cover for X,
there is m ≥ 1 such that x ∈ Vm . Since Vm is open, by the definition of convergence,
there is N ∈ N such that for all j ≥ N , we have xnj ∈ Vm . Hence, infinitely many
entries in the sequence (xn )n≥1 lie in Vm , which is a contradiction.
Remark 2.9. The equivalence of compactness and sequential compactness does not
hold for arbitrary topological spaces. See Remark 2.8. For instance, you may see
that the statement in Exercise 2.36 does not hold for arbitrary topological spaces.
Indeed, in an arbitrary topological space, neither compactness implies sequential
compactness, nor sequential compactness implies compactness. The examples of
such topological spaces are too specialised for the scope of this module.
The above corollary allows us to immediately conclude that some pairs of sets
are not homeomorphic. For example, the intervals (0, 1) and [0, 1] are not homeo-
morphic, since one of them is compact and the other one is not.
Compactness is an extremely useful property in analysis. We shall study some
of the conveniences that come with it.
Recall that a map f : (X, dX ) → (Y, dY ) is uniformly continuous, if for every
ǫ > 0 there exists δ > 0 such that for all x1 and x2 in X satisfying dX (x1 , x2 ) < δ we
have dY (f (x1 ), f (x2 )) < ǫ. Note that δ in the above definition is independent of x1
and x2 . In general it is fairly difficult to verify that a map is uniformly continuous.
You have already seen this for maps from R to R.
Theorem 2.35. Every continuous map from a compact metric space to a metric
space is uniformly continuous.
Proof. To prove this, fix arbitrary metric spaces (A1 , d1 ) and (A2 , d2 ), and assume
that A1 is compact and f : A1 → A2 is continuous.
Suppose in the contrary that f is not uniformly continuous. Then, for some
ǫ > 0 and any n ∈ N there exists xn and x′n in A1 such that
On the other hand, since f is continuous, and the sequences (xnk )k≥1 and (x′nk )k≥1
converge to x, the sequences (f (xnk ))k≥1 and (f (x′nk ))k≥1 converge to f (x). But,
by our choice of these sub-sequences, we have
This is a contradiction.
Corollary 2.36. Assume that a and b are real numbers with a < b. If f : [a, b] → R
is continuous, then it is uniformly continuous.
f (xnk ) ≥ M − 1/nk ,
Exercise 2.38. Let (X, d) be a compact metric space, and assume that f : X → X
is a continuous map such that for all x ∈ X, we have f (x) 6= x. Show that there is
δ > 0 such that for all x ∈ X, we have d(x, f (x)) ≥ δ.
2.4 Completeness
2.4.1 Complete metric spaces and Banach space
The completeness of the set of real numbers is a fundamental property widely used
in analysis. It allows us to solve equations such as x2 = 2 in R, which have no
solutions in Q. Evidently, it is useful to have such a property in more general
settings. However, the completeness of R in terms of least upper bounds, uses the
order on the set of real numbers. This cannot be generalised to arbitrary sets in a
meaningful fashion. But, the completeness of R in terms of Cauchy sequences can
be generalised to arbitrary metric spaces. In this section, we aim to develop this
theory. You will see many applications of the completeness results of this section in
the second year module, Differential Equations.
Definition 2.30. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X.
We say that (xn )n≥1 is a Cauchy sequence in (X, d), if for every ǫ > 0 there exists
Nǫ ∈ N such that for all n and m bigger than Nǫ we have
d(xn , xm ) < ǫ.
Exercise 2.39. Show that any convergent sequence in a metric space, is a Cauchy
sequence.
Exercise 2.40. Let (X, d) be a metric space, and assume that (xn )n≥1 is a Cauchy
sequence in X. If there is a subsequence of (xn )n≥1 which converges to some x ∈ X,
then the sequence (xn )n≥1 converges to x.
Definition 2.31. (i) A metric space (X, d) is called complete, if every Cauchy
sequence in X converges to a limit in X.
(ii) A normed vector space (V, k·k) is called a Banach space, if V with the induced
metric space dkk is a complete metric space.
Example 2.44. You have already seen in Analysis I that any Cauchy sequence
in R is convergent. You can also prove this using Exercise 2.40 and the Bolzano-
Weierstrass Theorem 2.31. Thus, the metric space (R, d1 ) is complete.
The metric space (Q, d) is not complete (here d1 is the induced metric on Q).
√
For example, any sequence in Q which converges to 2, is Cauchy but does not
converge in (Q, d1 ).
In the same fashion, the metric space ((0, 1], d1 ) is not complete. For example,
the sequence (1/n)n≥1 in (0, 1] is Cauchy, but not convergent (the limit does not
belong to (0, 1]).
However, the metric space ([0, 1], d1 ) is complete.
Proof. Assume that (xn )n≥1 is a Cauchy sequence in Rm . For each n ≥ 1, let
us write xn = (x1n , x2n , . . . , xm 1 2 m
n ). Recall that for every z = (z , z , . . . , z ) and
y = (y 1 , y 2 , . . . , y m ) in Rm , and every k ∈ {1, 2, . . . , m}, we have
|z k − y k | ≤ kz − yk .
This implies that for every k ∈ {1, 2, . . . , m}, the sequence (xkn )n≥1 is a Cauchy
sequence in (R, d1 ). To see this, fix an arbitrary ǫ > 0. Since (xn )n≥1 is Cauchy
in (Rm , d2 ), there is Nǫ ∈ N, such that for all i and j bigger than Nǫ we have
d2 (xi , xj ) < ǫ. Then, by the above inequality, for all i and j bigger than Nǫ , we
have
d1 (xki , xkj ) = |xki − xkj | ≤ kxi − xj k = d2 (xi , xj ) < ǫ.
Now, since every Cauchy sequence in R is convergent, the sequence (xkn )n≥1 con-
verges to some point in R, say xk . This implies that the sequence (xn )n≥1 converges
to x = (x1 , x2 , . . . , xm ) in Rm .
Alternatively, by the the above lemma, we can say that the normed vector space
(Rm , k·k
2 ) is a Banach space.
Example 2.45. In any discrete metric space, only eventually constant sequences
are Cauchy. Obviously, any eventually constant sequence is convergent. Therefore,
any set with the discrete metric is complete.
Recall that for real numbers a and b with a ≤ b, C([a, b]) denotes the set of all
continuous functions f : [a, b] → R. We defined two norms on C([a, b]) denoted by
k·k2 and k·k∞ . These induce the metrics d2 and d∞ , respectively. In these metrics,
for f and g in C([a, b]), we have
and
Z b 1/2
d2 (f, g) = |f (t) − g(t)|2
a
Proposition 2.39. The metric space (C([a, b], d2 ) is not complete. Equivalently,
the normed vector space (C([a, b]), k·k2 ) is not a Banach space.
Proof. To simplify the argument, let us assume that a = −1 and b = 1 (one can
adapt the following example to the general case). For n ≥ 1, consider the functions
−1 if − 1 ≤ t ≤ −1/n,
φn (t) = nt if − 1/n ≤ t ≤ 1/n,
1 if 1/n ≤ t ≤ 1.
+1
b b b b
−1 1 1
−n +1
n
−1
Figure 2.7: The graphs of three functions in the sequence (φn )n≥1 .
By the above properties, the right hand side of the above equation tends to 0 as
n → ∞. As the left hand side is a non-negative number, we must have
Z 1
|f (t) − ψ(t)|2 = 0.
−1
f +ǫ
fn
f −ǫ
f
Figure 2.8: In the uniform convergence, for all n ≥ Nǫ , the graph of the function fn
lies between f − ǫ and f + ǫ.
Then, since f and φ are continuous on the intervals (−1, 0) and (0, 1), the above
equations imply that f = ψ on (−1, 0) and on (0, 1). Such a map f cannot be
continuous.
Remark 2.10. Just like building the completion of Q to get the set of real numbers,
one can build the completion of the metric space (C([a, b]), d2 ). This results in a
complete metric space of functions, where one can look for solutions to functional
equations. You can learn about this and similar results by taking the optional
module Lebesgue Measure and Integration.
Proof. Fix an arbitrary c ∈ [a, b]. In order to prove that f is continuous at c, let us
also fix an arbitrary ǫ > 0.
Because the sequence (fn )n≥1 converges uniformly to f , there is Nǫ ∈ N, such
that for all n ≥ Nǫ , and all x ∈ [a, b] we have |fn (x) − f (x)| < ǫ/3.
Now, fix an arbitrary n ≥ Nǫ . Since fn is continuous at c, there is δ > 0 such
that for all x ∈ Bδ (c) ∩ [a, b], we have |fn (x) − fn (c)| ≤ ǫ/3.
By the above inequalities, and the triangle inequality for the modulus function,
for all x ∈ Bδ (c) ∩ [a, b], we have
|f (x) − f (c)| ≤ |f (x) − fn (x)| + |fn (x) − fn (c)| + |fn (c) − f (c)|
< ǫ/3 + ǫ/3 + ǫ/3 = ǫ.
Theorem 2.41. The metric space (C([a, b]), d∞ ) is complete. Equivalently, the
normed vector space (C([a, b]), k·k∞ ) is a Banach space.
Proof. Let (φn )n≥1 be a Cauchy sequence in (C([a, b]), d∞ ). By definition, for every
ǫ > 0 there exists Nǫ ∈ N such that for all x ∈ [a, b] and all m and n bigger than Nǫ
we have |φn (x) − φm (x)| < ǫ.
Now, fix an arbitrary x ∈ [a, b]. By the above paragraph, the sequence of real
numbers (φn (x))n≥1 is a Cauchy sequence in (R, d1 ). Then, by the completeness
of the set of real numbers, the sequence of real numbers (φn (x))n≥1 converges to a
(unique) real number, which we denote by lx . As x in [a, b] was arbitrary, for each
x ∈ [a, b], we obtain a real number lx .
Let us define the function φ : [a, b] → R as φ(x) = lx . We claim that φn
converges uniformly to φ on [a, b]. To see this, fix an arbitrary ǫ > 0. Since (φn )n≥1
is a Cauchy sequence in (C([a, b]), d∞ ), (for ǫ/2 > 0) there exists Mǫ ∈ N such that
for all x ∈ [a, b] and all m and n bigger than Mǫ we have
Proof. Let (xn )n≥1 be a Cauchy sequence in (X, d). By theorem 2.30, (X, d) is
sequentially compact. Thus, there exists a subsequence (xnk )k≥1 which converges
to some x ∈ X. By Exercise 2.40, xn converges to x in (X, d).
2.4.2 Arzelà-Ascoli
There is an important corollary of the completeness of (C([a, b]), d∞ ), which we
present in this section.
(i) We say that the collection C is uniformly bounded, if there exists M such
that for all f ∈ C and all x ∈ [a, b] we have |f (x)| < M .
(ii) We say that the collection C is uniformly equi-continuous, if for all ǫ > 0
there exists δ > 0 such that for all f ∈ C, and all x1 and x2 in [a, b] satisfying
|x1 − x2 | < δ, we have |f (x1 ) − f (x2 )| < ǫ.
Note that in the second part of the above definition, the number δ does not
depend on the function f , but only on the collection C.
Proof. Let us fix an arbitrary sequence (fn )n≥1 in C. We need to show that there
is a sub-sequence of this sequence which converges to some continuous function
f : [a, b] → R with respect to the metric d∞ . We break the proof into several steps:
Step 1. The sequence (fi )∞ ∞
i=0 has a sub-sequence (gi )i=0 which converges point-
wise on [a, b] ∩ Q.
Proof of Step 1: Note that the set [a, b] ∩ Q is countable. This means that we
may write [a, b] ∩ Q = {x1 , x2 , . . .}.
Let us denote the function fi by the notation f0,i , that is, for all i ∈ N and for
all x ∈ [a, b], we have f0,i (x) = fi (x).
Now consider the sequence of numbers (f0,i (x1 ))∞ i=0 . This is a bounded sequence
of real numbers. By Bolzano–Weierstrass, this sequence has a convergent subse-
quence, say (f1,i (x))∞ ∞
i=0 . Now let us consider (f1,i (x2 ))i=0 , which again is a bounded
sequence of real numbers, with a convergent subsequence f2,i (x2 ). This is a sub-
sequence of f1,i such that f2,i (x1 ) and f2,i (x2 ) both converge. We can repeat this
process of extracting subsequences to obtain functions fk,i for k, i ∈ N with the
property that (fk+1,i )∞ ∞
i=0 is a subsequence of (fk,i )i=0 , and moreover for all l ≤ k,
the sequence fk,i (xl ) converges.
Let us define the sequence of functions gi = fi,i , for i ∈ N. Each gi is defined on
[a, b]. To illustrate the above process, one may think of fi,i as the diagonal of the
array
f0,0 f0,1 f0,2 f0,3 ...
f1,0 f1,1 f1,2 f1,3 ...
f2,0 f2,1 f2,2 f2,3 ...
f3,0 f3,1 f3,2 f3,3 ...
Clearly, (gi )∞
i=0 is a subsequence of F , and moreover for every l ∈ N the sequence
gi (xl ) converges. Let us define g(xl ) = limi→∞ gi (xl ).
Since [a, b] is bounded, there are rational numbers x1 , . . . , xk in [a, b] such that
Since gi converges at each rational point, for each m = 1, . . . k there exists Nm such
that for all i, j ≥ Nm we have
|gi (x) − gj (x)| = |gi (x) − gi (xm ) + gi (xm ) − gj (xm ) + gj (xm ) − gj (x)|
≤ |gi (x) − gi (xm )| + |gi (xm ) − gj (xm )| + |gj (xm ) − gj (x)|
< ǫ/3 + ǫ/3 + ǫ/3 = ǫ.
Theorem 2.44 (Banach fixed point Theorem). Let (X, d) be a non-empty complete
metric space, and f : X → X be a contracting map. Then, f has a unique fixed
point in X.
Show that the sequence (xn )n≥1 converges to a root of the equation
x4 − 4x2 − x + 4 = 0
√
which lies in the interval [ 3, 2].
Exercise 2.43. Consider the map f : (0, 1/3) → (0, 1/3), defined as f (x) = x2 .
Show that the map f is a contraction with respect to the Euclidean metric d1 . But,
f has no fixed point in (0, 1/3).
Exercise 2.44. Consider the map f : [1, ∞) → [1, ∞) defined as f (x) = x + 1/x.
Show that ([1, +∞), d1 ) is a complete metric space, and for all x and y in [1, ∞) we
have
d1 (f (x), f (y)) ≤ d(x, y).
But, f has no fixed point.
2.5 Connectedness
The intermediate value theorem is one of the main features in real analysis with
many applications. If f : [a, b] → R is a continuous function, and there are α and
β in [a, b] such that f (α) < 0 and f (β) > 0, then there must be γ between α and
β such that f (γ) = 0. What is it about the domain [a, b], or the range R, or the
continuity of the map f which makes this theorem work? Is there any way to extend
this useful statement to more general settings. This is the purpose of this section,
and we will see that there is indeed a natural way to extend this property to more
general settings.
(i) U ∩ V = ∅,
(ii) T ⊆ U ∪ V ,
(iii) T ∩ U 6= ∅ and T ∩ V 6= ∅.
Example 2.47. Consider the set R2 with the Euclidean metric d2 . Let
That is, T consists of two horizontal line segments in the plane. Intuitively, we see
T as having more than one piece. Indeed, T is disconnected. For example, let
U ∩ T = [−1, 1] × {−1} =
6 ∅, V ∩ T = [−1, 1] × {1} =
6 ∅.
We also have T ⊆ U ∪ V .
Note that being disconnected does not only depend on the set T , but it crucially
depends on the metric d on X. We illustrate this by the following example.
Example 2.48. Let (X, ddisc ) be a discrete metric space, and assume that X has
at least two elements. Then, X is disconnected. To see this, recall that in the
discrete topology, any subset of X is open. Let x ∈ X be an arbitrary elements.
Define U = {x} and V = X \ {x}. Then, since X has at least two points, V
must be non-empty. Then, U and V satisfy the three properties in the definition of
disconnectedness.
Exercise 2.45. Let (X, d) be a metric space. Show that X is connected if and only
if the only subsets of X which are both open and closed are X and ∅.
Example 2.49. Consider the set of real numbers with the Euclidean metric, and
let a ∈ R. Then the set R \ {a} is not connected (disconnected).
Let U = (−∞, a) and V = (a, +∞). Clearly, U and V are open, non-empty,
disjoint, and their union covers R \ {a}.
Exercise 2.46. Show that in the Euclidean metric space (R1 , d1 ), the set of rational
numbers Q is disconnected.
Proof. First assume that such a map f exists. Let U = f −1 (0) and V = f −1 (1).
Since f (T ) = {0, 1}, U 6= ∅ and V 6= ∅. Also, since f is continuous, U = f −1 (0) =
f −1 (−1/2, 1/2) and V = f −1 (1) = f −1 (1/2, 3/2) are open sets. Moreover, as
f (T ) = {0, 1}, T ⊆ U ∪ V . Obviously, U ∩ V = ∅. These imply that T is dis-
connect.
Now assume that T is disconnected. By definition, there are non-empty, disjoint,
open sets U and V in X such that T ⊆ U ∪ V , U ∩ T 6= ∅ and V ∩ T 6= ∅. Let us
define the map f : T → R as
0 if x ∈ U ∩ T,
f (x) =
1 if x ∈ V ∩ T.
Proof. If S is an interval, then by the definition of an interval, the latter side of the
lemma holds.
Now assume that the latter side of the theorem holds. If S is not bounded from
above, we define b = +∞, and if S is bounded from above, we define b = sup S.
Similarly, if S is not bounded from below, we define a = −∞, and if S is bounded
from below, we let a = inf S.
Let us first show that the open interval (a, b) ⊆ S. To see this, fix an arbitrary
z ∈ (a, b). Since z < b, z cannot be an upper bound for S (otherwise, sup S ≤ z).
Therefore, there is b′ ∈ S such that b′ > z. Similarly, since z > a, z cannot be
a lower bound for S (otherwise inf S ≥ z). Therefore, there is a′ ∈ S such that
a′ < z. Combining these together, we have a′ < z < b′ , a′ ∈ S, and b′ ∈ S. By the
assumption in the latter side of the theorem, we must have z ∈ S. Because z ∈ (a, b)
was arbitrary, we conclude that (a, b) ⊆ S.
Note that the supremum and infimum of a set do not have to be in the set itself.
There are several possibilities for the set S depending on whether each of a and b
belongs to S or not. (of course if a = −∞ or b = +∞, they cannot be in S). Then,
[a, b] if a ∈ S and b ∈ S,
[a, b) if a ∈ S and b ∈
/ S,
S=
(a, b] if a ∈/ S and b ∈ S,
(a, b) if a ∈/ S and b ∈/ S.
Theorem 2.47. Consider the Euclidean metric space (R, d1 ) and let S ⊆ R. If S
is connected, then S is an interval.
Proof. Suppose S is connected, but it is not an interval. By Lemma 2.46, there exist
x and y in S and z ∈ R such that x < z < y and z ∈ / S.
Consider the sets U = (−∞, z) and V = (z, +∞). Then, the sets U and V
are open in R1 , U ∩ V = ∅, S ⊆ U ∪ V , and U ∩ S 6= ∅ (since it contains x) and
V ∩ S 6= ∅ (since it contains y). These show that U and V disconnect S, which is a
contradiction.
Theorem 2.48. For every a and b in R with a < b, the interval [a, b] is connected
in the metric space (R, d1 ).
Proof. Let us assume that [a, b] is disconnected. Then, there must be open sets U
and V in R such that
I = {s ∈ [a, b] | [a, s] ⊆ U }.
As a ∈ I, the set I is not empty, and since I ⊆ [a, b], I is bounded from above.
Therefore, I has a supremum, which we denote by t. Note that t ∈ [a, b], and t may
or may not be in I. We consider three cases below.
(I) Assume that t ∈ I and t = b. These imply that [a, b] ⊂ U , which is a
contradiction, since [a, b] ∩ V 6= ∅ and U ∩ V = ∅.
(II) Assume that t ∈ / I. This implies that t ∈ / U , t 6= a and [a, t) ⊂ U . As
t ∈ [a, b] and [a, b] ⊂ U ∪ V , we must have t ∈ V . Now, since V is an open set
in R, there is δ > 0 such that (t − δ, t + δ) ⊂ V . As U ∩ V = ∅, we must have
(t − δ, t + δ) ∩ U = ∅. This contradicts [a, t) ⊂ U .
(III) Assume that t 6= b. We either have t ∈ U or t ∈ V . If t ∈ U , by the openness
of U , there is δ′ > 0 such that (t − δ′ , t + δ′ ) ⊂ U . This contradicts t = sup I. If
t ∈ V , by the openness of V , there is δ′′ > 0 such that (t − δ′′ , t + δ′′ ) ⊂ V . Thus
(t − δ′′ , t + δ′′ ) ∩ U = ∅. This contradicts t = sup I.
Exercise 2.47.* Consider the Euclidean metric space (R, d1 ), and assume that a
and b are real numbers with a < b.
Proof. Let us assume in the contrary that f (S) is not connected. Then, there are
open sets U and V in A2 such that
U ′ ∩ V ′ = ∅, S ⊂ U ′ ∪ V ′, S ∩ U ′ 6= ∅, S ∩ V ′ 6= ∅.
These are subsets of X. As f is continuous, and the sets (−∞, 0) and (0, +∞) are
open in R, the sets U and V are open in (X, d). Obviously, U ∩ V = ∅. Moreover,
U 6= ∅ since a ∈ U , and V 6= ∅ since b ∈ V . Also, since there is no c ∈ X satisfying
f (c) = 0, U ∪ V = X. These show that X is disconnected, contradicting the
hypothesis in the theorem.
Corollary 2.52. Let f : [a, b] → R be a continuous map, and assume that there are
x and y in [a, b] satisfying f (x) < 0 and f (y) > 0. Then, there is z ∈ [a, b] such
that f (z) = 0.
Proof. This immediately follows from Theorem 2.51 and Theorem 2.48.
Example 2.50. The intervals (0, 1) and (0, 1] are not homeomorphic. Assume in the
contrary that there is a homeomorphism f : (0, 1) → (0, 1]. Let x = f −1 (1) ∈ (0, 1).
Then, f : (0, 1) \ {x} → (0, 1) is a homeomorphism. This contradicts 2.50, since
(0, 1) is connected but (0, 1) \ {x} is nor connected.
By a similar argument, one can show that the pair of intervals (0, 1) and [0, 1],
as well as the pair of interval (0, 1] and [0, 1] are not homeomorphic.
Definition 2.36. Consider a metric space (X, d). Given a pair of points a and b in
X, a path from a to b in X is a continuous map f : [0, 1] → X such that f (0) = a
and f (1) = b. This is also called a path joining a and b.
Remark 2.11. In the above definition, the closed interval [0, 1] can be replaced by
any closed interval [α, β].
Definition 2.37. A metric space (X, d) is called path-connected, if for any pair
of points a and b in X there is a path from a to b in X.
Exercise 2.48. Show that the following metric spaces are path connected.
Proof. Let us assume that there is a metric space (X, d) which is path connected,
but not connected. By Lemma 2.45, there is a continuous map f : X → R satisfying
f (X) = {0, 1}. Then, there exist x and y in X such that f (x) = 0 and f (y) = 1.
Because X is path connected, there is a continuous map g : [0, 1] → X satisfying
g(0) = x and g(1) = y. Then, f ◦ g : [0, 1] → R is continuous, and its image is equal
to {0, 1}.
Let us consider the map (f ◦ g) − 1/2 on the interval [0, 1]. It take both values
−1/2 and +1/2, but it does not take the value 0. However, by Corollary 2.52, this
map must take the value 0 at some point in [0, 1].
By the above theorem, the sets in Exercise 2.48 are connected. In the same fash-
ion, one can show that the cube [0, 1]n is connected in Rn . Compare this argument
with how difficult it is to show that the interval [0, 1] is connected.
Exercise 2.49. Consider the set of all continuous functions f : [0, 1] → R, that is
C([0, 1]), with the metric d1 .
Exercise 2.50.* In this exercise, we aim to show that the converse of Theorem 2.53
is not true.
Consider the following subset of R2 :
That is, A is the union of the oscillating curve which is the graph of sin(1/x), and
the vertical line segment {0} × [−1, +1].
Proof. By Theorem 2.48, the interval [a, b] is connected in R. Since the image of
any connected set by a continuous map is connected (see Theorem 2.49), f ([a, b]) is
connected. Then, by Theorem 2.47, f ([a, b]) must be an interval. By the definition
of interval, f ([a, b]) is equal to one of the sets [m, M ], (m, M ], [m, M ), or (m, M ),
for some m ∈ R ∪ {−∞} and M ∈ R ∪ {+∞} with m ≤ M .
By Theorem 2.37, f ([a, b]) is compact. Thus, m and M are finite numbers and
f ([a, b]) = [m, M ].