Analysis 2 Notes

Analysis II is a continuation of Analysis I, focusing on higher-dimensional Euclidean spaces and the generalization of analysis concepts to metric and topological spaces. The module covers differentiation in higher dimensions, including continuity, derivatives, and theorems such as the Inverse and Implicit Function Theorems. It also introduces metric spaces, topological spaces, compactness, completeness, and connectedness, with an emphasis on definitions, examples, and theorems relevant to these topics.

Analysis II, Term I

Dr. Davoud Cheraghi

December 1, 2023
Chapter 0. Introduction to the module Analysis II, Term I, Page i

Introduction to the module

This is a continuation of the Analysis I module you took in year one. In that module, you learned about the real numbers, completeness, convergence of sequences and series, continuity and differentiability of functions on an interval or $\mathbb{R}$, and the integral of a function on an interval. Analysis II is a single module in year two, delivered during term I and term II.
The content of Analysis II in term I has two parts. In the first part we complete the study of analysis on Euclidean spaces, by introducing the concepts of convergence of sequences in higher dimensional Euclidean spaces $\mathbb{R}^n$, and the continuity and differentiability of maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. In the second part of the module, we generalise these notions of analysis on Euclidean spaces to a broader setting, called metric spaces and topological spaces. That is a setting where one can define the notions of convergence of sequences, completeness of spaces, continuity of maps, etc. Many theorems you have learned in the previous analysis module extend to this setting, and indeed, one can give unified proofs of all those statements at once. Many theorems find a natural form in the setting of metric spaces, and you will see that the proof you already know for a statement can be adapted to the more general setting.
Any section/subsection marked with ∗ is not examinable, but will be valuable in future courses, especially if you take pure analysis courses in your third year and beyond. You should certainly at least read through the notes on these sections, even if you choose not to attempt the questions. I will try to indicate in lectures when I’m covering that material.
Throughout these lecture notes, the definitions are numbered successively within each chapter, that is, in Chapter 1, you will see Definition 1.1, Definition 1.2, Definition 1.3, and so on. The same numbering mechanism applies to Examples, Exercises, and Remarks in each chapter. On the other hand, the results such as lemmas, propositions, corollaries, and theorems are collectively numbered in a successive fashion. That is, in Chapter 1, you will see Proposition 1.1, Theorem 1.2, Theorem 1.3, etc.
Contents

Introduction to the module

1 Differentiation in higher dimensions
1.1 Euclidean spaces
1.1.1 Preliminaries from analysis I
1.1.2 Euclidean space of dimension n
1.1.3 Convergence of sequences in Euclidean spaces
1.1.4 Open sets in Euclidean spaces
1.2 Continuity
1.2.1 Continuity at a point, and continuity on an open set
1.3 Derivative of a map of Euclidean spaces
1.3.1 Derivative as a linear map
1.3.2 Chain rule
1.4 Directional derivatives
1.4.1 Rates of change and partial derivatives
1.4.2 Relation between partial derivatives and differentiability
1.5 Higher derivatives
1.5.1 Higher derivatives as linear maps
1.5.2 Symmetry of mixed partial derivatives
1.5.3 Taylor’s theorem
1.6 Inverse and Implicit function theorems
1.6.1 Inverse function theorem
1.6.2 Implicit Function Theorem
1.6.3 * Sketch of the proof of the Implicit Function Theorem
1.6.4 The general form of the Implicit Function Theorem
1.6.5 * Equivalence of the two theorems

2 Metric and topological spaces
2.1 Metric spaces
2.1.1 Motivation and definition
2.1.2 Examples of metric spaces
2.1.3 Normed vector spaces
2.1.4 Open sets in metric spaces
2.1.5 Convergence in metric spaces
2.1.6 Closed sets in metric spaces
2.1.7 Interior, isolated, limit, and boundary points in metric spaces
2.1.8 Continuous maps of metric spaces
2.2 Topological spaces
2.2.1 Motivation
2.2.2 Topology on a set
2.2.3 Convergence, and Hausdorff property
2.2.4 Closed sets in topological spaces
2.2.5 Continuous maps on topological spaces
2.3 Compactness
2.3.1 Compactness by covers
2.3.2 Sequential compactness
2.3.3 Continuous maps and compact sets
2.4 Completeness
2.4.1 Complete metric spaces and Banach space
2.4.2 Arzelà-Ascoli
2.4.3 Fixed point Theorem
2.5 Connectedness
2.5.1 Connected sets
2.5.2 Continuous maps and connected sets
2.5.3 Path connected sets
Chapter 1. Differentiation in higher dimensions Analysis II, Term I, Page 1

Chapter 1

Differentiation in higher
dimensions

1.1 Euclidean spaces


1.1.1 Preliminaries from analysis I
In this chapter we are going to extend some of the ideas that you saw last year (such
as limits and continuity) to higher dimensions. The definitions are almost identical,
so this should mostly feel like a review chapter to begin with, although some of the
ideas we are going to approach from a different point of view.
Throughout these notes we frequently use the standard notations for the set of
natural numbers
N = {1, 2, 3, . . .},

the set of integers


Z = {. . . , −2, −1, 0, 1, 2, . . .},

the set of rational numbers

Q = {p/q | p ∈ Z, q ∈ Z \ {0}},

and the set of real numbers R. The set of real numbers is obtained as the completion
of Q. We may add, multiply and subtract elements of R, and we can divide by
elements of R \ {0}. Note that some authors use the notation N to denote the set
{0, 1, 2, . . . }, but we will omit 0 from this set.
On R we have a notion of ordering ≤, so that we may say whether a real number is
greater than, less than or equal to another. Moreover, R satisfies the completeness
axiom, that is, if A ⊂ R is non-empty and bounded above, then A has a least upper
bound. The standard notation for the least upper bound of A is sup(A).

Lecture notes for Week 1


An important function defined on all real numbers is the modulus function, defined as
\[ |x| := \begin{cases} x, & x \ge 0, \\ -x, & x < 0. \end{cases} \]
This function has the following properties:

(i) for all x ∈ R, we have |x| ≥ 0, with |x| = 0 if and only if x = 0,

(ii) for all x and y in R, |xy| = |x| |y|,

(iii) for all x and y in R,


|x + y| ≤ |x| + |y| .

The third property in the above list is called the triangle inequality for the modulus function.

1.1.2 Euclidean space of dimension n


For $n \ge 1$, the $n$-dimensional Euclidean space, denoted by $\mathbb{R}^n$, is defined as the set of ordered $n$-tuples $(x^1, x^2, \dots, x^n)$, where each $x^i \in \mathbb{R}$, for $i = 1, 2, \dots, n$. Each such $n$-tuple is denoted by a single letter $x = (x^1, x^2, \dots, x^n)$ and will be referred to as a point in $\mathbb{R}^n$. The entries $x^i$ are called the coordinates of $x$.
One may see each element of $\mathbb{R}^n$ as a row vector with $n$ real components, or as a column vector with $n$ real components. We do not make this distinction, except when a matrix is acting on the point $x$. When a matrix $M$ acts on a vector with the same components as $x$, we write $M x^t$ to make it clear that $x$ is viewed as a column vector. Here $t$ denotes the transpose operation.
We shall try to stick to the convention of using superscripts to label components of vectors, and subscripts to label different vectors, so that $x_1, x_2 \in \mathbb{R}^n$ are two different vectors, while $x^1, x^2 \in \mathbb{R}$ are the components of one vector.
If $x$ and $y$ are elements of $\mathbb{R}^n$ with
\[ x = (x^1, \dots, x^n), \qquad y = (y^1, \dots, y^n), \]
we can add these two elements according to
\[ x + y = (x^1 + y^1, \dots, x^n + y^n). \]
Moreover, for every $\lambda \in \mathbb{R}$, we define
\[ \lambda x = (\lambda x^1, \dots, \lambda x^n). \]
With these definitions, $\mathbb{R}^n$ is a vector space over $\mathbb{R}$.


The inner product,
\[ \langle \cdot , \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}, \]
is defined as
\[ \langle (x^1, \dots, x^n), (y^1, \dots, y^n) \rangle = \sum_{i=1}^{n} x^i y^i. \]
Using the inner product, we may define the length, or norm, function
\[ \|\cdot\| : \mathbb{R}^n \to [0, \infty) \]
as
\[ \|x\| = \sqrt{\langle x, x \rangle} = \langle x, x \rangle^{1/2}. \]
Note that the inner product of two vectors is a real number, not a vector.
The norm function on $\mathbb{R}^n$ has the following properties:

(i) for all $x \in \mathbb{R}^n$, we have $\|x\| \ge 0$, with $\|x\| = 0$ if and only if $x = 0$,

(ii) for all $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$, $\|\lambda x\| = |\lambda| \, \|x\|$,

(iii) for all $x$ and $y$ in $\mathbb{R}^n$,
\[ \|x + y\| \le \|x\| + \|y\|. \tag{1.1} \]

The third property in the above list is called the triangle inequality for the norm on $\mathbb{R}^n$.

Remark 1.1. As we shall see later, these properties can be used in an abstract fashion to define more general “normed vector spaces”. The norm gives us a useful notion of “distance” between two points, that is, the distance from $x$ to $y$ is given by $\|x - y\|$. Notice that if $n = 1$ we have $|\cdot| = \|\cdot\|$, and we will use either interchangeably in this case.
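
As a small numerical illustration of the definitions above, the following Python sketch computes the inner product, the norm and the distance for vectors stored as lists. The function names `inner`, `norm` and `dist` and the sample vectors are chosen here for illustration only; they are not part of the notes.

```python
import math

def inner(x, y):
    """Inner product <x, y>: the sum of the products x^i * y^i."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm ||x|| = <x, x>^(1/2)."""
    return math.sqrt(inner(x, x))

def dist(x, y):
    """Distance ||x - y|| between the points x and y."""
    return norm([a - b for a, b in zip(x, y)])

# Hypothetical sample vectors in R^2.
x = [3.0, 4.0]
y = [1.0, -2.0]
s = [a + b for a, b in zip(x, y)]

print(norm(x))                        # length of x
print(inner(x, y))                    # a real number, not a vector
print(norm(s) <= norm(x) + norm(y))   # triangle inequality (1.1) for this pair
```

A check of this kind on particular vectors is of course not a proof of the properties (i) to (iii); it only shows how the definitions are evaluated.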

Exercise 1.1. (a) Show that the inner product satisfies the following properties: for all $x$, $y$, and $z$ in $\mathbb{R}^n$ and all $a \in \mathbb{R}$,
\[ \langle x, y \rangle = \langle y, x \rangle, \qquad \langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle, \qquad \langle a x, y \rangle = a \langle x, y \rangle. \]

(b) For $t \in \mathbb{R}$ and $x, y \in \mathbb{R}^n$, show that:
\[ \|x + t y\|^2 = \|x\|^2 + 2t \langle x, y \rangle + t^2 \|y\|^2 \ge 0. \tag{1.2} \]

(c) By thinking of (1.2) as a quadratic in $t$, and considering its possible roots, deduce the Cauchy-Schwarz inequality:
\[ |\langle x, y \rangle| \le \|x\| \, \|y\|. \tag{1.3} \]
When does equality hold?

(d) Deduce the triangle inequality (1.1).

(e) Show the reverse triangle inequality:
\[ \big| \|x\| - \|y\| \big| \le \|x - y\|. \]

Exercise 1.2. Suppose $x = (x^1, \dots, x^n) \in \mathbb{R}^n$.

(a) Show that:
\[ \max_{k=1,\dots,n} |x^k| \le \|x\|. \tag{1.4} \]

(b) Show that:
\[ \|x\| \le \sqrt{n} \max_{k=1,\dots,n} |x^k|. \tag{1.5} \]
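
Before attempting the proofs, it can help to see the two inequalities (1.4) and (1.5) hold on concrete data. The sketch below tests them on randomly generated vectors; the helper names, the sample size and the small tolerance `tol` (which absorbs floating-point rounding) are assumptions made for this illustration.

```python
import math
import random

def norm(x):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(a * a for a in x))

def max_abs(x):
    """Largest absolute value among the coordinates of x."""
    return max(abs(a) for a in x)

def check_bounds(x, tol=1e-12):
    """True if max|x^k| <= ||x|| <= sqrt(n) * max|x^k|, up to rounding."""
    n = len(x)
    m = max_abs(x)
    return m <= norm(x) + tol and norm(x) <= math.sqrt(n) * m + tol

random.seed(0)
# Hypothetical test data: 1000 random vectors in R^5.
samples = [[random.uniform(-10, 10) for _ in range(5)] for _ in range(1000)]
print(all(check_bounds(x) for x in samples))
```

The vector (1, 1, ..., 1) gives equality in (1.5), and a vector with a single non-zero coordinate gives equality in (1.4), so neither constant can be improved.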

1.1.3 Convergence of sequences in Euclidean spaces


Now that we have a few definitions relating to $\mathbb{R}^n$, we’re ready to revisit some concepts from first year analysis and see how they can be extended to higher dimensions. A sequence in $\mathbb{R}^n$ is an ordered list
\[ x_0, x_1, x_2, \dots, \]
with each $x_i \in \mathbb{R}^n$, for $i = 0, 1, 2, \dots$. This is often written $(x_i)_{i=0}^{\infty}$, or $(x_i)_{i \in \mathbb{N}}$. A very important concept relating to sequences is convergence.

Definition 1.1. A sequence $(x_i)_{i=0}^{\infty}$ with $x_i \in \mathbb{R}^n$ converges to (the vector) $x \in \mathbb{R}^n$ if the following holds: for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that for all $i \ge N$ we have
\[ \|x_i - x\| < \epsilon. \]
We then write:
\[ x_i \to x, \quad \text{as } i \to \infty, \]
or
\[ \lim_{i \to \infty} x_i = x. \]

One may compare the above definition to the one for convergence of a sequence of real numbers. Indeed, this notion is intimately related to convergence of real numbers, as stated in the next proposition.

Proposition 1.1. The sequence of vectors $(x_i)_{i=0}^{\infty}$ with $x_i \in \mathbb{R}^n$ converges to the vector $x \in \mathbb{R}^n$ if and only if each component of $x_i$ converges to the corresponding component of $x$. That is, if we write:
\[ x_i = (x_i^1, \dots, x_i^n), \quad \text{and} \quad x = (x^1, \dots, x^n), \]
then, $x_i \to x$ as $i \to \infty$ if and only if for all $k = 1, \dots, n$, $x_i^k \to x^k$ as $i \to \infty$.

Proof. Let us first assume that for all $k = 1, 2, \dots, n$,
\[ x_i^k \to x^k, \quad \text{as } i \to \infty. \]
Fix an arbitrary $\epsilon > 0$. Then, for each $k = 1, \dots, n$, we apply the definition of convergence of $x_i^k \to x^k$ to $\epsilon/\sqrt{n}$ to obtain $N_k \in \mathbb{N}$ such that for all $i \ge N_k$ we have
\[ |x_i^k - x^k| < \frac{\epsilon}{\sqrt{n}}. \]
Let $N = \max\{N_1, \dots, N_n\}$. Then, for every $i \ge N$, we have
\[ \max_{k=1,\dots,n} |x_i^k - x^k| < \frac{\epsilon}{\sqrt{n}}. \]
Now, recall from the inequality in (1.5) that for every $y = (y^1, y^2, \dots, y^n) \in \mathbb{R}^n$,
\[ \|y\| \le \sqrt{n} \max_{k=1,\dots,n} |y^k|, \]
so we deduce
\[ \|x_i - x\| \le \sqrt{n} \max_{k=1,\dots,n} |x_i^k - x^k| < \epsilon. \]
This establishes the result in one direction.

Now assume that
\[ \lim_{i \to \infty} x_i = x. \]
Fix an arbitrary integer $k$ with $1 \le k \le n$, and an arbitrary $\epsilon > 0$. We aim to show that $x_i^k \to x^k$, as $i \to \infty$. The definition of convergence of $x_i \to x$, as $i \to \infty$, with $\epsilon$, gives us $N \in \mathbb{N}$ such that for all $i \ge N$ we have
\[ \|x_i - x\| < \epsilon. \]
Recall from Exercise 1.2, inequality (1.4), that for every $y = (y^1, y^2, \dots, y^n) \in \mathbb{R}^n$,
\[ \max_{j=1,\dots,n} |y^j| \le \|y\|. \]
In particular, for all $i \ge N$, we have
\[ |x_i^k - x^k| \le \max_{j=1,\dots,n} |x_i^j - x^j| \le \|x_i - x\| < \epsilon. \]
As $\epsilon > 0$ was arbitrary, this shows that $x_i^k$ converges to $x^k$, as $i \to \infty$.
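
The first half of the proof is constructive: it produces $N$ from the component-wise thresholds. The sketch below follows that recipe for one concrete sequence in $\mathbb{R}^2$. The sequence `x_seq`, the value of `eps` and the finite range of indices that is tested are all choices made for this illustration, not taken from the notes.

```python
import math

def x_seq(i):
    """Hypothetical example sequence in R^2 converging to (0, 2)."""
    return (1.0 / (i + 1), 2.0 + (-1) ** i / (i + 1))

limit = (0.0, 2.0)

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def error(i):
    """The quantity ||x_i - x|| from Definition 1.1."""
    return norm([a - b for a, b in zip(x_seq(i), limit)])

eps = 1e-3
n = 2
# Each component satisfies |x_i^k - x^k| = 1/(i + 1), which is below
# eps/sqrt(n) as soon as i + 1 > sqrt(n)/eps.
N = math.floor(math.sqrt(n) / eps) + 1
print(N, all(error(i) < eps for i in range(N, N + 1000)))
```

Only finitely many indices can be tested this way; the proof is what guarantees the bound for every $i \ge N$.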

Exercise 1.3. Suppose that $(x_i)_{i=0}^{\infty}$ and $(y_i)_{i=0}^{\infty}$ are two sequences in $\mathbb{R}^n$ with
\[ \lim_{i \to \infty} x_i = x, \qquad \lim_{i \to \infty} y_i = y. \]

(a) Show that
\[ x_i + y_i \to x + y \quad \text{as } i \to \infty. \]

(b) Show that
\[ \langle x_i, y_i \rangle \to \langle x, y \rangle \quad \text{as } i \to \infty, \]
and deduce that
\[ \|x_i\| \to \|x\| \quad \text{as } i \to \infty. \]

(c) Suppose that $(a_i)_{i=0}^{\infty}$ is a sequence in $\mathbb{R}$ with $a_i \to a$ as $i \to \infty$. Show that:
\[ a_i x_i \to a x, \quad \text{as } i \to \infty. \]

1.1.4 Open sets in Euclidean spaces


In dimension one, you are familiar with sets of the form $(a, b)$ and $[a, b]$, i.e. the open interval and the closed interval respectively. These form natural domains for functions in dimension one, and it is fairly general to present theorems about maps in dimension one on such intervals. In higher dimensions, one may generalise these sets to sets of the form
\[ (a_1, b_1) \times (a_2, b_2) \times \cdots \times (a_n, b_n) = \{ (x^1, x^2, \dots, x^n) \in \mathbb{R}^n \mid \text{for } 1 \le i \le n,\ a_i < x^i < b_i \}, \]
or
\[ [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n] = \{ (x^1, x^2, \dots, x^n) \in \mathbb{R}^n \mid \text{for } 1 \le i \le n,\ a_i \le x^i \le b_i \}. \]
But this is very restrictive and does not capture the same level of generality as intervals in dimension one. The domains of maps in higher dimensions may appear in many forms. Due to this, we present a class of subsets of $\mathbb{R}^n$, called open sets.
For $x \in \mathbb{R}^n$ and the real number $r > 0$, the open ball of radius $r$ about $x$ is defined as the set
\[ B_r(x) = \{ y \in \mathbb{R}^n : \|x - y\| < r \}. \]
That is, $B_r(x)$ consists of all points in $\mathbb{R}^n$ which are at distance less than $r$ from $x$. We sometimes denote the open ball $B_r(x)$ by $B(x, r)$. Both notations are widely used in mathematics.
Definition 1.2. A set $U \subseteq \mathbb{R}^n$ is called open in $\mathbb{R}^n$, if for every $x \in U$ there exists $r > 0$ such that $B_r(x) \subseteq U$.
In other words, about any point in an open set we can find a small ball which is entirely contained in the set. Note that in this definition, the radius of the ball is allowed to depend on $x$. See Figure 1.1.
We may compare the above definition with the definition of open sets in $\mathbb{R}$ you saw in Analysis I. Recall that a set $I \subseteq \mathbb{R}$ is open in $\mathbb{R}$, if for every $x \in I$, there is $\delta > 0$ such that $(x - \delta, x + \delta) \subseteq I$. This definition is consistent with the one we have given in $\mathbb{R}^n$, since in $\mathbb{R}^1$, $B_\delta(x) = (x - \delta, x + \delta)$.
Example 1.1. The ball $B_1(0)$ is open in $\mathbb{R}^n$. To see this, suppose $x \in B_1(0)$, so that $\|x\| < 1$. Let $r = (1 - \|x\|)/2$. We need to show that $B_r(x) \subseteq B_1(0)$. To that end, let $y \in B_r(x)$ be an arbitrary point. Using the triangle inequality for the norm in $\mathbb{R}^n$, we have
\[ \|y\| = \|y - x + x\| \le \|y - x\| + \|x\| < r + \|x\| = \frac{1 - \|x\|}{2} + \|x\| = \frac{1 + \|x\|}{2} < 1. \]
This means that $y \in B_1(0)$.
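
The choice of radius in Example 1.1 can be tried out numerically. The following sketch picks one point $x$ of the unit ball, computes $r = (1 - \|x\|)/2$, and samples points of $B_r(x)$ to confirm that they lie in $B_1(0)$. The point `x`, the sampling scheme and all helper names are assumptions of this illustration.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def safe_radius(x):
    """Radius r = (1 - ||x||)/2 used in Example 1.1; needs ||x|| < 1."""
    return (1.0 - norm(x)) / 2.0

def random_point_in_ball(center, r, rng):
    """Return a point y with ||y - center|| < r (random direction and length)."""
    d = [rng.gauss(0.0, 1.0) for _ in center]
    length = norm(d)
    scale = r * rng.random() / length if length > 0 else 0.0
    return [c + scale * a for c, a in zip(center, d)]

rng = random.Random(1)
x = [0.6, 0.0, 0.7]      # hypothetical point with ||x|| < 1
r = safe_radius(x)
inside = all(norm(random_point_in_ball(x, r, rng)) < 1.0 for _ in range(10000))
print(r > 0, inside)
```

Sampling cannot replace the triangle-inequality argument, but it makes visible that the radius shrinks as $x$ approaches the boundary of the ball.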

Lecture notes for Week 2


Figure 1.1: An open set in $\mathbb{R}^2$ in cyan, and some balls inside it. The radius of the ball depends on the location of the point.

Observe that in the above example, one can replace 1 with any other positive real number, and the result is still valid. That is, for every $\delta > 0$, the set $B_\delta(0)$ is open in $\mathbb{R}^n$. Similarly, one can also replace 0 with any $y \in \mathbb{R}^n$. Thus, in general, for any $y \in \mathbb{R}^n$ and any $\delta > 0$, $B_\delta(y)$ is open in $\mathbb{R}^n$.

Example 1.2. The set $A = \{ x \in \mathbb{R}^n : \|x\| \le 1 \}$ is not open. Clearly $y := (1, 0, \dots, 0)$ belongs to $A$. On the other hand, if $r > 0$ then $z = (1 + r/2, 0, \dots, 0)$ belongs to $B_r(y)$ but not to $A$, so there is no $r > 0$ such that $B_r(y) \subset A$.

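Exercise 1.4. Which of the following subsets of $\mathbb{R}^n$ are open:

(a) $\mathbb{R}^n$?

(b) $\emptyset$?

(c) $\{ x = (x^1, \dots, x^n) \in \mathbb{R}^n \mid x^1 > 0 \}$?

(d) $\{ x = (x^1, \dots, x^n) \in \mathbb{R}^n \mid \forall i,\ x^i \in [0, 1) \}$?

(e) $\{ x = (x^1, \dots, x^n) \in \mathbb{R}^n \mid \forall i,\ x^i \in \mathbb{Q} \}$?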




Exercise 1.5. Let $(x_i)_{i=0}^{\infty}$ be a sequence in $\mathbb{R}^n$ with $\lim_{i \to \infty} x_i = x \in \mathbb{R}^n$. Assume that there is $r > 0$ such that for all $i \ge 0$, we have $\|x_i\| < r$. Show that
\[ \|x\| \le r. \]

Exercise 1.6. (a) Show that if $U_1$ and $U_2$ are open sets in $\mathbb{R}^n$, then $U_1 \cup U_2$ and $U_1 \cap U_2$ are open in $\mathbb{R}^n$.

(b) Suppose that $U_\alpha$, for $\alpha$ in an index set $I$, are open sets in $\mathbb{R}^n$.

(i) Show that the set $\bigcup_{\alpha \in I} U_\alpha$ is open in $\mathbb{R}^n$.

(ii) Give an example showing that $\bigcap_{\alpha \in I} U_\alpha$ need not be open.

Remark 1.2. It is worth noting that the notion of open sets in $\mathbb{R}^n$ relies on the length function $\|\cdot\|$ we have on $\mathbb{R}^n$. As we shall see in the next chapter, one can consider functions (called metrics) with similar properties on a wide range of other sets (such as the set of all continuous functions from $[0, 1]$ to $\mathbb{R}$ or the set of all sequences in $[0, 1]$, etc). These lead to notions of open sets on such sets.

1.2 Continuity
Last year, you learned about the notion of continuity for functions from R (or subsets
of R) to R. In this section we revisit those definitions and upgrade them to higher
dimensions. In fact, the definitions we shall give are almost identical: the only thing
that changes is that we use the appropriate “norm” for the domain and range.

1.2.1 Continuity at a point, and continuity on an open set


We start with the simple definition.

Definition 1.3. Let $A \subset \mathbb{R}^n$ be an open set, and suppose $f : A \to \mathbb{R}^m$. We say that $f$ is continuous at $p \in A$ if the following holds: for every $\epsilon > 0$, there exists $\delta > 0$ such that for all $x \in A$ with $\|x - p\| < \delta$ we have
\[ \|f(x) - f(p)\| < \epsilon. \]
If $f$ is continuous at every $p$ in $A$, we say $f$ is continuous on $A$.

We can think of this as saying “$f$ maps points in $A$ close to $p$ to points in $\mathbb{R}^m$ close to $f(p)$”. Notice that in the definition above, the symbol $\|\cdot\|$ is playing two slightly different roles: as the norm on $\mathbb{R}^n$ and the norm on $\mathbb{R}^m$.

Remark 1.3. The words “function” and “map” are not identical. For $f : X \to Y$, we use the word “function” when the target space $Y$ is the real numbers or the complex numbers (or in general a field). Otherwise, we use the word “map”. Of course it is correct to refer to $f : X \to \mathbb{R}$ as a map, but it is uncommon to refer to $f : X \to Y$ as a function when $Y$ is not a set of numbers in which one can add and multiply elements. On the other hand, it is common in analysis and geometry to see expressions like, “let $f$ be a function on $X$”, which means that $f : X \to \mathbb{R}$ or $f : X \to \mathbb{C}$. In those cases, the target space is understood from the context.

Example 1.3. The map $f : \mathbb{R}^n \to \mathbb{R}$ defined as $f(x) = \|x\|$ is continuous on $\mathbb{R}^n$.
To show this, fix an arbitrary $p \in \mathbb{R}^n$. Suppose $\|x - p\| < \delta$, then by the reverse triangle inequality (see Exercise 1.1) we have:
\[ |f(x) - f(p)| = \big| \|x\| - \|p\| \big| \le \|x - p\| < \delta. \]
Thus we can take $\delta = \epsilon$ and we have satisfied the criteria for continuity of $f$ at $p$.

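Example 1.4. Every linear map $\Lambda : \mathbb{R}^n \to \mathbb{R}^m$ is continuous.
Let $\{e_j\}_{j=1}^{n}$ be the canonical basis for $\mathbb{R}^n$, that is, all entries of $e_j$ are 0 except the $j$-th entry which is 1. We may define the real number
\[ M = \max_{j=1,\dots,n} \|\Lambda(e_j)\|. \]
If $M = 0$, then $\Lambda$ is the zero map and there is nothing to prove, so assume $M > 0$. We note that,
\begin{align*}
\|\Lambda(x) - \Lambda(p)\| = \|\Lambda(x - p)\| &= \Big\| \Lambda\Big( \sum_{j=1}^{n} e_j (x - p)^j \Big) \Big\| \\
&= \Big\| \sum_{j=1}^{n} (x - p)^j \Lambda(e_j) \Big\| \\
&\le \sum_{j=1}^{n} \big\| (x - p)^j \Lambda(e_j) \big\| \\
&\le \sum_{j=1}^{n} |(x - p)^j| \, \|\Lambda(e_j)\| \\
&\le M \sum_{j=1}^{n} |(x - p)^j|.
\end{align*}
Thus, using the inequality in Equation (1.4),
\[ \|\Lambda(x) - \Lambda(p)\| \le M \sum_{j=1}^{n} \|x - p\| = M n \|x - p\|. \]
Thus, if we take $\delta = \epsilon/(2Mn)$, then for any $x$ with $\|x - p\| < \delta$, we have
\[ \|\Lambda(x) - \Lambda(p)\| < M n \frac{\epsilon}{2Mn} < \epsilon, \]
so $\Lambda$ is continuous.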
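
The key estimate in Example 1.4 is $\|\Lambda(x) - \Lambda(p)\| \le M n \|x - p\|$. The sketch below evaluates both sides for a linear map given by a matrix. The matrix `A`, the random test points and the helper names are hypothetical choices for this illustration; the tolerance `1e-9` only absorbs rounding error.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def apply(matrix, x):
    """Apply the linear map given by an m x n matrix (list of rows) to x."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def bound_constant(matrix):
    """M = max_j ||Lambda(e_j)||, i.e. the largest column norm."""
    n = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n)]
    return max(norm(c) for c in columns)

# Hypothetical 2 x 3 matrix, representing a linear map from R^3 to R^2.
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, 4.0]]
n = 3
M = bound_constant(A)

rng = random.Random(0)
ok = True
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(n)]
    p = [rng.uniform(-5, 5) for _ in range(n)]
    lhs = norm([a - b for a, b in zip(apply(A, x), apply(A, p))])
    rhs = M * n * norm([a - b for a, b in zip(x, p)])
    ok = ok and lhs <= rhs + 1e-9
print(M, ok)
```

The constant $Mn$ is far from optimal, but the argument only needs some finite constant, which is why the crude bound is sufficient for continuity.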

Example 1.5. The map $f : \mathbb{R}^n \to \mathbb{R}$ defined as $f(x^1, \dots, x^n) = x^1$ is continuous on $\mathbb{R}^n$.
To see this, fix an arbitrary $p \in \mathbb{R}^n$. Suppose $\|x - p\| < \delta$, then by the inequality in (1.4) we have:
\[ |f(x) - f(p)| = |x^1 - p^1| \le \max_{k=1,\dots,n} |x^k - p^k| \le \|x - p\| < \delta, \]
so we may take $\delta = \epsilon$ and we have satisfied the condition for continuity. The same argument shows that all of the coordinate maps (i.e. the maps taking $x$ to $x^k$) are continuous.

Theorem 1.2. Let $A$ be an open subset of $\mathbb{R}^n$ and $B$ be an open subset of $\mathbb{R}^m$. Suppose $f : A \to B$ is continuous at $p$ and $g : B \to \mathbb{R}^l$ is continuous at $f(p)$. Then $g \circ f : A \to \mathbb{R}^l$ is continuous at $p$.

Proof. Fix an arbitrary $\epsilon > 0$. Since $g$ is continuous at $f(p)$, we know that there exists $\delta_1 > 0$ such that for any $y \in B$ with $\|y - f(p)\| < \delta_1$, we have $\|g(y) - g(f(p))\| < \epsilon$. Similarly, since $f$ is continuous at $p$, we know that there exists $\delta > 0$ such that for any $x \in A$ with $\|x - p\| < \delta$, we have $\|f(x) - f(p)\| < \delta_1$. Combining these two statements and taking $y = f(x)$, we deduce that if $x \in A$ with $\|x - p\| < \delta$, we have $\|g(f(x)) - g(f(p))\| < \epsilon$.

It is sometimes useful to express the continuity of a map in a slightly different way, for which we need the following definition:

Definition 1.4. Let $A$ be an open subset of $\mathbb{R}^n$ and suppose $f : A \to \mathbb{R}^m$. For $p \in A$, we say that the limit of $f$ as $x$ tends to $p$ is equal to $q \in \mathbb{R}^m$, if the following holds: for every $\epsilon > 0$ there exists $\delta > 0$ such that for all $x \in A$ with $0 < \|x - p\| < \delta$ we have
\[ \|f(x) - q\| < \epsilon. \]
In this case, we write
\[ \lim_{x \to p} f(x) = q. \]

Note that in the above definition, we do not allow $x = p$. With this notion of a limit in hand, we can give the definition of continuity more compactly as:

“$f$ is continuous at $p$, if $\lim_{x \to p} f(x) = f(p)$.”

Theorem 1.3. Suppose $A$ is an open subset of $\mathbb{R}^n$, $p \in A$, and $f, g : A \to \mathbb{R}$ with
\[ \lim_{x \to p} f(x) = F, \qquad \lim_{x \to p} g(x) = G. \]
Then

(i) $\lim_{x \to p} (f(x) + g(x)) = F + G$,

(ii) $\lim_{x \to p} (f(x) g(x)) = F G$,

(iii) If, furthermore, $G \ne 0$, then:
\[ \lim_{x \to p} \frac{f(x)}{g(x)} = \frac{F}{G}. \]

Proof. (i) Fix an arbitrary $\epsilon > 0$. Since $\lim_{x \to p} f(x) = F$, we know that there exists $\delta_1 > 0$ such that for every $x \in A$ with $0 < \|x - p\| < \delta_1$,
\[ |f(x) - F| < \frac{\epsilon}{2}. \]
Similarly, there exists $\delta_2 > 0$ such that for every $x \in A$ with $0 < \|x - p\| < \delta_2$,
\[ |g(x) - G| < \frac{\epsilon}{2}. \]
Define $\delta = \min\{\delta_1, \delta_2\}$. Evidently $\delta > 0$. For every $x \in A$ with $0 < \|x - p\| < \delta$, by the triangle inequality, we have
\[ |f(x) + g(x) - (F + G)| \le |f(x) - F| + |g(x) - G| < \epsilon. \]

(ii) Fix an arbitrary $\epsilon > 0$, and assume without loss of generality that $\epsilon < 3$ (Why can we assume this?). Since $\lim_{x \to p} f(x) = F$, we know that there exists $\delta_1 > 0$ such that for every $x \in A$ with $0 < \|x - p\| < \delta_1$,
\[ |f(x) - F| < \frac{\epsilon}{3(1 + |G|)}. \]
Similarly, there exists $\delta_2 > 0$ such that for every $x \in A$ with $0 < \|x - p\| < \delta_2$,
\[ |g(x) - G| < \frac{\epsilon}{3(1 + |F|)}. \]
To control $f(x)g(x) - FG$, we add and subtract the same terms, so that we obtain terms of the form $f(x) - F$ and $g(x) - G$. That is,
\begin{align*}
f(x)g(x) - FG &= f(x)g(x) - f(x) \cdot G + f(x) \cdot G - F \cdot G \\
&= f(x)(g(x) - G) + (f(x) - F) \cdot G \\
&= (f(x) - F + F)(g(x) - G) + (f(x) - F) \cdot G \\
&= (f(x) - F)(g(x) - G) + F \cdot (g(x) - G) + (f(x) - F) \cdot G.
\end{align*}
Now, take $\delta = \min\{\delta_1, \delta_2\}$. For every $x \in A$ with $0 < \|x - p\| < \delta$, by the triangle inequality, we have
\begin{align*}
|f(x)g(x) - FG| &\le |f(x) - F| \, |g(x) - G| + |F| \, |g(x) - G| + |G| \, |f(x) - F| \\
&< \frac{\epsilon^2}{9(1 + |F|)(1 + |G|)} + \frac{\epsilon |F|}{3(1 + |F|)} + \frac{\epsilon |G|}{3(1 + |G|)} \\
&< \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.
\end{align*}

(iii) Given the previous part, it suffices to show that if $\lim_{x \to p} g(x) = G$ with $G \ne 0$, then
\[ \lim_{x \to p} \frac{1}{g(x)} = \frac{1}{G}. \]
Fix an arbitrary $\epsilon > 0$. Since $\lim_{x \to p} g(x) = G$, we know that there exists $\delta_1 > 0$ such that for every $x \in A$ with $0 < \|x - p\| < \delta_1$,
\[ |g(x) - G| < \frac{\epsilon |G|^2}{2}. \]
Also, since $G \ne 0$, $|G|/2 > 0$, and hence, there is $\delta_2 > 0$ such that for every $x \in A$ with $0 < \|x - p\| < \delta_2$,
\[ |g(x) - G| < \frac{|G|}{2}. \]
By the triangle inequality, this implies that
\[ |g(x)| = |g(x) - G + G| \ge |G| - |g(x) - G| > |G| - \frac{|G|}{2} = \frac{|G|}{2}. \]
Let $\delta = \min\{\delta_1, \delta_2\}$. For every $x \in A$ with $0 < \|x - p\| < \delta$, we have
\[ \left| \frac{1}{g(x)} - \frac{1}{G} \right| = |G - g(x)| \cdot \frac{1}{|G|} \cdot \frac{1}{|g(x)|} < \frac{\epsilon |G|^2}{2} \cdot \frac{1}{|G|} \cdot \frac{2}{|G|} = \epsilon. \]
This completes the proof.

Corollary 1.4. Suppose $A$ is an open set in $\mathbb{R}^n$ and $f, g : A \to \mathbb{R}$ are continuous at $p \in A$. Then,

(i) $f + g$ is continuous at $p$.

(ii) $f g$ is continuous at $p$.

(iii) If, furthermore, $g(p) \ne 0$, then $f/g$ is continuous at $p$.
g
Exercise 1.7. Assume that $A$ is an open set in $\mathbb{R}^n$ and $f : A \to \mathbb{R}^m$. Show that $\lim_{x \to p} f(x) = F$, if and only if, for any sequence $(x_i)_{i=0}^{\infty}$ in $A \setminus \{p\}$ with $\lim_{i \to \infty} x_i = p$,
\[ \lim_{i \to \infty} f(x_i) = F. \]

Exercise 1.8. (a) Show that the map $f : \mathbb{R} \to \mathbb{R}^n$ defined as $f(x) = (x, 0, \dots, 0)$ is continuous on $\mathbb{R}$.

(b) Let $A$ be an open set in $\mathbb{R}^n$ and let $f^1, f^2, \dots, f^m$ be functions from $A$ to $\mathbb{R}$. Consider the map $f : A \to \mathbb{R}^m$ defined as
\[ f(x^1, \dots, x^n) = \big( f^1(x^1, \dots, x^n), \dots, f^m(x^1, \dots, x^n) \big). \]
Show that $f$ is continuous at $p \in A$, if and only if, for every $k = 1, \dots, m$ the map $f^k : A \to \mathbb{R}$ is continuous at $p$.

(c) Show that the map $f : \mathbb{R}^n \to \mathbb{R}$ defined as $f(x^1, x^2, \dots, x^n) = 3 x^1 (x^2)^5 + 4 x^2 (x^n)^7$ is continuous on $\mathbb{R}^n$. Here, $(x^j)^m$ denotes the coordinate $x^j$ raised to the power $m$.

With the above results, one can build many continuous maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. For example,
\[ (x^1, x^2) \mapsto \big( \sin(x^1 x^2), \cos(x^2) \big), \]
\[ (x^1, x^2, x^3) \mapsto \left( \frac{x^1 - x^2}{1 + (x^2)^2},\ e^{x^3} \right). \]
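Exercise 1.9 (*). (a) Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous on $\mathbb{R}^n$, and suppose $U \subset \mathbb{R}^m$ is open. Show that:
\[ f^{-1}(U) := \{ x \in \mathbb{R}^n : f(x) \in U \} \]
is open.

(b) Suppose that $f : \mathbb{R}^n \to \mathbb{R}^m$ has the property that $f^{-1}(U) \subset \mathbb{R}^n$ is open for every open set $U \subset \mathbb{R}^m$. Show that $f$ is continuous on $\mathbb{R}^n$.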

1.3 Derivative of a map of Euclidean spaces


So far, when differentiating functions, we’ve restricted ourselves to the situation
where the function depends only on one variable. This covers lots of situations that
we’re interested in, but of course we often wish to consider maps of more than one
variable. In this chapter we will see how the idea of differentiation can be extended
to maps which send (subsets of) Rn to Rm . The basic idea will be that the derivative
of a map at a point p should be the “best linear approximation” to the map at p.

1.3.1 Derivative as a linear map


Before we think about how to define a derivative of a map in higher dimensions, let’s first note some of the potential challenges. In one dimension, we say that $f$ is differentiable at $p$ if the limit
\[ \lim_{x \to p} \frac{f(x) - f(p)}{x - p} \]
exists. If $x, p \in \mathbb{R}^n$ and $f(x), f(p) \in \mathbb{R}^m$ then we obviously have a problem: we don’t even know how to make sense of ‘dividing by $x - p$’, and it’s not clear what sort of object we should end up with.
To try and find a way through this impasse, let’s just remind ourselves how the derivative is introduced in one dimension. By approximating with successive chords, we consider the tangent to the graph of $f$ at $p$ (see Figure 1.2).

Figure 1.2: The tangent to $f$ at $p$.

Let us think a little about how the tangent is characterised. Any (non-vertical) straight line passing through $(p, f(p))$ is the graph of the affine map
\[ A_\lambda : x \mapsto \lambda (x - p) + f(p) \]
for some $\lambda \in \mathbb{R}$. Let’s consider the difference between $f$ and such an affine map
\[ f(x) - A_\lambda(x) = f(x) - f(p) - \lambda (x - p). \]
In general, from the continuity of $f$ we know that for any $\lambda \in \mathbb{R}$,
\[ \lim_{x \to p} [f(x) - A_\lambda(x)] = 0. \tag{1.6} \]

However, if $f$ is differentiable at $p$, there is a unique choice of $\lambda$ that allows us to make a stronger statement: there exists a unique $\lambda \in \mathbb{R}$ such that
\[ \lim_{x \to p} \frac{|f(x) - A_\lambda(x)|}{|x - p|} = 0. \]
This is a stronger statement than (1.6) because it tells us that $f(x) - A_\lambda(x)$ is going to zero faster than $|x - p|$, as $x \to p$. We make this informal discussion more precise in the following lemma.

Lemma 1.5. The map $f : (a, b) \to \mathbb{R}$ is differentiable at $p \in (a, b)$ if and only if there exists a map of the form $A_\lambda(x) = \lambda (x - p) + f(p)$, for some $\lambda \in \mathbb{R}$, such that
\[ \lim_{x \to p} \frac{|f(x) - A_\lambda(x)|}{|x - p|} = 0. \]

Proof. We can re-write
\[ \frac{|f(x) - f(p) - \lambda (x - p)|}{|x - p|} = \left| \frac{f(x) - f(p)}{x - p} - \lambda \right|, \]
so that
\[ \lim_{x \to p} \frac{|f(x) - A_\lambda(x)|}{|x - p|} = 0 \iff \lim_{x \to p} \frac{f(x) - f(p)}{x - p} = \lambda. \]
The expression on the right-hand side of the above equivalence is the definition of differentiability of $f$ at $p$.
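
The role of the "right" slope in Lemma 1.5 can be seen numerically. In the sketch below, the function `f`, the point `p` and the two candidate slopes are hypothetical choices for illustration: with $f(x) = x^2$ and $p = 1$, the quotient $|f(x) - A_\lambda(x)|/|x - p|$ shrinks with $|x - p|$ for $\lambda = 2$, but stays close to 1 for $\lambda = 3$.

```python
def ratio(f, p, lam, x):
    """|f(x) - A_lam(x)| / |x - p| with A_lam(x) = lam*(x - p) + f(p)."""
    return abs(f(x) - (lam * (x - p) + f(p))) / abs(x - p)

f = lambda x: x ** 2   # hypothetical example; its derivative at 1 is 2
p = 1.0

for h in (1e-1, 1e-2, 1e-3):
    # Second column: slope 2 (the derivative). Third column: slope 3.
    print(h, ratio(f, p, 2.0, p + h), ratio(f, p, 3.0, p + h))
```

For this particular $f$ the quotient equals $|2 - \lambda + h|$ at $x = p + h$, which explains both columns: it tends to 0 only when $\lambda = 2$.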

We may rewrite
\[ A_\lambda(x) = \lambda (x - p) + f(p) = \lambda x + (f(p) - \lambda p). \]
Thus, $A_\lambda : \mathbb{R} \to \mathbb{R}$ is the composition of the linear map $x \mapsto \lambda x$ and the translation $x \mapsto x + (f(p) - \lambda p)$. Such maps are called affine maps of $\mathbb{R}$. By the above lemma, the map $f$ is differentiable at $p$, if it is “well approximated” by an affine map at $p$. We may generalise this to higher dimensions.
Since we are going to frequently apply linear and nonlinear maps to variables, to distinguish between these two cases, we shall use the notation $h[v]$ when $h$ is a linear map and $v$ is seen as a vector, and use $h(v)$ when $h$ is a map and $v$ is seen as a point in the domain of $h$.
Let $L(\mathbb{R}^n; \mathbb{R}^m)$ denote the set of all linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. Recall that $\Lambda : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map if
\[ \Lambda[x + y] = \Lambda[x] + \Lambda[y], \quad \forall x, y \in \mathbb{R}^n, \]
\[ \Lambda[a x] = a \Lambda[x], \quad \forall a \in \mathbb{R} \text{ and } x \in \mathbb{R}^n. \]
In analogy to the statement in Lemma 1.5 we propose the following definition.

Definition 1.5. Suppose Ω ⊂ Rn is open. The map f : Ω → Rm is differentiable
at p ∈ Ω, if there exists a linear map Λ ∈ L(Rn ; Rm ) such that

\[ \lim_{x \to p} \frac{\| f(x) - (\Lambda[x - p] + f(p)) \|}{\| x - p \|} = 0. \]

In this case, we write

\[ Df(p) := \Lambda, \]

and call Df (p) the derivative of the map f at the point p.

Note that some authors refer to the derivative of a map as total derivative, or
differential. We shall refer to that as derivative.
It is often useful to have the following equivalent characterisation of
differentiability in higher dimensions: f : Ω → Rm is differentiable at p ∈ Ω if and
only if there exists Λ ∈ L(Rn ; Rm ) such that

\[ \lim_{h \to 0} \frac{\| f(p + h) - f(p) - \Lambda[h] \|}{\| h \|} = 0. \]

Note that in the above equation, h → 0 in Rn .


Recall that, using the canonical bases for Rn and Rm , any linear map Λ ∈ L(Rn ; Rm )
can be expressed as an m × n matrix. The matrix representing Df (p) is called the
Jacobian of f at p. The convention is that an m × n matrix has m rows and n columns.
For the purposes of this course, we won’t make a big deal of the difference between a
linear map and its matrix representation with respect to the canonical bases, so we
will use the words derivative and Jacobian essentially interchangeably.

Lemma 1.6. Let Ω ⊂ Rn be an open set. If f : Ω → Rm is differentiable at p ∈ Ω,


then it is continuous at p.

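Proof. Let Λ = Df (p). Since

\[ \lim_{h \to 0} \frac{\| f(p + h) - f(p) - \Lambda[h] \|}{\| h \|} = 0, \]

we must have

\[ \lim_{h \to 0} \| f(p + h) - f(p) - \Lambda[h] \| = 0. \]

On the other hand, since linear maps are continuous (see Example 1.4), Λ[h] → 0 as
h → 0, and we obtain

\[ 0 = \lim_{h \to 0} \left( f(p + h) - f(p) - \Lambda[h] \right) = \lim_{h \to 0} \left( f(p + h) - f(p) \right). \]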

Example 1.6. By Lemma 1.5 any function f : (a, b) → R which is differentiable at
p satisfies the conditions of Definition 1.5 with Df (p) = f ′ (p). Notice that a 1 × 1
matrix is simply a real number.

Example 1.7. Let B ∈ L(Rn ; Rm ) and V ∈ Rm . Then, the map f : Rn → Rm


defined as
f (x) = B(x) + V

is differentiable at each p ∈ Rn , and Df (p) = B. To see this, note that

f (p + h) − f (p) − B(h) = (B(p + h) + V ) − (B(p) + V ) − B(h)


= B(p) + B(h) + V − B(p) − V − B(h) = 0.

Thus,

\[ \lim_{h \to 0} \frac{\| f(p + h) - f(p) - B(h) \|}{\| h \|} = \lim_{h \to 0} 0 = 0. \]

Example 1.8. The map f : Rn → R defined as

\[ f(x) = \| x \|^2 \]

is differentiable at each p ∈ Rn , and Df (p) is the linear map

\[ Df(p)[h] = 2 \langle p, h \rangle, \quad \forall h \in \mathbb{R}^n. \]

From the properties of the inner product in Exercise 1.1-(a), we can see that the
map h ↦ 2⟨p, h⟩ is a linear map.
We note that

\[ f(p + h) = \| p + h \|^2 = \langle p + h, p + h \rangle = \| p \|^2 + 2 \langle p, h \rangle + \| h \|^2, \]

so that

\[ \lim_{h \to 0} \frac{\| f(p + h) - f(p) - 2 \langle p, h \rangle \|}{\| h \|} = \lim_{h \to 0} \| h \| = 0. \]

As a matrix, we have that Df (p) = 2p, where p is viewed as a row vector with
n components (this is in line with our convention that a 1 × n matrix maps Rn to
R1 ). So the Jacobian is a row vector for this map.
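
The computation in Example 1.8 can be checked numerically. The sketch below is only an illustration: the base point `p`, the direction `direction` and the helper names are assumptions, not part of the notes. It evaluates the remainder ratio from the definition of differentiability and compares it with ‖h‖, which is the value predicted by the expansion of ‖p + h‖^2.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def inner(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def f(x):
    """f(x) = ||x||^2, as in Example 1.8."""
    return inner(x, x)


def remainder_ratio(p, h):
    """|f(p + h) - f(p) - 2<p, h>| / ||h||, which Example 1.8 shows equals ||h||."""
    p_plus_h = [a + b for a, b in zip(p, h)]
    return abs(f(p_plus_h) - f(p) - 2 * inner(p, h)) / norm(h)


p = [1.0, -2.0, 0.5]            # arbitrary base point (assumption)
direction = [0.3, 0.4, -1.2]    # arbitrary direction (assumption)
for scale in (1e-1, 1e-2, 1e-3):
    h = [scale * c for c in direction]
    print(scale, remainder_ratio(p, h), norm(h))
```

The two printed columns should agree up to rounding, and both shrink linearly with the scale of h.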

Example 1.9. Let m ≥ 1 be an integer, and assume that for i = 1, 2, . . . , m, the
map f^i : (a, b) → R is differentiable at p ∈ (a, b). Then the map f : (a, b) → Rm
defined as

\[ f(x) = \left( f^1(x), f^2(x), \dots, f^m(x) \right) \]

is differentiable at p, and the derivative Df (p) : R → Rm has the matrix
representation

\[ Df(p) = \begin{pmatrix} (f^1)'(p) \\ \vdots \\ (f^m)'(p) \end{pmatrix}. \]

To see this, we note that

\[ f(p + h) - f(p) - \begin{pmatrix} (f^1)'(p) \\ \vdots \\ (f^m)'(p) \end{pmatrix} h
   = \begin{pmatrix} f^1(p + h) - f^1(p) - (f^1)'(p) h \\ \vdots \\ f^m(p + h) - f^m(p) - (f^m)'(p) h \end{pmatrix}, \]

so that, using the inequality in (1.5),

\[ \frac{\| f(p + h) - f(p) - Df(p)[h] \|}{\| h \|}
   \le \sqrt{m} \max_{j = 1, \dots, m} \frac{| f^j(p + h) - f^j(p) - (f^j)'(p) h |}{|h|}. \]

Since each f^j is differentiable at p, the right-hand side of the above inequality tends
to 0 as h → 0. Since the left-hand side is non-negative, it must also tend to 0 as
h → 0. Notice here that the expression Df (p)[h] means applying the linear map Df (p)
to the one-dimensional vector h, which gives us an element of Rm .

Implicitly in the discussion above, we’ve assumed that Df (p), if it exists, must
be unique. Of course, this is something that we need to prove.

Theorem 1.7. The derivative, if it exists, is unique.

Proof. Suppose Ω ⊂ Rn is open, f : Ω → Rm , p ∈ Ω and that Λ and Λ′ satisfy:

\[ \lim_{h \to 0} \frac{\| f(p + h) - f(p) - \Lambda[h] \|}{\| h \|}
   = \lim_{h \to 0} \frac{\| f(p + h) - f(p) - \Lambda'[h] \|}{\| h \|} = 0. \]

Let e be an arbitrary vector in Rn with ‖e‖ = 1. Then for any real number α ≠ 0
we have

\[ \frac{\Lambda[\alpha e]}{\alpha} = \Lambda[e]. \]

Now, let (α_j)_{j=0}^∞ be a sequence of non-zero real numbers tending to 0 as j → ∞.
By adding and subtracting identical terms, we see that

\begin{align*}
\| \Lambda[e] - \Lambda'[e] \|
  &= \left\| \frac{\Lambda[\alpha_j e]}{\alpha_j} - \frac{\Lambda'[\alpha_j e]}{\alpha_j} \right\| \\
  &= \lim_{j \to \infty} \frac{\| \Lambda[\alpha_j e] - \Lambda'[\alpha_j e] \|}{\| \alpha_j e \|} \\
  &= \lim_{j \to \infty} \frac{\| - f(p + \alpha_j e) + f(p) + \Lambda[\alpha_j e] + f(p + \alpha_j e) - f(p) - \Lambda'[\alpha_j e] \|}{\| \alpha_j e \|} \\
  &\le \lim_{j \to \infty} \frac{\| f(p + \alpha_j e) - f(p) - \Lambda[\alpha_j e] \|}{\| \alpha_j e \|}
     + \lim_{j \to \infty} \frac{\| f(p + \alpha_j e) - f(p) - \Lambda'[\alpha_j e] \|}{\| \alpha_j e \|} \\
  &= 0.
\end{align*}

For the last equality in the above equation we have used that α_j e → 0 as j → ∞.
By the above equation, for any unit vector e we have Λ[e] = Λ′ [e], which implies
that (as linear maps) Λ = Λ′ .

Exercise 1.10. Suppose f : Rn → Rn is given by f (x) = x. Show that f is


differentiable at each p ∈ Rn and

Df (p) = id,

where id : Rn → Rn is the identity map.

Exercise 1.11. Show that the map f : R2 → R given by

\[ f : (x, y) \mapsto x^2 + y^2 \]

is differentiable at all points p = (ξ, η) ∈ R2 with Jacobian

\[ Df(p) = \begin{pmatrix} 2\xi & 2\eta \end{pmatrix}. \]

Exercise 1.12. One might hope that the derivative can be calculated by finding

\[ \lim_{x \to p} \frac{f(x) - f(p)}{\| x - p \|}. \]

By considering the example of Exercise 1.10 or otherwise, show that this limit may
not always exist, even if f is differentiable at p.


Exercise 1.13. Suppose that Ω ⊂ Rn is open, and f, g : Ω → Rm are differentiable


at p ∈ Ω. Show that h = f + g is differentiable at p and

Dh(p) = Df (p) + Dg(p)

Figure 1.3: Illustration of Theorem 1.8.

1.3.2 Chain rule


In dimension one there is a simple “algorithm” which allows us to calculate the
derivative of more complicated maps using the derivative of simpler ones. That
algorithm is the chain rule. If f, g : R → R, with g differentiable at p and f
differentiable at g(p), then f ◦ g is differentiable at p with

(f ◦ g)′ (p) = f ′ (g(p))g ′ (p).

Now, suppose that g : Rn → Rm and f : Rm → Rl , with g differentiable at


p and f differentiable at g(p). Let h = f ◦ g. We know that Dg(p) : Rn → Rm
and Df (g(p)) : Rm → Rl are linear maps, so it certainly makes sense to consider
Df (g(p)) ◦ Dg(p), where “◦” denotes the composition of linear maps (corresponding
to matrix multiplication). This will be a linear map from Rn to Rl , which is the
right kind of object to be Dh(p). In fact, it is the case that h = f ◦ g is differentiable
at p with
Dh(p) = Df (g(p)) ◦ Dg(p)

Theorem 1.8. Assume Ω ⊆ Rn and Ω′ ⊆ Rm are open sets, with g : Ω → Ω′


differentiable at p ∈ Ω and f : Ω′ → Rl differentiable at g(p) ∈ Ω′ . Then h = f ◦ g :
Ω → Rl is differentiable at p with derivative

Dh(p) = Df (g(p)) ◦ Dg(p).

(*) Proof. Let g(p) = q, A = Dg(p), B = Df (q). We define the maps

\begin{align*}
\varphi(x) &= g(x) - g(p) - A(x - p), && \forall x \in \Omega, \\
\psi(y) &= f(y) - f(q) - B(y - q), && \forall y \in \Omega', \\
\tau(x) &= f(g(x)) - f(g(p)) - B(A(x - p)), && \forall x \in \Omega.
\end{align*}

Lecture notes for Week 3


By the assumptions in the theorem we know that

\begin{align}
0 &= \lim_{x \to p} \frac{\varphi(x)}{\| x - p \|}, \tag{1.7} \\
0 &= \lim_{y \to q} \frac{\psi(y)}{\| y - q \|}, \tag{1.8}
\end{align}

and we need to show that

\[ \lim_{x \to p} \frac{\tau(x)}{\| x - p \|} = 0. \]

We may rewrite the map τ as

\begin{align*}
\tau(x) &= f(g(x)) - f(g(p)) - B(A(x - p)) \\
        &= f(g(x)) - f(g(p)) - B(g(x) - g(p) - \varphi(x)) \\
        &= f(g(x)) - f(g(p)) - B(g(x) - g(p)) + B(\varphi(x)) \\
        &= \psi(g(x)) + B(\varphi(x)).
\end{align*}

On the other hand, we recall from Example 1.4 that there is a real number M
such that

\[ \| A(x) \| \le M \| x \|, \quad \forall x \in \mathbb{R}^n. \]

Since B is linear, and hence continuous by Example 1.4, we have that

\[ \lim_{x \to p} \frac{B(\varphi(x))}{\| x - p \|}
   = \lim_{x \to p} B\!\left( \frac{\varphi(x)}{\| x - p \|} \right)
   = B\!\left( \lim_{x \to p} \frac{\varphi(x)}{\| x - p \|} \right) = 0. \]

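Fix an arbitrary ε > 0. It follows from (1.8) that there exists δ > 0 such that
for y ∈ Ω′ with 0 < ‖y − q‖ < δ we have

\[ \frac{\| \psi(y) \|}{\| y - q \|} < \varepsilon, \]

which, together with ψ(q) = 0, implies that for all y ∈ Ω′ with ‖y − q‖ < δ,

\[ \| \psi(y) \| \le \varepsilon \| y - q \|. \]

On the other hand, since g is continuous at p, there exists δ1 > 0 such that if x ∈ Ω
with ‖x − p‖ < δ1 then

\[ \| g(x) - g(p) \| = \| g(x) - q \| < \delta. \]

Thus, for every x ∈ Ω with ‖x − p‖ < δ1 , we have

\begin{align*}
\| \psi(g(x)) \| &\le \varepsilon \| g(x) - q \| \\
                 &= \varepsilon \| \varphi(x) + A(x - p) \| \\
                 &\le \varepsilon \| \varphi(x) \| + \varepsilon M \| x - p \|.
\end{align*}

Dividing through by ‖x − p‖, letting x → p, and using (1.7), we deduce that

\[ \limsup_{x \to p} \frac{\| \psi(g(x)) \|}{\| x - p \|} \le \varepsilon M. \]

Since ε > 0 was arbitrary, we conclude

\[ \lim_{x \to p} \frac{\| \psi(g(x)) \|}{\| x - p \|} = 0. \]

Combined with the limit of B(φ(x))/‖x − p‖ computed above and the identity
τ (x) = ψ(g(x)) + B(φ(x)), this shows that τ (x)/‖x − p‖ → 0 as x → p, and we are
done.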

Example 1.10. Let m ≥ 1 be an integer, and assume that for i = 1, 2, . . . , m,
the functions g^i : (a, b) → R are differentiable at some p ∈ (a, b). Then, the map
k : (a, b) → R, defined as

\[ k(x) = \left\| \left( g^1(x), g^2(x), \dots, g^m(x) \right) \right\|^2, \]

is differentiable at p, and its Jacobian matrix has one real entry

\[ 2 g^1(p) (g^1)'(p) + 2 g^2(p) (g^2)'(p) + \dots + 2 g^m(p) (g^m)'(p). \]

We note that by Example 1.9, the map g : (a, b) → Rm defined as

\[ g(x) = \left( g^1(x), g^2(x), \dots, g^m(x) \right) \]

is differentiable at p with derivative

\[ Dg(p) = \begin{pmatrix} (g^1)'(p) \\ \vdots \\ (g^m)'(p) \end{pmatrix}. \]

On the other hand, in Example 1.8, we saw that the map f (x) = ‖x‖^2 is
differentiable at every point in Rm with derivative Df (q)[h] = 2⟨q, h⟩. We have
k = f ◦ g on (a, b). Thus, by the chain rule, the map k is differentiable at p, with
derivative

\begin{align*}
Dk(p)[h] &= Df(g(p)) \circ Dg(p)[h] \\
  &= Df(g(p)) \left[ \left( (g^1)'(p) h, \dots, (g^m)'(p) h \right) \right] \\
  &= 2 \left\langle g(p), \left( (g^1)'(p) h, \dots, (g^m)'(p) h \right) \right\rangle \\
  &= 2 \left\langle g(p), h \left( (g^1)'(p), \dots, (g^m)'(p) \right) \right\rangle \\
  &= 2 \langle g(p), Dg(p) \rangle \, h.
\end{align*}

Thus, the Jacobian of k at p is the one by one matrix with real entry

\[ 2 \langle g(p), Dg(p) \rangle = 2 g^1(p) (g^1)'(p) + 2 g^2(p) (g^2)'(p) + \dots + 2 g^m(p) (g^m)'(p). \]
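
The formula of Example 1.10 can be compared with a finite-difference approximation. In the sketch below the component functions (cos, sin and x ↦ x^2), the point p = 0.7 and all helper names are illustrative assumptions. For this choice k(x) = 1 + x^4, so the chain-rule value should be close to 4p^3.

```python
import math

# Illustrative component functions g^i and their derivatives (assumptions).
components = [math.cos, math.sin, lambda x: x ** 2]
derivatives = [lambda x: -math.sin(x), math.cos, lambda x: 2 * x]


def k(x):
    """k(x) = ||(g^1(x), ..., g^m(x))||^2."""
    return sum(g(x) ** 2 for g in components)


def dk_chain_rule(p):
    """Derivative from Example 1.10: 2 * sum_i g^i(p) * (g^i)'(p)."""
    return 2 * sum(g(p) * dg(p) for g, dg in zip(components, derivatives))


def dk_finite_difference(p, step=1e-6):
    """Central difference approximation of k'(p)."""
    return (k(p + step) - k(p - step)) / (2 * step)


p = 0.7
print(dk_chain_rule(p), dk_finite_difference(p))
```

The central difference is used here only as an independent check; it is not how the derivative is defined in the notes.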

Exercise 1.14. Assume Ω and Ω′ are open sets in Rn , g : Ω → Ω′ is differentiable at
p ∈ Ω and f : Ω′ → Ω is differentiable at g(p) ∈ Ω′ . Moreover, assume that

\begin{align*}
f \circ g(x) &= x, \quad \forall x \in \Omega, \\
g \circ f(x) &= x, \quad \forall x \in \Omega'.
\end{align*}

Show that

\[ Df(g(p)) = (Dg(p))^{-1}. \]

Exercise 1.15 (*). (a) Show that the map P : R2 → R given by

P (x, y) = xy

is differentiable at each point p = (ξ, η) ∈ R2 , with Jacobian

DP (p) = (η ξ) .

(b) Suppose that f, g : Rn → R are differentiable at q ∈ Rn . Show that the map


Q : Rn → R2 defined as
Q(z) = (f (z), g(z))

is differentiable at q, with derivative

\[ DQ(q) = \begin{pmatrix} Df(q) \\ Dg(q) \end{pmatrix}. \]

(c) Show that the map F : Rn → R defined as F (z) = f (z)g(z), for all z ∈ Rn , is
differentiable at q, with derivative

\[ DF(q) = g(q) Df(q) + f(q) Dg(q). \]

1.4 Directional derivatives


1.4.1 Rates of change and partial derivatives
Although the definitions of differentiability in dimension one and in higher
dimensions appear similar, there is a major difference which makes the latter a more
difficult concept. In dimension one, to see if a map f : (a, b) → R is differentiable at
some p ∈ (a, b), we only need to verify that the limit of (f (x) − f (p))/(x − p) exists
as x → p. To verify this, we do not need to know the value of the limit beforehand,
that is, the value of the limit does not appear in this ratio. However, in higher
dimensions, to verify if a map f : Ω → Rn is differentiable at some p ∈ Ω, we need
to know the derivative at that point. In other words, the derivative of the map at
p appears in the criterion for differentiability. For basic maps, it is possible to guess
the derivative, but in general, it may not be obvious what the derivative is. See for
instance the map in Example 1.8. The purpose of this section is to present a simple
approach to identify a candidate for the derivative in higher dimensions.
For a function f : (a, b) → R, we are familiar with the idea of f ′ (p) telling
us something about the rate of change of f (x) as we vary x near p ∈ (a, b). We
can connect the derivative to this sort of concept with the directional derivative.
Let us suppose that we are given a function f : R3 → R, which is supposed to
represent the temperature of some three dimensional body which is not changing

in time. Suppose we start at the origin 0 ∈ R3 and travel along the curve t ↦ vt,
for some fixed v ∈ R3 , that is, we move along a straight line with velocity v passing
through the origin at time 0. We can record the temperature of our surroundings as
a function of time, θ(t), and we will find θ(t) = f (vt). Suppose we ask what the rate
of change of temperature is at t = 0. This will of course be θ′ (0). Now, we notice
that we can write:

\[ \theta = f \circ V, \]

where V is the linear map V : R → R3 given by V (t) = vt. Now, we can use the
chain rule to calculate θ′ (0) = Dθ(0) and we find:

\[ \theta'(0) = D\theta(0) = Df(0) \circ DV(0). \]

Now, since V is a linear map, we have DV (0) = v and we conclude:

\[ \theta'(0) = Df(0)[v]. \]

This gives us a nice interpretation of the derivative Df (0). When we apply Df (0) to
a vector v, we find the rate of change of f at 0 as we travel along a line with velocity
v. More generally, we can consider travelling along the line given by V (t) = p + tv
for some p ∈ R3 . Then at t = 0, we are passing through the point p ∈ R3 . Setting
θ(t) = f (p + tv), we call the quantity

\[ \theta'(0) = D\theta(0) = Df(p)[v] \]

the directional derivative of f at p in the direction v. Sometimes the notation

\[ \frac{\partial f}{\partial v}(p) := \lim_{t \to 0} \frac{1}{t} \left[ f(p + vt) - f(p) \right] = Df(p)[v] \]

is used for the directional derivative.


Now, if we take {e1 , e2 , e3 } to be the canonical basis vectors for R3 , then we can
write v = v^1 e1 + v^2 e2 + v^3 e3 for v^i ∈ R. Doing this, and recalling that Df (p) is a
linear map, we have:

\begin{align}
\frac{\partial f}{\partial v}(p) &= Df(p)\left[ v^1 e_1 + v^2 e_2 + v^3 e_3 \right] \tag{1.9} \\
  &= v^1 Df(p)[e_1] + v^2 Df(p)[e_2] + v^3 Df(p)[e_3] \notag \\
  &= v^1 D_1 f(p) + v^2 D_2 f(p) + v^3 D_3 f(p). \tag{1.10}
\end{align}

In other words, we can find any directional derivative at p, provided we know the
three numbers

\[ D_i f(p) = \frac{\partial f}{\partial e_i}(p), \quad i = 1, 2, 3, \]

called the partial derivatives of f at p. Equivalently, these can be defined as

\[ D_i f(p) := \lim_{t \to 0} \frac{f(p + t e_i) - f(p)}{t}. \]

If f : R3 → R, then for x, y and z in R,

\[ D_1 f(x, y, z) = \lim_{t \to 0} \frac{f(x + t, y, z) - f(x, y, z)}{t} =: \frac{\partial f}{\partial x}(x, y, z), \]

where we’ve introduced yet more notation. The expression ∂f/∂x you should think
of as meaning ‘differentiate f with respect to x, while treating y, z as constants’.
Returning to (1.10), we see that for any v = (v^1 , v^2 , v^3 ), we have

\[ Df(p)[v] = \begin{pmatrix} D_1 f(p) & D_2 f(p) & D_3 f(p) \end{pmatrix} \begin{pmatrix} v^1 \\ v^2 \\ v^3 \end{pmatrix}, \]

so that the Jacobian of f at p is given by

\[ Df(p) = \begin{pmatrix} D_1 f(p) & D_2 f(p) & D_3 f(p) \end{pmatrix}. \]

To introduce even more notation, we sometimes write

\[ \nabla f(p) = \begin{pmatrix} D_1 f(p) \\ D_2 f(p) \\ D_3 f(p) \end{pmatrix}, \]

which is called the gradient of f at p, and with this notation

\[ Df(p) = (\nabla f(p))^t. \]

We can extend all of these notions to more general range and domains, which
leads us to the following definition.

Definition 1.6. Suppose Ω ⊂ Rn is open and f : Ω → Rm is differentiable at p ∈ Ω.
For any vector v ∈ Rn with ‖v‖ = 1, the directional derivative of f at p in the
direction v is given by

\[ \frac{\partial f}{\partial v}(p) = \lim_{t \to 0} \frac{f(p + tv) - f(p)}{t} = Df(p)[v]. \]

The partial derivatives of f at p are given by

\[ D_i f(p) = \frac{\partial f}{\partial e_i}(p) = \lim_{t \to 0} \frac{f(p + t e_i) - f(p)}{t}, \quad i = 1, \dots, n. \]

Notice that f (x) is now a vector in Rm , so expressions like
lim_{t→0} (f (p + tv) − f (p))/t have to be understood as limits in Rm , so that
∂f/∂v (p) will be an m-dimensional column vector. That is, if

\[ f(x) = (f^1(x), f^2(x), \dots, f^m(x)), \]

then

\[ D_i f(p) = \begin{pmatrix} D_i f^1(p) \\ \vdots \\ D_i f^m(p) \end{pmatrix}. \]

Theorem 1.9. Suppose Ω ⊂ Rn is open and f : Ω → Rm is of the form

\[ f(x) = (f^1(x), f^2(x), \dots, f^m(x)). \]

If f is differentiable at some p ∈ Ω, then the Jacobian of f at p is

\[ Df(p) = \begin{pmatrix} D_1 f^1(p) & \dots & D_n f^1(p) \\ \vdots & \ddots & \vdots \\ D_1 f^m(p) & \dots & D_n f^m(p) \end{pmatrix}. \]

Proof. Let {e_i} be the canonical basis for Rn . For any v ∈ Rn , we write
v = Σ_{i=1}^n v^i e_i . Then by the linearity of Df (p) we have:

\begin{align*}
Df(p)[v] &= Df(p)\left[ \sum_{i=1}^n v^i e_i \right] = \sum_{i=1}^n v^i Df(p)[e_i] = \sum_{i=1}^n v^i D_i f(p) \\
  &= \begin{pmatrix} \sum_{i=1}^n v^i D_i f^1(p) \\ \vdots \\ \sum_{i=1}^n v^i D_i f^m(p) \end{pmatrix} \\
  &= \begin{pmatrix} D_1 f^1(p) & \dots & D_n f^1(p) \\ \vdots & \ddots & \vdots \\ D_1 f^m(p) & \dots & D_n f^m(p) \end{pmatrix} \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix}.
\end{align*}
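
Theorem 1.9 suggests a simple numerical recipe: approximate each partial derivative by a difference quotient and place it in the corresponding column. The sketch below does this for an illustrative map f : R2 → R2; the map, the point `p`, the vector `v` and all helper names are assumptions. It then compares the matrix-vector product Df (p)[v] with a difference quotient along the line t ↦ p + tv.

```python
import math


def f(x):
    """Illustrative map f : R^2 -> R^2 (an assumption for this sketch)."""
    return [x[0] ** 2 * x[1], x[0] + math.sin(x[1])]


def partial(f, p, i, step=1e-6):
    """Central-difference approximation of the column vector D_i f(p)."""
    forward = list(p)
    backward = list(p)
    forward[i] += step
    backward[i] -= step
    return [(a - b) / (2 * step) for a, b in zip(f(forward), f(backward))]


def jacobian(f, p):
    """m x n matrix whose i-th column approximates D_i f(p)."""
    columns = [partial(f, p, i) for i in range(len(p))]
    return [list(row) for row in zip(*columns)]


def apply_matrix(matrix, v):
    """Matrix-vector product, i.e. Df(p)[v]."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


p = [1.0, 2.0]
v = [0.6, -0.8]
J = jacobian(f, p)
t = 1e-6
quotient = [(a - b) / (2 * t) for a, b in
            zip(f([p[0] + t * v[0], p[1] + t * v[1]]),
                f([p[0] - t * v[0], p[1] - t * v[1]]))]
print(J)
print(apply_matrix(J, v), quotient)
```

For this particular f the exact Jacobian at (1, 2) has rows (4, 1) and (1, cos 2), which the approximation should reproduce to several digits.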

This allows us to restate the chain rule in terms of the partial derivatives of the
functions.

Corollary 1.10. Suppose Ω ⊂ Rn and Ω′ ⊂ Rm are open sets, g : Ω → Ω′ is
differentiable at p ∈ Ω, and f : Ω′ → Rl is differentiable at g(p). Then h = f ◦ g is
differentiable at p with Jacobian

\[ Dh(p) = \begin{pmatrix} D_1 f^1(g(p)) & \dots & D_m f^1(g(p)) \\ \vdots & \ddots & \vdots \\ D_1 f^l(g(p)) & \dots & D_m f^l(g(p)) \end{pmatrix}
           \begin{pmatrix} D_1 g^1(p) & \dots & D_n g^1(p) \\ \vdots & \ddots & \vdots \\ D_1 g^m(p) & \dots & D_n g^m(p) \end{pmatrix}. \]

In the one dimensional case, we often use the derivative to search for turning
points, i.e. maxima and minima, since a differentiable function will have vanishing
derivative at a local maximum or minimum. A similar result holds in the higher
dimensional case.

Lemma 1.11. Let Ω ⊂ Rn be open and f : Ω → R be differentiable at each point in


Ω. Suppose that f has a local maximum at p ∈ Ω. Then:

Df (p) = 0.

Similarly if p is a local minimum.

Proof. Pick v ∈ Rn . Since Ω is open, there exists ε > 0 such that p + tv ∈ Ω for
t ∈ (−ε, ε). Consider the function g_v : (−ε, ε) → R defined as

\[ g_v(t) = f(p + tv). \]

Since f has a local maximum at p, g_v has a local maximum at 0. Moreover, g_v
is differentiable by the chain rule, so we deduce

\[ 0 = g_v'(0) = Df(p)[v]. \]

Since v was arbitrary, we have that Df (p) = 0. A similar argument deals with the
case where p is a minimum.

Exercise 1.16. (i) Let the function f : R2 → R3 be given by

\[ f(x, y) = \left( x^2 + e^{x + y}, \; x - \log y, \; 2xy + 1 \right). \]

Assuming f is differentiable at a point (x, y), what is its derivative?

(ii) Let g : R3 → R1 be given by

g(x, y, z) = x + y + z.

Compute the derivative of g ◦ f assuming it exists. Compute it in 2 ways, with


and without the chain rule.

1.4.2 Relation between partial derivatives and differentiability


We have seen above that for a function f : Rn → R which is differentiable at some
point p, the limits

\[ D_i f(p) := \lim_{t \to 0} \frac{f(p + t e_i) - f(p)}{t} \tag{1.11} \]

exist for i = 1, . . . , n, and moreover these limits completely determine the derivative
of f at p. One might hope, based on this, that in order for f to be differentiable at
p it is enough to know that the partial derivatives (i.e. the limits in (1.11)) of f at
p all exist. Unfortunately, this is not the case, as we show in the following example.

Example 1.11. Consider the function f : R2 → R defined as

\[ f(x, y) = \begin{cases} 0 & x = y = 0, \\ \dfrac{xy}{\sqrt{x^2 + y^2}} & \text{otherwise.} \end{cases} \]

See Figure 1.4 for the graph of the function f .

Figure 1.4: The graph of the function in Example 1.11.

First note that this function is continuous at the origin. Since |xy| ≤ (x^2 + y^2)/2,
we have that for p = (x, y) ≠ (0, 0):

\[ |f(p)| \le \frac{1}{2} \sqrt{x^2 + y^2}, \]

so that

\[ \lim_{p \to 0} f(p) = 0. \]

Now consider the partial derivatives. We have

\[ D_1 f(0) = \lim_{t \to 0} \frac{1}{t} \left[ f(t e_1) - f(0) \right] = \lim_{t \to 0} \frac{0 - 0}{t} = 0, \]

since f (te1 ) = 0 for all t. Similarly, we also have

\[ D_2 f(0) = \lim_{t \to 0} \frac{1}{t} \left[ f(t e_2) - f(0) \right] = \lim_{t \to 0} \frac{0 - 0}{t} = 0. \]

Thus, if f is differentiable at 0, then it must be that Df (0) = 0, so all directional
derivatives at 0 exist and are equal to zero. However, let h = (1/√2)(1, 1). For t > 0,
we have

\[ \frac{f(th) - f(0)}{t} = \frac{t^2 / 2}{t^2} = \frac{1}{2}, \]

which contradicts the differentiability of f at the origin. Thus, even though the
partial derivatives exist for this function, the function is not differentiable.
Away from the origin, the function is a composition of smooth functions so is
differentiable. We can calculate the partial derivatives at a point p = (x, y) ≠ (0, 0)
and we find

\[ D_1 f(p) = \frac{y}{\sqrt{x^2 + y^2}} - \frac{x^2 y}{(x^2 + y^2)^{3/2}} = \frac{y^3}{(x^2 + y^2)^{3/2}}, \]

and by symmetry:

\[ D_2 f(p) = \frac{x^3}{(x^2 + y^2)^{3/2}}. \]

We claim that the function g : R2 \ {0} → R given by

\[ g(x, y) = \frac{x^3}{(x^2 + y^2)^{3/2}} \]

has no limit as p = (x, y) converges to (0, 0). To see this, let p = (r cos θ, r sin θ) for
some r ∈ (0, ∞), θ ∈ [0, 2π). Then

\[ g(p) = \cos^3 \theta, \]

so there can be no limit as r → 0, since g approaches a different value depending on
which angle we approach from.
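
The behaviour in Example 1.11 is easy to observe numerically. The following sketch (the helper name `quotient_at_origin` and the sampled values of t are assumptions for illustration) evaluates the difference quotients at the origin along e1, e2 and the diagonal direction (1/√2)(1, 1).

```python
import math


def f(x, y):
    """The function of Example 1.11."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / math.sqrt(x * x + y * y)


def quotient_at_origin(direction, t):
    """Difference quotient (f(t*v) - f(0)) / t for v = direction."""
    vx, vy = direction
    return (f(t * vx, t * vy) - f(0.0, 0.0)) / t


e1 = (1.0, 0.0)
e2 = (0.0, 1.0)
diag = (1 / math.sqrt(2), 1 / math.sqrt(2))
for t in (1e-1, 1e-3, 1e-6):
    print(t, quotient_at_origin(e1, t), quotient_at_origin(e2, t),
          quotient_at_origin(diag, t))
```

Along the coordinate axes the quotient vanishes, while along the diagonal it stays at 1/2 for every t > 0, so no single linear map can account for all directions.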

As it happens, the fact that the partial derivatives are not continuous at the
origin is the only barrier to differentiability there.

Theorem 1.12. Let Ω ⊂ Rn be open and f : Ω → R. Suppose the partial derivatives

\[ D_i f(x) := \lim_{t \to 0} \frac{f(x + t e_i) - f(x)}{t} \]

exist for all x ∈ Ω, and moreover suppose that the maps

\[ x \mapsto D_i f(x) \]

are continuous at p ∈ Ω for all i = 1, . . . , n. Then f is differentiable at p.

(*) Proof. Since Ω is open, there exists r > 0 such that Br (p) ⊂ Ω. Suppose
h ∈ Br (0) has components h^i , so that h = Σ_{i=1}^n h^i e_i . We consider

\begin{align*}
f(p + h) - f(p) &= f\Big( p + \sum_{i=1}^{n} h^i e_i \Big) - f(p) \\
  &= f\Big( p + \sum_{i=1}^{n} h^i e_i \Big) - f\Big( p + \sum_{i=1}^{n-1} h^i e_i \Big) \\
  &\quad + f\Big( p + \sum_{i=1}^{n-1} h^i e_i \Big) - f\Big( p + \sum_{i=1}^{n-2} h^i e_i \Big) \\
  &\quad + \dots \\
  &\quad + f(p + h^1 e_1) - f(p).
\end{align*}

Let’s consider a typical line in the right-hand side of the above equation, that is,

\[ f\Big( p + \sum_{i=1}^{k} h^i e_i \Big) - f\Big( p + \sum_{i=1}^{k-1} h^i e_i \Big) = f(q + h^k e_k) - f(q), \]

where k ∈ {1, . . . , n} and q = p + Σ_{i=1}^{k−1} h^i e_i . Now, applying the mean value
theorem to the function g(t) = f (q + t e_k ), which is differentiable by assumption,
there exists s ∈ [−|h^k|, |h^k|] such that:

\[ f(q + h^k e_k) - f(q) = h^k D_k f(q + s e_k) = h^k D_k f(p + c_k), \]

where c_k = Σ_{i=1}^{k−1} h^i e_i + s e_k . One has to consider separately the cases h^k > 0,
h^k < 0 and h^k = 0. Now, note that since |s| ≤ |h^k|, we have

\[ \| c_k \| \le \| h \|. \]

Putting this together, we conclude that there exist c_1 , . . . , c_n ∈ Rn with ‖c_k‖ ≤ ‖h‖
such that

\[ f(p + h) - f(p) = \sum_{k=1}^{n} h^k D_k f(p + c_k). \]

From here we can estimate, using the Cauchy-Schwarz inequality,

\begin{align*}
\Big| f(p + h) - f(p) - \sum_{k=1}^{n} h^k D_k f(p) \Big|
  &\le \sum_{k=1}^{n} |h^k| \, | D_k f(p + c_k) - D_k f(p) | \\
  &\le \| h \| \Big( \sum_{k=1}^{n} | D_k f(p + c_k) - D_k f(p) |^2 \Big)^{1/2},
\end{align*}

so that, for h ≠ 0,

\[ \frac{\big| f(p + h) - f(p) - \sum_{k=1}^{n} h^k D_k f(p) \big|}{\| h \|}
   \le \Big( \sum_{k=1}^{n} | D_k f(p + c_k) - D_k f(p) |^2 \Big)^{1/2}. \]

Now, fix ε > 0. Since x ↦ D_k f (x) is continuous at p, for each k = 1, . . . , n, there
exists δ_k > 0 such that if ‖c‖ < δ_k we have:

\[ | D_k f(p + c) - D_k f(p) | < \frac{\varepsilon}{\sqrt{n}}. \]

Suppose 0 < ‖h‖ < min{r, δ_1 , . . . , δ_n } =: δ. Then as ‖c_k‖ ≤ ‖h‖, we deduce

\[ \frac{\big| f(p + h) - f(p) - \sum_{k=1}^{n} h^k D_k f(p) \big|}{\| h \|}
   < \Big( \sum_{k=1}^{n} \frac{\varepsilon^2}{n} \Big)^{1/2} = \varepsilon. \]

As ε was arbitrary, we conclude that f is differentiable at p, with derivative

\[ Df(p)[h] = \sum_{k=1}^{n} D_k f(p) h^k. \]

Exercise 1.17. Show that each of the following maps f : R2 → R is everywhere
differentiable:

(a) f (x, y) = x^2 + y^2 − x − xy,

(b) f (x, y) = 1/√(1 + x^2 + y^2),

(c) f (x, y) = x^5 y^2 .

For maps f : (a, b) → R we have learned that when f is differentiable at some


p ∈ (a, b), then there is a tangent line to the graph of f that passes through (p, f (p))
and approximates the graph of f near p. This is an intuitive picture that is only
valid when we consider the graph of a function from one dimension to one dimension.
By an example below, we show that this intuition should not be employed for maps
of higher dimensions.

Example 1.12. Let f : (−1, +1) → R2 be defined as

\[ f(x) = \begin{cases} (x^2, 0) & \text{if } x \ge 0, \\ (0, x^2) & \text{if } x < 0. \end{cases} \]

See Figure 1.5 for the image of the map f .

Figure 1.5: The image of the map f in Example 1.12

Clearly, f is continuous at 0 with

\[ \lim_{x \to 0} f(x) = (0, 0) = f(0). \]

The map f is differentiable at 0 with derivative equal to the zero linear map
Λ = 0. To see this, note that

\[ \lim_{h \to 0} \frac{\| f(0 + h) - f(0) - \Lambda[h] \|}{\| h \|} = \lim_{h \to 0} \frac{\| f(h) \|}{\| h \|} = \lim_{h \to 0} \frac{h^2}{|h|} = \lim_{h \to 0} |h| = 0. \]

In fact, it is not possible to understand just by looking at the image of a map


whether it is differentiable or not. As the example below shows, maps with the same
image may or may not be differentiable.

Example 1.13. Define the maps k and g from (−1, +1) to R2 as

\[ k(x) = (x, x^3), \qquad g(x) = (x^{1/3}, x). \]

See Figure 1.6 for the images of the maps k and g.
The maps k and g are continuous at 0 with

\[ \lim_{x \to 0} k(x) = (0, 0) = k(0), \]

and

\[ \lim_{x \to 0} g(x) = (0, 0) = g(0). \]

Figure 1.6: The image of the maps k and g in Example 1.13. The differentiability
at (0, 0) depends on “how fast” we pass through the point (0, 0).

The maps k and g have the same image, that is, they map the interval (−1, +1)
to the same curve, which is the graph of the function t ↦ t^3 on the interval (−1, +1).
However, k is differentiable at 0, but g is not differentiable at 0, as we show below.
We claim that the derivative of the map k at 0 is equal to the linear map
Λ(h) = (h, 0). To see this, note that

\begin{align*}
\lim_{h \to 0} \frac{\| k(0 + h) - k(0) - \Lambda[h] \|}{\| h \|}
  &= \lim_{h \to 0} \frac{\| (h, h^3) - (h, 0) \|}{\| h \|} \\
  &= \lim_{h \to 0} \frac{\| (0, h^3) \|}{\| h \|} \\
  &= \lim_{h \to 0} \frac{|h|^3}{|h|} = 0.
\end{align*}

To prove that g is not differentiable at 0, we need to show that there is no linear
map Λ : R → R2 which is the derivative of the map g at 0. Suppose, for a contradiction,
that there is a linear map Λ : R → R2 such that

\[ \lim_{h \to 0} \frac{\| g(0 + h) - g(0) - \Lambda[h] \|}{\| h \|} = 0. \]

Let Λ(1) = (a, b) ∈ R2 , for some real constants a and b. It follows that for
every h ∈ R we have

\[ \Lambda(h) = \Lambda(h \cdot 1) = h \Lambda(1) = h(a, b) = (ha, hb). \]

Therefore,

\begin{align*}
0 = \lim_{h \to 0} \frac{\| g(0 + h) - g(0) - \Lambda[h] \|}{\| h \|}
  &= \lim_{h \to 0} \frac{\| (h^{1/3} - ah, \, h - bh) \|}{|h|} \\
  &= \lim_{h \to 0} \frac{\| h (h^{-2/3} - a, \, 1 - b) \|}{|h|} \\
  &= \lim_{h \to 0} \| (h^{-2/3} - a, \, 1 - b) \|.
\end{align*}

Since ‖(u, v)‖ ≥ |u| and ‖(u, v)‖ ≥ |v| for every (u, v) ∈ R2 , both components must
tend to zero, that is,

\[ \lim_{h \to 0} \left( h^{-2/3} - a \right) = 0, \quad \text{and} \quad \lim_{h \to 0} (1 - b) = 0. \]

This is a contradiction, since for any real number a we have

\[ \lim_{h \to 0} \left( h^{-2/3} - a \right) = \infty. \]

This contradiction shows that there is no linear map Λ : R → R2 satisfying the
definition of differentiability for g at 0.
Note that the value of the other limit does not lead to any contradiction, it only
says that b must be equal to 1.
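
The contrast between k and g in Example 1.13 can also be seen numerically. In the sketch below, the helper names and the sampled values of h are assumptions for illustration. The first ratio is the remainder for k with the candidate derivative Λ(h) = (h, 0). The second is ‖g(h) − g(0)‖/|h|, which would have to stay bounded near 0 if g were differentiable there, because ‖Λ[h]‖/|h| is constant for a linear map Λ.

```python
import math


def cbrt(x):
    """Real cube root (valid for negative x as well)."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def k(x):
    return (x, x ** 3)


def g(x):
    return (cbrt(x), x)


def k_remainder_ratio(h):
    """||k(h) - k(0) - (h, 0)|| / |h|, which equals h^2."""
    kx, ky = k(h)
    return math.hypot(kx - h, ky) / abs(h)


def g_growth_ratio(h):
    """||g(h) - g(0)|| / |h|; this is at least |h|^(-2/3)."""
    gx, gy = g(h)
    return math.hypot(gx, gy) / abs(h)


for h in (1e-1, -1e-2, 1e-3, -1e-4):
    print(h, k_remainder_ratio(h), g_growth_ratio(h))
```

The first column of ratios shrinks like h^2, while the second grows without bound as h → 0, even though both maps trace out the same curve.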


1.5 Higher derivatives


1.5.1 Higher derivatives as linear maps
Suppose that Ω ⊂ Rn is open, and f : Ω → Rm is differentiable at every point p ∈ Ω.
We may think of the differential of f as a map

Df : Ω → L(Rn ; Rm )
p ↦ Df (p).

Recall that every member of L(Rn ; Rm ) may be expressed as an m by n matrix,


using the standard basis for Rn and Rm . We can think of each m by n matrix as a
point in Rmn , for example, by

\[ (a_{i,j})_{1 \le i \le m, \, 1 \le j \le n} \mapsto (a_{1,1}, \dots, a_{1,n}, a_{2,1}, \dots, a_{2,n}, \dots, a_{m,1}, \dots, a_{m,n}). \]

Thus, we may think of Df as a map from Ω to Rmn . We can consider whether


this map Df is continuous, or differentiable at a point p ∈ Ω. If the map Df :
Ω → Rmn is continuous, we say f : Ω → Rm is continuously differentiable. If
Df : Ω → Rmn is differentiable at p, the derivative at p, denoted by DDf (p), is a
linear map from Rn to Rmn . That is,

DDf (p) ∈ L(Rn ; Rnm ) = L(Rn ; L(Rn ; Rm )).

Thus, DDf (p) takes an n-vector to an (m × n) matrix. The above notation may
appear complicated, but you have already seen some examples of maps in the
right-hand side of the above equation. For example, the map h ↦ ⟨h, ·⟩ is an element
of L(Rn ; L(Rn ; R1 )), that is, for every h ∈ Rn , the map u ↦ ⟨h, u⟩ is a linear map
from Rn to R1 .
In terms of our definition of derivative, DDf (p) is a linear map L ∈ L(Rn ; L(Rn ; Rm ))
such that the following holds

\[ \lim_{x \to p} \frac{\| Df(x) - Df(p) - L[x - p] \|}{\| x - p \|} = 0. \]

Note that in the above equation, the norm in the numerator is the norm ‖·‖ on
Rmn and the norm in the denominator is ‖·‖ on Rn .
Obviously, we can generalise this to consider a map which is k−times differ-
entiable. In practice, the condition of k−times differentiable at a point can be
difficult to establish. However, if f : Ω → Rm is k times differentiable with all those
derivatives continuous, we say f is k-times continuously differentiable.
Assume that f = (f^1 , f^2 , . . . , f^m ). We know from the previous results that if f
is differentiable at every point of Ω, the partial derivative maps

D_i f^j : Ω → R
x ↦ D_i f^j (x)
Lecture notes for Week 4


exist for all x ∈ Ω. If moreover, Df is differentiable at p ∈ Ω, then the second
partial derivatives

\[ D_k D_i f^j(p) := \lim_{t \to 0} \frac{D_i f^j(p + t e_k) - D_i f^j(p)}{t} \]

will exist.
It is easier to ask if all of the k−th partial derivatives exist and are continuous in
a neighbourhood of p. This is a slightly stronger condition, which implies k−times
differentiability at p by Theorem 1.12.

Example 1.14. Consider the map f : R2 → R given by

\[ f(x, y) = x^3 + y^3 + 5 x^2 y. \]

This is differentiable at each point p = (x, y) ∈ R2 , and the partial derivatives are

\[ D_1 f(p) = 3x^2 + 10xy, \qquad D_2 f(p) = 3y^2 + 5x^2. \]

To find the second partial derivatives, we consider the maps

\[ D_1 f(x, y) = 3x^2 + 10xy \]

and

\[ D_2 f(x, y) = 3y^2 + 5x^2, \]

and differentiate them. The second partial derivatives are thus

\begin{align*}
D_1 D_1 f(p) &= 6x + 10y, \\
D_2 D_1 f(p) &= 10x, \\
D_1 D_2 f(p) &= 10x, \\
D_2 D_2 f(p) &= 6y.
\end{align*}

Notice that

\[ D_2 D_1 f(p) = D_1 D_2 f(p). \]

This is not a coincidence!
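
The second partial derivatives in Example 1.14 can be double-checked by nesting difference quotients. This is only a sketch: the operators `d1` and `d2`, the step size and the evaluation point (1.5, −0.5) are assumptions chosen for illustration.

```python
def f(x, y):
    """f(x, y) = x^3 + y^3 + 5 x^2 y, as in Example 1.14."""
    return x ** 3 + y ** 3 + 5 * x ** 2 * y


def d1(func, step=1e-4):
    """Return a central-difference approximation of D_1 func."""
    return lambda x, y: (func(x + step, y) - func(x - step, y)) / (2 * step)


def d2(func, step=1e-4):
    """Return a central-difference approximation of D_2 func."""
    return lambda x, y: (func(x, y + step) - func(x, y - step)) / (2 * step)


x, y = 1.5, -0.5
d2d1 = d2(d1(f))(x, y)   # approximates D_2 D_1 f(p) = 10x
d1d2 = d1(d2(f))(x, y)   # approximates D_1 D_2 f(p) = 10x
d1d1 = d1(d1(f))(x, y)   # approximates D_1 D_1 f(p) = 6x + 10y
print(d2d1, d1d2, d1d1)
```

At this point the formulas of the example give 10x = 15 for both mixed partial derivatives and 6x + 10y = 4 for D1 D1 f, which the approximations should match to several digits.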

1.5.2 Symmetry of mixed partial derivatives


We will state a result here, but do not give a proof. This is not the optimal result
in this direction, but it is perfectly adequate for most purposes in the next section.

Theorem 1.13 (Schwarz’ Theorem). Suppose Ω ⊂ Rn is open and f : Ω → R is


differentiable at every p ∈ Ω. Suppose further that for some i, j ∈ {1, . . . , n} the
second partial derivatives Di Dj f and Dj Di f exist and are continuous at all p ∈ Ω.
Then, at every p ∈ Ω,
Di Dj f (p) = Dj Di f (p).

If f : Ω → R, the matrix of second partial derivatives at the point p,

\[ \operatorname{Hess} f(p) = \left[ D_i D_j f(p) \right]_{i, j = 1, \dots, n}, \]

is called the Hessian of f at p. Assuming the hypotheses on the second partial
derivatives hold, Schwarz’ Theorem states that the Hessian is a symmetric matrix.

Exercise 1.18. Suppose $A$ is a symmetric $(n \times n)$ matrix. Consider the map
$f : \mathbb{R}^n \to \mathbb{R}$ defined as
\[ f(x) = x A x^t. \]

(a) Show that $f$ is differentiable at all points $p \in \mathbb{R}^n$, with

\[ Df(p) = 2pA. \]

(b) Find $\operatorname{Hess} f(p)$.

Exercise 1.19. Consider the function $f : \mathbb{R}^3 \to \mathbb{R}$ given by

\[ f(x, y, z) = x y^2 + x^2 + x z e^y. \]

(i) Compute the first and second partial derivatives. Observe the properties of
the second partial derivative.

(ii) Write the terms of the Taylor expansion of f at zero up to and including the
second-order terms.

(iii) Without computation, write the same Taylor expansion up to and including
the fourth-order terms.

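Exercise 1.20 (*). Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined as

\[ f(x, y) = \begin{cases}
  \dfrac{x y^3 - x^3 y}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\[1ex]
  0 & \text{if } (x, y) = (0, 0).
\end{cases} \]

(a) Show that

\[ D_1 f(x, y) = \begin{cases}
  \dfrac{y^3 - 3x^2 y}{x^2 + y^2} - \dfrac{2x (x y^3 - x^3 y)}{(x^2 + y^2)^2} & \text{if } (x, y) \neq (0, 0), \\[1ex]
  0 & \text{if } (x, y) = (0, 0),
\end{cases} \]

and

\[ D_2 f(x, y) = \begin{cases}
  \dfrac{3y^2 x - x^3}{x^2 + y^2} - \dfrac{2y (x y^3 - x^3 y)}{(x^2 + y^2)^2} & \text{if } (x, y) \neq (0, 0), \\[1ex]
  0 & \text{if } (x, y) = (0, 0).
\end{cases} \]

Show that both of these functions are continuous at $(0, 0)$.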


(b) Show that

\[ \lim_{t \to 0} \frac{1}{t} \big( D_1 f(t e_2) - D_1 f(0) \big) = 1 \]

and

\[ \lim_{t \to 0} \frac{1}{t} \big( D_2 f(t e_1) - D_2 f(0) \big) = -1. \]

(c) Conclude that both $D_2 D_1 f(0)$ and $D_1 D_2 f(0)$ exist, but

\[ D_2 D_1 f(0) \neq D_1 D_2 f(0). \]
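
The two limits in part (b) can be observed numerically before proving them. The sketch below
is an illustration, not a proof: it simply evaluates the difference quotients for a few small
values of $t$, using the formulas for $D_1 f$ and $D_2 f$ stated in part (a). The function
names `d1f` and `d2f` are chosen here for convenience.

```python
def d1f(x, y):
    # D_1 f from part (a); the value at the origin is 0 by definition.
    if x == 0 and y == 0:
        return 0.0
    r2 = x * x + y * y
    return (y**3 - 3 * x**2 * y) / r2 - 2 * x * (x * y**3 - x**3 * y) / r2**2


def d2f(x, y):
    # D_2 f from part (a); the value at the origin is 0 by definition.
    if x == 0 and y == 0:
        return 0.0
    r2 = x * x + y * y
    return (3 * y**2 * x - x**3) / r2 - 2 * y * (x * y**3 - x**3 * y) / r2**2


for t in (1e-1, 1e-3, 1e-6):
    q21 = (d1f(0.0, t) - d1f(0.0, 0.0)) / t  # quotient defining D_2 D_1 f(0)
    q12 = (d2f(t, 0.0) - d2f(0.0, 0.0)) / t  # quotient defining D_1 D_2 f(0)
    print(t, q21, q12)                       # q21 is close to 1, q12 is close to -1
```

Note that the first partial derivatives are taken from the exact formulas. Replacing them by
finite differences of $f$ itself would hide the asymmetry, because difference quotients in
different directions always commute.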

1.5.3 Taylor’s theorem


The differentiability of a map in higher dimensions allows us to approximate the
map near a point by a linear map, which is a simpler object. This has significant
consequences, which we discuss in Section 1.6. However, when thinking of differentiability
of higher orders, one may wonder whether it leads to better approximations than
those by a linear map, perhaps by objects more complicated than linear maps.
In terms of complexity, the next class of maps after the linear ones is that of polynomial
maps in several variables. We look into such approximations in this section.
A powerful result concerning differentiable functions of one variable is Taylor’s
theorem, which permits us to approximate a function in a neighbourhood of a point
p by a polynomial, with an error term that goes to zero at a controlled rate as we
approach p. In order to state Taylor’s theorem for higher dimensions, it’s useful to
introduce some new notation.
When dealing with partial derivatives of high orders, the notation can get rather
messy. To mitigate this, it’s convenient to introduce “multi-indices”. We define a
multi-index $\alpha$ to be an element of $\mathbb{N}^n$, i.e. an $n$-vector of non-negative integers
$\alpha = (\alpha_1, \dots, \alpha_n)$. We define $|\alpha| = \alpha_1 + \dots + \alpha_n$ and

\[ D^\alpha f := (D_1)^{\alpha_1} (D_2)^{\alpha_2} \cdots (D_n)^{\alpha_n} f. \]

It’s convenient to also introduce, for a vector $h = (h^1, \dots, h^n)$,

\[ h^\alpha := (h^1)^{\alpha_1} (h^2)^{\alpha_2} \cdots (h^n)^{\alpha_n}, \]

as well as the multi-index factorial $\alpha! = \alpha_1! \, \alpha_2! \cdots \alpha_n!$.
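
To get used to the multi-index notation, it can help to compute with it. The sketch below
implements $|\alpha|$, $\alpha!$ and $h^\alpha$ literally from the definitions above, with
multi-indices and vectors represented as tuples; the function names are hypothetical and
chosen only for this illustration.

```python
from itertools import product
from math import factorial, prod


def order(alpha):
    # |alpha| = alpha_1 + ... + alpha_n
    return sum(alpha)


def multi_factorial(alpha):
    # alpha! = alpha_1! * alpha_2! * ... * alpha_n!
    return prod(factorial(a) for a in alpha)


def power(h, alpha):
    # h^alpha = (h^1)^{alpha_1} * ... * (h^n)^{alpha_n}
    return prod(hj**aj for hj, aj in zip(h, alpha))


def multi_indices(n, k):
    # All multi-indices alpha in N^n with |alpha| = k.
    return [a for a in product(range(k + 1), repeat=n) if sum(a) == k]


alpha = (2, 0, 1)
print(order(alpha), multi_factorial(alpha), power((3, 5, 2), alpha))  # 3 2 18
print(multi_indices(2, 2))  # [(0, 2), (1, 1), (2, 0)]
```

For $n = 2$ and $k = 2$ the three multi-indices correspond to the derivatives $D_2 D_2 f$,
$D_1 D_2 f$ and $D_1 D_1 f$.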

Theorem 1.14. Suppose that $p \in \mathbb{R}^n$ and $f : B_r(p) \to \mathbb{R}$ is $k$-times continuously
differentiable at all points $q \in B_r(p)$, for some integer $k \geq 1$. Then, for every $h \in \mathbb{R}^n$
with $\|h\| < r$, we have

\[ f(p + h) = \sum_{|\alpha| \leq k-1} \frac{h^\alpha}{\alpha!} D^\alpha f(p) + R_k(p, h), \]

where the sum is taken over all multi-indices $\alpha = (\alpha_1, \dots, \alpha_n)$ with $|\alpha| \leq k - 1$ and
the remainder term is given by

\[ R_k(p, h) = \sum_{|\alpha| = k} \frac{h^\alpha}{\alpha!} D^\alpha f(x) \]

for some $x$ with $0 < \|x - p\| < \|h\|$.
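
As an illustration of the theorem, consider again the polynomial $f(x, y) = x^3 + y^3 + 5x^2 y$
from Example 1.14. All of its partial derivatives of order 4 vanish, so for $k = 4$ the
remainder $R_4$ is zero and the sum over $|\alpha| \leq 3$ reproduces $f(p + h)$ exactly. The
sketch below checks this numerically; the table `DERIVATIVES` of hand-computed partial
derivatives, the function name `taylor_polynomial`, and the choice of $p$ and $h$ are
assumptions made for this example.

```python
from math import factorial

# Partial derivatives D^alpha f of f(x, y) = x^3 + y^3 + 5 x^2 y,
# indexed by the multi-index alpha = (alpha_1, alpha_2), for |alpha| <= 3.
DERIVATIVES = {
    (0, 0): lambda x, y: x**3 + y**3 + 5 * x**2 * y,
    (1, 0): lambda x, y: 3 * x**2 + 10 * x * y,
    (0, 1): lambda x, y: 3 * y**2 + 5 * x**2,
    (2, 0): lambda x, y: 6 * x + 10 * y,
    (1, 1): lambda x, y: 10 * x,
    (0, 2): lambda x, y: 6 * y,
    (3, 0): lambda x, y: 6.0,
    (2, 1): lambda x, y: 10.0,
    (1, 2): lambda x, y: 0.0,
    (0, 3): lambda x, y: 6.0,
}


def taylor_polynomial(p, h, degree):
    """Sum of h^alpha / alpha! * D^alpha f(p) over all |alpha| <= degree."""
    total = 0.0
    for (a1, a2), derivative in DERIVATIVES.items():
        if a1 + a2 <= degree:
            coefficient = h[0]**a1 * h[1]**a2 / (factorial(a1) * factorial(a2))
            total += coefficient * derivative(*p)
    return total


p, h = (1.0, 2.0), (0.1, -0.2)
exact = DERIVATIVES[(0, 0)](p[0] + h[0], p[1] + h[1])
for degree in range(4):
    approx = taylor_polynomial(p, h, degree)
    print(degree, approx, abs(exact - approx))  # the error shrinks, and is ~0 for degree 3
```

The error decreases with the degree and vanishes (up to rounding) at degree 3, as predicted.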


(*) Proof. The result follows from the one-dimensional Taylor’s theorem. First, we
note that there exists $\epsilon > 0$ such that $\|h\| < \frac{r}{1+\epsilon}$. Let us define the function
$g : (-1-\epsilon, 1+\epsilon) \to \mathbb{R}$ as

\[ g(t) = f(p + th). \]

By the chain rule, this function is $k$-times differentiable on the interval $(-1-\epsilon, 1+\epsilon)$,
and $[0, 1] \subset (-1-\epsilon, 1+\epsilon)$, so by the one-dimensional Taylor’s theorem we have

\[ g(1) = g(0) + g'(0) + \frac{g''(0)}{2!} + \dots + \frac{g^{(k-1)}(0)}{(k-1)!} + R_k, \]

where

\[ R_k = \frac{g^{(k)}(\xi)}{k!} \]

for some $\xi \in (0, 1)$. We will be done if we can show that for $j = 0, 1, \dots, k$ we have

\[ g^{(j)}(t) = j! \sum_{|\alpha| = j} \frac{h^\alpha}{\alpha!} D^\alpha f(p + th). \tag{1.12} \]

This is certainly true for $j = 0$. Suppose it’s true for some $j \geq 0$. Then we have

\begin{align*}
g^{(j+1)}(t) &= \sum_{l=1}^{n} h^l D_l \Big[ j! \sum_{|\alpha| = j} \frac{h^\alpha}{\alpha!} D^\alpha f \Big] (p + th) \\
             &= j! \sum_{l=1}^{n} \sum_{|\alpha| = j} \frac{h^\alpha h^l}{\alpha!} D_l D^\alpha f(p + th).
\end{align*}

Clearly, the right-hand side of the above equation is a sum of terms proportional to
$h^\beta D^\beta f(p + th)$ where $|\beta| = j + 1$. Suppose $\beta = (\beta_1, \dots, \beta_n)$, then the coefficient of
the term proportional to $h^\beta D^\beta f(p + th)$ is

\begin{align*}
\frac{j!}{(\beta_1 - 1)! \, \beta_2! \cdots \beta_n!} + \frac{j!}{\beta_1! \, (\beta_2 - 1)! \cdots \beta_n!} + \dots + \frac{j!}{\beta_1! \, \beta_2! \cdots (\beta_n - 1)!}
  &= \frac{(j+1)!}{\beta_1! \, \beta_2! \cdots \beta_n!} \\
  &= \frac{(j+1)!}{\beta!},
\end{align*}

by a result from combinatorics (you do not need to verify this). Thus we have

\begin{align*}
g^{(j+1)}(t) &= j! \sum_{l=1}^{n} \sum_{|\alpha| = j} \frac{h^\alpha h^l}{\alpha!} D_l D^\alpha f(p + th) \\
             &= (j+1)! \sum_{|\beta| = j+1} \frac{h^\beta}{\beta!} D^\beta f(p + th).
\end{align*}

By induction we conclude that (1.12) holds for all $j = 0, \dots, k$ and the result
follows.
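
The combinatorial identity quoted in the proof can in fact be checked in one line. The
computation below is a sketch added for completeness. It uses only $|\beta| = \beta_1 + \dots + \beta_n = j + 1$
and the relation $1/(\beta_l - 1)! = \beta_l / \beta_l!$ for $\beta_l \geq 1$; a term with $\beta_l = 0$ does not occur in
the sum, since then there is no multi-index $\alpha$ with $\alpha + e_l = \beta$.

```latex
\sum_{\substack{l = 1 \\ \beta_l \geq 1}}^{n}
  \frac{j!}{\beta_1! \cdots (\beta_l - 1)! \cdots \beta_n!}
= \sum_{l=1}^{n} \frac{j! \, \beta_l}{\beta_1! \cdots \beta_n!}
= \frac{j! \, |\beta|}{\beta!}
= \frac{j! \, (j+1)}{\beta!}
= \frac{(j+1)!}{\beta!}.
```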

Exercise 1.21. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined as $f(x, y) = e^x \sin(y)$.

a) Compute the degree 1 and degree 2 Taylor polynomials of $f$ near the point
$(x_0, y_0) = (0, \pi/2)$ and use those to approximate the value of $f$ at
$(x_1, y_1) = (0, \pi/2 + 1/4)$. Compare your results with the values you obtain from a
calculator.

b) How precise is the degree 1 approximation in the closed ball of radius 1/4
around $(x_0, y_0)$? Find a rigorous upper bound for the approximation error.

1.6 Inverse and Implicit function theorems


1.6.1 Inverse function theorem
Suppose $f : \mathbb{R} \to \mathbb{R}$ is continuously differentiable in an interval around $p \in \mathbb{R}$, with
$f'(p) \neq 0$, say $f'(p) > 0$. Then there is an open interval $I$ with $p \in I$ such that
$f'(x) > 0$ for all $x \in I$. This (by the mean value theorem) implies that $f$ is strictly
monotone increasing on $I$ and hence $f : I \to f(I)$ is bijective. In particular, there
exists an inverse function $f^{-1} : f(I) \to I$. With a little work, one can establish that
$f^{-1}$ is differentiable, and moreover, by an application of the chain rule, obtain the
following formula for the derivative of the inverse map,

\[ f'(p) = \frac{1}{(f^{-1})'(f(p))}. \]
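
The chain rule step mentioned above can be written out as follows. This is only a sketch,
assuming (as stated) that $f^{-1}$ is already known to be differentiable at $f(p)$: differentiate
the identity $f^{-1}(f(x)) = x$ at $x = p$.

```latex
\frac{d}{dx}\Big|_{x = p} f^{-1}\big(f(x)\big)
  = (f^{-1})'\big(f(p)\big) \, f'(p) = 1
\quad \Longrightarrow \quad
(f^{-1})'\big(f(p)\big) = \frac{1}{f'(p)}.
```

For instance, $f(x) = e^x$ has inverse $f^{-1} = \log$, and indeed $(\log)'(e^p) = 1/e^p = 1/f'(p)$.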

This result can be generalised to higher dimensions.

Theorem 1.15 (Inverse Function Theorem). Let $\Omega$ be an open set in $\mathbb{R}^n$, let
$f : \Omega \to \mathbb{R}^n$ be continuously differentiable on $\Omega$, and suppose there is $q \in \Omega$ such that
$Df(q)$ is invertible. Then, there are open sets $U \subset \Omega$ and $V \subset \mathbb{R}^n$ with $q \in U$ and
$f(q) \in V$ such that

(i) $f : U \to V$ is a bijection,

(ii) $f^{-1} : V \to U$ is continuously differentiable,

(iii) for all $y \in V$,

\[ Df^{-1}(y) = \big[ Df(f^{-1}(y)) \big]^{-1}. \]

Recall that since the Jacobian $Df(q)$ is an $n \times n$ matrix, the statement that it
is invertible is equivalent to the statement that $\det Df(q) \neq 0$.


Example 1.15. Consider the map $f : \mathbb{R}^2 \to \mathbb{R}^2$ defined as

\[ f(x, y) = (x + y + 5xy, \; y - x^2). \]

The partial derivatives of $f$ are

\[ D_1 f(x, y) = (1 + 5y, -2x), \qquad D_2 f(x, y) = (1 + 5x, 1). \]

Evidently, both of these maps are continuous from $\mathbb{R}^2$ to $\mathbb{R}^2$. Thus, by Theorem 1.12,
$f$ is differentiable at every point in $\mathbb{R}^2$. Moreover, by Theorem 1.9, the Jacobian of
$f$ at $(x, y) \in \mathbb{R}^2$ is given by the matrix

\[ Df(x, y) = \begin{pmatrix} 1 + 5y & 1 + 5x \\ -2x & 1 \end{pmatrix}. \]

This is a continuous function from $\mathbb{R}^2$ to $\mathbb{R}^{2 \times 2} = \mathbb{R}^4$.
We note that

\[ Df(0, 0) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \]

with $\det Df(0, 0) = 1 \neq 0$, and hence $Df(0, 0)$ is invertible. By the Inverse Function
Theorem, $f$ is invertible on some neighbourhood of the origin, with

\[ Df^{-1}(0, 0) = [Df(0, 0)]^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}. \]

It is worth noting that obtaining an explicit formula for the inverse map is not easy,
and hence the derivative of the inverse map is out of reach using the direct approach.
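
Although no explicit formula for the inverse is available, the local inverse can be evaluated
numerically, which gives an independent check of the matrix $[Df(0, 0)]^{-1}$ above. The sketch
below is an illustration under stated assumptions: it uses Newton's method started at the
origin (a technique not discussed in these notes) to solve $f(x, y) = (a, b)$ for small
$(a, b)$, and then approximates the partial derivatives of the inverse by central differences.
The function names, the number of Newton steps and the step size `h` are hypothetical choices.

```python
def f(x, y):
    # The map from Example 1.15.
    return (x + y + 5 * x * y, y - x**2)


def jacobian(x, y):
    # Df(x, y), stored as the two rows of a 2 x 2 matrix.
    return ((1 + 5 * y, 1 + 5 * x), (-2 * x, 1.0))


def local_inverse(a, b, steps=50):
    """Solve f(x, y) = (a, b) near the origin by Newton's method."""
    x, y = 0.0, 0.0
    for _ in range(steps):
        u, v = f(x, y)
        (j11, j12), (j21, j22) = jacobian(x, y)
        det = j11 * j22 - j12 * j21
        # Solve Df(x, y) (dx, dy)^t = (a - u, b - v)^t by Cramer's rule.
        ru, rv = a - u, b - v
        dx = (ru * j22 - j12 * rv) / det
        dy = (j11 * rv - j21 * ru) / det
        x, y = x + dx, y + dy
    return x, y


def inverse_jacobian_at_origin(h=1e-6):
    """Central-difference approximation of D(f^{-1})(0, 0)."""
    xa_p, ya_p = local_inverse(h, 0.0)
    xa_m, ya_m = local_inverse(-h, 0.0)
    xb_p, yb_p = local_inverse(0.0, h)
    xb_m, yb_m = local_inverse(0.0, -h)
    return (
        ((xa_p - xa_m) / (2 * h), (xb_p - xb_m) / (2 * h)),
        ((ya_p - ya_m) / (2 * h), (yb_p - yb_m) / (2 * h)),
    )


print(inverse_jacobian_at_origin())  # should be close to ((1, -1), (0, 1))
```

The printed matrix agrees, up to numerical error, with the one given by the Inverse Function
Theorem.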

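Exercise 1.22. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}^2$ given by

\[ f : \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x + y - xy \\ x^2 \end{pmatrix}. \]

Determine the set of points in $\mathbb{R}^2$ such that $f$ is invertible near those points, and
compute the derivative of the inverse map.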

The Inverse Function Theorem has applications to solving systems of equations.


Assume that we have $n$ equations in $n$ unknowns $x^1, x^2, \dots, x^n$, given in the form

\begin{align*}
f^1(x^1, x^2, \dots, x^n) &= y^1, \\
f^2(x^1, x^2, \dots, x^n) &= y^2, \\
&\;\;\vdots \\
f^n(x^1, x^2, \dots, x^n) &= y^n,
\end{align*}

where $y^1, y^2, \dots, y^n$ are given real numbers, and $f^1, f^2, \dots, f^n$ are some functions of
$x^1, x^2, \dots, x^n$.
For arbitrary values $x_0^1, x_0^2, \dots, x_0^n$, we obtain real numbers $y_0^1, y_0^2, \dots, y_0^n$ satisfying
the above equations. That is, we define the values of $y_0^1, y_0^2, \dots, y_0^n$ using the
above functions. The Inverse Function Theorem guarantees that if the map
$F : \mathbb{R}^n \to \mathbb{R}^n$ defined as

\[ F(x^1, x^2, \dots, x^n) = \big( f^1(x^1, x^2, \dots, x^n), \, f^2(x^1, x^2, \dots, x^n), \, \dots, \, f^n(x^1, x^2, \dots, x^n) \big) \]

is continuously differentiable, and $DF$ at $(x_0^1, x_0^2, \dots, x_0^n)$ is invertible, then for all
values of $y^1, y^2, \dots, y^n$ sufficiently close to $y_0^1, y_0^2, \dots, y_0^n$ the above system of
equations has a unique solution close to $(x_0^1, x_0^2, \dots, x_0^n)$. Indeed, the solution is given by
the local inverse of the map $F$.
For example, by the previous example, we conclude that for $a$ and $b$ close to 0,
the equations

\begin{align*}
x + y + 5xy &= a, \\
y - x^2 &= b
\end{align*}

have a unique solution $(x, y)$ close to the origin.


This is a fairly powerful statement, but the issue here is that the theorem does
not immediately say how close a and b must be to 0 in order for the solutions
to exist. It only says that for a and b close enough to 0, there are solutions. However,
since there is a constructive proof of the theorem, one can follow the steps in the
proof, and obtain an explicit neighbourhood of (0, 0) such that for all (a, b) in that
neighbourhood, the solutions exist.
Let $\Omega$ and $\Omega'$ be open subsets of $\mathbb{R}^n$. We say that a map $f : \Omega \to \Omega'$ is a
$C^1$-diffeomorphism if $f : \Omega \to \Omega'$ is a bijection (i.e. injective and surjective),
$f : \Omega \to \Omega'$ is continuously differentiable, and for every $x \in \Omega$, $Df(x)$ is invertible.
Example 1.16. Let $\Omega$ be an open set in $\mathbb{R}^n$, and define $D$ as the set of all
$C^1$-diffeomorphisms from $\Omega$ to $\Omega$. Then $D$ is a group, with the operation

\[ f * g = f \circ g. \]

To see this, first we show that for every $f$ and $g$ in $D$, $f * g$ belongs to $D$. So
we need to show that $f \circ g$ is a $C^1$-diffeomorphism from $\Omega$ to $\Omega$. We need to verify
three properties for $f \circ g$.
• Since $f$ and $g$ belong to $D$, $f : \Omega \to \Omega$ and $g : \Omega \to \Omega$ are bijections. Hence,
$f \circ g : \Omega \to \Omega$ is a bijection.
• Since $f$ and $g$ belong to $D$, they are continuously differentiable at every point
in $\Omega$. Thus, by the chain rule, the map $f \circ g : \Omega \to \Omega$ is differentiable at every point
in $\Omega$, with
\[ D(f \circ g)(x) = Df(g(x)) \circ Dg(x). \]
Also, since the maps $y \mapsto Df(y)$ and $x \mapsto Dg(x)$
are continuous on $\Omega$, and the composition of continuous maps is a continuous map,
the above formula shows that $D(f \circ g)$ is continuous on $\Omega$. Thus, $f \circ g$ is continuously
differentiable on $\Omega$.
• Since $f$ and $g$ belong to $D$, both $Df(y)$ and $Dg(x)$ are invertible at all $x$ and $y$
in $\Omega$. The product of invertible matrices is invertible. Thus, the above formula
shows that $D(f * g)$ must be invertible at every point.
The associativity of the operation ∗ is obtained from the associativity of the
composition operation for functions. That is, for all f , g and h in D, we have

(f ∗ g) ∗ h = (f ◦ g) ◦ h = f ◦ (g ◦ h) = f ∗ (g ∗ h).

The identity map $\mathrm{id} : \Omega \to \Omega$ is a $C^1$-diffeomorphism and hence belongs to $D$.
It is the identity element in $D$, since for every $f \in D$, we have

\[ f * \mathrm{id} = f \circ \mathrm{id} = f, \qquad \mathrm{id} * f = \mathrm{id} \circ f = f. \]

Finally, for every $f \in D$ we need to show that $f^{-1}$ belongs to $D$. First we note
that $f^{-1} : \Omega \to \Omega$ is a bijection. Since $f$ is continuously differentiable on $\Omega$ and
$Df(x)$ is invertible, by the Inverse Function Theorem, $f^{-1}$ is continuously differentiable
on some neighbourhood of $f(x)$, and $D(f^{-1})(f(x)) = [Df(x)]^{-1}$ is invertible. This is true
on a neighbourhood of $f(x)$ for every $x \in \Omega$. So, since $f$ is surjective, this is true
on a neighbourhood of every point in $f(\Omega) = \Omega$.
When $\Omega = B_1(0)$ is the open ball of radius 1 about the origin, every rotation
about 0 is an element of $D$. However, there are many other maps in $D$. It forms a
very large group, as can be seen, for example, when $\Omega = (-1, 1)$ is the open interval in $\mathbb{R}$.
Exercise 1.23. (a) Suppose $f : \mathbb{R} \to \mathbb{R}$ is continuously differentiable in a
neighbourhood of the origin, and $f'(0) = 0$. Give an example to show that $f$ may
nevertheless be bijective.

(b) Suppose $f : \mathbb{R}^n \to \mathbb{R}^n$ is bijective, differentiable at the origin, and $\det Df(0) = 0$.
Show that $f^{-1}$ is not differentiable at $f(0)$.
Exercise 1.24. The non-linear system of equations

\begin{align*}
e^{xy} \sin(x^2 - y^2 + x) &= 0, \\
e^{x^2 + y} \cos(x^2 + y^2) &= 1
\end{align*}

admits the solution $(x, y) = (0, 0)$. Prove that there exists $\varepsilon > 0$ such that for all
$(\xi, \eta)$ with $\xi^2 + \eta^2 < \varepsilon^2$, the perturbed system of equations

\begin{align*}
e^{xy} \sin(x^2 - y^2 + x) &= \xi, \\
e^{x^2 + y} \cos(x^2 + y^2) &= 1 + \eta
\end{align*}

has a solution $(x(\xi, \eta), y(\xi, \eta))$ which depends continuously on $(\xi, \eta)$.


1.6.2 Implicit Function Theorem


In the previous section, we saw that the Inverse Function Theorem has applications
to systems of $n$ equations with $n$ unknowns. What if there are more unknowns than
equations? That is, for some $n > m$, we have

\begin{align*}
f^1(x^1, x^2, \dots, x^n) &= y^1, \\
f^2(x^1, x^2, \dots, x^n) &= y^2, \\
&\;\;\vdots \\
f^m(x^1, x^2, \dots, x^n) &= y^m.
\end{align*}

We look into this through a simple example. Consider the equation

\[ x^2 + y^2 - 1 = 0. \]

We can consider the map $F : \mathbb{R}^2 \to \mathbb{R}$ defined as

\[ F(x, y) = x^2 + y^2 - 1 \]

and think of the above equation as

\[ F(x, y) = 0. \]

Suppose $(a, b)$ satisfies $F(a, b) = 0$, and $a \neq \pm 1$. Then there is an open interval
$A$ containing $a$ and an open interval $B$ containing $b$ with the property that for each
$x \in A$ there is a unique $y \in B$ such that $F(x, y) = 0$. This permits us to define
a map $g : A \to B$ by $g(x) = y$, so that $F(x, g(x)) = 0$. We can think of this as
‘locally solving for $y$ in terms of $x$’. If $b > 0$ then $g(x) = \sqrt{1 - x^2}$. For the problem
at hand, there is in fact another number $b_1$ such that $F(a, b_1) = 0$. Associated to
this point there is an open interval $B_1$ containing $b_1$ and a map $g_1 : A \to B_1$ such
that $F(x, g_1(x)) = 0$. (If $b > 0$, then $b_1 < 0$ and $g_1(x) = -\sqrt{1 - x^2}$.) Both $g$ and $g_1$ are
differentiable. See Figure 1.7.
In contrast, when $a = \pm 1$ we must have $b = 0$ in order to have $a^2 + b^2 = 1$.
Assume that $a = +1$. There are no open sets $A \subset \mathbb{R}$ containing $a$ and $B \subset \mathbb{R}$
containing $b$ satisfying the following property:

for every $x \in A$ there is a unique $y \in B$ satisfying $x^2 + y^2 = 1$.

This is because, since $B$ is open, there is $\delta > 0$ such that $(-\delta, \delta) \subset B$. Now, for
every $x \in A$ with $x < 1$ close enough to $a = 1$, there are two points $\pm\sqrt{1 - x^2}$ that belong to $B$.
Of course one might wish to rectify this problem by choosing $A$ as an interval of
the form $(1 - c, 1]$, and $B$ an interval of the form $[0, \sqrt{1 - c^2})$, so that for every $x \in A$ there
is a unique $y \in B$ satisfying $x^2 + y^2 = 1$. But, when we go to higher dimensions, it
is not clear what the correct analogue of intervals of the form $[z, w)$ or $(z, w]$ is.

Lecture notes for Week 5



Figure 1.7: The set $x^2 + y^2 - 1 = 0$, and the intervals $A$, $B$, $B_1$.

The main question here is to identify conditions on $F$ which allow us to
write the solutions of the equation $F(x, y) = 0$ as graphs of maps. The Implicit
Function Theorem gives us a sufficient condition for this property to be true in a
more general setting. We first state a simpler version of the theorem.

Theorem 1.16 (Implicit Function Theorem, low dimensional version). Assume that
$\Omega \subset \mathbb{R}^2$ is open, $F : \Omega \to \mathbb{R}$ is continuously differentiable, and there is $(x', y') \in \Omega$
such that

(i) $F(x', y') = 0$, and

(ii) $D_2 F(x', y') \neq 0$.

Then, there are open sets $A \subset \mathbb{R}$ and $B \subset \mathbb{R}$ with $x' \in A$ and $y' \in B$, and a map
$f : A \to B$ such that

a point $(x, y) \in A \times B$ satisfies $F(x, y) = 0$ if and only if $y = f(x)$.

Moreover, the map $f : A \to B$ is continuously differentiable.

Roughly speaking, the above theorem states that for each solution $(x_0, y_0)$ of the
equation
\[ F(x, y) = 0 \]
at which $D_2 F(x_0, y_0) \neq 0$, the nearby solutions $(x, y)$ of the equation look like the graph
of a map from the unknown $x$ to the unknown $y$.
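
Theorem 1.16 does not give a formula for $f$, but the derivative of $f$ can be computed from $F$
alone. The following is a standard consequence of the chain rule, sketched here under the
conclusions of the theorem (that is, $F(x, f(x)) = 0$ for all $x \in A$, and $f$ is differentiable); it
is valid at those $x$ for which $D_2 F(x, f(x)) \neq 0$, which by continuity holds for $x$ close to $x'$.

```latex
0 = \frac{d}{dx} F\big(x, f(x)\big)
  = D_1 F\big(x, f(x)\big) + D_2 F\big(x, f(x)\big) \, f'(x)
\quad \Longrightarrow \quad
f'(x) = - \frac{D_1 F\big(x, f(x)\big)}{D_2 F\big(x, f(x)\big)}.
```

For the circle $F(x, y) = x^2 + y^2 - 1$ with $y > 0$ this gives $f'(x) = -x / f(x) = -x/\sqrt{1 - x^2}$, in
agreement with differentiating $f(x) = \sqrt{1 - x^2}$ directly.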


Exercise 1.25. For each of the following equations determine at which points one
cannot find a function $y = f(x)$ which describes the graph in a neighbourhood of the point.
Sketch the graphs.

(a)
\[ \frac{1}{3} y^3 - 2y + x = 1 \]

(b)
\[ x^2 \Big( \frac{\cos^2 \phi}{a^2} + \frac{\sin^2 \phi}{b^2} \Big)
   - xy \Big( \frac{1}{a^2} - \frac{1}{b^2} \Big) \sin(2\phi)
   + y^2 \Big( \frac{\sin^2 \phi}{a^2} + \frac{\cos^2 \phi}{b^2} \Big) = 1, \]
where $a > 0$, $b > 0$, $0 \leq \phi \leq \pi/2$ are fixed parameters. Note the cases $a = b$,
$\phi = 0$, $\phi = \pi/2$.

Exercise 1.26. Consider the equation

\[ 2x^2 + 4xy + y^2 = 3x + 4y. \]

(a) Show that this equation (implicitly) defines a function $y = f(x)$
with $f(1) = 1$.

(b) Compute $f'(1)$ without knowing $f$ explicitly.

(c) Find an explicit formula for $f$ and check your result from (b).

1.6.3 * Sketch of the proof of the Implicit Function Theorem


There is an intuitive argument which explains why the conditions in Theorem 1.16
are sufficient. With careful attention to details, one may turn this into a proof. The
argument is fairly elementary, but since it is long, you may treat it as optional.
Consider a map $F : \Omega \to \mathbb{R}$ which satisfies the hypotheses of Theorem 1.16. We
break the argument into several steps. Note that $D_2 F(x', y') \neq 0$. Without loss of
generality we may assume that $D_2 F(x', y') > 0$ (the other case is similar).
Step 1. There is $\delta > 0$ such that for every $x \in [x' - \delta, x' + \delta]$ and every
$y \in [y' - \delta, y' + \delta]$, we have $D_2 F(x, y) > 0$.
To see this, note that since $F$ is continuously differentiable, the map

\[ (x, y) \mapsto D_2 F(x, y) \]

is continuous from $\Omega$ to $\mathbb{R}$. As this function is positive at $(x', y')$, it must be positive
on a neighbourhood of that point. Thus, there is $\delta > 0$ satisfying the property in
Step 1.
Step 2. There is $\delta'$ with $0 < \delta' < \delta$ such that on the set $(x' - \delta', x' + \delta') \times \{y' - \delta\}$
we have $F < 0$, and on the set $(x' - \delta', x' + \delta') \times \{y' + \delta\}$ we have $F > 0$.
To see this, consider the map $h : [y' - \delta, y' + \delta] \to \mathbb{R}$ defined as

\[ h(y) = F(x', y). \]

By the property in Step 1 we note that $h'(y) = D_2 F(x', y) > 0$ for all
$y \in (y' - \delta, y' + \delta)$. Since $h$ is continuous on $[y' - \delta, y' + \delta]$, this implies that $h$ is
strictly increasing on $[y' - \delta, y' + \delta]$. As $h(y') = 0$, we must have $h(y' - \delta) < 0$ and
$h(y' + \delta) > 0$.
By the above paragraph, $F(x', y' - \delta) < 0$ and $F(x', y' + \delta) > 0$. Since $F$ is
continuous, there is $\delta'$ with $0 < \delta' < \delta$ such that $F$ is negative on
$(x' - \delta', x' + \delta') \times \{y' - \delta\}$, and is positive on $(x' - \delta', x' + \delta') \times \{y' + \delta\}$.
Step 3. For every $x \in (x' - \delta', x' + \delta')$, there is a unique $y \in (y' - \delta, y' + \delta)$ such
that $F(x, y) = 0$.
Fix an arbitrary $x \in (x' - \delta', x' + \delta')$, and consider the map $g : [y' - \delta, y' + \delta] \to \mathbb{R}$
defined as
\[ g(y) = F(x, y). \]
The map $g$ is continuous on $[y' - \delta, y' + \delta]$, with $g(y' - \delta) = F(x, y' - \delta) < 0$ and
$g(y' + \delta) = F(x, y' + \delta) > 0$. By the intermediate value theorem, there must be
$y \in (y' - \delta, y' + \delta)$ such that $g(y) = 0$. So $F(x, y) = 0$.
On the other hand, since $g'(y) = D_2 F(x, y) > 0$ for all $y \in [y' - \delta, y' + \delta]$, $g$ is
strictly increasing on $[y' - \delta, y' + \delta]$. This implies that there is a unique point in
$(y' - \delta, y' + \delta)$ where $g$ becomes 0. This proves the uniqueness.
With the above argument, we can introduce $A = (x' - \delta', x' + \delta')$ and
$B = (y' - \delta, y' + \delta)$.
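
Step 3 is constructive enough to be turned into a small computation: for fixed $x$, the
function $y \mapsto F(x, y)$ changes sign on $[y' - \delta, y' + \delta]$ and is strictly increasing, so its
unique zero can be located by bisection. The sketch below does this for the circle
$F(x, y) = x^2 + y^2 - 1$ near $(x', y') = (0.6, 0.8)$. The function name `solve_for_y`, the value
$\delta = 0.15$ and the tolerance are assumptions made for this illustration.

```python
def F(x, y):
    # Example used throughout this section: the unit circle.
    return x**2 + y**2 - 1


def solve_for_y(x, y_low, y_high, tol=1e-12):
    """Find y in [y_low, y_high] with F(x, y) = 0 by bisection.

    Assumes F(x, y_low) < 0 < F(x, y_high), as arranged in Step 2.
    """
    assert F(x, y_low) < 0 < F(x, y_high)
    while y_high - y_low > tol:
        mid = (y_low + y_high) / 2
        if F(x, mid) < 0:
            y_low = mid   # the zero lies above mid
        else:
            y_high = mid  # the zero lies at or below mid
    return (y_low + y_high) / 2


# Around (x', y') = (0.6, 0.8) take delta = 0.15; D_2 F = 2y > 0 on the box.
y_prime, delta = 0.8, 0.15
for x in (0.55, 0.6, 0.65):
    y = solve_for_y(x, y_prime - delta, y_prime + delta)
    print(x, y, (1 - x**2) ** 0.5)  # the last two columns should agree
```

The computed values agree with the explicit solution $g(x) = \sqrt{1 - x^2}$, which in this
example happens to be available.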

1.6.4 The general form of the Implicit Function Theorem


There is a more general version of the Implicit Function Theorem for arbitrary
dimensions.
Theorem 1.17 (Implicit Function Theorem). Let $\Omega \subset \mathbb{R}^n$, $\Omega' \subset \mathbb{R}^m$ be open sets,
and $f : \Omega \times \Omega' \to \mathbb{R}^m$ be continuously differentiable on $\Omega \times \Omega'$. Suppose there is
$p = (a, b) \in \Omega \times \Omega'$ such that
(i) $f(p) = 0$, and

(ii) the $m \times m$ matrix

\[ \big[ D_{n+j} f^i(p) \big]_{1 \leq i, j \leq m} \]

is invertible.
Then, there are open sets $A \subset \Omega$ and $B \subset \Omega'$ with $a \in A$ and $b \in B$, as well as a
map $g : A \to B$ such that
a point $(x, y) \in A \times B$ satisfies $f(x, y) = 0$ if and only if $y = g(x)$.
The map $g$ is continuously differentiable.


1.6.5 * Equivalence of the two theorems


In this section we prove that the Inverse Function Theorem and the Implicit Function
Theorem are equivalent.

Inverse Function Theorem implies the Implicit Function Theorem: Assume that f
satisfies the assumptions in Theorem 1.17. We define a new map

F : Ω × Ω′ → Rn × Rm

as
F (x, y) = (x, f (x, y)).

The Jacobian of $F$ at $p = (a, b)$ is

\[ DF(p) = \begin{pmatrix} I & 0 \\ N & M \end{pmatrix}. \]

Here $I$ is the $n \times n$ identity matrix, $M$ is the matrix in Theorem 1.17, and $N$ is the
$m \times n$ matrix with components

\[ D_j f^i(p), \qquad 1 \leq i \leq m, \; 1 \leq j \leq n. \]

Since $DF(p)$ is block lower triangular, $\det DF(p) = \det I \cdot \det M = \det M$. As $\det M \neq 0$,
we must have $\det DF(p) \neq 0$. Note that $F(a, b) = (a, 0)$. Therefore,
we can apply the Inverse Function Theorem to deduce the existence of open
sets $U \subset \Omega \times \Omega'$ and $V \subset \mathbb{R}^n \times \mathbb{R}^m$ with $(a, b) \in U$, $(a, 0) \in V$ such that $F : U \to V$
has a continuously differentiable inverse $h : V \to U$. By shrinking $U$, if necessary,
we can assume that $U = A \times B$ for some open sets $A \subset \Omega$ and $B \subset \Omega'$.
Note that the map $h$ must be of the form $h(x, y) = (x, k(x, y))$ for some
continuously differentiable map $k$ (since $F$ has this form). Let $\pi : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$
be the projection map $\pi(x, y) = y$. Then $f = \pi \circ F$. Now, by the associativity of
composition of maps,

\begin{align*}
f(x, k(x, y)) = f \circ h(x, y) &= (\pi \circ F) \circ h(x, y) \\
                                &= \pi \circ (F \circ h)(x, y) = \pi(x, y) = y.
\end{align*}

Thus $f(x, k(x, 0)) = 0$, so we can take $g(x) = k(x, 0)$.

Implicit Function Theorem implies the Inverse Function Theorem. Let $f : \Omega \to \mathbb{R}^n$
be the map in Theorem 1.15. Let us consider the map

\[ F : \mathbb{R}^n \times \Omega \to \mathbb{R}^n \]

defined as
\[ F(y, x) = y - f(x). \]

Let us also define $p = (f(q), q) \in \mathbb{R}^n \times \Omega$. We have

\[ F(p) = 0. \]

We note that the matrix

\[ \big[ D_{n+j} F^i(p) \big]_{1 \leq i, j \leq n} \]

is $-Df(q)$. So, by the assumption in the Inverse Function Theorem, the above matrix
is invertible. Therefore, by the Implicit Function Theorem, there are open sets
$A \subset \mathbb{R}^n$ and $B \subset \Omega$ with $f(q) \in A$ and $q \in B$, and a map $g : A \to B$ such that

a point $(y, x) \in A \times B$ satisfies $F(y, x) = 0$ if and only if $x = g(y)$.

In particular, for all $y \in A$, $F(y, g(y)) = 0$. By the definition of $F$, this means
that $y = f(g(y))$ for all $y \in A$. The ‘if and only if’ in the above statement implies
that $f$ is invertible on $B \cap f^{-1}(A)$, and $g$ is the inverse of $f$ there.



Chapter 2. Metric and topological spaces Analysis II, Term I, Page 50

Chapter 2

Metric and topological spaces

2.1 Metric spaces


2.1.1 Motivation and definition
The notions of the modulus function on R and the norm function on Rn allow us to
develop analysis on Euclidean spaces. We would like to extend the fundamental
notions of analysis, such as convergence of sequences, continuity of maps,
etc., to more general settings. We have already seen that most concepts in higher
dimensional Euclidean spaces are analogous to the corresponding concepts in
one-dimensional Euclidean space, with the modulus function replaced by the norm function.
Overall, all those concepts rely on a notion of “distance” on the ambient space.
We have all been using the concept of “distance” in our everyday life, for example,
by asking

• how much time does it take to walk from my apartment to the maths department,

• how long does it take to travel from South Kensington tube station to Cambridge
by public transport,

• how much does the cheapest public transport from South Kensington tube
station to Heathrow airport cost,

• what is the distance, in kilometres, from London to Edinburgh.

What should be the correct way of defining “distance” in more general settings?
From the above examples we can see that the notion of distance should be a function
of two variables, that is, we give it two elements and it returns a number. There has
been a long historical development on this question, with various properties proposed
and refined. Here we present the outcome of those developments, and define what is
now standard.


Definition 2.1. Let X be an arbitrary set. A metric on X is a function

d : X × X → R

satisfying the following three properties:

(M1) for all x and y in X we have d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;

(M2) for all x and y in X, d(x, y) = d(y, x);

(M3) for all x, y and z in X, we have d(x, y) ≤ d(x, z) + d(z, y).

Property M1 is called positivity, property M2 is called symmetry, and
property M3 is called the triangle inequality.

Remark 2.1. The triangle inequality in Euclidean spaces has a rather simple in-
terpretation. That is, in any triangle, the length of each side is bounded from above
by the sum of the lengths of the other two sides. In an arbitrary set, triangles may
not make sense. But the interpretation still makes sense, and is the reason behind
requiring condition M3. We think of d(x, y) as “the length of the shortest way from
x to y”. So the length of the shortest way from x to y should be bounded from above
by the length of the shortest way from x to y passing through z. See Figure 2.1.
On the other hand, property M1 tells us that the metric “separates” points. That
is, the distance between distinct points is strictly positive.

Figure 2.1: The triangle inequality.
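
The three axioms are concrete enough to be tested by machine on a finite set of points. The
sketch below is an illustration only: the function name `is_metric_on_sample`, the tolerance
and the sample points are assumptions. Passing the test on a sample does not prove that a
function is a metric, but a failure does show that it is not one.

```python
from itertools import product


def is_metric_on_sample(d, points, tol=1e-12):
    """Check M1, M2 and M3 for d on a finite list of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):  # M1
            return False
        if abs(d(x, y) - d(y, x)) > tol:               # M2
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:          # M3
            return False
    return True


sample = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(is_metric_on_sample(lambda x, y: abs(x - y), sample))      # True
print(is_metric_on_sample(lambda x, y: abs(x - 2 * y), sample))  # False
```

The second function fails already at M1, since $|x - 2x| = |x|$ is not zero for $x \neq 0$.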

Definition 2.2. By a metric space we mean a pair of a set and a metric on that
set. That is often denoted as M = (X, d), where X is a set, and d : X × X → R is
a metric. We refer to M as the metric space. The elements of X are called points.
Given two points x and y in X, the real number d(x, y) is called the distance
between x and y with respect to the metric d.


In the above definition, when it is clear what metric is involved, we simply refer
to d(x, y) as the distance between x and y.
It is customary to use the same notation for M and X, that is, the metric space
X = (X, d).

Remark 2.2. The reason we refer to the elements of X as points is that
we would like a unified approach to all metric spaces. That is, we would like to present
statements and proofs so that they apply to a variety of settings. We understand that when
X = R, then elements of X are numbers, when X = Rn , the elements of X are
vectors, and when X is the set of all 5 × 5 matrices, then each element of X is a
matrix. We refer to all those elements as points in X.

2.1.2 Examples of metric spaces


There are many examples of metrics. You are already familiar with some of them,
although you did not use the terminology of metric spaces.

Example 2.1. Let X = R and d1 : R × R → R be the function defined as

d1 (x, y) = |x − y|.

From the properties of the modulus function, see Section 1.1.1, we immediately see
that d1 satisfies the properties M1, M2, and M3. For example, for M2, we see that

d1 (x, y) = |x − y| = |y − x| = d1 (y, x).

Example 2.2. Let $X = \mathbb{R}^n$, and for $x = (x^1, x^2, \dots, x^n)$ and $y = (y^1, y^2, \dots, y^n)$
in $\mathbb{R}^n$, let

\[ d_2(x, y) = \|x - y\| = \Big( \sum_{j=1}^{n} (x^j - y^j)^2 \Big)^{1/2}. \]

By the properties of the norm function on $\mathbb{R}^n$, see Section 1.1.2, $d_2$ satisfies the
properties M1, M2, and M3 in Definition 2.1. For example, to see property M3,
we note that for every $x$, $y$, and $z$ in $\mathbb{R}^n$, by the triangle inequality for the norm
function, we have

\[ d_2(x, y) = \|x - y\| \leq \|x - z\| + \|z - y\| = d_2(x, z) + d_2(z, y). \]

The metric $d_2$ on $\mathbb{R}^n$ is called the Euclidean metric on $\mathbb{R}^n$.

Example 2.3. Let $X = \mathbb{R}^n$, and for $x = (x^1, x^2, \dots, x^n)$ and $y = (y^1, y^2, \dots, y^n)$
in $\mathbb{R}^n$, let

\[ d_1(x, y) = \sum_{j=1}^{n} |x^j - y^j|. \]


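Figure 2.2: Illustration of the metric $d_1$ on $\mathbb{R}^2$ (a path from $x$ to $y$ made of a horizontal
segment of length $|x^1 - y^1|$ and a vertical segment of length $|x^2 - y^2|$).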

We need to verify that the properties M1, M2 and M3 in Definition 2.1 hold.
M1: Fix arbitrary $x = (x^1, x^2, \dots, x^n)$ and $y = (y^1, y^2, \dots, y^n)$ in $\mathbb{R}^n$. Since the
modulus function only produces non-negative values, for every $j = 1, 2, \dots, n$, we
have $|x^j - y^j| \geq 0$. Thus,

\[ d_1(x, y) = \sum_{j=1}^{n} |x^j - y^j| \geq 0. \]

On the other hand, if

\[ d_1(x, y) = \sum_{j=1}^{n} |x^j - y^j| = 0, \]

then, for all $j = 1, 2, \dots, n$, we must have $|x^j - y^j| = 0$ (because each of the numbers
in the above sum is non-negative). By the first property of the modulus function,
this implies that for all $j = 1, 2, \dots, n$, we have $x^j = y^j$. Hence, $x = y$. Conversely, if
$x = y$, then every term in the sum vanishes, so $d_1(x, y) = 0$.
M2: For every $x = (x^1, x^2, \dots, x^n)$ and $y = (y^1, y^2, \dots, y^n)$ in $\mathbb{R}^n$, we have

\[ d_1(x, y) = \sum_{j=1}^{n} |x^j - y^j| = \sum_{j=1}^{n} |y^j - x^j| = d_1(y, x). \]

M3: For every $x = (x^1, x^2, \dots, x^n)$, $y = (y^1, y^2, \dots, y^n)$ and $z = (z^1, z^2, \dots, z^n)$
in $\mathbb{R}^n$, we have

\begin{align*}
d_1(x, y) = \sum_{j=1}^{n} |x^j - y^j| &\leq \sum_{j=1}^{n} \big( |x^j - z^j| + |z^j - y^j| \big) \\
  &= \sum_{j=1}^{n} |x^j - z^j| + \sum_{j=1}^{n} |z^j - y^j| \\
  &= d_1(x, z) + d_1(z, y).
\end{align*}

In the first line of the above equation we have used the triangle inequality for the
modulus function $n$ times (i.e. $|x^j - y^j| \leq |x^j - z^j| + |z^j - y^j|$, for $j = 1, 2, \dots, n$).


Intuitively, the metric d1 on R2 means that we are only allowed to travel along
horizontal and vertical directions to go from x ∈ R2 to y ∈ R2 . See Figure 2.2.
Exercise 2.1. Let $X = \mathbb{R}^n$ and define the function $d_\infty : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ as

\[ d_\infty(x, y) = \max\{ |x^1 - y^1|, \dots, |x^n - y^n| \}. \]

Show that $d_\infty$ is a metric on $\mathbb{R}^n$.
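
To see how the three distance functions $d_1$, $d_2$ and $d_\infty$ differ on the same pair of points,
one can simply evaluate them. The short sketch below does this for points of $\mathbb{R}^n$ given
as tuples; the function names are chosen here for illustration.

```python
from math import sqrt


def d1(x, y):
    # Sum of the coordinate-wise distances.
    return sum(abs(a - b) for a, b in zip(x, y))


def d2(x, y):
    # Euclidean distance.
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def d_inf(x, y):
    # Largest coordinate-wise distance.
    return max(abs(a - b) for a, b in zip(x, y))


x, y = (0.0, 0.0), (3.0, 4.0)
print(d1(x, y), d2(x, y), d_inf(x, y))  # 7.0 5.0 4.0
```

In this example $d_\infty(x, y) \leq d_2(x, y) \leq d_1(x, y)$.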


The above examples show that there can be more than one metric on $\mathbb{R}^n$. The
following exercise shows that, indeed, there can be many metrics on $\mathbb{R}^n$.
Exercise 2.2. Show that each of the following functions is a metric on $\mathbb{R}$:
(i) $d(x, y) = |x^3 - y^3|$ (here $x^3$ means $x$ raised to the power 3),

(ii) $d(x, y) = |e^x - e^y|$,

(iii) $d(x, y) = |\tan^{-1}(x) - \tan^{-1}(y)|$.

Which property of the maps $x \mapsto x^3$, $x \mapsto e^x$, and $x \mapsto \tan^{-1}(x)$ makes these
functions metrics?
We will need the following property of the integral later on.
Lemma 2.1. Assume that $a < b$ are real numbers, and $f : [a, b] \to \mathbb{R}$ is a continuous
function such that $f \geq 0$ on $[a, b]$, and $f$ is not identically equal to 0. Then,
\[ \int_a^b f(t) \, dt > 0. \]
Proof. Since $f$ is not identically equal to 0, there must be $c \in [a, b]$ such that
$f(c) > 0$. Let $h = f(c)$. Since $f$ is continuous at $c$, for $\epsilon = h/2 > 0$ there is $\delta > 0$
such that for all $t \in [a, b]$ with $|t - c| < \delta$, we have $|f(t) - h| \leq h/2$. This implies
that for all $t \in (c - \delta, c + \delta) \cap [a, b]$, we have

\[ f(t) = h + (f(t) - h) \geq h - h/2 = h/2. \]

Without loss of generality we may assume that $\delta < (b - a)/2$.
Consider the function $g : [a, b] \to \mathbb{R}$ defined as

\[ g(t) = \begin{cases}
  0 & \text{if } t \notin (c - \delta, c + \delta) \cap [a, b], \\
  h/2 & \text{if } t \in (c - \delta, c + \delta) \cap [a, b].
\end{cases} \]

We note that $f \geq g$ on $[a, b]$. Also, since $g$ is discontinuous at no more than two points
(finitely many discontinuities do not affect integrability), it is integrable on $[a, b]$. Moreover,
\[ \int_a^b f(t) \, dt \geq \int_a^b g(t) \, dt \geq \delta \cdot h/2 > 0. \]
Note that since $c \in [a, b]$, the length of the interval $[c - \delta, c + \delta] \cap [a, b]$ is at least $\delta$,
with the minimum length attained when $c = a$ or $c = b$.


Exercise 2.3. Assume that $a < b$ are real numbers, and let $h : (a, b) \to (0, \infty)$ be a
continuous function. For $x$ and $y$ in $(a, b)$, we define
\[ d_h(x, y) = \int_{\min\{x, y\}}^{\max\{x, y\}} h(t) \, dt. \]

Show that $d_h$ is a metric on $(a, b)$.

Intuitively, in the above exercise, the function h determines the cost of travelling
from x to y.
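
The "cost of travelling" interpretation can be made concrete with a numerical experiment.
The sketch below is only an illustration under assumed data: it takes the hypothetical cost
density $h(t) = 1/t$ on $(a, b) = (1, 10)$ and approximates the integral by the midpoint rule
with `n` subintervals. For this particular $h$ the exact value is $|\log x - \log y|$, which is
used as a comparison.

```python
from math import log


def h(t):
    # Hypothetical cost density on (1, 10): travelling is cheaper for large t.
    return 1.0 / t


def d_h(x, y, n=10_000):
    """Midpoint-rule approximation of the integral of h from min(x, y) to max(x, y)."""
    lo, hi = min(x, y), max(x, y)
    width = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * width) for i in range(n)) * width


print(d_h(2.0, 4.0), d_h(4.0, 8.0), log(2.0))  # all three are approximately equal
```

With respect to this metric the points 2 and 4 are as far apart as the points 4 and 8, although
their Euclidean distances differ.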

Exercise 2.4. Consider the function g : R × R → R defined as

g(x, y) = |x − y|^2.

Show that g is not a metric on R.

Below is an example of a metric on a slightly different set.

Example 2.4. Let S^1 be the circle of radius 1 about 0 in R^2, that is,

S^1 = {(x, y) ∈ R^2 | ‖(x, y)‖ = 1}.

Any pair of distinct points a and b in S^1 divides the circle S^1 into two arcs. We assume
the convention that the end points of the arcs are included in the arcs (this does
not make any difference when calculating the arc length). We define d(a, b) as the
length of the shortest arc between a and b, with d(a, a) = 0. When the points a and b are antipodal
(diametrically opposite one another), the shortest arc is not unique, but those
arcs have the same length. Thus, the function d : S^1 × S^1 → R is well-defined.
M1: The length of any arc is non-negative, and when the end points are distinct,
the length is strictly positive.
M2: Since the shortest arc between two points does not depend on the order
at which we choose the end points, M2 holds as well. When the end points lie on
opposite sides, the shortest arc is not unique, but the length is unique. So in that
case we have symmetry as well.
M3: Let θ1 , θ2 and θ3 be arbitrary points on S^1. If the points θ1 , θ2 and θ3 are
not pairwise distinct, then we obviously have

d(θ1 , θ3 ) ≤ d(θ1 , θ2 ) + d(θ2 , θ3 ).

That is because, if θ1 = θ3 , the left hand side of the above inequality is 0, and the
right hand side is non-negative by the definition of metric. Also, if θ2 ∈ {θ1 , θ3 }, the
value on the left hand side also appears on the right hand side of the inequality, with
the other term on the right hand side non-negative. So we may assume that the
points θ1 , θ2 , and θ3 are pairwise distinct. Let ℓi,j denote the shortest arc between
θi and θj , for i and j in {1, 2, 3}. We consider a few cases:


(i) θ2 belongs to ℓ1,3 : Then ℓ1,2 ∪ ℓ2,3 = ℓ1,3 , and hence

d(θ1 , θ3 ) = d(θ1 , θ2 ) + d(θ2 , θ3 ) ≤ d(θ1 , θ2 ) + d(θ2 , θ3 ).

(ii) θ1 belongs to ℓ2,3 . Then, ℓ1,3 ⊂ ℓ2,3 , and hence

d(θ1 , θ3 ) ≤ d(θ2 , θ3 ) ≤ d(θ1 , θ2 ) + d(θ2 , θ3 ).

(iii) θ3 belongs to ℓ1,2 . Then, ℓ1,3 ⊂ ℓ1,2 , and hence

d(θ1 , θ3 ) ≤ d(θ1 , θ2 ) ≤ d(θ1 , θ2 ) + d(θ2 , θ3 ).

(iv) None of the cases (i)-(iii) holds. Then, ℓ1,2 ∪ ℓ1,3 ∪ ℓ2,3 = S^1, and hence

d(θ1 , θ3 ) = “length of” ℓ1,3 ≤ “length of” (S^1 \ ℓ1,3 ) = d(θ1 , θ2 ) + d(θ2 , θ3 ).

See Figure 2.3.


Figure 2.3: The circle of radius 1 about 0 in R^2, and the distance of arc length.
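
The arc-length distance of Example 2.4 can be explored numerically. The following is only a sketch, under the assumption that each point of the circle is represented by an angle in radians; the function name arc_distance is chosen here for illustration and does not appear in the notes.

```python
import math

def arc_distance(alpha, beta):
    """Length of the shortest arc between the points of the unit circle
    at angles alpha and beta (in radians)."""
    diff = abs(alpha - beta) % (2 * math.pi)  # length of one of the two arcs
    return min(diff, 2 * math.pi - diff)      # keep the shorter arc

# Spot-check the triangle inequality (M3) on a grid of twelve points.
angles = [k * math.pi / 6 for k in range(12)]
triangle_ok = all(
    arc_distance(a, c) <= arc_distance(a, b) + arc_distance(b, c) + 1e-12
    for a in angles for b in angles for c in angles
)
print(triangle_ok)
```

A check of this kind on finitely many points does not replace the case analysis above; it only helps to build intuition.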

All the examples of metrics we have seen so far are on the set of real numbers and
Euclidean spaces. But the purpose of giving an axiomatic definition of metric is
to generalise analysis. Here are a few examples of metric spaces which show the
generality of this notion.

Example 2.5. Let E be a finite set, and let P(E) denote the set of all subsets of
E. Given A ∈ P(E), we define Card(A) as the number of elements in A. Also, for
A and B in P(E), we define the symmetric difference of A and B as

A∆B = (A \ B) ∪ (B \ A).

The function dcard : P(E) × P(E) → R defined as

dcard (A, B) = Card(A∆B)

is a metric on P(E).
This metric is called the Hamming metric, and plays an important role in
information theory and cryptography.
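
As a small illustration of Example 2.5, the metric dcard can be computed with Python's built-in sets. This is a sketch only; the name d_card and the sample sets are chosen here and are not part of the notes.

```python
def d_card(A, B):
    """Number of elements in the symmetric difference of two finite sets."""
    return len(set(A) ^ set(B))  # ^ is the symmetric difference of sets

A, B, C = {1, 2, 3}, {2, 3, 4}, {4, 5}
print(d_card(A, B))                                 # size of {1, 4}
print(d_card(A, C) <= d_card(A, B) + d_card(B, C))  # one instance of M3
```

If each subset of E is encoded by its indicator vector of 0s and 1s, then d_card counts the positions where two vectors differ.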

Although we all have an intuitive way of thinking about distances, we need to
be cautious when dealing with metrics in general. The axiomatic description of the
metric in Definition 2.1 captures a wide range of settings, as we discuss in the next
two examples.

Example 2.6. Let X be an arbitrary non-empty set. Define ddisc : X × X → R as

\[ d_{\mathrm{disc}}(x, y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{if } x \neq y. \end{cases} \]

You can see that this is a metric on X. In this metric all distinct points lie at
distance 1 from each other (you may wish to imagine this for some sets). The
metric ddisc is called the discrete metric.

Another counterintuitive example of a metric is presented in the next exercise.

Exercise 2.5. Let X = R^2, and define drail : R^2 × R^2 → R as

\[ d_{\mathrm{rail}}(x, y) = \begin{cases} \|x - y\| & \text{if } x = ky \text{ for some } k \in \mathbb{R}, \\ \|x\| + \|y\| & \text{otherwise.} \end{cases} \]

Show that drail is a metric on R^2.


This is called the British rail metric. The intuition behind this metric is that if
two towns are on the same rail line, then we travel directly between them, but if the towns
are on distinct lines, we travel via London (represented as the origin in R^2).
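
To experiment with the British rail metric, one possible sketch is given below. It follows the case split of Exercise 2.5 literally. The helper names norm and d_rail are chosen here, points are assumed to be tuples of two numbers, and the test for "x = ky" uses exact equality, so it is only reliable for inputs that floating point represents exactly.

```python
import math

def norm(p):
    """Euclidean norm of a point of R^2 given as a tuple."""
    return math.hypot(p[0], p[1])

def d_rail(x, y):
    """British rail distance between two points of R^2."""
    # x = k*y for some real k holds exactly when x is the origin,
    # or y is not the origin and x, y are parallel (zero cross product).
    cross = x[0] * y[1] - x[1] * y[0]
    same_line = x == (0, 0) or (y != (0, 0) and cross == 0)
    if same_line:
        return norm((x[0] - y[0], x[1] - y[1]))
    return norm(x) + norm(y)

print(d_rail((1, 0), (3, 0)))  # both towns on the same line through the origin
print(d_rail((1, 0), (0, 1)))  # different lines: travel via the origin
```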

Example 2.7. We say that a sequence of real numbers (x^1, x^2, x^3, . . . ) is bounded, if there is M ∈ R
such that for all i ≥ 1, |x^i| ≤ M. Let X be the set of all bounded sequences, and
consider the function d∞ : X × X → R defined as

\[ d_\infty(x, y) = \sup_{k \ge 1} |x^k - y^k|. \]

M1: Since the supremum of a collection of non-negative numbers is a
non-negative number, d∞ (x, y) ≥ 0 for all x and y in X. On the other hand, if
d∞ (x, y) = sup_{k≥1} |x^k − y^k| = 0, we must have |x^k − y^k| = 0 for all k ≥ 1. Therefore, x = y.
M2: Evidently, since |t| = | − t| for all t ∈ R, we have

\[ d_\infty(x, y) = \sup_{k \ge 1} |x^k - y^k| = \sup_{k \ge 1} |y^k - x^k| = d_\infty(y, x). \]

M3: Fix arbitrary elements of X:

x = (x^1, x^2, . . . ), y = (y^1, y^2, . . . ), z = (z^1, z^2, . . . ).

For every j ≥ 1, we have

\[ |x^j - y^j| \le |x^j - z^j| + |z^j - y^j| \le \Big( \sup_{k \ge 1} |x^k - z^k| \Big) + \Big( \sup_{k \ge 1} |z^k - y^k| \Big) = d_\infty(x, z) + d_\infty(z, y). \]

The right hand side of the above inequality is a constant independent of j. Thus,
for all j ≥ 1, |x^j − y^j| is bounded from above by that constant. Therefore, their
supremum must be bounded by that constant. That is,

\[ d_\infty(x, y) = \sup_{j \ge 1} |x^j - y^j| \le d_\infty(x, z) + d_\infty(z, y). \]

The metric space (X, d∞ ) is called the l∞ space.

Assume that a and b are real numbers with a < b. Define the set

C([a, b]) = {f : [a, b] → R | f is continuous}.

Example 2.8. For f and g in C([a, b]), define

\[ d_\infty(f, g) = \max_{a \le t \le b} |f(t) - g(t)|. \]

Since f and g are continuous on [a, b], they are bounded so there exist k1 and k2 in R
such that for all t ∈ [a, b], |f (t)| ≤ k1 and |g(t)| ≤ k2 . Therefore, d∞ (f, g) ≤ k1 + k2 ,
so d∞ is well defined on C([a, b]).
As in the previous example, one can see that d∞ is a metric on C([a, b]). This
is called the supremum metric, or the uniform metric.

Example 2.9. For f and g in C([a, b]), define

\[ d_1(f, g) = \int_a^b |f(t) - g(t)|\,dt. \]

The function d1 is a metric on C([a, b]). To see this, first note that since the modulus
of a continuous function is a continuous function, the integral in the above definition
is defined.
M1: For every f and g in C([a, b]), and every t ∈ [a, b], |f (t) − g(t)| ≥ 0. Thus,
\[ d_1(f, g) = \int_a^b |f(t) - g(t)|\,dt \ge 0. \]

On the other hand, if d1 (f, g) = 0, then by Lemma 2.1 the continuous non-negative
function |f − g| must be identically equal to 0. Thus, f = g as functions on [a, b].
M2: Since for all t ∈ R, |t| = | − t|, we have
\[ d_1(f, g) = \int_a^b |f(t) - g(t)|\,dt = \int_a^b |g(t) - f(t)|\,dt = d_1(g, f). \]

M3: Let f , g and h be continuous functions on [a, b]. For all t ∈ [a, b], by the
triangle inequality for the modulus function, we have

|f (t) − g(t)| = |(f (t) − h(t)) + (h(t) − g(t))| ≤ |f (t) − h(t)| + |h(t) − g(t)|.

Integrating the above functions, we note that

\[ \int_a^b |f(t) - g(t)|\,dt \le \int_a^b |f(t) - h(t)|\,dt + \int_a^b |h(t) - g(t)|\,dt, \]

which gives us
d1 (f, g) ≤ d1 (f, h) + d1 (h, g).
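
The two metrics d∞ and d1 on C([a, b]) can give quite different values for the same pair of functions. The sketch below approximates both numerically. The function names, the sample size n, and the choice of a uniform grid with the midpoint rule are assumptions made here for illustration; the results are approximations, not exact values.

```python
def d_sup(f, g, a, b, n=10_000):
    """Approximate max |f - g| on [a, b] using n + 1 equally spaced points."""
    points = (a + (b - a) * k / n for k in range(n + 1))
    return max(abs(f(t) - g(t)) for t in points)

def d_one(f, g, a, b, n=10_000):
    """Approximate the integral of |f - g| over [a, b] by the midpoint rule."""
    step = (b - a) / n
    mids = (a + (k + 0.5) * step for k in range(n))
    return step * sum(abs(f(t) - g(t)) for t in mids)

f = lambda t: t
g = lambda t: t * t
print(d_sup(f, g, 0.0, 1.0))  # the exact value is 1/4, attained at t = 1/2
print(d_one(f, g, 0.0, 1.0))  # the exact value is 1/2 - 1/3 = 1/6
```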

We have already seen many examples of metric spaces. There are some ways
to define new metric spaces using other metric spaces. We present two approaches
below.

Definition 2.3. Let (X, d) be a metric space, and Y ⊂ X be an arbitrary subset.
Define d|Y : Y × Y → R as d|Y (x, y) = d(x, y), for all x and y in Y . Clearly d|Y is
a metric on Y (it inherits all the properties from d). The pair (Y, d|Y ) is called a
metric subspace of (X, d), and d|Y is called the induced metric on Y from d.

Example 2.10. Consider the Euclidean metric space (R, d1 ). We may restrict this
metric to the set of rational numbers Q ⊂ R. Also, d1 induces a metric on the set
of integers Z ⊂ R.
Similarly, since Zn ⊂ Rn and Qn ⊂ Rn , we may restrict any of the metrics d1 ,
d2 , and d∞ onto those sets.

Given arbitrary sets X1 and X2 , we define the (set-theoretical) product of these
two sets as
X1 × X2 = {(x1 , x2 ) | x1 ∈ X1 , x2 ∈ X2 }.

That is, the set of all ordered pairs (x1 , x2 ) such that x1 ∈ X1 and x2 ∈ X2 .

Definition 2.4. Let (X1 , d1 ) and (X2 , d2 ) be two metric spaces. We may use the
metrics d1 and d2 to define a metric on X1 × X2 . For example,

d ((x1 , x2 ), (y1 , y2 )) = max{d1 (x1 , y1 ), d2 (x2 , y2 )},


d ((x1 , x2 ), (y1 , y2 )) = d1 (x1 , y1 ) + d2 (x2 , y2 ).

Each of the above functions from (X1 × X2 ) × (X1 × X2 ) to R is a metric. For each
of the above metrics d, the metric space (X1 × X2 , d) is called a product metric
space.
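
The two constructions in Definition 2.4 translate directly into code. In the following sketch, product_max and product_sum are names chosen here; each takes two metrics, given as Python functions, and returns a function on pairs. The factor spaces in the usage example (R with |x − y|, and a set of labels with the discrete metric) are also just illustrative choices.

```python
def product_max(d1, d2):
    """Metric on X1 x X2 given by the maximum of the two coordinate distances."""
    return lambda p, q: max(d1(p[0], q[0]), d2(p[1], q[1]))

def product_sum(d1, d2):
    """Metric on X1 x X2 given by the sum of the two coordinate distances."""
    return lambda p, q: d1(p[0], q[0]) + d2(p[1], q[1])

d_abs = lambda x, y: abs(x - y)          # usual metric on R
d_disc = lambda x, y: 0 if x == y else 1  # discrete metric on any set

d_max = product_max(d_abs, d_disc)
d_sum = product_sum(d_abs, d_disc)
p, q = (1.0, "a"), (3.5, "b")
print(d_max(p, q), d_sum(p, q))
```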


2.1.3 Normed vector spaces


Definition 2.5. Let V be a vector space over R. We say that a function ‖·‖ : V → R
is a norm on V , if the following properties are satisfied:

(N1) for every v ∈ V , ‖v‖ ≥ 0, and ‖v‖ = 0 if and only if v = 0,

(N2) for every v ∈ V and every λ ∈ R, we have ‖λv‖ = |λ| ‖v‖,

(N3) for all u and v in V , ‖u + v‖ ≤ ‖u‖ + ‖v‖.

A normed vector space is a pair consisting of a vector space V together with a norm
function on V . This is often denoted as (V, ‖·‖).

On any normed vector space (V, ‖·‖) we have a natural notion of metric coming from the
norm function. We present this in the next lemma.

Lemma 2.2. Let V be a vector space, and ‖·‖ : V → R be a norm function on V .
The function d_{‖·‖} : V × V → R, defined as

d_{‖·‖}(u, v) = ‖u − v‖

is a metric on V .

Proof. Property M1 comes from the property N1 of the norm function, that is,

d_{‖·‖}(v, w) = ‖v − w‖ ≥ 0.

Also,
d_{‖·‖}(v, w) = 0 ⇐⇒ ‖v − w‖ = 0 ⇐⇒ v − w = 0 ⇐⇒ v = w.

Property M2 comes from the property N2 of the norm function, since

d_{‖·‖}(w, v) = ‖w − v‖ = ‖(−1)(v − w)‖ = | − 1| ‖v − w‖ = ‖v − w‖ = d_{‖·‖}(v, w).

Property M3 comes from the property N3 for the norm. That is because

d_{‖·‖}(v, z) = ‖v − z‖ = ‖(v − w) + (w − z)‖ ≤ ‖v − w‖ + ‖w − z‖ = d_{‖·‖}(v, w) + d_{‖·‖}(w, z).

Some of the examples we have already seen are normed vector spaces. For example,
the distance d2 on R^n comes from the norm ‖·‖ on R^n.

Example 2.11. Let V = R^n, and consider the functions

\[ \|(x_1, x_2, \dots, x_n)\|_1 = |x_1| + |x_2| + \dots + |x_n|, \]
\[ \|(x_1, x_2, \dots, x_n)\|_\infty = \max\{|x_1|, |x_2|, \dots, |x_n|\}. \]

One can easily see that these functions satisfy the three properties for the norm
function. These norms induce the metrics d1 and d∞ on R^n, respectively.
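
The passage from a norm to its induced metric in Lemma 2.2 can be mirrored in a few lines. The sketch below assumes vectors of R^n are given as tuples or lists of numbers; the helper metric_from_norm is a name chosen here for illustration.

```python
def norm_1(x):
    """Sum of the absolute values of the coordinates."""
    return sum(abs(t) for t in x)

def norm_inf(x):
    """Largest absolute value among the coordinates."""
    return max(abs(t) for t in x)

def metric_from_norm(norm):
    """Return the metric (u, v) -> norm(u - v) induced by a norm."""
    return lambda u, v: norm([a - b for a, b in zip(u, v)])

d_1 = metric_from_norm(norm_1)
d_inf = metric_from_norm(norm_inf)

u, v = (1.0, -2.0, 3.0), (4.0, 2.0, 3.0)
print(d_1(u, v), d_inf(u, v))  # u - v = (-3, -4, 0)
```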

Lecture notes for Week 6



Assume that a < b are real numbers, and let C([a, b]) denote the set of all
continuous functions f : [a, b] → R. For f and g in C([a, b]), we define f + g as
the function (f + g)(x) = f (x) + g(x), for all x ∈ [a, b]. Also, for λ ∈ R, and
f ∈ C([a, b]), we define (λf )(x) = λf (x). These operations make C([a, b]) a vector
space over R. This vector space is infinite-dimensional, since the functions x ↦ x,
x ↦ x^2, x ↦ x^3, . . . , are linearly independent.

Exercise 2.6. Assume that a < b are real numbers. Show that each of the following
functions is a norm on C([a, b]):

(i)
\[ \|f\|_1 = \int_a^b |f(t)|\,dt \]

(ii)
\[ \|f\|_\infty = \max_{t \in [a,b]} |f(t)| \]

(iii)
\[ \|f\|_2 = \left( \int_a^b |f(t)|^2\,dt \right)^{1/2} \]

Remark 2.3. The norm ‖·‖_1 on C([a, b]) is called the l^1-norm, ‖·‖_2 on C([a, b])
is called the l^2-norm, and the norm ‖·‖_∞ on C([a, b]) is called the l^∞-norm, or
supremum norm. The metric induced from ‖·‖_1 on C([a, b]) is the d1 metric we
presented in Example 2.9 and the metric induced from ‖·‖_∞ on C([a, b]) is the d∞
metric we presented in Example 2.8.
You can learn more about these spaces in the modules Lebesgue Measure and
Integration, and Functional Analysis.

It is not true that every metric on a vector space comes from a norm. You can
show this by the following exercise.

Exercise 2.7. Show that if V is a vector space, and ‖·‖ : V → R is a norm function,
then for any v ∈ V , we must have d_{‖·‖}(0, 2v) = 2 d_{‖·‖}(0, v). Conclude that there is
no norm function on R^2 which induces the discrete metric ddisc on R^2.

As we shall see in later sections, the notion of metric allows us to develop analysis
on general metric spaces. It is remarkable that such a simple notion can lead to a
huge volume of mathematical theory. Of all the properties of a function which
make it a metric, the triangle inequality is the non-trivial one. It is worth taking a
moment to build intuition about that property. The following exercise helps you to
achieve that.

Exercise 2.8. Let (X, d) be a metric space.


(i) Show that for every x, y, and z in X, we have

| d(x, z) − d(y, z)| ≤ d(x, y).

(ii) Show that for all x, y, z and t in X, we have

| d(x, y) − d(z, t)| ≤ d(x, z) + d(y, t).

(iii) Show that for all x1 , x2 , . . . , xn in X, we have

d(x1 , xn ) ≤ d(x1 , x2 ) + d(x2 , x3 ) + · · · + d(xn−1 , xn ).

2.1.4 Open sets in metric spaces


The notion of a metric on a set allows us to describe some geometric properties of
subsets of that set. We shall discuss some of these properties in Sections 2.1.4 and
2.1.6.

Definition 2.6. Consider a metric space (X, d), a point x ∈ X, and a real number
ε > 0. The ball of radius ε centred at x is the set of all points x′ ∈ X satisfying
d(x, x′ ) < ε. In other words,

Bε (x) = {x′ ∈ X | d(x, x′ ) < ε}.

This set is also referred to as the ε-ball about x, or the ε-neighbourhood of x. To
emphasise the dependence of the ball on the metric d and the underlying space X, we
may use the notation Bε (x, X, d).

Example 2.12. We look at ε-balls in some of the metric spaces we introduced in
the previous section.

(i) In (R, d1 ), for every a ∈ R and ε > 0, we have

Bε (a) = {x ∈ R | d1 (x, a) < ε} = {x ∈ R | |x − a| < ε} = (a − ε, a + ε).

(ii) In (R^n, d2 ), for every a ∈ R^n and ε > 0, Bε (a) consists of all the points strictly inside
the hypersphere of radius ε centred at a.

(iii) In (R^2, d∞ ), for every a = (a1 , a2 ) ∈ R^2 and ε > 0,

Bε (a) = {(x1 , x2 ) ∈ R^2 | d∞ ((a1 , a2 ), (x1 , x2 )) < ε}
= {(x1 , x2 ) ∈ R^2 | max{|a1 − x1 |, |a2 − x2 |} < ε}.

This is the interior of a square centred at a, with horizontal and vertical sides of length 2ε.


(iv) Let I = [0, 1] ⊂ R, and dI denote the induced metric on I from d1 on R. Then,
in (R, d1 ), we have

B1 (1) = B1 (1, R, d1 ) = {x ∈ R | |x − 1| < 1} = (0, 2).

In (I, dI ), we have

B1 (1) = B1 (1, I, dI ) = {x ∈ I | dI (x, 1) < 1} = {x ∈ [0, 1] | |x − 1| < 1} = (0, 1].

In (I, dI ),

B1/2 (1/2) = B1/2 (1/2, I, dI ) = {x ∈ I | dI (x, 1/2) < 1/2} = (0, 1).

(v) In (X, ddisc ), where X is a non-empty set, and ddisc is the discrete metric, for
every x ∈ X and ε > 0 we have the following.
If ε ≤ 1, then
Bε (x) = {x′ ∈ X | ddisc (x, x′ ) < ε} = {x}.

If ε > 1,
Bε (x) = {x′ ∈ X | ddisc (x, x′ ) < ε} = X.

(vi) In (C([a, b]), d∞ ), for f ∈ C([a, b]) and ε > 0, we have

\[ B_\epsilon(f) = \{g \in C([a, b]) \mid d_\infty(f, g) < \epsilon\} = \{g \in C([a, b]) \mid \max_{t \in [a,b]} |f(t) - g(t)| < \epsilon\}. \]

This consists of all continuous functions g : [a, b] → R such that the graph of
g lies between the graphs of f − ε and f + ε.

Figure 2.4: The figure on the left hand side shows Bε (0, R^2, d1 ), the figure in the middle
shows Bε (0, R^2, d2 ), and the figure on the right hand side shows Bε (0, R^2, d∞ ).
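
The three balls about the origin in R^2 have different shapes: a diamond for d1, a disc for d2, and a square for d∞. A rough way to see this is to test the membership of a few points. The sketch below uses names chosen here for illustration (in_ball, d1, d2, dinf), with points given as tuples.

```python
import math

def in_ball(point, centre, eps, metric):
    """True if the point lies in the open ball of radius eps about centre."""
    return metric(point, centre) < eps

d1 = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
d2 = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
dinf = lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1]))

origin = (0.0, 0.0)
for p in [(0.4, 0.4), (0.6, 0.6), (0.8, 0.8)]:
    # membership in the unit ball for d1, d2, d_infinity, in that order
    print(p, [in_ball(p, origin, 1.0, d) for d in (d1, d2, dinf)])
```

The point (0.6, 0.6) lies in the d2 and d∞ balls but not in the d1 ball, and (0.8, 0.8) lies only in the d∞ ball.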

Figure 2.5: In (C([a, b]), d∞ ), Bε (f ) consists of all continuous functions on [a, b]
whose graphs lie in the red region. We have drawn the graphs of three functions in
Bε (f ).

Exercise 2.9. Let (X, d) be a metric space.

(i) Show that if ε < δ, then Bε (x) ⊆ Bδ (x). Show, by an example, that equality
may hold even if ε < δ.

(ii) Show that for every x ∈ X, we have

\[ \bigcap_{n \in \mathbb{N}} B_{1/n}(x) = \{x\}. \]

Definition 2.7. Let (X, d) be a metric space, and U ⊆ X. We say that U is open
in (X, d), if for every u ∈ U , there is δ > 0 such that Bδ (u) ⊆ U .

Lemma 2.3. Let (X, d) be a metric space. For every x ∈ X and ε > 0, the ball
Bε (x) is open in X.

Proof. Fix an arbitrary y ∈ Bε (x). Let δ = ε − d(x, y). Since y ∈ Bε (x), we have
d(x, y) < ε, and hence δ > 0.
Let z ∈ Bδ (y) be an arbitrary point. By the triangle inequality of the metric,

d(z, x) ≤ d(z, y) + d(y, x) < δ + (ε − δ) = ε.

Hence, z ∈ Bε (x). As z ∈ Bδ (y) was arbitrary, we conclude that Bδ (y) ⊂ Bε (x). As
y ∈ Bε (x) was arbitrary, we conclude that Bε (x) is an open set.

Due to the above lemma, Bε (x) is also called an open ball of radius ε about x.

Lemma 2.4. In any metric space (X, d), the empty set and the set X are open.


Proof. To see that the empty set is open, we need to show that for every x in the
empty set, there is δ > 0 such that Bδ (x) is contained in the empty set. Since there is
no such x in the empty set to begin with, this statement is vacuously true.
So the empty set is open.
On the other hand, for every x ∈ X, we have B1 (x) ⊂ X. That is because of
the definition of the ball. Thus we can take δ = 1 in the criterion for open sets.

Note that the definition of open set in a metric space (X, d) depends on both
the metric d and the underlying set X. We make this clear in the next example.

Example 2.13. Consider the discrete metric ddisc on R, that is (R, ddisc ). In this
space, any subset of R is open. To see this, let U be an arbitrary subset of R, and
let u ∈ U be an arbitrary point. We let δ = 1/2, and note that B1/2 (u) = {u} ⊂ U .
This shows that U is open. But in the metric space (R, d1 ) it is not true that
every subset of R is open. For example, the single-element set {1} is open in
(R, ddisc ), but it is not open in (R, d1 ).
On the other hand, let I = [0, 1] ⊂ R, and let dI be the induced metric on [0, 1]
from d1 on R. The set [0, 1/2) is not open in (R, d1 ) (the definition does not hold
for the point 0 ∈ [0, 1/2)). But [0, 1/2) is open in ([0, 1], dI ). To show the latter
property, let x ∈ [0, 1/2). If x ∈ (0, 1/2), we define δ = min{x, 1/2 − x}, and see
that δ > 0 and

Bδ (x, I, dI ) = {x′ ∈ [0, 1] | dI (x, x′ ) < δ} = (x − δ, x + δ) ⊂ [0, 1/2).

If x = 0, we let δ = 1/4, and see that

Bδ (0, I, dI ) = {x′ ∈ [0, 1] | dI (0, x′ ) < δ} = [0, 1/4) ⊂ [0, 1/2).

According to the definition of open sets, this shows that [0, 1/2) is open in ([0, 1], dI ).

Lemma 2.5. Let X = (X, d) be a metric space. The union of any number of (finite,
countable, uncountable) open sets in X is an open set in X.

Proof. Assume that Gα ⊆ X is open, for all α in a set I. Let x ∈ ∪α∈I Gα . Then
there exists some α0 ∈ I such that x ∈ Gα0 . Since Gα0 is an open set, there exists
δ > 0 such that Bδ (x) ⊂ Gα0 . This implies that Bδ (x) ⊆ ∪α∈I Gα . As x was
arbitrary, the union is open.

Lemma 2.6. Let X = (X, d) be a metric space. The intersection of any finite
number of open sets in X is an open set in X.

Proof. Assume that m ≥ 1 and G1 , G2 , . . . , Gm are open sets in X. Fix an arbitrary
x ∈ ∩_{k=1}^{m} Gk . For every k ∈ {1, 2, . . . , m}, x ∈ Gk . For every such k, since Gk is
open, there exists εk > 0 such that Bεk (x) ⊂ Gk . Let ε = min{ε1 , . . . , εm } > 0.
By our choice of ε, for every k ∈ {1, 2, . . . , m}, Bε (x) ⊂ Bεk (x) ⊂ Gk . Therefore,
Bε (x) ⊂ ∩_{k=1}^{m} Gk .


The statement in the above lemma is not necessarily true if we drop the
hypothesis of finiteness. For example, as we saw in Exercise 2.9, in the metric space
(R^2, d2 ), we have \(\bigcap_{n=1}^{\infty} B_{1/n}(x) = \{x\}\). And the set {x} is not open in (R^2, d2 ).
We have already seen that there may be many metrics on a given set. For
example, we have metrics d1 , d2 , and d∞ on Rn . The definition of open set in a
metric space depends on the metric. So a priori, for each of these metrics on Rn , we
may have different open sets. This seems to be cumbersome, but can be alleviated
by the following definition.

Definition 2.8. Let d1 and d2 be metrics on a set X. The metrics d1 and d2 are
called topologically equivalent, if the following property holds. For every U ⊆ X,
U is open in (X, d1 ) if and only if U is open in (X, d2 ).

Exercise 2.10. (i) Show that for all x and y in Rn , we have



d∞ (x, y) ≤ d2 (x, y) ≤ n · d∞ (x, y).

(ii) Show that for all x and y in Rn , we have

d∞ (x, y) ≤ d1 (x, y) ≤ n · d∞ (x, y).

(iii) Show/conclude that for all x and y in R^n, we have

\[ \frac{1}{\sqrt{n}}\, d_2(x, y) \le d_1(x, y) \le n\, d_2(x, y). \]

(iv) Conclude that the metrics d1 , d2 and d∞ on Rn are topologically equivalent.
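
Before proving the inequalities of Exercise 2.10, it can be reassuring to test them on random points. The sketch below does this in Python; the function names, the dimension n = 5, the number of trials, and the small tolerance for rounding errors are all choices made here. A numerical check of this kind is of course not a proof.

```python
import math
import random

def d1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dinf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def check_inequalities(n, trials=1000, tol=1e-9):
    """Test the inequalities of Exercise 2.10 on random pairs of points in R^n."""
    rng = random.Random(0)  # fixed seed so the run is reproducible
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        a, b, c = d1(x, y), d2(x, y), dinf(x, y)
        ok = (
            c <= b + tol and b <= n * c + tol                 # part (i)
            and c <= a + tol and a <= n * c + tol             # part (ii)
            and b / math.sqrt(n) <= a + tol and a <= n * b + tol  # part (iii)
        )
        if not ok:
            return False
    return True

print(check_inequalities(5))
```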

2.1.5 Convergence in metric spaces


Definition 2.9. Let (X, d) be a metric space, and (xn )n≥1 be a sequence of points
in X. We say that the sequence (xn )n≥1 converges in (X, d), if there is x ∈ X
satisfying the following:
for every ε > 0 there is N ∈ N such that for all n ≥ N we have d(xn , x) < ε.
In this case, we say that x is the limit of the sequence (xn )n≥1 , or say that the
sequence (xn )n≥1 converges to x in (X, d), and write xn → x as n → ∞, or
limn→∞ xn = x.

Notice the similarity between the above definition and the definition of
convergence of sequences in Euclidean spaces.

Example 2.14. In the metric space (R, d1 ) the sequence (1/n)n≥1 converges. That
is because 0 ∈ R, and for every ε > 0 we can choose an integer N > 1/ε, so that for
all n ≥ N we have d1 (1/n, 0) = 1/n < ε.
Now let I = (0, 1), and dI be the induced metric on I from d1 . In the metric
space (I, dI ), the sequence (1/n)n≥1 does not converge. That is because there is no
x ∈ (0, 1) satisfying the criterion for the convergence. Assume to the contrary that
there is such an x ∈ (0, 1). We choose ε = x/2 > 0, and for every N ∈ N, we choose
n ≥ max{N, 2/x}. Then,

dI (1/n, x) = |1/n − x| = x − 1/n ≥ x − x/2 = x/2 = ε.

Thus no N ∈ N works for this ε, which is a contradiction.
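
Both halves of Example 2.14 can be traced numerically. The sketch below is only an illustration: first_index and stays_away are names chosen here, and only finitely many terms of the sequence are inspected.

```python
import math

def first_index(eps):
    """An integer N > 1/eps, so that 1/n < eps for all n >= N."""
    return math.floor(1 / eps) + 1

def stays_away(x, N):
    """For a candidate limit x in (0, 1), pick n >= max(N, 2/x) as in the
    example and report whether |1/n - x| >= x/2."""
    n = max(N, math.ceil(2 / x))
    return abs(1 / n - x) >= x / 2

# In (R, d1): beyond the index N, the terms are within eps of 0.
for eps in (0.5, 0.1, 0.003):
    N = first_index(eps)
    print(eps, N, all(1 / n < eps for n in range(N, N + 1000)))

# In ((0, 1), dI): no point x of (0, 1) can be the limit.
print(all(stays_away(x, N) for x in (0.3, 0.7) for N in (1, 10, 1000)))
```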

We say that a sequence (xn )n≥1 is eventually constant, if there is n1 ∈ N such
that for all n ≥ n1 we have xn = xn1 .

Exercise 2.11. Let (X, ddisc ) be a discrete metric space, and (xn )n≥1 be a sequence
in X. Show that (xn )n≥1 converges in (X, ddisc ) if and only if the sequence (xn )n≥1 is
eventually constant.

Lemma 2.7. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X. If the
sequence (xn )n≥1 converges in (X, d), then its limit is unique.

Proof. Let us assume that there are two points x and y in X such that the sequence
(xn )n≥1 converges to both of them. Fix an arbitrary ε > 0. Since the sequence converges to x,
there is N1 ∈ N such that for all n ≥ N1 , we have d(xn , x) < ε. Similarly, since
the sequence converges to y, there is N2 ∈ N such that for all n ≥ N2 , we have
d(xn , y) < ε. Now, let n = max{N1 , N2 }. We have

d(x, y) ≤ d(x, xn ) + d(xn , y) < ε + ε = 2ε.

By property M1 of metrics, d(x, y) ≥ 0, and since ε > 0 was arbitrary, the above
inequality shows that d(x, y) = 0. Then, by property M1 of the metrics, we conclude
that x = y.

Exercise 2.12. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X.
Prove that the sequence (xn )n≥1 converges to x ∈ X if and only if, for every open
set U in (X, d) with x ∈ U , there is N ∈ N such that for all n ≥ N , we have xn ∈ U .

As a corollary of the above exercise, we obtain the following result.

Corollary 2.8. Let d1 and d2 be topologically equivalent metrics on X. Then, a
sequence (xn )n≥1 in X converges in (X, d1 ) if and only if it converges in (X, d2 ).

Proof. Recall that by the definition of topologically equivalent metrics, U is open in (X, d1 ) if
and only if U is open in (X, d2 ). The result immediately follows from the previous
exercise.


2.1.6 Closed sets in metric spaces


Definition 2.10. Let (X, d) be a metric space, and V ⊆ X be a set. We say that
V is closed in (X, d), if for every sequence (xn )n≥1 in V which converges in (X, d),
the limit of (xn )n≥1 belongs to V .

When it is clear what metric is involved, we may simply say that V is closed in
X. For example, when a metric is not specified on R^n, it is assumed that it is the
Euclidean metric d2 . Thus, when we say that E is closed in R^n, we mean that E is
closed in (R^n, d2 ).

Example 2.15. Consider real numbers a < b. The set [a, b] is closed in (R^1, d1 ).
That is because if (xn )n≥1 is a sequence in [a, b] which converges to x in R, then we
have a ≤ xn ≤ b for all n, and hence a ≤ limn→∞ xn ≤ b. This implies that x ∈ [a, b].
The intervals (a, b) and (a, b] are not closed in (R^1, d1 ). That is because

\[ a + \frac{b - a}{n}, \quad n \ge 2, \]

is a sequence in (a, b), and hence also in (a, b], which converges to a in (R, d1 ), but a belongs
to neither (a, b) nor (a, b].
On the other hand, let I = (0, 1) and dI be the induced metric on I from
(R, d1 ). Then the set V = (0, 1/2] is closed in ((0, 1), dI ). To see this, assume that
(xn )n≥1 is a sequence in (0, 1/2] which converges in ((0, 1), dI ). By the definition of
convergence in ((0, 1), dI ), the limit of the sequence must be in (0, 1). However, since
the sequence belongs to (0, 1/2], its limit is at most 1/2. Thus, the limit belongs to
(0, 1/2].

Exercise 2.13. Let (X, ddisc ) be a discrete metric space. Show that every subset of X is
closed.

Note that open is not the opposite of closed. If a set is not open, it does not
mean that it is closed. For example, the set (1, 2] is neither open nor closed in (R, d1 ).
There are sets that are both open and closed, as we shall see in a moment.

Theorem 2.9. Let (X, d) be a metric space and V ⊆ X. Then, V is closed in
(X, d) if and only if X \ V is open in (X, d).

Proof. First assume that V is closed. Assume to the contrary that X \ V is not open.
Then, there is x ∈ X \ V , such that for all δ > 0, Bδ (x) ⊄ X \ V . Equivalently, for all
δ > 0, Bδ (x) ∩ V ≠ ∅. In particular, for each n ∈ N, we let δ = 1/n, and conclude
that there is a point xn ∈ B1/n (x) ∩ V . This process generates a sequence (xn )n∈N
in V . The sequence (xn )n∈N converges to x in (X, d), because xn ∈ B1/n (x) implies
that d(xn , x) < 1/n. But the limit x does not belong to V , which contradicts the
assumption that V is closed.
Now assume that X \ V is open. Let (xn )n∈N be an arbitrary sequence in V
which converges to some x ∈ X. We need to show that x ∈ V . If x ∉ V , then
x ∈ X \ V . Then, since X \ V is open, there is δ > 0 such that Bδ (x) ⊂ X \ V . On
the other hand, since (xn )n∈N converges to x, there is N ∈ N such that for all n ≥ N ,
we have xn ∈ Bδ (x). Thus, for all n ≥ N , xn ∈ X \ V . This is a contradiction since
(xn )n∈N is a sequence in V .

Some authors define the notion of closed sets using the equivalent form in the
above theorem. That is, a set is closed, if its complement is open. Then, they prove
(as in the proof of the above theorem) that if a set is closed, it contains the limit of
any convergent sequence in that set.

Lemma 2.10. Let (X, d) be a metric space.

(i) the intersection of any number (finite, countable or uncountable) of closed sets
in (X, d) is a closed set in (X, d),

(ii) the union of any finite number of closed sets in (X, d) is a closed set in (X, d).

Proof. Let Fα , for α ∈ I, be a collection of closed sets in X. By Theorem 2.9, for
every α ∈ I, X \ Fα is an open set. Then, by Lemma 2.5, ∪α∈I (X \ Fα ) is open in
X. Since
X \ (∩α∈I Fα ) = ∪α∈I (X \ Fα ),

we conclude that X \ (∩α∈I Fα ) is open. Using Theorem 2.9 again, we conclude that
∩α∈I Fα is closed in X. This proves part (i) of the lemma.
The proof for part (ii) is similar, except that one uses Lemma 2.6 instead of
Lemma 2.5.

It is also possible to give a proof of the above lemma, directly using the definition
of closed sets in Definition 2.10.


2.1.7 Interior, isolated, limit, and boundary points in metric spaces


In a metric space (X, d), given a set V ⊂ X and a point x ∈ X, the location of x in
X relative to the set V can be of several types. The simplest question is whether x belongs to V or
not. But, one can also ask if all the balls around x meet V , or there is a ball about
x which is contained in V , etc. We formalise these types in the next definition.

Definition 2.11. Let (X, d) be a metric space, V ⊂ X, and x ∈ X.

(i) The point x is called an interior point of V , or an inner point of V , if there
is δ > 0 such that Bδ (x) ⊂ V .

(ii) The point x is called an isolated point of V , if there exists δ > 0 such that
V ∩ Bδ (x) = {x}. In other words, there is a δ-neighbourhood of x which does
not contain any point of V except x.

(iii) The point x is called a limit point of V , or an accumulation point of V , if
for every δ > 0, Bδ (x) ∩ V contains a point other than x. In other words, for
every δ > 0, (Bδ (x) ∩ V ) \ {x} ≠ ∅.

(iv) The point x is called a boundary point of V , if for every δ > 0 we have
Bδ (x) ∩ V ≠ ∅ and Bδ (x) \ V ≠ ∅. In other words, x is a boundary point of
V , if every δ-neighbourhood of x meets both V and the complement of V .

Note that, by items (i) and (ii), any interior point and any isolated point of V
is an element of V . But a limit point or a boundary point of a set V is not
necessarily an element of V .

Example 2.16. Consider the Euclidean metric space (R^2, d2 ), and the set

V = {(x, y) ∈ R^2 | ‖(x, y)‖ ≤ 1, x ≥ 0} ∪ {(x, y) ∈ R^2 | ‖(x, y)‖ < 1, x < 0}.

Figure 2.6: The solid black arc is part of V , but the dotted arc is not part of V .

You can see that (x, y) is an interior point of V if and only if ‖(x, y)‖ < 1. The
set V has no isolated points. The point (x, y) is a limit point of V if and only if
‖(x, y)‖ ≤ 1. The point (x, y) is a boundary point of V if and only if ‖(x, y)‖ = 1.
Verify these statements.

Lecture notes for the week 7



Example 2.17. In the metric space (R, d1 ), consider the set

V = {1/n | n ∈ N}.

Then, V has no interior point. Every point in V is an isolated point of V . The point
0 is the only limit point of V . Every point in V is a boundary point of V . But also
the point 0, which is not in V , is a boundary point of V .
If V = (0, 1] ∪ {2} in R1 , then a is an interior point of V if and only if a ∈ (0, 1).
The point 2 is the only isolated point of V . The point a is a limit point of V if and
only if a ∈ [0, 1]. A point a is a boundary point of V if and only if a ∈ {0, 1, 2}.

Definition 2.12. Let (X, d) be a metric space, and V ⊂ X.

(i) The interior of V is defined as the set of all v ∈ V such that v is an interior
point of V . The interior of V is often denoted as V ◦ .

(ii) The closure of V is the union of V and all the limit points of V . The closure
of V is often denoted as \(\overline{V}\).

(iii) The boundary of V is the set of all v ∈ X such that v is a boundary point of
V . The boundary of the set V is often denoted as ∂V .

Note that \(\overline{V}\) consists of

(i) all elements of V ,

(ii) all limit points of V which belong to V ,

(iii) all limit points of V which do not belong to V .

Indeed, there is a simple equivalent definition of the closure of a set V in terms
of balls. A point z belongs to the closure of V if and only if for every δ > 0, Bδ (z) ∩ V ≠ ∅.

Example 2.18. In the metric space (R^1, d1 ), we have Q◦ = ∅, \(\overline{Q}\) = R, and ∂Q = R.
Also, Z◦ = ∅, \(\overline{Z}\) = Z, and ∂Z = Z.

Example 2.19. Let V = (0, 1] ∪ {2} in (R^1, d1 ). Then, V ◦ = (0, 1), \(\overline{V}\) = [0, 1] ∪ {2},
∂V = {0, 1, 2}.

By the above definition, we note that a set V is open if and only if V ◦ = V .

Exercise 2.14. Let (X, d) be a metric space, and V be a subset of X. Show that
the set V is closed if and only if \(\overline{V}\) = V .

Exercise 2.15. Let V and W be subsets of a metric space (X, d). Show that the
following properties hold:

(i) if V ⊂ W , then V ◦ ⊂ W ◦ ,

(ii) if V ⊂ W , then $\overline{V}$ ⊂ $\overline{W}$.

Exercise 2.16. Let V and W be subsets of a metric space (X, d). Prove that

$\overline{V \cup W} = \overline{V} \cup \overline{W}$.

Give an example of (X, d), V and W such that

(V ∪ W )◦ ≠ V ◦ ∪ W ◦ .

Lemma 2.11. Let (X, d) be a metric space, and V ⊆ X. Then, x ∈ X is a limit
point of V if and only if there exists a sequence of points in V \ {x} which converges
to x.

Proof. Assume that there is a sequence of points, say (xn )n≥1 , in V \ {x} which
converges to x. We need to show that for every δ > 0, Bδ (x) ∩ V contains an
element of V other than x. Fix an arbitrary δ > 0. Because the sequence (xn )n≥1
converges to x, there is N ∈ N such that for all n ≥ N , xn ∈ Bδ (x). As the
sequence lies in V \ {x}, we conclude that xN is distinct from x, and xN ∈ Bδ (x) ∩ V .
Hence x is a limit point of V .
Conversely, assume that x is a limit point of V . For each n ∈ N, the number δn = 1/n
is strictly positive. So, by the definition of limit points, B1/n (x) ∩ V contains an
element different from x. Let xn be such an element. This process generates a
sequence (xn )n≥1 in V \ {x}. We do not know whether the points x1 , x2 , x3 , . . . in
the sequence are distinct, but this does not matter for us. The sequence (xn )n≥1
converges to x since d(xn , x) < 1/n.
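
The construction in the second half of the proof can be tried out on the set V = {1/n | n ∈ N} of Example 2.17, whose only limit point is 0. The following is a minimal sketch; the function name and the particular choice xn = 1/(n + 1) are assumptions made for illustration, since the proof allows any point of B1/n (0) ∩ V other than 0.

```python
from fractions import Fraction


def point_in_punctured_ball(n: int) -> Fraction:
    """Pick a point of V = {1/k : k >= 1} in the ball B_{1/n}(0), other than 0."""
    # 1/(n + 1) belongs to V and satisfies 0 < 1/(n + 1) < 1/n.
    return Fraction(1, n + 1)


# First ten terms x_1, ..., x_10 of the sequence built in the proof.
sequence = [point_in_punctured_ball(n) for n in range(1, 11)]

# Each x_n is different from the limit point 0 and satisfies d(x_n, 0) < 1/n.
within_balls = all(
    0 < abs(x_n) < Fraction(1, n) for n, x_n in enumerate(sequence, start=1)
)
print(within_balls)
```

Exact rational arithmetic is used so that the strict inequalities are checked without rounding error.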

Definition 2.13. Let (X, d) be a metric space.

• We say that a set V ⊆ X is dense in X, if $\overline{V}$ = X.

• We say that the metric space (X, d) is separable, if there is a countable set
which is dense in X.

Example 2.20. In the metric space (R1 , d1 ), the set Q is countable and dense. So
(R1 , d1 ) is separable.
In the metric space (Rn , d2 ) the set of all vectors with rational coordinates is
countable and dense in Rn .

Remark 2.4. By a classical theorem in analysis (the Stone-Weierstrass theorem), any
continuous function f : [a, b] → R can be uniformly approximated by polynomials with
real coefficients. In other words, the set of polynomials is dense in the metric space
(C([a, b]), d∞ ). Since the set of polynomials with rational coefficients is countable
and dense in the space of all polynomials with real coefficients, it follows that
(C([a, b]), d∞ ) is separable. You can see an elementary, but rather long, proof of
this classical theorem in Principles of Mathematical Analysis by Rudin.
On the other hand, one can approximate any continuous function on [0, 1] by a
(series) function of the form

$\sum_{n \ge 1} \big( a_n \cos(2\pi n x) + b_n \sin(2\pi n x) \big)$,

where an and bn are real numbers. So they also form a dense subset of (C([0, 1]), d∞ ).
Functions of the above form are called Fourier series. There is an entire module
called “Fourier Analysis and the Theory of Distributions” devoted to the properties
of such functions.

Example 2.21.* Recall the space of all bounded sequences in R with the supremum
metric d∞ . This metric space is not separable. To see that, let E denote the set
of all sequences of 0s and 1s (for example, 00111010101010 . . . ). You have already
seen in Analysis I that E is uncountable.
Note that the d∞ distance between any two distinct elements of E is equal to
1. Then, for distinct elements e1 and e2 in E, B1/2 (e1 ) ∩ B1/2 (e2 ) = ∅. So any
dense subset needs to have at least one element from each such ball, but there are
uncountably many such balls. Hence, a dense subset cannot be countable.

2.1.8 Continuous maps of metric spaces


Let us recall a terminology from basic set theory and maps.
Let f : M → N . For any m ∈ M , n = f (m) ∈ N is called the image of m
under the map f . If A is a subset of M , the image of A under f is defined (and
denoted) as
f (A) = {f (m) | m ∈ A}.

For a given n ∈ N , the set of elements m ∈ M such that f (m) = n is called the
pre-image of n. This should be denoted as f −1 ({n}), but abusing the notation,
it is often denoted as f −1 (n). For any set B ⊆ N , the pre-image of B, is defined
(and denoted) as
f −1 (B) = {m ∈ M | f (m) ∈ B}.

Of course it is possible that f −1 (B) = ∅, for some B ⊂ N .

Definition 2.14. Let (X, dX ) and (Y, dY ) be metric spaces, and f : X → Y be a
map.

(i) We say that f is continuous at x ∈ X, if for every ε > 0 there is δ > 0 such
that for every x′ ∈ X satisfying dX (x′ , x) < δ we have

dY (f (x), f (x′ )) < ε.

(ii) We say that f : X → Y is continuous, if f is continuous at every x in X.

(iii) We say that f : X → Y is uniformly continuous, if for every ε > 0 there is
δ = δ(ε) > 0, which does not depend on x, such that for all x and x′ in X satisfying
dX (x′ , x) < δ we have dY (f (x), f (x′ )) < ε.

To emphasise the dependence of the notion of continuity on dX and dY , we may
say that f is continuous at x with respect to the metrics dX and dY .
There is a remarkable equivalent criterion for continuity of maps between metric
spaces. We state that as the next theorem.

Theorem 2.12. Let (A1 , d1 ) and (A2 , d2 ) be metric spaces. A map f : A1 → A2 is
continuous if and only if the pre-image of any open set in A2 is an open set in A1 .

Proof. Let us first assume that f is continuous, and fix an arbitrary open set U in
A2 . Take any x ∈ f −1 (U ), then f (x) ∈ U . As U is open in A2 , there is ε > 0 such
that Bε (f (x)) ⊂ U . As f is continuous at x, there is δ > 0 such that
f (Bδ (x)) ⊂ Bε (f (x)) ⊂ U .
Therefore Bδ (x) ⊂ f −1 (U ). Since x ∈ f −1 (U ) was arbitrary, we deduce that f −1 (U )
is open.
Now assume that the pre-image of any open set is an open set. Let x ∈ A1 and
ε > 0 be arbitrary. Consider the open set Bε (f (x)). By the assumption,
f −1 (Bε (f (x))) is open. Because x ∈ f −1 (Bε (f (x))), there must be δ > 0 such that

Bδ (x) ⊂ f −1 (Bε (f (x))).

That is, f (Bδ (x)) ⊂ Bε (f (x)). Thus f is continuous at x. As x was arbitrary, we
conclude that f is continuous on A1 .

Exercise 2.17. Let (A1 , d1 ) and (A2 , d2 ) be metric spaces. Show that a map f : A1 → A2 is
continuous if and only if the pre-image of any closed set in A2 is a closed set in A1 .

Example 2.22. The function f : R3 → R defined as

f (x, y, z) = x² + 10xy³ + sin(xy)

is continuous. Therefore, the set

{(x, y, z) ∈ R3 | f (x, y, z) ≤ −1},

is a closed set. The above set is the pre-image of the closed set (−∞, −1]. Since f
is continuous, by the above exercise, the pre-image of (−∞, −1] must be a closed
set. Similarly, by Theorem 2.12, the pre-image of the open set (0, 1), that is, the set

{(x, y, z) ∈ R3 | f (x, y, z) ∈ (0, 1)}

is an open set.


By Theorem 2.12 and Exercise 2.17, we can easily verify the openness or closedness
of many sets in Euclidean spaces. For example,

{x ∈ Rn | ‖x‖ ∈ [1, 2]}

is a closed set, and

{x ∈ Rn | ‖x‖ ∈ (5, ∞)}

is an open set.

Theorem 2.13. Let (X, dX ) and (Y, dY ) be metric spaces, f : X → Y be a
map, and x ∈ X. The following statements are equivalent:

(i) f is continuous at x,

(ii) for any sequence (xn )n≥1 in X which converges to x, the sequence
(f (xn ))n≥1 converges to f (x) in (Y, dY ).

Proof. The proof is identical to the proof of this statement for higher dimensional
Euclidean spaces. One only needs to replace the metric d2 with the metrics dX and
dY in suitable places.

Exercise 2.18. Recall that the set of all continuous functions from [0, 1] to R is
denoted by C([0, 1]). We also defined the metrics d1 and d∞ . Consider the map

Φ : C([0, 1]) → R,

defined as
Φ(f ) = f (1/2).

(i) Is the map Φ from the metric space (C([0, 1]), d∞ ) to (R, d1 ) continuous?

(ii) Is the map Φ from the metric space (C([0, 1]), d1 ) to (R, d1 ) continuous?

(iii) Is the map Φ from the metric space (C([0, 1]), d2 ) to (R, d1 ) continuous?

Exercise 2.19. Consider the metric spaces X = (R, d1 ) and Y = (R, ddisc ). Show
that the map f (x) = x from X to Y is not continuous. Show that the map g(x) = x
from Y to X is continuous.

Exercise 2.20. Consider the sequence of functions fn : [0, 1] → R, for n ≥ 1,
defined as

$f_n(x) = \begin{cases} 1 - nx & \text{if } x \in [0, 1/n], \\ 0 & \text{otherwise.} \end{cases}$

Let f : [0, 1] → R be the constant map f ≡ 0.


(i) Show that the sequence (fn )n≥1 in C([0, 1]) converges to f in the metric space
(C([0, 1]), d1 ).

(ii) Show that the sequence (fn )n≥1 in C([0, 1]) does not converge to f in the metric
space (C([0, 1]), d∞ ).

(iii) Conclude that the identity map

id : (C([0, 1]), d1 ) → (C([0, 1]), d∞ )

is not continuous.

Definition 2.15. Let (X1 , d1 ) and (X2 , d2 ) be metric spaces.

(i) A map f : X1 → X2 is called a homeomorphism, if f : X1 → X2 is a bijection
and both of the maps f : X1 → X2 and f −1 : X2 → X1 are continuous.

(ii) Two metric spaces (X1 , d1 ) and (X2 , d2 ) are called homeomorphic, if there
is a homeomorphism from X1 to X2 .

Example 2.23. The sets (−∞, ∞) and (−1, 1) with respect to the metric d1 on R1
are homeomorphic. For example, the map f (x) = (2/π) arctan(x) is a homeomorphism
between these two sets.

Definition 2.16. Let (X, dX ) and (Y, dY ) be metric spaces, and f : X → Y .

(i) We say that f is Lipschitz, if there is a constant M > 0 such that for all x1
and x2 in X, we have

dY (f (x1 ), f (x2 )) ≤ M · dX (x1 , x2 ).

(ii) We say that f is bi-Lipschitz, if there are constant M1 > 0 and M2 > 0 such
that for all x1 and x2 in X, we have

M2 · dX (x1 , x2 ) ≤ dY (f (x1 ), f (x2 )) ≤ M1 · dX (x1 , x2 ).

(iii) We say that f is an isometry, or distance preserving, if for every x1 and
x2 in X, we have

dY (f (x1 ), f (x2 )) = dX (x1 , x2 ).

Obviously, any isometry between metric spaces is a bi-Lipschitz map (choose
both constants equal to 1). Also, any bi-Lipschitz map is injective.

Example 2.24. Let (S 1 , d) be the metric space from Example 2.4, that is S 1 is
the circle of radius 1 and d is the arc length between two points on S 1 . Recall that
every point on S 1 is equal to (cos(θ), sin(θ)), for a unique θ ∈ [0, 2π). For every
α ∈ [0, 2π] we can consider the rotation by α on S 1 , which may be defined as

Rα (cos(θ), sin(θ)) = (cos(θ + α), sin(θ + α)) .

For every α ∈ [0, 2π], the map Rα : S 1 → S 1 is an isometry.
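
The isometry property of rotations can be checked numerically on a few pairs of points. This is only a sketch: it assumes that the metric d of Example 2.4 is the length of the shorter arc between two points, and it represents a point of S 1 by its angle θ. The function names are chosen here for illustration.

```python
import math

TWO_PI = 2 * math.pi


def arc_distance(theta1: float, theta2: float) -> float:
    """Length of the shorter arc between the points at angles theta1 and theta2."""
    gap = abs(theta1 - theta2) % TWO_PI
    return min(gap, TWO_PI - gap)


def rotate(theta: float, alpha: float) -> float:
    """Angle of R_alpha applied to the point at angle theta, reduced to [0, 2*pi)."""
    return (theta + alpha) % TWO_PI


alpha = 1.3
pairs = [(0.2, 2.9), (0.1, 6.0), (3.0, 5.5)]

# d(R_alpha(p), R_alpha(q)) should agree with d(p, q) up to rounding error.
preserved = all(
    math.isclose(
        arc_distance(rotate(s, alpha), rotate(t, alpha)),
        arc_distance(s, t),
        abs_tol=1e-12,
    )
    for s, t in pairs
)
print(preserved)
```

A finite number of samples is of course not a proof; it only illustrates the statement of the example.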

Exercise 2.21. Let (X, dX ) and (Y, dY ) be metric spaces, and f : X → Y be a
surjective map. Show that if f is bi-Lipschitz, then it is a homeomorphism.


2.2 Topological spaces


2.2.1 Motivation
In this section we are going to generalise the fundamental concepts of analysis, such
as convergence of sequences and continuity of maps, to even wider settings. To
understand what we are about to do, let us briefly look back at how the notion of
a metric allowed us to define those fundamental concepts. We started with a set
X, and a non-negative function on X × X, called metric. We employed the metric
to define balls around points in X, and then using those balls we defined open sets
in X. So each metric on X gives rise to a collection of subsets of X which are
called open sets. From there, we saw that the convergence of sequences, continuity
of maps, etc, can be defined using open sets. See for instance, Exercise 2.12 and
Theorem 2.12.
Would it not be easier to single out some subsets of X, call them open sets, and then
use them to define convergence and continuity in the same fashion? Through this
approach, we avoid dealing with the notion of metric, which can be fairly complicated
in general. This approach seems to be more natural, because it is based on the more
basic objects; the subsets of X. Also, it is more direct, that is, we deal with things
happening in X (such as convergence of sequences in X) using objects living in X.
There is also a practical side in making this generalisation. Although most of the
spaces one comes across in mathematics are metric spaces, occasionally, one needs
to work on some sets where there cannot be a natural notion of metric (for example,
some function spaces). So this generalisation cannot be avoided.

Remark 2.5. As we will be using the words “set” and “subset” very often in this
section, we will use the words “collection” and “class” to mean “set”, and “subcol-
lection” and “subclass” to mean “subset”. So instead of saying “consider the set of
all subsets of R such that .... ”, we may prefer to say “consider the collection of all
subsets of R such that ... ”.

2.2.2 Topology on a set


Definition 2.17. Let A be an arbitrary set, and τ be a collection of subsets of A.
We say that τ is a topology on A, if the following properties hold:

(T1) the empty set ∅, and the whole set A belong to τ ,

(T2) if Gα ∈ τ , for α in a (finite or infinite) set I, then ∪α∈I Gα ∈ τ ,

(T3) if G1 , G2 , . . . , Gm belong to τ , then $\bigcap_{i=1}^{m} G_i \in \tau$.

A topological space, denoted as (A, τ ), is a pair of a set A and a topology τ
on A. Every element of A is called a point, and every element of τ is called an
open set in (A, τ ). For a point a ∈ A, we say that U is a neighbourhood of a, if
U is open (belongs to τ ) and a ∈ U .

Lecture notes for the week 8

It is possible to define a topology on any set, as we see in the next examples.

Example 2.25. Let A be an arbitrary set, and τ = {∅, A}. It is easy to see that
τ satisfies the three properties T1, T2, and T3 in the definition of topology. The
collection τ is called the coarse topology on A.

Example 2.26. Let A be an arbitrary set, and let τ be the collection of all subsets
of A. Evidently, τ satisfies the three properties T1, T2, and T3. In this topology,
every subset of A is open. This topology on A is called the discrete topology.

Below we give some non-trivial examples of topologies.

Example 2.27. Let A = {a, b}, where a and b are the letters “a” and “b” (so they
are distinct), and let
τ = {∅, {a, b}, {b}} .

It is easy to see that τ satisfies T1, T2, and T3, so it is a topology on A. The
only open sets in this topology are the empty set, A and the set {b}. So A is the
only open set containing a, and hence any open set containing a also contains b.
The collection τ is called the Sierpinski topology, and the pair (A, τ ) is called
the Sierpinski topological space. Note that this topology is not equal to the coarse
topology, and also it is not equal to the discrete topology.
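
On a finite set, the properties T1, T2, and T3 can be checked mechanically. The following sketch is one possible way to do this; the function name is chosen for illustration. Since a finite collection has only finitely many members, it is enough to check unions and intersections of pairs.

```python
from itertools import combinations


def is_topology(points, tau):
    """Check T1, T2 and T3 for a collection tau of subsets of a finite set."""
    points = frozenset(points)
    tau = {frozenset(g) for g in tau}
    # Every member of tau must be a subset of the underlying set.
    if not all(g <= points for g in tau):
        return False
    # T1: the empty set and the whole set belong to tau.
    if frozenset() not in tau or points not in tau:
        return False
    # T2 and T3: tau is finite, so closure under pairwise unions and pairwise
    # intersections gives closure under all unions and finite intersections.
    return all((g | h) in tau and (g & h) in tau for g, h in combinations(tau, 2))


A = {"a", "b"}
coarse = [set(), A]
discrete = [set(), {"a"}, {"b"}, A]
sierpinski = [set(), {"a", "b"}, {"b"}]
missing_whole_set = [set(), {"a"}, {"b"}]            # violates T1
missing_union = [set(), {1}, {2}, {1, 2, 3}]         # {1} ∪ {2} is absent

print(is_topology(A, sierpinski), is_topology({1, 2, 3}, missing_union))
```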

Example 2.28. Let A = R and let τ be the collection of all subsets of R of the
form (a, +∞) for some a ∈ R ∪ {+∞, −∞}. Here we assume that (+∞, +∞) is the
empty set. You can verify that this collection satisfies the properties T1, T2, and
T3, so τ is a topology on R. This is called the order topology on R.

Example 2.29. Let X be an arbitrary set, and let

τ = {V ⊂ X | Card(X \ V ) < +∞, or V = ∅}.

That is, each set in τ is either empty, or its complement has a finite number of
elements. You can see that this set satisfies the properties T1, T2, and T3. This
topology on X is called the co-finite topology.

The following example shows that topological spaces are, in a sense, a
generalisation of metric spaces.

Example 2.30. Let (X, d) be a metric space, and let τ be the collection of all
open sets in (X, d). By Lemma 2.4, the empty set and the whole set X are open,
so they belong to τ . This shows that T1 holds. By Lemma 2.5, the union of any
arbitrary number of open sets in (X, d) is open, so property T2 holds. Similarly, by
Lemma 2.6, the intersection of any finite number of open sets in (X, d) is open in
(X, d). Hence property T3 holds. Therefore, τ is a topology on X.
The topology τ on X is called the induced topology from the metric d.

The topology induced on Rn from the metric d2 is called the Euclidean topology
on Rn . We note that since the metrics d1 , d2 and d∞ are equivalent, they all
induce the same topology on Rn .
As we explained in the above example, every metric on X naturally induces a
topology on X (the induced topology). But, this is not a reversible process. First of
all, distinct metrics on a set X may induce the same topology on X. For example,
if d1 and d2 are topologically equivalent metrics on X, then they induce the same
topology. Therefore, we cannot associate a unique metric to each topology. One
might ask whether for every topology τ on X, there is a metric d on X which induces
τ on X. For example, you can verify that the discrete topology on X is induced from
the discrete metric on X. We say that a topological space (X, τ ) is metrisable, if
there is a metric on X which induces the topology τ .

Remark 2.6. In general, it is a difficult problem to find out if a given topology on
a set is metrisable. There are important theorems in topology called metrisation
theorems (such as Urysohn’s metrisation theorem), which provide sufficient condi-
tions for a topology to be metrisable. You can learn more about this topic if you
take the module on Differential Topology, or Algebraic Topology.

Exercise 2.22. Consider a discrete metric space (X, ddisc ), that is ddisc is a discrete
metric on X. Show that ddisc induces the discrete topology on X.

There are standard approaches to define new topologies using old ones. We
explain two of these approaches below.

Example 2.31. Let (X, τ ) be a topological space, and let Y be a subset of X.
Consider the collection of sets

τY = {U ∩ Y | U ∈ τ }.

This is a collection of subsets of Y , and one can verify that τY is a topology on Y .
In other words, τY satisfies properties T1, T2, and T3. The topology τY is called
the induced topology on Y from (X, τ ). We may also say that (Y, τY ) has the
subspace topology induced from (X, τ ).
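
As a small illustration of the formula for τY , the sketch below computes the induced topology for a three-point space. The space X = {a, b, c} with τ = {∅, {a}, {a, b}, X} and the subset Y = {b, c} are choices made here for illustration, and the function name is hypothetical.

```python
def subspace_topology(tau, Y):
    """Return the collection {U ∩ Y : U in tau} as a set of frozensets."""
    Y = frozenset(Y)
    return {frozenset(U) & Y for U in tau}


tau = [set(), {"a"}, {"a", "b"}, {"a", "b", "c"}]
tau_Y = subspace_topology(tau, {"b", "c"})

# {b} is open in Y because {b} = {a, b} ∩ Y, although {b} is not open in X.
print(sorted(sorted(U) for U in tau_Y))
```

Note that a set which is open in the subspace need not be open in the ambient space.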

Exercise 2.23. Let (X, τ ), Y , and τY be as in Example 2.31. Show that τY is a
topology on Y .

Example 2.32. Assume that (X, τ ) and (Y, µ) are two topological spaces. Consider
the product set
X × Y = {(x, y) | x ∈ X, y ∈ Y }.


Let τ ∗ µ be the collection of all sets Ω ⊆ X × Y such that for every (x, y) ∈ Ω, there
are Ux ∈ τ and Vy ∈ µ such that x ∈ Ux , y ∈ Vy and Ux × Vy ⊆ Ω.
One can see that the collection τ ∗ µ is a topology on X × Y . This is called the
product topology on X × Y .
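
For very small spaces, the product topology can be computed directly from this definition by testing every subset of X × Y . The sketch below does this for the product of the Sierpinski space of Example 2.27 with itself; the function names are chosen for illustration, and the brute-force enumeration is only feasible for tiny sets.

```python
from itertools import chain, combinations, product


def open_rectangles(tau, mu):
    """All sets U x V with U in tau and V in mu, as frozensets of pairs."""
    return {frozenset(product(U, V)) for U in tau for V in mu}


def product_topology(X, tau, Y, mu):
    """Product topology on X x Y, following the definition in Example 2.32."""
    rectangles = open_rectangles(tau, mu)
    pairs = list(product(X, Y))
    # Enumerate every subset of X x Y (feasible only for very small sets).
    subsets = chain.from_iterable(
        combinations(pairs, r) for r in range(len(pairs) + 1)
    )

    def is_open(omega):
        # Each point of omega must lie in an open rectangle contained in omega.
        return all(any(p in r and r <= omega for r in rectangles) for p in omega)

    return {omega for omega in map(frozenset, subsets) if is_open(omega)}


A = {"a", "b"}
sierpinski = [frozenset(), frozenset({"b"}), frozenset(A)]

rectangles = open_rectangles(sierpinski, sierpinski)
tau_product = product_topology(A, sierpinski, A, sierpinski)

# The union of the rectangles A x {b} and {b} x A is open but is not a rectangle.
l_shape = frozenset({("a", "b"), ("b", "a"), ("b", "b")})
print(len(rectangles), len(tau_product), l_shape in tau_product)
```

In this instance the product topology contains a set which is not of the form U × V , namely the union of two rectangles.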

To define a topology on X × Y , one might wish to simply take the sets of the
form U × V , such that U ∈ τ and V ∈ µ. By the next exercise, you can see that
this does not work in general.
Exercise 2.24. Let τEucl be the Euclidean topology on R, that is τEucl is the
collection of all open sets in (R, d1 ). Show that the collection

{U × V | U ∈ τEucl , V ∈ τEucl }

is not a topology on R × R. Is condition T2 satisfied? How about condition T3?


Definition 2.18. Let A be a set, and τ1 and τ2 be two topologies on A. We say
that the topology τ1 is stronger (or finer) than τ2 , if τ2 ⊂ τ1 .
Example 2.33. For every set A, the coarse topology on A is the weakest (the least
strong) topology on A, and the discrete topology on A is the strongest topology on
A.
Note that it is not always possible to compare two topologies on a given set A
in the sense of Definition 2.18. That is, there may be topologies τ1 and τ2 on a set
A such that neither τ1 is stronger than τ2 , nor τ2 is stronger than τ1 . For example,
let
A = {a, b}, τ1 = {∅, {a, b}, {a}}, τ2 = {∅, {a, b}, {b}}.
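
Since Definition 2.18 compares topologies by inclusion, the comparison is a simple subset test for finite collections. A minimal sketch, with a function name chosen for illustration:

```python
def is_finer(tau1, tau2):
    """True if tau1 is stronger (finer) than tau2, that is, tau2 ⊂ tau1."""
    return {frozenset(g) for g in tau2} <= {frozenset(g) for g in tau1}


A = {"a", "b"}
coarse = [set(), A]
discrete = [set(), {"a"}, {"b"}, A]
tau1 = [set(), {"a", "b"}, {"a"}]
tau2 = [set(), {"a", "b"}, {"b"}]

# Neither of tau1 and tau2 is stronger than the other.
print(is_finer(tau1, tau2), is_finer(tau2, tau1))
```

Both tau1 and tau2 lie between the coarse and the discrete topology, yet they are not comparable with each other.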
Recall that in a topological space (X, τ ), members of τ are called open sets. This
is in analogy with the way we defined open sets in metric spaces using balls (see
Definition 2.7).

Lemma 2.14. Let (A, τ ) be a topological space. A set G ⊆ A is open in A if and
only if for all x ∈ G there is a neighbourhood of x contained in G.
Proof. Let us first assume that G is open. Since G is an open set in A, we have
G ∈ τ . Thus, for every x ∈ G, G is a neighbourhood of x, and G is a subset of G.
On the other hand, assume that G ⊆ A is a set such that for every x ∈ G
there exists a neighbourhood Gx of x contained in G. By property T2, ∪x∈G Gx belongs
to τ , and hence it is an open set. Since G = ∪x∈G Gx , we conclude that G is an
open set.

Definition 2.19. Let (A, τ ) be a topological space, and Ω be a subset of A. A
point z ∈ Ω is called an interior point of Ω, if there is U ∈ τ such that z ∈ U and
U ⊂ Ω. The interior of the set Ω is defined as the set of all z ∈ Ω such that z is
an interior point of Ω. The interior of Ω is denoted by Ω◦ .


It follows from the above definition that the interior of any set is a subset of that
set. That is, if S ⊆ A, then S ◦ ⊆ S.
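
In a finite topological space the interior can be computed directly from Definition 2.19: a point is an interior point of S exactly when it lies in some open set contained in S, so the interior is the union of all such open sets. A minimal sketch on the Sierpinski space of Example 2.27, with a function name chosen for illustration:

```python
def interior(S, tau):
    """Union of all members of tau which are contained in S."""
    S = frozenset(S)
    result = frozenset()
    for U in tau:
        U = frozenset(U)
        if U <= S:
            result |= U
    return result


sierpinski = [set(), {"a", "b"}, {"b"}]

# {a} contains no non-empty open set, while {b} is itself open.
print(set(interior({"a"}, sierpinski)), set(interior({"b"}, sierpinski)))
```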

Exercise 2.25. Let (A, τ ) be a topological space, and let S and T be subsets of A.
The following properties hold:

(i) if S ⊂ T then S ◦ ⊂ T ◦ ,

(ii) S is open in A if and only if S = S ◦ ,

(iii)* S ◦ is the largest open set contained in S.

2.2.3 Convergence, and Hausdorff property


Definition 2.20. Let (A, τ ) be a topological space, and (xn )n≥1 be a sequence in A.
We say that (xn )n≥1 converges in (A, τ ), if there is x ∈ A satisfying the following
property: for any G ∈ τ with x ∈ G, there exists N ∈ N such that for all n ≥ N ,
we have xn ∈ G.
When this occurs, we say that xn converges to x as n tends to ∞, and write
limn→∞ xn = x.

Example 2.34. Let (A, τ ) be a topological space, with τ the coarse topology on
A. Then any sequence in A is convergent, and converges to every element in A.
On the other hand, if τ is the discrete topology on A, then a sequence (xn )n∈N
is convergent if and only if, the sequence is eventually constant.

The above example shows that the behaviour of sequences in a topological space
may be strange and counter-intuitive. For example, it shows that the limit of a
convergent sequence may not be unique.
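
The non-uniqueness of limits can be observed on a finite set. The sketch below handles only sequences which are eventually equal to a fixed point c; for such a sequence, Definition 2.20 says that x is a limit exactly when every open set containing x also contains c. The function name and the three-point set are chosen for illustration.

```python
from itertools import chain, combinations


def limits_of_eventually_constant(c, points, tau):
    """All limits of a sequence in `points` which is eventually equal to c."""
    # x is a limit exactly when every open set containing x also contains c.
    return {x for x in points if all(c in G for G in tau if x in G)}


points = {1, 2, 3}
coarse = [frozenset(), frozenset(points)]
# The discrete topology: all subsets of the underlying set.
discrete = [
    frozenset(s)
    for s in chain.from_iterable(combinations(points, r) for r in range(4))
]

print(limits_of_eventually_constant(1, points, coarse))
print(limits_of_eventually_constant(1, points, discrete))
```

With the coarse topology every point is a limit of the constant sequence 1, 1, 1, . . . , while with the discrete topology the only limit is 1.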

Definition 2.21. A topological space (A, τ ) is called Hausdorff, if the following
property holds: For every x and y in A with x ≠ y, there are open sets U and V
such that x ∈ U , y ∈ V , and U ∩ V = ∅. In this case we say that U and V separate
x and y.

Example 2.35. Consider the set A = {a, b, c}, and

τ = {∅, {a}, {a, b}, {a, b, c}}.

You can show that τ is a topology on A. The space (A, τ ) is not Hausdorff, since
b and c cannot be separated. The only open set in A which contains c is {a, b, c},
and that set also contains b.
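
For a finite topological space, the Hausdorff property can be tested by trying all pairs of open sets for every pair of distinct points. The following is a brute-force sketch with a function name chosen for illustration; it is applied to the space of Example 2.35 and to the discrete topology on the same set.

```python
from itertools import chain, combinations


def is_hausdorff(points, tau):
    """Check whether any two distinct points have disjoint open neighbourhoods."""
    opens = [frozenset(G) for G in tau]
    return all(
        any(x in U and y in V and not (U & V) for U in opens for V in opens)
        for x, y in combinations(sorted(points), 2)
    )


A = {"a", "b", "c"}
tau = [set(), {"a"}, {"a", "b"}, {"a", "b", "c"}]
# The discrete topology: all subsets of A.
discrete = [
    set(s) for s in chain.from_iterable(combinations(sorted(A), r) for r in range(4))
]

print(is_hausdorff(A, tau), is_hausdorff(A, discrete))
```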

Exercise 2.26. Let (X, d) be a metric space, and let τ be the topology on X
induced from the metric d. Show that (X, τ ) is a Hausdorff topological space.


The important property of Hausdorff spaces is stated in the next theorem.

Theorem 2.15. Let (A, τ ) be a Hausdorff topological space, and let (xn )n∈N be a
sequence in A. If the sequence (xn )n∈N converges in (A, τ ), then its limit is unique.

Proof. Assume, to the contrary, that there are distinct points x and y in A such that

limn→∞ xn = x, and limn→∞ xn = y.

Because (A, τ ) is a Hausdorff space, there are open sets Gx and Gy such that x ∈ Gx ,
y ∈ Gy , and Gx ∩ Gy = ∅. Since the sequence (xn )n∈N converges to x, there is Nx ∈ N
such that for all n ≥ Nx , we have xn ∈ Gx . Similarly, there is Ny ∈ N such that for
all n ≥ Ny we have xn ∈ Gy . Now, for n = max{Nx , Ny }, we have xn ∈ Gx and
xn ∈ Gy . This contradicts Gx ∩ Gy = ∅.

2.2.4 Closed sets in topological spaces


It is possible to give a definition of closed sets in a topological space in the same
fashion as we defined closed sets in a metric space (refer to Definition 2.10). However,
for technical reasons, in a topological space, one has to consider the limit points of
the set itself rather than the limit points of sequences in the set. It is convenient
to use the criterion in Theorem 2.9 for the definition of closed sets for topological
spaces, while we show in a moment that a set in a topological space is closed if and
only if it contains its limit points (see Lemma 2.18-(ii) and Remark 2.7).

Definition 2.22. Let (A, τ ) be a topological space, and let V ⊆ A. We say that V
is closed in (A, τ ), if A \ V is open in (A, τ ). That is, V is closed in (A, τ ) if and
only if A \ V ∈ τ .

Lemma 2.16. Let (A, τ ) be a topological space. Then, the empty set and the set A
are closed in (A, τ ). Moreover, we have

(i) the intersection of any number of (finite, countable, uncountable) closed sets
in (A, τ ) is a closed set in (A, τ ),

(ii) the union of any finite number of closed sets in (A, τ ) is a closed set in (A, τ ).

Proof. This follows from Definition 2.22, and the properties T1, T2, and T3 of
topology, by taking complements. See the proof of Lemma 2.10 for a similar argu-
ment.

Lemma 2.17. Let (A, τ ) be a Hausdorff topological space, and a ∈ A. Then
the set {a} is a closed set.


Proof. For any b ∈ A with b ≠ a, there are open sets Gb and Ga such that b ∈ Gb ,
a ∈ Ga , and Ga ∩ Gb = ∅. Then, by Definition 2.22, A \ Gb is a closed set. By
Lemma 2.16, the intersection

$\bigcap_{b \in A \setminus \{a\}} (A \setminus G_b)$

is a closed set. Since for every b ∈ A \ {a}, A \ Gb contains a and does not contain
b, the above intersection is equal to {a}. This completes the proof.

Definition 2.23. Let (A, τ ) be a topological space, and S be a subset of A. A point
x ∈ A is called a limit point of S, or an accumulation point of S, if the following
property holds: for any neighbourhood U of x, U contains a point in S different
from x. In other words, for any neighbourhood U of x, we have (S ∩ U ) \ {x} ≠ ∅.
Note that the point x may not be in S.
The closure of S is defined as the set of all points in S and all limit points of
S. The closure of S is denoted by $\overline{S}$. Obviously, for any set S ⊂ A, S ⊂ $\overline{S}$.

Example 2.36. Let τ be the Sierpinski topology on A = {a, b}. The constant
sequence b, b, b, b, . . . converges to the point a (and also to b) in this topology. That
is because the only open set in (A, τ ) which contains a is A, and all points
in the sequence belong to A. For the same reason, a is a limit point of the set {b},
so the closure of the set {b} is A.
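
The closure in Definition 2.23 can be computed directly in a finite topological space by testing each point against every open set containing it. The sketch below, with function names chosen for illustration, reproduces the closure of {b} in the Sierpinski space and compares it with the closure of {a}, whose complement {b} is open.

```python
def is_limit_point(x, S, tau):
    """True if every open set containing x meets S in a point other than x."""
    S = frozenset(S)
    return all((frozenset(U) & S) - {x} for U in tau if x in U)


def closure(S, points, tau):
    """The set S together with all of its limit points."""
    return frozenset(S) | {x for x in points if is_limit_point(x, S, tau)}


A = {"a", "b"}
sierpinski = [set(), {"a", "b"}, {"b"}]

# The point a is a limit point of {b}, but b is not a limit point of {a}.
print(set(closure({"b"}, A, sierpinski)), set(closure({"a"}, A, sierpinski)))
```

Here {a} agrees with its closure and is closed in the sense of Definition 2.22, while {b} is strictly smaller than its closure and its complement {a} is not open.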

Lemma 2.18. Let (A, τ ) be a topological space, and assume that S and T are subsets
of A. The following properties hold:

(i) if S ⊂ T , then $\overline{S}$ ⊂ $\overline{T}$,

(ii) S is closed in (A, τ ) if and only if $\overline{S}$ = S.

Remark 2.7. One can take the statement in part (ii) of the above lemma as the
definition of closed sets in a topological space. In other words, V is closed, if it
contains all the limit points of V . This is in the spirit of how we defined closed sets
in metric spaces, but it is not identical to that. If a set V is closed in a topological
space (A, τ ), in particular, for any sequence in V which converges to some point in
A, the limit of the sequence must belong to V . That is because any limit of a
sequence in V belongs to the closure of V . However, one has to note that limits of
sequences are not necessarily unique in topological spaces. By considering the limit
points of the set we avoid discussing the notion of Hausdorff property when defining
closed sets.

Proof of Lemma 2.18. Part (i): Let x ∈ $\overline{S}$ be an arbitrary point. If x ∈ S, then
x ∈ T ⊂ $\overline{T}$. Otherwise, x is a limit point of S, so any neighbourhood of x contains
a point in S different from x, and hence a point in T different from x. This implies
that x ∈ $\overline{T}$.
Part (ii): First assume that S is closed. Let us suppose, to the contrary, that
$\overline{S}$ ≠ S. Then there is x ∈ $\overline{S}$ such that x ∉ S. This implies that x ∈ A \ S. Since
S is closed, A \ S is open. This open set contains x, but does not contain any
element of S. So x cannot be a limit point of S, which is a contradiction.
Now assume that $\overline{S}$ = S. Assume, to the contrary, that S is not closed. Then A \ S
is not open. By Lemma 2.14, there exists x ∈ A \ S such that no neighbourhood
Gx of x is contained in A \ S. Thus, every neighbourhood Gx of x contains an element
of S, which is different from x because x ∉ S. This implies that x ∈ $\overline{S}$, and hence
x ∈ S. This is a contradiction.

2.2.5 Continuous maps on topological spaces


Definition 2.24. Let (X, τX ) and (Y, τY ) be two topological spaces, and f : X → Y
be a map. We say that f is continuous on X, if for any open set U in Y , f −1 (U )
is open in X.

Note that the continuity of f in the above definition does not just depend on f
but also on the topologies on X and Y . This is illustrated in the next example.

Example 2.37. Let (X, τX ) and (Y, τY ) be topological spaces.

(i) If τX is the discrete topology on X, then any f : X → Y is continuous.

(ii) If τY is the coarse topology on Y , then any f : X → Y is continuous.
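
For maps between finite topological spaces, Definition 2.24 can be checked by computing the pre-image of every open set. The sketch below represents a map as a dictionary; the function names are chosen for illustration. It shows that the identity map on a two-point set is continuous from the discrete to the coarse topology, but not in the opposite direction.

```python
def preimage(f, domain, B):
    """The set of points x in the domain with f(x) in B."""
    return frozenset(x for x in domain if f[x] in B)


def is_continuous(f, X, tau_X, tau_Y):
    """True if the pre-image of every member of tau_Y belongs to tau_X."""
    opens_X = {frozenset(G) for G in tau_X}
    return all(preimage(f, X, U) in opens_X for U in tau_Y)


X = {"a", "b"}
coarse = [set(), X]
discrete = [set(), {"a"}, {"b"}, X]
identity = {"a": "a", "b": "b"}

# The same map is continuous or not, depending on the topologies chosen.
print(is_continuous(identity, X, discrete, coarse))
print(is_continuous(identity, X, coarse, discrete))
```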

We have the following equivalent criterion for the continuity.

Theorem 2.19. Let (X, τX ) and (Y, τY ) be two topological spaces. Then, f : X → Y
is continuous if and only if the pre-image of any closed set in Y is closed in X.

Proof. First note that for any set V in Y , we have f −1 (Y \ V ) = X \ f −1 (V ). Now,
the theorem follows from Definition 2.22 (which is the criterion of Theorem 2.9 for
metric spaces).

Theorem 2.20. Let (X, τX ), (Y, τY ) and (Z, τZ ) be topological spaces, and assume
that f : X → Y and g : Y → Z are continuous. Then, g ◦ f : X → Z is continuous.

Proof. This easily follows from the definition of continuity.

Lemma 2.21. Let (X, τX ) and (Y, τY ) be topological spaces, and y ∈ Y . The
constant map f : X → Y defined as f (x) = y, for all x ∈ X, is continuous.

Proof. Let U ⊆ Y be an arbitrary open set. Then

$f^{-1}(U) = \begin{cases} \emptyset & \text{if } y \notin U, \\ X & \text{if } y \in U. \end{cases}$

Since the empty set and the whole set are open in any topology, we conclude that
f −1 (U ) is open in X. Because U was arbitrary, we conclude that f is continuous.


Definition 2.25. Let (X, τX ) and (Y, τY ) be topological spaces, and f : X → Y .

(i) We say that f : X → Y is a homeomorphism, if f : X → Y is a bijection,
and both maps f : X → Y and f −1 : Y → X are continuous.

(ii) The topological spaces (X, τX ) and (Y, τY ) are called topologically equivalent
(or homeomorphic), if there is a homeomorphism from X to Y .

Note that topological equivalence gives an equivalence relation on the set of
topological spaces.

Example 2.38. In the Euclidean space R, for every a < b,

(i) the sets [a, b] and [0, 1] are homeomorphic, by the map x ↦ (x − a)/(b − a)
from [a, b] to [0, 1],

(ii) the sets (a, b) and (0, 1) are homeomorphic, by the map x ↦ (x − a)/(b − a),

(iii) the sets (−∞, +∞) = R and (−1, 1) are homeomorphic, by the map x ↦
tan(πx/2) from (−1, 1) to R,

(iv) the sets (0, +∞) and (0, 1) are homeomorphic, by the map x ↦ x/(x + 1),

(v) the sets (−∞, +∞) and (0, +∞) are homeomorphic, by the map x ↦ e^x ,

(vi) the sets [0, 1) and (0, 1] are homeomorphic, by the map x ↦ −x + 1.
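
Some of the maps in this example can be sanity-checked numerically by composing each one with a candidate inverse. The inverse formulas below are not stated in the example and are supplied here as assumptions; the check only confirms that they undo the maps on a few sample points, and it says nothing about continuity.

```python
import math

# name: (forward map, candidate inverse, sample points in the domain)
examples = {
    "(iii) (-1, 1) -> R": (
        lambda x: math.tan(math.pi * x / 2),
        lambda y: 2 * math.atan(y) / math.pi,
        [-0.9, -0.3, 0.0, 0.5],
    ),
    "(iv) (0, inf) -> (0, 1)": (
        lambda x: x / (x + 1),
        lambda y: y / (1 - y),
        [0.5, 2.0, 10.0],
    ),
    "(v) R -> (0, inf)": (math.exp, math.log, [-2.0, 0.0, 3.0]),
    "(vi) [0, 1) -> (0, 1]": (
        lambda x: 1 - x,
        lambda y: 1 - y,
        [0.0, 0.25, 0.75],
    ),
}


def round_trip_ok(forward, inverse, samples):
    """Check that inverse(forward(x)) returns x, up to rounding, on the samples."""
    return all(
        math.isclose(inverse(forward(x)), x, abs_tol=1e-12) for x in samples
    )


results = {name: round_trip_ok(*spec) for name, spec in examples.items()}
print(results)
```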

Exercise 2.27. Assume that the topological spaces (X, τX ) and (Y, τY ) are topo-
logically equivalent. Then, (X, τX ) is Hausdorff if and only if (Y, τY ) is Hausdorff.

From here onward, we will only study metric spaces, as they are fairly general
and capture almost all settings you will come across in mathematics. However,
we will present most of the definitions, statements and proofs using open sets in
the metric space. Thus, most definitions, statements, and proofs can be readily
presented for topological spaces, replacing open sets in the metric with elements of
the topology.

Remark 2.8. Let X be a set, τX be a topology on X and dX be a metric on X.
For a given set U ⊆ X one can say if U is open in (X, τX ), and say if U is open
in (X, dX ). A topology τ on X is called metrisable, if there is a metric d on X
which satisfies the following property: For any set U ⊆ X, U is open in (X, τ ) iff
U is open in (X, d). In other words, a topology τ is metrisable if there is a metric
d on X such that U ∈ τ iff U is open with respect to d. Of course not every
topology on a given set X is metrisable. For instance, if a topology is not Hausdorff
it cannot be metrisable. The fundamental question of topological spaces is that
when a topology on X is metrisable. A sufficient condition is given by the classic

Lecture notes for the week 8


Chapter 2. Metric and topological spaces Analysis II, Term I, Page 87

Theorem of Usysohn/Tychonoff. The optimal condition was discovered and proved,


independently, by J. Nagata and Y. Smirnov. You may find an account of these
results in the class book, Topology by Munkres.
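
For a finite set, the Hausdorff obstruction mentioned in Remark 2.8 can be tested by brute force. The following is a minimal sketch, assuming a topology is represented as a list of frozensets; the helper name is_hausdorff and the two sample topologies on {0, 1} are chosen here for illustration.

```python
from itertools import combinations

def is_hausdorff(points, topology):
    """Return True if every two distinct points have disjoint open neighbourhoods.

    `topology` is a collection of frozensets, the open sets of the space.
    """
    for x, y in combinations(points, 2):
        separated = any(
            x in U and y in V and not (U & V)
            for U in topology
            for V in topology
        )
        if not separated:
            return False
    return True

X = {0, 1}
indiscrete = [frozenset(), frozenset(X)]                       # only the empty set and X
discrete = [frozenset(), frozenset({0}), frozenset({1}), frozenset(X)]
```

The indiscrete topology fails the test, so by the remark it is not metrisable, while the discrete topology passes (it is induced by the discrete metric).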


2.3 Compactness
2.3.1 Compactness by covers
Definition 2.26. Let (X, d) be a metric space, and Y ⊆ X.

(i) A collection R of open subsets of X is called an open cover for Y , if

\[ Y \subseteq \bigcup_{U \in R} U. \]

(ii) Given an open cover R for Y , we say that C is a sub-cover of R for Y , if

\[ C \subseteq R \quad \text{and} \quad Y \subseteq \bigcup_{U \in C} U. \]

(iii) An open cover R for Y is called a finite cover, if the number of elements in
R is finite.

Example 2.39. In the metric space (R, d1 ), the collection

R1 = {(−n, n) | n ∈ N}

is an open cover for R. This cover is not finite. The collection

{(−2n, 2n) | n ∈ N}

is a sub-cover of R1 for R. The collection of open sets

{(n − 1/4, n + 1/4) | n ∈ Z}

is an open cover for Z.

The key concept we aim to study in this section is the following definition.

Definition 2.27. Let (X, d) be a metric space, and Y ⊆ X. We say that Y is
compact in (X, d), if every open cover for Y has a finite sub-cover.

This definition may appear strange at this point, but by the end of this section,
it will be clear how important it is.

Example 2.40. In the metric space (R, d1 ), the set R, and the open interval (0, 1)
are not compact.
In order to show this, we need to present an open cover which does not have a
finite sub-cover. The collection R1 for R introduced in the above example does not
have a finite sub-cover for R. That is because, for any finite sub-collection of R1 , say

{(−n1 , n1 ), (−n2 , n2 ), . . . , (−nk , nk )},

Lecture notes for the week 9



the point max{n1 , n2 , . . . , nk } does not belong to the above cover, but it belongs
to R. In fact, if n = max{n1 , n2 , . . . , nk }, then the union of the sets in the above
collection is equal to (−n, n), which clearly does not cover R.
Here is another open cover for R which has no finite sub-cover:

{(n − 1, n + 1) | n ∈ Z}.

For the interval (0, 1), we can consider the open cover

{(1/n, 1) | n ∈ N, n ≥ 2}.

This cover does not have a finite sub-cover. Suppose, on the contrary, that there is
a finite sub-cover of the above collection, say

{(1/n1 , 1), (1/n2 , 1), . . . , (1/nk , 1)}.

Define m = max{n1 , n2 , . . . , nk }. We note that 1/m ∈ (0, 1) but 1/m does not
belong to any of the sets in the above collection. Thus, the above finite collection
does not cover (0, 1).
Another example is given by
\[ \left\{ \left( \frac{1}{n+1}, \frac{1}{n-1} \right) \;\middle|\; n \in \mathbb{N},\ n \ge 2 \right\}. \]
To see that the above collection is an open cover, we note that for every r ∈ (0, 1),
1/r > 1. Thus, we can choose an integer n ≥ 2 such that 1/r ∈ (n − 1, n + 1). This
implies that r ∈ (1/(n + 1), 1/(n − 1)). By a similar argument, you can show that
this cover does not have a finite sub-cover.
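
The argument for the cover {(1/n, 1) | n ≥ 2} of (0, 1) is constructive: a finite sub-collection always misses the point 1/m, with m the largest index used. A small sketch with exact rational arithmetic (the function names and the sample indices are hypothetical):

```python
from fractions import Fraction

def uncovered_point(indices):
    """Given finitely many indices n_1, ..., n_k (each >= 2), return a point of
    (0, 1) that lies in none of the intervals (1/n_i, 1)."""
    m = max(indices)
    return Fraction(1, m)

def is_covered(x, indices):
    """True if x lies in at least one interval (1/n, 1) with n in `indices`."""
    return any(Fraction(1, n) < x < 1 for n in indices)

sample = [2, 7, 19, 4]
witness = uncovered_point(sample)
```

Whatever finite list of indices is supplied, `witness` is a point of (0, 1) left uncovered, which is exactly why no finite sub-cover exists.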

Exercise 2.28. Consider the metric space (R, d1 ), and assume that a and b are
real numbers with a < b. Show that all of the intervals (a, b], [a, b), [a, +∞), and
(−∞, b] are not compact.

Example 2.41. Let (X, d) be a metric space, and assume that Y is a subset of X
with a finite number of elements. Then Y is compact.
To see this, let R be an arbitrary open cover of Y . Since Y has only finitely
many elements, we may choose, for each element of Y , one set in R which contains
it. These finitely many sets in R cover Y . Thus, R has a finite sub-cover.

Example 2.42. In the metric space (R, d1 ) the set Q ∩ [0, 1] is not compact.

To see this, let α be an irrational number in [0, 1] (for example, α = √2/2). We
can consider the open cover

{(−∞, α − 1/n) ∪ (α + 1/n, +∞) | n ∈ N} .

Obviously, each set (−∞, α − 1/n) ∪ (α + 1/n, +∞) is open in R, and the above
collection covers Q ∩ [0, 1]. But there is no finite sub-cover of the above cover for
Q ∩ [0, 1].
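
The last claim can be justified as follows. This is a short sketch filling in the step; the indices n_1, ..., n_k of a finite sub-collection and the symbol N are names introduced here.

```latex
\[
  \bigcup_{i=1}^{k} \Big( (-\infty, \alpha - \tfrac{1}{n_i}) \cup (\alpha + \tfrac{1}{n_i}, +\infty) \Big)
  = (-\infty, \alpha - \tfrac{1}{N}) \cup (\alpha + \tfrac{1}{N}, +\infty),
  \qquad N = \max\{n_1, \dots, n_k\}.
\]
```

Since 0 < α < 1, the set (α − 1/N, α + 1/N) ∩ [0, 1] contains a non-empty open interval, and hence a rational number q. This q belongs to Q ∩ [0, 1] but not to the union above, so the finite sub-collection does not cover Q ∩ [0, 1].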


Exercise 2.29. Show that if A and B are compact subsets of a metric space (X, d),
then A ∪ B is a compact set.

Exercise 2.30. Show that the ball

{(x, y) ∈ R^2 | x^2 + y^2 < 1}

in the metric space (R^2 , d2 ) is not compact.

As you may have noted from the definition of compactness, and the above ex-
amples, it is easier to show that a non-compact set is not compact than to show
that a compact set is compact. To show that a set is not compact, it suffices to give
one open cover which has no finite sub-cover. But to show that a set is compact,
we need to show that any open cover has a finite sub-cover. To deal with any open
cover brings a level of sophistication.

Proposition 2.22. Let a and b be real numbers with a ≤ b. In the metric space
(R, d1 ), the closed interval [a, b] is compact.

Proof. Let R be an arbitrary open cover for [a, b]. Let us consider the set

I = {s ∈ [a, b] | there is a finite sub-cover of R for [a, s]}.

The set I is not empty, as it contains a. That is because, there is one element in R
which covers the interval [a, a] = {a}. Also, the set I is bounded from above, since
I ⊆ [a, b]. Thus, by the completeness of the set of real numbers, I has a supremum.
Let t = sup(I). Note that since I ⊆ [a, b], we have t ∈ [a, b]. First we show that
t = b. If a = b this is immediate, so we may assume that a < b.
Assume that t = a. Since R is an open cover for [a, b], there is an open set
U ∈ R such that a ∈ U . As U is an open set in R, there is δ > 0 such that
(a − δ, a + δ) = Bδ (a) ⊆ U . By choosing δ smaller, if necessary, we may assume that
δ < b − a. Now, the collection {U } is a finite sub-cover of R for [a, a + δ/2]. Thus,
a + δ/2 ∈ I, contradicting sup(I) = a.
Assume that t ∈ (a, b). Since R is an open cover for [a, b] there is U ∈ R such
that t ∈ U . As U ∩ (a, b) is an open set, and t ∈ U ∩ (a, b), there is δ > 0 such that
(t − δ, t + δ) ⊆ U ∩ (a, b). By the definition of supremum, there must be s ∈ I such
that s ∈ (t − δ, t]. As s ∈ I, [a, s] can be covered by a finite sub-cover of R, say
C ⊆ R. Now the collection C ∪ {U } is a finite sub-cover of R, and it covers the set

[a, t + δ/2] ⊆ [a, s] ∪ (t − δ, t + δ).

This shows that t + δ/2 ∈ I, contradicting sup(I) = t.


By the above two paragraphs, we must have t = b. This does not immediately
mean that [a, b] can be covered by a finite sub-cover of R (the supremum may not
belong to the set). Again, since R is an open cover for [a, b] there is an open set
U ∈ R such that b ∈ U . As U is an open set, there is δ > 0 such that
(b − δ, b + δ) ⊆ U . By the definition of supremum, there must be s ∈ I such that
s ∈ (b − δ, b]. As s ∈ I, [a, s] can be covered by a finite sub-cover of R, say C ⊆ R.
Now the collection C ∪ {U } is a finite sub-cover of R, and it covers the set
[a, b] ⊆ [a, s] ∪ (b − δ, b + δ). This shows that there is a finite sub-cover of R
for [a, b].
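
The idea of the proof, pushing the covered part [a, s] further to the right, can be illustrated computationally. The following is a minimal sketch, assuming the cover is given as a finite Python list of open intervals (l, r); for such a list compactness is not needed, so the code only illustrates the sweeping mechanism. The name greedy_subcover and the sample intervals are hypothetical.

```python
def greedy_subcover(a, b, intervals):
    """Select open intervals (l, r) from `intervals` whose union contains [a, b].

    Sweeps from left to right; returns None if some point of [a, b] is not
    covered by the given family.
    """
    chosen = []
    s = a  # every point of [a, s) is already covered by `chosen`
    while True:
        # open intervals of the family that contain the point s
        candidates = [(l, r) for (l, r) in intervals if l < s < r]
        if not candidates:
            return None  # s is not covered, so the family is not a cover
        best = max(candidates, key=lambda iv: iv[1])  # reaches furthest right
        chosen.append(best)
        if best[1] > b:
            return chosen  # the chosen intervals now cover all of [a, b]
        s = best[1]  # the right endpoint itself is not yet covered

intervals = [(-1, 0.3), (0.2, 0.6), (0.5, 1.1), (0.25, 0.4)]
result = greedy_subcover(0.0, 1.0, intervals)
```

The right endpoints of the chosen intervals strictly increase, so the loop stops after at most as many steps as there are intervals in the list.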

Let us look at some basic properties of compact sets.

Proposition 2.23. Let (X, d) be a metric space, and let Y ⊆ X. If X is compact
and Y is closed, then Y is compact.

Proof. Let R be an open cover for Y . Because Y is closed, X \ Y is open. Therefore,
R ∪ {X \ Y } is an open cover for X. Since X is compact, there is a finite sub-cover
of R ∪ {X \ Y }, which covers X. This sub-cover also covers Y . However, we do not
need X \ Y to cover Y . Hence we may remove X \ Y from that sub-cover, and still
cover Y . Thus, there is a finite sub-cover of R which covers Y .

Theorem 2.24. Let (X, d) be a metric space, and Y ⊆ X. If Y is compact, then
Y is closed.

Proof. Let Y ⊆ X be a compact set. We aim to show that X \ Y is open (by
Theorem 2.9 this implies that Y is closed). Fix an arbitrary point z ∈ X \ Y .
For each y ∈ Y , we define r_y = d(z, y)/2 > 0. Consider the ball B_{r_y}(y). The
collection
\[ \{ B_{r_y}(y) \mid y \in Y \} \]
is an open cover for Y . Since Y is compact, there is a finite sub-cover of this cover
for Y . Thus, there are a finite number of points y_1 , y_2 , . . . , y_k in Y and positive
real numbers r_{y_1} , r_{y_2} , . . . , r_{y_k} such that
\[ Y \subseteq \bigcup_{i=1}^{k} B_{r_{y_i}}(y_i). \]

Let r = min{r_{y_i} | 1 ≤ i ≤ k}, and note that
\[ B_r(z) = \bigcap_{i=1}^{k} B_{r_{y_i}}(z). \]

By our choice of r_y , we have
\[ B_r(z) \cap B_{r_{y_i}}(y_i) \subseteq B_{r_{y_i}}(z) \cap B_{r_{y_i}}(y_i) = \emptyset. \]

Therefore,
\[ B_r(z) \cap Y \subseteq B_r(z) \cap \Big( \bigcup_{i=1}^{k} B_{r_{y_i}}(y_i) \Big) = \emptyset. \]

Thus, B_r(z) ⊆ X \ Y . As z ∈ X \ Y was arbitrary, we conclude that X \ Y is
open.
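
The disjointness of the two balls used in the proof follows from the triangle inequality. As a short sketch, suppose some point w (a name introduced here) belonged to both the ball of radius r_y around z and the ball of radius r_y around y. Then

```latex
\[
  d(z, y) \le d(z, w) + d(w, y) < r_y + r_y = 2 r_y = d(z, y),
\]
```

which is impossible. Hence the two balls have no point in common.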

Theorem 2.25. Let (X, dX ) and (Y, dY ) be metric spaces, and consider the product
space X × Y with any of the metrics d from Definition 2.4. If X and Y are compact,
then (X × Y, d) is compact.

Proof. Let R be an arbitrary open cover for X × Y .
Let us first assume that every set in R is of the form U × V , where U is an open
set in X and V is an open set in Y . Thus, for every (x, y) ∈ X × Y , there is W_{xy}
in R such that (x, y) ∈ W_{xy} , and W_{xy} = U_{xy} × V_{xy} for some open sets U_{xy} in X
and V_{xy} in Y .
Fix an arbitrary x ∈ X. For any y ∈ Y , there is W_{xy} ∈ R such that (x, y) ∈ W_{xy} .
Let us consider the collection
\[ R_x = \{ V_{xy} \mid W_{xy} \in R,\ (x, y) \in W_{xy},\ W_{xy} = U_{xy} \times V_{xy} \}. \]

Since R is an open cover for X × Y , R_x is an open cover for Y . As Y is compact,
there is a finite sub-cover of R_x for Y , say {V_{xy_1} , V_{xy_2} , . . . , V_{xy_n} }. Consider
the set
\[ U_x = \bigcap_{i=1}^{n} U_{xy_i}. \]
Since each U_{xy_i} is an open set in X which contains x, and the above intersection is
finite, the set U_x is open in X and contains x. In particular, we have
\[ U_x \times Y \subseteq \bigcup_{i=1}^{n} U_{xy_i} \times V_{xy_i} = \bigcup_{i=1}^{n} W_{xy_i}. \]

As x ∈ X was arbitrary, by the above argument, for each x ∈ X we obtain an
open set U_x in X. Let us consider the collection of open sets {U_x | x ∈ X}. This is
an open cover for X. Because X is compact, there is a finite sub-cover of this cover
for X, say {U_{x_1} , U_{x_2} , . . . , U_{x_m} }. Combining with the above inclusion, we note that
\[ X \times Y \subseteq \bigcup_{i=1}^{m} \bigcup_{j=1}^{n} W_{x_i y_j}, \]
where the points y_j and the number n depend on x_i .
Thus, there is a finite sub-cover of R for X × Y . This completes the proof in this
case.
Now assume that R is an arbitrary open cover for X × Y . For each (x, y) ∈ X × Y ,
there is an open set W_{xy} in R such that (x, y) ∈ W_{xy} . Let us choose an open set
U_{xy} in X and an open set V_{xy} in Y such that x ∈ U_{xy} , y ∈ V_{xy} , and
U_{xy} × V_{xy} ⊆ W_{xy} . The collection of all such open sets U_{xy} × V_{xy} , for all x ∈ X
and y ∈ Y , is an open cover of X × Y . By the above proof, there is a finite sub-cover
of this cover, say U_{x_i y_j} × V_{x_i y_j} for (i, j) ∈ I, which covers X × Y .
For each (i, j) ∈ I, there is W_{x_i y_j} ∈ R such that U_{x_i y_j} × V_{x_i y_j} ⊆ W_{x_i y_j} .
Therefore, {W_{x_i y_j} | (i, j) ∈ I} is a finite sub-cover of R for X × Y . This
completes the proof.

By Proposition 2.22 and Theorem 2.25, we obtain the following corollary.

Corollary 2.26. The set [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] in the Euclidean space Rn
is compact.

The above results show us how difficult it is to prove that a given compact
set is compact. One may imagine how difficult it can be to deal with an unusual set
in Rn . Thus, it is important to have some criteria which can be verified easily, and
imply compactness. In the remainder of this section we aim to introduce a few such
criteria.

Definition 2.28. Let (X, d) be a non-empty metric space, and Z ⊆ X. We say
that the set Z is bounded in (X, d), if there exists M ∈ R such that for all x and
y in Z we have d(x, y) ≤ M .
Let S be an arbitrary set, and f : S → X. We say that f is bounded, if the
set f (S) is bounded in X.

Exercise 2.31. Let (X, d) be a metric space, and A1 , A2 , . . . , An be a finite number
of bounded sets in X. Then $\bigcup_{i=1}^{n} A_i$ is a bounded set in X.

Exercise 2.32. Let (X, d) be a non-empty metric space, and let Z ⊆ X. Show
that Z is bounded if and only if there is x ∈ X and r ∈ R such that Z ⊆ Br (x).

Lemma 2.27. If (X, d) is a compact metric space, then X is bounded.

Proof. Fix an arbitrary x ∈ X and consider the open cover R = {Bn (x) | n ∈ N}.
As X is compact, there is a finite sub-cover of R which covers X. Let B_{n_i}(x), for
i = 1, 2, . . . , k, be those finitely many sets. Define m = max_{1≤i≤k} n_i . We have
\[ X \subseteq \bigcup_{i=1}^{k} B_{n_i}(x) = B_m(x). \]
Hence, for all y and z in X, we have d(y, z) ≤ d(y, x) + d(x, z) < 2m, so X is
bounded.

The main criterion for compactness of subsets of Rn is presented in the next
theorem.

Theorem 2.28 (Heine-Borel). Consider the Euclidean metric space (Rn , d2 ), and
let X ⊆ Rn . Then, X is compact if and only if X is closed and bounded.

Proof. Let us first assume that X is compact. By Lemma 2.27, X is bounded, and
by Theorem 2.24, X is closed.
Now assume that X is closed and bounded. Since X is bounded, there is N ∈ N
such that X ⊆ [−N, N ]^n . By Corollary 2.26, the set [−N, N ]^n is compact. Thus,
X is a closed set in a compact set. By Proposition 2.23, that implies that X is
compact.


In the above theorem it is important that the set X is contained in Rn . The
statement of the theorem is not true for general metric spaces, as you will show in
the next exercise.

Exercise 2.33. Consider the set R with the discrete metric ddisc . The set (0, 1) is
closed and bounded in (R, ddisc ), but it is not compact.

We say that a sequence of sets Vn , for n ≥ 1, is a nest, if for all i ≥ 1 we have
Vi+1 ⊆ Vi . That is,
V1 ⊇ V2 ⊇ V3 ⊇ V4 ⊇ . . . .

Exercise 2.34. Let (X, d) be a metric space, and assume that Vn , for n ≥ 1, is a
nest of non-empty closed sets in X.

(i) Show that if X is compact, then ∩n≥1 Vn is not empty.

(ii) Give an example of a nest of non-empty closed sets Vn , for n ≥ 1, in a metric
space such that ∩n≥1 Vn is empty.

2.3.2 Sequential compactness


In this section we aim to find a simpler criterion which implies compactness.

Definition 2.29. We say that a metric space (X, d) is sequentially compact,
if every sequence in X has a sub-sequence which converges in (X, d). That is, for
every sequence (xn )n≥1 in X, there is a sub-sequence (xnk )k≥1 and a point x ∈ X
such that xnk → x, as k → +∞.

Example 2.43. The metric space (R, d1 ) is not sequentially compact. For example,
the sequence (n)n≥1 does not have any sub-sequence which converges in (R, d1 ).
Consider (0, 1) ⊆ R, and let d be the induced metric from d1 on (0, 1). The metric
space ((0, 1), d) is not sequentially compact. To see this, consider the sequence
(1/n)n≥1 . This sequence belongs to (0, 1), and converges to 0 in the metric space
(R, d1 ). So any subsequence of this sequence also converges to 0 in (R, d1 ). But,
since 0 ∉ (0, 1), this sequence has no sub-sequence which converges in ((0, 1), d).

Lemma 2.29. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X. Then,
(xn )n≥1 has a sub-sequence which converges to an element in X if and only if there
is x ∈ X such that for every ε > 0, there are infinitely many i satisfying xi ∈ Bε (x).

Proof. First assume that (xn )n≥1 has a sub-sequence which converges to x ∈ X.
Let (x_{n_i})i≥1 be a sub-sequence which converges to x. Fix an arbitrary ε > 0. By
the definition of convergence, there is N ∈ N such that for all i ≥ N , we have
x_{n_i} ∈ Bε (x). This shows that there are infinitely many n such that xn ∈ Bε (x).
For the other direction, we aim to find a subsequence of (xn )n≥1 which converges
to x. We shall do this inductively. Let n_1 = 1. Suppose we have defined x_{n_1} , x_{n_2} ,
. . . , x_{n_{i−1}} . Then by the assumption, for infinitely many n we have xn ∈ B_{1/i}(x).
So take n_i to be the smallest n such that n > n_{i−1} and xn ∈ B_{1/i}(x). With this
process, we define a sub-sequence of (xn )n≥1 . We note that for every i ≥ 2, we have
d(x_{n_i} , x) < 1/i. This shows that the sub-sequence (x_{n_i})i≥1 converges to x as i → ∞.
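
The inductive construction in the second half of the proof is effectively an algorithm. Below is a minimal sketch of it for real sequences, with two assumptions made for the illustration: the first index is also chosen by the closeness condition (rather than fixing n_1 = 1), and a search bound guards against a point x that is not actually a limit of a sub-sequence. The sample sequence x_n = (−1)^n (1 + 1/n), which has a sub-sequence converging to 1, is not from the notes.

```python
def extract_subsequence(x, limit, count, search_bound=10_000):
    """Return indices n_1 < n_2 < ... < n_count with |x(n_i) - limit| < 1/i.

    Each n_i is the smallest admissible index larger than n_{i-1}.
    """
    indices = []
    prev = 0
    for i in range(1, count + 1):
        n = prev + 1
        while abs(x(n) - limit) >= 1 / i:
            n += 1
            if n > search_bound:
                raise ValueError("no suitable index found within the search bound")
        indices.append(n)
        prev = n
    return indices

def x(n):
    """Sample sequence: the even-indexed terms tend to 1, the odd-indexed ones to -1."""
    return (-1) ** n * (1 + 1 / n)

indices = extract_subsequence(x, limit=1.0, count=5)
```

For this sample sequence the procedure skips every odd index and selects the even ones.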

Exercise 2.35. Show that if a metric space is sequentially compact, then it is
bounded.

Theorem 2.30. If a metric space (X, d) is compact, then it is sequentially compact.

Proof. Suppose, on the contrary, that X is not sequentially compact. Then, there is
a sequence (xn )n≥1 in X which has no convergent sub-sequence. Therefore, for every
x ∈ X, there is no subsequence of this sequence which converges to x. Thus, using
Lemma 2.29, for every x ∈ X, there is εx > 0 such that only for finitely many n we
have xn ∈ B_{εx}(x).
Let Ux = B_{εx}(x). Then, the collection

{Ux | x ∈ X}

is an open cover for X. By the compactness of X, there is a finite sub-cover
{U_{x_1} , U_{x_2} , . . . , U_{x_m} } such that $X = \bigcup_{i=1}^{m} U_{x_i}$. But, for each i,
xn ∈ U_{x_i} for only finitely many n. Thus, xn ∈ X for only finitely many n, which is
a contradiction, since the whole sequence (xn )n≥1 belongs to X.

We state an important application of the above theorem.

Theorem 2.31 (Bolzano-Weierstrass). Any bounded sequence in R^m has a
convergent subsequence.

Proof. Let (xn )n≥1 be a bounded sequence in R^m . Then, there is M > 0 such that
for all n ≥ 1, we have ‖xn‖ ≤ M . In particular, every xn belongs to [−M, M ]^m .
Since [−M, M ]^m is compact in the Euclidean metric, by Theorem 2.30,
([−M, M ]^m , d2 ) is sequentially compact. Therefore, (xn )n≥1 has a convergent
subsequence.

Note that in the proof of Theorem 2.30 we did not say that there are finitely
many xn in each U_{x_i} , but rather that xn ∈ U_{x_i} for finitely many n. This is
important since the sequence (xn )n≥1 may be constant, or there may be infinitely
many entries in the sequence which are the same.
The opposite implication in Theorem 2.30 is also true. But the proof requires
some technical steps, which we break into a few optional exercises.

Exercise 2.36.* Let (X, d) be a sequentially compact metric space. Show that X
is separable, that is, there is a countable dense set in X.


Exercise 2.37.* Let (X, d) be a sequentially compact metric space, and R be an
open cover for X. Show that there is a countable sub-cover of R for X.

Theorem 2.32. Assume that (X, d) is a metric space. If X is sequentially compact,
then X is compact.

The statement of the above theorem is not optional, but its proof is optional.

Proof.* Let R be an arbitrary open cover for X. By Exercise 2.37, we can extract
a countable sub-cover of R, say {V1 , V2 , V3 , . . . }.
Suppose that there is no finite sub-cover of {V1 , V2 , . . . } for X. Then for every
n ≥ 1, {V1 , ..., Vn } does not cover X. Hence, for each n we can choose xn ∈ X
such that $x_n \notin \bigcup_{i=1}^{n} V_i$. This generates a sequence (xn )n≥1 in X. By
construction, xn ∉ Vi for every i ≤ n, so xn ∈ Vi is possible only when n < i. This
implies that only finitely many entries of the sequence lie in each Vi .
Since X is sequentially compact, we can find a convergent sub-sequence, say
(x_{n_j})j≥1 . Suppose this converges to x ∈ X. Then, since {V1 , V2 , . . . } is a cover
for X, there is m ≥ 1 such that x ∈ Vm . Since Vm is open, by the definition of
convergence, there is N ∈ N such that for all j ≥ N , we have x_{n_j} ∈ Vm . Hence,
infinitely many entries in the sequence (xn )n≥1 lie in Vm , which is a contradiction.

Remark 2.9. The equivalence of compactness and sequential compactness does not
hold for arbitrary topological spaces. See Remark 2.8. For instance, you may see
that the statement in Exercise 2.36 does not hold for arbitrary topological spaces.
Indeed, in an arbitrary topological space, neither compactness implies sequential
compactness, nor sequential compactness implies compactness. The examples of
such topological spaces are too specialised for the scope of this module.

2.3.3 Continuous maps and compact sets


Theorem 2.33. Let (X, dX ) and (Y, dY ) be metric spaces, and f : X → Y be a
continuous map. If Z is a compact set in X, then f (Z) is a compact set in Y .

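Proof. Let R = {Vα | α ∈ I} be an open cover for f (Z). Define Uα = f −1 (Vα ).
Note that each Uα is an open set in X, since f is continuous. Moreover, the
collection {Uα | α ∈ I} covers Z. Since Z is compact, there exist finitely many
indices α1 , α2 , . . . , αn in I such that U_{α1} , U_{α2} , . . . , U_{αn} cover Z. Since
f (U_{αi} ) ⊆ V_{αi} for each i, we have

f (Z) ⊆ f (U_{α1} ) ∪ · · · ∪ f (U_{αn} ) ⊆ V_{α1} ∪ · · · ∪ V_{αn} .

Thus, {V_{α1} , V_{α2} , . . . , V_{αn} } is a finite sub-cover of R for f (Z).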

Corollary 2.34. Let (X, dX ) and (Y, dY ) be metric spaces, and f : X → Y be a
homeomorphism. Then, X is compact if and only if Y is compact.


The above corollary allows us to immediately conclude that some pairs of sets
are not homeomorphic. For example, the intervals (0, 1) and [0, 1] are not homeo-
morphic, since one of them is compact and the other one is not.
Compactness is an extremely useful property in analysis. We shall study some
of the conveniences that come with it.
Recall that a map f : (X, dX ) → (Y, dY ) is uniformly continuous, if for every
ε > 0 there exists δ > 0 such that for all x1 and x2 in X satisfying dX (x1 , x2 ) < δ we
have dY (f (x1 ), f (x2 )) < ε. Note that δ in the above definition is independent of x1
and x2 . In general it is fairly difficult to verify that a map is uniformly continuous.
You have already seen this for maps from R to R.

Theorem 2.35. Every continuous map from a compact metric space to a metric
space is uniformly continuous.

Proof. To prove this, fix arbitrary metric spaces (A1 , d1 ) and (A2 , d2 ), and assume
that A1 is compact and f : A1 → A2 is continuous.
Suppose, on the contrary, that f is not uniformly continuous. Then, for some
ε > 0 and any n ∈ N there exist xn and x′n in A1 such that

d1 (xn , x′n ) < 1/n and d2 (f (xn ), f (x′n )) ≥ ε.

Because A1 is compact, by Theorem 2.30, A1 is sequentially compact. Thus, there
exists a sub-sequence (x_{n_k})k≥1 which converges to some x ∈ A1 . We note that
the sub-sequence (x′_{n_k})k≥1 also converges to x. That is because, by the triangle
inequality,

d1 (x′_{n_k} , x) ≤ d1 (x′_{n_k} , x_{n_k} ) + d1 (x_{n_k} , x) ≤ 1/n_k + d1 (x_{n_k} , x).

On the other hand, since f is continuous, and the sequences (x_{n_k})k≥1 and (x′_{n_k})k≥1
converge to x, the sequences (f (x_{n_k}))k≥1 and (f (x′_{n_k}))k≥1 converge to f (x). But,
by our choice of these sub-sequences, we have

ε ≤ d2 (f (x_{n_k} ), f (x′_{n_k} )) ≤ d2 (f (x_{n_k} ), f (x)) + d2 (f (x), f (x′_{n_k} )).

The right hand side tends to 0 as k → ∞, while ε > 0 is fixed. This is a
contradiction.

Corollary 2.36. Assume that a and b are real numbers with a < b. If f : [a, b] → R
is continuous, then it is uniformly continuous.
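
Compactness of the domain cannot be dropped here. As a short illustration (the function and the two sequences are chosen for this example and are not taken from the notes), consider f(x) = 1/x on the non-compact interval (0, 1], with x_n = 1/n and x'_n = 1/(2n):

```latex
\[
  |x_n - x_n'| = \frac{1}{2n} < \frac{1}{n},
  \qquad
  |f(x_n) - f(x_n')| = |n - 2n| = n \ge 1 .
\]
```

So for ε = 1 no δ > 0 works, and f is continuous but not uniformly continuous on (0, 1]. On a compact interval such as [1/2, 1], the same function is uniformly continuous by Corollary 2.36.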

Theorem 2.37. Let (X, d) be a compact metric space, and f : X → R be a
continuous map. Then f is bounded from above and below on X, and attains its
upper and lower bounds.


Proof. By Theorem 2.33, f (X) ⊆ R is compact. Then, by Theorem 2.28, f (X) is
closed and bounded in R. In particular, f (X) is bounded from above and below.
Let M = sup f (X). Since M is the least upper bound for f (X), for every n ≥ 1,
M − 1/n is not an upper bound for f (X). Thus, for every n ≥ 1, there is xn ∈ X
such that f (xn ) ≥ M − 1/n.
Consider the sequence (xn )n≥1 . Since X is compact, it is sequentially compact.
Thus, there is a sub-sequence of (xn )n≥1 , say (x_{n_k})k≥1 , which converges to some x
in X. As f is continuous, f (x_{n_k} ) → f (x) as k → ∞. Taking limits in the inequality

f (x_{n_k} ) ≥ M − 1/n_k ,

as k → ∞, we obtain f (x) ≥ M . On the other hand, since f (x) ∈ f (X), and
sup f (X) = M , we must have f (x) ≤ M . Therefore, f (x) = M .
Similarly, we may show that there is x′ ∈ X such that f (x′ ) = inf f (X).

Exercise 2.38. Let (X, d) be a compact metric space, and assume that f : X → X
is a continuous map such that for all x ∈ X, we have f (x) ≠ x. Show that there is
δ > 0 such that for all x ∈ X, we have d(x, f (x)) ≥ δ.


2.4 Completeness
2.4.1 Complete metric spaces and Banach space
The completeness of the set of real numbers is a fundamental property widely used
in analysis. It allows us to solve equations such as x^2 = 2 in R, which have no
solutions in Q. Evidently, it is useful to have such a property in more general
settings. However, the completeness of R in terms of least upper bounds, uses the
order on the set of real numbers. This cannot be generalised to arbitrary sets in a
meaningful fashion. But, the completeness of R in terms of Cauchy sequences can
be generalised to arbitrary metric spaces. In this section, we aim to develop this
theory. You will see many applications of the completeness results of this section in
the second year module, Differential Equations.

Definition 2.30. Let (X, d) be a metric space, and (xn )n≥1 be a sequence in X.
We say that (xn )n≥1 is a Cauchy sequence in (X, d), if for every ε > 0 there exists
Nε ∈ N such that for all n and m bigger than Nε we have

d(xn , xm ) < ε.

Exercise 2.39. Show that any convergent sequence in a metric space, is a Cauchy
sequence.

Exercise 2.40. Let (X, d) be a metric space, and assume that (xn )n≥1 is a Cauchy
sequence in X. If there is a subsequence of (xn )n≥1 which converges to some x ∈ X,
then the sequence (xn )n≥1 converges to x.

Definition 2.31. (i) A metric space (X, d) is called complete, if every Cauchy
sequence in X converges to a limit in X.

(ii) A normed vector space (V, ‖·‖) is called a Banach space, if V with the induced
metric d_{‖·‖} is a complete metric space.

Example 2.44. You have already seen in Analysis I that any Cauchy sequence
in R is convergent. You can also prove this using Exercise 2.40 and the Bolzano-
Weierstrass Theorem 2.31. Thus, the metric space (R, d1 ) is complete.
The metric space (Q, d1 ) is not complete (here d1 is the induced metric on Q).
For example, any sequence in Q which converges to √2 in R is Cauchy but does not
converge in (Q, d1 ).
In the same fashion, the metric space ((0, 1], d1 ) is not complete. For example,
the sequence (1/n)n≥1 in (0, 1] is Cauchy, but not convergent (the limit does not
belong to (0, 1]).
However, the metric space ([0, 1], d1 ) is complete.
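
A concrete Cauchy sequence in Q with no limit in Q can be generated with exact rational arithmetic. The sketch below uses the iteration x_1 = 1, x_{n+1} = (x_n + 2/x_n)/2, which is a choice made for this illustration (the notes do not single out a sequence); in R it converges to √2.

```python
from fractions import Fraction

def newton_sqrt2_terms(count):
    """First `count` terms of x_1 = 1, x_{n+1} = (x_n + 2/x_n)/2, as exact rationals."""
    terms = [Fraction(1)]
    while len(terms) < count:
        x = terms[-1]
        terms.append((x + 2 / x) / 2)
    return terms

terms = newton_sqrt2_terms(6)
# distances between consecutive terms, computed exactly
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]
# x_n^2 - 2 is never zero, since no rational number squares to 2
squares_minus_two = [t * t - 2 for t in terms]
```

Every term is rational and the gaps shrink rapidly, yet no term satisfies x^2 = 2, in line with the fact that the limit √2 lies outside Q.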

Lemma 2.38. For every m ≥ 1, the metric space (Rm , d2 ) is complete.

Lecture notes for the week 10



Proof. Assume that (xn )n≥1 is a Cauchy sequence in R^m . For each n ≥ 1, let
us write x_n = (x_n^1 , x_n^2 , . . . , x_n^m ). Recall that for every z = (z^1 , z^2 , . . . , z^m )
and y = (y^1 , y^2 , . . . , y^m ) in R^m , and every k ∈ {1, 2, . . . , m}, we have
\[ |z^k - y^k| \le \|z - y\|. \]

This implies that for every k ∈ {1, 2, . . . , m}, the sequence (x_n^k )n≥1 is a Cauchy
sequence in (R, d1 ). To see this, fix an arbitrary ε > 0. Since (xn )n≥1 is Cauchy
in (R^m , d2 ), there is Nε ∈ N, such that for all i and j bigger than Nε we have
d2 (xi , xj ) < ε. Then, by the above inequality, for all i and j bigger than Nε , we
have
\[ d_1(x_i^k, x_j^k) = |x_i^k - x_j^k| \le \|x_i - x_j\| = d_2(x_i, x_j) < \varepsilon. \]

Now, since every Cauchy sequence in R is convergent, the sequence (x_n^k )n≥1
converges to some point in R, say x^k . This implies that the sequence (xn )n≥1
converges to x = (x^1 , x^2 , . . . , x^m ) in R^m .
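
The last implication of the proof can be made explicit with one estimate. This is a short sketch of the missing step, using the elementary inequality between the Euclidean norm and the sum of the absolute values of the coordinates:

```latex
\[
  d_2(x_n, x)
  = \Big( \sum_{k=1}^{m} |x_n^k - x^k|^2 \Big)^{1/2}
  \le \sum_{k=1}^{m} |x_n^k - x^k|
  \longrightarrow 0, \qquad \text{as } n \to \infty,
\]
```

since each of the m terms on the right tends to 0 by the coordinate-wise convergence.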

Alternatively, by the above lemma, we can say that the normed vector space
(R^m , ‖·‖2 ) is a Banach space.

Example 2.45. In any discrete metric space, only eventually constant sequences
are Cauchy. Obviously, any eventually constant sequence is convergent. Therefore,
any set with the discrete metric is complete.
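
The first claim of Example 2.45 can be checked directly. A short sketch, assuming the usual definition of the discrete metric (distance 1 between distinct points and 0 otherwise), applies the Cauchy condition with ε = 1/2:

```latex
\[
  d_{\mathrm{disc}}(x_n, x_m) < \tfrac{1}{2}
  \;\Longrightarrow\;
  d_{\mathrm{disc}}(x_n, x_m) = 0
  \;\Longrightarrow\;
  x_n = x_m
  \qquad \text{for all } n, m > N_{1/2}.
\]
```

Hence a Cauchy sequence in a discrete metric space is constant from the index N_{1/2} + 1 onward.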

Recall that for real numbers a and b with a ≤ b, C([a, b]) denotes the set of all
continuous functions f : [a, b] → R. We defined two norms on C([a, b]) denoted by
‖·‖2 and ‖·‖∞ . These induce the metrics d2 and d∞ , respectively. In these metrics,
for f and g in C([a, b]), we have
\[ d_\infty(f, g) = \sup_{t \in [a,b]} |f(t) - g(t)|, \]
and
\[ d_2(f, g) = \Big( \int_a^b |f(t) - g(t)|^2 \, dt \Big)^{1/2}. \]

Proposition 2.39. The metric space (C([a, b]), d2 ) is not complete. Equivalently,
the normed vector space (C([a, b]), ‖·‖2 ) is not a Banach space.

Proof. To simplify the argument, let us assume that a = −1 and b = 1 (one can
adapt the following example to the general case). For n ≥ 1, consider the functions
\[
  \phi_n(t) =
  \begin{cases}
    -1 & \text{if } -1 \le t \le -1/n, \\
    nt & \text{if } -1/n \le t \le 1/n, \\
    1 & \text{if } 1/n \le t \le 1.
  \end{cases}
\]

Figure 2.7: The graphs of three functions in the sequence (φn )n≥1 .

See Figure 2.7 for the graphs of these functions.


Each φn is continuous and hence belongs to C([−1, +1]). We note that for
every m and n in N, we have
\[ \int_{-1}^{1} |\phi_n(t) - \phi_m(t)|^2 \, dt \le \frac{2}{\min(n, m)}. \]
This implies that the sequence (φn )n≥1 is Cauchy in (C([−1, +1]), d2 ).
We claim that the sequence (φn )n≥1 does not converge in (C([−1, +1]), d2 ). To
see this let us consider the function
\[
  \psi(t) =
  \begin{cases}
    -1 & \text{if } t \in [-1, 0), \\
    1 & \text{if } t \in [0, 1].
  \end{cases}
\]

For every n ≥ 1, we have
\[ \int_{-1}^{+1} |\phi_n(t) - \psi(t)|^2 \, dt \le 2 \cdot \frac{1}{n}. \]
Now, assume on the contrary that the sequence (φn )n≥1 converges to some f in
C([−1, +1]). By the triangle inequality for the metric d2 , we have
\[
  \Big( \int_{-1}^{1} |f(t) - \psi(t)|^2 \, dt \Big)^{1/2}
  \le \Big( \int_{-1}^{1} |f(t) - \phi_n(t)|^2 \, dt \Big)^{1/2}
  + \Big( \int_{-1}^{1} |\phi_n(t) - \psi(t)|^2 \, dt \Big)^{1/2}.
\]

By the above properties, the right hand side of the above inequality tends to 0 as
n → ∞. As the left hand side is a non-negative number independent of n, we must
have
\[ \int_{-1}^{1} |f(t) - \psi(t)|^2 \, dt = 0. \]

This implies that
\[
  \int_{-1}^{0} |f(t) - \psi(t)|^2 \, dt = 0
  \quad \text{and} \quad
  \int_{0}^{1} |f(t) - \psi(t)|^2 \, dt = 0.
\]

Figure 2.8: In the uniform convergence, for all n ≥ Nε , the graph of the function fn
lies between f − ε and f + ε.

Then, since f and ψ are continuous on the intervals (−1, 0) and (0, 1), the above
equations imply that f = ψ on (−1, 0) and on (0, 1). Such a map f cannot be
continuous, since its one-sided limits at 0 would be −1 and 1.
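
The Cauchy estimate used in this proof can be checked numerically. The sketch below approximates the integral of |φn − φm|^2 over [−1, 1] by the midpoint rule (the number of steps and the pair n = 2, m = 5 are arbitrary choices for the illustration) and compares it with the bound 2/min(n, m).

```python
def phi(n, t):
    """The piecewise linear function phi_n on [-1, 1] from the proof."""
    if t <= -1 / n:
        return -1.0
    if t >= 1 / n:
        return 1.0
    return n * t

def l2_distance_squared(n, m, steps=20_000):
    """Midpoint-rule approximation of the integral of |phi_n - phi_m|^2 over [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        t = -1.0 + (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += (phi(n, t) - phi(m, t)) ** 2
    return total * h

value = l2_distance_squared(2, 5)
bound = 2 / min(2, 5)
```

A direct computation by hand gives 3/25 for this pair, well below the bound 1; the bound in the proof is deliberately crude, since only its decay in min(n, m) matters.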

Remark 2.10. Just like building the completion of Q to get the set of real numbers,
one can build the completion of the metric space (C([a, b]), d2 ). This results in a
complete metric space of functions, where one can look for solutions to functional
equations. You can learn about this and similar results by taking the optional
module Lebesgue Measure and Integration.

Recall that a sequence of functions fn : [a, b] → R converges point-wise to
f : [a, b] → R, if for every x ∈ [a, b], the sequence of real numbers fn (x) converges
to f (x). That is, for every x ∈ [a, b] and every ε > 0 there exists N_{x,ε} ∈ N such that
for all n ≥ N_{x,ε} we have |fn (x) − f (x)| < ε.
Recall that the sequence fn : [a, b] → R converges uniformly to f : [a, b] → R, if
for all ε > 0 there exists Nε ∈ N such that for all n ≥ Nε and for all x ∈ [a, b] we
have |fn (x) − f (x)| < ε. This is equivalent to
\[ \sup_{x \in [a,b]} |f_n(x) - f(x)| \to 0, \quad \text{as } n \to \infty. \]

Example 2.46. (i) Consider the functions fn : [0, 1] → R defined as fn (x) = x^n ,
for n ≥ 1. The sequence fn converges point-wise to the function
\[
  f(x) =
  \begin{cases}
    0 & \text{if } x \in [0, 1), \\
    1 & \text{if } x = 1.
  \end{cases}
\]

But, for every n ≥ 1, we have
\[ \sup_{x \in [0,1]} |f_n(x) - f(x)| = 1. \]

That is because, as x tends to 1 from the left, we have fn (x) − f (x) = x^n → 1.
Therefore, fn does not converge uniformly to f .
(ii) Consider the sequence of functions fn : [0, 1] → R, defined as

fn (x) = n2 x(1 − x)n , ∀n ≥ 1.

This sequence converges point-wise to f ≡ 0.


To see whether fn converges uniformly to f , we examine the functions fn . Each
fn takes non-negative values, with fn (0) = 0 and fn (1) = 0. Also, each fn is
differentiable on (0, 1), so it takes its maximum at a point where the derivative of
fn becomes 0. By calculation, we see that fn′ (1/(n + 1)) = 0, and

\[ f_n\left(\frac{1}{n+1}\right) = \frac{n^2}{n+1} \left(\frac{n}{n+1}\right)^n. \]

Therefore,

\[ \sup_{t \in [0,1]} |f_n(t) - f(t)| = \frac{n^2}{n+1} \left(\frac{n}{n+1}\right)^n \to \infty, \quad \text{as } n \to \infty. \]

This implies that fn does not converge uniformly to f .
(iii): Consider the sequence of functions fn : [0, 1] → R defined as
fn (x) = x e^{−n x^2}. The sequence (fn )n≥1 converges uniformly (and hence
point-wise) to f ≡ 0. That is because,

\[ \sup_{x \in [0,1]} x e^{-n x^2} \to 0, \quad \text{as } n \to \infty. \]
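The three behaviours in Example 2.46 can also be observed numerically. The following Python sketch is only an illustration and is not part of the original notes: the helper sup_on_grid and the grid size are our own choices. Sampling on a finite grid gives a lower bound for the true supremum; in part (i) the supremum 1 is not attained, so the computed value stays slightly below 1. For part (iii), elementary calculus gives the maximiser x = 1/√(2n), so the supremum equals 1/√(2en).

```python
import math

def sup_on_grid(g, a=0.0, b=1.0, points=20001):
    """Approximate sup |g| over [a, b] by sampling a uniform grid (a lower bound)."""
    return max(abs(g(a + (b - a) * k / (points - 1))) for k in range(points))

def diff_i(n):
    # Example 2.46 (i): f_n(x) - f(x), where f is 0 on [0, 1) and 1 at x = 1.
    return lambda x: x**n - (1.0 if x == 1.0 else 0.0)

def diff_ii(n):
    # Example 2.46 (ii): f_n(x) = n^2 x (1 - x)^n, with point-wise limit 0.
    return lambda x: n**2 * x * (1 - x)**n

def diff_iii(n):
    # Example 2.46 (iii): f_n(x) = x exp(-n x^2), with limit 0.
    return lambda x: x * math.exp(-n * x**2)

for n in (10, 100, 1000):
    print(n, sup_on_grid(diff_i(n)), sup_on_grid(diff_ii(n)), sup_on_grid(diff_iii(n)))
```

In the printed table, the first column of suprema should stay close to 1, the second should grow with n, and the third should shrink towards 0, in line with the three parts of the example.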

It is likely that you have seen the following theorem in Analysis I.

Theorem 2.40. Assume that (fn : [a, b] → R)n≥1 is a sequence of continuous
functions which converges uniformly to f : [a, b] → R. Then, f : [a, b] → R is
continuous.

Proof. Fix an arbitrary c ∈ [a, b]. In order to prove that f is continuous at c, let us
also fix an arbitrary ε > 0.
Because the sequence (fn )n≥1 converges uniformly to f , there is Nε ∈ N, such
that for all n ≥ Nε , and all x ∈ [a, b] we have |fn (x) − f (x)| < ε/3.
Now, fix an arbitrary n ≥ Nε . Since fn is continuous at c, there is δ > 0 such
that for all x ∈ Bδ (c) ∩ [a, b], we have |fn (x) − fn (c)| < ε/3.
By the above inequalities, and the triangle inequality for the modulus function,
for all x ∈ Bδ (c) ∩ [a, b], we have

|f (x) − f (c)| ≤ |f (x) − fn (x)| + |fn (x) − fn (c)| + |fn (c) − f (c)|
< ε/3 + ε/3 + ε/3 = ε.

As ε > 0 was arbitrary, this shows that f is continuous at c. As c ∈ [a, b] was
arbitrary, we conclude that f is continuous on [a, b].


Theorem 2.41. The metric space (C([a, b]), d∞ ) is complete. Equivalently, the
normed vector space (C([a, b]), ‖·‖∞ ) is a Banach space.

Proof. Let (φn )n≥1 be a Cauchy sequence in (C([a, b]), d∞ ). By definition, for every
ε > 0 there exists Nε ∈ N such that for all x ∈ [a, b] and all m and n bigger than Nε
we have |φn (x) − φm (x)| < ε.
Now, fix an arbitrary x ∈ [a, b]. By the above paragraph, the sequence of real
numbers (φn (x))n≥1 is a Cauchy sequence in (R, d1 ). Then, by the completeness
of the set of real numbers, the sequence of real numbers (φn (x))n≥1 converges to a
(unique) real number, which we denote by lx . As x in [a, b] was arbitrary, for each
x ∈ [a, b], we obtain a real number lx .
Let us define the function φ : [a, b] → R as φ(x) = lx . We claim that φn
converges uniformly to φ on [a, b]. To see this, fix an arbitrary ε > 0. Since (φn )n≥1
is a Cauchy sequence in (C([a, b]), d∞ ), (for ε/2 > 0) there exists Mε ∈ N such that
for all x ∈ [a, b] and all m and n bigger than Mε we have

|φn (x) − φm (x)| < ε/2.

Taking limit as m → ∞, the above inequality implies that

|φn (x) − φ(x)| ≤ ε/2 < ε.

Thus, for all x ∈ [a, b] and all n bigger than Mε , we have

|φn (x) − φ(x)| < ε.

As ε > 0 was arbitrary, we conclude that (φn )n≥1 converges uniformly to φ. By
Theorem 2.40, φ : [a, b] → R is continuous. Therefore, any Cauchy sequence in
(C([a, b]), d∞ ) converges to an element of C([a, b]).

Theorem 2.42. If (X, d) is a compact metric space, then (X, d) is complete.

Proof. Let (xn )n≥1 be a Cauchy sequence in (X, d). By Theorem 2.30, (X, d) is
sequentially compact. Thus, there exists a subsequence (xnk )k≥1 which converges
to some x ∈ X. By Exercise 2.40, xn converges to x in (X, d).

2.4.2 Arzelà-Ascoli
There is an important corollary of the completeness of (C([a, b]), d∞ ), which we
present in this section.

Definition 2.32. Let C be a collection of functions f : [a, b] → R.

(i) We say that the collection C is uniformly bounded, if there exists M such
that for all f ∈ C and all x ∈ [a, b] we have |f (x)| < M .


(ii) We say that the collection C is uniformly equi-continuous, if for all ε > 0
there exists δ > 0 such that for all f ∈ C, and all x1 and x2 in [a, b] satisfying
|x1 − x2 | < δ, we have |f (x1 ) − f (x2 )| < ε.

Note that in the second part of the above definition, the number δ does not
depend on the function f , but only on the collection C.
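To get a feeling for the role of δ in Definition 2.32, one can estimate, for a fixed δ, the largest value of |f (x1 ) − f (x2 )| over |x1 − x2 | < δ and over several members of a collection. The Python sketch below is an illustration under our own assumptions: the helper names, the grid, and the two sample families (the shifts x ↦ sin(x + c), and the powers x ↦ x^n from Example 2.46 (i)) are not taken from the notes, and only finitely many members of each family are sampled.

```python
import math

def oscillation(f, delta, a=0.0, b=1.0, points=1001):
    """Largest |f(x) - f(y)| over grid points x < y in [a, b] with y - x < delta."""
    xs = [a + (b - a) * k / (points - 1) for k in range(points)]
    worst = 0.0
    for i, x in enumerate(xs):
        j = i + 1
        while j < points and xs[j] - x < delta:
            worst = max(worst, abs(f(xs[j]) - f(x)))
            j += 1
    return worst

def family_oscillation(family, delta):
    """Worst oscillation over a finite sample of the collection."""
    return max(oscillation(f, delta) for f in family)

# Hypothetical sample of a family with a common Lipschitz constant 1.
sines = [lambda x, c=c: math.sin(x + c) for c in range(10)]
# Sample of the powers x^n on [0, 1]; each member is continuous and bounded by 1.
powers = [lambda x, n=n: x**n for n in (1, 10, 100, 1000)]

for delta in (0.1, 0.01):
    print(delta, family_oscillation(sines, delta), family_oscillation(powers, delta))
```

For the shifted sines the estimate is at most δ, so a single δ works for every member. For the powers the estimate stays close to 1 however small δ is taken, once n is large; this suggests that the family is uniformly bounded but not uniformly equi-continuous.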

Exercise 2.41. Let C be a collection of functions f : [a, b] → R. Assume that there
is K > 0 such that for all f ∈ C and all x and y in [a, b], we have

|f (x) − f (y)| ≤ K|x − y|.

Show that the family C is uniformly equi-continuous.

Theorem 2.43 (Arzelà-Ascoli). Assume that C is a collection of continuous
functions f : [a, b] → R. If C is uniformly bounded and uniformly equi-continuous,
then every sequence in C has a sub-sequence which converges in (C([a, b]), d∞ ).

Proof. Let us fix an arbitrary sequence (fi )i≥0 in C. We need to show that there
is a sub-sequence of this sequence which converges to some continuous function
f : [a, b] → R with respect to the metric d∞ . We break the proof into several steps:
Step 1. The sequence (fi )i≥0 has a sub-sequence (gi )i≥0 which converges
point-wise on [a, b] ∩ Q.
Proof of Step 1: Note that the set [a, b] ∩ Q is countable. This means that we
may write [a, b] ∩ Q = {x1 , x2 , . . .}.
Let us denote the function fi by the notation f0,i , that is, for all i ∈ N and for
all x ∈ [a, b], we have f0,i (x) = fi (x).
Now consider the sequence of numbers (f0,i (x1 ))i≥0 . Since C is uniformly bounded,
this is a bounded sequence of real numbers. By Bolzano–Weierstrass, this sequence
has a convergent subsequence, say (f1,i (x1 ))i≥0 . Now let us consider (f1,i (x2 ))i≥0 ,
which again is a bounded sequence of real numbers, with a convergent subsequence
(f2,i (x2 ))i≥0 . Thus, (f2,i )i≥0 is a subsequence of (f1,i )i≥0 such that f2,i (x1 ) and
f2,i (x2 ) both converge. We can repeat this process of extracting subsequences to
obtain functions fk,i for k, i ∈ N with the property that (fk+1,i )i≥0 is a subsequence
of (fk,i )i≥0 , and moreover for all 1 ≤ l ≤ k, the sequence (fk,i (xl ))i≥0 converges.
Let us define the sequence of functions gi = fi,i , for i ∈ N. Each gi is defined on
[a, b]. To illustrate the above process, one may think of fi,i as the diagonal of the
array
f0,0 f0,1 f0,2 f0,3 ...
f1,0 f1,1 f1,2 f1,3 ...
f2,0 f2,1 f2,2 f2,3 ...
f3,0 f3,1 f3,2 f3,3 ...


Clearly, (gi )i≥0 is a subsequence of (fi )i≥0 . Moreover, for every l ≥ 1, the tail
(gi )i≥l is a subsequence of (fl,i )i≥0 , and hence the sequence gi (xl ) converges.
Let us define g(xl ) = limi→∞ gi (xl ).

Step 2. The sequence of functions gi : [a, b] → R, for i ≥ 0, is Cauchy with
respect to the metric d∞ .
Proof of Step 2: Let us fix an arbitrary ε > 0. Since C is uniformly
equi-continuous, we may find δ > 0 such that for all x and y in [a, b] and all i ∈ N
we have

|x − y| < δ ⟹ |gi (x) − gi (y)| < ε/3.

Since [a, b] is bounded and the rational numbers are dense in R, there are finitely
many rational numbers x1 , . . . , xk in [a, b] such that

\[ [a, b] \subset \bigcup_{m=1}^{k} (x_m - \delta, x_m + \delta). \]

Since gi converges at each rational point, for each m = 1, . . . , k there exists Nm
such that for all i, j ≥ Nm we have

|gi (xm ) − gj (xm )| < ε/3.

Let N = max{N1 , . . . , Nk }, and suppose i, j ≥ N . Fix x ∈ [a, b]. By construction,
there is m ∈ {1, . . . , k} such that |x − xm | < δ. We have

|gi (x) − gj (x)| = |gi (x) − gi (xm ) + gi (xm ) − gj (xm ) + gj (xm ) − gj (x)|
≤ |gi (x) − gi (xm )| + |gi (xm ) − gj (xm )| + |gj (xm ) − gj (x)|
< ε/3 + ε/3 + ε/3 = ε.

Step 3. The sequence (gi )i≥0 converges in (C([a, b]), d∞ ).
Proof of Step 3: By Step 2, (gi )i≥0 is a Cauchy sequence in (C([a, b]), d∞ ). Then,
by Theorem 2.41, (gi )i≥0 converges to some g in the metric space (C([a, b]), d∞ ).

2.4.3 Fixed point Theorem


Definition 2.33. Let (X1 , d1 ) and (X2 , d2 ) be metric spaces, and f : X1 → X2 .
We say that f is contracting, if there exists K ∈ (0, 1) such that for all a and b in
X1 we have
d2 (f (a), f (b)) ≤ K · d1 (a, b).

It is easy to see that every contracting map is continuous.


For a map f : X → X, we say that x ∈ X is a fixed point of f , if f (x) = x.

Theorem 2.44 (Banach fixed point Theorem). Let (X, d) be a non-empty complete
metric space, and f : X → X be a contracting map. Then, f has a unique fixed
point in X.


Proof. Let x0 ∈ X be an arbitrary point. Let us define the sequence of points
(xn )n≥0 according to xn+1 = f (xn ), for n ≥ 0.
Since f is contracting, there is K ∈ (0, 1) such that for all a and b in X we have
d(f (a), f (b)) ≤ K · d(a, b). Then, for every integer j ≥ 1, we have

\[ d(x_{j+1}, x_j) = d(f(x_j), f(x_{j-1})) \le K\, d(x_j, x_{j-1}) \le \dots \le K^j\, d(x_1, x_0). \]

Therefore, for integers m > n, we have

\[ \begin{aligned}
d(x_m, x_n) &\le d(x_m, x_{m-1}) + \dots + d(x_{n+1}, x_n) \\
&\le (K^{m-1} + K^{m-2} + \dots + K^n)\, d(x_1, x_0) \\
&\le \frac{K^n}{1-K}\, d(x_1, x_0).
\end{aligned} \]
Because K ∈ (0, 1), the last expression in the above inequality converges to 0 as
n → ∞. This implies that the sequence (xn )n≥0 is Cauchy in (X, d).
Since (X, d) is complete, the sequence (xn )n≥0 converges to some x in X. As f
is continuous, f (xn ) → f (x), as n → ∞. But f (xn ) = xn+1 → x, as n → ∞. By
the uniqueness of the limits of convergent sequences in metric spaces, we must have
x = f (x).
The above argument shows that f has a fixed point. To show the uniqueness of
the fixed point, assume that there is y ∈ X such that f (y) = y. By the contraction
property of f , we have

d(x, y) = d(f (x), f (y)) ≤ K d(x, y).

Since K < 1, we must have d(x, y) = 0, and hence x = y.
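The proof of Theorem 2.44 is constructive: iterating f from any starting point produces the fixed point. Moreover, letting m → ∞ in the estimate for d(xm , xn ) gives the error bound d(xn , x) ≤ K^n d(x1 , x0 )/(1 − K). The Python sketch below illustrates this on an example of our own choosing, which is not taken from the notes: the map cos on the complete space [0, 1]. It maps [0, 1] into itself and, by the mean value theorem, is contracting with K = sin(1) < 1. The limit is approximated by a late iterate.

```python
import math

def fixed_point_iterates(f, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = f(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

# Assumption for this illustration: f = cos on [0, 1], contraction constant sin(1).
K = math.sin(1.0)
xs = fixed_point_iterates(math.cos, 1.0, 60)
x_star = xs[-1]              # a late iterate, used as a stand-in for the limit x
d10 = abs(xs[1] - xs[0])     # d(x_1, x_0)

for n in (5, 10, 20):
    bound = K**n / (1 - K) * d10
    print(n, abs(xs[n] - x_star), bound)
```

In each printed row the observed error should lie below the bound from the proof. The bound is usually pessimistic, because K is a worst-case constant over the whole space.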



Exercise 2.42. Let x1 = 2, and define the sequence (xn )n≥1 according to

\[ x_{n+1} = \sqrt{2 + \sqrt{x_n}}. \]

Show that the sequence (xn )n≥1 converges to a root of the equation

\[ x^4 - 4x^2 - x + 4 = 0 \]

which lies in the interval [√3, 2].

Exercise 2.43. Consider the map f : (0, 1/3) → (0, 1/3), defined as f (x) = x².
Show that the map f is a contraction with respect to the Euclidean metric d1 . But,
f has no fixed point in (0, 1/3).

Exercise 2.44. Consider the map f : [1, ∞) → [1, ∞) defined as f (x) = x + 1/x.
Show that ([1, +∞), d1 ) is a complete metric space, and for all x and y in [1, ∞) we
have
d1 (f (x), f (y)) ≤ d1 (x, y).
But, f has no fixed point.


2.5 Connectedness
The intermediate value theorem is one of the main features in real analysis with
many applications. If f : [a, b] → R is a continuous function, and there are α and
β in [a, b] such that f (α) < 0 and f (β) > 0, then there must be γ between α and
β such that f (γ) = 0. What is it about the domain [a, b], or the range R, or the
continuity of the map f which makes this theorem work? Is there any way to extend
this useful statement to more general settings? Answering these questions is the
purpose of this section, and we will see that there is indeed a natural way to extend
this property to more general settings.

2.5.1 Connected sets


Definition 2.34. Let (X, d) be a metric space, and consider a subset T ⊆ X. We
say that T is disconnected, if there are open sets U and V in X satisfying the
following properties:

(i) U ∩ V = ∅,

(ii) T ⊆ U ∪ V ,

(iii) T ∩ U ≠ ∅ and T ∩ V ≠ ∅.

In particular, X is disconnected, if there are two open sets in X which are
non-empty, disjoint, and their union is equal to X.

Intuitively, the above definition suggests that T is disconnected, if it can be
separated into more than one piece using open sets. The separate pieces are T ∩ U
and T ∩ V .

Example 2.47. Consider the set R² with the Euclidean metric d2 . Let

T = {(x, y) ∈ R² | x ∈ [−1, 1], y = −1} ∪ {(x, y) ∈ R² | x ∈ [−1, 1], y = 1}.

That is, T consists of two horizontal line segments in the plane. Intuitively, we see
T as having more than one piece. Indeed, T is disconnected. For example, let

U = {(x, y) ∈ R² | x ∈ (−2, 2), y ∈ (−5/4, −3/4)},

V = {(x, y) ∈ R² | x ∈ (−2, 2), y ∈ (3/4, 5/4)}.

The sets U and V are open in R², U ∩ V = ∅,

U ∩ T = [−1, 1] × {−1} ≠ ∅, V ∩ T = [−1, 1] × {1} ≠ ∅.

We also have T ⊆ U ∪ V .

Lecture notes for the week 11



Note that being disconnected does not only depend on the set T , but it crucially
depends on the metric d on X. We illustrate this by the following example.

Example 2.48. Let (X, ddisc ) be a discrete metric space, and assume that X has
at least two elements. Then, X is disconnected. To see this, recall that in the
discrete topology, any subset of X is open. Let x ∈ X be an arbitrary element.
Define U = {x} and V = X \ {x}. Then, since X has at least two points, V
must be non-empty. Then, U and V satisfy the three properties in the definition of
disconnectedness.

Definition 2.35. Let (X, d) be a metric space, and let T ⊂ X be an arbitrary
subset. We say that T is connected, if T is not disconnected. Equivalently, T is
connected, if for every pair of open sets U and V in X satisfying U ∩ V = ∅ and
T ⊆ U ∪ V , we must have either U ∩ T = ∅ or T ∩ V = ∅.
In particular, the whole set X is connected, if for every pair of open sets U and
V satisfying U ∪ V = X and U ∩ V = ∅, we must have either U = ∅ or V = ∅.

Exercise 2.45. Let (X, d) be a metric space. Show that X is connected if and only
if the only subsets of X which are both open and closed are X and ∅.

Example 2.49. Consider the set of real numbers with the Euclidean metric, and
let a ∈ R. Then the set R \ {a} is not connected (disconnected).
Let U = (−∞, a) and V = (a, +∞). Clearly, U and V are open, non-empty,
disjoint, and their union covers R \ {a}.

Exercise 2.46. Show that in the Euclidean metric space (R1 , d1 ), the set of rational
numbers Q is disconnected.

Lemma 2.45. Let (X, d) be a metric space, and T ⊆ X. Then, T is disconnected
if and only if there exists a continuous map f : T → R satisfying f (T ) = {0, 1}.

Proof. First assume that such a map f exists. Let U = f⁻¹({0}) and V = f⁻¹({1}).
Since f (T ) = {0, 1}, U ≠ ∅ and V ≠ ∅. Also, since f is continuous,
U = f⁻¹({0}) = f⁻¹((−1/2, 1/2)) and V = f⁻¹({1}) = f⁻¹((1/2, 3/2)) are open sets.
Moreover, as f (T ) = {0, 1}, T ⊆ U ∪ V . Obviously, U ∩ V = ∅. These imply that T
is disconnected.
Now assume that T is disconnected. By definition, there are non-empty, disjoint,
open sets U and V in X such that T ⊆ U ∪ V , U ∩ T ≠ ∅ and V ∩ T ≠ ∅. Let us
define the map f : T → R as

\[ f(x) = \begin{cases} 0 & \text{if } x \in U \cap T, \\ 1 & \text{if } x \in V \cap T. \end{cases} \]

Since (U ∩ T ) ∩ (V ∩ T ) = ∅, the above conditions make sense, and since T ⊂ U ∪ V ,
the map f is defined on T . We need to show that f is continuous on T .


Let x be an arbitrary point in T and let (xn )n≥1 be a sequence in T which
converges to x. Since T ⊂ U ∪ V and U ∩ V = ∅, x belongs to exactly one of U and V .
Without loss of generality, assume that x ∈ U . Since U is open, by the definition
of convergence of sequences, there is N ∈ N such that for all n ≥ N , we have xn ∈ U .
Thus, for all n ≥ N , f (xn ) = 0. This implies that the sequence (f (xn ))n∈N converges
to 0 = f (x). Therefore, f is continuous at x. Since x was arbitrary in T , we conclude
that f is continuous on T .

It is easier to show that a set is disconnected than to show that it is connected. In
the former case, it is enough to find one example of two open sets with those
properties. But in the latter case, one needs to show that no such pair exists. Of
course, that becomes a difficult task if there are too many open sets in the metric
space. You can see this below, as we try to prove that the interval [a, b] is connected.
By an interval in R we mean any of the sets (a, b), (a, b], [a, b), [a, b], (−∞, +∞),
(−∞, b), (−∞, b], (a, +∞), or [a, +∞), for some a and b in R.

Lemma 2.46. Let S ⊆ R be a non-empty set. Then, S is an interval if and only if
for all x and y in S and all z ∈ R satisfying x < z < y we have z ∈ S.

Proof. If S is an interval, then by the definition of an interval, the condition in the
lemma holds.
Now assume that the condition in the lemma holds. If S is not bounded from
above, we define b = +∞, and if S is bounded from above, we define b = sup S.
Similarly, if S is not bounded from below, we define a = −∞, and if S is bounded
from below, we let a = inf S.
Let us first show that the open interval (a, b) ⊆ S. To see this, fix an arbitrary
z ∈ (a, b). Since z < b, z cannot be an upper bound for S (otherwise, sup S ≤ z).
Therefore, there is b′ ∈ S such that b′ > z. Similarly, since z > a, z cannot be
a lower bound for S (otherwise inf S ≥ z). Therefore, there is a′ ∈ S such that
a′ < z. Combining these together, we have a′ < z < b′ , a′ ∈ S, and b′ ∈ S. By the
assumption on S, we must have z ∈ S. Because z ∈ (a, b) was arbitrary, we conclude
that (a, b) ⊆ S.
Note that the supremum and infimum of a set do not have to be in the set itself.
There are several possibilities for the set S depending on whether each of a and b
belongs to S or not (of course, if a = −∞ or b = +∞, they cannot be in S). Then,

\[ S = \begin{cases}
[a, b] & \text{if } a \in S \text{ and } b \in S, \\
[a, b) & \text{if } a \in S \text{ and } b \notin S, \\
(a, b] & \text{if } a \notin S \text{ and } b \in S, \\
(a, b) & \text{if } a \notin S \text{ and } b \notin S.
\end{cases} \]


Theorem 2.47. Consider the Euclidean metric space (R, d1 ) and let S ⊆ R. If S
is connected, then S is an interval.

Proof. Suppose S is connected, but it is not an interval. By Lemma 2.46, there exist
x and y in S and z ∈ R such that x < z < y and z ∉ S.
Consider the sets U = (−∞, z) and V = (z, +∞). Then, the sets U and V
are open in R, U ∩ V = ∅, S ⊆ U ∪ V , and U ∩ S ≠ ∅ (since it contains x) and
V ∩ S ≠ ∅ (since it contains y). These show that U and V disconnect S, which is a
contradiction.

Theorem 2.48. For every a and b in R with a < b, the interval [a, b] is connected
in the metric space (R, d1 ).

Proof. Let us assume that [a, b] is disconnected. Then, there must be open sets U
and V in R such that

U ∩ [a, b] ≠ ∅, V ∩ [a, b] ≠ ∅, [a, b] ⊂ U ∪ V, U ∩ V = ∅.

Since a ∈ U ∪ V , we must have either a ∈ U or a ∈ V . By relabelling U and V if
necessary, we may assume that a ∈ U . Consider the set

I = {s ∈ [a, b] | [a, s] ⊆ U }.

As a ∈ I, the set I is not empty, and since I ⊆ [a, b], I is bounded from above.
Therefore, I has a supremum, which we denote by t. Note that t ∈ [a, b], and t may
or may not be in I. We consider three cases below.
(I) Assume that t ∈ I and t = b. These imply that [a, b] ⊂ U , which is a
contradiction, since [a, b] ∩ V ≠ ∅ and U ∩ V = ∅.
(II) Assume that t ∉ I. This implies that t ∉ U , t ≠ a and [a, t) ⊂ U . As
t ∈ [a, b] and [a, b] ⊂ U ∪ V , we must have t ∈ V . Now, since V is an open set
in R, there is δ > 0 such that (t − δ, t + δ) ⊂ V . As U ∩ V = ∅, we must have
(t − δ, t + δ) ∩ U = ∅. This contradicts [a, t) ⊂ U .
(III) Assume that t ≠ b. We either have t ∈ U or t ∈ V . If t ∈ U , by the openness
of U , there is δ′ > 0 such that (t − δ′ , t + δ′ ) ⊂ U . This contradicts t = sup I. If
t ∈ V , by the openness of V , there is δ′′ > 0 such that (t − δ′′ , t + δ′′ ) ⊂ V . Thus
(t − δ′′ , t + δ′′ ) ∩ U = ∅. This contradicts t = sup I.

Exercise 2.47.* Consider the Euclidean metric space (R, d1 ), and assume that a
and b are real numbers with a < b.

(i) Show that the interval [a, b) is connected.

(ii) Show that the interval (a, b] is connected.

(iii) Show that the interval (a, b) is connected.


2.5.2 Continuous maps and connected sets


Theorem 2.49. Let (A1 , d1 ) and (A2 , d2 ) be metric spaces, and f : A1 → A2 be a
continuous map. If S ⊂ A1 is connected, then f (S) is connected.

Proof. Let us assume, to the contrary, that f (S) is not connected. Then, there are
open sets U and V in A2 such that

U ∩ V = ∅, f (S) ⊂ U ∪ V, f (S) ∩ U ≠ ∅, f (S) ∩ V ≠ ∅.

Since f is continuous, the sets U′ = f⁻¹(U ) and V′ = f⁻¹(V ) are open in A1 .
Moreover, we have

U′ ∩ V′ = ∅, S ⊂ U′ ∪ V′, S ∩ U′ ≠ ∅, S ∩ V′ ≠ ∅.

These show that S is not connected in (A1 , d1 ), which is a contradiction.

Corollary 2.50. Assume that f : (X, dX ) → (Y, dY ) is a homeomorphism. Then
X is connected if and only if Y is connected.

Theorem 2.51. Let (X, d) be a connected metric space, and let f : X → R be
a continuous map. Assume that there are a and b in X satisfying f (a) < 0 and
f (b) > 0. Then, there is c ∈ X such that f (c) = 0.

Proof. Assume, to the contrary, that there is no c ∈ X satisfying f (c) = 0. Consider
the sets
U = f⁻¹((−∞, 0)), V = f⁻¹((0, +∞)).

These are subsets of X. As f is continuous, and the sets (−∞, 0) and (0, +∞) are
open in R, the sets U and V are open in (X, d). Obviously, U ∩ V = ∅. Moreover,
U ≠ ∅ since a ∈ U , and V ≠ ∅ since b ∈ V . Also, since there is no c ∈ X satisfying
f (c) = 0, U ∪ V = X. These show that X is disconnected, contradicting the
hypothesis in the theorem.

The connectedness of the domain X is a necessary condition for the intermediate
value theorem for arbitrary metric spaces. To see that, assume that X is a
disconnected metric space. By Lemma 2.45 there is a continuous and surjective
map f : X → {0, 1}. Consider the map f − 1/2, which takes both values +1/2 and
−1/2, but does not take the value 0 at any point in X.

Corollary 2.52. Let f : [a, b] → R be a continuous map, and assume that there are
x and y in [a, b] satisfying f (x) < 0 and f (y) > 0. Then, there is z ∈ [a, b] such
that f (z) = 0.

Proof. This immediately follows from Theorem 2.51 and Theorem 2.48.
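Corollary 2.52 only asserts that a zero exists. The classical bisection method turns the same sign condition into a procedure for locating one: at each step, the sign of f at the midpoint tells us in which half the hypothesis of the corollary still holds. The Python sketch below is an illustration only; the function name bisect_root, the tolerance, and the sample function x³ − 2 are our own choices and do not come from the notes.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on [a, b], assuming f(a) < 0 < f(b)."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 < f(b)")
    while b - a > tol:
        mid = (a + b) / 2
        value = f(mid)
        if value == 0:
            return mid
        if value < 0:
            a = mid   # the sign change now lies in [mid, b]
        else:
            b = mid   # the sign change now lies in [a, mid]
    return (a + b) / 2

# Sample use: f(x) = x^3 - 2 satisfies f(0) < 0 < f(2).
root = bisect_root(lambda x: x**3 - 2, 0.0, 2.0)
print(root, root**3)
```

The loop keeps the invariant f (a) < 0 < f (b) while halving the interval, so the corollary applies to every intermediate interval and a zero is trapped in an interval of length at most tol.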


Example 2.50. The intervals (0, 1) and (0, 1] are not homeomorphic. Assume, to the
contrary, that there is a homeomorphism f : (0, 1) → (0, 1]. Let x = f⁻¹(1) ∈ (0, 1).
Then, the restriction f : (0, 1) \ {x} → (0, 1) is a homeomorphism. This contradicts
Corollary 2.50, since (0, 1) is connected but (0, 1) \ {x} is not connected.
By a similar argument, one can show that the pair of intervals (0, 1) and [0, 1],
as well as the pair of intervals (0, 1] and [0, 1], are not homeomorphic.

2.5.3 Path connected sets


We already mentioned that in general it is easier to show that a set is disconnected
than to show that it is connected. In this section we aim to provide a constructive
criterion to show that a set is connected.

Definition 2.36. Consider a metric space (X, d). Given a pair of points a and b in
X, a path from a to b in X is a continuous map f : [0, 1] → X such that f (0) = a
and f (1) = b. This is also called a path joining a and b.

Remark 2.11. In the above definition, the closed interval [0, 1] can be replaced by
any closed interval [α, β].

Definition 2.37. A metric space (X, d) is called path-connected, if for any pair
of points a and b in X there is a path from a to b in X.

Exercise 2.48. Show that the following metric spaces are path connected.

(i) the Euclidean space Rn , for any n ≥ 1,

(ii) the open ball B1 (0) in (Rn , d2 ), for any n ≥ 2,

(iii) the annulus {(x, y) ∈ R² | 1 ≤ ‖(x, y)‖ ≤ 2}.

Theorem 2.53. If a metric space (X, d) is path connected, then it is connected.

Proof. Let us assume that there is a metric space (X, d) which is path connected,
but not connected. By Lemma 2.45, there is a continuous map f : X → R satisfying
f (X) = {0, 1}. Then, there exist x and y in X such that f (x) = 0 and f (y) = 1.
Because X is path connected, there is a continuous map g : [0, 1] → X satisfying
g(0) = x and g(1) = y. Then, f ◦ g : [0, 1] → R is continuous, and its image is equal
to {0, 1}.
Let us consider the map (f ◦ g) − 1/2 on the interval [0, 1]. It takes both values
−1/2 and +1/2, but it does not take the value 0. However, by Corollary 2.52, this
map must take the value 0 at some point in [0, 1], which is a contradiction.

By the above theorem, the sets in Exercise 2.48 are connected. In the same fash-
ion, one can show that the cube [0, 1]n is connected in Rn . Compare this argument
with how difficult it is to show that the interval [0, 1] is connected.
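For the cube [0, 1]^n, a natural candidate for a path from a to b is the straight line segment t ↦ (1 − t)a + tb: each coordinate is a convex combination of two numbers in [0, 1], so it stays in [0, 1]. The Python sketch below spot-checks this for one pair of points in [0, 1]³. The helper names and the sample points are our own, and a finite sample is of course an illustration and not a proof.

```python
def segment(a, b):
    """Return the straight-line path t -> (1 - t) a + t b, for t in [0, 1]."""
    def path(t):
        return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))
    return path

def in_unit_cube(p):
    """Check that every coordinate of p lies in [0, 1]."""
    return all(0.0 <= c <= 1.0 for c in p)

# Two sample points of the cube [0, 1]^3 (hypothetical choices).
a = (0.0, 1.0, 0.25)
b = (1.0, 0.0, 0.75)
path = segment(a, b)

samples = [path(k / 100) for k in range(101)]
print(path(0.0) == a, path(1.0) == b, all(in_unit_cube(p) for p in samples))
```

Each coordinate of the path is a polynomial of degree at most one in t, which is why the map is continuous in the sense of Definition 2.36.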


Exercise 2.49. Consider the set of all continuous functions f : [0, 1] → R, that is
C([0, 1]), with the metric d1 .

(i) Show that the space (C([0, 1]), d1 ) is path connected.

(ii) Conclude that the space (C([0, 1]), d1 ) is connected.

Exercise 2.50.* In this exercise, we aim to show that the converse of Theorem 2.53
is not true.
Consider the following subset of R2 :

A = {(x, sin(1/x)) ∈ R2 | x > 0} ∪ {(x, y) ∈ R2 | x = 0, y ∈ [−1, +1]}.

That is, A is the union of the oscillating curve which is the graph of sin(1/x), and
the vertical line segment {0} × [−1, +1].

(i) Show that the set A is connected.

(ii) Show that the set A is not path connected.

Theorem 2.54. Assume that f : R → R is a continuous map with respect to the
Euclidean metrics on the domain and the range. For any interval [a, b], f ([a, b]) is
an interval of the form [m, M ], for some real numbers m and M .

Proof. By Theorem 2.48, the interval [a, b] is connected in R. Since the image of
any connected set by a continuous map is connected (see Theorem 2.49), f ([a, b]) is
connected. Then, by Theorem 2.47, f ([a, b]) must be an interval. By the definition
of interval, f ([a, b]) is equal to one of the sets [m, M ], (m, M ], [m, M ), or (m, M ),
for some m ∈ R ∪ {−∞} and M ∈ R ∪ {+∞} with m ≤ M .
By Theorem 2.37, f ([a, b]) is compact. Thus, m and M are finite numbers and
f ([a, b]) = [m, M ].
