Linear Algebra Fundamentals and Applications

The document is a textbook on Linear Algebra authored by Jin Ho Kwak and Sungpyo Hong, aimed at undergraduate students in science and engineering. It covers fundamental concepts such as vector spaces, linear transformations, and matrices, while emphasizing computational skills and practical applications across various disciplines. The book includes numerous examples, exercises, and applications to enhance understanding and problem-solving abilities in linear algebra.


JIN HO KWAK

SUNGPYO HONG

Linear Algebra

SPRINGER SCIENCE+BUSINESS MEDIA, LLC


Jin Ho Kwak
Sungpyo Hong
Department of Mathematics
Pohang University of Science and Technology
Pohang, The Republic of Korea

Library of Congress Cataloging-in-Publication Data


Kwak, Jin Ho, 1948-
Linear Algebra / Jin Ho Kwak, Sungpyo Hong.
p. cm.
Includes index.
ISBN 978-1-4757-1202-5 ISBN 978-1-4757-1200-1 (eBook)
DOI 10.1007/978-1-4757-1200-1
1. Algebras, Linear. I. Hong, Sungpyo, 1948- II. Title.
QA184.K94 1997
512'.5--dc21 97-9062
CIP

Printed on acid-free paper


© 1997 Springer Science+Business Media New York
Originally published by Birkhäuser Boston 1997
Softcover reprint of the hardcover 1st edition 1997

Copyright is not claimed for works of U.S. Government employees.


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording,
or otherwise, without prior permission of the copyright owner.

Permission to photocopy for internal or personal use of specific clients is granted by
Springer Science+Business Media, LLC, for libraries and other users registered with the
Copyright Clearance Center (CCC), provided that the base fee of $6.00 per copy, plus $0.20
per page is paid directly to CCC, 222 Rosewood Drive, Danvers, MA 01923, U.S.A.
Special requests should be addressed directly to Springer Science+Business Media, LLC.

ISBN 978-1-4757-1202-5
Typesetting by the authors in LaTeX

9 8 7 6 5 4 3 2 1
Preface

Linear algebra is one of the most important subjects in the study of science
and engineering because of its widespread applications in social or natural
science, computer science, physics, or economics. As one of the most useful
courses in undergraduate mathematics, it has provided essential tools for
industrial scientists. The basic concepts of linear algebra are vector spaces,
linear transformations, matrices and determinants, and they serve as an
abstract language for stating ideas and solving problems.
This book is based on lectures delivered over several years in a sophomore-level
linear algebra course designed for science and engineering students. The
primary purpose of this book is to give a careful presentation of the basic
concepts of linear algebra as a coherent part of mathematics, and to illustrate
its power and usefulness through applications to other disciplines. We have
tried to emphasize the computational skills along with the mathematical
abstractions, which also have an integrity and beauty of their own. The
book includes a variety of interesting applications with many examples, not
only to help students understand new concepts but also to let them practice
applying the subject to such areas as differential equations, statistics,
geometry, and physics. Some of those applications may not be central to
the mathematical development and may be omitted or selected in a syllabus
at the discretion of the instructor. Most basic concepts and introductory
motivations begin with examples in Euclidean space or solving a system of
linear equations, and are gradually examined from different points of view
to derive general principles.
For those students who have completed a year of calculus, linear algebra
may be the first course in which the subject is developed in an abstract way,
and we often find that many students struggle with the abstraction and
miss the applications. Our experience is that, to understand the material,
students should practice with many problems, which are sometimes omitted
because of a lack of time. To encourage the students to do repeated practice,
we placed in the middle of the text not only many examples but also some
carefully selected problems, with answers or helpful hints. We have tried
to make this book as accessible and clear as possible, but there may
still be some awkward expressions in places. Any criticism or
comment from the readers will be appreciated.
We are very grateful to many colleagues in Korea, especially to the faculty
members in the mathematics department at Pohang University of Science
and Technology (POSTECH), who helped us over the years with various
aspects of this book. For their valuable suggestions and comments, we would
like to thank the students at POSTECH, who have used photocopied versions
of the text over the past several years. We would also like to acknowledge the
invaluable assistance we have received from the teaching assistants who have
checked and added some answers or hints for the problems and exercises in
this book. Our thanks also go to Mrs. Kathleen Roush who made this book
much more legible with her grammatical corrections in the final manuscript.
Our thanks finally go to the editing staff of Birkhäuser for gladly accepting
our book for publication.

Jin Ho Kwak
Sungpyo Hong
E-mail: jinkwak@[Link]
sungpyo@[Link]
April 1997, in Pohang, Korea

"Linear algebra is the mathematics of our modern technological world of
complex multivariable systems and computers"
- Alan Tucker -

"We (Halmos and Kaplansky) share a love of linear algebra. I think it
is our conviction that we'll never understand infinite-dimensional operators
properly until we have a decent mastery of finite matrices. And we share a
philosophy about linear algebra: we think basis-free, we write basis-free, but
when the chips are down we close the office door and compute with matrices
like fury"
- Irving Kaplansky -
Contents

Preface v

1 Linear Equations and Matrices 1
1.1 Introduction 1
1.2 Gaussian elimination 4
1.3 Matrices 12
1.4 Products of matrices 16
1.5 Block matrices 22
1.6 Inverse matrices 24
1.7 Elementary matrices 27
1.8 LDU factorization 33
1.9 Application: Linear models 38
1.10 Exercises 45

2 Determinants 49
2.1 Basic properties of determinant 49
2.2 Existence and uniqueness 54
2.3 Cofactor expansion 60
2.4 Cramer's rule 65
2.5 Application: Area and Volume 68
2.6 Exercises 71

3 Vector Spaces 75
3.1 Vector spaces and subspaces 75
3.2 Bases 81
3.3 Dimensions 88
3.4 Row and column spaces 94
3.5 Rank and nullity 100
3.6 Bases for subspaces 104
3.7 Invertibility 110
3.8 Application: Interpolation 113
3.9 Application: The Wronskian 115
3.10 Exercises 117

4 Linear Transformations 121
4.1 Introduction 121
4.2 Invertible linear transformations 127
4.3 Application: Computer graphics 132
4.4 Matrices of linear transformations 135
4.5 Vector spaces of linear transformations 140
4.6 Change of bases 143
4.7 Similarity 146
4.8 Dual spaces 152
4.9 Exercises 156

5 Inner Product Spaces 161
5.1 Inner products 161
5.2 The lengths and angles of vectors 164
5.3 Matrix representations of inner products 167
5.4 Orthogonal projections 171
5.5 The Gram-Schmidt orthogonalization 177
5.6 Orthogonal matrices and transformations 181
5.7 Relations of fundamental subspaces 185
5.8 Least square solutions 187
5.9 Application: Polynomial approximations 192
5.10 Orthogonal projection matrices 196
5.11 Exercises 204

6 Eigenvectors and Eigenvalues 209
6.1 Introduction 209
6.2 Diagonalization of matrices 216
6.3 Application: Difference equations 221
6.4 Application: Differential equations I 226
6.5 Application: Differential equations II 230
6.6 Exponential matrices 235
6.7 Application: Differential equations III 240
6.8 Diagonalization of linear transformations 243
6.9 Exercises 245

7 Complex Vector Spaces 251
7.1 Introduction 251
7.2 Hermitian and unitary matrices 259
7.3 Unitarily diagonalizable matrices 263
7.4 Normal matrices 268
7.5 The spectral theorem 271
7.6 Exercises 276

8 Quadratic Forms 279
8.1 Introduction 279
8.2 Diagonalization of a quadratic form 282
8.3 Congruence relation 288
8.4 Extrema of quadratic forms 292
8.5 Application: Quadratic optimization 298
8.6 Definite forms 300
8.7 Bilinear forms 303
8.8 Exercises 313

9 Jordan Canonical Forms 317
9.1 Introduction 317
9.2 Generalized eigenvectors 327
9.3 Computation of $e^A$ 333
9.4 Cayley-Hamilton theorem 337
9.5 Exercises 340

Selected Answers and Hints 343

Index 365
Chapter 1

Linear Equations and Matrices

1.1 Introduction
One of the central motivations for linear algebra is solving systems of linear
equations. We thus begin with the problem of finding the solutions of a
system of m linear equations in n unknowns of the following form:

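$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \qquad\qquad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m, \end{cases}$$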

where $x_1, x_2, \ldots, x_n$ are the unknowns and the $a_{ij}$'s and $b_i$'s denote constant
(real or complex) numbers.
A sequence of numbers $(s_1, s_2, \ldots, s_n)$ is called a solution of the
system if $x_1 = s_1, x_2 = s_2, \ldots, x_n = s_n$ satisfy each equation in the system
simultaneously. When $b_1 = b_2 = \cdots = b_m = 0$, we say that the system is
homogeneous.
The central topic of this chapter is to examine whether or not a given
system has a solution, and to find a solution if it has one. For instance,
any homogeneous system always has at least one solution $x_1 = x_2 = \cdots =
x_n = 0$, called the trivial solution. A natural question is whether such a
homogeneous system has a nontrivial solution. If so, we would like to have a
systematic method of finding all the solutions. A system of linear equations
is said to be consistent if it has at least one solution, and inconsistent if
it has no solution. The following example gives us an idea of how to answer
the above questions.

Example 1.1 When $m = n = 2$, the system reduces to two equations in
two unknowns $x$ and $y$:
$$\begin{cases} a_1x + b_1y = c_1 \\ a_2x + b_2y = c_2. \end{cases}$$

Geometrically, each equation in the system represents a straight line
when we interpret $x$ and $y$ as coordinates in the $xy$-plane. Therefore, a
point $P = (x, y)$ is a solution if and only if the point $P$ lies on both lines.
Hence there are three possible types of solution set:

(1) the empty set if the lines are parallel,
(2) only one point if they intersect,
(3) a straight line: i.e., infinitely many solutions, if they coincide.

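The following examples and diagrams illustrate the three types:

Case (1): $\begin{cases} x - y = -1 \\ x - y = 0 \end{cases}$
Case (2): $\begin{cases} x + y = 1 \\ x - y = 0 \end{cases}$
Case (3): $\begin{cases} x - y = -1 \\ 2x - 2y = -2 \end{cases}$

[Figure: the two lines of each case drawn in the xy-plane.]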

To decide whether the given system has a solution and to find a general
method of solving the system when it has a solution, we repeat here a
well-known elementary method of elimination and substitution.
Suppose first that the system consists of only one equation $ax + by = c$.
Then the system has either infinitely many solutions (i.e., the points on the
straight line $x = -\frac{b}{a}y + \frac{c}{a}$ or $y = -\frac{a}{b}x + \frac{c}{b}$, depending on whether $a \neq 0$ or
$b \neq 0$) or no solutions when $a = b = 0$ and $c \neq 0$. (If $a = b = c = 0$, every point
of the plane is a solution.)

We now assume that the system has two equations representing two lines
in the plane. Then clearly the two lines are parallel (have the same slope)
if and only if $a_2 = \lambda a_1$ and $b_2 = \lambda b_1$ for some $\lambda \neq 0$, or equivalently $a_1b_2 - a_2b_1 = 0$.
Furthermore, the two lines either coincide (infinitely many solutions) or are
distinct and parallel (no solutions) according to whether $c_2 = \lambda c_1$ holds or
not.
Suppose now that the lines are not parallel, or $a_1b_2 - a_2b_1 \neq 0$. In this
case, the two lines cross at a point, and hence there is exactly one solution.
For instance, if the system is homogeneous, then the lines cross at the origin,
so $(0, 0)$ is the only solution. For a nonhomogeneous system, we may find
the solution as follows: Express $x$ in terms of $y$ from the first equation, and
then substitute it into the second equation (i.e., eliminate the variable $x$
from the second equation) to get
$$\left(b_2 - \frac{a_2}{a_1}b_1\right)y = c_2 - \frac{a_2}{a_1}c_1,$$
which determines $y$; this value is in turn substituted into one of the equations to find $x$ and give a
complete solution of the system. In detail, the process can be summarized
as follows:
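(1) Without loss of generality, we may assume $a_1 \neq 0$ since otherwise we can
interchange the two equations. Then the variable $x$ can be eliminated from
the second equation by adding $-\frac{a_2}{a_1}$ times the first equation to the second,
to get
$$\begin{cases} a_1x + b_1y = c_1 \\ \left(b_2 - \frac{a_2}{a_1}b_1\right)y = c_2 - \frac{a_2}{a_1}c_1. \end{cases}$$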

(2) Since $a_1b_2 - a_2b_1 \neq 0$, $y$ can be found by multiplying the second equation
by the nonzero number $\frac{a_1}{a_1b_2 - a_2b_1}$ to get
$$\begin{cases} a_1x + b_1y = c_1 \\ y = \dfrac{a_1c_2 - a_2c_1}{a_1b_2 - a_2b_1}. \end{cases}$$

(3) Now, $x$ is solved by substituting the value of $y$ into the first equation,
and we obtain the solution to the problem:
$$x = \frac{b_2c_1 - b_1c_2}{a_1b_2 - a_2b_1}, \qquad y = \frac{a_1c_2 - a_2c_1}{a_1b_2 - a_2b_1}.$$
Note that the condition $a_1b_2 - a_2b_1 \neq 0$ is necessary for the system to have
only one solution. □
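The formulas in step (3) are easy to check by machine. The following Python sketch is only an illustration: the function name `solve_2x2` is ours, and integer (or `Fraction`) coefficients are assumed so that the arithmetic is exact. It refuses the case $a_1b_2 - a_2b_1 = 0$, in which the lines are parallel or coincide.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2 with the formulas of step (3).

    Coefficients are assumed to be integers or Fractions, so the result is exact.
    """
    d = a1 * b2 - a2 * b1
    if d == 0:
        # Parallel or coinciding lines: there is no unique solution.
        raise ValueError("a1*b2 - a2*b1 = 0, so there is no unique solution")
    x = Fraction(b2 * c1 - b1 * c2, d)
    y = Fraction(a1 * c2 - a2 * c1, d)
    return x, y

# Case (2) of Example 1.1: x + y = 1, x - y = 0
print(solve_2x2(1, 1, 1, 1, -1, 0))  # (Fraction(1, 2), Fraction(1, 2))
```

For Case (1), `solve_2x2(1, -1, -1, 1, -1, 0)` raises `ValueError`, since the lines $x - y = -1$ and $x - y = 0$ are parallel.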

In this example, we have changed the original system of equations into a
simpler one using certain operations, from which we can get the solution of
the given system. That is, if $(x, y)$ satisfies the original system of equations,
then $x$ and $y$ must satisfy the above simpler system in (3), and vice versa.
It is suggested that readers examine a system of three equations in
three unknowns, each equation representing a plane in the 3-dimensional
space $\mathbb{R}^3$, and consider the various possible cases in a similar way.

Problem 1.1 For a system of three equations in three unknowns
$$\begin{cases} a_{11}x + a_{12}y + a_{13}z = b_1 \\ a_{21}x + a_{22}y + a_{23}z = b_2 \\ a_{31}x + a_{32}y + a_{33}z = b_3, \end{cases}$$
describe all the possible types of the solution set in $\mathbb{R}^3$.

1.2 Gaussian elimination


As we have seen in Example 1.1, a basic idea for solving a system of linear
equations is to change the given system into a simpler system, keeping the
solutions unchanged; the example showed how to change a general system
to a simpler one. In fact, the main operations used in Example 1.1 are the
following three operations, called elementary operations:

(1) multiply an equation throughout by a nonzero constant,
(2) interchange two equations,
(3) change an equation by adding a constant multiple of another equation.

After applying a finite sequence of these elementary operations to the
given system, one can obtain a simpler system from which the solution can
be derived directly.
Note also that each of the three elementary operations has an inverse
operation, which is also an elementary operation:
(1)' divide the equation by the same nonzero constant,
(2)' interchange the two equations again,
(3)' change the equation by subtracting the same constant multiple of the
same other equation.
By applying these inverse operations in reverse order to the simpler system,
one can recover the original system. This means that a solution of the
original system must also be a solution of the simpler one, and vice versa.
These arguments can be formalized in mathematical language. Observe
that in performing any of these basic operations, only the coefficients of the
variables are involved in the calculations, and the variables $x_1, \ldots, x_n$ and
the equal sign "=" are simply repeated. Thus, keeping the order of the
variables and "=" in mind, we just extract the coefficients from the
equations in the given system and make a rectangular array of numbers:
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{bmatrix}.$$
This matrix is called the augmented matrix for the system. The term
matrix means just any rectangular array of numbers, and the numbers in this
array are called the entries of the matrix. To explain the above operations
in terms of matrices, we first introduce some terminology even though in the
following sections we shall study matrices in more detail.
Within a matrix, the horizontal and vertical subarrays
$$\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} & b_i \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$
are called the $i$-th row (matrix) and the $j$-th column (matrix) of the augmented
matrix, respectively. Note that the entries in the $j$-th column are
just the coefficients of the $j$-th variable $x_j$, so there is a correspondence between
columns of the matrix and variables of the system.
Since each row of the augmented matrix contains all the information of
the corresponding equation of the system, we may deal with this augmented
matrix instead of handling the whole system of linear equations.
The elementary operations on a system of linear equations are rephrased
as the elementary row operations for the augmented matrix, as follows:
(1) multiply a row throughout by a nonzero constant,
(2) interchange two rows,
(3) change a row by adding a constant multiple of another row.

The inverse operations are
(1)' divide the row by the same nonzero constant,
(2)' interchange the two rows again,
(3)' change the row by subtracting the same constant multiple of the other
row.
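The following Python sketch expresses the three elementary row operations as functions that return a new matrix. The names `scale`, `swap`, and `add_multiple` are illustrative, and rows are numbered from 0 as is usual in Python. Applying the inverse operations in reverse order recovers the original matrix, here the augmented matrix of Example 1.2.

```python
from fractions import Fraction

def scale(M, i, k):
    """(1) Multiply row i throughout by a nonzero constant k."""
    if k == 0:
        raise ValueError("k must be nonzero")
    M = [row[:] for row in M]
    M[i] = [k * x for x in M[i]]
    return M

def swap(M, i, j):
    """(2) Interchange rows i and j."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M

def add_multiple(M, i, j, k):
    """(3) Replace row i by (row i) + k * (row j), where j != i."""
    if i == j:
        raise ValueError("the other row must be different")
    M = [row[:] for row in M]
    M[i] = [a + k * b for a, b in zip(M[i], M[j])]
    return M

# Augmented matrix of Example 1.2, with exact Fraction entries.
A = [[Fraction(x) for x in row] for row in [[0, 2, 4, 2], [1, 2, 2, 3], [3, 4, 6, -1]]]

# Apply operations (2), (3), (1) in turn ...
B = swap(A, 0, 1)
B = add_multiple(B, 2, 0, -3)
B = scale(B, 1, Fraction(1, 2))

# ... then undo them with (1)', (3)', (2)' in reverse order.
C = scale(B, 1, 2)
C = add_multiple(C, 2, 0, 3)
C = swap(C, 0, 1)
print(C == A)  # True
```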

Definition 1.1 Two augmented matrices (or systems of linear equations)
are said to be row-equivalent if one can be transformed to the other by a
finite sequence of elementary row operations.

If a matrix $B$ can be obtained from a matrix $A$ in this way, then we
can obviously recover $A$ from $B$ by applying the inverse elementary row
operations in reverse order. Note again that an elementary row operation
does not alter the solution of the system, and we can formalize the above
argument in the following theorem:

Theorem 1.1 If two systems of linear equations are row-equivalent, then
they have the same set of solutions.

The general procedure for finding the solutions will be illustrated in the
following example:

Example 1.2 Solve the system of linear equations:
$$\begin{cases} 2y + 4z = 2 \\ x + 2y + 2z = 3 \\ 3x + 4y + 6z = -1. \end{cases}$$

Solution: We could work with the augmented matrix alone. However,
to compare the operations on systems of linear equations with those on the
augmented matrix, we work on the system and the augmented matrix in
parallel. Note that the associated augmented matrix of the system is
$$\begin{bmatrix} 0 & 2 & 4 & 2 \\ 1 & 2 & 2 & 3 \\ 3 & 4 & 6 & -1 \end{bmatrix}.$$

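(1) Since the coefficient of $x$ in the first equation is zero while that in the
second equation is not zero, we interchange these two equations:
$$\begin{cases} x + 2y + 2z = 3 \\ 2y + 4z = 2 \\ 3x + 4y + 6z = -1 \end{cases} \qquad \begin{bmatrix} 1 & 2 & 2 & 3 \\ 0 & 2 & 4 & 2 \\ 3 & 4 & 6 & -1 \end{bmatrix}$$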

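(2) Add $-3$ times the first equation to the third equation:
$$\begin{cases} x + 2y + 2z = 3 \\ 2y + 4z = 2 \\ -2y = -10 \end{cases} \qquad \begin{bmatrix} 1 & 2 & 2 & 3 \\ 0 & 2 & 4 & 2 \\ 0 & -2 & 0 & -10 \end{bmatrix}$$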

The coefficient 1 of the first unknown x in the first equation (row) is called
the pivot in this first elimination step.
Now the second and the third equations involve only the two unknowns
y and z. Leave the first equation (row) alone, and the same elimination
procedure can be applied to the second and the third equations (rows): The
pivot for this step is the coefficient 2 of y in the second equation (row). To
eliminate y from the last equation,
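(3) Add 1 times the second equation (row) to the third equation (row):
$$\begin{cases} x + 2y + 2z = 3 \\ 2y + 4z = 2 \\ 4z = -8 \end{cases} \qquad \begin{bmatrix} 1 & 2 & 2 & 3 \\ 0 & 2 & 4 & 2 \\ 0 & 0 & 4 & -8 \end{bmatrix}$$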

The elimination process done so far to obtain this result is called a forward
elimination: i.e., elimination of $x$ from the last two equations (rows)
and then elimination of $y$ from the last equation (row).
Now the pivots of the second and third rows are 2 and 4, respectively.
To make these entries 1,
(4) Divide each row by the pivot of the row:
$$\begin{cases} x + 2y + 2z = 3 \\ y + 2z = 1 \\ z = -2 \end{cases} \qquad \begin{bmatrix} 1 & 2 & 2 & 3 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & -2 \end{bmatrix}$$

The resulting matrix on the right side is called a row-echelon form of the
matrix, and the 1's at the leftmost entries in each row are called the leading
1's. The process so far is called a Gaussian elimination.
We now want to eliminate the numbers above the leading 1's:
(5) Add $-2$ times the third row to the second and the first rows:
$$\begin{cases} x + 2y = 7 \\ y = 5 \\ z = -2 \end{cases} \qquad \begin{bmatrix} 1 & 2 & 0 & 7 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & -2 \end{bmatrix}$$

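(6) Add $-2$ times the second row to the first row:
$$\begin{cases} x = -3 \\ y = 5 \\ z = -2 \end{cases} \qquad \begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & -2 \end{bmatrix}$$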

This matrix is called the reduced row-echelon form. The procedure
to get this reduced row-echelon form from a row-echelon form is called the
back substitution. The whole process to obtain the reduced row-echelon
form is called a Gauss-Jordan elimination.
Notice that the corresponding system to this reduced row-echelon form
is row-equivalent to the original one and is essentially in solved form: i.e.,
the solution is $x = -3$, $y = 5$, $z = -2$. □
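The Gauss-Jordan procedure can be sketched in Python as follows. This is an illustrative implementation under our own assumptions (the name `rref`, exact `fractions.Fraction` arithmetic, 0-based indices), not the book's code. Unlike the hand computation above, it clears each pivot column above and below in a single pass instead of separating forward elimination from back substitution; since a matrix has only one reduced row-echelon form, the result is the same.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the reduced row-echelon form R of a matrix given as a
    list of rows, and the 0-based indices of the columns holding leading 1's."""
    M = [[Fraction(x) for x in row] for row in rows]  # exact arithmetic
    m, n = len(M), len(M[0])
    pivots = []
    r = 0                                  # row that receives the next leading 1
    for c in range(n):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                       # no pivot in this column
        M[r], M[p] = M[p], M[r]            # operation (2): interchange rows
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]     # operation (1): make the pivot a leading 1
        for i in range(m):                 # operation (3): clear the rest of column c
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

# Augmented matrix of Example 1.2
R, pivots = rref([[0, 2, 4, 2], [1, 2, 2, 3], [3, 4, 6, -1]])
print([[int(x) for x in row] for row in R])  # [[1, 0, 0, -3], [0, 1, 0, 5], [0, 0, 1, -2]]
```

The last column of `R` reads off the solution $x = -3$, $y = 5$, $z = -2$.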

In general, a matrix in row-echelon form satisfies the following properties.

(1) The first nonzero entry of each row is 1, called a leading 1.
(2) A row containing only 0's should come after all rows with some nonzero
entries.
(3) The leading 1's appear from left to right in successive rows. That
is, the leading 1 in the lower row occurs farther to the right than the
leading 1 in the higher row.

Moreover, a matrix in reduced row-echelon form satisfies

(4) Each column that contains a leading 1 has zeros everywhere else, in
addition to the above three properties.
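Properties (1) through (4) can be checked mechanically. In this Python sketch (the function names are our own), a row's leading entry is its first nonzero entry; the test matrices are the row-echelon form from step (4) and the reduced row-echelon form from step (6) of Example 1.2.

```python
def leading_index(row):
    """Column index of the first nonzero entry of a row, or None for a zero row."""
    return next((j for j, x in enumerate(row) if x != 0), None)

def is_row_echelon(M):
    """Properties (1)-(3): leading entries are 1, zero rows come last,
    and leading 1's move strictly to the right in successive rows."""
    last = -1
    seen_zero_row = False
    for row in M:
        j = leading_index(row)
        if j is None:
            seen_zero_row = True
            continue
        if seen_zero_row or row[j] != 1 or j <= last:
            return False
        last = j
    return True

def is_reduced_row_echelon(M):
    """Property (4) as well: each column with a leading 1 is zero elsewhere."""
    if not is_row_echelon(M):
        return False
    for i, row in enumerate(M):
        j = leading_index(row)
        if j is not None and any(M[k][j] != 0 for k in range(len(M)) if k != i):
            return False
    return True

ref_example = [[1, 2, 2, 3], [0, 1, 2, 1], [0, 0, 1, -2]]    # step (4)
rref_example = [[1, 0, 0, -3], [0, 1, 0, 5], [0, 0, 1, -2]]  # step (6)
print(is_row_echelon(ref_example), is_reduced_row_echelon(ref_example))    # True False
print(is_row_echelon(rref_example), is_reduced_row_echelon(rref_example))  # True True
```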

Note that an augmented matrix has only one reduced row-echelon form
while it may have many row-echelon forms. In any case, the number of
nonzero rows containing leading 1's is equal to the number of columns
containing leading 1's. The variables in the system corresponding to columns
with the leading 1's in a row-echelon form are called the basic variables. In
general, the reduced row-echelon form U may have columns that do not
contain leading 1's. The variables in the system corresponding to the columns
without leading 1's are called free variables. Thus the sum of the number
of basic variables and that of free variables is precisely the total number of
variables.
For example, the first two matrices below are in reduced row-echelon
form, and the last two just in row-echelon form.

[ ~ ~ ~l' [~ ~ ~ ~ ~l' [~ ~ ! ~l [~ ~ ~ ~l·


000 0 0 0 0 0 001 7 0 0 1 3

Notice that in an augmented matrix [A b], the last column b does not
correspond to any variable. Hence, if we consider the four matrices above
as augmented matrices for some systems, then the systems corresponding
to the first and the last two augmented matrices have only basic variables
but no free variables. In the system corresponding to the second augmented
matrix, the second and the fourth variables, $x_2$ and $x_4$, are basic, and the
first and the third variables, $x_1$ and $x_3$, are free variables. These ideas will
be used in later chapters.
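Reading off the basic and free variables is also mechanical. The hypothetical helper `basic_and_free` below assumes its input is an augmented matrix $[A\ \mathbf{b}]$ already in row-echelon form, ignores the last column because it corresponds to no variable, and numbers the variables from 1 to match $x_1, \ldots, x_n$. The example input is the reduced row-echelon form reached in Example 1.3.

```python
def basic_and_free(R):
    """For an augmented matrix [A b] in row-echelon form, return the 1-based
    indices of the basic and the free variables. The last column is ignored."""
    n = len(R[0]) - 1  # number of variables
    leading_columns = set()
    for row in R:
        # The leading 1 of a row is its first nonzero entry among the variable columns.
        j = next((j for j, x in enumerate(row[:n]) if x != 0), None)
        if j is not None:
            leading_columns.add(j)
    basic = [j + 1 for j in range(n) if j in leading_columns]
    free = [j + 1 for j in range(n) if j not in leading_columns]
    return basic, free

# Reduced row-echelon form obtained in Example 1.3
R = [[1, 0, 0, 1, 3], [0, 1, 0, 1, 4], [0, 0, 1, 2, 6]]
print(basic_and_free(R))  # ([1, 2, 3], [4])
```

The two lists always have total length $n$, which is the count identity stated above.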
In summary, by applying a finite sequence of elementary row operations,
the augmented matrix for a system of linear equations can be changed to
its reduced row-echelon form which is row-equivalent to the original one.
From the reduced row-echelon form, we can decide whether the system has
a solution, and find the solution of the given system if it has one.

Example 1.3 Solve the following system of linear equations by Gauss-Jordan elimination.
$$\begin{cases} x_1 + 3x_2 - 2x_3 = 3 \\ 2x_1 + 6x_2 - 2x_3 + 4x_4 = 18 \\ x_2 + x_3 + 3x_4 = 10. \end{cases}$$

Solution: The augmented matrix for the system is
$$\begin{bmatrix} 1 & 3 & -2 & 0 & 3 \\ 2 & 6 & -2 & 4 & 18 \\ 0 & 1 & 1 & 3 & 10 \end{bmatrix}.$$

The Gaussian elimination begins with:
(1) Adding $-2$ times the first row to the second produces
$$\begin{bmatrix} 1 & 3 & -2 & 0 & 3 \\ 0 & 0 & 2 & 4 & 12 \\ 0 & 1 & 1 & 3 & 10 \end{bmatrix}.$$

(2) Note that the coefficient of $x_2$ in the second equation is zero and that
in the third equation is not. Thus, interchanging the second and the third
rows produces
$$\begin{bmatrix} 1 & 3 & -2 & 0 & 3 \\ 0 & 1 & 1 & 3 & 10 \\ 0 & 0 & 2 & 4 & 12 \end{bmatrix}.$$

(3) The pivot in the third row is 2. Thus, dividing the third row by 2
produces a row-echelon form
$$\begin{bmatrix} 1 & 3 & -2 & 0 & 3 \\ 0 & 1 & 1 & 3 & 10 \\ 0 & 0 & 1 & 2 & 6 \end{bmatrix}.$$

This is a row-echelon form, and we now continue with the back substitution:
(4) Adding $-1$ times the third row to the second, and 2 times the third
row to the first produces
$$\begin{bmatrix} 1 & 3 & 0 & 4 & 15 \\ 0 & 1 & 0 & 1 & 4 \\ 0 & 0 & 1 & 2 & 6 \end{bmatrix}.$$

(5) Finally, adding $-3$ times the second row to the first produces the
reduced row-echelon form:
$$\begin{bmatrix} 1 & 0 & 0 & 1 & 3 \\ 0 & 1 & 0 & 1 & 4 \\ 0 & 0 & 1 & 2 & 6 \end{bmatrix}.$$

The corresponding system of equations is
$$\begin{cases} x_1 + x_4 = 3 \\ x_2 + x_4 = 4 \\ x_3 + 2x_4 = 6. \end{cases}$$

Since $x_1$, $x_2$, and $x_3$ correspond to the columns containing leading 1's,
they are the basic variables, and $x_4$ is the free variable. Thus by solving this
system for the basic variables in terms of the free variable $x_4$, we have the
system of equations in a solved form:
$$\begin{cases} x_1 = 3 - x_4 \\ x_2 = 4 - x_4 \\ x_3 = 6 - 2x_4. \end{cases}$$

By assigning an arbitrary value $t$ to the free variable $x_4$, the solutions can
be written as
$$(x_1, x_2, x_3, x_4) = (3 - t,\ 4 - t,\ 6 - 2t,\ t)$$
for any $t \in \mathbb{R}$, where $\mathbb{R}$ denotes the set of real numbers. □
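As a quick numerical check of this general solution, the sketch below substitutes $(3 - t, 4 - t, 6 - 2t, t)$ into the original system of Example 1.3 for a few sample values of $t$ and computes, for each equation, the left-hand side minus the right-hand side. The names are illustrative. A finite set of samples is only a sanity check, not a proof; the elimination above is the proof.

```python
from fractions import Fraction

# Coefficients and right-hand sides of the system in Example 1.3.
A = [[1, 3, -2, 0], [2, 6, -2, 4], [0, 1, 1, 3]]
b = [3, 18, 10]

def general_solution(t):
    """The solution (x1, x2, x3, x4) = (3 - t, 4 - t, 6 - 2t, t) found above."""
    return [3 - t, 4 - t, 6 - 2 * t, t]

def residuals(x):
    """Left-hand side minus right-hand side of each equation at the point x."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

for t in [0, 1, -2, Fraction(7, 3)]:
    print(t, residuals(general_solution(t)))  # every residual is 0
```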

Remark: Consider a homogeneous system
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = 0 \\ \qquad\qquad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = 0, \end{cases}$$
with the number of unknowns greater than the number of equations: that
is, $m < n$. Since the number of basic variables cannot exceed the number
of rows, a free variable always exists as in Example 1.3, so by assigning
an arbitrary value to each free variable we can always find infinitely many
nontrivial solutions.
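The Remark suggests a procedure: row-reduce the coefficient matrix, pick a free variable, set it to 1 and the other free variables to 0, and solve for the basic variables. The Python sketch below does this under our own assumptions: the helper names are hypothetical, and since the right-hand side of a homogeneous system is zero and elimination never changes it, only the coefficient matrix is reduced. The $2 \times 3$ matrix in the usage lines is a made-up example with $m = 2 < n = 3$.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination with exact Fractions; returns (R, pivot columns)."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

def nontrivial_solution(A):
    """Return a nonzero solution of A x = 0, or None if every variable is basic."""
    R, pivots = rref(A)
    n = len(A[0])
    free = [j for j in range(n) if j not in pivots]
    if not free:
        return None                      # only the trivial solution
    f = free[0]
    x = [Fraction(0)] * n
    x[f] = Fraction(1)                   # one free variable set to 1, the rest to 0
    # Row i of R reads: x_{pivots[i]} + (terms in free variables) = 0.
    for i, p in enumerate(pivots):
        x[p] = -R[i][f]
    return x

A = [[1, 2, 3], [4, 5, 6]]   # hypothetical system with m = 2 < n = 3
x = nontrivial_solution(A)
print([int(v) for v in x])   # [1, -2, 1]
```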

Problem 1.2 Suppose that the augmented matrix for a system of linear equations
has been reduced to the reduced row-echelon form below by elementary row opera-
tions. Solve the systems:

10
(1) [ 0 1
oo 5] ,
-2
1 0 o 4
(2) [ 0 1 o 2
-1 ]
6 .
o 0 o 4 o 0 1 3 2

We note that if a row-echelon form of an augmented matrix has a row
of the type $[\,0\ 0\ \cdots\ 0\ b\,]$ with $b \neq 0$, then it represents an equation of the
form $0x_1 + 0x_2 + \cdots + 0x_n = b$ with $b \neq 0$. In this case, the system
has no solution. If $b = 0$, then it is a row containing only 0's that can be
neglected. Hence, when we deal with a row-echelon form, we may assume
that the zero rows are deleted. Note also that, as in Example 1.3, if the system
is consistent and there exists at least one free variable in the row-echelon form,
then the system has infinitely many solutions. On the other hand, if a consistent
system has no free variable, the system has a unique solution.
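These criteria translate directly into code. In the sketch below (the names are ours), an inconsistent row $[\,0\ \cdots\ 0\ b\,]$ with $b \neq 0$ shows up in the reduced row-echelon form as a leading 1 in the last column; otherwise the number of pivot columns decides between a unique solution and infinitely many. The test systems are Example 1.2, Example 1.3, and Cases (1) and (3) of Example 1.1.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination with exact Fractions; returns (R, pivot columns)."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

def classify(aug):
    """Classify the system whose augmented matrix is aug = [A b]."""
    _, pivots = rref(aug)
    n = len(aug[0]) - 1                      # number of unknowns; column n holds b
    if n in pivots:
        return "no solution"                 # a row [0 ... 0 1] means 0 = 1
    if len(pivots) < n:
        return "infinitely many solutions"   # consistent, with a free variable
    return "unique solution"                 # consistent, every variable basic

print(classify([[0, 2, 4, 2], [1, 2, 2, 3], [3, 4, 6, -1]]))            # unique solution
print(classify([[1, 3, -2, 0, 3], [2, 6, -2, 4, 18], [0, 1, 1, 3, 10]]))  # infinitely many solutions
print(classify([[1, -1, -1], [1, -1, 0]]))                              # no solution
print(classify([[1, -1, -1], [2, -2, -2]]))                             # infinitely many solutions
```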
To study systems of linear equations in terms of matrices systematically,
we will develop some general theories of matrices in the following sections.

Problem 1.3 Solve the following systems of equations by Gaussian elimination.
What are the pivots?

{
-x + y + 2z
{ z
0 2y 1
(1) 3x + 4y + z 0 (2) 4x lOy + 3z 5
2x + 5y + 3z O. 3x 3y 6.
w + x + y 3
-3w 17x + y + 2z 1
(3) { 4w 17x + 8y 5z 1
5x 2y + z 1.

Problem 1.4 Determine the condition on $b_i$ so that the following system has a
solution.
+ 2y + 6z b1 + 3y 2z b1
(1) { ,: 3y 2z b2 (2) { 2~ Y + 3z b2
3x y + 4z h 4x + 2y + z h

1.3 Matrices
Rectangular arrays of real numbers arise in many real-world problems. His-
torically, it was the English mathematician A. Cayley who first introduced
the word "matrix" in the year 1858. The meaning of the word is "that within
which something originates," and he used matrices simply as a source for
rows and columns to form squares.
In this section we are interested only in very basic properties of such
matrices.

Definition 1.2 An $m$ by $n$ (written $m \times n$) matrix is a rectangular array of
numbers arranged into $m$ (horizontal) rows and $n$ (vertical) columns. The
size of a matrix is specified by the number $m$ of rows and the number $n$ of
columns.

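In general, a matrix is written in the following form:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},$$
or just $A = [a_{ij}]$ if the size of the matrix is clear from the context. The
number $a_{ij}$ is called the $(i, j)$-entry of the matrix $A$, and can also be written
as $a_{ij} = [A]_{ij}$.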
An $m \times 1$ matrix is called a column (matrix) or sometimes a column
vector, and a $1 \times n$ matrix is called a row (matrix), or a row vector.
These special cases are important, as we will see throughout the book. We
will generally use capital letters like $A$, $B$, $C$ for matrices and small boldface
letters like $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$ for column or row vectors.

Definition 1.3 Let $A = [a_{ij}]$ be an $m \times n$ matrix. The transpose of $A$ is
the $n \times m$ matrix, denoted by $A^T$, whose $j$-th column is taken from the $j$-th
row of $A$: that is, $A^T = [b_{ij}]$ with $b_{ij} = a_{ji}$.

For example, if A = [~ ! ~ 1' then AT = [~ :].


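In particular, the transpose of a column vector is a row vector and vice
versa. For example, for an $n \times 1$ column vector
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$
its transpose $\mathbf{x}^T = [x_1\ x_2\ \cdots\ x_n]$ is a row vector.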
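With a matrix stored as a list of rows, the transpose can be sketched in Python with the built-in `zip`, which groups the $j$-th entries of all rows together and so realizes the rule $b_{ij} = a_{ji}$. The matrices below are made-up examples.

```python
def transpose(A):
    """Return A^T for a matrix stored as a list of rows: entry (i, j) of A^T is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [4, 5, 6]]           # a hypothetical 2 x 3 matrix
print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]]
x = [[1], [2], [3]]                  # a 3 x 1 column vector
print(transpose(x))                  # [[1, 2, 3]], a 1 x 3 row vector
print(transpose(transpose(A)) == A)  # True, i.e. (A^T)^T = A
```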

Definition 1.4 Let $A = [a_{ij}]$ be an $m \times n$ matrix.
(1) $A$ is called a square matrix of order $n$ if $m = n$.
In the following, we assume that $A$ is a square matrix of order $n$.
(2) The entries $a_{11}, a_{22}, \ldots, a_{nn}$ are called the diagonal entries of $A$.
(3) $A$ is called a diagonal matrix if all the entries except for the diagonal
entries are zero.
(4) $A$ is called an upper (lower) triangular matrix if all the entries
below (above, respectively) the diagonal are zero.

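The following matrices $U$ and $L$ are the general forms of the upper triangular
and lower triangular matrices, respectively:
$$U = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}, \qquad L = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$
Note that a matrix which is both upper and lower triangular must be a
diagonal matrix, and the transpose of an upper (lower) triangular matrix is a
lower (upper, respectively) triangular matrix.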
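Definitions (3) and (4), together with the remark about transposes, can be checked with small predicates. This Python sketch assumes a square matrix stored as a list of rows; the function names and the matrix `U` are illustrative.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def is_upper_triangular(A):
    """All entries below the diagonal (row index > column index) are zero."""
    return all(A[i][j] == 0 for i in range(len(A)) for j in range(i))

def is_lower_triangular(A):
    """All entries above the diagonal (row index < column index) are zero."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n))

def is_diagonal(A):
    """Both upper and lower triangular."""
    return is_upper_triangular(A) and is_lower_triangular(A)

U = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]                  # hypothetical upper triangular matrix
print(is_upper_triangular(U), is_lower_triangular(U))  # True False
print(is_lower_triangular(transpose(U)))               # True
print(is_diagonal([[7, 0], [0, -1]]))                  # True
```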

Definition 1.5 Two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ are said to be equal,
written $A = B$, if they have the same size and corresponding entries are
equal: i.e., $a_{ij} = b_{ij}$ for all $i$ and $j$.

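This definition allows us to write matrix equations. A simple example is
$(A^T)^T = A$ by definition.
Let $M_{m \times n}(\mathbb{R})$ denote the set of all $m \times n$ matrices with real entries.
On $M_{m \times n}(\mathbb{R})$, we can define two operations,
called scalar multiplication and the sum of matrices, as follows:
Scalar multiplication: Given an $m \times n$ matrix $A = [a_{ij}]$ and a scalar $k$
(which is simply a real number), the scalar multiplication $kA$ of $k$ and $A$ is
defined to be the matrix $kA = [ka_{ij}]$: i.e., in an expanded form:
$$k\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} ka_{11} & \cdots & ka_{1n} \\ \vdots & \ddots & \vdots \\ ka_{m1} & \cdots & ka_{mn} \end{bmatrix}.$$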

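Sum of matrices: If $A = [a_{ij}]$ and $B = [b_{ij}]$ are two matrices of the same
size, then the sum $A + B$ is defined to be the matrix $A + B = [a_{ij} + b_{ij}]$:
i.e., in an expanded form:
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}.$$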


Note that matrices of different sizes cannot be added. It is quite clear
that $A + A = 2A$, and $A + (A + A) = (A + A) + A = 3A$. Thus, inductively
we have $nA = (n - 1)A + A$ for any positive integer $n$. If $B$ is any matrix,
then $-B$ is by definition the scalar multiple $(-1)B$. Moreover, if $A$ and $B$
are two matrices of the same size, then the difference $A - B$ is by definition
the sum $A + (-1)B = A + (-B)$. A matrix whose entries are all zero is called
a zero matrix, denoted by the symbol $0$ (or $0_{m \times n}$ when we emphasize the
number of rows and columns).
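The entrywise definitions of scalar multiplication, sum, and difference can be sketched in Python as follows. The list-of-rows storage and the function names are our own choices; `add` rejects matrices of different sizes, in line with the note above.

```python
def add(A, B):
    """Entrywise sum [a_ij + b_ij]; defined only for matrices of the same size."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices of different sizes cannot be added")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

def scalar(k, A):
    """Scalar multiple kA = [k * a_ij]."""
    return [[k * a for a in r] for r in A]

def sub(A, B):
    """Difference A - B, defined as A + (-1)B."""
    return add(A, scalar(-1, B))

def zero(m, n):
    """The m x n zero matrix."""
    return [[0] * n for _ in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]
print(add(A, B))                          # [[1, 1], [8, 6]]
print(add(A, add(A, A)) == scalar(3, A))  # True
print(sub(A, A) == zero(2, 2))            # True
```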
Clearly, matrix addition has the same properties as the addition of real
numbers. The real numbers in the context here are traditionally called
scalars even though "numbers" is a perfectly good name and "scalar" sounds
more technical. The following theorem lists some basic rules of these opera-
tions.

Theorem 1.2 Suppose that the sizes of $A$, $B$ and $C$ are the same. Then
the following rules of matrix arithmetic are valid:
(1) $(A + B) + C = A + (B + C)$ (written as $A + B + C$) (Associativity),
(2) $A + 0 = 0 + A = A$,
(3) $A + (-A) = (-A) + A = 0$,
(4) $A + B = B + A$ (Commutativity),
(5) $k(A + B) = kA + kB$,
(6) $(k + \ell)A = kA + \ell A$,
(7) $(k\ell)A = k(\ell A)$.

Proof: We prove only (5); the remaining rules are left as exercises. For any $(i, j)$,
$$[k(A + B)]_{ij} = k[A + B]_{ij} = k([A]_{ij} + [B]_{ij}) = [kA]_{ij} + [kB]_{ij}.$$
Consequently, $k(A + B) = kA + kB$. □
16 CHAPTER 1. LINEAR EQUATIONS AND MATRICES

Definition 1.6 A square matrix $A$ is said to be symmetric if $A^T = A$, or skew-symmetric if $A^T = -A$.

For example, the matrix
$$B = \begin{bmatrix} 0 & 1 & 2\\ -1 & 0 & 3\\ -2 & -3 & 0\end{bmatrix}$$
is skew-symmetric. Notice here that all the diagonal entries of a skew-symmetric matrix must be zero, since $a_{ii} = -a_{ii}$.
By a direct computation, one can easily verify the following rules of the
transpose of matrices:

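Theorem 1.3 Let $A$ and $B$ be $m\times n$ matrices, and let $k$ be a scalar. Then
(1) $(A^T)^T = A$,
(2) $(A + B)^T = A^T + B^T$,
(3) $(kA)^T = kA^T$.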
Problem 1.5 Prove the remaining parts of Theorem 1.2.

Problem 1.6 Find a matrix $B$ such that $A + B^T = (A - B)^T$, where

A= [
-1
~ =~0 ~].
1

Problem 1.7 Find a, b, c and d such that

[~ ~]=2[~ a+~]+[~:~ a+~].

1.4 Products of matrices


We introduced the operations sum and scalar multiplication of matrices in
Section 1.3. In this section, we introduce the product of matrices. Unlike the
sum of two matrices, the product of matrices is a little bit more complicated,
in the sense that it is defined for two matrices of different sizes or for square
matrices of the same order. We define the product of matrices in three steps:

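(1) For a $1\times n$ row matrix $\mathbf{a} = [a_1 \cdots a_n]$ and an $n\times 1$ column matrix $\mathbf{x} = [x_1 \cdots x_n]^T$, the product $\mathbf{a}\mathbf{x}$ is a $1\times 1$ matrix (i.e., just a number) defined by the rule
$$\mathbf{a}\mathbf{x} = [a_1\ a_2\ \cdots\ a_n]\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix} = [a_1x_1 + a_2x_2 + \cdots + a_nx_n] = \Big[\sum_{i=1}^{n} a_ix_i\Big].$$
Note that the number of columns of the first matrix must be equal to the number of rows of the second matrix for the entries to be multiplied pairwise.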
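(2) For an $m\times n$ matrix
$$A = \begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\\ \vdots\\ \mathbf{a}_m\end{bmatrix},$$
where the $\mathbf{a}_i$'s denote the row vectors, and for an $n\times 1$ column matrix $\mathbf{x} = [x_1 \cdots x_n]^T$, the product $A\mathbf{x}$ is by definition an $m\times 1$ matrix defined by the rule:
$$A\mathbf{x} = \begin{bmatrix}\mathbf{a}_1\mathbf{x}\\ \mathbf{a}_2\mathbf{x}\\ \vdots\\ \mathbf{a}_m\mathbf{x}\end{bmatrix} = \begin{bmatrix}\sum_{i=1}^{n} a_{1i}x_i\\ \sum_{i=1}^{n} a_{2i}x_i\\ \vdots\\ \sum_{i=1}^{n} a_{mi}x_i\end{bmatrix},$$
or in an expanded form
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\end{bmatrix},$$
which is just an $m\times 1$ column matrix of the form $[b_1\ b_2 \cdots b_m]^T$.
Therefore, for a system of $m$ linear equations in $n$ unknowns, by writing the $n$ unknowns as an $n\times 1$ column matrix $\mathbf{x}$ and the coefficients as an $m\times n$ matrix $A$, the system may be expressed as a matrix equation $A\mathbf{x} = \mathbf{b}$. Notice that this looks just like the usual linear equation in one variable: $ax = b$.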

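(3) Product of matrices: Let $A$ be an $m\times n$ matrix and $B$ an $n\times r$ matrix. The product $AB$ is defined to be the $m\times r$ matrix whose columns are the products of $A$ and the columns of $B$ in corresponding order. Thus if $A$ is $m\times n$ and $B$ is $n\times r$, then $B$ has $r$ columns and each column of $B$ is an $n\times 1$ matrix. If we denote them by $\mathbf{b}_1, \ldots, \mathbf{b}_r$, or $B = [\mathbf{b}_1 \cdots \mathbf{b}_r]$, then
$$AB = [A\mathbf{b}_1\ A\mathbf{b}_2\ \cdots\ A\mathbf{b}_r] = \begin{bmatrix}\mathbf{a}_1\mathbf{b}_1 & \mathbf{a}_1\mathbf{b}_2 & \cdots & \mathbf{a}_1\mathbf{b}_r\\ \mathbf{a}_2\mathbf{b}_1 & \mathbf{a}_2\mathbf{b}_2 & \cdots & \mathbf{a}_2\mathbf{b}_r\\ \vdots & \vdots & & \vdots\\ \mathbf{a}_m\mathbf{b}_1 & \mathbf{a}_m\mathbf{b}_2 & \cdots & \mathbf{a}_m\mathbf{b}_r\end{bmatrix},$$
which is an $m\times r$ matrix. Therefore, the $(i, j)$-entry $[AB]_{ij}$ of $AB$ is the $i$-th entry of the $j$-th column matrix
$$A\mathbf{b}_j = \begin{bmatrix}\mathbf{a}_1\mathbf{b}_j\\ \mathbf{a}_2\mathbf{b}_j\\ \vdots\\ \mathbf{a}_m\mathbf{b}_j\end{bmatrix},$$
i.e., for $i = 1, \ldots, m$ and $j = 1, \ldots, r$, it is the product of the $i$-th row of $A$ and the $j$-th column of $B$:
$$[AB]_{ij} = \mathbf{a}_i\mathbf{b}_j = \sum_{k=1}^{n} a_{ik}b_{kj}.$$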

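Example 1.4 Consider the matrices
$$A = \begin{bmatrix} 2 & 3\\ 4 & 0\end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 & 0\\ 5 & -1 & 0\end{bmatrix}.$$
The columns of $AB$ are the products of $A$ and each column of $B$:
$$\begin{bmatrix} 2 & 3\\ 4 & 0\end{bmatrix}\begin{bmatrix} 1\\ 5\end{bmatrix} = \begin{bmatrix} 2\cdot 1 + 3\cdot 5\\ 4\cdot 1 + 0\cdot 5\end{bmatrix} = \begin{bmatrix} 17\\ 4\end{bmatrix},$$
$$\begin{bmatrix} 2 & 3\\ 4 & 0\end{bmatrix}\begin{bmatrix} 2\\ -1\end{bmatrix} = \begin{bmatrix} 2\cdot 2 + 3\cdot(-1)\\ 4\cdot 2 + 0\cdot(-1)\end{bmatrix} = \begin{bmatrix} 1\\ 8\end{bmatrix},$$
$$\begin{bmatrix} 2 & 3\\ 4 & 0\end{bmatrix}\begin{bmatrix} 0\\ 0\end{bmatrix} = \begin{bmatrix} 2\cdot 0 + 3\cdot 0\\ 4\cdot 0 + 0\cdot 0\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix}.$$
Therefore, $AB$ is
$$\begin{bmatrix} 2 & 3\\ 4 & 0\end{bmatrix}\begin{bmatrix} 1 & 2 & 0\\ 5 & -1 & 0\end{bmatrix} = \begin{bmatrix} 17 & 1 & 0\\ 4 & 8 & 0\end{bmatrix}.$$
Since $A$ is a $2\times 2$ matrix and $B$ is a $2\times 3$ matrix, the product $AB$ is a $2\times 3$ matrix. If we concentrate, for example, on the $(2, 1)$-entry of $AB$, we single out the second row from $A$ and the first column from $B$, and then we multiply corresponding entries together and add them up, i.e., $4\cdot 1 + 0\cdot 5 = 4$. □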
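The three-step definition translates almost line by line into code. The following Python sketch (function names are our own) builds $AB$ column by column, exactly as in step (3), for $A = [[2, 3], [4, 0]]$ and $B = [[1, 2, 0], [5, -1, 0]]$, the matrices whose column products are computed in Example 1.4.

```python
from typing import List

Matrix = List[List[float]]


def row_times_col(a: List[float], x: List[float]) -> float:
    """Step (1): a 1 x n row times an n x 1 column is the number sum_i a_i x_i."""
    if len(a) != len(x):
        raise ValueError("row length must equal column length")
    return sum(ai * xi for ai, xi in zip(a, x))


def mat_vec(A: Matrix, x: List[float]) -> List[float]:
    """Step (2): the i-th entry of Ax is (row i of A) times x."""
    return [row_times_col(row, x) for row in A]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Step (3): the j-th column of AB is A times the j-th column of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    r = len(B[0])
    columns = [mat_vec(A, [B[k][j] for k in range(n)]) for j in range(r)]
    # Reassemble the r computed columns into an m x r list of rows.
    return [[columns[j][i] for j in range(r)] for i in range(len(A))]


A = [[2, 3], [4, 0]]
B = [[1, 2, 0], [5, -1, 0]]
print(mat_mul(A, B))  # [[17, 1, 0], [4, 8, 0]]
```

The `ValueError` in `mat_mul` corresponds to the remark that $AB$ is undefined when the number of columns of $A$ differs from the number of rows of $B$.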

Note that the product $AB$ of $A$ and $B$ is not defined if the number of columns of $A$ and the number of rows of $B$ are not equal.
Remark: In step (2), we could have defined the product $\mathbf{a}B$ of a $1\times n$ row matrix $\mathbf{a}$ and an $n\times r$ matrix $B$ by the same rule as in step (1). Then, in step (3), an appropriate modification produces the same definition of the product of matrices. We suggest that the reader verify this (see Example 1.6).
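The identity matrix of order $n$, denoted by $I_n$ (or $I$ if the order is clear from the context), is a diagonal matrix whose diagonal entries are all 1, i.e.,
$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{bmatrix}.$$
By a direct computation, one can easily see that $AI_n = A = I_nA$ for any $n\times n$ matrix $A$.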
Many, but not all, of the rules of arithmetic for real or complex numbers also hold for matrices with the operations of scalar multiplication, the sum and the product of matrices. The matrix $0_{m\times n}$ plays the role of the number 0, and $I_n$ that of the number 1 in the set of real numbers.
The rule that does not hold for matrices in general is the commutativity
AB = BA of the product, while the commutativity of the matrix sum
A + B = B + A does hold in general. The following example illustrates the
noncommutativity of the product of matrices.

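Example 1.5 Let $A = \begin{bmatrix} 1 & 0\\ 0 & -1\end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1\\ 1 & 0\end{bmatrix}$. Then,
$$AB = \begin{bmatrix} 0 & 1\\ -1 & 0\end{bmatrix}, \qquad BA = \begin{bmatrix} 0 & -1\\ 1 & 0\end{bmatrix}.$$
Thus the matrices $A$ and $B$ in this example satisfy $AB \neq BA$. □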
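A quick numerical check of noncommutativity, sketched in Python with the entry formula $[AB]_{ij} = \sum_k a_{ik}b_{kj}$. The matrices $A = [[1, 0], [0, -1]]$ and $B = [[0, 1], [1, 0]]$ are our own choice; they reproduce the product $BA$ displayed in Example 1.5. The last line also checks that the identity matrix commutes with $A$.

```python
def mat_mul(A, B):
    """Entry formula [AB]_ij = sum_k a_ik b_kj, with matrices as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 0], [0, -1]]
B = [[0, 1], [1, 0]]
I2 = [[1, 0], [0, 1]]

AB, BA = mat_mul(A, B), mat_mul(B, A)
print(AB)        # [[0, 1], [-1, 0]]
print(BA)        # [[0, -1], [1, 0]]
print(AB == BA)  # False: the matrix product is not commutative
print(mat_mul(A, I2) == A == mat_mul(I2, A))  # True: AI = A = IA
```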

The following theorem lists some rules of ordinary arithmetic that do


hold for matrix operations.

Theorem 1.4 Let $A$, $B$, $C$ be arbitrary matrices for which the matrix operations below are defined, and let $k$ be an arbitrary scalar. Then
(1) $A(BC) = (AB)C$, written as $ABC$ (Associativity),
(2) $A(B + C) = AB + AC$, and $(A + B)C = AC + BC$ (Distributivity),
(3) $IA = A = AI$,
(4) $k(BC) = (kB)C = B(kC)$,
(5) $(AB)^T = B^TA^T$.

Proof: Each equality can be shown by direct calculations of each entry of


both sides of the equalities. We illustrate this by proving (1) only, and leave
the others to the readers.
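Assume that $A = [a_{ij}]$ is an $m\times n$ matrix, $B = [b_{k\ell}]$ is an $n\times p$ matrix, and $C = [c_{st}]$ is a $p\times r$ matrix. We now compute the $(i, j)$-entry of each side of the equation. Note that $BC$ is an $n\times r$ matrix whose $(i, j)$-entry is $[BC]_{ij} = \sum_{\lambda=1}^{p} b_{i\lambda}c_{\lambda j}$. Thus
$$[A(BC)]_{ij} = \sum_{\kappa=1}^{n} a_{i\kappa}[BC]_{\kappa j} = \sum_{\kappa=1}^{n} a_{i\kappa}\sum_{\lambda=1}^{p} b_{\kappa\lambda}c_{\lambda j} = \sum_{\kappa=1}^{n}\sum_{\lambda=1}^{p} a_{i\kappa}b_{\kappa\lambda}c_{\lambda j}.$$
Similarly, $AB$ is an $m\times p$ matrix with the $(i, j)$-entry $[AB]_{ij} = \sum_{\kappa=1}^{n} a_{i\kappa}b_{\kappa j}$, and
$$[(AB)C]_{ij} = \sum_{\lambda=1}^{p}[AB]_{i\lambda}c_{\lambda j} = \sum_{\lambda=1}^{p}\sum_{\kappa=1}^{n} a_{i\kappa}b_{\kappa\lambda}c_{\lambda j} = \sum_{\kappa=1}^{n}\sum_{\lambda=1}^{p} a_{i\kappa}b_{\kappa\lambda}c_{\lambda j}.$$
This clearly shows that $[A(BC)]_{ij} = [(AB)C]_{ij}$ for all $i, j$, and consequently $A(BC) = (AB)C$ as desired. □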
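The proof above is general; a randomized spot check such as the following Python sketch only gathers evidence, but it is a handy way to catch index slips when working through such computations. It tests rules (1) and (5) of Theorem 1.4 on integer matrices (so equality is exact); the seed, the size range 1 to 4 and the helper names are arbitrary choices of ours.

```python
import random


def mat_mul(A, B):
    """[AB]_ij = sum_k a_ik b_kj for an m x n matrix A and an n x p matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """[A^T]_ij = a_ji."""
    return [list(col) for col in zip(*A)]


def random_matrix(m, n, rng):
    return [[rng.randint(-5, 5) for _ in range(n)] for _ in range(m)]


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(100):
    m, n, p, r = (rng.randint(1, 4) for _ in range(4))
    A = random_matrix(m, n, rng)
    B = random_matrix(n, p, rng)
    C = random_matrix(p, r, rng)
    assert mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C)           # rule (1)
    assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))  # rule (5)
print("rules (1) and (5) held for all 100 random samples")
```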

Problem 1.8 Prove or disprove: If $A$ is not a zero matrix and $AB = AC$, then $B = C$.

Problem 1.9 Show that any triangular matrix $A$ satisfying $AA^T = A^TA$ is a diagonal matrix.

Problem 1.10 For a square matrix A, show that


(1) $AA^T$ and $A + A^T$ are symmetric,
(2) $A - A^T$ is skew-symmetric, and
(3) $A$ can be expressed as the sum of the symmetric part $B = \frac{1}{2}(A + A^T)$ and the skew-symmetric part $C = \frac{1}{2}(A - A^T)$, so that $A = B + C$.
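The decomposition in part (3) can be illustrated (not proved) numerically. This Python sketch uses exact fractions for the factor $\frac{1}{2}$; the sample matrix `A` and the helper names are our own choices.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    return A == transpose(A)


def is_skew_symmetric(A):
    return transpose(A) == [[-a for a in row] for row in A]


def sym_skew_parts(A):
    """Return B = (A + A^T)/2 and C = (A - A^T)/2 with exact rational entries."""
    At = transpose(A)
    half = Fraction(1, 2)
    B = [[half * (a + b) for a, b in zip(r, s)] for r, s in zip(A, At)]
    C = [[half * (a - b) for a, b in zip(r, s)] for r, s in zip(A, At)]
    return B, C


A = [[1, 2, 0], [4, 3, -1], [5, 7, 2]]
B, C = sym_skew_parts(A)
print(is_symmetric(B), is_skew_symmetric(C))  # True True
print([[b + c for b, c in zip(rb, rc)] for rb, rc in zip(B, C)] == A)  # True: A = B + C
```

As noted after Definition 1.6, the diagonal of the skew-symmetric part `C` comes out as all zeros.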

As an application of our results on matrix operations, we shall prove the


following important theorem:

Theorem 1.5 Any system of linear equations has either no solution, exactly
one solution, or infinitely many solutions.

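Proof: We have already seen that a system of linear equations may be written as $A\mathbf{x} = \mathbf{b}$, which may have no solution or exactly one solution. Now assume that the system $A\mathbf{x} = \mathbf{b}$ of linear equations has more than one solution, and let $\mathbf{x}_1$ and $\mathbf{x}_2$ be two different solutions, so that $A\mathbf{x}_1 = \mathbf{b}$ and $A\mathbf{x}_2 = \mathbf{b}$. Let $\mathbf{x}_0 = \mathbf{x}_1 - \mathbf{x}_2 \neq \mathbf{0}$. Since $A\mathbf{x}$ is just a particular case of a matrix product, Theorem 1.4 gives us
$$A(\mathbf{x}_1 + k\mathbf{x}_0) = A\mathbf{x}_1 + kA(\mathbf{x}_1 - \mathbf{x}_2) = \mathbf{b} + k(\mathbf{b} - \mathbf{b}) = \mathbf{b}$$
for any real number $k$. This says that $\mathbf{x}_1 + k\mathbf{x}_0$ is also a solution of $A\mathbf{x} = \mathbf{b}$ for any $k$. Since there are infinitely many choices for $k$, and distinct values of $k$ give distinct vectors because $\mathbf{x}_0 \neq \mathbf{0}$, the system $A\mathbf{x} = \mathbf{b}$ has infinitely many solutions. □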

Problem 1.11 For which values of a does each of the following systems have no
solution, exactly one solution, or infinitely many solutions?
(1) $\begin{cases} x + 2y - 3z = 4\\ 3x - y + 5z = 2\\ 4x + y + (a^2 - 14)z = a + 2.\end{cases}$
(2) $\begin{cases} x - y + z = 1\\ x + 3y + az = 2\\ 2x + ay + 3z = 3.\end{cases}$

1.5 Block matrices


In this section we introduce some techniques that will often be very helpful
in manipulating matrices. A submatrix of a matrix A is a matrix obtained
from A by deleting certain rows and/or columns of A. Using a system of
horizontal and vertical lines, we can partition a matrix A into submatrices,
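called blocks, of $A$ as follows: Consider a matrix
$$A = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ \hline a_{31} & a_{32} & a_{33} & a_{34}\end{array}\right],$$
divided up into four blocks by the lines shown. Now, if we write
$$A_{11} = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{bmatrix}, \quad A_{12} = \begin{bmatrix} a_{14}\\ a_{24}\end{bmatrix}, \quad A_{21} = \begin{bmatrix} a_{31} & a_{32} & a_{33}\end{bmatrix}, \quad A_{22} = \begin{bmatrix} a_{34}\end{bmatrix},$$
then $A$ can be written as
$$A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},$$
called a block matrix.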


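The product of matrices partitioned into blocks also follows the matrix product formula, as if the $A_{ij}$ were numbers: if
$$A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix},$$
then
$$AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22}\\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}\end{bmatrix},$$
provided that the number of columns in $A_{ik}$ is equal to the number of rows in $B_{kj}$. This will be true only if the columns of $A$ are partitioned in the same way as the rows of $B$.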
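It is not hard to see that the matrix product by blocks is correct. Suppose, for example, that we have a $3\times 3$ matrix $A$ and partition it as
$$A = \left[\begin{array}{cc|c} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ \hline a_{31} & a_{32} & a_{33}\end{array}\right] = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},$$
and a $3\times 2$ matrix $B$ which we partition as
$$B = \left[\begin{array}{cc} b_{11} & b_{12}\\ b_{21} & b_{22}\\ \hline b_{31} & b_{32}\end{array}\right] = \begin{bmatrix} B_{11}\\ B_{21}\end{bmatrix}.$$
Then the entries of $C = [c_{ij}] = AB$ are
$$c_{ij} = (a_{i1}b_{1j} + a_{i2}b_{2j}) + a_{i3}b_{3j}.$$
The quantity $a_{i1}b_{1j} + a_{i2}b_{2j}$ is simply the $(i, j)$-entry of $A_{11}B_{11}$ if $i \le 2$, and the $(1, j)$-entry of $A_{21}B_{11}$ if $i = 3$. Similarly, $a_{i3}b_{3j}$ is the $(i, j)$-entry of $A_{12}B_{21}$ if $i \le 2$, and the $(1, j)$-entry of $A_{22}B_{21}$ if $i = 3$. Thus $AB$ can be written as
$$AB = \begin{bmatrix} C_{11}\\ C_{21}\end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21}\\ A_{21}B_{11} + A_{22}B_{21}\end{bmatrix}.$$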
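A concrete check of the block formula for the $3\times 3$ by $3\times 2$ partition just described: this Python sketch multiplies the blocks separately, stacks the two block rows, and compares the result with the ordinary product. The numerical entries of `A` and `B` and the helper `block` are our own choices.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def block(M, rows, cols):
    """Submatrix of M keeping the given row and column index ranges."""
    return [[M[i][j] for j in cols] for i in rows]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, -1], [0, 2], [3, 1]]

# Split A after row 2 and after column 2; split B after row 2 to match A's columns.
A11, A12 = block(A, range(2), range(2)), block(A, range(2), range(2, 3))
A21, A22 = block(A, range(2, 3), range(2)), block(A, range(2, 3), range(2, 3))
B11, B21 = block(B, range(2), range(2)), block(B, range(2, 3), range(2))

C_top = mat_add(mat_mul(A11, B11), mat_mul(A12, B21))     # rows 1-2 of AB
C_bottom = mat_add(mat_mul(A21, B11), mat_mul(A22, B21))  # row 3 of AB
print(C_top + C_bottom == mat_mul(A, B))  # True: stacking the block rows gives AB
print(mat_mul(A, B))                      # [[10, 6], [22, 12], [34, 18]]
```

Note that `C_top + C_bottom` is Python list concatenation, i.e., stacking the block rows, not matrix addition.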

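In particular, if an $m\times n$ matrix $A$ is partitioned into blocks of column vectors, i.e., $A = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n]$, where each block $\mathbf{a}_j$ is the $j$-th column, then the product $A\mathbf{x}$ with $\mathbf{x} = [x_1 \cdots x_n]^T$ is the sum of the block matrices (or column vectors) with coefficients $x_j$'s:
$$A\mathbf{x} = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_n]\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n.$$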

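Example 1.6 Let $A$ be an $m\times n$ matrix partitioned into the row vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$ as its blocks, and let $B$ be an $n\times r$ matrix so that their product $AB$ is well-defined. By considering the matrix $B$ as a block, the product $AB$ can be written
$$AB = \begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\\ \vdots\\ \mathbf{a}_m\end{bmatrix}B = \begin{bmatrix}\mathbf{a}_1B\\ \mathbf{a}_2B\\ \vdots\\ \mathbf{a}_mB\end{bmatrix} = \begin{bmatrix}\mathbf{a}_1\mathbf{b}_1 & \mathbf{a}_1\mathbf{b}_2 & \cdots & \mathbf{a}_1\mathbf{b}_r\\ \mathbf{a}_2\mathbf{b}_1 & \mathbf{a}_2\mathbf{b}_2 & \cdots & \mathbf{a}_2\mathbf{b}_r\\ \vdots & \vdots & & \vdots\\ \mathbf{a}_m\mathbf{b}_1 & \mathbf{a}_m\mathbf{b}_2 & \cdots & \mathbf{a}_m\mathbf{b}_r\end{bmatrix},$$
where $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_r$ denote the columns of $B$. Hence, the row vectors of $AB$ are the products of the row vectors of $A$ and $B$.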
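The two block viewpoints of this section, $A\mathbf{x}$ as a combination $x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n$ of the columns of $A$, and the rows of $AB$ as the products $\mathbf{a}_iB$, can be compared with the row-by-row definition in a short Python sketch. The sample data and function names are ours.

```python
def mat_vec(A, x):
    """Ax computed row by row: the i-th entry is sum_j a_ij x_j."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def column_combination(A, x):
    """Ax computed as x_1 a_1 + ... + x_n a_n, where a_j is the j-th column of A."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            result[i] += xj * A[i][j]
    return result


def row_times_matrix(a, B):
    """The row vector aB: its j-th entry is a times the j-th column of B."""
    return [sum(ak * B[k][j] for k, ak in enumerate(a)) for j in range(len(B[0]))]


A = [[1, 2, 0], [-1, 3, 4]]
x = [2, -1, 5]
B = [[1, 0], [2, 1], [0, -3]]
print(mat_vec(A, x), column_combination(A, x))  # [0, 15] [0, 15]
print([row_times_matrix(a, B) for a in A])      # [[5, 2], [5, -9]], the rows of AB
```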

Problem 1.12 Compute AB using block multiplication, where

A =
[
1 2
-~ ~: ~
1 1
~ 1' B=[~
-1
3
!i:].
-21 1

1.6 Inverse matrices


As we saw in Section 1.4, a system of linear equations can be written as $A\mathbf{x} = \mathbf{b}$ in matrix form. This form resembles one of the simplest linear equations, $ax = b$ in one variable, whose solution is simply $x = a^{-1}b$ when $a \neq 0$. Thus it is tempting to write the solution of the system as $\mathbf{x} = A^{-1}\mathbf{b}$. However, in the case of matrices we first have to give a precise meaning to $A^{-1}$. To discuss this we begin with the following definition.

Definition 1.7 For an $m\times n$ matrix $A$, an $n\times m$ matrix $B$ is called a left inverse of $A$ if $BA = I_n$, and an $n\times m$ matrix $C$ is called a right inverse of $A$ if $AC = I_m$.

Example 1.7 From a direct calculation for two matrices

A= [~ ~ ~ 1 - =~ andB = [ -n'
we have AB = 12, and BA = [-5 2-4] h.9 -2
12 -4
6 -:F
9
Thus, the matrix $B$ is a right inverse but not a left inverse of $A$, while $A$ is a left inverse but not a right inverse of $B$. Since $(AB)^T = B^TA^T$ and $I^T = I$, a matrix $A$ has a right inverse if and only if $A^T$ has a left inverse. □

However, if $A$ is a square matrix and has a left inverse, then we will prove later (Theorem 1.8) that it also has a right inverse, and vice versa. Moreover, the following lemma shows that the left inverses and the right inverses of a square matrix are all equal. (This is not true for nonsquare matrices, of course.)

Lemma 1.6 If an $n\times n$ square matrix $A$ has a left inverse $B$ and a right inverse $C$, then $B$ and $C$ are equal, i.e., $B = C$.

Proof: A direct calculation shows that
$$B = BI = B(AC) = (BA)C = IC = C.$$
Now any two left inverses must both be equal to a right inverse $C$, and hence to each other, and any two right inverses must both be equal to a left inverse $B$, and hence to each other. So there exist only one left inverse and only one right inverse for a square matrix $A$ if it is known that $A$ has both left and right inverses. Furthermore, the left and right inverses are equal. □

This lemma says that if a matrix $A$ has both a right inverse and a left inverse, then they must be the same. However, we shall see in Chapter 3 that an $m\times n$ matrix $A$ with $m \neq n$ cannot have both a right inverse and a left inverse: that is, a nonsquare matrix may have only a left inverse or only a right inverse. In this case, the matrix may have many left inverses or many right inverses.

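Example 1.8 A nonsquare matrix $A = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 0\end{bmatrix}$ can have more than one left inverse. In fact, for any $x, y \in \mathbb{R}$, one can easily check that the matrix $B = \begin{bmatrix} 1 & 0 & x\\ 0 & 1 & y\end{bmatrix}$ is a left inverse of $A$. □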
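A small Python check that a nonsquare matrix can have many left inverses and no right inverse. It assumes the $3\times 2$ matrix $A = [[1, 0], [0, 1], [0, 0]]$ and the family $B(x, y) = [[1, 0, x], [0, 1, y]]$ (our reading of Example 1.8); the sample values of $x$ and $y$ and the helper names are arbitrary.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 0], [0, 1], [0, 0]]  # a 3 x 2 matrix


def left_inverse(x, y):
    """A 2 x 3 matrix B with BA = I_2, for any choice of x and y."""
    return [[1, 0, x], [0, 1, y]]


for x, y in [(0, 0), (3, -2), (7, 1)]:
    B = left_inverse(x, y)
    assert mat_mul(B, A) == identity(2)  # B is a left inverse of A
    assert mat_mul(A, B) != identity(3)  # but not a right inverse: AB has a zero row
print("three different left inverses, none of them a right inverse")
```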

Definition 1.8 An $n\times n$ square matrix $A$ is said to be invertible (or nonsingular) if there exists a square matrix $B$ of the same size such that
$$AB = I = BA.$$
Such a matrix $B$ is called the inverse of $A$, and is denoted by $A^{-1}$. A matrix $A$ is said to be singular if it is not invertible.

Note that Lemma 1.6 shows that if a square matrix $A$ has both left and right inverses, then they coincide and are unique. That is why we call $B$ "the" inverse of $A$. For instance, consider a $2\times 2$ matrix $A = \begin{bmatrix} a & b\\ c & d\end{bmatrix}$. If $ad - bc \neq 0$, then it is easy to verify that
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b\\ -c & a\end{bmatrix} = \begin{bmatrix} \frac{d}{ad - bc} & \frac{-b}{ad - bc}\\ \frac{-c}{ad - bc} & \frac{a}{ad - bc}\end{bmatrix},$$
since $AA^{-1} = I_2 = A^{-1}A$. (Check this product of matrices for practice!)
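The $2\times 2$ formula can be checked with exact rational arithmetic. This is a minimal Python sketch; `inverse_2x2` is our own helper name, and the sample matrix (with $ad - bc = -2$) is arbitrary.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] as (1/(ad - bc)) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the matrix is singular")
    s = Fraction(1) / det  # exact 1/(ad - bc), avoiding floating-point error
    return [[s * d, -s * b], [-s * c, s * a]]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 2], [3, 4]]   # ad - bc = -2
Ainv = inverse_2x2(A)  # equals [[-2, 1], [3/2, -1/2]]
print(mat_mul(A, Ainv) == [[1, 0], [0, 1]] == mat_mul(Ainv, A))  # True
```

Exact fractions are used so that the check $AA^{-1} = I_2 = A^{-1}A$ is an exact equality rather than a floating-point approximation.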


Note that any zero matrix is singular.

Problem 1.13 Let $A$ be an invertible matrix and $k$ any nonzero scalar. Show that
(1) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$;
(2) the matrix $kA$ is invertible and $(kA)^{-1} = \frac{1}{k}A^{-1}$;
(3) $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.

Theorem 1.7 The product of invertible matrices is also invertible, and its inverse is the product of the individual inverses in reverse order:
$$(AB)^{-1} = B^{-1}A^{-1}.$$
Proof: Suppose that $A$ and $B$ are invertible matrices of the same size. Then $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$, and similarly $(B^{-1}A^{-1})(AB) = I$. Thus $AB$ has the inverse $B^{-1}A^{-1}$. □

We have written the inverse of $A$ as "$A$ to the power $-1$", so we can give a meaning to $A^k$ for any integer $k$: Let $A$ be a square matrix. Define $A^0 = I$. Then, for any positive integer $k$, we define the power $A^k$ of $A$ inductively as
$$A^k = A(A^{k-1}).$$
Moreover, if $A$ is invertible, then the negative integer power is defined as
$$A^{-k} = (A^{-1})^k \quad \text{for } k > 0.$$
It is easy to check that with these rules we have $A^{k+\ell} = A^kA^\ell$ whenever the right-hand side is defined. (If $A$ is not invertible, $A^{3+(-1)}$ is defined but $A^{-1}$ is not.)
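The inductive definition of integer powers can be sketched as follows. To keep the example self-contained, it is restricted to $2\times 2$ matrices so that $A^{-1}$ can use the $ad - bc$ formula; that restriction, the sample matrix and the helper names are our own.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A is singular, so negative powers are undefined")
    s = Fraction(1) / det
    return [[s * d, -s * b], [-s * c, s * a]]


def power(A, k):
    """A^0 = I, A^k = A^(k-1) A for k > 0, and A^(-k) = (A^(-1))^k (2 x 2 only)."""
    if k < 0:
        return power(inverse_2x2(A), -k)
    result = [[1, 0], [0, 1]]  # A^0 = I
    for _ in range(k):
        result = mat_mul(result, A)
    return result


A = [[1, 1], [0, 1]]
print(power(A, 3))                                      # [[1, 3], [0, 1]]
print(power(A, -2) == [[1, -2], [0, 1]])                # True
print(mat_mul(power(A, 3), power(A, -1)) == power(A, 2))  # True: A^(3+(-1)) = A^3 A^(-1)
```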

Problem 1.14 Prove:


(1) If A has a zero row, so does AB.
(2) If B has a zero column, so does AB.
(3) Any matrix with a zero row or a zero column cannot be invertible.

Problem 1.15 Let $A$ be an invertible matrix. Is it true that $(A^k)^T = (A^T)^k$ for any integer $k$? Justify your answer.

1.7 Elementary matrices


We now return to the system of linear equations $A\mathbf{x} = \mathbf{b}$. If $A$ has a right inverse $B$ such that $AB = I_m$, then $\mathbf{x} = B\mathbf{b}$ is a solution of the system since
$$A\mathbf{x} = A(B\mathbf{b}) = (AB)\mathbf{b} = \mathbf{b}.$$
In particular, if $A$ is an invertible square matrix, then it has only one inverse $A^{-1}$ by Lemma 1.6, and $\mathbf{x} = A^{-1}\mathbf{b}$ is the only solution of the system. In this section, we discuss how to compute $A^{-1}$ when $A$ is invertible.
Recall that Gaussian elimination is a process in which the augmented matrix is transformed into its row-echelon form by a finite number of elementary row operations. In the following, we will show that each elementary row operation can be expressed as a nonsingular matrix, called an elementary matrix, and hence the process of Gaussian elimination simply multiplies the augmented matrix on the left by a finite sequence of corresponding elementary matrices.

Definition 1.9 A matrix $E$ obtained from the identity matrix $I_n$ by executing only one elementary row operation is called an elementary matrix.

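For example, the following matrices are three elementary matrices corresponding to each type of the three elementary row operations.
(1) $\begin{bmatrix} 1 & 0\\ 0 & -5\end{bmatrix}$: multiply the second row of $I_2$ by $-5$;
(2) $\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\end{bmatrix}$: interchange the second and the fourth rows of $I_4$;
(3) $\begin{bmatrix} 1 & 0 & 3\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$: add 3 times the third row to the first row of $I_3$.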

It is an interesting fact that, if $E$ is an elementary matrix obtained by executing a certain elementary row operation on the identity matrix $I_m$, then for any $m\times n$ matrix $A$, the product $EA$ is exactly the matrix that is obtained when the same elementary row operation is executed on $A$. The following example illustrates this fact. (Note that $AE$ is not what we want. For this, see Problem 1.17.)

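Example 1.9 For simplicity, we work on a $3\times 1$ column matrix $\mathbf{b} = [b_1\ b_2\ b_3]^T$. Suppose that we want to do the operation "adding $(-2)\times$ the first row to the second row" on the matrix $\mathbf{b}$. Then, we execute this operation on the identity matrix $I$ first to get an elementary matrix $E$:
$$E = \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}.$$
Multiplying the elementary matrix $E$ to $\mathbf{b}$ on the left produces the desired result:
$$E\mathbf{b} = \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix} b_1\\ b_2\\ b_3\end{bmatrix} = \begin{bmatrix} b_1\\ b_2 - 2b_1\\ b_3\end{bmatrix}.$$
Similarly, the operation "interchanging the first and third rows" on the matrix $\mathbf{b}$ can be achieved by multiplying a permutation matrix $P$, which is an elementary matrix obtained from $I_3$ by interchanging two rows, to $\mathbf{b}$ on the left:
$$P\mathbf{b} = \begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0\end{bmatrix}\begin{bmatrix} b_1\\ b_2\\ b_3\end{bmatrix} = \begin{bmatrix} b_3\\ b_2\\ b_1\end{bmatrix}.$$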

Recall that each elementary row operation has an inverse operation, which is also an elementary operation, that brings the matrix back to the original one. Thus, suppose that $E$ denotes an elementary matrix corresponding to an elementary row operation, and let $E'$ be the elementary matrix corresponding to its "inverse" elementary row operation. Then,
(1) if $E$ multiplies a row by $c \neq 0$, then $E'$ multiplies the same row by $\frac{1}{c}$;
(2) if $E$ interchanges two rows, then $E'$ interchanges them again;
(3) if $E$ adds a multiple of one row to another, then $E'$ subtracts it back from the same row.
Thus, for any $m\times n$ matrix $A$, $E'EA = A$, and $E'E = I = EE'$. That is, every elementary matrix is invertible and $E^{-1} = E'$, which is also an elementary matrix.
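Both facts, that $EA$ performs $E$'s row operation on $A$ and that the inverse operation yields $E^{-1}$, can be checked in a short Python sketch. The row-operation helpers, the sample matrix and the 0-based row indices are our own conventions (the book numbers rows from 1).

```python
from fractions import Fraction


def identity(n):
    """The n x n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """[AB]_ij = sum_k a_ik b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# The three elementary row operations; rows are indexed from 0 here.
# Each returns a new matrix and leaves its argument unchanged.
def scale_row(M, i, c):
    """Type (1): multiply row i by the nonzero scalar c."""
    M = [row[:] for row in M]
    M[i] = [c * v for v in M[i]]
    return M


def swap_rows(M, i, j):
    """Type (2): interchange rows i and j."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def add_multiple(M, c, src, dst):
    """Type (3): add c times row src to row dst."""
    M = [row[:] for row in M]
    M[dst] = [d + c * s for d, s in zip(M[dst], M[src])]
    return M


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
I3 = identity(3)

# Apply each operation to I_3 to get E, P, S; then EA equals the operation applied to A.
E = add_multiple(I3, -2, 0, 1)  # (-2) x (book's row 1) added to (book's row 2)
P = swap_rows(I3, 0, 2)         # interchange the book's rows 1 and 3
S = scale_row(I3, 1, 5)         # multiply the book's row 2 by 5
print(mat_mul(E, A) == add_multiple(A, -2, 0, 1))  # True
print(mat_mul(P, A) == swap_rows(A, 0, 2))         # True
print(mat_mul(S, A) == scale_row(A, 1, 5))         # True

# The inverse operations give the inverse matrices: E'E = I = EE'.
E_inv = add_multiple(I3, 2, 0, 1)         # subtract the multiple back
S_inv = scale_row(I3, 1, Fraction(1, 5))  # multiply by 1/c
print(mat_mul(E_inv, E) == I3 == mat_mul(E, E_inv))  # True
print(mat_mul(P, P) == I3)                           # True: P undoes itself
print(mat_mul(S_inv, S) == I3)                       # True
```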
For instance, if

Definition 1.10 A permutation matrix is a square matrix obtained from


the identity matrix by permuting the rows.

Problem 1.16 Prove:


(1) A permutation matrix is the product of a finite number of elementary matrices
each of which is corresponding to the "row-interchanging" elementary row
operation.
(2) Any permutation matrix $P$ is invertible and $P^{-1} = P^T$.
(3) The product of any two permutation matrices is a permutation matrix.
(4) The transpose of a permutation matrix is also a permutation matrix.
Problem 1.17 Define the elementary column operations for a matrix by just
replacing "row" by "column" in the definition of the elementary row operations.
Show that if A is an m x n matrix and if E is an elementary matrix obtained by
executing an elementary column operation on In, then AE is exactly the matrix
that is obtained from A when the same column operation is executed on A.

The next theorem establishes some fundamental relationships between


n x n square matrices and systems of n linear equations in n unknowns.

Theorem 1.8 Let A be an n x n matrix. The following are equivalent:


(1) A has a left inverse;
(2) Ax = 0 has only the trivial solution x = 0;
(3) A is row-equivalent to In;
(4) A is a product of elementary matrices;
(5) A is invertible;
(6) A has a right inverse.

Proof: (1) ⇒ (2): Let $\mathbf{x}$ be a solution of the homogeneous system $A\mathbf{x} = \mathbf{0}$, and let $B$ be a left inverse of $A$. Then
$$\mathbf{x} = I_n\mathbf{x} = (BA)\mathbf{x} = B(A\mathbf{x}) = B\mathbf{0} = \mathbf{0}.$$
(2) ⇒ (3): Suppose that the homogeneous system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{x} = \mathbf{0}$:
$$x_1 = 0, \quad x_2 = 0, \quad \ldots, \quad x_n = 0.$$
This means that the augmented matrix $[A\ \mathbf{0}]$ of the system $A\mathbf{x} = \mathbf{0}$ is reduced to $[I_n\ \mathbf{0}]$ by Gauss-Jordan elimination. Hence, $A$ is row-equivalent to $I_n$.
(3) ⇒ (4): Assume $A$ is row-equivalent to $I_n$, so that $A$ can be reduced to $I_n$ by a finite sequence of elementary row operations. Thus, we can find elementary matrices $E_1, E_2, \ldots, E_k$ such that
$$E_k\cdots E_2E_1A = I_n.$$
Since $E_1, E_2, \ldots, E_k$ are invertible, by multiplying both sides of this equation on the left successively by $E_k^{-1}, \ldots, E_2^{-1}, E_1^{-1}$, we obtain
$$A = E_1^{-1}E_2^{-1}\cdots E_k^{-1}I_n = E_1^{-1}E_2^{-1}\cdots E_k^{-1},$$
which expresses $A$ as a product of elementary matrices, since the inverse of an elementary matrix is again an elementary matrix.
(4) ⇒ (5) is trivial, because any elementary matrix is invertible and a product of invertible matrices is invertible (Theorem 1.7). In fact, $A^{-1} = E_k\cdots E_2E_1$.
(5) ⇒ (1) and (5) ⇒ (6) are trivial.
(6) ⇒ (5): If $B$ is a right inverse of $A$, then $A$ is a left inverse of $B$, and we can apply (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (5) to $B$ and conclude that $B$ is invertible, with $A$ as its unique inverse. That is, $B$ is the inverse of $A$, and so $A$ is invertible. □

This theorem shows that a square matrix is invertible if it has a one-sided inverse. In particular, if a square matrix $A$ is invertible, then $\mathbf{x} = A^{-1}\mathbf{b}$ is the unique solution to the system $A\mathbf{x} = \mathbf{b}$.

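Problem 1.18 Find the inverse of the product
$$\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -c & 1\end{bmatrix}\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ -b & 0 & 1\end{bmatrix}\begin{bmatrix} 1 & 0 & 0\\ -a & 1 & 0\\ 0 & 0 & 1\end{bmatrix}.$$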

As an application of the preceding theorem, we give a practical method for finding the inverse $A^{-1}$ of an invertible $n\times n$ matrix $A$. If $A$ is invertible, there are elementary matrices $E_1, E_2, \ldots, E_k$ such that
$$E_k\cdots E_2E_1A = I_n.$$
Hence,
$$A^{-1} = E_k\cdots E_2E_1 = E_k\cdots E_2E_1I_n.$$
It follows that the sequence of row operations that reduces an invertible matrix $A$ to $I_n$ will transform $I_n$ into $A^{-1}$. In other words, let $[A \mid I]$ be the augmented matrix with the columns of $A$ on the left half and the columns of $I$ on the right half. A Gaussian elimination, applied to both sides by some elementary row operations, reduces the augmented matrix $[A \mid I]$ to $[U \mid K]$, where $U$ is a row-echelon form of $A$. Next, the back substitution process, by another series of elementary row operations, reduces $[U \mid K]$ to $[I \mid A^{-1}]$:
$$[A \mid I] \to [E_\ell\cdots E_1A \mid E_\ell\cdots E_1I] = [U \mid K] \to [F_k\cdots F_1U \mid F_k\cdots F_1K] = [I \mid A^{-1}],$$
where $E_\ell\cdots E_1$ represents a Gaussian elimination and $F_k\cdots F_1$ represents the back substitution. The following example illustrates the computation of an inverse matrix.

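Example 1.10 Find the inverse of
$$A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 3 & 5\\ 1 & 0 & 2\end{bmatrix}.$$
We apply Gauss-Jordan elimination to
$$[A \mid I] = \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0\\ 2 & 3 & 5 & 0 & 1 & 0\\ 1 & 0 & 2 & 0 & 0 & 1\end{array}\right] \xrightarrow[(-1)\text{row } 1 + \text{row } 3]{(-2)\text{row } 1 + \text{row } 2} \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0\\ 0 & -1 & -1 & -2 & 1 & 0\\ 0 & -2 & -1 & -1 & 0 & 1\end{array}\right]$$
$$\xrightarrow{(-1)\text{row } 2} \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0\\ 0 & 1 & 1 & 2 & -1 & 0\\ 0 & -2 & -1 & -1 & 0 & 1\end{array}\right] \xrightarrow{(2)\text{row } 2 + \text{row } 3} \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0\\ 0 & 1 & 1 & 2 & -1 & 0\\ 0 & 0 & 1 & 3 & -2 & 1\end{array}\right].$$
This is $[U \mid K]$ obtained by Gaussian elimination. Now continue the back substitution to reduce $[U \mid K]$ to $[I \mid A^{-1}]$:
$$[U \mid K] \xrightarrow[(-3)\text{row } 3 + \text{row } 1]{(-1)\text{row } 3 + \text{row } 2} \left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -8 & 6 & -3\\ 0 & 1 & 0 & -1 & 1 & -1\\ 0 & 0 & 1 & 3 & -2 & 1\end{array}\right] \xrightarrow{(-2)\text{row } 2 + \text{row } 1} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -6 & 4 & -1\\ 0 & 1 & 0 & -1 & 1 & -1\\ 0 & 0 & 1 & 3 & -2 & 1\end{array}\right] = [I \mid A^{-1}].$$
Thus, we get
$$A^{-1} = \begin{bmatrix} -6 & 4 & -1\\ -1 & 1 & -1\\ 3 & -2 & 1\end{bmatrix}.$$
(The reader should verify that $AA^{-1} = I = A^{-1}A$.) □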
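The method of this example can be sketched in Python with exact fractions. The sketch uses a common one-pass variant of Gauss-Jordan elimination: it clears each pivot column above and below the pivot at once, instead of the two phases $[A \mid I] \to [U \mid K] \to [I \mid A^{-1}]$ used in the text, and it swaps in a lower row when a pivot is zero. The intermediate matrices therefore differ from those shown above, but the final right half is the same $A^{-1}$. The function name `inverse` is ours.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below `col` with a nonzero pivot; swap it up if needed.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("a zero row appears on the left: A is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The left half is now I; the right half is A^{-1}.
    return [row[n:] for row in M]


A = [[1, 2, 3], [2, 3, 5], [1, 0, 2]]
print(inverse(A) == [[-6, 4, -1], [-1, 1, -1], [3, -2, 1]])  # True
```

For a singular input such as `[[1, 2], [2, 4]]` the pivot search fails in the second column and the function raises `ValueError`, which matches the remark that a zero row shows up during elimination when $A$ is not invertible.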

Note that if $A$ is not invertible, then, at some step in Gaussian elimination, a zero row will show up on the left side of $[U \mid K]$.

Problem 1.19 Write $A^{-1}$ as a product of elementary matrices for $A$ in Example 1.10.

of A by using Gaussian elimination.

Problem 1.20 Find the inverse of each of the following matrices:


0 0

nc~ [l ~ 1(k~O)
1 0 0
-1
A~ [ -:
-6
0
4 11
q'B~ 1 2 0
1 2 4
1 2 4
k 0
1 k
0 1
Problem 1.21 When is a diagonal matrix
$$D = \begin{bmatrix} d_1 & & 0\\ & \ddots & \\ 0 & & d_n\end{bmatrix}$$
nonsingular, and what is $D^{-1}$?

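From Theorem 1.8, a square matrix $A$ is nonsingular if and only if $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. That is, a square matrix $A$ is singular if and only if $A\mathbf{x} = \mathbf{0}$ has a nontrivial solution, say $\mathbf{x}_0$. Now, for any column vector $\mathbf{b} = [b_1 \cdots b_n]^T$, if $\mathbf{x}_1$ is a solution of $A\mathbf{x} = \mathbf{b}$ for a singular matrix $A$, then so is $k\mathbf{x}_0 + \mathbf{x}_1$ for any $k$:
$$A(k\mathbf{x}_0 + \mathbf{x}_1) = k(A\mathbf{x}_0) + A\mathbf{x}_1 = k\mathbf{0} + \mathbf{b} = \mathbf{b}.$$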

This argument strengthens Theorem 1.5 as follows when A is a square


matrix:

Theorem 1.9 If $A$ is an invertible $n\times n$ matrix, then for any column vector $\mathbf{b} = [b_1 \cdots b_n]^T$, the system $A\mathbf{x} = \mathbf{b}$ has exactly one solution $\mathbf{x} = A^{-1}\mathbf{b}$. If $A$ is not invertible, then the system has either no solution or infinitely many solutions, according to whether the system is inconsistent or consistent. □

Problem 1.22 Write the system of linear equations
$$\begin{cases} x + 2y + 2z = 10\\ 2x - 2y + 3z = 1\\ 4x - 3y + 5z = 4\end{cases}$$
in matrix form $A\mathbf{x} = \mathbf{b}$ and solve it by finding $A^{-1}\mathbf{b}$.

1.8 LDU factorization


Recall that a basic method of solving a linear system Ax = b is by Gauss-
Jordan elimination. For a fixed matrix A, if we want to solve more than one
system Ax = b for various values of b, then the same Gaussian elimination
on A has to be repeated over and over again. However, this repetition may
be avoided by expressing Gaussian elimination as an invertible matrix which
is a product of elementary matrices.
We first assume that no permutations of rows are necessary throughout the whole process of Gaussian elimination on $[A\ \mathbf{b}]$. Then the forward elimination just multiplies the augmented matrix $[A\ \mathbf{b}]$ on the left by finitely many elementary matrices $E_k, \ldots, E_1$: that is,
$$E_k\cdots E_1[A\ \mathbf{b}] = [E_k\cdots E_1A\ \ E_k\cdots E_1\mathbf{b}] = [U\ \mathbf{c}],$$
where each $E_i$ is a lower triangular elementary matrix whose diagonal entries are all 1's and $[U\ \mathbf{c}]$ is the augmented matrix of the system obtained after forward elimination on $A\mathbf{x} = \mathbf{b}$. (Note that $U$ need not be an upper triangular matrix if $A$ is not a square matrix.) Therefore, if we set $L = (E_k\cdots E_1)^{-1} = E_1^{-1}\cdots E_k^{-1}$, then $A = LU$ and
$$\mathbf{c} = U\mathbf{x} = E_k\cdots E_1A\mathbf{x} = E_k\cdots E_1\mathbf{b} = L^{-1}\mathbf{b}.$$
Note that $L$ is a lower triangular matrix whose diagonal entries are all 1's (see Problem 1.24). Now, for any column matrix $\mathbf{b}$, the system $A\mathbf{x} = LU\mathbf{x} = \mathbf{b}$ can be solved in two steps: first compute $\mathbf{c} = L^{-1}\mathbf{b}$, which is a forward elimination, and then solve $U\mathbf{x} = \mathbf{c}$ by back substitution.
This means that, to solve the $\ell$ systems $A\mathbf{x} = \mathbf{b}_i$ for $i = 1, \ldots, \ell$, we first find the matrices $L$ and $U$ such that $A = LU$ by performing forward elimination on $A$, and then compute $\mathbf{c}_i = L^{-1}\mathbf{b}_i$ for $i = 1, \ldots, \ell$. The solutions of $A\mathbf{x} = \mathbf{b}_i$ are now those of $U\mathbf{x} = \mathbf{c}_i$.
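A minimal Python sketch of the factor-once, solve-many idea, assuming a square $A$ whose pivots are all nonzero so that no row interchanges are needed (the text also allows nonsquare $A$, which this sketch does not handle). Each elimination step subtracts $m$ times row $k$ from row $i$; undoing that step puts $+m$ in position $(i, k)$, which is why $L = (E_k\cdots E_1)^{-1}$ can simply record the multipliers. The matrix `A`, the right-hand sides and the function names are our own.

```python
from fractions import Fraction


def lu(A):
    """Factor a square A as A = LU by forward elimination without row interchanges.

    L is unit lower triangular and stores the multipliers; U is upper triangular.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]  # row i <- row i - m * row k
            L[i][k] = m            # undoing that step puts +m into L
            U[i] = [a - m * b for a, b in zip(U[i], U[k])]
    return L, U


def forward_sub(L, b):
    """Solve Lc = b for unit lower triangular L, i.e. c = L^{-1} b."""
    c = []
    for i, row in enumerate(L):
        c.append(b[i] - sum(row[j] * c[j] for j in range(i)))
    return c


def back_sub(U, c):
    """Solve Ux = c for upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = c[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / U[i][i]
    return x


A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = lu(A)  # the elimination on A is done only once ...
# ... and then reused for every right-hand side b.
solutions = [back_sub(U, forward_sub(L, b)) for b in ([4, 10, 24], [1, 1, 1])]
print(solutions == [[1, 1, 1], [1, -1, 0]])  # True
```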
Example 1.11 Consider the system of linear equations

Ax = [ ~ ~ ~ ~ 1[ ~~ 1 [ -~ 1
-2 2 1 1 X3 7
= b.

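The elementary matrices for Gaussian elimination of $A$ are easily found to be
$$E_1 = \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 1 & 0 & 1\end{bmatrix}, \quad \text{and} \quad E_3 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 3 & 1\end{bmatrix},$$
so that $E_3E_2E_1A = U$.
Note that $U$ is the matrix obtained from $A$ after forward elimination, and $A = LU$ with
$$L = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ -1 & -3 & 1\end{bmatrix},$$
which is a lower triangular matrix with 1's on the diagonal. Now, the system
$$L\mathbf{c} = \mathbf{b}: \quad \begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ -1 & -3 & 1\end{bmatrix}\begin{bmatrix} c_1\\ c_2\\ c_3\end{bmatrix} = \begin{bmatrix} 1\\ -2\\ 7\end{bmatrix}$$
resolves to $\mathbf{c} = (1, -4, -4)$ and the system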

1
Ux=c: -4
-4

resolves to

x=
[
~: 3~ 1 [ -~ 1+ [~ 1
- 1- t 1 t -1 '
t 0 1
for $t \in \mathbb{R}$. It is suggested that the reader find the solutions for various values of $\mathbf{b}$. □

Problem 1.23 Determine an LU decomposition of the matrix

A = [-~o -1-~ -~2 1'


and then find solutions of $A\mathbf{x} = \mathbf{b}$ for (1) $\mathbf{b} = [1\ 1\ 1]^T$ and (2) $\mathbf{b} = [2\ 0\ -1]^T$.
Problem 1.24 Let A, B be two lower triangular matrices. Prove that
(1) their product is also a lower triangular matrix;
(2) if A is invertible, then its inverse is also a lower triangular matrix;
(3) if the diagonal entries are all 1's, then the same holds for their product and their inverses.
Note that the same holds for upper triangular matrices, and for the product of more
than two matrices.

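Now suppose that $A$ is a nonsingular square matrix with $A = LU$ in which no row interchanges were necessary. Then the pivots on the diagonal of $U$ are all nonzero, and the diagonal entries of $L$ are all 1's. Thus, by dividing each $i$-th row of $U$ by the nonzero pivot $d_i$, the matrix $U$ is factorized into a diagonal matrix $D$ whose diagonal entries are just the pivots $d_1, d_2, \ldots, d_n$ and a new upper triangular matrix, denoted again by $U$, whose diagonal entries are all 1's, so that $A = LDU$. For example,
$$\begin{bmatrix} d_1 & u_{12} & \cdots & u_{1n}\\ 0 & d_2 & \cdots & u_{2n}\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & d_n\end{bmatrix} = \begin{bmatrix} d_1 & 0 & \cdots & 0\\ 0 & d_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & d_n\end{bmatrix}\begin{bmatrix} 1 & u_{12}/d_1 & \cdots & u_{1n}/d_1\\ 0 & 1 & \cdots & u_{2n}/d_2\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{bmatrix}.$$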

This decomposition of $A$ is called the LDU factorization of $A$. Note that, in this factorization, $U$ is just a row-echelon form of $A$ (with leading 1's on the diagonal) after Gaussian elimination and before back substitution.
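The LDU factorization can be read off from $LU$ by pulling the pivots out of $U$ row by row. This Python sketch again assumes no row interchanges are needed. For the symmetric test matrix (our own choice), it also exhibits $U = L^T$, i.e., $A = LDL^T$, as the uniqueness result (Theorem 1.10) predicts for symmetric matrices.

```python
from fractions import Fraction


def lu(A):
    """A = LU by forward elimination without row interchanges."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            L[i][k] = m
            U[i] = [a - m * b for a, b in zip(U[i], U[k])]
    return L, U


def ldu(A):
    """Return L, D, U with A = LDU: D holds the pivots, U has 1's on its diagonal."""
    L, U = lu(A)
    n = len(A)
    D = [[U[i][i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    U1 = [[U[i][j] / U[i][i] for j in range(n)] for i in range(n)]  # row i divided by d_i
    return L, D, U1


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    return [list(col) for col in zip(*M)]


S = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]  # symmetric: S = S^T
L, D, U = ldu(S)
print([D[i][i] for i in range(3)])     # [Fraction(2, 1), Fraction(3, 2), Fraction(4, 3)]
print(mat_mul(mat_mul(L, D), U) == S)  # True: S = LDU
print(U == transpose(L))               # True: U = L^T, so S = L D L^T
```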
In Example 1.11, we found a factorization of A as

A= [
-1
~ -3~ ~]1 [~0 -~0-4-~].
This can be further factored as A = LDU by taking

[
~ _~ _~] = [~_~ ~] [~
o 0 -4 0 0 -4 0
1i2
0
1~12] = DU.

Suppose now that during forward elimination row interchanges are necessary. In this case, we can first do all the row interchanges before doing any other type of elementary row operations, since the interchange of rows can be done at any time, before or after the other operations, with the same effect on the solution. Those "row-interchanging" elementary matrices altogether form a permutation matrix $P$ so that no more row interchanges are needed during Gaussian elimination of $PA$. So $PA$ has an LDU factorization.

Example 1.12 Consider a square matrix A = [~ i ~]. For Gaussian

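elimination, it is clearly necessary to interchange the first row with the third row, that is, we need to multiply the permutation matrix
$$P = \begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0\end{bmatrix}$$
to $A$ on the left, and then $PA$ has an LDU factorization. □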
Of course, if we choose a different permutation $P'$, then the LDU factorization of $P'A$ may be different from that of $PA$, even if there is another permutation matrix $P''$ that changes $P'A$ to $PA$. However, if we fix a permutation matrix $P$ when it is necessary, the uniqueness of the LDU factorization of $A$ can be proved.

Theorem 1.10 For an invertible matrix A, the LDU factorization of A is


unique up to a permutation: that is, for a fixed P the expression P A = LDU
is unique.

Proof: Suppose that $A = L_1D_1U_1 = L_2D_2U_2$, where the $L$'s are lower triangular, the $U$'s are upper triangular, all with 1's on the diagonal, and the $D$'s are diagonal matrices with no zeros on the diagonal. We need to show $L_1 = L_2$, $D_1 = D_2$, and $U_1 = U_2$.
Note that the inverse of a lower (upper) triangular matrix is also a lower (upper) triangular matrix, and the inverse of a diagonal matrix is also diagonal. Therefore, by multiplying by $(L_1D_1)^{-1} = D_1^{-1}L_1^{-1}$ on the left and by $U_2^{-1}$ on the right, our equation $L_1D_1U_1 = L_2D_2U_2$ becomes
$$U_1U_2^{-1} = D_1^{-1}L_1^{-1}L_2D_2.$$
The left side is an upper triangular matrix, while the right side is a lower triangular matrix. Hence, both sides must be diagonal. However, since the diagonal entries of the upper triangular matrix $U_1U_2^{-1}$ are all 1's, it must be the identity matrix $I$ (see Problem 1.24). Thus $U_1U_2^{-1} = I$, i.e., $U_1 = U_2$. Similarly, $L_1^{-1}L_2 = D_1D_2^{-1}$ implies that $L_1 = L_2$ and $D_1 = D_2$. □

In particular, if $A$ is symmetric (i.e., $A = A^T$), and if it can be factored into $A = LDU$ without row interchanges, then we have
$$LDU = A = A^T = (LDU)^T = U^TD^TL^T = U^TDL^T,$$
and thus, by the uniqueness of factorizations, we have $U = L^T$ and $A = LDL^T$.

Problem 1.25 Find the factors $L$, $D$, and $U$ for A ~ [ - ~ ~~ - ~ 1

f1T
What is the solution to $A\mathbf{x} = \mathbf{b}$ for $\mathbf{b} = [1\ 0\ -1]^T$?

Problem 1.26 For a permutation matrix $P$, find the LDU factorization



1.9 Application: Linear models


(1) In an electrical network, a simple current flow may be illustrated by a
diagram like the one below. Such a network involves only voltage sources,
like batteries, and resistors, like bulbs, motors, or refrigerators. The voltage
is measured in volts, the resistance in ohms, and the current flow in amperes
(amps, in short). For such an electrical network, current flow is governed by
the following three laws:

• Ohm's Law: The voltage drop V across a resistor is the product of


the current I and the resistance R: V = I R.
• Kirchhoff's Current Law (KCL): The current flow into a node
equals the current flow out of the node.
• Kirchhoff's Voltage Law (KVL): The algebraic sum of the voltage
drops around a closed loop equals the total voltage sources in the loop.

Example 1.13 Determine the currents in the network given in the above
figure.

[Figure: a two-loop network with nodes P and Q; the labels are 2 ohms, 2 ohms, 3 ohms, 2 ohms, 1 ohm, 1 ohm, and an 18-volt source.]

Solution: By applying KCL to nodes P and Q, we get the equations

$$I_1 + I_3 = I_2 \quad \text{at } P, \qquad I_2 = I_1 + I_3 \quad \text{at } Q.$$
Observe that both equations are the same, and one of them is redundant.
By applying KVL to each loop of the network in the clockwise direction,
1.9. APPLICATION: LINEAR MODELS 39

we get

$$6I_1 + 2I_2 = 0 \quad \text{from the left loop,}$$
$$2I_2 + 3I_3 = 18 \quad \text{from the right loop.}$$

Collecting all the equations, we get a system of linear equations:

$$\begin{cases} I_1 - I_2 + I_3 = 0, \\ 6I_1 + 2I_2 = 0, \\ 2I_2 + 3I_3 = 18. \end{cases}$$

By solving it, the currents are $I_1 = -1$ amp, $I_2 = 3$ amps, and $I_3 = 4$ amps. The negative sign for $I_1$ means that the current $I_1$ flows in the
direction opposite to that shown in the figure. □
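The system above is small enough to solve by hand, but the Gauss-Jordan steps can also be sketched in Python. This is an illustrative sketch: the helper `solve` is hypothetical (not from the text), and it uses exact fractions. It simply takes the first nonzero entry in each column as the pivot, so it assumes the coefficient matrix is invertible.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact arithmetic).

    Assumes a is invertible; next() raises StopIteration if no pivot is found.
    """
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        p = next(i for i in range(k, n) if m[i][k] != 0)  # first usable pivot row
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]  # make the pivot equal to 1
        for i in range(n):
            if i != k and m[i][k] != 0:  # clear column k above and below the pivot
                factor = m[i][k]
                m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    return [row[n] for row in m]


# Example 1.13 with unknowns (I1, I2, I3)
A = [[1, -1, 1],  # KCL at P:   I1 - I2 + I3 = 0
     [6, 2, 0],   # left loop:  6 I1 + 2 I2 = 0
     [0, 2, 3]]   # right loop: 2 I2 + 3 I3 = 18
b = [0, 0, 18]
print(solve(A, b))  # [Fraction(-1, 1), Fraction(3, 1), Fraction(4, 1)]
```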

Problem 1.27 Determine the currents in the following networks.

[Figure: networks (1) and (2); the surviving labels are 40 ohms, 30 ohms, 40 ohms, 1 ohm, 20 volts, 5 volts, 40 volts, and 4 volts.]

(2) Cryptography is the study of sending messages in disguised form
(secret codes) so that only the intended recipients can remove the disguise
and read the message; modern cryptography uses advanced mathematics.
As another application of invertible matrices, we introduce a simple coding
scheme. Suppose we associate a prescribed number with every letter of the
alphabet; for example,

A   B   C   D   ···   X    Y    Z    Blank   ?
↓   ↓   ↓   ↓         ↓    ↓    ↓    ↓       ↓
0   1   2   3   ···   23   24   25   26      27

Suppose that we want to send the message "GOOD LUCK". According to the
preceding substitution scheme, this message is replaced by

6, 14, 14, 3, 26, 11, 20, 2, 10.

A code of this type could be cracked without difficulty by statistical methods,
such as an analysis of the frequency of letters. To make the code harder to crack,
we first break the message into three vectors in $\mathbb{R}^3$, each with 3 components
(the number of components is a matter of choice), adding extra blanks if necessary:

$$\begin{bmatrix} 6 \\ 14 \\ 14 \end{bmatrix}, \quad \begin{bmatrix} 3 \\ 26 \\ 11 \end{bmatrix}, \quad \begin{bmatrix} 20 \\ 2 \\ 10 \end{bmatrix}.$$
Next, choose a nonsingular 3 × 3 matrix A, say

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix},$$

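which is supposed to be known to both the sender and the receiver. Then, as a linear
transformation, A translates our message into

$$A\begin{bmatrix} 6 \\ 14 \\ 14 \end{bmatrix} = \begin{bmatrix} 6 \\ 26 \\ 34 \end{bmatrix}, \quad A\begin{bmatrix} 3 \\ 26 \\ 11 \end{bmatrix} = \begin{bmatrix} 3 \\ 32 \\ 40 \end{bmatrix}, \quad A\begin{bmatrix} 20 \\ 2 \\ 10 \end{bmatrix} = \begin{bmatrix} 20 \\ 42 \\ 32 \end{bmatrix}.$$

By putting the components of the resulting vectors consecutively, we transmit

6, 26, 34, 3, 32, 40, 20, 42, 32.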
To decode a message, the receiver proceeds as follows.
Suppose that we received the following reply from our correspondent:

19, 45, 26, 13, 36, 41.

To decode it, first break the message into two vectors in $\mathbb{R}^3$ as before:

$$\begin{bmatrix} 19 \\ 45 \\ 26 \end{bmatrix}, \quad \begin{bmatrix} 13 \\ 36 \\ 41 \end{bmatrix}.$$

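We want to find two vectors $x_1$, $x_2$ such that $Ax_i$ is the i-th of the
two vectors above, i.e.,

$$Ax_1 = \begin{bmatrix} 19 \\ 45 \\ 26 \end{bmatrix}, \quad Ax_2 = \begin{bmatrix} 13 \\ 36 \\ 41 \end{bmatrix}.$$

Since A is invertible, the vectors $x_1$, $x_2$ can be found by multiplying the
two received vectors by the inverse of A. By an easy computation,
one can find

$$A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}.$$

Therefore,

$$x_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 19 \\ 45 \\ 26 \end{bmatrix} = \begin{bmatrix} 19 \\ 7 \\ 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 13 \\ 36 \\ 41 \end{bmatrix} = \begin{bmatrix} 13 \\ 10 \\ 18 \end{bmatrix}.$$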


The numbers one obtains are

19, 7, 0, 13, 10, 18.

Using our correspondence between letters and numbers, the message we have
received is "THANKS".
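The whole encode/decode scheme fits in a few lines of Python. This sketch makes some assumptions. It restricts the letter table above to the 26 letters and the blank (A = 0, ..., Z = 25, blank = 26) and pads with blanks to a multiple of 3. It hard-codes the matrix A and the inverse computed above. The names `encode`, `decode` and `mat_vec` are our own.

```python
# Letter table restricted to A-Z and the blank: A = 0, ..., Z = 25, blank = 26.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

A = [[1, 0, 0],
     [2, 1, 0],
     [1, 1, 1]]
A_INV = [[1, 0, 0],    # inverse of A, as computed in the text
         [-2, 1, 0],
         [1, -1, 1]]


def mat_vec(m, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def encode(message, key=A):
    nums = [ALPHABET.index(ch) for ch in message]
    while len(nums) % 3:             # pad with blanks to fill the last vector
        nums.append(ALPHABET.index(" "))
    code = []
    for i in range(0, len(nums), 3):
        code.extend(mat_vec(key, nums[i:i + 3]))
    return code


def decode(code, key_inv=A_INV):
    nums = []
    for i in range(0, len(code), 3):
        nums.extend(mat_vec(key_inv, code[i:i + 3]))
    return "".join(ALPHABET[k] for k in nums)


print(encode("GOOD LUCK"))               # [6, 26, 34, 3, 32, 40, 20, 42, 32]
print(decode([19, 45, 26, 13, 36, 41]))  # THANKS
```

Because `decode` applies $A^{-1}$, decoding an encoded message recovers it, up to any padding blanks.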

Problem 1.28 Encode "TAKE UFO" using the same matrix A used in the above
example.

(3) Another significant application of linear algebra is to mathematical
models in economics. In most nations, an economy may be divided
into many sectors that produce goods or services, such as the automobile
industry, oil industry, steel industry, communication industry, and so on.
A fundamental problem in economics is then to find the equilibrium between
supply and demand in the economy.
There are two kinds of demand for goods: the intermediate demand
from the industries (sectors) themselves, which need goods as inputs for
their own production, and the extra demand from consumers, governmental
use, surplus production, or exports. In practice, the interrelation
between the sectors is very complicated, and the connection between the

extra demand and the production is unclear. A natural question is whether
there is a production level such that the total amount produced (the supply)
exactly balances the total demand for the production, so that the equality

{Total output} = {Total demand} = {Intermediate demand} + {Extra demand}

holds. This problem can be described by a system of linear equations, which
is called the Leontief Input-Output Model. To illustrate this, we give a
simple example.
Suppose that a nation's economy consists of three sectors: $I_1$ = automobile
industry, $I_2$ = steel industry, and $I_3$ = oil industry.
Let $x = [x_1\ x_2\ x_3]^T$ denote the production vector (or production level) in
$\mathbb{R}^3$, where each entry $x_i$ denotes the total amount (in a common unit such as
"dollars" rather than quantities such as "tons" or "gallons") of the output
that the industry $I_i$ produces per year.
The intermediate demand may be explained as follows. Suppose that,
for the total output of $x_2$ units of the steel industry $I_2$, 20% is contributed by
the output of $I_1$, 40% by that of $I_2$, and 20% by that of $I_3$. Then we can
write this as a column vector, called the unit consumption vector of $I_2$:

$$c_2 = \begin{bmatrix} 0.2 \\ 0.4 \\ 0.2 \end{bmatrix}.$$

For example, if $I_2$ decides to produce 100 units per year, then it will order (or
demand) 20 units from $I_1$, 40 units from $I_2$, and 20 units from $I_3$; i.e., the
consumption vector of $I_2$ for the production $x_2 = 100$ units can be written as
the column vector $100c_2 = [20\ 40\ 20]^T$. From the concept of the consumption
vector, it is clear that the sum of the decimal fractions in the column $c_2$ must
be $\le 1$.
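In our example, suppose that the demands (inputs) of the outputs are
given by the following matrix, called an input-output matrix, whose rows
correspond to the input sectors $I_1, I_2, I_3$ and whose columns $c_1, c_2, c_3$
correspond to the output sectors $I_1, I_2, I_3$:

$$A = \begin{bmatrix} 0.3 & 0.2 & 0.3 \\ 0.1 & 0.4 & 0.1 \\ 0.3 & 0.2 & 0.3 \end{bmatrix} = [\,c_1\ c_2\ c_3\,].$$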

In this matrix, an industry looks down its column to see how much it needs
from each sector to produce its total output, and it looks across its row to see
where its output goes. For example, the second row says that,
out of the total output of $x_2$ units of the steel industry $I_2$, as the intermediate
demand, the automobile industry $I_1$ demands $0.1x_1$ units (10% of its output $x_1$),
the steel industry $I_2$ demands $0.4x_2$ units (40% of its output $x_2$), and the oil
industry $I_3$ demands $0.1x_3$ units (10% of its output $x_3$). Therefore, it is now
easy to see that the intermediate demand of the economy can be written as

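$$Ax = \begin{bmatrix} 0.3 & 0.2 & 0.3 \\ 0.1 & 0.4 & 0.1 \\ 0.3 & 0.2 & 0.3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0.3x_1 + 0.2x_2 + 0.3x_3 \\ 0.1x_1 + 0.4x_2 + 0.1x_3 \\ 0.3x_1 + 0.2x_2 + 0.3x_3 \end{bmatrix}.$$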
Suppose that the extra demand in our example is given by $d = [d_1\ d_2\ d_3]^T = [30\ 20\ 10]^T$. Then the problem for this economy is to find the production
vector x satisfying the equation

$$x = Ax + d.$$

Another form of the equation is $(I - A)x = d$, where the matrix $I - A$
is called the Leontief matrix. If $I - A$ is not invertible, then the equation
may have no solution or infinitely many solutions, depending on d. If
$I - A$ is invertible, then the equation has the unique solution $x = (I - A)^{-1}d$.
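Now, our example can be written as

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0.3 & 0.2 & 0.3 \\ 0.1 & 0.4 & 0.1 \\ 0.3 & 0.2 & 0.3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} 30 \\ 20 \\ 10 \end{bmatrix}.$$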

In this example, it turns out that the matrix $I - A$ is invertible and

$$(I - A)^{-1} = \begin{bmatrix} 2.0 & 1.0 & 1.0 \\ 0.5 & 2.0 & 0.5 \\ 1.0 & 1.0 & 2.0 \end{bmatrix}.$$

Therefore,

$$x = (I - A)^{-1}d = \begin{bmatrix} 90 \\ 60 \\ 70 \end{bmatrix},$$

which gives the total amount $x_i$ of product of the industry $I_i$ for one year
needed to meet the required demand.
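The Leontief example can be checked with exact rational arithmetic. In this sketch the helper name `solve` is ours, and it assumes $I - A$ is invertible. We form the Leontief matrix $I - A$ from the input-output matrix and solve $(I - A)x = d$ directly rather than forming the inverse.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination; assumes a is invertible."""
    n = len(a)
    m = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(a, b)]  # augmented [a | b]
    for k in range(n):
        p = next(i for i in range(k, n) if m[i][k] != 0)  # pivot row
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        for i in range(n):
            if i != k:
                factor = m[i][k]
                m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    return [row[n] for row in m]


# input-output matrix of the example (rows: inputs, columns: outputs)
A = [[F("0.3"), F("0.2"), F("0.3")],
     [F("0.1"), F("0.4"), F("0.1")],
     [F("0.3"), F("0.2"), F("0.3")]]
d = [30, 20, 10]
leontief = [[(1 if i == j else 0) - A[i][j] for j in range(3)] for i in range(3)]  # I - A
x = solve(leontief, d)
print([int(v) for v in x])  # [90, 60, 70]; the exact solution happens to be integral
```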

Remark: (1) Under the usual circumstances, the sum of the entries in a
column of the consumption matrix A is less than one because a sector should
require less than one unit's worth of inputs to produce one unit of output.
This actually implies that $I - A$ is invertible and that the production vector x is
feasible in the sense that the entries of x are all nonnegative, as the following
argument shows.
(2) In general, by using induction one can easily verify that for any
$k = 1, 2, \ldots$,

$$(I - A)(I + A + \cdots + A^k) = I - A^{k+1}.$$

If A has nonnegative entries and the sums of its column entries are all strictly
less than one, then $\lim_{k\to\infty} A^k = 0$ (see Section 6.6 for the limit of a sequence of matrices).
Thus, we get $(I - A)(I + A + \cdots + A^k + \cdots) = I$, that is,

$$(I - A)^{-1} = I + A + \cdots + A^k + \cdots.$$

This also gives a practical way of computing $(I - A)^{-1}$, since by taking k
sufficiently large the right side can be made very close to $(I - A)^{-1}$. In
Chapter 6, an easier method of computing $A^k$ will be shown.
In summary, if A and d have nonnegative entries and if the sum of the
entries of each column of A is less than one, then $I - A$ is invertible and its
inverse is given by the above formula. Moreover, as the formula shows, the
entries of the inverse are all nonnegative, and so are those of the production
vector $x = (I - A)^{-1}d$.
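The series formula suggests a simple iterative computation. The following sketch (the function names are hypothetical) sums $I + A + \cdots + A^{k}$ for a fixed number of terms in floating point. It assumes, as in the summary above, that A has nonnegative entries with column sums below one, so that the partial sums converge. For the example, 200 terms reproduce $(I - A)^{-1}$ and $x = (I - A)^{-1}d$ to many decimal places.

```python
def mat_mul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def neumann_inverse(a, terms):
    """Approximate (I - A)^{-1} by the partial sum I + A + A^2 + ... + A^(terms - 1)."""
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(terms - 1):
        power = mat_mul(power, a)  # next power of A
        total = [[t + p for t, p in zip(trow, prow)] for trow, prow in zip(total, power)]
    return total


A = [[0.3, 0.2, 0.3],
     [0.1, 0.4, 0.1],
     [0.3, 0.2, 0.3]]
d = [30, 20, 10]
approx = neumann_inverse(A, 200)
x = [sum(approx[i][j] * d[j] for j in range(3)) for i in range(3)]
print([[round(v, 6) for v in row] for row in approx])
print([round(v, 6) for v in x])
```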

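Problem 1.29 Determine the total demand for industries $I_1$, $I_2$ and $I_3$ for the input-output
matrix A and the extra demand vector d given below:

$$A = \begin{bmatrix} \cdots & \cdots & \cdots \\ 0.5 & 0.1 & 0.6 \\ 0.4 & 0.2 & 0.2 \end{bmatrix}, \quad d = \begin{bmatrix} \cdots \\ 0 \\ \cdots \end{bmatrix}.$$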

Problem 1.30 Suppose that an economy is divided into three sectors: $I_1$ = services,
$I_2$ = manufacturing industries, and $I_3$ = agriculture. For each unit of output, $I_1$
demands no services from $I_1$, 0.4 units from $I_2$, and 0.5 units from $I_3$. For each unit
of output, $I_2$ requires 0.1 units from the service sector $I_1$, 0.7 units from other parts
of its own sector $I_2$, and no product from sector $I_3$. For each unit of output, $I_3$ demands
0.8 units of services from $I_1$, 0.1 units of manufacturing products from $I_2$, and 0.1 units
of its own output from $I_3$. Determine the production level that balances the economy
when 90 units of services, 10 units of manufacturing, and 30 units of agriculture are
required as the extra demand.
1.10. EXERCISES 45

1.10 Exercises
1.1. Which of the following matrices are in row-echelon form or in reduced row-echelon form?

A = [ … ],  B = [ … ],  C = [ … ],  D = [ … ],  E = [ … ],  F = [ … ]
1.2. Find a row-echelon form of each matrix.

(1) [ … ],  (2) $\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 4 & 5 & 1 \\ 3 & 4 & 5 & 1 & 2 \\ 4 & 5 & 1 & 2 & 3 \\ 5 & 1 & 2 & 3 & 4 \end{bmatrix}$.
1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2.
1.4. Solve the systems of equations by Gauss-Jordan elimination.

(1) $\begin{cases} x_1 + x_2 + x_3 - x_4 = -2 \\ 2x_1 - x_2 + x_3 + x_4 = 0 \\ 3x_1 + 2x_2 - x_3 - x_4 = 1 \\ x_1 + x_2 + 3x_3 - 3x_4 = -8 \end{cases}$  (2) $\begin{cases} 2x - 3y = 8 \\ 4x - 5y + z = 15 \\ 2x + 4z = 1 \end{cases}$

What are the pivots in each elimination step?


1.5. Which of the following systems has a nontrivial solution?

(1) $\begin{cases} x + 2y + 3z = 0 \\ 2y + 2z = 0 \\ x + 2y + 3z = 0 \end{cases}$  (2) $\begin{cases} 2x + y - z = 0 \\ x - 2y - 3z = 0 \\ 3x + y - 2z = 0 \end{cases}$
1.6. Determine all values of the $b_i$ that make the following system consistent:

$$\begin{cases} x + y - z = b_1 \\ 2y + z = b_2 \\ y - z = b_3. \end{cases}$$

1.7. Determine the condition on the $b_i$ so that the following system has no solution:

$$\begin{cases} 2x + y + 7z = b_1 \\ 6x - 2y + 11z = b_2 \\ 2x - y + 3z = b_3. \end{cases}$$

1.8. Let A and B be matrices of the same size.


(1) Show that, if Ax = 0 for all x, then A is the zero matrix.
(2) Show that, if Ax = Bx for all x, then A = B.

1.9. Compute ABC and CAB for

A = [ … ],  B = [ … ],  C = [ … ].
1.10. Prove that if A is a 3 × 3 matrix such that AB = BA for every 3 × 3 matrix
B, then $A = kI_3$ for some constant k.

1.11. Let A = [ … ]. Find $A^k$ for all integers k.

1.12. Compute (2A − B)C and $CC^T$ for

A = [ … ],  B = [ … ],  C = [ … ].
1.13. Let $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ be a polynomial. For any square
matrix A, the matrix polynomial f(A) is defined as $f(A) = a_nA^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I$. For $f(x) = 3x^3 + x^2 - 2x + 3$, find f(A) for

(1) A = [ … ],  (2) A = [ … ].
1.14. Find the symmetric part and the skew-symmetric part of each of the following
matrices.

(1) A = [ … ],  (2) A = [ … ].

1.15. Find $AA^T$ and $A^TA$ for the matrix A = [ … ].

(1) Find a matrix B such that AB = [ … ].

(2) Find a matrix C such that $AC = A^2 + A$.

1.17. Find all possible choices of a, b and c so that A = [ … ] has an inverse
matrix such that $A^{-1} = A$.
1.18. Decide whether or not each of the following matrices is invertible. Find the
inverses of the invertible ones.

A = [ … ],  B = [ … ],  C = [ … ].
1.19. Suppose A is a 2 x 1 matrix and B is a 1 x 2 matrix. Prove that the product
AB is not invertible.

1.20. Find three matrices which are row equivalent to A = [ … ].
1.21. Write the following systems of equations as matrix equations Ax = b and
solve them by computing $A^{-1}b$:

(1) $\begin{cases} 2x_1 - x_2 + 3x_3 = 2 \\ x_2 - 4x_3 = 5 \\ 2x_1 + x_2 - 2x_3 = 7 \end{cases}$  (2) $\begin{cases} x_1 - x_2 + x_3 = 5 \\ x_1 + x_2 - x_3 = -1 \\ 4x_1 - 3x_2 + 2x_3 = -3 \end{cases}$
1.22. Find the LDU factorization for each of the following matrices:

(1) A = [ … ],  (2) A = [ … ].
1.23. Find the $LDL^T$ factorization of the following symmetric matrices:

(1) A = [ … ],  (2) A = [ … ].

1.24. Solve Ax = b with A = LU, where L, U and b are given as

L = [ … ],  U = [ … ],  b = [ … ].

Forward elimination is the same as solving Lc = b, and back-substitution is solving Ux = c.

1.25. Let A = [ … ] and b = [ … ].



(1) Solve Ax = b by Gauss-Jordan elimination.


(2) Find the LDU factorization of A.
(3) Write A as a product of elementary matrices.
(4) Find the inverse of A.

1.26. A square matrix A is said to be nilpotent if $A^k = 0$ for some positive integer k.

(1) Show that an invertible matrix is not nilpotent.
(2) Show that any triangular matrix with zero diagonal is nilpotent.
(3) Show that if A is nilpotent with $A^k = 0$, then I − A is invertible with
inverse $I + A + \cdots + A^{k-1}$.
1.27. A square matrix A is said to be idempotent if $A^2 = A$.
(1) Find an example of an idempotent matrix other than 0 or I.
(2) Show that, if a matrix A is both idempotent and invertible, then A = I.

1.28. Determine whether the following statements are true or false, in general, and
justify your answers.
(1) Let A and B be row-equivalent square matrices. Then A is invertible if
and only if B is invertible.
(2) Let A be a square matrix such that AA = A. Then A is the identity.
(3) If A and B are invertible matrices such that $A^2 = I$ and $B^2 = I$, then
$(AB)^{-1} = BA$.
(4) If A and B are invertible matrices, then A + B is also invertible.
(5) If A, B and AB are symmetric, then AB = BA.
(6) If A and B are symmetric and of the same size, then AB is also symmetric.
(7) Let $AB^T = I$. Then A is invertible if and only if B is invertible.
(8) If a square matrix A is not invertible, then neither is AB for any B.
(9) If $E_1$ and $E_2$ are elementary matrices, then $E_1E_2 = E_2E_1$.
(10) The inverse of an invertible upper triangular matrix is upper triangular.
(11) Any invertible matrix A can be written as A = LU, where L is lower
triangular and U is upper triangular.
(12) If A is invertible and symmetric, then $A^{-1}$ is also symmetric.
Chapter 2

Determinants

2.1 Basic properties of determinant


Our primary interest in Chapter 1 was in the solvability or solutions of a
system Ax = b of linear equations. For an invertible matrix A, Theorem 1.8
shows that the system has a unique solution $x = A^{-1}b$ for any b.
Now the question is how to decide whether or not a square matrix A
is invertible. In this section, we introduce the notion of determinant as
a real-valued function of square matrices that satisfies certain axiomatic
rules, and then show that a square matrix A is invertible if and only if the
determinant of A is not zero. In fact, we saw in Chapter 1 that a 2 × 2
matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is invertible if and only if $ad - bc \ne 0$. This number is
called the determinant of A, and is defined formally as follows:

Definition 2.1 For a 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_{2\times 2}(\mathbb{R})$, the determinant of A is defined as $\det A = ad - bc$.

In fact, it turns out that, geometrically, the determinant of a 2 × 2 matrix
A represents, up to sign, the area of the parallelogram in the xy-plane whose
edges are the row vectors of A (see Theorem 2.9), so it would be desirable to
have the same idea of determinant for higher order matrices. However, the
formula in Definition 2.1 itself gives no clue as to how to extend this idea
of determinant to higher order matrices.
Hence, we first examine some fundamental properties of the determinant
function defined in Definition 2.1.

50 CHAPTER 2. DETERMINANTS

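By a direct computation, one can easily verify that the function det in
Definition 2.1 satisfies the following three fundamental properties:

(1) $\det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1$.

(2) $\det \begin{bmatrix} c & d \\ a & b \end{bmatrix} = bc - ad = -(ad - bc) = -\det \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

(3) $\det \begin{bmatrix} ka + \ell a' & kb + \ell b' \\ c & d \end{bmatrix} = (ka + \ell a')d - (kb + \ell b')c = k(ad - bc) + \ell(a'd - b'c) = k\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \ell\det \begin{bmatrix} a' & b' \\ c & d \end{bmatrix}$.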


Actually all the important properties of the determinant function can be
derived from these three properties. We will show in Lemma 2.3 that if a
function $f : M_{2\times 2}(\mathbb{R}) \to \mathbb{R}$ satisfies the properties (1), (2) and (3) above, then
it must be of the form $f(A) = ad - bc$. An advantage of looking at these
properties of the determinant, rather than at the explicit formula
given in Definition 2.1, is that these three properties enable us to define the
determinant function for n × n square matrices of any order n.

Definition 2.2 A real-valued function $f : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ of all n × n square
matrices is called a determinant if it satisfies the following three rules:
(R1) the value of f at the identity matrix is 1, i.e., $f(I_n) = 1$;
(R2) the value of f changes sign if any two rows are interchanged;
(R3) f is linear in the first row: that is, by definition,

$$f\begin{bmatrix} kr_1 + \ell r_1' \\ r_2 \\ \vdots \\ r_n \end{bmatrix} = k\,f\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix} + \ell\,f\begin{bmatrix} r_1' \\ r_2 \\ \vdots \\ r_n \end{bmatrix},$$

where the $r_i$'s denote the row vectors of a matrix.

We have already shown that det on 2 × 2 matrices satisfies these rules.
We will show later that for each positive integer n there always exists such
a function $f : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ satisfying the three rules in the definition, and,
moreover, that it is unique. Therefore, we say "the" determinant and denote
it by "det" for matrices of any order.
2.1. BASIC PROPERTIES OF DETERMINANT 51

Let us first derive some direct consequences of the three rules in the
definition (readers are encouraged to verify that det of 2 × 2 matrices also
satisfies the following properties):

Theorem 2.1 The determinant satisfies the following properties.
(1) The determinant is linear in each row, i.e., for each row the rule (R3)
also holds.
(2) If A has either a zero row or two identical rows, then det A = 0.
(3) The elementary row operation that adds a constant multiple of one row
to another row leaves the determinant unchanged.

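Proof: (1) By rule (R2), any row can be moved to the first row at the cost of a
change of sign in the determinant; then apply rule (R3), and move the row back
to its place using (R2) again.
(2) If A has a zero row, then that row is zero times itself, so det A = 0 · det A = 0
by (1). If A has two identical rows, then interchanging these identical rows changes only
the sign of the determinant, but not A itself. Thus we get det A = − det A, i.e., det A = 0.
(3) By a direct computation using (1), we get

$$f\begin{bmatrix} \vdots \\ r_i + kr_j \\ \vdots \\ r_j \\ \vdots \end{bmatrix} = f\begin{bmatrix} \vdots \\ r_i \\ \vdots \\ r_j \\ \vdots \end{bmatrix} + k\,f\begin{bmatrix} \vdots \\ r_j \\ \vdots \\ r_j \\ \vdots \end{bmatrix},$$

in which the second term on the right side is zero by (2). □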

It is now easy to see the effect of elementary row operations on the
determinant. The first elementary row operation, which "multiplies a row
by a constant k", multiplies the determinant by k,
by (1) of Theorem 2.1. The rule (R2) in the definition explains the effect
of the elementary row operation that "interchanges two rows". The last
elementary row operation, which "adds a constant multiple of a row to another
row", leaves the determinant unchanged, by (3) of Theorem 2.1.

Example 2.1 Consider the matrix

$$A = \begin{bmatrix} 1 & 1 & 1 \\ a & b & c \\ b + c & c + a & a + b \end{bmatrix}.$$

If we add the second row to the third, then the third row becomes
$[a+b+c \;\; a+b+c \;\; a+b+c]$,
which is $(a+b+c)$ times the first row. Thus, by (1) and (2) of Theorem 2.1, det A = 0. □

Problem 2.1 Show that, for an n × n matrix A and $k \in \mathbb{R}$, $\det(kA) = k^n \det A$.

Problem 2.2 Explain why det A = 0 for

(1) $A = \begin{bmatrix} a+1 & a+4 & a+7 \\ a+2 & a+5 & a+8 \\ a+3 & a+6 & a+9 \end{bmatrix}$.

Recall that any square matrix can be transformed to an upper triangular
matrix by forward elimination. Further properties of the determinant are
obtained in the following theorem.

Theorem 2.2 The determinant satisfies the following properties.
(1) The determinant of a triangular matrix is the product of the diagonal
entries.
(2) The matrix A is invertible if and only if det A ≠ 0.
(3) For any two n × n matrices A and B, det(AB) = det A det B.
(4) $\det A^T = \det A$.

Proof: (1) If A is a diagonal matrix, then it is clear that $\det A = a_{11}\cdots a_{nn}$
by (1) of Theorem 2.1 and rule (R1). Suppose that A is a lower triangular
matrix. Then a forward elimination, which does not change the determinant,
produces a zero row if A has a zero diagonal entry, or makes A row equivalent
to the diagonal matrix D whose diagonal entries are exactly those of A if
the diagonal entries are all nonzero. Thus, in the former case, det A = 0
and the product of the diagonal entries is also zero. In the latter case,
$\det A = \det D = a_{11}\cdots a_{nn}$. Similar arguments apply when A is an upper
triangular matrix.
(2) Note again that a forward elimination reduces a square matrix A to
an upper triangular matrix U, which has a zero row if A is singular and has no
zero row if A is nonsingular (see Theorem 1.8). Since the elimination changes
the determinant at most by a sign, det A ≠ 0 if and only if det U ≠ 0, which by
(1) happens exactly when U has no zero row.
(3) If A is not invertible, then AB is not invertible, and so det(AB) =
0 = det A det B. By the properties of the elementary matrices, it is clear
that for any elementary matrix E, det(EB) = det E det B. If A is invertible,

it can be written as a product of elementary matrices, say $A = E_1E_2\cdots E_k$.
Then by induction on k, we get

$$\det(AB) = \det(E_1E_2\cdots E_kB) = \det E_1 \det E_2 \cdots \det E_k \det B = \det(E_1E_2\cdots E_k)\det B = \det A \det B.$$

(4) Clearly, A is not invertible if and only if $A^T$ is not. Thus for a
singular matrix A we have $\det A^T = 0 = \det A$. If A is invertible, then there
is a factorization PA = LDU for a permutation matrix P. By (3), we get

$$\det P \det A = \det L \det D \det U.$$

Note that the transpose of PA = LDU is $A^TP^T = U^TD^TL^T$ and that for any
triangular matrix B, $\det B = \det B^T$ by (1). In particular, since L, U, $L^T$,
and $U^T$ are triangular with 1's on the diagonal, their determinants are all
equal to 1. Therefore, we have

$$\det A^T \det P^T = \det U^T \det D^T \det L^T = \det L \det D \det U = \det A \det P.$$

By definition, a permutation matrix P is obtained from the identity
matrix by a sequence of row interchanges: that is, $P = E_k\cdots E_1I_n$ for some
k, where each $E_i$ is an elementary matrix obtained from the identity matrix
by interchanging two rows. Thus, $\det E_i = -1$ for each i = 1, …, k, and
clearly $E_i^T = E_i = E_i^{-1}$. Therefore, $\det P = (-1)^k = \det P^T$ by (3); since this
is nonzero, it can be cancelled, so $\det A = \det A^T$. □

Remark: From the equality det A = det AT, we could define the determi-
nant in terms of columns instead of rows in Definition 2.2, and Theorem 2.1
is also true with "columns" instead of "rows".

Example 2.2 Evaluate the determinant of the following matrix A:

$$A = \begin{bmatrix} 2 & -4 & 0 & 0 \\ 1 & -3 & 0 & 1 \\ 1 & 0 & -1 & 2 \\ 3 & -4 & 3 & -1 \end{bmatrix}.$$

Solution: By using forward elimination, A can be transformed to an
upper triangular matrix U. Since forward elimination (without row interchanges)
does not change the determinant, the determinant of A is simply the product of
the diagonal entries of U:

$$\det A = \det U = \det \begin{bmatrix} 2 & -4 & 0 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 4 \\ 0 & 0 & 0 & 13 \end{bmatrix} = 2 \cdot (-1)^2 \cdot 13 = 26.$$

□
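The evaluation above follows a general recipe that can be sketched in Python. Reduce to upper triangular form, track the sign flips caused by row interchanges (rule (R2)), and multiply the diagonal entries (Theorem 2.2 (1)). The function name `det_by_elimination` is our own, and exact fractions are used to avoid rounding error.

```python
from fractions import Fraction


def det_by_elimination(a):
    """Determinant via forward elimination, in exact arithmetic.

    Adding a multiple of one row to another leaves the determinant unchanged
    (Theorem 2.1 (3)); each row interchange flips its sign (rule (R2)); and the
    determinant of the final upper triangular matrix is the product of its
    diagonal entries (Theorem 2.2 (1)).
    """
    m = [[Fraction(v) for v in row] for row in a]
    n = len(m)
    sign = 1
    for k in range(n):
        # first row at or below k with a nonzero entry in column k
        p = next((i for i in range(k, n) if m[i][k] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot in this column, so det = 0
        if p != k:
            m[k], m[p] = m[p], m[k]
            sign = -sign
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= m[k][k]
    return result


A = [[2, -4, 0, 0],
     [1, -3, 0, 1],
     [1, 0, -1, 2],
     [3, -4, 3, -1]]
print(det_by_elimination(A))  # 26
```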
Problem 2.3 Prove that if A is invertible, then $\det A^{-1} = 1/\det A$.
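Problem 2.4 Evaluate the determinant of each of the following matrices:

$$(1)\ \begin{bmatrix} \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots \\ 2 & 2 & 3 \end{bmatrix}, \qquad (2)\ \begin{bmatrix} \cdots & \cdots & \cdots & \cdots \\ \vdots & & & \vdots \\ 41 & 42 & 43 & 44 \end{bmatrix}, \qquad (3)\ \begin{bmatrix} \cdots & \cdots & \cdots & \cdots \\ \vdots & & & \vdots \\ x & x^2 & x^3 & 1 \end{bmatrix}.$$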

2.2 Existence and uniqueness


Recall that det A = ad − bc defined in the previous section satisfies the
three rules of Definition 2.2. Conversely, the following lemma shows that
any function of $M_{2\times 2}(\mathbb{R})$ into $\mathbb{R}$ satisfying the three rules (R1)–(R3) of
Definition 2.2 must be det, which implies the uniqueness of the determinant
function on $M_{2\times 2}(\mathbb{R})$.
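Lemma 2.3 If $f : M_{2\times 2}(\mathbb{R}) \to \mathbb{R}$ satisfies the three rules in Definition 2.2,
then f(A) = ad − bc.

Proof: First, note that $f\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = -1$ by the rules (R1) and (R2). Using linearity
in each row (Theorem 2.1 (1)), we get

$$\begin{aligned} f(A) = f\begin{bmatrix} a & b \\ c & d \end{bmatrix} &= f\begin{bmatrix} a + 0 & 0 + b \\ c & d \end{bmatrix} = f\begin{bmatrix} a & 0 \\ c & d \end{bmatrix} + f\begin{bmatrix} 0 & b \\ c & d \end{bmatrix} \\ &= f\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} + f\begin{bmatrix} a & 0 \\ c & 0 \end{bmatrix} + f\begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix} + f\begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix} \\ &= ad + 0 + 0 - bc, \end{aligned}$$

where the second and third terms vanish because, after the scalars are factored
out, the matrices have two identical rows (Theorem 2.1 (2)). □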
2.2. EXISTENCE AND UNIQUENESS 55

Therefore, when n = 2 there is only one function f on $M_{2\times 2}(\mathbb{R})$ which
satisfies the three rules, i.e., f = det.
Now for n = 3, the same calculation as in the case n = 2 can be
applied. That is, by repeated use of the three rules (R1)–(R3) as in the
proof of Lemma 2.3, we can obtain the explicit formula for the determinant
function on $M_{3\times 3}(\mathbb{R})$ as follows:

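$$\begin{aligned}
\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
&= \det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} & 0 \\ 0 & 0 & a_{23} \\ a_{31} & 0 & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & 0 & a_{13} \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{bmatrix} \\
&\quad + \det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & 0 & a_{23} \\ 0 & a_{32} & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} & 0 \\ a_{21} & 0 & 0 \\ 0 & 0 & a_{33} \end{bmatrix} + \det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix} \\
&= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.
\end{aligned}$$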

This expression of det A for a matrix $A \in M_{3\times 3}(\mathbb{R})$ satisfies the three
rules. Therefore, for n = 3, it shows both the uniqueness and the existence
of the determinant function on $M_{3\times 3}(\mathbb{R})$.
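Problem 2.5 asks for a proof. As a quick sanity check (not a proof), one can spot-check the six-term formula against rules (R1)–(R3) on random integer matrices. The names `det3` and `check_rules` are hypothetical. The random seed is fixed only so that a run is reproducible.

```python
import random


def det3(a):
    """The six-term formula for the determinant of a 3 x 3 matrix."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    return (a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
            - a11 * a23 * a32 - a12 * a21 * a33 - a13 * a22 * a31)


def check_rules(trials=200, seed=0):
    """Spot-check rules (R1)-(R3) of Definition 2.2 for det3 on random integer rows."""
    rng = random.Random(seed)

    def rand_row():
        return [rng.randint(-9, 9) for _ in range(3)]

    assert det3([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 1  # (R1)
    for _ in range(trials):
        r1, r1p, r2, r3 = rand_row(), rand_row(), rand_row(), rand_row()
        base = det3([r1, r2, r3])
        # (R2): interchanging any two rows changes the sign
        assert det3([r2, r1, r3]) == -base
        assert det3([r3, r2, r1]) == -base
        assert det3([r1, r3, r2]) == -base
        # (R3): linearity in the first row
        k, ell = rng.randint(-5, 5), rng.randint(-5, 5)
        combo = [k * x + ell * y for x, y in zip(r1, r1p)]
        assert det3([combo, r2, r3]) == k * base + ell * det3([r1p, r2, r3])
    return True


print(check_rules())  # True
```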

Problem 2.5 Show that the above formula of the determinant for 3 x 3 matrices
satisfies the three rules in Definition 2.2.

To get the formula of the determinant for matrices of order n > 3, the
same computational process can be repeated using the three rules again, but
the computation becomes more complicated as the order gets higher.
To derive the explicit formula for det A of order n > 3, we examine the above
case in detail. In the process of deriving the explicit formula for det A of a
3 × 3 matrix A, we can observe the following three steps:
(1st) By using the linearity of the determinant function in each row,
det A of a 3 × 3 matrix A is expanded as the sum of the determinants of
$3^3 = 27$ matrices. Except for exactly six matrices, all of them have a zero
column, so that their determinants are zero (see the proof of Lemma 2.3).
(2nd) In each of these remaining six matrices, all entries are zero except
for exactly three entries that came from the given matrix A. Indeed, no two
of the three entries came from the same column or from the same row of A.

In other words, in each row there is only one entry that came from A and
at the same time in each column there is only one entry that came from A.
Actually, in each of the six matrices, the three entries from A, say $a_{ij}$,
$a_{kl}$, and $a_{pq}$, are chosen as follows: If the first entry $a_{ij}$ is chosen from the
first row and the third column of A, say $a_{13}$, then the other entries $a_{kl}$ and
$a_{pq}$ in the product should be chosen from the second or the third row and
the first or the second column. Thus, if the second entry $a_{kl}$ is taken from
the second row, the column it belongs to must be either the first or the
second, i.e., either $a_{21}$ or $a_{22}$. If $a_{21}$ is taken, then the third entry $a_{pq}$ must
be, without option, $a_{32}$. Thus, the entries from A in the chosen matrix are
a13, a21, and a32. Therefore, the three entries in each of the six remaining
matrices are determined as follows: when the row indices (the first indices i
of aij) are arranged in the order 1, 2, 3, the assignment of the column indices
1, 2, 3 (the second indices j of aij) to each of the row indices is simply a
re-arrangement of 1, 2, 3 without repetitions or omissions. In this way, one
can recognize that the number 6 = 3! is simply the number of ways in which
the three column indices 1, 2, 3 are rearranged.
(3rd) The determinant of each of the six matrices may be computed by
converting it into a diagonal matrix using suitable "column interchanges"
(see Theorem 2.2 (1)), so its determinant becomes $\pm a_{ij}a_{kl}a_{pq}$, where the
sign depends on the number of column interchanges.
For example, the matrix having entries $a_{13}$, $a_{22}$, and $a_{31}$ from A
can be converted into a diagonal matrix in a couple of ways:
for instance, one can take just one interchange of the first and the third
columns, or take three interchanges: the first and the second, then the
second and the third, and then the first and the second. In either case,

$$\det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix} = -\det\begin{bmatrix} a_{13} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{31} \end{bmatrix} = -a_{13}a_{22}a_{31}.$$
Note that an interchange of two columns is the same as an interchange of
the two corresponding column indices. As mentioned above, there may be several
ways of interchanging columns to convert the given matrix into a diagonal
matrix. However, it is a remarkable fact that, whichever sequence of column
interchanges we take, the parity of the number of column interchanges is
always the same.
In this example, the given arrangement of the columns is expressed by
the arrangement of column indices, which is 3, 2, 1. Thus, to arrive at the

order 1, 2, 3, which represents the diagonal matrix, we can take either just
one interchange of 3 and 1, or three interchanges: 3 and 2, 3 and 1, and
then 2 and 1. In either case, the parity is odd, so that the "−" sign in the
computation of the determinant comes from $(-1)^1 = (-1)^3$, where the exponents
are the numbers of interchanges of the column indices.
In summary, in the expansion of det A for $A \in M_{3\times 3}(\mathbb{R})$, the number 6 =
3! of the determinants which contribute to the computation of det A is simply
the number of ways in which the three numbers 1,2,3 are rearranged without
repetitions or omissions. Moreover, the sign of each of the six determinants is
determined by the parity (even or odd) of the number of column interchanges
required to arrive at the order of 1, 2, 3 from the given arrangement of the
column indices.
These observations can be used to derive the explicit formula of the deter-
minant for matrices of order n > 3. We begin with the following definition.

Definition 2.3 A permutation of the set of integers $N_n = \{1, 2, \ldots, n\}$
is a one-to-one function from $N_n$ onto itself.

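Therefore, a permutation σ of $N_n$ assigns a number σ(i) in $N_n$ to each
number i in $N_n$, and this permutation σ is commonly denoted by

$$\sigma = (\sigma(1), \sigma(2), \ldots, \sigma(n)) = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n) \end{pmatrix}.$$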

Here, the first row is the usual layout of $N_n$ as the domain set, and the
second row is just an arrangement, in a certain order and without repetitions
or omissions, of the numbers in $N_n$ as the image set. A permutation that
interchanges only two numbers in $N_n$, leaving the rest of the numbers fixed, such
as σ = (3, 2, 1, …, n), is called a transposition. Note that the composition
of any two permutations is also a permutation. Moreover, composing a
permutation σ with a transposition interchanges two numbers in σ.
In particular, the composition of a transposition
with itself always produces the identity permutation.
It is not hard to see that if $S_n$ denotes the set of all permutations of $N_n$,
then $S_n$ has exactly n! permutations.
Once we have listed all the permutations, the next step is to determine
the sign of each permutation. A permutation $\sigma = (j_1, j_2, \ldots, j_n)$ is said
to have an inversion if $j_s > j_t$ for s < t (i.e., a larger number precedes
a smaller number). For example, the permutation σ = (3, 1, 2) has two
inversions since 3 precedes 1 and 2.

An inversion in a permutation can be eliminated by composing it with
a suitable transposition: for example, if σ = (3, 2, 1), which has three inversions,
then by composing it with the transposition (2, 1, 3) we get (2, 3, 1), which has two
inversions; this is the same as interchanging the first two numbers 3, 2 in
σ. Therefore, given a permutation σ = (σ(1), σ(2), …, σ(n)) in $S_n$, one can
convert it into the identity permutation (1, 2, …, n), which is the only one
with no inversions, by composing it with a certain number of transpositions.
For example, by composing σ = (3, 2, 1) with the three transpositions (2, 1, 3),
(1, 3, 2) and (2, 1, 3) (three being the number of inversions in σ), we get the
identity permutation. However, the number of transpositions needed to
convert a given permutation into the identity permutation need not be
unique, as we have seen in the third step. Notice that even though this number
is not unique, its parity (even or odd) always agrees with that of the number
of inversions.
Recall that all we need in the computation of the determinant is just
the parity (even or odd) of the number of column interchanges, which is the
same as that of the number of inversions in the permutation of the column
indices.
A permutation is said to be even if it has an even number of inversions, and it is said to be odd if it has an odd number of inversions. For example, when $n = 3$, the permutations $(1, 2, 3)$, $(2, 3, 1)$ and $(3, 1, 2)$ are even, while the permutations $(1, 3, 2)$, $(2, 1, 3)$ and $(3, 2, 1)$ are odd. In general, for a permutation $\sigma$ in $S_n$, the sign of $\sigma$ is defined as
$$\operatorname{sgn}(\sigma) = \begin{cases} 1 & \text{if } \sigma \text{ is an even permutation,} \\ -1 & \text{if } \sigma \text{ is an odd permutation.} \end{cases}$$

It is not hard to see that, for $n \ge 2$, the number of even permutations is equal to that of odd permutations, so each is $n!/2$. In the case $n = 3$, one can notice that there are 3 terms with $+$ sign and 3 terms with $-$ sign.
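The sign function and the count of even permutations can be sketched in Python as follows. Here `sgn` is our own helper built on the inversion count, and the loop only spot-checks the claim for $2 \le n \le 6$ rather than proving it.

```python
from itertools import permutations
from math import factorial


def sgn(perm):
    """Sign of a permutation: +1 if its number of inversions is even, -1 if odd."""
    n = len(perm)
    inversions = sum(1 for s in range(n) for t in range(s + 1, n) if perm[s] > perm[t])
    return 1 if inversions % 2 == 0 else -1


for p in permutations((1, 2, 3)):
    print(p, sgn(p))

# For n >= 2, exactly half of the n! permutations in S_n are even.
for n in range(2, 7):
    evens = sum(1 for p in permutations(range(1, n + 1)) if sgn(p) == 1)
    assert evens == factorial(n) // 2
```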

Problem 2.6 Show that the number of even permutations and the number of odd permutations in $S_n$ ($n \ge 2$) are equal.

Now, we repeat the three steps to get an explicit formula for $\det A$ of a square matrix $A = [a_{ij}]$ of order $n$. First, the determinant $\det A$ can be expressed as the sum of determinants of $n!$ matrices, each of which has zero entries except the $n$ entries $a_{1\sigma(1)}, a_{2\sigma(2)}, \ldots, a_{n\sigma(n)}$ taken from $A$, where $\sigma$ is a permutation of the set $\{1, 2, \ldots, n\}$ of column indices. The $n$ entries $a_{1\sigma(1)}, a_{2\sigma(2)}, \ldots, a_{n\sigma(n)}$ are chosen from $A$ in such a way that no two of them come from the same row or the same column. Such a matrix can be converted to a diagonal matrix by column interchanges. Hence, its determinant is equal to $\pm a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}$, where the sign $\pm$ is determined by the parity of the number of column interchanges needed to convert the matrix to a diagonal matrix, which is equal to the parity of the number of inversions in $\sigma$; that is, the sign is $\operatorname{sgn}(\sigma)$. Therefore, the determinant of the matrix whose entries are all zero except for the $a_{i\sigma(i)}$'s is equal to
$$\operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)},$$
which is called a signed elementary product of $A$. Now, our discussions can be summarized as follows:

Theorem 2.4 For an $n \times n$ matrix $A$,
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}.$$
That is, $\det A$ is the sum of all signed elementary products of $A$.

It is not difficult to see that this explicit formula for $\det A$ satisfies the three rules in the definition of the determinant. Therefore, we have both existence and uniqueness for the determinant function of square matrices of any order $n \ge 1$.
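Theorem 2.4 translates directly into code. The sketch below uses our own helper names and 0-based indices, so $\sigma$ runs over the permutations of `range(n)`; it sums all $n!$ signed elementary products and is therefore practical only for small $n$.

```python
from itertools import permutations
from math import prod


def sgn(perm):
    n = len(perm)
    inversions = sum(1 for s in range(n) for t in range(s + 1, n) if perm[s] > perm[t])
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """det A = sum over sigma of sgn(sigma) * a[0][sigma[0]] * ... * a[n-1][sigma[n-1]].

    Roughly n * n! multiplications, so only small matrices are feasible.
    """
    n = len(a)
    return sum(
        sgn(sigma) * prod(a[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )


print(det_leibniz([[1, 2], [3, 4]]))                   # -2, i.e. 1*4 - 2*3
print(det_leibniz([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
```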

Example 2.3 Consider a permutation $\sigma = (3, 4, 2, 5, 1) \in S_5$: i.e., $\sigma(1) = 3$, $\sigma(2) = 4$, $\ldots$, $\sigma(5) = 1$. Then $\sigma$ has in total $2 + 4 = 6$ inversions: two inversions caused by the position of $\sigma(1) = 3$, which precedes 1 and 2, and four inversions in the permutation $\tau = (4, 2, 5, 1)$, which is a permutation of the set $\{1, 2, 4, 5\}$. Thus,
$$\operatorname{sgn}(\sigma) = (-1)^{2+4} = (-1)^2\operatorname{sgn}(\tau).$$
Note that the permutation $\tau$ can be considered as a permutation of $N_4$ by replacing the numbers 4 and 5 by 3 and 4, respectively.
Moreover, $\sigma = (3, 4, 2, 5, 1)$ can be converted to $(1, 3, 4, 2, 5)$ by shifting the number 1 to the front with four transpositions, and then $(1, 3, 4, 2, 5)$ can be converted to the identity permutation $(1, 2, 3, 4, 5)$ by two transpositions. Hence, $\sigma$ can be converted to the identity permutation by six transpositions. $\square$

In general, for a fixed $j$, $1 \le j \le n$, there are $(n - 1)!$ permutations $\sigma$ in $S_n$ such that $\sigma(1) = j$. Each such $\sigma$ has $j - 1$ inversions caused by its first entry ($j$ precedes the $j - 1$ smaller numbers) and, in addition, as many inversions as the permutation $\tau = (\sigma(2), \ldots, \sigma(n))$ has. Therefore,
$$\operatorname{sgn}(\sigma) = (-1)^{j-1}\operatorname{sgn}(\tau).$$
Also, the permutation $\tau = (\sigma(2), \ldots, \sigma(n))$ can be considered as a permutation of $N_{n-1}$ by replacing $\{j + 1, \ldots, n\}$ by $\{j, \ldots, n - 1\}$. Thus we have the following lemma.

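Lemma 2.5 For any permutation $\sigma$ in $S_n$, if $\sigma(1) = j$, then
$$\operatorname{sgn}(\sigma) = (-1)^{j-1}\operatorname{sgn}(\sigma(2), \ldots, \sigma(n)),$$
where $(\sigma(2), \ldots, \sigma(n))$ is a permutation of the $n - 1$ numbers $N_n - \{j = \sigma(1)\}$.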
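Lemma 2.5 can be spot-checked by brute force. The hypothetical helper `check_lemma_2_5` below compares both sides for every $\sigma \in S_n$. Because the sign depends only on the relative order of the entries, $\tau$ can be passed to `sgn` without first relabelling it onto $N_{n-1}$.

```python
from itertools import permutations


def sgn(perm):
    n = len(perm)
    inversions = sum(1 for s in range(n) for t in range(s + 1, n) if perm[s] > perm[t])
    return -1 if inversions % 2 else 1


def check_lemma_2_5(n):
    """Check sgn(sigma) == (-1)**(j - 1) * sgn(tau) for every sigma in S_n,
    where j = sigma(1) and tau = (sigma(2), ..., sigma(n))."""
    for sigma in permutations(range(1, n + 1)):
        j, tau = sigma[0], sigma[1:]
        if sgn(sigma) != (-1) ** (j - 1) * sgn(tau):
            return False
    return True


print(check_lemma_2_5(5))  # True
```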


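Problem 2.7 Let $A = [\mathbf{c}_1 \cdots \mathbf{c}_n]$ be an $n \times n$ matrix with the column vectors $\mathbf{c}_j$. Show that $\det[\mathbf{c}_j\ \mathbf{c}_1 \cdots \mathbf{c}_{j-1}\ \mathbf{c}_{j+1} \cdots \mathbf{c}_n] = (-1)^{j-1}\det[\mathbf{c}_1 \cdots \mathbf{c}_j \cdots \mathbf{c}_n]$. Note that the same kind of equality holds when $A$ is written in row vectors.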

Problem 2.8 Compute the determinant of the matrix [


-2
~ i
2
i].
3

2.3 Cofactor expansion


Recall that the determinant of an $n \times n$ matrix $A$ is the sum of all signed elementary products of $A$, and
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}.$$

The first factor $a_{1\sigma(1)}$ in each term can be any one of $a_{11}, a_{12}, \ldots, a_{1n}$ in the first row of $A$. Among the $n!$ terms in this sum, there are precisely $(n - 1)!$ terms such that $a_{1\sigma(1)} = a_{11}$, i.e., $\sigma(1) = 1$. The sum of those terms such that $\sigma(1) = 1$ can be written as $a_{11}A_{11}$, where
$$A_{11} = (-1)^0 \sum_{\tau} \operatorname{sgn}(\tau)\, a_{2\tau(2)} \cdots a_{n\tau(n)},$$
summing over all permutations $\tau$ of the numbers $\{2, 3, \ldots, n\}$. The term $(-1)^0$ means that there is no extra inversion other than those of $\tau$ when $\sigma(1) = 1$ is at the first place. Note that each term in $A_{11}$ contains no entries from the first row or from the first column of $A$. Hence, all the terms of the sum in $A_{11}$ are the signed elementary products of the submatrix $M_{11}$ of $A$ obtained by deleting the first row and the first column of $A$. Thus $A_{11} = (-1)^0 \det M_{11}$.
Similarly, if $a_{1\sigma(1)}$ is chosen to be $a_{1j}$ with $1 \le j \le n$, then all $(n - 1)!$ terms such that $\sigma(1) = j$ in the expression of $\det A$ add up to $a_{1j}A_{1j}$ with
$$A_{1j} = \sum_{\sigma \in S_n,\ \sigma(1) = j} \operatorname{sgn}(\sigma)\, a_{2\sigma(2)} a_{3\sigma(3)} \cdots a_{n\sigma(n)} = (-1)^{j-1}\det M_{1j},$$
where $M_{1j}$ is the submatrix of $A$ obtained by deleting the row and the column containing $a_{1j}$, and the sign $(-1)^{j-1}$ accounts for the extra inversions caused by placing $\sigma(1) = j$ at the first place, as shown in Lemma 2.5.
By grouping $a_{1j}A_{1j}$ for all $j = 1, \ldots, n$ in the expression of $\det A$, we get an expansion of $\det A$ with respect to the first row:
$$\det A = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n},$$
where $A_{1j} = (-1)^{j-1}\det M_{1j}$ and $M_{1j}$ is the submatrix of $A$ obtained by deleting the row and the column containing $a_{1j}$.
There is a similar expansion with respect to any other row, say the $i$-th row. To show this, first construct a new matrix $\bar{A}$ by using the $i$-th row of $A$ as the first row and then shifting each of the preceding $i - 1$ rows one row down. Then it is easy to see that $\det A = (-1)^{i-1}\det\bar{A}$ by Lemma 2.5. Now, with the expansion of $\det\bar{A}$ with respect to its first row $[a_{i1} \cdots a_{in}]$, we get
$$\det A = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in},$$
where $A_{ij} = (-1)^{i+j}\det M_{ij}$ and $M_{ij}$ is the submatrix of $A$ obtained by deleting the row and the column containing $a_{ij}$.
Also, we can do the same with the column vectors because $\det A^T = \det A$. This gives the following theorem:

Theorem 2.6 Let $A$ be an $n \times n$ matrix. Then,

(1) for each $1 \le i \le n$,
$$\det A = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in},$$
called the cofactor expansion of $\det A$ along the $i$-th row;

(2) for each $1 \le j \le n$,
$$\det A = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj},$$
called the cofactor expansion of $\det A$ along the $j$-th column.
The submatrix $M_{ij}$ is called the minor of the entry $a_{ij}$, and the number $A_{ij} = (-1)^{i+j}\det M_{ij}$ is called the cofactor of the entry $a_{ij}$. Therefore, the determinant of an $n \times n$ matrix $A$ can be computed by multiplying the entries in any one row (or column) by their cofactors and adding the resulting products. As a matter of fact, the determinant could be defined inductively by these explicit formulas.

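Example 2.4 Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.$$
Then the cofactors of $a_{11}$, $a_{12}$ and $a_{13}$ are
$$A_{11} = (-1)^{1+1}\det\begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix} = 5\cdot 9 - 8\cdot 6 = -3,$$
$$A_{12} = (-1)^{1+2}\det\begin{bmatrix} 4 & 6 \\ 7 & 9 \end{bmatrix} = (-1)(4\cdot 9 - 7\cdot 6) = 6,$$
$$A_{13} = (-1)^{1+3}\det\begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix} = 4\cdot 8 - 7\cdot 5 = -3,$$
respectively. Hence the expansion of $\det A$ along the first row is
$$\det A = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} = 1\cdot(-3) + 2\cdot 6 + 3\cdot(-3) = 0. \qquad \square$$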
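The cofactor expansion of Theorem 2.6 gives a recursive algorithm. The following Python sketch uses our own helper names and 0-based indices (the sign $(-1)^{i+j}$ has the same parity as in the 1-based formula). It expands along a chosen row, skips zero entries since $a_{ij}A_{ij} = 0$ whenever $a_{ij} = 0$, and reproduces the cofactors of Example 2.4.

```python
def minor(a, i, j):
    """Submatrix M_ij: delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det_cofactor(a, i=0):
    """det A = sum_j a[i][j] * (-1)**(i + j) * det M_ij, expanding along row i.
    The minors themselves are expanded along their first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        if a[i][j] != 0:  # a zero entry contributes nothing
            total += a[i][j] * (-1) ** (i + j) * det_cofactor(minor(a, i, j))
    return total


def cofactor(a, i, j):
    return (-1) ** (i + j) * det_cofactor(minor(a, i, j))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print([cofactor(A, 0, j) for j in range(3)])  # [-3, 6, -3]
print(det_cofactor(A), det_cofactor(A, i=1))  # 0 0
```

The running time is still of order $n!$, so in practice the expansion is combined with row operations that create zeros first.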

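For a $3 \times 3$ matrix $A$, the cofactor expansion of $\det A$ along the second column has the following form:
$$\det A = a_{12}A_{12} + a_{22}A_{22} + a_{32}A_{32} = -a_{12}\det\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix} + a_{22}\det\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix} - a_{32}\det\begin{bmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{bmatrix}.$$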

As this formula suggests, in the cofactor expansion of $\det A$ along a row or a column, the evaluation of $A_{ij}$ can be avoided whenever $a_{ij} = 0$, because the product $a_{ij}A_{ij}$ is zero regardless of the value of $A_{ij}$. Therefore it is beneficial to make the cofactor expansion along a row or a column that contains as many zero entries as possible. Moreover, by elementary row (or column) operations, in particular by adding a multiple of one row to another, which does not change the determinant, a matrix $A$ may be simplified into another one having more zero entries in a row or in a column. This kind of simplification generally gives the most efficient way to evaluate the determinant of a matrix. The next examples illustrate this method.

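Example 2.5 Evaluate the determinant of
$$A = \begin{bmatrix} 1 & -1 & 2 & -1 \\ -3 & 4 & 1 & -1 \\ 2 & -5 & -3 & 8 \\ -2 & 6 & -4 & 1 \end{bmatrix}.$$
Solution: Apply the elementary operations:
3 × row 1 + row 2,
(−2) × row 1 + row 3,
2 × row 1 + row 4
to $A$. Then
$$\det A = \det\begin{bmatrix} 1 & -1 & 2 & -1 \\ 0 & 1 & 7 & -4 \\ 0 & -3 & -7 & 10 \\ 0 & 4 & 0 & -1 \end{bmatrix} = \det\begin{bmatrix} 1 & 7 & -4 \\ -3 & -7 & 10 \\ 4 & 0 & -1 \end{bmatrix}.$$
Now apply the operation: row 1 + row 2, to the matrix on the right side; then
$$\det\begin{bmatrix} 1 & 7 & -4 \\ -3 & -7 & 10 \\ 4 & 0 & -1 \end{bmatrix} = \det\begin{bmatrix} 1 & 7 & -4 \\ -2 & 0 & 6 \\ 4 & 0 & -1 \end{bmatrix} = (-1)^{1+2}\cdot 7\cdot\det\begin{bmatrix} -2 & 6 \\ 4 & -1 \end{bmatrix} = -7(2 - 24) = 154.$$
Thus $\det A = 154$. $\square$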
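The elimination strategy of Example 2.5 can be automated. This Python sketch (names are ours) reduces the matrix to upper triangular form using only row interchanges, each of which flips the sign, and additions of a multiple of one row to another, which leave the determinant unchanged; the determinant is then the product of the diagonal entries. `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def det_elimination(a):
    """Determinant by row reduction to upper triangular form."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # find a row at or below `col` with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]  # row interchange flips the sign
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]  # row_r - factor * row_col
    return det


A = [[1, -1, 2, -1],
     [-3, 4, 1, -1],
     [2, -5, -3, 8],
     [-2, 6, -4, 1]]
print(det_elimination(A))  # 154
```

Elimination needs on the order of $n^3$ operations, compared with the $n!$ terms of the signed-elementary-product formula.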
Problem 2.9 Use cofactor expansions along a row or a column to evaluate the de-
terminants of the following matrices:

(1) A = [0111]
2 0 1 1
2 2 0 1 '
1o 1]2
1 3 .
2 2 2 0 5 0

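Example 2.6 Show that $\det A = (x - y)(x - z)(x - w)(y - z)(y - w)(z - w)$ for
$$A = \begin{bmatrix} 1 & x & x^2 & x^3 \\ 1 & y & y^2 & y^3 \\ 1 & z & z^2 & z^3 \\ 1 & w & w^2 & w^3 \end{bmatrix}.$$
Solution: Use Gaussian elimination. To begin with, add $(-1) \times$ row 1 to rows 2, 3, and 4 of $A$:
$$\begin{aligned}
\det A &= \det\begin{bmatrix} 1 & x & x^2 & x^3 \\ 0 & y - x & y^2 - x^2 & y^3 - x^3 \\ 0 & z - x & z^2 - x^2 & z^3 - x^3 \\ 0 & w - x & w^2 - x^2 & w^3 - x^3 \end{bmatrix} = \det\begin{bmatrix} y - x & y^2 - x^2 & y^3 - x^3 \\ z - x & z^2 - x^2 & z^3 - x^3 \\ w - x & w^2 - x^2 & w^3 - x^3 \end{bmatrix} \\
&= (y - x)(z - x)(w - x)\det\begin{bmatrix} 1 & y + x & y^2 + xy + x^2 \\ 1 & z + x & z^2 + xz + x^2 \\ 1 & w + x & w^2 + xw + x^2 \end{bmatrix} \\
&= (x - y)(x - z)(w - x)\det\begin{bmatrix} 1 & y + x & y^2 + xy + x^2 \\ 0 & z - y & (z - y)(z + y + x) \\ 0 & w - y & (w - y)(w + y + x) \end{bmatrix} \\
&= (x - y)(x - z)(w - x)\det\begin{bmatrix} z - y & (z - y)(z + y + x) \\ w - y & (w - y)(w + y + x) \end{bmatrix} \\
&= (x - y)(x - z)(x - w)(y - z)(w - y)\det\begin{bmatrix} 1 & z + y + x \\ 1 & w + y + x \end{bmatrix} \\
&= (x - y)(x - z)(x - w)(y - z)(y - w)(z - w). \qquad \square
\end{aligned}$$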
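A numeric spot check of Example 2.6 (an illustration, not a proof): the sketch below evaluates the $4 \times 4$ determinant with the signed-elementary-product formula of Theorem 2.4 for the arbitrarily chosen values $x, y, z, w = 2, 3, 5, 7$ and compares it with the product formula. The helper names are ours.

```python
from itertools import permutations
from math import prod


def det(a):
    """Sum of signed elementary products (fine for small n)."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        inv = sum(1 for s in range(n) for t in range(s + 1, n) if sigma[s] > sigma[t])
        total += (-1) ** inv * prod(a[i][sigma[i]] for i in range(n))
    return total


def vandermonde_rows(*xs):
    """Rows [1, x, x^2, ..., x^(k-1)] for k = len(xs), as in Example 2.6."""
    return [[x ** p for p in range(len(xs))] for x in xs]


x, y, z, w = 2, 3, 5, 7
lhs = det(vandermonde_rows(x, y, z, w))
rhs = (x - y) * (x - z) * (x - w) * (y - z) * (y - w) * (z - w)
print(lhs, rhs)  # 240 240
```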

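Problem 2.10 Let $A$ be the Vandermonde matrix of order $n$:
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}.$$
Show that
$$\det A = \prod_{1 \le i < j \le n} (x_j - x_i).$$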

2.4 Cramer's rule


The cofactor expansion of the determinant gives a method for computing the inverse of an invertible matrix $A$. For $i \neq j$, let $A^*$ be the matrix $A$ with the $j$-th row replaced by the $i$-th row. Then the determinant of $A^*$ must be zero, because the $i$-th and $j$-th rows of $A^*$ are the same. Moreover, the cofactors of $A^*$ with respect to the $j$-th row are the same as those of $A$: that is, $A^*_{jk} = A_{jk}$ for all $k = 1, \ldots, n$. Therefore, we have
$$0 = \det A^* = a_{i1}A^*_{j1} + a_{i2}A^*_{j2} + \cdots + a_{in}A^*_{jn} = a_{i1}A_{j1} + a_{i2}A_{j2} + \cdots + a_{in}A_{jn}.$$
This proves the following lemma.

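Lemma 2.7
$$a_{i1}A_{j1} + a_{i2}A_{j2} + \cdots + a_{in}A_{jn} = \begin{cases} \det A & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$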

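Definition 2.4 If $A$ is an $n \times n$ matrix and $A_{ij}$ is the cofactor of $a_{ij}$, then the new matrix
$$\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix}$$
is called the matrix of cofactors of $A$. The transpose of this matrix is called the adjoint of $A$ and is denoted by $\operatorname{adj} A$.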

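It follows from Lemma 2.7 that
$$A \cdot \operatorname{adj} A = \begin{bmatrix} \det A & 0 & \cdots & 0 \\ 0 & \det A & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \det A \end{bmatrix} = (\det A)I.$$
If $A$ is invertible, then $\det A \neq 0$ and we may write $A\left(\frac{1}{\det A}\operatorname{adj} A\right) = I$. Thus
$$A^{-1} = \frac{1}{\det A}\operatorname{adj} A \quad\text{and}\quad A = (\det A)\operatorname{adj}(A^{-1}),$$
where the second formula follows from the first by replacing $A$ with $A^{-1}$.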

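Example 2.7 For a matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $\operatorname{adj} A = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, and if $\det A = ad - bc \neq 0$, then
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$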
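The adjoint formula for the inverse can be sketched as follows. The helpers are our own: the determinant is computed by first-row cofactor expansion (with the usual convention that the empty matrix has determinant 1, so a $1 \times 1$ matrix gets adjoint $[1]$), and `Fraction` keeps $A^{-1}$ exact. The $2 \times 2$ matrix below is an arbitrary instance of Example 2.7.

```python
from fractions import Fraction


def minor(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Cofactor expansion along the first row; the empty matrix has det 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def adjoint(a):
    """adj A is the transpose of the matrix of cofactors: (adj A)[i][j] = A_ji."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]


def inverse(a):
    """A^{-1} = (1 / det A) adj A; raises ValueError if det A = 0."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[2, 1], [5, 3]]
print(adjoint(A))             # [[3, -1], [-5, 2]], i.e. [[d, -b], [-c, a]]
print(matmul(A, adjoint(A)))  # [[1, 0], [0, 1]], i.e. (det A) I with det A = 1
```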

Problem 2.11 Compute $\operatorname{adj} A$ and $A^{-1}$ for $A$ = [~ j :].


Problem 2.12 Show that $A$ is invertible if and only if $\operatorname{adj} A$ is invertible, and that if $A$ is invertible, then
$$(\operatorname{adj} A)^{-1} = \frac{A}{\det A} = \operatorname{adj}(A^{-1}).$$

Problem 2.13 Let $A$ be an $n \times n$ matrix with $n > 1$. Show that

(1) $\det(\operatorname{adj} A) = (\det A)^{n-1}$;
(2) $\operatorname{adj}(\operatorname{adj} A) = (\det A)^{n-2}A$, if $A$ is invertible.

The next theorem establishes a formula for the solution of a system of n


equations in n unknowns. It is not useful as a practical method but can be
used to study properties of the solution without solving the system.

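Theorem 2.8 (Cramer's rule) Let $A\mathbf{x} = \mathbf{b}$ be a system of $n$ linear equations in $n$ unknowns such that $\det A \neq 0$. Then the system has the unique solution given by
$$x_j = \frac{\det C_j}{\det A}, \qquad j = 1, 2, \ldots, n,$$
where $C_j$ is the matrix obtained from $A$ by replacing the $j$-th column with the column matrix $\mathbf{b} = [b_1\ b_2\ \cdots\ b_n]^T$.

Proof: If $\det A \neq 0$, then $A$ is invertible and $\mathbf{x} = A^{-1}\mathbf{b}$ is the unique solution of $A\mathbf{x} = \mathbf{b}$. Since
$$\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det A}(\operatorname{adj} A)\,\mathbf{b},$$
it follows that
$$x_j = \frac{b_1A_{1j} + b_2A_{2j} + \cdots + b_nA_{nj}}{\det A} = \frac{\det C_j}{\det A},$$
because the numerator is the cofactor expansion of $\det C_j$ along its $j$-th column. $\square$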

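Example 2.8 Use Cramer's rule to solve
$$\begin{cases} x_1 + 2x_2 + x_3 = 50 \\ 2x_1 + 2x_2 + x_3 = 60 \\ x_1 + 2x_2 + 3x_3 = 90. \end{cases}$$
Solution:
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 2 & 3 \end{bmatrix}, \quad C_1 = \begin{bmatrix} 50 & 2 & 1 \\ 60 & 2 & 1 \\ 90 & 2 & 3 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 1 & 50 & 1 \\ 2 & 60 & 1 \\ 1 & 90 & 3 \end{bmatrix}, \quad C_3 = \begin{bmatrix} 1 & 2 & 50 \\ 2 & 2 & 60 \\ 1 & 2 & 90 \end{bmatrix}.$$
Here $\det A = -4$, $\det C_1 = -40$, $\det C_2 = -40$ and $\det C_3 = -80$. Therefore,
$$x_1 = \frac{\det C_1}{\det A} = 10, \quad x_2 = \frac{\det C_2}{\det A} = 10, \quad x_3 = \frac{\det C_3}{\det A} = 20. \qquad \square$$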
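Cramer's rule as stated in Theorem 2.8 can be sketched in Python as below. The helper names are ours, and determinants are computed by cofactor expansion. The function evaluates $n + 1$ determinants, which illustrates why the method is costly for large $n$; here it reproduces Example 2.8.

```python
from fractions import Fraction


def minor(a, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Cofactor expansion along the first row (n >= 1)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def cramer(a, b):
    """Solve Ax = b by x_j = det(C_j) / det(A), where C_j is A with its
    j-th column replaced by b. Requires det A != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("det A = 0: Cramer's rule does not apply")
    x = []
    for j in range(len(a)):
        c_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        x.append(Fraction(det(c_j), d))
    return x


A = [[1, 2, 1], [2, 2, 1], [1, 2, 3]]
b = [50, 60, 90]
print(*cramer(A, b))  # 10 10 20
```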

Cramer's rule provides a convenient method for writing down the solution
of a system of n linear equations in n unknowns in terms of determinants.
To find the solution, however, one must evaluate n + 1 determinants of
order n. Evaluating even two of these determinants generally involves more
computations than solving the system by using Gauss-Jordan elimination.
Problem 2.14 Use Cramer's rule to solve the systems

(1)
{ 3Xl
-2Xl
2
+
+
3
4X2
4X2
5X2
+
+
5
3X3
5X3
2X3
-2
6
l.

x y + z
3
4 7 2
(2)
x + y + z
0

2 1
2.
Y z
Problem 2.15 Let $A$ be the matrix obtained from the identity matrix $I_n$ with $i$-th column replaced by the column vector $\mathbf{x} = [x_1 \cdots x_n]^T$. Compute $\det A$.
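Problem 2.16 Prove that if $A_{ij}$ is the cofactor of $a_{ij}$ in $A = [a_{ij}]$, and if $n > 1$, then
$$\det\begin{bmatrix} 0 & x_1 & x_2 & \cdots & x_n \\ x_1 & a_{11} & a_{12} & \cdots & a_{1n} \\ x_2 & a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ x_n & a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = -\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}x_ix_j.$$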

2.5 Application: Area and Volume


In this section, we restrict our attention to the case $n = 2$ or $3$ in order to visualize the geometric figures conveniently, even though the same argument can be applied for $n > 3$.
For an $n \times n$ square matrix $A$, the row vectors $\mathbf{r}_i = [a_{i1} \cdots a_{in}]$, $i = 1, \ldots, n$, of $A$ can be considered as elements in
$$\mathbb{R}^n = \{(a_1, \ldots, a_n) : a_i \in \mathbb{R},\ i = 1, \ldots, n\}.$$
The set
$$P(A) = \left\{\sum_{i=1}^{n} t_i\mathbf{r}_i : 0 \le t_i \le 1,\ i = 1, \ldots, n\right\}$$
is called a parallelogram if $n = 2$, or a parallelepiped if $n \ge 3$. Note that the row vectors of $A$ form the edges of $P(A)$, and a different order of the row vectors does not alter the shape of $P(A)$.
A geometrical meaning of the determinant is that it represents the volume (or area for $n = 2$) of the parallelepiped $P(A)$.

Theorem 2.9 The determinant $\det A$ of an $n \times n$ matrix $A$ is the volume of $P(A)$ up to sign. In fact, the volume of $P(A)$ is equal to $|\det A|$.

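Proof: We present here a geometrical sketch, since this approach seems more intuitive and more convincing. We give only the proof of the case $n = 2$, and leave the case $n = 3$ to the readers. Let
$$A = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix},$$
where $\mathbf{r}_1, \mathbf{r}_2$ are the row vectors of $A$. Let $\operatorname{Area}(A) = \operatorname{Area}(\mathbf{r}_1, \mathbf{r}_2)$ denote the area of the parallelogram $P(\mathbf{r}_1, \mathbf{r}_2)$ (see the figure below).

[Figure: the parallelogram $P(\mathbf{r}_1, \mathbf{r}_2)$ with bottom edge $\mathbf{r}_1$ and height $h$]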

(1) It is quite clear that if $A = I_2$, then $\operatorname{Area}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1$.

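(2) Since the shape of the parallelogram $P(A)$ does not depend on the order of placing the row vectors, i.e., $P(\mathbf{r}_1, \mathbf{r}_2) = P(\mathbf{r}_2, \mathbf{r}_1)$, we have $\operatorname{Area}(\mathbf{r}_1, \mathbf{r}_2) = \operatorname{Area}(\mathbf{r}_2, \mathbf{r}_1)$. On the other hand, $\det(\mathbf{r}_1, \mathbf{r}_2) = -\det(\mathbf{r}_2, \mathbf{r}_1)$. Thus the area is symmetric in $\mathbf{r}_1, \mathbf{r}_2$ while the determinant changes sign, which explains why we say "up to sign".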


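(3) From the figure above, if we replace $\mathbf{r}_1$ by $k\mathbf{r}_1$ in $A$, then the bottom edge $\mathbf{r}_1$ of $P(A)$ is stretched by the factor $|k|$ while the height $h$ remains unchanged. Thus
$$\operatorname{Area}(k\mathbf{r}_1, \mathbf{r}_2) = |k|\operatorname{Area}(\mathbf{r}_1, \mathbf{r}_2).$$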
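(4) The additivity in the first row can be read off from a figure: if we replace $\mathbf{r}_1$ by $\mathbf{r}_1 + \mathbf{r}_1'$ while fixing $\mathbf{r}_2$, where $\mathbf{r}_1$ and $\mathbf{r}_1'$ lie on the same side of the line spanned by $\mathbf{r}_2$, then
$$\operatorname{Area}(\mathbf{r}_1 + \mathbf{r}_1', \mathbf{r}_2) = \operatorname{Area}(\mathbf{r}_1, \mathbf{r}_2) + \operatorname{Area}(\mathbf{r}_1', \mathbf{r}_2).$$

[Figure: the parallelograms $P(\mathbf{r}_1, \mathbf{r}_2)$, $P(\mathbf{r}_1', \mathbf{r}_2)$ and $P(\mathbf{r}_1 + \mathbf{r}_1', \mathbf{r}_2)$]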

(5) Thus the area function Area on $M_{2\times 2}(\mathbb{R})$ satisfies the rules (R1) and (R3) of the determinant, but not the rule (R2). Therefore, by uniqueness, $\det = \pm\operatorname{Area}$. $\square$

Remark: (1) Note that if we had constructed the parallelepiped $P(A)$ using the column vectors of $A$, then its shape would in general be totally different from the one constructed using the row vectors. However, $\det A = \det A^T$ means that their volumes are the same, which is a totally nontrivial fact.

(2) For $n \ge 3$, the volume of $P(A)$ can be defined by induction on $n$, and exactly the same argument as in the proof can be applied to show that the volume is the determinant up to sign. However, there is another way of looking at this fact. Let $\{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ be the $n$ column vectors of an $m \times n$ matrix $A$. They constitute an $n$-dimensional parallelepiped in $\mathbb{R}^m$ such that
$$P(A) = \left\{\sum_{i=1}^{n} t_i\mathbf{c}_i : 0 \le t_i \le 1,\ i = 1, \ldots, n\right\}.$$

A formula for the volume of this parallelepiped may be expressed as follows. We first consider a two-dimensional parallelepiped (a parallelogram) determined by two column vectors $\mathbf{c}_1$ and $\mathbf{c}_2$ of $A = [\mathbf{c}_1\ \mathbf{c}_2]$ in $\mathbb{R}^3$.

[Figure: the parallelogram spanned by $\mathbf{c}_1$ and $\mathbf{c}_2$ in $xyz$-space]

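The area of this parallelogram is simply $\operatorname{Area}(P(A)) = \|\mathbf{c}_1\|h$, where $h = \|\mathbf{c}_2\|\sin\theta$ and $\theta$ is the angle between $\mathbf{c}_1$ and $\mathbf{c}_2$. Therefore, we have
$$\begin{aligned}
\operatorname{Area}(P(A))^2 &= \|\mathbf{c}_1\|^2\|\mathbf{c}_2\|^2\sin^2\theta = \|\mathbf{c}_1\|^2\|\mathbf{c}_2\|^2(1 - \cos^2\theta) \\
&= (\mathbf{c}_1\cdot\mathbf{c}_1)(\mathbf{c}_2\cdot\mathbf{c}_2)\left(1 - \frac{(\mathbf{c}_1\cdot\mathbf{c}_2)^2}{(\mathbf{c}_1\cdot\mathbf{c}_1)(\mathbf{c}_2\cdot\mathbf{c}_2)}\right) \\
&= (\mathbf{c}_1\cdot\mathbf{c}_1)(\mathbf{c}_2\cdot\mathbf{c}_2) - (\mathbf{c}_1\cdot\mathbf{c}_2)^2 \\
&= \det\begin{bmatrix} \mathbf{c}_1\cdot\mathbf{c}_1 & \mathbf{c}_1\cdot\mathbf{c}_2 \\ \mathbf{c}_2\cdot\mathbf{c}_1 & \mathbf{c}_2\cdot\mathbf{c}_2 \end{bmatrix} \\
&= \det\left(\begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \end{bmatrix}[\mathbf{c}_1\ \mathbf{c}_2]\right) = \det(A^TA).
\end{aligned}$$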

In general, let $\mathbf{c}_1, \ldots, \mathbf{c}_n$ be the $n$ column vectors of an $m \times n$ matrix $A$. Then one can show (for a proof see Exercise 5.16) that the volume of the $n$-dimensional parallelepiped $P(A)$ determined by those $n$ column vectors $\mathbf{c}_j$ in $\mathbb{R}^m$ is
$$\operatorname{vol}(P(A)) = \sqrt{\det(A^TA)}.$$

In particular, if $A$ is an $m \times m$ square matrix, then
$$\operatorname{vol}(P(A)) = \sqrt{\det(A^TA)} = \sqrt{\det(A^T)\det(A)} = |\det(A)|,$$
as expected.
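The formula $\operatorname{vol}(P(A)) = \sqrt{\det(A^TA)}$ for two columns can be checked numerically. In this sketch the vectors are arbitrary choices of ours, and the comparison with the length of the cross product is an extra cross-check valid only in $\mathbb{R}^3$ (Lagrange's identity), not something stated in the text.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gram_area(c1, c2):
    """Area of the parallelogram spanned by c1, c2 in R^m, computed as
    sqrt(det(A^T A)) for A = [c1 c2], i.e. sqrt((c1.c1)(c2.c2) - (c1.c2)^2)."""
    return math.sqrt(dot(c1, c1) * dot(c2, c2) - dot(c1, c2) ** 2)


def cross(u, v):
    """Cross product in R^3, used only as an independent cross-check."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


c1, c2 = (1, 2, 0), (0, 1, 3)
n = cross(c1, c2)
print(gram_area(c1, c2), math.sqrt(dot(n, n)))  # both equal sqrt(46)

# Square case m = n = 2: sqrt(det(A^T A)) = |det A|; here det A = 3*2 - 1*1 = 5.
print(gram_area((3, 1), (1, 2)))  # 5.0
```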

Problem 2.17 Show that the area of a triangle $ABC$ in the plane $\mathbb{R}^2$, where $A = (x_1, y_1)$, $B = (x_2, y_2)$, $C = (x_3, y_3)$, is equal to the absolute value of
$$\frac{1}{2}\det\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}.$$

2.6 Exercises
2.1. Determine the values of k for which det [ : 2~] = O.
2.2. Evaluate det(A 2BA-l) and det(B- 1 A 3) for the following matrices:

A= [-~ -~
o 1 0
i], B = [! -~ ;].
2 1 3
2.3. Evaluate the determinant of

A= [=i -~ -~ =i].
2 -3 -5 8
2.4. Evaluate $\det A$ for an $n \times n$ matrix $A = [a_{ij}]$ when
(1) $a_{ij} = \begin{cases} 0 & \text{if } i = j, \\ 1 & \text{if } i \neq j, \end{cases}$ or
(2) $a_{ij} = i + j$.

-A)isapolynomialinxofclx + Co.
2.5. Find all solutions of the equation det(AB) = 0 for

A=[X;2 X~2]' B=[~ X~2]'


2.6. Prove that if $A$ is an $n \times n$ skew-symmetric matrix and $n$ is odd, then $\det A = 0$. Give an example of a $4 \times 4$ skew-symmetric matrix $A$ with $\det A \neq 0$.
2.7. Use the determinant function to find
(1) the area of the parallelogram with edges determined by (4, 3) and (7, 5),

(2) the volume of the parallelepiped with edges determined by the vectors
(1, 0, 4), (0, -2, 2) and (3, 1, -1).
2.8. Use Cramer's rule to solve each system.

{"' + +
X2 X3 2
(1) { Xl + X2 = 3
(2) Xl + 2X2 + X3 = 2
Xl X2 -1.
Xl + 3X2 X3 -4.
X2 + X4 = -1
(3) { x, + X3 3
Xl X2 X3 X4 2
Xl + X2 + X3 + X4 o.
2.9. Use Cramer's rule to solve the given system:

! [~ 1 [ ~.1
2
(1) [ ;]x=[;] (2) 3 -1: x=
-1
1
2.10. Find a constant $k$ so that the system of linear equations
$$\begin{cases} kx - 2y - z = 0 \\ (k + 1)y + 4z = 0 \\ (k - 1)z = 0 \end{cases}$$
has more than one solution. (Is it possible to apply Cramer's rule here?)
2.11. Solve the following system of linear equations by using Cramer's rule and by
using Gaussian elimination:

[ ~;~~] =[;]
1121 x
1 1 124

2.12. Solve the following system of equations by using Cramer's rule:
$$\begin{cases} 3x + 2y = 3z + 1 \\ 3x + 2z = 8 - 5y \\ 3z - 1 = x - 2y. \end{cases}$$
2.13. Calculate the cofactors $A_{11}$, $A_{12}$, $A_{13}$ and $A_{33}$ for the matrix $A$:

(1) A = !],
[~2 1i 1 (2) A = [~3 162~], (3) A = [-i -; ~].
321
2.14. Let $A$ be the $n \times n$ matrix whose entries are all 1. Show that
(1) $\det(A - nI_n) = 0$;
(2) $(A - nI_n)_{ij} = (-1)^{n-1}n^{n-2}$ for all $i, j$, where $(A - nI_n)_{ij}$ denotes the cofactor of the $(i, j)$-entry of $A - nI_n$.
2.15. Show that if A is symmetric, then so is adjA. Moreover, if it is invertible,
then the inverse of A is also symmetric.

2.16. Use the adjoint formula to compute the inverse of each of the following matrices:

A= [-~ ~ ~ ] , B= [COs~ ~ - sin~ ].


4 1 -1 sinO 0 cosO
2.17. Compute $\operatorname{adj} A$, $\det A$, $\det(\operatorname{adj} A)$, $A^{-1}$, and verify $A \cdot \operatorname{adj} A = (\det A)I$ for

(1) A = [-i ; ~],


3 -2 1
(2) A = [ ;
1 5
~ !].
7
2.18. Let A, B be invertible matrices. Show that adj(AB) = adjB adjA.
(The reader may also try to prove this equality for noninvertible matrices.)
2.19. For an m x n matrix A and n x m matrix B, show that

det [_~ 1] = det(AB).


2.20. Find the area of the triangle with vertices at (0,0), (1,3) and (3,1) in $\mathbb{R}^2$.

2.21. Find the area of the triangle with vertices at (0,0,0), (1,1,2) and (2,2,1) in $\mathbb{R}^3$.

2.22. For A,B,C,D E Mnxn(]R), show that det [~ ~] = detAdetD. But, in

general, det [~ ~] f. det A det D - det B det C.

2.23. Determine whether or not the following statements are true in general, and justify your answers.
(1) For any square matrices $A$ and $B$ of the same size, $\det(A + B) = \det A + \det B$.
(2) For any square matrices $A$ and $B$ of the same size, $\det(AB) = \det(BA)$.
(3) If $A$ is an $n \times n$ square matrix, then for any scalar $c$, $\det(cI_n - A) = c^n - \det A$.
(4) If $A$ is an $n \times n$ square matrix, then for any scalar $c$, $\det(cI_n - A^T) = \det(cI_n - A)$.
(5) If $E$ is an elementary matrix, then $\det E = \pm 1$.
(6) There is no matrix $A$ of order 3 such that $A^2 = -I_3$.
(7) Let $A$ be a nilpotent matrix, i.e., $A^k = 0$ for some natural number $k$. Then $\det A = 0$.
(8) $\det(kA) = k\det A$ for any square matrix $A$.
(9) Any system $A\mathbf{x} = \mathbf{b}$ has a solution if and only if $\det A \neq 0$.
(10) For any $n \times 1$ column vectors $\mathbf{u}$ and $\mathbf{v}$ with $n \ge 2$, $\det(\mathbf{u}\mathbf{v}^T) = 0$.
(11) If $A$ is a square matrix with $\det A = 1$, then $\operatorname{adj}(\operatorname{adj} A) = A$.
(12) If the entries of $A$ are all integers and $\det A = 1$ or $-1$, then the entries of $A^{-1}$ are also integers.
(13) If the entries of $A$ are 0's or 1's, then $\det A = 1$, $0$, or $-1$.
(14) Every system of $n$ linear equations in $n$ unknowns can be solved by Cramer's rule.
(15) If $A$ is a permutation matrix, then $A^T = A$.
Chapter 3

Vector Spaces

3.1 Vector spaces and subspaces


We discussed how to solve a system $A\mathbf{x} = \mathbf{b}$ of linear equations, and we saw that the basic questions of the existence or uniqueness of the solution were much easier to answer after Gaussian elimination. In this chapter, we introduce the notion of a vector space, which is an abstraction of the usual algebraic structures of the 3-space $\mathbb{R}^3$, and then extend our study of systems of linear equations to this framework.
Many physical quantities, such as length, area, mass and temperature, are described by real numbers as magnitudes. Other physical quantities, like force or velocity, have directions as well as magnitudes. Such quantities with direction are called vectors, while the numbers are called scalars. For instance, an element (or a point) $\mathbf{x}$ in the 3-space $\mathbb{R}^3$ is usually represented as a triple of real numbers:
$$\mathbf{x} = (x_1, x_2, x_3),$$
where $x_i \in \mathbb{R}$, $i = 1, 2, 3$, are called the coordinates of $\mathbf{x}$. This expression provides a rectangular coordinate system in a natural way. On the other hand, pictorially such a point in the 3-space $\mathbb{R}^3$ can also be represented by an arrow from the origin to $\mathbf{x}$. In this way, a point in the 3-space $\mathbb{R}^3$ can be understood as a vector. The direction of the arrow specifies the direction of the vector, and the length of the arrow describes its magnitude.
In order to have a more general definition of vectors, we extract the most basic properties of those arrows in $\mathbb{R}^3$. Note that for all vectors (or points) in $\mathbb{R}^3$, there are two algebraic operations: the addition of any two vectors and the multiplication of a vector by a scalar. That is, for two vectors $\mathbf{x} = (x_1, x_2, x_3)$, $\mathbf{y} = (y_1, y_2, y_3)$ in $\mathbb{R}^3$ and a scalar $k$, we define
$$\begin{aligned} \mathbf{x} + \mathbf{y} &= (x_1 + y_1,\ x_2 + y_2,\ x_3 + y_3), \\ k\mathbf{x} &= (kx_1,\ kx_2,\ kx_3). \end{aligned}$$

The addition of vectors and scalar multiplication of a vector in the 3-space $\mathbb{R}^3$ are illustrated as follows:

[Figure: the sum $\mathbf{x} + \mathbf{y}$ of two vectors and a scalar multiple $k\mathbf{v}$ of a vector in $\mathbb{R}^3$]

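Even though our geometric visualization of vectors does not go beyond the 3-space $\mathbb{R}^3$, it is possible to extend the above algebraic operations of vectors in the 3-space $\mathbb{R}^3$ to the general $n$-space $\mathbb{R}^n$ for any positive integer $n$. The $n$-space $\mathbb{R}^n$ is defined to be the set of all ordered $n$-tuples $(a_1, a_2, \ldots, a_n)$ of real numbers, called vectors: i.e.,
$$\mathbb{R}^n = \{(a_1, a_2, \ldots, a_n) : a_i \in \mathbb{R},\ i = 1, 2, \ldots, n\}.$$
For any two vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in the $n$-space $\mathbb{R}^n$, and a scalar $k$, the sum $\mathbf{x} + \mathbf{y}$ and the scalar multiple $k\mathbf{x}$ are the vectors in $\mathbb{R}^n$ defined by
$$\begin{aligned} \mathbf{x} + \mathbf{y} &= (x_1 + y_1,\ x_2 + y_2,\ \ldots,\ x_n + y_n), \\ k\mathbf{x} &= (kx_1,\ kx_2,\ \ldots,\ kx_n). \end{aligned}$$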

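It is easy to verify the following list of arithmetical rules of the operations:

Theorem 3.1 For any scalars $k$ and $\ell$, and vectors $\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$, and $\mathbf{z} = (z_1, \ldots, z_n)$ in the $n$-space $\mathbb{R}^n$, the following rules hold:
(1) $\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$,
(2) $\mathbf{x} + (\mathbf{y} + \mathbf{z}) = (\mathbf{x} + \mathbf{y}) + \mathbf{z}$,
(3) $\mathbf{x} + \mathbf{0} = \mathbf{x} = \mathbf{0} + \mathbf{x}$,
(4) $\mathbf{x} + (-1)\mathbf{x} = \mathbf{0}$,
(5) $k(\mathbf{x} + \mathbf{y}) = k\mathbf{x} + k\mathbf{y}$,
(6) $(k + \ell)\mathbf{x} = k\mathbf{x} + \ell\mathbf{x}$,
(7) $k(\ell\mathbf{x}) = (k\ell)\mathbf{x}$,
(8) $1\mathbf{x} = \mathbf{x}$,
where $\mathbf{0} = (0, 0, \ldots, 0)$ is the zero vector.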

We usually write a vector $(a_1, a_2, \ldots, a_n)$ in the $n$-space $\mathbb{R}^n$ as an $n \times 1$ column matrix
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = [a_1\ a_2\ \cdots\ a_n]^T,$$
also called a column vector. Then the two operations of the matrix sum and the scalar multiplication of column matrices coincide with those of vectors in $\mathbb{R}^n$, and the above theorem is just Theorem 1.2.
These rules of arithmetic of vectors are the most important ones because they are the only rules that we need to manipulate vectors in the $n$-space $\mathbb{R}^n$. Hence, an (abstract) vector space can be defined by means of these rules, in such a way that $\mathbb{R}^n$ itself becomes a vector space. In general, a vector space is defined to be a set with two operations, an addition and a scalar multiplication, which satisfy the above rules of operations in $\mathbb{R}^n$.

Definition 3.1 A (real) vector space is a nonempty set $V$ of elements, called vectors, with two algebraic operations that satisfy the following rules.
(A) There is an operation called vector addition that associates to every pair $\mathbf{x}$ and $\mathbf{y}$ of vectors in $V$ a unique vector $\mathbf{x} + \mathbf{y}$ in $V$, called the sum of $\mathbf{x}$ and $\mathbf{y}$, so that the following rules hold for all vectors $\mathbf{x}, \mathbf{y}, \mathbf{z}$ in $V$:
(1) $\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$ (commutativity in addition),
(2) $\mathbf{x} + (\mathbf{y} + \mathbf{z}) = (\mathbf{x} + \mathbf{y}) + \mathbf{z}$ $(= \mathbf{x} + \mathbf{y} + \mathbf{z})$ (associativity in addition),
(3) there is a unique vector $\mathbf{0}$ in $V$ such that $\mathbf{x} + \mathbf{0} = \mathbf{x} = \mathbf{0} + \mathbf{x}$ for all $\mathbf{x} \in V$ (it is called the zero vector),
(4) for any $\mathbf{x} \in V$, there is a vector $-\mathbf{x} \in V$, called the negative of $\mathbf{x}$, such that $\mathbf{x} + (-\mathbf{x}) = (-\mathbf{x}) + \mathbf{x} = \mathbf{0}$.

(B) There is an operation called scalar multiplication that associates to each vector $\mathbf{x}$ in $V$ and each scalar $k$ a unique vector $k\mathbf{x}$ in $V$, called the multiplication of $\mathbf{x}$ by a (real) scalar $k$, so that the following rules hold for all vectors $\mathbf{x}, \mathbf{y}, \mathbf{z}$ in $V$ and all scalars $k, \ell$:
(5) $k(\mathbf{x} + \mathbf{y}) = k\mathbf{x} + k\mathbf{y}$ (distributivity with respect to vector addition),
(6) $(k + \ell)\mathbf{x} = k\mathbf{x} + \ell\mathbf{x}$ (distributivity with respect to scalar addition),
(7) $k(\ell\mathbf{x}) = (k\ell)\mathbf{x}$ (associativity in scalar multiplication),
(8) $1\mathbf{x} = \mathbf{x}$.

Clearly, the $n$-space $\mathbb{R}^n$ is a vector space by Theorem 3.1. A complex vector space is obtained if, instead of real numbers, we take complex numbers for scalars. For example, the set $\mathbb{C}^n$ of all ordered $n$-tuples of complex numbers is a complex vector space. In Chapter 7 we shall discuss complex vector spaces, but until then we will discuss only real vector spaces unless otherwise stated.

Example 3.1 (1) For any positive integers $m$ and $n$, the set $M_{m\times n}(\mathbb{R})$ of all $m \times n$ matrices forms a vector space under the matrix sum and scalar multiplication defined in Section 1.3. The zero vector in this space is the zero matrix $O_{m\times n}$, and $-A$ is the negative of a matrix $A$.
(2) Let $C(\mathbb{R})$ denote the set of real-valued continuous functions defined on the real line $\mathbb{R}$. For two functions $f(x)$ and $g(x)$, and a real number $k$, the sum $f + g$ and the scalar multiple $kf$ are defined by
$$(f + g)(x) = f(x) + g(x), \qquad (kf)(x) = kf(x).$$
Then one can easily verify, as an exercise, that the set $C(\mathbb{R})$ is a vector space under these operations. The zero vector in this space is the constant function whose value at each point is zero.
(3) Let $A$ be an $m \times n$ matrix. Then it is easy to show that the set of solutions of the homogeneous system $A\mathbf{x} = \mathbf{0}$ is a vector space (under the sum and scalar multiplication of matrices).

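Theorem 3.2 Let $V$ be a vector space and let $\mathbf{x}, \mathbf{y}$ be vectors in $V$. Then
(1) $\mathbf{x} + \mathbf{y} = \mathbf{y}$ implies $\mathbf{x} = \mathbf{0}$,
(2) $0\mathbf{x} = \mathbf{0}$,
(3) $k\mathbf{0} = \mathbf{0}$ for any $k \in \mathbb{R}$,
(4) $-\mathbf{x}$ is unique and $-\mathbf{x} = (-1)\mathbf{x}$,
(5) if $k\mathbf{x} = \mathbf{0}$, then $k = 0$ or $\mathbf{x} = \mathbf{0}$.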

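Proof: (1) By adding $-\mathbf{y}$ to both sides of $\mathbf{x} + \mathbf{y} = \mathbf{y}$, we have
$$\mathbf{x} = \mathbf{x} + \mathbf{0} = \mathbf{x} + \mathbf{y} + (-\mathbf{y}) = \mathbf{y} + (-\mathbf{y}) = \mathbf{0}.$$
(2) $0\mathbf{x} = (0 + 0)\mathbf{x} = 0\mathbf{x} + 0\mathbf{x}$ implies $0\mathbf{x} = \mathbf{0}$ by (1).
(3) This is an easy exercise.
(4) The uniqueness of the negative $-\mathbf{x}$ of $\mathbf{x}$ can be shown by a simple modification of Lemma 1.6. In fact, if $\bar{\mathbf{x}}$ is another negative of $\mathbf{x}$ such that $\mathbf{x} + \bar{\mathbf{x}} = \mathbf{0}$, then
$$-\mathbf{x} = -\mathbf{x} + \mathbf{0} = -\mathbf{x} + (\mathbf{x} + \bar{\mathbf{x}}) = (-\mathbf{x} + \mathbf{x}) + \bar{\mathbf{x}} = \mathbf{0} + \bar{\mathbf{x}} = \bar{\mathbf{x}}.$$
On the other hand, the equation
$$\mathbf{x} + (-1)\mathbf{x} = 1\mathbf{x} + (-1)\mathbf{x} = (1 - 1)\mathbf{x} = 0\mathbf{x} = \mathbf{0}$$
shows that $(-1)\mathbf{x}$ is another negative of $\mathbf{x}$, and hence $-\mathbf{x} = (-1)\mathbf{x}$ by the uniqueness of $-\mathbf{x}$.
(5) Suppose $k\mathbf{x} = \mathbf{0}$ and $k \neq 0$. Then $\mathbf{x} = 1\mathbf{x} = \frac{1}{k}(k\mathbf{x}) = \frac{1}{k}\mathbf{0} = \mathbf{0}$. $\square$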

Problem 3.1 Let V be the set of all pairs (x, y) of real numbers. Suppose that an
addition and scalar multiplication of pairs are defined by
(x, y) + (u, v) = (x + 2u, y + 2v), k(x, y) = (kx, ky).
Is the set V a vector space under those operations? Justify your answer.

A subset $W$ of a vector space $V$ is called a subspace of $V$ if $W$ is itself a vector space under the addition and scalar multiplication defined in $V$. Usually, in order to show that a subset $W$ is a subspace, it is not necessary to verify all the rules of the definition of a vector space, because certain rules satisfied in the larger space $V$ are automatically satisfied in every subset that is closed under vector addition and scalar multiplication.

Theorem 3.3 A nonempty subset $W$ of a vector space $V$ is a subspace if and only if $\mathbf{x} + \mathbf{y}$ and $k\mathbf{x}$ are contained in $W$ (or equivalently, $\mathbf{x} + k\mathbf{y} \in W$) for any vectors $\mathbf{x}$ and $\mathbf{y}$ in $W$ and any scalar $k \in \mathbb{R}$.

Proof: We need only prove the sufficiency. Assume both conditions hold and let $\mathbf{x}$ be any vector in $W$. Since $W$ is closed under scalar multiplication, $\mathbf{0} = 0\mathbf{x}$ and $-\mathbf{x} = (-1)\mathbf{x}$ are in $W$, so rules (3) and (4) for a vector space hold. All the other rules for a vector space hold in $W$ because they already hold in $V$. $\square$

A vector space $V$ itself and the zero subspace $\{\mathbf{0}\}$ are trivially subspaces. Some nontrivial subspaces are given in the following examples.

Example 3.2 Let $W = \{(x, y, z) \in \mathbb{R}^3 : ax + by + cz = 0\}$, where $a, b, c$ are constants. If $\mathbf{x} = (x_1, x_2, x_3)$, $\mathbf{y} = (y_1, y_2, y_3)$ are points in $W$, then clearly $\mathbf{x} + \mathbf{y} = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$ is also a point in $W$, because it satisfies the equation in $W$. Similarly, $k\mathbf{x}$ also lies in $W$ for any scalar $k$. Hence, $W$ is a subspace of $\mathbb{R}^3$, and if $a, b, c$ are not all zero, it is a plane passing through the origin in $\mathbb{R}^3$.

Example 3.3 Let $A$ be an $m \times n$ matrix. Then, as we have seen in Example 3.1(3), the set
$$W = \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0}\}$$
of solutions of the homogeneous system $A\mathbf{x} = \mathbf{0}$ is a vector space. Moreover, since the operations in $W$ and in $\mathbb{R}^n$ coincide, $W$ is a subspace of $\mathbb{R}^n$.
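Example 3.3 can be illustrated with a concrete matrix. The sketch below uses a matrix and two solution vectors that we chose for the purpose, and checks that $\mathbf{x} + k\mathbf{y}$ (the criterion of Theorem 3.3) stays in the solution set for a few sample scalars. Sampling only illustrates closure; the proof is the identity $A(\mathbf{x} + k\mathbf{y}) = A\mathbf{x} + kA\mathbf{y} = \mathbf{0}$.

```python
def matvec(a, x):
    """Matrix-vector product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def in_null_space(a, x):
    """True if A x = 0."""
    return all(entry == 0 for entry in matvec(a, x))


A = [[1, 2, -1], [2, 4, -2]]
u, v = [1, 0, 1], [0, 1, 2]
assert in_null_space(A, u) and in_null_space(A, v)

# Spot-check u + k v for a few integer scalars k.
for k in range(-3, 4):
    w = [ui + k * vi for ui, vi in zip(u, v)]
    assert in_null_space(A, w)
print("u + k v solves A x = 0 for every sampled k")
```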

Example 3.4 For a nonnegative integer $n$, let $P_n(\mathbb{R})$ denote the set of all real polynomials in $x$ with degree $\le n$. Then $P_n(\mathbb{R})$ is a subspace of the vector space $C(\mathbb{R})$ of all continuous functions on $\mathbb{R}$.

Example 3.5 Let $W$ be the set of all $n \times n$ real symmetric matrices. Then $W$ is a subspace of the vector space $M_{n\times n}(\mathbb{R})$ of all $n \times n$ matrices, because the sum of two symmetric matrices is symmetric and a scalar multiple of a symmetric matrix is also symmetric. Similarly, the set of all $n \times n$ skew-symmetric matrices is also a subspace of $M_{n\times n}(\mathbb{R})$.

Problem 3.2 Which of the following sets are subspaces of the 3-space $\mathbb{R}^3$? Justify your answer.
(1) $W = \{(x, y, z) \in \mathbb{R}^3 : xyz = 0\}$,
(2) $W = \{(2t, 3t, 4t) \in \mathbb{R}^3 : t \in \mathbb{R}\}$,
(3) $W = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 - z^2 = 0\}$,
(4) $W = \{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x}^T\mathbf{u} = 0 = \mathbf{x}^T\mathbf{v}\}$, where $\mathbf{u}$ and $\mathbf{v}$ are any two fixed nonzero vectors in $\mathbb{R}^3$.
Can you describe all subspaces of the 3-space $\mathbb{R}^3$?

Problem 3.3 Let $V = C(\mathbb{R})$ be the vector space of all continuous functions on $\mathbb{R}$. Which of the following sets $W$ are subspaces of $V$? Justify your answer.
(1) $W$ is the set of all differentiable functions on $\mathbb{R}$.
(2) $W$ is the set of all bounded continuous functions on $\mathbb{R}$.
(3) $W$ is the set of all continuous nonnegative-valued functions on $\mathbb{R}$, i.e., $f(x) \ge 0$ for any $x \in \mathbb{R}$.
(4) $W$ is the set of all continuous odd functions on $\mathbb{R}$, i.e., $f(-x) = -f(x)$ for any $x \in \mathbb{R}$.
(5) $W$ is the set of all polynomials with integer coefficients.

3.2 Bases
Recall that any vector in the 3-space $\mathbb{R}^3$ is of the form $(x_1, x_2, x_3)$, which can also be written as
$$(x_1, x_2, x_3) = x_1(1, 0, 0) + x_2(0, 1, 0) + x_3(0, 0, 1).$$
That is, any vector in $\mathbb{R}^3$ can be expressed as the sum of scalar multiples of $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$. These are also denoted by $i$, $j$ and $k$, respectively. The following definition gives a name to such expressions.

Definition 3.2 Let $V$ be a vector space, and let $\{x_1, x_2, \ldots, x_m\}$ be a set of vectors in $V$. Then a vector $y$ in $V$ of the form
$$y = a_1x_1 + a_2x_2 + \cdots + a_mx_m,$$
where $a_1, \ldots, a_m$ are scalars, is called a linear combination of the vectors $x_1, x_2, \ldots, x_m$.

The next theorem shows that the set of all linear combinations of a finite
set of vectors in a vector space forms a subspace.

Theorem 3.4 Let $x_1, x_2, \ldots, x_m$ be vectors in a vector space $V$. Then the set $W = \{a_1x_1 + a_2x_2 + \cdots + a_mx_m : a_i \in \mathbb{R}\}$ of all linear combinations of $x_1, x_2, \ldots, x_m$ is a subspace of $V$. It is called the subspace of $V$ spanned by $x_1, x_2, \ldots, x_m$.
82 CHAPTER 3. VECTOR SPACES

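Proof: We want to show that $W$ is closed under addition and scalar multiplication. Let $u$ and $w$ be any two vectors in $W$. Then
$$u = a_1x_1 + a_2x_2 + \cdots + a_mx_m, \qquad w = b_1x_1 + b_2x_2 + \cdots + b_mx_m$$
for some scalars $a_i$'s and $b_i$'s. Therefore,
$$u + w = (a_1 + b_1)x_1 + (a_2 + b_2)x_2 + \cdots + (a_m + b_m)x_m$$
and, for any scalar $k$,
$$ku = (ka_1)x_1 + (ka_2)x_2 + \cdots + (ka_m)x_m.$$
Thus, $u + w$ and $ku$ are linear combinations of $x_1, x_2, \ldots, x_m$ and are consequently contained in $W$. Therefore, $W$ is a subspace of $V$. □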

Suppose that $\{x_1, x_2, \ldots, x_m\}$ is any set of $m$ vectors in a vector space $V$. If every vector in $V$ can be written as a linear combination of these vectors $x_i$, we say that the set is a spanning set of $V$.

Example 3.6 (1) For a nonzero vector $v$ in a vector space $V$, linear combinations of $v$ are simply scalar multiples of $v$. Thus the subspace $W$ of $V$ spanned by $v$ is $W = \{kv : k \in \mathbb{R}\}$.
(2) Consider three vectors $v_1 = (1, 1, 1)$, $v_2 = (1, -1, 1)$ and $v_3 = (1, 0, 1)$ in $\mathbb{R}^3$. The subspace $W_1$ spanned by $v_1$ and $v_2$ is written as
$$W_1 = \{a_1v_1 + a_2v_2 : a_i \in \mathbb{R}\},$$
and the subspace $W_2$ spanned by $v_1$, $v_2$ and $v_3$ is written as
$$W_2 = \{a_1v_1 + a_2v_2 + a_3v_3 : a_i \in \mathbb{R}\}.$$
Then $a_1v_1 + a_2v_2 = a_1v_1 + a_2v_2 + 0v_3$ implies $W_1 \subseteq W_2$. On the other hand, any vector in $W_2$ is of the form $a_1v_1 + a_2v_2 + a_3v_3$. Since $v_3 = \frac{1}{2}(v_1 + v_2)$, this can be rewritten as $c_1v_1 + c_2v_2$ with $c_1 = a_1 + \frac{a_3}{2}$ and $c_2 = a_2 + \frac{a_3}{2}$. This means that $W_2 \subseteq W_1$. Thus $W_1 = W_2$, which is a plane in $\mathbb{R}^3$ containing the vectors $v_1$, $v_2$ and $v_3$. In general, a subspace of a vector space can have many different spanning sets. □
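Part (2) rests on the fact that $v_3 = \frac{1}{2}(v_1 + v_2)$, and it describes $W_1 = W_2$ as a plane. Both can be checked with exact arithmetic. The sketch below also uses the cross product $v_1 \times v_2$, a standard tool for $\mathbb{R}^3$ that the text does not use, to get a normal vector to that plane. A vector lies in the plane exactly when its dot product with the normal is zero. The helper names are illustrative.

```python
from fractions import Fraction

v1, v2, v3 = (1, 1, 1), (1, -1, 1), (1, 0, 1)

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# v3 is the average of v1 and v2, so adding it does not enlarge the span
half_sum = tuple(Fraction(a + b, 2) for a, b in zip(v1, v2))
print(half_sum == v3)        # True

# a nonzero cross product means v1, v2 are not parallel, so they span a plane
normal = cross(v1, v2)
print(normal)                # (2, 0, -2): the plane W1 = W2 is x - z = 0
print(dot(normal, v3))       # 0: v3 lies in that plane
```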

Example 3.7 Let
$$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, 0, \ldots, 1)$$
be $n$ vectors in the $n$-space $\mathbb{R}^n$ ($n \ge 3$). Then a linear combination of $e_1, e_2, e_3$ is of the form
$$a_1e_1 + a_2e_2 + a_3e_3 = (a_1, a_2, a_3, 0, \ldots, 0).$$
Hence, the set
$$W = \{(a_1, a_2, a_3, 0, \ldots, 0) \in \mathbb{R}^n : a_1, a_2, a_3 \in \mathbb{R}\}$$
is the subspace of the $n$-space $\mathbb{R}^n$ spanned by the vectors $e_1, e_2, e_3$. Note that the subspace $W$ can be identified with the 3-space $\mathbb{R}^3$ through the identification
$$(a_1, a_2, a_3, 0, \ldots, 0) \equiv (a_1, a_2, a_3)$$
with $a_i \in \mathbb{R}$. In general, for $m \le n$, the $m$-space $\mathbb{R}^m$ can be identified with a subspace of the $n$-space $\mathbb{R}^n$. □

Example 3.8 Let $A = [a_1\ a_2\ \cdots\ a_n]$ be an $m \times n$ matrix. Then the column vectors $a_i$'s are in $\mathbb{R}^m$. The matrix product $Ax$ represents the linear combination of the column vectors $a_i$'s whose coefficients are the components of $x \in \mathbb{R}^n$, i.e., $Ax = x_1a_1 + x_2a_2 + \cdots + x_na_n$. Therefore, the set
$$W = \{x_1a_1 + x_2a_2 + \cdots + x_na_n : x_i \in \mathbb{R}\} = \{Ax : x \in \mathbb{R}^n\}$$
of all linear combinations of the column vectors of $A$ is a subspace of $\mathbb{R}^m$, called the column space of $A$. Consequently, $Ax = b$ has a solution $(x_1, \ldots, x_n)$ in $\mathbb{R}^n$ if and only if the vector $b$ belongs to the subspace $W$ spanned by the column vectors of $A$. □
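The identity $Ax = x_1a_1 + \cdots + x_na_n$ says that the usual row-by-row product agrees with the "column picture." The Python illustration below shows this for an arbitrary $2 \times 3$ matrix chosen only for the example. The function names are illustrative.

```python
def matvec_rows(A, x):
    """Entry i of Ax is the dot product of row i of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_columns(A, x):
    """Ax as the linear combination x_1 a_1 + ... + x_n a_n of the columns a_j."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += x[j] * A[i][j]   # add x_j times column j
    return result

A = [[1, 2, 0],
     [0, 1, -1]]
x = [3, -1, 2]
print(matvec_rows(A, x), matvec_columns(A, x))   # [1, -3] [1, -3]
```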

Problem 3.4 Let $x_1, x_2, \ldots, x_m$ be vectors in a vector space $V$, and let $W$ be the subspace spanned by $x_1, x_2, \ldots, x_m$. Show that $W$ is the smallest subspace of $V$ containing $x_1, x_2, \ldots, x_m$. That is, if $U$ is a subspace of $V$ containing $x_1, x_2, \ldots, x_m$, then $W \subseteq U$.

Problem 3.5 Show that the set of all matrices of the form $AB - BA$ does not span the vector space $M_{n\times n}(\mathbb{R})$.

As we saw above, any nonempty subset of a vector space $V$ spans a subspace through the linear combinations of its vectors. A subspace can also have many spanning sets with different numbers of vectors, so a vector can be written as a linear combination in various ways. Suppose one can find a finite number of vectors in $V$ such that any vector in $V$ can be expressed in a unique way as a linear combination of them. Then the study of the vector space $V$ might be easier, and computations with vectors may be simplified. Thus, for a fixed set of vectors $x_1, x_2, \ldots, x_m$ in a vector space $V$, we look at their linear combinations $c_1x_1 + c_2x_2 + \cdots + c_mx_m$. We then ask whether every vector in $V$ can be written in this form in exactly one way. The uniqueness question can be rephrased as whether or not some nontrivial linear combination produces the zero vector. The trivial combination, with all scalars $c_i = 0$, obviously produces the zero vector.

Definition 3.3 A set of vectors $\{x_1, x_2, \ldots, x_m\}$ in a vector space $V$ is said to be linearly independent if the vector equation
$$c_1x_1 + c_2x_2 + \cdots + c_mx_m = 0,$$
called the linear dependence of the $x_i$'s, has only the trivial solution $c_1 = c_2 = \cdots = c_m = 0$. Otherwise, it is said to be linearly dependent.

Therefore, a set of vectors $\{x_1, x_2, \ldots, x_m\}$ is linearly dependent if and only if the linear dependence
$$c_1x_1 + c_2x_2 + \cdots + c_mx_m = 0$$
has a nontrivial solution $(c_1, c_2, \ldots, c_m)$. In this case, after renumbering the vectors if necessary, we may assume that $c_m \neq 0$. Then the equation can be rewritten as
$$x_m = -\frac{c_1}{c_m}x_1 - \frac{c_2}{c_m}x_2 - \cdots - \frac{c_{m-1}}{c_m}x_{m-1}.$$
That is, a set of vectors is linearly dependent if and only if at least one of the vectors in the set can be written as a linear combination of the others.

Example 3.9 Let $x = (1, 2, 3)$ and $y = (3, 2, 1)$ be two vectors in the 3-space $\mathbb{R}^3$. Then clearly $y \neq \lambda x$ for any $\lambda \in \mathbb{R}$. Equivalently, $ax + by = 0$ only when $a = b = 0$. This means that $\{x, y\}$ is linearly independent in $\mathbb{R}^3$. If $w = (3, 6, 9)$, then $\{x, w\}$ is linearly dependent since $w - 3x = 0$.

In general, if $x, y$ are noncollinear vectors in the 3-space $\mathbb{R}^3$, the set of all linear combinations of $x$ and $y$ determines a plane $W$ through the origin in $\mathbb{R}^3$, i.e., $W = \{ax + by : a, b \in \mathbb{R}\}$. Let $z$ be another nonzero vector in the 3-space $\mathbb{R}^3$.
If $z \in W$, then there are some numbers $a, b \in \mathbb{R}$, not both zero, such that $z = ax + by$. That is, the set $\{x, y, z\}$ is linearly dependent.
If $z \notin W$, then $ax + by + cz = 0$ is possible only when $a = b = c = 0$ (prove it).
Therefore, the set $\{x, y, z\}$ is linearly independent if and only if $z$ does not lie in $W$. □

By abuse of language, it is sometimes convenient to say that "the vectors $x_1, x_2, \ldots, x_m$ are linearly independent," although this is really a property of a set.

[! -; -! ~ 1
Example 3.10 The columns of the matrix

A=
2 -1 1 3
are linearly dependent in the 3-space $\mathbb{R}^3$, since the third column is the sum of the first and the second.
As this example shows, the concept of linear dependence can be applied
to the row or column vectors of any matrix.

n
Example 3.11 Consider an upper triangular matrix

A~ [~ ~
The linear dependence of the column vectors of A may be written as

which, in matrix notation, may be written as a homogeneous system:

From the third row, $c_3 = 0$. From the second row, $c_2 = 0$. Substituting these into the first row forces $c_1 = 0$. That is, the homogeneous system has only the trivial solution, so the column vectors are linearly independent. □

The following theorem can be proven by the same argument.

Theorem 3.5 The nonzero rows of a matrix in row-echelon form are linearly independent, and so are the columns that contain leading 1's.

In particular, the rows of any triangular matrix with nonzero diagonal entries are linearly independent, and so are the columns.

In general, if $V = \mathbb{R}^m$ and $v_1, v_2, \ldots, v_n$ are $n$ vectors in $\mathbb{R}^m$, then they form an $m \times n$ matrix $A = [v_1\ v_2\ \cdots\ v_n]$. On the other hand, Example 3.8 shows that the linear dependence $c_1v_1 + c_2v_2 + \cdots + c_nv_n = 0$ of the $v_i$'s is nothing but the homogeneous equation $Ax = 0$, where $x = (c_1, c_2, \ldots, c_n)$. Thus, the column vectors $v_i$'s of $A$ are linearly independent in $\mathbb{R}^m$ if and only if the homogeneous system $Ax = 0$ has only the trivial solution. They are linearly dependent if and only if $Ax = 0$ has a nontrivial solution. If $U$ is the reduced row-echelon form of $A$, then we know that $Ax = 0$ and $Ux = 0$ have the same set of solutions. Moreover, a homogeneous system $Ax = 0$ with more unknowns than equations always has a nontrivial solution (see the remark on page 11). This proves the following lemma.

Lemma 3.6 (1) Any set of $n$ vectors in the $m$-space $\mathbb{R}^m$ is linearly dependent if $n > m$.
(2) If $U$ is the reduced row-echelon form of $A$, then the columns of $U$ are linearly independent if and only if the columns of $A$ are linearly independent.
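Lemma 3.6 and the discussion before it turn the question "are these columns independent?" into "does $Ax = 0$ have only the trivial solution?" Gauss-Jordan elimination answers that question. The columns are independent exactly when every column of the reduced row-echelon form contains a leading 1, that is, when there are no free variables. The following is a minimal Python sketch of that test. It uses exact fractions to avoid rounding, and the helper names `rref` and `columns_independent` are illustrative, not from the text.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination with exact arithmetic.
    Returns (R, pivots): the reduced row-echelon form and the list of
    pivot (leading 1) column indices."""
    R = [[Fraction(a) for a in row] for row in rows]
    m = len(R)
    n = len(R[0]) if m else 0
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c belongs to a free variable
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [a / lead for a in R[r]]  # make the leading entry 1
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def columns_independent(vectors):
    """True if the given vectors in R^m are linearly independent, i.e. the
    matrix A = [v1 ... vn] has a leading 1 in every column of its RREF."""
    m, n = len(vectors[0]), len(vectors)
    A = [[vectors[j][i] for j in range(n)] for i in range(m)]  # vectors as columns
    _, pivots = rref(A)
    return len(pivots) == n

# Example 3.9: x and y are independent, x and w = 3x are not
print(columns_independent([(1, 2, 3), (3, 2, 1)]))   # True
print(columns_independent([(1, 2, 3), (3, 6, 9)]))   # False
# Lemma 3.6 (1): four vectors in R^3 are always dependent
print(columns_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]))  # False
```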

Example 3.12 Consider the vectors $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$ in the 3-space $\mathbb{R}^3$. The vector equation $c_1e_1 + c_2e_2 + c_3e_3 = 0$ becomes
$$c_1(1, 0, 0) + c_2(0, 1, 0) + c_3(0, 0, 1) = (0, 0, 0)$$
or, equivalently, $(c_1, c_2, c_3) = (0, 0, 0)$. Thus, $c_1 = c_2 = c_3 = 0$, so the set of vectors $\{e_1, e_2, e_3\}$ is linearly independent and also spans $\mathbb{R}^3$.

Example 3.13 In general, it is quite clear that the vectors $e_1, e_2, \ldots, e_n$ in $\mathbb{R}^n$ are linearly independent. Moreover, they span the $n$-space $\mathbb{R}^n$. In fact, writing a vector $x \in \mathbb{R}^n$ as $(x_1, \ldots, x_n)$ means just the linear combination of the vectors $e_i$:
$$x = (x_1, \ldots, x_n) = x_1e_1 + \cdots + x_ne_n.$$
However, if any one of the $e_i$'s is missing, then the remaining ones cannot span $\mathbb{R}^n$. Thus, vectors of this kind play a special role in the vector space.

Definition 3.4 Let $V$ be a vector space. A basis for $V$ is a set of linearly independent vectors that spans $V$.

For example, as we saw in Example 3.13, the set $\{e_1, e_2, \ldots, e_n\}$ forms a basis, called the standard basis for the $n$-space $\mathbb{R}^n$. Of course, there are many other bases for $\mathbb{R}^n$.

Example 3.14 (1) The set of vectors $(1, 1, 0)$, $(0, -1, 1)$ and $(1, 0, 1)$ is not a basis for the 3-space $\mathbb{R}^3$. This set is linearly dependent (the third is the sum of the first two vectors), and it cannot span $\mathbb{R}^3$. (The vector $(1, 0, 0)$ cannot be obtained as a linear combination of them; prove it.) This set does not have enough independent vectors to span $\mathbb{R}^3$.
(2) The set of vectors $(1, 0, 0)$, $(0, 1, 1)$, $(1, 0, 1)$ and $(0, 1, 0)$ is not a basis either. Even though they span $\mathbb{R}^3$, they are not linearly independent: the sum of the first two minus the third gives the fourth. This set contains redundant vectors for spanning $\mathbb{R}^3$.
(3) The set of vectors $(1, 1, 1)$, $(0, 1, 1)$ and $(0, 0, 1)$ is linearly independent and also spans $\mathbb{R}^3$. That is, it is a basis for $\mathbb{R}^3$, different from the standard basis. This set has the proper number of vectors for spanning $\mathbb{R}^3$: it cannot be reduced to a smaller set, nor does it need any additional vector to span $\mathbb{R}^3$. □
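For three vectors in $\mathbb{R}^3$, the checks in Example 3.14 can be shortcut with a standard fact that this section does not develop. Three vectors form a basis of $\mathbb{R}^3$ exactly when the $3 \times 3$ determinant having them as rows is nonzero. The short Python check below covers parts (1) to (3) on that assumption. The helpers `det3`, `add` and `scale` are illustrative.

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w
    (cofactor expansion along the first row)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def add(*vs):
    """Componentwise sum of vectors."""
    return tuple(sum(cs) for cs in zip(*vs))

def scale(k, v):
    return tuple(k * c for c in v)

# (1) dependent: determinant 0, and the third vector is the sum of the first two
set1 = [(1, 1, 0), (0, -1, 1), (1, 0, 1)]
print(det3(*set1), add(set1[0], set1[1]) == set1[2])   # 0 True

# (2) the first three vectors already span R^3 (nonzero determinant), and the
#     fourth equals first + second - third, so the four vectors are dependent
set2 = [(1, 0, 0), (0, 1, 1), (1, 0, 1), (0, 1, 0)]
print(det3(*set2[:3]), add(set2[0], set2[1], scale(-1, set2[2])) == set2[3])  # 1 True

# (3) a basis: nonzero determinant
set3 = [(1, 1, 1), (0, 1, 1), (0, 0, 1)]
print(det3(*set3))   # 1
```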

By definition, to show that a set of vectors in a vector space is a basis, one needs to show two things: it is linearly independent, and it spans the whole space. The following theorem shows that a basis for a vector space provides a coordinate system, just like the rectangular coordinate system given by the standard basis for $\mathbb{R}^n$.

Theorem 3.7 Let $\alpha = \{v_1, v_2, \ldots, v_n\}$ be a basis for a vector space $V$. Then each vector $x$ in $V$ can be uniquely expressed as a linear combination of $v_1, v_2, \ldots, v_n$. That is, there are unique scalars $a_i$'s, $i = 1, \ldots, n$, such that
$$x = a_1v_1 + a_2v_2 + \cdots + a_nv_n.$$
In this case, the column vector $[a_1\ a_2\ \cdots\ a_n]^T$ is called the coordinate vector of $x$ with respect to the basis $\alpha$, and it is denoted by $[x]_\alpha$.
Proof: Since $\alpha$ spans $V$, such scalars exist. If $x$ can also be expressed as $x = b_1v_1 + b_2v_2 + \cdots + b_nv_n$, then we have $0 = (a_1 - b_1)v_1 + (a_2 - b_2)v_2 + \cdots + (a_n - b_n)v_n$. By the linear independence of the $v_i$'s, $a_i = b_i$ for all $i = 1, \ldots, n$. □
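Finding $[x]_\alpha$ means solving $[v_1\ \cdots\ v_n]\,a = x$ for the unique $a$. For the basis $\{(1, 1, 1), (0, 1, 1), (0, 0, 1)\}$ of Example 3.14 (3), the matrix with these vectors as columns is lower triangular, so forward substitution suffices. The sketch below assumes exactly that triangular shape. The helper `coords_lower_triangular` is hypothetical and does not handle a general basis.

```python
from fractions import Fraction

def coords_lower_triangular(basis, x):
    """Coordinates a with x = a_1 v_1 + ... + a_n v_n, ASSUMING the matrix
    whose columns are v_1, ..., v_n is lower triangular with nonzero diagonal."""
    n = len(basis)
    a = []
    for i in range(n):
        # row i of the system reads  v_1[i] a_1 + ... + v_i[i] a_i = x[i]
        s = sum(basis[j][i] * a[j] for j in range(i))
        a.append(Fraction(x[i] - s) / basis[i][i])
    return a

beta = [(1, 1, 1), (0, 1, 1), (0, 0, 1)]
x = (2, 5, 3)
a = coords_lower_triangular(beta, x)
print([int(c) for c in a])   # [2, 3, -2], i.e. a1 = x1, a2 = x2 - x1, a3 = x3 - x2

# rebuild x from its coordinates as a check
print(tuple(sum(a[j] * beta[j][i] for j in range(3)) for i in range(3)) == x)  # True
```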

Example 3.15 Let $\alpha = \{e_1, e_2, e_3\}$ be the standard basis for $\mathbb{R}^3$, and let $\beta = \{v_1, v_2, v_3\}$ with $v_1 = (1, 1, 1) = e_1 + e_2 + e_3$, $v_2 = (0, 1, 1) = e_2 + e_3$ and $v_3 = (0, 0, 1) = e_3$. Then

Problem 3.6 Show that the vectors $v_1 = (1, 2, 1)$, $v_2 = (2, 9, 0)$ and $v_3 = (3, 3, 4)$ in the 3-space $\mathbb{R}^3$ form a basis.

Problem 3.7 Show that the set $\{1, x, x^2, \ldots, x^n\}$ is a basis for $P_n(\mathbb{R})$, the vector space of all polynomials of degree $\le n$ with real coefficients.

Problem 3.8 In the $n$-space $\mathbb{R}^n$, determine whether or not the set

is linearly dependent.

Problem 3.9 Let $x_k$ denote the vector in $\mathbb{R}^n$ whose first $k - 1$ coordinates are zero and whose last $n - k + 1$ coordinates are 1. Show that the set $\{x_1, x_2, \ldots, x_n\}$ is a basis for $\mathbb{R}^n$.

3.3 Dimensions
We often say that the line $\mathbb{R}^1$ is one-dimensional, the plane $\mathbb{R}^2$ is two-dimensional, the space $\mathbb{R}^3$ is three-dimensional, and so on. This is mostly because the number of coordinates that can be chosen freely for each element in the space is 1, 2 or 3, respectively. So the concept of dimension is closely related to the concept of bases. For a vector space in general, there is no unique way to choose a basis. However, there is something common to all bases, and this is related to the notion of dimension. We first need the following important lemma, from which one can define the dimension of a vector space.

Lemma 3.8 Let $V$ be a vector space and let $\alpha = \{x_1, x_2, \ldots, x_m\}$ be a set of $m$ vectors in $V$.
3.3. DIMENSIONS 89

(1) If $\alpha$ spans $V$, then no set of more than $m$ vectors in $V$ can be linearly independent.
(2) If $\alpha$ is linearly independent, then no set of fewer than $m$ vectors can span $V$.

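Proof: Since (2) follows from (1) directly, we prove only (1). Let $\beta = \{y_1, y_2, \ldots, y_n\}$ be a set of $n$ vectors in $V$ with $n > m$. We will show that $\beta$ is linearly dependent. Each vector $y_j$ is a linear combination of the vectors in the spanning set $\alpha$, i.e., for $j = 1, \ldots, n$,
$$y_j = a_{1j}x_1 + a_{2j}x_2 + \cdots + a_{mj}x_m = \sum_{i=1}^m a_{ij}x_i.$$
Hence we have
$$\begin{aligned}
c_1y_1 + c_2y_2 + \cdots + c_ny_n &= c_1(a_{11}x_1 + a_{21}x_2 + \cdots + a_{m1}x_m) \\
&\quad + c_2(a_{12}x_1 + a_{22}x_2 + \cdots + a_{m2}x_m) \\
&\quad + \cdots + c_n(a_{1n}x_1 + a_{2n}x_2 + \cdots + a_{mn}x_m) \\
&= (a_{11}c_1 + a_{12}c_2 + \cdots + a_{1n}c_n)x_1 \\
&\quad + (a_{21}c_1 + a_{22}c_2 + \cdots + a_{2n}c_n)x_2 \\
&\quad + \cdots + (a_{m1}c_1 + a_{m2}c_2 + \cdots + a_{mn}c_n)x_m.
\end{aligned}$$
Thus, $\beta$ is linearly dependent if and only if the equation
$$c_1y_1 + c_2y_2 + \cdots + c_ny_n = 0$$
has a nontrivial solution $(c_1, c_2, \ldots, c_n) \neq (0, 0, \ldots, 0)$. By the computation above, such a solution exists if the coefficients of all the $x_i$'s can be made zero with the $c_j$'s not all zero. It therefore suffices that the homogeneous system of linear equations in the $c_j$'s,
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
has a nontrivial solution. This is guaranteed by Lemma 3.6, since the coefficient matrix $A = [a_{ij}]$ is an $m \times n$ matrix with $m < n$. □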

It is clear by Lemma 3.8 that if a set $\alpha = \{x_1, x_2, \ldots, x_n\}$ of $n$ vectors is a basis for a vector space $V$, then no other set $\beta = \{y_1, y_2, \ldots, y_r\}$ of $r$ vectors with $r \neq n$ can be a basis for $V$. So all bases for a vector space $V$ have the same number of vectors, even though a vector space can have many different bases. Therefore, we obtain the following important result:

Theorem 3.9 If a basis for a vector space $V$ consists of $n$ vectors, then so does every other basis.

Definition 3.5 The dimension of a vector space $V$ is the number, say $n$, of vectors in a basis for $V$, denoted by $\dim V = n$. When $V$ has a basis consisting of a finite number of vectors, $V$ is said to be finite dimensional.

Example 3.16 The following can be easily verified:
(1) If $V$ has only the zero vector, $V = \{0\}$, then $\dim V = 0$.
(2) If $V = \mathbb{R}^n$, then $\dim \mathbb{R}^n = n$, since $V$ has the standard basis $\{e_1, e_2, \ldots, e_n\}$.
(3) If $V = P_n(\mathbb{R})$ is the space of all polynomials of degree less than or equal to $n$, then $\dim P_n(\mathbb{R}) = n + 1$, since $\{1, x, x^2, \ldots, x^n\}$ is a basis for $V$.
(4) If $V = M_{m\times n}(\mathbb{R})$ is the space of all $m \times n$ matrices, then $\dim M_{m\times n}(\mathbb{R}) = mn$, since $\{E_{ij} : i = 1, \ldots, m,\ j = 1, \ldots, n\}$ is a basis for $V$. Here $E_{ij}$ is the $m \times n$ matrix whose $(i, j)$-th entry is 1 and all others are zero. □

Let $V = C(\mathbb{R})$ be the space of all real-valued continuous functions defined on the real line. One can show that $V$ is not finite dimensional. A vector space $V$ is infinite dimensional if it is not finite dimensional. In this book, we are concerned only with finite dimensional vector spaces unless otherwise stated.

Theorem 3.10 Let $V$ be a finite dimensional vector space.
(1) Any linearly independent set in $V$ can be extended to a basis by adding more vectors if necessary.
(2) Any set of vectors that spans $V$ can be reduced to a basis by discarding vectors if necessary.

Proof: We prove (1) only and leave (2) as an exercise. Let $\alpha = \{x_1, \ldots, x_k\}$ be a linearly independent set in $V$. If $\alpha$ spans $V$, then $\alpha$ is a basis. If $\alpha$ does not span $V$, then there exists a vector, say $x_{k+1}$, in $V$ that is not contained in the subspace spanned by the vectors in $\alpha$. Now $\{x_1, \ldots, x_k, x_{k+1}\}$ is linearly independent (check why). If $\{x_1, \ldots, x_k, x_{k+1}\}$ spans $V$, then it is a basis for $V$. If it does not span $V$, then the same procedure can be repeated until it yields a linearly independent set that spans $V$, i.e., a basis for $V$. This procedure must stop after finitely many steps. By Lemma 3.8, a linearly independent set in the finite dimensional vector space $V$ cannot have more than $\dim V$ vectors. □
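The proof of Theorem 3.10 (1) is constructive. In $\mathbb{R}^n$, one convenient source of candidate vectors is the standard basis $e_1, \ldots, e_n$. Since it spans $\mathbb{R}^n$, whenever the current set does not span, some $e_i$ lies outside its span. Drawing candidates from the standard basis is an assumption of this sketch; the proof allows any vector outside the span. The sketch appends $e_i$ whenever doing so keeps the set independent. It measures independence by the number of nonzero rows that survive Gaussian elimination. The helper names `rank` and `extend_to_basis` are illustrative.

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows left after forward elimination (exact arithmetic)."""
    R = [[Fraction(a) for a in row] for row in rows]
    m = len(R)
    n = len(R[0]) if m else 0
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, m):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == m:
            break
    return r

def extend_to_basis(independent, n):
    """Extend a linearly independent list of vectors in R^n to a basis by
    repeatedly adding a standard basis vector outside the current span."""
    basis = [tuple(v) for v in independent]
    for i in range(n):
        if len(basis) == n:
            break  # n independent vectors already form a basis (Corollary 3.11)
        e = tuple(1 if j == i else 0 for j in range(n))
        if rank(basis + [e]) == len(basis) + 1:  # e is not in the current span
            basis.append(e)
    return basis

print(extend_to_basis([(1, 1, 0)], 3))
# [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
```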

Theorem 3.10 shows that a basis for a vector space $V$ is a set of vectors in $V$ which is maximally independent and minimally spanning in the above sense. In particular, if $W$ is a subspace of $V$, then any basis for $W$ is also linearly independent in $V$ and can be extended to a basis for $V$. Thus $\dim W \le \dim V$.

Corollary 3.11 Let $V$ be a vector space of dimension $n$. Then
(1) any set of $n$ vectors that spans $V$ is a basis for $V$, and
(2) any set of $n$ linearly independent vectors is a basis for $V$.
Proof: Again we prove (1) only. Suppose a spanning set of $n$ vectors were not linearly independent. Then, by Theorem 3.10 (2), it could be reduced to a basis with fewer than $n$ vectors, contradicting Theorem 3.9. □

Corollary 3.11 means the following. If it is known that $\dim V = n$, and a set of $n$ vectors either is linearly independent or spans $V$, then that set is already a basis for the space $V$.

Example 3.17 Let $W$ be the subspace of $\mathbb{R}^4$ spanned by the vectors
$$x_1 = (1, -2, 5, -3), \quad x_2 = (0, 1, 1, 4), \quad x_3 = (1, 0, 1, 0).$$
Find a basis for $W$ and extend it to a basis for $\mathbb{R}^4$.
Solution: Note that $\dim W \le 3$, since $W$ is spanned by the three vectors $x_i$'s. Let $A$ be the $3 \times 4$ matrix whose rows are $x_1$, $x_2$ and $x_3$:
$$A = \begin{bmatrix} 1 & -2 & 5 & -3 \\ 0 & 1 & 1 & 4 \\ 1 & 0 & 1 & 0 \end{bmatrix}.$$
Reduce $A$ to a row-echelon form:

The three nonzero row vectors of $U$ are clearly linearly independent. They also span $W$: row operations are reversible, so the vectors $x_1$, $x_2$ and $x_3$ can be expressed as linear combinations of these three nonzero rows of $U$. Hence, the nonzero rows of $U$ provide a basis for $W$. (This implies $\dim W = 3$, and hence $\{x_1, x_2, x_3\}$ is also a basis for $W$ by Corollary 3.11. The linear independence of the $x_i$'s is a by-product of this fact.)
To extend this basis to a basis for the space $\mathbb{R}^4$, just add to the rows of $U$ any nonzero vector of the form $x_4 = (0, 0, 0, t)$. This works because the leading entries of the three rows of $U$ lie in the first three columns. □
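The computation in Example 3.17 can be replayed with forward elimination. The sketch below uses exact fractions, and `row_echelon` is an illustrative helper. Echelon forms are not unique, so the form it produces is one valid choice and not necessarily the $U$ of the text. It shows that three nonzero rows survive, with leading entries in the first three columns. It also shows that after appending $x_4 = (0, 0, 0, 1)$, all four rows stay nonzero, so the four vectors form a basis of $\mathbb{R}^4$.

```python
from fractions import Fraction

def row_echelon(rows):
    """Forward (Gaussian) elimination with exact arithmetic; returns a
    row-echelon form of the matrix whose rows are given."""
    U = [[Fraction(a) for a in row] for row in rows]
    m, n = len(U), len(U[0])
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if U[i][c] != 0), None)
        if p is None:
            continue
        U[r], U[p] = U[p], U[r]
        for i in range(r + 1, m):
            f = U[i][c] / U[r][c]
            U[i] = [a - f * b for a, b in zip(U[i], U[r])]
        r += 1
        if r == m:
            break
    return U

def nonzero_rows(U):
    return [row for row in U if any(a != 0 for a in row)]

x1, x2, x3 = (1, -2, 5, -3), (0, 1, 1, 4), (1, 0, 1, 0)
U = row_echelon([x1, x2, x3])
for row in U:
    print([str(a) for a in row])
# ['1', '-2', '5', '-3']
# ['0', '1', '1', '4']
# ['0', '0', '-6', '-5']
print(len(nonzero_rows(U)))                     # 3, so dim W = 3

# extend with x4 = (0, 0, 0, 1): all four rows stay nonzero after elimination
basis4 = nonzero_rows(U) + [[0, 0, 0, 1]]
print(len(nonzero_rows(row_echelon(basis4))))   # 4
```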

Problem 3.10 Let $W$ be a subspace of a vector space $V$. Show that if $\dim W = \dim V$, then $W = V$.

Problem 3.11 Find a basis and the dimension of each of the following subspaces of the space $M_{n\times n}(\mathbb{R})$ of all $n \times n$ matrices:
(1) the space of all $n \times n$ diagonal matrices whose traces are zero;
(2) the space of all $n \times n$ symmetric matrices;
(3) the space of all $n \times n$ skew-symmetric matrices.

Now consider two subspaces $U$ and $W$ of a vector space $V$. The sum of these subspaces $U$ and $W$ is defined by
$$U + W = \{u + w : u \in U,\ w \in W\}.$$
It is not hard to see that this is a subspace of $V$.

Problem 3.12 Let $U$ and $W$ be subspaces of a vector space $V$.
(1) Show that $U + W$ is the smallest subspace of $V$ containing $U$ and $W$.
(2) Prove that $U \cap W$ is also a subspace of $V$. Is $U \cup W$ a subspace of $V$? Justify your answer.

Definition 3.6 A vector space $V$ is called the direct sum of two subspaces $U$ and $W$, written $V = U \oplus W$, if $V = U + W$ and $U \cap W = \{0\}$.

Theorem 3.12 A vector space $V$ is the direct sum of subspaces $U$ and $W$, i.e., $V = U \oplus W$, if and only if for any $v \in V$ there exist unique $u \in U$ and $w \in W$ such that $v = u + w$.

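Proof: Suppose that $V = U \oplus W$. Then, for any $v \in V$, there exist vectors $u \in U$ and $w \in W$ such that $v = u + w$, since $V = U + W$. To show uniqueness, suppose that $v$ is also expressed as a sum $u' + w'$ with $u' \in U$ and $w' \in W$. Then $u + w = u' + w'$ implies
$$u - u' = w' - w \in U \cap W = \{0\}.$$
Hence, $u = u'$ and $w = w'$.
Conversely, suppose every $v \in V$ has a unique such expression. The existence of the expression gives $V = U + W$. If there were a nonzero vector $v$ in $U \cap W$, then $v$ could be written as a sum of vectors in $U$ and $W$ in many different ways:
$$v = v + 0 = 0 + v = \tfrac{1}{2}v + \tfrac{1}{2}v = \tfrac{1}{3}v + \tfrac{2}{3}v \in U + W.$$
This contradicts uniqueness, so $U \cap W = \{0\}$. □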
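As an illustration of Theorem 3.12 in $M_{n\times n}(\mathbb{R})$, let $U$ be the symmetric and $W$ the skew-symmetric matrices. That $M_{n\times n}(\mathbb{R}) = U \oplus W$ is the content of Problem 3.13. The candidate decomposition $A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$ exhibits the existence half. The sketch below only computes and checks this split for one sample matrix; it does not prove uniqueness. The helper names are illustrative.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def sym_skew_parts(A):
    """Split a square matrix as A = S + K with S symmetric and K skew-symmetric,
    using S = (A + A^T)/2 and K = (A - A^T)/2."""
    n = len(A)
    S = [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]
    K = [[Fraction(A[i][j] - A[j][i], 2) for j in range(n)] for i in range(n)]
    return S, K

A = [[1, 2], [4, 3]]
S, K = sym_skew_parts(A)
print([[str(a) for a in row] for row in S])   # [['1', '3'], ['3', '3']]
print([[str(a) for a in row] for row in K])   # [['0', '-1'], ['1', '0']]
print(S == transpose(S))                                  # True: S is symmetric
print(K == [[-a for a in row] for row in transpose(K)])   # True: K^T = -K
```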

Example 3.18 Consider the three vectors $e_1$, $e_2$ and $e_3$ in $\mathbb{R}^3$. Let $U = \{a_1e_1 + b_3e_3 : a_1, b_3 \in \mathbb{R}\}$ be the subspace spanned by $e_1$ and $e_3$ (the $xz$-plane). Let $W = \{a_2e_2 + c_3e_3 : a_2, c_3 \in \mathbb{R}\}$ be the subspace of $\mathbb{R}^3$ spanned by $e_2$ and $e_3$ (the $yz$-plane). Then a vector in $U + W$ is of the form
$$(a_1e_1 + b_3e_3) + (a_2e_2 + c_3e_3) = a_1e_1 + a_2e_2 + (b_3 + c_3)e_3 = a_1e_1 + a_2e_2 + a_3e_3,$$
where $a_3 = b_3 + c_3$ and $a_1, a_2, a_3$ are arbitrary numbers. Thus $U + W = \mathbb{R}^3$. However, $\mathbb{R}^3 \neq U \oplus W$, since clearly $e_3 \in U \cap W$, so $U \cap W \neq \{0\}$. In fact, the vector $e_3 \in \mathbb{R}^3$ can be written as the sum of a vector in $U$ and a vector in $W$ in many ways:
$$e_3 = \tfrac{1}{2}e_3 + \tfrac{1}{2}e_3 = \tfrac{1}{3}e_3 + \tfrac{2}{3}e_3 \in U + W.$$
If we had taken $W$ to be the subspace spanned by $e_2$ alone, then it would be easy to see that $\mathbb{R}^3 = U \oplus W$. Note also that there are many choices for $W$. □

As a direct consequence of Theorem 3.10 and the definition of the direct sum, one can show the following.

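Corollary 3.13 If $U$ is a subspace of $V$, then there is a subspace $W$ of $V$ such that $V = U \oplus W$.
Proof: Choose a basis $\{u_1, \ldots, u_k\}$ for $U$, and extend it to a basis $\{u_1, \ldots, u_k, u_{k+1}, \ldots, u_n\}$ for $V$. Then the subspace $W$ spanned by $\{u_{k+1}, \ldots, u_n\}$ satisfies the requirement. Every vector of $V$ is a linear combination of $u_1, \ldots, u_n$, so $V = U + W$. A nonzero vector lying in both $U$ and $W$ would give a linear dependence among $u_1, \ldots, u_n$, so $U \cap W = \{0\}$. □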

Problem 3.13 Let $U$ and $W$ be the subspaces of the vector space $M_{n\times n}(\mathbb{R})$ consisting of all symmetric matrices and all skew-symmetric matrices, respectively. Show that $M_{n\times n}(\mathbb{R}) = U \oplus W$. Therefore, the decomposition of a square matrix $A$ given in (3) of Problem 1.10 is unique.

Problem 3.14 Let $\{v_1, v_2, \ldots, v_n\}$ be a basis for a vector space $V$, and let $W_i = \{rv_i : r \in \mathbb{R}\}$ be the subspace of $V$ spanned by $v_i$. Show that $V = W_1 \oplus W_2 \oplus \cdots \oplus W_n$.

3.4 Row and column spaces


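In this section, we go back to systems of linear equations and study them in terms of the concepts introduced in the previous sections. Note that an $m \times n$ matrix $A$ can be abbreviated by its row vectors or its column vectors as follows:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix} = [c_1\ c_2\ \cdots\ c_n].$$
Here the $r_i$'s are the row vectors of $A$, which are in $\mathbb{R}^n$, and the $c_j$'s are the column vectors of $A$, which are in $\mathbb{R}^m$.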

Definition 3.7 Let $A$ be an $m \times n$ matrix with row vectors $\{r_1, \ldots, r_m\}$ and column vectors $\{c_1, \ldots, c_n\}$.
(1) The row space of $A$ is the subspace of $\mathbb{R}^n$ spanned by the row vectors $\{r_1, \ldots, r_m\}$, denoted by $\mathcal{R}(A)$.
(2) The column space of $A$ is the subspace of $\mathbb{R}^m$ spanned by the column vectors $\{c_1, \ldots, c_n\}$, denoted by $\mathcal{C}(A)$.
(3) The solution set of the homogeneous equation $Ax = 0$ is called the null space of $A$, denoted by $\mathcal{N}(A)$.
Note that the null space $\mathcal{N}(A)$ is a subspace of the $n$-space $\mathbb{R}^n$, whose dimension is called the nullity of $A$. The row vectors of $A$ are just the column vectors of its transpose $A^T$, and the column vectors of $A$ are the row vectors of $A^T$. Hence the row space of $A$ is the column space of $A^T$; that is,
$$\mathcal{R}(A) = \mathcal{C}(A^T) \quad\text{and}\quad \mathcal{C}(A) = \mathcal{R}(A^T).$$
3.4. ROW AND COLUMN SPACES 95

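Since $Ax = x_1c_1 + x_2c_2 + \cdots + x_nc_n$ for any vector $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, we get
$$\mathcal{C}(A) = \{Ax : x \in \mathbb{R}^n\}.$$
Thus, for a vector $b \in \mathbb{R}^m$, the system $Ax = b$ has a solution if and only if $b \in \mathcal{C}(A) \subseteq \mathbb{R}^m$. In other words, the column space $\mathcal{C}(A)$ is exactly the set of vectors $b \in \mathbb{R}^m$ for which $Ax = b$ has a solution.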
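The criterion $b \in \mathcal{C}(A)$ can be tested mechanically by row-reducing the augmented matrix $[A \mid b]$. The system is inconsistent exactly when a leading 1 appears in the last (augmented) column, that is, when some row reads $0 = 1$. The following is a minimal sketch with exact fractions. The helper names and the sample matrix are illustrative.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination with exact arithmetic; returns (R, pivots)."""
    R = [[Fraction(a) for a in row] for row in rows]
    m = len(R)
    n = len(R[0]) if m else 0
    pivots = []
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [a / lead for a in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def in_column_space(A, b):
    """True if Ax = b has a solution, i.e. b lies in C(A)."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    return len(A[0]) not in pivots  # no leading 1 in the augmented column

A = [[1, 2], [2, 4], [0, 0]]           # C(A) is the line spanned by (1, 2, 0)
print(in_column_space(A, (3, 6, 0)))   # True:  (3, 6, 0) = 3 * (1, 2, 0)
print(in_column_space(A, (1, 0, 0)))   # False
```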
It is quite natural to ask what the dimensions of those subspaces are and how one can find bases for them. This will help us to understand the structure of all the solutions of the equation $Ax = b$. The set of row vectors and the set of column vectors of $A$ are spanning sets for the row space and the column space, respectively. Therefore a minimal spanning subset of each of them will be a basis for the corresponding space.
This is easy for a matrix in (reduced) row-echelon form.

Example 3.19 Let U be in a reduced row-echelon form given as

Clearly, the first three nonzero row vectors, which contain leading 1's, are linearly independent, and they form a basis for the row space $\mathcal{R}(U)$. So $\dim \mathcal{R}(U) = 3$. On the other hand, the first three columns, which contain leading 1's, are linearly independent (see Theorem 3.5), and the last two column vectors can be expressed as linear combinations of them. Hence, these three columns form a basis for $\mathcal{C}(U)$, and $\dim \mathcal{C}(U) = 3$. To find a basis for the null space $\mathcal{N}(U)$, we first solve the system $Ux = 0$ with arbitrary values $s$ and $t$ for the free variables $x_4$ and $x_5$, and get the solution
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -2s - 2t \\ s - 3t \\ -4s + t \\ s \\ t \end{bmatrix} = s\begin{bmatrix} -2 \\ 1 \\ -4 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -2 \\ -3 \\ 1 \\ 0 \\ 1 \end{bmatrix} = s\,n_s + t\,n_t,$$
where $n_s = (-2, 1, -4, 1, 0)$ and $n_t = (-2, -3, 1, 0, 1)$. This shows that the two vectors $n_s$ and $n_t$ span the null space $\mathcal{N}(U)$. They are also clearly linearly independent, as their last two coordinates show. Hence, the set $\{n_s, n_t\}$ is a basis for the null space $\mathcal{N}(U)$. □

In the following, the row space, the column space and the null space of a matrix $A$ will be discussed in relation to the corresponding spaces of its (reduced) row-echelon form. We first investigate the row space $\mathcal{R}(A)$ and the null space $\mathcal{N}(A)$ of $A$ by comparing them with those of the reduced row-echelon form $U$ of $A$. Since $Ax = 0$ and $Ux = 0$ have the same solution set by Theorem 1.1, we have $\mathcal{N}(A) = \mathcal{N}(U)$.

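Let $A = \begin{bmatrix} r_1 \\ \vdots \\ r_m \end{bmatrix}$ be an $m \times n$ matrix, where the $r_i$'s are the row vectors of $A$. The three elementary row operations change $A$ into the following three types:
$$A_1 = \begin{bmatrix} r_1 \\ \vdots \\ kr_i \\ \vdots \\ r_m \end{bmatrix} \text{ for } k \neq 0, \qquad A_2 = \begin{bmatrix} r_1 \\ \vdots \\ r_j \\ \vdots \\ r_i \\ \vdots \\ r_m \end{bmatrix} \text{ for } i < j, \qquad A_3 = \begin{bmatrix} r_1 \\ \vdots \\ r_i + kr_j \\ \vdots \\ r_m \end{bmatrix}.$$
It is clear that the row vectors of the three matrices $A_1$, $A_2$ and $A_3$ are linear combinations of the row vectors of $A$. On the other hand, the inverse elementary row operations change these matrices back into $A$. Thus, the row vectors of $A$ can also be written as linear combinations of those of the $A_i$'s. This means that if matrices $A$ and $B$ are row equivalent, then their row spaces must be equal, i.e., $\mathcal{R}(A) = \mathcal{R}(B)$.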
Now the nonzero row vectors in the reduced row-echelon form $U$ are always linearly independent (see Theorem 3.5) and span the row space of $U$. Thus they form a basis for the row space $\mathcal{R}(U) = \mathcal{R}(A)$ of $A$. We have the following theorem.

Theorem 3.14 Let $U$ be a (reduced) row-echelon form of a matrix $A$. Then
$$\mathcal{R}(A) = \mathcal{R}(U) \quad\text{and}\quad \mathcal{N}(A) = \mathcal{N}(U).$$
Moreover, if $U$ has $r$ nonzero row vectors containing leading 1's, then they form a basis for the row space $\mathcal{R}(A)$, so the dimension of $\mathcal{R}(A)$ is $r$.

The following example shows how to find bases for the row space and the null space, and at the same time how to find a basis for the column space $\mathcal{C}(A)$.

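Example 3.20 Let $A$ be the matrix
$$A = \begin{bmatrix} 1 & 2 & 0 & 2 & 5 \\ -2 & -5 & 1 & -1 & -8 \\ 0 & -3 & 3 & 4 & 1 \\ 3 & 6 & 0 & -7 & 2 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{bmatrix}.$$
Find bases for the row space $\mathcal{R}(A)$, the null space $\mathcal{N}(A)$ and the column space $\mathcal{C}(A)$ of $A$.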

Solution: (1) Find a basis for $\mathcal{R}(A)$. Gauss-Jordan elimination on $A$ gives the reduced row-echelon form $U$:
$$U = \begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
The three nonzero row vectors of $U$,
$$v_1 = (1, 0, 2, 0, 1), \quad v_2 = (0, 1, -1, 0, 1), \quad v_3 = (0, 0, 0, 1, 1),$$
are linearly independent, so they form a basis for the row space $\mathcal{R}(U) = \mathcal{R}(A)$. Hence $\dim \mathcal{R}(A) = 3$. (In the process of Gaussian elimination, we did not use a permutation matrix. So the three nonzero rows of $U$ were obtained from the first three row vectors $r_1, r_2, r_3$ of $A$, and the fourth row $r_4$ of $A$ turned out to be a linear combination of them. Thus the first three row vectors of $A$ also form a basis for the row space.)
(2) Find a basis for $\mathcal{N}(A)$. Since $\mathcal{N}(A) = \mathcal{N}(U)$, it is enough to solve the homogeneous system $Ux = 0$. Neglecting the fourth (zero) equation, the equation $Ux = 0$ becomes the following system of equations:
$$\begin{aligned} x_1 + 2x_3 + x_5 &= 0 \\ x_2 - x_3 + x_5 &= 0 \\ x_4 + x_5 &= 0. \end{aligned}$$
The first, second and fourth columns of $U$ contain the leading 1's, so the basic variables are $x_1, x_2, x_4$ and the free variables are $x_3, x_5$. Assigning arbitrary values $s$ and $t$ to the free variables $x_3$ and $x_5$, we find the solution $x$ of $Ux = 0$ as
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -2s - t \\ s - t \\ s \\ -t \\ t \end{bmatrix} = s\begin{bmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ -1 \\ 0 \\ -1 \\ 1 \end{bmatrix} = s\,n_s + t\,n_t,$$
where $n_s = (-2, 1, 1, 0, 0)$ and $n_t = (-1, -1, 0, -1, 1)$. In fact, $n_s$ and $n_t$ are the solutions obtained when $(x_3, x_5) = (s, t)$ equals $(1, 0)$ and $(0, 1)$, respectively. They must be linearly independent, since their $(x_3, x_5)$-coordinates $(1, 0)$ and $(0, 1)$ are clearly linearly independent. Since any solution of $Ux = 0$ is a linear combination of them, the set $\{n_s, n_t\}$ is a basis for the null space $\mathcal{N}(U) = \mathcal{N}(A)$. Thus $\dim \mathcal{N}(A) = 2$, which is the number of free variables in $Ux = 0$.
(3) Find a basis for $\mathcal{C}(A)$. Let $c_1, c_2, c_3, c_4, c_5$ denote the column vectors of $A$ in the given order. These column vectors span $\mathcal{C}(A)$, so we only need to discard the columns that can be expressed as linear combinations of other columns. The linear dependence
$$x_1c_1 + x_2c_2 + x_3c_3 + x_4c_4 + x_5c_5 = 0$$
holds if and only if $x = (x_1, \ldots, x_5) \in \mathcal{N}(A)$. Take $x = n_s = (-2, 1, 1, 0, 0)$ and $x = n_t = (-1, -1, 0, -1, 1)$, the basis vectors of $\mathcal{N}(A)$ found in (2). They give two nontrivial linear dependencies of the $c_i$'s:
$$-2c_1 + c_2 + c_3 = 0, \qquad -c_1 - c_2 - c_4 + c_5 = 0.$$
Hence, the column vectors $c_3$ and $c_5$, which correspond to the free variables in $Ax = 0$, can be written as
$$c_3 = 2c_1 - c_2, \qquad c_5 = c_1 + c_2 + c_4.$$
That is, the column vectors $c_3, c_5$ of $A$ are linear combinations of the column vectors $c_1, c_2, c_4$, which correspond to the basic variables in $Ax = 0$. Hence, $\{c_1, c_2, c_4\}$ spans the column space $\mathcal{C}(A)$.

We claim that $\{c_1, c_2, c_4\}$ is linearly independent. Let $\hat{A} = [c_1\ c_2\ c_4]$ and $\hat{U} = [u_1\ u_2\ u_4]$ be submatrices of $A$ and $U$, respectively, where $u_j$ is the $j$-th column vector of the reduced row-echelon form $U$ of $A$ obtained in (1). Then clearly $\hat{U}$ is the reduced row-echelon form of $\hat{A}$, so $\mathcal{N}(\hat{A}) = \mathcal{N}(\hat{U})$. The vectors $u_1, u_2, u_4$ are just the columns of $U$ containing leading 1's, so they are linearly independent by Theorem 3.5, and $\hat{U}x = 0$ has only the trivial solution. Hence $\hat{A}x = 0$ also has only the trivial solution, so $\{c_1, c_2, c_4\}$ is linearly independent. Therefore, it is a basis for the column space $\mathcal{C}(A)$, and $\dim \mathcal{C}(A) = 3$, which is the number of basic variables. That is, the column vectors of $A$ corresponding to the basic variables in $Ux = 0$ form a basis for the column space $\mathcal{C}(A)$. □
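The claims of Example 3.20 are easy to double-check with integer arithmetic. Both null-space vectors should be sent to zero by $A$ and by $U$, and the two column relations should hold. The small sketch below verifies the example's numbers; it does not derive them. The helper names are illustrative.

```python
A = [[1, 2, 0, 2, 5],
     [-2, -5, 1, -1, -8],
     [0, -3, 3, 4, 1],
     [3, 6, 0, -7, 2]]
U = [[1, 0, 2, 0, 1],
     [0, 1, -1, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0]]
n_s = [-2, 1, 1, 0, 0]
n_t = [-1, -1, 0, -1, 1]

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def col(M, j):
    """j-th column of M, counting from 1 as in the text."""
    return [row[j - 1] for row in M]

def lin(*terms):
    """Linear combination: sum of k * v over the given (k, v) pairs."""
    return [sum(k * v[i] for k, v in terms) for i in range(len(terms[0][1]))]

zero = [0, 0, 0, 0]
print(matvec(A, n_s) == zero, matvec(A, n_t) == zero)   # True True
print(matvec(U, n_s) == zero, matvec(U, n_t) == zero)   # True True
# c3 = 2 c1 - c2 and c5 = c1 + c2 + c4
print(col(A, 3) == lin((2, col(A, 1)), (-1, col(A, 2))))                  # True
print(col(A, 5) == lin((1, col(A, 1)), (1, col(A, 2)), (1, col(A, 4))))   # True
```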

In summary, given a matrix $A$, we first find the (reduced) row-echelon form $U$ of $A$ by Gauss-Jordan elimination. Then a basis for $\mathcal{R}(A) = \mathcal{R}(U)$ is the set of nonzero row vectors of $U$, and a basis for $\mathcal{N}(A) = \mathcal{N}(U)$ can be found by solving $Ux = 0$, which is easy. On the other hand, one has to be careful, since $\mathcal{C}(U) \neq \mathcal{C}(A)$ in general: Gauss-Jordan elimination does not preserve the column space of $A$. However, $\dim \mathcal{C}(A) = \dim \mathcal{C}(U)$. A basis for $\mathcal{C}(A)$ can be selected from the column vectors of $A$, not of $U$, namely those corresponding to the basic variables (or the leading 1's in $U$). To show that those column vectors indeed form a basis for $\mathcal{C}(A)$, we used a basis for the null space $\mathcal{N}(A)$ to eliminate the redundant columns. A basis for the column space $\mathcal{C}(A)$ can also be found with elementary column operations, which is the same as finding a basis for the row space $\mathcal{R}(A^T)$ of $A^T$.
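The summary above translates directly into a procedure, sketched below with exact fractions (the function names are illustrative). Row-reduce once, then read a row-space basis off the nonzero rows of $U$. Build one null-space vector per free variable by setting that variable to 1 and the other free variables to 0. For the column space, take the pivot columns of $A$ itself, not of $U$. Run on the matrix of Example 3.20, it reproduces the bases found there. The last line shows why $\mathcal{C}(U) \neq \mathcal{C}(A)$ in this case: the last row of $U$ is zero, so every column of $U$ has last entry 0, while the first column of $A$ does not.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination with exact arithmetic; returns (R, pivots)."""
    R = [[Fraction(a) for a in row] for row in rows]
    m = len(R)
    n = len(R[0]) if m else 0
    pivots = []
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [a / lead for a in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def fundamental_bases(A):
    """Bases for R(A), N(A) and C(A) via the reduced row-echelon form."""
    R, pivots = rref(A)
    n = len(A[0])
    row_basis = [row for row in R if any(a != 0 for a in row)]
    free = [j for j in range(n) if j not in pivots]
    null_basis = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)          # this free variable = 1, the others = 0
        for k, p in enumerate(pivots):
            x[p] = -R[k][f]         # row k of Ux = 0 solved for basic variable x_p
        null_basis.append(x)
    col_basis = [[row[p] for row in A] for p in pivots]   # columns of A, not of U
    return row_basis, null_basis, col_basis, R

A = [[1, 2, 0, 2, 5],
     [-2, -5, 1, -1, -8],
     [0, -3, 3, 4, 1],
     [3, 6, 0, -7, 2]]
rows, nulls, cols, U = fundamental_bases(A)
print([[int(a) for a in r] for r in rows])   # [[1, 0, 2, 0, 1], [0, 1, -1, 0, 1], [0, 0, 0, 1, 1]]
print([[int(a) for a in v] for v in nulls])  # [[-2, 1, 1, 0, 0], [-1, -1, 0, -1, 1]]
print(cols)                                  # [[1, -2, 0, 3], [2, -5, -3, 6], [2, -1, 4, -7]]
print(all(a == 0 for a in U[-1]), A[-1][0])  # True 3
```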
Problem 3.15 Let $A$ be the matrix given in Example 3.20. Find a relation among $a, b, c, d$ so that the vector $x = (a, b, c, d)$ belongs to $\mathcal{C}(A)$.
Problem 3.16 Find bases for $\mathcal{R}(A)$ and $\mathcal{N}(A)$ of the matrix

A = [~ =~ ~~ ~~ ~].
2 6 18 8 6
Also find a basis for $\mathcal{C}(A)$ by finding a basis for $\mathcal{R}(A^T)$.

Problem 3.17 Let $A$ and $B$ be two $n \times n$ matrices. Show that $AB = 0$ if and only if the column space of $B$ is a subspace of the null space of $A$.

Problem 3.18 Find an example of a matrix $A$ and its row-echelon form $U$ such that $\mathcal{C}(A) \neq \mathcal{C}(U)$.

3.5 Rank and nullity


The argument in Example 3.20 is general enough to prove the following theorem, which is one of the most fundamental results in linear algebra. The proof given here just repeats the argument of Example 3.20 in a general form, so it may be skipped at the reader's discretion.

Theorem 3.15 (The first fundamental theorem) For any m × n matrix A,
the row space and the column space of A have the same dimension;
that is, dim R(A) = dim C(A).

Proof: Let dim R(A) = r and let U be the reduced row-echelon form of A.
Then r is the number of the nonzero row (or column) vectors of U containing
leading 1's, which is equal to the number of basic variables in Ux = 0 or
Ax = 0. We shall prove that the r columns of A corresponding to the r
leading 1's (or basic variables) form a basis for C(A), so that dim C(A) =
r = dim R(A).
(1) They are linearly independent: Let Â denote the submatrix of A
whose columns are those of A corresponding to the r basic variables (or
leading 1's) in U, and let Û denote the submatrix of U consisting of the r
columns containing leading 1's. Then it is quite clear that Û is the reduced
row-echelon form of Â, so that Âx = 0 if and only if Ûx = 0. However, Ûx = 0
has only the trivial solution, since the columns of U containing the leading 1's
are linearly independent by Theorem 3.5. Therefore, Âx = 0 also has only the
trivial solution, so the columns of Â are linearly independent.
(2) They span C(A): Note that the columns of A corresponding to the
free variables are not contained in Â, and each of these column vectors of
A can be written as a linear combination of the column vectors of Â (see
Example 3.20). In fact, if {x_{i_1}, x_{i_2}, ..., x_{i_k}} is the set of free variables,
whose corresponding columns are not in Â, then, for the assignment of the value 1 to
x_{i_j} and 0 to all the other free variables, one can always find a nontrivial
solution of
$$Ax = x_1c_1 + x_2c_2 + \cdots + x_nc_n = 0.$$
When this solution is substituted into the equation, one can see that the
column c_{i_j} of A corresponding to x_{i_j} = 1 is written as a linear combination
of the columns of Â. This can be done for each j = 1, ..., k, so the columns
of A corresponding to the free variables are redundant in the spanning set
of C(A). □
3.5. RANK AND NULLITY 101

Remark: In the proof of Theorem 3.15, once we have shown that the
columns of Â are linearly independent as in (1), we may replace step (2)
by the following argument: since these r = dim R(A) columns are linearly
independent vectors in C(A), one can easily see that dim C(A) ≥ dim R(A)
by Theorem 3.10. On the other hand, since this inequality holds for arbitrary
matrices, in particular for A^T, we get dim C(A^T) ≥ dim R(A^T). Moreover,
C(A^T) = R(A) and R(A^T) = C(A), so dim R(A) ≥ dim C(A), which
means dim C(A) = dim R(A). This also means that the r column vectors of Â
span C(A), and so form a basis.
In summary, the following equalities are now clear from Theorems 3.14
and 3.15:
$$
\begin{aligned}
\dim \mathcal{R}(A) &= \dim \mathcal{R}(U)\\
&= \text{the number of nonzero row vectors of } U\\
&= \text{the maximal number of linearly independent row vectors of } A\\
&= \text{the number of basic variables in } Ux = 0\\
&= \text{the maximal number of linearly independent column vectors of } A\\
&= \dim \mathcal{C}(A),
\end{aligned}
$$
$$
\dim \mathcal{N}(A) = \dim \mathcal{N}(U) = \text{the number of free variables in } Ux = 0.
$$

Definition 3.8 For an m × n matrix A, the rank of A is defined to be the
dimension of the row space (or the column space), denoted by rank A.

Clearly, rank I_n = n and rank A = rank A^T. And for an m × n matrix A,
rank A = dim R(A) = dim C(A). Since dim R(A) ≤ m and dim C(A) ≤ n,
we have the following corollary:

Corollary 3.16 If A is an m × n matrix, then rank A ≤ min{m, n}.

Since dim R(A) = dim C(A) = rank A is the number of basic variables
in Ax = 0, and dim N(A) = nullity of A is the number of free variables
in Ax = 0, we have the following corollary.

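Corollary 3.17 For any m × n matrix A,
$$
\begin{aligned}
\dim \mathcal{R}(A) + \dim \mathcal{N}(A) &= \operatorname{rank} A + \text{nullity of } A = n,\\
\dim \mathcal{C}(A) + \dim \mathcal{N}(A^T) &= \operatorname{rank} A + \text{nullity of } A^T = m.
\end{aligned}
$$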

If dim N(A) = 0 (or N(A) = {0}), then dim R(A) = n (or R(A) = ℝ^n),
which means that A has exactly n linearly independent rows and n linearly
independent columns. In particular, if A is a square matrix of order n, then
the row vectors are linearly independent if and only if the column vectors
are linearly independent. Therefore, by Theorem 1.8, we get the following
corollary.

Corollary 3.18 Let A be an n × n square matrix. Then A is invertible if
and only if rank A = n.

Example 3.21 For the 4 × 5 matrix
$$
A = \begin{bmatrix}
1 & 2 & 0 & 2 & 1\\
-1 & -2 & 1 & 1 & 0\\
1 & 2 & -3 & -7 & 2\\
1 & 2 & -2 & -4 & 3
\end{bmatrix},
$$
Gaussian elimination gives
$$
U = \begin{bmatrix}
1 & 2 & 0 & 2 & 1\\
0 & 0 & 1 & 3 & 1\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}.
$$
The three nonzero rows containing leading 1's in U form a basis for
R(U) = R(A). Note that x_1, x_3 and x_5 are the basic variables in Ux = 0,
since the first, third and fifth columns of U contain leading 1's. Thus the
three columns c_1 = (1, -1, 1, 1), c_3 = (0, 1, -3, -2) and c_5 = (1, 0, 2, 3) of A,
not the three columns of U, corresponding to the basic variables x_1, x_3 and
x_5 form a basis for C(A). Therefore, rank A = dim R(A) = dim C(A) = 3,
the nullity of A = dim N(A) = 2, and dim N(A^T) = 1. □
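As a check on Example 3.21, the following Python sketch (our own; `rref` is a hypothetical helper using exact `Fraction` arithmetic) computes the reduced row-echelon form of the 4 × 5 matrix A, its pivot columns, and the resulting rank and nullities. Since it carries Gauss-Jordan elimination all the way, its U is reduced, but the leading 1's sit in the same columns 1, 3 and 5 as in any row-echelon form of A.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form (exact Fractions) and pivot column indices."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue                              # no leading 1 in column c
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


A = [[1, 2, 0, 2, 1],
     [-1, -2, 1, 1, 0],
     [1, 2, -3, -7, 2],
     [1, 2, -2, -4, 3]]
m, n = len(A), len(A[0])
U, pivots = rref(A)
rank = len(pivots)
nullity = n - rank                        # number of free variables
At = [list(col) for col in zip(*A)]       # the transpose A^T
nullity_T = m - len(rref(At)[1])          # dim N(A^T) = m - rank A^T
basis_C = [[row[c] for row in A] for c in pivots]   # columns of A at the pivots
```

The values agree with Corollary 3.17: rank + nullity = 3 + 2 = 5 = n and rank + dim N(A^T) = 3 + 1 = 4 = m.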

Problem 3.19 Find the nullity and the rank of each of the following matrices:

(l)A=[ ;
-1
~
-2
-i
0 -5
~l' (2)A=[i
2 1 5 -2
i; ~l·
For each of the matrices, show that dim R(A) = dim C(A) directly by finding their
bases.

Problem 3.20 Show that a system of linear equations Ax = b has a solution if and
only if rank A = rank [A b], where [A b] denotes the augmented matrix of Ax = b.

Theorem 3.19 For any two matrices A and B for which AB can be defined,
(1) N(AB) ⊇ N(B),
(2) N((AB)^T) ⊇ N(A^T),
(3) C(AB) ⊆ C(A),
(4) R(AB) ⊆ R(B).

Proof: (1) is clear, since Bx = 0 implies (AB)x = A(Bx) = 0, and (2) follows
by applying (1) to (AB)^T = B^T A^T.
(3) For an m × n matrix A and an n × p matrix B,
$$\mathcal{C}(AB) = \{ABx : x \in \mathbb{R}^p\} \subseteq \{Ay : y \in \mathbb{R}^n\} = \mathcal{C}(A),$$
because Bx ∈ ℝ^n for any x ∈ ℝ^p.
(4) R(AB) = C((AB)^T) = C(B^T A^T) ⊆ C(B^T) = R(B), by (3). □

Corollary 3.20 rank(AB) ≤ min{rank A, rank B}.
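Corollary 3.20 allows strict inequality. The sketch below (hypothetical helpers `rank` and `matmul`, exact arithmetic with `fractions.Fraction`) uses a 2 × 3 matrix A of rank 2 and a 3 × 2 matrix B of rank 1, chosen by us for illustration, whose product is the zero matrix, so rank(AB) = 0 < 1 = min{rank A, rank B}.

```python
from fractions import Fraction


def rank(rows):
    """Rank = number of pivots found by forward Gaussian elimination
    (exact Fraction arithmetic)."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):            # eliminate below the pivot
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 0, 1],
     [0, 1, 1]]          # rank 2
B = [[1, 1],
     [1, 1],
     [-1, -1]]           # rank 1: every column is a multiple of (1, 1, -1)
AB = matmul(A, B)        # (1, 1, -1) lies in N(A), so AB is the zero matrix
```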

In some particular cases, the equality holds. In fact, it will be shown
later in Theorem 5.23 that for any square matrix A, rank(A^T A) = rank A =
rank(AA^T). The following problem illustrates another such case.
Problem 3.21 Let A be an invertible square matrix. Show that, for any matrix B,
rank(AB) = rank B = rank(BA).

Theorem 3.21 Let A be an m × n matrix of rank r. Then
(1) for every submatrix C of A, rank C ≤ r, and
(2) the matrix A has at least one r × r submatrix of rank r, that is, A has
an invertible submatrix of order r.

Proof: (1) We consider an intermediate matrix B which is obtained from A
by removing the rows that are not wanted in C. Then clearly R(B) ⊆ R(A)
and hence rank B ≤ rank A. Moreover, since the columns of C are taken
from those of B, C(C) ⊆ C(B) and rank C ≤ rank B.
(2) Note that we can find r linearly independent row vectors of A, which
form a basis for the row space of A. Let B be the matrix whose row vectors
consist of these vectors. Then rank B = r and the column space of B must
be of dimension r. By taking r linearly independent column vectors of B,
one can find an r × r submatrix C of A with rank r. □
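The proof of Theorem 3.21(2) is constructive, and the sketch below follows it under our own assumptions (hypothetical helper names, exact `Fraction` arithmetic): the pivot columns of A^T give r linearly independent rows of A, forming the matrix B of the proof, and the pivot columns of B then select r linearly independent columns.

```python
from fractions import Fraction


def pivot_columns(rows):
    """Indices of the pivot columns found by forward Gaussian elimination
    (exact Fraction arithmetic); their number is the rank."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return pivots


def invertible_submatrix(A):
    """Return (row indices, column indices, C) with C an r x r submatrix
    of A of rank r = rank A, following the proof of Theorem 3.21(2)."""
    At = [list(col) for col in zip(*A)]
    rows = pivot_columns(At)              # r linearly independent rows of A
    B = [A[i] for i in rows]              # the r x n matrix B of the proof
    cols = pivot_columns(B)               # r linearly independent columns of B
    C = [[B[i][j] for j in cols] for i in range(len(rows))]
    return rows, cols, C


A = [[1, 2, 1],
     [2, 4, 3],
     [3, 6, 4]]
rows, cols, C = invertible_submatrix(A)
```

Here rank A = 2, and the sketch picks the first two rows and the first and third columns, giving C = [[1, 1], [2, 3]], whose determinant is 1.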

Problem 3.22 Prove that the rank of a matrix is equal to the largest order of its
invertible submatrices.

Problem 3.23 For each of the matrices given in Problem 3.19, find an invertible
submatrix of the largest order.

3.6 Bases for subspaces


In this section, we discuss how to find bases for V + W and V ∩ W of two
subspaces V and W of the n-space ℝ^n, and then derive an important relationship
between the dimensions of those subspaces in terms of the dimensions of V
and W.
Let α = {v_1, ..., v_k} and β = {w_1, ..., w_ℓ} be bases for V and W,
respectively. Let Q be the n × (k + ℓ) matrix whose columns are those basis
vectors:
$$Q = [v_1 \cdots v_k \; w_1 \cdots w_\ell]_{n \times (k+\ell)}.$$
Then it is quite clear that C(Q) = V + W, so that a basis for C(Q) is a basis
for V + W. On the other hand, one can show that N(Q) can be identified
with V ∩ W.
In fact, if x = (a_1, ..., a_k, b_1, ..., b_ℓ) ∈ N(Q) ⊆ ℝ^{k+ℓ}, then
$$Qx = a_1v_1 + \cdots + a_kv_k + b_1w_1 + \cdots + b_\ell w_\ell = 0.$$
This means that corresponding to x there is a vector
$$y = a_1v_1 + \cdots + a_kv_k = -(b_1w_1 + \cdots + b_\ell w_\ell)$$
that belongs to V ∩ W, since the middle part is in V as a linear combination
of the basis vectors in α and the right side is in W as a linear combination
of the basis vectors in β. On the other hand, if y ∈ V ∩ W, y can be written
as a linear combination of each of the bases for V and W:
$$y = a_1v_1 + \cdots + a_kv_k = b_1w_1 + \cdots + b_\ell w_\ell,$$
for some a_1, ..., a_k and b_1, ..., b_ℓ. Let x = (a_1, ..., a_k, -b_1, ..., -b_ℓ). Then it
is quite clear that Qx = 0, i.e., x ∈ N(Q). That is, to each x ∈ N(Q) there
corresponds a vector y ∈ V ∩ W, and vice versa. Moreover, if x_i, i = 1, 2,
3.6. BASES FOR SUBSPACES 105

correspond to y_i, then one can easily check that x_1 + x_2 corresponds to
y_1 + y_2, and kx_1 corresponds to ky_1. This means that the two vector spaces
N(Q) and V ∩ W can be identified as vector spaces. In particular, for a basis
of N(Q), the corresponding set in V ∩ W is also a basis; that is, if the set
of vectors
$$
\begin{cases}
x_1 = (a_{11}, \ldots, a_{1k}, b_{11}, \ldots, b_{1\ell}),\\
\quad\vdots\\
x_s = (a_{s1}, \ldots, a_{sk}, b_{s1}, \ldots, b_{s\ell})
\end{cases}
$$
is a basis for N(Q), then the set
$$
\begin{cases}
y_1 = a_{11}v_1 + \cdots + a_{1k}v_k,\\
\quad\vdots\\
y_s = a_{s1}v_1 + \cdots + a_{sk}v_k
\end{cases}
\quad\text{or}\quad
\begin{cases}
y_1 = -(b_{11}w_1 + \cdots + b_{1\ell}w_\ell),\\
\quad\vdots\\
y_s = -(b_{s1}w_1 + \cdots + b_{s\ell}w_\ell)
\end{cases}
$$
is also a basis for V ∩ W, and vice versa. This means that
$$\dim \mathcal{N}(Q) = \dim(V \cap W).$$

Note that dim(V + W) ≠ dim V + dim W in general. The following
corollary gives the relation between them.

Corollary 3.22 For any subspaces V and W of the n-space ℝ^n,
$$\dim(V + W) + \dim(V \cap W) = \dim V + \dim W.$$
Proof: Let dim V = k and dim W = ℓ. Recall that rank A + nullity A =
the number of columns of a matrix A. Thus, for the matrix Q above, we
have
$$\dim \mathcal{C}(Q) + \dim \mathcal{N}(Q) = k + \ell.$$
Since dim C(Q) = dim(V + W), dim N(Q) = dim(V ∩ W),
dim V = k and dim W = ℓ, the equality follows. □

Example 3.22 Let V and W be two subspaces of ℝ^5 with bases
$$
\begin{cases}
v_1 = (1, 3, -2, 2, 3),\\
v_2 = (1, 4, -3, 4, 2),\\
v_3 = (1, 3, 0, 2, 3),
\end{cases}
\qquad
\begin{cases}
w_1 = (2, 3, -1, -2, 9),\\
w_2 = (1, 5, -6, 6, 1),\\
w_3 = (2, 4, 4, 2, 8),
\end{cases}
$$

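respectively. Then the matrix Q takes the following form:
$$
Q = [v_1\; v_2\; v_3\; w_1\; w_2\; w_3] = \begin{bmatrix}
1 & 1 & 1 & 2 & 1 & 2\\
3 & 4 & 3 & 3 & 5 & 4\\
-2 & -3 & 0 & -1 & -6 & 4\\
2 & 4 & 2 & -2 & 6 & 2\\
3 & 2 & 3 & 9 & 1 & 8
\end{bmatrix}.
$$
After Gauss-Jordan elimination, we get
$$
U = \begin{bmatrix}
1 & 0 & 0 & 5 & 0 & 0\\
0 & 1 & 0 & -3 & 2 & 0\\
0 & 0 & 1 & 0 & -1 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
$$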

From this, one can directly see that dim(V + W) = 4. The columns
v_1, v_2, v_3, w_3 corresponding to the basic variables in Qx = 0 form a
basis for C(Q) = V + W. Moreover, dim N(Q) = dim(V ∩ W) = 2, since there
are two free variables x_4 and x_5 in Qx = 0.
To find a basis for V ∩ W, we solve Ux = 0 for (x_1, x_2, x_3, 1, 0, x_6) and
(x_1, x_2, x_3, 0, 1, x_6). After a simple computation, we obtain a basis for N(Q):
$$x_1 = (-5, 3, 0, 1, 0, 0) \quad\text{and}\quad x_2 = (0, -2, 1, 0, 1, 0).$$

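From Qx_i = 0, we obtain two equations:
$$
\begin{aligned}
-5v_1 + 3v_2 + w_1 &= 0,\\
-2v_2 + v_3 + w_2 &= 0.
\end{aligned}
$$
Therefore, {y_1, y_2} is a basis for V ∩ W, where
$$
y_1 = 5v_1 - 3v_2 = \begin{bmatrix}2\\ 3\\ -1\\ -2\\ 9\end{bmatrix} = w_1, \qquad
y_2 = 2v_2 - v_3 = \begin{bmatrix}1\\ 5\\ -6\\ 6\\ 1\end{bmatrix} = w_2.
$$
Clearly, the equality
$$\dim(V + W) + \dim(V \cap W) = 4 + 2 = 3 + 3 = \dim V + \dim W$$
holds in this example. □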
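The computation of Example 3.22 can be reproduced mechanically. The following sketch (our own; the helpers `rref` and `null_space_basis` are hypothetical and use exact `Fraction` arithmetic) builds Q column by column, reads off a basis of V + W from the pivot columns, and turns each null-space vector x = (a_1, a_2, a_3, b_1, b_2, b_3) of Q into the vector a_1v_1 + a_2v_2 + a_3v_3 = -(b_1w_1 + b_2w_2 + b_3w_3) of V ∩ W. Because the null-space basis is normalized with a free variable equal to 1, the vectors obtained are -w_1 and -w_2, i.e., y_1 and y_2 of the example up to sign.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form (exact Fractions) and pivot column indices."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def null_space_basis(A):
    """One basis vector of N(A) per free variable (that variable set to 1)."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for i, c in enumerate(pivots):
            x[c] = -R[i][f]
        basis.append(x)
    return basis


v = [[1, 3, -2, 2, 3], [1, 4, -3, 4, 2], [1, 3, 0, 2, 3]]    # basis of V
w = [[2, 3, -1, -2, 9], [1, 5, -6, 6, 1], [2, 4, 4, 2, 8]]   # basis of W
k, l, n = len(v), len(w), len(v[0])

Q = [list(row) for row in zip(*(v + w))]     # n x (k + l): columns v_1..v_k, w_1..w_l
U, pivots = rref(Q)
basis_sum = [(v + w)[c] for c in pivots]     # basis of C(Q) = V + W
kernel = null_space_basis(Q)

# x = (a_1..a_k, b_1..b_l) in N(Q) corresponds to y = sum a_i v_i = -(sum b_j w_j)
ys = [[sum(x[i] * v[i][t] for i in range(k)) for t in range(n)] for x in kernel]
ys_w = [[-sum(x[k + j] * w[j][t] for j in range(l)) for t in range(n)] for x in kernel]
```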

Remark: In Example 3.22, we showed a method for finding bases for V + W
and V ∩ W for given subspaces V and W of ℝ^n by constructing a matrix
Q whose columns are basis vectors for V and basis vectors for W. There is
another method for finding their bases by constructing a matrix Q whose
rows are basis vectors for V and basis vectors for W.
If Q is the matrix whose row vectors are basis vectors for V and basis
vectors for W in order, then clearly V + W = R(Q). By finding a basis for
the row space R(Q), we can get a basis for V + W.
On the other hand, a basis for V ∩ W can be found as follows: Let A be
the k × n matrix whose rows are basis vectors for V, and B the ℓ × n matrix
whose rows are basis vectors for W. Then V = R(A) and W = R(B). Let
Ã denote the matrix A with an unknown vector x = (x_1, ..., x_n) ∈ ℝ^n
attached as the bottom row, i.e.,
$$\tilde{A} = \begin{bmatrix} A \\ x \end{bmatrix},$$
and let B̃ be defined similarly. Then it is clear that R(Ã) = R(A)
and R(B̃) = R(B) if and only if x ∈ V ∩ W = R(A) ∩ R(B). This means
that the row-echelon form of Ã, obtained by the same Gaussian elimination,
should be that of A with a zero row attached. Thus, by comparing the row vectors of the
row-echelon form of Ã with those of A, we can obtain a system of linear equations
for x = (x_1, ..., x_n). By the same argument applied to B and B̃, we get
another system of linear equations for the same x = (x_1, ..., x_n). Solutions
to these two systems together will provide us with a basis for V ∩ W.
The following example illustrates how one can apply this argument to
find bases for V + W and V ∩ W.

Example 3.23 Let V be the subspace of ℝ^5 spanned by
$$v_1 = (1, 3, -2, 2, 3), \quad v_2 = (1, 4, -3, 4, 2), \quad v_3 = (2, 3, -1, -2, 10),$$
and W the subspace spanned by
$$w_1 = (1, 3, 0, 2, 1), \quad w_2 = (1, 5, -6, 6, 3), \quad w_3 = (2, 5, 3, 2, 1).$$

Find a basis for V + W and for V ∩ W.
Solution: Note that the matrix A whose row vectors are the v_i's is reduced
to a row-echelon form
$$\begin{bmatrix}1 & 3 & -2 & 2 & 3\\ 0 & 1 & -1 & 2 & -1\\ 0 & 0 & 0 & 0 & 1\end{bmatrix},$$
so that dim V = 3. Similarly, the matrix B whose row vectors are the w_j's is
reduced to a row-echelon form
$$\begin{bmatrix}1 & 3 & 0 & 2 & 1\\ 0 & 2 & -6 & 4 & 2\\ 0 & 0 & 0 & 0 & 0\end{bmatrix},$$
so that dim W = 2.
Now, if Q denotes the 6 × 5 matrix whose row vectors are the v_i's and w_j's,
then V + W = R(Q). By Gaussian elimination, Q is reduced to a row-echelon
form, excluding zero rows:
$$\begin{bmatrix}1 & 3 & -2 & 2 & 3\\ 0 & 1 & -1 & 2 & -1\\ 0 & 0 & 1 & 0 & -1\\ 0 & 0 & 0 & 0 & 1\end{bmatrix}.$$
Thus, the four nonzero row vectors
$$(1, 3, -2, 2, 3), \quad (0, 1, -1, 2, -1), \quad (0, 0, 1, 0, -1), \quad (0, 0, 0, 0, 1)$$
form a basis for V + W, so that dim(V + W) = 4.


We now find a basis for V ∩ W. A vector x = (x_1, x_2, x_3, x_4, x_5) ∈ ℝ^5
is contained in V ∩ W if and only if x is contained in both the row space of
A and that of B.
Let Ã be A with x attached as the last row:
$$\tilde{A} = \begin{bmatrix}1 & 3 & -2 & 2 & 3\\ 1 & 4 & -3 & 4 & 2\\ 2 & 3 & -1 & -2 & 10\\ x_1 & x_2 & x_3 & x_4 & x_5\end{bmatrix}.$$

Then by the same Gaussian elimination Ã is reduced to
$$\begin{bmatrix}1 & 3 & -2 & 2 & 3\\ 0 & 1 & -1 & 2 & -1\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & -x_1 + x_2 + x_3 & 4x_1 - 2x_2 + x_4 & 0\end{bmatrix}.$$
Therefore, x ∈ R(A) = V if and only if R(Ã) = R(A). Comparing the row
vectors of the row-echelon form of Ã with those of A shows that x ∈ R(A)
if and only if the last row vector of the row-echelon form of Ã is the zero
vector, that is, x is a solution of the homogeneous system of equations
$$\begin{cases}-x_1 + x_2 + x_3 = 0\\ 4x_1 - 2x_2 + x_4 = 0.\end{cases}$$
We do the same calculation with B̃, and obtain another homogeneous system
of linear equations for x:
$$\begin{cases}-9x_1 + 3x_2 + x_3 = 0\\ 4x_1 - 2x_2 + x_4 = 0\\ 2x_1 - x_2 + x_5 = 0.\end{cases}$$
Solving these two homogeneous systems together yields
$$V \cap W = \{t(1, 4, -3, 4, 2) : t \in \mathbb{R}\}.$$
Hence, {(1, 4, -3, 4, 2)} is a basis for V ∩ W and dim(V ∩ W) = 1. □
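The answer of Example 3.23 can be double-checked with rank computations alone, in the spirit of the matrix Ã: a vector x lies in R(A) exactly when attaching x as an extra row does not raise the rank. The sketch below assumes our own helpers `rank` and `in_row_space` (hypothetical names, exact `Fraction` arithmetic) and combines them with Corollary 3.22, which gives dim(V ∩ W) = 3 + 2 - 4 = 1, so a single nonzero common vector spans V ∩ W.

```python
from fractions import Fraction


def rank(rows):
    """Rank by forward Gaussian elimination with exact Fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def in_row_space(rows, x):
    """x is in the row space iff appending x as a last row keeps the rank."""
    return rank(rows + [x]) == rank(rows)


v = [[1, 3, -2, 2, 3], [1, 4, -3, 4, 2], [2, 3, -1, -2, 10]]   # rows of A
w = [[1, 3, 0, 2, 1], [1, 5, -6, 6, 3], [2, 5, 3, 2, 1]]       # rows of B

dim_V, dim_W, dim_sum = rank(v), rank(w), rank(v + w)
dim_cap = dim_V + dim_W - dim_sum          # Corollary 3.22
u = [1, 4, -3, 4, 2]                       # candidate basis vector of V and W in common
```

Note that u happens to equal v_2, and u = (w_1 + w_2)/2, which is why it lies in both row spaces.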

Problem 3.24 Let V and W be the subspaces of the vector space P_3(ℝ) spanned by
VI (X) 3 X + 4x 2 + x 3,
{ V2(X) 5 + 5x 2 + x 3 ,
V3(X) 5 5x + 10x 2 + 3x3 ,
and
9 3x + 3x 2 +
5 X + 2x2 +
6 + 4x 2 +
respectively. Find the dimensions and bases for V + W and V ∩ W.
Problem 3.25 Let
V = {(x, y, z, u) ∈ ℝ^4 : y + z + u = 0},
W = {(x, y, z, u) ∈ ℝ^4 : x + y = 0, z = 2u}
be two subspaces of ℝ^4. Find bases for V, W, V + W, and V ∩ W.

3.7 Invertibility
We now have the following existence and uniqueness theorems for solutions
of a system of linear equations Ax = b, for an m × n matrix A and
a vector b ∈ ℝ^m.

Theorem 3.23 (Existence) Let A be an m × n matrix. Then the following
statements are equivalent.
(1) For each b ∈ ℝ^m, Ax = b has at least one solution x in ℝ^n.
(2) The column vectors of A span ℝ^m, i.e., C(A) = ℝ^m.
(3) rank A = m, and hence m ≤ n.
(4) There exists an n × m right inverse B of A such that AB = I_m.

Proof: (1) ⇔ (2): Note that C(A) ⊆ ℝ^m in general. For any b ∈ ℝ^m,
there is a solution x ∈ ℝ^n of Ax = b if and only if b is a linear combination
of the column vectors of A. This is equivalent to saying that ℝ^m = C(A).
(2) ⇔ (3): Since dim C(A) = rank A = dim R(A) ≤ min{m, n}, C(A) =
ℝ^m if and only if dim C(A) = m ≤ n (see Problem 3.10).
(1) ⇒ (4): Let e_1, e_2, ..., e_m be the standard basis for ℝ^m. Then for
each i = 1, 2, ..., m we can find at least one solution x_i ∈ ℝ^n such that
Ax_i = e_i by the condition. If B is the n × m matrix whose columns are these
solutions, i.e., B = [x_1 x_2 ⋯ x_m], then it follows by matrix multiplication
that
$$AB = [Ax_1\; Ax_2\; \cdots\; Ax_m] = [e_1\; e_2\; \cdots\; e_m] = I_m.$$
Hence, the matrix B is a required right inverse.
(4) ⇒ (1): If B is a right inverse of A, then for any b ∈ ℝ^m, x = Bb
is a solution of Ax = b, since A(Bb) = (AB)b = b. □
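The step (1) ⇒ (4) in the proof of Theorem 3.23 is an algorithm: solve Ax_i = e_i for each i and put the solutions side by side. A minimal sketch, assuming exact `Fraction` arithmetic and hypothetical helper names (`rref`, `particular_solution`, `right_inverse`); each particular solution sets the free variables to 0, which is one choice among many, so right inverses are not unique when m < n.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form (exact Fractions) and pivot column indices."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def particular_solution(A, b):
    """One solution of Ax = b with all free variables set to 0,
    or None if the system is inconsistent."""
    n = len(A[0])
    R, pivots = rref([list(row) + [bi] for row, bi in zip(A, b)])
    if n in pivots:                       # a leading 1 in the last column means 0 = 1
        return None
    x = [Fraction(0)] * n
    for i, c in enumerate(pivots):
        x[c] = R[i][n]
    return x


def right_inverse(A):
    """B = [x_1 ... x_m] with A x_i = e_i, as in the proof of Theorem 3.23."""
    m = len(A)
    cols = []
    for i in range(m):
        e_i = [1 if j == i else 0 for j in range(m)]
        x = particular_solution(A, e_i)
        if x is None:
            raise ValueError("rank A < m, so A has no right inverse")
        cols.append(x)
    return [list(row) for row in zip(*cols)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 0, 1],
     [0, 1, 1]]            # 2 x 3 of rank 2
B = right_inverse(A)
```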

Condition (2) means that A has m linearly independent column vectors,
and condition (3) implies that there exist m linearly independent row vectors
of A, since rank A = m = dim R(A).
Note that if C(A) ≠ ℝ^m, then Ax = b has no solution for b ∉ C(A).

Theorem 3.24 (Uniqueness) Let A be an m × n matrix. Then the following
statements are equivalent.
(1) For each b ∈ ℝ^m, Ax = b has at most one solution x in ℝ^n.
(2) The column vectors of A are linearly independent.

3.7. INVERTIBILITY 111

(3) dim C(A) = rank A = n, and hence n ≤ m.
(4) R(A) = ℝ^n.
(5) N(A) = {0}.
(6) There exists an n × m left inverse C of A such that CA = I_n.

Proof: (1) ⇒ (2): Note that the column vectors of A are linearly
independent if and only if the homogeneous equation Ax = 0 has only the trivial
solution. However, Ax = 0 always has the trivial solution x = 0, and (1) means
that it is the only one.
(2) ⇔ (3): Clear, because all the column vectors are linearly
independent if and only if they form a basis for C(A), or dim C(A) = n ≤ m.
(3) ⇔ (4): Clear, because dim R(A) = rank A = dim C(A) = n if and
only if R(A) = ℝ^n (see Problem 3.10).
(4) ⇔ (5): Clear, since dim R(A) + dim N(A) = n.
(2) ⇒ (6): Suppose that the columns of A are linearly independent,
so that rank A = n. Extend these column vectors of A to a basis for ℝ^m
by adding m - n more vectors. Construct an m × m
matrix S with those vectors as columns. Then the matrix S has rank m
and is hence invertible. Let C be the n × m matrix obtained from S^{-1} by
throwing away the last m - n rows. Since the first n columns of S constitute
the matrix A and S^{-1}S = I_m, we have CA = I_n.
(6) ⇒ (1): Let C be a left inverse of A. If Ax = b has no solution, then
we are done. Suppose that Ax = b has two solutions, say x_1 and x_2. Then
$$x_1 = CAx_1 = Cb = CAx_2 = x_2.$$
Hence, the system can have at most one solution. □
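Likewise, the step (2) ⇒ (6) in the proof of Theorem 3.24 can be followed literally: extend the columns of A to a basis of ℝ^m, invert the resulting matrix S, and keep the first n rows of S^{-1}. The sketch below is our own (hypothetical helper names, exact `Fraction` arithmetic) and extends the basis greedily with the standard basis vectors e_1, e_2, ..., which is one convenient choice; any extension works.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form (exact Fractions) and pivot column indices."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def rank(vectors):
    return len(rref(vectors)[1])


def left_inverse(A):
    """Left inverse C (n x m) of an m x n matrix A of rank n, following the
    proof of Theorem 3.24."""
    m, n = len(A), len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    if rank(cols) != n:
        raise ValueError("the columns of A are dependent: no left inverse")
    for i in range(m):                               # greedy extension by e_i
        e_i = [1 if j == i else 0 for j in range(m)]
        if rank(cols + [e_i]) > len(cols):
            cols.append(e_i)
    S = [list(row) for row in zip(*cols)]            # m x m, columns = the basis
    aug = [S[i] + [1 if j == i else 0 for j in range(m)] for i in range(m)]
    R, _ = rref(aug)                                 # [S | I] becomes [I | S^{-1}]
    S_inv = [row[m:] for row in R]
    return S_inv[:n]                                 # drop the last m - n rows


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 0],
     [1, 1],
     [0, 1]]               # 3 x 2 of rank 2
C = left_inverse(A)
```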

Remark: (1) We have proved that an m × n matrix A has a right inverse if
and only if rank A = m, and A has a left inverse if and only if rank A = n.
In the first case Ax = b always has a solution, and in the second case the
solution (if it exists) is unique. Therefore, if m ≠ n, A cannot have both left
and right inverses.
(2) For a practical way of finding a right or a left inverse of an m × n
matrix A, we will show later (see Corollary 5.24) that if rank A = m, then
(AA^T)^{-1} exists and A^T(AA^T)^{-1} is a right inverse of A, and if rank A = n,
then (A^TA)^{-1} exists and (A^TA)^{-1}A^T is a left inverse of A.

(3) Note that if m = n so that A is a square matrix, then A has a right
inverse (and a left inverse) if and only if rank A = m = n. Moreover, in this
case the inverses are the same (see Theorem 1.8). Therefore, a square matrix
A has rank n if and only if A is invertible. This means that for a square
matrix "Existence = Uniqueness", and the ten conditions in the above two
theorems are all equivalent. In particular, to prove that a square matrix is
invertible, it is enough to show the existence of a one-sided inverse.
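The formulas in Remark (2) can be tried out on a small example, taking for granted the invertibility of AA^T or A^TA that is proved later (Corollary 5.24). This is a sketch with hypothetical helpers and exact `Fraction` arithmetic; it is meant as an illustration, not as an efficient method.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(S):
    """Inverse of a square matrix by Gauss-Jordan elimination on [S | I]."""
    m = len(S)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(S)]
    for c in range(m):
        p = next((i for i in range(c, m) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        lead = M[c][c]
        M[c] = [v / lead for v in M[c]]
        for i in range(m):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[m:] for row in M]


def right_inverse(A):          # assumes rank A = m (number of rows)
    At = transpose(A)
    return matmul(At, inverse(matmul(A, At)))       # A^T (A A^T)^{-1}


def left_inverse(A):           # assumes rank A = n (number of columns)
    At = transpose(A)
    return matmul(inverse(matmul(At, A)), At)       # (A^T A)^{-1} A^T


A = [[1, 0, 1],
     [0, 1, 1]]                # 2 x 3, rank 2
B = right_inverse(A)
T = transpose(A)               # 3 x 2, rank 2
C = left_inverse(T)
```

For this A one finds AA^T = [[2, 1], [1, 2]], and the resulting right inverse differs from the one obtained by setting free variables to 0; this again reflects that one-sided inverses of non-square matrices are not unique.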

Problem 3.26 For each of the following matrices, discuss the number of possible
solutions to the system of linear equations Ax = b for any b:

-~ ~6 13: l'
[2~ !7 -3 (2) A= [ -~1 l'

l
(1) A= ;
-6

(3) A ~ ~ ~ ~~ j -~n, (4) A ~ [~ i -n


The following theorem is a collection of the results proved in Theorems
1.8, 3.23, 3.24, and the Remark before Definition 4.3.

Theorem 3.25 For a square matrix A of order n, the following statements
are equivalent.
(1) A is invertible.
(2) det A ≠ 0.
(3) A is row equivalent to I_n.
(4) A is a product of elementary matrices.
(5) Elimination can be completed: PA = LDU, with all d_i ≠ 0.
(6) Ax = b has a solution for every b ∈ ℝ^n.
(7) Ax = 0 has only the trivial solution, i.e., N(A) = {0}.
(8) The columns of A are linearly independent.
(9) The columns of A span ℝ^n, i.e., C(A) = ℝ^n.
(10) A has a left inverse.
(11) rank A = n.
(12) The rows of A are linearly independent.
(13) The rows of A span ℝ^n, i.e., R(A) = ℝ^n.
(14) A has a right inverse.


3.8. APPLICATION: INTERPOLATION 113

(15)* The linear transformation A : ℝ^n → ℝ^n via A(x) = Ax is injective.
(16)* The linear transformation A : ℝ^n → ℝ^n is surjective.
(17)* Zero is not an eigenvalue of A.

Proof: Exercise: identify where each claim has been proved, and prove any
that have not been covered. The items marked with asterisks will be explained
in the following places: (15) and (16) in the Remark on page 141, and (17) in
Theorem 6.1. □

3.8 Application: Interpolation


In many scientific experiments, a scientist wants to find the precise functional
relationship between input data and output data. That is, in his experiment,
he puts various input values into his experimental device and obtains output
values corresponding to those input values. After his experiment, what he
has is a table of inputs and outputs. The precise functional relationship
might be very complicated, and sometimes it might be very hard or almost
impossible to find the precise function. In this case, one thing he can do is
to find a polynomial whose graph passes through each of the data points and
comes very close to the function he wanted to find. That is, he is looking
for a polynomial that approximates the precise function. Such a polynomial
is called an interpolating polynomial. This problem is closely related to
systems of linear equations.
Let us begin with a set of given data: Suppose that for n + 1 distinct
experimental input values x_0, x_1, ..., x_n, we obtained n + 1 output values
y_0 = f(x_0), y_1 = f(x_1), ..., y_n = f(x_n). The output values are supposed
to be related to the inputs by a certain function f. We wish to construct a
polynomial p(x) of degree less than or equal to n which interpolates f(x) at
x_0, x_1, ..., x_n, i.e., p(x_i) = y_i = f(x_i) for i = 0, 1, ..., n.
Note that if there is such a polynomial, it must be unique. Indeed, if
q(x) is another such polynomial, then h(x) = p(x) - q(x) is also a
polynomial of degree less than or equal to n vanishing at n + 1 distinct points
x_0, x_1, ..., x_n. Since a nonzero polynomial of degree at most n has at most
n roots, h(x) must be the identically zero polynomial, so that
p(x) = q(x) for all x ∈ ℝ.
In fact, the unique polynomial p(x) can be found by solving a system
of linear equations: If we write p(x) = a_0 + a_1x + ⋯ + a_nx^n, then we are
supposed to determine the coefficients a_i's. The set of equations
$$p(x_i) = a_0 + a_1x_i + a_2x_i^2 + \cdots + a_nx_i^n = y_i,$$
for i = 0, 1, ..., n, constitutes a system of n + 1 linear equations in the n + 1
unknowns a_i's:
$$
\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n\\
1 & x_1 & x_1^2 & \cdots & x_1^n\\
\vdots & \vdots & \vdots & & \vdots\\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{bmatrix}
\begin{bmatrix} a_0\\ a_1\\ \vdots\\ a_n \end{bmatrix}
=
\begin{bmatrix} y_0\\ y_1\\ \vdots\\ y_n \end{bmatrix}.
$$
The coefficient matrix A is a square matrix of order n + 1, known as
Vandermonde's matrix (see Problem 2.10), whose determinant is
$$\det A = \prod_{0 \le i < j \le n} (x_j - x_i).$$
Since the x_i's are all distinct, det A ≠ 0. It follows that A is nonsingular,
and hence the system always has a unique solution, which determines the unique
polynomial p(x) of degree ≤ n passing through the given n + 1 points (x_0, y_0),
(x_1, y_1), ..., (x_n, y_n) in the plane ℝ^2.
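The determinant formula for Vandermonde's matrix is easy to spot-check. The sketch below (our own; `det` and `vandermonde` are hypothetical helpers) computes the determinant by Gaussian elimination with exact arithmetic, flipping the sign at each row interchange, and compares it with the product of x_j - x_i over all pairs i < j, here for the points 0, 1, -1, 3 used in Example 3.24.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(M):
    """Determinant by Gaussian elimination with exact Fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            return Fraction(0)               # no pivot: the matrix is singular
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign                     # a row interchange flips the sign
        result *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return sign * result


def vandermonde(xs):
    """Rows (1, x_i, x_i^2, ..., x_i^n) for the points x_0, ..., x_n."""
    n = len(xs) - 1
    return [[x ** k for k in range(n + 1)] for x in xs]


xs = [0, 1, -1, 3]
lhs = det(vandermonde(xs))
rhs = prod(xs[j] - xs[i] for i, j in combinations(range(len(xs)), 2))
```

A repeated point makes two rows equal, so the determinant then vanishes, in agreement with the product formula.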

Example 3.24 Given four points
$$(0, 3), \quad (1, 0), \quad (-1, 2), \quad (3, 6)$$
in the plane ℝ^2, let p(x) = a_0 + a_1x + a_2x^2 + a_3x^3 be the polynomial passing
through the given four points. Then, we have a system of equations
$$
\begin{cases}
a_0 = 3\\
a_0 + a_1 + a_2 + a_3 = 0\\
a_0 - a_1 + a_2 - a_3 = 2\\
a_0 + 3a_1 + 9a_2 + 27a_3 = 6.
\end{cases}
$$
Solving this system, we find that a_0 = 3, a_1 = -2, a_2 = -2, a_3 = 1 is the
unique solution, and the unique polynomial is p(x) = 3 - 2x - 2x^2 + x^3. □
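Example 3.24 can be reproduced by setting up the Vandermonde system and solving it by Gauss-Jordan elimination. A sketch under our own assumptions (hypothetical helpers `solve` and `interpolate`, exact `Fraction` arithmetic); the coefficients are returned in the order a_0, a_1, ..., a_n.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system Ax = b by Gauss-Jordan elimination on [A | b]."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("the coefficient matrix is singular")
        M[c], M[p] = M[p], M[c]
        lead = M[c][c]
        M[c] = [v / lead for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [s - f * t for s, t in zip(M[i], M[c])]
    return [row[n] for row in M]


def interpolate(points):
    """Coefficients a_0, ..., a_n of the polynomial of degree at most n
    through n + 1 points with distinct x-coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    V = [[x ** k for k in range(len(points))] for x in xs]   # Vandermonde matrix
    return solve(V, ys)


points = [(0, 3), (1, 0), (-1, 2), (3, 6)]
coeffs = interpolate(points)     # a_0, a_1, a_2, a_3
```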

Problem 3.27 Let f(x) = sin x. Then at x = 0, π/4, π/2, 3π/4, π, the values of f are
y = 0, √2/2, 1, √2/2, 0. Find the polynomial p(x) of degree ≤ 4 that passes through
these five points. (One may need to use a computer due to the messy computation.)

Problem 3.28 Find a polynomial p(x) = a + bx + cx^2 + dx^3 that satisfies
p(0) = 1, p'(0) = 2, p(1) = 4, p'(1) = 4.
3.9. APPLICATION: THE WRONSKIAN 115

Problem 3.29 Find the equation of a circle that passes through the three points
(2, -2), (3, 5), and (-4, 6) in the plane ℝ^2.

Remark: (1) It is suggested that the reader think about the differences
between this interpolation and the Taylor polynomial approximation to a
differentiable function.
(2) Note again that the interpolating polynomial p(x) of degree ≤ n is
uniquely determined when we have the correct data, i.e., when we are given
precisely n + 1 values of y at precisely n + 1 distinct points x_0, x_1, ..., x_n.
However, if we are given fewer data, then the polynomial is
under-determined: i.e., if we have m values of y with m < n + 1 at m distinct
points x_1, x_2, ..., x_m, then A is an m × (n + 1) matrix with m < n + 1. The
system is consistent since A has rank m, and there are infinitely many
interpolating polynomials: a particular solution plus any vector in the
nontrivial null space of A.
On the other hand, if we are given more than n + 1 data, then the
polynomial is over-determined: i.e., if we have m values of y with m > n + 1
at m distinct points x_1, x_2, ..., x_m, then there need not be any interpolating
polynomial since the system could be inconsistent. In this case, the best we
can do is to find a polynomial of degree ≤ n to which the data is closest.
We will return to this question in Section 5.8.

3.9 Application: The Wronskian


Let y_1, y_2, ..., y_n be n vectors in an m-dimensional vector space V. To
check the independence of the vectors y_i's, consider the equation
$$c_1y_1 + c_2y_2 + \cdots + c_ny_n = 0.$$
Let α = {x_1, x_2, ..., x_m} be a basis for V. By expressing each y_i as a
linear combination of the basis vectors x_i's, the left side of this equation
can be written as a linear combination of the basis vectors x_i's, so that all
of the coefficients (which are also linear combinations of the c_i's) must be zero.
This gives a homogeneous system of linear equations in the c_i's, say Ac = 0 with
an m × n matrix A, as in the proof of Lemma 3.8. Recall that the vectors
y_i's are linearly independent if and only if the system Ac = 0 has only the
trivial solution. Hence, the linear independence of a set of vectors in a
finite-dimensional vector space can be tested by solving a homogeneous system of
linear equations. But if V is not finite dimensional, this test for the linear
independence of a set of vectors cannot be applied.
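As a small illustration of this test in V = P_2(ℝ) with the basis {1, x, x^2}, the sketch below (our own example and hypothetical helper names) writes the coordinate vectors of some polynomials as the columns of A and compares rank A with the number n of vectors: Ac = 0 has only the trivial solution exactly when rank A = n.

```python
from fractions import Fraction


def rank(rows):
    """Rank by forward Gaussian elimination with exact Fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def coordinates(coeffs, degree=2):
    """Coordinate vector of a_0 + a_1 x + ... w.r.t. {1, x, ..., x^degree}."""
    return list(coeffs) + [0] * (degree + 1 - len(coeffs))


def independent(polys):
    # columns of A are the coordinate vectors; Ac = 0 has only c = 0 iff rank A = n
    A = [list(row) for row in zip(*(coordinates(p) for p in polys))]
    return rank(A) == len(polys)


set1 = [(1, 1), (0, 1, 1), (1, 0, 1)]       # 1 + x, x + x^2, 1 + x^2
set2 = [(1, 1), (0, 1, 1), (1, 0, -1)]      # 1 + x, x + x^2, 1 - x^2
```

The second set is dependent because (1 + x) - (x + x^2) = 1 - x^2.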

In this section, we introduce a test for the linear independence of a set of
functions. For our purpose, let V be the vector space of all functions on ℝ
which are differentiable infinitely many times. Then one can easily see that
V is not finite dimensional.
Let f_1(x), f_2(x), ..., f_n(x) be n functions in V. The n functions are
linearly independent in V if the linear equation
$$c_1f_1(x) + c_2f_2(x) + \cdots + c_nf_n(x) = 0$$
for all x ∈ ℝ implies that all c_i = 0. By differentiating this equation up to
n - 1 times, we obtain n equations:
$$c_1f_1^{(i)}(x) + c_2f_2^{(i)}(x) + \cdots + c_nf_n^{(i)}(x) = 0, \qquad 0 \le i \le n - 1,$$
for all x ∈ ℝ. Or, in matrix form:
$$
\begin{bmatrix}
f_1(x) & f_2(x) & \cdots & f_n(x)\\
f_1'(x) & f_2'(x) & \cdots & f_n'(x)\\
\vdots & \vdots & & \vdots\\
f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x)
\end{bmatrix}
\begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_n \end{bmatrix}
=
\begin{bmatrix} 0\\ 0\\ \vdots\\ 0 \end{bmatrix}.
$$
The determinant of the coefficient matrix is called the Wronskian for
{f_1(x), f_2(x), ..., f_n(x)} and is denoted by W(x). Therefore, if there is a point
x_0 ∈ ℝ such that W(x_0) ≠ 0, then the coefficient matrix is nonsingular at
x = x_0, and so all c_i = 0. Therefore, if the Wronskian is nonzero at a point
in ℝ, then {f_1(x), f_2(x), ..., f_n(x)} are linearly independent.
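Example 3.25 For the sets of functions F_1 = {x, cos x, sin x} and
F_2 = {x, e^x, e^{-x}}, the Wronskians are
$$
W_1(x) = \det\begin{bmatrix} x & \cos x & \sin x\\ 1 & -\sin x & \cos x\\ 0 & -\cos x & -\sin x \end{bmatrix} = x
$$
and
$$
W_2(x) = \det\begin{bmatrix} x & e^x & e^{-x}\\ 1 & e^x & -e^{-x}\\ 0 & e^x & e^{-x} \end{bmatrix} = 2x.
$$
Since W_i(x) ≠ 0 for x ≠ 0, both F_i are linearly independent. □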
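Example 3.25 can be spot-checked numerically. In the sketch below (our own; `det3` and `wronskian3` are hypothetical helpers) the derivatives of each function are supplied by hand, so nothing is differentiated numerically, and the 3 × 3 Wronskian is evaluated at a few points in floating point and compared with x and 2x.

```python
import math


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def wronskian3(funcs, x):
    """funcs[j] = (f_j, f_j', f_j''), supplied by hand; row k of the matrix
    holds the k-th derivatives, as in the definition of W(x)."""
    return det3([[funcs[j][k](x) for j in range(3)] for k in range(3)])


F1 = [(lambda x: x, lambda x: 1.0, lambda x: 0.0),
      (math.cos, lambda x: -math.sin(x), lambda x: -math.cos(x)),
      (math.sin, math.cos, lambda x: -math.sin(x))]
F2 = [(lambda x: x, lambda x: 1.0, lambda x: 0.0),
      (math.exp, math.exp, math.exp),
      (lambda x: math.exp(-x), lambda x: -math.exp(-x), lambda x: math.exp(-x))]

for x in (-2.0, -0.5, 0.7, 3.0):
    print(x, wronskian3(F1, x), wronskian3(F2, x))
```

A numerical check at finitely many points is, of course, no substitute for the symbolic computation in the example; it only guards against sign slips in the matrices.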


Problem 3.30 Show that 1, x, x^2, ..., x^n are linearly independent in the vector space
C(ℝ) of continuous functions.
3.10. EXERCISES 117

3.10 Exercises
3.1. Let V be the set of all pairs (x, y) of real numbers. Define
(x, y) + (x_1, y_1) = (x + x_1, y + y_1),
c(x, y) = (cx, y).
Is V a vector space with these operations?
3.2. For x, y ∈ ℝ^n and k ∈ ℝ, define two operations as
x ⊕ y = x - y, k · x = -kx.
The operations on the right sides are the usual ones. Which of the rules in
the definition of a vector space are satisfied for (ℝ^n, ⊕, ·)?
3.3. Determine whether the given set is a vector space with the usual addition
and scalar multiplication of functions.
(1) The set of all functions f defined on the interval [-1, 1] such that
f(0) = 0.
(2) The set of all functions f defined on ℝ such that lim_{x→∞} f(x) = 0.
(3) The set of all twice differentiable functions f defined on ℝ such that
f''(x) + f(x) = 0.
3.4. Let C^2[-1, 1] be the vector space of all functions with continuous second
derivatives on the domain [-1, 1]. Which of the following subsets is a
subspace of C^2[-1, 1]?
(1) W = {f(x) ∈ C^2[-1, 1] : f''(x) + f(x) = 0, -1 ≤ x ≤ 1}.
(2) W = {f(x) ∈ C^2[-1, 1] : f''(x) + f(x) = x^2, -1 ≤ x ≤ 1}.
3.5. Which of the following subsets of C[-1, 1] is a subspace of the vector space
C[-1, 1] of continuous functions on [-1, 1]?
(1) W = {f(x) ∈ C[-1, 1] : f(-1) = -f(1)}.
(2) W = {f(x) ∈ C[-1, 1] : f(x) ≥ 0 for all x in [-1, 1]}.
(3) W = {f(x) ∈ C[-1, 1] : f(-1) = -2 and f(1) = 2}.
(4) W = {f(x) E C[-l, 1] : f(~) = O}.
3.6. Does the vector (3, -1, 0, -1) belong to the subspace of ℝ^4 spanned by the
vectors (2, -1, 3, 2), (-1, 1, 1, -3) and (1, 1, 9, -5)?
3.7. Express the given function as a linear combination of functions in the given
set α.
(1) p(x) = -1 - 3x + 3x^2 and α = {p_1(x), p_2(x), p_3(x)}, where
p_1(x) = 1 + 2x + x^2, p_2(x) = 2 + 5x, p_3(x) = 3 + 8x - 2x^2.
(2) p(x) = -2 - 4x + x^2 and α = {p_1(x), p_2(x), p_3(x), p_4(x)}, where
p_1(x) = 1 + 2x^2 + x^3, p_2(x) = 1 + x + 2x^3, p_3(x) = -1 - 3x - 4x^3,
p_4(x) = 1 + 2x - x^2 + x^3.

3.8. Is {cos^2 x, sin^2 x, 1, e^x} linearly independent in the vector space C(ℝ)?
3.9. Show that the given sets of functions are linearly independent in the vector
space C[-π, π].
(1) {1, x, x^2, x^3, x^4}
(2) {1, e^x, e^{2x}, e^{3x}}
(3) {1, sin x, cos x, ..., sin kx, cos kx}
3.10. Are the vectors
v_1 = (1, 1, 2, 4), v_2 = (2, -1, -5, 2),
v_3 = (1, -1, -4, 0), v_4 = (2, 1, 1, 6)
linearly independent in the 4-space ℝ^4?


3.11. In the 3-space ℝ^3, let W be the set of all vectors (x_1, x_2, x_3) that satisfy the
equation x_1 - x_2 - x_3 = 0. Prove that W is a subspace of ℝ^3. Find a basis
for the subspace W.
3.12. With respect to the basis α = {1, x, x^2} for the vector space P_2(ℝ), find the
coordinate vector of the following polynomials:
(1) f(x) = x^2 - x + 1, (2) f(x) = x^2 + 4x - 1, (3) f(x) = 2x + 5.
3.13. Let W be the subspace of C[-π, π] consisting of functions of the form
f(x) = a sin x + b cos x. Determine the dimension of W.
3.14. Let V denote the set of all infinite sequences of real numbers:
V = {x = {x_i}_{i=1}^∞ : x_i ∈ ℝ}.
If x = {x_i} and y = {y_i} are in V, then x + y is the sequence {x_i + y_i}_{i=1}^∞.
If c is a real number, then cx is the sequence {cx_i}_{i=1}^∞.
(1) Prove that V is a vector space.
(2) Prove that V is not finite dimensional.
3.15. For two matrices A and B for which AB can be defined, prove the following
statements:
(1) If both A and B have linearly independent column vectors, then the
column vectors of AB are also linearly independent.
(2) If both A and B have linearly independent row vectors, then the row
vectors of AB are also linearly independent.
(3) If the column vectors of B are linearly dependent, then the column
vectors of AB are also linearly dependent.
(4) If the row vectors of A are linearly dependent, then the row vectors of
AB are also linearly dependent.
3.16. Let U = {(x, y, z) : 2x + 3y + z = 0} and V = {(x, y, z) : x + 2y - z = 0}
be subspaces of ℝ^3.

(1) Find a basis for U ∩ V.
(2) Determine the dimension of U + V.
(3) Describe U, V, U ∩ V and U + V geometrically.

3.17. How many 5 × 5 permutation matrices are there? Are they linearly
independent? Do they span the vector space M_{5×5}(ℝ)?
3.18. Find bases for the row space, the column space, and the null space for each

[: n
of the following matrices.

(1) A =
2
4
1
-3 (2) B =
0
1
2 1
1 -2 -5]
2 ,

l n
2 -1 1 5 0 0
0 1 -1 -2 1
3 1 1 -1 3 1
(3) C ~[ 6
9
(4) D = 2
0
1 -1
0 -2
8
2
3
1
3 5 -5 5 10

3.19. Find the rank of $A$ as a function of $x$. [The entries of $A$ are not recoverable from the extracted text.]
3.20. Find the rank and the largest invertible submatrix of each of the following
matrices: (1), (2), (3). [Matrix entries not recoverable from the extracted text.]
3.21. For any nonzero column vectors $u$, $v$, show that the matrix $A = uv^T$ has
rank 1. Conversely, show that every matrix of rank 1 can be written as $uv^T$ for some
vectors $u$, $v$.
3.22. Determine whether the following statements are true or false, and justify your
answers.
(1) The set of all $n \times n$ matrices $A$ such that $A^T = A^{-1}$ is a subspace of
the vector space $M_{n\times n}(\mathbb{R})$.
(2) If $\alpha$ and $\beta$ are linearly independent subsets of a vector space $V$, then so
is their union $\alpha \cup \beta$.
(3) If $U$ and $W$ are subspaces of a vector space $V$ with bases $\alpha$ and $\beta$,
respectively, then the intersection $\alpha \cap \beta$ is a basis for $U \cap W$.
(4) Let $U$ be the row-echelon form of a square matrix $A$. If the first $r$
columns of $U$ are linearly independent, then so are the first $r$ columns
of $A$.
(5) Any two row-equivalent matrices have the same column space.
(6) Let $A$ be an $m \times n$ matrix with rank $m$. Then the column vectors of $A$
span $\mathbb{R}^m$.
(7) Let $A$ be an $m \times n$ matrix with rank $n$. Then $Ax = b$ has at most one
solution.
(8) If $U$ is a subspace of $V$ and $x$, $y$ are vectors in $V$ such that $x + y$ is
contained in $U$, then $x \in U$ and $y \in U$.
(9) Let $U$ and $V$ be vector spaces. Then $U$ is a subspace of $V$ if and only
if $\dim U \le \dim V$.
(10) For any $m \times n$ matrix $A$, $\dim \mathcal{C}(A) + \dim \mathcal{N}(A^T) = m$.
Chapter 4

Linear Transformations

4.1 Introduction
As we saw in Chapter 3, there are many vector spaces. Naturally, one can ask
whether or not two vector spaces are the same. To decide this, one first
compares them as sets and then checks whether their arithmetic rules are
preserved. A usual way of comparing two sets is to define a function between
them. Recall that a function from a set $X$ into another set $Y$ is a rule which
assigns a unique element $y$ in $Y$ to each element $x$ in $X$. Such a function is
denoted by $f : X \to Y$ and is sometimes referred to as a transformation or a
mapping. We say that $f$ transforms (or maps) $X$ into $Y$. When the given sets
are vector spaces, one can also compare their arithmetic rules by a transformation
$f$ if $f$ preserves them, that is, $f(x + y) = f(x) + f(y)$ and $f(kx) = kf(x)$
for any vectors $x$, $y$ and any scalar $k$. In this chapter, we discuss such
transformations between vector spaces, starting from the linear equation $Ax = b$.
For an $m \times n$ matrix $A$, the equation $Ax = b$ means that, to every vector
$x = [x_1\ x_2\ \cdots\ x_n]^T$ in $\mathbb{R}^n$, the matrix multiplication $Ax$ assigns a vector
$b$ $(= Ax)$ in $\mathbb{R}^m$. That is, the matrix $A$ transforms every vector $x$ in $\mathbb{R}^n$
into a vector $b$ in $\mathbb{R}^m$ by the matrix multiplication $Ax = b$. Moreover,
the distributive law $A(x + ky) = Ax + kAy$ of matrix multiplication, for $k \in \mathbb{R}$
and $x, y \in \mathbb{R}^n$, means that $A$ preserves the sum of vectors and scalar
multiplication.

Definition 4.1 Let $V$ and $W$ be vector spaces. A function $T : V \to W$ is
called a linear transformation from $V$ to $W$ if for all $x, y \in V$ and every scalar
$k$ the following conditions hold:
(1) $T(x + y) = T(x) + T(y)$,
(2) $T(kx) = kT(x)$.

We often call $T$ simply linear. It is not hard to see that the two conditions
for a linear transformation can be combined into the single requirement
$$T(x + ky) = T(x) + kT(y).$$
Geometrically, this is just the requirement for a straight line to be transformed
into a straight line (or, when $T(y) = 0$, into a single point), since $x + ky$
represents a straight line through $x$ in the direction of $y$ in $V$, and its image
$T(x) + kT(y)$ represents a straight line through $T(x)$ in the direction of $T(y)$
in $W$. The following theorem is a direct consequence of the definition, and the
proof is left as an exercise.

Theorem 4.1 Let $T : V \to W$ be a linear transformation. Then
(1) $T(0) = 0$.
(2) For any $x_1, x_2, \ldots, x_n \in V$ and scalars $k_1, k_2, \ldots, k_n$,
$$T(k_1x_1 + k_2x_2 + \cdots + k_nx_n) = k_1T(x_1) + k_2T(x_2) + \cdots + k_nT(x_n).$$

Example 4.1 Consider the following functions:
(1) $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 2x$;
(2) $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = x^2 - x$;
(3) $h : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $h(x, y) = (x - y, 2x)$;
(4) $k : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $k(x, y) = (xy, x^2 + 1)$.
One can easily see that $g$ and $k$ are not linear, while $f$ and $h$ are linear.
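The linearity claims of Example 4.1 can be spot-checked by machine. The Python sketch below makes a few assumptions: vectors are stored as tuples, and the helper `looks_linear` (a hypothetical name, not from the text) tests the combined condition $T(x + cy) = T(x) + cT(y)$ on a small grid of integer inputs. A single failure proves that a map is not linear. Passing every check is only evidence, not a proof.

```python
from itertools import product

# The four maps of Example 4.1; inputs and outputs are stored as tuples.
def f(v):
    (x,) = v
    return (2 * x,)

def g(v):
    (x,) = v
    return (x ** 2 - x,)

def h(v):
    x, y = v
    return (x - y, 2 * x)

def k(v):
    x, y = v
    return (x * y, x ** 2 + 1)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def looks_linear(T, dim, values=range(-2, 3)):
    """Test T(x + c*y) == T(x) + c*T(y) for all x, y in values^dim and c in values.

    Returns False at the first counterexample (T is certainly not linear);
    True only means that no counterexample exists on this finite grid."""
    for coords in product(values, repeat=2 * dim + 1):
        x, y, c = coords[:dim], coords[dim:2 * dim], coords[-1]
        if T(add(x, scale(c, y))) != add(T(x), scale(c, T(y))):
            return False
    return True

print(looks_linear(f, 1), looks_linear(g, 1))  # True False
print(looks_linear(h, 2), looks_linear(k, 2))  # True False
```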

Example 4.2 (1) For an $m \times n$ matrix $A$, the transformation $T : \mathbb{R}^n \to \mathbb{R}^m$
defined by the matrix multiplication
$$T(x) = Ax$$
is a linear transformation by the distributive law $A(x + ky) = Ax + kAy$ for
any $x, y \in \mathbb{R}^n$ and any scalar $k \in \mathbb{R}$. Therefore, a matrix $A$, identified
with $T$, may be considered to be a linear transformation of $\mathbb{R}^n$ to $\mathbb{R}^m$.
(2) For a vector space $V$, the identity transformation $Id : V \to V$ is
defined by $Id(x) = x$ for all $x \in V$. If $W$ is another vector space, the zero
transformation $T_0 : V \to W$ is defined by $T_0(x) = 0$ (the zero vector) for
all $x \in V$. Clearly, both transformations are linear. □

Important nontrivial examples of linear transformations are the rotations,
reflections, and projections of plane geometry, described in the following example.

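Example 4.3 (1) Let $\theta$ denote the angle between the x-axis and a fixed
vector in $\mathbb{R}^2$. Then the matrix
$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
defines a linear transformation on $\mathbb{R}^2$ that rotates any vector in $\mathbb{R}^2$ counterclockwise
through the angle $\theta$ about the origin; it is called the rotation by the angle $\theta$.
(2) The projection on the x-axis is the linear transformation $T : \mathbb{R}^2 \to
\mathbb{R}^2$ defined, for $x = (x, y) \in \mathbb{R}^2$, by
$$T(x) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}.$$
(3) The linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined, for $x = (x, y)$, by
$$T(x) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}$$
is called the reflection about the x-axis. □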
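Here is a minimal Python sketch of the three maps of Example 4.3 for readers who want to experiment. Matrices are stored as lists of rows. The names `rotation`, `PROJ_X`, `REFL_X` and `mat_vec` are illustrative choices, not from the text. The sketch applies each map to the sample vector $(3, 4)$, and the rotation should preserve its length 5.

```python
import math

def mat_vec(M, v):
    """Product Mv of a matrix (list of rows) and a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rotation(theta):
    """R_theta of Example 4.3 (1): counterclockwise rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

PROJ_X = [[1, 0], [0, 0]]   # projection on the x-axis, Example 4.3 (2)
REFL_X = [[1, 0], [0, -1]]  # reflection about the x-axis, Example 4.3 (3)

v = [3.0, 4.0]
p = mat_vec(PROJ_X, v)                  # [3.0, 0.0]
r = mat_vec(REFL_X, v)                  # [3.0, -4.0]
w = mat_vec(rotation(math.pi / 2), v)   # approximately [-4.0, 3.0]
print(p, r, [round(t, 9) for t in w])
```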
Problem 4.1 Find the matrix of reflection about the line $y = x$ in the plane $\mathbb{R}^2$.

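Example 4.4 The transformation $\mathrm{tr} : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ defined as the sum of
the diagonal entries
$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii},$$
for $A = [a_{ij}] \in M_{n\times n}(\mathbb{R})$, is called the trace. It is easy to show that
$$\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B) \quad\text{and}\quad \mathrm{tr}(kA) = k\,\mathrm{tr}(A)$$
for any matrices $A$ and $B$ in $M_{n\times n}(\mathbb{R})$ and any scalar $k$, which means that tr is a linear
transformation. In particular, one can easily show that the set of all $n \times n$
matrices with trace 0 is a subspace of $M_{n\times n}(\mathbb{R})$. □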
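The following minimal Python sketch checks the two trace identities numerically. The matrices are arbitrary sample data. `Fraction` is used only to keep the scalar $k = 3/2$ exact.

```python
from fractions import Fraction

def trace(A):
    """tr(A) = a_11 + a_22 + ... + a_nn for a square matrix given as a list of rows."""
    return sum(A[i][i] for i in range(len(A)))

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(k, A):
    return [[k * a for a in row] for row in A]

A = [[1, 2, 0], [3, -1, 4], [5, 6, 7]]   # sample data
B = [[0, 1, 1], [2, 2, -3], [1, 0, 9]]   # sample data
k = Fraction(3, 2)
print(trace(A), trace(B))                            # 7 11
print(trace(mat_add(A, B)) == trace(A) + trace(B))   # True
print(trace(mat_scale(k, A)) == k * trace(A))        # True
```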

Problem 4.2 Let $W = \{A \in M_{n\times n}(\mathbb{R}) : \mathrm{tr}(A) = 0\}$. Show that $W$ is a subspace,
and then find a basis for $W$.

Problem 4.3 Show that, for any matrices $A$ and $B$ in $M_{n\times n}(\mathbb{R})$, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
Example 4.5 From calculus, it is well known that the two transformations
defined by differentiation and integration,
$$D(f)(x) = f'(x), \qquad I(f)(x) = \int_0^x f(t)\,dt,$$
satisfy linearity, so they are linear transformations. Many problems
related to differential and integral equations may be reformulated in terms
of linear transformations. □
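On polynomials, both $D$ and $I$ act on coefficient lists, which makes their linearity easy to test. This hedged Python sketch stores a polynomial $a_0 + a_1x + \cdots + a_nx^n$ as the list `[a0, a1, ..., an]`, an assumption of the sketch rather than the book's notation. `combine(p, q, k)` forms $p + kq$.

```python
from fractions import Fraction

def D(p):
    """Derivative of a0 + a1 x + ... + an x^n, stored as [a0, a1, ..., an]."""
    return [i * p[i] for i in range(1, len(p))]

def I(p):
    """Integral from 0 to x of p(t) dt, returned as a coefficient list."""
    return [Fraction(0)] + [Fraction(a, i + 1) for i, a in enumerate(p)]

def combine(p, q, k):
    """Coefficient list of p + k*q (the shorter list is padded with zeros)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + k * b for a, b in zip(p, q)]

p = [1, -1, 1]   # 1 - x + x^2
q = [5, 2]       # 5 + 2x
k = 3
print(D(p))      # [-1, 2], i.e. -1 + 2x
print(D(combine(p, q, k)) == combine(D(p), D(q), k))   # True
print(I(combine(p, q, k)) == combine(I(p), I(q), k))   # True
```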

Definition 4.2 Let $V$ and $W$ be two vector spaces, and let $T : V \to W$ be
a linear transformation from $V$ into $W$.
(1) $\mathrm{Ker}(T) = \{v \in V : T(v) = 0\} \subseteq V$ is called the kernel of $T$.
(2) $\mathrm{Im}(T) = \{T(v) \in W : v \in V\} = T(V) \subseteq W$ is called the image of $T$.

Example 4.6 Let $V$ and $W$ be vector spaces and let $Id : V \to V$ and
$T_0 : V \to W$ be the identity and the zero transformations, respectively.
Then it is easy to see that $\mathrm{Ker}(Id) = \{0\}$, $\mathrm{Im}(Id) = V$, $\mathrm{Ker}(T_0) = V$, and
$\mathrm{Im}(T_0) = \{0\}$. □

Theorem 4.2 Let $T : V \to W$ be a linear transformation from a vector
space $V$ to a vector space $W$. Then the kernel $\mathrm{Ker}(T)$ and the image $\mathrm{Im}(T)$
are subspaces of $V$ and $W$, respectively.

Proof: Since $T(0) = 0$, each of $\mathrm{Ker}(T)$ and $\mathrm{Im}(T)$ contains $0$ and is therefore
nonempty. It remains to show that each is closed under addition and scalar
multiplication, which we check in the combined form $x + ky$.
(1) For any $x, y \in \mathrm{Ker}(T)$ and any scalar $k$,
$$T(x + ky) = T(x) + kT(y) = 0 + k0 = 0.$$
Hence $x + ky \in \mathrm{Ker}(T)$, so that $\mathrm{Ker}(T)$ is a subspace of $V$.
(2) If $v, w \in \mathrm{Im}(T)$, then there exist $x$ and $y$ in $V$ such that $T(x) = v$
and $T(y) = w$. Thus, for any scalar $k$,
$$v + kw = T(x) + kT(y) = T(x + ky).$$
Hence $v + kw \in \mathrm{Im}(T)$, so that $\mathrm{Im}(T)$ is a subspace of $W$. □

Example 4.7 Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation defined by an
$m \times n$ matrix $A$ as in Example 4.2 (1). The kernel $\mathrm{Ker}(A)$ of $A$ consists of all
solution vectors $x$ of the homogeneous system $Ax = 0$. Therefore, the kernel
$\mathrm{Ker}(A)$ of $A$ is nothing but the null space $\mathcal{N}(A)$ of the matrix $A$, and the
image $\mathrm{Im}(A)$ of $A$ is just the column space $\mathcal{C}(A) = \mathrm{Im}(A) = A(\mathbb{R}^n) \subseteq \mathbb{R}^m$ of
the matrix $A$. Recall that $Ax$ is a linear combination of the column vectors
of $A$. □
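Since $\mathrm{Ker}(A) = \mathcal{N}(A)$ and $\mathrm{Im}(A) = \mathcal{C}(A)$, bases for both can be read off from the reduced row-echelon form of $A$. There is one kernel vector for each free column, and the pivot columns of $A$ itself form a basis of the image. The Python sketch below does this with exact `Fraction` arithmetic. The helper names and the $3 \times 4$ sample matrix are illustrative assumptions. The two dimensions add up to the number of columns.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivots): the reduced row-echelon form R of A, computed
    exactly with Fraction, and the list of pivot column indices."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for j in range(n):
        # look for a nonzero entry in column j at or below row r
        p = next((i for i in range(r, m) if R[i][j] != 0), None)
        if p is None:
            continue                      # column j is a free column
        R[r], R[p] = R[p], R[r]
        lead = R[r][j]
        R[r] = [x / lead for x in R[r]]   # scale the pivot to 1
        for i in range(m):                # clear column j in every other row
            if i != r and R[i][j] != 0:
                factor = R[i][j]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(j)
        r += 1
        if r == m:
            break
    return R, pivots

def kernel_basis(A):
    """Basis of Ker(A) = N(A): one vector for each free column of the RREF."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                # set this free variable to 1
        for row, pj in enumerate(pivots):
            v[pj] = -R[row][f]            # solve for the pivot variables
        basis.append(v)
    return basis

def image_basis(A):
    """Basis of Im(A) = C(A): the pivot columns of the original matrix A."""
    _, pivots = rref(A)
    return [[row[j] for row in A] for j in pivots]

A = [[1, 2, 0, 1],
     [2, 4, 1, 4],
     [3, 6, 1, 5]]                        # sample data: row 3 = row 1 + row 2
K = kernel_basis(A)
Im = image_basis(A)
print(len(K), len(Im))                    # 2 2, and 2 + 2 = 4 columns
```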

One of the most important properties of linear transformations is that
they are completely determined by their values on a basis.

Theorem 4.3 Let $V$ and $W$ be vector spaces. Let $\{v_1, \ldots, v_n\}$ be a basis
for $V$ and let $w_1, \ldots, w_n$ be any vectors (possibly repeated) in $W$. Then
there exists a unique linear transformation $T : V \to W$ such that $T(v_i) = w_i$
for $i = 1, \ldots, n$.

Proof: Let $x \in V$. Then it has a unique expression $x = \sum_{i=1}^{n} a_iv_i$ for
some scalars $a_1, \ldots, a_n$. Define
$$T : V \to W \quad\text{by}\quad T(x) = \sum_{i=1}^{n} a_iw_i.$$
In particular, $T(v_i) = w_i$ for $i = 1, 2, \ldots, n$.
Linearity: For $x = \sum_{i=1}^{n} a_iv_i$, $y = \sum_{i=1}^{n} b_iv_i \in V$ and a scalar $k$, we have
$x + ky = \sum_{i=1}^{n} (a_i + kb_i)v_i$. Then
$$T(x + ky) = \sum_{i=1}^{n} (a_i + kb_i)w_i = \sum_{i=1}^{n} a_iw_i + k\sum_{i=1}^{n} b_iw_i = T(x) + kT(y).$$
Uniqueness: Suppose that $S : V \to W$ is linear and $S(v_i) = w_i$ for
$i = 1, \ldots, n$. Then for any $x \in V$ with $x = \sum_{i=1}^{n} a_iv_i$, we have
$$S(x) = \sum_{i=1}^{n} a_iS(v_i) = \sum_{i=1}^{n} a_iw_i = T(x).$$
Hence, we have $S = T$. □

Therefore, any assignment $T(v_i) = w_i$ of an arbitrary vector $w_i$ in $W$ to each
vector $v_i$ of a basis for $V$ extends uniquely to a linear transformation $T$ from
$V$ into $W$. The uniqueness in the above theorem may be rephrased as the
following corollary.

Corollary 4.4 Let $V$ and $W$ be vector spaces, and let $\{v_1, \ldots, v_n\}$ be a
basis for $V$. If $S, T : V \to W$ are linear transformations and $S(v_i) = T(v_i)$
for $i = 1, \ldots, n$, then $S = T$, i.e., $S(x) = T(x)$ for all $x \in V$.

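Example 4.8 Let $w_1 = (1, 0)$, $w_2 = (2, -1)$, $w_3 = (4, 3)$ be three vectors
in $\mathbb{R}^2$.
(1) Let $\alpha = \{e_1, e_2, e_3\}$ be the standard basis for the 3-space $\mathbb{R}^3$, and
let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by
$$T(e_1) = w_1, \quad T(e_2) = w_2, \quad T(e_3) = w_3.$$
Find a formula for $T(x_1, x_2, x_3)$, and then use it to compute $T(2, -3, 5)$.
(2) Let $\beta = \{v_1, v_2, v_3\}$ be another basis for $\mathbb{R}^3$, where $v_1 = (1, 1, 1)$,
$v_2 = (1, 1, 0)$, $v_3 = (1, 0, 0)$, and let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation
defined by
$$T(v_1) = w_1, \quad T(v_2) = w_2, \quad T(v_3) = w_3.$$
Find a formula for $T(x_1, x_2, x_3)$, and then use it to compute $T(2, -3, 5)$.

Solution: (1) For $x = (x_1, x_2, x_3) = x_1e_1 + x_2e_2 + x_3e_3 \in \mathbb{R}^3$,
$$\begin{aligned} T(x) &= \sum_{i=1}^{3} x_iT(e_i) = \sum_{i=1}^{3} x_iw_i \\ &= x_1(1, 0) + x_2(2, -1) + x_3(4, 3) \\ &= (x_1 + 2x_2 + 4x_3,\ -x_2 + 3x_3). \end{aligned}$$
Thus, $T(2, -3, 5) = (16, 18)$. In matrix notation, this can be written as
$$T(x) = \begin{bmatrix} 1 & 2 & 4 \\ 0 & -1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
(2) In this case, we need to express $x = (x_1, x_2, x_3)$ as a linear combination
of $v_1, v_2, v_3$, i.e.,
$$(x_1, x_2, x_3) = \sum_{i=1}^{3} k_iv_i = k_1(1, 1, 1) + k_2(1, 1, 0) + k_3(1, 0, 0).$$
By equating corresponding components we obtain the system of equations
$$k_1 + k_2 + k_3 = x_1, \qquad k_1 + k_2 = x_2, \qquad k_1 = x_3.$$
The solution is $k_1 = x_3$, $k_2 = x_2 - x_3$, $k_3 = x_1 - x_2$. Therefore,
$$\begin{aligned} T(x_1, x_2, x_3) &= x_3T(v_1) + (x_2 - x_3)T(v_2) + (x_1 - x_2)T(v_3) \\ &= x_3(1, 0) + (x_2 - x_3)(2, -1) + (x_1 - x_2)(4, 3) \\ &= (4x_1 - 2x_2 - x_3,\ 3x_1 - 4x_2 + x_3). \end{aligned}$$
From this formula we obtain $T(2, -3, 5) = (9, 23)$. In matrix notation, it
can be written as
$$T(x_1, x_2, x_3) = \begin{bmatrix} 4 & -2 & -1 \\ 3 & -4 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \qquad □$$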
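The procedure of part (2) is exactly the construction in the proof of Theorem 4.3: find the coordinates $k_i$ of $x$ in the basis $\{v_1, v_2, v_3\}$, then form $\sum k_iw_i$. The Python sketch below follows that procedure. The helper `solve` is plain Gauss-Jordan elimination with `Fraction`, an illustrative choice. The last lines rebuild the matrix of part (2) column by column from $T(e_1)$, $T(e_2)$, $T(e_3)$.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M k = b for a square invertible M by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for j in range(n):
        p = next(i for i in range(j, n) if aug[i][j] != 0)   # pivot row
        aug[j], aug[p] = aug[p], aug[j]
        pivot = aug[j][j]
        aug[j] = [x / pivot for x in aug[j]]
        for i in range(n):
            if i != j:
                factor = aug[i][j]
                aug[i] = [a - factor * c for a, c in zip(aug[i], aug[j])]
    return [row[-1] for row in aug]

# Data of Example 4.8 (2): T(v_i) = w_i.
V_COLS = [[1, 1, 1],    # matrix whose columns are v1 = (1,1,1),
          [1, 1, 0],    # v2 = (1,1,0) and v3 = (1,0,0)
          [1, 0, 0]]
IMAGES = [(1, 0), (2, -1), (4, 3)]   # w1, w2, w3

def T(x):
    """Linear extension of T(v_i) = w_i: write x = sum k_i v_i, return sum k_i w_i."""
    k = solve(V_COLS, x)
    return tuple(sum(ki * w[r] for ki, w in zip(k, IMAGES)) for r in range(2))

value = T((2, -3, 5))    # (9, 23), as computed in the text
# The standard matrix of T has T(e_1), T(e_2), T(e_3) as its columns.
E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
std = [[T(e)[r] for e in E] for r in range(2)]
```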

Problem 4.4 Is there a linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ such that $T(3, 1, 0) =
(1, 1)$ and $T(-6, -2, 0) = (2, 1)$? If yes, can you find an expression of $T(x)$ for
$x = (x_1, x_2, x_3)$ in $\mathbb{R}^3$?

Problem 4.5 Let $V$ and $W$ be vector spaces and $T : V \to W$ be linear. Let
$\{w_1, w_2, \ldots, w_k\}$ be a linearly independent subset of the image $\mathrm{Im}(T) \subseteq W$. Suppose
that $\alpha = \{v_1, v_2, \ldots, v_k\}$ is chosen so that $T(v_i) = w_i$ for $i = 1, 2, \ldots, k$.
Prove that $\alpha$ is linearly independent.

4.2 Invertible linear transformations


Recall that a function $f$ from a set $X$ to a set $Y$ is said to be invertible if
there is a function $g$ from $Y$ to $X$, called the inverse function of $f$ and denoted
by $g = f^{-1}$, such that the compositions satisfy $g \circ f = Id$ and $f \circ g = Id$.
An invertible function from a set $X$ into another set $Y$ gives a one-to-one
correspondence between these two sets, so that they can be identified as sets.
A useful criterion for a function between two given sets to be invertible is
that it is one-to-one and onto. Recall that a function $f : X \to Y$ is said to be
one-to-one (or injective) if $f(u) = f(v)$ in $Y$ implies $u = v$ in $X$, and is said
to be onto (or surjective) if for each element $y$ in $Y$ there is an element $x$ in
$X$ such that $f(x) = y$. A function is said to be bijective if it is both one-to-one
and onto, that is, if for each element $y$ in $Y$ there is a unique element $x$ in $X$
such that $f(x) = y$.

Lemma 4.5 A function $f : X \to Y$ is invertible if and only if it is bijective
(or one-to-one and onto).

Proof: Suppose $f : X \to Y$ is invertible, and let $g : Y \to X$ be its inverse.
If $f(u) = f(v)$, then $u = g(f(u)) = g(f(v)) = v$. Thus $f$ is one-to-one. For
each $y \in Y$, let $x = g(y) \in X$. Then $f(x) = f(g(y)) = y$. Thus $f$ is onto.
Conversely, suppose $f$ is bijective. Then, for each $y \in Y$, there is a unique
$x \in X$ such that $f(x) = y$. Now for each $y \in Y$ define $g(y) = x$. Then
one can easily check that $g : Y \to X$ is a well-defined function such that
$f \circ g = Id$ and $g \circ f = Id$, i.e., $g$ is the inverse of $f$. □

The following lemma shows that if a given function is an invertible linear
transformation from one vector space into another, then linearity is also
preserved by the inversion.

Lemma 4.6 Let $V$ and $W$ be vector spaces. If $T : V \to W$ is an invertible
linear transformation, then its inverse $T^{-1} : W \to V$ is also linear.

Proof: Let $w_1, w_2 \in W$, and let $k$ be any scalar. Since $T$ is invertible,
it is one-to-one and onto, so there exist unique vectors $v_1$ and $v_2$ in $V$ such
that $T(v_1) = w_1$ and $T(v_2) = w_2$. Then
$$\begin{aligned} T^{-1}(w_1 + kw_2) &= T^{-1}(T(v_1) + kT(v_2)) \\ &= T^{-1}(T(v_1 + kv_2)) \\ &= v_1 + kv_2 \\ &= T^{-1}(w_1) + kT^{-1}(w_2). \end{aligned}$$
□

Definition 4.3 A linear transformation $T : V \to W$ from a vector space $V$
to a vector space $W$ is called an isomorphism if it is invertible (or one-to-one
and onto). In this case, we say $V$ and $W$ are isomorphic to each
other.

Lemma 4.6 shows that if $T$ is an isomorphism, then its inverse $T^{-1}$ is also
an isomorphism with $(T^{-1})^{-1} = T$. Therefore, if $V$ and $W$ are isomorphic
to each other, they look the same as vector spaces.
If $T : V \to W$ and $S : W \to Z$ are linear transformations, then it is quite
easy to show that their composition $(S \circ T)(v) = S(T(v))$ is also a linear
transformation from $V$ to $Z$. In particular, if two linear transformations are
given by matrices $A : \mathbb{R}^n \to \mathbb{R}^m$ and $B : \mathbb{R}^m \to \mathbb{R}^k$, then their composition
is nothing but the matrix product $BA$, i.e., $(B \circ A)(x) =
B(Ax) = (BA)x$. Hence, if a linear transformation is given by an invertible
$n \times n$ square matrix $A : \mathbb{R}^n \to \mathbb{R}^n$, then the inverse matrix $A^{-1}$ plays the role of the
inverse linear transformation, so that $A$ is an isomorphism of $\mathbb{R}^n$. That is,
a linear transformation given by an $n \times n$ square matrix $A : \mathbb{R}^n \to \mathbb{R}^n$ is an
isomorphism if and only if $\mathrm{rank}\, A = n$.
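Both facts in this paragraph can be checked on small examples: composing two matrix maps agrees with multiplying the matrices, and an invertible matrix is undone by its inverse. This Python sketch uses arbitrary sample matrices. `inverse_2x2` is an illustrative helper that applies the adjugate formula, so it handles only $2 \times 2$ matrices.

```python
from fractions import Fraction

def mat_mul(P, Q):
    """Matrix product PQ of a k x m matrix P and an m x n matrix Q."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[1, 2], [0, 1], [3, -1]]   # A : R^2 -> R^3 (sample data)
B = [[2, 0, 1], [1, 1, 0]]      # B : R^3 -> R^2 (sample data)
x = [4, -5]
print(mat_vec(B, mat_vec(A, x)), mat_vec(mat_mul(B, A), x))   # [5, -11] [5, -11]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula; fails if rank M < 2."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("rank < 2, so M is not an isomorphism of R^2")
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[2, 1], [1, 1]]
M_inv = inverse_2x2(M)
print(mat_vec(M_inv, mat_vec(M, x)) == x)   # True: M^{-1} undoes M
```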

Problem 4.6 Suppose that $S$ and $T$ are linear transformations whose composition
$S \circ T$ is well defined. Prove that
(1) if $S \circ T$ is one-to-one, so is $T$,
(2) if $S \circ T$ is onto, so is $S$,
(3) if $S$ and $T$ are isomorphisms, then so is $S \circ T$,
(4) if $A$ and $B$ are two $n \times n$ matrices of rank $n$, then so is $AB$.

Theorem 4.7 Two vector spaces $V$ and $W$ are isomorphic if and only if
$\dim V = \dim W$.

Proof: Let $T : V \to W$ be an isomorphism, and let $\{v_1, \ldots, v_n\}$ be a basis
for $V$. We show that the set $\{T(v_1), \ldots, T(v_n)\}$ is a basis for $W$, so
that $\dim W = n = \dim V$.
(1) It is linearly independent: suppose $c_1T(v_1) + \cdots + c_nT(v_n) = 0$. Since $T$ is one-to-one, the equation
$$T(c_1v_1 + \cdots + c_nv_n) = c_1T(v_1) + \cdots + c_nT(v_n) = 0 = T(0)$$
implies that $c_1v_1 + \cdots + c_nv_n = 0$. Since the $v_i$'s are linearly independent,
we have $c_i = 0$ for all $i = 1, \ldots, n$.
(2) It spans $W$: since $T$ is onto, for any $y \in W$ there exists an $x \in V$
such that $T(x) = y$. Write $x = \sum_{i=1}^{n} a_iv_i$. Then
$$y = T(x) = T\Big(\sum_{i=1}^{n} a_iv_i\Big) = \sum_{i=1}^{n} a_iT(v_i),$$
i.e., $y$ is a linear combination of $T(v_1), \ldots, T(v_n)$.

Conversely, suppose that $\dim V = \dim W$. Then one can choose any
bases $\{v_1, \ldots, v_n\}$ and $\{w_1, \ldots, w_n\}$ for $V$ and $W$, respectively. By
Theorem 4.3 there exists a linear transformation $T : V \to W$ such that
$T(v_i) = w_i$ for $i = 1, \ldots, n$. It is not hard to show that $T$ is invertible, so
that $T$ is an isomorphism. Hence $V$ and $W$ are isomorphic. □

Problem 4.7 Let $T : V \to W$ be a linear transformation. Prove that
(1) $T$ is one-to-one if and only if $\mathrm{Ker}(T) = \{0\}$,
(2) if $V = W$, then $T$ is one-to-one if and only if $T$ is onto.

Corollary 4.8 Any $n$-dimensional vector space $V$ is isomorphic to the $n$-space $\mathbb{R}^n$.

An ordered basis for a vector space is a basis endowed with a specific
order. Let $V$ be a vector space of dimension $n$ with an ordered basis $\alpha =
\{v_1, \ldots, v_n\}$. Let $\beta = \{e_1, \ldots, e_n\}$ be the standard basis for $\mathbb{R}^n$ in this
order. Then clearly the linear transformation $\Phi$ defined by $\Phi(v_i) = e_i$ is an
isomorphism from $V$ to $\mathbb{R}^n$, called the natural isomorphism with respect
to the basis $\alpha$. Now for any $x = \sum_{i=1}^{n} a_iv_i \in V$, the image of $x$ under this
natural isomorphism is written as
$$\Phi(x) = \sum_{i=1}^{n} a_i\Phi(v_i) = \sum_{i=1}^{n} a_ie_i = (a_1, \ldots, a_n) = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \in \mathbb{R}^n,$$
which is called the coordinate vector of $x$ with respect to the basis $\alpha$, and
is denoted by $[x]_\alpha$ $(= \Phi(x))$. Clearly $[v_i]_\alpha = e_i$.
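Example 4.9 Recall from Example 4.3 that the rotation of $\mathbb{R}^2$ by the angle $\theta$
is given by the matrix
$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
Clearly, it is invertible and hence an isomorphism of $\mathbb{R}^2$. In fact, one can
easily check that the inverse $R_\theta^{-1}$ is simply $R_{-\theta}$.
Let $\alpha = \{e_1, e_2\}$ be the standard basis, and let $\beta = \{v_1, v_2\}$, where
$v_i = R_\theta e_i$, $i = 1, 2$. Then $\beta$ is also a basis for $\mathbb{R}^2$. The coordinate vectors of
the $v_i$ with respect to $\alpha$ are the vectors themselves:
$$[v_1]_\alpha = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad [v_2]_\alpha = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix},$$
while $[v_1]_\beta = e_1$ and $[v_2]_\beta = e_2$. □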
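Computing coordinates with respect to the rotated basis $\beta$ amounts to undoing the rotation. If $x = a_1v_1 + a_2v_2 = R_\theta[a_1\ a_2]^T$, then $[x]_\beta = R_{-\theta}x$. Below is a minimal Python sketch in floating-point arithmetic. The sample angle $\theta = 0.7$ and the helper names are assumptions.

```python
import math

def rotation(theta):
    """R_theta from Example 4.3 (1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

theta = 0.7                                   # sample angle (an assumption)
v1 = mat_vec(rotation(theta), [1.0, 0.0])     # v1 = R_theta e1
v2 = mat_vec(rotation(theta), [0.0, 1.0])     # v2 = R_theta e2

def coords_beta(x):
    """[x]_beta for beta = {v1, v2}: x = a1 v1 + a2 v2 = R_theta [a1, a2]^T,
    so [x]_beta = R_theta^{-1} x = R_{-theta} x."""
    return mat_vec(rotation(-theta), x)

x = [2.0, -1.0]
a1, a2 = coords_beta(x)
rebuilt = [a1 * p + a2 * q for p, q in zip(v1, v2)]   # a1 v1 + a2 v2, equals x up to rounding
```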

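Example 4.10 In Problem 4.1, one may notice that the reflection about
the line $y = x$ can be obtained as the composition of the rotation of the plane by $-\frac{\pi}{4}$,
the reflection about the x-axis, and the rotation by $\frac{\pi}{4}$. Indeed, it
is the product of the matrices given in (1) and (3) of Example 4.3 with
$\theta = \frac{\pi}{4}$: if we denote the rotation by $\frac{\pi}{4}$ by
$$R_{\frac{\pi}{4}} = \begin{bmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$
and the reflection about the x-axis by $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, then $R_{-\frac{\pi}{4}} = R_{\frac{\pi}{4}}^{-1}$, and the
matrix we want is
$$R_{\frac{\pi}{4}}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}R_{\frac{\pi}{4}}^{-1} = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
The reflection about any line $\ell$ through the origin in the plane can be obtained in this way:
[Figure: $x$ is rotated by $R_{-\theta}$, which carries $\ell$ onto the x-axis $R_{-\theta}(\ell)$, then reflected by $T$, then rotated back by $R_\theta$.]
$$y = R_\theta \circ T \circ R_{-\theta}(x),$$
where $\theta$ is the angle from the x-axis to $\ell$ and $T$ is the reflection about the x-axis. □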
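The recipe $R_\theta \circ T \circ R_{-\theta}$ can be coded directly. This Python sketch reproduces the matrix of the reflection about $y = x$ up to rounding error. The helper names are illustrative. For a sample angle it also checks two properties that every reflection about a line has: applying it twice gives the identity, and the direction vector of the line is fixed.

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_mul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

T = [[1, 0], [0, -1]]    # reflection about the x-axis

def reflection_about_line(theta):
    """Reflection about the line through the origin making angle theta with the
    x-axis: rotate that line onto the x-axis, reflect, rotate back."""
    return mat_mul(rotation(theta), mat_mul(T, rotation(-theta)))

F = reflection_about_line(math.pi / 4)   # close to [[0, 1], [1, 0]]
G = reflection_about_line(0.3)           # an arbitrary sample angle
d = [math.cos(0.3), math.sin(0.3)]       # direction vector of that line
```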

Problem 4.8 Find the matrix of reflection about the line $y = \sqrt{3}x$ in $\mathbb{R}^2$.

Problem 4.9 Find the coordinate vector of $5 + 2x + 3x^2$ with respect to the given
ordered basis $\alpha$ for $P_2(\mathbb{R})$:
(1) $\alpha = \{1, x, x^2\}$; (2) $\alpha = \{1 + x, 1 + x^2, x + x^2\}$.

Example 4.11 Let $A$ be an $n \times n$ matrix. It defines a linear transformation on
the $n$-space $\mathbb{R}^n$ by the matrix multiplication $Ax$ for $x \in \mathbb{R}^n$.
Suppose that $r_1, \ldots, r_n$ are linearly independent vectors in $\mathbb{R}^n$ determining
a parallelepiped (see Remark (2) on page 70). Then $A$ transforms this
parallelepiped into another parallelepiped determined by $Ar_1, \ldots, Ar_n$. Hence,
if we denote by $B$ the $n \times n$ matrix whose $j$-th column is $r_j$, and by $C$ the $n \times n$
matrix whose $j$-th column is $Ar_j$, then clearly $C = AB$, so
$$\mathrm{vol}(P(C)) = |\det(AB)| = |\det A|\,|\det B| = |\det A|\,\mathrm{vol}(P(B)).$$
This means that, for a square matrix $A$ considered as a linear transformation,
the absolute value of the determinant of $A$ is the ratio $\mathrm{vol}(P(C))/\mathrm{vol}(P(B))$
between the volume of the image parallelepiped $P(C)$ and that of $P(B)$. If $\det A = 0$, then the image $P(C)$ is a parallelepiped in a
subspace of dimension less than $n$, so its $n$-dimensional volume is zero. □
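The volume relation $\mathrm{vol}(P(C)) = |\det A|\,\mathrm{vol}(P(B))$ is easy to test with integer matrices. In this Python sketch the matrices are arbitrary sample data. `det3` is a cofactor expansion written for the $3 \times 3$ case only.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def mat_mul(P, Q):
    return [[sum(P[r][t] * Q[t][s] for t in range(len(Q))) for s in range(len(Q[0]))]
            for r in range(len(P))]

A = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]   # the transformation (sample data)
B = [[1, 0, 0], [1, 1, 0], [0, 2, 3]]   # columns r1, r2, r3 determine P(B)
C = mat_mul(A, B)                        # columns A r1, A r2, A r3 determine P(C)

vol_B, vol_C = abs(det3(B)), abs(det3(C))
print(vol_B, vol_C, abs(det3(A)))        # 3 15 5: vol P(C) = |det A| * vol P(B)

A0 = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]   # det A0 = 0: the image is flattened
print(abs(det3(mat_mul(A0, B))))         # 0
```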

Problem 4.10 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation given by $T(x, y, z) =
(x + y, y + z, x + z)$. Let $C$ denote the unit cube determined by the standard basis
$e_1, e_2, e_3$. Find the volume of the image parallelepiped $T(C)$ of $C$ under $T$.

4.3 Application: Computer graphics


One of the simple applications of linear transformations is to animation, or
the graphical display of pictures on a computer screen. To illustrate
the idea, let us consider a picture in the plane $\mathbb{R}^2$. Note that a picture or an
image on a screen usually consists of a number of points, lines or curves
connecting some of them, and information about how to fill the regions bounded
by the lines and curves. Assuming that the computer has information about
how to connect the points and curves, a figure can be defined by a list of
points. For example, consider the capital letters "LA" below:
[Figure: the capital letters "LA"]
They can be represented by a matrix with the coordinates of the vertices.
For the sake of brevity we write it just for "L": the coordinates
of its 6 vertices form a matrix:

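$$A = \begin{bmatrix} 0.0 & 0.0 & 0.5 & 0.5 & 2.0 & 2.0 \\ 0.0 & 2.0 & 2.0 & 0.5 & 0.5 & 0.0 \end{bmatrix},$$
where column $j$ lists the x-coordinate (first row) and the y-coordinate (second row) of vertex $j$, $j = 1, \ldots, 6$.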

Of course, we assume that the computer knows, via some other algorithm, which vertices are connected
to which by lines. We know that a matrix, considered as a linear transformation,
transforms line segments into line segments. Thus, by multiplying $A$ on the left by a matrix, the vertices are transformed
to another set of vertices, and the line segments connecting the vertices are
preserved. For example, the matrix $B = \begin{bmatrix} 1 & 0.25 \\ 0 & 1 \end{bmatrix}$ transforms the matrix
$A$ into the following form, which represents the new coordinates of the vertices:

$$BA = \begin{bmatrix} 0.0 & 0.5 & 1.0 & 0.625 & 2.125 & 2.0 \\ 0.0 & 2.0 & 2.0 & 0.5 & 0.5 & 0.0 \end{bmatrix}$$
(the columns again correspond to vertices 1 to 6).
Now the computer connects these vertices by lines according to
the given algorithm and displays the changed figure on the screen, as on the left
side of the following figure:
[Figure: the sheared letter given by $BA$ (left) and the narrower figure given by $(CB)A$ (right).]
Multiplying $BA$ by the matrix $C = \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix}$ shrinks the
width of $BA$ by half, as on the right side of the above figure. Thus, changes in
the shape of a figure may be obtained by compositions of appropriate linear
transformations. Readers are encouraged to try various matrices, such as
reflections, rotations, or any other linear transformations, and to multiply them
with $A$ to see how the shape of the figure changes.
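The computations in this section can be reproduced with a few lines of Python, with matrices stored as lists of rows. The sketch below applies $B$ and then the composite $CB$ to the vertex matrix $A$ of the letter "L". The first row printed for $BA$ matches the matrix $BA$ shown above.

```python
def mat_mul(P, Q):
    """Matrix product PQ, with matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

# Vertices 1..6 of the letter "L": first row x-coordinates, second row y-coordinates.
A = [[0.0, 0.0, 0.5, 0.5, 2.0, 2.0],
     [0.0, 2.0, 2.0, 0.5, 0.5, 0.0]]
B = [[1.0, 0.25],    # shear: x -> x + 0.25 y, y unchanged
     [0.0, 1.0]]
C = [[0.5, 0.0],     # halve every x-coordinate (the width)
     [0.0, 1.0]]

BA = mat_mul(B, A)
CBA = mat_mul(mat_mul(C, B), A)   # the composite transformation CB applied to A
print(BA[0])    # [0.0, 0.5, 1.0, 0.625, 2.125, 2.0]
print(CBA[0])   # [0.0, 0.25, 0.5, 0.3125, 1.0625, 1.0]
```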
Remark: Incidentally, one can see that a rotation by $\pi$ followed by a reflection
about an axis gives the same result as the reflection followed by the rotation.
In general, however, a rotation and a reflection do not commute, and neither do
two reflections.
[Figure: a letter transformed by a rotation by $\pi$ and by a reflection, applied in both orders.]
The above argument applies to figures in any dimension. For
instance, a $3 \times 3$ matrix may be used to transform a figure in $\mathbb{R}^3$, since each
point has 3 components.

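Example 4.12 It is easy to see that the matrices
$$R_{(x,\alpha)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad R_{(y,\beta)} = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix}, \quad R_{(z,\gamma)} = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
are the rotations about the x-, y- and z-axes by the angles $\alpha$, $\beta$ and $\gamma$, respectively.
In general, the matrix that rotates $\mathbb{R}^3$ about a given axis is
useful in many applications. One can easily express such a general rotation
as a composition of the basic rotations $R_{(x,\alpha)}$, $R_{(y,\beta)}$ and $R_{(z,\gamma)}$.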
[Figure: the rotation axis $u$ in $xyz$-space.]

Suppose that the axis of a rotation is the line determined by the vector
$u = (\cos\alpha\cos\beta, \cos\alpha\sin\beta, \sin\alpha)$, $-\frac{\pi}{2} \le \alpha \le \frac{\pi}{2}$, $0 \le \beta \le 2\pi$, in spherical
coordinates, and we want to find the matrix $R_{(u,\theta)}$ of the rotation about the
$u$-axis by $\theta$. For this, we first rotate the $u$-axis about the z-axis into the
xz-plane by $R_{(z,-\beta)}$, and then about the y-axis onto the x-axis by $R_{(y,-\alpha)}$.
The rotation about the $u$-axis then becomes the rotation about the x-axis,
i.e., one can use the rotation $R_{(x,\theta)}$. After this, we return
to the $u$-axis via $R_{(y,\alpha)}$ and $R_{(z,\beta)}$. In summary,
$$R_{(u,\theta)} = R_{(z,\beta)}R_{(y,\alpha)}R_{(x,\theta)}R_{(y,-\alpha)}R_{(z,-\beta)}. \qquad □$$
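The following Python sketch computes this composition numerically, with $R_{(y,\beta)}$ taken in the sign convention of Example 4.12. The sample angles are assumptions. The sketch checks that the resulting matrix fixes $u$ and turns a unit vector perpendicular to $u$ through the angle $\theta$.

```python
import math

def Rx(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def Ry(b):
    # Same sign convention as the matrix R_(y, beta) in Example 4.12.
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, -s], [0, 1, 0], [s, 0, c]]

def Rz(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def mat_mul(P, Q):
    return [[sum(P[i][t] * Q[t][j] for t in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rotation_about_axis(alpha, beta, theta):
    """R_(u,theta) = R_(z,beta) R_(y,alpha) R_(x,theta) R_(y,-alpha) R_(z,-beta),
    where u = (cos(alpha)cos(beta), cos(alpha)sin(beta), sin(alpha))."""
    M = Rz(beta)
    for factor in (Ry(alpha), Rx(theta), Ry(-alpha), Rz(-beta)):
        M = mat_mul(M, factor)
    return M

alpha, beta, theta = 0.4, 1.1, 0.9   # sample angles (assumptions)
u = [math.cos(alpha) * math.cos(beta), math.cos(alpha) * math.sin(beta), math.sin(alpha)]
w = [-math.sin(beta), math.cos(beta), 0.0]   # a unit vector perpendicular to u
R = rotation_about_axis(alpha, beta, theta)
Ru, Rw = mat_vec(R, u), mat_vec(R, w)
# Ru should equal u (the axis is fixed); w and Rw should make the angle theta.
```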
Problem 4.11 Find the matrix $R_{(u,\frac{\pi}{4})}$ for the rotation about the line determined by
$u = (1, 1, 1)$ by $\frac{\pi}{4}$.

4.4 Matrices of linear transformations


We saw that multiplying an $n \times 1$ column matrix $x$ by an $m \times n$ matrix $A$
gives rise to a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. In this section, we
show that for any finite-dimensional vector spaces $V$ and $W$ (not necessarily the $n$-spaces), a
linear transformation $T : V \to W$ can be represented by a matrix.
Recall that, for any $n$-dimensional vector space $V$ with an ordered basis,
there is a natural isomorphism from $V$ to the $n$-space $\mathbb{R}^n$, which depends
on the choice of a basis for $V$. Let $T : V \to W$ be a linear transformation
from an $n$-dimensional vector space $V$ to an $m$-dimensional vector space $W$.
Take ordered bases $\alpha = \{v_1, \ldots, v_n\}$ for $V$ and $\beta = \{w_1, \ldots, w_m\}$ for
$W$, and fix them in the following discussion. Then each vector $T(v_j)$ in $W$
is expressed uniquely as a linear combination of the vectors $w_1, \ldots, w_m$ in
the basis $\beta$ for $W$, say
$$\begin{aligned} T(v_1) &= a_{11}w_1 + a_{21}w_2 + \cdots + a_{m1}w_m, \\ T(v_2) &= a_{12}w_1 + a_{22}w_2 + \cdots + a_{m2}w_m, \\ &\ \ \vdots \\ T(v_n) &= a_{1n}w_1 + a_{2n}w_2 + \cdots + a_{mn}w_m, \end{aligned}$$
or, in short form,
$$T(v_j) = \sum_{i=1}^{m} a_{ij}w_i \quad\text{for } 1 \le j \le n,$$

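for some scalars $a_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$). Notice the indexing
order of $a_{ij}$ in this expression: the coordinate vector $[T(v_j)]_\beta$ of $T(v_j)$ with
respect to the basis $\beta$ is the column vector
$$[T(v_j)]_\beta = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}.$$
Now for any vector $x = \sum_{j=1}^{n} x_jv_j \in V$,
$$T(x) = \sum_{j=1}^{n} x_jT(v_j) = \sum_{j=1}^{n} x_j\sum_{i=1}^{m} a_{ij}w_i = \sum_{i=1}^{m}\Big(\sum_{j=1}^{n} a_{ij}x_j\Big)w_i.$$
Therefore, the coordinate vector of $T(x)$ with respect to the basis $\beta$ is
$$[T(x)]_\beta = \begin{bmatrix} \sum_{j=1}^{n} a_{1j}x_j \\ \vdots \\ \sum_{j=1}^{n} a_{mj}x_j \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = A[x]_\alpha,$$
where $[x]_\alpha = [x_1\ \cdots\ x_n]^T$ is the coordinate vector of $x$ with respect to
the basis $\alpha$ in $V$. In this sense, we say that matrix multiplication by $A$
represents the transformation $T$. Note that $A = [a_{ij}]$ is the matrix whose
column vectors are just the coordinate vectors $[T(v_j)]_\beta$ of $T(v_j)$ with respect
to the basis $\beta$. Moreover, for the fixed bases $\alpha$ for $V$ and $\beta$ for $W$, the matrix
$A$ associated with the linear transformation $T$ with respect to these bases is
unique, because the coordinate expression of a vector with respect to a basis
is unique. Thus, the assignment of the matrix $A$ to a linear transformation
$T$ is well defined.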

Definition 4.4 The matrix $A$ is called the associated matrix for $T$ (or the
matrix representation of $T$) with respect to the bases $\alpha$ and $\beta$, and is
denoted by $A = [T]_\alpha^\beta$.

Now, the above argument can be summarized in the following theorem.

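Theorem 4.9 Let $T : V \to W$ be a linear transformation from an $n$-dimensional
vector space $V$ to an $m$-dimensional vector space $W$. For fixed
ordered bases $\alpha$ for $V$ and $\beta$ for $W$, the coordinate vector $[T(x)]_\beta$ of $T(x)$
with respect to $\beta$ is given as the matrix product of the associated matrix $[T]_\alpha^\beta$
of $T$ and $[x]_\alpha$, i.e.,
$$[T(x)]_\beta = [T]_\alpha^\beta[x]_\alpha.$$
The associated matrix $[T]_\alpha^\beta$ is given as
$$[T]_\alpha^\beta = \Big[\,[T(v_1)]_\beta\ \ [T(v_2)]_\beta\ \ \cdots\ \ [T(v_n)]_\beta\,\Big].$$

This situation can be incorporated in the following commutative diagram:
$$\begin{array}{ccc} V & \xrightarrow{\ \ T\ \ } & W \\ \Phi\big\downarrow & & \big\downarrow\Psi \\ \mathbb{R}^n & \xrightarrow{\ A = [T]_\alpha^\beta\ } & \mathbb{R}^m \end{array} \qquad\qquad \begin{array}{ccc} x & \longmapsto & T(x) \\ \big\downarrow & & \big\downarrow \\ [x]_\alpha & \longmapsto & [T(x)]_\beta \end{array}$$
where $\Phi$ and $\Psi$ denote the natural isomorphisms, defined in Section 4.2,
from $V$ to $\mathbb{R}^n$ with respect to $\alpha$, and from $W$ to $\mathbb{R}^m$ with respect to $\beta$,
respectively. Note that the commutativity of the above diagram means that
$A \circ \Phi = \Psi \circ T$. When $V = W$ and $\alpha = \beta$, we simply write $[T]_\alpha$ for $[T]_\alpha^\alpha$.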

Remark: (1) Note that an $m \times n$ matrix $A$ is the matrix representation
of $A$ itself with respect to the standard bases $\alpha$ for $\mathbb{R}^n$ and $\gamma$ for $\mathbb{R}^m$, i.e.,
$A = [A]_\alpha^\gamma$. In particular, if $A$ is an invertible $n \times n$ square matrix, then its
column vectors $c_1, \ldots, c_n$ form another basis $\beta$ for $\mathbb{R}^n$. Thus, $A$ is simply
the linear transformation on $\mathbb{R}^n$ that takes the standard basis $\alpha$ to $\beta$; in fact,
$Ae_j = c_j$, the $j$-th column of $A$.
(2) Let $V$ and $W$ be vector spaces with bases $\alpha$ and $\beta$, respectively, and
let $T : V \to W$ be a linear transformation with the matrix representation
$[T]_\alpha^\beta = A$. Then it is quite clear that $\mathrm{Ker}(T)$ and $\mathrm{Im}(T)$ are isomorphic to
$\mathcal{N}(A)$ and $\mathcal{C}(A)$, respectively, via the natural isomorphisms. In particular,
if $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$ with the standard bases, then $\mathrm{Ker}(T) = \mathcal{N}(A)$ and
$\mathrm{Im}(T) = \mathcal{C}(A)$. Therefore, from Corollary 3.17, we have
$$\dim(\mathrm{Ker}(T)) + \dim(\mathrm{Im}(T)) = \dim V.$$

The following examples illustrate the computation of matrices associated
with linear transformations.

Example 4.13 Let $Id : V \to V$ be the identity transformation on a vector
space $V$. Then for any ordered basis $\alpha$ for $V$, the matrix $[Id]_\alpha = I$, the
identity matrix. □

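Example 4.14 Let $T : P_1(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined
by
$$(T(p))(x) = xp(x).$$
Then, with the bases $\alpha = \{1, x\}$ and $\beta = \{1, x, x^2\}$ for $P_1(\mathbb{R})$ and $P_2(\mathbb{R})$,
respectively, the associated matrix for $T$ is
$$[T]_\alpha^\beta = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix},$$
since $T(1) = x$ and $T(x) = x^2$. □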
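Theorem 4.9 can be checked on this example. The Python sketch below stores a polynomial as its list of coefficients with respect to $1, x, x^2, \ldots$, an assumption of the sketch. With that storage, coordinate vectors for the monomial bases are just padded coefficient lists, and column $j$ of the matrix is $[T(v_j)]_\beta$.

```python
def T(p):
    """(T p)(x) = x p(x) on coefficient lists [a0, a1, ...]: shifts every coefficient up one degree."""
    return [0] + list(p)

def coords(p, dim):
    """Coordinate vector of p with respect to the monomial basis {1, x, ..., x^(dim-1)}."""
    if len(p) > dim:
        raise ValueError("degree too large for this space")
    return list(p) + [0] * (dim - len(p))

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

alpha = [[1], [0, 1]]                    # the basis {1, x} of P_1(R)
cols = [coords(T(v), 3) for v in alpha]  # [T(v_j)]_beta for beta = {1, x, x^2}
T_matrix = [[cols[j][i] for j in range(len(alpha))] for i in range(3)]
print(T_matrix)   # [[0, 0], [1, 0], [0, 1]]

p = [3, -2]                              # p(x) = 3 - 2x, so [p]_alpha = (3, -2)
# Theorem 4.9: [T(p)]_beta = [T]_alpha^beta [p]_alpha
print(coords(T(p), 3) == mat_vec(T_matrix, coords(p, 2)))   # True
```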

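Example 4.15 Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be the linear transformation defined by
$T(x, y) = (x + 2y, 0, 2x + 3y)$, and let $\alpha$ and $\beta$ be the standard bases
for $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively. Then
$$\begin{aligned} T(e_1) &= T(1, 0) = (1, 0, 2) = 1e_1 + 0e_2 + 2e_3, \\ T(e_2) &= T(0, 1) = (2, 0, 3) = 2e_1 + 0e_2 + 3e_3. \end{aligned}$$
Hence, $[T]_\alpha^\beta = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 2 & 3 \end{bmatrix}$. If $\beta' = \{e_3, e_2, e_1\}$, then $[T]_\alpha^{\beta'} = \begin{bmatrix} 2 & 3 \\ 0 & 0 \\ 1 & 2 \end{bmatrix}$. □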

Example 4.16 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation given by
$T(1, 1) = (0, 1)$ and $T(-1, 1) = (2, 3)$. Find the matrix representation
$[T]_\alpha$ of $T$ with respect to the standard basis $\alpha = \{e_1, e_2\}$.

Solution: Note that $(a, b) = ae_1 + be_2$ for any $(a, b) \in \mathbb{R}^2$. Thus the
definition of $T$ gives
$$\begin{aligned} T(e_1) + T(e_2) &= T(e_1 + e_2) = T(1, 1) = (0, 1) = e_2, \\ -T(e_1) + T(e_2) &= T(-e_1 + e_2) = T(-1, 1) = (2, 3) = 2e_1 + 3e_2. \end{aligned}$$
By solving these equations, we obtain
$$T(e_1) = -e_1 - e_2, \qquad T(e_2) = e_1 + 2e_2.$$
Therefore, $[T]_\alpha = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix}$. □

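Example 4.17 Let $T$ be the linear transformation in Example 4.16. Find
$[T]_\beta$ for the basis $\beta = \{v_1, v_2\}$, where $v_1 = (0, 1)$ and $v_2 = (2, 3)$.

Solution: From Example 4.16,
$$T(v_1) = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} = [T(v_1)]_\alpha, \qquad T(v_2) = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix} = [T(v_2)]_\alpha.$$
Writing these vectors with respect to $\beta$, we get
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = av_1 + bv_2 = \begin{bmatrix} 2b \\ a + 3b \end{bmatrix}, \qquad \begin{bmatrix} 1 \\ 4 \end{bmatrix} = cv_1 + dv_2 = \begin{bmatrix} 2d \\ c + 3d \end{bmatrix}.$$
Solving for $a$, $b$, $c$ and $d$, we obtain
$$[T(v_1)]_\beta = \begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad [T(v_2)]_\beta = \begin{bmatrix} c \\ d \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 5 \\ 1 \end{bmatrix}.$$
Therefore, $[T]_\beta = \frac{1}{2}\begin{bmatrix} 1 & 5 \\ 1 & 1 \end{bmatrix}$. □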
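The two-step computation of Example 4.17 can be sketched in Python with exact fractions: first apply $[T]_\alpha$ to each $v_j$, then solve for coordinates in $\beta$. Here `coords_beta` solves the $2 \times 2$ system by Cramer's rule, an illustrative choice.

```python
from fractions import Fraction

T_ALPHA = [[-1, 1], [-1, 2]]   # [T]_alpha from Example 4.16
v1, v2 = (0, 1), (2, 3)        # the basis beta of Example 4.17

def apply(M, v):
    return [sum(Fraction(m) * x for m, x in zip(row, v)) for row in M]

def coords_beta(w):
    """Solve a*v1 + b*v2 = w for (a, b) by Cramer's rule."""
    det = Fraction(v1[0] * v2[1] - v2[0] * v1[1])
    a = (w[0] * v2[1] - v2[0] * w[1]) / det
    b = (v1[0] * w[1] - w[0] * v1[1]) / det
    return [a, b]

# Column j of [T]_beta is [T(v_j)]_beta.
cols = [coords_beta(apply(T_ALPHA, v)) for v in (v1, v2)]
T_beta = [[cols[j][i] for j in range(2)] for i in range(2)]
print([[str(x) for x in row] for row in T_beta])   # [['1/2', '5/2'], ['1/2', '1/2']]
```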

Problem 4.12 Find the matrix representation of each of the following linear transformations
$T$ of $\mathbb{R}^3$ with respect to the standard basis $\alpha = \{e_1, e_2, e_3\}$, and with respect to
$\beta = \{e_3, e_2, e_1\}$:
(1) $T(x, y, z) = (2x - 3y + 4z, 5x - y + 2z, 4x + 7y)$,
(2) $T(x, y, z) = (2y + z, x - 4y, 3x)$.

Problem 4.13 Let $T : \mathbb{R}^4 \to \mathbb{R}^3$ be the linear transformation defined by
$$T(x, y, z, u) = (x + 2y,\ x - 3z + u,\ 2y + 3z + 4u).$$
Let $\alpha$ and $\beta$ be the standard bases for $\mathbb{R}^4$ and $\mathbb{R}^3$, respectively. Find $[T]_\alpha^\beta$.

Problem 4.14 Let $Id : \mathbb{R}^n \to \mathbb{R}^n$ be the identity transformation. Let $x_k$ denote the
vector in $\mathbb{R}^n$ whose first $k - 1$ coordinates are zero and whose last $n - k + 1$ coordinates
are 1. Then clearly $\beta = \{x_1, \ldots, x_n\}$ is a basis for $\mathbb{R}^n$ (see Problem 3.9). Let
$\alpha = \{e_1, \ldots, e_n\}$ be the standard basis for $\mathbb{R}^n$. Find the matrix representations
$[Id]_\alpha^\beta$ and $[Id]_\beta^\alpha$.

4.5 Vector spaces of linear transformations


Let $V$ and $W$ be two vector spaces. Let $\mathcal{L}(V; W)$ denote the set of all linear
transformations from $V$ to $W$, i.e.,
$$\mathcal{L}(V; W) = \{T : T \text{ is a linear transformation from } V \text{ into } W\}.$$
For $S, T \in \mathcal{L}(V; W)$ and $\lambda \in \mathbb{R}$, define the sum $S + T$ and the scalar
multiplication $\lambda S$ by
$$(S + T)(v) = S(v) + T(v) \quad\text{and}\quad (\lambda S)(v) = \lambda(S(v))$$
for any $v \in V$. Then clearly $S + T$ and $\lambda S$ belong to $\mathcal{L}(V; W)$, so that
$\mathcal{L}(V; W)$ becomes a vector space. In particular, if $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, then
the set $M_{m\times n}(\mathbb{R})$ is precisely the vector space of the linear transformations
of $\mathbb{R}^n$ into $\mathbb{R}^m$ with respect to the standard bases. Hence, by fixing the
standard bases, we identify $\mathcal{L}(\mathbb{R}^n; \mathbb{R}^m) = M_{m\times n}(\mathbb{R})$ via the matrix
representation.
In general, for any vector spaces $V$ and $W$ of dimensions $n$ and $m$ with
ordered bases $\alpha$ and $\beta$, respectively, there is a one-to-one correspondence
between $\mathcal{L}(V; W)$ and $M_{m\times n}(\mathbb{R})$ via the matrix representation.
Let us first define a transformation $\phi : \mathcal{L}(V; W) \to M_{m\times n}(\mathbb{R})$ by
$$\phi(T) = [T]_\alpha^\beta \in M_{m\times n}(\mathbb{R})$$
for any $T \in \mathcal{L}(V; W)$ (see Section 4.4). If $[S]_\alpha^\beta = [T]_\alpha^\beta$ for $S$ and $T \in
\mathcal{L}(V; W)$, then we have $S = T$ by Corollary 4.4. This means that $\phi$ is
one-to-one.
On the other hand, an $m \times n$ matrix $A$, considered as a linear transformation
from $\mathbb{R}^n$ to $\mathbb{R}^m$, gives rise to a linear transformation $T$ from $V$ to
$W$ via the composition of $A$ with the natural isomorphisms $\Phi$ and $\Psi$, i.e.,
$T = \Psi^{-1} \circ A \circ \Phi$, which satisfies $[T]_\alpha^\beta = A$. This means that $\phi$ is onto.
Therefore, $\phi$ gives a one-to-one correspondence between $\mathcal{L}(V; W)$ and
$M_{m\times n}(\mathbb{R})$. Furthermore, the following theorem shows that $\phi$ is linear, so
that it is in fact an isomorphism from $\mathcal{L}(V; W)$ to $M_{m\times n}(\mathbb{R})$.

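Theorem 4.10 Let $V$ and $W$ be vector spaces with ordered bases $\alpha$ and $\beta$, respectively, and let $S, T: V \to W$ be linear. Then we have
$$[S + T]_\alpha^\beta = [S]_\alpha^\beta + [T]_\alpha^\beta \quad\text{and}\quad [kS]_\alpha^\beta = k[S]_\alpha^\beta.$$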


4.5. VECTOR SPACES OF LINEAR TRANSFORMATIONS 141

Proof: Let $\alpha = \{v_1, \dots, v_n\}$ and $\beta = \{w_1, \dots, w_m\}$. Then we have unique expressions $S(v_j) = \sum_{i=1}^m a_{ij} w_i$ and $T(v_j) = \sum_{i=1}^m b_{ij} w_i$ for each $1 \le j \le n$. Hence
$$(S + T)(v_j) = \sum_{i=1}^m a_{ij} w_i + \sum_{i=1}^m b_{ij} w_i = \sum_{i=1}^m (a_{ij} + b_{ij}) w_i.$$
Thus
$$[S + T]_\alpha^\beta = [S]_\alpha^\beta + [T]_\alpha^\beta.$$
The proof of the second equality $[kS]_\alpha^\beta = k[S]_\alpha^\beta$ is left as an exercise. □
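As a quick numerical illustration of Theorem 4.10 (not a proof), the sketch below takes $\alpha$ and $\beta$ to be the standard bases of $\mathbb{R}^2$ and $\mathbb{R}^3$. In that case column $j$ of $[T]_\alpha^\beta$ is simply $T(e_j)$. The maps `S` and `T` are hypothetical examples chosen for illustration, and the helper names are likewise assumptions of this sketch.

```python
# Matrix representations with respect to the standard bases: column j of
# [F] is F(e_j). The maps S and T below are hypothetical examples.

def matrix_rep(F, n):
    """Matrix of a linear map F: R^n -> R^m w.r.t. the standard bases."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(F(e_j))
    # Turn the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

def S(v):
    x, y = v
    return [x + y, 2 * x, -y]

def T(v):
    x, y = v
    return [3 * y, x - y, x]

def add_maps(F, G):
    """Pointwise sum (F + G)(v) = F(v) + G(v)."""
    return lambda v: [a + b for a, b in zip(F(v), G(v))]

def scale_map(k, F):
    """Scalar multiple (kF)(v) = k F(v)."""
    return lambda v: [k * a for a in F(v)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(k, A):
    return [[k * a for a in row] for row in A]

sum_rep = matrix_rep(add_maps(S, T), 2)
print(sum_rep == mat_add(matrix_rep(S, 2), matrix_rep(T, 2)))       # True
print(matrix_rep(scale_map(5, S), 2) == mat_scale(5, matrix_rep(S, 2)))  # True
```

Building the matrix from images of basis vectors mirrors the proof: the $j$-th column of $[S+T]_\alpha^\beta$ is the coordinate vector of $(S+T)(v_j)$.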

In summary, for vector spaces $V$ of dimension $n$ and $W$ of dimension $m$ with fixed ordered bases $\alpha$ and $\beta$, respectively, the vector space $\mathcal{L}(V; W)$ of all linear transformations from $V$ to $W$ can be identified with the vector space $M_{m\times n}(\mathbb{R})$ of all $m \times n$ matrices, so that
$$\dim \mathcal{L}(V; W) = \dim M_{m\times n}(\mathbb{R}) = mn = \dim V \cdot \dim W.$$

Remark: (1) Let $Ax = b$ be a system of linear equations for an $m \times n$ matrix $A$. By considering the coefficient matrix $A$ as a linear transformation, one obtains further conditions equivalent to those in Theorems 3.23 and 3.24. The conditions in Theorem 3.23 (e.g., $\operatorname{rank} A = m$) are equivalent to the condition that $A$ is surjective, and those in Theorem 3.24 (e.g., $\operatorname{rank} A = n$) are equivalent to the condition that $A$ is one-to-one. This observation gives the proof of (15)-(16) in Theorem 3.25.
(2) With the identification of the vector spaces $\mathcal{L}(V; W)$ and $M_{m\times n}(\mathbb{R})$ as above, Theorem 3.25 gives the following equivalent conditions for a linear transformation $T$ on a finite-dimensional vector space $V$:
(i) $T$ is an isomorphism,
(ii) $T$ is one-to-one,
(iii) $T$ is surjective.
(One can also prove them directly by using the definition of a basis for $V$.)
The next theorem shows that the one-to-one correspondence between $\mathcal{L}(V; W)$ and $M_{m\times n}(\mathbb{R})$ preserves not only the linear structure but also the composition of linear transformations. Let $V$, $W$ and $Z$ be vector spaces. Suppose that $S: V \to W$ and $T: W \to Z$ are linear transformations. Then clearly the composition $T \circ S: V \to Z$ is also linear. We often refer to this composition as the product of linear transformations.

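Theorem 4.11 Let $V$, $W$ and $Z$ be vector spaces with ordered bases $\alpha$, $\beta$ and $\gamma$, respectively. Suppose that $S: V \to W$ and $T: W \to Z$ are linear transformations. Then
$$[T \circ S]_\alpha^\gamma = [T]_\beta^\gamma [S]_\alpha^\beta.$$

Proof: Let $\alpha = \{v_1, \dots, v_n\}$, $\beta = \{w_1, \dots, w_m\}$ and $\gamma = \{z_1, \dots, z_\ell\}$. Let $[T]_\beta^\gamma = [a_{ij}]$ and $[S]_\alpha^\beta = [b_{pq}]$. Then, for $1 \le i \le n$,
$$(T \circ S)(v_i) = T(S(v_i)) = T\Big(\sum_{k=1}^m b_{ki} w_k\Big) = \sum_{k=1}^m b_{ki} T(w_k) = \sum_{k=1}^m b_{ki}\Big(\sum_{j=1}^{\ell} a_{jk} z_j\Big) = \sum_{j=1}^{\ell}\Big(\sum_{k=1}^m a_{jk} b_{ki}\Big) z_j.$$
This shows that $[T \circ S]_\alpha^\gamma = [T]_\beta^\gamma [S]_\alpha^\beta$. □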
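A minimal sketch checking Theorem 4.11 on one example, assuming standard bases throughout so that each matrix is read off from images of the standard basis vectors. The maps `S` (from $\mathbb{R}^2$ to $\mathbb{R}^3$) and `T` (from $\mathbb{R}^3$ to $\mathbb{R}^2$) are hypothetical, not taken from the text.

```python
def matrix_rep(F, n):
    """Matrix of a linear map F: R^n -> R^m w.r.t. the standard bases."""
    columns = [F([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*columns)]

def mat_mul(A, B):
    """Ordinary matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical maps S: R^2 -> R^3 and T: R^3 -> R^2.
def S(v):
    x, y = v
    return [x + y, 2 * x, -y]

def T(v):
    x, y, z = v
    return [x - z, 2 * y + z]

def compose(G, F):
    """(G o F)(v) = G(F(v))."""
    return lambda v: G(F(v))

lhs = matrix_rep(compose(T, S), 2)                 # [T o S]
rhs = mat_mul(matrix_rep(T, 3), matrix_rep(S, 2))  # [T][S]
print(lhs, rhs)
```

Note the order of the factors: the matrix of the map applied first ($S$) stands on the right, just as in $[T]_\beta^\gamma[S]_\alpha^\beta$.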

Problem 4.15 Let $\alpha$ be the standard basis for $\mathbb{R}^3$, and let $S, T: \mathbb{R}^3 \to \mathbb{R}^3$ be two linear transformations given by
$$S(e_1) = (2, 2, 1),\quad S(e_2) = (0, 1, 2),\quad S(e_3) = (-1, 2, 1),$$
$$T(e_1) = (1, 0, 1),\quad T(e_2) = (0, 1, 1),\quad T(e_3) = (1, 1, 2).$$
Compute $[S + T]_\alpha$, $[2T - S]_\alpha$, and $[T \circ S]_\alpha$.
Problem 4.16 Let $T: P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by $T(f) = (3 + x)f' + 2f$, and let $S: P_2(\mathbb{R}) \to \mathbb{R}^3$ be defined by $S(a + bx + cx^2) = (a - b,\ a + b,\ c)$. For the basis $\alpha = \{1, x, x^2\}$ for $P_2(\mathbb{R})$ and the standard basis $\beta = \{e_1, e_2, e_3\}$ for $\mathbb{R}^3$, compute $[S]_\alpha^\beta$, $[T]_\alpha$, and $[S \circ T]_\alpha^\beta$.

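Theorem 4.12 Let $V$ and $W$ be vector spaces with ordered bases $\alpha$ and $\beta$, respectively, and let $T: V \to W$ be an isomorphism. Then
$$[T^{-1}]_\beta^\alpha = \big([T]_\alpha^\beta\big)^{-1}.$$

Proof: Since $T$ is invertible, $\dim V = \dim W$, and the matrices $[T]_\alpha^\beta$ and $[T^{-1}]_\beta^\alpha$ are square and of the same size. Thus, by Theorem 4.11,
$$[T]_\alpha^\beta [T^{-1}]_\beta^\alpha = [T \circ T^{-1}]_\beta = [Id]_\beta$$
is the identity matrix. Hence, $[T^{-1}]_\beta^\alpha = ([T]_\alpha^\beta)^{-1}$. □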

In particular, if a linear transformation $T: V \to W$ is an isomorphism, then $[T]_\alpha^\beta$ is an invertible matrix for any bases $\alpha$ for $V$ and $\beta$ for $W$.
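The sketch below illustrates Theorem 4.12 with exact rational arithmetic. It assumes standard bases of $\mathbb{R}^2$ and a hypothetical isomorphism `T` whose inverse `T_inv` was worked out by hand. It checks that the matrix of $T^{-1}$ coincides with the matrix inverse of $[T]$.

```python
from fractions import Fraction

def matrix_rep(F, n):
    """Matrix of F: R^n -> R^m w.r.t. the standard bases (columns F(e_j))."""
    columns = [F([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*columns)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix by the adjugate formula (det must be nonzero)."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical isomorphism T and its inverse, written out by hand.
def T(v):
    x, y = v
    return [2 * x + y, x + y]

def T_inv(v):
    u, w = v
    return [u - w, -u + 2 * w]

A = matrix_rep(T, 2)
print(matrix_rep(T_inv, 2) == inverse_2x2(A))   # [T^{-1}] = [T]^{-1}
print(mat_mul(A, matrix_rep(T_inv, 2)))         # identity matrix
```

`Fraction` is used so that the comparison with the computed inverse is exact rather than up to floating-point rounding.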
4.6. CHANGE OF BASES 143

Problem 4.17 For the vector spaces $P_1(\mathbb{R})$ and $\mathbb{R}^2$, choose the bases $\alpha = \{1, x\}$ for $P_1(\mathbb{R})$ and $\beta = \{e_1, e_2\}$ for $\mathbb{R}^2$, respectively. Let $T: P_1(\mathbb{R}) \to \mathbb{R}^2$ be the linear transformation defined by $T(a + bx) = (a,\ a + b)$.
(1) Show that $T$ is invertible. (2) Find $[T]_\alpha^\beta$ and $[T^{-1}]_\beta^\alpha$.

4.6 Change of bases


In Section 4.2, we saw that any vector space $V$ of dimension $n$ with an ordered basis $\alpha$ is isomorphic to the $n$-space $\mathbb{R}^n$ via the natural isomorphism $\Phi$, which assigns to each $\mathbf{x} \in V$ its coordinate vector in $\mathbb{R}^n$, i.e., $\Phi(\mathbf{x}) = [\mathbf{x}]_\alpha$. Of course, we get a different isomorphism if we take another basis $\beta$ instead of $\alpha$: the coordinate expression $[\mathbf{x}]_\beta$ of $\mathbf{x}$ with respect to $\beta$ may be different from $[\mathbf{x}]_\alpha$. Thus, one may naturally ask how $[\mathbf{x}]_\alpha$ and $[\mathbf{x}]_\beta$ are related for two different bases. In this section, we discuss this question. One of the fundamental problems in linear algebra is to find bases for which the matrix representation of a linear transformation is as simple as possible.
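Let us begin with an example in the plane $\mathbb{R}^2$. The coordinate expression of $\mathbf{x} = (x, y) \in \mathbb{R}^2$ with respect to the standard basis $\alpha = \{e_1, e_2\}$ is
$$\mathbf{x} = x e_1 + y e_2, \quad\text{so that}\quad [\mathbf{x}]_\alpha = \begin{bmatrix} x \\ y \end{bmatrix}.$$
Now let $\beta = \{e'_1, e'_2\}$ be another basis for $\mathbb{R}^2$ obtained by rotating $\alpha$ counterclockwise through an angle $\theta$. Then the coordinate expression of $\mathbf{x} \in \mathbb{R}^2$ with respect to $\beta$ is written as
$$\mathbf{x} = x' e'_1 + y' e'_2, \quad\text{or}\quad [\mathbf{x}]_\beta = \begin{bmatrix} x' \\ y' \end{bmatrix}.$$
In particular, the expressions of the vectors in $\beta$ with respect to $\alpha$ are
$$e'_1 = Id(e'_1) = \cos\theta\, e_1 + \sin\theta\, e_2, \qquad e'_2 = Id(e'_2) = -\sin\theta\, e_1 + \cos\theta\, e_2,$$
so
$$[e'_1]_\alpha = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad [e'_2]_\alpha = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$$
Therefore,
$$\mathbf{x} = x' e'_1 + y' e'_2 = (x'\cos\theta - y'\sin\theta)\, e_1 + (x'\sin\theta + y'\cos\theta)\, e_2 = x e_1 + y e_2.$$
This can be written as the following matrix equation:
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}, \quad\text{or}\quad [\mathbf{x}]_\alpha = [Id]_\beta^\alpha [\mathbf{x}]_\beta,$$
where
$$[Id]_\beta^\alpha = \big[\, [e'_1]_\alpha \ \ [e'_2]_\alpha \,\big] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
Note that, by Theorem 4.12,
$$[Id]_\alpha^\beta = \big([Id]_\beta^\alpha\big)^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$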
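A small floating-point sketch of the rotation example. It builds $[Id]_\beta^\alpha$ for a sample angle $\theta = \pi/6$ (an arbitrary choice) and converts a vector to rotated coordinates and back. It uses the fact, visible in the formula above, that for a rotation $([Id]_\beta^\alpha)^{-1}$ equals the transpose. The sample vector is an assumption for illustration.

```python
import math

def rotation_transition(theta):
    """[Id]_beta^alpha for the basis beta obtained by rotating the standard
    basis alpha of R^2 counterclockwise through theta: its columns are
    [e'_1]_alpha = (cos t, sin t) and [e'_2]_alpha = (-sin t, cos t)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

theta = math.pi / 6
Q = rotation_transition(theta)    # [Id]_beta^alpha
Q_inv = transpose(Q)              # [Id]_alpha^beta; equals Q^{-1} for a rotation

x_alpha = [1.0, 2.0]              # [x]_alpha, standard coordinates
x_beta = mat_vec(Q_inv, x_alpha)  # [x]_beta = [Id]_alpha^beta [x]_alpha
back = mat_vec(Q, x_beta)         # [x]_alpha = [Id]_beta^alpha [x]_beta
print(x_beta, back)
print(mat_mul(Q, Q_inv))          # identity up to rounding
```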

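In general, let $\alpha = \{v_1, v_2, \dots, v_n\}$ and $\beta = \{w_1, w_2, \dots, w_n\}$ be two ordered bases for $V$. Then any vector $\mathbf{x} \in V$ has two expressions:
$$\mathbf{x} = \sum_{i=1}^n x_i v_i = \sum_{j=1}^n y_j w_j.$$
Now, each vector in $\beta$ is expressed as a linear combination of the vectors in $\alpha$: $w_j = Id(w_j) = \sum_{i=1}^n q_{ij} v_i$ for $j = 1, 2, \dots, n$, so that
$$[w_j]_\alpha = \begin{bmatrix} q_{1j} \\ \vdots \\ q_{nj} \end{bmatrix}.$$
Then for any $\mathbf{x} \in V$,
$$\mathbf{x} = \sum_{j=1}^n y_j \sum_{i=1}^n q_{ij} v_i = \sum_{i=1}^n \Big(\sum_{j=1}^n q_{ij} y_j\Big) v_i,$$
so that $x_i = \sum_{j=1}^n q_{ij} y_j$ for each $i$. This is equivalent to the following matrix equation:
$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & & \vdots \\ q_{n1} & \cdots & q_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad\text{or}\quad [\mathbf{x}]_\alpha = [Id]_\beta^\alpha [\mathbf{x}]_\beta,$$
which can be pictured as
$$\begin{array}{ccc} \mathbf{x} \in V & \xrightarrow{\;\;Id\;\;} & \mathbf{x} \in V \\ \downarrow & & \downarrow \\ [\mathbf{x}]_\beta \in \mathbb{R}^n & \xrightarrow{\;\;Q = [Id]_\beta^\alpha\;\;} & [\mathbf{x}]_\alpha \in \mathbb{R}^n \end{array}$$
where
$$Q = [Id]_\beta^\alpha = \begin{bmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & & \vdots \\ q_{n1} & \cdots & q_{nn} \end{bmatrix} = \big[\, [w_1]_\alpha \ \cdots \ [w_n]_\alpha \,\big].$$

Definition 4.5 The matrix representation $[Id]_\beta^\alpha$ of the identity transformation $Id: V \to V$ with respect to any two bases $\alpha$ and $\beta$ is called the transition matrix or the coordinate change matrix from $\beta$ to $\alpha$.

Since the identity transformation $Id: V \to V$ is invertible, the transition matrix $Q = [Id]_\beta^\alpha$ is also invertible by Theorem 4.12. If we had taken the expressions of the vectors in the basis $\alpha$ with respect to the basis $\beta$, $v_j = Id(v_j) = \sum_{i=1}^n p_{ij} w_i$ for $j = 1, 2, \dots, n$, then we would have $[p_{ij}] = [Id]_\alpha^\beta = Q^{-1}$ and
$$[\mathbf{x}]_\beta = [Id]_\alpha^\beta [\mathbf{x}]_\alpha = Q^{-1}[\mathbf{x}]_\alpha.$$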
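The construction above translates directly into a computation. Column $j$ of $Q = [Id]_\beta^\alpha$ is $[w_j]_\alpha$, which is obtained by solving a linear system whose coefficient matrix has the vectors of $\alpha$ as columns. The sketch below does this for bases of $\mathbb{R}^n$ with Gauss-Jordan elimination over the rationals. The two bases are hypothetical, and the function names are assumptions of this sketch.

```python
from fractions import Fraction

def solve_columns(A, B):
    """Return X with A X = B (A square and nonsingular) by Gauss-Jordan
    elimination in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(x) for x in A[i]] + [Fraction(x) for x in B[i]]
         for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def as_columns(vectors):
    """Matrix whose j-th column is the j-th vector."""
    return [list(row) for row in zip(*vectors)]

def transition_matrix(alpha, beta):
    """[Id]_beta^alpha: column j is [w_j]_alpha, the solution q of
    (matrix with columns v_1, ..., v_n) q = w_j."""
    return solve_columns(as_columns(alpha), as_columns(beta))

def combine(basis, coords):
    """The vector sum_i coords[i] * basis[i]."""
    return [sum(c * v[k] for c, v in zip(coords, basis))
            for k in range(len(basis[0]))]

# Hypothetical bases of R^2, chosen only for illustration.
alpha = [(1, 1), (1, -1)]
beta = [(2, 0), (1, 3)]
Q = transition_matrix(alpha, beta)

y = [1, 1]                                                     # [x]_beta
x_alpha = [sum(q * yj for q, yj in zip(row, y)) for row in Q]  # Q [x]_beta
print(Q)
print(combine(beta, y) == combine(alpha, x_alpha))             # same vector x
```

The last line checks $[\mathbf{x}]_\alpha = Q[\mathbf{x}]_\beta$ by reassembling $\mathbf{x}$ from both coordinate vectors.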

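Example 4.18 Let the 3-space $\mathbb{R}^3$ be equipped with the standard $xyz$-coordinate system, i.e., with the standard basis $\alpha = \{e_1, e_2, e_3\}$. Take a new $x'y'z'$-coordinate system by rotating the $xyz$-system around its $z$-axis counterclockwise through an angle $\theta$, i.e., we take a new basis $\beta = \{e'_1, e'_2, e_3\}$ by rotating the basis $\alpha$ about the $z$-axis through $\theta$. Then we get
$$[e'_1]_\alpha = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix}, \quad [e'_2]_\alpha = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix}, \quad [e_3]_\alpha = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
Hence, the transition matrix from $\beta$ to $\alpha$ is
$$Q = [Id]_\beta^\alpha = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
so
$$[\mathbf{x}]_\alpha = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = Q[\mathbf{x}]_\beta.$$
Moreover, $Q = [Id]_\beta^\alpha$ is invertible and the transition matrix from $\alpha$ to $\beta$ is
$$Q^{-1} = [Id]_\alpha^\beta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
so that
$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}. \qquad \square$$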
Problem 4.18 Find the transition matrix from a basis $\alpha$ to another basis $\beta$ for the 3-space $\mathbb{R}^3$, where
$$\alpha = \{(1, 0, 1),\ (1, 1, 0),\ (0, 1, 1)\}, \qquad \beta = \{(2, 3, 1),\ (1, 2, 0),\ (2, 0, 3)\}.$$

4.7 Similarity
The coordinate expression of a vector in a vector space V depends on the
choice of an ordered basis. Hence, the matrix representation of a linear
transformation is also dependent on the choice of bases.
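Let $V$ and $W$ be two vector spaces of dimensions $n$ and $m$ with ordered bases $\alpha$ and $\beta$, respectively, and let $T: V \to W$ be a linear transformation. In Section 4.4, we discussed how to find $[T]_\alpha^\beta$. If we take different bases $\alpha'$ and $\beta'$ for $V$ and $W$, respectively, then we get another matrix representation $[T]_{\alpha'}^{\beta'}$ of $T$. In fact, we have two different expressions
$$[\mathbf{x}]_\alpha \text{ and } [\mathbf{x}]_{\alpha'} \text{ in } \mathbb{R}^n \text{ for each } \mathbf{x} \in V,$$
$$[T(\mathbf{x})]_\beta \text{ and } [T(\mathbf{x})]_{\beta'} \text{ in } \mathbb{R}^m \text{ for } T(\mathbf{x}) \in W.$$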
4.7. SIMILARITY 147

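They are related by the transition matrices in the following equations:
$$[\mathbf{x}]_{\alpha'} = [Id_V]_\alpha^{\alpha'} [\mathbf{x}]_\alpha \quad\text{and}\quad [T(\mathbf{x})]_{\beta'} = [Id_W]_\beta^{\beta'} [T(\mathbf{x})]_\beta.$$
On the other hand, by Theorem 4.9, we have
$$[T(\mathbf{x})]_\beta = [T]_\alpha^\beta [\mathbf{x}]_\alpha \quad\text{and}\quad [T(\mathbf{x})]_{\beta'} = [T]_{\alpha'}^{\beta'} [\mathbf{x}]_{\alpha'}.$$
Therefore, since $[\mathbf{x}]_\alpha = [Id_V]_{\alpha'}^{\alpha} [\mathbf{x}]_{\alpha'}$, we get
$$[T]_{\alpha'}^{\beta'} [\mathbf{x}]_{\alpha'} = [Id_W]_\beta^{\beta'} [T]_\alpha^\beta [\mathbf{x}]_\alpha = [Id_W]_\beta^{\beta'} [T]_\alpha^\beta [Id_V]_{\alpha'}^{\alpha} [\mathbf{x}]_{\alpha'}$$
for every $\mathbf{x} \in V$, that is, $[T]_{\alpha'}^{\beta'} = [Id_W]_\beta^{\beta'} [T]_\alpha^\beta [Id_V]_{\alpha'}^{\alpha}$.
Actually, from Theorem 4.11, this relation can be obtained directly as
$$[T]_{\alpha'}^{\beta'} = [Id_W \circ T \circ Id_V]_{\alpha'}^{\beta'} = [Id_W]_\beta^{\beta'} [T]_\alpha^\beta [Id_V]_{\alpha'}^{\alpha},$$
since $T = Id_W \circ T \circ Id_V$. Note that $[T]_\alpha^\beta$ and $[T]_{\alpha'}^{\beta'}$ are $m \times n$ matrices, $[Id_V]_{\alpha'}^{\alpha}$ is an $n \times n$ matrix and $[Id_W]_\beta^{\beta'}$ is an $m \times m$ matrix.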
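The relation can also be incorporated in the following commutative diagram:
$$\begin{array}{ccc} (V, \alpha') & \xrightarrow{\quad T \quad} & (W, \beta') \\ {\scriptstyle Id_V}\big\downarrow & & \big\uparrow{\scriptstyle Id_W} \\ (V, \alpha) & \xrightarrow{\quad T \quad} & (W, \beta) \end{array}$$
where, in coordinates, the left vertical map is given on $\mathbb{R}^n$ by $[Id_V]_{\alpha'}^{\alpha}$ and the right vertical map is given on $\mathbb{R}^m$ by $[Id_W]_\beta^{\beta'}$.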

The following theorem summarizes the above argument.



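Theorem 4.13 Let $T: V \to W$ be a linear transformation from a vector space $V$ with bases $\alpha$ and $\alpha'$ to another vector space $W$ with bases $\beta$ and $\beta'$. Then
$$[T]_{\alpha'}^{\beta'} = P^{-1} [T]_\alpha^\beta Q,$$
where $Q = [Id_V]_{\alpha'}^{\alpha}$ and $P = [Id_W]_{\beta'}^{\beta}$ are the transition matrices.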

In particular, if we take $W = V$, $\alpha = \beta$ and $\alpha' = \beta'$, then $P = Q$ and we get the following corollary.

Corollary 4.14 Let $T: V \to V$ be a linear transformation on a vector space $V$, and let $\alpha$ and $\beta$ be ordered bases for $V$. Let $Q = [Id]_\beta^\alpha$ be the transition matrix from $\beta$ to $\alpha$. Then
(1) $Q$ is invertible, and $Q^{-1} = [Id]_\alpha^\beta$.
(2) For any $\mathbf{x} \in V$, $[\mathbf{x}]_\alpha = Q[\mathbf{x}]_\beta$.
(3) $[T]_\beta = Q^{-1}[T]_\alpha Q$.
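A minimal sketch checking Corollary 4.14 (3) on a $2 \times 2$ example. $[T]_\beta$ is computed twice: directly, by expressing each $T(v_j)$ in the basis $\beta$ (Cramer's rule), and through the similarity $Q^{-1}[T]_\alpha Q$. The matrix `T_alpha` and the basis `beta` are hypothetical. The trace and determinant printed at the end agree for both matrices, a numerical instance of what Problem 4.20 asks you to prove, not a proof.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Ordinary matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(A):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def coords_in_basis(basis, v):
    """[v]_beta for a basis beta = {(p, r), (q, s)} of R^2, by Cramer's rule."""
    (p, r), (q, s) = basis
    det = Fraction(p * s - q * r)
    return [(v[0] * s - q * v[1]) / det, (p * v[1] - r * v[0]) / det]

def trace(A):
    return A[0][0] + A[1][1]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Hypothetical data: [T]_alpha for the standard basis alpha, and a basis beta.
T_alpha = [[1, 2], [3, 4]]
beta = [(1, 1), (1, 2)]

def T(v):
    return [sum(a * x for a, x in zip(row, v)) for row in T_alpha]

# Direct route: the j-th column of [T]_beta is [T(v_j)]_beta.
cols = [coords_in_basis(beta, T(v)) for v in beta]
T_beta_direct = [list(row) for row in zip(*cols)]

# Corollary 4.14 (3): [T]_beta = Q^{-1} [T]_alpha Q with Q = [Id]_beta^alpha.
Q = [list(row) for row in zip(*beta)]   # columns are v_1 and v_2
T_beta_similar = mat_mul(inverse_2x2(Q), mat_mul(T_alpha, Q))

print(T_beta_direct == T_beta_similar)
print(trace(T_alpha), trace(T_beta_similar), det2(T_alpha), det2(T_beta_similar))
```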

Relation (3) between $[T]_\beta$ and $[T]_\alpha$ in Corollary 4.14 is called a similarity. In general, we have the following definition.

Definition 4.6 For any square matrices $A$ and $B$, $A$ is said to be similar to $B$ if there exists a nonsingular matrix $Q$ such that $B = Q^{-1}AQ$.

Note that if $A$ is similar to $B$, then $B$ is also similar to $A$, since $A = (Q^{-1})^{-1} B\, Q^{-1}$. Thus we simply say that $A$ and $B$ are similar matrices. We saw in Corollary 4.14 that if $A$ and $B$ are $n \times n$ matrices representing the same linear transformation $T$, then $A$ and $B$ are similar.

Example 4.19 Let $\beta = \{v_1, v_2, v_3\}$ be a basis for $\mathbb{R}^3$ consisting of $v_1 = (1, 1, 0)$, $v_2 = (1, 0, 1)$ and $v_3 = (0, 1, 1)$. Let $T$ be the linear transformation on $\mathbb{R}^3$ given by the matrix

[T],e = [ ~ ~ ~ - ].
-1 1 1

Let $\alpha = \{e_1, e_2, e_3\}$ be the standard basis. Find the transition matrix $[Id]_\beta^\alpha$ and $[T]_\alpha$.

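Since $\alpha$ is the standard basis, the columns of $[Id]_\beta^\alpha$ are $v_1, v_2, v_3$, so
$$[Id]_\beta^\alpha = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad\text{and}\quad [Id]_\alpha^\beta = \big([Id]_\beta^\alpha\big)^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 1 \end{bmatrix}.$$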

Therefore,

[Tla = [Idl3[Tli3[Idl~ = - 1[ 43 -12 2]


1 .
2 -1 1 7 o

Example 4.20 Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by

Let $\alpha = \{e_1, e_2, e_3\}$ be the standard ordered basis. Then we clearly have

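Let $\beta = \{v_1, v_2, v_3\}$ be another ordered basis for $\mathbb{R}^3$ consisting of $v_1 = (-1, 0, 0)$, $v_2 = (2, 1, 0)$, and $v_3 = (1, 1, 1)$. Let $Q = [Id]_\beta^\alpha$ be the transition matrix from $\beta$ to $\alpha$. Since $\alpha$ is the standard ordered basis for $\mathbb{R}^3$, the columns of $Q$ are simply the vectors in $\beta$ written in the same order, and its inverse is easily calculated. Thus
$$Q = \begin{bmatrix} -1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \qquad Q^{-1} = \begin{bmatrix} -1 & 2 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.$$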

A straightforward multiplication shows that

To show that this is the correct matrix, we can verify that the image under $T$ of the $j$-th vector of $\beta$ is the linear combination of the vectors of $\beta$ with the entries of the $j$-th column of $[T]_\beta$ as its coefficients. For example, for $j = 2$ we have $T(v_2) = T(2, 1, 0) = (5, 3, -1)$. On the other hand, the coefficients of $[T(v_2)]_\beta$ are just the entries of the second column of $[T]_\beta$. Therefore,
$$T(v_2) = 2v_1 + 4v_2 - v_3 = -2e_1 + 4(2e_1 + e_2) - (e_1 + e_2 + e_3) = 5e_1 + 3e_2 - e_3 = (5, 3, -1),$$
as expected. □

The next theorem shows that two similar matrices are matrix representations of the same linear transformation.

Theorem 4.15 Suppose that $A$ represents a linear transformation $T: V \to V$ on a vector space $V$ with respect to an ordered basis $\alpha = \{v_1, \dots, v_n\}$, i.e., $[T]_\alpha = A$. If $B = Q^{-1}AQ$ for some nonsingular matrix $Q$, then there exists a basis $\beta$ for $V$ such that $B = [T]_\beta$ and $Q = [Id]_\beta^\alpha$.

Proof: Let $Q = [q_{ij}]$ and let $w_1, \dots, w_n$ be the vectors in $V$ defined by
$$\begin{aligned} w_1 &= q_{11}v_1 + q_{21}v_2 + \cdots + q_{n1}v_n, \\ w_2 &= q_{12}v_1 + q_{22}v_2 + \cdots + q_{n2}v_n, \\ &\ \ \vdots \\ w_n &= q_{1n}v_1 + q_{2n}v_2 + \cdots + q_{nn}v_n. \end{aligned}$$
Then the nonsingularity of $Q = [q_{ij}]$ implies that $\beta = \{w_1, \dots, w_n\}$ is an ordered basis for $V$, and since $[w_j]_\alpha$ is the $j$-th column of $Q$, we have $Q = [Id]_\beta^\alpha$. Corollary 4.14 (3) then shows that $[T]_\beta = Q^{-1}[T]_\alpha Q = Q^{-1}AQ = B$. □

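Example 4.21 Let $D$ be the differential operator on the vector space $P_2(\mathbb{R})$. Given two ordered bases $\alpha = \{1, x, x^2\}$ and $\beta = \{1, 2x, 4x^2 - 2\}$ for $P_2(\mathbb{R})$, we first note that
$$\begin{aligned} D(1) &= 0 \cdot 1 + 0 \cdot x + 0 \cdot x^2, \\ D(x) &= 1 \cdot 1 + 0 \cdot x + 0 \cdot x^2, \\ D(x^2) &= 0 \cdot 1 + 2 \cdot x + 0 \cdot x^2. \end{aligned}$$
Hence, the matrix representation of $D$ with respect to $\alpha$ is given by
$$[D]_\alpha = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Applying $D$ to $1$, $2x$ and $4x^2 - 2$, one obtains
$$\begin{aligned} D(1) &= 0 \cdot 1 + 0 \cdot 2x + 0 \cdot (4x^2 - 2), \\ D(2x) &= 2 \cdot 1 + 0 \cdot 2x + 0 \cdot (4x^2 - 2), \\ D(4x^2 - 2) &= 0 \cdot 1 + 4 \cdot 2x + 0 \cdot (4x^2 - 2). \end{aligned}$$
Thus,
$$[D]_\beta = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix}.$$
The transition matrix $Q$ from $\beta = \{1, 2x, 4x^2 - 2\}$ to $\alpha = \{1, x, x^2\}$ and its inverse are easily calculated as
$$Q = [Id]_\beta^\alpha = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \qquad Q^{-1} = [Id]_\alpha^\beta = \begin{bmatrix} 1 & 0 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & \tfrac{1}{4} \end{bmatrix}.$$
A simple computation shows that $[D]_\beta = Q^{-1}[D]_\alpha Q$. □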
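The "simple computation" of Example 4.21 can be sketched in exact arithmetic. Polynomials in $P_2(\mathbb{R})$ are encoded, as an assumption of this sketch, by their coefficient lists $[a_0, a_1, a_2]$ with respect to $\alpha = \{1, x, x^2\}$. The matrix $Q^{-1}$ is computed by Gauss-Jordan elimination rather than read off from the text.

```python
from fractions import Fraction

def derivative(p):
    """D on P_2: p = [a0, a1, a2] stands for a0 + a1*x + a2*x^2."""
    return [p[1], 2 * p[2], 0]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(A):
    """Inverse of a nonsingular square matrix by Gauss-Jordan elimination,
    using exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def from_columns(cols):
    return [list(row) for row in zip(*cols)]

alpha = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 1, x, x^2
beta = [[1, 0, 0], [0, 2, 0], [-2, 0, 4]]   # 1, 2x, 4x^2 - 2 in alpha-coordinates

D_alpha = from_columns([derivative(p) for p in alpha])
Q = from_columns(beta)                      # [Id]_beta^alpha
Q_inv = inverse(Q)                          # [Id]_alpha^beta
D_beta = mat_mul(Q_inv, mat_mul(D_alpha, Q))
print(D_alpha)
print(D_beta)
```

The result should match the matrix $[D]_\beta$ obtained in the example by expanding $D(1)$, $D(2x)$ and $D(4x^2 - 2)$ directly.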

Problem 4.19 Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$$T(x_1, x_2, x_3) = (x_1 + 2x_2 + x_3,\ -x_2,\ x_1 + 4x_3).$$
Let $\alpha$ be the standard basis, and let $\beta = \{v_1, v_2, v_3\}$ be another ordered basis for $\mathbb{R}^3$ consisting of $v_1 = (1, 0, 0)$, $v_2 = (1, 1, 0)$, and $v_3 = (1, 1, 1)$. Find the associated matrix of $T$ with respect to $\alpha$ and the associated matrix of $T$ with respect to $\beta$. Are they similar?

Problem 4.20 Suppose that $A$ and $B$ are similar $n \times n$ matrices. Show that
(1) $\det A = \det B$,
(2) $\operatorname{tr} A = \operatorname{tr} B$,
(3) $\operatorname{rank} A = \operatorname{rank} B$.
Problem 4.21 Let $A$ and $B$ be $n \times n$ matrices. Show that if $A$ is similar to $B$, then $A^2$ is similar to $B^2$.

4.8 Dual spaces


In this section, we are concerned exclusively with linear transformations from a vector space $V$ to the one-dimensional vector space $\mathbb{R}^1$. Such a linear transformation is called a linear functional of $V$. The definite integral of continuous functions is one of the most important examples of linear functionals in mathematics.
For a matrix $A$ regarded as a linear transformation $A: \mathbb{R}^n \to \mathbb{R}^m$, we saw that the transpose $A^T$ of $A$ is another linear transformation $A^T: \mathbb{R}^m \to \mathbb{R}^n$. For a linear transformation $T: V \to W$ from a vector space $V$ to $W$, one can naturally ask what its transpose is and how it should be defined. This section answers those questions.

Example 4.22 Let $C[a, b]$ be the vector space of all continuous real-valued functions on the interval $[a, b]$. The definite integral $I: C[a, b] \to \mathbb{R}$ defined by
$$I(f) = \int_a^b f(t)\,dt$$
is a linear functional of $C[a, b]$. In particular, if the interval is $[0, 2\pi]$ and $n$ is an integer, then
$$F_n(f) = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, e^{-int}\,dt$$
is a linear functional, called the $n$-th Fourier coefficient of $f$.

Example 4.23 The trace function $\operatorname{tr}: M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ is a linear functional of $M_{n\times n}(\mathbb{R})$.

Note that, as we saw in Section 4.5, the set of all linear functionals of $V$ is the vector space $\mathcal{L}(V; \mathbb{R}^1)$, whose dimension equals the dimension of $V$ (see page 141).

Definition 4.7 For a vector space $V$, the vector space of all linear functionals of $V$ is called the dual space of $V$ and is denoted by $V^*$.

Recall that a linear transformation $T: V \to \mathbb{R}$ is completely determined by its values on a basis for $V$. Thus if $\alpha = \{v_1, v_2, \dots, v_n\}$ is a basis for a vector space $V$, then the functions $v_i^*: V \to \mathbb{R}$ defined by $v_i^*(v_j) = \delta_{ij}$ for each $i, j = 1, \dots, n$ are linear functionals of $V$, called the $i$-th coordinate functions with respect to the basis $\alpha$. In particular, for any $\mathbf{x} = \sum_i a_i v_i \in V$, $v_i^*(\mathbf{x}) = a_i$, the $i$-th coordinate of $\mathbf{x}$ with respect to $\alpha$.
4.8. DUAL SPACES 153

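Theorem 4.16 The set $\alpha^* = \{v_1^*, v_2^*, \dots, v_n^*\}$ forms a basis for the dual space $V^*$, and for any $T \in V^*$ we have
$$T = \sum_{i=1}^n T(v_i)\, v_i^*.$$

Proof: Clearly, the set $\alpha^* = \{v_1^*, v_2^*, \dots, v_n^*\}$ is linearly independent, since $0 = \sum_{i=1}^n c_i v_i^*$ implies $0 = \sum_{i=1}^n c_i v_i^*(v_j) = c_j$ for each $j = 1, \dots, n$. Moreover, the set $\alpha^*$ spans $V^*$: for any $T \in V^*$ and any $v_j \in \alpha$, we have
$$\Big(\sum_{i=1}^n T(v_i)\, v_i^*\Big)(v_j) = \sum_{i=1}^n T(v_i)\,\delta_{ij} = T(v_j).$$
Hence, by Corollary 4.4, we get $T = \sum_{i=1}^n T(v_i)\, v_i^*$. □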

Definition 4.8 For a basis $\alpha = \{v_1, v_2, \dots, v_n\}$ for a vector space $V$, the basis $\alpha^*$ for $V^*$ is called the dual basis of $\alpha$.

This theorem says that, for a fixed basis $\alpha = \{v_1, \dots, v_n\}$ for $V$, the transformation $*: V \to V^*$ given by $*(v_i) = v_i^*$ (extended linearly) is an isomorphism between $V$ and $V^*$. Therefore, we have the following corollary.

Corollary 4.17 Any finite-dimensional vector space is isomorphic to its dual space.

Example 4.24 Let $\alpha = \{(1, 2), (1, 3)\}$ be a basis for $\mathbb{R}^2$. To determine the dual basis $\alpha^* = \{f, g\}$ of $\alpha$, we consider the equations
$$\begin{aligned} 1 &= f(1, 2) = f(e_1) + 2f(e_2), \\ 0 &= f(1, 3) = f(e_1) + 3f(e_2). \end{aligned}$$
Solving these equations, we obtain $f(e_1) = 3$ and $f(e_2) = -1$, so $f(x, y) = 3x - y$. Similarly, it can be shown that $g(x, y) = -2x + y$. □
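Solving one small system per dual functional, as in Example 4.24, amounts to inverting a single matrix. If $A$ has the basis vectors as its columns, then $A^{-1}A = I$ says that row $i$ of $A^{-1}$, viewed as a functional, takes the value $\delta_{ij}$ on $v_j$. The rows of $A^{-1}$ are therefore the coefficient vectors of $v_1^*, \dots, v_n^*$. A sketch for the basis of Example 4.24 (helper names are assumptions):

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(row, v):
    """Evaluate the functional (x, y) -> row[0]*x + row[1]*y at v."""
    return sum(r * x for r, x in zip(row, v))

basis = [(1, 2), (1, 3)]               # the basis alpha of Example 4.24
A = [list(r) for r in zip(*basis)]     # columns are the basis vectors
dual = inverse_2x2(A)                  # row i = coefficients of v_i^*

for i, row in enumerate(dual):
    print(f"v{i + 1}*(x, y) = {row[0]}x + ({row[1]})y")
```

The rows come out as $(3, -1)$ and $(-2, 1)$, i.e. $f(x, y) = 3x - y$ and $g(x, y) = -2x + y$, in agreement with the example.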

Example 4.25 Consider $V = \mathbb{R}^n$ with the standard basis $\alpha = \{e_1, \dots, e_n\}$ and its dual basis $\alpha^* = \{e_1^*, \dots, e_n^*\}$ for $\mathbb{R}^{n*}$. Then for a vector $\mathbf{a} = (a_1, \dots, a_n) = a_1e_1 + \cdots + a_ne_n \in \mathbb{R}^n$, we have $e_i^*(\mathbf{a}) = e_i^*(a_1e_1 + \cdots + a_ne_n) = a_i$. That is,
$$\mathbf{a} = (a_1, \dots, a_n) = (e_1^*(\mathbf{a}), \dots, e_n^*(\mathbf{a})) = (e_1^*, \dots, e_n^*)(\mathbf{a}).$$
On the other hand, when we write a vector in $\mathbb{R}^n$ as $\mathbf{x} = (x_1, \dots, x_n)$ in coordinate functions (or unknowns) $x_i$, it means that, given a point $\mathbf{a} = (a_1, \dots, a_n) \in \mathbb{R}^n$, each $x_i$ gives us the $i$-th coordinate of $\mathbf{a}$, that is,
$$x_i(\mathbf{a}) = a_i, \qquad i = 1, \dots, n.$$
In this way, we have identified $e_i^* = x_i$ for $i = 1, \dots, n$, i.e., $\mathbb{R}^{n*} = \mathbb{R}^n$. Thus, the actual meaning of the usual coordinate expression $(x_1, \dots, x_n)$ of $\mathbf{x}$ is just a vector in $\mathbb{R}^{n*}$ such that $(x_1, \dots, x_n)(\mathbf{a}) = (a_1, \dots, a_n)$ for a point $\mathbf{a} \in \mathbb{R}^n$. □

Now, consider two vector spaces $V$ and $W$ with fixed bases $\alpha$ and $\beta$, respectively. Let $S: V \to W$ be a linear transformation from $V$ to $W$. Then for any linear functional $g \in W^*$, i.e., $g: W \to \mathbb{R}$, it is easy to see that the composition $(g \circ S)(\mathbf{x}) = g(S(\mathbf{x}))$ for $\mathbf{x} \in V$ defines a linear functional on $V$, i.e., $g \circ S \in V^*$. Thus, we have a transformation $S^*: W^* \to V^*$ defined by $S^*(g) = g \circ S$ for $g \in W^*$.

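Theorem 4.18 The mapping $S^*: W^* \to V^*$ defined by $S^*(g) = g \circ S$ for $g \in W^*$ is a linear transformation and $[S^*]_{\beta^*}^{\alpha^*} = \big([S]_\alpha^\beta\big)^T$.
Proof: The mapping $S^*$ is clearly linear by the definition of a composition of functions. Let $\alpha = \{v_1, \dots, v_n\}$ and $\beta = \{w_1, \dots, w_m\}$ be bases for $V$ and $W$ with their dual bases $\alpha^* = \{v_1^*, \dots, v_n^*\}$ and $\beta^* = \{w_1^*, \dots, w_m^*\}$, respectively. Let $[S]_\alpha^\beta = [a_{ij}]$ and $[S^*]_{\beta^*}^{\alpha^*} = [b_{k\ell}]$. Then
$$S(v_i) = \sum_{k=1}^m a_{ki} w_k \quad\text{and}\quad S^*(w_j^*) = \sum_{\ell=1}^n b_{\ell j} v_\ell^*,$$
for $1 \le i \le n$ and $1 \le j \le m$. Thus,
$$b_{ij} = S^*(w_j^*)(v_i) = (w_j^* \circ S)(v_i) = w_j^*(S(v_i)) = w_j^*\Big(\sum_{k=1}^m a_{ki} w_k\Big) = \sum_{k=1}^m a_{ki}\, w_j^*(w_k) = a_{ji}.$$
Hence, we get $[S^*]_{\beta^*}^{\alpha^*} = \big([S]_\alpha^\beta\big)^T$. □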
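A small sketch of Theorem 4.18 with standard bases. A functional $g \in W^*$ is stored by its coordinates $c$ in the dual basis $\beta^*$, and the coordinates of $S^*(g) = g \circ S$ in $\alpha^*$ are its values on $e_1, \dots, e_n$. The sketch compares these values with $([S]_\alpha^\beta)^T c$. The matrix `A` and the functional `g` are hypothetical, and `pullback` is just an illustrative name for $g \mapsto g \circ S$.

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical [S]_alpha^beta for S: R^3 -> R^2 (standard bases).
A = [[1, 2, 0],
     [0, -1, 3]]

def S(v):
    return mat_vec(A, v)

c = [4, -2]   # g = 4 w_1^* - 2 w_2^*, so g(w) = 4 w_1 - 2 w_2

def g(w):
    return sum(ck * wk for ck, wk in zip(c, w))

def pullback(g, S):
    """S^*(g) = g o S, a linear functional on the domain of S."""
    return lambda v: g(S(v))

# Coordinates of S^*(g) in the dual basis alpha^*: its values on each e_i.
n = 3
e = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
coords_direct = [pullback(g, S)(e_i) for e_i in e]
coords_transpose = mat_vec(transpose(A), c)
print(coords_direct, coords_transpose)
```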

Remark: Theorem 4.18 shows that the matrix representation of $S^*$ is just the transpose of that of $S$. Hence the linear transformation $S^*$ is called the transpose (or adjoint) of $S$, and is also denoted by $S^T$.

Example 4.26 With the identification $\mathbb{R}^{n*} = \mathbb{R}^n$ in Example 4.25, the transpose $A^T$ of a matrix $A$ is actually $A^*$. □
For two linear transformations $S: U \to V$ and $T: V \to W$, it is quite easy to show (the reader may try it) that
$$(T \circ S)^* = S^* \circ T^*.$$
Thus, if $S: V \to W$ is an isomorphism, then so is its transpose $S^*: W^* \to V^*$. In particular, since $*: V \to V^*$ is an isomorphism, so is its transpose $*^*: V^{**} \to V^*$. Note that even though the isomorphism $*: V \to V^*$ depends on a choice of a basis for $V$, there is an isomorphism between $V$ and $V^{**}$ that does not depend on a choice of bases for the two vector spaces. We first define, for each $\mathbf{x} \in V$, $\hat{\mathbf{x}}: V^* \to \mathbb{R}$ by $\hat{\mathbf{x}}(f) = f(\mathbf{x})$ for every $f \in V^*$. It is easy to verify that $\hat{\mathbf{x}}$ is a linear functional on $V^*$, so $\hat{\mathbf{x}} \in V^{**}$. We will show below that the mapping $\Phi: V \to V^{**}$ defined by $\Phi(\mathbf{x}) = \hat{\mathbf{x}}$ is the desired isomorphism between $V$ and $V^{**}$.
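Lemma 4.19 If $\hat{\mathbf{x}}(f) = 0$ for all $f \in V^*$, i.e., $\hat{\mathbf{x}} = 0$ in $V^{**}$, then $\mathbf{x} = 0$.
Proof: Suppose that $\mathbf{x} \neq 0$. Choose a basis $\alpha = \{v_1, v_2, \dots, v_n\}$ for $V$ with $v_1 = \mathbf{x}$. Let $\alpha^* = \{v_1^*, v_2^*, \dots, v_n^*\}$ be the dual basis of $\alpha$. Then
$$\hat{\mathbf{x}}(v_1^*) = v_1^*(\mathbf{x}) = v_1^*(v_1) = 1,$$
which contradicts the hypothesis. □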

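Theorem 4.20 The mapping $\Phi: V \to V^{**}$ defined by $\Phi(\mathbf{x}) = \hat{\mathbf{x}}$ is an isomorphism from $V$ to $V^{**}$.

Proof: To show the linearity of $\Phi$, let $\mathbf{x}, \mathbf{y} \in V$ and $k$ a scalar. Then, for any $f \in V^*$,
$$\begin{aligned} \Phi(\mathbf{x} + k\mathbf{y})(f) &= \widehat{(\mathbf{x} + k\mathbf{y})}(f) = f(\mathbf{x} + k\mathbf{y}) \\ &= f(\mathbf{x}) + kf(\mathbf{y}) = \hat{\mathbf{x}}(f) + k\hat{\mathbf{y}}(f) \\ &= (\hat{\mathbf{x}} + k\hat{\mathbf{y}})(f) = (\Phi(\mathbf{x}) + k\Phi(\mathbf{y}))(f). \end{aligned}$$
Hence, $\Phi(\mathbf{x} + k\mathbf{y}) = \Phi(\mathbf{x}) + k\Phi(\mathbf{y})$. The injectivity of $\Phi$ comes from Lemma 4.19. Since $\dim V = \dim V^{**}$, $\Phi$ is an isomorphism. □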

Problem 4.22 Let $\alpha = \{(1, 0, 1), (1, 2, 1), (0, 0, 1)\}$ be a basis for $\mathbb{R}^3$. Find the dual basis $\alpha^*$.

Problem 4.23 Let $V = \mathbb{R}^3$ and define $f_i \in V^*$ as follows:
$$f_1(x, y, z) = x - 2y, \quad f_2(x, y, z) = x + y + z, \quad f_3(x, y, z) = y - 3z.$$
Prove that $\{f_1, f_2, f_3\}$ is a basis for $V^*$, and then find a basis for $V$ for which it is the dual.

4.9 Exercises
4.1. Which of the following functions $T$ are linear transformations?
(1) $T(x, y) = (x^2 - y^2,\ x^2 + y^2)$.
(2) $T(x, y, z) = (x + y,\ 0,\ 2x + 4z)$.
(3) $T(x, y) = (\sin x,\ y)$.
(4) $T(x, y) = (x + 1,\ 2y,\ x + y)$.
(5) $T(x, y, z) = (|x|,\ 0)$.

4.2. Let $T: P_2(\mathbb{R}) \to P_3(\mathbb{R})$ be a linear transformation such that $T(1) = 1$, $T(x) = x^2$, and $T(x^2) = x^3 + x$. Find $T(ax^2 + bx + c)$.
4.3. Find $S \circ T$ and/or $T \circ S$ whenever it is defined.
(1) $T(x, y, z) = (x - y + z,\ x + z)$, $S(x, y) = (x,\ x - y,\ y)$;
(2) $T(x, y) = (x,\ 3y + x,\ 2x - 4y,\ y)$, $S(x, y, z) = (2x,\ y)$.
4.4. Let $S: C(\mathbb{R}) \to C(\mathbb{R})$ be the function on the vector space $C(\mathbb{R})$ defined by, for $f \in C(\mathbb{R})$,
$$S(f)(x) = f(x) - \int^{x} u f(u)\,du.$$
Show that $S$ is a linear transformation on the vector space $C(\mathbb{R})$.
4.5. Let $T$ be a linear transformation on a vector space $V$ such that $T^2 = Id$ and $T \neq Id$. Let $U = \{v \in V : T(v) = v\}$ and $W = \{v \in V : T(v) = -v\}$. Show that
(1) at least one of $U$ and $W$ is a nonzero subspace of $V$;
(2) $U \cap W = \{0\}$;
(3) $V = U + W$.

4.6. If $T: \mathbb{R}^3 \to \mathbb{R}^3$ is defined by $T(x, y, z) = (2x - z,\ 3x - 2y,\ x - 2y + z)$,
(1) determine the null space $N(T)$ of $T$,
(2) determine whether $T$ is one-to-one,
(3) find a basis for $N(T)$.
4.9. EXERCISES 157

4.7. Show that each of the following linear transformations $T$ on $\mathbb{R}^3$ is invertible, and find a formula for $T^{-1}$:
(1) $T(x, y, z) = (3x,\ x - y,\ 2x + y + z)$.
(2) $T(x, y, z) = (2x,\ 4x - y,\ 2x + 3y - z)$.
4.8. Let $S, T: V \to V$ be linear transformations of a vector space $V$.
(1) Show that if $T \circ S$ is one-to-one, then $T$ is an isomorphism.
(2) Show that if $T \circ S$ is onto, then $T$ is an isomorphism.
(3) Show that if $T^k$ is an isomorphism for some positive $k$, then $T$ is an isomorphism.
4.9. Let $T$ be a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$, and let $S$ be a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^3$. Prove that the composition $S \circ T$ is not invertible.
4.10. Let $T$ be a linear transformation on a vector space $V$ satisfying $T - T^2 = Id$. Show that $T$ is invertible.
4.11. Let $T: P_3(\mathbb{R}) \to P_3(\mathbb{R})$ be the linear transformation defined by
$$Tf(x) = f''(x) - 4f'(x) + f(x).$$
Find the matrix $[T]_\alpha$ for the basis $\alpha = \{x,\ 1 + x,\ x + x^2,\ x^3\}$.
4.12. Let $T$ be the linear transformation on $\mathbb{R}^2$ defined by $T(x, y) = (-y, x)$.
(1) What is the matrix of $T$ with respect to an ordered basis $\alpha = \{v_1, v_2\}$, where $v_1 = (1, 2)$, $v_2 = (1, -1)$?
(2) Show that for every real number $c$ the linear transformation $T - c\,Id$ is invertible.
4.13. Find the matrix representation of each of the following linear transformations $T$ on $\mathbb{R}^2$ with respect to the standard basis $\{e_1, e_2\}$.
(1) $T(x, y) = (2y,\ 3x - y)$.
(2) $T(x, y) = (3x - 4y,\ x + 5y)$.

4.14. Let M = [ 04 21 31] .


(1) Find the unique linear transformation $T: \mathbb{R}^3 \to \mathbb{R}^2$ so that $M$ is the matrix of $T$ with respect to the bases

(2) ::n: 1JU i1'[ :1}, 0, ~ {[ ~ 1'[: l}·


4.15. Find the matrix representation of each of the following linear transformations $T$ on $P_2(\mathbb{R})$ with respect to the basis $\{1, x, x^2\}$.
(1) $T: p(x) \mapsto p(x + 1)$.
(2) $T: p(x) \mapsto p'(x)$.
(3) $T: p(x) \mapsto p(0)x$.
(4) $T: p(x) \mapsto \dfrac{p(x) - p(0)}{x}$.
4.16. Consider the following ordered bases of $\mathbb{R}^3$: the standard basis $\alpha = \{e_1, e_2, e_3\}$ and $\beta = \{u_1 = (1, 1, 1),\ u_2 = (1, 1, 0),\ u_3 = (1, 0, 0)\}$.
(1) Find the transition matrix $P$ from $\alpha$ to $\beta$.
(2) Find the transition matrix $Q$ from $\beta$ to $\alpha$.
(3) Verify that $Q = P^{-1}$.
(4) Show that $[v]_\beta = P[v]_\alpha$ for any vector $v \in \mathbb{R}^3$.
(5) Show that $[T]_\beta = Q^{-1}[T]_\alpha Q$ for the linear transformation $T$ defined by $T(x, y, z) = (2y + x,\ x - 4y,\ 3x)$.
4.17. Show that there are no matrices $A$ and $B$ in $M_{n\times n}(\mathbb{R})$ such that $AB - BA = I_n$.
4.18. Let $T: \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by
$$T(x, y, z) = (3x + 2y - 4z,\ x - 5y + 3z),$$
and let $\alpha = \{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$ and $\beta = \{(1, 3), (2, 5)\}$ be bases for $\mathbb{R}^3$ and $\mathbb{R}^2$, respectively.
(1) Find the associated matrix $[T]_\alpha^\beta$ for $T$.
(2) Verify $[T]_\alpha^\beta [v]_\alpha = [T(v)]_\beta$ for any $v \in \mathbb{R}^3$.

4.19. Find the transition matrix $[Id]_\alpha^\beta$ from $\alpha$ to $\beta$, when
(1) $\alpha = \{(2, 3), (0, 1)\}$, $\beta = \{(6, 4), (4, 8)\}$;
(2) $\alpha = \{(5, 1), (1, 2)\}$, $\beta = \{(1, 0), (0, 1)\}$;
(3) $\alpha = \{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$, $\beta = \{(2, 0, 3), (-1, 4, 1), (3, 2, 5)\}$;
(4) $\alpha = \{t, 1, t^2\}$, $\beta = \{3 + 2t + t^2,\ t^2 - 4,\ 2 + t\}$.

4.20. Show that all matrices of the form $\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$ are similar.

4.21. Show that the matrix A = [~ ~] cannot be similar to a diagonal matrix.

4.22. Are the matrices [ ~ ~ ] and [ - ~ ~ ] similar?

4.23. For a linear transformation $T$ on a vector space $V$, show that $T$ is one-to-one if and only if its transpose $T^*$ is one-to-one.
4.24. Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$$T(x, y, z) = (2y + z,\ -x + 4y + z,\ x + z).$$
Compute $[T]_\alpha$ and $[T^*]_{\alpha^*}$ for the standard basis $\alpha = \{e_1, e_2, e_3\}$.

4.25. Let $T$ be the linear transformation from $\mathbb{R}^3$ into $\mathbb{R}^2$ defined by
$$T(x_1, x_2, x_3) = (x_1 + x_2,\ 2x_3 - x_1).$$
(1) For the standard ordered bases $\alpha$ and $\beta$ for $\mathbb{R}^3$ and $\mathbb{R}^2$ respectively, find the associated matrix for $T$ with respect to the bases $\alpha$ and $\beta$.
(2) Let $\alpha = \{x_1, x_2, x_3\}$ and $\beta = \{y_1, y_2\}$, where $x_1 = (1, 0, -1)$, $x_2 = (1, 1, 1)$, $x_3 = (1, 0, 0)$, and $y_1 = (0, 1)$, $y_2 = (1, 0)$. Find the associated matrices $[T]_\alpha^\beta$ and $[T^*]_{\beta^*}^{\alpha^*}$.
4.26. Let $T$ be the linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^4$ defined by
$$T(x, y, z) = (2x + y + 4z,\ x + y + 2z,\ y + 2z,\ x + y + 3z).$$
Find the range and the kernel of $T$. What is the dimension of $C(T)$? Find $[T]_\alpha^\beta$ and $[T^*]_{\beta^*}^{\alpha^*}$, where
$$\alpha = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\},$$
$$\beta = \{(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)\}.$$


4.27. Let $T$ be the linear transformation on $V = \mathbb{R}^3$ for which the associated matrix with respect to the standard ordered basis is

A=[ -1~ i3 4 ~l'


Find bases for the range and the null space of the transpose $T^*$ on $V^*$.
4.28. Define three linear functionals on the vector space $V = P_2(\mathbb{R})$ by
$$f_1(p) = \int_0^1 p(x)\,dx, \qquad f_2(p) = \int_{\sim} p(x)\,dx, \qquad f_3(p) = \int_0^{-1} p(x)\,dx.$$
Show that $\{f_1, f_2, f_3\}$ is a basis for $V^*$ by finding its dual basis for $V$.
4.29. Determine whether or not the following statements are true in general, and justify your answers.
(1) For a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$, $\operatorname{Ker}(T) = \{0\}$ if $m > n$.
(2) For a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$, $\operatorname{Ker}(T) \neq \{0\}$ if $m < n$.
(3) A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if and only if the null space of $[T]_\alpha^\beta$ is $\{0\}$, for any bases $\alpha$ and $\beta$ of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively.
(4) For a linear transformation $T$ on $\mathbb{R}^n$, the dimension of the image of $T$ is equal to that of the row space of $[T]_\alpha$ for any basis $\alpha$ for $\mathbb{R}^n$.
(5) Any polynomial $p(x)$ is linear if and only if the degree of $p(x)$ is 1.
(6) Let $T: \mathbb{R}^3 \to \mathbb{R}^2$ be a function given as $T(\mathbf{x}) = (T_1(\mathbf{x}), T_2(\mathbf{x}))$ for any $\mathbf{x} \in \mathbb{R}^3$. Then $T$ is linear if and only if its coordinate functions $T_i$, $i = 1, 2$, are linear.
(7) For a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$, if $[T]_\alpha^\beta = I_n$ for some bases $\alpha$ and $\beta$ of $\mathbb{R}^n$, then $T$ must be the identity transformation.
(8) If a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one, then any matrix representation of $T$ is nonsingular.
(9) Any $m \times n$ matrix $A$ can be a matrix representation of a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$.
Chapter 5

Inner Product Spaces

5.1 Inner products


In order to study the geometry of a vector space, we go back to the case of the Euclidean 3-space $\mathbb{R}^3$. Recall that the dot (or Euclidean inner) product of two vectors $\mathbf{x} = (x_1, x_2, x_3)$ and $\mathbf{y} = (y_1, y_2, y_3)$ in $\mathbb{R}^3$ is defined by the formula
$$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3 = \mathbf{x}^T\mathbf{y},$$
where $\mathbf{x}^T\mathbf{y}$ is the matrix product of $\mathbf{x}^T$ and $\mathbf{y}$. Using the dot product, the length (or magnitude) of a vector $\mathbf{x} = (x_1, x_2, x_3)$ is defined by
$$\|\mathbf{x}\| = (\mathbf{x} \cdot \mathbf{x})^{1/2} = \sqrt{x_1^2 + x_2^2 + x_3^2},$$
and the distance between two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^3$ is defined by
$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|.$$
In this way, the dot product can be considered to be a ruler for measuring the length of a line segment in $\mathbb{R}^3$. Furthermore, it can also be used to measure the angle between two vectors: in fact, the angle $\theta$ between two nonzero vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^3$ is measured by the formula
$$\cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}, \qquad 0 \le \theta \le \pi,$$
since the dot product satisfies the formula
$$\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\|\,\|\mathbf{y}\| \cos\theta.$$
162 CHAPTER 5. INNER PRODUCT SPACES

In particular, two vectors $\mathbf{x}$ and $\mathbf{y}$ are orthogonal (i.e., they form a right angle $\theta = \pi/2$) if and only if the Pythagorean theorem holds:
$$\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 = \|\mathbf{x} + \mathbf{y}\|^2.$$
By rewriting this formula in terms of the dot product, we obtain another equivalent condition:
$$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3 = 0.$$
In fact, this dot product is one of the most important structures with which $\mathbb{R}^3$ is equipped. Euclidean geometry begins with the vector space $\mathbb{R}^3$ together with the dot product, because the Euclidean distance can be defined by this dot product.
The dot product has a direct extension to the $n$-space $\mathbb{R}^n$ for any positive integer $n$: for vectors $\mathbf{x} = (x_1, \dots, x_n)$ and $\mathbf{y} = (y_1, \dots, y_n)$ in $\mathbb{R}^n$, the dot product, also called the Euclidean inner product, and the length (or magnitude) of a vector are defined similarly as
$$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + \cdots + x_ny_n = \mathbf{x}^T\mathbf{y},$$
$$\|\mathbf{x}\| = (\mathbf{x} \cdot \mathbf{x})^{1/2} = \sqrt{x_1^2 + \cdots + x_n^2}.$$
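The formulas for the dot product, length, distance and angle translate directly into code. The sketch below works in $\mathbb{R}^n$ with floating-point numbers. The sample vectors are hypothetical and happen to be orthogonal, so the printed angle is $\pi/2$ and the Pythagorean identity can be checked as well.

```python
import math

def dot(x, y):
    """Euclidean inner product x . y = x_1 y_1 + ... + x_n y_n."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Length ||x|| = (x . x)^(1/2)."""
    return math.sqrt(dot(x, x))

def distance(x, y):
    """d(x, y) = ||x - y||."""
    return norm([a - b for a, b in zip(x, y)])

def angle(x, y):
    """Angle in [0, pi] from cos(theta) = x.y / (||x|| ||y||); x, y nonzero."""
    c = dot(x, y) / (norm(x) * norm(y))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return math.acos(c)

x = [1.0, 2.0, 2.0]
y = [2.0, 0.0, -1.0]
x_plus_y = [a + b for a, b in zip(x, y)]
print(dot(x, y), norm(x), distance(x, y), angle(x, y))
print(norm(x) ** 2 + norm(y) ** 2, norm(x_plus_y) ** 2)  # Pythagoras, up to rounding
```

The clamp inside `angle` only protects `math.acos` against values like `1.0000000000000002` produced by rounding; it does not change the formula.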
In order to extend this notion of dot product to vector spaces in general, we extract the most essential properties that the dot product in $\mathbb{R}^n$ satisfies and take these properties as axioms for an inner product on a vector space $V$. First of all, we note that it is a rule that assigns a real number $\mathbf{x} \cdot \mathbf{y}$ to each pair of vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, and the essential rules it satisfies are those in the following definition.

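Definition 5.1 An inner product on a real vector space $V$ is a function that associates a real number $\langle \mathbf{x}, \mathbf{y} \rangle$ to each pair of vectors $\mathbf{x}$ and $\mathbf{y}$ in $V$ in such a way that the following rules are satisfied for all vectors $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$ in $V$ and all scalars $k$ in $\mathbb{R}$:
(1) $\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle$ (symmetry),
(2) $\langle \mathbf{x} + \mathbf{y}, \mathbf{z} \rangle = \langle \mathbf{x}, \mathbf{z} \rangle + \langle \mathbf{y}, \mathbf{z} \rangle$ (additivity),
(3) $\langle k\mathbf{x}, \mathbf{y} \rangle = k\langle \mathbf{x}, \mathbf{y} \rangle$ (homogeneity),
(4) $\langle \mathbf{x}, \mathbf{x} \rangle \ge 0$, and $\langle \mathbf{x}, \mathbf{x} \rangle = 0 \Leftrightarrow \mathbf{x} = 0$ (positive definiteness).
A pair $(V, \langle\, , \rangle)$ of a (real) vector space $V$ and an inner product $\langle\, , \rangle$ is called a (real) inner product space. In particular, the pair $(\mathbb{R}^n, \cdot)$ is called the Euclidean $n$-space.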
5.1. INNER PRODUCTS 163

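Note that by symmetry (1), additivity (2) and homogeneity (3) also hold for the second variable, i.e.,
(2') $\langle \mathbf{x}, \mathbf{y} + \mathbf{z} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{x}, \mathbf{z} \rangle$,
(3') $\langle \mathbf{x}, k\mathbf{y} \rangle = k\langle \mathbf{x}, \mathbf{y} \rangle$.
Now it is easy to show that $\langle 0, \mathbf{y} \rangle = \langle 0 \cdot 0, \mathbf{y} \rangle = 0\,\langle 0, \mathbf{y} \rangle = 0$, and $\langle \mathbf{x}, 0 \rangle = 0$.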

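Example 5.1 For vectors $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$, define
$$\langle x, y\rangle = ax_1y_1 + c(x_1y_2 + x_2y_1) + bx_2y_2,$$
where $a$, $b$ and $c$ are arbitrary real numbers. Then this function $\langle\ ,\ \rangle$ clearly
satisfies the first three rules of the inner product. Moreover, if $a > 0$ and
$ab - c^2 > 0$ hold, then it also satisfies rule (4), the positive definiteness of
the inner product. (Hint: $\langle x, x\rangle = ax_1^2 + 2cx_1x_2 + bx_2^2 \ge 0$
if and only if either $x_2 = 0$ or the discriminant of $\langle x, x\rangle / x_2^2$, as a quadratic
polynomial in $x_1/x_2$, is nonpositive.)
Note that the formula can be written as a matrix product:
$$\langle x, y\rangle = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a & c \\ c & b \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.$$
In the case $c = 0$, this reduces to $\langle x, y\rangle = ax_1y_1 + bx_2y_2$. Notice also that
$a = \langle e_1, e_1\rangle$, $b = \langle e_2, e_2\rangle$ and $c = \langle e_1, e_2\rangle = \langle e_2, e_1\rangle$. □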
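To experiment with Example 5.1, here is a minimal Python sketch (the function names are hypothetical) that builds $\langle x, y\rangle$ from $a$, $b$, $c$ and tests the condition $a > 0$, $ab - c^2 > 0$. The second group of lines shows a choice of $a, b, c$ for which rule (4) fails.

```python
def make_inner_product(a, b, c):
    """Return <x, y> = a x1 y1 + c (x1 y2 + x2 y1) + b x2 y2 on R^2,
    i.e. [x1 x2] [[a, c], [c, b]] [y1, y2]^T."""
    def ip(x, y):
        return a * x[0] * y[0] + c * (x[0] * y[1] + x[1] * y[0]) + b * x[1] * y[1]
    return ip


def satisfies_condition(a, b, c):
    """The condition of Example 5.1: a > 0 and ab - c^2 > 0."""
    return a > 0 and a * b - c * c > 0


good = make_inner_product(2, 3, 1)
print(satisfies_condition(2, 3, 1))   # True
print(good((1, 0), (0, 1)))           # 1, the entry c = <e1, e2>

bad = make_inner_product(1, 1, 2)
print(satisfies_condition(1, 1, 2))   # False
print(bad((2, -1), (2, -1)))          # -3, so <x, x> < 0 and rule (4) fails
```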

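Example 5.2 Let $V = C[0, 1]$ be the vector space of all real-valued continuous
functions on $[0, 1]$. For any two functions $f(x)$ and $g(x)$ in $V$, define
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$
Then $\langle\ ,\ \rangle$ is an inner product on $V$ (verify this). Let
$$f(x) = \begin{cases} 1 - 2x & \text{if } 0 \le x \le \frac12, \\ 0 & \text{if } \frac12 \le x \le 1, \end{cases} \qquad g(x) = \begin{cases} 0 & \text{if } 0 \le x \le \frac12, \\ 2x - 1 & \text{if } \frac12 \le x \le 1. \end{cases}$$
Then $f \ne 0 \ne g$, but $\langle f, g\rangle = 0$. □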
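A quick numerical check of Example 5.2, sketched in Python. It assumes that a midpoint-rule approximation of the integral is adequate for illustration; this approximation is our choice and is not how the inner product is defined.

```python
def f(x):
    return 1 - 2 * x if x <= 0.5 else 0.0


def g(x):
    return 0.0 if x <= 0.5 else 2 * x - 1


def inner(p, q, n=10_000):
    """Approximate <p, q> = integral_0^1 p(x) q(x) dx with the midpoint rule."""
    h = 1.0 / n
    return h * sum(p((k + 0.5) * h) * q((k + 0.5) * h) for k in range(n))


print(inner(f, g))   # 0.0: f and g are orthogonal
print(inner(f, f))   # about 0.16667, i.e. 1/6, so f is not the zero function
```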

By a subspace $W$ of an inner product space $V$, we mean a subspace of
the vector space $V$ together with the inner product that is the restriction of
the inner product of $V$ to $W$.
164 CHAPTER 5. INNER PRODUCT SPACES

Example 5.3 The set $W = D^1[0, 1]$ of all real-valued differentiable functions
on $[0, 1]$ is a subspace of $V = C[0, 1]$. The restriction to $W$ of the
inner product on $V$ defined in Example 5.2 makes $W$ an inner product subspace
of $V$. However, suppose we define another inner product on $W$ by the
following formula: for any two functions $f(x)$ and $g(x)$ in $W$,
$$\langle\langle f, g\rangle\rangle = \int_0^1 f(x)g(x)\,dx + \int_0^1 f'(x)g'(x)\,dx.$$
Then $\langle\langle\ ,\ \rangle\rangle$ is also an inner product on $W$, but it is not defined on $V$. This
means that this inner product is quite different from the restriction of the
inner product of $V$ to $W$, and hence $W$ with this new inner product is not
a subspace of the space $V$ as an inner product space. □

5.2 The lengths and angles of vectors


The following inequality will enable us to define an angle between two vectors
in an inner product space $V$.

Theorem 5.1 (Cauchy-Schwarz inequality) If $x$ and $y$ are vectors in
an inner product space $V$, then
$$\langle x, y\rangle^2 \le \langle x, x\rangle\langle y, y\rangle.$$

Proof: If $x = 0$, it is clear. Assume $x \ne 0$. For any scalar $t$, we have
$$0 \le \langle tx + y, tx + y\rangle = \langle x, x\rangle t^2 + 2\langle x, y\rangle t + \langle y, y\rangle.$$
This inequality implies that the polynomial $\langle x, x\rangle t^2 + 2\langle x, y\rangle t + \langle y, y\rangle$ in $t$,
whose leading coefficient $\langle x, x\rangle$ is positive, has either no real roots or a repeated
real root. Therefore, its discriminant must be nonpositive:
$$4\langle x, y\rangle^2 - 4\langle x, x\rangle\langle y, y\rangle \le 0,$$
which implies the inequality. □
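Theorem 5.1 can also be spot-checked numerically. The Python sketch below uses a weighted inner product $\langle x, y\rangle = \sum_i w_ix_iy_i$ with positive weights, a hypothetical choice for illustration, and tests the inequality on random vectors. A check like this illustrates the theorem but of course proves nothing.

```python
import random


def weighted_inner(x, y, w):
    """<x, y> = sum_i w_i x_i y_i; an inner product on R^n when every w_i > 0."""
    return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))


def cauchy_schwarz_gap(x, y, w):
    """Return <x, x><y, y> - <x, y>^2, which Theorem 5.1 says is never negative."""
    return weighted_inner(x, x, w) * weighted_inner(y, y, w) - weighted_inner(x, y, w) ** 2


random.seed(0)
w = [2.0, 3.0, 0.5]
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    assert cauchy_schwarz_gap(x, y, w) >= -1e-9   # small tolerance for rounding

x = [1.0, -2.0, 4.0]
print(cauchy_schwarz_gap(x, [3 * t for t in x], w))   # 0.0: y = 3x gives equality
```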

Problem 5.1 Prove that equality in the Cauchy-Schwarz inequality holds if and only
if the vectors $x$ and $y$ are linearly dependent.

The lengths and angles of vectors in an inner product space are defined
similarly to the case of the Euclidean $n$-space.

Definition 5.2 Let $V$ be an inner product space. Then the magnitude or
the length of a vector $x$, denoted by $\|x\|$, is defined by
$$\|x\| = \sqrt{\langle x, x\rangle}.$$
The distance between two vectors $x$ and $y$, denoted by $d(x, y)$, is defined
by
$$d(x, y) = \|x - y\|.$$

From the Cauchy-Schwarz inequality, we have, for any nonzero vectors $x$ and $y$,
$$-1 \le \frac{\langle x, y\rangle}{\|x\|\,\|y\|} \le 1.$$
Hence, there is a unique number $\theta \in [0, \pi]$ such that $\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}$.

Definition 5.3 The real number $\theta$ in the interval $[0, \pi]$ that satisfies
$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}, \quad\text{or}\quad \langle x, y\rangle = \|x\|\,\|y\|\cos\theta,$$
is called the angle between $x$ and $y$.

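Example 5.4 In $\mathbb{R}^2$ equipped with the inner product $\langle x, y\rangle = 2x_1y_1 + 3x_2y_2$,
the angle $\theta$ between $x = (1, 2)$ and $y = (1, 0)$ is computed as
$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|} = \frac{2}{\sqrt{14}\,\sqrt{2}} = \frac{1}{\sqrt{7}}.$$
□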
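The computation in Example 5.4 can be reproduced with a short Python sketch. The helper names are ours, and clamping the cosine to $[-1, 1]$ is only a guard against floating-point rounding.

```python
import math


def ip(x, y):
    """The inner product of Example 5.4 on R^2: <x, y> = 2 x1 y1 + 3 x2 y2."""
    return 2 * x[0] * y[0] + 3 * x[1] * y[1]


def norm(x):
    return math.sqrt(ip(x, x))


def angle(x, y):
    """Angle theta in [0, pi] between nonzero x and y (Definition 5.3)."""
    c = ip(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))


x, y = (1, 2), (1, 0)
print(ip(x, y), ip(x, x), ip(y, y))   # 2 14 2
print(math.degrees(angle(x, y)))      # about 67.79, since cos(theta) = 1/sqrt(7)
```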

Problem 5.2 Prove the following properties of length in an inner product space $V$:
for any vectors $x, y \in V$ and any scalar $k$,
(1) $\|x\| \ge 0$,
(2) $\|x\| = 0$ if and only if $x = 0$,
(3) $\|kx\| = |k|\,\|x\|$,
(4) $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).

Problem 5.3 Let $V$ be an inner product space. Show that for any vectors $x$, $y$ and
$z$ in $V$,
(1) $d(x, y) \ge 0$,
(2) $d(x, y) = 0$ if and only if $x = y$,
(3) $d(x, y) = d(y, x)$,
(4) $d(x, y) \le d(x, z) + d(z, y)$ (triangle inequality).

Therefore, an inner product on the 3-space $\mathbb{R}^3$ may play the roles of a
ruler and a protractor in our physical world.

Definition 5.4 Two vectors $x$ and $y$ in an inner product space are said to
be orthogonal (or perpendicular) if $\langle x, y\rangle = 0$.

Note that for nonzero vectors $x$ and $y$, $\langle x, y\rangle = 0$ if and only if $\theta = \pi/2$.

Lemma 5.2 Let $V$ be an inner product space and let $x \in V$. Then the vector
$x$ is orthogonal to every vector $y$ in $V$ (i.e., $\langle x, y\rangle = 0$ for all $y$ in $V$) if and
only if $x = 0$.

Proof: If $x = 0$, clearly $\langle x, y\rangle = 0$ for all $y$ in $V$. Suppose that $\langle x, y\rangle = 0$
for all $y$ in $V$. Then $\langle x, x\rangle = 0$ in particular. The positive definiteness of
the inner product implies that $x = 0$. □

Corollary 5.3 Let $V$ be an inner product space, and let $\alpha = \{v_1, \ldots, v_n\}$
be a basis for $V$. Then a vector $x$ in $V$ is orthogonal to every basis vector
$v_i$ in $\alpha$ if and only if $x = 0$.

Proof: If $\langle x, v_i\rangle = 0$ for $i = 1, \ldots, n$, then $\langle x, y\rangle = \sum_{i=1}^n y_i\langle x, v_i\rangle = 0$ for
any $y = \sum_{i=1}^n y_iv_i \in V$, so $x = 0$ by Lemma 5.2. The converse is clear. □

Example 5.5 (Pythagorean theorem) Let $V$ be an inner product space,
and let $x$ and $y$ be any two nonzero vectors in $V$ with the angle $\theta$. Then expanding
$\|x + y\|^2 = \langle x + y, x + y\rangle$ and using $\langle x, y\rangle = \|x\|\,\|y\|\cos\theta$ give the equality
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\|\cos\theta.$$
In particular, it yields the Pythagorean theorem: $\|x + y\|^2 = \|x\|^2 + \|y\|^2$ for
any orthogonal vectors $x$ and $y$. □

Theorem 5.4 If nonzero vectors $x_1, x_2, \ldots, x_k$ in an inner product
space $V$ are mutually orthogonal (i.e., each vector is orthogonal to every
other vector), then they are linearly independent.

Proof: Suppose $c_1x_1 + c_2x_2 + \cdots + c_kx_k = 0$. Then for each $i = 1, \ldots, k$,
$$\begin{aligned} 0 = \langle 0, x_i\rangle &= \langle c_1x_1 + \cdots + c_kx_k, x_i\rangle \\ &= c_1\langle x_1, x_i\rangle + \cdots + c_i\langle x_i, x_i\rangle + \cdots + c_k\langle x_k, x_i\rangle \\ &= c_i\|x_i\|^2, \end{aligned}$$
because $x_1, \ldots, x_k$ are mutually orthogonal. Since each $x_i$ is not the zero
vector, $\|x_i\| \ne 0$; so $c_i = 0$ for $i = 1, \ldots, k$. □

Problem 5.4 Let $f(x)$ and $g(x)$ be continuous real-valued functions on $[0, 1]$. Prove
(1) $\left[\int_0^1 f(x)g(x)\,dx\right]^2 \le \left[\int_0^1 f^2(x)\,dx\right]\left[\int_0^1 g^2(x)\,dx\right]$,
(2) $\left[\int_0^1 (f(x) + g(x))^2\,dx\right]^{1/2} \le \left[\int_0^1 f^2(x)\,dx\right]^{1/2} + \left[\int_0^1 g^2(x)\,dx\right]^{1/2}$.

5.3 Matrix representations of inner products


As we saw at the end of Example 5.1, the inner product on an inner product
space $(V, \langle\ ,\ \rangle)$ can be expressed in terms of a symmetric matrix. In fact, let
$\alpha = \{v_1, \ldots, v_n\}$ be a fixed ordered basis for $V$. Then for any $x = \sum_{i=1}^n x_iv_i$
and $y = \sum_{j=1}^n y_jv_j$ in $V$,
$$\langle x, y\rangle = \sum_{i=1}^n\sum_{j=1}^n x_iy_j\langle v_i, v_j\rangle$$
holds. If we set $a_{ij} = \langle v_i, v_j\rangle$ for $i, j = 1, \ldots, n$, then these numbers
constitute a symmetric matrix $A = [a_{ij}]$, since $\langle v_i, v_j\rangle = \langle v_j, v_i\rangle$. Thus, in
matrix notation, the inner product may be written as
$$\langle x, y\rangle = \sum_{i=1}^n\sum_{j=1}^n x_iy_ja_{ij} = [x]_\alpha^TA[y]_\alpha.$$
The matrix $A$ is called the matrix representation of the inner product
with respect to $\alpha$.

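Example 5.6 (1) With respect to the standard basis $\{e_1, e_2, \ldots, e_n\}$ of
the Euclidean $n$-space $\mathbb{R}^n$, the matrix representation of the dot product is the
identity matrix, since $e_i \cdot e_j = \delta_{ij}$. Thus for $x = \sum x_ie_i$, $y = \sum y_je_j \in \mathbb{R}^n$
the dot product is the matrix product $x^Ty$:
$$x \cdot y = \sum_{i=1}^n x_iy_i = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} I_n \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = x^Ty.$$
(2) Let $V = P_2(\mathbb{R})$, and define an inner product on $V$ as
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$
Then for a basis $\alpha = \{f_1(x) = 1, f_2(x) = x, f_3(x) = x^2\}$ for $V$, one can
easily find $A = [a_{ij}]$: for instance,
$$a_{23} = \langle f_2, f_3\rangle = \int_0^1 f_2(x)f_3(x)\,dx = \int_0^1 x\cdot x^2\,dx = \frac14.$$
□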
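For Example 5.6 (2), every entry can be computed in the same way: $a_{ij} = \int_0^1 x^{i-1}x^{j-1}\,dx = 1/(i + j - 1)$. The Python sketch below (helper names are ours) builds $A$ with exact fractions and evaluates $\langle f, g\rangle = [f]_\alpha^TA[g]_\alpha$ from coordinate vectors.

```python
from fractions import Fraction


def gram_matrix_monomials(n):
    """A = [<f_i, f_j>] for <f, g> = integral_0^1 f g dx and f_i(x) = x^(i-1);
    the entry a_ij = integral_0^1 x^(i+j-2) dx = 1/(i + j - 1)."""
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)]


def inner_from_coords(A, x, y):
    """<x, y> = [x]^T A [y] for coordinate vectors x and y."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


A = gram_matrix_monomials(3)
print(A[1][2])   # 1/4, the entry a_23 computed in the text
# <1 + x, x^2> = integral_0^1 (x^2 + x^3) dx = 1/3 + 1/4
print(inner_from_coords(A, [1, 1, 0], [0, 0, 1]))   # 7/12
```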

The expression of the dot product as a matrix product is very useful in
stating and proving theorems in the Euclidean space.
On the other hand, for any symmetric matrix $A$ and for a fixed basis $\alpha$,
the formula $\langle x, y\rangle = [x]_\alpha^TA[y]_\alpha$ seems to give rise to an inner product on $V$.
In fact, the formula clearly satisfies the first three rules in the definition of
the inner product, but not necessarily the fourth rule, positive definiteness.
The following theorem gives a necessary condition for a symmetric matrix
$A$ to give rise to an inner product. Some necessary and sufficient conditions
will be discussed in Chapter 8.

Theorem 5.5 The matrix representation $A$ of an inner product (with respect
to any basis) on a vector space $V$ is invertible.

Proof: It is enough to show that the column vectors of $A$ are linearly independent.
Let $\alpha = \{v_1, \ldots, v_n\}$ be a basis for an inner product space $V$.
We denote the column vectors of $A = [a_{ij}] = [\langle v_i, v_j\rangle]$ by $a_j$ for $j = 1, \ldots, n$.
Consider the linear dependence of the column vectors of $A$: for $c_1, \ldots, c_n \in \mathbb{R}$,
suppose
$$c_1a_1 + c_2a_2 + \cdots + c_na_n = 0.$$
Let $c = \sum_{i=1}^n c_iv_i \in V$ so that $[c]_\alpha = [c_1 \ \cdots \ c_n]^T$. Then this equation becomes
a homogeneous system $0 = A[c]_\alpha$ of $n$ linear equations in $n$ unknowns:
$$\begin{aligned} 0 &= a_{11}c_1 + \cdots + a_{1n}c_n = \sum_{j=1}^n a_{1j}c_j = [v_1]_\alpha^TA[c]_\alpha = \langle v_1, c\rangle, \\ &\ \ \vdots \\ 0 &= a_{n1}c_1 + \cdots + a_{nn}c_n = \sum_{j=1}^n a_{nj}c_j = [v_n]_\alpha^TA[c]_\alpha = \langle v_n, c\rangle, \end{aligned}$$
where we used $[v_i]_\alpha = e_i$. Thus, by Corollary 5.3, we get $c = \sum_{i=1}^n c_iv_i = 0$,
so $c_1 = \cdots = c_n = 0$ because $\alpha$ is a basis, and the columns of $A$ are linearly independent. □

Recall that the conditions $a > 0$ and $ab - c^2 > 0$ in Example 5.1
are sufficient for the matrix $A = \begin{bmatrix} a & c \\ c & b \end{bmatrix}$ to give rise to an inner product on $\mathbb{R}^2$.
The standard basis of the Euclidean $n$-space $\mathbb{R}^n$ has a special property:
the basis vectors are mutually orthogonal and are of length 1. In this sense,
it is called the rectangular coordinate system for $\mathbb{R}^n$. In an inner product
space, a vector with length 1 is called a unit vector. If $x$ is a nonzero vector
in an inner product space $V$, the vector $\frac{1}{\|x\|}x$ is a unit vector. The process of
obtaining a unit vector from a nonzero vector by multiplying it by the reciprocal
of its length is called normalization. Thus, if there is a set of vectors
(or a basis) in an inner product space consisting of mutually orthogonal nonzero
vectors, then the vectors can be converted to unit vectors by normalizing
them without losing their mutual orthogonality.

Problem 5.5 Normalize each of the following vectors in the Euclidean space $\mathbb{R}^3$:
(1) $u = (2, 1, -1)$, (2) $v = (1/2, 1/3, -1/4)$.

Definition 5.5 A set of vectors $x_1, x_2, \ldots, x_k$ in an inner product space
$V$ is said to be orthonormal if
$$\langle x_i, x_j\rangle = 0 \ \text{ for } i \ne j \quad \text{(orthogonality)}, \qquad \|x_i\| = 1 \ \text{ for } i = 1, \ldots, k \quad \text{(normality)}.$$
A set $\{x_1, x_2, \ldots, x_n\}$ of vectors is called an orthonormal basis for $V$ if
it is a basis and orthonormal.

It will be shown later that any inner product space has an orthonormal
basis, just like the standard basis for the Euclidean $n$-space $\mathbb{R}^n$.

Problem 5.6 Determine whether each of the following sets of vectors in $\mathbb{R}^2$ is
orthogonal, orthonormal, or neither with respect to the Euclidean inner product.

(1) {[ ~ ] , [ ~ ]} (2) {[ ~ ] , [ ~ ]}
(3) {[ ~ ] ,[ -~ ]} (4) {[ ~j~ ] ,[ -~j~ ]}
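The next theorem gives a simple expression of a vector in terms of an
orthonormal basis.

Theorem 5.6 If $\{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis for an inner
product space $V$ and $x$ is any vector in $V$, then
$$x = \langle x, v_1\rangle v_1 + \langle x, v_2\rangle v_2 + \cdots + \langle x, v_n\rangle v_n.$$

Proof: For any vector $x \in V$, we can write $x = x_1v_1 + x_2v_2 + \cdots + x_nv_n$
as a linear combination of basis vectors. Moreover, for each $i = 1, \ldots, n$,
$$\begin{aligned} \langle x, v_i\rangle &= \langle x_1v_1 + \cdots + x_nv_n, v_i\rangle \\ &= x_1\langle v_1, v_i\rangle + \cdots + x_i\langle v_i, v_i\rangle + \cdots + x_n\langle v_n, v_i\rangle \\ &= x_i, \end{aligned}$$
because $\{v_1, v_2, \ldots, v_n\}$ is orthonormal. □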
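Theorem 5.6 says that the coordinates with respect to an orthonormal basis are just inner products. A Python sketch with the dot product on $\mathbb{R}^2$ and the orthonormal basis $\{(1/\sqrt2, 1/\sqrt2), (1/\sqrt2, -1/\sqrt2)\}$, our choice of example:

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def coefficients(x, basis):
    """Coefficients <x, v_i> of x with respect to an orthonormal basis (Theorem 5.6)."""
    return [dot(x, v) for v in basis]


def combine(coeffs, basis):
    """Rebuild sum_i c_i v_i from coefficients and basis vectors."""
    return [sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(len(basis[0]))]


s = 1 / math.sqrt(2)
basis = [(s, s), (s, -s)]
x = (3.0, 1.0)
coeffs = coefficients(x, basis)
print(coeffs)                   # about [2.8284, 1.4142], i.e. [2*sqrt(2), sqrt(2)]
print(combine(coeffs, basis))   # [3.0, 1.0] up to rounding
```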

In an inner product space, the coordinate expression of a vector depends
on the choice of an ordered basis, and, as we have already seen, the inner
product is just a matrix product of the coordinate vectors with respect to an
ordered basis with some symmetric matrix between them.
Actually, we will show in Theorem 5.12 in Section 5.5 that every
inner product space $V$ has an orthonormal basis, say $\alpha = \{v_1, v_2, \ldots, v_n\}$.
Then the matrix representation $A = [a_{ij}]$ of the inner product with respect
to the orthonormal basis $\alpha$ is the identity matrix, since $a_{ij} = \langle v_i, v_j\rangle = \delta_{ij}$.
Thus for any vectors $x = \sum x_iv_i$ and $y = \sum y_iv_i$ in $V$,
$$\langle x, y\rangle = \sum_{i=1}^n\sum_{j=1}^n \delta_{ij}x_iy_j = \sum_{i=1}^n x_iy_i.$$
This expression looks like the dot product in the Euclidean space $\mathbb{R}^n$. Thus
any inner product on $V$ can be written just like the dot product in $\mathbb{R}^n$, if $V$
is equipped with an orthonormal basis.

5.4 Orthogonal projections


Let $U$ be a subspace of a vector space $V$. Then by Corollary 3.13 there is
another subspace $W$ of $V$ such that $V = U \oplus W$, so that any $x \in V$ has a
unique expression as $x = u + w$ for $u \in U$ and $w \in W$. As an easy exercise,
one can show that the function $T : V \to V$ defined by $T(x) = T(u + w) = u$ is
a linear transformation, whose image $\mathrm{Im}(T) = T(V)$ is the subspace $U$ and
whose kernel $\mathrm{Ker}(T)$ is the subspace $W$.

Definition 5.6 Let $U$ and $W$ be subspaces of a vector space $V$. A linear
transformation $T : V \to V$ is called the projection of $V$ onto the subspace
$U$ along $W$ if $V = U \oplus W$ and $T(x) = u$ for $x = u + w \in U \oplus W$.

Note that for a given subspace $U$ of $V$, there exist many projections onto $U$,
depending on the choice of a complementary subspace $W$ of $U$. However,
if we fix a complementary subspace $W$ of $U$, then the projection $T$ onto $U$ along $W$ is
uniquely determined, and by definition $T(u) = u$ for any $u \in U$, whatever the
choice of $W$. Since $T(x) \in U$ for every $x \in V$, it follows that $T \circ T = T$ for any projection $T$ of $V$.

Example 5.7 Let $U$, $V$ and $W$ be the 1-dimensional subspaces of the
Euclidean 2-space $\mathbb{R}^2$ spanned by the vectors $u = e_1$, $v = (1, 1)$ and $w = e_2$,
respectively.

[Figure: $x = (2, 1)$ decomposed along $U \oplus W$ as $(2, 0) + (0, 1)$, with $T_W(x) = (0, 1)$, and along $U \oplus V$ as $(1, 0) + (1, 1)$.]

Since the pairs $\{u, w\}$ and $\{u, v\}$ are linearly independent, the space
$\mathbb{R}^2$ can be expressed as a direct sum in two ways: $\mathbb{R}^2 = U \oplus W = U \oplus V$.
Thus a vector $x = (2, 1) \in \mathbb{R}^2$ may be written in two ways:
$$x = (2, 1) = \begin{cases} 2(1, 0) + (0, 1) \in U \oplus W = \mathbb{R}^2, & \text{or} \\ (1, 0) + (1, 1) \in U \oplus V = \mathbb{R}^2. \end{cases}$$
Let $T_W$ and $T_V$ denote the projections of $\mathbb{R}^2$ onto $W$ and $V$ along $U$,
respectively. Then
$$T_W(x) = (0, 1) \in W \quad\text{and}\quad T_V(x) = (1, 1) \in V.$$
It also shows that the projection of $\mathbb{R}^2$ onto the subspace $U$ (= the $x$-axis)
depends on the choice of a complementary subspace of $U$: along $W$ it sends $x$ to
$(2, 0)$, but along $V$ it sends $x$ to $(1, 0)$. □

The following theorem gives an algebraic characterization of a projection.

Theorem 5.7 A linear transformation $T : V \to V$ is a projection onto a
subspace $U$ if and only if $T = T^2$ ($= T \circ T$, by definition).

Proof: The necessity is clear, because $T \circ T = T$ for any projection $T$.
For the sufficiency, let $T^2 = T$. We want to show $V = \mathrm{Im}(T) \oplus \mathrm{Ker}(T)$
and $T(u + w) = u$ for $u + w \in \mathrm{Im}(T) \oplus \mathrm{Ker}(T)$. For the first one, we
need to prove $\mathrm{Im}(T) \cap \mathrm{Ker}(T) = \{0\}$ and $V = \mathrm{Im}(T) + \mathrm{Ker}(T)$. Indeed, if
$u \in \mathrm{Im}(T) \cap \mathrm{Ker}(T)$, then there is $x \in V$ such that $T(x) = u$ and $T(u) = 0$.
But
$$u = T(x) = T^2(x) = T(T(x)) = T(u) = 0$$
proves $\mathrm{Im}(T) \cap \mathrm{Ker}(T) = \{0\}$. Note that the same computation shows $T(u) = u$ for
every $u \in \mathrm{Im}(T)$. Then $\dim V = \dim(\mathrm{Im}(T)) + \dim(\mathrm{Ker}(T))$ (see Remark (2) on
page 138) implies $V = \mathrm{Im}(T) + \mathrm{Ker}(T)$. Finally, $T(u + w) = T(u) + T(w) = u$
for any $u + w \in \mathrm{Im}(T) \oplus \mathrm{Ker}(T)$. □

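Let $T : V \to V$ be a projection of $V$, so that $V = \mathrm{Im}(T) \oplus \mathrm{Ker}(T)$. It is
not difficult to show that $\mathrm{Im}(\mathrm{Id}_V - T) = \mathrm{Ker}(T)$ and $\mathrm{Ker}(\mathrm{Id}_V - T) = \mathrm{Im}(T)$.

Corollary 5.8 A linear transformation $T : V \to V$ is a projection if and
only if $\mathrm{Id}_V - T$ is a projection. Moreover, if $T$ is the projection of $V$ onto a
subspace $U$ along $W$, then $\mathrm{Id}_V - T$ is the projection of $V$ onto $W$ along $U$.

Proof: By Theorem 5.7, it is enough to show that $(\mathrm{Id}_V - T) \circ (\mathrm{Id}_V - T) = \mathrm{Id}_V - T$
whenever $T^2 = T$ (the converse follows by applying this to $\mathrm{Id}_V - T$ in place of $T$). But
$$(\mathrm{Id}_V - T) \circ (\mathrm{Id}_V - T) = (\mathrm{Id}_V - T) - (T - T^2) = \mathrm{Id}_V - T.$$
The second statement follows from $\mathrm{Im}(\mathrm{Id}_V - T) = \mathrm{Ker}(T) = W$ and $\mathrm{Ker}(\mathrm{Id}_V - T) = \mathrm{Im}(T) = U$. □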
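Theorem 5.7 and Corollary 5.8 can be checked on the projections of Example 5.7. With respect to the standard basis, $T_W$ and $T_V$ (projections onto $W$ and $V$ along $U$) have the matrices below; we worked these out from the decompositions in that example, e.g. $(x_1, x_2) = (x_1 - x_2)(1, 0) + x_2(1, 1)$. The list-of-rows matrix helpers are our own.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def is_idempotent(A):
    """Theorem 5.7: a linear map T is a projection iff T o T = T."""
    return matmul(A, A) == A


T_W = [[0, 0], [0, 1]]   # onto W = span(e2) along U = span(e1)
T_V = [[0, 1], [0, 1]]   # onto V = span((1, 1)) along U: (x1, x2) -> (x2, x2)
I = [[1, 0], [0, 1]]
S = [[I[i][j] - T_V[i][j] for j in range(2)] for i in range(2)]   # Id - T_V

x = [2, 1]
print(apply(T_W, x), apply(T_V, x))            # [0, 1] [1, 1]
print(is_idempotent(T_W), is_idempotent(T_V))  # True True
print(apply(S, x), is_idempotent(S))           # [1, 0] True: onto U along V
print(is_idempotent([[1, 1], [0, 1]]))         # False: a shear is not a projection
```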

Problem 5.7 Let $V = U \oplus W$. Let $T_U$ denote the projection of $V$ onto $U$ along $W$,
and $T_W$ denote the projection of $V$ onto $W$ along $U$. Prove the following.
(1) For any $x \in V$, $x = T_U(x) + T_W(x)$.
(2) $T_U \circ (\mathrm{Id}_V - T_U) = 0$.
(3) $T_U \circ T_W = T_W \circ T_U = 0$.
(4) For any projection $T : V \to V$, $\mathrm{Im}(\mathrm{Id}_V - T) = \mathrm{Ker}(T)$ and $\mathrm{Ker}(\mathrm{Id}_V - T) = \mathrm{Im}(T)$.

Now, let $V$ be an inner product space and let $U$ be a subspace of $V$.
Recall that there exist many projections of $V$ onto $U$, depending on
the choice of a complementary subspace $W$ of $U$. However, in an inner product
space $V$, there is a particular choice of $W$, called the orthogonal complement
of $U$, and the projection onto $U$ along it is called the orthogonal projection.
Almost all projections used in linear algebra are orthogonal projections.
In an inner product space $V$, the orthogonality of two vectors can be
extended to subspaces of $V$.

Definition 5.7 Let $U$ and $W$ be subspaces of an inner product space $V$.
(1) Two subspaces $U$ and $W$ are said to be orthogonal, written $U \perp W$,
if $\langle u, w\rangle = 0$ for each $u \in U$ and $w \in W$.
(2) The set of all vectors in $V$ that are orthogonal to every vector in $U$ is
called the orthogonal complement of $U$, denoted by $U^\perp$, i.e.,
$$U^\perp = \{v \in V : \langle v, u\rangle = 0 \text{ for all } u \in U\}.$$

One can easily show that $U^\perp$ is a subspace of $V$, and $v \in U^\perp$ if and only
if $\langle v, u\rangle = 0$ for every $u \in \beta$, where $\beta$ is a basis for $U$. Therefore, clearly
$W \perp U$ if and only if $W \subseteq U^\perp$.

Problem 5.8 Show: (1) If $U \perp W$, then $U \cap W = \{0\}$. (2) $U \subseteq W$ if and only if
$W^\perp \subseteq U^\perp$.

Theorem 5.9 Let $U$ be a subspace of an inner product space $V$. Then
(1) $\dim U + \dim U^\perp = \dim V$.
(2) $(U^\perp)^\perp = U$.
(3) $V = U \oplus U^\perp$: that is, for each $x \in V$, there are unique vectors $x_U \in U$
and $x_{U^\perp} \in U^\perp$ such that $x = x_U + x_{U^\perp}$. This is called the orthogonal
decomposition of $V$ (or of $x$) by $U$.

Proof: (1) Suppose that $\dim U = k$. Choose a basis $\{v_1, \ldots, v_k\}$ for $U$,
and then extend it to a basis $\{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ for $V$, where $n =
\dim V$. Then $x = \sum_{j=1}^n x_jv_j \in U^\perp$ if and only if $0 = \langle x, v_i\rangle = \sum_{j=1}^n a_{ij}x_j$ for
$1 \le i \le k$, where $a_{ij} = \langle v_i, v_j\rangle$. The latter equations form a homogeneous
system of $k$ linear equations in $n$ unknowns; that is, the coordinate vectors of the
elements of $U^\perp$ form precisely the null space of the $k \times n$ coefficient matrix $B = [a_{ij}]$,
which consists of the first $k$ rows of the matrix representation $A$ of the inner product.
By Theorem 5.5, $A$ is invertible, so its rows are linearly independent; hence $B$ is of
rank $k$. Therefore, the null space has dimension $n - k$, or $\dim U^\perp = n - k = n - \dim U$.
(2) By definition, every vector in $U$ is orthogonal to $U^\perp$, i.e., $U \subseteq (U^\perp)^\perp$.
On the other hand, by (1), $\dim(U^\perp)^\perp = n - \dim U^\perp = \dim U$. This proves
that $(U^\perp)^\perp = U$.
(3) For a basis $\{v_1, \ldots, v_k\}$ for $U$, take any basis $\{v_{k+1}, \ldots, v_n\}$ for
$U^\perp$. Since $U \cap U^\perp = \{0\}$, the set $\{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ of $n$ vectors is linearly
independent, so it is a basis for $V$. Therefore, every vector $x \in V$ has a
unique expression
$$x = \sum_{i=1}^k a_iv_i + \sum_{j=k+1}^n b_jv_j.$$
Now take $x_U = \sum_{i=1}^k a_iv_i \in U$ and $x_{U^\perp} = \sum_{j=k+1}^n b_jv_j \in U^\perp$. To show
uniqueness, let $x = u + w$ be another expression with $u \in U$ and $w \in U^\perp$. Then
$x_U - u = w - x_{U^\perp} \in U \cap U^\perp = \{0\}$. So $x_U = u$ and $x_{U^\perp} = w$. □

Definition 5.8 Let $V$ be an inner product space, and let $U$ be a subspace
of $V$ so that $V = U \oplus U^\perp$. Then the projection of $V$ onto $U$ along $U^\perp$ is
called the orthogonal projection of $V$ onto $U$, denoted $\mathrm{Proj}_U$. For $x \in V$,
the component vector $\mathrm{Proj}_U(x) \in U$ is called the orthogonal projection
of $x$ into $U$.

Example 5.8 As in Example 5.7, let $U$, $V$ and $W$ be the subspaces of the
Euclidean 2-space $\mathbb{R}^2$ generated by the vectors $u = e_1$, $v = (1, 1)$ and
$w = e_2$, respectively. Then clearly $W = U^\perp$ and $V \ne U^\perp$. Hence, for the
projections $T_V$ and $T_W$ of $\mathbb{R}^2$ given in Example 5.7, the projection $T_W$ is the
orthogonal projection, but the projection $T_V$ is not, so that $T_W = \mathrm{Proj}_W$
and $T_V \ne \mathrm{Proj}_V$. □

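Theorem 5.10 Let $U$ be a subspace of an inner product space $V$, and let
$x \in V$. Then the orthogonal projection $\mathrm{Proj}_U(x)$ of $x$ satisfies
$$\|x - \mathrm{Proj}_U(x)\| \le \|x - y\|$$
for all $y \in U$. The equality holds if and only if $y = \mathrm{Proj}_U(x)$.

Proof: Since $x = \mathrm{Proj}_U(x) + \mathrm{Proj}_{U^\perp}(x)$ for any vector $x \in V$, we have
$x - \mathrm{Proj}_U(x) = \mathrm{Proj}_{U^\perp}(x) \in U^\perp$. Thus, for all $y \in U$,
$$\begin{aligned} \|x - y\|^2 &= \|(x - \mathrm{Proj}_U(x)) + (\mathrm{Proj}_U(x) - y)\|^2 \\ &= \|x - \mathrm{Proj}_U(x)\|^2 + \|\mathrm{Proj}_U(x) - y\|^2 \\ &\ge \|x - \mathrm{Proj}_U(x)\|^2, \end{aligned}$$
where the second equality comes from the Pythagorean theorem, because
$x - \mathrm{Proj}_U(x) \in U^\perp$ and $\mathrm{Proj}_U(x) - y \in U$ are orthogonal. Equality holds if and only if
$\|\mathrm{Proj}_U(x) - y\| = 0$, that is, $y = \mathrm{Proj}_U(x)$. □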

The theorem means that the orthogonal projection $\mathrm{Proj}_U(x)$ of $x$ is the
unique vector in $U$ that is closest to $x$, in the sense that it minimizes the
distance from $x$ to the vectors in $U$. Geometrically, the following picture
depicts the vector $\mathrm{Proj}_U(x)$:

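[Figure: a vector $x$, the subspace $U$ through the origin $0$, and $x - \mathrm{Proj}_U(x) = \mathrm{Proj}_{U^\perp}(x)$ perpendicular to $U$.]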

Problem 5.9 Let $U$ and $W$ be subspaces of an inner product space $V$. Show that
(1) $(U + W)^\perp = U^\perp \cap W^\perp$. (2) $(U \cap W)^\perp = U^\perp + W^\perp$.

Problem 5.10 Let $U \subset \mathbb{R}^4$, with the Euclidean inner product, be the subspace
spanned by $(1, 1, 0, 0)$ and $(1, 0, 1, 0)$, and $W \subset \mathbb{R}^4$ the subspace spanned
by $(0, 1, 0, 1)$ and $(0, 0, 1, 1)$. Find a basis for and the dimension of each of the
following subspaces:
(1) $U + W$, (2) $U^\perp$, (3) $U^\perp + W^\perp$, (4) $U \cap W$.

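Lemma 5.11 Let $U$ be a subspace of an inner product space $V$, and let
$\{u_1, u_2, \ldots, u_m\}$ be an orthonormal basis for $U$. Then, for any $x \in V$,
the orthogonal projection $\mathrm{Proj}_U(x)$ of $x$ into $U$ is
$$\mathrm{Proj}_U(x) = \langle x, u_1\rangle u_1 + \langle x, u_2\rangle u_2 + \cdots + \langle x, u_m\rangle u_m.$$

Proof: Let $z = \langle x, u_1\rangle u_1 + \langle x, u_2\rangle u_2 + \cdots + \langle x, u_m\rangle u_m$. It is enough to
show that $y = x - z$ is orthogonal to $U$, because if $y = x - z \in U^\perp$, then
$x = z + y \in U \oplus U^\perp$, so the uniqueness of this orthogonal decomposition
gives $z = \mathrm{Proj}_U(x)$. Now, for each $j = 1, \ldots, m$,
$$\langle x - z, u_j\rangle = \langle x, u_j\rangle - \langle z, u_j\rangle = \langle x, u_j\rangle - \langle x, u_j\rangle = 0,$$
since $\{u_1, u_2, \ldots, u_m\}$ is an orthonormal basis for $U$. That is, the vector
$x - z = x - \sum_i\langle x, u_i\rangle u_i$ is orthogonal to every basis vector of $U$, hence to $U$. □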
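Lemma 5.11 and Theorem 5.10 can be illustrated together. The Python sketch below takes, as our own example, the plane $U \subset \mathbb{R}^3$ with orthonormal basis $u_1 = (1, 0, 0)$, $u_2 = (0, 1/\sqrt2, 1/\sqrt2)$, computes $\mathrm{Proj}_U(x)$ by the formula of the lemma, checks that $x - \mathrm{Proj}_U(x)$ is orthogonal to $U$, and then compares distances to randomly chosen vectors of $U$.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def proj(x, onb):
    """Proj_U(x) = sum_i <x, u_i> u_i for an orthonormal basis onb of U (Lemma 5.11)."""
    out = [0.0] * len(x)
    for u in onb:
        c = dot(x, u)
        out = [o + c * ui for o, ui in zip(out, u)]
    return out


def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


s = 1 / math.sqrt(2)
onb = [(1.0, 0.0, 0.0), (0.0, s, s)]   # orthonormal basis of a plane U in R^3
x = [1.0, 2.0, 4.0]
p = proj(x, onb)
r = [a - b for a, b in zip(x, p)]
print(p)                          # about [1.0, 3.0, 3.0]
print([dot(r, u) for u in onb])   # about [0.0, 0.0]: x - p lies in U-perp

# Theorem 5.10: no vector y of U is closer to x than p.
random.seed(1)
for _ in range(1000):
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    y = [a * e + b * f for e, f in zip(*onb)]   # y = a u_1 + b u_2
    assert dist(x, p) <= dist(x, y) + 1e-12
```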

In particular, if $U = V$ in Lemma 5.11, then $\mathrm{Proj}_U(x) = x$, and we get
Theorem 5.6.
A unit vector $u$ in an inner product space $V$ determines a 1-dimensional
subspace $U = \{ru : r \in \mathbb{R}\}$. Then, for a vector $x$ in $V$, the orthogonal
projection of $x$ into $U$ is simply
$$\mathrm{Proj}_U(x) = \langle x, u\rangle u,$$
where $\langle x, u\rangle = \|x\|\cos\theta$, $\theta$ being the angle between $x$ and $u$ (for $x \ne 0$). On the other
hand, it is quite clear that $y = x - \langle x, u\rangle u$ is a vector orthogonal to $u$. Thus
$$x = \langle x, u\rangle u + y \in U \oplus U^\perp,$$
so that $\|x\|^2 = \|y\|^2 + |\langle x, u\rangle|^2$, which is just the Pythagorean theorem. In
particular, if $V = \mathbb{R}^n$ is the Euclidean space with the dot product, then
$$\mathrm{Proj}_U(x) = (u \cdot x)u = (u^Tx)u = u(u^Tx) = (uu^T)x.$$
(Here the last two equalities hold because $u^Tx$ is a scalar and matrix multiplication
is associative.) This equation shows that the matrix $uu^T$ is the matrix representation
of the orthogonal projection $\mathrm{Proj}_U$ with respect to the standard basis for $\mathbb{R}^n$.
Further discussions about matrix representations of the orthogonal projections will be
given in Section 5.10.
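In $\mathbb{R}^2$ the matrix $uu^T$ is easy to inspect. Here is a Python sketch with the unit vector $u = (0.6, 0.8)$, our choice; the last line checks that $uu^T$ is idempotent, as Theorem 5.7 requires of a projection.

```python
def outer(u, v):
    """The matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


u = (0.6, 0.8)        # a unit vector: 0.36 + 0.64 = 1
P = outer(u, u)       # u u^T, the standard matrix of Proj_U for U = span(u)
x = (1.0, 2.0)
print(matvec(P, x))   # about [1.32, 1.76] = (u . x) u, since u . x = 2.2
PP = matmul(P, P)
print(max(abs(PP[i][j] - P[i][j]) for i in range(2) for j in range(2)) < 1e-12)   # True
```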

Example 5.9 Let $P(x_0, y_0)$ be a point and $ax + by + c = 0$ a line in the
$\mathbb{R}^2$ plane, where $(a, b) \ne (0, 0)$. One might know already from calculus that the nonzero vector
$n = (a, b)$ is perpendicular to the line $ax + by + c = 0$. In fact, for any
two points $Q(x_1, y_1)$ and $R(x_2, y_2)$ on the line, the dot product
$\overrightarrow{QR} \cdot n = a(x_2 - x_1) + b(y_2 - y_1) = 0$, that is, $\overrightarrow{QR} \perp n$.
For any point $P(x_0, y_0)$ in the plane $\mathbb{R}^2$, the distance $d$ between the point
$P(x_0, y_0)$ and the line $ax + by + c = 0$ is simply the length of the orthogonal
projection of $\overrightarrow{QP}$ into $n$, for any point $Q(x_1, y_1)$ on the line. Thus,
$$d = \|\mathrm{Proj}_n(\overrightarrow{QP})\| = \frac{|\overrightarrow{QP} \cdot n|}{\|n\|} = \frac{|a(x_0 - x_1) + b(y_0 - y_1)|}{\sqrt{a^2 + b^2}} = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}}.$$
Note that the last equality is due to the fact that the point $Q$ is on the line
(i.e., $ax_1 + by_1 + c = 0$). □
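Both expressions for $d$ in Example 5.9 can be compared in a short Python sketch. The function names are ours, and the particular point $Q$ on the line (with $x_1 = 0$ when $b \ne 0$) is an arbitrary choice, which the derivation allows.

```python
import math


def point_line_distance(x0, y0, a, b, c):
    """|a x0 + b y0 + c| / sqrt(a^2 + b^2), the closed form of Example 5.9."""
    if a == 0 and b == 0:
        raise ValueError("(a, b) must be nonzero")
    return abs(a * x0 + b * y0 + c) / math.hypot(a, b)


def via_projection(x0, y0, a, b, c):
    """||Proj_n(QP)|| = |QP . n| / ||n|| for n = (a, b) and a point Q on the line."""
    if b != 0:
        x1, y1 = 0.0, -c / b   # Q lies where the line meets x = 0
    else:
        x1, y1 = -c / a, 0.0   # vertical line a x + c = 0
    qp = (x0 - x1, y0 - y1)
    return abs(qp[0] * a + qp[1] * b) / math.hypot(a, b)


print(point_line_distance(1, 2, 3, 4, -1))   # 2.0, i.e. |3 + 8 - 1| / 5
print(via_projection(1, 2, 3, 4, -1))        # 2.0 as well
```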

Problem 5.11 Let $V = P_3(\mathbb{R})$, the vector space of polynomials of degree at most 3,
equipped with the inner product
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx \quad \text{for any } f \text{ and } g \text{ in } V.$$
Let $W$ be the subspace of $V$ spanned by $\{1, x\}$, and define $f(x) = x^2$. Find the
orthogonal projection $\mathrm{Proj}_W(f)$ of $f$ on $W$.

5.5 The Gram-Schmidt orthogonalization


The construction of the orthogonal projection onto a subspace described in
Section 5.4 can be used to find an orthonormal basis from any given basis,
as the following example shows.
Example 5.10 Let
$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & 0 & 4 \\ 1 & 1 & 0 \end{bmatrix}.$$
Find an orthonormal basis for the column space $\mathcal{C}(A)$ of $A$.


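Solution: Let $c_1$, $c_2$ and $c_3$ be the column vectors of $A$ in order from
left to right. It is easily verified that they are linearly independent, so they
form a basis for the column space $\mathcal{C}(A)$, a 3-dimensional subspace of the Euclidean space
$\mathbb{R}^4$, but this basis is not orthonormal. To make an
orthonormal basis, set
$$v_1 = \frac{c_1}{\|c_1\|} = \frac{c_1}{2} = \left(\tfrac12, \tfrac12, \tfrac12, \tfrac12\right),$$
which is a unit vector. Clearly, $v_1$, $c_2$ and $c_3$ span the column space $\mathcal{C}(A)$.
Let $W_1$ denote the subspace spanned by $v_1$. Then
$$\mathrm{Proj}_{W_1}(c_2) = \langle c_2, v_1\rangle v_1 = 2v_1,$$
and $c_2 - \mathrm{Proj}_{W_1}(c_2) = c_2 - 2v_1 = (0, 1, -1, 0)$ is a nonzero vector orthogonal
to $v_1$. To convert it to a unit vector, we set
$$v_2 = \frac{c_2 - 2v_1}{\|c_2 - 2v_1\|} = \frac{1}{\sqrt2}(0, 1, -1, 0).$$
Since $c_2 = 2v_1 + \sqrt2\,v_2$, we still have a spanning set $\{v_1, v_2, c_3\}$ of the
column space $\mathcal{C}(A)$, and thus a basis. Let $W_2$ denote the subspace spanned
by $v_1$ and $v_2$. Then
$$\mathrm{Proj}_{W_2}(c_3) = \langle c_3, v_1\rangle v_1 + \langle c_3, v_2\rangle v_2 = 4v_1 - \sqrt2\,v_2,$$
so $c_3 - \mathrm{Proj}_{W_2}(c_3) = c_3 - 4v_1 + \sqrt2\,v_2 = (0, 1, 1, -2)$ is a nonzero vector
orthogonal to both $v_1$ and $v_2$. In fact,
$$\begin{aligned} \langle c_3 - 4v_1 + \sqrt2\,v_2, v_1\rangle &= \langle c_3, v_1\rangle - 4\langle v_1, v_1\rangle + \sqrt2\,\langle v_2, v_1\rangle = 0, \\ \langle c_3 - 4v_1 + \sqrt2\,v_2, v_2\rangle &= \langle c_3, v_2\rangle - 4\langle v_1, v_2\rangle + \sqrt2\,\langle v_2, v_2\rangle = 0, \end{aligned}$$
since $v_1$ and $v_2$ are orthonormal. Thus we can normalize the vector
$c_3 - \mathrm{Proj}_{W_2}(c_3)$ and set
$$v_3 = \frac{c_3 - \mathrm{Proj}_{W_2}(c_3)}{\|c_3 - \mathrm{Proj}_{W_2}(c_3)\|} = \frac{1}{\sqrt6}(0, 1, 1, -2).$$
Then one can easily show that the set $\{v_1, v_2, v_3\}$ still spans $\mathcal{C}(A)$ and
forms an orthonormal basis for it. □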

In fact, the orthonormalization process in Example 5.10 indicates how
to prove the following general version, called the Gram-Schmidt orthogonalization.

Theorem 5.12 Every inner product space has an orthonormal basis.

Proof: [Gram-Schmidt orthogonalization process] Let $\{x_1, x_2, \ldots, x_n\}$
be a basis for an $n$-dimensional inner product space $V$. Let
$$v_1 = \frac{x_1}{\|x_1\|}, \qquad v_2 = \frac{x_2 - \langle x_2, v_1\rangle v_1}{\|x_2 - \langle x_2, v_1\rangle v_1\|}.$$
Of course, $x_2 - \langle x_2, v_1\rangle v_1 \ne 0$, because $\{x_1, x_2\}$ is linearly independent.
Generally, we define by induction on $k = 1, 2, \ldots, n$
$$v_k = \frac{x_k - \langle x_k, v_1\rangle v_1 - \langle x_k, v_2\rangle v_2 - \cdots - \langle x_k, v_{k-1}\rangle v_{k-1}}{\|x_k - \langle x_k, v_1\rangle v_1 - \langle x_k, v_2\rangle v_2 - \cdots - \langle x_k, v_{k-1}\rangle v_{k-1}\|}.$$
Thus, $v_k$ is the normalized vector of $x_k - \mathrm{Proj}_{W_{k-1}}(x_k)$, where $W_{k-1}$ is
the subspace of $V$ spanned by $\{x_1, x_2, \ldots, x_{k-1}\}$ (or equivalently, by
$\{v_1, v_2, \ldots, v_{k-1}\}$); the denominator is nonzero because $x_k \notin W_{k-1}$, and $v_k$ is
orthogonal to $W_{k-1}$ by Lemma 5.11. Then the vectors $v_1, v_2, \ldots, v_n$ are orthonormal in
the $n$-dimensional vector space $V$. Since every orthonormal set is linearly
independent (Theorem 5.4), it is an orthonormal basis for $V$. □
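The process in the proof can be sketched in a few lines of Python. As input we use the vectors $c_1 = (1, 1, 1, 1)$, $c_2 = (1, 2, 0, 1)$, $c_3 = (2, 2, 4, 0)$, which reproduce the intermediate results of Example 5.10. This is classical Gram-Schmidt exactly as in the formula above; in floating-point work the "modified" variant, which projects the running remainder instead of $x_k$, is usually preferred for stability, but that refinement is outside the text.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt: v_k = normalize(x_k - sum_{i<k} <x_k, v_i> v_i)."""
    basis = []
    for x in vectors:
        w = list(x)
        for v in basis:
            c = dot(x, v)                      # <x_k, v_i>
            w = [wi - c * vi for wi, vi in zip(w, v)]
        n = math.sqrt(dot(w, w))
        if n < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / n for wi in w])
    return basis


c1, c2, c3 = (1, 1, 1, 1), (1, 2, 0, 1), (2, 2, 4, 0)
v1, v2, v3 = gram_schmidt([c1, c2, c3])
print(v1)   # [0.5, 0.5, 0.5, 0.5]
print(v2)   # about [0, 0.7071, -0.7071, 0], i.e. (0, 1, -1, 0)/sqrt(2)
print(v3)   # about (0, 1, 1, -2)/sqrt(6)
```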

Here is a simpler proof of Theorem 5.9. Suppose that $U$ is a subspace
of an inner product space $V$. Then clearly $U \perp U^\perp$, by definition.
To show $(U^\perp)^\perp = U$, take an orthonormal basis, say $\alpha = \{v_1, v_2, \ldots, v_k\}$,
for $U$ by the Gram-Schmidt orthonormalization, and then extend it to an
orthonormal basis for $V$, say $\beta = \{v_1, v_2, \ldots, v_k, v_{k+1}, \ldots, v_n\}$; this is
always possible, by extending $\alpha$ to a basis for $V$ and applying the Gram-Schmidt
process, which leaves $v_1, \ldots, v_k$ unchanged. Then clearly $\gamma = \{v_{k+1}, \ldots, v_n\}$ forms an
(orthonormal) basis for $U^\perp$, which means that $(U^\perp)^\perp = U$ and $V = U \oplus U^\perp$.

Problem 5.12 Find an orthonormal basis for the subspace of the Euclidean space
$\mathbb{R}^3$ given by $x + 2y - 3z = 0$, which is the orthogonal complement of the vector
$(1, 2, -3)$ in $\mathbb{R}^3$.

Problem 5.13 Let $V = C[0, 1]$ with the inner product
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx \quad \text{for any } f \text{ and } g \text{ in } V.$$
Find an orthonormal basis for the subspace spanned by $1$, $x$ and $x^2$.



We can now identify an $n$-dimensional inner product space $V$ with the
Euclidean $n$-space $\mathbb{R}^n$ via the Gram-Schmidt orthogonalization. In fact, if
$(V, \langle\ ,\ \rangle)$ is an inner product space, then by the Gram-Schmidt orthogonalization
we can choose an orthonormal basis $\alpha = \{v_1, v_2, \ldots, v_n\}$ for $V$.
With this orthonormal basis $\alpha$, the natural isomorphism $\Phi : V \to \mathbb{R}^n$ given
by $\Phi(v_i) = [v_i]_\alpha = e_i$, $i = 1, \ldots, n$ (see the last remark of Section 4.4)
preserves the inner product of vectors: every vector $x \in V$ has a unique
expression $x = \sum_{i=1}^n x_iv_i$ with $x_i = \langle x, v_i\rangle$. Thus the coordinate vector of $x$
with respect to $\alpha$ is the column matrix
$$[x]_\alpha = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$
which is a vector in $\mathbb{R}^n$. Moreover, if $y = \sum_{i=1}^n y_iv_i$ is another vector in $V$,
then
$$\langle x, y\rangle = \Big\langle \sum_{i=1}^n x_iv_i, \sum_{j=1}^n y_jv_j \Big\rangle = \sum_{i=1}^n x_iy_i = [x]_\alpha^T[y]_\alpha.$$
The right side of this equation is just the dot product of vectors in the
Euclidean space $\mathbb{R}^n$. That is,
$$\langle x, y\rangle = [x]_\alpha^T[y]_\alpha = \Phi(x) \cdot \Phi(y)$$
for any $x, y \in V$. Hence, the natural isomorphism $\Phi$ preserves the inner
product, and identifies the inner product on $V$ with the dot product on $\mathbb{R}^n$.
In this sense, we may restrict our study of an inner product space to the
case of the Euclidean $n$-space $\mathbb{R}^n$ with the dot product.
A special kind of linear transformation that preserves the inner product,
such as the natural isomorphism from $V$ to $\mathbb{R}^n$, plays an important role in
linear algebra, and we will study this kind of transformation in Section 5.6.
Problem 5.14 Use the Gram-Schmidt orthogonalization on the Euclidean space $\mathbb{R}^4$
to transform the basis
$$\{(a, 1, 1, 0),\ (-1, 1, 0, 0),\ (1, 2, 0, -1),\ (-1, 0, 0, -1)\}$$
into an orthonormal basis.
Problem 5.15 Find the point on the plane $x - y - z = a$ that is closest to
$p = (1, 2, 0)$.

5.6 Orthogonal matrices and transformations


In Chapter 4, we saw that a linear transformation can be associated with
a matrix, and vice versa. In this section, we are mainly interested in those
linear transformations (or matrices) that preserve the lengths of vectors in
an inner product space.
Let $A = [c_1 \ \cdots \ c_n]$ be an $n \times n$ square matrix, where $c_1, \ldots, c_n \in \mathbb{R}^n$
are the column vectors of $A$. Then a simple computation shows that
$$A^TA = \begin{bmatrix} c_1^Tc_1 & \cdots & c_1^Tc_n \\ \vdots & \ddots & \vdots \\ c_n^Tc_1 & \cdots & c_n^Tc_n \end{bmatrix} = [c_i^Tc_j].$$
Hence, the column vectors are orthonormal, $c_i^Tc_j = \delta_{ij}$, if and only if $A^TA = I_n$,
that is, $A^T$ is a left inverse of $A$. Since $A$ is a square matrix,
this left inverse must be the right inverse of $A$, i.e., $AA^T = I_n$. Equivalently,
the row vectors of $A$ are also orthonormal. This argument can now be
summarized as follows.

Lemma 5.13 Let $A$ be an $n \times n$ matrix. The following are equivalent.
(1) The column vectors of $A$ are orthonormal.
(2) $A^TA = I_n$.
(3) $A^T = A^{-1}$.
(4) $AA^T = I_n$.
(5) The row vectors of $A$ are orthonormal.

Definition 5.9 A square matrix $A$ is called an orthogonal matrix if $A$
satisfies one (and hence all) of the statements in Lemma 5.13.

Therefore, $A$ is orthogonal if and only if $A^T$ is orthogonal.

Example 5.11 It is easy to see that the matrices
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad B = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$
are orthogonal, and satisfy
$$A^{-1} = A^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}, \qquad B^{-1} = B^T = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.$$
Note that the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x) = Ax$ is
a rotation through the angle $\theta$, while $S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $S(x) = Bx$ is
the reflection about the line passing through the origin that forms an angle
$\theta/2$ with the positive $x$-axis. □
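A numerical companion to Example 5.11, sketched in Python with $\theta = \pi/3$ (an arbitrary choice). It tests statement (2) of Lemma 5.13 up to rounding, checks that the rotation preserves the length of $(3, 4)$, and checks that the reflection fixes the direction at angle $\theta/2$. The helper names are ours.

```python
import math


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def is_orthogonal(A, tol=1e-12):
    """Check A^T A = I, statement (2) of Lemma 5.13, up to rounding."""
    n = len(A)
    P = matmul(transpose(A), A)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


def rotation(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def reflection(t):
    return [[math.cos(t), math.sin(t)], [math.sin(t), -math.cos(t)]]


t = math.pi / 3
A, B = rotation(t), reflection(t)
print(is_orthogonal(A), is_orthogonal(B))        # True True
print(math.hypot(*matvec(A, [3.0, 4.0])))        # 5.0 up to rounding: length preserved
d = [math.cos(t / 2), math.sin(t / 2)]           # direction of the mirror line
print(matvec(B, d))                              # equal to d up to rounding
print(is_orthogonal([[1.0, 1.0], [0.0, 1.0]]))   # False
```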

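Example 5.12 Show that every $2 \times 2$ orthogonal matrix must be one of the
following forms:
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.$$

Solution: Suppose that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is an orthogonal matrix, so that
$AA^T = I_2 = A^TA$. From the first equality, we get $a^2 + b^2 = 1$, $ac + bd = 0$,
and $c^2 + d^2 = 1$. From the second equality, we get $a^2 + c^2 = 1$, $ab + cd = 0$,
and $b^2 + d^2 = 1$. Thus, $b = \pm c$. If $b = -c$, then we get $a = d$. If $b = c$, then
we get $a = -d$. Now, choose $\theta$ so that $a = \cos\theta$ and $c = \sin\theta$. □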

Problem 5.16 Find the inverse of each of the following matrices.

(1) [~o - cos~ sin~ l'


sin e cos e
(2) [-~j~0 =~j~0 ~1 1
What are they as linear transformations on $\mathbb{R}^3$: rotations, reflections, or other?

Intuitively, any rotation or reflection on the Euclidean space $\mathbb{R}^n$ preserves
both the lengths of vectors and the angle between two vectors. In general, any
orthogonal matrix $A$ preserves the lengths of vectors:
$$\|Ax\|^2 = Ax \cdot Ax = x^TA^TAx = x^Tx = \|x\|^2.$$

Definition 5.10 Let $V$ and $W$ be two inner product spaces. A linear
transformation $T : V \to W$ is called an isometry, or an orthogonal
transformation, if it preserves the lengths of vectors, that is, for every vector $x \in V$
$$\|T(x)\| = \|x\|.$$

Clearly, any orthogonal matrix is an isometry as a linear transformation.
If $T : V \to W$ is an isometry, then $T$ is one-to-one, since the kernel of $T$ is
trivial: $T(x) = 0$ implies $\|x\| = \|T(x)\| = 0$. Thus, if $\dim V = \dim W$, then
an isometry is also an isomorphism.
The following is an interesting characterization of an isometry.

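Theorem 5.14 Let $T : V \to W$ be a linear transformation from an inner
product space $V$ to an inner product space $W$. Then $T$ is an isometry if and only if $T$ preserves
inner products, that is,
$$\langle T(x), T(y)\rangle = \langle x, y\rangle$$
for any vectors $x$, $y$ in $V$.

Proof: Let $T$ be an isometry. Then $\|T(x)\|^2 = \|x\|^2$ for any $x \in V$. Hence,
$$\langle T(x + y), T(x + y)\rangle = \|T(x + y)\|^2 = \|x + y\|^2 = \langle x + y, x + y\rangle$$
for any $x, y \in V$. On the other hand,
$$\begin{aligned} \langle T(x + y), T(x + y)\rangle &= \langle T(x), T(x)\rangle + 2\langle T(x), T(y)\rangle + \langle T(y), T(y)\rangle, \\ \langle x + y, x + y\rangle &= \langle x, x\rangle + 2\langle x, y\rangle + \langle y, y\rangle. \end{aligned}$$
Since $\langle T(x), T(x)\rangle = \langle x, x\rangle$ and $\langle T(y), T(y)\rangle = \langle y, y\rangle$, we get $\langle T(x), T(y)\rangle = \langle x, y\rangle$.
The converse is quite clear by choosing $y = x$. □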

Corollary 5.15 Let $A$ be an $n \times n$ matrix. Then $A$ is an orthogonal matrix if and only if $A : \mathbb{R}^n \to \mathbb{R}^n$, as a linear transformation, preserves the dot product. That is, for any vectors $x, y \in \mathbb{R}^n$,
$$Ax \cdot Ay = x \cdot y.$$
Proof: One direction is clear. Suppose that $A$ preserves the dot product. Then for any vectors $x, y \in \mathbb{R}^n$,
$$Ax \cdot Ay = x^TA^TAy = x^Ty = x \cdot y.$$
Take $x = e_i$ and $y = e_j$. Then this equation is just $[A^TA]_{ij} = \delta_{ij}$. □
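Corollary 5.15 can be spot-checked numerically. The sketch below is an illustration, not a proof. It takes the $Q$ factor of a random square matrix from `numpy.linalg.qr` as an orthogonal matrix and uses a diagonal matrix as a non-example. The helper `preserves_dot` is hypothetical; it tests $Ax \cdot Ay = x \cdot y$ on random vectors. The $(i, j)$ entry of the Gram matrix $Q^TQ$ is $Qe_i \cdot Qe_j$, which is exactly the quantity used in the proof.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def preserves_dot(A, trials=100, tol=1e-10):
    # Test Ax . Ay == x . y on random vectors (numerical evidence only)
    n = A.shape[1]
    for _ in range(trials):
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        if abs((A @ x) @ (A @ y) - x @ y) > tol:
            return False
    return True

# An orthogonal matrix: the Q factor of a random 4 x 4 matrix
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Entry (i, j) of Q^T Q equals Q e_i . Q e_j, so it should be delta_ij
G = Q.T @ Q

# A matrix that stretches the second coordinate: not orthogonal
D = np.diag([1.0, 2.0, 1.0, 1.0])
```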

Since $d(x, y) = \|x - y\|$ for any $x$ and $y$ in $V$, one can easily derive the following corollary.

Corollary 5.16 A linear transformation $T : V \to W$ is an isometry if and only if
$$d(T(x), T(y)) = d(x, y)$$
for any $x$ and $y$ in $V$.
184 CHAPTER 5. INNER PRODUCT SPACES

Recall that if $\theta$ is the angle between two nonzero vectors $x$ and $y$ in an inner product space $V$, then for any isometry $T : V \to V$,
$$\cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|} = \frac{\langle Tx, Ty \rangle}{\|Tx\|\,\|Ty\|}.$$
Hence, we have
Corollary 5.17 An isometry preserves the angle.
The following problem shows that the converse of Corollary 5.17 is not true in general.
Problem 5.17 Find an example of a linear transformation on the Euclidean space $\mathbb{R}^n$ that preserves the angles but not the lengths of vectors (i.e., not an isometry). Such a linear transformation is called a dilation.

We have seen that any orthogonal matrix is an isometry as the linear


transformation T(x) = Ax. The following theorem says that the converse is
also true, that is, the matrix representation of an isometry with respect to
an orthonormal basis is an orthogonal matrix.

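Theorem 5.18 Let $T : V \to W$ be an isometry of an inner product space $V$ to an inner product space $W$ of the same dimension. Let $\alpha = \{v_1, \ldots, v_n\}$ and $\beta = \{w_1, \ldots, w_n\}$ be orthonormal bases for $V$ and $W$, respectively. Then the matrix $[T]_\alpha^\beta$ for $T$ with respect to the bases $\alpha$ and $\beta$ is an orthogonal matrix.

Proof: Note that the $k$-th column vector of the matrix $[T]_\alpha^\beta$ is just $[T(v_k)]_\beta$. Since $T$ preserves inner products and $\alpha$ is orthonormal, we get
$$[T(v_k)]_\beta \cdot [T(v_\ell)]_\beta = \langle T(v_k), T(v_\ell) \rangle = \langle v_k, v_\ell \rangle = \delta_{k\ell},$$
where the first equality holds because $\beta$ is orthonormal. This shows that the column vectors of $[T]_\alpha^\beta$ are orthonormal. □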

Therefore, a linear transformation $T : V \to V$ is an isometry if and only if $[T]_\alpha$ is an orthogonal matrix for an orthonormal basis $\alpha$. Moreover, a square matrix $A$ preserves the dot product if and only if it preserves the lengths of vectors.
Problem 5.18 Find values $r > 0$, $s > 0$, $a$, $b$ and $c$ such that the matrix $Q$ is orthogonal.

(1) Q = [~r 2:
-8
~],
C
(2) Q = [~ ~: ~].
T -28 C
5.7. RELATIONS OF FUNDAMENTAL SUBSPACES 185

Problem 5.19 (Bessel's Inequality) Let $V$ be an inner product space, and let $\{v_1, \ldots, v_m\}$ be a set of orthonormal vectors in $V$ (not necessarily a basis for $V$). Prove that for any $x$ in $V$, $\|x\|^2 \ge \sum_{i=1}^{m} |\langle x, v_i \rangle|^2$.

Problem 5.20 Determine whether the following linear transformations on the Euclidean space $\mathbb{R}^3$ are orthogonal.
(1) T(x, y, z) = (z, V;x + h, ~ - V;y).
(2) T(x, y, z) = (153x + gz, gy - 153z, x).

5.7 Relations of fundamental subspaces


We now go back to the study of the system $Ax = b$ of linear equations. One of the most important applications of the orthogonal projection of vectors onto a subspace is to study the relations among, and the structure of, the four fundamental subspaces $N(A)$, $R(A)$, $C(A)$, and $N(A^T)$ of an $m \times n$ matrix $A$.

Lemma 5.19 For any $m \times n$ matrix $A$, the null space $N(A)$ and the row space $R(A)$ are orthogonal in $\mathbb{R}^n$. Similarly, the null space $N(A^T)$ of $A^T$ and the column space $C(A) = R(A^T)$ are orthogonal in $\mathbb{R}^m$.

Proof: Note that $w \in N(A)$ if and only if $Aw = 0$, i.e., $r \cdot w = 0$ for every row vector $r$ of $A$. For the second statement, do the same with $A^T$. □

This lemma shows that $N(A) \perp R(A)$ and $C(A) \perp N(A^T)$; hence $N(A) \subseteq R(A)^\perp$ (or $R(A) \subseteq N(A)^\perp$) and $N(A^T) \subseteq C(A)^\perp$ (or $C(A) \subseteq N(A^T)^\perp$), but the equalities between them do not follow immediately. The next theorem shows that we have equalities in both inclusions, that is, the row space $R(A)$ and the null space $N(A)$ are orthogonal complements of each other, and the column space $C(A)$ and the null space $N(A^T)$ of $A^T$ are orthogonal complements of each other. Note that the above lemma also shows that $N(A) \cap R(A) = \{0\}$ and $C(A) \cap N(A^T) = \{0\}$.

Theorem 5.20 (The second fundamental theorem) For any $m \times n$ matrix $A$,
(1) $N(A) \oplus R(A) = \mathbb{R}^n$,
(2) $N(A^T) \oplus C(A) = \mathbb{R}^m$.

Proof: (1) Since both the row space $R(A)$ and the null space $N(A)$ of $A$ are subspaces of $\mathbb{R}^n$, we have $N(A) + R(A) \subseteq \mathbb{R}^n$ in general. However,
$$\begin{aligned} \dim(N(A) + R(A)) &= \dim N(A) + \dim R(A) - \dim(N(A) \cap R(A)) \\ &= \dim N(A) + \dim R(A) \\ &= \dim N(A) + \operatorname{rank} A \\ &= n = \dim \mathbb{R}^n, \end{aligned}$$
since dim(row space) + dim(null space) = $n$ = number of columns. This means that $N(A) + R(A) = \mathbb{R}^n$. Actually we have $N(A) \oplus R(A) = \mathbb{R}^n$ since $N(A) \cap R(A) = \{0\}$. A similar argument applies to $A^T$ to get (2). □

Corollary 5.21 (1) $N(A) = R(A)^\perp$, and hence $R(A) = N(A)^\perp$.
(2) $N(A^T) = C(A)^\perp$, and hence $C(A) = N(A^T)^\perp$.

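For an $m \times n$ matrix $A$ considered as a linear transformation $A : \mathbb{R}^n \to \mathbb{R}^m$, the decompositions $\mathbb{R}^n = R(A) \oplus N(A)$ and $\mathbb{R}^m = C(A) \oplus N(A^T)$ given in Theorem 5.20 are depicted in the following figure, with $r = \operatorname{rank} A$.

[Figure: $\mathbb{R}^n$ containing $R(A) \cong \mathbb{R}^r$ and $\mathbb{R}^m$ containing $C(A) \cong \mathbb{R}^r$; $A$ sends both $x$ and $x_r$ to $b_c$, and $A^T$ maps $C(A)$ back to $R(A)$.]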

Note that if $\operatorname{rank} A = r$, then $\dim R(A) = r = \dim C(A)$, $\dim N(A) = n - r$ and $\dim N(A^T) = m - r$. The figure shows that for any $b_c$ in the column space $C(A)$, which is the range of $A$, there is an $x \in \mathbb{R}^n$ such that $Ax = b_c$. Now there exist unique $x_r \in R(A)$ and $x_n \in N(A)$ such that $x = x_r + x_n$. Thus $b_c = Ax = A(x_r + x_n) = Ax_r$. Moreover, for any $x' \in N(A)$, $A(x_r + x') = Ax_r = b_c$, since $Ax' = 0$. Therefore, the set of all solutions to $Ax = b_c$ is precisely $x_r + N(A)$, which is the $(n - r)$-dimensional plane parallel to the null space $N(A)$ and passing through $x_r$.
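Here is a small numerical illustration of the solution set $x_r + N(A)$. The $2 \times 3$ matrix below is a hypothetical example (not from the text) of rank $2$, so $b_c$ may be any vector of $\mathbb{R}^2$. The sketch swaps in the Moore-Penrose pseudoinverse `numpy.linalg.pinv`, a tool not developed in this section. It is used because, for a consistent system, its output is the minimum-norm solution, and that is exactly the solution $x_r$ lying in $R(A)$. A basis vector of $N(A)$ is read off from the SVD.

```python
import numpy as np

# Hypothetical rank-2 matrix (not from the text); N(A) is 1-dimensional
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b_c = np.array([6.0, 2.0])          # lies in C(A) = R^2

# Minimum-norm solution of A x = b_c; it is the component x_r in R(A)
x_r = np.linalg.pinv(A) @ b_c

# Basis vector of N(A): last right-singular vector (A has rank 2 < 3)
_, _, Vt = np.linalg.svd(A)
x_n = Vt[-1]

# Every x_r + t * x_n solves A x = b_c
for t in (-2.0, 0.0, 5.0):
    assert np.allclose(A @ (x_r + t * x_n), b_c)
```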
5.8. LEAST SQUARE SOLUTIONS 187

In particular, if $\operatorname{rank} A = m$, then $N(A^T) = \{0\}$ and hence $C(A) = \mathbb{R}^m$. Thus for any $b \in \mathbb{R}^m$, the system $Ax = b$ has solutions of the form $x_r + x_n$, where $x_n \in N(A)$ is arbitrary and $x_r \in R(A)$ is unique (this is the case in the existence Theorem 3.23).
On the other hand, if $\operatorname{rank} A = n \le m$, then $N(A) = \{0\}$ and hence $R(A) = \mathbb{R}^n$. Therefore, the system $Ax = b$ has at most one solution, that is, it has a unique solution $x_r$ in the row space if $b \in C(A)$, and has no solution (that is, the system is inconsistent) if $b \notin C(A)$ (this is the case in the uniqueness Theorem 3.24). The latter case occurs when $m > r = \operatorname{rank} A$, that is, when $N(A^T)$ is a nontrivial subspace of $\mathbb{R}^m$.

Problem 5.21 Show that
(1) if $Ax = b$ and $A^Ty = 0$, then $y^Tb = 0$, and
(2) if $Ax = 0$ and $A^Ty = c$, then $x^Tc = 0$.

Problem 5.22 Given two vectors $(1, 2, 1, 2)$ and $(0, -1, -1, 1)$, find all vectors in $\mathbb{R}^4$ that are perpendicular to them.

Problem 5.23 Find a basis for the orthogonal complement of the row space of A:

(1) A = [ 13 0268]
2 -1 1 ,

5.8 Least square solutions


We consider again a system $Ax = b$ of linear equations. Recall that the system $Ax = b$ has at least one solution if and only if $b$ belongs to the column space $C(A)$ of $A$. In this case, such a solution is unique if and only if the null space $N(A)$ of $A$ is trivial.
Now the question is: what happens if $b \notin C(A) \subseteq \mathbb{R}^m$, so that $Ax = b$ is inconsistent? Note that $Ax \in C(A)$ for any $x \in \mathbb{R}^n$. Thus the best we can do is to find a vector $x_0 \in \mathbb{R}^n$ such that $Ax_0$ is closest to the given vector $b$ in $\mathbb{R}^m$, i.e., $\|Ax_0 - b\|$ is as small as possible. Such a solution vector $x_0$ gives the best approximation $Ax_0$ to $b$, and is called a least square solution of $Ax = b$. Since we have the orthogonal decomposition $\mathbb{R}^m = C(A) \oplus N(A^T)$, we know that for any $b \in \mathbb{R}^m$, $\operatorname{Proj}_{C(A)}(b) = b_c \in C(A)$

is the closest vector to $b$ among the vectors in $C(A)$. Therefore, a least square solution $x_0 \in \mathbb{R}^n$ satisfies the following:
$$Ax_0 = b_c = \operatorname{Proj}_{C(A)}(b), \qquad \|Ax_0 - b\| \le \|Ax - b\|$$
for any vector $x$ in $\mathbb{R}^n$. Since $b_c \in C(A)$, there always exists a least square solution $x_0 \in \mathbb{R}^n$ such that $Ax_0 = b_c$. It is quite easy to show that all other least square solutions are the vectors in $x_0 + N(A)$.
In summary, a least square solution of $Ax = b$, when $b \notin C(A)$, is simply a solution of $Ax = b_c$, where $b_c = \operatorname{Proj}_{C(A)}(b) \in C(A)$ in the unique orthogonal decomposition
$$b = b_c + b_n$$
with $b_n = b - b_c \in N(A^T)$. That is, to find such a least square solution, we first have to find $b_c$ and then solve $Ax_0 = b_c$.
In practice, the computation of $b_c$ from $b$ could be quite complicated, since we first have to find an orthonormal basis for $C(A)$ by using the Gram-Schmidt orthogonalization (whose computation is cumbersome) and then express $b_c$ with respect to this orthonormal basis for a given $b$.
To find an easier method, let us examine a least square solution in a little more detail. If $x_0 \in \mathbb{R}^n$ is a least square solution of $Ax = b$, i.e., a solution of $Ax_0 = b_c$, then $Ax_0 - b = Ax_0 - (b_c + b_n) = -b_n \in N(A^T)$. Thus, by applying $A^T$ to this equation, we get $A^TAx_0 = A^Tb$, i.e., $x_0$ is a solution of the equation
$$A^TAx = A^Tb.$$
This equation is very interesting because it also gives a sufficient condition for a least square solution, as the following theorem shows; it is called the normal equation of $Ax = b$.

Theorem 5.22 Let $A$ be an $m \times n$ matrix, and let $b \in \mathbb{R}^m$ be any vector. Then a vector $x_0 \in \mathbb{R}^n$ is a least square solution of $Ax = b$ if and only if $x_0$ is a solution of the normal equation $A^TAx = A^Tb$.

Proof: We only need to show the sufficiency of the normal equation. If $x_0$ is a solution of the equation $A^TAx = A^Tb$, then $A^T(Ax_0 - b) = 0$, so $Ax_0 - b = Ax_0 - (b_c + b_n) \in N(A^T)$. Since $b_n \in N(A^T)$, this means that the vector $Ax_0 - b_c$ of $C(A)$ satisfies $Ax_0 - b_c \in N(A^T) \cap C(A) = \{0\}$. Therefore $Ax_0 = b_c = \operatorname{Proj}_{C(A)}(b)$, i.e.,

$x_0$ is a least square solution of $Ax = b$. □
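Theorem 5.22 turns the minimization into a linear system. The minimal sketch below assumes that the columns of $A$ are linearly independent, so that $A^TA$ is invertible; the theorem itself does not need this. It solves the normal equation with `numpy.linalg.solve` and compares the answer with NumPy's own least squares routine `numpy.linalg.lstsq`. It also checks that the residual $Ax_0 - b$ lies in $N(A^T)$. The $3 \times 2$ system is a hypothetical example, and `least_squares_normal` is an illustrative name.

```python
import numpy as np

def least_squares_normal(A, b):
    # Solve A^T A x = A^T b; assumes A^T A is invertible
    return np.linalg.solve(A.T @ A, A.T @ b)

# Hypothetical inconsistent system: 3 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

x0 = least_squares_normal(A, b)
residual = A @ x0 - b              # equals -b_n, so A^T residual = 0
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```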

Note that if the rows of $A$ are linearly independent, then $\operatorname{rank} A = m$ and $C(A) = \mathbb{R}^m$ (or $N(A^T) = \{0\}$). Thus, a least square solution of $Ax = b$ is simply a usual solution.

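Example 5.13 Find all the least square solutions to $Ax = b$, and then determine the orthogonal projection $b_c$ of $b$ into the column space $C(A)$ of $A$, where
$$A = \begin{bmatrix} 1 & -2 & 1 \\ 2 & -3 & -1 \\ -1 & 1 & 2 \\ 3 & -5 & 0 \end{bmatrix}, \qquad b=[!]$$

Solution:
$$A^TA = \begin{bmatrix} 1 & 2 & -1 & 3 \\ -2 & -3 & 1 & -5 \\ 1 & -1 & 2 & 0 \end{bmatrix}\begin{bmatrix} 1 & -2 & 1 \\ 2 & -3 & -1 \\ -1 & 1 & 2 \\ 3 & -5 & 0 \end{bmatrix} = \begin{bmatrix} 15 & -24 & -3 \\ -24 & 39 & 3 \\ -3 & 3 & 6 \end{bmatrix},$$
and
$$A^Tb = \begin{bmatrix} 0 \\ -1 \\ 3 \end{bmatrix}.$$
From the normal equation, a least square solution of $Ax = b$ is a solution of $A^TAx = A^Tb$, i.e.,
$$\begin{bmatrix} 15 & -24 & -3 \\ -24 & 39 & 3 \\ -3 & 3 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 3 \end{bmatrix}.$$
By solving this system of equations (left for an exercise), we obtain all the least square solutions desired: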

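$$x = \frac{1}{3}\begin{bmatrix} -8 \\ -5 \\ 0 \end{bmatrix} + t\begin{bmatrix} 5 \\ 3 \\ 1 \end{bmatrix}$$
for any number $t$. Now
$$b_c = Ax = \frac{1}{3}\begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \end{bmatrix}.$$
Note that the vector $x = \frac{1}{3}(-8, -5, 0)^T$ is not in $R(A)$. One needs to do a little more computation to find a least square solution $x \in R(A)$. □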
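The computations of Example 5.13 can be reproduced with NumPy. Once $A^Tb = (0, -1, 3)$ is known, the entries of $b$ itself are no longer needed, so this sketch works directly with the normal equation. The vector $(5, 3, 1)$ spanning $N(A)$ comes from solving $Ax = 0$. The "little more computation" that gives the least square solution in $R(A)$ is done here by subtracting from $\frac{1}{3}(-8, -5, 0)$ its component along $(5, 3, 1)$. This is one possible way to do it, not necessarily the one the authors intend.

```python
import numpy as np

A = np.array([[1, -2, 1],
              [2, -3, -1],
              [-1, 1, 2],
              [3, -5, 0]], dtype=float)
rhs = np.array([0.0, -1.0, 3.0])       # A^T b from the normal equation

M = A.T @ A                            # the matrix A^T A of the example
n = np.array([5.0, 3.0, 1.0])          # spans N(A) = N(A^T A)
x_p = np.array([-8.0, -5.0, 0.0]) / 3  # least square solution with t = 0

# Remove the N(A)-component: the result lies in N(A)^perp = R(A)
x_r = x_p - (x_p @ n) / (n @ n) * n

b_c = A @ x_r                          # the same for every least square solution
```

Because $(5, 3, 1) \in N(A)$, subtracting a multiple of it changes neither $A^TAx$ nor $Ax$. The resulting $x_r$ is therefore still a least square solution, and it is the one lying in $R(A)$.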

Problem 5.24 Find all least square solutions $x$ in $\mathbb{R}^3$ of $Ax = b$, where

A= [ 1o 0 2]
-1
2
1
2
-1 ' b =
[3]
-3

-1 2 0 -3

In practice, finding the solutions of the normal equation depends very much on $A^TA$. In the most fortunate case, if the square matrix $A^TA$ is the identity matrix, then the normal equation $A^TAx = A^Tb$ of the system $Ax = b$ reduces to $x = A^Tb$, which is then a least square solution. Even if $A^TA$ is not the identity matrix, we may still have several simple cases.
Remark: Let us now discuss the solvability of this normal equation. Observe that $A^T : \mathbb{R}^m \to \mathbb{R}^n$, and that the row space $R(A)$ of $A$ and the column space $C(A^T)$ of $A^T$ are the same. Thus, for any $x_r \in C(A^T) = R(A)$ there exists a vector $b \in \mathbb{R}^m$ such that $A^Tb = x_r$. If we write $b = b_c + b_n$ for unique $b_c \in C(A)$ and $b_n \in N(A^T)$, then $x_r = A^Tb = A^Tb_c$. Therefore, the restrictions
$$A|_{R(A)} : R(A) \subseteq \mathbb{R}^n \to C(A) \subseteq \mathbb{R}^m \quad\text{and}\quad A^T|_{C(A)} : C(A) \subseteq \mathbb{R}^m \to R(A) \subseteq \mathbb{R}^n$$
are one-to-one and onto transformations, that is, they are invertible. However, even in this case we have neither $AA^T = I_r$ nor $A^TA = I_r$ in general.
The transpose $A^T$ of a matrix $A$ satisfies the following equation: for $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$, we have $Ax \in \mathbb{R}^m$, so
$$Ax \cdot y = (Ax)^Ty = x^TA^Ty = x \cdot A^Ty.$$

The following theorem gives a condition for AT A to be invertible.



Theorem 5.23 For any $m \times n$ matrix $A$, $A^TA$ is a symmetric $n \times n$ square matrix and $\operatorname{rank}(A^TA) = \operatorname{rank} A$.

Proof: Clearly, $A^TA$ is square and symmetric: $(A^TA)^T = A^T(A^T)^T = A^TA$. Since the number of columns of $A$ and $A^TA$ are both $n$, we have
$$\operatorname{rank} A + \dim N(A) = n = \operatorname{rank}(A^TA) + \dim N(A^TA).$$
Hence, it suffices to show that $A$ and $A^TA$ have exactly the same null space, so that $\dim N(A) = \dim N(A^TA)$. If $x \in N(A)$, then $Ax = 0$ and also $A^TAx = A^T0 = 0$, so that $x \in N(A^TA)$. Conversely, suppose that $A^TAx = 0$. Then
$$\|Ax\|^2 = Ax \cdot Ax = (Ax)^T(Ax) = x^T(A^TAx) = x \cdot A^TAx = x \cdot 0 = 0.$$
Hence $Ax = 0$, i.e., $x \in N(A)$. □

In the following discussion, we assume that the columns of $A$ are linearly independent, i.e., $\operatorname{rank} A = n$, so that $N(A) = \{0\}$, or $A$ is one-to-one. Hence the system $Ax = b_c$ has a unique solution $x$ in $R(A) = \mathbb{R}^n$. Moreover, by Theorem 5.23, the square matrix $A^TA$ is also of rank $n$ and it is invertible. In this case, from the normal equation, a least square solution is
$$x = (A^TA)^{-1}A^Tb.$$

Corollary 5.24 If the columns of $A$ are linearly independent, then
(1) $A^TA$ is invertible, so that $(A^TA)^{-1}A^T$ is a left inverse of $A$,
(2) the vector $x = (A^TA)^{-1}A^Tb$ is the unique least square solution of a system $Ax = b$, and
(3) $Ax = A(A^TA)^{-1}A^Tb$ is the projection $b_c$ of $b$ into the column space $C(A)$.

By applying Corollary 5.24 to $A^T$, we can say that, if $\operatorname{rank} A = m$ for an $m \times n$ matrix $A$, then $AA^T$ is invertible and $A^T(AA^T)^{-1}$ is a right inverse of $A$ (cf. Remark after Theorem 3.24). Moreover, by using Theorem 5.23, we can show that for a matrix $A$, $A^TA$ is invertible if and only if the columns of $A$ are linearly independent, and $AA^T$ is invertible if and only if the rows of $A$ are linearly independent.
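The left and right inverses above are easy to form numerically. In this minimal sketch the $3 \times 2$ matrix is a hypothetical example with independent columns. Its transpose then has independent rows, so the right-inverse formula $A^T(AA^T)^{-1}$ applies to the transpose. The helper names are illustrative only. The sketch uses `numpy.linalg.solve` instead of forming the inverse explicitly, a common numerical choice.

```python
import numpy as np

def left_inverse(A):
    # (A^T A)^{-1} A^T, computed by solving (A^T A) X = A^T;
    # assumes the columns of A are linearly independent
    return np.linalg.solve(A.T @ A, A.T)

def right_inverse(B):
    # B^T (B B^T)^{-1}; assumes the rows of B are linearly independent.
    # (B B^T)^{-1} is symmetric, so this is the transpose of (B B^T)^{-1} B
    return np.linalg.solve(B @ B.T, B).T

A = np.array([[1.0, 2.0],
              [1.0, 5.0],
              [0.0, 1.0]])     # hypothetical, rank 2
L = left_inverse(A)           # 2 x 3, with L @ A = I_2
R = right_inverse(A.T)        # 3 x 2, with A.T @ R = I_2
```

As the Remark above warns, $AL$ is not the $3 \times 3$ identity. It is the projection matrix onto $C(A)$.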

Example 5.14 Consider the following system of linear equations:

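Clearly, the two columns of $A$ are linearly independent and $C(A)$ is the $xy$-plane. Thus $b \notin C(A)$. Note that
$$A^TA = \begin{bmatrix} 2 & 7 \\ 7 & 29 \end{bmatrix},$$
which is invertible. By a simple computation one can obtain
$$(A^TA)^{-1} = \frac{1}{9}\begin{bmatrix} 29 & -7 \\ -7 & 2 \end{bmatrix}.$$
Hence,
$$x = (A^TA)^{-1}A^Tb = \frac{1}{9}\begin{bmatrix} 29 & -7 \\ -7 & 2 \end{bmatrix}\begin{bmatrix} 7 \\ 23 \end{bmatrix} = \frac{1}{9}\begin{bmatrix} 42 \\ -3 \end{bmatrix} = \begin{bmatrix} 14/3 \\ -1/3 \end{bmatrix}$$
is a least square solution, and the orthogonal projection of $b$ in $C(A)$ is
$$b_c = Ax = \begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 14/3 \\ -1/3 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 0 \end{bmatrix}.$$ □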

Problem 5.25 Let $W$ be the subspace of the Euclidean space $\mathbb{R}^3$ spanned by the vectors $v_1 = (1, 1, 2)$ and $v_2 = (1, 1, -1)$. Find $\operatorname{Proj}_W(b)$ for $b = (1, 3, -2)$.

5.9 Application: Polynomial approximations


In this section, one can find a reason for the name "least square" solutions, and the following example illustrates an application of the least square solution to the determination of spring constants in physics.
5.9. APPLICATION: POLYNOMIAL APPROXIMATIONS 193

Example 5.15 Hooke's law for springs in physics says that for a uniform spring, the length stretched or compressed is a linear function of the force applied, that is, the force $F$ applied to the spring is related to the length $x$ stretched or compressed by the equation
$$F = a + kx,$$
where $a$ and $k$ are some constants determined by the spring.

Suppose now that, given a spring of length 6.1 inches, we want to determine the constants $a$ and $k$ from the following experimental data: the lengths are found to be 7.6, 8.7 and 10.4 inches when forces of 2, 4 and 6 kilograms, respectively, are applied to the spring. However, by plotting these data
$$(x, F) = (6.1, 0),\ (7.6, 2),\ (8.7, 4),\ (10.4, 6)$$
in the $xF$-plane, one can easily recognize that they do not lie on a straight line of the form $F = a + kx$, which may be caused by experimental errors. This means that the system of linear equations
$$\begin{cases} F_1 = a + 6.1k = 0, \\ F_2 = a + 7.6k = 2, \\ F_3 = a + 8.7k = 4, \\ F_4 = a + 10.4k = 6 \end{cases}$$
is inconsistent (i.e., it has no solution, so the second equality in each equation may not be a true equality). Thus, the best thing one can do is to determine the straight line that "fits" the data, that is, the line that minimizes the sum of the squares of the vertical distances from the line to the data, i.e., one needs to minimize
$$(F_1 - 0)^2 + (F_2 - 2)^2 + (F_3 - 4)^2 + (F_4 - 6)^2.$$
This quantity is simply the square of the distance between the vector $b = (0, 2, 4, 6)$ in $\mathbb{R}^4$ and the vector $(F_1, F_2, F_3, F_4)$ in the column space $C(A)$ of the $4 \times 2$ matrix
$$A = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix},$$

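since the matrix form of the system of linear equations is
$$A\begin{bmatrix} a \\ k \end{bmatrix} = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}.$$
The minimum of the sum of squares is obtained when $(F_1, F_2, F_3, F_4)$ is the projection of the vector $b = (0, 2, 4, 6)$ into the column space $C(A)$, that is, what we are looking for is the least square solution of the system, which is now easily computed as
$$\begin{bmatrix} a \\ k \end{bmatrix} = x = (A^TA)^{-1}A^Tb \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}.$$
It gives $F = -8.6 + 1.4x$. □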
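The fit in Example 5.15 can be reproduced in a few lines of NumPy by solving the normal equation for the data of the example. Before rounding to one decimal place as in the text, the values are $a = -340.88/39.44 \approx -8.643$ and $k = 56/39.44 \approx 1.420$.

```python
import numpy as np

x = np.array([6.1, 7.6, 8.7, 10.4])      # lengths (inches)
F = np.array([0.0, 2.0, 4.0, 6.0])       # applied forces (kilograms)

# Columns 1 and x, so that A @ [a, k] = a + k x
A = np.column_stack([np.ones_like(x), x])

# Least square solution from the normal equation A^T A [a, k] = A^T F
a, k = np.linalg.solve(A.T @ A, A.T @ F)
```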

In general, a common problem in experimental work is to obtain a polynomial relation $y = f(x)$ between two variables $x$ and $y$ that "fits" the data of various values of $y$ determined experimentally for inputs $x$, say
$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n),$$
plotted in the $xy$-plane. Some possibilities are (1) by a straight line: $y = a + bx$, (2) by a quadratic polynomial: $y = a + bx + cx^2$, or (3) by a polynomial of degree $k$: $y = a_0 + a_1x + \cdots + a_kx^k$, etc.
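As a general case, suppose that we are looking for a polynomial $y = f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_kx^k$ of degree $k$ that passes through the given data; then we obtain a system of linear equations
$$\begin{cases} f(x_1) = a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_kx_1^k = y_1, \\ f(x_2) = a_0 + a_1x_2 + a_2x_2^2 + \cdots + a_kx_2^k = y_2, \\ \quad\vdots \\ f(x_n) = a_0 + a_1x_n + a_2x_n^2 + \cdots + a_kx_n^k = y_n, \end{cases}$$
or, in matrix form, the system may be written as $Ax = b$:
$$\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$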

The left side $Ax$ represents the values of the polynomial at the $x_i$'s, and the right side represents the data obtained from the inputs $x_i$ in the experiment.
If $n \le k + 1$, then the cases have already been discussed in Section 3.8. In general, this kind of system may be inconsistent (i.e., it may have no solution) if $n > k + 1$. This means that there may be no polynomial of degree $k < n - 1$ whose graph passes through the $n$ data points $(x_i, y_i)$ in the $xy$-plane. In practice, this is due to the fact that experimental data usually have some errors.
Thus, the best thing we can do is to find the polynomial $f(x)$ that minimizes the sum of the squares of the vertical distances between the graph of the polynomial and the data. In matrix and vector space language, an inconsistency of the system means that the vector $b \in \mathbb{R}^n$ representing the data is not in the column space $C(A)$ of the coefficient matrix $A$. And minimizing the sum of the squares of the vertical distances between the graph of the polynomial and the data means looking for the least square solution of the system, because for any $c \in C(A)$ of the form
$$\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{bmatrix} = \begin{bmatrix} a_0 + a_1x_1 + \cdots + a_kx_1^k \\ a_0 + a_1x_2 + \cdots + a_kx_2^k \\ \vdots \\ a_0 + a_1x_n + \cdots + a_kx_n^k \end{bmatrix} = c,$$
we have
$$\|b - c\|^2 = (y_1 - a_0 - a_1x_1 - \cdots - a_kx_1^k)^2 + \cdots + (y_n - a_0 - a_1x_n - \cdots - a_kx_n^k)^2.$$

The previous theory says that the orthogonal projection $b_c$ of $b$ into the column space of $A$ minimizes this quantity, and shows how to find $b_c$ and a least square solution $x_0$.
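The recipe above can be sketched in Python: build the matrix with rows $1, x_i, \ldots, x_i^k$, then take a least square solution. The helper name `poly_least_squares` is hypothetical. `numpy.vander` with `increasing=True` produces exactly the coefficient matrix $A$ displayed above, and `numpy.linalg.lstsq` returns a least square solution of $Ax = b$. The usage line fits the straight-line data of Example 5.16, which follows.

```python
import numpy as np

def poly_least_squares(xs, ys, k):
    """Return (a_0, ..., a_k) minimizing sum_i (y_i - a_0 - a_1 x_i - ... - a_k x_i^k)^2."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    A = np.vander(xs, k + 1, increasing=True)   # row i: [1, x_i, x_i^2, ..., x_i^k]
    coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coeffs

# Usage: a straight line (k = 1) through the data of Example 5.16
line = poly_least_squares([1, 2, 3, 4], [0, 3, 4, 4], k=1)
```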

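Example 5.16 Find a straight line $y = a + bx$ that fits the given experimental data $(1, 0)$, $(2, 3)$, $(3, 4)$ and $(4, 4)$, that is, a line $y = a + bx$ that minimizes the sum $\sum_i (y_i - a - bx_i)^2$ of the squares of the vertical distances $|y_i - a - bx_i|$ from the line $y = a + bx$ to the data $(x_i, y_i)$. By adopting the matrix notation
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad x = \begin{bmatrix} a \\ b \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 0 \\ 3 \\ 4 \\ 4 \end{bmatrix},$$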

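we have $Ax = b$ and want to find a least square solution of $Ax = b$. Since the columns of $A$ are linearly independent, the least square solution is $x = (A^TA)^{-1}A^Tb$. Now,
$$A^TA = \begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix}, \quad (A^TA)^{-1} = \begin{bmatrix} 3/2 & -1/2 \\ -1/2 & 1/5 \end{bmatrix}, \quad A^Tb = \begin{bmatrix} 11 \\ 34 \end{bmatrix}.$$
Hence, we have
$$x = \begin{bmatrix} a \\ b \end{bmatrix} = (A^TA)^{-1}A^Tb = \begin{bmatrix} -1/2 \\ 13/10 \end{bmatrix},$$
and $y = -\frac{1}{2} + \frac{13}{10}x$ is the desired line. □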

Problem 5.26 From Newton's second law of motion, a body near the surface of the earth falls vertically downward according to the equation
$$s(t) = s_0 + v_0t + \frac{1}{2}gt^2,$$
where $s(t)$ is the distance that the body has traveled in time $t$, $s_0$ and $v_0$ are the initial displacement and velocity, respectively, of the body, and $g$ is the gravitational acceleration at the earth's surface. Suppose a weight is released, and the distances that the body has fallen from some reference point were measured to be $s = -0.18, 0.31, 1.03, 2.48, 3.73$ feet at times $t = 0.1, 0.2, 0.3, 0.4, 0.5$ seconds, respectively. Determine approximate values of $s_0$, $v_0$, $g$ using these data.

5.10 Orthogonal projection matrices


In Section 5.8, we have seen that the orthogonal projection $\operatorname{Proj}_{C(A)}$ of $\mathbb{R}^m$ on the column space $C(A)$ of an $m \times n$ matrix $A$ plays an important role in finding a least square solution of $Ax = b$. Note that any subspace $W$ of $\mathbb{R}^m$ is the column space of such a matrix $A$, whose columns are the vectors in a basis for $W$. Therefore, in this section, we only consider the orthogonal projection $\operatorname{Proj}_W$ of an inner product space $V$ onto a subspace $W$, and aim to find its associated matrix, called an orthogonal projection matrix, for the projection $\operatorname{Proj}_W$. This will give us a practical way of computing a given orthogonal projection.
5.10. ORTHOGONAL PROJECTION MATRICES 197

First of all, if a subspace $W$ of the Euclidean space $\mathbb{R}^m$ has an orthonormal basis $\beta = \{u_1, u_2, \ldots, u_n\}$, then for any $x \in \mathbb{R}^m$,
$$\begin{aligned} \operatorname{Proj}_W(x) &= (u_1 \cdot x)u_1 + (u_2 \cdot x)u_2 + \cdots + (u_n \cdot x)u_n \\ &= u_1(u_1^Tx) + u_2(u_2^Tx) + \cdots + u_n(u_n^Tx) \\ &= (u_1u_1^T + u_2u_2^T + \cdots + u_nu_n^T)x, \end{aligned}$$
by Lemma 5.11. Note that in this equation, $\operatorname{Proj}_W$ is a linear transformation, but the right side is the usual matrix product of vectors. It implies that if an orthonormal basis for a subspace $W$ is given, the matrix representation (projection matrix) of the orthogonal projection $\operatorname{Proj}_W$ with respect to the standard basis $\alpha$ for $\mathbb{R}^m$ is given as
$$[\operatorname{Proj}_W]_\alpha = u_1u_1^T + u_2u_2^T + \cdots + u_nu_n^T.$$

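Note that if we denote by $\operatorname{Proj}_{u_i}$ the orthogonal projection of $\mathbb{R}^m$ on the subspace spanned by the basis vector $u_i$ for each $i$, then its matrix representation is $u_iu_i^T$ (see page 176). Moreover, by using the matrix representations, it can be shown that
$$\operatorname{Proj}_W = \operatorname{Proj}_{u_1} + \operatorname{Proj}_{u_2} + \cdots + \operatorname{Proj}_{u_n}$$
and
$$\operatorname{Proj}_{u_j} \circ \operatorname{Proj}_{u_i} = \begin{cases} 0 & \text{if } i \ne j, \\ \operatorname{Proj}_{u_i} & \text{if } i = j. \end{cases}$$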
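The formulas above are easy to check numerically. In this minimal sketch, the orthonormal vectors $u_1 = (1, 1, 0)/\sqrt{2}$ and $u_2 = (0, 0, 1)$ spanning a plane $W$ in $\mathbb{R}^3$ are a hypothetical example. `numpy.outer(u, u)` forms the matrix $uu^T$ of $\operatorname{Proj}_u$.

```python
import numpy as np

u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)   # hypothetical orthonormal pair
u2 = np.array([0.0, 0.0, 1.0])

P1 = np.outer(u1, u1)      # matrix of Proj_{u1}
P2 = np.outer(u2, u2)      # matrix of Proj_{u2}
P = P1 + P2                # matrix of Proj_W, W = span{u1, u2}

x = np.array([3.0, 1.0, 5.0])
# Proj_W(x) = (u1 . x) u1 + (u2 . x) u2, computed without matrices
direct = (u1 @ x) * u1 + (u2 @ x) * u2
```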

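Problem 5.27 Let $u = (1/\sqrt{2}, 1/\sqrt{2})$ be a vector in $\mathbb{R}^2$ which determines the 1-dimensional subspace $U = \{au = (a/\sqrt{2}, a/\sqrt{2}) : a \in \mathbb{R}\}$. Show that the matrix
$$A = uu^T = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$
considered as a linear transformation on $\mathbb{R}^2$, is an orthogonal projection onto the subspace $U$.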

Problem 5.28 Show that if $\{v_1, v_2, \ldots, v_m\}$ is an orthonormal basis for $\mathbb{R}^m$, then $v_1v_1^T + v_2v_2^T + \cdots + v_mv_m^T = I_m$.

Definition 5.11 Let $W$ be a subspace of the Euclidean $m$-space $\mathbb{R}^m$. An $m \times m$ matrix $P$ is called the (orthogonal) projection matrix on a subspace $W$ if $\operatorname{Proj}_W(x) = Px$ for any vector $x$ in $\mathbb{R}^m$. Equivalently, $P$ is the matrix representation of the orthogonal projection $\operatorname{Proj}_W$ of $\mathbb{R}^m$ onto $W$ with respect to the standard basis for $\mathbb{R}^m$.

It has already been shown that $u_1u_1^T + u_2u_2^T + \cdots + u_ku_k^T$ is a projection matrix for any orthonormal set $\{u_1, u_2, \ldots, u_k\}$ in $\mathbb{R}^m$. Such an expression of the projection matrix on a subspace $W$ can be obtained only when an orthonormal basis for $W$ is known.
Now, let $W$ be an $n$-dimensional subspace of the Euclidean space $\mathbb{R}^m$, and let $\{v_1, v_2, \ldots, v_n\}$ be a (not necessarily orthonormal) basis for $W$. If we find an orthonormal basis for $W$ by the Gram-Schmidt orthogonalization, then we can get the projection matrix of the previous form. But the Gram-Schmidt orthogonalization process could be cumbersome and tedious, and sometimes one can avoid it. Let $A = [v_1\ v_2\ \cdots\ v_n]$ be the $m \times n$ matrix having the basis vectors $v_i$ as columns. Clearly, we have $W = C(A)$. For any vector $b \in \mathbb{R}^m$, the projection vector $\operatorname{Proj}_W(b)$ is simply the vector $Ax_0$ for a least square solution $x_0$ of $Ax = b$, that is, a solution of the normal equation $A^TAx = A^Tb$.
On the other hand, since the columns of $A$ are linearly independent, $A^TA$ is invertible, so $x_0 = (A^TA)^{-1}A^Tb$, and, furthermore,
$$\operatorname{Proj}_W(b) = Ax_0 = A(A^TA)^{-1}A^Tb$$
by Corollary 5.24. This means that $A(A^TA)^{-1}A^T$ is the projection matrix on the subspace $W = C(A)$. Note that this projection matrix is independent of the choice of basis for $W$, due to the uniqueness of the matrix representation of a linear transformation with respect to a fixed basis. Some possible simple computations for the matrix $A(A^TA)^{-1}A^T$ will follow later. This argument proves the following theorem.

Theorem 5.25 For any subspace $W$ of $\mathbb{R}^m$, the projection matrix $P$ on $W$ can be written as
$$P = A(A^TA)^{-1}A^T$$
for a matrix $A$ whose columns form a basis for $W$.

Example 5.17 Find the projection matrix $P$ on the plane $2x - y - 3z = 0$ in $\mathbb{R}^3$ and calculate $Pb$ for $b = (1, 0, 1)$.

Solution: Choose any basis for the plane $2x - y - 3z = 0$, say,
$$v_1 = (0, 3, -1) \quad\text{and}\quad v_2 = (1, 2, 0).$$


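Let $A = \begin{bmatrix} 0 & 1 \\ 3 & 2 \\ -1 & 0 \end{bmatrix}$ be the matrix with $v_1$ and $v_2$ as columns. Then
$$(A^TA)^{-1} = \begin{bmatrix} 10 & 6 \\ 6 & 5 \end{bmatrix}^{-1} = \frac{1}{14}\begin{bmatrix} 5 & -6 \\ -6 & 10 \end{bmatrix}.$$
The projection matrix is
$$P = A(A^TA)^{-1}A^T = \frac{1}{14}\begin{bmatrix} 0 & 1 \\ 3 & 2 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 5 & -6 \\ -6 & 10 \end{bmatrix}\begin{bmatrix} 0 & 3 & -1 \\ 1 & 2 & 0 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 10 & 2 & 6 \\ 2 & 13 & -3 \\ 6 & -3 & 5 \end{bmatrix},$$
and
$$Pb = \frac{1}{14}\begin{bmatrix} 10 & 2 & 6 \\ 2 & 13 & -3 \\ 6 & -3 & 5 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 16 \\ -1 \\ 11 \end{bmatrix}.$$ □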
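Example 5.17 can be checked with NumPy. This sketch forms $P = A(A^TA)^{-1}A^T$ from the basis $v_1, v_2$ chosen in the solution. It also confirms two further properties. First, $P$ is symmetric and idempotent. Second, $P$ sends the normal vector $(2, -1, -3)$ of the plane to $0$.

```python
import numpy as np

# Columns v1 = (0, 3, -1) and v2 = (1, 2, 0) span the plane 2x - y - 3z = 0
A = np.array([[0.0, 1.0],
              [3.0, 2.0],
              [-1.0, 0.0]])

P = A @ np.linalg.inv(A.T @ A) @ A.T     # projection matrix on C(A)
b = np.array([1.0, 0.0, 1.0])
Pb = P @ b

normal = np.array([2.0, -1.0, -3.0])     # normal vector of the plane
```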

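Remark: In particular, if the columns of $A$ form an orthonormal basis $\alpha = \{u_1, \ldots, u_n\}$ for $W$, then
$$A^TA = \begin{bmatrix} u_1^T \\ \vdots \\ u_n^T \end{bmatrix}\begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} = I_n,$$
since $u_i^Tu_j = \delta_{ij}$. Hence, the normal equation $A^TAx = A^Tb$ becomes
$$x = A^Tb = \begin{bmatrix} u_1^T \\ \vdots \\ u_n^T \end{bmatrix}\begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} \langle u_1, b \rangle \\ \vdots \\ \langle u_n, b \rangle \end{bmatrix},$$
which is just the coordinate expression of $\operatorname{Proj}_W(b)$ with respect to the orthonormal basis $\alpha$ for $W$ formed by the columns of $A$.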

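Corollary 5.26 Suppose that the column vectors $\{u_1, \ldots, u_n\}$ of $A$ form an orthonormal basis for $W$ in $\mathbb{R}^m$. Then we get
$$P = A(A^TA)^{-1}A^T = AA^T = u_1u_1^T + u_2u_2^T + \cdots + u_nu_n^T.$$
In particular, if $A$ is an $m \times m$ orthogonal matrix, then, for all $b \in \mathbb{R}^m$, $Ax = b$ has the unique solution $x = A^{-1}b = A^Tb$.

Proof: For any $x \in \mathbb{R}^m$,
$$Px = AA^Tx = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}\begin{bmatrix} u_1^T \\ \vdots \\ u_n^T \end{bmatrix}x = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}\begin{bmatrix} u_1^Tx \\ \vdots \\ u_n^Tx \end{bmatrix} = (u_1u_1^T + u_2u_2^T + \cdots + u_nu_n^T)x,$$
where each $u_i^Tx$ is a scalar, namely the inner product of $u_i$ and $x$. □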

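Example 5.18 If $A = [c_1\ c_2]$, where $c_1 = (1, 0, 0)$ and $c_2 = (0, 1, 0)$, then the column vectors of $A$ are orthonormal, $C(A)$ is the $xy$-plane, and the projection of $b = (x, y, z) \in \mathbb{R}^3$ onto $C(A)$ is $b_c = (x, y, 0)$. In fact,
$$P = AA^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad Pb = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}.$$ □
Before discussing the computation of $P = [\operatorname{Proj}_W]_\alpha$ with a general basis for $W$, we exhibit a criterion for a square matrix to be a projection matrix.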

Theorem 5.27 A square matrix $P$ is a projection matrix if and only if it is symmetric and idempotent, i.e., $P^T = P$ and $P^2 = P$.

Proof: Let $P$ be a projection matrix. Then, by Theorem 5.25, the matrix $P$ can be written as $P = A(A^TA)^{-1}A^T$ for some matrix $A$ whose columns are linearly independent. A simple expansion of $P = A(A^TA)^{-1}A^T$ gives
$$\begin{aligned} P^T &= \left(A(A^TA)^{-1}A^T\right)^T = A\left((A^TA)^{-1}\right)^TA^T = A(A^TA)^{-1}A^T = P, \\ P^2 &= PP = \left(A(A^TA)^{-1}A^T\right)\left(A(A^TA)^{-1}A^T\right) = A(A^TA)^{-1}A^T = P, \end{aligned}$$
where the first line uses the symmetry of $A^TA$ (Theorem 5.23).

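We have already shown the second equation in Theorem 5.7.
For the converse, we have the orthogonal decomposition $\mathbb{R}^m = C(P) \oplus N(P^T)$ by Theorem 5.20. But $N(P^T) = N(P)$ since $P^T = P$. Note that $P^2 = P$ implies $Pu = u$ for $u \in C(P)$ (see Theorem 5.7), while $Pv = 0$ for $v \in N(P)$. Hence $P$ is the orthogonal projection of $\mathbb{R}^m$ onto $C(P)$. □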

From Corollary 5.8, if $P$ is a projection matrix on $C(P)$, then $I - P$ is also a projection matrix on the null space $N(P)$ ($= C(I - P)$), which is orthogonal to $C(P)$ ($= N(I - P)$).

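Example 5.19 Let $P_i : \mathbb{R}^m \to \mathbb{R}^m$ be defined by
$$P_i(x_1, \ldots, x_m) = (0, \ldots, 0, x_i, 0, \ldots, 0),$$
for $i = 1, \ldots, m$. Then each $P_i$ is the projection of $\mathbb{R}^m$ onto the $i$-th axis, whose matrix is the $m \times m$ matrix with $1$ in the $(i, i)$ entry and $0$ elsewhere:
$$P_i = \begin{bmatrix} 0 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 0 \end{bmatrix}.$$
When we restrict the image to $\mathbb{R}$, $P_i$ is an element of the dual space $(\mathbb{R}^m)^*$, and is usually denoted by $x_i$ as the $i$-th coordinate function (see Example 4.25).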

Problem 5.29 Show that any square matrix $P$ that satisfies $P^TP = P$ is a projection matrix.

In general, if $A = [c_1 \cdots c_n]$ is an $m \times n$ matrix with linearly independent column vectors $c_1, \ldots, c_n$, then $\operatorname{rank} A = n \le m$ and $\{c_1, \ldots, c_n\}$ forms a basis for the column space $W = C(A)$ of dimension $n$ in $\mathbb{R}^m$. By using the Gram-Schmidt orthogonalization, one can obtain an orthonormal basis $\{u_1, \ldots, u_n\}$ for $C(A)$ from this basis, so that the matrix $Q = [u_1 \cdots u_n]$ and $A$ have the same column space $W$. Then, by the Remark on page 199, $\operatorname{Proj}_W = QQ^T$. The computation of the Gram-Schmidt orthogonalization might be messy, but these cases occur frequently in applied science and engineering problems, so we show the process in detail in the following.

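From the Gram-Schmidt orthogonalization,
$$\begin{aligned} q_1 &= c_1, \\ q_2 &= c_2 - \frac{\langle c_2, q_1 \rangle}{\langle q_1, q_1 \rangle}q_1, \\ &\ \ \vdots \\ q_n &= c_n - \frac{\langle c_n, q_{n-1} \rangle}{\langle q_{n-1}, q_{n-1} \rangle}q_{n-1} - \cdots - \frac{\langle c_n, q_1 \rangle}{\langle q_1, q_1 \rangle}q_1 \end{aligned}$$
gives an orthogonal basis $\{q_1, \ldots, q_n\}$ for $C(A)$. By normalizing these vectors, we obtain an orthonormal basis $\{u_1, \ldots, u_n\}$ for $C(A)$, where $u_i = q_i/\|q_i\|$. Rewriting these equations gives us
$$\begin{aligned} c_1 &= \|q_1\|u_1, \\ c_2 &= b_{21}u_1 + b_{22}u_2, \\ &\ \ \vdots \\ c_n &= b_{n1}u_1 + b_{n2}u_2 + \cdots + b_{nn}u_n, \end{aligned}$$
where $a_{ij} = \frac{\langle c_i, q_j \rangle}{\langle q_j, q_j \rangle}$ for $i > j$, $a_{ii} = 1$, and $b_{ij} = a_{ij}\|q_j\|$ for $i \ge j$. Hence,
$$A = [c_1 \cdots c_n] = [u_1 \cdots u_n]\begin{bmatrix} b_{11} & b_{21} & \cdots & b_{n1} \\ 0 & b_{22} & \cdots & b_{n2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & b_{nn} \end{bmatrix} = QR.$$
The matrix $Q = [u_1 \cdots u_n]$ is an $m \times n$ matrix with orthonormal columns, called the orthogonal part of $A$, and
$$R = \begin{bmatrix} b_{11} & b_{21} & \cdots & b_{n1} \\ 0 & b_{22} & \cdots & b_{n2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & b_{nn} \end{bmatrix}$$
is an invertible upper triangular matrix, called the upper triangular part of $A$ (note that all the diagonal entries $b_{ii} = \|q_i\| \ne 0$). Such an expression $A = QR$ is called the QR factorization of an $m \times n$ matrix $A$ when $\operatorname{rank} A = n$. With this decomposition of $A$, the projection matrix can now be calculated easily as
$$P = A(A^TA)^{-1}A^T = QR(R^TQ^TQR)^{-1}R^TQ^T = QQ^T,$$
and $x = (A^TA)^{-1}A^Tb = R^{-1}Q^Tb$.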
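The construction above is the classical Gram-Schmidt process written as a matrix factorization. A minimal Python sketch follows, with `gram_schmidt_qr` as a hypothetical name. The entry $R_{ij} = \langle c_j, u_i \rangle$ computed in the loop equals $b_{ji} = a_{ji}\|q_i\|$ in the notation above. The usage lines apply the sketch to the matrix of Example 5.20, which follows. Classical Gram-Schmidt is shown here because it mirrors the text. Library routines such as `numpy.linalg.qr` use Householder reflections instead, which are numerically more robust, and may return $Q$ and $R$ with some signs flipped.

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR factorization by classical Gram-Schmidt; assumes independent columns."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        q = A[:, j].copy()                 # start from c_j
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]    # <c_j, u_i>
            q -= R[i, j] * Q[:, i]         # subtract the component along u_i
        R[j, j] = np.linalg.norm(q)        # ||q_j|| > 0 by independence
        Q[:, j] = q / R[j, j]              # u_j = q_j / ||q_j||
    return Q, R

A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]])                  # matrix of Example 5.20
Q, R = gram_schmidt_qr(A)
P = Q @ Q.T                                # projection matrix on C(A)

# Least square solution x = R^{-1} Q^T b for a hypothetical right side b
b = np.array([1.0, 2.0, 3.0, 4.0])
x = np.linalg.solve(R, Q.T @ b)
```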

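Example 5.20 Let us find the projection matrix for
$$A = [c_1\ c_2\ c_3] = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$
Solution: We first find the decomposition of $A$ into $Q$ and $R$, the orthogonal part and the upper triangular part:
$$A = [u_1\ u_2\ u_3]\begin{bmatrix} \sqrt{2} & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & \sqrt{3}/\sqrt{2} & 1/\sqrt{6} \\ 0 & 0 & \sqrt{7}/\sqrt{3} \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{6} & -2/\sqrt{21} \\ 1/\sqrt{2} & -1/\sqrt{6} & 2/\sqrt{21} \\ 0 & \sqrt{2}/\sqrt{3} & 2/\sqrt{21} \\ 0 & 0 & \sqrt{3}/\sqrt{7} \end{bmatrix}\begin{bmatrix} \sqrt{2} & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & \sqrt{3}/\sqrt{2} & 1/\sqrt{6} \\ 0 & 0 & \sqrt{7}/\sqrt{3} \end{bmatrix} = QR,$$
and
$$P = QQ^T = \frac{1}{7}\begin{bmatrix} 6 & 1 & 1 & -2 \\ 1 & 6 & -1 & 2 \\ 1 & -1 & 6 & 2 \\ -2 & 2 & 2 & 3 \end{bmatrix}.$$ □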

Problem 5.30 Find the $2 \times 2$ matrix $P$ that projects the $xy$-plane onto the line $y = x$.

Problem 5.31 Find the projection matrix $P$ of $\mathbb{R}^3$ onto the column space $C(A)$ for

A~ [i
Problem 5.32 Find the matrix for the orthogonal projection from $\mathbb{R}^3$ to the plane spanned by the vectors $(1, 1, 1)$ and $(1, 0, 2)$.

Problem 5.33 Find the projection matrix $P$ on the $x_1, x_2, x_4$ coordinate subspace of $\mathbb{R}^4$.

Problem 5.34 Find the QR factorization of the matrix [ ~~~: co; B ] .

5.11 Exercises
5.1. Decide which of the following functions on $\mathbb{R}^2$ are inner products and which are not. For $x = (x_1, x_2)$, $y = (y_1, y_2)$,
(1) $\langle x, y \rangle = x_1y_1x_2y_2$,
(2) $\langle x, y \rangle = 4x_1y_1 + 4x_2y_2 - x_1y_2 - x_2y_1$,
(3) $\langle x, y \rangle = x_1y_2 - x_2y_1$,
(4) $\langle x, y \rangle = x_1y_1 + 3x_2y_2$,
(5) $\langle x, y \rangle = x_1y_1 - x_1y_2 - x_2y_1 + 3x_2y_2$.
5.2. Show that the function $\langle A, B \rangle = \operatorname{tr}(A^TB)$ for $A, B \in M_{n \times n}(\mathbb{R})$ defines an inner product on $M_{n \times n}(\mathbb{R})$.
5.3. Find the angle between the vectors $(4, 7, 9, 1, 3)$ and $(2, 1, 1, 6, 8)$ in $\mathbb{R}^5$.
5.4. Determine the values of $k$ so that the given vectors are orthogonal with respect to the Euclidean inner product in $\mathbb{R}^4$.

5.5. Consider the space $C[0, 1]$ with the inner product defined by
$$\langle f, g \rangle = \int_0^1 f(x)g(x)\,dx.$$
Compute the length of each vector and the cosine of the angle between each pair of vectors in each of the following:
5.11. EXERCISES 205

(1) $f(x) = 1$, $g(x) = x$;
(2) $f(x) = x^m$, $g(x) = x^n$, where $m, n$ are nonnegative integers;
(3) $f(x) = \sin \pi mx$, $g(x) = \sin \pi nx$, where $m, n$ are integers.

5.6. Prove that

for any real numbers $a_1, a_2, \ldots, a_n$. When does equality hold?
5.7. Let $V = P_2([0, 1])$, the space of polynomials of degree $\le 2$ on $[0, 1]$. Equip $V$ with the inner product
$$\langle f, g \rangle = \int_0^1 f(t)g(t)\,dt.$$
(1) Compute $\langle f, g \rangle$ and $\|f\|$ for $f(x) = x + 2$ and $g(x) = x^2 - 2x - 3$.
(2) Find the orthogonal complement of the subspace of scalar polynomials.

5.8. Find an orthonormal basis for ]R3 with the Euclidean inner product by ap-
plying the Gram-Schmidt orthogonalization to the vectors x = (1, 0, 1),
X2 = (1, 0, -1), X3 = (0, 3, 4).

5.9. Show that if u is orthogonal to v, then every scalar multiple of u is also


orthogonal to v. Find a unit vector orthogonal to VI = (1, 1, 2) and V2 =
(0, 1, 3) in ]R3.
5.10. Determine the orthogonal projection of $\mathbf{v}_1$ onto $\mathbf{v}_2$ for the following vectors
in the $n$-space $\mathbb{R}^n$ with the Euclidean inner product.
(1) $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (1, 1, 2)$,
(2) $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 1, -1)$,
(3) $\mathbf{v}_1 = (1, 0, 1, 0)$, $\mathbf{v}_2 = (0, 2, 2, 0)$.
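5.11. Let $S = \{\mathbf{v}_i\}$, where the $\mathbf{v}_i$'s are given below. For each $S$, find a basis for $S^\perp$
with respect to the Euclidean inner product on $\mathbb{R}^n$.
(1) $\mathbf{v}_1 = (0, 1, 0)$, $\mathbf{v}_2 = (0, 0, 1)$,
(2) $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (1, 1, 1)$,
(3) $\mathbf{v}_1 = (1, 0, 1, 2)$, $\mathbf{v}_2 = (1, 1, 1, 1)$, $\mathbf{v}_3 = (2, 2, 0, 1)$.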

5.12. Which of the following matrices are orthogonal?
(1) $\begin{bmatrix} 1/2 & -1/3 \\ -1/2 & 1/3 \end{bmatrix}$, (2) $\begin{bmatrix} 4/5 & -3/5 \\ -3/5 & 4/5 \end{bmatrix}$,

(3)
[ 1jv'2
° -1/v'2 ° -1/v'2]
1/v'2 , (4)
[ 1jv'2
1/~
1/V3
-1/V3
-1/v'G]
1/V6 .
-1/v'2 1/v'2
° 1/V3 2/V6
206 CHAPTER 5. INNER PRODUCT SPACES

5.13. Consider $\mathbb{R}^4$ with the Euclidean inner product. Let $W$ be the subspace of $\mathbb{R}^4$
consisting of all vectors that are orthogonal to both $\mathbf{x} = (1, 0, -1, 1)$ and
$\mathbf{y} = (2, 3, -1, 2)$. Find a basis for $W$.

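5.14. Let $V$ be an inner product space. For vectors $\mathbf{x}$ and $\mathbf{y}$ in $V$, establish the
following identities:
(1) $\langle \mathbf{x}, \mathbf{y} \rangle = \frac14\|\mathbf{x} + \mathbf{y}\|^2 - \frac14\|\mathbf{x} - \mathbf{y}\|^2$ (polarization identity),
(2) $\langle \mathbf{x}, \mathbf{y} \rangle = \frac12\left(\|\mathbf{x} + \mathbf{y}\|^2 - \|\mathbf{x}\|^2 - \|\mathbf{y}\|^2\right)$ (polarization identity),
(3) $\|\mathbf{x} + \mathbf{y}\|^2 + \|\mathbf{x} - \mathbf{y}\|^2 = 2(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2)$ (parallelogram equality).

5.15. Show that $\mathbf{x} + \mathbf{y}$ is perpendicular to $\mathbf{x} - \mathbf{y}$ if and only if $\|\mathbf{x}\| = \|\mathbf{y}\|$.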


5.16. Let $A$ be the $m \times n$ matrix whose columns are $\mathbf{c}_1, \ldots, \mathbf{c}_n$ in $\mathbb{R}^m$. Prove that
the volume of the $n$-dimensional parallelepiped $P(A)$ determined by those
vectors $\mathbf{c}_j$'s in $\mathbb{R}^m$ is given by
$$\operatorname{vol}(A) = \sqrt{\det(A^TA)}.$$
(Note that the volume of the $n$-dimensional parallelepiped determined by $\mathbf{c}_1, \ldots, \mathbf{c}_n$
in $\mathbb{R}^m$ is by definition the product of the volume of the $(n - 1)$-dimensional
parallelepiped (base) determined by $\mathbf{c}_2, \ldots, \mathbf{c}_n$ and the height of
$\mathbf{c}_1$ from the subspace $W$ spanned by $\mathbf{c}_2, \ldots, \mathbf{c}_n$. Here, the height is the
length of the vector $\mathbf{c} = \mathbf{c}_1 - \operatorname{Proj}_W(\mathbf{c}_1)$, which is orthogonal to $W$. If the
vectors are linearly dependent, then the parallelepiped is degenerate, i.e., it
is contained in a subspace of dimension less than $n$.)

5.17. Find the volume of the three-dimensional tetrahedron in $\mathbb{R}^4$ whose vertices
are at $(0, 0, 0, 0)$, $(1, 0, 0, 0)$, $(0, 1, 2, 2)$ and $(0, 0, 1, 2)$.

5.18. For an orthogonal matrix A , show that det A = ±1. Give an example of an
orthogonal matrix A for which det A = -1.

5.19. Find orthonormal bases for the row space and the null space of each of the
following matrices.

(1) [ 21 41 3]
1 , (2) [ -21 -34 0]
1 ,
[1 ~ ~
201 o 0 2 (3)

5.20. Let $A$ be an $m \times n$ matrix of rank $r$. Find a relation among $m$, $n$, and $r$ so that
$A\mathbf{x} = \mathbf{b}$ has infinitely many solutions for every $\mathbf{b} \in \mathbb{R}^m$.
5.21. Find the equation of the straight line that best fits the data of the four points
(0, 1), (1, 3), (2, 4), and (3, 4).
5.22. Find the cubic polynomial that best fits the data of the five points
(-1, -14), (0, -5), (1, -4), (2, 1), and (3, 22).
5.23. Let $W$ be the subspace of $\mathbb{R}^4$ spanned by the vectors $\mathbf{x}_i$'s given in each of the
following problems. Find the projection matrix $P$ for the subspace $W$ and
the null space $N(P)$ of $P$. Compute $P\mathbf{b}$ for $\mathbf{b}$ given in each problem.
(1) $\mathbf{x}_1 = (1, 1, 1, 1)$, $\mathbf{x}_2 = (1, -1, 1, -1)$, $\mathbf{x}_3 = (-1, 1, 1, 0)$, and $\mathbf{b} = (1, 2, 1, 1)$.
(2) $\mathbf{x}_1 = (0, -2, 2, 1)$, $\mathbf{x}_2 = (2, 0, -1, 2)$, and $\mathbf{b} = (1, 1, 1, 1)$.
(3) $\mathbf{x}_1 = (2, 0, 3, -6)$, $\mathbf{x}_2 = (-3, 6, 8, 0)$, and $\mathbf{b} = (-1, 2, -1, 1)$.
5.24. Find the projection matrix for the row space and the null space of each of
the following matrices:

(2) [ 21 41 1]
1 ' (3) [~~ ~ ].
2 3 -1

5.25. Consider the space $C[-1, 1]$ with the inner product defined by
$$\langle f, g \rangle = \int_{-1}^{1} f(x)g(x)\,dx.$$
A function $f \in C[-1, 1]$ is even if $f(-x) = f(x)$, or odd if $f(-x) = -f(x)$.
Let $U$ and $V$ be the sets of all even functions and odd functions in $C[-1, 1]$,
respectively.
(1) Prove that $U$ and $V$ are subspaces and $C[-1, 1] = U + V$.
(2) Prove that $U \perp V$.
(3) Prove that for any $f \in C[-1, 1]$, $\|f\|^2 = \|h\|^2 + \|g\|^2$ where $f = h + g \in U \oplus V$.
5.26. Determine whether the following statements are true or false, in general, and
justify your answers.

(1) Two vectors $\mathbf{x}$ and $\mathbf{y}$ in an inner product space are linearly independent
if and only if the angle between $\mathbf{x}$ and $\mathbf{y}$ is not zero.
(2) If $V$ is perpendicular to $W$, then $V^\perp$ is perpendicular to $W^\perp$.
(3) Every permutation matrix is an orthogonal matrix.
(4) The projection of $\mathbb{R}^m$ on a subspace $W$ is a linear transformation of $\mathbb{R}^m$
into itself.
(5) Two different subspaces of $\mathbb{R}^m$ may have the same projection matrix.
(6) An $n \times n$ symmetric matrix $A$ is a projection matrix if and only if
$A^2 = I$.
(7) For any $m \times n$ matrix $A$ and $\mathbf{b} \in \mathbb{R}^m$, $A^TA\mathbf{x} = A^T\mathbf{b}$ always has a
solution.
(8) An inner product can be defined on every vector space.
(9) Let $V$ be an inner product space. Then $\|\mathbf{x} - \mathbf{y}\| \ge \|\mathbf{x}\| - \|\mathbf{y}\|$ for any
vectors $\mathbf{x}$ and $\mathbf{y}$ in $V$.
(10) The least square solution of $A\mathbf{x} = \mathbf{b}$ is unique for any symmetric matrix
$A$.
(11) Every system of linear equations has a least square solution.
(12) The least square solution of $A\mathbf{x} = \mathbf{b}$ is the orthogonal projection of $\mathbf{b}$
on the column space of $A$.
Chapter 6

Eigenvectors and Eigenvalues

6.1 Introduction
Gaussian elimination plays a fundamental role in solving a system Ax = b
of linear equations. In order to solve a system of linear equations, Gaussian
elimination reduces the augmented matrix to a (reduced) row-echelon form
by using elementary row operations that preserve row and null spaces.
In this chapter, as another method of simplifying a square matrix, we
examine which matrices are similar to diagonal matrices, and what the
transition matrices are in this case. The tools are eigenvalues and eigenvectors.
In fact, they play important roles in their own right in mathematics
and have far-reaching applications not only in mathematics, but also in other
fields of science and engineering. Some specific applications with a square
matrix $A$ are (1) solving systems $A\mathbf{x} = \mathbf{b}$ of linear equations, (2) checking
the invertibility of $A$ or estimating $\det A$, (3) calculating a power $A^n$ or
the limit of a matrix series $\sum_{n=1}^{\infty} A^n$, (4) solving systems of linear differential
equations or difference equations, (5) finding a simple form of the matrix
representation of a linear transformation, etc. One might notice that some
of the problems listed above are easy if $A$ is diagonal.
We begin by introducing eigenvalues and eigenvectors of a square matrix
A. For an n x n square matrix A, there may exist a nonzero vector that is
transformed by A into a scalar multiple of itself.

Definition 6.1 Let $A$ be an $n \times n$ square matrix. A nonzero vector $\mathbf{x}$ in
the $n$-space $\mathbb{R}^n$ is called an eigenvector (or characteristic vector) of $A$

210 CHAPTER 6. EIGENVECTORS AND EIGENVALUES

if there is a scalar $\lambda$ in $\mathbb{R}$ such that
$$A\mathbf{x} = \lambda\mathbf{x}.$$
The scalar $\lambda$ is called an eigenvalue (or characteristic value) of $A$, and
we say $\mathbf{x}$ belongs to $\lambda$.

Geometrically, an eigenvector of a matrix $A$ is a nonzero vector $\mathbf{x}$ in
the $n$-space $\mathbb{R}^n$ such that $A\mathbf{x}$ is parallel to $\mathbf{x}$. Algebraically, an eigenvector $\mathbf{x}$ is
a nontrivial solution of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ of linear
equations, that is, an eigenvector $\mathbf{x}$ is a nonzero vector in the null space
$N(\lambda I - A)$. There are two unknowns in this equation: an eigenvalue $\lambda$ and
an eigenvector $\mathbf{x}$. To find them, we first find an eigenvalue
$\lambda$ by using the fact that the equation $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has a nontrivial solution
$\mathbf{x}$ if and only if $\lambda$ satisfies the equation
$$\det(\lambda I - A) = 0.$$
Note that the left side is a polynomial of degree $n$ in $\lambda$, called the characteristic
polynomial of $A$. Thus the eigenvalues are simply the roots of the
equation $\det(\lambda I - A) = 0$.
Hence, to find the eigenvectors of $A$, first find the roots (the eigenvalues of $A$) of
the equation $\det(\lambda I - A) = 0$, and then solve the homogeneous system
$(\lambda I - A)\mathbf{x} = \mathbf{0}$ for each eigenvalue $\lambda$. In summary, by referring to Theorem 3.25
we have the following theorem.
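For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ the two-step recipe just described can be carried out in closed form, since expanding the determinant gives $\det(\lambda I - A) = \lambda^2 - (a + d)\lambda + (ad - bc)$. The following is a minimal Python sketch of that recipe (standard library only); the function name `eigen_2x2` is our own, floating-point arithmetic is assumed, the exact comparisons with zero are a simplification, and, as in this chapter, only real eigenvalues are handled.

```python
import math

def eigen_2x2(a, b, c, d):
    """Return [(lambda, eigenvector), ...] for A = [[a, b], [c, d]].

    Eigenvalues are the roots of det(lambda*I - A)
    = lambda^2 - (a + d) lambda + (ad - bc); only real roots are handled.
    """
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues; not treated in this chapter")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((tr - root) / 2, (tr + root) / 2):
        # Solve (lambda*I - A)x = 0.  Its first row reads
        # (lambda - a) x1 - b x2 = 0, which (b, lambda - a) satisfies.
        v = (b, lam - a)
        if v == (0, 0):
            # First row is zero; use the second row -c x1 + (lambda - d) x2 = 0.
            v = (lam - d, c)
        if v == (0, 0):
            # A = lambda*I: every nonzero vector is an eigenvector.
            v = (1, 0)
        pairs.append((lam, v))
    return pairs
```

For the matrix of Example 6.1, `eigen_2x2(2, math.sqrt(2), math.sqrt(2), 1)` gives eigenvalues equal to 0 and 3 up to rounding, with eigenvectors proportional to $(-1, \sqrt{2})$ and $(\sqrt{2}, 1)$.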

Theorem 6.1 If $A$ is an $n \times n$ matrix and $\lambda$ is a scalar, then the following are equivalent:
(1) $\lambda$ is an eigenvalue of $A$;
(2) $\det(\lambda I - A) = 0$ (or $\det(A - \lambda I) = 0$);
(3) $\lambda I - A$ is singular;
(4) the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has a nontrivial solution.

Hence, the eigenvectors of $A$ belonging to an eigenvalue $\lambda$ are just the
nonzero vectors $\mathbf{x}$ in the null space $N(\lambda I - A)$. We call this null space the
eigenspace of $A$ belonging to $\lambda$, and denote it $E(\lambda)$.

Example 6.1 Find the eigenvalues and eigenvectors of
$$A = \begin{bmatrix} 2 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix}.$$
6.1. INTRODUCTION 211

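Solution: The characteristic polynomial is
$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - 2 & -\sqrt{2} \\ -\sqrt{2} & \lambda - 1 \end{bmatrix} = \lambda^2 - 3\lambda = \lambda(\lambda - 3),$$
whence the eigenvalues are $\lambda_1 = 0$ and $\lambda_2 = 3$. To determine the eigenvectors
belonging to the $\lambda_i$'s, we should solve the homogeneous system of equations
$(\lambda_i I - A)\mathbf{x} = \mathbf{0}$. Let us take $\lambda_1 = 0$ first; then the system of equations
$(\lambda_1 I - A)\mathbf{x} = \mathbf{0}$ becomes
$$\begin{cases} -2x_1 - \sqrt{2}\,x_2 = 0, \\ -\sqrt{2}\,x_1 - x_2 = 0, \end{cases} \quad \text{or} \quad x_2 = -\sqrt{2}\,x_1.$$
Hence, $\mathbf{x}_1 = (x_1, x_2) = (-1, \sqrt{2})$ is an eigenvector belonging to $\lambda_1 = 0$,
and the eigenvectors of $A$ belonging to $\lambda_1 = 0$ are nonzero vectors of the
form $t\mathbf{x}_1$, $t \in \mathbb{R}$. (Here, we can take any nonzero solution $(x_1, x_2)$ as an
eigenvector $\mathbf{x}_1$ belonging to $\lambda_1 = 0$.)
For $\lambda_2 = 3$, the system of equations $(\lambda_2 I - A)\mathbf{x} = \mathbf{0}$ becomes
$$\begin{cases} x_1 - \sqrt{2}\,x_2 = 0, \\ -\sqrt{2}\,x_1 + 2x_2 = 0, \end{cases} \quad \text{or} \quad x_1 = \sqrt{2}\,x_2.$$
Thus, by a similar calculation, $\mathbf{x}_2 = (\sqrt{2}, 1)$ is an eigenvector belonging to
$\lambda_2 = 3$ and the eigenvectors of $A$ belonging to $\lambda_2 = 3$ are the nonzero vectors
of the form $t\mathbf{x}_2$, $t \in \mathbb{R}$. Note that the eigenvectors $\mathbf{x}_1$ and $\mathbf{x}_2$ belonging to
the eigenvalues $\lambda_1$ and $\lambda_2$ are linearly independent. □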

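Example 6.2 Find a basis for the eigenspaces of
$$A = \begin{bmatrix} 3 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}.$$
Solution: The characteristic polynomial of $A$ is $(\lambda - 1)(\lambda - 5)^2$, so that
the eigenvalues of $A$ are $\lambda_1 = 1$ and $\lambda_2 = 5$ with multiplicity 2. Thus, there
are two eigenspaces of $A$. By definition, $\mathbf{x} = (x_1, x_2, x_3)$ is an eigenvector of
$A$ belonging to $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution of the homogeneous
system $(\lambda I - A)\mathbf{x} = \mathbf{0}$:
$$\begin{bmatrix} \lambda - 3 & 2 & 0 \\ 2 & \lambda - 3 & 0 \\ 0 & 0 & \lambda - 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
If $\lambda_1 = 1$, then the system becomes
$$\begin{bmatrix} -2 & 2 & 0 \\ 2 & -2 & 0 \\ 0 & 0 & -4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Solving this system yields $x_1 = t$, $x_2 = t$, $x_3 = 0$ for $t \in \mathbb{R}$. Thus, the
eigenvectors belonging to $\lambda_1 = 1$ are nonzero vectors of the form
$$\mathbf{x} = t\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},$$
so that $(1, 1, 0)$ is a basis for the eigenspace $E(\lambda_1)$ belonging to $\lambda_1 = 1$.
If $\lambda_2 = 5$, then the system becomes
$$\begin{bmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Solving this system yields $x_1 = -s$, $x_2 = s$, $x_3 = t$ for $s, t \in \mathbb{R}$. Thus, the
eigenvectors of $A$ belonging to $\lambda_2 = 5$ are nonzero vectors of the form
$$\mathbf{x} = s\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
for $s, t \in \mathbb{R}$. Since $(-1, 1, 0)$ and $(0, 0, 1)$ are linearly independent, they
form a basis for the eigenspace $E(\lambda_2)$ belonging to $\lambda_2 = 5$. □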
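The answer to Example 6.2 can be double-checked by multiplying out $A\mathbf{x}$ for each basis vector found and comparing it with $\lambda\mathbf{x}$. A short Python sketch in exact integer arithmetic (the helper name `matvec` is our own):

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[3, -2, 0],
     [-2, 3, 0],
     [0, 0, 5]]

# (eigenvalue, basis vector of its eigenspace) as found in Example 6.2
claimed = [(1, [1, 1, 0]), (5, [-1, 1, 0]), (5, [0, 0, 1])]

for lam, x in claimed:
    assert matvec(A, x) == [lam * xi for xi in x], (lam, x)
print("all eigenpairs check out")
```

This only confirms that each vector is an eigenvector; that $E(5)$ is two-dimensional still comes from the rank computation in the solution.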

Example 6.3 Consider the matrix A = [~ ~]. As in the above example,
a simple computation shows that the characteristic polynomial is $(\lambda - 1)^2$,
so that the eigenvalue $\lambda = 1$ has multiplicity 2. However, there is only one
linearly independent eigenvector $\mathbf{x} = (1, 0)$ belonging to $\lambda = 1$. This kind
of matrix will be discussed in Chapter 7. □

Note that the equation $\det(\lambda I - A) = 0$ may have complex roots, which
are called complex eigenvalues. However, complex numbers are not
scalars of a real vector space. In many cases, it is necessary to deal with
those complex numbers, that is, we need to expand the set of scalars to
include complex numbers. This expansion of the set of scalars to the set of
complex numbers leads us to work with complex vector spaces, which will
be treated in the next chapter. In this chapter, we restrict the discussion to
the case of real eigenvalues, even though the entire discussion in this chapter
applies in the same way to general complex vector spaces.

Example 6.4 The characteristic polynomial of the matrix
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
is $\lambda^2 - 2\cos\theta\,\lambda + (\cos^2\theta + \sin^2\theta)$. Thus, the eigenvalues are $\lambda = \cos\theta \pm i\sin\theta$,
which are complex numbers, so this matrix, as a rotation of $\mathbb{R}^2$, has no real
eigenvalues unless $\theta = n\pi$, $n = 0, \pm 1, \pm 2, \ldots$. □
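The same conclusion can be seen from the discriminant of the characteristic polynomial $\lambda^2 - 2\cos\theta\,\lambda + 1$, namely $4\cos^2\theta - 4 = -4\sin^2\theta$, which is negative unless $\sin\theta = 0$. A minimal Python sketch using the standard `cmath` module to compute both roots (the function name is our own):

```python
import cmath
import math

def rotation_eigenvalues(theta):
    """Both roots of lambda^2 - 2 cos(theta) lambda + 1, the characteristic
    polynomial of the rotation matrix by the angle theta."""
    p = -2 * math.cos(theta)
    root = cmath.sqrt(p * p - 4)       # discriminant = -4 sin^2(theta) <= 0
    return (-p + root) / 2, (-p - root) / 2

for theta in (math.pi / 3, math.pi / 2, math.pi):
    print(theta, rotation_eigenvalues(theta))
```

For $\theta = \pi/3$ the roots are $\frac12 \pm \frac{\sqrt3}{2}i$, while for $\theta = \pi$ both roots equal $-1$ (with zero imaginary part), in line with the exceptional case $\theta = n\pi$.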

Problem 6.1 Let $A$ be a $2 \times 2$ matrix whose characteristic polynomial is $\det(\lambda I - A) = \lambda^2 + b\lambda + c$. Show that $b = -\operatorname{tr}A$ and $c = \det A$.

Problem 6.2 Let $\lambda$ be an eigenvalue of $A$ and $\mathbf{x}$ an eigenvector belonging to $\lambda$.
Use mathematical induction to show that $\lambda^m$ is an eigenvalue of $A^m$ and $\mathbf{x}$ is an
eigenvector of $A^m$ belonging to $\lambda^m$ for each $m = 1, 2, \ldots$.

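Remark: (1) If $A$ is an upper triangular matrix, then the diagonal entries are exactly the eigenvalues of $A$. In fact, the characteristic polynomial
satisfies
$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - a_{11} & * & \cdots & * \\ 0 & \lambda - a_{22} & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda - a_{nn} \end{bmatrix} = (\lambda - a_{11})\cdots(\lambda - a_{nn}).$$
(2) Let $A$ and $B$ be square matrices similar to each other (i.e., there
exists a nonsingular matrix $Q$ such that $B = Q^{-1}AQ$). Then
$$\begin{aligned} \det(\lambda I - B) &= \det\left(Q^{-1}(\lambda I)Q - Q^{-1}AQ\right) = \det\left(Q^{-1}(\lambda I - A)Q\right) \\ &= \det Q^{-1}\,\det(\lambda I - A)\,\det Q = \det(\lambda I - A). \end{aligned}$$
Therefore, similar matrices have the same characteristic polynomial, and hence the
same roots. In particular, the eigenvalues are invariant under similarity.
However, their eigenvectors might be different: when $B = Q^{-1}AQ$, $\mathbf{x}$ is
an eigenvector of $B$ belonging to $\lambda$ if and only if $Q\mathbf{x}$ is an eigenvector of $A$
belonging to $\lambda$, since $AQ = QB$ gives $A(Q\mathbf{x}) = Q(B\mathbf{x})$.
(3) If an $n \times n$ matrix $A$ has $n$ eigenvalues $\lambda_1, \ldots, \lambda_n$ counting multiplicities, then the characteristic polynomial of $A$ can be factorized as
$$\det(\lambda I - A) = (\lambda - \lambda_1)\cdots(\lambda - \lambda_n).$$
If we take $\lambda = 0$, then we get $\det(-A) = (-1)^n\det A = (-1)^n\lambda_1\cdots\lambda_n$.
Therefore, $\det A = \lambda_1\cdots\lambda_n$, the product of the $n$ eigenvalues.
(4) On the other hand,
$$(\lambda - \lambda_1)\cdots(\lambda - \lambda_n) = \det(\lambda I - A) = \det\begin{bmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{bmatrix},$$
which is a polynomial of the form $p(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$ in
$\lambda$. We can compute the coefficient $c_{n-1}$ of $\lambda^{n-1}$ in two ways by expanding
both sides: on the left it is $-(\lambda_1 + \cdots + \lambda_n)$, and on the right only the product of the diagonal entries contributes a $\lambda^{n-1}$ term, giving $-(a_{11} + \cdots + a_{nn})$. Hence
$$\lambda_1 + \cdots + \lambda_n = a_{11} + \cdots + a_{nn} = \operatorname{tr}A.$$
This shows that the trace of $A$ is the sum of the $n$ eigenvalues. Thus we
have the following theorem: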

Theorem 6.2 Let A be an n x n square matrix. Then


(1) the eigenvalues are invariant under the similarity.
Moreover, if A has n eigenvalues counting multiplicities, then
(2) the determinant of A is the product of the n eigenvalues, and
(3) the trace of A is the sum of the n eigenvalues.
In Theorem 6.2 (2)-(3), we assume that the matrix A has n (real) eigen-
values counting multiplicities. But, by allowing the scalars to be complex
numbers, which will be done in the next chapter, every n x n matrix has n
eigenvalues counting multiplicities, so that (2) and (3) in Theorem 6.2 are
true for any square matrix.
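As a quick numerical check of Theorem 6.2 (2) and (3), take the matrix of Example 6.2, whose eigenvalues counting multiplicities are $1, 5, 5$:
$$A = \begin{bmatrix} 3 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}: \qquad \operatorname{tr}A = 3 + 3 + 5 = 11 = 1 + 5 + 5, \qquad \det A = 5\,\big(3\cdot 3 - (-2)(-2)\big) = 25 = 1\cdot 5\cdot 5.$$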

Corollary 6.3 The determinant and the trace of A are invariant under sim-
ilarity.

Recall that a square matrix $A$ is singular if and only if $\det A = 0$.
Since $\det A$ is the product of its $n$ eigenvalues, a square matrix
$A$ is singular if and only if zero is an eigenvalue of $A$; equivalently, $A$ is invertible if
and only if zero is not an eigenvalue of $A$. The following corollaries are easy
consequences of this fact.

Corollary 6.4 For any n x n matrices A and B, the following are equivalent.
(1) Zero is an eigenvalue of AB.
(2) A or B is singular.
(3) Zero is an eigenvalue of BA.

Corollary 6.5 For any n x n matrices A and B, AB and B A have the same
eigenvalues.

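Proof: By Corollary 6.4, zero is an eigenvalue of $AB$ if and only if it is an
eigenvalue of $BA$. Let $\lambda$ be a nonzero eigenvalue of $AB$ with $(AB)\mathbf{x} = \lambda\mathbf{x}$
for a nonzero vector $\mathbf{x}$. Then the vector $B\mathbf{x}$ is not zero, since otherwise
$\lambda\mathbf{x} = A(B\mathbf{x}) = \mathbf{0}$ with $\lambda \neq 0$ and $\mathbf{x} \neq \mathbf{0}$, which is impossible. Moreover,
$$(BA)(B\mathbf{x}) = B(AB\mathbf{x}) = B(\lambda\mathbf{x}) = \lambda(B\mathbf{x}).$$
This means that $B\mathbf{x}$ is an eigenvector of $BA$ belonging to the eigenvalue $\lambda$,
and $\lambda$ is an eigenvalue of $BA$. Similarly, any nonzero eigenvalue of $BA$ is
also an eigenvalue of $AB$. □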
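For $2 \times 2$ matrices, Corollary 6.5 can be checked directly: the characteristic polynomial of a $2 \times 2$ matrix $M$ is $\lambda^2 - (\operatorname{tr}M)\lambda + \det M$, and $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, $\det(AB) = \det(BA)$. The Python sketch below compares the two characteristic polynomials for random integer matrices; it is an illustration for $n = 2$ only, not a proof, and the helper names are our own.

```python
import random

def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def char_poly_2x2(M):
    """Coefficients of det(lambda*I - M) = lambda^2 - (tr M) lambda + det M."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

random.seed(0)
for _ in range(1000):
    A = [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
    B = [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
    assert char_poly_2x2(mat_mul(A, B)) == char_poly_2x2(mat_mul(B, A))

# Same eigenvalues, but not necessarily similar:
A, B = [[1, 0], [0, 0]], [[0, 1], [0, 0]]
print(mat_mul(A, B), mat_mul(B, A))   # AB = B, BA = 0
```

In the last example $AB = B \neq 0$ but $BA = 0$; both have characteristic polynomial $\lambda^2$, yet they are not similar, since only the zero matrix is similar to the zero matrix. So the nonsingularity hypothesis in Problem 6.6 cannot simply be dropped.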

Problem 6.3 Find matrices $A$ and $B$ such that $\det A = \det B$ and $\operatorname{tr}A = \operatorname{tr}B$, but
$A$ is not similar to $B$.

Problem 6.4 Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of an $n \times n$ matrix $A$. Show that
(1) $A$ is invertible if and only if $\lambda_i \neq 0$ for all $i = 1, 2, \ldots, n$;
(2) if $A$ is invertible, then the inverse $A^{-1}$ has eigenvalues $1/\lambda_1, 1/\lambda_2, \ldots, 1/\lambda_n$.

Problem 6.5 Show that A and AT have the same eigenvalues. Do they necessarily
have the same eigenvectors?

Problem 6.6 For any n x n matrices A and B, show that AB and BA are similar if
A or B is nonsingular.

6.2 Diagonalization of matrices


In this section, we are going to show what kind of square matrices are similar
to diagonal matrices. That is, given a square matrix $A$, we want to know
whether there exists an invertible matrix $Q$ such that $Q^{-1}AQ$ is a diagonal
matrix, and if so, how one can find such a matrix $Q$. This question is closely related to
the eigenvalues and eigenvectors of $A$.
Recall that an $n \times n$ matrix $A$ defines a linear transformation on the $n$-space
$\mathbb{R}^n$, whose matrix representation with respect to the standard basis is the
matrix $A$ itself. If we take another basis for $\mathbb{R}^n$, then we get another matrix
representation $D$, which is similar to $A$ by a transition matrix $Q$ as shown
in Section 4.6. In some cases, one can find a "good" basis for $\mathbb{R}^n$ so that the
matrix $D$ is a diagonal matrix. In this case, the similarity $D = Q^{-1}AQ$ gives
an easy way to solve some problems related to the matrix $A$. For instance, let
$A\mathbf{x} = \mathbf{b}$ be a system of linear equations with a square matrix $A$, and suppose
that there is an invertible matrix $Q$ such that $Q^{-1}AQ$ is a diagonal matrix $D$.
Then the system $A\mathbf{x} = \mathbf{b}$ can be written as $QDQ^{-1}\mathbf{x} = \mathbf{b}$, or equivalently
$DQ^{-1}\mathbf{x} = Q^{-1}\mathbf{b}$. Hence, for $\mathbf{c} = Q^{-1}\mathbf{b}$ the solution $\mathbf{y}$ of $D\mathbf{y} = \mathbf{c}$ yields the
solution $\mathbf{x} = Q\mathbf{y}$ of the original problem. Note that $D\mathbf{y} = \mathbf{c}$ can be solved
easily, one coordinate at a time.
In this section, we shall discuss what we mean by a "good" basis and
how to find it.
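The procedure just described (compute $\mathbf{c} = Q^{-1}\mathbf{b}$, solve $D\mathbf{y} = \mathbf{c}$ entrywise, return $\mathbf{x} = Q\mathbf{y}$) can be sketched in Python with exact rational arithmetic from the standard `fractions` module. For illustration we assume the matrix $A = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix}$, whose eigenvalues are $5$ and $-2$ with eigenvectors $(1, 1)$ and $(-4, 3)$, and a right-hand side $\mathbf{b} = (9, 7)$ chosen by us; the function names are hypothetical.

```python
from fractions import Fraction

def mat_vec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inverse_2x2(M):
    """Exact inverse of an invertible 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def solve_by_diagonalization(Q, eigenvalues, b):
    """Solve A x = b, given Q with Q^{-1} A Q = diag(eigenvalues).

    Assumes no eigenvalue is zero (i.e. A is invertible)."""
    c = mat_vec(inverse_2x2(Q), b)                       # c = Q^{-1} b
    y = [ci / lam for ci, lam in zip(c, eigenvalues)]    # D y = c, entrywise
    return mat_vec(Q, y)                                 # x = Q y

A = [[1, 4], [3, 2]]
Q = [[1, -4], [1, 3]]          # columns: eigenvectors for 5 and -2
x = solve_by_diagonalization(Q, [5, -2], [9, 7])
assert mat_vec(A, x) == [9, 7]
print(x)
```

The only "hard" step, inverting $Q$, is done once; after that each new right-hand side costs two matrix-vector products and $n$ divisions.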

Definition 6.2 A square matrix $A$ is said to be diagonalizable if there
exists an invertible matrix $Q$ such that $Q^{-1}AQ$ is a diagonal matrix (i.e., $A$
is similar to a diagonal matrix).

The next theorem characterizes a diagonalizable matrix, and the proof


shows a practical way of diagonalizing a matrix.

Theorem 6.6 Let A be an n x n matrix. Then A is diagonalizable if and


only if A has n linearly independent eigenvectors.

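Proof: ($\Rightarrow$) Suppose $A$ is diagonalizable. Then there is an invertible matrix
$Q$ such that $Q^{-1}AQ$ is a diagonal matrix $D$, say
$$Q^{-1}AQ = D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix},$$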
6.2. DIAGONALIZATION OF MATRICES 217

or, equivalently, $AQ = QD$. Let $\mathbf{x}_1, \ldots, \mathbf{x}_n$ denote the column vectors of
$Q$. Since
$$AQ = [A\mathbf{x}_1\ A\mathbf{x}_2\ \cdots\ A\mathbf{x}_n], \qquad QD = [\lambda_1\mathbf{x}_1\ \lambda_2\mathbf{x}_2\ \cdots\ \lambda_n\mathbf{x}_n],$$
the matrix equation $AQ = QD$ implies $A\mathbf{x}_i = \lambda_i\mathbf{x}_i$ for $i = 1, \ldots, n$. Moreover,
since $Q$ is invertible, its column vectors are nonzero and linearly
independent, that is, the $\mathbf{x}_i$'s are $n$ linearly independent eigenvectors of $A$.
($\Leftarrow$) Assume that $A$ has $n$ linearly independent eigenvectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$
belonging to the eigenvalues $\lambda_1, \ldots, \lambda_n$, respectively, so that $A\mathbf{x}_i = \lambda_i\mathbf{x}_i$
for $i = 1, \ldots, n$. If we define a matrix $Q$ as
$$Q = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_n]$$
with $\mathbf{x}_j$ as the $j$-th column vector, then the same equation shows $AQ = QD$,
where $D$ is the diagonal matrix having the eigenvalues $\lambda_1, \ldots, \lambda_n$
on the diagonal. Since the column vectors of $Q$ are assumed to be linearly
independent, $Q$ is invertible, so $Q^{-1}AQ = D$. □

Remark: (1) Theorem 6.6 says that to diagonalize a matrix $A$ what we
need is a basis consisting of eigenvectors of $A$; this is what we meant by a "good"
basis. Moreover, the proof of the theorem reveals a method for diagonalizing
an $n \times n$ matrix $A$.
Step 1. Find $n$ linearly independent eigenvectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ of $A$.
Step 2. Form the matrix $Q$ having $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ as its column vectors.
Step 3. The matrix $Q^{-1}AQ$ will be a diagonal matrix with $\lambda_1, \ldots, \lambda_n$ as
its successive diagonal entries, where $\lambda_j$ is the eigenvalue associated
with the eigenvector $\mathbf{x}_j$, $j = 1, 2, \ldots, n$.
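Steps 2 and 3 can be sketched in Python with exact fractions: given the eigenvectors from Step 1, place them as the columns of $Q$ and compute $Q^{-1}AQ$. The inverse is computed here by Gauss-Jordan elimination, a choice made for this sketch rather than taken from the text; the sample data are the matrix $A = \begin{bmatrix} 3 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}$ and its eigenvectors $(1, 1, 0)$, $(-1, 1, 0)$, $(0, 0, 1)$ from Example 6.2, and the function names are our own.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(M):
    """Exact inverse via Gauss-Jordan elimination on [M | I].

    Raises ValueError if M is singular (e.g. dependent eigenvectors)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def diagonalize(A, eigenvectors):
    """Step 2: Q has the eigenvectors as columns.  Step 3: return Q^{-1} A Q."""
    n = len(A)
    Q = [[eigenvectors[j][i] for j in range(n)] for i in range(n)]
    return Q, mat_mul(mat_mul(inverse(Q), A), Q)

A = [[3, -2, 0], [-2, 3, 0], [0, 0, 5]]
Q, D = diagonalize(A, [(1, 1, 0), (-1, 1, 0), (0, 0, 1)])
print(D)
```

The result is the diagonal matrix with entries $1, 5, 5$, in the same order as the eigenvectors placed in $Q$.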
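(2) Let $\alpha$ denote the standard basis for $\mathbb{R}^n$ and let $\beta = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$
be the basis for $\mathbb{R}^n$ consisting of $n$ linearly independent eigenvectors of $A$.
Then the matrix
$$Q = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_n]$$
is the transition matrix from $\beta$ to $\alpha$, and the matrix representation of $A$, as
a linear transformation, with respect to $\beta$ is
$$[A]_\beta = [\mathrm{Id}]_\alpha^\beta\,[A]_\alpha\,[\mathrm{Id}]_\beta^\alpha = Q^{-1}AQ = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.$$
Note that the diagonal entries $\lambda_j$'s are the eigenvalues of $A$.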


(3) Not all matrices are diagonalizable. A standard example is A =
[~ ~]. Its eigenvalues are $\lambda_1 = \lambda_2 = 0$. Hence, if $A$ is diagonalizable,
then
$$Q^{-1}AQ = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
for some invertible matrix $Q$, and then $A = Q\,0\,Q^{-1}$ must be the zero matrix. Since
$A$ is not the zero matrix, there is no invertible matrix $Q$ such that
$Q^{-1}AQ$ is diagonal.

Example 6.5 Diagonalize the following matrix:
$$A = \begin{bmatrix} 1 & -3 & 3 \\ 0 & -5 & 6 \\ 0 & -3 & 4 \end{bmatrix}.$$
Solution: A direct calculation gives that the eigenvalues of $A$ are $\lambda_1 = 1$,
$\lambda_2 = 1$ and $\lambda_3 = -2$, and their associated eigenvectors are
$$\mathbf{x}_1 = (1, 0, 0), \quad \mathbf{x}_2 = (0, 1, 1) \quad \text{and} \quad \mathbf{x}_3 = (1, 2, 1),$$
respectively. They are linearly independent, and the first two vectors $\mathbf{x}_1, \mathbf{x}_2$
form a basis for the eigenspace $E(1)$ belonging to $\lambda_1 = \lambda_2 = 1$, and $\mathbf{x}_3$ forms
a basis for the eigenspace $E(-2)$ belonging to $\lambda_3 = -2$. Thus, the matrix
$$Q = [\mathbf{x}_1\ \mathbf{x}_2\ \mathbf{x}_3] = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 1 & 1 \end{bmatrix}$$
diagonalizes $A$. In fact, one can verify that
$$Q^{-1}AQ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}.$$
What would happen if one chose different eigenvectors belonging to the
eigenvalues $1$ and $-2$? According to the proof of Theorem 6.6, nothing would
happen: any matrix whose columns are linearly independent eigenvectors
will diagonalize $A$. For example, $\{(-1, 0, 0), (0, -1, -1)\}$ is another basis
for $E(1)$, and $\{(2, 4, 2)\}$ is also a basis for $E(-2)$. The matrix
$$Q = \begin{bmatrix} -1 & 0 & 2 \\ 0 & -1 & 4 \\ 0 & -1 & 2 \end{bmatrix}$$
also diagonalizes $A$ as $Q^{-1}AQ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$.
A change in the order of the eigenvectors in constructing a transition matrix
$Q$ does not change the diagonalizability of $A$, but the eigenvalues appearing
on the main diagonal of the resulting diagonal matrix appear in
accordance with the order of the eigenvectors in the transition matrix. For
example, let

s~ [~ ~
Then S will diagonalize A, because it has linearly independent eigenvectors
as columns, but we find that

Problem 6.8 Find a 2 x 2 matrix A whose eigenvalues are 2 and 3, and whose
eigenvectors are (2, 1) and (3, 2), respectively.

Theorem 6.6 shows how to diagonalize a matrix and what the diagonal
matrix is when the matrix has a full set of linearly independent eigenvectors.
We next consider when a square matrix $A$ can have a full set of linearly
independent eigenvectors. This problem is closely related to the eigenvalues
of the matrix, because in practice the eigenvectors are found only after the
eigenvalues have been computed.
The following theorem shows that if an n x n matrix has n distinct (real)
eigenvalues, then it has n linearly independent eigenvectors so that it is
always diagonalizable and the diagonal matrix has the eigenvalues on the
main diagonal.

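Theorem 6.7 Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be distinct eigenvalues of a matrix $A$
and $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ eigenvectors belonging to them, respectively. Then
$\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ is linearly independent.

Proof: Let $r$ be the largest integer such that $\{\mathbf{x}_1, \ldots, \mathbf{x}_r\}$ is linearly
independent. If $r = k$, then there is nothing to prove. Suppose not, i.e.,
$1 \le r < k$. Then $\{\mathbf{x}_1, \ldots, \mathbf{x}_{r+1}\}$ is linearly dependent. Thus, there are
scalars $c_1, c_2, \ldots, c_{r+1}$, not all zero, such that
$$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_{r+1}\mathbf{x}_{r+1} = \mathbf{0}, \qquad (1)$$
and $c_{r+1} \neq 0$ because $\{\mathbf{x}_1, \ldots, \mathbf{x}_r\}$ is linearly independent.
Multiplying both sides by $A$, and using $A\mathbf{x}_i = \lambda_i\mathbf{x}_i$, we get
$$c_1\lambda_1\mathbf{x}_1 + c_2\lambda_2\mathbf{x}_2 + \cdots + c_{r+1}\lambda_{r+1}\mathbf{x}_{r+1} = \mathbf{0}. \qquad (2)$$
Multiplying both sides of (1) by $\lambda_{r+1}$ and subtracting the resulting equation
from (2) yields
$$c_1(\lambda_1 - \lambda_{r+1})\mathbf{x}_1 + c_2(\lambda_2 - \lambda_{r+1})\mathbf{x}_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})\mathbf{x}_r = \mathbf{0}.$$
Since $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_r\}$ is linearly independent and $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are
all distinct, it follows that $c_1 = c_2 = \cdots = c_r = 0$. Substituting these values
in (1) yields $c_{r+1}\mathbf{x}_{r+1} = \mathbf{0}$, so $c_{r+1} = 0$, which is a contradiction. □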

As a consequence of Theorems 6.6 and 6.7, we obtain the following.

Theorem 6.8 If an n x n matrix A has n distinct eigenvalues, then A is


diagonalizable.

It follows from Theorems 6.7 and 6.8 that if $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are eigenvectors of an
$n \times n$ matrix $A$ belonging to $n$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively,
then they form a basis for $\mathbb{R}^n$ and the matrix representation of $A$ with respect
to this basis is a diagonal matrix, as shown in Remark (2) on page 217.
Of course, some matrices can have eigenvalues with multiplicities > 1
so that the number of distinct eigenvalues is strictly less than n. In this
case, if such a matrix still has n linearly independent eigenvectors, then it
is also diagonalizable, because for a diagonalization all we need is n linearly
independent eigenvectors. In some cases, such a matrix does not have n
linearly independent eigenvectors (see Example 6.3), so a diagonalization is
impossible. This case will be discussed in Section 9.1.
6.3. APPLICATION: DIFFERENCE EQUATIONS 221

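Example 6.6 Compute $A^{100}$ for $A = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix}$.
Solution: Its eigenvalues are $5$ and $-2$ with associated eigenvectors $(1, 1)$
and $(-4, 3)$, respectively. Hence $Q = \begin{bmatrix} 1 & -4 \\ 1 & 3 \end{bmatrix}$ diagonalizes $A$, i.e.,
$$Q^{-1}AQ = \begin{bmatrix} 5 & 0 \\ 0 & -2 \end{bmatrix}.$$
Therefore,
$$A^{100} = Q\begin{bmatrix} 5^{100} & 0 \\ 0 & (-2)^{100} \end{bmatrix}Q^{-1} = \frac17\begin{bmatrix} 3\cdot 5^{100} + 4\cdot 2^{100} & 4\cdot 5^{100} - 4\cdot 2^{100} \\ 3\cdot 5^{100} - 3\cdot 2^{100} & 4\cdot 5^{100} + 3\cdot 2^{100} \end{bmatrix}.$$
□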
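Because Python integers have arbitrary precision, the closed form for $A^{100}$ can be checked exactly. The sketch below computes $A^{100}$ for $A = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix}$ (the matrix with eigenvalues $5, -2$ and eigenvectors $(1, 1), (-4, 3)$) by repeated squaring, and compares $7A^{100}$ with the integer matrix in the formula; the helper names are our own.

```python
def mat_mul(X, Y):
    """Product of two 2x2 integer matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(M, k):
    """M**k by repeated squaring, in exact integer arithmetic."""
    result = [[1, 0], [0, 1]]
    while k:
        if k & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        k >>= 1
    return result

A = [[1, 4], [3, 2]]
P = mat_pow(A, 100)

# 7 * A^100 according to the closed form Q D^100 Q^{-1}; (-2)^100 = 2^100.
f, g = 5 ** 100, 2 ** 100
seven_P = [[3 * f + 4 * g, 4 * f - 4 * g],
           [3 * f - 3 * g, 4 * f + 3 * g]]
assert [[7 * entry for entry in row] for row in P] == seven_P
```

Repeated squaring needs only about $\log_2 100 \approx 7$ squarings, but unlike the diagonalization it gives no closed form in $k$.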

Problem 6.9 For the matrix A = [1~ -~! 1~],
(1) diagonalize the matrix $A$,
(2) find the eigenvalues of $A^{10} + A^7 + 5A$.

6.3 Application: Difference equations


The discrete analogs of differential equations are called difference equations.
They represent a mathematical model of dynamic processes that change over
time and are widely used in such areas as economics, electrical engineering,
and ecology.
Let us begin with a classical example. Early in the thirteenth century,
Fibonacci posed the following problem: "Suppose that a newly born pair of
rabbits produces no offspring during the first month of their lives, but each
pair gives birth to a new pair once a month from the second month onward.
Starting with one ($= x_1$) newly born pair in the first month, how many pairs
of rabbits can be bred in a given time, assuming no rabbit dies?"
Initially, there is one pair. After one month there is still one pair, but
a month later it gives birth, so there are two pairs. If at the end of $n$
months there are $x_n$ pairs, then after $n + 1$ months the number will be the
$x_n$ pairs plus the number of offspring of the $x_{n-1}$ pairs who were alive at
$n - 1$ months. Therefore, we have for $n \ge 2$,
$$x_{n+1} = x_n + x_{n-1}.$$
It is convenient to set $x_0 = 0$. Hence the first several terms of the sequence
become
$$0, 1, 1, 2, 3, 5, 8, 13, \ldots,$$
which is called the Fibonacci sequence; each term is called a Fibonacci
number.
In general, the linear difference equation (or recurrence relation)
of order $k$ is written as
$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k},$$
where $a_i$, $i = 1, \ldots, k$, are constants with $a_1$ and $a_k$ nonzero.
The equation for the Fibonacci sequence above is an example of a linear
difference equation (of order 2). A solution to the difference equation is any sequence
$\{x_n : n \ge 0\}$ of numbers that satisfies the equation. Of course, a solution
can be found by simply writing out enough terms of the sequence, but that
can be an awful task. Linear algebra gives us an easier approach to this
problem.
Instead of discussing the general method of finding a solution to a dif-
ference equation, we restrict our discussion to the case of the Fibonacci
sequence.

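Example 6.7 Find the 1997th Fibonacci number.
Solution: A standard trick is to consider a trivial extra equation $x_n = x_n$
together with the given equation:
$$\begin{cases} x_{n+1} = x_n + x_{n-1}, \\ x_n = x_n. \end{cases}$$
Equivalently, in matrix notation,
$$\begin{bmatrix} x_{n+1} \\ x_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_n \\ x_{n-1} \end{bmatrix},$$
which is of the form
$$\mathbf{x}_n = A\mathbf{x}_{n-1} = A^n\mathbf{x}_0, \quad n = 1, 2, \ldots,$$
where $\mathbf{x}_n = \begin{bmatrix} x_{n+1} \\ x_n \end{bmatrix}$ and $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$. Thus, the problem is reduced to
computing $A^n$. By a simple computation, we obtain the eigenvalues
$\lambda_1 = \frac12(1 + \sqrt5)$, $\lambda_2 = \frac12(1 - \sqrt5)$ of $A$ and their associated eigenvectors
$\mathbf{v}_1 = (\lambda_1, 1)$, $\mathbf{v}_2 = (\lambda_2, 1)$, respectively. Moreover, the transition matrix
and its inverse are found to be
$$Q = \begin{bmatrix} \frac{1+\sqrt5}{2} & \frac{1-\sqrt5}{2} \\ 1 & 1 \end{bmatrix}, \qquad Q^{-1} = \frac{1}{\sqrt5}\begin{bmatrix} 1 & -\frac{1-\sqrt5}{2} \\ -1 & \frac{1+\sqrt5}{2} \end{bmatrix}.$$
With $D = \begin{bmatrix} \frac{1+\sqrt5}{2} & 0 \\ 0 & \frac{1-\sqrt5}{2} \end{bmatrix}$, we have $A^n = QD^nQ^{-1}$. For instance, if $n = 1997$, then
$$\begin{bmatrix} x_{1998} \\ x_{1997} \end{bmatrix} = A^{1997}\mathbf{x}_0 = QD^{1997}Q^{-1}\begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
Therefore, we get
$$x_{1997} = \frac{1}{\sqrt5}\left(\left(\frac{1+\sqrt5}{2}\right)^{1997} - \left(\frac{1-\sqrt5}{2}\right)^{1997}\right).$$
Note that since $x_{1997}$ must be an integer, we look for the nearest integer
to this huge number. Since $\frac{1}{\sqrt5}\left|\frac{1-\sqrt5}{2}\right|^k < \frac12$, and indeed very small for large $k$,
$x_{1997}$ must be the integer nearest to $\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{1997}$. Historically, the number
$\frac{1+\sqrt5}{2}$, which is very close to the ratio $x_{1998}/x_{1997}$,
is called the golden mean. □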
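Both routes to $x_n$ can be compared in a short Python sketch: the matrix power $A^n = \begin{bmatrix} x_{n+1} & x_n \\ x_n & x_{n-1} \end{bmatrix}$ (easily checked by induction) computed exactly with integers, and the rounding formula "nearest integer to $\lambda_1^n/\sqrt5$" evaluated in floating point. The floating-point version loses accuracy for large $n$, so the sketch compares the two only for $n < 50$; the function names are hypothetical.

```python
import math

def mat_mul(X, Y):
    """Product of two 2x2 integer matrices."""
    return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
            [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]

def fib_matrix(n):
    """x_n read off A^n = [[x_{n+1}, x_n], [x_n, x_{n-1}]], A = [[1, 1], [1, 0]].
    A^n is computed exactly by repeated squaring."""
    result, power = [[1, 0], [0, 1]], [[1, 1], [1, 0]]
    while n:
        if n & 1:
            result = mat_mul(result, power)
        power = mat_mul(power, power)
        n >>= 1
    return result[0][1]

def fib_binet(n):
    """Nearest integer to lambda_1^n / sqrt(5), in floating point (moderate n only)."""
    phi = (1 + math.sqrt(5)) / 2
    return round(phi ** n / math.sqrt(5))

assert all(fib_matrix(n) == fib_binet(n) for n in range(50))
x1997 = fib_matrix(1997)
# Cassini's identity x_{n+1} x_{n-1} - x_n^2 = (-1)^n, i.e. det(A^n) = (det A)^n
assert fib_matrix(1998) * fib_matrix(1996) - x1997 ** 2 == -1
print(len(str(x1997)), "digits")
```

For $n = 1997$ only the exact integer version is reliable, which is why the check there uses Cassini's identity rather than the rounding formula.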

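A more general form of the linear difference equation can be written as
a matrix equation
$$\mathbf{x}_k = A\mathbf{x}_{k-1}, \quad k = 1, 2, \ldots,$$
where $A$ is an $n \times n$ matrix and $\mathbf{x}_0 \in \mathbb{R}^n$; this is also called a linear
difference equation. In fact, this equation defines a sequence of vectors
$\{\mathbf{x}_k\}_{k=0}^{\infty}$, and solving it reduces to computing $A^k$, since $\mathbf{x}_k = A^k\mathbf{x}_0$.
When $A$ is a diagonalizable matrix, it is easy to compute the solution.
Let $Q$ be an invertible matrix so that
$$Q^{-1}AQ = D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.$$
Then $\mathbf{x}_k = A^k\mathbf{x}_0 = QD^kQ^{-1}\mathbf{x}_0$ for $k = 1, 2, \ldots$. Note that the columns $\mathbf{v}_i$
of $Q$ are eigenvectors of $A$, so by setting $Q^{-1}\mathbf{x}_0 = \mathbf{a} = [a_1\ a_2\ \cdots\ a_n]^T$,
we get
$$\mathbf{x}_k = QD^kQ^{-1}\mathbf{x}_0 = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_n]\begin{bmatrix} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \lambda_n^k \end{bmatrix}\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = a_1\lambda_1^k\mathbf{v}_1 + a_2\lambda_2^k\mathbf{v}_2 + \cdots + a_n\lambda_n^k\mathbf{v}_n.$$
Hence, if $|\lambda_i| < 1$ for all $i$, then the vector $\mathbf{x}_k$ must approach the zero vector
as $k$ increases. On the other hand, if there exists an eigenvalue $\lambda_i$ with
$|\lambda_i| > 1$, the vector $\mathbf{x}_k$ may grow exponentially in magnitude.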
Therefore, we have three possible cases for the process given by $\mathbf{x}_k = A\mathbf{x}_{k-1}$,
$k = 1, 2, \ldots$. The process is said to be
(1) unstable if $A$ has an eigenvalue $\lambda$ with $|\lambda| > 1$,
(2) stable if $|\lambda| < 1$ for all eigenvalues $\lambda$ of $A$,
(3) neutrally stable if the maximum of $|\lambda|$ over all eigenvalues $\lambda$ of $A$ is 1.
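The classification depends only on the largest absolute value of the eigenvalues. For a $2 \times 2$ matrix this can be sketched in Python with `cmath`, so that complex eigenvalues, such as those of a rotation, are handled too; the tolerance `tol` is our own choice to absorb floating-point rounding, and the function name is hypothetical.

```python
import cmath

def classify_2x2(a, b, c, d, tol=1e-12):
    """Classify the process x_k = A x_{k-1}, A = [[a, b], [c, d]],
    by the largest absolute value of the eigenvalues of A."""
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)            # complex square root
    rho = max(abs((tr + root) / 2), abs((tr - root) / 2))
    if rho > 1 + tol:
        return "unstable"
    if rho < 1 - tol:
        return "stable"
    return "neutrally stable"

print(classify_2x2(0.8, 0.1, 0.2, 0.9))   # Markov matrix of Example 6.8
print(classify_2x2(0, -1, 1, 0))          # rotation by 90 degrees
```

Both calls fall into case (3): the Markov matrix has eigenvalues $1$ and $0.7$, and the rotation has eigenvalues $\pm i$.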

Problem 6.10 Let $\{a_n\}$ be a sequence with $a_0 = 1$, $a_1 = 2$, $a_2 = 0$, and the
recurrence relation $a_n = 2a_{n-1} + a_{n-2} - 2a_{n-3}$ for $n \ge 3$. Find the $n$-th term $a_n$.

The following example shows a special type of difference equation called


a Markov process.

Example 6.8 (Markov process) We start with $x_0$ people outside a big city
and $y_0$ people inside the city. Suppose that each year 20% of the people
outside the city move in, and 10% of the people inside move out. We
are interested in the "eventual" distribution of the population.
At the end of the first year, the distribution of the population will be
$$\begin{cases} x_1 = 0.8x_0 + 0.1y_0, \\ y_1 = 0.2x_0 + 0.9y_0. \end{cases}$$
Or, in matrix form,
$$\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = A\mathbf{x}_0.$$
Thus if $\mathbf{x}_n = (x_n, y_n)$ denotes the distribution of the population after $n$
years in the country where the city is, we get $\mathbf{x}_n = A^n\mathbf{x}_0$.
In this formulation, the problem can be summarized as follows:
(1) The total number of people stays fixed.
(2) The numbers $x_n$, $y_n$ can never become negative.
A process having these two properties is called a Markov process. In
general, a Markov process is a repeated application of a matrix $A$ to an
initial state $\mathbf{x}_0$, where the matrix $A$ satisfies the following two conditions:
(1') the entries of each column of $A$ add up to 1;
(2') the entries of $A$ are all nonnegative, so that the powers $A^n$ are all nonnegative.
Such a matrix $A$ is called a stochastic matrix.
Now, to solve the problem, we first find the eigenvalues and eigenvectors
of $A$. They are $\lambda_1 = 1$, $\lambda_2 = 0.7$ and $\mathbf{v}_1 = (1, 2)$, $\mathbf{v}_2 = (-1, 1)$, respectively,
so that
$$Q = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix} \quad \text{and} \quad Q^{-1} = \frac13\begin{bmatrix} 1 & 1 \\ -2 & 1 \end{bmatrix}.$$
Hence, $Q^{-1}AQ = \begin{bmatrix} 1 & 0 \\ 0 & 0.7 \end{bmatrix} = D$ and
$$\begin{aligned} \mathbf{x}_n = A^n\mathbf{x}_0 &= QD^nQ^{-1}\mathbf{x}_0 \\ &= (x_0 + y_0)\begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix} + (-2x_0 + y_0)(0.7)^n\begin{bmatrix} -1/3 \\ 1/3 \end{bmatrix} \\ &\to (x_0 + y_0)\begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix} \quad \text{as } n \to \infty. \end{aligned}$$
□
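The convergence in Example 6.8 can be watched numerically. The Python sketch below iterates $\mathbf{x}_n = A\mathbf{x}_{n-1}$ and compares the result with the closed form $(x_0 + y_0)(1/3, 2/3) + (-2x_0 + y_0)(0.7)^n(-1/3, 1/3)$; the starting populations $x_0 = 600$, $y_0 = 300$ are hypothetical values chosen for illustration, and the function names are our own.

```python
def step(A, x):
    """One application of the matrix A to the state vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Stochastic matrix of Example 6.8: each column sums to 1.
A = [[0.8, 0.1],
     [0.2, 0.9]]

def simulate(x0, y0, years):
    """Iterate x_n = A x_{n-1} starting from (x0, y0)."""
    state = [x0, y0]
    for _ in range(years):
        state = step(A, state)
    return state

def closed_form(x0, y0, n):
    """(x0 + y0)(1/3, 2/3) + (-2 x0 + y0)(0.7)^n (-1/3, 1/3)."""
    s, t = x0 + y0, (-2 * x0 + y0) * 0.7 ** n
    return [(s - t) / 3, (2 * s + t) / 3]

x0, y0 = 600.0, 300.0          # hypothetical initial populations
for n in (1, 5, 20, 100):
    print(n, simulate(x0, y0, n), closed_form(x0, y0, n))
```

The total $x_n + y_n$ stays at 900 and the state approaches $(300, 600)$, one third outside and two thirds inside, independently of how the 900 people were split initially.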

Let $A$ be a stochastic matrix for a Markov process. Then the entries of
each column of $A$ add up to 1. This means that each column of
$A - I$ sums to 0, or equivalently $\mathbf{r}_1 + \mathbf{r}_2 + \cdots + \mathbf{r}_n = \mathbf{0}$ for the row vectors $\mathbf{r}_i$ of
$A - I$. This is a nontrivial linear combination of the row vectors, so these
row vectors are linearly dependent, and $\det(A - I) = 0$. Consequently,
$\lambda = 1$ is an eigenvalue of $A$. If $\mathbf{x}$ is an eigenvector of $A$ belonging to $\lambda = 1$,
then $A\mathbf{x} = \mathbf{x}$; such a vector is called a steady state.
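For a $2 \times 2$ stochastic matrix, written as $A = \begin{bmatrix} 1-p & q \\ p & 1-q \end{bmatrix}$ with $0 \le p, q \le 1$, both rows of $(A - I)\mathbf{x} = \mathbf{0}$ reduce to $-px + qy = 0$, so the steady state with entries summing to 1 is $\left(\frac{q}{p+q}, \frac{p}{p+q}\right)$ whenever $p + q > 0$. A minimal exact Python sketch (the parametrization and function name are ours); in Example 6.8, $p = 0.2$ and $q = 0.1$ give $(1/3, 2/3)$.

```python
from fractions import Fraction

def steady_state_2x2(p, q):
    """Steady state of A = [[1-p, q], [p, 1-q]], normalized so entries sum to 1.

    Solves (A - I)x = 0, whose rows both reduce to -p*x + q*y = 0.
    Pass exact values (ints, Fractions, or strings like "1/5").
    """
    p, q = Fraction(p), Fraction(q)
    if p + q == 0:
        raise ValueError("A is the identity; every vector is a steady state")
    return (q / (p + q), p / (p + q))

s = steady_state_2x2("1/5", "1/10")   # Example 6.8: 20% move in, 10% move out
print(s)
```

Passing strings or `Fraction`s rather than the floats `0.2`, `0.1` keeps the answer exact, since `Fraction(0.2)` would reproduce the binary rounding error of the float.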

Problem 6.11 Suppose that land use in a city in 1990 is
Residential $x_0 = 30\%$
Commercial $y_0 = 20\%$
Industrial $z_0 = 50\%$.
Denote by $x_k$, $y_k$, $z_k$ the percentages of residential, commercial, and industrial land,
respectively, after $k$ years, and assume that the stochastic matrix is given as follows:

[ ~::~
Zk+l
] = [~:~ ~:~ 0~1] [ ~: ].
0.1 0.2 0.9 Zk

Find the land use in the city after 50 years. This problem has the two essential properties
of a Markov process: the total area of the city stays fixed, and each portion of
area can never become negative.

Problem 6.12 A car rental company has three branch offices in different cities. When a car is rented at one of the offices, it may be returned to any of the three offices. This company started business with 900 cars, and initially an equal number of cars was distributed to each office. When the week-by-week distribution of cars is governed by the stochastic matrix
$$A = \begin{bmatrix} 0.6 & 0.1 & 0.2 \\ 0.2 & 0.2 & 0.2 \\ 0.2 & 0.7 & 0.6 \end{bmatrix},$$
determine the number of cars at each office in the $k$-th week. Also, find $\lim_{k\to\infty} A^k$.

6.4 Application: Differential equations I


The diagonalization of matrices may be used to solve systems of linear differential equations. For those who are not familiar with differential equations, we begin this section with some basic preliminaries about them.
Let $y = f(t)$ be a real-valued differentiable function on an interval $I = [a, b]$ containing 0. From elementary calculus, it is easy to see that the
6.4. APPLICATION: DIFFERENTIAL EQUATIONS I 227

differential equation $y'(t) = \frac{dy(t)}{dt} = 5y(t)$ has the general solution $y = ce^{5t}$, where $c$ is an arbitrary constant. If we are given an additional condition $y(0) = y_0 = 3$ for $0 \in I$, called an initial condition, then the solution of $y' = 5y$ is $y = 3e^{5t}$, called a particular solution.
This computation can be extended to a system of $n$ linear differential equations (called a homogeneous linear system of order $n$) with constant coefficients, which is by definition of the form
$$\begin{cases} y_1' = a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\ y_2' = a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\ \quad\vdots \\ y_n' = a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n, \end{cases}$$
where $y_i = f_i(t)$, for $i = 1, 2, \dots, n$, are real-valued differentiable functions on an interval $I = [a, b]$ and $y_i' = \frac{df_i(t)}{dt}$ is the derivative of $y_i$. In most cases, we may assume that the interval $I$ contains 0, and some initial conditions are given as $f_i(0) = d_i$ at $0 \in I$.
Let $\mathbf{y}(t)$ be the vector whose entries are the functions $f_i(t)$. Then its derivative is defined by
$$\mathbf{y}'(t) = \begin{bmatrix} y_1'(t) \\ y_2'(t) \\ \vdots \\ y_n'(t) \end{bmatrix}.$$
If $A$ is the matrix of the coefficients in the system, then the matrix form of the system is written as
$$\mathbf{y}'(t) = A\mathbf{y}(t),$$
with an initial condition $\mathbf{y}_0 = \mathbf{y}(0) = (d_1, \dots, d_n) \in \mathbb{R}^n$. It is well known, by the fundamental (existence and uniqueness) theorem of ordinary differential equations, that this system has a unique solution $\mathbf{y}(t)$ determined by the initial condition $\mathbf{y}_0$.
In a more general type of system of linear differential equations, the entries of the coefficient matrix could be functions of $t$. However, for our purposes we restrict our attention to systems with constant coefficients.
Example 6.9 Consider the following three systems:

{ yi, = 2Yl - 3Y2


2Yl
{Yi = tUI + t 2Y2
2Yl + t 3 Y2
{Yi
Y2 = + Y2 " Y2 = ,
Y2,
The first two systems are linear, but the coefficients of the second are functions of $t$. The third is not linear. □

Some basic facts on a system $\mathbf{y}'(t) = A\mathbf{y}(t)$ of linear differential equations are listed below.
(1) Write $k$ solutions $\{\mathbf{y}_1(t), \dots, \mathbf{y}_k(t)\}$ of the system as
$$\mathbf{y}_i(t) = \begin{bmatrix} y_{1i}(t) \\ y_{2i}(t) \\ \vdots \\ y_{ni}(t) \end{bmatrix}, \quad i = 1, \dots, k.$$
Then each solution $\mathbf{y}_i(t)$ traces a curve in $\mathbb{R}^n$ passing through the initial vector $\mathbf{y}_{i0} = \mathbf{y}_i(0) = (d_{i1}, \dots, d_{in})$ as $t$ varies in the interval $I$, and it is uniquely determined by the given initial condition $\mathbf{y}_{i0}$. By definition, this set of $k$ solutions is said to be linearly independent on the interval $I$ if, at each $t \in I$, $\{\mathbf{y}_1(t), \dots, \mathbf{y}_k(t)\}$ is linearly independent in $\mathbb{R}^n$.

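Example 6.10 For the system
$$\mathbf{y}'(t) = \begin{bmatrix} y_1'(t) \\ y_2'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix} = \begin{bmatrix} y_2(t) \\ y_1(t) \end{bmatrix} = A\mathbf{y}(t),$$
one can easily verify that the vector functions
$$\mathbf{y}_1(t) = \begin{bmatrix} y_{11}(t) \\ y_{21}(t) \end{bmatrix} = \begin{bmatrix} e^t \\ e^t \end{bmatrix} \quad\text{and}\quad \mathbf{y}_2(t) = \begin{bmatrix} y_{12}(t) \\ y_{22}(t) \end{bmatrix} = \begin{bmatrix} -e^{-t} \\ e^{-t} \end{bmatrix}$$
are solutions of the system. Suppose $c_1\mathbf{y}_1(t) + c_2\mathbf{y}_2(t) = \mathbf{0}$ for all $t \in I$. Then at $t = 0$, in particular, it becomes $c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \mathbf{0}$, so clearly $c_1 = c_2 = 0$; that is, the two solutions are linearly independent. □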

(2) If $\mathbf{y}_1, \dots, \mathbf{y}_k$ are solutions of the system, then
$$(c_1\mathbf{y}_1 + \cdots + c_k\mathbf{y}_k)' = c_1\mathbf{y}_1' + \cdots + c_k\mathbf{y}_k' = c_1A\mathbf{y}_1 + \cdots + c_kA\mathbf{y}_k = A(c_1\mathbf{y}_1 + \cdots + c_k\mathbf{y}_k)$$
implies that their linear combination $c_1\mathbf{y}_1 + \cdots + c_k\mathbf{y}_k$ is also a solution for any constants $c_i$. Moreover, if $\mathbf{y}_1(t), \dots, \mathbf{y}_n(t)$ are $n$ linearly independent solutions of the system, then their initial vectors $\mathbf{y}_{i0}$ are linearly independent, so they form a basis for $\mathbb{R}^n$. Any initial condition $\mathbf{y}_0$, as a vector in $\mathbb{R}^n$, is a linear combination of the $\mathbf{y}_{i0}$, $i = 1, \dots, n$: $\mathbf{y}_0 = c_1\mathbf{y}_{10} + \cdots + c_n\mathbf{y}_{n0}$. Then one can easily see that the vector function
$$\mathbf{y}(t) = c_1\mathbf{y}_1(t) + \cdots + c_n\mathbf{y}_n(t)$$
is the solution of the system satisfying the initial condition $\mathbf{y}_0$. In fact, any solution can be obtained in this form, and this form of the solution is called the general solution of the system. A set of $n$ linearly independent solutions is called a fundamental set of solutions, and such a set may be determined by a set of $n$ linearly independent initial vectors $\mathbf{y}_{i0}$.
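Given a set of $n$ solutions $\mathbf{y}_1(t), \dots, \mathbf{y}_n(t)$ of the system $\mathbf{y}'(t) = A\mathbf{y}(t)$, the linear independence of the set can be determined as follows. Let
$$Y(t) = [\mathbf{y}_1(t) \ \cdots \ \mathbf{y}_n(t)] = \begin{bmatrix} y_{11}(t) & y_{12}(t) & \cdots & y_{1n}(t) \\ y_{21}(t) & y_{22}(t) & \cdots & y_{2n}(t) \\ \vdots & \vdots & & \vdots \\ y_{n1}(t) & y_{n2}(t) & \cdots & y_{nn}(t) \end{bmatrix}.$$
Clearly, the $n$ solutions are linearly independent on $I$ if and only if $\det Y(t) \neq 0$ for all $t \in I$. However, the following lemma implies that $\det Y(t) \neq 0$ for all $t \in I$ if and only if $\det Y(t) \neq 0$ at one point $t \in I$. The determinant of $Y(t)$ is called the Wronskian of the solutions, denoted by $W(t) = \det Y(t)$ for $t \in I$.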

Lemma 6.9 W'(t) = tr(A)W(t).

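Proof:
$$\begin{aligned} W'(t) &= (\det Y(t))' = \Big(\sum_{\sigma}\operatorname{sgn}(\sigma)\,y_{1\sigma(1)}\cdots y_{n\sigma(n)}\Big)' \\ &= \sum_{\sigma}\operatorname{sgn}(\sigma)\,y'_{1\sigma(1)}\cdots y_{n\sigma(n)} + \cdots + \sum_{\sigma}\operatorname{sgn}(\sigma)\,y_{1\sigma(1)}\cdots y'_{n\sigma(n)} \\ &= \sum_{i=1}^{n}\sum_{j=1}^{n} y'_{ij}Y_{ij} = \sum_{i=1}^{n}\Big(\sum_{j=1}^{n} y'_{ij}\,[\operatorname{adj}Y]_{ji}\Big) = \sum_{i=1}^{n}[Y' \cdot \operatorname{adj}Y]_{ii} \\ &= \operatorname{tr}(Y' \cdot \operatorname{adj}Y) = \operatorname{tr}(A \cdot (Y \cdot \operatorname{adj}Y)) \\ &= \operatorname{tr}(\det Y(t)\,A) = \operatorname{tr}(A)\,W(t), \end{aligned}$$
where $Y_{ij}(t)$ is the cofactor of $y_{ij}$ (the $i$-th sum in the second line is the determinant of $Y$ with its $i$-th row differentiated, expanded along that row), and the equalities in the last two lines are due to the facts that
$$Y'(t) = [\mathbf{y}_1'(t) \ \cdots \ \mathbf{y}_n'(t)] = A[\mathbf{y}_1(t) \ \cdots \ \mathbf{y}_n(t)] = AY(t), \qquad Y(t)\operatorname{adj}Y(t) = \det Y(t)\,I_n = W(t)\,I_n. \qquad \square$$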

This lemma shows that the Wronskian $W(t)$ satisfies a differential equation of the form $y' = ay$ discussed at the beginning of this section. Thus it must be an exponential function of the form $W(t) = W_0e^{(\operatorname{tr}A)t}$ with the initial condition $W(0) = W_0$. This means that $W(t)$ is either zero for all $t$ or never zero on $I$, depending on whether or not $W_0 = 0$. Thus, the $n$ solutions are linearly independent for all $t \in I$ if and only if they are linearly independent at some $t \in I$; i.e., the linear independence of the $n$ solutions may be checked at any convenient point. This again justifies that it is enough to begin with a set of $n$ linearly independent initial vectors to find the general solution of $\mathbf{y}'(t) = A\mathbf{y}(t)$. That is, if the $n$ initial vectors (conditions) $\mathbf{y}_1(0) = \mathbf{y}_{10}, \dots, \mathbf{y}_n(0) = \mathbf{y}_{n0}$ are linearly independent, then the solutions determined by them form a fundamental set.
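As a numerical sanity check of Lemma 6.9, the sketch below evaluates $W(t)$ for the two solutions of Example 6.10 ($\operatorname{tr}A = 0$, so $W$ should be constant) and for a hypothetical diagonal system with $A = \operatorname{diag}(2, -1)$ ($\operatorname{tr}A = 1$, so $W(t) = W(0)e^{t}$). The function names are illustrative only.

```python
import math


def wronskian_2x2(y1, y2, t):
    """W(t) = det [y1(t) y2(t)], the two solutions placed as columns."""
    a, c = y1(t)      # first column
    b, d = y2(t)      # second column
    return a * d - b * c


# Example 6.10: A = [[0, 1], [1, 0]], so tr A = 0.
def y1(t):
    return (math.exp(t), math.exp(t))


def y2(t):
    return (-math.exp(-t), math.exp(-t))


# Hypothetical system with A = diag(2, -1), so tr A = 1.
def z1(t):
    return (math.exp(2 * t), 0.0)


def z2(t):
    return (0.0, math.exp(-t))


for t in (-1.0, 0.0, 0.5, 2.0):
    # Lemma 6.9 predicts W(t) = W(0) e^{(tr A) t}: the constant 2, and e^t.
    print(t, wronskian_2x2(y1, y2, t), wronskian_2x2(z1, z2, t), math.exp(t))
```

Since neither Wronskian vanishes at $t = 0$, both pairs are fundamental sets, in line with the remark that independence can be checked at a single point.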

6.5 Application: Differential equations II


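We now get to the problem of solving a system of homogeneous linear differential equations $\mathbf{y}'(t) = A\mathbf{y}(t)$ with the given initial condition $\mathbf{y}_0 = (d_1, \dots, d_n)$. This problem may be considered in three steps: (1) $A$ is a diagonal matrix $D$, (2) $A$ is diagonalizable, and (3) $A$ is any square matrix.
(1) First suppose that $A$ is a diagonal matrix $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Then
$$\mathbf{y}'(t) = D\mathbf{y}(t)$$
means that we are given $n$ simple first-order linear equations:
$$y_i'(t) = \lambda_iy_i(t), \quad i = 1, 2, \dots, n,$$
whose solutions are known to be $y_i(t) = d_ie^{\lambda_it}$ with $y_i(0) = d_i$, for $i = 1, \dots, n$. In matrix notation,
$$\mathbf{y}(t) = \begin{bmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{bmatrix} = \begin{bmatrix} e^{\lambda_1t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_nt} \end{bmatrix}\begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix} = e^{tD}\mathbf{y}_0,$$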
6.5. APPLICATION: DIFFERENTIAL EQUATIONS II 231

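where $e^{tD}$ is, by definition,
$$e^{tD} = \begin{bmatrix} e^{\lambda_1t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_nt} \end{bmatrix}.$$
Remark: Actually, the above solution is the general solution of the system. Indeed, if we take the $n$ initial conditions to be the standard basis $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ for $\mathbb{R}^n$, then for each initial vector $\mathbf{e}_i = (0, \dots, 1, \dots, 0)$, $i = 1, \dots, n$, we get the solution $\mathbf{y}_i(t) = e^{\lambda_it}\mathbf{e}_i$. Since the initial conditions $\mathbf{y}_i(0) = \mathbf{e}_i$ are linearly independent eigenvectors of the diagonal matrix $D$, the set $\{\mathbf{y}_i(t) = e^{\lambda_it}\mathbf{e}_i : i = 1, 2, \dots, n\}$ is a fundamental set. Any initial condition can be written as
$$\mathbf{y}_0 = d_1\mathbf{e}_1 + \cdots + d_n\mathbf{e}_n,$$
so the general solution is of the form
$$\mathbf{y}(t) = d_1e^{\lambda_1t}\mathbf{e}_1 + \cdots + d_ne^{\lambda_nt}\mathbf{e}_n.$$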
Example 6.11 One of the fundamental problems of mathematical ecology is the predator-prey problem. Let $x(t)$ and $y(t)$ denote the populations at time $t$ of two species in a specified region, one of which, $y$, preys upon the other, $x$. For example, $x(t)$ and $y(t)$ may be the numbers of small fish and sharks, respectively, in a restricted region of the ocean. Without the fish (prey) the population of the sharks (predators) will decrease, and without the sharks, the population of the fish will increase. A mathematical model showing their interactions and whether an ecological balance exists can be written as the following system of differential equations:
$$\begin{cases} x'(t) = ax(t) - bx(t)y(t), \\ y'(t) = -cy(t) + dx(t)y(t). \end{cases}$$
In this system, the coefficients $a$ and $c$ are the birth rate of $x$ and the death rate of $y$, respectively. The nonlinear $x(t)y(t)$ terms in the two equations represent the interaction of the two species, so the coefficients $b$ and $d$ measure the effect of the interaction between them. A study of this general system of differential equations leads to a very interesting development in the theory of dynamical systems and can be found in any book on ordinary differential equations. Here, we restrict our study to the case where $x$ and $y$ are very small, i.e., near the origin in the plane. In this case, one can neglect the nonlinear terms in the equations, so the system is assumed to be
$$\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & -c \end{bmatrix}\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}.$$
Thus the eigenvalues are $\lambda_1 = a$ and $\lambda_2 = -c$, with eigenvectors $\mathbf{e}_1$ and $\mathbf{e}_2$, respectively, and the general solution is
$$\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} d_1e^{at} \\ d_2e^{-ct} \end{bmatrix} = \begin{bmatrix} e^{at} & 0 \\ 0 & e^{-ct} \end{bmatrix}\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = d_1e^{at}\mathbf{e}_1 + d_2e^{-ct}\mathbf{e}_2. \qquad \square$$

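(2) We next assume that the matrix $A$ in the system $\mathbf{y}'(t) = A\mathbf{y}(t)$ is diagonalizable, that is, it has $n$ linearly independent eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ belonging to the eigenvalues $\lambda_1, \dots, \lambda_n$, respectively. Then the transition matrix $Q = [\mathbf{v}_1 \ \cdots \ \mathbf{v}_n]$ diagonalizes $A$, and
$$Q^{-1}AQ = D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}, \quad\text{i.e.,}\quad A = QDQ^{-1}.$$
Thus the system $\mathbf{y}' = QDQ^{-1}\mathbf{y}$ becomes $Q^{-1}\mathbf{y}' = DQ^{-1}\mathbf{y}$. If we make the change of variables $\mathbf{x} = Q^{-1}\mathbf{y}$ (or $\mathbf{y} = Q\mathbf{x}$), then we obtain a new system
$$\mathbf{x}' = D\mathbf{x}$$
with the initial condition $\mathbf{x}_0 = Q^{-1}\mathbf{y}_0 = (c_1, \dots, c_n)$. Since $D$ is diagonal, its general solution is
$$\mathbf{x} = e^{tD}\mathbf{x}_0 = c_1e^{\lambda_1t}\mathbf{e}_1 + \cdots + c_ne^{\lambda_nt}\mathbf{e}_n.$$
The general solution of the original system $\mathbf{y}' = A\mathbf{y}$ is
$$\begin{aligned} \mathbf{y} = Q\mathbf{x} &= Qe^{tD}Q^{-1}\mathbf{y}_0 = [\mathbf{v}_1 \ \cdots \ \mathbf{v}_n]\begin{bmatrix} e^{\lambda_1t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_nt} \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \\ &= c_1e^{\lambda_1t}\mathbf{v}_1 + c_2e^{\lambda_2t}\mathbf{v}_2 + \cdots + c_ne^{\lambda_nt}\mathbf{v}_n \\ &= c_1\mathbf{y}_1(t) + c_2\mathbf{y}_2(t) + \cdots + c_n\mathbf{y}_n(t). \end{aligned}$$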

Remark: Note that each vector function $\mathbf{y}_i(t) = e^{\lambda_it}\mathbf{v}_i$ is the solution of the system determined by the initial condition $\mathbf{y}_i(0) = \mathbf{v}_i$, for $i = 1, \dots, n$, and these initial vectors are linearly independent eigenvectors of $A$. Hence, the $\mathbf{y}_i(t)$ form a fundamental set of solutions.
Thus, we have obtained the following theorem:

Theorem 6.10 If an $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors, then for any $\mathbf{y}_0 \in \mathbb{R}^n$, the system of linear differential equations
$$\mathbf{y}'(t) = A\mathbf{y}(t) \quad\text{with initial condition}\quad \mathbf{y}(0) = \mathbf{y}_0$$
has a unique solution of the form $\mathbf{y} = Qe^{tD}Q^{-1}\mathbf{y}_0$, where $A = QDQ^{-1}$.

Remark: When $A$ is diagonalizable, the procedure for solving a system $\mathbf{y}'(t) = A\mathbf{y}(t)$ with an initial condition $\mathbf{y}_0$ at $t = 0$ may be summarized as follows:
Step 1. Find the eigenvalues and $n$ linearly independent eigenvectors of $A$.
Step 2. Construct the transition matrix $Q$ with the eigenvectors as columns, so that $A = QDQ^{-1}$.
Step 3. Make the change of variables $\mathbf{y} = Q\mathbf{x}$ to get a new system $\mathbf{x}' = D\mathbf{x}$, whose solution is $\mathbf{x} = e^{tD}\mathbf{x}_0$.
Step 4. Use the substitution $\mathbf{y} = Q\mathbf{x}$ to get the solution $\mathbf{y} = Q\mathbf{x} = Qe^{tD}Q^{-1}\mathbf{y}_0$.
Consequently, when $A$ is diagonalizable, the general solution of $\mathbf{y}'(t) = A\mathbf{y}(t)$ may be obtained directly from the eigenvalues and a basis of eigenvectors of $A$, without looking for a fundamental set separately.

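Example 6.12 Solve the system of linear differential equations
$$\begin{cases} y_1' = 5y_1 - 4y_2 + 4y_3 \\ y_2' = 12y_1 - 11y_2 + 12y_3 \\ y_3' = 4y_1 - 4y_2 + 5y_3, \end{cases}$$
and also find its particular solution satisfying the initial conditions $y_1(0) = 0$, $y_2(0) = 3$ and $y_3(0) = 2$.
Solution: In matrix form, the system may be written as $\mathbf{y}' = A\mathbf{y}$ with
$$A = \begin{bmatrix} 5 & -4 & 4 \\ 12 & -11 & 12 \\ 4 & -4 & 5 \end{bmatrix}.$$
(1) The eigenvalues of $A$ are $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = -3$, and their eigenvectors are $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (-1, 0, 1)$ and $\mathbf{v}_3 = (1, 3, 1)$, respectively, which are clearly linearly independent (see Problem 6.9).
(2) The matrix
$$Q = [\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3] = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 3 \\ 0 & 1 & 1 \end{bmatrix}$$
diagonalizes $A$:
$$Q^{-1}AQ = D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix}.$$
(3) Then $\mathbf{y}' = A\mathbf{y}$ is transformed to $\mathbf{x}' = D\mathbf{x}$ by the change of variables $\mathbf{y} = Q\mathbf{x}$. The system $\mathbf{x}' = D\mathbf{x}$ consists of three equations: $x_1' = x_1$, $x_2' = x_2$, $x_3' = -3x_3$. These are easily solved to give $x_1 = ae^t$, $x_2 = be^t$, $x_3 = ce^{-3t}$, where $a$, $b$, and $c$ are arbitrary constants. In matrix form, the solution is written as $\mathbf{x} = e^{tD}\mathbf{x}_0$ with $\mathbf{x}_0 = (a, b, c)$, i.e.,
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} e^t & 0 & 0 \\ 0 & e^t & 0 \\ 0 & 0 & e^{-3t} \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} ae^t \\ be^t \\ ce^{-3t} \end{bmatrix}.$$
(4) Since $\mathbf{y} = Q\mathbf{x}$, we get
$$\mathbf{y} = Q\mathbf{x} = ae^t\mathbf{v}_1 + be^t\mathbf{v}_2 + ce^{-3t}\mathbf{v}_3.$$
For the initial conditions $\mathbf{y}_0 = [0 \ 3 \ 2]^T$, solving $Q\mathbf{x}_0 = \mathbf{y}_0$ gives $\mathbf{x}_0 = Q^{-1}\mathbf{y}_0 = (a, b, c) = (0, 1, 1)$. Thus the particular solution is
$$\mathbf{y} = e^t\mathbf{v}_2 + e^{-3t}\mathbf{v}_3 = \begin{bmatrix} -e^t + e^{-3t} \\ 3e^{-3t} \\ e^t + e^{-3t} \end{bmatrix}. \qquad \square$$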
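The four steps of the Remark can be traced numerically on Example 6.12. In this plain-Python sketch the coefficient matrix $A = \begin{bmatrix} 5 & -4 & 4 \\ 12 & -11 & 12 \\ 4 & -4 & 5 \end{bmatrix}$ is the product $QDQ^{-1}$ formed from the eigenvalues and eigenvectors in step (1); $Q^{-1}$ was computed by hand and is verified by an assertion. The helper names and the finite-difference step `h` are our own choices.

```python
import math


def matmul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]


A = [[5, -4, 4], [12, -11, 12], [4, -4, 5]]    # coefficient matrix of Example 6.12
Q = [[1, -1, 1], [1, 0, 3], [0, 1, 1]]         # columns v1, v2, v3
Q_inv = [[3, -2, 3], [1, -1, 2], [-1, 1, -1]]  # inverse computed by hand, checked below
eigs = [1, 1, -3]
y0 = [0, 3, 2]

assert matmul(Q, Q_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
D = [[eigs[i] if i == j else 0 for j in range(3)] for i in range(3)]
assert matmul(matmul(Q_inv, A), Q) == D          # Step 2: Q^{-1} A Q = D


def solution(t):
    """Steps 3-4: y(t) = Q e^{tD} Q^{-1} y0."""
    x0 = matvec(Q_inv, y0)                                     # x0 = Q^{-1} y0
    x = [c * math.exp(lam * t) for c, lam in zip(x0, eigs)]    # x = e^{tD} x0
    return matvec(Q, x)                                        # y = Q x


print(matvec(Q_inv, y0))        # (a, b, c) = [0, 1, 1]
h = 1e-6
for t in (0.0, 0.5, 1.0):
    y = solution(t)
    # central-difference approximation of y'(t), to compare with A y(t)
    dy = [(p - m) / (2 * h) for p, m in zip(solution(t + h), solution(t - h))]
    print(t, y, dy, matvec(A, y))
```

The printed derivative and $A\mathbf{y}(t)$ agree to roughly six digits, and $\mathbf{y}(0) = (0, 3, 2)$.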
6.6. EXPONENTIAL MATRICES 235

(3) A system $\mathbf{y}' = A\mathbf{y}$ of linear differential equations with a general matrix $A$ will be discussed in Section 6.7.

Problem 6.13 Solve the system { Y}


Y2

Y' 4Yl + Y3
Problem 6.14 Solve the system { y~ = -2Yl + Y2
y~ -2Yl + Y3,
and find the particular solution of the system satisfying the initial conditions Yl (0) =
-1, Y2(0) = 1, Y3(0) = O.

yi = Yl - Y2
Problem 6.15 Solve the system { y~ = 3Yl
y~ = 2Yl + Y2
with initial conditions Yl (0) = 0, Y2(0) = 2, Y3(0) = 1.

6.6 Exponential matrices


Consider again a system of linear differential equations $\mathbf{y}' = A\mathbf{y}$ with initial condition $\mathbf{y}(0) = \mathbf{y}_0$ for a square matrix $A$. If $A$ is diagonalizable, say $Q^{-1}AQ = D$, the system is solvable and has a unique solution of the form $\mathbf{y} = Qe^{tD}Q^{-1}\mathbf{y}_0$, by Theorem 6.10. Now let $A$ be any square matrix, not necessarily diagonalizable. It will be shown later that the solution of the system $\mathbf{y}' = A\mathbf{y}$ with $\mathbf{y}(0) = \mathbf{y}_0$ is still of the form
$$\mathbf{y} = e^{tA}\mathbf{y}_0,$$
provided that $e^{tA}$ is made meaningful. In particular, if $A$ is diagonalizable so that $A = QDQ^{-1}$, then this form should be
$$e^{tA} = e^{tQDQ^{-1}} = Qe^{tD}Q^{-1}.$$
Motivated by the Maclaurin series of the exponential function $e^x$, we define the exponential matrix of a matrix.

Definition 6.3 For any square matrix $A$, the exponential matrix of $A$ is defined as the series
$$e^A = \sum_{k=0}^{\infty}\frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots.$$
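Definition 6.3 suggests a direct, if naive, way to approximate $e^A$: sum the series until the remaining terms are negligible. The sketch below truncates after a fixed number of terms (the parameter `terms` is our choice, not part of the definition) and checks it on a diagonal matrix, where $e^D$ is diagonal with entries $e^{\lambda_i}$, and on a nilpotent matrix, where the series stops after finitely many terms. Production code would rather call a library routine such as SciPy's `scipy.linalg.expm`, which uses scaling and squaring with Padé approximants.

```python
import math


def mat_mult(X, Y):
    """Product of two n x n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_series(A, terms=30):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)! of Definition 6.3."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0/0! = I
    total = [row[:] for row in term]
    for k in range(1, terms):
        # A^k/k! is obtained from A^(k-1)/(k-1)! by multiplying by A and dividing by k.
        term = [[x / k for x in row] for row in mat_mult(term, A)]
        total = [[s + x for s, x in zip(srow, xrow)] for srow, xrow in zip(total, term)]
    return total


D = [[1.0, 0.0], [0.0, -2.0]]     # diagonal: e^D should be diag(e, e^-2)
print(expm_series(D), math.e, math.exp(-2))

N = [[0.0, 3.0], [0.0, 0.0]]      # nilpotent: N^2 = 0, so e^N = I + N exactly
print(expm_series(N))             # [[1.0, 3.0], [0.0, 1.0]]
```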

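Example 6.13 If $A = D$ is a diagonal matrix, then for any integer $k \geq 0$,
$$D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \quad\text{and}\quad D^k = \begin{bmatrix} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \lambda_n^k \end{bmatrix}.$$
Thus, the exponential matrix $e^D$ is
$$e^D = I + D + \frac{1}{2!}D^2 + \frac{1}{3!}D^3 + \cdots = \begin{bmatrix} e^{\lambda_1} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n} \end{bmatrix},$$
which coincides with the previous definition of $e^{tD}$ (at $t = 1$). □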


Note that computing the powers $A^k$ is not simple in general. However, if $A$ is diagonalizable, then it is easy to see that $A^k = QD^kQ^{-1}$, which is relatively simple to compute.
In any case, it should be verified that $e^A$ exists for any square matrix $A$, that is, that each entry of the series defining $e^A$ converges. For this, we first discuss the convergence of sequences of matrices in general.

Definition 6.4 A sequence of matrices $A_1, A_2, A_3, \dots$ of the same size is said to converge to a matrix $L$ if, for all $i, j$, the sequence of the $(i, j)$-entries of $A_1, A_2, A_3, \dots$ converges to the $(i, j)$-entry of $L$. In this case, we write
$$L = \lim_{k\to\infty} A_k.$$
Such a matrix $L$ is called the limit of the sequence $A_1, A_2, A_3, \dots$.

Theorem 6.11 Let $A_1, A_2, A_3, \dots$ be a sequence of $m \times n$ matrices such that $\lim_{k\to\infty} A_k = L$. Then
$$\lim_{k\to\infty} BA_k = BL \quad\text{and}\quad \lim_{k\to\infty} A_kC = LC$$
for any matrices $B$ and $C$ for which the products are defined.

Proof: By comparing the $(i, j)$-entries of both sides,
$$\lim_{k\to\infty}[BA_k]_{ij} = \lim_{k\to\infty}\Big(\sum_{\ell=1}^{m}[B]_{i\ell}[A_k]_{\ell j}\Big) = \sum_{\ell=1}^{m}[B]_{i\ell}\lim_{k\to\infty}[A_k]_{\ell j} = \sum_{\ell=1}^{m}[B]_{i\ell}[L]_{\ell j} = [BL]_{ij},$$
we get $\lim_{k\to\infty} BA_k = BL$. Similarly, $\lim_{k\to\infty} A_kC = LC$. □

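For example, if $A$ is a diagonalizable matrix such that $A = QDQ^{-1}$ for some invertible matrix $Q$, where $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, then, for each integer $k \geq 0$, $A^k = QD^kQ^{-1}$ (equivalently $D^k = Q^{-1}A^kQ$) implies, by Theorem 6.11, that
$$\lim_{k\to\infty} A^k = Q\Big(\lim_{k\to\infty} D^k\Big)Q^{-1} = Q\begin{bmatrix} \lim_{k\to\infty}\lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \lim_{k\to\infty}\lambda_n^k \end{bmatrix}Q^{-1}$$
whenever either limit exists. Thus, $\lim_{k\to\infty} A^k$ exists if and only if $\lim_{k\to\infty}\lambda_i^k$ exists for $i = 1, \dots, n$.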

Problem 6.16 Let A = [~ i]. Find $\lim_{k\to\infty} A^k$ if it exists. (Note that the matrix $A$ is not diagonalizable.)

Definition 6.5 A series of matrices $A_0 + A_1 + A_2 + \cdots$ is said to converge to a matrix $L$ if $L$ is the limit of the sequence $\{S_m = \sum_{k=0}^{m} A_k \mid m = 0, 1, 2, \dots\}$ of partial sums, that is, $\lim_{m\to\infty} S_m = L$. In this case, we write
$$A_0 + A_1 + A_2 + \cdots = \sum_{k=0}^{\infty} A_k = L.$$
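Example 6.14 The sequence of partial sums of a geometric series $A^0 + A^1 + A^2 + \cdots$ of a square matrix $A$ is $\{S_m = \sum_{k=0}^{m} A^k\}$, and so if $A = QDQ^{-1}$ is diagonalizable, then
$$S_m = \sum_{k=0}^{m} QD^kQ^{-1} = Q\Big(\sum_{k=0}^{m} D^k\Big)Q^{-1} = Q\begin{bmatrix} \sum_{k=0}^{m}\lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \sum_{k=0}^{m}\lambda_n^k \end{bmatrix}Q^{-1}.$$
Thus, since a scalar geometric series $\sum_k \lambda^k$ converges exactly when $|\lambda| < 1$, the sequence converges if and only if $|\lambda_i| < 1$ for all $i$. □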
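Example 6.14 can be illustrated with partial sums. The sketch below uses a hypothetical diagonalizable matrix $A = \begin{bmatrix} 0.5 & 0.25 \\ 0 & -0.5 \end{bmatrix}$ with eigenvalues $\pm 0.5$, and compares the limit with $(I - A)^{-1}$; that closed form is the standard value of a convergent matrix geometric series, not something stated in the example. A second hypothetical matrix with eigenvalue $1.1$ shows the divergent case.

```python
def mat_mult(X, Y):
    """Product of two n x n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def geometric_partial_sum(A, m):
    """S_m = I + A + A^2 + ... + A^m."""
    n = len(A)
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    total = [row[:] for row in power]
    for _ in range(m):
        power = mat_mult(power, A)
        total = [[s + p for s, p in zip(srow, prow)] for srow, prow in zip(total, power)]
    return total


A = [[0.5, 0.25], [0.0, -0.5]]       # eigenvalues 0.5 and -0.5
print(geometric_partial_sum(A, 200))
print([[2.0, 1 / 3], [0.0, 2 / 3]])  # (I - A)^{-1}, computed by hand

B = [[1.1, 0.0], [0.0, 0.2]]         # eigenvalue 1.1 with |1.1| > 1
print(geometric_partial_sum(B, 50)[0][0], geometric_partial_sum(B, 100)[0][0])
```

The first partial sum settles on $(I - A)^{-1}$, while the $(1,1)$-entry for $B$ keeps growing.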

In particular, the exponential matrix $e^A$ is defined to be the limit of the sequence
$$\Big\{S_m = \sum_{k=0}^{m}\frac{A^k}{k!} : m = 0, 1, 2, \dots\Big\}.$$
The existence of $e^A$ for any $n \times n$ matrix $A$ can easily be shown as follows. Let $M$ be a number such that $|a_{ij}| \leq M$ for all $(i, j)$-entries $a_{ij}$ of $A$ (such a number exists since $A$ has only finitely many entries). Then, by induction on $k$, all $(i, j)$-entries of $A^k$ are bounded in absolute value by $n^{k-1}M^k$ for all $k \geq 1$, and hence each entry of $e^A$ is bounded by
$$1 + \sum_{k=1}^{\infty}\frac{n^{k-1}M^k}{k!} \leq 1 + \frac{e^{nM}}{n},$$
so by the comparison test, each entry of $e^A = \sum_{k=0}^{\infty}\frac{A^k}{k!}$ is absolutely convergent for any square matrix $A$.
Again, if $A$ is diagonalizable with $Q^{-1}AQ = D$, then by Theorem 6.11,
$$e^A = e^{QDQ^{-1}} = Qe^DQ^{-1}.$$

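In general, the computation of $e^A$ is not easy at all if $A$ is not diagonalizable. When $A$ is a triangular $2 \times 2$ matrix, it is relatively easy.

Example 6.15 If $A = \begin{bmatrix} 1 & 1 \\ 0 & 3 \end{bmatrix}$, then
$$e^A = I + A + \frac{1}{2}A^2 + \cdots = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 0 & 3 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 1 & 4 \\ 0 & 9 \end{bmatrix} + \cdots = \begin{bmatrix} e & * \\ 0 & e^3 \end{bmatrix}.$$
It is a good exercise to calculate the missing entry $*$ directly from the definition. □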

The following properties of exponential matrices are easy to prove, so they are left as exercises.

Theorem 6.12 (1) $e^{A+B} = e^Ae^B$ provided that $AB = BA$.
(2) $e^{Q^{-1}AQ} = Q^{-1}e^AQ$ for any invertible matrix $Q$.
(3) If $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues of a matrix $A$ with associated eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$, then the $e^{\lambda_i}$ are the eigenvalues of $e^A$ with the same associated eigenvectors $\mathbf{v}_i$ for $i = 1, 2, \dots, n$. Moreover, $\det e^A = e^{\lambda_1}\cdots e^{\lambda_n} = e^{\operatorname{tr}(A)} \neq 0$ for any square matrix $A$.
(4) In particular, the matrix $e^A$ is never singular for any square matrix $A$, and $(e^A)^{-1} = e^{-A}$.
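Some of these properties can be spot-checked numerically (a check, not a proof). The sketch below re-implements the truncated series of Definition 6.3 (the term count is our choice) and tests property (3) for the determinant and property (4) on a hypothetical matrix $M$, then property (1) on $2I + N$, the matrix of Example 6.16. A second hypothetical pair $P, R$ with $PR \neq RP$ shows that the hypothesis of (1) cannot simply be dropped.

```python
import math


def mat_mult(X, Y):
    """Product of two n x n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... (Definition 6.3)."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mult(term, A)]
        total = [[s + x for s, x in zip(sr, xr)] for sr, xr in zip(total, term)]
    return total


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


M = [[1.0, 2.0], [-0.5, 0.3]]                 # hypothetical matrix with tr M = 1.3
E = expm_series(M)
E_neg = expm_series([[-x for x in row] for row in M])
print(det2(E), math.exp(1.3))                  # (3): det e^M = e^{tr M}
print(mat_mult(E, E_neg))                      # (4): e^M e^{-M} = I

# (1) with commuting summands: A = 2I + N, the matrix of Example 6.16.
A = [[2.0, 3.0], [0.0, 2.0]]
print(expm_series(A), math.exp(2), 3 * math.exp(2))

# (1) can fail without AB = BA: P and R below do not commute.
P = [[0.0, 1.0], [0.0, 0.0]]
R = [[0.0, 0.0], [1.0, 0.0]]
print(expm_series([[0.0, 1.0], [1.0, 0.0]]))     # e^{P+R}
print(mat_mult(expm_series(P), expm_series(R)))  # e^P e^R = [[2, 1], [1, 1]]
```

Here $e^{P+R}$ has $\cosh 1 \approx 1.543$ on the diagonal, whereas $e^Pe^R$ has $2$ and $1$.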

Problem 6.17 Prove the above four properties of $e^A$.

Problem 6.18 Prove that if $A$ is skew-symmetric, then $e^A$ is orthogonal.

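Example 6.16 For $A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$, one can compute $e^A$ as a simple application of property (1). We first write it as $A = 2I + N$, where $N = \begin{bmatrix} 0 & 3 \\ 0 & 0 \end{bmatrix}$. Since $(2I)N = N(2I)$, by (1)
$$e^A = e^{2I+N} = e^{2I}e^N.$$
From the direct computation of the series expansion, we get $e^{2I} = e^2I$. Moreover, since $N^k = 0$ for $k \geq 2$, $e^N = I + N + \frac{N^2}{2!} + \cdots = I + N = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$. Thus,
$$e^A = e^2I(I + N) = e^2\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e^2 & 3e^2 \\ 0 & e^2 \end{bmatrix}. \qquad \square$$
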
[~
3
Problem 6.19 Compute e A for A = 2
0

6.7 Application: Differential equations III


One of the most prominent applications of exponential matrices is to the theory of linear differential equations.

Lemma 6.13 (1) Let $A(t)$ and $B(t)$ be matrices whose entries are all differentiable functions of $t$, such that their product is defined. Then $A(t)B(t)$ is differentiable, and its derivative is
$$\frac{d}{dt}\big(A(t)B(t)\big) = \frac{dA(t)}{dt}B(t) + A(t)\frac{dB(t)}{dt}.$$
(2) For any $t \in \mathbb{R}$ and any square matrix $A$, the exponential matrix
$$e^{tA} = I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots$$
is a differentiable function of $t$, and $\frac{d}{dt}e^{tA} = Ae^{tA}$.

Proof: (1) is a routine computation. (2) By definition,
$$\frac{d}{dt}e^{tA} = \lim_{h\to0}\frac{e^{(t+h)A} - e^{tA}}{h} = \lim_{h\to0}\frac{e^{hA} - I}{h}\,e^{tA},$$
since $(tA)(hA) = (hA)(tA)$, so that $e^{(t+h)A} = e^{hA}e^{tA}$ by Theorem 6.12(1). One can now easily show that $\lim_{h\to0}\frac{e^{hA} - I}{h} = A$. □

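Lemma 6.13 implies that $\mathbf{y} = e^{tA}\mathbf{y}_0$ is the solution of the system $\mathbf{y}' = A\mathbf{y}$ (indeed, $\mathbf{y}' = Ae^{tA}\mathbf{y}_0 = A\mathbf{y}$ and $\mathbf{y}(0) = \mathbf{y}_0$). In particular, if $A$ is diagonalizable, say $Q^{-1}AQ = D$, then from (2) in Theorem 6.12 we get
$$\mathbf{y} = e^{tA}\mathbf{y}_0 = Qe^{tD}Q^{-1}\mathbf{y}_0,$$
as in Theorem 6.10. The following theorem is a direct consequence of Lemma 6.13.

Theorem 6.14 For any $n \times n$ matrix $A$, the linear differential equation $\mathbf{y}' = A\mathbf{y}$ with initial condition $\mathbf{y}(0) = \mathbf{y}_0$ has the solution
$$\mathbf{y} = e^{tA}\mathbf{y}_0.$$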
6.7. APPLICATION: DIFFERENTIAL EQUATIONS III 241

If $A$ is not diagonalizable, then it is not easy to compute the matrix $e^{tA}$. For this case, we introduce the Jordan canonical form of $A$ in Chapter 9, with which the computation of $e^{tA}$ is made relatively easy. The matrices in the following examples will be treated in general in Chapter 9.

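Example 6.17 Solve the system $\mathbf{y}' = A\mathbf{y}$ of linear differential equations with initial condition $\mathbf{y}(0) = \mathbf{y}_0$, where
$$A = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, \qquad \mathbf{y}_0 = \begin{bmatrix} a \\ b \end{bmatrix}.$$
Solution: First note that $A$ has an eigenvalue $\lambda$ of multiplicity 2 and is not diagonalizable. Now we write it as
$$A = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \lambda I + N.$$
Then, by the same argument as in Example 6.16 (here $N^2 = 0$),
$$e^{tA} = e^{\lambda tI}e^{tN} = e^{\lambda t}(I + tN) = e^{\lambda t}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}.$$
Therefore, the solution is
$$\mathbf{y} = e^{tA}\mathbf{y}_0 = e^{\lambda t}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} (a + bt)e^{\lambda t} \\ be^{\lambda t} \end{bmatrix}.$$
In terms of components, $y_1 = (a + bt)e^{\lambda t}$, $y_2 = be^{\lambda t}$. □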
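The closed form of Example 6.17 can be checked against the differential equation with a central-difference approximation of $\mathbf{y}'(t)$. The parameter values $\lambda = -0.5$, $a = 2$, $b = 3$ and the step `h` are arbitrary choices for illustration.

```python
import math


def solution(t, lam, a, b):
    """Example 6.17: y1 = (a + b t) e^{lam t}, y2 = b e^{lam t}."""
    g = math.exp(lam * t)
    return [(a + b * t) * g, b * g]


def rhs(y, lam):
    """A y for the Jordan block A = [[lam, 1], [0, lam]]."""
    return [lam * y[0] + y[1], lam * y[1]]


lam, a, b = -0.5, 2.0, 3.0      # hypothetical parameters
h = 1e-6
print(solution(0.0, lam, a, b))   # the initial vector (a, b)
for t in (0.5, 1.0, 4.0):
    y = solution(t, lam, a, b)
    dy = [(p - m) / (2 * h)
          for p, m in zip(solution(t + h, lam, a, b), solution(t - h, lam, a, b))]
    print(t, dy, rhs(y, lam))
```

The extra term $bt\,e^{\lambda t}$ in $y_1$ is exactly what supplies the off-diagonal 1 of the Jordan block.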

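Example 6.18 Find the general solution of the system $\mathbf{y}' = A\mathbf{y}$, where
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$
Solution: Note that the eigenvalues of $A$ are $a \pm ib$, which are not real unless $b = 0$, in which case the matrix is already in diagonal form. If $b \neq 0$, the diagonalization discussed in this section does not apply, since we have complex eigenvalues, which are going to be discussed in Chapter 9. However, there is another method of solving the system without using diagonalization. We first write $A$ as
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + b\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = aI + bJ;$$
then clearly $IJ = JI$, so $e^{tA} = e^{atI + btJ} = e^{at}e^{btJ}$. Since
$$J^2 = -I, \quad J^3 = -J, \quad J^4 = I,$$
one can deduce $J^k = J^{k+4}$ for all $k = 1, 2, \dots$, and, moreover,
$$\begin{aligned} e^{btJ} &= I + \frac{btJ}{1!} + \frac{(bt)^2J^2}{2!} + \frac{(bt)^3J^3}{3!} + \frac{(bt)^4J^4}{4!} + \cdots \\ &= \begin{bmatrix} 1 - \frac{(bt)^2}{2!} + \frac{(bt)^4}{4!} - \cdots & -(bt) + \frac{(bt)^3}{3!} - \frac{(bt)^5}{5!} + \cdots \\ (bt) - \frac{(bt)^3}{3!} + \frac{(bt)^5}{5!} - \cdots & 1 - \frac{(bt)^2}{2!} + \frac{(bt)^4}{4!} - \cdots \end{bmatrix} \\ &= \begin{bmatrix} \cos bt & -\sin bt \\ \sin bt & \cos bt \end{bmatrix} \end{aligned}$$
for any constants $b$ and $t$. Thus, the general solution of $\mathbf{y}' = A\mathbf{y}$ is
$$\mathbf{y} = e^{tA}\mathbf{y}_0 = e^{at}\begin{bmatrix} \cos bt & -\sin bt \\ \sin bt & \cos bt \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
In terms of components,
$$\begin{cases} y_1 = e^{at}(c_1\cos bt - c_2\sin bt) \\ y_2 = e^{at}(c_1\sin bt + c_2\cos bt). \end{cases} \qquad \square$$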
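The key step in Example 6.18 is identifying the series for $e^{btJ}$ with a rotation matrix. The sketch below sums that series numerically and compares $e^{at}e^{btJ}$ with the closed form; the truncation length and the values of $a$, $b$, $t$ are arbitrary assumptions.

```python
import math

J = [[0.0, -1.0], [1.0, 0.0]]


def mat_mult(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_sJ_series(s, terms=40):
    """Partial sum of I + sJ + (sJ)^2/2! + ... for J = [[0, -1], [1, 0]]."""
    term = [[1.0, 0.0], [0.0, 1.0]]
    total = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[s * x / k for x in row] for row in mat_mult(term, J)]   # (sJ)^k / k!
        total = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total


def exp_tA_closed_form(a, b, t):
    """e^{tA} = e^{at} [[cos bt, -sin bt], [sin bt, cos bt]] for A = [[a, -b], [b, a]]."""
    g, c, s = math.exp(a * t), math.cos(b * t), math.sin(b * t)
    return [[g * c, -g * s], [g * s, g * c]]


print(mat_mult(J, J))                      # J^2 = -I
a, b, t = 0.2, 1.5, 2.0                    # hypothetical values
series = [[math.exp(a * t) * x for x in row] for row in exp_sJ_series(b * t)]
print(series)
print(exp_tA_closed_form(a, b, t))
```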

Problem 6.20 Solve the system y' = Ay with initial condition y(O) = Yo by com-

!
puting etAyo for

=: 1' ~
1
(l)A=[~ Yo [ : l, (2) A ~[ o
o
6.8. DIAGONALIZATION OF LINEAR TRANSFORMATIONS 243

6.8 Diagonalization of linear transformations


Recall that two matrices are similar if and only if they can be the matrix representations of the same linear transformation, and similar matrices have the same eigenvalues. In this section, we aim to find a basis $\alpha$ so that the matrix representation of a linear transformation with respect to $\alpha$ is a diagonal matrix. First, we start with the eigenvalues and the eigenvectors of a linear transformation.

Definition 6.6 Let $V$ be an $n$-dimensional vector space, and let $T : V \to V$ be a linear transformation on $V$. Then the eigenvalues and eigenvectors of $T$ are defined by the same equation, $T\mathbf{x} = \lambda\mathbf{x}$, with a nonzero vector $\mathbf{x} \in V$.

In practice, the eigenvalues of $T$ are computed as those of the matrix representation $[T]_\alpha$ of $T$ with respect to a basis $\alpha$ for $V$. In fact, this is well defined, since $[T]_\alpha$ is similar to $[T]_\beta$ for any other basis $\beta$ for $V$, and their eigenvalues are the same by Theorem 6.2.
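The eigenvectors of $T$ can also be found from the eigenvectors of its matrix representation. Let $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ be a basis for $V$. Then the natural isomorphism $\phi : V \to \mathbb{R}^n$ identifies the associated matrix $A = [T]_\alpha : \mathbb{R}^n \to \mathbb{R}^n$ with the linear transformation $T : V \to V$ via the following commutative diagram:
$$\begin{array}{ccc} V & \xrightarrow{\quad T \quad} & V \\ \phi\downarrow & & \downarrow\phi \\ \mathbb{R}^n & \xrightarrow{\ A = [T]_\alpha\ } & \mathbb{R}^n \end{array}$$
Let $\lambda$ be an eigenvalue of $A$ (hence also of $T$). Then $\mathbf{x} = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$ is an eigenvector of the matrix $A$ belonging to $\lambda$ ($A\mathbf{x} = \lambda\mathbf{x}$) if and only if $\phi^{-1}(\mathbf{x}) = \mathbf{v} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_n\mathbf{v}_n \in V$ is an eigenvector of $T$ ($T(\mathbf{v}) = \lambda\mathbf{v}$), because the commutativity of the diagram shows
$$[T(\mathbf{v})]_\alpha = [T]_\alpha[\mathbf{v}]_\alpha = A\mathbf{x} = \lambda\mathbf{x} = [\lambda\mathbf{v}]_\alpha.$$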

Therefore, if $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k$ are linearly independent eigenvectors of $A = [T]_\alpha$, then $\phi^{-1}(\mathbf{x}_1), \phi^{-1}(\mathbf{x}_2), \dots, \phi^{-1}(\mathbf{x}_k)$ are linearly independent eigenvectors of $T$. Hence, by Theorem 6.6, the linear transformation $T$ has a diagonal matrix representation if and only if it has $n$ linearly independent eigenvectors.
The following example illustrates how to find a diagonal matrix representation of a linear transformation on a vector space.

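Example 6.19 Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by
$$(Tf)(x) = f(x) + xf'(x) + f'(x).$$
Find a basis for $P_2(\mathbb{R})$ with respect to which the matrix of $T$ is diagonal.
Solution: First of all, we find the eigenvalues and the eigenvectors of $T$. Take a basis for the vector space $P_2(\mathbb{R})$, say $\alpha = \{1, x, x^2\}$. Then the matrix of $T$ with respect to $\alpha$ is
$$[T]_\alpha = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix},$$
which is upper triangular. Hence, the eigenvalues of $T$ are $\lambda_1 = 1$, $\lambda_2 = 2$ and $\lambda_3 = 3$. By a simple computation, one can verify that the vectors $\mathbf{x}_1 = (1, 0, 0)$, $\mathbf{x}_2 = (1, 1, 0)$, and $\mathbf{x}_3 = (1, 2, 1)$ are eigenvectors of $[T]_\alpha$ in $\mathbb{R}^3$ belonging to the eigenvalues $\lambda_1, \lambda_2, \lambda_3$, respectively. Their corresponding eigenvectors of $T$ in $P_2(\mathbb{R})$ are $f_1(x) = 1$, $f_2(x) = 1 + x$, $f_3(x) = 1 + 2x + x^2$, respectively. Since the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ are all distinct, the eigenvectors $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ of $[T]_\alpha$ are linearly independent, and so is $\beta = \{f_1, f_2, f_3\}$ in $P_2(\mathbb{R})$. Thus, each $f_i$ is a basis for the eigenspace $E(\lambda_i)$ of $T$ belonging to $\lambda_i$ for $i = 1, 2, 3$, and the transition matrix, whose columns are the coordinate vectors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ of $f_1, f_2, f_3$ with respect to $\alpha$, is
$$Q = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.$$
Hence, by changing the basis $\alpha$ to $\beta$, the matrix representation of $T$ is a diagonal matrix:
$$[T]_\beta = Q^{-1}[T]_\alpha Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}. \qquad \square$$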
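A short script can confirm the computation in Example 6.19. Polynomials in $P_2(\mathbb{R})$ are represented, by our own choice, as coefficient lists `[c0, c1, c2]` with respect to $\alpha = \{1, x, x^2\}$; the helper names are illustrative, and $Q^{-1}$ was computed by hand and is checked in the script.

```python
def apply_T(c):
    """(Tf)(x) = f(x) + x f'(x) + f'(x), with f = c[0] + c[1] x + c[2] x^2."""
    _, c1, c2 = c
    fprime = [c1, 2 * c2, 0]        # f'(x) = c1 + 2 c2 x
    x_fprime = [0, c1, 2 * c2]      # x f'(x) = c1 x + 2 c2 x^2
    return [c[i] + x_fprime[i] + fprime[i] for i in range(3)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Column j of [T]_alpha holds the coordinates of T applied to the j-th basis vector.
cols = [apply_T(e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
T_alpha = [[cols[j][i] for j in range(3)] for i in range(3)]
print(T_alpha)                    # [[1, 1, 0], [0, 2, 2], [0, 0, 3]]

# f1 = 1, f2 = 1 + x, f3 = 1 + 2x + x^2 should satisfy T f = lambda f.
for lam, f in ((1, [1, 0, 0]), (2, [1, 1, 0]), (3, [1, 2, 1])):
    print(lam, apply_T(f), [lam * v for v in f])

Q = [[1, 1, 1], [0, 1, 2], [0, 0, 1]]        # columns are f1, f2, f3 in alpha-coordinates
Q_inv = [[1, -1, 1], [0, 1, -2], [0, 0, 1]]  # computed by hand
print(matmul(Q, Q_inv))                      # identity
print(matmul(matmul(Q_inv, T_alpha), Q))     # [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
```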
6.9. EXERCISES 245

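Note that, if $T = A$ is an $n \times n$ square matrix written in column vectors, $A = [\mathbf{c}_1 \ \cdots \ \mathbf{c}_n]$, then the linear transformation $A : \mathbb{R}^n \to \mathbb{R}^n$ is given by $A(\mathbf{e}_i) = \mathbf{c}_i$, $i = 1, \dots, n$, so that $A$ itself is just the matrix representation with respect to the standard basis $\alpha = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ for $\mathbb{R}^n$, say $A = [A]_\alpha$. Now if there is a basis $\beta = \{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ of $n$ linearly independent eigenvectors of $A$, then the natural isomorphism $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ defined by $\Phi(\mathbf{x}_i) = \mathbf{e}_i$ is simply a change of basis by the transition matrix $Q = [\mathbf{x}_1 \ \cdots \ \mathbf{x}_n]$, and the matrix representation of $A$ with respect to $\beta$ is a diagonal matrix:
$$[A]_\beta = Q^{-1}[A]_\alpha Q = Q^{-1}AQ = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix},$$
where $\lambda_i$ is the eigenvalue to which $\mathbf{x}_i$ belongs.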

Problem 6.21 Let $T$ be the linear transformation on $\mathbb{R}^3$ defined by
$$T(x, y, z) = (4x + z,\ 2x + 3y + 2z,\ x + 4z).$$
Find all the eigenvalues of $T$ and their eigenvectors, and diagonalize $T$.

Problem 6.22 Let $M_{2\times2}(\mathbb{R})$ be the vector space of all real $2 \times 2$ matrices and let $T$ be the linear transformation on $M_{2\times2}(\mathbb{R})$ defined by
$$T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a + b + d & a + b + c \\ b + c + d & a + c + d \end{bmatrix}.$$
Find the eigenvalues and a basis for each of the eigenspaces of $T$, and diagonalize $T$.

Problem 6.23 Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by $T(f(x)) = f(x) + xf'(x)$. Find all the eigenvalues of $T$ and find a basis $\alpha$ for $P_2(\mathbb{R})$ so that $[T]_\alpha$ is a diagonal matrix.

6.9 Exercises
6.1. Find the eigenvalues and eigenvectors for the given matrix, if they exist.

1
[~
l n l J
-4 -1
(1) [-~ ~], (2) 23,
1 3
1 1
01 01 10

l
1 1
(3) (4)
010 1 1
101 1 1 1

l~
0 -5
(5)
-12 -12 -1 -10 1 (6)
1 o 0 01 .
0 -1 2 -1 ' 0 1 -2
-1 0 -1 2 0 2 1

n
6.2. Find the characteristic polynomial, eigenvalues and eigenvectors of the matrix

A~ [-~ j
6.3. Show that a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has
(1) two distinct real eigenvalues if $(a - d)^2 + 4bc > 0$,
(2) one eigenvalue if $(a - d)^2 + 4bc = 0$,
(3) no real eigenvalues if $(a - d)^2 + 4bc < 0$,
(4) only real eigenvalues if it is symmetric (i.e., $b = c$).

6.4. Suppose that a $3 \times 3$ matrix $A$ has eigenvalues $-1, 0, 1$ with eigenvectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$, respectively. Describe the null space $N(A)$ and the column space $C(A)$.

6.5. If a $3 \times 3$ matrix $A$ has eigenvalues 1, 2, 3, what are the eigenvectors of $B = (A - I)(A - 2I)(A - 3I)$?
6.6. Show that any $2 \times 2$ skew-symmetric nonzero matrix has no real eigenvalue.
6.7. Find a $3 \times 3$ matrix that has the eigenvalues $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 3$ with the associated eigenvectors $\mathbf{x}_1 = (2, -1, 0)$, $\mathbf{x}_2 = (-1, 2, -1)$, $\mathbf{x}_3 = (0, -1, 2)$.
6.8. Let $P$ be the projection matrix that projects $\mathbb{R}^n$ onto a subspace $W$. Find the eigenvalues and the eigenspaces for $P$.
6.9. Let $\mathbf{u}, \mathbf{v}$ be $n \times 1$ column vectors, and let $A = \mathbf{u}\mathbf{v}^T$. Show that $\mathbf{u}$ is an eigenvector of $A$, and find the eigenvalues and the eigenvectors of $A$.

6.10. Show that if $\lambda$ is an eigenvalue of an idempotent $n \times n$ matrix $A$ (i.e., $A^2 = A$), then $\lambda$ must be either 0 or 1.
6.11. Prove that if $A$ is an idempotent matrix, then $\operatorname{tr}A = \operatorname{rank}A$.
6.12. Let $A = [a_{ij}]$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \dots, \lambda_n$. Show that
$$\lambda_j = a_{jj} + \sum_{i \neq j}(a_{ii} - \lambda_i) \quad\text{for } j = 1, \dots, n.$$

6.13. Prove that if two diagonalizable matrices $A$ and $B$ have the same eigenvectors (i.e., there exists an invertible matrix $Q$ such that both $Q^{-1}AQ$ and $Q^{-1}BQ$ are diagonal; such matrices $A$ and $B$ are said to be simultaneously diagonalizable), then $AB = BA$. In fact, the converse is also true. (See Exercise 7.17.) Prove the converse under the assumption that the eigenvalues of $A$ are all distinct.
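6.14. Show that the matrix
$$A = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & -c_n \\ 1 & 0 & 0 & \cdots & 0 & -c_{n-1} \\ 0 & 1 & 0 & \cdots & 0 & -c_{n-2} \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -c_1 \end{bmatrix}$$
has the characteristic polynomial $p(\lambda) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_{n-1}\lambda + c_n$. (This shows that every monic polynomial is the characteristic polynomial of some matrix. The matrix $A$ is called the companion matrix of $p(\lambda)$.)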
6.15. Let $D : P_3(\mathbb{R}) \to P_3(\mathbb{R})$ be the differentiation defined by $Df(x) = f'(x)$ for $f \in P_3(\mathbb{R})$. Find all eigenvalues and eigenvectors of $D$ and of $D^2$.
6.16. Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by
$$T(a_2x^2 + a_1x + a_0) = (a_0 + a_1)x^2 + (a_1 + a_2)x + (a_0 + a_2).$$
Find a basis for $P_2(\mathbb{R})$ with respect to which the matrix representation for $T$ is diagonal.

-n [~ ~ n
6.17. Determine whether or not each of the following matrices is diagonalizable.
3 0
(2) (3) [ o 2
-2 0
6.18. Find an orthogonal matrix $Q$ and a diagonal matrix $D$ such that $Q^TAQ = D$ for

(1) A = [ -~4 -~2-3~]. (2) A =

6.19. Calculate AlOx for A = [ o1 2


5
-1]
-2 ,
o 6 -2
6.20. For $n \geq 1$, let $a_n$ denote the number of subsets of $\{1, 2, \dots, n\}$ that contain no consecutive integers. Find the number $a_n$ for all $n \geq 1$.
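6.21. Evaluate $\det A_n$, where
$$A_n = \begin{bmatrix} 1 & 1 & 0 & \cdots & 0 & 0 \\ 1 & 1 & 1 & \cdots & 0 & 0 \\ 0 & 1 & 1 & \ddots & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 1 & 1 \\ 0 & 0 & \cdots & 0 & 1 & 1 \end{bmatrix}$$
is the $n \times n$ $\{0, 1\}$-matrix with 1's on the main diagonal and its two parallel side diagonals.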

6.22. Let A = [0~6 0~3]. Find a value $x$ so that $A$ has an eigenvalue $\lambda = 1$. For $\mathbf{x}_0 = (1, 1)$, calculate $\lim_{k\to\infty}\mathbf{x}_k$, where $\mathbf{x}_k = A\mathbf{x}_{k-1}$, $k = 1, 2, \dots$.
6.23. Compute e A for

(1) A = [~ ~], (2) A = [~ ~].


6.24. In 1985, the initial status of the car owners in a city was reported as follows:
40% of the car owners drove large cars, 20% drove medium-sized cars, and
40% drove small cars. In 1995, 70% of the large-car owners in 1985 still owned
large cars, but 30% had changed to a medium-sized car. Of those who owned
medium-sized cars in 1985, 10% had changed to large cars, 70% continued
to drive medium-sized cars, and 20% had changed to small cars. Finally, of
those who owned the small cars in 1985, 10% had changed to medium-sized
cars and 90% still owned small cars in 1995. Assuming that these trends
continue, and that no car owners are born, die or otherwise add realism to
the problem, determine the percentage of car owners who will own cars of
each size in 2025.

6.25. Let A = [~ ~].


(1) Compute $e^A$ directly from the expansion.
(2) Compute $e^A$ by diagonalizing $A$.
6.26. Let $A(t)$ be a matrix whose entries are all differentiable functions of $t$, and assume $A(t)$ is invertible for all $t$. Compute the following:
(1) $\frac{d}{dt}\big(A(t)^n\big)$, (2) $\frac{d}{dt}\big(A(t)^{-1}\big)$.

6.27. Solve y' = Ay, where

(1) A =

(2) A
[ -6

= [
-1
2 -12
24
8

~ -~ ]
-nand y(O)
and

= [ ]
rn
y(l)

~
=

.
6.28. Let $f(\lambda) = \det(\lambda I - A)$ be the characteristic polynomial of $A$. Evaluate $f(A)$ for

(1) A = [;
1 1 3
! ~] (2) A = [
-1 1
;
4
~ -~].
In fact, $f(A) = 0$ for any square matrix $A$ and its characteristic polynomial $f(\lambda)$ (this is the Cayley-Hamilton theorem).
6.29. Determine whether the following statements are true or false, in general, and justify your answers.
(1) If $B$ is obtained from $A$ by interchanging two rows, then $B$ is similar to $A$.
(2) If $A$ and $B$ are diagonalizable, so is $AB$.
(3) Every invertible matrix is diagonalizable.
(4) Every diagonalizable matrix is invertible.
(5) Every permutation matrix is orthogonal.
(6) Interchanging the rows of a 2 x 2 matrix reverses the signs of its eigen-
values.
(7) A matrix A cannot be similar to A + I.
(8) The eigenvalues of A + B equal the sum of the eigenvalues of A and B.
(9) The sum of the eigenvalues of A + B equals the sum of all the individual
eigenvalues of A and B.
Chapter 7

Complex Vector Spaces

7.1 Introduction
A real matrix has real coefficients in its characteristic polynomial, but its eigenvalues may fail to be real. For instance, the matrix $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ has no real eigenvalues, but it has the complex eigenvalues $\lambda = 1 \pm i$. Thus, it is indispensable to work with complex numbers to find the full set of eigenvalues and eigenvectors. It is therefore natural to extend the concept of real vector spaces to that of complex vector spaces, and then to develop the basic properties of complex vector spaces. With this extension, every square matrix of order $n$ has exactly $n$ eigenvalues, counted with multiplicity.
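The Euclidean $n$-dimensional complex vector space is the set $\mathbb{C}^n$ of vectors with $n$ complex components:
$$\mathbb{C}^n = \{(z_1, z_2, \ldots, z_n) : z_i \in \mathbb{C},\ i = 1, 2, \ldots, n\}.$$
In this complex vector space $\mathbb{C}^n$, the addition and the scalar multiplication are given as follows:
$$(z_1, z_2, \ldots, z_n) + (z_1', z_2', \ldots, z_n') = (z_1 + z_1', z_2 + z_2', \ldots, z_n + z_n'),$$
$$k(z_1, z_2, \ldots, z_n) = (kz_1, kz_2, \ldots, kz_n), \quad k \in \mathbb{C}.$$
The standard basis for $\mathbb{C}^n$ is again $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ as in the real case, but the scalars are now complex numbers, so that any vector $\mathbf{z}$ in $\mathbb{C}^n$ is of the form $\mathbf{z} = \sum_{k=1}^{n} z_k\mathbf{e}_k$ with $z_k = x_k + iy_k \in \mathbb{C}$, i.e., $\mathbf{z} = \mathbf{x} + i\mathbf{y}$ with $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$.
In a complex vector space, linear combinations are defined exactly as in a real vector space, except that the scalars are complex numbers.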

252 CHAPTER 7. COMPLEX VECTOR SPACES

Thus the same is true for linear independence, spanning sets, bases, dimension, and subspaces. For complex matrices, whose entries are complex numbers, the matrix sum and product follow the same rules as for real matrices. The same is true for the concept of a linear transformation $T : V \to W$ from a complex vector space $V$ to a complex vector space $W$. The definitions of the kernel and the image of a linear transformation remain the same as in the real case, as do the facts about null spaces, column spaces, matrix representations of linear transformations, and similarity.
However, the inner product requires a modification from the real case. Recall that the absolute value (or modulus) of a complex number $z = x + iy$ is the nonnegative real number $|z| = (\bar{z}z)^{1/2} = \sqrt{x^2 + y^2}$, where $\bar{z}$ is the complex conjugate of $z$. Accordingly, the length of a vector $\mathbf{z} = (z_1, \ldots, z_n)$ in $\mathbb{C}^n$ with $z_k = x_k + iy_k \in \mathbb{C}$ has to be modified: if we took the inner product in $\mathbb{C}^n$ to give $\|\mathbf{z}\|^2 = z_1^2 + \cdots + z_n^2$, then the nonzero vector $(1, i)$ in $\mathbb{C}^2$ would have zero length: $1^2 + i^2 = 0$. In any case, a modified definition should coincide with the old one when the vectors and matrices are real. The following is the usual definition of an inner product in $\mathbb{C}^n$.
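Definition 7.1 If $\mathbf{u} = [u_1\ u_2\ \cdots\ u_n]^T$ and $\mathbf{v} = [v_1\ v_2\ \cdots\ v_n]^T$ are vectors in the $n$-space $\mathbb{C}^n$ with $u_k, v_k \in \mathbb{C}$, then their inner product $\mathbf{u} \cdot \mathbf{v}$ is defined by
$$\mathbf{u} \cdot \mathbf{v} = \bar{u}_1v_1 + \bar{u}_2v_2 + \cdots + \bar{u}_nv_n = \bar{\mathbf{u}}^T\mathbf{v},$$
where $\bar{\mathbf{u}} = [\bar{u}_1\ \bar{u}_2\ \cdots\ \bar{u}_n]^T$. The length (or magnitude) of a vector $\mathbf{u}$ in $\mathbb{C}^n$ is defined by
$$\|\mathbf{u}\| = (\mathbf{u} \cdot \mathbf{u})^{1/2} = \sqrt{|u_1|^2 + |u_2|^2 + \cdots + |u_n|^2},$$
where $|u_k|^2 = \bar{u}_ku_k$, and the distance between two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{C}^n$ is defined by
$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|.$$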
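A short Python sketch makes Definition 7.1 concrete, with vectors as tuples of built-in `complex` numbers. The names `inner`, `norm` and `naive_square_sum` are illustrative. The sketch contrasts the naive sum of squares, which gives the nonzero vector $(1, i)$ "length" zero, with the conjugated inner product. It also checks the conjugate-linearity in the first argument that Definition 7.2 below records.

```python
import cmath
import math


def inner(u, v):
    """Euclidean inner product on C^n: conj(u_1) v_1 + ... + conj(u_n) v_n."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def norm(u):
    """Length ||u|| = <u, u>^(1/2); <u, u> is real and nonnegative."""
    return math.sqrt(inner(u, u).real)


def naive_square_sum(u):
    """The tempting but wrong 'squared length' z_1^2 + ... + z_n^2."""
    return sum(z * z for z in u)


z = (1 + 0j, 1j)
print(naive_square_sum(z))   # 0j: the nonzero vector (1, i) would get length 0
print(norm(z))               # 1.4142135623730951, i.e. sqrt(2)

u, v, k = (1 + 2j, 3j), (2 - 1j, 1 + 1j), 2 - 3j
# Conjugate-linear in the first argument, linear in the second.
print(cmath.isclose(inner([k * a for a in u], v), k.conjugate() * inner(u, v)))  # True
print(cmath.isclose(inner(u, [k * b for b in v]), k * inner(u, v)))              # True
```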
In an (abstract) complex vector space, we can also define an inner product by adopting the basic properties of the Euclidean inner product on $\mathbb{C}^n$ as axioms.
Definition 7.2 A (complex) inner product (or Hermitian inner product) on a complex vector space $V$ is a function that associates a complex number $\langle \mathbf{u}, \mathbf{v} \rangle$ with each pair of vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ in such a way that the following conditions are satisfied for any vectors $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ in $V$ and any scalar $k$ in $\mathbb{C}$:
7.1. INTRODUCTION 253

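(1) $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$,
(2) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$,
(3) $\langle k\mathbf{u}, \mathbf{v} \rangle = \bar{k}\langle \mathbf{u}, \mathbf{v} \rangle$,
(4) $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$, and $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$.

A complex vector space together with an inner product is called a complex inner product space or a unitary space.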

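The following additional properties follow immediately from the definition of an inner product space:
(5) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$,
(6) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$,
(7) $\langle \mathbf{u}, k\mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$.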

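Remark: There is another way to define an inner product on a complex vector space. If we redefine the Euclidean inner product $\mathbf{u} \cdot \mathbf{v}$ on $\mathbb{C}^n$ by
$$\mathbf{u} \cdot \mathbf{v} = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n = \mathbf{u}^T\bar{\mathbf{v}},$$
then the third condition in Definition 7.2 should be modified to

(3') $\langle \mathbf{u}, k\mathbf{v} \rangle = \bar{k}\langle \mathbf{u}, \mathbf{v} \rangle$, so that $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$.

These two definitions do not lead to any essential difference in the theory of complex vector spaces.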
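In a complex inner product space, as in a real inner product space, the length (or magnitude) of a vector $\mathbf{u}$ and the distance between two vectors $\mathbf{u}$ and $\mathbf{v}$ are defined by
$$\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2}, \qquad d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|,$$
respectively.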

Example 7.1 Let $C_{\mathbb{C}}[a, b]$ denote the set of all complex-valued continuous functions defined on $[a, b]$. Thus an element of $C_{\mathbb{C}}[a, b]$ is of the form $f(x) = f_1(x) + if_2(x)$, where $f_1(x)$ and $f_2(x)$ are real-valued continuous functions on $[a, b]$. Note that $f$ is continuous if and only if each component function $f_i$ is continuous. From the theory of continuous functions, it is quite easy to see that $C_{\mathbb{C}}[a, b]$ is a complex vector space under the sum and scalar multiplication of functions. For a vector $f(x) = f_1(x) + if_2(x)$ in $C_{\mathbb{C}}[a, b]$, we define
$$\int_a^b f(x)\,dx = \int_a^b [f_1(x) + if_2(x)]\,dx = \int_a^b f_1(x)\,dx + i\int_a^b f_2(x)\,dx.$$
We leave it as an exercise to show that, for vectors $f(x) = f_1(x) + if_2(x)$ and $g(x) = g_1(x) + ig_2(x)$ in the complex vector space $C_{\mathbb{C}}[a, b]$, the following formula defines an inner product on $C_{\mathbb{C}}[a, b]$:
$$\begin{aligned}
\langle f, g \rangle &= \int_a^b \overline{f(x)}\,g(x)\,dx \\
&= \int_a^b [f_1(x) - if_2(x)][g_1(x) + ig_2(x)]\,dx \\
&= \int_a^b [f_1(x)g_1(x) + f_2(x)g_2(x)]\,dx + i\int_a^b [f_1(x)g_2(x) - f_2(x)g_1(x)]\,dx.
\end{aligned}$$
□
Problem 7.1 Show that the Euclidean inner product on $\mathbb{C}^n$ satisfies all the inner product axioms.

The definitions of such terms as orthogonal sets, orthogonal complements, orthonormal sets, and orthonormal bases remain the same in complex inner product spaces as in real inner product spaces. Moreover, the Gram-Schmidt orthogonalization is still valid in complex inner product spaces, and can be used to convert an arbitrary basis into an orthonormal basis. If $V$ is an $n$-dimensional complex inner product space, then by taking an orthonormal basis for $V$, there is a natural isometry from $V$ to $\mathbb{C}^n$ that preserves the inner product, as in the real case. Hence, without loss of generality, we may work only on $\mathbb{C}^n$ with the Euclidean inner product, and we use $\cdot$ and $\langle\ ,\ \rangle$ interchangeably.
On the other hand, we may consider the set $\mathbb{C}^n$ as a real vector space by defining addition and scalar multiplication as
$$(z_1, z_2, \ldots, z_n) + (z_1', z_2', \ldots, z_n') = (z_1 + z_1', z_2 + z_2', \ldots, z_n + z_n'),$$
$$r(z_1, z_2, \ldots, z_n) = (rz_1, rz_2, \ldots, rz_n), \quad \text{for } r \in \mathbb{R}.$$
The two vectors $\mathbf{e}_1 = (1, 0, \ldots, 0)$ and $i\mathbf{e}_1 = (i, 0, \ldots, 0)$ are linearly dependent when we consider $\mathbb{C}^n$ as a complex vector space. However, they are linearly independent if we consider $\mathbb{C}^n$ as a real vector space. In general,
$$\{\mathbf{e}_1, i\mathbf{e}_1, \mathbf{e}_2, i\mathbf{e}_2, \ldots, \mathbf{e}_n, i\mathbf{e}_n\}$$
forms a basis for $\mathbb{C}^n$ considered as a real vector space. In this way, $\mathbb{C}^n$ is naturally identified with the $2n$-dimensional real vector space $\mathbb{R}^{2n}$. That is, $\dim \mathbb{C}^n = n$ when $\mathbb{C}^n$ is considered as a complex vector space, but $\dim \mathbb{C}^n = 2n$ when $\mathbb{C}^n$ is considered as a real vector space.
Note that when $\mathbb{C}^n$ is considered as a $2n$-dimensional real vector space, the space $\mathbb{R}^n = \{(x_1, x_2, \ldots, x_n) : x_i \in \mathbb{R}\}$ is a subspace of $\mathbb{C}^n$, but it is not a subspace when $\mathbb{C}^n$ is considered as an $n$-dimensional complex vector space.
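The identification of $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ can be made explicit in a short sketch. Assume the (illustrative) map `realify` sends $(z_1, \ldots, z_n)$ to $(\mathrm{Re}\,z_1, \mathrm{Im}\,z_1, \ldots, \mathrm{Re}\,z_n, \mathrm{Im}\,z_n)$. Under this map the $2n$ vectors $\mathbf{e}_1, i\mathbf{e}_1, \ldots, \mathbf{e}_n, i\mathbf{e}_n$ become exactly the standard basis of $\mathbb{R}^{2n}$, which is why they are independent over $\mathbb{R}$.

```python
def realify(z):
    """Identify z in C^n with (Re z_1, Im z_1, ..., Re z_n, Im z_n) in R^{2n}."""
    out = []
    for c in z:
        out.extend([c.real, c.imag])
    return out


def standard_real_basis(n):
    """The 2n vectors e_1, i e_1, ..., e_n, i e_n of C^n."""
    basis = []
    for k in range(n):
        e = [0j] * n
        e[k] = 1 + 0j
        basis.append(e)
        basis.append([1j * c for c in e])
    return basis


n = 3
rows = [realify(v) for v in standard_real_basis(n)]
# The realified vectors are the standard basis of R^{2n}, hence independent over R.
print(rows == [[float(i == j) for j in range(2 * n)] for i in range(2 * n)])  # True
```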
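Example 7.2 Consider the complex vector space $\mathbb{C}^3$ with the Euclidean inner product. Apply the Gram-Schmidt orthogonalization to convert the basis $\mathbf{u}_1 = (i, i, i)$, $\mathbf{u}_2 = (0, i, i)$, $\mathbf{u}_3 = (0, 0, i)$ into an orthonormal basis.
Solution: Step 1. Set
$$\mathbf{v}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} = \frac{(i, i, i)}{\sqrt{3}} = \left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right).$$
Step 2. Let $W_1$ denote the subspace spanned by $\mathbf{v}_1$. Then
$$\mathrm{Proj}_{W_1}\mathbf{u}_2 = \langle \mathbf{v}_1, \mathbf{u}_2 \rangle\mathbf{v}_1 = \frac{2}{\sqrt{3}}\left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right) = \left(\frac{2i}{3}, \frac{2i}{3}, \frac{2i}{3}\right).$$
Therefore,
$$\mathbf{v}_2 = \frac{\mathbf{u}_2 - \mathrm{Proj}_{W_1}\mathbf{u}_2}{\|\mathbf{u}_2 - \mathrm{Proj}_{W_1}\mathbf{u}_2\|} = \frac{3}{\sqrt{6}}\left(-\frac{2i}{3}, \frac{i}{3}, \frac{i}{3}\right) = \left(-\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}}\right).$$
Step 3. Let $W_2$ denote the subspace spanned by $\{\mathbf{v}_1, \mathbf{v}_2\}$. Then
$$\begin{aligned}
\mathbf{u}_3 - \mathrm{Proj}_{W_2}\mathbf{u}_3 &= \mathbf{u}_3 - \langle \mathbf{v}_1, \mathbf{u}_3 \rangle\mathbf{v}_1 - \langle \mathbf{v}_2, \mathbf{u}_3 \rangle\mathbf{v}_2 \\
&= (0, 0, i) - \frac{1}{\sqrt{3}}\left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right) - \frac{1}{\sqrt{6}}\left(-\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}}\right) \\
&= \left(0, -\frac{i}{2}, \frac{i}{2}\right).
\end{aligned}$$
Therefore,
$$\mathbf{v}_3 = \frac{\mathbf{u}_3 - \mathrm{Proj}_{W_2}\mathbf{u}_3}{\|\mathbf{u}_3 - \mathrm{Proj}_{W_2}\mathbf{u}_3\|} = \sqrt{2}\left(0, -\frac{i}{2}, \frac{i}{2}\right) = \left(0, -\frac{i}{\sqrt{2}}, \frac{i}{\sqrt{2}}\right).$$
Thus,
$$\mathbf{v}_1 = \left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right), \quad \mathbf{v}_2 = \left(-\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}}\right), \quad \mathbf{v}_3 = \left(0, -\frac{i}{\sqrt{2}}, \frac{i}{\sqrt{2}}\right)$$
form an orthonormal basis for $\mathbb{C}^3$. □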
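The computation of Example 7.2 can be reproduced with a small, illustrative Gram-Schmidt routine on tuples of Python `complex` numbers. The one point specific to the complex case is the projection coefficient: it must be $\langle \mathbf{q}, \mathbf{u} \rangle = \bar{\mathbf{q}}^T\mathbf{u}$, with the conjugate on the already-normalized vector $\mathbf{q}$, matching the convention of Definition 7.1.

```python
import math


def inner(u, v):
    """<u, v> = conj(u)^T v, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt for linearly independent vectors in C^n."""
    basis = []
    for u in vectors:
        w = list(u)
        for q in basis:
            c = inner(q, u)                    # coefficient of the projection onto q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        length = math.sqrt(inner(w, w).real)
        basis.append([wi / length for wi in w])
    return basis


v1, v2, v3 = gram_schmidt([(1j, 1j, 1j), (0j, 1j, 1j), (0j, 0j, 1j)])
# v3 is approximately (0, -i/sqrt(2), i/sqrt(2)), as found in Example 7.2.
```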


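Example 7.3 Let $C_{\mathbb{C}}[0, 2\pi]$ be the complex vector space with the inner product given in Example 7.1, and let $W$ be the set of vectors in $C_{\mathbb{C}}[0, 2\pi]$ of the form
$$e^{ikx} = \cos kx + i\sin kx,$$
where $k$ is an integer. The set $W$ is orthogonal. In fact, if
$$g_k(x) = e^{ikx} \quad \text{and} \quad g_\ell(x) = e^{i\ell x}$$
are vectors in $W$, then
$$\begin{aligned}
\langle g_k, g_\ell \rangle &= \int_0^{2\pi} \overline{e^{ikx}}\,e^{i\ell x}\,dx = \int_0^{2\pi} e^{-ikx}e^{i\ell x}\,dx = \int_0^{2\pi} e^{i(\ell - k)x}\,dx \\
&= \int_0^{2\pi} \cos(\ell - k)x\,dx + i\int_0^{2\pi} \sin(\ell - k)x\,dx \\
&= \begin{cases} \left[\frac{1}{\ell - k}\sin(\ell - k)x\right]_0^{2\pi} + i\left[-\frac{1}{\ell - k}\cos(\ell - k)x\right]_0^{2\pi}, & \text{if } k \ne \ell, \\ \int_0^{2\pi} dx, & \text{if } k = \ell \end{cases} \\
&= \begin{cases} 0, & \text{if } k \ne \ell, \\ 2\pi, & \text{if } k = \ell. \end{cases}
\end{aligned}$$
Thus, the vectors in $W$ are orthogonal and each vector has length $\sqrt{2\pi}$. By normalizing each vector in the orthogonal set $W$, we obtain an orthonormal set. Therefore, the vectors
$$f_k(x) = \frac{1}{\sqrt{2\pi}}e^{ikx}, \quad k = 0, \pm 1, \pm 2, \ldots,$$
form an orthonormal set in the complex vector space $C_{\mathbb{C}}[0, 2\pi]$. □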
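The orthogonality relations of Example 7.3 can be checked numerically. The sketch approximates $\langle g_k, g_\ell \rangle$ by a left Riemann sum with equally spaced sample points, an approximation chosen for this illustration only. For integer frequencies with $|\ell - k|$ smaller than the number of samples, such a sum of equally spaced points on the unit circle is exact up to rounding. The function name and sample count are hypothetical.

```python
import cmath
import math


def inner_on_circle(k, l, samples=64):
    """Approximate <g_k, g_l> = integral over [0, 2 pi] of conj(e^{ikx}) e^{ilx} dx
    by a left Riemann sum with `samples` equally spaced points."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        x = j * h
        total += cmath.exp(1j * k * x).conjugate() * cmath.exp(1j * l * x)
    return total * h


print(abs(inner_on_circle(2, 2) - 2 * math.pi) < 1e-9)   # True: ||g_2||^2 = 2 pi
print(abs(inner_on_circle(1, 3)) < 1e-9)                 # True: g_1 and g_3 are orthogonal
```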

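Problem 7.2 Prove that in a complex inner product space $V$,
(1) (Cauchy-Schwarz inequality) $|\langle \mathbf{x}, \mathbf{y} \rangle|^2 \le \langle \mathbf{x}, \mathbf{x} \rangle\langle \mathbf{y}, \mathbf{y} \rangle$,
(2) (Triangle inequality) $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$,
(3) (Pythagorean theorem) $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$ if $\mathbf{x}$ and $\mathbf{y}$ are orthogonal.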

The definitions of eigenvalues and eigenvectors in a complex vector space are the same as in the real case, but the eigenvalues can now be complex numbers. Hence for any $n \times n$ (real or complex) matrix $A$, the characteristic polynomial $\det(\lambda I - A)$ always has exactly $n$ complex roots (i.e., eigenvalues), counting multiplicities.
For example, consider the rotation matrix
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
with real entries. This matrix has two complex eigenvalues $\cos\theta \pm i\sin\theta$ for any $\theta \in \mathbb{R}$, but no real eigenvalues unless $\theta = k\pi$ for an integer $k$.
Therefore, the theorems in Chapter 6 regarding eigenvalues and eigenvectors remain true without requiring the existence of $n$ eigenvalues explicitly, and exactly the same proofs as in the real case are valid, since the arguments in the proofs do not depend on what the scalars are. For example, one can have theorems like "for an $n \times n$ matrix $A$, the eigenvectors belonging to distinct eigenvalues are linearly independent" and "if the $n$ eigenvalues of $A$ are distinct, then the eigenvectors belonging to them form a basis for $\mathbb{C}^n$, so that $A$ is diagonalizable".
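An $n \times n$ real matrix $A$ can be considered as a linear transformation on both $\mathbb{R}^n$ and $\mathbb{C}^n$:
$$T : \mathbb{R}^n \to \mathbb{R}^n \text{ defined by } T(\mathbf{x}) = A\mathbf{x},$$
$$S : \mathbb{C}^n \to \mathbb{C}^n \text{ defined by } S(\mathbf{x}) = A\mathbf{x}.$$
Since the entries are all real, the coefficients of the characteristic polynomial of $A$, $p(\lambda) = \det(\lambda I - A)$, are all real. Thus, if $\lambda$ is a root of $p(\lambda) = 0$, then its conjugate $\bar{\lambda}$ is also a root, because $p(\bar{\lambda}) = \overline{p(\lambda)} = 0$. In particular, any $n \times n$ real matrix $A$ has at least one real eigenvalue if $n$ is odd, since its nonreal eigenvalues come in conjugate pairs.
Moreover, if $\mathbf{x}$ is an eigenvector belonging to the complex eigenvalue $\lambda$, then the complex conjugate $\bar{\mathbf{x}}$ is an eigenvector belonging to $\bar{\lambda}$. In fact, if $A\mathbf{x} = \lambda\mathbf{x}$ with $\mathbf{x} \ne \mathbf{0}$, then
$$A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}},$$
where $\bar{\mathbf{x}}$ denotes the vector whose entries are the complex conjugates of the corresponding entries in $\mathbf{x}$.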
Using this fact, the following example shows that any $2 \times 2$ real matrix with no real eigenvalues is similar to a scalar multiple of a rotation.

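Example 7.4 Let $A$ be a $2 \times 2$ real matrix with no real eigenvalues. Then it has two complex eigenvalues $\lambda = a + ib$ and $\bar{\lambda} = a - ib$ with $a, b \in \mathbb{R}$ and $b \ne 0$. Denote their associated eigenvectors in $\mathbb{C}^2$ by $\mathbf{x} = \mathbf{u} + i\mathbf{v}$ and $\bar{\mathbf{x}} = \mathbf{u} - i\mathbf{v}$ with $\mathbf{u}, \mathbf{v} \in \mathbb{R}^2$, respectively. Clearly, we have
$$\mathbf{u} = \frac{1}{2}(\mathbf{x} + \bar{\mathbf{x}}), \quad \mathbf{v} = -\frac{i}{2}(\mathbf{x} - \bar{\mathbf{x}}), \quad a = \frac{1}{2}(\lambda + \bar{\lambda}), \quad b = -\frac{i}{2}(\lambda - \bar{\lambda}).$$
Since $\lambda \ne \bar{\lambda}$, by the same argument as in Theorem 6.7, $\mathbf{x}$ and $\bar{\mathbf{x}}$ are linearly independent in the complex vector space $\mathbb{C}^2$, and hence also when $\mathbb{C}^2$ is considered as a real vector space. This implies that $\mathbf{u}$ and $\mathbf{v}$ are linearly independent real vectors in $\mathbb{R}^2$ (see Problem 7.3 below), which is regarded as a subspace of the real vector space $\mathbb{C}^2$. Thus $\alpha = \{\mathbf{u}, \mathbf{v}\}$ is a basis for the real vector space $\mathbb{R}^2$, and we have
$$A\mathbf{u} = \frac{1}{2}(A\mathbf{x} + A\bar{\mathbf{x}}) = \frac{1}{2}(\lambda\mathbf{x} + \bar{\lambda}\bar{\mathbf{x}}) = \lambda\,\frac{\mathbf{u} + i\mathbf{v}}{2} + \bar{\lambda}\,\frac{\mathbf{u} - i\mathbf{v}}{2} = a\mathbf{u} - b\mathbf{v},$$
and similarly $A\mathbf{v} = b\mathbf{u} + a\mathbf{v}$. It follows that the matrix representation of the linear transformation $A : \mathbb{R}^2 \to \mathbb{R}^2$ with respect to the basis $\alpha$ is
$$[A]_\alpha = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}.$$
That is, any $2 \times 2$ real matrix that has no real eigenvalues is similar to a matrix of the above form. By setting $r = \sqrt{a^2 + b^2} > 0$, we get $a = r\cos\theta$ and $b = r\sin\theta$ for some $\theta \in \mathbb{R}$, so
$$[A]_\alpha = \begin{bmatrix} r\cos\theta & r\sin\theta \\ -r\sin\theta & r\cos\theta \end{bmatrix} = r\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
□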
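The construction of Example 7.4 can be traced in a short sketch. The function name `real_canonical_form` is hypothetical, and the eigenvector formula $\mathbf{x} = (q, \lambda - p)$ for $A = \begin{bmatrix} p & q \\ s & t \end{bmatrix}$ is an assumption of the sketch. It is valid whenever $q \ne 0$, which always holds when the eigenvalues are nonreal. The sketch splits $\mathbf{x} = \mathbf{u} + i\mathbf{v}$, forms $P = [\mathbf{u}\ \mathbf{v}]$, and checks that $P^{-1}AP = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ with $\lambda = a + ib$.

```python
import cmath


def real_canonical_form(a):
    """For a real 2 x 2 matrix with nonreal eigenvalues, return (P, B, lam) where
    lam = alpha + i beta is an eigenvalue with eigenvector u + i v, P = [u v],
    and B = P^{-1} a P should equal [[alpha, beta], [-beta, alpha]]."""
    (p, q), (s, t) = a
    tr, det = p + t, p * t - q * s
    disc = tr * tr - 4 * det
    if disc >= 0:
        raise ValueError("matrix has real eigenvalues")
    lam = (tr + cmath.sqrt(disc)) / 2             # one eigenvalue of the conjugate pair
    # x = (q, lam - p) solves (a - lam I) x = 0; q != 0 here, since q == 0
    # would make p and t real eigenvalues.
    x = (complex(q), lam - p)
    u = (x[0].real, x[1].real)
    v = (x[0].imag, x[1].imag)
    P = [[u[0], v[0]], [u[1], v[1]]]
    d = P[0][0] * P[1][1] - P[0][1] * P[1][0]       # nonzero: u, v are independent
    P_inv = [[P[1][1] / d, -P[0][1] / d], [-P[1][0] / d, P[0][0] / d]]
    AP = [[sum(a[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    B = [[sum(P_inv[i][k] * AP[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return P, B, lam


P, B, lam = real_canonical_form([[1, -1], [1, 1]])
# Here lam = 1 + i, and B is approximately [[1, 1], [-1, 1]].
```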
Problem 7.3 Let $\mathbf{x}$ and $\mathbf{y}$ be two vectors in a vector space $V$. Show that $\mathbf{x}$ and $\mathbf{y}$ are linearly independent if and only if $\mathbf{x} + \mathbf{y}$ and $\mathbf{x} - \mathbf{y}$ are linearly independent.
Problem 7.4 Find the eigenvalues and the eigenvectors of

(1) [ ~1. OIl


2 0 ,
0 -i
(2) [-1-iIi 2
i 0
l+il
0
1
.

Problem 7.5 Prove that an $n \times n$ complex matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors in the complex vector space $\mathbb{C}^n$.
7.2. HERMITIAN AND UNITARY MATRICES 259

7.2 Hermitian and unitary matrices


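Recall that the dot product of real vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is given by $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T\mathbf{y}$ in matrix notation. For complex vectors $\mathbf{u}, \mathbf{v} \in \mathbb{C}^n$, the inner product is defined by $\mathbf{u} \cdot \mathbf{v} = \bar{u}_1v_1 + \cdots + \bar{u}_nv_n = \bar{\mathbf{u}}^T\mathbf{v}$. That is, we need the conjugate transpose, not just the transpose.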

Definition 7.3 Let $A$ be a complex matrix. Then the matrix
$$A^H = \bar{A}^T,$$
the complex conjugate transpose of $A$, is called the adjoint of $A$.

Note that $\bar{A}$ is the matrix whose entries are the complex conjugates of the corresponding entries in $A$. Thus, $[a_{ij}]^H = [\bar{a}_{ji}]$. With this notation, the Euclidean inner product on $\mathbb{C}^n$ can be written as
$$\mathbf{u} \cdot \mathbf{v} = \bar{\mathbf{u}}^T\mathbf{v} = \mathbf{u}^H\mathbf{v}.$$

Problem 7.6 For any matrices $A$ and $B$ such that $AB$ is defined, show that $(AB)^H = B^HA^H$.

Problem 7.7 Prove that if $A$ is invertible, then so is $A^H$, and $(A^H)^{-1} = (A^{-1})^H$.


For complex matrices, the notions of symmetric, skew-symmetric, and orthogonal real matrices are replaced by those of Hermitian, skew-Hermitian, and unitary matrices, respectively.
Definition 7.4 A complex square matrix $A$ is said to be Hermitian (or self-adjoint) if $A^H = A$, or skew-Hermitian if $A^H = -A$.

Examples of Hermitian and skew-Hermitian matrices are

A =[ 4 -2i 4+i]
3 '
B= [ -1 i + i 1+i]
-i .

A Hermitian matrix with real entries is just a real symmetric matrix, and conversely, any real symmetric matrix is Hermitian.
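Like real matrices, any $m \times n$ (complex) matrix $A$ can be considered as a linear transformation from $\mathbb{C}^n$ to $\mathbb{C}^m$, and we note that
$$A\mathbf{x} \cdot \mathbf{y} = (A\mathbf{x})^H\mathbf{y} = \mathbf{x}^HA^H\mathbf{y} = \mathbf{x} \cdot A^H\mathbf{y}$$
for any $\mathbf{x} \in \mathbb{C}^n$ and $\mathbf{y} \in \mathbb{C}^m$. The following theorem lists some important properties of Hermitian matrices.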

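Theorem 7.1 Let $A$ be a Hermitian matrix.
(1) For any (complex) vector $\mathbf{x} \in \mathbb{C}^n$, $\mathbf{x}^HA\mathbf{x}$ is real.
(2) Every (complex) eigenvalue of $A$ is real. In particular, an $n \times n$ real symmetric matrix has precisely $n$ real eigenvalues, counting multiplicities.
(3) The eigenvectors of $A$ belonging to different eigenvalues are mutually orthogonal.

Proof: (1) Since $\mathbf{x}^HA\mathbf{x}$ is a $1 \times 1$ matrix, $\overline{\mathbf{x}^HA\mathbf{x}} = (\mathbf{x}^HA\mathbf{x})^H = \mathbf{x}^HA^H\mathbf{x} = \mathbf{x}^HA\mathbf{x}$.
(2) If $A\mathbf{x} = \lambda\mathbf{x}$ with $\mathbf{x} \ne \mathbf{0}$, then $\mathbf{x}^HA\mathbf{x} = \mathbf{x}^H\lambda\mathbf{x} = \lambda\mathbf{x}^H\mathbf{x} = \lambda\|\mathbf{x}\|^2$. The left side is real by (1), and $\|\mathbf{x}\|^2$ is real and positive because $\mathbf{x} \ne \mathbf{0}$. Therefore, $\lambda$ must be real.
(3) Let $\mathbf{x}$ and $\mathbf{y}$ be eigenvectors of $A$ belonging to eigenvalues $\lambda$ and $\mu$, respectively, with $\lambda \ne \mu$. Because $A = A^H$ and $\lambda$ is real, we get
$$\lambda(\mathbf{x} \cdot \mathbf{y}) = (\lambda\mathbf{x}) \cdot \mathbf{y} = A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A\mathbf{y} = \mu(\mathbf{x} \cdot \mathbf{y}).$$
Since $\lambda \ne \mu$, it follows that $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^H\mathbf{y} = 0$, i.e., $\mathbf{x}$ is orthogonal to $\mathbf{y}$. □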
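Parts (1) and (2) of Theorem 7.1 can be spot-checked numerically. The sketch below uses a hypothetical Hermitian matrix $A = \begin{bmatrix} 1 & 2i \\ -2i & -2 \end{bmatrix}$, whose characteristic polynomial is $\lambda^2 + \lambda - 6$. It checks that $\mathbf{x}^HA\mathbf{x}$ has zero imaginary part for sample vectors, and it computes the $2 \times 2$ eigenvalues by the quadratic formula, finding $2$ and $-3$. All function names are illustrative.

```python
import cmath


def adjoint(a):
    """Conjugate transpose A^H of a matrix stored as a list of rows."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def is_hermitian(a, tol=1e-12):
    ah = adjoint(a)
    n = len(a)
    return all(abs(a[i][j] - ah[i][j]) <= tol for i in range(n) for j in range(n))


def quadratic_form(a, x):
    """x^H A x for a square matrix a and a vector x."""
    n = len(x)
    return sum(x[i].conjugate() * a[i][j] * x[j] for i in range(n) for j in range(n))


def eigenvalues_2x2(a):
    """Both roots of lambda^2 - tr(A) lambda + det(A) = 0."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


A = [[1 + 0j, 2j], [-2j, -2 + 0j]]   # hypothetical Hermitian example
print(is_hermitian(A))                              # True
print(quadratic_form(A, [1 + 2j, -1j]).imag)        # 0.0 up to rounding
print(eigenvalues_2x2(A))                           # approximately 2 and -3
```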

In particular, eigenvectors belonging to different eigenvalues of a real symmetric matrix are orthogonal.

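Remark: For a square matrix $A = [a_{ij}]$ whose off-diagonal entries satisfy $a_{ji} = \bar{a}_{ij}$, condition (1) in Theorem 7.1 (i.e., $\mathbf{x}^HA\mathbf{x}$ is real for every complex vector $\mathbf{x} \in \mathbb{C}^n$) is equivalent to saying that the diagonal entries of $A$ are real:
$$\mathbf{x}^HA\mathbf{x} = \sum_{i,j} a_{ij}\bar{x}_ix_j = \sum_i a_{ii}|x_i|^2 + C + \bar{C},$$
where $C = \sum_{i < j} a_{ij}\bar{x}_ix_j$. Since $C + \bar{C} \in \mathbb{R}$, all $a_{ii} \in \mathbb{R}$ if and only if $\mathbf{x}^HA\mathbf{x} \in \mathbb{R}$ for every $\mathbf{x} \in \mathbb{C}^n$.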

Problem 7.8 Prove that the determinant of any Hermitian matrix is real.

Problem 7.9 Let $\mathbf{x}$ be a nonzero vector in the complex vector space $\mathbb{C}^n$, and $A = \mathbf{x}\mathbf{x}^H$. Show that $A$ is Hermitian, and find all the eigenvalues and their eigenspaces for $A$.

Note that if $A$ is Hermitian, then the matrix $iA$ is skew-Hermitian; similarly, if $A$ is skew-Hermitian, then $iA$ is Hermitian. Therefore, the following theorem is a direct consequence of these facts and Theorem 7.1. The proof is left as an exercise.

Theorem 7.2 Let $A$ be a skew-Hermitian matrix.
(1) For any complex vector $\mathbf{x} \ne \mathbf{0}$, $\mathbf{x}^HA\mathbf{x}$ is purely imaginary, and the diagonal entries of $A$ are purely imaginary.
(2) Every eigenvalue of $A$ is purely imaginary. In particular, an $n \times n$ real skew-symmetric matrix has $n$ purely imaginary eigenvalues, counting multiplicities.
(3) The eigenvectors of $A$ belonging to different eigenvalues are mutually orthogonal.

Problem 7.10 Prove Theorem 7.2 by using Theorem 7.1, and prove (3) directly.

Problem 7.11 Show that $A = B + iC$ ($B$ and $C$ real matrices) is skew-Hermitian if and only if $B$ is skew-symmetric and $C$ is symmetric.

Problem 7.12 Let $A$ and $B$ be either both Hermitian or both skew-Hermitian. Show that
(1) $AB$ is Hermitian if and only if $AB = BA$;
(2) $AB$ is skew-Hermitian if and only if $AB = -BA$.

Recall that a real matrix $Q$ is said to be orthogonal if the column vectors of $Q$ are orthonormal (i.e., $Q^TQ = I$). The same is true for complex matrices (see Lemma 5.13).

Lemma 7.3 For a complex square matrix $U$, the following are equivalent:
(1) the column vectors of $U$ are orthonormal;
(2) $U^HU = I$;
(3) $U^{-1} = U^H$;
(4) $UU^H = I$;
(5) the row vectors of $U$ are orthonormal.

The complex analogue to an orthogonal matrix is a unitary matrix.

Definition 7.5 A complex square matrix $U$ is said to be unitary if it satisfies any one (and hence all) of the conditions in Lemma 7.3.

Like a real orthogonal matrix, any unitary matrix preserves length.



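Theorem 7.4 Let $U$ be an $n \times n$ unitary matrix.
(1) $U$ preserves the inner product (and hence the length) on $\mathbb{C}^n$: for all vectors $\mathbf{x}$ and $\mathbf{y}$, $\langle U\mathbf{x}, U\mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle$ (so, in particular, $\|U\mathbf{x}\| = \|\mathbf{x}\|$).
(2) If $\lambda$ is an eigenvalue of $U$, then $|\lambda| = 1$.
(3) The eigenvectors of $U$ belonging to different eigenvalues are mutually orthogonal.

Proof: (1) $\langle U\mathbf{x}, U\mathbf{y} \rangle = \langle \mathbf{x}, U^HU\mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle$, and setting $\mathbf{x} = \mathbf{y}$ gives $\|U\mathbf{x}\|^2 = \|\mathbf{x}\|^2$.
(2) If $U\mathbf{x} = \lambda\mathbf{x}$ with $\mathbf{x} \ne \mathbf{0}$, then $\langle \mathbf{x}, \mathbf{x} \rangle = \langle U\mathbf{x}, U\mathbf{x} \rangle = \bar{\lambda}\lambda\langle \mathbf{x}, \mathbf{x} \rangle = |\lambda|^2\langle \mathbf{x}, \mathbf{x} \rangle$, so $|\lambda| = 1$.
(3) Let $U\mathbf{x} = \lambda\mathbf{x}$, $U\mathbf{y} = \mu\mathbf{y}$, and $\lambda \ne \mu$. Since $U$ is unitary, we have $\lambda\bar{\lambda} = 1 = \mu\bar{\mu}$, and $U^{-1}\mathbf{y} = \mu^{-1}\mathbf{y} = \bar{\mu}\mathbf{y}$. Therefore,
$$\bar{\lambda}\langle \mathbf{x}, \mathbf{y} \rangle = \langle U\mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{x}, U^H\mathbf{y} \rangle = \langle \mathbf{x}, U^{-1}\mathbf{y} \rangle = \bar{\mu}\langle \mathbf{x}, \mathbf{y} \rangle$$
holds, and $\lambda \ne \mu$ implies $\langle \mathbf{x}, \mathbf{y} \rangle = 0$. □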
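Theorem 7.4 can be illustrated on a hypothetical unitary matrix $U = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$. The sketch checks $U^HU = I$ up to rounding and that $\|U\mathbf{x}\| = \|\mathbf{x}\|$ for a sample vector. It also checks that both eigenvalues $(1 \pm i)/\sqrt{2}$ have modulus 1. The helper names are illustrative, not from the text.

```python
import cmath
import math


def adjoint(a):
    """Conjugate transpose A^H of a matrix stored as a list of rows."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def mat_mul(a, b):
    """Matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def is_unitary(u, tol=1e-12):
    """Check U^H U = I entrywise up to tol."""
    p = mat_mul(adjoint(u), u)
    n = len(u)
    return all(abs(p[i][j] - (1 if i == j else 0)) <= tol for i in range(n) for j in range(n))


def apply(a, x):
    """Matrix-vector product A x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def norm(x):
    """Euclidean length sqrt(|x_1|^2 + ... + |x_n|^2)."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))


def eigenvalues_2x2(a):
    """Both roots of lambda^2 - tr(A) lambda + det(A) = 0."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]   # hypothetical unitary example
x = [3 - 1j, 2 + 2j]
print(is_unitary(U))                                  # True
print(abs(norm(apply(U, x)) - norm(x)) < 1e-12)       # True: ||Ux|| = ||x||
print([abs(lam) for lam in eigenvalues_2x2(U)])       # both moduli are 1 up to rounding
```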

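Theorem 7.5 A transition matrix from one orthonormal basis to another in a complex inner product space is unitary.

Proof: Let $\alpha = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ and $\beta = \{\mathbf{w}_1, \ldots, \mathbf{w}_n\}$ be two orthonormal bases, and let $Q = [q_{ij}]$ be the transition matrix from the basis $\beta$ to the basis $\alpha$. By definition,
$$\mathbf{w}_j = \sum_{\ell=1}^{n} q_{\ell j}\mathbf{v}_\ell.$$
Thus,
$$\delta_{ij} = \langle \mathbf{w}_i, \mathbf{w}_j \rangle = \sum_{k=1}^{n}\bar{q}_{ki}\sum_{\ell=1}^{n} q_{\ell j}\langle \mathbf{v}_k, \mathbf{v}_\ell \rangle = \sum_{k=1}^{n}\bar{q}_{ki}q_{kj} = \sum_{k=1}^{n}[Q^H]_{ik}[Q]_{kj}.$$
This means that $Q^HQ = I$, i.e., the columns of $Q$ are orthonormal and $Q$ is unitary. □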
7.3. UNITARILY DIAGONALIZABLE MATRICES 263

As in the real case, two matrices representing the same linear transformation on a complex vector space with respect to different bases are similar. If the two bases are both orthonormal, then the transition matrix is unitary (orthogonal in the real case).
Problem 7.13 Show that $|\det U| = 1$ for any unitary matrix $U$.

Problem 7.14 Show that

_ [ 1 ;i 1+i

-':+ 1
A- 1 -2.
4 i
is unitary but neither Hermitian nor skew-Hermitian.

Problem 7.15 Show that if $U$ is a unitary matrix, then so is $U^H$.

Problem 7.16 Show that if A and B are unitary, so is AB.

Problem 7.17 Describe all 3 x 3 matrices that are simultaneously Hermitian, unitary,
and diagonal. How many are there?

7.3 Unitarily diagonalizable matrices


In the previous section, we saw that if an $n \times n$ square matrix $A$ is Hermitian, skew-Hermitian or unitary, then the eigenvectors belonging to distinct eigenvalues are mutually orthogonal. Hence, if such a matrix $A$ has $n$ distinct eigenvalues, then there exists an orthonormal basis $\alpha$ for $\mathbb{C}^n$ consisting of eigenvectors of $A$, so that the matrix representation $[A]_\alpha$ is diagonal, i.e., $A$ is diagonalizable by a unitary matrix. In this section, it will be shown that any Hermitian, skew-Hermitian or unitary matrix has $n$ orthonormal eigenvectors even if its eigenvalues are not all distinct. In particular, such a matrix is always diagonalizable by a unitary matrix.

Definition 7.6 (1) Two real matrices $A$ and $B$ are orthogonally similar if there exists an orthogonal matrix $P$ such that $P^{-1}AP = B$. A matrix is orthogonally diagonalizable if it is orthogonally similar to a diagonal matrix.
(2) Two complex matrices $A$ and $B$ are unitarily similar if there exists a unitary matrix $U$ such that $U^{-1}AU = B$. A matrix is unitarily diagonalizable if it is unitarily similar to a diagonal matrix.

We begin with a classical theorem due to Schur (1909) concerning orthogonal and unitary similarity.

Lemma 7.6 (Schur's lemma) (1) If an $n \times n$ real matrix $A$ has only real eigenvalues, then $A$ is orthogonally similar to an upper triangular matrix.
(2) Every $n \times n$ complex matrix is unitarily similar to an upper triangular matrix.

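Proof: We will prove the second assertion only; the proof of the first is similar. We proceed by mathematical induction on $n$. Clearly, the assertion is true for $n = 1$. Assume now that it holds for $n = r - 1$. Let $A$ be an $r \times r$ matrix and let $\lambda_1$ be an eigenvalue of $A$ with a normalized eigenvector $\mathbf{x}$. Extend $\{\mathbf{x}\}$ to an orthonormal basis $\{\mathbf{x}, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ of $\mathbb{C}^r$ by the Gram-Schmidt orthogonalization, and let $U_1 = [\mathbf{x}\ \mathbf{u}_2\ \cdots\ \mathbf{u}_r]$ be the unitary matrix with these basis vectors as its columns. A direct computation of the product $U_1^{-1}AU_1$ shows
$$U_1^{-1}AU_1 = U_1^HAU_1 = U_1^H[A\mathbf{x}\ A\mathbf{u}_2\ \cdots\ A\mathbf{u}_r] = \begin{bmatrix} \mathbf{x}^H \\ \mathbf{u}_2^H \\ \vdots \\ \mathbf{u}_r^H \end{bmatrix}[\lambda_1\mathbf{x}\ A\mathbf{u}_2\ \cdots\ A\mathbf{u}_r] = \begin{bmatrix} \lambda_1 & * \\ \mathbf{0} & B \end{bmatrix},$$
where $B$ is an $(r-1) \times (r-1)$ matrix. By the inductive hypothesis, there exists an $(r-1) \times (r-1)$ unitary matrix $U_2$ such that $U_2^{-1}BU_2$ is an upper triangular matrix with diagonal entries $\lambda_2, \lambda_3, \ldots, \lambda_r$. Define
$$U = U_1\begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & U_2 \end{bmatrix}.$$
Then it is easy to check that $U$ is also a unitary matrix, so
$$U^{-1}AU = \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & U_2^{-1} \end{bmatrix}\begin{bmatrix} \lambda_1 & * \\ \mathbf{0} & B \end{bmatrix}\begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & U_2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & * \\ \mathbf{0} & U_2^{-1}BU_2 \end{bmatrix}$$
is an upper triangular matrix with diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_r$. □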

Schur's lemma is a cornerstone in the study of complex matrices.

Theorem 7.7 If $A$ is a Hermitian, a skew-Hermitian or a unitary matrix, then it is unitarily diagonalizable.

Proof: By Schur's lemma, $U^HAU = B$ is an upper triangular matrix for some unitary matrix $U$. However,
$$B^H = U^HA^HU = \begin{cases} U^H(\pm A)U = \pm B & \text{if } A \text{ is (skew-)Hermitian}, \\ U^{-1}A^{-1}U = B^{-1} & \text{if } A \text{ is unitary}. \end{cases}$$
This means that $B$ is Hermitian, skew-Hermitian or unitary according as $A$ is Hermitian, skew-Hermitian or unitary, respectively.
It is quite easy to show that an upper triangular matrix that is also Hermitian or skew-Hermitian must already be a diagonal matrix. Moreover, if an upper triangular matrix $B$ is unitary, then one can easily show that $B$ is already a diagonal matrix by comparing the diagonal entries of both sides of $B^HB = BB^H$. □

Note that in the similarity relation $U^{-1}AU\ (= U^HAU) = D$ of $A$ to a diagonal matrix $D$ through a unitary matrix $U$, the equation $AU = UD$ shows that the column vectors of $U$ constitute a set of $n$ orthonormal eigenvectors of $A$, while the diagonal entries of $D$ are the eigenvalues of $A$, as shown in Theorem 6.6. Therefore, by Theorems 7.1, 7.2 and 7.4, all the diagonal entries of $D$ are real, purely imaginary or of unit modulus, depending on whether the matrix $A$ is Hermitian, skew-Hermitian or unitary, respectively.

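Example 7.5 Let
$$A = \begin{bmatrix} 2 & 1-i \\ 1+i & 1 \end{bmatrix}.$$
Then $A$ is Hermitian, and the eigenvalues of $A$ are $\lambda_1 = 3$ and $\lambda_2 = 0$ with associated eigenvectors $\mathbf{x}_1 = (1-i, 1)$ and $\mathbf{x}_2 = (-1, 1+i)$. Let
$$\mathbf{u}_1 = \frac{\mathbf{x}_1}{\|\mathbf{x}_1\|} = \frac{1}{\sqrt{3}}(1-i, 1), \qquad \mathbf{u}_2 = \frac{\mathbf{x}_2}{\|\mathbf{x}_2\|} = \frac{1}{\sqrt{3}}(-1, 1+i),$$
and let
$$U = \frac{1}{\sqrt{3}}\begin{bmatrix} 1-i & -1 \\ 1 & 1+i \end{bmatrix}.$$
Then $U$ is a unitary matrix and diagonalizes $A$:
$$U^HAU = \frac{1}{3}\begin{bmatrix} 1+i & 1 \\ -1 & 1-i \end{bmatrix}\begin{bmatrix} 2 & 1-i \\ 1+i & 1 \end{bmatrix}\begin{bmatrix} 1-i & -1 \\ 1 & 1+i \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 0 \end{bmatrix}.$$
□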
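The arithmetic of Example 7.5 can be confirmed with plain Python complex numbers. The sketch below builds $U$ from the normalized eigenvectors and checks, up to floating-point rounding, that $U^HU = I$ and $U^HAU = \mathrm{diag}(3, 0)$. The helper names `adjoint` and `mat_mul` are illustrative.

```python
import math


def adjoint(a):
    """Conjugate transpose A^H of a matrix stored as a list of rows."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def mat_mul(a, b):
    """Matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[2, 1 - 1j], [1 + 1j, 1]]
c = 1 / math.sqrt(3)
U = [[c * (1 - 1j), -c], [c, c * (1 + 1j)]]   # columns u_1, u_2 from Example 7.5

UhU = mat_mul(adjoint(U), U)                   # approximately the identity
D = mat_mul(mat_mul(adjoint(U), A), U)         # approximately [[3, 0], [0, 0]]
```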

Since all real symmetric matrices are Hermitian, they are unitarily diagonalizable by Theorem 7.7. However, the following theorem says more than that.

Theorem 7.8 Let $A$ be an $n \times n$ real matrix. Then the following are equivalent.
(1) $A$ is symmetric.
(2) $A$ is orthogonally diagonalizable.
(3) $A$ has a full set of $n$ orthonormal eigenvectors.

Proof: (1) $\Leftrightarrow$ (2): If $A$ is real and symmetric, then it is a Hermitian matrix, so it has only real eigenvalues. By Schur's lemma 7.6, $A$ is orthogonally similar to an upper triangular matrix, which is again symmetric and hence must already be diagonal. Hence $A$ is orthogonally diagonalizable. Conversely, if $A$ is orthogonally diagonalizable, i.e., there is an orthogonal matrix $Q$ such that $Q^{-1}AQ = D$, then $A = QDQ^{-1} = QDQ^T$ is clearly symmetric.
(2) $\Leftrightarrow$ (3): If $A$ is diagonalized by an orthogonal matrix $Q$, then the column vectors of $Q$ are eigenvectors of $A$. Hence $A$ has a full set of $n$ orthonormal eigenvectors. Conversely, if $A$ has a full set of $n$ orthonormal eigenvectors, then these eigenvectors form an orthogonal transition matrix $Q$ that diagonalizes $A$. □

Corollary 7.9 If $A$ is a real symmetric matrix, then the dimension of the eigenspace $E(\lambda) = N(\lambda I - A)$ belonging to an eigenvalue $\lambda$ is equal to the multiplicity of $\lambda$ as a root of the characteristic polynomial.

Therefore, if a real matrix $A$ is symmetric, then it is always diagonalizable, and even orthogonally diagonalizable. Moreover, the symmetric matrices are the only real matrices that can be orthogonally diagonalized. Although not all matrices are diagonalizable, certain nonsymmetric matrices may still have a full set of linearly independent eigenvectors, so that they are diagonalizable; but in this case the eigenvectors cannot be chosen to be orthonormal, that is, the transition matrix $Q$ cannot be an orthogonal matrix. For example, one can verify that the matrix A = [~ ~] is nonsymmetric and has two linearly independent eigenvectors, so that it is diagonalizable, but it is not orthogonally diagonalizable.

Problem 7.18 Show that the nonsymmetric matrix

is diagonalizable.

Remark: The procedure for orthogonal diagonalization of a symmetric matrix $A$ can be summarized as follows.
Step 1. Find a basis for each eigenspace of $A$.
Step 2. Apply the Gram-Schmidt orthogonalization to each of these bases to obtain an orthonormal basis for each eigenspace.
Step 3. Form the matrix $Q$ whose columns are the basis vectors constructed in Step 2; this matrix orthogonally diagonalizes $A$.
The justification of this procedure should be clear: eigenvectors belonging to distinct eigenvalues are orthogonal, while the Gram-Schmidt orthogonalization ensures that the eigenvectors obtained within the same eigenspace are orthonormal. Thus, the entire set of eigenvectors obtained by this procedure is orthonormal.

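Example 7.6 Find an orthogonal matrix $Q$ that diagonalizes
$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}.$$
Solution: The characteristic polynomial of $A$ is
$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - 4 & -2 & -2 \\ -2 & \lambda - 4 & -2 \\ -2 & -2 & \lambda - 4 \end{bmatrix} = (\lambda - 2)^2(\lambda - 8).$$
Thus, the eigenvalues of $A$ are $\lambda = 2$ and $\lambda = 8$. By the method used in Example 6.2, it can be shown that
$$\mathbf{u}_1 = (-1, 1, 0) \quad \text{and} \quad \mathbf{u}_2 = (-1, 0, 1)$$
form a basis for the eigenspace belonging to $\lambda = 2$. Applying the Gram-Schmidt orthogonalization to $\{\mathbf{u}_1, \mathbf{u}_2\}$ yields the following orthonormal eigenvectors (verify):
$$\mathbf{v}_1 = \frac{1}{\sqrt{2}}(-1, 1, 0) \quad \text{and} \quad \mathbf{v}_2 = \frac{1}{\sqrt{6}}(-1, -1, 2).$$
The eigenspace belonging to $\lambda = 8$ has $\mathbf{u}_3 = (1, 1, 1)$ as a basis. The normalization of $\mathbf{u}_3$ yields $\mathbf{v}_3 = \frac{1}{\sqrt{3}}(1, 1, 1)$. Finally, using $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ as column vectors, we obtain
$$Q = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix},$$
which orthogonally diagonalizes $A$. (The reader is encouraged to verify that $Q^TAQ$ is actually a diagonal matrix.) □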
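Steps 2 and 3 of the procedure in the Remark can be sketched in Python for the matrix of Example 7.6. Step 1 is not automated: the eigenspace bases are typed in from the text, which is an assumption of this sketch. Each basis is orthonormalized separately, and the resulting vectors become the columns of $Q$. The check that $Q^TAQ$ is $\mathrm{diag}(2, 2, 8)$ is the verification suggested in the example.

```python
import math


def dot(u, v):
    """Real dot product."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalize a basis of one eigenspace (Step 2 of the procedure)."""
    basis = []
    for u in vectors:
        w = list(u)
        for q in basis:
            c = dot(q, u)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis


A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
# Step 1 (done by hand in Example 7.6): bases for the eigenspaces of 2 and 8.
eigenspaces = {2: [(-1, 1, 0), (-1, 0, 1)], 8: [(1, 1, 1)]}

# Steps 2 and 3: orthonormalize each basis and use the vectors as columns of Q.
columns = []
for lam in sorted(eigenspaces):
    columns.extend(gram_schmidt(eigenspaces[lam]))
Q = [[col[i] for col in columns] for i in range(3)]

# (Q^T A Q)_{ij} = sum_{k,l} Q_{ki} A_{kl} Q_{lj}; expected diag(2, 2, 8).
QT_A_Q = [[sum(Q[k][i] * A[k][l] * Q[l][j] for k in range(3) for l in range(3))
           for j in range(3)] for i in range(3)]
```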

7.4 Normal matrices


We have seen that Hermitian, skew-Hermitian and unitary matrices are all unitarily diagonalizable. However, it turns out that they do not constitute the entire class of unitarily diagonalizable matrices, whereas in the class of real matrices, the real symmetric matrices are the only ones that are orthogonally diagonalizable. That is, there are many unitarily diagonalizable matrices that belong to none of the above-mentioned classes. Actually, the unitarily diagonalizable matrices are exactly the matrices of the following class, called normal matrices.
7.4. NORMAL MATRICES 269

Definition 7.7 A complex square matrix $A$ is called normal if
$$AA^H = A^HA.$$

Note that all the Hermitian, unitary and skew-Hermitian matrices are
normal. There are matrices that are normal but are none of these.
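A quick check of Definition 7.7 can be sketched in Python. The sample matrix $\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$, which is $\sqrt{2}$ times a rotation by $45°$, is a hypothetical example chosen for this illustration. It satisfies $AA^H = A^HA = 2I$, so it is normal. It is not Hermitian, not skew-Hermitian, and, since $A^HA = 2I \ne I$, not unitary either. The predicate names are illustrative.

```python
def adjoint(a):
    """Conjugate transpose A^H of a matrix stored as a list of rows."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def mat_mul(a, b):
    """Matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices of the same size."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_normal(a):
    ah = adjoint(a)
    return close(mat_mul(a, ah), mat_mul(ah, a))


def is_hermitian(a):
    return close(a, adjoint(a))


def is_skew_hermitian(a):
    return close(a, [[-x for x in row] for row in adjoint(a)])


def is_unitary(a):
    n = len(a)
    eye = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return close(mat_mul(adjoint(a), a), eye)


A = [[1, -1], [1, 1]]   # sqrt(2) times a rotation by 45 degrees
print(is_normal(A), is_hermitian(A), is_skew_hermitian(A), is_unitary(A))
# True False False False
```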

Example 7.7 The 2 x 2 matrix

is normal, but is neither Hermitian, skew-Hermitian, unitary, nor a diagonal matrix. However, one can easily check that this matrix is unitarily diagonalizable. In fact,

U- 1 AU = ~2 [ 1
1
=D.
o
Problem 7.19 Which of the following matrices are Hermitian, skew-Hermitian or normal?

[: :j,

(1) [ 1 ~i 1 + i ] . (2)
1
1 (3) [ -1i 1 -i 1

n
2i ~ ;
o ' 1 -7 0 -7

(4) [-~ ~]; (5) [~


0
(6) [ 1 ~i l+i
13i
. 1
0 -7 3 1

As a matter of fact, the theorem below shows that the normal matrices are exactly the unitarily diagonalizable matrices. We begin with a lemma.

Lemma 7.10 If an upper triangular matrix $T = [t_{ij}]$ is normal, then it must be a diagonal matrix.

Proof: We compare the corresponding diagonal entries of both sides of the equation $TT^H = T^HT$. Since $T$ is upper triangular, the $(1, 1)$ entries give
$$|t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2 = (TT^H)_{11} = (T^HT)_{11} = |t_{11}|^2,$$
which implies $t_{12} = \cdots = t_{1n} = 0$. Inductively, assume that $t_{i,i+1} = \cdots = t_{in} = 0$ has been shown for $i = 1, \ldots, k-1$. Then
$$(TT^H)_{kk} = |t_{kk}|^2 + |t_{k,k+1}|^2 + \cdots + |t_{kn}|^2$$
and
$$(T^HT)_{kk} = |t_{1k}|^2 + \cdots + |t_{k-1,k}|^2 + |t_{kk}|^2 = |t_{kk}|^2,$$
because $t_{1k} = \cdots = t_{k-1,k} = 0$ by the induction hypothesis. But $TT^H = T^HT$ yields $t_{k,k+1} = \cdots = t_{kn} = 0$. Thus we get $t_{i,i+1} = \cdots = t_{in} = 0$ for all $i = 1, \ldots, n$, which shows that all the entries of $T$ off the diagonal are zero, i.e., $T$ is a diagonal matrix. □

Theorem 7.11 If $A$ is a complex square matrix, then the following are equivalent:
(1) $A$ is normal;
(2) $A$ is unitarily diagonalizable;
(3) $A$ has a full set of $n$ orthonormal eigenvectors.

Proof: (1) $\Leftrightarrow$ (2): Suppose that $A$ is normal. By Schur's lemma again, we can find a unitary matrix $U$ so that $T = U^HAU$ is an upper triangular matrix. Then $T$ is also normal, since
$$TT^H = U^HAUU^HA^HU = U^HAA^HU = U^HA^HAU = U^HA^HUU^HAU = T^HT.$$
Thus by Lemma 7.10, $T$ is already diagonal, so that $A$ is unitarily diagonalizable. Conversely, if $A$ is unitarily diagonalizable, i.e., $U^HAU = D$ for a unitary matrix $U$, then
$$AA^H = UDU^HUD^HU^H = UDD^HU^H = UD^HDU^H = UD^HU^HUDU^H = A^HA,$$
because diagonal matrices commute. That is, $A$ is normal.
The proof of (2) $\Leftrightarrow$ (3) is the same as in the proof of Theorem 7.8. □

Note that there are many nonnormal complex matrices that are still diagonalizable, but of course not unitarily. Readers are encouraged to find such an example.
7.5. THE SPECTRAL THEOREM 271

Recall that an $n \times n$ real matrix $A$ is the sum $S + T$ of a symmetric matrix $S = \frac{1}{2}(A + A^T)$ and a skew-symmetric matrix $T = \frac{1}{2}(A - A^T)$. The same can be said about a complex matrix. Let $A$ be a complex matrix. Then it is the sum $A = H_1 + iH_2$, where
$$H_1 = \frac{1}{2}(A + A^H), \qquad H_2 = -\frac{i}{2}(A - A^H) \quad \left(\text{or } iH_2 = \frac{1}{2}(A - A^H)\right).$$
Both $H_1$ and $H_2$ are Hermitian, so $iH_2$ is skew-Hermitian.
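The decomposition $A = H_1 + iH_2$ is easy to compute and check. In the sketch below, the function name `hermitian_parts` and the sample matrix are hypothetical. It forms $H_1 = \frac{1}{2}(A + A^H)$ and $H_2 = -\frac{i}{2}(A - A^H)$, and the accompanying checks confirm that both parts are Hermitian and that $H_1 + iH_2$ recovers $A$.

```python
def adjoint(a):
    """Conjugate transpose A^H of a matrix stored as a list of rows."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def hermitian_parts(a):
    """Return (H1, H2) with H1 = (A + A^H)/2 and H2 = -(i/2)(A - A^H), so A = H1 + i H2."""
    ah = adjoint(a)
    n = len(a)
    h1 = [[(a[i][j] + ah[i][j]) / 2 for j in range(n)] for i in range(n)]
    h2 = [[-0.5j * (a[i][j] - ah[i][j]) for j in range(n)] for i in range(n)]
    return h1, h2


A = [[1 + 2j, 3], [4j, 5 - 1j]]   # hypothetical sample matrix
H1, H2 = hermitian_parts(A)
```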

Problem 7.20 Determine whether or not the matrix

is unitarily diagonalizable. If it is, find a unitary matrix U that diagonalizes A.

Problem 7.21 For any Hermitian matrices $H_1$ and $H_2$ of the same size, show that $A = H_1 + iH_2$ is normal if and only if $H_1H_2 = H_2H_1$.

Problem 7.22 For any unitarily diagonalizable matrix $A$, prove that
(1) $A$ is Hermitian if and only if $A$ has only real eigenvalues;
(2) $A$ is skew-Hermitian if and only if $A$ has only purely imaginary eigenvalues;
(3) $A$ is unitary if and only if $|\lambda| = 1$ for every eigenvalue $\lambda$ of $A$.

7.5 The spectral theorem


As we saw in the previous section, normal matrices are exactly the matrices that are unitarily diagonalizable. That is, $A$ is normal if and only if there exists a basis $\alpha$ for $\mathbb{C}^n$ consisting of orthonormal eigenvectors of $A$ such that the matrix representation $[A]_\alpha$ of $A$ with respect to $\alpha$ is diagonal.

Theorem 7.12 (Spectral theorem) Let $A$ be a normal matrix, and let $\{u_1, u_2, \ldots, u_n\}$ be a set of orthonormal eigenvectors belonging to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$, respectively. Then $A$ can be written in the following form:
$$A = \lambda_1P_1 + \lambda_2P_2 + \cdots + \lambda_nP_n,$$
where $P_i = u_iu_i^H$ is the orthogonal projection matrix onto the subspace spanned by the eigenvector $u_i$ for each $i = 1, \ldots, n$.
272 CHAPTER 7. COMPLEX VECTOR SPACES

The above expression of $A$ is called the spectral decomposition of $A$ into projections.

Proof: Note that $U = [u_1\ u_2\ \cdots\ u_n]$ is the unitary matrix that transforms $A$ into a diagonal matrix $D$, i.e., $U^{-1}AU = U^HAU = D$. Then
$$A = UDU^H = [u_1\ u_2\ \cdots\ u_n]\begin{bmatrix}\lambda_1u_1^H\\ \vdots\\ \lambda_nu_n^H\end{bmatrix} = \lambda_1u_1u_1^H + \cdots + \lambda_nu_nu_n^H = \lambda_1P_1 + \cdots + \lambda_nP_n,$$
where $P_i = u_iu_i^H$ for each $i$, which is a Hermitian matrix.


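Now, for any $x \in \mathbb{C}^n$ and $i, j = 1, \ldots, n$,
$$P_ix = u_iu_i^Hx = \langle u_i, x\rangle u_i,$$
$$P_iP_j = u_iu_i^Hu_ju_j^H = \langle u_i, u_j\rangle u_iu_j^H = \begin{cases} u_iu_i^H = P_i & \text{if } i = j,\\ 0 & \text{if } i \ne j,\end{cases}$$
$$(P_1 + \cdots + P_n)x = P_1x + \cdots + P_nx = \sum_{i=1}^n\langle u_i, x\rangle u_i = x = \mathrm{Id}(x).$$
Therefore, each $P_i$ is nothing but the orthogonal projection onto the subspace spanned by $u_i$; when $\lambda_i$ is a simple eigenvalue, this subspace is the eigenspace $E(\lambda_i) = N(\lambda_iI - A)$. □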
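The identities in the proof can be mirrored numerically. The sketch below assumes NumPy and restricts to a Hermitian matrix, so that `numpy.linalg.eigh` returns an orthonormal set of eigenvectors directly; a general normal matrix would need a different routine (for example a complex Schur decomposition), which is not shown. The helper `spectral_projections` and the sample matrix are illustrative. It forms $P_i = u_iu_i^H$ as `numpy.outer(u, u.conj())` and checks $A = \sum_i\lambda_iP_i$, $P_iP_j = \delta_{ij}P_i$ and $\sum_i P_i = I$.

```python
import numpy as np

def spectral_projections(A):
    """For Hermitian A, return eigenvalues and the rank-one projections P_i = u_i u_i^H."""
    eigvals, U = np.linalg.eigh(A)   # orthonormal eigenvectors in the columns of U
    projections = [np.outer(U[:, i], U[:, i].conj()) for i in range(U.shape[1])]
    return eigvals, projections

A = np.array([[2.0, 1j],
              [-1j, 2.0]])           # Hermitian, eigenvalues 1 and 3
lams, Ps = spectral_projections(A)
n = A.shape[0]

print(np.allclose(sum(l * P for l, P in zip(lams, Ps)), A))   # A = sum of lambda_i P_i
print(np.allclose(sum(Ps), np.eye(n)))                          # P_1 + ... + P_n = I
print(all(np.allclose(Ps[i] @ Ps[j], Ps[i] if i == j else 0)
          for i in range(n) for j in range(n)))                  # P_i P_j = delta_ij P_i
```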

Note that the equation $P_1 + \cdots + P_n = \mathrm{Id}$ means that, if we regard the image of each $P_i$ as the span of $u_i$ (a one-dimensional subspace, i.e., a copy of $\mathbb{C}$), then $(P_1, \ldots, P_n)$ is another orthogonal coordinate expression, just like $(z_1, \ldots, z_n)$ of $\mathbb{C}^n$, but in this case with respect to the orthonormal basis $\{u_1, u_2, \ldots, u_n\}$ (see Sections 5.4 and 5.10).
Note that any $x \in \mathbb{C}^n$ has the unique expression $x = \sum_i\langle u_i, x\rangle u_i$ as a linear combination of the orthonormal basis vectors $u_i$, and
$$Ax = \lambda_1u_1(u_1^Hx) + \cdots + \lambda_nu_n(u_n^Hx) = \lambda_1\langle u_1, x\rangle u_1 + \cdots + \lambda_n\langle u_n, x\rangle u_n.$$

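If an eigenvalue $\lambda$ has multiplicity $\ell$, i.e., $\lambda = \lambda_{i_1} = \cdots = \lambda_{i_\ell}$, with a set of $\ell$ orthonormal eigenvectors $u_{i_1}, \ldots, u_{i_\ell}$, then they form an orthonormal basis for the eigenspace $E(\lambda)$, and
$$P_\lambda = P_{i_1} + \cdots + P_{i_\ell} = u_{i_1}u_{i_1}^H + \cdots + u_{i_\ell}u_{i_\ell}^H$$
is the orthogonal projection matrix onto $E(\lambda)$. Therefore, counting the multiplicity of each eigenvalue, every normal matrix $A$ has the unique spectral decomposition into projections
$$A = \lambda_1P_{\lambda_1} + \lambda_2P_{\lambda_2} + \cdots + \lambda_kP_{\lambda_k}$$
for $k \le n$, where the $\lambda_i$'s are all distinct.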

Corollary 7.13 Let $A$ be a normal matrix.
(1) The eigenvectors of $A$ belonging to different eigenvalues are mutually orthogonal.
(2) If an eigenvalue $\lambda$ of $A$ has multiplicity $k$, then the eigenspace $N(A - \lambda I)$ belonging to $\lambda$ has dimension $k$.

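Corollary 7.14 Let $A$ be a normal matrix with the spectral decomposition $A = \lambda_1P_{\lambda_1} + \cdots + \lambda_kP_{\lambda_k}$. Then, for any positive integer $m$,
$$A^m = \lambda_1^mP_{\lambda_1} + \cdots + \lambda_k^mP_{\lambda_k}.$$
Moreover, if $A$ is invertible, then for any positive integer $\ell$,
$$A^{-\ell} = \frac{1}{\lambda_1^\ell}P_{\lambda_1} + \cdots + \frac{1}{\lambda_k^\ell}P_{\lambda_k}.$$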
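Corollary 7.14 says that powers (and, for invertible $A$, negative powers) act only on the eigenvalues. The following is a minimal sketch assuming NumPy and a Hermitian (here real symmetric) matrix; `spectral_power` is a hypothetical helper name, and the test matrix is the one from Example 7.8.

```python
import numpy as np

def spectral_power(A, m):
    """Compute A^m for a Hermitian matrix A as sum_i lambda_i^m u_i u_i^H.

    m may be negative when A is invertible (no eigenvalue equals 0).
    """
    eigvals, U = np.linalg.eigh(A)
    # Scaling column i of U by lambda_i^m and multiplying by U^H
    # gives U diag(lambda^m) U^H = sum_i lambda_i^m P_i.
    return (U * eigvals**m) @ U.conj().T

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

print(np.allclose(spectral_power(A, 3), np.linalg.matrix_power(A, 3)))                  # True
print(np.allclose(spectral_power(A, -2), np.linalg.matrix_power(np.linalg.inv(A), 2)))  # True
```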

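Example 7.8 Find the spectral decomposition of
$$A = \begin{bmatrix}4&2&2\\2&4&2\\2&2&4\end{bmatrix}.$$
Solution: From Example 7.6, the spectral decomposition is
$$A = 2P_1 + 2P_2 + 8P_3,$$
where the projections are
$$P_1 = v_1v_1^H = \frac12\begin{bmatrix}-1\\1\\0\end{bmatrix}\begin{bmatrix}-1&1&0\end{bmatrix} = \frac12\begin{bmatrix}1&-1&0\\-1&1&0\\0&0&0\end{bmatrix},$$
$$P_2 = v_2v_2^H = \frac16\begin{bmatrix}-1\\-1\\2\end{bmatrix}\begin{bmatrix}-1&-1&2\end{bmatrix} = \frac16\begin{bmatrix}1&1&-2\\1&1&-2\\-2&-2&4\end{bmatrix},$$
$$P_3 = v_3v_3^H = \frac13\begin{bmatrix}1\\1\\1\end{bmatrix}\begin{bmatrix}1&1&1\end{bmatrix} = \frac13\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}.$$
Hence,
$$P_1' = P_1 + P_2 = \frac13\begin{bmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{bmatrix}$$
is the projection onto the eigenspace $E(2)$ belonging to $\lambda = 2$, $P_3$ is the projection onto the eigenspace $E(8)$ belonging to $\lambda = 8$, and
$$A = 2P_1' + 8P_3 = \frac23\begin{bmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{bmatrix} + \frac83\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}.$$
□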
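The projections in Example 7.8 can be double-checked numerically. This is a minimal sketch assuming NumPy; the unit eigenvectors below are read off from the projections computed in the example.

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

v1 = np.array([-1.0, 1.0, 0.0]) / np.sqrt(2)
v2 = np.array([-1.0, -1.0, 2.0]) / np.sqrt(6)
v3 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
P1, P2, P3 = (np.outer(v, v) for v in (v1, v2, v3))

P_eig2 = P1 + P2                   # projection onto the eigenspace E(2)
print(np.allclose(P_eig2, np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]) / 3))  # True
print(np.allclose(P3, np.ones((3, 3)) / 3))                                        # True
print(np.allclose(2 * P_eig2 + 8 * P3, A))                                         # True
```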

Problem 7.23 Given A = 2 [


2 -1
3-2
10
,find an orthogonal matrix Q that
-1 -2 0
diagonalizes A, and find the spectral decomposition of A.

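Example 7.9 Find the spectral decomposition of the normal matrix
$$A = \begin{bmatrix}0&0&i\\0&i&0\\i&0&0\end{bmatrix}.$$
Solution: Since $A$ is normal ($AA^H = A^HA$), it is unitarily diagonalizable. The characteristic polynomial of $A$ is
$$\det(\lambda I - A) = \det\begin{bmatrix}\lambda&0&-i\\0&\lambda-i&0\\-i&0&\lambda\end{bmatrix} = (\lambda - i)^2(\lambda + i).$$
Hence, the eigenvalues are $i = \lambda_1 = \lambda_2$ of multiplicity 2 and $-i = \lambda_3$ of multiplicity 1. By a simple computation using the Gram-Schmidt orthogonalization, one can find that
$$u_1 = \frac{1}{\sqrt2}\begin{bmatrix}1\\0\\1\end{bmatrix},\qquad u_2 = \begin{bmatrix}0\\1\\0\end{bmatrix},\qquad u_3 = \frac{1}{\sqrt2}\begin{bmatrix}1\\0\\-1\end{bmatrix}$$
are orthonormal eigenvectors of $A$ belonging to the eigenvalues $\lambda_1$, $\lambda_2$ and $\lambda_3$, respectively. Now, the spectral decomposition is $A = i(P_1 + P_2) - iP_3$, where the projection matrices are
$$P_1 + P_2 = \frac12\begin{bmatrix}1&0&1\\0&2&0\\1&0&1\end{bmatrix},\qquad P_3 = \frac12\begin{bmatrix}1&0&-1\\0&0&0\\-1&0&1\end{bmatrix}.$$
Hence,
$$A = \frac{i}{2}\begin{bmatrix}1&0&1\\0&2&0\\1&0&1\end{bmatrix} - \frac{i}{2}\begin{bmatrix}1&0&-1\\0&0&0\\-1&0&1\end{bmatrix}.$$
□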

Problem 7.24 Find the spectral decomposition of each of the following matrices:

(1) A = [ ~ ~ ]; (2) B = [ 2 ~i 2+i ] .

! n n
3 '

~
0 0 1 1

(3) C [
2
0
0
0
2
0
(4)D~ [1 0 0
0 0
0 0

7.6 Exercises
7.1. Calculate Ilxll for

(1) x = [ 1; i ] , (2) x = [
1 - 2i
i
1 .
3+i
7.2. Construct an orthonormal basis for $\mathbb{C}^2$ from $\{(i, 4 + 2i), (5 + 6i, 1)\}$ by applying the Gram-Schmidt orthogonalization.

7.3. Find the rank of the matrix A = [ ~


1 i
1
1+ i
1- i
1 1+ i
2+i
1.
1 + 3i l-i 2-i 1 + 4i
7.4. Find the eigenvalues and eigenvectors for each of the following matrices:

(1)

(3)

1 1

7.5. Find the third column vector v so that U = [


V3
~
v'2
o ~1 i, unitary.
V3
How much freedom is there in this choice?
7.6. Find a real matrix $A$ such that $A + rI$ is invertible for all $r \in \mathbb{R}$. Does there exist a square matrix $A$ such that $A + cI$ is invertible for all $c \in \mathbb{C}$?
7.7. Find a unitary matrix whose first row is
(1) $k(1, 1 - i)$, where $k$ is a number, (2) (~, ~, l;-i).
7.8. Let $V = \mathbb{C}^2$ with the Euclidean inner product. Let $T$ be the linear transformation on $V$ with the matrix representation A = [i ~] with respect to the standard basis. Show that $T$ is normal and find a set of orthonormal eigenvectors of $T$.
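7.9. Prove that the following matrices are unitarily similar:
$$\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}, \qquad \begin{bmatrix}e^{i\theta} & 0\\ 0 & e^{-i\theta}\end{bmatrix},$$
where $\theta$ is a real number.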


7.10. For each of the following real symmetric matrices $A$, find a real orthogonal matrix $Q$ such that $Q^TAQ$ is diagonal:

(1) [i i], (2) [~ ~].



7.11. For each of the following Hermitian matrices $A$, find a unitary matrix $U$ such that $U^HAU$ is diagonal.

1
(2) [ 2 - 3i
2 + 3i ]
-1 ' (3) [ ~i 2
2+i ]
1- i .
2-i 1+i 2
7.12. Find the diagonal matrices to which the following matrices are unitarily sim-
ilar. Determine whether each of them is Hermitian, unitary or orthogonal.

(1) !2 [ 1-
1+ ~
z
1- i ]
l+i '
(2) [0.6
0.8
-0.8]
0.6 ' (3) [ -] 1 °li ].
-i
7.13. For a skew-Hermitian matrix $A$, show that
(1) $A - I$ is invertible, (2) $e^A$ is unitary.
7.14. Let $U$ be a unitary matrix. Prove that $U$ and $U^T$ have the same set of eigenvalues.

7.15. Verify that A = [: 2] is normal. Diagonalize A by a unitary matrix U.


7.16. Show that the nonsymmetric real matrix

A = [ i ~ ~]
-2 -4 -5
can be diagonalized.

7.17. Suppose that $A$, $B$ are diagonalizable $n \times n$ matrices. Prove that $AB = BA$ if and only if $A$ and $B$ can be diagonalized simultaneously by the same matrix $Q$, i.e., $Q^{-1}AQ$ and $Q^{-1}BQ$ are diagonal matrices.

7.18. Find the spectral decomposition of A = [2i i1 ~1]


7.19. Let $A$ and $B$ be $2 \times 2$ symmetric matrices. Prove that $A$ and $B$ are similar if and only if $\det A = \det B$ and $\operatorname{tr}A = \operatorname{tr}B$.
7.20. Let $A$ be a real symmetric $n \times n$ matrix and $\lambda$ an eigenvalue of $A$ with multiplicity $m$. Show that $\dim N(A - \lambda I) = m$.
7.21. Show that a matrix $A$ is nilpotent, i.e., $A^n = 0$ for some integer $n \ge 1$, if and only if its eigenvalues are all zero.
7.22. Determine whether the following statements are true or false, in general, and justify your answers.
(1) A Hermitian matrix is always unitarily similar to a diagonal matrix.
(2) An orthogonal matrix is always unitarily similar to a real diagonal matrix.
(3) For any $m \times n$ matrix $A$, $AA^H$ and $A^HA$ have the same eigenvalues.
(4) If a triangular matrix is similar to a diagonal matrix, it is already diagonal.
(5) If all the columns of a square matrix $A$ are orthonormal, then $A$ is diagonalizable.
(6) Every permutation matrix is diagonalizable.
(7) Every permutation matrix is Hermitian.
(8) A nonzero nilpotent matrix cannot be symmetric.
(9) Every square matrix is unitarily similar to a triangular matrix.
(10) If $A$ is a Hermitian matrix, then $A + iI$ is invertible.
(11) If $A$ is a real matrix, then $A + iI$ is invertible.
(12) If $A$ is an orthogonal matrix, then $A + \frac12I$ is invertible.
(13) Every unitarily diagonalizable matrix is Hermitian.
(14) Every diagonalizable matrix is normal.
Chapter 8

Quadratic Forms

8.1 Introduction
The reader should now be well aware of the important roles of matrices in the study of linear equations. The left side $a_1x_1 + a_2x_2 + \cdots + a_nx_n = a^Tx$ of such an equation is a (homogeneous) polynomial of degree 1 in $n$ variables, called a linear form. In this chapter, we study a (homogeneous) polynomial of degree 2 in several variables, called a quadratic form, and show that matrices also play an important role in the study of quadratic forms. Quadratic forms arise in a variety of applications, including geometry, vibrations of mechanical systems, statistics, and electrical engineering. A more general notion is that of a bilinear form, which is described in Section 8.7. The inner product of a vector space and the determinant function on $M_{2\times2}(\mathbb{R})$ are typical examples of bilinear forms. It turns out that a quadratic form (or bilinear form) may be associated with a real symmetric matrix, and vice versa.
A quadratic equation in two variables $x$ and $y$ is an equation of the form
$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0,$$
in which the left side consists of a constant term $f$, a linear form $dx + ey$, and a quadratic form $ax^2 + 2bxy + cy^2$. Note that this quadratic form may be written in matrix notation as
$$ax^2 + 2bxy + cy^2 = [x\ \ y]\begin{bmatrix}a&b\\b&c\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = x^TAx,$$
where
$$x = \begin{bmatrix}x\\y\end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix}a&b\\b&c\end{bmatrix}.$$
Note also that the matrix $A$ is taken to be a (real) symmetric matrix.
Geometrically, the solution set of a quadratic equation in x and y usually
represents a conic section, such as an ellipse, a parabola or a hyperbola in
the xy-plane.

Definition 8.1 (1) An equation of the form
$$f(x) = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j + \sum_{i=1}^n b_ix_i + c = 0,$$
where $a_{ij}$, $b_i$ and $c$ are real constants, is called a quadratic equation in $n$ variables $x_1, x_2, \ldots, x_n$. In matrix form, it can be written as
$$f(x) = x^TAx + b^Tx + c = 0,$$
where $A = [a_{ij}]$, $x = [x_1\ \cdots\ x_n]^T$ and $b = [b_1\ \cdots\ b_n]^T$ in $\mathbb{R}^n$.

(2) A linear form is a polynomial of degree 1 in $n$ variables $x_1, x_2, \ldots, x_n$ of the form
$$b^Tx = \sum_{i=1}^n b_ix_i,$$
where $x = [x_1\ \cdots\ x_n]^T$ and $b = [b_1\ \cdots\ b_n]^T$ in $\mathbb{R}^n$.

(3) A quadratic form is a (homogeneous) polynomial of degree 2 in $n$ variables $x_1, x_2, \ldots, x_n$ of the form
$$q(x) = x^TAx = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j,$$
where $x = [x_1\ x_2\ \cdots\ x_n]^T \in \mathbb{R}^n$ and $A = [a_{ij}]$ is a real $n \times n$ matrix.



Remark: (1) A quadratic equation is said to be consistent if it has a solution, i.e., there is a vector $x \in \mathbb{R}^n$ such that $f(x) = 0$. Otherwise, it is said to be inconsistent. For instance, the equation $2x^2 + 3y^2 = -1$ is inconsistent. In the following discussion we will consider only consistent equations.
(2) A linear form is simply the dot product in $\mathbb{R}^n$ with a fixed vector $b \in \mathbb{R}^n$. (This has been discussed in the previous chapters.)
(3) The matrix $A$ in the definition of a quadratic form may be any square matrix, but it can be restricted to be a symmetric matrix. In fact, any square matrix $A$ is the sum of a symmetric part $B$ and a skew-symmetric part $C$, say
$$A = B + C, \qquad B = \tfrac12(A + A^T), \quad C = \tfrac12(A - A^T).$$
For the skew-symmetric matrix $C$, we have
$$x^TCx = (x^TCx)^T = x^TC^Tx = -x^TCx.$$
Hence, as a real number, $x^TCx = 0$. Therefore,
$$q(x) = x^TAx = x^T(B + C)x = x^TBx.$$
This means that, without loss of generality, one may assume that the matrix $A$ in the definition of a quadratic form is a symmetric matrix.
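A short numerical illustration of Remark (3), as a sketch assuming NumPy; the matrix and the vector are drawn from a seeded random generator purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # an arbitrary, generally nonsymmetric matrix
B = (A + A.T) / 2                 # symmetric part
C = (A - A.T) / 2                 # skew-symmetric part

x = rng.standard_normal(4)
print(np.isclose(x @ C @ x, 0.0))        # True: the skew part contributes nothing
print(np.isclose(x @ A @ x, x @ B @ x))  # True: q(x) only sees the symmetric part
```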
(4) From the definition of a quadratic form, one can see that, fixing a basis such as the standard basis for $\mathbb{R}^n$, a quadratic form is associated with a unique symmetric matrix, which is called the matrix representation of the quadratic form $q$ with respect to the chosen basis (here, the standard basis for $\mathbb{R}^n$). On the other hand, any (real) symmetric matrix $A$ gives rise to a quadratic form $x^TAx$. For example, the symmetric matrix $\begin{bmatrix}8&2\\2&-1\end{bmatrix}$ defines the quadratic form
$$[x_1\ \ x_2]\begin{bmatrix}8&2\\2&-1\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = 8x_1^2 + 4x_1x_2 - x_2^2.$$

Problem 8.1 Find the symmetric matrices representing the quadratic forms
(1) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + 2x_2x_3$,
(2) $x_1x_2 + x_1x_3 + x_2x_3$,
(3) $x_1^2 + x_2^2 - x_3^2 - x_4^2 + 2x_1x_2 - 10x_1x_4 + 4x_3x_4$.

8.2 Diagonalization of a quadratic form


In this section, we discuss how to sketch the solution set of a quadratic equation. To study the solutions of a quadratic equation $f(x) = 0$, we first consider an equation $x^TAx = c$ without a linear form. The quadratic form may be rewritten as the sum of two parts:
$$q(x) = x^TAx = \sum_{i=1}^n a_{ii}x_i^2 + 2\sum_{i<j}a_{ij}x_ix_j,$$
in which the first part $\sum_{i=1}^n a_{ii}x_i^2$ is called the (perfect) square terms and the second part $2\sum_{i<j}a_{ij}x_ix_j = \sum_{i\ne j}a_{ij}x_ix_j$ is called the cross-product terms. It is the cross-product terms that make the quadratic surface hard to sketch. However, the symmetric matrix $A$ can be orthogonally diagonalized, i.e., there exists an orthogonal matrix $P$ such that
$$P^TAP = P^{-1}AP = D = \begin{bmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{bmatrix}.$$
Here, the diagonal entries $\lambda_i$ are the eigenvalues of $A$, and the column vectors of $P$ are associated orthonormal eigenvectors of $A$. Then, by setting $x = Py$, we get
$$x^TAx = (Py)^TA(Py) = y^T(P^TAP)y = y^TDy = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2,$$
which is a quadratic form without cross-product terms. Consequently, we have proven the following theorem.

Theorem 8.1 Let $x^TAx$ be a quadratic form in $x = [x_1\ x_2\ \cdots\ x_n]^T \in \mathbb{R}^n$ for a symmetric matrix $A$. Then there is a change of coordinates of $x$ into $y = P^Tx = [y_1\ y_2\ \cdots\ y_n]^T$ such that
$$x^TAx = y^TDy = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2,$$
where $P$ is an orthogonal matrix and $P^TAP = D$. □
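Theorem 8.1 can be checked numerically with an orthogonal eigenvector matrix. The sketch below assumes NumPy and uses `numpy.linalg.eigh`, which returns the eigenvalues in ascending order together with orthonormal eigenvectors; the matrix and the test vector are illustrative choices.

```python
import numpy as np

# Symmetric matrix of a quadratic form (chosen for illustration).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

lams, P = np.linalg.eigh(A)   # eigenvalues (ascending) and orthonormal eigenvectors

x = np.array([1.0, -2.0, 0.5])
y = P.T @ x                    # coordinates of x in the eigenvector basis

print(np.allclose(P.T @ P, np.eye(3)))              # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(lams)))      # True: P^T A P = D
print(np.isclose(x @ A @ x, np.sum(lams * y**2)))   # True: no cross-product terms
```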

Remark: (1) Recall that in Theorem 8.1 the columns of the matrix $P$ are the orthonormal eigenvectors of $A$, and $y$ is just the coordinate expression of $x$ with respect to the orthonormal eigenvectors of $A$. In fact, $P = [\mathrm{Id}]_\alpha^\beta$, where $\beta$ is a basis consisting of orthonormal eigenvectors of $A$ and $\alpha$ is the standard basis.
(2) The solution set of a quadratic equation of the form $x^TAx = c$ is a hypersurface in $\mathbb{R}^n$, that is, a curved surface that can be parameterized in $n - 1$ variables. These are called $(n-1)$-dimensional quadratic surfaces, with axes in the directions of the eigenvectors. In particular, if $n = 2$, the solution set of a quadratic equation is called a quadratic curve, or more commonly a conic section. When $n = 3$ and $A$ is invertible, the quadratic surfaces are ellipsoids or hyperboloids, depending on the signs of the eigenvalues of $A$. Of course, a paraboloid is also a quadratic surface, but it appears only when a linear form is present in the quadratic equation. In general, the type of the quadratic hypersurface is determined by the signs of the eigenvalues of $A$.

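Example 8.1 Determine the conic section $3x^2 + 2xy + 3y^2 - 8 = 0$.

Solution: This equation can be written in the form
$$[x\ \ y]\begin{bmatrix}3&1\\1&3\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = 8.$$
The matrix $A = \begin{bmatrix}3&1\\1&3\end{bmatrix}$ has eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 4$ with associated unit eigenvectors
$$v_1 = \frac{1}{\sqrt2}\begin{bmatrix}1\\-1\end{bmatrix} \quad\text{and}\quad v_2 = \frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix},$$
respectively, which form an orthonormal basis $\beta$. If $\alpha$ denotes the standard basis, then the transition matrix
$$P = [\mathrm{Id}]_\alpha^\beta = [v_1\ v_2] = \frac{1}{\sqrt2}\begin{bmatrix}1&1\\-1&1\end{bmatrix} = \begin{bmatrix}\cos45^\circ & \sin45^\circ\\ -\sin45^\circ & \cos45^\circ\end{bmatrix}$$
is a rotation through $45^\circ$ in the clockwise direction, with $P^T = P^{-1}$. It gives a change of coordinates $x = Py$, i.e.,
$$\begin{bmatrix}x\\y\end{bmatrix} = \frac{1}{\sqrt2}\begin{bmatrix}1&1\\-1&1\end{bmatrix}\begin{bmatrix}x'\\y'\end{bmatrix} = \begin{bmatrix}\frac{1}{\sqrt2}x' + \frac{1}{\sqrt2}y'\\ -\frac{1}{\sqrt2}x' + \frac{1}{\sqrt2}y'\end{bmatrix}.$$
Thus, we get
$$x^TAx = y^TP^TAPy = y^T\begin{bmatrix}2&0\\0&4\end{bmatrix}y = 2(x')^2 + 4(y')^2 = 8,$$
or
$$\frac{(x')^2}{4} + \frac{(y')^2}{2} = 1.$$
Its solution set is just an ellipse with axes along $v_1 = Pe_1$ and $v_2 = Pe_2$. □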
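The computation in Example 8.1 can be verified numerically. The following is a minimal sketch assuming NumPy: it checks that $P$ diagonalizes $A$, then takes a point of the ellipse in the rotated coordinates (the parameter `t` is arbitrary) and confirms that it satisfies the original equation.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
P = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)   # [v1 v2] from the example

print(np.allclose(P.T @ P, np.eye(2)))                  # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag([2.0, 4.0])))    # True: P^T A P = diag(2, 4)

# A point on (x')^2/4 + (y')^2/2 = 1, mapped back by x = P y,
# should satisfy the original equation 3x^2 + 2xy + 3y^2 = 8.
t = 0.7
xp, yp = 2 * np.cos(t), np.sqrt(2) * np.sin(t)
x, y = P @ np.array([xp, yp])
print(np.isclose(3 * x**2 + 2 * x * y + 3 * y**2, 8.0))  # True
```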

Definition 8.2 The inertia of a symmetric matrix $A$ is a triple of integers denoted by $\mathrm{In}(A) = (p, q, k)$, where $p$, $q$ and $k$ are the numbers of positive, negative and zero eigenvalues of $A$, respectively.

It turns out that the inertia $\mathrm{In}(A)$ completely determines the geometric type of the quadratic form in the following sense. Since $\mathrm{In}(-A) = (q, p, k)$ if $\mathrm{In}(A) = (p, q, k)$, and the equation $x^TAx = c$ is inconsistent if $p = 0$ and $c > 0$, it suffices to consider the cases $c \ge 0$ and $p > 0$. Excluding those inconsistent cases, we have the following characterization of the solution sets of $x^TAx = c$ for $n = 2$ and $3$.

For $n = 2$, there are only three possible cases for $\mathrm{In}(A)$:

  In(A) = (p, q, k)    c > 0                      c = 0
  (2, 0, 0)            ellipse                    a point
  (1, 1, 0)            hyperbola                  two lines crossing at 0
  (1, 0, 1)            two parallel lines         a line

For $n = 3$, there are six possibilities:

  In(A) = (p, q, k)    c > 0                      c = 0
  (3, 0, 0)            ellipsoid                  a point
  (2, 1, 0)            one-sheeted hyperboloid    elliptic cone
  (2, 0, 1)            elliptic cylinder          a line
  (1, 2, 0)            two-sheeted hyperboloid    elliptic cone
  (1, 1, 1)            hyperbolic cylinder        two planes crossing in a line
  (1, 0, 2)            two parallel planes        a plane
In general, with $p > 0$, $\mathrm{In}(A)$ has $n(n+1)/2$ possibilities, each characterizing a different geometric type of quadratic form. For example, if $\mathrm{In}(A) = (n, 0, 0)$ and $c > 0$, i.e., the eigenvalues of $A$ are all positive, then the quadratic equation describes an ellipsoid in $\mathbb{R}^n$, and so on.
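The inertia can be computed from the eigenvalues. The sketch below assumes NumPy; the helper name `inertia` and the cutoff `tol` are illustrative assumptions (floating-point eigenvalues of a singular matrix are rarely exactly zero, so some tolerance is needed to count zero eigenvalues).

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return In(A) = (p, q, k) for a real symmetric matrix A.

    Eigenvalues with absolute value <= tol are counted as zero.
    """
    eigvals = np.linalg.eigvalsh(A)
    p = int(np.sum(eigvals > tol))
    q = int(np.sum(eigvals < -tol))
    return p, q, len(eigvals) - p - q

print(inertia(np.array([[3.0, 1.0], [1.0, 3.0]])))        # (2, 0, 0): an ellipse when c > 0
print(inertia(np.array([[0.0, 1.0, 1.0],
                        [1.0, 0.0, 0.0],
                        [1.0, 0.0, 0.0]])))              # (1, 1, 1): a hyperbolic cylinder when c > 0
```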

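Example 8.2 Determine the quadratic surface for $2xy + 2xz = 1$.

Solution: The matrix for the given quadratic form is
$$A = \begin{bmatrix}0&1&1\\1&0&0\\1&0&0\end{bmatrix},$$
so the eigenvalues of $A$ are found to be $\lambda_1 = \sqrt2$, $\lambda_2 = -\sqrt2$, $\lambda_3 = 0$, and the associated orthonormal eigenvectors are
$$v_1 = \left(\frac{1}{\sqrt2}, \frac12, \frac12\right),\qquad v_2 = \left(-\frac{1}{\sqrt2}, \frac12, \frac12\right),\qquad v_3 = \left(0, -\frac{1}{\sqrt2}, \frac{1}{\sqrt2}\right),$$
respectively. Hence, an orthogonal matrix $P$ that diagonalizes $A$ is
$$P = \frac12\begin{bmatrix}\sqrt2 & -\sqrt2 & 0\\ 1 & 1 & -\sqrt2\\ 1 & 1 & \sqrt2\end{bmatrix},$$
and with the change of coordinates $x = Py$, that is,
$$x = \frac{1}{\sqrt2}(x' - y'),\qquad y = \frac12(x' + y' - \sqrt2z'),\qquad z = \frac12(x' + y' + \sqrt2z'),$$
the equation is transformed to $\sqrt2(x')^2 - \sqrt2(y')^2 = 1$, which is a hyperbolic cylinder. Note that $\mathrm{In}(A) = (1, 1, 1)$. □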
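Example 8.2 can be double-checked as follows. This is a minimal sketch assuming NumPy: it confirms that $P$ is orthogonal and diagonalizes $A$, then maps a point of the cylinder $\sqrt2(x')^2 - \sqrt2(y')^2 = 1$ back to the original coordinates (the parameters `u` and `zp` are arbitrary).

```python
import numpy as np

A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
s = np.sqrt(2)
P = np.array([[s, -s, 0.0],
              [1.0, 1.0, -s],
              [1.0, 1.0, s]]) / 2

print(np.allclose(P.T @ P, np.eye(3)))                   # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag([s, -s, 0.0])))   # True

# (x', y') = 2^(-1/4) (cosh u, sinh u) satisfies sqrt(2) x'^2 - sqrt(2) y'^2 = 1 for any z'.
u, zp = 0.3, 1.7
k = 2 ** -0.25
x, y, z = P @ np.array([k * np.cosh(u), k * np.sinh(u), zp])
print(np.isclose(2 * x * y + 2 * x * z, 1.0))             # True
```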

Consider a general form of quadratic equation
$$x^TAx + b^Tx = c.$$
(1) If it does not have a linear form, i.e., $b = 0$, then, as we have seen above, no parabolic curve or surface appears as a solution set of the quadratic equation.
(2) Suppose that it has a nonzero linear form, i.e., $b \ne 0$. If the matrix $A$ is invertible, then, by the change of variables $y = x + \frac12A^{-1}b$ (or $x = y - \frac12A^{-1}b$), the given quadratic equation is transformed into a new quadratic equation $y^TAy = d$ without a linear form, where $d = c + \frac14b^TA^{-1}b$. However, if $A$ is singular, the solution set of the quadratic equation depends not only on the inertia of $A$ but also on the linear form, and parabolic curves or surfaces may appear as solutions of a quadratic equation with a nonzero linear form. For example, the equation $x^2 - z = c$ has a singular quadratic form with $\mathrm{In}(A) = (1, 0, 2)$ and a nonzero linear form that cannot be removed by any change of variables. When $n = 3$, the solution set of this equation is a parabolic cylinder.

Example 8.3 Classify the conic section for $3x^2 - 6xy + 4y^2 + 2x - 2y = 0$.

Solution: The matrix for the quadratic form $3x^2 - 6xy + 4y^2$ is
$$A = \begin{bmatrix}3&-3\\-3&4\end{bmatrix}.$$
Its inverse is
$$A^{-1} = \frac13\begin{bmatrix}4&3\\3&3\end{bmatrix}, \quad\text{and}\quad b = \begin{bmatrix}2\\-2\end{bmatrix}.$$
With the change of coordinates $y = x + \frac12A^{-1}b$, that is,
$$x' = x + \frac13,\qquad y' = y,$$
the equation is transformed to a new equation $3(x')^2 - 6x'y' + 4(y')^2 = \frac13$. Clearly, the matrix representation of the new quadratic form is also $A$, and its eigenvalues are $\frac12(7 \pm \sqrt{37})$. Therefore, $\mathrm{In}(A) = (2, 0, 0)$, and the solution set of the equation is an ellipse. □
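The translation step of Example 8.3 can be reproduced numerically. The sketch below assumes NumPy and writes the equation in the form $x^TAx + b^Tx = c$ with $c = 0$. It solves $A w = b$ rather than forming $A^{-1}$ explicitly (a standard numerical choice, not the text's), and checks the shift, the new constant $d = c + \frac14b^TA^{-1}b$, and the eigenvalues. The test point `x` is arbitrary.

```python
import numpy as np

A = np.array([[3.0, -3.0], [-3.0, 4.0]])
b = np.array([2.0, -2.0])
c = 0.0                                   # right-hand side of x^T A x + b^T x = c

w = np.linalg.solve(A, b)                 # w = A^{-1} b
shift = 0.5 * w                           # y = x + (1/2) A^{-1} b
d = c + 0.25 * b @ w                      # new constant: y^T A y = d

print(np.allclose(shift, [1 / 3, 0.0]))   # True: x' = x + 1/3, y' = y
print(np.isclose(d, 1 / 3))               # True
print(np.allclose(np.linalg.eigvalsh(A),
                  [(7 - np.sqrt(37)) / 2, (7 + np.sqrt(37)) / 2]))   # True

# The two forms agree at any point: x^T A x + b^T x - c = y^T A y - d.
x = np.array([0.4, -1.1])
y = x + shift
print(np.isclose(x @ A @ x + b @ x - c, y @ A @ y - d))             # True
```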

Example 8.4 In analytic geometry, a general quadratic curve (or conic section) is represented by a quadratic equation in two variables as
$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0$$
with the symmetric matrix $A = \begin{bmatrix}a&b\\b&c\end{bmatrix}$. We present here the classification of the conic sections according to the coefficients.
(1) If $b = 0$, then $A$ is already a diagonal matrix with the eigenvalues $a$ and $c$, and the equation becomes
$$ax^2 + cy^2 + dx + ey + f = 0.$$
(i) If $a = 0 = c$, then the conic section is a line in the plane.
(ii) If $a \ne 0 = c$, then it is a parabola when $e \ne 0$, or one or two lines when $e = 0$.
(iii) If $a \ne 0 \ne c$, then the quadratic equation becomes
$$ax^2 + cy^2 + dx + ey + f = a(x - p)^2 + c(y - q)^2 + h = 0$$
for some constants $p$, $q$ and $h$. If $h = 0$, the cases are easily classified (try it). Suppose $h \ne 0$. Then the conic section is a circle if $a = c$, an ellipse if $ac > 0$, or a hyperbola if $ac < 0$.
(2) Suppose that $b \ne 0$. Since $A$ is symmetric, it can be diagonalized by an orthogonal matrix $P$ whose columns are orthonormal eigenvectors, and the diagonal matrix has the eigenvalues $\lambda_1$ and $\lambda_2$ on the diagonal. By the coordinate change by $P$, the quadratic equation becomes
$$ax^2 + 2bxy + cy^2 + dx + ey + f = \lambda_1u^2 + \lambda_2v^2 + d'u + e'v + f = 0$$
for some constants $d'$ and $e'$. Hence, the classification of the conic sections is reduced to case (1) according to the various possible cases of the eigenvalues of $A$. Moreover, the eigenvalues are given as
$$\lambda_{1,2} = \frac{(a + c) \pm \sqrt{(a - c)^2 + 4b^2}}{2},$$
which are determined by the coefficients $a$, $b$ and $c$. Hence, one can classify the conic section according to the various possible cases of $a$, $b$ and $c$ (see Exercise 8.12).
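(3) The axes of the conic section are in the directions of the eigenvectors, which are orthogonal to each other. Since we only need to find the axis lines, not the direction vectors, we may choose them to be the rotation of the standard coordinate $x$- and $y$-axes, which are determined by $e_1$, $e_2$. Now, a pair of orthogonal eigenvectors is found to be
$$v_i = \begin{bmatrix}b\\ \lambda_i - a\end{bmatrix}$$
for $i = 1, 2$. For definiteness, let $\lambda_1$ be the larger eigenvalue and assume $b > 0$ (for $b < 0$ the same angle $\theta$ below still gives the axes, but the roles of $\lambda_1$ and $\lambda_2$ are interchanged). Then the slope of $v_1$ from the $x$-axis is
$$\frac{\lambda_1 - a}{b} = -\frac{a - c}{2b} + \sqrt{\left(\frac{a - c}{2b}\right)^2 + 1} = -\cot2\theta + \csc2\theta = \tan\theta,$$
where $\cot2\theta = \frac{a - c}{2b}$ for some $\theta$ with $0 < 2\theta < \pi$, so that $\csc 2\theta > 0$; such a $\theta$ exists since $b \ne 0$. When $a \ne c$, this is the same as setting $\tan2\theta = \frac{2b}{a - c}$. This $\theta$ is the rotation angle we were looking for. Therefore, the orthonormal eigenvectors $u_1$ and $u_2$ of $A$ may be chosen as the rotation of the standard basis through the angle $\theta$. The transition matrix is now
$$P = [u_1\ u_2] = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix} \quad\text{and}\quad P^TAP = P^{-1}AP = \begin{bmatrix}\lambda_1&0\\0&\lambda_2\end{bmatrix}.$$
By the change of coordinates $\begin{bmatrix}x\\y\end{bmatrix} = P\begin{bmatrix}x'\\y'\end{bmatrix}$, the quadratic equation becomes
$$\lambda_1(x')^2 + \lambda_2(y')^2 + d'x' + e'y' + f = 0,$$
where $d' = d\cos\theta + e\sin\theta$ and $e' = -d\sin\theta + e\cos\theta$. □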
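The rotation angle of Example 8.4 (3) can be computed directly. The following is a minimal sketch assuming NumPy; the helper `conic_rotation` is hypothetical, and it uses $\theta = \frac12\operatorname{atan2}(2b, a - c)$. This branch choice is our own: it satisfies $\tan2\theta = 2b/(a - c)$ whenever $a \ne c$, handles $a = c$ without division by zero, and always puts the larger eigenvalue in the $(1,1)$ entry of $P^TAP$ (matching the text's choice when $b > 0$).

```python
import math
import numpy as np

def conic_rotation(a, b, c):
    """Rotation angle theta and matrix P(theta) diagonalizing [[a, b], [b, c]].

    theta = atan2(2b, a - c) / 2, so tan(2 theta) = 2b / (a - c) whenever a != c.
    With this branch the larger eigenvalue lands in the (1, 1) entry of P^T A P.
    """
    theta = 0.5 * math.atan2(2 * b, a - c)
    P = np.array([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])
    return theta, P

# 3x^2 + 2xy + 3y^2 from Example 8.1: a = c, so the angle is 45 degrees.
a, b, c = 3.0, 1.0, 3.0
theta, P = conic_rotation(a, b, c)
A = np.array([[a, b], [b, c]])
print(math.isclose(theta, math.pi / 4))                  # True
print(np.allclose(P.T @ A @ P, np.diag([4.0, 2.0])))     # True

# The linear part transforms as d' = d cos(theta) + e sin(theta).
d, e = 2.0, -5.0
dp, ep = P.T @ np.array([d, e])
print(math.isclose(dp, d * math.cos(theta) + e * math.sin(theta)))   # True
```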

Problem 8.2 Sketch the graph of each of the following quadratic equations:
(1) $2x^2 + 2y^2 + 6yz + 10z^2 = 9$;
(2) $x^2 - 8xy + 16y^2 - 3z^2 = 8$;
(3) $4x^2 + 12xy + 9y^2 + 3x - 4 = 0$.

8.3 Congruence relation


As we have seen so far, in a quadratic equation $x^TAx + b^Tx + c = 0$, the linear form may be eliminated by a change of variables when $A$ is invertible, and then, by another change of variables, the equation can be transformed into a simple form $\lambda_1y_1^2 + \cdots + \lambda_ny_n^2 = d$ (for some constant $d$) having only square terms. Hence, the geometric types of quadratic equations may be easily classified. However, these transformations of equations involve basis changes by orthogonal matrices.

Let us now consider a change of basis (or variables) and a relation between two different matrix representations of a quadratic form. Usually a quadratic form $x^TAx$ with a symmetric matrix $A$ is expressed in the coordinates of $x$ with respect to the standard basis $\alpha = \{e_1, e_2, \ldots, e_n\}$ for $\mathbb{R}^n$. Let $\beta = \{e_1', e_2', \ldots, e_n'\}$ be another basis for $\mathbb{R}^n$. Then any vector $x$ in $\mathbb{R}^n$ has two coordinate representations $[x]_\alpha$ and $[x]_\beta$ through the equations
$$x = x_1e_1 + \cdots + x_ne_n = x_1'e_1' + \cdots + x_n'e_n'.$$
They are related by $[x]_\alpha = P[x]_\beta$, where $P = [\mathrm{Id}]_\alpha^\beta$ is the transition matrix from $\beta$ to $\alpha$. This is just a change of coordinates (or variables). If we set $x = [x]_\alpha$ and $y = [x]_\beta$, then the quadratic form can be written as
$$x^TAx = (Py)^TA(Py) = y^T(P^TAP)y = y^TBy,$$
where $y^TBy$ is the expression of $x^TAx$ in the new basis (or coordinate system) $\beta$, and $B = P^TAP$.

Definition 8.3 Two $n \times n$ matrices $A$ and $B$ are said to be congruent if there exists an invertible matrix $P$ such that $P^TAP = B$.

It is easily seen that congruence is an equivalence relation on the vector space $M_{n\times n}(\mathbb{R})$, and any two matrix representations of a quadratic form with respect to different bases are congruent.

Remark: (1) Clearly, two orthogonally similar matrices are congruent, but the converse is not true in general. By Theorem 8.1, a symmetric matrix $A$ is congruent to a diagonal matrix $D$ via an orthogonal matrix $P$. However, it can be congruent to many diagonal matrices (not necessarily via orthogonal matrices). In fact, if $P^TAP = D$ for an orthogonal matrix $P$, then the matrix $Q = kP$, $k \ne 0$, also diagonalizes $A$ to a different diagonal matrix via the congruence relation:
$$Q^TAQ = k^2P^TAP = k^2D,$$
which is another diagonal matrix with diagonal entries $k^2\lambda_1, k^2\lambda_2, \ldots, k^2\lambda_n$. In this case, if $k \ne \pm1$, $Q$ is not an orthogonal matrix, and the resulting diagonal entries are no longer the eigenvalues of $A$.
(2) Sylvester's law of inertia (Theorem 8.13 in Section 8.7) says that even though a symmetric matrix $A$ may be congruent to various diagonal matrices, the numbers of positive, negative and zero diagonal entries of the congruent diagonal matrices do not change, no matter what the diagonalizing matrices are. That is, any two congruent symmetric matrices have the same inertia.
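Sylvester's law of inertia can be observed numerically: congruence changes the eigenvalues but not their signs. The sketch below assumes NumPy; the `inertia` helper (with an assumed zero tolerance `tol`) and the two invertible triangular matrices are illustrative choices.

```python
import numpy as np

def inertia(A, tol=1e-9):
    """(p, q, k): counts of positive, negative and (numerically) zero eigenvalues."""
    eigvals = np.linalg.eigvalsh(A)
    p = int(np.sum(eigvals > tol))
    q = int(np.sum(eigvals < -tol))
    return p, q, len(eigvals) - p - q

A = np.diag([3.0, -1.0, 0.0, 2.0])       # In(A) = (2, 1, 1)

# Two invertible (triangular, nonzero diagonal) matrices chosen for illustration.
P1 = np.array([[1.0, 2.0, 0.0, 1.0],
               [0.0, 1.0, 3.0, 0.0],
               [0.0, 0.0, 1.0, -1.0],
               [0.0, 0.0, 0.0, 2.0]])
P2 = np.array([[2.0, 0.0, 0.0, 0.0],
               [1.0, 1.0, 0.0, 0.0],
               [0.0, -1.0, 3.0, 0.0],
               [1.0, 0.0, 1.0, 1.0]])

for P in (P1, P2):
    B = P.T @ A @ P                       # B is congruent to A
    print(inertia(B))                     # (2, 1, 1) for both choices of P
```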
There is a practical way of diagonalizing a symmetric matrix through the congruence relation, using the elementary operation of adding a constant multiple of one row (or column) to another.

Suppose that we have diagonalized a symmetric matrix $A$ by an invertible matrix $P$ through the congruence relation $P^TAP = D$. Since both $P$ and $P^T$ are invertible matrices, $P^T$ can be written as a product of elementary matrices $E_i$, say $P^T = E_k\cdots E_2E_1$. Then we have
$$D = P^TAP = E_k\cdots E_2E_1\,A\,E_1^TE_2^T\cdots E_k^T.$$
Note also that multiplying $A$ on the left (on the right) by an elementary matrix is the same as executing an elementary row (column, respectively) operation on $A$. Clearly, if $E$ is an elementary matrix, so is $E^T$. Moreover, multiplying $A$ on the right by $E^T$ performs the same operation on the columns of $A$ as $E$ performs on the rows. Since $A$ is symmetric, this means that the operation $A \mapsto EAE^T$ has the same effect on diagonally opposite entries of $A$ simultaneously. For instance, the elementary matrix
$$E = \begin{bmatrix}1&0&0\\-1&1&0\\0&0&1\end{bmatrix}$$
adds $-1$ times the first row to the second row when applied on the left, and $E^T$, applied on the right, adds $-1$ times the first column to the second column.

Thus, the above equation implies that the operations performed on the left side of $A$ (i.e., $E_k\cdots E_2E_1$) are nothing but a Gaussian elimination on $A$ (assuming no row interchanges are needed) producing the upper triangular matrix $P^TA$, and those on the right side (i.e., $E_1^TE_2^T\cdots E_k^T$) are the corresponding column operations yielding the diagonal matrix $D$. In summary, if we perform a Gaussian elimination on $A$ by the elementary matrices $E_1, \ldots, E_k$, then $E_k\cdots E_1AE_1^T\cdots E_k^T = D$ implies $P = E_1^T\cdots E_k^T$, i.e.,
$$[A \mid I] \to [E_1AE_1^T \mid IE_1^T] \to [E_2E_1AE_1^TE_2^T \mid IE_1^TE_2^T] \to \cdots \to [E_k\cdots E_1AE_1^T\cdots E_k^T \mid IE_1^T\cdots E_k^T] = [D \mid P].$$
Remark: (1) Notice that in this case $P$ need not be an orthogonal matrix, and the diagonal entries of $D$ need not be eigenvalues of $A$.
(2) Be careful not to apply the same argument to the diagonalization of symmetric matrices through the similarity $P^{-1}AP = D$, because multiplying $A$ on the right by $E^{-1}$ is not the same column operation as $E$ (try it yourself), so the operations $EAE^{-1}$ do not work for the diagonalization of $A$.

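Example 8.5 Let
$$A = \begin{bmatrix}1&1&2\\1&0&3\\2&3&6\end{bmatrix}$$
be a symmetric matrix. The preceding method produces
$$[A \mid I] = \left[\begin{array}{ccc|ccc}1&1&2&1&0&0\\1&0&3&0&1&0\\2&3&6&0&0&1\end{array}\right] \to \left[\begin{array}{ccc|ccc}1&0&0&1&-1&-2\\0&-1&1&0&1&0\\0&1&2&0&0&1\end{array}\right]$$
$$\to [E_3E_2E_1AE_1^TE_2^TE_3^T \mid E_1^TE_2^TE_3^T] = \left[\begin{array}{ccc|ccc}1&0&0&1&-1&-3\\0&-1&0&0&1&1\\0&0&3&0&0&1\end{array}\right] = [D \mid P],$$
where
$$E_1 = \begin{bmatrix}1&0&0\\-1&1&0\\0&0&1\end{bmatrix},\qquad E_2 = \begin{bmatrix}1&0&0\\0&1&0\\-2&0&1\end{bmatrix},\qquad E_3 = \begin{bmatrix}1&0&0\\0&1&0\\0&1&1\end{bmatrix}.$$
One can check that $P^TAP = D$ by a direct computation. This example shows how to get $\mathrm{In}(A) = (2, 1, 0)$ without computing the eigenvalues of $A$. □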
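The paired row and column operations can be coded directly. The following is a minimal sketch assuming NumPy; `congruence_diagonalize` is a hypothetical name, and, like the hand computation above, it assumes every pivot it meets is nonzero (no row interchanges). A robust version would need extra steps when a zero pivot appears. Each row operation $E$ is immediately followed by the matching column operation $E^T$, and the same column operation is applied to the identity block, so the identity turns into $P$.

```python
import numpy as np

def congruence_diagonalize(A):
    """Diagonalize symmetric A by paired row/column operations: [A | I] -> [D | P].

    Assumes every pivot is nonzero (no row exchanges). Returns (D, P) with P^T A P = D.
    """
    M = np.array(A, dtype=float)
    n = M.shape[0]
    P = np.eye(n)
    for j in range(n):
        pivot = M[j, j]
        if np.isclose(pivot, 0.0):
            raise ValueError("zero pivot; this simple sketch does not handle it")
        for i in range(j + 1, n):
            m = M[i, j] / pivot
            M[i, :] -= m * M[j, :]   # row operation E: row_i -= m * row_j
            M[:, i] -= m * M[:, j]   # matching column operation E^T: col_i -= m * col_j
            P[:, i] -= m * P[:, j]   # same column operation on the identity block
    return M, P

A = np.array([[1, 1, 2],
              [1, 0, 3],
              [2, 3, 6]])
D, P = congruence_diagonalize(A)
print(np.diag(D))                    # diagonal entries 1, -1, 3
print(np.allclose(P.T @ A @ P, D))   # True
```

The signs of the diagonal entries give $\mathrm{In}(A) = (2, 1, 0)$, in agreement with the example.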

Problem 8.3 Find an invertible matrix $P$ such that $P^TAP$ is diagonal for each of the following matrices:

(1) A = [
0 1
1 1
-1
0
1; (2) A [ -31
=
-1 0 2 1

8.4 Extrema of quadratic forms


In a calculus course, one uses the second derivative test to see whether a given function $y = f(x)$ takes a local maximum or a local minimum at a critical point. In this section, we show a similar test for functions of more than one variable, and also show how quadratic forms arise and how they can be used in this context.

Let $f(x)$ be a real-valued function on $\mathbb{R}^n$. A point $x_0$ in $\mathbb{R}^n$ at which either a first partial derivative of $f$ fails to exist or the first partial derivatives of $f$ are all zero is called a critical point of $f$. If $f(x)$ has either a local maximum or a local minimum at a point $x_0$ and all the first partial derivatives of $f$ exist at $x_0$, then all of them must be zero, i.e., $f_{x_i}(x_0) = 0$ for all $i = 1, \ldots, n$. Thus, if $f(x)$ has first partial derivatives everywhere, its local maxima and minima occur at critical points.
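Let us first consider a function of two variables $f(x)$, $x = (x, y) \in \mathbb{R}^2$, which has a critical point $x_0 = (x_0, y_0) \in \mathbb{R}^2$. If $f$ has continuous third partial derivatives in a neighborhood of $x_0$, it can be expanded in a Taylor series about that point: for $x = (x_0 + h, y_0 + k)$,
$$\begin{aligned} f(x) = f(x_0 + h, y_0 + k) &= f(x_0) + \big(hf_x(x_0) + kf_y(x_0)\big) + \tfrac12\big(h^2f_{xx}(x_0) + 2hkf_{xy}(x_0) + k^2f_{yy}(x_0)\big) + R\\ &= f(x_0) + \tfrac12\big(ah^2 + 2bhk + ck^2\big) + R,\end{aligned}$$
where the first-order term vanishes because $x_0$ is a critical point,
$$a = f_{xx}(x_0),\qquad b = f_{xy}(x_0),\qquad c = f_{yy}(x_0),$$
and the remainder $R$ is given by
$$R = \frac16\big(h^3f_{xxx}(z) + 3h^2kf_{xxy}(z) + 3hk^2f_{xyy}(z) + k^3f_{yyy}(z)\big)$$
with $z = (x_0 + \theta h, y_0 + \theta k)$ for some $0 < \theta < 1$.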
If $h$ and $k$ are sufficiently small, $|R|$ will be smaller than the absolute value of $\frac12(ah^2 + 2bhk + ck^2)$, and hence $f(x) - f(x_0)$ and $ah^2 + 2bhk + ck^2$ will have the same sign. Note that the expression
$$q(h, k) = ah^2 + 2bhk + ck^2 = [h\ \ k]\,H\begin{bmatrix}h\\k\end{bmatrix}$$
is a quadratic form in the variables $h$ and $k$, where
$$H = \begin{bmatrix}a&b\\b&c\end{bmatrix} = \begin{bmatrix}f_{xx}(x_0) & f_{xy}(x_0)\\ f_{xy}(x_0) & f_{yy}(x_0)\end{bmatrix}$$
is a symmetric matrix, called the Hessian of $f$ at $x_0 = (x_0, y_0)$. Hence, $f(x, y)$ has a local minimum (or maximum) at $x_0$ if the quadratic form $q(h, k)$ is positive (or negative, respectively) for all nonzero $(h, k)$. The critical point $x_0$ is called a saddle point if $q(h, k)$ takes both positive and negative values; at such a point $f(x, y)$ has neither a local minimum nor a local maximum. (This is the second derivative test for local extrema of $f(x, y)$.)
In particular, a quadratic form
$$q(x) = x^TAx = [x\ \ y]\begin{bmatrix}a&b\\b&c\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = ax^2 + 2bxy + cy^2,$$
for $x = [x\ \ y]^T \in \mathbb{R}^2$, is itself a function of two variables, and its first partial derivatives are
$$q_x = 2ax + 2by,\qquad q_y = 2bx + 2cy.$$
By setting these equal to zero, we see that $0 = (0, 0)$ is a critical point of $q$. If $ac - b^2 \ne 0$, this is the only critical point of $q$. Note that the Hessian of $q$ is
$$H = 2\begin{bmatrix}a&b\\b&c\end{bmatrix} = 2A.$$
Thus, $H$ is nonsingular if and only if $ac - b^2 \ne 0$.

Since $q(0) = 0$, it follows that the quadratic form $q$ takes its global minimum only at $0$ if and only if
$$q(x) = x^TAx > 0 \quad\text{for all } x \ne 0,$$
and $q$ takes its global maximum only at $0$ if and only if
$$q(x) = x^TAx < 0 \quad\text{for all } x \ne 0.$$
If $x^TAx$ takes both positive and negative values, then $0$ is a saddle point. Thus, if $A$ is nonsingular, the quadratic form $q$ has either the global minimum, the global maximum or a saddle point at $0$.

This argument leads us to the following general definition for a symmetric matrix.
Definition 8.4 Let $A = [a_{ij}] \in M_{n\times n}(\mathbb{R})$ be a symmetric matrix and let $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$. Then $A$ is said to be
(1) positive definite if $x^TAx = \sum_{i,j}a_{ij}x_ix_j > 0$ for all nonzero $x$,
(2) positive semidefinite if $x^TAx = \sum_{i,j}a_{ij}x_ix_j \ge 0$ for all $x$,
(3) negative definite if $x^TAx = \sum_{i,j}a_{ij}x_ix_j < 0$ for all nonzero $x$,
(4) negative semidefinite if $x^TAx = \sum_{i,j}a_{ij}x_ix_j \le 0$ for all $x$,
(5) indefinite if $x^TAx$ takes both positive and negative values.

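For example, the real symmetric matrix
$$\begin{bmatrix}2&-1&0\\-1&2&-1\\0&-1&2\end{bmatrix}$$
is positive definite, because the quadratic form satisfies
$$\begin{aligned} [x_1\ x_2\ x_3]\begin{bmatrix}2x_1 - x_2\\ -x_1 + 2x_2 - x_3\\ -x_2 + 2x_3\end{bmatrix} &= x_1(2x_1 - x_2) + x_2(-x_1 + 2x_2 - x_3) + x_3(-x_2 + 2x_3)\\ &= 2x_1^2 - 2x_1x_2 + 2x_2^2 - 2x_2x_3 + 2x_3^2\\ &= x_1^2 + (x_1 - x_2)^2 + (x_2 - x_3)^2 + x_3^2 > 0\end{aligned}$$
unless $x_1 = x_2 = x_3 = 0$.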
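To determine whether or not a matrix $A$ is positive definite, one can diagonalize $A$ orthogonally so that
$$\mathbf{x}^T A\mathbf{x} = \sum_{i,j=1}^{n} a_{ij}x_ix_j = \mathbf{y}^T D\mathbf{y} = \sum_{i=1}^{n} \lambda_i y_i^2,$$
where $\mathbf{y} = P^T\mathbf{x}$ for an orthogonal matrix $P$ and the $\lambda_i$'s are the eigenvalues of $A$. Since $P$ is invertible, $\mathbf{y} \neq \mathbf{0}$ exactly when $\mathbf{x} \neq \mathbf{0}$. Therefore, $\mathbf{x}^T A\mathbf{x} > 0$ for all nonzero $\mathbf{x} \in \mathbb{R}^n$ if and only if all the $\lambda_i$'s are positive.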
Consequently we have the following characterization of positive definite
matrices:

Theorem 8.2 A real symmetric $n \times n$ matrix $A$ is positive definite if and only if all the eigenvalues of $A$ are positive.

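In particular, if $A$ is positive definite, then $\det A > 0$, since $\det A$ is the product of the eigenvalues of $A$. If the eigenvalues of $A$ are all negative, then $-A$ must be positive definite and consequently $A$ must be negative definite. If $A$ has eigenvalues that differ in sign, then $A$ is indefinite. Indeed, if $\lambda_1$ is a positive eigenvalue of $A$ and $\mathbf{x}_1$ is an eigenvector belonging to $\lambda_1$, then
$$\mathbf{x}_1^T A\mathbf{x}_1 = \lambda_1\mathbf{x}_1^T\mathbf{x}_1 = \lambda_1\|\mathbf{x}_1\|^2 > 0,$$
and if $\lambda_2$ is a negative eigenvalue with eigenvector $\mathbf{x}_2$, then
$$\mathbf{x}_2^T A\mathbf{x}_2 = \lambda_2\mathbf{x}_2^T\mathbf{x}_2 = \lambda_2\|\mathbf{x}_2\|^2 < 0.$$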
If $A$ is definite, then $\mathbf{0}$ is the only critical point of the quadratic form $q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$, and $q(\mathbf{0}) = 0$ is the global minimum if $A$ is positive definite and the global maximum if $A$ is negative definite. If $A$ is indefinite, then $\mathbf{0}$ is a saddle point. Hence, if a function $f$ of two variables has a nonsingular Hessian $H$ at a critical point $\mathbf{x}_0 = (x_0, y_0)$, so that both eigenvalues $\lambda_1$ and $\lambda_2$ of $H$ are nonzero, then the second derivative test for $f$ can be stated as follows:
(1) $f$ has a local minimum at $\mathbf{x}_0$ if $\lambda_1 > 0$ and $\lambda_2 > 0$,
(2) $f$ has a local maximum at $\mathbf{x}_0$ if $\lambda_1 < 0$ and $\lambda_2 < 0$,
(3) $f$ has a saddle point at $\mathbf{x}_0$ if $\lambda_1$ and $\lambda_2$ have different signs.
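The two-variable test above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `classify_2d` and the tolerance `tol` used to declare an eigenvalue zero are hypothetical choices, not part of the text.

```python
import numpy as np

def classify_2d(H, tol=1e-12):
    """Classify a critical point from its 2x2 symmetric Hessian H.

    Returns 'minimum', 'maximum', 'saddle point', or 'inconclusive'
    when H is (numerically) singular and the test does not apply.
    """
    lam1, lam2 = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if min(abs(lam1), abs(lam2)) <= tol:
        return "inconclusive"      # the test needs a nonsingular Hessian
    if lam1 > 0 and lam2 > 0:
        return "minimum"
    if lam1 < 0 and lam2 < 0:
        return "maximum"
    return "saddle point"          # eigenvalues of opposite sign

print(classify_2d([[2, 0], [0, 3]]))    # minimum
print(classify_2d([[1, 0], [0, -1]]))   # saddle point
```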

Example 8.6 For $q(x, y) = 2x^2 - 4xy + 5y^2$, determine the nature of the critical point $(0, 0)$.
Solution: The matrix $A$ of the quadratic form is
$$A = \begin{bmatrix} 2 & -2\\ -2 & 5\end{bmatrix}.$$
Its eigenvalues are $\lambda_1 = 6$ and $\lambda_2 = 1$. Since both eigenvalues are positive, $A$ is positive definite and hence $(0, 0)$ is a global minimum. □
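For reference, the eigenvalues in this solution come from the characteristic polynomial of $A$; completing the square gives an independent confirmation that $q$ is positive definite:

```latex
\det(A - \lambda I) = (2-\lambda)(5-\lambda) - 4 = \lambda^2 - 7\lambda + 6 = (\lambda - 1)(\lambda - 6),
\qquad
q(x, y) = 2x^2 - 4xy + 5y^2 = 2(x - y)^2 + 3y^2 .
```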

Example 8.7 Find and describe all critical points of the function
$$f(x, y) = \frac{1}{3}x^3 + xy^2 - 4xy + 1.$$

Solution: The first partial derivatives of $f$ are
$$f_x = x^2 + y^2 - 4y, \qquad f_y = 2xy - 4x = 2x(y - 2).$$
Setting $f_y = 0$, we get $x = 0$ or $y = 2$. Setting $f_x = 0$, we see that if $x = 0$, then $y$ must be either $0$ or $4$, and if $y = 2$, then $x = \pm 2$. Thus $(0, 0)$, $(0, 4)$, $(2, 2)$, $(-2, 2)$ are the critical points of $f$. To classify these critical points, we compute the second partial derivatives:
$$f_{xx} = 2x, \qquad f_{xy} = 2y - 4, \qquad f_{yy} = 2x.$$
For each critical point $(x_0, y_0)$, we determine the eigenvalues $\lambda_1$ and $\lambda_2$ of the Hessian
$$H = \begin{bmatrix} 2x_0 & 2y_0 - 4\\ 2y_0 - 4 & 2x_0\end{bmatrix}.$$
These values are summarized in the following table:

Critical point (x_0, y_0)    λ1    λ2    Description
(0, 0)                        4    -4    saddle point
(0, 4)                        4    -4    saddle point
(2, 2)                        4     4    local minimum
(-2, 2)                      -4    -4    local maximum
□
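The table can be double-checked numerically. The sketch below assumes NumPy; `gradient` and `hessian` are hypothetical helper names that simply encode the first and second partial derivatives computed above.

```python
import numpy as np

def gradient(x, y):
    # f_x = x^2 + y^2 - 4y,  f_y = 2x(y - 2)
    return np.array([x**2 + y**2 - 4*y, 2*x*(y - 2)], dtype=float)

def hessian(x, y):
    # f_xx = 2x,  f_xy = 2y - 4,  f_yy = 2x
    return np.array([[2*x, 2*y - 4],
                     [2*y - 4, 2*x]], dtype=float)

critical_points = [(0, 0), (0, 4), (2, 2), (-2, 2)]
eigs = {}
for p in critical_points:
    assert np.allclose(gradient(*p), 0.0)       # p really is a critical point
    eigs[p] = np.linalg.eigvalsh(hessian(*p))   # eigenvalues in ascending order
    print(p, eigs[p])
```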

In general, the above arguments can be extended to the second derivative test for functions of more than two variables: Let $f(\mathbf{x}) = f(x_1, \ldots, x_n)$ be a real-valued function whose third partial derivatives are all continuous. If $\mathbf{x}_0$ is a critical point of $f$, the Hessian of $f$ at $\mathbf{x}_0$ is the symmetric matrix $H = H(\mathbf{x}_0) = [h_{ij}]$ given by
$$h_{ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{x}_0), \qquad 1 \le i, j \le n.$$
The critical point can be classified as follows:
(1) $f$ has a local minimum at $\mathbf{x}_0$ if $H(\mathbf{x}_0)$ is positive definite,
(2) $f$ has a local maximum at $\mathbf{x}_0$ if $H(\mathbf{x}_0)$ is negative definite,
(3) $\mathbf{x}_0$ is a saddle point of $f$ if $H(\mathbf{x}_0)$ is indefinite.
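When the second partial derivatives are tedious to compute by hand, the test above can be approximated numerically. The sketch below assumes NumPy and uses central finite differences with a hypothetical step size `h` and tolerance `tol`; finite differences are an approximation chosen here for illustration, not a method from the text, and they can misclassify points where an eigenvalue of $H$ is close to zero.

```python
import numpy as np

def numerical_hessian(f, x0, h=1e-4):
    """Approximate the Hessian of f at x0 with central differences."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    E = h * np.eye(n)                 # row i is h * e_i
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x0 + E[i] + E[j]) - f(x0 + E[i] - E[j])
                       - f(x0 - E[i] + E[j]) + f(x0 - E[i] - E[j])) / (4 * h * h)
    return (H + H.T) / 2              # symmetrize away round-off differences

def classify(H, tol=1e-6):
    """Second derivative test from the eigenvalues of a symmetric Hessian."""
    lam = np.linalg.eigvalsh(H)
    if np.any(np.abs(lam) <= tol):
        return "inconclusive"         # singular (or nearly singular) Hessian
    if np.all(lam > 0):
        return "local minimum"
    if np.all(lam < 0):
        return "local maximum"
    return "saddle point"

def saddle(v):
    return v[0] ** 2 - v[1] ** 2      # has a saddle point at the origin

print(classify(numerical_hessian(saddle, [0.0, 0.0])))   # saddle point
```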

Example 8.8 Find the local extrema of the function
$$f(x, y, z) = x^2 + xz - 3\cos y + z^2.$$
Solution: The first partial derivatives of $f$ are
$$f_x = 2x + z, \qquad f_y = 3\sin y, \qquad f_z = x + 2z.$$
It follows that $(x, y, z)$ is a critical point of $f$ if and only if $x = z = 0$ and $y = n\pi$, where $n$ is an integer. Let $\mathbf{x}_0 = (0, 2k\pi, 0)$. The Hessian of $f$ at $\mathbf{x}_0$ is given by
$$H(\mathbf{x}_0) = \begin{bmatrix} 2 & 0 & 1\\ 0 & 3 & 0\\ 1 & 0 & 2\end{bmatrix}.$$
The eigenvalues of $H(\mathbf{x}_0)$ are 3, 3, and 1. Since the eigenvalues are all positive, it follows that $H(\mathbf{x}_0)$ is positive definite and hence $f$ has a local minimum at $\mathbf{x}_0$. On the other hand, at a critical point of the form $\mathbf{x}_1 = (0, (2k - 1)\pi, 0)$, the Hessian will be
$$H(\mathbf{x}_1) = \begin{bmatrix} 2 & 0 & 1\\ 0 & -3 & 0\\ 1 & 0 & 2\end{bmatrix}.$$
The eigenvalues of $H(\mathbf{x}_1)$ are $-3$, 3, and 1. It follows that $H(\mathbf{x}_1)$ is indefinite and hence $\mathbf{x}_1$ is a saddle point of $f$. □
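A numerical check of the two Hessians in this example, as a sketch assuming NumPy (the choice `k = 1` is arbitrary; any integer k gives the same matrices because cos y is 2π-periodic):

```python
import numpy as np

def hessian(y):
    # Second partials of f(x, y, z) = x^2 + x z - 3 cos y + z^2;
    # only f_yy = 3 cos y depends on the point.
    return np.array([[2.0, 0.0, 1.0],
                     [0.0, 3.0 * np.cos(y), 0.0],
                     [1.0, 0.0, 2.0]])

k = 1                                                     # any integer k works
at_x0 = np.linalg.eigvalsh(hessian(2 * k * np.pi))        # x0 = (0, 2k*pi, 0)
at_x1 = np.linalg.eigvalsh(hessian((2 * k - 1) * np.pi))  # x1 = (0, (2k-1)*pi, 0)
print(at_x0)   # ascending: 1, 3, 3  (positive definite)
print(at_x1)   # ascending: -3, 1, 3 (indefinite)
```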

Problem 8.4 For each of the following functions, determine whether the given critical point corresponds to a local minimum, local maximum, or saddle point:
(1) $f(x, y) = 3x^2 - xy + y^2$ at $(0, 0)$;
(2) $f(x, y, z) = x^3 + xyz + y^2 - 3x$ at $(1, 0, 0)$.

Problem 8.5 Which of the following matrices are positive definite? negative definite? indefinite?
$$(1)\ \begin{bmatrix} 1 & 2 & 1\\ 2 & 1 & 1\\ 1 & 1 & 2\end{bmatrix}, \qquad (2)\ \begin{bmatrix} 2 & 0 & 0\\ 0 & 5 & 3\\ 0 & 3 & 5\end{bmatrix}.$$

8.5 Application: Quadratic optimization


One of the most important problems in applied mathematics is the optimization (minimization or maximization) of a real-valued function $f$ of $n$ variables subject to constraints on the variables. For example, when the function $f$ is a linear form subject to constraints in the form of linear equalities and/or inequalities, the optimization problem is known as linear programming. Such optimization problems are used extensively in military, industrial, and governmental planning, among other fields.
In this section, we consider an optimization problem for a quadratic form in $n$ variables. The case with no constraints on the variables was discussed in Section 8.4.
As a quadratic optimization problem with constraints, we consider a very special one: finding the maximum and minimum values of a quadratic form $q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$ on the unit sphere. Advanced calculus tells us that such extrema of $q(\mathbf{x})$ on the unit sphere always exist, since $q$ is continuous and the unit sphere is closed and bounded.
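Theorem 8.3 Let $A$ be a symmetric $n \times n$ matrix whose eigenvalues are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ in descending order. If $\mathbf{x}$ is constrained so that $\|\mathbf{x}\| = 1$ relative to the Euclidean inner product on $\mathbb{R}^n$, then
(1) $\lambda_1 \ge \mathbf{x}^T A\mathbf{x} \ge \lambda_n$,
(2) $\mathbf{x}^T A\mathbf{x} = \lambda$ if $\mathbf{x}$ is an eigenvector of $A$ belonging to an eigenvalue $\lambda$.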

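Proof: (1) Since $A$ is symmetric, there is an orthonormal basis $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ for $\mathbb{R}^n$ consisting of eigenvectors of $A$ belonging to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively. If $\langle\,,\rangle$ denotes the Euclidean inner product, then any $\mathbf{x}$ in $\mathbb{R}^n$ may be expressed as
$$\mathbf{x} = \langle\mathbf{x}, \mathbf{v}_1\rangle\mathbf{v}_1 + \langle\mathbf{x}, \mathbf{v}_2\rangle\mathbf{v}_2 + \cdots + \langle\mathbf{x}, \mathbf{v}_n\rangle\mathbf{v}_n.$$
Thus,
$$\begin{aligned} A\mathbf{x} &= \langle\mathbf{x}, \mathbf{v}_1\rangle A\mathbf{v}_1 + \langle\mathbf{x}, \mathbf{v}_2\rangle A\mathbf{v}_2 + \cdots + \langle\mathbf{x}, \mathbf{v}_n\rangle A\mathbf{v}_n\\ &= \lambda_1\langle\mathbf{x}, \mathbf{v}_1\rangle\mathbf{v}_1 + \lambda_2\langle\mathbf{x}, \mathbf{v}_2\rangle\mathbf{v}_2 + \cdots + \lambda_n\langle\mathbf{x}, \mathbf{v}_n\rangle\mathbf{v}_n.\end{aligned}$$
If $\|\mathbf{x}\|^2 = \langle\mathbf{x}, \mathbf{v}_1\rangle^2 + \langle\mathbf{x}, \mathbf{v}_2\rangle^2 + \cdots + \langle\mathbf{x}, \mathbf{v}_n\rangle^2 = 1$, then we obtain
$$\begin{aligned} \mathbf{x}^T A\mathbf{x} = \langle\mathbf{x}, A\mathbf{x}\rangle &= \lambda_1\langle\mathbf{x}, \mathbf{v}_1\rangle^2 + \lambda_2\langle\mathbf{x}, \mathbf{v}_2\rangle^2 + \cdots + \lambda_n\langle\mathbf{x}, \mathbf{v}_n\rangle^2\\ &\le \lambda_1\langle\mathbf{x}, \mathbf{v}_1\rangle^2 + \lambda_1\langle\mathbf{x}, \mathbf{v}_2\rangle^2 + \cdots + \lambda_1\langle\mathbf{x}, \mathbf{v}_n\rangle^2\\ &= \lambda_1\left(\langle\mathbf{x}, \mathbf{v}_1\rangle^2 + \langle\mathbf{x}, \mathbf{v}_2\rangle^2 + \cdots + \langle\mathbf{x}, \mathbf{v}_n\rangle^2\right)\\ &= \lambda_1,\end{aligned}$$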

since $\lambda_1$ is the largest eigenvalue. Similarly, one can show $\lambda_n \le \mathbf{x}^T A\mathbf{x}$.


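(2) If $\mathbf{x}$ is an eigenvector of $A$ belonging to $\lambda$ and $\|\mathbf{x}\| = 1$, then
$$\mathbf{x}^T A\mathbf{x} = \langle\mathbf{x}, A\mathbf{x}\rangle = \langle\mathbf{x}, \lambda\mathbf{x}\rangle = \lambda\langle\mathbf{x}, \mathbf{x}\rangle = \lambda\|\mathbf{x}\|^2 = \lambda. \qquad □$$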

It follows from the preceding theorem that, subject to the constraint
$$\|\mathbf{x}\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2} = 1,$$
the quadratic form $\mathbf{x}^T A\mathbf{x}$ has the maximum value $\lambda_1$ (the largest eigenvalue) and the minimum value $\lambda_n$ (the smallest eigenvalue), attained at corresponding unit eigenvectors. This fact is very important in vibration problems ranging from aerodynamics to particle physics.
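A small numerical illustration of Theorem 8.3, as a sketch assuming NumPy (the random symmetric test matrix, the seed and the sample size are arbitrary choices): every sampled unit vector gives a value of $\mathbf{x}^T A\mathbf{x}$ between the smallest and the largest eigenvalue, and unit eigenvectors attain the bounds.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                      # an arbitrary symmetric test matrix
lam, V = np.linalg.eigh(A)             # ascending eigenvalues, orthonormal eigenvectors
lam_min, lam_max = lam[0], lam[-1]

# Evaluate x^T A x on many random unit vectors x (the rows of X).
X = rng.standard_normal((10_000, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
values = np.einsum("ij,jk,ik->i", X, A, X)

tol = 1e-12                            # slack for floating-point round-off
print(lam_min - tol <= values.min(), values.max() <= lam_max + tol)   # True True
# Unit eigenvectors attain the bounds.
print(np.isclose(V[:, 0] @ A @ V[:, 0], lam_min),
      np.isclose(V[:, -1] @ A @ V[:, -1], lam_max))                   # True True
```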

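Corollary 8.4 Let $A$ be a symmetric matrix. For any nonzero vector $\mathbf{x} \in \mathbb{R}^n$, we get
$$\lambda_n \le \frac{\mathbf{x}^T A\mathbf{x}}{\mathbf{x}^T\mathbf{x}} \le \lambda_1,$$
where $\lambda_1$ and $\lambda_n$ are the largest and the smallest eigenvalues of $A$, respectively.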

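Example 8.9 Find the maximum and minimum values of the quadratic form
$$x_1^2 + x_2^2 + 4x_1x_2$$
subject to the constraint $x_1^2 + x_2^2 = 1$, and determine the values of $x_1$ and $x_2$ at which the maximum and minimum occur.
Solution: The quadratic form can be written as
$$x_1^2 + x_2^2 + 4x_1x_2 = \mathbf{x}^T A\mathbf{x} = \begin{bmatrix} x_1 & x_2\end{bmatrix}\begin{bmatrix} 1 & 2\\ 2 & 1\end{bmatrix}\begin{bmatrix} x_1\\ x_2\end{bmatrix}.$$
The eigenvalues of $A$ are $\lambda = 3$ and $\lambda = -1$, which are the largest and smallest eigenvalues, respectively. Their corresponding unit eigenvectors are
$$\mathbf{v}_1 = \begin{bmatrix} 1/\sqrt{2}\\ 1/\sqrt{2}\end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} 1/\sqrt{2}\\ -1/\sqrt{2}\end{bmatrix},$$
respectively. Note that the extreme values of the quadratic form occur at these unit eigenvectors.
Thus, subject to the constraint $x_1^2 + x_2^2 = 1$, the maximum value of the quadratic form is $\lambda = 3$, which occurs at $\mathbf{x} = \pm(1/\sqrt{2}, 1/\sqrt{2})$, and the minimum value is $\lambda = -1$, which occurs at $\mathbf{x} = \pm(1/\sqrt{2}, -1/\sqrt{2})$. □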
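The same answer can be read off from a symmetric eigensolver. This sketch assumes NumPy; `numpy.linalg.eigh` returns unit eigenvectors as columns, each determined only up to sign, which matches the "±" in the solution.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])         # x1^2 + x2^2 + 4 x1 x2 = x^T A x

def q(x):
    return x[0] ** 2 + x[1] ** 2 + 4 * x[0] * x[1]

lam, V = np.linalg.eigh(A)         # ascending: -1, 3
x_min, x_max = V[:, 0], V[:, 1]    # unit eigenvectors, each only up to sign
print(lam)
print(q(x_max), q(x_min))          # 3 and -1, up to rounding
```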

Problem 8.6 Find the maximum and minimum values of the quadratic form
$$2x_1^2 + 2x_2^2 + 3x_1x_2$$
subject to the constraint $x_1^2 + x_2^2 = 1$, and determine the values of $x_1$ and $x_2$ at which the maximum and minimum occur.

Problem 8.7 Find the maximum and minimum of the following quadratic forms subject to the constraint $x_1^2 + x_2^2 + x_3^2 = 1$ and determine the values of $x_1$, $x_2$, and $x_3$ at which the maximum and minimum occur:
(1) $x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_2 + 4x_1x_3 + 4x_2x_3$,
(2) $2x_1^2 + x_2^2 + x_3^2 + 2x_1x_3 + 2x_1x_2$.

8.6 Definite forms


So far, we have seen that it is important to determine whether or not a symmetric matrix $A$ is positive definite. In most cases, the definition does not help much. But Theorem 8.2 gives us a practical characterization of positive definite matrices: $A$ is positive definite if and only if all eigenvalues of $A$ are positive. We will find some other practical criteria in terms of determinants of submatrices of $A$. For this, we again look at the quadratic form in two variables, $q(x, y) = ax^2 + 2bxy + cy^2$, which may be rewritten (when $a \neq 0$) in completed-square form as
$$q(\mathbf{x}) = ax^2 + 2bxy + cy^2 = a\left(x + \frac{b}{a}y\right)^2 + \left(c - \frac{b^2}{a}\right)y^2.$$
We see that $q$ is positive definite, i.e., $q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x} > 0$ for any nonzero vector $\mathbf{x} = (x, y) \in \mathbb{R}^2$, if and only if $a > 0$ and $ac > b^2$, or equivalently, the determinants of
$$[a] \quad\text{and}\quad \begin{bmatrix} a & b\\ b & c\end{bmatrix}$$
are positive.
The natural generalization of these conditions involves the $n$ submatrices of $A$ called the principal submatrices of $A$, which are defined as the upper left square submatrices
$$A_1 = [a_{11}], \quad A_2 = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}, \quad \ldots, \quad A_n = A.$$

With this construction, we have the following characterization of positive definite matrices.

Theorem 8.5 The following are equivalent for a real symmetric matrix $A$:
(1) $A$ is positive definite, i.e., $\mathbf{x}^T A\mathbf{x} > 0$ for all nonzero vectors $\mathbf{x}$;
(2) all the eigenvalues of $A$ are positive;
(3) all the principal submatrices $A_k$ have positive determinants;
(4) all the pivots (without row interchanges) are positive;
(5) there exists a nonsingular matrix $W$ such that $A = W^TW$.

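Proof: (1) ⇔ (2) was shown in Theorem 8.2.
(2) ⇒ (3) If $A$ has positive eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then $\det A = \lambda_1\lambda_2\cdots\lambda_n > 0$. To prove the same result for all the submatrices $A_k$, we show that if $A$ is positive definite, so is every $A_k$. For each $k = 1, \ldots, n$, consider all the vectors whose last $n - k$ components are zero, say $\mathbf{x} = [x_1\ \cdots\ x_k\ 0\ \cdots\ 0]^T = [\mathbf{x}_k^T\ \mathbf{0}^T]^T$, where $\mathbf{x}_k$ is any vector in $\mathbb{R}^k$. Then
$$\mathbf{x}^T A\mathbf{x} = \begin{bmatrix} \mathbf{x}_k^T & \mathbf{0}^T\end{bmatrix}\begin{bmatrix} A_k & *\\ * & *\end{bmatrix}\begin{bmatrix} \mathbf{x}_k\\ \mathbf{0}\end{bmatrix} = \mathbf{x}_k^T A_k\mathbf{x}_k.$$
Thus $\mathbf{x}^T A\mathbf{x} > 0$ for all such nonzero $\mathbf{x}$ if and only if $\mathbf{x}_k^T A_k\mathbf{x}_k > 0$ for all nonzero $\mathbf{x}_k \in \mathbb{R}^k$; that is, each $A_k$ is positive definite, so all eigenvalues of $A_k$ are positive and its determinant is positive.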
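(3) ⇒ (4) Recall that, since every $\det A_k$ is nonzero, elimination on $A$ requires no row interchanges, and the symmetric matrix $A$ can be factorized uniquely into the form
$$A = LDL^T,$$
where $L$ is a lower triangular matrix with 1's on its diagonal and $D$ is the diagonal matrix with the pivots $d_k$ of $A$ on the diagonal. But the $k$-th pivot $d_k$ is exactly the ratio of $\det A_k$ to $\det A_{k-1}$ (with $\det A_0 = 1$):
$$d_k = \frac{\det A_k}{\det A_{k-1}}.$$
Hence, all $d_k$'s are positive.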
(4) ⇒ (5) Let $A = LDL^T$ as above with
$$D = \begin{bmatrix} d_1 & & 0\\ & \ddots & \\ 0 & & d_n\end{bmatrix}.$$

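Define
$$\sqrt{D} = \begin{bmatrix} \sqrt{d_1} & & 0\\ & \ddots & \\ 0 & & \sqrt{d_n}\end{bmatrix}.$$
Then, clearly $\det(\sqrt{D}) > 0$, $D = \sqrt{D}\sqrt{D}$ and $(\sqrt{D})^T = \sqrt{D}$. Hence,
$$A = LDL^T = (L\sqrt{D})(\sqrt{D}L^T) = (L\sqrt{D})(L\sqrt{D})^T = W^TW,$$
where $W = (L\sqrt{D})^T$, which is nonsingular since $L$ and $\sqrt{D}$ are.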


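(5) ⇒ (1) If $A$ is real symmetric and $A = W^TW$, where $W$ is nonsingular, then for $\mathbf{x} \neq \mathbf{0}$ we have
$$\mathbf{x}^T A\mathbf{x} = \mathbf{x}^T W^TW\mathbf{x} = (W\mathbf{x})^T(W\mathbf{x}) = \|W\mathbf{x}\|^2 > 0,$$
because $W\mathbf{x} \neq \mathbf{0}$. □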
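The steps (3) ⇒ (4) ⇒ (5) of this proof can be checked numerically on the positive definite matrix from the example after Definition 8.4. This is a sketch assuming NumPy; `pivots` and `leading_minor_ratios` are hypothetical helper names. It confirms $d_k = \det A_k / \det A_{k-1}$ and builds $W$ from NumPy's Cholesky factorization, which for a positive definite matrix agrees with $W = (L\sqrt{D})^T$ because the lower triangular factor with positive diagonal is unique.

```python
import numpy as np

def pivots(A):
    """Pivots of Gaussian elimination on A without row interchanges."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    for k in range(n):
        if U[k, k] == 0.0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return np.diag(U).copy()

def leading_minor_ratios(A):
    """Ratios det A_k / det A_{k-1}, with the convention det A_0 = 1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    dets = [1.0] + [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
    return np.array([dets[k] / dets[k - 1] for k in range(1, n + 1)])

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

d = pivots(A)
print(d)                          # 2, 3/2, 4/3
print(leading_minor_ratios(A))    # the same values

# numpy.linalg.cholesky returns lower-triangular C with A = C C^T
# (and raises LinAlgError if A is not positive definite), so W = C^T.
C = np.linalg.cholesky(A)
W = C.T
print(np.allclose(W.T @ W, A))            # True
print(np.allclose(np.diag(W) ** 2, d))    # True: squared diagonal of W = pivots
```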

Problem 8.8 State the corresponding conditions to the ones in Theorem 8.5 for the
negative definite forms.

Problem 8.9 Determine which one of the following matrices $A$ and $B$ is positive definite. For the positive definite one, find a nonsingular matrix $W$ such that it equals $W^TW$.
A ~ [=: ~~ =n' B ~ -~ -~


[ n
Problem 8.10 Let $A$ be a positive definite matrix. Prove that $C^TAC$ is also positive definite for any nonsingular matrix $C$.

We now consider semidefinite matrices. One can easily establish the following analogous theorem.

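Theorem 8.6 The following are equivalent for a real symmetric matrix $A$:
(1) $A$ is positive semidefinite, i.e., $\mathbf{x}^T A\mathbf{x} \ge 0$ for all vectors $\mathbf{x}$;
(2) all the eigenvalues of $A$ are nonnegative;
(3) all the principal minors of $A$, i.e., the determinants of the square submatrices obtained by deleting the same set of rows and columns (not only the upper left submatrices $A_k$), are nonnegative;
(4) all the pivots (without row exchanges) are nonnegative;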

(5) there exists a matrix $W$, possibly singular, such that $A = W^TW$.


Problem 8.11 State the corresponding conditions to the ones in Theorem 8.6 for the negative semidefinite forms.
Problem 8.12 Show that the determinant of a negative definite $n \times n$ symmetric matrix is positive if $n$ is even and negative if $n$ is odd.

8.7 Bilinear forms


In the study of a system of linear equations, it is essential to know which properties of a matrix remain unchanged (i.e., invariant) under the elementary row operations (or Gauss-Jordan elimination). For example, the row space R(A), the null space N(A) and the rank of $A$ are invariant under the elementary row operations.
If we regard a matrix $A$ as a linear transformation on a vector space, a change of basis for the vector space that diagonalizes $A$ gives rise to a similarity relation $S^{-1}AS = D$ for some nonsingular matrix $S$, and we know that similar matrices have the same eigenvalues and determinants.
However, as we mentioned in Section 8.1, the diagonalization of a symmetric matrix $A$, considered as a quadratic form on a vector space, can be obtained by the congruence $P^TAP = D$ for some invertible matrix $P$, which is also a change of basis (i.e., a change of variables) for the vector space. In this case there may be various matrices $P$ that diagonalize $A$, so the diagonal entries of $D$ need not be eigenvalues of $A$ and may vary depending on $P$; that is, the eigenvalues are no longer invariant under the congruence relation.
Nevertheless, Sylvester's law of inertia says that the inertia, i.e., the numbers of positive and negative diagonal entries, does not change whatever $P$ is; that is, it is invariant under the congruence relation. In this section we will prove this. For this, we extend quadratic forms to more general forms:

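Definition 8.5 A bilinear form on a pair of vector spaces $V$ and $W$ is a real-valued function $b$ on $V \times W$ ($b : V \times W \to \mathbb{R}$) satisfying
$$b(k\mathbf{x} + \mathbf{x}', \mathbf{y}) = kb(\mathbf{x}, \mathbf{y}) + b(\mathbf{x}', \mathbf{y}),$$
$$b(\mathbf{x}, k\mathbf{y} + \mathbf{y}') = kb(\mathbf{x}, \mathbf{y}) + b(\mathbf{x}, \mathbf{y}')$$
for any $\mathbf{x}, \mathbf{x}'$ in $V$, $\mathbf{y}, \mathbf{y}'$ in $W$ and any scalar $k$. In particular, if $V = W$, then $b : V \times V \to \mathbb{R}$ is called a bilinear form on $V$.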

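Example 8.10 Let $A$ be an $m \times n$ matrix and let $b : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ be defined by $b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T A\mathbf{y}$ for $\mathbf{x} \in \mathbb{R}^m$, $\mathbf{y} \in \mathbb{R}^n$. Then $b$ is clearly a bilinear form. In particular, if $m = n$ and $A = I_n$, the identity matrix, then it shows that the Euclidean inner product on $\mathbb{R}^n$ is a bilinear form. More generally, for any inner product space $V$, the inner product $b : V \times V \to \mathbb{R}$, $b(\mathbf{x}, \mathbf{y}) = \langle\mathbf{x}, \mathbf{y}\rangle$, is a bilinear form on $V$.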

Example 8.11 Let $V$ be a vector space and $V^*$ its dual vector space, that is, $V^* = \mathcal{L}(V; \mathbb{R})$. Let $b : V \times V^* \to \mathbb{R}$ be defined by
$$b(\mathbf{v}, \mathbf{v}^*) = \mathbf{v}^*(\mathbf{v}) \quad\text{for any } \mathbf{v} \in V,\ \mathbf{v}^* \in V^*.$$
Then one can easily show that $b$ is a bilinear form on the pair of vector spaces $V$ and $V^*$. The reader should notice that the vector space operations on the dual space $V^*$ are defined in such a way as to force the mapping $b$ to be a bilinear form.

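Definition 8.6 A bilinear form $b : V \times W \to \mathbb{R}$ on vector spaces $V$ and $W$ is said to be nondegenerate if it satisfies
$b(\mathbf{v}, \mathbf{w}) = 0$ for all $\mathbf{w} \in W$ implies $\mathbf{v} = \mathbf{0}$, and
$b(\mathbf{v}, \mathbf{w}) = 0$ for all $\mathbf{v} \in V$ implies $\mathbf{w} = \mathbf{0}$.
Note that $b(\mathbf{0}, \mathbf{w}) = b(\mathbf{v}, \mathbf{0}) = 0$ for any $\mathbf{v} \in V$ and $\mathbf{w} \in W$. Thus the first nondegeneracy condition asserts that the equation $b(\mathbf{v}, \mathbf{w}) = 0$ for all $\mathbf{w}$ holds only when $\mathbf{v} = \mathbf{0}$, and the second asserts the same with the roles of $\mathbf{v}$ and $\mathbf{w}$ exchanged.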
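Let $b : V \times W \to \mathbb{R}$ be a nondegenerate bilinear form. For a fixed $\mathbf{w} \in W$, we define $\varphi_{\mathbf{w}} : V \to \mathbb{R}$ by
$$\varphi_{\mathbf{w}}(\mathbf{v}) = b(\mathbf{v}, \mathbf{w}) \quad\text{for } \mathbf{v} \in V.$$
Then the bilinearity of $b$ proves that $\varphi_{\mathbf{w}} \in V^*$, from which we obtain a linear transformation
$$\varphi : W \to V^* \quad\text{defined by}\quad \varphi(\mathbf{w}) = \varphi_{\mathbf{w}}.$$
Similarly, we can have a linear transformation $\psi : V \to W^*$ defined by
$$\psi(\mathbf{v})(\mathbf{w}) = b(\mathbf{v}, \mathbf{w}) \quad\text{for } \mathbf{v} \in V \text{ and } \mathbf{w} \in W.$$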

Theorem 8.7 If $b : V \times W \to \mathbb{R}$ is a nondegenerate bilinear form, then the linear transformations $\varphi : W \to V^*$ and $\psi : V \to W^*$ are isomorphisms.

Proof: Suppose that $\varphi_{\mathbf{w}} = \varphi_{\mathbf{w}'}$. Then, for all $\mathbf{v} \in V$,
$$b(\mathbf{v}, \mathbf{w}) = \varphi(\mathbf{w})(\mathbf{v}) = \varphi(\mathbf{w}')(\mathbf{v}) = b(\mathbf{v}, \mathbf{w}'), \quad\text{or}\quad b(\mathbf{v}, \mathbf{w} - \mathbf{w}') = 0.$$
The nondegeneracy of $b$ implies that $\mathbf{w} = \mathbf{w}'$, that is, $\varphi$ is one-to-one. This also implies that
$$\dim W \le \dim V^*.$$
A similar argument shows that the linear transformation $\psi : V \to W^*$ is also one-to-one, and therefore
$$\dim V \le \dim W^*.$$
Since $\dim V = \dim V^*$ and $\dim W = \dim W^*$ from Corollary 4.19, we have
$$\dim V \le \dim W^* = \dim W \le \dim V^* = \dim V.$$
Therefore, $\varphi$ and $\psi$ are surjective, and so isomorphisms. □

Corollary 8.8 If there exists a nondegenerate bilinear form $b : V \times W \to \mathbb{R}$, then $\dim V = \dim W$.

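Let $b : V \times V \to \mathbb{R}$ be a bilinear form on a vector space $V$, and let $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for $V$. Such a bilinear form is completely determined by the values $b(\mathbf{v}_i, \mathbf{v}_j)$ on pairs of vectors in the basis $\alpha$. In fact, if
$$\mathbf{x} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_n\mathbf{v}_n, \qquad \mathbf{y} = y_1\mathbf{v}_1 + y_2\mathbf{v}_2 + \cdots + y_n\mathbf{v}_n$$
are vectors in $V$, then
$$b(\mathbf{x}, \mathbf{y}) = \sum_{i,j=1}^{n} x_iy_j\,b(\mathbf{v}_i, \mathbf{v}_j) = [\mathbf{x}]_\alpha^T A\,[\mathbf{y}]_\alpha,$$
where $A = [a_{ij}]$ with $a_{ij} = b(\mathbf{v}_i, \mathbf{v}_j)$ is called the matrix representation of $b$ with respect to the basis $\alpha$. We write $A = [b]_\alpha$. Let $\beta$ be another basis for the vector space $V$ and let $P = [\mathrm{Id}]_\beta^\alpha$ be the transition matrix from $\beta$ to $\alpha$. Then we get
$$P[\mathbf{x}]_\beta = [\mathrm{Id}]_\beta^\alpha[\mathbf{x}]_\beta = [\mathbf{x}]_\alpha$$
and
$$[\mathbf{x}]_\beta^T[b]_\beta[\mathbf{y}]_\beta = b(\mathbf{x}, \mathbf{y}) = [\mathbf{x}]_\alpha^T[b]_\alpha[\mathbf{y}]_\alpha = [\mathbf{x}]_\beta^T P^T[b]_\alpha P[\mathbf{y}]_\beta$$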

for any $\mathbf{x}$ and $\mathbf{y}$ in $V$, so that $[b]_\beta = P^T[b]_\alpha P$. Thus, two matrix representations of a bilinear form $b$ with respect to different bases are congruent, and conversely any two congruent matrices can be matrix representations of the same bilinear form (verify it). Note that congruent matrices have the same rank because $P$ and $P^T$ are nonsingular.
The rank of a bilinear form $b : V \times V \to \mathbb{R}$ on a vector space $V$, written $\operatorname{rank}(b)$, is defined as the rank of any matrix representation of $b$.

Theorem 8.9 A bilinear form $b : V \times V \to \mathbb{R}$ on a vector space $V$ is nondegenerate if and only if $\operatorname{rank}(b) = \dim V$.

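Proof: Since every $n$-dimensional vector space $V$ is isomorphic to $\mathbb{R}^n$ and congruent matrices have the same rank, we can assume that $V = \mathbb{R}^n$ and $A = [b]_\alpha$ is the matrix representation of a bilinear form $b : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ for $\mathbb{R}^n$. Then we have $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v}$ for any $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$.
Suppose that $\operatorname{rank}(b) = \operatorname{rank}A < n$. Then the homogeneous system $A\mathbf{x} = \mathbf{0}$ has a nontrivial solution, say $\mathbf{v}$, and then $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v} = 0$ for any $\mathbf{u} \in \mathbb{R}^n$, but $\mathbf{v} \neq \mathbf{0}$. This implies that $b$ is degenerate.
Now, let's assume that $\operatorname{rank}(b) = \operatorname{rank}A = n$ and $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v} = 0$ for all $\mathbf{u} \in \mathbb{R}^n$. By taking the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $\mathbf{u}$, we can see that $A\mathbf{v} = \mathbf{0}$. The condition $\operatorname{rank}A = n$ implies that $\mathbf{v} = \mathbf{0}$. A similar method with the equation $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v} = (\mathbf{u}^T A\mathbf{v})^T = \mathbf{v}^T A^T\mathbf{u}$ verifies that $b(\mathbf{u}, \mathbf{v}) = 0$ for all $\mathbf{v} \in \mathbb{R}^n$ implies $\mathbf{u} = \mathbf{0}$. Hence, $b$ is nondegenerate. □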

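Example 8.12 Let $b : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ be defined by $b(\mathbf{x}, \mathbf{y}) = x_1y_1 + 3x_1y_2 + 2x_2y_1 - x_2y_2$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2\}$. Then $b$ is clearly a bilinear form, and the matrix representation of $b$ with respect to $\alpha$ is
$$[b]_\alpha = \begin{bmatrix} 1 & 3\\ 2 & -1\end{bmatrix}.$$
If $\beta = \{\mathbf{v}_1 = (1, 0), \mathbf{v}_2 = (1, 1)\}$ is another basis for $\mathbb{R}^2$, then the matrix representation becomes
$$[b]_\beta = \begin{bmatrix} 1 & 4\\ 3 & 5\end{bmatrix},$$
because $b(\mathbf{v}_1, \mathbf{v}_1) = 1$, $b(\mathbf{v}_1, \mathbf{v}_2) = 4$, $b(\mathbf{v}_2, \mathbf{v}_1) = 3$ and $b(\mathbf{v}_2, \mathbf{v}_2) = 5$. Since $\operatorname{rank}[b]_\alpha = \operatorname{rank}[b]_\beta = 2$, the bilinear form $b$ is nondegenerate.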
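The change of basis in this example can be verified with the congruence $[b]_\beta = P^T[b]_\alpha P$, where the columns of the transition matrix $P$ are the standard coordinates of $\mathbf{v}_1$ and $\mathbf{v}_2$. A sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, -1.0]])   # [b]_alpha, so b(x, y) = x^T A y
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # columns: v1 = (1, 0), v2 = (1, 1) in standard coordinates
B = P.T @ A @ P               # [b]_beta = P^T [b]_alpha P
print(B)                      # [[1, 4], [3, 5]]
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # 2 2: b is nondegenerate
```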

Problem 8.13 (1) Let $b : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be defined by
$$b(\mathbf{x}, \mathbf{y}) = x_1y_1 - 2x_1y_2 + x_2y_1 - x_3y_3$$
with respect to the standard basis. Is this a bilinear form? If so, find the matrix representation of $b$ with respect to the basis
$$\alpha = \{\mathbf{v}_1 = (1, 0, 1), \mathbf{v}_2 = (1, 0, -1), \mathbf{v}_3 = (0, 1, 0)\}.$$
Find its rank.
(2) Let $V = M_{2\times 2}(\mathbb{R})$ be the vector space of $2 \times 2$ matrices, and let $b : V \times V \to \mathbb{R}$ be defined by $b(A, B) = \operatorname{tr}(A)\cdot\operatorname{tr}(B)$. Is this a bilinear form? If so, find the matrix representation of $b$ with respect to the basis

Q = { EI = [~ ~], E2 = [~ ~], E3 = [~ ~], E4 = [~ ~]}.


Find its rank.

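Definition 8.7 A bilinear form $b$ on a vector space $V$ is said to be symmetric if $b(\mathbf{x}, \mathbf{y}) = b(\mathbf{y}, \mathbf{x})$ for any $\mathbf{x}, \mathbf{y} \in V$. It is skew-symmetric if $b(\mathbf{x}, \mathbf{y}) = -b(\mathbf{y}, \mathbf{x})$ for any $\mathbf{x}, \mathbf{y} \in V$.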

One can easily see that a bilinear form is symmetric (or skew-symmetric) if and only if its matrix representation is symmetric (or skew-symmetric) for any basis. As an example, an inner product on a vector space $V$ is a symmetric bilinear form on $V$ that is also positive definite, and hence nondegenerate.
A bilinear form $b$ on $V$ is diagonalizable if there exists a basis $\alpha$ for $V$ such that the matrix representation $[b]_\alpha$ of $b$ with respect to $\alpha$ is diagonal.

Theorem 8.10 A bilinear form $b$ on a vector space $V$ is symmetric if and only if it is diagonalizable.

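Proof: Since every symmetric matrix is orthogonally diagonalizable, and an orthogonal diagonalization $P^TAP = D$ (with $P^T = P^{-1}$) is also a congruence, every symmetric bilinear form is diagonalizable. It remains to prove that a diagonalizable bilinear form is symmetric. Let a bilinear form $b$ be diagonalizable and let $\alpha$ be a basis for $V$ such that the matrix representation $[b]_\alpha$ is diagonal. Then for any basis $\beta$ for $V$, the matrix representations $[b]_\alpha$ and $[b]_\beta$ are congruent, say $[b]_\beta = P^T[b]_\alpha P$ for some invertible matrix $P$. Since $[b]_\alpha$ is diagonal, we have
$$[b]_\beta^T = (P^T[b]_\alpha P)^T = P^T[b]_\alpha^T P = P^T[b]_\alpha P = [b]_\beta,$$
i.e., $[b]_\beta$ is symmetric and the bilinear form $b$ is symmetric. □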



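Example 8.13 Let $b : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be the bilinear form defined by
$$b(\mathbf{x}, \mathbf{y}) = x_1y_3 - 2x_2y_2 + 2x_2y_3 + x_3y_1 + 2x_3y_2 - x_3y_3.$$
Then clearly $b(\mathbf{x}, \mathbf{y}) = b(\mathbf{y}, \mathbf{x})$, and the matrix representation of $b$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ is
$$[b]_\alpha = \begin{bmatrix} 0 & 0 & 1\\ 0 & -2 & 2\\ 1 & 2 & -1\end{bmatrix},$$
which is symmetric. Hence, the bilinear form $b$ is symmetric. By Theorem 8.10, it is diagonalizable. In fact, applying the same elementary operations to the rows and the columns of $[b]_\alpha$, and recording the row operations on $I$, one obtains, for example,
$$[\,[b]_\alpha \mid I\,] = \left[\begin{array}{rrr|rrr} 0 & 0 & 1 & 1 & 0 & 0\\ 0 & -2 & 2 & 0 & 1 & 0\\ 1 & 2 & -1 & 0 & 0 & 1\end{array}\right] \longrightarrow \left[\begin{array}{rrr|rrr} -1 & 0 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 1 & 0 & 1\\ 0 & 0 & -2 & -2 & 1 & 0\end{array}\right] = [\,D \mid P^T\,].$$
By a direct computation, one can easily show that $P^T[b]_\alpha P = D$. □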
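To check a congruence diagonalization of this example numerically, here is a sketch assuming NumPy. The matrix `P` below, with columns $\mathbf{e}_3$, $\mathbf{e}_1 + \mathbf{e}_3$ and $-2\mathbf{e}_1 + \mathbf{e}_2$, is one valid choice computed for illustration; such a $P$ is not unique, and this one produces $D = \operatorname{diag}(-1, 1, -2)$.

```python
import numpy as np

A = np.array([[0.0, 0.0, 1.0],
              [0.0, -2.0, 2.0],
              [1.0, 2.0, -1.0]])   # [b]_alpha from Example 8.13
# Columns of P: e3, e1 + e3, -2 e1 + e2 (one valid choice, not unique).
P = np.array([[0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
D = P.T @ A @ P
print(D)                  # diag(-1, 1, -2)
print(np.linalg.det(P))   # 1.0 (up to rounding), so P is invertible
```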

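A nonzero skew-symmetric bilinear form is never diagonalizable (by Theorem 8.10, a diagonalizable form is symmetric, and a form that is both symmetric and skew-symmetric is zero), but the following theorem shows the structure of a skew-symmetric bilinear form. Note that a bilinear form $b$ is skew-symmetric if and only if $b(\mathbf{x}, \mathbf{x}) = 0$ for any $\mathbf{x}$ in $V$.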

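Theorem 8.11 Let $b : V \times V \to \mathbb{R}$ be a skew-symmetric bilinear form. Then there exists a basis $\alpha$ for $V$ with respect to which the matrix representation $[b]_\alpha$ is of the form
$$[b]_\alpha = \begin{bmatrix} J & & & & & \\ & \ddots & & & & \\ & & J & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0\end{bmatrix}, \qquad\text{where } J = \begin{bmatrix} 0 & 1\\ -1 & 0\end{bmatrix},$$
and all entries outside the diagonal blocks are zero.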

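Proof: If $b = 0$, then $[b]_\alpha$ is the zero matrix. Also if $\dim V = 1$, then $b(\mathbf{x}, \mathbf{x}) = 0$ for any basis vector $\mathbf{x}$ in $V$, so $b = 0$.
Now, we assume that $b \neq 0$ and prove the theorem by induction on $\dim V$. Since $b \neq 0$, there exist nonzero vectors $\mathbf{x}$ and $\mathbf{y}$ in $V$ such that $b(\mathbf{x}, \mathbf{y}) \neq 0$. By the bilinearity of $b$, we can assume that $b(\mathbf{x}, \mathbf{y}) = 1$. Such vectors $\mathbf{x}$ and $\mathbf{y}$ must be linearly independent, because if $\mathbf{y} = k\mathbf{x}$, then $b(\mathbf{x}, \mathbf{y}) = kb(\mathbf{x}, \mathbf{x}) = 0$. Let $U$ be the subspace of $V$ spanned by $\mathbf{x}$ and $\mathbf{y}$ and let
$$W = \{\mathbf{v} \in V : b(\mathbf{v}, \mathbf{u}) = 0 \text{ for any } \mathbf{u} \in U\}.$$
Then one can easily show that $W$ is also a subspace of $V$ and $U \cap W = \{\mathbf{0}\}$ (see Problem 8.14 below). Moreover, $U + W = V$. In fact, for a given vector $\mathbf{v} \in V$, let $\mathbf{u} = b(\mathbf{v}, \mathbf{y})\mathbf{x} - b(\mathbf{v}, \mathbf{x})\mathbf{y}$. It is easy to show that $\mathbf{u} \in U$ and $\mathbf{v} - \mathbf{u} \in W$. Thus $V = U \oplus W$, where $\dim W = \dim V - 2$. Since $b(\mathbf{w}, \mathbf{u}) = 0$, and hence $b(\mathbf{u}, \mathbf{w}) = -b(\mathbf{w}, \mathbf{u}) = 0$, for all $\mathbf{u} \in U$ and $\mathbf{w} \in W$, the matrix representation of $b$ with respect to $\{\mathbf{x}, \mathbf{y}\}$ together with any basis for $W$ is block diagonal. Clearly, the matrix representation of the restriction of $b$ to $U$ with respect to the basis $\{\mathbf{x}, \mathbf{y}\}$ is
$$\begin{bmatrix} 0 & 1\\ -1 & 0\end{bmatrix},$$
and the restriction of $b$ to $W$ is also skew-symmetric. The same argument is applied to $W$, and the theorem is proved by induction. □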

Problem 8.14 Prove that $U \cap W = \{\mathbf{0}\}$ in the proof of Theorem 8.11.


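Example 8.14 Let $b : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be the bilinear form defined by
$$b(\mathbf{x}, \mathbf{y}) = x_1y_2 - x_2y_1 + x_3y_1 - x_1y_3 + x_2y_3 - x_3y_2.$$
Then clearly $b(\mathbf{x}, \mathbf{y}) = -b(\mathbf{y}, \mathbf{x})$, and the matrix representation of $b$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ is
$$[b]_\alpha = \begin{bmatrix} 0 & 1 & -1\\ -1 & 0 & 1\\ 1 & -1 & 0\end{bmatrix},$$
which is skew-symmetric. By a simple computation, $b(\mathbf{e}_1, \mathbf{e}_2) = 1 = -b(\mathbf{e}_2, \mathbf{e}_1)$. Let $U$ be the subspace of $\mathbb{R}^3$ spanned by $\mathbf{e}_1$ and $\mathbf{e}_2$, i.e., the $xy$-plane. If we set $W = \{\mathbf{v} \in V : b(\mathbf{v}, \mathbf{u}) = 0 \text{ for any } \mathbf{u} \in U\}$, then $W = \{\lambda\mathbf{z} : \lambda \in \mathbb{R}\}$, where $\mathbf{z} = (1, 1, 1)$. Clearly, $\beta = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{z}\}$ is a basis for $\mathbb{R}^3$, and $b(\mathbf{z}, \mathbf{v}) = b(\mathbf{v}, \mathbf{z}) = 0$ for every $\mathbf{v} \in \mathbb{R}^3$, so that
$$[b]_\beta = \begin{bmatrix} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}. \qquad □$$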
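A quick numerical confirmation of this example, as a sketch assuming NumPy: since $[b]_\alpha\mathbf{z} = \mathbf{0}$, every entry of $[b]_\beta$ involving $\mathbf{z}$ vanishes, and the remaining $2 \times 2$ block is the one from Theorem 8.11.

```python
import numpy as np

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])      # [b]_alpha
z = np.array([1.0, 1.0, 1.0])
P = np.column_stack([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], z])   # basis beta = {e1, e2, z}

print(np.allclose(A.T, -A))   # True: b is skew-symmetric
print(A @ z)                  # zero vector, so b(v, z) = v^T A z = 0 for every v
print(P.T @ A @ P)            # [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]
```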

Problem 8.15 Show that any bilinear form b on a vector space V is the sum of a
symmetric bilinear form and a skew-symmetric bilinear form.

The following theorem shows how quadratic forms and bilinear forms are
related.

Theorem 8.12 If $b$ is a symmetric bilinear form on $\mathbb{R}^n$, then the function $q(\mathbf{x}) = b(\mathbf{x}, \mathbf{x})$, for $\mathbf{x} \in \mathbb{R}^n$, is a quadratic form.
Conversely, for every quadratic form $q$, there is a unique symmetric bilinear form $b$ such that $q(\mathbf{x}) = b(\mathbf{x}, \mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

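Proof: If $b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T A\mathbf{y}$ is a symmetric bilinear form, then $q(\mathbf{x}) = b(\mathbf{x}, \mathbf{x}) = \mathbf{x}^T A\mathbf{x}$ is clearly a quadratic form.
Conversely, if $b$ is a symmetric bilinear form, then
$$b(\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y}) = b(\mathbf{x}, \mathbf{x}) + 2b(\mathbf{x}, \mathbf{y}) + b(\mathbf{y}, \mathbf{y}).$$
Hence, for a given quadratic form $q$, a bilinear form $b$ can be defined by
$$b(\mathbf{x}, \mathbf{y}) = \frac{1}{2}\left[q(\mathbf{x} + \mathbf{y}) - q(\mathbf{x}) - q(\mathbf{y})\right].$$
This form $b$ is symmetric and bilinear (if $q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$, then $b(\mathbf{x}, \mathbf{y}) = \frac{1}{2}\mathbf{x}^T(A + A^T)\mathbf{y}$), and $b(\mathbf{x}, \mathbf{x}) = q(\mathbf{x})$. The uniqueness also comes from this relation, since any symmetric bilinear form $b$ with $b(\mathbf{x}, \mathbf{x}) = q(\mathbf{x})$ must satisfy it. □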
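The polarization formula in this proof can be illustrated numerically. The sketch assumes NumPy and starts from a deliberately nonsymmetric random matrix `A` (seed chosen arbitrarily); it shows that the recovered $b$ is the form with the symmetric matrix $(A + A^T)/2$, which defines the same quadratic form.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))      # deliberately not symmetric

def q(x):
    return x @ A @ x                 # the quadratic form x^T A x

def b(x, y):
    """Symmetric bilinear form recovered from q by polarization."""
    return 0.5 * (q(x + y) - q(x) - q(y))

S = (A + A.T) / 2                    # symmetric matrix with q(x) = x^T S x
x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose(b(x, y), x @ S @ y))   # True
print(np.isclose(b(x, y), b(y, x)))     # True
print(np.isclose(b(x, x), q(x)))        # True
```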

Recall that any two matrix representations of a quadratic form or of a symmetric bilinear form are congruent, and they can be diagonalized. But their congruent diagonal matrices may have different diagonal entries (see Remark (1) on page 289). Although these entries are not unique, the number of positive entries and the number of negative entries are invariant, i.e., independent of the choice of diagonal representation. This result is called Sylvester's law of inertia.

Theorem 8.13 (Sylvester's law of inertia) Let $b$ be a symmetric bilinear form on a vector space $V$. Then the number of positive diagonal entries and the number of negative diagonal entries of any diagonal representation of $b$ are both independent of the diagonal representation.

Proof: Let $\alpha = \{\mathbf{x}_1, \ldots, \mathbf{x}_p, \mathbf{x}_{p+1}, \ldots, \mathbf{x}_n\}$ be an ordered basis for $V$ with respect to which $[b]_\alpha$ is diagonal and in which
$b(\mathbf{x}_i, \mathbf{x}_i) > 0$ for $i = 1, 2, \ldots, p$, and
$b(\mathbf{x}_i, \mathbf{x}_i) \le 0$ for $i = p + 1, \ldots, n$,
and let $\beta = \{\mathbf{y}_1, \ldots, \mathbf{y}_{p'}, \mathbf{y}_{p'+1}, \ldots, \mathbf{y}_n\}$ be another such ordered basis for $V$ (with $[b]_\beta$ diagonal) in which
$b(\mathbf{y}_i, \mathbf{y}_i) > 0$ for $i = 1, 2, \ldots, p'$, and
$b(\mathbf{y}_i, \mathbf{y}_i) \le 0$ for $i = p' + 1, \ldots, n$.
To show $p = p'$, let $U$ and $W$ be the subspaces of $V$ spanned by $\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$ and $\{\mathbf{y}_{p'+1}, \ldots, \mathbf{y}_n\}$, respectively. Since both representations are diagonal, $b(\mathbf{u}, \mathbf{u}) > 0$ for any nonzero vector $\mathbf{u} \in U$ and $b(\mathbf{w}, \mathbf{w}) \le 0$ for any vector $\mathbf{w} \in W$. Thus $U \cap W = \{\mathbf{0}\}$, and
$$\dim(U + W) = \dim U + \dim W - \dim(U \cap W) = p + (n - p') \le n,$$
or $p \le p'$. Similarly, we can show $p' \le p$ to conclude $p = p'$. Therefore, any two diagonal matrix representations of $b$ have the same number of positive diagonal entries. By considering the bilinear form $-b$ instead of $b$, we can also see that any two diagonal matrix representations of $b$ have the same number of negative diagonal entries. □

Corollary 8.14 Let $A$ be a symmetric matrix. If $D = P^TAP$ and $E = Q^TAQ$ are diagonal matrices for some invertible matrices $P$ and $Q$, then $D$ and $E$ have the same number of positive diagonal entries, the same number of negative diagonal entries and the same number of zero diagonal entries.

Definition 8.8 Let $A$ be a real symmetric matrix. The number of positive eigenvalues of $A$ is called the index of $A$. The difference between the number of positive eigenvalues and the number of negative eigenvalues of $A$ is called the signature of $A$.

Hence, the index and the signature, together with the rank, of a symmetric matrix are invariant under the congruence relation, and any two of these invariants determine the third, by noting that
the number of positive eigenvalues = the index,
the index + the number of negative eigenvalues = the rank,
the index - the number of negative eigenvalues = the signature.
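These three invariants are easy to compute from the eigenvalues. The sketch below assumes NumPy; the function name `inertia` and the tolerance `tol` (below which an eigenvalue is treated as zero) are hypothetical choices for illustration.

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (index, signature, rank) of a real symmetric matrix A.

    Eigenvalues with absolute value <= tol are treated as zero.
    """
    lam = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    pos = int(np.sum(lam > tol))
    neg = int(np.sum(lam < -tol))
    return pos, pos - neg, pos + neg

# A diagonal matrix with entries 1, 1, -4: index 2, signature 1, rank 3.
print(inertia(np.diag([1.0, 1.0, -4.0])))   # (2, 1, 3)
```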

We have already shown the necessity part of the following corollary.

Corollary 8.15 Two symmetric (square) matrices are congruent if and only
if they have the same invariants: index, signature and rank.

Proof: Suppose that two symmetric matrices $A$ and $B$ have the same invariants, and let $D$ and $E$ be diagonal matrices congruent to $A$ and $B$, respectively. Without loss of generality, we may choose $D$ and $E$ so that the diagonal entries are in the order of positive, negative and zero. Let $p$ and $r$ denote the index and the rank, respectively, of both $D$ and $E$. Let $d_i$ denote the $i$-th diagonal entry of $D$. Define the diagonal matrix $Q$ whose $i$-th diagonal entry $q_i$ is given by
$$q_i = \begin{cases} 1/\sqrt{d_i} & \text{if } 1 \le i \le p,\\ 1/\sqrt{-d_i} & \text{if } p < i \le r,\\ 1 & \text{if } r < i \le n.\end{cases}$$
Then
$$Q^TDQ = \begin{bmatrix} I_p & 0 & 0\\ 0 & -I_{r-p} & 0\\ 0 & 0 & 0\end{bmatrix} = J_{pr}.$$
Similarly, $E$ is congruent to $J_{pr}$. Hence, $A$ is congruent to $J_{pr}$, and so is $B$, i.e., $A$ is congruent to $B$. □
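The normalization in this proof can also be carried out numerically. This sketch assumes NumPy and, unlike the proof, starts from an orthogonal diagonalization $A = V\Lambda V^T$ (itself a congruence), reorders the eigenpairs as positive, negative, zero, and rescales the eigenvectors; the function name `canonical_congruence` and the tolerance are hypothetical.

```python
import numpy as np

def canonical_congruence(A, tol=1e-10):
    """Return (Q, J) with Q invertible and Q^T A Q = J = diag(I_p, -I_{r-p}, 0)."""
    A = np.asarray(A, dtype=float)
    lam, V = np.linalg.eigh(A)
    # Order eigenpairs: positive first, then negative, then (numerically) zero.
    key = np.where(lam > tol, 0, np.where(lam < -tol, 1, 2))
    order = np.argsort(key, kind="stable")
    lam, V = lam[order], V[:, order]
    # Scale each eigenvector by 1/sqrt(|lambda|); leave zero-eigenvalue columns alone.
    scale = np.where(np.abs(lam) > tol, 1.0 / np.sqrt(np.abs(lam)), 1.0)
    Q = V * scale                     # multiplies column i of V by scale[i]
    J = np.diag(np.where(lam > tol, 1.0, np.where(lam < -tol, -1.0, 0.0)))
    return Q, J

A = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1
Q, J = canonical_congruence(A)
print(J)                                 # diag(1, -1)
print(np.allclose(Q.T @ A @ Q, J))       # True
```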

Example 8.15 Determine the index, the signature and the rank for each
of the following matrices.

A= [ 121 310 632] , B= [~ oo 0]


4 0
5
,

Which of them are congruent to each other?

n
Solution: In Example 8.5, we saw that the matrix A can be diagonalized
to

D= [~ ~[
Therefore, $A$ has rank 3, index 2 and signature 1. The matrix $B$ is already diagonal, and has rank 3, index 3 and signature 3. Using the method of Example 8.5, one can show that $C$ is congruent to the diagonal matrix with diagonal entries $1, 1, -4$. Therefore, $C$ has rank 3, index 2 and signature 1. (Note that it is not necessary to find the eigenvalues of $C$, that is, to diagonalize it orthogonally.) We conclude that $A$ and $C$ are congruent and $B$ is congruent to neither $A$ nor $C$ by Corollary 8.15. □

Problem 8.16 Prove that if the diagonal entries of a diagonal matrix are permuted,
then the resulting diagonal matrix is congruent to the original one.

Problem 8.17 Prove that the total number of distinct equivalence classes of congruent $n \times n$ real symmetric matrices is equal to $\frac{1}{2}(n + 1)(n + 2)$.

Problem 8.18 Find the signature, the index and the rank of each of the following matrices.
$$(1)\ \begin{bmatrix} 0 & 1 & 2\\ 1 & -2 & 3\\ 2 & 3 & 4\end{bmatrix},$$
Which of them are congruent to each other?

8.8 Exercises
8.1. Find the matrix representing each of the following quadratic forms:
(1) $x_1^2 + 4x_1x_2 + 3x_2^2$,
(2) $x_1^2 - x_2^2 + x_3^2 + 4x_1x_3 - 5x_2x_3$,
(3) $x_1^2 - 2x_2^2 - 3x_3^2 + 4x_1x_2 + 6x_1x_3 - 8x_2x_3$,
(4) $3x_1y_1 - 2x_1y_2 + 5x_2y_1 + 7x_2y_2 - 8x_2y_3 + 4x_3y_2 - x_3y_3$,

(5) [Xl X2] [~ ~] [ ~~ ].

8.2. Let $q$ be a quadratic form on $\mathbb{R}^3$ and let
$$A = \begin{bmatrix} 7 & 4 & -5\\ 4 & -2 & 4\\ -5 & 4 & 7\end{bmatrix}$$
be the matrix representing $q$ with respect to the basis
$$\alpha = \{(1, 0, 1), (1, 1, 0), (0, 0, 1)\}.$$
(1) Diagonalize $A$, i.e., find an orthogonal matrix $P$ so that $P^TAP$ is a diagonal matrix.
(2) Construct a basis $\beta$ for $\mathbb{R}^3$ such that the elements of $\beta$ are the principal axes of the quadratic surface $q(\mathbf{x}) = 0$.

8.3. Sketch the graph of each of the following quadratic equations:
(1) $xy = 2$,
(2) $53x^2 - 72xy + 32y^2 = 80$,
(3) $16x^2 - 24xy + 9y^2 - 60x - 80y + 100 = 0$.
8.4. For a positive definite quadratic form $q(\mathbf{x}) = ax^2 + 2bxy + cy^2$, the curve $q(\mathbf{x}) = 1$ is an ellipse. When $a = c = 2$ and $b = -1$, sketch the ellipse.

8.5. Determine whether each of the following functions has a local minimum, local maximum or saddle point at the given point:
(1) $f(x, y) = -1 + 4(e^x - x) - 5x\sin y + 6y^2$ at the point $(x, y) = (0, 0)$;
(2) $f(x, y) = (x^2 - 2x)\cos y$ at $(x, y) = (1, \pi)$.
8.6. Show that the quadratic form q(x) = 2x2 + 4xy + y2 has a saddle point at
the origin, despite the fact that its coefficients are positive. Show that q can
be written as the difference of two perfect squares.
8.7. Find the maximum and the minimum values of the function
$$R(\mathbf{x}) = \frac{\mathbf{x}^T A\mathbf{x}}{\mathbf{x}^T\mathbf{x}}, \qquad \mathbf{x} \neq \mathbf{0},$$
which is called the Rayleigh quotient of $A$, when

(1) A ~ U~ ~ (2) A ~ -~
[ -n]
8.8. Determine whether or not each of the following matrices is positive definite:

(1) A = [-i -~ =~],


-1 -1 2
(2) A =
1 0 1
[~ ~ ~]
Use the decomposition $A = LDL^T$ to write $\mathbf{x}^T A\mathbf{x}$ as a sum of squares.
8.9. Show that if $A$ and $B$ are both positive definite, so are $A^2$, $A^{-1}$ and $A + B$.
8.10. Prove that if $A$ and $B$ are symmetric and positive definite, so is $A^2 + B^{-1}$.
8.11. Find a substitution $\mathbf{x} = Q\mathbf{y}$ that diagonalizes each of the following quadratic forms, where $Q$ is orthogonal. Also, classify the form as positive definite, positive semidefinite, and so on.
(1) $q(\mathbf{x}) = 2x^2 + 6xy + 2y^2$.
(2) $q(\mathbf{x}) = x^2 + y^2 + z^2 + 2(xy + xz + yz)$.
8.12. For a given quadratic equation $ax^2 + 2bxy + cy^2 + dx + ey + f = 0$ with $b \neq 0$, classify the conic section according to the various possible cases of $a$, $b$, and $c$ (see Example 8.4).
8.13. Find the eigenvalues of the following matrices and the maximum value of the associated quadratic forms on the unit sphere.
(1), (2) [matrices not recoverable from the source]
8.14. Let $b$ be a bilinear form on $\mathbb{R}^2$ defined by
$$b(\mathbf{x}, \mathbf{y}) = 2x_1y_1 - 3x_1y_2 + x_2y_2.$$

(1) Find the matrix $A$ of $b$ with respect to the basis $\alpha = \{(1, 0), (1, 1)\}$.
(2) Find the matrix $B$ of $b$ with respect to the basis $\beta = \{(2, 1), (1, -1)\}$.
(3) Find the transition matrix $Q$ from the basis $\beta$ to the basis $\alpha$ and verify that $B = Q^TAQ$.
8.15. Which of the following functions $b$ on $\mathbb{R}^2$ are bilinear forms?
(1), (2), (3) [formulas not recoverable from the source]
(4) $b(\mathbf{x}, \mathbf{y}) = x_1y_2 - x_2y_1$

8.16. For a bilinear form on $\mathbb{R}^2$ defined by $b(\mathbf{x}, \mathbf{y}) = x_1y_1 + x_2y_2$, find the matrix representation of $b$ with respect to each of the following bases:
$$\alpha = \{(1, 0), (0, 1)\}, \quad \beta = \{(1, -1), (1, 1)\}, \quad \gamma = \{(1, 2), (3, 4)\}.$$

8.17. Which of the following bilinear forms on $\mathbb{R}^3$ are symmetric or skew-symmetric? For each symmetric one, find its matrix representation in diagonal form, and for each skew-symmetric one, find its matrix representation in the block form of Theorem [Link].
(1) $b(\mathbf{x}, \mathbf{y}) = x_1y_3 + x_3y_1$
(2) $b(\mathbf{x}, \mathbf{y}) = x_1y_1 + 2x_1y_3 + 2x_3y_1 - x_2y_2$
(3) $b(\mathbf{x}, \mathbf{y}) = x_1y_2 + 2x_1y_3 - x_2y_3 - x_2y_1 - 2x_3y_1 + x_3y_2$
(4) $b(\mathbf{x}, \mathbf{y}) = \sum_{i,j=1}^{3}(i - j)x_iy_j$
8.18. Find the signature, index and rank of each of the following symmetric matrices:
[matrices not recoverable from the source]
8.19. Determine whether the following statements are true or false, in general, and
justify your answers.
(1) For any quadratic form $q$ on $\mathbb{R}^n$, there exists a basis $\alpha$ for $\mathbb{R}^n$ with respect to which the matrix representation of $q$ is diagonal.
(2) Any two matrix representations of a quadratic form have the same inertia.
(3) The sum of two bilinear forms on V is also a bilinear form.
(4) If $A$ is a real symmetric positive definite matrix, then the solution set of $\mathbf{x}^TA\mathbf{x} = 1$ is an ellipsoid.
(5) For any nontrivial bilinear form $b \neq 0$ on $V$, if $b(\mathbf{v}, \mathbf{v}) = 0$, then $\mathbf{v} = \mathbf{0}$.
(6) Any symmetric matrix is congruent to a diagonal matrix.
(7) Any two congruent matrices have the same eigenvalues.
(8) Any two congruent matrices have the same determinant.
(9) Any matrix representation of a bilinear form is diagonalizable.
(10) If a real symmetric matrix A is both positive semidefinite and negative
semidefinite, then A must be the zero matrix.
(11) Any two similar real symmetric matrices have the same signature.
Chapter 9

Jordan Canonical Forms

9.1 Introduction
Recall that an $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has a full set of $n$ linearly independent eigenvectors. In particular, if $A$ is normal, then $A$ can be diagonalized by a unitary matrix $U$ whose column vectors are orthonormal eigenvectors of $A$. Some nonnormal matrices are still diagonalizable; of course, in this case the transition matrix need not be unitary. If a matrix $A$ is diagonalizable, then the dimension of each eigenspace $E(\lambda) = N(\lambda I - A)$ is equal to the multiplicity of the eigenvalue $\lambda$. Therefore, if $\lambda_1, \ldots, \lambda_\ell$ are the distinct eigenvalues of $A$ with multiplicities $m_{\lambda_1}, \ldots, m_{\lambda_\ell}$, respectively, then
$$\dim E(\lambda_1) + \cdots + \dim E(\lambda_\ell) = m_{\lambda_1} + \cdots + m_{\lambda_\ell} = n,$$
and hence,
$$\mathbb{C}^n = E(\lambda_1) \oplus \cdots \oplus E(\lambda_\ell).$$

In some cases, a matrix may not have a full set of linearly independent eigenvectors. That is, a matrix $A$ may have an eigenvalue $\lambda$ with multiplicity $m_\lambda > 1$, but the number of linearly independent eigenvectors belonging to $\lambda$ could be less than $m_\lambda$, so
$$1 \le \dim E(\lambda) < m_\lambda.$$
This means that $A$ does not have enough eigenvectors belonging to $\lambda$; hence it is impossible to find a transition matrix $Q$ such that $Q^{-1}AQ = D$ is a diagonal matrix. In this case it is not easy, for example, to compute the exponential matrix $e^A$, so the general solution $\mathbf{x}(t) = e^{tA}\mathbf{c}$ of the system $\mathbf{x}'(t) = A\mathbf{x}(t)$ of linear differential equations may not be easily found.
However, we show in this section that even a nondiagonalizable matrix is similar to a matrix very "close" to a diagonal matrix, called a Jordan canonical form. In this case, the columns of a transition matrix $Q$ are something similar to eigenvectors, but not quite; they are called generalized eigenvectors. Using the Jordan canonical form of $A$, the computation of $e^A$ becomes easier.
Recall that if $A$ is a diagonalizable matrix, then the general solution of a system of linear differential equations $\mathbf{x}'(t) = A\mathbf{x}(t)$ is given as
$$\mathbf{x}(t) = e^{tA}\mathbf{x}_0 = c_1e^{\lambda_1t}\mathbf{u}_1 + c_2e^{\lambda_2t}\mathbf{u}_2 + \cdots + c_ne^{\lambda_nt}\mathbf{u}_n,$$
where the vectors $\mathbf{u}_i$ are linearly independent eigenvectors of $A$ belonging to the eigenvalues $\lambda_i$ (see Section 6.4). This solution is not valid if $A$ is not diagonalizable, since we do not yet know how to compute $e^{tA}$ in that case. The following example illustrates how to handle these cases for $n = 2$ and $3$ by solving a system $\mathbf{x}'(t) = A\mathbf{x}(t)$ for an arbitrary matrix $A$. If the reader is not comfortable with systems of linear differential equations, the next example may be skipped, but it is very helpful for understanding the most essential features of a Jordan canonical form.
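Example 9.1 (1) Let $\mathbf{x}'(t) = A\mathbf{x}(t)$ be a system of linear differential equations, and let $A$ be a $2 \times 2$ matrix with an eigenvalue $\lambda$ of multiplicity 2.
If $\dim E(\lambda) = 2$, then one can find a basis $\{\mathbf{u}_1, \mathbf{u}_2\}$ of $E(\lambda)$; then $\mathbf{x}_i = e^{\lambda t}\mathbf{u}_i$, $i = 1, 2$, are linearly independent solutions of the system. Thus the general solution is
$$\mathbf{x}(t) = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 = e^{\lambda t}(c_1\mathbf{u}_1 + c_2\mathbf{u}_2),$$
where $c_1, c_2$ are arbitrary constants. Note that $A$ is diagonalized by the matrix $[\mathbf{u}_1\ \mathbf{u}_2]$.
Suppose $\dim E(\lambda) = 1$. Then with a basis $\mathbf{u}$ of $E(\lambda)$ one obtains only one solution $\mathbf{x}_1(t) = e^{\lambda t}\mathbf{u}$. To get the general solution of the system, we need one more solution linearly independent of $\mathbf{x}_1(t)$. Motivated by the type of solutions in Example 6.17, we assume that the second solution is of the form
$$\mathbf{x}(t) = te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{w},$$
where the vectors $\mathbf{v}$ and $\mathbf{w}$ are to be determined. As a solution, this vector function should satisfy the equation $\mathbf{x}'(t) = A\mathbf{x}(t)$. Thus for all $t$,
$$\mathbf{x}'(t) = te^{\lambda t}\lambda\mathbf{v} + e^{\lambda t}\mathbf{v} + \lambda e^{\lambda t}\mathbf{w}, \qquad A\mathbf{x}(t) = te^{\lambda t}A\mathbf{v} + e^{\lambda t}A\mathbf{w}.$$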

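By comparing the coefficient vectors of $te^{\lambda t}$ and $e^{\lambda t}$, we obtain two equations:
$$A\mathbf{v} = \lambda\mathbf{v}, \quad\text{or}\quad (A - \lambda I)\mathbf{v} = \mathbf{0},$$
$$A\mathbf{w} = \mathbf{v} + \lambda\mathbf{w}, \quad\text{or}\quad (A - \lambda I)\mathbf{w} = \mathbf{v}.$$
The first equation shows that $\mathbf{v}$ is an eigenvector of $A$ belonging to $\lambda$, so we may take $\mathbf{v} = \mathbf{u}$. From the second equation one can find a solution $\mathbf{w}$, so one always obtains a second solution $\mathbf{x}_2(t) = te^{\lambda t}\mathbf{u} + e^{\lambda t}\mathbf{w}$. In fact, the vector $\mathbf{w}$, which satisfies $(A - \lambda I)^2\mathbf{w} = (A - \lambda I)\mathbf{v} = \mathbf{0}$ but $(A - \lambda I)\mathbf{w} = \mathbf{v} \neq \mathbf{0}$, is a generalized eigenvector of $A$. It is also known that the vectors $\mathbf{v}$, $\mathbf{w}$ are linearly independent (see Theorem 9.2 below). Thus the general solution is of the form
$$\mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) = e^{\lambda t}\big((c_1 + c_2t)\mathbf{v} + c_2\mathbf{w}\big).$$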

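Now let $Q = [\mathbf{u}\ \mathbf{w}]$ (recall that $\mathbf{v} = \mathbf{u}$). Then
$$AQ = A[\mathbf{u}\ \mathbf{w}] = [A\mathbf{u}\ \ A\mathbf{w}] = [\lambda\mathbf{u}\ \ \mathbf{u} + \lambda\mathbf{w}] = [\mathbf{u}\ \mathbf{w}]\begin{bmatrix}\lambda & 1\\ 0 & \lambda\end{bmatrix} = QJ,$$
where $J = \begin{bmatrix}\lambda & 1\\ 0 & \lambda\end{bmatrix}$. Thus $Q^{-1}AQ = J$.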
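Case (1) can be checked numerically. The following Python sketch is only an illustration: the matrix $A = \begin{bmatrix}3 & 1\\ -1 & 1\end{bmatrix}$ is our own choice (it is not taken from the text), picked because its characteristic polynomial is $(\lambda - 2)^2$ while $A - 2I$ has rank 1. The vectors $\mathbf{u}$ and $\mathbf{w}$ were found by hand from $(A - 2I)\mathbf{u} = \mathbf{0}$ and $(A - 2I)\mathbf{w} = \mathbf{u}$; the script checks $AQ = QJ$ with exact fractions and then checks, at one sample point, that the general solution satisfies $\mathbf{x}' = A\mathbf{x}$.

```python
from fractions import Fraction
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Hypothetical 2 x 2 example: det(lambda*I - A) = (lambda - 2)^2,
# but A - 2I has rank 1, so dim E(2) = 1.
A = [[Fraction(3), Fraction(1)], [Fraction(-1), Fraction(1)]]
lam = 2
u = [1, -1]   # eigenvector: (A - 2I)u = 0
w = [1, 0]    # generalized eigenvector: (A - 2I)w = u

Q = [[u[0], w[0]], [u[1], w[1]]]   # Q = [u w], columns u and w
J = [[lam, 1], [0, lam]]           # the Jordan block
print(matmul(A, Q) == matmul(Q, J))   # True: AQ = QJ, i.e. Q^{-1}AQ = J

def x(t, c1, c2):
    """x(t) = e^{lam t}((c1 + c2 t)u + c2 w), the general solution found above."""
    s = math.exp(lam * t)
    return [s * ((c1 + c2 * t) * u[i] + c2 * w[i]) for i in range(2)]

# Check x'(t) = A x(t) at one point, using a central difference for x'(t).
t, h, c1, c2 = 0.3, 1e-6, 1.5, -0.7
deriv = [(p - m) / (2 * h) for p, m in zip(x(t + h, c1, c2), x(t - h, c1, c2))]
xt = x(t, c1, c2)
Ax = [sum(float(A[i][k]) * xt[k] for k in range(2)) for i in range(2)]
max_err = max(abs(d - e) for d, e in zip(deriv, Ax))
print(max_err < 1e-5)                 # True
```

The central difference only approximates $\mathbf{x}'(t)$, so the second check uses a tolerance rather than exact equality.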


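(2) Let $A$ be a $3 \times 3$ matrix with an eigenvalue $\lambda$ of multiplicity 3. Then three cases are possible: $A$ has either 3, 2 or 1 linearly independent eigenvectors. We consider each case separately.
(i) Suppose that $\dim E(\lambda) = 3$, and let $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ be three linearly independent eigenvectors of $A$. Then $\mathbf{x}_i(t) = e^{\lambda t}\mathbf{u}_i$, $i = 1, 2, 3$, are three linearly independent solutions of $\mathbf{x}'(t) = A\mathbf{x}(t)$, so the general solution is given as
$$\mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) + c_3\mathbf{x}_3(t) = e^{\lambda t}(c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3),$$
where $c_1$, $c_2$ and $c_3$ are arbitrary constants. In this case, the matrix $Q = [\mathbf{u}_1\ \mathbf{u}_2\ \mathbf{u}_3]$ diagonalizes $A$ as usual:
$$Q^{-1}AQ = \begin{bmatrix}\lambda & 0 & 0\\ 0 & \lambda & 0\\ 0 & 0 & \lambda\end{bmatrix} = J.$$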

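(ii) Suppose that $\dim E(\lambda) = 2$. For any nonzero vector $\mathbf{u} \in E(\lambda)$, $\mathbf{x}(t) = e^{\lambda t}\mathbf{u}$ is a solution of $\mathbf{x}'(t) = A\mathbf{x}(t)$. Hence one can always find two linearly independent solutions of the form $\mathbf{x}_i(t) = e^{\lambda t}\mathbf{u}_i$, where $\{\mathbf{u}_1, \mathbf{u}_2\}$ is a basis for $E(\lambda)$, since $\dim E(\lambda) = 2$. From experience, the third solution is expected to be of the form
$$\mathbf{x}(t) = te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{w}$$
for some vectors $\mathbf{v}$, $\mathbf{w}$ to be determined. Substituting this into the original equation $\mathbf{x}'(t) = A\mathbf{x}(t)$ gives
$$A\mathbf{v} = \lambda\mathbf{v}, \quad\text{or}\quad (A - \lambda I)\mathbf{v} = \mathbf{0},$$
$$A\mathbf{w} = \mathbf{v} + \lambda\mathbf{w}, \quad\text{or}\quad (A - \lambda I)\mathbf{w} = \mathbf{v}.$$
The first equation means that $\mathbf{v}$ is an eigenvector of $A$ in $E(\lambda)$. Thus $\mathbf{v} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2$ for some constants $c_1$, $c_2$. These constants are chosen so that the second equation is consistent, which is always possible. After that we can get three linearly independent solutions. However, in this case the first two solutions may be replaced by others: once the vectors $\mathbf{v}$ and $\mathbf{w}$ are determined, choose an eigenvector $\mathbf{u}$ in $E(\lambda)$ that is linearly independent of $\mathbf{v}$ (see Theorem 9.4 below for the reason). Then
$$\mathbf{x}_1(t) = e^{\lambda t}\mathbf{u}, \quad \mathbf{x}_2(t) = e^{\lambda t}\mathbf{v}, \quad \mathbf{x}_3(t) = te^{\lambda t}\mathbf{v} + e^{\lambda t}\mathbf{w}$$
form a fundamental set of solutions. Thus the general solution is given as
$$\mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) + c_3\mathbf{x}_3(t) = e^{\lambda t}\big(c_1\mathbf{u} + (c_2 + c_3t)\mathbf{v} + c_3\mathbf{w}\big).$$
Now, if we set $Q = [\mathbf{u}\ \mathbf{v}\ \mathbf{w}]$, then
$$AQ = A[\mathbf{u}\ \mathbf{v}\ \mathbf{w}] = [A\mathbf{u}\ \ A\mathbf{v}\ \ A\mathbf{w}] = [\lambda\mathbf{u}\ \ \lambda\mathbf{v}\ \ \mathbf{v} + \lambda\mathbf{w}] = [\mathbf{u}\ \mathbf{v}\ \mathbf{w}]\begin{bmatrix}\lambda & 0 & 0\\ 0 & \lambda & 1\\ 0 & 0 & \lambda\end{bmatrix} = QJ.$$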

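(iii) Finally, suppose that $\dim E(\lambda) = 1$, and let $\mathbf{u}$ be a basis for $E(\lambda)$, so that we get only one solution $\mathbf{x}_1(t) = e^{\lambda t}\mathbf{u}$. Then, by experience, the second and third solutions are expected to be of the form
$$\mathbf{x}_2(t) = te^{\lambda t}\mathbf{u} + e^{\lambda t}\mathbf{w}, \qquad \mathbf{x}_3(t) = \frac{t^2}{2}e^{\lambda t}\mathbf{u} + te^{\lambda t}\mathbf{w} + e^{\lambda t}\mathbf{z}.$$
By substituting these into the equation $\mathbf{x}'(t) = A\mathbf{x}(t)$, one obtains
$$A\mathbf{z} = \mathbf{w} + \lambda\mathbf{z}, \quad\text{or}\quad (A - \lambda I)\mathbf{z} = \mathbf{w},$$
$$A\mathbf{w} = \mathbf{u} + \lambda\mathbf{w}, \quad\text{or}\quad (A - \lambda I)\mathbf{w} = (A - \lambda I)^2\mathbf{z} = \mathbf{u},$$
$$A\mathbf{u} = \lambda\mathbf{u}, \quad\text{or}\quad (A - \lambda I)\mathbf{u} = (A - \lambda I)^3\mathbf{z} = \mathbf{0}.$$
It can be shown that the vectors $\mathbf{u}$, $\mathbf{w}$ and $\mathbf{z}$ are linearly independent (see Theorem 9.2 below); they are the so-called generalized eigenvectors of $A$. They give us three linearly independent solutions $\mathbf{x}_i$. Thus the general solution is given as
$$\mathbf{x}(t) = c_1\mathbf{x}_1(t) + c_2\mathbf{x}_2(t) + c_3\mathbf{x}_3(t) = e^{\lambda t}\Big(\big(c_1 + c_2t + c_3\tfrac{t^2}{2}\big)\mathbf{u} + (c_2 + c_3t)\mathbf{w} + c_3\mathbf{z}\Big).$$
Set $Q = [\mathbf{u}\ \mathbf{w}\ \mathbf{z}]$. Then
$$AQ = A[\mathbf{u}\ \mathbf{w}\ \mathbf{z}] = [A\mathbf{u}\ \ A\mathbf{w}\ \ A\mathbf{z}] = [\lambda\mathbf{u}\ \ \mathbf{u} + \lambda\mathbf{w}\ \ \mathbf{w} + \lambda\mathbf{z}] = [\mathbf{u}\ \mathbf{w}\ \mathbf{z}]\begin{bmatrix}\lambda & 1 & 0\\ 0 & \lambda & 1\\ 0 & 0 & \lambda\end{bmatrix} = QJ,$$
where $J = \begin{bmatrix}\lambda & 1 & 0\\ 0 & \lambda & 1\\ 0 & 0 & \lambda\end{bmatrix}$. Thus $Q^{-1}AQ = J$. □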

The matrix $J$ in each of the above cases is called the Jordan canonical form of the matrix $A$. Note that in each case of the example, $J$ can be divided into smaller submatrices, called Jordan blocks. For instance, in case (ii) of (2), $J$ can be written as $\begin{bmatrix}J_1 & 0\\ 0 & J_2\end{bmatrix}$ with two Jordan blocks $J_1 = [\lambda]$ of order 1 and $J_2 = \begin{bmatrix}\lambda & 1\\ 0 & \lambda\end{bmatrix}$ of order 2. Since $e^{tJ_1} = e^{\lambda t}$ and $e^{tJ_2} = e^{\lambda t}\begin{bmatrix}1 & t\\ 0 & 1\end{bmatrix}$,
$$\mathbf{x}(t) = e^{tA}\mathbf{x}_0 = Qe^{tJ}Q^{-1}\mathbf{x}_0 = Q\begin{bmatrix}e^{tJ_1} & 0\\ 0 & e^{tJ_2}\end{bmatrix}Q^{-1}\mathbf{x}_0 = e^{\lambda t}\big(c_1\mathbf{u} + (c_2 + c_3t)\mathbf{v} + c_3\mathbf{w}\big),$$
where $(c_1, c_2, c_3) = Q^{-1}\mathbf{x}_0$ with arbitrary constants $c_i$.
Observe that the number of Jordan blocks in each Jordan canonical matrix $J$ is equal to the number of linearly independent eigenvectors. The column vectors of the transition matrix $Q$ are called generalized eigenvectors of $A$ belonging to the eigenvalue $\lambda$. For a more precise definition, refer to Definitions 9.1 and 9.2 below.
This example makes the following theorem quite convincing; its proof may be found in more advanced linear algebra books.

Theorem 9.1 If a square matrix $A$ of order $n$ has $s$ linearly independent eigenvectors, then it is similar to a matrix $J$ of the following form, called the Jordan canonical form,
$$J = Q^{-1}AQ = \begin{bmatrix}J_1 & & 0\\ & \ddots & \\ 0 & & J_s\end{bmatrix},$$
in which each $J_i$, called a Jordan block, is a triangular matrix of the form
$$J_i = \begin{bmatrix}\lambda_i & 1 & & 0\\ & \lambda_i & \ddots & \\ & & \ddots & 1\\ 0 & & & \lambda_i\end{bmatrix},$$
where $\lambda_i$ is a single eigenvalue of $A$ and $s$ is the number of linearly independent eigenvectors of $A$.

Note that the same eigenvalue $\lambda_i$ may appear in several blocks if it has more than one linearly independent eigenvector. In particular, if $A$ has a full set of $n$ linearly independent eigenvectors, then there are $n$ Jordan blocks, each of which is just a $1 \times 1$ matrix, and the corresponding Jordan canonical form is just the diagonal matrix with the eigenvalues on the diagonal. Hence a diagonal matrix is a particular case of the Jordan canonical form.
In fact, the possible Jordan canonical forms of a matrix can be narrowed down using only the multiplicities of the eigenvalues and the number of linearly independent eigenvectors in each eigenspace, without knowing the transition matrix $Q$, as shown in the following example.

Example 9.2 Suppose that a $5 \times 5$ matrix $A$ has an eigenvalue $\lambda$ of multiplicity 5. Then seven Jordan canonical forms are possible, as follows.
(1) Suppose $A$ has only one linearly independent eigenvector belonging to $\lambda$. Then the Jordan canonical form of $A$ is
$$J = \begin{bmatrix}\lambda&1&0&0&0\\0&\lambda&1&0&0\\0&0&\lambda&1&0\\0&0&0&\lambda&1\\0&0&0&0&\lambda\end{bmatrix},$$
which consists of only one Jordan block with eigenvalue $\lambda$ on the diagonal.
(2) Suppose it has two linearly independent eigenvectors belonging to $\lambda$. Then the Jordan canonical form of $A$ is one of the forms
$$J = \begin{bmatrix}\lambda&1&&&\\0&\lambda&&&\\&&\lambda&1&0\\&&0&\lambda&1\\&&0&0&\lambda\end{bmatrix} \quad\text{or}\quad J = \begin{bmatrix}\lambda&&&&\\&\lambda&1&0&0\\&0&\lambda&1&0\\&0&0&\lambda&1\\&0&0&0&\lambda\end{bmatrix},$$
each of which consists of two Jordan blocks with eigenvalue $\lambda$ on the diagonal.
(3) Suppose it has three linearly independent eigenvectors belonging to $\lambda$. Then the Jordan canonical form of $A$ is one of the forms
$$J = \begin{bmatrix}\lambda&&&&\\&\lambda&1&&\\&0&\lambda&&\\&&&\lambda&1\\&&&0&\lambda\end{bmatrix} \quad\text{or}\quad J = \begin{bmatrix}\lambda&&&&\\&\lambda&&&\\&&\lambda&1&0\\&&0&\lambda&1\\&&0&0&\lambda\end{bmatrix},$$
each of which consists of three Jordan blocks with eigenvalue $\lambda$ on the diagonal.
(4) Suppose it has four linearly independent eigenvectors belonging to $\lambda$. Then the Jordan canonical form of $A$ is
$$J = \begin{bmatrix}\lambda&1&&&\\0&\lambda&&&\\&&\lambda&&\\&&&\lambda&\\&&&&\lambda\end{bmatrix},$$
which consists of four Jordan blocks with eigenvalue $\lambda$ on the diagonal.
(5) Suppose it has five linearly independent eigenvectors belonging to $\lambda$. Then the Jordan canonical form of $A$ is
$$J = \begin{bmatrix}\lambda&&&&\\&\lambda&&&\\&&\lambda&&\\&&&\lambda&\\&&&&\lambda\end{bmatrix},$$
which is just a diagonal matrix; that is, in this case $A$ is diagonalizable and its Jordan canonical form is its diagonal form.
Note that in cases (2) and (3), which of the two possible Jordan canonical forms is similar to the given matrix $A$ depends on the nature of the eigenvectors of $A$; this will be discussed in the following section. □
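Because each Jordan block contributes exactly one linearly independent eigenvector, the seven forms above correspond to the ways of writing 5 as a sum of block sizes, and the number of summands is the number of eigenvectors. A small Python sketch (the generator `partitions` is our own helper, not something from the text) lists them grouped by the number of blocks:

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples of block sizes."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

# One partition = one Jordan form (up to the order of the blocks).
forms = list(partitions(5))
by_blocks = {}
for p in forms:
    # number of blocks = number of linearly independent eigenvectors
    by_blocks.setdefault(len(p), []).append(p)

for k in sorted(by_blocks):
    print(k, by_blocks[k])
```

It prints one line per number of blocks; two and three blocks each have two candidates, namely $(4, 1), (3, 2)$ and $(3, 1, 1), (2, 2, 1)$, which are exactly the two alternatives in cases (2) and (3).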

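Example 9.3 Let $J$ be a matrix of a Jordan canonical form:
$$J = \begin{bmatrix}6&1&&&\\0&6&&&\\&&0&1&\\&&0&0&\\&&&&0\end{bmatrix} = \begin{bmatrix}J_1&&\\&J_2&\\&&J_3\end{bmatrix}, \quad J_1 = \begin{bmatrix}6&1\\0&6\end{bmatrix},\ J_2 = \begin{bmatrix}0&1\\0&0\end{bmatrix},\ J_3 = [0].$$
(1) Find all possible forms of the matrix $A$ that can be similar to $J$, i.e., $Q^{-1}AQ = J$ for some invertible matrix $Q$.
(2) Find an invertible matrix $Q$ such that $Q^{-1}AQ = J$.

Solution: Since $J$ is an upper triangular matrix, the eigenvalues of $J$ are the diagonal entries 6 and 0, with multiplicities 2 and 3, respectively. The eigenspace $E(6)$ has a single linearly independent eigenvector $\mathbf{e}_1 = (1, 0, 0, 0, 0)$, so $\dim E(6) = 1$. Thus $\lambda = 6$ appears only in a single block $J_1$. The eigenspace $E(0)$ has two linearly independent eigenvectors $\mathbf{e}_3$ and $\mathbf{e}_5$, so $\dim E(0) = 2$, and $\lambda = 0$ appears in two blocks $J_2$ and $J_3$.
Set $Q = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_5]$ and rewrite $Q^{-1}AQ = J$ as $AQ = QJ$. Then
$$A[\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_5] = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_5]\begin{bmatrix}6&1&&&\\0&6&&&\\&&0&1&\\&&0&0&\\&&&&0\end{bmatrix}.$$
Hence,
$$[A\mathbf{x}_1\ A\mathbf{x}_2\ A\mathbf{x}_3\ A\mathbf{x}_4\ A\mathbf{x}_5] = [6\mathbf{x}_1\ \ \mathbf{x}_1 + 6\mathbf{x}_2\ \ 0\mathbf{x}_3\ \ \mathbf{x}_3 + 0\mathbf{x}_4\ \ 0\mathbf{x}_5],$$
or
$$A\mathbf{x}_1 = 6\mathbf{x}_1,\quad A\mathbf{x}_2 = 6\mathbf{x}_2 + \mathbf{x}_1,\quad A\mathbf{x}_3 = 0\mathbf{x}_3,\quad A\mathbf{x}_4 = 0\mathbf{x}_4 + \mathbf{x}_3,\quad A\mathbf{x}_5 = 0\mathbf{x}_5.$$
Thus the matrix $A$ has three eigenvectors $\mathbf{x}_1$, $\mathbf{x}_3$, $\mathbf{x}_5$, just as $J$ has. The vector $\mathbf{x}_1$ belonging to $\lambda = 6$ is in the first column of $Q$, as $\mathbf{e}_1$ is in the first column of $J$. The two vectors $\mathbf{x}_3$ and $\mathbf{x}_5$ belonging to $\lambda = 0$ are placed in the third and fifth columns of $Q$. The two vectors $\mathbf{x}_2$, $\mathbf{x}_4$ are not eigenvectors, but they satisfy the equations $(A - 6I)\mathbf{x}_2 = \mathbf{x}_1$ and $(A - 0I)\mathbf{x}_4 = \mathbf{x}_3$, which follow from the second and fourth equations. Then one can easily see that they further satisfy
$$(A - 6I)^2\mathbf{x}_2 = (A - 6I)\mathbf{x}_1 = \mathbf{0}, \qquad (A - 0I)^2\mathbf{x}_4 = (A - 0I)\mathbf{x}_3 = \mathbf{0}.$$
These "special" vectors $\mathbf{x}_2$ and $\mathbf{x}_4$ make up for the eigenvectors that the eigenvalues 6 and 0, respectively, are lacking, and they are called generalized eigenvectors.
In summary, if a $5 \times 5$ matrix $A$ is similar to the Jordan canonical form $J$, then $A$ has eigenvalues 6 and 0 of multiplicities 2 and 3, respectively, but only one linearly independent eigenvector, say $\mathbf{x}_1$, belonging to 6 and only two linearly independent eigenvectors, say $\mathbf{x}_3$, $\mathbf{x}_5$, belonging to 0. For such a matrix $A$, the transition matrix $Q$ can be made of $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_5$, where $\mathbf{x}_2$, $\mathbf{x}_4$ are nonzero vectors satisfying the equations
$$(A - 6I)^2\mathbf{x}_2 = \mathbf{0}, \qquad (A - 0I)^2\mathbf{x}_4 = \mathbf{0},$$
but
$$(A - 6I)\mathbf{x}_2 \neq \mathbf{0}, \qquad (A - 0I)\mathbf{x}_4 \neq \mathbf{0}. \qquad □$$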
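The ambiguity noted at the end of Example 9.2 can be resolved by a standard fact that this section does not use: the number of Jordan blocks of size at least $k$ belonging to $\lambda$ equals $\operatorname{rank}(A - \lambda I)^{k-1} - \operatorname{rank}(A - \lambda I)^k$. The following Python sketch of that criterion is illustrative only (the helper names `rank`, `matmul` and `block_sizes` are ours); applied to the matrix $J$ of Example 9.3, it should recover one block of size 2 for $\lambda = 6$ and blocks of sizes 2 and 1 for $\lambda = 0$.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, by exact Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(R), len(R[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == rows:
            break
    return r

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def block_sizes(A, lam):
    """Sizes of the Jordan blocks of A for the eigenvalue lam (largest first)."""
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    ranks = [n]                        # rank of (A - lam I)^0 = I
    while True:
        P = matmul(P, B)
        ranks.append(rank(P))
        if ranks[-1] == ranks[-2]:     # the ranks have stabilized
            break
    # at_least[k - 1] = number of blocks of size >= k (padded with a final 0)
    at_least = [ranks[k - 1] - ranks[k] for k in range(1, len(ranks))] + [0]
    sizes = []
    for k in range(1, len(at_least)):
        sizes += [k] * (at_least[k - 1] - at_least[k])
    return sorted(sizes, reverse=True)

J = [[6, 1, 0, 0, 0],
     [0, 6, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
print(block_sizes(J, 6), block_sizes(J, 0))   # [2] [2, 1]
```

Since ranks are invariant under similarity, the same function gives the same answer for any matrix $A = QJQ^{-1}$.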

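Example 9.4 Solve the system of linear differential equations $\mathbf{x}'(t) = A\mathbf{x}(t)$, where
$$A = \begin{bmatrix}5 & -3 & -2\\ 8 & -5 & -4\\ -4 & 3 & 3\end{bmatrix}.$$

Solution: (1) The eigenvalue of $A$ is $\lambda = 1$, of multiplicity 3.
(2) The eigenvectors are the solutions of
$$(A - I)\mathbf{x} = \begin{bmatrix}4 & -3 & -2\\ 8 & -6 & -4\\ -4 & 3 & 2\end{bmatrix}\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \mathbf{0}.$$
The three equations are multiples of the single equation $4x - 3y - 2z = 0$, so we get two linearly independent eigenvectors $\mathbf{u}_1 = (1, 0, 2)$ and $\mathbf{u}_2 = (0, 2, -3)$. Thus $\mathbf{x}_i(t) = e^t\mathbf{u}_i$, $i = 1, 2$, are two linearly independent solutions.
(3) For the third solution, we set $\mathbf{x}(t) = te^t\mathbf{v} + e^t\mathbf{w}$, where $\mathbf{v}$ and $\mathbf{w}$ are to satisfy $(A - I)\mathbf{v} = \mathbf{0}$ and $(A - I)\mathbf{w} = \mathbf{v}$. The first equation means that $\mathbf{v}$ is an eigenvector of $A$, so one can write
$$\mathbf{v} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 = (c_1,\ 2c_2,\ 2c_1 - 3c_2).$$
The second equation now reads
$$(A - I)\mathbf{w} = \begin{bmatrix}4 & -3 & -2\\ 8 & -6 & -4\\ -4 & 3 & 2\end{bmatrix}\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \begin{bmatrix}c_1\\ 2c_2\\ 2c_1 - 3c_2\end{bmatrix} = \mathbf{v}.$$
This system has a solution (that is, it is consistent) if and only if $c_1 = c_2$. By choosing $c_1 = c_2 = 2$, we get a solution $\mathbf{w} = (0, 0, -1)$ and $\mathbf{v} = (2, 4, -2)$. Since $\mathbf{u} = \mathbf{u}_1 = (1, 0, 2)$ is linearly independent of $\mathbf{v}$ and $\mathbf{w}$, we obtain three new linearly independent solutions $\mathbf{x}_1(t) = e^t\mathbf{u}$, $\mathbf{x}_2(t) = e^t\mathbf{v}$ and $\mathbf{x}_3(t) = te^t\mathbf{v} + e^t\mathbf{w}$. Thus a general solution is
$$\mathbf{x}(t) = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = e^t\big(c_1\mathbf{u} + (c_2 + c_3t)\mathbf{v} + c_3\mathbf{w}\big).$$
(4) Note that for $Q = [\mathbf{u}\ \mathbf{v}\ \mathbf{w}] = \begin{bmatrix}1 & 2 & 0\\ 0 & 4 & 0\\ 2 & -2 & -1\end{bmatrix}$ we get
$$J = Q^{-1}AQ = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1\end{bmatrix} = \begin{bmatrix}J_1 & 0\\ 0 & J_2\end{bmatrix},$$
or $A = QJQ^{-1}$. Now the general solution may also be computed as
$$\mathbf{x}(t) = e^{tA}\mathbf{x}_0 = Qe^{tJ}Q^{-1}\mathbf{x}_0 = Q\begin{bmatrix}e^{tJ_1} & 0\\ 0 & e^{tJ_2}\end{bmatrix}\begin{bmatrix}c_1\\ c_2\\ c_3\end{bmatrix} = e^t[\mathbf{u}\ \ \mathbf{v}\ \ t\mathbf{v} + \mathbf{w}]\begin{bmatrix}c_1\\ c_2\\ c_3\end{bmatrix}$$
$$= e^t\big(c_1\mathbf{u} + c_2\mathbf{v} + c_3(t\mathbf{v} + \mathbf{w})\big) = e^t\big(c_1\mathbf{u} + (c_2 + c_3t)\mathbf{v} + c_3\mathbf{w}\big),$$
where $Q^{-1}\mathbf{x}_0 = (c_1, c_2, c_3)$, since $e^{tJ_1} = e^t$ and $e^{tJ_2} = e^t\begin{bmatrix}1 & t\\ 0 & 1\end{bmatrix}$ by Example 6.17. This coincides with the result in (3). □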
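The computations of Example 9.4 are easy to confirm by machine. The sketch below uses plain Python integer arithmetic (`matmul` and `apply` are our own helpers) to check that $\mathbf{u}$ and $\mathbf{v}$ are eigenvectors, that $(A - I)\mathbf{w} = \mathbf{v}$, and that $AQ = QJ$ for $Q = [\mathbf{u}\ \mathbf{v}\ \mathbf{w}]$:

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

A = [[5, -3, -2], [8, -5, -4], [-4, 3, 3]]
A_minus_I = [[A[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]
u, v, w = [1, 0, 2], [2, 4, -2], [0, 0, -1]

print(apply(A_minus_I, u))   # (A - I)u = 0: u is an eigenvector
print(apply(A_minus_I, v))   # (A - I)v = 0: v is an eigenvector
print(apply(A_minus_I, w))   # (A - I)w = v: w is a generalized eigenvector

Q = [[u[i], v[i], w[i]] for i in range(3)]   # columns u, v, w
J = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(matmul(A, Q) == matmul(Q, J))           # True: AQ = QJ, i.e. Q^{-1}AQ = J
```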
Problem 9.1 Let $A$ be a $5 \times 5$ matrix with two distinct eigenvalues: $\lambda$ of multiplicity 3 and $\mu$ of multiplicity 2. Find all possible Jordan canonical forms of $A$ up to permutations of the Jordan blocks.

9.2 Generalized eigenvectors


In this section, we discuss a theoretical basis for using those generalized
eigenvectors to produce an invertible transition matrix Q that transforms
the given matrix A into the Jordan canonical form, and Examples 9.5 and
9.6 show a practical method of finding the transition matrices. However, at
the instructor's discretion, the theoretical argument in the first part of this
section may be skipped.
Consider the columns of the transition matrix $Q$ such that $Q^{-1}AQ = J$. By comparing the columns of the equation $AQ = QJ$, one can easily see that the columns of $Q$ corresponding to the first column of each Jordan block of $J$ are linearly independent eigenvectors of $A$, and, as we saw in Example 9.1, the other columns of $Q$ are generalized eigenvectors.

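Definition 9.1 A nonzero vector $\mathbf{x}$ is said to be a generalized eigenvector of $A$ of rank $k$ belonging to an eigenvalue $\lambda$ if
$$(A - \lambda I)^k\mathbf{x} = \mathbf{0} \quad\text{and}\quad (A - \lambda I)^{k-1}\mathbf{x} \neq \mathbf{0}.$$

Note that if $k = 1$, this is the usual definition of an eigenvector. Let $\mathbf{x}$ be a generalized eigenvector of rank $k$ belonging to an eigenvalue $\lambda$. Define
$$\begin{aligned}
\mathbf{x}_k &= \mathbf{x},\\
\mathbf{x}_{k-1} &= (A - \lambda I)\mathbf{x} = (A - \lambda I)\mathbf{x}_k,\\
\mathbf{x}_{k-2} &= (A - \lambda I)^2\mathbf{x} = (A - \lambda I)\mathbf{x}_{k-1},\\
&\ \ \vdots\\
\mathbf{x}_2 &= (A - \lambda I)^{k-2}\mathbf{x} = (A - \lambda I)\mathbf{x}_3,\\
\mathbf{x}_1 &= (A - \lambda I)^{k-1}\mathbf{x} = (A - \lambda I)\mathbf{x}_2.
\end{aligned}$$
Definition 9.2 The set of vectors $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ is called a chain of generalized eigenvectors belonging to the eigenvalue $\lambda$.
Note that if $\mathbf{x}$ is a generalized eigenvector of $A$ of rank $k > 1$ belonging to an eigenvalue $\lambda$, then for each $\ell$ with $1 \le \ell \le k$,
$$(A - \lambda I)^\ell\mathbf{x}_\ell = (A - \lambda I)^k\mathbf{x} = \mathbf{0} \quad\text{and}\quad (A - \lambda I)^{\ell-1}\mathbf{x}_\ell = (A - \lambda I)^{k-1}\mathbf{x} \neq \mathbf{0}.$$
Hence the vector $\mathbf{x}_\ell = (A - \lambda I)^{k-\ell}\mathbf{x}$ is a generalized eigenvector of $A$ of rank $\ell$. In particular, $\mathbf{x}_1 = (A - \lambda I)^{k-1}\mathbf{x}$ is always an eigenvector belonging to $\lambda$, called the initial vector of the chain. Note also that $(A - \lambda I)^\ell\mathbf{x}_i = \mathbf{0}$ for $\ell \ge i$.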
The following series of theorems shows that a transition matrix $Q$ may be constructed from a set of linearly independent generalized eigenvectors of $A$, and justifies the invertibility of $Q$. Example 9.3 also shows how to find a transition matrix $Q$ in practice; these theorems justify that method.

Theorem 9.2 A chain of generalized eigenvectors $S = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ belonging to an eigenvalue $\lambda$ is linearly independent.

Proof: Let us solve $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_k\mathbf{x}_k = \mathbf{0}$ for scalars $c_i$, $i = 1, \ldots, k$. If we multiply (on the left) both sides of this equation by $(A - \lambda I)^{k-1}$, then for $i = 1, \ldots, k - 1$,
$$(A - \lambda I)^{k-1}\mathbf{x}_i = \mathbf{0},$$
since $k - 1 \ge i$. Thus $c_k(A - \lambda I)^{k-1}\mathbf{x}_k = c_k\mathbf{x}_1 = \mathbf{0}$, and hence $c_k = 0$ because $\mathbf{x}_1 \neq \mathbf{0}$. Do the same to the equation $c_1\mathbf{x}_1 + \cdots + c_{k-1}\mathbf{x}_{k-1} = \mathbf{0}$ with $(A - \lambda I)^{k-2}$ and get $c_{k-1} = 0$. Proceeding successively, we can show that $c_i = 0$ for all $i = 1, \ldots, k$. That is, the equation has only the trivial solution. Hence the set $S$ is linearly independent. □
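Definition 9.2 and Theorem 9.2 translate directly into code. The sketch below is illustrative: `chain` and `rank` are our own helper names, and the test matrix is the $4 \times 4$ matrix of Example 9.6 below. It builds the chain by repeatedly applying $A - \lambda I$ to a generalized eigenvector of rank 4 and confirms, by exact Gaussian elimination, that the four chain vectors have rank 4:

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors (as rows), by exact forward elimination."""
    R = [[Fraction(v) for v in row] for row in vectors]
    r = 0
    for c in range(len(R[0])):
        pivot = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(r + 1, len(R)):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r

def chain(A, lam, x, k):
    """Chain [x_1, ..., x_k] with x_k = x and x_{j-1} = (A - lam I) x_j."""
    n = len(A)
    xs = [list(x)]
    for _ in range(k - 1):
        y = xs[0]
        xs.insert(0, [sum((A[i][j] - (lam if i == j else 0)) * y[j] for j in range(n))
                      for i in range(n)])
    return xs

A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, 4, -6, 4]]
xs = chain(A, 1, [-1, 0, 0, 0], 4)
print(xs)          # [x_1, x_2, x_3, x_4]
print(rank(xs))    # 4, so the chain is linearly independent (Theorem 9.2)
```

Extending the chain one more step produces the zero vector, because the initial vector $\mathbf{x}_1$ is an eigenvector.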

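Theorem 9.3 The union of chains of generalized eigenvectors of a square matrix $A$ belonging to distinct eigenvalues is linearly independent.

Proof: Let $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ and $\{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_\ell\}$ be chains of generalized eigenvectors of $A$ belonging to the eigenvalues $\lambda$ and $\mu$, respectively, with $\lambda \neq \mu$. We wish to show that the set of vectors $\{\mathbf{x}_1, \ldots, \mathbf{x}_k, \mathbf{y}_1, \ldots, \mathbf{y}_\ell\}$ is linearly independent. To this end, consider
$$c_1\mathbf{x}_1 + \cdots + c_k\mathbf{x}_k + d_1\mathbf{y}_1 + \cdots + d_\ell\mathbf{y}_\ell = \mathbf{0}$$
for scalars $c_i$ and $d_j$. Multiply both sides of the equation by $(A - \lambda I)^k$ and note that $(A - \lambda I)^k\mathbf{x}_i = \mathbf{0}$ for all $i = 1, \ldots, k$. Thus we have
$$d_1(A - \lambda I)^k\mathbf{y}_1 + \cdots + d_\ell(A - \lambda I)^k\mathbf{y}_\ell = \mathbf{0}.$$
Again, multiply this equation by $(A - \mu I)^{\ell-1}$ and note that
$$(A - \mu I)^{\ell-1}(A - \lambda I)^k = (A - \lambda I)^k(A - \mu I)^{\ell-1}, \qquad (A - \mu I)^{\ell-1}\mathbf{y}_\ell = \mathbf{y}_1, \qquad (A - \mu I)^{\ell-1}\mathbf{y}_i = \mathbf{0}$$
for $i = 1, \ldots, \ell - 1$. Thus we obtain
$$d_\ell(A - \lambda I)^k\mathbf{y}_1 = \mathbf{0}.$$
Because $(A - \mu I)\mathbf{y}_1 = \mathbf{0}$ (that is, $A\mathbf{y}_1 = \mu\mathbf{y}_1$), we have $(A - \lambda I)\mathbf{y}_1 = (\mu - \lambda)\mathbf{y}_1$, so this reduces to
$$d_\ell(\mu - \lambda)^k\mathbf{y}_1 = \mathbf{0},$$
which implies that $d_\ell = 0$ by the assumption $\lambda \neq \mu$ and $\mathbf{y}_1 \neq \mathbf{0}$. Proceeding successively, we can show that $d_i = 0$ for $i = \ell, \ell - 1, \ldots, 2, 1$, so we are left with
$$c_1\mathbf{x}_1 + \cdots + c_k\mathbf{x}_k = \mathbf{0}.$$
Since $\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$ is linearly independent by Theorem 9.2, $c_i = 0$ for all $i = 1, \ldots, k$. Thus the set of generalized eigenvectors $\{\mathbf{x}_1, \ldots, \mathbf{x}_k, \mathbf{y}_1, \ldots, \mathbf{y}_\ell\}$ is linearly independent. □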

The next step in producing $Q$ such that $AQ = QJ$ is to describe a method for choosing chains of generalized eigenvectors from a generalized eigenspace, defined below, so that the union of the chains is linearly independent.

Definition 9.3 Let $\lambda$ be an eigenvalue of $A$. The generalized eigenspace of $A$ belonging to $\lambda$, denoted by $K_\lambda$, is the set
$$K_\lambda = \{\mathbf{x} \in \mathbb{C}^n : (A - \lambda I)^p\mathbf{x} = \mathbf{0} \text{ for some positive integer } p\}.$$

It turns out that $\dim K_\lambda$ equals the multiplicity of $\lambda$, and $K_\lambda$ contains the usual eigenspace $N(A - \lambda I)$. The following theorem enables us to choose a basis for $K_\lambda$; we omit its proof, although it can be proved by induction on the number of vectors in $S \cup T$.

Theorem 9.4 Let $S = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ and $T = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_\ell\}$ be two chains of generalized eigenvectors of $A$ belonging to the same eigenvalue $\lambda$. If the initial vectors $\mathbf{x}_1$ and $\mathbf{y}_1$ are linearly independent, then the union $S \cup T$ is linearly independent.

Note that this theorem extends easily to any finite number of chains of generalized eigenvectors of $A$ belonging to an eigenvalue $\lambda$ whose initial vectors are linearly independent. Such chains can be chosen so that their union forms a basis for $K_\lambda$, and the matrix $Q$ may then be constructed from these bases, one for each eigenvalue, as usual.

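Example 9.5 Find the Jordan canonical form of the matrix
$$A = \begin{bmatrix}2 & 1 & 4\\ 0 & 2 & -1\\ 0 & 0 & 3\end{bmatrix}.$$

Solution: The eigenvalues of $A$ are $\lambda_1 = \lambda_2 = 2$ and $\lambda_3 = 3$. Since $\operatorname{rank}(A - \lambda_1I) = 2$, the dimension of the eigenspace $N(A - \lambda_1I)$ is 1. Thus there is only one linearly independent eigenvector belonging to $\lambda_1 = \lambda_2 = 2$, which is of the form $\mathbf{u}_1 = (a, 0, 0)$ with $a \neq 0$, and an eigenvector belonging to $\lambda_3 = 3$ is found to be $\mathbf{u}_3 = (3, -1, 1)$. We need to find a generalized eigenvector of rank 2 belonging to the eigenvalue 2, which is a solution of the following system:
$$(A - 2I)\mathbf{x} \neq \mathbf{0}, \qquad (A - 2I)^2\mathbf{x} = \begin{bmatrix}0 & 0 & 3\\ 0 & 0 & -1\\ 0 & 0 & 1\end{bmatrix}\mathbf{x} = \mathbf{0}.$$
From the second equation, $\mathbf{x}$ has to be of the form $(a, b, 0)$, and from the first equation we must have $b \neq 0$. Let us take $\mathbf{u}_2 = (0, 1, 0)$ as a generalized eigenvector of rank 2. Thus we have
$$(A - 2I)\mathbf{u}_2 = \mathbf{u}_1 = (1, 0, 0), \qquad (A - 2I)^2\mathbf{u}_2 = (A - 2I)\mathbf{u}_1 = \mathbf{0}.$$
Clearly, the set of vectors $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is linearly independent. Set
$$Q = [\mathbf{u}_1\ \mathbf{u}_2\ \mathbf{u}_3] = \begin{bmatrix}1 & 0 & 3\\ 0 & 1 & -1\\ 0 & 0 & 1\end{bmatrix}, \quad\text{so}\quad Q^{-1} = \begin{bmatrix}1 & 0 & -3\\ 0 & 1 & 1\\ 0 & 0 & 1\end{bmatrix}.$$
Then
$$Q^{-1}AQ = \begin{bmatrix}2 & 1 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3\end{bmatrix} = \begin{bmatrix}J_1 & 0\\ 0 & J_2\end{bmatrix},$$
where $J_1 = \begin{bmatrix}2 & 1\\ 0 & 2\end{bmatrix}$ and $J_2 = [3]$. □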
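A quick mechanical check of Example 9.5: the sketch below multiplies out $Q^{-1}AQ$ with integer arithmetic (`matmul` is our own helper). The matrix $A$ used here is the one determined by the data of the solution, namely $A\mathbf{u}_1 = 2\mathbf{u}_1$, $A\mathbf{u}_2 = \mathbf{u}_1 + 2\mathbf{u}_2$ and $A\mathbf{u}_3 = 3\mathbf{u}_3$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A    = [[2, 1, 4], [0, 2, -1], [0, 0, 3]]
Q    = [[1, 0, 3], [0, 1, -1], [0, 0, 1]]    # columns u1, u2, u3
Qinv = [[1, 0, -3], [0, 1, 1], [0, 0, 1]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(Q, Qinv) == I3)                 # True: Qinv really is Q^{-1}
J = matmul(matmul(Qinv, A), Q)
print(J)                                      # [[2, 1, 0], [0, 2, 0], [0, 0, 3]]
```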

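Example 9.6 Find $Q$ so that $Q^{-1}AQ = J$ is the Jordan canonical form of the matrix
$$A = \begin{bmatrix}0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ -1 & 4 & -6 & 4\end{bmatrix}.$$
Solution: The characteristic polynomial of the matrix $A$ is
$$\det(\lambda I - A) = \lambda^4 - 4\lambda^3 + 6\lambda^2 - 4\lambda + 1 = (\lambda - 1)^4.$$
Therefore, the only eigenvalue of $A$ is $\lambda = 1$, of multiplicity 4. Note that $\dim N(A - I) = 1$, since the rank of the matrix
$$A - I = \begin{bmatrix}-1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 1\\ -1 & 4 & -6 & 3\end{bmatrix}$$
is 3: the first three rows are linearly independent, and the fourth row is a linear combination of them. Thus there is only one linearly independent eigenvector belonging to $\lambda = 1$, say $\mathbf{x} = (1, 1, 1, 1)$. We need to find a generalized eigenvector of rank 4, which is a solution $\mathbf{x}$ of the following equations:
$$(A - I)^3\mathbf{x} = \begin{bmatrix}-1 & 3 & -3 & 1\\ -1 & 3 & -3 & 1\\ -1 & 3 & -3 & 1\\ -1 & 3 & -3 & 1\end{bmatrix}\mathbf{x} \neq \mathbf{0}, \qquad (A - I)^4\mathbf{x} = \mathbf{0}.$$
A direct computation shows that $(A - I)^4 = 0$. Hence we may take any vector that satisfies the first equation, say $\mathbf{x} = (-1, 0, 0, 0)$, as a generalized eigenvector of rank 4. Now take $\mathbf{x}_4 = (-1, 0, 0, 0)$, and
$$\begin{aligned}
\mathbf{x}_3 &= (A - I)\mathbf{x}_4 = (1, 0, 0, 1),\\
\mathbf{x}_2 &= (A - I)\mathbf{x}_3 = (-1, 0, 1, 2),\\
\mathbf{x}_1 &= (A - I)\mathbf{x}_2 = (1, 1, 1, 1).
\end{aligned}$$
By Theorem 9.2, the chain of generalized eigenvectors $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\}$ is linearly independent. Therefore,
$$Q = [\mathbf{x}_1\ \mathbf{x}_2\ \mathbf{x}_3\ \mathbf{x}_4] = \begin{bmatrix}1 & -1 & 1 & -1\\ 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 1 & 2 & 1 & 0\end{bmatrix} \quad\text{and}\quad Q^{-1} = \begin{bmatrix}0 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 1 & -2 & 1\\ -1 & 3 & -3 & 1\end{bmatrix},$$
and
$$Q^{-1}AQ = \begin{bmatrix}1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{bmatrix} = J.$$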
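The transition matrix of Example 9.6 can likewise be verified mechanically. This Python sketch (the helper name `matmul` is ours) confirms that $(A - I)^4 = 0$, that the stated $Q^{-1}$ is the inverse of $Q$, and that $Q^{-1}AQ$ is a single $4 \times 4$ Jordan block:

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, 4, -6, 4]]
Q = [[1, -1, 1, -1],        # columns x1, x2, x3, x4 of the chain
     [1,  0, 0,  0],
     [1,  1, 0,  0],
     [1,  2, 1,  0]]
Qinv = [[0, 1, 0, 0], [0, -1, 1, 0], [0, 1, -2, 1], [-1, 3, -3, 1]]

N = [[A[i][j] - (1 if i == j else 0) for j in range(4)] for i in range(4)]   # A - I
N4 = matmul(matmul(N, N), matmul(N, N))
print(all(v == 0 for row in N4 for v in row))   # True: (A - I)^4 = 0
print(matmul(Q, Qinv))                           # the 4 x 4 identity
print(matmul(matmul(Qinv, A), Q))                # [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
```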

Problem 9.2 Find a full set of generalized eigenvectors of the following matrices:
(1) $\begin{bmatrix}-2 & 0 & -2\\ -1 & 1 & -2\\ 0 & 1 & -1\end{bmatrix}$, (2) $\begin{bmatrix}-6 & 31 & -14\\ -1 & 6 & -2\\ 0 & 2 & 1\end{bmatrix}$.

Problem 9.3 Find the Jordan canonical form for each of the following matrices:
(1), (2), (3) [matrices not recoverable from the source]

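9.3 Computation of $e^A$
The Jordan canonical form of an arbitrary matrix enables us to compute the exponential matrix. Let $A$ be an arbitrary square matrix, and let $J$ be the Jordan canonical form of $A$ such that
$$Q^{-1}AQ = J = \begin{bmatrix}J_1 & & 0\\ & \ddots & \\ 0 & & J_s\end{bmatrix},$$
where $Q$ is made of generalized eigenvectors of $A$ and the $J_i$'s are Jordan blocks.
(1) Computation of the power $A^k$ of $A$ for $k = 1, 2, \ldots$: Since we have
$$A^k = QJ^kQ^{-1} \quad\text{and}\quad J^k = \begin{bmatrix}J_1^k & & 0\\ & \ddots & \\ 0 & & J_s^k\end{bmatrix}$$
for $k = 1, 2, \ldots$, we may assume that $J$ is a single Jordan block and compute $J^k$. Now an $n \times n$ Jordan block $J$ belonging to an eigenvalue $\lambda$ of $A$ may be written as
$$J = \begin{bmatrix}\lambda & 1 & & 0\\ & \lambda & \ddots & \\ & & \ddots & 1\\ 0 & & & \lambda\end{bmatrix} = \lambda\begin{bmatrix}1 & & 0\\ & \ddots & \\ 0 & & 1\end{bmatrix} + \begin{bmatrix}0 & 1 & & 0\\ & 0 & \ddots & \\ & & \ddots & 1\\ 0 & & & 0\end{bmatrix} = \lambda I + N.$$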

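Since $I$ is the identity matrix, clearly $IN = NI$, and
$$J^k = (\lambda I + N)^k = \sum_{j=0}^{k}\binom{k}{j}\lambda^{k-j}N^j.$$
Note that $N^k = 0$ for $k \ge n$. Thus, with the convention $\binom{k}{\ell} = 0$ if $k < \ell$,
$$J^k = \sum_{j=0}^{n-1}\binom{k}{j}\lambda^{k-j}N^j = \lambda^kI + \binom{k}{1}\lambda^{k-1}N + \cdots + \binom{k}{n-1}\lambda^{k-(n-1)}N^{n-1}$$
$$= \begin{bmatrix}\lambda^k & \binom{k}{1}\lambda^{k-1} & \binom{k}{2}\lambda^{k-2} & \cdots & \binom{k}{n-1}\lambda^{k-n+1}\\ 0 & \lambda^k & \binom{k}{1}\lambda^{k-1} & \cdots & \binom{k}{n-2}\lambda^{k-n+2}\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & \lambda^k & \binom{k}{1}\lambda^{k-1}\\ 0 & \cdots & 0 & 0 & \lambda^k\end{bmatrix}.$$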
Problem 9.4 Compute $A^k$ and $B^k$, $k = 1, 2, \ldots$, for
(1) $A$ = [matrix not recoverable from the source], (2) $B$ = [matrix not recoverable from the source].

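(2) Computation of the exponential matrix $e^A$ of $A$: Note that
$$e^A = e^{QJQ^{-1}} = Qe^JQ^{-1} = Q\begin{bmatrix}e^{J_1} & & 0\\ & \ddots & \\ 0 & & e^{J_s}\end{bmatrix}Q^{-1},$$
where the $J_i$'s are Jordan blocks. Thus it is enough to compute $e^J$ when $J$ is a single Jordan block of the form
$$J = \lambda I + N,$$
where $I$ and $N$ are as in (1). Then, as before, $N^k = 0$ for $k \ge n$, and since $\lambda I$ and $N$ commute,
$$e^J = e^{\lambda I}e^N = e^\lambda\Big(I + N + \frac{N^2}{2!} + \cdots + \frac{N^{n-1}}{(n-1)!}\Big) = e^\lambda\begin{bmatrix}1 & 1 & \frac{1}{2!} & \cdots & \frac{1}{(n-1)!}\\ 0 & 1 & 1 & \cdots & \frac{1}{(n-2)!}\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & 1 & 1\\ 0 & \cdots & 0 & 0 & 1\end{bmatrix}.$$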
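In particular, the solution $\mathbf{y}(t) = e^{tA}\mathbf{y}_0$ of a system of linear differential equations
$$\mathbf{y}' = A\mathbf{y} \quad\text{with initial condition}\quad \mathbf{y}(0) = \mathbf{y}_0$$
can be written, when $J = Q^{-1}AQ$ is a single $n \times n$ Jordan block, as
$$\mathbf{y}(t) = e^{tA}\mathbf{y}_0 = Qe^{tJ}Q^{-1}\mathbf{y}_0 = e^{\lambda t}[\mathbf{u}_1\ \mathbf{u}_2\ \cdots\ \mathbf{u}_n]\begin{bmatrix}1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{n-1}}{(n-1)!}\\ 0 & 1 & t & \cdots & \frac{t^{n-2}}{(n-2)!}\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & 1 & t\\ 0 & \cdots & 0 & 0 & 1\end{bmatrix}\begin{bmatrix}c_1\\ c_2\\ \vdots\\ c_n\end{bmatrix}$$
$$= e^{\lambda t}\Big(\Big(\sum_{k=0}^{n-1}c_{k+1}\frac{t^k}{k!}\Big)\mathbf{u}_1 + \Big(\sum_{k=0}^{n-2}c_{k+2}\frac{t^k}{k!}\Big)\mathbf{u}_2 + \cdots + c_n\mathbf{u}_n\Big),$$
where $Q^{-1}\mathbf{y}_0 = (c_1, \ldots, c_n)$ and the $\mathbf{u}_i$'s are the generalized eigenvectors of $A$ belonging to $\lambda$ that form the columns of $Q$.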
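The formula for $e^{tJ}$, whose $(i, j)$ entry is $e^{\lambda t}t^{j-i}/(j-i)!$ for $j \ge i$ and 0 below the diagonal, can be checked against a truncated Taylor series $\sum_m (tJ)^m/m!$. This is a floating-point sketch with our own helper names and arbitrarily chosen values of $\lambda$, $n$ and $t$; the series serves only as a reference and is not a recommended way to compute matrix exponentials:

```python
import math

def expm_jordan(lam, n, t):
    """e^{tJ} for the n x n Jordan block J = lam*I + N, from the closed form."""
    s = math.exp(lam * t)
    return [[s * t ** (j - i) / math.factorial(j - i) if j >= i else 0.0
             for j in range(n)] for i in range(n)]

def expm_series(M, terms=60):
    """Truncated Taylor series sum_{m < terms} M^m / m!, for comparison only."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term becomes M^m / m!
        term = [[sum(term[i][k] * M[k][j] for k in range(n)) / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

lam, n, t = -0.5, 4, 1.3
tJ = [[t * (lam if i == j else (1.0 if j == i + 1 else 0.0)) for j in range(n)]
      for i in range(n)]
closed = expm_jordan(lam, n, t)
series = expm_series(tJ)
err = max(abs(closed[i][j] - series[i][j]) for i in range(n) for j in range(n))
print(err < 1e-12)   # True
```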

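Example 9.7 Solve the linear differential equation $\mathbf{y}' = A\mathbf{y}$ with initial condition $\mathbf{y}(0) = \mathbf{y}_0$, where $A$ = [matrix not recoverable from the source] and $\mathbf{y}_0$ = [vector not recoverable from the source].

Solution: The Jordan canonical form of $A$ is computed as
$$Q^{-1}AQ = J = \begin{bmatrix}J_1 & 0\\ 0 & J_2\end{bmatrix}, \quad\text{where}\quad J_1 = \begin{bmatrix}2 & 1\\ 0 & 2\end{bmatrix},\ J_2 = [3],$$
and $Q$ = [matrix not recoverable from the source].
Let $\mathbf{y} = Q\mathbf{x}$. Then the given system changes to $\mathbf{x}' = J\mathbf{x}$ with $\mathbf{x}(0) = Q^{-1}\mathbf{y}_0$, and the solution of this new system is given by
$$\mathbf{x}(t) = e^{tJ}\mathbf{x}(0) = \begin{bmatrix}e^{2t} & te^{2t} & 0\\ 0 & e^{2t} & 0\\ 0 & 0 & e^{3t}\end{bmatrix}\mathbf{x}(0),$$
since $e^{tJ_1} = e^{2t}\begin{bmatrix}1 & t\\ 0 & 1\end{bmatrix}$ and $e^{tJ_2} = e^{3t}$. Thus $\mathbf{y}(t) = Q\mathbf{x}(t)$ [explicit result not recoverable from the source]. □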
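Problem 9.5 Solve the system of linear differential equations $\mathbf{y}' = A\mathbf{y}$ with the initial condition $\mathbf{y}(0) = \mathbf{y}_0$, where $A$ = [matrix not recoverable from the source] and $\mathbf{y}_0$ = [vector not recoverable from the source].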

9.4 Cayley-Hamilton theorem


As we saw in earlier chapters, associating the characteristic polynomial with each matrix is very useful in studying matrices. In this section, using this association of polynomials with matrices, we prove one more useful theorem, called the Cayley-Hamilton theorem, which makes the calculation of matrix polynomials simple and has many applications to real problems.
Let $f(x) = a_mx^m + a_{m-1}x^{m-1} + \cdots + a_1x + a_0$ be a polynomial, and let $A$ be an $n \times n$ square matrix. The matrix defined by
$$f(A) = a_mA^m + a_{m-1}A^{m-1} + \cdots + a_1A + a_0I$$
is called a matrix polynomial of $A$. For example, if $f(x) = x^2 - 2x + 2$ and $A$ is a $2 \times 2$ matrix [entries not recoverable from the source], then
$$f(A) = A^2 - 2A + 2I_2.$$

Problem 9.6 Let $\lambda$ be an eigenvalue of $A$ and $\mathbf{x}$ an eigenvector belonging to $\lambda$. Show that for any polynomial $f(t)$, $f(\lambda)$ is an eigenvalue of the matrix polynomial $f(A)$, with eigenvector $\mathbf{x}$.

Theorem 9.5 (Cayley-Hamilton) For any $n \times n$ matrix $A$, if $f(\lambda) = \det(\lambda I - A)$ is the characteristic polynomial of $A$, then $f(A) = 0$.

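Proof: We prove this theorem in three steps, writing $f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$.

(1) We first assume that $A$ is a diagonal matrix $D = \begin{bmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{bmatrix}$. Since $D^k = \begin{bmatrix}\lambda_1^k & & 0\\ & \ddots & \\ 0 & & \lambda_n^k\end{bmatrix}$ for all $k \ge 0$, we have
$$f(D) = \begin{bmatrix}f(\lambda_1) & & 0\\ & \ddots & \\ 0 & & f(\lambda_n)\end{bmatrix} = 0,$$
because each diagonal entry $\lambda_i$ of $D$ is a root of its characteristic polynomial $f$.

(2) Suppose that $A$ is diagonalizable, i.e., $Q^{-1}AQ = D$ or $A = QDQ^{-1}$ for an invertible matrix $Q$. Since the characteristic polynomials of $A$ and $D$ are the same, we have
$$\begin{aligned}
f(A) &= f(QDQ^{-1})\\
&= (QDQ^{-1})^n + a_{n-1}(QDQ^{-1})^{n-1} + \cdots + a_1(QDQ^{-1}) + a_0I\\
&= Q(D^n + a_{n-1}D^{n-1} + \cdots + a_1D + a_0I)Q^{-1}\\
&= Qf(D)Q^{-1} = 0.
\end{aligned}$$
(3) Finally, suppose that $A$ is any square matrix. Then by Theorem 9.1, $A$ is similar to its Jordan canonical form $J = \begin{bmatrix}J_1 & & 0\\ & \ddots & \\ 0 & & J_s\end{bmatrix} = Q^{-1}AQ$. Thus $f(A) = Qf(J)Q^{-1}$. Since
$$J^k = \begin{bmatrix}J_1^k & & 0\\ & \ddots & \\ 0 & & J_s^k\end{bmatrix},$$
it is enough to show $f(J_i) = 0$ for each Jordan block. Let $J_i = aI + N$ be an $r \times r$ Jordan block with eigenvalue $a$, where $N^r = 0$. Since $f(\lambda) = \det(\lambda I - A) = \det(\lambda I - J)$, the polynomial $f$ has the factor $(\lambda - a)^m$, where $m \ge r$ is the multiplicity of $a$; write $f(\lambda) = g(\lambda)(\lambda - a)^m$. Then
$$f(J_i) = g(J_i)(aI + N - aI)^m = g(J_i)N^m = 0. \qquad □$$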

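Example 9.8 The characteristic polynomial of
$$A = \begin{bmatrix}3 & 6 & 6\\ 0 & 2 & 0\\ -3 & -12 & -6\end{bmatrix}$$
is $f(\lambda) = \det(\lambda I - A) = \lambda^3 + \lambda^2 - 6\lambda$, and
$$f(A) = A^3 + A^2 - 6A = \begin{bmatrix}27 & 78 & 54\\ 0 & 8 & 0\\ -27 & -102 & -54\end{bmatrix} + \begin{bmatrix}-9 & -42 & -18\\ 0 & 4 & 0\\ 9 & 30 & 18\end{bmatrix} - 6\begin{bmatrix}3 & 6 & 6\\ 0 & 2 & 0\\ -3 & -12 & -6\end{bmatrix} = \begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}. \qquad □$$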
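The Cayley-Hamilton theorem can be checked on Example 9.8 by machine. The sketch below computes the characteristic polynomial with the Faddeev-LeVerrier recursion, a standard technique that the text itself does not use, and evaluates $f(A)$ by Horner's rule with exact fractions. The function names are our own.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(A):
    """Coefficients [1, a_{n-1}, ..., a_0] of det(lam*I - A), highest degree first,
    via the Faddeev-LeVerrier recursion
    M_k = A M_{k-1} + a_{n-k+1} I,  a_{n-k} = -tr(A M_k) / k."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs

def poly_at_matrix(coeffs, A):
    """Evaluate c_m A^m + ... + c_1 A + c_0 I by Horner's rule (highest degree first)."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        RA = matmul(R, A)
        R = [[RA[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return R

A = [[3, 6, 6], [0, 2, 0], [-3, -12, -6]]
f = char_poly(A)
print([int(c) for c in f])        # [1, 1, -6, 0], i.e. lam^3 + lam^2 - 6 lam
fA = poly_at_matrix(f, A)
print(all(v == 0 for row in fA for v in row))   # True: f(A) = 0
```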

The Cayley-Hamilton theorem can be used to find the inverse of a nonsingular matrix. If $f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$ is the characteristic polynomial of a matrix $A$, then
$$0 = f(A) = A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I, \quad\text{or}\quad -a_0I = (A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I)A.$$
Since $a_0 = f(0) = \det(0I - A) = \det(-A) = (-1)^n\det A$, $A$ is nonsingular if and only if $a_0 = (-1)^n\det A \neq 0$. Therefore, if $A$ is nonsingular,
$$A^{-1} = -\frac{1}{a_0}(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I).$$

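Example 9.9 The characteristic polynomial of the matrix
$$A = \begin{bmatrix}4 & 2 & -2\\ -5 & 3 & 2\\ -2 & 4 & 1\end{bmatrix}$$
is $f(\lambda) = \det(\lambda I_3 - A) = \lambda^3 - 8\lambda^2 + 17\lambda - 10$, and the Cayley-Hamilton theorem yields
$$A^3 - 8A^2 + 17A - 10I_3 = 0.$$
Hence
$$A^{-1} = \frac{1}{10}(A^2 - 8A + 17I_3) = \frac{1}{10}\begin{bmatrix}10 & 6 & -6\\ -39 & 7 & 18\\ -30 & 12 & 13\end{bmatrix} - \frac{8}{10}\begin{bmatrix}4 & 2 & -2\\ -5 & 3 & 2\\ -2 & 4 & 1\end{bmatrix} + \frac{17}{10}\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix} = \frac{1}{10}\begin{bmatrix}-5 & -10 & 10\\ 1 & 0 & 2\\ -14 & -20 & 22\end{bmatrix}. \qquad □$$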
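The inverse formula above is straightforward to implement once the coefficients $a_i$ are known. This sketch computes them with the Faddeev-LeVerrier recursion (a standard technique not used in the text; all helper names are ours) and then applies $A^{-1} = -\frac{1}{a_0}(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I)$ to the matrix of Example 9.9:

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(A):
    """[1, a_{n-1}, ..., a_0] for det(lam*I - A), by Faddeev-LeVerrier."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs

def poly_at_matrix(coeffs, A):
    """Horner's rule for c_m A^m + ... + c_0 I (coefficients highest degree first)."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        RA = matmul(R, A)
        R = [[RA[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return R

def inverse_by_cayley_hamilton(A):
    """A^{-1} = -(1/a_0)(A^{n-1} + a_{n-1}A^{n-2} + ... + a_1 I), valid when a_0 != 0."""
    coeffs = char_poly(A)
    a0 = coeffs[-1]
    if a0 == 0:
        raise ValueError("A is singular: a_0 = (-1)^n det A = 0")
    B = poly_at_matrix(coeffs[:-1], A)   # A^{n-1} + a_{n-1}A^{n-2} + ... + a_1 I
    return [[-b / a0 for b in row] for row in B]

A = [[4, 2, -2], [-5, 3, 2], [-2, 4, 1]]
Ainv = inverse_by_cayley_hamilton(A)
print([[int(10 * v) for v in row] for row in Ainv])   # [[-5, -10, 10], [1, 0, 2], [-14, -20, 22]]
```

This is convenient for small exact examples like this one; for large numerical matrices, elimination-based methods are the usual choice.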

Problem 9.7 Let $A$ and $B$ be square matrices, not necessarily of the same size, and let $f(\lambda) = \det(\lambda I - A)$ be the characteristic polynomial of $A$. Show that $f(B)$ is invertible if and only if $A$ has no eigenvalue in common with $B$.

The Cayley-Hamilton theorem can also be used to simplify the calculation of matrix polynomials. Let $p(\lambda)$ be any polynomial and let $f(\lambda)$ be the characteristic polynomial of a square matrix $A$. The division algorithm for polynomials tells us that there are polynomials $q(\lambda)$ and $r(\lambda)$ such that
$$p(\lambda) = q(\lambda)f(\lambda) + r(\lambda),$$
with the degree of $r(\lambda)$ less than the degree of $f(\lambda)$. Then
$$p(A) = q(A)f(A) + r(A).$$
By the Cayley-Hamilton theorem, $f(A) = 0$, and so
$$p(A) = r(A).$$
Thus the problem of evaluating a polynomial of an $n \times n$ matrix can be reduced to the problem of evaluating a polynomial of degree less than $n$.

Example 9.10 The characteristic polynomial of the matrix $A$ [a $2 \times 2$ matrix whose entries are not recoverable from the source] is $f(\lambda) = \lambda^2 - 2\lambda - 3$. Let $p(\lambda) = \lambda^4 - 7\lambda^3 - 3\lambda^2 + \lambda + 4$ be a polynomial. A straightforward calculation shows that
$$p(\lambda) = (\lambda^2 - 5\lambda - 10)f(\lambda) - 34\lambda - 26.$$
Therefore
$$p(A) = (A^2 - 5A - 10I)f(A) - 34A - 26I = -34A - 26I. \qquad □$$
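The reduction $p(A) = r(A)$ can also be sketched in Python. Polynomial long division reproduces the quotient and remainder of Example 9.10; the final check then uses the hypothetical matrix $A = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$, which is our own choice (not necessarily the matrix of the example), picked only because its characteristic polynomial is also $\lambda^2 - 2\lambda - 3$. The helper names are ours.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_divmod(p, f):
    """Long division p = q f + r for coefficient lists (highest degree first)."""
    p = [Fraction(c) for c in p]
    f = [Fraction(c) for c in f]
    q = []
    while len(p) >= len(f):
        c = p[0] / f[0]
        q.append(c)
        # subtract c * f aligned with the leading term; the leading coefficient cancels
        p = [a - c * b for a, b in zip(p, f + [0] * (len(p) - len(f)))][1:]
    return q, p

def poly_at_matrix(coeffs, A):
    """Horner's rule for c_m A^m + ... + c_0 I (coefficients highest degree first)."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        RA = matmul(R, A)
        R = [[RA[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return R

p = [1, -7, -3, 1, 4]     # p(lam) = lam^4 - 7 lam^3 - 3 lam^2 + lam + 4
f = [1, -2, -3]           # f(lam) = lam^2 - 2 lam - 3
q, r = poly_divmod(p, f)
print([int(c) for c in q], [int(c) for c in r])   # [1, -5, -10] [-34, -26]

# Hypothetical matrix with characteristic polynomial lam^2 - 2 lam - 3.
A = [[1, 2], [2, 1]]
print(poly_at_matrix(p, A) == poly_at_matrix(r, A))   # True: p(A) = r(A) = -34A - 26I
```

Evaluating $r(A)$ needs only one matrix multiplication by a scalar and one addition, whereas evaluating $p(A)$ directly needs the fourth power of $A$.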

Problem 9.8 For the matrix $A$ [matrix not recoverable from the source], evaluate the matrix polynomial [polynomial not recoverable from the source].

9.5 Exercises
9.1. Show that if $A$ is nonsingular, then $A^{-1}$ has the same block structure in its Jordan canonical form as $A$ does.

9.2. Find the number of linearly independent eigenvectors for each of the following matrices:
(1) $\begin{bmatrix}1&1&0&0&0\\0&1&1&0&0\\0&0&1&0&0\\0&0&0&3&1\\0&0&0&0&3\end{bmatrix}$, (2) $\begin{bmatrix}2&0&0&0&0\\0&2&0&0&0\\0&0&2&0&0\\0&0&0&5&1\\0&0&0&0&5\end{bmatrix}$, (3) $\begin{bmatrix}2&1&0&0&0\\0&2&0&0&0\\0&0&3&0&0\\0&0&0&3&0\\0&0&0&0&5\end{bmatrix}$.

9.3. Solve the system of linear equations
$$\begin{cases} (1 - i)x + (1 + i)y = 2 - i \\ (1 + i)x + (1 + i)y = 1 + 3i. \end{cases}$$

9.4. Solve y' = Ay for A = [ ~ ~ ] with Yo = [ ~ ].

9.5. Solve y' = Ay, where A = [=621 2: : ] and y(l) = (2, 1, 0).
-12 -6
9.6. Solve the initial value problem
$$\begin{cases} y_1' = -y_1 + 2y_3, & y_1(0) = -2 \\ y_2' = 2y_1 + y_2 - 2y_3, & y_2(0) = 0 \\ y_3' = -2y_1 + 3y_3, & y_3(0) = -1. \end{cases}$$

9.7. Find the Jordan canonical form for A = [~ ; ] , and compute $e^A$.

9.8. Consider a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.
(1) Find a necessary and sufficient condition for $A$ to be diagonalizable.
(2) The characteristic polynomial for $A$ is $f(t) = t^2 - (a + d)t + (ad - bc)$. Show that $f(A) = 0$.

9.9. For each of the following matrices, find a polynomial of which the matrix is
a root.

(3) [~ i -~].
o 2 -1
9.10. Verify that each of the matrices below satisfies its own characteristic polynomial, and from these results compute $A^{-1}$, if it exists.

(1) [~ ~ l, (2) [~ ~ l, (3) [~ ~ ~].



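9.11. An $n \times n$ matrix $A$ is called a circulant matrix if the $i$-th row of $A$ is obtained from the first row of $A$ by a cyclic shift of $i - 1$ steps, i.e., the general form of a circulant matrix is
$$A = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \\ a_n & a_1 & a_2 & \cdots & a_{n-1} \\ a_{n-1} & a_n & a_1 & \cdots & a_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{bmatrix}.$$
(1) Show that any circulant matrix is normal.
(2) Find all eigenvalues of the $n \times n$ circulant matrix
$$W = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$
(3) Find all eigenvalues of the circulant matrix $A$ by showing that
$$A = \sum_{i=1}^{n} a_i W^{i-1}.$$
(4) Use your answer to find the eigenvalues of
$$B = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 0 \end{bmatrix}.$$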

9.12. Determine whether the following statements are true or false, in general, and justify your answers.
(1) Any square matrix is similar to a triangular matrix.
(2) If a matrix $A$ has exactly $k$ linearly independent eigenvectors, then the Jordan canonical form of $A$ has $k$ Jordan blocks.
(3) If a matrix $A$ has $k$ distinct eigenvalues, then the Jordan canonical form of $A$ has $k$ Jordan blocks.
(4) If a $4 \times 4$ matrix $A$ has eigenvalues 1 and 2, each of multiplicity 2, such that $\dim E(1) = 2$ and $\dim E(2) = 1$, then the Jordan canonical form of $A$ has three Jordan blocks.
(5) If $\lambda_1, \ldots, \lambda_k$ are $k$ distinct eigenvalues of $A$ with multiplicities $m_i$ and $\dim E(\lambda_i) \neq m_i$, then $A$ is not diagonalizable.
(6) For any Jordan block $J$ with eigenvalue $\lambda$, $\det e^J = e^{\lambda}$.
(7) If $f(x)$ is a polynomial and $A$ is a square matrix such that $f(A) = 0$, then $f(x)$ is a multiple of the characteristic polynomial of $A$.
Selected Answers and Hints

Chapter 1
Problems
1.2 (1) Inconsistent.
(2) $(x_1, x_2, x_3, x_4) = (-1 - 4t, 6 - 2t, 2 - 3t, t)$ for any $t \in \mathbb{R}$.
1.3 (1) $(x, y, z) = (t, -t, t)$. (3) $(w, x, y, z) = (2, 0, 1, 3)$.
1.4 (1) $b_1 + b_2 - b_3 = 0$. (2) For any $b_i$'s.
1. 7 a =- 127, b= 123, C = 143 , d = -4.
1.8 Consider the matrices: A = [; :], B = [; ~], C = [~ ~].
1.9 Compare the diagonal entries of $AA^T$ and $A^TA$.
1.11 (1) Infinitely many for $a = 4$, exactly one for $a \neq \pm 4$, and none for $a = -4$.
(2) Infinitely many for $a = 2$, none for $a = -3$, and exactly one otherwise.

18 Con,id" the matdx A ~ [~ :].

1.13 (3) $I = I^T = (AA^{-1})^T = (A^{-1})^T A^T$ means, by definition, $(A^T)^{-1} = (A^{-1})^T$.


1.16 Any permutation on $n$ objects can be obtained as the composition of a finite number of interchanges of two objects.

1.20 A-I = 115 [~


4
=;;
-2
~].
1
1.21 Consider the case that some di is zero.
1.22 $x = 2$, $y = 3$, $z = 1$.

1.23 L = [ - ~ -1~ ~],


o 1
1.24 (1) Consider (i,j)-entries of AB for i < j.
(2) A can be written as a product of lower triangular elementary matrices.

-L nD~u +4;3lU~[~ T +1

L25L~[+
1.26 There are four possibilities for P.
1.27 (1) $I_1 = 0.5$, $I_2 = 6$, $I_3 = 5.5$. (2) $I_1 = 0$, $I_2 = I_3 = 1$, $I_4 = I_5 = 5$.
1.29 $x = k\begin{bmatrix} 0.35 \\ 0.40 \\ 0.25 \end{bmatrix}$ for $k > 0$.

1.30 $A = \begin{bmatrix} 0.0 & 0.1 & 0.8 \\ 0.4 & 0.7 & 0.1 \\ 0.5 & 0.0 & 0.1 \end{bmatrix}$ with $d = \begin{bmatrix} 90 \\ 10 \\ 30 \end{bmatrix}$.

Exercises

1.1 Row-echelon forms are A, B, D, F. Reduced row-echelon forms are A, B, F.


-3 2 1
1.2
(1) [ ~ 0
0
0
1
0
0
-1/4
0
0
3/!o 1.
0
-3 0
1
~
3/2
0 1 -1/4 1/2
3/4
1.3
(I) [ 0 0 0 o .
0 0 0 0
1.4 (1) $x_1 = 0$, $x_2 = 1$, $x_3 = -1$, $x_4 = 2$. (2) $x = 17/2$, $y = 3$, $z = -4$.
1.5 (1) and (2).
1.6 For any bi's.
1.7 $b_1 - 2b_2 + 5b_3 \neq 0$.
1.8 (1) Take x the transpose of each row vector of A.
1.10 Try it with several kinds of diagonal matrices for B.
1.11 $A^k = \begin{bmatrix} 1 & 2k & 3k(k-1) \\ 0 & 1 & 3k \\ 0 & 0 & 1 \end{bmatrix}$.
1.13 (2) $\begin{bmatrix} 5 & -22 & 101 \\ 0 & 27 & -60 \\ 0 & 0 & 87 \end{bmatrix}$.
1.14 See Problem 1.9.
1.16 (1) $A^{-1}AB = B$. (2) $A^{-1}AC = C = A + I$.
1.17 $a = 0$, $c^{-1} = b \neq 0$.

1.18 A-1 = [~o t/~ 0


-d -1/~ 1'
0 1/4
B- 1 = [-~~~~ -~~; -1/8]
5/4 0
3/8
-1/4
.

1.21 (1) x = A- 1 b = [-!~~ -;~~ !~~]


-1/3 -2/3 1/3
[;] 7
= [ -~~~ ].
-5/3

1.22 (1) A = [! ~] [~ ~] [~ li2] = LDU, (2) L = A, D = U = I.

1.23 (1) A = [ ;
3 1 1
~ ~] [~~ ~ [~0 i0
0 0 -1
] i],
1

(2)[b~a ~][~ d-~2/a][~ b/~].


1.24 $c = [2\ \ {-1}\ \ 3]^T$, $x = [4\ \ 2\ \ 3]^T$.

1.25 (2) A [~1 1~ 1~] [~0 0~ 2~] [~0 0


= ~ 4/!].
1
1.26 (1) $(A^k)^{-1} = (A^{-1})^k$. (2) $A^{n-1} = 0$ if $A \in M_{n \times n}$.
(3) $(I - A)(I + A + \cdots + A^{k-1}) = I - A^k$.

1.27 (1) A = [~ ~]. (2) A = A-1 A2 = A- 1A = I.


1.28 Exactly seven of them are true.
(8) If $AB$ has the (right) inverse $G$, then $A^{-1} = BG$.
(10) Consider a permutation matrix [~ ~].

Chapter 2
Problems
2.4 (1) $-27$, (2) $0$, (3) $(1 - x^4)^3$.
2.6 Let $\sigma$ be a transposition in $S_n$. Then the composition of $\sigma$ with an even (odd) permutation in $S_n$ is an odd (even, respectively) permutation.
2.9 (1) $-14$. (2) $0$.
2.10 See Example 2.6, and use mathematical induction on $n$.
2.12 If $A = 0$, then clearly $\operatorname{adj} A = 0$. Otherwise, use $A \cdot \operatorname{adj} A = (\det A)I$.
2.13 Use $\operatorname{adj} A \cdot \operatorname{adj}(\operatorname{adj} A) = \det(\operatorname{adj} A)\, I$.
2.14 (1) $x_1 = 4$, $x_2 = 1$, $x_3 = -2$.
(2) $x = \frac{10}{23}$, $y = \frac{5}{6}$, $z = \frac{5}{2}$.

2.15 The solution of the system Id(x) = x is Xi = ~:5 = detA.


2.16 Find the cofactor expansion along the first row first, and then compute the
cofactor expansion along the first column of each n x n submatrix (in the
second step, use the proof of Cramer's rule).

Exercises

2.1 k = 0 or 2.
2.2 It is not necessary to compute $A^2$ or $A^3$.
2.3 -37.
2.4 (1) $\det A = (-1)^{n-1}(n - 1)$. (2) $0$.
2.5 -2,0,1,4.
2.6 Consider $\sum a_{1\sigma(1)} \cdots a_{n\sigma(n)}$.
2.7 (1) 1, (2) 24.


2.8 (3) Xl = 1, X2 = -1, X3 = 2, X4 = -2.
2.9 (2) $x = (3, 0, 4/11)^T$.
2.10 $k = 0$ or $\pm 1$.
2.11 $x = (-5, 1, 2, 3)^T$.
2.12 $x = 3$, $y = -1$, $z = 2$.
2.13 (3) $A_{11} = -2$, $A_{12} = 7$, $A_{13} = -8$, $A_{33} = 3$.

2.16 A-l = /2 [~~ -~ 1~ ].


6 14 -18

2.17 (1) adj(A) = [ ~


-4
=;
7
=~]'det(A)=-7'det(adj(A))=49'
5

A- l = -tadj(A). (2) adj(A) = [ -1~


7
! -~],
-3 -1
detA = 2, det(adj(A)) = 4, A- l = ~adj(A).

2.19 Multiply [~ ~].


2.20 If we set A = [ ; i], then the area is ~ I det AI = 4.

2.21 If we ""t A ~ [~ ; ] . then the rueea;, hll det(AT A)I ~ 'f.



2.22 Use $\det A = \sum_{\sigma} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}$.

2.23 Exactly seven of them are true.


(4) $(cI_n - A)^T = cI_n - A^T$.
(10) Since $uv^T = u[v_1 \cdots v_n] = [v_1u \cdots v_nu]$, $\det(uv^T) = v_1 \cdots v_n \det([u \cdots u]) = 0$.

(13) Consider [i° ~ 1


~].
1

Chapter 3
Problems

3.2 (2), (4).


3.3 (1), (2), (4).
3.4 Note that any vector $y$ in $W$ is of the form $a_1x_1 + a_2x_2 + \cdots + a_mx_m$, which is a vector in $U$.
3.5 $\operatorname{tr}(AB - BA) = 0$.
3.8 Linearly dependent.
3.10 Any basis for $W$ must be a basis for $V$ already, by Corollary 3.11.
3.11 (1) $n - 1$, (2) $\frac{n(n+1)}{2}$, (3) $\frac{n(n-1)}{2}$.

3.13 See Problem 1.10.


3.15 63a + 39b - 13c + 5d = 0.
3.17 If $b_1, \ldots, b_n$ denote the column vectors of $B$, then $AB = [Ab_1 \cdots Ab_n]$.
3.18 Consider the matrix A from Example 3.20.
3.19 (1) rank = 3, nullity = 1. (2) rank = 2, nullity = 2.
3.20 Ax = b has a solution if and only if b E C(A).
3.21 $A^{-1}(AB) = B$ implies $\operatorname{rank} B = \operatorname{rank} A^{-1}(AB) \le \operatorname{rank}(AB)$.

3.22 By (2) of Theorem 3.21 and Corollary 3.18, a matrix $A$ of rank $r$ must have an invertible submatrix $C$ of rank $r$. By (1) of the same theorem, the rank of $C$ must be the largest.
3.24 $\dim(V + W) = 4$ and $\dim(V \cap W) = 1$.
3.25 A basis for $V$ is $\{(1,0,0,0), (0,-1,1,0), (0,-1,0,1)\}$, for $W$: $\{(-1,1,0,0), (0,0,2,1)\}$, and for $V \cap W$: $\{(3,-3,2,1)\}$. Thus, $\dim(V + W) = 4$ means $V + W = \mathbb{R}^4$, and any basis for $\mathbb{R}^4$ works for $V + W$.

3.28
°° °2
1 1
1 2

Exercises
3.1 Consider 0(1,1).
3.2 (5).
3.3 (1), (2), (3).
3.4 (1).
3.5 (1), (4).
3.6 No.
3.7 (1) p(x) = -Pl(X) + 3p2(X) - 2P3(X).
3.8 No.
3.10 No.
3.11 $\{(1, 1, 0), (1, 0, 1)\}$.
3.12 (3) (5, 2, 0).
3.13 2.
3.14 Consider $e_j = \{a_i\}$, where $a_i = 1$ if $i = j$ and $a_i = 0$ otherwise.
3.15 (1) $0 = c_1Ab_1 + \cdots + c_pAb_p = A(c_1b_1 + \cdots + c_pb_p)$ implies $c_1b_1 + \cdots + c_pb_p = 0$ since $N(A) = 0$, and this also implies $c_i = 0$ for all $i = 1, \ldots, p$, since the columns of $B$ are linearly independent.
(2) $B$ has a right inverse. (3) and (4): Look at (1) and (2) above.
3.16 (1) {( -5,3, I)}. (2) 3.
3.17 5!, and dependent.
3.18
(1) $R(A) = \langle (1,2,0,3), (0,0,1,2) \rangle$, $C(A) = \langle (5,0,1), (0,5,2) \rangle$, $N(A) = \langle (-2,1,0,0), (-3,0,-2,1) \rangle$.
(2) $R(B) = \langle (1,1,-2,2), (0,2,1,-5), (0,0,0,1) \rangle$, $C(B) = \langle (1,-2,0), (0,1,1), (0,0,1) \rangle$, $N(B) = \langle (5,-1,2,0) \rangle$.
3.19 rank $= 2$ when $x = -3$, rank $= 3$ when $x \neq -3$.
3.21 See Exercise 2.23: each column vector of $uv^T$ is of the form $v_iu$, that is, $u$ spans the column space. Conversely, if $A$ is of rank 1, then the column space is spanned by any one column of $A$, say the first column $u$ of $A$, and the remaining columns are of the form $v_iu$, $i = 2, \ldots, n$. Take $v = [1\ v_2 \cdots v_n]^T$. Then one can easily see that $A = uv^T$.
3.22 Three of them are true.

Chapter 4
Problems

4.1 [~ ~], since it is simply the change of coordinates x and y.


4.2 To show $W$ is a subspace, see Theorem 4.2. Let $E_{ij}$ be the matrix with 1 at the $(i, j)$-th position and 0 at others. Let $F_k$ be the matrix with 1 at the $(k, k)$-th position, $-1$ at the $(n, n)$-th position and 0 at others. Then the set $\{E_{ij}, F_k : 1 \le i \neq j \le n,\ k = 1, \ldots, n - 1\}$ is a basis for $W$. Thus $\dim W = n^2 - 1$.
4.3 $\operatorname{tr}(AB) = \sum_{i=1}^{n}\sum_{k=1}^{n} a_{ik}b_{ki} = \sum_{k=1}^{n}\sum_{i=1}^{n} b_{ki}a_{ik} = \operatorname{tr}(BA)$.
4.4 If yes, $(2, 1) = T(-6, -2, 0) = -2T(3, 1, 0) = (-2, -2)$.
4.5 If $a_1v_1 + a_2v_2 + \cdots + a_kv_k = 0$, then $0 = T(a_1v_1 + a_2v_2 + \cdots + a_kv_k) = a_1w_1 + a_2w_2 + \cdots + a_kw_k$ implies $a_i = 0$ for $i = 1, \ldots, k$.
4.6 (1) If $T(x) = T(y)$, then $S \circ T(x) = S \circ T(y)$ implies $x = y$. (4) They are invertible.
4.7 (1) $T(x) = T(y)$ if and only if $T(x - y) = 0$, i.e., $x - y \in \operatorname{Ker}(T)$.
(2) Let $\{v_1, \ldots, v_n\}$ be a basis for $V$. If $T$ is one-to-one, then the set $\{T(v_1), \ldots, T(v_n)\}$ is linearly independent, as the proof of Theorem 4.7 shows. Corollary 3.11 shows it is a basis for $V$. Thus, for any $y \in V$, we can write it as $y = \sum_{i=1}^n a_iT(v_i) = T(\sum_{i=1}^n a_iv_i)$. Set $x = \sum_{i=1}^n a_iv_i \in V$. Then clearly $T(x) = y$, so that $T$ is onto. If $T$ is onto, then for each $i = 1, \ldots, n$ there exists $x_i \in V$ such that $T(x_i) = v_i$. Then the set $\{x_1, \ldots, x_n\}$ is linearly independent in $V$, since, if $\sum_{i=1}^n a_ix_i = 0$, then $0 = T(\sum_{i=1}^n a_ix_i) = \sum_{i=1}^n a_iT(x_i) = \sum_{i=1}^n a_iv_i$ implies $a_i = 0$ for all $i = 1, \ldots, n$. Thus it is a basis by Corollary 3.11 again. If $T(x) = 0$ for $x = \sum_{i=1}^n a_ix_i \in V$, then $0 = T(x) = \sum_{i=1}^n a_iT(x_i) = \sum_{i=1}^n a_iv_i$ implies $a_i = 0$ for all $i = 1, \ldots, n$, that is, $x = 0$. Thus $\operatorname{Ker}(T) = \{0\}$.

4.8 Use rotation R f and reflection [~ _ ~ ] about the x-axis.

4.9 (1) (5,2,3). (2) (2,3,0).


4.10 $\operatorname{vol}(T(C)) = |\det(A)|\operatorname{vol}(C)$, for the matrix representation $A$ of $T$.

4.12 (1) [T]a = [~4 7


=~ 0~], [T]J3 = [~0 7
=~ 4~].
n [! n

~ [~ ~ ~ ~
in n
415 [S +TI. [T 0 Sio

416 [Sl~ ~ [~ - [110 ~ [~ ~


4.17 (2) [Tl~= [~ ~] l[T- 1 13= [-~ ~
4.18 [Id13 = ~ [~ -~ -~], [Idl~ = [-; -~ -1~].
2 2 1 1 1 1 -2

4.19 [Tlo = [~1 -i0 4~], [Tlf3 = [-~1 -~1 -~].


5
4.20 Write $B = Q^{-1}AQ$ with some invertible matrix $Q$.
(1) $\det B = \det(Q^{-1}AQ) = \det Q^{-1} \det A \det Q = \det A$. (2) $\operatorname{tr}(B) = \operatorname{tr}(Q^{-1}AQ) = \operatorname{tr}(QQ^{-1}A) = \operatorname{tr}(A)$ (see Problem 4.3). (3) Use Problem 3.21.
4.22 Q* = {h(x, y, z) = x - h, h(x, y, z) = h, h(x, y, z) = -x + z}.

Exercises
4.1 (2).
4.2 $ax^3 + bx^2 + ax + c$.
4.5 (1) Consider the decomposition of v = v+;(v) + v-;(v).
4.6 (1) {(x, ~x, 2x) E JR3 : x E JR}.
4.7 (2) T-1(r, s, t) = (~r, 2r - s, 7r - 3s - t).
4.8 (1) Since $T \circ S$ is one-to-one from $V$ into $V$, $T \circ S$ is also onto, and so $T$ is onto. Moreover, if $S(u) = S(v)$, then $T \circ S(u) = T \circ S(v)$ implies $u = v$. Thus, $S$ is one-to-one, and so onto. This implies $T$ is one-to-one. In fact, if $T(u) = T(v)$, then there exist $x$ and $y$ such that $S(x) = u$ and $S(y) = v$. Thus $T \circ S(x) = T \circ S(y)$ implies $x = y$, and so $u = S(x) = S(y) = v$.
4.9 Note that T cannot be one-to-one and S cannot be onto.

4.11 [-~ -~ =: _:~].


000 1
1/3 2/3]
4.12 (1) [ -1/3 1/3 .

4.13 (1) [~ _~], (2) [~ -! l



4.14 (1) T(1, 0, 0) = (4, 0), T(1, 1, 0) = (1, 3), T(1, 1, 1) = (4, 3).
(2) T(x, y, z) = (4x - 2y + z, y + 2z).

[~o 0i ~1 l' [~0 0~ 0~].


n
4.15 (1) (4)

4.16 (1) P ~ [~
4.17 Use the trace.
_: -H (2) Q ~ [: ~ ~ p-'
-7 -33 -13]
4.18 (1) [ 4 19 8·

-2/3 1/3 4/3]


4.19 (2) [ ~ ~ ] ,(4) [ 2/3 -1/3 -1/3 .
7/3 -2/3 -8/3

4.24 [T]a = [-~1 0~ 1i] = ([T*]". )T.

4.25 (1) [-i ~ ~]. (2) [T]~ = [-~ ~ -i]·

n
4.26 N(T) = {O},

C(T) ~ ( (2,1,0,1), (1,1,1,1), (4,2,2,3) ), [T]~ ~ -1 ~ -


[

4.28 Pl(X) = 1 + x - ~x2, P2(X) = -i + ~x2, P3(X) = -~ + x - ~x2.


4.29 Two of them are false.

Chapter 5
Problems
5.1 $(x, y)^2 = (x, x)(y, y)$ if and only if $\|tx + y\|^2 = (x, x)t^2 + 2(x, y)t + (y, y) = 0$ has a repeated real root $t_0$.
5.2 (4) Compute the square of both sides and then use the Cauchy-Schwarz inequality.
5.4 $(f, g) = \int_0^1 f(x)g(x)\,dx$ defines an inner product on $C[0, 1]$. Use the Cauchy-Schwarz inequality.
5.5 (1) ~(2, 1, -1), (2) Jh(6, 4, -3).
5.6 (1): Orthogonal, (2) and (3): None, (4): Orthonormal.
5.9 (1) is just the definition; use (1) to prove (2).

5.11 -i+x.
5.13 $\{1, \sqrt{3}(2x - 1), \sqrt{5}(6x^2 - 6x + 1)\}$.
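The orthonormality claimed in 5.13 can be checked exactly, assuming the inner product $(f, g) = \int_0^1 f(x)g(x)\,dx$ of Problem 5.4. In this Python sketch each basis vector is stored as a pair $(s, q)$, meaning $\sqrt{s}\,q(x)$, so that all arithmetic stays rational. The representation and the helper names are illustrative only.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given by coefficient lists, constant term first."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def inner(p, q):
    """(p, q) = integral of p(x) q(x) over [0, 1], using int_0^1 x^k dx = 1/(k+1)."""
    return sum(c / (k + 1) for k, c in enumerate(poly_mul(p, q)))


# Each entry (s, q) stands for sqrt(s) * q(x); keeping sqrt(s) apart stays exact.
basis = [(1, [1]),         # 1
         (3, [-1, 2]),     # sqrt(3) (2x - 1)
         (5, [1, -6, 6])]  # sqrt(5) (6x^2 - 6x + 1)

for i, (si, qi) in enumerate(basis):
    for j, (sj, qj) in enumerate(basis):
        # (sqrt(si) qi, sqrt(sj) qj) = sqrt(si sj) (qi, qj)
        if i == j:
            print(i, j, si * inner(qi, qi))  # squared norm, should be 1
        else:
            print(i, j, inner(qi, qj))       # should be 0
```

For $i \neq j$ the factor $\sqrt{s_is_j}$ is irrelevant, because only vanishing of $(q_i, q_j)$ matters.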
5.15 Projw(p) = (~,~, -~).

5.17 Consider a matrix [~ ~].


5 .18 (1) r -- V2'
I
8 -
- I - I b_ I _ I
v'6' a - - v'3' - v'3' c - v'3.
5.19 Extend $\{v_1, \ldots, v_m\}$ to an orthonormal basis $\{v_1, \ldots, v_m, \ldots, v_n\}$. Then $\|x\|^2 = \sum_{i=1}^{m} |(x, v_i)|^2 + \sum_{j=m+1}^{n} |(x, v_j)|^2$.
5.20 (1) orthogonal. (2) not orthogonal.

5.22 The null space of the matrix [ 01 -12 -11 21 ] I.S

x = t[l - 11 oV + 8[-410 IV for t, 8 E R


5.23 $R(A)^{\perp} = N(A)$.
5.24 x = (1, -1, 0) + t(2, 1, -1) for any number t.
5.25 For A = [VI V2], two columns are linearly independent.

5.26 [ ~~ ]
'2g
= x = (AT A)-lATh = [ ~~3:
16.1
].
5.28 For $x \in \mathbb{R}^m$, $x = (v_1, x)v_1 + \cdots + (v_m, x)v_m = (v_1v_1^T)x + \cdots + (v_mv_m^T)x$.
5.29 First, show that P is symmetric.
5.30 The line is a subspace with an orthonormal basis ~ (1, 1), or is the column

space of A = ~[~ ].

5.31 P = ~3 i1 -1~ -~].


[
2
5.33 Note that $\{e_1, e_2, e_4\}$ is an orthonormal basis for the subspace.

Exercises
5.1 Inner products are (2), (4), (5).
5.2 For the last condition of the definition, note that $(A, A) = \operatorname{tr}(A^TA) = \sum_{i,j} a_{ij}^2 = 0$ if and only if $a_{ij} = 0$ for all $i, j$.
5.4 (1) k=3.
5.5 (3) Ilfll = Ilgll = Ji/2, The angle is 0 if n = m, ~ if n =I- m.
5.6 Use the Cauchy-Schwarz inequality and Problem 5.1 with $x = (a_1, \ldots, a_n)$ and $y = (1, \ldots, 1)$ in $(\mathbb{R}^n, \cdot)$.

5 7 (1) _ 37 {l9
. 4' V3'
(2) If (h, g) = h( ~ + ~ +c) = 0 with h =1= 0 a constant and g(x) = ax 2+bx+c,
then (a, b, c) is on the plane ~ + ~ + c = 0 in JR3.
3 1
5.10 (1) 2V2, (2) 2V2.
5.12 Orthogonal: (4). Nonorthogonal: (1), (2), (3).
5.16 Use induction on $n$. Let $B$ be the matrix $A$ with the first column $c_1$ replaced by $c = c_1 - \operatorname{Proj}_W(c_1)$, and write $\operatorname{Proj}_W(c_1) = a_2c_2 + \cdots + a_nc_n$ for some $a_i$'s. Show that $\sqrt{\det(A^TA)} = \sqrt{\det(B^TB)} = \|c\|\operatorname{vol}(c_2, \ldots, c_n) = \operatorname{vol}(P(A))$.

5.17 Let A = [~ ~ ~]. Then the volume of the tetrahedron is


012
~ J'7Cie-t-:-(A7iiT""A:7) = l.
5.18 $A^TA = I$ and $\det A^T = \det A$ imply $\det A = \pm 1$. The matrix $A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$ is orthogonal with $\det A = -1$.
5.20 $Ax = b$ has a solution for every $b \in \mathbb{R}^m$ if $r = m$. It has infinitely many solutions if nullity $= n - r = n - m > 0$.

5.21 Find a I"", 'qu.,",olution of [t ~] [~ 1~ [; 1 fo, (a. bl

3
in y = a + bx. Then y = x + 2'
5.22 Follow Exercise 5.21 with $A = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \end{bmatrix}$. Then $y = 2x^3 - 4x^2 + 3x - 5$.
5.25 (1) Let $h(x) = \frac{1}{2}(f(x) + f(-x))$ and $g(x) = \frac{1}{2}(f(x) - f(-x))$. Then $f = h + g$.
(2) For $f \in U$ and $g \in V$, $(f, g) = \int_{-1}^{1} f(x)g(x)\,dx = -\int_{1}^{-1} f(-t)g(-t)\,dt = -\int_{-1}^{1} f(t)g(t)\,dt = -(f, g)$, by the change of variable $x = -t$.
(3) Expand the length in the inner product.
5.26 Six of them are true.
(1) Consider $(1, 0)$ and $(-1, 0)$.
(2) Consider two subspaces $U$ and $W$ of $\mathbb{R}^3$ spanned by $e_1$ and $e_2$, respectively.
(3) The set of column vectors in a permutation matrix $P$ is just $\{e_1, \ldots, e_n\}$, which is a set of orthonormal vectors.

Chapter 6
Problems

6.3 Consider the matrices [~ i] and [~ ~] .


6.4 (1) Use $\det A = \lambda_1 \cdots \lambda_n$. (2) $Ax = \lambda x$ if and only if $x = \lambda A^{-1}x$.
6.5 Check with A = [~ ~ l
6.6 If $A$ is invertible, then $AB = A(BA)A^{-1}$.
6.7 (1) If $Q = [x_1\ x_2\ x_3]$ diagonalizes $A$, then the diagonal matrix must be $\lambda I$ and $AQ = \lambda Q$. Expand this equation and compare the corresponding columns to find a contradiction to the invertibility of $Q$.

6.8 Q = [i ~], D = [~ ~]. Then A = QDQ-l = [=~ ~ l


6.9 (1) The eigenvalues of $A$ are $1, 1, -3$, and their associated eigenvectors are $(1, 1, 0)$, $(-1, 0, 1)$ and $(1, 3, 1)$, respectively.
(2) If $f(x) = x^{10} + x^7 + 5x$, then $f(1)$, $f(1)$ and $f(-3)$ are the eigenvalues of $A^{10} + A^7 + 5A$.
n l
6.10 Note that [ aan+ ] 1 [2 °1 -2]
° [an ] The eigenvalues are 1, 2,
an-I.
an-l 1 ° °
an-2
-1 and eigenvectors are (1,1,1), (4,2,1) and (1, -1, 1), respectively. It turns
2 2n
out that an = 2 - (_1)n 3 - "3.
6.12 The eigenvalues are 0, 0.4, and 1, and their eigenvectors are
(1,4,-5), (1,0,-1) and (3,2,5), respectively.
6.13 Yl = cle 2x - iC2e-3x; Y2 = cle 2x + C2 e - 3x .
Yl = - C2e2x + C3e3x {Yl = e 2x - 2e 3x
6.14 { Y2 = cle x + 2c2e2x - C3e3x, Y2 = eX - 2e 2x + 2e 3x
Y3 = 2c2e2x - C3e3x Y3 = - 2e 2x + 2e 3x .
6.15 $y_1 = 0$, $y_2 = 2e^{2t}$, $y_3 = e^{2t}$.
6.17 For (1), use $(A + B)^k = \sum_{i=0}^{k} \binom{k}{i} A^iB^{k-i}$ if $AB = BA$. For (2) and (3), use the definition of $e^A$. Use (1) for (4).
6.18 Note that $e^{(A^T)} = (e^A)^T$ by definition (thus, if $A$ is symmetric, so is $e^A$), and use (4).

6.19 Wdte A ~ 2I +N with N ~ [ ~ °3 0]


3 . Then N3 = o.
°°
6.20 (1) [ :=: ] , (2) [ ~~_~-~ ].

6.21 With respect to the standard basis, T = [~ ~


; ] with eigenvalues 3, 3,
104
5 and eigenvectors (0,1,0), (-1,0,1) and (1,2,1), respectively.
6.22 With the standard basis for M 2x2 (]R.): Q =

[T]Q =A = [~ : : ~]. The eigenvalues are 3,1, 1, -1, and their asso-
101 1
ciated eigenvectors are (1,1,1,1), (-1,0,1,0), (0, -1,0,1), and (-1,1, -1, 1),
respectively.

6.23 With the basi, a ~ {I, x, x'j, 11']. ~ A ~ [~ ~ ~].


Exercises
6.1 (4) 0 of multiplicity 3, 4 of multiplicity 1. Eigenvectors are $e_i - e_{i+1}$ for $1 \le i \le 3$ and $\sum_{i=1}^{4} e_i$.
6.2 $f(\lambda) = (\lambda + 2)(\lambda^2 - 8\lambda + 15)$, $\lambda_1 = -2$, $\lambda_2 = 3$, $\lambda_3 = 5$, $x_1 = (-35, 12, 19)$, $x_2 = (0, 3, 1)$, $x_3 = (0, 1, 1)$.
6.4 {v} is a basis for N(A), and {u, w} is a basis for C(A).
6.5 Note that the order in the product doesn't matter, and any eigenvector of A is
killed by B. Since the eigenvalues are all different, the eigenvectors belonging
to 1, 2, 3 form a basis. Thus B = 0, that is, B has only the zero eigenvalue,
so all vectors are eigenvectors of B.

6.7 A ~ QDQ-' ~ ~ [ : -~ =~].


6.8 Note that $\mathbb{R}^n = W \oplus W^{\perp}$, and $P(w) = w$ for $w \in W$ and $P(v) = 0$ for $v \in W^{\perp}$. Thus, the eigenspace belonging to $\lambda = 1$ is $W$, and that to $\lambda = 0$ is $W^{\perp}$.
6.9 For any $w \in \mathbb{R}^n$, $Aw = u(v^Tw) = (v \cdot w)u$. Thus $Au = (v \cdot u)u$, so $u$ is an eigenvector belonging to the eigenvalue $\lambda = v \cdot u$. The other eigenvectors are the nonzero vectors in $v^{\perp}$, with eigenvalue zero. Thus, $A$ has either two eigenspaces, the 1-dimensional $E(\lambda)$ spanned by $u$ and $E(0) = v^{\perp}$, if $v \cdot u \neq 0$; or just one eigenspace $E(0) = v^{\perp}$ if $v \cdot u = 0$.
6.10 $\lambda v = Av = A^2v = \lambda^2v$ implies $\lambda(\lambda - 1) = 0$.

6.12 Use $\operatorname{tr}(A) = \lambda_1 + \cdots + \lambda_n = a_{11} + \cdots + a_{nn}$.


6.13 $A = QD_1Q^{-1}$ and $B = QD_2Q^{-1}$ imply $AB = BA$ since $D_1D_2 = D_2D_1$. Conversely, suppose $AB = BA$ and all eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are distinct. Then the eigenspaces $E(\lambda_i)$ are all 1-dimensional for $i = 1, \ldots, n$. But if $Ax = \lambda_ix$, then $ABx = BAx = \lambda_iBx$ implies $Bx \in E(\lambda_i)$. Thus $Bx = \mu x$ for some scalar $\mu$, which means $x$ is also an eigenvector of $B$. By the same token, any eigenvector of $B$ is also an eigenvector of $A$. Choose a set of linearly independent eigenvectors of $A$, which form an invertible matrix $Q$ such that $Q^{-1}AQ = D_1$ and $Q^{-1}BQ = D_2$.
6.14 Use induction on $n$. It is clearly true for $n = 1$. Assume the equality for $n - 1$. Then, by taking the cofactor expansion of $\det(\lambda I - A)$ along the first row,
$$\det(\lambda I - A) = \lambda(\lambda^k + c_1\lambda^{k-1} + \cdots + c_{k-1}\lambda + c_k) + (-1)^{2k}c_{k+1} = \lambda^{k+1} + c_1\lambda^k + \cdots + c_k\lambda + c_{k+1}.$$

6.16 With '''peot to the bas;, a ~ {I, x, x'J, [T]. ~ [~ : ~]. The e;gen-

values are 2,1, -1 and the eigenvectors are (1,1,1), (-1,1, 0) and (1,1, -2),
respectively.
6.19 Eigenvalues are 1, 1, 2 and eigenvectors are $(1, 0, 0)$, $(0, 1, 2)$ and $(1, 2, 3)$. $A^{10}x = (1025, 2050, 3076)$.

6.20 Fibonacci sequence: an+l = an + an-l with al = 2 and a2 = 3.


6.21 One can easily check that $\det A_n = \det A_{n-1} - \det A_{n-2}$. Set $a_n = \det A_n$, so that $a_n = a_{n-1} - a_{n-2}$. Together with the trivial equation $a_{n-1} = a_{n-1}$, we obtain a matrix equation:
$$x_n = \begin{bmatrix} a_n \\ a_{n-1} \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a_{n-1} \\ a_{n-2} \end{bmatrix} = Ax_{n-1} = A^{n-2}x_2,$$
with $a_1 = 1$ and $a_2 = 0$. Using the eigenvalues might make the computation a mess. Instead, one can use the Cayley-Hamilton Theorem 9.5: since the characteristic polynomial of $A$ is $\lambda^2 - \lambda + 1$, $A^2 - A + I = 0$ holds. Thus, $A^3 = A^2 - A = -I$, so $A^6 = I$. One can now easily compute $a_n$ by reducing $n$ modulo 6.
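The Cayley-Hamilton shortcut in 6.21 can be confirmed directly. The Python sketch below uses the matrix $A = \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}$ and the initial values $a_1 = 1$, $a_2 = 0$ from the hint; the matrices $A_n$ of the exercise itself are not reproduced here. It checks $A^3 = -I$ and $A^6 = I$, and then compares the recurrence with a period-6 table. The function names are hypothetical.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, -1], [1, 0]]    # matrix of the recurrence a_n = a_{n-1} - a_{n-2}
I2 = [[1, 0], [0, 1]]
A2 = matmul(A, A)
A3 = matmul(A2, A)
A6 = matmul(A3, A3)
print(A2 == [[0, -1], [1, -1]])  # A^2 = A - I (Cayley-Hamilton)
print(A3 == [[-1, 0], [0, -1]])  # A^3 = -I
print(A6 == I2)                  # A^6 = I


def a_by_recurrence(n):
    """a_1 = 1, a_2 = 0 and a_n = a_{n-1} - a_{n-2} for n >= 3."""
    prev, cur = 1, 0             # a_1, a_2
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, cur = cur, cur - prev
    return cur


def a_by_period(n):
    """A^6 = I makes the sequence periodic with period 6."""
    return [1, 0, -1, -1, 0, 1][(n - 1) % 6]


print(all(a_by_recurrence(n) == a_by_period(n) for n in range(1, 100)))
```

The table lookup costs constant time for any $n$, which is the point of reducing $n$ modulo 6.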
6.22 The characteristic equation is $\lambda^2 - x\lambda - 0.18 = 0$. Since $\lambda = 1$ is a solution, $x = 0.82$. The eigenvalues are now $1, -0.18$ and the eigenvectors are $(-0.3, -1)$ and $(1, -0.6)$.

6.23 (1) eA = [~ e- i ].
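6.24 The initial status in 1985 is $\mathbf{x}_0 = (x_0, y_0, z_0) = (0.4, 0.2, 0.4)$, where $x$, $y$, $z$ represent the percentage of large, medium, and small car owners. In 1995, the status is
$$\mathbf{x}_1 = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} = \begin{bmatrix} 0.7 & 0.1 & 0.0 \\ 0.3 & 0.7 & 0.1 \\ 0.0 & 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.4 \\ 0.2 \\ 0.4 \end{bmatrix} = A\mathbf{x}_0.$$
Thus, in 2025,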

the status is $\mathbf{x}_4 = A^4\mathbf{x}_0$. The eigenvalues are 0.5, 0.8, and 1, whose eigenvectors are $(-0.41, 0.82, -0.41)$, $(0.47, 0.47, -0.94)$, and $(-0.17, -0.52, -1.04)$, respectively.
6.27 (1) $y_1(x) = -2e^{2(1-x)} + 4e^{2(x-1)}$, $y_2(x) = -e^{2(1-x)} + 2e^{2(x-1)}$, $y_3(x) = 2e^{2(1-x)} - 2e^{2(x-1)}$.
(2) $y_1(x) = e^{2x}(\cos x - \sin x)$, $y_2(x) = 2e^{2x}\sin x$.

6.28 (1) $f(\lambda) = \lambda^3 - 10\lambda^2 + 28\lambda - 24$, eigenvalues are 6, 2, 2, and eigenvectors are $(1, 2, 1)$, $(-1, 1, 0)$ and $(-1, 0, 1)$.
(2) $f(\lambda) = (\lambda - 1)(\lambda^2 - 6\lambda + 9)$, eigenvalues are 1, 3, 3, and eigenvectors are $(2, -1, 1)$, $(1, 1, 0)$ and $(1, 0, 1)$.
6.29 Three of them are true:
(1) For A = [~ ~], if B = Q-l AQ, then Q must be singular.

(2) Consider A = [~ ; ] , and B = [ _ ~ _!].


(3) Consider [~ i]. (4) Consider [~ ~]. (6) Consider [~ ~].
(7) If A is similar to I + A, then det(M - A) is a constant.

(8) Consider A = [~ ~] and B = [i !].


(9) tr (A + B) = tr A + tr B.

Chapter 7
Problems
7.1 (1) u· v = uT V =-L"' UiVi -= Li ViUi =- v-:-rr.
= Li kUiVi = k Li UiVi = k(u· v).
(3) (ku) . v
(4) °
u· u = Li IUil2 ~ 0, and U· u = if and only if Ui = for all i. °
°: :;
7.2 (1) If x = 0, clear. Suppose x:f 0 :f y. For any scalar k,
(x - ky,x - ky) = (x, x) - k(x,y) - k(y,x) + kk(y,y). Let k = ~~::\
to obtain 1(x, x) (y, y) - 1(x, y) 12 ~ 0. Note that equality holds if and only if
x = ky for some scalar k.
(2) Expand IIx + Yl12 = (x + y, x + y) and use (1).
7.3 Suppose that $x$ and $y$ are linearly independent, and consider the linear dependence $a(x + y) + b(x - y) = 0$ of $x + y$ and $x - y$. Then $0 = (a + b)x + (a - b)y$. Since $x$ and $y$ are linearly independent, we have $a + b = 0$ and $a - b = 0$, which are possible only for $a = 0 = b$. Thus $x + y$ and $x - y$ are linearly independent.
Conversely, if $x + y$ and $x - y$ are linearly independent, then the linear dependence $ax + by = 0$ of $x$ and $y$ gives $\frac{1}{2}(a + b)(x + y) + \frac{1}{2}(a - b)(x - y) = 0$. Thus we get $a = 0 = b$, so $x$ and $y$ are linearly independent.

7.4 (1) Eigenvalues are 0, 0, 2 and their eigenvectors are (1,0, -i) and (0,1,0),
respectively. (2) Eigenvalues are 3, 1+2VS , 1-2VS, and their eigenvectors are
-i 1-i) (VS- 3 i 1 1-VS(1 + i)) and (- vs+3i 1 l+VS(1 + i)) respec-
( 1 '~2' 2"2 ' 2"2 '
tively.
7.5 Refer to the real case.
7.6 $(AB)^H = (\overline{AB})^T = \overline{B}^T\,\overline{A}^T = B^HA^H$.
7.7 $A^H(A^{-1})^H = (A^{-1}A)^H = I$.
7.8 The determinant is just the product of the eigenvalues and a Hermitian matrix
has only real eigenvalues.
7.9 See Exercise 6.9.
7.10 To prove (3) directly, show that "X(x . y) = 7l(x . y) by using the fact that
AH x = -[Link] when Ax = [Link].
7.11 $A^H = B^H + (iC)^H = B^T - iC^T = -B - iC = -A$.
7.12 $\pm AB = (AB)^H = B^HA^H = (\pm B)(\pm A) = BA$, with $+$ if they are Hermitian and $-$ if they are skew-Hermitian.
7.13 Note that $\det U^H = \overline{\det U}$, and $1 = \det I = \det(U^HU) = |\det U|^2$.
7.16 Since $A^{-1} = A^H$ and $B^{-1} = B^H$, $(AB)^HAB = B^HA^HAB = I$.
7.17 Hermitian means the diagonal entries are real, and diagonality implies off-
diagonal entries are zero. Unitary means the diagonal entries must be ±1.
7.20 This is a normal matrix. From a direct computation, one can find the eigen-
values, 1- i, 1- i and 1 + 2i, and the corresponding eigenvectors: (-1,0,1),
(-1,1,0) and (1,1,1), respectively, which are not orthogonal. But by an
orthonormalization, one can obtain a unitary transition matrix so that A is
unitarily diagonalizable.
7.21 $A^HA = (H_1 - iH_2)(H_1 + iH_2)$ and $AA^H = (H_1 + iH_2)(H_1 - iH_2)$ are equal if and only if $H_1H_2 - H_2H_1 = 0$.
7.22 In one direction these are all already proven in the theorems. Suppose that
UH AU = D for a unitary matrix U and a diagonal matrix D.
(1) and (2). If all the eigenvalues of $A$ are real (or purely imaginary), then the diagonal entries of $D$ are all real (or purely imaginary). Thus $D^H = \pm D$, so that $A$ is Hermitian (or skew-Hermitian).
(3) The diagonal entries of $D$ satisfy $|\lambda| = 1$. Thus, $D^H = D^{-1}$, and $A^H = UD^{-1}U^{-1} = A^{-1}$.

1 [y'3° -v'2 -1 1
7.23 Q = y6
6 y'3 v'2
v'2
-2
1
.

7.24(1)A=~[_~ -~]+~[~ ~],


(2) B - 3+2V6 [
1 (HV6)(2+i)
5
1
- -6- (H~(2-i) 7+;V6

+ 3-2V6 [
1 (1-V6)(2+i)
5
1
6 (1-V6)(2-i) 7-2V6 .
5 5

Exercises
7.1 (1) $\sqrt{6}$, (2) 4.
7.4 (1) $\lambda = i$, $x = t(1, -2 - i)$; $\lambda = -i$, $x = t(1, -2 + i)$.
(2) $\lambda = 1$, $x = t(i, 1)$; $\lambda = -1$, $x = t(-i, 1)$.
(3) Eigenvalues are $2, 2 + i, 2 - i$, and eigenvectors are $(0, -1, 1)$, $(1, -i(2 + i), 1)$, $(1, -i(2 - i), 1)$.
(4) Eigenvalues are $0, -1, 2$, and eigenvectors are $(1, 0, -1)$, $(1, -i, 1)$, $(1, 2i, 1)$.
7.6 $A + cI$ is invertible if $\det(A + cI) \neq 0$. However, for any matrix $A$, $\det(A + cI) = 0$, as a complex polynomial in $c$, always has a (complex) solution. For the real matrix $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ with $\theta \neq k\pi$, $A + rI$ is invertible for every real number $r$, since $A$ has no real eigenvalues.

7.1 (I) ~ [ 1~ i 1___1i l' (2) ~[ l' V2 ~I~J


7.10 (2) Q = ~ [i -i]·
7.12 (1) Unitary; diagonal entries are $\{1, i\}$. (2) Orthogonal; $\{\cos\theta + i\sin\theta, \cos\theta - i\sin\theta\}$, where $\theta = \cos^{-1}(0.6)$. (3) Hermitian; $\{1, 1 + \sqrt{2}, 1 - \sqrt{2}\}$.
7.13 (1) Since the eigenvalues of a skew-Hermitian matrix must always be purely imaginary, 1 cannot be an eigenvalue.
(2) Note that, for any skew-Hermitian matrix $A$, $(e^A)^H = e^{A^H} = e^{-A} = (e^A)^{-1}$.
7.14 $\det(U - \lambda I) = \det(U - \lambda I)^T = \det(U^T - \lambda I)$.

7.15 U= ~ [~ -~], D=UHAU= [26 i 2~i]'


7.17 See Exercise 6.13.
7.18 The eigenvalues are 1, 1,4, and the orthonormal eigenvectors are
(~, - ~, 0), (- ~, - ~, ~) and (~, ~, ~). Therefore,

A ~~ [;: ~: ~:] + ~ [: : :].



7.20 See Theorem 7.8.


7.21 If $\lambda$ is an eigenvalue of $A$, then $\lambda^n$ is an eigenvalue of $A^n$. Thus, if $A^n = 0$, then $\lambda^n = 0$, or $\lambda = 0$. Conversely, by Schur's lemma, $A$ is similar to an upper triangular matrix whose diagonal entries are eigenvalues, which are supposed to be zero. Then it is easy to conclude that $A$ is nilpotent.
7.22 Seven of them are true.
(2) Consider $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ with $\theta \neq k\pi$.

(3) If $m \neq n$, false. (4) Consider [~ ;] .

(6) and (7) A permutation matrix is an orthogonal matrix, but not necessarily symmetric.
(10) There is an invertible matrix $Q$ such that $A = Q^{-1}DQ$. Thus, $\det(A + iI) = \det(D + iI) \neq 0$.
(11) Consider A = [; =~]. (12) Modify (10).

Chapter 8
Problems
8.1

-1
3 -4] 1 [0 1 1]
1 ,(2) 2 1 0 1 ,(3) [ 101 101 -100 -025] .
1 4 1 1 0 -5 0 2-1

8.2 (1) The eigenvalues of A are 1, 2, 11. (2) The eigenvalues are 17, 0, -3, and
so it is a hyperbolic cylinder. (3) A is singular and the linear form is present,
thus the graph is a parabola.
8.4 (1) local minimum, (2) saddle point.
8.5 (1) is indefinite. (2) and (3) are positive definite.
8.6 max =~ at ±(I/V2, 1/V2), min =~ at ±(I/V2, -1/V2).
8.7 (1) max $= 4$ at $\pm\frac{1}{\sqrt{6}}(1, 1, 2)$, min $= -2$ at $\pm\frac{1}{\sqrt{3}}(-1, -1, 1)$;
(2) max $= 3$ at $\pm\frac{1}{\sqrt{6}}(2, 1, 1)$, min $= 0$ at $\pm\frac{1}{\sqrt{3}}(1, -1, -1)$.
8.8 $A$ is negative definite if and only if $-A$ is positive definite.
8.9 $B$ with the eigenvalues $2$, $2 + \sqrt{2}$ and $2 - \sqrt{2}$.
8.12 The determinant is the product of the eigenvalues.
8.13 (2) $b_{11} = b_{14} = b_{41} = b_{44} = 1$, all others are zero.

8.14 If $u \in U \cap W$, then $u = \alpha x + \beta y \in W$ for some scalars $\alpha$ and $\beta$. Since $x, y \in U$, $b(u, x) = b(u, y) = 0$. But $b(u, x) = \beta b(y, x) = -\beta$ and $b(u, y) = \alpha b(x, y) = \alpha$.
8.15 Let $c(x, y) = \frac{1}{2}(b(x, y) + b(y, x))$ and $d(x, y) = \frac{1}{2}(b(x, y) - b(y, x))$. Then $b = c + d$.
8.16 Let $D$ be a diagonal matrix, and let $D'$ be obtained from $D$ by interchanging two diagonal entries $d_{ii}$ and $d_{jj}$, $i \neq j$. Let $P$ be the permutation matrix interchanging the $i$-th and $j$-th rows. Then $PDP^T = D'$.
8.17 Count the number of distinct inertias $(p, q, k)$. For $n$, the number of inertias with $p = i$ is $n - i + 1$.
8.18 (3) index = 2, signature = 1, and rank = 3.

Exercises

8.1 (1) [ ; ; ], (3) [ ; -;


3 -4 -3
-!],
(4) [~
0
-;
4-1
-~].
8.2 (2) {(2, 1, 2), (-1, -2, 2), (1, 0, 0)}.(2, 1,0).

8.5 (2) The point $(1, \pi)$ is a critical point, and the Hessian is [~ _~]. Hence, $f(1, \pi)$ is a local maximum.
8.7 Note that the maximum value of R(x) is the maximum eigenvalue of A, and
similarly for the minimum value.
8.9 If $\lambda$ is an eigenvalue of $A$, then $\lambda^2$ and $\frac{1}{\lambda}$ are eigenvalues of $A^2$ and $A^{-1}$, respectively. Note that $x^T(A + B)x = x^TAx + x^TBx$.

8.11 (1) Q = ~ [~ _~]. The form is indefinite with eigenvalues $\lambda = 5$ and $\lambda = -1$.
8.12 (i) If $a = 0 = c$, then $\lambda_i = \pm b$. Thus the conic section is a hyperbola.
(ii) Since we assumed that $b \neq 0$, the discriminant $(a - c)^2 + 4b^2 > 0$. By the symmetry of the equation in $x$ and $y$, we may assume that $a - c \ge 0$. If $a - c = 0$, then $\lambda_i = a \pm b$. Thus, the conic section is an ellipse if $\lambda_1\lambda_2 = a^2 - b^2 > 0$, or a hyperbola if $a^2 - b^2 < 0$. If $\lambda_1\lambda_2 = a^2 - b^2 = 0$, then it is a parabola when $\lambda_1 \neq 0$ and $e' \neq 0$, or a line or two lines in the other cases. If $a - c > 0$, let $r^2 = (a - c)^2 + 4b^2 > 0$. Then $\lambda_i = \frac{(a + c) \pm r}{2}$ for $i = 1, 2$. Hence, $4\lambda_1\lambda_2 = (a + c)^2 - r^2 = 4(ac - b^2)$. Thus, the conic section is an ellipse if $\det A = ac - b^2 > 0$, or a hyperbola if $\det A = ac - b^2 < 0$. If $\det A = ac - b^2 = 0$, it is a parabola, or a line or two lines, depending on the possible values of $d'$, $e'$ and the eigenvalues.

8.14 (1) A = [; -~], (2) B = [~ ~], (3) Q = [~ -i J.



8.18 (2) The signature is 1, the index is 2, and the rank is 3.


8.19 Seven of them are true.
(5) Consider a bilinear form $b(x, y) = x_1y_1 - x_2y_2$ on $\mathbb{R}^2$.
(7) The identity $I$ is congruent to $k^2I$ for all nonzero $k \in \mathbb{R}$. (8) See (7).

(9) Consider a bilinear form $b(x, y) = x_1y_2$. Its matrix Q = [~ ~] is not diagonalizable.

Chapter 9
Problems
9.2 (1) For $\lambda = -1$, $x_1 = (-2, 0, 1)$, $x_2 = (0, 1, 1)$, and for $\lambda = 0$, $x_1 = (-1, 1, 1)$. (2) For $\lambda = 1$, $x_1 = (-2, 0, 1)$, $x_2 = (~, ~, 0)$, and for $\lambda = -1$, $x_1 = (-9, -1, 1)$.

93 (2) [H !], [~ ~ ~ (3)

9.5 The eigenvalue is -1 of multiplicity 3 and has only one linearly independent
eigenvector (1,0,3). The solution is

y(t) = [~~m] =e-t [-1-=-1 5!:t2t22 ].


Y3(t) 1 - 15t + 6t

9.6 See Problem 6.2.


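9.7 Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$. Then $f(\lambda) = \det(\lambda I - A) = (\lambda - \lambda_1) \cdots (\lambda - \lambda_n)$. Thus, $f(B) = (B - \lambda_1I_m) \cdots (B - \lambda_nI_m)$ is nonsingular if and only if the matrices $B - \lambda_iI_m$, $i = 1, \ldots, n$, are all nonsingular. That is, none of the $\lambda_i$'s is an eigenvalue of $B$.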
9.8 The characteristic polynomial of $A$ is $f(\lambda) = (\lambda - 1)(\lambda - 2)^2$, and the remainder
4 0
is $104\lambda^2 - 228\lambda + 138$, so $p(A) = 104A^2 - 228A + 138I$ = [10 9S S04].
° ° 9S
Exercises
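9.1 Find the Jordan canonical form of $A$ as $Q^{-1}AQ = J$. Since $A$ is nonsingular, all the diagonal entries $\lambda_i$ of $J$, as the eigenvalues of $A$, are nonzero. Hence, each Jordan block $J_j$ of $J$ is invertible. Now one can easily show that $(Q^{-1}AQ)^{-1} = Q^{-1}A^{-1}Q = J^{-1}$, which is block diagonal with blocks $J_j^{-1}$. Each $J_j^{-1}$ is upper triangular with the single eigenvalue $\lambda_j^{-1}$, and $J_j^{-1} - \lambda_j^{-1}I$ has rank one less than the size of $J_j$, so $J_j^{-1}$ is similar to the Jordan block of the same size with eigenvalue $\lambda_j^{-1}$. These blocks form the Jordan form of $A^{-1}$.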
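A small numerical illustration of how the inverse of a single Jordan block behaves, for a hypothetical block with $\lambda = 2$ of size 3: the inverse is upper triangular with $1/\lambda$ on the diagonal, and $J^{-1} - \lambda^{-1}I$ has rank $k - 1$, so $J^{-1}$ has a one-dimensional eigenspace and is similar to one Jordan block of the same size.

```python
import numpy as np

lam, k = 2.0, 3                               # hypothetical eigenvalue and block size
J = lam * np.eye(k) + np.eye(k, k=1)          # Jordan block: lam on diagonal, 1 above
J_inv = np.linalg.inv(J)

# J^{-1} is upper triangular with 1/lam on its diagonal ...
diag_ok = np.allclose(np.diag(J_inv), 1 / lam)
upper_ok = np.allclose(np.tril(J_inv, -1), 0)

# ... and J^{-1} - (1/lam) I has rank k - 1, so the eigenvalue 1/lam has a
# one-dimensional eigenspace: J^{-1} is similar to a single k x k Jordan block.
rank = np.linalg.matrix_rank(J_inv - np.eye(k) / lam)
print(diag_ok, upper_ok, rank)   # True True 2
```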

9.3 (x, Y) = ~(4 + i, i).

9.4 y(t) = V2e 4t [ ~ ]- V2e 2t [ -~ ].

9.5 $y_1(t) = -2e^{2(1-t)} + 4e^{2(t-1)}$,
$y_2(t) = -e^{2(1-t)} + 2e^{2(t-1)}$,
$y_3(t) = 2e^{2(1-t)} - 2e^{2(t-1)}$.
9.6 $y_1(t) = 2(t-1)e^t$,
$y_2(t) = -2te^t$,
$y_3(t) = (2t-1)e^t$.
9.8 (1) $(a - d)^2 + 4bc \neq 0$ or $A = aI$.
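Over the complex numbers, a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is diagonalizable exactly when its characteristic polynomial $\lambda^2 - (a + d)\lambda + (ad - bc)$ has two distinct roots, i.e. its discriminant $(a - d)^2 + 4bc$ is nonzero, or when $A$ is the scalar matrix $aI$. A minimal sketch, assuming this form of $A$ (the exercise statement is not reproduced here) and exact integer or rational entries; over $\mathbb{R}$ one would instead require $(a - d)^2 + 4bc > 0$ or $A = aI$.

```python
def is_diagonalizable_2x2(a, b, c, d) -> bool:
    """Over C: A = [[a, b], [c, d]] is diagonalizable iff it has two distinct
    eigenvalues, or it is the scalar matrix aI (b = c = 0 and a = d)."""
    disc = (a - d) ** 2 + 4 * b * c      # discriminant of det(xI - A)
    return disc != 0 or (b == 0 and c == 0 and a == d)

print(is_diagonalizable_2x2(1, 1, 0, 1))   # False: a single Jordan block
print(is_diagonalizable_2x2(0, 1, -1, 0))  # True over C: eigenvalues i and -i
```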
9.9 (1) $t^2 + t - 11$, (2) $t^2 + 2t + 13$, (3) $(t - 1)(t^2 - 2t - 5)$.
9.11 (2) The characteristic polynomial of $W$ is $f(\lambda) = \lambda^n - 1$.
(4) The characteristic polynomial of $B$ is $f(\lambda) = (\lambda - n + 1)(\lambda + 1)^{n-1}$.
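These two characteristic polynomials can be spot-checked numerically. The exercise statement is not part of this section, so the sketch assumes that $W$ is the $n \times n$ cyclic permutation (shift) matrix and that $B$ has zeros on the diagonal and ones elsewhere; both forms are assumptions chosen because they match the stated answers, and $n = 5$ is arbitrary.

```python
import numpy as np

n = 5
# Assumed forms of W and B (hypothetical; not taken from the exercise text).
W = np.roll(np.eye(n), 1, axis=1)    # cyclic shift: row i has its 1 in column i+1 mod n
B = np.ones((n, n)) - np.eye(n)      # zeros on the diagonal, ones elsewhere

charW = np.poly(W)                   # coefficients of det(xI - W), highest degree first
charB = np.poly(B)

expectedW = np.zeros(n + 1)
expectedW[0], expectedW[-1] = 1.0, -1.0          # x^n - 1
expectedB = np.array([1.0, -(n - 1.0)])          # x - (n - 1)
for _ in range(n - 1):
    expectedB = np.polymul(expectedB, [1.0, 1.0])  # multiply by (x + 1)
```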
9.12 Four of them are true.
Index

LDU factorization, 36
n-space, 76
QR factorization, 202

Additivity, 162
Adjoint, 65, 154, 259
Angle, 161, 165
Associated matrix, 136
Augmented matrix, 5

Back substitution, 8
Basic variable, 9
Basis, 87
  change of, 143
  dual, 153
  ordered, 130
  orthonormal, 169
  standard, 87, 168
Bessel's inequality, 185
Bijective, 128
Bilinear form, 303
  diagonalizable, 307
  matrix representation of, 305
  non-degenerate, 304
  rank of, 306
  skew-symmetric, 307
  symmetric, 307
Block, 22
Block matrix, 22

Cauchy-Schwarz inequality, 164, 256
Cayley-Hamilton theorem, 248, 337
Characteristic polynomial, 210
Characteristic value, 210
Characteristic vector, 210
Circulant matrix, 342
Cofactor, 62
Cofactor expansion, 61
Column (matrix), 13
Column space, 83, 94
Column vector, 13, 77
Companion matrix, 247
Computer graphics, 132
Congruent matrix, 289
Conic section, 283
Conjugate, 252
Coordinate, 75
Coordinate change matrix, 145
Coordinate function, 152
Coordinate system
  rectangular, 169
Coordinate vector, 87, 130
Cramer's rule, 66
Critical point, 292
Cross-product term, 282
Cryptography, 39

Definite form, 294
  negative, 294
  positive, 294
Determinant, 50
Diagonal entry, 14
Diagonal matrix, 14
Diagonalizable
  orthogonally, 263
  unitarily, 263
Diagonalization of
  linear transformation, 243
  matrices, 216
Difference equations, 221


Differential equation, 230, 240
  system of linear, 227
Dilation, 184
Dimension, 90
  finite, 90
  infinite, 90
Direct sum, 92
Distance, 161, 165, 253
Dot product, 161
Dual basis, 153
Dual space, 152

Eigenspace, 210
Eigenvalue, 210
Eigenvector, 209
Electrical network, 38
Elementary column operation, 29
Elementary matrix, 27
Elementary operations, 4
Elementary product, 59
  signed, 59
Elementary row operation, 6
Elimination, 2
Entry, 13
Euclidean n-space, 162
Euclidean inner product, 162
Exponential matrix, 235

Factorization
  LDU, 36
Fibonacci, 222
  number, 222
  sequence, 222
Forward elimination, 7
Fourier coefficient, 152
Free variables, 9
Fundamental set, 229
Fundamental theorem, 100
  first, 100
  second, 186

Gauss-Jordan elimination, 8
Gaussian elimination, 8
General solution, 227, 229
Generalized eigenspace, 330
Generalized eigenvector, 328
  chain of, 328
Global maximum, 295
Global minimum, 295

Hermitian matrix, 259
Hessian, 293, 296
Homogeneity, 162
Homogeneous system, 1

Idempotent matrix, 48, 246
Identity matrix, 19
Identity transformation, 122
Image, 124
Indefinite form, 294
Index, 311
Inertia, 284
Initial condition, 227
Injective, 128
Inner product, 162, 252
  complex, 252
  Euclidean, 161, 162
  Hermitian, 252
  matrix representation of, 167
  positive definite, 162
Inner product space
  real, 162
Input-output model, 42
Interpolating polynomial, 113
Inverse, 127
  left, 24
  right, 24
Inverse matrix, 25
Inversion, 57
Invertible matrix, 25
Isometry, 182
Isomorphism, 128
  natural, 130

Jordan, 318, 322
  block, 322
  canonical form, 318, 322

Kernel, 124

Kirchhoff's Current Law, 38
Kirchhoff's Voltage Law, 38

Leading 1's, 8
Least square solution, 187
Least square solutions, 187
Length, 161, 165, 253
Linear combination, 81
Linear dependence, 84
Linear difference equation, 222, 224
Linear differential equation
  system of, 227
Linear equations, 1
  consistent system of, 1
  homogeneous system of, 1
  inconsistent system of, 1
  system of, 1
Linear form, 279, 280
Linear functional, 152
Linear programming, 298
Linear transformation, 121
  associated matrix of, 136
  diagonalization of, 243
  dilation, 184
  eigenvalue of, 243
  eigenvector of, 243
  identity, 122
  image, 124
  invertible, 128
  isomorphism, 128
  kernel, 124
  matrix representation of, 136
  orthogonal, 182
  projection, 171
  reflection, 123
  rotation, 123
  scalar multiplication of, 140
  sum of, 140
  transpose, 154
  zero, 122
Linearly dependent, 84
Linearly independent, 84
Lower triangular matrix, 14

Magnitude, 161, 165
Markov process, 224
Matrix, 12
  associated, 136
  augmented, 5
  block, 22
  circulant, 342
  column, 13
  congruent, 289
  coordinate change, 145
  diagonal, 14
  diagonalizable, 216
  diagonalization of, 216
  elementary, 27
  entry of, 13
  exponential, 235
  Hermitian, 259
  idempotent, 48
  identity, 19
  indefinite, 294
  inverse, 25
  invertible, 25
  lower triangular, 14
  negative definite, 294
  negative semidefinite, 294
  nilpotent, 48
  non-singular, 25
  normal, 269
  order of, 13
  orthogonal, 181
  orthogonal part of, 202
  orthogonal projection, 196
  permutation, 28, 29
  positive definite, 294
  power of, 333
  product of, 18
  row, 13
  scalar multiplication of, 14
  semidefinite, 294
  similar, 148
  simultaneously diagonalizable, 246
  singular, 25
  size of, 13
  skew-Hermitian, 259

  skew-symmetric, 16
  square, 13
  stochastic, 225
  sum of, 15
  symmetric, 16
  transition, 145
  transpose of, 13
  unitary, 261
  upper triangular, 14
  upper triangular part of, 202
  Vandermonde, 114
  zero, 15
Matrix of cofactors, 65
Matrix polynomial, 46, 337
Matrix representation, 136, 167, 305
  inner product, 167
  linear transformation, 136
  quadratic form, 281
Maximum, 293
Minimum, 293
Minor, 62
Monic, 247

Newton's second law, 196
Nilpotent matrix, 48
Non-singular matrix, 25
Normal equation, 188
Normal matrix, 269
Normalization, 169
Null space, 94
Nullity, 94

Ohm's Law, 38
One-to-one, 128
Onto, 128
Ordered basis, 130
Orthogonal, 173
Orthogonal complement, 173
Orthogonal decomposition, 173
Orthogonal matrix, 181
Orthogonal projection, 174
Orthogonal projection matrix, 196
Orthogonal transformation, 182
Orthogonal vectors, 166
Orthogonalization, 179
  Gram-Schmidt, 179
Orthogonally similar, 263
Orthonormal basis, 169
Orthonormal vectors, 169

Parabolic cylinder, 286
Parallelepiped, 68
Parallelogram, 68
Parallelogram equality, 206
Particular solution, 227
Permutation, 57
  even, 58
  inversion of, 57
  odd, 58
  sign of, 58
Permutation matrix, 28, 29
Perpendicular vectors, 166
Pivot, 7
Polarization identity, 206
Principal submatrix, 300
Projection, 171
  orthogonal, 174
Pythagorean theorem, 166, 256

Quadratic equation, 280
Quadratic form, 279, 280
  matrix representation of, 281
Quadratic surface, 283

Rank, 101
Rayleigh quotient, 314
Real inner product space, 162
Recurrence relation, 222
Row (matrix), 13
Row space, 94
Row vector, 13, 94
Row-echelon form, 8
  reduced, 8
Row-equivalent, 6

Saddle point, 293, 295
Scalar, 14, 15, 75
Scalar multiplication of
  linear transformation, 140

  matrix, 14
  vectors, 78
Schur's lemma, 264
Second derivative test, 295
Self-adjoint, 259
Semidefinite form, 294
  negative, 294
  positive, 294
Signature, 311
Similar, 148
  orthogonally, 263
  unitarily, 263
Similar matrix, 148
Similarity, 148, 214
Simultaneously diagonalizable, 246
Singular matrix, 25
Skew-Hermitian matrix, 259
Skew-symmetric matrix, 16
Spanning set, 82
Spectral decomposition, 272, 273
Spectral theorem, 271
Square matrix, 13
Square term, 282
Standard basis, 87, 168
Stochastic matrix, 225
Submatrix, 22
  minor, 62
  principal, 300
Subspace, 79
  fundamental, 185
  spanned, 81
  sum of, 92
Substitution, 2
Sum of
  linear transformations, 140
  matrices, 14
  subspaces, 92
  vectors, 77
Surjective, 128
Sylvester's law of inertia, 310
Symmetric matrix, 16

Trace, 123, 152
Transformation
  identity, 122
  injective, 128
  linear, 121
  mapping, 121
  surjective, 128
  zero, 122
Transition matrix, 145
Transpose, 13, 154
Transposition, 57
Triangular inequality, 166, 256

Unit vector, 169
Unitarily similar, 263
Unitary matrix, 261
Unitary space, 253
Upper triangular matrix, 14

Value
  characteristic, 210
Vandermonde matrix, 114
Vector, 75, 77
  characteristic, 210
  column, 13
  component of, 75
  coordinate, 87
  orthogonal, 166
  perpendicular, 166
  row, 13, 94
  scalar multiplication of, 78
  sum of, 77
  unit, 169
  zero, 77
Vector addition, 77
Vector space, 77
  complex, 78, 252
  isomorphic, 128
  real, 77
Volume, 68

Wronskian, 116, 229

Zero matrix, 15
Zero transformation, 122
Zero vector, 77
