Linear Algebra Fundamentals and Applications
Linear Algebra Fundamentals and Applications
SUNGPYO HONG
Linear Algebra
ISBN 978-1-4757-1202-5
Typesetting by the authors in L"'r~
9 8 7 65 43 2 1
Preface
Linear algebra is one of the most important subjects in the study of science
and engineering because of its widespread applications in social or natural
science, computer science, physics, or economics. As one of the most useful
courses in undergraduate mathematics, it has provided essential tools for
industrial scientists. The basic concepts of linear algebra are vector spaces,
linear transformations, matrices and determinants, and they serve as an
abstract language for stating ideas and solving problems.
This book is based on the lectures delivered several years in a sophomore-
level linear algebra course designed for science and engineering students. The
primary purpose of this book is to give a careful presentation of the basic
concepts of linear algebra as a coherent part of mathematics, and to illustrate
its power and usefulness through applications to other disciplines. We have
tried to emphasize the computational skills along with the mathematical
abstractions, which have also an integrity and beauty of their own. The
book includes a variety of interesting applications with many examples not
only to help students understand new concepts but also to practice wide
applications of the subject to such areas as differential equations, statistics,
geometry, and physics. Some of those applications may not be central to
the mathematical development and may be omitted or selected in a syllabus
at the discretion of the instructor. Most basic concepts and introductory
motivations begin with examples in Euclidean space or solving a system of
linear equations, and are gradually examined from different points of views
to derive general principles.
For those students who have completed a year of calculus, linear algebra
may be the first course in which the subject is developed in an abstract way,
and we often find that many students struggle with the abstraction and
miss the applications. Our experience is that, to understand the material,
students should practice with many problems, which are sometimes omitted
because of a lack of time. To encourage the students to do repeated practice,
v
VI Preface
we placed in the middle of the text not only many examples but also some
carefully selected problems, with answers or helpful hints. We have tried
to make this book as easily accessible and clear as possible, but certainly
there may be some awkward expressions in several ways. Any criticism or
comment from the readers will be appreciated.
We are very grateful to many colleagues in Korea, especially to the faculty
members in the mathematics department at Pohang University of Science
and Technology (POSTECH), who helped us over the years with various
aspects of this book. For their valuable suggestions and comments, we would
like to thank the students at POSTECH, who have used photocopied versions
of the text over the past several years. We would also like to acknowledge the
invaluable assistance we have received from the teaching assistants who have
checked and added some answers or hints for the problems and exercises in
this book. Our thanks also go to Mrs. Kathleen Roush who made this book
much more legible with her grammatical corrections in the final manuscript.
Our thanks finally go to the editing staff of Birkhauser for gladly accepting
our book for publication.
Jin Ho Kwak
Sungpyo Hong
E-mail: jinkwak@[Link]
sungpyo@[Link]
April 1997, in Pohang, Korea
Preface v
2 Determinants 49
2.1 Basic properties of determinant 49
2.2 Existence and uniqueness 54
2.3 Cofactor expansion . . . . . . . 60
2.4 Cramer's rule . . . . . . . . . . 65
2.5 Application: Area and Volume 68
2.6 Exercises .......... . 71
3 Vector Spaces 75
3.1 Vector spaces and subspaces . 75
3.2 Bases . . . . . . . . . . 81
3.3 Dimensions . . . . . . . 88
3.4 Rowand column spaces 94
3.5 Rank and nullity . . . . 100
vii
viii CONTENTS
Index 365
Linear Algebra
Chapter 1
1.1 Introduction
One of the central motivations for linear algebra is solving systems of linear
equations. We thus begin with the problem of finding the solutions of a
system of m linear equations in n unknowns of the following form:
au Xl + a12 X 2 + + alnXn bl
a2l X l + a22 x 2 + + a2n X n b2
where Xl, X2, ... , Xn are the unknowns and ai/s and b/s denote constant
(real or complex) numbers.
A sequence of numbers (81, 82, ... , 8 n ) is called a solution of the
system if Xl = 81, X2 = 82, ... , Xn = 8 n satisfy each equation in the system
simultaneously. When bl = b2 = ... = b m = 0, we say that the system is
homogeneous.
The central topic of this chapter is to examine whether or not a given
system has a solution, and to find a solution if it has one. For instance,
any homogeneous system always has at least one solution Xl = X2 = ... =
Xn = 0, called the trivial solution. A natural question is whether such a
homogeneous system has a nontrivial solution. If so, we would like to have a
systematic method of finding all the solutions. A system of linear equations
is said to be consistent if it has at least one solution, and inconsistent if
1
2 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
{ x-y =
x-y =
-1
0 { x+y = 1
x-y = 0
{
X -
2x - 2y
Y -1
-2
y Y y
1 1
x x x
To decide whether the given system has a solution and to find a general
method of solving the system when it has a solution, we repeat here a well-
known elementary method of elimination and substitution.
Suppose first that the system consists of only one equation ax + by = c.
Then the system has either infinitely many solutions (i.e., points on the
straight line x = -h
+ ~ or y = -%x + ~ depending on whether a i= 0 or
b i= 0) or no solutions when a = b = 0 and c i= O.
1.1. INTRODUCTION 3
We now assume that the system has two equations representing two lines
in the plane. Then clearly the two lines are parallel with the same slopes
if and only if a2 = Aal and b2 = Ab l for some A f. 0, or a I b2 - a2bI = o.
Furthermore, the two lines either coincide (infinitely many solutions) or are
distinct and parallel (no solutions) according to whether C2 = ACI holds or
not.
Suppose now that the lines are not parallel, or al b2 - a2bI f. o. In this
case, the two lines cross at a point, and hence there is exactly one solution:
For instance, if the system is homogeneous, then the lines cross at the origin,
so (0,0) is the only solution. For a nonhomogeneous system, we may find
the solution as follows: Express x in terms of y from the first equation, and
then substitute it into the second equation (i.e., eliminate the variable x
from the second equation) to get
which is in turn substituted into one of the equations to find x and give a
complete solution of the system. In detail, the process can be summarized
as follows:
(1) Without loss of generality, we may assume al f. 0 since otherwise we can
interchange the two equations. Then the variable x can be eliminated from
the second equation by adding - a2 times the first equation to the second,
al
to get
C!
aIc2 - a2C!
y =
a I b2 - a2 bI .
4 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
(3) Now, x is solved by substituting the value of y into the first equation,
and we obtain the solution to the problem:
b2 Cl - b1 C2
al b 2 - a2bl
alc2 - a2Cl
a 1 b 2 - a2 bl .
Note that the condition a 1 b 2 - a2bl i- 0 is necessary for the system to have
only one solution. 0
This matrix is called the augmented matrix for the system. The term
matrix means just any rectangular array of numbers, and the numbers in this
array are called the entries of the matrix. To explain the above operations
in terms of matrices, we first introduce some terminology even though in the
following sections we shall study matrices in more detail.
Within a matrix, the horizontal and vertical subarrays
are called the i-th row (matrix) and the j-th column (matrix) of the aug-
mented matrix, respectively. Note that the entries in the j-th column are
6 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
The general procedure for finding the solutions will be illustrated in the
following example:
+
3~ ++
2y 4z 2
2y + 2z 3
{ 4y + 6z = -1.
1.2. GAUSSIAN ELIMINATION 7
[ ~~~ ~].
3 4 6 -1
(1) Since the coefficient of x in the first equation is zero while that in the
second equation is not zero, we interchange these two equations:
+ 2z 3
+ 4z 2
+ 6z -1
X + 2y + 2z = 3 1 2
{ 2y + 4z = 2 [ o 2
- 2y = -10 o -2
The coefficient 1 of the first unknown x in the first equation (row) is called
the pivot in this first elimination step.
Now the second and the third equations involve only the two unknowns
y and z. Leave the first equation (row) alone, and the same elimination
procedure can be applied to the second and the third equations (rows): The
pivot for this step is the coefficient 2 of y in the second equation (row). To
eliminate y from the last equation,
(3) Add 1 times the second equation (row) to the third equation (row):
[H : J]
X + 2y + 2z = 3
{ 2y + 4z = 2
4z = -8
The elimination process done so far to obtain this result is called a for-
ward elimination: i.e., elimination of x from the last two equations (rows)
and then elimination of y from the last equation (row).
Now the pivots of the second and third rows are 2 and 4, respectively.
To make these entries 1,
8 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
X + 2y + 2z = 3
{ y + 2z = 1
z = -2
The resulting matrix on the right side is called a row-echelon form of the
matrix, and the 1 's at the leftmost entries in each row are called the leading
1 'so The process so far is called a Gaussian elimination.
We now want to eliminate numbers above the leading 1 's;
(5) Add -2 times the third row to the second and the first rows,
[ o~~~ ~l·
7
5
z -2 0 1 -2
[ o~ ~ ~ -~ 1
-3
y 5
z -2 0 1 -2
(4) Each column that contains a leading 1 has zeros everywhere else, in
addition to the above three properties.
Note that an augmented matrix has only one reduced row-echelon form
while it may have many row-echelon forms. In any case, the number of
nonzero rows containing leading 1's is equal to the number of columns con-
taining leading 1'so The variables in the system corresponding to columns
with the leading 1's in a row-echelon form are called the basic variables. In
general, the reduced row-echelon form U may have columns that do not con-
tain leading 1'so The variables in the system corresponding to the columns
without leading 1's are called free variables. Thus the sum of the number
of basic variables and that of free variables is precisely the total number of
variables.
For example, the first two matrices below are in reduced row-echelon
form, and the last two just in row-echelon form.
Notice that in an augmented matrix [A b], the last column b does not
correspond to any variable. Hence, if we consider the four matrices above
as augmented matrices for some systems, then the systems corresponding
to the first and the last two augmented matrices have only basic variables
but no free variables. In the system corresponding to the second augmented
matrix, the second and the forth variables, X2 and X4, are basic, and the
first and the third variables, Xl and X3, are free variables. These ideas will
be used in later chapters.
In summary, by applying a finite sequence of elementary row operations,
the augmented matrix for a system of linear equations can be changed to
its reduced row-echelon form which is row-equivalent to the original one.
From the reduced row-echelon form, we can decide whether the system has
a solution, and find the solution of the given system if it has one.
{ Xl
+ 3X2 2X3 = 3
2XI + 6X2 2X3 + 4X4 18
X2 + X3 + 3X4 = 10.
10 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
[1o 3 -2 0 3]
2 6 -2 4 18
1 1 3 10
.
[1o 3 -2 0 3]
0 2 4 12 .
o 1 1 3 10
(2) Note that the coefficient of X2 in the second equation is zero and that
in the third equation is not. Thus, interchanging the second and the third
rows produces
[1o 3 -2 0 3]
011310
0 2 4 12
.
(3) The pivot in the third row is 2. Thus, dividing the third row by 2
produces a row-echelon form
3 -2 0 3]
[~ 1
o
1 3 10
1 2 6
.
1 3 0 4 15]
[ o 1 0 1 4 .
00126
(5) Finally, adding -3 times the second row to the first produces the
reduced row-echelon form:
~ ~ ~].
1 0
[ o 1
o 0 126
1.2. GAUSSIAN ELIMINATION 11
3
= 4
6.
Since Xl, x2, and X3 correspond to the columns containing leading 1's,
they are the basic variables, and X4 is the free variable. Thus by solving this
system for the basic variables in terms of the free variable X4, we have the
system of equations in a solved form:
3
4
6
By assigning an arbitrary value t to the free variable X4, the solutions can
be written as
!
Remark: Consider a homogeneous system
""Xl
a2l X I
amlxl
+
+
+
al2 x 2
a22 x 2
a m 2X 2
+
+
+ ... +
+
+
alnXn
a2n X n
amnXn
=
=
=
0
0
0,
with the number of unknowns greater than the number of equations: that
is, m < n. Since the number of basic variables cannot exceed the number
of rows, a free variable always exists as in Example 1.3, so by assigning
an arbitrary value to each free variable we can always find infinitely many
nontrivial solutions.
Problem 1.2 Suppose that the augmented matrix for a system of linear equations
has been reduced to the reduced row-echelon form below by elementary row opera-
tions. Solve the systems:
10
(1) [ 0 1
oo 5] ,
-2
1 0 o 4
(2) [ 0 1 o 2
-1 ]
6 .
o 0 o 4 o 0 1 3 2
12 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
{
-x + y + 2z
{ z
0 2y 1
(1) 3x + 4y + z 0 (2) 4x lOy + 3z 5
2x + 5y + 3z O. 3x 3y 6.
w + x + y 3
-3w 17x + y + 2z 1
(3) { 4w 17x + 8y 5z 1
5x 2y + z 1.
Problem 1.4 Determine the condition on b; so that the following system has a so-
lution.
+ 2y + 6z b1 + 3y 2z b1
(1) { ,: 3y 2z b2 (2) { 2~ Y + 3z b2
3x y + 4z h 4x + 2y + z h
1.3 Matrices
Rectangular arrays of real numbers arise in many real-world problems. His-
torically, it was the English mathematician A. Cayley who first introduced
the word "matrix" in the year 1858. The meaning of the word is "that within
which something originates," and he used matrices simply as a source for
rows and columns to form squares.
In this section we are interested only in very basic properties of such
matrices.
A=
or just A = [aij] if the size of the matrix is clear from the context. The
number aij is called the (i, j)-entry of the matrix A, and can be also written
as aij = [A]ij.
An mx 1 matrix is called a column (matrix) or sometimes a column
vector, and a 1 x n matrix is called a row (matrix), or a row vector.
These special cases are important, as we will see throughout the book. We
will generally use capital letters like A, B, C for matrices and small boldface
letters like x, y, z for columns or row vectors.
x=
(2) The entries all, a22, ... , ann are called the diagonal entries of A.
(3) A is called a diagonal matrix if all the entries except for the diagonal
entries are zero.
(4) A is called an upper (lower) triangular matrix if all the entries
below (above, respectively) the diagonal are zero.
The following matrices U and L are the general forms of the upper tri-
angular and lower triangular matrices, respectively:
Note that a matrix which is both upper and lower triangular must be a
diagonal matrix, and the transpose of an upper (lower) triangular matrix is
lower (upper, respectively) triangular matrix.
aIn]
:. + [ bu
:. [
au 7 bu al n 7bIn ].
Theorem 1.2 Suppose that the sizes of A, Band C are the same. Then
the following rules of matrix arithmetic are valid:
(1) (A + B) + C = A + (B + C), (written as A + B + C) (Associativity),
(2) A + 0 = 0 + A = A,
(3) A + (-A) = (-A) + A = 0,
(4) A + B = B + A, (Commutativity),
(5) k(A+B)=kA+kB,
(6) (k+£)A=kA+fA,
(7) (k£)A = k (fA).
Proof: We prove only (5) and the remaining are left for exercises. For any
(i, j),
o1 32]
o
A= -1
-2 -3 0
are symmetric and skew-symmetric, respectively. Notice here that all the
diagonal entries of a skew-symmetric matrix must be zero, since aii = -aii.
By a direct computation, one can easily verify the following rules of the
transpose of matrices:
A= [
-1
~ =~0 ~].
1
(1) For a 1 x n row matrix a = [a1 ... an] and an n x 1 column matrix
x = [Xl ... xn]T, the product ax is a lxl matrix (i.e., just a number)
defined by the rule
Xn
Note that the number of columns of the first matrix must be equal to the
number of rows of the second matrix to have entrywise multiplications of
the entries.
(2) For an m x n matrix
A=
[i I
where ai's denote the row vectors, and for an n x 1 column matrix x =
[Xl ... Xn]T, the product Ax is by definition an m x 1 matrix defined by
the rule:
a1 x 2:7=1 a1i x i
a2x 2:7=1 a2i X i
Ax=
I
amx
or in an expanded form
a m1 a m2 a mn Xn a m 1X 1 + a m 2x 2: + ... + amnXn
which is just an m x 1 column matrix of the form [b 1 b2 ... bm]T.
Therefore, for a system of m linear equations in n unknowns, by writing
the n unknowns as an n x 1 column matrix x and the coefficients as an m x n
matrix A the system may be expressed as a matrix equation Ax = b. Notice
that this looks just like the usual linear equation in one variable: ax = b.
18 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
i.e., for i = 1, ... , m and j = 1, ... , r, it is the product of i-th row and
j-th column of A: n
[ABJij = ~bi = L aikbkj.
k=l
[! ~][~] = [2.1+3.5]=[17],
4·1+0·5 4
[! ~] [-~ ] = [2.2+3.(-1)]=[1],
4·2+0·(-1) 8
[! ~][~] = [ 2.0+3.0]=[0].
4·0+0·0 °
1.4. PRODUCTS OF MATRICES 19
Therefore, AB is
[~ ~] [! _~ ~] = [1~ ~ ~].
Since A is a 2x2 matrix and B is a 2x3 matrix, the product AB is a 2x3
matrix. If we concentrate, for example, on the (2, I)-entry of AB, we single
out the second row from A and the first column from B, and then we multiply
corresponding entries together and add them up, i.e., 4·1 + 0·5 = 4. D
Theorem 1.4 Let A, B, C be arbitrary matrices for which the matrix op-
erations below are defined, and let k be an arbitrary scalar. Then
(1) A(BC) = (AB)C, (written as ABC) (Associativity),
(2) A(B + C) = AB + AC, and (A + B)C = AC + BC, (Distributivity),
(3) I A = A = AI,
(4) k(BC) = (kB)C = B(kC),
(5) (AB)T = BT AT.
This clearly shows that [A(BC)L j = [(AB)C]ij for all i, j, and consequently
A(BC) = (AB)C as desired. 0
Problem 1.8 Prove or disprove: If A is not a zero matrix and AB = AG, then
B=G.
Problem 1.9 Show that any triangular matrix A satisfying AAT = AT A is a diag-
onal matrix.
1.4. PRODUCTS OF MATRICES 21
Theorem 1.5 Any system of linear equations has either no solution, exactly
one solution, or infinitely many solutions.
for any real number k. This says that Xl + kxo is also a solution of Ax = b
for any k. Since there are infinitely many choices for k, Ax = b has infinitely
many solutions. 0
Problem 1.11 For which values of a does each of the following systems have no
solution, exactly one solution, or infinitely many solutions?
x + 2y 3z 4
(1) { 3x y + 5z 2
4x + y + (a 2 - 14)z a+2.
x - y + z 1
(2) { x + 3y + az 2
2x + ay + 3z 3.
22 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
a12
A u= [ all a 13 ] , A12 = [ a 14 ] ,
a21 a22 a23 a24
AB = [
AllBll + A12B21 AllB12 + A12B22]
A21Bll + A22B21 A21B12 + A22B22 '
provided that the number of columns in Aik is equal to the number of rows
in Bkj. This will be true only if the columns of A are partitioned in the
same way as the rows of B.
It is not hard to see that the matrix product by blocks is correct. Sup-
pose, for example, that we have a 3x3 matrix A and partition it as
1.5. BLOCK MATRICES 23
B= [~~~ ~~~ 1
b31 b32
[ Bn ].
B21
AB = [ Cn ] = [ AnBn + A12B21 ].
C12 A21 B n + A22B21
I
product AB is well-defined. By considering the matrix B as a block, the
product AB can be written
a1 a1 b1 a1 b2
a2
[ alB
a2 B a2 b1 a2 b2 alb"r
AB= B= = a 2:b ,
am amB a mb 1 a mb 2 amb r
A =
[
1 2
-~ ~: ~
1 1
~ 1' B=[~
-1
3
!i:].
-21 1
A= [~ ~ ~ 1 - =~ andB = [ -n'
we have AB = 12, and BA = [-5 2-4] h.9 -2
12 -4
6 -:F
9
Thus, the matrix B is a right inverse but not a left inverse of A, while A is a
left inverse but not a right inverse of B. Since (ABf = BT AT and IT = I,
a matrix A has a right inverse if and only if AT has a left inverse. D
B = BI = B(AC) = (BA)C = IC = C.
Now any two left inverses must be both equal to a right inverse C, and hence
to each other, and any two right inverses must be both equal to a left inverse
B, and hence to each other. So there exist only one left and only one right
inverse for a square matrix A ifit is known that A has both left and right
inverses. Furthermore, the left and right inverses are equal. 0
This theorem says that if a matrix A has both a right inverse and a left
inverse, then they must be the same. However, we shall see in Chapter 3
that any mxn matrix A with m -=1= n cannot have both a right inverse and
a left inverse: that is, a nonsquare matrix may have only a left inverse or
only a right inverse. In this case, the matrix may have many left inverses or
many right inverses.
B = [~ ~ :] is a left inverse of A. 0
Note that Lemma 1.6 shows that if a square matrix A has both left and
right inverses, then it must be unique. That is why we call B "the" inverse
of A. For instance, consider a 2x2 matrix A = [~ !]. If ad - be -=1= 0,
then it is easy to verify that
d
A-I _ 1 [d
-! ] [ =
ad- be ad-b
- be 1
- ad - be -e -e a '
ad- be ad- be
26 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
Problem 1.13 Let A be an invertible matrix and k any nonzero scalar. Show that
(1) A-I is invertible and (A-1)-1 = A;
(2) the matrix kA is invertible and (kA)-1 = iA-1;
(3) AT is invertible and (AT)-l = (A-I)T.
Proof: Suppose that A and B are invertible matrices of the same size. Then
(AB)(B- l A-I) = A(BB-l )A- l = A1A-l = AA- l = 1, and similarly
(B- 1 A-I )(AB) = 1. Thus AB has the inverse B- 1 A-I. 0
Problem 1.15 Let A be an invertible matrix. Is it true that (Ak)T = (AT)k for any
integer k? Justify your answer.
1.7. ELEMENTARY MATRICES 27
Ax = A(Bb) = (AB)b = b.
In particular, if A is an invertible square matrix, then it has only one inverse
A-I by Lemma 1.6, and x = A-Ib is the only solution of the system. In
this section, we discuss how to compute A-I when A is invertible.
Recall that Gaussian elimination is a process in which the augmented
matrix is transformed into its row-echelon form by a finite number of ele-
mentary row operations. In the following, we will show that each elementary
row operation can be expressed as a nonsingular matrix, called an elementary
matrix, and hence the process of Gaussian elimination is simply multiplying
a finite sequence of corresponding elementary matrices to the augmented
matrix.
For example, the following matrices are three elementary matrices cor-
responding to each type of the three elementary row operations.
(2)
[ 0~1 ~ 0~1 1
°001 interchange the second and the fourth rows of I 4 ;
E = [-~0 0~ 1~].
Multiplying the elementary matrix E to b on the left produces the desired
result:
Similarly, the operation "interchanging the first and third rows" on the
matrix b can be achieved by multiplying a permutation matrix P, which is
an elementary matrix obtained from fa by interchanging two rows, to b on
the left:
Thus, for any m x n matrix A, E' EA = A, and E' E = I = EE'. That is,
every elementary matrix is invertible so that E-l = E', which is also an
elementary matrix.
For instance, if
1.7. ELEMENTARY MATRICES 29
(2) ~ (3) : Suppose that the homogeneous system Ax = 0 has only the
trivial solution x = 0:
o
= 0
= o.
This means that the augmented matrix [A OJ of the system Ax = 0 is reduced
to the system [In OJ by Gauss-Jordan elimination. Hence, A is row-equivalent
to In.
(3) ~ (4) : Assume A is row-equivalent to In, so that A can be reduced
to In by a finite sequence of elementary row operations. Thus, we can find
elementary matrices E 1, E 2, ... , Ek such that
A -- E- 1 2 1 ... E-
1 E-
1
k 1n-- E-
1 2 1 ... E-
1 E- k 1'
[~o
~ ~]
-c 1
[
~ ~ ~]
-b 0 1
[-! ~ ~].
0 0 1
1.7. ELEMENTARY MATRICES 31
Hence,
A-I = Ek··· E2EI = Ek··· E2EIIn.
It follows that the sequence of row operations that reduces an invertible ma-
trix A to In will resolve In to A-I. In other words, let [A I I] be the aug-
mented matrix with the columns of A on the left half, the columns of I
on the right half. A Gaussian elimination, applied to both sides, by some
elementary row operations reduces the augmented matrix [A I I] to [U I K],
where U is a row-echelon form of A. Next, the back substitution process by
another series of elementary row operations reduces [U I K] to [I I A-I]:
[A I I] -t [Ef···E1 A I Ef···EII] = [U I K]
-t [Fk ··· HU I Fk··· FIK] = [I I A-I],
where Ef ··· EI represents a Gaussian elimination and Fk ... FI represents
the back substitution. The following example illustrates the computation of
an inverse matrix.
n.
Example 1.10 Find the inverse of
A~[~ ~
We apply Gauss-Jordan elimination to
2 3 1
0 o 1 (-2)row 1 + row 2
[A I I]
U 3 5
0 2
1 0
0
0 o 1 (-l)row 1 + row 3
[~ ~ 1(-l)row 2
2 3 1 0
-1 -1 -2 1
n
-2 -1 -1 0
[~
2 3 1 0
1 1 2 -1
-2 -1 -1 (2)row 2+ row 3
0
32 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
[
1 2 3 I
o 1 1 I 12 -10 00 1.
o 0 1 I 3 -2 1
This is [U I K] obtained by Gaussian elimination. Now continue the back
substitution to reduce [U I K] to [1 I A-I]
1 (-1)row 3 + row 2
[~
2 3
1 0 0
[UIK] 1 1 2 -1 0
( -3)row 3 + row 1
0 1 3 -2 1
[~
2 0
1 0
-8
-1
6
1 -1-3] (-2)row 2 + row 1
0 1 3 -2 1
[~ 1=
0 0 -6 4
1 0 -1 1 -1
-1 [II A- I J.
0 1 3 -2 1
Thus, we get
:h:
Note that if A is not invertible, then, at some step in Gaussian elimina-
nc~ [l ~ 1(k~O)
1 0 0
-1
A~ [ -:
-6
0
4 11
q'B~ 1 2 0
1 2 4
1 2 4
k 0
1 k
0 1
1.B. LDU FACTORIZATION 33
Ax = [ ~ ~ ~ ~ 1[ ~~ 1 [ -~ 1
-2 2 1 1 X3 7
= b.
Note that U is the matrix obtained from A after forward elimination, and
A = LU with
which is a lower triangular matrix with I's on the diagonal. Now, the system
1
Lc=b: -2
7
1.B. LDU FACTORIZATION 35
1
Ux=c: -4
-4
resolves to
x=
[
~: 3~ 1 [ -~ 1+ [~ 1
- 1- t 1 t -1 '
t 0 1
for t E R It is suggested that the readers find the solutions for various values
ofb. 0
o 1 rld l
o o 1
u o
o dn o dn o
36 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
A= [
-1
~ -3~ ~]1 [~0 -~0-4-~].
This can be further factored as A = LDU by taking
[
~ _~ _~] = [~_~ ~] [~
o 0 -4 0 0 -4 0
1i2
0
1~12] = DU.
Suppose now that during forward elimination row interchanges are nec-
essary. In this case, we can first do all the row interchanges before doing any
other type of elementary row operations, since the interchange of rows can be
done at any time, before or after the other operations, with the same effect
on the solution. Those "row-interchanging" elementary matrices altogether
form a permutation matrix P so that no more row interchanges are needed
during Gaussian elimination of P A. So P A has an LDU factorization.
o
Of course, if we choose a different permutation pI, then the LDU fac-
torization of pI A may be different from that of P A, even if there is an-
other permutation matrix pI! that changes pI A to PA. However, if we fix
a permutation matrix P when it is necessary, the uniqueness of the LDU
factorization of A can be proved.
1.B. LDU FACTORIZATION 37
Proof: Suppose that A = L 1D 1U1 = L2D2U2, where the L's are lower
triangular, the U's are upper triangular, all with l's on the diagonal, and
the D's are diagonal matrices with no zeros on the diagonal. We need to
show Ll = L2, Dl = D2, and U1 = U2.
Note that the inverse of a lower (upper) triangular matrix is also a lower
(upper) triangular matrix. And the inverse of a diagonal matrix is also
diagonal. Therefore, by multiplying (LID1)-1 = D11 L11 on the left and
U:;1 on the right, our equation L 1D 1U1 = L2D2U2 becomes
The left side is an upper triangular matrix, while the right side is a lower
triangular matrix. Hence, both sides must be diagonal. However, since the
diagonal entries of the upper triangular matrix Ul U:;1 are all 1's, it must be
the identity matrix I (see Problem 1.24). Thus U1U:;1 = I, i.e., U1 = U2.
Similarly, L11 L2 = DID21 implies that Ll = L2 and Dl = D2. 0
f1T
What is the solution to Ax = b for b = [10 - l]T ?
Example 1.13 Determine the currents in the network given in the above
figure.
2 ohms 2 ohms
P
Q
1 ohms 1 ohms
h + h h at P,
h h + h at Q.
Observe that both equations are the same, and one of them is redundant.
By applying KVL to each of the loops in the network clockwise direction,
1.9. APPLICATION: LINEAR MODELS 39
we get
13 = o
o
18.
(1) (2)
30 ohms
5 volts
ABC D x Y Z Blank ?
1 1 1 1 1 1 1 1 1 1
o 1 2 3 23 24 25 26 27 28.
40 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
1 0 0
A= [ 2 1 0
1,
111
To decode it, first break the message into two vectors in ]R3 as before:
1.9. APPLICATION: LINEAR MODELS 41
We want to find two vectors Xl, x2 such that AXi is the i-th vector of the
above two vectors: z. e.,
Using our correspondence between letters and numbers, the message we have
received is "THANKS".
Problem 1.28 Encode "TAKE UFO" using the same matrix A used in the above
example.
0.2]
C2 = [ 0.4 .
0.2
For example, if h decides to produce 100 units per year, then it will order (or
demand) 20 units from h, 40 units from h, and 20 units from Is: i.e., the
consumption vector of h for the production X2 = 100 units can be written as
a column vector: 100c2 = [2040 20]T. From the concept of the consumption
vector, it is clear that the sum of decimal fractions in the column C2 must
be ~ 1.
In our example, suppose that the demands (inputs) of the outputs are
given by the following matrix, called an input-output matrix:
output
h h Is
1, [
A = input h 0.1
0.3 0.2
0.4
03]
0.1 .
13 0.3 0.2 0.3
i i i
CI C2 C3
1.9. APPLICATION: LINEAR MODELS 43
In this matrix, an industry looks down a column to see how much it needs
from where to produce its total output, and it looks across a row to see how
much of its output goes to where. For example, the second row says that,
out of the total output X2 units of the steel industry h, as the intermediate
demand, the automobile industry h demands 10% of the output Xl, the steel
industry 12 demands 40% of the output X2 and the oil industry 13 demands
10% of the output X3. Therefore, it is now easy to see that the intermediate
demand of the economy can be written as
x = Ax+d.
Therefore,
x = (1 ~ A)-ld = [ ~ ] ,
which gives the total amount of product Xi of the industry Ii for one year to
meet the required demand.
44 CHAPTER 1. LINEAR EQUATIONS AND MATRICES
Remark: (1) Under the usual circumstances, the sum of the entries in a
column of the consumption matrix A is less than one because a sector should
require less than one unit's worth of inputs to produce one unit of output.
This actually implies that 1 - A is invertible and the production vector x is
feasible in the sense that the entries in x are all nonnegative as the following
argument shows.
(2) In general, by using induction one can easily verify that for any
k = 1,2, ... ,
If the sums of column entries of A are all strictly less than one, then
limk_oo Ak = 0 (see Section 6.6 for the limit of a sequence of matrices).
Thus, we get (1 - A)(I + A + ... + Ak + ... ) = 1, that is,
Problem 1.29 Determine the total demand for industries 11 ,12 and Is for the input-
output m[atg~ Ao~~d ~~; jextra demand vector d given below:
Problem 1.30 Suppose that an economy is divided into three sectors: h = services,
12 = manufacturing industries, and Is = agriculture. For each unit of output, h
demands no services from h, 0.4 units from h, and 0.5 units from Is. For each unit
of output, h requires 0.1 units from sector II of services, 0.7 units from other parts
in sector h, and no product from sector Is. For each unit of output, Is demands
0.8 units of services II, 0.1 units of manufacturing products from 12 , and 0.1 units
of its own output from 13 • Determine the production level to balance the economy
when 90 units of services, 10 units of manufacturing, and 30 units of agriculture are
required as the extra demand.
1.10. EXERCISES 45
1.10 Exercises
1.l. Which of the following matrices are in row-echelon form or in reduced row-
n
echelon form?
[~ -3]
0 0 0
[~
0 0 0
0 1 0
A= 0 1 0 4 , B=
0 0 1
0 0 1 2
0 0 0
0 0 0
[~ -~ 1 D~[~
-n'
1 0 0
0 1 2
c= 0 0 1 o ' 0 1 1
n [~ -tJ
0 0 1
0 0 0 0
0 0 0
1 0 0
E~[~ 0 1
1 0 -2
0 F=
1
0
0
0
0
1
0 0 0
n
1.2. Find a row-echelon form of each matrix.
1 2 3 4 5
-3 2 1
2 3 4 5 1
(I) [ 1-9 10 2
-6
-6 8 1
4 2 (2)
3 4
4 5
5
1
1
2
2
3
5 1 2 3 4
1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2.
1.4. Solve the systems of equations by Gauss-Jordan elimination.
Xl + X2 + X3 X4 -2
2Xl X2 + X3 + X4 0
(I) { 3Xl + 2X2 X3 X4 1
Xl + X2 + 3X3 3X4 -8.
2x 3y 8
(2) { 4x 5y + z 15
2x + 4z 1.
U],
1.9. Compute ABC and CAB, for
A ~ [; -i : l' B ~ c~ [1 -1 l,
1.10. Prove that if A is a 3 x 3 matrix such that AB = BA for every 3 x 3 matrix
B, then A = kI3 for some constant k.
1.11. Let A = [~
001
i ~]. Find Ak for all integers k.
n' n'
1.12. Compute (2A - B)C and CC T for
A ~ [~ ~ B ~ -~ ~ [ c ~[ -2
~ 2~ ~].
1
1.13. Let f(x) = anx n +an_IX n - I + .. ·+alx+aO be a polynomial. For any square
matrix A, a matrix polynomial f(A) is defined as f(A) = anAn +an_lAn-l +
... + alA + aoI. For f(x) = 3x3 + x 2 - 2x + 3, find f(A) for
U n [~ -n
matrices.
3 3
(1) A = 5 (2) A = 2
n
3 0
[l
-1 0
1.15. Find AAT and AT A for the matrix A = 1 3
8 4
1.10. EXERCISES 47
n [~ n [~ -n
2 3
[~
1 2
2 3
A= 0 3
B= 2 C= 2
5 2
0 0
1.19. Suppose A is a 2 x 1 matrix and B is a 1 x 2 matrix. Prove that the product
AB is not invertible.
n
1.23. Find the LDLT factorization of the following symmetric matrices:
n ~n
o 1
Forward elimination is the same as Lc = b, and back-substitution is Ux = c.
1.28. Determine whether the following statements are true or false, in general, and
justify your answers.
(1) Let A and B be row-equivalent square matrices. Then A is invertible if
and only if B is invertible.
(2) Let A be a square matrix such that AA = A. Then A is the identity.
(3) If A and B are invertible matrices such that A2 = I and B2 = I, then
(AB)-l = BA.
(4) If A and B are invertible matrices, A + B is also invertible.
(5) If A, Band AB are symmetric, then AB = BA.
(6) If A and B are symmetric and the same size, then AB is also symmetric.
(7) Let ABT = I. Then A is invertible if and only if B is invertible.
(8) If a square matrix A is not invertible, then neither is AB for any B.
(9) If El and E2 are elementary matrices, then EIE2 = E 2 E 1 .
(10) The inverse of an invertible upper triangular matrix is upper triangular.
(11) Any invertible matrix A can be written as A = LU, where L is lower
triangular and U is upper triangular.
(12) If A is invertible and symmetric, then A-I is also symmetric.
Chapter 2
Determinants
49
50 CHAPTER 2. DETERMINANTS
By a direct computation, one can easily verify that the function det in
Definition 2.1 satisfies the following three fundamental properties:
[~ ~]
( 1 ) det 1.
Let us first derive some direct consequences of the three rules in the
definition (the readers are suggested to verify that det of 2 x 2 matrices also
satisfies the following properties):
Proof: (1) Any row can be placed in the first row with a change of sign in
the determinant by rule (R2), and then use rules (R3) and (R2).
(2) If A has a zero row, then the row is zero times the zero row. If A
has two identical rows, then interchanging these identical rows changes only
the sign of the determinant, but not A itself. Thus we get det A = - det A.
(3) By a direct computation using (1), we get
f =f +kf
!
1
A= [ b
b+c c+a
52 CHAPTER 2. DETERMINANTS
If we add the second row to the third, then the third row becomes
[a + b+ c a +b+c a + b + c],
which is a scalar multiple of the first row. Thus, det A = o. D
Problem 2.1 Show that, for an n x n matrix A and k E JR., det(kA) = k n det A.
Proof: (1) If A is a diagonal matrix, then it is clear that det A = all ... ann
by (1) of Theorem 2.1 and rule (Rd. Suppose that A is a lower triangular
matrix. Then a forward elimination, which does not change the determinant,
produces a zero row if A has a zero diagonal entry, or makes A row equivalent
to the diagonal matrix D whose diagonal entries are exactly those of A if
the diagonal entries are all nonzero. Thus, in the former case, det A = 0
and the product of the diagonal entries is also zero. In the latter case,
det A = det D = au·· . ann. Similar arguments apply when A is an upper
triangular matrix.
(2) Note again that a forward elimination reduces a square matrix A to
an upper triangular matrix, which has a zero row if A is singular and has no
zero row if A is nonsingular (see Theorem 1.8).
(3) If A is not invertible, then AB is not invertible, and so det(AB) =
o = det A det B. By the properties of the elementary matrices, it is clear
that for any elementary matrix E, det(EB) = det E det B. If A is invertible,
2.1. BASIC PROPERTIES OF DETERMINANT 53
Remark: From the equality det A = det AT, we could define the determi-
nant in terms of columns instead of rows in Definition 2.2, and Theorem 2.1
is also true with "columns" instead of "rows".
2 -4 o 0
[ 1 -3 o 1
A= 1 0 -1 2
3 -4 3 -1
54 CHAPTER 2. DETERMINANTS
o2 -4
-1
0
0
0
1
1
det A = det U = det r 0 0 -1 4
o 0 0 13
2· (_1)2 ·13 = 26. o
Problem 2.3 Prove that if A is invertible, then det A -1 = 1/ det A.
Problem 2.4 Evaluate the determinant of each of the following matrices:
f(A) = f [~ !] = f [: + +! ]
0 0
f[~ ~]+f[~!]
f[~ ~]+f[~ ~]+f[~ !]+f[~ ~]
ad+ 0+ 0 - be. o
2.2. EXISTENCE AND UNIQUENESS 55
det
[all
a21
a12
a22
a13]
a23
a31 a32 a33
q+det [ ~ 01 [ 0
Tl
0 0
= det
[ all~ a22
a12
0 a~3 + det a~l 0
0 a33 a31 0 a32
oo 0 1 [0
~ 1+ det [ ~ Tl
0
+det
[ all~ a23 + det a21
a120 a22
a32 0 0 0 a33 a31 0
all a22 a 33 + a12 a 23 a 31 + a13 a 21 a32 - all a23 a 32 - a12 a 21 a33 - a13 a 22 a 31·
Problem 2.5 Show that the above formula of the determinant for 3 x 3 matrices
satisfies the three rules in Definition 2.2.
To get the formula of the determinant for matrices of order n > 3, the
same computational process can be repeated using the three rules again, but
the computation is going to be more complicated as the order gets higher.
To derive the explicit formula for det A of order n > 3, we examine the above
case in detail. In the process of deriving the explicit formula for det A of a
3 x 3 matrix A, we can observe the following three steps:
(1st) By using the linearity of the determinant function in each row,
det A of a 3 x 3 matrix A is expanded as the sum of the determinants of
3 3 = 27 matrices. Except for exactly six matrices, all of them have zero
columns so that their determinants are zero (see the proof of Lemma 2.3).
(2nd) In each of these remaining six matrices, all entries are zero except
for exactly three entries that came from the given matrix A. Indeed, no two
of the three entries came from the same column or from the same row of A.
56 CHAPTER 2. DETERMINANTS
In other words, in each row there is only one entry that came from A and
at the same time in each column there is only one entry that came from A.
Actually, in each of the six matrices, the three entries from A, say aij,
ak£, and a pq , are chosen as follows: If the first entry aij is chosen from the
first row and the third column of A, say a13, then the other entries ak£ and
a pq in the product should be chosen from the second or the third row and
the first or the second column. Thus, if the second entry akl is taken from
the second row, the column it belongs to must be either the first or the
second, i.e., either a21 or a22. If a21 is taken, then the third entry a pq must
be, without option, a32. Thus, the entries from A in the chosen matrix are
a13, a21, and a32. Therefore, the three entries in each of the six remaining
matrices are determined as follows: when the row indices (the first indices i
of aij) are arranged in the order 1, 2, 3, the assignment of the column indices
1, 2, 3 (the second indices j of aij) to each of the row indices is simply a
re-arrangement of 1, 2, 3 without repetitions or omissions. In this way, one
can recognize that the number 6 = 3! is simply the number of ways in which
the three column indices 1, 2, 3 are rearranged.
(3rd) The determinant of each of the six matrices may be computed by
converting it into a diagonal matrix using suitable "column interchanges"
(see Theorem 2.2 (1)), so its determinant becomes ±aijaklapq, where the
sign depends on the number of column interchanges.
For example, for the matrix having entries a13, a22, and a31 from A,
one can convert this matrix into a diagonal matrix in a couple of ways:
for instance, one can take just one interchanging of the first and the third
columns or take three interchanges: the first and the second, and then the
second and the third, and then the first and the second. In any case,
order 1, 2, 3, which represents the diagonal matrix, we can take either just
one interchanging of 3 and 1, or three interchanges: 3 and 2, 3 and 1, and
then 2 and 1. In either case, the parity is odd so that the "-" sign in the
computation of determinant came from (_1)1 = (_1)3, where the exponents
mean the numbers of interchanges of the column indices.
In summary, in the expansion of det A for A E M3X3(~), the number 6 =
3! of the determinants which contribute to the computation of det A is simply
the number of ways in which the three numbers 1,2,3 are rearranged without
repetitions or omissions. Moreover, the sign of each of the six determinants is
determined by the parity (even or odd) of the number of column interchanges
required to arrive at the order of 1, 2, 3 from the given arrangement of the
column indices.
These observations can be used to derive the explicit formula of the deter-
minant for matrices of order n > 3. We begin with the following definition.
1 2
a = (a(l), a(2), ... , a(n)) = ( a(l) a(2)
Here, the first row is the usual lay-out of N n as the domain set, and the
second row is just an arrangement in a certain order without repetitions or
omissions of the numbers in N n as the image set. A permutation that inter-
changes only two numbers in N n , leaving the rest of the numbers fixed, such
as a = (3,2,1, ... ,n), is called a transposition. Note that the composition
of any two permutations is also a permutation. Moreover, the composition
of a transposition to a permutation a produces an interchanging of two num-
bers in the permutation a. In particular, the composition of a transposition
with itself always produces the identity permutation.
It is not hard to see that if Sn denotes the set of all permutations of N n ,
then Sn has exactly n! permutations.
Once we have listed all the permutations, the next step is to determine
the sign of each permutation. A permutation a = (j1, j2, ... , jn) is said
to have an inversion if js > jt for s < t (i.e., a larger number precedes
a smaller number). For example, the permutation a = (3,1,2) has two
inversions since 3 precedes 1 and 2.
58 CHAPTER 2. DETERMINANTS
( )_ { 1 if (1 is an even permutation
sgn (1 - -
l ·1f(1·IS an 0 dd permutatIOn.
.
It is not hard to see that the number of even permutations is equal to that
of odd permutations, so it is ~. In the case n = 3, one can notice that there
are 3 terms with + sign and 3 terms with - sign.
Problem 2.6 Show that the number of even permutations and the number of odd
permutations in Sn are equal.
Now, we repeat the three steps to get an explicit formula for det A of
a square matrix A = [aij] of order n. First, the determinant det A can
be expressed as the sum of determinants of n! matrices, each of which has
zero entries except the n entries alO"(l), a20"(2) , ... , ana(n) taken from A,
where (1 is a permutation of the set {I, 2, ... , n} of column indices. The n
entries alO"(l), a20"(2) , ... , ana(n) are chosen from A in such a way that no
2.2. EXISTENCE AND UNIQUENESS 59
two of them come from the same row or the same column. Such a matrix
can be converted to a diagonal matrix. Hence, its determinant is equal to
±al u(1)a2u(2) ... anu(n) , where the sign ± is determined by the parity of the
number of column interchanges to convert the matrix to a diagonal matrix,
which is equal to that of inversions in a: sgn(a). Therefore, the determinant
of the matrix whose entries are all zero except for aiu(i) 's is equal to
It is not difficult to see that this explicit formula for det A satisfies the
three rules in the definition of the determinant. Therefore, we have both
existence and uniqueness for the determinant function of square matrices of
any order n ~ 1.
In general, for a fixed j, 1 :::; j :::; n, there are (n - I)! permutations a's in
8 n such that a(l) = j. Each a of those permutations has j - 1 inversions (j
60 CHAPTER 2. DETERMINANTS
The first factor ala(l) in each term can be anyone of au, a12, ... , al n in the
first row of A. Among the n! terms in this sum, there are precisely (n - 1)!
permutations such that ala(l) = an, i.e., 0"(1) = 1. The sum of those terms
such that 0"(1) = 1 can be written as auAu, where
summing over all permutations T of the numbers {2, 3, ... ,n}. The term
(-1)° means that there is no extra inversion other than that of T if 0"( 1) = 1
is at the first place. Note that each term in An contains no entries from the
2.3. COFACTOR EXPANSION 61
first row or from the first column of A. Hence, all the terms of the sum in Au
are the signed elementary products of the submatrix Mu of A obtained by
deleting the first row and the first column of A. Thus Au = (_1)0 det Mu.
Similarly, if alu(l) is chosen to be alj with 1 ~ j ~ n, then all (n - 1)!
terms such that 0"(1) = j in the expression of det A add up to aljAlj with
The submatrix Mij is called the minor of the entry aij and the number
Aij = (-1) Hj det Mij is called the cofactor of the entry aij. Therefore,
the determinant of an n x n matrix A can be computed by multiplying the
entries in anyone row by their co factors and adding the resulting products.
As a matter of fact, the determinant could be defined inductively by these
explicit formulas.
matrix A may be simplified into another one having more zero entries in a
row or in a column. This kind of simplification can be done by the elementary
row (or column) operations, and generally gives the most efficient way to
evaluate the determinant of a matrix. The next examples illustrate this
method for an evaluation of the determinant.
A= [-~ -! ~ =~ 1
2 -5 -3 8 .
-2 6 -4 1
Solution: Apply the elementary operations:
3 x row 1 + row 2,
(-2) x row 1 + row 3,
2 x row 1 + row 4
to A. Then
detA
o1 -11 27 -4
-1 1 [ 1 7 -4]10 .
[
det 0 -3 -7 10 = det -3 -7
o 4 0 -1 4 o -1
Now apply the operation: row 1 + row 2, to the matrix on the right side,
then
det [ 1 7 -4]
-3 -7 10
4 0-1
det [-~4 ~0 -:]
-1
= (-I)1+2·[Link] [-~ -~ 1
= -7(2 - 24) = 154.
Thus det A = 154. o
Problem 2.9 Use cofactor expansions along a row or a column to evaluate the de-
terminants of the following matrices:
(1) A = [0111]
2 0 1 1
2 2 0 1 '
1o 1]2
1 3 .
2 2 2 0 5 0
64 CHAPTER 2. DETERMINANTS
Example 2.6 Show that det A = (x-y)(x - z)(x - w)(y - z)(y- w)(z - w)
for
detA det [
~ y ~ X y2 ~ x 22 y3 ~ X33 ]
o z- X z2 - x z3 - x
o W - X w2 - x 2 w3 - x 3
A~
Xl 1
x2 . n-' ]
Xl
n-l
j
X2 2 " X2
[
Xn x2 x~-l
n
Show that
detA= II
l~i<j~n
(Xj - Xi).
2.4. CRAMER '8 RULE 65
Lemma 2.7
if i = j
if i f. j.
0 0 detA
A-I - 1
- ad - bc
[d
-c
-ba 1.
where C j is the matrix obtained from A by replacing the j-th column with
the column matrix b = [b 1 b2 ... bn]T.
x = A- 1 b = de~ A (adjA)b,
it follows that
o
2.4. CRAMER'S RULE 67
il
Solution:
[5060 22
i1
2
A=
[~ 2
2
Cl
90 2
[~ n [~
50 2
C2 = 60 C3 2 60 50 .
Therefore,
90 2 90 1
detCl detC2 _ detC3 _ 20
Xl = detA = 10, X2 = detA = 10, X3 - detA - . 0
Cramer's rule provides a convenient method for writing down the solution
of a system of n linear equations in n unknowns in terms of determinants.
To find the solution, however, one must evaluate n + 1 determinants of
order n. Evaluating even two of these determinants generally involves more
computations than solving the system by using Gauss-Jordan elimination.
Problem 2.14 Use Cramer's rule to solve the systems
(1)
{ 3Xl
-2Xl
2
+
+
3
4X2
4X2
5X2
+
+
5
3X3
5X3
2X3
-2
6
l.
x y + z
3
4 7 2
(2)
x + y + z
0
2 1
2.
Y z
Problem 2.15 Let A be the matrix obtained from the identity matrix In with i-th
column replaced by the column vector x = [Xl ..• XnV, Compute det A.
Problem 2.16 Prove that if Aij is the cofactor of aij in A = [aij], and if n > 1, then
o Xl X2 Xn
Xl all a12 al n n n
a2n =- LLAijXiXj.
i=l j=l
68 CHAPTER 2. DETERMINANTS
The set
P(A) = {ttiri : O:S ti:S 1, i = 1, ...
t=l
,n}
is called a parallelogram if n = 2, or a parallelepiped if n :::: 3. Note
that the row vectors of A form the edges of P(A), and a different order of
the row vectors does not alter the shape of P(A).
A geometrical meaning of the determinant is that it represents the volume
(or area for n = 2 ) of the parallelepiped P(A).
Proof: We present here a geometrical sketch since this way seems more
intuitive and more convincing. We give only the proof of the case n = 2,
and leave the case n = 3 to the readers. Let
where rl, r2 are the row vectors of A. Let Area(A) = Area(rl, r2) denote
the area of the parallelogram P(rl, r2) (see the figure below).
--~~-------------------- x
2.5. APPLICATION: AREA AND VOLUME 69
(2) Since the shape of the parallelogram P(A) does not depend on the
order of placing the row vectors: i.e., P(rl, r2) = P(r2, rl), we have
Area(rl,r2) = Area(r2 , rl) . On the other hand, det(rl , r2) = -det(r2,rl) .
Thus
--~----~----~~---+------~ x
(5) Thus the area function Area on M 2x2 (JR) satisfies the rules (R 1 ) and
(R3) of the determinant except for the rule (R2). Therefore, by uniqueness,
det = ±Area. 0
--~~------------------ x
z
vol(P(A)) = Vdet(AT A) .
2.6. EXERCISES 71
as expected.
Problem 2.17 Show that the area of a triangle ABC in the plane ]R2, where A =
(Xl, Yl), B = (X2' Y2), C = (X3, Y3), is equal to the absolute value of
1"2 det
[Xl
X2
Yl
Y2
1]
1 .
X3 Y3 1
2.6 Exercises
2.1. Determine the values of k for which det [ : 2~] = O.
2.2. Evaluate det(A 2BA-l) and det(B- 1 A 3) for the following matrices:
A= [-~ -~
o 1 0
i], B = [! -~ ;].
2 1 3
2.3. Evaluate the determinant of
A= [=i -~ -~ =i].
2 -3 -5 8
2.4. Evaluate det A for an n x n matrix A = [aij] when
(1) aij= { 0I ii=j,
=1= j
or
(2) ..
aij=z+J.
-A)isapolynomialinxofclx + Co.
2.5. Find all solutions of the equation det(AB) = 0 for
(2) the volume of the parallelepiped with edges determined by the vectors
(1, 0, 4), (0, -2, 2) and (3, 1, -1).
2.8. Use Cramer's rule to solve each system.
{"' + +
X2 X3 2
(1) { Xl + X2 = 3
(2) Xl + 2X2 + X3 = 2
Xl X2 -1.
Xl + 3X2 X3 -4.
X2 + X4 = -1
(3) { x, + X3 3
Xl X2 X3 X4 2
Xl + X2 + X3 + X4 o.
2.9. Use Cramer's rule to solve the given system:
! [~ 1 [ ~.1
2
(1) [ ;]x=[;] (2) 3 -1: x=
-1
1
2.10. Find a constant k so that the system of linear equations
kx - 2y - z 0
{ (k + l)y + 4z = 0
(k -l)z = o.
has more than one solution. (Is it possible to apply Cramer's rule here?)
2.11. Solve the following system of linear equations by using Cramer's rule and by
using Gaussian elimination:
[ ~;~~] =[;]
1121 x
1 1 124
3·
(1) A = !],
[~2 1i 1 (2) A = [~3 162~], (3) A = [-i -; ~].
321
2.14. Let A be the n x n matrix whose entries are alII. Show that
(1) det(A - nIn) = o.
(2) (A - nIn)ij = (_1)n- I nn-2 for all i, j, where (A - nIn)ij denotes the
cofactor of the (i, j)-entry of A - nIno
2.15. Show that if A is symmetric, then so is adjA. Moreover, if it is invertible,
then the inverse of A is also symmetric.
2.6. EXERCISES 73
2.16. Use the adjoint formula to compute the inverse of the each of the following
matrices:
2.21. Find the area of the triangle with vertices at (0,0,0), (1,1,2) and (2,2,1) in
]R3.
2.23. Determine whether or not the following statements are true in general, and
justify your answers.
(1) For any square matrices A and B of the same size, det(A+B) = detA+
detB.
(2) For any square matrices A and B of the same size, det(AB) = det(BA).
(3) If A is an n x n square matrix, then for any scalar e, det(e1n - A) =
en - detA.
(4) If A is an n x n square matrix, then for any scalar e, det(cIn - AT) =
det(cIn - A).
(5) If E is an elementary matrix, then det E = ± 1.
(6) There is no matrix A of order 3 such that A2 = -13.
(7) Let A be a nilpotent matrix, i.e., Ak = 0 for some natural number k.
Then det A = O.
(8) det(kA) = kdetA for any square matrix A.
(9) Any system Ax = b has a solution if and only if det A f. O.
(10) For any n x 1, n ~ 2, column vectors u and v, det(uv T ) = O.
(11) If A is a square matrix with det A = 1, then adj(adjA) = A.
74 CHAPTER 2. DETERMINANTS
(12) If the entries of A are all integers and detA = 1 or -1, then the entries
of A -1 are also integers.
(13) If the entries of A are O's or l's, then detA = 1, 0, or-l.
(14) Every system of n linear equations in n unknowns can be solved by
Cramer's rule.
(15) If A is a permutation matrix, then AT = A.
Chapter 3
Vector Spaces
75
76 CHAPTER 3. VECTOR SPACES
and scalar multiplication of a vector by a scalar. That is, for two vectors
x = (Xl, X2, X3), Y = (YI, Y2, Y3) in lR 3 and k a scalar, we define
x+y
kv
~~---------------- X2 ~---------------- X2
For any two vectors x = (Xl, X2, ... , xn) and y = (YI, Y2, ... , Yn) in the
n-space lRn, and a scalar k, the sum x + y and the scalar multiplication kx
of them are vectors in lRn defined by
Theorem 3.1 For any scalars k and C, and vectors x = (Xl, X2, ... , x n ),
y = (Yl, Y2, ... , Yn), and z = (Zl, Z2, ... , zn) in the n-space lR n , the
following rules hold:
(1) x+y=y+x,
(2) x + (y + z) = (x + y) + z,
(3) x + 0 = x = 0 + x,
3.1. VECTOR SPACES AND SUBSPACES 77
(4) x + (-I)x = 0,
(5) k(x+y) = kx+ky,
(6) (k + f)x = kx + fx,
(7) k(fx) = (kf)x,
(8) Ix = x,
where 0 = (0, 0, ... , 0) is the zero vector.
I
We usually write a vector (aI, a2, ... , an) in the n-space]Rn as an n x I
column matrix
a2
al T
: = [al a2 ... an] ,
an
also called a column vector. Then the two operations of the matrix sum and
the scalar multiplication of column matrices coincide with those of vectors
in ]Rn, and the above theorem is just Theorem 1.2.
These rules of arithmetic of vectors are the most important ones because
they are the only rules that we need to manipulate vectors in the n-space
]Rn. Hence, an (abstract) vector space can be defined with respect to these
rules of operations of vectors in the n-space IRn so that ]Rn itself becomes
a vector space. In general, a vector space is defined to be a set with two
operations: an addition and a scalar multiplication which satisfy the above
rules of operations in ]Rn.
Example 3.1 (1) For any positive integer m and n, the set Mmxn(R) of
all m x n matrices forms a vector space under the matrix sum and scalar
multiplication defined in Section 1.3. The zero vector in this space is the
zero matrix Omxn, and -A is the negative of a matrix A.
(2) Let C(R) denote the set of real-valued continuous functions defined
on the real line R For two functions f(x) and g(x), and a real number k,
the sum f + g and the scalar multiplication kf of them are defined by
Then one can easily verify, as an exercise, that the set C(R) is a vector
space under these operations. The zero vector in this space is the constant
function whose value at each point is zero.
(3) Let A be an m x n matrix. Then it is easy to show that the set of
solutions of the homogeneous system Ax = 0 is a vector space (under the
sum and scalar multiplication of matrices).
Problem 3.1 Let V be the set of all pairs (x, y) of real numbers. Suppose that an
addition and scalar multiplication of pairs are defined by
(x, y) + (u, v) = (x + 2u, y + 2v), k(x, y) = (kx, ky).
Is the set V a vector space under those operations? Justify your answer.
Proof: We need only to prove the sufficiency. Assume both conditions hold
and let x be any vector in W. Since W is closed under scalar multiplication,
0= Ox and -x = (-1)x are in W, so rules (3) and (4) for a vector space
hold. All the other rules for a vector space are clear. 0
A vector space V itself and the zero vector {O} are trivially subspaces.
Some nontrivial subspaces are given in the following examples.
Example 3.4 For a nonnegative integer n, let Pn(lR) denote the set of all
real polynomials in X with degree::; n. Then Pn(lR) is a subspace of the
vector space C(IR) of all continuous functions on R
Example 3.5 Let W be the set of all n x n real symmetric matrices. Then
W is a subspace of the vector space Mnxn(lR) of all n x n matrices, because
the sum of two symmetric matrices is symmetric and a scalar multiplication
of a symmetric matrix is also symmetric. Similarly, the set of all n x n
skew-symmetric matrices is also a subspace of Mnxn(IR).
Problem 3.2 Which of the following sets are subspaces of the 3-space 1R3 ? Justify
your answer.
(1) W = {(x, y, z) E 1R3 : xyz = O},
(2) W = {(2t, 3t, 4t) E 1R3 : t E 1R},
(3) W = {(x, y, z) E 1R3 : x 2 + y2 - z2 = O},
(4) W = {x E 1R3 : x T u = 0 = x T v}, where u and v are any two fixed nonzero
vectors in 1R3 .
Can you describe all subspaces of the 3-space 1R3 ?
3.2. BASES 81
Problem 3.3 Let V = C(JR.) be the vector space of all continuous functions on R
Which of the following sets Ware subspaces of V? Justify your answer.
(1) W is the set of all differentiable functions on R
(2) W is the set of all bounded continuous functions on R
(3) W is the set of all continuous nonnegative-valued functions on JR., i.e., f(x) 2
o for any x E R
(4) W is the set of all continuous odd functions on JR., i. e., f (- x) = - f (x) for
any x E R
(5) W is the set of all polynomials with integer coefficients.
3.2 Bases
Recall that any vector in the 3-space JR.3 is of the form (Xl, X2, X3) which
can alsobe written as
That is, any vector in ]R3 can be expressed as the sum of scalar multiples of
el = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1), which are also denoted
by i, j and k, respectively. The following definition gives a name to such
expressions.
Definition 3.2 Let V be a vector space, and let {Xl, X2, ... , xm} be a set
of vectors in V. Then a vector y in V of the form
where aI, ... , am are scalars, is called a linear combination of the vectors
Xl, X2, ... , Xm·
The next theorem shows that the set of all linear combinations of a finite
set of vectors in a vector space forms a subspace.
Theorem 3.4 Let Xl, X2, ... , Xm be vectors in a vector space V. Then
the set W = {alxl + a2x2 + ... + amXm : ai E JR.} of all linear combinations
of Xl, X2, ... , Xm is a subspace of V called the subspace of V spanned
by Xl, X2, ... , Xm ·
82 CHAPTER 3. VECTOR SPACES
Proof: We want to show that W is closed under addition and scalar mul-
tiplication. Let u and w be any two vectors in W. Then
u
w
Thus, u + wand ku are linear combinations of Xl, X2, ... , Xm and conse-
quently contained in W. Therefore, W is a subspace of V. 0
Suppose that {Xl, X2, ... , xm} is any set of m vectors in a vector space
V. If any vector in V can be written as a linear combination of these vector
Xi'S, we say that it is a spanning set of V.
Example 3.6 (1) For a nonzero vector v in a vector space V, linear com-
binations of v are simply scalar multiples of v. Thus the subspace W of V
spanned by v is W = {kv : k E Il~.}.
(2) Consider three vectors VI = (1,1,1), V2 = (1, -1, 1) and V3 =
(1,0,1) in ~a. The subspace WI spanned by VI and V2 is written as
en (0,0,0, ... , 1)
Example 3.8 Let A = [al a2 ... an] be an m x n matrix. Then the column
vectors ai's are in IRm , and the matrix product Ax represents the linear
combination of the column vector ai's whose coefficients are the components
of x E IR n , i.e., Ax = xlal + X2a2 + ... + xnan . Therefore, the set
Problem 3.4 Let Xl, X2, ... , Xm be vectors in a vector space V and let W be the
subspace spanned by Xl, X2, ... , X m . Show that W is the smallest subspace of V
containing Xl, X2, ... , X m , i.e., if U is a subspace of V containing Xl, X2, ... , X m ,
then W S;;; U.
Problem 3.5 Show that the set of all matrices of the form AB - BA does not span
the vector space JI;[nxn(IR).
84 CHAPTER 3. VECTOR SPACES
Definition 3.3 A set of vectors {Xl, X2, ... , xm} in a vector space V is
said to be linearly independent if the vector equation, called the linear
dependence of Xi'S,
Therefore, a set of vectors {Xl, X2, ... , xm} is linearly dependent if and
only if there is a linear dependence
with a nontrivial solution (CI' C2, ... , cm). In this case, we may assume
that Cm i=- o. Then the equation can be rewritten as
CI C2 Cm-l
Xm = - - X l - -X2 - ... - --Xm-l·
Cm Cm Cm
That is, a set of vectors is linearly dependent if and only if at least one of
the vectors in the set can be written as a linear combination of the others.
general, if x, yare noncollinear vectors in the 3-space IR 3 , the set of all linear
combinations of x and y determines a plane W through the origin in IR3 , i. e.,
W = {ax + by : a, b E IR}. Let z be another nonzero vector in the 3-space
IR3. If z E W, then there are some numbers a, b E IR, not all of them are
zero, such that z = ax + by, that is, the set {x, y, z} is linearly dependent.
If z ¢ W, then ax + by + cz = 0 is possible only when a = b = c = (prove °
it). Therefore, the set {x, y, z} is linearly independent if and only if z does
not lie in W. 0
[! -; -! ~ 1
Example 3.10 The columns of the matrix
A=
2 -1 1 3
are linearly dependent in the 3-space IR , since the third column is the sum
3
of the first and the second.
As this example shows, the concept of linear dependence can be applied
to the row or column vectors of any matrix.
n
Example 3.11 Consider an upper triangular matrix
A~ [~ ~
The linear dependence of the column vectors of A may be written as
From the third row, C3 = 0, from the second row C2 = 0, and substitution of
them into the first row forces CI = 0, i.e., the homogeneous system has only
the trivial solution, so that the column vectors are linearly independent. 0
86 CHAPTER 3. VECTOR SPACES
Theorem 3.5 The nonzero rows of a matrix of a row-echelon form are lin-
early independent, and so are the columns that contain leading 1 'so
Lemma 3.6 (1) Any set of n vectors in the m-space ]Rm is linearly de-
pendent if n > m.
(2) If U is the reduced row-echelon form of A, then the columns of U
are linearly independent if and only if the columns of A are linearly
independent.
Example 3.13 In general, it is quite clear that the vectors el, e2""l en in
]Rn are linearly independent. Moreover, they span the n-space ]Rn: In fact,
when we write a vector x E ]Rn as (Xl, ... , x n ), it means just the linear
combination of the vector ei's:
x = (Xl, ... , xn) = xlel + '" + xnen ·
However, if anyone of the ei's is missed, then they cannot span ]Rn. Thus,
this kind of vector plays a special role in the vector space.
3.2. BASES 87
For example, as we saw in Example 3.13, the set {el' e2, ... , en} forms
a basis, called the standard basis for the n-space ]Rn. Of course, there are
many other bases for ]Rn.
Example 3.14 (1) The set of vectors (1,1,0), (0, -1, 1), and (1,0,1) is not
a basis for the 3-space ]R3, since this set is linearly dependent (the third is
the sum of the first two vectors), and cannot span ]R3. (The vector (1,0,0)
cannot be obtained as a linear combination of them (prove it).) This set
does not have enough vectors spanning ]R3.
(2) The set of vectors (1,0,0), (0,1,1), (1,0,1) and (0,1,0) is not a basis
either, since they are not linearly independent (the sum of the first two minus
the third makes the fourth) even though they span ]R3. This set of vectors
has some redundant vectors spanning ]R3.
(3) The set of vectors (1,1,1), (0,1,1), and (0,0,1) is linearly inde-
pendent and also spans ]R3. That is, it is a basis for ]R3, different from the
standard basis. This set has the proper number of vectors spanning ]R3, since
the set cannot be reduced to a smaller set nor does it need any additional
vector spanning ]R3. 0
Theorem 3.7 Let Q = {VI, V2, ... , V n} be a basis for a vector space V.
Then each vector x in V can be uniquely expressed as a linear combination
of VI, V2, ... , v n , i.e., there are unique scalars ai's, i = 1, ... , n, such
that
In this case, the column vector [al a2 ... an]T is called the coordinate
vector of x with respect to the basis Q, and it is denoted [x]a.
Proof: If x can be also expressed as x = b l VI + b2V2 + ... + bn V n , then
we have 0 = (al - bl)VI + (a2 - b 2 )V2 + ... + (an - bn)vn . By the linear
independence of xi's, ai = bi for all i = 1, ... , n. 0
88 CHAPTER 3. VECTOR SPACES
Example 3.15 Let a = {el' e2, e3} be the standard basis for ]R3, and let
(3 = {VI, V2, V3} with VI = (1,1,1) = el + e2 + e3, V2 = (0,1,1) = e2 + e3,
V3 = (0,0,1) = e3. Then
Problem 3.6 Show that the vectors VI = (1, 2, 1), V2 = (2, 9, 0) and V3 = (3, 3, 4)
in the 3-space ]R3 form a basis.
Problem 3.7 Show that the set {I, x, x 2 , ... , xn} is a basis for Pn (]R), the vector
space of all polynomials of degree::; n with real coefficients.
Problem 3.8 In the n-space ]Rn, determine whether or not the set
is linearly dependent.
Problem 3.9 Let Xk denote the vector in ]Rn whose first k - 1 coordinates are zero
and whose last n - k + 1 coordinates are 1. Show that the set {Xl, X2, ... , xn} is
a basis for ]Rn.
3.3 Dimensions
We often say that the line ]RI is one-dimensional, the plane ]R2 is two-
dimensional and the space ]R3 is three-dimensional, etc. This is mostly due
to the fact that the freedom in choosing coordinates for each element in the
space is 1, 2 or 3, respectively. This means that the concept of dimension
is closely related to the concept of bases. Note that for a vector space in
general there is no unique way to choose a basis. However, there is some-
thing common to all bases, and this is related to the notion of dimension.
We first need the following important lemma from which one can define the
dimension of a vector space.
Lemma 3.8 Let V be a vector space and let a = {Xl, X2, ... , xm} be a
set of m-vectors in V.
3.3. DIMENSIONS 89
(1) If a spans V, then every set of vectors with more than m vectors cannot
be linearly independent.
(2) If a is linearly independent, then any set of vectors with fewer than m
vectors cannot span V.
Proof: Since (2) follows from (1) directly, we prove only (1). Let (3 =
{YI, Y2, ... , Yn} be a set of n-vectors in V with n > m. We will show that
(3 is linearly dependent. Indeed, since each vector Yj is a linear combination
of the vectors in the spanning set a, i.e., for j = 1, ... , n,
m
au al2 al n CI 0
a21 a22 a2n C2 0
amI am2 a mn Cn 0
It is clear by Lemma 3.8 that if a set 0: = {Xl, X2, ... , xn} of n vectors
is a basis for a vector space V, then no other set f3 = {YI, Y2, ... , Yr} of r
vectors can be a basis for V if r t= n. This means that all bases for a vector
space V have the same number of vectors, even if there are many different
bases for a vector space. Therefore, we obtain the following important result:
Proof: We prove (1) only and leave (2) as an exercise. Let 0: = {Xl, ... ,
xd be a linearly independent set in V. If 0: spans V, then 0: is a basis. If 0:
does not span V, then there exists a vector, say Xk+l, in V that is not con-
tained in the subspace spanned by the vectors in 0:. Now {Xl, ... , Xk, Xk+1}
is linearly independent (check why). If {Xl, ... , Xk, xk+d spans V, then
3.3. DIMENSIONS 91
this is a basis for V. If it does not span V, then the same procedure can be
repeated, yielding a linearly independent set that spans V, i. e., a basis for
V. This procedure must stop in a finite step because of Lemma 3.8 for a
finite dimensional vector space V. 0
Theorem 3.10 shows that a basis for a vector space V is a set of vectors
in V which is maximally independent and minimally spanning in the above
sense. In particular, if W is a subspace of V, then any basis for W is
linearly independent also in V, and can be extended to a basis for V. Thus
dim W::::: dim V.
Proof: Again we prove (1) only. If a spanning set of n vectors were not
linearly independent, then the set would be reduced to a basis that has a
smaller number of vectors than n vectors. 0
A~[~
-2 5
1 1 -3]
4 .
o 1 o
Reduce A to a row-echelon form:
92 CHAPTER 3. VECTOR SPACES
The three nonzero row vectors of U are clearly linearly independent, and
they also span W because the vectors Xl, x2 and X3 can be expressed as
a linear combination of these three nonzero row vectors of U. Hence, U
provides a basis for W. (Note that this implies dim W = 3 and hence Xl,
X2, X3 is also a basis for W by Corollary 3.11. The linear independence of
Xi'S is a by-product of this fact).
To extend this basis, just add any nonzero vector of the form X4
(0, 0, 0, t) to the rows of U to get a basis for the space lR4. 0
Problem 3.11 Find a basis and the dimension of each of the following subspaces of
Mnxn(lR) of all n x n matrices:
(1) the space of all n x n diagonal matrices whose traces are zero;
(2) the space of all n x n symmetric matrices;
(3) the space of all n x n skew-symmetric matrices.
U + W = {u + w : u E U, W E W}.
Definition 3.6 A vector space V is called the direct sum of two subspaces
U and W, written V = U EEl W, if V = U + Wand Un W = {O}.
Proof: Suppose that V = U EEl W. Then, for any v E V, there exist vectors
U E U and w E W such that v = U + w, since V = U + W. To show the
uniqueness, suppose that v is also expressed as a sum u' + w' for u' E U
and w' E W. Then U + w = u' + w' implies
U - u' = w' - w E U n W = {O}.
Hence, U = u' and w = w'.
Conversely, if there exists a nonzero vector v in U n W, then v can be
written as sum of vectors in U and W in many different ways:
1 1 1 2
v =v+ 0=0+ v = 2v + 2v = 3v + 3v E U + W. o
Example 3.18 Consider the three vectors el, e2 and e3 in JR.3. Let U =
{aIel +b3e3 : aI, b3 E JR.} be the subspace spanned by el and e3 (xz-plane),
and let W = {a2e2 + C2e3 : a2, C3 E JR.} be the subspace of JR.3 spanned by
e2 and e3 (yz-plane). Then a vector in U + W is of the form
where a3 = b3+C3 and aI, a2, a3 are arbitrary numbers. Thus U + W = JR.3.
However, JR.3 i- U EEl W since clearly e3 E Un W i- {O}. In fact, the vector
e3 E JR.3 can be written as many linear combinations of vectors in U and W:
1 1 1 2
e3 = 2e3 + 2e3 = 3e3 + 3e3 E U + W.
Proof: Choose a basis {UI' ... , ud for U, and extend it to a basis {Ul' ... ,
Uk, uk+l, ... , un} for V. Then the subspace W spanned by {Vk+l' ... , v n }
satisfies the requirement. 0
94 CHAPTER 3. VECTOR SPACES
Problem 3.13 Let U and W be the su bspaces of the vector space M n x n OR) consist-
ing of all symmetric matrices and all skew-symmetric matrices, respectively. Show
that Mnxn(lR) = U EEl W. Therefore, the decomposition of a square matrix A given
in (3) of Problem 1.10 is unique.
Problem 3.14 Let {VI, v2, ... , v n } be a basis for a vector space V and let Wi =
{rvi : r E JR} be the subspace of V spanned by Vi. Show that V = WI EEl W2 EEl
···EEl Wn ·
aml am2 a mn rm
Cl C2 cn ] ,
where the ri's are the row vectors of A that are in JR n , and the Cj'S are the
column vectors of A that are in JR m .
Definition 3.7 Let A be an m x n matrix with row vectors {q, ... , rm}
and column vectors {Cl' ... , c n }.
(1) The row space of A is the subspace in JRn spanned by the row vectors
{rl' ... , r m}, denoted by R(A).
(2) The column space of A is the subspace in JRm spanned by the column
vectors {Cl' ... , C n }, denoted by C(A).
(3) The solution set of the homogeneous equation Ax = 0 is called the
null space of A, denoted by N(A).
Note that the null space N(A) is a subspace of the n-space ]Rn, whose
dimension is called the nullity of A. Since the row vectors of A are just the
column vectors of its transpose AT, and the column vectors of A are the row
vectors of AT, the row space of A is the column space of AT; that is,
3.4. ROW AND COLUMN SPACES 95
Since Ax = XICI + X2C2 + ... XnCn for any vector x = (Xl, X2,"" xn) E ]Rn,
we get
C(A) = {Ax : x E ]Rn}.
Thus, for a vector b E ]Rm, the system Ax = b has a solution if and only if
b E C(A) ~ ]Rm. Thus, the column space C(A) is the set of vectors b E ]Rm
for which Ax = b has a solution.
It is quite natural to ask what the dimensions of those subspaces are,
and how one can find bases for them. This will help us to understand the
structure of all the solutions of the equation Ax = b. Since the set of the
row vectors and the set of the column vectors of A are spanning sets for the
row space and the column space, respectively, a minimally spanning subset
of each of them will be a basis for each of them.
This is not difficult for a matrix of a (reduced) row-echelon form.
Clearly, the first three nonzero row vectors containing leading 1's are lin-
early independent and they form a basis for the row space R(U), so that
dim R(U) = 3. On the other hand, note that the first three columns con-
taining leading 1's are linearly independent (see Theorem 3.5), and that the
last two column vectors can be expressed as linear combinations of them.
Hence, they form a basis for C(U), and dimC(U) = 3. To find a basis for the
null space N(U), we first solve the system Ux = 0 with arbitrary values 8
and t for the free variables X4 and Xs, and get the solution
Xl -28 2t -2 -2
X2 8 3t 1 -3
X3 -48 + t =8 -4 +t 1 = 8lls + tllt,
X4 s 1 0
Xs t 0 1
where lls = (-2, 1, -4, 1, 0), llt = (-2, -3, 1, 0, 1). It shows that these
two vectors lls and llt span the null space N(U), and they are clearly linearly
independent. Hence, the set {lls, llt} is a basis for the null space N(U). 0
96 CHAPTER 3. VECTOR SPACES
In the following, the row, the column or the null space of a matrix A
will be discussed in relation to the corresponding space of its (reduced) row-
echelon form. We first investigate the row space R(A) and the null space
N(A) of A by comparing them with those of the reduced row-echelon form U
of A. Since Ax = 0 and Ux = 0 have the same solution set by Theorem 1.1,
we have N(A) = N(U).
Let A = [ r.1 1be an m x n matrix, where ri's are the row vectors of
r~
A. The three elementary row operations change A into the following three
types:
rl rl
rj
AI= kri for k =1= 0, A2 = for i < j, A3 = ri + krj
ri
rm rm
It is clear that the row vectors of the three matrices AI, A2 and A3 are linear
combinations of the row vectors of A. On the other hand, by the inverse
elementary row operations, these matrices can be changed into A. Thus,
the row vectors of A can also be written as linear combinations of those of
Ai'S. This means that if matrices A and B are row equivalent, then their
row spaces must be equal, i.e., R(A) = R(B).
Now the nonzero row vectors in the reduced row-echelon form U are
always linearly independent and span the row space of U (see Theorem 3.5).
Thus they form a basis for the row space R(A) of A. We have the following
theorem.
The following example shows how to find bases for the row and the null
spaces, and at the same time how to find a basis for the column space C(A).
3.4. ROW AND COLUMN SPACES 97
A = [ -21o -5
2 01 -12 -85] [ r2.
-3 3 4 1
rl ]
r3
3 6 0 -7 2 r4
Find bases for the row space R(A), the null space N(A), and the column
space C(A) of A.
0 2 0
[~ 1]
1 -1 0
u= 0 0 1
0 0 0
VI (1, 0, 2, 0, 1),
V2 (0, 1, -1, 0, 1),
V3 (0, 0, 0, 1, 1)
of U are linearly independent, they form a basis for the row space R(U) =
R(A), so dim R(A) = 3. (Note that in the process of Gaussian elimination,
we did not use a permutation matrix. This means that the three nonzero
rows of U were obtained from the first three row vectors rl, r2, r3 of A and
the fourth row r 4 of A turned out to be a linear combination of them. Thus
the first three row vectors of A also form a basis for the row space.)
(2) Find a basis for N(A). It is enough to solve the homogeneous system
Ux = 0, since N(A) = N(U). That is, neglecting the fourth zero equation,
the equation U x = 0 takes the following system of equations:
+ Xs o
+ Xs o
X4 + Xs O.
Since the first, the second and the fourth columns of U contain the leading
1's, we see that the basic variables are Xl, X2, X4, and the free variables are
98 CHAPTER 3. VECTOR SPACES
X3, X5. By assigning arbitrary values sand t to the free variables X3 and
X5, we find the solution x of Ux = 0 as
Xl -2s t -2 -1
X2 s - t 1 -1
+ tnt,
x= X3 s =s 1 +t
° = sns
-2CI+C2+C3 0,
-CI - C2 - C4 + C5 0,
C3 2CI - C2,
C5 = CI + C2 + C4.
That is, the column vectors C3, C5 of A are linear combinations of the column
vectors CI, C2, C4, which correspond to the basic variables in Ax = o. Hence,
{CI, C2, C4} spans the column space C(A).
3.4. ROW AND COLUMN SPACES 99
We claim that {CI' C2, C4} is linearly independent. Let A = [CI C2 C4]
and U = [UI U2 U4] be submatrices of A and U, respectively, where Uj is the
j-th column vector of the reduced row-echelon form U of A obtained in (1).
Then clearly U is the reduced row-echelon form of A so that N(A) = N(U).
Since the vectors UI, U2, U4 are just the columns of U containing leading
l's, they are linearly independent, by Theorem 3.5, and Ux = 0 has only a
trivial solution. This means that Ax = 0 has also only a trivial solution, so
{CI' C2, C4} is linearly independent. Therefore, it is a basis for the column
space C(A) and dimC(A) = 3 = the number of basic variables. That is, the
column vectors of A corresponding to the basic variables in Ux = 0 form a
basis for the column space C(A). 0
A = [~ =~ ~~ ~~ ~].
2 6 18 8 6
Also find a basis for C(A) by finding a basis for R(A T ).
Problem 3.17 Let A and B be two n x n matrices. Show that AB = 0 if and only
if the column space of B is a subspace of the nullspace of A.
Problem 3.18 Find an example of a matrix A and its row-echelon form U such that
C(A) i- C(U).
100 CHAPTER 3. VECTOR SPACES
Remark: In the proof of Theorem 3.15, once we have shown that the
columns in A are linearly independent as in (1), we may replace step (2)
by the following argument: One can easily see that dimC(A) :::::: dim R(A)
by Theorem 3.10. On the other hand, since this inequality holds for arbitrary
matrices, in particular for AT, we get dimC(AT) :::::: dim R(AT). Moreover,
C(AT) = R(A) and R(AT) = C(A) implies dimC(A) ~ dim R(A), which
means dimC(A) = dim R(A). This also means that the column vectors of A
span C(A), and so form a basis.
In summary, the following equalities are now clear from Theorem 3.14
and 3.15:
dim R(A) dim R(U)
the number of nonzero row vectors of U
the maximal number of linearly independent
row vectors of A
the number of basic variables in Ux = O.
the maximal number of linearly independent
column vectors of A
dimC(A).
dimN(A) dimN(U)
the number of free variables in Ux = O.
°
If dimN(A) = (or N(A) = {O}), then dim R.(A) = n (or R.(A) = JR n ),
which means that A has exactly n linearly independent rows and n linearly
independent columns. In particular, if A is a square matrix of order n, then
the row vectors are linearly independent if and only if the column vectors
are linearly independent. Therefore, by Theorem 1.8, we get the following
corollary.
°
n
2 2
A= r -:1
-2 1 1
2 -3 -7
1 2 -2 -4
The first three nonzero rows containing leading 1's in U form a basis for
R.(U) = R.(A). Note that Xl, X3 and X5 are the basic variables in Ux = 0,
since the first, third and fifth columns of U contain leading 1'so Thus the
three columns Cl = (1, -1, 1, 1), C3 = (0,1, -3, -2) and C5 = (1,0,2,3) of A,
not the three columns in U, corresponding to those basic variables Xl, x3 and
X5 form a basis for C(A). Therefore, rank A = dim R.(A) = dimC(A) = 3,
the nullity of A = dimN(A) = 2, and dimN(AT) = 1. 0
Problem 3.19 Find the nullity and the rank of each of the following matrices:
(l)A=[ ;
-1
~
-2
-i
0 -5
~l' (2)A=[i
2 1 5 -2
i; ~l·
For each of the matrices, show that dim R(A) = dimC(A) directly by finding their
bases.
Problem 3.20 Show that a system of linear equations Ax = b has a solution if and
only if rank A = rank [A b], where [A b] denotes the augmented matrix of Ax = b.
3.5. RANK AND NULLITY 103
Theorem 3.19 For any two matrices A and B for which AB can be defined,
(1) N(AB) ~ N(B),
(2) N(AB)T) ~ N(AT ),
(3) C(AB) ~ C(A),
(4) R(AB) ~ R(B).
Proof: (1) and (2) are clear, since Bx = 0 implies (AB)x = A(Bx) = O.
(3) For an m x n matrix A and an n x p matrix B,
Problem 3.22 Prove that the rank of a matrix is equal to the largest order of its
invertible submatrices.
Problem 3.23 For each of the matrices given in Problem 3.19, find an invertible
submatrix of the largest order.
Y alvl+···+akvk
blWI + ... + beWi,
for some al, ... , ak and bl,.'" be. Let x = (al,"" ak, -b l , ... , -bi). Then it
is quite clear that Qx = 0, i.e., x E N(Q). That is, for each x E N(Q), there
corresponds a vector y E V n W, and vice versa. Moreover, if Xi, i = 1,2,
3.6. BASES FOR SUBSPACES 105
dimN(Q) = dim V n W.
{ VI
= (1, 3, -2, 2, 3),
{ WI
(2, 3, -1, -2, 9),
V2 (1, 4, -3, 4, 2), W2 = (1, 5, -6, 6, 1),
V3 = (1, 3, 0, 2, 3), W3 (2, 4, 4, 2, 8),
106 CHAPTER 3. VECTOR SPACES
1 1 1 2 1 2
3 4 3 3 5 4
Q = [VI V2 V3 WI W2 W3 1= -2 -3 0 -1 -6 4
2 4 2 -2 6 2
3 2 3 9 1 8
U~ [~ n
0 0 5 0
1 0 -3 2
0 1 0 -1
0 0 0 0
From this, one can directly see that dim(V + W) = 4. The columns
VI, V2, V3, W3 corresponding to the basic variables in Qx = 0 form a ba-
sis for C(Q) = V + W. Moreover, dimN(Q) = dim(V n W) = 2, since there
are two free variables X4 and X5 in Qx = o.
To find a basis for V n W, we solve Ux = 0 for (Xl, X2, X3, 1,0, X5) and
(Xl, X2, X3, 0,1, X5). After a simple computation, we obtain a basis for N(Q):
-5VI + 3V2 + WI 0,
-2V2 + V3 + W2 o.
Therefore, {YI, Y2} is a basis for V n W, where
2 1
3 5
YI = 5VI - 3V2 = -1 = WI, Y2 = 2V2 - V3 = -6 = W2.
-2 6
9 1
A=[A]
x '
and the matrix B is defined similarly. Then it is clear that n(A) = n(A)
and n(B) = n(B) if and only if x E V n W = n(A) n n(B). This means
that the row-echelon form of A and that of A should be the same via the
same Gaussian elimination. Thus, by comparing the row vectors of the row-
echelon form of A with those of A, we can obtain a system of linear equations
for x = (Xl, ... , xn). By the same argument applied to Band B, we get
another system of linear equations for the same x = (Xl, ... , xn). Solutions
to these two systems together will provide us with a basis for V n W.
The following example illustrates how one can apply this argument to
find bases for V + Wand V n W.
WI = (1, 3, 0, 2, 1),
W2 (1, 5, -6, 6, 3),
W3 (2, 5, 3, 2, 1).
108 CHAPTER 3. VECTOR SPACES
so that dim V = 3. Similarly, the matrix B whose row vectors are wj's is
reduced to a row-echelon form
1 3 o 2
[ o 2 -6 4
o 0 o 0
so that dim W = 2.
Now, if Q denotes the 6 x 5 matrix whose row vectors are Vi'S and wj's,
then V +W = R( Q). By Gaussian elimination, Q is reduced to a row-echelon
form, excluding zero rows:
3 -2 2
[~ -~ ]
1 -1 2
0 1 0 -1 .
0 0 0 1
[
1
1
3-2
4 -3 4 2 3]
2
2 3 -1 -2 10 .
Xl X2 X3 X4 X5
3.6. BASES FOR SUBSPACES 109
= 0
= o.
We do the same calculation with B, and obtain another homogeneous system
of linear equations for x:
-9Xl + 3X2 + X3 = 0
{ 4Xl - 2X2 + X4 = 0
2Xl - X2 + X5 o.
Solving these two homogeneous systems together yields
V n W = {t(l, 4, -3, 4, 2) : t E ]R}.
Problem 3.24 Let V and W be the subspaces of the vector space P3 (]R) spanned by
VI (X) 3 X + 4x 2 + x 3,
{ V2(X) 5 + 5x 2 + x 3 ,
V3(X) 5 5x + 10x 2 + 3x3 ,
and
9 3x + 3x 2 +
5 X + 2x2 +
6 + 4x 2 +
respectively. Find the dimensions and bases for V + Wand V n W.
Problem 3.25 Let
V = {(x, y, z, u) E]R4 : y+ z +u = O},
W ={(x, y, z, u) E]R4 : x + y = 0, z = 2u}
be two subspaces of ]R4. Find bases for V, W, V + W, and V n W.
110 CHAPTER 3. VECTOR SPACES
3.7 Invertibility
We now can have the following existence and uniqueness theorems for a
solution of a system of linear equations Ax = b for an mxn matrix A and
a vector b E ]Rm.
Proof: (1) ¢} (2): Note that C(A) ~ ]Rm in general. For any b E ]Rm,
there is a solution x E ]Rn of Ax = b if and only if b is a linear combination
of the column vectors of A. This is equivalent to saying that ]Rm = C(A).
(2) ¢} (3): Since dimC(A) = rank A = dim R(A) ::; min{m, n}, C(A) =
]Rm if and only if dimC(A) = m ::; n (see Problem 3.10).
(1) =} (4) : Let el, e2, ... , em be the standard basis for ]Rm. Then for
each i = 1, 2, ... , m we can find at least one solution Xi E ]Rn such that
AXi = ei by the condition. If B is the n x m matrix whose columns are these
solutions, i.e., B = [Xl x2 ... x m], then it follows by matrix multiplication
that
Proof: (1)::::} (2) : Note that the column vectors of A are linearly inde-
pendent if and only if the homogeneous equation Ax = 0 has only a trivial
solution. However, Ax = 0 has always a trivial solution x = 0 and (1) means
that it is the only one.
(2) {::} (3) : Clear, because all the column vectors are linearly indepen-
dent if and only if they form a basis for C(A), or dimC(A) = n ::; m.
(3) {::} (4): Clear, because dim R(A) = rank A = dimC(A) = n if and
only if R(A) =]Rn (see Problem 3.10).
(4) {::} (5): Clear, since dim R(A) + dimN(A) = n.
(2) ::::} (6) : Suppose that the columns of A are linearly independent
so that rank A = n. Extend these column vectors of A to a basis for ]Rm
by adding m - n more independent vectors to them. Construct an m x m
matrix 8 with those vectors in columns. Then the matrix 8 has rank m
and is hence invertible. Let C be the n x m matrix obtained from 8- 1 by
throwing away the last m - n rows. Since the first n columns of 8 constitute
the matrix A, we have CA = In.
(6) ::::} (1) : Let C be a left inverse of A. If Ax = b has no solution, then
we are done. Suppose that Ax = b has two solutions, say Xl and X2. Then
Remark: (1) We have proved that an mxn matrix A has a right inverse if
and only if rank A = m, and A has a left inverse if and only if rank A = n.
In the first case Ax = b always has a solution, and in the second case the
solution (if it exists) is unique. Therefore, if m i= n, A cannot have both left
and right inverses.
(2) For a practical way of finding a right or a left inverse of an mxn
matrix A, we will show later (see Corollary 5.24) that if rank A = m, then
(AAT)-l exists and AT(AAT)-l is a right inverse of A, and if rank A = n,
then (AT A)-l exists and (AT A)-l AT is a left inverse of A.
112 CHAPTER 3. VECTOR SPACES
Problem 3.26 For each of the following matrices, discuss the number of possible
solutions to the system of linear equations Ax = b for any b:
-~ ~6 13: l'
[2~ !7 -3 (2) A= [ -~1 l'
l
(1) A= ;
-6
(15)* The linear transformation A : lR.n ---> lR.n via A(x) = Ax is injective.
(16)* The linear transformation A : lR.n ---> lR.n is surjective.
(17)* Zero is not an eigenvalue of A.
Proof: Exercise: where have we proved which claim? Prove any not cov-
ered. The numbers with asterisks will be explained in the following places:
(15) and (16) in the Remark on page 141 and (17) in Theorem 6.1. 0
Since the Xi'S are all distinct, det A =1= o. It follows that A is nonsingular,
and hence Ax = b always has a unique solution, which determines the unique
polynomial p(x) of degree::; n passing through the given n+1 points (xo, Yo),
(Xl, YI), ... , (x n , Yn) in the plane ]R2.
in the plane ]R2, let p(x) = ao + alx + a2x2 + a3x3 be the polynomial passing
through the given four points. Then, we have a system of equations
{ :~
=
+ al + a2 + a3 = 0
ao - al + a2 a3 2
ao + 3al + 9a2 + 27a3 6.
Problem 3.27 Let f(x) = sinx. Then at x = 0, i, ~, 3;, Jr, the values of fare
y = 0, ~, 1, ~, O. Find the polynomial p(x) of degree 4 that passes through :s
these five points. (One may need to use a computer due to messy computation).
Problem 3.29 Find the equation of a circle that passes through the three points
(2, -2), (3, 5), and (-4, 6) in the plane ]R2.
Remark: (1) It is suggested that the readers think about the differences
between this interpolation and the Taylor polynomial approximation to a
differentiable function.
(2) Note again that the interpolating polynomial p( x) of degree :s: n is
uniquely determined when we have the correct data, i.e., when we are given
precisely n + 1 values of y at precisely n + 1 distinct points Xo, xl, ... , X n .
However, if we are given fewer data, then the polynomial is under-
determined: i.e., if we have m values of y with m < n + 1 at m distinct
points Xl, X2, ... , X m , then there are as many interpolating polynomials
as the null space of A since in this case A is an m x (n + 1) matrix with
m<n+l.
On the other hand, if we are given more than n + 1 data, then the
polynomial is over-determined: i.e., if we have m values of y with m > n + 1
at m distinct points Xl, X2, ... , X m , then there need not be any interpolating
polynomial since the system could be inconsistent. In this case, the best we
can do is to find a polynomial of degree :s: n to which the data is closest.
We will review this statement again in Section 5.8.
o
The determinant of the coefficient matrix is called the Wronskian for
{h(x), h(x),···, fn(x)} and denoted by W(x). Therefore, if there is a point
Xo E JR such that W(x) i- 0, then the coefficient matrix is nonsingular at
x = Xo, and so all Ci = O. Therefore, if the Wronskian is nonzero at a point
in JR, then {Jl(X), h(x), "', fn(x)} are linearly independent.
Example 3.25 For the sets of functions H = {x,cosx,sinx} and F2
{x, eX, e -X}, the Wronskians are
W,(x) = det [ ~
cos x
-sinx
sinx
cos x
1=x
-cosx -SlllX
and
=2x.
3.10 Exercises
3.1. Let V be the set of all pairs (x, y) of real numbers. Define
(x, y) + (Xl, YI) (X + Xl, Y + YI)
e(x, y) = (ex, y).
Is V a vector space with these operations?
3.2. For x, y E JRn and k E JR, define two operations as
xEBy = x-y, k·x = -kx.
The operations on the right sides are the usual ones. Which of the rules in
the definition of a vector space are satisfied for (JRn, EB, .)?
3.3. Determine whether the given set is a vector space with the usual addition
and scalar multiplication of functions.
(1) The set of all functions f defined on the interval [-1, 1] such that
f(O) = O.
(2) The set of all functions f defined on JR such that limx->oo f(x) = O.
(3) The set of all twice differentiable functions f defined on JR such that
f"(x) + f(x) = o.
3.4. Let C 2 [-1, 1] be the vector space of all functions with continuous second
derivatives on the domain [-1, 1]. Which of the following subsets is a sub-
space of C 2 [-1, I]?
(1) W = {f(x) E C 2[-1, 1] : f"(x) + f(x) = 0, -1 S; X S; I}.
(2) W = {f(x) E C2[-1, 1] : f"(x) + f(x) = x 2, -1 S; X S; I}.
3.5. Which of the following subsets of C[-l, 1] is a subspace of the vector space
C[-l, 1] of continuous functions on [-1, I]?
(1) W = {f(x) E C[-l, 1]: f(-l) = -f(l)}.
(2) W = {f(x) E C[-l, 1] : f(x) 2: 0 for all X in [-1, I]}.
(3) W = {f(x) E C[-l, 1] : f( -1) = -2 and f(l) = 2}.
(4) W = {f(x) E C[-l, 1] : f(~) = O}.
3.6. Does the vector (3, -1, 0, -1) belong to the subspace of JR4 spanned by the
vectors (2, -1, 3, 2), (-1, 1, 1, -3) and (1, 1, 9, -5)?
3.7. Express the given function as a linear combination of functions in the given
set Q.
(1) p(x) = -1- 3x + 3x 2 and Q = {PI (x), P2(X), P3(X)}, where
PI(X) = 1 + 2x + x 2, P2(X) = 2 + 5x, P3(X) = 3 + 8x - 2x 2.
(2) p(x) = -2 - 4x + x 2 and Q = {PI (x), P2(X), P3(X), p4(X)}, where
PI (x) = 1 + 2x2 + x 3, P2 (x) = 1 + X + 2x3, P3 (x) = -1 - 3x - 4x 3,
P4 (x) = 1 + 2x - x 2 + x 3.
118 CHAPTER 3. VECTOR SPACES
3.8. Is {cos2 x, sin 2 x, 1, eX} linearly independent in the vector space C(lR)?
3.9. Show that the given sets of functions are linearly independent in the vector
space C[-7r, 7r].
(1) {I, X, x 2 , x 3 , X4}
(2) {I, eX, e2x , e3x }
(3) {I, sinx, cosx, ... , sinkx, coskx}
3.10. Are the vectors
VI = (1, 1, 2, 4), V2 = (2, -1, -5, 2),
V3 = (1, -1, -4, 0), V4 = (2, 1, 1, 6)
3.17. How many 5 x 5 permutation matrices are there? Are they linearly indepen-
dent? Do they span the vector space Msxs(lR) ?
3.18. Find bases for the row space, the column space, and the null space for each
[: n
of the following matrices.
(1) A =
2
4
1
-3 (2) B =
0
1
2 1
1 -2 -5]
2 ,
l n
2 -1 1 5 0 0
0 1 -1 -2 1
3 1 1 -1 3 1
(3) C ~[ 6
9
(4) D = 2
0
1 -1
0 -2
8
2
3
1
3 5 -5 5 10
2 -6
3.19. Find the rank of A as a function of x: A = [ 2~ 3 -9
1 x
3.20. Find the rank and the largest invertible submatrix of each of the following
[: n
matrices.
2 3
[000 ']
[~
1
o 0 1 0 4 0
(1) 0 1 0 0 ' (2) 1 o 1 ' 2 ] , (3)
2 3
1 1 4
o 0 0 0 0 0
3.21. For any nonzero column vectors u, v, show that the matrix A = UyT has
rank 1. Conversely, every matrix of rank 1 can be written as UyT for some
u, y.
3.22. Determine whether the following statements are true or false, and justify your
answers.
(1) The set of all n x n matrices A such that AT = A -1 is a subspace of
the vector space Mnxn(l~).
(2) If Q and {3 are linearly independent subsets of a vector space V, then so
is their union Q U {3.
(3) If U and Ware subspaces of a vector space V with bases Q and {3
respectively, then the intersection Q n (3 is a basis for U n W.
(4) Let U be the row-echelon form of a square matrix A. If the first r
columns of U are linearly independent, then so are the first r columns
of A.
(5) Any two row-equivalent matrices have the same column space.
120 CHAPTER 3. VECTOR SPACES
Linear Transformations
4.1 Introduction
As we saw in Chapter 3, there are many vector spaces. Naturally, one can ask
whether or not two vector spaces are the same. To say two vector spaces are
the same or not, one has to compare them first as sets, and then see whether
or not their arithmetical rules are preserved. A usual way of comparing two
sets is defining a function between them. Recall that a function from a
set X into another set Y is a rule which assigns a unique element y in Y
to each element x in X. Such a function is denoted as f : X -> Y and
sometimes referred to as a transformation or a mapping. We say that f
transforms (or maps) X into Y. When given sets are vector spaces, one can
compare their arithmetical rules also by a transformation f if f preserves
the arithmetical rules, that is, f(x + y) = f(x) + f(y) and f(kx) = kf(x)
for any vectors x, y and any scalar k. In this chapter, we discuss this kind
of transformations between vector spaces via the linear equation Ax = b.
For an m x n matrix A, the equation Ax = b means that to every vector
x = [Xl X2 ... Xn]T in ~n the matrix multiplication Ax assigns a vector
b (= Ax) in ~m. That is, the matrix A transforms every vector x in ~n
into a vector b in ~m by the matrix multiplication Ax = b. Moreover,
the distributive law A(x + ky) = Ax + kAy, for k E ~ and x, y E ~n, of
matrix multiplication means that A preserves the sum of vectors and scalar
multiplication.
121
122 CHAPTER 4. LINEAR TRANSFORMATIONS
We often call T simply linear. It is not hard to see that the two condi-
tions for a linear transformation can be combined into a single requirement
T(x) = Ax
Example 4.3 (1) Let () denote the angle between the x-axis and a fixed
vector in jR2. Then the matrix
R = [ cos () - sin () ]
(J sin () cos ()
defines a linear transformation on jR2 that rotates any vector in jR2 through
the angle () about the origin, and is called a rotation by the angle ().
(2) The projection on the x-axis is the linear transformation T : jR2 ~
jR2 defined by, for x = (x, y) E jR2,
T(x) = [~ ~] [ :] [~].
(3) The linear transformation T : jR2 ~ jR2 defined by, for x = (x, y),
T(x) = [~ _~] [ : ] = [ _: ]
is called the reflection about the x-axis. o
Problem 4.1 Find the matrix of reflection about the line y = x in the plane 1R2.
for any matrices A and B in Mnxn(!~), which means that "tr" is a linear
transformation. In particular, one can easily show that the set of all n x n
matrices with trace 0 is a subspace of Mnxn(jR). 0
Problem 4.3 Show that, for any matrices A and B in Mnxn(lR), tr (AB) = tr (BA).
Example 4.5 From the calculus, it is well known that two transformations
Theorem 4.3 Let V and W be vector spaces. Let {VI, ... , v n } be a basis
for V and let WI, ... , Wn be any vectors (possibly repeated) in W. Then
there exists a unique linear transformation T : V - t W such that T(vd = Wi
for i = 1, ... , n.
Hence, we have S = T. o
Corollary 4.4 Let V and W be vector spaces, and let {VI, ... , v n } be a
basis for V. If S, T: V ---- Ware linear transformations and S(vd = T(Vi)
for i = 1, ... , n, then S = T, i.e., S(x) = T(x) for all x E V.
Example 4.8 Let WI = (1, 0), W2 = (2, -1), W3 = (4, 3) be three vectors
in ]R2.
(1) Let 0 = {el' e2, e3} be the standard basis for the 3-space ]R3, and
let T : ]R3 ____ ]R2 be the linear transformation defined by
Find a formula for T(XI' X2, X3), and then use it to compute T(2, -3, 5).
(2) Let (3 = {VI, V2, V3} be another basis for ]R3, where VI = (1, 1, 1),
V2 = (1, 1, 0), V3 = (1, 0, 0), and let T : ]R3 ---- ]R2 be the linear transfor-
mation defined by
Find a formula for T(XI' X2, X3), and then use it to compute T(2, -3, 5).
Solution: (1) For x = (Xl, X2, X3) = xlel + X2e2 + X3e3 E ]R3,
3 3
T(x) LXiT(ei) = LXiWi
i=l i=l
xI(I, 0) + x2(2, -1) + x3(4, 3)
(Xl + 2X2 + 4X3, -X2 + 3X3).
Thus, T(2, -3, 5) = (16, 18). In matrix notation, this can be written as
(2) In this case, we need to express x = (Xl, X2, X3) as a linear combi-
nation of VI, V2, V3, i.e.,
3
(Xl, X2, X3) =L kiVi
i=l
4.2. INVERTIBLE LINEAR TRANSFORMATIONS 127
From this formula we obtain T(2, -3, 5) = (9, 23). In matrix notation, it
can be written[; -2 -1 1[ Xl 1=
3 -4 1 X2 o
X3
Problem 4.4 Is there a linear transformation T : ]R3 ---+ ]R2 such that T(3, 1, 0) =
(1, 1) and T( -6, -2, 0) = (2, I)? If yes, can you find an expression of T(x) for
x = (Xl, X2, X3) in ]R3?
Problem 4.5 Let V and W be vector spaces and T : V ---+ W be linear. Let
{Wl' W2, ... , wd be a linearly independent subset of the image Im(T) ~ W. Sup-
pose that 0: = {Vl' V2, ... , Vk} is chosen so that T(Vi) = Wi for i = 1, 2, ... , k.
Prove that 0: is linearly independent.
Problem 4.6 Suppose that 5 and T are linear transformations whose composition
5 0 T is well-defined. Prove that
(1) if 5 0 T is one-to-one, so is T,
(2) if 5 0 T is onto, so is 5,
(3) if 5 and T are isomorphisms, then so is 50 T,
(4) if A and B are two n x n matrices of rank n, then so is AB.
Theorem 4.7 Two vector spaces V and Ware isomorphic if and only if
dimV = dim W.
Conversely, suppose that dim V = dim W. Then one can choose any
bases {VI, ... , v n } and {WI, ... , w n } for V and W, respectively. By
Theorem 4.3 there exists a linear transformation T : V - t W such that
T(vi) = Wi for i = 1, ... , n. It is not hard to show that T is invertible so
that T is an isomorphism. Hence V and Ware isomorphic. 0
n n
q,(x) = ~aiq,(Vi) = ~aiei = (aI, ... , an) =
[ al
a:
1
E ]Rn,
n
which is called the coordinate vector of x with respect to the basis ct, and
is denoted by [xJa(= q,(x)). Clearly [ViJa = ei·
Example 4.9 Recall that, from Example 4.3, the rotation by the angle ()
of ]R2 is given by the matrix
R = [ cos ()
e sin ()
- sin ()
cos ()
1.
Clearly, it is invertible and hence an isomorphism of ]R2. In fact, one can
easily check that the inverse R;l is simply R-e.
Let ct = {el,e2} be the standard basis, and let f3 = {Vl,V2}, where
Vi = Reei, i = 1,2. Then f3 is also a basis for ]R2. The coordinate vectors of
Vi with respect to ct are themselves
[VIJa = [
COS ()
sin ()
1,[V2Ja = [ - cos
sin () 1
() ,
4.2. INVERTIBLE LINEAR TRANSFORMATIONS 131
while
Example 4 .10 In Problem 4.1, one can notice that the reflection about
the line y = x may be obtained by the compositions of rotation by - ~
of the plane, reflection about the x-axis, and rotation by ~. Actually, it
is multiplication of the matrices given in (1) and (3) of Example 4.3 with
e = ~: that is, if we denote rotation by ~ by
RJ!..
4
=[ C?s"47r
sm 1I
. "47r]
- sm
cos 1I
= [..Lv'2
1
-~2]
v ..
1 ,
4 4 72 v'2
and reflection about the x-axis by [~ _~], then R_% = R~l , and the
matrix we want is
RJ!..
4
[10 -10]R-;.l
4
= [1 -1] [10 -10] [--1 -1]
-
v'2
-
v'2 v'2 v'2
= [01 01].
The reflection about any line € in the plane can be obtained in this way:
y
R- o
y = Ro oTo R-o(x)
x T x = R-o(€)
Ro To R_o(x)
Problem 4.8 Find the matrix of reflection about the line y = J3x in JR2.
Problem 4.9 Find the coordinate vector of 5 + 2x + 3x 2 with respect to the given
ordered basis Q' for P2(JR) :
(1) Q' = {I, X, x 2}; (2) Q' = {I + x, 1 + x 2, X+ x 2}.
132 CHAPTER 4. LINEAR TRANSFORMATIONS
Problem 4.10 Let T : ]R3 ~ ]R3 be the linear transformation given by T(x, y, z) =
(x + y, y + z, x + z). Let C denote the unit cube determined by the standard basis
el, e2, e3. Find the volume of the image parallelepiped T(C) of C under T.
vertices 1 2 3 4 5 6
x-coordinate
y-coordinate [~ 0
2.0
0.5 0.5
2.0 0.5
2.0
0.5
2.0
0.0
1
= A
.
Of course, we assume that the computer knows which vertices are connected
to which by lines via some other algorithm. We know that line segments are
transformed to other line segments by a matrix, considered as a linear trans-
formation. Thus, by multiplying a matrix to A, the vertices are transformed
to the other set of vertices, and the line segments connecting the vertices are
preserved. For example, the matrix B = [~ 0.~5l transforms the matrix
A to the following form, which represents new coordinates of the vertices:
vertices 1 2 3 4 5 6
BA =
[~ 0.5 1.0 0.625 2.125 2.0
2.0 2.0 0.5 0.5 0.0
1.
Now, the computer connects these vertices properly by lines according to
the given algorithm and displays on the screen the changed figure as the left
side of the following:
BA
===> ~5
1
n 3
6
(CB)A
===>
! /J
Rotation by 7r
~ Reflection Reflection ~
\\ Rotation by
~
7r
J\
The above argument generally applies to figures in any dimension. For
instance, a 3 x 3 matrix may be used to convert a figure in ]R3 since each
point has 3 components.
,R(y,(3) = [
COS
0
f3
~ - s~nf3l '
sin f3 o cos f3
are the rotations about the x, y, z-axes by the angles 0:, f3 and ",(, respectively.
In general, the matrix that rotates ]R3 with respect to a given axis is
useful in many applications. One can easily express such a general rotation
as a composition of basic rotations such as R(x,o:), R(y,(3) and R(z,,).
z
u
--,~::........L ___ Y
x
4.4. MATRICES OF LINEAR TRANSFORMATIONS 135
Suppose that the axis of a rotation is the line determined by the vector
u = (cos 0' cos j3, cos 0' sin j3, sin 0'), - ~ :S 0' :S ~, 0 :S j3 :S 27r, in spherical
coordinates, and we want to find the matrix R(u,B) of the rotation about the
u-axis by (): For this, we first rotate the u-axis about the z-axis into the
xz-plane by R(z,_(3) and then about the y-axis into the x-axis by R(y,-o:).
The rotation about the u-axis is the same as the rotation about the x-axis,
i.e., one can use the rotation R(x,B) about the x-axis. After this, we get back
to the rotation about the u-axis via R(y,o:) and R(z,(3)' In summary,
R(u,B) = R(z,(3)R(y,o:)R(x,B)R(y,-o:)R(z,_(3)· o
Problem 4.11 Find the matrix R(u,f) for the rotation about the line determined by
7r
U = (1, 1, 1) by "4'
1 T(vIi
T(v2)
aUwI
aI2 w l
+
+
a2I w 2
a22 w 2
+
+
+
+
amIWm
a m 2W m
for some scalars aij (i 1, ... , m; j = 1, ... , n). Notice the indexing
order of aij in this expression: The coordinate vector [T(vj )J/3 of T(vj) with
respect to the basis (3 can be written as a column vector
n n m
T(x) LxjT(Vj) = LXj Laijwi
j=l j=l ;=1
[T(x)]/3 = = A[x]Q,
where [x]Q = [Xl ... xn]T is the coordinate vector of x with respect to
the basis 0' in V. In this sense, we say that matrix multiplication by A
represents the transformation T. Note that A = [aij] is the matrix whose
column vectors are just the coordinate vectors [T(vj)]/3 ofT(vj) with respect
to the basis (3. Moreover, for the fixed bases 0' for V and (3 for W, the matrix
A associated with the linear transformation T with respect to these bases is
unique, because the coordinate expression of a vector with respect to a basis
is unique. Thus, the assignment of the matrix A to a linear transformation
T is well-defined.
Definition 4.4 The matrix A is called the associated matrix for T (or
matrix representation of T) with respect to the bases 0' and (3, and de-
noted by A = [T]~.
ordered bases Ct for V and /3 for W, the coordinate vector [T(x)],a of T(x)
with respect to /3 is given as a matrix product of the associated matrix [T]~
ofT and [x]a, i.e.,
[T(x)],a = [T]~[x]a.
The associated matrix [T]~ is given as
identity matrix.
Example 4.14 Let T : H (lR) ----t P2(lR) be the linear transformation defined
by
(T(p))(x) = xp(x).
Then, with the bases a = {1, x} and (3 = {1, x, x 2} for H (lR) and P2(lR) ,
Hence, [Tl~ ~ [~ ~]. If (I' ~ {es, e" e,l, then [Tl~' ~ [~ ~]. 0
Solution: Note that (a, b) = ael + be2 for any (a, b) E lR 2. Thus the
definition of T shows
T(ed -el
T(e2) = el
4.4. MATRICES OF LINEAR TRANSFORMATIONS 139
Problem 4.12 Find the matrix representation of each of the following linear trans-
formations T of ]R3 with respect to the standard basis a = {el' e2, e3}, and
f3 = {e3, e2, ed:
(1) T(x, y, z) = (2x - 3y + 4z, 5x - y + 2z, 4x + 7y),
(2) T(x, y, z) = (2y + z, x - 4y, 3x).
Problem 4.14 Let Id: ]Rn -+]Rn be the identity transformation. Let Xk denote the
vector in ]Rn whose first k -1 coordinates are zero and the last n - k + 1 coordinates
are 1. Then clearly f3 = {Xl, ... , xn} is a basis for ]Rn (see Problem 3.9). Let
a = {el' ... , en} be the standard basis for ]Rn. Find the matrix representations
[IdJ~ and [Idj3.
140 CHAPTER 4. LINEAR TRANSFORMATIONS
for any T E £(V; W) (see Section 4.4). If [Sl~ = [Tl~ for Sand T E
£(V; W), then we have S = T by Corollary 4.4. This means that ¢ is
one-to-one.
On the other hand, an m x n matrix A, considered as a linear transfor-
mation from lR n to lR m , gives rise to a linear transformation T from V to
W via the composition of A with the natural isomorphisms <1> and W, i.e.,
T = w- 1 0 A 0 <1>, which satisfies [Tl~ = A. This means that ¢ is onto.
Therefore, ¢ gives an one-to-one correspondence between £(V; W) and
Mmxn(lR). Furthermore, the following theorem shows that ¢ is linear, so
that it is in fact an isomorphism from £(V; W) to Mmxn(lR).
Theorem 4.10 Let V and W be vector spaces with ordered bases a and /3,
respectively, and let S, T: V -+ W be linear. Then we have
Thus
[8 + Tl~ = [8l~ + [Tl~·
The proof of the second equality [k8l~ = k[8l~ is left as an exercise. 0
Theorem 4.11 Let V, Wand Z be vector spaces with ordered bases a, (3,
and /, respectively. Suppose that S : V --t Wand T : W --t Z are linear
transformations. Then
[T 0 SJ~ = [TJJ[SJ~.
Proof: Let a = {VI, ... , v n }, (3 = {WI, ... , w m } and / = {ZI' ... , zd·
Let [TJ3 = [aijJ and [SJ~ = [bpqJ. Then, for 1 ::; i ::; n
= f
k=1
bki (t
j=1
ajkZ j ) = t (f
j=1 k=1
ajkbki) Zj.
Problem 4.15 Let a be the standard basis for lR3, and let S, T : lR3 ---> lR3 be two
linear transformations given by
S(el) = (2, 2, 1), S(e2) = (0, 1, 2), S(e3) = (-1, 2, 1),
T(ed = (1, 0, 1), T(e2) = (0, 1, 1), T(e3) = (1, 1, 2).
Compute [S + Tl"" [2T - Sl", and [T a Sl",.
Problem 4.16 Let T : P2(lR) ---> P2(lR) be the linear transformation defined by
T(f) = (3+x)/' +2/, and S: P2(lR) ---> lR3 defined by S(a+bx+cx 2) = (a-b, a+
b, c). For a basis a = {1, X, x 2 } for P2 (lR) and the standard basis f3 = {el' e2, e3}
for lR 3 , compute [Sl~, [Tl"" and [S a Tl~.
Theorem 4.12 Let V and W be vector spaces with ordered bases a and (3,
respectively, and let T : V --t W be an isomorphism. Then
Proof: Since T is invertible, dim V = dim W, and the matrices [TJ~ and
[T- I J3 are square and of the same size. Thus,
Problem 4.17 For the vector spaces P1(lR) and ]R2, choose the bases a = {I, x} for
P1(]R) and (3 = {el, e2} for ]R2, respectively. Let T : Pl(]R) -+ ]R2 be the linear
transformation defined by T(a + bx) = (a, a + b).
(1) Show that T is invertible. (2) Find [Tl~ and [T- l l3.
o x
[ x]
y = [ cos
sinO
0 - cosO
sin 0 ] [
y' x' ] ,0r[x]a=[1d];3[x];3,
a
where
[I d13 = [[e~]a [e2]al = [ ~~~: - ;~:: ].
Note that [Id]/-'(./ = ([Id]a)- 1
=[
.cos 0 sin 0 ] by Theorem 4.12.
a ;3 - sm 0 cos 0
or
[x]n = [1d]3[x]/3
1d
v V
j
x f---v-> x
~' T T
[x]/3 f-----v-> [x]n
j~
jRn ]Rn ,
Q = [1d]3
where
q1n 1= [ [W1]n ... [Wn]n].
qnn
Example 4.18 Let the 3-space ]R3 be equipped with the standard xyz-
coordinate system, i. e., with the standard basis a = {e1' e2, e3}. Take
a new x' y' z' -coordinate system by rotating the xyz-system around its z-
axis counterclockwise through an angle (), i. e., we take a new basis (3 =
{e~, e;, e3} by rotating the basis a about z axis through (). Then we get
[e~]n = [
COS ()
Si~ ()
1,
146 CHAPTER 4. LINEAR TRANSFORMATIONS
COS () - sin ()
Q = [I d]~ = [ sin () cos ()
o 0
so
Q-l = [ld]~ = [
COS ()
sin () 0
-s~n() cos () 0
1
,
o 1
so that
[
X'
y' 1 [
=
c~s ()
- sm ()
sin ()() 00
cos 1[ xy1.
~ 001 z o
Problem 4.18 Find the transition matrix from a basis 0: to another basis f3 for the
3-space ]R3, where
0: = {(I, 0, 1), (1, 1,0), (0, 1, I)}, f3 = {(2, 3,1), (1,2,0), (2,0, 3)}.
4.7 Similarity
The coordinate expression of a vector in a vector space V depends on the
choice of an ordered basis. Hence, the matrix representation of a linear
transformation is also dependent on the choice of bases.
Let V and W be two vector spaces of dimensions nand m with two
ordered bases a and {3, respectively, and let T : V -+ W be a linear transfor-
mation. In Section 4.4, we discussed how to find [T]~. If we have different
bases a' and (3' for V and W, respectively, then we get another matrix
representation [T]~; of T. We, in fact, have two different expressions
Therefore, we get
]Rm
T
(V, a') - - - - - -. (W, f3')
:/
[IdvJ~, Idv j IIdw [Idwl~'
(V,a) ___T_ ___• (W, (3)
~ ]Rm
Example 4.19 Let (3 = {VI, V2, V3} be a basis for lR?3 consisting of VI =
(1, 1, 0), V2 = (1, 0, 1) and V3 = (0, 1, 1). Let T be the linear transfor-
mation on lR?3 given by the matrix
[T],e = [ ~ ~ ~ - ].
-1 1 1
Let 0' = {el' e2, e3} be the standard basis. Find the transition matrix [1d]~
and [TJa.
4.7. SIMILARITY 149
o1 0]
1 , and [Idl~ = ([Idl3)-I = ~ -1~ -~1
[
-1 ]
1 .
1 1 1
Therefore,
Example 4.20 Let T : ]R3 --t ]R3 be the linear transformation defined by
Let Ct = {eI' e2, e3} be the standard ordered basis. Then we clearly have
Let (3 = {VI, V2, V3} be another ordered basis for ]R3 consisting of VI
(-1, 0, 0), V2 = (2, 1, 0), and V3 = (1, 1, 1). Let Q = [Idl3 be the
transition matrix from (3 to Ct. Since Ct is the standard ordered basis for ]R3,
the columns of Q are simply the vectors in (3 written in the same order, with
an easily calculated inverse. Thus
Q= [-1001
21]
0 1 1 ,
To show that this is the correct matrix, we can verify that the image under T
of the j-th vector of (3 is the linear combination of the vectors of (3 with the
entries of the j-th column of [Tli3 as its coefficients. For example, for j = 2
150 CHAPTER 4. LINEAR TRANSFORMATIONS
we have T(v2) = T(2, 1,0) = (5,3, -1). On the other hand, the coefficients
of [T(v2)]{3 are just the entries of the second column of [T]{3. Therefore,
2Vl + 4V2 - V3
12el + 4(2el + e2) - (el + e2 + e3)
5el + 3e2 - e3 = (5,3, -1),
as expected. D
The next theorem shows that two similar matrices are matrix represen-
tations of the same linear transformation.
Theorem 4.15 Suppose that A represents a linear transformation T : V ---+
V on a vector space V with respect to an ordered basis a = {VI, ... , v n },
i.e., [TJQ = A. If B = Q-l AQ for some nonsingular matrix Q, then there
exists a basis f3 for V such that B = [T]{3, and Q = [IdJ3.
!
Proof: Let Q = [qij] and let WI, ... , Wn be the vectors in V defined by
WI = qn Vl+q21 V2+···+qnlv n
w2 = q12v l + Q22 v 2 + ... + Qn2 v n
Example 4.21 Let D be the differential operator on the vector space P2(lR).
Given two ordered bases a = {I, x, x 2 } and f3 = {I, 2x, 4x 2 - 2} for P 2 (lR),
we first note that
D(l) = o· 1 + 0 . x + 0 . x 2
D(x) 1 . 1 + 0 . x + 0 . x2
D(x 2 ) = o. 1 + 2 . x + 0 . x 2 .
Hence, the matrix representation of D with respect to a is given by
[D]a ~ [~ ~ ~ 1
4.7. SIMILARITY 151
D (1) 0 . 1 + 0 . 2x + 0 . (4x 2 - 2)
D (2x ) = 2· 1 + 0 . 2x + 0 . (4x 2 - 2)
D (4x 2 - 2) = O· 1 + 4 . 2x + 0 . (4x 2 - 2).
Thus,
Q = [1 d]3 = [1 0 -2]
0 2
004
0 ,
Problem 4.20 Suppose that A and B are similar n x n matrices. Show that
(1) det A = det B,
(2) tr A = tr B,
(3) rank A = rank B.
Problem 4.21 Let A and B be n x n matrices. Show that if A is similar to B, then
A 2 is similar to B2.
152 CHAPTER 4. LINEAR TRANSFORMATIONS
Example 4.22 Let C[a, b] be the vector space of all continuous real-valued
functions on the interval [a, b]. The definite integral I : C[a, b] --+ ]R defined
by
I(J) = lb f(t)dt
is a linear functional of C [a, b]. In particular, if the interval is [0, 27r] and n
is an integer, then
Fn(J) = ~ r
27r Jo
27r
f(t)e-intdt
Note that as we saw in Section 4.5, the set of all linear functionals of V
is the vector space £(V;]RI) whose dimension equals the dimension of V (see
page 141).
Definition 4.7 For a vector space V, the vector space of all linear func-
tionals of V is called the dual space of V and denoted by V*.
Theorem 4.16 The set Q* = {vi, V2' ... , V~} forms a basis for the dual
space V*, and for any T E V* we have
n
T = LT(Vi)vi.
i=l
Proof: Clearly, the set Q* = {vi, V2' ... , v~} is linearly independent,
since 0 = 2:i=l civi implies 0 = 2:i=l CiVi(Vj) = Cj for each j = 1, ... , n.
Moreover, the set Q* spans V*; for any T E V* and any Vj E Q, we have
Definition 4.8 For a basis Q = {VI, V2, ... , v n } for a vector space V, the
basis Q* for V* is called the dual basis of Q.
This theorem says that, for a fixed basis Q = {VI, ... , v n } for V, the
transformation * : V ---t V* given by *(vd = vi is an isomorphism between
V and V*. Therefore, we have the following corollary.
Example 4.24 Let Q = {(I, 2), (1, 3)} be a basis for]R2. To determine the
dual basis Q* = {j, g} of Q, we consider the equations
Example 4.25 Consider V =]Rn with the standard basis Q = {el, ... ,en},
and its dual basis Q* = {ei, ... , e~} for ]Rn*. Then for a vector a =
(al, ... , an) = aIel + .. +anen E ]Rn, we have ei(a) = ei(alel + .. ·+anen )·=
ai. That is,
Now, consider two vector spaces V and W with fixed bases ex and (3,
respectively. Let 8 : V ---+ W be a linear transformation from V to W. Then
for any linear functional 9 E W*, i. e., 9 : W ---+ IR, it is easy to see that the
composition go S(x) = g(S(x)) for x E V defines a linear functional on V,
i. e., 9 0 S E V*. Thus, we have a transformation S* : W* ---+ V* defined by
S* (g) = 9 0 8 for 9 E W*.
L akiw;(wk)
k=l
Example 4.26 With the identification lRn * = lRn in Example 4.25, the
transpose AT of a matrix A is actually A *:
o
For two linear transformations S : U -+ V and T : V -+ W, it is quite
easy to show (the readers may try to) that
(T 0 S)* = S* 0 T*.
Thus, if S : V -+ W is an isomorphism, then so is its transpose S* : W* -+
V*. In particular, since * : V -+ V* is an isomorphism, so is its transpose
** : V* -+ V**. Note that even though the isomorphism * : V -+ V* depends
on a choice of a basis for V, there is an isomorphism between V and V**
that does not depend on a choice of bases for the two vector spaces: We first
define, for each x E V, x: V* -+ lR by x(f) = f(x) for every f E V*. It is
easy to verify that x is a linear functional on V*, so x E V**. We will show
below that the mapping <I> : V -+ V** defined by <I>(x) = x is the desired
isomorphism between V and V**.
Lemma 4.19 If x(f) = 0 for all f E V*, i.e., x=0 in V**, then x = O.
Proof: Suppose that x -=I O. Choose a basis a = {VI, V2, ... , v n } for V
with VI = x. Let a* = {vi, v2' ... , v~} be the dual basis of a. Then
x(Vi) = vi(x) = Vi(VI) = 1,
which contradicts the hypothesis. o
Proof: To show the linearity of <I> , let x, y E V and k a scalar. Then, for
any f E V*,
Problem 4.22 Let 0: = {(I, 0, 1), (1, 2, 1), (0, 0, I)} be a basis for JR3. Find the
dual basis 0:*.
4.9 Exercises
4.1. Which of the following functions T are linear transformations?
(1) T(x, y) = (x 2 _ y2, x 2 + y2).
(2) T(x, y, z) = (x + y, 0, 2x + 4z).
(3) T(x, y) = (sinx, y).
(4) T(x, y) = (x + 1, 2y, x + y).
(5) T(x, y, z) = (lxi, 0).
4.2. Let T : P2(JR) -+ P3 (lR) be a linear transformation such that T(l) = 1, T(x) =
x 2, and T(x 2) = x 3 + x. Find T(ax 2 + bx + c).
4.3. Find SoT and/or To S whenever it is defined.
(1) T(x, y, z) = (x - y + z, x + z), S(x, y) = (x, x - y, y);
(2) T(x, y) = (x, 3y + x, 2x - 4y, y), S(x, y, z) = (2x, y);
4.4. Let S : C(lR) -+ C(JR) be the function on the vector space C(lR) defined by,
for f E C(JR),
S(f)(x) = f(x) _jX uf(u)du.
Show that S is a linear transformation on the vector space C(lR).
4.5. Let T be a linear transformation on a vector space V such that T2 = Id and
Ti=Id. LetU={vEV:T(v)=v}and W={vEV:T(v)=-v}. Show
that
(1) at least one of U and W is a nonzero subspace of V;
(2) Un W = {O};
(3) V = U + W.
4.7. Show that each of the following linear transformations T on ]R3 is invertible,
and find a formula for T- I :
(1) T(x, y, z) = (3x, x - y, 2x + y + z).
(2) T(x, y, z) = (2x, 4x - y, 2x + 3y - z).
4.8. Let S, T: V --4 V be linear transformations of a vector space V.
(1) Show that if T 0 S is one-to-one, then T is an isomorphism.
(2) Show that if T 0 S is onto, then T is an isomorphism.
(3) Show that if Tk is an isomorphism for some positive k, then T is an
isomorphism.
4.9. Let T be a linear transformation from ]R3 to ]R2, and let S be a linear trans-
formation from ]R2 to ]R3. Prove that the composition SoT is not invertible.
4.10. Let T be a linear transformation on a vector space V satisfying T - T2 = I d.
Show that T is invertible.
4.11. Let T : P3(]R) --4 P3(]R) be the linear transformation defined by
J
--4
.
4 . 20 . Show t h at a 11 matrIces f h ~
0 t e lorm
[ cos
. ()() ··1ar.
sin ()() ] are SImI
sm - cos
n n
4.21. Show that the matrix A = [~ ~] cannot be similar to a diagonal matrix.
4.25. Let T be the linear transformation from 1R3 into 1R2 defined by
T(X1' X2, X3) = (Xl + X2, 2X3 - Xl)'
(1) For the standard ordered bases Q and (3 for 1R3 and 1R2 respectively, find
the associated matrix for T with respect to the bases Q and (3.
(2) Let Q = {Xl, X2, X3} and (3 = {Y1, Y2}, where Xl = (1,0, -1), X2 =
(1,1,1), X3 = (1,0,0), and Y1 = (0,1), Y2 = (1,0). Find the associated
matrices [TJ~ and [T*J3:.
4.26. Let T be the linear transformation from 1R3 to 1R4 defined by
T(x, y, z) = (2x+y+4z, x+y+2z, y+2z, x+y+3z).
Find the range and the kernel of T. What is the dimension of C(T)? Find
{3 {3'
[TJo and [T*Jo" where
Q = {(I, 0, 0), (0,1,0), (0,0, I)}
x· y = IIxlillyll cos().
161
162 CHAPTER 5. INNER PRODUCT SPACES
In particular, two vectors x and yare orthogonal (i. e., they form a right
angle e = 7r /2) if and only if the Pythagorean theorem holds:
Note that by symmetry (1), additivity (2) and homogeneity (3) also hold
for the second variable: i. e.,
(2') (x,y+z)= (x,y)+ (x,z),
(3') (x, ky) = k(x, y).
Now it is easy to show that (0, y) = 0(0, y) = 0, and (x,O) = o.
Example 5.1 For vectors x = (Xl, X2) and y = (Yl, Y2) in ]R2, define
where a, band c are arbitrary real numbers. Then this function ( , ) clearly
satisfies the first three rules of the inner product. Moreover, if a > 0 and
ab - c2 > 0 hold, then it also satisfies rule (4), the positive definiteness of
the inner product. (Hint: The equation (x, x) = aXI + 2CXlX2 + bx§ 2': 0
if and only if either X2 = 0 or the discriminant of (x, x) / x§ is nonpositive.)
Note that the equation can be written as matrix products:
In the case of C = 0, this reduces to (x, y) = aXlYl + bX2Y2. Notice also that
a = (el,el), b = (e2,e2) and c = (el,e2) = (e2,el). 0
Example 5.2 Let V = C [0, 1] be the vector space of all real-valued con-
tinuous functions on [0, 1]. For any two functions f(x) and g(x) in V, define
1 - 2x if 0 :S X :S ~, 0 if 0 :S x :S ~,
f (x) =
{
0 if ~ :S X :S 1,
and g(x) ={
2x - 1 if ~ :S x :S 1.
Problem 5.1 Prove that equality in the Cauchy-Schwarz inequality holds if and only
if the vectors x and yare linearly dependent.
5.2. THE LENGTHS AND ANGLES OF VECTORS 165
The lengths and angles of vectors in an inner product space are defined
similarly to the case of the Euclidean n-space.
Ilxll = J(x, x) .
The distance between two vectors x and y, denoted by d(x,y), is defined
by
d(x,y) = Ilx - YII·
(x,y)
cos () = Ilxllllyll' or (x, y) = IIxlillyll cos (),
cos() = (x,y) = 2
Ilxllllyll v'l.'4-2'
D
Problem 5.2 Prove the following properties of length in an inner product space V:
For any vectors x, y E V,
(1) Ilxll ~ 0,
(2) Ilxll = 0 if and only if x = 0,
(3) Ilkxll = Iklllxll,
(4) Ilx + yll :s Ilxll + lIyll (triangular inequality).
Problem 5.3 Let V be an inner product space. Show that for any vectors x, yand
z in V.
166 CHAPTER 5. INNER PRODUCT SPACES
(1) d(x, y) ~ 0,
(2) d(x, y) = 0 if and only if x = y,
(3) d(x, y) = d(y, x),
(4) d(x,y):::; d(x,z) +d(z,y) (triangular inequality).
Therefore, an inner product in the 3-space ]R3 may play the roles of a
ruler and a protractor in our physical world.
Definition 5.4 Two vectors x and y in an inner product space are said to
be orthogonal (or perpendicular) if (x,y) = o.
Note that for nonzero vectors x and y, (x, y) = 0 if and only if e = 7r /2.
Lemma 5.2 Let V be an inner product space and let x E V. Then the vector
x is orthogonal to every vector y in V (i. e., (x, y) = 0 for all y in V) if and
only if x = o.
Corollary 5.3 Let V be an inner product space, and let 0: = {VI, ... , V n}
be a basis for V. Then a vector x in V is orthogonal to every basis vector
Vi in 0: if and only if x = o.
Proof: If (x, Vi) = 0 for i = 1, ... , n, then (x, y) = I::?=I Yi(X, Vi) = 0 for
any y = I::?=I YiVi E V. o
Moreover, it deduces the Pythagorean theorem: IIx + yll2 = IIxll2 + lIyll2 for
any orthogonal vector x and y. 0
5.3. MATRIX REPRESENTATIONS OF INNER PRODUCTS 167
Proof: Suppose CIXI + C2X2 + ... + CkXk = o. Then for each i = 1, ... , k,
Ci IIXi 11 2 ,
because Xl, ... , Xk are mutually orthogonal. Since each Xi is not the zero
vector, Ilxill =f. 0; so Ci = 0 for i = 1, ... , k. 0
r: ;
Problem 5.4 Let f(x) and g(x) be continuous real-valued functions on [0, 1]. Prove
(1) [f01 f(x )g(x )dx [f01 f2(x )dx] [f01 g2 (x )dx] ,
1 1 1
(2) [f01(f(x) + g(x))2dx] "2 :::; [f~ f2(x)dx] "2 + [f01 g2(x)dx] "2 •
holds. If we set aij = (Vi, Vjl for i,j = 1, ... , n, then these numbers
constitute a symmetric matrix A = [aij], since (Vi, Vjl = (Vj, ViI. Thus, in
matrix notation, the inner product may be written as
n n
(x, YI = L L XiYjaij = [x];A[Y]a.
i=l j=l
Example 5.6 (1) With respect to the standard basis {e1' e2, ... , en} of
the Euclidean n-space JR n , the matrix representation of the dot product is the
identity matrix, since ei . ej = Dij. Thus for x = L Xiei, y = L Yjej E JRn
the dot product is the matrix product x T y:
Proof: It is enough to show that the column vectors of A are linearly inde-
pendent. Let Q = {VI, ... , v n } be a basis for an inner product space V.
We denote the column vectors of A = [aij] = [(Vi, Vj)] byaj for j = 1,···, n.
Consider the linear dependence of the column vectors of A: for C1, .•. ,Cn E JR,
5.3. MATRIX REPRESENTATIONS OF INNER PRODUCTS 169
Let c = Ef=l CiVi E V so that [cJa: = [Cl ... CnV. Then this equation be-
comes a homogeneous system 0 = A[cJa: of n linear equations in n unknowns:
n
0 = anCl + ... + alnCn = LaljCj = [vlJ;A[cJa: = (Vl' c),
j=l
n
0 = anlcl + ... + annCn = Lanjcj = [vnJ;A[cJa: = (vn,c),
j=l
where we used [ViJa: = ei. Thus, by Corollary 5.3, we get c = E~l CiVi = 0,
and the columns of A are linearly independent. 0
Recall that the conditions a > 0 and ab - c2 > 0 in (2) of Example 5.1
are sufficient for A to give rise to an inner product on jR2.
The standard basis of the Euclidean n-space jRn has a special property:
The basis vectors are mutually orthogonal and are of length 1. In this sense,
it is called the rectangular coordinate system for jRn. In an inner product
space, a vector with length 1 is called a unit vector. If x is a nonzero vector
in an inner product space V, the vector II~II x is a unit vector. The process of
obtaining a unit vector from a nonzero vector by multiplying by the inverse
of its length is called a normalization. Thus, if there is a set of vectors
(or a basis) in an inner product space consisting of mutually orthogonal
vectors, then the vectors can be converted to unit vectors by normalizing
them without losing their mutual orthogonality.
Problem 5.5 Normalize each of the following vectors in the Euclidean space jR3:
(1) u = (2, 1, -1), (2) v = (1/2, 1/3, -1/4).
Definition 5.5 A set of vectors xl, X2, ... , Xk in an inner product space
V is said to be orthonormal if
(orthogonality) ,
(normality).
A set {Xl, X2, ... , xn} of vectors is called an orthonormal basis for V if
it is a basis and orthonormal.
It will be shown later that any inner product space has an orthonormal
basis, just like the standard basis for the Euclidean n-space jRn.
170 CHAPTER 5. INNER PRODUCT SPACES
Problem 5.6 Determine whether each of the following sets of vectors in ]R2 is or-
thogonal, orthonormal, or neither with respect to the Euclidean inner product.
(1) {[ ~ ] , [ ~ ]} (2) {[ ~ ] , [ ~ ]}
(3) {[ ~ ] ,[ -~ ]} (4) {[ ~j~ ] ,[ -~j~ ]}
The next theorem shows a simple expression of a vector in terms of an
orthonormal basis.
(XlVl+···+XnVn , Vi)
This expression looks like the dot product in the Euclidean space ]Rn. Thus
any inner product on V can be written just like the dot product in ]Rn, if V
is equipped with an orthonormal basis.
5.4. ORTHOGONAL PROJECTIONS 171
W V
Tw(x) = (0 , 1)
_____x = (2 1) x = (2 1)
(2 0) U (1 0) U
Since the pairs {u, w} and {u, v} are linearly independent, the space
]R2 can be expressed as the direct sum in two ways: ]R2 = U EB W = U EB V.
Thus a vector x = (2, 1) E ]R2 may be written in two ways:
Let Tw and Tv denote the projections of]R2 onto Wand V along U, respec-
tively. Then
proves Im(T) n Ker(T) = {O}. Note that this also shows T(u) = u for
u E Im(T). Then, dim V = dim(Im(T)) + dim(Ker(T)) (see Remark (2) in
page 138) implies V = Im(T)+Ker(T). Now, note that T(u+w) = T(u) = u
for any u + wE Im(T) EB Ker(T). 0
Proof: (1) Suppose that dimU = k. Choose a basis {VI, ... , vd for U,
and then extend it to a basis {VI, ... , Vb vk+1, ... , v n } for V, where n =
dim V. Then x = 2:.7=1 XjVj E Ul. if and only if 0 = (x, Vi) = 2:.7=1 aijXj for
1 ::; i ::; k, where aij = (Vi, Vj). The latter equations form a homogeneous
system of k linear equations in n unknowns, that is, Ul. is precisely the null
space of the k x n coefficient matrix B = [aij], which is a submatrix of the
matrix representation A of the inner product. Thus, by Theorem 5.5 the
rows of B are linearly independent, so B is of rank k. Therefore, the null
space has dimension n - k, or dim Ul. = n - k = n - dim U.
(2) By definition, every vector in U is orthogonal to Ul., i. e., U ~ (U .1 ) .1 .
On the other hand, by (1), dim(Ul.)l. = n - dimUl. = dimU. This proves
that (Ul.)..l = U.
(3) For a basis {VI, ... , vd for U, take any basis {Vk+1' ... , v n } for
Ul.. Since Un ul. = {O}, the set {VI, ... , Vk, Vk+1, ... , v n } is linearly
independent, so it is a basis for V. Therefore, every vector x E V has a
unique expression
k n
X = Laivi + L bjvj.
i=l j=k+1
Now take xu = 2:.~ aivi E U and XU-L = 2:.k+l bjvj E Ul.. To show unique-
ness, let x = U + w be another expression with u E U and w E Ul.. Then
Xu - u = w - XU-L E Un Ul. = {O}. So, Xu = u and XU-L = w. D
where the second equality comes from the Pythagorean theorem for x -
Proju(x) ..1 Proju(x) - y. 0
x
x - Proj u( x ) = Proju-L (x )
U
o
Problem 5.9 Let U and W be subspaces of an inner product space V. Show that
(1) (U + W)l. = Ul. n Wl.. (2) (U n W)l. = Ul. + Wl..
Problem 5.10 Let U C ]R4 with the Euclidean inner product be the subspace
spanned by (1 , 1, 0, 0) and (1 , 0, 1, 0) , and W C ]R4 the subspace spanned
by (0, 1, 0, 1) and (0, 0, 1, 1). Find a basis for and the dimension of each of the
following subspaces:
(1) U + W, (2) Ul., (3) Ul. + Wl., (4) Un W.
176 CHAPTER 5. INNER PRODUCT SPACES
Proof: Let z = (x, Ul)Ul + (x, U2)U2 + ... + (x, Urn) Urn· It is enough to
show that y = x - z is orthogonal to U, because if y = x - z E U J.., then
x = z + y E U EB U J.., so the uniqueness of this orthogonal decomposition
gives z = Proju(x). However, for each j = 1, ... , m,
since {Ul' U2, ... , urn} is an orthonormal basis for U. That is, the vector
x - z = x - L:i(x, Ui)Ui is orthogonal to U. 0
where (x, u) = I/xll cos (). On the other hand, it is quite clear that y
x - (x, u)u is a vector orthogonal to u. Thus
x= (x,u)u+y E UEBUJ..
so that IIxl1 2 = IIyl12 + I(x, u)12, which is just the Pythagorean theorem. In
particular, if V = lRn the Euclidean space with the dot product, then
(Here the third equality comes from the matrix products). This equation
shows that the matrix uuT is the matrix representation of the orthogonal
projection Proju with respect to the standard basis for lRn. Further dis-
cussions about matrix representations of the orthogonal projections will be
given in Section 5.10.
5.5. THE GRAM-SCHMIDT ORTHOGONALIZATION 177
Problem 5.11 Let V = P3 (]R), the vector space of polynomials of degree < 3
equipped with the inner product
(I, g) = 10 1
f(x)g(x) dx for any f and 9 in V.
Let W be the subspace of V spanned by {I, x}, and define f (x) = x2• Find the
orthogonal projection Projw(J) of f on W.
Since C2 = 2VI + V2 V2, we still have a spanning set {VI, V2, C3} of the
column space C(A) and thus a basis. Let W2 denote the subspace spanned
by VI and V2. Then
Then one can easily show that the set {VI, V2, V3} still spans C(A) and
forms an orthonormal basis for it. 0
5.5. THE GRAM-SCHMIDT ORTHOGONALIZATION 179
Xl
VI = IlxIII'
Problem 5.12 Find an orthonormal basis for the subspace of the Euclidean space
]R3 given by x + 2y - 3z = 0, which is the orthogonal complement of the vector
(1,2, -3) in ]R3.
[x]a =
The right side of this equation is just the dot product of vectors in the
Euclidean space ~n. That is,
Example 5.12 Show that every 2 x 2 orthogonal matrix must be one of the
following forms
[
COS () sin () 1
sin () - cos () .
Definition 5.10 Let V and W be two inner product spaces. A linear trans-
formation T : V -+ W is called an isometry, or an orthogonal transfor-
mation, if it preserves the lengths of vectors, that is, for every vector x E V
IIT(x)11 = Ilxll·
(T(x + y), T(x + y)) (T(x), T(x)) + 2(T(x), T(y)) + (T(y), T(y)),
(x+y,x+y) (x, x) + 2(x, y) + (y, y),
from which we get (T(x), T(y)) = (x, y).
The converse is quite clear by choosing y = x. D
Ax· Ay = x·y.
Proof: One way is clear. Suppose that A preserves the dot product. Then
for any vectors x, y E jRn,
Ax . Ay = x T AT Ay = x T Y = x . y.
Take x = ei and y = ej. Then this equation is just [AT Alij = (jij. D
Since d(x, y) = Ilx - yll for any x and y in V, one can easily derive the
following corollary.
Proof: Note that the k-th column vector of the matrix [T]~ is just [T(Vk)];3'
Since T preserves inner products and 0: is orthonormal, we get
(1) Q = [~r 2:
-8
~],
C
(2) Q = [~ ~: ~].
T -28 C
5.7. RELATIONS OF FUNDAMENTAL SUBSPACES 185
Problem 5.19 (Bessel's Inequality) Let V be an inner product space, and let
{v 1, . . . , V m} be a set of orthonormal vectors in V (not necessarily a basis for
V). Prove that for any x in V, IIxl12 ~ l:z:,1 I(x, vi)1 2 .
Lemma 5.19 For any m x n matrix A, the null space N(A) and the row
space 'R.(A) are orthogonal in ~n. Similarly, the null space N(AT) of AT
and the column space C(A) = 'R.( AT) are orthogonal in ~m.
Proof: Note that W E N(A) if and only if Aw = 0, i.e., for every row
vector r in A, r· w = O. For the second statement, do the same with AT. 0
This theorem shows that N(A) l.. 'R.(A) and C(A) l.. N(AT ), hence
N(A) ~ 'R.(A).l (or 'R.(A) ~ N(A).l) and N(AT) ~ C(A).l ( or C(A) ~
N (AT).l ), but the equalities between them do not follow immediately. The
next theorem shows that we have equalities in both inclusions, that is, the
row space 'R.(A) and the null space N(A) are orthogonal complements of
each other, and the column space C(A) and the null space N(AT) of AT are
orthogonal complements of each other. Note that the above theorem also
shows that N(A) n'R.(A) = {O} and C(A) n N(AT) = {O}.
Theorem 5.20 (The second fundamental theorem) For any mxn ma-
trix A,
(1) N(A) EB'R.(A) = ~n,
(2) N(AT) EB C(A) = ~m.
186 CHAPTER 5. INNER PRODUCT SPACES
Proof: (1) Since both the row space R(A) and the null space N(A) of A
are subspaces of ]Rn, we have N(A) + R(A) ~ ]Rn in general. However,
IR n IR m
x A b
Xr
AT be
Problem 5.22 Given two vectors (1, 2, 1, 2) and (0, -1, -1, 1), find all vectors
in 1R4 that are perpendicular to them.
Problem 5.23 Find a basis for the orthogonal complement of the row space of A:
(1) A = [ 13 0268]
2 -1 1 ,
is the closest vector to b among the vectors in C(A). Therefore, a least square
solution Xo E ~n satisfies the following:
Example 5.13 Find all the least square solutions to Ax = b, and then
determine the orthogonal projection be of b into the column space C(A) of
A, where
-2
Solution:
AT A =[-~
2 -1
-3
-1
1
2 -i] [-~ -~ 1
-2
-3
1
[ -24
15
-3
-24
39
3
-3]
3
6
,
-5
and
15
[ -24 - 39
24 -3]
3 [Xl]
X2
[-
01] .
-3 3 6 X3 3
By solving this system of equations (left for an exercise), we obtain all the
least square solutions desired:
190 CHAPTER 5. INNER PRODUCT SPACES
Note that the vector x ~ k [ =~ 1is not in R(A). One needs to do a little
more computation to find a least square solution x E R(A). 0
A= [ 1o 0 2]
-1
2
1
2
-1 ' b =
[3]
-3
O·
-1 2 0 -3
Ax· y = (Axf y = x T AT Y = X· AT y.
Clearly, the two columns of A are linearly independent and C(A) is the xy-
plane. Thus b ~ C(A). Note that
Hence,
be = Ax = [ ~1 2]
~ [14/3]
-1/3 = [ 4]
~ .
o
Problem 5.25 Let W be the subspace of the Euclidean space IR3 spanned by the
vectors Vl = (1, 1, 2) and V2 = (1, 1, -1). Find Projw(b) for b = (1, 3, -2).
Example 5.15 Hooke's law for springs in physics says that for a uniform
spring, the length stretched or compressed is a linear function of the force
applied, that is, the force F applied to the spring is related to the length x
stretched or compressed by the equation
F = a+kx,
in the x F-plane , one can easily recognize that they are not on a straight
line of the form F = a + kx in the xF-plane, which may be caused by
experimental errors. This means that the system of linear equations:
Fl = a + 6.1k 0,
{ F2 = a+ 7.6k 2,
F3 = a+8.7k 4,
F4 = a + 10Ak 6
is inconsistent (i. e., has no solutions so the second equality in each equation
may not be a true equality). Thus, the best thing one can do is to determine
the straight line that "fits" the data, that is, the line that minimizes the sum
of the squares of the vertical distances from the line to the data: i. e., one
needs to minimize
This quantity is simply the square of the distance between the vector b =
(0,2,4,6) in ]R4 and the vectors (H, F2, F3, F4) in the column space C(A) of
the 4 x 2 matrix
A= [ 1
1 6.1
7.6
1
1 8.7 '
1 lOA
194 CHAPTER 5. INNER PRODUCT SPACES
!
data, then we obtain a system of linear equations,
The left side Ax represents the values of the polynomial at Xi'S and the right
side represents the data obtained from the inputs Xi'S in the experiment.
If n ::; k + 1, then the cases have already been discussed in Section 3.8.
In general, this kind of system may be inconsistent (i.e., it may have no
solution) if n > k + 1. This means that there may be no polynomial of
degree k < n - 1 whose graph passes through the n data (Xi, Yi) in the XY-
plane. Practically, it is due to the fact that the experimental data usually
have some errors.
Thus, the best thing we can do is to find the polynomial f(x) that min-
imizes the sum of the squares of the vertical distances between the graph
of the polynomial and the data. In matrix and vector space language, an
inconsistency of the system means that the vector b E ]Rn representing the
data is not in the column space C(A) of the coefficient matrix A. And min-
imizing the sum of the squares of the vertical distances between the graph
of the polynomial and the data means looking for the least square solution
of the system, because for any c E C(A) of the form
1 Xl x2
I
xk
I ao ao + alxl + ... + akxIk
ao + alx2 + ... + ak x 2
x2 k
1 X2 2 x~ al
=c,
1 Xn x2
n
xk
n ak ao + alxn + ... + akXnk
we have
lib - cl1 2 = (YI - ao - alxl - ... - akxt)2 + ...
+(Yn - ao - alX n - ... - akx~)2.
The previous theory says that the orthogonal projection be of b into the
column space of A minimizes this quantity and shows how to find be and a
least square solution Xo.
Example 5.16 Find a straight line Y = a + bx that fits the given experi-
mental data, (1, 0), (2, 3), (3, 4) and (4, 4), that is, a line Y = a + bx that
minimizes the sum of squares of the vertical distances IYi - a - bXi I's from
n
the line Y = a + bx to the data (Xi, Yi). By adapting matrix notation
1 1
1 2
1 3
1 4
x= [ ~1 and b = [
196 CHAPTER 5. INNER PRODUCT SPACES
AT A = [4 10
10 30
1' (AT A)-l = [ _ t~ ~1
~'
ATb
=
[11
34
1.
Hence, we have
an d y = -"21 + 13 . d rme.
. t h e d eSlre
lOX IS o
Problem 5.26 From Newton's second law of motion, a body near the surface of the
earth falls vertically downward according to the equation
1
s(t) = So + vat + 2" g t 2,
where s(t) is the distance that the body traveled in time t, and So, va are the
initial displacement and velocity, respectively, of the body, and g is the gravita-
tional acceleration at the earth's surface. Suppose a weight is released, and the
distances that the body has fallen from some reference point were measured to be
s = -0.18, 0.31, 1.03, 2.48, 3.73 feet at times t = 0.1, 0.2, 0.3, 0.4, 0.5 seconds,
respectively. Determine approximate values of So, va, g using these data.
and
ProjU.
J
0 Proju. =
,
{pO . rOJUj
ifi=l=j,
if i = j.
Problem 5.28 Show that if {VI, V2, ... , v m } is an orthonormal basis for ]Rm, then
VIV! +v2v1 + ... +vmv~ = 1m·
by Corollary 5.24. This means that A(AT A)-l AT is the projection matrix on
the subspace W = C(A). Note that this projection matrix is independent of
the choice of basis for W due to the uniqueness of the matrix representation
of a linear transformation with respect to a fixed basis. Some possible simple
computations for the matrix A(AT A)-I AT will follow later. This argument
proves the following theorem.
P A(AT A)-lAT
1[ 03 1]
- 2
14 -1 0
-1 [102 13 -3
2 6] ,
14 6 -3 5
and
-
x-A b-
T - [-- u.f: --] [ b.:l ] [(Ul., b) ]
:'
-- u Tn -- bn (u n, b)
Corollary 5.26 Suppose that the column vectors {UI, ... , un} of A form
an orthonormal basis for W in ]Rm. Then we get
P = A(AT A)-l AT = AAT = UIU[ + U2U§ + ... + unU~.
In particular, if A is an m x m orthogonal matrix, then, for all b E ]Rm,
Ax = b has the unique solution x = A-Ib = ATb.
UT
n
[ UI ... Un 1 [ uTX 1
UTX
n
= (UIU[ + U2U§ + ... + Unu~)X,
where each uT x is a scalar as the inner product of Ui and x. o
Example 5.18 If A = [CI C2], where CI = (1, 0, 0), C2 = (0, 1, 0), then
the column vectors of A are orthonormal, C(A) is the xy-plane, and the
projection of b = (x, y, z) E ]R3 onto C(A) is be = (x, y, 0). In fact,
o
Before discussing the computation of P = [Projwla with a general basis
for W, we exhibit a criterion for a square matrix to be a projection matrix.
for i = 1, ... , m. Then each Pi is the projection of JRm onto the i-th axis,
whose matrix form looks like
o o
o 1
1 o
o 1
o o
When we restrict the image to JR, Pi is an element in the dual space JRn *, and
usually denoted by Xi as the i-th coordinate function (see Example 4.25).
Problem 5.29 Show that any square matrix P that satisfies pT P = P is a projection
matrix.
CI
(C2' ql)
C2 - (ql, ql) ql
(C n , qn-I) (c n , ql)
qn = C n- qn-I - ... - ql,
(qn-I, qn-I) (ql, ql)
gives an orthogonal basis {ql, ... , qn} for C(A). By taking normalization
of these vectors, we obtain an orthonormal basis {UI' ... , un} for C(A),
where Ui = qdllqill. Rewriting these equations gives us
IlqIllul
b2I UI + b22 U 2
where aij = {~:',~~~ for i > j, aii = 1, and bij = aijll%11 for i 2: j. Hence,
bl l b2I bnl
0 b22 bn2
A = [CI ... c n ] = [UI ... un] =QR.
0 0 bnn
I
The matrix Q = [UI ... un] is an m x n matrix with orthonormal columns,
called the orthogonal part of A, and
[ 1101
1 0]
A = [CI C2 C3] = 0 1 1 .
001
Solution: We first find the decomposition of A into Q and R, the orthog-
onal part and the upper triangular part:
Problem 5.30 Find the 2 x 2 matrix P that projects the xy-plane onto the line
n
y =x.
Problem 5.31 Find the projection matrix P of]R.3 onto the column space C(A) for
A~ [i
Problem 5.32 Find the matrix for orthogonal projection from ]R.3 to the plane
spanned by the vectors (1, 1, 1) and (1, 0, 2).
Problem 5.33 Find the projection matrix P on the Xll X2, X4 coordinate subspace
of ]R.4.
5.11 Exercises
5.1. Decide which of the following functions on ]R.2 are inner products and which
are not. For x = (Xl, X2), Y = (Yl, Y2),
(1) (x,y) = XlYlX2Y2,
(2) (x, y) = 4XlYl + 4X2Y2 - XlY2 - X2Yl,
(3) (x,y) = XlY2 - X2Yl,
(4) (x,y) = XlYl + 3X2Y2,
(5) (x, y) = XlYl - XlY2 - X2Yl + 3X2Y2.
5.2. Show that the function (A, B) = treAT B) for A, B E Mnxn(]R.) defines an
inner product on Mnxn(]R.).
5.3. Find the angle between the vectors (4, 7, 9, 1, 3) and (2, 1, 1, 6, 8) in ]R.5.
5.4. Determine the values of k so that the given vectors are orthogonal with respect
to the Euclidean inner product in ]R.4.
5.5. Consider the space e[O, 1] with the inner product defined by
(/,g) = 11 f(x)g(x)dx.
Compute the length of each vector and the cosine of the angle between each
pair of vectors in each of the following:
5.11. EXERCISES 205
for any real numbers aI, a2, ... , an. When does equality hold?
5.7. Let V = P2([0, 1]), the space of polynomials of degree::; 2 on [0, 1]. Equip
V with the inner product
(1, g) = 11 f(t)g(t)dt.
5.8. Find an orthonormal basis for ]R3 with the Euclidean inner product by ap-
plying the Gram-Schmidt orthogonalization to the vectors x = (1, 0, 1),
X2 = (1, 0, -1), X3 = (0, 3, 4).
(3)
[ 1jv'2
° -1/v'2 ° -1/v'2]
1/v'2 , (4)
[ 1jv'2
1/~
1/V3
-1/V3
-1/v'G]
1/V6 .
-1/v'2 1/v'2
° 1/V3 2/V6
206 CHAPTER 5. INNER PRODUCT SPACES
5.13. Consider]R4 with the Euclidean inner product. Let W be the subspace of]R4
consisting of all vectors that are orthogonal to both x = (1, 0, -1, 1) and
y = (2, 3, -1 , 2) . Find a basis for W.
5.14. Let V be an inner product space. For vectors x and y in V , establish the
following identities:
(1) (x,y) = 111x + Yl12 -111 x _ Yl12 (polarization identity) ,
(2) (x,y) = ~ (11x + Yl12 -lIxll 2-llyI12) (Polarization identity) ,
(3) Ilx + Yl12 + Ilx - Yl12 = 2(llx11 2+ IIYI12) (parallelogram equality).
vol(A) = Vdet(AT A) .
5.17. Find the volume of the three-dimensional tetrahedron in ]R4 whose vertices
are at (0, 0,0, 0), (1,0,0, 0) , (0,1,2, 2) and (0,0,1, 2).
5.18. For an orthogonal matrix A , show that det A = ±1. Give an example of an
orthogonal matrix A for which det A = -1.
5.11. EXERCISES 207
5.19. Find orthonormal bases for the row space and the null space of each of the
n
following matrices.
(1) [ 21 41 3]
1 , (2) [ -21 -34 0]
1 ,
[1 ~ ~
201 o 0 2 (3)
(2) [ 21 41 1]
1 ' (3) [~~ ~ ].
2 3 -1
5.25. Consider the space C[-l, 1J with the inner product defined by
(1) Two vectors x and y in an inner product space are linearly independent
if and only if the angle between x and y is not zero.
(2) If V is perpendicular to W, then V..l is perpendicular to W..l.
(3) Every permutation matrix is an orthogonal matrix.
(4) The projection of]Rm on a subspace W is a linear transformation of]Rm
into itself.
(5) Two different subspaces of]Rm may have the same projection matrix.
(6) An n x n symmetric matrix A is a projection matrix if and only if
A2 =1.
(7) For any m x n matrix A and b E ]Rm, AT Ax = ATb always has a
solution.
(8) An inner product can be defined on every vector space.
(9) Let V be an inner product space. Then Ilx - yll ~ Ilxll - Ilyll for any
vectors x and y in V.
(10) The least square solution of Ax = b is unique for any symmetric matrix
A.
(11) Every system of linear equations has a least square solution.
(12) The least square solution of Ax = b is the orthogonal projection of b
on the column space A.
Chapter 6
6.1 Introduction
Gaussian elimination plays a fundamental role in solving a system Ax = b
of linear equations. In order to solve a system of linear equations, Gaussian
elimination reduces the augmented matrix to a (reduced) row-echelon form
by using elementary row operations that preserve row and null spaces.
In this chapter, as another method of simplifying a square matrix, we
examine which matrices can be similar to diagonal matrices, and what the
transition matrices are in this case. The tools are eigenvalues and eigenvec-
tors. In fact, they play important roles in their own right in mathematics
and have far-reaching applications not only in mathematics, but also other
fields of science and engineering. Some specific applications with a square
matrix A are (1) solving systems Ax = b of linear equations, (2) checking
the invertibility of A or estimation of det A, (3) calculating a power An or
the limit of a matrix series 2:~1 An, (4) solving systems of linear differential
equations or difference equations, (5) finding a simple form of the matrix
representation of a linear transformation, etc. One might notice that some
of the problems listed above are easy if A is diagonal.
We begin by introducing eigenvalues and eigenvectors of a square matrix
A. For an n x n square matrix A, there may exist a nonzero vector that is
transformed by A into a scalar multiple of itself.
209
210 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
Ax = AX.
det(AI - A) = o.
Note that the left side is a polynomial of degree n in A, called the charac-
teristic polynomial of A. Thus the eigenvalues are simply the roots of the
equation det(AI - A) = O.
Thus, to find eigenvectors of A, first find the roots (or eigenvalues of A) of
the equation det(AI - A) = 0, and then solve the homogeneous system (AI -
A)x = 0 for each eigenvalue A. In summary, by referring to Theorem 3.25
we have the following theorem.
A= [ 2 J2 J2] 1 .
6.1. INTRODUCTION 211
det(AI - A) = det [ A - 2
-J2 -J2] = A2 -
A-I 3A = A(A - 3),
0,
0,
Xl - J2 X2 = 0, In
{
-J2 Xl + 2 X2 = 0, or Xl = Y 2 X2·
Thus, by a similar calculation, X2 = (J2, 1) is an eigenvector belonging to
A2 = 3 and the eigenvectors of A belonging to A2 = 3 are the nonzero vectors
of the form tX2, t E R Note that the eigenvectors Xl and X2 belonging to
the eigenvalues Al and A2 are linearly independent. 0
A =
3 -2 0
[ -2 3 0
1.
005
2
A-3
o
212 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
for 8, t E lR. Since (-1, 1, 0) and (0, 0, 1) are linearly independent, they
form a basis for the eigenspace E(A2) belonging to A2 = 5. 0
Example 6.3 Consider the matrix A = [~ ~]. Like the above example,
a simple computation shows that the characteristic polynomial is (A - 1)2
so that the eigenvalue A = 1 is of multiplicity 2. However, there is only one
linearly independent eigenvector x = (1, 0) belonging to A = 1. This kind
of matrix will be discussed in Chapter 7. 0
Note that the equation det(AI - A) = 0 may have complex roots, which
are called complex eigenvalues. However, the complex numbers are not
scalars of the real vector space. In many cases, it is necessary to deal with
those complex numbers, that is, we need to expand the set of scalars to
6.1. INTRODUCTION 213
include complex numbers. This expansion of the set of scalars to the set of
complex numbers leads us to work with complex vector spaces, which will
be treated in the next chapter. In this chapter, we restrict the discussion to
the case of real eigenvalues, even though the entire discussion in this chapter
applies in the same way to general complex vector spaces.
A = [ C?s B - sin B
smB cosB
1
is >.2 -2 cos B>'+ (cos 2 B+sin2 B). Thus, the eigenvalues are >'i = cos B±i sin B,
which are complex numbers, so this matrix as a rotation of]R2 has no real
eigenvalues unless B = mr, n = 0, ±1, ±2, .... 0
o *
>. - ann
Therefore, similar matrices have the same characteristic polynomial with the
same roots. In particular, the eigenvalues are invariant under the similarity.
However, their eigenvectors might be different: When B = Q-1 AQ, x is
an eigenvector of B belonging to A if and only if Qx is an eigenvector of A
belonging to A, since AQ = QB, and QBx = AQX.
(3) If an n x n matrix A has n eigenvalues AI, ... , An counting multi-
plicities, then the characteristic polynomial of A can be factorized as
Corollary 6.3 The determinant and the trace of A are invariant under sim-
ilarity.
Corollary 6.4 For any n x n matrices A and B, the following are equivalent.
(1) Zero is an eigenvalue of AB.
(2) A or B is singular.
(3) Zero is an eigenvalue of BA.
Corollary 6.5 For any n x n matrices A and B, AB and B A have the same
eigenvalues.
Problem 6.3 Find the matrices A and B such that det A = det B, trA = trB, but
A is not similar to B.
Problem 6.4 Let A1, A2, ... , An be the eigenvalues of an n X n matrix A. Then
(1) A is invertible if and only if Ai =1= 0, for all i = 1, 2, ... , n.
1 1 1
(2) If A is invertible, then the inverse A -1 has eigenvalues A1' A2' ... , An'
Problem 6.5 Show that A and AT have the same eigenvalues. Do they necessarily
have the same eigenvectors?
Problem 6.6 For any n x n matrices A and B, show that AB and BA are similar if
A or B is nonsingular.
216 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
or, equivalently, AQ = QD. Let Xl, ... , Xn denote the column vectors of
Q. Since
AQ = [AXI AX2 ... Axnl
QD = [AlXl A2X2 ... AnXn],
with Xj as the j-th column vector, then the same equation shows AQ =
QD, where D is the diagonal matrix having the eigenvalues AI, ... , An
on the diagonal. Since the column vectors of Q are assumed to be linearly
independent, Q is invertible, so Q-l AQ = D. 0
for some invertible matrix Q, and then A must be the zero matrix. Since
A is not the zero matrix, no invertible matrix Q can be achieved so that
Q-1 AQ is diagonal.
n
accordance with the order of the eigenvectors in the transition matrix. For
example, let
s~ [~ ~
Then S will diagonalize A, because it has linearly independent eigenvectors
as columns, but we find that
Problem 6.8 Find a 2 x 2 matrix A whose eigenvalues are 2 and 3, and whose
eigenvectors are (2, 1) and (3, 2), respectively.
Theorem 6.6 shows how to diagonalize a matrix and what the diagonal
matrix is when the matrix has a full set of linearly independent eigenvec-
tors. We next consider when a square matrix A can have a full set of linearly
independent eigenvectors. This problem is closely related to the eigenval-
ues of the matrix, because eigenvectors can be found practically after the
eigenvalues have been computed.
The following theorem shows that if an n x n matrix has n distinct (real)
eigenvalues, then it has n linearly independent eigenvectors so that it is
always diagonalizable and the diagonal matrix has the eigenvalues on the
main diagonal.
220 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
Theorem 6.7 Let ),1, ),2, ... , ),k be distinct eigenvalues of a matrix A
and Xl, X2, ... , Xk eigenvectors belonging to them, respectively. Then
{Xl, X2, ... , xd is linearly independent.
Proof: Let r be the largest integer such that {Xl, ... , x r } is linearly
independent. If r = k, then there is nothing to prove. Suppose not, i.e.,
1 ::; r < k. Then {Xl, ... , Xr+l} is linearly dependent. Thus, there are
scalars CI, C2, ... , Cr+l with Cr+l =1= 0 such that
ClXl + C2X2 + ... + Cr+lXr+l = O. (1)
Multiplying both sides by A, and, using
we get
Cl),lXl + C2),2X2 + ... + Cr+l)'r+lXr+l = O. (2)
Multiplying both sides of (1) bY),r+! and subtracting the resulting equation
from (2) yields
Cl (),l - ),r+l)Xl + C2(),2 - ),r+!)X2 + ... + cr(),r - ),r+dxr = o.
Since {Xl, X2, ... , X r } is linearly independent and ),1, ),2, ... , )'r+l are
all distinct, it follows that Cl = C2 = '" = Cr = O. Substituting these values
in (1) yields Cr +! = 0, which is a contradiction to the assumption. 0
It follows from Theorem 6.8 that if Xl, X2, ... , Xn are eigenvectors of an
nxn matrix A belonging to n distinct eigenvalues ),1, ),2, ... , )'n, respectively,
then they form a basis for IRn and the matrix representation of A with respect
to this basis should be a diagonal matrix as shown in Remark (2) on page 217.
Of course, some matrices can have eigenvalues with multiplicities > 1
so that the number of distinct eigenvalues is strictly less than n. In this
case, if such a matrix still has n linearly independent eigenvectors, then it
is also diagonalizable, because for a diagonalization all we need is n linearly
independent eigenvectors. In some cases, such a matrix does not have n
linearly independent eigenvectors (see Example 6.3), so a diagonalization is
impossible. This case will be discussed in Section 9.1.
6.3. APPLICATION: DIFFERENCE EQUATIONS 221
Q-1 AQ = [ 5 0 ]
o -2
Therefore,
5100 0 ]
A lOO Q [ 0 (_2)100 Q-1
Xn pairs plus the number of offspring of the Xn-l pairs who were alive at
n - 1 months. Therefore, we have for n ~ 2,
Xn+l = Xn + Xn-l.
It is convenient to set XQ = o. Hence the first several terms of the sequence
become
0, 1, 1, 2, 3, 5, 8, 13, ... ,
which is called the Fibonacci sequence; each term is called a Fibonacci
number.
In general, the linear difference equation (or recurrence relation)
of order k is written as
{
Xn+l + Xn-l
Xn
[ Xn+l
Xn
1= [Ill [ l'
1 0
Xn
Xn-l
xn = AXn-l = A n xQ , n = 1, 2, ... ,
6.3. APPLICATION: DIFFERENCE EQUATIONS 223
_1-)5]
2 .
1+)5
2
1+~)5
With D = [
l-~v?l
e-ov?f 1
2 Q-.
1
Therefore, we get
X1997
=~
vis
((1 +2vis) 1997 _ (1_vls)1997)
2 .
e- v?t
Note that since X1997 must be an integer, we look for the nearest integer
to this huge number. Since < ~, actually very small, for large k,
, ,,,,)1997 . Historically, the number
it must be the integer nearest to Js (¥
2
)..k
I
~ QDkQ-lxo ~ [ ~l
)..k
2
Xk V2 Vn
1 0
Hence, if I)..il < 1 for all i, then the vector Xk must approach the zero vector
as k increases. On the other hand, if there exists an eigenvalue )..i with
I)..il > 1, this vector Xk may grow exponentially in magnitude.
Therefore, we have three possible cases for the process given by Xk
AXk-l, k = 1,2,···. The process is said to be
Example 6.8 (Markov process) We start with Xo people outside a big city
and YO people inside the city. Suppose that each year 20% of the people
6.3. APPLICATION: DIFFERENCE EQUATIONS 225
outside the city move in, and 10% of the people inside move out. Then we
are concerned about the "eventual" distribution of the population.
At the end of the first year, the distribution of the population will be
Xl - 0.8xo + [Link]
{
YI : 0. 2xo + 0.9yo.
Or, in matrix form,
[ Xl
YI
1= [0.8 0.1
0.2 0.9
1[ XoYo 1= Axo.
Thus if Xn = (xn, Yn) denotes the distribution of the population after n
years in the country in where the city is, we get Xn = Anxo.
In this formulation, the problem can be summarized as follows:
(1) The total number of people stays fixed.
(2) The numbers Xn , Yn can never become negative.
A process having these two properties is called a Markov process. In
general, a Markov process is a repeated application of a matrix A to an
initial state Xo, where the matrix A satisfies the following two conditions:
(1') entries of each column of A add up to 1;
(2') entries of A are all nonnegative so that the powers An are all nonneg-
ative.
Such a matrix A is called a stochastic matrix.
Now, to solve the problem, we first find the eigenvalues and eigenvectors
of A. They are Al = 1, A2 = 0.7 and VI = (1, 2), V2 = (-1, 1), respectively,
so that
1 -1
Q= [ 2 1
1' and Q-1 =:31 [ -21 1
1 1.
Hence, Q-1 AQ = [10 0.70 1 = D and
xn Anxo = QDnQ-Ixo
[ ~::~
Zk+l
] = [~:~ ~:~ 0~1] [ ~: ].
0.1 0.2 0.9 Zk
Find the land use in the city after 50 years. This problem has two essential prop-
erties of Markov process: The total area of the city stays fixed, and each portion of
area can never become negative.
Problem 6.12 A car rental company has three branch offices in different cities.
When a car is rented at one of the offices, it may be returned to any of three
offices. This company started business with 900 cars, and initially an equal number
of cars was distributed to each office. When the week-by-week distribution of cars
is governed by a stochastic matrix
0.6 0.1 0.2]
A = [ 0.2 0.2 0.2 ,
0.2 0.7 0.6
determine the number of cars at each office in the k-th week. Also, find lim Ak.
k-oc
r
= anYl + a12Y2 + ... + alnYn
Y2 = a21Yl + a22Y2 + ... + a2nYn
y~(t)
If A is the matrix of the coefficient in the system, then the matrix form of
the system is written as
y'(t) = Ay(t),
with an initial condition Yo = y(O) = (d 1 , ... , dn ) E IRn. It is well-known
that this system has a unique solution y(t) depending on an initial condition
Yo by the fundamental theorem of ordinary differential equations.
In a more general type of system of linear differential equations, the
entries of the coefficient matrix could be functions. However, for our purpose
we restrict our attention to systems with constant coefficients.
Example 6.9 Consider the following three systems:
Yll (t)
Y2l(t)
Yl (t) =
y'(t) =[ yi(t)
Y2(t)
1= [01 °11[ Yl(t)
Y2(t)
1= [ Yd
Y2(t) 1= Ay(t)
t) ,
Yl(t) = [ Yll(t)
Y2l(t)
1= [ ee; l' and Y2(t) = [ Y22(t)-e 1
Y12(t) = [ e t
t
1
are solutions of the system. Suppose ClYl(t) + C2Y2(t) = 0 for all tEl.
yn(t) Y12(t)
Y21(t) Y22(t)
Y(t) = [Yl(t) ... Yn(t)] =
Ynn(t)
Clearly, the n solutions are linearly independent on I if and only if det Y(t) 1=
o for all tEl. However, the following lemma says that det Y (t) 1= 0 for all
tEl if and only if det Y(t) i= 0 at one point tEl. The determinant of Y(t)
is called the Wronskian of the solutions, denoted by W(t) = det Y(t) for
tEl.
Proof:
t t Y~jYij t
Z J
=
Z
(t Y~j[adj
J
YLi)
n
L[Y' . adj Y]ii
tr(Y' . adj Y) = tr(A· (Y . adj Y))
= tr(det Y(t)A) tr(A)W(t),
230 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
where lij (t) is the cofactor of Yij, and the equalities in the last two lines are
due to the fact that
Y'(t) [y~ (t)... y~ (t )1 = A [Yl (t) . . . Yn (t )1 = AY (t ) ,
Y(t) adjY(t) det Y(t)In = W(t)In. o
This lemma shows that the Wronskian W(t) satisfies a differential equa-
tion of the form Y' = ay discussed at the beginning of this section. Thus
it must be an exponential function of the form W(t) = Woe(trA)t with an
initial condition W(O) = Woo This means that the value of W(t) is zero for
all t or never zero on I depending on whether or not Wo = o. Thus, the n
solutions are linearly independent for all tEl if and only if the n solutions
are linearly independent at some tEl, i.e., the linear dependence of the n
solutions may be checked at any convenient point. This again justifies that
it is good enough to begin with a set of n linearly independent initial vectors
to find the general solution of y'(t) = Ay(t). That is, if the n initial vectors
(conditions) YlO(O) = YlO,··· ,YnO(O) = YnO are linearly independent, then
the solutions determined by them form a fundamental set.
y(t) -- [ Yl~t)
: ]_- [ e
A1t
0] [ ~l: ]
Yn(t) 0 eAnt dn
6.5. APPLICATION: DIFFERENTIAL EQUATIONS II 231
Remark: Actually, the above solution is the general solution of the system.
Indeed, if we take n initial conditions to be the standard basis {el' ... , en}
for ]Rn, then for each initial vector ei = (0, ... , 1, ... ,0) for i = 1, ... ,n, we
get the solution Yi(t) = eA;tei . Since the initial conditions Yi(O) = ei are
linearly independent eigenvectors of the diagonal matrix D, the set {yi (t) =
eA;tei: i = 1, 2, ... , n} is a fundamental set. Any initial condition can
be written as
Yo = d1el + ... + dne n ,
so the general solution is of the form
In this equation, the coefficients a and c are the birth rate of x and the death
rate of y, respectively. The nonlinear x(t)y(t) terms in the two equations
mean the interaction of the two species, so the coefficients band d are the
232 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
measures of the effect of the interaction between them. A study of this gen-
eral system of differential equations leads to a very interesting development
in the theory of dynamical systems and can be found in any book on ordi-
nary differential equations. Here, we restrict our study to the case of x and
y very small, i.e., near the origin in the plane. In this case, one can neglect
the nonlinear terms in the equations, so the system is assumed to be given
as follows:
[
X' (t)
y'(t)
1 [a0 0
-c
1[ x(y(t)·
t) 1
Thus the eigenvalues are >'1 = a, A2 = -C, and their eigenvectors el, e2,
respectively. Thus, its general solution is
Remark: Note that each vector function Yi(t) = eAitvi is the solution of the
system determined by the initial condition Yi(O) = Vi, for i = 1, ... ,n, which
are linearly independent eigenvectors of A. Hence, they form a fundamental
set of solutions.
Thus, we have obtained the following theorem:
4Y2 + 4Y3
11Y2 + 12Y3
4Y2 + 5Y3,
and also find its particular solution satisfying the initial conditions Yl (0) = 0,
Y2(0) = 3 and Y3(0) = 2.
Solution: In matrix form, the system may be written as y' = Ay with
-11
-4 4] 12 .
-4 5
234 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
(1) The eigenvalues of A are >'1 = .A.2 = 1, and .A.3 = -3, and their
eigenvectors are Yl = (1,1,0), Y2 = (-1,0,1) and Y3 = (1,3,1), respectively,
which are clearly linearly independent[ (011see !;Ob1e]m 6.9).
311
(2) The matrix Q = [Yl Y2 V3] = 0 diagonalizes A:
1
[ ~~ ] = [e~0 e~0 ~
X3 e- 3t
] [ ~
c
] [~;:].
ce- 3t
o
6.6. EXPONENTIAL MATRICES 235
Y' 4Yl + Y3
Problem 6.14 Solve the system { y~ = -2Yl + Y2
y~ -2Yl + Y3,
and find the particular solution of the system satisfying the initial conditions Yl (0) =
-1, Y2(0) = 1, Y3(0) = O.
yi = Yl - Y2
Problem 6.15 Solve the system { y~ = 3Yl
y~ = 2Yl + Y2
with initial conditions Yl (0) = 0, Y2(0) = 2, Y3(0) = 1.
and Dk = [ ).~ 0 l.
o ).k
n
1 2 1 3
I+D+-D +-D + ...
2! 3!
o
Definition 6.4 A sequence of matrices AI, A2, A3, ... of the same size is
said to converge to a matrix L if each sequence of the (i, j)-entries of AI,
A2, A3, ... converges to the (i, j)-entry of L for all i, j. In this case, we
write
L = lim Ak.
k-+oo
Theorem 6.11 Let AI, A2, A3, ... be a sequence of m x n matrices such
that lim Ak = L. Then
k-+oo
for any matrices Band C for which the products are defined.
6.6. EXPONENTIAL MATRICES 237
= 2)BlidLl€j = [BLl ij ,
€=1
Problem 6.16 Let A = [~ i]. Find kl2..~ Ak if it exists. (Note that the matrix
A is not diagonalizable.)
Ao + Al + A2 + ... = L Ak = L.
k=O
Example 6.14 The sequence of the partial sums of a geometric series
AO+Al+A2+ ...
m
of a square matrix A is {8m = L Ak}, and so if A = QDQ-l is diagonaliz-
k=O
able, then
m m
L QDkQ-l = Q(L Dk)Q-l
k=O k=O
238 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
Thus, the sequence converges if and only if IAil < 1 for all i. D
The existence of e A for any square matrix A can easily be shown as follows:
Let M be a number such that laijl ~ M for all (i,j)-entries aij of A (such
a number exists since A has only finite number of entries). Then all (i,j)-
entries of Ak are bounded by n k - 1 Mk for all k, and hence each entry of e A
is bounded by
~~
~ kIn
k-1Mk _
-
~ enM ,
k=O • n
L
00 Ak
so by the comparison test, each entry of e A = kI is absolutely convergent
n=O •
for any square matrix A.
Again, if A is diagonalizable with Q-l AQ = D, then by Theorem 6.11,
A QDQ-l
e = e
It is a good exercise to calculate the missing entry * directly from the defi-
~~. 0
e A = e2 1(1 + N) = e ~ ~ ] [ e;
2 [
3e 2
].
n
e2 0
[~
3
Problem 6.19 Compute e A for A = 2
0
240 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
Lemma 6.13 (1) Let A(t) and B(t) be matrices, whose entries are all
differentiable functions in t, such that their product is defined. Then
A(t)B(t) is differentiable, and its derivative is
(2) For any t E lR. and any square matrix A, the exponential matrix
e
~ =
~ 2 ~ 3
1+ tA + -A + -A + ...
2! 3!
d
is a differentiable function of t, and dt etA = Ae tA .
d
_etA = lim e(t+h)A - etA
= lim (e hA
-
I) etA
dt h~O h h---+O h '
since (tA)(hA) = (hA)(tA). One can now easily show limh~O eh~-I = A. 0
Lemma 6.13 implies that y = etAyo is the solution of the system y' = Ay.
In particular, if A is diagonalizable, say Q-l AQ = D, then from (2) in
Theorem 6.12 section we get
Yo = [ ~ ].
Solution: First note that A has an eigenvalue A of multiplicity 2 and is
not diagonalizable. Now we write it as
A = [~ ~] + [~ ~] = AI + N.
Then, by the same argument as in Example 6.16,
Example 6.18 Find the general solution of the system y' = Ay, where
Solution: Note that the eigenvalues of A are a ± ib, which are not real
unless b = 0, in which case the matrix is already in diagonal form. If b =I 0,
the diagonalization discussed in this section does not apply since we have
complex eigenvalues which are going to be discussed in Chapter 9. However,
242 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
A = [ ab -b
a 1= a [10 0
1 1+ b [01 -10 1= aI + bJ,
[
1 - (bt)2
2! 4!
+ (bt)4
_... -(bt) + (bt)3
3!
- (bt)5
5!
+ ... j
(bt)3 (bt)5 (bt)2 (bt)4
(bt)- - + - - ... 1- - + - - ...
3! 5! 2! 4!
[
COS
sinbt
bt - sin bt
cosbt
1
for any constant band t. Thus, the general solution of y' = Ay is
- sin bt
cos bt
1[ [Link]
C2
In terms of components,
Problem 6.20 Solve the system y' = Ay with initial condition y(O) = Yo by com-
!
puting etAyo for
=: 1' ~
1
(l)A=[~ Yo [ : l, (2) A ~[ o
o
6.8. DIAGONALIZATION OF LINEAR TRANSFORMATIONS 243
~I~ ~I~
]Rn • ]Rn.
A = [Tl et
Let A be an eigenvalue of A (also of T). Then, X = (Xl, X2, ... , Xn) E
]Rn is an eigenvector of the matrix A belonging to A (Ax = AX) if and only
if (p-l(x) = v = Xlvl + X2v2 + ... + XnVn E V is an eigenvector of T
(T(v) = AV), because the commutativity of the diagram shows
Example 6.19 Let T : P2(~) --+ P2(~) be the linear transformation defined
by
(TJ)(x) = f(x) + xJ'(x) + J'(x).
Find a basis for P2(~) with respect to which the matrix of T is diagonal.
Solution: First of all, we find the eigenvalues and the eigenvectors of T.
Take a basis for the vector space P2(~)' say 0: = {I, x, x 2}. Then the
matrix of T with respect to 0: is
1 1 0
[TJo = [ 0 2 2
1,
003
o
6.9. EXERCISES 245
Problem 6.22 Let M 2x2 (lR) be the vector space of all real 2 x 2 matrices and let T
be the linear transformation on M 2x2 (lR) defined by
6.9 Exercises
6.l. Find the eigenvalues and eigenvectors for the given matrix, if they exist.
1
[~
l n l J
-4 -1
(1) [-~ ~], (2) 23,
1 3
1 1
01 01 10
l
1 1
(3) (4)
010 1 1
101 1 1 1
l~
0 -5
(5)
-12 -12 -1 -10 1 (6)
1 o 0 01 .
0 -1 2 -1 ' 0 1 -2
-1 0 -1 2 0 2 1
246 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
n
6.2. Find the characteristic polynomial, eigenvalues and eigenvectors of the matrix
A~ [-~ j
6.3. Show that a 2 x 2 matrix A = [~ !] has
°
6.10. Show that if A is an eigenvalue of an idempotent n x n matrix A (i. e., A 2 = A),
then A must be either or l.
6.11. Prove that if A is an idempotent matrix, then tr A = rank A.
6.12. Let A = [aij] be an n x n matrix with eigenvalues AI, ... , An. Show that
6.13. Prove that if two diagonalizable matrices A and B have the same eigenvec-
tors (i. e., there exists an invertible matrix Q such that both Q-1 AQ and
Q-1 BQ are diagonal; such matrices A and B are said to be simultaneously
diagonalizable), then AB = BA. In fact, the converse is also true. (See
Exercise 7.17.) Prove the converse with an assumption that the eigenvalues
of A are all distinct.
6.9. EXERCISES 247
000 o -Cn
100 o -Cn-l
o 0 0 1 -Cl
istic polynomial p('\) = .An + Cl.A n - 1 + ... + Cn-l.A + Cn.
(This shows that every monic polynomial is the characteristic polynomial of
some matrix. The matrix A is called the companion matrix of p(,\).)
6.15. Let D : P3(lR) -+ P3(lR) be the differentiation defined by Df(x) = J'(x) for
f E P3(lR). Find all eigenvalues and eigenvectors of D and of D2.
6.16. Let T: P2(lR) -+ P2(lR) be the linear transformation defined by
T(a2x2 + alx + ao) = (ao + adx2 + (al + a2)x + (ao + a2).
Find a basis for P2 (lR) with respect to which the matrix representation for T
is diagonal.
-n [~ ~ n
6.17. Determine whether or not each of the following matrices is diagonalizable.
3 0
(2) (3) [ o 2
-2 0
6.18. Find an orthogonal matrix Q and a diagonal matrix D such that QT AQ =D
for
o 0 0 0 1 1 1 0
o 0 0 0 o 1 1 1
o 0 0 0 001 1
is the n x n {O, 1}-matrix with 1 's on the main diagonal and its two parallel
side diagonals.
248 CHAPTER 6. EIGENVECTORS AND EIGENVALUES
(1) A =
(2) A
[ -6
= [
-1
2 -12
24
8
~ -~ ]
-nand y(O)
and
= [ ]
rn
y(l)
~
=
.
6.28. Let f(A) = det(AI -A) be the characteristic polynomial of A. Evaluate f(A)
for
(1) A = [;
1 1 3
! ~] (2) A = [
-1 1
;
4
~ -~].
In fact, f(A) = 0 for any square matrix A and its characteristic polynomial
f(A) (this is the Cayley-Hamilton theorem).
6.29. Determine whether the following statements are true or false, in general, and
justify your answers.
6.9. EXERCISES 249
7.1 Introduction
A real matrix has real coefficients in its characteristic polynomial, but the
251
252 CHAPTER 7. COMPLEX VECTOR SPACES
Thus the same is true for the linear independence, spanning spaces, basis,
dimension, and subspace. For complex matrices, whose entries are complex
numbers, the matrix sum and product follow the same rules as real matrices.
The same is true for the concept of a linear transformation T : V ----. W from
a complex vector space V to a complex vector space W. The definitions
of the kernel and the image of a linear transformation remain the same as
those in the real case, as well as the facts about null spaces, column spaces,
matrix representations of linear transformations, and similarity.
However, if we are concerned about the inner product, there should be a
modification from the real case. Note that the absolute value (or modulus)
of a complex number Z = x + iy is defined as the nonnegative real number
Izi = (zz)~ = Jx 2 + y2, where z is the complex conjugate of z. Accordingly,
the length of a vector z = (ZI' ... , zn) in the n-space en with Zk = Xk +
iYk E e has to be modified: if we would take an inner product in en as
IIzl12 = zf + ... + z;, then a nonzero vector (1, i) in e 2 would have zero
length: 12 + i = o. In any case, a modified definition should coincide with
2
the old definition, when the vectors and matrices were real. The following
is the usual definition of an inner product in en.
Definition 7.1 If u = [UI U2 ... un]T and v = [VI v2 ... Vnr are vectors
in the n-space en with Uk, Vk E e, then their inner product u . v is defined
by
U· v = UIVI + U2V2 + ... + UnV n = ITT v,
But these two different definitions do not induce any essential difference in
a complex vector space.
In a complex inner product space, as in a real inner product space, the
length (or magnitude) of a vector u and the distance between two vectors
u and v are defined by
1
Ilull = (U,U)2, d(u,v) = Ilu-vll,
respectively.
Example 7.1 Let Cda, b] denote the set of all complex-valued continuous
functions defined on [a, b]. Thus an element in Cda, b] is of the form
f(x) = hex) + ih(x), where hex) and hex) are real-valued continuous
on [a, b]. Note that f is continuous if and only if each component function
Ii is continuous. From the theory of continuous functions, it is quite easy
to see that Cda, b] is a complex vector space under the sum and scalar
254 CHAPTER 7. COMPLEX VECTOR SPACES
(f, g) lb f(x)g(x)dx
lb [J1(X)gl(X) + h(X)g2(X)] dx
+i lb [h (X)g2(X) - h(X)gl (x)] dx. o
Problem 1.1 Show that the Euclidean inner product on en satisfies all the inner
product axioms.
UI (i, i, i) (i i i)
VI = IluI11 = J3 = J3' J3' J3 .
Step 2. Let WI denote the subspace spanned by VI. Then
Therefore,
U2 - Projw1 U2 3 ( 2i i i) (2i i i)
V2 = II u 2 - Projw1 u211 = y'6 -"3' 3' 3 = - y'6' y'6' y'6 .
Step 3. Let W 2 denote the subspace spanned by {VI, V2}. Then
U3 - Projw2 U3
= U3 - (U3, VI)Vl - (U3, V2)V2
. 1 (i i i) 1 ( 2i i i)
= (0, 0, z) - J3 J3' J3' J3 - y'6 - y'6' y'6' y'6
(0, -~, ~).
Therefore,
~) = (0, ~, ~).
256 CHAPTER 7. COMPLEX VECTOR SPACES
Thus,
0, if k "" f,
{
271" , if k = f.
Thus, the vectors in Ware orthogonal and each vector has length .;21r. By
normalizing each vector in the orthogonal set W, we obtain an orthonormal
set. Therefore, the vectors
1 ·k
fk{x) = .;21re t x, k = 0, ±1, ±2, ... ,
A = [ C?s (} - sin (}
sm (} cos(}
1
with real entries. This matrix has two complex eigenvalues for any (} E ~,
but no real eigenvalues unless (} = hr, for an integer k.
Therefore, the theorems in Chapter 6 regarding eigenvalues and eigenvec-
tors remain true without requiring the existence of n eigenvalues explicitly,
and exactly the same proofs as in the real case are valid since the argument
in the proofs is not concerned with what the scalars are. For example, one
can have a theorem like "for an n x n matrix A, the eigenvectors belonging
to distinct eigenvalues are linearly independent", and "if the n eigenvalues
of A are distinct, then the eigenvectors belonging to them form a basis for
en so that A is diagonalizable".
An n x n real matrix A can be considered as a linear transformation on
both ~n and en:
Since the entries are all real, the coefficients of the characteristic polynomial
of A, p(A) = det(AI - A), are all real. Thus, if A is a root of p(A) = 0, then
its conjugate). is also a root because p().) = p(A) = O. In particular, any
n x n real matrix A has at least one real eigenvalue if n is odd.
Moreover, if x is an eigenvector belonging to the complex eigenvalue A,
then the complex conjugate x is an eigenvector belonging to).. In fact, if
Ax = AX with x :I 0, then
where x denotes the vector whose entries are the complex conjugates of the
corresponding entries in x.
Using this fact, the following example shows that any 2 x 2 matrix with
no real eigenvalues can be written as a scalar multiple of a rotation.
258 CHAPTER 7. COMPLEX VECTOR SPACES
= A(U~iV)+A(U~iV)
= au - bv,
and similarly Av = bu + avo It implies that the matrix representation of the
linear transformation A : ]R2 ---t ]R2 with respect to the basis a is
[
r cos () r sin ()
-rsin() rcos()
1. o
Problem 7.3 Let x and y be two vectors in a vector space V. Show that x and y
are linearly independent if and only if x + y and x - yare linearly independent.
Problem 7.4 Find the eigenvalues and the eigenvectors of
Note that A is the matrix whose entries are the complex conjugates of
the corresponding entries in A. Thus, [aij]H = [aji]. With this notation, the
Euclidean inner product on en can be written as
U· v = liT V = uHv.
Problem 7.6 For any matrices A and B such that AB is defined, show that (AB)H =
BHAH.
A =[ 4 -2i 4+i]
3 '
B= [ -1 i + i 1+i]
-i .
for any x E en and y E em. The following theorem lists some important
properties of Hermitian matrices.
260 CHAPTER 7. COMPLEX VECTOR SPACES
Remark: Condition (1) in Theorem 7.1 (i.e., x H Ax is real for any complex
vector x E en) is equivalent to saying that the diagonals of A are real:
LaijXiXj
i,j
where C = Li<j aijXiXj. Since C+C E JR, all aii E JR if and only ifx H Ax E JR
for any x E en.
Problem 7.8 Prove that the determinant of any Hermitian matrix is real.
Problem 7.9 Let x be a nonzero vector in the complex vector space en, and A =
xx H . Show that A is Hermitian, and find all the eigenvalues and their eigenspaces
for A.
7.2. HERMITIAN AND UNITARY MATRICES 261
Problem 7.10 Prove Theorem 7.2 by using Theorem 7.1, and prove (3) directly.
Lemma 7.3 For a complex square matrix U, the following are equivalent:
(1) the column vectors of U are orthonormal;
(2) UHU = I;
(3) U-l = U H ;
(4) UUH = I;
(5) the row vectors of U are orthonormal.
Proof: (1) (Ux, Uy) = (x, UHUy) = (x, y), and setting x = y gives us
IIUxll 2 = IIx1l2.
(2) For Ux = ).x, (x, x) = (Ux, Ux) = ).).(x, x) = 1).1 2(x, x).
(3) Let Ux = ).x, Uy = f1Y, and), i= f1. Since U is unitary, we have
).). = 1 = f1{L, and U- 1y = f1-1 y = {Ly. Therefore,
n n
L qki L q(lj(Vk' ve)
k=l (1=1
n n
= L qkiqkj = L[QH]fk[Q]kj.
k=l k=l
This means that the columns of Q are orthonormal and Q is unitary. 0
7.3. UNITARILY DIAGONALIZABLE MATRICES 263
As in the real case, it is true that two matrices representing the same
linear transformation on a complex vector space with respect to different
bases are similar. If the two bases are both orthonormal, then the transition
matrix is unitary (or orthogonal).
Problem 7.13 Show that Idet UI = 1 for any unitary matrix U.
_ [ 1 ;i 1+i
-':+ 1
A- 1 -2.
4 i
is unitary but neither Hermitian nor skew-Hermitian.
Problem 7.17 Describe all 3 x 3 matrices that are simultaneously Hermitian, unitary,
and diagonal. How many are there?
Definition 7.6 (1) Two real matrices A and B are orthogonally simi-
lar if there exists an orthogonal matrix P such that p-l AP = B. A
matrix is orthogonally diagonalizable if it is orthogonally similar
to a diagonal matrix.
(2) Two complex matrices A and B are unitarily similar if there exists
a unitary matrix U such that U- 1 AU = B. A matrix is unitarily
diagonalizable if it is unitarily similar to a diagonal matrix.
264 CHAPTER 7. COMPLEX VECTOR SPACES
Proof: We will prove the second assertion only. The proof of the first is
similar. We will prove it by mathematical induction on n. Clearly, it is true
for n = 1. Assume now that the theorem holds for n = r - 1. Let A be an
r x r matrix and let ).1 be an eigenvalue of A with a normalized eigenvector x.
Extend it to an orthonormal basis by the Gram-Schmidt orthogonalization,
say {x, U2, ... , u r }. Set the unitary matrix U1 = [x U2··· url with these
basis vectors in its columns. A direct computation of the product U1 1AU1
shows
~~ 1[+ +l
-T
U2 I
= AU2 ...
-T
U
I
r
*
+
= o I
B
o
where B is an (r - 1) x (r - 1) matrix. By the inductive hypothesis there
exists an (r - 1) x (r - 1) unitary matrix U2 such that Ui 1BU2 is an upper
triangular matrix with diagonal entries ).2, ).3, ... , ).r. Define
*
o
A=[ l+i
2 1 -1 i 1.
266 CHAPTER 7. COMPLEX VECTOR SPACES
1- -1]
and let
U
= ~
j3
[
1
i
l+i .
Then U is a unitary matrix and diagonalizes A:
[~ ~]. o
Since all the real symmetric matrices are Hermitian matrices, they are
unitarily diagonalizable by Theorem 7.7. However, the following theorem
says more than that.
Theorem 7.8 Let A be an n x n real matrix. Then the following are equiv-
alent.
(1) A is symmetric.
(2) A is orthogonally diagonalizable.
(3) A has a full set of n orthonormal eigenvectors.
is diagonalizable.
n
Example 7.6 Find an orthogonal matrix Q that diagonalizes
A=[H
268 CHAPTER 7. COMPLEX VECTOR SPACES
det(>.I - A) = det [
). - 4
-2
-2
). - 4
-2
-2
1
-2 -2 ).-4
Thus, the eigenvalues of A are ). = 2 and ). = 8. By the method used in
Example 6.2, it can be shown that
UI = (-1, 1, 0) and= (-1, 0, 1) U2
Note that all the Hermitian, unitary and skew-Hermitian matrices are
normal. There are matrices that are normal but are none of these.
U- 1 AU = ~2 [ 1
1
=D.
o
Problem 7.19 Which of following matrices are Hermitian, skew-Hermitian or nor-
[: :j,
mal?
(1) [ 1 ~i 1 + i ] . (2)
1
1 (3) [ -1i 1 -i 1
n
2i ~ ;
o ' 1 -7 0 -7
As a matter of fact, the theorem below shows that normal matrices are
all classified as unitarily diagonalizable matrices. We begin with a lemma.
and then compare the corresponding diagonal entries of both sides, i. e.,
270 CHAPTER 7. COMPLEX VECTOR SPACES
which implies tl2 = ... = tIn = O. Inductively, assume that ti-li = ... =
ti-In = 0 has been shown for i = 1, ... , k. Then
and
(THT)kk = Itlkl2 + ... + Itk_Ikl2 + Itkkl2 = Itkkl2,
because tlk = ... = tk-Ik = 0 by induction hypothesis. But TTH = THT
yields tkk+l = ... = tkn = O. Thus, we get tii+1 = ... = tin = 0 for all
i = 1, ... , n, which shows that all the entries of T off the diagonal are zero,
i. e., T is a diagonal matrix. 0
Proof: (1) {::> (2) Suppose that A is normal. By Schur's lemma again,
we can find a unitary matrix U so that T = UH AU is an upper triangular
matrix. Then T is also normal, since
Note that there are many nonnormal complex matrices that are still
diagonalizable, but of course not unitarily. Readers are suggested to find
such an example.
7.5. THE SPECTRAL THEOREM 271
Problem 7.21 For any real matrices Hl and H2 of the same size, show that A =
Hl + iH2 is normal if and only if HlH2 = H 2H l .
where
A
_£
= -A£1 P,"1 + ... + -A£1 p,"k·
I k
-1
PI H
VIVI = -1 [ -11 ] [-1 1 0]
2 0
= -1 [ -11
2 0
1
0
~ ],
P2 v2 v f= [ ~ =i ] [-1 - 1 2] =6
1 [ 1
1
1
1 -2 -2] ,
-2 -2 4
V3Vtj = -1[1]
1 [1 1 1] = -1 [ 11 1
1 11 ] .
3 1 3 1 1 1
[ 2-1 -1]
Hence,
P{ = PI + P2 = ~ -1 2-1
-1 -1 2
is the projection onto the eigenspace E(2) belonging to .>. = 2, P3 is the
projection onto the eigenspace E(8) belonging to .>. = 8, and
o
det(A/ - A) = det [ _~ .>. - i
o
-~ ] = (A - i)'(A + i),
7.5. THE SPECTRAL THEOREM 275
-1o 1.
1
Hence,
OIl
2 0 -~
.[ 1 0 -1
0 0 0
1.
o 1 -1 0 1 o
Problem 7.24 Find the spectral decomposition of each of the following matrices:
! n n
3 '
~
0 0 1 1
(3) C [
2
0
0
0
2
0
(4)D~ [1 0 0
0 0
0 0
276 CHAPTER 7. COMPLEX VECTOR SPACES
7.6 Exercises
7.1. Calculate Ilxll for
(1) x = [ 1; i ] , (2) x = [
1 - 2i
i
1 .
3+i
7.2. Construct an orthonormal basis for C 2 from {(i, 4 + 2i), (5 + 6i, I)} by
applying the Gram-Schmidt orthogonalization.
(1)
(3)
1 1
7.11. For each of the following Hermitian matrices A, find a unitary matrix U such
that U H AU is diagonal.
1
(2) [ 2 - 3i
2 + 3i ]
-1 ' (3) [ ~i 2
2+i ]
1- i .
2-i 1+i 2
7.12. Find the diagonal matrices to which the following matrices are unitarily sim-
ilar. Determine whether each of them is Hermitian, unitary or orthogonal.
(1) !2 [ 1-
1+ ~
z
1- i ]
l+i '
(2) [0.6
0.8
-0.8]
0.6 ' (3) [ -] 1 °li ].
-i
7.13. For a skew-Hermitian matrix A, show that
(1) A - I is invertible, (2) e A is unitary.
7.14. Let U be a unitary matrix. Prove that U and U T have the same set of
eigenvalues.
A = [ i ~ ~]
-2 -4 -5
can be diagonalized.
Quadratic Forms
8.1 Introduction
The reader should now be well aware of the important roles of matrices in
the study of linear equations, which can be expressed in the form
The left side alXl + a2x2 + ... + anXn = aT x of the equation is a (homo-
geneous) polynomial of degree 1 in n variables, called a linear form. In
this chapter, we study a (homogeneous) polynomial of degree 2 in several
variables, called a quadratic form, and show that matrices also play an
important role in the study of a quadratic form. Quadratic forms arise in
a variety of applications, including geometry, vibrations of mechanical sys-
tems, statistics, and electrical engineering, etc. A more general form of a
quadratic form is a bilinear form which is described in Section 8.7. The
inner product of a vector space and the determinant function on M 2x2 (IR.)
are typical examples of bilinear forms. It turns out that a quadratic form
(or bilinear form) may be associated with a real symmetric matrix, and vice
versa.
A quadratic equation in two variables x and y is an equation of the form
279
280 CHAPTER 8. QUADRATIC FORMS
where
x = [ ~] and A = [~ ~].
Note also that the matrix A is taken to be a (real) symmetric matrix.
Geometrically, the solution set of a quadratic equation in x and y usually
represents a conic section, such as an ellipse, a parabola or a hyperbola in
the xy-plane.
f(x) = x T Ax + b T x + c = 0,
where A = [aij], x = [Xl ... xn]T and b = [b l ... bnV in ]Rn.
n n
= L L aijXiXj,
i=l j=l
A=B+C,
quadratic form x T Ax. For example, for a symmetric matrix [~ _ ~ l' the
equation
Problem 8.1 Find the symmetric matrices representing the quadratic forms
(1) 9xi - x~ + 4x~ + 6XIX2 - 8XIX3 + 2X2X3,
(2) XIX2 + XIX3 + X2X3,
(3) xi + x~ - x~ - x~ + 2XIX2 - lOXIX4 + 4X3X4.
282 CHAPTER 8. QUADRATIC FORMS
in which the first part L:f=l aiixT is called the (perfect) square terms and
the second part L:iofi aijXiXj is called the cross-product terms. Actually,
what makes it hard to sketch the quadratic surface is the cross product
terms. However, the symmetric matrix A can be orthogonally diagonalized,
i. e., there exists an orthogonal matrix P such that
o
pT AP = p- l AP = D =
o
Here, the diagonal entries Ai'S are the eigenvalues of A and the column
vectors of P are their associated eigenvectors of A. Then we get, by setting
x=Py,
Remark: (1) Recall that in Theorem 8.1 the columns of the matrix Pare
the orthonormal eigenvectors of A and y is just the coordinate expression
of x with respect to the orthonormal eigenvectors of A. In fact, P = [Id]3,
8.2. DIAGONALIZATION OF A QUADRATIC FORM 283
[x Y] [~ ~] [: ] = 8.
and
X _ _1_ 1 1 x' ] _ [
J2x
I '+ J2Y
I ']
[]
Y [
- v'2 -1 1 ] [
y' - - ~x' + ~y' .
284 CHAPTER 8. QUADRATIC FORMS
Thus, we get
x T Ax = yTpT APy
yT [; ~ 1y = 2(x')2 + 4(y')2 8,
or
(X')2 (y')2
--+--=l.
4 2
Its solution set is just an ellipse with axes VI = pT el and V2 = pT e2. 0
It turns out that the inertia In(A) completely determines the geometric
type of the quadratic form in the following sense. Since In( -A) = (q, p, k)
°
if In(A) = (p, q, k) and the equation x T Ax = c is inconsistent if p = and
c > 0, it suffices to consider the cases of c 2: and p > 0. Excluding those
°
inconsistent cases we have the following characterization of the solution sets
for n = 2 and 3:
For n = 2, there are only three possible cases for In(A):
In(A) The solution of Xl Ax = c
(p, q, k) c>o c=o
(2, 0, 0) ellipse a point
(1, 1,0) hyperbola two lines crossing at 0
(1, 0, 1) two parallel lines a line
A= [ °100
° °1,
1
1 1
so the eigenvalues of A are found to be >'1 = J2, >'2 = -J2, >'3 = 0, and
the associated orthonormal eigenvectors are
1 1 1) ( 1 1 1)
VI = ( J2' 2' 2 ,V2 = - J2' 2' 2 '
respectively. Hence, an orthogonal matrix P that diagonalizes A is
(1) If it does not have a linear form, i.e., b = 0, then, as we have seen
above, a parabolic curve or surface does not appear as a solution of the
quadratic form.
(2) Suppose that it has a nonzero linear form, i.e., b i- O. If the matrix
A is invertible, then, by taking a change of variables as y = x + ~ A-I b (or
286 CHAPTER 8. QUADRATIC FORMS
Example 8.3 Classify the conic section for 3x2 - 6xy + 4y2 + 2x - 2y = o.
Solution: The matrix for the quadratic form 3x 2 - 6xy + 4y2 is
A= [_~ -!].
Its inverse is
A-I = ~ [~ ~] and b = [ _; ].
Example 8.4 In analytic geometry, a general quadratic curve (or conic sec-
tion) is represented by a quadratic equation in two variables as
ax 2 + 2bxy + cy2 + dx + ey + f = 0
ax 2 + cy2 + dx + ey + f = o.
8.2. DIAGONALIZATION OF A QUADRATIC FORM 287
ax 2 + 2bxy + cy2
+ dx + ey + f = AlU 2 + A2v2 + d'u + e'v + f = o.
for some constants d' and e'. Hence, the classification of the conic sections is
reduced to case (1) according to the various possible cases of the eigenvalues
of A. However, the eigenvalues are given as
which are determined by the coefficients a, b, and c. Hence one can classify
the conic section according to the various possible cases a, b, and c (see
Exercise 8.12).
(3) The axes of the conic section are the directions of the eigenvectors,
which are orthogonal to each other. Since we only need to find axis lines,
but not the direction vectors, we may choose them to be the rotation of the
standard coordinate x- and y-axes which are determined by el, e2. Now, a
pair of orthogonal eigenvectors are found to be
-(a - AI)
+ y --v;- + 1
a-c l(a-c)2
b - --v;-
- cot 2() + cosec 2() = tan (),
288 CHAPTER 8. QUADRATIC FORMS
where cot 2e = aibc for some e. Since b -=I 0 and a - c > 0, we may assume
that 0 < e < n. This means that if we set tan2e = a~c with -I < e < I'
then e is the rotation angle we were looking for. Therefore, the orthonormal
eigenvectors Ul and U2 of A may be chosen as the rotation of the standard
basis through the angle e. The transition matrix is now
Problem 8.2 Sketch the graph of each of the following quadratic equations:
(1) 2x2 + 2y2 + 6yz + lOz 2 = 9;
(2) x 2 - 8xy + 16y2 - 3z 2 = 8;
(3) 4x 2 + 12xy + 9y2 + 3x - 4 = O.
They are related as [X]a = P[x]/3' where P = [I d]3 is the transition matrix
from (3 to n. This is just a change of coordinates (or variables). If we set
notations x = [x]a and y = [x]/3' then the quadratic form can be written as
Remark: (1) Clearly, two orthogonally similar matrices are congruent, but
the converse is not true in general. By Theorem 8.1, a symmetric matrix A
is congruent to a diagonal matrix D by an orthogonal matrix P. However, it
can be congruent to many diagonal matrices (not necessarily by orthogonal
matrices). In fact, if pT AP = D by an orthogonal matrix P, then the
matrix Q = kP, k -=I 0, also diagonalizes A to a different diagonal matrix via
a congruent relation:
which is another diagonal matrix with diagonal entries k 2AI, k 2A2, ... , k 2An.
In this case, if k -=I ±1, Q is not an orthogonal matrix and the resulting
diagonal entries are not the eigenvalues of A anymore.
(2) Sylvester's law of inertia (Theorem 8.13 in Section 8.7) says that
even though a symmetric matrix A may be congruent to various diagonal
matrices, the numbers of positive, negative and zero diagonal entries, respec-
tively, of the congruent diagonal matrices do not change no matter what the
diagonalizing matrices are. That is, any two congruent matrices have the
same inertia.
There is a practical way of diagonalizing a symmetric matrix through
the congruence relation by using the elementary operation of adding a row
(or a column) to a constant multiple of another.
Suppose that we have diagonalized a symmetric matrix A by an invertible
matrix P through the congruence relation pT AP = D. Since both P and
290 CHAPTER 8. QUADRATIC FORMS
Note also that multiplying an elementary matrix to A on the left (on the
right) is the same as executing an elementary row (column, respectively)
operation to A. Clearly, if E is an elementary matrix, so is ET. Moreover,
multiplying an elementary matrix ET on the right of A is equal to performing
the same operation on the i-th column of A as the row operation E operates
on the i-th row. Since A is symmetric, this means that the operations
EAET will have the same effect on the diagonally opposite entries of A
simultaneously. For instance, if
E = [ -1
1 0 0
1 0
1,
001
which adds -1 times the first row to the second row, yielding
Thus, the above equation implies that the operations performed on the
left side of A (i. e., Ek" . E2E1) are nothing but a Gaussian elimination on
A to get an upper triangular matrix pTA and those on the right side (i.e.,
E[ Ef ... En are the corresponding column operations to yield a diago-
nal matrix D. In summary, if we take a Gaussian-elimination of A by the
elementary matrices E 1, ... , E k , then E k ··· E 1AE[ ... El = D implies
P -- ET T '
1'" E k' z.e.,
Remark: (1) Notice that in this case P need not be an orthogonal matrix,
and the diagonal entries of D need not be eigenvalues of A.
(2) Be careful not to apply the same argument for the diagonalization of
symmetric matrices through the similarity p- 1 AP = D, because multiplying
E-l on the right of A is not the same column operation as E (try it yourself)
so that the operations EAE- 1 do not work for the diagonalization of A.
A~[~ ~ :]
be a symmetric matrix. The preceding method produces
[A I I] = [~2 3~ 6~ 1i ~0 0~ 1~]
1 0 0 I 1 -1
[ o -1 1 I 0 1
o 1210 0
El = [ - ~ 0~ ~],
o 1
E2 = [ ~ ~ ~], E3 = [~ ~ ~].
-2 0 1 0 1 1
Problem 8.3 Find an invertible matrix P such that pT AP is diagonal for each of
the following matrices:
(1) A = [
0 1
1 1
-1
0
1; (2) A [ -31
=
-1 0 2 1
292 CHAPTER 8. QUADRATIC FORMS
H = [a b
b c
1= [fxx(x o)
fxy(xo)
fxy(xo)
fyy(xo)
1
is a symmetric matrix, called the Hessian of f at Xo = (xo, YO). Hence,
f (x, y) has a local minimum (or maximum) at Xo if and only if the quadratic
form q( h, k) is positive (or negative, respectively) for all sufficiently small
(h, k). The critical point Xo is called a saddle point if q( h, k) takes both
positive and negative values. Thus at this point f(x, y) has neither a local
minimum nor a local maximum. (This is the second derivative test for a local
extrema of f(x, y).)
In particular, a quadratic form
H =2 [~ ~ 1= 2A.
Thus H is nonsingular if and only if ac - b2 =1= o.
Since q(O) = 0, it follows that the quadratic form q takes the global
minimum at 0 if and only if
q(x) = x T Ax > 0 for all x =1= 0,
[-io -~ -~]
-1 2
Example 8.6 For q(x,y) = 2x2 - 4xy + 5y2, determine the nature of the
critical point (0, 0).
Solution: The matrix A of the quadratic form is
[ 2 -2]
-2 5 .
Example 8.7 Find and describe all critical points of the function
1
f(x, y) = 3" x 3 + xy2 - 4xy + 1.
296 CHAPTER 8. QUADRATIC FORMS
Setting fy = 0, we get x = °°
or y = 2. Setting fx = 0, we see that if
x = 0, then y must be either or 4, and if y = 2, then x = ±2. Thus
(0, 0), (0, 4), (2, 2), (-2, 2) are the critical points of f. To classify these
critical points, we compute the second partial derivatives:
For each critical point (xo, Yo), we determine the eigenvalues }.1 and }.2 of
the Hessian
H= [ 2yo2xo- 4 2yo -
2xo
4] .
These values are summarized in the following table:
Ix = 2x + z, I y = 3 sin y, Iz = x + 2z.
It follows that (x, y, z) is a critical point of I if and only if x = z = 0 and
y = mr, where n is an integer. Let Xo = (0, 2k7r, 0). The Hessian of I at
Xo is given by
The eigenvalues of H(xo) are 3, 3, and 1. Since the eigenvalues are all
positive, it follows that H(xo) is positive definite and hence I has a local
minimum at Xo. On the other hand, at a critical point of the form Xl =
(0, (2k - 1)7r, 0), the Hessian will be
The eigenvalues of H(XI) are -3, 3, and 1. It follows that H(XI) is indefinite
and hence Xl is a saddle point of I. D
Problem 8.4 For each of the following functions, determine whether the given crit-
ical point corresponds to a local minimum, local maximum, or saddle point:
(1) f(x, y) = 3x2 - xy + y2 at (0, 0);
(2) f(x, y, z) = x 3 + xyz + y2 - 3x at (1, 0, 0).
Problem 8.5 Which of the following matrices are positive definite? negative defi-
nite? indefinite?
1 2 1
(1) [ 2 1 1
1, 2 0 0
(2) [ 0 5 3
1,
112 035
298 CHAPTER 8. QUADRATIC FORMS
Thus,
Ax (x, V1)Av1 + (x, V2)Av2 + ... + (x, vn)Av n
Al (x, V1)V1 + A2(x, V2)V2 + ... + An(x, vn)vn.
If JJxJJ2 = (x, V1)2 + (x, V2)2 + ... + (x, v n)2 = 1, then we obtain
x T Ax = (x, Ax) Al (x, V1)2 + A2(x, V2)2 + ... + An (x, v n)2
< Al (x, V1)2 + Al (X, V2)2 + ... + Al (x, v n)2
Al ((x, V1)2 + (x, V2)2 + ... + (x, v n)2) ,
AI,
8.5. APPLICATION: QUADRATIC OPTIMIZATION 299
where Al and An are the largest and the smallest eigenvalues of A, respec-
tively.
Example 8.9 Find the maximum and minimum values of the quadratic
form
x~ + x~ + 4X1X2
subject to the constraint x~ + x§ = 1, and determine values of Xl and X2 at
which the maximum and minimum occur.
Solution: The quadratic form can be written as
The eigenvalues of A are A = 3 and A = -1, which are the largest and
smallest eigenvalues, respectively. Their corresponding eigenvectors are
respectively. Note that those extreme values of the quadratic form occur at
those unit eigenvectors.
Thus, subject to the constraint x~ + x§ = 1, the maximum value of the
quadratic form is A = 3, which occurs at x = ±(1/v'2, 1/v'2) , and the
minimum value is A = -1, which occurs at x = ±(1/v'2, -1/v'2). 0
300 CHAPTER 8. QUADRATIC FORMS
Problem 8.6 Find the maximum and minimum values of the quadratic form
2xi + 2x~ + 3XIX2
subject to the constraint xi + x~ = 1, and determine values of Xl and X2 at which
the maximum and minimum occur.
Problem 8.7 Find the maximum and minimum of the following quadratic forms
subject to the constraint xi + x~ + x~ = 1 and determine the values of Xl, x2, and
X3 at which the maximum and minimum occur:
(1) xi + x~ + 2x~ - 2XIX2 + 4XIX3 + 4X2X3,
(2) 2xi + x~ + x~ + 2XIX3 + 2XIX2.
[aJ and [~ ~ 1
are positive.
The natural generalization of the above conditions will involve all n-
submatrices of A, called the principal submatrices of A, which are defined
as the upper left square submatrices
8.6. DEFINITE FORMS 301
Theorem 8.5 The following are equivalent for a real symmetric matrix A:
(1) A is positive definite, i. e., x T Ax > 0 for all nonzero vector x;
(2) all the eigenvalues of A are positive;
(3) all the principal submatrices Ak'S have positive determinants;
(4) all the pivots (without row interchanges) are positive;
(5) there exists a nonsingular matrix W such that A = WTW.
Thus x T Ax > 0 for all such nonzero xif and only if xr AkXk > 0 for all
nonzero Xk E ]Rk; that is, Ak'S are positive definite, all eigenvalues of Ak are
positive, and its determinant is positive.
(3) => (4) Recall that the symmetric matrix A can be factorized uniquely
into the form
where L is a lower triangular matrix with l's on its diagonal and D is the
diagonal matrix with the pivots dk of A on the diagonal. But the k-th pivot
dk is exactly the ratio of det Ak to det Ak-l :
dk = detA k .
detAk_l
Hence, all dk's are positive.
(4) => (5) Let A = LDLT as above with
o
D=
o
302 CHAPTER 8. QUADRATIC FORMS
Define
o
-ID=
o
Then, clearly det( VD) > 0, D = VDVD and (VD)T = VD. Hence,
because Wx i- o. o
Problem 8.8 State the corresponding conditions to the ones in Theorem 8.5 for the
negative definite forms.
Problem 8.9 Determine which one of the following matrices A and B is positive
definite. For the positive definite one, find a nonsingular matrix W such that it is
Theorem 8.6 The following are equivalent for a real symmetric matrix A:
(1) A is positive semidefinite, i. e., x T Ax ~ 0 for all vectors x;
(2) all the eigenvalues of A are nonnegative;
(3) all the principal submatrices Ak'S have nonnegative determinants;
(4) all the pivots (without row exchanges) are nonnegative;
8.7. BILINEAR FORMS 303
Example 8.11 Let V be a vector space and V* its dual vector space, that
is, V* = C(V; JR). Let b: V x V* - 7 JR be defined by
Then one can easily show that b is a bilinear form on the pair of vector
spaces V and V*. The reader should notice that the vector space operations
on the dual space V* are defined in such a way as to force the mapping b to
be a bilinear form.
Then the bilinearity of b proves that CPw E V*, from which we obtain a
linear transformation
Example 8.12 Let b : 1R2 x 1R2 --+ IR be defined by b(x, y) = XIYI + 3XIY2 +
2X2YI - X2Y2 with respect to the standard basis 0: = {el' e2}. Then b is
clearly a bilinear form, and the matrix representation of b with respect to 0:
is
[bl a = [~ - ~ ].
If (3 = {VI = (1, 0), V2 = (1, I)} is another basis for 1R2, then the matrix
representation becomes
[bl~ = [~ :],
because b(VI' VI) = 1, b(VI' V2) = 4, b(V2, VI) = 3 and b(V2, V2) = 5.
Since rank[bl a = rank[bl~ = 2, the bilinear form b is nondegenerate.
8.7. BILINEAR FORMS 307
with respect to the standard basis. Is this a bilinear form? If so, find the matrix
representation of b with respect to the basis
Q = {VI = (1, 0, 1), V2 = (1, 0, -1), V3 = (0, 1, On.
Find its rank.
(2) Let V = M 2x2 (lR) be the vector space of2 x 2 matrices, and let b : V x V -+ lR
be defined by b( A, B) = tr( A) . tr( B). Is this a bilinear form? If so, find the matrix
representation of b with respect to the basis
One can easily see that a bilinear form is symmetric (or skew-symmetric)
if and only if its matrix representation is symmetric (or skew-symmetric) for
any basis. As an example, an inner product on a vector space V is just a
symmetric, nondegenerate bilinear form on V.
A bilinear form b on V is diagonalizable if there exists a basis a for V
such that the matrix representation [b]a of b with respect to a is diagonal.
Example 8.13 Let b : lR3 x lR3 ---t lR be the bilinear form defined by
Then clearly b(x, y) = b(y, x), and the matrix representation of b with
respect to the standard basis Q = {el' e2, e3} is
[b]a = [ 0
0
-2 2
OIl ,
1 2-1
which is symmetric. Hence, the bilinear form b is symmetric. By Theo-
rem 8.10, it is diagonalizable. In fact,
[0o -20
n
1 I1 0
[[b]a I I] 210 1
1 2 -1 10 0
--+
[-1o 0 2
010 0
010 1 -: ] ~ ]D] Pl.
o 0 -1 I 1 2 -1
By a direct computation, one can easily show that pT[b]aP = D. 0
there exists a basis Q for V with respect to which the matrix representation
[b]a is of the form
[-~ ~ 1
[ -~ ~ 1
o
o
8.7. BILINEAR FORMS 309
Then clearly b(x, y) = -b(y, x), and the matrix representation of b with
respect to the standard basis a = {el' e2, e3} is
[bJa = [ -1
0 1 -1
0 1
1,
1 -1 0
[b]p =[ -~ H1 o
310 CHAPTER 8. QUADRATIC FORMS
Problem 8.15 Show that any bilinear form b on a vector space V is the sum of a
symmetric bilinear form and a skew-symmetric bilinear form.
The following theorem shows how quadratic forms and bilinear forms are
related.
This form b is clearly symmetric, bilinear and b(x, x) = q(x). The unique-
ness also comes from this relation. D
Proof: Let Q = {XI, ... , x P ' X p +l, ... , x n } be an ordered basis for V in
which
b(Xi' Xi) > 0 for i = 1, 2, ... , p, and
b(Xi' Xi) :::; 0 for i = p + 1, ... , n,
8.7. BILINEAR FORMS 311
and let (3 = {YI, ... , Yp', Yp'+l, ... , Yn} be another ordered basis for V
in which
b(Yi' Yi) > 0 for i = 1, 2, ... , p', and
b(Yi' Yi) :S 0 for i = p' + 1, ... , n.
To show p = P:, let U and W be subspaces V spanned by {Xl, ... , xp} and
{Yp'+l, ... , Yn}, respectively. Then b(u, u) > 0 for any nonzero vector
u E U and b(w, w) :S 0 for any nonzero vector wE W. Thus Un W = {O},
and
Hence, the index and signature together with the rank of a symmetric
matrix are invariants under the congruence relation, and any two of these
invariants determine the third, by noting that
Corollary 8.15 Two symmetric (square) matrices are congruent if and only
if they have the same invariants: index, signature and rank.
312 CHAPTER 8. QUADRATIC FORMS
Proof: Suppose that two symmetric matrices A and B have the same in-
variants, and let D and E be diagonal matrices congruent to A and B,
respectively. Without loss of generality, we may choose D and E so that
the diagonal entries are in the order of positive, negative and zero. Let p
and r denote the index and the rank, respectively, of both D and E. Let
di denote the i-th diagonal entry of D. Define the diagonal matrix Q whose
i-th diagonal entry qi is given by
ifl:S;i:S;p
ifp<i:S;r
if r < i :s; n.
Then
o
-Ir-p
o
Hence, A is congruent to Jpr, and so is B, i.e., A is congruent to B. 0
Example 8.15 Determine the index, the signature and the rank for each
of the following matrices.
n
Solution: In Example 8.5, we saw that the matrix A can be diagonalized
to
D= [~ ~[
Therefore, A has rank 3, index 2 and signature 1. The matrix B is already
diagonal, and has rank 3, index 3 and signature O. Using the method of
Example 8.5, one can show that C is congruent to the diagonal matrix with
diagonal entries 1, 1, -4. Therefore, C has rank 3, index 2 and signature l.
(Note that it is not necessary to find the eigenvalue of C to diagonalize it
orthogonally.) We conclude that A and C are congruent and B is congruent
to neither A nor C by Corollary 8.15. 0
8.8. EXERCISES 313
Problem 8.16 Prove that if the diagonal entries of a diagonal matrix are permuted,
then the resulting diagonal matrix is congruent to the original one.
Problem 8.17 Prove that the total number of distinct equivalence classes of congru-
ent n x n real symmetric matrices is equal to ~(n + l)(n + 2).
Problem 8.18 Find the signature, the index and the rank of each of the following
matrices.
0 1 2]
(1) [ 1 -2 3 ,
2 3 4
Which are congruent each other?
8.8 Exercises
8.1. Find the matrix representing each of the following quadratic forms:
(1) xI + 4XIX2 + 3x~,
(2) xI - x~ + x~ + 4XIX3 - 5X2X3,
(3) xI - 2x~ - 3x~ + 4XIX2 + 6XIX3 - 8X2X3,
(4) 3XIYI - 2XIY2 + 5X2YI + 7X2Y2 - 8X2Y3 + 4X3Y2 - X3Y3,
8.5. Determine whether each of the following matrices takes a local minimum,
local maximum or saddle point at the given point:
(1) f(x, y) = -1 + 4(e X - x) - 5xsiny + 6y2 at the point (x, y) = (0, 0);
(2) f(x, y) = (x 2 - 2x) cosy at (x, y) = (1, 11").
8.6. Show that the quadratic form q(x) = 2x2 + 4xy + y2 has a saddle point at
the origin, despite the fact that its coefficients are positive. Show that q can
be written as the difference of two perfect squares.
8.7. Find the maximum and the minimum values of the function
n
which is called the Rayleigh quotient of A, when
(1) A ~ U~ ~ (2) A ~ -~
[ -n]
8.8. Determine whether or not each of the following matrices is positive definite:
(1) [-~ -~ ~
o 1 -1
], (2) [-i -~
0 1-2
~] -2
3
o 0 5
-~ ~].
8.14. Let b be a bilinear form on IR.2 defined by
b(x, y) = 2X1Y1 - 3X1Y2 + X2Y2.
S.S. EXERCISES 315
(1) Find the matrix A of b with respect to the basis a = {(I, 0), (1, I)}.
(2) Find the matrix B of b with respect to the basis f3 = {(2, 1), (1, -I)}.
(3) Find the transition matrix Q from the basis f3 to the basis a and verify
that B = QT AQ.
8.15. Which of the following functions bon jR2 are of bilinear form?
mgt~: ~~ ~ ~XI + yd +
(3) b(x, y) = (Xl
- YI)2
2 -
X2Y2
(Xl - YI)2
(4) b(x, y) = XIY2 - X2YI
8.16. For a bilinear form on jR2 defined by b(x, y) = XIYI + X2Y2, find the matrix
representation of b with respect to each of the following bases:
a = {(I, 0), (0, I)}, f3 = {(I, -1), (1, I)}, I = {(I, 2), (3, 4)}.
8.17. Which one of the following bilinear forms on jR3 are symmetric or skew-
symmetric? For each symmetric one, find its matrix representation of the
diagonal form, and for each skew-symmetric one, find its matrix representa-
tion of the block form in Theorem [Link].
(1) b(x, y) = XIY3 + X3YI
(2) b(x, y) = XIYI + 2XIY3 + 2X3YI - X2Y2
(3) b(x, y) = XIY2 + 2XIY3 - X2Y3 - X2YI - 2X3YI + X3Y2
(4) b(x, y) = L:;,j=l(i - j)XiYj·
8.18. Find the signature, index and rank of each of the following symmetric matri-
1 32] , ~ ].
-3
-1 2
3 4 1 -6
8.19. Determine whether the following statements are true or false, in general, and
justify your answers.
(1) For any quadratic form q on jRn, there exists a basis a for jRn with
respect to which the matrix representation of q is diagonal.
(2) Any two matrix representations of a quadratic form have the same in-
ertia.
(3) The sum of two bilinear forms on V is also a bilinear form.
(4) If A is a real symmetric positive definite matrix, then the solution set
of x T Ax = 1 is an ellipsoid.
(5) For any nontrivial bilinear form b i- 0 on V, if b(v, v) = 0, then v = O.
(6) Any symmetric matrix is congruent to a diagonal matrix.
(7) Any two congruent matrices have the same eigenvalues.
(S) Any two congruent matrices have the same determinant.
(9) Any matrix representation of a bilinear form is diagonalizable.
(10) If a real symmetric matrix A is both positive semidefinite and negative
semidefinite, then A must be the zero matrix.
(11) Any two similar real symmetric matrices have the same signature.
Chapter 9
9.1 Introduction
Recall that an n x n matrix A is diagonalizable if and only if A has a full set
of n linearly independent eigenvectors. In particular, if A is normal, then
A can be diagonalized by a unitary matrix U whose column vectors are the
orthonormal eigenvectors of A. There are some nonnormal matrices that
are still diagonalizable. Of course, in this case the transition matrix need
not be unitary. If a matrix A is diagonalizable, then the dimension of each
eigenspace E()") = N()"I - A) is equal to the multiplicity of the eigenvalue
)... Therefore, if )..1, ... , )..e are distinct eigenvalues of A with multiplicities
mAl' ... , mAl' respectively, then
and hence,
In some cases, a matrix may not have a full set of linearly independent
eigenvectors. That is, a matrix A may have an eigenvalue).. with multiplicity
m A > 1, but the number of linearly independent eigenvectors belonging to
).. could be less than m A , so
This means that A does not have enough eigenvectors belonging to ).., hence
it is impossible to find a transition matrix Q such that Q-l AQ = D is a
diagonal matrix. In this case, it wouldn't be easy, for example, to compute
317
318 CHAPTER 9. JORDAN CANONICAL FORMS
the exponential matrix eA, so the general solution x( t) = etAc of the system
x'(t) = Ax(t) of linear differential equations may not be easily found.
However, we show in this section that even a nondiagonalizable matrix
is similar to a matrix very "close" to a diagonal matrix, called a Jordan
canonical form. In this case, the columns of a transition matrix Q are
something similar to eigenvectors, but not quite. They are called generalized
eigenvectors. Using this Jordan canonical form of A, the computation of e A
could be easier.
Recall that if A is a diagonalizable matrix, then the general solution of
a system of linear differential equations x'(t) = Ax(t) is given as
x(t) = etAxo = cleAltul + c2eA2tu2 + ... + cneAntun,
where the vectors u/s are linearly independent eigenvectors of A belonging
to the eigenvalues 'xi'S (see Section 6.4). This solution is not valid if A is not
diagonalizable, since we do not know how to compute etA. The following
example will illustrate how to handle these cases for n = 2 and 3 by solving
a system x'(t) = Ax(t) for an arbitrary matrix A. If the reader is not
comfortable with systems of linear differential equations, the next example
may be skipped, but it will be very helpful to understand the most essential
features of a Jordan-canonical form.
Example 9.1 (1) Let x'(t) = Ax(t) be a system of linear differential equa-
tions, and let A be a 2 x 2 matrix with an eigenvalue ,x of multiplicity 2.
If dimE('x) = 2, then one can find a basis {UI, U2} of E('x); then
Xi = eAtui' i = 1,2, are linearly independent solutions of the system. Thus
the general solution is
x(t) = CIXI + C2X2 = eAt(quI + C2U2),
where CI, C2 are arbitrary constants. Note that A is diagonalized by them.
Suppose dim E('x) = 1. Then with a basis U of E('x) one obtains only
one solution Xl (t) = eAtu. To get the general solution of the system, we need
one more solution linearly independent to XI(t). Motivated by the type of
solutions in Example 6.17, we assume that the second solution is of the form
x(t) = teAtv + eAtw,
where the vectors v and ware to be determined. As a solution, this vector
function should satisfy the equation x' (t) = Ax( t). Thus for all t
x' (t) teAt,Xv + eAtv + 'xeAtw,
Ax(t) teAt Av + eAt Aw.
9.1. INTRODUCTION 319
By comparing the coefficient vectors of teAt and eAt, we obtain two equations:
Av = AV, or (A - AI)V = 0,
AW=V+AW, or (A - AI)W = v.
x(t) + C2X2(t)
ClXl(t)
where Cl, C2 and C3 are arbitrary constants. In this case, the matrix Q =
[Ul U2 U3] diagonalizes A as usual:
J
320 CHAPTER 9. JORDAN CANONICAL FORMS
n
AQ = A[u v wl = [Au Av Awl [.[Link] v + .Awl
lu v wi [~ ~ QJ,
9.1. INTRODUCTION 321
(iii) Finally, suppose that dim E (A) = 1, and let u be a basis for E (A) so
that we get only one solution Xl(t) = e'~tu. Then, by experience, the second
and third solutions are supposed to be of the form
By substituting these equations into the equation x'(t) = Ax(t), one can
obtain
Az = W+ AZ, or (A - AI)z = w,
AW=U+AW, or (A - AI)w = (A - AI)2z = u,
Au = AV, or (A - AI)u = (A - AI)3z = o.
It can be shown that the solution vectors v (= u), w, and z are linearly
independent (see Theorem 9.2 below), which are so-called generalized eigen-
vectors of A. They give us three linearly independent solutions Xi'S. Thus
the general solution is given as
Hl
AQ = A[uwz] [Au Aw Az] [AU u + AW W + AZ]
= [u wz[ [ ~ = QJ,
A 1
where J = [ 0 A
o 0
!l Thus Q-'AQ = J. o
The matrix J in each of the above cases is called the Jordan canonical
form of the matrix A. Note that in each case in the example, J can be divided
into smaller submatrices, called Jordan blocks: For instance, in case (ii) of
1 and J 2 = [~ ~ 1of order 2. Since etJ1 = eAt and eth = eAt [~ ~ l'
tJ1
Qe tJ Q -1 Xo = Q [e 0 0
etJ2 1Q -1 Xo
the corresponding Jordan canonical form is just the diagonal matrix with
eigenvalues on the diagonal. Hence, a diagonal matrix is a particular case of
the Jordan canonical form.
Actually, the Jordan canonical form of a matrix can be completely de-
termined by the multiplicities of the eigenvalues and the number of linearly
independent eigenvectors in each of the eigenspaces without knowing the
transition matrix Q as shown in the following example.
which consists of only one Jordan block with eigenvalue A on the diagonal.
(2) Suppose it has two linearly independent eigenvectors belonging to A.
Then the Jordan canonical form of A is either one of the forms
A 1 A
0 A A 1 0 0
J= A 1 0 or J= 0 A 1 0
0 A 1 0 0 A 1
0 0 A 0 0 0 A
each of which consists of two Jordan blocks with eigenvalue A on the diagonal.
(3) Suppose it has three linearly independent eigenvectors belonging to
A. Then the Jordan canonical form of A is either one of the forms
A A
A 1 A
J= 0 A or J= A 1 0
A 1 0 A 1
0 A 0 0 A
each of which consists of three Jordan blocks with eigenvalue A on the diag-
onal.
324 CHAPTER 9. JORDAN CANONICAL FORMS
A 1
o A
which consists of four Jordan blocks with eigenvalue A on the diagonal.
(5) Suppose it has five linearly independent eigenvectors belonging to A.
Then the Jordan canonical form of A is of the form
which is just the diagonal matrix, that is, the Jordan canonical form of A
coincides with the diagonalizability.
Note that in cases (2) and (3), the problem of choosing one of the two
possible Jordan canonical forms that is similar to the given matrix A depends
on the nature of the eigenvectors of A and will be discussed in the following
section. D
J=
[0]
(1) Find all possible forms of the matrix A that can be similar to J, i.e.,
Q-1 AQ = J for some invertible matrix Q.
(2) Find an invertible matrix Q such that Q-1 AQ = J.
6 1
o 6
A Xl X2 ... X5 Xl X2 ... X5 o 1
o 0
o
Hence,
These "special" vectors, X2 and X4, fill up the deficient eigenvectors that
the eigenvalues 6 and 0 are lacking, respectively, and are called generalized
eigenvectors.
In summary, if a 5 x 5 matrix A is similar to the Jordan canonical form J,
then A should have eigenvalues 6 and 0 of multiplicities 2 and 3 respectively,
but only one linearly independent eigenvector, say Xl, belonging to 6 and
only two linearly independent eigenvectors, say X3, X5, belonging to O. For
such a matrix A, the transition matrix Q can be made by Xl, X2, ... , X5,
where X2, X4 are nonzero vectors satisfying the following equations:
but
(A - 6I)X2 =I- 0, and (A - 0I)x4 =I- O. o
326 CHAPTER 9. JORDAN CANONICAL FORMS
A= [
-4
~ =~3 =~].
3
(A - I)x
The three equations are identical, so we get two linearly independent eigen-
vectors Ul = (1,0,2) and U2 = (0,2, -3). Thus Xi(t) = etui' i = 1,2, are
two linearly independent solutions.
(3) For the third solution, we set x(t) = tetv + etw, where v and ware
supposed to satisfy (A - I)v = °
and (A - I)w = v. The first equation
means v is an eigenvector of A, so one can write
(A - I)w = [
-4
: =: =~] [~] [
3 2 Z 2Cl -
2~~ 1
3C2
= v.
~ 1'
1 2
(4) Note that for Q = [u v w 1= [ 0 4 we get
2 -2 -1
J=
[ o~ ~ ~ 1
0 1
= [Jl
0 h
0 l'
or A = QJQ-l. Now the general solution may also be computed as
= Q[ e~1
~ e'lu v tv+w] [ ~: 1
et (C1U + C2V + C3(tv + w))
= et (C1U + (C2 + C3t)V + w),
where Q-lxQ = (Cl' C2, C3), since etJ1 = e t and etJ2 = et [ ~ ~ 1by Exam-
pIe 6.17. This coincides with the result in (3). o
Problem 9.1 Let A be a 5 x 5 matrix with two distinct eigenvalues A of multiplicity
3 and p, of multiplicity 2. Find all possible Jordan canonical forms of A up to
permutations of the Jordan blocks.
that those corresponding to the first columns of each of the Jordan blocks
of J are precisely the linearly independent eigenvectors of A, and, as we saw
in Example 9.1, the other columns of Q are some generalized eigenvectors.
Xk x,
Xk-l (A - >..I)x (A - ..\I)xk,
Xk-2 (A - ..\I)2x (A - ..\I)xk-l'
X2 = (A - ..\I)k-2x (A - >"I)X3,
Xl = (A - ..\I)k-Ix (A - ..\I)X2.
Definition 9.2 The set of vectors {Xl, X2, ... , xd is called a chain of
generalized eigenvectors belonging to the eigenvalue ..\.
Note that, if X is a generalized eigenvector of A of rank k > 1 belonging
:s
to an eigenvalue ..\, then, for each e, 1 < e k, (A - >..I)exe = (A - >..I)kx = 0
and (A_..\I)e-I xe = (A-..\I)k-Ix i- o. Hence, the vector Xi! = (A->..I)k-e x
is a generalized eigenvector of A of rank e. However, Xl = (A - >..I)k-I x is
always an eigenvector belonging to >.., called the initial vector of the chain.
Note also that (A - >..I)i!Xi = 0 for e ?: i.
The following series of theorems shows that a transition matrix Q may
be constructed from a set of linearly independent generalized eigenvectors of
A, and justifies the invertibility of Q.
Example 9.3 also reveals how to find a transition matrix Q practically,
and the validity of the method is justified by the following theorems.
Proof: Let us solve CIXI +C2X2+·· ·+qxk = 0 for scalars Ci, i = 1, ... , k.
If we multiply (on the left) both sides of this equation by (A - >"I)k-l, then
for i = 1, ... , k - 1,
9.2. GENERALIZED EIGENVECTORS 329
Proof: Let {Xl, X2, ... , xd and {YI, Y2, ... , ye} be the chains of gen-
eralized eigenvectors of A belonging to the eigenvalues A and Ji-, respectively,
and let A i- Ji-. We wish to show that the set of vectors {Xl, ... , Xk, YI, ... ,
ye} is linearly independent. To solve the linear dependence of them,
for Ci'S and d/s, we multiply both sides of the equation by (A - AI)k and
note that (A - AI)kxi = 0 for all i = 1, ... , k. Thus we have
Theorem 9.4 Let S = {Xl, X2, ... , xd and T = {YI, Y2, ... , ye} be
two chains of generalized eigenvectors of A belonging to the same eigenvalue
A. If the initial vectors Xl and YI are linearly independent, then the union
S U T is linearly independent.
(A - 2I)x
From the second equation, x has to be of the form (a, b, 0), and from
the first equation we must have b =1= o. Let us take U2 = (0, 1, 0) for a
generalized eigenvector of rank 2. Thus we have
Then
Q ~ [~
0
1
0 -n 1
so Q-l=
[~ -3]0
1
0
1
1
.
2
~l
[-:- -:-J
Q-lAQ = = [
+ J2] ,
0 I
where J l = [ ~ ~] and J2 = [3]. 0
A= [ H ~ ~ 1·
-1 4 -6 4
Solution: The characteristic polynomial of the matrix A is
332 CHAPTER 9. JORDAN CANONICAL FORMS
-~ -~0 -1~ ~1 1
A-I
r -1o 4 -6 3
is 3; the fourth row is a linear combination of the first three rows, which are
linearly independent. Thus there is only one eigenvector belonging to A = 1,
say x = (1, 1, 1, 1). We need to find a generalized eigenvector of rank 4,
which is a solution x of the following equations:
3 -3
r -1 3 -3
(A - 1)3 x
-1
-1
3
3
-3
-3
; 1x -I 0,
(A - 1)4 x = o.
But, a direct computation shows that the matrix (A - 1)4 = o. Hence, we
may take any vector that satisfies the first equation, say x = (-1, 0, 0, 0),
-n
as a generalized eigenvector of rank 4. Now, take X4 = (-1, 0, 0, 0), and
-~ -~
r -1 4 -6 3
-~
1r 1 r 1'
0
~
1
(-1,0, 1,2),
(1, 1, 1, 1).
Thus clearly the chain of generalized eigenvectors {Xl, X2, X3, X4} is linearly
independent. Therefore,
~r ~
-1 1 0
-~ 1
1
Q=
0
1
0
0
and Q-1
-1 1 0
1 -2 o
1 1'
r1 2 1 -1 3 -3 1
and
~ !
1 0
1 1
Q-IAQ = =J.
0 1
r 0 0 1 0
9.3. COMPUTATION OF EA 333
Problem 9.2 Find a full set of generalized eigenvectors of the following matrices:
[~ ~ ! n
Problem 9.3 Find the Jordan canonical form for each of the following matrices:
(2) [ 04 14 2]
2 ,
(3)
004
9.3 Computation of eA
The Jordan canonical form of an arbitrary matrix enables us to compute the
exponential matrix. Let A be an arbitrary square matrix, and let J be the
Jordan canonical form of A such that
).. 1 0 1 0 0 0 1 0
0 0 0 0
J= ).. +
).. 1 1 0 0 1
0 0 )..
0 0 1 0 0 0
AI + N.
334 CHAPTER 9. JORDAN CANONICAL FORMS
Jk =
0 A 1
A
0
1
= (AI +N)k = t
j=O J
(~)Ak-jNj.
0 0 A
Jk = ~ (~)Ak-jNj
j=O J
-3 1 2]
o2 1
2 0
100]
(1) A = [ 0 0 2 0 '
000 1
(2) B = [ =~
-2
1
1
-3
-1 2
-1 2
1 4
.
eA = eQJQ - 1
= QeJ Q-l
eh
o
= Q
o
9.3. COMPUTATION OF EA 335
where the Ji's are Jordan blocks. Thus, it is enough to compute e J when J
is a simple Jordan block of the form
J=>..I+N,
where I and N are as in (1). Then, as usual, Nk = 0 for k ~ n, and
1 1
1 1 2! (n - I)!
1
o 1 1
(n - 2)!
1
1
o 1
I
etAyo =
t2 tn - l
1 t
2! (n - I)!
tn - 2
0 1 t
(n - 2)! C2
Cl
eAt [UI U2 ... Un]
1
Cn
t
0 1
where Q-IyO = (q, ... , cn) and the Ui'S are generalized eigenvectors be-
longing to >.. of A.
n
Example 9.7 Solve the linear differential equation y' Ay with initial
condition y(O) = Yo, where
A ~ [j -~ =n, Yo ~ [
336 CHAPTER 9. JORDAN CANONICAL FORMS
where
since
eth = e 2t [ ~ ~1 and e th = e 3t .
n
Thus
0
2t
e2t
e~t ][
A = 2
[ -3 -11 -11 1, Yo = [ 1
-1
-1 .
9 3 -4 1
9.4. CAYLEY-HAMILTON THEOREM 337
~ l'
>.kn
we have
f(>'d
f(D) = [ b
338 CHAPTER 9. JORDAN CANONICAL FORMS
Jk 0 ]
Jk = [ :1 ... :
. .'
o Jks
it is enough to show f(J) = 0 for a single Jordan block J = aI + N with
eigenvalue a, where Nn = O. Since f()..) = det()"I - A) = det(M - J) =
().. _ a)n,
D
f(J) = f(aI + N) = (aI + N - aIt = N n = o.
A= [j _j J]
is f()..) = det(M - A) = )..3 +)..2 - 6)", and
f(A) A 3 +A2 - 6A
p ~]
-27 -102 -54 9 30 18
~]
0
~
6
-6 [ 2 = 0
-3 -12 -6 0 0 D
9.4. CAYLEY-HAMILTON THEOREM 339
A= [-! ~ -~]
-2 4 1
Hence
A-I 1 2
10(A - 8A + 17h)
= -
1 [ -5 -10 10]
1 0 2 .
10 -14 -20 22
0
Problem 9.7 Let A and B be square matrices, not necessarily of the same size, and
let f(>..) = det(>..I - A) be the characteristic polynomial of A. Show that f(B) is
invertible if and only if A has no eigenvalue in common with B.
p(A) = r(A).
Therefore
Problem 9.8 For the matrix A = [~ ~ ~ 1' evaluate the matrix polynomial
002
9.5 Exercises
9.1. Show that if A nonsingular, then A-I has the same block structure in its
Jordan canonical form as A does.
9.5. EXERCISES 341
9.2. Find the number of linearly independent eigenvectors for each of the following
matrices:
1 1 0 0 0 2 0 0 0 0 2 1 0 0 0
0 1 1 0 0 0 2 0 0 0 0 2 0 0 0
(1) 0 0 1 0 0 , (2) 0 0 2 0 0 , (3) 0 0 3 0 0
0 0 0 3 1 0 0 0 5 1 0 0 0 3 0
0 0 0 0 3 0 0 0 0 5 0 0 0 0 5
9.5. Solve y' = Ay, where A = [=621 2: : ] and y(l) = (2, 1, 0).
-12 -6
9.6. Solve the initial value problem
y~ = -Y1 +2Y3, Y1(O) = -2
{ Y2 = 2Y1 +Y2 -2Y3, Y2(O) = 0
y~ = -2Y1 +3Y3, Y3(O) = -1.
9.9. For each of the following matrices, find a polynomial of which the matrix is
a root.
(3) [~ i -~].
o 2 -1
9.10. Verify that each of the matrices below satisfies its own characteristic polyno-
mial and from these results compute A -1, if it exists.
a2 a3 a4 al
(1) Show that any circulant matrix is normal.
(2) Find all eigenvalues of the n x n circulant matrix
o 1 0 0
o 0 1 0
w=
o0 0 1
1 0 0 0
(3) Find all eigenvalues of the circulant matrix A by showing that
n
A = LaiWi-1.
i=l
(4) Use your answer to find the eigenvalues of
o 1 1 1
101 1
B=
1 1 0 1
1 1 1 0
9.12. Determine whether the following statements are true or false, in general, and
justify your answers.
(1) Any square matrix similar to a triangular matrix.
(2) If a matrix A has exactly k linearly independent eigenvectors, then the
Jordan canonical form of A has k Jordan blocks.
(3) If a matrix A has k distinct eigenvalues, then the Jordan canonical form
of A has k Jordan blocks.
(4) If a 4 x 4 matrix A has eigenvalues 1 and 2, each of multiplicity 2, such
that dimE(I) = 2 and dimE(2) = 1, then the Jordan canonical form
of A has three Jordan blocks.
(5) If A!, ... , Ak are k distinct eigenvalues of A with multiplicities mi and
dimE(Ai) =I- mi, then A is not diagonalizable.
(6) For any Jordan block J with eigenvalue A, det e J = eA.
(7) If f(x) is a polynomial and A is a square matrix such that f(A) = 0,
then f(x) is a multiple of the characteristic polynomial of A.
Selected Answers and Hints
Chapter 1
Problems
1. 2 (1) Inconsistent.
(2) (Xl, X2, X3, X4) = (-1 - 4t, 6 - 2t, 2 - 3t, t) for any t E R
1.3 (1) (x, y, z) = (t, -t, t). (3) (w, x, y, z) = (2, 0, 1, 3)
1.4 (1) b1 + b2 - b3 = O. (2) For any bi's.
1. 7 a =- 127, b= 123, C = 143 , d = -4.
1.8 Consider the matrices: A = [; :], B = [; ~], C = [~ ~].
1.9 Compare the diagonal entries of AAT and AT A.
1.11 (1) Infinitely many for a = 4, exactly one for a i- ±4,and none for a = -4.
(2) Infinitely many for a = 2, none for a = -3, and exactly one otherwise.
343
-L nD~u +4;3lU~[~ T +1
344 Selected Answers and Hints
L25L~[+
1.26 There are four possibilities for P.
1.27 (1) II = 0.5,12 = 6,13 = 5.5. (2) h = O,h = h = 1,14 = h = 5.
0.35]
1.29 x = k [ 0.40 , for k > O.
0.25
Exercises
1.23 (1) A = [ ;
3 1 1
~ ~] [~~ ~ [~0 i0
0 0 -1
] i],
1
Chapter 2
Problems
2.4 (1) -27, (2) 0, (3) (1- x 4)3.
2.6 Let a be a transposition in Sn. Then the composition of a with an even (odd)
permutation in Sn is an odd (even, respectively) permutation.
2.9 (1) -14. (2) o.
2.10 See Example 2.6, and use mathematical induction on n.
2.12 If A = 0, then clearly adjA = O. Otherwise, use A· adjA = (det A)I.
2.13 Use adjA . adj(adjA) = det(adjA) = I.
2.14 (1) Xl = 4,X2 = 1, X3 = -2.
10 5 5
(2) X = 23' Y = 6' z = 2·
346 Selected Answers and Hints
Exercises
2.1 k = 0 or 2.
2.2 It is not necessary to compute A 2 or A 3 .
2.3 -37.
2.4 (1) detA= (_I)n-l(n-1). (2)0.
2.5 -2,0,1,4.
2.6 Consider L alCY(l) ... ana(n)'
Chapter 3
Problems
3.22 By (2) of Theorem 3.21 and Corollary 3.18, a matrix A of rank r must have
an an invertible submatrix C of rank r. By (1) of the same theorem, the rank
of C must be the largest.
3.24 dim(V + W) = 4 and dim(V n W) = l.
3.25 A basis for V is {(1,0,0,0), (0,-1,1,0), (0,-1,0,1)},
for W: {(-I, 1,0,0), (0,0,2, I)}, and for V n W : {(3, -3, 2, I)}. Thus,
dim(V + W) = 4 means V + W = ]R4 and any basis for]R4 works for V + W.
348 Selected Answers and Hints
3.28
°° °2
1 1
1 2
Exercises
3.1 Consider 0(1,1).
3.2 (5).
3.3 (1), (2), (3).
3.4 (1).
3.5 (1), (4).
3.6 No.
3.7 (1) p(x) = -Pl(X) + 3p2(X) - 2P3(X).
3.8 No.
3.10 No.
3.11 {(I, 1,0), (1,0, I)}.
3.12 (3) (5, 2, 0).
3.13 2.
3.14 Consider {ej = {a;}~d where ai = { °
I if i = j,
otherwise.
°
3.15 (1) 0 = c1Ab 1+. ,+cpAb s = A(c1b 1+ . .+cpb S ) implies c1b 1+. ,+cpbP = 0
since N(A) = 0, and this also implies Ci = for all i = 1, ... ,p since columns
of B are linear independent.
(2) B has a right inverse. (3) and (4): Look at (1) and (2) above.
3.16 (1) {( -5,3, I)}. (2) 3.
3.17 5!, and dependent.
3.18
(1) R(A) ((1,2,0,3), (0,0,1,2)), C(A) = ((5,0,1), (0,5,2)),
N(A) (( -2,1,0,0), (-3,0, -2, 1)).
(2) R(B) ((1,1, -2, 2), (0,2,1, -5), (0,0,0,1)),
C(B) ((1,-2,0), (0,1,1), (0,0,1)), N(B) = ((5,-1,2,0)).
3.19 rank = 2 when x = -3, rank = 3 when x =J -3.
3.21 See Exercise 2.23: Each column vector of UyT is of the form ViU, that is, u
spans the column space. Conversely, if A is of rank 1, then the column space
is spanned by anyone column of A, say the first column u of A, and the
remaining columns are of the form ViU, i = 2, ... , n. Take y = [1 V2 ... VnV,
Then one can easily see that A = UyT.
3.22 Three of them are true.
Selected Answers and Hints 349
Chapter 4
Problems
~ [~ ~ ~ ~
in n
415 [S +TI. [T 0 Sio
Exercises
4.1 (2).
4.2 ax 3 + bx 2 + ax + c.
4.5 (1) Consider the decomposition of v = v+;(v) + v-;(v).
4.6 (1) {(x, ~x, 2x) E JR3 : x E JR}.
4.7 (2) T-1(r, s, t) = (~r, 2r - s, 7r - 3s - t).
4.8 (1) Since To S is one-to-one from V into V, To S is also onto and so T is
onto. Moreover, if S(u) = S(v), then To S(u) = To S(v) implies u = v.
Thus, S is one-to-one, and so onto. This implies T is one-to-one. In fact, if
T(u) = T(v), then there exist x and y such that S(x) = u and S(y) = v.
Thus To S(x) = To S(y) implies x = y and so u = T(x) = T(y) = v.
4.9 Note that T cannot be one-to-one and S cannot be onto.
4.14 (1) T(1, 0, 0) = (4, 0), T(1, 1, 0) = (1, 3), T(1, 1, 1) = (4, 3).
(2) T(x, y, z) = (4x - 2y + z, y + 2z).
4.16 (1) P ~ [~
4.17 Use the trace.
_: -H (2) Q ~ [: ~ ~ p-'
-7 -33 -13]
4.18 (1) [ 4 19 8·
n
4.26 N(T) = {O},
Chapter 5
Problems
5.1 (x,y)2 = (x,x)(y,y) if and only if Iltx+yl12 = (x,x)t 2 +2(x,y)t+(y,y) =0
has a repeated real root to.
5.2 (4) Compute the square of both sides and then use Cauchy-Schwarz inequality.
5.4 (j,g) = fo1 f(x)g(x)dx defines an inner product on e[O, 1]. Use Cauchy-
Schwarz inequality.
5.5 (1) ~(2, 1, -1), (2) Jh(6, 4, -3).
5.6 (1): Orthogonal, (2) and (3): None, (4): Orthonormal.
5.9 1) is just the definition, and use (1) to prove (2).
352 Selected Answers and Hints
5.11 -i+x.
5.13 {I, V3(2x - 1), V5(6x 2 - 6x + I)}.
5.15 Projw(p) = (~,~, -~).
5.26 [ ~~ ]
'2g
= x = (AT A)-lATh = [ ~~3:
16.1
].
5.28 For x E lR m, x = (VI, X)VI + ... + (vm, x)v m = (VI v[)x + ... + (v m v;')x.
5.29 First, show that P is symmetric.
5.30 The line is a subspace with an orthonormal basis ~ (1, 1), or is the column
space of A = ~[~ ].
Exercises
5.1 Inner products are (2), (4), (5).
5.2 For the last condition of the definition, note that (A, A) tr(A T A)
2::i,j aTj = 0 if and only if aij = 0 for all i, j.
5.4 (1) k=3.
5.5 (3) Ilfll = Ilgll = Ji/2, The angle is 0 if n = m, ~ if n =I- m.
5.6 Use the Cauchy-Schwarz inequality and Problem 5.1 with x = (al,···, an)
and y = (1,···,1) in (lR n , .).
Selected Answers and Hints 353
5 7 (1) _ 37 {l9
. 4' V3'
(2) If (h, g) = h( ~ + ~ +c) = 0 with h =1= 0 a constant and g(x) = ax 2+bx+c,
then (a, b, c) is on the plane ~ + ~ + c = 0 in JR3.
3 1
5.10 (1) 2V2, (2) 2V2.
5.12 Orthogonal: (4). Nonorthogonal: (1), (2), (3).
5.16 Use induction on n. Let B be the matrix A with the first column Cl replaced
by C = Cl -ProjW(cl), and write ProjW(Cl) = a2c2+" ·+anc n for some ai's.
Show that Jdet(AT A) = Jdet(BT B) = Ilcllvol(c2, ... , cn) = vol(P(A)).
3
in y = a + bx. Then y = x + 2'
1 -1 1 -1
1 0 0 0
5.22 Follow Exercise 5.21 with A = 1 1 1 1 . Then y = 2x 3 - 4x 2 +
1 2 4 8
1 3 9 27
3x - 5.
5.25 (1) Let h(x) = ~(f(x) + f( -x)) and g(x) = ~(f(x) - f( -x)). Then f = h+g.
(2) For fEU and 9 E V, (1, g) = f~1 f(x)g(x)dx = - fl- 1 f( -t)g( -t)dt
= - f~1 f(t)g(t)dt = -(1, g), by change of variable x = -to
(3) Expand the length in the inner product.
5.26 Six of them are true.
(1) Consider (1,0) and (-1,0).
(2) Consider two subs paces U and W ofJR3 spanned by el and e2, respectively.
(3) The set of column vectors in a permutation matrix P are just
{el' ... , en}, which is a set of orthonormal vectors.
354 Selected Answers and Hints
Chapter 6
Problems
[T]Q =A = [~ : : ~]. The eigenvalues are 3,1, 1, -1, and their asso-
101 1
ciated eigenvectors are (1,1,1,1), (-1,0,1,0), (0, -1,0,1), and (-1,1, -1, 1),
respectively.
6.16 With '''peot to the bas;, a ~ {I, x, x'J, [T]. ~ [~ : ~]. The e;gen-
values are 2,1, -1 and the eigenvectors are (1,1,1), (-1,1, 0) and (1,1, -2),
respectively.
6.19 Eigenvalues are 1, 1, 2 and eigenvectors are (1, 0, 0), (0,1,2) and (1,2,3).
AlOX = (1025, 2050, 3076).
xn an ] -_
= [ an-l [1 -1]
1 a
[ an-I] -_ A Xn-l -_ An Xl,
an-2
with al = 1 and a2 = O. Using the eigenvalues might make the computation
a mess. Instead, one can use the Cayley-Hamilton Theorem 9.5: Since the
characteristic polynomial of A is A2 - A + 1, A2 - A + I = a holds. Thus,
A 3 = A2 - A = -I, so A 6 = I. One can now easily compute an modulo 6.
6.22 The characteristic equation is A2 - XA - 0.18 = O. Since A = 1 is a solution,
x = 0.82. The eigenvalues are now 1, -0.18 and the eigenvectors are ( -0.3, -1)
and (1, -0.6).
6.23 (1) eA = [~ e- i ].
6.24 The initial status in 1985 is Xo = (xo, Yo, zo) = (0.4, 0.2, 0.4), where x, y, Z
represent the pe[rc;~t]age Of[la;~e, ~~diU~, a]nd[ S~4al]l car owners. In 1995, the
status is Xl = Yl = 0.3 0.7 0.1 0.2 = Axo. Thus, in 2025,
Zl a 0.2 0.9 0.4
Selected Answers and Hints 357
the status is X4 = A4 xQ . The eigenvalues are 0.5, 0.8, and 1, whose eigenvec-
tors are (-0.41,0.82, -0.41), (0.47,0.47, -0.94), and (-0.17, -0.52, -1.04),
respectively.
Yl(X) = _2e 2 (1-x) + 4e 2 (x-l)
6.27 (1) { Y2(X) = _e 2(1-x) + 2e 2(x-l) (2) { Yl(X) = e 2 x(cosx - sin x)
Y3(X) = 2e 2 (1-x) - 2e 2 (x-l).
Y2(X) = 2e 2x sinx.
6.28 (1) f()..) = )..3 - 10)..2 + 28).. - 24, eigenvalues are 6, 2, 2, and eigenvectors are
(1,2,1), (-1,1,0) and (-1,0,1).
(2) f()..) = ().. - 1)()..2 - 6)" + 9), eigenvalues are 1, 3, 3, and eigenvectors are
(2,-1,1), (1,1,0) and (1,0,1).
6.29 Three of them are true:
(1) For A = [~ ~], if B = Q-l AQ, then Q must be singular.
Chapter 7
Problems
7.1 (1) u· v = uT V =-L"' UiVi -= Li ViUi =- v-:-rr.
= Li kUiVi = k Li UiVi = k(u· v).
(3) (ku) . v
(4) °
u· u = Li IUil2 ~ 0, and U· u = if and only if Ui = for all i. °
°: :;
7.2 (1) If x = 0, clear. Suppose x:f 0 :f y. For any scalar k,
(x - ky,x - ky) = (x, x) - k(x,y) - k(y,x) + kk(y,y). Let k = ~~::\
to obtain 1(x, x) (y, y) - 1(x, y) 12 ~ 0. Note that equality holds if and only if
x = ky for some scalar k.
(2) Expand IIx + Yl12 = (x + y, x + y) and use (1).
7.3 Suppose that x and y are linearly independent, and consider the linear depen-
dence a(x+y)+b(x-y) = 0 ofx+y and x-Yo Then 0 = (a+b)x+(a-b)y.
Since x and yare linearly independent, we have a + b = and a - b = which
°
are possible only for a = = b. Thus x+y and x-yare linearly independent.
° °
Conversely, if x + y and x - yare linearly independent, then the the linear
dependence ax+by = 0 ofx and y gives ~(a+b)(x+y)+ ~(a-b)(x-y) = O.
°
Thus we get a = = b. Thus x and y are linearly independent.
358 Selected Answers and Hints
7.4 (1) Eigenvalues are 0, 0, 2 and their eigenvectors are (1,0, -i) and (0,1,0),
respectively. (2) Eigenvalues are 3, 1+2VS , 1-2VS, and their eigenvectors are
-i 1-i) (VS- 3 i 1 1-VS(1 + i)) and (- vs+3i 1 l+VS(1 + i)) respec-
( 1 '~2' 2"2 ' 2"2 '
tively.
7.5 Refer to the real case.
7.6 (AB)H = (AB)T = BT-;r? = BHAH.
7.7 (AH)(A-1)H = (A-1 A)H = I.
7.8 The determinant is just the product of the eigenvalues and a Hermitian matrix
has only real eigenvalues.
7.9 See Exercise 6.9.
7.10 To prove (3) directly, show that "X(x . y) = 7l(x . y) by using the fact that
AH x = -[Link] when Ax = [Link].
7.11 AH = BH + (iC)H = BT - iCT = -B - iC = -A.
7.12 ±AB = (AB)H = BHAH = (±B)(±A) = BA, + if they are Hermitian, - if
they are skew-Hermitian.
7.13 Note that det U H = det U, and 1 = det 1= det(UHU) = Idet U12.
7.16 Since A- 1 = A H , (AB)HAB = I.
7.17 Hermitian means the diagonal entries are real, and diagonality implies off-
diagonal entries are zero. Unitary means the diagonal entries must be ±1.
7.20 This is a normal matrix. From a direct computation, one can find the eigen-
values, 1- i, 1- i and 1 + 2i, and the corresponding eigenvectors: (-1,0,1),
(-1,1,0) and (1,1,1), respectively, which are not orthogonal. But by an
orthonormalization, one can obtain a unitary transition matrix so that A is
unitarily diagonalizable.
7.21 AHA = (H1 - iH2)(H1 + iH2) = (H1 + iH2)(H1 - iH2) = AAH if and only
if H1H2 - H2H1 = 0.
7.22 In one direction these are all already proven in the theorems. Suppose that
UH AU = D for a unitary matrix U and a diagonal matrix D.
(1) and (2). If all the eigenvalues of A are real (or purely imaginary), then
the diagonal entries of D are all real (or purely imaginary). Thus DH = ±D,
so that A is Hermitian (or skew-Hermitian).
(3) The diagonal entries of D satisfy 1).1 = 1. Thus, DH = D-1, and
AH = UD- 1U- 1 = A- 1.
1 [y'3° -v'2 -1 1
7.23 Q = y6
6 y'3 v'2
v'2
-2
1
.
Selected Answers and Hints 359
+ 3-2V6 [
1 (1-V6)(2+i)
5
1
6 (1-V6)(2-i) 7-2V6 .
5 5
Exercises
7.1 (1) y6, (2) 4.
7.4 (1) A = i, x = t(l, -2 - i), A = -i, x = t(l, -2 + i).
(2) A = 1, x = t(i, 1), A = -1, x = t(-i, 1).
(3) Eigenvalues are 2,2 + i, 2 - i, and eigenvectors are (0, -1, 1)),
(1, -i(2 + i), 1), (1, -i(2 - i), 1).
(4) Eigenvalues are 0, -1,2, and eigenvectors are
(1,0, -1)), (1, -i, 1), (1, 2i, 1).
7.6 A + cI is invertible if det(A + cI) =1= O. However, for any matrix A, det(A +
eI) = 0 as a complex polynomial has always a (complex) solution. For the
. [cos
rea I matnx ()] A + r I"IS mvertl'ble Clor every reaI numb er r
. ()() - sin ()'
sm cos
since A has no real eigenvalues.
(6) and (7) A permutation matrix is an orthogonal matrix, but not symmetric.
(10) There is an invertible matrix Q such that A = Q-1 DQ. Thus,
det(A + if) = det(D + if) =I- o.
(11) Consider A = [; =~]. (12) Modify (10).
Chapter 8
Problems
8.1
-1
3 -4] 1 [0 1 1]
1 ,(2) 2 1 0 1 ,(3) [ 101 101 -100 -025] .
1 4 1 1 0 -5 0 2-1
8.2 (1) The eigenvalues of A are 1, 2, 11. (2) The eigenvalues are 17, 0, -3, and
so it is a hyperbolic cylinder. (3) A is singular and the linear form is present,
thus the graph is a parabola.
8.4 (1) local minimum, (2) saddle point.
8.5 (1) is indefinite. (2) and (3) are positive definite.
8.6 max =~ at ±(I/V2, 1/V2), min =~ at ±(I/V2, -1/V2).
8.7 (1) max = 4 at ± ~ (1, 1, 2), min = -2 at ± ~ (-1, -1, 1);
1 . 1
(2) max = 3 at ± J6 (2, 1, 1), mm = 0 at ± J3 (1, -1, -1).
8.8 A is negative definite if and only if -A is positive definite.
8.9 B with the eigenvalues 2, 2 + V2 and 2 - V2.
8.12 The determinant is the product of the eigenvalues.
8.13 (2) bl l = b14 = b41 = b44 = 1, all others are zero.
Selected Answers and Hints 361
Exercises
8.5 (2) The point (1,7r) is a critical point, and the Hessian is [~ _~]. Hence,
1(1,7r) is a local maximum.
8.7 Note that the maximum value of R(x) is the maximum eigenvalue of A, and
similarly for the minimum value.
8.9 If A is an eigenvalue of A, then ..\2 and *are eigenvalues of A2 and A-I,
respectively. Note x T (A + B)x = x T Ax + x T Bx.
Chapter 9
Problems
9.2 (1) For A = -1, Xl = (-2, 0, 1), X2 = (0, 1, 1), and for A = 0, Xl =
n
(-1, 1, 1). (2) For A = 1, Xl = (-2, 0, 1), X2 = (~, ~, 0), and for A = -1,
Xl = (-9, -1, 1).
9.5 The eigenvalue is -1 of multiplicity 3 and has only one linearly independent
eigenvector (1,0,3). The solution is
365
366 Index
The row space and column space of a matrix have the same dimension, known as the rank of the matrix. This is because the number of pivot columns (non-zero row vectors in row-echelon form) determines both spaces' dimensions .
The determinant of the product of two matrices A and B is the product of their determinants, i.e., det(AB) = det(A)det(B). This property relies on the linearity and multiplication rule of determinants that allows such a decomposition .
For a triangular matrix, forward elimination can reduce it to a diagonal form without altering the determinant. In such matrices, the determinant equals the product of the diagonal entries as there are no cross terms and the rest of the terms in the determinant computation are zero .
A matrix is invertible if and only if its determinant is non-zero. If det A = 0, the matrix A is singular and not invertible .
An orthonormal basis can be constructed using the Gram-Schmidt process, which orthogonalizes a set of vectors and normalizes them to unit vectors. This transformation maintains the vector space structure while enforcing orthogonality and normalization .
Elementary row operations affect the determinant of a matrix in the following ways: Multiplying a row by a constant k scales the determinant by k; interchanging two rows changes the sign of the determinant; adding a multiple of one row to another does not change the determinant .
The inverse of an invertible matrix A can be found using its adjugate by the formula A^-1 = adj(A)/det(A). This makes use of the relationship between the matrix and its cofactor matrix (adjugate), divided by the determinant .
The determinant of a matrix A is equal to the determinant of its transpose, det(A) = det(A^T). This property implies that the transpose operation does not affect the invertibility of a matrix; if A is invertible, so is A^T, affirming that singular and invertible properties are preserved through transpose .
A set of vectors is linearly independent if there is no non-trivial linear combination of them that equals the zero vector. Conversely, they are dependent if such a combination exists. For matrix columns or rows, this translates to the homogeneous equation Ax = 0 having only the trivial solution indicating independence; otherwise, they are dependent .
A square matrix is diagonalizable if it has a full set of linearly independent eigenvectors. Diagonalization involves expressing the matrix in terms of these eigenvectors, where the diagonal matrix represents eigenvalues along its diagonal. This requires the eigenvectors to form a complete basis for the matrix space .