Matrix Notes
Matrix Notes
“It is not knowledge, but the act of learning, not the possession of but the act of getting
there, which grants the greatest enjoyment.” Carl Friedrich Gauss
Most of the content of this chapter is a revision of what we already know. The
discussion is brief but sufficient for our purpose. There are many ‘why?’, consider them
as interesting exercises, it will help revising the concepts better. Ideally we should have
started with the introduction to graphs, but we will begin with the revision of matrices,
as they are a very natural way to represent a graph.
2.1 Matrices
Let us recall our school days where we have studied the matrices, the arrangment of
numbers (entries) in rows and columns. For example following is a matrix
1 0
Q=
−1 1 .
0 −1
The entries of a matrix can be complex numbers but we will restrict ourselves to real
numbers, in fact almost always to just integers. An another matrix
" #
1 −1 0
0 1 −1
9
has the same entries as Q but the arrangement is different; the rows of the Q become
corresponding columns of the above matrix. By making rows of a matrix M correspond-
ing columns we get another matrix called as transpose of M , denoted M T . The above
matrix is QT . We see in Q there are three rows and two columns. In general when a
matrix has m rows and n column we say the matrix has the order or the dimension
m × n, so Q has order 3 × 2. In particular when m = n, we say the matrix is a square
matrix of order n. A matrix of order 1 × n is called a row vector of order n; similarly, an
n × 1 matrix is a column vector of order n1 . The set of all the column vectors of order n
having real entries is denoted by Rn . We denote the i-th row of a matrix M by M (i, :),
and the i-th column by M (:, i). The entry that is in row M (i, :) and column M (:, j) is
denoted by M (i, j). If i = j, we call M (i, j) a diagonal entry, else nondiagonal entry.
Often we write M = (mij ) to denote the matrix where M (i, j) = mij .
Recall matrix product of two matrices. The product of Q with QT , that is, QQT is the
matrix
1 −1 0
L=
−1 2 .
−1
0 −1 1
Check whether QQT = QT Q?...Big no!. Matrix product is probably one of the first
few counterintuitive notions in our school. Firstly, for any two matrices A, B it is not
necessary that both AB and BA are defined. Second, even if they both are defined it is
not necessary that AB = BA, that is, it is not commutative.
Now let us come back to L. Apart from being a square matrix L has an special
arrangement of entries; it is transpose of itself, that is, L = LT . Such a matrix is
called a symmetric matrix. Symmetric matrices are very interesting and useful matrices,
they are used in plethora of applications. A trivial symmetric matrix where all the
nondiagonal entries are zero is called a diagonal matrix. We denote a diagonal matrix
D = (dij ) of order n by diag(d11 , . . . , dnn ), where dii is the i-th diagonal entry. For
example diag(1, 2, 1) is the matrix
1 0 0
D=
0 2 0 .
0 0 1
10
Exercise 2.1.1. Prove that for any two square matrices of order n if AB = In , then
BA = In .
y y y
v v v
u
u
u
x x x
Figure 2.1: Different possibilities for vectors u, v. If u, v are orthogonal then they are
also independent.
It is correct, now let us see the above product with a different way; the output vector
[2, 3, −5]T can be written as
1 0
2
−1 + 5 1 ,
0 −1
11
Suppose we have vectors v1 , . . . , vn and scalars α1 , . . . , αn , not all zero, then the vector
α1 v1 + · · · + α2 vn is said to be a linear combination of v1 , . . . , vn . The scalars α1 , . . . , αn ,
are the coefficients of the linear combination.
Suppose we have matrices A of order m × n and B of order n × p. Its now easy to see
that in AB the columns are the linear combinations of the columns of A and coefficients
are given by the columns of B. We can think of product AB as linear combinations
of rows rather than columns. In AB the rows are the linear combinations of the rows
of B and coefficients are given by the rows of A. We will see in many occasions that
visualising the matrix product as linear combinations of rows or columns is quite handy.
Write down AB as (a) linear combination of rows, (b) linear combination of columns.
α1 v1 + · · · + α2 vn = 0. (2.1)
Check that columns of Q and the output vector [2, 3, −5]T form a set of linearly dependent
vectors. If the vectors v1 , . . . , vn are not dependent, they are called independent vectors.
For a given set of vectors V = {v1 , . . . , vk } we can form a set S of all the vectors which
are some linear combination of vectors in V . The( "set
# S" is# called
" # )the vector space of
1 0 1
V . For example the vector space for a set V = , , is the 2D space as
0 2 2
every vector in 2D(plane
" # "can # )written as some combination of vectors in V . But see
1 0
that the set V 0 = , can also generate (or span) the 2D space using linear
0 2
" #
1
combinations of its vectors. In this context the vector is redundant. For a vector
2
space a maximal independent set of vectors whose linear combinations can generate all
vectors in the vector space is called it basis. In the above example V 0 is basis for 2D
12
while V is not a basis as it is not a maximal independent set. Note that there can be
several basis for a vector space, but the number of vectors in every basis is the same.
This number is called the dimension of the vector space.
Exercise 2.3.1. Check whether the following vectors are linear independent or dependent
2 0 4
−1 , 3 , 0 .
0 2.5 1
Exercise 2.3.2. Find a maximal set of pair-wise independent vectors from the following
vectors.
1 1 2 −1 −1
1 , 0 , 1 , −1 , 0 .
−1 −1 0 1 1
Exercise 2.3.3. Find a value of x for which the following set of vectors form a basis
for R3 .
( 1 x 1 )
V0 =
0 , 0 , x
x 0 1
Following inequality about the dot product of two vectors is quite helpful to know. The
proof is given in Appendix 19.
Cauchy–Schwarz inequality
√
where, |.| denotes the absolute value. The quantity uT u is known as 2-norm of u
and is denoted by ||u||. Thus Cauchy–Schwarz inequality is often stated as
|uT v| ≤ ||u||||v||.
Exercise 2.3.5. Prove that equality in the Cauchy-Schwarz inequality holds if and only
if u = cv, where c is some scalar.
13
It is quite fruitful to know that if the dot product of two nonzero vectors u, v, that
is, uT v is 0, then they are always independent (why?), such vectors are known as
orthogonal vectors. Recall our 3D-coordinate space with famous x-axis, y-axis, z-axis,
pick one vector along each of these axes, check that they are pairwise orthogonal. In
general, if v1 , . . . , vn are pairwise orthogonal vectors, they form an independent set of
vectors (why?). Corresponding to a vector space if a basis has vectors which are all pair
wise orthogonal to each other, then this basis is called as orthogonal basis (its callled
orthonormal basis if in addition every vector has norm 1.)
Exercise 2.3.7. Prove that vectors u, v are orthogonal if and only if ||u + v|| = ||u − v||.
Exercise 2.3.8. Find a value of x for which the following set of vectors form an
orthogonal basis for R3 . Convert the orthogonal basis into a orthonormal basis.
( x −2 0 )
V =
1 , x , 1 .
1 1 −x
M x = α1 v1 + · · · + αn vn .
14
column rank must also be 15. For any matrix the row rank and the column rank are
the same. This number is called the rank of M , denoted rank M .
Suppose the column rank of M is p. That means there exist basis vectors b1 , . . . , bp
and each column vector of M can be written as some linear combination of these
basis vectors. Suppose vi = α1i b1 + · · · + αki bp , (the coefficients α1i , . . . αpi can not be
simultaneously zero) and let ai = [α1i , α2i , . . . , αpi ]T , i = 1, . . . , n. Then we can write
| | ... | | | ... | | | ... |
M =
v 1 v 2 . . . vn
= b1 b2 . . . bp a1 a2 . . . an .
| | ... | | | ... | | | ... |
| {z }| {z }
B A
Now think about the row rank of matrix M . Observe that any row of M is a linear
combination of rows in A and the coefficients are given by the rows in B. (In particular,
i-th row of M is a linear combination of rows of A, and the i-th row of B give the
corresponding coefficients.) This implies that row rank of M is at most p which is the
column rank.
Now suppose the row rank of M is q. Now visualise M in terms of its rows. Let
u1 , u2 , . . . , um be its rows. As the row rank is q, there exist basis vectors y1 , . . . , yq
and each row vector of M can be written as some linear combination of these basis
vectors. Suppose ui = βi1 y1 + · · · + βik yq , (again, the coefficients βi1 , . . . βiq can not be
simultaneously zero) and let xi = [βi1 , βi2 , . . . , βiq ], i = 1, . . . , m. Then we can write
− u1 − − x1 − − y1 −
− u2 − − x2 −− y2 −
M =. =.
.. .. .. . .. ..
...
.. ..
. .
. . . . .
− um − − xm − − yq −
| {z }| {z }
X Y
Now think about the column rank of matrix M . Observe that any column of M is a
linear combination of columns in X and the coefficients are given by the columns in Y .
15
(In particular, i-th column of M is a linear combination of columns of X, and the i-th
column of Y gives the corresponding coefficients.) This implies that column rank of M
is at most q which is the row rank.
Combining the facts that the row rank can be at most column rank, and column rank
can be at most rank rank we arrive at the conclusion that both the ranks must be equal.
Theorem 2.4.1. For any matrix the row rank and the column rank are the same.
Exercise 2.4.2. Find the row rank and the column rank of the following matrix and
verify that both are the same " #
1 −1 0
.
2 −1 4
Exercise 2.4.3. Prove that if M = BC, then rank M ≤ min(rank B, rank C).
M x = 0. (2.2)
Is this possible even if M, x are nonzero matrices? Think about two real number a, b such
that ab = 0. We know that this is possible when only at least one of a, b is zero. Can we
generalize this to matrices and say that at least one of M, x must be a zero matrix for
M x = 0? No!. Multiply QT by all-one vector [1, 1, 1]T it gives a zero vector [0, 0, 0]T .
The set of all x satisfying Equation (2.2) is called null space of M . Vector [1, 1, 1]T is in
the null space of QT . The minimal number of vectors whose linear combinations form
the null space of M is called the nullity of M , denoted, null M . Another fundamental
fact is
rank M + null M = n,
see Appendix 19 for a proof. Next, suppose x is a nonzero vector, under what condition
M x = 0? Notice that this is the same as the form (2.1). When M is a square matrix
what does this imply? Here comes our old strange old mate, the determinant! It is
possible only when the determinant of M , denoted det M is zero. Recall that M x = 0
16
Figure 2.2: 2D plane: on projecting (multiplying) each point in the left shaded region
with area a by matrix A and we get the shaded region on right with the area | det(A)|a.
(In 3D it happens for volume.)
says that the columns of M are linearly dependent which implies det M = 0, to figure
out why? Its time to revisit our old mate, the determinant. We had already seen
the applications of determinants in solving the system of linear equations, finding the
area of 2D figures and volume of 3D figures. In, fact determinants can determine the
areas or volumes of figures resulting after linear transformation, see Figure 2.2. We will
see many more applications of determinants. The following quote on determinant by
mathematician J. J. Sylvester tells its great importance.
For what is the theory of determinants? It is an algebra upon algebra; a calculus which
enables us to combine and foretell the results of algebraical operations, in the same way
as algebra itself enable us to dispense with the performance of the special operations of
arithmetic. All analysis must ultimately clothe itself under this form.
and
17
a b c " # " # " #
e f d f d e
d e f = a det h i − b det g i + c det g h
det
(2.4)
g h i
= aei − af h − bdi + bgf + cdh − ceg.
In the above equation we expanded the determinant of the 3 × 3 matrix with respect to
the first row. This method is called Laplace expansion which is more formally defined
as follows. Let M = (mij ) be a matrix of order n, and let M (i|j) be the matrix that
results from M by removing i-th row and j-th column. Then for any i the following
recurrence relation holds
n
X
det M = (−1)i+j mij det M (i|j). (2.5)
j=1
Recall that the following elementary row operations; these are at the heart of determi-
nant.
If the rows (hence the columns) of M are linear dependent, by using these row operations
we can make a row to be zero, which implies det M = 0. Note that all the three row
operations on any matrix leave the rank of the matrix unchanged.
A matrix resulting after applying one row operation to the identity matrix I is known
as an elementary matrix. Those which involve switching rows of the identity matrix are
called permutation matrices. Performing any of the three row operations on a matrix A
is equivalent to take the product EA, where E is the elementary matrix obtained by
using the desired row operation on the identity matrix. Thus
18
6 0 (why?). Another important fact that
(similarly rank AE = rank A). Also det E =
follows from the elementary row operations is that for any square matrix M
Recall that if M is full rank, that is, all the rows are independent, then there exist
elementary matrices, E1 , . . . , Ek such that
Ek . . . E1 M = I. (2.8)
where A, B are any suitable order matrices. The above discussion leads to the following
important relation on determinant and determinant of the product of two square matrices
A, B of the same order.
Exercise 2.6.3. Using determinants find the rank of the following matrix.
1 1 2 3
4 1 5 6
7 1 8 9
19
Exercise 2.6.4. If M is a nonsingular matrix, prove that det A−1 = 1
det A .
Exercise 2.6.7. Prove that for a p × q matrix M , det M T M > 0 if and only if
rank M = q.
Till now we have seen plenty of examples for multiplying a matrix with a vector. If M is
a square matrix of order n and x is a vector of order n, then M x is some vector of order
n. Usually x and M x are oriented in different directions. But there are special vectors
x such that both x and M x have the same direction or just opposite direction. In other
words, M x is some scaling of x. These special vectors are known as eigenvectors of M ,
and the scaling factor is known as the eigenvalue. Formally we have
M x = λx. (2.11)
The scalar λ is the eigenvalue, and x is a corresponding eigenvector. Notice that any
nonzero multiple of x is still an eigenvector corresponding to the eigenvalue λ. Thus
most often, we consider x having unit length, that is, xT x = 1, also called a normal
vector.
y y y Mu
Mu
−1
u=
1
1 1
u= u=
1 0
x x x
Mu
20
The eigenvalues, eigenvectors play a significant role in understanding various phenomena
in many branches of science. How to find out the eigenvalues and eigenvectors? In
Equation (2.11) the only thing that we know is M ; the other two are unknown. But
somehow, if we know one of λ, x we can know the other. We can write
(M − λI)x = 0.
It has the same form as Equation (2.2). It implies that det(M − λI) = 0. This gives a
polynomial of λ having degree n, we call this polynomial as the characteristic polynomial.
The n roots λ1 , . . . , λ2 of this polynomial are the eigenvalues of M . The set of all the
eigenvalues are known as the spectrum of the matrix. We may factor the characteristic
polynomial as
det(M − λI) = (λ1 − λ) . . . (λn − λ).
The relationship between the eigenvalues and the determinants of submatrices is quite
crucial to know. A principal submatrix of M is a submatrix formed by a set of rows
and the corresponding set of columns. A principal minor of M is the determinant of a
principal submatrix. We state the following theorem which tells a relationship between
eigenvalues and the principal minors. A proof is given in Chapter 19 (not yet).
For k = 1, this implies that the sum of all the eigenvalues is equal to the trace of M (the
sum of the diagonal entries). For k = n, this implies that the product of the eigenvalues
is equal to the determinant of M .
" #
1 2
Exercise 2.7.2. Find the eigenvalues of in terms of k. Also find an eigenvector
k 1
corresponding to each of the eigenvalues.
Exercise 2.7.3. Find the eigenvalues and eigenvectors of the following n × n matrices
n 0 ... 0 0 0 ... 0 0 1 ... 1 n 1 ... 1
0 n . . . 0 0 0 . . . 0 1 0 . . . 1 1 n ... 1
, , ,
. . . . .. . . .
.. .. . . ...
. . .
.. .. . . ...
. . .. ..
.. .. . . .. .. . .
0 0 ... 0 0 0 ... n 1 1 ... 0 1 1 ... n
21
Exercise 2.7.4. Prove that, if v1 , v2 , . . . , vk are the eigenvectors of M associated with
the same eigenvalue λ, then any nonzero linear combination
v = α1 v1 + α2 v2 + · · · + αk vk
Exercise 2.7.5. For a square matrix M , prove that M and M T have the same eigen-
values.
Exercise 2.7.6. For any Am×n , Bn×m matrices, prove that the nonzero eigenvalues of
AB and BA are the same.
M x = αx, (2.12)
M y = βy. (2.13)
On both the side of Equation (2.12), take dot product with y, similarly, on both the
side of Equation (2.13), take dot product with x, we get
y T M x = αy T x, (2.14)
xT M y = βxT y. (2.15)
Taking transpose on both the sides of Equation (2.14), and then subtracting it from
Equation (2.15) we get (α − β)(xT y) = 0, which implies xT y = 0, that is, x, y are
orthogonal.
Theorem 2.8.1. For a symmetric matrix the eigenvectors associated with distinct
eigenvalues are orthogonal.
22
2.8.1 Orthogonal diagonalization
In fact for symmetric matrices we can say more than what Theorem 2.8.1 conveys.
Even if the eigenvalues are not distinct we can choose associated eigenvectors to be
orthogonal. The next theorem states this fact, its proof is given in Appendix 19. It
is also known as the spectral theorem, and the decomposition it gives is known as
eigenvalue decomposition of a symmetric matrix.
Consider the following symmetric matrix A and a matrix B which is a principal submatrix
of A.
−2 4 2 −4
4
−2 4 2
4 2 1
A= , B = 4 4 2.
2 2 0 −3
2 2 0
−4 1 −3 4
The eigenvalues of A are λ1 = 8.62, λ2 = 5.03, λ3 = −1.95, λ4 = −5.70, and the eigen-
values of B are β1 = 7.03, β2 = −0.81, β3 = −4.23. Let us plot these eigenvalues on the
number line.
β3 β2 β1
λ4 λ3 0 λ2 λ1
23
Any βi is trapped between λi and λi+1 ; we have" λi+1#≤ βi ≤ λi for i = 1, 2, 3.
4 2
Now let us consider another principal submatrix of A. Its eigenvalues are
2 0
β1 = 4.83, β2 = −0.83. Let us plot these too on the number line.
β2 β1
λ4 λ3 0 λ2 λ1
λi ≥ βi ≥ λi+n−m , for i = 1, . . . , m.
In particular if m = n − 1, then
λ1 ≥ β1 ≥ λ2 ≥ · · · ≥ βn−1 ≥ λn .
Its eigenvalues are λ1 = 8.76, λ2 = 7.13, λ3 = 1.35, λ4 = 0.76. On the number line they
are placed as follows.
0 λ4 λ3 λ2 λ1
24
All the eigenvalues of M are nonnegative. A symmetric matrix whose all the eigenvalues
are nonnegative is known as positive semi-definite matrix, and when the eigenvalues are
positive it is known as positive definite matrix. Check that M is positive definite. These
matrices have very high importance for different applications and display beautiful
properties. We will see two equivalent criteria for a matrix to be positive semidefinite
or definite.
Let M be a symmetric positive semidefinite matrix of order n. Using 2.8.2 we can
write M = V DV T , where V is the matrix with columns v1 , v2 , . . . , vn which are the
eigenvectors of M corresponding to the eigenvalues λ1 , . . . , λn , and V is a orthogonal
1 1 1
matrix. Since the eigenvalues are nonnegative we can write D = D 2 D 2 , where D 2
√ 1
is a diagonal matrix with i-th diagonal entry equals to λi . Let V̂ = V D 2 , then
1 1
we can write M = V D 2 D 2 V T = V̂ V̂ T . It implies that for any nonzero vector x, we
have xT M x = xT V̂ V̂ T x = (V̂ T x)T (V̂ T x) ≥ 0. We will see that this is a very useful
information. When M is positive definite, that is, all the eigenvalues are positive, then
V̂ has rank n and so V̂ T x is can not be a zero vector unless x is zero vector. Since x is
nonzero this implies that xT M x > 0.
Now, assume that xT M x ≥ 0, for any symmetric matrix M and nonzero vector x. Using
Theorem 19.0.5 the smallest eigenvalue λn of M is equal to
xT M x
λn = min xT M x = min .
||x||=1 ||x||2
Since xT M x ≥ 0, we have, λn ≥ 0 which implies that all the eigenvalues are nonnegative.
If xT M x > 0, it implies that all the eigenvalues are positive. Thus we state the following
theorem.
Exercise 2.8.5. If A and B are positive semidefinite matrices, then trace AB ≥ 0, and
equality holds if and only if AB = 0.
Exercise 2.8.6. Prove that the rank of a symmetric matrix is equal to the number of
25
its nonzero eigenvalues.
Exercise 2.8.8. Show that trace AT A = 0 if and only if the eigenvalues of AT A are
zero.
Exercise 2.8.10. Show that for any matrix Am×n , nullspaces of A and AT A are the
same. Also, the column spaces of A and AT A are the same.
compute the eigenvalues, and verify the Cauchy interlace theorem by computing the
eigenvalues of its principal submatrices.
26