MAE270A Linear Dynamic Systems
Lecture Notes #1
Prof. M’Closkey
• Some notation (not comprehensive† )
• Eigenvalue/vector review
• A theorem on matrices with distinct e-vals
• Definitions of hermitian and real symmetric matrices, unitary matrices, and skew-
symmetric matrices
• Hermitian matrix diagonalization
†
Supplemental notes not covered in class: Linear_Algebra_Review1.pdf and
Linear_Algebra_Review2.pdf
Notation
1. R denotes the set of real numbers, C denotes the set of complex numbers. If we
use F, it means that within a definition/theorem/proof every occurrence of F can
be interpreted either as R or C.
2. Rn will denote the set of all n × 1 vectors with real entries. Cn will denote the set
of all n × 1 vectors with complex entries. If v ∈ Fn , then vi denotes the i’th entry
of v.
3. Rn×m denotes the set of all real, n × m matrices. Here, the first integer denotes
the number of rows, and the second integer denotes the number of columns.
Similarly, Cn×m denotes the set of complex n × m matrices. If A ∈ Fn×m , then
(A)ij or aij usually denote the (i, j)’th entry of A (i’th row, j’th column).
4. Let v ∈ Fn , then vT denotes the transpose of v. Note that vT ∈ F1×n and (vT )1k =
vk .
5. Let v ∈ Fn , then v∗ denotes the complex conjugate-transpose of v. Note that
v∗ ∈ F1×n and (v∗ )1k = vk∗
6. Let A ∈ Fn×m , then AT denotes the transpose of A. Note that AT ∈ Fm×n so
(AT )ij = (A)ji = aji .
7. Let A ∈ Fn×m , then A∗ denotes the complex conjugate-transpose of A. Note that
A∗ ∈ Fm×n .
Matrix-vector operations
1. Addition of vectors: if v ∈ Fn and u ∈ Fn , then the sum u + v ∈ Fn is defined by
(v + u)i := vi + ui
2. Addition of matrices: if A ∈ Fn×m and B ∈ Fn×m , then the sum A + B ∈ Fn×m is
defined by
(A + B)ij := aij + bij
1
3. Scalar multiplication: if α ∈ F and v ∈ Fn , then the product αv ∈ Fn is defined by
(αv)i := αvi
4. Scalar multiplication: if α ∈ F and A ∈ Fn×m , then the product αA ∈ Fn×m is
defined by
(αA)ij := αaij
5. Matrix-Vector multiplication: if A ∈ Fn×m and v ∈ Fm , then the product Av ∈ Fn is
defined by
m
X
(Av)i := aij vj
j=1
6. Matrix-Matrix Multiplication: if A ∈ Fn×m and B ∈ Fm×p , then the product AB ∈
Fn×p is defined by
m
X
(AB)ij := aik bkj
k=1
7. Suppose x, y, z ∈ Fn and α, β ∈ F, then
• Addition is commutative and associative
x + y = y + x, x + (y + z) = (x + y) + z
• The zero vector 0n is the only vector such that x + 0n = x for all x ∈ Fn
• For each x ∈ Rn the vector −x is the unique vector such that
x + (−x) = 0
• Scalar multiplication is associative and distributive with respect to scalar addi-
tion
α(βx) = (αβ)x, (α + β)x = αx + βx
• Scalar multiplication is distributive with respect to vector addition
α(x + y) = αx + αy
8. Matrix addition is commutative, so for every matrix A and B of the same dimen-
sions,
A+B =B+A
2
9. For every A, B ∈ Fn×m and every x, y ∈ Fm ,
A(x + y) = Ax + Ay, (A + B)x = Ax + Bx
Hence vector addition distributes across matrix-vector multiplication, and matrix
addition distributes across matrix-vector multiplication.
10. Matrix multiplication is associative, so for every A ∈ Fn×p , B ∈ Fp×q , A ∈ Fq×m
A(BC) = (AB)C
11. For every A ∈ Fn×m , α ∈ F, and x ∈ Fm ,
A(αx) = α(Ax) = (αA)x
12. Block Partitioned Matrices: Suppose that n1 + n2 = n, m1 + m2 = m, p1 + p2 = p.
Let A ∈ Fn×m and B ∈ Fm×p be partitioned as
A11 A12 B11 B12
A= B=
A21 A22 B21 B22
with each Aij ∈ Fni ×mj and Bjk ∈ Fmj ×pk . Then
A11 B11 + A12 B21 A11 B12 + A12 B22
AB =
A21 B11 + A22 B21 A21 B12 + A22 B22
13. Inner product: Let u, v ∈ Cn . Their inner product, denoted hu, vi, is defined
n
X
∗
hu, vi = v u = vk∗ uk ∈ C.
k=1
The vectors are orthogonal if hu, vi = 0. If u, v ∈ Rn then ∗ is replaced by T .
3
Eigenvalues and Eigenvectors
Definition:
• The scalar λ and the non-zero vector t ∈ Cn are called an eigenvalue-eigenvector
pair of A ∈ Cn×n if the following relation is satisfied
At = λt
Facts:
• the degree n polynomial det(sI −A), where s is an indeterminate variable, is called
the characteristic equation of A
• the eigenvalue/vector relation may be rearranged to (λI − A)t = 0 and this implies
det(λI − A) = 0 since t 6= 0
• e-vals are necessarily roots of the characteristic equation and the converse is also
easily shown to be true
Definition:
• two matrices A, B ∈ Cn×n are similar if there exists a nonsingular matrix T ∈ Cn×n ,
i.e. det T 6= 0, such that
T −1 AT = B
Facts:
• Similar matrices have the same characteristic equation (hence, the same e-vals)
because
det(sI − B) = det(sI − T −1 AT )
= det T −1 (sI − A)T
= det(T −1 ) det(sI − A) det(T )
= det(sI − A)
• similar matrices should be thought of as the same matrix expressed in different
coordinate systems
4
• matrices that have the same characteristic equation are not necessarily similar,
though, for example,
0 0 0 1
A1 = , A2 = ,
0 0 0 0
have the characteristic equation s2 = 0, however, they are not similar since it
would require T ∈ C2×2 , det T 6= 0, such that T −1 A1 T = A2 , which is clearly not
possible.
• Similar matrices arise in the study of linear differential equations: consider a
spring-mass system given by the second order ODE
mẍ + cẋ + kx = 0,
where x represents the mass position and m, c, and k are the mass, damping and
stiffness parameters, respectively. We can define the following state variables:
x1 := x and x2 = ẋ and the corresponding state vector,
x
x= 1 .
x2
The differential equations for the state variables can be derived by differentiating
the definitions:
ẋ1 = ẋ = x2
1 c k
ẋ2 = ẍ = (−cẋ − kx) = − x2 − x1 .
m m m
These coupled, first order, linear differential equations can be organized into a
matrix-vector format:
ẋ1 0 1 x1
= k c ,
ẋ2 − m − m x2
| {z }
A1
which can be succinctly written
ẋ = A1 x.
The choice of state variables is not unique. For example, if we define z1 := ẋ and
z2 := x, with a new state vector
z
z= 1 ,
z2
Then you can confirm ż = A2 z where
c
− m − mk
A2 = .
1 0
5
Thus, the “A” matrix depends on the choice of state variables, however, both differen-
tial equations describe the same physical system so its seems reasonable that there
is a deeper connection between A1 and A2 . Note that a nonsingular matrix links x and
z:
x1 0 1 z1
= ,
x2 1 0 z2
| {z }
T
or, succinctly, x = T z. The time derivatives can also be related:
d d d
ẋ = (T z) = (T )z + T (z) = T ż,
dt dt dt
since d(T )/dt = 0 as T is a time-invariant matrix. Thus,
ẋ = A1 y ⇒ T ż = A1 T z ⇒ ż = T −1 A1 T z,
so
A2 = T −1 A1 T.
In other words, A1 and A2 are similar.
6
Matrices with distinct eigenvalues
Theorem: Suppose the eigenvalues {λ1 , λ2 , . . . , λn } of A ∈ Cn×n are distinct, i.e.
λl 6= λk for k 6= l, then the set of corresponding eigenvectors is linear independent
Proof sketch: Let the e-vecs corresponding to the e-vals be denoted {t1 , t2 , . . . , tn }
and form the matrix T := [t1 t2 · · · tn ] ∈ Cn×n . It is easily shown that the e-vals and
e-vecs of A satisfy AT = T Λ where
λ1 0 0 · · · · · · 0
0 λ2 0 · · · · · · 0
0 0 λ ...
..
3 .
Λ = .. .. . . . . ∈ Cn×n
. . . . . . .
.
. .
. . .
.. .. . . λn−1 0
0 0 ··· ··· 0 λn
Assume that there exists a vector v ∈ Cn , v 6= 0, such that T v = 0; in other words,
the e-vecs are linearly dependent. The objective is to show that v 6= 0 is contradicted
when the e-vals are distinct. Consider the following expressions:
(A − λn I)T v = 0 =⇒ T (Λ − λn I)v = 0
(A − λn−1 I)(A − λn I)T v = 0 =⇒ T (Λ − λn−1 I)(Λ − λn I)v = 0
(A − λn−2 I)(A − λn−1 I)(A − λn I)T v = 0 =⇒ T (Λ − λn−2 I)(Λ − λn−1 I)(Λ − λn I)v = 0
..
.
(A − λ2 I)(A − λ3 I) · · · (A − λn−1 I)(A − λn I)T v = 0
=⇒ T (Λ − λ2 I)(Λ − λ3 I) · · · (Λ − λn−1 I)(Λ − λn I)v = 0
Considering the right hand side of the last expression we see that Λ − λk I is diagonal
and that its (k, k)th element is zero, k = 2, 3, . . . , n. Thus, the product of the matrices
between T and v in this expression is
ψ 0 0··· 0
0 0 · · · 0
∈ Cn×n ,
.. .. . . .
.
. . . .
0 0 ··· 0
7
where ψ = (λ1 − λ2 )(λ1 − λ3 ) · · · (λ1 − λn−1 )(λ1 − λn ). The entire expression reduces to
t1 ψv1 = 0,
where vk denotes the kth element of v. Since t1 6= 0 by definition (it’s an e-vec), and
ψ 6= 0 because the e-vals are distinct, then v1 = 0.
We next consider the expression before the preceding one, i.e.
T (Λ − λ3 I)(Λ − λ4 I) · · · (Λ − λn−1 I)(Λ − λn I)v = 0,
and show, in a manner entirely analogous to that above, that the product of matrices
between T and v reduces to
φ 0 0 ··· 0
0 µ 0 · · · 0
..
0 0 0 . ∈ Cn×n ,
. . . . . ...
.. ..
0 0 ··· ··· 0
where
φ = (λ1 − λ3 )(λ1 − λ4 ) · · · (λ1 − λn−1 )(λ1 − λn )
µ = (λ2 − λ3 )(λ2 − λ4 ) · · · (λ2 − λn−1 )(λ2 − λn )
The expression then reduces to
t1 φv1 + t2 µv2 = 0.
Having already established that v1 = 0, v2 = 0 because t2 6= 0 and µ 6= 0. This proce-
dure is continued until it is shown that all elements of v are zero, thereby contradicting
the assumption of linear dependence of the e-vecs of A. QED
Fact:
• the theorem can be used to show that a matrix with distinct eigenvalues can be
“diagonalized” by a similarity transform whose column vectors are the eigenvec-
tors of the matrix (just rearrange AT = T Λ to T −1 AT = Λ because it was shown
that T is invertible)
8
Some important classes of matrices
Some definitions:
1. A ∈ Cn×n is hermitian if A∗ = A (if A ∈ Rn×n then the matrix is real symmetric if
AT = A); eg.
1 1+j
A1 = is hermitian
1−j 2
1 1+j
A2 = is not hermitian
1−j 2+j
2. U ∈ Cn×n is unitary if U ∗ U = I (if U ∈ Rn×n then it is called an orthogonal matrix);
eg. " #
− √13 √2
6
U1 = is unitary
− √13 + j √13 − √16 + j √16
" #
√1 √1
U2 = 2 2 is orthogonal
− 12
√ √1
2
" #
1 1 1
− 3
√ − 3 −j 3
√ √
U3 = 1 1 √1
is unitary and hermitian
− √3 + j √3 3
3. S ∈ Rn×n is skew-symmetric if −AT = A
4. A ∈ Cn×n is diagonal if only non-zero elements are on its diagonal, in other words
(A)ij = 0 when i 6= j
5. A ∈ Cn×n is normal if A∗ A = AA∗
6. A ∈ Cn×n is upper triangular if all elements below the diagonal are zero, in other
words (A)ij = 0 when i > j
7. A ∈ Cn×n is lower triangular if all elements above the the diagonal are zero, in
other words (A)ij = 0 when i < j
9
Some more theorems:
Theorem: The e-vals of a hermitian matrix A ∈ Cn×n are real.
Proof: Let λ ∈ C and v ∈ Cn be an e-val/vec of A, so v∗ Av = λv∗ v; since (v∗ Av)∗ =
v∗ Av, then λ∗ v∗ v = λv∗ v, and since v∗ v = kvk22 > 0, λ∗ = λ. Note: if A is real
symmetric, then the corresponding eigenvectors can be chosen to be real too.
Example: The e-vals of A1 defined above are λ1 = 0 and λ2 = 3.
Theorem: The e-vecs of a hermitian matrix corresponding to distinct e-vals are or-
thogonal. In other words, if λ1 and λ2 are e-vals such that λ1 6= λ2 , then their e-vecs
are orthogonal: v2∗ v1 = 0.
Proof: Suppose v1 , v2 ∈ Cn correspond to distinct eigenvalues λ1 , λ2 ∈ R; consider
v2∗ Av1 = λ1 v2∗ v1 and note that
∗
v2∗ A = ((v2∗ A)∗ )
= (A∗ v2 )∗
= (Av2 )∗
= (λ2 v2 )∗
= v2∗ λ2 (recall λ2 = λ∗2 because the e-vals are real)
Thus, λ2 v2∗ v1 = λ1 v2∗ v1 ⇒ v2∗ v1 = 0, since λ1 6= λ2 .
Eg. The e-vecs of A1 are orthogonal,
" # " #
√1 + j √1 √1 + j √16
v1 = 3 3 , v2 = 6
− √13 √2
6
Theorem: The e-vals of a skew-symmetric matrix are zero or purely imaginary.
Proof: Let λ ∈ C and v ∈ Cn be an e-val/vec of A, so v∗ Av = λv∗ v; since (v∗ Av)∗ =
−v∗ Av, then λ∗ v∗ v = −λv∗ v, and since v∗ v = kvk22 > 0, λ∗ = −λ.
10
A diagonalizing theorem for hermitian matrices
Theorem: A hermitian matrix can be diagonalized by a unitary similarity transform (if
the matrix is real symmetric then it can be diagonalized by an orthogonal similarity
transform)
Remark: We’ve already seen that the e-vecs of a hermitian matrix are orthogonal
when pertaining to distinct e-vals. Thus, if the e-vals are all distinct then the previous
diagonalizing theorem states that the similarity transformation matrix formed from the
e-vecs will have orthogonal columns, which can then be normalized so that the trans-
formation matrix is unitary, i.e. T ∗ T = I. This theorem states that more is possible,
namely the distinctness of the e-vals is not necessary for diagonalization.
Proof: The proof proceeds by induction. Consider the n = 2 case. Let
a11 a12
A= ∗ , ∈ C2×2 , A∗ = A,
a12 a22
where there exists λ1 ∈ R and v1 6= 0 ∈ C2 such that Av1 = λ1 v1 . We assume that v1 is
normalized to 1, i.e. kv1 k2 = 1. Now construct v2 6= 0 ∈ C2 , orthogonal to v1 and also
normalized, i.e. v1∗ v2 = 0 and kv2 k2 = 1 (see the Gram-Schmidt orthonormalization
process at the end of these notes for a refresher on how to do this). Define T = [v1 v2 ]
then ∗ ∗
v1 v1 v1∗ v2
∗ v1 1 0
T T = ∗ v1 v2 = ∗ = ,
v2 v2 v1 v2∗ v2 0 1
so T is unitary. Now
∗ ∗ ∗ ∗
∗
v v λ1 v v 1 v Av 2 λ 1 v Av 2
T ∗ AT = 1∗ Av1 Av2 = 1∗ λ1 v1 Av2 = 1 1 1
= .
v2 v2 λ1 v2∗ v1 v2∗ Av2 0 v2∗ Av2
However, T ∗ AT is hermitian itself so v1∗ Av2 = 0. Finally, T ∗ AT is similar to A so they
have the same eigenvalues, denoted λ1 and λ2 , hence, v2∗ Av2 = λ2 ,
∗ λ1 0
T AT = .
0 λ2
11
Proceeding by induction, suppose the result is true for hermitian matrices of dimension
1, 2, . . . , N − 1, i.e. given Ak ∈ Ck×k , A∗k = Ak , there exists a Tk ∈ Ck×k , Tk∗ Tk = I and
Tk∗ Ak Tk = diag(λ1 , λ2 , . . . , λk ), k = 1, . . . , N − 1. Consider AN ∈ CN ×N , A∗N = AN ,
and let v1 6= 0 ∈ CN be an e-vec corresponding to e-val λ1 ∈ R, i.e. AN v1 = λ1 v1 .
Construct v2 , v3 , . . . , vN ∈ CN such that T := [v1 v2 · · · vN ] is unitary, i.e. T ∗ T = I.
Now consider
T ∗ AN T = T ∗ AN v1 AN v2 · · · AN vN
v1∗
v∗
= ..2 λ1 v1 AN v2 · · · AN vN
.
∗
vN
λ1 v1∗ v1 v1∗ AN v2 · · · v1∗ AN vN
λ v∗ v v∗ A v · · · v∗ A v
1 2 1 2 N 2 2 N N
λ v∗ v v∗ A v · · · v∗ A v
= 1 3 1 3 N 2 3 N N
.
.. .
.. ..
.
∗ ∗ ∗
λ1 v N v1 vN A N v2 · · · vN AN vN
λ1 v1∗ AN v2 · · · v1∗ AN vN
0 v∗ A v · · · v∗ A v
2 N 2 2 N N
0 v∗ A v · · · v∗ A v
= 3 N 2 3 N N .
.. .
.. ..
. .
∗ ∗
0 vN AN v2 · · · vN A N vN
However, due to the fact that T ∗ AT is hermitian, v1∗ AN vk = 0, k = 2, 3, . . . , N , thus
λ1 0 · · · 0
0
∗
T AN T = .. ,
. AN −1
0
where the lower right N − 1 × N − 1 block has been defined as AN −1 . Note that
det(sI − AN ) = det(sI − T ∗ AN T ) = det(s − λ1 ) det(sI − AN −1 ),
so if the e-vals of AN are {λ1 , λ2 , . . . , λN } then this shows that the e-vals of AN −1 must
be {λ2 , λ3 , . . . , λN }. But by hypothesis there exists TN −1 ∈ CN −1×N −1 , TN∗ −1 TN −1 = I
12
such that
λ2 0 · · · · · ·
0
0 λ ... ..
3 .
.. . . . . ..
TN∗ −1 AN −1 TN −1 =. . . ... .
. ... λ
..
N −1 0
0 ··· ··· 0 λN
Define
1 0
TN := T ∈ CN ×N ,
0 TN −1
which is unitary,
1 0 1 0
TN∗ TN = ∗ T ∗T
0 TN −1 0 TN −1
1 0 .
= ∗
0 TN −1 TN −1
=I
Finally,
1 0 1 0
TN∗ AN TN = ∗ T ∗ AN T
0 TN −1 0 TN −1
1 0 λ1 0 1 0
=
0 TN∗ −1 0 AN −1 0 TN −1
λ1 0
= ∗
0 TN −1 AN −1 TN −1
λ1 0 · · · 0
0 λ
2
= ..
. . . .
0 λN
QED
Example: Consider the matrix A1 again. Since the e-vals are distinct, the e-vecs
(given on page 6) are linearly independent. Furthermore they are orthogonal because
A1 is hermitian. Thus, if they are normalized and form the matrix U defined as
" #
√1 + j √1 √1 + j √1
U := v1 v2 = 3 3 6 6 ,
− √13 √2
6
13
you can confirm that U ∗ U = U U ∗ = I and U ∗ A1 U = diag(λ1 , λ2 ).
Review: Gram-Schmidt orthogonalization
Note: This section requires the notion of an inner product and the norm that is derived
from the inner product. It is assumed that you are familiar with the inner product on Cn
and the Euclidean norm. More details concerning norms are given in future lectures.
Suppose you are given a set of p ≤ n linearly independent vectors in Cn , {v1 , v2 , . . . , vp }.
These vectors span a p-dimensional subspace of Cn . The objective behind Gram-
Schmidt orthogonalization is to calculate another basis for this subspace in which the
basis vectors are orthonormal. Orthonormal bases are not unique so there is a lot of
freedom in the construction process. To get started you choose a vector from the set
and normalize it to unit length (the Euclidean norm, denoted k · k2 , is used to measure
the vector length). Suppose we start with v1 and normalize it to the first vector in our
new basis:
v1
u1 :=
kv1 k2
where the new vector is denoted u1 . The second vector is selected by removing the
component of v2 that is collinear with u1 . This is simply done using the inner product:
ũ2 = v2 − hv2 , u1 iu1 ,
then normalize it
ũ2
u2 := .
kũ2 k2
This yields the first two vectors in the new orthonormal basis. The remaining p − 2
basis vectors are recursively defined. For example, we now remove the components
in v3 that are collinear with u1 and u2 , and then normalize this vector to obtain u3 :
ũ3 = v3 − hv3 , u1 iu1 − hv3 , u2 iu2
ũ3
u3 = .
kũ3 k2
The set {u1 , u2 , u3 } is now orthogonal and additional orthogonal vectors can be added
to the set using the recursive process outlined above.
14
For example, consider the three linearly independent vectors in R4 ,
1 1 1
0 1 1
v1 = 0 , v2 = 0 , v3 = 1
0 0 0
It is clear these vectors span the 3 dimensional subspace of vectors of the form
α
β
v= γ , α, β, γ ∈ R.
0
We can generate an orthonormal basis for this subspace using the Gram-Schmidt
procedure:
1
u1 := v1
kv1 k2
1
u2 := ũ2 , where ũ2 := v2 − u∗1 v2 u1
kũ2 k2 |{z}
hv2 ,u1 i
1
u3 := ũ3 , where ũ3 := v3 − u∗1 v3 u1 − u∗2 v3 u2 ,
kũ3 k2 |{z} |{z}
hv3 ,u1 i hv3 ,u2 i
which yields
1 0 0
0 1 0
u1 =
0 ,
0 ,
u2 = u3 =
1
0 0 0
On the other hand, if we choose a different initial “generating” vector we create another
set of orthogonal basis vectors that span the same subspace. In other words, if we
relabel the vectors
1 1 1
1
, v2 = 1 , v3 = 0
v1 =
1 0 0
0 0 0
15
and apply the orthonormalization algorithm the following basis is obtained,
1 1 1
√ √ √
3 6
√1 √1 √21
3 6 −
u1 = 1 , u2 = 2 , u3 = 2
√3 − √6 0
0 0 0
16