0% found this document useful (0 votes)

24 views89 pages

Linear Algebra Concepts and Applications

Q: Explain why a triangular matrix's eigenvalues are particularly straightforward to determine.

The eigenvalues of a triangular matrix are straightforward to determine because they are the entries on the matrix's diagonal. This direct relationship simplifies eigenvalue computation by avoiding the need for solving the characteristic polynomial for potentially complex roots .

Q: What conditions must an augmented matrix meet for it to represent a consistent system of linear equations?

An augmented matrix represents a consistent system of linear equations if there is a row in which all the coefficients are zero, but the last column is non-zero, indicating a contradiction. Furthermore, other rows should not form a redundant or contradictory equation to the rest. This arrangement ensures that each equation in the matrix is feasible and contributes uniquely to solving the set of equations .

Q: Explain the properties of vector spaces that allow the formation of vector subspaces.

Vector spaces allow for the formation of subspaces because they retain the operations of vector addition and scalar multiplication, which ensure closure within the space. A subspace must be a non-empty subset of a vector space V, and any linear combination of vectors in this subset remains within the subset, thereby fulfilling the vector space axioms .

Q: Discuss the possible solutions of a system of linear equations and the implications of each type of solution.

A system of linear equations can have no solution, a unique solution, or infinitely many solutions. A system with no solution is termed as inconsistent, meaning the equations represent parallel lines that never intersect. A system with exactly one solution has a point of intersection, suggesting that the equations are consistent and independent. When there are infinitely many solutions, it usually implies dependency among the equations, representing the same line or plane, thus depicting consistency with infinite intersections .

Q: How is the concept of matrix equality defined, and what conditions must be met for two matrices to be considered equal?

For two matrices to be considered equal, they must have the same dimensions (m x n), and their corresponding entries must be identical. This condition ensures that the matrices are equivalent in structure and content .

Q: Analyze how diagonalization simplifies computations with matrices.

Diagonalization simplifies computations with matrices by converting a matrix into a diagonal form, where calculative processes such as powers of the matrix, determinants, and inverses become straightforward. This is because diagonal matrices have non-zero entries only along the diagonal, reducing complexity in these operations and enabling more efficient computations, especially when assessing properties like eigenvalues and eigenvectors .

Q: What criteria must a subset of vectors meet to be considered a basis for a vector space?

A subset of vectors forms a basis for a vector space if it is linearly independent and spans the vector space, meaning every vector in the space can be uniquely expressed as a linear combination of the subset. This ensures that the subset is both a minimal generating set and a maximal linearly independent set .

Q: What role do eigenvalues play in determining the similarity between matrices?

Eigenvalues are crucial in determining matrix similarity because two matrices are similar if there exists an invertible matrix that transforms one into the other without changing eigenvalues. Having the same eigenvalues implies a form of equivalence in their linear transformations, allowing one matrix to be expressed as a transformation of the other .

Q: What is the significance of LU decomposition in the context of linear algebra applications?

LU decomposition, initiated by Turing, is significant because it breaks down a matrix into a product of a lower triangular matrix and an echelon matrix, simplifying complex computations required in linear algebra. This method is crucial in computer applications across fields like graphics and robotics, where computations are lengthy and error-prone when done manually .

Q: How is the dimension of a vector space defined, and what does it imply about the vector space?

The dimension of a vector space is the number of vectors in its basis. It implies the minimal number of independent directions required to describe any vector in the space fully. For finite dimensional spaces, it indicates the size of the space in terms of the basis vectors .

The document outlines the fundamentals of linear algebra, covering topics such as matrices, systems of linear equations, vector spaces, eigenvalues, and applications in various fields. It provides historical context, definitions, and operations related to matrices, emphasizing their significance in mathematics and practical applications in physics, engineering, and computer science. The content is structured into chapters, detailing both theoretical concepts and practical applications of linear algebra.

Uploaded by

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views89 pages

Linear Algebra Concepts and Applications

Uploaded by

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

LINEAR ALGEBRA AND IT’S APPLICATIONS

CONTENTS
1) Basics of linear algebra(Preliminaries) ………………….(1- 15)
1.1 Introduction to linear algebra
1.2 History
1.3 Definition of matrix
1.4 Matrix Operations
1.5 Determinant
2) System of linear equation…………………….………….(16 - 30)
2.1 An Introduction
2.2 Geometric interpretation of solution
2.3 Row reduction and Echelon form
2.4 Vector Equation
2.5 Matrix Equation Ax=b
2.6 Linear independence
3) Vector Space……………..……………………………….(31-46)
3.1 Definition and example
3.2 Vector subspace
3.3 Basis and dimension
3.4 Rank
3.5 Linear transformation
3.6 The matrix of linear transformation
4) Eigenvalue, Eigenvector and Diagonalizable…….…(47- 61)
4.1 Eigenvalue and Eigenvector
4.2 Diagonalizable and triangulable operator
4.3 Jordan canonical form
5) Inner product space……………………………………….(62-70)
5.1 inner product space(introduction)
5.2 orthogonality
5.3 Gram -Schmidt orthogonality
6) Application of linear algebra……………………………(71-85)
6.1 An Application to Output and Input Economic Model
6.2 An Application to computer Graphics
6.3 An Application to Markov chain
6.4 PCA of Image Comparison
7) Reference…………………………………………………..(86)
Chapter-1
Basics of linear Algebra
1.1) Introduction
Linear algebra is a main important part of the mathematics. It is a principal branch
of mathematics that is related to mathematical structures closed under the
operations of addition and scalar multiplication and that includes the theory of
systems of linear equations, matrices, determinants, vector spaces, and linear
transformations. Linear algebra, is a mathematical discipline that deals with
vectors and matrices and, more generally, with vector spaces and linear
transformations. Unlike other parts of mathematics that are frequently invigorated
by new ideas and unsolved problems, linear algebra is very well understood. Its
value lies in its many applications, from mathematical physics to modern algebra
and its usage in the engineering and medical fields such as image processing and
analysis.

It is mostly used in Physics and Engineering as it helps to define the basic objects
such as planes, lines and rotations of the object. It allows us to model many natural
phenomena, and also it has a computing efficiency.

1.2) History
Linear Algebra is concerning linear equation. System of Linear Equation arose in
Europe with the introduction in 1637 by Rene Descartes of coordinates in
geometry. The first systematic method for solving linear system used determinants
and were first considered by Leibniz in 1693, and in 1750 Cramer presented a
method for solving systems of linear equations, which is today known as Cramer’s
Rule. This is the first foundation stone on the development of linear algebra and
matrix theory. At the beginning of the evolution of digital computers, the matrix
calculus has received very much attention. John von Neumann and Alan Turing
were the world-famous pioneers of computer science. They introduced significant
contributions to the development of computer linear algebra. In 1947, Von
Neumann and Goldstine investigated the effect of rounding errors on the solution

1|P a ge
of linear equations. One year later, Turing initiated a method for factoring a matrix
to a product of a lower triangular matrix with an echelon matrix (the factorization
is known as LU decomposition). At present, computer linear algebra is broadly of
interest. This is due to the fact that the field is now recognized as an absolutely
essential tool in many branches of computer applications that require computations
which are lengthy and difficult to get right when done by hand, for example: in
computer graphics, in geometric modeling, in robotics, etc.

1.3) Definition of Matrix

A rectangular array of numbers is called a matrix.

The horizontal arrays of a matrix are called its rows and the vertical arrays are
called its columns. A matrix A having m rows and n columns is said to be a matrix
of size order m × n and can be represented in either of the following forms:
𝑎11 𝑎12 ⋯ 𝑎1𝑛
𝑎21 𝑎22 ⋯ 𝑎2𝑛
A=[ ⋮ ⋮ ⋱ ⋮ ]
𝑎𝑚1 𝑎𝑚2 … 𝑎𝑚𝑛

Where aij is the entry at the intersection of the ith row and jth column. One writes
A∈ Mm,n(F) to mean that A is a m×n matrix with entries from the set F, or in short
A = [aij ] or A = (aij ). We write A[i, :] to denote the i-th row of A, A[:, j] to denote
the j-th column of A and aij or (A)ij or A[i, j], for the (i, j)-th entry of A.
8 3 2
1 −5 7
For example, A=[ ] , B=[−1 5 9]
3 12 6
4 −7 2
1 3+𝑖 7 7
C =[ ] then, C[1,:]= [1 3 + 𝑖 7], C[:,3]=[ ]
4 5 6 + 5𝑖 6 + 5𝑖
In general, a matrix with m rows and n columns is referred to as an m ×n matrix or
as having size m ×n. A matrix of size 1×n is called a row matrix, whereas one of
size m×1 is called a column matrix. Matrices of size n×n for some n are called
square matrices.

2|P a ge
Two points (x1, y1) and (x2, y2) in the plane are equal if and only if they have the
same coordinates, that is x1 = x2 and y1 = y2. Similarly, two matrices A and B are
called equal (written A = B) if and only if:

1. They have the same size (m x n).

2. Corresponding entries are equal.

If the entries of A and B are written in the form A = [aij], B = [bij], described
earlier, then the second condition takes the following form:

A = [aij] = [bij] means aij = bij for all i and j

Example:-
𝑎 𝑏 1 2 −1 1 0
Given A =[ ] , B =[ ] and C =[ ] discuss the possibility that A
𝑐 𝑑 3 0 1 −1 2
= B, B = C, A = C.

A = B is impossible because A and B are of different sizes: A is 2×2 whereas B is

2×3. Similarly, B = C is impossible. But A = C is possible provided that
𝑎 𝑏 1 0
corresponding entries are equal: [ ] =[ ] means a = 1, b = 0, c = −1, and
𝑐 𝑑 −1 2
d = 2.

Special Matrices
Let A = [aij] be an m × n matrix with aij ∈ F.

1. Then A is called a zero-matrix, denoted 0 (order is mostly clear from the

context), if aij = 0 for all i and j.
0 0 0 0 0
For example, 02×2 = [ ] and 02×3 = [ ].
0 0 0 0 0
2. Then A is called a square matrix if m = n and is denoted by A ∈ Mn(F).
3. Let A ∈ Mn(F).
(a) Then, the entries a11, a22, . . . , ann are called the diagonal entries of A.
They constitute the principal diagonal of A.
(b) Then, A is said to be a diagonal matrix, denoted diag(a11, . . . , ann), if
aij =0 for i ≠ j.

3|P a ge
4 0
For example, the zero matrix 0n and [ ] are diagonal matrices.
0 1
(c) Then, A = diag(1, . . . , 1) is called the identity matrix, denoted In, or in
1 0 0
1 0
short I. For example, I2 = [ ] and I3 = [0 1 0]
0 1
0 0 1
(d) If A = αI, for some α ∈ F, then A is called a scalar matrix.
(e) Then, A is said to be an upper triangular matrix if aij = 0 for i > j.
(f) Then, A is said to be a lower triangular matrix if aij = 0 for i < j.
(g) Then, A is said to be triangular if it is an upper or a lower triangular matrix.
1 2 4 0 0 0
For example, [0 0 8 ]is upper triangular, [1 0 0] is lower triangular
0 0 −1 0 1 1
and the matrices 0, I are upper as well as lower triangular matrices

4. For 1 ≤ i ≤ n, define ei = In[:, i], a matrix of order n × 1. Then the column

matrices e1, . . . , en are called the standard unit vectors or the standard basis of
Mn,1(C) or Cn . The dependence of n is omitted as it is understood from the context.
1
2 1 3
For example, if e1 ∈ C then, e1 = [ ] and if e1 ∈ C then e1 = [0].
0
0

1.4) Matrix Operation

a) Transpose and Conjugate Transpose of Matrices
Definition:- Let A = [aij] ∈ Mm,n(C). Then
1. The transpose of A, denoted AT, is an n × m matrix with (AT )ij = aji, for all
i, j.
2. The conjugate transpose of A, denoted A*, is an n × m matrix with
(A∗)ij = ̅𝑎̅̅𝑗𝑖̅(the complex-conjugate of aji), for all i, j.
1 4+𝑖 1 0 1 0
If A = [ ] then AT = [ ] and A* = [ ] .Note
0 1−𝑖 4+𝑖 1−𝑖 4−𝑖 1+𝑖
that A*≠ AT.
1
Note that if x = [ ] is a column matrix then xT = [1 0] and x* are row
0
matrix.

4|P a ge
Theorem:- For any matrix A, (A*)*= A and (AT )T = A.
Proof:- Let A = [aij ], A* = [bij ] and (A*)* = [cij]. Clearly, the order of A and (A*)*
is the same. Also, by definition cij =𝑏̅̅̅
𝑗𝑖 = ̿̿̿̿
𝑎𝑖𝑗 = aij for all i, j.

b) Sum and Scalar Multiplication of Matrices

Definition:- Let A = [aij ], B = [bij ] ∈ Mm,n(C) and k ∈ C.
1. Then the sum of A and B, denoted A + B, is defined to be the matrix
C = [cij] ∈ Mm,n(C) with cij = aij+bij for all i, j.
2. Then, the product of k ∈ C with A, denoted kA, equals kA = [kaij ] = [aijk]
= Ak.
Example:- If A = [1 4 5], B =[1 −4 5] then A + B = [2 0 6]and
0 1 2 1 1 7 1 2 9
5 20 25
5A = [ ].
0 5 10
Theorem:- Let A, B, C ∈ Mm,n(C) and let k,l ∈ C. Then
1. A+B=B+A (commutativity)
2. (A + B) + C = A + (B + C) (associativity)
3. k(lA) = (kl)A.
4. (k + l)A = kA + lA.

Proof:- (1). Let A = [aij ] and B = [bij ]. Then by definition A + B = [aij ] + [bij ]
= [aij + bij ] = [bij + aij ] = [bij ] + [aij ] = B + A as complex numbers commute.
Similarly others.

Definition:- Let A ∈ Mm,n(C). Then

1. The matrix 0m×n satisfying A + 0 = 0 + A = A is called the additive identity.
2. The matrix B with A + B = 0 is called the additive inverse of A, denoted
−A = (−1)A.

5|P a ge
c) Multiplication of matrices
Definition:- Let A = [aij ] ∈ Mm,n(C) and B = [bij ] ∈ Mn,r(C). Then, the product of
A and B, denoted AB, is a matrix C = [c ij ] ∈ Mm,r(C) such that for 1 ≤ i ≤ m, 1 ≤ j
≤r
𝑏1𝑗
𝑏
cij = A[i, :]B[:, j] = [ai1, ai2, . . . , ain] 2𝑗 = ai1b1j + ai2b2j + · · · + ainbnj
⋮
[𝑏𝑛𝑗 ]

=∑𝑛𝑘=1 𝑎𝑖𝑘 𝑏𝑘𝑗 .

Thus, AB is defined if and only if the number of columns of A = the number of

rows of B.
1 −1
3 4 5
Example:- Let A = [2 0 ] and B = [ ].
−1 0 1
0 1
(AB)11 = 1 · 3 + (−1) · (−1) = 3 + 1 = 4. Similarly, compute the rest and
4 4 4
verify that AB = [ 6 8 10].
−1 0 1
Note:-
1. Let A ∈ Mm,n(C) and B ∈ Mn,p(C). Then the product AB is defined and
observe the following:-
a) If m≠p then the product BA is NOT defined.
b) Let m = p. Here BA and AB can still be different.
1
For example, if A =[2] and B = [−1 2 3] then
3
−1 2 3
AB = [−2 4 6].whereas BA = −1 + 4 + 9 = 12.
−3 6 9

6|P a ge
c) If m = n = p, then the orders of AB and BA are same. Even then AB may
1 1 1 1
NOT equal BA. For example, if A =[ ] and B =[ ] then
−1 −1 1 1
2 2 0 0
AB = [ ] whereas BA = [ ] . Thus, AB≠ BA and hence
−2 −2 0 0
(A + B)2 = A2 + AB + BA + B2≠A2 + B2 + 2AB.
2 1 3 3
Whereas if C = [ ] then BC = CB = [ ] = 3A≠ A = CA.
1 2 3 3
Definition:- Two square matrices A and B are said to commute if AB = BA.
Note:- Let A ∈ Mm,n(C), B ∈ Mn,p(C) and C ∈ Mp,q(C).

1. Then (AB)C = A(BC), i.e., the matrix multiplication is associative.

2. For any k ∈ C, (kA)B = k(AB) = A(kB).
3. Then A(B + C) = AB + AC, i.e., multiplication distributes over addition.
4. If A ∈ Mn(C) then AIn = InA = A.

d) Inverse of Matrices
Definition:- Let A ∈ Mn(C). Then
1. B ∈ Mn(C) is said to be a left inverse of A if BA = In.
2. C ∈ Mn(C) is called a right inverse of A if AC = In.
3. A is invertible (has an inverse) if there exists B ∈ Mn(C) such that
AB = BA = In.

Lemma:- Let A ∈ Mn(C). If there exist B,C ∈ Mn(C) such that AB = In and
CA = In then B = C, i.e., If A has a left inverse and a right inverse then they are
equal. Proof:- Note that C = CIn = C(AB) = (CA)B = InB = B.

This implies that whenever A is invertible, the inverse is unique. Thus, we denote
the inverse of A by A−1. That is, AA−1 = A−1A = I.

Theorem:- Let A and B be two invertible matrices. Then,

1. (A−1 )−1 = A.
2. (AB)−1 = B−1A−1.
3. (A* )−1 = (A−1)*.

7|P a ge
Proof:- (1). Let B = A−1. Then AB = BA = I. Thus, by definition, B is invertible
and B−1 = A. Or equivalently, (A−1) −1 = A.

(2). By associativity (AB)(B−1A−1 ) = A(BB−1 )A−1 = I = (B−1A−1 )(AB).

(3). As AA−1 = A−1A = I, we get (AA−1 )* = (A−1A)* = I* . Or equivalently,

(A−1)*A* = A* (A−1)* = I. Thus, by definition (A*)−1 = (A−1 )*.

Some More Special Matrices

1) For 1 ≤ k ≤ m and 1 ≤ l ≤ n, define ekl ∈ Mm,n(C) by (ekl)ij =
1, 𝑖𝑓 (𝑘, 𝑙) = (𝑖, 𝑗)
{
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Then, the matrices ekl for 1 ≤ k ≤ m and 1 ≤ l ≤ n are called the standard
basis elements for Mm,n(C).
1 0 0 1 0 1 0
So, if ekl ∈ M2,3(C) then e11 = [ ]. =[ ] [1 0 0] , e12 =[ ]
0 0 0 0 0 0 0
1 0 0 0 0
= [ ] [0 1 0] and e22 =[ ] =[ ] [0 1 0].
0 0 1 0 1
In particular, if eij ∈ Mn(C) then eij = eie j =eie∗j , for 1 ≤ i, j ≤ n.
T

2) Let A ∈ Mn(R). Then

1 3
a) A is called symmetric if AT = A. For example, A = [ ].
3 2
0 3
b) A is called skew-symmetric if AT = −A. For example, A = = [ ]
−3 0
1 1 1
c) A is called orthogonal if AAT = ATA = I. For example, A = [ ]
√2 1 −1
d) A is said to be a permutation matrix if A has exactly one non-zero
entry, namely 1, in each row and column. For example, In for each
0 1 0 1 0 0 0 0 1
0 1
positive integer n, =[ ] , = [1 0 0], [0 0 1]and [0 1 0]are
1 0
0 0 1 0 1 0 1 0 0
permutation matrices.
3) Let A ∈ Mn(C). Then
1 𝑖
a) A is called normal if A*A = AA*. For example, [ ] is a normal
𝑖 1
matrix.

8|P a ge
1 1+𝑖
b) A is called Hermitian if A* = A. For example, A =[ ].
1−𝑖 2
c) A is called skew-Hermitian if A* = −A. For example,
0 1+𝑖
A= [ ].
−1 + 𝑖 0
1 1+𝑖 1
d) A is called unitary if AA* = A*A = I. For example, A = [ ].
√3 −1 1−𝑖
4) A vector u ∈ Mn,1(C) such that u.u = 1 is called a unit vector.
1 0
5) A matrix A is called idempotent if A2 = A. For example, A = [ ] is
1 0
idempotent.
6) Let A ∈ Mn(C). Then, A is said to be nilpotent if there exists a positive
integer n such that An = 0.

1.5) DETERMINANT
Definition:- Let A be a square matrix of order n. Then, the determinant
of A, denoted det(A) (or | A | ) is defined by
𝑎, 𝑖𝑓 𝐴 = [𝑎](𝑐𝑜𝑟𝑟𝑒𝑠𝑝𝑜𝑛𝑑𝑠 𝑡𝑜 𝑛 = 1)
det(A) = {
∑𝑛𝑗=0(−1)1+𝑗 det(𝐴𝑖𝑗 ) 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒.

Definition:- Cofactor expansion of a Matrix

Assume that determinants of (n−1)×(n−1) matrices have been defined. If A = [aij ]
is n×n define det A = a11c11(A) +a12c12(A) +···+a1nc1n(A)

This is called the cofactor expansion of det A along row 1.

3 4 5
For Example:- Compute the determinant of A = [1 7 2]
9 8 −6
The cofactor expansion along the first row is as follow:

det A= 3c11(A)+4c11(A)+5c13(A)
7 2 1 2 1 7
=3| |-4| |+3| |
8 −6 9 −6 9 8
=3(-58)-4(-24)+5(-55 ) = -353

9|P a ge
Note that the signs alternate along the row (indeed along any row or column). Now
we compute det A by expanding along the first column.

det A= 3c11(A)+1c21(A)+9c31(A)
7 2 4 5 4 5
=3| |-| |+9| |
8 −6 8 −6 7 2
=3(-58)-(-64)+9(-27)

= -353

Statement:- Let A be an n × n matrix.

1. det(In) = 1.
2. If A is a triangular matrix with d1, . . . , dn on the diagonal then
det(A) = d1· · · dn.
3. If A has a row or column of zeros, det A = 0.
4. If two distinct rows (or columns) of A are interchanged, the determinant of
the resulting matrix is − det A.
5. If a row (or column) of A is multiplied by a constant u, the determinant of
the resulting matrix is u(det A).
6. If two distinct rows (or columns) of A are identical, det A =0.

EXAMPLE:-
1 0 0
1 0
1) det A=| | = 1 or det B=|0 1 0|
0 1
0 0 1
6 0 0
2) det A=|0 5 0| = 6x5x1 = 30
0 0 1
1 −6 0
3) |9 1 7|= 0 (because the last row consist of zero)
0 0 0
3 −1 5 5 −1 3
4) |2 8 7 |=− | 7 8 2| (because two column are interchanged)
1 2 −1 −1 2 1
8 1 2 8 1 2
5) |3 0 9 |=3|1 0 3 |
1 2 −1 1 2 −1

10 | P a g e
2 1 2
6) |4 1 4|=0 (because two row are identical)
1 0 1

Result:- Consider matrices [𝐴 𝑋 ] and [𝐴 0 ] in block form, where A and B are

0 𝐵 𝑌 𝐵
𝐴 𝑋 𝐴 0
square matrices. Then det [ ] = det A det B and det [ ] = det A det B.
0 𝐵 𝑌 𝐵
Example:
2 3 1 3 2 1 3 3
1 −2 −1 1 1 −1 −2 1 2 1 1 1
1) det= [ ]= -| |= -| || |
0 1 0 1 0 0 1 1 1 −1 4 1
0 4 0 1 0 0 4 1
= -(-3)(-3) = -9

STATEMENT:-

1) If A and B are n×n matrices, then det(AB) = det A det B

2) If A and B are n×n matrices, then det(A+B) ≠ det A +det B

Example:-

𝑎 𝑏 𝑐 𝑑 ac − bd ad + bc
If A =[ ] and B = [ ] then AB =[ ].
−𝑏 𝑎 −𝑑 𝑐 −(ad + bc) ac − bd
Hence det A det B = det(AB) gives the identity

(a2 +b2)(c2 +d2) = (ac−bd)2 + (ad +bc)2

Theorem:- An n×n matrix A is invertible if and only if det A ≠0. When this is the
case, det(A −1 ) = 𝐝𝐞𝐭𝟏 𝑨

Proof:- If A is invertible, then AA−1= I; so the product theorem gives

1 = det I = det(AA−1 ) = det A det A−1

Hence, det A≠0 and also det A−1 =det1 𝐴.

Conversely, if det A≠ 0, we show that A can be carried to I by elementary row

operations. Certainly, A can be carried to its reduced row-echelon form R, so
11 | P a g e
R = Ek ···E2E1A where the Ei are elementary matrices. Hence the product theorem
gives

det R = det Ek ··· det E2 det E1 det A

Since det E≠ 0 for all elementary matrices E, this shows det R≠ 0. In particular, R
has no row of zeros, so R = I because R is square and reduced row-echelon.

Note:-

1) If A is any square matrix, det AT = det A.

2) A square matrix is called orthogonal if A−1 = AT

Adjugates:-
The adjugate of A, denoted adj(A), is the transpose of this cofactor matrix; in
symbols, adj(A) = [cij(A)]T
Adjoint of a matrix properties:-

1. A(Adj A) = (Adj A)A= AI

2. For a zero matrix 0, adj(0)=0
3. For an identity matrix I , adj(I)=I
4. For an scalar k, adj(kA)= kn-1adj(A)
5. Adj(AT)= (Adj A)T
6. adj(AB)= adj(B)adj(A)

cramer rule:-
If A is an invertible n×n matrix, the solution to the system

Ax = b

of n equations in the variables x1, x2, ..., xn is given by

x1 = det 𝐴1
det 𝐴
, x2 = det 𝐴2
det 𝐴
, ··· , xn = det 𝐴𝑛
det 𝐴

Where, for each k, Ak is the matrix obtained from A by replacing column k by b.

12 | P a g e
For Example:-

Given equation 2x-y=5 and x+y=4

Let’s us write these equation in the form Ax=b

2 −1 𝑥 5
` [ ] [𝑦] = [ ]
1 1 4
2 −1 5 𝑥
A=[ ] , b=[ ] and X=[𝑦]
1 1 4
2 −1
Now, D=|𝐴| =| |=2(1)-1(-1)= 3
1 1
2 5 5 −1
Dy=| |=2(4)-5(1) = 3 and Dx = | | = 5(1)-(-1)(4) =9
1 4 4 1
𝐷𝑥 9
Therefore; x= = =3
𝐷 3

𝐷𝑦 3
y= = =1
𝐷 3

 Polynomial Interpolation
Let take an example

A forester wants to estimate the age (in years) of a tree by measuring the
diameter of the trunk (in cm). She obtains the following data :

Tree 1 Tree 2 Tree 3

Trunk diameter 5 10 15
Age 3 5 6
Estimate the age of a tree with a trunk diameter of 12 cm.

The forester decides to “fit” a quadratic polynomial

p(x) = r0+r1x+r2x2

to the data, that is choose the coefficients r0, r1, and r2 so that p(5) = 3,
p(10) = 5, and p(15) = 6, and then use p(12) as the estimate. These conditions
give three linear equations:

13 | P a g e
r0 + 5r1 + 25r2 = 3

r0 + 10r1 + 100r2 = 5

r0 + 15r1 + 225r2 = 6
7 1
The (unique) solution is r0 = 0, r1 = , and r2 = − , so
10 50

7 1 1
p(x) = x− x2 = x(35−x)
10 50 50

Hence the estimate is p(12)= 5.52

As in Example, it often happens that two variables x and y are related but the
actual functional form y = f(x) of the relationship is unknown. Suppose that for
certain values x1, x2, ..., xn of x the corresponding values y1, y2, ..., yn are
known. One way to estimate the value of y corresponding to some other value a
of x is to find a polynomial.

p(x) = r0 +r1x+r2x2 +···+rn−1xn−1

that “fits” the data, that is p(xi) = yi holds for each i = 1, 2, ..., n. Then the
estimate for y is p(a). As we will see, such a polynomial always exists if the x i
are distinct.

The conditions that p(xi) = yi are

r0 + r1x1 + r2𝑥12 + ···+ rn−1𝑥1𝑛−1 = y1

r0 + r1x2 + r2𝑥22 + ···+ rn−1𝑥2𝑛−1 = y2

r0 + r1xn + r2𝑥𝑛2 + ···+ rn−1𝑥𝑛𝑛−1 = yn

1 𝑥1 𝑥12 … 𝑥1𝑛−1 𝑟0 𝑦0
𝑥22 𝑥2𝑛−1 𝑟1 𝑦1
In matrix form, this is 1 𝑥2 … [⋮]= [⋮]
⋮ ⋮ ⋮ ⋱ ⋮
[1 𝑥𝑛 𝑥𝑛2 … 𝑛−1
𝑥𝑛 ] 𝑟𝑛 𝑦𝑛

Let n data pairs (x1, y1), (x2, y2), ..., (xn, yn) be given, and assume that the xi are
distinct. Then there exists a unique polynomial

p(x) = r0 +r1x+r2x2 +···+rn−1xn−1

14 | P a g e
such that p(xi) = yi for each i = 1, 2, ..., n.

The polynomial is called the interpolating polynomial.

We conclude by evaluating the determinant of the coefficient matrix in

equation. If a1, a2,…, an are number . the determinant

1 𝑎1 𝑎12 … 𝑎1𝑛−1
𝑑𝑒𝑡 1 𝑎2 𝑎22 … 𝑎1𝑛−1 = ∏
1≤𝑗≤𝑖≤𝑛(𝑎𝑖 − 𝑎𝑗 )
⋮ ⋮ ⋮ ⋱ ⋮
[1 𝑎𝑛 𝑎𝑛2 … 𝑎1𝑛−1]

is called a Vandermonde determinant.

Conclusion:-
In this unit, I started with the introduction of topic and it’s history , then I
came across basics concept related to this topic like definition of a matrix
and some examples, matrix operation, determinant, adjoint etc. at last some
application related to determinant.

15 | P a g e
Chapter-2
System of Linear Equations
2.1) Introduction
If a, b, and c are real numbers, the graph of an equation of the form

ax+by = c

is a straight line (if a and b are not both zero), so such an equation is called a linear
equation in the variables x and y. However, it is often convenient to write the
variables as x1, x2, ..., xn, particularly when more than two variables are involved.
An equation of the form

a1x1 +a2x= +···+anx = b

is called a linear equation in the n variables x1, x2, ..., xn. Here a1, a2, ..., an denote
real numbers (called the coefficients of a1, a2, ..., an, respectively) and b is also a
number (called the constant term of the equation). A finite collection of linear
equations in the variables x1, x2, ..., xn is called a system of linear equations in
these variables. Hence,

2x1 −3x2 +5x3 = 7

is a linear equation; the coefficients of x1, x2, and x3 are 2, −3, and 5, and the
constant term is 7. Note that each variable in a linear equation occurs to the first
power only.

Given a linear equation a1x1 +a2x2 +···+anxn = b, a -sequence s1, s2, ..., sn of n
numbers is called a solution to the equation if

a1s1 +a2s2 +···+ansn = b

that is, if the equation is satisfied when the substitutions x1 = s1, x2= s2, ..., xn = sn
are made. A sequence of numbers is called a solution to a system of equations if it
is a solution to every equation in the system.

16 | P a g e
For example, x = −2, y = 5, z = 0 and x = 0, y = 4, z = −1 are both solutions to the
system

x + y + z = 3 and 2x + y + 3z = 1

A system may have no solution at all, or it may have a unique solution, or it may
have an infinite family of solutions. For instance, the system x+y = 2, x+y = 3 has
no solution because the sum of two numbers cannot be 2 and 3 simultaneously. A
system that has no solution is called inconsistent; a system with at least one
solution is called consistent.

Definition:- A system of m linear equations in n variables x1, x2, . . . , xn is a set of

equations of the form

a11x1 + a12x2 + · · · + a1nxn = b1

a21x1 + a22x2 + · · · + a2nxn = b2

am1x1 + am2x2 + · · · + amnxn = bm

where for 1 ≤ i ≤ m and 1 ≤ j ≤ n; aij , bi ∈ R. The linear system is called

homogeneous if b1 = 0 = b2 = · · · = bm and non-homogeneous, otherwise.
𝑎11 𝑎12 ⋯ 𝑎1𝑛 𝑥1 𝑏1
𝑎21 𝑎22 ⋯ 𝑎2𝑛 𝑥2 𝑏2
Definition:- A=[ ⋮ ⋮ ⋱ ⋮ ],X= [ ⋮ ] and b = [ ] . Then, Equation
⋮
𝑎𝑚1 𝑎𝑚2 … 𝑎𝑚𝑛 𝑥𝑛 𝑏𝑛
can be re-written as Ax = b, where A is called the coefficient matrix and the block
matrix [A b] is called the augmented matrix.

we solve the system x + 2y = −2, 2x + y = 7 in this manner. At each stage, the

corresponding augmented matrix is displayed. The original system is
1 2 −2
x + 2y = −2 [ ]
2 1 7
2x + y = 7

First, subtract twice the first equation from the second. The resulting system is
1 2 −2
x + 2y = −2 [ ]
0 −3 11
17 | P a g e
− 3y = 11
11
which is equivalent to the original. At this stage we obtain y = − by multiplying
3
1
the second equation by − . The result is the equivalent system
3

1 2 −2
x+2y = −2 , [0 1 − ]
11
3

11
y=−
3

Finally, we subtract twice the second equation from the first to get another
equivalent system.
16
16 1 0
3
x= [ 11]
3
0 1 −
3

11
y=−
3

2.2) Geometric Interpretations of solution

Recall that the linear system ax + by = c for (a, b)≠ (0, 0), in the variables x and y,
represents a line in R2. So, let us consider the points of intersection of the two lines
a1x + b1y = c1, a2x + b2y = c2, where a1, a2, b1, b2, c1, c2 ∈ R with (a1, b1),(a2,b2 ) ≠
(0,0)

18 | P a g e
Example:-
Three planes intersecting at a point.

Consider the system of linear equations

x+y+z=2
x + 2y + z = 3
x + 2y − 3z = 2

The augmented matrix is

1 1 1 2
[1 2 1 |3]
1 2 −3 2

whose reduced row echelon form is

3
1 0 04
0 1 0|1
0 0 11
[ 4]
3 1
This means that there is a single solution: (x,y,z)=( ,1, ) Here are the three planes
4 4
corresponding to the three equations:

19 | P a g e
Three non-intersecting planes with no solutions.

Consider the system of linear equations

2x – y + z = 1
2x− y + z = 2
2x− y + z = 3

The augmented matrix is

2 −1 1 1
[2 −1 1|2]
2 −1 1 3

whose reduced row echelon form is

−1 10
1
[ 2 2| ]
0 0 11
0 0 00

The middle row indicates that there is no solution.

20 | P a g e
2.3) Row Reduction and Echelon form
A rectangular matrix is in echelon form (or row echelon form) if it has the
following three properties:

1. All nonzero rows are above any rows of all zeros.

2. Each leading entry of a row is in a column to the right of the leading entry of
the row above it.
3. All entries in a column below a leading entry are zeros.

If a matrix in echelon form satisfies the following additional conditions, then it is

in reduced echelon form (or reduced row echelon form):

4. The leading entry in each nonzero row is 1.

5. Each leading 1 is the only nonzero entry in its column.

An echelon matrix (respectively, reduced echelon matrix) is one that is in echelon

form (respectively, reduced echelon form). Property 2 says that the leading entries
form an echelon (“steplike”) pattern that moves down and to the right through the
matrix. Property 3 is a simple consequence of property 2, but we include it for
emphasis. The “triangular” matrices such as
2 −3 2 1 1 0 0 29
[0 1 −4 8]
5
and [0 1 0 16]
0 0 0 0 0 1 3
2

are in echelon form. In fact, the second matrix is in reduced echelon form. Here are
additional examples.

More Examples:-

Solve the linear system x + y + z = 4, 2x + 3z = 5, y + z = 3.

1 1 1 4
Solution: Let B0 = [A b] = [2 0 3 5] be the augmented matrix. Then
0 1 1 3
(a) The given system looks like (correspond to the augment matrix B0).

21 | P a g e
x+y+z=4
1 1 1 4
2x + 3z = 5 B0 =[2 0 3 5]
0 1 1 3
y+z=3
(b) In the new system, replace 2-nd equation by 2-nd equation minus 2 times the
1-st equation (replace B0[2, :] by B0[2, :] − 2 · B0[1, :] to get B1).
x+y+z=4
1 0 0 1 1 1 4
−2y + z = −3 B1 =[−2 1 0] B0 =[0 −2 1 −3]
0 0 1 0 1 1 3
y+z=3
1
(c) In the new system, replace 3-rd equation by 3-rd equation plus times the 2-
2
1
nd equation (replace B1[3, :] by B1[3, :] + · B1[2, :] to get B2).
2
x+y+z=4
1 0 0 1 1 1 4
−2y + z = −3 B2 =[0 1 3] B1 =[0
1
−2 1
3
−3]
3
0 1 0 0
2 2 2
3 3
z=
2 2
Observe that the matrix B2 is in REF. Verify that the solution set is
{[x, y, z] T | [x, y, z] = [1, 2, 1]}, again a unique solution.

2.3 ) Vector Equation

a) Vectors in R2
A matrix with only one column is called a column vector, or simply a vector.
Examples of vectors with two entries are
3 .2 𝑤1
u=[ ] v=[ ] w=[𝑤 ]
−1 .3 2

where w1 and w2 are any real numbers. The set of all vectors with two entries is
denoted by R2 (read “r-two”). The R stands for the real numbers that appear as

22 | P a g e
entries in the vectors, and the exponent 2 indicates that each vector contains two
entries.

Two vectors in R2 are equal if and only if their corresponding entries are equal.
4 7
Thus[ ] and [ ]are not equal, because vectors in R 2 are ordered pairs of real
7 4
numbers.

Geometric Descriptions of R2
Consider a rectangular coordin7ate system in the plane. Because each point in the
plane is determined by an ordered pair of numbers, we can identify a geometric
𝑎
point (a,b) with the column vector[ ] . So we may regard R2 as the set of all points
𝑏
in the plane.

𝑎
The geometric visualization of a vector such as [ ]is often aided by including
𝑏
an arrow (directed line segment) from the origin (0,0) to the point.

b) Vectors in R3
Vectors in R3 are 3x1 column matrices with three entries. They are
represented geometrically by points in a three-dimensional coordinate space,
with arrows from the origin sometimes included for visual clarity. The
2
vectors a=[3] and 2a are displayed in Figure.
4

23 | P a g e
c) Vectors in Rn
If n is a positive integer, Rn (read “r-n”) denotes the collection of all lists (or
ordered n-tuples) of n real numbers, usually written as nx1 column matrices,
𝑢1
𝑢2
such as, u =[ ⋮ ]
𝑢𝑛
The vector whose entries are all zero is called the zero vector and is denoted
by 0.

2.4 ) Matrix Equation Ax=b

If A is an m x n matrix, with columns a1,…, an, and if x is in Rn , then the
product of A and x, denoted by Ax, is the linear combination of the columns
of A using the corresponding entries in x as weights; that is,
𝑢1
𝑢2
Ax =[a1, a2, . . . , an] [ ⋮ ] = x1a1+ x2a2+….+xnan
𝑢𝑛

Statement:-
If A is an m x n matrix, with columns a1,…, an, and if b is in Rm, the matrix
equation

Ax= b
24 | P a g e
has the same solution set as the vector equation

x1a1 +x2a2+…+xnan = b
which, in turn, has the same solution set as the system of linear equations whose
augmented matrix is

[ a1 a2 … an b]

2.5) linear Independence

Definition:- An indexed set of vectors {v1,.., vp} in Rn is said to be linearly
independent if the vector equation

x1v1 + x2v2+…..+ xpvp = 0

has only the trivial solution. The set {v1,.., vp} is said to be linearly dependent if
there exist weights c1,…,cp, not all zero, such that

c1v1 +c2v2 +…..+cpvp =0

An Application to Network Flow

There are many types of problems that concern a network of conductors along
which some sort of flow is observed. Examples of these include an irrigation
network and a network of streets or freeways. There are often points in the system
at which a net flow either enters or leaves the system. The basic principle behind
the analysis of such systems is that the total flow into the system must equal the
total flow out. In fact, we apply this principle at every junction in the system.

Example:-
A network of one-way streets is shown in the accompanying diagram. The rate of
flow of cars into intersection A is 500 cars per hour, and 400 and 100 cars per hour
emerge from B and C, respectively. Find the possible flows along each street.

25 | P a g e
Suppose the flows along the streets are f1, f2, f3, f4, f5, and f6 cars per hour in the
directions shown.

Then, equating the flow in with the flow out at each intersection,

we get,

Intersection A 500 = f1 + f2 + f3
Intersection B f1 + f4 + f6 = 400
Intersection C f3 + f5 = f6 +100
Intersection D f2 = f4 + f5
These give four equations in the six variables f1, f2, ..., f6.

f1 + f2 + f3 = 500
f1 + f4 + f6 = 400
f3 + f5 − f6 = 100
f2 − f4 − f5 =0
The reduction of the augmented matrix is

1 1 1 0 0 0 500 1 0 0 1 0 1 400
1 0 0 1 0 1 400 0 1 0 −1 −1 0 0
[ | ] —›[ | ]
0 0 1 0 1 −1 100 0 0 1 0 1 −1 100
0 1 0 −1 −1 0 0 0 0 0 0 0 0 0

26 | P a g e
Hence, when we use f4, f5, and f6 as parameters, the general solution is
f1 =400-f4-f 6 f2=f4+f5 f3=100-f5+f6
This gives all solutions to the system of equations and hence all the possible flows.
Of course, not all these solutions may be acceptable in the real situation. For
example, the flows f1, f2, ..., f6 are all positive in the present context (if one came
out negative, it would mean traffic flowed in the opposite direction). This imposes
constraints on the flows: f1 ≥ 0 and f 3≥ 0 become

f4 + f6 ≤ 400 f5 − f6 ≤ 100

Further constraints might be imposed by insisting on maximum values on the flow

in each street.

An Application to Electrical Networks

In an electrical network it is often necessary to find the current in amperes (A)
flowing in various parts of the network. These networks usually contain resistors
that retard the current. The resistors are indicated by a symbol ( ), and the
resistance is measured in ohms (Ω). Also, the current is increased at various points
by voltage sources (for example, a battery). The voltage of these sources is
measured in volts (V), and they are represented by the symbol ( ).

Example:-
Find the various currents in the circuit shown.

First apply the junction rule at junctions A, B,

C, and D to obtain

Junction A I1 = I2 +I3

Junction B I6 = I1 +I5

27 | P a g e
Junction C I2 +I4 = I6

Junction D I3 +I5 = I4

Note that these equations are not independent

(in fact, the third is an easy consequence of the other three). Next, the circuit rule
insists that the sum of the voltage increases (due to the sources) around a closed
circuit must equal the sum of the voltage drops (due to resistances). By Ohm’s law,
the voltage loss across a resistance R (in the direction of the current I) is RI. Going
counterclockwise around three closed circuits yields

Upper left 10 + 5 = 20I1

Upper right −5 + 20 = 10I3 +5I4

Lower −10 = −20I5 −5I4

Hence, disregarding the redundant equation obtained at junction C, we have six

equations in the six unknowns I1,..., I6. The solution is
15 28
I1 = I4 =
20 20

1 12
I2 = − I5 =
20 20

16 27
I3 = I6 =
20 20

The fact that I2 is negative means, of course, that this current is in the opposite
1
direction, with a magnitude of amperes.
20

An Application to Chemical Reactions

When a chemical reaction takes place a number of molecules combine to produce
new molecules.

28 | P a g e
Hence, when hydrogen H2 and oxygen O2 molecules combine, the result is water
H2O. We express this as

H2 +O2 → H2O

Individual atoms are neither created nor destroyed, so the number of hydrogen and
oxygen atoms going into the reaction must equal the number coming out (in the
form of water). In this case the reaction is said to be balanced. Note that each
hydrogen molecule H2 consists of two atoms as does each oxygen molecule O 2,
while a water molecule H2O consists of two hydrogen atoms and one oxygen atom.
In the above reaction, this requires that twice as many hydrogen molecules enter
the reaction; we express this as follows:

2H2 +O2 → 2H2O

This is now balanced because there are 4 hydrogen atoms and 2 oxygen atoms on
each side of the reaction.

Example:-
Balance the following reaction for burning octane C 8H18 in oxygen O2:

C8H18 +O2 → CO2 +H=O

Where CO2 represents carbon dioxide. We must find positive integers x, y, z, and
w such that

xC8H18 +yO2 → zCO2 +wH2O

Equating the number of carbon, hydrogen, and oxygen atoms on each side gives 8
x = z, 18x = 2w and 2y = 2z+w, respectively. These can be written as a
homogeneous linear system

8x −z =0

18x − 2w = 0

2y − 2z − w = 0

which can be solved by Gaussian elimination. In larger systems this is necessary

but, in such a simple situation, it is easier to solve directly. Set w = t, so that

29 | P a g e
1 8 16 25
x = ( ) t, z = ( )t, 2y = ( )t +t = t. But x, y, z, and w must be positive integers,
9 9 9 9
so the smallest value of t that eliminates fractions is 18. Hence, x = 2, y = 25, z
= 16, and w = 18, and the balanced reaction is

2C8H18 +25O2 → 16CO2 +18H2O

The reader can verify that this is indeed balanced.

Conclusion:-
In second unit, we started with different solution of system of linear equation ,
then we observed the geometrical interpretation of solution of system of linear
equation. Later we came across concept of Row Echelon form and reduced
Row Echelon form, vector equation in (R2, R3 , Rn )& it geometrical
representation, matrix equation and linear independence. At last some
application related to this unit (network flows, electrical network, chemical
reaction).

Chapter -3

Vector Space
30 | P a g e
3.1) Introduction
Let us recall that the vectors in R2 and R3 satisfy the following properties:

1. Vector Addition:- To every pair u, v ∈ R3 there corresponds a unique

element u+v ∈ R3 (called the addition of vectors) such that
a) u + v = v + u (Commutative law)
b) (u + v) + w = u + (v + w) (Associative law)
c) R3 has a unique element, denoted 0, called the zero vector that satisfies
u+0 = u, for every u ∈ R3 (called the additive identity).
d) For every u ∈ R3 there is an element w∈ R3 that satisfies u + w = 0.
2. Scalar Multiplication:-For each u ∈ R3 and α ∈ R, there corresponds a
unique element α · u ∈ R3 (called the scalar multiplication) such that
a) α · (β · u) = (α · β) · u for every α, β ∈ R and u ∈ R3 .
b) 1 · u = u for every u ∈ R3, where 1 ∈ R.
3. Distributive Laws:- relating vector addition with scalar multiplication For
any α, β ∈ R and u, v ∈ R 3 , the following distributive laws hold:
a) α · (u + v) = (α · u) + (α · v).
b) (α + β) · u = (α · u) + (β · u).

Definition:- A vector space V over F, denoted V(F) or in short V (if the field F is
clear from the context), is a non-empty set, in which one can define vector
addition, scalar multiplication. Further, with these definitions, the properties of
vector addition, scalar multiplication and distributive laws satisfied.

Theorem:- Let V be a vector space over F. Then,

1. u + v = u implies v = 0
2. α ·u = 0 if and only if either u = 0 or α = 0
3. (−1) · u = −u, for every u ∈ V.
Proof:- Part 1: for each u ∈ V there exists −u ∈ V such that −u + u = 0. Hence
u + v = u implies

0 = −u + u = −u + (u + v) = (−u + u) + v = 0 + v = v.

31 | P a g e
Part 2: As 0 = 0 + 0, using Condition 3, α · 0 = α ·(0 + 0) = (α · 0) + (α ·0). Thus,
using Part 1, α · 0 = 0 for any α ∈ F. Similarly, 0 · u = (0 + 0)· u = (0 ·u) + (0 · u)
implies 0 · u = 0, for any u ∈ V.

Now suppose α ·u = 0. If α = 0 then the proof is over. So, assume that α ≠ 0,

α ∈ F. Then (α) −1 ∈ F and using, 1 · u = u for every vector u ∈ V, we have

0 = (α) −1 · 0 = (α) −1 · (α · u) = ((α) −1 · α) · u = 1 · u = u.

Thus, if α≠0 and α · u = 0 then u = 0.

Part 3: As 0 = 0 ·u = (1 + (−1))u = u + (−1) · u, one has (−1) · u = −u.

Example:-
1. Let V = {0}. Then, V is a real as well as a complex vector space.
2. Consider R with the usual addition and multiplication. Then R forms a real
vector space.
3. Let F be a field and let K be its subfield. Then F is a vector space over K
under the addition and multiplication of F, that is 𝛼 ∈ K And x ∈ F. the
scalar multiplication 𝛼x is the multiplication of 𝛼 and x in F.
Thus, C can be considered as a vector space over R(or over Q) and R can
be considered as a vector space over Q.
4. Let C(R, R) = {f : R → R | f is continuous}. Then, C(R, R) is a real vector
space, where (f + αg)(x) = f(x) + αg(x), for all x ∈ R.
5. Let Rn = {(a1. . . an) T | ai ∈ R, 1 ≤ i ≤ n}. For u = (a1, . . . , an) T ,v = (b1, . . . ,
bn) T ∈ V and α ∈ R, define
u + v = (a1 + b1. . . an + bn)T and α · u = (αa1, . . . , αan)T
(called component-wise operations). Then, V is a real vector space. The
vector space Rn is called the real vector space of n-tuples.
6. V= P(x)=(a0+a1x+a2x2+…=anxn |ai ∈IR}
F=IR i.e. v(IR) is vector space.

3.2) Vector Subspace

Let V be a vector space over F. Then, a non-empty subset W of V is called a
subspace of V if W is also a vector space with vector addition and scalar

32 | P a g e
multiplication in W coming from that in V (compute the vector addition and scalar
multiplication in V and then the computed vector should be an element of W).

Example:-
1. Q(Q) is subspace of R(R)
2. R(C) is not subspace of C(C) , here C(C) is vector space but R(C) is not
vector space.
3. The vector space R[x; n] is a subspace of R[x].
4. Let V be a vector space. Then V and {0} are subspaces, called trivial
subspaces.
5. W = {x ∈ R3 | [-1, 2, −1]x = 0} is a subspace. It represents a plane in R3
containing 0.

Theorem:- Let V(F) be a vector space and W ⊆ V, W ≠ ∅. Then, W is a

subspace of V if and only if αu + βv ∈ W whenever α, β ∈ F and u, v ∈ W.

Proof:- Let W be a subspace of V and let u, v ∈ W. As W is a subspace, the scalar

multiplication and vector addition gives elements of W itself. Hence, for every
α, β ∈ F, αu, βv ∈ W and αu + βv ∈ W.

Now, we assume that αu + βv ∈ W, whenever α, β ∈ F and u, v ∈ W. To show, W

is a subspace of V:

1. Taking α = 0 and β = 0 ⇒ 0 ∈ W. So, W is non-empty.

2. Taking α = 1 and β = 1, we see that u + v ∈ W, for every u, v ∈ W.
3. Taking β = 0, we see that αu ∈ W, for every α ∈ F and u ∈ W. Hence, using
previous theorem −u = (−1)u ∈ W as well.
4. The commutative and associative laws of vector addition hold as they hold
in V.
5. The conditions related with scalar multiplication and the distributive laws
also hold as they hold in V.

Proposition:- Let W₁ and W₂ be subspaces of a vector space V. Then W₁∪W2

is a subspace of V if and only if W₁⊆W2 or W2⊆W1.

PROOF:- Assume that W₁ U W2 is a subspace of V and W₁⊈W2. Let w2 ∈ W2

and choose w₁ ∈W₁ \ W2. Since W₁ ∪ W2 is a subspace of V, W1 + W2 ∈ W1 ∪
33 | P a g e
W2. If w₁ + w2 ∈ W2, then w₁ = (w1 + w2) - w2 ∈W2, à contradiction. Therefore,
w₁ + w2 ∈ W1, and w2 ∈ W₁. Hence W2∈W1. The converse is obvious.

Linear Combination and Linear Span

Let {v1, v2, ..., vn} be a set of vectors in a vector space V. As in R n , a vector v
is called a linear combination of the vectors v1, v2, ..., vn if it can be expressed
in the form

v = a1v1 +a2v2 +···+anvn

where a1, a2, ..., an are scalars, called the coefficients of v1, v2, ..., vn. The set of
all linear combinations of these vectors is called their Span, and is denoted by

Span {v1, v2, ..., vn} = {a1v1 +a2v2 +···+anvn | ai in R}

Definition:- Let V be a vector space over F and S ⊆ V.

1. Then, the linear Span of S, denoted LS(S), is defined as

LS(S) = {α1u1 + · · · + αnun | αi ∈ F, ui ∈ S, for 1 ≤ i ≤ n}.
That is, LS(S) is the set of all possible linear combinations of finitely many
vectors of S. If S is an empty set, we define LS(S) = {0}.
2. V is said to be finite dimensional if there exists a finite set S such that
V = LS(S).
3. If there does not exist any finite subset S of V such that V = LS(S) then V is
called infinite dimensional.

Lemma:- Let V be a vector space over F with S ⊆ V. Then LS(S) is a subspace

of V.

Proof:- By definition, 0 ∈ LS(S). So, LS(S) is non-empty. Let u,v ∈ LS(S).

To show, au + bv ∈ LS(S) for all a, b ∈ F. As u,v ∈ LS(S), there exist n ∈ N,
vectors wi ∈ S and scalars αi , βi ∈ F such that u = α1w1 + · · · + αnwn and
v = β1w1 + · · · + βnwn. Hence,

au + bv = (aα1 + bβ1)w1 + · · · + (aαn + bβn)wn ∈ LS(S)

as aαi + bβi ∈ F for 1 ≤ i ≤ n. Thus, by Theorem LS(S) is a vector subspace.

34 | P a g e
3.3) Basis and Dimension
Let V be a vector space over K. A finite subset {x1,...,xn} of V is a linearly
dependent set (or a K-linearly dependent set, whenever we want to specify the
underlying field K) if there are scalars α 1,..., α n in K, not all of which are zero,
such that

α1x1 +…+ αnxn = 0.

Otherwise, {x1,...,xn} is called a linearly independent set.

Thus, {x1,...,xn} is a linearly independent set if and only if for α1,..., αn in K,

α 1x1 + + αnxn=0 implies that α₁ = ... = αn = 0.

{x1,...,xn} is a linearly independent (respectively, linearly dependent) subset of a

vector space V, then we also write this as: the vectors x1,...,xn are linearly
independent (respectively, linearly dependent) in V.

An arbitrary subset X (not necessarily finite) of a vector space V is said to be

linearly independent if every finite subset of X is linearly independent; otherwise
X is said to be linearly dependent. Clearly, if X is a linearly independent subset of
V, then every subset of X is also linearly independent; and if X is linearly
dependent, then every subset of V which contains X is also linearly dependent.
Since for the zero vector 0, 1.0 = 0; 0 is linearly dependent. Every nonzero vector
is linearly independent. It is vacuously true that the empty set is linearly
independent.

Let V be a finite dimensional vector space over K. Define μ(V), the least number
of elements of V which are required to generate V. Clearly, μ(V) = 0 if and only if
V = {0}; and μ(V) ≥ 1 if V ≠{0}.

Proposition:- Let V be a finite dimensional vector space over K, and let 𝜇 (V) = n.
If V = 〈x1,...,xn〉, then x1,...,xn are linearly independent vectors.

Proof:- Clearly, x1,...,xn are nonzero distinct elements of V. If vectors x1,...,xn are
linearly dependent, then there are scalars α₁,..., αn in K, not all of which are zero,
such that α1x1 +…+ αnxn = 0. Let ak≠0; then

xk=∑𝑛𝑖=1,𝑖≠𝑘(−α𝑘−1α𝑖 )𝑥𝑖 ,
35 | P a g e
and so V = 〈x1,...,xk-1, xk+1,...,xn〉. This contradicts that μ(V) = n. Hence, x1,..., xn
are linearly independent.

Let V be a vector space over a field K. A subset B of V is a basis of V if V = 〈B〉

and B is a linearly independent subset of V.

Proposition shows that a finite dimensional vector space has a basis. In fact, an
infinite dimensional vector space also has a basis. Further, any two bases of a
vector space have the same cardinality. We will prove this result for finite
dimensional vector spaces.

Proposition:- Let V be a vector space over K. A finite subset B of V is a basis of

V if and only if every element of V is a unique linear combination of elements of
B.

Proof:- Let B={x1,...,xn} be a basis of V. Suppose that v = ∑𝑛𝑖=0 𝛼𝑖 𝑥𝑖 , 𝛼 i∈ K, and

also that v = ∑𝑛𝑖=0 𝛽𝑖 𝑥𝑖 , 𝛽i ∈ K. Then we have ∑𝑛𝑖=1(𝛼𝑖 − 𝛽𝑖 )𝑥𝑖 = 0. By linear
independence of B, it follows that αi = 𝛽 i for all i.

Conversely, it is clear that V=〈B〉. Let a1,...,an be in K such that 𝛼 1x1 +….+ 𝛼 nxn =0
since 0= 0x1 +….+ 0xn by uniqueness of representation of elements of V as linear
combination of B, 𝛼 i=…= 𝛼 n = 0. Hence, B is a basis of V.

Let V be a vector space over a field K. The dimension of V, denoted by dim V, is

the cardinality of a basis of V. We also denote it by dimkV, whenever we want to
specify the field K.

Proposition:- Let V be a finite dimensional vector space over K. Then the

following statements are equivalent for a subset B of V:

1. B is a basis;
2. B is a minimal generating set, that is, no subset of B can generate V.
3. B is a maximal linearly independent set,

Note:-
1. A linearly independent subset of a finite dimensional vector space can be
extended to form a basis of the vector space.

36 | P a g e
2. Let V be Vector space of dimension n. Then every linearly independent
subset of V with n element is a basis of V.
3. Let V be finite dimensional vector space and let W be subspace of V. then
W≤dim V; and V=W if and only if dim V = dim W.
4. Let W1 and W2 be finite dimensional subspace of a vector space V over K.
then
dim(W1+W2)=dimW1+dimW2 - dim(W1∩W2)

3.4) Rank
Definition:- The rank of a matrix A is the dimension of its column space. We will
use rank(A) to denote the rank of A.

(Or)

Let A ∈ Mm,n(C). Then, the rank of A, denoted Rank(A), is the number of pivots
in the RREF(A).

Recall that Col(A) = Range(T A), and thus the rank of A is the dimension of the
range of the linear mapping T A. The range of a mapping is sometimes called the
image.

Example:-
1 2 2 −1
1) Compute the rank of A = [3 6 5 0 ] and find bases for row A .
1 2 1 2
Solution:- the reduction Of A to row- echelon form is as follows:
1 2 2 −1 1 2 2 −1 1 2 2 −1
[3 6 5 0 ] → [0 0 −1 3 ] → [0 0 −1 3 ]
1 2 1 2 0 0 −1 3 0 0 0 0
Hence rank A= 2, {[1 2 2 -1]},{[ 0 0 -1 3]} is a basis of row A

2) Find the rank of matrix a by using the row echelon form.

1 2 3
A=[2 1 4]
3 0 5

37 | P a g e
Solution:-
1 2 3
Given, A=[2 1 4]
3 0 5
Now, we apply elementary transformation
R2→R2-2R1
R3→R3-3R1
We get,
1 2 3
[0 −3 −2]
0 −6 −4
R3→R3-2R3
1 2 3
[0 −3 −2]
0 0 0
The above matrix is in row echelon form.
Number of non-zero row = 2
Hence, the rank of matrix = 2

RESULT:-

1. If A is any matrix, then rank A = rank (AT).

2. If A is an m × n matrix, then rank A ≤ m and rank A ≤ n.
3. rank A = rank (UA) = rank (AV) whenever U and V are invertible.
4. If A is m×n and B is n×m, then rank AB ≤ rank A and rank AB ≤ rank B.

Definition:-The nullity of a matrix A is the dimension of its nullspace Null(A).

We will use nullity(A) to denote the nullity of A.

Theorem:-(Rank Theorem) Let A be a m × n matrix. The rank of A is the

number of leading 1’s in its RREF. Moreover, the following equation holds:

N = rank(A) + nullity(A)
Proof:- A basis for the column space is obtained by computing rref(A) and
identifying the columns that contain a leading 1. Each column of A corresponding
to a column of rref(A) with a leading 1 is a basis vector for the column space of A.
Therefore, if r is the number of leading 1’s then r = rank(A). Now let d = n − r. The

38 | P a g e
number of free parameters in the solution set of Ax = 0 is d and therefore a basis
for Null(A) will contain d vectors, that is, nullity(A) = d. Therefore,

nullity(A) = n − rank(A).

Example:-
1. Find the rank and nullity of the matrix.
1 −2 2 3 −6
A=[ 0 −1 −3 1 1]
−2 4 −3 −6 11
Solution:- row reduce far enough to identify where the leading entries are:-

1 −2 2 3 −6 1 −2 2 3 −6
[ 0 −1 −3 1 1 ] → 2𝑅1 + 𝑅2 → [0 −1 −3 1 1 ]
−2 4 −3 −6 11 0 0 1 0 −1
There are r = 3 leading entries and therefore rank (A) = 3. The nullity is therefore
nullity (A) = 5 – rank(A) = 2 .

2. Find the rank nullity of the matrix.

1 −3 −1
A=[−1 4 2]
−1 3 0
Solution:- row reduce far enough to identify where the leading entries are:
1 −3 −1
A→R1+R2, R1+R3 → [0 1 1]
0 0 −1
There are r =3 leading entries and therefore rank(A) = 3. The nullity is therefore
nullity (A) =3 –rank(A)= 0 .

3.5) linear transformation

Let V and V' be vector spaces over K. A mapping T : V → V' is a linear
transformation if for all u, v ∈ V and 𝛼, 𝛽 𝜖 K.

39 | P a g e
T (𝛼u+𝛽v) = 𝛼T (u) + 𝛽T (v).

(OR)

T(u+v)= T(u)+(v)

T(𝛼𝑢)=𝛼𝑇(𝑢)

Further, if T is a bijective mapping, then T is called an isomorphism. We say

that V is isomorphic to V', written as V ~ V' if there is an isomorphism from V
to V'.

If T: V→V' is an isomorphism, then its inverse mapping T-1: V'→V is also an

isomorphism. Since, for 𝛼, ∈ K, and x, y ∈ V',

T-1 (𝛼x+ By) = T¹ (𝛼T (T-1 (x)) + 𝛽T (T-¹ (y)))

=T-1 (T(𝛼T-1 (x) + 𝛽T-1 (y)))

=𝛼T-1 (x) + 𝛽T-1 (y).

Thus, V≃V' if and only if V'≃V, and so V≃V' states that V and V' are isomorphic.
In fact, an easy computation shows that the relation 'is isomorphic to' is an
equivalence relation.

The following statements are first consequences of the above definition.

Proposition:- Let V and V' be vector spaces ever K, and let T : V→V' be a
linear transformation. Then:

(i) T (0) = 0;
(ii) T(-v)= -T(v), v ∈ V;
(iii) T (U) is a subspace of V', whenever U is a subspace of V;
(iv) T-1 (U') is a subspace of V, whenever U' is a subspace of V'.

Proof:- (i) We remark that this statement means that a linear transformation maps
the zero of the vector space V (on the left of the expression) to the zero of the
vector space V' (on the right of the expression). Since
T (0) = T (0+0) = T (0) +T (0), so T (0) = 0.

(ii) For v ∈ V,T(-v) = T(-v) + 0 = T(-v) + (T(v) + (-T (v)))

40 | P a g e
= (T(-v)+T (v))+(-T (v))

= T((-v)+v)+(-T (v)) = T (0)+(-T (v)) = -T (v).

(iii) If x, y ∈ T (U), then for u, v ∈ U, x = T (u) and y = T (v). If U is a subspace of

V, then 𝛼u + 𝛽v ∈ U for all 𝛼, 𝛽 ∈ K, and so 𝛼x + 𝛽y = 𝛼T (u)+ 𝛽T (v)
= T (𝛼u + 𝛽v) ∈ T (U). Hence, T (U) is a subspace of V'.

(iv)If u, v ∈T-1 (U'), then T (u), T (v) ∈ U'. If U' is a subspace of V', then for all 𝛼,
𝛽 ∈ K, 𝛼T(u) + 𝛽T (v) = T (𝛼 u + 𝛽v) ∈ U', that is, au+ 𝛽v ∈T-1(U'). Hence, T-1
(U') is a subspace of V.

Let T:V→V' be a linear transformation. Since {0} is a subspace of V', by the above
Proposition, T-1 ({0}) is a subspace of V. We denote this subspace by ker T, called
the kernel of T. Therefore,

ker T = {v∈V | T(v) = 0}

Proposition:- Let T : V→V' be linear transformation. Then T if injective if only if

kerT= {0}.

Proof:- If T is injective, and v ∈ ker T, then T (v) = 0 = T(0) implies that v = 0.

Hence, Ker T = {0}.

Conversely, let kerT= {0}. For u,v ∈ 𝑉 if T(u)=T (v), then T (u-v) = 0, that is,
u-v ∈ kerT. Hence, u = v, and T is injective.

It follows from the above proposition that vector spaces V and V' are isomorphic if
and only if there is a linear transformation T: V→V' with ker T = {0} and
imT = V'. In particular, if T : V→V' is an injective linear transformation, then
V ≃imT.

Let V and V' be vector spaces over a field K. Denote by L (V, V') the set of all
linear transformations from V to V'. Our next result states that L(V, V') is also a
vector space over K. We write L (V) for L (V, V) and call its elements linear
operators on V.

Note:- Let V, V' and V" be vector spaces over K.

41 | P a g e
(i) If S,T ∈ L(V, V'), then S+T ∈ L(V, V'), where S+T: V→V' is defined by
(S+T) (v) = S (v) +T (v).
(ii) If T ∈ L(V, V'), then for 𝛼 ∈ K, 𝛼T ∈ L(V, V'), where 𝛼T: V→V' is
defined by (𝛼T) (v) = 𝛼T (v).
(iii) L(V, V') is a vector space over K.
(iv) If T ∈ L(V, V") and S ∈ L(V", V'), then the composite of S and T,
S ᵒ T ∈ L(V,V').

3.5) Matrix of linear Transformation

Whenever a linear transformation T arises geometrically or is described in words,
we usually want a “formula” for T(x). The discussion that follows shows that every
linear transformation from Rn to Rm is actually a matrix transformation x→Ax and
those important properties of T are intimately related to familiar properties of A.
The key to finding A is to observe that T is completely determined by what it does
to the columns of the n x n identity matrix In.
1 0 1
Example:- The columns of I2=[ ] are e1=[ ] and
0 1 0
0
e2=[ ]. Suppose T is a linear transformation from R 2 into
1
3
R such that.

5 −3
T(e1) =[−7] and T(e2) =[ 8 ]
2 0
With no additional information, find a formula for the image of an arbitrary x in
R 2.
𝑥1 1 0
Solution:- write x=[𝑥 ]= x1[ ]+ x2[ ] = x1e1+ x2e2 ……(1)
2 0 1
Since T is a linear transformation,

T(x) = x1T(e1)+ x2T(e2) ……(2)

42 | P a g e
5 −3 5𝑥1 − 3𝑥2
=x1[−7] +x2[ 8 ] = [−7𝑥1 + 8𝑥2 ]
2 0 2𝑥1 + 0

The step from equation (1) to equation (2) explains why knowledge of T(e1) and
T(e2) is sufficient to determine T(x) for any x. Moreover, since (2) expresses T (x)
as a linear combination of vectors, we can put these vectors into the columns of a
matrix A and write (2) as
𝑥1
T(x)=[ T(e1) …. T(e2)] [𝑥 ]= Ax
2

Theorem:-Let T: Rn→Rm be a linear transformation. Then there exists a unique

matrix A such that

T(x) =Ax for all x in Rn

In fact, A is the m x n matrix whose jth column is the vector T (ej ), where ej is the
jth column of the identity matrix in Rn :

A = [T(e1)….T(en)]

Proof:- Write x= Inx = [ e1… en] = x1e1+ …+ xnen, and use the linearity of T to
compute

T(x) = T[x1e1+….+xnen] =x1T (e1) +…+ xnT(en)

𝑥1
= [T(e1) … T(en)] [ ⋮ ] = Ax
𝑥2

The matrix A in A [T (e1)….T(en)] is called the standard matrix for the linear
transformation T.

Example:- Let T : R2→R2 be the transformation that rotates each point in R 2

about the origin through an angle ∅, with counterclockwise rotation for a positive
angle. We could show geometrically that such a transformation is linear. Find the
standard matrix A of this transformation.

1 sin ∅ 0 −sin ∅
Solution:- [ ] rotates into [ ] , and [ ] rotates into [ ]
0 cos ∅ 1 cos ∅

43 | P a g e
cos ∅ − sin ∅
By above theorem, A=[ ]
sin ∅ cos ∅

 Geometric linear transformation

Some common geometric linear transformations of

the plane. Because the transformations are linear,
they are determined completely by what they do to
the columns of I2. Instead of showing only the
images of e1 and e2, they show what a transformation
does to the unit square . Other transformations can be
constructed from those listed below by applying one
transformation after another. For instance, a
horizontal shear could be followed by a reflection in the x2-axis.

44 | P a g e
45 | P a g e
Conclusion:-
In this unit, we covered the concept of vector space and it’s example , vector
subspace , basis and dimension and its related theorems, Rank and it’s well know
theorem i.e., rank nullity theorem, linear transformation and its application related
to R2 in different form of matrix.

46 | P a g e
Chapter- 4
Eigenvalues , Eigenvectors and Diagonalizability

4.1) Eigenvector and Eigenvalue

Definition:- An eigenvector of an n x n matrix A is a nonzero vector x such that
Ax = λx for some scalar λ. A scalar λ is called an eigenvalue of A if there is a
nontrivial solution x of Ax = λx; such an x is called an eigenvector corresponding
to λ.

1 6 6 3
Example:- 1) Let A=[ ] , u=[ ] and v= [ ] . Are u and v
5 2 −5 −2
eigenvector of A.
1 6 6 −24 6
Solution:- Au = [ ][ ] = [ ]= -4[ ]= -4u
5 2 −5 20 −5
1 6 3 −9 3
Av = [ ] [ ] = [ ] ≠ λ[ ]
5 2 −2 11 −2
Thus u is an eigenvector corresponding to an eigenvalue (-4), but v is not
eigenvector of A, because Av is not a multiple of v.

47 | P a g e
Thus λ ∈ K is an eigenvalue of a if and only if the subspace ker(A-λI)≠{0}, and
any nonzero element of this subspace is an eigenvector corresponding to λ. If λ is
an eigenvalue of A the subspace ker(A- λI) is called an eigenspace of A
corresponding to λ.

4 −1 6
Example:- 2) Let A=[2 1 6]. An eigenvalue of A in 2. Find a basis for
2 −1 5
the corresponding eigenspace.
Solution:-
4 −1 6 2 0 0 2 −1 6
A-2I= [2 1 6] -[0 2 0 ] = [2 −1 6]
2 −1 5 0 0 2 2 −1 6
And row reduce the augmented matrix for (A-2I) =0.
2 −1 6 0 2 −1 6 0
[2 −1 6 0] ~ [0 0 0 0]
2 −1 6 0 0 0 0 0
At this point, it is clear that 2 is indeed an eigenvalue of A because the
equation (A-2I)x=0 has free variable. The general solution is
1
𝑥1 −3
2
[𝑥2 ] = 𝑥2 [1] + 𝑥3 [ 0 ], x2 and x3 free
𝑥3 1
0
The eigenspace, shown in figure, is two- dimensional subspace of R2. A basis
1 −3
is {[2] , [ 0 ]}
0 1

48 | P a g e
Note:-
1) The eigenvalue of triangular matrix are the entries on its main
diagonal.
2) Let v1….,vk be eigenvalue of A corresponding to distinct eigenvalues
λ1,..…, λn of A. the { v1,…,vk} is a linearly independent set.
3) The matrix A ∈ Rnxn is invertible if and only if λ = 0 is not an eigenvalue
of A, or the determinant of A is not zero.

The Characteristic Equation

A scalar λ is an eigenvalue of an n x n matrix A if and only if λ satisfies the
characteristic equation

det(A - λI) = 0
Example:- Find the characteristic equation of
5 −2 6 −1
0 3 −8 0
Solution:- A=[ ]
0 0 5 4
0 0 0 1
5−λ −2 6 −1
0 3−λ −8 0
det (A – λI) = det[ ]
0 0 5−λ 4
0 0 0 1−λ
= (5-λ)(3-λ)(5-λ)(1-λ)
49 | P a g e
The characteristic equation is

(5-λ)2(3-λ)((1-λ) =0

(Or ) (λ-5)2(λ-3)( λ-1) =0

Expanding the product, we can also write

λ 4 -14 λ3+68 λ2-130 λ +75=0

Result:-
1) The characteristic polynomial p(λ)= det(A - λI) of n x n matrix A is an with
degree polynomial.
2) Let the A is n x n matrix and has n distinct eigenvalue λ1, λ2 ,…., λn. Let vi
be an eigenvector of A corresponding to λi. Then {v1,v2,…, vn} is a basis for
Rn.
3) Eigenvalues and similarity transformation
Let A and B be n x n matrix .we will say that A is similar to B if there exist
an invertible matrix P such that
A= PBP-1

Definition:- suppose that A ∈Mnxn has characteristics polynomial p(λ) that can
be factored as

p(λ)= (λ- λ1)k1(λ- λ2)k2…..(λ- λp)kp

the exponent ki is called the algebra multiplicity of the eigen value λi.

Example:-
1) Find the characteristic polynomial, eigenvalues, and basic eigenvectors for
2 0 0
A=[1 2 −1]
1 3 −2
Solution:- Here the characteristic polynomial is given by

50 | P a g e
𝑥−2 0 0
CA(x) = det [ 1 𝑥−2 −1 ] = (𝑥 − 2)(𝑥 − 1)(𝑥 + 1)
1 3 𝑥+2
So the eigenvalue are 𝜆1 = 2, 𝜆2 =1, 𝜆3 = -1. to find all eigenvector for 𝜆1 = 2
𝜆1 − 2 0 0 0 0 0
𝜆1I- A =[ 1 𝜆1 − 2 −1 ] = [−1 0 1]
1 3 𝜆1 + 2 −1 −3 4
We want the (non zero) solution to (𝜆1I- A)x=0 the augmented matrix
0 0 0 0 0 0 0 0
becomes is [−1 0 1 0] → [−1 0 1 0]
−1 −3 4 0 −1 −3 4 0
1
Using row operation . hence the general solution x to (𝜆1I- A)x= t [1] where
1
1
t is arbitrary , so we can use x1=[1] as the basic eigenvectors
1
0 0
1
corresponding 𝜆1=2. Another basis eigenvector to x2 =[1] and x3=[ 3 ]
1 1
corresponding to 𝜆2 =1 and 𝜆3= -1 respectively.

4.2) Diagonalizable and triangular operator

Before discussing diagonalization, we first consider the eigenvalues of triangular
matrices.

1) Let A be a triangular matrix (either upper or lower). Then the eigenvalues of

A are its diagonal entries.
2) A matrix D whose off-diagonal entries are all zero is called a diagonal
matrix.

Recall that two matrices A and B are said to be similar if there exists an invertible
matrix P such that

A = PBP−1.

51 | P a g e
A very simple type of matrix is a diagonal matrix since many computations with
diagonal matrices are trivial. The problem of diagonalization is thus concerned
with answering the question of whether a given matrix is similar to a diagonal
matrix.

Definition:- A matrix A is called diagonalizable if it is similar to a diagonal

matrix D. In other words, if there exists an invertible P such that

A = PDP−1

Result:-
1) A matrix A is diagonalizable if and only if there is a basis {v1, v2, . . . , vn}
of Rn consisting of eigenvectors of A.
2) Suppose that A∈ Rn×n has n distinct eigenvalues λ1, λ2, . . . , λn. Then A is
diagonalizable.
3) A matrix A is diagonalizable if and only if the algebraic and geometric
multiplicities of each eigenvalue are equal.
4) A is diagonalizable if and only if it has eigenvectors x1, x2, ..., xn such that
the matrix P = [x1 x2 ... xn ] is invertible.
5) An n x n matrix with n distinct eigenvalue is diagonalizable.
6) When this is the case, P−1AP = diag (λ1, λ2, ..., λn) where, for each i, λi is the
eigenvalue of A corresponding to xi .

Example:-
2 0 0
1) Diagonalize the matrix A= [1 2 −1]
1 3 −2
Solution:- the eigenvalue of a are 𝜆1 = 2, 𝜆2 =1, 𝜆3 = -1 with
1 0 0
1
corresponding basic eigenvector x1=[1] , x2 =[1] and x3=[ 3 ]respectively.
1 1 1
1 0 0
Since the matrix P=[ x1 x2 x3 ] = [1 1 1] is invertible,
1 1 3

52 | P a g e
𝜆1 0 0 2 0 0
PAP-1= [ 0 𝜆2 0 ] = [0 1 0 ] =D
0 0 𝜆3 0 0 −1
0 1 1
2) Diagonalize the matrix A= [1 0 1]
1 1 0
Solution:- To compute the characteristic polynomial of A first add
rows 2 and 3 of xI −A to row 1:
𝑥 −1 −1 𝑥−2 𝑥−2 𝑥−2
CA(X) = det [−1 𝑥 −1] = [ −1 𝑥 −1 ]
−1 −1 𝑥 −1 −1 𝑥
𝑥−2 0 0
=det[ −1 𝑥 + 1 0 ] = (x-2)(x+1)2
−1 −1 𝑥+1
Hence the eigenvalues are λ1 = 2 and λ2= −1, with λ2 repeated twice (we
say that λ2 has multiplicity two). However, A is diagonalizable. For
λ1 = 2, the system of equations (λ1I −A)x = 0 has general solution
1 1
x = t[1], so a basic λ1-eigenvector is x1 = [1]. Turning to the repeated
1 1
eigenvalue λ2 = −1, we must solve (λ2I −A)x = 0. By Gaussian
−1 −1
elimination, the general solution is x = s [ 1 ] + t [ 0 ] where s and t are
0 1
arbitrary. Hence the Gaussian algorithm produces two basic
−1 −1
λ2-eigenvectors x2 =[ 1 ] and y2 = [ 0 ] If we take P = [ x1 x2 y2 ] =
0 1
1 −1 −1
[1 1 0 ] we find that P is invertible. Hence P−1AP = diag(2,−1, −1).
1 0 1

4.3) Jordan canonical form

Definition:-
Let λ ∈ C. A Jordan block J(λ) is an upper triangular matrix whose all diagonal
entries are λ, all entries of the superdiagonal (entries just above the diagonal) are 1
and other entries are zero. Therefore,
53 | P a g e
𝛌 𝟏 𝟎 ⋯ 𝟎
𝟎 𝛌 𝟏 ⋯ 𝟎
J(λ) = ⋮ ⋮ … ⋮ ⋮
𝟎 𝟎 … 𝛌 𝟏
[𝟎 𝟎 ⋯ 𝟎 𝛌]
Example:-
−1 1
1) J(-1) = [ ]
0 −1
5 1 0 0
0 5 1 0
2) J(5) = [ ]
0 0 5 1
0 0 0 5

Definition:-
A Jordan form or Jordan-Canonical form is a block diagonal matrix whose each
block is a Jordan block, that is, Jordan form is a matrix of the following form
𝑱𝟏 𝟎 … 𝟎
𝟎 𝑱𝟐 … 𝟎
=[ ]
⋮ ⋮ … ⋮
𝟎 𝟎 ⋯ 𝑱𝒏

Example:-
2 1 0 0
1 0 0
0 2 0 0
1) Jcf = [0 −1 1] and 2) Jcf = [ ]
0 0 2 1
0 0 −1
0 0 0 2

Definition:- Let T : V → V be a linear transformation and λ ∈ C. A non-zero

vector v ∈ V is called a generalized eigenvector of T corresponding to λ if
(T − λI)p(v) = 0 for some positive integer p. The generalized eigenspace of T
corresponding to λ, denoted by Kλ, is the subset of V defined by

Kλ = {v ∈ V | (T − λI)p(v) = 0 for some natural number p}

54 | P a g e
Example:-
1) Find the characteristics polynomial, minimal polynomial and
Jordan canonical form.
4 1 0
A= [−1 2 0]
1 1 3
Solution:-
| xI- A| =0
𝑥−4 −1 0
[ 1 𝑥−2 0 ] =0
−1 −1 𝑥−3
(x- 4){(x-2)(x-3) -0} +1(x-3) =0
(x-3)(x2-6x+9) =0
(x-3)3 =0
x = 3,3,3
p(x)= (x-3)3 is the characteristics polynomial of A
now for the minimal polynomial,
m(x)/p(x) = (x-3)3
m(x) = (x-3), (x-3)2, (x-3)
For m1(x) = x-3
m1(A) = A-3I
4 1 0 3 0 0
m1(A) = [−1 2 0] - [0 3 0]
1 1 3 0 0 3
1 1 0
= [−1 −1 0] ≠ 0
1 1 0
For m2(x) = (x-3)2
m2(A) = (A-3I)2
1 1 0 1 1 0
=[−1 −1 0] [−1 −1 0]
1 1 0 1 1 0

55 | P a g e
0 0 0
= [0 0 0] = 0
0 0 0
m(x) = (x-3)2 is the minimal polynomial of matrix A.

Now, Jordan Canonical form

Characteristic polynomial p(x) = (x-3)3

Minimal polynomial m(x) = (x-3)2

3 1 0
Jcf = [0 3 0]
0 0 3
2) Find the possible Jordan canonical form of a matrix where characteristics &
minimal polynomial are
a) P(x) = (x-2)7
m(x) = (x-2)3
solution:-
i) (3,2,2)
ii) (3,1,1,1,1)
iii) (3,3 1)
iv) (3,2,1,1)

2 1 0 0 0 0 0
0 2 1 0 0 0 0
0 0 2 0 0 0 0
Jcf = 0 0 0 2 1 0 0
0 0 0 0 2 0 0
0 0 0 0 0 2 1
[0 0 0 0 0 0 2]
b) P(x) = (x-2)4 (x-5)3
m(x) = (x-2)2 (X-5)3
Solution:-

56 | P a g e
i) (2,3,2)
ii) (2,3,1,1)
2 1 0 0 0 0 0
0 2 0 0 0 0 0
0 0 5 1 0 0 0
Jcf= 0 0 0 5 1 0 0
0 0 0 0 5 0 0
0 0 0 0 0 2 1
[0 0 0 0 0 0 2]
An application to system of differential equation
A function f of a real variable is said to be differentiable if its derivative exists and,
in this case, we let f ′ denote the derivative. If f and g are differentiable functions, a
system

f ′ = 3 f +5g

g ′ = −f +2g

is called a system of first order differential equations, or a differential system.

Solving a variety of problems, particularly in science and engineering, comes down

to solving a system of linear differential equations. Diagonalization enters into this
as follows. The general problem is to find differentiable functions f1, f2, ..., fn that
satisfy a system of equations of the form

f ′1 = a11 f1 + a12 f2 +···+ a1n fn

f ′2 = a21f1 + a22 f2 +···+ a2n fn

⋮ ⋮ ⋮ ⋮

f ′n = an1 f1 + an2 f2 +···+ ann fn

where the aij are constants. This is called a linear system of differential equations
or simply a differential system. The first step is to put it in matrix form. Write

57 | P a g e
𝑓1 𝑓1′ 𝑎11 𝑎12 ⋯ 𝑎1𝑛
𝑓 𝑓′ 𝑎21 𝑎22 ⋯ 𝑎2𝑛
f =[ 2 ] f′ =[ 2 ] A=[ ⋮ ⋮ ⋱ ⋮ ]
⋮ ⋮
𝑓𝑛 𝑓𝑛′ 𝑎𝑛1 𝑎𝑛2 … 𝑎𝑛𝑛

Then the system can be written compactly using matrix multiplication:

f′ = Af

Hence, given the matrix A, the problem is to find a column f of differentiable

functions that satisfies this condition. This can be done if A is diagonalizable. Here
is an example

Example:- Find a solution to the system

𝑓1′ = f1 +3 f2

𝑓2′ = 2 f1 +2 f2

that satisfies f1(0) = 0, f2(0) = 5.

𝑓 1 3
Solution:- This is f ′ = Af, where f = [ 1] and A =[ ] . The reader can verify
𝑓2 2 2
1 3
that cA(x) = (x−4)(x+1), and that x1 = [ ] and x2 =[ ] are eigenvectors
1 −2
corresponding to the eigenvalues 4 and −1, respectively. Hence the diagonalization
4 0 1 3
algorithm gives P −1AP =[ ] , where P = [x1 x2] = [ ] . Now consider
0 −1 1 −2
𝑔1
new functions g1 and g2 given by f = Pg (equivalently, g = P −1f ), where g =[𝑔 ]
2
Then
𝑓 1 3 𝑔1
[ 1] = [ ] [ ] , that is, 𝑓1= g1+ 3g2 and 𝑓1 = g1- 2g2
𝑓2 1 −2 𝑔2

Hence 𝑓1′ = g1+ 3g2 and 𝑓2′ = g1- 2g2 so that

𝑓1′ 1 3 𝑔1
𝑓′ = [ ′ ] = [ ] [ ] = P𝑔′
𝑓2 1 −2 𝑔2

If this is substituted in 𝑓 ′ = 𝐴𝑓, the result is P𝑔′ = 𝐴𝑃𝑔, whence0

58 | P a g e
𝑔′ = 𝑃−1𝐴𝑃𝑔

But this means that

𝑔1′ 4 0 𝑔1
[ ′] = [ ] [𝑔 ] , so 𝑔1′ = 4𝑔1 and 𝑔2′ = 4𝑔2
𝑔2 0 −2 2

Hence gives, g1(x) = ce4x, g2(x) = de-x, where c and d are constants, Finally then,
𝑓1(𝑥) 𝑔 (𝑥) 1 3 𝑐𝑒 4𝑥 4𝑥 −𝑥
[ ] = 𝑝[ 1 ] = [ ] [ −𝑥 ] = [𝑐𝑒 4𝑥 + 3𝑑𝑒 −𝑥 ]
𝑓2(𝑥) 𝑔2 (𝑥) 1 −2 𝑑𝑒 𝑐𝑒 − 2𝑑𝑒
So the general solution is

𝑓1 (𝑥 ) = 𝑐𝑒 4𝑥 + 3𝑑𝑒 −𝑥

𝑓2(𝑥 ) = 𝑐𝑒 4𝑥 − 2𝑑𝑒 −𝑥

It is worth observing that can be written matrix form as

𝑓1(𝑥) 1 3
[ ] = 𝑐 [ ]e4x + d [ ] e-x
𝑓2(𝑥) 1 −2
That is,

This form of the solution works more generally, as will be shown.

Finally, the requirement that f1(0) = 0 and f2(0) = 5 in this example determines the
constant c and d

0 = f1(0) = 𝑐𝑒 0 + 3𝑑𝑒 0 = c + 3d

0 = f2(0) = 𝑐𝑒 0 − 2𝑑𝑒 0 = c - 2d

These equation gives c =3 and d = -1, so

𝑓1 (𝑥 ) = 3𝑒 4𝑥 − 3𝑒 −𝑥

𝑓2(𝑥 ) = 3𝑒 4𝑥 + 2𝑒 −𝑥

59 | P a g e
Linear Dynamical System
If A is an n×n matrix, a sequence v0, v1, v2, ... of columns in R n is called a linear
dynamical system if v0 is specified and v1, v2, ... are given by the matrix
recurrence vk+1 = Avk for each k ≥ 0.

Graphical Description of Dynamical Systems

If a dynamical system vk+1 = Avk is given, the sequence v0, v1, v2,... is called the
trajectory of the system starting at v0. It is instructive to obtain a graphical plot of
𝑥𝑘
the system by writing vk = [𝑦 ] and plotting the successive values as points in the
𝑘
plane, identifying vk with the point (xk , yk) in the plane. We give several examples
which illustrate properties of dynamical systems. For ease of calculation we
assume that the matrix A is simple, usually diagonal.

Example:-
1
0 1 1
Let A = [ 2 1 ] Then the eigenvalues are and , with
2 3
0
3
1 0
corresponding eigenvectors x1 =[ ] and x2 = [ ]. The exact
0 1
1 𝑘 1 1 𝑘 1
formula is vk= b1( ) [ ] + 𝑏2 ( ) [ ] for k= 0,1,2 ...
2 0 2 0
, where the coefficients b1 and b2 depend on the initial
point v0. Several trajectories are plotted in the diagram and, for each choice of v0,
the trajectories converge toward the origin because both eigenvalues are less than 1
in absolute value. For this reason, the origin is called an attractor for the system.
3
0 3 4
Let A = [ 2 4 ] . Here the eigenvalues are and , with
2 3
0
3
1 0
corresponding eigenvectors x1 = [ ] and x2 = [ ] as
0 1
3 1 4 1
before. The exact formula is vk = b1( )𝑘 [ ] + 𝑏2 ( )𝑘 [ ]
2 0 3 0
for k= 0,1,2 ... Since both eigenvalues are greater than 1 in

60 | P a g e
absolute value, the trajectories diverge away from the origin for every choice of
initial point V0. For this reason, the origin is called a repellor for the system.
1
1 − 3 1
2
Let A = [ 1 ] . Here the eigenvalues are and ,
2 2
− 1
2
−1 1
with corresponding eigenvectors x1 = [ ] and x2 = [ ]
1 1
3 𝑘 −1 1 𝑘 1
the exact formula is vk = b1( ) [ ] + 𝑏2 ( ) [ ] for k=
2 1 2 1
3
0,1,2 ... In this case is the dominant eigenvalue so, if
2
3 −1
b1≠ 0,we have vk ≈ b1( )𝑘 [ ] for large k and vk is approaching the line y = −x.
2 1
1 1
however, if b1 = 0, then vk = b2( )𝑘 [ ]and so approaches the origin along the line
2 1
y = x. In general the trajectories appear as in the diagram, and the origin is called a
saddle point for the dynamical system in this case.

Conclusion:-
In this unit, we learned the concept of eigenvalues and eigenvectors, characteristic
equation and diagonalizable and triangular operator. Then we saw Jordan canonical
form and application related to eigenvalues and eigenvectors i.e, (system of
differential equation, linear dynamical system).

61 | P a g e
Chapter- 5
Inner Product space
5.1) Inner product on Rn
The inner product on Rn generalizes the notion of the dot product of vectors in R2
and R3 that you may are already familiar with.

Definition:- Let u = (u1, u2, . . . , un) and let v = (v1, v2, . . . , vn) be vectors in Rn .
The inner product of u and v is

u . v = u1v1 + u2v2 + · · · + unvn.

Notice that the inner product u . v can be computed as a matrix multiplication as
follows:
𝑉1
𝑉
u . v = uT v = [u1 u2 · · · un] [ 2 ].
⋮
𝑉𝑛

The following theorem summarizes the basic algebraic properties of the inner
product.

Let u, v, w be vectors in R n and let α be a scalar. Then

(a) u.v=v.u
(b) (u + v) . w = u . w + v . w
(c) (αu) . v = α(u . v) = u . (αv)
(d) u . u ≥ 0, and u . u = 0 if and only if u = 0

then, it is Inner Product Space.

2 3
Example:- let u = [−5] and v = [ 2 ] . compute u.v and v.u .
−1 −3

62 | P a g e
3
Solution:- u .v = u v = [2 −5
T
−1] [ 2 ]
−3
= (2)(3) + (-5)(2)+(-1)(-3) = -1
2
v . u = v u = [3 2
T
−3] [−5]
−1
= (3)(2) + (2)(-5)+(-3)(-1) = -1
It clear from the calculation that u . v = v . u.
Definition:- The length or norm of a vector u ∈ Rn is defined as

‖𝒖‖ = √𝒖. 𝒖 = √𝒖𝟐𝟏 + 𝒖𝟐𝟐 +. . . + 𝒖𝟐𝒏

A vector u ∈ Rn with norm 1 will be called a unit vector:

‖𝒖‖ = 1.

Theorem:- Let u ∈ Rn and let α be a scalar. Then ‖𝛼𝑢‖ = |α|‖𝑢‖.

Proof:- We have

‖𝛼𝑢‖ = √(αu). (αu)

= √𝛼 2(u . u)

= |α| √u . u

= |α|‖𝑢‖.

Example:-
Let u = (2, 3, 6). Compute ‖𝑢‖ and find the unit vector v in the same direction as u.

Solution:-. By definition,

‖𝑢‖= √u . u

63 | P a g e
= √22 + 32 + 62
= √49 = 7
Then the unit vector that is the same direction as u is
2

2 7
1 1 3
V = ‖ ‖ 𝑢 = [3] =
𝑢 7 7
6 6
[7]
Verify that ‖𝑣‖ = 1

2 3 6
‖𝑣‖=√( )2 + ( )2 + ( )2
7 7 7

4 9 3
=√ + +
49 49 49

49
=√ =1
49

Definition:- Let u and v be vectors in R n .The distance between u and v is the

length of the vector u − v. We will denote the distance between u and v by d(u, v).
In other words,

d(u, v) =‖𝐮 − 𝐯‖
3 7
Example:- Find the distance between u =[ ] and v = [ ] .
−2 −9
Solution:- we compute:

d(u,v) = √(3 − 7)2 + (−2 + 9)2 = √65

5.2) Orthogonality
Definition:- Two vector u and v in Rn are said to be orthogonal if u.v = 0.

64 | P a g e
Theorem:- (Pythagorean Theorem) Two vectors u and v are orthogonal if and
only if ‖𝑢 + 𝑣‖2 = ‖𝑢‖2 + ‖𝑣‖2 .

Proof:- First recall that ‖𝑢 + 𝑣‖ = √(𝑢 + 𝑣 ). (𝑢 + 𝑣) and therefore

‖𝑢 + 𝑣‖2 = (u + v). (u + v)

= u. u + u. v + v. u + v. v

= ‖𝑢‖2 + 2(u . v) + ‖𝑣‖2 .

Therefore, ‖𝑢 + 𝑣‖2 = ‖𝑢‖2 + ‖𝑣‖2 if and only if u . v = 0.

Definition:- A set of vectors {u1, u2, . . . , up} is said to be an orthogonal set if

any pair of distinct vectors ui , uj are orthogonal, that is, ui . uj = 0 whenever i≠ j.

Example:- Show that {u1, u2 , u3} is an orthogonal set, where

1
3 −1 −
2
u1=[1] , u2=[ 2 ] 𝑎𝑛𝑑 u3=[−2]
7
1 1
2

Solution:- u1.u2 = 3(-1) +1(2) +1(1) = 0

1 7
u1.u3 = 3(- ) +1(-2) +1( ) = 0
2 2
1 7
u2.u3 = -1(- ) +2(-2) +1( ) = 0
2 2

each pair of distinct vector is orthogonal and so {u1, u2 , u3} is an orthogonal set.
Note:- If S = {u1, u2, . . . , up} is an orthogonal set of nonzero vectors in Rn , then S
is linearly independent and hence is a basis for the subspace spanned by S.

Definition:- An orthogonal basis for a subspace W of R n is a basis for W that is

also an orthogonal set.

Theorem:- Let {u1, u2, . . . , up} be an orthogonal basis for a subspace W of Rn .

For each y in W, the weights in the linear combination
y =c1u1 + . . . + cpup
are given by

65 | P a g e
𝑦.𝑢𝑗
cj = (j = 1, . . ., p)
𝑢𝑗 .𝑢𝑗

Proof:- As in the preceding proof, the orthogonality of {u1, u2, . . . , up} shows that
y.u1 = ( c1u1 + c2u2 + . . . + cpup)u1 = c1.(u1 .u1)
Since u1.u1 is not zero, the equation above can be solved for c 1. To find cj for
j = 2, . . . , p, compute y .uj and solve for cj.

Example:- The set S = {u1, u2 ,u3} in above example is an orthogonal basis for
6
3
R . Express the vector y = [ 1 ] as a linear combination of the vector in S.
−8
Solution:- compute
y.u1 = 11 y. u2 = -12 y. u3 = -33
u1. u1 = 11 u2 . u2 = 6 u3.u3 = 33/2
by above theorem ,
𝑦.𝑢1 𝑦.𝑢2 𝑦.𝑢3
y= u1 + u2 + u3
𝑢1 .𝑢1 𝑢2 .𝑢2 𝑢3 .𝑢3
11 12 33
y= u1 − u2 − 33 u3
11 6
2

= u1-2u2 -2u3

Definition:- Orthonormal Sets

A set {u1, u2, . . . , up} is an orthonormal set if it is an orthogonal set of unit
vectors. If W is the subspace spanned by such a set, then {u1, u2, . . . , up} is an
orthonormal basis for W.

5.3) The Gram-Schmidt Process

Given a basis {u1, u2, . . . , up } for a nonzero subspace W of Rn , define
v1 =x1
𝑥2 . 𝑣1
v2 = x2 - 𝑣1
𝑣1 .𝑣1

66 | P a g e
𝑥3 . 𝑣1 𝑥3 . 𝑣2
v3 = x3 – 𝑣1 – 𝑣2
𝑣1 .𝑣1 𝑣2 .𝑣2

⋮
𝑥𝑝 . 𝑣1 𝑥𝑝 . 𝑣2 𝑥𝑝 . 𝑣𝑝−1
vp = xp – 𝑣1 – 𝑣2 − . . . − 𝑣𝑝−1
𝑣1 .𝑣1 𝑣2 .𝑣2 𝑣𝑝−1 .𝑣𝑝−1

Then {v1, v2, . . . , vp} is an orthogonal basis for W. in addition

Span {v1, v2, . . . , vp} = span {u1, u2, . . . , up } for 1≤ 𝑘 ≤ 𝑝
3 1
Example:- let W = span{x1 x2} , where x1 = [6] and x2 = [2] .construct an
0 2
orthogonal basis { v1 v2}
Solution:- The subspace W , along with x1, x2, and the projection p of x2 onto x1.
The component of x2 orthogonal to x1 is x2 –p , which is in W because it is formed
from x2 and a multiple of x1.
3
Let v1 = x1 and i.e., v1 = [6]
0
1 3 0
𝑥2 . 𝑥1 15
V2 = x2 – p = x2 - 𝑥 𝑥 𝑣1
𝑥1 = [2] - [6] = [0]
45
1
2 0 2
Thus {v1 v2} is an orthogonal set of nonzero vector in W. since dim W = 2 , the
set { v1 v2} is basis for W.

Application to linear models

A common task in science and engineering is to analyze and understand
relationships among several quantities that vary. This section describes a variety of
situations in which data are used to build or verify a formula that predicts the value
of one variable as a function of other variables. In each case, the problem will
amount to solving a least squares problem.
For easy application of the discussion to real problems that you may encounter
later in your career, we choose notation that is commonly used in the statistical
analysis of scientific and engineering data. Instead of Ax= b, we write X𝛽 = y and
refer to X as the design matrix, 𝛽 as the parameter vector, and y as the observation
vector.

67 | P a g e
Least-Squares Lines
The simplest relation between two variables x and y is the linear equation
y = 𝛽0 + 𝛽1 𝑥. Experimental data often produce points (x1, y1),…, (xn ,yn) that,
when graphed, seem to lie close to a line. We want to determine the parameters 𝛽0
and 𝛽1 that make the line as “close” to the points as possible.
Suppose 𝛽0 and 𝛽1 are fixed, and consider the line y = 𝛽0 + 𝛽1 𝑥 in Figure.
Corresponding to each data point (xj , yj ) there is a point (xj , 𝛽0 + 𝛽1 𝑥 ) on the
line with the same x-coordinate. We call yj the observed value of y and 𝛽0 + 𝛽1 𝑥
the predicted y-value (determined by the line). The difference between an observed
y-value and a predicted y-value is called a residual.

There are several ways to measure how “close” the line is to the data. The usual
choice (primarily because the mathematical calculations are simple) is to add the
squares of the residuals. The least-squares line is the line y=𝛽0 + 𝛽1 𝑥 that
minimizes the sum of the squares of the residuals. This line is also called a line of
regression of y on x, because any errors in the data are assumed to be only in the
y-coordinates. The coefficients 𝛽0 . 𝛽1 of the line are called (linear) regression
coefficients.
If the data points were on the line, the parameters 𝛽 0 and 𝛽 1 would satisfy the
equations

68 | P a g e
We can write this system as
1 𝑥1 𝑦1
1 𝑥2 𝛽 𝑦2
X𝛽 = 𝑦 , where X = [ ] , = [ 0] , y = [ ⋮ ] …(1)
⋮ ⋮ 𝛽1
1 𝑥𝑛 𝑦𝑛
For calculation of least square solution of Ax= b satisfies the equation
ATAx= AT b

Example:- find the equation y=𝛽0 + 𝛽1 𝑥 of the least equation line that best fits
the data points (2,1) ,(5,2), (7,3) and (8,3).
Solution:- Use the x-coordinates of the data to build the design matrix X in (1) and
the y-coordinates to build the observation vector y:
1 2 1
1 5 2
X=[ ], y=[ ]
1 7 3
1 8 3
For the least-squares solution of X𝛽 = 𝑦, obtain the normal equations (with the
new notation):
XTX𝛽 = 𝑋𝑇 𝑦

1 2
1 1 1 1 1 5 4 22
i.e., XTX =[ ][ ]= [ ]
2 5 7 8 1 7 22 142
1 8
1
1 1 1 1 2 9
𝑋𝑇 𝑦 = [ ][ ] = [ ]
2 5 7 8 3 57
3

69 | P a g e
The normal equation are
4 22 𝛽0 9
[ ][ ] = [ ]
22 142 𝛽1 57
2
𝛽 −1
4 22 9 1 142 −22 9 1 24
Hence [ 0 ] = [ ] [ ]= [ ] [ ] = [ ]= [75 ]
𝛽1 22 142 57 84 −22 4 57 84 30
14
2 5
Then the least squares line the equation, y= + 𝑥
7 14

Conclusion:-
In this unit, we have come across the concept of Inner product space, norm,
orthogonality (orthogonal set and orthonormal set), Gram Schmidt orthogonality
process and some application of linear model(least square line).

70 | P a g e
Chapter- 6
Application of Linear Algebra

6.1) Application to output and input Economic model

In 1973 Wassily Leontief was awarded the Nobel prize in economics for his work
on mathematical models(“input–output” (or “production”) model).
Suppose a nation’s economy is divided into n sectors that produce goods or
services, and let x be a production vector in Rn that lists the output of each sector
for one year. Also, suppose another part of the economy (called the open sector)
does not produce goods or services but only consumes them, and let d be a final
demand vector (or bill of final demands) that lists the values of the goods and
services demanded from the various sectors by the nonproductive part of the
economy. The vector d can represent consumer demand, government consumption,
surplus production, exports, or other external demands.
As the various sectors produce goods to meet consumer demand, the producers
themselves create additional intermediate demand for goods they need as inputs
for their own production. The interrelations between the sectors are very complex,
and the connection between the final demand and the production is unclear.
Leontief asked if there is a production level x such that the amounts produced (or
“supplied”) will exactly balance the total demand for that production, so that
{amount produced(x) }= {intermediate demand} +{ final demand(d)}
The basic assumption of Leontief’s input–output model is that for each sector,
there is a unit consumption vector in Rn that lists the inputs needed per unit of
output of the sector (Prices of goods and services are held constant.)
EXAMPLE:- Suppose an economy consists of the Coal, Electric (power), and
Steel sectors, and the output of each sector is distributed among the various sectors
as shown in Table 1, where the entries in a column represent the fractional parts of
a sector’s total output.
The second column of Table 1, for instance, says that the total output of the
Electric sector is divided as follows: 40% to Coal, 50% to Steel, and the remaining
10% to Electric. (Electric treats this 10% as an expense it incurs in order to operate
71 | P a g e
its business.) Since all output must be taken into account, the decimal fractions in
each column must sum to 1.
Denote the prices of the total annual outputs of the Coal, Electric, and Steel sectors
by pC, pE, and pS, respectively. If possible, find equilibrium prices that make each
sector’s income match its expenditures.

Solution:- A sector looks down a column to see where its output goes, and it looks
across a row to see what it needs as inputs. For instance, the first row of Table 1
says that Coal receives (and pays for) 40% of the Electric output and 60% of the
Steel output. Since the respective values of the total outputs are p E and pS, Coal
must spend .4pE dollars for its share of Electric’s output and .6pS for its share of
Steel’s output. Thus Coal’s total expenses are .4pE + .6pS. To make Coal’s income,
pC, equal to its expenses, we want

pC = .4pE + .6pS

72 | P a g e
The second row of the exchange table shows that the Electric sector spends .6pC
for coal, .1pE for electricity, and .2pS for steel. Hence the income/expense
requirement for Electric is

pE = .6pC +.1pE +.2pS

Finally, the third row of the exchange table leads to the final requirement:

pS = .4pC + .5pE +.2pS

To solve the system of equations, move all the unknowns to the left sides of the
equations and combine like terms.

PC - .4PE - .6PS =0

-.6PC + .9PE - .2PS = 0

- .4PC - .5PC + .8PS =0

Row reduction is next. For simplicity here, decimals are rounded to two places.
2 3
1 −.4 −.6 0 10 −4 −6 0 1 − − 0
[−.6 . 9 −.2 0] ~ [−6 9 −2 0] ~ [ 5 5 ]~
−6 9 −2 0
−.4 −.5 8 0 −4 −5 8 0
−4 −5 8 0
2 3 2 3
2 3 1 − − 0 1 − − 0
1 − − 0 5 5 5 5
5 5 33 28 28
33 28 ~ 0 − 0 ~ 0 1 − 0
0 − 0 5 5 33
5 5 33 28 33 28
[−4 −5 8 0] [0 −
5 8
0] [0 −
5 8
0]

31
1 0 − 0 31
33 1 0 − 0
28 33 1 0 −0.94 0
0 1 − 0 ~ 28 ~ [0 1 −0.84 0]
33 0 1 − 0 0 0 0 0
33 28 33
[0 0 0 0]
[0 −
5 8
0]

The general solution is pC = .94pS, pE = .84pS, and pS is free. The equilibrium price
vector for the economy has the form
73 | P a g e
𝑃𝐶 . 94𝑃𝑆 0.94
P =[𝑃𝐸 ] = [. 85𝑃𝑆 ] = 𝑃𝑆 [0.85]
𝑃𝑆 𝑃𝑆 1
Any (nonnegative) choice for pS results in a choice of equilibrium prices. For
instance, if we take pS to be 100 (or 100 million), then pC = .94 and pE = .84. The
incomes and expenditures of each sector will be equal if the output of Coal is
priced at 94 million, that of Electric at 84 million, and that of Steel at 100 million.

6.2) An Application of Markov chain

Many natural phenomena progress through various stages and can be in a variety
of states at each stage. For example, the weather in a given city progresses day by
day and, on any given day, may be sunny or rainy. Here the states are “sun” and
“rain,” and the weather progresses from one state to another in daily stages.
Another example might be a football team: The stages of its evolution are the
games it plays, and the possible states are “win,” “draw,” and “loss.”

The general setup is as follows: A “system” evolves through a series of “stages,”

and at any stage it can be in any one of a finite number of “states.” At any given
stage, the state to which it will go at the next stage depends on the past and present
history of the system that is, on the sequence of states it has occupied to date.

Definition:- A Markov chain is such an evolving system wherein the state to

which it will go next depends only on its present state and does not depend on
the earlier history of the system.

Even in the case of a Markov chain, the state the system will occupy at any stage is
determined only in terms of probabilities. In other words, chance plays a role. For
example, if a football team wins a particular game, we do not know whether it will
win, draw, or lose the next game. On the other hand, we may know that the team
tends to persist in winning streaks; for example, if it wins one game it may win the
1 4 1
next game of the time, lose of the time, and draw of the time. These
2 10 10
fractions are called the probabilities of these various possibilities.

The probability that a given event will occur is the long run proportion of the time
that the event does indeed occur. Hence, all probabilities are numbers between 0

74 | P a g e
and 1. A probability of 0 means the event is impossible and never occurs; events
with probability 1 are certain to occur. If a Markov chain is in a particular state, the
probabilities that it goes to the various states at the next stage of its evolution are
called the transition probabilities for the chain.

Example:- A man always eats lunch at one of two restaurants, A and B. He never
eats at A twice in a row. However, if he eats at B, he is three times as likely to eat
at B next time as at A. Initially, he is equally likely to eat at either restaurant.

a. What is the probability that he eats at A on the third day after the initial one?
b. What proportion of his lunches does he eat at A?
Solution:- The table of transition probabilities follows. The A column indicates
that if he eats at A on one day, he never eats there again on the next day and so is
certain to go to B.

Present lunch
Next A 0 0.25
lunch B 1 0.75
The B column shows that, if he eats at B on one day, he will eat there on the next
3 1
day of the time and switches to A only of the time. The restaurant he visits on a
4 4
given day is not determined. The most that we can expect is to know the
probability that he will visit A or B on that day.
(𝑚)
𝑠 (𝑚)
Let Sm = [ 1(𝑚)] Denotes the state vector for m. Here 𝑠1 denotes the probability
𝑠2
(𝑚)
that the eats at A on day m , and 𝑠2 is the probability that he eats at B on day m.
It is convenient to let s0 correspond to the initial day. Because he is equally likely
to eat at A or B on that initial day,
(0) (0) 0.5
𝑠1 = 0.5 and 𝑠2 = 0.5 , so S0= [ ]
0.5
0 0.25
Now let p= [ ] denote the transition matrix . we claim that the relationship
1 0.75
Sm+1= psm

75 | P a g e
Hold for all integers m≥ 0. This will be derived later , for now , we use it followsw
to successively compute s1, s2, s3,….
0 0.25 0.5 0.125
S1= pS0 = [ ][ ] = [ ]
1 0.75 0.5 0.875
0 0.25 0.125 0.21875
S2= pS1 = [ ][ ]=[ ]
1 0.75 0.875 0.78125
0 0.25 0.21875 0.1953125
S3= pS3 = [ ][ ]= [ ]
1 0.75 0.78125 0.8046875
Hence, the probability that his third lunch (after the initial one) is at A is
approximately 0.195, whereas the probability that it is at B is 0.805. If we carry
these calculations on, the next state vectors are (to five figures):
0.20117 0.19971
S4 = [ ] S5 = [ ]
0.79883 0.80029
0.20007 0.19998
S6 = [ ] S7 = [ ]
0.79993 0.80002
Moreover, as m increases the entries of sm get closer and closer to the
0.2
corresponding entries of [ ] . Hence, in the long run, he eats 20% of his lunches
0.8
at A and 80% at B.

The system evolves through various stages and at each

stage can be in exactly one of n distinct states. It
progresses through a sequence of states as time goes
on. If a Markov chain is in state j at a particular stage
of its development, the probability pij that it goes to
state i at the next stage is called the transition
probability. The n × n matrix P = [pij] is called the
transition matrix for the Markov chain. The situation
is depicted graphically in the diagram.

p1j + p2j +···+ pnj = 1

76 | P a g e
for each j Thus, the columns of P all sum to 1 and the entries of P lie between 0
and 1. Hence P is called a stochastic matrix.
(𝑚)
𝑠1
(𝑚)
Then n x1 matrix, Sm = 𝑠2 m=0,1 ,2,…..
⋮
(𝑚)
[𝑠𝑛 ]

Are called the state vector for the markov cahin.

Let P be the transition matrix for an n-state Markov chain. If Sm is the state vector
at stage m, then

sm+1 = Psm for each m = 0, 1, 2, ....

6.3) An Application to computer graphics

Computer graphics deals with images displayed on a computer screen, and so
arises in a variety of applications, ranging from word processors, to Star Wars
animations, to video games, to wire-frame images of an airplane. These images
consist of a number of points on the screen, together with instructions on how to
fill in areas bounded by lines and curves. Often curves are approximated by a set of
short straight-line segments, so that the curve is specified by a series of points on
the screen at the end of these segments. Matrix transformations are important here
because matrix images of straight line segments are again line segments. Note that
a colour image requires that three images are sent, one to each of the red, green,
and blue phosphorus dots on the screen, in varying intensities.

Consider displaying the letter A. In reality, it is depicted on the screen, as in Figure

by specifying the coordinates of the 11 corners and filling in the interior. For
simplicity, we will disregard the thickness of the letter, so we require only five
coordinates as in Figure.

77 | P a g e
This simplified letter can then be stored as a data matrix

Vertex 1 2 3 4 5
0 6 5 1 3
D=[ ]
0 0 3 3 9
where the columns are the coordinates of the vertices in order. Then if we want to
transform the letter by a 2×2 matrix A, we left-multiply this data matrix by A (the
effect is to multiply each column by A and so transform each vertex).

For example, we can slant the letter to the right by multiplying by an x-shear
1 0.2
matrix A = [ ] . The result is the letter with data matrix
0 1

1 0.2 0 6 5 1 3
A= [ ][ ]
0 1 0 0 3 3 9
0 6 5.6 1.6 4.8
=[ ]
0 0 3 3 9

If we want to make this slanted matrix narrower, we can now apply an xscale
0.8 0
matrix B = [ ] that shrinks the x-coordinate by 0.8. The result is the
0 1
composite transformation

0.8 0 1 0.2 0 6 5 1 3
BAD = [ ][ ][ ]
0 1 0 1 0 0 3 3 9
0 4.8 4.48 1.28 3.84
=[ ]
0 0 3 3 9

78 | P a g e
𝜋
On the other hand, we can rotate the letter about the origin through (or 30◦ ) by
6
𝜋 𝜋
𝜋 cos − sin 0.866 −0.5
multiplying by the matrix R ( ) = [ 𝜋6 𝜋
6
]= [ ].
6 sin cos 0.5 0.866
6 6

This gives ,
𝜋 0.866 −0.5 0 6 5 1 3
R( ) = [ ][ ]
6 0.5 0.866 0 0 3 3 9
0 5.196 2.83 −0.634 −1.902
= [ ]
0 3 5.098 3.098 9.294
As plotted in figure .

Homogenous coordinates
Each point (x,y) in R2 can be identified with the point (x, y, 1) on the plane in R3
that lies one unit above the xy-plane. We say that (x, y) has homogeneous
coordinates (x, y, 1). For instance, the point (0, 0) has homogeneous coordinates
(0, 0, 1). Homogeneous coordinates for points are not added or multiplied by
scalars, but they can be transformed via multiplication by 3 x 3 matrices.

Composite Transformations
The movement of a figure on a computer screen often requires two or more basic
transformations. The composition of such transformations corresponds to matrix
multiplication when homogeneous coordinates are used.

Let 3 x 3 matrix that corresponds to the composite transformation of a scaling by

.3, a rotation of 90° about the origin, and finally a translation that adds (-5, 2) to
each point of figure.
𝜋
If ∅ = , then sin ∅ = 1 and cos ∅ = 0. We have
2

𝑥 .3 0 0 𝑥
[𝑦 ] → 𝑠𝑐𝑎𝑙𝑒 → [ 0 . 3 0] [𝑦]
1 0 0 1 1

79 | P a g e
0 −1 0 .3 0 0 𝑥
→ 𝑅𝑜𝑡𝑎𝑡𝑒 → [1 0 0] [ 0 .3 0] [𝑦]
0 0 1 0 0 1 1
→ 𝑡𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑒 →
1 0 −5 0 −1 0 .3 0 0 𝑥
[0 1 2 ] [1 0 0] [ 0 . 3 0] [𝑦]
0 0 1 0 0 1 0 0 1 1

The matrix for the compute transformation is

1 0 −5 0 −1 0 . 3 0 0
[0 1 2 ] [1 0 0] [ 0 . 3 0]
0 0 1 0 0 1 0 0 1
0 −1 −5 . 3 0 0 0 −3 −5
= [1 0 2 ][0 .3 0] = [. 3 0 2]
0 0 1 0 0 1 0 0 1

Perspective Projection
A three-dimensional object is represented on the two-dimensional computer screen
by projecting the object onto a viewing plane. For simplicity, let the xy-plane
represent the computer screen, and imagine that the eye of a viewer is along the
positive z-axis, at a point (0, 0, d ). A perspective projection maps each point (x,
y, z) onto an image point (x*,y* ,0) so that the two points and the eye position,
called the center of projection, are on a line.

80 | P a g e
The triangle in the xz-plane in Figure is redrawn in another showing the lengths of
line segments. Similar triangles show that
x∗ 𝒙 𝑑𝑥 𝑥 𝑥
= and 𝑥 ∗ = = =
d 𝒅−𝒛 𝑑−𝑧 𝑑−𝑧 1−𝑧
𝑑
𝑦
𝑦∗ = 𝑧
1−𝑑

Using homogeneous coordinates, we can represent the perspective projection by a

𝑥 𝑦
matrix, say, P. We want (x, y, z,1) to map into ( 𝑧 , 𝑧 , 0,1). Scaling these
1− 1−
𝑑 𝑑
𝑧 𝑧
coordinate by 1- , we can also use (x, y, 0, 1- ) as homogenous coordinate for the
𝑑 𝑑
image. Now it easy to display P. In fact,

𝑥 1 0 0 0 𝑥 𝑥
𝑦 0 1 0 0 𝑦 𝑦
P[ ] = 0 0 0 [
0 𝑧 ] = [ 0 ]
𝑧 𝑧
1
1 [0 0 − 1] 1 1−
𝑑 𝑑

6.4) PCA of image comparison

Principal Components Analysis (PCA) is a way of identifying patterns in data, and
expressing the data in such a way as to highlight their similarities and differences.
It is one of several statistical tools available for reducing the dimensionality of a
data set based on calculating eigenvectors and eigenvalues of the input data.
Since patterns in data can be hard to find in data of high dimension, where the
luxury of graphical representation is not available, PCA is a powerful tool for
analyzing data. The other main advantage of PCA is that once you have found
these patterns in the data, and you compress the data, i.e. by reducing the number
of dimensions, without much loss of information.

Principal Component Analysis – PCA was used for the recognition of patterns and
compression of digital images used in Medicine. The description of Principal
Component Analysis is made by means of the explanation of eigenvalues and
eigenvectors of a matrix. This concept is presented on a digital image collected in

81 | P a g e
the clinical routine of a hospital, based on the functional aspects of a matrix. The
analysis of potential for recovery of the original image was made in terms of the
rate of compression obtained.

Principal Components Analysis (PCA) is a mathematical formulation used in the

reduction of data dimensions. Thus, the PCA technique allows the identification of
standards in data and their expression in such a way that their similarities and
differences are emphasized. Once patterns are found, they can be compressed, i.e.,
their dimensions can be reduced without much loss of information.

Use of the PCA technique in data dimension reduction is justified by the easy
representation of multidimensional data, using the information contained in the
data covariance matrix, principles of linear algebra and basic statistics. The studies
carried out by Mashal (2005) adopted the PCA formulation in the selections of
images from a multimedia database.

MRI (Magnetic Resonance Imaging) image compression using PCA

The steps normally followed in a PCA of a digital image can now be established:
𝑥1
𝑥2
Step 1: In the computational model of a digital image, in equation X = [ ⋮ ], the
𝑥𝑝
variables X1, X2 ,...,Xp are the columns of the image. The PCA is begun by coding
(correcting) the image to that its columns have zero means and unitary variances.
This is common, in order to avoid one or the other of the columns having undue
influence on the principal components

image corrected by the mean = image – mean of the image

Step 2: The covariance matrix C is calculated using equation

1
Sik = ∑𝑛𝑗=0(𝑥𝑗𝑖 − 𝑥̅𝑖 )(𝑥𝑗𝑘 − 𝑥̅𝑘 ) implemented computationally, that is:
𝑛

covImage = image corrected by the mean × (image corrected by the mean)T

Step 3: The eigenvalues l1 ,l2 ,...,lp and the corresponding eigenvectors a , a2 ,..., ap
are calculated.

82 | P a g e
Step 4: The value of a vector of characteristics is obtained, a matrix with vectors
containing the list of eigenvectors (matrix columns) of the covariance matrix.

vc = (av1 , av2 , av3 ,..., avn )

Step 5: The final data are obtained, that is, a matrix with all the eigenvectors
(components) of the covariance matrix.

finaldata = vcT × (Image - mean)T

Step 6: The original image is obtained from the final data without compression
using the expression Image

T = (vc)T × finaldata + meanT

Step 7: Any components that explain only a small portion of the variation in data
for the effect of image compression are discarded. The eliminations have the effect
of reducing the quantity of eigenvectors of the characteristics vectors and can
produce final data with a smaller dimension.

 Compression ration
According to Castro (Castro, 2010), low-loss compression afforded by the present
method may be expressed in terms of the compression factor of (r) and of the mean
squared error (MSE) committed in the approximation of A (original image) by Ã
(image obtained from the disposal of some of the components) (Gonzalez and
Woods, 1992). The compression factor is defined by:
unit of memory required to represent 𝐴̃
ρ=
𝑢𝑛𝑖𝑡 𝑜𝑓 𝑚𝑒𝑚𝑜𝑟𝑖𝑎𝑙 𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑑 𝑡𝑜 𝑟𝑒𝑝𝑡𝑟𝑒𝑠𝑒𝑛𝑡 𝐴

And the MSE committed in the approximation of A by Ã is:

𝑀𝑆𝐸 = ∑𝑖−1 ̅𝑖 − 𝑎𝑖 )𝑇 (𝑎̅𝑖 − 𝑎𝑖 )

𝑖=0((𝑎

83 | P a g e
Example:-
1) Recovering a TIFF image with 512*512 pixels with all the components
(512) of image covariance matrix (without compression, i.e., steps 1 to 6).

2) Recovery of a TIFF image with 512x512 pixels with 112 principal

components of the covariance matrix of the image (with compression, that
is, steps from 1 to 5 to 7).

84 | P a g e
Conclusion:-
In last unit, I took some major application of linear algebra i.e., An application to
output and input economic model, An application of Markov chain, an application
to computer graphics, PCA of image comparison.

85 | P a g e
REFERENCE
1) Linear Algebra. Vivek Sahai, Vikas Bist. Second Edition.
2) Linear Algebra and Its Applications. David C. Lay, Steven R. Lay and Judi
J. McDonald 2003.
3) Linear Algebra with Applications. W. Keith Nicholson
4) Linear Algebra through Matrices Arbind K. Lal and Sukant Pati.
5) Bhattacharya P.B., Jain S.K. and Nagpaul S.R., First Course in Linear
Algebra, Wiley Eastern Ltd., 1991
6) Friedberg S.H, Insel A.J. and Spence L.E., Linear Algebra, 4th Edition,
Prentice-Hall of India, New Delhi, 2004.
7) Lay D.C., Linear Algebra and Its application, 3rd edition, Pearson
Education(Singapore) Pvt. Ltd., Delhi, 2003
8) Linear Algebra - An Introduction with Applications, Authors: Raymond A.
Barnett and Michael R. Ziegler.
9) Gilbert Strang - "Introduction to Linear Algebra".

10) Kolman, Bernard; Hill, David R. (2007), Elementary Linear Algebra with
Applications.

86 | P a g e

Common questions