16-03-2025
Analytic geometry
Lecture 3
Math Foundations Team
By Dr. Sanjay Yadav
► We have studied vector spaces in the previous lecture.
► Now we would like to provide some geometric interpretation to these concepts.
► We shall take a close look at geometric vectors and the concepts of lengths of vectors and angles between vectors.
► But first we need to add the concept of an inner product to our vector space.
Norms
The length or magnitude of a vector is known as the vector norm or vector magnitude.
In mathematics, a norm is a function defined on a vector space that maps each vector to its magnitude (a non-negative scalar).
In machine learning and data science, norms are used to measure the distance between data points, whatever the dimension of the space they live in.
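As a minimal sketch of how a norm maps a vector to a scalar, the snippet below computes two common norms in plain Python and uses one of them to measure the distance between two data points. The function names and the sample vectors are illustrative assumptions, not part of the lecture.

```python
import math

def manhattan_norm(x):
    """Sum of absolute values of the components (the l1 norm)."""
    return sum(abs(c) for c in x)

def euclidean_norm(x):
    """Square root of the sum of squared components (the l2 norm)."""
    return math.sqrt(sum(c * c for c in x))

# Illustrative vector (an assumption, chosen so the results are easy to check by hand).
x = [3.0, -4.0]
l1 = manhattan_norm(x)   # 3 + 4
l2 = euclidean_norm(x)   # sqrt(9 + 16)

# Distance between two data points, measured as the norm of their difference.
p, q = [1.0, 2.0], [4.0, 6.0]
dist = euclidean_norm([a - b for a, b in zip(p, q)])
```

The same pattern works in any dimension, since both functions only iterate over the components.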
Inner products
► A positive-definite, symmetric bilinear mapping $\Omega : V \times V \to \mathbb{R}$ is called an inner product. To denote an inner product on V we generally write $\langle x, y \rangle$.
► The pair $(V, \langle \cdot, \cdot \rangle)$ is called an inner product space.
► Next we introduce the concept of symmetric, positive-definite matrices and show we can express an inner product using such matrices.
► We recall that in a vector space V any vector x can be written as a linear combination of the basis vectors. We use this to express an inner product in terms of a matrix.
Symmetric, positive-definite matrices
Theorem: For a real-valued, finite-dimensional vector space V and an ordered basis B of V, it holds that $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ is an inner product if and only if there exists a symmetric, positive-definite matrix $A \in \mathbb{R}^{n \times n}$ with $\langle x, y \rangle = \hat{x}^T A \hat{y}$, where $\hat{x}$ and $\hat{y}$ are the coordinate vectors of x and y with respect to B.
► Can a symmetric, positive-definite matrix have less than full rank?
► We have $x^T A x > 0$ for all non-zero x. Thus x = 0 is the only vector allowed in the nullspace. The nullspace is 0-dimensional so A has full rank.
► What can be said about the diagonal elements of a positive-definite matrix?
► From $e_i^T A e_i > 0$, where $e_i$ is the ith canonical basis vector, we see that $A_{ii} > 0$. Thus the diagonal entries are all strictly positive.
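The theorem above can be illustrated with a small sketch that builds an inner product from a symmetric, positive-definite matrix. The matrix `A`, the vectors and the helper name `inner` are assumptions chosen for illustration; the checks only probe the properties at a few sample vectors and are not a proof.

```python
def inner(x, y, A):
    """Return x^T A y for coordinate lists x, y and a square matrix A (list of rows)."""
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

# Assumed example matrix: symmetric, with eigenvalues 1 and 3, hence positive definite.
A = [[2.0, 1.0],
     [1.0, 2.0]]

x = [1.0, -1.0]
y = [3.0, 2.0]

symmetric = inner(x, y, A) == inner(y, x, A)             # <x, y> = <y, x>
positive = inner(x, x, A) > 0                            # <x, x> > 0 for non-zero x
diag_positive = all(A[i][i] > 0 for i in range(len(A)))  # A_ii = e_i^T A e_i > 0
```

Replacing `A` by the identity matrix recovers the ordinary dot product.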
Lengths and distances
► Inner products and norms are closely related in the sense that any inner product induces a norm: $\|x\| = \sqrt{\langle x, x \rangle}$ (1)
► Not every norm is induced by an inner product, for example the Manhattan norm.
► For an inner product vector space $(V, \langle \cdot, \cdot \rangle)$, the induced norm $\|\cdot\|$ satisfies the Cauchy-Schwarz inequality: $|\langle x, y \rangle| \le \|x\| \, \|y\|$. (2) Why is this true?
Consider the matrix
$A = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 3 & 4 & 1 & 6 \\ 5 & 4 & 1 & 0 \end{bmatrix}$
The row vectors of A are
$r_1 = (1, 2, 1, 2)$, $r_2 = (3, 4, 1, 6)$, $r_3 = (5, 4, 1, 0)$
These vectors span a subspace of $\mathbb{R}^4$ called the row space of A.
The column vectors of A are
$c_1 = (1, 3, 5)^T$, $c_2 = (2, 4, 4)^T$, $c_3 = (1, 1, 1)^T$, $c_4 = (2, 6, 0)^T$
These vectors span a subspace of $\mathbb{R}^3$ called the column space of A.
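One way to explore the row space numerically is to row-reduce the matrix: the non-zero rows of the reduced echelon form span the same row space as the original rows. The sketch below is a minimal Gauss-Jordan elimination using exact fractions; the function names `rref` and `row_space_basis` are assumptions introduced here, and the entries of A are taken exactly as printed above.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [v / pivot for v in m[pivot_row]]
        # Clear the pivot column in every other row.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def row_space_basis(rows):
    """Non-zero rows of the reduced echelon form; they span the row space."""
    return [row for row in rref(rows) if any(v != 0 for v in row)]

A = [[1, 2, 1, 2],
     [3, 4, 1, 6],
     [5, 4, 1, 0]]
basis = row_space_basis(A)
rank = len(basis)   # number of independent rows
```

Exact fractions are used instead of floats so that a zero row is detected reliably, without a tolerance.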
Find a basis for the row space of the following matrix A, and determine its rank.
$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & 1 & 5 \end{bmatrix}$
Use elementary row operations to find a reduced echelon form of the matrix A. We get
$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & 1 & 5 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -2 \\ 0 & -1 & 2 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 7 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}$
The two vectors (1, 0, 7), (0, 1, −2) form a basis for the row space of A. Rank(A) = 2.
Orthonormal Vectors and Projections
Definition
A set of vectors in a vector space V is said to be an orthogonal set if every pair of vectors in the set is orthogonal. The set is said to be an orthonormal set if it is orthogonal and each vector is a unit vector.
Example 1
Show that the set $\{(1, 0, 0), (0, \tfrac{3}{5}, \tfrac{4}{5}), (0, \tfrac{4}{5}, -\tfrac{3}{5})\}$ is an orthonormal set.
(1) orthogonal:
$(1, 0, 0) \cdot (0, \tfrac{3}{5}, \tfrac{4}{5}) = 0$;
$(1, 0, 0) \cdot (0, \tfrac{4}{5}, -\tfrac{3}{5}) = 0$;
$(0, \tfrac{3}{5}, \tfrac{4}{5}) \cdot (0, \tfrac{4}{5}, -\tfrac{3}{5}) = \tfrac{12}{25} - \tfrac{12}{25} = 0$;
(2) unit vector:
$\|(1, 0, 0)\| = \sqrt{1^2 + 0^2 + 0^2} = 1$
$\|(0, \tfrac{3}{5}, \tfrac{4}{5})\| = \sqrt{0^2 + (\tfrac{3}{5})^2 + (\tfrac{4}{5})^2} = 1$
$\|(0, \tfrac{4}{5}, -\tfrac{3}{5})\| = \sqrt{0^2 + (\tfrac{4}{5})^2 + (-\tfrac{3}{5})^2} = 1$
Thus the set is an orthonormal set.
Theorem
An orthogonal set of nonzero vectors in a vector space is linearly independent.
Proof Let {v1, …, vm} be an orthogonal set of nonzero vectors in a vector space V. Let us examine the identity
$c_1 v_1 + c_2 v_2 + \dots + c_m v_m = 0$
Let vi be the ith vector of the orthogonal set. Take the dot product of each side of the equation with vi and use the properties of the dot product. We get
$(c_1 v_1 + c_2 v_2 + \dots + c_m v_m) \cdot v_i = 0 \cdot v_i$
$c_1 v_1 \cdot v_i + c_2 v_2 \cdot v_i + \dots + c_m v_m \cdot v_i = 0$
Since the vectors v1, …, vm are mutually orthogonal, $v_j \cdot v_i = 0$ unless j = i. Thus $c_i \, v_i \cdot v_i = 0$.
Since vi is nonzero, $v_i \cdot v_i \ne 0$. Thus ci = 0.
Letting i = 1, …, m, we get c1 = 0, …, cm = 0, proving that the vectors are linearly independent.
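The two conditions checked in Example 1 (pairwise orthogonality and unit length) can be tested mechanically. The following sketch uses exact fractions and assumes that the third vector of the set is (0, 4/5, -3/5); the position of the minus sign is an assumption, since one sign must be negative for the last two vectors to be orthogonal. The helper name `is_orthonormal` is introduced here for illustration.

```python
from fractions import Fraction as F
from itertools import combinations

def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def is_orthonormal(vectors):
    """True if every pair is orthogonal and every vector has unit length."""
    pairwise_orthogonal = all(dot(a, b) == 0 for a, b in combinations(vectors, 2))
    unit_length = all(dot(v, v) == 1 for v in vectors)
    return pairwise_orthogonal and unit_length

# Set from Example 1, with the sign of the last component assumed negative.
S = [(F(1), F(0), F(0)),
     (F(0), F(3, 5), F(4, 5)),
     (F(0), F(4, 5), F(-3, 5))]

result = is_orthonormal(S)
```

By the theorem above, any set that passes this check is also linearly independent.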
Definition
A basis that is an orthogonal set is said to be an orthogonal basis. A basis that is an orthonormal set is said to be an orthonormal basis.
Standard Basis
• R2: {(1, 0), (0, 1)}
• R3: {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
• Rn: {(1, …, 0), …, (0, …, 1)}
These standard bases are orthonormal bases.
Theorem
Let {u1, …, un} be an orthonormal basis for a vector space V. Let v be a vector in V. v can be written as a linear combination of these basis vectors as follows:
$v = (v \cdot u_1) u_1 + (v \cdot u_2) u_2 + \dots + (v \cdot u_n) u_n$
Example 2
The following vectors u1, u2, and u3 form an orthonormal basis for R3. Express the vector v = (7, −5, 10) as a linear combination of these vectors.
$u_1 = (1, 0, 0)$, $u_2 = (0, \tfrac{3}{5}, \tfrac{4}{5})$, $u_3 = (0, \tfrac{4}{5}, -\tfrac{3}{5})$
Solution
$v \cdot u_1 = (7, -5, 10) \cdot (1, 0, 0) = 7$
$v \cdot u_2 = (7, -5, 10) \cdot (0, \tfrac{3}{5}, \tfrac{4}{5}) = 5$
$v \cdot u_3 = (7, -5, 10) \cdot (0, \tfrac{4}{5}, -\tfrac{3}{5}) = -10$
Thus
$v = 7 u_1 + 5 u_2 - 10 u_3$
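The expansion of a vector in an orthonormal basis can be sketched in a few lines: each coefficient is simply a dot product with the corresponding basis vector. The sketch assumes the signs v = (7, -5, 10) and u3 = (0, 4/5, -3/5), which make the arithmetic of Example 2 consistent; the helper names `coordinates` and `combine` are illustrative.

```python
from fractions import Fraction as F

def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def coordinates(v, basis):
    """Coefficients (v . u_i) of v with respect to an orthonormal basis."""
    return [dot(v, u) for u in basis]

def combine(coeffs, basis):
    """Form the linear combination sum_i coeffs[i] * basis[i]."""
    n = len(basis[0])
    return tuple(sum(c * u[k] for c, u in zip(coeffs, basis)) for k in range(n))

# Orthonormal basis and vector of Example 2 (signs as assumed in the lead-in).
U = [(F(1), F(0), F(0)),
     (F(0), F(3, 5), F(4, 5)),
     (F(0), F(4, 5), F(-3, 5))]
v = (7, -5, 10)

coeffs = coordinates(v, U)          # the three dot products
reconstructed = combine(coeffs, U)  # should give back v
```

No linear system has to be solved, which is the practical advantage of an orthonormal basis over a general one.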
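Projection of One vector onto Another Vector
Let v and u be vectors in $\mathbb{R}^n$ with angle $\alpha$ ($0 \le \alpha \le \pi$) between them.
Figure 4.17: OA is the projection of v onto u.
$OA = OB \cos\alpha = \|v\| \cos\alpha = \|v\| \, \dfrac{v \cdot u}{\|v\| \, \|u\|} = \dfrac{v \cdot u}{\|u\|}$
$\vec{OA} = \left( \dfrac{v \cdot u}{\|u\|} \right) \left( \dfrac{u}{\|u\|} \right) = \dfrac{v \cdot u}{u \cdot u} \, u$
Note: If $\alpha > \pi/2$ then $\dfrac{v \cdot u}{u \cdot u} < 0$.
So we define $\mathrm{proj}_u v = \dfrac{v \cdot u}{u \cdot u} \, u$.
Definition
The projection of a vector v onto a nonzero vector u in $\mathbb{R}^n$ is denoted $\mathrm{proj}_u v$ and is defined by
$\mathrm{proj}_u v = \dfrac{v \cdot u}{u \cdot u} \, u$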
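The definition translates directly into code. The sketch below is a minimal implementation with illustrative vectors (assumptions, not taken from the slides); it also checks that the remainder v - proj_u v is orthogonal to u, and shows the negative coefficient that appears when the angle between the vectors exceeds π/2.

```python
def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def project(v, u):
    """Projection of v onto a nonzero vector u: ((v . u) / (u . u)) u."""
    if all(c == 0 for c in u):
        raise ValueError("u must be a nonzero vector")
    scale = dot(v, u) / dot(u, u)
    return tuple(scale * c for c in u)

# Acute angle: the projection points along u.
v, u = (3.0, 4.0), (2.0, 0.0)
p = project(v, u)
residual = tuple(a - b for a, b in zip(v, p))
residual_is_orthogonal = dot(residual, u) == 0

# Obtuse angle: the coefficient (v . u) / (u . u) is negative,
# so the projection points opposite to u.
q = project((-1.0, 1.0), (1.0, 0.0))
```

The function depends only on dot products, so it works unchanged in any dimension.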
Example 3
Determine the projection of the vector v = (6, 7) onto the vector u = (1, 4).
Solution
$v \cdot u = (6, 7) \cdot (1, 4) = 6 + 28 = 34$
$u \cdot u = (1, 4) \cdot (1, 4) = 1 + 16 = 17$
Thus
$\mathrm{proj}_u v = \dfrac{v \cdot u}{u \cdot u} \, u = \dfrac{34}{17} (1, 4) = (2, 8)$
The projection of v onto u is (2, 8).
Theorem
The Gram-Schmidt Orthogonalization Process
Let {v1, …, vn} be a basis for a vector space V. The set of vectors {u1, …, un} defined as follows is orthogonal. To obtain an orthonormal basis for V, normalize each of the vectors u1, …, un.
$u_1 = v_1$
$u_2 = v_2 - \mathrm{proj}_{u_1} v_2$
$u_3 = v_3 - \mathrm{proj}_{u_1} v_3 - \mathrm{proj}_{u_2} v_3$
$\vdots$
$u_n = v_n - \mathrm{proj}_{u_1} v_n - \dots - \mathrm{proj}_{u_{n-1}} v_n$
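Example 4
The set {(1, 2, 0, 3), (4, 0, 5, 8), (8, 1, 5, 6)} is linearly independent in R4. The vectors form a basis for a three-dimensional subspace V of R4. Construct an orthonormal basis for V.
Solution
Let v1 = (1, 2, 0, 3), v2 = (4, 0, 5, 8), v3 = (8, 1, 5, 6).
Use the Gram-Schmidt process to construct an orthogonal set {u1, u2, u3} from these vectors.
Let $u_1 = v_1 = (1, 2, 0, 3)$
Let $u_2 = v_2 - \mathrm{proj}_{u_1} v_2 = v_2 - \dfrac{v_2 \cdot u_1}{u_1 \cdot u_1} u_1 = (4, 0, 5, 8) - \dfrac{28}{14} (1, 2, 0, 3) = (2, -4, 5, 2)$
Let $u_3 = v_3 - \mathrm{proj}_{u_1} v_3 - \mathrm{proj}_{u_2} v_3 = v_3 - \dfrac{v_3 \cdot u_1}{u_1 \cdot u_1} u_1 - \dfrac{v_3 \cdot u_2}{u_2 \cdot u_2} u_2 = (8, 1, 5, 6) - \dfrac{28}{14} (1, 2, 0, 3) - \dfrac{49}{49} (2, -4, 5, 2) = (4, 1, 0, -2)$
The set {(1, 2, 0, 3), (2, −4, 5, 2), (4, 1, 0, −2)} is an orthogonal basis for V.
Normalize them to get an orthonormal basis:
$\|(1, 2, 0, 3)\| = \sqrt{1^2 + 2^2 + 0^2 + 3^2} = \sqrt{14}$
$\|(2, -4, 5, 2)\| = \sqrt{2^2 + (-4)^2 + 5^2 + 2^2} = 7$
$\|(4, 1, 0, -2)\| = \sqrt{4^2 + 1^2 + 0^2 + (-2)^2} = \sqrt{21}$
orthonormal basis for V:
$\left\{ \left( \tfrac{1}{\sqrt{14}}, \tfrac{2}{\sqrt{14}}, 0, \tfrac{3}{\sqrt{14}} \right), \left( \tfrac{2}{7}, -\tfrac{4}{7}, \tfrac{5}{7}, \tfrac{2}{7} \right), \left( \tfrac{4}{\sqrt{21}}, \tfrac{1}{\sqrt{21}}, 0, -\tfrac{2}{\sqrt{21}} \right) \right\}$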
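The computation in Example 4 can be reproduced with a short sketch of the Gram-Schmidt process. It subtracts from each input vector its projections onto the previously constructed vectors, using exact fractions for the orthogonal set and floats only for the final normalization. The function names are illustrative, and the input is assumed to be linearly independent (otherwise a zero vector appears and the division fails).

```python
import math
from fractions import Fraction

def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Return an orthogonal list u_1..u_n built from linearly independent vectors."""
    ortho = []
    for v in vectors:
        u = [Fraction(c) for c in v]
        for w in ortho:
            coeff = dot(v, w) / dot(w, w)              # coefficient of proj_w v
            u = [a - coeff * b for a, b in zip(u, w)]  # subtract the projection
        ortho.append(u)
    return ortho

def normalize(u):
    """Scale u to unit length (returns floats)."""
    length = math.sqrt(dot(u, u))
    return [float(c) / length for c in u]

V = [(1, 2, 0, 3), (4, 0, 5, 8), (8, 1, 5, 6)]
U = gram_schmidt(V)               # orthogonal basis
Q = [normalize(u) for u in U]     # orthonormal basis
squared_lengths = [dot(u, u) for u in U]
```

For these inputs the sketch yields u2 = (2, -4, 5, 2) and u3 = (4, 1, 0, -2), with squared lengths 14, 49 and 21.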
Cauchy-Schwarz inequality
Metric space
► Consider an inner product space $(V, \langle \cdot, \cdot \rangle)$. Define d(x, y), the distance between two vectors x and y, to be
$d(x, y) = \|x - y\| = \sqrt{\langle x - y, x - y \rangle}$.
► If we use the dot product as the inner product, then the distance is called the Euclidean distance.
► The mapping $d : V \times V \to \mathbb{R}$ is called a metric.
Properties of a metric space
A metric d has the following properties:
► d is positive-definite, which means $d(x, y) \ge 0 \ \forall x, y \in V$, and $d(x, y) = 0 \iff x = y$.
► d is symmetric, which means $d(x, y) = d(y, x) \ \forall x, y \in V$.
► d obeys the triangle inequality as follows:
$d(x, z) \le d(x, y) + d(y, z) \ \forall x, y, z \in V$
Inner products and metrics seem to be very similar in terms of their properties, but they behave in opposite ways. For vectors of fixed length, $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y \rangle$, so when x and y are close to each other the inner product is large and the distance is small. On the other hand, when x and y are far apart, the inner product is small and the distance is large.
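A short sketch can make the metric properties concrete for the Euclidean distance. The three points are illustrative assumptions chosen so that the distances come out as whole numbers; the last two lines expand the squared distance in terms of dot products, which is what ties a large inner product to a small distance when the vector lengths are fixed.

```python
import math

def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))

def distance(x, y):
    """Euclidean distance: the norm of x - y induced by the dot product."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(dot(diff, diff))

# Illustrative points (assumed), forming a 3-4-5 right triangle.
x, y, z = (1.0, 2.0), (4.0, 6.0), (4.0, 2.0)

d_xy = distance(x, y)
d_yz = distance(y, z)
d_xz = distance(x, z)

symmetric = d_xy == distance(y, x)       # d(x, y) = d(y, x)
triangle_ok = d_xz <= d_xy + d_yz        # d(x, z) <= d(x, y) + d(y, z)
zero_on_diagonal = distance(x, x) == 0.0

# ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>
lhs = d_xy ** 2
rhs = dot(x, x) + dot(y, y) - 2 * dot(x, y)
```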
Angles and orthogonality
► In addition to being able to capture the lengths of vectors and the distance between vectors, inner products can also capture the angle ω between two vectors and can thus capture the geometry of a vector space.
► The key to using the inner product to characterize the angle between two vectors is the Cauchy-Schwarz inequality.
► Assume that x and y are not the 0 vector. Then the Cauchy-Schwarz inequality tells us that
$-1 \le \dfrac{x^T y}{\|x\| \, \|y\|} \le 1$ (1)
► Since the Cauchy-Schwarz ratio lies between −1 and 1 we can set it equal to the cosine of a unique angle ω ∈ [0, π] such that
$\cos(\omega) = \dfrac{\langle x, y \rangle}{\|x\| \, \|y\|}$ (2)
► The angle ω is the angle between two vectors. What does it capture?
► The notion of angle captures similarity of orientation between two vectors. When the ratio in (2) is close to 1, the vectors are more or less pointing in the same direction and ω ≈ 0. When the dot product is close to zero, the vectors are nearly orthogonal and ω ≈ π/2.
Example - angles and orthogonality
► Consider the vectors $x = [1, 1]^T$ and $y = [-1, 1]^T$.
► With respect to the inner product defined as a dot product we see that $\langle x, y \rangle = x^T y = 1 \cdot (-1) + 1 \cdot 1 = 0$, so x and y are orthogonal.
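The angle computation can be sketched for a general inner product of the form x^T A y. With the identity matrix this is the dot product used in the example above. The second matrix is an assumption introduced here purely to illustrate that the angle depends on the choice of inner product; it is not taken from the slides.

```python
import math

def inner(x, y, A):
    """Inner product x^T A y for a symmetric, positive-definite matrix A."""
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def angle(x, y, A):
    """Angle omega in [0, pi] with cos(omega) = <x, y> / (||x|| ||y||)."""
    cos_omega = inner(x, y, A) / math.sqrt(inner(x, x, A) * inner(y, y, A))
    cos_omega = max(-1.0, min(1.0, cos_omega))   # guard against rounding error
    return math.acos(cos_omega)

x = [1.0, 1.0]
y = [-1.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # dot product
A = [[2.0, 0.0], [0.0, 1.0]]          # assumed alternative inner product

omega_dot = angle(x, y, identity)     # x and y are orthogonal here
omega_A = angle(x, y, A)              # cos(omega) = -1/3, no longer orthogonal
```

Under the dot product the angle is π/2, whereas under the assumed matrix A the same two vectors have cos(ω) = −1/3.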
Orthonormal matrix
► A square matrix $A \in \mathbb{R}^{n \times n}$ is an orthogonal matrix if and only if its columns are orthonormal:
$A^T A = I = A A^T$
$A^T = A^{-1}$
► If the columns of a matrix are orthonormal, why are its rows orthonormal too?
► This follows from the fact that the left-inverse of a square matrix is the same as the right-inverse.
► Let A be a square matrix with B and C the left and right inverses of A: $BA = I = AC \implies B = C$. Why is this true? Because $B = B(AC) = (BA)C = C$.
► Transformations by an orthonormal matrix preserve lengths. This can be seen as follows, using the dot product as the definition of the inner product:
$\|Ax\|^2 = (Ax)^T Ax = x^T A^T A x = x^T I x = x^T x = \|x\|^2$.
► An example of an orthonormal matrix is the 2D rotation matrix, which can be expressed as $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ where θ is the angle of rotation.
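As a numerical check of these properties, the sketch below builds a 2D rotation matrix, verifies that its transpose acts as its inverse, and confirms that it leaves the length of a vector unchanged. The rotation angle and the test vector are arbitrary assumptions, and the helper functions are written out in plain Python to keep the example self-contained.

```python
import math

def rotation(theta):
    """2D rotation matrix for an angle theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(c * c for c in x))

R = rotation(math.pi / 6)        # 30 degrees, chosen arbitrarily
RtR = matmul(transpose(R), R)    # should be the identity up to rounding
RRt = matmul(R, transpose(R))    # rows are orthonormal as well

x = [3.0, 4.0]
length_before = norm(x)
length_after = norm(matvec(R, x))
```

Because of floating-point rounding, comparisons against the identity matrix should use a small tolerance rather than exact equality.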
Orthonormal matrix
► Also the angle between two vectors x and y does not change after transformation by an orthonormal matrix. This can be seen as follows:
$\cos(\omega) = \dfrac{(Ax)^T Ay}{\|Ax\| \, \|Ay\|} = \dfrac{x^T A^T A y}{\|x\| \, \|y\|} = \dfrac{x^T y}{\|x\| \, \|y\|}$
Orthonormal basis
► We already looked at the concept of a basis of a vector space, and found that for the vector space $\mathbb{R}^n$ we need n basis vectors.
► Our basis vectors needed only to be linearly independent. We can ensure linear independence by ensuring that our basis vectors point in different directions, so that a linear combination of n − 1 basis vectors cannot cancel out the nth basis vector.
► Now we will look at a special case of a basis where the vectors are all mutually orthogonal in the sense of the inner product, and each vector is of unit length. We call such a basis an orthonormal basis.
Orthonormal basis
► Question: Can you immediately think of an orthonormal basis for $\mathbb{R}^n$? Is an orthonormal basis for a vector space unique?
► Formal definition of an orthonormal basis:
► Consider an n-dimensional vector space V and n basis vectors $\{b_1, b_2, \dots, b_n\}$. If it is true that $\forall i, j = 1, \dots, n$, $\langle b_i, b_j \rangle = 0$ for $i \ne j$ and $\langle b_i, b_i \rangle = 1$, then the basis is called an orthonormal basis.
► If the basis vectors are only mutually orthogonal but not of length unity, then we have an orthogonal basis.
Gram-Schmidt process
► Given a set of basis vectors for a vector space, can we convert the given basis into an orthogonal basis? Yes, we shall use Gaussian elimination to construct such a basis.
► Let us start with an example: Consider $\mathbb{R}^2$ and two basis vectors $v_1 = (3, 1)^T$ and $v_2 = (2, 2)^T$. Put these vectors into columns of a matrix A such that $A = \begin{bmatrix} 3 & 2 \\ 1 & 2 \end{bmatrix}$.
► The next step is to perform Gaussian elimination on the following augmented matrix: $[A^T A \,|\, A^T] = \left[ \begin{array}{cc|cc} 10 & 8 & 3 & 1 \\ 8 & 8 & 2 & 2 \end{array} \right]$
► On performing Gaussian elimination of this augmented matrix we end up with $\left[ \begin{array}{cc|cc} 1 & 0.8 & 0.3 & 0.1 \\ 0 & 1 & -0.25 & 0.75 \end{array} \right]$
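The elimination in this example can be reproduced with a minimal sketch. It builds the augmented matrix from A, performs forward elimination with each pivot row scaled to a leading 1 (matching the form of the result above), and checks that the two rows of the right-hand block are orthogonal. The function names are illustrative, and no row exchanges are performed, which is assumed to be safe here because the pivots of A^T A are non-zero.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward_eliminate(aug, n_pivots):
    """Reduce the first n_pivots columns to unit upper triangular form."""
    m = [[Fraction(v) for v in row] for row in aug]
    for k in range(n_pivots):
        pivot = m[k][k]                        # assumed non-zero (no row swaps)
        m[k] = [v / pivot for v in m[k]]       # scale the pivot row to a leading 1
        for r in range(k + 1, len(m)):
            factor = m[r][k]
            m[r] = [a - factor * b for a, b in zip(m[r], m[k])]
    return m

A = [[3, 2],
     [1, 2]]                                   # basis vectors as columns
At = transpose(A)
AtA = matmul(At, A)
aug = [AtA[i] + At[i] for i in range(2)]       # [A^T A | A^T]

reduced = forward_eliminate(aug, 2)
right = [row[2:] for row in reduced]           # right-hand block
orthogonal = dot(right[0], right[1]) == 0
```

Scaling the pivot rows changes only the lengths of the resulting vectors, not their orthogonality, so the same check passes if the scaling step is omitted.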
Gram-Schmidt process
► Note that after the completion of Gaussian elimination the two rows on the right hand side are orthogonal. They form a basis for $\mathbb{R}^2$. We can normalize the vectors to get an orthonormal basis.
► What is the justification for this technique?
► First we see that when the m × n matrix A has full column rank, then the matrix $A^T A$ is positive definite. To see this note that any solution x to Ax = 0 is also a solution to $A^T A x = 0$ and vice-versa. Why is this the case? If $A^T A x = 0$ then $x^T A^T A x = \|Ax\|^2 = 0$, so Ax = 0.
► When A has full column rank, there are no non-trivial solutions to Ax = 0. Thus for all $x \in \mathbb{R}^n$ with $x \ne 0$ we have $Ax \ne 0$, and hence $x^T A^T A x = \|Ax\|^2 > 0$.
Elementary transformations
Product of elementary transformations
► A series of Gaussian elimination steps can be represented as a product of elementary transformations acting on A: $E_m E_{m-1} \dots E_1 A$.
► The product of lower triangular matrices can be seen to be lower triangular, and the inverse of a lower triangular matrix can also be seen as a lower triangular matrix.
► Thus the action of Gaussian elimination operations (without row exchanges) can be seen in the following terms: $L^{-1} A = U$, where the product of the elementary transformations is represented as the inverse of a lower triangular matrix for notational convenience, and the right hand side U is an upper triangular matrix. Thus we have A = LU.
Final argument
► Returning to our problem, we are performing Gaussian elimination on the matrix $A^T A$ where A contains the basis vectors as its columns. Upon Gaussian elimination on the augmented matrix we reduce $[A^T A \,|\, A^T]$ to get $[U \,|\, L^{-1} A^T]$ where $A^T A = LU$.
► Now we shall show that $Q^T = L^{-1} A^T$ is a matrix whose rows are mutually orthogonal.
► Consider $Q^T Q = L^{-1} A^T A (L^{-1})^T = U (L^{-1})^T$, which is some upper triangular matrix because $(L^{-1})^T$ is upper triangular.
► But $Q^T Q$ is a symmetric matrix and can only be upper triangular if it is diagonal. Therefore Q is a matrix whose columns are orthogonal. They can be normalized to obtain an orthonormal basis.
Angles and orthogonality
► A key feature of the inner product is that we can use it to characterize vectors that are orthogonal.
► Two vectors x and y are orthogonal if and only if the inner product between them is 0. For an orthogonal pair of vectors x, y we can write x ⊥ y.
► By the above definition the 0-vector is orthogonal to all vectors.
► Vectors which are orthogonal with respect to one inner product need not be orthogonal with respect to another inner product.
► Food for thought: Suppose we choose vectors x and y uniformly at random in high dimensions. What happens to the dot product between the vectors and hence the angle between them?
► To choose a vector uniformly at random over a sphere, let every component in the vector be an independent Gaussian random variable of mean 0 and unit variance, and then normalize the vector to unit length.
► Write a small program to see what happens ...
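One possible version of such a program is sketched below. It draws pairs of random unit vectors using Gaussian components, as described above, and reports the average absolute cosine of the angle between them for several dimensions. The dimensions, the number of trials and the seed are arbitrary assumptions.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Gaussian components followed by normalization: uniform direction on the sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def mean_abs_cosine(dim, trials, rng):
    """Average |cos(omega)| between independent random unit vectors."""
    total = 0.0
    for _ in range(trials):
        x = random_unit_vector(dim, rng)
        y = random_unit_vector(dim, rng)
        total += abs(sum(a * b for a, b in zip(x, y)))   # dot product = cos(omega)
    return total / trials

rng = random.Random(0)   # fixed seed (arbitrary) for repeatable runs
for dim in (2, 10, 100, 1000):
    print(dim, round(mean_abs_cosine(dim, 200, rng), 3))
```

Running it should show the average shrinking as the dimension grows, which suggests that two random vectors in high dimensions are nearly orthogonal, with the angle between them concentrating around π/2.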
BITS Pilani, Pilani Campus