Inner Products
M Narasimha Murty
Professor, Dept. of CSA
Indian Institute of Science, Bengaluru
mnm@[Link]
August-December 2017
Piazza Page: [Link]
1 / 24
Inner Products1
• The notion of a vector space is only algebraic. The vectors here do
not have any direction or magnitude.
• We would like to study vector spaces in which it makes sense to speak
of the length of a vector and of the angle between two vectors. We
shall do this by studying a certain type of scalar-valued function on
pairs of vectors, known as an inner product.
• One example of an inner product is the scalar or dot product of
vectors in ℜ3 . The scalar product of the vectors
α = (x1 , x2 , x3 ) and β = (y1 , y2 , y3 ) in ℜ3 is the real number
< α, β >= x1 y1 + x2 y2 + x3 y3 .
• Geometrically, this dot product is the product of the length of α, the
length of β, and the cosine of the angle between α and β. It is
therefore possible to define the geometric concepts of length and
angle in ℜ3 by means of the algebraically defined scalar product.
[1] K. Hoffman and R. Kunze, Linear Algebra, Second Edition, PHI, 2000
2 / 24
Inner Products1
Example 5.1 To begin with, consider !the vector space R2 over
x1
the field R. Each element x = in R2 can be identified
x2
with the notion of a vector that we learn in Physics by the di-
rected arrow from the origin to the point (x1 , x2 ) in the plane.
Y
3 (x1 , x2 )
2
1
0 X
0 1 2 3
( ) ( )
x1 y
• Definition: Given α = , and β = 1 ∈ ℜ2 , the dot product
x2 y2
or inner product denoted by < α, β > of these two vectors is
< α, β >= x1 y1 + x2 y2
• We restrict to length and orthogonality (perpendicularity) of vectors.
[1] R. Vittal Rao, Lecture Notes, Linear Algebra, Fall 2010 ([Link] 3 / 24
Inner Products
• In classification
1 In both SVMs and Perceptrons, the classifier is of the form
Wt X + b > 0 ⇒ X ∈ C+ and Wt X + b < 0 ⇒ X ∈ C− .
2 In general, we need to compute Wa .X where Wa and X are vectors of
size d + 1. Wt X = W.X is the dot product between W and X.
3 Even neural networks involve computations of the form Wt X.
• In dimensionality reduction one of methods is based on Random
projections. Let A be the data matrix of size n × l and take a matrix,
R, of random numbers whic is of size l × k then the matrix C given by
C = AR is of size n × k. Some of the properties of A and C are:
1 C represents data in a reduced dimensional space because k < l.
2 Ai .Aj ≈ Ci .Cj ∀ i, j; it preserves dot products between pairs of vectors.
• In clustering similarity between two patterns X and Y is characterized
using dot product.
4 / 24
Inner Products
• < α, α >= x21 + x22 . It can be seen that < α, α > ≥ 0, ∀α ∈ ℜ2 and
< α, α >= 0 if and only if α = θ2 where θ2 = 0 ∈ ℜ2 .
√
• So, < α, α > = Euclidean distance of (x1 , x2 ) from the origin (0, 0).
∑
• Extend to ℜk : for α, β ∈ ℜk , < α, β >= ki=1 xi yi .
• For α ∈ Ck , < α, α > may be complex; so, < α, α > ≥ 0 may not
make sense.
∑
• So, for α, β ∈ Ck , < α, β > is defined as ki=1 xi ȳi .
• The inner product of w = (w1 , · · · , wn ) ∈ Cn with z ∈ Cn is
w1 z¯1 = · · · + wn z¯n .
• The inner product of w with z equals the complex conjugate of the
inner product of z with w. √
• If λ = a + bi, where a, b ∈ ℜ, then | λ |= a2 + b2 is its absolute
value. Complex conjugate λ̄ = a − bi ⇒| λ |2 = λλ̄.
• We are now ready to define an inner product on V, which may be a
real or a complex vector space.
5 / 24
Inner Products
• Definition: Let F be the field of real numbers or the field of complex
numbers, and V a vector space over F. An inner product on V is a
function which assigns to each ordered pair of vectors α, β ∈ V a
scalar < α, β >∈ F in such a way that ∀α, β, γ ∈ V and all scalars c
1 < α + β, γ >=< α, γ > + < β, γ >;
2 < cα, β >= c < α, β >;
3 < β, α >= < α, β >, the bar denoting complex conjugation;
4 < α, α > > 0 if α ̸= 0. It should be observed that conditions (1), (2),
and (3) imply that
5 < α, cβ + γ >= c̄ < α, β > + < α, γ >.
• < α, cβ + γ >= < cβ + γ, α > = c < β, α > + < γ, α >
= c < β, α > + < γ, α > = c̄ < α, β > + < α, γ >
• When F is the field ℜ of real numbers, the complex conjugates
appearing in (3) and (5) are superfluous; however, in the complex
case they are necessary for the consistency of the conditions. Without
these complex conjugates, we would have the contradiction:
< α, α > > 0 and < iα, iα >= −1 < α, α > > 0. 6 / 24
Inner Products: Examples
• α, β, γ ∈ V ⇒< α, β + γ >= < β + γ, α > = < β, α > + < γ, α >
< β, α > + < γ, α > =< α, β > + < α, γ >.
• EXAMPLE 1: The standard inner product on Fn for
∑
α = (x1 , · · · , xn ) and β = ∑
(y1 , · · · , yn ) is < α, β >= j xj ȳj when
F = ℜ, this is < α, β >= j xj yj .
• In the real case, the standard inner product is often called the dot or
scalar product and denoted by α.β.
• EXAMPLE 2: For α = (x1 , x2 ) and β = (y1 , y2 ) ∈ ℜ2 , let
< α, β >= x1 y1 − x2 y1 − x1 y2 + 4x2 y2 .
Since < α, α >= (x1 − x2 )2 + 3x22 , it follows that
< α, α > > 0 if α ̸= 0. Conditions (1), (2), and (3) of the definition
are easily verified.
7 / 24
Inner Products
• Suppose V is a complex vector space with an inner product. Then for
all α, β ∈ V < α, β >= Re < α, β > +i Im < α, β > where
Re < α, β > and Im < α, β > are the real and imaginary parts of the
complex number < α, β >.
• If z is a complex number, then Im(z) = Re(−iz). It follows that
Im < α, β >= Re[−i < α, β >] = Re < α, iβ >
• So, the inner product is given by its real part
(8.2) < α, β >= Re < α, β > +i Re < α, iβ >
• It may be useful to know that an inner product on a real or complex
vector space is determined by another function, the so-called
quadratic form determined by the inner product.
• To define it, we first denote the positive square root of < α, α > by
||α||; ||α|| is called the norm of α with respect to the inner product.
• By looking at the standard inner products in ℜ, C, ℜ2 , and ℜ3 , it is
appropriate to think of the norm of α as the length or magnitude of α.
8 / 24
Norms
• The quadratic form determined by the inner product is the function
that assigns to each vector α the scalar ||α||2 . It follows that
||α ± β||2 = ||α||2 ± 2Re(< α, β >) + ||β||2 for all vectors α and β.
• < α + β, α + β >=< α, α + β > + < β, α + β >=< α, α > + <
α, β > + < β, α > + < β, β >= ||α||2 + ||β||2 + < α, β > +< α, β >
= ||α||2 + ||β||2 + 2 Re (< α, β >).
• In the real case
1 1
(8.3) < α, β >= ||α + β||2 − ||α − β||2
4 4
• For complex case use (8-2) (< α, β >= Re < α, β > +i Re < α, iβ >)
to obtain the expression
(8.4) < α, β >= 14 ||α + β||2 − 14 ||α − β||2 + 4i ||α + iβ||2 − 4i ||α − iβ||2 .
9 / 24
Norms
• Note that ||α|| = 0 if and only if α = 0 (because < α, α >= 0 if and
only if α = 0). Another property of the norm is that ||cα|| =| c | ||α||
for all c ∈ F and all α ∈ V.
• Proof: ||cα||2 =< cα, cα >= c < α, cα >= cc̄ < α, α >=| c |2 ||α||2 ,
taking square roots now gives the desired results.
• Two vectors α and β in V are said to be orthogonal if < α, β >= 0.
Note that the order of the vectors does not matter because
< u, v >= 0 if and only if < v, u >= 0. Instead of saying that u and v
are orthogonal, sometimes we say that u is orthogonal to v.
• Clearly 0 is orthogonal to every vector. Furthermore, 0 is the only
vector that is orthogonal to itself.
• For the special case where V = ℜ2 , we have If α, β ∈ V are
orthogonal, then
||α + β||2 = ||α||2 + ||β||2 .
10 / 24
Pythagoras Theorem
• Suppose that α, β are orthogonal vectors in V. Then
||α + β||2 =< α + β, α + β >= ||α||2 + ||β||2 + < α, β > + < β, α >.
• Note that < α, β > and < β, α > are equal to 0 as α and β are
orthogonal.
• So ||α + β||2 =< α + β, α + β >= ||α||2 + ||β||2 .
• Parallelogram Law: ||α + β||2 + ||α − β||2 = 2(||α||2 + ||β||2 )
• Proof: We have ||α + β||2 =< α + β, α + β >=< α, α >
+ < β, β > + < α, β > + < β, α >
||α − β||2 =< α − β, α − β >=< α, α > + < β, β > − < α, β >
− < β, α >
• Adding the two and using the fact that
||α||2 =< α, α >; ||β||2 =< β, β >, we get the identity.
11 / 24
Inner Product Spaces
• definition: An inner product space is a real or complex vector
space, together with a specified inner product on that space.
• A finite-dimensional real inner product space is often called a
Euclidean space. A complex inner product space is often referred to
as a unitary space.
• Theorem 1: If V is an inner product space, then for any vectors
α, β ∈ V and any scalar c
1 ||cα|| =| c | ||α||;
2 ∥|α|| > 0 for α ̸= 0;
3 < α, β >≤ ||α|| ||β||;
4 ||α + β|| ≤ ||α|| + ||β||.
• Note that ||z||2 = z1 z¯1 + · · · + zn z¯n . So, if zi is a complex number of
the form a + bi, then
|| z ||2 = (a1 + b1 i)(a1 − b1 i) + · · · + (an + bn i)(an − bn i) =
a21 + b21 + · · · + a2n + b2n > 0 for α ̸= 0.
• The inequality in (3) is clearly valid when α = 0. If α ̸= 0, then let
γ = β − <β,α>
||α||2
α. (Note that < γ, α >= 0.)
12 / 24
Inner Product Spaces
• < γ, α >=< β, α > − <β,α>
||α||2
< α, α >=< β, α > − < β, α >= 0
<β,α>
• 0 ≤ ||γ||2 =< β − ||α||2
α, β − <β,α>
||α||2
α >=< β, β >
+ <β,α><α,β>
||α||2
− <β,α><α,β>
||α||2
− <β,α><α,β>
||α||2
= ||β||2 − |<α,β>|
2
||α||2
• This is because in case of reals < α, β >=< β, α > and in case of
complex field, < β, α >= < α, β >. So, in either case
< α, β >< β, α >=|< α, β >|2 .
• Hence |< α, β >|2 ≤ ||α||2 ||β||2 .
• Now using this, we find that
||α + β||2 = ||α||2 + < α, β > + < β, α > +||β||2
= ||α||2 + 2Re < α, β > +||β||2 ≤ ||α||2 + 2||α|| ∥|β|| + ||β||2
= (||α|| + ||β||)2 .
• Thus, ||α + β|| ≤ ||α|| + ||β||. (Triangle Inequality).
13 / 24
Cauchy-Schwarz Inequality
• The inequality in (3), < α, β >≤ ||α|| ||β||, is called the
Cauchy-Schwarz inequality.
• It has a wide variety of applications. The proof shows that if (for
example) α is non-zero, then |< α, β >|< ||α|| ||β|| unless
< β, α >
β= α.
||α||2
• Thus, equality occurs in (3) if and only if α and β are linearly
dependent.
• Definition: Let α and β be vectors in an inner product space V.
Then α is orthogonal to β if < α, β >= 0; since this implies β is
orthogonal to α, we often simply say that α and β are orthogonal.
• Definition: If S is a set of vectors in V, S is called an orthogonal set
provided all pairs of distinct vectors in S are orthogonal.
• Definition: An orthonormal set is an orthogonal set S with the
additional property that ||α|| = 1 for every α ∈ S.
14 / 24
Orthogonal Sets
• The zero vector is orthogonal to every vector in V and is the only
vector with this property. It is appropriate to think of an orthonormal
set as a set of mutually perpendicular vectors, each having length 1.
• Example 8: The standard basis of either ℜn or Cn is an orthonormal
set with respect to the standard inner product.
• Example 9: The vector (x, y) in ℜ2 is orthogonal to (−y, x) with
respect to the standard inner product, for
< (x, y), (−y, x) >= −xy + yx = 0.
• Theorem 2: An orthogonal set of non-zero vectors is linearly
independent. Proof: Let S be a finite or infinite orthogonal set of
non-zero vectors in a given inner product space. Suppose
α1 , α2 , · · · , αm are distinct vectors in S and that
β = c1 α1 + c2∑ α2 + · · · + cm αm . ∑
then
< β, αk >= ( j < cj αj , αk >= j cj < αj , αk >= ck < αk , αk > .
• Since < αk , αk ≯= 0, it follows that ck = <β,α k>
||αk ||2
, 1 ≤ k ≤ m.
Thus when β = 0, each ck = 0; so S is an independent set. 15 / 24
Orthogonal Sets
• Corollary: If a vector β is a linear combination of an orthogonal
sequence of non-zero vectors α1 , · · · , αm , then β is the particular
linear combination
∑
m
< β, αk >
(8.8) β= αk
||αk ||2
k=1
• If {α1 , · · · , αm } is an orthogonal set of non-zero vectors in a
finite-dimensional inner product space V, then m ≤ dim V. This says
that the number of mutually orthogonal directions in V cannot exceed
the algebraically defined dimension of V.
• So geometric dimension of V is not greater than its algebraic
dimension. The fact that these two dimensions are equal is a
particular corollary of the next result.
16 / 24
Orthogonal Sets
• Theorem 3: Let V be an inner product space and let β1 , · · · , βn be
any independent vectors in V. Then one may construct orthogonal
vectors α1 , · · · , αn ∈ V such that for each k = 1, 2, · · · , n the set
{α1 , · · · , αk } is a basis for the subspace spanned by β1 , · · · , βk .
• Proof: The vectors α1 , · · · , αn will be obtained by means of a
construction known as the Gram-Schmidt orthogonalization
process.
• First let α1 = β1 . The other vectors are then given inductively as
follows:
• Suppose α1 , · · · , αm (1 ≤ m < n) have been chosen so that for every
k {α1 , · · · , αk }, 1 ≤ k ≤ m is an orthogonal basis for the subspace of
V that is spanned by β1 , · · · , βk .
• To construct the next vector αm+1 , let
∑
m
< βm+1 , αk >
(8.9) αm+1 = βm+1 − αk
||αk |||2
k=1
17 / 24
Gram-Schmidt orthogonalization Process
• Then αm+1 ̸= 0. For otherwise βm+1 is a linear combination of
α1 , · · · , αm and hence a linear combination of β1 , · · · , βm . If
1 ≤ j ≤ m, < αm+1 , αj >=< βm+1 , αj >
∑
− m k=1
<βm+1 ,αk >
||αk ||2
< αk , αj >=< βm+1 , αj > − < βm+1 , αj >= 0.
• Therefore {α1 , · · · , αm+1 } is an orthogonal set consisting of m + 1
nonzero vectors in the subspace spanned by β1 , · · · , βm+1 . By
Theorem 2, it is a basis for this subspace. Thus the vectors
α1 , · · · , αn , may be constructed one after the other in accordance
with (8-9). In particular, when n = 4, we have
α1 = β1 ; α2 = β2 − <β 2 ,α1 >
||α1 ||2
α1 ; α3 = β3 − <β 3 ,α1 >
||α1 ||2
α1 − <β 3 ,α2 >
||α2 ||2
α2
α4 = β4 − <β 4 ,α1 >
||α1 ||2
α1 − <β 4 ,α2 >
||α2 ||2
α2 − <β 4 ,α3 >
||α3 ||2
α3 .
• Corollary: Every finite-dimensional inner product space has an
orthonormal basis.
Proof: Let V be a finite-dimensional inner product space and
{β1 , · · · , βn } a basis for V. Apply the Gram-Schmidt process to
construct an orthogonal basis {α1 , · · · , αn }. Then to obtain an 18 / 24
Gram-Schmidt orthogonalization Process
orthonormal basis, simply replace each vector αk by ||ααkk || .
• If B = {α1 , · · · , αn } is an orthonormal basis for V, then for any
∑ ∑ ∑
scalars xi and yk , < j xj αj , k yk αk >= j xj ȳj .
• Example 12: Let β1 = (3, 0, 4); β2 = (−1, 0, 7); β3 = (2, 9, 11).
Applying the Gram-Schmidt process to β1 , β2 , β3 , α1 = (3, 0, 4)
α2 = (−1, 0, 7) − <(−1,0,7),(3,0,4)>
25 (3, 0, 4)
= (−1, 0, 7) − (3, 0, 4) = (−4, 0, 3); α3 = (2, 9, 11) −
<(2,9,11),(3,0,4)>
25 (3, 0, 4) − <(2,9,11),(−4,0,3)>
25 (−4, 0, 3) = (0, 9, 0)
• These vectors are non-zero and mutually orthogonal. Hence
{α1 , α2 , α3 } is an orthogonal basis for ℜ3 . To express (x1 , x2 , x3 ) in
ℜ3 as a linear combination of α1 , α2 , α3 one can use (8.8).
(x1 , x2 , x3 ) = 3x125
+4x2
α1 + −4x251 +3x2
α2 + x92 α3 .
In particular, (1, 2, 3) = 35 (3, 0, 4) + 15 (−4, 0, 3) + 29 (0, 9, 0).
• The orthonormal basis corresponding to α1 , α2 , α3 is
1 1
5 (3, 0, 4), 5 (−4, 0, 3), (0, 1, 0).
19 / 24
Orthogonal projection
• Suppose W is a subspace of an inner product space V, and let β be an
arbitrary vector in V. The problem is to find a best possible
approximation to β by vectors in W. This means we want to find a
vector α for which || β − α || is as small as possible subject to the
restriction that α should belong to W.
• A best approximation to β by vectors in W is a vector α in W such
that || β − α ||≤|| β − γ || for every vector γ ∈ W.
• Theorem 4: Let W be a subspace of an inner product space V and
let β ∈ V.
1 The vector α ∈ W is a best approximation to β by vectors in W if and
only if β − α is orthogonal to every vector in W.
2 If a best approximation to β by vectors in W exists, it is unique.
3 If W is finite-dimensional∑ and {α1 , · · · , αn } is any orthonormal basis for
W, then the vector α = k <β,α k>
||α2 ||2 αk is the (unique) best
approximation to β by vectors in W.
Proof: First note that if γ is any vector in V, then
β − γ = (β − α) + (α − γ), and
|| β − γ ||2 =|| β − α ||2 +2 Re < β − α, α − γ > + || α − γ ||2 20 / 24
Orthogonal projection
• Now suppose β − γ is orthogonal to every vector in W, that γ is in W
and that γ ̸= α. Then since α − γ is in W,
|| β − γ ||2 =|| β − α ||2 + || α − γ ||2 >|| β − γ ||2 .
• Conversely, suppose that || β − γ ||≥|| β − α || for every γ ∈ W. Then
from the first equation above it follows that
2 Re < β − α, α − γ > + || α − γ ||2 ≥ 0 ∀γ ∈ W.
• Since every vector in W may be expressed in the form α − γ with
γ ∈ W, we see that 2 Re < β − α, τ > + || τ ||2 ≥ 0 for every τ ∈ W.
• If γ ∈ W and γ ̸= α, we may take τ = − <β−α>,α−γ>
||α−γ||2
(α − γ). Then
the inequality reduces to
|< β − α >, α − γ >|2 |< β − α >, α − γ >|2
−2 + ≥0
|| α − γ ||2 || α − γ ||2
.
• This holds if and only if (β − α, α − γ >= 0. Therefore, β − α is
orthogonal to every vector in W.
21 / 24
Orthogonal projection
• This proves the equivalence of the two conditions in (1). The
orthogonality condition is satisfied by at most one vector in W, which
proves (2).
• Now suppose that W is a finite-dimensional subspace of V. Then we
know, as a corollary of Theorem 3, that W has an orthogonal basis.
Let {α1 , · · · , αn } be any orthogonal basis for W.
• Then, by the computation in the proof of Theorem 3, β − α is
orthogonal to each of the vectors αk (β − α is the vector obtained at
the last stage when the orthogonalization process is applied to
α1 , α2 , · · · , αn , β).
• Thus β − α is orthogonal to every linear combination of
α1 , · · · , αn , i.e., to every vector in W.
• If γ is in W and γ ̸= α, it follows that || β − γ ||>|| β − α ||.
Therefore, α is the best approximation to β that lies in W.
22 / 24
Orthogonal Complement
• Definition: Let V be an inner product space and S any set of vectors
in V. The orthogonal complement of S is the set S⊥ of all vectors in
V which are orthogonal to every vector in S.
• The orthogonal complement of V is the zero subspace, and conversely
{0}⊥ = V.
• If S is any subset of V, its orthogonal complement S⊥ (S perp) is
always a subspace of V.
• For S is non-empty, since it contains 0; and whenever α and β are in
S⊥ and c is any scalar,
< (cα + β, γ >= c < α, γ > + < β, γ >= c0 + 0 = 0
for every γ ∈ S, thus cα + β also lies in S.
• In Theorem 4 the characteristic property of the vector α is that it is
the only vector in W such that β − α belongs to W⊥ .
23 / 24
Orthogonal Projection
• Definition: Whenever the vector α in Theorem 4 exists it is called
the orthogonal projection of β on W. If every vector in V has an
orthogonal projection on W, the mapping that assigns to each vector
in V its orthogonal projection on W is called the orthogonal
projection of V on W.
• By Theorem 4, the orthogonal projection of an inner product space on
a finite-dimensional subspace always exists. But Theorem 4 also
implies the following result.
• Corollary: Let V be an inner product space, W a finite-dimensional
subspace, and E the orthogonal projection of V on W. Then the
mapping β → β − Eβ is the orthogonal projection of V on W⊥ .
• Proof: Let β be in any vector in V. Then β − Eβ is in W⊥ , and for
any γ in W⊥ , β − γ = Eβ + (β − Eβ − γ). Since β − Eβ − γ ∈ W⊥ and
Eβ ∈ W, || β − γ ||2 =|| Eβ ||2 + || β − Eβ − γ ||2 ≥|| β − (β − Eβ ||2
• with strict inequality whenγ ̸= β − Eβ. Therefore, β − Eβ is the best
approximation to β by vectors in W⊥ .
24 / 24