Vectors in Machine Learning
1. Vector:
A vector is a mathematical object that encodes a length (magnitude) and direction.
Formally, vectors are elements of a vector space: a collection of objects closed under
vector addition and scalar multiplication.
Representation: Vectors are typically represented as a 1D array of numbers
(components):
o Row vector: 𝐯 = [𝑣1 , 𝑣2 , … , 𝑣𝑛 ] ∈ ℝ𝑛 , where 𝑣𝑖 ∈ ℝ.
𝑣1
𝑣2
o Column vector: 𝐯 = [ ⋮ ] ∈ ℝ𝑛 .
𝑣𝑛
Geometrically, vectors represent coordinates in an 𝑛-dimensional space (e.g., arrows
from the origin in 2D or 3D).
Simple geometric view: An arrow with origin, direction, and magnitude (length ∥ 𝐯 ∥=
√𝑣12 + ⋯ + 𝑣𝑛2 ).
o One-dimensional arrays (row or column vectors) in n-dimensional space (ℝⁿ).
o Components are real numbers.
Examples in ℝ𝑛 :
o For 𝑛 = 2 (ℝ2 = ℝ × ℝ): 𝐯 = (1,2). – components along x and y axes.
o For 𝑛 = 3: 𝐯 = (1,2,3) – components in x, y, z directions.
2. Vector Algebra
Addition: Component-wise for vectors of the same dimension.
𝐮 + 𝐯 = (𝑢1 + 𝑣1 , 𝑢2 + 𝑣2 , … , 𝑢𝑛 + 𝑣𝑛 ).
o Example in ℝ2 : Let 𝐮 = (1,3), 𝐯 = (1,1). Then 𝐮 + 𝐯 = (2,4).
o Geometrically: Parallelogram rule (head-to-tail).
Subtraction:
o Example: v₁ = (1, 3), v₂ = (1, 1) → v₁ - v₂ = (0, 2).
Scalar multiplication: 𝑐𝐯 = (𝑐𝑣1 , 𝑐𝑣2 , … , 𝑐𝑣𝑛 ), where 𝑐 ∈ ℝ.
o Example: 2𝐮 = (2,6).
Dot product (inner product): : Scalar from component-wise multiplication and
summation.
𝐮 ⋅ 𝐯 = 𝑢1 𝑣1 + 𝑢2 𝑣2 + ⋯ + 𝑢𝑛 𝑣𝑛 = ∑𝑛𝑖=1 𝑢𝑖 𝑣𝑖 .
o Measures alignment (0 if perpendicular).
o Example in ℝ𝟑 :
Let 𝐮 = (1,1, −1), 𝐯 = (2,3,1).
Then 𝐮 ⋅ 𝐯 = 1 ⋅ 2 + 1 ⋅ 3 + (−1) ⋅ 1 = 2 + 3 − 1 = 4.
Magnitude (Norm): ∥ 𝐯 ∥= √𝐯 ⋅ 𝐯.
o Example: v = (1, -1, 2) → ||v|| = √(1 + 1 + 4) = √6.
Angle Between Vectors: cos θ = (v₁ · v₂) / (||v₁|| · ||v₂||).
3. Linear Combinations
Given a set 𝑆 = {𝐯1 , 𝐯2 , … , 𝐯𝑘 }, a linear combination is: 𝐯 = 𝛼1 𝐯1 + 𝛼2 𝐯2 + ⋯ + 𝛼𝑘 𝐯𝑘 , where
𝛼1 , … , 𝛼𝑘 ∈ ℝ are scalars.
Example:
Let 𝐯1 = [1,1, −1], 𝐯2 = [0,1, −1], 𝐯3 = [0,0,1].
Then 𝛼1 𝐯1 + 𝛼2 𝐯2 + 𝛼3 𝐯3 = 𝛼1 [1,1, −1] + 𝛼2 [0,1, −1] + 𝛼3 [0,0,1]
= [𝛼1 , 𝛼1 + 𝛼2 , −𝛼1 − 𝛼2 + 𝛼3 ].
4. Linear Independence
A set 𝑆 = {𝐯1 , … , 𝐯𝑘 }is linearly independent (LI) if the equation 𝛼1 𝐯1 + ⋯ + 𝛼𝑘 𝐯𝑘 = 𝟎 holds
only when 𝛼1 = 𝛼2 = ⋯ = 𝛼𝑘 = 0. Otherwise, it's linearly dependent (LD) (some non-trivial
combination gives zero).
OR
Linearly Independent (LI): Only trivial solution (all αᵢ = 0) to α₁v₁ + ... + αₖvₖ = 0.
Linearly Dependent (LD): Non-trivial solution exists.
Example in ℝ2 : 𝑆 = {(1,0), (1,1)}. Solve 𝛼1 (1,0) + 𝛼2 (1,1) = (0,0): 𝛼1 + 𝛼2 = 0, 𝛼2 = [Link]
𝛼2 = 0, 𝛼1 = 0. Thus, LI.
Example in ℝ2 : 𝑆1 = {(1,1), (3,3)}. −3(1,1) + 1(3,3) = (−3 + 3, −3 + 3) = (0,0).Non-zero
scalars, so LD.
Examples in ℝ²:
o LI: {(1, 1), (1, -1)} – only zero coefficients work.
o LD: {(1, 1), (3, 3)} – 3(1, 1) - (3, 3) = (0, 0).
Example in ℝ³: S={(1, -1, 0), (1, 0, 1), (0, 1, 1)} is LD,
Non-trivial solution: Let 𝛼1 = 1, then 𝛼2 = −1, 𝛼3 = 1.
Verify:
1 ⋅ (1, −1,0) − 1 ⋅ (1,0,1) + 1 ⋅ (0,1,1) = (1 − 1 + 0, −1 − 0 + 1,0 − 1 + 1) = (0,0,0).
Thus, LD (not LI).
Remarks:
In ℝ𝑛 , any set with more than 𝑛vectors is LD.
Any set containing the zero vector is LD.
LD sets contain redundant vectors (one is a combination of others).
6. Orthogonal and Orthonormal Vectors
A set {𝐯1 , … , 𝐯𝑛 }is mutually orthogonal (pairwise) if 𝐯𝑖 ⋅ 𝐯𝑗 = 0for all 𝑖 ≠ 𝑗.
It's orthonormal if orthogonal and∥ 𝐯𝑖 ∥= 1 for each 𝑖.
Example (Orthogonal in ℝ3 ): 𝑆 = {[1,0, −1], (1, √2, 1), [1, −√2, 1]}.
(1,0, −1) ⋅ (1, √2, 1) = 1 + 0 − 1 = 0.
(1,0, −1) ⋅ (1, −√2, 1) = 1 + 0 − 1 = 0.
(1, √2, 1) ⋅ (1, −√2, 1) = 1 − 2 + 1 = 0. All pairs orthogonal, so orthogonal set. (Note:
Your notes had a typo in the last dot product, listing both as (1, √2, 1), which would be 4
≠ 0.)
1 , 1 1 1
Example (Orthonormal in ℝ2 ): Your notes have {[ ],[ ,− ]}.
√2 √2 √2 √2
1 1 1 1 1 1
Dot product: ⋅ + ⋅ (− ) = 2 − 2 = 0.
√2 √2 √2 √2
1 2 1 2 1 1
Each length: √( ) + ( ) = √2 + 2 = 1. Thus, orthonormal.
√2 √2
Remark: Any orthogonal set (with no zero vectors) is LI (non-zero dot products prevent trivial
combinations equaling zero).
Feature Vectors in Machine Learning
Concept: Data points as vectors.
Example: Employee data – [height, weight] for each person; ID might be excluded for
numerical features.
MATLAB Implementation
Vector Definition: v = [1, -1, 2]; w = [2, 5, 2]; (row vectors by default; use v = [1; -1; 2] for
column vectors if needed).
Operations:
o Addition: v + w
o Subtraction: v - w
o Scalar Multiply: 3 * v
o Norm: norm(v)
o Dot Product: dot(v, w) (or v * w.' for row vectors)
Basics of Matrix Algebra
Definition:
A matrix is a rectangular array of scalars (real numbers), denoted as an 𝑚 × 𝑛matrix
with 𝑚rows and 𝑛columns.
Elements: 𝑎𝑖𝑗 where 𝑖= row index, 𝑗= column index (1-based indexing).
Example: (2×3 matrix)
1 2 3
𝐴=[ ]
4 5 6
2. Types of Matrices
Diagonal Matrix: Off-diagonal elements = 0.
3 0
Example: [ ]
0 5
Zero Matrix: All elements = 0 (denoted 𝑂).
Upper Triangular: Elements below main diagonal = 0.
Lower Triangular: Elements above main diagonal = 0.
Identity Matrix (𝐼𝑛 ): Square matrix with 1s on diagonal, 0s elsewhere.
1 0
Example: 𝐼2 = [ ](multiplicative identity).
0 1
3. Matrix Equality and Basic Operations
Equality: Same dimensions and corresponding elements equal (𝐴 = 𝐵 iff 𝑎𝑖𝑗 = 𝑏𝑖𝑗 for all
𝑖, 𝑗).
Addition/Subtraction: Only for same-sized matrices; element-wise.
o Commutative: 𝐴 + 𝐵 = 𝐵 + 𝐴
𝑎 + 𝑏11 ⋯
o Associative: (𝐴 + 𝐵) + 𝐶 = 𝐴 + (𝐵 + 𝐶)Example: 𝐴 + 𝐵 = [ 11 ]
⋮ ⋱
Scalar Multiplication: Multiply each element by scalar 𝛼: 𝛼𝐴 = [𝛼𝑎𝑖𝑗 ].
o Distributive: 𝛼(𝐴 + 𝐵) = 𝛼𝐴 + 𝛼𝐵; (𝛼 + 𝛽)𝐴 = 𝛼𝐴 + 𝛽𝐴.
4. Matrix Multiplication
Condition: Columns of first matrix = rows of second (𝑚 × 𝑛 × 𝑛 × 𝑝→ 𝑚 × 𝑝).
Definition: Entry 𝑐𝑖𝑗 = ∑𝑛𝑘=1 𝑎𝑖𝑘 𝑏𝑘𝑗 (row 𝑖of A dot column 𝑗of B).
Properties:
o Not Commutative: 𝐴𝐵 ≠ 𝐵𝐴(in general).
o Associative: (𝐴𝐵)𝐶 = 𝐴(𝐵𝐶).
o Distributive: 𝐴(𝐵 + 𝐶) = 𝐴𝐵 + 𝐴𝐶; (𝐴 + 𝐵)𝐶 = 𝐴𝐶 + 𝐵𝐶.
Example (2×2):
1 2 5 6 19 22
[ ][ ]=[ ]
3 4 7 8 43 50
5. Transpose of a Matrix
Definition: 𝐴𝑇 swaps rows and columns: ( 𝐴𝑇 )𝑖𝑗 = 𝑎𝑗𝑖 .
Properties:
o (𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵 𝑇
o ( 𝐴𝐵)𝑇 = 𝐵 𝑇 𝐴𝑇
o ( 𝑘𝐴)𝑇 = 𝑘𝐴𝑇 (scalar 𝑘)
o ( 𝐴𝑇 )𝑇 = 𝐴
1 2 𝑇 1 3
Example: For 𝐴 = [ ], 𝐴 = [ ].
3 4 2 4
6. Determinant and Inverse
Determinant (for square matrices only): Measures "volume scaling" or singularity.
𝑎 𝑏
o 2×2: det [ ] = 𝑎𝑑 − 𝑏𝑐.
𝑐 𝑑
o If det(𝐴) = 0, matrix is singular (no inverse).
Inverse (𝐴−1 ): Exists if det(𝐴) ≠ 0(invertible/nonsingular). Satisfies 𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼.
1 𝑑 −𝑏
o For 2×2: 𝐴−1 = det(𝐴) [ ].
−𝑐 𝑎
Properties:
o ( 𝐴𝐵)−1 = 𝐵 −1 𝐴−1
o ( 𝐴−1 )−1 = 𝐴
o (𝐴𝑇 )−1 = (𝐴−1 )𝑇
1
o ( 𝑘𝐴)−1 = 𝑘 𝐴−1 (if 𝑘 ≠ 0).
7. MATLAB Implementation
Create: A = [1 2; 3 4]; (row vectors separated by spaces, columns by semicolons).
Operations:
o Addition: A + B (element-wise; no separate add function needed)
o Multiplication: A * B (matrix multiplication)
o Transpose: A'
o Determinant: det(A)
o Inverse: inv(A) (throws error if singular, e.g., MATLAB:SingularMatrix)
Error Handling: Incompatible shapes or singular matrices throw exceptions (e.g.,
dimension mismatch or singular matrix warnings/errors).
Vector Space - Definition and Examples
1. Group
A group is a non-empty set G together with a binary operation * (e.g. addition, multiplication)
such that the following four properties are satisfied:
1. Closure: ∀ a, b ∈ G, a * b ∈ G
2. Associativity: ∀ a, b, c ∈ G, a * (b * c) = (a * b) * c
3. Identity element: ∃ e ∈ G such that ∀ a ∈ G, a * e = e * a = a
4. Inverse element: ∀ a ∈ G, ∃ a⁻¹ ∈ G such that a * a⁻¹ = a⁻¹ * a = e
Abelian group
An abelian group is a group (G, *) on which the binary operation is commutative i.e.;
a * b = b * a ∀ a, b ∈ G
∀v∈V
⇒ Every abelian group is a group, but not every group is abelian.
Field
A field is a non-empty set F together with two binary operations + and multiplication (·),
satisfying the following conditions:
1. (F, +) is an abelian group with identity element 0
2. (F - {0}, ·) is an abelian group with identity element 1
3. Distributive law holds ∀ a, b, c ∈ F:
a · (b + c) = a · b + a · c
or
a · (b · c) = a · b + a · c
Example:
(Z, +) is an abelian group
(Q, +, ·), (R, +, ·) and (C, +, ·) are fields
2. Definition of a Vector Space
A vector space 𝑉 over a field ℝ(real numbers) is a set of vectors equipped with two
operations:
o Vector addition (denoted +): Combines two vectors to produce another vector
in 𝑉.
A map f: V × V → V is called internal binary operation in V.
o Scalar multiplication (denoted ⋅): Multiplies a scalar from ℝby a vector to
produce another vector in 𝑉.
A map g: F × V → V is called an external binary operation in V (F)
These operations must satisfy six axioms (properties) for 𝑉to be a vector space. (Note:
Generalizable to any field, but ML typically uses ℝ.)
3. The Six Axioms / Properties
The axioms ensure the structure behaves like familiar vector operations. Structured as group
properties for addition + compatibility with scalars.
Axiom Description Mathematical Form
𝑉 with +forms an abelian
(commutative) group: closure, 𝐮+𝐯=𝐯+𝐮
1. Abelian Group commutativity, associativity, (𝐮 + 𝐯) + 𝐰 = 𝐮 + (𝐯 + 𝐰)
under Addition additive identity (zero vector 𝟎), 𝐮+𝟎=𝐮
and additive inverses (for every 𝐮 + (−𝐮) = 𝟎
𝐯, −𝐯 exists).
2. Closure under Scalar 𝑎 ∈ ℝ, vector 𝐯 ∈ 𝑉:
𝑎⋅𝐯∈𝑉
Scalar Multiplication result stays in 𝑉.
3. Distributivity
Scalar distributes over vector
(Scalar over Vector 𝑎 ⋅ (𝐮 + 𝐯) = 𝑎 ⋅ 𝐮 + 𝑎 ⋅ 𝐯
sum.
Addition)
Axiom Description Mathematical Form
4. Distributivity
Vector scaled by sum of scalars
(Vector over Scalar (𝑎 + 𝑏) ⋅ 𝐯 = 𝑎 ⋅ 𝐯 + 𝑏 ⋅ 𝐯
equals sum of scaled vectors.
Addition)
5. Associativity of Scalar multiplication is
𝑎 ⋅ (𝑏 ⋅ 𝐯) = (𝑎𝑏) ⋅ 𝐯
Scalar Multiplication associative.
6. Multiplicative The scalar 1 leaves vectors
1⋅𝐯=𝐯
Identity (Unitary Law) unchanged.
Definition of Vector Space
Let (F, +, ·) be a field. Then, a non-empty set V together with two binary operations - vector
addition (internal) and scalar multiplication (external) is called a vector space over the field F if
the following properties are satisfied:
(A) (V, +) is an abelian group.
(i) u+v ∈ V ∀ u, v ∈ V (vector addition)
(ii) (u + w) + v = u + (v + w) ∀ u, v, w ∈ V
(iii) u+0=0+u=u∀u∈V
(iv) (v) u + (-u) = (-u) + u = 0 ∀ u ∈ V
(v) u + v = v + u ∀ u, v ∈ V
(B) Scalar Multiplication
Any u ∈ V and a ∈ F ⇒ a u ∈ V. [Scalar Multiplication]
These laws must satisfy the following conditions:
(i) Multiplication by scalar is distributive over vector addition:
Mathematically, a (u + v) = a u + a v ∀ a ∈ F, u, v ∈ V
(ii) Multiplication by vector is distributive over scalar addition:
Mathematically, (a + b) u = a u + b u, ∀ a, b ∈ F, u ∈ V
(iii) (a · b) u = a · (b · u), ∀ a, b ∈ F, u ∈ V
(iv) 1 u = u is true if 1 is a scalar and u is a vector.
Remarks:
1. The elements of V are called vectors whereas the elements of F are called scalars.
2. In dealing with vector spaces there are two types of zero elements:
(i) 0 of the field, or
(ii) 0 of the vector space. 0_V ∈ V if and only if 0_V = 0 · v for some v ∈ V.
Examples of Vector Spaces
If F = ℝ then V(ℝ) is called a real vector space.
If F = ℂ then V(ℂ) is called a complex vector space.
The space {0} is called a null space or zero space.
ℝ(ℝ), ℂ(ℂ), ℂ(ℝ), ℂn(ℂ) are all vector spaces under usual addition and multiplication.
ℝ𝑛 (Euclidean Space):
o Vectors: Ordered 𝑛-tuples (e.g., ℝ2 : points (𝑥, 𝑦)).
o Addition: Component-wise (e.g., (𝑎, 𝑏) + (𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 + 𝑑)).
o Scalar Multiplication: Component-wise (e.g., 𝑘 ⋅ (𝑎, 𝑏) = (𝑘𝑎, 𝑘𝑏)).
o All axioms hold (e.g., closure: sums/products stay in ℝ𝑛 ; identity: (0,0, … ,0)).
o Relevance to ML: Represents feature vectors (e.g., 𝑛 = 1000 for high-
dimensional data).
Set of 𝑚 × 𝑛 Matrices:
o Vectors: Matrices over ℝ.
o Operations: Matrix addition and scalar multiplication (standard).
o Forms a vector space (closure under entry-wise ops).
Polynomials over ℝ:
o All polynomials in variable 𝑥with real coefficients (e.g., 3𝑥 2 + 2𝑥 − 1).
o Or polynomials of degree ≤ 𝑛(finite-dimensional subspace).
o Operations: Polynomial addition and scalar multiplication.
o Axioms hold (e.g., sum of degree-𝑛 polys is degree ≤ 𝑛).
Convergent Sequences:
o Set of all convergent real sequences (e.g., (𝑎1 , 𝑎2 , … )where lim 𝑎𝑖 exists).
o Operations: Component-wise addition/multiplication.
o Forms a vector space.
4. Non-Examples (Sets That Fail to Be Vector Spaces)
ℝ(ℂ) is not a vector space under usual addition and multiplication.
If a ∈ ℂ, u, v ∈ ℝ
Scalar Multiplication: a u ∈ ℂ
But a ∈ F, u ∈ V ⇒ a u ∈ V
⇒ Scalar multiplication a u ∈ ℂ fails since a u ∉ ℝ
e.g. (2 + 3i) ∈ ℂ, 5 ∈ ℝ, (2 + 3i)(5) ∉ ℝ
Polynomials of Exact Degree 𝑛(e.g., degree exactly 3):
o Fails closure under addition: Sum of two degree-3 polys can be degree < 3 (e.g.,
𝑥 3 + 𝑥 2 + 𝑥 + 7 + (−𝑥 3 + 3𝑥 2 + 4𝑥 + 1) = 4𝑥 2 + 5𝑥 + 8, degree 2).
Modified Operations on ℝ2 :
o Custom addition: (𝑥1 , 𝑦1 ) + (𝑥2 , 𝑦2 ) = (𝑥1 + 𝑥2 , 𝑦1 + 2𝑦2 ).
o Fails commutativity: (𝑥1 , 𝑦1 ) + (𝑥2 , 𝑦2 ) = (𝑥1 + 𝑥2 , 𝑦1 + 2𝑦2 ) ≠ (𝑥2 + 𝑥1 , 𝑦2 +
2𝑦1 ) = (𝑥2 , 𝑦2 ) + (𝑥1 , 𝑦1 )(unless 𝑦1 = 𝑦2 ).
o Lesson: Structure depends on both set and operations.
5. Geometric Interpretation (for Low Dimensions)
ℝ1 (Real Line): Vectors as arrows from origin (0) to point 𝑥(e.g., 𝑣 = 5: arrow to 5; 𝑣 =
−2: arrow left to -2).
ℝ2 (XY-Plane): Vectors as arrows from origin to (x,y).
ℝ3 (XYZ-Space): Arrows from origin to (x,y,z).
Vector Addition: Parallelogram rule (diagonal = sum; e.g., tails at origin, heads form
parallelogram).
Linear Independence: Vectors 𝐯𝟏 , 𝐯𝟐 , 𝐯𝟑 independent if 𝐯𝟑 not in plane spanned by 𝐯𝟏 , 𝐯𝟐 .
Higher Dimensions: Abstract (hard to visualize for 𝑛 > 3), but generalizes; common in
ML for datasets with many features (e.g., 𝑛 = 2000).
Vector Subspaces - Examples and Properties
Definition of a Subspace
non-empty subset W of a vector space 𝑉(e.g., ℝ𝑁 ) is a subspace if:
1. It contains the zero vector 𝟎.
2. It is closed under vector addition: If 𝐱, 𝐲 ∈ 𝑊, then 𝐱 + 𝐲 ∈ 𝑊.
3. It is closed under scalar multiplication: If 𝐱 ∈ 𝑊and 𝛼 ∈ ℝ, then 𝛼𝐱 ∈ 𝑊.
These three conditions ensure 𝑊 satisfies all vector space axioms.
Theorem: A subset 𝑊 ⊆ ℝ𝑁 is a subspace iff it contains 𝟎 and is closed under addition and
scalar multiplication.
Properties of Subspaces
Intersection: The intersection of any number of subspaces is always a subspace.
o Example: For 𝑊1 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∣ 𝑥1 + 𝑥2 − 𝑥3 = 0 }and 𝑊2 =
{ (𝑥1 , 𝑥2 , 𝑥3 ) ∣ 𝑥1 = 𝑥2 = 𝑥3 }, 𝑊1 ∩ 𝑊2 = {(0,0,0)}, which is a subspace.
Union: The union of two subspaces is a subspace only if one is contained in the other
(generally not a subspace).
Examples of Subspaces
1. Trivial Subspaces:
o Zero subspace: {𝟎}.
o Whole space: ℝ𝑁 .
2. Symmetric Matrices:
o Set of all 3×3 symmetric matrices form a subspace of 𝑀3×3 (ℝ).
3. Linear Constraints:
o 𝑊 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 + 𝑥2 − 𝑥3 = 0 }.
Contains zero: Yes.
Closed under addition and scalar multiplication: Verified by substitution.
4. Equal Components:
o 𝑊 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 = 𝑥2 = 𝑥3 } = { (𝛼, 𝛼, 𝛼) ∣ 𝛼 ∈ ℝ }.
Contains zero, closed under operations.
5. Non-Example:
o 𝑊 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 + 𝑥2 + 𝑥3 = 1 }.
Fails: Zero vector gives 0 ≠ 1.
Geometric Interpretation in ℝ𝟑
Subspaces are:
A point: Origin { (0,0,0) }.
A line through the origin.
A plane through the origin.
The entire space ℝ3 .
Linear Span
Definition: For a set 𝑆 = {𝐯1 , … , 𝐯𝑛 } ⊆ 𝑉,
span(𝑆) = { 𝑐1 𝐯1 + ⋯ + 𝑐𝑛 𝐯𝑛 ∣ 𝑐𝑖 ∈ ℝ }.
Properties:
o Always a subspace.
o Smallest subspace containing 𝑆.
o If 𝑆 is a subspace, span(𝑆) = 𝑆.
o L(∅) = {0} is called zero space.
Examples:
o 𝑆 = {(1,0), (0,1)} ⊆ ℝ2 : span(𝑆) = ℝ2 .
o 𝑆 = {(1,2,0), (1,1, −1)} ⊆ ℝ3 : Forms a plane.
Elementary Properties of Vector Spaces
1. a · 0 = 0 for all a ∈ F and all 0 ∈ V.
2. 0 · u = 0 for all u ∈ V and all 0 ∈ F.
3. a (u + w) = a u + a w for all a ∈ F and all u, w ∈ V.
4. a (u - w) = a u - a w for all a ∈ F and all u, w ∈ V.
Here V(F) is a vector space, u, w ∈ V, a ∈ F.
Important Subspaces in Linear Algebra
For an 𝑚 × 𝑛matrix 𝐴:
Row Space: Span of rows.
Column Space (Range): Span of columns.
Null Space (Kernel): {𝐱 ∈ ℝ𝑛 ∣ 𝐴𝐱 = 𝟎}.
These are crucial in applications like machine learning (e.g., PCA for dimensionality reduction).
Basis and Dimension
Definition of Basis
A basis for a vector space 𝑉over a field 𝐹is a set of vectors that satisfies two conditions:
Linear Independence: No vector in the set can be expressed as a linear combination of
the others.
Spanning: Every vector in 𝑉can be uniquely written as a linear combination of the basis
vectors.
Formally: Let {𝑣1 , 𝑣2 , … , 𝑣𝑛 } ⊆ 𝑉. It is a basis if:
1. The vectors are linearly independent: 𝛼1 𝑣1 + ⋯ + 𝛼𝑛 𝑣𝑛 = 0implies 𝛼𝑖 = 0for all 𝑖.
2. It spans 𝑉: For any 𝑣 ∈ 𝑉, there exist 𝛼𝑖 ∈ 𝐹such that 𝑣 = 𝛼1 𝑣1 + ⋯ + 𝛼𝑛 𝑣𝑛 .
A vector space is finite-dimensional if it has a finite basis; otherwise, it is infinite-
dimensional.
Examples of Bases
ℝ2 over ℝ: Standard basis {(1,0), (0,1)}.
ℝ3 over ℝ: Standard basis {(1,0,0), (0,1,0), (0,0,1)}.
ℝ𝑛 over ℝ: Vectors with a 1 in one position and 0s elsewhere (n vectors).
1 0
Space of all 2 × 2real matrices: Basis consists of 4 matrices (e.g., 𝐸11 = ( ), etc.).
0 0
1 0
Space of symmetric 2 × 2matrices: Basis has 3 matrices due to symmetry (e.g., ( ),
0 0
0 1 0 0
( ), ( )).
1 0 0 1
Dimension of a Vector Space
The dimension of 𝑉, denoted dim(𝑉), is the number of vectors in any basis for 𝑉. Examples:
dim(ℝ𝑛 ) = 𝑛
Space of 𝑚 × 𝑛real matrices: dim =𝑚×𝑛
Space of polynomials of degree ≤ 𝑛: dim = 𝑛 + 1(basis: {1, 𝑥, 𝑥 2 , … , 𝑥 𝑛 })
Space of all polynomials: Infinite-dimensional.
Key Theorems and Properties
1. Linear Dependence in Finite Dimensions: In an n-dimensional space, any set of more
than n vectors is linearly dependent. A basis is a maximal linearly independent set.
2. Basis Extension: Any linearly independent set of k < n vectors in an n-dimensional space
can be extended to a full basis by adding n - k more vectors.
3. Uniqueness of Dimension: All bases of a finite-dimensional vector space have the same
cardinality (equal to the dimension).
4. Dimension Formula for Subspaces: For subspaces 𝑆1 , 𝑆2 ⊆ 𝑉,
dim(𝑆1 ) + dim(𝑆2 ) = dim(𝑆1 + 𝑆2 ) + dim(𝑆1 ∩ 𝑆2 )
(Where 𝑆1 + 𝑆2 = { 𝑠1 + 𝑠2 ∣ 𝑠1 ∈ 𝑆1 , 𝑠2 ∈ 𝑆2 }.)
Examples of Basis and Dimension Calculations
Example 1: Subspace in ℝ𝟑
Subspace S: 𝑥1 + 𝑥2 − 𝑥3 = 0
o Solve: Let 𝑥1 = 𝑡, 𝑥2 = 𝑠; then 𝑥3 = 𝑡 + 𝑠.
o Basis: {(1,0,1), (0,1,1)}
o Dimension: 2
Subspace W: 𝑥1 = 𝑥2 = 𝑥3
o Basis: {(1,1,1)}
o Dimension: 1
Intersection 𝑆 ∩ 𝑊: Only the zero vector (no non-trivial solution satisfies both).
o Basis: Empty (trivial subspace)
o Dimension: 0
Example 2: Subspaces in ℝ𝟒
S₁: Defined by 𝑥1 + 𝑥2 − 𝑥3 + 𝑥4 = 0and 𝑥1 + 𝑥2 + 𝑥3 + 𝑥4 = 0
o Adding equations: 2𝑥1 + 2𝑥2 = 0→ 𝑥1 = −𝑥2.
o Let 𝑥2 = 𝑡, 𝑥3 = 𝑠; then 𝑥1 = −𝑡, 𝑥4 = 𝑡 + 𝑠.
o Basis: {(−1,1,0,1), (0,0,1,1)}(or equivalent)
o Dimension: 2
S₂ (similar constraints, e.g., other equations leading to dim 2).
S₁ ∩ S₂: Trivial intersection.
o Dimension: 0
Linear Transformations
1. Introduction and Applications
Linear transformations are a fundamental tool in linear algebra, described as the "most
useful concept" for machine learning (ML) and computer science (CS).
Key Applications:
o ML and Data Projection: Used to map data from non-linearly separable spaces to
linearly separable ones (e.g., for linear SVM or perceptron classifiers).
o Calculus: Represent derivatives (e.g., from ℝ³ → ℝ³).
o Graphics/Animation: Enable 2D/3D modeling by changing object shapes, scaling,
rotating, or shifting viewpoints via geometric transformations.
o Other Fields: Common in electrical engineering, electronics, and abstract math.
3. Definition:
Let V and W be vector spaces over a field F, with dim(V) = n and dim(W) = m. A linear
transformation T: V → W is a mapping satisfying:
1. Additivity: T(v₁ + v₂) = T(v₁) + T(v₂) for all v₁, v₂ ∈ V.
2. Homogeneity: T(αv) = α T(v) for all v ∈ V and scalar α ∈ F.
Note:
1. A linear transformation is also known as Linear map or linear mapping.
2. These conditions ensure T preserves vector addition and scalar multiplication.
3. A Liner transformation from a vector space to underlying field is called a linear
functional.
Examples:
Example 1: T: ℝ² → ℝ² defined by T(x₁, x₂) = (x₁, x₁ + x₂).
o Verification:
Additivity:
T((x₁, x₂) + (y₁, y₂)) = T(x₁ + y₁, x₂ + y₂)
= (x₁ + y₁, (x₁ + y₁) + (x₂ + y₂))
= (x₁, x₁ + x₂) + (y₁, y₁ + y₂) = T(x₁, x₂) + T(y₁, y₂).
Homogeneity:
T(α(x₁, x₂)) = T(αx₁, αx₂) = (αx₁, αx₁ + αx₂) = α (x₁, x₁ + x₂) = α T(x₁, x₂).
o Therefore, T is linear.
Example 2: T: ℝ³ → ℝ³ defined by T(x₁, x₂, x₃) = (x₂, x₁, 0).
o Brief mention; verification follows similar steps (not detailed in lecture).
4. Geometric Interpretations
Overview: Linear transformations alter vectors by scaling, rotating, shearing, or
projecting shapes (e.g., a unit square in ℝ²).
Examples in ℝ² (using unit square with vertices (0,0), (1,0), (1,1), (0,1)):
o Uniform Scaling: T(x₁, x₂) = (2x₁, 2x₂) → Square scales to side length 2.
o Non-Uniform Scaling (Shear): T(x₁, x₂) = (x₁, 2x₂) → Rectangle (width 1, height 2).
o Projection: T(x₁, x₂) = (x₁, 0) → Collapses to line segment on x₁-axis [(0,0) to (1,0)].
o Rotation by θ (e.g., 45°): T(x₁, x₂) = (x₁ cos θ - x₂ sin θ, x₁ sin θ + x₂ cos θ).
cos 𝜃 − sin 𝜃 𝑥1
Matrix form: ( )applied to (𝑥 ).
sin 𝜃 cos 𝜃 2
Extends to ℝ³ rotations.
Note: Transformations change shape/orientation while preserving linearity.
Remarks
Special Cases:
o T: V → V is a linear operator.
o T: V → F is a linear functional.
Non-Linear Examples (to identify failures):
o T: ℝ² → ℝ³, T(x₁, x₂) = (x₁ + x₂ + 1, 2x₁ - x₂, x₁ + x₂) → Fails T(0) = 0 (maps to
(1,0,0)).
o T: ℝ² → ℝ², T(x₁, x₂) = (x₁ - x₂, 2|x₁| - x₂) → Fails additivity (e.g., T(1,1) + T(2,2) ≠
T(3,3)).
o T: M₂×₂ → M₂×₂, T(A) = A + I (identity) → Fails T(0) = 0.
o Linear ones: T(x₁, x₂) = (x₁ - x₂, 2x₁ - x₂); matrix multiplications.
5. Matrix Representation of Linear Transformations
Every linear transformation corresponds to a matrix (and vice versa), relative to chosen
bases.
Suppose V(F) and W(F) are finite dimensional vector spaces with bases {v₁, ..., vₙ} and {w₁, ...,
wₘ} respectively. Let T: V → W is be a linear map. Then the matrix of T with elements 𝑎𝑖𝑗 ∈
𝐹 is defined by
T(vⱼ) = 𝑎1𝑗 𝑤1 + 𝑎2𝑗 𝑤2 + 𝑎3𝑗 𝑤3 + ⋯ + 𝑎𝑚𝑗 𝑤𝑚 for j=1 to n.
or
T(vⱼ) = ∑𝑚
𝑖=1 𝑎𝑖𝑗 𝑤𝑖 for j=1 to n.
It shows that the jth column of matrix A of linear map T consists of coordinate of T(vⱼ) in the
chosen bases of W.
Conversely, every matrix 𝐴 ∈ ℝ𝑚×𝑛 induce a linear transformation 𝑇: ℝ𝑛 → ℝ𝑚 given by
T(v) = Av.
Example: Find the matrix representation of a linear transformation T: ℝ³ → ℝ² defined
as T(x, y, z) = (x − y + 2z, y − 3z)
with respect to non-standard (ordered) bases.
Basis B for the domain ℝ³ (ordered basis B = {v₁, v₂, v₃}): v₁ = (1, 1, 0)ᵀ v₂ = (1,
0, 1)ᵀ v₃ = (0, 1, 1)ᵀ
Basis C for the codomain ℝ² (ordered basis C = {w₁, w₂}): w₁ = (2, 1)ᵀ w₂ = (1, 1)ᵀ
T=[-1 6 3; 2 -9 -5];
Example: T: ℝ² → ℝ², T(x₁, x₂) = (2x₁ - 7x₂, 4x₁ + 3x₂); Find matrix representation of T
relative to basis B = {(1,3), (2,5)}.
T(1,3) = (-19,13) = a11(1,3) + a12 (2,5)
T(2,5) = (-31,23) = a21 (1,3) + a22 (2,5)
Therefore, a11=121, a12=-70, a21=201, a22=-116
121 201
Matrix A = ( ).
−70 −116
2 1
Reverse: For matrix A = ( ), T(x) = A x = (2x₁ + x₂, 4x₁ + 3x₂) (standard basis).
4 3
Dimension Note: T: ℝⁿ → ℝᵐ → m × n matrix.
6. Kernel (Null Space) and Range (Image)
Let T: V → W be a linear map. Then the Nullspace of T as defined as
null(T)= {v ∈ V | T(v) = 0};
and range of T is defined as range(T)= {w ∈ W | ∃ v ∈ V s.t. T(v) = w}
Note:
o Null space of T is also called kernel of T denoted as ker(T).
o Null(T) is subspace of V
o Range(T) is subspace of W
o The Dimension of nullity(T) is called nullity of T.
o The Dimension of range(T) is called rank of T.
8. Rank-Nullity Theorem
Statement: For a linear map T: V → W, we have
dim(rank(T)) + dim(nullity(T)) = dim(V).
Example: determine range and nullspace of the linear map T: ℝ³ → ℝ⁴, defined by
T(x₁, x₂, x₃) = (x₁ - x₂ + x₃, x₂ - x₃, x₁, 2x₁ - 5x₂ + 5x₃).
o Range: Images of standard basis e₁=(1,0,0) → (1,0,1,2); e₂=(0,1,0) → (-1,1,0,-5);
e₃=(0,0,1) → (1,-1,0,5).
T(1,0,0) = (1,0,1,2);
T(0,1,0) = (-1,1,0,-5);
T (0,0,1) = (1,-1,0,5).
Range(T)= L{(1,0,1,2), (-1,1,0,-5)}
o Span by linearly independent vectors (1,0,1,2) and (-1,1,0,-5); rank(T) = 2.
o Kernel: Solve x| T(x) = 0 → Equations: x₁ - x₂ + x₃ = 0; x₂ - x₃ = 0; x₁ = 0; 2x₁ - 5x₂ +
5x₃ = 0.
Simplifies to x₁ = 0, x₂ = x₃ (free variable x₃); basis vector (0,1,1);
nullity(T) = 1.
o Verification: rank + nullity = 2 + 1 = 3 = dim(ℝ³).
Eigenvalues and Eigenvectors
Geometric Example:
o Start with vector 𝐱 = [1,3]𝑇 .
1 2
o Apply matrix 𝐴 = [ ]: 𝐴𝐱 = [7,5]𝑇 (direction changes, magnitude scales).
2 1
o Seek "special" vectors where 𝐴𝐯only scales 𝐯(no direction change).
o Example: 𝐯 = [1,1]𝑇 , 𝐴𝐯 = [3,3]𝑇 = 3𝐯(eigenvector 𝐯, eigenvalue 𝜆 = 3).
Note: Eigenvectors are unchanged in direction; eigenvalues indicate scaling factor.
2. Mathematical Definition
Let 𝐴 ∈ ℝ𝑛×𝑛 , we say that a non-zero vector 𝑣 ∈ ℝ𝑛 is an eigenvector of 𝐴 corresponding to
eigenvalue 𝜆(a scalar) if
𝐴𝐯 = 𝜆𝐯
or
(𝐴 − 𝜆𝐼)𝐯 = 𝟎
The above relation tells us that when the matrix (linear transformation) 𝐴 is applied on
𝑣, then it scales 𝑣 by an amount 𝜆.
Note:
𝐯 must be nonzero; 𝜆 can be real or complex.
Equation implies 𝐴 − 𝜆𝐼 is singular for nontrivial solutions.
3. Computing Eigenvalues and Eigenvectors
Example: 3×3 Matrix
2 −2 3
𝐴 = [1 1 1]
1 3 −1
Step 1: Characteristic Equation for Eigenvalues
Set det(𝐴 − 𝜆𝐼) = 0 to get the characteristic polynomial (degree 𝑛).
Roots are eigenvalues 𝜆.
Characteristic equation: det(𝐴 − 𝜆𝐼) = 0 yields 𝜆3 + 𝜆2 − 5𝜆 + 3 = 0.
Eigenvalues: 𝜆1 = 3, 𝜆2 = 1, 𝜆3 = −2.
Step 2: Eigenvectors
For each 𝜆, solve (𝐴 − 𝜆𝐼)𝐯 = 𝟎.
For 𝜆 = 3: 𝐯1 = [1,1,1]𝑇 .
For 𝜆 = 1: 𝐯2 = [1, −1, −1]𝑇 .
For 𝜆 = −2: Solve row-reduced system to get 𝐯3 (e.g., proportional to [3, -1, 2]^T).
Note: Eigenvectors are defined up to scalar multiples; normalize if needed.
4. Algebraic Multiplicity
If 𝜆 repeats 𝑘 times in the characteristic polynomial, its algebraic multiplicity is 𝑘.
2 0 0
Example: 𝐴 = (0 3 −2), characteristic polynomial (𝜆 − 2)(𝜆 − 1)2 = 0.
0 1 0
𝜆 = 1 has multiplicity 2 (even if geometric multiplicity—number of independent
eigenvectors—differs).
5. Geometric Interpretation
Example 1: Diagonal Scaling
2 0
𝑇=[ ](stretches x-axis by 2, y-axis unchanged).
0 1
Eigenvalues: 𝜆1 = 2, 𝜆2 = 1.
Eigenvectors: [1, 0]𝑇 (x-direction), [0, 1]𝑇 (y-direction).
Insight: Reveals "principal axes" of transformation.
Example 2: Shear Transformation
2 1
𝑆=[ ](turns square into parallelogram).
0 1
Eigenvalues: 𝜆1 = 2, 𝜆2 = 1.
Eigenvectors: [1, 0]𝑇 (unchanged direction), along line 𝑦 = 𝑥for scaling.
For symmetric matrices: Eigenvectors are orthogonal.
Note: Eigen-decomposition simplifies understanding of how matrices distort space (stretch,
rotate, shear).
6. Properties of Eigenvalues and Eigenvectors
1
Inverse Matrix: If 𝐴𝐱 = 𝜆𝐱and 𝐴invertible, then 𝐴−1 𝐱 = 𝐱.
𝜆
Matrix Powers:𝐴𝑘 𝐱 = 𝜆𝑘 𝐱 (useful for iterations, e.g., Markov chains).
Trace:tr(𝐴) = ∑𝜆𝑖 (sum of eigenvalues).
Determinant:det(𝐴) = ∏𝜆𝑖 (product of eigenvalues).
Singular Matrices:𝜆 = 0 implies det(𝐴) = 0.
If A is n×n matrix and λ is a characteristic root of A. Then, kλ is a characteristic root of kA
(k being a scalar).
7. Python Implementation with NumPy (26:00 – 31:30)
Code Snippet:
% Define the matrix A (same as in the Python example)
A = [ 2, -2, 3;
1, 1, 1;
1, 3, -1 ];
% Compute eigenvalues and eigenvectors
[V, D] = eig(A);
% Display results
disp('Eigenvalues (diagonal entries of D):');
disp(diag(D)); % should show approximately [3; 1; -2]
disp(' ');
disp('Eigenvectors (columns of V):');
disp(V);
% Optional: normalized eigenvectors (already unit length in MATLAB by default)
disp(' ');
disp('Norm of each eigenvector column (should be ≈1):');
disp(vecnorm(V, 2, 1)); % column-wise 2-norms
Output Explanation:
Eigenvalues (diagonal entries of D):
-2
Eigenvectors (columns of V):
-0.5774 -0.7071 0.4082
-0.5774 0.0000 -0.8165
-0.5774 0.7071 0.4082
Norm of each eigenvector column (should be ≈1):
1.0000 1.0000 1.0000
Note:
Eigenvalues (𝜆): Scaling factors; roots of characteristic equation.
Eigenvectors (𝐯): Directions invariant under transformation.
Applications: Dimensionality reduction (PCA), stability in dynamical systems, graph
algorithms (PageRank).
Next Steps: Explore defective matrices (algebraic > geometric multiplicity) and
diagonalization.
Norms and Spaces
1. Metric Spaces (Generalize distance)
Definition: A metric 𝑑: 𝑆 × 𝑆 → ℝ on set 𝑆must satisfy:
1. Non-negativity: 𝑑(𝑥, 𝑦) ≥ 0, and 𝑑(𝑥, 𝑦) = 0iff 𝑥 = 𝑦.
2. Symmetry: 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥).
3. Triangle inequality: 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧).
(S, d) is called a metric space.
Example: On subset 𝑆 of reals, 𝑑(𝑥, 𝑦) =∣ 𝑥 − 𝑦 ∣ (satisfies all properties). (S, d) is a metric
space.
2. Normed Spaces
Definition: A norm on a vector space 𝑉 is a function
∥⋅∥: 𝑉 → ℝ such that:
1. Non-negativity: ∥ 𝑥 ∥≥ 0, and ∥ 𝑥 ∥= 0iff 𝑥 = 0.
2. Homogeneity: ∥ 𝛼𝑥 ∥=∣ 𝛼 ∣∥ 𝑥 ∥for scalar 𝛼.
3. Triangle inequality: ∥ 𝑥 + 𝑦 ∥≤∥ 𝑥 ∥ +∥ 𝑦 ∥.
(𝑉, ∥⋅∥) is called a normed space.
Relation to Metrics:
Induces metric 𝑑(𝑥, 𝑦) =∥ 𝑥 − 𝑦 ∥, so every normed space is a metric space (but not
vice versa).
Counterexample (Metric but Not Norm):
Discrete space 𝑆 = {0,1}with 𝑑(𝑥, 𝑦) = 1if 𝑥 ≠ 𝑦, else 0. It's a metric but lacks vector
space structure for norms.
Examples of Norms (on ℝ𝒏 )
Example Geometric Shape: Unit Ball
Norm Type Formula
for 𝑥 = (1, −2) Shape (2D)
ℓ1 (Manhattan) ||x||1 = ∑ 𝑥𝑖 |x|1=1+|-2|=3 A diamond-shaped square
|x|1√12 + (−2)2
ℓ2 (Euclidean) ∥ 𝑥 ∥2 = √∑𝑥𝑖2 Circle
= √5 ≈ 2.236
Example Geometric Shape: Unit Ball
Norm Type Formula
for 𝑥 = (1, −2) Shape (2D)
1
𝑛 𝑝 As p decreases, corners get sharper → more sparsity
ℓ𝑝 (General) ||x||p = (∑|𝑥𝑖 |𝑝 ) As p increases, shape becomes smoother → less
𝑖=1
sparsity
𝑓𝑜𝑟 𝑝 > 0)
||𝑥||∞
ℓ∞ (Max) ||𝑥||∞ = max{|𝑥𝑖 |} Square
= max{|1|, |−2|} = 2
5. Convexity
Convex Set: A set is said to be convex if the line joining two points of the set lies entirely in the
set. For 𝑆 ⊂ ℝ𝑛 , line segment between any 𝑥1 , 𝑥2 ∈ 𝑆 lies in 𝑆:
𝜆𝑥1 + (1 − 𝜆)𝑥2 ∈ 𝑆 for 𝜆 ∈ [0,1].
Examples:
o convex set: Rectangles, disks;
o non-convex : crescent shapes.
Convex Function: 𝑓: 𝑆 ⊂ ℝ𝑛 → ℝ is said to be convex if for 𝑥1 , 𝑥2 ∈ 𝑆 , we have
𝑓(𝜆𝑥1 + (1 − 𝜆)𝑥2 ) ≤ 𝜆𝑓(𝑥1 ) + (1 − 𝜆)𝑓(𝑥2 ), where 𝜆 ∈ [0,1].
o Geometric:
Norms and Convexity: All norms induce convex unit balls {𝑥: ∥ 𝑥 ∥≤ 1} via triangle
inequality.
6. Inner Product Spaces
Definition: An Inner product on real vector space V is a function
⟨⋅,⋅⟩: 𝑉 × 𝑉 → ℝ
satisfying:
1. Positive-definiteness: ⟨𝑥, 𝑥⟩ ≥ 0, and ⟨𝑥, 𝑥⟩ = 0 iff 𝑥 = 0.
2. Linearity (first arg.): ⟨𝛼𝑥 + 𝑦, 𝑧⟩ = 𝛼⟨𝑥, 𝑧⟩ + ⟨𝑦, 𝑧⟩.
3. Symmetry: ⟨𝑥, 𝑦⟩ = ⟨𝑦, 𝑥⟩.
A Vector space together with inner product is called inner product space.
Example
V= ℝ𝑛 (ℝ), Let 𝑋 = (𝑥1 , 𝑥2 , … , 𝑥𝑛 ), 𝑌 = (𝑦1 , 𝑦2 , … , 𝑦𝑛 ). Then standard inner product on
ℝ𝑛
⟨𝑋, 𝑌⟩ = ∑𝑥𝑖 𝑦𝑖 = 𝑋. 𝑌 =∥ 𝑥 ∥∥ 𝑦 ∥ cos 𝜃
⟨𝑥,𝑦⟩ 1
Angle: cos 𝜃 = ∥𝑥∥∥𝑦∥ (norm from ⟨𝑥, 𝑥⟩2 ).
V= ℝ2 : ⟨(𝑢1 , 𝑢2 ), (𝑣1 , 𝑣2 )⟩ = 𝑢1 𝑣1 − 𝑢1 𝑣2 − 𝑢2 𝑣1 + 𝑢2 𝑣2 (satisfies properties).
7. Special Case: 𝓵𝟎 Norm
Definition: ∥ 𝑥 ∥0 =number of non-zero entries.
o Example: 𝑥 = (1,2,0,0,3,0,0,4), ∥ 𝑥 ∥0 = 4 .
Not a True Norm: Fails homogeneity (e.g., ∥ 2𝑥 ∥0 =∥ 𝑥 ∥0 ≠ 2 ∥ 𝑥 ∥0 ).
Applications: Sparsity in compressive sensing.
8. Defining Norms via Regions
In finite-dimensional real vector spaces: Any symmetric, compact, convex set centered at
origin defines a norm (distance to boundary).
o Examples: Circle ( ℓ2 ), square ( ℓ∞ ), ellipse.
o Non-examples: Star-shaped (non-convex) regions fail.
Orthogonal Complement and Projection Mapping
1. Orthogonal Vectors
In an inner product space (𝑉, <. , . >), two vectors 𝑣𝑖 and 𝑣𝑗 are orthogonal if ⟨𝑣𝑖 , 𝑣𝑗 ⟩ = 0.
Generalizes perpendicularity (e.g., in ℝ𝑛 , dot product = 0).
Example: In ℝ3 , (1, 0, 0)and (0, 1, 0)are orthogonal (dot product = 0).
2. Orthogonal Complement
If 𝑊 ⊆ 𝑉, where 𝑉is an inner product space, then the orthogonal complement of 𝑊, denoted
by 𝑊 ⊥ , is the set of vectors in 𝑉that are orthogonal to every element of 𝑊:
W⊥ = { v ∈ V ∣ v ⊥ w ∀𝑤 ∈ 𝑊 } i.e.
For subset 𝑊 ⊆ 𝑉, 𝑊 ⊥ = { 𝑣 ∈ 𝑉 ∣ ⟨𝑣, 𝑤⟩ = 0 ∀𝑤 ∈ 𝑊 }.
Example:
In ℝ3 with the inner product taken as the usual dot product of vectors in ℝ3 , if we take
𝑊 = 𝐿[(1,0,0)] = span{(1,0,0)} = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥2 = 0, 𝑥3 = 0 }
Then 𝑊 ⊥ = 𝐿[(0,1,0), (0,0,1)] = span{(0,1,0), (0,0,1)}
= { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 = 0 }
In geometric terms:
𝑊is the x-axis (all points where y = 0 and z = 0).
𝑊 ⊥ is the yz-plane (all points where x = 0).
Example: In ℝ3 , for 𝑊 = span{(1,0,0), (0,1,0)}, 𝑊 ⊥ = span{(0,0,1)}.
Remark:
There is no requirement on 𝑊 to be a subspace of V. However, 𝑊 ⊥ is always a subspace
of 𝑉.
4. Direct Sum Decomposition
If 𝑊is a subspace of 𝑉, then 𝑉 = 𝑊 ⊕ 𝑊 ⊥ (orthogonal direct sum).
Every 𝑣 ∈ 𝑉decomposes uniquely as 𝑣 = 𝑤 + 𝑤 ⊥ with 𝑤 ∈ 𝑊, 𝑤 ⊥ ∈ 𝑊 ⊥.
Example: Decompose 𝑣 = (2,3,5)into projection on 𝑊 = span{(1,0,0), (0,1,0)} plus
orthogonal component.
𝑣 = (2,3,5) = (2,3,0) + (0,0,5)
Here (2,3,0) = 2(1,0,0) + 3(0,1,0) ϵ W and (0,0,5) ∈ 𝑊 ⊥.
5. Orthogonal Projection Operator (𝑷𝑾 )
Let V be an inner product space and W be a subspace of V. Also, let {𝑤1 , … , 𝑤𝑘 } be an
orthogonal basis of W.
The Linear transform 𝑃𝑊 : 𝑉 → W define as
𝑘
⟨𝑣, 𝑤𝑖 ⟩
𝑃𝑊 (𝑣) = ∑ 𝑤
⟨𝑤𝑖 , 𝑤𝑖 ⟩ 𝑖
𝑖=1
is called the orthogonal projection onto subspace 𝑊.
Example: In ℝ3 , 𝑊 = span{(3,0,1), (0,1,0)}, for 𝑣 = (0,3,10), 𝑃𝑊 𝑣 = (3,3,1).
⟨𝑣, 𝑤1 ⟩ ⟨𝑣, 𝑤2 ⟩
𝑃𝑊 (𝑣) = 𝑤1 + 𝑤
⟨𝑤1 , 𝑤1 ⟩ ⟨𝑤𝑖 , 𝑤2 ⟩ 2
10 3
= (3,0,1) + (0,1,0) = (3,0,1) + (0,3,0)
10 1
= (3,3,1)
Residual: 𝑣 − 𝑃𝑊 (𝑣) = (0,3,10) − (3,3,1) = (−3,0,9) ∈ 𝑊 ⊥ .
6. Properties of Orthogonal Projection
For any vector v in V, 𝑣 − 𝑃𝑊 𝑣 ⊥ 𝑊(orthogonal to W).
𝑃𝑊 (𝑤) = 𝑤 for all 𝑤 ∈ 𝑊(fixes the subspace).
2
𝑃𝑊 = 𝑃𝑊 (idempotent).
For any vector v in V, ∥ 𝑃𝑊 𝑣 ∥≤∥ 𝑣 ∥(non-expansive).
For any vector v in V and w in W,
‖𝑣 − 𝑃𝑊 (𝑣)‖ ≤∥ 𝑣 − 𝑤 ∥
𝑃𝑊 (𝑣) = 𝑎𝑟𝑔𝑤∈𝑊 𝑚𝑖𝑛⟦𝑣 − 𝑤⟧
Example: Closest Point in Subspace
Find the closest point to 𝑣 = (2,4,0, −2), in 𝑊 = span{(1,1,0,0), (0,0,1,1)}.
Sol: Let 𝑤1 = (1,1,0,0), 𝑤2 = (0,0,1,1) then the closest point to 𝑣 is
⟨𝑣, 𝑤1 ⟩ ⟨𝑣, 𝑤2 ⟩
w
̂= 𝑤1 + 𝑤
⟨𝑤1 , 𝑤1 ⟩ ⟨𝑤𝑖 , 𝑤2 ⟩ 2
6 −2
= (1,1,0,0) + (0,0,1,1) = (3,3,0,0) + (0,0, −1, −1)
2 2
= (3,3, −1, −1) =3(1,1,0,0)+(-1)(0,0,1,1) ∈ 𝑊.
7. Projection Transformation
Any linear map 𝑃 that satisfies 𝑃2 = 𝑃 is called a projection, So 𝑃𝑊 is a projection.
Orthogonal projections satisfy this; identity map is a trivial case.
8. Applications in Machine Learning
Used in dictionary learning, nearest vector approximation in subspaces.
Aids classification (e.g., closest point in class subspace).
Solves min ∥ 𝑣 − 𝑤 ∥for 𝑤 ∈ 𝑊.
Special Matrices and Properties
Overview
Continuation from eigenvalues/eigenvectors for symmetric/orthogonal matrices.
Emphasizes special matrices, particularly positive definite matrices, with applications in
convex optimization and machine learning.
Key Concepts: Quadratic forms, Rayleigh quotient, positive (semi-)definite matrices,
tests for definiteness, and geometric interpretations.
1. Quadratic Form
Quadratic form
Let 𝐴 ∈ ℝ𝑛×𝑛 be a symmetric matrix. The expression
𝑞(𝑋) = 𝑋 𝑇 𝐴𝑋
is called a Quadratic form.
Example: Given
1 −1 2
𝐴 = (−1 3 1)
2 1 4
Then for 𝑋 = (𝑥1 , 𝑥2 , 𝑥3 )𝑇 , we have
𝑞(𝑋) = 𝑋 𝑇 𝐴𝑋 = 𝑥12 + 3𝑥22 + 4𝑥32 − 2𝑥1 𝑥2 + 4𝑥1 𝑥3 + 2𝑥2 𝑥3
Similarly, if
𝑞(𝑋) = 𝑥12 + 4𝑥1 𝑥2 + 𝑥22 + 2𝑥32 + 6𝑥2 𝑥3 , then the associated symmetric matrix 𝐴 is
1 2 0
𝐴 = (2 1 3)
0 3 2
This is standard theory: every quadratic form in real variables with a symmetric matrix
corresponds uniquely via the identity
𝑞(𝐱) = ∑ 𝑎𝑖𝑖 𝑥𝑖2 + 2 ∑ 𝑎𝑖𝑗 𝑥𝑖 𝑥𝑗
𝑖 𝑖<𝑗
(where the factor of 2 on cross terms ensures symmetry of A).
Example (for 3x3 matrix): 𝑄(𝐱) = 𝑎11 𝑥12 + 𝑎22 𝑥22 + 𝑎33 𝑥32 + 2𝑎12 𝑥1 𝑥2 + 2𝑎13 𝑥1 𝑥3 +
2𝑎23 𝑥2 𝑥3 .
2. Rayleigh Quotient
For a real n × n matrix A, we have
𝑋 𝑇 𝐴𝑋
𝑅𝐴 (𝑋) = where 𝑋 ≠ 0
𝑋𝑇 𝑋
Properties:
Scale invariance: For any 𝑋 ≠ 0and any scalar 𝛼 ≠ 0,
𝑅𝐴 (𝑋) = 𝑅𝐴 (𝛼𝑋)
(This follows directly because both numerator and denominator scale by 𝛼 2 , so they cancel.)
If 𝑋is an eigenvector of A with eigenvalue 𝜆, then
𝑅𝐴 (𝑋) = 𝜆
𝑋 𝑇 𝐴𝑋
(Proof: 𝐴𝑋 = 𝜆𝑋 ⇒ 𝑋 𝑇 𝐴𝑋 = 𝜆𝑋 𝑇 𝑋 ⇒ = 𝜆.)
𝑋𝑇𝑋
𝜆min (𝐴) = min𝑋≠0 𝑅𝐴 (𝑋)
For any 𝐱 such that (∥ 𝐱 ∥2 = 1), we have
𝜆min (𝐴) ≤ 𝑋 𝑇 𝐴𝑋 ≤ 𝜆max
This equality holds iff 𝐱 is corresponding eigenvector
For any 𝑋 ≠ 0, we have
𝜆min (𝐴) = min 𝑅𝐴 (𝑋) ≤ 𝑅𝐴 (𝑋) ≤ max 𝑅𝐴 (𝑋) = 𝜆max (𝐴)
(The minimum value of the Rayleigh quotient over all non-zero vectors is exactly the smallest
eigenvalue of A.)
3. Positive (Semi-)Definite Matrices
Positive Semi-Definite (PSD): A Symmetric 𝐴 ∈ ℝ𝑛×𝑛 is said to be positive semi definite
if for all 𝐱 ∈ ℝ𝑛
𝑞(𝐱) = 𝐱 𝑇 𝐴𝐱 ≥ 0
Positive Definite (PD): 𝐱 𝑇 𝐴𝐱 > 0 for all 𝐱 ≠ 0.
Equivalent Conditions:
o All eigenvalues ≥ 0 (PSD) or > 0 (PD).
o All principal minors ≥ 0 (PSD) or > 0 (PD) via Sylvester's Criterion (P-test).
1 3
Example: 𝐴 = [ ]→ 𝑄(𝐱) = (𝑥1 + 3𝑥2 )2 ≥ 0(PSD).
3 10
4. Checking Positive Definiteness
Eigenvalue Test: Compute eigenvalues; all positive for PD and all non-negative for PSD
P-Test (Leading Principal Minors):
o All must be positive for PD and all non-negative for PSD.
2 −1
o Example: 𝐴 = [ ]:
−1 2
1×1: 2 > 0
2×2: det(A) = 3 > 0 → PD.
2 2 −1
Problem Example: For 𝐴 = [ 2 2 −1]:
−1 −1 𝑏
o PSD if -1 ≤ b ≤ 2; PD if -1 < b < 2.
Note:
For any real 𝐴 ∈ ℝ𝑛×𝑛 : 𝐴𝑇 𝐴 is always PSD.
If 𝐴 has full column rank (trivial null space i.e Nullspace(A)={0}): 𝐴𝑇 𝐴 is PD.
Regularization: If 𝐴is PSD and 𝜖 > 0, then 𝐴 + 𝜖𝐼is PD.
5. The geometry of PD quadratic forms
The isocontour of 𝑞(𝐱) = 𝐱 𝑇 𝐴𝐱 are ellipsoids s.t. the axes point in the direction of the
eigenvectors of A, and the radii of these axes are proportional to the inverse square root of the
corresponding eigenvalues.
Level sets of PD quadratic form 𝐱 𝑇 𝐴𝐱 = 𝑐(c > 0) form ellipsoids.
1
Axes align with eigenvectors; semi-axes lengths ∝ .
√𝜆𝑖
Spectral Decomposition
1. Eigen Decomposition
Definition: For an 𝑛 × 𝑛 square matrix 𝐴, eigen decomposition expresses 𝐴as 𝐴 =
𝑃𝐷𝑃 −1, where:
o 𝑃is an invertible matrix with columns as the eigenvectors 𝑣1 , 𝑣2 , … , 𝑣𝑛 of 𝐴.
o 𝐷is a diagonal matrix with eigenvalues 𝜆1 , 𝜆2 , … , 𝜆𝑛 on the diagonal.
Requirements: 𝐴must have 𝑛linearly independent eigenvectors, forming a basis for ℝ𝑛 .
Not all square matrices can be eigen decomposed.
2. Necessary Conditions for Eigen Decomposition
For each eigenvalue, the algebraic multiplicity (from the characteristic polynomial) must equal
the geometric multiplicity (dimension of the eigenspace).
If this holds for all eigenvalues, the eigenvectors span the full space, making 𝑃invertible.’
3. Counterexample: Matrix Without Eigen Decomposition
Matrix:
1 1 1
𝐴 = [0 1 1]
0 0 1
Eigenvalues: 𝜆 = 1(algebraic multiplicity 3).
Eigenvectors: Only one independent eigenvector (e.g., [1,0,0]𝑇 ), so geometric
multiplicity = 1.
Result: Cannot form a full 3 × 3 invertible 𝑃; decomposition impossible.
4. Spectral Decomposition (for Symmetric Matrices)
Symmetric Matrix: 𝐴 = 𝐴𝑇 .
Properties:
o Always has eigen decomposition.
o Eigenvectors are orthogonal (can be chosen to form an orthonormal basis).
o 𝑃is orthogonal, so 𝑃−1 = 𝑃𝑇 .
Form: 𝐴 = 𝑃𝐷𝑃𝑇 .
Alternative Expression:
𝑛
𝐴 = ∑ 𝜆𝑖 𝑣𝑖 𝑣𝑖𝑇
𝑖=1
o Each 𝜆𝑖 𝑣𝑖 𝑣𝑖𝑇 is a rank-1 matrix (outer product of eigenvector with itself).
o 𝑣𝑖 𝑣𝑖𝑇 projects orthogonally onto the 1D subspace spanned by 𝑣𝑖 .
5. Properties of Spectral Decomposition
Projection Interpretation: Each term 𝑣𝑖 𝑣𝑖𝑇 (when 𝑣𝑖 is unit length) is an orthogonal
projection matrix.
Reconstruction: The sum of these rank-1 projections exactly recovers 𝐴.
Orthogonality: Ensures the projections are mutually orthogonal.
6. Example: 2x2 Symmetric Matrix
Matrix:
2 1
𝐴=[ ]
1 2
Eigenvalues: 𝜆1 = 3, 𝜆2 = 1.
Eigenvectors: 𝑣1 = [1,1]𝑇 , 𝑣2 = [−1,1]𝑇 (orthogonal).
1 1
Normalized: 𝑣̂1 = [1,1]𝑇 , 𝑣̂2 = [−1,1]𝑇 .
√2 √2
1 1
− 3 0
𝑃 = [√2
1
√2
1 ], 𝐷=[ ].
0 1
√2 √2
Decomposition: 𝐴 = 3𝑣̂1 𝑣̂1𝑇 + 1𝑣̂2 𝑣̂2𝑇 .
Each 𝑣̂𝑖 𝑣̂𝑖𝑇 is a rank-1 projection.
7. Applications
Principal Component Analysis (PCA).
Spectral clustering.
Statistical machine learning algorithms.
Reinforcement learning.
Image processing.
Recommender systems.
Note:
This decomposition is foundational for many linear algebra applications in data science
and ML.
Singular Value Decomposition (SVD)
1. From Symmetric to Rectangular Matrices
For a symmetric square matrix 𝐴, eigenvalue decomposition: 𝐴 = 𝑃𝐷𝑃𝑇 ( 𝐷diagonal with
eigenvalues; 𝑃orthonormal eigenvectors).
For rectangular 𝑚 × 𝑛matrix 𝐴(𝑚 ≠ 𝑛), standard diagonalization fails—SVD provides an
analogous factorization.
2. Singular Values
Compute 𝐴𝑇 𝐴( 𝑛 × 𝑛symmetric, positive semi-definite matrix).
Eigenvalues of 𝐴𝑇 𝐴: 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑛 ≥ 0.
Singular values: 𝜎𝑖 = √𝜆𝑖 (non-negative, ordered decreasingly: 𝜎1 ≥ ⋯ ≥ 𝜎𝑛 ≥ 0).
Number of non-zero 𝜎𝑖 equals rank(𝐴) = 𝑟.
3. SVD Factorization
𝐴 = 𝑈Σ𝑉 𝑇 , where:
o 𝑈: 𝑚 × 𝑚orthogonal matrix (columns: orthonormal eigenvectors of 𝐴𝐴𝑇 ).
o Σ: 𝑚 × 𝑛"diagonal" matrix (singular values 𝜎1 , … , 𝜎𝑟 on main diagonal; zeros
elsewhere).
If 𝑚 > 𝑛: "tall" (extra zero rows).
If 𝑚 < 𝑛: "fat" (extra zero columns).
o 𝑉: 𝑛 × 𝑛orthogonal matrix (columns: orthonormal eigenvectors of 𝐴𝑇 𝐴, ordered
by decreasing 𝜆𝑖 ).
4. Constructing 𝑼and 𝑽
𝑉's columns: Eigenvectors of 𝐴𝑇 𝐴(corresponding to 𝜆𝑖 = 𝜎𝑖2 ).
𝑈's columns:
o Eigenvectors of 𝐴𝐴𝑇 ( 𝑚 × 𝑚symmetric).
𝐴𝑣𝑖
o Or, for 𝑖 = 1to 𝑟: 𝑢𝑖 = (using 𝑉's columns).
𝜎𝑖
Remaining 𝑈columns (if 𝑚 > 𝑟): Orthogonal to the first 𝑟(from null space of 𝐴𝐴𝑇 ).
𝟎 𝟏
5. Example 1: 3×2 Matrix 𝑨 = [𝟏 √𝟐]
𝟎 𝟏
1 √2 1
𝑇
𝐴𝐴 = [√2 3 √2].
1 √2 2
Eigenvalues: 𝜆 = 8,2,0→ 𝜎1 = 2√2, 𝜎2 = √2, 𝜎3 = 0(rank = 2).
2√2 0
Σ=[ 0 √2](3×2).
0 0
𝑈: Eigenvectors of 𝐴𝐴𝑇 (normalized).
𝑉: From 𝐴𝑇 𝐴eigenvectors.
Verify: 𝐴 = 𝑈Σ𝑉 𝑇 .
𝟒 𝟖
6. Example 2: 3×2 Matrix 𝑨 = [𝟏𝟏 𝟕]
𝟏𝟒 −𝟐
327 96
𝐴𝑇 𝐴 = [ ].
96 147
Eigenvalues: 𝜆1 = 360, 𝜆2 = 90(trace/det method) → 𝜎1 = 6√10, 𝜎2 = 3√10(rank = 2).
𝑉: Eigenvectors of 𝐴𝑇 𝐴(e.g., 𝑣1 ≈ [0.894,0.447]𝑇 , 𝑣2 ≈ [−0.447,0.894]𝑇 ; complete to
orthogonal basis).
𝐴𝑣1
𝑈: 𝑢1 = ≈ [0.632,0.632,0.447]𝑇 , similarly for 𝑢2 ; third column orthogonal.
𝜎1
6√10 0
Σ=[ 0 3√10].
0 0
Verify decomposition holds.
7. Key Properties & Teaser
SVD reveals matrix "structure" via singular values (like eigenvalues for non-square cases).
Rank(𝐴) = number of non-zero singular values.
Next lecture: Applications (e.g., least squares, PCA, image compression).
SVD - Properties and Applications
1. Geometrical Interpretation of SVD
SVD decomposes a matrix 𝐴as 𝐴 = 𝑈Σ𝑉 𝑇 , where:
o 𝑈and 𝑉are orthogonal matrices (rotations/reflections).
o Σis a diagonal matrix with singular values 𝜎𝑖 (non-negative, ordered decreasingly).
Applying SVD to a unit circle:
o 𝑉 𝑇 : Rotates the circle.
o Σ: Scales along principal axes (stretches into an ellipse if singular values differ).
o 𝑈: Rotates the result.
Overall: SVD represents linear transformations as rotation → scaling → rotation.
2. Properties of SVD
For a matrix 𝐴(m × n) of rank 𝑟:
o Number of non-zero singular values = rank 𝑟.
Range Space (Column Space):
o First 𝑟columns of 𝑈form an orthonormal basis for range of 𝐴.
Null Space:
o Last 𝑛 − 𝑟columns of 𝑉(for zero singular values) form an orthonormal basis for
null space of 𝐴.
For 𝐴𝑇 (transpose):
o Range space: First 𝑟columns of 𝑉.
o Null space: Last 𝑚 − 𝑟columns of 𝑈.
Utility: SVD directly computes rank, range, and nullity without solving systems.
3. Pseudo-Inverse via SVD
Moore-Penrose pseudo-inverse 𝐴+ = 𝑉Σ + 𝑈 𝑇 , where Σ + is the pseudo-inverse of Σ.
Constructing Σ + :
1
o For 𝜎𝑖 > 0: Diagonal entry = 𝜎 .
𝑖
o For 𝜎𝑖 = 0(or near-zero): Set to 0 (avoids instability).
Example (2×3 matrix 𝐴):
o Singular values: 6√10, 3√10.
o Σ + formed by reciprocals on diagonal.
o Verifies 𝐴𝐴+ = 𝐼(identity for compatible dimensions).
Role: Solves least-squares problems for over/under-determined systems.
4. Matrix Norms
∥𝐴𝑥∥
Operator Norm:∥ 𝐴 ∥= sup 𝑥≠0 ∥𝑥∥ (induced by vector norm).
Common Induced Norms:
o 1-Norm (∥ 𝐴 ∥1 ): Maximum absolute column sum: max 𝑗 ∑𝑖 ∣ 𝑎𝑖𝑗 ∣.
o ∞-Norm (∥ 𝐴 ∥∞ ): Maximum absolute row sum: max 𝑖 ∑𝑗 ∣ 𝑎𝑖𝑗 ∣.
o Spectral Norm (∥ 𝐴 ∥2 ): Largest singular value 𝜎1 (Euclidean-induced).
2
Frobenius Norm (∥ 𝐴 ∥𝐹 ):√∑ 𝑎𝑖𝑗 = √\trace(𝐴𝑇 𝐴) (like vector 2-norm of entries).
𝑖,𝑗
Unitary Invariance: Norms unchanged under orthogonal transformations: ∥ 𝑈𝐴𝑉 ∥=∥
𝐴 ∥for orthogonal 𝑈, 𝑉.
Example: Compute norms for a sample matrix to illustrate.
5. Applications of SVD
Compute pseudo-inverse for linear systems.
Find bases for range/null spaces.
Dimensionality reduction (preview: low-rank approximation in ML, e.g., compressing
data while preserving structure).
Numerical stability: SVD is robust for ill-conditioned matrices.
Low Rank Approximations
1. Review of Matrix Rank
Definition: Rank of a matrix 𝐴is the dimension of its range space (column space), or
equivalently, the number of linearly independent rows/columns.
Examples:
o Identity matrix 𝐼𝑛 : Rank = 𝑛(full rank).
o Matrix with dependent rows (e.g., third row = row1 + row2): Rank drops to 2 for
a 3×3 matrix.
Key Properties:
o rank(𝐴) = rank(𝐴𝑇 )(row rank = column rank).
o rank(𝑃𝐴𝑄) = rank(𝐴)if 𝑃and 𝑄are invertible.
o rank(𝐴) =number of non-zero singular values in SVD.
o rank(𝐴𝐵) ≤ min(rank(𝐴) , rank(𝐵)).
𝐴 0
o For block matrices: Inequalities like rank ( ) ≤ rank(𝐴) + rank(𝐵).
𝐶 𝐵
2. Matrix Factorization
Concept: For an 𝑚 × 𝑛matrix 𝐴of rank 𝑟, columns of 𝐴are linear combinations of 𝑟basis
vectors from the range space.
Factorization Form:𝐴 = 𝐵𝐶 𝑇 , where 𝐵is 𝑚 × 𝑟(basis vectors) and 𝐶 𝑇 is 𝑟 ×
𝑛(coefficients).
Storage Efficiency: Reduces from 𝑚𝑛elements to 𝑚𝑟 + 𝑛𝑟.
o Example: 50×100 matrix (rank 20) stores as ~3,000 elements vs. 5,000—
motivates low-rank methods for large data (e.g., images, ML models).
3. Low-Rank Approximation
Definition: Find 𝐴̃𝑘 (𝑚 × 𝑛, rank 𝑘 ≤ 𝑟) minimizing Frobenius norm ∥ 𝐴 − 𝐴̃𝑘 ∥𝐹 =
√∑𝑖,𝑗( 𝑎𝑖𝑗 − 𝑎̃𝑖𝑗 )2 .
Optimization Challenge: Non-convex problem.
o Example: Rank-1 approximation of 2×2 identity matrix. Convex combination of
two rank-1 matrices yields rank-2, not rank-1 (violates convexity).
4. Solution via Singular Value Decomposition (SVD)
SVD Recap:𝐴 = 𝑈Σ𝑉 𝑇 , where Σhas singular values 𝜎1 ≥ 𝜎2 ≥ ⋯ ≥ 0.
Best Rank-𝑘 Approximation:𝐴̃𝑘 = ∑𝑘𝑖=1 𝜎𝑖 𝑢𝑖 𝑣𝑖𝑇 (keep top 𝑘singular values, zero others).
Theorem: Eckart-Young—guarantees minimal Frobenius norm error for this 𝐴̃𝑘 .
Examples:
o 5×5 matrix with singular values [3, 1, 0.5, 0.2, 0.05]: Rank-3 approx. retains first
three.
o Upper triangular 3×3 matrix (rank 3): Rank-2 SVD approx. changes entries
minimally (e.g., 3 → 2.99, 2 → 2.01).
5. Applications and Visual Examples
Image Compression: 266-rank grayscale image approximated at ranks 1, 4, 10, 25.
o Rank-1: Basic outline (poor fidelity).
o Rank-4/10: Adds details (edges, shadows).
o Rank-25: Near-original quality—compresses data while preserving visuals.
Matlab Implementation of SVD and Low-Rank
Approximation
clear; clc;
format short; % (use format long for more decimals if needed)
% Define 3×4 matrix A
A = [1 2 3 4;
1 1 2 3;
0 1 1 0];
disp('Matrix A:');
disp(A);
% SVD Computation (full matrices by default in MATLAB)
[U, S, V] = svd(A); % U: 3×3 orthogonal, S: 3×4 diagonal-ish, V: 4×4 orthogonal
% Note: MATLAB returns V (not V'), so V' is the conjugate transpose
disp('U (3×3):');
disp(U);
disp('Singular values (diag elements of S):');
disp(diag(S)); % or just disp(S) to see full matrix
disp('V (4×4):');
disp(V);
% Verification: orthogonality
disp('U * U'' (should be identity):');
disp(U * U');
disp('V * V'' (should be identity):');
disp(V * V');
% Rank: number of non-zero singular values (all 3 here → rank = 3)
tol = 1e-10;
rank_A = sum(diag(S) > tol);
disp(['Rank of A = ' num2str(rank_A)]);
% Reconstructing A = U * S * V'
reconstructed = U * S * V';
disp('Reconstructed A:');
disp(reconstructed);
disp('Max absolute difference |A - reconstructed|:');
disp(max(abs(A(:) - reconstructed(:)))); % should be ~1e-15 or machine epsilon
% Low-Rank Approximation (Rank-2)
S2 = S;
S2(3,3) = 0; % zero out the third singular value
approx_rank2 = U * S2 * V';
disp('Rank-2 approximation:');
disp(approx_rank2);
disp('Max absolute difference from original:');
disp(max(abs(A(:) - approx_rank2(:)))); % small error expected
% =========================================================================
% SVD for 5×4 Matrix (Rows > Columns)
% =========================================================================
% Example taller matrix (modify/extend A or define new one)
A_tall = [1 2 3 4;
1 1 2 3;
0 1 1 0;
2 0 1 4;
3 1 0 2]; % 5×4 example
[U_tall, S_tall, V_tall] = svd(A_tall);
disp('For 5×4 matrix:');
disp('U: 5×5 orthogonal');
disp('S: 5×4 (diagonal in top 4×4 block)');
disp('V: 4×4 orthogonal');
% Last singular value typically near zero → rank = 4 (full column rank)
disp('Singular values:');
disp(diag(S_tall(1:4,1:4))); % last one often ≈0 if rank deficient
% Same reconstruction works: A_tall ≈ U_tall * S_tall * V_tall'
% The implementation is dimension-agnostic in MATLAB
% =========================================================================
% 19:30 – Application: Image Compression via SVD (Conceptual Translation)
% =========================================================================
% MATLAB equivalent workflow (no direct cv2, but use imread/imresize/rgb2gray)
% Example using built-in or file/URL image
% img = imread('your_image.jpg'); % or imread(url) with webread + imdecode
% if size(img,3) == 3
% img_gray = rgb2gray(img); % or mean(img,3) for simple average
% end
% img_double = im2double(img_gray); % [0,1] range
% % Optional: standardize (zero mean, unit variance)
% img_vec = img_double(:);
% mu = mean(img_vec);
% sigma = std(img_vec);
% img_scaled = (img_double - mu) / sigma;
% % SVD
% [U_img, S_img, V_img] = svd(img_scaled, 'econ'); % economy size often preferred for large
images
% % Cumulative variance explained
% sing_vals = diag(S_img);
% variance_explained = cumsum(sing_vals.^2) / sum(sing_vals.^2);
% figure; plot(variance_explained); title('Cumulative Explained Variance');
% xlabel('Number of singular values'); ylabel('Fraction of variance');
% % Low-rank reconstruction (e.g., keep k components)
% k = 50; % choose based on variance plot
% Uk = U_img(:,1:k);
% Sk = S_img(1:k,1:k);
% Vk = V_img(:,1:k);
% recon = Uk * Sk * Vk';
% % Add back mean/std if standardized
% recon = recon * sigma + mu;
% % Clip & visualize
% recon = max(0, min(1, recon));
% figure; imshow(recon); title(['Reconstructed with k = ' num2str(k)]);
% Compression insight:
% - Original image: height × width values
% - Truncated SVD with k << min(m,n): store only k×(m + n) values ≈ much less storage
% - Small singular values → negligible visual contribution → discard for compression
% =========================================================================
% Key Takeaways (MATLAB style)
% =========================================================================
% • Use [U,S,V] = svd(A) for full; [U,S,V] = svd(A,'econ') for economy size
% • Reconstruct: A ≈ U * S * V'
% • Form low-rank approx by zeroing small diagonal entries in S
% • Very powerful for compression, noise reduction, PCA (U columns = principal components)
% • Applications: image compression, dimensionality reduction, latent semantic analysis, etc.