0% found this document useful (0 votes)

26 views51 pages

Chapter II 1

The document provides an overview of vectors and their algebra in machine learning, explaining concepts such as vector representation, operations, linear combinations, and independence. It also covers matrix algebra, including types of matrices, operations, multiplication, and properties like determinants and inverses. Additionally, it discusses feature vectors in machine learning and includes MATLAB implementation examples for vector operations.

Uploaded by

editkpop143

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views51 pages

Chapter II 1

Uploaded by

editkpop143

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Vectors in Machine Learning

1. Vector:

A vector is a mathematical object that encodes a length (magnitude) and direction.

 Formally, vectors are elements of a vector space: a collection of objects closed under
vector addition and scalar multiplication.

 Representation: Vectors are typically represented as a 1D array of numbers

(components):

o Row vector: 𝐯 = [𝑣1 , 𝑣2 , … , 𝑣𝑛 ] ∈ ℝ𝑛 , where 𝑣𝑖 ∈ ℝ.

𝑣1
𝑣2
o Column vector: 𝐯 = [ ⋮ ] ∈ ℝ𝑛 .
𝑣𝑛

 Geometrically, vectors represent coordinates in an 𝑛-dimensional space (e.g., arrows

from the origin in 2D or 3D).

 Simple geometric view: An arrow with origin, direction, and magnitude (length ∥ 𝐯 ∥=
√𝑣12 + ⋯ + 𝑣𝑛2 ).

o One-dimensional arrays (row or column vectors) in n-dimensional space (ℝⁿ).

o Components are real numbers.

 Examples in ℝ𝑛 :

o For 𝑛 = 2 (ℝ2 = ℝ × ℝ): 𝐯 = (1,2). – components along x and y axes.

o For 𝑛 = 3: 𝐯 = (1,2,3) – components in x, y, z directions.

2. Vector Algebra

 Addition: Component-wise for vectors of the same dimension.

𝐮 + 𝐯 = (𝑢1 + 𝑣1 , 𝑢2 + 𝑣2 , … , 𝑢𝑛 + 𝑣𝑛 ).

o Example in ℝ2 : Let 𝐮 = (1,3), 𝐯 = (1,1). Then 𝐮 + 𝐯 = (2,4).

o Geometrically: Parallelogram rule (head-to-tail).

 Subtraction:
o Example: v₁ = (1, 3), v₂ = (1, 1) → v₁ - v₂ = (0, 2).

 Scalar multiplication: 𝑐𝐯 = (𝑐𝑣1 , 𝑐𝑣2 , … , 𝑐𝑣𝑛 ), where 𝑐 ∈ ℝ.

o Example: 2𝐮 = (2,6).

 Dot product (inner product): : Scalar from component-wise multiplication and

summation.

𝐮 ⋅ 𝐯 = 𝑢1 𝑣1 + 𝑢2 𝑣2 + ⋯ + 𝑢𝑛 𝑣𝑛 = ∑𝑛𝑖=1 𝑢𝑖 𝑣𝑖 .

o Measures alignment (0 if perpendicular).

o Example in ℝ𝟑 :

Let 𝐮 = (1,1, −1), 𝐯 = (2,3,1).

Then 𝐮 ⋅ 𝐯 = 1 ⋅ 2 + 1 ⋅ 3 + (−1) ⋅ 1 = 2 + 3 − 1 = 4.

 Magnitude (Norm): ∥ 𝐯 ∥= √𝐯 ⋅ 𝐯.

o Example: v = (1, -1, 2) → ||v|| = √(1 + 1 + 4) = √6.

 Angle Between Vectors: cos θ = (v₁ · v₂) / (||v₁|| · ||v₂||).

3. Linear Combinations

Given a set 𝑆 = {𝐯1 , 𝐯2 , … , 𝐯𝑘 }, a linear combination is: 𝐯 = 𝛼1 𝐯1 + 𝛼2 𝐯2 + ⋯ + 𝛼𝑘 𝐯𝑘 , where

𝛼1 , … , 𝛼𝑘 ∈ ℝ are scalars.

Example:

Let 𝐯1 = [1,1, −1], 𝐯2 = [0,1, −1], 𝐯3 = [0,0,1].

Then 𝛼1 𝐯1 + 𝛼2 𝐯2 + 𝛼3 𝐯3 = 𝛼1 [1,1, −1] + 𝛼2 [0,1, −1] + 𝛼3 [0,0,1]

= [𝛼1 , 𝛼1 + 𝛼2 , −𝛼1 − 𝛼2 + 𝛼3 ].

4. Linear Independence

A set 𝑆 = {𝐯1 , … , 𝐯𝑘 }is linearly independent (LI) if the equation 𝛼1 𝐯1 + ⋯ + 𝛼𝑘 𝐯𝑘 = 𝟎 holds

only when 𝛼1 = 𝛼2 = ⋯ = 𝛼𝑘 = 0. Otherwise, it's linearly dependent (LD) (some non-trivial
combination gives zero).

 Linearly Independent (LI): Only trivial solution (all αᵢ = 0) to α₁v₁ + ... + αₖvₖ = 0.
 Linearly Dependent (LD): Non-trivial solution exists.

Example in ℝ2 : 𝑆 = {(1,0), (1,1)}. Solve 𝛼1 (1,0) + 𝛼2 (1,1) = (0,0): 𝛼1 + 𝛼2 = 0, 𝛼2 = [Link]

𝛼2 = 0, 𝛼1 = 0. Thus, LI.

Example in ℝ2 : 𝑆1 = {(1,1), (3,3)}. −3(1,1) + 1(3,3) = (−3 + 3, −3 + 3) = (0,0).Non-zero

scalars, so LD.

Examples in ℝ²:

o LI: {(1, 1), (1, -1)} – only zero coefficients work.

o LD: {(1, 1), (3, 3)} – 3(1, 1) - (3, 3) = (0, 0).

Example in ℝ³: S={(1, -1, 0), (1, 0, 1), (0, 1, 1)} is LD,

 Non-trivial solution: Let 𝛼1 = 1, then 𝛼2 = −1, 𝛼3 = 1.

Verify:
1 ⋅ (1, −1,0) − 1 ⋅ (1,0,1) + 1 ⋅ (0,1,1) = (1 − 1 + 0, −1 − 0 + 1,0 − 1 + 1) = (0,0,0).
Thus, LD (not LI).

Remarks:

 In ℝ𝑛 , any set with more than 𝑛vectors is LD.

 Any set containing the zero vector is LD.

 LD sets contain redundant vectors (one is a combination of others).

6. Orthogonal and Orthonormal Vectors

 A set {𝐯1 , … , 𝐯𝑛 }is mutually orthogonal (pairwise) if 𝐯𝑖 ⋅ 𝐯𝑗 = 0for all 𝑖 ≠ 𝑗.

 It's orthonormal if orthogonal and∥ 𝐯𝑖 ∥= 1 for each 𝑖.

Example (Orthogonal in ℝ3 ): 𝑆 = {[1,0, −1], (1, √2, 1), [1, −√2, 1]}.

 (1,0, −1) ⋅ (1, √2, 1) = 1 + 0 − 1 = 0.

 (1,0, −1) ⋅ (1, −√2, 1) = 1 + 0 − 1 = 0.

 (1, √2, 1) ⋅ (1, −√2, 1) = 1 − 2 + 1 = 0. All pairs orthogonal, so orthogonal set. (Note:
Your notes had a typo in the last dot product, listing both as (1, √2, 1), which would be 4
≠ 0.)
1 , 1 1 1
Example (Orthonormal in ℝ2 ): Your notes have {[ ],[ ,− ]}.
√2 √2 √2 √2

1 1 1 1 1 1
 Dot product: ⋅ + ⋅ (− ) = 2 − 2 = 0.
√2 √2 √2 √2

1 2 1 2 1 1
 Each length: √( ) + ( ) = √2 + 2 = 1. Thus, orthonormal.
√2 √2

Remark: Any orthogonal set (with no zero vectors) is LI (non-zero dot products prevent trivial
combinations equaling zero).

Feature Vectors in Machine Learning

 Concept: Data points as vectors.

 Example: Employee data – [height, weight] for each person; ID might be excluded for
numerical features.

MATLAB Implementation

 Vector Definition: v = [1, -1, 2]; w = [2, 5, 2]; (row vectors by default; use v = [1; -1; 2] for
column vectors if needed).

 Operations:

o Addition: v + w

o Subtraction: v - w

o Scalar Multiply: 3 * v

o Norm: norm(v)

o Dot Product: dot(v, w) (or v * w.' for row vectors)

Basics of Matrix Algebra
Definition:

 A matrix is a rectangular array of scalars (real numbers), denoted as an 𝑚 × 𝑛matrix

with 𝑚rows and 𝑛columns.

 Elements: 𝑎𝑖𝑗 where 𝑖= row index, 𝑗= column index (1-based indexing).

 Example: (2×3 matrix)

1 2 3
𝐴=[ ]
4 5 6

2. Types of Matrices

 Diagonal Matrix: Off-diagonal elements = 0.

3 0
Example: [ ]
0 5
 Zero Matrix: All elements = 0 (denoted 𝑂).

 Upper Triangular: Elements below main diagonal = 0.

 Lower Triangular: Elements above main diagonal = 0.

 Identity Matrix (𝐼𝑛 ): Square matrix with 1s on diagonal, 0s elsewhere.

1 0
Example: 𝐼2 = [ ](multiplicative identity).
0 1
3. Matrix Equality and Basic Operations

 Equality: Same dimensions and corresponding elements equal (𝐴 = 𝐵 iff 𝑎𝑖𝑗 = 𝑏𝑖𝑗 for all
𝑖, 𝑗).

 Addition/Subtraction: Only for same-sized matrices; element-wise.

o Commutative: 𝐴 + 𝐵 = 𝐵 + 𝐴
𝑎 + 𝑏11 ⋯
o Associative: (𝐴 + 𝐵) + 𝐶 = 𝐴 + (𝐵 + 𝐶)Example: 𝐴 + 𝐵 = [ 11 ]
⋮ ⋱
 Scalar Multiplication: Multiply each element by scalar 𝛼: 𝛼𝐴 = [𝛼𝑎𝑖𝑗 ].

o Distributive: 𝛼(𝐴 + 𝐵) = 𝛼𝐴 + 𝛼𝐵; (𝛼 + 𝛽)𝐴 = 𝛼𝐴 + 𝛽𝐴.

4. Matrix Multiplication

 Condition: Columns of first matrix = rows of second (𝑚 × 𝑛 × 𝑛 × 𝑝→ 𝑚 × 𝑝).

 Definition: Entry 𝑐𝑖𝑗 = ∑𝑛𝑘=1 𝑎𝑖𝑘 𝑏𝑘𝑗 (row 𝑖of A dot column 𝑗of B).

 Properties:

o Not Commutative: 𝐴𝐵 ≠ 𝐵𝐴(in general).

o Associative: (𝐴𝐵)𝐶 = 𝐴(𝐵𝐶).

o Distributive: 𝐴(𝐵 + 𝐶) = 𝐴𝐵 + 𝐴𝐶; (𝐴 + 𝐵)𝐶 = 𝐴𝐶 + 𝐵𝐶.

 Example (2×2):
1 2 5 6 19 22
[ ][ ]=[ ]
3 4 7 8 43 50

5. Transpose of a Matrix

 Definition: 𝐴𝑇 swaps rows and columns: ( 𝐴𝑇 )𝑖𝑗 = 𝑎𝑗𝑖 .

 Properties:

o (𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵 𝑇

o ( 𝐴𝐵)𝑇 = 𝐵 𝑇 𝐴𝑇

o ( 𝑘𝐴)𝑇 = 𝑘𝐴𝑇 (scalar 𝑘)

o ( 𝐴𝑇 )𝑇 = 𝐴
1 2 𝑇 1 3
 Example: For 𝐴 = [ ], 𝐴 = [ ].
3 4 2 4
6. Determinant and Inverse

 Determinant (for square matrices only): Measures "volume scaling" or singularity.

𝑎 𝑏
o 2×2: det [ ] = 𝑎𝑑 − 𝑏𝑐.
𝑐 𝑑
o If det(𝐴) = 0, matrix is singular (no inverse).

 Inverse (𝐴−1 ): Exists if det(𝐴) ≠ 0(invertible/nonsingular). Satisfies 𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼.

1 𝑑 −𝑏
o For 2×2: 𝐴−1 = det(𝐴) [ ].
−𝑐 𝑎
 Properties:
o ( 𝐴𝐵)−1 = 𝐵 −1 𝐴−1

o ( 𝐴−1 )−1 = 𝐴

o (𝐴𝑇 )−1 = (𝐴−1 )𝑇

1
o ( 𝑘𝐴)−1 = 𝑘 𝐴−1 (if 𝑘 ≠ 0).

7. MATLAB Implementation

 Create: A = [1 2; 3 4]; (row vectors separated by spaces, columns by semicolons).

 Operations:

o Addition: A + B (element-wise; no separate add function needed)

o Multiplication: A * B (matrix multiplication)

o Transpose: A'

o Determinant: det(A)

o Inverse: inv(A) (throws error if singular, e.g., MATLAB:SingularMatrix)

 Error Handling: Incompatible shapes or singular matrices throw exceptions (e.g.,

dimension mismatch or singular matrix warnings/errors).
Vector Space - Definition and Examples
1. Group

A group is a non-empty set G together with a binary operation * (e.g. addition, multiplication)
such that the following four properties are satisfied:

1. Closure: ∀ a, b ∈ G, a * b ∈ G

2. Associativity: ∀ a, b, c ∈ G, a * (b * c) = (a * b) * c

3. Identity element: ∃ e ∈ G such that ∀ a ∈ G, a * e = e * a = a

4. Inverse element: ∀ a ∈ G, ∃ a⁻¹ ∈ G such that a * a⁻¹ = a⁻¹ * a = e

Abelian group

An abelian group is a group (G, *) on which the binary operation is commutative i.e.;

a * b = b * a ∀ a, b ∈ G

∀v∈V

⇒ Every abelian group is a group, but not every group is abelian.

Field

A field is a non-empty set F together with two binary operations + and multiplication (·),
satisfying the following conditions:

1. (F, +) is an abelian group with identity element 0

2. (F - {0}, ·) is an abelian group with identity element 1

3. Distributive law holds ∀ a, b, c ∈ F:

a · (b + c) = a · b + a · c

a · (b · c) = a · b + a · c

Example:

(Z, +) is an abelian group

(Q, +, ·), (R, +, ·) and (C, +, ·) are fields

2. Definition of a Vector Space

 A vector space 𝑉 over a field ℝ(real numbers) is a set of vectors equipped with two
operations:

o Vector addition (denoted +): Combines two vectors to produce another vector
in 𝑉.

 A map f: V × V → V is called internal binary operation in V.

o Scalar multiplication (denoted ⋅): Multiplies a scalar from ℝby a vector to

produce another vector in 𝑉.

 A map g: F × V → V is called an external binary operation in V (F)

 These operations must satisfy six axioms (properties) for 𝑉to be a vector space. (Note:
Generalizable to any field, but ML typically uses ℝ.)

3. The Six Axioms / Properties

The axioms ensure the structure behaves like familiar vector operations. Structured as group
properties for addition + compatibility with scalars.

Axiom Description Mathematical Form

𝑉 with +forms an abelian
(commutative) group: closure, 𝐮+𝐯=𝐯+𝐮
1. Abelian Group commutativity, associativity, (𝐮 + 𝐯) + 𝐰 = 𝐮 + (𝐯 + 𝐰)
under Addition additive identity (zero vector 𝟎), 𝐮+𝟎=𝐮
and additive inverses (for every 𝐮 + (−𝐮) = 𝟎
𝐯, −𝐯 exists).
2. Closure under Scalar 𝑎 ∈ ℝ, vector 𝐯 ∈ 𝑉:
𝑎⋅𝐯∈𝑉
Scalar Multiplication result stays in 𝑉.
3. Distributivity
Scalar distributes over vector
(Scalar over Vector 𝑎 ⋅ (𝐮 + 𝐯) = 𝑎 ⋅ 𝐮 + 𝑎 ⋅ 𝐯
sum.
Addition)
Axiom Description Mathematical Form
4. Distributivity
Vector scaled by sum of scalars
(Vector over Scalar (𝑎 + 𝑏) ⋅ 𝐯 = 𝑎 ⋅ 𝐯 + 𝑏 ⋅ 𝐯
equals sum of scaled vectors.
Addition)
5. Associativity of Scalar multiplication is
𝑎 ⋅ (𝑏 ⋅ 𝐯) = (𝑎𝑏) ⋅ 𝐯
Scalar Multiplication associative.
6. Multiplicative The scalar 1 leaves vectors
1⋅𝐯=𝐯
Identity (Unitary Law) unchanged.

Definition of Vector Space

Let (F, +, ·) be a field. Then, a non-empty set V together with two binary operations - vector
addition (internal) and scalar multiplication (external) is called a vector space over the field F if
the following properties are satisfied:

(A) (V, +) is an abelian group.

(i) u+v ∈ V ∀ u, v ∈ V (vector addition)

(ii) (u + w) + v = u + (v + w) ∀ u, v, w ∈ V
(iii) u+0=0+u=u∀u∈V
(iv) (v) u + (-u) = (-u) + u = 0 ∀ u ∈ V
(v) u + v = v + u ∀ u, v ∈ V

(B) Scalar Multiplication

Any u ∈ V and a ∈ F ⇒ a u ∈ V. [Scalar Multiplication]

These laws must satisfy the following conditions:

(i) Multiplication by scalar is distributive over vector addition:

Mathematically, a (u + v) = a u + a v ∀ a ∈ F, u, v ∈ V
(ii) Multiplication by vector is distributive over scalar addition:
Mathematically, (a + b) u = a u + b u, ∀ a, b ∈ F, u ∈ V
(iii) (a · b) u = a · (b · u), ∀ a, b ∈ F, u ∈ V
(iv) 1 u = u is true if 1 is a scalar and u is a vector.

Remarks:

1. The elements of V are called vectors whereas the elements of F are called scalars.
2. In dealing with vector spaces there are two types of zero elements:

(i) 0 of the field, or

(ii) 0 of the vector space. 0_V ∈ V if and only if 0_V = 0 · v for some v ∈ V.

Examples of Vector Spaces

 If F = ℝ then V(ℝ) is called a real vector space.

 If F = ℂ then V(ℂ) is called a complex vector space.

 The space {0} is called a null space or zero space.

 ℝ(ℝ), ℂ(ℂ), ℂ(ℝ), ℂn(ℂ) are all vector spaces under usual addition and multiplication.

 ℝ𝑛 (Euclidean Space):

o Vectors: Ordered 𝑛-tuples (e.g., ℝ2 : points (𝑥, 𝑦)).

o Addition: Component-wise (e.g., (𝑎, 𝑏) + (𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 + 𝑑)).

o Scalar Multiplication: Component-wise (e.g., 𝑘 ⋅ (𝑎, 𝑏) = (𝑘𝑎, 𝑘𝑏)).

o All axioms hold (e.g., closure: sums/products stay in ℝ𝑛 ; identity: (0,0, … ,0)).

o Relevance to ML: Represents feature vectors (e.g., 𝑛 = 1000 for high-

dimensional data).

 Set of 𝑚 × 𝑛 Matrices:

o Vectors: Matrices over ℝ.

o Operations: Matrix addition and scalar multiplication (standard).

o Forms a vector space (closure under entry-wise ops).

 Polynomials over ℝ:

o All polynomials in variable 𝑥with real coefficients (e.g., 3𝑥 2 + 2𝑥 − 1).

o Or polynomials of degree ≤ 𝑛(finite-dimensional subspace).

o Operations: Polynomial addition and scalar multiplication.

o Axioms hold (e.g., sum of degree-𝑛 polys is degree ≤ 𝑛).

 Convergent Sequences:

o Set of all convergent real sequences (e.g., (𝑎1 , 𝑎2 , … )where lim 𝑎𝑖 exists).
o Operations: Component-wise addition/multiplication.

o Forms a vector space.

4. Non-Examples (Sets That Fail to Be Vector Spaces)

 ℝ(ℂ) is not a vector space under usual addition and multiplication.

If a ∈ ℂ, u, v ∈ ℝ

Scalar Multiplication: a u ∈ ℂ

But a ∈ F, u ∈ V ⇒ a u ∈ V

⇒ Scalar multiplication a u ∈ ℂ fails since a u ∉ ℝ

e.g. (2 + 3i) ∈ ℂ, 5 ∈ ℝ, (2 + 3i)(5) ∉ ℝ

 Polynomials of Exact Degree 𝑛(e.g., degree exactly 3):

o Fails closure under addition: Sum of two degree-3 polys can be degree < 3 (e.g.,
𝑥 3 + 𝑥 2 + 𝑥 + 7 + (−𝑥 3 + 3𝑥 2 + 4𝑥 + 1) = 4𝑥 2 + 5𝑥 + 8, degree 2).

 Modified Operations on ℝ2 :

o Custom addition: (𝑥1 , 𝑦1 ) + (𝑥2 , 𝑦2 ) = (𝑥1 + 𝑥2 , 𝑦1 + 2𝑦2 ).

o Fails commutativity: (𝑥1 , 𝑦1 ) + (𝑥2 , 𝑦2 ) = (𝑥1 + 𝑥2 , 𝑦1 + 2𝑦2 ) ≠ (𝑥2 + 𝑥1 , 𝑦2 +

2𝑦1 ) = (𝑥2 , 𝑦2 ) + (𝑥1 , 𝑦1 )(unless 𝑦1 = 𝑦2 ).

o Lesson: Structure depends on both set and operations.

5. Geometric Interpretation (for Low Dimensions)

 ℝ1 (Real Line): Vectors as arrows from origin (0) to point 𝑥(e.g., 𝑣 = 5: arrow to 5; 𝑣 =
−2: arrow left to -2).

 ℝ2 (XY-Plane): Vectors as arrows from origin to (x,y).

 ℝ3 (XYZ-Space): Arrows from origin to (x,y,z).

 Vector Addition: Parallelogram rule (diagonal = sum; e.g., tails at origin, heads form
parallelogram).

 Linear Independence: Vectors 𝐯𝟏 , 𝐯𝟐 , 𝐯𝟑 independent if 𝐯𝟑 not in plane spanned by 𝐯𝟏 , 𝐯𝟐 .

 Higher Dimensions: Abstract (hard to visualize for 𝑛 > 3), but generalizes; common in
ML for datasets with many features (e.g., 𝑛 = 2000).
Vector Subspaces - Examples and Properties
Definition of a Subspace

non-empty subset W of a vector space 𝑉(e.g., ℝ𝑁 ) is a subspace if:

1. It contains the zero vector 𝟎.

2. It is closed under vector addition: If 𝐱, 𝐲 ∈ 𝑊, then 𝐱 + 𝐲 ∈ 𝑊.

3. It is closed under scalar multiplication: If 𝐱 ∈ 𝑊and 𝛼 ∈ ℝ, then 𝛼𝐱 ∈ 𝑊.

These three conditions ensure 𝑊 satisfies all vector space axioms.

Theorem: A subset 𝑊 ⊆ ℝ𝑁 is a subspace iff it contains 𝟎 and is closed under addition and
scalar multiplication.

Properties of Subspaces

 Intersection: The intersection of any number of subspaces is always a subspace.

o Example: For 𝑊1 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∣ 𝑥1 + 𝑥2 − 𝑥3 = 0 }and 𝑊2 =

{ (𝑥1 , 𝑥2 , 𝑥3 ) ∣ 𝑥1 = 𝑥2 = 𝑥3 }, 𝑊1 ∩ 𝑊2 = {(0,0,0)}, which is a subspace.

 Union: The union of two subspaces is a subspace only if one is contained in the other
(generally not a subspace).

Examples of Subspaces

1. Trivial Subspaces:

o Zero subspace: {𝟎}.

o Whole space: ℝ𝑁 .

2. Symmetric Matrices:

o Set of all 3×3 symmetric matrices form a subspace of 𝑀3×3 (ℝ).

3. Linear Constraints:

o 𝑊 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 + 𝑥2 − 𝑥3 = 0 }.

 Contains zero: Yes.

 Closed under addition and scalar multiplication: Verified by substitution.

4. Equal Components:

o 𝑊 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 = 𝑥2 = 𝑥3 } = { (𝛼, 𝛼, 𝛼) ∣ 𝛼 ∈ ℝ }.
 Contains zero, closed under operations.

5. Non-Example:

o 𝑊 = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 + 𝑥2 + 𝑥3 = 1 }.

 Fails: Zero vector gives 0 ≠ 1.

Geometric Interpretation in ℝ𝟑

Subspaces are:

 A point: Origin { (0,0,0) }.

 A line through the origin.

 A plane through the origin.

 The entire space ℝ3 .

Linear Span

 Definition: For a set 𝑆 = {𝐯1 , … , 𝐯𝑛 } ⊆ 𝑉,

span(𝑆) = { 𝑐1 𝐯1 + ⋯ + 𝑐𝑛 𝐯𝑛 ∣ 𝑐𝑖 ∈ ℝ }.

 Properties:

o Always a subspace.

o Smallest subspace containing 𝑆.

o If 𝑆 is a subspace, span(𝑆) = 𝑆.

o L(∅) = {0} is called zero space.

 Examples:

o 𝑆 = {(1,0), (0,1)} ⊆ ℝ2 : span(𝑆) = ℝ2 .

o 𝑆 = {(1,2,0), (1,1, −1)} ⊆ ℝ3 : Forms a plane.

Elementary Properties of Vector Spaces

1. a · 0 = 0 for all a ∈ F and all 0 ∈ V.

2. 0 · u = 0 for all u ∈ V and all 0 ∈ F.

3. a (u + w) = a u + a w for all a ∈ F and all u, w ∈ V.

4. a (u - w) = a u - a w for all a ∈ F and all u, w ∈ V.

Here V(F) is a vector space, u, w ∈ V, a ∈ F.

Important Subspaces in Linear Algebra

For an 𝑚 × 𝑛matrix 𝐴:

 Row Space: Span of rows.

 Column Space (Range): Span of columns.

 Null Space (Kernel): {𝐱 ∈ ℝ𝑛 ∣ 𝐴𝐱 = 𝟎}.

These are crucial in applications like machine learning (e.g., PCA for dimensionality reduction).

Basis and Dimension

Definition of Basis

A basis for a vector space 𝑉over a field 𝐹is a set of vectors that satisfies two conditions:

 Linear Independence: No vector in the set can be expressed as a linear combination of

the others.

 Spanning: Every vector in 𝑉can be uniquely written as a linear combination of the basis
vectors.

Formally: Let {𝑣1 , 𝑣2 , … , 𝑣𝑛 } ⊆ 𝑉. It is a basis if:

1. The vectors are linearly independent: 𝛼1 𝑣1 + ⋯ + 𝛼𝑛 𝑣𝑛 = 0implies 𝛼𝑖 = 0for all 𝑖.

2. It spans 𝑉: For any 𝑣 ∈ 𝑉, there exist 𝛼𝑖 ∈ 𝐹such that 𝑣 = 𝛼1 𝑣1 + ⋯ + 𝛼𝑛 𝑣𝑛 .

 A vector space is finite-dimensional if it has a finite basis; otherwise, it is infinite-

dimensional.

Examples of Bases

 ℝ2 over ℝ: Standard basis {(1,0), (0,1)}.

 ℝ3 over ℝ: Standard basis {(1,0,0), (0,1,0), (0,0,1)}.

 ℝ𝑛 over ℝ: Vectors with a 1 in one position and 0s elsewhere (n vectors).

1 0
 Space of all 2 × 2real matrices: Basis consists of 4 matrices (e.g., 𝐸11 = ( ), etc.).
0 0
1 0
 Space of symmetric 2 × 2matrices: Basis has 3 matrices due to symmetry (e.g., ( ),
0 0
0 1 0 0
( ), ( )).
1 0 0 1
Dimension of a Vector Space

The dimension of 𝑉, denoted dim(𝑉), is the number of vectors in any basis for 𝑉. Examples:

 dim(ℝ𝑛 ) = 𝑛

 Space of 𝑚 × 𝑛real matrices: dim =𝑚×𝑛

 Space of polynomials of degree ≤ 𝑛: dim = 𝑛 + 1(basis: {1, 𝑥, 𝑥 2 , … , 𝑥 𝑛 })

 Space of all polynomials: Infinite-dimensional.

Key Theorems and Properties

1. Linear Dependence in Finite Dimensions: In an n-dimensional space, any set of more

than n vectors is linearly dependent. A basis is a maximal linearly independent set.

2. Basis Extension: Any linearly independent set of k < n vectors in an n-dimensional space
can be extended to a full basis by adding n - k more vectors.

3. Uniqueness of Dimension: All bases of a finite-dimensional vector space have the same
cardinality (equal to the dimension).

4. Dimension Formula for Subspaces: For subspaces 𝑆1 , 𝑆2 ⊆ 𝑉,

dim(𝑆1 ) + dim(𝑆2 ) = dim(𝑆1 + 𝑆2 ) + dim(𝑆1 ∩ 𝑆2 )

(Where 𝑆1 + 𝑆2 = { 𝑠1 + 𝑠2 ∣ 𝑠1 ∈ 𝑆1 , 𝑠2 ∈ 𝑆2 }.)

Examples of Basis and Dimension Calculations

Example 1: Subspace in ℝ𝟑

 Subspace S: 𝑥1 + 𝑥2 − 𝑥3 = 0

o Solve: Let 𝑥1 = 𝑡, 𝑥2 = 𝑠; then 𝑥3 = 𝑡 + 𝑠.

o Basis: {(1,0,1), (0,1,1)}

o Dimension: 2

 Subspace W: 𝑥1 = 𝑥2 = 𝑥3
o Basis: {(1,1,1)}

o Dimension: 1

 Intersection 𝑆 ∩ 𝑊: Only the zero vector (no non-trivial solution satisfies both).

o Basis: Empty (trivial subspace)

o Dimension: 0

Example 2: Subspaces in ℝ𝟒

 S₁: Defined by 𝑥1 + 𝑥2 − 𝑥3 + 𝑥4 = 0and 𝑥1 + 𝑥2 + 𝑥3 + 𝑥4 = 0

o Adding equations: 2𝑥1 + 2𝑥2 = 0→ 𝑥1 = −𝑥2.

o Let 𝑥2 = 𝑡, 𝑥3 = 𝑠; then 𝑥1 = −𝑡, 𝑥4 = 𝑡 + 𝑠.

o Basis: {(−1,1,0,1), (0,0,1,1)}(or equivalent)

o Dimension: 2

 S₂ (similar constraints, e.g., other equations leading to dim 2).

 S₁ ∩ S₂: Trivial intersection.

o Dimension: 0
Linear Transformations
1. Introduction and Applications

 Linear transformations are a fundamental tool in linear algebra, described as the "most
useful concept" for machine learning (ML) and computer science (CS).

 Key Applications:

o ML and Data Projection: Used to map data from non-linearly separable spaces to
linearly separable ones (e.g., for linear SVM or perceptron classifiers).

o Calculus: Represent derivatives (e.g., from ℝ³ → ℝ³).

o Graphics/Animation: Enable 2D/3D modeling by changing object shapes, scaling,

rotating, or shifting viewpoints via geometric transformations.

o Other Fields: Common in electrical engineering, electronics, and abstract math.

3. Definition:
Let V and W be vector spaces over a field F, with dim(V) = n and dim(W) = m. A linear
transformation T: V → W is a mapping satisfying:

1. Additivity: T(v₁ + v₂) = T(v₁) + T(v₂) for all v₁, v₂ ∈ V.

2. Homogeneity: T(αv) = α T(v) for all v ∈ V and scalar α ∈ F.

 Note:

1. A linear transformation is also known as Linear map or linear mapping.

2. These conditions ensure T preserves vector addition and scalar multiplication.

3. A Liner transformation from a vector space to underlying field is called a linear

functional.

Examples:

 Example 1: T: ℝ² → ℝ² defined by T(x₁, x₂) = (x₁, x₁ + x₂).

o Verification:

 Additivity:

T((x₁, x₂) + (y₁, y₂)) = T(x₁ + y₁, x₂ + y₂)

= (x₁ + y₁, (x₁ + y₁) + (x₂ + y₂))

= (x₁, x₁ + x₂) + (y₁, y₁ + y₂) = T(x₁, x₂) + T(y₁, y₂).

 Homogeneity:

T(α(x₁, x₂)) = T(αx₁, αx₂) = (αx₁, αx₁ + αx₂) = α (x₁, x₁ + x₂) = α T(x₁, x₂).

o Therefore, T is linear.

 Example 2: T: ℝ³ → ℝ³ defined by T(x₁, x₂, x₃) = (x₂, x₁, 0).

o Brief mention; verification follows similar steps (not detailed in lecture).

4. Geometric Interpretations

 Overview: Linear transformations alter vectors by scaling, rotating, shearing, or

projecting shapes (e.g., a unit square in ℝ²).

 Examples in ℝ² (using unit square with vertices (0,0), (1,0), (1,1), (0,1)):

o Uniform Scaling: T(x₁, x₂) = (2x₁, 2x₂) → Square scales to side length 2.

o Non-Uniform Scaling (Shear): T(x₁, x₂) = (x₁, 2x₂) → Rectangle (width 1, height 2).

o Projection: T(x₁, x₂) = (x₁, 0) → Collapses to line segment on x₁-axis [(0,0) to (1,0)].

o Rotation by θ (e.g., 45°): T(x₁, x₂) = (x₁ cos θ - x₂ sin θ, x₁ sin θ + x₂ cos θ).
cos 𝜃 − sin 𝜃 𝑥1
 Matrix form: ( )applied to (𝑥 ).
sin 𝜃 cos 𝜃 2

 Extends to ℝ³ rotations.

 Note: Transformations change shape/orientation while preserving linearity.

Remarks

 Special Cases:

o T: V → V is a linear operator.

o T: V → F is a linear functional.

 Non-Linear Examples (to identify failures):

o T: ℝ² → ℝ³, T(x₁, x₂) = (x₁ + x₂ + 1, 2x₁ - x₂, x₁ + x₂) → Fails T(0) = 0 (maps to
(1,0,0)).

o T: ℝ² → ℝ², T(x₁, x₂) = (x₁ - x₂, 2|x₁| - x₂) → Fails additivity (e.g., T(1,1) + T(2,2) ≠
T(3,3)).
o T: M₂×₂ → M₂×₂, T(A) = A + I (identity) → Fails T(0) = 0.

o Linear ones: T(x₁, x₂) = (x₁ - x₂, 2x₁ - x₂); matrix multiplications.

5. Matrix Representation of Linear Transformations

 Every linear transformation corresponds to a matrix (and vice versa), relative to chosen
bases.

Suppose V(F) and W(F) are finite dimensional vector spaces with bases {v₁, ..., vₙ} and {w₁, ...,
wₘ} respectively. Let T: V → W is be a linear map. Then the matrix of T with elements 𝑎𝑖𝑗 ∈
𝐹 is defined by

T(vⱼ) = 𝑎1𝑗 𝑤1 + 𝑎2𝑗 𝑤2 + 𝑎3𝑗 𝑤3 + ⋯ + 𝑎𝑚𝑗 𝑤𝑚 for j=1 to n.

T(vⱼ) = ∑𝑚
𝑖=1 𝑎𝑖𝑗 𝑤𝑖 for j=1 to n.

It shows that the jth column of matrix A of linear map T consists of coordinate of T(vⱼ) in the
chosen bases of W.

Conversely, every matrix 𝐴 ∈ ℝ𝑚×𝑛 induce a linear transformation 𝑇: ℝ𝑛 → ℝ𝑚 given by

T(v) = Av.

 Example: Find the matrix representation of a linear transformation T: ℝ³ → ℝ² defined

as T(x, y, z) = (x − y + 2z, y − 3z)

with respect to non-standard (ordered) bases.

 Basis B for the domain ℝ³ (ordered basis B = {v₁, v₂, v₃}): v₁ = (1, 1, 0)ᵀ v₂ = (1,
0, 1)ᵀ v₃ = (0, 1, 1)ᵀ

 Basis C for the codomain ℝ² (ordered basis C = {w₁, w₂}): w₁ = (2, 1)ᵀ w₂ = (1, 1)ᵀ
 T=[-1 6 3; 2 -9 -5];
 Example: T: ℝ² → ℝ², T(x₁, x₂) = (2x₁ - 7x₂, 4x₁ + 3x₂); Find matrix representation of T
relative to basis B = {(1,3), (2,5)}.

T(1,3) = (-19,13) = a11(1,3) + a12 (2,5)

T(2,5) = (-31,23) = a21 (1,3) + a22 (2,5)

Therefore, a11=121, a12=-70, a21=201, a22=-116

121 201
Matrix A = ( ).
−70 −116
2 1
 Reverse: For matrix A = ( ), T(x) = A x = (2x₁ + x₂, 4x₁ + 3x₂) (standard basis).
4 3
 Dimension Note: T: ℝⁿ → ℝᵐ → m × n matrix.

6. Kernel (Null Space) and Range (Image)

Let T: V → W be a linear map. Then the Nullspace of T as defined as

null(T)= {v ∈ V | T(v) = 0};

and range of T is defined as range(T)= {w ∈ W | ∃ v ∈ V s.t. T(v) = w}

Note:

o Null space of T is also called kernel of T denoted as ker(T).

o Null(T) is subspace of V

o Range(T) is subspace of W

o The Dimension of nullity(T) is called nullity of T.

o The Dimension of range(T) is called rank of T.

8. Rank-Nullity Theorem

 Statement: For a linear map T: V → W, we have

dim(rank(T)) + dim(nullity(T)) = dim(V).

 Example: determine range and nullspace of the linear map T: ℝ³ → ℝ⁴, defined by

T(x₁, x₂, x₃) = (x₁ - x₂ + x₃, x₂ - x₃, x₁, 2x₁ - 5x₂ + 5x₃).

o Range: Images of standard basis e₁=(1,0,0) → (1,0,1,2); e₂=(0,1,0) → (-1,1,0,-5);

e₃=(0,0,1) → (1,-1,0,5).

T(1,0,0) = (1,0,1,2);

T(0,1,0) = (-1,1,0,-5);

T (0,0,1) = (1,-1,0,5).

Range(T)= L{(1,0,1,2), (-1,1,0,-5)}

o Span by linearly independent vectors (1,0,1,2) and (-1,1,0,-5); rank(T) = 2.

o Kernel: Solve x| T(x) = 0 → Equations: x₁ - x₂ + x₃ = 0; x₂ - x₃ = 0; x₁ = 0; 2x₁ - 5x₂ +
5x₃ = 0.

 Simplifies to x₁ = 0, x₂ = x₃ (free variable x₃); basis vector (0,1,1);

 nullity(T) = 1.

o Verification: rank + nullity = 2 + 1 = 3 = dim(ℝ³).

Eigenvalues and Eigenvectors
 Geometric Example:

o Start with vector 𝐱 = [1,3]𝑇 .

1 2
o Apply matrix 𝐴 = [ ]: 𝐴𝐱 = [7,5]𝑇 (direction changes, magnitude scales).
2 1
o Seek "special" vectors where 𝐴𝐯only scales 𝐯(no direction change).

o Example: 𝐯 = [1,1]𝑇 , 𝐴𝐯 = [3,3]𝑇 = 3𝐯(eigenvector 𝐯, eigenvalue 𝜆 = 3).

Note: Eigenvectors are unchanged in direction; eigenvalues indicate scaling factor.

2. Mathematical Definition

Let 𝐴 ∈ ℝ𝑛×𝑛 , we say that a non-zero vector 𝑣 ∈ ℝ𝑛 is an eigenvector of 𝐴 corresponding to

eigenvalue 𝜆(a scalar) if

𝐴𝐯 = 𝜆𝐯

or
(𝐴 − 𝜆𝐼)𝐯 = 𝟎

 The above relation tells us that when the matrix (linear transformation) 𝐴 is applied on
𝑣, then it scales 𝑣 by an amount 𝜆.

Note:

 𝐯 must be nonzero; 𝜆 can be real or complex.

 Equation implies 𝐴 − 𝜆𝐼 is singular for nontrivial solutions.

3. Computing Eigenvalues and Eigenvectors

Example: 3×3 Matrix

2 −2 3
 𝐴 = [1 1 1]
1 3 −1
Step 1: Characteristic Equation for Eigenvalues

 Set det(𝐴 − 𝜆𝐼) = 0 to get the characteristic polynomial (degree 𝑛).

 Roots are eigenvalues 𝜆.

 Characteristic equation: det(𝐴 − 𝜆𝐼) = 0 yields 𝜆3 + 𝜆2 − 5𝜆 + 3 = 0.

 Eigenvalues: 𝜆1 = 3, 𝜆2 = 1, 𝜆3 = −2.

Step 2: Eigenvectors

 For each 𝜆, solve (𝐴 − 𝜆𝐼)𝐯 = 𝟎.

 For 𝜆 = 3: 𝐯1 = [1,1,1]𝑇 .

 For 𝜆 = 1: 𝐯2 = [1, −1, −1]𝑇 .

 For 𝜆 = −2: Solve row-reduced system to get 𝐯3 (e.g., proportional to [3, -1, 2]^T).

Note: Eigenvectors are defined up to scalar multiples; normalize if needed.

4. Algebraic Multiplicity

If 𝜆 repeats 𝑘 times in the characteristic polynomial, its algebraic multiplicity is 𝑘.

2 0 0
 Example: 𝐴 = (0 3 −2), characteristic polynomial (𝜆 − 2)(𝜆 − 1)2 = 0.
0 1 0
 𝜆 = 1 has multiplicity 2 (even if geometric multiplicity—number of independent
eigenvectors—differs).

5. Geometric Interpretation

Example 1: Diagonal Scaling

2 0
 𝑇=[ ](stretches x-axis by 2, y-axis unchanged).
0 1
 Eigenvalues: 𝜆1 = 2, 𝜆2 = 1.

 Eigenvectors: [1, 0]𝑇 (x-direction), [0, 1]𝑇 (y-direction).

 Insight: Reveals "principal axes" of transformation.

Example 2: Shear Transformation

2 1
 𝑆=[ ](turns square into parallelogram).
0 1
 Eigenvalues: 𝜆1 = 2, 𝜆2 = 1.

 Eigenvectors: [1, 0]𝑇 (unchanged direction), along line 𝑦 = 𝑥for scaling.

 For symmetric matrices: Eigenvectors are orthogonal.

Note: Eigen-decomposition simplifies understanding of how matrices distort space (stretch,

rotate, shear).

6. Properties of Eigenvalues and Eigenvectors

1
 Inverse Matrix: If 𝐴𝐱 = 𝜆𝐱and 𝐴invertible, then 𝐴−1 𝐱 = 𝐱.
𝜆

 Matrix Powers:𝐴𝑘 𝐱 = 𝜆𝑘 𝐱 (useful for iterations, e.g., Markov chains).

 Trace:tr(𝐴) = ∑𝜆𝑖 (sum of eigenvalues).

 Determinant:det(𝐴) = ∏𝜆𝑖 (product of eigenvalues).

 Singular Matrices:𝜆 = 0 implies det(𝐴) = 0.

 If A is n×n matrix and λ is a characteristic root of A. Then, kλ is a characteristic root of kA

(k being a scalar).

7. Python Implementation with NumPy (26:00 – 31:30)

 Code Snippet:

% Define the matrix A (same as in the Python example)

A = [ 2, -2, 3;

1, 1, 1;

1, 3, -1 ];

% Compute eigenvalues and eigenvectors

[V, D] = eig(A);

% Display results

disp('Eigenvalues (diagonal entries of D):');

disp(diag(D)); % should show approximately [3; 1; -2]

disp(' ');

disp('Eigenvectors (columns of V):');

disp(V);

% Optional: normalized eigenvectors (already unit length in MATLAB by default)

disp(' ');

disp('Norm of each eigenvector column (should be ≈1):');

disp(vecnorm(V, 2, 1)); % column-wise 2-norms

 Output Explanation:

 Eigenvalues (diagonal entries of D):

-2

 Eigenvectors (columns of V):

-0.5774 -0.7071 0.4082

-0.5774 0.0000 -0.8165

-0.5774 0.7071 0.4082

 Norm of each eigenvector column (should be ≈1):

1.0000 1.0000 1.0000

Note:

 Eigenvalues (𝜆): Scaling factors; roots of characteristic equation.

 Eigenvectors (𝐯): Directions invariant under transformation.

 Applications: Dimensionality reduction (PCA), stability in dynamical systems, graph

algorithms (PageRank).

 Next Steps: Explore defective matrices (algebraic > geometric multiplicity) and
diagonalization.
Norms and Spaces
1. Metric Spaces (Generalize distance)

 Definition: A metric 𝑑: 𝑆 × 𝑆 → ℝ on set 𝑆must satisfy:

1. Non-negativity: 𝑑(𝑥, 𝑦) ≥ 0, and 𝑑(𝑥, 𝑦) = 0iff 𝑥 = 𝑦.

2. Symmetry: 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥).

3. Triangle inequality: 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧).

 (S, d) is called a metric space.

Example: On subset 𝑆 of reals, 𝑑(𝑥, 𝑦) =∣ 𝑥 − 𝑦 ∣ (satisfies all properties). (S, d) is a metric

space.

2. Normed Spaces

Definition: A norm on a vector space 𝑉 is a function

∥⋅∥: 𝑉 → ℝ such that:
1. Non-negativity: ∥ 𝑥 ∥≥ 0, and ∥ 𝑥 ∥= 0iff 𝑥 = 0.
2. Homogeneity: ∥ 𝛼𝑥 ∥=∣ 𝛼 ∣∥ 𝑥 ∥for scalar 𝛼.
3. Triangle inequality: ∥ 𝑥 + 𝑦 ∥≤∥ 𝑥 ∥ +∥ 𝑦 ∥.

 (𝑉, ∥⋅∥) is called a normed space.

 Relation to Metrics:

Induces metric 𝑑(𝑥, 𝑦) =∥ 𝑥 − 𝑦 ∥, so every normed space is a metric space (but not
vice versa).

 Counterexample (Metric but Not Norm):

Discrete space 𝑆 = {0,1}with 𝑑(𝑥, 𝑦) = 1if 𝑥 ≠ 𝑦, else 0. It's a metric but lacks vector
space structure for norms.

Examples of Norms (on ℝ𝒏 )

Example Geometric Shape: Unit Ball
Norm Type Formula
for 𝑥 = (1, −2) Shape (2D)
ℓ1 (Manhattan) ||x||1 = ∑ 𝑥𝑖 |x|1=1+|-2|=3 A diamond-shaped square
|x|1√12 + (−2)2
ℓ2 (Euclidean) ∥ 𝑥 ∥2 = √∑𝑥𝑖2 Circle
= √5 ≈ 2.236
Example Geometric Shape: Unit Ball
Norm Type Formula
for 𝑥 = (1, −2) Shape (2D)
1
𝑛 𝑝 As p decreases, corners get sharper → more sparsity
ℓ𝑝 (General) ||x||p = (∑|𝑥𝑖 |𝑝 ) As p increases, shape becomes smoother → less
𝑖=1
sparsity
𝑓𝑜𝑟 𝑝 > 0)
||𝑥||∞
ℓ∞ (Max) ||𝑥||∞ = max{|𝑥𝑖 |} Square
= max{|1|, |−2|} = 2
5. Convexity

Convex Set: A set is said to be convex if the line joining two points of the set lies entirely in the
set. For 𝑆 ⊂ ℝ𝑛 , line segment between any 𝑥1 , 𝑥2 ∈ 𝑆 lies in 𝑆:

𝜆𝑥1 + (1 − 𝜆)𝑥2 ∈ 𝑆 for 𝜆 ∈ [0,1].

Examples:

o convex set: Rectangles, disks;

o non-convex : crescent shapes.

Convex Function: 𝑓: 𝑆 ⊂ ℝ𝑛 → ℝ is said to be convex if for 𝑥1 , 𝑥2 ∈ 𝑆 , we have

𝑓(𝜆𝑥1 + (1 − 𝜆)𝑥2 ) ≤ 𝜆𝑓(𝑥1 ) + (1 − 𝜆)𝑓(𝑥2 ), where 𝜆 ∈ [0,1].

o Geometric:

Norms and Convexity: All norms induce convex unit balls {𝑥: ∥ 𝑥 ∥≤ 1} via triangle
inequality.

6. Inner Product Spaces

Definition: An Inner product on real vector space V is a function

⟨⋅,⋅⟩: 𝑉 × 𝑉 → ℝ

satisfying:
1. Positive-definiteness: ⟨𝑥, 𝑥⟩ ≥ 0, and ⟨𝑥, 𝑥⟩ = 0 iff 𝑥 = 0.

2. Linearity (first arg.): ⟨𝛼𝑥 + 𝑦, 𝑧⟩ = 𝛼⟨𝑥, 𝑧⟩ + ⟨𝑦, 𝑧⟩.

3. Symmetry: ⟨𝑥, 𝑦⟩ = ⟨𝑦, 𝑥⟩.

 A Vector space together with inner product is called inner product space.

Example

 V= ℝ𝑛 (ℝ), Let 𝑋 = (𝑥1 , 𝑥2 , … , 𝑥𝑛 ), 𝑌 = (𝑦1 , 𝑦2 , … , 𝑦𝑛 ). Then standard inner product on

ℝ𝑛
⟨𝑋, 𝑌⟩ = ∑𝑥𝑖 𝑦𝑖 = 𝑋. 𝑌 =∥ 𝑥 ∥∥ 𝑦 ∥ cos 𝜃
⟨𝑥,𝑦⟩ 1
Angle: cos 𝜃 = ∥𝑥∥∥𝑦∥ (norm from ⟨𝑥, 𝑥⟩2 ).

 V= ℝ2 : ⟨(𝑢1 , 𝑢2 ), (𝑣1 , 𝑣2 )⟩ = 𝑢1 𝑣1 − 𝑢1 𝑣2 − 𝑢2 𝑣1 + 𝑢2 𝑣2 (satisfies properties).

7. Special Case: 𝓵𝟎 Norm

 Definition: ∥ 𝑥 ∥0 =number of non-zero entries.

o Example: 𝑥 = (1,2,0,0,3,0,0,4), ∥ 𝑥 ∥0 = 4 .

 Not a True Norm: Fails homogeneity (e.g., ∥ 2𝑥 ∥0 =∥ 𝑥 ∥0 ≠ 2 ∥ 𝑥 ∥0 ).

 Applications: Sparsity in compressive sensing.

8. Defining Norms via Regions

 In finite-dimensional real vector spaces: Any symmetric, compact, convex set centered at
origin defines a norm (distance to boundary).

o Examples: Circle ( ℓ2 ), square ( ℓ∞ ), ellipse.

o Non-examples: Star-shaped (non-convex) regions fail.

Orthogonal Complement and Projection Mapping
1. Orthogonal Vectors

 In an inner product space (𝑉, <. , . >), two vectors 𝑣𝑖 and 𝑣𝑗 are orthogonal if ⟨𝑣𝑖 , 𝑣𝑗 ⟩ = 0.

 Generalizes perpendicularity (e.g., in ℝ𝑛 , dot product = 0).

Example: In ℝ3 , (1, 0, 0)and (0, 1, 0)are orthogonal (dot product = 0).

2. Orthogonal Complement

If 𝑊 ⊆ 𝑉, where 𝑉is an inner product space, then the orthogonal complement of 𝑊, denoted
by 𝑊 ⊥ , is the set of vectors in 𝑉that are orthogonal to every element of 𝑊:

W⊥ = { v ∈ V ∣ v ⊥ w ∀𝑤 ∈ 𝑊 } i.e.

For subset 𝑊 ⊆ 𝑉, 𝑊 ⊥ = { 𝑣 ∈ 𝑉 ∣ ⟨𝑣, 𝑤⟩ = 0 ∀𝑤 ∈ 𝑊 }.

Example:

In ℝ3 with the inner product taken as the usual dot product of vectors in ℝ3 , if we take

𝑊 = 𝐿[(1,0,0)] = span{(1,0,0)} = { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥2 = 0, 𝑥3 = 0 }

Then 𝑊 ⊥ = 𝐿[(0,1,0), (0,0,1)] = span{(0,1,0), (0,0,1)}

= { (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 ∣ 𝑥1 = 0 }
In geometric terms:

 𝑊is the x-axis (all points where y = 0 and z = 0).

 𝑊 ⊥ is the yz-plane (all points where x = 0).

Example: In ℝ3 , for 𝑊 = span{(1,0,0), (0,1,0)}, 𝑊 ⊥ = span{(0,0,1)}.

Remark:

 There is no requirement on 𝑊 to be a subspace of V. However, 𝑊 ⊥ is always a subspace

of 𝑉.

4. Direct Sum Decomposition

 If 𝑊is a subspace of 𝑉, then 𝑉 = 𝑊 ⊕ 𝑊 ⊥ (orthogonal direct sum).

 Every 𝑣 ∈ 𝑉decomposes uniquely as 𝑣 = 𝑤 + 𝑤 ⊥ with 𝑤 ∈ 𝑊, 𝑤 ⊥ ∈ 𝑊 ⊥.

Example: Decompose 𝑣 = (2,3,5)into projection on 𝑊 = span{(1,0,0), (0,1,0)} plus
orthogonal component.

𝑣 = (2,3,5) = (2,3,0) + (0,0,5)

Here (2,3,0) = 2(1,0,0) + 3(0,1,0) ϵ W and (0,0,5) ∈ 𝑊 ⊥.

5. Orthogonal Projection Operator (𝑷𝑾 )

Let V be an inner product space and W be a subspace of V. Also, let {𝑤1 , … , 𝑤𝑘 } be an

orthogonal basis of W.

The Linear transform 𝑃𝑊 : 𝑉 → W define as

𝑘
⟨𝑣, 𝑤𝑖 ⟩
𝑃𝑊 (𝑣) = ∑ 𝑤
⟨𝑤𝑖 , 𝑤𝑖 ⟩ 𝑖
𝑖=1

is called the orthogonal projection onto subspace 𝑊.

Example: In ℝ3 , 𝑊 = span{(3,0,1), (0,1,0)}, for 𝑣 = (0,3,10), 𝑃𝑊 𝑣 = (3,3,1).

⟨𝑣, 𝑤1 ⟩ ⟨𝑣, 𝑤2 ⟩
𝑃𝑊 (𝑣) = 𝑤1 + 𝑤
⟨𝑤1 , 𝑤1 ⟩ ⟨𝑤𝑖 , 𝑤2 ⟩ 2
10 3
= (3,0,1) + (0,1,0) = (3,0,1) + (0,3,0)
10 1
= (3,3,1)

 Residual: 𝑣 − 𝑃𝑊 (𝑣) = (0,3,10) − (3,3,1) = (−3,0,9) ∈ 𝑊 ⊥ .

6. Properties of Orthogonal Projection

 For any vector v in V, 𝑣 − 𝑃𝑊 𝑣 ⊥ 𝑊(orthogonal to W).

 𝑃𝑊 (𝑤) = 𝑤 for all 𝑤 ∈ 𝑊(fixes the subspace).

2
 𝑃𝑊 = 𝑃𝑊 (idempotent).

 For any vector v in V, ∥ 𝑃𝑊 𝑣 ∥≤∥ 𝑣 ∥(non-expansive).

 For any vector v in V and w in W,

‖𝑣 − 𝑃𝑊 (𝑣)‖ ≤∥ 𝑣 − 𝑤 ∥

𝑃𝑊 (𝑣) = 𝑎𝑟𝑔𝑤∈𝑊 𝑚𝑖𝑛⟦𝑣 − 𝑤⟧

Example: Closest Point in Subspace

 Find the closest point to 𝑣 = (2,4,0, −2), in 𝑊 = span{(1,1,0,0), (0,0,1,1)}.

Sol: Let 𝑤1 = (1,1,0,0), 𝑤2 = (0,0,1,1) then the closest point to 𝑣 is
⟨𝑣, 𝑤1 ⟩ ⟨𝑣, 𝑤2 ⟩
w
̂= 𝑤1 + 𝑤
⟨𝑤1 , 𝑤1 ⟩ ⟨𝑤𝑖 , 𝑤2 ⟩ 2
6 −2
= (1,1,0,0) + (0,0,1,1) = (3,3,0,0) + (0,0, −1, −1)
2 2
= (3,3, −1, −1) =3(1,1,0,0)+(-1)(0,0,1,1) ∈ 𝑊.

7. Projection Transformation

Any linear map 𝑃 that satisfies 𝑃2 = 𝑃 is called a projection, So 𝑃𝑊 is a projection.

 Orthogonal projections satisfy this; identity map is a trivial case.

8. Applications in Machine Learning

 Used in dictionary learning, nearest vector approximation in subspaces.

 Aids classification (e.g., closest point in class subspace).

 Solves min ∥ 𝑣 − 𝑤 ∥for 𝑤 ∈ 𝑊.

Special Matrices and Properties
Overview

 Continuation from eigenvalues/eigenvectors for symmetric/orthogonal matrices.

Emphasizes special matrices, particularly positive definite matrices, with applications in
convex optimization and machine learning.

 Key Concepts: Quadratic forms, Rayleigh quotient, positive (semi-)definite matrices,

tests for definiteness, and geometric interpretations.

1. Quadratic Form

Quadratic form

Let 𝐴 ∈ ℝ𝑛×𝑛 be a symmetric matrix. The expression

𝑞(𝑋) = 𝑋 𝑇 𝐴𝑋

is called a Quadratic form.

Example: Given
1 −1 2
𝐴 = (−1 3 1)
2 1 4
Then for 𝑋 = (𝑥1 , 𝑥2 , 𝑥3 )𝑇 , we have

𝑞(𝑋) = 𝑋 𝑇 𝐴𝑋 = 𝑥12 + 3𝑥22 + 4𝑥32 − 2𝑥1 𝑥2 + 4𝑥1 𝑥3 + 2𝑥2 𝑥3

Similarly, if

𝑞(𝑋) = 𝑥12 + 4𝑥1 𝑥2 + 𝑥22 + 2𝑥32 + 6𝑥2 𝑥3 , then the associated symmetric matrix 𝐴 is
1 2 0
𝐴 = (2 1 3)
0 3 2
This is standard theory: every quadratic form in real variables with a symmetric matrix
corresponds uniquely via the identity

𝑞(𝐱) = ∑ 𝑎𝑖𝑖 𝑥𝑖2 + 2 ∑ 𝑎𝑖𝑗 𝑥𝑖 𝑥𝑗

𝑖 𝑖<𝑗

(where the factor of 2 on cross terms ensures symmetry of A).

Example (for 3x3 matrix): 𝑄(𝐱) = 𝑎11 𝑥12 + 𝑎22 𝑥22 + 𝑎33 𝑥32 + 2𝑎12 𝑥1 𝑥2 + 2𝑎13 𝑥1 𝑥3 +
2𝑎23 𝑥2 𝑥3 .
2. Rayleigh Quotient

For a real n × n matrix A, we have

𝑋 𝑇 𝐴𝑋
𝑅𝐴 (𝑋) = where 𝑋 ≠ 0
𝑋𝑇 𝑋
Properties:

 Scale invariance: For any 𝑋 ≠ 0and any scalar 𝛼 ≠ 0,

𝑅𝐴 (𝑋) = 𝑅𝐴 (𝛼𝑋)

(This follows directly because both numerator and denominator scale by 𝛼 2 , so they cancel.)

 If 𝑋is an eigenvector of A with eigenvalue 𝜆, then

𝑅𝐴 (𝑋) = 𝜆
𝑋 𝑇 𝐴𝑋
(Proof: 𝐴𝑋 = 𝜆𝑋 ⇒ 𝑋 𝑇 𝐴𝑋 = 𝜆𝑋 𝑇 𝑋 ⇒ = 𝜆.)
𝑋𝑇𝑋

 𝜆min (𝐴) = min𝑋≠0 𝑅𝐴 (𝑋)

 For any 𝐱 such that (∥ 𝐱 ∥2 = 1), we have

𝜆min (𝐴) ≤ 𝑋 𝑇 𝐴𝑋 ≤ 𝜆max

This equality holds iff 𝐱 is corresponding eigenvector

 For any 𝑋 ≠ 0, we have

𝜆min (𝐴) = min 𝑅𝐴 (𝑋) ≤ 𝑅𝐴 (𝑋) ≤ max 𝑅𝐴 (𝑋) = 𝜆max (𝐴)

(The minimum value of the Rayleigh quotient over all non-zero vectors is exactly the smallest
eigenvalue of A.)

3. Positive (Semi-)Definite Matrices

 Positive Semi-Definite (PSD): A Symmetric 𝐴 ∈ ℝ𝑛×𝑛 is said to be positive semi definite

if for all 𝐱 ∈ ℝ𝑛

𝑞(𝐱) = 𝐱 𝑇 𝐴𝐱 ≥ 0

 Positive Definite (PD): 𝐱 𝑇 𝐴𝐱 > 0 for all 𝐱 ≠ 0.

 Equivalent Conditions:

o All eigenvalues ≥ 0 (PSD) or > 0 (PD).

o All principal minors ≥ 0 (PSD) or > 0 (PD) via Sylvester's Criterion (P-test).
1 3
 Example: 𝐴 = [ ]→ 𝑄(𝐱) = (𝑥1 + 3𝑥2 )2 ≥ 0(PSD).
3 10
4. Checking Positive Definiteness

 Eigenvalue Test: Compute eigenvalues; all positive for PD and all non-negative for PSD

 P-Test (Leading Principal Minors):

o All must be positive for PD and all non-negative for PSD.

2 −1
o Example: 𝐴 = [ ]:
−1 2
 1×1: 2 > 0

 2×2: det(A) = 3 > 0 → PD.

2 2 −1
 Problem Example: For 𝐴 = [ 2 2 −1]:
−1 −1 𝑏
o PSD if -1 ≤ b ≤ 2; PD if -1 < b < 2.

Note:

 For any real 𝐴 ∈ ℝ𝑛×𝑛 : 𝐴𝑇 𝐴 is always PSD.

 If 𝐴 has full column rank (trivial null space i.e Nullspace(A)={0}): 𝐴𝑇 𝐴 is PD.

 Regularization: If 𝐴is PSD and 𝜖 > 0, then 𝐴 + 𝜖𝐼is PD.

5. The geometry of PD quadratic forms

The isocontour of 𝑞(𝐱) = 𝐱 𝑇 𝐴𝐱 are ellipsoids s.t. the axes point in the direction of the
eigenvectors of A, and the radii of these axes are proportional to the inverse square root of the
corresponding eigenvalues.

Level sets of PD quadratic form 𝐱 𝑇 𝐴𝐱 = 𝑐(c > 0) form ellipsoids.

1
 Axes align with eigenvectors; semi-axes lengths ∝ .
√𝜆𝑖
Spectral Decomposition
1. Eigen Decomposition

 Definition: For an 𝑛 × 𝑛 square matrix 𝐴, eigen decomposition expresses 𝐴as 𝐴 =

𝑃𝐷𝑃 −1, where:

o 𝑃is an invertible matrix with columns as the eigenvectors 𝑣1 , 𝑣2 , … , 𝑣𝑛 of 𝐴.

o 𝐷is a diagonal matrix with eigenvalues 𝜆1 , 𝜆2 , … , 𝜆𝑛 on the diagonal.

 Requirements: 𝐴must have 𝑛linearly independent eigenvectors, forming a basis for ℝ𝑛 .

 Not all square matrices can be eigen decomposed.

2. Necessary Conditions for Eigen Decomposition

For each eigenvalue, the algebraic multiplicity (from the characteristic polynomial) must equal
the geometric multiplicity (dimension of the eigenspace).

 If this holds for all eigenvalues, the eigenvectors span the full space, making 𝑃invertible.’

3. Counterexample: Matrix Without Eigen Decomposition

 Matrix:
1 1 1
𝐴 = [0 1 1]
0 0 1

 Eigenvalues: 𝜆 = 1(algebraic multiplicity 3).

 Eigenvectors: Only one independent eigenvector (e.g., [1,0,0]𝑇 ), so geometric

multiplicity = 1.

 Result: Cannot form a full 3 × 3 invertible 𝑃; decomposition impossible.

4. Spectral Decomposition (for Symmetric Matrices)

 Symmetric Matrix: 𝐴 = 𝐴𝑇 .

 Properties:

o Always has eigen decomposition.

o Eigenvectors are orthogonal (can be chosen to form an orthonormal basis).

o 𝑃is orthogonal, so 𝑃−1 = 𝑃𝑇 .

 Form: 𝐴 = 𝑃𝐷𝑃𝑇 .

 Alternative Expression:
𝑛

𝐴 = ∑ 𝜆𝑖 𝑣𝑖 𝑣𝑖𝑇
𝑖=1

o Each 𝜆𝑖 𝑣𝑖 𝑣𝑖𝑇 is a rank-1 matrix (outer product of eigenvector with itself).

o 𝑣𝑖 𝑣𝑖𝑇 projects orthogonally onto the 1D subspace spanned by 𝑣𝑖 .

5. Properties of Spectral Decomposition

 Projection Interpretation: Each term 𝑣𝑖 𝑣𝑖𝑇 (when 𝑣𝑖 is unit length) is an orthogonal

projection matrix.

 Reconstruction: The sum of these rank-1 projections exactly recovers 𝐴.

 Orthogonality: Ensures the projections are mutually orthogonal.

6. Example: 2x2 Symmetric Matrix

 Matrix:
2 1
𝐴=[ ]
1 2
 Eigenvalues: 𝜆1 = 3, 𝜆2 = 1.

 Eigenvectors: 𝑣1 = [1,1]𝑇 , 𝑣2 = [−1,1]𝑇 (orthogonal).

1 1
 Normalized: 𝑣̂1 = [1,1]𝑇 , 𝑣̂2 = [−1,1]𝑇 .
√2 √2

1 1
− 3 0
 𝑃 = [√2
1
√2
1 ], 𝐷=[ ].
0 1
√2 √2

 Decomposition: 𝐴 = 3𝑣̂1 𝑣̂1𝑇 + 1𝑣̂2 𝑣̂2𝑇 .

 Each 𝑣̂𝑖 𝑣̂𝑖𝑇 is a rank-1 projection.

7. Applications

 Principal Component Analysis (PCA).

 Spectral clustering.
 Statistical machine learning algorithms.

 Reinforcement learning.

 Image processing.

 Recommender systems.

Note:

 This decomposition is foundational for many linear algebra applications in data science
and ML.
Singular Value Decomposition (SVD)
1. From Symmetric to Rectangular Matrices

 For a symmetric square matrix 𝐴, eigenvalue decomposition: 𝐴 = 𝑃𝐷𝑃𝑇 ( 𝐷diagonal with

eigenvalues; 𝑃orthonormal eigenvectors).

 For rectangular 𝑚 × 𝑛matrix 𝐴(𝑚 ≠ 𝑛), standard diagonalization fails—SVD provides an

analogous factorization.

2. Singular Values

 Compute 𝐴𝑇 𝐴( 𝑛 × 𝑛symmetric, positive semi-definite matrix).

 Eigenvalues of 𝐴𝑇 𝐴: 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑛 ≥ 0.

 Singular values: 𝜎𝑖 = √𝜆𝑖 (non-negative, ordered decreasingly: 𝜎1 ≥ ⋯ ≥ 𝜎𝑛 ≥ 0).

 Number of non-zero 𝜎𝑖 equals rank(𝐴) = 𝑟.

3. SVD Factorization

 𝐴 = 𝑈Σ𝑉 𝑇 , where:

o 𝑈: 𝑚 × 𝑚orthogonal matrix (columns: orthonormal eigenvectors of 𝐴𝐴𝑇 ).

o Σ: 𝑚 × 𝑛"diagonal" matrix (singular values 𝜎1 , … , 𝜎𝑟 on main diagonal; zeros

elsewhere).

 If 𝑚 > 𝑛: "tall" (extra zero rows).

 If 𝑚 < 𝑛: "fat" (extra zero columns).

o 𝑉: 𝑛 × 𝑛orthogonal matrix (columns: orthonormal eigenvectors of 𝐴𝑇 𝐴, ordered

by decreasing 𝜆𝑖 ).

4. Constructing 𝑼and 𝑽

 𝑉's columns: Eigenvectors of 𝐴𝑇 𝐴(corresponding to 𝜆𝑖 = 𝜎𝑖2 ).

 𝑈's columns:

o Eigenvectors of 𝐴𝐴𝑇 ( 𝑚 × 𝑚symmetric).

𝐴𝑣𝑖
o Or, for 𝑖 = 1to 𝑟: 𝑢𝑖 = (using 𝑉's columns).
𝜎𝑖

 Remaining 𝑈columns (if 𝑚 > 𝑟): Orthogonal to the first 𝑟(from null space of 𝐴𝐴𝑇 ).
𝟎 𝟏
5. Example 1: 3×2 Matrix 𝑨 = [𝟏 √𝟐]
𝟎 𝟏
1 √2 1
𝑇
 𝐴𝐴 = [√2 3 √2].
1 √2 2

 Eigenvalues: 𝜆 = 8,2,0→ 𝜎1 = 2√2, 𝜎2 = √2, 𝜎3 = 0(rank = 2).

2√2 0
 Σ=[ 0 √2](3×2).
0 0
 𝑈: Eigenvectors of 𝐴𝐴𝑇 (normalized).

 𝑉: From 𝐴𝑇 𝐴eigenvectors.

 Verify: 𝐴 = 𝑈Σ𝑉 𝑇 .
𝟒 𝟖
6. Example 2: 3×2 Matrix 𝑨 = [𝟏𝟏 𝟕]
𝟏𝟒 −𝟐
327 96
 𝐴𝑇 𝐴 = [ ].
96 147
 Eigenvalues: 𝜆1 = 360, 𝜆2 = 90(trace/det method) → 𝜎1 = 6√10, 𝜎2 = 3√10(rank = 2).

 𝑉: Eigenvectors of 𝐴𝑇 𝐴(e.g., 𝑣1 ≈ [0.894,0.447]𝑇 , 𝑣2 ≈ [−0.447,0.894]𝑇 ; complete to

orthogonal basis).
𝐴𝑣1
 𝑈: 𝑢1 = ≈ [0.632,0.632,0.447]𝑇 , similarly for 𝑢2 ; third column orthogonal.
𝜎1

6√10 0
 Σ=[ 0 3√10].
0 0
 Verify decomposition holds.

7. Key Properties & Teaser

 SVD reveals matrix "structure" via singular values (like eigenvalues for non-square cases).

 Rank(𝐴) = number of non-zero singular values.

 Next lecture: Applications (e.g., least squares, PCA, image compression).

SVD - Properties and Applications

1. Geometrical Interpretation of SVD

 SVD decomposes a matrix 𝐴as 𝐴 = 𝑈Σ𝑉 𝑇 , where:

o 𝑈and 𝑉are orthogonal matrices (rotations/reflections).

o Σis a diagonal matrix with singular values 𝜎𝑖 (non-negative, ordered decreasingly).

 Applying SVD to a unit circle:

o 𝑉 𝑇 : Rotates the circle.

o Σ: Scales along principal axes (stretches into an ellipse if singular values differ).

o 𝑈: Rotates the result.

 Overall: SVD represents linear transformations as rotation → scaling → rotation.

2. Properties of SVD

 For a matrix 𝐴(m × n) of rank 𝑟:

o Number of non-zero singular values = rank 𝑟.

 Range Space (Column Space):

o First 𝑟columns of 𝑈form an orthonormal basis for range of 𝐴.

 Null Space:

o Last 𝑛 − 𝑟columns of 𝑉(for zero singular values) form an orthonormal basis for
null space of 𝐴.

 For 𝐴𝑇 (transpose):

o Range space: First 𝑟columns of 𝑉.

o Null space: Last 𝑚 − 𝑟columns of 𝑈.

 Utility: SVD directly computes rank, range, and nullity without solving systems.

3. Pseudo-Inverse via SVD

 Moore-Penrose pseudo-inverse 𝐴+ = 𝑉Σ + 𝑈 𝑇 , where Σ + is the pseudo-inverse of Σ.

 Constructing Σ + :
1
o For 𝜎𝑖 > 0: Diagonal entry = 𝜎 .
𝑖
o For 𝜎𝑖 = 0(or near-zero): Set to 0 (avoids instability).

 Example (2×3 matrix 𝐴):

o Singular values: 6√10, 3√10.

o Σ + formed by reciprocals on diagonal.

o Verifies 𝐴𝐴+ = 𝐼(identity for compatible dimensions).

 Role: Solves least-squares problems for over/under-determined systems.

4. Matrix Norms
∥𝐴𝑥∥
 Operator Norm:∥ 𝐴 ∥= sup 𝑥≠0 ∥𝑥∥ (induced by vector norm).

 Common Induced Norms:

o 1-Norm (∥ 𝐴 ∥1 ): Maximum absolute column sum: max 𝑗 ∑𝑖 ∣ 𝑎𝑖𝑗 ∣.

o ∞-Norm (∥ 𝐴 ∥∞ ): Maximum absolute row sum: max 𝑖 ∑𝑗 ∣ 𝑎𝑖𝑗 ∣.

o Spectral Norm (∥ 𝐴 ∥2 ): Largest singular value 𝜎1 (Euclidean-induced).

2
 Frobenius Norm (∥ 𝐴 ∥𝐹 ):√∑ 𝑎𝑖𝑗 = √\trace(𝐴𝑇 𝐴) (like vector 2-norm of entries).
𝑖,𝑗

 Unitary Invariance: Norms unchanged under orthogonal transformations: ∥ 𝑈𝐴𝑉 ∥=∥

𝐴 ∥for orthogonal 𝑈, 𝑉.

 Example: Compute norms for a sample matrix to illustrate.

5. Applications of SVD

 Compute pseudo-inverse for linear systems.

 Find bases for range/null spaces.

 Dimensionality reduction (preview: low-rank approximation in ML, e.g., compressing

data while preserving structure).

 Numerical stability: SVD is robust for ill-conditioned matrices.

Low Rank Approximations
1. Review of Matrix Rank

 Definition: Rank of a matrix 𝐴is the dimension of its range space (column space), or
equivalently, the number of linearly independent rows/columns.

 Examples:

o Identity matrix 𝐼𝑛 : Rank = 𝑛(full rank).

o Matrix with dependent rows (e.g., third row = row1 + row2): Rank drops to 2 for
a 3×3 matrix.

 Key Properties:

o rank(𝐴) = rank(𝐴𝑇 )(row rank = column rank).

o rank(𝑃𝐴𝑄) = rank(𝐴)if 𝑃and 𝑄are invertible.

o rank(𝐴) =number of non-zero singular values in SVD.

o rank(𝐴𝐵) ≤ min(rank(𝐴) , rank(𝐵)).

𝐴 0
o For block matrices: Inequalities like rank ( ) ≤ rank(𝐴) + rank(𝐵).
𝐶 𝐵
2. Matrix Factorization

 Concept: For an 𝑚 × 𝑛matrix 𝐴of rank 𝑟, columns of 𝐴are linear combinations of 𝑟basis
vectors from the range space.

 Factorization Form:𝐴 = 𝐵𝐶 𝑇 , where 𝐵is 𝑚 × 𝑟(basis vectors) and 𝐶 𝑇 is 𝑟 ×

𝑛(coefficients).

 Storage Efficiency: Reduces from 𝑚𝑛elements to 𝑚𝑟 + 𝑛𝑟.

o Example: 50×100 matrix (rank 20) stores as ~3,000 elements vs. 5,000—
motivates low-rank methods for large data (e.g., images, ML models).

3. Low-Rank Approximation

 Definition: Find 𝐴̃𝑘 (𝑚 × 𝑛, rank 𝑘 ≤ 𝑟) minimizing Frobenius norm ∥ 𝐴 − 𝐴̃𝑘 ∥𝐹 =

√∑𝑖,𝑗( 𝑎𝑖𝑗 − 𝑎̃𝑖𝑗 )2 .

 Optimization Challenge: Non-convex problem.

o Example: Rank-1 approximation of 2×2 identity matrix. Convex combination of
two rank-1 matrices yields rank-2, not rank-1 (violates convexity).

4. Solution via Singular Value Decomposition (SVD)

 SVD Recap:𝐴 = 𝑈Σ𝑉 𝑇 , where Σhas singular values 𝜎1 ≥ 𝜎2 ≥ ⋯ ≥ 0.

 Best Rank-𝑘 Approximation:𝐴̃𝑘 = ∑𝑘𝑖=1 𝜎𝑖 𝑢𝑖 𝑣𝑖𝑇 (keep top 𝑘singular values, zero others).

 Theorem: Eckart-Young—guarantees minimal Frobenius norm error for this 𝐴̃𝑘 .

 Examples:

o 5×5 matrix with singular values [3, 1, 0.5, 0.2, 0.05]: Rank-3 approx. retains first
three.

o Upper triangular 3×3 matrix (rank 3): Rank-2 SVD approx. changes entries
minimally (e.g., 3 → 2.99, 2 → 2.01).

5. Applications and Visual Examples

 Image Compression: 266-rank grayscale image approximated at ranks 1, 4, 10, 25.

o Rank-1: Basic outline (poor fidelity).

o Rank-4/10: Adds details (edges, shadows).

o Rank-25: Near-original quality—compresses data while preserving visuals.

Matlab Implementation of SVD and Low-Rank
Approximation
clear; clc;

format short; % (use format long for more decimals if needed)

% Define 3×4 matrix A

A = [1 2 3 4;

1 1 2 3;

0 1 1 0];

disp('Matrix A:');

disp(A);

% SVD Computation (full matrices by default in MATLAB)

[U, S, V] = svd(A); % U: 3×3 orthogonal, S: 3×4 diagonal-ish, V: 4×4 orthogonal

% Note: MATLAB returns V (not V'), so V' is the conjugate transpose

disp('U (3×3):');

disp(U);

disp('Singular values (diag elements of S):');

disp(diag(S)); % or just disp(S) to see full matrix

disp('V (4×4):');

disp(V);

% Verification: orthogonality

disp('U * U'' (should be identity):');

disp(U * U');

disp('V * V'' (should be identity):');

disp(V * V');

% Rank: number of non-zero singular values (all 3 here → rank = 3)

tol = 1e-10;

rank_A = sum(diag(S) > tol);

disp(['Rank of A = ' num2str(rank_A)]);

% Reconstructing A = U * S * V'

reconstructed = U * S * V';

disp('Reconstructed A:');

disp(reconstructed);

disp('Max absolute difference |A - reconstructed|:');

disp(max(abs(A(:) - reconstructed(:)))); % should be ~1e-15 or machine epsilon

% Low-Rank Approximation (Rank-2)

S2 = S;

S2(3,3) = 0; % zero out the third singular value

approx_rank2 = U * S2 * V';

disp('Rank-2 approximation:');

disp(approx_rank2);

disp('Max absolute difference from original:');

disp(max(abs(A(:) - approx_rank2(:)))); % small error expected

% =========================================================================

% SVD for 5×4 Matrix (Rows > Columns)

% =========================================================================

% Example taller matrix (modify/extend A or define new one)

A_tall = [1 2 3 4;

1 1 2 3;

0 1 1 0;

2 0 1 4;

3 1 0 2]; % 5×4 example

[U_tall, S_tall, V_tall] = svd(A_tall);

disp('For 5×4 matrix:');

disp('U: 5×5 orthogonal');

disp('S: 5×4 (diagonal in top 4×4 block)');

disp('V: 4×4 orthogonal');

% Last singular value typically near zero → rank = 4 (full column rank)

disp('Singular values:');

disp(diag(S_tall(1:4,1:4))); % last one often ≈0 if rank deficient

% Same reconstruction works: A_tall ≈ U_tall * S_tall * V_tall'

% The implementation is dimension-agnostic in MATLAB

% =========================================================================

% 19:30 – Application: Image Compression via SVD (Conceptual Translation)

% =========================================================================

% MATLAB equivalent workflow (no direct cv2, but use imread/imresize/rgb2gray)

% Example using built-in or file/URL image

% img = imread('your_image.jpg'); % or imread(url) with webread + imdecode

% if size(img,3) == 3

% img_gray = rgb2gray(img); % or mean(img,3) for simple average

% end

% img_double = im2double(img_gray); % [0,1] range

% % Optional: standardize (zero mean, unit variance)

% img_vec = img_double(:);

% mu = mean(img_vec);

% sigma = std(img_vec);

% img_scaled = (img_double - mu) / sigma;

% % SVD

% [U_img, S_img, V_img] = svd(img_scaled, 'econ'); % economy size often preferred for large
images

% % Cumulative variance explained

% sing_vals = diag(S_img);

% variance_explained = cumsum(sing_vals.^2) / sum(sing_vals.^2);

% figure; plot(variance_explained); title('Cumulative Explained Variance');

% xlabel('Number of singular values'); ylabel('Fraction of variance');

% % Low-rank reconstruction (e.g., keep k components)

% k = 50; % choose based on variance plot

% Uk = U_img(:,1:k);

% Sk = S_img(1:k,1:k);

% Vk = V_img(:,1:k);

% recon = Uk * Sk * Vk';

% % Add back mean/std if standardized

% recon = recon * sigma + mu;

% % Clip & visualize

% recon = max(0, min(1, recon));

% figure; imshow(recon); title(['Reconstructed with k = ' num2str(k)]);

% Compression insight:

% - Original image: height × width values

% - Truncated SVD with k << min(m,n): store only k×(m + n) values ≈ much less storage

% - Small singular values → negligible visual contribution → discard for compression

% =========================================================================

% Key Takeaways (MATLAB style)

% =========================================================================

% • Use [U,S,V] = svd(A) for full; [U,S,V] = svd(A,'econ') for economy size
% • Reconstruct: A ≈ U * S * V'

% • Form low-rank approx by zeroing small diagonal entries in S

% • Very powerful for compression, noise reduction, PCA (U columns = principal components)

% • Applications: image compression, dimensionality reduction, latent semantic analysis, etc.

Chapter II 1
No ratings yet
Chapter II 1
4 pages
Vectors and Matrices in Data Science
No ratings yet
Vectors and Matrices in Data Science
33 pages
Linear Algebra Cheat Sheet Overview
No ratings yet
Linear Algebra Cheat Sheet Overview
5 pages
Linear Algebra Concepts and Applications
No ratings yet
Linear Algebra Concepts and Applications
20 pages
MPDS Part2
No ratings yet
MPDS Part2
41 pages
Matrix Algebra in Linear Algebra
100% (1)
Matrix Algebra in Linear Algebra
89 pages
Chapter 2 1
No ratings yet
Chapter 2 1
14 pages
PODS2 - Essential Mathematics I - Linear Algebra Vectors
No ratings yet
PODS2 - Essential Mathematics I - Linear Algebra Vectors
30 pages
Matrix and Vector Algebra Basics
No ratings yet
Matrix and Vector Algebra Basics
49 pages
Understanding Vector Spaces in Linear Algebra
No ratings yet
Understanding Vector Spaces in Linear Algebra
6 pages
Understanding Vector Spaces in Mathematics
100% (1)
Understanding Vector Spaces in Mathematics
20 pages
Linear Algebra Concepts for Machine Learning
No ratings yet
Linear Algebra Concepts for Machine Learning
18 pages
Linear Algebra and Its ModernApplications
No ratings yet
Linear Algebra and Its ModernApplications
8 pages
Chapter 1
No ratings yet
Chapter 1
13 pages
Algebra Course Overview and Topics
No ratings yet
Algebra Course Overview and Topics
371 pages
Linear Algebra for Machine Learning
No ratings yet
Linear Algebra for Machine Learning
33 pages
Abstract Linear Algebra Concepts
No ratings yet
Abstract Linear Algebra Concepts
60 pages
INT255 Unit 1
No ratings yet
INT255 Unit 1
94 pages
Linear Algebra Concepts and Exercises
No ratings yet
Linear Algebra Concepts and Exercises
143 pages
Mathematics for Data Science and AI
No ratings yet
Mathematics for Data Science and AI
21 pages
Term Paper (1) Flattened Flattened
No ratings yet
Term Paper (1) Flattened Flattened
72 pages
Vector Space
No ratings yet
Vector Space
22 pages
Numerical Methods in Computer Science
No ratings yet
Numerical Methods in Computer Science
20 pages
Vectors and Linear Dependence Overview
No ratings yet
Vectors and Linear Dependence Overview
10 pages
Linear Algebra II Overview
No ratings yet
Linear Algebra II Overview
35 pages
Mathematical Foundations for Data Science
No ratings yet
Mathematical Foundations for Data Science
24 pages
Subspace Concepts in Linear Algebra
No ratings yet
Subspace Concepts in Linear Algebra
11 pages
Linear Algebra Concepts and Definitions
No ratings yet
Linear Algebra Concepts and Definitions
8 pages
Vector Space and Linear Maps Report
No ratings yet
Vector Space and Linear Maps Report
14 pages
Vector Spaces and Operations Overview
No ratings yet
Vector Spaces and Operations Overview
8 pages
Linear Algebra Concepts Overview
No ratings yet
Linear Algebra Concepts Overview
10 pages
Vectors and Matrices: Addition Explained
No ratings yet
Vectors and Matrices: Addition Explained
66 pages
Understanding Vectors in Linear Algebra
No ratings yet
Understanding Vectors in Linear Algebra
21 pages
Vectors in Euclidean Space Overview
No ratings yet
Vectors in Euclidean Space Overview
11 pages
Linear Algebra Study Guide 2025
No ratings yet
Linear Algebra Study Guide 2025
50 pages
Vectors and Vector Spaces in Linear Algebra
No ratings yet
Vectors and Vector Spaces in Linear Algebra
33 pages
Abstract Linear Algebra Overview
No ratings yet
Abstract Linear Algebra Overview
48 pages
Introduction to Vectors and Operations
No ratings yet
Introduction to Vectors and Operations
16 pages
Understanding Vector Spaces and Operations
No ratings yet
Understanding Vector Spaces and Operations
8 pages
Discrete Math (Rosen Style) - Linear Algebra Basics
No ratings yet
Discrete Math (Rosen Style) - Linear Algebra Basics
10 pages
Vector Spaces and Operations Overview
100% (1)
Vector Spaces and Operations Overview
55 pages
Digital Image Processing Matrix Concepts
No ratings yet
Digital Image Processing Matrix Concepts
129 pages
Maximal Linearly Independent Groups
No ratings yet
Maximal Linearly Independent Groups
39 pages
Introduction to Vectors and Operations
No ratings yet
Introduction to Vectors and Operations
16 pages
CS III-Vectors in RN Linear Span Subspaces of RN
No ratings yet
CS III-Vectors in RN Linear Span Subspaces of RN
14 pages
Understanding Vector Spaces and Operations
No ratings yet
Understanding Vector Spaces and Operations
20 pages
Linear Algebra and Probability for Deep Learning
No ratings yet
Linear Algebra and Probability for Deep Learning
124 pages
Linear Algebra for Business Analytics
No ratings yet
Linear Algebra for Business Analytics
27 pages
Vector Space Project
No ratings yet
Vector Space Project
18 pages
Introduction to Linear Algebra Concepts
No ratings yet
Introduction to Linear Algebra Concepts
207 pages
Geometric Concepts in Linear Algebra
No ratings yet
Geometric Concepts in Linear Algebra
10 pages
Understanding Vector Algebra Basics
No ratings yet
Understanding Vector Algebra Basics
111 pages
Unit 8
No ratings yet
Unit 8
11 pages
Linear Algebra
No ratings yet
Linear Algebra
27 pages
Linear Algebra Basics for Data Science
No ratings yet
Linear Algebra Basics for Data Science
135 pages
Vector Spaces Study Guide
No ratings yet
Vector Spaces Study Guide
37 pages
Distech Controls Relay Specifications
No ratings yet
Distech Controls Relay Specifications
3 pages
Understanding Confidence Intervals
No ratings yet
Understanding Confidence Intervals
41 pages
August 2023 KPI Monthly Review Report
No ratings yet
August 2023 KPI Monthly Review Report
36 pages
T 3386e Thyricon
No ratings yet
T 3386e Thyricon
6 pages
Inter-Agency Cooperation in Nepal's Border Governance
No ratings yet
Inter-Agency Cooperation in Nepal's Border Governance
14 pages
Tumefactive Demyelination Imaging Review
No ratings yet
Tumefactive Demyelination Imaging Review
12 pages
C# Programming Overview and Features
No ratings yet
C# Programming Overview and Features
22 pages
Decision Tree Classifier with Tuning
No ratings yet
Decision Tree Classifier with Tuning
7 pages
Mycoplasma Detection 5994 2657EN Agilent
No ratings yet
Mycoplasma Detection 5994 2657EN Agilent
3 pages
Early Student Performance Prediction
No ratings yet
Early Student Performance Prediction
12 pages
Avnet Xilinx Product Selection Guide 2021 EN PDF
No ratings yet
Avnet Xilinx Product Selection Guide 2021 EN PDF
68 pages
Hard Ray Optics Questions for JEE/Boards
No ratings yet
Hard Ray Optics Questions for JEE/Boards
2 pages
Hybrid Nanocomposite Mechanical Properties
No ratings yet
Hybrid Nanocomposite Mechanical Properties
8 pages
Properties and Uses of Platinum
No ratings yet
Properties and Uses of Platinum
8 pages
Substitution Methods for ODEs
No ratings yet
Substitution Methods for ODEs
7 pages
Carton Packaging Design Insights
90% (10)
Carton Packaging Design Insights
93 pages
Genetic Parameters of Milk EC in Cows
No ratings yet
Genetic Parameters of Milk EC in Cows
8 pages
Math Lesson Plan: Data & Statistics for Grades 5-6
No ratings yet
Math Lesson Plan: Data & Statistics for Grades 5-6
3 pages
RHCE7 Exam Review
No ratings yet
RHCE7 Exam Review
6 pages
Dynamic Analysis of Locomotive Wheel Axle
No ratings yet
Dynamic Analysis of Locomotive Wheel Axle
16 pages
Standard Sizes for Super T Girder Design
No ratings yet
Standard Sizes for Super T Girder Design
1 page
Intelligent Slipform Paver: The Worldwide Leader in Concrete Paving Technology
No ratings yet
Intelligent Slipform Paver: The Worldwide Leader in Concrete Paving Technology
8 pages
IT Basics: Computer Hardware & Software Quiz
No ratings yet
IT Basics: Computer Hardware & Software Quiz
3 pages
Engineering Dynamics Problem Set 10
No ratings yet
Engineering Dynamics Problem Set 10
5 pages
TSO/ISPF Overview and Logoff Guide
No ratings yet
TSO/ISPF Overview and Logoff Guide
31 pages
Combination Meter Assembly Testing Guide
No ratings yet
Combination Meter Assembly Testing Guide
12 pages
Market Equilibrium Analysis of Soup and Cobalt
No ratings yet
Market Equilibrium Analysis of Soup and Cobalt
9 pages
NC2x MIDI Implementation Overview
No ratings yet
NC2x MIDI Implementation Overview
9 pages
IGCSE Moles and Chemical Equations Workbook
No ratings yet
IGCSE Moles and Chemical Equations Workbook
142 pages
Understanding Oxidation and Reduction
100% (1)
Understanding Oxidation and Reduction
36 pages