Matrix Decompositions Cheat Sheet
Numerical linear algebra course at Skoltech by Ivan Oseledets
Poster is prepared by TAs Maxim Rakhuba and Alexandr Katrutsa
Based on the idea by Skoltech students, NLA 2016
Name Definition ∃ ! Algorithms Use cases
values are I SVD via spectral decomposition of AA∗ and I Data compression, as Eckart-Young theorem
" # "σ1 # " #
I Singular
A= unique A∗A – stability issues states that truncated SVD
σr
2
I If all σi are different, U I Stable algorithm, O(mn ) flops (m > n):
" #
hσ i h i
m×m m×n n×n 1
∗
U Σ V and V are unique up 1. Bidiagonalize A by Householder reflections Ak = σk
k×k k×n
I r = rank(A) to unitary diagonal D: " # m×k
∗
U k Σ k V
SVD I U , V — unitary U ΣV ∗ = (U D)Σ(V D)∗ A = U1BV1∗ = U1 V1∗ k
yields best rank-k approximation to A in k · k2,F
I σ1 ≥ . . . ≥ σr > 0 are nonzero singular I If some σi coincide,
(Singular Value 2. Find SVD of B = U2ΣV2∗ by spectral decom- I Calculation of pseudoinverse A+, e.g. in solv-
values then U and V are not
Decomposition) position of T (2 options): ing over/underdetermined, singular, or ill-posed
I columns of U , V are singular vectors unique ∗B, don’t form T explicitly!
a) T = B linear systems
∗
h i
Note: SVD can be also defined with U ∈ Cm×p, b) T = B B , permute T to tridiagonal I Feature extraction in machine learning
Σ ∈ Rp×p and V ∈ Cn×p, p = min{n.m} 3. U = U1U2, V = V1V2 Note: SVD is also called principal component analysis (PCA)
Assuming m > n:
Not unique: I Model reduction, data compression, and
h I in A = CR version ∀S: I truncated SVD, O(mn2) flops, speedup of computations in numerical analy-
i
X =
det(S) 6= 0: sis: given rank-r matrix with r n, m one needs
r×n
C = Ur Σr , R = Vr∗
to store O((n + m)r) nm elements
V> CR = CSS −1R = C
eRe
m×r
I RRQR: O(mnr) flops I Feature extraction in machine learning, where
U b−1R
Skeleton or using matrix entries I in A = C
bA b version
I Cross approximation: O((n + m)r 2) flops. It it is also known as matrix factorization
any r linearly inde- is based on greedy maximization of | det(A)|. I All applications where SVD applies, since Skele-
b
i−1 h pendent columns Might fail on some A.
(also known as h i ton decomposition can be transformed into
= and rows can be
Rank
I Optimization methods (ALS, ...) for truncated SVD form
r×r r×n
chosen
decomposition) A
b−1 R
b kA − CRk → min,
m×n m×r C,R
A C
b
I r = rank(A)
sometimes with additional constraints, e.g.
– nonnegativity of C and R elements
I C and C
b are full column rank
– small norms of C and R
I R and R
b are full row rank
" #" #" #
λ1 I Not unique in terms of I QR algorithm, O(n4) flops: I Computation of matrix spectrum
A= both U and T : per- Ak = Qk Rk , Ak+1 = Rk Qk I Computation of matrix functions (Schur-Parlett
λn mutation of λ1, . . . , λn
n×n n×n n×n I “Smart” QR algorithm, O(n3) flops: algorithm)
U T U∗ in T will change both
I Solving matrix equations (e.g. Sylvester equa-
U and off-diagonal 1. Reduce A to upper Hessenberg form
I U is unitary " # tion)
Schur part of T
I λ1, . . . , λn are eigenvalues e = Q∗AQ =
A
I columns of U are Schur vectors
Note: then each iteration of QR algorithm will cost O(n2)
2. Run QR algorithm for A
e with shifting strat-
egy to speed-up convergence
" #"
λ1
#" #−1 I ∃ iff ∀λi its geo- I If all λi are differ- I If A = A∗, Jacobi method: O(n3) I Full spectral decomposition is rarely used unless
metric multiplic- ent, then unique up all eigenvectors are needed
A= I If AA∗ = A∗A, QR algorithm: O(n3)
λn
ity equals alge- to permutation and I If one needs only spectrum, Schur decomposi-
∗ ∗ 3
n×n n×n n×n braic multiplicity scaling of eigenvec- I If AA 6= A A, O(n ) flops: tion is the method of choice
Spectral S Λ S −1 tors 1. Find Schur form A = U T U ∗ via QR algo-
I ∃ and S – unitary I If matrix has no spectral decomposition, Schur
I λ1, . . . , λn are eigenvalues iff A is normal: I If some λi coincide, rithm
decomposition is preferable for numerics com-
I columns of S are eigenvectors AA∗ = A∗A, S is not unique 2. Given T find its eigenvectors V pared to Jordan form
e.g. Hermitian 3. S = U V , Λ = diag(T )
" # I Unique if all diagonal Assuming m > n: I Computation of orthogonal basis in a linear
elements of R are set I Gram-Schmidt (GS) process: 2mn2 flops; not space
A =
m≥n to be positive stable I Solving least squares problem (m > n):
kAx − bk2 → min ⇒ x = R−1Q∗b
n×n
m×n
R I modified Gram-Schmidt (MGS) process:
x
Q is left unitary 2mn2 flops; stable
QR 2
I via Householder reflections: 2mn − (2/3)n3 I Solving linear systems
Note: more stable, but has larger constant than LU
" # " # flops; best for dense matrices, sequential
computer architectures; stable I Don’t confuse QR decomposition and QR al-
A= m<n
I via Givens rotations: 3mn2 − n3 flops; best
gorithm!
m×m m×n for sparse matrices, parallel computer archi-
Q is unitary R tectures; stable
unique since I Basic algorithm: Householder QR with col- I Solving rank deficient least squares problem
" # " #
I Not
any r linearly inde- umn pivoting. On k-th iteration:
AP = I Finding subset of linearly independent columns
RRQR pendent columns 1. Find column of largest norm in R [:,k:n]
k
n×n r n−r n×n I Computation of matrix approximation of a
can be selected 2. Permute this column and the k-th column
Q is unitary R given rank
(Rank 3. Zero subcolumn of the k-th column by
Revealing QR) I P is permutation matrix Householder reflection → Rk+1
I r = rank(A) Complexity: O(nmr) flops
" #" #
1 Let det(A) 6= 0 I Unique if det(A) 6= 0 I Different versions of Gaussian elimination, LU, LDL, Cholesky are used for
LU A= I LU ∃ iff all leading O(n3) flops. In LU for stability use permuta- I solving linear systems. Given A = LU , complex-
1
n×n n×n minors 6= 0 tion of rows or columns (LUP) ity of solving Ax = b is O(n2):
L U I O(n3) can be decreased for sparse matri- 1. Forward substitution: Ly = b
Let det(A) 6= 0 ces by appropriate permutations, e.g.
1 1
2. Backward substitution: U x = y
LDL A = I LDL ∃ iff A = A∗ – minimum degree ordering I matrix inversion
1
n×n n×n
1
n×n
and all leading – Cuthill–McKee algorithm
∗ minors 6= 0 I computation of determinant
L D L I Banded matrix with bandwidth b
I Cholesky ∃ iff I Unique if A 0
" #" # Cholesky is also used for
A= A = A∗ and A 0
2b
Cholesky I computing QR decomposition
n×n n×n
∗
L L can be decomposed using O(nb2) flops
References Contact information
[1] G. H. Golub and C. F. Van Loan, Matrix computations, JHU Press, 4th ed., 2013. Course materials: [Link]
[2] L. N. Trefethen and D. Bau III, Numerical linear algebra, vol. 50, SIAM, 1997. Email: [Link]@[Link]
[3] E. E. Tyrtyshnikov, A brief introduction to numerical analysis, Springer Science & Business Media, 2012. Our research group website: [Link]
LATEX TikZposter