0% found this document useful (0 votes)
11 views6 pages

Randomized Algorithms for Matrix Operations

Randomized algorithms have gained prominence in the last 15 years, particularly for matrix operations in big data applications, addressing challenges like poor scalability of standard linear algebra methods. Key techniques include Strassen's algorithm for matrix multiplication, randomized SVD for approximating singular values, and the randomized Kaczmarz method for solving linear systems efficiently. These methods leverage randomness to reduce computational complexity and improve performance in handling large matrices.

Uploaded by

DHANRAJ Malla
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views6 pages

Randomized Algorithms for Matrix Operations

Randomized algorithms have gained prominence in the last 15 years, particularly for matrix operations in big data applications, addressing challenges like poor scalability of standard linear algebra methods. Key techniques include Strassen's algorithm for matrix multiplication, randomized SVD for approximating singular values, and the randomized Kaczmarz method for solving linear systems efficiently. These methods leverage randomness to reduce computational complexity and improve performance in handling large matrices.

Uploaded by

DHANRAJ Malla
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

1 Randomized Algorithms

This is a field that became large in the past 15 years, as a subfield of Linear
Algebra, Scientific Computing and especifically, big data. This field did not
receive much attention because of its lack of precision.
The main application of Randomized algorithms is matrix operations. In
many big–data tasks we are dealing with large matrices

• matrix multiplication (ex. covariance matrices),

• solving linear systems of equations. Furthermore, nonlinear system of


equations are usually solved iteratively by linear approximation,

• computing eigenvectors, singular vectors, eigenvalues, etc. ([Link],


PCA), and

• matrix factorization

One of the shortcomings on the computation of standard linear algebra


is that it scales poorly for large matrices (roughly, cubic complexity)

1.1 Matrix Multiplication


Let A, B be two n × n matrices. The cost for computing the product AB is
O(n3 ).

1.2 Strassen’s algorithm (1969)


Let’s decompose the matrix multiplication of A and B into blocks in the
following way
    
A11 A12 B11 B12 C11 C12
=
A21 A22 B21 B22 C21 C22

with each block of size n2 × n2 . Following the standard way of multiplication, it
requires 8 n2 × n2 matrix multiplications and 4 additions of n2 × n2 matrices.
Strassen’s algorithm can compute this block matrix multiplication using
only seven n2 × n2 matrix multiplication and 18 n2 × n2 additions.
3
• Cost for multipliying n2 × n2 is O n2

1
n n n 2

• Cost for adding 2
× 2
is O 2

Now, one can design a recursive procedure with blocks of size n2 , n4 , . . . , (up
to blocks of size 16) and end up with Strassen’s algorithm.
The total cost now is O(nlog2 7 ) ∼ O(n2.8074 ). Strassen’s algorithm has
been improved to O(n2.37 ), but the hidden constant increases significantly.
The stability of the algorithm is normal, but the communication cost is
higher, thus, these algorithms are not (yet) used in practice. Strassen’s
also show that matrix multiplication actually reduces the complexity of the
Gaussian elimination procedure.
Big question Can we construct an algorithm that multiplies two arbitrary
n × n matrices in O(n2+ ) or O(n2 log n) operations?

1.3 Power iteration


Let A be an n × n symmetric matrix, with largest eigenvalue λ1 , |λ1 | ≥
|λ2 | ≥ . . ., with corresponding eigenvector v1 . Choose x0 ∈ Rn randomly,
and compute
Axk
xk+1 = , µk+1 = xTk+1 Axk+1 , k = 0, . . . ,
kAxk k
then xk → v1 and µk → λ1 as k → ∞, with a rate of convergence given by
! !
2 2
λ2 λ2
|µk − λ1 | = O , kxk − v1 k = O
λ1 λ1

(if hx0 , v1 i =
6 0), which in large independent random data sets occurs with
probability one). These methods are called Lanczos-type methods, and in
matlab can be called by eigs. One major application for this is given by the
page-rank problem.

1.4 Randomized algorithms


Matrix multiplication
Let A be an m × n matrix, and B an n × p matrix. We want to approximate
AB at a cost smaller than O(mnp). Note that
n
X
AB = A(i) B(i) ,
i=1

2
where A(i) is the i-th row of A, and B(i) is the i-th column of B. Then, AB
is the sum of rank-1 matrices A(i) B(i) . P
The idea is to approximate AB = ni=1 A(i) B(i) by randomly sampling
rank-1 components, according to their size, by given probabilities {pi }ni=1 ,
where
kA(i) k2 kB(i) k2
pi = n
X
kA(i) k2 kB(i) k2
i=1

Note that, as A(i) B(i) are rank-1 matrices, it is easy to check that

kA(i) B(i) kop = kA(i) k2 kB(i) k2

We select a total number of c such rank-1 components A(it ) B(it ) with


probability P{ik = k} = pk , with replacement, such that

A(it ) B (it )
C (t) = √ , R(t) = √ ,
cpit cpit

and get
c
1 X 1 (it )
AB ≈ CR = A B(it )
C t=1 pit
One can show that in this case, Drineas-Mahoney
 (2011)
 [], and Shynin-
T kAkF kAk2F
Rudelson (2007) []: If B = A , set c ≈ 
log 2 √δ . Then, for any
0 <  < 1, with probability at least 1 − δ,

kAAT − CC T kop ≤ 

The proof uses a matrix concentration inequality, for example

Theorem 1 (Tropp). Let {Ak } be a finite sequence of fixed self-adjoint ma-


trices of size d×d, and let {ξk } be a sequence of Rademacher random variables
(i.e., ξ ∈ {−1, 1} with equal probability). Then, for all t ≥ 0,
( ! )
X t2
P λmax ξk Ak ≥ t ≤ d e− 2σ2 ,
k

where σ 2 = k k A2k kop .


P

3
This result can be related to the sum of zero-one variables. One needs to
consider the Symmetrization result:
X X
E (Xi − E(Xi )) ≤ 2E ξi Xi
i i

2 Randomized SVD
Given an m × n matrix A. SVD of A is given by A = UΣV∗ , where
U = [u1 , . . . , um ], V = [v1 , . . . , vn ], and Σ = diag(σ1 , . . . , σmin(m,n) ). Best
rank-k approximation to A is Ak = kj=1 σj uj vj∗ , and kA − Ak kop = σk+1 ,
P
which serves as the benchmark for approximation error.
Randomized SVD algorithm:
Step 1: (i) Let Ω be an n × 2k Gaussian random matrix. (ii) Form Y =
(AA∗ )q AΩ for some q ∈ N. Y is a matrix of size m × 2k. (iii) Construct a
matrix Q whose columns form an orthonormal basis for the range of Y.
Step 2: (i) Form B = Q∗ A, where B is a 2k × n matrix. (ii) Compute SVD
of B: B = Ũ ˜ Σ̃Ṽ∗ . (iii) Set Ũ = QŨ ˜ and get A ≈ ŨΣ̃Ṽ∗ .
Cost: Let Tmult be the cost of a matrix-vector multiplication by A or A∗ .
and then total cost of randomized SVD is Tmult (2q + 2)k + θ(k 2 (m + n)). In
contrast, full SVD costs θ(min(m, n) · m · n).
Theorem 2. Let A be an m × n matrix. Select an exponent q and a target
number of k singular vectors, where 2 ≤ k ≤ min(m, n)/2. Let ŨΣ̃Ṽ∗ be the
rank-2k factorization produced by randomized SVD. Then
" r 1
# 2q+1
2 · min(m, n)
E kA − ŨΣ̃Ṽ∗ kop ≤ 1 + 4 σk+1 .
k−1

In practice, we can truncate the approximate SVD and keep only the best
k singular values Σ̃k and corresponding singular vectors Ũk , Ṽk . The error
bound is
" r 1
# 2q+1
min(m, n)
E kA − Ũk Σ̃k Ṽk∗ kop ≤ 2 + 4 σk+1 .
k−1

Typically, q = 1 or 2 suffices. If singular values decay fast, often q = 0 is


okay. For practical implementation, the n × 2k Gaussian random matrix can

4
be replaced byp an n × ` randomly subsampled DFT or Hadamard matrix of
the form Ω = n` DFR, where D is an n × n diagonal matrix with random
±1 or random entries from the complex unit circle, F is the n×n unitary DFT
or Hadamard matrix, and R is an n × ` matrix that samples ` coordinates
from {1, 2, . . . , n} uniformly at random (without replacement). In theory,
instead of 2k, we need ` ∼ (2k + log n) · log k, but in practice, ` ∼ 2k suffices.
Full proof is omitted, but the key to the proof is an estimate for kA −
QQ∗ Akop . Concentration-of-measure bounds yield that kA − QQ∗ Akop con-
centrates tightly around E kA − QQ∗ Akop .

3 Randomized Kaczmarz Algorithm


Given Ax = y, if A is too large, we cannot load it into computer memory.
Instead, we can take “row action” methods that apply to rows of A individ-
ually rather than access to the entire matrix A. Let A be an m × n matrix.
Denote the rows of A by a∗1 , . . . , a∗m .
Kaczmarz method: Iteratively project orthogonally onto solution hyper-
plane of hai , xi = yi . Given an initial guess x0 , compute
yi − hai , xk i
xk+1 = xk + ai ,
kai k22
where i = k mod (m + 1). Convergence of Kaczmarz method is easy to
establish, but rate of convergence is not clear (m > 2). Moreover, in many
cases, empirical convergence can be very slow. Remedy is to pick up the
hyperplane onto which we project in some random order instead of the linear
order.
Randomized Kaczmarz method: Iterative projection onto solution hyper-
plane in some random order
yr(i) − har(i) , xk i
xk+1 = xk + ar(i) ,
kar(i) k22

where r(i) is chosen from {1, . . . , m} at random with probability proportional


to kar(i) k22 .
Theorem 3. Let A be an m × n matrix with m ≥ n and rank(A) = n.
Let x be the solution of Ax = y. Denote A† = (A∗ A)−1 A∗ . Then ran-
domized Kaczmarz method converges in expectation and E kxk − xk22 ≤ (1 −

5
κ(A)−2 )k kxo − xk22 , where κ(A) = kAkF kA† kop is scaled condition num-
ber of A. Expected number of iterations achieve accuracy  is E[k ] ∼
2κ(A)2 log(1/).

Full proof can be found in [1]. Note that randomized Kaczmarz method
is not useful if A has a lot of structures so that matrix-vector multiplication
is cheap.

References
[1] Strohmer, Thomas; Vershynin, Roman (2009), “A randomized Kacz-
marz algorithm for linear systems with exponential convergence”, Jour-
nal of Fourier Analysis and Applications 15: 262-278.

Common questions

Powered by AI

Randomized algorithms are mainly applied to matrix operations in big data and scientific computing tasks such as matrix multiplication, solving linear systems of equations, computing eigenvectors, singular vectors, eigenvalues, and performing matrix factorization .

Achieving O(n^2+ε) or O(n^2 log n) complexity for matrix multiplication would dramatically reduce computation time and resource usage, allowing for the handling of even larger datasets efficiently. This would enhance the practical applicability of many scientific computing applications, such as simulations and large-scale data analyses, overcoming current limitations associated with high computational costs .

Rademacher random variables, taking values ±1 with equal probability, are used in matrix concentration inequalities to bound the maximum eigenvalue of a sequence of self-adjoint matrices. They facilitate the randomization process by ensuring that the high-probability bounds on approximations are maintained, crucial for the stability and accuracy of randomized algorithms in approximating matrix operations .

Randomized SVD reduces computational costs by projecting the matrix onto a lower-dimensional subspace using a Gaussian random matrix, which requires fewer singular values and vectors to be computed directly. This approach lowers computational complexity from θ(min(m, n)·m·n) to a more feasible cost involving matrix multiplications and matrix additions, enhancing efficiency .

Strassen's algorithm reduces the number of multiplications needed for matrix multiplication from eight to seven for block matrices. The traditional method requires a computational cost of O(n^3), while Strassen's algorithm reduces it to O(n^log₂7) ≈ O(n^2.8074), which is a significant improvement .

In implementing randomized SVD, practical considerations include selecting an appropriate exponent q and the target number of k singular vectors, understanding the distribution of singular values, and efficiently constructing the Gaussian random matrix or using alternates like a randomly subsampled DFT or Hadamard matrix. Also, adjusting the truncation level based on decay rate of singular values can influence the effectiveness in practical applications .

The power iteration method is significant as it converges to the eigenvector corresponding to the largest eigenvalue of a symmetric matrix by iteratively multiplying by the matrix. It converges at a rate proportional to the ratio of the second largest eigenvalue to the largest eigenvalue squared, which ensures convergence for large independent random data sets with high probability .

Randomized algorithms approximate matrix products by randomly sampling rank-1 components (products of corresponding rows and columns of input matrices) using size-based probabilities. This method reduces the computational effort from O(mnp) to a more manageable cost with less precision, utilizing a matrix concentration inequality to account for approximation errors .

The randomized Kaczmarz method achieves faster convergence by selecting the hyperplane onto which projections are made based on probabilities proportional to the square of row norms, rather than using a deterministic order. This approach reduces the expected number of iterations, leading to exponential convergence in many scenarios .

Matrix factorization simplifies the representation of large data matrices, facilitating tasks like dimensionality reduction and noise filtering. However, without randomized algorithms, matrix factorization is often constrained by high computational costs and inefficiency, as traditional methods scale poorly with large matrix sizes, typically requiring cubic complexity .

You might also like