0% found this document useful (0 votes)

45 views12 pages

Shifted Power Method for Eigenvalues

Q: Why is it beneficial to structure matrices in either Hessenberg or symmetric tridiagonal form before applying iterative eigenvalue algorithms?

Structuring matrices in either Hessenberg or symmetric tridiagonal forms before applying iterative eigenvalue algorithms is beneficial due to the dramatic reduction in computational complexity and execution time. These forms simplify the matrix, enabling efficient application of rotations or reflections needed for factorization, which is fundamental in iterative techniques like the QR algorithm. Such forms not only simplify computational processes but also help maintain numerical stability by reducing off-diagonal element interference, which can lead to significant numerical errors in poorly conditioned matrices. This reduces the computational burden and enhances convergence speed, particularly important with large dimensional matrices .

Q: What distinguishes the QR algorithm from the power method in terms of convergent properties and mathematical requirements?

The QR algorithm is distinguished from the power method by its ability to compute all eigenvalues simultaneously and its convergence properties. Unlike the power method, which only finds the dominant eigenvalue, the QR algorithm does not require a dominant eigenvalue to converge. It is based on orthogonal similarity transformations and will converge for any n x n matrix. The QR algorithm also generally requires fewer iterations and is more robust against numerical instability caused by closely positioned eigenvalues, which can slow convergence in the power method. The QR method additionally avoids the difficulties of dealing with large powers of matrices by operating on the matrix in iterations involving QR factorizations which maintain the matrix's similarity class .

Q: What are the advantages and potential disadvantages of transforming a matrix to upper Hessenberg form before applying the QR algorithm?

The main advantage of transforming a matrix into upper Hessenberg form is the reduction in computational complexity for the QR algorithm, as it allows the factorization to be achieved with fewer Givens transformations, facilitating efficient calculations while maintaining numerical stability. Additionally, such transformations preserve matrix similarity, ensuring that eigenvalues remain unchanged. However, the transformation process itself can be computationally intensive and may introduce round-off errors, particularly if the initial matrix is not well-conditioned. Moreover, if not carefully implemented, the reduction to Hessenberg form might accumulate errors due to finite precision arithmetic .

Q: Discuss the process and reasons for using rotations like the Givens transformation in solving eigenvalue problems.

Rotations such as the Givens transformation are crucial in numerically stable solutions to eigenvalue problems, particularly within the QR algorithm's context. The process involves applying orthogonal rotations to selectively zero out elements below the diagonal in the transformation of a matrix to upper Hessenberg form. This controlled annihilation simplifies subsequent factorizations, allowing fewer operations and reducing numerical error propagation. The essence of using such rotations lies in their ability to preserve the orthogonality of transformations, thus maintaining the condition number of the matrix, which is central to numerical stability. Since eigenvalue problems are sensitive to perturbations, maintaining stability is vital for ensuring the accuracy of computations over multiple iterations .

Q: What role does numerical stability play in the context of computing eigenvalues and eigenvectors, and how can it be maintained?

Numerical stability is critical in computing eigenvalues and eigenvectors as it determines the resistance of algorithms to perturbations from rounding errors and approximation inaccuracies. To maintain stability, care must be taken in selecting and applying transformations during matrix computations. For instance, the use of orthogonal transformations like Givens and Householder in reducing a matrix to a simpler form, such as Hessenberg, helps preserve stability by maintaining the matrix's condition and preventing the amplification of errors. Algorithms such as the QR method benefit significantly, as they involve repeated factorizations that risk error propagation if not implemented stably. Pre-processing matrices to more treatable forms before iteration is a preferred strategy for maintaining stability .

Q: How do Givens transformations operate within the QR algorithm, and what role do they play in maintaining numerical stability?

Givens transformations perform rotations in the plane of two coordinate axes to zero out specific sub-diagonal elements, which is crucial for obtaining the R matrix in the QR factorization of an upper Hessenberg matrix. Each transformation, G_ij, operates exactly on elements in the i-th and j-th positions, applying a rotation that annihilates the desired element while preserving orthogonality and thus numerical stability. By maintaining the form of Hessenberg matrices, Givens transformations prevent significant numerical deterioration by avoiding the excessive growth of entries, which could occur in less structured transformations. This contributes to the QR algorithm's robust handling of various matrices, ensuring that orthogonal matrices derived in factorization remain well-conditioned .

Q: Explain how the Householder transformation is utilized to assist in eigenvalue computation for a matrix.

The Householder transformation is used in the reduction of a matrix to a simpler form, specifically upper Hessenberg form, which facilitates eigenvalue computation. The transformation is orthogonal and involves constructing Householder matrices that zero out below-diagonal elements of a given column. By selectively transforming A into a Hessenberg matrix using sequences of Householder transformations, iterative algorithms such as the QR algorithm can perform more efficiently. This process not only reduces computational cost but also leads to more stable numerical solutions by leveraging the orthogonality of Householder matrices to preserve matrix norms during transformations .

Q: How does the power method help in finding the dominant eigenvalue of a matrix, and what are the potential difficulties associated with this method?

The power method helps find the dominant eigenvalue of a matrix A by iteratively applying A to an arbitrary nonzero vector, v0, in Rn. The method exploits the property that if the eigenvalues satisfy |λ1| > |λ2| ≥ · · · ≥ |λn|, then for a sufficiently large k, Akv0 will approximate an eigenvector corresponding to λ1, assuming α1 ≠ 0 for v0 = α1x1 + · · · + αnxn. As k increases, (1/λk1)vk converges to an eigenvector. However, since λ1 is unknown, scaling vk directly by 1/λk1 presents practical difficulties due to λk1 approaching 0 or ±∞. The method circumvents this by iteratively normalizing vk to a unit vector, leading {uk} to converge to an eigenvector directionally toward x1. A key difficulty is slow convergence when eigenvalues are close together, or when the initial vector does not have a substantial component in the direction of the dominant eigenvector .

The document discusses numerical methods for computing eigenvalues and eigenvectors of matrices, focusing on the power method and QR algorithm. It explains the power method as an iterative approach for finding the dominant eigenvalue and its corresponding eigenvector, detailing its convergence properties and applications. Additionally, it highlights the challenges of using characteristic polynomials for numerical computations and presents a theorem regarding the stability of eigenvalues under small perturbations in the matrix.

Uploaded by

post2keerthana

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

45 views12 pages

Shifted Power Method for Eigenvalues

Uploaded by

post2keerthana

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

440 Chapter 7 Numerical Linear Algebra

(a) Show that u is an eigenvector of Q. What is the 16. Let A = xyT , where x ∈ Rm , y ∈ Rn , and both x
corresponding eigenvalue? and y are nonzero vectors. Show that A has a singu-
(b) Let z be a nonzero vector in Rn that is ortho- lar value decomposition of the form H1 H2 , where
gonal to u. Show that z is an eigenvector of Q H1 and H2 are Householder transformations and
belonging to the eigenvalue λ = 1. σ1 = x y , σ2 = σ3 = · · · = σn = 0
(c) Show that the eigenvalue λ = 1 must have 17. Let
multiplicity n − 1. What is the value of det(Q)? ⎧ ⎫
R=⎪ ⎩ cos θ − sin θ ⎪ ⎭
14. Let R be an n × n plane rotation. What is the sin θ cos θ
value of det(R)? Show that R is not an elementary
Show that if θ is not an integer multiple of π , then
orthogonal matrix.
R can be factored into a product R = ULU, where
15. Let A = Q1 R1 = Q2 R2 , where Q1 and Q2 are ortho- ⎧ cos θ −1 ⎫ ⎧ ⎫
⎪1 sin θ ⎪ ⎪ 1 0⎪
U=⎪ ⎪ ⎪ ⎪
gonal and R1 and R2 are both upper triangular and ⎪ ⎪ ⎪ ⎪
⎩ ⎭ and L = ⎩ ⎭
nonsingular. 0 1 sin θ 1
(a) Show that QT1 Q2 is diagonal.
This type of factorization of a rotation matrix arises
(b) How do R1 and R2 compare? Explain. in applications involving wavelets and ﬁlter bases.

7.6 The Eigenvalue Problem

In this section, we are concerned with numerical methods for computing the eigenval-
ues and eigenvectors of an n×n matrix A. The first method we study is called the power
method. The power method is an iterative method for finding the dominant eigenvalue
of a matrix and a corresponding eigenvector. By the dominant eigenvalue, we mean an
eigenvalue λ1 satisfying |λ1 | > |λi | for i = 2, . . . , n. If the eigenvalues of A satisfy
|λ1 | > |λ2 | > · · · > |λn |
then the power method can be used to compute the eigenvalues one at a time. The
second method, the QR algorithm, is an iterative method involving orthogonal simil-
arity transformations. It has many advantages over the power method. It will converge
whether or not A has a dominant eigenvalue, and it calculates all the eigenvalues at the
same time.
In the examples in Chapter 6, the eigenvalues were determined by forming the
characteristic polynomial and finding its roots. However, this procedure is gener-
ally not recommended for numerical computations. The difficulty is that often a
small change in one or more of the coefficients of the characteristic polynomial
can result in a relatively large change in the computed zeros of the polynomial.
For example, consider the polynomial p(x) = x10 . The lead coefficient is 1 and
the remaining coefficients are all 0. If the constant term is altered by adding
−10−10 , we obtain the polynomial q(x) = x10 − 10−10 . Although the coefficients
of p(x) and q(x) differ only by 10−10 , the roots of q(x) all have absolute value
1
10
, whereas the roots of p(x) are all 0. Thus, even when the coefficients of the
characteristic polynomial have been determined accurately, the computed eigenval-
ues may involve significant error. For this reason, the methods presented in this
section do not involve the characteristic polynomial. To see that there is some ad-
vantage to working directly with the matrix A, we must determine the effect that
small changes in the entries of A have on the eigenvalues. This is done in the next
theorem.
7.6 The Eigenvalue Problem 441

Theorem 7.6.1 Let A be an n×n matrix with n linearly independent eigenvectors, and let X be a matrix
that diagonalizes A. That is,
⎧ ⎫
⎪ λ1 ⎪
⎪
⎪ ⎪
⎪
⎪
⎪ λ2 ⎪
⎪
X AX = D = ⎪
−1
⎪
⎪ . ⎪
⎪
⎪
⎪
⎪ . . ⎪
⎪
⎩ ⎭
λn

If A = A + E and λ is an eigenvalue of A , then

min |λ − λi | ≤ cond2 (X) E 2 (1)

1≤i≤n

Proof We may assume that λ is unequal to any of the λi ’s (otherwise there is nothing to
prove). Thus, if we set D1 = D − λ I, then D1 is a nonsingular diagonal matrix. Since
λ is an eigenvalue of A , it is also an eigenvalue of X −1 A X. Therefore, X −1 A X − λ I
is singular, and hence D−1 −1
1 (X A X − λ I) is also singular. But

D−1 −1 −1 −1
1 (X A X − λ I) = D1 X (A + E − λ I)X
= D−1 −1
1 X EX + I

Therefore, −1 is an eigenvalue of D−1 −1

1 X EX. It follows that

| −1| ≤ D−1 −1
1 X EX 2 ≤ D−1
1 2 cond2 (X) E 2

The 2-norm of D−1

1 is given by

D−1
1 2 = max |λ − λi |−1
1≤i≤n

The index i that maximizes |λ − λi |−1 is the same index that minimizes |λ − λi |. Thus,

min |λ − λi | ≤ cond2 (X) E 2

1≤i≤n

If the matrix A is symmetric, we can choose an orthogonal diagonalizing matrix.

In general, if Q is any orthogonal matrix, then

cond2 (Q) = Q 2 Q−1 2 =1

Hence (1) simpliﬁes to

min |λ − λi | ≤ E 2
1≤i≤n

Thus, if A is symmetric and E 2 is small, the eigenvalues of A will be close to the

eigenvalues of A.
We are now ready to talk about some of the methods for calculating the eigenvalues
and eigenvectors of an n × n matrix A. The ﬁrst method we will present computes an
eigenvector x of A by successively applying A to a given vector in Rn . To see the
442 Chapter 7 Numerical Linear Algebra

idea behind the method, let us assume that A has n linearly independent eigenvectors
x1 , . . . , xn and that the corresponding eigenvalues satisfy
|λ1 | > |λ2 | ≥ · · · ≥ |λn | (2)
Given an arbitrary vector v0 in Rn , we can write
v0 = α1 x1 + · · · + αn xn
Av0 = α1 λ1 x1 + α2 λ2 x2 + · · · + αn λn xn
A2 v0 = α1 λ21 x1 + α2 λ22 x2 + · · · + αn λ2n xn

and, in general,
Ak v0 = α1 λk1 x1 + α2 λk2 x2 + · · · + αn λkn xn
If we define
vk = Ak v0 k = 1, 2, . . .
then
k k
1 λ2 λn
vk = α1 x1 + α2 x2 + · · · + αn xn (3)
λk1 λ1 λ1
Since
λi
<1 for i = 2, 3, . . . , n
λ1
it follows that
1
vk → α1 x1 as k→∞
λk1
Thus, if α1 = 0, then the sequence {(1/λk1 )vk } converges to an eigenvector α1 x1 of A.
There are some obvious difficulties with the method as it has been presented so far.
The main difficulty is that we cannot compute (1/λk1 )vk , since λ1 is unknown. But even
if λ1 were known, there would be difficulties because of λk1 approaching 0 or ±∞.
Fortunately, however, we do not have to scale the sequence {vk } using 1/λk1 . If the vk ’s
are scaled so that we obtain unit vectors at each step, the sequence will converge to a
unit vector in the direction of x1 . The eigenvalue λ1 can be computed at the same time.
This method of computing the eigenvalue of largest magnitude and the corresponding
eigenvector is called the power method.

The Power Method

In this method, two sequences {vk } and {uk } are deﬁned recursively. To start, u0 can be
any nonzero vector in Rn . Once uk has been determined, the vectors vk+1 and uk+1 are
calculated as follows:
1. Set vk+1 = Auk .
2. Find the coordinate jk+1 of vk+1 that has the maximum absolute value.
3. Set uk+1 = (1/vjk+1 )vk+1 .
7.6 The Eigenvalue Problem 443

The sequence {uk } has the property that, for k ≥ 1, uk ∞ = ujk = 1. If

the eigenvalues of A satisfy (2) and u0 can be written as a linear combination of
eigenvectors α1 x1 + · · · + αn xn with α1 = 0, the sequence {uk } will converge to
an eigenvector y of λ1 . If k is large, then uk will be a good approximation to y
and vk+1 = Auk will be a good approximation to λ1 y. Since the jk th coordinate
of uk is 1, it follows that the jk th coordinate of vk+1 will be a good approximation
to λ1 .
In view of (3), we can expect that the uk ’s will converge to y at the same rate
at which (λ2 /λ1 )k is converging to 0. Thus, if |λ2 | is nearly as large as |λ1 |, the
convergence will be slow.

EXAMPLE 1 Let
⎧ ⎫
⎩2 1⎪
A=⎪ ⎭
1 2

It is an easy matter to determine the exact eigenvalues of A. These turn out to be λ1 = 3

and λ2 = 1, with corresponding eigenvectors x1 = (1, 1)T and x2 = (1, −1)T . To
illustrate how the vectors generated by the power method converge, we will apply the
method with u0 = (2, 1)T :
⎧ ⎫ ⎧ ⎫
1
v1 = Au0 = ⎪ ⎩5⎪ ⎭, u1 = v1 = ⎪ ⎩ 1.0 ⎪
⎭
4 5 0.8
⎧ ⎫
⎧ ⎫ ⎪
⎪
1 ⎪ ⎧
⎪ ⎫
⎪
⎩ 2.8 ⎪
⎭,
1 ⎪
⎪ ⎪
⎪ ⎪
⎩ 1.00 ⎪
⎭
v2 = Au1 = u2 = v2 = ⎪
⎪ ⎪≈
2.6 2.8 ⎩ 13 ⎪
⎭ 0.93
14
⎧ ⎫
⎧ ⎫ ⎪
⎪
1 ⎪ ⎧
⎪ ⎫
1 ⎪ 41 ⎪
⎩ ⎭
14 ⎪
⎪ ⎪
⎪ ⎪ 1.00 ⎪
v3 = Au2 = , u3 = v3 = ⎪
⎪ ⎪ ≈ 0.98 ⎭
⎩
14 40 41 ⎩ 40 ⎪⎭
41
⎧ ⎫
v4 = Au3 ≈ ⎪
⎩ 2.98 ⎪
⎭
2.95

If u3 = (1.00, 0.98)T is taken as an approximate eigenvector, then 2.98 is the approx-

imate value of λ1 . Thus, with only a few iterations, the approximation for λ1 involves
an error of only 0.02.

The power method is particularly useful in applications where only a few of the
dominant eigenvalues and eigenvectors are needed. For example, in the analytic hier-
archy process (AHP) only the eigenvectors belonging to the dominant eigenvalues are
needed to determine the weight vectors for the decision process (see Section 6.8).

APPLICATION 1 Computation of AHP Weight Vectors

In Application 4 of Section 6.8 we considered an example where a search committee
at a college makes a hiring choice using AHP. In the example the committee decided
444 Chapter 7 Numerical Linear Algebra

that teaching was twice as important as research and 8 times as important as profes-
sional activities. They also decided that research should be 3 times as important as
professional activities. The comparison matrix for this problem is

⎧ ⎫
⎪
⎪
1 2 8⎪
⎪
⎪1
⎪ 1 3⎪⎪
C=⎪
⎪ ⎪
⎪
⎪
⎩1
2
⎪
⎭
1
8 3
1

The eigenvector belonging to the dominant eigenvalue can be computed using the
power method. Since the dominant eigenvalue is close to 3 and the remaining eigen-
values are close to 0, the power method should converge rapidly. In this case we use
u0 = (1, 1, 1)T as our starting vector and normalize at each step so that the entries of
uk (k ≥ 1) all add up to 1. Using this process we end up with the following sequence
of vectors

⎧ ⎫ ⎧ ⎫ ⎧ ⎫ ⎧ ⎫
⎪
⎪ 0.6486 ⎪
⎪ ⎪
⎪ 0.6286 ⎪
⎪ ⎪
⎪ 0.6281 ⎪
⎪ ⎪
⎪ 0.6282 ⎪
⎪
u1 = ⎪ ⎪, u = ⎪
⎪ 0.2654 ⎪ ⎪ 0.2854 ⎪
⎪ ⎪ 0.2854 ⎪
⎪ ⎪ ⎪ 0.2854 ⎪
⎪ ⎪
⎪
⎩ ⎪
⎭ 2 ⎪ ⎩ ⎭, u3 = ⎪
⎪ ⎩ ⎭, u4 = ⎪
⎪ ⎩ ⎪
⎭
0.0860 0.0860 0.0864 0.0864

where all entries are displayed to 4 digits of accuracy. For k ≥ 3 the computed vectors
uk will all agree to 3 digits of accuracy. Thus if we take w = u4 as our weight vector,
it should be accurate to 3 digits.
For an n × n comparison matrix C, the power method algorithm for computing
AHP weights can be summarized as follows:

1. Set u0 = e where e is a vector in Rn whose entries are all equal to 1.

2. For k = 1, 2, . . .

Set v = Auk
n
s= vi
i=1
uk+1 = 1s v

The iterations should be terminated when uk and uk+1 agree to the desired digits
of accuracy. We then use the computed eigenvector uk+1 as an AHP weight
vector.

The power method can be used to compute the eigenvalue λ1 of largest magnitude
and a corresponding eigenvector y1 . What about finding additional eigenvalues and
eigenvectors? If we could reduce the problem of finding additional eigenvalues of
A to that of finding the eigenvalues of some (n − 1) × (n − 1) matrix A1 , then the
power method could be applied to A1 . This can actually be done by a process called
deflation.
7.6 The Eigenvalue Problem 445

Deflation
The idea behind deflation is to find a nonsingular matrix H such that HAH −1 is a matrix
of the form
⎧ ⎫
⎪ λ1 × · · · × ⎪
⎪
⎪ 0 ⎪
⎪
⎪
⎪ ⎪
⎪
⎪
⎪ ⎪
⎪ (4)
⎪ .
⎪
. ⎪
⎪
⎪
⎩ . A1 ⎪
⎭
0

Since A and HAH −1 are similar, they have the same characteristic polynomials. Thus,
if HAH −1 is of the form (4), then

det(A − λI) = det(HAH −1 − λI) = (λ1 − λ) det(A1 − λI)

and it follows that the remaining n − 1 eigenvalues of A are the eigenvalues of A1 . The
question remains; How do we find such a matrix H? Note that the form (4) requires
that the first column of HAH −1 be λ1 e1 . The first column of HAH −1 is HAH −1 e1 . Thus,

HAH −1 e1 = λ1 e1

or, equivalently,

A(H −1 e1 ) = λ1 (H −1 e1 )

So H −1 e1 is in the eigenspace corresponding to λ1 . Thus, for some eigenvector x1

belonging to λ1 ,

H −1 e1 = x1 or Hx1 = e1

We must ﬁnd a matrix H such that Hx1 = e1 for some eigenvector x1 belonging to
λ1 . This can be done by means of a Householder transformation. If y1 is the computed
eigenvector belonging to λ1 , set

1
x1 = y1
y1 2

Since x1 2 = 1, we can ﬁnd a Householder transformation H such that

Hx1 = e1

Because H is a Householder transformation, it follows that H −1 = H, and hence HAH

is the desired similarity transformation.

Reduction to Hessenberg Form

The standard methods for ﬁnding eigenvalues are all iterative. The amount of work
required in each iteration is often prohibitively high unless, initially, A is in some spe-
cial form that is easier to work with. If this is not the case, the standard procedure
446 Chapter 7 Numerical Linear Algebra

is to reduce A to a simpler form by means of similarity transformations. Generally,

Householder matrices are used to transform A into a matrix of the form
⎧ ⎫
⎪ × × ··· × × ×⎪
⎪
⎪ ⎪
⎪
⎪ × × ··· × × ×⎪⎪
⎪
⎪
⎪ ⎪
⎪
⎪ 0 × ··· × × ×⎪⎪
⎪
⎪
⎪ ⎪
⎪
⎪ 0 0 ··· × × ×⎪⎪
⎪
⎪
⎪ .. ⎪
⎪
⎪
⎪ ⎪
⎪
⎪
⎪ . ⎪
⎪
⎪
⎪ 0 ··· × × ×⎪⎪
⎪
⎩ 0 ⎪
⎭
0 0 ··· 0 × ×

A matrix in this form is said to be in upper Hessenberg form. Thus B is in upper

Hessenberg form if and only if bij = 0 whenever i ≥ j + 2.
A matrix A can be transformed into upper Hessenberg form in the following
manner: First, choose a Householder matrix H1 so that H1 A is of the form
⎧ ⎫
⎪ a11 a12 · · · a1n ⎪
⎪
⎪ ⎪
⎪
⎪ × × ··· × ⎪ ⎪
⎪
⎪
⎪ ⎪
⎪
⎪
⎪ 0 × · · · × ⎪
⎪
⎪
⎪ . ⎪
⎪
⎪
⎪ . ⎪
⎪
⎪
⎩ . ⎪
⎭
0 × ··· ×

The matrix H1 will be of the form

⎧ ⎫
⎪ 1 0 ··· 0 ⎪
⎪
⎪ ⎪
⎪
⎪ 0 × ··· ×⎪⎪
⎪
⎪
⎪ ⎪
⎪
⎪
⎪
.. ⎪
⎪
⎪
⎩ . ⎪
⎭
0 × ··· ×

and hence postmultiplication of H1 A by H1 will leave the ﬁrst column unchanged. If

A(1) = H1 AH1 , then A(1) is a matrix of the form
⎧ (1) ⎫
⎪
⎪ a11 a(1) · · · a(1)
1n ⎪⎪
⎪
⎪
12
⎪
⎪
⎪
⎪ ⎪
(1) ⎪
⎪
⎪ a(1) (1)
a22 · · · a2n ⎪ ⎪
⎪
⎪
21 ⎪
⎪
⎪
⎪ (1) ⎪
⎪
⎪ 0 a32 · · · a3n ⎪
(1)
⎪
⎪
⎪
⎪ ⎪
⎪
⎪
⎪ .. ⎪
⎪
⎪
⎪ . ⎪
⎪
⎩ (1) ⎭
0 an2 · · · ann
(1)

Since H1 is a Householder matrix, it follows that H1−1 = H1 , and hence A(1) is similar
to A. Next, a Householder matrix H2 is chosen so that

H2 (a(1) (1) (1) T (1) (1)

12 , a22 , . . . , an2 ) = (a12 , a22 , ×, 0, . . . , 0)
T
7.6 The Eigenvalue Problem 447

The matrix H2 will be of the form

⎧ ⎫
⎪ 1 0 0 ··· 0 ⎪
⎪
⎪ ⎪
⎪ 1 0 ··· 0 ⎪
⎪ ⎫
⎪ ⎧
⎪ 0 ⎪
⎪
⎪
⎪
⎪ 0 0 × ··· ×⎪⎪
⎪ = ⎪
⎩
I2 O ⎪
⎭
⎪
⎪ .. ⎪
⎪ O X
⎪
⎪ ⎪
⎪
⎪
⎩. ⎪
⎭
0 0 × ··· ×

Multiplication of A(1) on the left by H2 will leave the ﬁrst two rows and the ﬁrst column
unchanged:
⎧ (1) ⎫
⎪
⎪ a11 a(1) a(1) · · · a(1)
1n ⎪⎪
⎪
⎪
12 13
⎪
⎪
⎪
⎪ ⎪
(1) ⎪
⎪
⎪ a(1) (1) (1)
a22 a23 · · · a2n ⎪ ⎪
⎪
⎪
21 ⎪
H2 A(1) =⎪
⎪ 0 × × ··· × ⎪ ⎪
⎪
⎪
⎪ × ··· × ⎪ ⎪
⎪
⎪ 0 0 ⎪
⎪
⎪
⎪ ⎪
⎪
⎪ ...
⎪ ⎪
⎪
⎪
⎩ ⎪
⎭
0 0 × ··· ×

Postmultiplication of H2 A(1) by H2 will leave the ﬁrst two columns unchanged. Thus,
A(2) = H2 A(1) H2 is of the form
⎧ ⎫
⎪ × × × ···
×⎪
⎪
⎪ ⎪
⎪×
⎪ × × ×⎪
···⎪
⎪
⎪
⎪ ⎪
⎪
⎪ 0 × × ×⎪
···⎪
⎪
⎪
⎪ × ×⎪
···⎪
⎪
⎪ 0 0 ⎪
⎪
⎪
⎪ .. ⎪
⎪
⎪
⎪ ⎪
⎪
⎪
⎩ . ⎪
⎭
0 0 × ··· ×

This process may be continued until we end up with an upper Hessenberg matrix

H = A(n−2) = Hn−2 · · · H2 H1 AH1 H2 · · · Hn−2

which is similar to A.
If, in particular, A is symmetric, then, since

H T = Hn−2
T
· · · H2T H1T AT H1T H2T · · · Hn−2
T

= Hn−2 · · · H2 H1 AH1 H2 · · · Hn−2

it follows that H is tridiagonal. Thus, any n × n matrix A can be reduced to upper

Hessenberg form by similarity transformations. If A is symmetric, the reduction will
yield a symmetric tridiagonal matrix.
We close this section by outlining one of the best methods available for computing
the eigenvalues of a matrix. The method is called the QR algorithm and was developed
by John G. F. Francis in 1961.
448 Chapter 7 Numerical Linear Algebra

QR Algorithm
Given an n × n matrix A, factor it into a product Q1 R1 , where Q1 is orthogonal and R1
is upper triangular. Deﬁne

A1 = A = Q1 R1

and

A2 = QT1 AQ1 = R1 Q1

Factor A2 into a product Q2 R2 , where Q2 is orthogonal and R2 is upper triangular.

Deﬁne

A3 = QT2 A2 Q2 = R2 Q2

Note that A2 = QT1 AQ1 and A3 = (Q1 Q2 )T A(Q1 Q2 ) are both similar to A. We can
continue in this manner and obtain a sequence of similar matrices. In general, if

Ak = Qk Rk

then Ak+1 is deﬁned to be Rk Qk . It can be shown that, under very general conditions,
the sequence of matrices deﬁned in this way converges to a matrix T of the form
⎧ ⎫
⎪ B1 × ··· ×⎪
⎪
⎪ ⎪
⎪
⎪ B2 ×⎪ ⎪
⎪
T=⎪
⎪
⎪ .. ⎪
⎪
⎪
⎪
⎪ . ⎪
⎪
⎩ O ⎭
Bs

where the Bi ’s are either 1 × 1 or 2 × 2 diagonal blocks. The matrix T is the real Schur
form of A. (See Theorem 6.4.6.) Each 2 × 2 diagonal block of T will correspond to a
pair of complex conjugate eigenvalues of A. The eigenvalues of A will be eigenvalues
of the Bi ’s. In the case where A is symmetric, each of the Ak ’s will also be symmetric
and the sequence will converge to a diagonal matrix.

EXAMPLE 2 Let A1 be the matrix from Example 1. The QR factorization of A1 requires only a single
Givens transformation,
⎧ ⎫
1 ⎪2 1 ⎪
G1 = √ ⎩ ⎭
5 1 −2
Thus
⎧ ⎫⎧ ⎫⎧ ⎫ ⎧ ⎫
1 ⎪2
⎩ 1⎪⎭ ⎪
⎩ 2 1⎭⎪ ⎪
⎩ 2 1⎪⎭ ⎪
⎩ 2.8 −0.6 ⎭
⎪
A2 = G1 AG1 = =
5 1 −2 1 2 1 −2 −0.6 1.2

The QR factorization of A2 can be accomplished with the Givens transformation

⎧ ⎫
1 ⎪ 2.8 −0.6 ⎪
G2 = √ ⎩ ⎭
8.2 −0.6 −2.8
7.6 The Eigenvalue Problem 449

It follows that
⎧ ⎫
⎪
A3 = G2 A2 G2 ≈ ⎩
2.98 0.22 ⎪
⎭
0.22 1.02
The off-diagonal elements are getting closer to 0 after each iteration, and the diagonal
elements are approaching the eigenvalues λ1 = 3 and λ2 = 1.

Remarks
1. Because of the amount of work required at each iteration of the QR algorithm,
it is important that the starting matrix A be in either Hessenberg or sym-
metric tridiagonal form. If this is not the case, we should perform similarity
transformations on A to obtain a matrix A1 that is in one of these forms.
2. If Ak is in upper Hessenberg form, the QR factorization can be carried out with
n − 1 Givens transformations.
Gn,n−1 · · · G32 G21 Ak = Rk
Setting
QTk = Gn,n−1 · · · G32 G21
we have
Ak = Qk Rk
and
Ak+1 = QTk Ak Qk
To compute Ak+1 , it is not necessary to determine Qk explicitly. We need only
keep track of the n − 1 Givens transformations. When Rk is postmultiplied
by G21 , the resulting matrix will have the (2, 1) entry filled in. The other
entries below the diagonals will all still be zero. Postmultiplying Rk G21 by
G32 will have the effect of filling in the (3, 2) position. Postmultiplication of
Rk G21 G32 by G43 will fill in the (4, 3) position, and so on. Thus, the resulting
matrix Ak+1 = Rk G21 G32 · · · Gn,n−1 will be in upper Hessenberg form. If A1
is a symmetric tridiagonal matrix, then each succeeding Ai will be upper
Hessenberg and symmetric. Hence, A2 , A3 , . . . will all be tridiagonal.
3. As in the power method, convergence may be slow when some of the eigenval-
ues are close together. To speed up convergence, it is customary to introduce
origin shifts. At the kth step, a scalar αk is chosen and Ak − αk I (rather than
Ak ) is decomposed into a product Qk Rk . The matrix Ak+1 is defined by
Ak+1 = Rk Qk + αk I
Note that
QTk Ak Qk = QTk (Qk Rk + αk I)Qk = Rk Qk + αk I = Ak+1
so Ak and Ak+1 are similar. With the proper choice of shifts αk , the convergence
can be greatly accelerated.
4. In our brief discussion, we have presented only an outline of the method. Many
of the details, such as how to choose the origin shifts, have been omitted. For
a more thorough discussion and a proof of convergence, see Wilkinson [36].
450 Chapter 7 Numerical Linear Algebra

SECTION 7.6 EXERCISES

1. Let (b) Find a Householder transformation H such that
⎧ ⎫ HAH is of the form
A=⎪
⎩1 1⎪⎭ ⎧ ⎫
1 1 ⎪
⎪ 4 × ×⎪ ⎪
⎪
⎪ 0 × ×⎪ ⎪
⎪
⎩ ⎪
⎭
(a) Apply one iteration of the power method to A 0 × ×
with any nonzero starting vector.
(b) Apply one iteration of the QR algorithm (c) Compute HAH and ﬁnd the remaining eigen-
to A. values of A.
(c) Determine the exact eigenvalues of A by solv- 6. Let A be an n × n matrix with distinct real eigen-
ing the characteristic equation, and determine values λ1 , λ2 , . . . , λn . Let λ be a scalar that is not
the eigenspace corresponding to the largest ei- an eigenvalue of A and let B = (A − λI)−1 . Show
genvalue. Compare your answers with those to that
parts (a) and (b). (a) the scalars μj = 1/(λj − λ), j = 1, . . . , n are
2. Let the eigenvalues of B.
⎧ ⎫ ⎧ ⎫
⎪
⎪ 2 1 0⎪ ⎪ ⎪
⎪ 1⎪
A=⎪ ⎪ 1 3 1⎪ ⎪ ⎪1⎪ ⎪ (b) if xj is an eigenvector of B belonging to μj , then
⎪ ⎪ 0 = ⎪⎪ ⎪
⎩ ⎪
and u
⎩ ⎭ ⎭ xj is an eigenvector of A belonging to λj .
0 1 2 1
(a) Apply the power method to A to compute v1 , (c) if the power method is applied to B, then the
u1 , v2 , u2 , and v3 . (Round off to two decimal sequence of vectors will converge to an eigen-
places.) vector of A belonging to the eigenvalue that is
(b) Determine an approximation λ1 to the largest closest to λ. [The convergence will be rapid if λ
eigenvalue of A from the coordinates of v3 . De- is much closer to one λi than to any of the oth-
termine the exact value of λ1 and compare it ers. This method of computing eigenvectors by
with λ1 . What is the relative error? using powers of (A − λI)−1 is called the inverse
3. Let power method.]
⎧ ⎫ ⎧ ⎫
7. Let x = (x1 , . . . , xn )T be an eigenvector of A
A=⎪ ⎩ 1 2⎪ ⎭ and u0 = ⎪ ⎩1⎪ ⎭
−1 −1 1 belonging to λ. Show that if |xi | = x ∞ , then
(a) Compute u1 , u2 , u3 , and u4 , using the power n

method. (a) aij xj = λxi

(b) Explain why the power method will fail to j=1
n
converge in this case.
(b) |λ − aii | ≤ |aij | (Gerschgorin’s theorem)
4. Let j=1
⎧ ⎫ j =i
A = A1 = ⎪ ⎩1 1⎪ ⎭
1 3 8. Let λ be an eigenvalue of an n × n matrix A. Show
that for some index j,
Compute A2 and A3 , using the QR algorithm. Com-
pute the exact eigenvalues of A and compare them n (column version of
with the diagonal elements of A3 . To how many |λ − ajj | ≤ |aij | Gerschgorin’s
decimal places do they agree? i=1 (theorem)
i =j
5. Let
⎧ ⎫ 9. Let A be a matrix with eigenvalues λ1 , . . . , λn and
⎪
⎪ 5 2 2⎪⎪ let λ be an eigenvalue of A + E. Let X be a matrix
A=⎪ ⎪ −2 1 −2 ⎪ ⎪
⎪
⎩ ⎪
⎭ that diagonalizes A and let C = X −1 EX. Prove:
−3 −4 2
(a) For some i,
(a) Verify that λ1 = 4 is an eigenvalue of A and n
y1 = (2, −2, 1)T is an eigenvector belonging |λ − λi | ≤ |cij |
to λ1 . j=1
7.7 Least Squares Problems 451

[Hint: λ is an eigenvalue of X −1 (A + E)X. Ap- where Uk is an upper triangular matrix with 1’s on
ply Gerschgorin’s theorem from Exercise 7.] the diagonal and Dk is a diagonal matrix. Let Rk+1
(b) min |λ − λj | ≤ cond∞ (X) E ∞ be an upper triangular matrix of the form
1≤j≤n ⎧ ⎫
⎪
⎪ Rk bk ⎪ ⎪
10. Let Ak = Qk Rk , k = 1, 2, . . . be the sequence of ⎩ ⎭
matrices derived from A = A1 by applying the QR 0T βk
algorithm. For each positive integer k, deﬁne where βk is not an eigenvalue of Rk . Determine
(k +1)×(k +1) matrices Uk+1 and Dk+1 of the form
Pk = Q1 Q2 · · · Qk and Uk = Rk · · · R2 R1 ⎧ ⎫ ⎧ ⎫
⎪ Uk xk ⎪ ⎪ Dk 0 ⎪
Show that Uk+1 = ⎪
⎩ T ⎭, Dk+1 = ⎪
⎪ ⎩ T ⎪
⎭
0 1 0 β
Pk Ak+1 = APk such that

for all k ≥ 1. Rk+1 Uk+1 = Uk+1 Dk+1

11. Let Pk and Uk be defined as in Exercise 10. Show
that 13. Let R be an n × n upper triangular matrix whose
diagonal entries are all distinct. Let Rk denote the
(a) Pk+1 Uk+1 = Pk Ak+1 Uk = APk Uk leading principal submatrix of R of order k and set
(b) Pk Uk = Ak , and hence U1 = (1).
(Q1 Q2 · · · Qk )(Rk · · · R2 R1 ) (a) Use the result from Exercise 12 to derive an
algorithm for finding the eigenvectors of R.
is the QR factorization of Ak . The matrix U of eigenvectors should be upper
12. Let Rk be a k × k upper triangular matrix and triangular with 1’s on the diagonal.
suppose that (b) Show that the algorithm requires approx-
3
imately n6 floating-point multiplications/
Rk Uk = Uk Dk divisions.

7.7 Least Squares Problems

In this section, we study computational methods for ﬁnding least squares solutions of
overdetermined systems. Let A be an m × n matrix with m ≥ n and let b ∈ Rm . We
consider some methods for computing a vector x̂ that minimizes b − Ax 22 .

Normal Equations
We saw in Chapter 5 that if x̂ satisﬁes the normal equations

ATAx = AT b

then x̂ is a solution to the least squares problem. If A is of full rank (rank n), then
ATA is nonsingular and hence the system will have a unique solution. Thus, if ATA
is invertible, one possible method for solving the least squares problem is to form the
normal equations and then solve them by Gaussian elimination. An algorithm for doing
this would have two main parts.
1. Compute B = ATA and c = AT b.
2. Solve Bx = c.
Note that forming the normal equations requires roughly mn2 /2 multiplications.
Since ATA is nonsingular, the matrix B is positive deﬁnite. For positive deﬁnite

Common questions