0% found this document useful (0 votes)
91 views92 pages

Numerical Integration Methods Overview

Numerical integration involves calculating the numerical value of a definite integral. There are several reasons to do this, such as when the function is only known at certain points or its antiderivative cannot be expressed in closed form. Common methods include quadrature rules based on interpolating functions like polynomials, Gaussian quadrature which uses varying intervals, and Monte Carlo methods which work for multidimensional integrals. Newton-Cotes formulas evaluate the integrand at equally spaced points and include techniques like the rectangle rule, trapezoidal rule, and Simpson's rule.

Uploaded by

TonyBarosevcic
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
91 views92 pages

Numerical Integration Methods Overview

Numerical integration involves calculating the numerical value of a definite integral. There are several reasons to do this, such as when the function is only known at certain points or its antiderivative cannot be expressed in closed form. Common methods include quadrature rules based on interpolating functions like polynomials, Gaussian quadrature which uses varying intervals, and Monte Carlo methods which work for multidimensional integrals. Newton-Cotes formulas evaluate the integrand at equally spaced points and include techniques like the rectangle rule, trapezoidal rule, and Simpson's rule.

Uploaded by

TonyBarosevcic
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Numerical integration

In numerical analysis, numerical integration constitutes a broad family of algorithms for


calculating the numerical value of a definite integral, and by extension, the term is also
sometimes used to describe the numerical solution of differential equations. This article
focuses on calculation of definite integrals. The term numerical quadrature (often abbreviated
to quadrature) is more or less a synonym for numerical integration, especially as applied to
one-dimensional integrals. Two- and higher-dimensional integration is sometimes described
as cubature, although the meaning of quadrature is understood for higher dimensional
integration as well.
The basic problem considered by numerical integration is to compute an approximate solution
to a definite integral:

If f(x) is a smooth well-behaved function, integrated over a small number of dimensions and
the limits of integration are bounded, there are many methods of approximating the integral
with arbitrary precision.

Numerical integration consists of finding numerical approximations for the value S.


Reasons for numerical integration
There are several reasons for carrying out numerical integration. The integrand f(x) may be
known only at certain points, such as obtained by sampling. Some embedded systems and
other computer applications may need numerical integration for this reason.
A formula for the integrand may be known, but it may be difficult or impossible to find an
antiderivative which is an elementary function. An example of such an integrand is f(x) =
exp(x2), the antiderivative of which cannot be written in elementary form.
It may be possible to find an antiderivative symbolically, but it may be easier to compute a
numerical approximation than to compute the antiderivative. That may be the case if the
antiderivative is given as an infinite series or product, or if its evaluation requires a special
function which is not available.

91

Methods for one-dimensional integrals


Numerical integration methods can generally be described as combining evaluations of the
integrand to get an approximation to the integral. The integrand is evaluated at a finite set of
points called integration points and a weighted sum of these values is used to approximate the
integral. The integration points and weights depend on the specific method used and the
accuracy required from the approximation.
An important part of the analysis of any numerical integration method is to study the behavior
of the approximation error as a function of the number of integrand evaluations. A method
which yields a small error for a small number of evaluations is usually considered superior.
Reducing the number of evaluations of the integrand reduces the number of arithmetic
operations involved, and therefore reduces the total round-off error. Also, each evaluation
takes time, and the integrand may be arbitrarily complicated.
A 'brute force' kind of numerical integration can be done, if the integrand is reasonably wellbehaved (i.e. piecewise continuous and of bounded variation), by evaluating the integrand
with very small increments.
Quadrature rules based on interpolating functions
A large class of quadrature rules can be derived by constructing interpolating functions which
are easy to integrate. Typically these interpolating functions are polynomials.
The simplest method of this type is to let the interpolating function be a constant function (a
polynomial of degree zero) which passes through the point ((a+b)/2, f((a+b)/2)). This is called
the midpoint rule or rectangle rule.

Illustration of the rectangle rule.


The interpolating function may be an affine function (a polynomial of degree 1) which passes
through the points (a, f(a)) and (b, f(b)). This is called the trapezoidal rule.

92

Illustration of the trapezoidal rule.


For either one of these rules, we can make a more accurate approximation by breaking up the
interval [a, b] into some number n of subintervals, computing an approximation for each
subinterval, then adding up all the results. This is called a composite rule, extended rule, or
iterated rule. For example, the composite trapezoidal rule can be stated as

Interpolation with polynomials evaluated at equally-spaced points in [a, b] yields the


NewtonCotes formulas, of which the rectangle rule and the trapezoidal rule are examples.
Simpson's rule, which is based on a polynomial of order 2, is also a NewtonCotes formula.

Illustration of Simpson's rule.


Quadrature rules with equally-spaced points have the very convenient property of nesting.
The corresponding rule with each interval subdivided includes all the current points, so those
integrand values can be re-used.
If we allow the intervals between interpolation points to vary, we find another group of
quadrature formulas, such as the Gaussian quadrature formulas. A Gaussian quadrature rule is
typically more accurate than a NewtonCotes rule which requires the same number of
function evaluations, if the integrand is smooth (i.e., if it is sufficiently differentiable) Other
quadrature methods with varying intervals include ClenshawCurtis quadrature (also called
Fejr quadrature) methods.
Gaussian quadrature rules do not nest, but the related GaussKronrod quadrature formulas do.
ClenshawCurtis rules nest.
Adaptive algorithms

93

If f(x) does not have many derivatives at all points, or if the derivatives become large, then
Gaussian quadrature is often insufficient. In this case, an adaptive algorithm will perform
better. For many cases, estimating the error from quadrature over an interval for a function
f(x) isn't obvious. One popular solution is to use two different rules of quadrature, and use
their difference as an estimate of the error from quadrature. The other problem is deciding
what "too large" or "very small" signify. A local criterion for "too large" is that the quadrature
error should not be larger than
where t, a real number, is the tolerance we wish to set for
global error. Then again, if h is already tiny, it may not be worthwhile to make it even smaller
even if the quadrature error is apparently large. A global criterion is that the sum of errors on
all the intervals should be less than t. This type of error analysis is usually called "a
posteriori" since we compute the error after having computed the approximation.
Extrapolation methods
The accuracy of a quadrature rule of the Newton-Cotes type is generally a function of the
number of evaluation points. The result is usually more accurate as number of evaluation
points increases, or, equivalently, as the width of the step size between the points decreases. It
is natural to ask what the result would be if the step size were allowed to approach zero. This
can be answered by extrapolating the result from two or more nonzero step sizes (Richardson
extrapolation). The extrapolation function may be a polynomial or rational function.
Multidimensional integrals
The quadrature rules discussed so far are all designed to compute one-dimensional integrals.
To compute integrals in multiple dimensions, one approach is to phrase the multiple integral
as repeated one-dimensional integrals by appealing to Fubini's theorem. This approach
requires the function evaluations to grow exponentially as the number of dimensions
increases. Two methods are known to overcome this so-called curse of dimensionality.
Monte Carlo
Monte Carlo methods and quasi-Monte Carlo methods are easy to apply to multi-dimensional
integrals, and may yield greater accuracy for the same number of function evaluations than
repeated integrations using one-dimensional methods.
A large class of useful Monte Carlo methods are the so-called Markov chain Monte Carlo
algorithms, which include the Metropolis-Hastings algorithm and Gibbs sampling.
Sparse grids
Sparse grids were originally developed by Smolyak for the quadrature of high dimensional
functions. The method is always based on a one dimensional quadrature rule, but performs a
more sophisticated combination of univariate results.
Connection with differential equations
The problem of evaluating the integral

94

can be reduced to an initial value problem for an ordinary differential equation. If the above
integral is denoted by I(b), then the function I satisfies

Methods developed for ordinary differential equations, such as RungeKutta methods, can be
applied to the restated problem and thus be used to evaluate the integral. For instance, the
standard fourth-order RungeKutta method applied to the differential equation yields
Simpson's rule from above.
The differential equation I'(x) = (x) has a special form: the right-hand side contains only
the dependent variable (here x) and not the independent variable (here I). This simplifies the
theory and algorithms considerably. The problem of evaluating integrals is thus best studied
in its own right.

NewtonCotes formulas
In numerical analysis, the NewtonCotes formulas, also called the NewtonCotes rules, are a
group of formulas for numerical integration (also called quadrature) based on evaluating the
integrand at equally-spaced points. They are named after Isaac Newton and Roger Cotes.
NewtonCotes formulas can be useful if the value of the integrand at equally-spaced points is
given. If it is possible to change the points at which the integrand is evaluated, then other
methods such as Gaussian quadrature and ClenshawCurtis quadrature are probably more
suitable.
Description
It is assumed that the value of a function defined on [a, b] is known at equally spaced points
xi, for i = 0, , n, where x0 = a and xn = b. There are two types of NewtonCotes formulas,
the "closed" type which uses the function value at all points, and the "open" type which does
not use the function values at the endpoints. The closed Newton-Cotes formula of degree n is
stated as

where xi = h i + x0, with h (called the step size) equal to (xn x0)/n = (b a)/n. The wi are
called weights.
As can be seen in the following derivation the weights are derived from the Lagrange basis
polynomials. This means they depend only on the xi and not on the function . Let L(x) be the
interpolation polynomial in the Lagrange form for the given data points (x0, (x0)), , (xn,
(xn)), then

95

The open NewtonCotes formula of degree n is stated as

The weights are found in a manner similar to the closed formula.


Instability for high degree
A NewtonCotes formula of any degree n can be constructed. However, for large n a
NewtonCotes rule can sometimes suffer from catastrophic Runge's phenomenon where the
error grows exponentially for large n. Methods such as Gaussian quadrature and Clenshaw
Curtis quadrature with unequally spaced points (clustered at the endpoints of the integration
interval) are stable and much more accurate, and are normally preferred to NewtonCotes. If
these methods can not be used, because the integrand is only given at the fixed equidistributed
grid, then Runge's phenomenon can be avoided by using a composite rule, as explained
below.
Composite rules
For the NewtonCotes rules to be accurate, the step size h needs to be small, which means
that the interval of integration [a,b] must be small itself, which is not true most of the time.
For this reason, one usually performs numerical integration by splitting [a,b] into smaller
subintervals, applying a NewtonCotes rule on each subinterval, and adding up the results.
This is called a composite rule.

Rectangle method
In mathematics, specifically in integral calculus, the rectangle method (also called the
midpoint or mid-ordinate rule) computes an approximation to a definite integral, made by
finding the area of a collection of rectangles whose heights are determined by the values of
the function.
Specifically, the interval (a,b) over which the function is to be integrated is divided into n
equal subintervals of length = (b a) / n. The rectangles are then drawn so that either their
left or right corners, or the middle of their top line lies on the graph of the function, with bases
running along the x-axis. The approximation to the integral is then calculated by adding up the
areas (base multiplied by height) of the n rectangles, giving the formula:

where i' is defined to be either i 1, i or i 1 / 2, giving the approximation based on the topleft corner, the top-right corner or the middle of the top line respectively.
As n gets larger, this approximation gets more accurate. In fact, this computation is the spirit
of the definition of the Riemann integral and the limit of this approximation as
is
defined and equal to the integral of f on (a,b) if this Riemann integral is defined. Note that this

96

is true regardless of which i' is used, however the midpoint approximation tends to be more
accurate for finite n.
Error
For a function f which is twice differentiable, the approximation error in each section (a,a +
) of the midpoint rule decays as the cube of the width of the rectangle

for some in (a,a + ). Summing this, the approximation error for n intervals with width is,
for some in (a,b), less than or equal to

Trapezoidal rule
In mathematics, the trapezoidal rule (also known as the trapezoid rule, or the trapezium rule
in British English) is a way to approximately calculate the definite integral

The trapezoidal rule works by approximating the region under the graph of the function f(x) as
a trapezoid and calculating its area. It follows that

The function f(x) (in blue) is approximated by a linear function (in red).
To calculate this integral more accurately, one first splits the interval of integration [a,b] into
n smaller subintervals, and then applies the trapezoidal rule on each of them. One obtains the
composite trapezoidal rule:

97

Illustration of the composite trapezoidal rule (with a uniform grid).


This can alternatively be written as:

where

(one can also use a non-uniform grid).

Illustration of the composite trapezoidal rule (with a non-uniform grid).


The trapezoidal rule is one of a family of formulas for numerical integration called Newton
Cotes formulas. Simpson's rule is another, often more accurate, member of the same family.
Simpson's rule and other like methods can be expected to improve on the trapezoidal rule for
functions which are twice continuously differentiable; however for rougher functions the
trapezoidal rule is likely to prove preferable. Moreover, the trapezoidal rule tends to become
extremely accurate when periodic functions are integrated over their periods, a fact best
understood in connection with the EulerMaclaurin summation formula. For non-periodic
functions, however, methods with unequally spaced points such as Gaussian quadrature and
ClenshawCurtis quadrature are generally far more accurate; ClenshawCurtis quadrature can
be viewed as a change of variables to express arbitrary integrals in terms of periodic integrals,
at which point the trapezoidal rule can be applied accurately.

98

Error analysis
The error of the composite trapezoidal rule is the difference between the value of the integral
and the numerical result:

This error can be written as

where is some number between a and b.


It follows that if the integrand is concave up (and thus has a positive second derivative), then
the error is negative and the trapezoidal rule overestimates the true value. This can also been
seen from the geometric picture: the trapezoids include all of the area under the curve and
extend over it. Similarly, a concave-down function yields an underestimate because area is
unaccounted for under the curve, but none is counted above. If the interval of the integral
being approximated includes an inflection point, then the error is harder to identify.
An asymptotic error estimate for n is given by

Further terms in this error estimate are given by the EulerMaclaurin summation formula.

Simpson's rule
In numerical analysis, Simpson's rule is a method for numerical integration, the numerical
approximation of definite integrals. Specifically, it is the following approximation:

99

Simpson's rule can be derived by approximating the integrand f(x) (in blue) by the quadratic
interpolant P(x) (in red).
Derivation
Simpson's rule can be derived in various ways.
Quadratic interpolation
One derivation replaces the integrand f(x) by the quadratic polynomial P(x) which takes the
same values as f(x) at the end points a and b and the midpoint m = (a+b) / 2. One can use
Lagrange polynomial interpolation to find an expression for this polynomial,

An easy (albeit tedious) calculation shows that

Averaging the midpoint and the trapezoidal rules


Another derivation constructs Simpson's rule from two simpler approximations: the midpoint
rule

and the trapezoidal rule

The errors in these approximations are

100

respectively. It follows that the leading error term vanishes if we take the weighted average

This weighted average is exactly Simpson's rule.


Using another approximation (for example, the trapezoidal rule with twice as many points), it
is possible to take a suitable weighted average and eliminate another error term. This is
Romberg's method.
Undetermined coefficients
The third derivation starts from the ansatz

The coefficients , and can be fixed by requiring that this approximation be exact for all
quadratic polynomials. This yields Simpson's rule.
Error
The error in approximating an integral by Simpson's rule is

where is some number between a and b.


The error is (asymptotically) proportional to (b a)5. However, the above derivations suggest
an error proportional to (b a)4. Simpson's rule gains an extra order because the points at
which the integrand is evaluated are distributed symmetrically in the interval [a, b].
Note that Simpson's rule provides exact results for any polynomial of degree three or less,
since the error term involves the fourth derivative of f.
Composite Simpson's rule
If the interval of integration [a,b] is in some sense "small", then Simpson's rule will provide
an adequate approximation to the exact integral. By small, what we really mean is that the
function being integrated is relatively smooth over the interval [a,b]. For such a function, a
smooth quadratic interpolant like the one used in Simpson's rule will give good results.
However, it is often the case that the function we are trying to integrate is not smooth over the
interval. Typically, this means that either the function is highly oscillatory, or it lacks
derivatives at certain points. In these cases, Simpson's rule may give very poor results. One
common way of handling this problem is by breaking up the interval [a,b] into a number of

101

small subintervals. Simpson's rule is then applied to each subinterval, with the results being
summed to produce an approximation for the integral over the entire interval. This sort of
approach is termed the composite Simpson's rule.
Suppose that the interval [a,b] is split up in n subintervals, with n an even number. Then, the
composite Simpson's rule is given by

where xj = a + jh for j = 0,1,...,n 1,n with h = (b a) / n; in particular, x0 = a and xn = b. The


above formula can also be written as

The error committed by the composite Simpson's rule is bounded (in absolute value) by

where h is the "step length", given by h = (b a) / n.


This formulation splits the interval [a,b] in subintervals of equal length. In practice, it is often
advantageous to use subintervals of different lengths, and concentrate the efforts on the places
where the integrand is less well-behaved. This leads to the adaptive Simpson's method.
Alternative extended Simpson's rule
This is another formulation of a composite Simpson's rule: instead of applying Simpson's rule
to disjoint segments of the integral to be approximated, Simpson's rule is applied to
overlapping segments, yielding:

Simpson's 3/8 rule


Simpson's 3/8 rule is another method for numerical integration proposed by Thomas Simpson.
It is based upon a cubic interpolation rather than a quadratic interpolation. Simpson's 3/8 rule
is as follows:

The error of this method is:

102

where is some number between a and b. Thus, the 3/8 rule is about twice as accurate as the
standard method, but it uses one more function value. A composite 3/8 rule also exists,
similarly as above.

Gaussian quadrature
In numerical analysis, a quadrature rule is an approximation of the definite integral of a
function, usually stated as a weighted sum of function values at specified points within the
domain of integration. An n-point Gaussian quadrature rule, named after Carl Friedrich
Gauss, is a quadrature rule constructed to yield an exact result for polynomials of degree 2n
1 or less by a suitable choice of the points xi and weights wi for i = 1,...,n. The domain of
integration for such a rule is conventionally taken as [1, 1], so the rule is stated as

Gaussian quadrature as above will only produce accurate results if the function f(x) is well
approximated by a polynomial function within the range [-1,1]. The method is, for example,
not suitable for functions with singularities. However, if the integrated function can be written
as
, where g(x) is approximately polynomial, and W(x) is known, then
there are alternative weights wi such that

Common weighting functions include

(Gauss-Chebyshev) and

(Gauss-Hermite).
It can be shown that the evaluation points are just the roots of a polynomial belonging to a
class of orthogonal polynomials.
Rules for the basic problem
For the integration problem stated above, the associated polynomials are Legendre
polynomials, Pn(x). With the nth polynomial normalized to give Pn(1) = 1, the ith Gauss node,
xi, is the ith root of Pn; its weight is given by

103

Change of interval for Gaussian quadrature


An integral over [a, b] must be changed into an integral over [1, 1] before applying the
Gaussian quadrature rule. This change of interval can be done in the following way:

After applying the Gaussian quadrature rule, the following approximation is obtained:

Other forms of Gaussian quadrature


The integration problem can be expressed in a slightly more general way by introducing a
positive weight function into the integrand, and allowing an interval other than [1, 1]. That
is, the problem is to calculate

for some choices of a, b, and . For a = 1, b = 1, and (x) = 1, the problem is the same as
that considered above. Other choices lead to other integration rules.
Error estimates
The error of a Gaussian quadrature rule can be stated as follows. For an integrand which has
2n continuous derivatives,

for some in (a, b), where pn is the orthogonal polynomial of order n and where

In the important special case of (x) = 1, we have the error estimate

Stoer and Bulirsch remark that this error estimate is inconvenient in practice, since it may be
difficult to estimate the order 2n derivative, and furthermore the actual error may be much
less than a bound established by the derivative. Another approach is to use two Gaussian

104

quadrature rules of different orders, and to estimate the error as the difference between the
two results. For this purpose, GaussKronrod quadrature rules can be useful.
GaussKronrod rules
If the interval [a, b] is subdivided, the Gauss evaluation points of the new subintervals never
coincide with the previous evaluation points (except at zero for odd numbers), and thus the
integrand must be evaluated at every point. GaussKronrod rules are extensions of Gauss
quadrature rules generated by adding n + 1 points to an n-point rule in such a way that the
resulting rule is of order 3n + 1. This allows for computing higher-order estimates while reusing the function values of a lower-order estimate. The difference between a Gauss
quadrature rule and its Kronrod extension are often used as an estimate of the approximation
error.

ClenshawCurtis quadrature
ClenshawCurtis quadrature and Fejr quadrature are methods for numerical integration, or
"quadrature", that are based on an expansion of the integrand in terms of Chebyshev
polynomials. Equivalently, they employ a change of variables x = cos and use a discrete
cosine transform (DCT) approximation for the cosine series. Besides having fast-converging
accuracy comparable to Gaussian quadrature rules, ClenshawCurtis quadrature naturally
leads to nested quadrature rules (where different accuracy orders share points), which is
important for both adaptive quadrature and multidimensional quadrature (cubature).
Briefly, the function f(x) to be integrated is evaluated at the N extrema or roots of a
Chebyshev polynomial and these values are used to construct a polynomial approximation for
the function. This polynomial is then integrated exactly. In practice, the integration weights
for the value of the function at each node are precomputed, and this computation can be
performed in O(NlogN) time by means of fast Fourier transform-related algorithms for the
DCT.
Description
A simple way of understanding the algorithm is to realize that ClenshawCurtis quadrature
amounts to integrating via a change of variable x = cos(). The algorithm is normally
expressed for integration of a function f(x) over the interval [-1,1] (any other interval can be
obtained by appropriate rescaling). For this integral, we can write:

That is, we have transformed the problem from integrating f(x) to one of integrating
f(cos)sin. This can be performed if we know the cosine series for f(cos):

in which case the integral becomes:

105

Of course, in order to calculate the cosine series coefficients

one must again perform a numeric integration, so at first this may not seem to have simplified
the problem. Unlike computation of arbitrary integrals, however, Fourier-series integrations
for periodic functions (like f(cos), by construction), up to the Nyquist frequency k = N, are
accurately computed by the N equally-spaced and equally-weighted points n = n / N for
(except the endpoints are weighted by 1/2, to avoid double-counting,
equivalent to the trapezoidal rule or the EulerMaclaurin formula). That is, we approximate
the cosine-series integral by the type-I discrete cosine transform (DCT):

for
and then use the formula above for the integral in terms of these ak.
Because only a2k is needed, the formula simplifies further into a type-I DCT of order N/2,
assuming N is an even number:

From this formula, it is clear that the Clenshaw-Curtis quadrature rule is symmetric, in that it
weights f(x) and f(x) equally.
Connection to Chebyshev polynomials
The reason that this is connected to the Chebyshev polynomials Tk(x) is that, by definition,
Tk(cos) = cos(k), and so the cosine series above is really an approximation of f(x) by
Chebyshev polynomials:

and thus we are "really" integrating f(x) by integrating its approximate expansion in terms of
Chebyshev polynomials. The evaluation points xn = cos(n / N) correspond to the extrema of
the Chebyshev polynomial TN(x).
The fact that such Chebyshev approximation is just a cosine series under a change of variables
is responsible for the rapid convergence of the approximation as more terms Tk(x) are
included. A cosine series converges very rapidly for functions that are even, periodic, and
sufficiently smooth. This is true here, since f(cos) is even and periodic in by construction,

106

and is k-times differentiable everywhere if f(x) is k-times differentiable on [ 1,1]. (In


contrast, directly applying a cosine-series expansion to f(x) instead of f(cos) will usually not
converge rapidly because the slope of the even-periodic extension would generally be
discontinuous.)
Fejr quadrature
Fejr proposed two quadrature rules very similar to ClenshawCurtis quadrature, but much
earlier (in 1933).
Of these two, Fejr's "second" quadrature rule is nearly identical to ClenshawCurtis. The
only difference is that the endpoints f( 1) and f(1) are set to zero. That is, Fejr only used the
interior extrema of the Chebyshev polynomials, i.e. the true stationary points.
Fejr's "first" quadrature rule evaluates the ak by evaluating f(cos) at a different set of
equally-spaced points, halfway between the extrema: n = (n + 0.5) / N for
.
These are the roots of TN(cos), and are known as the Chebyshev nodes. (These equallyspaced midpoints are the only other choice of quadrature points that preserve both the even
symmetry of the cosine transform and the translational symmetry of the periodic Fourier
series.) This leads to a formula:

which is precisely the type-II DCT. However, Fejr's first quadrature rule is not nested: the
evaluation points for 2N do not coincide with any of the evaluation points for N, unlike
ClenshawCurtis quadrature or Fejr's second rule.
Despite the fact that Fejr discovered these techniques before Clenshaw and Curtis, the name
"ClenshawCurtis quadrature" has become standard.
Comparison to Gaussian quadrature
The classic method of Gaussian quadrature evaluates the integrand at N + 1 points and is
constructed to exactly integrate polynomials up to degree 2N + 1. In contrast, ClenshawCurtis quadrature, above, evaluates the integrand at N + 1 points and exactly integrates
polynomials only up to degree N. It may seem, therefore, that ClenshawCurtis is intrinsically
worse than Gaussian quadrature, but in reality this does not seem to be the case.
In practice, several authors have observed that ClenshawCurtis can have accuracy
comparable to that of Gaussian quadrature for the same number of points. This is possible
because most numeric integrands are not polynomials (especially since polynomials can be
integrated analytically), and approximation of many functions in terms of Chebyshev
polynomials converges rapidly (Chebyshev approximation). In fact, recent theoretical results
argue that both Gaussian and ClenshawCurtis quadrature have error bounded by O([2N] k /
k) for a k-times differentiable integrand.
One often cited advantage of ClenshawCurtis quadrature is that the quadrature weights can
be evaluated in O(NlogN) time by fast Fourier transform algorithms (or their analogues for the

107

DCT), whereas the Gaussian quadrature weights require O(N2) time to compute. As a
practical matter, however, high-order numeric integration is rarely performed by simply
evaluating a quadrature formula for very large N. Instead, one usually employs an adaptive
quadrature scheme that first evaluates the integral to low order, and then successively refines
the accuracy by increasing the number of sample points, possibly only in regions where the
integral is inaccurate. To evaluate the accuracy of the quadrature, one compares the answer
with that of a quadrature rule of even lower order. Ideally, this lower-order quadrature rule
evaluates the integrand at a subset of the original N points, to minimize the integrand
evaluations. This is called a nested quadrature rule, and here Clenshaw-Curtis has the
advantage that the rule for order N uses a subset of the points from order 2N. In contrast,
Gaussian quadrature rules are not naturally nested, and so one must employ GaussKronrod
quadrature formulas or similar methods. Nested rules are also important for sparse grids in
multidimensional quadrature.

GaussKronrod quadrature formula


In numerical mathematics, the GaussKronrod quadrature formula is a method for numerical
integration (calculating approximate values of integrals). GaussKronrod quadrature is a
variant of Gaussian quadrature, in which the evaluation points are chosen so that an accurate
approximation can be computed by re-using the information produced by the computation of a
less accurate approximation. It is an example of what is called a nested quadrature rule: for
the same set of function evaluation points, it has two quadrature rules, one higher-order and
one lower-order (the latter called an embedded rule), and the difference between these two
approximations is used to estimate the calculational error of the integration. These formulas
are named after Alexander Kronrod, who invented them in the 1960s, and Carl Friedrich
Gauss.
Description
The problem in numerical integration is to approximate definite integrals of the form

Such integrals can be approximated, for example, by n-point Gaussian quadrature

where wi, xi are the weights and points at which to evaluate the function f(x).
If the interval [a, b] is subdivided, the Gauss evaluation points of the new subintervals never
coincide with the previous evaluation points (except at zero for odd numbers), and thus the
integrand must be evaluated at every point. GaussKronrod formulas are extensions of Gauss
quadrature formulas generated by adding n + 1 points to an n-point rule in such a way that the
resulting rule is of order 3n + 1. This allows for computing higher-order estimates while reusing the function values of a lower-order estimate. The difference between a Gauss
quadrature rule and its Kronrod extension are often used as an estimate of the approximation
error.
108

Adaptive quadrature
In applied mathematics, adaptive quadrature is a process in which the integral of a function
f(x) is approximated using static quadrature rules on adaptively refined subintervals of the
integration domain. Generally, adaptive algorithms are just as efficient and effective as
traditional algorithms for "well-behaved" integrands, but are also effective for "badlybehaved" integrands for which traditional algorithms fail.
General scheme
Adaptive quadrature follows the general scheme
1. procedure integrate ( f , a , b , tau )

2.

3.
4.
5.
6.
7.
8.

if

then
m = (a + b) / 2
Q = integrate(f,a,m,tau) + integrate(f,m,b,tau)
endif
return Q

An approximation Q to the integral of f(x) over the interval [a,b] is computed (line 2), as well
as an error estimate (line 3). If the estimated error is larger than the required tolerance
(line 4), the interval is subdivided (line 5) and the quadrature is applied on both halves
separately (line 6). Either the initial estimate or the sum of the recursively computed halves is
returned (line 7).
The important components are the quadrature rule itself

the error estimator

and the logic for deciding which interval to subdivide, and when to terminate.
There are, of course, several variants of this scheme. The most common will be discussed
later.
Basic quadrature rules
The quadrature rules generally have the form

109

where the nodes xi and weights wi are generally pre-computed.


In the simplest case, NewtonCotes formulas of even degree are used, where the nodes xi are
evenly spaced in the interval:

.
When such rules are used, the points at which f(x) has been evaluated can be re-used upon
recursion:

A similar strategy is used with ClenshawCurtis quadrature, where the nodes are chosen as

Or, when Fejr quadrature is used,

Gaussian quadrature
GaussKronrod quadrature formula

An algorithm may elect to use different quadrature methods on different subintervals, for
example using a high-order method only where the integrand is smooth.
Error estimation
Some quadrature algorithms generate a sequence of results which should approach the correct
value. Otherwise one can use a "null rule" which has the form of the above quadrature rule,
but whose value would be zero for a simple integrand (for example, if the integrand were a
polynomial of the appropriate degree).

110

Subdivision Logic
"Local" adaptive quadrature makes the acceptable error for a given interval proportional to the
length of that interval. This criterion can be difficult to satisfy if the integrand are badly
behaved at only a few points, for example with a few step discontinuities. Alternatively, one
could require only that the sum of the errors on each of the subintervals be less than the user's
requirement. This would be "global" adaptive quadrature. Global adaptive quadrature can be
more efficient (using fewer evaluations of the integrand) but are generally more complex to
program and may require more working space to record information on the current set of
intervals.

Romberg's method
In numerical analysis, Romberg's method generates a triangular array consisting of numerical
estimates of the definite integral

by using Richardson extrapolation repeatedly on the trapezium rule. Romberg's method


evaluates the integrand at equally-spaced points. The integrand must have continuous
derivatives, though fairly good results may be obtained if only a few derivatives exist. If it is
possible to evaluate the integrand at unequally-spaced points, then other methods such as
Gaussian quadrature and ClenshawCurtis quadrature are generally more accurate.
Method
The method can be defined inductively as follows:

or

where

111

In big O notation, the error for R(n, m) is:

The zeroeth extrapolation, R(n, 0), is equivalent to the trapezoidal rule with 2n1 + 1 points;
the first extrapolation, R(n, 1), is equivalent to Simpson's rule with 2n1 + 1 points.
When function evaluations are expensive, it may be preferable to replace the polynomial
interpolation of Richardson with the rational interpolation proposed by Bulirsch & Stoer.

Monte Carlo integration


In mathematics, Monte Carlo integration is numerical integration using random numbers. That
is, Monte Carlo integration methods are algorithms for the approximate evaluation of definite
integrals, usually multidimensional ones. The usual algorithms evaluate the integrand at a
regular grid. Monte Carlo methods, however, randomly choose the points at which the
integrand is evaluated.
Informally, to estimate the area of a domain D, first pick a simple domain d whose area is
easily calculated and which contains D. Now pick a sequence of random points that fall
within d. Some fraction of these points will also fall within D. The area of D is then estimated
as this fraction multiplied by the area of d.
The traditional Monte Carlo algorithm distributes the evaluation points uniformly over the
integration region. Adaptive algorithms such as VEGAS and MISER use importance
sampling and stratified sampling techniques to get a better result.

An illustration of Monte Carlo integration. In this example, the domain D is the inner circle
and the domain d is the square. Because the square's area can be easily calculated, the area of
the circle can be estimated by the ratio (0.8) of the points inside the circle (40) to the total
number of points (50), yielding an approximation for

112

Recursive stratified sampling


Recursive stratified sampling is a generalization of one-dimensional adaptive quadratures to
multi-dimensional integrals. On each recursion step the integral and the error are estimated
using a plain Monte Carlo algorithm. If the error estimate is larger than the required accuracy
the integration volume is divided into sub-volumes and the procedure is recursively applied to
sub-volumes.
The ordinary `dividing by two' strategy does not work for multi-dimensions as the number of
sub-volumes grows way too fast to keep track of. Instead one estimates along which
dimension a subdivision should bring the most dividends and only subdivides the volume
along this dimension.
The stratified sampling algorithm concentrates the sampling points in the regions where the
variance of the function is largest thus reducing the grand variance and making the sampling
more effective.
MISER Monte Carlo
The MISER algorithm of Press and Farrar is based on recursive stratified sampling. This
technique aims to reduce the overall integration error by concentrating integration points in
the regions of highest variance.
Hence the smallest error estimate is obtained by allocating sample points in proportion to the
standard deviation of the function in each sub-region.
The MISER algorithm proceeds by bisecting the integration region along one coordinate axis
to give two sub-regions at each step. The direction is chosen by examining all d possible
bisections and selecting the one which will minimize the combined variance of the two subregions. The variance in the sub-regions is estimated by sampling with a fraction of the total
number of points available to the current step. The same procedure is then repeated
recursively for each of the two half-spaces from the best bisection. The remaining sample
points are allocated to the sub-regions. This recursive allocation of integration points
continues down to a user-specified depth where each sub-region is integrated using a plain
Monte Carlo estimate. These individual values and their error estimates are then combined
upwards to give an overall result and an estimate of its error.
This routines uses the MISER Monte Carlo algorithm to integrate the function f over the dimdimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu,
each of size dim. The integration uses a fixed number of function calls, and obtains random
sampling points using the random number generator r. A previously allocated workspace s
must be supplied. The result of the integration is returned in result, with an estimated absolute
error abserr.

113

Importance sampling
VEGAS Monte Carlo
The VEGAS algorithm of [Link] is based on importance sampling. It samples points
from the probability distribution described by the function | f |, so that the points are
concentrated in the regions that make the largest contribution to the integral.
In general, if the Monte Carlo integral of f is sampled with points distributed according to a
probability distribution described by the function g.
In practice it is not possible to sample from the exact distribution g for an arbitrary function,
so importance sampling algorithms aim to produce efficient approximations to the desired
distribution.
The VEGAS algorithm approximates the exact distribution by making a number of passes
over the integration region while histogramming the function f. Each histogram is used to
define a sampling distribution for the next pass. Asymptotically this procedure converges to
the desired distribution.
VEGAS incorporates a number of additional features, and combines both stratified sampling
and importance sampling. The integration region is divided into a number of "boxes", with
each box getting a fixed number of points (the goal is 2). Each box can then have a fractional
number of bins, but if bins/box is less than two, Vegas switches to a kind variance reduction
(rather than importance sampling).
This routines uses the VEGAS Monte Carlo algorithm to integrate the function f over the dimdimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu,
each of size dim. The integration uses a fixed number of function calls, and obtains random
sampling points using the random number generator r. A previously allocated workspace s
must be supplied. The result of the integration is returned in result, with an estimated absolute
error abserr. The result and its error estimate are based on a weighted average of independent
samples.
The VEGAS algorithm computes a number of independent estimates of the integral internally,
according to the iterations parameter, and returns their weighted average. Random sampling
of the integrand can occasionally produce an estimate where the error is zero, particularly if
the function is constant in some regions. An estimate with zero error causes the weighted
average to break down and must be handled separately. In the original Fortran
implementations of VEGAS the error estimate is made non-zero by substituting a small value
(typically 1e-30).

114

Numerical differentiation
Numerical differentiation is a technique of numerical analysis to produce an estimate of the
derivative of a mathematical function or function subroutine using values from the function
and perhaps other knowledge about the function.
Finite difference formulae
The simplest method is to use finite difference approximations. A simple two-point estimation
is to compute the slope of a nearby secant line through the points (x,f(x)) and (x+h,f(x+h)).
Choosing a small number h, h represents a small change in x, and it can be either positive or
negative. The slope of this line is

This expression is Newton's difference quotient.

The slope of this secant line differs from the slope of the tangent line by an amount that is
approximately proportional to h. As h approaches zero, the slope of the secant line approaches
the slope of the tangent line. Therefore, the true derivative of f at x is the limit of the value of
the difference quotient as the secant lines get closer and closer to being a tangent line:

Since immediately substituting 0 for h results in division by zero, calculating the derivative
directly can be unintuitive.
A simple three-point estimation is to compute the slope of a nearby secant line through the
points (x-h,f(x-h)) and (x+h,f(x+h)). The slope of this line is

115

More generally, three-point estimation uses the secant line through the points (x h1,f(x h1))
and (x + h2,f(x + h2)). The slope of this line is

The slope of these secant lines differ from the slope of the tangent line by an amount that is
approximately proportional to h2 so that three-point estimation is a more accurate
approximation to the tangent line than two-point estimation when h is small.
For the three-point estimation method, the estimation error is given by:

,
where c is some point between x h and x + h. This error does not include the rounding error.
Practical considerations
An important consideration in practice when the function is approximated using floating point
arithmetic is how small a value of h to choose. If chosen too small, the subtraction will yield a
large rounding error and in fact all the finite difference formulae are ill-conditioned and due to
cancellation will produce a value of zero if h is small enough. If too large, the calculation of
the slope of the secant line will be more accurate, but the estimate of the slope of the tangent
by using the secant could be worse.
A suitable choice for h is sqrt(eps) * x where the machine epsilon eps is typically of the order
2.2e-16. Another important consideration is to make sure that h and x+h are representable in
floating point precision so that the difference between x+h and x is exactly h. This can be
accomplished by placing their values into and out of memory as follows: h = sqrt(eps) * x,
temp = x + h and h = temp x. It may be necessary to declare temp as a volatile variable to
ensure the steps are not undone by compiler optimization.
Higher order methods
Higher order methods for approximating the derivative, as well as methods for higher
derivatives exist. Below is the five point method for the first derivative (five-point stencil in
one dimension).

Slope
In mathematics, the slope or gradient of a line describes its steepness, incline, or grade. A
higher slope value indicates a steeper incline.

116

The slope is defined as the ratio of the "rise" divided by the "run" between two points on a
line, or in other words, the ratio of the altitude change to the horizontal distance between any
two points on the line. Given two points (x1,y1) and (x2,y2) on a line, the slope m of the line is

The slope of a line is defined as the rise over the run, m = y/x.
Through differential calculus, one can calculate the slope of the tangent line to a curve at a
point. The concept of slope applies directly to grades or gradients in geography and civil
engineering. Through trigonometry, the grade m of a road is related to its angle of incline by

Definition
The slope of a line in the plane containing the x and y axes is generally represented by the
letter m, and is defined as the change in the y coordinate divided by the corresponding change
in the x coordinate, between two distinct points on the line. This is described by the following
equation:

(The delta symbol, "", is commonly used in mathematics to mean "difference" or "change".)
Given two points (x1,y1) and (x2,y2), the change in x from one to the other is x2 x1, while the
change in y is y2 y1. Substituting both quantities into the above equation obtains the
following:

117

Note that the way the points are chosen on the line and their order does not matter; the slope
will be the same in each case. Other curves have "accelerating" slopes and one can use
calculus to determine such slopes.
Calculus
The concept of a slope is central to differential calculus. For non-linear functions, the rate of
change varies along the curve. The derivative of the function at a point is the slope of the line
tangent to the curve at the point, and is thus equal to the rate of change of the function at that
point.
If we let x and y be the distances (along the x and y axes, respectively) between two points
on a curve, then the slope given by the above definition,

,
is the slope of a secant line to the curve. For a line, the secant between any two points is the
line itself, but this is not the case for any other type of curve.
For example, the slope of the secant intersecting y = x2 at (0,0) and (3,9) is 3. (The slope of
the tangent at x = 32 is also 3 a consequence of the mean value theorem.)
By moving the two points closer together so that y and x decrease, the secant line more
closely approximates a tangent line to the curve, and as such the slope of the secant
approaches that of the tangent. Using differential calculus, we can determine the limit, or the
value that y/x approaches as y and x get closer to zero; it follows that this limit is the
exact slope of the tangent. If y is dependent on x, then it is sufficient to take the limit where
only x approaches zero. Therefore, the slope of the tangent is the limit of y/x as x
approaches zero. We call this limit the derivative.

Tangent
In geometry, the tangent line (or simply the tangent) to a curve at a given point is the straight
line that "just touches" the curve at that point (in the sense explained more precisely below).
As it passes through the point of tangency, the tangent line is "going in the same direction" as
the curve, and in this sense it is the best straight-line approximation to the curve at that point.
The same definition applies to space curves and curves in n-dimensional Euclidean space.

Tangent to a curve.
118

Similarly, the tangent plane to a surface at a given point is the plane that "just touches" the
surface at that point. The concept of a tangent is one of the most fundamental notions in
differential geometry and has been extensively generalized; (Tangent space).
In trigonometry, tangent is the slope formula that combines the rise and run. The tangent
function takes an angle and tells the slope when the length of the line is 1.
The word "tangent" comes from the Latin tangere, meaning "to touch".
Tangent line to a curve
The intuitive notion that a tangent line "just touches" a curve can be made more explicit by
considering the sequence of straight lines (secant lines) passing through two points, A and B,
those that lie on the function curve. The tangent at A is the limit when point B approximates or
tends to A. The existence and uniqueness of the tangent line depends on a certain type of
mathematical smoothness, known as "differentiability". For example, if two circular arcs meet
at a sharp point (a vertex) then there is no uniquely defined tangent at the vertex because the
limit of the progression of secant lines depends on the direction in which "point B"
approaches the vertex.

A tangent, a chord, and a secant to a circle.


In most cases commonly encountered, the tangent to a curve does not cross the curve at the
point of tangency (though it may, when continued, cross the curve at other places away from
the point of tangent) This is true, for example, of all tangents to a circle or a parabola.
However, at exceptional points called inflection points, the tangent line does cross the curve
at the point of tangency. An example is the point (0,0) on the graph of the cubic parabola y =
x3.
Conversely, it may happen that the curve lies entirely on one side of a straight line passing
through a point on it, and yet this straight line is not a tangent line. This is the case, for
example, for a line passing through the vertex of a triangle and not intersecting the triangle
where the tangent line does not exist for the reasons explained above. In convex geometry,
such lines are called supporting lines.

119

Analytical approach
The geometric idea of the tangent line as the limit of secant lines serves as the motivation for
analytical methods that are used to find tangent lines explicitly. The question of finding the
tangent line to a graph, or the tangent line problem, was one of the central questions leading
to the development of calculus in the 17th century. In the second book of his Geometry, Ren
Descartes said of the problem of constructing the tangent to a curve, "And I dare say that this
is not only the most useful and most general problem in geometry that I know, but even that I
have ever desired to know".
Intuitive description
Suppose that a curve is given as the graph of a function, y = f(x). To find the tangent line at
the point p = (a, f(a)), consider another nearby point q = (a + h, f(a + h)) on the curve. The
slope of the secant line passing through p and q is equal to the difference quotient

and since the difference in x values (h) is getting smaller or going to zero it can be described
has

As the point q approaches p, which corresponds to making h smaller and smaller, the
difference quotient should approach a certain limiting value k, which is the slope of the
tangent line at the point p. If k is known, the equation of the tangent line can be found in the
point-slope form:

More rigorous description


To make the preceding reasoning rigorous, one has to explain what is meant by the difference
quotient approaching a certain limiting value k. The precise mathematical formulation was
given by Cauchy in the 19th century and is based on the notion of limit. Suppose that the
graph does not have a break or a sharp edge at p and it is neither plumb nor too wiggly near p.
Then there is a unique value of k such that as h approaches 0, the difference quotient gets
closer and closer to k, and the distance between them becomes negligible compared with the
size of h, if h is small enough. This leads to the definition of the slope of the tangent line to
the graph as the limit of the difference quotients for the function f. This limit is the derivative
of the function f at x = a, denoted f (a). Using derivatives, the equation of the tangent line can
be stated as follows:

Calculus provides rules for computing the derivatives of functions that are given by formulas,
such as the power function, trigonometric functions, exponential function, logarithm, and their
120

various combinations. Thus, equations of the tangents to graphs of all these functions, as well
as many others, can be found by the methods of calculus.
When the method fails
Calculus also demonstrates that there are functions and points on their graphs for which the
limit determining the slope of the tangent line does not exist. For these points the function f is
non-differentiable. There are two possible reasons for the method of finding the tangents
based on the limits and derivatives to fail: either the geometric tangent exists, but it is a
vertical line, which cannot be given in the point-slope form since it does not have a slope, or
the graph is too badly behaved to admit a geometric tangent.
The graph y = x1/3 illustrates the first possibility: here the difference quotient at a = 0 is equal
to h1/3/h = h2/3, which becomes very large as h approaches 0. The tangent line to this curve at
the origin is vertical.
The graph y = |x| of the absolute value function consists of two straight lines with different
slopes joined at the origin. As a point q approaches the origin from the right, the secant line
always has slope 1. As a point q approaches the origin from the left, the secant line always has
slope 1. Therefore, there is no unique tangent to the graph at the origin (although in a certain
sense, there are two half-tangents, corresponding to two possible directions of approaching
the origin).
Surfaces and higher-dimensional manifolds
The tangent plane to a surface at a given point p is defined in an analogous way to the tangent
line in the case of curves. It is the best approximation of the surface by a plane at p, and can
be obtained as the limiting position of the planes passing through 3 distinct points on the
surface close to p as these points converge to p. More generally, there is a k-dimensional
tangent space at each point of a k-dimensional manifold in the n-dimensional Euclidean space.

Secant line
A secant line of a curve is a line that (locally) sometimes intersects two points on the curve.
The word secant comes from the Latin secare, to cut.
It can be used to approximate the tangent to a curve, at some point f. If the secant to a curve is
defined by two points, P and Q, with P fixed and Q variable, as Q approaches P along the
curve, the direction of the secant approaches that of the tangent at P, assuming there is just
one. As a consequence, one could say that the limit of the secant's slope, or direction, is that
of the tangent. In calculus, this idea is the basis of the geometric definition of the derivative.
A chord is the portion of a secant that lies within the curve.

Derivative
In calculus (a branch of mathematics) the derivative is a measure of how a function changes
as its input changes. Loosely speaking, a derivative can be thought of as how much a quantity
is changing at a given point; for example, the derivative of the position of a vehicle with
respect to time is the instantaneous velocity at which the vehicle is traveling. Conversely, the
integral of the velocity over time is the change in the vehicle's position.
121

The derivative of a function at a chosen input value describes the best linear approximation of
the function near that input value. For a real-valued function of a single real variable, the
derivative at a point equals the slope of the tangent line to the graph of the function at that
point. In higher dimensions, the derivative of a function at a point is a linear transformation
called the linearization. A closely related notion is the differential of a function.
The process of finding a derivative is called differentiation. The fundamental theorem of
calculus states that differentiation is the reverse process to integration.
Differentiation and the derivative
Differentiation is a method to compute the rate at which a dependent output y changes with
respect to the change in the independent input x. This rate of change is called the derivative of
y with respect to x. In more precise language, the dependence of y upon x means that y is a
function of x. This functional relationship is often denoted y = (x), where denotes the
function. If x and y are real numbers, and if the graph of y is plotted against x, the derivative
measures the slope of this graph at each point.
The simplest case is when y is a linear function of x, meaning that the graph of y against x is a
straight line. In this case, y = (x) = m x + c, for real numbers m and c, and the slope m is
given by

where the symbol (the uppercase form of the Greek letter Delta) is an abbreviation for
"change in." This formula is true because
y + y = (x+ x) = m (x + x) + c = m x + c + m x = y + mx.
It follows that y = m x.
This gives an exact value for the slope of a straight line. If the function is not linear (i.e. its
graph is not a straight line), however, then the change in y divided by the change in x varies:
differentiation is a method to find an exact value for this rate of change at any given value of
x.
The idea, illustrated by the Figures below, is to compute the rate of change as the limiting
value of the ratio of the differences y / x as x becomes infinitely small.
In Leibniz's notation, such an infinitesimal change in x is denoted by dx, and the derivative of
y with respect to x is written

suggesting the ratio of two infinitesimal quantities. (The above expression is read as "the
derivative of y with respect to x", "d y by d x", or "d y over d x". The oral form "d y d x" is
often used conversationally, although it may lead to confusion.)

122

The most common approach to turn this intuitive idea into a precise definition uses limits, but
there are other methods, such as non-standard analysis.

The tangent line at (x, (x)).

The secant to curve y= (x) determined by points (x, (x)) and (x+h, (x+h)).

The tangent line as limit of secants.

123

Definition via difference quotients


Let be a real valued function. In classical geometry, the tangent line at a real number a was
the unique line through the point (a, (a)) which did not meet the graph of transversally,
meaning that the line did not pass straight through the graph. The derivative of y with respect
to x at a is, geometrically, the slope of the tangent line to the graph of at a. The slope of the
tangent line is very close to the slope of the line through (a, (a)) and a nearby point on the
graph, for example (a + h, (a + h)). These lines are called secant lines. A value of h close to
zero will give a good approximation to the slope of the tangent line, and smaller values (in
absolute value) of h will, in general, give better approximations. The slope m of the secant
line is the difference between the y values of these points divided by the difference between
the x values, that is,

This expression is Newton's difference quotient. The derivative is the value of the difference
quotient as the secant lines approach the tangent line. Formally, the derivative of the function
at a is the limit

of the difference quotient as h approaches zero, if this limit exists. If the limit exists, then is
differentiable at a. Here (a) is one of several common notations for the derivative
(Lagrange's notation).
Equivalently, the derivative satisfies the property that

which has the intuitive interpretation that the tangent line to at a gives the best linear
approximation

to near a (i.e., for small h). This interpretation is the easiest to generalize to other settings.
Substituting 0 for h in the difference quotient causes division by zero, so the slope of the
tangent line cannot be found directly. Instead, define Q(h) to be the difference quotient as a
function of h:

Q(h) is the slope of the secant line between (a, (a)) and (a + h, (a + h)). If is a continuous
function, meaning that its graph is an unbroken curve with no gaps, then Q is a continuous

124

exists, meaning that there is a


function away from the point h = 0. If the limit
way of choosing a value for Q(0) which makes the graph of Q a continuous function, then the
function is differentiable at the point a, and its derivative at a equals Q(0).
In practice, the existence of a continuous extension of the difference quotient Q(h) to h = 0 is
shown by modifying the numerator to cancel h in the denominator. This process can be long
and tedious for complicated functions, and many shortcuts are commonly used to simplify the
process.

Finite difference
A finite difference is a mathematical expression of the form f(x + b) f(x + a). If a finite
difference is divided by b a, one gets a difference quotient. The approximation of
derivatives by finite differences plays a central role in finite difference methods for the
numerical solution of differential equations, especially boundary value problems.
In mathematical analysis, operators involving finite differences are studied. A difference
operator is an operator which maps a function f to a function whose values are the
corresponding finite differences.
Forward, backward, and central differences
Only three forms are commonly considered: forward, backward, and central differences.
A forward difference is an expression of the form

Depending on the application, the spacing h may be variable or held constant.


A backward difference uses the function values at x and x h, instead of the values at x + h
and x:

Finally, the central difference is given by

Relation with derivatives


The derivative of a function f at a point x is defined by the limit

If h has a fixed (non-zero) value, instead of approaching zero, then the right-hand side is

125

Hence, the forward difference divided by h approximates the derivative when h is small. The
error in this approximation can be derived from Taylor's theorem. Assuming that f is
continuously differentiable, the error is

The same formula holds for the backward difference:

However, the central difference yields a more accurate approximation. Its error is proportional
to square of the spacing (if f is twice continuously differentiable):

Higher-order differences
In an analogous way one can obtain finite difference approximations to higher order
derivatives and differential operators. For example, by using the above central difference
formula for f'(x + h / 2) and f'(x h / 2) and applying a central difference formula for the
derivative of f' at x, we obtain the central difference approximation of the second derivative of
f:

More generally, the nth-order forward, backward, and central differences are respectively
given by:

Note that the central difference will, for odd n, have h multiplied by non-integers. This is
often a problem because it amounts to changing the interval of discretization. The problem
may be remedied taking the average of n[f](x h / 2) and n[f](x + h / 2).
126

The relationship of these higher-order differences with the respective derivatives is very
straightforward:

Higher-order differences can also be used to construct better approximations. As mentioned


above, the first-order difference approximates the first-order derivative up to a term of order
h. However, the combination

approximates f'(x) up to a term of order h2. This can be proven by expanding the above
expression in Taylor series, or by using the calculus of finite differences, explained below.
If necessary, the finite difference can be centered about any point by mixing forward,
backward, and central differences.
Arbitrarily sized kernels
Using a little linear algebra, one can fairly easily construct approximations, which sample an
arbitrary number of points to the left and a (possibly different) number of points to the right of
the center point, for any order of derivative. This involves solving a linear system such that
the Taylor expansion of the sum of those points, around the center point, well approximates
the Taylor expansion of the desired derivative.
This is useful for differentiating a function on a grid, where, as one approaches the edge of the
grid, one must sample fewer and fewer points on one side.
Finite difference methods
An important application of finite differences is in numerical analysis, especially in numerical
differential equations, which aim at the numerical solution of ordinary and partial differential
equations respectively. The idea is to replace the derivatives appearing in the differential
equation by finite differences that approximate them. The resulting methods are called finite
difference methods.
Common applications of the finite difference method are in computational science and
engineering disciplines, such as thermal engineering, fluid mechanics, etc.

Difference operator
In mathematics, a difference operator maps a function, (x), to another function, (x + b)
(x + a).
The forward difference operator

127

occurs frequently in the calculus of finite differences, where it plays a role formally similar to
that of the derivative, but used in discrete circumstances. Difference equations can often be
solved with techniques very similar to those for solving differential equations. This similarity
led to the development of time scale calculus. Analogously we can have the backward
difference operator

When restricted to polynomial functions f, the forward difference operator is a delta operator,
i.e., a shift-equivariant linear operator on polynomials that reduces degree by 1.
n-th difference
The nth forward difference of a function f(x) is given by

is the binomial coefficient. Forward differences applied to a sequence are


where
sometimes called the binomial transform of the sequence, and have a number of interesting
combinatorial properties.
Forward differences may be evaluated using the NrlundRice integral. The integral
representation for these types of series is interesting because the integral can often be
evaluated using asymptotic expansion or saddle-point techniques; by contrast, the forward
difference series can be extremely hard to evaluate numerically, because the binomial
coefficients grow rapidly for large n.
Newton series
The Newton series or Newton forward difference equation, named after Isaac Newton, is the
relationship

which holds for any polynomial function f and for some, but not all, analytic functions. Here,

is the binomial coefficient, and

128

is the "falling factorial" or "lower factorial" and the empty product (x)0 defined to be 1. Note
also the formal similarity of this result to Taylor's theorem; this is one of the observations that
lead to the idea of umbral calculus.
In analysis with p-adic numbers, Mahler's theorem states that the assumption that f is a
polynomial function can be weakened all the way to the assumption that f is merely
continuous.
Carlson's theorem provides necessary and sufficient conditions for a Newton series to be
unique, if it exists. However, a Newton series will not, in general, exist.
The Newton series, together with the Stirling series and the Selberg series, is a special case of
the general difference series, all of which are defined in terms of scaled forward differences.

Taylor's theorem
In calculus, Taylor's theorem gives a sequence of approximations of a differentiable function
around a given point by polynomials (the Taylor polynomials of that function) whose
coefficients depend only on the derivatives of the function at that point. The theorem also
gives precise estimates on the size of the error in the approximation. The theorem is named
after the mathematician Brook Taylor, who stated it in 1712, though the result was first
discovered 41 years earlier in 1671 by James Gregory.
Taylor's theorem in one variable
Motivation
Taylor's theorem asserts that any sufficiently smooth function can locally be approximated by
polynomials. A simple example of application of Taylor's theorem is the approximation of the
exponential function ex near x = 0:

The approximation is called the n-th order Taylor approximation to ex because it approximates
the value of the exponential function by a polynomial of degree n. This approximation only
holds for x close to zero, and as x moves further away from zero, the approximation becomes
worse. The quality of the approximation is controlled by the remainder term:

More generally, Taylor's theorem applies to any sufficiently differentiable function , giving
an approximation, for x near a point a, of the form

129

The remainder term is just the difference of the function and its approximating polynomial

Although an explicit formula for the remainder term is seldom of any use, Taylor's theorem
also provides several ways in which to estimate the value of the remainder. In other words, for
x near enough to a, the remainder ought to be "small"; Taylor's theorem gives information on
precisely how small it actually is.
Statement
The precise statement of the theorem is as follows: If n 0 is an integer and is a function
which is n times continuously differentiable on the closed interval [a, x] and n + 1 times
differentiable on the open interval (a, x), then

Here, n! denotes the factorial of n, and Rn(x) is a remainder term, denoting the difference
between the Taylor polynomial of degree n and the original function. The remainder term
Rn(x) depends on x and is small if x is close enough to a. Several expressions are available for
it.
The Lagrange form of the remainder term states that there exists a number between a and x
such that

This exposes Taylor's theorem as a generalization of the mean value theorem. In fact, the
mean value theorem is used to prove Taylor's theorem with the Lagrange remainder term.
The Cauchy form of the remainder term states that there exists a number between a and x
such that

More generally, if G(t) is a continuous function on [a,x] which is differentiable with nonvanishing derivative on (a,x), then there exists a number between a and x such that

This exposes Taylor's theorem as a generalization of the Cauchy mean value theorem.

130

The above forms are restricted to the case of functions taking real values. However, the
integral form of the remainder term applies as well when the function takes complex values. It
is:

provided, as is often the case, (n) is absolutely continuous on [a, x]. This shows the theorem
to be a generalization of the fundamental theorem of calculus.
In general, a function does not need to be equal to its Taylor series, since it is possible that the
Taylor series does not converge, or that it converges to a different function. However, for
many functions (x), one can show that the remainder term Rn approaches zero as n
approaches . Those functions can be expressed as a Taylor series in a neighbourhood of the
point a and are called analytic.
Taylor's theorem (with the integral formulation of the remainder term) is also valid if the
function has complex values or vector values. Furthermore, there is a version of Taylor's
theorem for functions in several variables. For complex functions analytic in a region
containing a circle C surrounding a and its interior, there is a contour integral expression for
the remainder

valid inside of C.
Estimates of the remainder
Another common version of Taylor's theorem holds on an interval (a r, a + r) where the
variable x is assumed to take its values. This formulation of the theorem has the advantage
that it is often possible to explicitly control the size of the remainder terms, and thus arrive at
an approximation of a function valid in a whole interval with precise bounds on the quality of
the approximation.
A precise version of Taylor's theorem in this form is as follows. Suppose is a function which
is n times continuously differentiable on the closed interval [a r, a + r] and n + 1 times
differentiable on the open interval (a r, a + r). If there exists a positive real constant Mn
such that |(n+1)(x)| Mn for all x (a r, a + r), then

where the remainder function Rn satisfies the inequality (known as Cauchy's estimate):

131

for all x (a r, a + r). This is called a uniform estimate of the error in the Taylor
polynomial centered at a, because it holds uniformly for all x in the interval.
If is infinitely differentiable on [a r, a + r], then positive constants Mn exist for each n = 1,
2, 3, such that | (n+1)(x)| Mn for all x (a r, a + r). If, in addition, it is possible to
select these constants so that

as
then is an analytic function on (a r, a + r). In particular, the remainder term in the Taylor
approximation Rn(x) tends to zero uniformly as n. In other words, an analytic function is
the uniform limit of its Taylor polynomials on an interval.
Taylor's theorem for several variables
Taylor's theorem can be generalized to several variables as follows. Let B be a ball in RN
centered at a point a, and be a real-valued function defined on the closure
having n+1
continuous partial derivatives at every point. Taylor's theorem asserts that for any
,

where the summation extends over multi-indices (this formula uses the multi-index
notation).
The remainder terms satisfy the inequality

for all with || = n + 1. As was the case with one variable, the remainder terms can be
described explicitly.

Taylor series
In mathematics, the Taylor series is a representation of a function as an infinite sum of terms
calculated from the values of its derivatives at a single point. It may be regarded as the limit
of the Taylor polynomials. Taylor series are named after the English mathematician Brook
Taylor. If the series is centered at zero, the series is also called a Maclaurin series, named
after the Scottish mathematician Colin Maclaurin.
Definition
The Taylor series of a real or complex function (x) that is infinitely differentiable in a
neighbourhood of a real or complex number a, is the power series

132

which in a more compact form can be written as

where n! denotes the factorial of n and (n)(a) denotes the nth derivative of evaluated at the
point a; the zeroth derivative of is defined to be itself and (x a) and 0! are both defined
to be 1.
In the particular case where a = 0, the series is also called a Maclaurin series.
Convergence
Taylor series need not in general be convergent. More precisely, the set of functions with a
convergent Taylor series is a meager set in the Frechet space of smooth functions. In spite of
this, for many functions that arise in practice, the Taylor series does converge.
The limit of a convergent Taylor series of a function f need not in general be equal to the
function value f(x), but in practice often it is. For example, the function

is infinitely differentiable at x = 0, and has all derivatives zero there. Consequently, the Taylor
series of f(x) is zero. However, f(x) is not equal to the zero function, and so it is not equal to
its Taylor series.
If f(x) is equal to its Taylor series in a neighborhood of a, it is said to be analytic in this
neighborhood. If f(x) is equal to its Taylor series everywhere it is called entire. The
exponential function ex and the trigonometric functions sine and cosine are examples of entire
functions. Examples of functions that are not entire include the logarithm, the trigonometric
function tangent, and its inverse arctan. For these functions the Taylor series do not converge
if x is far from a.
Taylor series can be used to calculate the value of an entire function in every point, if the
value of the function, and of all of its derivatives, are known at a single point. Uses of the
Taylor series for entire functions include:
1. The partial sums (the Taylor polynomials) of the series can be used as approximations
of the entire function. These approximations are good if sufficiently many terms are
included.
2. The series representation simplifies many mathematical proofs.

133

For example, the function sin(x) may be accurately approximated around the point a = 0 by a
polynomial of degree seven:

The error in this approximation is no more than |x|9/9!. In particular, for 1 < x < 1, the error
is less than 0.000003.
In contrast, the function ln(1 + x) approximated around the point a = 0 by Taylor polynomials
converges to the function only in the region 1 < x 1; outside of this region the higherdegree Taylor polynomials are worse approximations for the function. This is similar to
Runge's phenomenon.
The error incurred in approximating a function by its nth-degree Taylor polynomial, is called
the remainder or residual and is denoted by the function Rn(x). Taylor's theorem can be used
to obtain a bound on the size of the remainder.
Calculation of Taylor series
Several methods exist for the calculation of Taylor series of a large number of functions. One
can attempt to use the Taylor series as-is and generalize the form of the coefficients, or one
can use manipulations such as substitution, multiplication or division, addition or subtraction
of standard Taylor series to construct the Taylor series of a function, by virtue of Taylor series
being power series. In some cases, one can also derive the Taylor series by repeatedly
applying integration by parts. Particularly convenient is the use of computer algebra systems
to calculate Taylor series.
Taylor series in several variables
The Taylor series may also be generalized to functions of more than one variable with

For example, for a function that depends on two variables, x and y, the Taylor series to second
order about the point (a, b) is:

where the subscripts denote the respective partial derivatives.


A second-order Taylor series expansion of a scalar-valued function of more than one variable
can be written compactly as

134

where Df(a) is the gradient of f evaluated at x = a and D2f(a) is the Hessian matrix. Applying
the multi-index notation the Taylor series for several variables becomes

which is to be understood as a still more abbreviated multi-index version of the first equation
of this paragraph, again in full analogy to the single variable case.

135

136

Numerical ordinary differential equations


Numerical ordinary differential equations is the part of numerical analysis which studies the
numerical solution of ordinary differential equations (ODEs). This field is also known under
the name numerical integration, but some people reserve this term for the computation of
integrals.
Many differential equations cannot be solved analytically, in which case we have to satisfy
ourselves with an approximation to the solution. The algorithms studied here can be used to
compute such an approximation. An alternative method is to use techniques from calculus to
obtain a series expansion of the solution.
Ordinary differential equations occur in many scientific disciplines, for instance in mechanics,
chemistry, biology, and economics. In addition, some methods in numerical partial
differential equations convert the partial differential equation into an ordinary differential
equation, which must then be solved.
The problem
We want to approximate the solution of the differential equation

where f is a function that maps [t0,) Rd to Rd, and the initial condition y0 Rd is a given
vector.
The above formulation is called an initial value problem (IVP). The Picard-Lindelf theorem
states that there is a unique solution, if f is Lipschitz continuous. In contrast, boundary value
problems (BVPs) specify (components of) the solution y at more than one point. Different
methods need to be used to solve BVPs, for example the shooting method (and its variants) or
global methods like finite differences, galerkin methods, or collocation methods.
Note that we restrict ourselves to first-order differential equations (meaning that only the first
derivative of y appears in the equation, and no higher derivatives). However, a higher-order
equation can easily be converted to a system of first-order equations by introducing extra
variables. For example, the second-order equation y'' = y can be rewritten as two first-order
equations: y' = z and z' = y.
Methods
Two elementary methods are discussed to give the reader a feeling for the subject. After that,
pointers are provided to other methods (which are generally more accurate and efficient). The
methods mentioned here are analysed in the next section.
The Euler method
A brief explanation: From any point on a curve, you can find an approximation of a nearby
point on the curve by moving a short distance along a line tangent to the curve.

137

Rigorous development: Starting with the differential equation (1), we replace the derivative y'
by the finite difference approximation

which when re-arranged yields the following formula

and using (1) gives:

This formula is usually applied in the following way. We choose a step size h, and we
construct the sequence t0, t1 = t0 + h, t2 = t0 + 2h, We denote by yn a numerical estimate of
the exact solution y(tn). Motivated by (3), we compute these estimates by the following
recursive scheme

This is the Euler method (or forward Euler method, in contrast with the backward Euler
method, to be described below). The method is named after Leonhard Euler who described it
in 1768.
The Euler method is an example of an explicit method. This means that the new value yn+1 is
defined in terms of things that are already known, like yn.
The backward Euler method
If, instead of (2), we use the approximation

we get the backward Euler method:

The backward Euler method is an implicit method, meaning that we have to solve an equation
to find yn+1. One often uses fixed point iteration or (some modification of) the Newton
Raphson method to achieve this. Of course, it costs time to solve this equation; this cost must
be taken into consideration when one selects the method to use. The advantage of implicit
methods such as (6) is that they are usually more stable for solving a stiff equation, meaning
that a larger step size h can be used.

138

Generalizations
The Euler method is often not accurate enough. In more precise terms, it only has order one
(the concept of order is explained below). This caused mathematicians to look for higherorder methods.
One possibility is to use not only the previously computed value yn to determine yn+1, but to
make the solution depend on more past values. This yields a so-called multistep method.
Perhaps the simplest is the Leapfrog method which is second order and (roughly speaking)
relies on two time values.
Almost all practical multistep methods fall within the family of linear multistep methods,
which have the form

Another possibility is to use more points in the interval [tn,tn+1]. This leads to the family of
RungeKutta methods, named after Carl Runge and Martin Kutta. One of their fourth-order
methods is especially popular.
Both ideas can also be combined. The resulting methods are called general linear methods.
Advanced features
A good implementation of one of these methods for solving an ODE entails more than the
time-stepping formula.
It is often inefficient to use the same step size all the time, so variable step-size methods have
been developed. Usually, the step size is chosen such that the (local) error per step is below
some tolerance level. This means that the methods must also compute an error indicator, an
estimate of the local error.
An extension of this idea is to choose dynamically between different methods of different
orders (this is called a variable order method). Methods based on Richardson extrapolation,
such as the BulirschStoer algorithm, are often used to construct various methods of different
orders.
Other desirable features include:

dense output: cheap numerical approximations for the whole integration interval, and
not only at the points t0, t1, t2, ...
event location: finding the times where, say, a particular function vanishes.
support for parallel computing.
when used for integrating with respect to time, time reversibility.

Alternative methods
Many methods do not fall within the framework discussed here. Some classes of alternative
methods are:
139

multiderivative methods, which use not only the function f but also its derivatives. This
class includes HermiteObreschkoff methods and Fehlberg methods, as well as
methods like the ParkerSochacki method, which compute the coefficients of the
Taylor series of the solution y recursively.
methods for second order ODEs. We said that all higher-order ODEs can be
transformed to first-order ODEs of the form (1). While this is certainly true, it may not
be the best way to proceed. In particular, Nystrm methods work directly with secondorder equations.
geometric integration methods are especially designed for special classes of ODEs
(e.g., symplectic integrators for the solution of Hamiltonian equations). They take care
that the numerical solution respects the underlying structure or geometry of these
classes.

Analysis
Numerical analysis is not only the design of numerical methods, but also their analysis. Three
central concepts in this analysis are:

convergence: whether the method approximates the solution,


order: how well it approximates the solution, and
stability: whether errors are damped out.

Convergence
A numerical method is said to be convergent if the numerical solution approaches the exact
solution as the step size h goes to 0. More precisely, we require that for every ODE (1) with a
Lipschitz function f and every t* > 0,

All the methods mentioned above are convergent. In fact, convergence is a condition sine qua
non for any numerical scheme.
Consistency and order
Suppose the numerical method is

The local error of the method is the error committed by one step of the method. That is, it is
the difference between the result given by the method, assuming that no error was made in
earlier steps, and the exact solution:

The method is said to be consistent if

140

The method has order p if

Hence a method is consistent if it has an order greater than 0. The (forward) Euler method (4)
and the backward Euler method (6) introduced above both have order 1, so they are
consistent. Most methods being used in practice attain higher order. Consistency is a
necessary condition for convergence, but not sufficient; for a method to be convergent, it must
be both consistent and zero-stable.
A related concept is the global error, the error sustained in all the steps one needs to reach a
fixed time t. Explicitly, the global error at time t is yN y(t) where N = (tt0)/h. The global
error of a pth order one-step method is O(hp); in particular, such a method is convergent. This
statement is not necessarily true for multi-step methods.
Stability and stiffness
For some differential equations, application of standard methods such as the Euler method,
explicit RungeKutta methods, or multistep methods (e.g., AdamsBashforth methods)
exhibit instability in the solutions, though other methods may produce stable solutions. This
"difficult behaviour" in the equation (which may not necessarily be complex itself) is
described as stiffness, and is often caused by the presence of different time scales in the
underlying problem. Stiff problems are ubiquitous in chemical kinetics, control theory, solid
mechanics, weather prediction, biology, and electronics.

Initial value problem


In mathematics, in the field of differential equations, an initial value problem is an ordinary
differential equation together with specified value, called the initial condition, of the unknown
function at a given point in the domain of the solution. In physics or other sciences, modeling
a system frequently amounts to solving an initial value problem; in this context, the
differential equation is an evolution equation specifying how, given initial conditions, the
system will evolve with time.
Definition
An initial value problem is a differential equation

together with a point in the domain of

called the initial condition.

141

A solution to an initial value problem is a function y that is a solution to the differential


equation and satisfies

This statement subsumes problems of higher order, by interpreting y as a vector. For


derivatives of second or higher order, new variables (elements of the vector y) are introduced.
More generally, the unknown function y can take values on infinite dimensional spaces, such
as Banach spaces or spaces of distributions.
Existence and uniqueness of solutions
For a large class of initial value problems, the existence and uniqueness of a solution can be
demonstrated.
The PicardLindelf theorem guarantees a unique solution on some interval containing t0 if
and its partial derivative / y are continuous on a region containing t0 and y0. The proof of
this theorem proceeds by reformulating the problem as an equivalent integral equation. The
integral can be considered an operator which maps one function into another, such that the
solution is a fixed point of the operator. The Banach fixed point theorem is then invoked to
show that there exists a unique fixed point, which is the solution of the initial value problem.
An older proof of the PicardLindelf theorem constructs a sequence of functions which
converge to the solution of the integral equation, and thus, the solution of the initial value
problem. Such a construction is sometimes called "Picard's method" or "the method of
successive approximations". This version is essentially a special case of the Banach fixed
point theorem.
Hiroshi Okamura obtained a necessary and sufficient condition for the solution of an initial
value problem to be unique. This condition has to do with the existence of a Lyapunov
function for the system.
In some situations, the function is not of class C1, or even Lipschitz, so the usual result
guaranteeing the local existence of a unique solution does not apply. The Peano existence
theorem however proves that even for merely continuous, solutions are guaranteed to exist
locally in time; the problem is that there is no guarantee of uniqueness. An even more general
result is the Carathodory existence theorem, which proves existence for some discontinuous
functions .

Boundary value problem


In mathematics, in the field of differential equations, a boundary value problem is a
differential equation together with a set of additional restraints, called the boundary
conditions. A solution to a boundary value problem is a solution to the differential equation
which also satisfies the boundary conditions.
Boundary value problems arise in several branches of physics as any physical differential
equation will have them. Problems involving the wave equation, such as the determination of
normal modes, are often stated as boundary value problems. A large class of important
142

boundary value problems are the Sturm-Liouville problems. The analysis of these problems
involves the eigenfunctions of a differential operator.
To be useful in applications, a boundary value problem should be well posed. This means that
given the input to the problem there exists a unique solution, which depends continuously on
the input. Much theoretical work in the field of partial differential equations is devoted to
proving that boundary value problems arising from scientific and engineering applications are
in fact well-posed.
Among the earliest boundary value problems to be studied is the Dirichlet problem, of finding
the harmonic functions (solutions to Laplace's equation); the solution was given by the
Dirichlet's principle.

Shows a region where a differential equation is valid and the associated boundary values.
Initial value problem
A more mathematical way to picture the difference between an initial value problem and a
boundary value problem is that an initial value problem has all of the conditions specified at
the same value of the independent variable in the equation (and that value is at the lower
boundary of the domain, thus the term "initial" value). On the other hand, a boundary value
problem has conditions specified at the extremes of the independent variable. For example, if
the independent variable is time over the domain [0,1], an initial value problem would specify
a value of y(t) and y'(t) at time t = 0, while a boundary value problem would specify values for
y(t) at both t = 0 and t = 1.
If the problem is dependent on both space and time, then instead of specifying the value of the
problem at a given point for all time the data could be given at a given time for all space. For
example, the temperature of an iron bar with one end kept at absolute zero and the other end
at the freezing point of water would be a boundary value problem.
Types of boundary value problems
If the boundary gives a value to the normal derivative of the problem then it is a Neumann
boundary condition. For example, if there is a heater at one end of an iron rod, then energy
would be added at a constant rate but the actual temperature would not be known.
If the boundary gives a value to the problem then it is a Dirichlet boundary condition. For
example, if one end of an iron rod is held at absolute zero, then the value of the problem
would be known at that point in space.

143

If the boundary has the form of a curve or surface that gives a value to the normal derivative
and the problem itself then it is a Cauchy boundary condition.
Aside from the boundary condition, boundary value problems are also classified according to
the type of differential operator involved. For an elliptic operator, one discusses elliptic
boundary value problems. For an hyperbolic operator, one discusses hyperbolic boundary
value problems. These categories are further subdivided into linear and various nonlinear
types.

The boundary value problem for an idealised 2D rod.

Euler method
In mathematics and computational science, the Euler method, named after Leonhard Euler, is
a first-order numerical procedure for solving ordinary differential equations (ODEs) with a
given initial value. It is the most basic kind of explicit method for numerical integration for
ordinary differential equations.

Illustration of the Euler method. The unknown curve is in blue, and its polygonal
approximation is in red.
Informal geometrical description
Consider the problem of calculating the shape of an unknown curve which starts at a given
point and satisfies a given differential equation. Here, a differential equation can be thought of
as a formula by which the slope of the tangent line to the curve can be computed at any point
on the curve, once the position of that point has been calculated.

144

The idea is that while the curve is initially unknown, its starting point, which we denote by A0,
is known (see the picture above). Then, from the differential equation, the slope to the curve
at A0 can be computed, and so, the tangent line.
Take a small step along that tangent line up to a point A1. If we pretend that A1 is still on the
curve, the same reasoning as for the point A0 above can be used. After several steps, a
is computed. In general, this curve does not diverge too
polygonal curve
far from the original unknown curve, and the error between the two curves can be made small
if the step size is small enough and the interval of computation is finite (although things are
more complicated for stiff equations, as discussed below).
Derivation
We want to approximate the solution of the initial value problem

by using the first two terms of the Taylor expansion of y, which represents the linear
approximation around the point (t0,y(t0)) . One step of the Euler method from tn to tn+1 = tn + h
is

The Euler method is explicit, i.e. the solution yn + 1 is an explicit function of yi for

While the Euler method integrates a first order ODE, any ODE of order N can be represented
as a first-order ODE in more than one variable by introducing N 1 further variables, y', y'',
..., y(N), and formulating N first order equations in these new variables. The Euler method can
be applied to the vector
order system.

to integrate the higher-

Error
The balls magnitude of the errors arising from the Euler method can be demonstrated by
comparison with a Taylor expansion of y. If we assume that f(t) and y(t) are known exactly at
a time t0, then the Euler method gives the approximate solution at time t0 + h as:

(the second equality holds because y satisfies the differential equation y' = f(t,y)). In
comparison, the Taylor expansion in h about t0 gives:

The error introduced by the Euler method is given by the difference between these equations:

145

For small h, the dominant error per step is proportional to h2. To solve the problem over a
given range of t, the number of steps needed is proportional to 1 / h so it is to be expected that
the total error at the end of the fixed time will be proportional to h (error per step times
number of steps). For this reason, the Euler method is said to be first order. This makes the
Euler method less accurate (for small h) than other higher-order techniques such as RungeKutta methods and linear multistep methods.
The Euler method can also be numerically unstable, especially for stiff equations. This
limitation along with its slow convergence of error with h means that the Euler method
is not often used, except as a simple example of numerical integration.

Illustration of numerical integration for the equation y' = y,y(0) = 1. Blue: the Euler method,
green: the midpoint method, red: the exact solution, y = et. The step size is h = 1.0.

The same illustration for h = 0.25. It is seen that the midpoint method converges faster than
the Euler method.

146

Linear multistep method


Linear multistep methods are used for the numerical solution of ordinary differential
equations. Conceptually, a numerical method starts from an initial point and then takes a short
step forward in time to find the next solution point. The process continues with subsequent
steps to map out the solution. Single-step methods (such as Euler's method) refer to only one
previous point and its derivative to determine the current value. Methods such as Runge-Kutta
take some intermediate steps (for example, a half-step) to obtain a higher order method, but
then discard all previous information before taking a second step. Multistep methods attempt
to gain efficiency by keeping and using the information from previous steps rather than
discarding it. Consequently, multistep methods refer to several previous points and derivative
values. In the case of linear multistep methods, a linear combination of the previous points
and derivative values is used.
Definitions
Numerical methods for ordinary differential equations approximate solutions to initial value
problems of the form

The result is approximations for the value of y(t) at discrete times ti:
ti = t0 + ih
yi = y(ti) = y(t0 + ih)
fi = f(ti,yi)
where h is the time step (sometimes referred to as t).
A linear multistep method uses a linear combination of yi and yi' to calculate the value of y for
the desired current step.
Multistep method will use the previous s steps to calculate the next value. Consequently, the
desired value at the current processing stage is yn + s.
A linear multistep method is a method of the form

where h denotes the step size and f the right-hand side of the differential equation. The
coefficents
and
determine the method. The designer of the method
chooses the coefficients; often, many coefficients are zero. Typically, the designer chooses the
coefficients so they will exactly interpolate y(t) when it is an nth order polynomial.
If the value of bs is nonzero, then the value of yn + s depends on the value of f(tn + s,yn + s).
Consequently, the method is explicit if bs = 0. In that case, the formula can directly compute
yn + s. If
then the method is implicit and the equation for yn + s must be solved. Iterative
methods such as Newton's method are often used to solve the implicit formula.
147

Sometimes an explicit multistep method is used to "predict" the value of yn + s. That value is
then used in an implicit formula to "correct" the value. The result is a Predictor-corrector
method.
Multistep method families
Three families of linear multistep methods are commonly used: AdamsBashforth methods,
AdamsMoulton methods, and the backward differentiation formulas (BDFs).
AdamsBashforth methods
The AdamsBashforth methods are explicit methods. The coefficients are as 1 = 1 and
, while the bj are chosen such that the methods has order s (this
determines the methods uniquely).
The AdamsBashforth methods with s = 1, 2, 3, 4, 5 are:

this is simply the Euler method;

The coefficients bj can be determined as follows. Use polynomial interpolation to find the
polynomial p of degree s 1 such that

The Lagrange formula for polynomial interpolation yields

The polynomial p is locally a good approximation of the right-hand side of the differential
equation y' = f(t,y) that is to be solved, so consider the equation y' = p(t) instead. This equation
can be solved exactly; the solution is simply the integral of p. This suggests taking

The AdamsBashforth method arises when the formula for p is substituted. The coefficients bj
turn out to be given by

148

Replacing f(t, y) by its interpolant p incurs an error of order hs, and it follows that the s-step
AdamsBashforth method has indeed order s.
The AdamsBashforth methods were designed by John Couch Adams to solve a differential
equation modelling capillary action due to Francis Bashforth. Bashforth (1883) published his
theory and Adams' numerical method.
AdamsMoulton methods
The AdamsMoulton methods are similar to the AdamsBashforth methods in that they also
. Again the b coefficients are chosen to obtain
have as 1 = 1 and
the highest order possible. However, the AdamsMoulton methods are implicit methods. By
removing the restriction that bs = 0, an s-step AdamsMoulton method can reach order s + 1,
while an s-step AdamsBashforth methods has only order s.
The AdamsMoulton methods with s = 0, 1, 2, 3, 4 are:

this is the backward Euler method;


this is the trapezoidal rule;

The derivation of the AdamsMoulton methods is similar to that of the AdamsBashforth


method; however, the interpolating polynomial uses not only the points tn1, tns, as above,
but also tn. The coefficients are given by

The AdamsMoulton methods are solely due to John Couch Adams, like the Adams
Bashforth methods. The name of Forest Ray Moulton became associated with these methods
because he realized that they could be used in tandem with the AdamsBashforth methods as
a predictorcorrector pair (Moulton 1926); Milne (1926) had the same idea. Adams used
Newton's method to solve the implicit equation.
Analysis
The central concepts in the analysis of linear multistep methods, and indeed any numerical
method for differential equations, are convergence, order, and stability.

149

The first question is whether the method is consistent: is the difference equation

a good approximation of the differential equation y' = f(t,y)? More precisely, a multistep
method is consistent if the local error goes to zero as the step size h goes to zero, where the
local error is defined to be the difference between the result yn + s of the method, assuming
that all the previous values
are exact, and the exact solution of the equation
at time tn + s. A computation using Taylor series shows out that a linear multistep method is
consistent if and only if

All the methods mentioned above are consistent.


If the method is consistent, then the next question is how well the difference equation defining
the numerical method approximates the differential equation. A multistep method is said to
have order p if the local error is of order O(hp + 1) as h goes to zero. This is equivalent to the
following condition on the coefficients of the methods:

The s-step AdamsBashforth method has order s, while the s-step AdamsMoulton method
has order s + 1.
These conditions are often formulated using the characteristic polynomials

In terms of these polynomials, the above condition for the method to have order p becomes

In particular, the method is consistent if it has order one, which is the case if (1) = 0 and '(1)
= (1).
If the roots of the characteristic polynomial all have modulus less than or equal to 1 and the
roots of modulus 1 are of multiplicity 1, we say that the root condition is satisfied. The
method is convergent if and only if it is consistent and the root condition is satisfied.
Consequently, a consistent method is stable if and only if this condition is satisfied, and thus
the method is convergent if and only if it is stable.

150

Furthermore, if the method is stable, the method is said to be strongly stable if z = 1 is the
only root of modulus 1. If it is stable and all roots of modulus 1 are not repeated, but there is
more than one such root, it is said to be relatively stable. Note that 1 must be a root; thus
stable methods are always one of these two.

RungeKutta methods
In numerical analysis, the RungeKutta methods are an important family of implicit and
explicit iterative methods for the approximation of solutions of ordinary differential
equations. These techniques were developed around 1900 by the German mathematicians C.
Runge and M.W. Kutta.
The common fourth-order RungeKutta method
One member of the family of RungeKutta methods is so commonly used that it is often
referred to as "RK4", "classical Runge-Kutta method" or simply as "the RungeKutta
method".
Let an initial value problem be specified as follows.

Then, the RK4 method for this problem is given by the following equations:

where yn + 1 is the RK4 approximation of y(tn + 1), and

Thus, the next value (yn + 1) is determined by the present value (yn) plus the product of the size
of the interval (h) and an estimated slope. The slope is a weighted average of slopes:

k1 is the slope at the beginning of the interval;


k2 is the slope at the midpoint of the interval, using slope k1 to determine the value of y
at the point tn + h / 2 using Euler's method;
k3 is again the slope at the midpoint, but now using the slope k2 to determine the yvalue;
k4 is the slope at the end of the interval, with its y-value determined using k3.

In averaging the four slopes, greater weight is given to the slopes at the midpoint:

151

The RK4 method is a fourth-order method, meaning that the error per step is on the order of
h5, while the total accumulated error has order h4.
Note that the above formulae are valid for both scalar- and vector-valued functions (i.e., y can
be a vector and f an operator). For example one can integrate Schrdinger's equation using the
Hamiltonian operator as function f.
Explicit RungeKutta methods
The family of explicit RungeKutta methods is a generalization of the RK4 method
mentioned above. It is given by

where

(Note: the above equations have different but equivalent definitions in different texts).
To specify a particular method, one needs to provide the integer s (the number of stages), and
the coefficients aij (for 1 j < i s), bi (for i = 1, 2, ..., s) and ci (for i = 2, 3, ..., s). These data
are usually arranged in a mnemonic device, known as a Butcher tableau (after John C.
Butcher):
0
c2 a21
c3 a31 a32
cs as1 as2
b1 b2

as,s 1
bs 1 bs

The RungeKutta method is consistent if

There are also accompanying requirements if we require the method to have a certain order p,
meaning that the truncation error is O(hp+1). These can be derived from the definition of the
truncation error itself. For example, a 2-stage method has order 2 if b1 + b2 = 1, b2c2 = 1/2,
and b2a21 = 1/2.

152

Adaptive RungeKutta methods


The adaptive methods are designed to produce an estimate of the local truncation error of a
single RungeKutta step. This is done by having two methods in the tableau, one with order p
and one with order p 1.
The lower-order step is given by

where the ki are the same as for the higher order method. Then the error is

which is O(hp). The Butcher Tableau for this kind of method is extended to give the values of
:
0
c2 a21
c3 a31 a32
cs as1 as2
b1 b2

as,s 1
bs 1 bs

The RungeKuttaFehlberg method has two methods of orders 5 and 4. Its extended Butcher
Tableau is:
0
1/4

1/4

3/8

3/32

9/32

12/13 1932/2197 7200/2197 7296/2197


1

439/216

3680/513

-845/4104

1/2

8/27

3544/2565 1859/4104

16/135

6656/12825 28561/56430 9/50 2/55

25/216

1408/2565 2197/4104

11/40
1/5

However, the simplest adaptive RungeKutta method involves combining the Heun method,
which is order 2, with the Euler method, which is order 1. Its extended Butcher Tableau is:

153

0
1 1
1/2 1/2
1

The error estimate is used to control the stepsize. Other adaptive RungeKutta methods are
the BogackiShampine method (orders 3 and 2), the CashKarp method and the Dormand
Prince method (both with orders 5 and 4).
Implicit RungeKutta methods
The implicit methods are more general than the explicit ones. The distinction shows up in the
Butcher Tableau: for an implicit method, the coefficient matrix aij is not necessarily lower
triangular:

The approximate solution to the initial value problem reflects the greater number of
coefficients:

Due to the fullness of the matrix aij, the evaluation of each ki is now considerably involved
and dependent on the specific function f(t,y). Despite the difficulties, implicit methods are of
great importance due to their high (possibly unconditional) stability, which is especially
important in the solution of partial differential equations. The simplest example of an implicit
RungeKutta method is the backward Euler method:

The Butcher Tableau for this is simply:

It can be difficult to make sense of even this simple implicit method, as seen from the
expression for k1:
154

In this case, the awkward expression above can be simplified by noting that

so that

from which

follows. Though simpler then the "raw" representation before manipulation, this is an implicit
relation so that the actual solution is problem dependent. Multistep implicit methods have
been used with success by some researchers. The combination of stability, higher order
accuracy with fewer steps, and stepping that depends only on the previous value makes them
attractive; however the complicated problem-specific implementation and the fact that ki must
often be approximated iteratively means that they are not common.

RungeKuttaFehlberg method
In mathematics, the RungeKuttaFehlberg method (or Fehlberg method) is an algorithm of
numerical analysis for the numerical solution of ordinary differential equations. It was
developed by the German mathematician Erwin Fehlberg and is based on the class of Runge
Kutta methods. The RungeKuttaFehlberg method uses an O(h4) method together with an
O(h5) method that uses all of the points of the O(h4) method, and hence is often referred to as
an RKF45 method. Similar schemes with different orders have since been developed. By
performing one extra calculation that would be required for an RK5 method, the error in the
solution can be estimated and controlled and an appropriate step size can be determined
automatically, making this method efficient for ordinary problems of automated numerical
integration of ordinary differential equations.
The Butcher tableau is:
0
1/4

1/4

3/8

3/32

9/32

12/13 1932/2197 7200/2197 7296/2197


1

439/216

3680/513

845/4104

1/2

-8/27

3544/2565 1859/4104

11/40

25/216

1408/2565 2197/4104

1/5

16/135

6656/12825 28561/56430 9/50 2/55

The first row of coefficients gives the fourth-order accurate method, and the second row gives
the fifth-order accurate method.
155

BulirschStoer algorithm
In numerical analysis, the BulirschStoer algorithm is a method for the numerical solution of
ordinary differential equations which combines three powerful ideas, Richardson
extrapolation, the use of rational function extrapolation in Richardson-type applications, and
the modified midpoint method, to obtain numerical solutions to ordinary differential equations
(ODEs) with high accuracy and comparatively little computational effort. It is named after
Roland Bulirsch and Josef Stoer. It is sometimes called the GraggBulirschStoer (GBS)
algorithm because of the importance of a result about the error function of the modified
midpoint method, due to William B. Gragg.
Underlying ideas
The idea of Richardson extrapolation is to consider a numerical calculation whose accuracy
depends on the used stepsize h as an (unknown) analytic function of the stepsize h,
performing the numerical calculation with various values of h, fitting a (chosen) analytic
function to the resulting points, and then evaluating the fitting function for h = 0, thus trying
to approximate the result of the calculation with infinitely fine steps.
Bulirsch and Stoer recognized that using rational functions as fitting functions for Richardson
extrapolation in numerical integration is superior to using polynomial functions because
rational functions are able to approximate functions with poles rather well (compared to
polynomial functions), given that there are enough higher-power terms in the denominator to
account for nearby poles. While a polynomial interpolation or extrapolation only yields good
results if the nearest pole is rather far outside a circle around the known data points in the
complex plane, rational function interpolation or extrapolation can have remarkable accuracy
even in the presence of nearby poles.
The modified midpoint method by itself is a second-order method, and therefore generally
inferior to fourth-order methods like the fourth-order RungeKutta method. However, it has
the advantage of requiring only one derivative evaluation per substep (asymptotically for a
large number of substeps), and, in addition, as discovered by Gragg, the error of a modified
midpoint step of size H, consisting of n substeps of size h = H/n each, and expressed as a
power series in h, contains only even powers of h. This makes the modified midpoint method
extremely useful to the BulirschStoer method as the accuracy increases two orders at a time
when the results of separate attempts to cross the interval H with increasing numbers of
substeps are combined.

Shooting method
In numerical analysis, the shooting method is a method for solving a boundary value problem
by reducing it to the solution of an initial value problem. The following exposition may be
clarified by this illustration of the shooting method.
For a boundary value problem of a second-order ordinary differential equation, the method is
stated as follows. Let

be the boundary value problem. Let y(t; a) denote the solution of the initial value problem
156

Define the function F(a) as the difference between y(t1; a) and the specified boundary value
y1.

If the boundary value problem has a solution, then F has a root, and that root is just the value
of y'(t0) which yields a solution y(t) of the boundary value problem.
The usual methods for finding roots may be employed here, such as the bisection method or
Newton's method.
Linear shooting method
The boundary value problem is linear if f has the form

In this case, the solution to the boundary value problem is usually given by:

where y(1)(t) is the solution to the initial value problem:

and y(2)(t) is the solution to the initial value problem:

Galerkin method
In mathematics, in the area of numerical analysis, Galerkin methods are a class of methods for
converting a continuous operator problem (such as a differential equation) to a discrete
problem. In principle, it is the equivalent of applying the method of variation to a function
space, by converting the equation to a weak formulation. Typically one then applies some
constraints on the functions space to characterize the space with a finite set of basis functions.
Often when using a Galerkin method one also gives the name along with typical
approximation methods used, such as Petrov-Galerkin method or Ritz-Galerkin method.
The approach is credited to the Russian mathematician Boris Galerkin.
Examples for Galerkin methods are:

The finite element method


Boundary element method for solving integral equations
157

Krylov subspace methods

Introduction with an abstract problem


A problem in weak formulation
Let us introduce Galerkin's method with an abstract problem posed as a weak formulation on
a Hilbert space, V, namely,
find

such that for all

, a(u,v) = f(v).

is a bilinear form (the exact requirements on


Here,
is a bounded linear operator on V.

will be specified later) and f

Galerkin discretization
Choose a subspace

, dimension n and solve the projected problem:

Find

such that for all

, a(un,vn) = f(vn).

We call this the Galerkin equation. Notice that the equation has remained unchanged and only
the spaces have changed.
Galerkin orthogonality
The key property of the Galerkin approach is that the error is orthogonal to the chosen
, we can use vn as a test vector in the original equation. Subtracting
subspaces. Since
the two, we get the Galerkin orthogonality relation for the error, en = u un which is the error
between the solution of the original problem, u, and the solution of the Galerkin equation, un
a(en,vn) = a(u,vn) a(un,vn) = f(vn) f(vn) = 0.
Matrix form
Since the aim of Galerkin's method is the production of a linear system of equations, we build
its matrix form, which can be used to compute the solution by a computer program.
Let
be a basis for Vn. Then, it is sufficient to use these in turn for testing the
such that
Galerkin equation, i.e.: find

We expand un in respect to this basis,


to obtain

and insert it into the equation above,

158

This previous equation is actually a linear system of equations Au = f, where

Symmetry of the matrix


Due to the definition of the matrix entries, the matrix of the Galerkin equation is symmetric if
is symmetric.

and only if the bilinear form


Analysis of Galerkin methods

Here, we will restrict ourselves to symmetric bilinear forms, that is


a(u,v) = a(v,u).
While this is not really a restriction of Galerkin methods, the application of the standard
theory becomes much simpler. Furthermore, a Petrov-Galerkin method may be required in the
nonsymmetric case.
The analysis of these methods proceeds in two steps. First, we will show that the Galerkin
equation is a well-posed problem in the sense of Hadamard and therefore admits a unique
solution. In the second step, we study the quality of approximation of the Galerkin solution un.
The analysis will mostly rest on two properties of the bilinear form, namely

Boundedness: for all

holds
for some constant C > 0

Ellipticity: for all

holds
for some constant c > 0

By the Lax-Milgram theorem, these two conditions imply well-posedness of the original
problem in weak formulation. All norms in the following sections will be norms for which the
above inequalities hold (these norms are often called energy norm).
Well-posedness of the Galerkin equation
Since
, boundedness and ellipticity of the bilinear form apply to Vn. Therefore, the
well-posedness of the Galerkin problem is actually inherited from the well-posedness of the
original problem.

159

Quasi-best approximation (Ca's lemma)


The error en = u un between the original and the Galerkin solution admits the estimate

This means, that up to the constant C / c, the Galerkin solution un is as close to the original
solution u as any other vector in Vn. In particular, it will be sufficient to study approximation
by spaces Vn, completely forgetting about the equation being solved.
Proof
Since the proof is very simple and the basic principle behind all Galerkin methods, we include
it here: by ellipticity and boundedness of the bilinear form (inequalities) and Galerkin
orthogonality (equals sign in the middle), we have for arbitrary
:

Dividing by

and taking the infimum over all possible vn yields the lemma.

Collocation method
In mathematics, a collocation method is a method for the numerical solution of ordinary
differential equations, partial differential equations and integral equations. The idea is to
choose a finite-dimensional space of candidate solutions (usually, polynomials up to a certain
degree) and a number of points in the domain (called collocation points), and to select that
solution which satisfies the given equation at the collocation points.
Ordinary differential equations
Suppose that the ordinary differential equation

is to be solved over the interval [t0, t0 + h]. Denote the collocation points by 0 c1< c2< <
cn 1.
The corresponding (polynomial) collocation method approximates the solution y by the
polynomial p of degree n which satisfies the initial condition p(t0) = y0, and the differential
equation p'(t) = f(t,p(t)) at all points t = t0 + ckh where k = 1, , n. This gives n + 1 conditions,
which matches the n + 1 parameters needed to specify a polynomial of degree n.
All these collocation methods are in fact implicit RungeKutta methods. However, not all
RungeKutta methods are collocation methods.

160

Numerical partial differential equations


Numerical partial differential equations is the branch of numerical analysis that studies the
numerical solution of partial differential equations (PDEs).
Numerical techniques for solving PDEs include the following:

The finite difference method, in which functions are represented by their values at
certain grid points and derivatives are approximated through differences in these
values.
The method of lines, where all but one variable is discretized. The result is a system of
ODEs in the remaining continuous variable.
The finite element method, where functions are represented in terms of basis functions
and the PDE is solved in its integral (weak) form.
The finite volume method, which divides space into regions or volumes and computes
the change within each volume by considering the flux (flow rate) across the surfaces
of the volume.
The spectral method, which represents functions as a sum of particular basis functions,
for example using a Fourier series.
Meshfree methods don't need a grid to work and so may be better suited for some
problems. However the computational effort is usually higher.
Domain decomposition methods solve a boundary value problems by splitting them
into smaller boundary value problems on subdomains and iterating to coordinate the
solution between the subdomains.
Multigrid methods solve differential equations using a hierarchy of discretizations.

The finite difference method is often regarded as the simplest method to learn and use. The
finite element and finite volume methods are widely used in engineering and in computational
fluid dynamics, and are well suited to problems in complicated geometries. Spectral methods
are generally the most accurate, provided that the solutions are sufficiently smooth.

Finite difference method


In mathematics, finite-difference methods are numerical methods for approximating the
solutions to differential equations using finite difference equations to approximate derivatives.
Intuitive derivation
Finite-difference methods approximate the solutions to differential equations by replacing
derivative expressions with approximately equivalent difference quotients. That is, because
the first derivative of a function f is, by definition,

then a reasonable approximation for that derivative would be to take

161

for some small value of h. In fact, this is the forward difference equation for the first
derivative. Using this and similar formulae to replace derivative expressions in differential
equations, one can approximate their solutions without the need for calculus.
Derivation from Taylor's polynomial
Assuming the function whose derivatives are to be approximated is properly-behaved, by
Taylor's theorem,

where n! denotes the factorial of n, and Rn(x) is a remainder term, denoting the difference
between the Taylor polynomial of degree n and the original function. Again using the first
derivative of the function f as an example, by Taylor's theorem,
f(x0 + h) = f(x0) + f'(x0)h + R1(x),
which, with some minor algebraic manipulation, is equivalent to

so that for R1(x) sufficiently small,

Accuracy and order


The error in a method's solution is defined as the difference between its approximation and
the exact analytical solution. The two sources of error in finite difference methods are roundoff error, the loss of precision due to computer rounding of decimal quantities, and truncation
error or discretization error, the difference between the exact solution of the finite difference
equation and the exact quantity assuming perfect arithmetic (that is, assuming no round-off).
To use a finite difference method to attempt to solve (or, more generally, approximate the
solution to) a problem, one must first discretize the problem's domain. This is usually done by
dividing the domain into a uniform grid (see image below). Note that this means that finitedifference methods produce sets of discrete numerical approximations to the derivative, often
in a "time-stepping" manner.

162

The finite difference method relies on discretizing a function on a grid.


An expression of general interest is the local truncation error of a method. Typically
expressed using Big-O notation, local truncation error refers to the error from a single
application of a method. That is, it is the quantity f'(xi) f'i if f'(xi) refers to the exact value and
f'i to the numerical approximation. The remainder term of a Taylor polynomial is convenient
for analyzing the local truncation error. Using the Lagrange form of the remainder from the
Taylor polynomial for f(x0 + h), which is

, where x0 < < x0 + h,


the dominant term of the local truncation error can be discovered. For example, again using
the forward-difference formula for the first derivative, knowing that f(xi) = f(x0 + ih),

and with some algebraic manipulation, this leads to

and further noting that the quantity on the left is the approximation from the finite difference
method and that the quantity on the right is the exact quantity of interest plus a remainder,
clearly that remainder is the local truncation error. A final expression of this example and its
order is:

This means that, in this case, the local truncation error is proportional to the step size.

163

Method of lines
The method of lines (MOL, NMOL, NUMOL) is a technique for solving partial differential
equations (PDEs) in which all but one dimension is discretized. MOL allows standard,
general-purpose methods and software, developed for the numerical integration of ODEs and
DAEs, to be used. A large number of integration routines have been developed over the years
in many different programming languages, and some have been published as open source
resources.
The method of lines most often refers to the construction or analysis of numerical methods for
partial differential equations that proceeds by first discretizing the spatial derivatives only and
leaving the time variable continuous. This leads to a system of ordinary differential equations
to which a numerical method for initial value ordinary equations can be applied. The method
of lines in this context dates back to at least the early 1960s Sarmin and Chudov. Many papers
discussing the accuracy and stability of the method of lines for various types of partial
differential equations have appeared since. W. E. Schiesser of Lehigh University is one of the
major proponents of the method of lines, having published widely in this field.
Application to elliptical equations
MOL requires that the PDE problem is well-posed as an initial value (Cauchy) problem in at
least one dimension, because ODE and DAE integrators are initial value problem (IVP)
solvers.
Thus it cannot be used directly on purely elliptic equations, such as Laplace's equation.
However, MOL has been used to solve Laplace's equation by using the method of false
transients. In this method, a time derivative of the dependent variable is added to Laplaces
equation. Finite differences are then used to approximate the spatial derivatives, and the
resulting system of equations is solved by MOL. It is also possible to solve elliptical problems
by a semi-analytical method of lines. In this method the discretization process results in a set
of ODE's that are solved by exploiting properties of the associated exponential matrix.

Finite element method


The finite element method (FEM) (sometimes referred to as finite element analysis (FEA)) is
a numerical technique for finding approximate solutions of partial differential equations
(PDE) as well as of integral equations. The solution approach is based either on eliminating
the differential equation completely (steady state problems), or rendering the PDE into an
approximating system of ordinary differential equations, which are then numerically
integrated using standard techniques such as Euler's method, Runge-Kutta, etc.
In solving partial differential equations, the primary challenge is to create an equation that
approximates the equation to be studied, but is numerically stable, meaning that errors in the
input data and intermediate calculations do not accumulate and cause the resulting output to
be meaningless. There are many ways of doing this, all with advantages and disadvantages.
The Finite Element Method is a good choice for solving partial differential equations over
complicated domains (like cars and oil pipelines), when the domain changes (as during a solid
state reaction with a moving boundary), when the desired precision varies over the entire
domain, or when the solution lacks smoothness. For instance, in a frontal crash simulation it is
possible to increase prediction accuracy in "important" areas like the front of the car and
164

reduce it in its rear (thus reducing cost of the simulation); Another example would be the
simulation of the weather pattern on Earth, where it is more important to have accurate
predictions over land than over the wide-open sea.
Technical discussion
We will illustrate the finite element method using two sample problems from which the
general method can be extrapolated. It is assumed that the reader is familiar with calculus and
linear algebra.
P1 is a one-dimensional problem

where f is given and u is an unknown function of x, and u'' is the second derivative of u with
respect to x.
The two-dimensional sample problem is the Dirichlet problem

is "nice" (e.g., a
where is a connected open region in the (x,y) plane whose boundary
smooth manifold or a polygon), and uxx and uyy denote the second derivatives with respect to x
and y, respectively.
The problem P1 can be solved "directly" by computing antiderivatives. However, this method
of solving the boundary value problem works only when there is only one spatial dimension
and does not generalize to higher-dimensional problems or to problems like u + u'' = f. For
this reason, we will develop the finite element method for P1 and outline its generalization to
P2.
Our explanation will proceed in two steps, which mirror two essential steps one must take to
solve a boundary value problem (BVP) using the FEM.

In the first step, one rephrases the original BVP in its weak, or variational form. Little
to no computation is usually required for this step. The transformation is done by hand
on paper.
The second step is the discretization, where the weak form is discretized in a finite
dimensional space.

After this second step, we have concrete formulae for a large but finite dimensional linear
problem whose solution will approximately solve the original BVP. This finite dimensional
problem is then implemented on a computer.

165

Variational formulation
The first step is to convert P1 and P2 into their variational equivalents, or Weak formulation.
If u solves P1, then for any smooth function v that satisfies the displacement boundary
conditions, i.e. v = 0 at x = 0 and x = 1,we have

(1)
Conversely, if u with u(0) = u(1) = 0 satisfies (1) for every smooth function v(x) then one may
show that this u will solve P1. The proof is easier for twice continuously differentiable u
(mean value theorem), but may be proved in a distributional sense as well.
By using integration by parts on the right-hand-side of (1), we obtain

(2)
where we have used the assumption that v(0) = v(1) = 0.
A proof outline of existence and uniqueness of the solution
We can loosely think of
to be the absolutely continuous functions of (0,1) that are
0 at x = 0 and x = 1 (Sobolev spaces). Such function are (weakly) "once differentiable" and it
then defines an inner product which turns
turns out that the symmetric bilinear map
into a Hilbert space (a detailed proof is nontrivial.) On the other hand, the lefthand-side
is also an inner product, this time on the Lp space L2(0,1). An
application of the Riesz representation theorem for Hilbert spaces shows that there is a unique
u solving (2) and therefore P1. This solution is a-priori only a member of
using elliptic regularity, will be smooth if f is.

, but

The variational form of P2


If we integrate by parts using a form of Green's theorem, we see that if u solves P2, then for
any v,

166

where

denotes the gradient and denotes the dot product in the two-dimensional plane.

Once more

can be turned into an inner product on a suitable space

of "once

differentiable" functions of that are zero on


. We have also assumed that
(Sobolev spaces). Existence and uniqueness of the solution can also be shown.
Discretization
The basic idea is to replace the infinite dimensional linear problem:
Find

such that

with a finite dimensional version:


(3) Find

such that

where V is a finite dimensional subspace of


. There are many possible choices for V (one
possibility leads to the spectral method). However, for the finite element method we take V to
be a space of piecewise linear functions.
For problem P1, we take the interval (0,1), choose n values of x with 0 = x0 < x1 < ... < xn < xn
+ 1 = 1 and we define V by

where we define x0 = 0 and xn + 1 = 1. Observe that functions in V are not differentiable


then the derivative is
according to the elementary definition of calculus. Indeed, if
typically not defined at any x = xk, k = 1,...,n. However, the derivative exists at every other
value of x and one can use this derivative for the purpose of integration by parts.
For problem P2, we need V to be a set of functions of . In the figure below, we have
illustrated a triangulation of a 15 sided polygonal region in the plane (below), and a
piecewise linear function (above, in color) of this polygon which is linear on each triangle of
the triangulation; the space V would consist of functions that are linear on each triangle of the
chosen triangulation.

167

A piecewise linear function in two dimensions.


One often reads Vh instead of V in the literature. The reason is that one hopes that as the
underlying triangular grid becomes finer and finer, the solution of the discrete problem (3)
will in some sense converge to the solution of the original boundary value problem P2. The
triangulation is then indexed by a real valued parameter h > 0 which one takes to be very
small. This parameter will be related to the size of the largest or average triangle in the
triangulation. As we refine the triangulation, the space of piecewise linear functions V must
also change with h, hence the notation Vh. Since we do not perform such an analysis, we will
not use this notation.
Choosing a basis
To complete the discretization, we must select a basis of V. In the one-dimensional case, for
each control point xk we will choose the piecewise linear function vk in V whose value is 1 at
xk and zero at every

, i.e.,

for k = 1,...,n; this basis is a shifted and scaled tent function. For the two-dimensional case, we
choose again one basis function vk per vertex xk of the triangulation of the planar region .
The function vk is the unique function of V whose value is 1 at xk and 0 at every

Basis functions vk (blue) and a linear combination of them, which is piecewise linear (red).

168

Depending on the author, the word "element" in "finite element method" refers either to the
triangles in the domain, the piecewise linear basis function, or both. So for instance, an author
interested in curved domains might replace the triangles with curved primitives, and so might
describe the elements as being curvilinear. On the other hand, some authors replace
"piecewise linear" by "piecewise quadratic" or even "piecewise polynomial". The author
might then say "higher order element" instead of "higher degree polynomial". Finite element
method is not restricted to triangles (or tetrahedra in 3-d, or higher order simplexes in
multidimensional spaces), but can be defined on quadrilateral subdomains (hexahedra, prisms,
or pyramids in 3-d, and so on). Higher order shapes (curvilinear elements) can be defined with
polynomial and even non-polynomial shapes (e.g. ellipse or circle).
Examples of methods that use higher degree piecewise polynomial basis functions are the hpFEM and spectral FEM.
More advanced implementations (adaptive finite element methods) utilize a method to assess
the quality of the results (based on error estimation theory) and modify the mesh during the
solution aiming to achieve approximate solution within some bounds from the 'exact' solution
of the continuum problem. Mesh adaptivity may utilize various techniques, the most popular
are:

moving nodes (r-adaptivity)


refining (and unrefining) elements (h-adaptivity)
changing order of base functions (p-adaptivity)
combinations of the above (hp-adaptivity)

Small support of the basis


The primary advantage of this choice of basis is that the inner products

and

in the (j,k) location is known


will be zero for almost all j,k. (The matrix containing
as the Gramian matrix.) In the one dimensional case, the support of vk is the interval [xk 1,xk +
1].

Hence, the integrands of

and (vj,vk) are identically zero whenever | j k | > 1.

Similarly, in the planar case, if xj and xk do not share an edge of the triangulation, then the
integrals

and
169

are both zero.


Matrix form of the problem

If we write

and

then problem (3) becomes

for j = 1,...,n.
If we denote by

and

(4)

the column vectors (u1,...,un)t and (f1,...,fn)t, and if we let L = (Lij) and

M = (Mij) be matrices whose entries are Lij = (vi,vj) and


rephrase (4) as
.

then we may

(5)

As we have discussed before, most of the entries of L and M are zero because the basis
functions vk have small support. So we now have to solve a linear system in the unknown
where most of the entries of the matrix L, which we need to invert, are zero.
Such matrices are known as sparse matrices, and there are efficient solvers for such problems
(much more efficient than actually inverting the matrix.) In addition, L is symmetric and
positive definite, so a technique such as the conjugate gradient method is favored. For
problems that are not too large, sparse LU decompositions and Cholesky decompositions still
work well. For instance, Matlab's backslash operator (which uses sparse LU, sparse Cholesky,
and other factorization methods) can be sufficient for meshes with a hundred thousand
vertices.
The matrix L is usually referred to as the stiffness matrix, while the matrix M is dubbed the
mass matrix.
General form of the finite element method
In general, the finite element method is characterized by the following process.

One chooses a grid for . In the preceding treatment, the grid consisted of triangles,
but one can also use squares or curvilinear polygons.
Then, one chooses basis functions. In our discussion, we used piecewise linear basis
functions, but it is also common to use piecewise polynomial basis functions.

A separate consideration is the smoothness of the basis functions. For second order elliptic
boundary value problems, piecewise polynomial basis function that are merely continuous
suffice (i.e., the derivatives are discontinuous.) For higher order partial differential equations,

170

one must use smoother basis functions. For instance, for a fourth order problem such as uxxxx +
uyyyy = f, one may use piecewise quadratic basis functions that are C1.
Another consideration is the relation of the finite dimensional space V to its infinite
dimensional counterpart, in the examples above
. A conforming element method is one in
which the space V is a subspace of the element space for the continuous problem. The
example above is such a method. If this condition is not satisfied, we obtain a nonconforming
element method, an example of which is the space of piecewise linear functions over the mesh
which are continuous at each edge midpoint. Since these functions are in general
discontinuous along the edges, this finite dimensional space is not a subspace of the original
.
Typically, one has an algorithm for taking a given mesh and subdividing it. If the main
method for increasing precision is to subdivide the mesh, one has an h-method (h is
customarily the diameter of the largest element in the mesh.) In this manner, if one shows that
the error with a grid h is bounded above by Chp, for some
and p > 0, then one has an
order p method. Under certain hypotheses (for instance, if the domain is convex), a piecewise
polynomial of order d method will have an error of order p = d + 1.
If instead of making h smaller, one increases the degree of the polynomials used in the basis
function, one has a p-method. If one combines these two refinement types, one obtains an hpmethod (hp-FEM). In the hp-FEM, the polynomial degrees can vary from element to element.
High order methods with large uniform p are called spectral finite element methods (SFEM).
These are not to be confused with spectral methods.
For vector partial differential equations, the basis functions may take values in

Comparison to the finite difference method


The finite difference method (FDM) is an alternative way of approximating solutions of
PDEs. The differences between FEM and FDM are:

The most attractive feature of the FEM is its ability to handle complicated geometries
(and boundaries) with relative ease. While FDM in its basic form is restricted to
handle rectangular shapes and simple alterations thereof, the handling of geometries in
FEM is theoretically straightforward.

The most attractive feature of finite differences is that it can be very easy to
implement.

There are several ways one could consider the FDM a special case of the FEM
approach. One might choose basis functions as either piecewise constant functions or
Dirac delta functions. In both approaches, the approximations are defined on the entire
domain, but need not be continuous. Alternatively, one might define the function on a
discrete domain, with the result that the continuous differential operator no longer
makes sense, however this approach is not FEM.

171

There are reasons to consider the mathematical foundation of the finite element
approximation more sound, for instance, because the quality of the approximation
between grid points is poor in FDM.

The quality of a FEM approximation is often higher than in the corresponding FDM
approach, but this is extremely problem dependent and several examples to the
contrary can be provided.

Generally, FEM is the method of choice in all types of analysis in structural mechanics (i.e.
solving for deformation and stresses in solid bodies or dynamics of structures) while
computational fluid dynamics (CFD) tends to use FDM or other methods like finite volume
method (FVM). CFD problems usually require discretization of the problem into a large
number of cells/gridpoints (millions and more), therefore cost of the solution favors simpler,
lower order approximation within each cell. This is especially true for 'external flow'
problems, like air flow around the car or airplane, or weather simulation in a large area.
Various types of finite element methods
There are various types of finite element methods: Generalized finite element method, hpFEM, XFEM, Spectral methods, Meshfree methods, and Discontinuous Galerkin methods.
Generalized finite element method
The Generalized Finite Element Method (FEM) uses local spaces consisting of functions, not
necessarily polynomials, that reflect the available information on the unknown solution and
thus ensure good local approximation. Then a partition of unity is used to bond these spaces
together to form the approximating subspace. The effectiveness of GFEM has been shown
when applied to problems with domains having complicated boundaries, problems with
micro-scales, and problems with boundary layers.
hp-FEM
The hp-FEM combines adaptively elements with variable size h and polynomial degree p in
order to achieve exceptionally fast, exponential convergence rates.

Finite volume method


The finite volume method is a method for representing and evaluating partial differential
equations in the form of algebraic equations. Similar to the finite difference method, values
are calculated at discrete places on a meshed geometry. "Finite volume" refers to the small
volume surrounding each node point on a mesh. In the finite volume method, volume
integrals in a partial differential equation that contain a divergence term are converted to
surface integrals, using the divergence theorem. These terms are then evaluated as fluxes at
the surfaces of each finite volume. Because the flux entering a given volume is identical to
that leaving the adjacent volume, these methods are conservative. Another advantage of the
finite volume method is that it is easily formulated to allow for unstructured meshes. The
method is used in many computational fluid dynamics packages.

172

Spectral element method


In mathematics, the spectral element method is a high order finite element method.
Introduced in a 1984 paper by A. T. Patera, the abstract begins: "A spectral element method
that combines the generality of the finite element method with the accuracy of spectral
techniques..."
The spectral element method is an elegant formulation of the finite element method with a
high degree piecewise polynomial basis.
Discussion
The spectral method expands the solution in trigonometric series, a chief advantage is that the
resulting method is of very high order. This approach relies on the fact that trigonometric
polynomials are an orthonormal basis for L2(). The spectral element method chooses instead
high degree piecewise polynomial basis functions, also achieving a very high order of
accuracy.
A-priori error estimate
The classic analysis of Galerkin methods and Ca's lemma holds here and it can be shown
that, if u is the solution of the weak equation, uN is the approximate solution and
:

where C is independent from N and s is no larger than the degree of the piecewise polynomial
basis. As we increase N, we can also increase the degree of the basis functions. In this case, if
u is an analytic function:

where depends only on u.


Related methods

G-NI or SEM-NI: these are the most used spectral methods. The Galerkin formulation
of spectral methods or spectral element methods, for G-NI or SEM-NI respectively, is
modified and Gaussian numerical integration is used instead of integrals in the
definition of the bilinear form
and in the functional F. These method are a
family of PetrovGalerkin methods their convergence is a consequence of Strang's
lemma.
The spectral element method uses tensor product space spanned by nodal basis
functions associated with GaussLobatto points. In contrast, the p-version finite
element method spans a space of high order polynomials by nodeless basis functions,
chosen approximately orthogonal for numerical stability. Since not all interior basis
functions need to be present, the p-version finite element method can create a space
that that contains all polynomials up to a given degree with many fewer degrees of
173

freedom. However, some speedup techniques possible in spectral methods due to their
tensor-product character are no longer available. The name p-version means that
accuracy is increased by increasing the order of the approximating polynomials (thus,
p) rather than decreasing the mesh size, h.
The hp finite element method (hp-FEM) combines the advantages of the h and p
refinements to obtain extremely fast, exponential convergence rates.

Spectral method
Spectral methods are a class of techniques used in applied mathematics and scientific
computing to numerically solve certain partial differential equations (PDEs), often involving
the use of the Fast Fourier Transform. Where applicable, spectral methods have excellent
error properties, with the so called "exponential convergence" being the fastest possible.
PDEs describe a wide array of physical processes such as heat conduction, fluid flow, and
sound propagation. In many such equations, there are underlying "basic waves" that can be
used to give efficient algorithms for computing solutions to these PDEs. In a typical case,
spectral methods take advantage of this fact by writing the solution as its Fourier series,
substituting this series into the PDE to get a system of ordinary differential equations (ODEs)
in the time-dependent coefficients of the trigonometric terms in the series (written in complex
exponential form), and using a time-stepping method to solve those ODEs.
The spectral method and the finite element method are closely related and built on the same
ideas; the main difference between them is that the spectral method approximates the solution
as linear combination of continuous functions that are generally nonzero over the domain of
solution (usually sinusoids or Chebyshev polynomials), while the finite element method
approximates the solution as a linear combination of piecewise functions that are nonzero on
small subdomains. Because of this, the spectral method takes on a global approach while the
finite element method is a local approach. This is part of why the spectral method works best
when the solution is smooth. In fact there are no known three-dimensional single domain
spectral shock capturing results.
In the finite element community, a method where the degree of the elements is very high or
increases as the grid parameter h decreases to zero is sometimes called a spectral element
method.
The implementation of the spectral method is normally accomplished either with collocation
or a Galerkin approach.
A relationship with the spectral element method
One can show that if g is infinitely differentiable, then the numerical algorithm using Fast
Fourier Transforms will converge faster than any polynomial in the grid size h. That is, for
such that the error is less than Chn for all sufficiently small
any n > 0, there is a
values of h. We say that the spectral method is of order n, for every n > 0.
Because a spectral element method is a finite element method of very high order, there is a
similarity in the convergence properties. However, whereas the spectral method is based on
the eigendecomposition of the particular boundary value problem, the spectral element

174

method does not use that information and works for arbitrary elliptic boundary value
problems.

Meshfree methods
Meshfree methods are a particular class of numerical simulation algorithms for the simulation
of physical phenomena. Traditional simulation algorithms relied on a grid or a mesh,
meshfree methods in contrast use the geometry of the simulated object directly for
calculations. Meshfree methods exist for fluid dynamics as well as for solid mechanics. Some
methods are able to handle both cases.
Description
Meshfree methods eliminate some or all of the traditional mesh-based view of the
computational domain and rely on a particle (either Lagrangian or Eulerian) view of the field
problem.
A goal of meshfree methods is to facilitate the simulation of increasingly demanding
problems that require the ability to treat large deformations, advanced materials, complex
geometry, nonlinear material behavior, discontinuities and singularities. For example the
melting of a solid or the freezing process can be simulated using meshfree methods.
There is also an additional 'sales' oriented aspect of this name. Meshfree (or 'meshless' as this
is also used) methods seem attractive as alternative to finite element method (FEM) for the
general engineering community, which consider the process of generating finite element
meshes as more difficult and expensive than the remainder of analysis process.
History
One of the earlier methods without a mesh is smoothed particle hydrodynamics, presented in
1977.
List of methods and acronyms
The following numerical methods are generally considered to fall within the general class of
"meshfree" methods. Acronyms are provided in parentheses.

Smoothed particle hydrodynamics (SPH) (1977)


Diffuse element method (DEM) (1992)
Element-free Galerkin method (EFG / EFGM) (1994)
Reproducing kernel particle method (RKPM) (1995)
hp-clouds
Natural element method (NEM)
Material Point Method (MPM)
Meshless local Petrov Galerkin (MLPG)
Moving particle semi-implicit (MPS)
Generalized finite difference method (GFDM)
Particle-in-cell (PIC)
Moving particle finite element method (MPFEM)
Finite cloud method (FCM)
175

Boundary node method (BNM)


Boundary cloud method (BCM)
Method of fundamental solution(MFS)
Method of particular solution (MPS)
Method of Finite Spheres (MFS)
Discrete Vortex Method (DVM)

Related methods:

Moving least squares (MLS) - provide general approximation method for arbitrary set
of nodes
Partition of unity methods (PoUM) - provide general approximation formulation used
in some meshfree methods
Continuous blending method (enrichment and coupling of finite elements and
meshless methods)
eXtended FEM, Generalized FEM (XFEM, GFEM) - variants of FEM (finite element
method) combining some meshless aspects
Local Maximum-Entropy (LME)
Space-Time Meshfree Collocation Method (STMCM)
Mesh-free radial point interpolation method (RPIM)

Domain decomposition methods


In mathematics, numerical analysis, and numerical partial differential equations, domain
decomposition methods solve a boundary value problem by splitting it into smaller boundary
value problems on subdomains and iterating to coordinate the solution between adjacent
subdomains. A coarse problem with one or few unknowns per subdomain is used to further
coordinate the solution between the subdomains globally. The problems on the subdomains
are independent, which makes domain decomposition methods suitable for parallel
computing. Domain decomposition methods are typically used as preconditioners for Krylov
space iterative methods, such as the conjugate gradient method or GMRES.
In overlapping domain decomposition methods, the subdomains overlap by more than the
interface. Overlapping domain decomposition methods include the Schwarz alternating
method and the additive Schwarz method. Many domain decomposition methods can be
written and analyzed as a special case of the abstract additive Schwarz method.
In non-overlapping methods, the subdomains intersect only on their interface. In primal
methods, such as Balancing domain decomposition and BDDC, the continuity of the solution
across subdomain interface is enforced by representing the value of the solution on all
neighboring subdomains by the same unknown. In dual methods, such as FETI, the continuity
of the solution across the subdomain interface is enforced by Lagrange multipliers. The FETIDP method is hybrid between a dual and a primal method.
Non-overlapping domain decomposition methods are also called iterative substructuring
methods.
Mortar methods are discretization methods for partial differential equations, which use
separate discretization on nonoverlapping subdomains. The meshes on the subdomains do not
match on the interface, and the equality of the solution is enforced by Lagrange multipliers,
176

judiciously chosen to preserve the accuracy of the solution. In the engineering practice in the
finite element method, continuity of solutions between non-matching subdomains is
implemented by multiple-point constraints.

Multigrid method
Multigrid (MG) methods in numerical analysis are a group of algorithms for solving
differential equations using a hierarchy of discretizations. The main idea of multigrid is to
accelerate the convergence of a base iterative method by correcting, from time to time, the
solution globally by solving a coarse problem. This idea is similar to extrapolation between
coarser and finer grids. The typical application for multigrid is in the numerical solution of
elliptic partial differential equations in two or more dimensions.
Multigrid methods can be applied in combination with any of the common discretization
techniques. In these cases, multigrid methods are among the fastest solution techniques
known today. In contrast to other methods, multigrid methods are general in that they can
treat arbitrary regions and boundary conditions. They do not depend on the separability of the
equations or other special properties of the equation. They are also directly applicable to
more-complicated non-symmetric and nonlinear systems of equations, like the Lam system
of elasticity or the Navier-Stokes equations.
Multigrid methods can be generalized in many different ways. They can be applied naturally
in a time-stepping solution of parabolic partial differential equations, or they can be applied
directly to time-dependent partial differential equations. Research on multilevel techniques for
hyperbolic partial differential equations is under way. Multigrid methods can also be applied
to integral equations, or for problems in statistical physics.
Other extensions of multigrid methods include techniques where no partial differential
equation nor geometrical problem background is used to construct the multilevel hierarchy.
Such algebraic multigrid methods (AMG) construct their hierarchy of operators directly from
the system matrix and thus become true black-box solvers for sparse matrices.
The finite element method may be recast as a multigrid method by choosing linear wavelets as
the basis.
Algorithm
There are many variations of multigrid algorithms, but the common features are that a
hierarchy of discretizations (grids) is considered. The important steps are:

Smoothing reducing high frequency errors, for example using a few iterations of the
GaussSeidel method.
Restriction downsampling the residual error to a coarser grid.
Prolongation interpolating a correction computed on a coarser grid into a finer grid.

Computational cost
This approach has the advantage over other methods that it often scales linearly with the
number of discrete nodes used. That is: It can solve these problems to a given accuracy in a
number of operations that is proportional to the number of unknowns.
177

Assume that one has a differential equation which can be solved approximately (with a given
accuracy) on a grid i with a given grid point density Ni. Assume furthermore that a solution on
any grid Ni may be obtained with a given effort Wi = KNi from a solution on a coarser grid i
+ 1. Here, = Nj + 1 / Nj < 1 is the ratio of grid points on "neighboring" grids and is assumed to
be constant throughout the grid hierarchy, and K is some constant modeling the effort of
computing the result for one grid point.
The following recurrence relation is then obtained for the effort of obtaining the solution on
grid k:
Wk = Wk + 1 + KNk
And in particular, we find for the finest grid N1 that
W1 = W2 + KN1
Combining these two expressions (and using Nk = k 1N1) gives

Using the geometric series, we then find (for finite n)

that is, a solution may be obtained in O(N) time.

Discontinuous Galerkin method


Discontinuous Galerkin methods (DG methods) in mathematics form a class of numerical
methods for solving partial differential equations. They combine features of the finite element
and the finite volume framework and have been successfully applied to hyperbolic, elliptic
and parabolic problems arising from a wide range of applications. DG methods have in
particular received considerable interest for problems with a dominant first-order part, e.g. in
electrodynamics, fluid mechanics and plasma physics.
Discontinuous Galerkin methods were first proposed and analyzed in the early 1970s as a
technique to numerically solve partial differential equations. In 1973 Reed and Hill introduced
a DG method to solve the hyperbolic neutron transport equation.
The origin of the DG method for elliptic problems cannot be traced back to a single
publication as features such as jump penalization in the modern sense where developed
gradually. However, among the early influential contributors were Babuka, J.-L. Lions,
Nitsche and Zlamal. A more complete account of the historical development and an
introduction to DG methods for elliptic problems is given in a publication by Arnold, Brezzi,
Cockburn and Marini.

178

In 1996, two engineers used DG to solve the Stokes equations, which is 2nd order. This
motivated mathematicians to look deep in DG and the local discontinuous Galerkin method
was generated. By suitably choosing the numerical flux, DG has been extended to solve
higher order equations.

Extended finite element method


The extended finite element method (XFEM), also known as generalized finite element
method (GFEM) or partition of unity method (PUM) is a numerical technique that extends the
classical finite element method (FEM) approach by extending the solution space for solutions
to differential equations with discontinuous functions.
History
The extended finite element method (XFEM) was developed in 1999 by Ted Belytschko and
collaborators, to help alleviate the above shortcomings of the finite element method and has
been used to model the propagation of various discontinuities: strong (cracks) and weak
(material interfaces). The idea behind XFEM is to retain most advantages of meshfree
methods while alleviating their negative sides.
Rationale
The extended finite element method was developed to ease difficulties in solving problems
with localized features that are not efficiently resolved by mesh refinement. One of the initial
applications was the modelling of fractures in a material. In this original implementation,
discontinuous basis functions are added to standard polynomial basis functions for nodes that
belonged to elements that are intersected by a crack to provide a basis that included crack
opening displacements. A key advantage of XFEM is that in such problems the finite element
mesh does not need to be updated to track the crack path. Subsequent research has illustrated
the more general use of the method for problems involving singularities, material interfaces,
regular meshing of microstructural features such as voids, and other problems where a
localized feature can be described by an appropriate set of basis functions.
Principle
Enriched finite element methods extend, or enrich, the approximation space so that it is able
to naturally reproduce the challenging feature associated with the problem of interest: the
discontinuity, singularity, boundary layer, etc. It was shown that for some problems, such an
embedding of the problem's feature into the approximation space can significantly improve
convergence rates and accuracy. Moreover, treating problems with discontinuities with
eXtended Finite Element Methods suppresses the need to mesh and remesh the discontinuity
surfaces, thus alleviating the computational costs and projection errors associated with
conventional finite element methods, at the cost of restricting the discontinuities to mesh
edges.

hp-FEM
The hp-FEM is a general version of the finite element method (FEM), a numerical method for
solving partial differential equations based on piecewise-polynomial approximations that
employs elements of variable size (h) and polynomial degree (p). The origins of hp-FEM date
179

back to the pioneering work of Ivo Babuska et al. who discovered that the finite element
method converges exponentially fast when the mesh is refined using a suitable combination of
h-refinements (dividing elements into smaller ones) and p-refinements (increasing their
polynomial degree). The exponential convergence makes the method a very attractive choice
compared to most other finite element methods which only converge with an algebraic rate.
The exponential convergence of the hp-FEM was not only predicted theoretically but also
observed by numerous independent researchers.
Differences from standard FEM
The hp-FEM differs from the standard (lowest-order) FEM in many aspects.

Choice of higher-order shape functions: To begin with, the higher-degree polynomials


in elements can be generated using different sets of shape functions. The choice of
such set can influence dramatically the conditioning of the stiffness matrix, and in turn
the entire solution process.
Automatic hp-adaptivity: In the hp-FEM, an element can be hp-refined in many
different ways. One way is to just increase its polynomial degree without subdividing
it in space. Or, the element can be subdivided geometrically, and various polynomial
degrees can be applied to the subelements. The number of element refinement
candidates easily reaches 100 in 2D and 1000 in 3D. Therefore, clearly, one number
indicating the size of error in an element is not enough to guide automatic hpadaptivity (as opposed to adaptivity in standard FEM). Other techniques such as
reference solutions or analyticity considerations must be employed to obtain more
information about the shape of error in every element.
Ratio of assembling and solution CPU times: In standard FEM, the stiffness matrix
usually is assembled quickly but it is quite large. Therefore, typically, the solution of
the discrete problem consumes the largest part of the overall computing time. On the
contrary, the stiffness matrices in the hp-FEM typically are much smaller, but (for the
same matrix size) their assembly takes more time than in standard FEM. Mostly, this
is due to the computational cost of higher-order numerical quadrature.
Analytical challenges: The hp-FEM is more difficult to understand from the analytical
point of view than standard FEM. This concerns numerous techniques, such as the
discrete maximum principles (DMP) for elliptic problems. These results state that,
usually with some limiting assumptions on the mesh, the piecewise-polynomial FEM
approximation obeys analogous maximum principles as the underlying elliptic PDE.
Such results are very important since they guarantee that the approximation remain
physically admissible, leaving no possibility of computing a negative density, negative
concentration, or negative absolute temperature. The DMP are quire well understood
for lowest-order FEM but completely unknown for the hp-FEM in two or more
dimensions.
Programming challenges: It is much harder to implement a hp-FEM solver than
standard FEM code. The multiple issues that need to be overcome include (but are not
limited to): higher-order quadrature formulas, higher-order shape functions,
connectivity and orientation information relating shape functions on the reference
domain with basis functions in the physical domain, etc.

180

Efficiency
Smooth functions can be approximated much more efficiently using large high-order elements
than small piecewise-linear ones. This is illustrated in the figure below, where a 1D Poisson
equation with zero Dirichlet boundary conditions is solved on two different meshes. The exact
solution is the sin function.

Mesh consisting of two linear elements.

Mesh consisting of one quadratic element.


While the number of unknowns is the same in both cases (1 DOF), the errors in the
corresponding norm are 0.68 and 0.20, respectively. This means that the quadratic
approximation was roughly 3.5-times more efficient than the piecewise-linear one. When we
proceed one step further and compare four linear elements to one quartic element (p=4), then
both discrete problems will have three DOF but the quartic approximation will be
approximately 40-times more efficient. When performing few more steps like this, the reader
will see that the efficiency gap opens extremely fast.
Vice versa, small low-order elements can capture small-scale features such as singularities
much better than large high-order ones. The hp-FEM is based on an optimal combination of
these two approaches which leads to exponential convergence.

181

hp-adaptivity
Some FEM sites describe hp-adaptivity as a combination of h-adaptivity (splitting elements in
space while keeping their polynomial degree fixed) and p-adaptivity (only increasing their
polynomial degree). This is not entirely accurate. The hp-adaptivity is significantly different
from both h- and p-adaptivity since the hp-refinement of an element can be done in many
different ways. Besides a p-refinement, the element can be subdivided in space (as in hadaptivity), but there are many combinations for the polynomial degrees on the subelements.
This is illustrated in the figure on the right. For example, if a triangular or quadrilateral
element is subdivided into four subelements where the polynomial degrees are allowed to
vary by at most two, then this yields 3^4 = 81 refinement candidates (not considering
polynomially anisotropic candidates). Analogously, splitting a hexahedron into eight
subelements and varying their polynomial degrees by at most two yields 3^8 = 6,561
refinement candidates. Clearly, standard FEM error estimates providing one constant number
per element are not enough to guide automatic hp-adaptivity.

182

You might also like