Numerical Integration Methods Overview
Numerical Integration Methods Overview
If f(x) is a smooth well-behaved function, integrated over a small number of dimensions and
the limits of integration are bounded, there are many methods of approximating the integral
with arbitrary precision.
91
92
93
If f(x) does not have many derivatives at all points, or if the derivatives become large, then
Gaussian quadrature is often insufficient. In this case, an adaptive algorithm will perform
better. For many cases, estimating the error from quadrature over an interval for a function
f(x) isn't obvious. One popular solution is to use two different rules of quadrature, and use
their difference as an estimate of the error from quadrature. The other problem is deciding
what "too large" or "very small" signify. A local criterion for "too large" is that the quadrature
error should not be larger than
where t, a real number, is the tolerance we wish to set for
global error. Then again, if h is already tiny, it may not be worthwhile to make it even smaller
even if the quadrature error is apparently large. A global criterion is that the sum of errors on
all the intervals should be less than t. This type of error analysis is usually called "a
posteriori" since we compute the error after having computed the approximation.
Extrapolation methods
The accuracy of a quadrature rule of the Newton-Cotes type is generally a function of the
number of evaluation points. The result is usually more accurate as number of evaluation
points increases, or, equivalently, as the width of the step size between the points decreases. It
is natural to ask what the result would be if the step size were allowed to approach zero. This
can be answered by extrapolating the result from two or more nonzero step sizes (Richardson
extrapolation). The extrapolation function may be a polynomial or rational function.
Multidimensional integrals
The quadrature rules discussed so far are all designed to compute one-dimensional integrals.
To compute integrals in multiple dimensions, one approach is to phrase the multiple integral
as repeated one-dimensional integrals by appealing to Fubini's theorem. This approach
requires the function evaluations to grow exponentially as the number of dimensions
increases. Two methods are known to overcome this so-called curse of dimensionality.
Monte Carlo
Monte Carlo methods and quasi-Monte Carlo methods are easy to apply to multi-dimensional
integrals, and may yield greater accuracy for the same number of function evaluations than
repeated integrations using one-dimensional methods.
A large class of useful Monte Carlo methods are the so-called Markov chain Monte Carlo
algorithms, which include the Metropolis-Hastings algorithm and Gibbs sampling.
Sparse grids
Sparse grids were originally developed by Smolyak for the quadrature of high dimensional
functions. The method is always based on a one dimensional quadrature rule, but performs a
more sophisticated combination of univariate results.
Connection with differential equations
The problem of evaluating the integral
94
can be reduced to an initial value problem for an ordinary differential equation. If the above
integral is denoted by I(b), then the function I satisfies
Methods developed for ordinary differential equations, such as RungeKutta methods, can be
applied to the restated problem and thus be used to evaluate the integral. For instance, the
standard fourth-order RungeKutta method applied to the differential equation yields
Simpson's rule from above.
The differential equation I'(x) = (x) has a special form: the right-hand side contains only
the dependent variable (here x) and not the independent variable (here I). This simplifies the
theory and algorithms considerably. The problem of evaluating integrals is thus best studied
in its own right.
NewtonCotes formulas
In numerical analysis, the NewtonCotes formulas, also called the NewtonCotes rules, are a
group of formulas for numerical integration (also called quadrature) based on evaluating the
integrand at equally-spaced points. They are named after Isaac Newton and Roger Cotes.
NewtonCotes formulas can be useful if the value of the integrand at equally-spaced points is
given. If it is possible to change the points at which the integrand is evaluated, then other
methods such as Gaussian quadrature and ClenshawCurtis quadrature are probably more
suitable.
Description
It is assumed that the value of a function defined on [a, b] is known at equally spaced points
xi, for i = 0, , n, where x0 = a and xn = b. There are two types of NewtonCotes formulas,
the "closed" type which uses the function value at all points, and the "open" type which does
not use the function values at the endpoints. The closed Newton-Cotes formula of degree n is
stated as
where xi = h i + x0, with h (called the step size) equal to (xn x0)/n = (b a)/n. The wi are
called weights.
As can be seen in the following derivation the weights are derived from the Lagrange basis
polynomials. This means they depend only on the xi and not on the function . Let L(x) be the
interpolation polynomial in the Lagrange form for the given data points (x0, (x0)), , (xn,
(xn)), then
95
Rectangle method
In mathematics, specifically in integral calculus, the rectangle method (also called the
midpoint or mid-ordinate rule) computes an approximation to a definite integral, made by
finding the area of a collection of rectangles whose heights are determined by the values of
the function.
Specifically, the interval (a,b) over which the function is to be integrated is divided into n
equal subintervals of length = (b a) / n. The rectangles are then drawn so that either their
left or right corners, or the middle of their top line lies on the graph of the function, with bases
running along the x-axis. The approximation to the integral is then calculated by adding up the
areas (base multiplied by height) of the n rectangles, giving the formula:
where i' is defined to be either i 1, i or i 1 / 2, giving the approximation based on the topleft corner, the top-right corner or the middle of the top line respectively.
As n gets larger, this approximation gets more accurate. In fact, this computation is the spirit
of the definition of the Riemann integral and the limit of this approximation as
is
defined and equal to the integral of f on (a,b) if this Riemann integral is defined. Note that this
96
is true regardless of which i' is used, however the midpoint approximation tends to be more
accurate for finite n.
Error
For a function f which is twice differentiable, the approximation error in each section (a,a +
) of the midpoint rule decays as the cube of the width of the rectangle
for some in (a,a + ). Summing this, the approximation error for n intervals with width is,
for some in (a,b), less than or equal to
Trapezoidal rule
In mathematics, the trapezoidal rule (also known as the trapezoid rule, or the trapezium rule
in British English) is a way to approximately calculate the definite integral
The trapezoidal rule works by approximating the region under the graph of the function f(x) as
a trapezoid and calculating its area. It follows that
The function f(x) (in blue) is approximated by a linear function (in red).
To calculate this integral more accurately, one first splits the interval of integration [a,b] into
n smaller subintervals, and then applies the trapezoidal rule on each of them. One obtains the
composite trapezoidal rule:
97
where
98
Error analysis
The error of the composite trapezoidal rule is the difference between the value of the integral
and the numerical result:
Further terms in this error estimate are given by the EulerMaclaurin summation formula.
Simpson's rule
In numerical analysis, Simpson's rule is a method for numerical integration, the numerical
approximation of definite integrals. Specifically, it is the following approximation:
99
Simpson's rule can be derived by approximating the integrand f(x) (in blue) by the quadratic
interpolant P(x) (in red).
Derivation
Simpson's rule can be derived in various ways.
Quadratic interpolation
One derivation replaces the integrand f(x) by the quadratic polynomial P(x) which takes the
same values as f(x) at the end points a and b and the midpoint m = (a+b) / 2. One can use
Lagrange polynomial interpolation to find an expression for this polynomial,
100
respectively. It follows that the leading error term vanishes if we take the weighted average
The coefficients , and can be fixed by requiring that this approximation be exact for all
quadratic polynomials. This yields Simpson's rule.
Error
The error in approximating an integral by Simpson's rule is
101
small subintervals. Simpson's rule is then applied to each subinterval, with the results being
summed to produce an approximation for the integral over the entire interval. This sort of
approach is termed the composite Simpson's rule.
Suppose that the interval [a,b] is split up in n subintervals, with n an even number. Then, the
composite Simpson's rule is given by
The error committed by the composite Simpson's rule is bounded (in absolute value) by
102
where is some number between a and b. Thus, the 3/8 rule is about twice as accurate as the
standard method, but it uses one more function value. A composite 3/8 rule also exists,
similarly as above.
Gaussian quadrature
In numerical analysis, a quadrature rule is an approximation of the definite integral of a
function, usually stated as a weighted sum of function values at specified points within the
domain of integration. An n-point Gaussian quadrature rule, named after Carl Friedrich
Gauss, is a quadrature rule constructed to yield an exact result for polynomials of degree 2n
1 or less by a suitable choice of the points xi and weights wi for i = 1,...,n. The domain of
integration for such a rule is conventionally taken as [1, 1], so the rule is stated as
Gaussian quadrature as above will only produce accurate results if the function f(x) is well
approximated by a polynomial function within the range [-1,1]. The method is, for example,
not suitable for functions with singularities. However, if the integrated function can be written
as
, where g(x) is approximately polynomial, and W(x) is known, then
there are alternative weights wi such that
(Gauss-Chebyshev) and
(Gauss-Hermite).
It can be shown that the evaluation points are just the roots of a polynomial belonging to a
class of orthogonal polynomials.
Rules for the basic problem
For the integration problem stated above, the associated polynomials are Legendre
polynomials, Pn(x). With the nth polynomial normalized to give Pn(1) = 1, the ith Gauss node,
xi, is the ith root of Pn; its weight is given by
103
After applying the Gaussian quadrature rule, the following approximation is obtained:
for some choices of a, b, and . For a = 1, b = 1, and (x) = 1, the problem is the same as
that considered above. Other choices lead to other integration rules.
Error estimates
The error of a Gaussian quadrature rule can be stated as follows. For an integrand which has
2n continuous derivatives,
for some in (a, b), where pn is the orthogonal polynomial of order n and where
Stoer and Bulirsch remark that this error estimate is inconvenient in practice, since it may be
difficult to estimate the order 2n derivative, and furthermore the actual error may be much
less than a bound established by the derivative. Another approach is to use two Gaussian
104
quadrature rules of different orders, and to estimate the error as the difference between the
two results. For this purpose, GaussKronrod quadrature rules can be useful.
GaussKronrod rules
If the interval [a, b] is subdivided, the Gauss evaluation points of the new subintervals never
coincide with the previous evaluation points (except at zero for odd numbers), and thus the
integrand must be evaluated at every point. GaussKronrod rules are extensions of Gauss
quadrature rules generated by adding n + 1 points to an n-point rule in such a way that the
resulting rule is of order 3n + 1. This allows for computing higher-order estimates while reusing the function values of a lower-order estimate. The difference between a Gauss
quadrature rule and its Kronrod extension are often used as an estimate of the approximation
error.
ClenshawCurtis quadrature
ClenshawCurtis quadrature and Fejr quadrature are methods for numerical integration, or
"quadrature", that are based on an expansion of the integrand in terms of Chebyshev
polynomials. Equivalently, they employ a change of variables x = cos and use a discrete
cosine transform (DCT) approximation for the cosine series. Besides having fast-converging
accuracy comparable to Gaussian quadrature rules, ClenshawCurtis quadrature naturally
leads to nested quadrature rules (where different accuracy orders share points), which is
important for both adaptive quadrature and multidimensional quadrature (cubature).
Briefly, the function f(x) to be integrated is evaluated at the N extrema or roots of a
Chebyshev polynomial and these values are used to construct a polynomial approximation for
the function. This polynomial is then integrated exactly. In practice, the integration weights
for the value of the function at each node are precomputed, and this computation can be
performed in O(NlogN) time by means of fast Fourier transform-related algorithms for the
DCT.
Description
A simple way of understanding the algorithm is to realize that ClenshawCurtis quadrature
amounts to integrating via a change of variable x = cos(). The algorithm is normally
expressed for integration of a function f(x) over the interval [-1,1] (any other interval can be
obtained by appropriate rescaling). For this integral, we can write:
That is, we have transformed the problem from integrating f(x) to one of integrating
f(cos)sin. This can be performed if we know the cosine series for f(cos):
105
one must again perform a numeric integration, so at first this may not seem to have simplified
the problem. Unlike computation of arbitrary integrals, however, Fourier-series integrations
for periodic functions (like f(cos), by construction), up to the Nyquist frequency k = N, are
accurately computed by the N equally-spaced and equally-weighted points n = n / N for
(except the endpoints are weighted by 1/2, to avoid double-counting,
equivalent to the trapezoidal rule or the EulerMaclaurin formula). That is, we approximate
the cosine-series integral by the type-I discrete cosine transform (DCT):
for
and then use the formula above for the integral in terms of these ak.
Because only a2k is needed, the formula simplifies further into a type-I DCT of order N/2,
assuming N is an even number:
From this formula, it is clear that the Clenshaw-Curtis quadrature rule is symmetric, in that it
weights f(x) and f(x) equally.
Connection to Chebyshev polynomials
The reason that this is connected to the Chebyshev polynomials Tk(x) is that, by definition,
Tk(cos) = cos(k), and so the cosine series above is really an approximation of f(x) by
Chebyshev polynomials:
and thus we are "really" integrating f(x) by integrating its approximate expansion in terms of
Chebyshev polynomials. The evaluation points xn = cos(n / N) correspond to the extrema of
the Chebyshev polynomial TN(x).
The fact that such Chebyshev approximation is just a cosine series under a change of variables
is responsible for the rapid convergence of the approximation as more terms Tk(x) are
included. A cosine series converges very rapidly for functions that are even, periodic, and
sufficiently smooth. This is true here, since f(cos) is even and periodic in by construction,
106
which is precisely the type-II DCT. However, Fejr's first quadrature rule is not nested: the
evaluation points for 2N do not coincide with any of the evaluation points for N, unlike
ClenshawCurtis quadrature or Fejr's second rule.
Despite the fact that Fejr discovered these techniques before Clenshaw and Curtis, the name
"ClenshawCurtis quadrature" has become standard.
Comparison to Gaussian quadrature
The classic method of Gaussian quadrature evaluates the integrand at N + 1 points and is
constructed to exactly integrate polynomials up to degree 2N + 1. In contrast, ClenshawCurtis quadrature, above, evaluates the integrand at N + 1 points and exactly integrates
polynomials only up to degree N. It may seem, therefore, that ClenshawCurtis is intrinsically
worse than Gaussian quadrature, but in reality this does not seem to be the case.
In practice, several authors have observed that ClenshawCurtis can have accuracy
comparable to that of Gaussian quadrature for the same number of points. This is possible
because most numeric integrands are not polynomials (especially since polynomials can be
integrated analytically), and approximation of many functions in terms of Chebyshev
polynomials converges rapidly (Chebyshev approximation). In fact, recent theoretical results
argue that both Gaussian and ClenshawCurtis quadrature have error bounded by O([2N] k /
k) for a k-times differentiable integrand.
One often cited advantage of ClenshawCurtis quadrature is that the quadrature weights can
be evaluated in O(NlogN) time by fast Fourier transform algorithms (or their analogues for the
107
DCT), whereas the Gaussian quadrature weights require O(N2) time to compute. As a
practical matter, however, high-order numeric integration is rarely performed by simply
evaluating a quadrature formula for very large N. Instead, one usually employs an adaptive
quadrature scheme that first evaluates the integral to low order, and then successively refines
the accuracy by increasing the number of sample points, possibly only in regions where the
integral is inaccurate. To evaluate the accuracy of the quadrature, one compares the answer
with that of a quadrature rule of even lower order. Ideally, this lower-order quadrature rule
evaluates the integrand at a subset of the original N points, to minimize the integrand
evaluations. This is called a nested quadrature rule, and here Clenshaw-Curtis has the
advantage that the rule for order N uses a subset of the points from order 2N. In contrast,
Gaussian quadrature rules are not naturally nested, and so one must employ GaussKronrod
quadrature formulas or similar methods. Nested rules are also important for sparse grids in
multidimensional quadrature.
where wi, xi are the weights and points at which to evaluate the function f(x).
If the interval [a, b] is subdivided, the Gauss evaluation points of the new subintervals never
coincide with the previous evaluation points (except at zero for odd numbers), and thus the
integrand must be evaluated at every point. GaussKronrod formulas are extensions of Gauss
quadrature formulas generated by adding n + 1 points to an n-point rule in such a way that the
resulting rule is of order 3n + 1. This allows for computing higher-order estimates while reusing the function values of a lower-order estimate. The difference between a Gauss
quadrature rule and its Kronrod extension are often used as an estimate of the approximation
error.
108
Adaptive quadrature
In applied mathematics, adaptive quadrature is a process in which the integral of a function
f(x) is approximated using static quadrature rules on adaptively refined subintervals of the
integration domain. Generally, adaptive algorithms are just as efficient and effective as
traditional algorithms for "well-behaved" integrands, but are also effective for "badlybehaved" integrands for which traditional algorithms fail.
General scheme
Adaptive quadrature follows the general scheme
1. procedure integrate ( f , a , b , tau )
2.
3.
4.
5.
6.
7.
8.
if
then
m = (a + b) / 2
Q = integrate(f,a,m,tau) + integrate(f,m,b,tau)
endif
return Q
An approximation Q to the integral of f(x) over the interval [a,b] is computed (line 2), as well
as an error estimate (line 3). If the estimated error is larger than the required tolerance
(line 4), the interval is subdivided (line 5) and the quadrature is applied on both halves
separately (line 6). Either the initial estimate or the sum of the recursively computed halves is
returned (line 7).
The important components are the quadrature rule itself
and the logic for deciding which interval to subdivide, and when to terminate.
There are, of course, several variants of this scheme. The most common will be discussed
later.
Basic quadrature rules
The quadrature rules generally have the form
109
.
When such rules are used, the points at which f(x) has been evaluated can be re-used upon
recursion:
A similar strategy is used with ClenshawCurtis quadrature, where the nodes are chosen as
Gaussian quadrature
GaussKronrod quadrature formula
An algorithm may elect to use different quadrature methods on different subintervals, for
example using a high-order method only where the integrand is smooth.
Error estimation
Some quadrature algorithms generate a sequence of results which should approach the correct
value. Otherwise one can use a "null rule" which has the form of the above quadrature rule,
but whose value would be zero for a simple integrand (for example, if the integrand were a
polynomial of the appropriate degree).
110
Subdivision Logic
"Local" adaptive quadrature makes the acceptable error for a given interval proportional to the
length of that interval. This criterion can be difficult to satisfy if the integrand are badly
behaved at only a few points, for example with a few step discontinuities. Alternatively, one
could require only that the sum of the errors on each of the subintervals be less than the user's
requirement. This would be "global" adaptive quadrature. Global adaptive quadrature can be
more efficient (using fewer evaluations of the integrand) but are generally more complex to
program and may require more working space to record information on the current set of
intervals.
Romberg's method
In numerical analysis, Romberg's method generates a triangular array consisting of numerical
estimates of the definite integral
or
where
111
The zeroeth extrapolation, R(n, 0), is equivalent to the trapezoidal rule with 2n1 + 1 points;
the first extrapolation, R(n, 1), is equivalent to Simpson's rule with 2n1 + 1 points.
When function evaluations are expensive, it may be preferable to replace the polynomial
interpolation of Richardson with the rational interpolation proposed by Bulirsch & Stoer.
An illustration of Monte Carlo integration. In this example, the domain D is the inner circle
and the domain d is the square. Because the square's area can be easily calculated, the area of
the circle can be estimated by the ratio (0.8) of the points inside the circle (40) to the total
number of points (50), yielding an approximation for
112
113
Importance sampling
VEGAS Monte Carlo
The VEGAS algorithm of [Link] is based on importance sampling. It samples points
from the probability distribution described by the function | f |, so that the points are
concentrated in the regions that make the largest contribution to the integral.
In general, if the Monte Carlo integral of f is sampled with points distributed according to a
probability distribution described by the function g.
In practice it is not possible to sample from the exact distribution g for an arbitrary function,
so importance sampling algorithms aim to produce efficient approximations to the desired
distribution.
The VEGAS algorithm approximates the exact distribution by making a number of passes
over the integration region while histogramming the function f. Each histogram is used to
define a sampling distribution for the next pass. Asymptotically this procedure converges to
the desired distribution.
VEGAS incorporates a number of additional features, and combines both stratified sampling
and importance sampling. The integration region is divided into a number of "boxes", with
each box getting a fixed number of points (the goal is 2). Each box can then have a fractional
number of bins, but if bins/box is less than two, Vegas switches to a kind variance reduction
(rather than importance sampling).
This routines uses the VEGAS Monte Carlo algorithm to integrate the function f over the dimdimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu,
each of size dim. The integration uses a fixed number of function calls, and obtains random
sampling points using the random number generator r. A previously allocated workspace s
must be supplied. The result of the integration is returned in result, with an estimated absolute
error abserr. The result and its error estimate are based on a weighted average of independent
samples.
The VEGAS algorithm computes a number of independent estimates of the integral internally,
according to the iterations parameter, and returns their weighted average. Random sampling
of the integrand can occasionally produce an estimate where the error is zero, particularly if
the function is constant in some regions. An estimate with zero error causes the weighted
average to break down and must be handled separately. In the original Fortran
implementations of VEGAS the error estimate is made non-zero by substituting a small value
(typically 1e-30).
114
Numerical differentiation
Numerical differentiation is a technique of numerical analysis to produce an estimate of the
derivative of a mathematical function or function subroutine using values from the function
and perhaps other knowledge about the function.
Finite difference formulae
The simplest method is to use finite difference approximations. A simple two-point estimation
is to compute the slope of a nearby secant line through the points (x,f(x)) and (x+h,f(x+h)).
Choosing a small number h, h represents a small change in x, and it can be either positive or
negative. The slope of this line is
The slope of this secant line differs from the slope of the tangent line by an amount that is
approximately proportional to h. As h approaches zero, the slope of the secant line approaches
the slope of the tangent line. Therefore, the true derivative of f at x is the limit of the value of
the difference quotient as the secant lines get closer and closer to being a tangent line:
Since immediately substituting 0 for h results in division by zero, calculating the derivative
directly can be unintuitive.
A simple three-point estimation is to compute the slope of a nearby secant line through the
points (x-h,f(x-h)) and (x+h,f(x+h)). The slope of this line is
115
More generally, three-point estimation uses the secant line through the points (x h1,f(x h1))
and (x + h2,f(x + h2)). The slope of this line is
The slope of these secant lines differ from the slope of the tangent line by an amount that is
approximately proportional to h2 so that three-point estimation is a more accurate
approximation to the tangent line than two-point estimation when h is small.
For the three-point estimation method, the estimation error is given by:
,
where c is some point between x h and x + h. This error does not include the rounding error.
Practical considerations
An important consideration in practice when the function is approximated using floating point
arithmetic is how small a value of h to choose. If chosen too small, the subtraction will yield a
large rounding error and in fact all the finite difference formulae are ill-conditioned and due to
cancellation will produce a value of zero if h is small enough. If too large, the calculation of
the slope of the secant line will be more accurate, but the estimate of the slope of the tangent
by using the secant could be worse.
A suitable choice for h is sqrt(eps) * x where the machine epsilon eps is typically of the order
2.2e-16. Another important consideration is to make sure that h and x+h are representable in
floating point precision so that the difference between x+h and x is exactly h. This can be
accomplished by placing their values into and out of memory as follows: h = sqrt(eps) * x,
temp = x + h and h = temp x. It may be necessary to declare temp as a volatile variable to
ensure the steps are not undone by compiler optimization.
Higher order methods
Higher order methods for approximating the derivative, as well as methods for higher
derivatives exist. Below is the five point method for the first derivative (five-point stencil in
one dimension).
Slope
In mathematics, the slope or gradient of a line describes its steepness, incline, or grade. A
higher slope value indicates a steeper incline.
116
The slope is defined as the ratio of the "rise" divided by the "run" between two points on a
line, or in other words, the ratio of the altitude change to the horizontal distance between any
two points on the line. Given two points (x1,y1) and (x2,y2) on a line, the slope m of the line is
The slope of a line is defined as the rise over the run, m = y/x.
Through differential calculus, one can calculate the slope of the tangent line to a curve at a
point. The concept of slope applies directly to grades or gradients in geography and civil
engineering. Through trigonometry, the grade m of a road is related to its angle of incline by
Definition
The slope of a line in the plane containing the x and y axes is generally represented by the
letter m, and is defined as the change in the y coordinate divided by the corresponding change
in the x coordinate, between two distinct points on the line. This is described by the following
equation:
(The delta symbol, "", is commonly used in mathematics to mean "difference" or "change".)
Given two points (x1,y1) and (x2,y2), the change in x from one to the other is x2 x1, while the
change in y is y2 y1. Substituting both quantities into the above equation obtains the
following:
117
Note that the way the points are chosen on the line and their order does not matter; the slope
will be the same in each case. Other curves have "accelerating" slopes and one can use
calculus to determine such slopes.
Calculus
The concept of a slope is central to differential calculus. For non-linear functions, the rate of
change varies along the curve. The derivative of the function at a point is the slope of the line
tangent to the curve at the point, and is thus equal to the rate of change of the function at that
point.
If we let x and y be the distances (along the x and y axes, respectively) between two points
on a curve, then the slope given by the above definition,
,
is the slope of a secant line to the curve. For a line, the secant between any two points is the
line itself, but this is not the case for any other type of curve.
For example, the slope of the secant intersecting y = x2 at (0,0) and (3,9) is 3. (The slope of
the tangent at x = 32 is also 3 a consequence of the mean value theorem.)
By moving the two points closer together so that y and x decrease, the secant line more
closely approximates a tangent line to the curve, and as such the slope of the secant
approaches that of the tangent. Using differential calculus, we can determine the limit, or the
value that y/x approaches as y and x get closer to zero; it follows that this limit is the
exact slope of the tangent. If y is dependent on x, then it is sufficient to take the limit where
only x approaches zero. Therefore, the slope of the tangent is the limit of y/x as x
approaches zero. We call this limit the derivative.
Tangent
In geometry, the tangent line (or simply the tangent) to a curve at a given point is the straight
line that "just touches" the curve at that point (in the sense explained more precisely below).
As it passes through the point of tangency, the tangent line is "going in the same direction" as
the curve, and in this sense it is the best straight-line approximation to the curve at that point.
The same definition applies to space curves and curves in n-dimensional Euclidean space.
Tangent to a curve.
118
Similarly, the tangent plane to a surface at a given point is the plane that "just touches" the
surface at that point. The concept of a tangent is one of the most fundamental notions in
differential geometry and has been extensively generalized; (Tangent space).
In trigonometry, tangent is the slope formula that combines the rise and run. The tangent
function takes an angle and tells the slope when the length of the line is 1.
The word "tangent" comes from the Latin tangere, meaning "to touch".
Tangent line to a curve
The intuitive notion that a tangent line "just touches" a curve can be made more explicit by
considering the sequence of straight lines (secant lines) passing through two points, A and B,
those that lie on the function curve. The tangent at A is the limit when point B approximates or
tends to A. The existence and uniqueness of the tangent line depends on a certain type of
mathematical smoothness, known as "differentiability". For example, if two circular arcs meet
at a sharp point (a vertex) then there is no uniquely defined tangent at the vertex because the
limit of the progression of secant lines depends on the direction in which "point B"
approaches the vertex.
119
Analytical approach
The geometric idea of the tangent line as the limit of secant lines serves as the motivation for
analytical methods that are used to find tangent lines explicitly. The question of finding the
tangent line to a graph, or the tangent line problem, was one of the central questions leading
to the development of calculus in the 17th century. In the second book of his Geometry, Ren
Descartes said of the problem of constructing the tangent to a curve, "And I dare say that this
is not only the most useful and most general problem in geometry that I know, but even that I
have ever desired to know".
Intuitive description
Suppose that a curve is given as the graph of a function, y = f(x). To find the tangent line at
the point p = (a, f(a)), consider another nearby point q = (a + h, f(a + h)) on the curve. The
slope of the secant line passing through p and q is equal to the difference quotient
and since the difference in x values (h) is getting smaller or going to zero it can be described
has
As the point q approaches p, which corresponds to making h smaller and smaller, the
difference quotient should approach a certain limiting value k, which is the slope of the
tangent line at the point p. If k is known, the equation of the tangent line can be found in the
point-slope form:
Calculus provides rules for computing the derivatives of functions that are given by formulas,
such as the power function, trigonometric functions, exponential function, logarithm, and their
120
various combinations. Thus, equations of the tangents to graphs of all these functions, as well
as many others, can be found by the methods of calculus.
When the method fails
Calculus also demonstrates that there are functions and points on their graphs for which the
limit determining the slope of the tangent line does not exist. For these points the function f is
non-differentiable. There are two possible reasons for the method of finding the tangents
based on the limits and derivatives to fail: either the geometric tangent exists, but it is a
vertical line, which cannot be given in the point-slope form since it does not have a slope, or
the graph is too badly behaved to admit a geometric tangent.
The graph y = x1/3 illustrates the first possibility: here the difference quotient at a = 0 is equal
to h1/3/h = h2/3, which becomes very large as h approaches 0. The tangent line to this curve at
the origin is vertical.
The graph y = |x| of the absolute value function consists of two straight lines with different
slopes joined at the origin. As a point q approaches the origin from the right, the secant line
always has slope 1. As a point q approaches the origin from the left, the secant line always has
slope 1. Therefore, there is no unique tangent to the graph at the origin (although in a certain
sense, there are two half-tangents, corresponding to two possible directions of approaching
the origin).
Surfaces and higher-dimensional manifolds
The tangent plane to a surface at a given point p is defined in an analogous way to the tangent
line in the case of curves. It is the best approximation of the surface by a plane at p, and can
be obtained as the limiting position of the planes passing through 3 distinct points on the
surface close to p as these points converge to p. More generally, there is a k-dimensional
tangent space at each point of a k-dimensional manifold in the n-dimensional Euclidean space.
Secant line
A secant line of a curve is a line that (locally) sometimes intersects two points on the curve.
The word secant comes from the Latin secare, to cut.
It can be used to approximate the tangent to a curve, at some point f. If the secant to a curve is
defined by two points, P and Q, with P fixed and Q variable, as Q approaches P along the
curve, the direction of the secant approaches that of the tangent at P, assuming there is just
one. As a consequence, one could say that the limit of the secant's slope, or direction, is that
of the tangent. In calculus, this idea is the basis of the geometric definition of the derivative.
A chord is the portion of a secant that lies within the curve.
Derivative
In calculus (a branch of mathematics) the derivative is a measure of how a function changes
as its input changes. Loosely speaking, a derivative can be thought of as how much a quantity
is changing at a given point; for example, the derivative of the position of a vehicle with
respect to time is the instantaneous velocity at which the vehicle is traveling. Conversely, the
integral of the velocity over time is the change in the vehicle's position.
121
The derivative of a function at a chosen input value describes the best linear approximation of
the function near that input value. For a real-valued function of a single real variable, the
derivative at a point equals the slope of the tangent line to the graph of the function at that
point. In higher dimensions, the derivative of a function at a point is a linear transformation
called the linearization. A closely related notion is the differential of a function.
The process of finding a derivative is called differentiation. The fundamental theorem of
calculus states that differentiation is the reverse process to integration.
Differentiation and the derivative
Differentiation is a method to compute the rate at which a dependent output y changes with
respect to the change in the independent input x. This rate of change is called the derivative of
y with respect to x. In more precise language, the dependence of y upon x means that y is a
function of x. This functional relationship is often denoted y = (x), where denotes the
function. If x and y are real numbers, and if the graph of y is plotted against x, the derivative
measures the slope of this graph at each point.
The simplest case is when y is a linear function of x, meaning that the graph of y against x is a
straight line. In this case, y = (x) = m x + c, for real numbers m and c, and the slope m is
given by
where the symbol (the uppercase form of the Greek letter Delta) is an abbreviation for
"change in." This formula is true because
y + y = (x+ x) = m (x + x) + c = m x + c + m x = y + mx.
It follows that y = m x.
This gives an exact value for the slope of a straight line. If the function is not linear (i.e. its
graph is not a straight line), however, then the change in y divided by the change in x varies:
differentiation is a method to find an exact value for this rate of change at any given value of
x.
The idea, illustrated by the Figures below, is to compute the rate of change as the limiting
value of the ratio of the differences y / x as x becomes infinitely small.
In Leibniz's notation, such an infinitesimal change in x is denoted by dx, and the derivative of
y with respect to x is written
suggesting the ratio of two infinitesimal quantities. (The above expression is read as "the
derivative of y with respect to x", "d y by d x", or "d y over d x". The oral form "d y d x" is
often used conversationally, although it may lead to confusion.)
122
The most common approach to turn this intuitive idea into a precise definition uses limits, but
there are other methods, such as non-standard analysis.
The secant to curve y= (x) determined by points (x, (x)) and (x+h, (x+h)).
123
This expression is Newton's difference quotient. The derivative is the value of the difference
quotient as the secant lines approach the tangent line. Formally, the derivative of the function
at a is the limit
of the difference quotient as h approaches zero, if this limit exists. If the limit exists, then is
differentiable at a. Here (a) is one of several common notations for the derivative
(Lagrange's notation).
Equivalently, the derivative satisfies the property that
which has the intuitive interpretation that the tangent line to at a gives the best linear
approximation
to near a (i.e., for small h). This interpretation is the easiest to generalize to other settings.
Substituting 0 for h in the difference quotient causes division by zero, so the slope of the
tangent line cannot be found directly. Instead, define Q(h) to be the difference quotient as a
function of h:
Q(h) is the slope of the secant line between (a, (a)) and (a + h, (a + h)). If is a continuous
function, meaning that its graph is an unbroken curve with no gaps, then Q is a continuous
124
Finite difference
A finite difference is a mathematical expression of the form f(x + b) f(x + a). If a finite
difference is divided by b a, one gets a difference quotient. The approximation of
derivatives by finite differences plays a central role in finite difference methods for the
numerical solution of differential equations, especially boundary value problems.
In mathematical analysis, operators involving finite differences are studied. A difference
operator is an operator which maps a function f to a function whose values are the
corresponding finite differences.
Forward, backward, and central differences
Only three forms are commonly considered: forward, backward, and central differences.
A forward difference is an expression of the form
If h has a fixed (non-zero) value, instead of approaching zero, then the right-hand side is
125
Hence, the forward difference divided by h approximates the derivative when h is small. The
error in this approximation can be derived from Taylor's theorem. Assuming that f is
continuously differentiable, the error is
However, the central difference yields a more accurate approximation. Its error is proportional
to square of the spacing (if f is twice continuously differentiable):
Higher-order differences
In an analogous way one can obtain finite difference approximations to higher order
derivatives and differential operators. For example, by using the above central difference
formula for f'(x + h / 2) and f'(x h / 2) and applying a central difference formula for the
derivative of f' at x, we obtain the central difference approximation of the second derivative of
f:
More generally, the nth-order forward, backward, and central differences are respectively
given by:
Note that the central difference will, for odd n, have h multiplied by non-integers. This is
often a problem because it amounts to changing the interval of discretization. The problem
may be remedied taking the average of n[f](x h / 2) and n[f](x + h / 2).
126
The relationship of these higher-order differences with the respective derivatives is very
straightforward:
approximates f'(x) up to a term of order h2. This can be proven by expanding the above
expression in Taylor series, or by using the calculus of finite differences, explained below.
If necessary, the finite difference can be centered about any point by mixing forward,
backward, and central differences.
Arbitrarily sized kernels
Using a little linear algebra, one can fairly easily construct approximations, which sample an
arbitrary number of points to the left and a (possibly different) number of points to the right of
the center point, for any order of derivative. This involves solving a linear system such that
the Taylor expansion of the sum of those points, around the center point, well approximates
the Taylor expansion of the desired derivative.
This is useful for differentiating a function on a grid, where, as one approaches the edge of the
grid, one must sample fewer and fewer points on one side.
Finite difference methods
An important application of finite differences is in numerical analysis, especially in numerical
differential equations, which aim at the numerical solution of ordinary and partial differential
equations respectively. The idea is to replace the derivatives appearing in the differential
equation by finite differences that approximate them. The resulting methods are called finite
difference methods.
Common applications of the finite difference method are in computational science and
engineering disciplines, such as thermal engineering, fluid mechanics, etc.
Difference operator
In mathematics, a difference operator maps a function, (x), to another function, (x + b)
(x + a).
The forward difference operator
127
occurs frequently in the calculus of finite differences, where it plays a role formally similar to
that of the derivative, but used in discrete circumstances. Difference equations can often be
solved with techniques very similar to those for solving differential equations. This similarity
led to the development of time scale calculus. Analogously we can have the backward
difference operator
When restricted to polynomial functions f, the forward difference operator is a delta operator,
i.e., a shift-equivariant linear operator on polynomials that reduces degree by 1.
n-th difference
The nth forward difference of a function f(x) is given by
which holds for any polynomial function f and for some, but not all, analytic functions. Here,
128
is the "falling factorial" or "lower factorial" and the empty product (x)0 defined to be 1. Note
also the formal similarity of this result to Taylor's theorem; this is one of the observations that
lead to the idea of umbral calculus.
In analysis with p-adic numbers, Mahler's theorem states that the assumption that f is a
polynomial function can be weakened all the way to the assumption that f is merely
continuous.
Carlson's theorem provides necessary and sufficient conditions for a Newton series to be
unique, if it exists. However, a Newton series will not, in general, exist.
The Newton series, together with the Stirling series and the Selberg series, is a special case of
the general difference series, all of which are defined in terms of scaled forward differences.
Taylor's theorem
In calculus, Taylor's theorem gives a sequence of approximations of a differentiable function
around a given point by polynomials (the Taylor polynomials of that function) whose
coefficients depend only on the derivatives of the function at that point. The theorem also
gives precise estimates on the size of the error in the approximation. The theorem is named
after the mathematician Brook Taylor, who stated it in 1712, though the result was first
discovered 41 years earlier in 1671 by James Gregory.
Taylor's theorem in one variable
Motivation
Taylor's theorem asserts that any sufficiently smooth function can locally be approximated by
polynomials. A simple example of application of Taylor's theorem is the approximation of the
exponential function ex near x = 0:
The approximation is called the n-th order Taylor approximation to ex because it approximates
the value of the exponential function by a polynomial of degree n. This approximation only
holds for x close to zero, and as x moves further away from zero, the approximation becomes
worse. The quality of the approximation is controlled by the remainder term:
More generally, Taylor's theorem applies to any sufficiently differentiable function , giving
an approximation, for x near a point a, of the form
129
The remainder term is just the difference of the function and its approximating polynomial
Although an explicit formula for the remainder term is seldom of any use, Taylor's theorem
also provides several ways in which to estimate the value of the remainder. In other words, for
x near enough to a, the remainder ought to be "small"; Taylor's theorem gives information on
precisely how small it actually is.
Statement
The precise statement of the theorem is as follows: If n 0 is an integer and is a function
which is n times continuously differentiable on the closed interval [a, x] and n + 1 times
differentiable on the open interval (a, x), then
Here, n! denotes the factorial of n, and Rn(x) is a remainder term, denoting the difference
between the Taylor polynomial of degree n and the original function. The remainder term
Rn(x) depends on x and is small if x is close enough to a. Several expressions are available for
it.
The Lagrange form of the remainder term states that there exists a number between a and x
such that
This exposes Taylor's theorem as a generalization of the mean value theorem. In fact, the
mean value theorem is used to prove Taylor's theorem with the Lagrange remainder term.
The Cauchy form of the remainder term states that there exists a number between a and x
such that
More generally, if G(t) is a continuous function on [a,x] which is differentiable with nonvanishing derivative on (a,x), then there exists a number between a and x such that
This exposes Taylor's theorem as a generalization of the Cauchy mean value theorem.
130
The above forms are restricted to the case of functions taking real values. However, the
integral form of the remainder term applies as well when the function takes complex values. It
is:
provided, as is often the case, (n) is absolutely continuous on [a, x]. This shows the theorem
to be a generalization of the fundamental theorem of calculus.
In general, a function does not need to be equal to its Taylor series, since it is possible that the
Taylor series does not converge, or that it converges to a different function. However, for
many functions (x), one can show that the remainder term Rn approaches zero as n
approaches . Those functions can be expressed as a Taylor series in a neighbourhood of the
point a and are called analytic.
Taylor's theorem (with the integral formulation of the remainder term) is also valid if the
function has complex values or vector values. Furthermore, there is a version of Taylor's
theorem for functions in several variables. For complex functions analytic in a region
containing a circle C surrounding a and its interior, there is a contour integral expression for
the remainder
valid inside of C.
Estimates of the remainder
Another common version of Taylor's theorem holds on an interval (a r, a + r) where the
variable x is assumed to take its values. This formulation of the theorem has the advantage
that it is often possible to explicitly control the size of the remainder terms, and thus arrive at
an approximation of a function valid in a whole interval with precise bounds on the quality of
the approximation.
A precise version of Taylor's theorem in this form is as follows. Suppose is a function which
is n times continuously differentiable on the closed interval [a r, a + r] and n + 1 times
differentiable on the open interval (a r, a + r). If there exists a positive real constant Mn
such that |(n+1)(x)| Mn for all x (a r, a + r), then
where the remainder function Rn satisfies the inequality (known as Cauchy's estimate):
131
for all x (a r, a + r). This is called a uniform estimate of the error in the Taylor
polynomial centered at a, because it holds uniformly for all x in the interval.
If is infinitely differentiable on [a r, a + r], then positive constants Mn exist for each n = 1,
2, 3, such that | (n+1)(x)| Mn for all x (a r, a + r). If, in addition, it is possible to
select these constants so that
as
then is an analytic function on (a r, a + r). In particular, the remainder term in the Taylor
approximation Rn(x) tends to zero uniformly as n. In other words, an analytic function is
the uniform limit of its Taylor polynomials on an interval.
Taylor's theorem for several variables
Taylor's theorem can be generalized to several variables as follows. Let B be a ball in RN
centered at a point a, and be a real-valued function defined on the closure
having n+1
continuous partial derivatives at every point. Taylor's theorem asserts that for any
,
where the summation extends over multi-indices (this formula uses the multi-index
notation).
The remainder terms satisfy the inequality
for all with || = n + 1. As was the case with one variable, the remainder terms can be
described explicitly.
Taylor series
In mathematics, the Taylor series is a representation of a function as an infinite sum of terms
calculated from the values of its derivatives at a single point. It may be regarded as the limit
of the Taylor polynomials. Taylor series are named after the English mathematician Brook
Taylor. If the series is centered at zero, the series is also called a Maclaurin series, named
after the Scottish mathematician Colin Maclaurin.
Definition
The Taylor series of a real or complex function (x) that is infinitely differentiable in a
neighbourhood of a real or complex number a, is the power series
132
where n! denotes the factorial of n and (n)(a) denotes the nth derivative of evaluated at the
point a; the zeroth derivative of is defined to be itself and (x a) and 0! are both defined
to be 1.
In the particular case where a = 0, the series is also called a Maclaurin series.
Convergence
Taylor series need not in general be convergent. More precisely, the set of functions with a
convergent Taylor series is a meager set in the Frechet space of smooth functions. In spite of
this, for many functions that arise in practice, the Taylor series does converge.
The limit of a convergent Taylor series of a function f need not in general be equal to the
function value f(x), but in practice often it is. For example, the function
is infinitely differentiable at x = 0, and has all derivatives zero there. Consequently, the Taylor
series of f(x) is zero. However, f(x) is not equal to the zero function, and so it is not equal to
its Taylor series.
If f(x) is equal to its Taylor series in a neighborhood of a, it is said to be analytic in this
neighborhood. If f(x) is equal to its Taylor series everywhere it is called entire. The
exponential function ex and the trigonometric functions sine and cosine are examples of entire
functions. Examples of functions that are not entire include the logarithm, the trigonometric
function tangent, and its inverse arctan. For these functions the Taylor series do not converge
if x is far from a.
Taylor series can be used to calculate the value of an entire function in every point, if the
value of the function, and of all of its derivatives, are known at a single point. Uses of the
Taylor series for entire functions include:
1. The partial sums (the Taylor polynomials) of the series can be used as approximations
of the entire function. These approximations are good if sufficiently many terms are
included.
2. The series representation simplifies many mathematical proofs.
133
For example, the function sin(x) may be accurately approximated around the point a = 0 by a
polynomial of degree seven:
The error in this approximation is no more than |x|9/9!. In particular, for 1 < x < 1, the error
is less than 0.000003.
In contrast, the function ln(1 + x) approximated around the point a = 0 by Taylor polynomials
converges to the function only in the region 1 < x 1; outside of this region the higherdegree Taylor polynomials are worse approximations for the function. This is similar to
Runge's phenomenon.
The error incurred in approximating a function by its nth-degree Taylor polynomial, is called
the remainder or residual and is denoted by the function Rn(x). Taylor's theorem can be used
to obtain a bound on the size of the remainder.
Calculation of Taylor series
Several methods exist for the calculation of Taylor series of a large number of functions. One
can attempt to use the Taylor series as-is and generalize the form of the coefficients, or one
can use manipulations such as substitution, multiplication or division, addition or subtraction
of standard Taylor series to construct the Taylor series of a function, by virtue of Taylor series
being power series. In some cases, one can also derive the Taylor series by repeatedly
applying integration by parts. Particularly convenient is the use of computer algebra systems
to calculate Taylor series.
Taylor series in several variables
The Taylor series may also be generalized to functions of more than one variable with
For example, for a function that depends on two variables, x and y, the Taylor series to second
order about the point (a, b) is:
134
where Df(a) is the gradient of f evaluated at x = a and D2f(a) is the Hessian matrix. Applying
the multi-index notation the Taylor series for several variables becomes
which is to be understood as a still more abbreviated multi-index version of the first equation
of this paragraph, again in full analogy to the single variable case.
135
136
where f is a function that maps [t0,) Rd to Rd, and the initial condition y0 Rd is a given
vector.
The above formulation is called an initial value problem (IVP). The Picard-Lindelf theorem
states that there is a unique solution, if f is Lipschitz continuous. In contrast, boundary value
problems (BVPs) specify (components of) the solution y at more than one point. Different
methods need to be used to solve BVPs, for example the shooting method (and its variants) or
global methods like finite differences, galerkin methods, or collocation methods.
Note that we restrict ourselves to first-order differential equations (meaning that only the first
derivative of y appears in the equation, and no higher derivatives). However, a higher-order
equation can easily be converted to a system of first-order equations by introducing extra
variables. For example, the second-order equation y'' = y can be rewritten as two first-order
equations: y' = z and z' = y.
Methods
Two elementary methods are discussed to give the reader a feeling for the subject. After that,
pointers are provided to other methods (which are generally more accurate and efficient). The
methods mentioned here are analysed in the next section.
The Euler method
A brief explanation: From any point on a curve, you can find an approximation of a nearby
point on the curve by moving a short distance along a line tangent to the curve.
137
Rigorous development: Starting with the differential equation (1), we replace the derivative y'
by the finite difference approximation
This formula is usually applied in the following way. We choose a step size h, and we
construct the sequence t0, t1 = t0 + h, t2 = t0 + 2h, We denote by yn a numerical estimate of
the exact solution y(tn). Motivated by (3), we compute these estimates by the following
recursive scheme
This is the Euler method (or forward Euler method, in contrast with the backward Euler
method, to be described below). The method is named after Leonhard Euler who described it
in 1768.
The Euler method is an example of an explicit method. This means that the new value yn+1 is
defined in terms of things that are already known, like yn.
The backward Euler method
If, instead of (2), we use the approximation
The backward Euler method is an implicit method, meaning that we have to solve an equation
to find yn+1. One often uses fixed point iteration or (some modification of) the Newton
Raphson method to achieve this. Of course, it costs time to solve this equation; this cost must
be taken into consideration when one selects the method to use. The advantage of implicit
methods such as (6) is that they are usually more stable for solving a stiff equation, meaning
that a larger step size h can be used.
138
Generalizations
The Euler method is often not accurate enough. In more precise terms, it only has order one
(the concept of order is explained below). This caused mathematicians to look for higherorder methods.
One possibility is to use not only the previously computed value yn to determine yn+1, but to
make the solution depend on more past values. This yields a so-called multistep method.
Perhaps the simplest is the Leapfrog method which is second order and (roughly speaking)
relies on two time values.
Almost all practical multistep methods fall within the family of linear multistep methods,
which have the form
Another possibility is to use more points in the interval [tn,tn+1]. This leads to the family of
RungeKutta methods, named after Carl Runge and Martin Kutta. One of their fourth-order
methods is especially popular.
Both ideas can also be combined. The resulting methods are called general linear methods.
Advanced features
A good implementation of one of these methods for solving an ODE entails more than the
time-stepping formula.
It is often inefficient to use the same step size all the time, so variable step-size methods have
been developed. Usually, the step size is chosen such that the (local) error per step is below
some tolerance level. This means that the methods must also compute an error indicator, an
estimate of the local error.
An extension of this idea is to choose dynamically between different methods of different
orders (this is called a variable order method). Methods based on Richardson extrapolation,
such as the BulirschStoer algorithm, are often used to construct various methods of different
orders.
Other desirable features include:
dense output: cheap numerical approximations for the whole integration interval, and
not only at the points t0, t1, t2, ...
event location: finding the times where, say, a particular function vanishes.
support for parallel computing.
when used for integrating with respect to time, time reversibility.
Alternative methods
Many methods do not fall within the framework discussed here. Some classes of alternative
methods are:
139
multiderivative methods, which use not only the function f but also its derivatives. This
class includes HermiteObreschkoff methods and Fehlberg methods, as well as
methods like the ParkerSochacki method, which compute the coefficients of the
Taylor series of the solution y recursively.
methods for second order ODEs. We said that all higher-order ODEs can be
transformed to first-order ODEs of the form (1). While this is certainly true, it may not
be the best way to proceed. In particular, Nystrm methods work directly with secondorder equations.
geometric integration methods are especially designed for special classes of ODEs
(e.g., symplectic integrators for the solution of Hamiltonian equations). They take care
that the numerical solution respects the underlying structure or geometry of these
classes.
Analysis
Numerical analysis is not only the design of numerical methods, but also their analysis. Three
central concepts in this analysis are:
Convergence
A numerical method is said to be convergent if the numerical solution approaches the exact
solution as the step size h goes to 0. More precisely, we require that for every ODE (1) with a
Lipschitz function f and every t* > 0,
All the methods mentioned above are convergent. In fact, convergence is a condition sine qua
non for any numerical scheme.
Consistency and order
Suppose the numerical method is
The local error of the method is the error committed by one step of the method. That is, it is
the difference between the result given by the method, assuming that no error was made in
earlier steps, and the exact solution:
140
Hence a method is consistent if it has an order greater than 0. The (forward) Euler method (4)
and the backward Euler method (6) introduced above both have order 1, so they are
consistent. Most methods being used in practice attain higher order. Consistency is a
necessary condition for convergence, but not sufficient; for a method to be convergent, it must
be both consistent and zero-stable.
A related concept is the global error, the error sustained in all the steps one needs to reach a
fixed time t. Explicitly, the global error at time t is yN y(t) where N = (tt0)/h. The global
error of a pth order one-step method is O(hp); in particular, such a method is convergent. This
statement is not necessarily true for multi-step methods.
Stability and stiffness
For some differential equations, application of standard methods such as the Euler method,
explicit RungeKutta methods, or multistep methods (e.g., AdamsBashforth methods)
exhibit instability in the solutions, though other methods may produce stable solutions. This
"difficult behaviour" in the equation (which may not necessarily be complex itself) is
described as stiffness, and is often caused by the presence of different time scales in the
underlying problem. Stiff problems are ubiquitous in chemical kinetics, control theory, solid
mechanics, weather prediction, biology, and electronics.
141
boundary value problems are the Sturm-Liouville problems. The analysis of these problems
involves the eigenfunctions of a differential operator.
To be useful in applications, a boundary value problem should be well posed. This means that
given the input to the problem there exists a unique solution, which depends continuously on
the input. Much theoretical work in the field of partial differential equations is devoted to
proving that boundary value problems arising from scientific and engineering applications are
in fact well-posed.
Among the earliest boundary value problems to be studied is the Dirichlet problem, of finding
the harmonic functions (solutions to Laplace's equation); the solution was given by the
Dirichlet's principle.
Shows a region where a differential equation is valid and the associated boundary values.
Initial value problem
A more mathematical way to picture the difference between an initial value problem and a
boundary value problem is that an initial value problem has all of the conditions specified at
the same value of the independent variable in the equation (and that value is at the lower
boundary of the domain, thus the term "initial" value). On the other hand, a boundary value
problem has conditions specified at the extremes of the independent variable. For example, if
the independent variable is time over the domain [0,1], an initial value problem would specify
a value of y(t) and y'(t) at time t = 0, while a boundary value problem would specify values for
y(t) at both t = 0 and t = 1.
If the problem is dependent on both space and time, then instead of specifying the value of the
problem at a given point for all time the data could be given at a given time for all space. For
example, the temperature of an iron bar with one end kept at absolute zero and the other end
at the freezing point of water would be a boundary value problem.
Types of boundary value problems
If the boundary gives a value to the normal derivative of the problem then it is a Neumann
boundary condition. For example, if there is a heater at one end of an iron rod, then energy
would be added at a constant rate but the actual temperature would not be known.
If the boundary gives a value to the problem then it is a Dirichlet boundary condition. For
example, if one end of an iron rod is held at absolute zero, then the value of the problem
would be known at that point in space.
143
If the boundary has the form of a curve or surface that gives a value to the normal derivative
and the problem itself then it is a Cauchy boundary condition.
Aside from the boundary condition, boundary value problems are also classified according to
the type of differential operator involved. For an elliptic operator, one discusses elliptic
boundary value problems. For an hyperbolic operator, one discusses hyperbolic boundary
value problems. These categories are further subdivided into linear and various nonlinear
types.
Euler method
In mathematics and computational science, the Euler method, named after Leonhard Euler, is
a first-order numerical procedure for solving ordinary differential equations (ODEs) with a
given initial value. It is the most basic kind of explicit method for numerical integration for
ordinary differential equations.
Illustration of the Euler method. The unknown curve is in blue, and its polygonal
approximation is in red.
Informal geometrical description
Consider the problem of calculating the shape of an unknown curve which starts at a given
point and satisfies a given differential equation. Here, a differential equation can be thought of
as a formula by which the slope of the tangent line to the curve can be computed at any point
on the curve, once the position of that point has been calculated.
144
The idea is that while the curve is initially unknown, its starting point, which we denote by A0,
is known (see the picture above). Then, from the differential equation, the slope to the curve
at A0 can be computed, and so, the tangent line.
Take a small step along that tangent line up to a point A1. If we pretend that A1 is still on the
curve, the same reasoning as for the point A0 above can be used. After several steps, a
is computed. In general, this curve does not diverge too
polygonal curve
far from the original unknown curve, and the error between the two curves can be made small
if the step size is small enough and the interval of computation is finite (although things are
more complicated for stiff equations, as discussed below).
Derivation
We want to approximate the solution of the initial value problem
by using the first two terms of the Taylor expansion of y, which represents the linear
approximation around the point (t0,y(t0)) . One step of the Euler method from tn to tn+1 = tn + h
is
The Euler method is explicit, i.e. the solution yn + 1 is an explicit function of yi for
While the Euler method integrates a first order ODE, any ODE of order N can be represented
as a first-order ODE in more than one variable by introducing N 1 further variables, y', y'',
..., y(N), and formulating N first order equations in these new variables. The Euler method can
be applied to the vector
order system.
Error
The balls magnitude of the errors arising from the Euler method can be demonstrated by
comparison with a Taylor expansion of y. If we assume that f(t) and y(t) are known exactly at
a time t0, then the Euler method gives the approximate solution at time t0 + h as:
(the second equality holds because y satisfies the differential equation y' = f(t,y)). In
comparison, the Taylor expansion in h about t0 gives:
The error introduced by the Euler method is given by the difference between these equations:
145
For small h, the dominant error per step is proportional to h2. To solve the problem over a
given range of t, the number of steps needed is proportional to 1 / h so it is to be expected that
the total error at the end of the fixed time will be proportional to h (error per step times
number of steps). For this reason, the Euler method is said to be first order. This makes the
Euler method less accurate (for small h) than other higher-order techniques such as RungeKutta methods and linear multistep methods.
The Euler method can also be numerically unstable, especially for stiff equations. This
limitation along with its slow convergence of error with h means that the Euler method
is not often used, except as a simple example of numerical integration.
Illustration of numerical integration for the equation y' = y,y(0) = 1. Blue: the Euler method,
green: the midpoint method, red: the exact solution, y = et. The step size is h = 1.0.
The same illustration for h = 0.25. It is seen that the midpoint method converges faster than
the Euler method.
146
The result is approximations for the value of y(t) at discrete times ti:
ti = t0 + ih
yi = y(ti) = y(t0 + ih)
fi = f(ti,yi)
where h is the time step (sometimes referred to as t).
A linear multistep method uses a linear combination of yi and yi' to calculate the value of y for
the desired current step.
Multistep method will use the previous s steps to calculate the next value. Consequently, the
desired value at the current processing stage is yn + s.
A linear multistep method is a method of the form
where h denotes the step size and f the right-hand side of the differential equation. The
coefficents
and
determine the method. The designer of the method
chooses the coefficients; often, many coefficients are zero. Typically, the designer chooses the
coefficients so they will exactly interpolate y(t) when it is an nth order polynomial.
If the value of bs is nonzero, then the value of yn + s depends on the value of f(tn + s,yn + s).
Consequently, the method is explicit if bs = 0. In that case, the formula can directly compute
yn + s. If
then the method is implicit and the equation for yn + s must be solved. Iterative
methods such as Newton's method are often used to solve the implicit formula.
147
Sometimes an explicit multistep method is used to "predict" the value of yn + s. That value is
then used in an implicit formula to "correct" the value. The result is a Predictor-corrector
method.
Multistep method families
Three families of linear multistep methods are commonly used: AdamsBashforth methods,
AdamsMoulton methods, and the backward differentiation formulas (BDFs).
AdamsBashforth methods
The AdamsBashforth methods are explicit methods. The coefficients are as 1 = 1 and
, while the bj are chosen such that the methods has order s (this
determines the methods uniquely).
The AdamsBashforth methods with s = 1, 2, 3, 4, 5 are:
The coefficients bj can be determined as follows. Use polynomial interpolation to find the
polynomial p of degree s 1 such that
The polynomial p is locally a good approximation of the right-hand side of the differential
equation y' = f(t,y) that is to be solved, so consider the equation y' = p(t) instead. This equation
can be solved exactly; the solution is simply the integral of p. This suggests taking
The AdamsBashforth method arises when the formula for p is substituted. The coefficients bj
turn out to be given by
148
Replacing f(t, y) by its interpolant p incurs an error of order hs, and it follows that the s-step
AdamsBashforth method has indeed order s.
The AdamsBashforth methods were designed by John Couch Adams to solve a differential
equation modelling capillary action due to Francis Bashforth. Bashforth (1883) published his
theory and Adams' numerical method.
AdamsMoulton methods
The AdamsMoulton methods are similar to the AdamsBashforth methods in that they also
. Again the b coefficients are chosen to obtain
have as 1 = 1 and
the highest order possible. However, the AdamsMoulton methods are implicit methods. By
removing the restriction that bs = 0, an s-step AdamsMoulton method can reach order s + 1,
while an s-step AdamsBashforth methods has only order s.
The AdamsMoulton methods with s = 0, 1, 2, 3, 4 are:
The AdamsMoulton methods are solely due to John Couch Adams, like the Adams
Bashforth methods. The name of Forest Ray Moulton became associated with these methods
because he realized that they could be used in tandem with the AdamsBashforth methods as
a predictorcorrector pair (Moulton 1926); Milne (1926) had the same idea. Adams used
Newton's method to solve the implicit equation.
Analysis
The central concepts in the analysis of linear multistep methods, and indeed any numerical
method for differential equations, are convergence, order, and stability.
149
The first question is whether the method is consistent: is the difference equation
a good approximation of the differential equation y' = f(t,y)? More precisely, a multistep
method is consistent if the local error goes to zero as the step size h goes to zero, where the
local error is defined to be the difference between the result yn + s of the method, assuming
that all the previous values
are exact, and the exact solution of the equation
at time tn + s. A computation using Taylor series shows out that a linear multistep method is
consistent if and only if
The s-step AdamsBashforth method has order s, while the s-step AdamsMoulton method
has order s + 1.
These conditions are often formulated using the characteristic polynomials
In terms of these polynomials, the above condition for the method to have order p becomes
In particular, the method is consistent if it has order one, which is the case if (1) = 0 and '(1)
= (1).
If the roots of the characteristic polynomial all have modulus less than or equal to 1 and the
roots of modulus 1 are of multiplicity 1, we say that the root condition is satisfied. The
method is convergent if and only if it is consistent and the root condition is satisfied.
Consequently, a consistent method is stable if and only if this condition is satisfied, and thus
the method is convergent if and only if it is stable.
150
Furthermore, if the method is stable, the method is said to be strongly stable if z = 1 is the
only root of modulus 1. If it is stable and all roots of modulus 1 are not repeated, but there is
more than one such root, it is said to be relatively stable. Note that 1 must be a root; thus
stable methods are always one of these two.
RungeKutta methods
In numerical analysis, the RungeKutta methods are an important family of implicit and
explicit iterative methods for the approximation of solutions of ordinary differential
equations. These techniques were developed around 1900 by the German mathematicians C.
Runge and M.W. Kutta.
The common fourth-order RungeKutta method
One member of the family of RungeKutta methods is so commonly used that it is often
referred to as "RK4", "classical Runge-Kutta method" or simply as "the RungeKutta
method".
Let an initial value problem be specified as follows.
Then, the RK4 method for this problem is given by the following equations:
Thus, the next value (yn + 1) is determined by the present value (yn) plus the product of the size
of the interval (h) and an estimated slope. The slope is a weighted average of slopes:
In averaging the four slopes, greater weight is given to the slopes at the midpoint:
151
The RK4 method is a fourth-order method, meaning that the error per step is on the order of
h5, while the total accumulated error has order h4.
Note that the above formulae are valid for both scalar- and vector-valued functions (i.e., y can
be a vector and f an operator). For example one can integrate Schrdinger's equation using the
Hamiltonian operator as function f.
Explicit RungeKutta methods
The family of explicit RungeKutta methods is a generalization of the RK4 method
mentioned above. It is given by
where
(Note: the above equations have different but equivalent definitions in different texts).
To specify a particular method, one needs to provide the integer s (the number of stages), and
the coefficients aij (for 1 j < i s), bi (for i = 1, 2, ..., s) and ci (for i = 2, 3, ..., s). These data
are usually arranged in a mnemonic device, known as a Butcher tableau (after John C.
Butcher):
0
c2 a21
c3 a31 a32
cs as1 as2
b1 b2
as,s 1
bs 1 bs
There are also accompanying requirements if we require the method to have a certain order p,
meaning that the truncation error is O(hp+1). These can be derived from the definition of the
truncation error itself. For example, a 2-stage method has order 2 if b1 + b2 = 1, b2c2 = 1/2,
and b2a21 = 1/2.
152
where the ki are the same as for the higher order method. Then the error is
which is O(hp). The Butcher Tableau for this kind of method is extended to give the values of
:
0
c2 a21
c3 a31 a32
cs as1 as2
b1 b2
as,s 1
bs 1 bs
The RungeKuttaFehlberg method has two methods of orders 5 and 4. Its extended Butcher
Tableau is:
0
1/4
1/4
3/8
3/32
9/32
439/216
3680/513
-845/4104
1/2
8/27
3544/2565 1859/4104
16/135
25/216
1408/2565 2197/4104
11/40
1/5
However, the simplest adaptive RungeKutta method involves combining the Heun method,
which is order 2, with the Euler method, which is order 1. Its extended Butcher Tableau is:
153
0
1 1
1/2 1/2
1
The error estimate is used to control the stepsize. Other adaptive RungeKutta methods are
the BogackiShampine method (orders 3 and 2), the CashKarp method and the Dormand
Prince method (both with orders 5 and 4).
Implicit RungeKutta methods
The implicit methods are more general than the explicit ones. The distinction shows up in the
Butcher Tableau: for an implicit method, the coefficient matrix aij is not necessarily lower
triangular:
The approximate solution to the initial value problem reflects the greater number of
coefficients:
Due to the fullness of the matrix aij, the evaluation of each ki is now considerably involved
and dependent on the specific function f(t,y). Despite the difficulties, implicit methods are of
great importance due to their high (possibly unconditional) stability, which is especially
important in the solution of partial differential equations. The simplest example of an implicit
RungeKutta method is the backward Euler method:
It can be difficult to make sense of even this simple implicit method, as seen from the
expression for k1:
154
In this case, the awkward expression above can be simplified by noting that
so that
from which
follows. Though simpler then the "raw" representation before manipulation, this is an implicit
relation so that the actual solution is problem dependent. Multistep implicit methods have
been used with success by some researchers. The combination of stability, higher order
accuracy with fewer steps, and stepping that depends only on the previous value makes them
attractive; however the complicated problem-specific implementation and the fact that ki must
often be approximated iteratively means that they are not common.
RungeKuttaFehlberg method
In mathematics, the RungeKuttaFehlberg method (or Fehlberg method) is an algorithm of
numerical analysis for the numerical solution of ordinary differential equations. It was
developed by the German mathematician Erwin Fehlberg and is based on the class of Runge
Kutta methods. The RungeKuttaFehlberg method uses an O(h4) method together with an
O(h5) method that uses all of the points of the O(h4) method, and hence is often referred to as
an RKF45 method. Similar schemes with different orders have since been developed. By
performing one extra calculation that would be required for an RK5 method, the error in the
solution can be estimated and controlled and an appropriate step size can be determined
automatically, making this method efficient for ordinary problems of automated numerical
integration of ordinary differential equations.
The Butcher tableau is:
0
1/4
1/4
3/8
3/32
9/32
439/216
3680/513
845/4104
1/2
-8/27
3544/2565 1859/4104
11/40
25/216
1408/2565 2197/4104
1/5
16/135
The first row of coefficients gives the fourth-order accurate method, and the second row gives
the fifth-order accurate method.
155
BulirschStoer algorithm
In numerical analysis, the BulirschStoer algorithm is a method for the numerical solution of
ordinary differential equations which combines three powerful ideas, Richardson
extrapolation, the use of rational function extrapolation in Richardson-type applications, and
the modified midpoint method, to obtain numerical solutions to ordinary differential equations
(ODEs) with high accuracy and comparatively little computational effort. It is named after
Roland Bulirsch and Josef Stoer. It is sometimes called the GraggBulirschStoer (GBS)
algorithm because of the importance of a result about the error function of the modified
midpoint method, due to William B. Gragg.
Underlying ideas
The idea of Richardson extrapolation is to consider a numerical calculation whose accuracy
depends on the used stepsize h as an (unknown) analytic function of the stepsize h,
performing the numerical calculation with various values of h, fitting a (chosen) analytic
function to the resulting points, and then evaluating the fitting function for h = 0, thus trying
to approximate the result of the calculation with infinitely fine steps.
Bulirsch and Stoer recognized that using rational functions as fitting functions for Richardson
extrapolation in numerical integration is superior to using polynomial functions because
rational functions are able to approximate functions with poles rather well (compared to
polynomial functions), given that there are enough higher-power terms in the denominator to
account for nearby poles. While a polynomial interpolation or extrapolation only yields good
results if the nearest pole is rather far outside a circle around the known data points in the
complex plane, rational function interpolation or extrapolation can have remarkable accuracy
even in the presence of nearby poles.
The modified midpoint method by itself is a second-order method, and therefore generally
inferior to fourth-order methods like the fourth-order RungeKutta method. However, it has
the advantage of requiring only one derivative evaluation per substep (asymptotically for a
large number of substeps), and, in addition, as discovered by Gragg, the error of a modified
midpoint step of size H, consisting of n substeps of size h = H/n each, and expressed as a
power series in h, contains only even powers of h. This makes the modified midpoint method
extremely useful to the BulirschStoer method as the accuracy increases two orders at a time
when the results of separate attempts to cross the interval H with increasing numbers of
substeps are combined.
Shooting method
In numerical analysis, the shooting method is a method for solving a boundary value problem
by reducing it to the solution of an initial value problem. The following exposition may be
clarified by this illustration of the shooting method.
For a boundary value problem of a second-order ordinary differential equation, the method is
stated as follows. Let
be the boundary value problem. Let y(t; a) denote the solution of the initial value problem
156
Define the function F(a) as the difference between y(t1; a) and the specified boundary value
y1.
If the boundary value problem has a solution, then F has a root, and that root is just the value
of y'(t0) which yields a solution y(t) of the boundary value problem.
The usual methods for finding roots may be employed here, such as the bisection method or
Newton's method.
Linear shooting method
The boundary value problem is linear if f has the form
In this case, the solution to the boundary value problem is usually given by:
Galerkin method
In mathematics, in the area of numerical analysis, Galerkin methods are a class of methods for
converting a continuous operator problem (such as a differential equation) to a discrete
problem. In principle, it is the equivalent of applying the method of variation to a function
space, by converting the equation to a weak formulation. Typically one then applies some
constraints on the functions space to characterize the space with a finite set of basis functions.
Often when using a Galerkin method one also gives the name along with typical
approximation methods used, such as Petrov-Galerkin method or Ritz-Galerkin method.
The approach is credited to the Russian mathematician Boris Galerkin.
Examples for Galerkin methods are:
, a(u,v) = f(v).
Galerkin discretization
Choose a subspace
Find
, a(un,vn) = f(vn).
We call this the Galerkin equation. Notice that the equation has remained unchanged and only
the spaces have changed.
Galerkin orthogonality
The key property of the Galerkin approach is that the error is orthogonal to the chosen
, we can use vn as a test vector in the original equation. Subtracting
subspaces. Since
the two, we get the Galerkin orthogonality relation for the error, en = u un which is the error
between the solution of the original problem, u, and the solution of the Galerkin equation, un
a(en,vn) = a(u,vn) a(un,vn) = f(vn) f(vn) = 0.
Matrix form
Since the aim of Galerkin's method is the production of a linear system of equations, we build
its matrix form, which can be used to compute the solution by a computer program.
Let
be a basis for Vn. Then, it is sufficient to use these in turn for testing the
such that
Galerkin equation, i.e.: find
158
holds
for some constant C > 0
holds
for some constant c > 0
By the Lax-Milgram theorem, these two conditions imply well-posedness of the original
problem in weak formulation. All norms in the following sections will be norms for which the
above inequalities hold (these norms are often called energy norm).
Well-posedness of the Galerkin equation
Since
, boundedness and ellipticity of the bilinear form apply to Vn. Therefore, the
well-posedness of the Galerkin problem is actually inherited from the well-posedness of the
original problem.
159
This means, that up to the constant C / c, the Galerkin solution un is as close to the original
solution u as any other vector in Vn. In particular, it will be sufficient to study approximation
by spaces Vn, completely forgetting about the equation being solved.
Proof
Since the proof is very simple and the basic principle behind all Galerkin methods, we include
it here: by ellipticity and boundedness of the bilinear form (inequalities) and Galerkin
orthogonality (equals sign in the middle), we have for arbitrary
:
Dividing by
and taking the infimum over all possible vn yields the lemma.
Collocation method
In mathematics, a collocation method is a method for the numerical solution of ordinary
differential equations, partial differential equations and integral equations. The idea is to
choose a finite-dimensional space of candidate solutions (usually, polynomials up to a certain
degree) and a number of points in the domain (called collocation points), and to select that
solution which satisfies the given equation at the collocation points.
Ordinary differential equations
Suppose that the ordinary differential equation
is to be solved over the interval [t0, t0 + h]. Denote the collocation points by 0 c1< c2< <
cn 1.
The corresponding (polynomial) collocation method approximates the solution y by the
polynomial p of degree n which satisfies the initial condition p(t0) = y0, and the differential
equation p'(t) = f(t,p(t)) at all points t = t0 + ckh where k = 1, , n. This gives n + 1 conditions,
which matches the n + 1 parameters needed to specify a polynomial of degree n.
All these collocation methods are in fact implicit RungeKutta methods. However, not all
RungeKutta methods are collocation methods.
160
The finite difference method, in which functions are represented by their values at
certain grid points and derivatives are approximated through differences in these
values.
The method of lines, where all but one variable is discretized. The result is a system of
ODEs in the remaining continuous variable.
The finite element method, where functions are represented in terms of basis functions
and the PDE is solved in its integral (weak) form.
The finite volume method, which divides space into regions or volumes and computes
the change within each volume by considering the flux (flow rate) across the surfaces
of the volume.
The spectral method, which represents functions as a sum of particular basis functions,
for example using a Fourier series.
Meshfree methods don't need a grid to work and so may be better suited for some
problems. However the computational effort is usually higher.
Domain decomposition methods solve a boundary value problems by splitting them
into smaller boundary value problems on subdomains and iterating to coordinate the
solution between the subdomains.
Multigrid methods solve differential equations using a hierarchy of discretizations.
The finite difference method is often regarded as the simplest method to learn and use. The
finite element and finite volume methods are widely used in engineering and in computational
fluid dynamics, and are well suited to problems in complicated geometries. Spectral methods
are generally the most accurate, provided that the solutions are sufficiently smooth.
161
for some small value of h. In fact, this is the forward difference equation for the first
derivative. Using this and similar formulae to replace derivative expressions in differential
equations, one can approximate their solutions without the need for calculus.
Derivation from Taylor's polynomial
Assuming the function whose derivatives are to be approximated is properly-behaved, by
Taylor's theorem,
where n! denotes the factorial of n, and Rn(x) is a remainder term, denoting the difference
between the Taylor polynomial of degree n and the original function. Again using the first
derivative of the function f as an example, by Taylor's theorem,
f(x0 + h) = f(x0) + f'(x0)h + R1(x),
which, with some minor algebraic manipulation, is equivalent to
162
and further noting that the quantity on the left is the approximation from the finite difference
method and that the quantity on the right is the exact quantity of interest plus a remainder,
clearly that remainder is the local truncation error. A final expression of this example and its
order is:
This means that, in this case, the local truncation error is proportional to the step size.
163
Method of lines
The method of lines (MOL, NMOL, NUMOL) is a technique for solving partial differential
equations (PDEs) in which all but one dimension is discretized. MOL allows standard,
general-purpose methods and software, developed for the numerical integration of ODEs and
DAEs, to be used. A large number of integration routines have been developed over the years
in many different programming languages, and some have been published as open source
resources.
The method of lines most often refers to the construction or analysis of numerical methods for
partial differential equations that proceeds by first discretizing the spatial derivatives only and
leaving the time variable continuous. This leads to a system of ordinary differential equations
to which a numerical method for initial value ordinary equations can be applied. The method
of lines in this context dates back to at least the early 1960s Sarmin and Chudov. Many papers
discussing the accuracy and stability of the method of lines for various types of partial
differential equations have appeared since. W. E. Schiesser of Lehigh University is one of the
major proponents of the method of lines, having published widely in this field.
Application to elliptical equations
MOL requires that the PDE problem is well-posed as an initial value (Cauchy) problem in at
least one dimension, because ODE and DAE integrators are initial value problem (IVP)
solvers.
Thus it cannot be used directly on purely elliptic equations, such as Laplace's equation.
However, MOL has been used to solve Laplace's equation by using the method of false
transients. In this method, a time derivative of the dependent variable is added to Laplaces
equation. Finite differences are then used to approximate the spatial derivatives, and the
resulting system of equations is solved by MOL. It is also possible to solve elliptical problems
by a semi-analytical method of lines. In this method the discretization process results in a set
of ODE's that are solved by exploiting properties of the associated exponential matrix.
reduce it in its rear (thus reducing cost of the simulation); Another example would be the
simulation of the weather pattern on Earth, where it is more important to have accurate
predictions over land than over the wide-open sea.
Technical discussion
We will illustrate the finite element method using two sample problems from which the
general method can be extrapolated. It is assumed that the reader is familiar with calculus and
linear algebra.
P1 is a one-dimensional problem
where f is given and u is an unknown function of x, and u'' is the second derivative of u with
respect to x.
The two-dimensional sample problem is the Dirichlet problem
is "nice" (e.g., a
where is a connected open region in the (x,y) plane whose boundary
smooth manifold or a polygon), and uxx and uyy denote the second derivatives with respect to x
and y, respectively.
The problem P1 can be solved "directly" by computing antiderivatives. However, this method
of solving the boundary value problem works only when there is only one spatial dimension
and does not generalize to higher-dimensional problems or to problems like u + u'' = f. For
this reason, we will develop the finite element method for P1 and outline its generalization to
P2.
Our explanation will proceed in two steps, which mirror two essential steps one must take to
solve a boundary value problem (BVP) using the FEM.
In the first step, one rephrases the original BVP in its weak, or variational form. Little
to no computation is usually required for this step. The transformation is done by hand
on paper.
The second step is the discretization, where the weak form is discretized in a finite
dimensional space.
After this second step, we have concrete formulae for a large but finite dimensional linear
problem whose solution will approximately solve the original BVP. This finite dimensional
problem is then implemented on a computer.
165
Variational formulation
The first step is to convert P1 and P2 into their variational equivalents, or Weak formulation.
If u solves P1, then for any smooth function v that satisfies the displacement boundary
conditions, i.e. v = 0 at x = 0 and x = 1,we have
(1)
Conversely, if u with u(0) = u(1) = 0 satisfies (1) for every smooth function v(x) then one may
show that this u will solve P1. The proof is easier for twice continuously differentiable u
(mean value theorem), but may be proved in a distributional sense as well.
By using integration by parts on the right-hand-side of (1), we obtain
(2)
where we have used the assumption that v(0) = v(1) = 0.
A proof outline of existence and uniqueness of the solution
We can loosely think of
to be the absolutely continuous functions of (0,1) that are
0 at x = 0 and x = 1 (Sobolev spaces). Such function are (weakly) "once differentiable" and it
then defines an inner product which turns
turns out that the symmetric bilinear map
into a Hilbert space (a detailed proof is nontrivial.) On the other hand, the lefthand-side
is also an inner product, this time on the Lp space L2(0,1). An
application of the Riesz representation theorem for Hilbert spaces shows that there is a unique
u solving (2) and therefore P1. This solution is a-priori only a member of
using elliptic regularity, will be smooth if f is.
, but
166
where
denotes the gradient and denotes the dot product in the two-dimensional plane.
Once more
of "once
such that
such that
167
, i.e.,
for k = 1,...,n; this basis is a shifted and scaled tent function. For the two-dimensional case, we
choose again one basis function vk per vertex xk of the triangulation of the planar region .
The function vk is the unique function of V whose value is 1 at xk and 0 at every
Basis functions vk (blue) and a linear combination of them, which is piecewise linear (red).
168
Depending on the author, the word "element" in "finite element method" refers either to the
triangles in the domain, the piecewise linear basis function, or both. So for instance, an author
interested in curved domains might replace the triangles with curved primitives, and so might
describe the elements as being curvilinear. On the other hand, some authors replace
"piecewise linear" by "piecewise quadratic" or even "piecewise polynomial". The author
might then say "higher order element" instead of "higher degree polynomial". Finite element
method is not restricted to triangles (or tetrahedra in 3-d, or higher order simplexes in
multidimensional spaces), but can be defined on quadrilateral subdomains (hexahedra, prisms,
or pyramids in 3-d, and so on). Higher order shapes (curvilinear elements) can be defined with
polynomial and even non-polynomial shapes (e.g. ellipse or circle).
Examples of methods that use higher degree piecewise polynomial basis functions are the hpFEM and spectral FEM.
More advanced implementations (adaptive finite element methods) utilize a method to assess
the quality of the results (based on error estimation theory) and modify the mesh during the
solution aiming to achieve approximate solution within some bounds from the 'exact' solution
of the continuum problem. Mesh adaptivity may utilize various techniques, the most popular
are:
and
Similarly, in the planar case, if xj and xk do not share an edge of the triangulation, then the
integrals
and
169
If we write
and
for j = 1,...,n.
If we denote by
and
(4)
the column vectors (u1,...,un)t and (f1,...,fn)t, and if we let L = (Lij) and
then we may
(5)
As we have discussed before, most of the entries of L and M are zero because the basis
functions vk have small support. So we now have to solve a linear system in the unknown
where most of the entries of the matrix L, which we need to invert, are zero.
Such matrices are known as sparse matrices, and there are efficient solvers for such problems
(much more efficient than actually inverting the matrix.) In addition, L is symmetric and
positive definite, so a technique such as the conjugate gradient method is favored. For
problems that are not too large, sparse LU decompositions and Cholesky decompositions still
work well. For instance, Matlab's backslash operator (which uses sparse LU, sparse Cholesky,
and other factorization methods) can be sufficient for meshes with a hundred thousand
vertices.
The matrix L is usually referred to as the stiffness matrix, while the matrix M is dubbed the
mass matrix.
General form of the finite element method
In general, the finite element method is characterized by the following process.
One chooses a grid for . In the preceding treatment, the grid consisted of triangles,
but one can also use squares or curvilinear polygons.
Then, one chooses basis functions. In our discussion, we used piecewise linear basis
functions, but it is also common to use piecewise polynomial basis functions.
A separate consideration is the smoothness of the basis functions. For second order elliptic
boundary value problems, piecewise polynomial basis function that are merely continuous
suffice (i.e., the derivatives are discontinuous.) For higher order partial differential equations,
170
one must use smoother basis functions. For instance, for a fourth order problem such as uxxxx +
uyyyy = f, one may use piecewise quadratic basis functions that are C1.
Another consideration is the relation of the finite dimensional space V to its infinite
dimensional counterpart, in the examples above
. A conforming element method is one in
which the space V is a subspace of the element space for the continuous problem. The
example above is such a method. If this condition is not satisfied, we obtain a nonconforming
element method, an example of which is the space of piecewise linear functions over the mesh
which are continuous at each edge midpoint. Since these functions are in general
discontinuous along the edges, this finite dimensional space is not a subspace of the original
.
Typically, one has an algorithm for taking a given mesh and subdividing it. If the main
method for increasing precision is to subdivide the mesh, one has an h-method (h is
customarily the diameter of the largest element in the mesh.) In this manner, if one shows that
the error with a grid h is bounded above by Chp, for some
and p > 0, then one has an
order p method. Under certain hypotheses (for instance, if the domain is convex), a piecewise
polynomial of order d method will have an error of order p = d + 1.
If instead of making h smaller, one increases the degree of the polynomials used in the basis
function, one has a p-method. If one combines these two refinement types, one obtains an hpmethod (hp-FEM). In the hp-FEM, the polynomial degrees can vary from element to element.
High order methods with large uniform p are called spectral finite element methods (SFEM).
These are not to be confused with spectral methods.
For vector partial differential equations, the basis functions may take values in
The most attractive feature of the FEM is its ability to handle complicated geometries
(and boundaries) with relative ease. While FDM in its basic form is restricted to
handle rectangular shapes and simple alterations thereof, the handling of geometries in
FEM is theoretically straightforward.
The most attractive feature of finite differences is that it can be very easy to
implement.
There are several ways one could consider the FDM a special case of the FEM
approach. One might choose basis functions as either piecewise constant functions or
Dirac delta functions. In both approaches, the approximations are defined on the entire
domain, but need not be continuous. Alternatively, one might define the function on a
discrete domain, with the result that the continuous differential operator no longer
makes sense, however this approach is not FEM.
171
There are reasons to consider the mathematical foundation of the finite element
approximation more sound, for instance, because the quality of the approximation
between grid points is poor in FDM.
The quality of a FEM approximation is often higher than in the corresponding FDM
approach, but this is extremely problem dependent and several examples to the
contrary can be provided.
Generally, FEM is the method of choice in all types of analysis in structural mechanics (i.e.
solving for deformation and stresses in solid bodies or dynamics of structures) while
computational fluid dynamics (CFD) tends to use FDM or other methods like finite volume
method (FVM). CFD problems usually require discretization of the problem into a large
number of cells/gridpoints (millions and more), therefore cost of the solution favors simpler,
lower order approximation within each cell. This is especially true for 'external flow'
problems, like air flow around the car or airplane, or weather simulation in a large area.
Various types of finite element methods
There are various types of finite element methods: Generalized finite element method, hpFEM, XFEM, Spectral methods, Meshfree methods, and Discontinuous Galerkin methods.
Generalized finite element method
The Generalized Finite Element Method (FEM) uses local spaces consisting of functions, not
necessarily polynomials, that reflect the available information on the unknown solution and
thus ensure good local approximation. Then a partition of unity is used to bond these spaces
together to form the approximating subspace. The effectiveness of GFEM has been shown
when applied to problems with domains having complicated boundaries, problems with
micro-scales, and problems with boundary layers.
hp-FEM
The hp-FEM combines adaptively elements with variable size h and polynomial degree p in
order to achieve exceptionally fast, exponential convergence rates.
172
where C is independent from N and s is no larger than the degree of the piecewise polynomial
basis. As we increase N, we can also increase the degree of the basis functions. In this case, if
u is an analytic function:
G-NI or SEM-NI: these are the most used spectral methods. The Galerkin formulation
of spectral methods or spectral element methods, for G-NI or SEM-NI respectively, is
modified and Gaussian numerical integration is used instead of integrals in the
definition of the bilinear form
and in the functional F. These method are a
family of PetrovGalerkin methods their convergence is a consequence of Strang's
lemma.
The spectral element method uses tensor product space spanned by nodal basis
functions associated with GaussLobatto points. In contrast, the p-version finite
element method spans a space of high order polynomials by nodeless basis functions,
chosen approximately orthogonal for numerical stability. Since not all interior basis
functions need to be present, the p-version finite element method can create a space
that that contains all polynomials up to a given degree with many fewer degrees of
173
freedom. However, some speedup techniques possible in spectral methods due to their
tensor-product character are no longer available. The name p-version means that
accuracy is increased by increasing the order of the approximating polynomials (thus,
p) rather than decreasing the mesh size, h.
The hp finite element method (hp-FEM) combines the advantages of the h and p
refinements to obtain extremely fast, exponential convergence rates.
Spectral method
Spectral methods are a class of techniques used in applied mathematics and scientific
computing to numerically solve certain partial differential equations (PDEs), often involving
the use of the Fast Fourier Transform. Where applicable, spectral methods have excellent
error properties, with the so called "exponential convergence" being the fastest possible.
PDEs describe a wide array of physical processes such as heat conduction, fluid flow, and
sound propagation. In many such equations, there are underlying "basic waves" that can be
used to give efficient algorithms for computing solutions to these PDEs. In a typical case,
spectral methods take advantage of this fact by writing the solution as its Fourier series,
substituting this series into the PDE to get a system of ordinary differential equations (ODEs)
in the time-dependent coefficients of the trigonometric terms in the series (written in complex
exponential form), and using a time-stepping method to solve those ODEs.
The spectral method and the finite element method are closely related and built on the same
ideas; the main difference between them is that the spectral method approximates the solution
as linear combination of continuous functions that are generally nonzero over the domain of
solution (usually sinusoids or Chebyshev polynomials), while the finite element method
approximates the solution as a linear combination of piecewise functions that are nonzero on
small subdomains. Because of this, the spectral method takes on a global approach while the
finite element method is a local approach. This is part of why the spectral method works best
when the solution is smooth. In fact there are no known three-dimensional single domain
spectral shock capturing results.
In the finite element community, a method where the degree of the elements is very high or
increases as the grid parameter h decreases to zero is sometimes called a spectral element
method.
The implementation of the spectral method is normally accomplished either with collocation
or a Galerkin approach.
A relationship with the spectral element method
One can show that if g is infinitely differentiable, then the numerical algorithm using Fast
Fourier Transforms will converge faster than any polynomial in the grid size h. That is, for
such that the error is less than Chn for all sufficiently small
any n > 0, there is a
values of h. We say that the spectral method is of order n, for every n > 0.
Because a spectral element method is a finite element method of very high order, there is a
similarity in the convergence properties. However, whereas the spectral method is based on
the eigendecomposition of the particular boundary value problem, the spectral element
174
method does not use that information and works for arbitrary elliptic boundary value
problems.
Meshfree methods
Meshfree methods are a particular class of numerical simulation algorithms for the simulation
of physical phenomena. Traditional simulation algorithms relied on a grid or a mesh,
meshfree methods in contrast use the geometry of the simulated object directly for
calculations. Meshfree methods exist for fluid dynamics as well as for solid mechanics. Some
methods are able to handle both cases.
Description
Meshfree methods eliminate some or all of the traditional mesh-based view of the
computational domain and rely on a particle (either Lagrangian or Eulerian) view of the field
problem.
A goal of meshfree methods is to facilitate the simulation of increasingly demanding
problems that require the ability to treat large deformations, advanced materials, complex
geometry, nonlinear material behavior, discontinuities and singularities. For example the
melting of a solid or the freezing process can be simulated using meshfree methods.
There is also an additional 'sales' oriented aspect of this name. Meshfree (or 'meshless' as this
is also used) methods seem attractive as alternative to finite element method (FEM) for the
general engineering community, which consider the process of generating finite element
meshes as more difficult and expensive than the remainder of analysis process.
History
One of the earlier methods without a mesh is smoothed particle hydrodynamics, presented in
1977.
List of methods and acronyms
The following numerical methods are generally considered to fall within the general class of
"meshfree" methods. Acronyms are provided in parentheses.
Related methods:
Moving least squares (MLS) - provide general approximation method for arbitrary set
of nodes
Partition of unity methods (PoUM) - provide general approximation formulation used
in some meshfree methods
Continuous blending method (enrichment and coupling of finite elements and
meshless methods)
eXtended FEM, Generalized FEM (XFEM, GFEM) - variants of FEM (finite element
method) combining some meshless aspects
Local Maximum-Entropy (LME)
Space-Time Meshfree Collocation Method (STMCM)
Mesh-free radial point interpolation method (RPIM)
judiciously chosen to preserve the accuracy of the solution. In the engineering practice in the
finite element method, continuity of solutions between non-matching subdomains is
implemented by multiple-point constraints.
Multigrid method
Multigrid (MG) methods in numerical analysis are a group of algorithms for solving
differential equations using a hierarchy of discretizations. The main idea of multigrid is to
accelerate the convergence of a base iterative method by correcting, from time to time, the
solution globally by solving a coarse problem. This idea is similar to extrapolation between
coarser and finer grids. The typical application for multigrid is in the numerical solution of
elliptic partial differential equations in two or more dimensions.
Multigrid methods can be applied in combination with any of the common discretization
techniques. In these cases, multigrid methods are among the fastest solution techniques
known today. In contrast to other methods, multigrid methods are general in that they can
treat arbitrary regions and boundary conditions. They do not depend on the separability of the
equations or other special properties of the equation. They are also directly applicable to
more-complicated non-symmetric and nonlinear systems of equations, like the Lam system
of elasticity or the Navier-Stokes equations.
Multigrid methods can be generalized in many different ways. They can be applied naturally
in a time-stepping solution of parabolic partial differential equations, or they can be applied
directly to time-dependent partial differential equations. Research on multilevel techniques for
hyperbolic partial differential equations is under way. Multigrid methods can also be applied
to integral equations, or for problems in statistical physics.
Other extensions of multigrid methods include techniques where no partial differential
equation nor geometrical problem background is used to construct the multilevel hierarchy.
Such algebraic multigrid methods (AMG) construct their hierarchy of operators directly from
the system matrix and thus become true black-box solvers for sparse matrices.
The finite element method may be recast as a multigrid method by choosing linear wavelets as
the basis.
Algorithm
There are many variations of multigrid algorithms, but the common features are that a
hierarchy of discretizations (grids) is considered. The important steps are:
Smoothing reducing high frequency errors, for example using a few iterations of the
GaussSeidel method.
Restriction downsampling the residual error to a coarser grid.
Prolongation interpolating a correction computed on a coarser grid into a finer grid.
Computational cost
This approach has the advantage over other methods that it often scales linearly with the
number of discrete nodes used. That is: It can solve these problems to a given accuracy in a
number of operations that is proportional to the number of unknowns.
177
Assume that one has a differential equation which can be solved approximately (with a given
accuracy) on a grid i with a given grid point density Ni. Assume furthermore that a solution on
any grid Ni may be obtained with a given effort Wi = KNi from a solution on a coarser grid i
+ 1. Here, = Nj + 1 / Nj < 1 is the ratio of grid points on "neighboring" grids and is assumed to
be constant throughout the grid hierarchy, and K is some constant modeling the effort of
computing the result for one grid point.
The following recurrence relation is then obtained for the effort of obtaining the solution on
grid k:
Wk = Wk + 1 + KNk
And in particular, we find for the finest grid N1 that
W1 = W2 + KN1
Combining these two expressions (and using Nk = k 1N1) gives
178
In 1996, two engineers used DG to solve the Stokes equations, which is 2nd order. This
motivated mathematicians to look deep in DG and the local discontinuous Galerkin method
was generated. By suitably choosing the numerical flux, DG has been extended to solve
higher order equations.
hp-FEM
The hp-FEM is a general version of the finite element method (FEM), a numerical method for
solving partial differential equations based on piecewise-polynomial approximations that
employs elements of variable size (h) and polynomial degree (p). The origins of hp-FEM date
179
back to the pioneering work of Ivo Babuska et al. who discovered that the finite element
method converges exponentially fast when the mesh is refined using a suitable combination of
h-refinements (dividing elements into smaller ones) and p-refinements (increasing their
polynomial degree). The exponential convergence makes the method a very attractive choice
compared to most other finite element methods which only converge with an algebraic rate.
The exponential convergence of the hp-FEM was not only predicted theoretically but also
observed by numerous independent researchers.
Differences from standard FEM
The hp-FEM differs from the standard (lowest-order) FEM in many aspects.
180
Efficiency
Smooth functions can be approximated much more efficiently using large high-order elements
than small piecewise-linear ones. This is illustrated in the figure below, where a 1D Poisson
equation with zero Dirichlet boundary conditions is solved on two different meshes. The exact
solution is the sin function.
181
hp-adaptivity
Some FEM sites describe hp-adaptivity as a combination of h-adaptivity (splitting elements in
space while keeping their polynomial degree fixed) and p-adaptivity (only increasing their
polynomial degree). This is not entirely accurate. The hp-adaptivity is significantly different
from both h- and p-adaptivity since the hp-refinement of an element can be done in many
different ways. Besides a p-refinement, the element can be subdivided in space (as in hadaptivity), but there are many combinations for the polynomial degrees on the subelements.
This is illustrated in the figure on the right. For example, if a triangular or quadrilateral
element is subdivided into four subelements where the polynomial degrees are allowed to
vary by at most two, then this yields 3^4 = 81 refinement candidates (not considering
polynomially anisotropic candidates). Analogously, splitting a hexahedron into eight
subelements and varying their polynomial degrees by at most two yields 3^8 = 6,561
refinement candidates. Clearly, standard FEM error estimates providing one constant number
per element are not enough to guide automatic hp-adaptivity.
182