Module II: Solutions of Nonlinear Equations
Amlan Dutta∗
Department of Metallurgical and Materials Engineering,
Indian Institute of Technology Kharagpur, West Bengal 721302, India
(Dated: August 13, 2025)
1
Abstract
A general nonlinear equation may be expressed as, f (x) = 0, and its solution reduces to finding
the roots of the nonlinear function, f . There are numerous situations, where analytical solutions are
not feasible. A typical example is finding roots of a polynomial of degree exceeding four. In such
situations, we rely on various numerical algorithms to find the solutions to nonlinear problems.
Usually, such solutions are not exact but are found to a certain degree of numerical accuracy.
However, they are invaluable tools in solving many practical scientific and technical problems.
The present module introduces the students to a several commonly used numerical schemes for
solving the nonlinear equations. While most of them are demonstrated for single-variable problems,
methods of solving multi-variable equations are also discussed at the end.
1. BISECTION METHOD
This is a simple yet powerful method of numerically solving generic nonlinear equations.
The foundation of this technique is the so-called Bolzano’s theorem, which in turn, is a
special case of the intermediate-value theorem. According to Bolzano’s theorem:
if f is a continuous function in [a, b] such that f (a)f (b) < 0, there exists at least one
zero of f in [a, b].
The bisection method has two main steps; initial bracketing, followed by iterative narrow-
ing. We begin by choosing an initial interval, [a0 , b0 ], such that it is guaranteed to contain
at least one root (bracketing). Subsequently, we iteratively narrow down this bracketing
interval to arbitrary precision such that the gradually narrowing interval always contains at
least one of the roots. The following algorithm implements this method:
(1) Initialization:
We have an initial interval [a, b] such that f (a)f (b) < 0. We set,
a(0) = a; b(0) = b;
a0 +b0
I (0) = [a0 , b0 ]; x(0) = 2
.
∗
[Link]@[Link]
2
(2) Iterative narrowing:
At each iteration k ≥ 1, we select the sub-interval, I (k) ≡ [a(k) , b(k) ] ⊂ I (k−1) as
follows:
if f (x(k−1) ) = 0, then α ← x(k−1) ; terminate
else
if f (a(k−1) )f (x(k−1) ) < 0
set a(k) ← a(k−1) ; b(k) ← x(k−1)
if f (x(k−1) )f (b(k−1) ) < 0
set b(k) ← b(k−1) ; a(k) ← x(k−1)
a(k) +b(k)
set x(k) ← 2
; k ←k+1
Figure 1 demonstrates the working of the bisection method.
y
f(x)
I( 3 )
a( 0 ) x( 0 ) x( 2 ) x
x( 1 ) b( 0 )
I( 2 )
I( 1 )
I( 0 )
FIG. 1. An example demonstrating the first few iterations of the bisection method.
1.1. Convergence of bisection method
Initial interval size of the bracket is, I (0) = b(0) − a(0) = b − a. In each iteration, this
interval size reduces to half of its previous value. Therefore, at k th iteration,
3
b−a
I (k) = , (1)
2k
and the corresponding error is,
1
e(k) = x(k) − α < I (k) , (2)
2
i.e.,
b−a
e(k) < . (3)
2k+1
If at least kmin iterations are required to attain a tolerance of ϵ, we have,
b−a
e(kmin ) < ϵ =⇒ ϵ > k +1 (4)
2min
b−a
=⇒ kmin > log2 − 1. (5)
ϵ
In the expression given above, kmin is the smallest integer satisfying the inequality.
1.2. Pros and cons of the bisection method
Pros
• Rate of convergence is independent of the function.
• Required number of iterations are known beforehand.
Cons
• Reduction in error is non-monotonic.
• Inherent nature of the function is ignored. It may take several steps to converge even
for a linear function.
4
2. NEWTON-RAPHSON METHOD
Consider a function, f (x), and a point (x(k) , f (x(k) )) on the x vs. f (x) curve. Also,
consider a tangent at this point, whose equation is given by,
y(x) − f (x(k) )
(k)
= f ′ (x(k) ), (6)
x−x
where (x, y) is a point on the tangent line.
∴ y(x) = f (x(k) ) + f ′ (x(k) )(x − x(k) ). (7)
If we assume that in the next iteration, the point x(k+1) is set to where the tangent line
intersepts the x-axis, setting y(x(k+1) ) = 0 and x = x(k+1) yields from Eq. (7),
0 = f (x(k) ) + f ′ (x(k) )(x(k+1) − x(k) ) (8)
f (x(k) )
=⇒ x(k+1) = x(k) − ′ (k) , ∀k ≥ 0. (9)
f (x )
Iterative implementation of the scheme given above causes x to gradually converge to-
wards a root, α. A few iterations of the Newton-Raphson method is demonstrated in Fig. 2.
If with successive iterations, the gap between x(k) and x(k+1) reduces gradually, it leads
to the convergence of x to α. This can be verified from Eq. (8), which can be rewritten as,
f(x)
x( 2 ) x
x( 3 ) x( 1 ) x( 0 )
FIG. 2. An example demonstrating the first few iterations of the Newton-Raphson method.
5
f (x(k) ) + f ′ (x(k) )(δ (k) ) = 0, (10)
where δ (k) ≡ x(k+1) − x(k) . Now by expanding f in the neighborhood of x(k) , we get,
2
f (x(k+1) ) = f (x(k) + δ (k) ) = f (x(k) ) + δ (k) f ′ (x(k) ) + O(δ (k) ). (11)
On account of Eq. (10), Eq. (11) becomes,
2
f (x(k+1) ) = O(δ (k) ). (12)
As x(k) → x(k+1) with k → ∞, δ (k) → 0.
∴ f (x(k+1) ) → 0 =⇒ x(k+1) → α.
2.1. Convergence of Newton-Raphson method
The Newton-Raphson algorithm shows quadratic convergence, i.e., the error at a given
step is proportional to the square of error in the previous step for sufficiently large number
of iterations. More exactly, it can be shown that,
x(k+1) − α f ′′ (α)
lim 2 = . (13)
k→∞ (x(k) − α) 2f ′ (α)
In the case of zeros with multiplicity (m) larger than 1, i.e., if f ′ (α) = f ′′ (α) = ... =
f m−1 (α) = 0, the scheme given in Eq. (9) still converges, but only if x(0) is properly chosen
and f ′ (x) ̸= 0, ∀x ∈ I(α)\{α}. Moreover, the order of convergence degrades to linear.
However, we can restore the quadratic convergence if the scheme of Eq. (9) is replaced with,
f (x(k) )
x(k+1) = x(k) − m , ∀k ≥ 0. (14)
f ′ (x(k) )
2.2. Pros and cons of the Newton-Raphson method
Pros
• Unlike the bisection method, only one initial value is required.
• A linear function would converge in a single step.
6
Cons
• Convergence is not guaranteed for any arbitrary initialization, x(0) , but only for those
points, which are sufficiently close to α. However, if α ∈ [a, b] such that f ∈ C 2 [a, b],
there must exist a δ > 0 so that NR would converge if x(0) ∈ [α − δ, α + δ].
• The method can show oscillation without convergence (explained in class).
3. SECANT METHOD
The secant method is closely related to the Newton-Raphson method. We begin with
Eq. (9) but with the condition that the analytical form of f (x) is not available, and we
f (x+δ)−f (x)
approximate the derivative as, f ′ (x) ≈ δ
. Here the choice of δ is ambiguous. The
secant method offers the choice in an adaptive manner,
f (x(k) ) − f (x(k−1) )
f ′ (x(k) ) ≈ . (15)
x(k) − x(k−1)
Using the above in the NR scheme (Eq. (9)) we obtain,
x(k) − x(k−1)
(k+1) (k) (k)
x =x − f (x ) , ∀k ≥ 1. (16)
f (x(k) ) − f (x(k−1) )
3.1. Convergence of secant method
If α is a simple zero of f and I(α) is a suitable neighborhood of the root, α; if furthermore
x(0) and x(1) are sufficiently close to α and f ′ (x) ̸= 0, ∀x ∈ I(α), ∃C > 0 : x(k+1) − α ≤
√
p 1+ 5
C x(k) − α , where p = 2
≈ 1.618.
This indicates the superlinear convergence of the secant method. If α has a multiplicity
exceeding 1, the secant method’s convergence becomes linear similar to the case of the
Newton-Raphson scheme.
4. CRITERIA FOR TERMINATION
In a gradient based method, like Newton-Raphson, selecting a suitable termination
criterion is a crucial aspect. Ideally, we would want that for the required tolerance, ϵ,
7
α − x(kmin ) < ϵ. However, the actual root, α, is not known to us and some other criterion
has to be devised.
One commonly used test is to consider the gap between two consecutive iterations and
compare it to a given tolerance limit, i.e.,
x(kmin ) − x(kmin −1) < ϵ.
This test is satisfactory if α is a simple zero. Alternatively, one could also test the residue
at step, k, which is ideally 0 at x = α,
r(kmin ) ≡ f (x(kmin ) ) < ϵ.
However, this test is useful only if |f ′ (x)| ≈ 1 in the neighborhood of α. Otherwise, it may
lead to an overestimation of error if |f ′ (x)| ≫ 1 and an underestimation if |f ′ (x)| ≪ 1 in
the neighborhood of the root.
y y
(a) (b)
|f(x(k))|
|f’(x)|>>1 |f’(x)|<<1
|f(x(k))|
α
x(k) x α
x(k) x
e(k) e(k)
FIG. 3. (a) Overestimation and (b) underestimation of error based on residue computation.
5. SYSTEM OF NONLINEAR EQUATIONS
Consider the following system of multiple nonlinear equations in several variables.
8
f1 (x1 , x2 , ..., xn ) = 0;
f2 (x1 , x2 , ..., xn ) = 0;
.. (17)
.
fn (x1 , x2 , ..., xn ) = 0;
Equation (17) may be concisely expressed as, f¯(x̄) = 0̄, where
f1 x1
f2 x2
f¯ =
.. , x̄ = .. ,
. .
fn xn
and
0
0
.. ,
0̄ =
.
0
5.1. Solution by Newton-Raphson
We define the Jacobina matrix as,
∂fi
Jij = ,
∂xj
and then solve iteratively (full derivation shown in class),
J(x̄(k) ) δx̄(k) = −f¯(x̄(k) ),
Solve,
set, x̄(k+1) ← x̄(k) + δx̄(k) .
Just as in the single-variable NR method we start with the initial point, x(0) , here we start
with the initial vector, x̄(0) .
9
5.2. Broyden’s method
This is a multi-variable counterpart of the secant method. Here the Jacobian matrix,
J(x̄(k) ) is replaced with another matrix, Bk , which evolves iteratively starting from an initial
choice, B0 . The algorithm is as follows:
Solve, [Bk ] δx̄(k) = −f¯(x̄(k) ),
set, x̄(k+1) ← x̄(k) + δx̄(k) ,
T
f¯(x̄(k+1) )δ x̄(k)
set, [Bk+1 ] ← [Bk ] + T .
δ x̄(k) δ x̄(k)
It is interesting to note that although x̄(k) converges towards the root, ᾱ, with iterations, k,
Bk is not required to essentially converge to the true Jacobian, J(x̄(k) ).
10