What's New
About the Project
NIST
3 Numerical MethodsAreas

§3.8 Nonlinear Equations

Contents

§3.8(i) Introduction

The equation to be solved is

3.8.1 f(z)=0,

where z is a real or complex variable and the function f is nonlinear. Solutions are called roots of the equation, or zeros of f. If f(z0)=0 and f(z0)0, then z0 is a simple zero of f. If f(z0)=f(z0)==f(m-1)(z0)=0 and f(m)(z0)0, then z0 is a zero of f of multiplicity m; compare §1.10(i).

Sometimes the equation takes the form

3.8.2 z=ϕ(z),

and the solutions are called fixed points of ϕ.

Equations (3.8.1) and (3.8.2) are usually solved by iterative methods. Let z1,z2, be a sequence of approximations to a root, or fixed point, ζ. If

3.8.3 |zn+1-ζ|<A|zn-ζ|p

for all n sufficiently large, where A and p are independent of n, then the sequence is said to have convergence of the pth order. (More precisely, p is the largest of the possible set of indices for (3.8.3).) If p=1 and A<1, then the convergence is said to be linear or geometric. If p=2, then the convergence is quadratic; if p=3, then the convergence is cubic, and so on.

An iterative method converges locally to a solution ζ if there exists a neighborhood N of ζ such that znζ whenever the initial approximation z0 lies within N.

§3.8(ii) Newton’s Rule

This is an iterative method for real twice-continuously differentiable, or complex analytic, functions:

3.8.4 zn+1=zn-f(zn)f(zn),
n=0,1,.

If ζ is a simple zero, then the iteration converges locally and quadratically. For multiple zeros the convergence is linear, but if the multiplicity m is known then quadratic convergence can be restored by multiplying the ratio f(zn)/f(zn) in (3.8.4) by m.

For real functions f(x) the sequence of approximations to a real zero ξ will always converge (and converge quadratically) if either:

  • (a)

    f(x0)f′′(x0)>0 and f(x), f′′(x) do not change sign between x0 and ξ (monotonic convergence).

  • (b)

    f(x0)f′′(x0)<0, f(x), f′′(x) do not change sign in the interval (x0,x1), and ξ[x0,x1] (monotonic convergence after the first iteration).

Example

f(x)=x-tanx. The first positive zero of f(x) lies in the interval (π,32π); see Figure 4.15.3. From this graph we estimate an initial value x0=4.65. Newton’s rule is given by

3.8.5 xn+1 =ϕ(xn),
ϕ(x) =x+xcot2x-cotx.

Results appear in Table 3.8.1. The choice of x0 here is critical. When x04.2875 or x04.7125, Newton’s rule does not converge to the required zero. The convergence is faster when we use instead the function f(x)=xcosx-sinx; in addition, the successful interval for the starting value x0 is larger.

Table 3.8.1: Newton’s rule for x-tanx=0.
n xn
0 4.65000 00000 000
1 4.60567 66065 900
2 4.55140 53475 751
3 4.50903 76975 617
4 4.49455 61600 185
5 4.49341 56569 391
6 4.49340 94580 903
7 4.49340 94579 091
8 4.49340 94579 091

§3.8(iii) Other Methods

Bisection Method

If f(a)f(b)<0 with a<b, then the interval [a,b] contains one or more zeros of f. Bisection of this interval is used to decide where at least one zero is located. All zeros of f in the original interval [a,b] can be computed to any predetermined accuracy. Convergence is slow however; see Kaufman and Lenker (1986) and Nievergelt (1995).

Regula Falsi

Let x0 and x1 be such that f0=f(x0) and f1=f(x1) have opposite signs. Inverse linear interpolation (§3.3(v)) is used to obtain the first approximation:

3.8.6 x2=x1-x1-x0f1-f0f1=f1x0-f0x1f1-f0.

We continue with x2 and either x0 or x1, depending which of f0 and f1 is of opposite sign to f(x2), and so on. The convergence is linear, and again more than one zero may occur in the original interval [x0,x1].

Secant Method

Whether or not f0 and f1 have opposite signs, x2 is computed as in (3.8.6). If the wanted zero ξ is simple, then the method converges locally with order of convergence p=12(1+5)=1.618. Because the method requires only one function evaluation per iteration, its numerical efficiency is ultimately higher than that of Newton’s method. There is no guaranteed convergence: the first approximation x2 may be outside [x0,x1].

Steffensen’s Method

This iterative method for solving z=ϕ(z) is given by

3.8.7 zn+1=zn-(ϕ(zn)-zn)2ϕ(ϕ(zn))-2ϕ(zn)+zn,
n=0,1,2,.

It converges locally and quadratically for both and .

For other efficient derivative-free methods, see Le (1985).

Eigenvalue Methods

For the computation of zeros of orthogonal polynomials as eigenvalues of finite tridiagonal matrices (§3.5(vi)), see Gil et al. (2007a, pp. 205–207). For the computation of zeros of Bessel functions, Coulomb functions, and conical functions as eigenvalues of finite parts of infinite tridiagonal matrices, see Grad and Zakrajšek (1973), Ikebe (1975), Ikebe et al. (1991), Ball (2000), and Gil et al. (2007a, pp. 205–213).

§3.8(iv) Zeros of Polynomials

The polynomial

3.8.8 p(z)=anzn+an-1zn-1++a0,
an0,

has n zeros in , counting each zero according to its multiplicity. Explicit formulas for the zeros are available if n4; see §§1.11(iii) and 4.43. No explicit general formulas exist when n5.

After a zero ζ has been computed, the factor z-ζ is factored out of p(z) as a by-product of Horner’s scheme (§1.11(i)) for the computation of p(ζ). In this way polynomials of successively lower degree can be used to find the remaining zeros. (This process is called deflation.) However, to guard against the accumulation of rounding errors, a final iteration for each zero should also be performed on the original polynomial p(z).

Example

p(z)=z4-1. The zeros are ±1 and ±i. Newton’s method is given by

3.8.9 zn+1 =ϕ(zn),
ϕ(z) =3z4+14z3.

The results for z0=1.5 are given in Table 3.8.2.

Table 3.8.2: Newton’s rule for z4-1=0.
n zn
0 1.50000 00000 000
1 1.19907 40740 741
2 1.04431 68969 414
3 1.00274 20038 676
4 1.00001 12265 490
5 1.00000 00001 891
6 1.00000 00000 000

As in the case of Table 3.8.1 the quadratic nature of convergence is clearly evident: as the zero is approached, the number of correct decimal places doubles at each iteration.

Newton’s rule can also be used for complex zeros of p(z). However, when the coefficients are all real, complex arithmetic can be avoided by the following iterative process.

Bairstow’s Method

Let z2-sz-t be an approximation to the real quadratic factor of p(z) that corresponds to a pair of conjugate complex zeros or to a pair of real zeros. We construct sequences qj and rj, j=n+1,n,,0, from qn+1=rn+1=0, qn=rn=an, and for jn-1,

3.8.10 qj =aj+sqj+1+tqj+2,
rj =qj+srj+1+trj+2.

Then the next approximation to the quadratic factor is z2-(s+Δs)z-(t+Δt), where

3.8.11 Δs =r3q0-r2q1r22-r3,
Δt =q1-r2q0r22-r3,
=sr2+tr3.

The method converges locally and quadratically, except when the wanted quadratic factor is a multiple factor of q(z). On the last iteration qnzn-2+qn-1zn-3++q2 is the quotient on dividing p(z) by z2-sz-t.

Example

p(z)=z4-2z2+1. With the starting values s0=74, t0=-12, an approximation to the quadratic factor z2-2z+1=(z-1)2 is computed (s=2, t=-1). Table 3.8.3 gives the successive values of s and t. The quadratic nature of the convergence is evident.

Table 3.8.3: Bairstow’s method for factoring z4-2z2+1.
n sn tn
0 1.75000 00000 000 -0.50000 00000 000
1 2.13527 29454 109 -1.21235 75284 943
2 2.01786 10488 956 -1.02528 61401 539
3 2.00036 06329 466 -1.00047 63067 522
4 2.00000 01474 803 -1.00000 01858 298
5 2.00000 00000 000 -1.00000 00000 000

This example illustrates the fact that the method succeeds even if the two zeros of the wanted quadratic factor are real and the same.

For further information on the computation of zeros of polynomials see McNamee (2007).

§3.8(v) Zeros of Analytic Functions

Newton’s rule is the most frequently used iterative process for accurate computation of real or complex zeros of analytic functions f(z). Another iterative method is Halley’s rule:

3.8.12 zn+1=zn-f(zn)f(zn)-(f′′(zn)f(zn)/(2f(zn))).

This is useful when f(z) satisfies a second-order linear differential equation because of the ease of computing f′′(zn). The rule converges locally and is cubically convergent.

Initial approximations to the zeros can often be found from asymptotic or other approximations to f(z), or by application of the phase principle or Rouché’s theorem; see §1.10(iv). These results are also useful in ensuring that no zeros are overlooked when the complex plane is being searched.

For an example involving the Airy functions, see Fabijonas and Olver (1999).

For fixed-point methods for computing zeros of special functions, see Segura (2002), Gil and Segura (2003), and Gil et al. (2007a, Chapter 7).

§3.8(vi) Conditioning of Zeros

Suppose f(z) also depends on a parameter α, denoted by f(z,α). Then the sensitivity of a simple zero z to changes in α is given by

3.8.13 dzdα=-fα/fz.

Thus if f is the polynomial (3.8.8) and α is the coefficient aj, say, then

3.8.14 dzdaj=-zjf(z).

For moderate or large values of n it is not uncommon for the magnitude of the right-hand side of (3.8.14) to be very large compared with unity, signifying that the computation of zeros of polynomials is often an ill-posed problem.

Example. Wilkinson’s Polynomial

The zeros of

3.8.15 p(x)=(x-1)(x-2)(x-20)

are well separated but extremely ill-conditioned. Consider x=20 and j=19. We have p(20)=19! and a19=1+2++20=210. The perturbation factor (3.8.14) is given by

3.8.16 dxda19=-201919!=(-4.30)×107.

Corresponding numerical factors in this example for other zeros and other values of j are obtained in Gautschi (1984, §4).

§3.8(vii) Systems of Nonlinear Equations

For fixed-point iterations and Newton’s method for solving systems of nonlinear equations, see Gautschi (1997a, Chapter 4, §9) and Ortega and Rheinboldt (1970).

§3.8(viii) Fixed-Point Iterations: Fractals

The convergence of iterative methods

3.8.17 zn+1=ϕ(zn),
n=0,1,,

for solving fixed-point problems (3.8.2) cannot always be predicted, especially in the complex plane.

Consider, for example, (3.8.9). Starting this iteration in the neighborhood of one of the four zeros ±1,±i, sequences {zn} are generated that converge to these zeros. For an arbitrary starting point z0, convergence cannot be predicted, and the boundary of the set of points z0 that generate a sequence converging to a particular zero has a very complicated structure. It is called a Julia set. In general the Julia set of an analytic function f(z) is a fractal, that is, a set that is self-similar. See Julia (1918) and Devaney (1986).