# §3.8 Nonlinear Equations

## §3.8(i) Introduction

The equation to be solved is

 3.8.1 $f(z)=0,$ ⓘ Referenced by: §3.8(i) Permalink: http://dlmf.nist.gov/3.8.E1 Encodings: TeX, pMML, png See also: Annotations for §3.8(i), §3.8 and Ch.3

where $z$ is a real or complex variable and the function $f$ is nonlinear. Solutions are called roots of the equation, or zeros of $f$. If $f(z_{0})=0$ and $f^{\prime}(z_{0})\neq 0$, then $z_{0}$ is a simple zero of $f$. If $f(z_{0})=f^{\prime}(z_{0})=\cdots=f^{(m-1)}(z_{0})=0$ and $f^{(m)}(z_{0})\neq 0$, then $z_{0}$ is a zero of $f$ of multiplicity $m$; compare §1.10(i).

Sometimes the equation takes the form

 3.8.2 $z=\phi(z),$ ⓘ Referenced by: §3.8(i), §3.8(viii) Permalink: http://dlmf.nist.gov/3.8.E2 Encodings: TeX, pMML, png See also: Annotations for §3.8(i), §3.8 and Ch.3

and the solutions are called fixed points of $\phi$.

Equations (3.8.1) and (3.8.2) are usually solved by iterative methods. Let $z_{1},z_{2},\dots$ be a sequence of approximations to a root, or fixed point, $\zeta$. If

 3.8.3 $\left|z_{n+1}-\zeta\right| ⓘ Symbols: $\zeta$: fixed point A&S Ref: 3.9.2 Referenced by: §3.8(i) Permalink: http://dlmf.nist.gov/3.8.E3 Encodings: TeX, pMML, png See also: Annotations for §3.8(i), §3.8 and Ch.3

for all $n$ sufficiently large, where $A$ and $p$ are independent of $n$, then the sequence is said to have convergence of the $p$th order. (More precisely, $p$ is the largest of the possible set of indices for (3.8.3).) If $p=1$ and $A<1$, then the convergence is said to be linear or geometric. If $p=2$, then the convergence is quadratic; if $p=3$, then the convergence is cubic, and so on.

An iterative method converges locally to a solution $\zeta$ if there exists a neighborhood $N$ of $\zeta$ such that $z_{n}\to\zeta$ whenever the initial approximation $z_{0}$ lies within $N$.

## §3.8(ii) Newton’s Rule

This is an iterative method for real twice-continuously differentiable, or complex analytic, functions:

 3.8.4 $z_{n+1}=z_{n}-\frac{f(z_{n})}{f^{\prime}(z_{n})},$ $n=0,1,\dots$. ⓘ A&S Ref: 3.9.5 Referenced by: §3.8(ii) Permalink: http://dlmf.nist.gov/3.8.E4 Encodings: TeX, pMML, png See also: Annotations for §3.8(ii), §3.8 and Ch.3

If $\zeta$ is a simple zero, then the iteration converges locally and quadratically. For multiple zeros the convergence is linear, but if the multiplicity $m$ is known then quadratic convergence can be restored by multiplying the ratio $f(z_{n})/f^{\prime}(z_{n})$ in (3.8.4) by $m$.

For real functions $f(x)$ the sequence of approximations to a real zero $\xi$ will always converge (and converge quadratically) if either:

• (a)

$f(x_{0})f^{\prime\prime}(x_{0})>0$ and $f^{\prime}(x)$, $f^{\prime\prime}(x)$ do not change sign between $x_{0}$ and $\xi$ (monotonic convergence).

• (b)

$f(x_{0})f^{\prime\prime}(x_{0})<0$, $f^{\prime}(x)$, $f^{\prime\prime}(x)$ do not change sign in the interval $(x_{0},x_{1})$, and $\xi\in[x_{0},x_{1}]$ (monotonic convergence after the first iteration).

### Example

$f(x)=x-\tan x$. The first positive zero of $f(x)$ lies in the interval $(\pi,\frac{3}{2}\pi)$; see Figure 4.15.3. From this graph we estimate an initial value $x_{0}=4.65$. Newton’s rule is given by

 3.8.5 $\displaystyle x_{n+1}$ $\displaystyle=\phi(x_{n}),$ $\displaystyle\phi(x)$ $\displaystyle=x+x{\cot}^{2}x-\cot x.$ ⓘ Symbols: $\cot\NVar{z}$: cotangent function Permalink: http://dlmf.nist.gov/3.8.E5 Encodings: TeX, TeX, pMML, pMML, png, png See also: Annotations for §3.8(ii), §3.8(ii), §3.8 and Ch.3

Results appear in Table 3.8.1. The choice of $x_{0}$ here is critical. When $x_{0}\leq 4.2875$ or $x_{0}\geq 4.7125$, Newton’s rule does not converge to the required zero. The convergence is faster when we use instead the function $f(x)=x\cos x-\sin x$; in addition, the successful interval for the starting value $x_{0}$ is larger.

## §3.8(iii) Other Methods

### Bisection Method

If $f(a)f(b)<0$ with $a, then the interval $[a,b]$ contains one or more zeros of $f$. Bisection of this interval is used to decide where at least one zero is located. All zeros of $f$ in the original interval $[a,b]$ can be computed to any predetermined accuracy. Convergence is slow however; see Kaufman and Lenker (1986) and Nievergelt (1995).

### Regula Falsi

Let $x_{0}$ and $x_{1}$ be such that $f_{0}=f(x_{0})$ and $f_{1}=f(x_{1})$ have opposite signs. Inverse linear interpolation (§3.3(v)) is used to obtain the first approximation:

 3.8.6 $x_{2}=x_{1}-\frac{x_{1}-x_{0}}{f_{1}-f_{0}}f_{1}=\frac{f_{1}x_{0}-f_{0}x_{1}}{% f_{1}-f_{0}}.$ ⓘ A&S Ref: 3.9.3 Referenced by: §3.8(iii) Permalink: http://dlmf.nist.gov/3.8.E6 Encodings: TeX, pMML, png See also: Annotations for §3.8(iii), §3.8(iii), §3.8 and Ch.3

We continue with $x_{2}$ and either $x_{0}$ or $x_{1}$, depending which of $f_{0}$ and $f_{1}$ is of opposite sign to $f(x_{2})$, and so on. The convergence is linear, and again more than one zero may occur in the original interval $[x_{0},x_{1}]$.

### Secant Method

Whether or not $f_{0}$ and $f_{1}$ have opposite signs, $x_{2}$ is computed as in (3.8.6). If the wanted zero $\xi$ is simple, then the method converges locally with order of convergence $p=\frac{1}{2}(1+\sqrt{5})=1.618\ldots\,$. Because the method requires only one function evaluation per iteration, its numerical efficiency is ultimately higher than that of Newton’s method. There is no guaranteed convergence: the first approximation $x_{2}$ may be outside $[x_{0},x_{1}]$.

### Steffensen’s Method

This iterative method for solving $z=\phi(z)$ is given by

 3.8.7 $z_{n+1}=z_{n}-\frac{(\phi(z_{n})-z_{n})^{2}}{\phi(\phi(z_{n}))-2\phi(z_{n})+z_% {n}},$ $n=0,1,2,\dots$. ⓘ Permalink: http://dlmf.nist.gov/3.8.E7 Encodings: TeX, pMML, png See also: Annotations for §3.8(iii), §3.8(iii), §3.8 and Ch.3

It converges locally and quadratically for both $\mathbb{R}$ and $\mathbb{C}$.

For other efficient derivative-free methods, see Le (1985).

### Eigenvalue Methods

For the computation of zeros of orthogonal polynomials as eigenvalues of finite tridiagonal matrices (§3.5(vi)), see Gil et al. (2007a, pp. 205–207). For the computation of zeros of Bessel functions, Coulomb functions, and conical functions as eigenvalues of finite parts of infinite tridiagonal matrices, see Grad and Zakrajšek (1973), Ikebe (1975), Ikebe et al. (1991), Ball (2000), and Gil et al. (2007a, pp. 205–213).

## §3.8(iv) Zeros of Polynomials

The polynomial

 3.8.8 $p(z)=a_{n}z^{n}+a_{n-1}z^{n-1}+\dots+a_{0},$ $a_{n}\neq 0$, ⓘ Defines: $p(z)$: polynomial (locally) Referenced by: §3.8(vi) Permalink: http://dlmf.nist.gov/3.8.E8 Encodings: TeX, pMML, png See also: Annotations for §3.8(iv), §3.8 and Ch.3

has $n$ zeros in $\mathbb{C}$, counting each zero according to its multiplicity. Explicit formulas for the zeros are available if $n\leq 4$; see §§1.11(iii) and 4.43. No explicit general formulas exist when $n\geq 5$.

After a zero $\zeta$ has been computed, the factor $z-\zeta$ is factored out of $p(z)$ as a by-product of Horner’s scheme (§1.11(i)) for the computation of $p(\zeta)$. In this way polynomials of successively lower degree can be used to find the remaining zeros. (This process is called deflation.) However, to guard against the accumulation of rounding errors, a final iteration for each zero should also be performed on the original polynomial $p(z)$.

### Example

$p(z)=z^{4}-1$. The zeros are $\pm 1$ and $\pm\mathrm{i}$. Newton’s method is given by

 3.8.9 $\displaystyle z_{n+1}$ $\displaystyle=\phi(z_{n}),$ $\displaystyle\phi(z)$ $\displaystyle=\frac{3z^{4}+1}{4z^{3}}.$ ⓘ A&S Ref: 3.9.6 (in different form) Referenced by: §3.8(viii) Permalink: http://dlmf.nist.gov/3.8.E9 Encodings: TeX, TeX, pMML, pMML, png, png See also: Annotations for §3.8(iv), §3.8(iv), §3.8 and Ch.3

The results for $z_{0}=1.5$ are given in Table 3.8.2.

As in the case of Table 3.8.1 the quadratic nature of convergence is clearly evident: as the zero is approached, the number of correct decimal places doubles at each iteration.

Newton’s rule can also be used for complex zeros of $p(z)$. However, when the coefficients are all real, complex arithmetic can be avoided by the following iterative process.

### Bairstow’s Method

Let $z^{2}-sz-t$ be an approximation to the real quadratic factor of $p(z)$ that corresponds to a pair of conjugate complex zeros or to a pair of real zeros. We construct sequences $q_{j}$ and $r_{j}$, $j=n+1,n,\dots,0$, from $q_{n+1}=r_{n+1}=0$, $q_{n}=r_{n}=a_{n}$, and for $j\leq n-1$,

 3.8.10 $\displaystyle q_{j}$ $\displaystyle=a_{j}+sq_{j+1}+tq_{j+2},$ $\displaystyle r_{j}$ $\displaystyle=q_{j}+sr_{j+1}+tr_{j+2}.$ ⓘ Defines: $r_{j}$: sequence (locally) Symbols: $q(x)$: real-valued function Permalink: http://dlmf.nist.gov/3.8.E10 Encodings: TeX, TeX, pMML, pMML, png, png See also: Annotations for §3.8(iv), §3.8(iv), §3.8 and Ch.3

Then the next approximation to the quadratic factor is $z^{2}-(s+\Delta s)z-(t+\Delta t)$, where

 3.8.11 $\displaystyle\Delta s$ $\displaystyle=\frac{r_{3}q_{0}-r_{2}q_{1}}{r_{2}^{2}-\ell r_{3}},$ $\displaystyle\Delta t$ $\displaystyle=\frac{\ell q_{1}-r_{2}q_{0}}{r_{2}^{2}-\ell r_{3}},$ $\displaystyle\ell$ $\displaystyle=sr_{2}+tr_{3}.$ ⓘ Symbols: $r_{j}$: sequence and $q(x)$: real-valued function Permalink: http://dlmf.nist.gov/3.8.E11 Encodings: TeX, TeX, TeX, pMML, pMML, pMML, png, png, png See also: Annotations for §3.8(iv), §3.8(iv), §3.8 and Ch.3

The method converges locally and quadratically, except when the wanted quadratic factor is a multiple factor of $q(z)$. On the last iteration $q_{n}z^{n-2}+q_{n-1}z^{n-3}+\dots+q_{2}$ is the quotient on dividing $p(z)$ by $z^{2}-sz-t$.

### Example

$p(z)=z^{4}-2z^{2}+1$. With the starting values $s_{0}=\frac{7}{4}$, $t_{0}=-\frac{1}{2}$, an approximation to the quadratic factor $z^{2}-2z+1=(z-1)^{2}$ is computed ($s=2$, $t=-1$). Table 3.8.3 gives the successive values of $s$ and $t$. The quadratic nature of the convergence is evident.

This example illustrates the fact that the method succeeds even if the two zeros of the wanted quadratic factor are real and the same.

For further information on the computation of zeros of polynomials see McNamee (2007).

## §3.8(v) Zeros of Analytic Functions

Newton’s rule is the most frequently used iterative process for accurate computation of real or complex zeros of analytic functions $f(z)$. Another iterative method is Halley’s rule:

 3.8.12 $z_{n+1}=z_{n}-\frac{f(z_{n})}{f^{\prime}(z_{n})-(f^{\prime\prime}(z_{n})f(z_{n% })/(2f^{\prime}(z_{n})))}.$ ⓘ Permalink: http://dlmf.nist.gov/3.8.E12 Encodings: TeX, pMML, png See also: Annotations for §3.8(v), §3.8 and Ch.3

This is useful when $f(z)$ satisfies a second-order linear differential equation because of the ease of computing $f^{\prime\prime}(z_{n})$. The rule converges locally and is cubically convergent.

Initial approximations to the zeros can often be found from asymptotic or other approximations to $f(z)$, or by application of the phase principle or Rouché’s theorem; see §1.10(iv). These results are also useful in ensuring that no zeros are overlooked when the complex plane is being searched.

For an example involving the Airy functions, see Fabijonas and Olver (1999).

For fixed-point methods for computing zeros of special functions, see Segura (2002), Gil and Segura (2003), and Gil et al. (2007a, Chapter 7). For describing the distribution of complex zeros of solutions of linear homogeneous second-order differential equations by methods based on the Liouville–Green (WKB) approximation, see Segura (2013).

## §3.8(vi) Conditioning of Zeros

Suppose $f(z)$ also depends on a parameter $\alpha$, denoted by $f(z,\alpha)$. Then the sensitivity of a simple zero $z$ to changes in $\alpha$ is given by

 3.8.13 $\frac{\mathrm{d}z}{\mathrm{d}\alpha}=-\ifrac{\frac{\partial f}{\partial\alpha}% }{\frac{\partial f}{\partial z}}.$

Thus if $f$ is the polynomial (3.8.8) and $\alpha$ is the coefficient $a_{j}$, say, then

 3.8.14 $\frac{\mathrm{d}z}{\mathrm{d}a_{j}}=-\frac{z^{j}}{f^{\prime}(z)}.$ ⓘ Symbols: $\frac{\mathrm{d}\NVar{f}}{\mathrm{d}\NVar{x}}$: derivative Referenced by: §3.8(vi), §3.8(vi) Permalink: http://dlmf.nist.gov/3.8.E14 Encodings: TeX, pMML, png See also: Annotations for §3.8(vi), §3.8 and Ch.3

For moderate or large values of $n$ it is not uncommon for the magnitude of the right-hand side of (3.8.14) to be very large compared with unity, signifying that the computation of zeros of polynomials is often an ill-posed problem.

### Example. Wilkinson’s Polynomial

The zeros of

 3.8.15 $p(x)=(x-1)(x-2)\cdots(x-20)$ ⓘ Permalink: http://dlmf.nist.gov/3.8.E15 Encodings: TeX, pMML, png See also: Annotations for §3.8(vi), §3.8(vi), §3.8 and Ch.3

are well separated but extremely ill-conditioned. Consider $x=20$ and $j=19$. We have $p^{\mspace{1.0mu}\prime}(20)=19!$ and $a_{19}=1+2+\dots+20=210$. The perturbation factor (3.8.14) is given by

 3.8.16 $\frac{\mathrm{d}x}{\mathrm{d}a_{19}}=-\frac{20^{19}}{19!}=(-4.30\dots)\times 1% 0^{7}.$ ⓘ Symbols: $\frac{\mathrm{d}\NVar{f}}{\mathrm{d}\NVar{x}}$: derivative and $!$: factorial (as in $n!$) Permalink: http://dlmf.nist.gov/3.8.E16 Encodings: TeX, pMML, png See also: Annotations for §3.8(vi), §3.8(vi), §3.8 and Ch.3

Corresponding numerical factors in this example for other zeros and other values of $j$ are obtained in Gautschi (1984, §4).

## §3.8(vii) Systems of Nonlinear Equations

For fixed-point iterations and Newton’s method for solving systems of nonlinear equations, see Gautschi (1997a, Chapter 4, §9) and Ortega and Rheinboldt (1970).

## §3.8(viii) Fixed-Point Iterations: Fractals

The convergence of iterative methods

 3.8.17 $z_{n+1}=\phi(z_{n}),$ $n=0,1,\dots$, ⓘ Permalink: http://dlmf.nist.gov/3.8.E17 Encodings: TeX, pMML, png See also: Annotations for §3.8(viii), §3.8 and Ch.3

for solving fixed-point problems (3.8.2) cannot always be predicted, especially in the complex plane.

Consider, for example, (3.8.9). Starting this iteration in the neighborhood of one of the four zeros $\pm 1,\pm\mathrm{i}$, sequences $\{z_{n}\}$ are generated that converge to these zeros. For an arbitrary starting point $z_{0}\in\mathbb{C}$, convergence cannot be predicted, and the boundary of the set of points $z_{0}$ that generate a sequence converging to a particular zero has a very complicated structure. It is called a Julia set. In general the Julia set of an analytic function $f(z)$ is a fractal, that is, a set that is self-similar. See Julia (1918) and Devaney (1986).