The equation to be solved is
3.8.1 | $$f\left(z\right)=0,$$ | ||
where $z$ is a real or complex variable and the function $f$ is nonlinear. Solutions are called roots of the equation, or zeros of $f$. If $f\left({z}_{0}\right)=0$ and ${f}^{\prime}\left({z}_{0}\right)\ne 0$, then ${z}_{0}$ is a simple zero of $f$. If $f\left({z}_{0}\right)={f}^{\prime}\left({z}_{0}\right)=\mathrm{\cdots}={f}^{\left(m-1\right)}\left({z}_{0}\right)=0$ and ${f}^{\left(m\right)}\left({z}_{0}\right)\ne 0$, then ${z}_{0}$ is a zero of $f$ of multiplicity $m$; compare §1.10(i).
Sometimes the equation takes the form
3.8.2 | $$z=\varphi \left(z\right),$$ | ||
and the solutions are called fixed points of $\varphi $.
Equations (3.8.1) and (3.8.2) are usually solved by iterative methods. Let ${z}_{1},{z}_{2},\mathrm{\dots}$ be a sequence of approximations to a root, or fixed point, $\zeta $. If
3.8.3 | $$ | ||
for all $n$ sufficiently large, where $A$ and $p$ are independent of $n$, then the sequence is said to have convergence of the $p$th order. (More precisely, $p$ is the largest of the possible set of indices for (3.8.3).) If $p=1$ and $$, then the convergence is said to be linear or geometric. If $p=2$, then the convergence is quadratic; if $p=3$, then the convergence is cubic, and so on.
An iterative method converges locally to a solution $\zeta $ if there exists a neighborhood $N$ of $\zeta $ such that ${z}_{n}\to \zeta $ whenever the initial approximation ${z}_{0}$ lies within $N$.
This is an iterative method for real twice-continuously differentiable, or complex analytic, functions:
3.8.4 | $${z}_{n+1}={z}_{n}-\frac{f\left({z}_{n}\right)}{{f}^{\prime}\left({z}_{n}\right)},$$ | ||
$n=0,1,\mathrm{\dots}$. | |||
If $\zeta $ is a simple zero, then the iteration converges locally and quadratically. For multiple zeros the convergence is linear, but if the multiplicity $m$ is known then quadratic convergence can be restored by multiplying the ratio $f\left({z}_{n}\right)/{f}^{\prime}\left({z}_{n}\right)$ in (3.8.4) by $m$.
For real functions $f\left(x\right)$ the sequence of approximations to a real zero $\xi $ will always converge (and converge quadratically) if either:
$f\left({x}_{0}\right){f}^{\mathrm{\prime \prime}}\left({x}_{0}\right)>0$ and ${f}^{\prime}\left(x\right)$, ${f}^{\mathrm{\prime \prime}}\left(x\right)$ do not change sign between ${x}_{0}$ and $\xi $ (monotonic convergence).
$$, ${f}^{\prime}\left(x\right)$, ${f}^{\mathrm{\prime \prime}}\left(x\right)$ do not change sign in the interval $\left({x}_{0},{x}_{1}\right)$, and $\xi \in \left[{x}_{0},{x}_{1}\right]$ (monotonic convergence after the first iteration).
$f\left(x\right)=x-\mathrm{tan}x$. The first positive zero of $f\left(x\right)$ lies in the interval $\left(\pi ,\frac{3}{2}\pi \right)$; see Figure 4.15.3. From this graph we estimate an initial value ${x}_{0}=4.65$. Newton’s rule is given by
3.8.5 | ${x}_{n+1}$ | $=\varphi \left({x}_{n}\right),$ | ||
$\varphi \left(x\right)$ | $=x+x{\mathrm{cot}}^{2}x-\mathrm{cot}x.$ | |||
Results appear in Table 3.8.1. The choice of ${x}_{0}$ here is critical. When ${x}_{0}\le 4.2875$ or ${x}_{0}\ge 4.7125$, Newton’s rule does not converge to the required zero. The convergence is faster when we use instead the function $f\left(x\right)=x\mathrm{cos}x-\mathrm{sin}x$; in addition, the successful interval for the starting value ${x}_{0}$ is larger.
$n$ | ${x}_{n}$ |
---|---|
0 | 4.65000 00000 000 |
1 | 4.60567 66065 900 |
2 | 4.55140 53475 751 |
3 | 4.50903 76975 617 |
4 | 4.49455 61600 185 |
5 | 4.49341 56569 391 |
6 | 4.49340 94580 903 |
7 | 4.49340 94579 091 |
8 | 4.49340 94579 091 |
If $$ with $$, then the interval $\left[a,b\right]$ contains one or more zeros of $f$. Bisection of this interval is used to decide where at least one zero is located. All zeros of $f$ in the original interval $\left[a,b\right]$ can be computed to any predetermined accuracy. Convergence is slow however; see Kaufman and Lenker (1986) and Nievergelt (1995).
Let ${x}_{0}$ and ${x}_{1}$ be such that ${f}_{0}=f\left({x}_{0}\right)$ and ${f}_{1}=f\left({x}_{1}\right)$ have opposite signs. Inverse linear interpolation (§3.3(v)) is used to obtain the first approximation:
3.8.6 | $${x}_{2}={x}_{1}-\frac{{x}_{1}-{x}_{0}}{{f}_{1}-{f}_{0}}{f}_{1}=\frac{{f}_{1}{x}_{0}-{f}_{0}{x}_{1}}{{f}_{1}-{f}_{0}}.$$ | ||
We continue with ${x}_{2}$ and either ${x}_{0}$ or ${x}_{1}$, depending which of ${f}_{0}$ and ${f}_{1}$ is of opposite sign to $f\left({x}_{2}\right)$, and so on. The convergence is linear, and again more than one zero may occur in the original interval $\left[{x}_{0},{x}_{1}\right]$.
Whether or not ${f}_{0}$ and ${f}_{1}$ have opposite signs, ${x}_{2}$ is computed as in (3.8.6). If the wanted zero $\xi $ is simple, then the method converges locally with order of convergence $p=\frac{1}{2}\left(1+\sqrt{5}\right)=1.618\mathrm{\dots}$. Because the method requires only one function evaluation per iteration, its numerical efficiency is ultimately higher than that of Newton’s method. There is no guaranteed convergence: the first approximation ${x}_{2}$ may be outside $\left[{x}_{0},{x}_{1}\right]$.
This iterative method for solving $z=\varphi \left(z\right)$ is given by
3.8.7 | $${z}_{n+1}={z}_{n}-\frac{{\left(\varphi \left({z}_{n}\right)-{z}_{n}\right)}^{2}}{\varphi \left(\varphi \left({z}_{n}\right)\right)-2\varphi \left({z}_{n}\right)+{z}_{n}},$$ | ||
$n=0,1,2,\mathrm{\dots}$. | |||
It converges locally and quadratically for both $\mathrm{\mathbb{R}}$ and $\mathrm{\u2102}$.
For other efficient derivative-free methods, see Le (1985).
For the computation of zeros of orthogonal polynomials as eigenvalues of finite tridiagonal matrices (§3.5(vi)), see Gil et al. (2007a, pp. 205–207). For the computation of zeros of Bessel functions, Coulomb functions, and conical functions as eigenvalues of finite parts of infinite tridiagonal matrices, see Grad and Zakrajšek (1973), Ikebe (1975), Ikebe et al. (1991), Ball (2000), and Gil et al. (2007a, pp. 205–213).
The polynomial
3.8.8 | $$p\left(z\right)={a}_{n}{z}^{n}+{a}_{n-1}{z}^{n-1}+\mathrm{\dots}+{a}_{0},$$ | ||
${a}_{n}\ne 0$, | |||
has $n$ zeros in $\mathrm{\u2102}$, counting each zero according to its multiplicity. Explicit formulas for the zeros are available if $n\le 4$; see §§1.11(iii) and 4.43. No explicit general formulas exist when $n\ge 5$.
After a zero $\zeta $ has been computed, the factor $z-\zeta $ is factored out of $p\left(z\right)$ as a by-product of Horner’s scheme (§1.11(i)) for the computation of $p\left(\zeta \right)$. In this way polynomials of successively lower degree can be used to find the remaining zeros. (This process is called deflation.) However, to guard against the accumulation of rounding errors, a final iteration for each zero should also be performed on the original polynomial $p\left(z\right)$.
$p\left(z\right)={z}^{4}-1$. The zeros are $\pm 1$ and $\pm \mathrm{i}$. Newton’s method is given by
3.8.9 | ${z}_{n+1}$ | $=\varphi \left({z}_{n}\right),$ | ||
$\varphi \left(z\right)$ | $={\displaystyle \frac{3{z}^{4}+1}{4{z}^{3}}}.$ | |||
The results for ${z}_{0}=1.5$ are given in Table 3.8.2.
$n$ | ${z}_{n}$ |
---|---|
0 | 1.50000 00000 000 |
1 | 1.19907 40740 741 |
2 | 1.04431 68969 414 |
3 | 1.00274 20038 676 |
4 | 1.00001 12265 490 |
5 | 1.00000 00001 891 |
6 | 1.00000 00000 000 |
As in the case of Table 3.8.1 the quadratic nature of convergence is clearly evident: as the zero is approached, the number of correct decimal places doubles at each iteration.
Newton’s rule can also be used for complex zeros of $p\left(z\right)$. However, when the coefficients are all real, complex arithmetic can be avoided by the following iterative process.
Let ${z}^{2}-sz-t$ be an approximation to the real quadratic factor of $p\left(z\right)$ that corresponds to a pair of conjugate complex zeros or to a pair of real zeros. We construct sequences ${q}_{j}$ and ${r}_{j}$, $j=n+1,n,\mathrm{\dots},0$, from ${q}_{n+1}={r}_{n+1}=0$, ${q}_{n}={r}_{n}={a}_{n}$, and for $j\le n-1$,
3.8.10 | ${q}_{j}$ | $={a}_{j}+s{q}_{j+1}+t{q}_{j+2},$ | ||
${r}_{j}$ | $={q}_{j}+s{r}_{j+1}+t{r}_{j+2}.$ | |||
Then the next approximation to the quadratic factor is ${z}^{2}-\left(s+\Delta s\right)z-\left(t+\Delta t\right)$, where
3.8.11 | $\Delta s$ | $={\displaystyle \frac{{r}_{3}{q}_{0}-{r}_{2}{q}_{1}}{{r}_{2}^{2}-\mathrm{\ell}{r}_{3}}},$ | ||
$\Delta t$ | $={\displaystyle \frac{\mathrm{\ell}{q}_{1}-{r}_{2}{q}_{0}}{{r}_{2}^{2}-\mathrm{\ell}{r}_{3}}},$ | |||
$\mathrm{\ell}$ | $=s{r}_{2}+t{r}_{3}.$ | |||
The method converges locally and quadratically, except when the wanted quadratic factor is a multiple factor of $q\left(z\right)$. On the last iteration ${q}_{n}{z}^{n-2}+{q}_{n-1}{z}^{n-3}+\mathrm{\dots}+{q}_{2}$ is the quotient on dividing $p\left(z\right)$ by ${z}^{2}-sz-t$.
$p\left(z\right)={z}^{4}-2{z}^{2}+1$. With the starting values ${s}_{0}=\frac{7}{4}$, ${t}_{0}=-\frac{1}{2}$, an approximation to the quadratic factor ${z}^{2}-2z+1={\left(z-1\right)}^{2}$ is computed ($s=2$, $t=-1$). Table 3.8.3 gives the successive values of $s$ and $t$. The quadratic nature of the convergence is evident.
$n$ | ${s}_{n}$ | ${t}_{n}$ |
---|---|---|
0 | 1.75000 00000 000 | $-$0.50000 00000 000 |
1 | 2.13527 29454 109 | $-$1.21235 75284 943 |
2 | 2.01786 10488 956 | $-$1.02528 61401 539 |
3 | 2.00036 06329 466 | $-$1.00047 63067 522 |
4 | 2.00000 01474 803 | $-$1.00000 01858 298 |
5 | 2.00000 00000 000 | $-$1.00000 00000 000 |
This example illustrates the fact that the method succeeds even if the two zeros of the wanted quadratic factor are real and the same.
For further information on the computation of zeros of polynomials see McNamee (2007).
Newton’s rule is the most frequently used iterative process for accurate computation of real or complex zeros of analytic functions $f\left(z\right)$. Another iterative method is Halley’s rule:
3.8.12 | $${z}_{n+1}={z}_{n}-\frac{f\left({z}_{n}\right)}{{f}^{\prime}\left({z}_{n}\right)-\left({f}^{\mathrm{\prime \prime}}\left({z}_{n}\right)f\left({z}_{n}\right)/\left(2{f}^{\prime}\left({z}_{n}\right)\right)\right)}.$$ | ||
This is useful when $f\left(z\right)$ satisfies a second-order linear differential equation because of the ease of computing ${f}^{\mathrm{\prime \prime}}\left({z}_{n}\right)$. The rule converges locally and is cubically convergent.
Initial approximations to the zeros can often be found from asymptotic or other approximations to $f\left(z\right)$, or by application of the phase principle or Rouché’s theorem; see §1.10(iv). These results are also useful in ensuring that no zeros are overlooked when the complex plane is being searched.
For an example involving the Airy functions, see Fabijonas and Olver (1999).
Suppose $f\left(z\right)$ also depends on a parameter $\alpha $, denoted by $f\left(z,\alpha \right)$. Then the sensitivity of a simple zero $z$ to changes in $\alpha $ is given by
3.8.13 | $$\frac{\mathrm{d}z}{\mathrm{d}\alpha}=-\frac{\partial f}{\partial \alpha}/\frac{\partial f}{\partial z}.$$ | ||
Thus if $f$ is the polynomial (3.8.8) and $\alpha $ is the coefficient ${a}_{j}$, say, then
3.8.14 | $$\frac{\mathrm{d}z}{\mathrm{d}{a}_{j}}=-\frac{{z}^{j}}{{f}^{\prime}\left(z\right)}.$$ | ||
For moderate or large values of $n$ it is not uncommon for the magnitude of the right-hand side of (3.8.14) to be very large compared with unity, signifying that the computation of zeros of polynomials is often an ill-posed problem.
The zeros of
3.8.15 | $$p\left(x\right)=\left(x-1\right)\left(x-2\right)\mathrm{\cdots}\left(x-20\right)$$ | ||
are well separated but extremely ill-conditioned. Consider $x=20$ and $j=19$. We have ${p}^{\prime}\left(20\right)=19\mathrm{!}$ and ${a}_{19}=1+2+\mathrm{\dots}+20=210$. The perturbation factor (3.8.14) is given by
3.8.16 | $$\frac{\mathrm{d}x}{\mathrm{d}{a}_{19}}=-\frac{{20}^{19}}{19\mathrm{!}}=\left(-4.30\mathrm{\dots}\right)\times {10}^{7}.$$ | ||
Corresponding numerical factors in this example for other zeros and other values of $j$ are obtained in Gautschi (1984, §4).
The convergence of iterative methods
3.8.17 | $${z}_{n+1}=\varphi \left({z}_{n}\right),$$ | ||
$n=0,1,\mathrm{\dots}$, | |||
for solving fixed-point problems (3.8.2) cannot always be predicted, especially in the complex plane.
Consider, for example, (3.8.9). Starting this iteration in the neighborhood of one of the four zeros $\pm 1,\pm \mathrm{i}$, sequences $\left\{{z}_{n}\right\}$ are generated that converge to these zeros. For an arbitrary starting point ${z}_{0}\in \mathrm{\u2102}$, convergence cannot be predicted, and the boundary of the set of points ${z}_{0}$ that generate a sequence converging to a particular zero has a very complicated structure. It is called a Julia set. In general the Julia set of an analytic function $f\left(z\right)$ is a fractal, that is, a set that is self-similar. See Julia (1918) and Devaney (1986).