Newton's Method Convergence Theorem

Newton's method exhibits quadratic convergence near simple roots, making it one of the most efficient root-finding algorithms when derivatives are available and the initial guess is sufficiently accurate.

Theorem (Quadratic Convergence of Newton's Method)

Let $f \in C^2[a,b]$ have a root $r \in (a,b)$ with $f(r) = 0$ and $f'(r) \neq 0$. Then there exists $\delta > 0$ such that if $|x_0 - r| < \delta$, Newton's iteration

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

converges quadratically to $r$, satisfying

$$\lim_{n \to \infty} \frac{|x_{n+1} - r|}{|x_n - r|^2} = \frac{|f''(r)|}{2|f'(r)|}.$$

Moreover, if $M = \sup_{x \in [a,b]} |f''(x)|$ and $m = \inf_{x \in [a,b]} |f'(x)|$, then for $|x_0 - r|$ sufficiently small:

$$|x_{n+1} - r| \leq \frac{M}{2m} |x_n - r|^2.$$

The theorem guarantees that once Newton's method enters a neighborhood of the root, errors decrease quadratically. This means the number of accurate digits roughly doubles with each iteration, leading to very rapid convergence.
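The limiting ratio from the theorem can be checked numerically. The sketch below (illustrative only) uses $f(x) = x^2 - 2$ with root $r = \sqrt{2}$, for which $|f''(r)|/(2|f'(r)|) = 1/(2\sqrt{2}) \approx 0.3536$:

```python
import math

def newton(f, fprime, x0, n_iter):
    """Run Newton's iteration x <- x - f(x)/f'(x) for n_iter steps."""
    xs = [x0]
    for _ in range(n_iter):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

# For f(x) = x^2 - 2 the theorem predicts
# |e_{n+1}| / |e_n|^2  ->  |f''(r)| / (2|f'(r)|) = 1/(2*sqrt(2)) ~ 0.3536.
r = math.sqrt(2)
xs = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5, 4)
errors = [abs(x - r) for x in xs]
for e_prev, e_next in zip(errors, errors[1:]):
    if e_next > 0:                   # skip once machine precision is hit
        print(e_next / e_prev ** 2)  # ratios approach 0.3536
```

The successive ratios settle onto the predicted constant within three iterations, which is the quadratic convergence the theorem describes.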

Remark

Comparison with Linear Convergence: A linearly convergent method with rate $c < 1$ satisfies $|x_{n+1} - r| \leq c|x_n - r|$. To achieve error $10^{-k}$:

  • Linear convergence requires $O(k)$ iterations
  • Quadratic convergence requires $O(\log k)$ iterations

For example, achieving 16-digit accuracy from 2-digit accuracy:

  • Linear with $c = 0.5$: about 47 iterations
  • Quadratic: about 4 iterations
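These counts can be checked with a toy error model (illustrative only; the quadratic constant $C = 0.35$ is an arbitrary modest choice, not taken from the theorem):

```python
def count_iterations(step, e0=1e-2, tol=1e-16, max_iter=1000):
    """Count steps of the error model e <- step(e) until e <= tol."""
    e, n = e0, 0
    while e > tol and n < max_iter:
        e, n = step(e), n + 1
    return n

# Linear model: e <- 0.5 * e  (rate c = 0.5).
print(count_iterations(lambda e: 0.5 * e))       # 47

# Quadratic model: e <- C * e^2 with illustrative C = 0.35.
print(count_iterations(lambda e: 0.35 * e * e))  # 3
```

The quadratic model needs one step fewer than the worked example below only because the example's starting error is larger than $10^{-2}$.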

Example (Newton vs Bisection)

For $f(x) = x^2 - 2$ on $[1, 2]$, seeking $\sqrt{2}$:

Bisection (50 iterations): $x_{50} = 1.4142135623730950$ (error $\approx 10^{-15}$)

Newton from $x_0 = 1.5$ (4 iterations):

  • $x_1 = 1.41\overline{6}$ (error $\approx 2 \times 10^{-3}$)
  • $x_2 = 1.41421568627$ (error $\approx 2 \times 10^{-6}$)
  • $x_3 = 1.41421356237469$ (error $\approx 2 \times 10^{-12}$)
  • $x_4 = 1.41421356237309505$ (machine precision)

Newton achieves machine precision in 4 iterations while bisection requires 50.
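The comparison is easy to reproduce. A minimal sketch (`bisect` here is a plain halving loop written for this example, not the stdlib module):

```python
import math

def bisect(f, a, b, n_iter):
    """Halve a bracketing interval [a, b] n_iter times; return the midpoint."""
    for _ in range(n_iter):
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)

f = lambda x: x * x - 2
r = math.sqrt(2)

x_bis = bisect(f, 1.0, 2.0, 50)        # 50 bisection steps
x_newt = 1.5
for _ in range(4):                     # 4 Newton steps
    x_newt -= f(x_newt) / (2 * x_newt)

print(abs(x_bis - r))    # below 1e-15
print(abs(x_newt - r))   # at machine precision
```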

The condition $f'(r) \neq 0$ is essential; when $f'(r) = 0$ (multiple root), Newton's method degrades to linear convergence. The basin of attraction depends on $f''(r)/f'(r)$: larger ratios require closer initial guesses for convergence.
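The degradation is concrete at the double root of $f(x) = (x-1)^2$: the Newton update simplifies to $x - (x-1)/2$, so each step halves the error, i.e. linear convergence with rate $1/2$:

```python
# Newton's method at the double root r = 1 of f(x) = (x - 1)**2:
# the error is halved each step instead of being squared.
x = 2.0
for _ in range(5):
    x -= (x - 1) ** 2 / (2 * (x - 1))
    print(x - 1)   # 0.5, 0.25, 0.125, 0.0625, 0.03125
```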

In practice, hybrid methods combine global convergence guarantees (bisection or false position) with Newton's local quadratic convergence, using Newton steps when safe and falling back to bisection otherwise.
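One way such a hybrid can be sketched (a hypothetical illustration, not a production solver such as Brent's method): maintain a bracket containing a sign change, take the Newton step whenever it lands inside the bracket, and fall back to bisection otherwise.

```python
import math

def safeguarded_newton(f, fprime, a, b, tol=1e-12, max_iter=100):
    """Illustrative hybrid root finder: Newton step when it stays inside
    the bracket [a, b], bisection otherwise. Assumes f(a)*f(b) < 0."""
    fa = f(a)
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        # keep the sign change inside [a, b]
        if fa * fx < 0:
            b = x
        else:
            a, fa = x, fx
        d = fprime(x)
        if d != 0 and a < x - fx / d < b:
            x = x - fx / d        # safe Newton step
        else:
            x = 0.5 * (a + b)     # bisection fallback
    return x

# Example: the root of cos(x) - x on [0, 1] (the fixed point of cos).
root = safeguarded_newton(lambda x: math.cos(x) - x,
                          lambda x: -math.sin(x) - 1,
                          0.0, 1.0)
print(root)   # ~ 0.739085
```

The bisection fallback guarantees the bracket keeps shrinking even when a Newton step would escape it, while accepted Newton steps deliver the local quadratic convergence of the theorem.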