
Fundamental Theorem of Rounding

The fundamental theorem of rounding establishes the basic error bound for representing real numbers in floating point arithmetic, forming the foundation for all subsequent error analysis.

Theorem (Fundamental Theorem of Rounding)

Let $\mathbb{F}(\beta, p, L, U)$ be a floating point system with round-to-nearest rounding. For any real number $x$ satisfying $\beta^L \leq |x| \leq \beta^U$, the floating point representation $\text{fl}(x)$ satisfies $\text{fl}(x) = x(1 + \delta)$ where $|\delta| \leq \frac{1}{2}\beta^{1-p} = \frac{1}{2}\epsilon_{\text{mach}}$.

Furthermore, for any basic arithmetic operation $\circ \in \{+, -, \times, \div\}$: $\text{fl}(x \circ y) = (x \circ y)(1 + \delta)$ where $|\delta| \leq \epsilon_{\text{mach}}$, assuming no overflow or underflow occurs.

This theorem provides the standard model for floating point arithmetic analysis. The bound $\frac{1}{2}\epsilon_{\text{mach}}$ reflects round-to-nearest rounding; directed rounding modes (round toward zero, round toward $\pm\infty$) satisfy only the weaker bound $|\delta| \leq \epsilon_{\text{mach}}$.
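The representation bound is easy to verify empirically in IEEE 754 double precision ($\beta = 2$, $p = 53$), where $\epsilon_{\text{mach}} = 2^{-52}$. A minimal sketch, using Python's `fractions` module for exact reference arithmetic (the test value $x = 1/10$ is illustrative; any value in range works):

```python
import sys
from fractions import Fraction

eps_mach = sys.float_info.epsilon  # beta^(1-p) = 2**-52 for double precision

# Exact relative error delta of rounding x = 1/10 to the nearest double.
x_exact = Fraction(1, 10)
fl_x = float(x_exact)                          # round-to-nearest representation
delta = (Fraction(fl_x) - x_exact) / x_exact   # exact rational arithmetic
rel_err = abs(delta)

# The theorem guarantees |delta| <= (1/2) * eps_mach.
assert rel_err <= Fraction(eps_mach) / 2
print(float(rel_err), eps_mach / 2)
```

For $x = 1/10$ the actual relative error is $2^{-54}$, comfortably inside the guaranteed bound of $2^{-53}$.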

Remark

The multiplicative error model $(1 + \delta)$ is crucial for error analysis because it naturally leads to relative error bounds. In contrast, an additive error model $\text{fl}(x) = x + \epsilon$ would require tracking absolute errors, which is less informative for numbers of vastly different magnitudes.

The theorem's power lies in its composability. For a sequence of $n$ operations, if each introduces error $(1 + \delta_i)$ with $|\delta_i| \leq \epsilon_{\text{mach}}$, the accumulated error is $(1 + \delta_1)(1 + \delta_2)\cdots(1 + \delta_n) = 1 + \theta_n$, where first-order analysis gives $|\theta_n| \leq n\epsilon_{\text{mach}}$ provided $n\epsilon_{\text{mach}} \ll 1$.
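This accumulation can be observed directly. A sketch under illustrative assumptions (summing the decimal value $0.1$, which is not exactly representable in binary, $n = 10{,}000$ times), again using exact rational arithmetic as the reference:

```python
import sys
from fractions import Fraction

eps_mach = sys.float_info.epsilon
n = 10_000

# Accumulate n additions of 0.1 in double precision; each addition
# introduces a rounding error of at most eps_mach/2 relative.
s_float = 0.0
for _ in range(n):
    s_float += 0.1

# Exact result, treating 0.1 as the true (unrounded) summand, so the
# measured error includes both representation and accumulation error.
s_exact = n * Fraction(1, 10)
rel_err = abs(Fraction(s_float) - s_exact) / s_exact

# First-order composability bound: n rounding errors of at most eps_mach each.
assert rel_err <= n * Fraction(eps_mach)
print(float(rel_err), n * eps_mach)
```

In practice the observed error sits well below the worst-case $n\epsilon_{\text{mach}}$ bound, since individual rounding errors partially cancel rather than all aligning in one direction.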

Example (Application to Inner Product)

Computing $s = \sum_{i=1}^n x_i y_i$ in floating point:

  1. Each product $x_i y_i$ carries relative error $\leq \epsilon_{\text{mach}}$
  2. Summing the $n$ products takes $n-1$ additions, accumulating relative error $\leq (n-1)\epsilon_{\text{mach}}$
  3. Total relative error: $|\text{fl}(s) - s|/|s| \leq n\epsilon_{\text{mach}} + O(\epsilon_{\text{mach}}^2)$, assuming no cancellation ($\sum_i |x_i y_i| \approx |s|$)

For IEEE 754 double precision with $n = 10^6$, this gives relative error $\approx 2.22 \times 10^{-10}$, losing about 6 decimal digits from machine precision.
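The inner product bound can be checked numerically. A sketch with illustrative random data, kept positive so that no cancellation occurs and the first-order bound applies to $|s|$ directly:

```python
import random
import sys
from fractions import Fraction

eps_mach = sys.float_info.epsilon
random.seed(0)
n = 1000

# Positive data, so sum|x_i y_i| = |s| and the relative bound holds for s.
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]

# Naive floating point inner product: each term passes through one
# multiplication and up to n-1 additions.
s_float = 0.0
for xi, yi in zip(xs, ys):
    s_float += xi * yi

# Exact inner product over the (already rounded) double inputs.
s_exact = sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(xs, ys))
rel_err = abs(Fraction(s_float) - s_exact) / s_exact

assert rel_err <= n * Fraction(eps_mach)  # first-order bound from the example
print(float(rel_err), n * eps_mach)
```

Typical observed errors grow more like $\sqrt{n}\,\epsilon_{\text{mach}}$ due to statistical cancellation of rounding errors, so the $n\epsilon_{\text{mach}}$ bound is pessimistic but never violated.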

The theorem assumes correct rounding, which IEEE 754 guarantees for the basic operations. Extended precision arithmetic and fused multiply-add (FMA) can achieve tighter bounds: FMA computes $x \cdot y + z$ with only a single rounding error rather than two.
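The difference between one rounding and two can be made visible with a contrived but instructive choice of inputs (the specific values below are illustrative): the intermediate product $x \cdot y = 1 - 2^{-60}$ rounds to exactly $1.0$, so the naive evaluation cancels to zero, while a single-rounding FMA would return the exact answer $-2^{-60}$.

```python
from fractions import Fraction

# All three values are exactly representable as doubles (p = 53).
x, y, z = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0

# Naive evaluation rounds twice: x*y = 1 - 2**-60 rounds up to 1.0,
# so the subsequent addition cancels to exactly 0.0.
naive = x * y + z

# Exact value of x*y + z over the given doubles, via rational arithmetic.
exact = Fraction(x) * Fraction(y) + Fraction(z)   # equals -2**-60

print(naive, float(exact))
```

On Python 3.13 and later, `math.fma(x, y, z)` performs the fused operation and returns $-2^{-60}$ exactly, since that result is representable; the naive two-rounding evaluation loses all significant digits here.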

Understanding this theorem enables rigorous error analysis: backward error analysis shows that an algorithm's computed result is the exact result for slightly perturbed inputs, with the perturbation bounded by a small multiple of $\epsilon_{\text{mach}}$. Forward error analysis then uses condition numbers to bound the output error in terms of such input perturbations. Together, these techniques form the theoretical foundation of numerical stability analysis.