Joshua Bloch
writes about a bug in the binary search implementatuion. Turns out
int mid = (low + high) / 2;
gives unexpected results for sufficiently large values of
low
and
high
due to integer overflow. Ouch.
So let's fix it. Joshua suggests rewriting the expression as
int mid = low + ((high - low) / 2);
However, this is just a kludge that makes the code less readable and doesn't address the root cause of the bug.
The underlying problem is Java's 32-bit integer arithmetic. Why does adding two large integers cause an overflow instead of returning a bignum? The answer is, no doubt, that it's a performance optimization. It's more efficient to always use native addition instead of sometimes using bignums.
At the time of Java release, 32-bit integers provided the best performance on mainstream CPUs, but that's no longer true today: 64 bit is the native word size of state-of-the-art processors. As is the case with many optimizations, it's brittle: changes to the execution environment cause performance to degrade. Java's optimizations are now obsolete and counterproductive.
This is yet another demonstration that
it is easier to optimize correct code than to correct optimized code. It's unfortunate that many language designers continue to follow the same path:
Fortress programming language also has a fixed size integer type, only it's 64 bit. The overflow bugs will take a while longer to be triggered, but they'll be triggered eventually.
...In order for a programmer to write correct code that can be later optimized, the platform must provide:- A base integer abstraction that hides the word size, implemented using bignums if the value is large;
- Several integer types of fixed size that support arithemetic with overflow:
int8
, int16
, int32
, int64
, int128
, plus the unsigned versions—primarily for interoperability with other systems; - A way for the programmer to provide the information that can be used for optimizing the base integer implementation. This can take a form of annotating invariants, such as specifying a range of a variable; if an invariant is violated, an exceptional condition is generated.