## Why Binary Searches Can Break

This post has already been published on code::gallery blog which now has been merged into this blog.

Binary searches and mergesorts can break, in fact, most of them will break because they follow the same method of calculating the mid-point for the search. It is usually calculated as average of low and high values. Here is what Josh Bloch says (the example uses Java):

The bug is in this line:

6: int mid =(low + high) / 2;

In Programming Pearls Bentley says that the analogous line “sets m to the average of l and u, truncated down to the nearest integer.” On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (231 – 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

Very interesting, whatever the size of the integer, there will always be a condition when the sum of low and high will overflow it. Josh also gives some suggestions:

So what’s the best way to fix the bug? Here’s one way:

6: int mid = low + ((high – low) / 2);

Probably faster, and arguably as clear is:

6: int mid = (low + high) >>> 1;

In C and C++ (where you don’t have the >>> operator), you can do this:

6: mid = ((unsigned) (low + high)) >> 1;

A classic example where the code design can do the right or the wrong. It is important to realise that the code works in all the cases. While all is sometimes impossible to consider and we can use a subset, there is no guarantee that the subset will remain the same in future. That is why, in my opinion, as part of the maintenance of the application, the code should be verified against the new computing environment. Today, the software is re-released for change in business requirements or business environment, but it is also important to consider the changing computing environment.

Technorati tags: ,