### Ben and 2 Divides

Two years ago, Ben Weiss (the guy who wrote Kai's Power Tools, Goo, Soap, etc.)
was telling me about this idea he had to combine two divides. I just didn't pay
attention.
Luckily, he told me again.

Ben's idea is really simple, and I use it everywhere now.

### 2 divides at once

Suppose you want to compute:

a/b AND c/d

In the real number system, this is the same as:
ad/bd AND cb/bd

In case this isn't getting through, that's:
inv := 1 / (b * d)
inv * a * d AND inv * b * d

### Multiply instead

Of course, modern processors are LOTS faster at multiplies than
divides. In fact, in optimal conditions, most of them can do a multiply
per cycle, and a divide every 20-40 cycles. So an extra 5 multiplies
spread over 2 divides should be worth it, right?
Well, I did some speed tests:

// This snippet computes c <- a / b, component-wise
for (int j = 0; j < n; j += 2) {
float ibd = 1.0f / (b[j] * b[j+1]);
c[j] = a[j] * b[j+1] * ibd;
c[j+1] = a[j+1] * b[j] * ibd;
}

If you put your P2's FPU in single-precision mode, this code runs in a
blazing **9 CYCLES per divide** (compared to 17 per divide normally), and you only drop about 1.5 bits in accuracy.
If you're in double-precision, you can extend the technique to 3 divides per loop, and you still get even faster.
I've used this in texture mappers, perspective projection code (vertex transform, for example), and it's
just really a wonder.