Ben and 2 Divides

Two years ago, Ben Weiss (the guy who wrote Kai's Power Tools, Goo, Soap, etc.) was telling me about this idea he had to combine two divides. I just didn't pay attention.

Luckily, he told me again.

Ben's idea is really simple, and I use it everywhere now.

2 divides at once

Suppose you want to compute:

	a/b    AND    c/d
In the real number system, this is the same as:
	ad/bd    AND    cb/bd
In case this isn't getting through, that's:
	inv := 1 / (b * d)
	inv * a * d    AND    inv * b * d

Multiply instead

Of course, modern processors are LOTS faster at multiplies than divides. In fact, in optimal conditions, most of them can do a multiply per cycle, and a divide every 20-40 cycles. So an extra 5 multiplies spread over 2 divides should be worth it, right?

Well, I did some speed tests:

	// This snippet computes c <- a / b, component-wise

	for (int j = 0; j < n; j += 2) {
		float ibd = 1.0f / (b[j] * b[j+1]);
		
		c[j]   = a[j] * b[j+1] * ibd;
		c[j+1] = a[j+1] * b[j] * ibd;
	}
If you put your P2's FPU in single-precision mode, this code runs in a blazing 9 CYCLES per divide (compared to 17 per divide normally), and you only drop about 1.5 bits in accuracy. If you're in double-precision, you can extend the technique to 3 divides per loop, and you still get even faster.

I've used this in texture mappers, perspective projection code (vertex transform, for example), and it's just really a wonder.