Luckily, he told me again.
Ben's idea is really simple, and I use it everywhere now.
    a/b AND c/d

In the real number system, this is the same as:

    ad/bd AND cb/bd

In case this isn't getting through, that's:

    inv := 1 / (b * d)
    inv * a * d AND inv * c * b
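Spelled out in C, the identity might look like the sketch below. The helper name `div2` and its signature are made up for illustration, they are not from Ben's original. The thing to notice is that the second quotient is c/d = cb/bd, so it multiplies by c and b.

```c
/* Hypothetical helper: computes a/b and c/d with a single divide.
   Caveat: if the product b * d overflows or underflows in float,
   both results are ruined even when a/b and c/d are representable. */
void div2(float a, float b, float c, float d, float *q1, float *q2)
{
    float inv = 1.0f / (b * d);  /* the only divide */
    *q1 = inv * a * d;           /* a/b = ad/bd */
    *q2 = inv * c * b;           /* c/d = cb/bd */
}
```

For example, `div2(3.0f, 4.0f, 10.0f, 8.0f, &q1, &q2)` leaves 0.75 in `q1` and 1.25 in `q2`. The trade is one divide for one reciprocal plus five multiplies, which only pays off when a divide costs much more than a multiply.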
Well, I did some speed tests:
    // This snippet computes c <- a / b, component-wise (n must be even)
    for (int j = 0; j < n; j += 2) {
        float ibd = 1.0f / (b[j] * b[j+1]);   // one reciprocal shared by two quotients
        c[j]   = a[j]   * b[j+1] * ibd;       // a[j] / b[j]
        c[j+1] = a[j+1] * b[j]   * ibd;       // a[j+1] / b[j+1]
    }

If you put your P2's FPU in single-precision mode, this code runs in a blazing 9 CYCLES per divide (compared to 17 per divide normally), and you only drop about 1.5 bits in accuracy. If you're in double-precision, you can extend the technique to 3 divides per loop, and it gets faster still.
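The three-divide version isn't shown here, so the following is only a guess at how it might look in double precision. The function name `div3_arrays` and the tail loop for leftover elements are assumptions, and no timing is claimed for it.

```c
#include <stddef.h>

/* Sketch: c[j] = a[j] / b[j], three quotients per divide.
   Elements left over when n is not a multiple of 3 use plain divides. */
void div3_arrays(const double *a, const double *b, double *c, size_t n)
{
    size_t j = 0;
    for (; j + 3 <= n; j += 3) {
        double b01 = b[j] * b[j + 1];
        double inv = 1.0 / (b01 * b[j + 2]);            /* the only divide */
        c[j]     = a[j]     * (b[j + 1] * b[j + 2]) * inv;  /* a[j]   / b[j]   */
        c[j + 1] = a[j + 1] * (b[j]     * b[j + 2]) * inv;  /* a[j+1] / b[j+1] */
        c[j + 2] = a[j + 2] * b01 * inv;                    /* a[j+2] / b[j+2] */
    }
    for (; j < n; ++j)
        c[j] = a[j] / b[j];
}
```

Multiplying three denominators together makes overflow or underflow of the product more likely than with two. Double's wider exponent range gives more headroom for that, but inputs with extreme magnitudes would still need the plain divide.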
I've used this in texture mappers and perspective projection code (vertex transforms, for example), and it's really a wonder.