Luckily, he told me again.
Ben's idea is really simple, and I use it everywhere now.
Say you need two quotients:
a/b AND c/d
In the real number system, this is the same as:
ad/bd AND cb/bd
In case this isn't getting through, that's:
inv := 1 / (b * d)
inv * a * d AND inv * c * b
That's one divide and five multiplies instead of two divides.
Well, I did some speed tests:
// This snippet computes c <- a / b, component-wise (n must be even)
for (int j = 0; j < n; j += 2) {
    float ibd = 1.0f / (b[j] * b[j+1]);   // one divide shared by the pair
    c[j]   = a[j]   * b[j+1] * ibd;       // a[j]   / b[j]
    c[j+1] = a[j+1] * b[j]   * ibd;       // a[j+1] / b[j+1]
}
If you put your P2's FPU in single-precision mode, this code runs in a blazing 9 CYCLES per divide (compared to 17 per divide normally), and you lose only about 1.5 bits of accuracy.
If you're in double precision, you can extend the technique to three divides per loop, and it gets even faster.
I've used this in texture mappers and in perspective projection code (vertex transforms, for example), and it's really a wonder.