I think that it's really good to have a fast log base 2 approximation.
So I made one.
I use these functions for mipmapping, accumulation buffers, all sorts of things where you need to do a log per pixel. If you call the standard C library, you'll generally end up spending well over 100 cycles doing logs. Especially when you only need integer precision -- if you do:
(int)(log(x) / log(2))you'll spend about 200 cycles.
I have a couple of functions that run in "about 4" cycles, which means that it depends a lot on what register use is like when you call them. In well-pipelined situations, you might get 1 cycle.
If you need small amounts fractional precision, you can raise the input to a power before passing it in (using multiplies, not pow!) and then scale the result it gives you.
There are two implementations below, one using the FPU in a portable way (assuming you're on an IEEE floating-point machine.) The second implementation uses the x86's "bsr" instruction in a very non-portable way.
However, most CPUs do have this kind of call at the assembly level. But if you don't want to rewrite the function in assembly on your platform, I recommend simply writing an integer version using the floating-point version of the function. e.g.,
typedef unsigned long uint32;
typedef long int32;
static inline int32 ilog2(uint32 x) {
return ilog2((float)x);
}
// integer log2 of a float
static inline int32 ilog2(float x)
{
uint32 ix = (uint32&)x;
uint32 exp = (ix >> 23) & 0xFF;
int32 log2 = int32(exp) - 127;
return log2;
}
static inline int32 ilog2_x86(uint32 x)
{
int32 retval;
__asm {
bsr eax, x
mov retval, eax
}
return retval;
}