I think that it's really good to have a fast log base 2 approximation.
So I made one.
I use these functions for mipmapping, accumulation buffers, all sorts of things where you need to do a log per pixel. If you call the standard C library, you'll generally end up spending well over 100 cycles doing logs. Especially when you only need integer precision -- if you do:
(int)(log(x) / log(2))you'll spend about 200 cycles.
I have a couple of functions that run in "about 4" cycles, which means that it depends a lot on what register use is like when you call them. In well-pipelined situations, you might get 1 cycle.
If you need small amounts fractional precision, you can raise the input to a power before passing it in (using multiplies, not pow!) and then scale the result it gives you.
There are two implementations below, one using the FPU in a portable way (assuming you're on an IEEE floating-point machine.) The second implementation uses the x86's "bsr" instruction in a very non-portable way.
However, most CPUs do have this kind of call at the assembly level. But if you don't want to rewrite the function in assembly on your platform, I recommend simply writing an integer version using the floating-point version of the function. e.g.,
typedef unsigned long uint32; typedef long int32; static inline int32 ilog2(uint32 x) { return ilog2((float)x); } // integer log2 of a float static inline int32 ilog2(float x) { uint32 ix = (uint32&)x; uint32 exp = (ix >> 23) & 0xFF; int32 log2 = int32(exp) - 127; return log2; } static inline int32 ilog2_x86(uint32 x) { int32 retval; __asm { bsr eax, x mov retval, eax } return retval; }