Fast Shadows on Rectangles


Michael Herf
December 2001

December 2006 (Christmas!) update: Here's the code I gave to Andrew at DeviantArt back in 2001 (something similar to this makes the shadowed thumbnails on their site). The one I use myself is better encapsulated (no global variables, and proper types for bitmaps and rectangles), but this one is easier to post.

2001 article

A lot of my technology comes out of building advanced user interfaces and interesting experiences. As a result, I have a rather large "bag of tricks" lying around that most people don't seem to know. This article is about one of the ones I think is neatest -- both for its mathematical elegance and the fact that no one else seems to have discovered anything quite as fast or high-quality.

Actually, I'll give some credit to Andrew McCann for convincing me to write this up -- he saw my web page photo maker, and said, "We have to have that for DeviantArt!". So we did a quick Linux port of my shadowrect program, and it will hopefully show up there sometime soon.

To the best of my knowledge, people who do soft drop shadows on windows or roundrects for interfaces are doing one of a few things:

  1. Precomputing tiles that get "stretched" around the edges,
  2. Doing box filters in real time to show a shadow (on the whole mask!).

Microsoft's, in particular, look pretty awful! They have these sad little gray shadows in XP, and they're still so slow. Take a look:

OS X looks much better -- they realized it makes sense to do a blur bigger than one pixel:

So, to summarize, what we're after is Photoshop-quality Gaussian blurs, but special cased for simple shapes.

And, well, it's just pretty darned easy.

Gaussian Blurs

Photoshop's Gaussian Blur is effectively three box filters applied one after the other. This is a relatively accurate approximation to the Gaussian (you probably couldn't tell the difference), but with some nice properties.

First, the blur has easily-computed bounds. For a given blur radius, each box filter pass adds only its radius to each edge of the blurred image. Say you have an image that's 400 x 300 pixels. After blurring three times with a filter of radius = 4.5, you just expand the rectangle enclosing the image by 13.5 pixels (= 4.5 x 3) on each side.

Original Rectangle: (0, 0, 400, 300)
Blurred three times: (-13.5, -13.5, 413.5, 313.5)
Integer bounds: (-14, -14, 414, 314)

So, finally, we can store the blurred image in a new one that's 428 x 328 pixels, with no clipping at all.
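
Here is a minimal sketch of that bounds arithmetic in C. The Bounds type and the blur_bounds name are made up for illustration, and it assumes integer rectangle coordinates and that every pass adds exactly its radius on each side, as described above.

```c
/* Hypothetical rectangle type: left, top, right, bottom in pixels. */
typedef struct { int x0, y0, x1, y1; } Bounds;

/* Grow a rectangle to hold the result of `passes` box filters of the
   given radius, rounding outward to whole pixels. */
static Bounds blur_bounds(Bounds r, float radius, int passes)
{
    float grow = radius * (float)passes;   /* e.g. 4.5 * 3 = 13.5 */
    int pad = (int)grow;                   /* truncate ...             */
    if ((float)pad < grow) pad++;          /* ... then round up: 14    */

    Bounds out;
    out.x0 = r.x0 - pad;
    out.y0 = r.y0 - pad;
    out.x1 = r.x1 + pad;
    out.y1 = r.y1 + pad;
    return out;
}
```

For the 400 x 300 example, blur_bounds on (0, 0, 400, 300) with radius 4.5 and 3 passes gives (-14, -14, 414, 314), which is the 428 x 328 image mentioned above.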

Of course, a real Gaussian would give us no such benefit: it has an infinite extent, and while for practical purposes, its extent can be clamped (when the curve becomes smaller than a pixel value), it's a lot more work to do that than just to add a few numbers.

Finally, the B-Spline generated by three box filters is easily integrable, which will make things very nice for us.

A 1-D Box

It turns out that a Gaussian is separable, which, if you're blurring an image, means you can blur in one dimension, then blur in the other, and get the same result as if you had done the whole thing at once.

This comes in very handy with rectangles, because we can look at them as the intersection of two half planes.

In particular, you can do the following experiment in Photoshop or any layered image editor:

  1. Make two half-plane images:

  2. "Screen" them together (screen is an inverted multiply: invert both layers, multiply them, and invert the result, so white is treated like black and vice-versa):

  3. Now blur each one (separately, not together):

  4. Now turn the screen back on:

Now for something important:

Image (4) is exactly the same thing you'd get if you blurred image (2) directly.
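
If you'd rather check this numerically than in Photoshop, here is a small sketch in C. It uses a single box filter pass instead of a full Gaussian (the same argument applies to each pass), stores coverage as 1 = shadow so that the combination is a plain multiply rather than a screen, and all names and sizes are made up for illustration.

```c
#define W 16
#define H 12

/* One 1-D box filter pass of half-width r (samples outside count as 0). */
static void box1d(const float *src, float *dst, int n, int r)
{
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int k = -r; k <= r; k++) {
            int j = i + k;
            if (j >= 0 && j < n) sum += src[j];
        }
        dst[i] = sum / (float)(2 * r + 1);
    }
}

/* Blur a whole image in place: every row first, then every column. */
static void box2d(float img[H][W], int r)
{
    float tmp[W + H], col[H];
    for (int y = 0; y < H; y++) {
        box1d(img[y], tmp, W, r);
        for (int x = 0; x < W; x++) img[y][x] = tmp[x];
    }
    for (int x = 0; x < W; x++) {
        for (int y = 0; y < H; y++) col[y] = img[y][x];
        box1d(col, tmp, H, r);
        for (int y = 0; y < H; y++) img[y][x] = tmp[y];
    }
}

/* Largest difference between (a) blurring the rectangle mask directly and
   (b) blurring its x and y profiles separately and multiplying them. */
static float separable_error(int x0, int x1, int y0, int y1, int r)
{
    float img[H][W], xs[W], ys[H], bx[W], by[H], worst = 0.0f;

    for (int x = 0; x < W; x++) xs[x] = (x >= x0 && x < x1) ? 1.0f : 0.0f;
    for (int y = 0; y < H; y++) ys[y] = (y >= y0 && y < y1) ? 1.0f : 0.0f;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) img[y][x] = xs[x] * ys[y];

    box2d(img, r);
    box1d(xs, bx, W, r);
    box1d(ys, by, H, r);

    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            float d = img[y][x] - bx[x] * by[y];
            if (d < 0.0f) d = -d;
            if (d > worst) worst = d;
        }
    return worst;
}
```

The returned error should be nothing more than floating-point rounding, which is the whole point: two 1-D tables carry all the information in the blurred rectangle.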

Part 2: Integrating a B-Spline

If we wanted to talk about the triple box-filter described above as a single convolution, we really should describe the convolution kernel as a Uniform B-Spline because it's the right way to do it mathematically.

I found a good UC Davis article to explain this, so I don't even have to explain things too much: Uniform B-Splines as a Convolution.

So, in particular, we deal with the third-order B-Spline, which is piecewise quadratic.

The convolution filter can be written as follows:

f(x) =
    0                      for x < -1.5
    0.5 * (x + 1.5) ^ 2    for -1.5 <= x < -0.5
    0.75 - x * x           for -0.5 <= x < 0.5
    0.5 * (x - 1.5) ^ 2    for 0.5 <= x < 1.5
    0                      for x >= 1.5
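
As a sanity check, the kernel can be written directly as a small C function. This is only a sketch (bspline3 and bspline3_area are names picked for illustration); it folds the two outer cases together using the kernel's symmetry, and the midpoint-rule helper is there to confirm that the kernel has unit area, so it neither brightens nor darkens the image overall.

```c
/* Quadratic uniform B-spline: three unit-width boxes convolved together. */
static float bspline3(float x)
{
    if (x < 0.0f) x = -x;                 /* the kernel is symmetric */
    if (x >= 1.5f) return 0.0f;
    if (x >= 0.5f) {
        float t = x - 1.5f;
        return 0.5f * t * t;
    }
    return 0.75f - x * x;
}

/* Midpoint-rule estimate of the kernel's total area over [-1.5, 1.5]. */
static float bspline3_area(int steps)
{
    float h = 3.0f / (float)steps;
    float sum = 0.0f;
    for (int i = 0; i < steps; i++)
        sum += bspline3(-1.5f + ((float)i + 0.5f) * h) * h;
    return sum;
}
```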

Now, if we just had a kernel and an edge, we could do a manual convolution, or we could do a few box filters. Really, we could do lots of different things, and they'd all have some inaccuracies, problems, etc.

In short, they wouldn't be right, and there would be a lot of code involved.

But I had a tiny breakthrough when I realized that, well, convolution is commutative. In short this means:

Gaussian filtering a Box is the same as Box filtering a Gaussian.

To do a huge range of operations on a 1-D Gaussian edge, all we have to do is write down the indefinite integral of the above B-Spline, and use it to compute a variety of interesting definite ones. So I set about writing down all the cases, and then wrote the following code to implement it:

typedef float real32;

// finline is not defined in this listing; assume it is an inlining macro
#ifndef finline
#define finline inline
#endif

// ================================================================================================
// integral of the B-Spline kernel (our approximation to the gaussian) over [x, infty)
// ================================================================================================
static finline real32 gi(real32 x)
{
    const real32 i6 = 1.f / 6.f;
    const real32 i4 = 1.f / 4.f;
    const real32 i3 = 1.f / 3.f;

    if (x > 1.5f) return 0.f;     // past the right end of the kernel: nothing left
    if (x < -1.5f) return 1.f;    // before the left end: the whole kernel is included

    real32 x2 = x * x;
    real32 x3 = x2 * x;

    if (x >  0.5f) return .5625f  - ( x3 * i6 - 3 * x2 * i4 + 1.125f * x);
    if (x > -0.5f) return 0.5f    - (0.75f * x - x3 * i3);	// else
                   return 0.4375f + (-x3 * i6 - 3 * x2 * i4 - 1.125f * x);
}

Well, now we can do one of two things. First, we can compute just an edge curve using the indefinite integral directly. If you know your shape has symmetry, this is the easiest thing to do. For instance, Shawn and I did the cool radial shadows on This is Not a Dream using exactly this technique -- because radial symmetry is automatic, it's easy to use just a 1-D blur.

Shadowed eye candy in Flash

...and now, back to our regularly-scheduled left-brained math.

As you might have realized, the parameter "x" above is a signed value (the distance from the edge), and you have to scale it to do what you want. For instance, if you want a wide shadow, scale it by a small number, and if you want a small one, make your scale factor bigger. In any case, I should probably rescale the whole equation so it fits in (-1, 1) but I haven't gotten around to it.

On to the races...using the indefinite integral:

As the figure says, this technique is just 1-D: you evaluate the integral once per step in X, which makes it easy to make a shadow.
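
A minimal sketch of that loop in C might look like the following. The edge_table helper and its "support" parameter (how many pixels the falloff reaches on either side of the edge) are assumptions for illustration; gi() is repeated so the block stands alone. Since the kernel spans (-1.5, 1.5), the scale factor is just 1.5 / support.

```c
/* Integral of the quadratic B-spline over [x, infinity), same math as gi(). */
static float gi(float x)
{
    if (x > 1.5f) return 0.0f;
    if (x < -1.5f) return 1.0f;
    float x2 = x * x, x3 = x2 * x;
    if (x > 0.5f)  return 0.5625f - (x3 / 6.0f - 0.75f * x2 + 1.125f * x);
    if (x > -0.5f) return 0.5f - (0.75f * x - x3 / 3.0f);
    return 0.4375f + (-x3 / 6.0f - 0.75f * x2 - 1.125f * x);
}

/* Hypothetical helper: fill table[0..n-1] with the coverage of a shadow that
   is solid to the left of `edge` and fades out within `support` pixels on
   either side of it. One gi() evaluation per step in x. */
static void edge_table(float *table, int n, float edge, float support)
{
    float scale = 1.5f / support;              /* pixels -> kernel units */
    for (int i = 0; i < n; i++) {
        float x = ((float)i + 0.5f) - edge;    /* sample at pixel centers */
        table[i] = gi(x * scale);
    }
}
```

A small support gives a large scale factor and a crisp edge; a large support gives a small scale factor and a wide, soft one.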

Next, if you want to be lazy and not worry about how to manage the center point of your object, or you want to save a few cycles in the inner loop, you can use a definite integral instead. This technique is great, and it works in the case where the blur is bigger than the object -- most systems mess this up, but you can do it correctly here.

As the figure says, this technique requires two samples per pixel, spaced at the size of the interval you're evaluating.
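
In code, the two samples are just a difference of two gi() values, one for each end of the interval. The sketch below assumes the same hypothetical "support" scaling (falloff reach in pixels) and repeats gi() so it stands alone; interval_shadow is a name made up for illustration.

```c
/* Integral of the quadratic B-spline over [x, infinity), same math as gi(). */
static float gi(float x)
{
    if (x > 1.5f) return 0.0f;
    if (x < -1.5f) return 1.0f;
    float x2 = x * x, x3 = x2 * x;
    if (x > 0.5f)  return 0.5625f - (x3 / 6.0f - 0.75f * x2 + 1.125f * x);
    if (x > -0.5f) return 0.5f - (0.75f * x - x3 / 3.0f);
    return 0.4375f + (-x3 / 6.0f - 0.75f * x2 - 1.125f * x);
}

/* Hypothetical helper: coverage at position x of the blurred interval [a, b].
   The kernel centered on x overlaps the interval between kernel coordinates
   (x - b) and (x - a), so the coverage is the integral between those two. */
static float interval_shadow(float x, float a, float b, float support)
{
    float scale = 1.5f / support;              /* pixels -> kernel units */
    return gi((x - b) * scale) - gi((x - a) * scale);
}
```

When the interval is much wider than the blur, the middle evaluates to exactly 1. When the blur is wider than the interval, the peak stays below 1, which is the case that tile-stretching approaches tend to get wrong.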

Putting it all together

Now you probably have a bunch of blurry half-planes, intervals, etc. And the only thing you have to do to put them together is, well, multiply. It's really quite easy if you've gotten this far to simply put the two pieces together and draw in real-time.

If you want to simulate what most people do in Photoshop, you should add an "offset" (mine's just an integer in x and y) to translate the shadow with respect to the image shadowing it. If you're really cool you can add a "scale" parameter to simulate how shadows are sometimes bigger than their occluders.

Secondly, you'll want to modulate the alpha -- don't draw black shadows! Usually blending at about 60% of the "black" shadow opacity looks good, but it entirely depends on the radius of the effect.

Since a modern CPU can compute a single multiply per pixel very quickly, total cost for the inner loop is:

  • two table lookups,
  • a multiply, and
  • a pixel-darkening function (which usually requires a couple more multiplies.)

On my machine, I can do this amount of work at nearly memory bandwidth. Usually you have to invert before or after to make the combination work, something like (1 - xedge * yedge).
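
To make the pieces concrete, here is a sketch of the whole thing for an 8-bit grayscale buffer. It is not the shadowrect code mentioned earlier, just an illustration under assumptions: shadow_rect and its parameters are invented names, "support" is the falloff reach in pixels, the tables are floats rather than fixed point, and the offset is left out (shift the rectangle before calling).

```c
#include <stdlib.h>

/* Integral of the quadratic B-spline over [x, infinity), same math as gi(). */
static float gi(float x)
{
    if (x > 1.5f) return 0.0f;
    if (x < -1.5f) return 1.0f;
    float x2 = x * x, x3 = x2 * x;
    if (x > 0.5f)  return 0.5625f - (x3 / 6.0f - 0.75f * x2 + 1.125f * x);
    if (x > -0.5f) return 0.5f - (0.75f * x - x3 / 3.0f);
    return 0.4375f + (-x3 / 6.0f - 0.75f * x2 - 1.125f * x);
}

/* Hypothetical routine: darken a w*h grayscale image with the soft shadow of
   the rectangle [x0, x1] x [y0, y1]. Returns 0 if allocation fails. */
static int shadow_rect(unsigned char *pix, int w, int h,
                       float x0, float y0, float x1, float y1,
                       float support, float opacity)
{
    float scale = 1.5f / support;
    float *xt = malloc(sizeof(float) * (size_t)(w + h));
    if (!xt) return 0;
    float *yt = xt + w;

    /* One blurred-interval table per axis (two gi() samples per entry). */
    for (int x = 0; x < w; x++) {
        float c = (float)x + 0.5f;
        xt[x] = gi((c - x1) * scale) - gi((c - x0) * scale);
    }
    for (int y = 0; y < h; y++) {
        float c = (float)y + 0.5f;
        yt[y] = gi((c - y1) * scale) - gi((c - y0) * scale);
    }

    /* Inner loop: two lookups, one multiply, then darken the pixel. */
    for (int y = 0; y < h; y++) {
        unsigned char *row = pix + (size_t)y * (size_t)w;
        for (int x = 0; x < w; x++) {
            float keep = 1.0f - opacity * xt[x] * yt[y];
            row[x] = (unsigned char)((float)row[x] * keep + 0.5f);
        }
    }
    free(xt);
    return 1;
}
```

With opacity 0.6, a white pixel deep inside the shadow ends up at about 40% brightness and pixels well outside it are untouched. A production version would presumably use integer tables and only touch the expanded bounds of the rectangle.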

And yep, it's easy to do using multitexture on graphics hardware.

A few other things you'll want to try: since all the tables involved are smooth, you could lerp each lookup (using texture hardware or the CPU), and make a shadow that rotates quickly or draws with subpixel-accurate edges. This is slightly less work than normal bilinear -- you only have to do 2 lerps, rather than 3.
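
A CPU version of the lerped lookup could be sketched like this. The function names are made up for illustration, and the tables are assumed to be float arrays like the per-axis edge or interval tables described earlier.

```c
/* Sample a smooth table at a fractional position with linear interpolation,
   clamping to the first and last entries. */
static float table_lerp(const float *table, int n, float pos)
{
    if (pos <= 0.0f) return table[0];
    if (pos >= (float)(n - 1)) return table[n - 1];
    int i = (int)pos;                      /* pos > 0 here, so this floors */
    float f = pos - (float)i;
    return table[i] + f * (table[i + 1] - table[i]);
}

/* Subpixel shadow coverage: one lerp per axis, then the usual multiply.
   That is 2 lerps, where bilinear filtering of a 2-D image needs 3. */
static float shadow_sample(const float *xt, int nx,
                           const float *yt, int ny, float x, float y)
{
    return table_lerp(xt, nx, x) * table_lerp(yt, ny, y);
}
```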

Secondly, a very important point: don't draw behind opaque objects. Really. Just write a rectangle difference function, and only draw the pieces that will actually show up.
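
One way to write that rectangle difference is sketched below. The Rect type (half-open integer coordinates) and the function name are assumptions for illustration. It splits the visible part of the shadow's bounds into at most four pieces: full-width bands above and below the occluder, and two side pieces in between.

```c
/* Hypothetical half-open rectangle: x0 <= x < x1, y0 <= y < y1. */
typedef struct { int x0, y0, x1, y1; } Rect;

/* Write the parts of `a` not covered by `b` into out[0..3];
   return how many rectangles were written (0 to 4). */
static int rect_difference(Rect a, Rect b, Rect out[4])
{
    int n = 0;

    /* No overlap at all: the whole of `a` is visible (if it isn't empty). */
    if (b.x0 >= a.x1 || b.x1 <= a.x0 || b.y0 >= a.y1 || b.y1 <= a.y0) {
        if (a.x0 < a.x1 && a.y0 < a.y1) out[n++] = a;
        return n;
    }

    /* Clip the occluder to `a`. */
    int cx0 = b.x0 > a.x0 ? b.x0 : a.x0;
    int cy0 = b.y0 > a.y0 ? b.y0 : a.y0;
    int cx1 = b.x1 < a.x1 ? b.x1 : a.x1;
    int cy1 = b.y1 < a.y1 ? b.y1 : a.y1;

    if (a.y0 < cy0) out[n++] = (Rect){a.x0, a.y0, a.x1, cy0};  /* top band    */
    if (cy1 < a.y1) out[n++] = (Rect){a.x0, cy1, a.x1, a.y1};  /* bottom band */
    if (a.x0 < cx0) out[n++] = (Rect){a.x0, cy0, cx0, cy1};    /* left piece  */
    if (cx1 < a.x1) out[n++] = (Rect){cx1, cy0, a.x1, cy1};    /* right piece */
    return n;
}
```

For a typical drop shadow offset down and to the right, this leaves just two thin strips to draw (bottom and right) instead of the whole shadow rectangle.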

Finally, the coolest trick: take the output of this routine (usually in 16-bit), and make a lookup table that converts the shadow to a roundrect. You have to figure that one out yourself, but it's really easy.

This article is Copyright (C) 2001 Herf Consulting LLC. All Rights Reserved.