Michael Herf and Shawn Brenneman
If you want to decompress JPEGs really fast, you have a few free options to choose from.
I was hooking up Intel's JPEG library, and on the image I was using, I noticed a significant artifact. Here's the sample I sent to Intel:
Image Copyright 2000 Phil Askey. Please see www.dpreview.com for the original image.
I'm assuming that IE5 uses the decompressor from the Independent JPEG Group, since they have a credit for it in their about box.
In any case, through my friend at Intel I got this response:
By default, the IJL uses IJL_BOX_FILTER upsampling method. To get better quality you can use IJL_TRIANGLE_FILTER method. Note, IJL_TRIANGLE_FILTER is slower than IJL_BOX_FILTER upsampling.
Fair enough.
I enabled IJL_TRIANGLE_FILTER, but then I decided I should really run a more comprehensive test of JPEG decompression speed.
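To make the quality difference concrete, here is a minimal sketch of the two upsampling styles for horizontally subsampled chroma (one chroma sample per two pixels). The function names are hypothetical and this is not the IJL or IJG source. It only illustrates the general idea: a box filter duplicates samples, while a triangle filter blends each sample with its neighbor at 3/4 and 1/4 weights.

```c
#include <stddef.h>
#include <stdint.h>

/* Box filter: each input sample is simply duplicated.
 * in holds n samples, out must have room for 2 * n samples. */
void upsample_h2_box(const uint8_t *in, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = in[i];
        out[2 * i + 1] = in[i];
    }
}

/* Triangle filter: each output is 3/4 of the nearer input sample plus
 * 1/4 of the farther one. Neighbors past either end of the row are
 * clamped to the edge sample, so the first and last outputs equal the
 * first and last inputs. */
void upsample_h2_triangle(const uint8_t *in, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        unsigned cur   = in[i];
        unsigned left  = in[i > 0 ? i - 1 : 0];
        unsigned right = in[i + 1 < n ? i + 1 : n - 1];
        /* The +1 and +2 rounding terms alternate so the rounding
         * error does not lean in one direction. */
        out[2 * i]     = (uint8_t)((3 * cur + left  + 1) >> 2);
        out[2 * i + 1] = (uint8_t)((3 * cur + right + 2) >> 2);
    }
}
```

On a hard chroma edge such as the two samples 0 and 100, the box filter produces 0, 0, 100, 100, while the triangle filter produces 0, 25, 75, 100. That smoother ramp is what removes the blocky color fringes, at the cost of a few extra operations per sample.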
The Benchmark
For this benchmark, I'm only using the image above (a quite large 2160x1440 image). If your needs lean towards decompressing small images, you shouldn't rely on these timings at all. But if you're decompressing large files, say video-resolution or greater, these numbers should be quite applicable.
Again, we tested with just one picture. All timings were done on an Intel P3/600EB. Mileage, as usual, will vary greatly by CPU and memory subsystem.
Decompressors
I used four decompressors for this test.
Off to the races
All timings include the following:
32 Bit?
Not everybody uses 32-bit images, but it's quite important to me. I think the overhead for the 24->32 bit conversion is relatively similar between the decompressors, so I haven't hurt anyone in particular by doing this.
The decompressors all required different techniques for the 24->32 bit conversion.
In particular, the most expensive conversion was an in-place 24->32 bit conversion, which took about 100ms to complete. This required a full pass through the image after it had decompressed, which was definitely out-of-cache. All of the IJL timings have this overhead. IJL does have its own 32-bit modes, but they were considerably slower, so we rolled our own.
Windows can draw directly to an ARGB bitmap, so we let it do that. IJG gives each scanline back in a separate buffer, so we did the conversions on the fly (probably faster, because the cache locality is just so much better.)
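As a rough illustration of the two conversion strategies described above, here is a minimal sketch in C. The function names, the tightly packed pixel layout, and the caller-supplied alpha value are assumptions for the example, not the code used in the benchmark. The in-place version walks backwards from the last pixel so that no 24-bit pixel is overwritten before it has been read.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand packed 24-bit pixels to 32-bit in place.
 * buf must have room for count * 4 bytes; on entry its first
 * count * 3 bytes hold the 24-bit pixels. Channel order is kept
 * and the fourth byte of each pixel is set to alpha. */
void expand_24_to_32_in_place(uint8_t *buf, size_t count, uint8_t alpha)
{
    size_t i = count;
    while (i-- > 0) {
        const uint8_t *src = buf + i * 3;
        uint8_t *dst = buf + i * 4;
        /* Read all three channels first: for small i the source
         * and destination bytes of the same pixel overlap. */
        uint8_t c0 = src[0], c1 = src[1], c2 = src[2];
        dst[0] = c0;
        dst[1] = c1;
        dst[2] = c2;
        dst[3] = alpha;
    }
}

/* Expand one 24-bit scanline into a separate 32-bit destination row.
 * Called once per decoded scanline, while that row is still in cache. */
void expand_scanline_24_to_32(const uint8_t *src, uint8_t *dst,
                              size_t width, uint8_t alpha)
{
    for (size_t x = 0; x < width; x++) {
        dst[4 * x + 0] = src[3 * x + 0];
        dst[4 * x + 1] = src[3 * x + 1];
        dst[4 * x + 2] = src[3 * x + 2];
        dst[4 * x + 3] = alpha;
    }
}
```

For the 2160x1440 test image, the in-place pass reads about 9.3 MB and writes about 12.4 MB (3,110,400 pixels at 3 and 4 bytes each), which is why it cannot stay in cache. The per-scanline version touches only one row at a time.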
And the winner...
Comments and Recommendations
Well, if you want a high-quality JPEG loader that's fast on mid-range machines, use the Independent JPEG Group's. Surprised?
Well, I am. Actually, this is doubly surprising, because in my profile, IJG's _h2v1_fancy_upsample (the "triangle filter" that takes Intel so long) uses only about 50ms total. Go figure. Maybe Intel just doesn't optimize that path much at all.
Of course, the speed of the new IJL 1.51 is truly phenomenal, especially when you consider that our 24->32 bit Unpack function is using 100ms on top of the decompress. So when you want speed at all costs, or for video applications, definitely use IJL 1.51.
Intel, however, does cost a lot in terms of code size. Also, they have a Clause of Evil in their license agreement:
"Upon Intel's release of an update, upgrade or new version of the Materials, you will make reasonable efforts to discontinue distribution of the enclosed Materials and you will make reasonable efforts to distribute such updates, upgrades or new versions to your customers who have received the Materials herein."
Well, it makes sense. They want you to distribute bug-free code (though their code still leaks about 96 bytes per JPEG).
But why is this bad?
IJL10.DLL was 148KB.
And you're obligated to upgrade all your users at your earliest convenience. So if you care about code size at all, you're at Intel's whim on this one. I'm not complaining about the performance, but it could be a nasty slide to fall down.
But, hmm, I guess VJPEG will be using IJG for a while.