Poupou's Corner of the Web

Looking for perfect security? Try a wireless brick.
Otherwise you may find some unperfect stuff here...

Weblog

More alpha bits

While the ColorMatrix class wasn't widely used, it was not the only place where libgdiplus suffered from a lack of pre-multiplication. Loading bitmaps, e.g. 32bpp PNG images, also requires pre-multiplying every R, G and B value by the alpha value. Otherwise things start to look strange or bad. The pre-multiplication process is simple but CPU intensive. It requires, for every pixel (and everyone likes bitmaps with a lot of pixels), a division (alpha / 0xff = float) and three multiplications (float * R, G, B).
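To make the cost concrete, here is a minimal sketch of that per-pixel work. It is not the libgdiplus code: the function name `premultiply_naive`, the B, G, R, A byte order and the truncating conversion back to a byte are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from libgdiplus.
   Pixels are assumed to be stored as four bytes: B, G, R, A. */
static void premultiply_naive(uint8_t *px, size_t count)
{
    for (size_t i = 0; i < count; i++, px += 4) {
        float a = px[3] / 255.0f;        /* one division per pixel */
        px[0] = (uint8_t)(px[0] * a);    /* three multiplications, */
        px[1] = (uint8_t)(px[1] * a);    /* each truncated back to */
        px[2] = (uint8_t)(px[2] * a);    /* a byte                 */
    }
}
```

A fully opaque pixel (alpha 0xff) comes out unchanged and a fully transparent one (alpha 0) becomes black, which is what the rest of the pipeline expects.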

Divisions are slow, so replacing the division with a lookup into 256 pre-computed float values (1 KB) can have a very visible impact on the time required to apply a ColorMatrix or, more commonly, to load a transparent bitmap.
Time: 0:44.1136470 seconds (see previous benchmark)
to
Time: 0:40.8741020 seconds
I suspect this difference will vary a lot depending on how well your CPU architecture handles divisions.
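The first table could look something like the sketch below. The names `alpha_scale`, `init_alpha_scale` and `premultiply_scale_table` are made up for this example, and the B, G, R, A byte order is again an assumption; the real libgdiplus table may be laid out or initialized differently (e.g. as a static constant array).

```c
#include <stddef.h>
#include <stdint.h>

/* 256 floats * 4 bytes = 1 KB: one pre-computed alpha / 255 per alpha value. */
static float alpha_scale[256];

/* Hypothetical one-time setup: all the divisions happen here. */
static void init_alpha_scale(void)
{
    for (int a = 0; a < 256; a++)
        alpha_scale[a] = a / 255.0f;
}

/* Same loop as the naive version, with the division replaced by a lookup.
   Pixels are assumed to be stored as B, G, R, A bytes. */
static void premultiply_scale_table(uint8_t *px, size_t count)
{
    for (size_t i = 0; i < count; i++, px += 4) {
        float a = alpha_scale[px[3]];    /* table lookup, no division */
        px[0] = (uint8_t)(px[0] * a);    /* still three multiplications */
        px[1] = (uint8_t)(px[1] * a);
        px[2] = (uint8_t)(px[2] * a);
    }
}
```

Only 256 divisions are ever performed, no matter how many pixels the bitmap has.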

Multiplications are faster than divisions, but we have three of them. Sadly, removing the multiplications requires a bigger table: 65536 bytes, one pre-multiplied byte for every (alpha, value) pair. It could be made smaller but would then require more time to compute, negating part of the advantage of using a table. This table also removes the need for the previous, much smaller, table.
New time: 0:35.7004520 seconds
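A sketch of the bigger table, under the same assumptions as before (B, G, R, A byte order, truncating results). The names `premul`, `init_premul` and `premultiply_full_table` are hypothetical, and filling the table with integer arithmetic is my choice for the example, not necessarily how libgdiplus builds it.

```c
#include <stddef.h>
#include <stdint.h>

/* 256 * 256 bytes = 64 KB: premul[alpha][value] = value * alpha / 255. */
static uint8_t premul[256][256];

/* Hypothetical one-time setup; integer division truncates the result. */
static void init_premul(void)
{
    for (int a = 0; a < 256; a++)
        for (int v = 0; v < 256; v++)
            premul[a][v] = (uint8_t)((a * v) / 255);
}

/* No division, no multiplication and no float conversion in the loop.
   Pixels are assumed to be stored as B, G, R, A bytes. */
static void premultiply_full_table(uint8_t *px, size_t count)
{
    for (size_t i = 0; i < count; i++, px += 4) {
        const uint8_t *row = premul[px[3]];   /* pick the row for this alpha */
        px[0] = row[px[0]];
        px[1] = row[px[1]];
        px[2] = row[px[2]];
    }
}
```

Each pixel touches a single 256-byte row, so bitmaps with long runs of the same alpha value should stay cache friendly despite the table size.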

That's almost another 20% reduction (45% from the original ColorMatrix). Now, is it worth the extra 64 KB of space in the (already more than 2.5 MB) libgdiplus binary?

If it were only for the ColorMatrix then probably not. But we get a lot more comments/bugs about libgdiplus performance than about its size, and loading transparent bitmaps correctly, without getting (too much) slower, looks worthwhile enough :-)


1/3/2007 15:21:09

The views expressed on this website/weblog are mine alone and do not necessarily reflect the views of my employer.