Poupou's Corner of the Web

Looking for perfect security? Try a wireless brick.
Otherwise you may find some unperfect stuff here...


More alpha bits

While the ColorMatrix class wasn't widely used, it was not the only place where libgdiplus suffered from a lack of pre-multiplication. Loading bitmaps, e.g. 32bpp PNG images, also requires pre-multiplying every R, G and B value by the alpha value. Otherwise things start to look strange or bad. The pre-multiplication process is simple but CPU intensive. It requires, for every pixel (and everyone likes bitmaps with a lot of pixels), a division (alpha / 0xff = float) and three multiplications (float * R, G, B).
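As a rough illustration of the per-pixel cost described above, here is a minimal sketch in C. It is not the libgdiplus code: the function name premultiply_naive, the 0xAARRGGBB pixel layout and the round-to-nearest step are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical helper, not a libgdiplus function.
 * Assumes a 32bpp pixel laid out as 0xAARRGGBB. */
static uint32_t premultiply_naive(uint32_t argb)
{
    uint32_t a = argb >> 24;

    /* One division per pixel: alpha / 0xff = float. */
    float scale = (float)a / 255.0f;

    /* Three multiplications per pixel: float * R, G, B.
     * Adding 0.5f rounds to nearest (an assumption of this sketch). */
    uint32_t r = (uint32_t)(scale * (float)((argb >> 16) & 0xff) + 0.5f);
    uint32_t g = (uint32_t)(scale * (float)((argb >> 8) & 0xff) + 0.5f);
    uint32_t b = (uint32_t)(scale * (float)(argb & 0xff) + 0.5f);

    /* Alpha itself is left untouched. */
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

A fully opaque pixel comes back unchanged, and a fully transparent one has its color channels reduced to zero.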

Divisions are slow, so removing the division, by using 256 pre-computed float values (1 KB), can have a very visible impact on the time required to apply a ColorMatrix or, more commonly, to load a transparent bitmap.
Time: 0:44.1136470 seconds (see previous benchmark)
Time: 0:40.8741020 seconds
I suspect this difference will vary a lot depending on how well your CPU architecture handles divisions.
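The idea of trading the division for a 256-entry float table could be sketched like this. The names alpha_scale, init_alpha_scale and premultiply_scale_table are hypothetical, the 0xAARRGGBB layout and the rounding are assumptions, and filling the table at startup (rather than shipping it as constant data) is just one possible choice.

```c
#include <stdint.h>

/* 256 floats = 1 KB, one scale factor per possible alpha value.
 * Hypothetical name, not taken from libgdiplus. */
static float alpha_scale[256];

/* Pays for the 256 divisions once, instead of once per pixel. */
static void init_alpha_scale(void)
{
    for (int a = 0; a < 256; a++)
        alpha_scale[a] = (float)a / 255.0f;
}

/* Assumes a 32bpp pixel laid out as 0xAARRGGBB. */
static uint32_t premultiply_scale_table(uint32_t argb)
{
    uint32_t a = argb >> 24;

    /* Table lookup replaces the per-pixel division. */
    float scale = alpha_scale[a];

    /* The three multiplications are still there. */
    uint32_t r = (uint32_t)(scale * (float)((argb >> 16) & 0xff) + 0.5f);
    uint32_t g = (uint32_t)(scale * (float)((argb >> 8) & 0xff) + 0.5f);
    uint32_t b = (uint32_t)(scale * (float)(argb & 0xff) + 0.5f);

    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

init_alpha_scale() has to run once before the first pixel is processed; after that each pixel costs one lookup and three multiplications.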

Multiplications are faster than divisions, but we have three of them. Sadly, removing the multiplications requires a bigger table, like 65536 bytes (it could be made smaller but would require more time to compute, negating part of the advantage of using a table). This table also removes the need for the previous, albeit a lot smaller, table.
New time: 0:35.7004520 seconds
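One way to get a 65536-byte table is to store one result byte for every (alpha, color) pair, so that each channel becomes a single lookup. The sketch below assumes that layout; the names premul_table, init_premul_table and premultiply_table are hypothetical, and the integer rounding and 0xAARRGGBB pixel layout are choices made for the example, not necessarily what libgdiplus does.

```c
#include <stdint.h>

/* 256 alpha values x 256 color values x 1 byte = 65536 bytes.
 * Hypothetical name and layout, assumed for this sketch. */
static uint8_t premul_table[256][256];

/* Fills the table once, using integer math rounded to nearest. */
static void init_premul_table(void)
{
    for (int a = 0; a < 256; a++)
        for (int c = 0; c < 256; c++)
            premul_table[a][c] = (uint8_t)((a * c + 127) / 255);
}

/* Assumes a 32bpp pixel laid out as 0xAARRGGBB. */
static uint32_t premultiply_table(uint32_t argb)
{
    uint32_t a = argb >> 24;
    const uint8_t *row = premul_table[a];  /* all results for this alpha */

    /* No division and no multiplication per pixel, only three lookups. */
    uint32_t r = row[(argb >> 16) & 0xff];
    uint32_t g = row[(argb >> 8) & 0xff];
    uint32_t b = row[argb & 0xff];

    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

Because the table already holds the final byte for each combination, the 1 KB float table is no longer needed.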

That's almost another 20% reduction compared to the previous benchmark's 44.1 seconds (45% from the original ColorMatrix). Now is it worth the extra 64 KB of space in the (already more than 2.5 MB) libgdiplus binary?

If it was only for the ColorMatrix then probably not. But we get a lot more comments/bugs about libgdiplus performance than about its size, and loading transparent bitmaps correctly, without getting (too much) slower, looks worthwhile enough :-)

1/3/2007 15:21:09

The views expressed on this website/weblog are mine alone and do not necessarily reflect the views of my employer.