Jump to navigation

Poupou's Corner of the Web

Looking for perfect security? Try a wireless brick.
Otherwise you may find some unperfect stuff here...

Weblog

The Full Price of FullName

Everyone using, even a small part of, Cecil knows it's amazing. Now there can be some inconveniants to use part of something - because the other parts cannot (or rarely) be totally ignored. In Gendarme's case it only uses the reading side of Cecil - but the later would not be so useful without support for writing as well.

This leads to a few things that are not optimal, from Gendarme's point of view, inside Cecil. The biggest issue is that a feature like writing support removes a lot of caching possibilities. This can be seen in Cecil's FullName properties (often used in ToString overrides) where, most of them, will re-generate (i.e. allocate a new string) the full name for each call.

This is not something new - it's actually been that way since Cecil gained write-ability long ago (GSoC 2006). I hoped the situation would be better with cecil-light (and maybe it is to some extent) but the new Mono Log Profiler (re)opened my eye to this issue recently.

Using the log profiler made it easy to see all the string allocations caused (in part) by the FullName properties. Running mono with --profile=log will enable to log profiler (see man mono for more options) and the resulting output.mlpd (default name) file contains the results. To generate a text report you then use the mprof-report tool. E.g.

mono --profile=log bin/gendarme.exe --config rules/rules.xml --set self-test \
	--log self-test.log --ignore=self-test.ignore --severity=all --confidence=all \
	bin/gendarme.exe bin/gendarme-wizard.exe bin/Gendarme.*.dll
mprof-report --traces --maxframes=8 output.mlpd > report

The resulting report log file contains lots of details but, for this blog entry, I'll focus solely on the Total memory allocated line.

Phase 1: Caching and avoiding the cache

Removing the duplicate allocations is simple. A new extention method, GetFullName, was added to call and cache FullName. However this traded memory at the expense of hitting the cache very often (i.e. extra lookup time). While updating Gendarme's code base it become quickly apparent that, in most cases, the full name was not really needed. I.e. a check for the Namespace and Name properties (both available without the string allocation cost) were enough. So to avoid the cache another new extention method, IsNamed was added. The first result were significative:

before	Total memory allocated: 84653632 bytes in 1014775 objects
after	Total memory allocated: 73385936 bytes in  905715 objects
diff				11267696 bytes     109060 objects
				   13.3%             10.7 %

Phase 2: API

Some Gendarme's framework API also promoted the use of the full name. Again to avoid hitting the cache they were changed, one by one, to use separate namespace and name parameters.

Inherits

before	Total memory allocated: 73385936 bytes in 905715 objects
after	Total memory allocated: 73422560 bytes in 906391 objects

Notice that memory usage actually grown to fix the API. That's because:

  • nothing more, allocation wise, is saved (i.e. phase 1 has all the gains);
  • some rules needs a bit more data to work with the split namespace/name versus the full name.
Still we avoid retrieving the fullnames from the cache when it (definitively) was not required.

Implements

before	Total memory allocated: 73422560 bytes in 906391 objects
after	Total memory allocated: 73425864 bytes in 906958 objects

Again a small memory increase, for the same reasons as Inherits above.

HasAttribute

before	Total memory allocated: 73425864 bytes in 906958 objects
after	Total memory allocated: 71064328 bytes in 807886 objects

Here a bit of duplicated code was removed, leading to less code to analyze (i.e. it's a self-test, running Gendarme on Gendarme), in turn requiring a bit less memory.

Contain[Any]Type

before	Total memory allocated: 71064328 bytes in 807886 objects
after	Total memory allocated: 71050024 bytes in 807698 objects

Another small drop. Some (now) unused extention methods and unrequired GetFullName usage were removed.

Cecil's HasTypeReference

Cecil itself use the FullName properties (JB removed a few cases recently, he had early access to my data ;-) and has some API that requires its use, e.g. ModuleDefinition.HasTypeReference. That could be worked around by using ModuleDefinition.GetTypeReferences and some, nice looking, LINQ-y replacements.

before	Total memory allocated: 71064328 bytes in 807886 objects
after	Total memory allocated: 72072960 bytes in 834140 objects

Memory goes up again! Why ? for the same reasons as FullName, i.e. Cecil's ModuleDefinition.GetTypeReferences and GetMemberReferences are allocating a new array on each call. Again this provides no useful value to read-only applications, like Gendarme, so the results were (again) cached.

before	Total memory allocated: 72072960 bytes in 834140 objects
after	Total memory allocated: 71285616 bytes in 817145 objects

So the final increase (the cached arrays) is a lot smaller than my first attempt :-)

Conclusion

I'll let the numbers speak for themselves:

original	Total memory allocated: 84653632 bytes in 1014775 objects
final		Total memory allocated: 71285616 bytes in  817145 objects
diff					13368016 bytes     197630 objects
					    15.8 %           19.5 %

There are likely a few other, indirect uses of FullName or GetFullName that could be avoided. I suspect most will be found (and fixed) before 2.12 is released - anyway there's no real harm in them.

Beside confirming old suspects, another fun aspect of profiling is that you'll notice a lot of things when reading the logs, some of them surprising because they challenge/defy your expectations... and give real, nice ideas for further optimizations. Stay tuned :-)


3/1/2011 20:10:23 | Comments

The views expressed on this website/weblog are mine alone and do not necessarily reflect the views of my employer.