August 2009 - Posts
We are often too liberal adding project references. In our team I’ve sent once a warning that every time an assembly reference is added to a project, an angel gets killed, but Norway does not seem to be very religious country. Recently we’ve even had a problem with an assembly loaded in two different contexts (using Load and LoadFrom), so it was definitely time to have a closer look at our dependencies. And NDepend is invaluable tool when it comes to dependency analysis.
I haven’t had much of experience with using NDepend, so I decided to gain it step by step and started with metrics for a small solution that consists of our common utility libraries. For dependency analysis purpose there is not much value in running NDepend on a set of utility classes without including code that uses those classes. Utility classes often have low cohesion and mostly reference only .NET FX types. But I wanted to test NDepend’s code quality metrics and perhaps even improve some of our old code. It worked out and I even found and fixed a bug.
But first I installed NDepend that added a new add-in to Visual Studio. Now I could activate NDepend by choosing “Go to VisualNDepend” from project-level context menu.
NDepend generates its metrics for projects included in a Visual Studio solution, and to generate useful metrics it’s important to make sure it works on a right project set (you can always exclude projects from analysis, as I did with unit and integration test projects). Then I started NDepend analysis and it responded soon with a picture familiar to anyone who used the product:
The actual dependcy graph was not very interesting for me at this point. Here is how it looked:
Again, these are just utility classes (split between 4 assemblies: Lfx.Core, Lfx.Data, Lfx.Global and Lfx.Services). NDepend also produced a table that showed number of dependent methods:
Here numbers with green background show the number of methods in the referencing assembly (specified in rows) that use methods from a referenced assembly (specified in columns), and numbers with blue background show the number of referenced methods in the referenced assembly. For example, the table above shows that there 8 methods in Lfx.Global assembly that reference 3 methods in Lfx.Data assembly. By left-clicking on cell I could generate a graph showing related methods:
I postponed more complex dependency analysis until the next time and looked at NDepend code metrics. NDepend comes with more than hundred customizable queries that check various aspects of code quality. For example, this result list displays 10 candidate methods for refactoring:
And here’s the criteria that was used in a CQL query to generate the result set above:
NbLinesOfCode > 30 OR
NbILInstructions > 200 OR
CyclomaticComplexity > 20 OR
ILCyclomaticComplexity > 50 OR
ILNestingDepth > 4 OR
NbParameters > 5 OR
NbVariables > 8 OR
NbOverloads > 6
Query parameters can be easily altered: you just place a cursor on a query parameter and NDepend displays a tooltip with a control to change it:
To make query results even more visual, NDepend highlights the areas that contain the offensive code:
I browsed metrics results and found two methods with high cyclomatic complexity (22 with recommended value not to exceed 15). This surprised me, because I did not expect utility classes to have methods with such complex logic. I checked one of them, it was an overriden Equals operator that tested two instances of the same class for equality. The respective class (called CpType) had several string field members, and the equality check consisted of several similarly looking lines of code:
public override bool Equals(object obj)
{
CpType type = obj as CpType;
if (type != null)
{
return (string.IsNullOrEmpty(this.FieldA) && string.IsNullOrEmpty(type.FieldA) || this.CodeBase.Equals(type.FieldA)) &&
(string.IsNullOrEmpty(this.FieldB) && string.IsNullOrEmpty(type.FieldB) || this.TypeName.Equals(type.FieldB)) &&
/* some lines skipped */
(string.IsNullOrEmpty(this.FieldZ) && string.IsNullOrEmpty(type.FieldZ) || this.RemoteServer.Equals(type.FieldZ));
}
else
{
return false;
}
}
It was not surprising that this code had high cyclomatic complexity: each of the following expressions increases it: if | while | for | foreach | case | default | continue | goto | && | || | catch | ternary operator ?: | ??. But how this can be avoided if a class has a large number of fields and each field should be compared in Equals method?
First, the number of fields is itself a concern. The class above was a collection of parameters used in dynamic type instantiation. It was so old that it even included parameters used to create an instance of remote DCOM server, and at the same time it was recently updated to support instantiation of WCF services. It was a good candidate not only for refactoring – for rewrite, and one of NDepend metrics brought an attention to it. But that was not all. When inspecting the Equals method code, I saw that one of similarly looking lines had a typo: string.IsNullOrEmpty(type.FieldM) instead of string.IsNullOrEmpty(type.FieldN). I must have pasted a previous line and haven’t propertly updated it. And that line with type passed all unit test, so I had another problem: insufficient number of unit tests (in fact I did not have negative unit tests for Equals method).
Following TDD practice, I first created failing unit tests, fixed them and then looked if I could reduce cyclomatic complexity of the method without class rewrite. It was possible if I created a lambda-expression encapsulating field equality test and called it for each field:
public override bool Equals(object obj)
{
CpType type = obj as CpType;
if (type != null)
{
Func Equals = (x, y) => { return string.IsNullOrEmpty(x) && string.IsNullOrEmpty(y) || string.Equals(x, y); };
return (Equals(this.FieldA, type.FieldA)) &&
(Equals(this.FieldB, type.FieldB)) &&
/* some lines skipped */
(Equals(this.FieldZ, type.FieldZ));
}
else
{
return false;
}
}
I recompiled the code and updated NDepend metrics: it no longer complained about this method. But the type is now in my refactoring TODO list.
While NDepend demonstrated its excellent quality to analyze individual classes and methods, its real value is in dependency analysis. This is what I plan to to next and describe in one of my future posts.
I was very pleased to find this Patrick Smacchia’s blog entry where he shows how his quest for 100% code coverage helped him finding a bug. His other post presents deeper arguments in favour of striving for 100% code coverage, and I think the most important point is that 100% test coverage protects from coverage erosion. Exactly! And I have a fresh example that illustrates this point. I am responsible for maintaining and extending a library of .NET utility classes that are shared across company projects. Since these classes are frequently used, they just can’t fail, so I keep them well covered by tests, and two of four assemblies have 100% code coverage. With other two it is difficult to reach the absolute maximum, but the coverage is still pretty good. Recently I extended one of not fully covered assemblies with new classes and did it in a rush, without writing many tests for newly added classes. The coverage remained over 95%, so the library looked well covered. Now if I don’t take my time and write more tests for new code, in a few months nobody will probably be interested in doing anything with this assembly. There are other important things to do, there are projects with coverage less than 20%, shouldn’t we take them instead? Well, it depends. If we instead of using project metrics take the class metrics, then we will have to admit that we introduced a poorly tested class, perhaps with less than 15% coverage. Or we can take one step further and have a look at method coverage. In so poorly tested class there may be methods with 0% coverage. These are the consequences of what Patrick Smacchia calls coverage erosion: once we don’t strive for maximum coverage, we open the door for poorly tested code, but our metrics will continue looking impressive.
I often hear an argument that trying to reach 100% code coverage is not practical and finding an acceptable code coverage level is a result of a compromise. I think this argument is dangerous, because it is mostly used not by people who stop writing automated tests when they see the rest is difficult to cover, but by people who won’t even try to cover as much as they can, because you know… it is not practical… we have other things to do. Last year I worked with a consultant whom I asked to write more tests for a module that deals with customer data retrieval. He spent several days and reported that the work was done. When I checked the module code coverage, if was about 60%. I spent few more hours and it increased to 80. I asked him why he decided not to move on with writing more tests, and he answered that since the average code coverage across company projects was approximately on that level, he thought this would be sufficient. To me this sounds like if we stop fixing bugs because the world is imperfect.
So if code coverage analysis tool exposed that the statement “a = b;” is not covered by automated tests, what could be the reasons for leaving this line uncovered?
- Is this because this statement is not used? Actually this is quite possible, code coverage reports often help figure out obsolete code that can be removed. Then we should confirm this and remove it.
- Is this because this statement is supposed to be tested during manual QA session? Then what is less expensive and more reliable: assign the test task to a human being or instruct a computer to run it every night or even on every check-in?
- Is this because this statement is hard to reach? Well, we should refactor it then. This is another advantage of code coverage reports: they often help us make code more accessible.
- Is this because this statement is triggered on external event that only occurs in production? So how do we know it will work? We probably have an access to an external test environment, but it’s not for automated tests invoked thousands times a day. Then we can mock it, write a simulator, there are well known methods to deal with this stuff.
If you consider various options to reach untested code, you will find that in most cases it is not that hard. And most of it is one time investment: you acquire technique to reach most of your code, and you will be writing code that is easier to reach.
So when people insist on practical aspects of attempting to reach maximum code coverage, I can ask them about practical aspects of whole unit testing initiative. It’s the same, really. The main reasoning behind automated testing is that we can’t beat a computer in accuracy and punctuality. If nobody seems to be willing to accept these days that a software product is developed without automated tests, then why should it be different for a class? For a method? For a single statement? Look at such statement and ask yourself a question: why am I willing to release this line of code in production without running it through an automated test? You should have very good arguments if you are willing to do so.
And if you are still not fully convinced, please read a relatively old blog post by Eric Sink, a founder of SourceGear. Well, he is a smart guy, you agree? So even though he says that he would personally stop short of a recommendation that coverage must be 100%, look at his NCover report. 100%, not less! And he writes that “getting full coverage has been worth the effort”. You won’t find in sotware industry many other people who combine such brilliant development and business skills, so I am pretty sure that when he says “I am somewhat fanatical about automated testing and code coverage, I enthusiastically recommend using them”, he has certain practical reasons for such statement.
And if you convinced, then let me share with you a piece of my personal practice: if I have two projects, one with 20% code coverage, and the other with 95% code coverage, then assuming that their quality and readiness is of equal importance, I take a look at the one with highest coverage first. If it’s so close to completion, then bringing its code coverage to 100% takes it to another level where it is protected from coverage erosion. And it’s worth it.
I wanted to analyze our company projects with NDepend and ran NDepend on a couple of our solutions. While it produced very interesting reports (I will try to cover results of NDepend analysis in one of my following posts), I've found that the solution contents were not optimal for such analysis. When analyzing project dependencies, you want to be specific about what projects are used as input. In our case every solution contains unit/integration test projects - actually nearly half of our projects are test projects, and inclusion of test projects in dependency analysis can give wrong impression about code usage: for example, some obsolete code may be reported as code in use, even though it is only called from tests. So I wanted to limit the projects being analyzed gathering all projects used in production and only them. I didn't have a solution that would satisfy me, so I decided that the easiest would be to find a utility that generates a Visual Studio solution file by scanning all projects within the specified root. Expecting to find such program on the net, I searched the Web with search pattern "generate visual studio solution". Nothing useful. I've extended the pattern with "utility", but still no luck. I still refuse to believe that nobody wrote and published such simple thing, I must have failed with good search pattern. Anyway, now I have such utility - I had to write it myself. It's very simple and does the following:
- Generates a solution file (.sln) that can be opened with Visual Studio 2008 (sorry, no VS 2005 support, but it’s a matter of changing two lines in the source);
- Includes in a solution file references to all C# projects (sorry, no VB) that reside in subfolders of a given root;
- Inclusion and exclusion filters can be specified in a command line.
The application and its source code can be downloaded from here. Here are examples of usage:
GenerateSolutionFile.exe
Usage: GenerateSolutionFile /p <path> /s <solution> [/i includeFilter] [/e excludeFilter]
GenerateSolutionFile.exe /p C:\Projects\Lfx /s C:\Projects\Lfx\Test.sln
Added project Lfx.ConfigReader.csproj
Added project Lfx.Core.csproj
Added project Lfx.Core.TestAssemblyConfig.csproj
Added project Lfx.Core.TestConfiguration.csproj
Added project Lfx.Core.Tests.csproj
Added project Lfx.Data.csproj
Added project Lfx.Data.Tests.csproj
Added project Lfx.Global.csproj
Added project Lfx.Global.Tests.csproj
Added project Lfx.Msmq.csproj
Added project Lfx.Msmq.Tests.csproj
Added project Lfx.Services.csproj
Added project Lfx.Services.Tests.csproj
Added project Lfx.TestUtils.csproj
Solution is generated, 14 projects added
GenerateSolutionFile.exe /p C:\Projects\Lfx /s C:\Projects\Lfx\Test.sln /i Core\*
Skipped project Lfx.ConfigReader.csproj
Added project Lfx.Core.csproj
Added project Lfx.Core.TestAssemblyConfig.csproj
Added project Lfx.Core.TestConfiguration.csproj
Added project Lfx.Core.Tests.csproj
Skipped project Lfx.Data.csproj
Skipped project Lfx.Data.Tests.csproj
Skipped project Lfx.Global.csproj
Skipped project Lfx.Global.Tests.csproj
Skipped project Lfx.Msmq.csproj
Skipped project Lfx.Msmq.Tests.csproj
Skipped project Lfx.Services.csproj
Skipped project Lfx.Services.Tests.csproj
Skipped project Lfx.TestUtils.csproj
Solution is generated, 4 projects added
GenerateSolutionFile.exe /p C:\Projects\Lfx /s C:\Projects\Lfx\Test.sln /e Core\*
Added project Lfx.ConfigReader.csproj
Skipped project Lfx.Core.csproj
Skipped project Lfx.Core.TestAssemblyConfig.csproj
Skipped project Lfx.Core.TestConfiguration.csproj
Skipped project Lfx.Core.Tests.csproj
Added project Lfx.Data.csproj
Added project Lfx.Data.Tests.csproj
Added project Lfx.Global.csproj
Added project Lfx.Global.Tests.csproj
Added project Lfx.Msmq.csproj
Added project Lfx.Msmq.Tests.csproj
Added project Lfx.Services.csproj
Added project Lfx.Services.Tests.csproj
Added project Lfx.TestUtils.csproj
Solution is generated, 10 projects added
GenerateSolutionFile.exe /p C:\Projects\Lfx /s C:\Projects\Lfx\Test.sln /e Core\* /e *.Tests
Added project Lfx.ConfigReader.csproj
Skipped project Lfx.Core.csproj
Skipped project Lfx.Core.TestAssemblyConfig.csproj
Skipped project Lfx.Core.TestConfiguration.csproj
Skipped project Lfx.Core.Tests.csproj
Added project Lfx.Data.csproj
Skipped project Lfx.Data.Tests.csproj
Added project Lfx.Global.csproj
Skipped project Lfx.Global.Tests.csproj
Added project Lfx.Msmq.csproj
Skipped project Lfx.Msmq.Tests.csproj
Added project Lfx.Services.csproj
Skipped project Lfx.Services.Tests.csproj
Added project Lfx.TestUtils.csproj
Solution is generated, 6 projects added
UPDATE. I have extended the utility to create solution folders. See this blog post.