Vagif Abilov's blog on .NET

Symbolic calculation in F#. Part 3: parsing and formatting expressions

NB! For some strange reason some sample code for this post was rejected by the server with “BLOCKED EXPRESSION” error. After several attempts to fix the problem, I converted code snippets to images. All three blog posts are combined now in an article that is published at CodeProject.

The first post about symbolic calcutaion in F# showed how to calculate derivatives, and the sequel demonstrated how the algebraic expressions can be simplified. Still, the most natural would be to write expressions in plain text, so the program could take an input like “sin(x ^ 2)” and generate an output “2 * x * cos(x ^ 2)”.

Let’s see how this can be approached with F#. We start with a formatter – formatting is usually easier than parsing. First we define a couple of helper functions to format operators and function names:

Code1

Then the rough implementation of the expression formatter does not take many lines of code:

Code2

There is only one problem with this code: it always surrounds algebraic operations with parenthesis, and this is only necessary in case the expression is contained in an outer expression. This is an example of redundant parenthesis:

Output1

It’s not complicated however to modify the original code, so it does not embrace top-level expressions with parenthesis:

Code3

Now we’re getting nice-looking output:

Output2

Satisfied with expression formatting, we can now proceed with expression parsing which appeared to be a more challenging tasks. First we need a tokenizer that would convert an input string into a list of tokens – atoms that will be building blocks of the resulting expression. Here is a simple tokenizer:

Code4

Output3

The tokenizer includes one rule that is specific for processing exponential functions (e ^ x). Unlike other functions (log, sin, cos), the exponent uses power operator notation, so adding proper support for it would devote large part of the post series just to this specific case. So I made a light constraint on use of exponent: its argument is always enclosed in parenthesis (so the input string should look like “e ^ (x)”, not “e ^ x”, and during the tokenization process the expression is converted into notation similar to other functions: e(x). So when proceeding with expression parsing, we won’t need to handle exponential functions in a special way.

Next step is to eleminate parenthesis and divide tokens into groups, each group representing a trivial expression construct. For example, an expression “(2 + x) * (5 - x) can be split into groups containing expressions “2 + x”, “5 – x” and the operator “*” binding them together. We achive this in a few steps: first by assigning each token a level (incremented with each opening parentheses and decremeneted with a closing parentheses), and then by putting contiguous tokens with the same level in a list. Here is the code that handles these operations and an example of its use:

Code5

Output4

We will also need some auxilliary functions: to test if a string represent an operator or a function, a couple of active pattern definitions to match numeric constants and variables, and methods to apply parsed operators or functions to expressions that they bind:

Code6

With supporting stuff in place, here’s the code that converts text input into expression trees:

Code7

Now it’s just to test how this all works:

Output5

So we’re done: we can now enter math expressions in plain text and obtain results of symbolic derivative calculation also in plain text. All in F#!

Symbolic calculation in F#. Part 2: simplifying algebraic expressions

In the previous post we have shown an F# program that calculates derivatives. It has a room for improvement in its presentation part: resulting expressions often can be rewritten in a simpler form, and F# suits very well to solve such task.

We will be using the same set of expression primitives and active pattern helpers “Op” and “Func”:

type Expression =
    | X
    | Const of float
    | Neg of Expression
    | Add of Expression * Expression
    | Sub of Expression * Expression
    | Mul of Expression * Expression
    | Div of Expression * Expression
    | Pow of Expression * Expression
    | Exp of Expression
    | Log of Expression
    | Sin of Expression
    | Cos of Expression

let (|Op|_|) (x : Expression) =
    match x with
    | Add(e1, e2) -> Some(Add, e1, e2)
    | Sub(e1, e2) -> Some(Sub, e1, e2)
    | Mul(e1, e2) -> Some(Mul, e1, e2)
    | Div(e1, e2) -> Some(Div, e1, e2)
    | Pow(e1, e2) -> Some(Pow, e1, e2)
    | _ -> None

let (|Func|_|) (x : Expression) =
    match x with
    | Exp(e) -> Some(Exp, e)
    | Log(e) -> Some(Log, e)
    | Sin(e) -> Some(Sin, e)
    | Cos(e) -> Some(Cos, e)
    | _ -> None

Now we define a recursive function Simplify that converts an input of Expression type to an output of the same type attempting to rewrite an expression in a simpler form. Like with Derivative function, the definition is a list of rules rather than a sequential algorithm:

let rec Simplify x : Expression =
    match x with
    | Add(Const(n1), Const(n2)) -> Const(n1 + n2)
    | Sub(Const(n1), Const(n2)) -> Const(n1 - n2)
    | Mul(Const(n1), Const(n2)) -> Const(n1 * n2)
    | Div(Const(n1), Const(n2)) -> Const(n1 / n2)
    | Neg(Const(0.)) -> Const(0.)
    | Neg(Neg(e)) -> e |> Simplify
    | Add(e, Const(0.)) -> e |> Simplify
    | Add(Const(0.), e) -> e |> Simplify
    | Add(Const(n), e) -> Add(e, Const(n)) |> Simplify
    | Add(e1, Neg(e2)) -> Sub(e1, e2) |> Simplify
    | Add(Neg(e1), e2) -> Sub(e2, e1) |> Simplify
    | Sub(e, Const(0.)) -> e |> Simplify
    | Sub(Const(0.), e) -> Neg(e) |> Simplify
    | Mul(e, Const(1.)) -> e |> Simplify
    | Mul(Const(1.), e) -> e |> Simplify
    | Mul(e, Const(0.)) -> Const(0.)
    | Mul(Const(0.), e) -> Const(0.)
    | Mul(e, Const(n)) -> Mul(Const(n), e) |> Simplify
    | Mul(Div(Const(n), e1), e2) -> Mul(Const(n), Div(e2, e1)) |> Simplify
    | Mul(e1, Div(Const(n), e2)) -> Mul(Const(n), Div(e1, e2)) |> Simplify
    | Mul(Neg(e1), e2) -> Neg(Mul(e1, e2)) |> Simplify
    | Mul(e1, Neg(e2)) -> Neg(Mul(e1, e2)) |> Simplify
    | Div(Const(0.), e) -> Const(0.)
    | Div(e, Const(1.)) -> e |> Simplify
    | Div(Neg(e1), e2) -> Neg(Div(e1, e2)) |> Simplify
    | Div(e1, Neg(e2)) -> Neg(Div(e1, e2)) |> Simplify
    | Pow(Const(0.), e) -> Const(0.)
    | Pow(Const(1.), e) -> Const(1.)
    | Pow(e, Const(0.)) -> Const(1.)
    | Pow(e, Const(1.)) -> e |> Simplify
    | Op (op, e1, e2) 
        ->
        let e1s = Simplify e1
        let e2s = Simplify e2
        if e1s <> e1 || e2s <> e2 then
            op(Simplify e1, Simplify e2) |> Simplify
        else
            op(e1, e2)
    | _ -> x

The set of rules is not complete. For example, there is no rule to rewrite “2 * (3 + X)” as “6 + 2*X”, but it should be enough to eliminate most obvious redundancies, such as multiplication by one and zero addition. So if we can fire F# interactive window, we can test how it works:

> let s = Simplify(Add(Mul(Const 0., X), Mul(Const 5., Const 1.)));;

val s : Expression = Const 5.0

What we can do now is extend the Derivative function written in the previous post, so it can take advantage of our new Simplify function:

let rec Derivative x : Expression =
    let y =
        match x with
        | X -> Const(1.)
        | Const(n) -> Const(0.)
        | Neg(e) -> Neg(Derivative(e))
        | Add(e1, e2) -> Add(Derivative(e1), Derivative(e2))
        | Sub(e1, e2) -> Sub(Derivative(e1), Derivative(e2))
        | Mul(e1, e2) -> Add(Mul(Derivative(e1), e2), Mul(e1, Derivative(e2)))
        | Pow(e, Const(n)) -> Mul(Const(n), Pow(e, Const(n-1.)))
        | Exp(X) -> Exp(X)
        | Log(X) -> Div(Const(1.), X)
        | Sin(X) -> Cos(X)
        | Cos(X) -> Neg(Sin(X))
        | Func(g, f) -> 
            let dg = Derivative(g(X))
            let df = Derivative(f)
            match dg with
            | Func(dgf, dge) -> Mul(dgf(f), df)
            | Op (op, e1, e2) -> Mul(op(e1, e2), df)
            | _ -> failwith(sprintf "Unable to match compound function [%A]" dg)
        | _ -> failwith(sprintf "Unable to match expression [%A]" x)
    Simplify y
Now let's test what we have by calculating derivative for various functions:
> let d1 = Derivative(Add(Mul(Const(5.), X), Const(3.)))
let d2 = Derivative(Add(Pow(X, Const(3.)), Const(3.)))
let d3 = Derivative(Sin(Mul(Const(2.), X)))
let d4 = Derivative(Log(Mul(Const(2.), X)))
let d5 = Derivative(Exp(Mul(Const(2.), X)))
let d6 = Derivative(Exp(Pow(X, Const(2.))))
let d7 = Derivative(Log(Sin(X)))
let d8 = Derivative(Log(Cos(X)));;

> 
val d1 : Expression = Const 5.0
val d2 : Expression = Mul (Const 3.0,Pow (X,Const 2.0))
val d3 : Expression = Mul (Const 2.0,Cos (Mul (Const 2.0,X)))
val d4 : Expression = Div (Const 2.0,X)
val d5 : Expression = Mul (Const 2.0,Exp (Mul (Const 2.0,X)))
val d6 : Expression = Mul (Exp (Pow (X,Const 2.0)),Mul (Const 2.0,X))
val d7 : Expression = Div (Cos X,X)
val d8 : Expression = Neg (Div (Sin X,X))

Can we improve anything? Well, maybe… Since it’s getting so easy, why not set an ultimate goal: present input and output as a plain text, not as Expression items. So we can pass “log(cos(x))” and get back “-sin(x)/x”. I guess this will be slightly more work to achieve, but should be fun once we get there. In the next part.

Symbolic calculation in F#. Part 1: derivatives

I am learning F#, and one of the best areas to use a functional language is to apply it to symbolic calculations. Like evaluating function derivatives. I remember how I was impressed many years ago when I looked at a program in Prolog that occupied not more than one computer screen but could tell me that derivative of sin(x) was cos(x). So I wanted to do the same in F#.

If you ask a developer who is only using procedural languages to write a program that calculates derivatives in a symbolic form, chances are pretty good that he will start with parsing input strings. That’s the nature of procedural languages: you solve a task by building a sequence of steps that it takes to convert input to output. A developer focusing on a higher level abstraction migh start with building classes for expression trees, so he can implement derivative calculation on the top of them. But what derivative calculation has with creating classes to store tree nodes? Why should we care?

In procedural languages we need to care. As Niklaus Wirth stated in his book title, "Algorithms + Data Structures = Programs". So we express our imperative programs in algorithms and apply them to data structures.

With F# it’s completely different. You simply describe your domain, your rules. And things just happens…

So we’ll do derivatives. First we need to define the scope of our calculations, our symbols. Note that Expression type doesn’t define a data structure – it’s just a collection of symbols that we will be using to form functional expressions.

type Expression =
    | X
    | Const of float
    | Neg of Expression
    | Add of Expression * Expression
    | Sub of Expression * Expression
    | Mul of Expression * Expression
    | Div of Expression * Expression
    | Pow of Expression * Expression
    | Exp of Expression
    | Log of Expression
    | Sin of Expression
    | Cos of Expression

What we just declared is called in F# discriminated union. I won’t be spending time on F# syntax details, I am learning F# from two great books: "Programming F#" and "Real World Functional Programming: With Examples in F# and C#" (I started with the second one but had to switch to the first one that in my opinion provides a better transition to a functional programming and F# style).

As you can see, we made some restrictions to expression syntax: we assume that numeric constants are of type float, and we only support four functions: exponent, natural logarithm, sine and cosine. But this should be sufficient for a purpose of language demo. In addition, there is only one variable symbol (X).

Before we proceed with derivative definition, there is a couple of helper construction I’d like to define. We may need to have a common match for binary operators (Add, Sub, Mul, Div) and supported functions (Exp, Log, Sin, Cos). So we’ll define so called active patterns that can be used to match such constructs:

let (|Op|_|) (x : Expression) =
    match x with
    | Add(e1, e2) -> Some(Add, e1, e2)
    | Sub(e1, e2) -> Some(Sub, e1, e2)
    | Mul(e1, e2) -> Some(Mul, e1, e2)
    | Div(e1, e2) -> Some(Div, e1, e2)
    | Pow(e1, e2) -> Some(Pow, e1, e2)
    | _ -> None

let (|Func|_|) (x : Expression) =
    match x with
    | Exp(e) -> Some(Exp, e)
    | Log(e) -> Some(Log, e)
    | Sin(e) -> Some(Sin, e)
    | Cos(e) -> Some(Cos, e)
    | _ -> None

Again, I am afraid I can’t use this post to define the meaning of active patterns, banana clips (“(|” and “|)”), and options syntax (“Some” and “None”), there is a lot of information online, but you can think about the definitions above as something similar to regular expression matching: an expression “Add(e1, e2)” will be matched as “Op” (Add, e1, e2), and an expression “Exp(e)” is matched as “Func (Exp, e)”. We will see in a moment how this can become useful.

Now what’s left is just a definition of a derivative. Here it is:

let rec Derivative x : Expression =
    match x with
    | X -> Const(1.)
    | Const(n) -> Const(0.)
    | Neg(e) -> Neg(Derivative(e))
    | Add(e1, e2) -> Add(Derivative(e1), Derivative(e2))
    | Sub(e1, e2) -> Sub(Derivative(e1), Derivative(e2))
    | Mul(e1, e2) -> Add(Mul(Derivative(e1), e2), Mul(e1, Derivative(e2)))
    | Pow(e, Const(n)) -> Mul(Const(n), Pow(e, Const(n-1.)))
    | Exp(X) -> Exp(X)
    | Log(X) -> Div(Const(1.), X)
    | Sin(X) -> Cos(X)
    | Cos(X) -> Neg(Sin(X))
    | Func(g, f) -> 
        let dg = Derivative(g(X))
        let df = Derivative(f)
        match dg with
        | Func(dgf, dge) -> Mul(dgf(f), df)
        | Op (op, e1, e2) -> Mul(op(e1, e2), df)
        | _ -> failwith(sprintf "Unable to match compound function [%A]" dg)
    | _ -> failwith(sprintf "Unable to match expression [%A]" x)

As you can see, it’s not an algorithm – it’s a description of what derivative is. You can also why we needed to introduce “Op” and “Func” active patterns: they are used in declaration of complex function derivative evaluation. Without these patterns we would need to list all supported operators and functions.

To test how this works, we can open a F# interactive window (FSI) and type a function to test:

> let d = Derivative(Sin(X));;

val d : Expression = Cos X

> let d = Derivative(Exp(Pow(X,Const(2.))));;

val d : Expression = Mul (Exp (Pow (X,Const 2.0)),Mul (Const 2.0,X))

That’s it! But while we are on symbolic calculations, we can improve the presentation part of it. Right now, if there is no attempt to simplify resulting expressions, so if we calculate a derivative of an expression corresponding to “5 * x + 3”, we will get a correct but silly looking answer:

> let d = Derivative(Add(Mul(Const(5.), X), Const(3.)));;

val d : Expression =
  Add (Add (Mul (Const 0.0,X),Mul (Const 5.0,Const 1.0)),Const 0.0)

But if it was so simple to calculate derivatives in F#, it should not be difficult to write a function to simplify algebraic expressions. That will be in the next part.

Links from Twitter and Google Reader

Behavior Driven Development (BDD) with SpecFlow and ASP.NET MVC.

Extensible Web portal for gathering and displaying information about source code projects.

Kanban and When Will This Be Done? (Dennis Stevens on making predictable feature estimates).

Good MVP vs. MVC comparison by Glenn Block on Stackoverflow.

Mikado method for large scale refactoring (so silly of me to miss a workshop on this at XP2010.

Is event versioning as costly as SQL schema versioning?

We had a discussion about CQRS and event sourcing, and there was a concern raised about cost of event versioning. Yes, with event sourcing we can get rid of relational database schema on the write side of the system, saving us from the maintenance of the SQL schema, but won’t the saved effort be used on the maintenance of events?

I don’t think so. Or at least I don’t think the cost will be as high. Let’s see why.

First, one of the frequent reasons to change data schema is refactoring tables to improve extensibility and expressiveness of the relational model. It does not however necessarily change any events. For example, the first version of the Internet shop could have all customer details placed in a table Customer:

In the second version it was decided to extract address details into a separate table, because in the future customers should be able to register several address:

In a system based on a relational database this change will require writing an upgrade script. But a system based on event sourcing will handle same events:

void CustomerRegistered(CustomerRegistrationDetails details);
void CustomerAddressUpdated(CustomerAddressDetails details);

I believe that systems where entities represent state transition rather than state are less affected by changes of internal internal data model. What is essential for such systems is define events right, so they reflect state transitions as seen by parties involved in a use case, and not state changes of internal data structures.

Second, and more interesting case is when changes to the system require event versioning and may or may not require versioning of a relational database schema. For example, let’s say in the third version of the Internet shop customers are required to register both shipping and billing addresses:


You will most likely use the same “Address” table to store all address types. Then you will have to upgrade existing customer data. How will you interpret old Address records that don’t specify the address type? You will either introduce a new address type (“Unspecified”) or have to make a qualified guess, for example treat all addresses without types as shipping addresses. But you will have to run another upgrade script, this time it will be data upgrade, not schema upgrade.

In case of event sourcing system you may want to be more expressive with events:

void CustomerRegistered(CustomerRegistrationDetails details);
void CustomerShippingAddressUpdated(CustomerAddressDetails details);
void CustomerBillingAddressUpdated(CustomerAddressDetails details);

But we missed the old event here: CustomerAddressUpdated. We could also make a qualified guess here, assume all already registered addresses are shipping addresses and rename the event, but I don’t think this is a right apporach. We should not redefine semantics of old events and instead include them in the event list:

void CustomerRegistered(CustomerRegistrationDetails details);
void CustomerAddressUpdated(CustomerAddressDetails details);
void CustomerShippingAddressUpdated(CustomerAddressDetails details);
void CustomerBillingAddressUpdated(CustomerAddressDetails details);

So the code that builds customer object from the event stream should understand both obsolete CustomerAddressUpdated event and new CustomerShippingAddressUpdated and CustomerBillingAddressUpdated events. Will it really require greater development effort than upgrading existing customers in a relational database? I don’t think so. But retaining historical events in the system has an important advantage of being able to reproduce exactly what happenned to the customer with option to correct possible inconsistency on the fly (for example by asking him to specify the type of previously registered address) rather than bending its shape to fit an updated relational data model.

Simple Rule Editor goes to BitBucket

About a year ago I wrote a blog post where I described a simple WYSIWIG rule editor that can be used to manage rules applied using WF rule engine. After that I received a few responses with comments and questions about using WF rule engine in .NET applications. Apparently this technique was considered useful in various projects, and I even presented it on a recent NNUG session in Oslo.

Few days ago I received a mail from one developer who complained that my rule enging wrapper was unable to handle Nullable<T> types. I never invested enough time to make the wrapper handle such issues, the whole thing was more like a demonstration of a concept. However, since it is now used by others, I decided to have more solid approach to it and moved the project to BitBucket where it’s source can be obtained from here. I also updated WF rule engine wrapper to correctly handle converstion to nullable types. This is a common problem. Strings can not be converted to a generic Nullable<T> type using Convert.ChangeType call. I had to modify RyleType.SetPropertyValue method that now looks like this:

public void SetPropertyValue(RuleObject ruleObject, int index, object value)
{
    object propertyValue;
    if (this.Properties[index].Type.IsEnum)
    {
        propertyValue = Enum.Parse(this.Properties[index].Type, value.ToString());
    }
    else if (this.Properties[index].Type.IsGenericType && this.Properties[index].Type.GetGenericTypeDefinition().Equals(typeof(Nullable<>)))
    {
        System.ComponentModel.NullableConverter nullableConverter = new System.ComponentModel.NullableConverter(this.Properties[index].Type);
        propertyValue = System.Convert.ChangeType(value, nullableConverter.UnderlyingType);
    }
    else
    {
        propertyValue = Convert.ChangeType(value, this.Properties[index].Type);
    }
    this.Properties[index].PropertyInfo.SetValue(ruleObject.PropertyObjects[this.Properties[index]], propertyValue, null);
}

Note the block that handles generic types. It doesn’t seem that conversion from string can be handles easier.

Impression from Typemock Academy

I have written a guest post about my impressions from Typemock Academy that was published in Typemock blog.

Using T4 templates to manage assembly version information

I wanted to configure version information generation for some .NET projects. It’s been a long time since I investigated available options, so I searched around hoping to find some simple way of doing this. What I’ve found didn’t look very encouraging: people write Visual Studio add-ins and custom MsBuild tasks just to obtain one integer number (okay, maybe two). This felt overkill for a small personal project.

The inspiration came from one of the StackOverflow discussions where somebody suggested that T4 templates could do the job. And of course they can. The solution requires a minimal effort and no Visual Studio or build process customization. Here what should be done:

1. Create a file with extension ".tt" and place there T4 template that will generate AssemblyVersion and AssemblyFileVersion attributes:

<#@ template language="C#" #>
// 
// This code was generated by a tool. Any changes made manually will be lost
// the next time this code is regenerated.
// 

using System.Reflection;

[assembly: AssemblyVersion("1.0.1.<#= this.RevisionNumber #>")]
[assembly: AssemblyFileVersion("1.0.1.<#= this.RevisionNumber #>")]
<#+
    int RevisionNumber = (int)(DateTime.UtcNow - new DateTime(2010,1,1)).TotalDays;
#>

You will have to decide about version number generation algoritm. For me it was sufficient to auto-generate a revision number that is set to the number of days since January 1st, 2010. As you can see, the version generation rule is written in plain C#, so you can easily adjust it to your needs.

2. The file above should be placed in one of the projects. I created a new project with just this single file to make version management technique clear. When I build this project (actually I don’t even need to build it: saving the file is enough to trigger a Visual Studio action), the following C# is generated:

// 
// This code was generated by a tool. Any changes made manually will be lost
// the next time this code is regenerated.
// 

using System.Reflection;

[assembly: AssemblyVersion("1.0.1.113")]
[assembly: AssemblyFileVersion("1.0.1.113")]

Yes, today it’s 113 days since January 1st, 2010. Tomorrow the revision number will change.

3. Next step is to remove AssemblyVersion and AssemblyFileVersion attributes from AssemblyInfo.cs files in all projects that should share the same auto-generated version information. Instead choose “Add existing item” for each projects, navigate to the folder with T4 template file, select corresponding “.cs” file and add it as a link. That will do!

What I like about this approach is that it is lightweight (no custom MsBuild tasks), and auto-generated version information is not added to source control. And of course using C# for version generation algorithm opens for algorithms of any complexity.

How to make .NET application support both console and GUI mode

I am currently working on a next version of a utility to generate Visual Studio solutions. A new version will be called Solution Maker and add user interface support to a command line option. I wanted to mix console and GUI mode in a single program, but strictly speaking this is impossible. A program is assigned its type at compile time, so any attempt to change it’s behavior at execution time does not alter the application’s nature. A console application may create Windows forms, and a Window-based application may allocate consoles, but they won’t become ducks even if they quack.

But Visual Studio does it, don’t it? You can launch IDE and still you can run “devenv” from a command-line. How does it do it?

Well, actually it doesn’t! Here’s an explanation of the trick. There two binaries: devenv.com and devenv.exe. “Com” is always probed first, and this is the console one. So when you type “devenv” it’s a console version that will be executed. And if it does not get any command-line input, it simply launches devenv.exe. Smart!

And this can be done with any application. Here are the steps:

  1. Write an application with user interface, let’s say it’s called MyProgram.exe. This program should not have anything special.
  2. Write a command line application MyProgram.Command.exe. This program needs a few special lines, we will look at them.
  3. Rename MyProgram.Command.exe to MyProgram.com and copy it to the folder where GUI MyProgram.exe resides.

That’s it. And of course, command-line version of MyProgram needs some special code. Here it is:

class Program
{
    static void Main(string[] args)
    {
        string relayProcessPath = null;

        // Only relay to another application if no command-line arguments are specified
        if (args.Length == 0)
        {
            string thisProcessPath = Process.GetCurrentProcess().MainModule.FileName;
            if (Path.GetExtension(thisProcessPath).ToLower() == ".com")
            {
                // Relay process should only differ in extension
                string expectedRelayPath = Path.ChangeExtension(thisProcessPath, ".exe");
                if (File.Exists(expectedRelayPath))
                {
                    relayProcessPath = expectedRelayPath;
                }
            }
        }

        if (!string.IsNullOrEmpty(relayProcessPath))
        {
            // Launch relay process
            Process.Start(relayProcessPath);
        }
        else
        {
            // Process with a command line
            Console.WriteLine("Hello from command line");
        }
    }
}
Source code in the cloud

It’s difficult to resist giving a try to Git or Mercurial these days, and I decided to start moving the code of my personal projects to DVCS. I’ve chosen Bitbucket as a remote storage for my repositories. It seems to have very good reputation, and it’s subscription plans include a free plan for a single private repository (up to 1 GB) with upgrades starting as low as $5 a month.

I am just making my first steps with Mercurial, so I read Joel Spolsky’s Hg Init introduction, but Bitbucket also contains guidelines that make setting up repository a simple excercise.

It’s easy enough to manage source control using Hg commands, but TortoiseHg installs a Windows shell extension that turns Windows Explorer into a source control management tool.

And finally, for those spoilt by Visual Studio source control integration, VisualHG is a Mercurial plug-in for Visual Studio.

I installed all of the above packages and was surprised how quickly I was up and running my source against remote Mercurial repository at Bitbucket. And I definitely enjoy the experience. Comparing to TFS I like the mobile nature of my local sources. Like every centralized version control system, TFS prefers its users to be connected. You can bring your projects offline, but this is considered to be a temporary state. As soon as you can connect to TFS, you are supposed to do it so it can synchronize your changes. Distributed version control systems have different phylosophy, there are no connected and disconnected users, and you can work and commit your changes locally and only push them to a remote repository for planned release or synchronization.

One thing for sure: no more source control at home. In 1993 I bought a Microsoft Delta license for my personal use. Later I upgraded it to SourceSafe (at that time not yet “Visual”). All this time I had source code repository on one of my home machines. No more. “hg push” command will make Bitbucket take care of my source.

Typemock Academy. In Oslo!

I hope that volcano (how is it called by the way? Ey… Eja… Ejafaja… oh no! it’s Eyjafjallajokull – can anyone actually manage it without Google?) won’t disturb air traffic in Norway in the end of April. Because on 27th and 28th Typemock is arranging Typemock Academy. With special appearance of one-and-only Uncle Bob. And who do you think will be one of the guest speakers? Well, I am modest, I won’t tell you.

Anyway, I got 60 minutes to share my testing experience. “A Road to True Unit Test”. This is a preliminary title of my presentation. Working on details…

Don’t use Activator.CreateInstance or ConstructorInfo.Invoke, use compiled lambda expressions

For a long time I’ve been under impression that rule engine that comes with Microsoft Windows Workflow Foundation is slow, very slow. We used it to execute some of our business rules, and soon found out that rule processing slows down application execution. What was strange is that the rules were really simple and it was hard to believe that an industry-strength rule engine uses so long time on processing them.

Eventually I ran a profiler, and it immediately showed where the time was spent. In fact it was not the actual rule validation but preparation for it: instantiation of the WF rule parser. The parser class is not public, so we used a trick to obtain its instance: retrieved its non-public constructor information via reflection and then called Invoke.

This was sloooow. So slow that we noticed it in our integration tests. What made it noticeable is that after we converted some of our business rules to use WF rule engine, we were encouraged by results and move more rules to use the same techinque. In the end rule validation was called many times during a single business operation, and performance deteriorated.

My first approach to this issue was to cache validated rules, and it worked. The total time to execute integration tests was reduced by 50%. However, I was not quite happy with the situation: we have various places where objects are instantiated using reflection, caching is not always possible and in general complicates component design. Recently I reviewed the code of StructureMap (I like reading good code), and remembered that it no longer used reflection and instead instantiated dependencies using compiled lambda expressions.

Roger Alsing in this post presented a generic method that can be used as a replacement for constructor method invocation. The essence of the method is in it’s last three lines:

// Make a NewExpression that calls the ctor with the args we just created
NewExpression newExp = Expression.New(ctor, argsExp);                  

// Create a lambda with the New expression as body and our param object[] as arg
LambdaExpression lambda = Expression.Lambda(typeof(ObjectActivator), newExp, param);              

// Compile it
ObjectActivator compiled = (ObjectActivator)lambda.Compile();

Although lambda expressions are part of LINQ (and you have to import System.Linq.Expressions namespace to manage them), as we can see from this example, they can be very useful to solve core language tasks, such as object instantiation.

But what is the gain? The gain is huge bringing the performance close to the speed of native IL code. I wrote a few tests to measure performance of object creation in different scenarios: by calling ‘new’ constructor, Activator.CreateInstance, ConstructorInfo.Invoke and using compiled lambda expression. Here are the results:

DefaultConstructor_Activator: (0,20 ms per 1000 calls)
DefaultConstructor_CompiledExpression: (0,04 ms per 1000 calls)
DefaultConstructor_Invoke: (1,07 ms per 1000 calls)
DefaultConstructor_New: (0,02 ms per 1000 calls)
DefaultConstructor_NotCompiledExpression: (169,00 ms per 1000 calls)
NonDefaultConstructor_Activator: (3,39 ms per 1000 calls)
NonDefaultConstructor_CompiledExpression: (0,07 ms per 1000 calls)
NonDefaultConstructor_Invoke: (1,57 ms per 1000 calls)
NonDefaultConstructor_New: (0,02 ms per 1000 calls)
NonDefaultConstructor_NotCompiledExpression: (293,00 ms per 1000 calls)

Results says it all, especially when arranged in a graph.

Chart

I removed from the graph the results for not compiled lambda expressions, because they make other figures insignificant. But it is important to remember that compilation should be performed only once, so the code should be carefully reviewed to avoid occasional recompilcation of lambdas.

P.S. The title of this blog post may look provocative, and of course I don’t really mean somebody should completely stop using Activator.CreateInstance. Compilation of labmda expressions is expensive enough to limit it’s practical use. But there are certain use cases (like IoC frameworks) when it gives clear performance advantage.

First look at NDepend 3.0

I installed to today the latest version of NDepend and gave it a try. Last time I blogged about NDepend, I used it on a small solution, and while the generated report was full of useful information, I did not use the product at full strength, because as it’s name suggests, NDepend is about analyzing dependencies, and my solution was too small for interesting observations. This time I plan to analyze a large solution consisting of all of our projects with an exception of a Web administration projects that are not so interesting for such analysis. The solution file was generated using a utility described earlier and contains 140 projects. I never worked with such large solutions (usually I group projects in much smaller solutions), so I was a bit uncertain about how long it will take to load on my Dell D830 notebook. But it started up relatively quickly.

The biggest news about NDepend 3.0 is that it is now fully integrated into Visual Studio development environment. So now all its menus and windows are parts of development session, and windows of course can be docked.

NDepend10

The very first impression from NDepend: it’s fast, blazingly fast. As you can see from the summary below, it used only 18 seconds to analyze 1188 source files in 140 projects of my solutions: 15 milliseconds on each file.

NDepend02

So these are the metrics for my large solution, though I disagree with one detail: the report states that the solution has only 1 attribute class. Yes, it has only one class with .NET Attribute as its immediate parent, but there are several other classes that derive from Attribute, but not directly. One class derives from TypeMock.DecoratorAttribute that inherits from Attritbute. In addition there are several classes inheriting from PostSharp.Laos.OnMethodBoundaryAspect, and walking its inheritance tree brings us to MulticastAttribute that inherits from Attribute. I believe all such classes should be classified as attribute classes (since this is how they are used).

Enough with class attributes. I browsed the report and noticed that I’d like to change a few metrics’ parameters right away. For example, a list of too long type names included all types with names longer than 35 characters. While this seems to be a reasonable limit, some exclusions from such rule can also be reasonable. The top 10 entires in this list ended with either “Exception”, “Preset”, “Response” or “Request”. “Preset” is a suffix for a special category of classes that we use in tests, “Response”/”Request” are used in Web service communication, and “Exception” is of course a suffix for exceptions. I want to see classes with “really” long names, so how do I exclude these special classes from the rule?

It appeared to be very easy. I displayed a CQL Query Explorer and double-clicked on the rule. NDepend displayed a CQL Query Edit window, and it was obvious what I had to do to customize the rule. Below is an updated query:

NDepend04

The simplicity of rule customization encouraged me to start inspecting other report’s details. One of the most important ones was the list of methods to refactor. And it also had to be customized. Because this is what I saw:

NDepend05

Firstly, I think the table will look more informative if it highlights the figures that violates the metrics. Otherwise you have to browse every column and compare its data with the values from the respective CQL query. But what deserves customization is that the top of the list is occupied by class constructors with number of parameters exceeding recommended maximum (5). Should an exception be made for constructors? I believe it should, at least these days, when developers tend to use IoC containers and compose large applications with constructor injection. But how do I specify constructor exclusion? Luckily, I didn’t even have to browse NDepend online documentation - NDepend supports intellisense.

NDepend08

Exclusion constructors from the query resulted in much more interesting list of methods (below). They are candidates for refactoring for different reasons: number of lines, number of IL instructions, netsing deptsh, number of variables, etc. Again, I would appreciate if offending figures were highlighted.

NDepend07

It took me just minutes to apply these customizations, and what’s most encouraging is that I did not have to read anything about what these metrics are and how to adjust them to fit my needs: everything is intuitive, with additional explanation text shown along the queries and reports. I need more time to interpret metrics  and diagrams related to dependencies – the solution is too big to give a simple picture. I leave it to a second look. To be continued…

Using “yield” to enumerate endless sequence

Craig Andera in his blog post showed yet another Fibonacci algorithm, this one with “yeild” operator.

private IEnumerable Fibonacci()
{
    yield return 0;
    yield return 1;

    int a = 0;
    int b = 1;
    while (true)
    {
        int temp = a + b;
        a = b;
        b = temp;
        yield return b;
    }
}

Now it's possible to fetch Fibonacci numbers in this manner:

static void Main(string[] args)
{
    foreach (int a in Fibonacci())
    {
        Console.Write(a);
        Console.Write(" more (y/n)?");

        string more = Console.ReadLine();
        if (more.ToUpper() != "Y")
            break;
    }
}

As you can see, code in Main procedure uses “foreach” statement, but the Fibonacci sequence is endless, so it can’t be populated in advance. Without “yield” we would have to create a temporary state variable (actually two: to store “a” and “b”) and pass them to a GetNextFibonacci that would produce a next number and return updated “a” and “b”. But with yield it’s possible to compute results on demand.

Posted: Thu, Mar 11 2010 10:41 AM by Vagif Abilov | with no comments
Filed under: ,
NUnitForVS: integrating NUnit tests into Visual Studio

Roy Osherove listed advantages of NUnit over MsTest but also mentioned one MsTest’s strength that can be crucial for many developers: “the integration with other team system tools and reporting is just beyond compare and the reporting alone helps alot to find recurring breaking tests, code churn vs. new tests and others”. This reminded me about what I recently read in a book by Jeff McWherter and Ben Hall Testing ASP.NET Web Applications where they said that forthcoming release of Visual Studio 2010 will make it possible to configure VS test runner to execute tests that use syntax different from MsTest.

I am a faithful user of TestDriven.Net, but I know that ability to use built-in Visual Studio test runner is an important factor that may affect the choice of unit test framework. So it would be nice to separate these decisions: selection of a unit test framework and selection of a test runner. I asked in Microsoft’s VS 2010 forum it new version of Visual Studio is really that flexible when it comes to configuring it’s test runner. Unfortunately not. Microsoft’s Euan Garden answered: “this was something we wanted to do in the release but never made it into the product.” However, Euan gave a couple of hints: Gallio and NUnit integration CodePlex project.

I checked CodePlex and found NUnitForVS: NUnit integration for Visual Studio 2008. I downloaded an ran the installer and within 5 minutes was able to make Visual Studio development environment to treat NUnit tests as they were native MsTest. The only preparation (in addition to installing NUnitForVS) was to open test project file in a text editor and add the following entry to the first PropertyGroup:

<ProjectTypeGuids>{3AC096D0-A1C2-E12C-1390-A8335801FDAB};{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}</ProjectTypeGuids>

This has to be done to every test project that uses NUnit, but otherwise Visual Studio won’t be able to classify the project as a test project (why should it? – the only type of test projects that it natively recognizes is MsTest). After this change Visual Studio accepts NUnit tests, and you will see all your tests in the Test View window:

NUnitForVS1

So far so good. Next checkpoint is to run and debug some of these tests. This also works well and test results are displayed in a respective window:

NUnitForVS2

Note the very last test in this list, the one that is called “Add(2,3,5)”. This is a parameterized test implemented using TestCase attribute:

[TestCase(2, 3, 5)]
public void Add(int x, int y, int sum)
{
    Calculator calc = new Calculator();
    decimal result = calc.Add(x, y);
    Assert.AreEqual(sum, result, "Incorrect result");
}

So using NUnitForVS we can exploit features of NUnit 2.5 that don’t exist in MsTest, and still Visual Studio correctly treats them. This is good news.

When it comes to collecting code coverage, the situation is a little trickier. When I open code coverage window, I see not very encouraging message “Code Coverage is not enabled for this test run”:

NUnitForVS3

Actually this is correct message: you have to explicitly enable code coverage instrumentation. But how? Apparently our solution sill lacks some piece, but luckily it’s easy to fix. All we need is to add a test configuration file with extension “testrunconfig”. Then we can activate its configuration and enable code coverage collection. One simple way to add such file is to add and then delete a test project to a solution (a “real” MsTest project). The MsTest project will be gone but will a trace in a form of test configuration file:

NUnitForVS4

If you now enable code coverage instrumentation for assemblies in your solution and rerun the tests, you will see the coverate summary:

NUnitForVS5

Pretty useless, isn’t it? The summary displays coverage only for the test assembly and not for the actual code under test. Why?

More googling, and the explanation came from Peter Stephen’s blog post: even though coverage instrumentation is enabled for all assemblies in our solution, one of assemblies was taken from a wrong place: the assembly under the test lies both in its own “bin” folder and in the “bin” folder of the test assembly, and it is there it has to be taken from. To resolve the problem you just need to add the assembly from a right place:

NUnitForVS6

Note 2 copies of UnitTestDemo.dll. The unchecked one is the one that was added initially, the checked one was added manually. And everything works:

NUnitForVS7

Of course, I only tried NUnitForVS on a very simple project. I plan to check it out with more complex code. But so far it looks quite promising opening Visual Studio test runner for NUnit - the most popular unit test framework.

More Posts « Previous page - Next page »