Monday, August 10, 2009 9:05 AM bart

Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Introduction

Avid blog readers know I have a weak spot for expression trees in particular and the broader picture of meta-programming facilities. With the introducing of LINQ in the .NET Framework 3.5 timeframe, we ended up adding expression trees to the framework as a way to represent code as data. A fundamental requirement to make remote LINQ scenarios such as LINQ to SQL work, as the LINQ expression written by the developer needs to be cross-compiled into the corresponding query language, e.g. SQL. To enable this, two routes could be taken:

  1. Decompile IL code for the query at runtime, and turn it into the target language – quite roundabout
  2. Enable representing the query’s code as data at runtime by emitting an expression tree at compile-time

Turning the query into the required target language at compile time isn’t a viable option either, since the front-end compilers would have to know about all of the target query domains; clearly not a good idea.

A quick recap of the 3.5 landscape

Before diving into the .NET 4.0 feature set around enhanced expression trees, let’s have a quick recap of the corresponding baseline features introduced in .NET 3.5. Since LINQ was the driver for this feature after all, it makes sense to take a top-down approach and turn a LINQ query into more primitive building blocks.

Consider the following query expression:

var res = from p in db.Products
          where p.UnitPrice > 100
          select p.ProductName;

Given this, we can’t say much about the query just yet. All we know is that from a language’s point of view, we can “desugar” the above into the below:

var res = db.Products
          .Where(p => p.UnitPrice > 100)
          .Select(p => p.ProductName);

From this point on, regular method resolution rules kick in to find a Where method on db.Products, and to find a Select method on the result of calling Where. The argument to both those methods is a lambda expression and here’s where the flexibility comes from. Lambda expressions by themselves don’t have a type:

var twice = (int x) => x * 2;

Though the right-hand side seems to have all the information to infer a type (going from int to an expression of type int, ought to be Func<int, int> right?) the above doesn’t compile. Lambda expressions can be assigned to variables of two distinct types: regular delegates or expression trees thereof. The below illustrates this:

Func<int, int>             twiceD = x => x * 2;
Expression<Func<int, int>> twiceE = x => x * 2;

The first one is shorthand for an anonymous method we already supported in C# 2.0:

Func<int, int>             twiceD = delegate(int x) { return x * 2; };

However, the second one gives rise to quite a bit more of generated code; in particular, code is emitted that constructs an expression tree at runtime to represent the code the user wrote as data. Given this code, the library function calls (like Where and Select in case of LINQ) can consume that expression tree to analyze the code the user wrote, extracting the intent and turning it into execution in whatever way is right for the task at hand (e.g. by cross-compiling the expression into a target query language like SQL). In particular, the twiceE lambda expression gets turned into:

ParameterExpression x = Expression.Parameter(typeof(int), “x”);
Expression<Func<int, int>> twiceE = Expression.Lambda<Func<int, int>>(
    Expression.Multiply(
        x,
        Expression.Constant(2)
    ),
    x
);

That’s exactly what happens in the LINQ case when using an IQueryable<T> data source: the query operator methods like Where and Select don’t take in a plain delegate (like a Func<T, bool> for the filter fed in to the Where operator) but instead they take in an Expression<TDelegate> thereof (like an Expression<Func<T, bool>>). This causes the compiler to emit code generating an expression tree (by means of the Expression factory methods as shown above) for the argument. Ultimately the Queryable class puts together all those expression tree fragments into a big one representing the whole query expression, ready for the query provider to do whatever magic it sees fit. We won’t go there now.

To finish up this recap, a single picture that tells it all:

image

There’s on thing I haven’t mentioned yet: the Compile method on the LambdaExpression type as produced by the Expression.Lambda call. This plan simple API surface gives access to a gigantic machinery behind the scenes that turns the expression tree into executable IL code at runtime: the tip of the Compiler-as-a-Service (CaaS) iceberg.

As a little extra, let’s dig a little deeper and show the generated IL code through the Son-of-Strike (SOS) debugger extension. I’m using a Visual Studio 2010 project targeting .NET 3.5 for this demo. If it weren’t for the fact I have a 64-bit machine, I could equally well have used Visual Studio 2008, but mixed mode debugging (native + managed, as required by SOS) isn’t available on 64-bit in the 2008 version. In Visual Studio 2010, this restriction is gone (hooray).

Inspired by the code above, we can write a simple expression tree and have it compile at runtime. The resulting delegate object is based on dynamically emitted code, which we’ll visualize now:

image

First, we load the SOS extension from the .NET 2.0 installation folder. To do this, you’ll have to enable unmanaged code debugging:

image 

Next, we use !dso to dump stack objects, which will reveal our twice object which is of type Func<int, int>. Dumping the object using !do reveals the target object, the underlying method, etc. We’ll have a look at the _methodBase next, where we’ll be able to see the dynamic method’s code:

image

The !dumpil command is all we need to dump the contents of the dynamically generated IL body, revealing the “times two” logic. Feel free to experiment with more complex lambda expressions revealing their contents as IL after runtime compilation. For example, what lambda expression did I write to arrive with the following?

!dumpil 026b89f0
This is dynamic IL. Exception info is not reported at this time.
If a token is unresolved, run "!do <addr>" on the addr given
in parenthesis. You can also look at the token table yourself, by
running "!DumpArray 026bde24".

IL_0000: ldarg.1
IL_0001: ldarg.2
IL_0002: callvirt 6000002 System.String.Substring(Int32)
IL_0007: ret

So, to wrap up, the .NET 3.5 story of expression trees consisted of two core parts:

  • Library support for expression trees, in the System.Linq.Expressions namespace defined in the System.Core.dll assembly (meaning one can manually new up expression trees and obviously inspect their properties).
  • Language support for expression trees, triggered by assigning a lambda expression to an Expression<TDelegate> type, available in C# and VB (as invisible glue to make LINQ work in certain scenarios).

After expressions come … statements

Looking at expression trees in .NET 3.5 one observes they’re only able to represent expressions, i.e. those portions from languages that are valued in their very nature. Things like if and while don’t fall under this umbrella and are statements, and one more step higher up we land at full-blown declarations (like classes and interfaces). It’s clear if we want to represent more than valued expressions and aim our arrows at full method bodies, we need more than pure expression trees. So .NET 4.0 introduces statement trees as an extension to the existing expression tree API (so, as a little unfortunate side-effect of this, the Expressions part in the containing namespace becomes a bit of a misnomer).

The following picture illustrates the gaps we have in expression trees that hinder us from representing full method bodies. The reason we’re looking after such support is to be able to use the expression trees as one of the driving forces behind the Dynamic Language Runtime (DLR). To communicate code to the runtime, one uses expression trees. One could think of this in a rather extremist fashion as follows: where IL is used to represent code in the CLR, expression trees are used in the DLR. Ultimately the DLR is layered on top of the CLR, so at the end of the day expression trees are compiled – at runtime – into IL instructions as shown above with the Compile method of the v3.5 API. And from that point on, the IL can be turned into native code by the JIT compiler.

image

Everything in green was already exposed through the expression tree APIs in .NET 3.5. Things like method calls, various operators, constants, constructor calls, and variables (though you couldn’t assign to them but use them as parameters to lambdas) can be represented through factory method calls on the Expression type.

One first thing we need to add is all sorts of assignments so that mutation of variables can be achieved. Lots of languages live and breathe mutation, so assignment becomes an essential building block to realize the potential of those languages on top of the DLR. This is indicated in orange. Notice things like n++ fall under this category as well, due to their dual increment and assign nature. In fact, assignments in C# are categorized as expressions (why?), so the old API didn’t expose the whole expression surface of the language. Instead, the LINQ expressions API provided just enough expressiveness as was needed for commonly used operations in the context of writing queries. Writing things like the following doesn’t make sense in that context:

var res = from p in db.Products
          where p.UnitPrice > 100
          select p.ProductName = “Changed”;

(Note: The above could represent some kind of update statement, but together with LINQ’s deeply rooted lazy evaluation the above would turn out very cumbersome from an operational point of view.)

Finally, we require a way to perform sequencing of operations (the semi-colon in C# if you will) and organize such operations in blocks where the concept of scoping is alive and kicking (the idea of curly braces in C#). In addition, every statement that influences control flow is useful to have too. Examples include if and loops (together with their break and continue mechanisms), returning from a method, throwing and catching exceptions, etc.

Adding all those ingredients to the mix results in the v4.0 version of the expression tree API, including the ability to do dynamic dispatch (which is what the DLR is all about, and is surfaced through the new C# 4.0 dynamic keyword). Note: the “table” mentioned in the rectangle for dynamic dispatch above refers to all the operations supported by the DLR, such as calling properties, invoking methods, etc.

The following query can be used on both .NET 3.5 and .NET 4.0 to illustrate the differences between the APIs:

var ops = from m in typeof(Expression).GetMethods()
          where m.IsStatic
          group m by m.Name into g
          orderby g.Key
          select new { g.Key, Overloads = g.Count() };

foreach (var op in ops)
    Console.WriteLine(op);

In .NET 3.5, this is the result:

{ Key = Add, Overloads = 2 }
{ Key = AddChecked, Overloads = 2 }
{ Key = And, Overloads = 2 }
{ Key = AndAlso, Overloads = 2 }
{ Key = ArrayIndex, Overloads = 3 }
{ Key = ArrayLength, Overloads = 1 }
{ Key = Bind, Overloads = 2 }
{ Key = Call, Overloads = 6 }
{ Key = Coalesce, Overloads = 2 }
{ Key = Condition, Overloads = 1 }
{ Key = Constant, Overloads = 2 }
{ Key = Convert, Overloads = 2 }
{ Key = ConvertChecked, Overloads = 2 }
{ Key = Divide, Overloads = 2 }
{ Key = ElementInit, Overloads = 2 }
{ Key = Equal, Overloads = 2 }
{ Key = ExclusiveOr, Overloads = 2 }
{ Key = Field, Overloads = 2 }
{ Key = GetActionType, Overloads = 1 }
{ Key = GetFuncType, Overloads = 1 }
{ Key = GreaterThan, Overloads = 2 }
{ Key = GreaterThanOrEqual, Overloads = 2 }
{ Key = Invoke, Overloads = 2 }
{ Key = Lambda, Overloads = 5 }
{ Key = LeftShift, Overloads = 2 }
{ Key = LessThan, Overloads = 2 }
{ Key = LessThanOrEqual, Overloads = 2 }
{ Key = ListBind, Overloads = 4 }
{ Key = ListInit, Overloads = 6 }
{ Key = MakeBinary, Overloads = 3 }
{ Key = MakeMemberAccess, Overloads = 1 }
{ Key = MakeUnary, Overloads = 2 }
{ Key = MemberBind, Overloads = 4 }
{ Key = MemberInit, Overloads = 2 }
{ Key = Modulo, Overloads = 2 }
{ Key = Multiply, Overloads = 2 }
{ Key = MultiplyChecked, Overloads = 2 }
{ Key = Negate, Overloads = 2 }
{ Key = NegateChecked, Overloads = 2 }
{ Key = New, Overloads = 6 }
{ Key = NewArrayBounds, Overloads = 2 }
{ Key = NewArrayInit, Overloads = 2 }
{ Key = Not, Overloads = 2 }
{ Key = NotEqual, Overloads = 2 }
{ Key = Or, Overloads = 2 }
{ Key = OrElse, Overloads = 2 }
{ Key = Parameter, Overloads = 1 }
{ Key = Power, Overloads = 2 }
{ Key = Property, Overloads = 3 }
{ Key = PropertyOrField, Overloads = 1 }
{ Key = Quote, Overloads = 1 }
{ Key = RightShift, Overloads = 2 }
{ Key = Subtract, Overloads = 2 }
{ Key = SubtractChecked, Overloads = 2 }
{ Key = TypeAs, Overloads = 1 }
{ Key = TypeIs, Overloads = 1 }
{ Key = UnaryPlus, Overloads = 2 }

And in .NET 4.0, where I’ve used the same color scheme to outline differences from the previous version (notice some additions exist to the “expression” subset of available tree node types):

{ Key = Add, Overloads = 2 }
{ Key = AddAssign, Overloads = 3 }
{ Key = AddAssignChecked, Overloads = 3 }
{ Key = AddChecked, Overloads = 2 }
{ Key = And, Overloads = 2 }
{ Key = AndAlso, Overloads = 2 }
{ Key = AndAssign, Overloads = 3 }
{ Key = ArrayAccess, Overloads = 2 }
{ Key = ArrayIndex, Overloads = 3 }
{ Key = ArrayLength, Overloads = 1 }
{ Key = Assign, Overloads = 1 }
{ Key = Bind, Overloads = 2 }
{ Key = Block, Overloads = 12 }
{ Key = Break, Overloads = 4 }
{ Key = Call, Overloads = 14 }
{ Key = Catch, Overloads = 4 }
{ Key = ClearDebugInfo, Overloads = 1 }
{ Key = Coalesce, Overloads = 2 }
{ Key = Condition, Overloads = 2 }
{ Key = Constant, Overloads = 2 }
{ Key = Continue, Overloads = 2 }
{ Key = Convert, Overloads = 2 }
{ Key = ConvertChecked, Overloads = 2 }
{ Key = DebugInfo, Overloads = 1 }
{ Key = Decrement, Overloads = 2 }
{ Key = Default, Overloads = 1 }
{ Key = Divide, Overloads = 2 }
{ Key = DivideAssign, Overloads = 3 }
{ Key = Dynamic, Overloads = 6 }
{ Key = ElementInit, Overloads = 2 }
{ Key = Empty, Overloads = 1 }
{ Key = Equal, Overloads = 2 }
{ Key = ExclusiveOr, Overloads = 2 }
{ Key = ExclusiveOrAssign, Overloads = 3 }
{ Key = Field, Overloads = 3 }
{ Key = GetActionType, Overloads = 1 }
{ Key = GetDelegateType, Overloads = 1 }
{ Key = GetFuncType, Overloads = 1 }
{ Key = Goto, Overloads = 4 }
{ Key = GreaterThan, Overloads = 2 }
{ Key = GreaterThanOrEqual, Overloads = 2 }
{ Key = IfThen, Overloads = 1 }
{ Key = IfThenElse, Overloads = 1 }
{ Key = Increment, Overloads = 2 }
{ Key = Invoke, Overloads = 2 }
{ Key = IsFalse, Overloads = 2 }
{ Key = IsTrue, Overloads = 2 }
{ Key = Label, Overloads = 6 }
{ Key = Lambda, Overloads = 18 }
{ Key = LeftShift, Overloads = 2 }
{ Key = LeftShiftAssign, Overloads = 3 }
{ Key = LessThan, Overloads = 2 }
{ Key = LessThanOrEqual, Overloads = 2 }
{ Key = ListBind, Overloads = 4 }
{ Key = ListInit, Overloads = 6 }
{ Key = Loop, Overloads = 3 }
{ Key = MakeBinary, Overloads = 3 }
{ Key = MakeCatchBlock, Overloads = 1 }
{ Key = MakeDynamic, Overloads = 6 }
{ Key = MakeGoto, Overloads = 1 }
{ Key = MakeIndex, Overloads = 1 }
{ Key = MakeMemberAccess, Overloads = 1 }
{ Key = MakeTry, Overloads = 1 }
{ Key = MakeUnary, Overloads = 2 }
{ Key = MemberBind, Overloads = 4 }
{ Key = MemberInit, Overloads = 2 }
{ Key = Modulo, Overloads = 2 }
{ Key = ModuloAssign, Overloads = 3 }
{ Key = Multiply, Overloads = 2 }
{ Key = MultiplyAssign, Overloads = 3 }
{ Key = MultiplyAssignChecked, Overloads = 3 }
{ Key = MultiplyChecked, Overloads = 2 }
{ Key = Negate, Overloads = 2 }
{ Key = NegateChecked, Overloads = 2 }
{ Key = New, Overloads = 6 }
{ Key = NewArrayBounds, Overloads = 2 }
{ Key = NewArrayInit, Overloads = 2 }
{ Key = Not, Overloads = 2 }
{ Key = NotEqual, Overloads = 2 }
{ Key = OnesComplement, Overloads = 2 }
{ Key = Or, Overloads = 2 }
{ Key = OrAssign, Overloads = 3 }
{ Key = OrElse, Overloads = 2 }
{ Key = Parameter, Overloads = 2 }
{ Key = PostDecrementAssign, Overloads = 2 }
{ Key = PostIncrementAssign, Overloads = 2 }
{ Key = Power, Overloads = 2 }
{ Key = PowerAssign, Overloads = 3 }
{ Key = PreDecrementAssign, Overloads = 2 }
{ Key = PreIncrementAssign, Overloads = 2 }
{ Key = Property, Overloads = 7 }
{ Key = PropertyOrField, Overloads = 1 }
{ Key = Quote, Overloads = 1 }
{ Key = ReferenceEqual, Overloads = 1 }
{ Key = ReferenceNotEqual, Overloads = 1 }
{ Key = Rethrow, Overloads = 2 }
{ Key = Return, Overloads = 4 }
{ Key = RightShift, Overloads = 2 }
{ Key = RightShiftAssign, Overloads = 3 }
{ Key = RuntimeVariables, Overloads = 2 }
{ Key = Subtract, Overloads = 2 }
{ Key = SubtractAssign, Overloads = 3 }
{ Key = SubtractAssignChecked, Overloads = 3 }
{ Key = SubtractChecked, Overloads = 2 }
{ Key = Switch, Overloads = 6 }
{ Key = SwitchCase, Overloads = 2 }
{ Key = SymbolDocument, Overloads = 4 }
{ Key = Throw, Overloads = 2 }
{ Key = TryCatch, Overloads = 1 }
{ Key = TryCatchFinally, Overloads = 1 }
{ Key = TryFault, Overloads = 1 }
{ Key = TryFinally, Overloads = 1 }
{ Key = TryGetActionType, Overloads = 1 }
{ Key = TryGetFuncType, Overloads = 1 }
{ Key = TypeAs, Overloads = 1 }
{ Key = TypeEqual, Overloads = 1 }
{ Key = TypeIs, Overloads = 1 }
{ Key = UnaryPlus, Overloads = 2 }
{ Key = Unbox, Overloads = 1 }
{ Key = Variable, Overloads = 2 }

As you can see, there’s a wealth of new capabilities added to the expression trees API and one can’t but love this from the bottom of his .NET developer’s heart. What hasn’t changed though, at the moment, is the language support for expression tree generation. That is, when writing a statement bodied lambda expression and assigning it to an Expression<TDelegate> variable, the compiler won’t turn it into an expression tree using the new nodes for you. It keeps on supporting the subset required for LINQ. Till we truly enter the domain of meta-programming, I except the language to stay with the true expression tree subset.

A first sample

To get readers excited about this API enrichment, let’s have a go at writing a bigger expression tree containing statement and control flow nodes. Our running primes sample is a good choice for this task at hand. But first, the regular C# code to set the scene:

Func<int, List<int>> getPrimes = to =>
{
    var res = new List<int>();
    for (int n = 2; n <= to; n++)
    {
        bool found = false;

        for (int d = 2; d <= Math.Sqrt(n); d++)
        {
            if (n % d == 0)
            {
                found = true;
                break;
            }
        }

        if (!found)
            res.Add(n);
    }
    return res;
};

foreach (var num in getPrimes(100))
    Console.WriteLine(num);

This code should be fairly straightforward to grasp. Notice I’m using a lambda expression to define the “method” we’re talking about, so that we have a local variable called getPrimes that we can invoke through. Our goal is to generate this code at runtime using the expression trees APIs, so that the use site (the foreach loop at the bottom) can remain the same.

Next, we’re on for a stroll in expression tree land. The top level node is a plain old LambdaExpression but then things start to change from what we’re used to. First of all, we’ll need sequencing which can be achieved through a BlockExpression. Each block can have a number of variables that are in scope, as well as a number of expressions that make part of it. In our case, the body of the lambda has a res variable and two statements: one for the for-loop and one for the return-statement:

var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var getPrimes = // Func<int, List<int>> getPrimes = Expression.Lambda<Func<int, List<int>>>( // { Expression.Block( // List<int> res; new [] { res }, // res = new List<int>(); Expression.Assign( res, Expression.New(typeof(List<int>)) ), // loop goes here res ), to // } ).Compile(); foreach (var num in getPrimes(100)) Console.WriteLine(num);

Again, this piece of code is not too hard to understand. The Lambda factory method takes in the body and the lambda parameters. Difference from the past API landscape is that the body now can be a BlockExpression, and in our case it has a “res” variable in scope. Assignment is trivial using the Assign factory method passing in the left-hand side and right-hand side. Returning from the lambda expression can simply be done by using the “res” expression since the last valued expression from a block becomes its return value (an Expression.Return factory exists for more complicated cases where one returns from the middle of a block of code, which boils down to a “value-carrying” GotoExpression behind the scenes). Notice this is more general property of expression trees, as languages like Ruby use this property throughout (e.g. the value of an if-statement is the last expression’s value from either of the branches).

Now the loop. Although a whole bunch of loop constructs exist in front-end languages like C#, the expression tree APIs only provide one out of the box. Its goal is to centralize the plumbing involved with re-executing the loop’s body while having break and continue capabilities, as well as maintaining the last iteration’s value as the result of the loop (again for languages like Ruby that work on the DLR using expression trees). As we’ve written a for-loop in the original fragment, we need a way to cross-compile a for-loop into this basic building block for loops:

for (initializer; condition; update)
    body

stands for

{
    initializer;
    while (condition)
    {
        body;
continueLabel:
        update;
    }
breakLabel:
}

where the body’s immediate inner statements get a substitution applied to rewrite break and continue statements in terms of the added labels. Notice the use of a separate block, keeping the for-loop’s initialized variables in a separate scope. More-over, the expression tree API’s loop construct doesn’t have a terminating condition, so that too needs to be replaced by a custom jump to the break label:

{
    initializer;
    while (true)
    {
        if (!condition)
            goto breakLabel;
        body;
continueLabel:
        update;
    }
breakLabel:
}

As our loops don’t use continue statements, we can omit that complication. For the outer loop we end up with:

// {
Expression.Block(
    // int n;
    new [] { n },
    // n = 2;
    Expression.Assign(
        n,
        Expression.Constant(2)
    ),
    // while (true)
    Expression.Loop(
        // {
        Expression.Block(
            // if
            Expression.IfThen(
                // (!
                Expression.Not(
                    // (n <= to)
                    Expression.LessThanOrEqual(
                        n,
                        to
                    )
                // )
                ),
                // break;
                Expression.Break(breakOuter)
            ),
            // loop body goes here
            // n++;
            Expression.PostIncrementAssign(n)
        // }
        ),
        breakOuter
    )
)

A more thorough explanation is in order here. Think of Expression.Loop as providing an infinite loop. So, right inside the loop’s body (the first argument to the Loop call) we provide a condition to jump out of the loop when the terminating condition holds. As part of the loop’s body we also need to have the inner loop (see further) as well as the update part of the for-statement, which is a post-increment-assign node. One more thing is required here though: the use of a break label to denote the statement right after the loop. We create such a beast using the following call:

var breakOuter = Expression.Label();

and specify it as the last argument to the Loop call, where the expression tree API can take care of emitting the label in the right spot. Inside the loop, we can simply use the Break factory method to emit a GotoExpression that essentially breaks out of the loop. Also notice the use of the outer loop’s block as the scope for the variable “n”, which was declared like this:

var n = Expression.Variable(typeof(int), "n");

No rocket science either. Finally, we need to write the loop’s body which is piece of cake again: use a block to group multiple statements. It also contains an inner for-loop for which we use the same tricks, including the use of a break label etc:

// {
Expression.Block(
    // bool found;
    new[] { found },
    // found = false;
    Expression.Assign(
        found,
        Expression.Constant(false)
    ),
    // {
    Expression.Block(
        // int d;
        new [] { d },
        // d = 2;
        Expression.Assign(
            d,
            Expression.Constant(2)
        ),
        // while (true)
        Expression.Loop(
            // {
            Expression.Block(
                // if
                Expression.IfThen(
                    // (!
                    Expression.Not(
                        // d <= Math.Sqrt(n)
                        Expression.LessThanOrEqual(
                            d,
                            Expression.Convert(
                                Expression.Call(
                                    null,
                                    typeof(Math).GetMethod("Sqrt"),
                                    Expression.Convert(
                                        n,
                                        typeof(double)
                                    )
                                ),
                                typeof(int)
                            )
                        )
                    // )
                    ),
                    // break;
                    Expression.Break(breakInner)
                ),
                // {
                Expression.Block(
                    // if (n % d == 0)
                    Expression.IfThen(
                        Expression.Equal(
                            Expression.Modulo(
                                n,
                                d
                            ),
                            Expression.Constant(0)
                        ),
                        // {
                        Expression.Block(
                            // found = true;
                            Expression.Assign(
                                found,
                                Expression.Constant(true)
                            ),
                            // break;
                            Expression.Break(breakInner)
                        // }
                        )
                    )
                // }
                ),
                // d++;
                Expression.PostIncrementAssign(d)
            // }
            ),
            breakInner
        )
    ),
    // if
    Expression.IfThen(
        // (!found)
        Expression.Not(found),
        //    res.Add(n);
        Expression.Call(
            res,
            typeof(List<int>).GetMethod("Add"),
            n
        )
    )
),

It should be straightforward to find out how the variables d, found and the label breakInner have been created upfront. Around the d <= Math.Sqrt(n) code there’s quite a bit of plumbing needed because of the implicit conversions that take place all over, resulting in the use of a few convert nodes. Other than that, the code above doesn’t use anything new.

For simplicity when trying this out for yourself in .NET 4.0 and Visual Studio 2010, here’s the whole fragment:

var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var n = Expression.Variable(typeof(int), "n");
var found = Expression.Variable(typeof(bool), "found");
var d = Expression.Variable(typeof(int), "d");
var breakOuter = Expression.Label();
var breakInner = Expression.Label();
var getPrimes = 
    // Func<int, List<int>> getPrimes =
    Expression.Lambda<Func<int, List<int>>>(
        // {
        Expression.Block(
            // List<int> res;
            new [] { res },
            // res = new List<int>();
            Expression.Assign(
                res,
                Expression.New(typeof(List<int>))
            ),
            // {
            Expression.Block(
                // int n;
                new [] { n },
                // n = 2;
                Expression.Assign(
                    n,
                    Expression.Constant(2)
                ),
                // while (true)
                Expression.Loop(
                    // {
                    Expression.Block(
                        // if
                        Expression.IfThen(
                            // (!
                            Expression.Not(
                                // (n <= to)
                                Expression.LessThanOrEqual(
                                    n,
                                    to
                                )
                            // )
                            ),
                            // break;
                            Expression.Break(breakOuter)
                        ),
                        // {
                        Expression.Block(
                            // bool found;
                            new[] { found },
                            // found = false;
                            Expression.Assign(
                                found,
                                Expression.Constant(false)
                            ),
                            // {
                            Expression.Block(
                                // int d;
                                new [] { d },
                                // d = 2;
                                Expression.Assign(
                                    d,
                                    Expression.Constant(2)
                                ),
                                // while (true)
                                Expression.Loop(
                                    // {
                                    Expression.Block(
                                        // if
                                        Expression.IfThen(
                                            // (!
                                            Expression.Not(
                                                // d <= Math.Sqrt(n)
                                                Expression.LessThanOrEqual(
                                                    d,
                                                    Expression.Convert(
                                                        Expression.Call(
                                                            null,
                                                            typeof(Math).GetMethod("Sqrt"),
                                                            Expression.Convert(
                                                                n,
                                                                typeof(double)
                                                            )
                                                        ),
                                                        typeof(int)
                                                    )
                                                )
                                            // )
                                            ),
                                            // break;
                                            Expression.Break(breakInner)
                                        ),
                                        // {
                                        Expression.Block(
                                            // if (n % d == 0)
                                            Expression.IfThen(
                                                Expression.Equal(
                                                    Expression.Modulo(
                                                        n,
                                                        d
                                                    ),
                                                    Expression.Constant(0)
                                                ),
                                                // {
                                                Expression.Block(
                                                    // found = true;
                                                    Expression.Assign(
                                                        found,
                                                        Expression.Constant(true)
                                                    ),
                                                    // break;
                                                    Expression.Break(breakInner)
                                                // }
                                                )
                                            )
                                        // }
                                        ),
                                        // d++;
                                        Expression.PostIncrementAssign(d)
                                    // }
                                    ),
                                    breakInner
                                )
                            ),
                            // if
                            Expression.IfThen(
                                // (!found)
                                Expression.Not(found),
                                //    res.Add(n);
                                Expression.Call(
                                    res,
                                    typeof(List<int>).GetMethod("Add"),
                                    n
                                )
                            )
                        ),
                        // n++;
                        Expression.PostIncrementAssign(n)
                    // }
                    ),
                    breakOuter
                )
            ),
            res
        ),
        to
    // }
    ).Compile();

foreach (var num in getPrimes(100))
    Console.WriteLine(num);
 

Inspecting the IL code

Obviously you want to see the generated code after calling Compile. Near the top of this post you saw how to do this using SOS, so I won’t repeat the recipe over here (just make sure to load the sos.dll file from the v4.0 installation folder instead), but below is the result:

!dumpil 023f2bc0
This is dynamic IL. Exception info is not reported at this time.
If a token is unresolved, run "!do <addr>" on the addr given
in parenthesis. You can also look at the token table yourself, by
running "!DumpArray 023f985c".

// res = new List<int>();
IL_0000: newobj 023ef050 is not a MethodDesc
6000003
IL_0005: stloc.0

// n = 2
IL_0006: ldc.i4.2
IL_0007: stloc.1

// n <= to
IL_0008: ldloc.1
IL_0009: ldarg.1
IL_000a: ble.s IL_000f

// “mumble, mumble”
IL_000c: ldc.i4.0
IL_000d: br.s IL_0010
IL_000f: ldc.i4.1
IL_0010: ldc.i4.0
IL_0011: ceq
IL_0013: brfalse IL_001d
IL_0018: br IL_0079

// found = false;
IL_001d: ldc.i4.0
IL_001e: stloc.2

// d = 2
IL_001f: ldc.i4.2
IL_0020: stloc.3

// d <= Math.Sqrt(n)
IL_0021: ldloc.3
IL_0022: ldloc.1
IL_0023: conv.r8
IL_0024: call 023ef9a0 is not a MethodDesc
6000004
IL_0029: conv.i4
IL_002a: ble.s IL_002f

// “mumble, mumble”

IL_002c: ldc.i4.0
IL_002d: br.s IL_0030
IL_002f: ldc.i4.1
IL_0030: ldc.i4.0
IL_0031: ceq
IL_0033: brfalse IL_003d
IL_0038: br IL_005c

// n % d == 0
IL_003d: ldloc.1
IL_003e: ldloc.3
IL_003f: rem
IL_0040: ldc.i4.0
IL_0041: ceq

IL_0043: brfalse IL_004f

// found = true;
IL_0048: ldc.i4.1
IL_0049: stloc.2

IL_004a: br IL_005c

// d++;
IL_004f: ldloc.3
IL_0050: stloc.s VAR OR ARG 4
IL_0052: ldloc.s VAR OR ARG 4
IL_0054: ldc.i4.1
IL_0055: add
IL_0056: stloc.3

IL_0057: br IL_0021

// if (!found)
IL_005c: ldloc.2
IL_005d: ldc.i4.0
IL_005e: ceq
IL_0060: brfalse IL_006c

// res.Add(n);
IL_0065: ldloc.0
IL_0066: ldloc.1
IL_0067: callvirt 023f0548 is not a MethodDesc
6000005

// n++;
IL_006c: ldloc.1
IL_006d: stloc.s VAR OR ARG 5
IL_006f: ldloc.s VAR OR ARG 5
IL_0071: ldc.i4.1
IL_0072: add
IL_0073: stloc.1

IL_0074: br IL_0008

// return res;
IL_0079: ldloc.0
IL_007a: ret

Amazing, isn’t it? That’s it for now. Expect to see more expression tree fun in the future, in my continuation to the Control Library series of posts.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Monday, August 10, 2009 10:54 AM by Iso Su

I can see that with things like values taken from last statement explicit expression trees might be more "expressive" than C#, but still why there's no native support for simple implicit format

Expression<Func<int, int, int>> foo = (a, b) => { return a  + b; };

Is it planned at all for 2010 and we just don't have it in beta, or will we have to wait C# 5?

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Monday, August 10, 2009 12:03 PM by Daniel Marbach

Hello

I really like where the future of .NET/c# is going and I really enjoyed reading this article. But I must say that all these new features make the code also far more complex to read. Especially when you are overusing expression trees and linq in the code the code is much more complex to maintain because you tend to pack more and more powerful stuff into a single line of code.

What I see in our big projects that there are always the all excited developers (I admit I'm also one of those) and the developers that fear touching or refactor LINQ queries or expression trees. Sometimes for me this contradicts the good principles like TDD and also agile ways of thinking "like code belongs to everyone" etc. Do you understand what I want to point out?

Thanks for this wonderful article

Daniel

# Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0 - Bart De Smet

Monday, August 10, 2009 12:36 PM by DotNetShoutout

Thank you for submitting this cool story - Trackback from DotNetShoutout

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Monday, August 10, 2009 4:57 PM by Justin Bailey

Bart,

I really want to use expression trees for metaprogramming-light, but

there is a limitation that really bothers me and I wonder if you have

a solution. What bothers me is that one Expression<TDelegate> looks

much like another. For example, these two expressions have the same

type, but radically different representations:

Expression<Func<object, string>> a = obj => "foo";

Expression<Func<object, string>> b = obj => obj.ToString();

Let me illustrate why this is a problem with an example. In

"Functional .NET – Lose the Magic Strings", Matthew Podwysocki

describes a feature in the ASP.NET MVC Futures library which lets you

easily construct HTML input elements based on the property name of

your business object. For example, assuming you are rendering a form

for some object with the "FirstName" property, you could construct an

input textbox with this statement:

<%= Html.TextBoxFor(x => x.Person.FirstName) %>

The key is that TextBoxFor is an extension method that takes an

expression tree argument:

public static string TextBoxFor<TModel, TProperty>(this

HtmlHelper<TModel> htmlHelper,

  Expression<Func<TModel, TProperty>> expression, ...)

The issue here is that the expression "x.Person.FirstName" can be

replaced by anything that has the right type and the compiler won't

catch the error. I could instead have this statement:

<%= Html.TextBoxFor(x => x.Person.FirstName + "FooBar") %>

and I don't think the compiler would catch it.

I would really like to use the facility to generate expression trees,

but I am afraid I'll replace my obscure run-time errors due to magic

strings with obscure run-time errors due to strangely formed

expression trees.

With all the work you've done on expression trees, do you see a way to

get the compiler to enforce a "legal" expression in these cases? Does

.NET 4 give something more than 3.5 does?

Let me know if I can clarify anything, and feel free to post this to your blog.

Thanks for all your work

Justin

p.s. I also emailed this to info@bartdesmet.net. I hope you don't mind the double post!

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Monday, August 10, 2009 10:23 PM by bart

Hi Iso,

Unfortunately no; the code you've written contains a statement body, which is not supported to be turned into an expression tree. Obviously the code you wrote can be simplified to:

Expression<Func<int,int,int>> f = (a, b) => a + b;

But as soon as you end up having curly braces around the lambda expression body, odds are off for now.

Thanks,

-Bart

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Monday, August 10, 2009 10:32 PM by bart

Hi Daniel,

I definitely understand your concerns. LINQ, as natural as it may feel, has some foreign characteristic to it for certain groups of developers. However, in general it results in more concise (pessimists would say more dense) code, where the imperative style equivalent would be much more involved and harder to understand in the majority of cases (try a manual filter, sort and group using a few loops and intermediate container objects). In addition, yielding control to the runtime opens up for great opportunities towards the future, e.g. as illustrated with PLINQ.

In my opinion, it's merely a matter of time for people to get used to the new state of affairs. Much like vtable-dispatch was a foreign animal a few decades ago. History repeats itself and concerns around new features tend to center around similar arguments all the time: is the code clearer, how easy is it to understand the runtime behavior (indirection levels, lazy evaluation), what's the performance profile (cost of table lookups, runtime compilation), etc.

Obviously features can be misused or used in places where it's rather inappropriate. As usual it takes a while for people to settle down and a balance to get established. Meta-programming is subject to similar concerns and as you can see, we're proceeding with caution here not to age and overcomplicate the language prematurely. In .NET 4.0, no big language surface changes are made and focus is rather on infrastructure (like the DLR) that will fuel potential language-level extensions going forward. At the same time, this slowdown of language innovation (other than the dynamic theme in .NET 4.0) allows things to settle on the field of LINQ.

Hope this helps,

-Bart

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Monday, August 10, 2009 10:52 PM by bart

Hi Justin,

Great remark on the limitations of expression trees. Indeed, at this point you're trading compile-time checking for runtime checking carried out by whatever library consumes the expression tree. Typical sample is the use of expression trees in LINQ where nonsense queries can be written, in a sense they cannot be remoted to the target query language. Such a condition is detected at runtime when the query provider tries to do the cross-compilation.

All there is to expression tree safety at compile time is type checking. Though our type system is fairly rich, there's no silver bullet that allows replacement of all sorts of runtime checks (e.g. array out of bounds indexing). Unfortunately, expression trees are a potential source of more such trading between compile-time and run-time checking, much like dynamic languages have all over the place for regular operations.

Restricting what can be written in the context of an expression tree tends to get very complicated. Either the type system (in the large) or more stringent use of types (in the small) needs to get more rich to turn more checks into compile-time verification. That's not an easy thing to do, or puts too much burden on the users of the API. Another approach of restricting what can be written inside an expression tree (e.g. only "property getter lookups") would result in a snowball effect with a huge language surface for little benefit and with similar headaches as with generic type constraints (people want more and more constraint mechanisms, like "where T : operator +" every time around...). And such things won't make the problem go away because expression tree node restrictions together with type restrictions aren't restrictive enough (e.g. "property lookup for DateTime" will allow x.Person.BirthDay but will allow a plain old DateTime.Now as well).

In essence, with meta-programming facilities you're becoming an extension to the compiler, which is a very powerful capability. Today that implies runtime analysis of expression trees but there's nothing (other than practical limitations at the time of writing) that would prevent us from allowing people to tap in to the compile time phase with similar expression trees in the play. At this point, that's futuristic thinking, but plans tend towards that direction with the "compiler as a service" mission. In such a setting, you'd potentially get the ability to ask the language compilers to call into your extension for certain (annotated) parts of the code. E.g. every call to Html.TextBoxFor could result in a callback on your extension to analyze the whole call-site, including the expression tree argument passed in. If you disagree with the code written by the user, the compiler extension API could be used to feed back warning/error messages to the user at compile time.

All of this reflects very futuristic and mostly random thinking, but isn't entirely science fiction. Having a "code as data" representation at compile-time is what we're moving towards for future releases and we're only starting to see the beginning of the abilities that opens up. For now though, runtime analysis of expression trees is the best you can get. As usual, make your exceptions as clear as possible (e.g. with error codes just like a true compiler does, e.g. AB1234, which makes errors very searchable and specific)

Thanks,

-Bart

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Tuesday, August 11, 2009 12:36 AM by aL

awsome stuff :) but why didnt the team enable statement tree generation by the compiler? time constraints? it would be such a uselful thing it seems :) also it would natrually extend the syntax, make an expression lambda, get an expression tree, make a statement lambda, get a statement tree.

for me as a developer, thats very attractive :) it would still be Expression of TDelegate so type wise it would be compatible with all exsisting QueryProviders, but those that dont support statements (all of them right now) would simply throw an exception of they encounterd a statement node, just like they do today if they find a methodcall node for example.

also, what happed to that z3 series? :) thats the one ive been looking forward to the most ;)

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #409

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #409

# Learning resources for C# 4.0 and .NET 4.0 new features &laquo; Bogdan Brinzarea&#8217;s blog

Pingback from  Learning resources for C# 4.0 and .NET 4.0 new features &laquo;  Bogdan Brinzarea&#8217;s blog

# Summary 09.08.2009 &ndash; 11.08.2009 &laquo; Bogdan Brinzarea&#8217;s blog

Pingback from  Summary 09.08.2009 &ndash; 11.08.2009 &laquo;  Bogdan Brinzarea&#8217;s blog

# re: Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0

Tuesday, August 11, 2009 7:43 AM by bart

Hi aL,

Extending language support to include statement trees would be a nice thing indeed, but given the requirement to design, implement, test and document (E. Lippert) such a feature before shipping it, it doesn't come at a zero cost (to say the least). So, the question is whether it's worthwhile to have and, at this point, no compelling features would be layered on top of it (as expression trees did, to be a key element in the LINQ story). That equation sums up as "not at this point". Also, in general it's better to hold off on features without having a major consumer right from the start. Otherwise you're stuck with your initial implementation forever with no good opportunities to recover from the mistakes made. Another example of a cut feature because of such concerns is extension properties: would be nice to have, but without a major API inside the framework as a reality check for its feasibility and given the non-trivial nature of the feature (more specifically disambiguation syntax) it was decided to eliminate the feature for this release cycle (again, extension methods had a reality check by means of the LINQ operators they were initially designed for).

I'll do a follow-up post illustrating how expression tree fragments can be combined into statement trees, making the additional cost to embrace statement trees smaller than it seems from this post.

Thanks,

-Bart

PS: My LINQ to Z3 series is still planned to get a follow-up, but due to other priorities (tip: in the realm of writing, but outside the blogosphere) it has gotten on a slighly slower path. Maybe one of the following weekends.

# Statement Trees With Less Pain ??? Follow-Up on System.Linq.Expressions v4.0 - B# .NET Blog

Pingback from  Statement Trees With Less Pain ??? Follow-Up on System.Linq.Expressions v4.0 - B# .NET Blog

# Twitted by djsolid

Thursday, August 13, 2009 3:58 AM by Twitted by djsolid

Pingback from  Twitted by djsolid

# Friday Links #64 | Blue Onion Software *

Saturday, August 15, 2009 6:14 AM by Friday Links #64 | Blue Onion Software *

Pingback from  Friday Links #64 | Blue Onion Software *

# Interesting Finds: August 19, 2009

Wednesday, August 19, 2009 4:35 AM by Jason Haley

Interesting Finds: August 19, 2009

# Twitted by Mohamed_Meligy

Friday, August 21, 2009 3:07 PM by Twitted by Mohamed_Meligy

Pingback from  Twitted by Mohamed_Meligy

# Full Expression Tree Support Coming in .NET/C# 4.0

Monday, August 24, 2009 11:04 AM by hacked.brain

Full Expression Tree Support Coming in .NET/C# 4.0

# Top 9 Posts from 2009

Sunday, January 03, 2010 1:15 AM by B# .NET Blog

select top 9 [Subject] from dbo . cs_Posts where postlevel = 1 and usertime &lt; &#39;01/01/2010&#39;

# Calling a generic method with Expression.Call

Saturday, January 16, 2010 8:20 AM by OnTheBlog

The other day I came across a comment in the code base of the application I'm currently working on that said the following. // Having to loop through the GetMethods() array is pretty rubbish, even if ...

# Calling a generic method with Expression.Call

Sunday, January 17, 2010 1:02 AM by OnTheBlog

The other day I came across a comment in the code base of the application I'm currently working on that said the following. // Having to loop through the GetMethods() array is pretty rubbish, even if ...

# Reading on Expression Trees &laquo; Tales from a Trading Desk

Tuesday, January 19, 2010 7:42 AM by Reading on Expression Trees « Tales from a Trading Desk

Pingback from  Reading on Expression Trees &laquo; Tales from a Trading Desk

# Useful Twitter links Feb 8-Feb 15 2010 &laquo; Ian Mercer

Thursday, February 11, 2010 6:19 PM by Useful Twitter links Feb 8-Feb 15 2010 « Ian Mercer

Pingback from  Useful Twitter links Feb 8-Feb 15 2010 &laquo;  Ian Mercer

# .NET 4.0 expression trees: Code gen, blocks, and visitors

Thursday, February 18, 2010 8:47 AM by Fabian's Mix

.NET 4.0 expression trees: Code gen, blocks, and visitors

# Learning resources for C# 4.0 and .NET 4.0 new features | Noman Ali Blog

Pingback from  Learning resources for C# 4.0 and .NET 4.0 new features | Noman Ali Blog

# RealTime - Questions: "SQL update statement to change integer to have a 1 in front of it?"

Pingback from  RealTime - Questions: "SQL update statement to change integer to have a 1 in front of it?"

# Performance of compiled-to-delegate Expression | Ask Programming &amp; Technology

Pingback from  Performance of compiled-to-delegate Expression | Ask Programming &amp; Technology

# Any Changes/improvement For LINQ 2010 Vs LINQ 2008 | Click &amp; Find Answer !

Pingback from  Any Changes/improvement For LINQ 2010 Vs LINQ 2008 | Click &amp; Find Answer !

# replica rolex air king

whichever Replica Breitling Aeromarine this is this blogs by way of completely coded lin rolex GMT-Master replica go i believe, there may be family room here to push specific massive marked as online, and i believe an additional large wish askjeeve might