Last time in this series we were able to compile a stunningly complex “dynamic lambda” x => x – also known as I in the world of combinators – into IL code at runtime. As that’s not particularly useful, we want to move on to slightly more complex expressions like:

var o = DynamicExpression.Parameter("o");
var a = DynamicExpression.Parameter("a");
var b = DynamicExpression.Parameter("b");
var call = DynamicExpression.Call(o, "Substring", a, b);
var func = DynamicExpression.Lambda(call, o, a, b);
Console.WriteLine(func);
Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));

Or, in pretty print,

(o, a, b) => o.Substring(a, b)

 

Setting the scene

We explained how the translation of a dynamic expression tree takes place in general: as we traverse the tree, individual nodes are visited asking them to append code capturing the expression’s semantics to an IL stream, pushing a value on the stack that corresponds to the evaluated expression. The method that does this translation for every dynamic expression is called “Compile”:

/// <summary>
/// Appends IL instructions to calculate the expression's runtime value, putting it on top of the evaluation stack.
/// </summary>
/// <param name="ilgen">IL generator to append to.</param>
/// <param name="ldArgs">Lambda argument mappings.</param>
protected internal abstract void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs);

In here we’re using a simplified concept of “lambda parameters in scope” using ldArgs, avoiding getting into slightly more complex techniques such as hoisting that are required for more involved expression trees. Previously you saw how to implement this method for ParameterDynamicExpression and LambdaDynamicExpression, respectively:

protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
{
    if (!ldArgs.ContainsKey(this))
        throw new InvalidOperationException("Parameter expression " + Name + " is not in scope.");

    ilgen.Emit(OpCodes.Ldarg, ldArgs[this]);
}

and

protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
{
    Body.Compile(ilgen, ldArgs);
    ilgen.Emit(OpCodes.Ret);
}

 

Dynamic method calls and binders

For MethodCallDynamicExpression things are a bit more involved than for the expression types above. Before we start, remember the most important portion of the Compile method contract: leave one value on top of the stack that corresponds to the evaluated expression, in this case a method call. What does a method call consist of? Here are the ingredients:

public ReadOnlyCollection<DynamicExpression> Arguments { get; private set; }
public DynamicExpression Object { get; private set; }
public string Method { get; private set; }

As we’re seeing other DynamicExpression objects being referenced in here, it’s already clear we’ll have to evaluate those by recursively calling Compile. So, we could do something along the lines of:

compile Object
foreach argument in Arguments
    compile argument
call Method

That’s the typical structure of a call site, pushing arguments on the stack including the object to invoke the method on. From a stack point of view: n + 1 values are pushed, where n is the number of arguments and 1 accounts for the instance to invoke the method on, and next all of those stack citizens are eaten by the method call, producing the single return value on top of the stack. This follows the contract of our Compile method.

There’s a slight problem though: we can’t emit the call because we don’t know what type to invoke it on. The reason should be well-known by now: neither the Object nor the Arguments carry static type information, so given just a string named “Method”, we can’t get the method metadata required to emit a call(virt) instruction. Bummer. But that’s the whole point of dynamic programming: delaying the decision about the executed method/function till runtime, because the type might dynamically grow with new members (think of ETS in PowerShell as a sample of such a capability).

One way to solve this problem is to emit a bunch of reflection code to investigate the type of Object at runtime, do the same for all of the arguments, try to find a suitable method to call, etc. I don’t need to explain how complicated this would become :-). There are lots of drawbacks to this: we’d be baking the whole dynamic call infrastructure into the call site, and once all of that code has been emitted there’s no way to adapt it without recompiling. The whole “locate a suitable method” algorithm could also be made extensible if we don’t emit it straight into the generated code. In other words, we want to get out of the IL generating business as soon as we can, and introduce a level of indirection. That particular kind of indirection is what we call a binder.

So what’s a binder precisely? It’s simply a class that contains all the functionality to make well-formed decisions (based on certain desired semantics) about method calls (amongst other invocation mechanisms). Actually we have such a thing in the framework already: System.Reflection.Binder. As the documentation says:

Selects a member from a list of candidates, and performs type conversion from actual argument type to formal argument type.

The list of candidates is something that can be made extensible, allowing methods to be “imported” or “attached” to existing types at runtime. The type conversion clause in the sentence above outlines that the binder is responsible for taking the actual passed-in arguments (in our case weakly typed) and turning them into (i.e. casting to) formal argument types that are suitable for consumption by the selected candidate method. The sample on MSDN for System.Reflection.Binder shows what it takes to implement such a beast. We’re not going to do that though, just to simplify matters a bit. As we’re only interested in method calls, we’ll just implement the bare minimum binder to get the job done and explained. Furthermore, we won’t spend time on implicit conversions for built-in types (like int to long) as the mentioned sample illustrates that already. Last but not least, generics are not brought into the equation either.
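To get a feel for what the framework already does for us, here is a minimal sketch that lets reflection select an overload from the argument types. The class name BinderPeek is made up for illustration; as far as I know, Type.GetMethod(string, Type[]) falls back on the default binder to make the selection.

```csharp
using System;
using System.Reflection;

public static class BinderPeek
{
    // Ask reflection for the String.Substring overload that accepts two ints,
    // then invoke it late-bound on the given string.
    public static object SubstringViaReflection(string s, int start, int length)
    {
        MethodInfo method = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });
        return method.Invoke(s, new object[] { start, length });
    }

    static void Main()
    {
        Console.WriteLine(SubstringViaReflection("Bart", 1, 2)); // ar
    }
}
```

The difference with our situation is that here the argument types are spelled out by the caller, whereas our binder has to derive them from the runtime values.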

Without further delay, let’s show a possible binder implementation:

public class DynamicBinder
{
    public static object Call(object @this, string methodName, params object[] args)
    {
        //
        // Here we're going to be lazy for demo purposes only. Our overload resolution
        // only succeeds if exactly one method is applicable; it doesn't apply the
        // "betterness" rules outlined in the C# specification (v2.0, §7.4.2).
        // We don't care about extension methods either (how could the namespace be
        // brought in scope in the context of an expression tree...?) nor other dynamic
        // type extensions such as IExpando (~ IMarshalEx) or e.g. PowerShell ETS.
        //

        var result = (from method in @this.GetType().GetMethods()
                      where method.Name == methodName
                      let parameters = method.GetParameters()
                      where parameters.Length == args.Length
                            && parameters.Where((p, i) => p.ParameterType.IsAssignableFrom(args[i].GetType())).Count() == args.Length
                      select new { Method = method, Parameters = parameters }).SingleOrDefault();

        if (result == null)
        {
            StringBuilder sb = new StringBuilder();
            sb.Append("Failed to bind method call: ");
            sb.Append(@this.GetType());
            sb.Append(".");
            sb.Append(methodName);
            sb.Append("(");

            int n = args.Length;
            for (int i = 0; i < n; i++)
                sb.Append(args[i].GetType().ToString() + (i != n - 1 ? ", " : ""));

            sb.Append(").");
            throw new InvalidOperationException(sb.ToString());
        }

        return result.Method.Invoke(@this, args);
    }
}

This needs some explanation, I assume. The signature should be straightforward: given an object @this, we want to call method methodName with zero or more arguments args. The result of this will be an object again (notice we don’t support void return types for methods being called, which isn’t too big a deal when considering functions as lambdas – i.e. no statement lambdas). What’s more interesting though is the way we find a suitable method. I chose to write it as a gigantic LINQ expression just to show how powerful LINQ can be. Let me walk you through it:

var result = (from method in @this.GetType().GetMethods()
              where method.Name == methodName
              let parameters = method.GetParameters()

For all methods available on the left-hand side of the call (i.e. @this) select those methods that have the same name (case sensitive compare – this would be a binder that mimics C# name resolution for method calls) and let parameters be a variable containing the parameters for each of the selected methods going forward. In other words, in what follows we’re seeing a sequence of (method, parameters) pairs mapping each suitable (at least concerning the name) method on the parameters it takes. Next we need to do overload resolution:

              where parameters.Length == args.Length

Here we make sure the number of parameters on the candidate method matches the number of arguments passed in to the binder’s Call method. This implies we don’t consider things like optional arguments supported by some languages, where supplying fewer arguments than there are parameters (but not more!) would keep the method as a candidate, although there would need to be some ordering to make sure that methods with more explicitly supplied arguments take precedence over methods with arguments supplied through optional values. Notice this simple check also makes it impossible to call a “params” method without stuffing the arguments in an array upfront.

                    && parameters.Where((p, i) => p.ParameterType.IsAssignableFrom(args[i].GetType())).Count() == args.Length

Now we’re in the clause that’s maybe the most interesting. Here we take all the parameters of the candidate and check that the parameter p at position i has a type that’s assignable from the type of the argument passed in to the binder’s Call method. In essence this is contravariance for arguments. Assume we’re examining a candidate method like this:

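class ExperimentalZoo
{
    // Public, since GetMethods() without binding flags only returns public methods.
    // Assumes Mammal derives from Animal.
    public Animal CloneBeast(Mammal g) { return g; }
}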

and we’re calling the binder as follows:

DynamicBinder.Call(new ExperimentalZoo(), "CloneBeast", new Giraffe())

As we’re calling the binder with an argument of type Giraffe (args[0].GetType()) and Giraffe inherits from Mammal (parameters[0].ParameterType), the candidate is compatible. However, if we’d call the method with an argument of type Goldfish it would clearly not be compatible (as a fish is not a mammal). This is precisely what the Where clause above enforces. The Count() == args.Length trick at the end makes sure all of the arguments pass the test (using the All operator would be ideal but it doesn’t have an overload that passes in the index; alternatively a Zip operator would be beneficial too).
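For what it’s worth, later framework versions (.NET 4 and up) did add an Enumerable.Zip operator, so the same applicability test can be sketched with Zip and All. This is just an illustration under that assumption; the ApplicabilityCheck class and IsApplicable method are hypothetical names, and null arguments are not handled.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ApplicabilityCheck
{
    // True when the arity matches and every argument's runtime type can be
    // assigned to the corresponding parameter type. Assumes no argument is null.
    public static bool IsApplicable(MethodInfo method, object[] args)
    {
        ParameterInfo[] parameters = method.GetParameters();
        return parameters.Length == args.Length
            && parameters.Zip(args, (p, a) => p.ParameterType.IsAssignableFrom(a.GetType()))
                         .All(ok => ok);
    }

    static void Main()
    {
        MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });
        Console.WriteLine(IsApplicable(substring, new object[] { 1, 2 }));   // True
        Console.WriteLine(IsApplicable(substring, new object[] { 1, "2" })); // False
    }
}
```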

Finally we have the select clause:

              select new { Method = method, Parameters = parameters }).SingleOrDefault();

which simply extracts the method (of type MethodInfo) and the parameters (of type ParameterInfo[]) and makes sure we only found one match. This is another simplification for illustrative purposes only – to be fully compliant with e.g. the C# language, we’d have to implement all of the overload resolution rules including “betterness rules” that select the most optimal overload. More information on this can be found in the C# specification, in v3.0 under “7.4.3 Overload Resolution”. The key takeaway though is that we can tweak this binder as much as we want (e.g., left as an exercise, we could implement resolution that takes extension methods into account) without affecting the generated IL code that will simply call into the binder’s Call method.

If we find one result, we can just go ahead and call it by calling through the retrieved Method using the Invoke method, passing in the @this pointer and the args array.
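To make the consequences of the SingleOrDefault simplification tangible, here is a condensed sketch of the binder (MiniBinder is a made-up name, and the error message is shortened) together with one call that binds and one that is ambiguous. The ambiguous case relies on String having both Equals(string) and Equals(object), which both accept a string argument.

```csharp
using System;
using System.Linq;

public static class MiniBinder
{
    // Condensed binder: exactly one public method must match by name, arity
    // and argument assignability. Null arguments are not handled.
    public static object Call(object target, string methodName, params object[] args)
    {
        var method = target.GetType().GetMethods()
            .Where(m => m.Name == methodName)
            .Where(m => m.GetParameters().Length == args.Length)
            .Where(m => m.GetParameters()
                         .Select((p, i) => p.ParameterType.IsAssignableFrom(args[i].GetType()))
                         .All(ok => ok))
            .SingleOrDefault();

        if (method == null)
            throw new InvalidOperationException("Failed to bind method call: " + methodName);

        return method.Invoke(target, args);
    }

    static void Main()
    {
        Console.WriteLine(Call("Bart", "Substring", 1, 2)); // ar

        try
        {
            // Two candidates survive the filters, so SingleOrDefault throws.
            Call("Bart", "Equals", "Bart");
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("Ambiguous call: " + ex.Message);
        }
    }
}
```

A binder with real betterness rules would prefer Equals(string) here, since string is a more specific match for the argument than object.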

 

Connecting the pieces

Now that we have our beloved binder, we need to glue it together with our dynamic expression compilation. In concrete terms this means we need to emit a call to DynamicBinder.Call in the generated IL for the MethodCallDynamicExpression. This isn’t too hard either:

protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
{
    if (Object == null)
        ilgen.Emit(OpCodes.Ldnull);
    else
        Object.Compile(ilgen, ldArgs);

    ilgen.Emit(OpCodes.Ldstr, Method);

    ilgen.Emit(OpCodes.Ldc_I4, Arguments.Count);
    ilgen.Emit(OpCodes.Newarr, typeof(object));

    LocalBuilder arr = ilgen.DeclareLocal(typeof(object[]));
    ilgen.Emit(OpCodes.Stloc, arr);

    int i = 0;
    foreach (DynamicExpression arg in Arguments)
    {
        ilgen.Emit(OpCodes.Ldloc, arr);
        ilgen.Emit(OpCodes.Ldc_I4, i++);
        arg.Compile(ilgen, ldArgs);
        ilgen.Emit(OpCodes.Stelem_Ref);
    }

    ilgen.Emit(OpCodes.Ldloc, arr);

    ilgen.EmitCall(OpCodes.Call, typeof(DynamicBinder).GetMethod("Call"), null);
}

What’s going on here? First, we check whether an Object has been specified. This is more of an extensibility point in case our binder would like to implement global functions (also left as an exercise, for example you could recognize null.Add(1, 2) as a global Add call, translating into Math.Add(…); or, the Method property could be set to “Math.Add” to denote a static method call). We’ll assume the else case holds true for our samples, causing us to call Compile recursively on the Object dynamic expression. This will add the value corresponding to the Object expression tree’s evaluation on top of the stack (note: you can smell call-by-value semantics already, can’t you?). Next, we load the string specified in the Method property onto the stack as well. Currently the stack looks like:

(string) Method
(object) Object.Compile result

Now we get into interesting stuff as our binder’s Call method expects to see an object[] as its third parameter. How many elements? One for each of the DynamicExpression objects in the Arguments collection, so we emit Newarr passing in typeof(object), after pushing the number of elements to be allocated on the stack using Ldc_I4 with Arguments.Count. Now that we have our array, we store it in a local variable we call “arr”. Time to fill the array by first loading the local, then pushing the index followed by a push of the argument’s value – again obtained by a recursive Compile call on the argument “arg” – and finally calling stelem.ref (as we’re dealing with System.Object we need the ref variant). The loop invariant is that it doesn’t change the stack height: each iteration cleanly loads three “arguments” for stelem.ref, which consumes them and brings the stack delta back to 0.

Ultimately, we load the array local variable and the stack looks like (semantically):

(object[]) Arguments.Select(arg => arg.Compile()).ToArray()
(string) Method
(object) Object.Compile()

ready for a call to DynamicBinder.Call which turns the stack into:

(object) DynamicBinder.Call(Object.Compile(), Method, Arguments.Select(arg => arg.Compile()).ToArray())

Again, we have managed to keep the house clean with regards to the stack behavior, i.e. the element on top of the stack contains the value corresponding to the entire (MethodCall)DynamicExpression.
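The call site shape described above can be tried in isolation. The following is a minimal, self-contained sketch that hand-emits the IL for (o, a, b) => o.Substring(a, b) against a stand-in binder. CallSiteSketch and its simplistic Call method (matching on name and arity only) are assumptions for illustration, not the DynamicBinder from this post; the class is public because the dynamic method has to be allowed to call into it.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class CallSiteSketch
{
    // Stand-in binder: picks the first method matching by name and arity only.
    public static object Call(object target, string methodName, object[] args)
    {
        foreach (MethodInfo m in target.GetType().GetMethods())
            if (m.Name == methodName && m.GetParameters().Length == args.Length)
                return m.Invoke(target, args);
        throw new InvalidOperationException("Failed to bind " + methodName);
    }

    // Emits the equivalent of (o, a, b) => Call(o, "Substring", new object[] { a, b }).
    public static Func<object, object, object, object> BuildSubstring()
    {
        var method = new DynamicMethod("SubstringCallSite", typeof(object),
            new[] { typeof(object), typeof(object), typeof(object) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);                 // target object
        il.Emit(OpCodes.Ldstr, "Substring");      // method name
        il.Emit(OpCodes.Ldc_I4_2);                // array length
        il.Emit(OpCodes.Newarr, typeof(object));
        LocalBuilder arr = il.DeclareLocal(typeof(object[]));
        il.Emit(OpCodes.Stloc, arr);

        il.Emit(OpCodes.Ldloc, arr);              // arr[0] = a
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Stelem_Ref);

        il.Emit(OpCodes.Ldloc, arr);              // arr[1] = b
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Ldarg_2);
        il.Emit(OpCodes.Stelem_Ref);

        il.Emit(OpCodes.Ldloc, arr);              // third argument for Call
        il.Emit(OpCodes.Call, typeof(CallSiteSketch).GetMethod("Call"));
        il.Emit(OpCodes.Ret);

        return (Func<object, object, object, object>)
            method.CreateDelegate(typeof(Func<object, object, object, object>));
    }

    static void Main()
    {
        Console.WriteLine(BuildSubstring()("Bart", 1, 2)); // ar
    }
}
```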

 

Testing it

Does it work? Let’s try with our running sample:

var o = DynamicExpression.Parameter("o");
var a = DynamicExpression.Parameter("a");
var b = DynamicExpression.Parameter("b");
var call = DynamicExpression.Call(o, "Substring", a, b);
var func = DynamicExpression.Lambda(call, o, a, b);
Console.WriteLine(func);
Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));

Recognize the patterns in the output IL?

[Image: IL listing of the generated method]

A quick walk-through:

  • IL_0000 loads MethodCallDynamicExpression.Object which in turn was compiled into a ldarg V_0 by the ParameterDynamicExpression’s Compile method (this corresponds to “o”)
  • IL_0006 loads MethodCallDynamicExpression.Method
  • IL_000b to IL_0015 prepares the array for the method call arguments to be passed to the binder
  • IL_0016 to IL_0022 puts the first argument (corresponding to “a” translated into ldarg V_1 through ParameterDynamicExpression.Compile) in the array
  • IL_0023 to IL_002f does the same for the second argument (corresponding to “b” translated into ldarg V_2 through ParameterDynamicExpression.Compile)
  • IL_0030 to IL_0036 finally makes the call through the binder, passing in the results of the above and returning the value produced by the binder

If we now set a breakpoint in the DynamicBinder.Call method and let execution continue, we’ll see:

[Image: debugger stopped at the breakpoint in DynamicBinder.Call, showing the Call Stack]

The third line in the Call Stack is where DynamicInvoke is happening:

Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));

and through the “External Code” corresponding to our emitted dynamic method we got back into the DynamicBinder that now will pick the right Substring method given lhs “Bart” and arguments 1 and 2. Ultimately the following prints to the screen:

[Image: console output]

Magic. To show it’s really extensible we can start to compose things endlessly with our two main ingredients: parameter and method call expressions. Here’s a sample (reverse engineering the nested DynamicExpression factory calls is left as an exercise):

[Image: a larger composed dynamic expression and its console output]

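Also left as an exercise to the reader is to find values for o and a through h that produce the displayed output above :-). For the record, here’s the corresponding IL (the nop pairs following each ldarg appear because Emit was handed a 4-byte int operand where ldarg expects a 2-byte one; the two extra zero bytes decode as nop):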

IL_0000: ldarg      V_0
IL_0004: nop       
IL_0005: nop       
IL_0006: ldstr      "Replace"
IL_000b: ldc.i4     2
IL_0010: newarr     Object
IL_0015: stloc.0   
IL_0016: ldloc.0   
IL_0017: ldc.i4     0
IL_001c: ldarg      V_3
IL_0020: nop       
IL_0021: nop       
IL_0022: stelem.ref
IL_0023: ldloc.0   
IL_0024: ldc.i4     1
IL_0029: ldarg      V_4
IL_002d: nop       
IL_002e: nop       
IL_002f: stelem.ref
IL_0030: ldloc.0   
IL_0031: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_0036: ldstr      "Substring"
IL_003b: ldc.i4     2
IL_0040: newarr     Object
IL_0045: stloc.1   
IL_0046: ldloc.1   
IL_0047: ldc.i4     0
IL_004c: ldarg      V_1
IL_0050: nop       
IL_0051: nop       
IL_0052: stelem.ref
IL_0053: ldloc.1   
IL_0054: ldc.i4     1
IL_0059: ldarg      V_2
IL_005d: nop       
IL_005e: nop       
IL_005f: stelem.ref
IL_0060: ldloc.1   
IL_0061: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_0066: ldstr      "Replace"
IL_006b: ldc.i4     2
IL_0070: newarr     Object
IL_0075: stloc.2   
IL_0076: ldloc.2   
IL_0077: ldc.i4     0
IL_007c: ldarg      V_5
IL_0080: nop       
IL_0081: nop       
IL_0082: stelem.ref
IL_0083: ldloc.2   
IL_0084: ldc.i4     1
IL_0089: ldarg      V_6
IL_008d: nop       
IL_008e: nop       
IL_008f: stelem.ref
IL_0090: ldloc.2   
IL_0091: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_0096: ldstr      "PadRight"
IL_009b: ldc.i4     2
IL_00a0: newarr     Object
IL_00a5: stloc.3   
IL_00a6: ldloc.3   
IL_00a7: ldc.i4     0
IL_00ac: ldarg      V_7
IL_00b0: nop       
IL_00b1: nop       
IL_00b2: stelem.ref
IL_00b3: ldloc.3   
IL_00b4: ldc.i4     1
IL_00b9: ldarg      V_8
IL_00bd: nop       
IL_00be: nop       
IL_00bf: stelem.ref
IL_00c0: ldloc.3   
IL_00c1: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_00c6: ldstr      "ToUpper"
IL_00cb: ldc.i4     0
IL_00d0: newarr     Object
IL_00d5: stloc.s    V_4
IL_00d7: ldloc.s    V_4
IL_00d9: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_00de: ret       

Enjoy! Next time … who knows what?


Welcome back to the dynamic expression tree fun. Last time we designed the simplified expression tree class library that we’ll be using to enable dynamic treatment of objects. Today, we’ll take this one step further by emitting IL code that resolves the operations invoked on such dynamic objects at runtime through a mechanism called binders. Before we dive in, let me point out that everything discussed in this series is greatly simplified just to illustrate the core ideas and base mechanisms/principles that make dynamic language stuff work.

 

Introducing IL generation

Dynamic code compilation is a wonderful thing. It’s not that hard once you get the basics right (and have some level of IL opcode understanding) but quite hard to debug. Luckily we have tools like Haibo Luo’s IL Visualizer. Since I’ll be using this, download it, extract the ZIP file, compile the whole solution and copy ILMonitor\bin\Debug\*.dll to %programfiles%\Microsoft Visual Studio 9.0\Common7\Packages\Debugger\Visualizers. Alternatively you can put it in your personal Visual Studio 2008\Visualizers folder.

So, what’s our task? Assume we have the following piece of sample code:

class Program
{
    static void Main(string[] args)
    {
        var o = DynamicExpression.Parameter("o");
        var a = DynamicExpression.Parameter("a");
        var b = DynamicExpression.Parameter("b");
        var call = DynamicExpression.Call(o, "Substring", a, b);
        var func = DynamicExpression.Lambda(call, o, a, b);
        Console.WriteLine(func);
        Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));
    }
}

We already know how to construct the objects and how to represent them as a string (which would be (o, a, b) => o.Substring(a, b)). Now we need to focus on the Compile method on LambdaDynamicExpression, called in the last line of Main. Starting with the signature of the delegate, we want to create (at runtime) a method that takes in three “dynamic” parameters (corresponding to parameter expressions o, a and b), returning a resulting object. Since we don’t have any type information available, everything should be System.Object, so we’d end up with the following delegate:

delegate object TheDynamicLambdaFunction(object o, object a, object b);

Looking at Compile as a black box, it will return an instance of this delegate pointing at an on-the-fly generated method corresponding to the lambda’s body expression. Since it returns a System.Delegate, one can call DynamicInvoke (or cast it to a compatible delegate type) to invoke it with the given parameters. Obviously we want the call to do “the right thing”; in the sample above that corresponds to a method call to System.String::Substring on “Bart”, passing in startIndex 1 and length 2, producing another string containing “ar”.
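To make the black box a bit more concrete, here is a small sketch of what the caller’s side looks like, with a hand-written lambda standing in for the method Compile will eventually generate. The casts inside the lambda are exactly the knowledge a real dynamic call won’t have at compile time; the class name DynamicInvokeSketch is made up.

```csharp
using System;

public static class DynamicInvokeSketch
{
    delegate object TheDynamicLambdaFunction(object o, object a, object b);

    public static object Run()
    {
        // Hand-written stand-in for the method that Compile generates at runtime.
        TheDynamicLambdaFunction f = (o, a, b) => ((string)o).Substring((int)a, (int)b);

        // Compile returns a plain System.Delegate, so callers go through DynamicInvoke.
        Delegate d = f;
        return d.DynamicInvoke("Bart", 1, 2);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // ar
    }
}
```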

It should be clear that we need to emit IL on the fly to translate the lambda expression but also the lambda’s body which could be anything, not just a MethodCallDynamicExpression. Since we lack other expression node types, one such (trivial) thing would be:

var x = DynamicExpression.Parameter("x");
var I = DynamicExpression.Lambda(x, x);
Console.WriteLine(I);
Console.WriteLine(I.Compile().DynamicInvoke("Bart"));

which is just the identity function (the first argument to Lambda, x, is the lambda’s body). I intentionally named the expression above “I” after the SKI combinators, where I is defined as λx . x. Similarly we could define the K combinator as:

var x = DynamicExpression.Parameter("x");
var y = DynamicExpression.Parameter("y");
var K = DynamicExpression.Lambda(x, x, y);
Console.WriteLine(K);
Console.WriteLine(K.Compile().DynamicInvoke("Bart", 123));

but we got sidetracked, so time to move on. The whole point here is that we can’t assume the body of the lambda to be a MethodCallDynamicExpression. So, how do we tackle this? An important observation one can make is this: an expression represents a single value. Right, so what? Wait a minute, is IL-code not stack-based? Adding the two things together we could think of the following solution:

Calling a Compile method on an expression tree object, given a writeable stream for IL instructions, should add all the instructions to the stream required to evaluate the expression, leaving the result of the evaluation on top of the stack.

A LambdaDynamicExpression is the only dynamic expression that supports a publicly visible Compile method. Its pseudo-code would look like:

  1. Create an IL stream; here the IL stack is empty.
  2. Take the Body expression and compile it by emitting IL instructions for it; this causes the IL stack to be one high.
  3. Add an IL return instruction to return the object on top of the stack.
  4. Return a delegate pointing to the method represented by the generated IL code.
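The four steps above can be sketched by hand for the identity function, without any expression tree classes involved. This is only an illustration using System.Reflection.Emit.DynamicMethod directly; IdentitySketch is a hypothetical name.

```csharp
using System;
using System.Reflection.Emit;

public static class IdentitySketch
{
    public static Func<object, object> CompileIdentity()
    {
        // 1. Create the IL stream for a method taking and returning object.
        var method = new DynamicMethod("I", typeof(object), new[] { typeof(object) });
        ILGenerator il = method.GetILGenerator();

        // 2. "Compile the body": the body is the parameter x, so load argument 0.
        il.Emit(OpCodes.Ldarg_0);

        // 3. Return the value on top of the stack.
        il.Emit(OpCodes.Ret);

        // 4. Wrap the generated method in a delegate.
        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }

    static void Main()
    {
        Console.WriteLine(CompileIdentity()("Bart")); // Bart
    }
}
```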

 

Supporting expression compilation

To make this work, we’ll first extend the base class by adding one more method:

/// <summary>
/// Class representing a dynamic expression tree.
/// </summary>
abstract class DynamicExpression
{
    /// <summary>
    /// Appends IL instructions to calculate the expression's runtime value, putting it on top of the evaluation stack.
    /// </summary>
    /// <param name="ilgen">IL generator to append to.</param>
    /// <param name="ldArgs">Lambda argument mappings.</param>
    protected internal abstract void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs);

}

This method will take in two things: the IL generator (referred to as “IL stream” in the previous paragraph) and a mapping table for the lambda’s parameter expressions. Why do we need the latter, or better: what does it map the parameter expressions to? Assume we’re compiling

(o, a, b) => o.Substring(a, b)

While traversing the expression tree, asking every node in the correct order to emit IL instructions, we’ll encounter references to the parameters again. Our goal is to write a dynamic method looking like this:

object GeneratedDynamicMethod(object o, object a, object b)
{
    return o.Substring(a, b);
}

where the . obviously denotes a dynamic method call in this case. As we encounter parameter expressions like o, a or b during the translation of the method body, we need to know how to load those parameters from the argument list on the dynamic method. First of all, notice the lambda parameters got mapped in order of specification onto the arguments of the generated dynamic method, i.e. o, the first lambda parameter, is the first parameter on the generated method. And so on. This is precisely what the ldArgs argument on Compile stands for: a mapping from the parameter expressions of the lambda onto the concrete indices for the arguments:

o –> 0
a –> 1
b –> 2

Whenever we encounter such a parameter expression during the compilation, we know the position of the argument, so we can emit a ldarg instruction. This is the most trivial Compile override:

sealed class ParameterDynamicExpression : DynamicExpression
{
    protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
    {
        if (!ldArgs.ContainsKey(this))
            throw new InvalidOperationException("Parameter expression " + Name + " is not in scope.");

        ilgen.Emit(OpCodes.Ldarg, ldArgs[this]);
    }
    …
}

This simply says, whenever code needs to be emitted for a ParameterDynamicExpression, simply try to find it in the dictionary to map it onto the formal parameter index on the dynamic method being emitted and turn it into a ldarg instruction for that argument index. Real full-fledged expression tree implementations would be slightly more complicated because arguments could be hidden when dealing with nested lambdas (quoting, invocation expressions, etc) but that would take us too far away from home.
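As a standalone illustration of the mapping idea, the sketch below builds (x, y) => x or (x, y) => y depending on which parameter name is used as the body. It keys the dictionary on names rather than on ParameterDynamicExpression objects to stay self-contained, and it casts the index to short because ldarg takes a 16-bit operand; both are my own choices for this sketch, and LdargSketch is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Emit;

public static class LdargSketch
{
    // Builds (x, y) => <parameter named bodyName> using a name-to-index map.
    public static Func<object, object, object> Build(string bodyName)
    {
        var ldArgs = new Dictionary<string, int> { { "x", 0 }, { "y", 1 } };
        if (!ldArgs.ContainsKey(bodyName))
            throw new InvalidOperationException("Parameter expression " + bodyName + " is not in scope.");

        var method = new DynamicMethod("K", typeof(object), new[] { typeof(object), typeof(object) });
        ILGenerator il = method.GetILGenerator();

        // ldarg takes a 16-bit operand, hence the cast to short.
        il.Emit(OpCodes.Ldarg, (short)ldArgs[bodyName]);
        il.Emit(OpCodes.Ret);

        return (Func<object, object, object>)method.CreateDelegate(typeof(Func<object, object, object>));
    }

    static void Main()
    {
        Console.WriteLine(Build("x")("Bart", 123)); // Bart
        Console.WriteLine(Build("y")("Bart", 123)); // 123
    }
}
```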

For the LambdaDynamicExpression, besides a public Compile method, there will also be an override to the inherited one. It simply asks the Body expression to emit itself (which will result in a one-level high stack containing the evaluation result of the body expression), followed by a ret instruction (simply returning the value evaluated through the Body’s IL code preceding it):

sealed class LambdaDynamicExpression : DynamicExpression
{
    protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
    {
        Body.Compile(ilgen, ldArgs);
        ilgen.Emit(OpCodes.Ret);
    }

}

 

Emitting code

For this post, we’ll omit an implementation for MethodCallDynamicExpression as that will be part of the next post focusing on binders. All we want to get to work today is the I combinator or identity function (yeah, another world-beater :-)):

var x = DynamicExpression.Parameter("x");
var I = DynamicExpression.Lambda(x, x);
Console.WriteLine(I);
Console.WriteLine(I.Compile().DynamicInvoke("Bart"));

In other words, today we’ll focus on the plumbing of emitting the code and wrapping the method in a delegate that can be returned upon calling LambdaDynamicExpression.Compile. The result for the sample above would be equivalent to:

public Delegate Compile()
{
    return new Func<object, object>(delegate(object x) { return x; });
}

The anonymous method wrapped in the Func delegate is the code corresponding to I’s compilation. In IL terms it is as simple as this:

ldarg.0
ret

This emitted IL method body then needs to get the signature that says “taking in an object, returning an object”. All of this makes up the dynamic method. But we’re not done yet, as we need to return a delegate to it. In the free translation above, I’ve leveraged the generic System.Func<T1,R> delegate but we only have a limited number of those (up to four arguments), so what if we encounter a method that takes more arguments? Indeed, we’ll need to generate our own delegate types as well. Notice we could cache those very efficiently: the ones with arity (~ number of parameters) up to 4 could simply be mapped onto System.Func delegates with System.Object type parameters, while others would be generated on the fly and kept for reuse if another method with same arity gets compiled. We’ll omit this optimization for now.
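The caching idea for small arities could look roughly like the sketch below. It maps an arity onto a System.Func delegate type closed over System.Object and remembers the result; anything beyond four parameters returns null here, standing in for the “generate our own delegate type” path. DelegateTypeCache and GetFuncType are hypothetical names, and the cache isn’t thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelegateTypeCache
{
    // Open generic Func types, indexed by number of parameters (0 through 4).
    static readonly Type[] OpenFuncTypes =
    {
        typeof(Func<>), typeof(Func<,>), typeof(Func<,,>), typeof(Func<,,,>), typeof(Func<,,,,>)
    };

    static readonly Dictionary<int, Type> Cache = new Dictionary<int, Type>();

    // Returns the delegate type object(object, ..., object) for the given arity,
    // or null when the arity is out of range and a type would have to be generated.
    public static Type GetFuncType(int arity)
    {
        if (arity < 0 || arity >= OpenFuncTypes.Length)
            return null;

        Type result;
        if (!Cache.TryGetValue(arity, out result))
        {
            // One type argument per parameter plus one for the return type.
            Type[] typeArgs = Enumerable.Repeat(typeof(object), arity + 1).ToArray();
            result = OpenFuncTypes[arity].MakeGenericType(typeArgs);
            Cache[arity] = result;
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(GetFuncType(1) == typeof(Func<object, object>)); // True
        Console.WriteLine(GetFuncType(5) == null);                          // True
    }
}
```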

Here’s what the code to create our own delegate type looks like:

private static Type GetDynamicDelegate(Type[] argumentTypes, Type returnType)
{
    //
    // Assemblies contain modules; generate those with unique names.
    // The generated assembly is runtime only (doesn't need to be saved to disk).
    //
    AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(new AssemblyName(Guid.NewGuid().ToString()), AssemblyBuilderAccess.Run);
    ModuleBuilder moduleBuilder = assemblyBuilder.DefineDynamicModule(Guid.NewGuid().ToString());

    //
    // Our delegate is a private sealed type deriving from MulticastDelegate.
    //
    TypeBuilder typeBuilder = moduleBuilder.DefineType("Lambdas", TypeAttributes.NotPublic | TypeAttributes.Sealed | TypeAttributes.AutoLayout | TypeAttributes.AnsiClass, typeof(MulticastDelegate));

    //
    // The delegate's constructor is a "special name" method with signature (object, native int).
    // It doesn't have a method body by itself; rather, it's supplied by the managed runtime.
    //
    ConstructorBuilder ctorBuilder = typeBuilder.DefineConstructor(MethodAttributes.Public | MethodAttributes.HideBySig | MethodAttributes.SpecialName | MethodAttributes.RTSpecialName, CallingConventions.Standard, new Type[] { typeof(object), typeof(IntPtr) });
    ctorBuilder.SetImplementationFlags(MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

    //
    // We only need the Invoke method (BeginInvoke and EndInvoke are irrelevant for us).
    // It doesn't have a method body by itself; rather, it's supplied by the managed runtime.
    // Here our delegate signature is enforced.
    //
    MethodBuilder invokeMethodBuilder = typeBuilder.DefineMethod("Invoke", MethodAttributes.Public | MethodAttributes.NewSlot | MethodAttributes.HideBySig | MethodAttributes.Virtual, CallingConventions.HasThis, returnType, argumentTypes);
    invokeMethodBuilder.SetImplementationFlags(MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

    //
    // Return the created delegate type.
    // Notice we could cache this for reuse by other dynamic methods.
    //
    return typeBuilder.CreateType();
}

Lots of attribute flags which you can read all about in the CLI specification. I don’t pretend to memorize all of those attributes; why would I if ILDASM makes life just great? :-)

image

This is the screenshot of ILDASM showing a delegate for a method with signature object(object, object) as you can see on the Invoke method. We don’t need any of the asynchronous pattern implementation, so we just need a constructor and Invoke method (see section IIA.13.6 on “Delegates” in the CLI standard). One special thing about those is they don’t have an IL code body as they are “runtime managed” (see IIA.14.4.3 on “Implementation Attributes of Methods” in the CLI standard):

image
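
The caching idea mentioned earlier (map small arities onto System.Func, generate and remember the rest) could be sketched roughly as follows. This is only an illustration under assumptions: the DelegateTypeCache class and its GetDelegateType method are hypothetical names, not part of the code in this series, and the generator is passed in as a delegate so the sketch stands on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the article's code): picks a delegate type by arity.
static class DelegateTypeCache
{
    // Open generic Func definitions, indexed by arity (0 through 4 parameters).
    private static readonly Type[] FuncTypes =
    {
        typeof(Func<>), typeof(Func<,>), typeof(Func<,,>), typeof(Func<,,,>), typeof(Func<,,,,>)
    };

    // Delegate types generated on the fly for larger arities, keyed by arity.
    private static readonly Dictionary<int, Type> Generated = new Dictionary<int, Type>();

    public static Type GetDelegateType(int arity, Func<Type[], Type, Type> generate)
    {
        if (arity < FuncTypes.Length)
        {
            // All parameters and the return value are System.Object in our dynamic world.
            var typeArgs = new Type[arity + 1];
            for (int i = 0; i < typeArgs.Length; i++)
                typeArgs[i] = typeof(object);
            return FuncTypes[arity].MakeGenericType(typeArgs);
        }

        lock (Generated)
        {
            Type result;
            if (!Generated.TryGetValue(arity, out result))
            {
                var args = new Type[arity];
                for (int i = 0; i < args.Length; i++)
                    args[i] = typeof(object);

                // Generate once, then reuse for every later lambda with the same arity.
                result = generate(args, typeof(object));
                Generated[arity] = result;
            }
            return result;
        }
    }
}
```

With something like this in place, a call such as DelegateTypeCache.GetDelegateType(args.Length, GetDynamicDelegate) could stand in for a direct call to GetDynamicDelegate. Passing the generator as a parameter is just a convenience for this sketch; a real implementation would probably call it directly.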

Now that we can generate the delegate, we just need to ask the lambda expression (since that’s the root of the tree) to emit its IL code, which will traverse the entire tree. In order to be able to do this, we need to keep track of how the lambda parameters map onto the formal arguments, as mentioned earlier. Here’s the result:

public Delegate Compile()
{
    //
    // Map the lambda parameters onto formal argument indices.
    // Also build up the argument type array.
    //
    var args = new Type[Parameters.Count];
    var ldArgs = new Dictionary<ParameterDynamicExpression, int>();
    for (int i = 0; i < args.Length; i++)
    {
        args[i] = typeof(object);
        ldArgs[Parameters[i]] = i;
    }

    //
    // Compile the expression tree to an IL method body.
    //
    var method = new DynamicMethod("", typeof(object), args);
    var ilgen = method.GetILGenerator();
    Compile(ilgen, ldArgs);

    //
    // Get the delegate matching the dynamic method signature.
    //
    Type dynamicDelegate = GetDynamicDelegate(args, typeof(object));

    //
    // Return a delegate pointing at our dynamic method.
    //
    return method.CreateDelegate(dynamicDelegate);
}

This code should be relatively straightforward. First we create the mapping while building up an array just containing typeof(object)’s (since all arguments are objects in our dynamic world). Next we create a dynamic method with the right signature, produce the IL generator and let the expression compilation do all of the work to emit the IL. And finally we stick the whole thing in a dynamically created delegate that matches the signature, returning that to the caller. Setting a breakpoint on the last line and executing for the “I” identity combinator shows this:

image

This is the IL visualizer we installed earlier. Notice the friendly string representation for the dynamic method shows the signature, which matches the one of the dynamic lambda in the watch window (which is just “I”). Bringing up the IL visualizer shows stunningly complex code:

image

Ignore the NOPs inserted by the IL generator. IL_0000 was emitted by ParameterDynamicExpression.Compile through the compilation of the lambda body; IL_0006 was emitted subsequently by LambdaDynamicExpression.Compile, and the stack is nicely in balance. Sure enough, the result printed is:

image
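
To reproduce the identity case without any of the expression tree machinery, here is a minimal stand-alone sketch. It hand-emits the two instructions that matter (load the argument, return) and wraps the result in a System.Func<object, object>. The IdentityDemo class is a hypothetical name of mine, and the use of Ldarg_0 is an assumption; the Compile methods in this series may well emit a different ldarg form.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical demo class (not part of the article's code).
static class IdentityDemo
{
    public static Func<object, object> BuildIdentity()
    {
        // Signature: object -> object, just like the dynamic lambda "I".
        var method = new DynamicMethod("", typeof(object), new[] { typeof(object) });
        ILGenerator ilgen = method.GetILGenerator();

        // What the parameter node contributes: push the argument on the stack.
        ilgen.Emit(OpCodes.Ldarg_0);

        // What the lambda node contributes: return the value on top of the stack.
        ilgen.Emit(OpCodes.Ret);

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }
}
```

Calling IdentityDemo.BuildIdentity()("Bart") simply hands back the string it was given.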

Woohoo – truly dynamic (though simplistic) execution! Next time: method call expressions and binders.


In the previous post, I outlined the use of the expression trees from the System.Linq.Expressions namespace. Let’s recap to set the scene:

Expression<Func<string, int, int, string>> data = (string s, int a, int b) => s.Substring(a, b);

produces (deep breath)

ParameterExpression s = Expression.Parameter(typeof(string), "s");
ParameterExpression a = Expression.Parameter(typeof(int), "a");
ParameterExpression b = Expression.Parameter(typeof(int), "b");

Expression<Func<string, int, int, string>> data = Expression.Lambda<Func<string, int, int, string>>(
    Expression.Call(s, typeof(string).GetMethod("Substring", new Type[] { typeof(int), typeof(int) }), a, b),
    s, a, b
);

Func<string, int, int, string> fun = data.Compile();
Console.WriteLine(fun("Bart", 1, 2));

where I’ve indicated all the strong typing using underlines. Wow, that’s a lot dude! Based on all of this strong typing, there are few, if any, runtime surprises possible concerning which method gets run (unless a MissingMethodException occurs for some reason). Obviously, expression trees could be much more complex but to illustrate the core points of the type system, we’ll restrict ourselves to parameters, method calls and lambdas.

So what do we want to try now? We want to design an API similar to the one used above but without all this type information. Essentially, it would look like:

ParameterExpression s = Expression.Parameter("s");
ParameterExpression a = Expression.Parameter("a");
ParameterExpression b = Expression.Parameter("b");

LambdaExpression data = Expression.Lambda(
    Expression.Call(s, "Substring", a, b),
    s, a, b
);

Delegate fun = data.Compile();
Console.WriteLine(fun.DynamicInvoke("Bart", 1, 2));

and all of a sudden we see the dynamic aspect lurking around the corner on the very last line where we call through the weakly-typed delegate passing in three objects which just happen to be a string and two ints, causing the lookup for a Substring method applied to s (becoming “Bart”) with arguments a and b (respectively 1 and 2) to succeed. The important thing here though is that a “Substring” method with a compatible signature might be available on another type, maybe type Bar, but taking in two longs:

class Bar
{
    public Foo Substring(long a, long b) { … }
}

The calling code would still work and the console would print the result of Foo.ToString on the instance returned by Bar.Substring. What it takes to make this work consists of three things:

  • Dynamic expression trees (i.e. like the one above, with the type information stripped);
  • IL code generation at runtime on the fly (to produce the delegate “fun” in the sample above);
  • Binders (things that provide runtime support to resolve method calls).
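
To get a feel for the late-bound behavior described above before any IL generation or binders enter the picture, plain reflection can approximate it. The sketch below is not how this series ends up implementing things; LateBound.Call is a hypothetical helper, and the bodies of Foo and Bar are made up for illustration (the original elides them). It relies on Type.InvokeMember with the default binder to find a method by name at runtime.

```csharp
using System;
using System.Reflection;

class Foo
{
    // Made-up implementation, purely so there is something to print.
    public override string ToString() { return "Foo!"; }
}

class Bar
{
    // Hypothetical body; the article leaves it out.
    public Foo Substring(long a, long b) { return new Foo(); }
}

// Hypothetical helper (not part of the article's code).
static class LateBound
{
    public static object Call(object target, string method, params object[] args)
    {
        // Look up a public instance method by name on the runtime type of the target
        // and let the default binder pick an overload that fits the argument values.
        return target.GetType().InvokeMember(
            method,
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null, target, args);
    }
}
```

LateBound.Call("Bart", "Substring", 1, 2) ends up in String.Substring(int, int), while LateBound.Call(new Bar(), "Substring", 1, 2) is meant to end up in Bar.Substring, relying on the default binder widening the int arguments to long. The calling code is identical in both cases, which is exactly the point.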

Of course you could go much further than this with complete ASTs (the big brothers to expression trees) and “rules” but we’re not going to reinvent the DLR :-). Martin Maly has quite some information on those topics on his blog (must-reads!). Today we’ll cover the first bullet point.

 

Our dynamic expression trees

To disambiguate with the LINQ expression trees, let’s sneak the word Dynamic in, making our sample look like:

ParameterDynamicExpression s = DynamicExpression.Parameter("s");
ParameterDynamicExpression a = DynamicExpression.Parameter("a");
ParameterDynamicExpression b = DynamicExpression.Parameter("b");

LambdaDynamicExpression data = DynamicExpression.Lambda(
    DynamicExpression.Call(s, "Substring", a, b),
    s, a, b
);

Delegate fun = data.Compile();
Console.WriteLine(fun.DynamicInvoke("Bart", 1, 2));

All of those *DynamicExpression classes extend the DynamicExpression base class while having a factory method on DynamicExpression too, following the design of LINQ’s expression trees. We’ll omit the NodeType property for simplicity and the Type property, because we obviously don’t want a static type to be associated with each expression tree node. We’ll also get rid of a lot of node types, just leaving the factories in for our three node types:

image

So, what does this look like? The factory methods will be just convenient syntax around internal constructor calls producing the concrete node types. In addition, we’ll override ToString to produce a friendly-on-the-eye string representation of expression trees, much like our static LINQ friends. First, the DynamicExpression base class:

abstract class DynamicExpression
{
    public static ParameterDynamicExpression Parameter(string name)
    {
        return new ParameterDynamicExpression(name);
    }

    public static MethodCallDynamicExpression Call(DynamicExpression instance, string method, params DynamicExpression[] arguments)
    {
        return new MethodCallDynamicExpression(instance, method, arguments);
    }

    public static LambdaDynamicExpression Lambda(DynamicExpression body, params ParameterDynamicExpression[] parameters)
    {
        return new LambdaDynamicExpression(body, parameters);
    }

    protected internal abstract void ToString(StringBuilder sb);

    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        ToString(sb);
        return sb.ToString();
    }
}

We’ll extend this class a bit more in the next part where we’ll tackle compilation, but let’s move on to each of the three subtypes right now:

sealed class ParameterDynamicExpression : DynamicExpression
{
    internal ParameterDynamicExpression(string name)
    {
        Name = name;
    }

    public string Name { get; private set; }

    protected internal override void ToString(StringBuilder sb)
    {
        sb.Append(Name);
    }
}

Nothing surprising here. The expression for a dynamic method call is slightly more complicated :-)…

sealed class MethodCallDynamicExpression : DynamicExpression
{
    internal MethodCallDynamicExpression(DynamicExpression instance, string method, params DynamicExpression[] arguments)
    {
        Object = instance;
        Method = method;
        Arguments = new ReadOnlyCollection<DynamicExpression>(arguments);
    }

    public ReadOnlyCollection<DynamicExpression> Arguments { get; private set; }
    public DynamicExpression Object { get; private set; }
    public string Method { get; private set; }

    protected internal override void ToString(StringBuilder sb)
    {
        Object.ToString(sb);
        sb.Append(".");
        sb.Append(Method);
        sb.Append("(");

        int n = Arguments.Count;
        for (int i = 0; i < n; i++)
        {
            Arguments[i].ToString(sb);
            if (i != n - 1)
                sb.Append(", ");
        }

        sb.Append(")");
    }
}

I told’ya it was going to be mind-blowing. The core thing to notice though is the composability because of the use of DynamicExpressions as the Object (i.e. the instance on which we’ll invoke the method) and the Arguments collection members. Also notice we don’t support static method calls in here (for which Object would be null – you can envision the right checking in the factory method, omitted for brevity) although it would be perfectly possible to come up with such a thing (think about “global functions” for example, but remember we don’t have a type that tells us where to look for the method – ideally you’d have a mixture of statically and dynamically typed trees interwoven). Oh, and the pretty printing logic in ToString isn’t too complex either…

Finally, let’s move on to the lambda expression class:

sealed class LambdaDynamicExpression : DynamicExpression
{
    internal LambdaDynamicExpression(DynamicExpression body, params ParameterDynamicExpression[] parameters)
    {
        Body = body;
        Parameters = new ReadOnlyCollection<ParameterDynamicExpression>(parameters);
    }

    public DynamicExpression Body { get; private set; }
    public ReadOnlyCollection<ParameterDynamicExpression> Parameters { get; private set; }

    public Delegate Compile()
    {
        // Placeholder for now; the real implementation is the subject of the next post.
        return null;
    }

    protected internal override void ToString(StringBuilder sb)
    {
        sb.Append("(");

        int n = Parameters.Count;
        for (int i = 0; i < n; i++)
        {
            Parameters[i].ToString(sb);
            if (i != n - 1)
                sb.Append(", ");
        }

        sb.Append(") => ");
        Body.ToString(sb);
    }
}

Same deal concerning parameterization based on DynamicExpression instances. I promise you the Compile method will be pretty interesting to say the least, so stay tuned for the next post!


By now, most of my blog readers will be familiar with the simple concept of expression trees in C# 3.0 and VB 9.0. A quick recap. What’s the type of the expression below?

(string s, int a, int b) => s.Substring(a, b)

Shockingly, you can’t tell. Why? Lambdas have two forms of representation: code (as anonymous methods) or data (as expression trees). One more time:

Func<string, int, int, string> code = (string s, int a, int b) => s.Substring(a, b);

produces

Func<string, int, int, string> code = delegate (string s, int a, int b) { return s.Substring(a, b); };

where

Expression<Func<string, int, int, string>> data = (string s, int a, int b) => s.Substring(a, b);

produces (deep breath)

ParameterExpression s = Expression.Parameter(typeof(string), "s");
ParameterExpression a = Expression.Parameter(typeof(int), "a");
ParameterExpression b = Expression.Parameter(typeof(int), "b");

Expression<Func<string, int, int, string>> data = Expression.Lambda<Func<string, int, int, string>>(
    Expression.Call(s, typeof(string).GetMethod("Substring", new Type[] { typeof(int), typeof(int) }), a, b),
    s, a, b
);

When calling the Compile method on the produced (lambda-expression) data object we’ll get exactly the same IL code but generated at runtime through the form of a (delegate pointing at a) dynamic(ally generated) method using Reflection.Emit. So far, so good. But what are the characteristics of the expression tree’s generated code? Two things:

  • statically typed
  • early-bound

In more human terms, we know at compile time the type of all the involved expressions – at the entry points (i.e. the leaf levels) we’re passing it in with utmost detail (see the Parameter factory method’s first argument for instance) but all intermediate nodes in the expression tree have a type too. E.g. the Expression.Call factory call returns a MethodCallExpression whose wrapped method call’s return type is System.String, hence that becomes the type of the node. Talking about method calls, those are bound at compile time too: we know precisely what method to call because of all the type information available about the arguments.
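
Both characteristics are easy to observe on the tree itself. The following stand-alone sketch builds just the method call node from the sample and exposes it for inspection; StaticTyping and BuildCall are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical demo class (not part of the article's code).
static class StaticTyping
{
    public static MethodCallExpression BuildCall()
    {
        // Leaf nodes: each parameter carries its static type.
        ParameterExpression s = Expression.Parameter(typeof(string), "s");
        ParameterExpression a = Expression.Parameter(typeof(int), "a");
        ParameterExpression b = Expression.Parameter(typeof(int), "b");

        // Early binding: the exact overload is resolved here, when the tree is built.
        return Expression.Call(
            s,
            typeof(string).GetMethod("Substring", new Type[] { typeof(int), typeof(int) }),
            a, b);
    }
}
```

On the returned node, the Type property is System.String (static typing of an intermediate node) and the Method property holds the MethodInfo for String.Substring(int, int) (early binding).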

Isn’t this cool? Sure it is, but what about dynamic languages? Are those able to use this particular form of expression trees? The simple answer is no; well, at least not in a dynamic way, meaning with dynamic typing (i.e. the type of every node is determined at run time) and late binding for methods (i.e. we hope to find a suitable method overload at runtime). What can we do about this? Here we’re getting in the realm of DLR stuff and such. This blog series is not about DLR itself though, it’s rather about outlining some crucial concepts of dynamic typing including “dynamic expression trees” emitting dynamic call sites using binders.

In the first part coming up soon, we’ll start by investigating how to build dynamic expression trees.


By now, most – if not all – readers of my blog will be familiar with this C# 3.0 and VB 9.0 feature called Local Variable Type Inference or Implicitly Typed Local Variables. The idea is simple: since the compiler knows (and hence can infer) type information for expressions, also referred to as rvals, there’s no need for the developer to say the type. In most cases it’s a convenience, for example:

Dictionary<Customer, List<PhoneNumber>> phonebook = new Dictionary<Customer, List<PhoneNumber>>();

That’s literally saying the same thing twice: declare a variable of type mumble-mumble and assign it a new instance of type mumble-mumble. Wouldn’t it be nice just to say:

var phonebook = new Dictionary<Customer, List<PhoneNumber>>();

while it still means exactly the same as the original fragment? That’s what this language feature allows us to do without losing any of the strong typing. It’s very convenient in the sample above because generics in CLR 2.0 introduced arbitrary type construction capabilities. Before that, types couldn’t be composed into arbitrarily big ones and type names tended not to be too long-winded (namespaces help here too).

Convenient as the case above can be, sometimes type inference is a requirement, as is the case with anonymous types. Typically those are used in projection clauses of LINQ queries, although they can be used in isolation as well. E.g.:

var res = from p in Process.GetProcesses()