Wednesday, August 27, 2008 4:05 PM bart

To Bind or Not To Bind – Dynamic Expression Trees – Part 2

Welcome back to the dynamic expression tree fun. Last time we designed our simplified expression tree class library we’ll be using to enable dynamic treatment of objects. Today, we’ll take this one step further by emitting IL code that resolves the operations invoked on such dynamic objects at runtime through a mechanism called binders. Before we dive in, let me point out that everything discussed in this series is greatly simplified just to illustrate the core ideas and base mechanisms/principles that make dynamic language stuff work.

 

Introducing IL generation

Dynamic code compilation is a wonderful thing. It’s not that hard once you get the basics right (and have some level of IL opcode understanding) but quite hard to debug. Luckily we have tools like Haibo Luo’s IL Visualizer. Since I’ll be using this, download it, extract the ZIP file, compile the whole solution and copy ILMonitor\bin\Debug\*.dll to %programfiles%\Microsoft Visual Studio 9.0\Common7\Packages\Debugger\Visualizers. Alternatively you can put it in your personal Visual Studio 2008\Visualizers folder.

So, what’s our task? Assume we have the following piece of sample code:

class Program
{
    static void Main(string[] args)
    {
        var o = DynamicExpression.Parameter("o");
        var a = DynamicExpression.Parameter("a");
        var b = DynamicExpression.Parameter("b");
        var call = DynamicExpression.Call(o, "Substring", a, b);
        var func = DynamicExpression.Lambda(call, o, a, b);
        Console.WriteLine(func);
        Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));
    }
}

We already know how to construct the objects and to represent it as a string (which would be (o, a, b) => o.Substring(a, b)). Now we need to focus on the marked Compile method on LambdaDynamicExpression. Starting with the signature of the delegate, we want to create (at runtime) a method that takes in three “dynamic” parameters (corresponding to parameter expressions o, a and b), returning a resulting object. Since we don’t have any type information available, everything should be System.Object, so we’d end up with the following delegate:

delegate object TheDynamicLambdaFunction(object o, object a, object b);

Looking at Compile as a black box, it will return an instance of this delegate pointing at an on-the-fly generated method corresponding to the lambda’s body expression. Returning a System.Delegate, one can call DynamicInvoke (or cast it to a compatible delegate) to invoke it with the given parameters. Obviously we want the call to do “the right thing”, in the sample above it would correspond to a method call to System.String::Substring on “Bart”, passing in startIndex 1 and length 2, producing another string containing “ar”.

It should be clear that we need to emit IL on the fly to translate the lambda expression but also the lambda’s body which could be anything, not just a MethodCallDynamicExpression. Since we lack other expression node types, one such (trivial) thing would be:

var x = DynamicExpression.Parameter("x");
var I = DynamicExpression.Lambda(x, x);
Console.WriteLine(I);
Console.WriteLine(I.Compile().DynamicInvoke("Bart"));

which is just the identity function (the underlined bold ‘o’ above indicates the lamdba’s body). I intentionally named the expression above “I” conform SKI combinators where I is defined as λx . x. Similarly we could define the K combinator as:

var x = DynamicExpression.Parameter("x");
var y = DynamicExpression.Parameter("y");
var K = DynamicExpression.Lambda(x, x, y);
Console.WriteLine(K);
Console.WriteLine(K.Compile().DynamicInvoke("Bart", 123));

but we got sidetracked, so time to move on. The whole point here is that we can’t assume the body of the lambda to be a MethodCallDynamicExpression. So, how do we tackle this? An important observation one can make is this: an expression represents a single value. Right, so what? Wait a minute, is IL-code not stack-based? Adding the two things together we could think of the following solution:

Calling a Compile method on an expression tree object, given a writeable stream for IL instructions, should add all the instructions to the stream required to evaluate the expression, leaving the result of the evaluation on top of the stack.

A LambdaDynamicExpression is the only dynamic expression that supports a publicly visible Compile method. It’s pseudo-code would look like:

  1. Create an IL stream; here the IL stack is empty.
  2. Take the Body expression and compile it by emitting IL instructions for it; this causes the IL stack to be one high.
  3. Add an IL return instruction to return the object on top of the stack.
  4. Return a delegate pointing to the method represented by the generated IL code.

 

Supporting expression compilation

To make this work, we’ll first extend the base class by adding one more method:

/// <summary>
/// Class representing a dynamic expression tree.
/// </summary>
abstract class DynamicExpression
{
    /// <summary>
    /// Appends IL instructions to calculate the expression's runtime value, putting it on top of the evaluation stack.
    /// </summary>
    /// <param name="ilgen">IL generator to append to.</param>
    /// <param name="ldArgs">Lambda argument mappings.</param>
    protected internal abstract void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs);

}

This method will take in two things: the IL generator (referred to as “IL stream” in the previous paragraph) and a mapping table for the lambda’s parameter expressions. Why do we need the latter, or better: what does it map the parameter expressions to? Assume we’re compiling

(o, a, b) => o.Substring(a, b)

While traversing the expression tree, asking every node in the correct order to emit IL instructions, we’ll encounter references to the parameters again. Our goal is to write a dynamic method looking like this:

object GeneratedDynamicMethod(object o, object a, object b)
{
    return o.Substring(a, b);
}

where the . obviously denotes a dynamic method call in this case. As we encounter parameter expressions like o, a or b during the translation for the method body, we need to know how to load those parameters from the argument list on the dynamic method. First of all, notice the lambda parameters got mapped in order of specification to correspond to arguments on the generated dynamic method, i.e. o as the first lambda parameter and is the first parameter on the generated method. And so on. This is precisely what the ldArgs argument on Compile stands for: a mapping from the parameter expression representation from the lambda parameters onto the concrete indices for the arguments:

o –> 0
a –> 1
b –> 2

Whenever we encounter such a parameter expression during the compilation, we know the position of the argument, so we can emit a ldarg instruction. This is the most trivial Compile override:

sealed class ParameterDynamicExpression : DynamicExpression
{
    protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
    {
        if (!ldArgs.ContainsKey(this))
            throw new InvalidOperationException("Parameter expression " + Name + " is not in scope.");

        ilgen.Emit(OpCodes.Ldarg, ldArgs[this]);
    }
    …
}

This simply says, whenever code needs to be emitted for a ParameterDynamicExpression, simply try to find it in the dictionary to map it onto the formal parameter index on the dynamic method being emitted and turn it into a ldarg instruction for that argument index. Real full-fledged expression tree implementations would be slightly more complicated because arguments could be hidden when dealing with nested lambdas (quoting, invocation expressions, etc) but that would take us too far away from home.

For the LambdaDynamicExpression, besides a public Compile method, there will also be an override to the inherited one. It simply asks the Body expression to emit itself (which will result in a one-level high stack containing the evaluation result of the body expression), followed by a ret instruction (simply returning the value evaluated through the Body’s IL code preceding it):

sealed class LambdaDynamicExpression : DynamicExpression
{
    protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
    {
        Body.Compile(ilgen, ldArgs);
        ilgen.Emit(OpCodes.Ret);
    }

}

 

Emitting code

For this post, we’ll omit an implementation for MethodCallDynamicExpression as that will be part of the next post focusing on binders. All we want to get to work today is the I combinator or identity function (yeah, another world-beater :-)):

var x = DynamicExpression.Parameter("x");
var I = DynamicExpression.Lambda(x, x);
Console.WriteLine(I);
Console.WriteLine(I.Compile().DynamicInvoke("Bart"));

In other words, today we’ll focus on the plumbing of emitting the code and wrapping the method in a delegate that can be returned upon calling LambdaDynamicExpression.Compile. The result for the sample above would be equivalent to:

public Delegate Compile()
{
    return new Func<object, object>(delegate(object x) { return x; });
}

The underlined portion is the code corresponding to I’s compilation. In IL-terms it would be as simplistic as this:

ldarg.0
ret

This emitted IL method body then needs to get the signature that says “taking in an object, returning an object”. All of this makes up the dynamic method. But we’re not done yet, as we need to return a delegate to it. In the free translation above, I’ve leveraged the generic System.Func<T1,R> delegate but we only have a limited number of those (up to four arguments), so what if we encounter a method that takes more arguments? Indeed, we’ll need to generate our own delegate types as well. Notice we could cache those very efficiently: the ones with arity (~ number of parameters) up to 4 could simply be mapped onto System.Func delegates with System.Object type parameters, while others would be generated on the fly and kept for reuse if another method with same arity gets compiled. We’ll omit this optimization for now.

Here’s how the code to create our own delegate type looks like:

private static Type GetDynamicDelegate(Type[] argumentTypes, Type returnType)
{
    //
    // Assemblies contain modules; generate those with unique names.
    // The generated assembly is runtime only (doesn't need to be saved to disk).
    //
    AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(new AssemblyName(Guid.NewGuid().ToString()), AssemblyBuilderAccess.Run);
    ModuleBuilder moduleBuilder = assemblyBuilder.DefineDynamicModule(Guid.NewGuid().ToString());

    //
    // Our delegate is a private sealed type deriving from MultiCastDelegate.
    //
    TypeBuilder typeBuilder = moduleBuilder.DefineType("Lambdas", TypeAttributes.NotPublic | TypeAttributes.Sealed | TypeAttributes.AutoLayout | TypeAttributes.AnsiClass, typeof(MulticastDelegate));

    //
    // The delegate's constructor is a "special name" method with signature (object native int).
    // It doesn't have a method body by itself; rather, it's supplied by the managed runtime.
    //
    ConstructorBuilder ctorBuilder = typeBuilder.DefineConstructor(MethodAttributes.Public | MethodAttributes.HideBySig | MethodAttributes.SpecialName | MethodAttributes.RTSpecialName, CallingConventions.Standard, new Type[] { typeof(object), typeof(IntPtr) });
    ctorBuilder.SetImplementationFlags(MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

    //
    // We only need the Invoke method (BeginInvoke and EndInvoke are irrelevant for us).
    // It doesn't have a method body by itself; rather, it's supplied by the managed runtime.
    // Here our delegate signature is enforced.
    //
    MethodBuilder invokeMethodBuilder = typeBuilder.DefineMethod("Invoke", MethodAttributes.Public | MethodAttributes.NewSlot | MethodAttributes.HideBySig | MethodAttributes.Virtual, CallingConventions.HasThis, returnType, argumentTypes);
    invokeMethodBuilder.SetImplementationFlags(MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

    //
    // Return the created delegate type.
    // Notice we could cache this for reuse by other dynamic methods.
    //
    return typeBuilder.CreateType();
}

Lots of attribute flags which you can read all about in the CLI specification. I don’t pretend to memorize all of those attributes; why would I if ILDASM makes life just great? :-)

image

This is the screenshot of ILDASM showing a delegate for a method with signature object(object, object) as you can see on the Invoke method. We don’t need any of the asynchronous pattern implementation, so we just need a constructor and Invoke method (see section IIA.13.6 on “Delegates” in the CLI standard). One special thing about those is they don’t have an IL code body as they are “runtime managed” (see IIA.14.4.3 on “Implementation Attributes of Methods” in the CLI standard):

image

Now that we can generate the delegate, we just need to ask the lambda (since that’s the root) expression tree to emit its IL code, which will traverse the entire tree. In order to be able to do this, we need to keep mapping information about the lambda parameters mapped onto the formal arguments as mentioned earlier. Here’s the result:

public Delegate Compile()
{
    //
    // Map the lambda parameters onto formal argument indices.
    // Also build up the argument type array.
    //
    var args = new Type[Parameters.Count];
    var ldArgs = new Dictionary<ParameterDynamicExpression, int>();
    for (int i = 0; i < args.Length; i++)
    {
        args[i] = typeof(object);
        ldArgs[Parameters[i]] = i;
    }

    //
    // Compile the expression tree to an IL method body.
    //
    var method = new DynamicMethod("", typeof(object), args);
    var ilgen = method.GetILGenerator();
    Compile(ilgen, ldArgs);

    //
    // Get the delegate matching the dynamic method signature.
    //
    Type dynamicDelegate = GetDynamicDelegate(args, typeof(object));

    //
    // Return a delegate pointing at our dynamic method.
    //
    return method.CreateDelegate(dynamicDelegate);
}

This code should be relatively straightforward. First we create the mapping while building up an array just containing typeof(object)’s (since all arguments are objects in our dynamic world). Next we create a dynamic method with the right signature, produce the IL generator and let the expression compilation do all of the work to emit the IL. And finally we stick the whole thing in a dynamically created delegate that matches the signature, returning that to the caller. Setting a breakpoint on the last line and executing for the “I” identity combinator shows this:

image

This is the IL visualizer we installed earlier. Notice the friendly string representation for the dynamic method shows the signature, which matches the one of the dynamic lambda in the watch window (which is just “I”). Bringing up the IL visualizer shows stunningly complex code:

image

Ignore the NOPs inserted by the IL generator, but IL_0000 was emitted by ParameterDynamicExpression.Compile through the compilation of the lambda body. IL_0006 was emitted subsequently by LambdaDynamicExpression.Compile and the stack is nicely in balance. Sure enough, the result printed is:

image

Woohoo – truly dynamic (though simplistic) execution! Next time: method call expressions and binders.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #167

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #167

# Dew Drop - August 28, 2008 | Alvin Ashcraft's Morning Dew

Thursday, August 28, 2008 5:59 AM by Dew Drop - August 28, 2008 | Alvin Ashcraft's Morning Dew

Pingback from  Dew Drop - August 28, 2008 | Alvin Ashcraft's Morning Dew

# WMOC#18 - Microsoft Solver Foundation released - Service Endpoint

Pingback from  WMOC#18 - Microsoft Solver Foundation released - Service Endpoint

# Recent Faves Tagged With "cli" : MyNetFaves

Thursday, September 25, 2008 4:20 AM by Recent Faves Tagged With "cli" : MyNetFaves

Pingback from  Recent Faves Tagged With "cli" : MyNetFaves

# Websites tagged "opcodes" on Postsaver

Wednesday, March 04, 2009 11:32 AM by Websites tagged "opcodes" on Postsaver

Pingback from  Websites tagged "opcodes" on Postsaver

# Recent Faves Tagged With "generator" : MyNetFaves

Saturday, April 11, 2009 1:20 PM by Recent Faves Tagged With "generator" : MyNetFaves

Pingback from  Recent Faves Tagged With "generator" : MyNetFaves