July 2009 - Posts

Introduction

In the latest episode in this series I talked about hypothetical compile-time rewriting facilities that would be used to turn our intermediate representation of control-flow driven code using our Control Library into efficient IL code. In a somewhat hand-waving way I communicated the fairly trivial nature of such a translation but as I don’t want to hold any information back, let’s have a look at it today.

But first, refresh your mind on what we’re doing here. Starting from C# code, we’re in the midst of proving we can compile away control flow concepts as a first class citizen in the language and replace them by library calls. Using method calls for such concepts allows for extensibility, e.g. overloading the meaning of “if” when applied to a certain operand (instead of requiring a more generally applicable implicit conversion to Boolean for example). After all, this is more of an academic exercise at this point, but definitely worth the brain cycles.

Once we end up with (control flow library) calls, we’re at a junction. Either we can leave the calls as-is, which means the runtime library will drive the execution. Since this involves calling through delegates, using exceptions for control transfers and gives rise to closure objects that are heap-allocated (hence increasing pressure on the garbage collector) we’re definitely causing a lot of overhead in the regular case. So the alternative is to envision a library implementing such control flow primitives to be able to opt in to “compile-time rewriting” where the compiler would hand over the code in terms of an object model and ask to rewrite it to whatever IL code the library wishes to result. For example:

Measure(“critical region 1”, () =>
{
    // Do whatever you want here.
});

could be compile-time rewritten into (just making up something here):

Measurement m = Measure.Create(“critical region 1”);
try
{
    m.Start();
    // Do whatever you want here.
}
fault
// Non-existing C# construct; exists at IL level.
{
    m.Failed();
}
finally
{
    m.Stop();
}

And if you fantasize about lightweight syntax for block-passing code, you’d end up with something that almost looks like a new keyword:

Measure(“critical region 1”)
{
    // Do whatever you want here.
}

Desirable or not, doesn’t matter. As long as we’re having fun on Crazy Sundays :-). This whole setting is depicted below:

image 

 

A domain-specific rewriter

With .NET 4.0’s introduction of statement trees, all of this is no longer plain fantasy. In fact, as shown in the previous post, we can represent our code fragment for a loop by using the enhanced System.Linq.Expressions API. Below is the fragment we’ve shown before for a simple (though meaningless) loop:

var ctrl = Expression.Parameter(typeof(LoopControl), "ctrl");
var cwi = (from mi in typeof(Console).GetMethods()
           where mi.Name == "WriteLine"
           let p = mi.GetParameters()
           where p.Length == 1 && p[0].ParameterType == typeof(int)
           select mi).Single();
var i = Expression.Variable(typeof(int), "i");
var m = Expression.Block(new [] { i },
    Expression.Assign(i, Expression.Constant(0)),
    Expression.Call(
        typeof(Control).GetMethod("While"),
        Expression.Lambda(Expression.LessThan(i, Expression.Constant(10))),
        Expression.Lambda(
            Expression.Block(
                Expression.Call(cwi, Expression.PostIncrementAssign(i)),
                Expression.Call(
                    typeof(Control).GetMethod("If"),
                    Expression.Lambda(Expression.GreaterThan(i, Expression.Constant(5))),
                    Expression.Lambda(
                        Expression.Call(ctrl, typeof(LoopControl).GetMethod("Break"))
                    )
                ),
                Expression.Call(
                    typeof(Control).GetMethod("If"),
                    Expression.Lambda(
                        Expression.Equal(
                            Expression.Divide(i, Expression.Constant(2)),
                            Expression.Constant(0)
                        )
                    ),
                    Expression.Lambda(
                        Expression.Call(ctrl, typeof(LoopControl).GetMethod("Continue"))
                    )
                ),
                Expression.Call(cwi, i)
            ),
            ctrl
        )
    )
);

Compiling this expression tree to a lambda expression would give us an Action delegate that will print the loop’s output to the screen just fine, if we’ve implemented the Control.While and Control.If methods correctly (as well as the Break and Continue methods used to translate break and continue statements). As we know, that triggers the disadvantageous runtime-driven execution path, so instead we’d like to cross-compile the tree above to the most efficient IL code possible, just as if we had written the loop in the old-style.

For now, we’ll just look at this problem in isolation. In other words, we’re going to write a translator that turns the tree above into another (more efficient, control runtime library independent) tree, but making that translator tied to the Control methods. The way to deal with expression trees is by means of a visitor, which basically walks the tree recursively and allows us to produce a new tree out of it. The visitor pattern is a well-known recipe to do this kind of stuff and in fact .NET 4.0 comes with an ExpressionVisitor base class to write such a visitor:

image

Being an abstract class, the way to use this guy is to derive from it and override any virtual method corresponding to an expression tree node you’re interested in to inspect and/or rewrite during the visitation. On an instance of the resulting visitor you simply call Visit, passing it an expression tree, and out comes a new expression tree that contains the resulting tree:

image

It’s clear our rewriter will have to recognize static methods on the Control type: While and If to keep things simple. When visiting such a node, child nodes will exist for the parameters passed to the method. For If there’s the condition parameter and the body (we’ll ignore a call to Else for sake of simplicity); for While there’s a condition and a body as well. While is a bit different though: when visiting that one’s body, we need to look for other types of method calls, namely the ones that invoke Break or Continue.

Let’s start with the simple part: creating our visitor type, deriving from ExpressionVisitor and overriding the VisitMethodCall method that will be called (recursively) for each method call encountered in the tree:

sealed class MyVisitor : ExpressionVisitor
{
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        return base.VisitMethodCall(node);
    }
}

Can’t get any simpler. In fact this is a no-op visitor; it simply clones the existing tree. The base method call turns the base functionality of the visitor in action. For a MethodCallExpression it visits the expression for the object on which the method is called (if any, a static method won’t have such an object) and the expressions representing the arguments:

image

Of course we want to do something more meaningful. We’ll assume we have the two well-known control flow methods available on the Control type, and want to look for those when called on VisitMethodCall to take appropriate action:

public static void While(this Func<bool> condition, Action<LoopControl> body) { … }
public static IfBlock If(this Func<bool> condition, Action @if) { … }

This switch on method is easy to do as all the information is obviously available in the MethodCallExpression instance that’s passed in:

protected override Expression VisitMethodCall(MethodCallExpression node)
{
    var type = node.Method.DeclaringType;

    if (type == typeof(Control))
    {
        switch (node.Method.Name)
        {
            case "If":
                return VisitIf(node);
            case "While":
                return VisitWhile(node);
            default:
                throw new NotSupportedException("Unknown control flow method encountered: " + node.Method);
        }
    }
    /* more to follow */

    return base.VisitMethodCall(node);
}

This code should be pretty straightforward to grasp. The Method property returns the MethodInfo reflection object representing the method that’s intended to be called by the code represented by the MethodCallExpression node. With DeclaringType and Name properties on this MethodInfo we’re in reflection land, which should be clear to all readers. Next, we simply switch on the method name and dispatch to another helper method. If the declaring type wasn’t (our) Control, we keep our hands off and let the base type visitor implementation do its work. It will simply copy the tree. Such a situation arises for Console.WriteLine calls in our code, or whatever methods calls the user wrote in the code fragment fed in to the visitor.

What’s VisitIf and VisitWhile about? That’s were the magic happens. Here we want to translate the node to a more efficient form that eliminates delegate invocations, use of exceptions and whatnot, substituting it for a plain statement tree node like ConditionalExpression or LoopExpression. Let’s start by VisitIf to set the scene:

private Expression VisitIf(MethodCallExpression node)
{
    var ctrlCondition = (LambdaExpression)node.Arguments[0];
    var ctrlBody = (LambdaExpression)node.Arguments[1];

    var condition = Visit(ctrlCondition.Body);
    var body = Visit(ctrlBody.Body);

    return // What about IfThenElse case? Left as a simple exercise for the reader.
        Expression.IfThen(
            condition,
            body
        );
}

Okay, what about explaining this code based on a sample? Let’s write the following:

var sample =
    Expression.Call(
        typeof(Control).GetMethod("If"),                // Control.If(
        Expression.Lambda(Expression.Constant(true)),   //    () => true,
        Expression.Lambda(Expression.Empty())           //    () => { }
    );                                                  // );

var res = new MyVisitor().Visit(sample);

I can’t get it any simpler. It doesn’t do anything meaningful but will trigger our code:

image

Notice the debug view. The default(Void) represents the empty statement. Let’s have a closer look using the new DebugView property which shows a textual representation of the expression tree:

image

Told you it was a small thing, no? Have a look:

image

.Call, fine enough. To our Control.If method, also fine. Passing in two lambda expressions with some symbolic name. One of type Func<bool> taking in no arguments and producing true, another one of type Action with no arguments and a “void” return value (= nada). Looks good. But what do we want to end up with? Really the code should behave as if the user had written:

if (true)
    ;

instead. This is simply to do: get the body of both lambda expressions and feed them into the Expression.IfThen factory method. One caveat: both bodies could be arbitrarily complex. The “true” condition could in reality be any expression, maybe by itself calling a Control.* method (which seems nonsense here, but let’s go with the flow here). More realistic is the body of the if statement containing another control flow call, like one or more other calls to Control.If or Control.While with arbitrary levels of nesting. In other words, we need to visit the tree recursively:

    var ctrlCondition = (LambdaExpression)node.Arguments[0];
    var ctrlBody = (LambdaExpression)node.Arguments[1];

    var condition = Visit(ctrlCondition.Body);
    var body = Visit(ctrlBody.Body);

But ultimately the promise of Visit is to give us back the translated (sub)tree, so once we’ve done that faithfully, we can starting gluing things together again, this time in a more efficient ConditionalExpression node created by the IfThen factory:

    return // What about IfThenElse case? Left as a simple exercise for the reader.
        Expression.IfThen(
            condition,
            body
        );

And that’s it. Told you it was easy, didn’t I?

A little problem?

In fact we have a little problem. Our Control.If method returns an IfBlock instance, used to (optionally) call Else on. Some fool could write things like:

Control.If(() => Control.If(…, …) != null, () => { … });

Here the Control.If call is used in the context of an expression to be compared against null (or it could even be compared to another If call’s return value, or assigned to a variable, or … oh my). Such uses should be rejected by the visitor to be really precise. However, we assume for now the only way calls to Control.If can occur in the tree is because of some hypothetical front-end language compiler emitting calls to it. Along those lines we assume that front-end doesn’t emit invalid uses. This is naive, but acceptable for the sake of the demo.

Let’s have a peek at the result. Feel free to step through the code line by line, but ultimately inspect the res variable in the Main method where we return from the Visit call:

image

Looks right, doesn’t it? The debug view always shows the .Else case but it’s also supplied with the empty statement, so that’s fine. We’ve compiled away the Control.If call. Wiiii!

Next is the While call. This is a bit more complicated. Not on the level of the call itself, but in the recursive part. Why? Let’s again set the scene by writing total nonsense, but just the bare minimum piece of code illustrating the point:

var @break = typeof(LoopControl).GetMethod("Break");          // LoopControl.Break
var @continue = typeof(LoopControl).GetMethod("Continue");    // LoopControl.Continue

var ctrl = Expression.Parameter(typeof(LoopControl), "ctrl"); // ctrl (lambda parameter for loop body)

var sample =
    Expression.Call(
        typeof(Control).GetMethod("While"),            
//    Control.While(
        Expression.Lambda(Expression.Constant(true)),  
//       ()   => true,
        Expression.Lambda(                              //       ctrl =>
            Expression.Block(                          
// /*     ^   */ {
                Expression.Call(ctrl, @break),         
// /*     |   */      break;
                Expression.Call(ctrl, @continue)       
// /*     |   */      continue;
            ),                                         
// /*     |   */ }
            ctrl                                       
// /* ----+   */
        )                                               //    );
    );

var res = new MyVisitor().Visit(sample);

Okay, it’s a bit bigger but not too bad. Look at the comments to correlate to familiar C# notation. Don’t you love constructor algebraic APIs? Inside the debugger view it may look more manageable:

image

3.14-nuts. A call to Control.While, two arguments. Two lambda expressions again, but now the second one takes an argument of type LoopControl, which as we recall from previous time around is our “handle to the currently executing loop” ready for us to receive Break or Continue calls. And that’s exactly what we’re doing inside the body code block. We just wrote a loop that’s total nonsense in C# and only worth it for its side-effects that originate from evaluating the condition:

while (someExpressionHere)
{
    break;
    continue; // Unreachable code detected.
}

Of course, the real C# compiler would baffle at us:

image

Exercise for the reader

Write a “reachability analysis” visitor to detect control flow library calls that violate unreachable code rules. Or, make it a post-rewriting phase in which you can “simply” look out for GotoExpressions. This is not a trivial exercise at all, so no worries if you don’t get it in a matter of minutes :-). Tip: basic blocks.

Anyway, the point we’re making is this: when recursively visiting the expression for the body of the loop, we need to look out for calls to the Break and Continue methods as well. Not just “calls”, very specific calls. If two loops are nested, with our library one can jump out of a loop other than the immediately enclosing one. So we need to correlate back the call to Break or Continue with the jump target. But first, let’s see how to write a loop in the expression tree API. This will set the direction:

image

Three things are expected from us: a body, a break target label and a continue target label. Here’s the code that creates the minimal loop that loops forever:

var @break = Expression.Label();
var @continue = Expression.Label();

var loop =
    Expression.Loop(
        Expression.Call(
            typeof(Console).GetMethod("WriteLine", new Type[] { typeof(string) }),
            Expression.Constant("Forever")
        ),
        @break,
        @continue
    );

Expression.Lambda<Action>(loop).Compile()();

The labels we’re using a simply of type void (that’s what the parameterless overload to Label stands for). You may wonder, why do labels have a type? The deep answer lies in the stack-driven nature of the CLR. If you leave something on the top of the stack at the time you branch somewhere, you better make sure that every path leading to the jump destination has something with a compatible type on top of the stack. Let’s not go there as it would lead us far away from the straight path.

Screams for a debug view one more time? Here it is:

image

Notice where the two labels, #Label1 for continue and #Label2 for break, are put by the Expression.Loop constructor. This is nothing but an infinite loop. So, how to write something real based on this? Like this:

.Loop .LabelTarget #Continue: {
    // Check condition; if not met, jump to #Break
    // Do loop body. Translate break; to jump to #Break and continue; to jump to #Continue.
}
.LabelTarget #Break:

That’s exactly what we’re going to do. The while-loop condition gets negated and put in an if-statement that jumps to the break label if the condition isn’t met (anymore). Then comes the body, which we visit recursively and replace all occurrences of LoopControl.[Break|Continue] calls in by the appropriate jumps. And that’s it.

One more thing we need to do is bookkeeping. Every time we encounter a Control.While(…, ctrl => { /* body */ }) call, we need to correlate the body “ctrl” lambda parameter with the pair of jump labels we’ve associated with the loop. When we encounter a call to either Continue or Break inside the loop body, we map back the loop through the object the method is called on. Doesn’t make sense? Have a look at the following diagram:

image

This happens during the initial translation of a new loop: we add an entry to a table where we keep track of loop control variables and the associated labels we should jump to upon encountering a call to Break or Continue. When we recursively visit the loop body, we need to consult the table to find the right jumping label.

image

This is precisely what our code for VisitWhile does:

  1. It news up labels for the jump targets of the loop. The expression tree API will take care of naming those in a unique way.
  2. The jump labels are stored and associated with the ParameterExpression that was used in the lambda for the body of the loop (second argument to the While call). We use a Tuple<LabelTarget, LabelTarget> to create an ad-hoc type for this duo of labels. Obviously such a mapping is represented by a Dictionary from ParameterExpression on the tuple type.
  3. We visit the condition and loop body recursively while the table contains the entry for the loop. When during this visitation method calls on LoopControl are found, the mapping table is consulted to emit the right jump instruction.
  4. After we’ve done the recursive part, we can clean up the table as nothing outside the loop will be able to jump to any of its labels. This is guaranteed by the fact the lambda parameter for the second parameter is, well, scope to that lambda expression for the body.
  5. Finally we construct the loop expression based on the secret sauce we revealed before: if not condition met, jump to break; otherwise, do loop body.

And here’s the corresponding code for VisitWhile:

private Expression VisitWhile(MethodCallExpression node)
{
    var ctrlCondition = (LambdaExpression)node.Arguments[0];
    var ctrlBody = (LambdaExpression)node.Arguments[1];
    var ctrlParam = ctrlBody.Parameters[0];

    var @break = Expression.Label();
    var @continue = Expression.Label();

    _jumpLabels.Add(ctrlParam, new Tuple<LabelTarget, LabelTarget>(@break, @continue));

    var condition = Visit(ctrlCondition.Body);
    var body = Visit(ctrlBody.Body);

    _jumpLabels.Remove(ctrlParam);

    return
        Expression.Loop(
            Expression.Block(
                Expression.IfThen(
                    Expression.Not(condition),
                    Expression.Goto(@break)
                ),
                body
            ),
            @break, @continue
        );
}

Should look fairly straightforward now, no? What’s _jumpLabels? Simply a field, defined like this:

private Dictionary<ParameterExpression, Tuple<LabelTarget, LabelTarget>> _jumpLabels = new Dictionary<ParameterExpression, Tuple<LabelTarget, LabelTarget>>();

No, no type inference for you there :-). Read this with the table shape in the figure in mind: the key is the lambda parameter, the value is a pair (a two-tuple) of two label target objects. We’re still missing one part: the one that takes care of detecting calls to Break or Continue and takes appropriate action based on the table. Where does that piece of the puzzle go? In our top-level VisitMethodCall method of course:

protected override Expression VisitMethodCall(MethodCallExpression node)
{
    var type = node.Method.DeclaringType;

    if (type == typeof(Control))
    {
        switch (node.Method.Name)
        {
            case "If":
                return VisitIf(node);
            case "While":
                return VisitWhile(node);
            default:
                throw new NotSupportedException("Unknown control flow method encountered: " + node.Method);
        }
    }

    else (type == typeof(LoopControl))
    {
        switch (node.Method.Name)
        {
            case "Break":
                return VisitBreak(node);
            case "Continue":
                return VisitContinue(node);
            default:
                throw new NotSupportedException("Unknown control flow control method encountered: " + node.Method);
        }
    }

    return base.VisitMethodCall(node);
}

Again, we delegate to two helper methods and I dared to refactor it a bit to make it as concise as possible:

private Expression VisitBreak(MethodCallExpression node)
{
    return VisitBreakOrContinue(node, t => t.Item1 /* break */, tgt => Expression.Break(tgt));
}

private Expression VisitContinue(MethodCallExpression node)
{
    return VisitBreakOrContinue(node, t => t.Item2 /* continue */, tgt => Expression.Break(tgt));

}

private Expression VisitBreakOrContinue(MethodCallExpression node, Func<Tuple<LabelTarget, LabelTarget>, LabelTarget> targetLabel, Func<LabelTarget, Expression> ctor)
{
    Tuple<LabelTarget, LabelTarget> targets;
    if (!_jumpLabels.TryGetValue(node.Object as ParameterExpression, out targets))
        throw new InvalidOperationException("Invalid control flow control variable: " + node.Object);

    return ctor(targetLabel(targets));
}

Also excuse me for being a bit lame with the error handling here (the “as” part is not quite ideal, we should throw a more proper exception if this returned null). So, what’s the result of all this fooling around between geeks and expression trees? A screenshot for the lame loop translation will tell:

var @break = typeof(LoopControl).GetMethod("Break");          // LoopControl.Break
var @continue = typeof(LoopControl).GetMethod("Continue");    // LoopControl.Continue

var ctrl = Expression.Parameter(typeof(LoopControl), "ctrl"); // ctrl (lambda parameter for loop body)

var sample =
    Expression.Call(
        typeof(Control).GetMethod("While"),            
//    Control.While(
        Expression.Lambda(Expression.Constant(true)),  
//       ()   => true,
        Expression.Lambda(                              //       ctrl =>
            Expression.Block(                          
// /*     ^   */ {
                Expression.Call(ctrl, @break),         
// /*     |   */      break;
                Expression.Call(ctrl, @continue)       
// /*     |   */      continue;
            ),                                         
// /*     |   */ }
            ctrl                                       
// /* ----+   */
        )                                               //    );
    );

var res = new MyVisitor().Visit(sample);

No, not fake at all:

image

It worked! So, what about our original contrived loop?

Console.WriteLine("Old school: ");
{
    int i = 0;
    while (i < 10)
    {
        Console.WriteLine(i++);
        if (i > 5)
            break;
        if (i % 2 == 0)
            continue;
        Console.WriteLine(i);
    }
}
Console.WriteLine();

Console.WriteLine("New school: ");
{
    int i = 0;
    Console.While(() => i < 10, ctrl =>
    {
        Console.WriteLine(i++);
        Control.If(() => i > 5, () =>
        {
            ctrl.Break();
        });
        Control.If(() => i % 2 == 0, () =>
        {
            ctrl.Continue();
        });
        Console.WriteLine(i);
    });
}
Console.WriteLine();

Console.WriteLine("After rewrite: ");
{
    var m = GetCode();
    var res = new MyVisitor().Visit(m);
    Expression.Lambda<Action>(res).Compile()();
}
Console.WriteLine();

The GetCode method simply returns the “m” expression tree shown near the start of the post, which is the equivalent to the preceding “New school” code. The “res” variable reads the following in the debugger:

image

Looks the same as the plain old loop, doesn’t it?

Exercise for the reader

Does the body of the loop need to have two nested Block expression nodes? Where do those come from? How can you eliminate one of those? Why? Why not?

 

Conclusion

My claim for compile-time rewrite capabilities wasn’t fake. In fact, next time we’ll see how to generalize this notion (with a CompileTimeRewrite attribute tagged onto certain methods) and then we’ll move on to generalizing other control flow constructs. First we’ll look at Do and For, later at Switch (go nuts) and ForEach (“ducks quack”), and even aim for Try, Catch, When, Finally, Fault and Using. What follows after that, I’ll keep secret for now :-).

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduction

In the last installment of our control library exploration, we kept things relatively simply by looking at the if-statement. In fact, we avoided a bunch of complications that have to do with non-local return constructs like break, continue and return (and put throw on that pile as well if you like). As an implication, we aren’t able to provide full-fidelity loop constructs just yet. Some glue is missing, and today’s write-up reflects the quest for such a tube of glue. Be prepared to be shocked initially though :-).

 

The while-loop naively translated

C#’s while-loop can be considered the mother of all loops. For and foreach derive from it. Okay, there’s still the do-loop as well, but let’s ignore this one for now. It will be simple enough to derive from the while-loop’s treatment. What about a sample loop to translate?

int i = 0;
while (i < 10)
{
    Console.WriteLine(i);
    i++;
}

Yes, I know, it’s a for-loop (almost, what’s missing?) but bear with me. There’s one observation to be made here that contrasts the while-loop’s structure from the if-statement one’s: the condition expression needs to be reevaluated multiple times, so just passing a Boolean argument to our control flow library CC# function won’t fly. We really need a function here, of type Func<bool>. Other than that, we can trivially (?) translate the loop body to an Action and feed it as an argument to the control flow function:

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    i++;
});

Easy enough, isn’t it? How would such a function be defined, let’s have a look:

static class Control
{
    public static void While(Func<bool> condition, Action body)
    {
        while (condition()) body();
    }
}

But as you already know, we’re missing out on something: support for break and continue. In fact, the Parallel Extensions loop constructor can praise themselves happy a naive approach works for them, as I can’t come up with a meaningful interpretation of break and continue in a parallel loop construct. Well, maybe for break (which could be encoded using a return from the lambda function). But what about continue? Maybe just the same as break in such a context? Who knows.

 

Non-local return

The concept of non-local return is made clear in the following sample:

int i = 0;
while (i < 10)
{
    Console.WriteLine(i);
    if (i > 5)
        break;
    i++;
}

Useless code, but of conceptual importance. If we naively port this code to our naive CC# equivalent, we end up with a compile-time error:

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    Control.If(() => i > 5, () => {
        break;
    });
    i++;
});

In fact, since our mission in last post was nicely summarized to be “eliminating the blue”, we can see we’re failing here already: the break keyword still lights up as primitive syntactical language surface (notice int isn’t to be treated as such, it’s shorthand for Int32 which is a type).

Always breaks at compile-time

Notice as a curiosity you can provide a proof that naive translation of a loop constructs with our current control flow library always yields a compile-time error in case break or continue statements are used. You can see this intuitively, since break and continue only work when embedded in a loop construct (what about switch’s overloaded use of break?). As we’re compiling away all such loop constructs, break and continue statements will end up orphaned.

In fact, we’re lucky it always produces a compile-time error as we’re not spoiling the program’s semantics that way: non-compiling code is the best we can hope for and it works out nicely for break and continue. Use of return is more worrisome:

int i = 0;
while (i < 10)
{
    Console.WriteLine(i);
    if (i > 5)
        return;
    i++;
}

translates into

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    Control.If(() => i > 5, () => {
        return;
    });
    i++;
});

which will compile-fine. But we’ve just changed semantics: return here plays the role of … continue. Why? While we’re repeatedly executing the body through delegate invocation, returning from that delegate won’t stop that repetition. All it does it stopping the current loop body invocation prematurely, after which Control.While will move on to the next iteration. That’s what continue is used for.

Notice how the translation of the code to control flow library method calls gave rise to closures over the outer scope’s variables:

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i);
    Control.If(() => i > 5, () => {
        break;
    });
    i++;
});

We even got to a nested situation here. Pop quiz: how many heap-allocated integer values exist here? (You may be counter-surprised.) Does this pose another semantics-preserving challenge? Why (not)?

 

Staying sane?

Based on the observation in the previous side-bar we can think of a translation where break and continue translate into return statements from the loop body delegate, signaling whether execution should continue or break from the loop. Does this even work? Let’s have a look. Here’s a possible attempt (don’t ask me what it means):

int i = 0;
while (i < 10)
{
    Console.WriteLine(i++);

    if (i > 5)
        return;
    if (i % 2 == 0)
        continue;
    Console.WriteLine(i);
}

becomes

int i = 0;
Control.While(() => i < 10, () =>
{
    Console.WriteLine(i++);
    Control.If(() => i > 5, () => {
        return false; // break
    });
    Control.If(() => i > 5, () => {
        return true; // continue
    });
    i++;
    return true; // continue
});

Of course I’m the devil in disguise again, as my not-so-contrived sample above uses break and continue in a nested if-statement (and quite frankly, how many times would you have a plain break or continue statement directly embedded in a loop body, without intermediate statements like if?).

Now we have the problem of having the loop-behavior-controlling return statements buried inside an inner lambda expression that’s used by the if-statement. Things start to become more clumsy every minute. Do we need to tweak Control.If to return a Boolean just because it so happens that it may be used inside a loop? Or should we be emitting code that adds Boolean “shouldBreak” and/or “shouldContinue” variables to the loop body, make break and continue statements set that, and check whether the value has changed at any point it could (e.g. after returning from Control.If). And what if loops are nested? Oh my.

Nested loops

One thing C# (amongst many other imperative languages) doesn’t have is the ability to break or continue from another loop that the immediately enclosing one. I.e. you can’t break from outer loops in any way directly supported by the language, unless you use addition Boolean flags to drive such an execution path manually. No, I’m not advocating such a style of coding would be desirable but it’s worth thinking about how such a thing could be achieved.

What about using nesting levels integers for breaking? For example, break 0 means to break from the directly enclosing loop (where 0 could be optional of course), while break 1 stands for “break from the loop one level up”. Hmm, readable? Sometimes maybe. Also think about refactoring: you’ll have to be extremely careful not to introduce additional breakable statements unless you update the break level numbers manually. Error-prone for sure and a curse for refactoring engines. Visual Studio already avoids refactoring loop bodies if a break or continue statement occurs (“The selection contains a “break” statement but not the enclosing statement”) but you’re not making things better by introducing this new “feature”.

An alternative would be to have loops denoted by symbolic names. If every loop has a name, you could break or continue from it by using that name. So, control flow structures no longer are just anonymous blocks of code, but become named entities. An obvious choice here would be to have control structures represented as objects, so their identifiers become variable (or better, constant) names:

var outer = while (condition1)
{
    var inner = while (condition2)
    {
        // …
        break outer;
        // …
    };
};

So loops no longer are statements but have become expressions (things that produce a value). With some hypothetical call-with-block-juxtaposition-lifting feature (what the ****?), this could be turned into a method call where the block is passed as a lambda.

We have a couple of problems here, but let’s first detail this “call-with-block-juxtaposition-lifting” animal. Juxtaposition simply means “two things written adjacent to one another”. Some languages even make juxtaposition a proper (overloadable) operator. In mathematics juxtaposition is often used to denote multiplication: a b stands for a * b. Here we’re concerned with juxtaposition of a method call and a code block:

method(args) { /* this is a block */ };

If blocks of code are shorthand for Action delegates, this actually corresponds to something along those lines:

Action __temp = () => { /* this is a block */ };
method(args) __temp;

What we want is something that allows to “eat” the object following the method call, and lift it into the call itself. In fact, this is the mess we’re in for using parentheses around argument lists (as opposed to whitespace-separated argument lists like add 1 2 in various functional languages). But you’ll agree lots of built-in primitives looks like this: keyword (args) { /* block */ }: using, lock, while, if, switch, for, do, etc. What we don’t want is to add syntactical noise to the block-passing code: “, () =>)”. This is what you see in the Parallel Extensions’ loop constructions.

In other words, the “method” above would indicate its desire to allow a juxtaposed argument at the end somehow:

WhileLoop while(Func<bool> condition, juxtaposed Action body) { … }

That fancy juxtaposed keyword would have similar rules as params in that it can only occur at the end. Of course you also want to drop the () => … noise for specifying the delayed condition, so that you can write the following:

var loop = while (condition)
{
    // Loop body
};

to stand for the equivalent:

var loop = while(() => condition, () =>
{
    // Loop body
});

Overload resolution could be the driver here, but it can get quite (and maybe too) subtle: if no Boolean first parameter exists, but a Func<Boolean> does exist the expression gets lifted in a lambda expression for delayed evaluation. In fact, you’re adding a calling mechanism next to by-value (the default) and by-ref (needs to be specified on both sides using “ref”)… And no, decolorization of while above is not a mistake as it has become a (global?) method call.

Now the remaining problem is how to refer to the loop variable’s name from inside the block itself. We’re hitting a violation of definite assignment:

Action blowUpStack = () => { blowUpStack(); };

The compiler rightfully complains about blowUpStack not to be assigned inside the body of the lambda expression with statement block. Yes, you’re right, it could be made smarter because it knows that the code in the lambda body can’t execute before the assignment has happened, but it isn’t. We could either use a fixpoint to solve this issue, or just do a bogus intermediate assignment:

Action blowUpStack = null;
blowUpStack = () => { blowUpStack(); };

Unfortunately for our while-loop code we’d loose our handy use of local variable type inference if we were to do so, as the initial value of “null” has no type in and by itself. Okay you could write the following, but you’ve just pushed the type to the other side:

var loop = default(WhileLoop);

Either way, all of this is not the point we’re making here. The language syntax could allow a loop to be assigned to a variable, and that variable could be used inside the loop to specify break or continue statements targeting it. Now, believe it or not but we’re gonna get such functionality for free in our mission to create a better control library that’s friendly in the face of non-local return.

What we learn from such a hypothetical juxtaposition mechanism is it can permit for dropping syntactical noise and allow people to write their own control flow structures simply by declaring fancy methods. For example, lots of people misuse the using statement to do performance measurement by implementing a Stopwatch-wrapping IDisposable implementation. If one could write a measure method with a juxtaposed block tagged to the end, allowing it to be called as if it were a proper keyword:

measure
{
    // Code goes here.
}

Extensible syntax? Not really, just much cleaner method calls. Language future-limiting? The best you could hope for is “call is ambiguous” error messages if the language itself decides to add a construct the user has invented already. That would happen if such additions would be themselves implemented as methods (possibly with compile-time rewrite treatment as shown in our previous post, to turn them into real efficient IL) which are always in the global scope. If the user defines a construct herself, e.g. the same while method as the compiler did, it’d have to be disambiguated by qualifying the method (MyExtensions.while or so). You get the idea.

Looks like this second naive approach is going to leave us with headaches once more: signaling break and continue is contagious to all sorts of statement functions that need to propagate loop behavior.

 

Going insane

A possible approach to tackle this non-local return requirement is by using a well-known non-local return mechanism in the CLR. You can feel it coming: we’re going to use “exceptions” as “regulars”. Performance folks are likely about to open a Facebook group on the spot against this practice, but I don’t care. Remember the mission of this whole series is to think about how things can be done and nothing more? It’s not coincidental this series is tagged as “Crazy Sundays”. And, as I’ve shown in the previous post, it’s possible to envision a compile-time rewrite mechanism that would translate use of the original primitives into efficient jump-based IL. In such a world we’d not be loosing anything; we’re gaining extensibility and generalization of built-in concepts. That’s what a theoretician loves: reducing concepts. Praise yourself lucky I’m not going bananas again with S and K.

So, how are we going about translating the following fragment in a semantics-preserving way without loosing break and continue usage?

int i = 0;
while (i < 10)
{
    Console.WriteLine(i++);

    if (i > 5)
        break;
    if (i % 2 == 0)
        continue;
    Console.WriteLine(i);
}

Recall we’re gaining control over the whole loop’s execution by turning this into a method. We’re playing a mini virtual machine essentially and just need a way for the user to signal a jump needs to be carried out. Such jumps need to be scoped to the enclosing loop, or in fact any loop, requiring us to have a mechanism to refer to a loop. In the previous side-bar we’ve seen an approach that’d name a loop by assigning it to a variable, but that’s a bit roundabout. We only need a way inside the loop body to refer to the loop. Since the loop’s body is represented as a lambda expression, what about parameterizing that one with a “control flow control variable” (a CFCV if you’re paid for creating fancy words rather than focusing on the technology itself <g>).

Here’s what we want to be able to write:

int i = 0;
Control.While(() => i < 10, ctrl =>
{
    Console.WriteLine(i++);

    Control.If(() => i > 5, () => {
        ctrl.Break();
    });
    Control.If(() => i % 2 == 0, () => {
        ctrl.Continue();
    });
    Console.WriteLine(i);
});

Wow, all the significant blue is gone. Bart is happy. Signaling a break or continue is done by calling the corresponding method on the CFCV of the corresponding loop. Notice this allows nested loops to break out of the not immediately enclosing loop as well. We’ve gained flexibility in fact.

Nested loops, again

Recall our sample from the previous side-bar? Let’s rewrite it here:

var outer = while (condition1)
{
    var inner = while (condition2)
    {
        // …
        break outer;
        // …
    };
};

Think away the var and maybe put the identifiers somewhere else (like “while (…) as name” or so, whatever) and imagine the translation of “break outer” to be “outer.Break()”. You’ve just enabled this generally applicable approach through a simple syntax-directed translation. In fact, I’ll detail such translations more formally later on.

The key question is of course how to make this work in the control library implementation. First of all, notice we’re not contagiously affecting other statements. In the sample above, the Control.If can continue to work as it did before. It just so happens its block contains a call on the lambda parameter from the outer (loop’s) lambda statement. That’s a perfectly valid thing to do, we don’t need any infrastructure to signal break or continue behavior through other kinds of statements, as exceptions (well, “regulars”) will take care of that.

Exception cost

The problem with exceptions is they serve a very particular use of non-local control flow. The kind that can span multiple stack frames, allows for filters requiring a two-pass stack walk treatment, and is layered on SEH to preserve full-fidelity exceptions in the face of native code calling into it. Non-exceptional use of exceptions is not something the CLR is optimized for. Though the mechanics are exactly what we need here, it’d have to be promoted to a first-class control-flow mechanism with its own performance optimizations to be considered a viable option. That this is not a crackpot idea is shown by other platforms.

Let’s reveal the goodies now. Here’s what the While method looks like:

public static void While(this Func<bool> condition, Action<LoopControl> body)
{
    var control = new LoopControl();

    while (condition())
    {
        if (!control.Do(body))
            break;
    }
}

At its heart, not surprisingly this is simply a while-loop. Recall the whole idea is to allow people to override the default behavior (a good idea or not, let’s leave that in the middle), in which case the implementation would be quite different (maybe peppering some additional checking onto the loop body, as in while-if). And the above could be compile-time rewritten to the original form as I alluded to before: call-sites of LoopControl.Break and LoopControl.Continue would be replaced by the original jump mechanisms.

Notice the loop itself only has a break-statement inside it. The continue is realized somewhere else. How this works is not unsurprisingly hidden behind the LoopControl.Do method, which obviously will execute the body and somehow get to know whether or not a break or continue is requested. If a continue is requested (or nothing has been requested at all and the body ran to completion), Do will return true, in case of a break it will return false instead. So, what does Do do?

interface ILoopControl : IControl
{
    void Break();
    void Continue();
}

sealed class LoopControl : ILoopControl
{
    public void Break()
    {
        throw new LoopControlException(this, LoopControlAction.Break);
    }

    public void Continue()
    {
        throw new LoopControlException(this, LoopControlAction.Continue);
    }

    internal bool Do(Action<LoopControl> body)
    {
        try
        {
            body(this);
        }
        catch (LoopControlException ex)
        {
            if (ex.Block != this)
                throw;

            if (ex.Action == LoopControlAction.Break)
                return false;
        }

        return true;
    }
}

enum LoopControlAction
{
    Break,
    Continue
}

Ignore the interfaces for now (IControl is plain empty in fact). Those are their for extensibility scenarios I’m not going to talk about just yet. The main idea is inside LoopControl where Break and Continue signal a control flow request by means of an exception. Do itself has as its main task to execute the body, passing in the loop control object as the parameter on which the user can signal such requests. This also allows the loop controller to know which loop has to be broken out of, which is what the first part of the LoopControlException handler does: is the request for us or for something higher up? Notice the use of a filter would be useful here.

The two cases of a LoopControlException are straightforward to see from the code. In case the exception was used to request a break, we return false spot on. Otherwise, we fall through (and the body execution has stopped anyway at the point the exception for continue was thrown) and return true to say the loop can continue to move on.

Pretty simply glue after all, isn’t it?

 

A word on compile-time rewrite

To get the whole loop code back to the original form, which leads to efficient IL and eliminates the cost of exceptions in use for the runtime-assisted execution form, we need to do an expression tree to expression tree rewrite operation. Since .NET 4.0 we have statement trees in the System.Core library’s System.Linq.Expressions namespace, so one could envision those to be the object model compilers will use to represent code. Today that’s the case for IronRuby and IronPython, but this is feasible for C# and VB as well. Assuming you are granted access to those trees by some mechanism (e.g. our [CompileTimeRewrite] attribute), you could rewrite them e.g. to implement an optimizer or to weave in aspects, etc.

Below is the .NET 4.0 way to new up an AST that is the equivalent to our code with the control flow structures library:

var ctrl = Expression.Parameter(typeof(LoopControl), "ctrl");
var cwi = (from mi in typeof(Console).GetMethods()
           where mi.Name == "WriteLine"
           let p = mi.GetParameters()
           where p.Length == 1 && p[0].ParameterType == typeof(int)
           select mi).Single();
var i = Expression.Variable(typeof(int), "i");
var m = Expression.Block(new [] { i },
    Expression.Assign(i, Expression.Constant(0)),
    Expression.Call(
        typeof(Control).GetMethod("While"),
        Expression.Lambda(Expression.LessThan(i, Expression.Constant(10))),
        Expression.Lambda(
            Expression.Block(
                Expression.Call(cwi, Expression.PostIncrementAssign(i)),
                Expression.Call(
                    typeof(Control).GetMethod("If"),
                    Expression.Lambda(Expression.GreaterThan(i, Expression.Constant(5))),
                    Expression.Lambda(
                        Expression.Call(ctrl, typeof(LoopControl).GetMethod("Break"))
                    )
                ),
                Expression.Call(
                    typeof(Control).GetMethod("If"),
                    Expression.Lambda(
                        Expression.Equal(
                            Expression.Divide(i, Expression.Constant(2)),
                            Expression.Constant(0)
                        )
                    ),
                    Expression.Lambda(
                        Expression.Call(ctrl, typeof(LoopControl).GetMethod("Continue"))
                    )
                ),
                Expression.Call(cwi, i)
            ),
            ctrl
        )
    )
);

The debug view of this code looks as follows, which starts to look a bit as fragmented C# code if you look from a distance :-). But rest assured it’s equivalent to the original form.

image

Notice closures over outer variables is something that isn’t reflected in the expression trees at this stage. Code to hoist locals for features like closures or iterators is something that’s deferred till a later stage of the compilation. Next, we need to have a look at our target which is the expression tree corresponding to the plain old while-loop in C#:

var cwi = (from mi in typeof(Console).GetMethods()
           where mi.Name == "WriteLine"
           let p = mi.GetParameters()
           where p.Length == 1 && p[0].ParameterType == typeof(int)
           select mi).Single();
var i = Expression.Variable(typeof(int), "i");
var @break = Expression.Label("break");
var @continue = Expression.Label("continue");
var m = Expression.Block(new [] { i },
    Expression.Assign(i, Expression.Constant(0)),
    Expression.Loop(
        Expression.Block(
            Expression.IfThen(
                Expression.Not(Expression.LessThan(i, Expression.Constant(10))),
                Expression.Break(@break)
            ),
            Expression.Call(cwi, Expression.PostIncrementAssign(i)),
            Expression.IfThen(
                Expression.GreaterThan(i, Expression.Constant(5)),
                Expression.Break(@break)
            ),
            Expression.IfThen(
                Expression.Equal(
                    Expression.Divide(i, Expression.Constant(2)),
                    Expression.Constant(0)
                ),
                Expression.Continue(@continue)
            ),
            Expression.Call(cwi, i)
        ),
        @break,
        @continue
    )
);

which looks like this in the debugger visualizer:

image

Notice there’s no concept of While or Do. There’s just Loop, with break and continue labels peppered on it. The goal of a compile-time rewriter would be to (recursively of course):

  • Turn a .Call for Control.While into a .Loop with the negated condition lambda expression’s body turned into an .If … .Break block.
  • Maintain a table that maps the LoopControl-typed ParameterExpression fed in to the second argument of Control.While onto the corresponding .Loop node from which the .Break and .Continue labels for that loop can be found.
  • Turn instance method calls to LoopControl.Break and LoopControl.Continue into .Break and .Continue statements targeting the corresponding loop’s labels based on the backwards mapping through the table being built up.
  • Make the aforementioned table scope-based, i.e. cleaning up entries upon leaving a loop body’s scope.

This is left as an exercise for the reader. In fact, the expression tree above can trivially be compiled into executable IL at runtime already in .NET 4.0:

Expression.Lambda<Action>(m).Compile()();

The latter pair of parentheses simply calls through the resulting delegate, with the following result:

image

 

Wrapping up

Time to conclude for today. We’ve seen how some basic non-local returns can be fueled by exceptions, here used as “regulars”. One could envision the classic usage of those to be optimized by compile-time rewrite mechanisms, so we’re not loosing anything, we’re only gaining stuff:

  • Breaking from and continuing loops other than the immediately enclosing one.
  • Potential for extensibility, where control flow structures simply are resolved by means of method (overload) resolution.
  • An idea of simplifying method calls taking in code blocks.
  • Unified treatment of all control flow structures.

Next time we’ll play the same tricks for other loop structures and see how dreaded some control flow structures are to translate like this. We’ll also show a more formal syntax-directed translation using a denotational semantics approach. This concludes my 11th of July (some Flemish holiday) “celebration” in a rather geeky manner.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Wow, can’t believe how long the blog silence over here has been. Things have been quite hectic on my side the last few months, virtually running two jobs in parallel. One to cooperate in shipping our upcoming .NET 4.0 release, and one for side projects, more explorations of type systems literature and recent innovations, but most importantly a quite big writing project. More about that later, but let this opening paragraph suffice as a lame argument for keeping your RSS feed under control. What about changing that now?

 

Introduction

As stated repeatedly, the LIN part in LINQ (for Language Integrated) is a matter of syntactical sugar. Sweet sugar, but still. It’s all about translation of keywords into method calls with (mostly) lambda expression arguments. To confirm this characteristic for yourself, refresh your mind using my C# 3.0 Query Expression Translation Cheat Sheet.

Sometimes those translations can get quite complicated due to transparent identifiers, but ultimately no magic is left in the translated form. In fact, we’re compiling query expression syntax away to a subset of C# 3.0 without LINQ. And from there, a translation exists to turn the resulting C# 3.0 program into a C# 2.0 equivalent. C# 3.0 features (including LINQ) are therefore lost in translation.

image

This makes people wonder: if C# 3.0 is so expressive to accommodate for the LINQ syntax and semantics we’re used to, what else is there to be done? Notice though C# 3.0 is not strictly more powerful than C# 2.0: for every C# 3.0 program, a semantically equivalent C# 2.0 program can be formulated. Going further back in time to C# 1.0 poses troubles due to the use of generics and such. We’ll not go there.

What about C# 4.0 to C# 3.0?

The cutting edge reader will wonder whether C# 4.0 programs can be translated into a C# 3.0 equivalent form. In fact, this is not longer the case. Not because of changes to the CLR (as far as I know there are no bare essential changes to CLR that are leveraged by C# 4.0, except for no PIA maybe which leverages a mix of compiler features and enhancements to the CLR type system) but rather because of the lack of semantically equivalent features in C# 3.0 to turn the translation into. For example, optional parameters use a metadata construct and so do generic co- and contra-variance annotations.

What about dynamic? In fact just a bunch of CallSite objects with use of C#-specific binder types. Assuming you have the new assemblies containing those types available, a C# 3.0 equivalent form for the generated code can be written down. Maybe surprising that the flagship feature of C# 4.0 seems translatable to C# 3.0 while some other relatively minor features aren’t.

I’m not saying the generated code for dynamically dispatched calls is simply to write by hand. Check it for yourself. Do you prefer the next fragment …

dynamic d = "Bart";
string s = d.ToLower();
Console.WriteLine(s);

… over this one?

object d = "Bart";
var site1 = CallSite<Func<CallSite, object, string>>.Create(
                Binder.Convert(
                    typeof(string),
                    CSharpConversionKind.ImplicitConversion,
                    false
                )
            );
var site2 = CallSite<Func<CallSite, object, object>>.Create(
                Binder.InvokeMember(
                    CSharpCallFlags.None,
                    "ToLower",
                    typeof(Demo),
                    null,
                    new [] {
                        CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
                    }
                )
            );
string s = site1.Target(site1, site2.Target(site2, d));
Console.WriteLine(s);

You’ll agree the latter one is much more clear (for geeks at least) <g>.

Either way, people only start to see the possibilities of the C# 3.0 features such as lambda expressions. But sometimes they do quite naively. And that’s where this blog post series comes in. But first, have a look at the parallel extensions to the .NET Framework with its Parallel loop constructs, such as Parallel.For. Thanks to the availability of lambda expressions, calling this method becomes very much like a regular for-loop:

Parallel.For(1, 10, i => {
    // Do something with i.
});

versus

for (int i = 1; i <= 10; i++) {
    // Do something with i.
}

Okay, there are a few differences in that there’s no iteration expression and no explicit loop condition, and the type for the variable is tied to int. But syntactically the shape looks pretty much as the built-in for.

 

Eliminating “the blue”

Lots of languages have an big syntactical surface. Some more than others of course, but still. C# definitely isn’t in the minimalistic camp. This is more of a philosophical thing, but it suits the message of the blog post very well. Just look at the following code snippet and observe the number of keywords used here:

image

Keywords are indicated with blue, hence the title of this section. The question we ask ourselves in this journey is whether we can do with less “language primitives”, potentially buying us additional advantages actually. In other words, do we really need loops? What about if-statements or switches? Exception handling “primitives”? Can we demote them to syntactical sugar over patterns of more essential building blocks like method invocation and lambda expressions?

Those are all valid questions to ask. But even the simplest loops in the fragment above aren’t very straightforward to turn into such a pattern. Pause here and think about it for a while. Get your inspiration from the Parallel.For. You’ll conclude there’s something going on here that comes and bites you. I’ll leave it an open-ended question for just a while.

 

Warming up with if

If all of this sounds like fannying about, it may help to see a sample of what we’re trying to do here. Notice this is purely a brain exercise with lots of implications. Some good, some bad, some neither of the two. Don’t ask just yet “why”, focus on “how” for now.

So here it is: the if-statement. In essence a branching mechanism to execute blocks of code conditionally based on a Boolean-valued expression. Obviously this implementation has a huge benefit: branch instructions are a low-level IL primitive that yields great performance. Why would we want to trade that for something else? Bad question for now. Let’s first see whether it’s feasible. And believe me, ultimately in this series you’ll say: “not that much of a crackpot idea after all”.

Starting simple, what does an if-statement look like exactly? Different forms exist:

if (expression)
    embedded-statement

and

if (expression)
    embedded-statement
else
    embedded-statement

We want to turn both in cosmetically minimally pleasing method calls, effectively compiling away the blue words. For now, we’ll just go with static methods and want to end up with something like this:

Control.If(expression, () => {
    embedded-statement
});

and

Control.If(expression, () => {
    embedded-statement
}).Else(() => {
    embedded-statement
});

The reason the latter form leverages a fluent interface design (tagging and else-clause on by “dotting into” Else) will become clear later. And in fact, we’ll tweak the form a bit as we go forward to enable some nice extensibility capabilities (oh my), but let’s keep things relatively easy for now. How to implement the guys above?

In fact, the second form above is a bit subtle already. So, let’s think away the Else call for now and just provide two If method overloads. One that only has a positive-case embedded-statement, and another one that has an else-clause too. Obviously the blocks are represented as Action delegates though a functional variant could be thought of (see further). Here’s a sample implementation:

static class Control
{
    public static void If(bool condition, Action @if)
    {
        if (condition) @if();
    }

    public static void If(bool condition, Action @if, Action @else)
    {
        if (condition) @if(); else @else();
    }
}

which would be invoked along those lines:

Control.If(age > 18, () => {
    AllowAccess();
}, /* else */ () => {
    DenyAccess();
});

I hear the reader scream: boring. Right, but things will improve soon. To turn this into a functional if, i.e. a conditional expression really, one could add a generic parameter TResult and replace Action by … Hmm, by what really? Just Func<TResult>? Not really, since the first overload needs to produce “nothing” in case the condition did evaluate false. What we really need here is a Maybe<T> construct, similar to Nullable<T> but that works over every type. Let’s not go there in the context of this journey.

Observe how we’ve trivially pushed down the keywords into methods. Why would we do so? Because methods could really be overloaded. Static ones not directly of course in the strict sense of the word. But if we turn them into extension methods, sure enough. Assume the if-statement syntax stood for:

expression.If(() => { embedded-statement }).Else(() => { embedded-statement });

Based on the static type of expression, either an instance method would be chosen (but what for the dreaded null reference?) or an extension method would be chosen (no risk for null dereference exceptions, so there’s a workaround). Or nothing would be found of course. Here’s how it would work for the supported Boolean condition expression, again ignoring the .Else part:

static class Control
{
    public static void If(this bool condition, Action @if)
    {
        if (condition) @if();
    }

    public static void If(this bool condition, Action @if, Action @else)
    {
        if (condition) @if(); else @else();
    }
}

But that’s clearly very wasteful in case of a plain old Boolean. We’ve just made the if-statement extensible and overridable, at the cost of superior performance in the regular case. A hypothetical compiler could special-case Boolean or “party along” and keep itself honest. Here we enter meta-programming facilities. Assume for a moment a hypothetical future version of the language has statement trees and we could declare our extension methods for Boolean-based if as follows:

static class Control
{
    [CompileTimeRewrite(typeof(Hypohetical.Future.CSharp.Compiler.SecondPhase.Rewriter))]

    public static void If(this Expression<bool> condition, Expression<Action> @if)
    {
    }

    [CompileTimeRewrite(typeof(Hypohetical.Future.CSharp.Compiler.SecondPhase.Rewriter))]

    public static void If(this Expression<bool> condition, Expression<Action> @if, Expression<Action> @else)
    {
    }
}

Here I’m also fantasizing about a hypothetical Lazy<T> construct in an expression tree format, written as Expression<bool>. Essentially it’d mean the Boolean expression gets packaged up in an expression tree, just like assigning a lambda expression to an Expression<T> today (with T a delegate type) causes an expression tree to be generated.

The idea of the CompileTimeRewrite attributes above is to detour the code generation through a rewriter that takes in the MethodCallExpression (assuming such a code object model would be tightly integrated with the compiler) for the method it has been tagged on to. A Rewriter type would essentially have an entry-point that gets the whole expression tree and returns a rewritten expression tree, which is then fed in to the next phase of compilation where trees turn into IL code. In other words, the Boolean case would be rewritten at compile time and have no performance penalty as the emitted code could be the same as the original. However, we’ve gained some extensibility mechanism and managed to keep the front-end honest by introducing a rewriting concept.

We still want to make control over “keyword overloading” more granular. For the if-statement that’s a bit of overkill you may say, but we’ll see such a thing come back in subsequent posts, so let’s introduce it here. We want to bring the .Else call back so that

Control.If(age > 18, () => {
    AllowAccess();
}, /* else */ () => {
    DenyAccess();
});

turns into

Control.If(age > 18, () => {
    AllowAccess();
}).Else(() => {
    DenyAccess();
});

One keyword, one method. Just like LINQ (give or take some operators that are more fancy). Hmm, but how can we make this work? Contrast to LINQ we don’t want to delay execution of the if-statement somehow. It really needs to be eager. The age > 18 part is already evaluated because methods are called by value in C# (no by need semantics as in Haskell). Say it evaluated true, then the If method needs to make sure the Else following it doesn’t have an effect. The call will be there, but we want it to be a no-op. If the condition was false, the action fed in to If should not execute but if there is an Else method tagged on, we need to run that one’s action block.

To make this work, we can come up with a handy trick. First we use the idea of encoding states through the language grammar by means of types. To be able to call Else after calling If, the If method needs to have a return type that exposes an Else method. After an Else we don’t want users to dot into anything more, so Else should return void. So, we end up with the following for the entry-point If method that’s always required:

public static IfBlock If(this bool condition, Action @if)
{
    return IfBlock.If(condition, @if);
}

The IfBlock type will provide an Else method on its turn, that’s clear. But what does IfBlock.If have to do internally? Well, if the condition is true, it can readily execute @if but should still return an IfBlock object that is immunized for subsequent Else calls. Otherwise it should return an object that will execute an else-block if the corresponding Else method is ever called. Here’s the trick:

class IfBlock
{
    internal static IfBlock Instance = new IfBlock();

    public virtual IfBlock ElseIf(bool condition, Action elseIf)
    {
        return IfBlock.If(condition, elseIf);
    }

    public virtual void Else(Action @else)
    {
        @else();
    }

    internal static IfBlock If(bool condition, Action action)
    {
        if (condition)
        {
            action();
            return IdleIfBlock.Instance;
        }

        return IfBlock.Instance;
    }
}

class IdleIfBlock : IfBlock
{
    new internal static IdleIfBlock Instance = new IdleIfBlock();

    public override IfBlock ElseIf(bool condition, Action @else)
    {
        return this;
    }

    public override void Else(Action @else)
    {
    }
}

Start with the If factory method. If the condition is true, we execute the if-clause action straight away and return the IdleIfBlock singleton. The behavior of that object is to do nothing upon calling Else. Otherwise, if the condition was false, we return the IfBlock singleton. That one simply runs the action passed to Else upon such a call. ElseIf is a bonus one. The idea is simply: once idle, always idle. It effectively means “we already executed a branch of the multi-way if-statement”, no-op’ing all subsequent branches or conditions.

Caveat

This stuff is tricky. We’ve already introduced a big caveat by naively adding ElseIf. Can you see why? Call by value semantics are going to bite us here. Consider the following fragment:

if (person.Age < 18)
    embedded-statement
else if (person.Age > 21)
    embedded-statement 
else
    embedded-statement

If we turn this into our “Control C#” (CC#) language, we end up with:

Control.If    (person.Age < 18, () => { … })
       .ElseIf(person.Age > 21, () => { … })
       .Else  (                 () => { … });

What if person.Age has a side-effect? How many times does it occur for the fragment above and how many times for the original fragment? Right, one time for the C# fragment, two times for the CC# fragment due to call by value evaluation for the ElseIf call. How do we eliminate this problem?

First of all, observe C# doesn’t have a proper elseif (only #elif in the preprocessor language, which not really is a preprocessor in fact). Instead, the code above is equivalent to:

if (person.Age < 18)
    embedded-statement 
else
    if (person.Age > 21)
        embedded-statement 
    else
        embedded-statement

In fact we’re entering the realm of the dangling else problem here. This one here is a true believer in significant whitespace. Either way, we can avoid the problem by staying faithful to the original language’s behavior and not try to invent ElseIf here:

Control.If(person.Age < 18, () => {
   …
}).Else(() => {
   Control.If(person.Age < 21, () => {
      …
   }).Else(() => {
      …
   });
})

Alternatively we should delay the effect of the evaluation of the condition arguments: every bool argument so far becomes a Func<bool> and every evaluation of the argument becomes a delegate call. Inside the IdleIfBlock we never call through such delegates, therefore eliminating the unwanted side-effects once the control flow structure enters the idle state.

My current CC# implementation uses delayed execution to accommodate for this problem and still has the ElseIf method as a source for some brain-teasers. The transformation of the shown code above to the delayed one is trivial. No worries, by the end of this blog series you’ll get the whole source to play with.

 

Cooling down again

That was a pretty heavy warm-up exercise, wasn’t it? Fairly trivial stuff after all but lots of hidden pitfalls and some hacking required to get the desired result. It’s clear what our next challenges in this series will be: eliminate other control flow structures, preserving their full flexibility and semantics. Based on the above exercise, can it get much harder you may wonder? In fact, it does.

Recall my little mental challenge mentioned in the Eliminating “the blue” section:

Those are all valid questions to ask. But even the simplest loops in the fragment above aren’t very straightforward to turn into such a pattern. Pause here and think about it for a while. Get your inspiration from the Parallel.For. You’ll conclude there’s something going on here that comes and bites you. I’ll leave it an open-ended question for just a while.

Have a glimpse over the prime algorithm again:

int i = 0;
for (i = 2; i < 20; i++)
{
    bool cont = false;

    int j = 0;
    for (j = 2; j <= Math.Sqrt(i); j++)
    {
        if (i % j == 0)
        {
            cont = true;
            break;
        }
    }

    if (cont)
        continue;

    Console.WriteLine(i);
}

Let’s do the mechanical translation of what the code above may look like, employing similar techniques as the if-statements for the for-loops: operands become lambda expressions, keywords become calls:

int i = 0;
Control.For(() => i = 2, () => i < 20, () => i++, () =>
{
    bool cont = false;

    int j = 0;
    Control.For(() => j = 2, () => j <= Math.Sqrt(i), () => j++, () =>
    {
        Control.If(() => i % j == 0, () =>
        {
            cont = true;
            break;
        });
    });

    Control.If(() => cont, () => {
        continue;
    });

    Console.WriteLine(i);
});

Not bad. But we have a serious problem. How to translate break and continue? And what about method returns?

void Bar()
{
    if (condition)
        return;

   
// Do more.
}

Turning this into the following doesn’t yield the correct result:

void Bar()
{
    Control.If(() => condition, () => {
        return;
    });

   
// Do more.
}

Instead of returning from the outer method, we return from the lambda expression. We’re actually unlucky the code above compiles, for if the outer method would return something, the resulting return-statement would be invalid:

int Foo()
{
    Control.If(() => condition, () => {
        return 42; // Can’t return an int from an Action delegate!
    });

   
// Do more.
}

Throughout the series we’ll have a look into these problems and come up with solutions everyone is going to beat the author for on the head. But again, this is a brain exercise if nothing more.

 

Next time on CC#…

Simple loops and non-local returns. Stay tuned!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

More Posts