Tuesday, August 11, 2009 9:06 AM
bart
Statement Trees With Less Pain – Follow-Up on System.Linq.Expressions v4.0
Introduction
In my last post, Expression Trees, Take Two – Introducing System.Linq.Expressions v4.0, I showed how to the extensions to the LINQ expression trees API opens up for full-blown statement trees including support for assignment, control flow, etc. One popular question that came up in the comments section is on the lack of language-level support for statement trees, like this:
Expression<Func<int, List<int>>> getPrimes = to =>
{
var res = new List<int>();
for (int n = 2; n <= to; n++)
{
bool found = false;
for (int d = 2; d <= Math.Sqrt(n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}
if (!found)
res.Add(n);
}
return res;
};
At this point, the above won’t work and trigger the following compile error:
A lambda expression with a statement body cannot be converted to an expression tree
This hasn’t changed from the previous release. Though it would be a nice feature to have, there are various reasons not to have it integrated with the language at this point. My posts to the comments section of my previous post elaborate on some of the conservative constraints employed during language design, applied to this particular case. So, sorry: not at this point.
The result is the above turns into quite an involved statement tree declaration if done by hand. Let’s repeat it here to set the scene:
var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var n = Expression.Variable(typeof(int), "n");
var found = Expression.Variable(typeof(bool), "found");
var d = Expression.Variable(typeof(int), "d");
var breakOuter = Expression.Label();
var breakInner = Expression.Label();
var getPrimes =
// Func<int, List<int>> getPrimes =
Expression.Lambda<Func<int, List<int>>>(
// {
Expression.Block(
// List<int> res;
new [] { res },
// res = new List<int>();
Expression.Assign(
res,
Expression.New(typeof(List<int>))
),
// {
Expression.Block(
// int n;
new [] { n },
// n = 2;
Expression.Assign(
n,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// (n <= to)
Expression.LessThanOrEqual(
n,
to
)
// )
),
// break;
Expression.Break(breakOuter)
),
// {
Expression.Block(
// bool found;
new[] { found },
// found = false;
Expression.Assign(
found,
Expression.Constant(false)
),
// {
Expression.Block(
// int d;
new [] { d },
// d = 2;
Expression.Assign(
d,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),
// {
Expression.Block(
// if (n % d == 0)
Expression.IfThen(
Expression.Equal(
Expression.Modulo(
n,
d
),
Expression.Constant(0)
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
// }
),
// d++;
Expression.PostIncrementAssign(d)
// }
),
breakInner
)
),
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Call(
res,
typeof(List<int>).GetMethod("Add"),
n
)
)
),
// n++;
Expression.PostIncrementAssign(n)
// }
),
breakOuter
)
),
res
),
to
// }
).Compile();
I’ve formatted a bunch of nodes in the expression tree above to indicate the “tiles” that are based on the v3.0 subset of the Expression Tree APIs, which do have language support. What if we could simplify the declaration above by leveraging the language support at places where it’s worthwhile? A tour through simplifying statement trees…
Abstraction + application = ???
Without much further ado, let’s dive straight into our pool of expression tree tricks and reveal the goods for this round: abstraction and application. Guess what, we’re back to fundamental lambda calculus. An explanation is in order:
- Abstraction is the act of abstracting over an expression, i.e. introducing a parameterized function. Given an expression, turning a symbol therein into a parameter is what it takes to abstract over the expression:
- Application is also known as beta-reduction and corresponds to the act of applying a function given one or more arguments (strictly speaking one at a time):
Given those fundamental laws in lambda calculus, we can create a very round-about way of expressing 1 + 2. Just reverse the arrows in the last diagram: 2 becomes y, 1 becomes x, both “abstracting away” the constants one at a time, resulting in a function that’s readily applied to the two constants.
This basic insight is what will help us to simplify the statement tree declaration, allowing us to “tile in” expression trees in the bigger soup of statement tree nodes. But before we go there, let’s show the technique above in practice. Say we want to over-complicate the act of adding 1 and 2 together and do so through a deviation along the road of abstraction and application:
var x = Expression.Parameter(typeof(int), "x");
var y = Expression.Parameter(typeof(int), "y");
var f = Expression.Lambda<Func<int, int, int>>(
Expression.Add(x, y),
x, y
);
var three = Expression.Invoke(
f,
Expression.Constant(1),
Expression.Constant(2)
);
var res = Expression.Lambda<Func<int>>(three).Compile();
Console.WriteLine(res());
What’s happening here is not that difficult as it may seem at first glance. The “f” lambda expression simply is a function that goes from x and y to x + y (in C# lingo, (x, y) => x + y). Expression.Invoke is used to invoke a delegate (lambdas ultimately are delegates), this time supplying the arguments. So, “three” is shorthand for ((x, y) => x + y)(1, 2), which is longhand for 1 + 2. Finally, the “res” lambda expression is a simple function returning an int, being the result of the invocation of the sum abstraction.
Looking at the generated IL code for the “res” delegate (using the tricks revealed in the previous post), we see the following:
IL_0000: ldc.i4.1
IL_0001: ldc.i4.2
IL_0002: stloc.1
IL_0003: stloc.0
IL_0004: ldloc.0
IL_0005: ldloc.1
IL_0006: add
IL_0007: ret
Did you see it? You did? Great! Exactly, no single trace of a delegate call in here. The combination of application (InvocationExpression) over an abstraction (LambdaExpression) got optimized away. Instead of pushing constants 1 and 2 on the stack in preparation of a delegate invocation call instruction, they get stored in locals, followed by a dump of the invoked function’s IL where the expected ldarg instructions are replaced by ldloc instructions. All in all, it’s still as if we’d written:
var res = Expression.Lambda<Func<int>>(
Expression.Add(
Expression.Constant(1),
Expression.Constant(2)
)
).Compile();
which translates in slightly simpler IL code:
IL_0000: ldc.i4.1
IL_0001: ldc.i4.2
IL_0002: add
IL_0003: ret
Eliminating the excessive stloc/ldloc pairs in the original fragment is something the JIT compiler could take care of it feels happy to do so, but the core message here is that this trick of “abstract & apply” is cheaper than it looks.
Why am I telling you all of this? In fact, the trick outlined above is what we’ll use to tile in language-generated expression trees in a bigger sea of hand-crafted statement trees. For the sample above, what if we made the function definition through abstraction happen using the language-level support, while keeping the invocation call in plain hand-crafted expression trees:
var three = Expression.Invoke(
(Expression<Func<int, int, int>>)((x, y) => x + y),
Expression.Constant(1),
Expression.Constant(2)
);
var res = Expression.Lambda<Func<int>>(three).Compile();
See it coming? Think about it for a while: we’re combining a compiler-generated expression tree with a manually created bigger one in which we embed the smaller one.
Theory applied
To keep things a bit more controlled, focus on one part of the original statement tree:
Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),
This corresponds to the terminating condition for the inner loop:
for (int d = 2; d <= Math.Sqrt(n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}
As we all know, that’s a plain old valued expression that could be turned into a function as follows:
Func<int, int, bool> cond = (x, y) => x <= Math.Sqrt(y);
for (int d = 2; cond(d, n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}
Now we’re getting somewhere. The code above is exactly the same as the original one, but we’ve abstracted over the condition by turning it into a function of its own, which we subsequently apply given d and n. For regular code, that doesn’t make much sense, but it’s exactly the trick we’re talking about here and that will make the expression tree case simpler. What if we translated the code above as follows?
Expression<Func<int, int, bool>> cond = (x, y) => x <= Math.Sqrt(y);
for (int d = 2; cond.Compile()(d, n); d++)
{
if (n % d == 0)
{
found = true;
break;
}
}
Heh, there’s our expression tree for the condition. As we’re still not generating the whole for-loop in a statement tree, we have to call Compile explicitly on the intermediate lambda expression to invoke it in the condition part of the for-loop. Now we can take it one step further and go back to the expression tree for the whole for-loop, patching it up with our intermediate “cond” expression tree … using application. At the same time, we can eat the “Not” node:
Expression.IfThen(
// (!
Expression.Not(
// d <= Math.Sqrt(n)
Expression.LessThanOrEqual(
d,
Expression.Convert(
Expression.Call(
null,
typeof(Math).GetMethod("Sqrt"),
Expression.Convert(
n,
typeof(double)
)
),
typeof(int)
)
)
// )
),
// break;
Expression.Break(breakInner)
),
In essence, the first argument to IfThen (i.e. the if-statement’s condition expression) contains two variables from the outer scope: d and n. We’ll abstract over those, introducing an intermediate lambda expression, and invoke (apply) that one using d and n. Code is worth a thousand words:
Expression.IfThen(
// (!(d <= Math.Sqrt(n)))
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => !(x <= Math.Sqrt(y))),
d,
n
),
// break;
Expression.Break(breakInner)
),
Wow, that got a bit simpler, didn’t it? The formatted (bold, underline) part corresponds to the language-generated expression tree. All we have to do is wrap that thing in an InvocationExpression, passing in d and n as arguments (respectively being bound to x and y).
Similarly, we can simplify other parts of the statement tree. For example:
// if (n % d == 0)
Expression.IfThen(
Expression.Equal(
Expression.Modulo(
n,
d
),
Expression.Constant(0)
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
It doesn’t make sense to try to abstract Expression.Constant calls (it would only blow up the code size and make things more cumbersome), but the n % d == 0 part is something we could ask the compiler to generate for us:
// if (n % d == 0)
Expression.IfThen(
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => x % y == 0),
n,
d
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
It’s a little unfortunate lambda expressions need to have a type forced upon them to emit them either as a delegate (Func<…> types) or an expression tree (Expression<Func<…>> types), requiring us to use an explicit cast to demand the compiler to generate an expression tree. This blows up the code significantly, but still we’re saving quite a bit of code. Also for method invocations, we can get rid of the ugly reflection code required to get the right overload, e.g.:
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Call(
res,
typeof(List<int>).GetMethod("Add"),
n
)
)
Finding the Add method is simple in this case, but if you end up creating a MethodCallExpression with a bunch of arguments to a method with lots of overloads available, the code gets quite complicated. In the sample above, there’s little to gain, but just for the sake of illustration:
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Invoke(
(Expression<Action<List<int>, int>>)((l, e) => l.Add(e)),
res,
n
)
)
Moreover, when typing the lambda expression above you’ll get full-blown (statically typed) IntelliSense:
In fact, this trick is a runtime-aided trick to implement an “infoof” operator (info-of) which I use from time to time. For example, if you want to get the MethodInfo for Console.WriteLine(“{0} is {1}”, “Bart”, 26) that can get quite involved using reflection flags (public, static), a method name in a string (“WriteLine”), a type array for the arguments (beware of the “params” behavior above), etc. Instead, you could do this:
static MethodInfo InfoOf(Expression<Action> ex)
{
var mce = ex.Body as MethodCallExpression;
if (mce != null)
{
return mce.Method;
}
else
{
throw new InvalidOperationException("InfoOf called on expression without any kind of member access.");
}
}
static MemberInfo InfoOf<T>(Expression<Func<T>> ex)
{
var me = ex.Body as MemberExpression;
if (me != null)
{
return me.Member;
}
else
{
throw new InvalidOperationException("InfoOf called on expression without any kind of member access.");
}
}
which we can call as follows:
var cw = InfoOf(() => Console.WriteLine("{0} is {1}", "Bart", 26));
var now = InfoOf(() => DateTime.Now);
So, all in all, the lack of statement tree support in the language is a pity, but by leveraging existing expression tree support we can simplify the task at hand in some cases. The result for our sample is as follows:
var to = Expression.Parameter(typeof(int), "to");
var res = Expression.Variable(typeof(List<int>), "res");
var n = Expression.Variable(typeof(int), "n");
var found = Expression.Variable(typeof(bool), "found");
var d = Expression.Variable(typeof(int), "d");
var breakOuter = Expression.Label();
var breakInner = Expression.Label();
var getPrimes =
// Func<int, List<int>> getPrimes =
Expression.Lambda<Func<int, List<int>>>(
// {
Expression.Block(
// List<int> res;
new [] { res },
// res = new List<int>();
Expression.Assign(
res,
Expression.New(typeof(List<int>))
),
// {
Expression.Block(
// int n;
new [] { n },
// n = 2;
Expression.Assign(
n,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!
Expression.Not(
// (n <= to)
Expression.LessThanOrEqual(
n,
to
)
// )
),
// break;
Expression.Break(breakOuter)
),
// {
Expression.Block(
// bool found;
new[] { found },
// found = false;
Expression.Assign(
found,
Expression.Constant(false)
),
// {
Expression.Block(
// int d;
new [] { d },
// d = 2;
Expression.Assign(
d,
Expression.Constant(2)
),
// while (true)
Expression.Loop(
// {
Expression.Block(
// if
Expression.IfThen(
// (!(d <= Math.Sqrt(n)))
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => !(x <= Math.Sqrt(y))),
d,
n
),
// break;
Expression.Break(breakInner)
),
// {
Expression.Block(
// if (n % d == 0)
Expression.IfThen(
Expression.Invoke(
(Expression<Func<int, int, bool>>)((x, y) => x % y == 0),
n,
d
),
// {
Expression.Block(
// found = true;
Expression.Assign(
found,
Expression.Constant(true)
),
// break;
Expression.Break(breakInner)
// }
)
)
// }
),
// d++;
Expression.PostIncrementAssign(d)
// }
),
breakInner
)
),
// if
Expression.IfThen(
// (!found)
Expression.Not(found),
// res.Add(n);
Expression.Invoke(
(Expression<Action<List<int>, int>>)((l, e) => l.Add(e)),
res,
n
)
)
),
// n++;
Expression.PostIncrementAssign(n)
// }
),
breakOuter
)
),
res
),
to
// }
).Compile();
Still quite involved to write, but a bit simpler than our previous attempt. As an exercise, identify a few other potential rewrite sites in the fragment above. Have fun!
Del.icio.us |
Digg It |
Technorati |
Blinklist |
Furl |
reddit |
DotNetKicks
Filed under: LINQ, C# 4.0, .NET Framework v4.0