September 2009 - Posts

Introduction

It’s way too long ago I wrote about this side-project of mine, as I got side-tracked by other stuff both inside and outside the realm of LINQ (more about that some other time around). Last time, I showed how to put “the query pattern” to our hand by providing an implementation for the Where operator on top of some Theorem<T> type. Let’s recap that to set the scene:

/// <summary>
/// Strongly-typed theorem type for use with LINQ syntax.
/// </summary>
/// <typeparam name="T">Enviroment type over which the theorem is defined.</typeparam>
public class Theorem<T> : Theorem
{
/// <summary> /// Where query operator, used to add constraints to the theorem. /// </summary> /// <param name="constraint">Theorem constraint expression.</param> /// <returns>Theorem with the new constraint applied.</returns> public Theorem<T> Where(Expression<Func<T, bool>> constraint) { return new Theorem<T>(base.Context, base.Constraints.Concat(new List<LambdaExpression> { constraint })); } /// <summary> /// Solves the theorem. /// </summary> /// <returns>Environment type instance with properties set to theorem-satisfying values.</returns> public T Solve() { return base.Solve<T>(); } }

We explained how the non-generic base type keeps track of a set of constraints that have been declared by the user through use of the where query operator, turned into expression trees by the Where signature’s first parameter. This allows us to define constraints in a manner that’s easy on the eye:

using (var ctx = new Z3Context())
{
    ctx.Log = Console.Out;

    var theo = from t in ctx.NewTheorem<Symbols<int, int, int, int, int>>()
               where t.X1 - t.X2 >= 1
               where t.X1 - t.X2 <= 3
               where t.X1 == (2 * t.X3) + t.X5
               where t.X3 == t.X5
               where t.X2 == 6 * t.X4
               select t;
    
    var result = theo.Solve();
    Console.WriteLine(result);
}

The Symbols generic type is one way to create a set of symbols used within the declared theorem’s constraint (and to print the result if the constraints can be satisfied).

From the above you should be able to get an idea about the LambdaExpression instances that are kept in the Theorem<T>’s base type’s instance field, as collected through the successive calls to Where. Remaining question is how to turn the defined theorem into action, something that’s triggered through the call to Solve. This mystery is what we’ll reveal today.

 

Three steps to solve a theorem

LINQ to Z3 is in fact nothing more but a runtime translator of LINQ expression trees into Z3 calls. The Solve method lies at the heart of this translation and acts as the entry-point for the user. It’s similar to the nature of GetEnumerator in a typical LINQ provider, causing a translation of the query’s LINQ expression tree into the underlying query language (e.g. SQL, CAML, etc).

Three steps take place to make this work:

  • First of all, we need to establish an environment with the theorem’s symbols and inform Z3 about those. In other words, we map things like X1 to a Z3 counterpart, capturing the name and the type of the symbol (e.g. “T1”, int). We keep track of the Z3 targets (in fact those will be IntPtr pointers, see further) so that subsequent phases of the translation can refer to those when asserting constraints.
  • Secondly, constraints expressed through LINQ need to be translated into Z3 constraints, based on the established environment above. We call this phase the assert constraints phase which is implemented as an expression tree visitor pattern. Here the bulk of the implementation of LINQ to Z3 lives.
  • Finally, Z3 is queried for a model that satisfied the asserted constraints. If such a model exists, it’s still not ready for consumption by the end-user. To expose the solution in a user-friendly way, the symbols from the environment are mapped onto a .NET object that contains the values satisfying the expressed constraints. This phase is referred to as the get solution phase.

All of the above translates into the following implementation of Solve, on the Theorem base class:

/// <summary>
/// Solves the theorem using Z3.
/// </summary>
/// <typeparam name="T">Theorem environment type.</typeparam>
/// <returns>Result of solving the theorem; default(T) if the theorem cannot be satisfied.</returns>
protected T Solve<T>()
{
    // TODO: some debugging around issues with proper disposal of native resources…
    // using (Context context = _context.CreateContext())
    Context context = _context.CreateContext();
    {
        var environment = GetEnvironment<T>(context);

        AssertConstraints<T>(context, environment);

        Model model = null;
        if (context.CheckAndGetModel(ref model) != LBool.True)
            return default(T);

        return GetSolution<T>(model, environment);
    }
}

Recall from the previous post that _context is a Z3Context object that wraps the Z3 Config type. All we do there is creating a Z3 Context object which is the Z3 primitive to start defining a theorem on. Let’s have a brief look at what Z3’s Context type provides (simplified):

namespace Microsoft.Z3
{
    public class Context : IDisposable
    {
        public Context(Config config);

        public void AssertCnstr(IntPtr a);
public LBool Check(); public LBool CheckAndGetModel(ref Model m);
public IntPtr MkAdd(IntPtr arg1, IntPtr arg2); public IntPtr MkAnd(IntPtr arg1, IntPtr arg2); public IntPtr MkApp(IntPtr d, IntPtr[] args);

public IntPtr MkBoolType();



public IntPtr MkConst(string s, IntPtr ty);
public IntPtr MkFuncDecl(string s, IntPtr[] domain, IntPtr range);
public IntPtr MkGe(IntPtr arg1, IntPtr arg2); public IntPtr MkGt(IntPtr arg1, IntPtr arg2);
public IntPtr MkXor(IntPtr t1, IntPtr t2);
public IntPtr Simplify(IntPtr a);
… } }

As you can see, the way to communicate with the Z3 library is through IntPtr objects. Recall the managed Z3 library is a simple wrapper on top of the native library and the “handle” mechanism to represent Z3 objects is surfaced through IntPtr objects (corresponding to pointers ultimately). Various Mk* functions exist to build up expressions that are used to represent constraints. For example, one could write:

IntPtr cA = ctx.MkConst(“a”, ctx.MkBoolType());
IntPtr cb = ctx.MkConst(“b”, ctx.MkBoolType());
IntPtr aAndb = ctx.MkAnd(cA, cB);
ctx.AssertCnstr(aAndb);

This asserts the constraint “a && b” (in C# lingo), where a and b are Boolean values, to the Z3 context. Ultimately we can call CheckAndGetModel to see whether the constraints can be satisfied, and if so a Model is passed back to us:

Model model = null;
if (context.CheckAndGetModel(ref model) != LBool.True)

To consume the model, we can use the Model type in various ways:

namespace Microsoft.Z3
{
    public class Model : IDisposable
    {
        public void Display(TextWriter w);
        public void Display(TextWriter w, IntPtr v);
public IntPtr Eval(IntPtr __p1);
public bool GetBoolValueBool(IntPtr v);
… } }

Either we print the whole model or we evaluate expressions against the Model (amongst other options). Below is a sample of how a and b can be retrieved. Notice how cA and cB are used again to query the model:

bool a = model.GetBoolValueBool(model.Eval(cA));
bool b = model.GetBoolValueBool(model.Eval(cB));

All of the above is obviously a code fragment specialized to solve the constraint “a && b”, getting the result back as two Booleans. In LINQ to Z3 we need to generalize the steps above:

  • We’ll introduce an environment that maps every symbol from the theorem onto the underlying IntPtr retrieved from calling MkConst.
  • We’ll visit the expression trees that declare the constraints, generating Z3 expressions (e.g. via MkAnd) and asserting them on the Context.
  • We’ll extract the model’s values for the bindings in the environment to instantiate an object representing the solution (if any).

Let’s start by looking at the code to establish an environment.

 

Taking care of the environment

First we need to establish the environment from the T generic parameter type. E.g. in the aforementioned sample Symbols<…> is our starting point:

from t in ctx.NewTheorem<Symbols<int, int, int, int, int>>() …

But the user could equally well use different means to establish a bag of symbols to write a theorem against, e.g.:

from t in ctx.NewTheorem(new { x = default(bool), y = default(bool) }) …

We need to let Z3 know what the types are of the symbols we’re using here. Obviously we’re going to use reflection to extract this information as shown below:

/// <summary>
/// Maps the properties on the theorem environment type to Z3 handles for bound variables.
/// </summary>
/// <typeparam name="T">Theorem environment type to create a mapping table for.</typeparam>
/// <param name="context">Z3 context.</param>
/// <returns>Environment mapping table from .NET properties onto Z3 handles.</returns>
private static Dictionary<PropertyInfo, IntPtr> GetEnvironment<T>(Context context)
{
    var environment = new Dictionary<PropertyInfo, IntPtr>();

    //
    // All public properties are considered part of the theorem's environment.
    // Notice we can't require custom attribute tagging if we want the user to be able to
    // use anonymous types as a convenience solution.
    //
    foreach (var parameter in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
    {
        //
        // Normalize types when facing Z3. Theorem variable type mappings allow for strong
        // typing within the theorem, while underlying variable representations are Z3-
        // friendly types.
        //
        var parameterType = parameter.PropertyType;
        var parameterTypeMapping = (TheoremVariableTypeMappingAttribute)parameterType.GetCustomAttributes(typeof(TheoremVariableTypeMappingAttribute), false).SingleOrDefault();
        if (parameterTypeMapping != null)
            parameterType = parameterTypeMapping.RegularType;

        //
        // Map the environment onto Z3-compatible types.
        //
        if (parameterType == typeof(bool))
            environment.Add(parameter, context.MkConst(parameter.Name, context.MkBoolType()));
        else if (parameterType == typeof(int))
            environment.Add(parameter, context.MkConst(parameter.Name, context.MkIntType()));
        else
            throw new NotSupportedException("Unsupported parameter type for " + parameter.Name + ".");
    }

    return environment;
}

To understand the above, start from the method’s contract: given a Z3 context, we’re going to return a dictionary that maps PropertyInfo objects onto IntPtr objects representing the symbols introduced on the Z3 context. The code is fairly straightforward to understand and only cares about properties (though one could extend it to allow fields as well). Ignore the TheoremVariableTypeMapping part, which can be dropped from the code for the time being, which will be explained in subsequent posts. Notice we only support Z3’s Bool and Int types.

This was easy, wasn’t it? Next, the meat of the translator: translating and asserting constraints.

 

Asserting constraints

Asserting the constraints from the LINQ expression trees onto Z3 constraints is the next, and most complicated, step. The entry-point to the method responsible for this looks as follows:

/// <summary>
/// Asserts the theorem constraints on the Z3 context.
/// </summary>
/// <param name="context">Z3 context.</param>
/// <param name="environment">Environment with bindings of theorem variables to Z3 handles.</param>
/// <typeparam name="T">Theorem environment type.</typeparam>
private void AssertConstraints<T>(Context context, Dictionary<PropertyInfo, IntPtr> environment)
{
    var constraints = _constraints;

    //
    // Global rewwriter registered?
    //
    var rewriterAttr = (TheoremGlobalRewriterAttribute)typeof(T).GetCustomAttributes(typeof(TheoremGlobalRewriterAttribute), false).SingleOrDefault();
    if (rewriterAttr != null)
    {
        //
        // Make sure the specified rewriter type implements the ITheoremGlobalRewriter.
        //
        var rewriterType = rewriterAttr.RewriterType;
        if (!typeof(ITheoremGlobalRewriter).IsAssignableFrom(rewriterType))
            throw new InvalidOperationException("Invalid global rewriter type definition. Did you implement ITheoremGlobalRewriter?");

        //
        // Assume a parameterless public constructor to new up the rewriter.
        //
        var rewriter = (ITheoremGlobalRewriter)Activator.CreateInstance(rewriterType);

        //
        // Do the rewrite.
        //
        constraints = rewriter.Rewrite(constraints);
    }

    //
    // Visit, assert and log.
    //
    foreach (var constraint in constraints)
    {
        IntPtr c = Visit(context, environment, constraint.Body, constraint.Parameters[0]);
        context.AssertCnstr(c);

        _context.LogWriteLine(context.ToString(c));
    }
}

I’m not holding any information back, so some of the above can be ignored for the time being: mask away all of the code with the word rewriter in it, as it will be explained next time around. (To satisfy your curiosity, rewriters allow to create domain-specific constraint languages, e.g. to solve a Kakuro puzzle.)

The essence of the code lives in the foreach-loop that translates the lambda expression-based constraints by a call to the Visit method, returning a Z3 expression for the constraint which is subsequently asserted on the Z3 context. Obviously Visit is where all the magic happens. To do so, it gets four parameters:

  • The Z3 context, to call Mk* functions on during the translation.
  • Our environment, to be able to map occurrences of symbols (expressed as properties) onto the introduced Z3 corresponding objects.
  • Obviously the lambda expression’s body to translate (cf. Expression<Func<T, bool>> constraint).
  • And finally, the lambda expression’s parameter expression object (so that calls on it can be recognized, e.g. symbols => … symbols.X1 …).

On to the visitor now. Before we do so, let’s think a bit about the operations we want to support on constraints: arithmetic (+, -, *, /, %), logic (&&, ||, !) and relational (<, <=, >, >=, ==, !=). There’ll be a few more as we shall see in a minute. As usual with expression trees, lots of switch-statements will be attending the party:

/// <summary>
/// Main visitor method to translate the LINQ expression tree into a Z3 expression handle.
/// </summary>
/// <param name="context">Z3 context.</param>
/// <param name="environment">Environment with bindings of theorem variables to Z3 handles.</param>
/// <param name="expression">LINQ expression tree node to be translated.</param>
/// <param name="param">Parameter used to express the constraint on.</param>
/// <returns>Z3 expression handle.</returns>
private IntPtr Visit(Context context, Dictionary<PropertyInfo, IntPtr> environment, Expression expression, ParameterExpression param)
{
    //
    // Largely table-driven mechanism, providing constructor lambdas to generic Visit*
    // methods, classified by type and arity.
    //
    switch (expression.NodeType)
    {
        case ExpressionType.And:
        case ExpressionType.AndAlso:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkAnd(a, b));
        case ExpressionType.Or:
        case ExpressionType.OrElse:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkOr(a, b));
        case ExpressionType.ExclusiveOr:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkXor(a, b));
        case ExpressionType.Not:
            return VisitUnary(context, environment, (UnaryExpression)expression, param, (ctx, a) => ctx.MkNot(a));
        case ExpressionType.Negate:
        case ExpressionType.NegateChecked:
            return VisitUnary(context, environment, (UnaryExpression)expression, param, (ctx, a) => ctx.MkUnaryMinus(a));
        case ExpressionType.Add:
        case ExpressionType.AddChecked:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkAdd(a, b));
        case ExpressionType.Subtract:
        case ExpressionType.SubtractChecked:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkSub(a, b));
        case ExpressionType.Multiply:
        case ExpressionType.MultiplyChecked:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkMul(a, b));
        case ExpressionType.Divide:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkDiv(a, b));
        case ExpressionType.Modulo:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkRem(a, b));
        case ExpressionType.LessThan:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkLt(a, b));
        case ExpressionType.LessThanOrEqual:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkLe(a, b));
        case ExpressionType.GreaterThan:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkGt(a, b));
        case ExpressionType.GreaterThanOrEqual:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkGe(a, b));
        case ExpressionType.Equal:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkEq(a, b));
        case ExpressionType.NotEqual:
            return VisitBinary(context, environment, (BinaryExpression)expression, param, (ctx, a, b) => ctx.MkNot(ctx.MkEq(a, b)));
        case ExpressionType.MemberAccess:
            return VisitMember(environment, (MemberExpression)expression, param);
        case ExpressionType.Constant:
            return VisitConstant(context, (ConstantExpression)expression);
        case ExpressionType.Call:
            return VisitCall(context, environment, (MethodCallExpression)expression, param);
        default:
            throw new NotSupportedException("Unsupported expression node type encountered: " + expression.NodeType);
    }
}

Notice the above is made as declarative as possible: most expression tree node types are mapped on either unary or binary specialized visitor methods, passing in a lambda expression that acts as the “Z3 constructor”. For example, And (for & in C#) uses a “(ctx, a, b) => ctx.MkAnd(a, b)” argument. The VisitBinary method will be responsible to recursively translate the left-hand side and right-hand side of the And operation (e.g. x & y, where x and y can be arbitrarily complex BLOCKED EXPRESSION, ultimately feeding the resulting Z3-returned IntPtr objects into the supplied lambda expression.

Next to the VisitBinary and VisitUnary calls, there are specialized calls for the MemberAccess, Constant and Call expression tree nodes. We’ll defer the discussion of those for a moment and focus on binary and unary calls first:

/// <summary>
/// Visitor method to translate a binary expression.
/// </summary>
/// <param name="context">Z3 context.</param>
/// <param name="environment">Environment with bindings of theorem variables to Z3 handles.</param>
/// <param name="expression">Binary expression.</param>
/// <param name="ctor">Constructor to combine recursive visitor results.</param>
/// <param name="param">Parameter used to express the constraint on.</param>
/// <returns>Z3 expression handle.</returns>
private IntPtr VisitBinary(Context context, Dictionary<PropertyInfo, IntPtr> environment, BinaryExpression expression, ParameterExpression param, Func<Context, IntPtr, IntPtr, IntPtr> ctor)
{
    return ctor(context, Visit(context, environment, expression.Left, param), Visit(context, environment, expression.Right, param));
}

/// <summary>
/// Visitor method to translate a unary expression.
/// </summary>
/// <param name="context">Z3 context.</param>
/// <param name="environment">Environment with bindings of theorem variables to Z3 handles.</param>
/// <param name="expression">Unary expression.</param>
/// <param name="ctor">Constructor to combine recursive visitor results.</param>
/// <param name="param">Parameter used to express the constraint on.</param>
/// <returns>Z3 expression handle.</returns>
private IntPtr VisitUnary(Context context, Dictionary<PropertyInfo, IntPtr> environment, UnaryExpression expression, ParameterExpression param, Func<Context, IntPtr, IntPtr> ctor)
{
    return ctor(context, Visit(context, environment, expression.Operand, param));
}

More documentation than anything else :-). Those methods simply visit the expression’s operands (e.g. Left, Right for a binary expression, Operand for a unary one) recursively. The obtained results, of type IntPtr, are fed into the “ctor” delegate parameter to obtain the final result. Trivial stuff.

Now the more complicated visitor methods for members, constants and method calls. First, constants:

/// <summary>
/// Visitor method to translate a constant expression.
/// </summary>
/// <param name="context">Z3 context.</param>
/// <param name="constant">Constant expression.</param>
/// <returns>Z3 expression handle.</returns>
private static IntPtr VisitConstant(Context context, ConstantExpression constant)
{
    if (constant.Type == typeof(int))
        return context.MkNumeral((int)constant.Value, context.MkIntType());
    else if (constant.Type == typeof(bool))
        return (bool)constant.Value ? context.MkTrue() : context.MkFalse();

    throw new NotSupportedException("Unsupported constant type.");
}

This code should be easy enough to understand as well: looking at the type of the constant value, different Z3 calls are made based on the Value property’s value.

Members get a tiny bit more involved, as that’s where we need to recognize property accesses for symbols. In other words, we need to recognize the

/// <summary>
/// Visitor method to translate a member expression.
/// </summary>
/// <param name="environment">Environment with bindings of theorem variables to Z3 handles.</param>
/// <param name="member">Member expression.</param>
/// <param name="param">Parameter used to express the constraint on.</param>
/// <returns>Z3 expression handle.</returns>
private static IntPtr VisitMember(Dictionary<PropertyInfo, IntPtr> environment, MemberExpression member, ParameterExpression param)
{
    //
    // E.g. Symbols l = ...;
    //      theorem.Where(s => l.X1)
    //                         ^^
    //
    if (member.Expression != param)
        throw new NotSupportedException("Encountered member access not targeting the constraint parameter.");

    //
    // Only members we allow currently are direct accesses to the theorem's variables
    // in the environment type. So we just try to find the mapping from the environment
    // bindings table.
    //
    PropertyInfo property;
    IntPtr value;
    if ((property = member.Member as PropertyInfo) == null
        || !environment.TryGetValue(property, out value))
        throw new NotSupportedException("Unknown parameter encountered: " + member.Member.Name + ".");

    return value;
}

First we make sure the user didn’t access something outside the realm of the constraint parameter, e.g.:

Symbols<int, int, int, int, int> oops = new Symbols<int, int, int, int, int>();
var theo = from t in ctx.NewTheorem<Symbols<int, int, int, int, int>>()
           where t.X1 - oops.X2 >= 1
           where t.X1 - t.X2 <= 3
           where t.X1 == (2 * t.X3) + t.X5
           where t.X3 == t.X5
           where t.X2 == 6 * t.X4
           select t;

In essence this is some kind of scope check with regards to the expressed theorem. Recall the above corresponds to calls to the Where method that look as follows:

ctx.NewTheorem<Symbols<int, int, int, int, int>>() .Where(t => t.X1 - oops.X2 >= 1)…

We’ve captured the lambda expression parameter “t” in the AssertConstraints method and carry it on throughout the visitor so we can check it’s used as the left-hand side inside constraint expressions.

Note: This restriction limits expressiveness quite a bit. For example, capturing an outer variable gives rise to a closure to be created, which is accessed through field accesses (of type MemberExpression). Such potentially valid uses get rejected by the code above. To restore such functionality, one can employ partial compilation using LambdaExpression.Compile, after checking the expression doesn’t contain the lambda parameter. The essence is shown below:

if (member.Expression != param)
{
    if (!IsLambdaParameterFree(member.Expression, param))
        throw new NotSupportedException(“Unsupported member expression using constraint parameter.”);

    // Evaluation outside Z3.
    object value = Expression.Lambda(member.Expression).Compile().DynamicInvoke();
    return VisitConstant(context, Expression.Constant(value));
}

We leave the implementation of such techniques to the reader, together with the task to hook this logic up in al the appropriate places (default case of Visit, in VisitCall further on, etc).

VisitMember finally consults the environment to map the property access back to the Z3 object generated in the GetEnvironment call.

Finally, we have a VisitCall method as well. Once more, I’ve kept some advanced rewriter stuff in here (to be explained in a subsequent post), so ignore that for the time being. The reason we implement VisitCall is to provide a means to expose advanced Z3 constructs efficiently, e.g. the Distinct operator. To do so, a Z3Methods class is provided:

/// <summary>
/// Z3 predicate methods.
/// </summary>
public static class Z3Methods
{
    /// <summary>
    /// Creates a predicate constraining the given symbols as distinct values.
    /// </summary>
    /// <typeparam name="T">Type of the parameters.</typeparam>
    /// <param name="symbols">Symbols that are required to be distinct.</param>
    /// <returns>Predicate return value.</returns>
    /// <remarks>This method should only be used within LINQ expressions.</remarks>
    public static bool Distinct<T>(params T[] symbols /* type? */)
    {
        throw new NotSupportedException("This method should only be used in query expressions.");
    }
}

This is very similar to the SqlMethods class of LINQ to SQL, and available for future extensions. VisitCall has the simple task to recognize calls to Z3Methods methods, turning them into the corresponding Z3 primitives.

/// <summary>
/// Visitor method to translate a method call expression.
/// </summary>
/// <param name="context">Z3 context.</param>
/// <param name="environment">Environment with bindings of theorem variables to Z3 handles.</param>
/// <param name="call">Method call expression.</param>
/// <param name="param">Parameter used to express the constraint on.</param>
/// <returns>Z3 expression handle.</returns>
private IntPtr VisitCall(Context context, Dictionary<PropertyInfo, IntPtr> environment, MethodCallExpression call, ParameterExpression param)
{
    var method = call.Method;

    //
    // Does the method have a rewriter attribute applied?
    //
    var rewriterAttr = (TheoremPredicateRewriterAttribute)method.GetCustomAttributes(typeof(TheoremPredicateRewriterAttribute), false).SingleOrDefault();
    if (rewriterAttr != null)
    {
        //
        // Make sure the specified rewriter type implements the ITheoremPredicateRewriter.
        //
        var rewriterType = rewriterAttr.RewriterType;
        if (!typeof(ITheoremPredicateRewriter).IsAssignableFrom(rewriterType))
            throw new InvalidOperationException("Invalid predicate rewriter type definition. Did you implement ITheoremPredicateRewriter?");

        //
        // Assume a parameterless public constructor to new up the rewriter.
        //
        var rewriter = (ITheoremPredicateRewriter)Activator.CreateInstance(rewriterType);

        //
        // Make sure we don't get stuck when the rewriter just returned its input. Valid
        // rewriters should satisfy progress guarantees.
        //
        var result = rewriter.Rewrite(call);
        if (result == call)
            throw new InvalidOperationException("The expression tree rewriter of type " + rewriterType.Name + " did not perform any rewrite. Aborting compilation to avoid infinite looping.");

        //
        // Visit the rewritten expression.
        //
        return Visit(context, environment, result, param);
    }

    //
    // Filter for known Z3 operators.
    //
    if (method.IsGenericMethod && method.GetGenericMethodDefinition() == typeof(Z3Methods).GetMethod("Distinct"))
    {
        //
        // We know the signature of the Distinct method call. Its argument is a params
        // array, hence we expect a NewArrayExpression.
        //
        var arr = (NewArrayExpression)call.Arguments[0];
        var args = from arg in arr.Expressions select Visit(context, environment, arg, param);
        return context.MkDistinct(args.ToArray());
    }
    else
        throw new NotSupportedException("Unknown method call:" + method.ToString());
}

In particular, a call to Distinct is processed by inspecting its argument corresponding to the params-array, recursively translating any of its elements (e.g. Z3Methods.Distinct(1, 2, s.X1, s.X2) will give rise to two constant expressions and two member accesses, but more complex expressions can occur as the arguments as well…). Ultimately, MkDistinct is called and all is happy.

Now that we’re done investigating the visitor, we can have a look at the final stage of the LINQ to Z3-driven theorem solving, the get solution phase.

 

Getting the solution

To expose the solution to the user, we new up a Plain Old CLR Object (POCO) of the theorem’s generic parameter type T. All there’s left to do here is to use some reflection voodoo to instantiate T, populate its properties somehow, and return the object. Different situations can occur, based on the way the type is defined. In particular, use of an anonymous type gives some problems as there’s no formal correlation between the generated constructor and the type’s properties (which are get-only, so we can’t assign values directly). This isn’t super-exciting stuff, but nonetheless:

/// <summary>
/// Gets the solution object for the solved theorem.
/// </summary>
/// <typeparam name="T">Environment type to create an instance of.</typeparam>
/// <param name="model">Z3 model to evaluate theorem parameters under.</param>
/// <param name="environment">Environment with bindings of theorem variables to Z3 handles.</param>
/// <returns>Instance of the enviroment type with theorem-satisfying values.</returns>
private static T GetSolution<T>(Model model, Dictionary<PropertyInfo, IntPtr> environment)
{
    Type t = typeof(T);

    //
    // Determine whether T is a compiler-generated type, indicating an anonymous type.
    // This check might not be reliable enough but works for now.
    //
    if (t.GetCustomAttributes(typeof(CompilerGeneratedAttribute), false).Any())
    {
        //
        // Anonymous types have a constructor that takes in values for all its properties.
        // However, we don't know the order and it's hard to correlate back the parameters
        // to the underlying properties. So, we want to bypass that constructor altogether
        // by using the FormatterServices to create an uninitialized (all-zero) instance.
        //
        T result = (T)FormatterServices.GetUninitializedObject(t);

        //
        // Here we take advantage of undesirable knowledge on how anonymous types are
        // implemented by the C# compiler. This is risky but we can live with it for
        // now in this POC. Because the properties are get-only, we need to perform
        // nominal matching with the corresponding backing fields.
        //
        var fields = t.GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
        foreach (var parameter in environment.Keys)
        {
            //
            // Mapping from property to field.
            //
            var field = fields.Where(f => f.Name.StartsWith("<" + parameter.Name + ">")).SingleOrDefault();

            //
            // Evaluation of the values though the handle in the environment bindings.
            //
            IntPtr val = model.Eval(environment[parameter]);
            if (parameter.PropertyType == typeof(bool))
                field.SetValue(result, model.GetBoolValueBool(val));
            else if (parameter.PropertyType == typeof(int))
                field.SetValue(result, model.GetNumeralValueInt(val));
            else
                throw new NotSupportedException("Unsupported parameter type for " + parameter.Name + ".");
        }

        return result;
    }
    else
    {
        //
        // Straightforward case of having an "onymous type" at hand.
        //
        T result = Activator.CreateInstance<T>();

        foreach (var parameter in environment.Keys)
        {
            //
            // Normalize types when facing Z3. Theorem variable type mappings allow for strong
            // typing within the theorem, while underlying variable representations are Z3-
            // friendly types.
            //
            var parameterType = parameter.PropertyType;
            var parameterTypeMapping = (TheoremVariableTypeMappingAttribute)parameterType.GetCustomAttributes(typeof(TheoremVariableTypeMappingAttribute), false).SingleOrDefault();
            if (parameterTypeMapping != null)
                parameterType = parameterTypeMapping.RegularType;

            //
            // Evaluation of the values though the handle in the environment bindings.
            //
            IntPtr val = model.Eval(environment[parameter]);
            object value;
            if (parameterType == typeof(bool))
                value = model.GetBoolValueBool(val);
            else if (parameterType == typeof(int))
                value = model.GetNumeralValueInt(val);
            else
                throw new NotSupportedException("Unsupported parameter type for " + parameter.Name + ".");

            //
            // If there was a type mapping, we need to convert back to the original type.
            // In that case we expect a constructor with the mapped type to be available.
            //
            if (parameterTypeMapping != null)
            {
                var ctor = parameter.PropertyType.GetConstructor(new Type[] { parameterType });
                if (ctor == null)
                    throw new InvalidOperationException("Could not construct an instance of the mapped type " + parameter.PropertyType.Name + ". No public constructor with parameter type " + parameterType + " found.");

                value = ctor.Invoke(new object[] { value });
            }

            parameter.SetValue(result, value, null);
        }

        return result;
    }
}

Focus on the non-anonymous type case at the bottom, ignoring parts on type mappings (part of the stuff to be explained another time). The essence is plain simple:

  • Create an instance of the type.
  • Go over the environment and:
    • Extract the value from the Z3 model using the environment’s stored Z3 object.
    • Assign it to the property.

Notice get-only properties are not supported, though one could look at LINQ to SQL as a source of inspiration to deal with such cases as well, if desired.

 

Samples

A few samples of simple theorems using the infrastructure provided above are shown below:

 

using (var ctx = new Z3Context())
{
    ctx.Log = Console.Out;

    Print(from t in ctx.NewTheorem(new { x = default(bool) })
          where t.x && !t.x
          select t);

    Print(from t in ctx.NewTheorem(new { x = default(bool), y = default(bool) })
          where t.x ^ t.y
          select t);

    Print(from t in ctx.NewTheorem(new { x = default(int), y = default(int) })
          where t.x < t.y + 1
          where t.x > 2
          select t);

    Print(from t in ctx.NewTheorem<Symbols<int, int>>()
          where t.X1 < t.X2 + 1
          where t.X1 > 2
          where t.X1 != t.X2
          select t);

    Print(from t in ctx.NewTheorem<Symbols<int, int, int, int, int>>()
          where t.X1 - t.X2 >= 1
          where t.X1 - t.X2 <= 3
          where t.X1 == (2 * t.X3) + t.X5
          where t.X3 == t.X5
          where t.X2 == 6 * t.X4
          select t);

    Print(from t in ctx.NewTheorem<Symbols<int, int>>()
          where Z3Methods.Distinct(t.X1, t.X2)
          select t);
}

The reader should be able to trace through those samples mentally without much trouble. The Print method is shown below:

private static void Print<T>(Theorem<T> t) where T : class
{
    Console.WriteLine(t);
    var res = t.Solve();
    Console.WriteLine(res == null ? "none" : res.ToString());
    Console.WriteLine();
}

Given all of this, the output looks as follows:

image

Sweet! Notice the use of the Log property on the context object causes the constraints to be emitted to the screen, in LISP-style prefix notation for operators and operands.

 

Sneak peak

Given the baseline infrastructure we covered in this post, we can start adding layers of abstraction over it to make more advanced theorems easier to express. An example is a Sudoku puzzle:

var s1 = from t in Sudoku.Create(ctx)
         where t.Cell13 == 2 && t.Cell16 == 1 && t.Cell18 == 6
         where t.Cell23 == 7 && t.Cell26 == 4 
         where t.Cell31 == 5 && t.Cell37 == 9
         where t.Cell42 == 1 && t.Cell44 == 3
         where t.Cell51 == 8 && t.Cell55 == 5 && t.Cell59 == 4
         where t.Cell66 == 6 && t.Cell68 == 2
         where t.Cell73 == 6 && t.Cell79 == 7
         where t.Cell84 == 8 && t.Cell87 == 3
         where t.Cell92 == 4 && t.Cell94 == 9 && t.Cell97 == 2
         select t;

Console.WriteLine(s1);
Console.WriteLine(s1.Solve());
Console.WriteLine();

Or more compactly represented as a string:

var s2 = Sudoku.Parse(ctx, @"..2..1.6.
                             ..7..4...
                             5.....9..
                             .1.3.....
                             8...5...4
                             .....6.2.
                             ..6.....7
                             ...8..3..
                             .4.9..2..");
Console.WriteLine(s2);
Console.WriteLine(s2.Solve());
Console.WriteLine();

Here we’ve created a domain-specific theorem type, Sudoku, that’s responsible to provide parts of the constraints. In fact, a Sudoku has much more constraints than the ones specified by the user: rows, columns and blocks need to contain unique values from 1-9, and every cell has to be within that range. For the innocent-looking Sudoku above, we get the following set of constraints:

((1 <= p.Cell11) && (p.Cell11 <= 9)),
((1 <= p.Cell12) && (p.Cell12 <= 9)),
((1 <= p.Cell13) && (p.Cell13 <= 9)),
((1 <= p.Cell14) && (p.Cell14 <= 9)),
((1 <= p.Cell15) && (p.Cell15 <= 9)),
((1 <= p.Cell16) && (p.Cell16 <= 9)),
((1 <= p.Cell17) && (p.Cell17 <= 9)),
((1 <= p.Cell18) && (p.Cell18 <= 9)),
((1 <= p.Cell19) && (p.Cell19 <= 9)),
((1 <= p.Cell21) && (p.Cell21 <= 9)),
((1 <= p.Cell22) && (p.Cell22 <= 9)),
((1 <= p.Cell23) && (p.Cell23 <= 9)),
((1 <= p.Cell24) && (p.Cell24 <= 9)),
((1 <= p.Cell25) && (p.Cell25 <= 9)),
((1 <= p.Cell26) && (p.Cell26 <= 9)),
((1 <= p.Cell27) && (p.Cell27 <= 9)),
((1 <= p.Cell28) && (p.Cell28 <= 9)),
((1 <= p.Cell29) && (p.Cell29 <= 9)),
((1 <= p.Cell31) && (p.Cell31 <= 9)),
((1 <= p.Cell32) && (p.Cell32 <= 9)),
((1 <= p.Cell33) && (p.Cell33 <= 9)),
((1 <= p.Cell34) && (p.Cell34 <= 9)),
((1 <= p.Cell35) && (p.Cell35 <= 9)),
((1 <= p.Cell36) && (p.Cell36 <= 9)),
((1 <= p.Cell37) && (p.Cell37 <= 9)),
((1 <= p.Cell38) && (p.Cell38 <= 9)),
((1 <= p.Cell39) && (p.Cell39 <= 9)),
((1 <= p.Cell41) && (p.Cell41 <= 9)),
((1 <= p.Cell42) && (p.Cell42 <= 9)),
((1 <= p.Cell43) && (p.Cell43 <= 9)),
((1 <= p.Cell44) && (p.Cell44 <= 9)),
((1 <= p.Cell45) && (p.Cell45 <= 9)),
((1 <= p.Cell46) && (p.Cell46 <= 9)),
((1 <= p.Cell47) && (p.Cell47 <= 9)),
((1 <= p.Cell48) && (p.Cell48 <= 9)),
((1 <= p.Cell49) && (p.Cell49 <= 9)),
((1 <= p.Cell51) && (p.Cell51 <= 9)),
((1 <= p.Cell52) && (p.Cell52 <= 9)),
((1 <= p.Cell53) && (p.Cell53 <= 9)),
((1 <= p.Cell54) && (p.Cell54 <= 9)),
((1 <= p.Cell55) && (p.Cell55 <= 9)),
((1 <= p.Cell56) && (p.Cell56 <= 9)),
((1 <= p.Cell57) && (p.Cell57 <= 9)),
((1 <= p.Cell58) && (p.Cell58 <= 9)),
((1 <= p.Cell59) && (p.Cell59 <= 9)),
((1 <= p.Cell61) && (p.Cell61 <= 9)),
((1 <= p.Cell62) && (p.Cell62 <= 9)),
((1 <= p.Cell63) && (p.Cell63 <= 9)),
((1 <= p.Cell64) && (p.Cell64 <= 9)),
((1 <= p.Cell65) && (p.Cell65 <= 9)),
((1 <= p.Cell66) && (p.Cell66 <= 9)),
((1 <= p.Cell67) && (p.Cell67 <= 9)),
((1 <= p.Cell68) && (p.Cell68 <= 9)),
((1 <= p.Cell69) && (p.Cell69 <= 9)),
((1 <= p.Cell71) && (p.Cell71 <= 9)),
((1 <= p.Cell72) && (p.Cell72 <= 9)),
((1 <= p.Cell73) && (p.Cell73 <= 9)),
((1 <= p.Cell74) && (p.Cell74 <= 9)),
((1 <= p.Cell75) && (p.Cell75 <= 9)),
((1 <= p.Cell76) && (p.Cell76 <= 9)),
((1 <= p.Cell77) && (p.Cell77 <= 9)),
((1 <= p.Cell78) && (p.Cell78 <= 9)),
((1 <= p.Cell79) && (p.Cell79 <= 9)),
((1 <= p.Cell81) && (p.Cell81 <= 9)),
((1 <= p.Cell82) && (p.Cell82 <= 9)),
((1 <= p.Cell83) && (p.Cell83 <= 9)),
((1 <= p.Cell84) && (p.Cell84 <= 9)),
((1 <= p.Cell85) && (p.Cell85 <= 9)),
((1 <= p.Cell86) && (p.Cell86 <= 9)),
((1 <= p.Cell87) && (p.Cell87 <= 9)),
((1 <= p.Cell88) && (p.Cell88 <= 9)),
((1 <= p.Cell89) && (p.Cell89 <= 9)),
((1 <= p.Cell91) && (p.Cell91 <= 9)),
((1 <= p.Cell92) && (p.Cell92 <= 9)),
((1 <= p.Cell93) && (p.Cell93 <= 9)),
((1 <= p.Cell94) && (p.Cell94 <= 9)),
((1 <= p.Cell95) && (p.Cell95 <= 9)),
((1 <= p.Cell96) && (p.Cell96 <= 9)),
((1 <= p.Cell97) && (p.Cell97 <= 9)),
((1 <= p.Cell98) && (p.Cell98 <= 9)),
((1 <= p.Cell99) && (p.Cell99 <= 9)),
Distinct(new [] {p.Cell11, p.Cell12, p.Cell13, p.Cell14, p.Cell15, p.Cell16, p.Cell17, p.Cell18, p.Cell19}),
Distinct(new [] {p.Cell21, p.Cell22, p.Cell23, p.Cell24, p.Cell25, p.Cell26, p.Cell27, p.Cell28, p.Cell29}),
Distinct(new [] {p.Cell31, p.Cell32, p.Cell33, p.Cell34, p.Cell35, p.Cell36, p.Cell37, p.Cell38, p.Cell39}),
Distinct(new [] {p.Cell41, p.Cell42, p.Cell43, p.Cell44, p.Cell45, p.Cell46, p.Cell47, p.Cell48, p.Cell49}),
Distinct(new [] {p.Cell51, p.Cell52, p.Cell53, p.Cell54, p.Cell55, p.Cell56, p.Cell57, p.Cell58, p.Cell59}),
Distinct(new [] {p.Cell61, p.Cell62, p.Cell63, p.Cell64, p.Cell65, p.Cell66, p.Cell67, p.Cell68, p.Cell69}),
Distinct(new [] {p.Cell71, p.Cell72, p.Cell73, p.Cell74, p.Cell75, p.Cell76, p.Cell77, p.Cell78, p.Cell79}),
Distinct(new [] {p.Cell81, p.Cell82, p.Cell83, p.Cell84, p.Cell85, p.Cell86, p.Cell87, p.Cell88, p.Cell89}),
Distinct(new [] {p.Cell91, p.Cell92, p.Cell93, p.Cell94, p.Cell95, p.Cell96, p.Cell97, p.Cell98, p.Cell99}),
Distinct(new [] {p.Cell11, p.Cell21, p.Cell31, p.Cell41, p.Cell51, p.Cell61, p.Cell71, p.Cell81, p.Cell91}),
Distinct(new [] {p.Cell12, p.Cell22, p.Cell32, p.Cell42, p.Cell52, p.Cell62, p.Cell72, p.Cell82, p.Cell92}),
Distinct(new [] {p.Cell13, p.Cell23, p.Cell33, p.Cell43, p.Cell53, p.Cell63, p.Cell73, p.Cell83, p.Cell93}),
Distinct(new [] {p.Cell14, p.Cell24, p.Cell34, p.Cell44, p.Cell54, p.Cell64, p.Cell74, p.Cell84, p.Cell94}),
Distinct(new [] {p.Cell15, p.Cell25, p.Cell35, p.Cell45, p.Cell55, p.Cell65, p.Cell75, p.Cell85, p.Cell95}),
Distinct(new [] {p.Cell16, p.Cell26, p.Cell36, p.Cell46, p.Cell56, p.Cell66, p.Cell76, p.Cell86, p.Cell96}),
Distinct(new [] {p.Cell17, p.Cell27, p.Cell37, p.Cell47, p.Cell57, p.Cell67, p.Cell77, p.Cell87, p.Cell97}),
Distinct(new [] {p.Cell18, p.Cell28, p.Cell38, p.Cell48, p.Cell58, p.Cell68, p.Cell78, p.Cell88, p.Cell98}),
Distinct(new [] {p.Cell19, p.Cell29, p.Cell39, p.Cell49, p.Cell59, p.Cell69, p.Cell79, p.Cell89, p.Cell99}),
Distinct(new [] {p.Cell11, p.Cell12, p.Cell13, p.Cell21, p.Cell22, p.Cell23, p.Cell31, p.Cell32, p.Cell33}),
Distinct(new [] {p.Cell14, p.Cell15, p.Cell16, p.Cell24, p.Cell25, p.Cell26, p.Cell34, p.Cell35, p.Cell36}),
Distinct(new [] {p.Cell17, p.Cell18, p.Cell19, p.Cell27, p.Cell28, p.Cell29, p.Cell37, p.Cell38, p.Cell39}),
Distinct(new [] {p.Cell41, p.Cell42, p.Cell43, p.Cell51, p.Cell52, p.Cell53, p.Cell61, p.Cell62, p.Cell63}),
Distinct(new [] {p.Cell44, p.Cell45, p.Cell46, p.Cell54, p.Cell55, p.Cell56, p.Cell64, p.Cell65, p.Cell66}),
Distinct(new [] {p.Cell47, p.Cell48, p.Cell49, p.Cell57, p.Cell58, p.Cell59, p.Cell67, p.Cell68, p.Cell69}),
Distinct(new [] {p.Cell71, p.Cell72, p.Cell73, p.Cell81, p.Cell82, p.Cell83, p.Cell91, p.Cell92, p.Cell93}),
Distinct(new [] {p.Cell74, p.Cell75, p.Cell76, p.Cell84, p.Cell85, p.Cell86, p.Cell94, p.Cell95, p.Cell96}),
Distinct(new [] {p.Cell77, p.Cell78, p.Cell79, p.Cell87, p.Cell88, p.Cell89, p.Cell97, p.Cell98, p.Cell99}),
(and (<= 1 Cell11) (<= Cell11 9))
(and (<= 1 Cell12) (<= Cell12 9))
(and (<= 1 Cell13) (<= Cell13 9))
(and (<= 1 Cell14) (<= Cell14 9))
(and (<= 1 Cell15) (<= Cell15 9))
(and (<= 1 Cell16) (<= Cell16 9))
(and (<= 1 Cell17) (<= Cell17 9))
(and (<= 1 Cell18) (<= Cell18 9))
(and (<= 1 Cell19) (<= Cell19 9))
(and (<= 1 Cell21) (<= Cell21 9))
(and (<= 1 Cell22) (<= Cell22 9))
(and (<= 1 Cell23) (<= Cell23 9))
(and (<= 1 Cell24) (<= Cell24 9))
(and (<= 1 Cell25) (<= Cell25 9))
(and (<= 1 Cell26) (<= Cell26 9))
(and (<= 1 Cell27) (<= Cell27 9))
(and (<= 1 Cell28) (<= Cell28 9))
(and (<= 1 Cell29) (<= Cell29 9))
(and (<= 1 Cell31) (<= Cell31 9))
(and (<= 1 Cell32) (<= Cell32 9))
(and (<= 1 Cell33) (<= Cell33 9))
(and (<= 1 Cell34) (<= Cell34 9))
(and (<= 1 Cell35) (<= Cell35 9))
(and (<= 1 Cell36) (<= Cell36 9))
(and (<= 1 Cell37) (<= Cell37 9))
(and (<= 1 Cell38) (<= Cell38 9))
(and (<= 1 Cell39) (<= Cell39 9))
(and (<= 1 Cell41) (<= Cell41 9))
(and (<= 1 Cell42) (<= Cell42 9))
(and (<= 1 Cell43) (<= Cell43 9))
(and (<= 1 Cell44) (<= Cell44 9))
(and (<= 1 Cell45) (<= Cell45 9))
(and (<= 1 Cell46) (<= Cell46 9))
(and (<= 1 Cell47) (<= Cell47 9))
(and (<= 1 Cell48) (<= Cell48 9))
(and (<= 1 Cell49) (<= Cell49 9))
(and (<= 1 Cell51) (<= Cell51 9))
(and (<= 1 Cell52) (<= Cell52 9))
(and (<= 1 Cell53) (<= Cell53 9))
(and (<= 1 Cell54) (<= Cell54 9))
(and (<= 1 Cell55) (<= Cell55 9))
(and (<= 1 Cell56) (<= Cell56 9))
(and (<= 1 Cell57) (<= Cell57 9))
(and (<= 1 Cell58) (<= Cell58 9))
(and (<= 1 Cell59) (<= Cell59 9))
(and (<= 1 Cell61) (<= Cell61 9))
(and (<= 1 Cell62) (<= Cell62 9))
(and (<= 1 Cell63) (<= Cell63 9))
(and (<= 1 Cell64) (<= Cell64 9))
(and (<= 1 Cell65) (<= Cell65 9))
(and (<= 1 Cell66) (<= Cell66 9))
(and (<= 1 Cell67) (<= Cell67 9))
(and (<= 1 Cell68) (<= Cell68 9))
(and (<= 1 Cell69) (<= Cell69 9))
(and (<= 1 Cell71) (<= Cell71 9))
(and (<= 1 Cell72) (<= Cell72 9))
(and (<= 1 Cell73) (<= Cell73 9))
(and (<= 1 Cell74) (<= Cell74 9))
(and (<= 1 Cell75) (<= Cell75 9))
(and (<= 1 Cell76) (<= Cell76 9))
(and (<= 1 Cell77) (<= Cell77 9))
(and (<= 1 Cell78) (<= Cell78 9))
(and (<= 1 Cell79) (<= Cell79 9))
(and (<= 1 Cell81) (<= Cell81 9))
(and (<= 1 Cell82) (<= Cell82 9))
(and (<= 1 Cell83) (<= Cell83 9))
(and (<= 1 Cell84) (<= Cell84 9))
(and (<= 1 Cell85) (<= Cell85 9))
(and (<= 1 Cell86) (<= Cell86 9))
(and (<= 1 Cell87) (<= Cell87 9))
(and (<= 1 Cell88) (<= Cell88 9))
(and (<= 1 Cell89) (<= Cell89 9))
(and (<= 1 Cell91) (<= Cell91 9))
(and (<= 1 Cell92) (<= Cell92 9))
(and (<= 1 Cell93) (<= Cell93 9))
(and (<= 1 Cell94) (<= Cell94 9))
(and (<= 1 Cell95) (<= Cell95 9))
(and (<= 1 Cell96) (<= Cell96 9))
(and (<= 1 Cell97) (<= Cell97 9))
(and (<= 1 Cell98) (<= Cell98 9))
(and (<= 1 Cell99) (<= Cell99 9))
(distinct Cell11 Cell12 Cell13 Cell14 Cell15 Cell16 Cell17 Cell18 Cell19)
(distinct Cell21 Cell22 Cell23 Cell24 Cell25 Cell26 Cell27 Cell28 Cell29)
(distinct Cell31 Cell32 Cell33 Cell34 Cell35 Cell36 Cell37 Cell38 Cell39)
(distinct Cell41 Cell42 Cell43 Cell44 Cell45 Cell46 Cell47 Cell48 Cell49)
(distinct Cell51 Cell52 Cell53 Cell54 Cell55 Cell56 Cell57 Cell58 Cell59)
(distinct Cell61 Cell62 Cell63 Cell64 Cell65 Cell66 Cell67 Cell68 Cell69)
(distinct Cell71 Cell72 Cell73 Cell74 Cell75 Cell76 Cell77 Cell78 Cell79)
(distinct Cell81 Cell82 Cell83 Cell84 Cell85 Cell86 Cell87 Cell88 Cell89)
(distinct Cell91 Cell92 Cell93 Cell94 Cell95 Cell96 Cell97 Cell98 Cell99)
(distinct Cell11 Cell21 Cell31 Cell41 Cell51 Cell61 Cell71 Cell81 Cell91)
(distinct Cell12 Cell22 Cell32 Cell42 Cell52 Cell62 Cell72 Cell82 Cell92)
(distinct Cell13 Cell23 Cell33 Cell43 Cell53 Cell63 Cell73 Cell83 Cell93)
(distinct Cell14 Cell24 Cell34 Cell44 Cell54 Cell64 Cell74 Cell84 Cell94)
(distinct Cell15 Cell25 Cell35 Cell45 Cell55 Cell65 Cell75 Cell85 Cell95)
(distinct Cell16 Cell26 Cell36 Cell46 Cell56 Cell66 Cell76 Cell86 Cell96)
(distinct Cell17 Cell27 Cell37 Cell47 Cell57 Cell67 Cell77 Cell87 Cell97)
(distinct Cell18 Cell28 Cell38 Cell48 Cell58 Cell68 Cell78 Cell88 Cell98)
(distinct Cell19 Cell29 Cell39 Cell49 Cell59 Cell69 Cell79 Cell89 Cell99)
(distinct Cell11 Cell12 Cell13 Cell21 Cell22 Cell23 Cell31 Cell32 Cell33)
(distinct Cell14 Cell15 Cell16 Cell24 Cell25 Cell26 Cell34 Cell35 Cell36)
(distinct Cell17 Cell18 Cell19 Cell27 Cell28 Cell29 Cell37 Cell38 Cell39)
(distinct Cell41 Cell42 Cell43 Cell51 Cell52 Cell53 Cell61 Cell62 Cell63)
(distinct Cell44 Cell45 Cell46 Cell54 Cell55 Cell56 Cell64 Cell65 Cell66)
(distinct Cell47 Cell48 Cell49 Cell57 Cell58 Cell59 Cell67 Cell68 Cell69)
(distinct Cell71 Cell72 Cell73 Cell81 Cell82 Cell83 Cell91 Cell92 Cell93)
(distinct Cell74 Cell75 Cell76 Cell84 Cell85 Cell86 Cell94 Cell95 Cell96)
(distinct Cell77 Cell78 Cell79 Cell87 Cell88 Cell89 Cell97 Cell98 Cell99)
(and (and (= Cell13 2) (= Cell16 1)) (= Cell18 6))
(and (= Cell23 7) (= Cell26 4))
(and (= Cell31 5) (= Cell37 9))
(and (= Cell42 1) (= Cell44 3))
(and (and (= Cell51 8) (= Cell55 5)) (= Cell59 4))
(and (= Cell66 6) (= Cell68 2))
(and (= Cell73 6) (= Cell79 7))
(and (= Cell84 8) (= Cell87 3))
(and (and (= Cell92 4) (= Cell94 9)) (= Cell97 2))

Only the last nine lines were specified by the user through the LINQ query expression. In fact, the reader should be able to create a Sudoku class already based on today’s literature (tip: work your way down starting from the signature of Sudoku.Create you can infer from the sample above; reuse Theorem<T> of course).

More complex puzzles exist though, for example a KenKen puzzle:

image

Besides constraints on rows and columns and cells, similar to a Sudoku’s (with regards to the use of digits 1 through 4), there are more domain-specific constraints: all mathematical operators are commutative. For example, 1- in the two-cell region means that Cell12 – Cell22 or Cell22 – Cell12 equals to 1. For bigger regions all permutations of cells are considered. In other words, we need different semantics on operators. For example:

var k1 = from t in KenKen4By4.Create(ctx)
         where t.Add(4, t.Cell11, t.Cell21)
         where t.Sub(1, t.Cell12, t.Cell22)
         where t.Mul(4, t.Cell13, t.Cell14, t.Cell23)
         where t.Mul(24, t.Cell31, t.Cell32, t.Cell33)
         where t.Sub(3, t.Cell24, t.Cell34)
         where t.Div(2, t.Cell41, t.Cell42)
         where t.Sub(1, t.Cell43, t.Cell44)
         select t;
Console.WriteLine(k1.Solve());
Console.WriteLine();

results in the following constraints:

(and (<= 1 Cell11) (<= Cell11 4))
(and (<= 1 Cell12) (<= Cell12 4))
(and (<= 1 Cell13) (<= Cell13 4))
(and (<= 1 Cell14) (<= Cell14 4))
(and (<= 1 Cell21) (<= Cell21 4))
(and (<= 1 Cell22) (<= Cell22 4))
(and (<= 1 Cell23) (<= Cell23 4))
(and (<= 1 Cell24) (<= Cell24 4))
(and (<= 1 Cell31) (<= Cell31 4))
(and (<= 1 Cell32) (<= Cell32 4))
(and (<= 1 Cell33) (<= Cell33 4))
(and (<= 1 Cell34) (<= Cell34 4))
(and (<= 1 Cell41) (<= Cell41 4))
(and (<= 1 Cell42) (<= Cell42 4))
(and (<= 1 Cell43) (<= Cell43 4))
(and (<= 1 Cell44) (<= Cell44 4))
(distinct Cell11 Cell12 Cell13 Cell14)
(distinct Cell21 Cell22 Cell23 Cell24)
(distinct Cell31 Cell32 Cell33 Cell34)
(distinct Cell41 Cell42 Cell43 Cell44)
(distinct Cell11 Cell21 Cell31 Cell41)
(distinct Cell12 Cell22 Cell32 Cell42)
(distinct Cell13 Cell23 Cell33 Cell43)
(distinct Cell14 Cell24 Cell34 Cell44)
(= 4 (+ Cell11 Cell21))
(or (= 1 (- Cell12 Cell22)) (= 1 (- Cell22 Cell12)))
(= 4 (* (* Cell13 Cell14) Cell23))
(= 24 (* (* Cell31 Cell32) Cell33))
(or (= 3 (- Cell24 Cell34)) (= 3 (- Cell34 Cell24)))
(or (= 2 (div Cell41 Cell42)) (= 2 (div Cell42 Cell41)))
(or (= 1 (- Cell43 Cell44)) (= 1 (- Cell44 Cell43)))

Again the last set of constraints are a result of the LINQ query expression (as a side-note, different Z3 techniques are available to tackle problems, something outside the scope of this discussion just yet). In order to generate those constraints, the domain-specific Add, Sub, Mul and Div methods need to be translated into Z3 primitives, something that will be done by implementing a rewriter type (cf. the pieces of code I asked the reader to skim over for the time being). We could go one step further and allow the user to specify the puzzle by means of a visual textual DSL:

var k2 = KenKen4By4.Parse(ctx, @"+#####+#####+#####+#####+
                                 # 4+  # 1-  # 4*        #
                                 +     +     +     +#####+
                                 #     #     #     # 3-  #
                                 +#####+#####+#####+     +
                                 # 24*             #     #
                                 +#####+#####+#####+#####+
                                 # 2/        # 1-        #
                                 +#####+#####+#####+#####+");
Console.WriteLine(k2.Solve());
Console.WriteLine();

Or even consume the Word file (from which the screenshot of the puzzle was taken earlier) directly:

var k3 = KenKen4By4.FromWord(ctx, @"C:\Users\Bart\Desktop\KenKen.docx");
Console.WriteLine(k3.Solve());
Console.WriteLine();

Finally there are Kakuro puzzles that give rise to yet another desire for better domain-specific theorem expressiveness:

var u1 = from k in ctx.NewTheorem<Kakuro>()
         where k.Cell11.IsBlack
         where k.Cell14.IsBlack
         where k.Cell15.IsBlack
         where k.Cell24.IsBlack
         where k.Cell48.IsBlack
         where k.Cell51.IsBlack
         where k.Cell61.IsBlack
         where k.Cell85.IsBlack
         where k.Cell12.VerticalSum == 23
         where k.Cell13.VerticalSum == 30
         where k.Cell16.VerticalSum == 27
         where k.Cell17.VerticalSum == 12
         where k.Cell18.VerticalSum == 16
         where k.Cell21.HorizontalSum == 16
         where k.Cell25.VerticalSum == 17
         where k.Cell25.HorizontalSum == 24
         where k.Cell31.HorizontalSum == 17
         where k.Cell34.VerticalSum == 15
         where k.Cell34.HorizontalSum == 29
         where k.Cell41.HorizontalSum == 35
         where k.Cell47.VerticalSum == 12
         where k.Cell52.HorizontalSum == 7
         where k.Cell55.VerticalSum == 7
         where k.Cell55.HorizontalSum == 8
         where k.Cell58.VerticalSum == 7
         where k.Cell62.VerticalSum == 11
         where k.Cell63.VerticalSum == 10
         where k.Cell63.HorizontalSum == 16
         where k.Cell71.HorizontalSum == 21
         where k.Cell76.HorizontalSum == 5
         where k.Cell81.HorizontalSum == 6
         where k.Cell86.HorizontalSum == 3
         select k;

Here properties are used to indicate constraints as simple as possible, for the following puzzle:

+--------+--------+--------+--------+--------+--------+--------+--------+
| ****** | 23     | 30     | ****** | ****** | 27     | 12     | 16     |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     16 |        |        | ****** | 17  24 |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     17 |        |        | 15  29 |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     35 |        |        |        |        |        | 12     | ****** |
+--------+--------+--------+--------+--------+--------+--------+--------+
| ****** |      7 |        |        | 7    8 |        |        | 7      |
+--------+--------+--------+--------+--------+--------+--------+--------+
| ****** | 11     | 10  16 |        |        |        |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|     21 |        |        |        |        |      5 |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+
|      6 |        |        |        | ****** |      3 |        |        |
+--------+--------+--------+--------+--------+--------+--------+--------+

We don’t want the user to express things like Cell22 + Cell32 + Cell42 == 23, but instead want to use all the information captured in the puzzle to allow constraints like Cell12.VerticalSum == 23. As we’ll see, this requires other rewriting mechanisms to allow for a good balance of responsibilities between:

  • The LINQ to Z3 infrastructure.
  • The Kakuro puzzle library writer.
  • The end-user.

The resulting Z3 constraints for the sample puzzle are shown below:

(and (= (+ (+ Cell22 Cell32) Cell42) 23) (distinct Cell22 Cell32 Cell42))
(and (= (+ (+ (+ Cell23 Cell33) Cell43) Cell53) 30)
     (distinct Cell23 Cell33 Cell43 Cell53))
(and (= (+ (+ (+ (+ Cell26 Cell36) Cell46) Cell56) Cell66) 27)
     (distinct Cell26 Cell36 Cell46 Cell56 Cell66))
(and (= (+ Cell27 Cell37) 12) (distinct Cell27 Cell37))
(and (= (+ Cell28 Cell38) 16) (distinct Cell28 Cell38))
(and (= (+ Cell22 Cell23) 16) (distinct Cell22 Cell23))
(and (>= Cell22 1) (<= Cell22 9))
(and (>= Cell23 1) (<= Cell23 9))
(and (= (+ (+ Cell26 Cell27) Cell28) 24) (distinct Cell26 Cell27 Cell28))
(and (= (+ Cell35 Cell45) 17) (distinct Cell35 Cell45))
(and (>= Cell26 1) (<= Cell26 9))
(and (>= Cell27 1) (<= Cell27 9))
(and (>= Cell28 1) (<= Cell28 9))
(and (= (+ Cell32 Cell33) 17) (distinct Cell32 Cell33))
(and (>= Cell32 1) (<= Cell32 9))
(and (>= Cell33 1) (<= Cell33 9))
(and (= (+ (+ (+ Cell35 Cell36) Cell37) Cell38) 29)
     (distinct Cell35 Cell36 Cell37 Cell38))
(and (= (+ (+ (+ (+ Cell44 Cell54) Cell64) Cell74) Cell84) 15)
     (distinct Cell44 Cell54 Cell64 Cell74 Cell84))
(and (>= Cell35 1) (<= Cell35 9))
(and (>= Cell36 1) (<= Cell36 9))
(and (>= Cell37 1) (<= Cell37 9))
(and (>= Cell38 1) (<= Cell38 9))
(and (= (+ (+ (+ (+ Cell42 Cell43) Cell44) Cell45) Cell46) 35)
     (distinct Cell42 Cell43 Cell44 Cell45 Cell46))
(and (>= Cell42 1) (<= Cell42 9))
(and (>= Cell43 1) (<= Cell43 9))
(and (>= Cell44 1) (<= Cell44 9))
(and (>= Cell45 1) (<= Cell45 9))
(and (>= Cell46 1) (<= Cell46 9))
(and
(= (+ (+ (+ Cell57 Cell67) Cell77) Cell87) 12)
     (distinct Cell57 Cell67 Cell77 Cell87))
(and (= (+ Cell53 Cell54) 7) (distinct Cell53 Cell54))
(and (>= Cell53 1) (<= Cell53 9))
(and (>= Cell54 1) (<= Cell54 9))
(and (= (+ Cell56 Cell57) 8) (distinct Cell56 Cell57))
(and (= (+ Cell65 Cell75) 7) (distinct Cell65 Cell75))
(and (>= Cell56 1) (<= Cell56 9))
(and (>= Cell57 1) (<= Cell57 9))
(and (= (+ (+ Cell68 Cell78) Cell88) 7) (distinct Cell68 Cell78 Cell88))
(and (= (+ Cell72 Cell82) 11) (distinct Cell72 Cell82))
(and
(= (+ (+ (+ (+ Cell64 Cell65) Cell66) Cell67) Cell68) 16)
     (distinct Cell64 Cell65 Cell66 Cell67 Cell68))
(and (= (+ Cell73 Cell83) 10) (distinct Cell73 Cell83))
(and (>= Cell64 1) (<= Cell64 9))
(and (>= Cell65 1) (<= Cell65 9))
(and (>= Cell66 1) (<= Cell66 9))
(and (>= Cell67 1) (<= Cell67 9))
(and (>= Cell68 1) (<= Cell68 9))
(and (= (+ (+ (+ Cell72 Cell73) Cell74) Cell75) 21)
     (distinct Cell72 Cell73 Cell74 Cell75))
(and (>= Cell72 1) (<= Cell72 9))
(and (>= Cell73 1) (<= Cell73 9))
(and (>= Cell74 1) (<= Cell74 9))
(and (>= Cell75 1) (<= Cell75 9))
(and (= (+ Cell77 Cell78) 5) (distinct Cell77 Cell78))
(and (>= Cell77 1) (<= Cell77 9))
(and (>= Cell78 1) (<= Cell78 9))
(and (= (+ (+ Cell82 Cell83) Cell84) 6) (distinct Cell82 Cell83 Cell84))
(and (>= Cell82 1) (<= Cell82 9))
(and (>= Cell83 1) (<= Cell83 9))
(and (>= Cell84 1) (<= Cell84 9))
(and (= (+ Cell87 Cell88) 3) (distinct Cell87 Cell88))
(and (>= Cell87 1) (<= Cell87 9))
(and (>= Cell88 1) (<= Cell88 9))

Again the red parts reflect what the user expressed, while the remainder is all generated by the domain-specific Kakuro implementation.

 

Conclusion

Creating a simple LINQ to Z3 implementation isn’t too hard and involves just a little bit of plumbing in the bathroom of reflection and some use of expression tree visitor patterns. In future posts, we’ll have a look at domain-specific theorem solving techniques based on declarative expression tree rewriters. Enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

In this post, I’m going to do something out of the ordinary: write a post on a non-technical matter, in the “Personal” and “Microsoft” (big-bucket general stuff kind-a) categories I’ve left unattended for way too long. So, if you’re looking for interesting technical stuff, go look elsewhere this time around (but I promise to be back soon :-)).

So yes, this has been an interesting non-technical start (ignoring the continuation to follow after this post…) of a Saturday over here in Bellevue. To set the scene: a few months ago Dandy and I have been asked by a Belgian community (in the Microsoft-sense) member to write up something on “The Road to Redmond”. Dandy came up with this idea to create a video highlighting several hot-spots in the area. Finally we got around to take care of this request and shoot the moving pictures today. Lots of things interfered with the planning before: camera defects, Best Buy shopping for a new camera, Dandy’s US Open attendance (to see Kim Clijsters win, little did she know she’d become a recurring theme in the video), other weekend activities (occult and otherwise).

It’s kind of funny I’m pretty natural in front of large audiences, but simple video interviews make me behave differently somehow, especially when the spoken language falls back to Dutch instead of the English lingo my brain is in a steady state for. But anyway, we did some great shoots today starting with Belgian-style food and beverages (the latter not for me, as I’m a teetotaler) in downtown Bellevue where the waitress claimed to speak a little French but couldn’t properly pronounce Duchesse De Bourgogne. At least we got Belgian fries with mayonnaise to go with beef stew.

During a short visit to Best Buy to buy a tri-pod, I got a first question by the cashier on my unusual (way too geeky, I admit) T-shirt meant for shameless self-promotion of this blog, something that became a running theme throughout the day as well. Its front – typing for Curry-style System F-omega (different forms of definition can be found in the literature, but I chose one that fits a medium-sized T-shirt) – and back are shown below:

F B#

People who saw the filming in progress or got somehow around us can testify I really do wear a yellow T-shirt with the above :-).

Next shots were primarily to show off Dandy’s fancy car for the take-off (almost literally, velocity was getting close enough :-)) to the Space Needle. Though I’ve been here about two years now (and so does Dandy), I hadn’t been up there just yet. Too touristy if you will, but for the occasion I had an excuse. So we got there and went up, disregarding the cloudy weather and rather windy meteorological circumstances. I must say, the view is kind of nice but nothing too fancy to write home about. Kind as I am, I even took a picture of two ladies discovering some digital cameras still have a proper mechanical “take picture now” button. Last time I had a fancy digital camera in my hands, I couldn’t even find the power button, so I felt really good about this little victory near a camera this time. Dandy insisted me on showing them my T-shirt but maybe I did it a little too explicit (though I sincerely don’t believe so, notice my fellow colleague was a few Belgian beers in), so after thanking me for taking the picture and some little giggles (IMO mostly due to the presence of my colleague…) they took off. I know the (App) rule looks frightening enough for people to jump down, so I kept things a bit more hidden while being on the outside up there.

Back on the inside, we met this pretty Tacoma girl who works up there to answer tourist questions, all dressed up in a fabulous “I am Seattle” outfit. Can you believe it she even went to Brussels a while ago, totally jetlagged and looking for some old brewery nearby (can’t think of what that would have been, but nearby on the US scale is, euhm, not small). But hey, that’s better than nothing (unlike people thinking Belgium is the capital of Brussels, though in the limit things get close :-)). Anyway, she did a splendid job in our video interviews and we even made here pronounce Kim Clijsters more or less correctly. And contrast to the folks on the deck, she didn’t look too frightened about my T-shirt and even found my binary watch (standard equipment for a geek) cool.

Down again, we did a bit more hanging around in their souvenir shop (I know touristy things all look the same, but I needed some confirmation), where we shot some crazy scenes doing Plop dances in appropriate garments. Put developers together with IT Pros (yes, another cliché) and you get that sort of thing. Oh well, I don’t mind making my self a bit ridiculous from time to time :-).

Last but not least, we headed over to campus for a few shots around the new Studio buildings, where people were playing cricket on the fields and jogging around in Office T-shirts. All of a sudden, attention for my extraordinary clothing started to drop significantly as people over there are kind of used to this sort of weirdness :-). We did some talking on camera about serious matters this time around, covering subjects on life at Microsoft, the Microsoft culture, etc.

So yes, once a year or so (give or take) I can do non-geeky stuff as well. Though I need a little compensation by wearing appropriate clothing and such :-). People who believe they were part of this crazy plot today, feel free to drop a note in the comments section below this post.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduction

Side-effects don’t fit together very well with functional-style delayed computation, as observed in LINQ. Having such constructs embedded in an otherwise eagerly evaluated imperative language like C# can cause quite some confusion from time to time. A few samples (due to Erik Meijer) of such “mishaps” are shown below:

//
// When does it throw?
//

IEnumerable<double> res = null;
try
{
    res = from x in Enumerable.Range(0, 10)
          select 1.0 / x;
}
catch (DivideByZeroException) {}

foreach (var rcp in res)
    Console.WriteLine(rcp);

//
// What does it print?
//
Func<int, bool> div2 = i => { Console.WriteLine(“Div2 test “ + i); return i % 2 == 0; };
Func<int, bool> div3 = i => { Console.WriteLine(“Div3 test “ + i); return i % 3 == 0; };
var res = from x in Enumerable.Range(0, 10)
          where div2(x)
          where div3(x)
          select x;

foreach (var sxt in res)
    Console.WriteLine(sxt);

The side-effects in the samples above are of various kinds: exceptions and output operations all hurt in a functional setting. In pure “fundamentalist” functional programming languages, those effects are made explicit by means of the type system (in particular, the IO monad). When looking at a .NET “function” signature, types are lying at us:

class Random
{
    public int Next();
}

Sure, Next returns an Int32 numeric value when being called. But that’s only part of the story: every time you call it, it may return something else. In other words, we can’t simplify the following program:

var rand = new Random();
var fems = new Pair(rand.Next(), rand.Next());

into

var rand = new Random();
var __rn = rand.Next();
var fems = new Pair(__rn, __rn);

The main reason is simply because we don’t know enough about the behavior, i.e. the side-effects, the Next function exposes. Or to state it in another manner, Next is not a (pure) function but a method. In Haskell, types tell us a side-effect is going on:

randomIO :: (Random a) => IO a

Roughly speaking, Random a stands for Random<T> and IO a stands for T, “tagged” with the fact it has a side-effect through the IO monad (not the same as IO<T> as the CLR’s type system doesn’t support the higher kinding required to realize the typing required for monads).

 

Multiplication of side-effects

Being aware of the presence of side-effects is one thing, having them amplified or multiplied (almost) invisibly through the use of query combinators and such is even more painful. Consider the following iterator:

static class RandomExtensions
{
    public static IEnumerable<int> GetNumbers(this Random rand, int max)
    {
        while (true)
            yield return rand.Next(max);
    }
}

which can be called like:

var rand = new Random();
foreach (var x in rand.GetNumbers(100).Take(10))
    Console.WriteLine(x);

All is good and well so far, we’ll get 10 random numbers printed to the screen. Now assume we’re already in .NET 4 (just to make illustration a little easier, the general topic of “taming side effects” is version-independent and platform(-excluding-Haskell)-independent) and have Zip method available. Let’s write the following:

var rand = new Random();
var res = rand.GetNumbers(100).Take(10).Zip(rand.GetNumbers(100).Take(10), (l, r) => l + r).All(x => x % 2 == 0);

What do you think res should be? Clearly, the sum of a number and itself is always even. In bold, we have a common subexpression: on the face of it, you may think it’d produce the same results for both “sides” of Zip, hence the conclusion the zipped sums should always be even. Nothing is further away from the truth: GetNumbers propagates the side-effect of the underlying IO operation (reading form a random number generator) and the two calls create separate iterator instances, each merrily ticking along. A simple Print operator can help to illustrate this:

static class Enumerable
{
    public static IEnumerable<T> Print<T>(this IEnumerable<T> source, string formatMessage)
    {
        foreach (var item in source)
        {
            Console.WriteLine(formatMessage, item);
            yield return item;
        }
    }
}

Yes, even more side-effects added to the melting pot. Now we can write the following to illustrate the mishap:

var rand = new Random();
var res = rand.GetNumbers(100).Take(10).Print(“SRC1: {0}”).Zip(rand.GetNumbers(100).Take(10).Print(“SRC2: {0}”), (l, r) => l + r).All(x => x % 2 == 0);

One run on the author’s machine produced:

SRC1: 82
SRC2: 86
SRC1: 17
SRC2: 91
SRC1: 48
SRC2: 15

Clearly, both sources are out of sync. Now we go and knock on the door of a fellow programmer and ask to solve this issue. Well, clearly you’re calling GetNumbers twice – that’s your problem right there. Not so much… Let’s see what a proposed fix would look like:

var rand = new Random();
var src = rand.GetNumbers(100).Take(10);

var res = src.Print(“SRC1: {0}”).Zip(src.Print(“SRC2: {0}”), (l, r) => l + r).All(x => x % 2 == 0);

And here’s the result:

SRC1: 89
SRC2: 39
SRC1: 22
SRC2: 15

What’s happening here? Although we’ve reduced the number of calls to GetNumbers to just a single one, its execution is still lazily evaluated as we’re using an iterator. It’s not until we start iterating that computation is triggered. Moreover, the iterator returns an IEnumerable<int> which merely provides the potential to enumerate though the IEnumerator<int> objects it hands out. In effect, Zip will call GetEnumerator twice, resulting in two instances of the iterator (simplified code below):

public static IEnumerable<R> Zip<T1, T2, R>(this IEnumerable<T1> left, IEnumerable<T2> right, Func<T1, T2, R> zip)
{
    var leftE = left.GetEnumerator();
    var rightE = right.GetEnumerator();
    while (leftE.MoveNext() && rightE.MoveNext())
        yield return zip(leftE.Current, rightE.Current);
}

In our case, left and right are pretty much the same (ignore Print which acts as an all-pass filter iterator), namely src:

src.Zip(src, …)

Two calls to GetEnumerator result, each querying the same random generator instance that happily continues on its side-effecting mission. The iterator we wrote for GetNumbers is semantically equivalent to the following, illustrating this issue more clearly for those who are not familiar with the way iterators are compiled (simplified presentation):

static class RandomExtensions
{
    public static IEnumerable<int> GetNumbers(this Random rand, int max)
    {
        return new GetNumbersIterator(rand, max);
    }
}

class GetNumbersIterator : IEnumerable<int>
{
    private Random _rand;
    private int _max;

    public GetNumbersIterator(Random rand, int max)
    {
        _rand = rand;
        _max = max;
    }

    public IEnumerator<int> GetEnumerator()
    {
        return new GetNumbersIteratorEnumerator(this);
    }

    class GetNumbersIteratorEnumerator : IEnumerator<int>
    {
        private GetNumbersIterator _iter;
        private int _curr;

        public GetNumbersIteratorEnumerator(GetNumbersIterator iter)
        {
            _iter = iter;
        }

        public bool MoveNext()
        {
            _curr = _iter._rand.Next(_iter._max);
            return true;
        }

        public int Current
        {
            get { return _curr; }

        }
    }
}

In fact, you could say the side-effect that’s killing us now is the instantiation of GetNumbersIteratorEnumerator inside the GetEnumerator method. Observing that, the innocent unenlightened developer may say: what if we eliminate that additional object instantiation? Let’s “desugar” the iterator by writing it as the thing above (quite a bit of work, but the innocent developer is willing to go great lengths to get his code working As The Natural Numbers Intended) and eliminating the construction of a new GetNumbersIteratorEnumerator all the time.

class GetNumbersIterator : IEnumerable<int>
{
    private Random _rand;
    private int _max;
    private GetNumbersIteratorEnumerator _curr;

    public GetNumbersIterator(Random rand, int max)
    {
        _rand = rand;
        _max = max;
        _curr = new GetNumbersIteratorEnumerator(this);
    }

    public IEnumerator<int> GetEnumerator()
    {
        return _curr;
    }

Guess what? Here’s the result:

SRC1: 15
SRC2: 97
SRC1: 31
SRC2: 57
SRC1: 21
SRC2: 57
SRC1: 63
SRC2: 94

Bummer. We’re just pushing the problem in front of us. Though we have the same enumerator object inside the Zip method now, we’re still calling MoveNext twice:

public static IEnumerable<R> Zip<T1, T2, R>(this IEnumerable<T1> left, IEnumerable<T2> right, Func<T1, T2, R> zip)
{
    var leftE = left.GetEnumerator();
    var rightE = right.GetEnumerator();
    Debug.Assert(leftE == rightE);
    while (leftE.MoveNext() && rightE.MoveNext())
        yield return zip(leftE.Current, rightE.Current);
}

What would the desperate developer’s next suggestion be? Let’s not go there – clearly there’s something bigger the matter here. Everything you try at this level of decomposing iterator objects and knowing precisely how query operators work is a very very ugly band-aid trying to cover up for the bigger problem of side-effect replication here.

One way to tackle the problem from the outside is to break laziness:

var rand = new Random();
var src = rand.GetNumbers(100).Take(10).ToArray();

var res = src.Print(“SRC1: {0}”).Zip(src.Print(“SRC2: {0}”), (l, r) => l + r).All(x => x % 2 == 0);

Operators like ToArray and ToList drain the left-hand side sequence on the spot, persisting the retrieved elements in-memory. Iterators versus persisted data sequences aren’t differentiated on the type level: both are “enumerable”. Great from a generalization point of view, less so to understand the runtime behavior of dealing with those objects. Either way, the above works and prints:

SRC1: 44
SRC2: 44
SRC1: 6
SRC2: 6
SRC1: 81
SRC2: 81
SRC1: 14
SRC2: 14
SRC1: 65
SRC2: 65
SRC1: 1
SRC2: 1
SRC1: 9
SRC2: 9
SRC1: 67
SRC2: 67
SRC1: 65
SRC2: 65
SRC1: 75
SRC2: 75

For the first time we reached the end of the zipped sequence. Wonderful. But we’ve made a big sacrifice, something I accounted for right from the start of the post by using Take(10). The above doesn’t work if we’re dealing with infinite sequences. Obviously you should never try to drain those as the program will get stuck (which in fact can be formalized as well, maybe stuff to cover another time), yet infinite sequences are great to provide the user with the potential to compute as many results as desired. Having to break laziness to make query operator composition, e.g. for Zip in our sample, work spoils this whole idea of lazy computation.

By now it should be clear that side-effects are have far-stretching implications, but we should not ignore the fact they’re useful at times. What we need to make our “consistent consumption of side-effectful sequences” is some kind of let construct.

 

What’s let anyway?

Functional programming languages typically provide a “let” construct, e.g. as seen in ML-style languages like F#. One form is to use the following syntax:

let id = expr1 in expr2

The above is shorthand for abstraction and application carried out right after one another:

(fun id –> expr2) expr1

Consider the next sample:

let x = 1 in (x, x)

where (…, …) is the tupling operator. Semantically the above is equivalent to:

(fun x –> (x, x)) 1

In a call-by-value setting this effectively avoids re-evaluating the expr1 expression. Of course in the sample above, there’s not much work required to “re-evaluate” the rather minimalistic expression “1”. However, when evaluating this expression causes side-effects, we’re eliminating duplicate evaluation. This is shown in the sample below where we use the Stopwatch type to retrieve the timestamp, based on a high-resolution performance counter (see MSDN documentation for information and potential caveats).

image

Of course we can loop in our Random type sample as well:

image

You get the idea. How does this apply to our side-effecting iterators? Well, what we really want is a mechanism by which we can expose the same iterator to multiple consumers without re-evaluating the side-effects that get triggered by calls to IEnumerator.MoveNext (where the iterator’s code lives in one form or another). In addition to this functional requirement, we can’t drain the whole iterator as we’d break our teeth on infinite (computed) sequences.

 

IEnumerable.Let – Signature and use

First, let’s have a look at the contract for the IEnumerable.Let operator we’re after here. Obviously we want to keep the order as natural as possible and what better to use than an extension method to make syntax as natural as possible:

… Let<T>(this IEnumerable<T> source, …)

That’s a good start: we can take in an IEnumerable<T> as the expr1 for the let construct. Looking back at the F# equivalent for a moment:

(fun id –> expr2) expr1

It’s clear we need to have a function that exposes the one-time evaluated (read: enumerated) to a block of code where the user can party on the exposed sequence without having to worry about duplicating side-effects. The input of such a function clearly is IEnumerable<T>, leading to:

… Let<T>(this IEnumerable<T> source, Func<IEnumerable<T>, …

So, “id” from the F# code sample above will be identified to the input of the Func delegate. Finally, what’s the output meant to be? Obviously just the output of the expr2 evaluation after feeding in expr1 through the parameter. As we’re talking about sequence operators, a natural choice is to make the output an IEnumerable as well, and to allow projections and such to happen inside we choose another generic parameter:

… Let<T, U>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<U>> function)

You can guess where the output of the source is meant to go to: the output of the Let construct by itself:

IEnumerable<U> Let<T, U>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<U>> function)

Makes sense? For example, for our fancy Zip over random sequence we want to be able to write the following:

var rand = new Random();
var res = rand.GetNumbers(100).Take(10).Let(nums => nums.Zip(nums, (l, r) => l + r)).All(x => x % 2 == 0);

See what’s happening here? The left-hand side to Let becomes the source, which internally is exposed as “nums”. Zip produces its output, which becomes the output of the Let construct that’s subsequently fed into All. In fact, the above could be read as:

var res = (let nums = rand.GetNumbers(100).Take(10) in
               nums
.Zip(nums, (l, r) => l + r))
          .All(x => x % 2 == 0);

Now on to the question of the night, how to realize this?

 

Implementing IEnumerable.Let

Before we start implementing the let construct for LINQ to Objects, we need to make sure we fully understand the problem. Let me rephrase it one more time:

Given an IEnumerable<T> source, we need to expose it to a function is such a way that multiple consumers can only trigger a single enumerator to be active on the source. This ensures that the side-effects occurring as part of the enumeration (using MoveNext/Current) over the enumerator are not replicated.

Since many consumers can be active inside the “let body” (the function parameter in the signature), we can expect those to move at different paces. Here’s an example:

nums.Skip(5).Zip(nums, …)

In here, the Skip operator readily advances the iteration over the source 5 positions ahead. Then Zip consumes this input as well as the raw source starting from position 0. Given a sequence of { 1, 2, 3, 4, 5, 6, 7, 8, 9 }, the inputs to Zip are skewed over a distance 5, combining (6, 1), (7, 2), etc. This forces us to memorize the first 5 elements that have already been consumed by Skip, as they still have to be observed through Zip’s second input (and obviously we can’t recompute them as that’s what we’re trying to eliminate!).

For one thing, we need to keep track of elements that have been consumed from the Let-source so that consumers inside the body can get access to those computed values. An immediate question becomes whether or not we can optimize this: wouldn’t it be possible to scavenge the computed values if we are sure no single consumer inside the body can still consume it? Sounds reasonable, but there’s a deep problem here: the number of consumers can be dynamic. A good sample is SelectMany:

from x in nums
from y in nums
select x + y

This translates into the following:

nums.SelectMany(x => nums, (x, y) => x + y)

SelectMany is conceptually similar to nested loops used to enumerate a collection and for each of its elements retrieve associated objects (e.g. for each product, and each order for that product, do …). In the sample above we use nums twice, so assuming the source looks like { 1, … , 9 }, we will see the following pairs being fed to the projection function parameter:

(1, 1), (1, 2), …, (1, 9), (2, 1), …, (9, 1), (9, 2), …, (9, 9)

For the 9 elements in the left-hand side source, we’ll enumerate all of the elements in the corresponding source found by evaluating x => nums (where, in this degenerate case, we disregard the parameter x and simply return ourselves). How many times will we trigger enumeration on nums here? Well, one time for the input to SelectMany as the left-hand side, and for each element one more time for the nested enumeration. So, with 9 elements, the total will be 9 + 1 = 10. There’s no way for us to know upfront how many subscriptions will happen, so we can’t make assumptions when we can discard elements we’ve retrieved from the source.

All of those observations give rise to the introduction of a “memoized enumerator”. A what? Memoization is a technique used in functional programming to cache the result of evaluating a function knowing that function is pure, i.e. given the same inputs it always gives back the same output. A typical example is recursive Fibonacci:

Func<uint, uint> fib = null;
fib = n => n <= 1 ? 1 : fib(n - 2) + fib(n - 1);
for (uint i = 0; i < 50; i++)
    Console.WriteLine(fib(i));

Good luck running this code – it will take forever to compute fib(50) (ignore the fact it doesn’t fit in a uint) due to the explosion of recursive steps and re-evaluations incurred. A generic memoization function will cache results:

static Func<T, R> Memoize<T, R>(Func<T, R> f)
{
    var cache = new Dictionary<T, R>();

    return t =>
    {
        R res;
        if (cache.TryGetValue(t, out res))
            return res;
        return cache[t] = f(t);
    };
}

This memoizer is a higher-order function that transparently (in terms of the function signature) maps a function onto one that caches evaluation results. As a result, for a given function input “t”, the function “f” can be evaluated at most once (for subsequent function calls given the same input, the dictionary will be consulted). It should be pretty clear this is precisely what we want for our Let-enumerator as well, but instead of keying off the avoidance for multiple evaluations from the function input, we’ll need to avoid it based on the position of an enumerator. Let’s discuss this.

Assume you have two consumers of the Let-input sequence: one skips ahead 5 positions, the other starts from the beginning, just as we saw before:

src.Let(nums => nums.Skip(5).Zip(nums, (l, r) => l + r))

Let’s decompose the interactions here. Zip is pulling both its left-hand side followed by its first parameter, at an equal pace. First, the end-result:

image

I hope the above diagram makes sense, if not look at the more conceptual diagram below:

image

Now we loop bottom-up: if something pulls at Zip, what does it start pulling at by itself? First it calls MoveNext on its left-hand side, which is the Skip(5) block. Skip starts by fetching the first 5 + 1 items from its input, which is the Let block, in order to yield the 6th item back to the consumer. This leads to the following situation:

image

The read arrows indicate the active “channels” of enumeration. Reading bottom-up, Zip requested one element from Skip(5), which on its drew 6 items through its upstream channel connected to Let. Since Let hasn’t received calls to retrieve elements from the source at positions 0 through 5, it pulled on its source to get those items, causing potential side-effects (indicated by the bombs). But beside just yielding them back to the Skip(5) caller, it also “memoizes” them in an indexed collection. The “@5” annotation on the outgoing Let arrows – representing enumerators – indicate the cursor position in the source sequence, of the element that was last yielded through the channel.

Now that Zip has an element from its left-hand side source, it needs to fetch an element from its right-hand side source, in order for the two elements to be combined by the zipper function (in our case numerical +). This looks as follows:

image 

Notice the red channel doesn’t extend all the way to the source – we already have received the element at position 0, so we can consult the memoization store to retrieve the value from there. No observable source-induced side-effects are triggered by doing so. The cursor position for the active Let-consumer is moved to 0.

Essentially, whenever a channel queries for an element below the high-water mark of the memoization store, the element are retrieved from the store instead of pulling from the source. On to the code now. We’ll start by looking at a memoizing enumerable as we’ll use that as the building block for the Let construct. First, the IEnumerable<T> implementation:

class MemoizeEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly IEnumerable<T> _source;
    private IList<T> _data;
    private IEnumerator<T> _enumSource;
    private bool _disposed;
    private bool _reachedEnd;

    public MemoizeEnumerable(IEnumerable<T> source)
    {
        _source = source;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_disposed)
            throw new ObjectDisposedException("LetIterator");

        if (_enumSource == null && !_reachedEnd)
        {
            _enumSource = _source.GetEnumerator();
            _data = new List<T>();
        }

        return new MemoizeEnumerator<T>(this);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private bool TryMoveNext(int cursor, out T value)
    {
        Debug.Assert(cursor <= _data.Count);

        if (_disposed)
            throw new ObjectDisposedException("LetIterator");

        if (cursor < _data.Count)
        {
            value = _data[cursor];
            return true;
        }
        else
        {
            if (_reachedEnd)
            {
                value = default(T);
                return false;
            }

            if (_enumSource.MoveNext())
            {
                value = _enumSource.Current;
                _data.Add(value);
                return true;
            }
            else
            {
                // When we reach the end, no reason to delay the disposal of the source
                // any longer. Now we should avoid calls to MoveNext further on, so we
                // set a _reachedEnd flag to detect this case.
                _enumSource.Dispose();
_enumSource = null; _reachedEnd = true; value = default(T);
return false; } } } public void Dispose() { // Facilitates deterministic disposal of the underlying store. We can't rely // on Dispose calls on the enumerator object as we don't know the number of // consumers we're dealing with. if (!_reachedEnd)
{ _enumSource.Dispose(); _enumSource = null;
}
_disposed = true; _data.Clear(); } class MemoizeEnumerator<T_> : IEnumerator<T_> { …
} }

A MemoizeEnumerable<T> encapsulates an underlying IEnumerable<T> that represents the source. The GetEnumerator method for consumers to call in order to establish an enumerator triggers the MemoizeEnumerable to start consuming the input source unless it’s already doing so. This is the key to avoiding multiple source-consumers, which would lead to duplication of side-effects. The memoization store is simply a List<T>, which is indexable. What the MemoizeEnumerator<T> looks like will be shown soon.

Skipping over TryMoveNext for just a second, we look at Dispose first. As we’ve seen before, the number of enumerators over the exposed memoized data structure is something we can’t know upfront or infer from context. So, the disposal of enumerator objects doesn’t mean we can dispose the enumerator we established on the (memoized) source. We need a hint from the outside to say: it’s fine to let go the source (which could be holding on to, say, a network connection underneath). This will correspond to “scoped usage” of the Let construct as we shall see later.

Now on to TryMoveNext. Here the fetching of the underlying source happens, based on a passed-in cursor. We enforce linear progression, which is expected for MoveNext calls: the consuming enumerators should at most take one step forward at a time. If the cursor falls within the range of the memoization store, we simply retrieve the cached value from there. Otherwise, we fetch the next element from the underlying source (causing a potential side-effect once and for all for that element) and keep it in the store. If there are no further elements to be fetched as signaled by the source’s MoveNext call’s return value, we can dispose the enumerator over the source. To avoid subsequent calls on the disposed object, we keep a flag to signal we’ve reached the end. Notice we’re not dealing with multi-threaded consumers here – such worries are left as an exercise for the reader.

Finally, we should have a look at the MemoizeEnumerator inner class which obviously will call TryMoveNext in order to make progress over the shared source enumerator and memoization store. This type is fairly trivial:

class MemoizeEnumerator<T_> : IEnumerator<T_>
{
    private readonly MemoizeEnumerable<T_> _source;
    private int _cursor;
    private T_ _lastValue;

    public MemoizeEnumerator(MemoizeEnumerable<T_> source)
    {
        _source = source;
        _cursor = 0;
    }

    public T_ Current
    {
        get { return _lastValue; }
    }

    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        return _source.TryMoveNext(_cursor++, out _lastValue);
    }

    public void Reset()
    {
        _cursor = 0;
    }

    public void Dispose()
    {
        // No disposal here - the enumerable object does the bookkeeping of the
        // underlying enumerator object. Here we're essentially just dealing with
        // a cursor into the underlying sequence.
    }
}

All there is to a memoizing enumerator is to keep its cursor, communicate it to TryMoveNext, and increment it. Notice the assert in TryMoveNext will trigger when an ill-behaved consumer continues to call MoveNext on the enumerator after it has returned false. One could pimp this case a little bit by either keeping a _reachedEnd state inside the enumerator (not to bother the enumerable object anymore in such a case) or by keeping the _cursor value in its final position as opposed to incrementing it.

Notice the enumerator’s Dispose method is not implemented for reasons explained earlier. The only thing one could do as a meaningful implementation is a _disposed flag to throw an ObjectDisposedException if the object is used beyond its disposal state. We leave this as a trivial exercise for the reader as well.

Ultimately we’ve reached a state where we can give away the entry-point to IEnumerable.Let:

public static IEnumerable<U> Let<T, U>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<U>> function)
{
    using (var mem = new MemoizeEnumerable<T>(source))
    {
        foreach (var item in function(mem))
            yield return item;
    }
}

The most important thing here is obviously the memoization of the source which is fed to the function for the let-body to party on. The use of a using-block inside an iterator causes that iterator’s Dispose method to call the Dispose method on the resource in use. In other words, if one foreach’es over the result of a Let-call, exiting the loop will trigger a call to Dispose which on its turn disposes the memoized enumerable, again on its turn disposing the underlying source’s singleton enumerator. This means re-executing the Let resulting sequence itself will cause re-iteration over the underlying source. This makes sense as we only protect against duplication of side-effects inside the let-body. If you want to avoid re-execution of the Let-operator to cause duplication of side-effects by itself, wrap it in yet another Let-call.

 

A few samples

Time for some samples to conclude this journey. We go for two source iterators that are only of interest for their side-effects:

static IEnumerable<int> GetNums(int n)
{
    try
    {
        for (int i = 0; i <= n; i++)
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.Write(i + "   ");
            Console.ResetColor();
            yield return i;
        }
    }
    finally
    {
        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine("Disposed");
        Console.ResetColor();
    }
}

static Random rand = new Random();

static IEnumerable<int> Lotto()
{
    try
    {
        for (int i = 0; i < 6; i++)
        {
            var n = rand.Next(100);
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine(n);
            Console.ResetColor();
            yield return n;
        }
    }
    finally
    {
        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine("Disposed");
        Console.ResetColor();
    }
}

We log produced values to the console to visualize how many times the iterator is enumerated over. Obviously we write those in red as they are bloody side-effects. We’re also interested to see proper disposal, so we log that event too. A first example using SelectMany without Let shows how bad things get as we replicate side-effects:

Console.WriteLine("Without let");
{
    var source = GetNums(10);

    var res = from x in source
              from y in source
              select x + y;

    foreach (var i in res)
        ;
}
Console.WriteLine();

The result is as follows, no less than 11 iterations over the source sequence:

image 

Next, we eliminate the common sub-expression “source” by lifting it into a Let block. Source becomes the input to Let, and inside the let-body we represent the source with the “src” identifier:

Console.WriteLine("With let");
{
    var source = GetNums(10);

    var res = source.Let(src =>
                from x in src
                from y in src
                select x + y);

    foreach (var i in res)
        ;
}
Console.WriteLine();

Now things look much better:

image

Obviously the computed results will be the same (not visualized here – we’re just showing the side-effects), but we got rid of the excessive side-effect replication. This said, consuming a Let’s output multiple times doesn’t prevent the source from being consulted more than once: only inside the Let-block, consumers are shielded from side-effect duplication, but outside, consumers of the Let-block itself don’t get protection:

 

Console.WriteLine("With let (twice)");
{
    var source = GetNums(10);

    var res = source.Let(src =>
    {
        Console.WriteLine("Let!");
        return from x in src
               from y in src
               select x + y;
    });

    foreach (var i in res)
        ;

    foreach (var i in res)
        ;
}
Console.WriteLine();

We enumerate twice over the output of Let, hence we see the side-effects twice (and not 22 times as would be the case if we left Let out of the picture altogether):

image

However, nothing prevents us from putting a Let over the Let we already have: let’s eliminate the common subexpression res in both foreach-statements as follows:

Console.WriteLine("With let (nested)");
{
    var source = GetNums(10);

    var res = source.Let(src =>
        {
            Console.WriteLine("Let!");
            return from x in src
                   from y in src
                   select x + y;
        });

    res.Let(src =>
    {
        foreach (var i in src)
            ;

        foreach (var i in src)
            ;

        return new int[] { }; // doesn't matter, we want an "imperative let"
    }).ToArray(); // forces evaluation
}
Console.WriteLine();

Sure enough, we’re back to the minimum amount of side-effects:

image

And finally, some direct uses of Memoize, assuming the following exposition on IEnumerable<T> is present:

public static MemoizeEnumerable<T> Memoize<T>(this IEnumerable<T> source)
{
    return new MemoizeEnumerable<T>(source);
}

Now we use our lottery iterator as a source and show that the sequences are not equal without memoization as they are enumerated multiple times and hence different random numbers are picked for two uses of the source:

Console.WriteLine("Without memoize");
{
    var src = Lotto();
    Console.WriteLine(src.SequenceEqual(src));
}
Console.WriteLine();

Console.WriteLine("With memoize");
{
    using (var src = Lotto().Memoize())
        Console.WriteLine(src.SequenceEqual(src));
}
Console.WriteLine();

With the Memoization in use, obviously the two sequences ought to be the same all the time:

image

One more thing to illustrate is the use of infinite streams by means of a Repeat operator:

public static IEnumerable<T> Repeat<T>(T value)
{
    while (true)
        yield return value;
}

public static IEnumerable<T> Repeat<T>(IEnumerable<T> source)
{
    while (true)
        foreach (var item in source)
            yield return item;
}

It should be clear that the use of eagerly evaluated operators like ToArray and ToList won’t work as a side-effect-avoiding technique in such a setting. However, with Memoize one can eliminate side-effects without getting stuck:

Console.WriteLine("Memoize prevents _|_ out");
{
    using (var mem = Enumerable.Repeat(1).Memoize())
        Console.WriteLine(mem.Take(10).Count());
}
Console.WriteLine();

Console.WriteLine("ToList may _|_ out");
{
    Console.WriteLine(Enumerable.Repeat(1).ToList().Take(10).Count());
}
Console.WriteLine();

The samples above are not of interest for side-effects (though one can easily come up with samples), but rather for their infinite character. Use of ToList will exhaust the managed heap quickly, while Memoize just fills memory for the objects that are ever requested downstream (in the case above “Take 10”).

image 

Wonder about the _|_ notation? This is the “bottom type”, one way to type non-termination. “Entre parentheses”, we don’t have such a thing in the world of .NET, which is a pity. For example, if you have an exception throwing helper method:

static void ThrowAlways()
{
    throw new MyException(/* localization fluff */);
}

and you call the above from a function to signal an exception:

static int Bar()
{
    if (cond)
        …
    else
        ThrowAlways();
}

the compiler will complain that “not all code paths return a value”. Void is not the same as bottom, flow analysis cannot know the method won’t every return. With a bottom type, e.g. using keyword “never” (or more juicy ones like “blackhole”, “getstuck”, “goodbye”, “farewell”), one could say:

static never ThrowAlways()
{
    throw new MyException(/* localization fluff */);
}

Now the compiler would check statically that ThrowAlways can never reach its end (either by throwing an exception, looping infinitely or calling other “never”-returning methods). At the call-site, the compiler would be happy with Bar’s definition as well since it knows ThrowAlways never returns, hence there’s no need to be worried about not having a return value for the enclosing method. And of course, if we also had union types, we could (or maybe should, depending on how pesky the compiler would be made) warn the user that Bar may never return:

static (never | int) Bar()
{
    if (cond)
        …
    else
        ThrowAlways();
}

But all of this deserves its own discussion another time.

 

Conclusion

If you don’t want to tame your side-effects, you should at least have good notions about where they happen and how they may get replicated by mixing lazy and eager evaluation in languages like C#. Enumerable.Let shows one approach to taming side-effects in the context of LINQ queries with side-effecting computed sequences (iterators).

If you can’t get enough of this stuff, check out Erik Meijer’s talks on Fundamentalist Functional Programming. Credits where credits are due, the Enumerable.Let exercise shown in here presents the dual of the IObservable’s Let operator (from the Reactive LINQ framework Erik and his team have been working on lately), which I’ll talk about another time.

Left as a challenge for the reader: can you think of a way to port let’s brother “letrec” to the world of IEnumerable<T> as well? Let completeness be your source of inspiration :-).

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

A Belgian friend of mine was looking up my obsession on lambda calculus and came across this Dutch wiki page. Reminds me of exactly the same mistake 5 years ago: [FUNNY] Rule one when translating software - review the translation. Taking the opportunity to introduce a new IIgnorable blog category for the rare occasions I feel the urge to post nonsense.

image

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

More Posts