Introduction

Recently I delivered a session on Custom LINQ Providers - LINQ to Anything at the TechDays conferences in Ghent and Lisboa. The core of the session focuses on expression trees and translating those at runtime into a query language like CAML or LDAP (as the running samples). In this approach, IQueryable is just a minor implementation detail - or rather a vehicle to bring you closer to the query pattern established by the query operator methods. Last week, one of the attendees who got inspired by the session and started to write a query provider, asked me for some input on the best approach, referring to the little bullet indicated on my slide below:

image

 

IQueryable<T> or not?

So, do you really need IQueryable<T>? To say the least, IQueryable<T> is an interesting interface and so is IQueryProvider. Why? Well, first of all the IQueryable<T> interface itself doesn't have all of the query operators, instead it's being extended with those operators (and their implementation, wiring up expression trees) by the Queryable class's extension methods. You can somewhat think of it as an abstract base class; suffice to say it's the right balance in the tension between interface providers and implementers. In addition to this, the infrastructure provided by IQueryable is basically just a means to glue together pieces of an expression tree or to trigger execution of such an expression tree in IQueryProvider, e.g. in response to calling the First method.

Although providing much flexibility, the little caveat of the IQueryable approach lies in the fact that most providers don't need all of those methods, so most of the time you have to disappoint the user stating that something is not supported. When? At runtime, through NotSupportedException. And in lots of cases, there's no way to avoid this: think of a query predicate passed in to the Where method underneath the covers. If it contains a call to certain method (e.g. String.EndsWith) which can't be translated to the target language (e.g. CAML) you're out of luck and you won't find out till the code runs (the compiler doesn't call into a query provider, asking whether or not the generated expression tree will be supported at runtime).

However, on the higher level of query operators you can get some compile-time help by just partially implementing the query pattern. How does that work? Well, I did have the following slide, outlining query expressions are just another pattern provided by the language (just like foreach, using, lock, etc):

image

(The answer to the question above is click here.)

Right, the sample above is completely non-sense, but the compiler will be more than happy to translate it:

var res = new Homer().Where(x => x.HasSkateboard).Select(x => x.Sister).Father;

Take a look at the Where call, which resolves to the instance method Where on Homer, taking in a Func<Bart, bool>. In other words, the x in x => x.HasSkateboard is of type Bart, and HasSkateboard is a (not so far-fetched) property on Bart, returning a bool. Resolving the Where call: accomplished and returning an instance of Marge. The Select call can be inferred similarly, which is left to the reader.

 

Cooking you own query pattern

So, this is how you can create your own query pattern implementation by means of instance methods. A sample is shown below:

class Table<T> : IEnumerable<T>
{
   public Query<T> Where(Expression<Func<T, bool>> predicate) { ... }
   public ClosedQuery<R> Select<R>(Expression<Func<T, R>> projection) { ... }
   ...
}

class Query<T> : IEnumerable<T>
{
   public ClosedQuery<R> Select<R>(Expression<Func<T, R>> projection) { ... }
   ...
}

class ClosedQuery<T> : IEnumerable<T>
{
   ...
}

basically by providing an entry-point like Table<T> and further providing a fluent interface pattern that produces Query<T> objects in a chained manner. How you go ahead and expose the query operators on the source object and the query class purely depends on the composability of the query you want to provide, e.g. do you want a maximum of one Where call or do you allow more flexibility? Don't forget a Select method though since C# 3.0 requires the user to specify it; VB 9.0 doesn't but still generates the call which decouples the data source object from a resulting query object, no matter how trivial it is.

Notice in the sample above the use of ClosedQuery, which is a class that will keep all the expression tree information (which could be as easy as a few fields set by an internal constructor, containing the predicate and projection and whatever else you have captured using query operator methods) as well as a reference to the source the query is operating on. It doesn't contain further query operators though, to avoid multiple class to e.g. Select (which IQueryable<T> allows and which can be a bit challenging to implement - why?).

The core thing in the code above however is the type of the parameters to the query operator methods: Expression<T>, since you want an expression tree for runtime parsing and translation. And for the parser itself, obviously you want to short-circuit things in the degenerate cases such as Table<T> and Query<T> to fallback on a common implementation - people can still write new Table<string>().Where(x => x.Length == 5) and iterate over it, which should just work.

 

Compiler errors

Assuming you have the code above, which you can implement trivially as throwing NotImplementedException or returning null, let's try to write the following query:

var res = from x in new Table<string>()
          where x.Length > 5
          select x;

This should compile just fine. But if we write:

var res = from x in new Table<string>()
          where x.Length > 5
          orderby x.Length descending
          select x;

we'll see this compile-timer error:

image

which clearly points at the problem by using the words "query pattern".

 

Conclusion

When do you switch to this approach instead of using IQueryable<T>? As usual, the answer is: it depends. I'd say, if you just implement Where and Select and maybe some others like Take and First, it might not be worth to go all the way with IQueryable<T>. When ordering comes into play, things get a little more tricky with OrderBy and ThenBy calls (with their descending variants). Also, if you want to support multiple calls to query operators, IQueryable<T> might be beneficial, think of cases where users write "client-side views":

var expensive = from x in new Table<Product>()
                where x.UnitPrice > 123
                select x;

var res = from x in expensive
          where x.Name.StartsWith("C")
          select x;

In fact, for LINQ to AD, I still have the pending work item on my list to switch over to a custom implementation of the query pattern rather than using IQueryable<T>, but since it started as a blog series entitled "The IQueryable Tales", I have a good historical excuse for its current structure. LINQ to SharePoint on the other hand supports quite a bit of query operators (Where, Select, OrderBy*, ThenBy*, GroupBy, Take, First, TakeWhile) which makes it a better candidate for an IQueryable<T>-based implementation.

Happy querying!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

It's been a while since I continued my series on a functional pattern matcher in C#. I finally found some time to extract the simplified pattern matching code from the bigger project I'm working on and cook up a downloadable documented sample. Without further delay: here it is.

image

It contains most of the matching techniques outlined in this blog series:

  • NewExpression (with metadata on constructor arguments)
  • MemberInitExpression
  • ConstantExpression (the degenerated case of a classic switch)
  • ParameterExpression (the degenerated case of a type switch)
  • ListInitExpression

Obviously there are a bunch of restrictions, namely:

  • No support for nested constructs, e.g. a match on a list of persons with parameters in the Person object expressions
  • No caching of results (cf. the Fibonacci sample)
  • Optimizations

The code comes with a set of samples showing the supported matches in practice. There's more fun stuff to come but timing is still undecided, so watch out :-). Enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

I love my blog readers. Of course purely in a professional sense but still. Before I continue, let me point out to my much beloved audience on the other side of the RSS channel that this post isn't particularly interesting. So, if you're looking for hardcore technical stuff, I'd strongly recommend to ignore my post for one time.

So what brings me to write down this love declaration all of a sudden? A flashback: in late 2003 I started to blog on blogs.bartdesmet.net. I already had my domain name but no particularly interesting homepage (and the color scheme was a highly controversial topic of discussion amongst some of my friends). In fact, I've never been a big fan of homepages with photos of summer barbecues, tales about pets and other personal trivia. So I wanted to take a different approach and started to maintain a technical blog. Having been involved in some local projects that required a web server I was so kind to maintain (first Windows 2000 Server, later Windows Server 2003), it was a luxury for me to take some of the machine's web space and point my domain to it. At that time, we weren't really bandwidth constrained, being housed on a fiber network in a semi-professional housing environment.

But luxury is a volatile phenomenon. Time passed by, the project moved to other housing environments and I was no longer involved in the web server's maintenance (which was honestly a bit of a relief too). However, my blog was in the position of becoming homeless which wasn't a pretty thought given the number of links to it and of course all the time I invested to write up my sometimes interesting posts. So I was looking out for some affordable hosting company - which will remain unnamed - providing the latest and greatest platform (Windows Server 2003 and SQL Server 2005 at the time) with enough space to hold my contents. At the time I had about 250 MB of content + a 100 MB SQL Server database, so 1 GB of web space and 250 MB of SQL seemed to be more than enough. Today the size occupied has almost doubled as illustrated below:

Web space SQL Server space
image image

The thing I paid the least attention to though was the bandwidth provided by the hosting company, at that time I believe 60 GB per month. Who needs 60 GB right? I was a little surprised initially to find out there was - if I remember correctly - about 30 GB being transferred the first month at my new hoster. History only goes back to November 07, where it climbed to approximately 40 GB a month:

image

I was still fine and the bandwidth limits were even increased to a stunning 80 GB. However, ever since I started using Windows Live Writer I've been adding more and more screenshots to my posts (it so easy to upload dude), most of these at a large size to avoid the annoying "click to enlarge" catch-phrase every two lines (my policy is to fall back to the latter thumbnail approach whenever an image isn't part of the regular flow of the post but acts purely as an illustration which is "nice to see" but not strictly needed). So, after a few popular blogging series lately (C# 3.0, VB 9.0 and PowerShell 2.0 feature focuses and my recent ramblings on functional pattern matching in C# - to be continue) I was still a bit surprised to see my stats for this month so far:

image

This is why I love my readers: they make electrons travel the globe to deliver my writings to their brains. I never thought to become bandwidth constrained, but 63 GB in little over one half of the month is a bit too much given the limit of 80 GB a month. So this called for immediate action and luckily I have some additional space at bartdesmet.info which I bought recently to play around with IIS 7.0 (I said my hoster is on the cutting bleeding edge of technologies!) and to prepare moving my blog over to IIS 7.0 eventually.

The plan is simple: move over images to the second host, leave app stuff on the first host. One of the first things that brought up nice experiences was the re-encounter with the amazingly fast (not!) FTP protocol when dealing with small files: copying about 100 MB of images from one place to the other took several hours. One thing remaining is to update all pointers to images for which I considered different approaches:

  • Handle all .jpg and .png files using an HTTP Handler and redirect to the new location - would work in IIS 7.0 but my old host is still on IIS 6.0 so the metabase would require a change which involves the ISP's goodwill.
  • Tweak the SQL Server database to "replace" (although the REPLACE T-SQL function wouldn't work here - storage for post bodies in CommunityServer seem to be ntext fields) old links with new ones.
  • Write a Community Server module.

I went for the last approach, at least temporarily, since it doesn't involve touching the database and should take effect immediately. It goes roughly like this:

using System;
using System.Text;
using System.Xml;
using CommunityServer.Blogs.Components;
using CommunityServer.Components;

namespace BartDeSmet.Net
{
    public class CSRewriteImageLinks : ICSModule
    {
        public void Init(CSApplication csa, XmlNode node)
        {
            csa.PreRenderPost += new CSPostEventHandler(csa_PreRenderPost);
        }

        void csa_PreRenderPost(IContent content, CSPostEventArgs e)
        {
            if (e.ApplicationType == ApplicationType.Weblog && e.Target == PostTarget.Web)
            {
                WeblogPost post = content as WeblogPost;
                if (post != null && post.PostLevel == 1)
                {
                    StringBuilder sb = new StringBuilder();
                    sb.Append(post.FormattedBody);
                    sb.Replace("http://bartdesmet.info/images", "http://bartdesmet.info/images");
                    sb.Replace("http://bartdesmet.info/images_wlw", "http://bartdesmet.info/images_wlw");
                    post.FormattedBody = sb.ToString();
                }
            }
        }
    }
}

Quite brute force - though still a bit gentle using the StringBuilder (and web infrastructure caching I'd assume) - but functional. Fingers crossed... Oh and obviously I've changed my Live Writer settings to upload files to my second host (click to enlarge <g>):

image image

Alright, with this post I violated my own policy not to nag about personal trivial but one sin should be acceptable, no? At the very least I did publish some lines of C# code...

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

After my previous little European tour visiting Ghent and Lisboa talking about LINQ, Parallel FX Extensions, Windows PowerShell 2.0 and WPF, I'm looking forward to meet the European audience again at DevDays 2008 Amsterdam. I'm especially thrilled about this one since it's my first speaking opportunity at this conference though I've been living in a radius of a few 100 kilometers for the first 24,5 years of my life. The speaker list looks impressive, just a few big names: Rafal Lukaweicki, Daniel Moth and Mike Taulty, Ingo Rammer, Dan Fernandez, David Platt, the U2U triplet Peter, Patrick and Jan. This time I'll be delivering purely on LINQ, especially (agenda subject to change):

  • LINQ Inside Out - Thursday 10:50-12:00 - Want to know what really happens when you execute LINQ queries? Join us as we enter the inner workings of LINQ and see how the compiler translates LINQ query expressions into standard query operators, while digging into iterators that make LINQ to Objects tick. Learn exactly when query evaluation is deferred and how lambda expressions and closures work together to enable LINQ’s elegant syntax. After investigating LINQ to Objects, we’ll switch gears to dive into LINQ to SQL, which results in radically different translations as we dig into the details of IQueryable and expression trees. Finally, we explore language-specific LINQ features, including XML literals in VB 9.0.
  • Creating Custom LINQ Providers - Friday 13:30-14:40 - LINQ is all about unifying data access in a natural language integrated way. But there’s more than just LINQ to Objects, LINQ to SQL and LINQ to XML. In this session, we put ourselves on the other side of the curtain and explore the wonderful world of LINQ providers. You’ll learn how to create a fully functional LINQ query providers allowing users to target your favorite query language using familiar LINQ syntax in C# 3.0 and VB 9.0: LINQ to AD, LINQ to SharePoint, LINQ to Outlook, you name it! This is your chance to get to know the inner workings of LINQ.

Basically the first session acts as the entry-point to LINQ at the conference. We'll be approaching the technology from top to bottom, starting with the LINQ syntax in C# 3.0 and VB 9.0, moving on to the translation patterns carried out by the compiler, to end up in the inner workings of the beast: iterators in LINQ to Objects and expression trees for implementations like LINQ to SQL. We'll go fairly deep, so ideally attendees have had some prior exposure to the concepts of LINQ.

My second session acts as the ultimate dive deep into LINQ Providers. More specifically, you'll see how expression trees are turned into execution by runtime-translation into an underlying query language. As running samples, we'll use LINQ to AD and LINQ to SharePoint, covering the art of developing query providers. This includes not only coverage of expression tree parsing but also practical guidance on translation patterns, tools integration, etc. Even if you're not planning to write a custom LINQ provider yourself, this session is the ideal "under the covers" session for LINQ.

 

from session in DevDays.Sessions
where session.Subject == "LINQ"
select new { session.Title, session.Speaker }

  • LINQ Inside Out - Bart De Smet
  • Understanding the ADO.NET Entity Framework - Mike Taulty
  • A Tour of LINQ to XML - Mike Taulty
  • Is LINQ to SQL your data access layer? - Anko Duizer
  • Creating Custom LINQ Providers - Bart De Smet

For more information on the agenda, take a look at the session list over here. There's tons of great content on various topics: hardcore debugging, Silverlight, Visual Studio 2008, MOSS and WSS, WPF/WCF/WF, smart client and mobile development, DLR stuff, data mining and cryptography, parallel extensions, VSTS, SQL 2008, etc. Too much to pick from :-).

 

Hope to see you in Amsterdam!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

In the last handful of posts in this series we've been looking at ways to match a whole set of patterns, including:

There's not that much left to apply (meaningful) matches for (feel free to think of others of course) so from this post on, we'll change the theme and start to focus a little more on translation of pattern matches into various executable forms. So far, we've been considering interpretation and a lightweight form of compilation by invoking the lambda expression compiler of .NET 3.5. We already saw in part 3 that the overall performance of a compiled pattern match is very reasonable given the fact we're faced with more work at runtime but once parsed and compiled, execution goes amazingly fast. So, we won't focus much on performance tuning again for now, but rather analyze the translation mechanism employed.

 

Where we're at right now

Our current form of compilation basically consists of this:

  • The Pattern<T> object aggregates pairs of match clause expression trees and match body delegates into MatchEntry objects.
  • Compilation is carried out on individual MatchEntry objects.
  • Each MatchEntry, when compiled, is turned into an efficient matcher and invoker:

    internal class MatchEntry
    {
        public LambdaExpression Match { get; set; }
        public Delegate Action { get; set; }

        public Func<object, bool> CompiledMatch { get; set; }
        public Func<object, T> CompiledInvoker { get; set; }

        internal void Compile()
        {
            ...
        }
    }

Although this works fine, we still do have some plumbing around that's looking a bit ugly. In order to match an object, we cycle through the list of MatchEntry objects and evaluate the compiled matcher. If it evaluates true, we hand off the object to the compiled invoker and return the resulting object with type T.

 

On to a more functional matcher

The interesting thing in the previous observation is a single letter: 'T'. Indeed, the pattern matcher returns an object of type T representing the result. In other words, it's a (fairly) complex function, or stated otherwise a valued expression. From this point on, it really matters what camp you're in:

  • Pattern matches as switch statements - or -
  • pattern matches as switch expressions

The latter obviously doesn't exist in C# (and many other languages of course) since switches are statements. In other words, they are used for their interesting side-effects (which might or might not set one piece of state, yielding some kind of hidden function). Switching to an expression type of switch would conceptually look like this:

int? age = switch (name)
{
    case "Bart":
        25;
        break;
    case "John":
        52;
        break;
    default:
        null;
        break;
}

It looks weird (hence the qualifier "conceptually") but here's another hypothetical form that looks more functional:

int? age = switch (name)
{
    case "Bart" => 25;
    case "John" => 52;
    default => null;
}

A similar approach could be taken for an if statement turning it into an expression:

int? age = if (name == "Bart")
    25;
else if (name == "John")
    52;
else
    null;

but if you look carefully we already have this kind of thing with the ternary operator:

int? age = (name == "Bart" ? 25 : (name == "John" ? 52 : null))

which can be nested arbitrarily to form a nice expression. Taking it back to the switch, you could see it as:

int? age =
(
    name == "Bart" ? 25 : (
    name == "John" ? 52 :
    null)
)

By the way, to make this really work in C#, you'll have to give the type inference machinery a hand like this (cf. CS0173):

int? age =
(
    name == "Bart" ? 25 :
    name == "John" ? 52 :
    (int?) null
)

(nullables strike again). More important though is the fact we can omit parentheses in the above because of the ternary operators right associativity. Now, let's generalize a bit:

T result =
(
    condition-expr-1 ? result-expr-1 :
    condition-expr-2 ? result-expr-2 :
    ...
    condition-expr-n ? result-expr-n :
    else-expr
)

Wait a minute, don't these conditions represent match clauses? Precisely. The only thing we need to change is the fact the match body needs to become an expression rather than a statement. So ultimately our match entry should comprise the following two:

    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

The remaining question is how we should chain those pieces together.

 

ConditionalExpression

We've already picked our target (ternary operators), but we need to know what it looks like under the covers. The simplest way to find out about it is the following C# one-liner:

Expression<Func<bool,int>> cond = b => b ? 1 : 0;

By now you should already be prepared to see (and filter out mentally) a ParameterExpression, two ConstantExpressions and one LambdaExpression for the whole thing. The body of the lambda is what interests us and geeks as we are, let's rely on ILDASM once more:

IL_0000:  nop
IL_0001:  ldtoken    [mscorlib]System.Boolean
IL_0006:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_000b:  ldstr      "b"
IL_0010:  call       class [System.Core]System.Linq.Expressions.ParameterExpression [System.Core]System.Linq.Expressions.Expression::Parameter(class [mscorlib]System.Type,
                                                                                                                                               string)
IL_0015:  stloc.1
IL_0016:  ldloc.1
IL_0017:  ldc.i4.1
IL_0018:  box        [mscorlib]System.Int32
IL_001d:  ldtoken    [mscorlib]System.Int32
IL_0022:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_0027:  call       class [System.Core]System.Linq.Expressions.ConstantExpression [System.Core]System.Linq.Expressions.Expression::Constant(object,
                                                                                                                                             class [mscorlib]System.Type)
IL_002c:  ldc.i4.0
IL_002d:  box        [mscorlib]System.Int32
IL_0032:  ldtoken    [mscorlib]System.Int32
IL_0037:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_003c:  call       class [System.Core]System.Linq.Expressions.ConstantExpression [System.Core]System.Linq.Expressions.Expression::Constant(object,
                                                                                                                                             class [mscorlib]System.Type)
IL_0041:  call       class [System.Core]System.Linq.Expressions.ConditionalExpression [System.Core]System.Linq.Expressions.Expression::Condition(class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                 class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                 class [System.Core]System.Linq.Expressions.Expression)

IL_0046:  ldc.i4.1
IL_0047:  newarr     [System.Core]System.Linq.Expressions.ParameterExpression
IL_004c:  stloc.2
IL_004d:  ldloc.2
IL_004e:  ldc.i4.0
IL_004f:  ldloc.1
IL_0050:  stelem.ref
IL_0051:  ldloc.2
IL_0052:  call       class [System.Core]System.Linq.Expressions.Expression`1<!!0> [System.Core]System.Linq.Expressions.Expression::Lambda<class [System.Core]System.Func`2<bool,int32>>(class [System.Core]System.Linq.Expressions.Expression,
                                                                                                                                                                                        class [System.Core]System.Linq.Expressions.ParameterExpression[])
IL_0057:  stloc.0
IL_0058:  ret

The mechanism is surprisingly simple again thanks to the composable nature of the System.Linq.Expressions namespace types. Here's the C# equivalent of the above:

var b = Expression.Parameter(typeof(bool), "b");
var cond = Expression.Lambda(
    Expression.Conditional(
        b,
        Expression.Constant(1),
        Expression.Constant(0)
    ),
    b
);

The remaining thing for us to do is to map the entire matcher onto on big nested Expression.Conditional wrapped in a lambda. What does the lambda need to take in? Obviously the object being matched, so ultimately we'll be able to take the entire expression for the pattern matcher, compile it using the Compile method, and have a single delegate of type Func<object, T> that does the whole matching for us using efficient IL.

 

Revamping Pattern<T>

Our previous incarnation of Pattern<T> was rather quick and dirty in a sense that it didn't really enforce the fluent pattern at compile time. What we really want is a chain of Match calls that render a useless partial pattern which gets completed by adding an Else clause. Once that's done, the pattern can be compiled (for which different strategies can be considered, either we do it straight away - as shown below - or we require the user to trigger the compilation explicitly, e.g. at the first application of the pattern match on an object or by a separate Compile method call). So we'll start redefining our Pattern<T> class implementing the fluent interface pattern:

class Pattern<T>
{
    private List<MatchEntry> _entries = new List<MatchEntry>();

    public Pattern<T> Match<TR>(Expression<Func<TR>> clause, Expression<Func<T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, TR>(Expression<Func<T1, TR>> clause, Expression<Func<T1, T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, T2, TR>(Expression<Func<T1, T2, TR>> clause, Expression<Func<T1, T2, T>> body)
    {
        return MatchInternal(clause, body);
    }

    public Pattern<T> Match<T1, T2, T3, TR>(Expression<Func<T1, T2, T3, TR>> clause, Expression<Func<T1, T2, T3, T>> body)
    {
        return MatchInternal(clause, body);
    }

    private Pattern<T> MatchInternal(LambdaExpression clause, LambdaExpression body)
    {
        _entries.Add(new MatchEntry() { Clause = clause, Body = body });
        return this;
    }

    public ClosedPattern<T> Else(Expression<Func<T>> @else)
    {
        return new ClosedPattern<T>(_entries, @else);
    }
}

The MatchEntry class now looks like this:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }
}

Notice how the Else method returns a ClosedPattern which represents a pattern that has all the information required to start applying the pattern to objects:

class ClosedPattern<T>
{
    private List<MatchEntry> _entries;
    private LambdaExpression _else; 

    private Func<object, T> _pattern;

    internal ClosedPattern(List<MatchEntry> entries, LambdaExpression @else)
    {
        _entries = entries;
        _else = @else;

        Compile();
    }

    private void Compile()
    {
        //
        // Compilation goes here.
        //

    }

    public T Execute(object o)
    {
        return _pattern(o);
    }
}

The most notable differences with the previous implementation are indicated in bold and outline the fact we're now fully expression-based.

 

Compilation of ClosedPattern<T>

At the core of our new pattern matcher implementation is the compilation of the closed pattern. Basically we have information representing something like:

T result =
(
    condition-expr-1 ? result-expr-1 :
    condition-expr-2 ? result-expr-2 :
    ...
    condition-expr-n ? result-expr-n :
    else-expr
)

where all the condition and result variables contain a reference to the object passed in. The way we match and map results is irrelevant for now, let's just assume we have a Func<object, bool> for each match clause and a Func<object, T> for each match body, which all live in peace inside the list of MatchEntry objects. In addition we have the lambda expression for the else clause. The way to thread things up is by starting at the bottom (in pseudo-code):

Expression pattern = else;

=>

pattern <- Expression.Condition(condition-expr-n, result-expr-n, pattern)

=>

...

=>

pattern <- Expression.Condition(condition-expr-1, result-expr-1, (..., Expression.Condition(condition-expr-n, result-expr-n, pattern)...))

Assuming we have this rewritten MatchEntry class:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

    public Expression Match { get; private set; }
    public Expression Map { get; private set; }

    public void Compile(ParameterExpression input)
    {
        //
        // Compilation goes here.
        //

    }
}

we can now write our compilation for the pattern like this:

private void Compile()
{
    ParameterExpression input = Expression.Parameter(typeof(object), "o");

    Expression pattern = _else.Body;

    foreach (var entry in Enumerable.Reverse(_entries))
    {
        entry.Compile(input);

        pattern = Expression.Condition(
            entry.Match,
            entry.Map,
            pattern
        );
    }

    _pattern = Expression.Lambda<Func<object,T>>(pattern, input).Compile();
}

This precisely captures the algorithm outlined above, building up the nested conditional expressions bottom-up. Ultimately, one pattern gets translated into on single delegate.

 

Compilation of MatchEntry objects

The remaining pieces of the puzzle are the Match and Map properties on MatchEntry:

internal class MatchEntry
{
    public LambdaExpression Clause { get; set; }
    public LambdaExpression Body { get; set; }

    public Expression Match { get; private set; }
    public Expression Map { get; private set; }

    public void Compile(ParameterExpression input)
    {
        //
        // Compilation goes here.
        //

    }
}

The Compile method needs to take Clause and Body and translate those into Match and Map respectively. The types represented by the expression objects will make things clear: Match is an expression producing a Boolean determining whether or not the input object (represented by ParameterExpression input) matches the entry; Map on the other hand takes a (matched) object and turns it into the output type T.

What's even better is we can reuse most of the original code of the unification engine. Below you can see the changes needed to our current Compile method in order to generate the new match expression (omitted a bunch of calls for specific expression types besides NewExpression):

internal void Compile(ParameterExpression input)
{
    //
    // Lambda input parameter.
    //
    ParameterExpression o = Expression.Parameter(typeof(object), "o");

    //
    // Created target type.
    //
    Type target = ClauseMatch.Body.Type;

    //
    // List of bindings.
    //

    var bindings = new Dictionary<ParameterExpression, Expression>();

    NewExpression ne;
    ...

    Expression match;
    if ((ne = ClauseMatch.Body as NewExpression) != null)
        match = CompileNewExpression(oinput, target, ne, bindings);
    ...
    else
        throw new
NotSupportedException("Unsupported expression type in match clause.");

    //
    // Store Compile match predicate.
    //
    CompiledMatch = Expression.Lambda<Func<object, bool>>(match, o).Compile();

The second part took care of the mapping part to execute the body. Again, we eliminate compilation on a per-MatchEntry basis and simply keep the expression tree around. It's just a little trickier because originally we did have a delegate called Action representing the match body. However, not this is represented as an expression tree too which represents the lambda expression that takes in the extracted values and produces the result of type T. Since the Body is simply a LambdaExpression now, we can feed it into Expression.Invoke straight away:

    //
    // Conversion from bound object to the target type.
    //

    Expression me = Expression.Convert(o input,  target);

    //
    // Create list of unbound parameters.
    //

    Expression[] args = new Expression[ClauseMatch.Parameters.Count];
    int j = 0;
    foreach (ParameterExpression param in ClauseMatch.Parameters)
    {
        //
        // Parameters need to be bound in the match expression.
        //

        Expression map;
        if (!bindings.TryGetValue(param, out map))
            throw new InvalidOperationException("Parameter " + param.Name + " was not bound in the pattern match.");

        //
        // Null for identity checks; grab from identified property otherwise.
        //

        args[j++] = map ?? me;
    }

    //
    // Store Compile invoker function.
    //

    Expression invoker Map = Expression.Invoke(BodyExpression.Constant(Action), args);
    CompiledInvoker = Expression.Lambda<Func<object, T>>(invoker, o).Compile();
}

Essentially we've been collecting trees in the MatchEntry's Compile method invocations and put them on a nice row inside the ClosedPattern's Compile method in order to create a pattern matching avenue.

 

Testing it

Let's take this sample again:

var bisect = new Pattern<bool>()
    .Match((int x) => new Point() { X = x, Y = x }, x => true)
    .Else(() => false);

Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    Point p = new Point(rand.Next(1, 5), rand.Next(1, 5));
    if (bisect.Execute(p))
        Console.WriteLine(p);
}

Exercise: what's the type of bisect in the code above?

With the help of one breakpoint and the Immediate Window we can figure out what this translates into:

image

VB'ers will recognize IIF as the equivalent for the ternary operator but in a prefix form (actually, this is a lie in terms of VB - IIF is not a truly ternary operator, it's a function - VB 9.0 has a new If operator that's a real ternary operator). We just see one single ternary here, but you can see the whole thing working. First the condition:

((o Is Point) && Equals(Convert(Convert(o).X), Convert(Convert(o).Y)))

is

o is Point && ((Point)o).X == ((Point)o).Y

which represents the condition imposed by

(int x) => new Point() { X = x, Y = x }

The true part of the ConditionalExpression is:

Invoke(x => True,Convert(o).X)

which stands for

(x => true)(((Point)o).X)

where the first part represents a lambda and the second part calls it with argument "point's X value".

And finally the else got translated into False, which clearly stands for false.

Exercise: Map the following ToString output of the internal pattern expression tree representation onto the corresponding high-level pattern match (trivial one): "IIF((Convert(o) = 1), Invoke(() => False), IIF((Convert(o) = 2), Invoke(() => True), IIF((Convert(o) = 3), Invoke(() => False), IIF((Convert(o) = 4), Invoke(() => True), False))))"

You can imagine what mixed pattern matches will start to look like, don't you?

 

1 pattern match => 1 delegate

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

In our last encounter on the pattern matching mission we covered techniques to match T[] and List<T>. Today we cover another type that's being use a lot: dictionaries (the generic brothers of Hashtable which you could match too, exercise). I've already shown an example of such a match:

.Match((int bartAge, int johnAge) => new Dictionary<string, int>() {
    { "Bart", bartAge },
    { "John", johnAge }
}, (bartAge, johnAge) => { ... })

In our discussion we noticed it would be best not to constrain the number of entries in the dictionary based on the number of entries in the match clause. This is fairly similar to matching objects through object initializers where we just match the specified properties (comparable to keys) to have certain properties (comparable to values). If the reader feels the matching should be more restrictive, she should feel free to implement so. In addition to this, we do mandate the specified keys to be present since otherwise we don't know what value to extract and additionally, the matches can only be put on the keys while the extractions take place on the values (keys are unique, values aren't necessarily). Putting everything together, this is the skeleton of the sample's translation:

IDictionary<string, int> dictionary = o as IDictionary<string, int>;
int bartAge, johnAge;
if (dictionary != null && dictionary.TryGetValue("Bart", out bartAge) && dictionary.TryGetValue("John", out johnAge))
{
   ...
}

I'm using IDictionary over here to boost flexibility (I could have done the same in the previous post with IList<T> instead of List<T>) but that's just a minor detail. However, the above can't be translated one-on-one in our pattern matcher: the out keywords are problematic in the context of expression trees. Instead we'll stick with the more redundant equivalent:

IDictionary<string, int> dictionary = o as IDictionary<string, int>;
if (dictionary != null && dictionary.ContainsKey("Bart") && dictionary.ContainsKey("John"))
{
   int bartAge = dictionary["Bart"];
   int johnAge = dictionary["John"];
   ...
}

From a language point of view, initializing a Dictionary using collection initializers is not at all different from doing the same on say a List or an array; the core difference is the fact an Add method with two operandi is called. In other words, we're facing another of those ListInitExpressions, so we can fork the code introduced last time:

private Expression CompileListInitExpression(ParameterExpression o, Type target, ListInitExpression lie, Dictionary<ParameterExpression, Expression> bindings)
{
    if (lie.NewExpression.Arguments.Count != 0)
        throw new NotSupportedException("Collection initializers in match expressions should use the default constructor.");

    Type t = lie.Type.GetGenericTypeDefinition();
    if (t == typeof(List<>))
        return CompileListInitExpressionForListOfT(o, target, lie, bindings);
    else if (t == typeof(Dictionary<,>))
        return CompileListInitExpressionForDictionaryKonV(o, target, lie, bindings);
    else
        throw new NotSupportedException("Only List<T> and Dictionary<K,V> are supported.");
}

Don't bother about my rather verbose method naming, the inside of the method is much more appealing. The pattern is basically the same as before:

  1. Emit a type check
  2. Unify and extract all list entries
  3. Coalesce bindings to the same parameters and aggregate those with the check

Let's go there step by step. First the type check (trivial):

private Expression CompileListInitExpressionForDictionaryKonV(ParameterExpression o, Type target, ListInitExpression lie, Dictionary<ParameterExpression, Expression> bindings)
{
    Expression check = Expression.TypeIs(
        o,
        target
    );

Followed by the iteration over all initializers, which is where things get interesting. The Add method in Dictionary<K,V> takes two parameters: one of type K and one of type V. The compiler will have done all the type checking already to make sure types are compatible and such, and let's assume this overload of Add is the only one in reach. Here's the first part of the loop:

    int i = 0;
    foreach (var e in lie.Initializers)
    {
        //
        // Note: We know we're processing Dictionary<K,V>, which only has one Add overload,
        //       so we omit additional checks for now.
        //
        Expression keyArg = e.Arguments[0];
        Expression valArg = e.Arguments[1];

        ConstantExpression key = keyArg as ConstantExpression;
        if (key == null)
            throw new NotSupportedException("Keys for dictionaries used in pattern matches should be constant expressions.");