Saturday evening, 10:30 PM, Seattle time. Window 7 RC's audio stack is pumping music from the Internet Explorer process to the laptop speakers, the process manager boosts Word 2007 with foreground priority. Bart is writing something yet unannounced. Live Messenger (not to be renamed Bing Messenger) is running too. 10 + 9 = 7, at least in clock arithmetic, a quick calculation revealing the hour component of the time at "home", in Belgium. Little Facebook activity going on, not expecting too many log-on notifications on Messenger either. Affordable to swap in lots of context and concentrate on the writing job at hand. Boosting the IRQ level too. Life is good.

10:35 PM, Seattle time. A log-on notification message disturbs the graphical memory in the bottom right corner of the screen, after being fixed with some of Word's chrome for zooming and scrolling. The MSNP protocol is doing its work faithfully. The audio stack nicely overlaps the music with the log-on sound. An interrupt takes place. Old friend in Belgium, awake at 7:35 AM in a Sunday morning? Strange. A conversation is initiated, and a few watts in the data center are spent connecting two old friends together through the electronic alleyways of a gigantic TCP/IP world-scale network. An ethernet package with payload "too early - go to bed :)", with the last two-length character sequence nicely turned into a yellow happy face, leaves the NIC's port. An answer "emergency" followed by the brother's of the aforementioned yellow face travels back. Rang out of bed for some server going down it seems. None of my business. For a while at least...

Makes me wonder why I always have to inform for the software being used. Curious to hear about our stuff put to practice I guess. Admitted, I felt quite happy to hear Windows Server was involved. Not the latest release, but hey: better than nothing. Quite happy, but a bit unconfortable too: what piece of software was to blame, if any? Curiosity, that was I think it is. Or maybe my subconsience was in search for an excuse to deflect further talks about technology. Don't want to spoil a sunny morning in Ghent, Belgium - sunny, according to an orphaned sidebar gadget, floating freely on the Windows 7 desktop now. Let's not go deeper into an analysis of the subconsious me, I'm not a qualified psychologist, nor will I become one, and qualified ones haven't had any luck to reverse engineer my dark cells so far. Anyway, we suspend the conversation for a while. Problem being investigated on the other side; using the window switcher to get back to Word 2007. Smalltalk, in any form, I like it.

10:41 PM, Seattle time. A new IM message arrives. Error messages are being pasted. Should I regret having initiated a conversation to begin with? Maybe. Could turn out to be a fun exercise of psychic remote debugging. It takes 5 minutes to make Messenger turn to the Away state. Can't fake it too obviously. Should I respond? Let's glimpse over the message a second time:

An IO error occurred during FTP transport, No buffer space available (maximum connections reached?): JVM_Bind, No buffer space available (maximum connections reached?)

Should I make a silly joke about certain three-letter acronyms in the message? Nah, let's be fair. Wouldn't help the other side, and my technological bias is already known. Obviously I don't like the FTP protocol, or what did you think I was referring to? Actually I like the error message with built-in redundancy. They seem to be quite sure about the cause. But what does some error reporting component like to tell us? Clearly "No buffer space available (maximum connections reached?)".

Bing search did reveal a few useful hits, but some further diagnostics would be desirable. Time for me to reveal there's a bit of network admin in me. Just a tiny bit, but enough for the job at hand. Hints indicate it might reveal a WSAENOBUFS error code. Good old WinSock. Netstat might tell us something. Let's ask for a netstat -n -b -a output. Don't want to look like a dicky-doo-dah, let's try myself first. A hint to our DOS background "C:\_" catches my eye on the Windows 7 superbar. A few ALT-TAB presses will bring me there. Aarg, console out is fed with a string buffer containing the words "The requested operation requires elevation.". I'm in the post-Vista era. Oh well, I trust my main memory, press <ENTER> in the Live Messenger window.

A text file travels back, capturing output of the requested command:

Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:21             0.0.0.0:0              LISTENING       13668
  [inetinfo.exe]

  TCP    0.0.0.0:22             0.0.0.0:0              LISTENING       13668
  [inetinfo.exe]

  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       13068
  W3SVC
  [svchost.exe]

  ...

  TCP    172.26.28.30:1026      172.26.27.22:21        CLOSE_WAIT      15384
  [dllhost.exe]

  TCP    172.26.28.30:1027      172.26.28.30:21        CLOSE_WAIT      15384
  [dllhost.exe]

  TCP    172.26.28.30:1028      172.26.28.30:21        CLOSE_WAIT      15384
  [dllhost.exe]

  ...

  TCP    172.26.28.30:4998      172.26.28.30:21        CLOSE_WAIT      15384
  [dllhost.exe]

  TCP    172.26.28.30:4999      172.26.28.30:21        CLOSE_WAIT      15384
  [dllhost.exe]

  TCP    172.26.28.30:5000      172.26.27.22:21        CLOSE_WAIT      15384
  [dllhost.exe]

  TCP    127.0.0.1:383          127.0.0.1:4696         TIME_WAIT       0
  TCP    127.0.0.1:4281         127.0.0.1:2485         TIME_WAIT       0
  TCP    127.0.0.1:4696         127.0.0.1:2485         TIME_WAIT       0
  TCP    172.26.28.30:4281      172.26.28.31:445       TIME_WAIT       0
  UDP    0.0.0.0:135            *:*                                    1936
  RpcSs
  [svchost.exe]

  ... 

Nice to see some good old friends like IIS. But back to business. A Dictionary<string,int> in my main memory is consulted for a lookup on key "FTP", providing a prompt response of 21. What about looking for :21 in the file? Hmm, quite a bit of hits. Port 1026, CLOSE_WAIT. 1027 same deal, not any different for 1028. A scroll bar decorates the right border of my favorite editor, Notepad. WM_VSCROLL messages are pumped, SB_THUMBTRACK joins the party. 4998 involved in an intimate though partly terminated relationship with 21 too. Same for 4999 and 5000. 21 is no further engaged. Strange, a round number. Port 21 is not monogamic for sure. But 3975 desparate ports jumping on it, can it cope with that? Something's going on here.

80 centimeters behind me lies the answer in dead tree format. I reach out to my bookshelve. A blue book, purchased 08/03/03 in Ghent according to a sticker on the back, is removed from its spot where it's been sitting since its overseas transportation almost two years ago. Not too dusty though. The title is still clearly readable: Windows Server 2003 TCP/IP Protocols and Services. Yes, I admit. I even posess books on IIS 6.0, Exchange 2003, Active Directory Services and PKI in Windows Server 2003. Not just for decoration, I've read significant portions of it in a distant past. Oh, those student times with plenty of time. Dreaming back of times with my chat buddy, sitting next to one another in courses. Pretending to pay attention. Sometimes we actually did. A power state transition request for S1 is made in my head, but not granted. I can't fall asleep now. The Book Browser service is started. Damn this book looks boring. Payload diagrams, sequence diagrams, tables. Some yellow marker reveals I've actually gone through it, ever, but the book Browser Service keeps triggering cache misses.

What am I looking for anyway? Oh yes, huge number of connections in CLOSE_WAIT state. Luckily, the Book Browser service declared a dependency on the Indexer service. It's consulted to load pages in the upper range of the dead tree volume, triggering a search algorithm for LCID 1033, and responds to the "CLOSE_WAIT" query with an offset 329. The Book Browser service takes over again and efficiently locates and maps in the page. A table explains: "A FIN-ACK has been received and a FIN-ACK has been sent.". Useful to know. Is there more? Figure 13-7 reveals TCP connection states. I remember it was quite involved but the number of arrows makes me dizzy. A bit of Pineapple Orange Strawberry juice will help to rejuice my power supplies. The linear page scanning is continued, and pauses at "Controlling TCP Connection Terminations in the Windows Server 2003 Family and Windows XP". Blah, blah. TcpTimedWaitDelay. What's that? Default value of 240, the number of seconds that a TCP connection remains in the TIME WAIT state. No, we're in CLOSE_WAIT still. Lookup continues. Hints appear for a MaxUserPort setting: "MaxUserPort specifies the maximum port number .... The default value of MaxUserPort is 5000." Bingo. We're on to something.

So, what happened? Some uploader for a content management system, hosted by dllhost.exe, is overly ambitious to upload files in parallel and exhausts the number of ports available to applications. All of this in a short period of time, not enough for ports to be reclaimed. MaxUserPort is the cure. At least just for the symptoms. Clearly, there's a bigger problem here. Why is the CMS so greedy with sockets? I start the discussion. Being an emergency, my chat buddy decides to boost the setting, and requests a reboot by the data center folks. They'll look at the underlying problem in the near future. A case of premature optimization I feel. 4000 or so parallel connections. Wonder how that scales. Wasn't FTP stateful? Reuse, pooling, whatever. Local network communication it follows from context, why no SMB file copy operations? Lots of questions.

Problem solved for now, and we chatted a bit more about the weather. Smalltalk, you know?

PS: Why does Bart have a book on network protocols? There's been a time I implemented network protocols in managed code, for fun. I love the specifications, their relatively high level of precision, and the great discussion topic they form for lunch conversations.

ACK

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Weekends are for hacking up crazy ideas (and so are evenings). This weekend was no different; amongst other things I got around to implement an Unlambda interpreter in C#. A what? Let’s talk about that first.

Warning: Reading this post can cause permanent brain damage. It definitely helps to have some understanding of lambda calculus, SKI combinators, continuations, and more.

 

What’s Unlambda?

Unlambda is an esoteric programming language designed by David Madore. It’s based on functional programming ideas, very minimalistic (in terms of the number of operations) and very obscure (because of its minimalistic nature and the syntax). What’s it good for? Fun, nothing more. Or maybe some understanding of combinators. Let me cite the Unlambda website to summarize the above:

This means that the language was deliberately built to make programming painful and difficult (i.e. fun and challenging).

I guess I should start believing what some colleagues attribute to me: I like pain. A while back I added the creation of an Unlambda interpreter to my list of planned Crazy Sundays projects. (Unfortunately the list always grows and never shrinks…)

So, what’s in a name? Unlambda is based on functional programming, which finds its foundations in lambda calculus. We all know by now what lambda calculus embodies, don’t we? Essentially, it’s “just” a way to declare functions and manipulate them. The lambda notation of functions comes in two parts: the arguments and the body. This is shown below with a couple of samples:

clip_image002
clip_image004
clip_image006

Here I have declared three functions. The first is the identity function: given x, it simply returns x. The second one takes two arguments and drops the second one, just to return the first one. The last function is more subtle. Remember that all we manipulate in lambda calculus are functions: the arguments of the last function are named x, y and z and are all functions. Functions can be applied to other functions: this is simply written by juxtaposition (putting symbols next to one another). For example xy stands for apply x to y. This happens in a left-associate manner: xyz means (xy)z, so the function resulting from evaluating the application of x to y is next applied to argument z. The last function I’ve shown above does use parentheses to change the evaluation order: x is applied to z and that result is applied to the result of applying y to z (still follow?).

But those functions are not just samples: they’re actually the basic building blocks of combinatory logic. In essence this means you can get rid of all lambdas, replacing them by combinations of those essential building blocks:

clip_image002[4]
clip_image004[4]
clip_image006[4]

I is known as the identity function. It’s actually redundant, as SKK is equal to I (to try that, evaluate SKKx = Kx(Kx) = x, therefore SKKx = Ix and thus SKK = I through extensionality).

K is the constant function: it produces a constant output (given by its first argument) no matter what it’s applied to next: Kx(whatever) always produces x. It’s a simple as that.

S is the substitution function. Instead of applying x to y directly, it first applies z to both functions (producing xz and yz) and then applies those two to one another. So the third argument, z, gets “substituted” in both x and y through application.

Having just S and K makes a language Turing-complete. All good things boil down to two: binary, duality, S and K. So, all we need to write all sorts of programs is a machine with two instructions: S and K. (Question for the reader: Can we do with less?) Needless to say this leads to obscure code, but that’s what unlambda stands for. And now you see why the language is called Unlambda: there’s no lambda operation (abstraction) left, everything goes through the combinators.

Oh, and by the way, if you’re given a program based on lambdas, you can “compile” those away by applying a technique called “abstraction elimination”, resulting in S, K, I and parentheses:

clip_image002[8]
clip_image004[8]
clip_image006[8]
clip_image008[5]
clip_image010[4]
clip_image012[5]

clip_image002[22]

Ignore this curiosity for now; just keep in mind it’s a “handy” way to write programs using plain old lambda calculus and make it fit in S, K and I combinators.

 

The syntax

There’s no reason for me to duplicate the Unlambda Reference here. The main operators are ` (backtick, for application), s, k and i. A few others are provided to make the language more usable (., r and @ for input and output, as well as ? and | to deal with input state). And as Unlambda is meant to be a pain in the neck, David added some interesting complications such as Call/CC (call with current continuation, known from Scheme) and promises (a way to delay evaluation).

With this in place, we can start writing applications (note it’s normal for your head to explode):

```s``s``sii`ki `k.*``s``s`ks ``s`k`s`ks``s``s`ks``s`k`s`kr``s`k`sikk `k``s`ksk

This one prints the Fibonacci numbers by means of asterisks (*, *, **, ***, *****, etc) separated by newlines. Try to spot the output in the see of combinators: .* prints the asterisks and r prints the newlines.

Can I explain the program above? I could try, but am not going to: “all” there is to it is executing it blindlessly using the rules for S, K and I. The backticks are applications and make it possible to drop all parentheses from the syntax (they are not allowed in fact, making it very easy to parse Unlambda programs btw). How was this program written (by the way, I didn’t write it – it’s taken from the Unlambda homepage)? One necessary ingredient when working with numbers is Church numerals (which represents numbers and operations on those, as we need things like addition). Below are a few representations of numbers 0, 1 and 2 in Church numerals, translated to SKI:

clip_image002[10]
clip_image004[10] 
clip_image002[14]

clip_image008[7]
clip_image010[6]
clip_image002[16]

clip_image014
clip_image016
clip_image018
clip_image020
clip_image022
clip_image024
clip_image026
clip_image028
clip_image030
clip_image032
clip_image034 
clip_image002[18]

Next, we also need some operators. On such operator is increment (+ 1):

clip_image002[20]
clip_image004[12]
clip_image006[12]
clip_image008[9]
clip_image010[8]
clip_image012[9]
clip_image014[4]
clip_image016[4]
clip_image018[4]
clip_image020[4]
clip_image022[4]
clip_image024[4]
clip_image026[4]
clip_image028[4]
clip_image030[4]
clip_image032[4]

It’s trivial to see that applying inc to 1 (represented as I), yields 2. I’ll leave it to the reader to verify that applying inc to 0 results in 1. Also calculate 3 from 2 using the inc function.

One more operator while we’re at it: plus. This one is defined in terms of inc as follows (why?):

clip_image002[24]
clip_image004[14]
clip_image006[14]
clip_image008[11]
clip_image010[10]
clip_image012[11]

Exercise: Apply plus to 1 and 2 and check it corresponds to the result of inc 2 (previous exercise).

 

Implementing Unlambda in C#

Now to the real topic of this post. How to implement Unlambda in its entirety (including “c” Call/CC and “d” promises)? Before we get there, we need to think about parsing input first. But let’s start with the Main method:

Main

Main is trivial. It validates arguments, parses the input file, and feeds the resulting expression tree to an Execute method. And, hacky as this is, we do some ugly not-so-much recommended exception handling as well:

static void Main(string[] args)
{
    if (args.Length != 1 || string.IsNullOrEmpty(args[0]))
    {
        Console.WriteLine("Usage: {0} <inputfile>", Assembly.GetEntryAssembly().GetName().Name);
        return;
    }

    string file = args[0];
    if (!File.Exists(file))
    {
        Console.WriteLine("File not found: {0}", file);
        return;
    }

    try
    {
        Execute(Parse(file));
    }
    catch (Exception ex)
    {
        Console.WriteLine("Error: " + ex.Message);
    }
}

Parse

Parse isn’t hard either. The entry-point to the parser simply opens the file and gets a StreamReader:

static Expression Parse(string file)
{
    using (FileStream fs = File.OpenRead(file))
    using (StreamReader sr = new StreamReader(fs))
        return Parse(sr);
}

Next, the real parser. Simply a mapping from characters onto expressions:

static Expression Parse(StreamReader sr)
{
    char ch;
    do
    {
        ch = ReadNextChar(sr);
        if (ch == '#')
        {
            sr.ReadLine();
            ch = '\n';
        }
    } while (char.IsWhiteSpace(ch));

    switch (ch)
    {
        case '`':
            return new ApplyExpression(Parse(sr), Parse(sr));
        case 'S':
        case 's':
            return new FunctionExpression(S);
        case 'K':
        case 'k':
            return new FunctionExpression(K);
        case 'I':
        case 'i':
            return new FunctionExpression(I);
        case 'V':
        case 'v':
            return new FunctionExpression(V);
        case 'D':
        case 'd':
            return new DelayedFunctionExpression(D);
        case 'C':
        case 'c':
            return new FunctionExpression(CallCC);
        case 'E':
        case 'e':
            return new FunctionExpression(Exit);
        case 'R':
        case 'r':
            return new FunctionExpression(Print('\r'));
        case '@':
            return new FunctionExpression(Read);
        case '.':
            return new FunctionExpression(Print(ReadNextChar(sr)));
        case '|':
            return new FunctionExpression(Reprint);
        case '?':
            return new FunctionExpression(Compare(ReadNextChar(sr)));
        default:
            throw new Exception("Unrecognized input symbol: " + ch);
    }
}

static char ReadNextChar(StreamReader sr)
{
    int c = sr.Read();
    if (c == -1)
        throw new Exception("Unexpected end of file.");
    return (char)c;
}

But this deserves a bit of explanation. First, the comment-and-whitespace ignoring loop: this simply scans input for whitespace and if an # is encountered, drops the rest of the line.

The big switch statement recognizes all valid input symbols, case-insensitively. There are three sorts of expressions being built:

  • Function expressions correspond to built-in functions. The arguments to the constructor will be explained next.
  • The delayed function expression for “d” promises. Essentially the same as a regular function expression but a different type to recognize during evaluation.
  • The apply expression that takes two expressions and applies the first one to the second one. This corresponds to the ` operator: `AB means applying whatever A is to whatever B is.

The ReadNextChar helper tries to read one character and throws if the end of the file is reached. This is fine, it will only be called when we really expect a character in the input stream. Notice how functions like . and ? read one character ahead to get their corresponding character (to print it, ., or compare it, ?, respectively).

The expression objects

So, the output of the parser is an Expression object, and we know there are three subtypes. How do those look?

abstract class Expression
{
    public abstract Task Eval(Continuation c);
}

class ApplyExpression : Expression
{
    internal ApplyExpression(Expression f, Expression arg)
    {
        Operator = f;
        Operand = arg;
    }

    public Expression Operator { get; private set; }
    public Expression Operand { get; private set; }

    public override Task Eval(Continuation c)
    {
} } class FunctionExpression : Expression { internal FunctionExpression(Function f) { Function = f; } public Function Function { get; private set; } public override Task Eval(Continuation cont) {
… } } class DelayedFunctionExpression : FunctionExpression { internal DelayedFunctionExpression(Function f) : base(f) { } }

Simple, isn’t it? As you can see, each expression can be evaluated through an Eval method. We’ll talk about that next, in addition to the Function and Continuation types.

Continuation-passing style

In order to allow Call/CC (call with current continuation), we need to use Continuation Passing Style (CPS). The idea of this weird-sounding animal is simple: a function takes in a “continuation function” that will get the result of the function’s execution. So, every such function can be void-returning:

void Add(int a, int b, Action<int> continueWith)
{
    continueWith(a + b);
}

This results in a tail-recursive setting. How this enables (and what is) Call/CC is something I won’t elaborate on this time. You’ll see it implemented in just a minute. The sample above could be called like:

Add(1, 2, sum =>
    Add(sum, 3, sum2 =>
        { Console.WriteLine(sum2); }));

We’ll call the continuation function (Action<int> above) a Continuation:

delegate Task Continuation(Function f);

This delegate takes in a Function (the result of every computation in Unlambda is a function, remember there’s nothing but functions) and returns a Task. Where does that come from?

Well, there’s one problem with our CPS-based solution: it exhausts the call-stack. Yes, the CLR a tail call instruction, but no it’s not exposed through C#. We need something else here, which we’ll call tasks. This idea is that a task can be applied to yield the subsequent task. Sounds weird? Stay with me for a moment and have a look at the Execute method:

static void Execute(Expression ex)
{
    Continuation exit = f => { Environment.Exit(0); return null; };
    for (Task task = ex.Eval(exit); true; task = task())
        ;
}

Say we have an expression that computes some function: we call it by means of the Eval method. Eval wants a continuation, essentially saying what to do next after Eval completes. We begin by saying the thing to do after the top-level expression (i.e. representing the whole program) completes, is to exit the interpreter. The “return null” statement will never be reached. An alternative implementation would be to break from the loop in a null-task and make exit simply return null.

Conceptually, Eval should be called recursively: to do an application of A to B, we need to evaluate A first, then B and finally call the result of evaluating A with the result of evaluating B. But this would be one source of stack exhaustion. So instead of causing recursive calls, we make Eval return a Task object. Our main loop then gets that task, and calls it. The result of such a call is the next task to carry out:

delegate Task Task();

So, how does ApplyExpression do its eval?

class ApplyExpression : Expression
{
    internal ApplyExpression(Expression f, Expression arg)
    {
        Operator = f;
        Operand = arg;
    }

    public Expression Operator { get; private set; }
    public Expression Operand { get; private set; }

    public override Task Eval(Continuation c)
    {
        return () => Operator.Eval(f => () => Operator is DelayedFunctionExpression ? c(Program.D1(Operand)) : Operand.Eval(arg => () => f(arg, c)));
    }
}

Ah, we’re getting somewhere: basically the evaluation of an applications hands back to the caller a task (notice the () => … lambda to new up a Task) that will “recursively” call the Operator’s Eval method to get the function to be applied (e.g. ` (SKK) (KSK) will first evaluate the operator (SKK), to I that is, and then evaluate the operand (KSK), to S that is).

How the inner Operator.Eval works is irrelevant for now. The idea is that Operator.Eval will return a Task and the thing we return from the ApplyExpression’s Eval method needs to be a Task too. So, you see that the result of calling Eval on the application is a new task with the first thing it will do being the evaluation of the operator. This allows the main loop to continue calling through the resulting task objects, and you see the call stack never grows deeper than a single eval.

So, we know how a Task works and how a Continuation works. Recapitulate:

delegate Task Task();
delegate Task Continuation(Function f);

A continuation is something that can execute a function and return a task to continue executing. A task is something that can execute, producing its successor task. We still need to know how a Function works. Well, everything is a function in Unlambda so not unsurprisingly the argument to a Function is, well, a Function. And as we’re using CPS, the next argument needs to be a Continuation object:

delegate Task Function(Function arg, Continuation c);

Wow, three mutually recursively defined delegate types. A note on functions only taking in one argument, this can be achieved by currying: every function of multiple arguments can be rewritten as a function that eats one argument at a time:

K x y = x
K x = y –> x
K = x –> y –> x

S x y z = x z (y z)
S x y = z –> x z (y z)
S x = y –> z –> x z (y z)
S = x –> y –> z –> x z (y z)

A look at the SKI combinators

Let’s make this whole thing concrete by looking at how the functions S, K and I are defined. First I, as that’s the simplest one:

static Function I = (x, c) => c(x);

Remember a function takes in a function as its argument (here x) and a continuation (here c). All I does is taking its argument and passing it to its continuation as-is. This should be simple to understand: we just continue the computation with I’s argument, as the effect of applying I is nothing but returning its argument (identity function behavior).

K is a bit more complicated, but not too hard either:

static Function K = (x, c) => c((y, c1) => c1(x));

Here we curry K, a function that takes in two argument x and y, to eat one argument at a time. Every argument comes with the continuation for that application. That explains the (x, c) and (y, c1) input pairs. Recall what K does: given two arguments, it simply returns the first one (x). So, once we have the two arguments (one at a time, through currying), we apply the first argument (x) to the continuation we have at hand. That’s what c1(x) does.

Notice how the first continuation is called, passing in the curried function:

(y, c1) => c1(x)

This is nothing but the curried version of K. The y parameter is irrelevant as K doesn’t look at it. So in essence: calling K with the first argument returns a Task that results from calling the supplied Continuation with a new (curried K) Function that on its turn evaluates its continuation to x, the surviving argument from K.

Now the beauty of all: S. Here we need to curry twice:

static Function S = (x, c) => c((y, c1) => c1((z, c2) =>
{
    Expression arg = new FunctionExpression(z);
    return () => new ApplyExpression(
        new ApplyExpression(
            new FunctionExpression(x),
            arg),
        new ApplyExpression(
            new FunctionExpression(y),
            arg)
        ).Eval(c2);
}));

First x is supplied with its continuation c, then y with c1 and finally z with c2. Once we’re there we can evaluate what S is supposed to do:

S x y z = x z (y z)

This consists of three applications: first x to z, then y to z and finally the results of the first application (x z) to the result of the second one (y z). But we already have the ability to apply functions by means of our expression tree model, so we just new up an expression tree that represents the computation (x z (y z)). Then we call Eval on it, passing in the current continuation at hand, so it will eval the whole thing and pass its result (a Function, as everything in Unlambda is such a thing) to the continuation c2.

I bet this will be overwhelming, but that’s what esoteric languages are all about: an impractical nature :-).

Void

“Ignore whatever is passed to me”, is what void does. It simply says: give me something and I’ll do my own thing going forward. To accomplish this, it passes itself to the continuation:

static Function V = (x, c) => c(V);

Quiz: Why does the above work? We’re declaring V and using it on the right-hand side. Would the following work? Why (not)>

void DoSomething()
{
    Function V = (x, c) => c(V);
}

Some other, simple functions

To regain breath, let’s look at a few input/output functions which are fairly trivial. To make those work, we have the concept of an input stream (simply Console.In) and a character that was last read:

static TextReader In = Console.In;
static char? lastChar = null;

Three functions deal with this piece of global state: Read (@), Reprint (|) and Compare (?). Those are shown below:

static Function Read = (x, c) =>
{
    int ch = In.Read();
    lastChar = ch == -1 ? null : (char?)ch;
    return () => x(lastChar != null ? I : V, c);
};
static Function Reprint = (x, c) => () => x(lastChar != null ? Print(lastChar.Value) : V, c);
static FC<char> Compare = ch => (x, c) => () => x(lastChar == ch ? I : V, c);

Read takes input from the input stream, sees whether we reached the end or not, and sets the lastChar – a nullable char – accordingly. Next, it returns a task that will call the function following the Read (@) call, i.e. x. The argument to that function becomes either the identity function (causing execution to continue in essence) or void (something bad happened), depending on the input that was read. And obviously it passes on the continuation.

Reprint is very similar and defined in terms of a yet-to-be-defined built-in Print function. But again, it either passes on the Print function or void (in case no character is available yet, or anymore).

Compare is a function that takes an argument in the program code (e.g. ?b to compare the last read character to b), so we create an FC or function constructor that can be called by the Parser:

case '?':
    return new FunctionExpression(Compare(ReadNextChar(sr)));

So, the next character in the program text becomes the character to compare to, the the “real” Compare function is parameterized in this:

static FC<char> Compare = ch => (x, c) => () => x(lastChar == ch ? I : V, c);

with

delegate Function FC<T>(T t);

yields the following given ‘b’:

Function compB = (x, c) => () => x(lastChar == ‘b’ ? I : V, c);

which again follows the same pattern as the Read function: if equal, I is returned, otherwise V is returned. This allows for some control flow constructs.

Finally, there’s the Print function, again parameterized by source text (e.g. .* prints a ‘*’):

static FC<char> Print = ch => (x, c) => { Console.Write(ch); return c(x); };

Prints to the console as a side-effect and continues execution by calling the continuation. Notice ‘r’ is implemented in terms of Print, simply by printing ‘\n’ to write a new line:

case 'R':
case 'r':
    return new FunctionExpression(Print('\n'));

Call/CC and delay/promises

The most esoteric parts of Unlambda are maybe the Call/CC and delay functions (c and d):

static Function CallCC = (x, c) => () => x((f, c2) => c(f), c);

This is Call/CC: a function that returns a new task (() => …) that calls the given function (x) with the current continuation (c), but also causes the function’s argument to become a function that ignores its own continuation (c2) and instead uses the current one. That’s the crux at the heart of Call/CC: it ignores that passed-in continuation, and substitutes it by the current one. David has a good overview of Call/CC if that seems exotic to you.

Another one we should mention here is “d”, maybe even more mind-boggling. It essentially delays the evaluation of its argument: `FG where F evaluates to the delay function (d) does not evaluate G straight away. Instead it returns a function that, when applied, will evaluate G. E.g. if ``FGH is evaluated, and F is ‘d’, G doesn’t get evaluated till H is applied to the delayed `dG function. Therefore, this is an exception to the regular rules of evaluation in Unlambda, and not surprisingly it is a bit involved to write down:

static Function D = (f, c) => c(D1(new FunctionExpression(f)));

This is the easy part. I’ve factored out D1, the curried form, as it needs to be reused somewhere else – see further. I just wished I could explain how D1 works in a crystal-clear fashion. Let me try…

internal static FC<Expression> D1 = ex => (f1, c1) => () => ex.Eval(f => () => f(f1, c1));

In essence, D1 is again a function constructor, this time based on a FunctionExpression. Our function expression here is based on “f”, the thing to be delayed. In here, we create a new Function that given an argument f1 and a continuation c1 will produce a task (() => …) that will evaluate the delayed expression (ex), passing in a continuation. This is the most complex aspect of D1, at least to understand it (because, again it’s quite simple in its implementation). The way FunctionExpression.Eval works is shown below:

class FunctionExpression : Expression
{
    internal FunctionExpression(Function f)
    {
        Function = f;
    }

    public Function Function { get; private set; }

    public override Task Eval(Continuation cont)
    {
        return cont(Function);
    }
}

It simply calls the continuation with the supplied function. Put this together with how D defers evaluation of its argument f by new’ing up a FunctionExpression (ex) that subsequently gets called in D1 through ex.Eval, and we come to the conclusion that the argument of that continuation is no other than the deferred f:

static Function D = (f, c) => c(D1(new FunctionExpression(f)));

internal static FC<Expression> D1 = ex => (f1, c1) => () => ex.Eval(f => () => f(f1, c1));

So the task returned by the Eval’s continuation simply calls the deferred function f and passes in f1 as its argument, also asking to be continued with D1’s continuation.

ApplyExpression revisited

The final thing to look at is ApplyExpression’s Eval method (again):

class ApplyExpression : Expression
{
    internal ApplyExpression(Expression f, Expression arg)
    {
        Operator = f;
        Operand = arg;
    }

    public Expression Operator { get; private set; }
    public Expression Operand { get; private set; }

    public override Task Eval(Continuation c)
    {
        return () => Operator.Eval(f => () => Operator is DelayedFunctionExpression ? c(Program.D1(Operand)) : Operand.Eval(arg => () => f(arg, c)));
    }
}

Application is the thing that gets influenced by delayed functions, as created by the “d” operator. In essence, if the operator passed to an ApplyExpression is delayed, we should not evaluate the argument either. Rather we should create a promise – as represented by D – for it:

Operator is DelayedFunctionExpression ? c(Program.D1(Operand)) : …

Only when D1 gets called, as explained in the previous paragraph, evaluation of the operand will be forced (as promised).

If the operator passed to an ApplyExpression was not delayed, we just proceed regularly:

Operator.Eval(f => () => Operator is DelayedFunctionExpression ? … : Operand.Eval(arg => () => f(arg, c)));

That is, we evaluate the operand by passing in a continuation that subsequently will call the function f with the evaluated argument, passing in the continuation. If you read the whole (eager) evaluation from left to right you can see we first evaluated the operator, then the operand, and finally called the function. So, there are three nested tasks here (look for () => …).

Note: In theory it would be possible to eliminate the Expression types too, turning them in lambdas as well (representing their Eval method, hence of type Continuation –> Task). There’s one direct implementation worry though: the type-check for an Operator against “a delayed expression type”. Delegates have no tweakable type hierarchy, and there’s no way to add properties (like IsDelayed) to them (yes, you could keep a dictionary to maintain that property for each expression). You could hack it up in IL if you want, but let’s keep that as a challenge to the reader. Essentially you want to replace Expression by something like this:

delegate Task Expr(Continuation c) {
public bool IsDelayed { get; set; } // Hypothetical syntax
};
static Func<Function, Expr> FuncExpr = f => c => c(f);
static Func<Expr, Expr, Expr> ApplExpr = (Operator, Operand) => c => () => Operator(f => () => Operator.IsDelayed ? c(Program.D1b(Operand)) : Operand(arg => () => f(arg, c)));

Testing it

Phew, never though I’d make it this far. We’ve just written a humble 186 lines of dense C# code (no comments). Let’s see it works, with our Fibonacci sample:

image

Sweet – looks right, doesn’t it? 1, 1, 2, 3, 5, 8, 13, 21, etc.

I tried the interpreter code on all the samples that come with the Unlambda distribution, and it seems to pass fine:

image

This said, it’s always possible I made some mistake somewhere, so don’t make your next space shuttle project depend on this code :-).

 

Download

Gentle as I am, I’ve made the code available over here. Enjoy <g>!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Introduction

Declarative language constructs like query comprehension syntax often worries imperatively trained developers. I hear this quite a bit, and the excuse of “It Just Works” is often not satisfactory for most of them :-). Combine this with interesting behavioral differences like lazy evaluation and lots of developers get lost in paradise.

Actually the perceived problem is not with LINQ itself, typically a lack of solid understanding about the query execution model causes grief. So one thing that can help to address this problem is a better visualization of how a query executes. Advanced query “providers” like LINQ to SQL offer logging capabilities to inspect what’s going on, but LINQ to Objects lacks such a capability.

In this post, we’ll have a look at possible approaches to make debugging LINQ to Objects (and hence LINQ to XML) queries easier. At the end of the day you’ll come to the conclusion it all boils down to knowing precisely what the semantics of the various operators are and how the execution works in the face of laziness etc.

 

Visual Studio 2008 support

First things first, so let’s have a look at what Visual Studio 2008 gives us with regards to debugging LINQ queries. To begin with, you can set breakpoints on entire query comprehensions (or the statement in which they occur):

image

This is not useful for the debugging of the query itself. Even more, it’s actually distracting: breaking on the line marked above and continuing execution (F10) doesn’t really do much interesting things. That is, the query doesn’t execute here. LINQ queries are evaluated lazily, only when someone starts to consume them (like foreach or some greedy operator like ToArray, ToList, etc) things of interest start to happen.

The following setup is more useful:

image

Notice how the “Results View” reveals the lazy nature of the query. I’m actually breaking inside the foreach loop, but this pays us little or nothing: at that point we already observe the results and the fact we’re debugging a query expression likely reveals that we’re not satisfied by some of the results produced. We need something in between showing the results and the declaration of the query…

Note: In the screenshot above you can see some weird animal – WhereSelectArrayIterator. This reflects performance optimizations carried out in the 3.5 SP1 release where consecutive where and select clauses are optimized into one iterator. In addition, the knowledge the query is carried out over an array can be used to exploit characteristics of arrays to boost performance as well.

One thing you can do, is set breakpoints on the bodies of the various clauses in the query expression, like this:

image

To set those breakpoints, put your cursor somewhere in the clause’s body and press F9.

image

Now you can see when certain clauses hit, e.g. where in the case above, and see how objects flow through the query. For example, once we get to Split we’ll first hit the where clause and then the select clause with the same object:

image

As the object has flown through the whole query it finally reaches the body of the foreach-loop:

image

While this can definitely help, together with conditional breakpoints and such, it might become tedious when dealing with more advanced queries and/or big input sequences. Also, nested queries and joins can be puzzling when you observe the execution behavior this way.

 

A naive logger

One thing that can help to get more global insight in the query’s execution is to have a global kind of logger. Essentially we want to add inspection points somewhere in the query (much like doing electronics by measuring certain connection points) and log output so we can see how each object flows through the query execution pipeline.

Let’s start by looking at the query above in a more under-the-hood manner, by expanding the query syntax into a chain of query operator method calls. This reveals little dots which you can conceptually compare to our desired measurement points (almost as in electronics again :)).

image

How can we make those dots observable? Obviously, the measurement should not change the observation (ignoring Heisenberg’s uncertainty principle here). In essence, we want to inject some observer to watch the flow of the data. One way to do this is by defining our own extension method. Here’s what you might think would be meaningful:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return input;
    }
}

With the following use:

var res = typeof(string).GetMethods()
    .Where(m => m.Name.StartsWith("S"))
    .Watch()
    .Select(m => m.Name)
    .Watch();

Not really useful actually because we don’t observe objects flowing through the pipeline. We’ll just get one call to Watch, when the query expression itself is executed. We really want to participate in the query execution by pulling items from the source (whatever occurs on the left-hand side of the Watch call) and yielding them back to our downstream consumer as-is. For example, if a foreach starts pulling on “res”, it pulls on Watch, which should pull on “Select” yielding back its results one by one, again causing the previous Watch the be pulled, and so on. Here’s a better attempt:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        foreach (var item in input)
            yield return item;
    }
}

Next we need to do something meaningful inside the loop. Okay, we could set a breakpoint, but our starting point of this whole exercise was to create a logger so we can analyze a whole query execution run afterwards. To keep things simple, we’ll just log to the console. We could simply do a ToString or we could take in a function that allows overriding the “printf” behavior (as objects might be complicated or might not have ToString overridden in a meaningful way):

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return Watch(input, item => item != null ? item.ToString() : "(null)");
    }

    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input, Func<T, string> toString)
    {
        foreach (var item in input)
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine(toString(item));
            Console.ResetColor();
            yield return item;
        }
    }
}

This should look relatively straightforward. Let’s give our query a dry-run:

image

Not too bad; yellow is logging output and white is the program’s main output. Using our overload we could add more diagnostic information as well, and let’s go crazy by adding a watch to the output of the original source:

 

var res = typeof(string).GetMethods()
    .Watch(m =>