Saturday, September 12, 2009 4:56 AM bart

Taming Your Sequence’s Side-Effects Through IEnumerable.Let

Introduction

Side-effects don’t fit together very well with functional-style delayed computation, as observed in LINQ. Having such constructs embedded in an otherwise eagerly evaluated imperative language like C# can cause quite some confusion from time to time. A few samples (due to Erik Meijer) of such “mishaps” are shown below:

//
// When does it throw?
//

IEnumerable<double> res = null;
try
{
    res = from x in Enumerable.Range(0, 10)
          select 1.0 / x;
}
catch (DivideByZeroException) {}

foreach (var rcp in res)
    Console.WriteLine(rcp);
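For what it's worth, here is a runnable sketch of the answer to the first puzzle: the query expression merely describes the computation, so nothing runs inside the try-block, and floating-point division never throws a DivideByZeroException anyway:

```csharp
using System;
using System.Linq;

var res = from x in Enumerable.Range(0, 10)
          select 1.0 / x;

// The try-block in the original sample protects nothing: defining the query
// doesn't execute it. And once iteration does start, floating-point division
// by zero doesn't throw DivideByZeroException either; it yields infinity.
foreach (var rcp in res)
    Console.WriteLine(rcp);
```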

//
// What does it print?
//
Func<int, bool> div2 = i => { Console.WriteLine("Div2 test " + i); return i % 2 == 0; };
Func<int, bool> div3 = i => { Console.WriteLine("Div3 test " + i); return i % 3 == 0; };
var res = from x in Enumerable.Range(0, 10)
          where div2(x)
          where div3(x)
          select x;

foreach (var sxt in res)
    Console.WriteLine(sxt);
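And a sketch of the answer to the second puzzle, capturing the evaluation order in a list (the log list is an addition for illustration): deferred execution makes the two filters interleave per element instead of running as two separate passes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();
Func<int, bool> div2 = i => { log.Add("Div2 test " + i); return i % 2 == 0; };
Func<int, bool> div3 = i => { log.Add("Div3 test " + i); return i % 3 == 0; };

var res = from x in Enumerable.Range(0, 10)
          where div2(x)
          where div3(x)
          select x;

// Elements flow through the pipeline one by one: div3 only sees elements that
// passed div2, and both tests for a given element happen before the next
// element is drawn. Only multiples of 6 (0 and 6) come out at the end.
foreach (var sxt in res)
    log.Add("--> " + sxt);

Console.WriteLine(string.Join(Environment.NewLine, log));
```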

The side-effects in the samples above are of various kinds: exceptions and output operations both hurt in a functional setting. In pure "fundamentalist" functional programming languages, those effects are made explicit by means of the type system (in particular, the IO monad). When looking at a .NET "function" signature, the types lie to us:

class Random
{
    public int Next();
}

Sure, Next returns an Int32 numeric value when called. But that's only part of the story: every time you call it, it may return something else. In other words, we can't simplify the following program:

var rand = new Random();
var fems = new Pair(rand.Next(), rand.Next());

into

var rand = new Random();
var __rn = rand.Next();
var fems = new Pair(__rn, __rn);

The main reason is simply that we don't know enough about the behavior, i.e. the side-effects, the Next function exposes. Or to put it another way, Next is not a (pure) function but a method. In Haskell, the types tell us a side-effect is going on:

randomIO :: (Random a) => IO a

Roughly speaking, Random a stands for Random<T> and IO a stands for T, “tagged” with the fact it has a side-effect through the IO monad (not the same as IO<T> as the CLR’s type system doesn’t support the higher kinding required to realize the typing required for monads).

 

Multiplication of side-effects

Being aware of the presence of side-effects is one thing, having them amplified or multiplied (almost) invisibly through the use of query combinators and such is even more painful. Consider the following iterator:

static class RandomExtensions
{
    public static IEnumerable<int> GetNumbers(this Random rand, int max)
    {
        while (true)
            yield return rand.Next(max);
    }
}

which can be called like:

var rand = new Random();
foreach (var x in rand.GetNumbers(100).Take(10))
    Console.WriteLine(x);

All is well and good so far: we'll get 10 random numbers printed to the screen. Now assume we're already on .NET 4 (just to make the illustration a little easier; the general topic of "taming side-effects" is version-independent and, Haskell excluded, platform-independent) and have the Zip method available. Let's write the following:

var rand = new Random();
var res = rand.GetNumbers(100).Take(10).Zip(rand.GetNumbers(100).Take(10), (l, r) => l + r).All(x => x % 2 == 0);

What do you think res should be? Clearly, the sum of a number and itself is always even. The call rand.GetNumbers(100).Take(10) is a common subexpression: on the face of it, you may think it produces the same results for both "sides" of Zip, hence the conclusion that the zipped sums should always be even. Nothing is further from the truth: GetNumbers propagates the side-effect of the underlying IO operation (reading from a random number generator), and the two calls create separate iterator instances, each merrily ticking along. A simple Print operator can help to illustrate this:

static class EnumerableExtensions
{
    public static IEnumerable<T> Print<T>(this IEnumerable<T> source, string formatMessage)
    {
        foreach (var item in source)
        {
            Console.WriteLine(formatMessage, item);
            yield return item;
        }
    }
}

Yes, even more side-effects added to the melting pot. Now we can write the following to illustrate the mishap:

var rand = new Random();
var res = rand.GetNumbers(100).Take(10).Print("SRC1: {0}").Zip(rand.GetNumbers(100).Take(10).Print("SRC2: {0}"), (l, r) => l + r).All(x => x % 2 == 0);

One run on the author’s machine produced:

SRC1: 82
SRC2: 86
SRC1: 17
SRC2: 91
SRC1: 48
SRC2: 15

Clearly, both sources are out of sync. Now we go and knock on the door of a fellow programmer and ask them to solve this issue. "Well, clearly you're calling GetNumbers twice – that's your problem right there." Not so fast… Let's see what the proposed fix would look like:

var rand = new Random();
var src = rand.GetNumbers(100).Take(10);

var res = src.Print("SRC1: {0}").Zip(src.Print("SRC2: {0}"), (l, r) => l + r).All(x => x % 2 == 0);

And here’s the result:

SRC1: 89
SRC2: 39
SRC1: 22
SRC2: 15

What's happening here? Although we've reduced the number of calls to GetNumbers to just a single one, its execution is still lazily evaluated as we're using an iterator. It's not until we start iterating that computation is triggered. Moreover, the iterator returns an IEnumerable<int>, which merely provides the potential to enumerate through the IEnumerator<int> objects it hands out. In effect, Zip will call GetEnumerator twice, resulting in two instances of the iterator (simplified code below):

public static IEnumerable<R> Zip<T1, T2, R>(this IEnumerable<T1> left, IEnumerable<T2> right, Func<T1, T2, R> zip)
{
    var leftE = left.GetEnumerator();
    var rightE = right.GetEnumerator();
    while (leftE.MoveNext() && rightE.MoveNext())
        yield return zip(leftE.Current, rightE.Current);
}

In our case, left and right are pretty much the same (ignore Print which acts as an all-pass filter iterator), namely src:

src.Zip(src, …)

Two calls to GetEnumerator result, each querying the same random generator instance that happily continues on its side-effecting mission. The iterator we wrote for GetNumbers is semantically equivalent to the following, illustrating this issue more clearly for those who are not familiar with the way iterators are compiled (simplified presentation):

static class RandomExtensions
{
    public static IEnumerable<int> GetNumbers(this Random rand, int max)
    {
        return new GetNumbersIterator(rand, max);
    }
}

class GetNumbersIterator : IEnumerable<int>
{
    private Random _rand;
    private int _max;

    public GetNumbersIterator(Random rand, int max)
    {
        _rand = rand;
        _max = max;
    }

    public IEnumerator<int> GetEnumerator()
    {
        return new GetNumbersIteratorEnumerator(this);
    }

    class GetNumbersIteratorEnumerator : IEnumerator<int>
    {
        private GetNumbersIterator _iter;
        private int _curr;

        public GetNumbersIteratorEnumerator(GetNumbersIterator iter)
        {
            _iter = iter;
        }

        public bool MoveNext()
        {
            _curr = _iter._rand.Next(_iter._max);
            return true;
        }

        public int Current
        {
            get { return _curr; }

        }
    }
}

In fact, you could say the side-effect that's killing us now is the instantiation of GetNumbersIteratorEnumerator inside the GetEnumerator method. Observing that, the innocent unenlightened developer may say: what if we eliminate that additional object instantiation? Let's "desugar" the iterator by writing it as the code above (quite a bit of work, but the innocent developer is willing to go to great lengths to get his code working As The Natural Numbers Intended), eliminating the construction of a new GetNumbersIteratorEnumerator every time:

class GetNumbersIterator : IEnumerable<int>
{
    private Random _rand;
    private int _max;
    private GetNumbersIteratorEnumerator _curr;

    public GetNumbersIterator(Random rand, int max)
    {
        _rand = rand;
        _max = max;
        _curr = new GetNumbersIteratorEnumerator(this);
    }

    public IEnumerator<int> GetEnumerator()
    {
        return _curr;
    }

    // GetNumbersIteratorEnumerator as before ...
}

Guess what? Here’s the result:

SRC1: 15
SRC2: 97
SRC1: 31
SRC2: 57
SRC1: 21
SRC2: 57
SRC1: 63
SRC2: 94

Bummer. We're just pushing the problem ahead of us. Though we have the same enumerator object inside the Zip method now, we're still calling MoveNext twice:

public static IEnumerable<R> Zip<T1, T2, R>(this IEnumerable<T1> left, IEnumerable<T2> right, Func<T1, T2, R> zip)
{
    var leftE = left.GetEnumerator();
    var rightE = right.GetEnumerator();
    Debug.Assert(leftE == rightE);
    while (leftE.MoveNext() && rightE.MoveNext())
        yield return zip(leftE.Current, rightE.Current);
}

What would the desperate developer's next suggestion be? Let's not go there: clearly something bigger is the matter here. Everything you try at this level, decomposing iterator objects and relying on precise knowledge of how query operators work, is a very ugly band-aid trying to cover up the bigger problem of side-effect replication.

One way to tackle the problem from the outside is to break laziness:

var rand = new Random();
var src = rand.GetNumbers(100).Take(10).ToArray();

var res = src.Print("SRC1: {0}").Zip(src.Print("SRC2: {0}"), (l, r) => l + r).All(x => x % 2 == 0);

Operators like ToArray and ToList drain the left-hand side sequence on the spot, persisting the retrieved elements in-memory. Iterators versus persisted data sequences aren’t differentiated on the type level: both are “enumerable”. Great from a generalization point of view, less so to understand the runtime behavior of dealing with those objects. Either way, the above works and prints:

SRC1: 44
SRC2: 44
SRC1: 6
SRC2: 6
SRC1: 81
SRC2: 81
SRC1: 14
SRC2: 14
SRC1: 65
SRC2: 65
SRC1: 1
SRC2: 1
SRC1: 9
SRC2: 9
SRC1: 67
SRC2: 67
SRC1: 65
SRC2: 65
SRC1: 75
SRC2: 75

For the first time we reached the end of the zipped sequence. Wonderful. But we’ve made a big sacrifice, something I accounted for right from the start of the post by using Take(10). The above doesn’t work if we’re dealing with infinite sequences. Obviously you should never try to drain those as the program will get stuck (which in fact can be formalized as well, maybe stuff to cover another time), yet infinite sequences are great to provide the user with the potential to compute as many results as desired. Having to break laziness to make query operator composition, e.g. for Zip in our sample, work spoils this whole idea of lazy computation.
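The difference between the lazy query and its drained counterpart can be pinned down deterministically by replacing the random source with a plain counter (the next function below is a hypothetical stand-in, used so the outcome is reproducible):

```csharp
using System;
using System.Linq;

int calls = 0;
Func<int> next = () => ++calls;

var eager = Enumerable.Range(0, 3).Select(_ => next()).ToArray();
var lazy = Enumerable.Range(0, 3).Select(_ => next());

// ToArray drained the query once; re-enumerating the array replays the
// stored values, so the sequence is equal to itself.
Console.WriteLine(eager.SequenceEqual(eager)); // True

// The lazy query re-runs the side-effect on every enumeration; comparing it
// against itself interleaves two fresh enumerations seeing distinct values.
Console.WriteLine(lazy.SequenceEqual(lazy));   // False
```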

By now it should be clear that side-effects have far-reaching implications, but we should not ignore the fact that they're useful at times. What we need to make our consumption of side-effectful sequences consistent is some kind of let construct.

 

What’s let anyway?

Functional programming languages typically provide a “let” construct, e.g. as seen in ML-style languages like F#. One form is to use the following syntax:

let id = expr1 in expr2

The above is shorthand for abstraction and application carried out right after one another:

(fun id -> expr2) expr1

Consider the next sample:

let x = 1 in (x, x)

where (…, …) is the tupling operator. Semantically the above is equivalent to:

(fun x -> (x, x)) 1

In a call-by-value setting this effectively avoids re-evaluating the expr1 expression. Of course in the sample above, there’s not much work required to “re-evaluate” the rather minimalistic expression “1”. However, when evaluating this expression causes side-effects, we’re eliminating duplicate evaluation. This is shown in the sample below where we use the Stopwatch type to retrieve the timestamp, based on a high-resolution performance counter (see MSDN documentation for information and potential caveats).

[Image: F# Interactive session using Stopwatch to read timestamps, without and with let]
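The screenshot is not reproduced here, but the same experiment can be sketched in C#, encoding let as lambda abstraction followed by immediate application, with Stopwatch.GetTimestamp as the side-effecting expression (a reconstruction, not the original F# code):

```csharp
using System;
using System.Diagnostics;

// Without "let": the side-effecting expression occurs twice, so the two
// components of the tuple come from two distinct timestamp reads.
var naive = Tuple.Create(Stopwatch.GetTimestamp(), Stopwatch.GetTimestamp());
Console.WriteLine(naive.Item1 == naive.Item2); // typically False with a high-resolution timer

// "let ts = Stopwatch.GetTimestamp() in (ts, ts)", encoded as abstraction
// followed by application: the side-effect runs exactly once, and its
// result is shared by both tuple components.
var shared = new Func<long, Tuple<long, long>>(ts => Tuple.Create(ts, ts))(Stopwatch.GetTimestamp());
Console.WriteLine(shared.Item1 == shared.Item2); // always True
```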

Of course we can loop in our Random type sample as well:

[Image: the same let technique applied to the Random sample]

You get the idea. How does this apply to our side-effecting iterators? Well, what we really want is a mechanism by which we can expose the same iterator to multiple consumers without re-evaluating the side-effects that get triggered by calls to IEnumerator.MoveNext (where the iterator’s code lives in one form or another). In addition to this functional requirement, we can’t drain the whole iterator as we’d break our teeth on infinite (computed) sequences.

 

IEnumerable.Let – Signature and use

First, let's have a look at the contract for the IEnumerable.Let operator we're after here. Obviously we want the code to read as naturally as possible, and what better way to achieve that than an extension method:

… Let<T>(this IEnumerable<T> source, …)

That’s a good start: we can take in an IEnumerable<T> as the expr1 for the let construct. Looking back at the F# equivalent for a moment:

(fun id -> expr2) expr1

It's clear we need a function that exposes the once-evaluated (read: enumerated) sequence to a block of code where the user can party on it without having to worry about duplicating side-effects. The input to such a function clearly is IEnumerable<T>, leading to:

… Let<T>(this IEnumerable<T> source, Func<IEnumerable<T>, …

So, "id" from the F# code sample above corresponds to the input of the Func delegate. Finally, what's the output meant to be? Obviously just the output of the expr2 evaluation after feeding in expr1 through the parameter. As we're talking about sequence operators, a natural choice is to make the output an IEnumerable as well; to allow projections and such to happen inside, we choose another generic parameter:

… Let<T, U>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<U>> function)

You can guess where the output of the function is meant to go: it becomes the output of the Let construct itself:

IEnumerable<U> Let<T, U>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<U>> function)

Make sense? For example, for our fancy Zip over a random sequence, we want to be able to write the following:

var rand = new Random();
var res = rand.GetNumbers(100).Take(10).Let(nums => nums.Zip(nums, (l, r) => l + r)).All(x => x % 2 == 0);

See what’s happening here? The left-hand side to Let becomes the source, which internally is exposed as “nums”. Zip produces its output, which becomes the output of the Let construct that’s subsequently fed into All. In fact, the above could be read as:

var res = (let nums = rand.GetNumbers(100).Take(10) in
               nums.Zip(nums, (l, r) => l + r))
          .All(x => x % 2 == 0);

Now on to the question of the night, how to realize this?

 

Implementing IEnumerable.Let

Before we start implementing the let construct for LINQ to Objects, we need to make sure we fully understand the problem. Let me rephrase it one more time:

Given an IEnumerable<T> source, we need to expose it to a function in such a way that multiple consumers trigger only a single enumerator to be active on the source. This ensures that the side-effects occurring as part of the enumeration (using MoveNext/Current) over the enumerator are not replicated.

Since many consumers can be active inside the “let body” (the function parameter in the signature), we can expect those to move at different paces. Here’s an example:

nums.Skip(5).Zip(nums, …)

Here, the Skip operator readily advances the iteration over the source 5 positions ahead. Then Zip consumes this input as well as the raw source starting from position 0. Given a sequence of { 1, 2, 3, 4, 5, 6, 7, 8, 9 }, the inputs to Zip are skewed over a distance of 5, combining (6, 1), (7, 2), etc. This forces us to memorize the first 5 elements that have already been consumed by Skip, as they still have to be observed through Zip's second input (and obviously we can't recompute them, as that's exactly what we're trying to eliminate!).
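With a pure in-memory source, the skew itself is easy to observe, since each Zip input simply gets its own enumerator; the challenge Let addresses is making a side-effecting source behave the same way:

```csharp
using System;
using System.Linq;

var nums = Enumerable.Range(1, 9).ToArray();

// Skip(5) makes the left input start at the 6th element while the right
// input starts at position 0; Zip ends when the shorter (left) side does.
var pairs = nums.Skip(5).Zip(nums, (l, r) => "(" + l + "," + r + ")");

Console.WriteLine(string.Join(" ", pairs)); // (6,1) (7,2) (8,3) (9,4)
```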

For one thing, we need to keep track of elements that have been consumed from the Let-source so that consumers inside the body can get access to those computed values. An immediate question is whether we can optimize this: wouldn't it be possible to scavenge the computed values once we're sure no consumer inside the body can still need them? Sounds reasonable, but there's a deep problem here: the number of consumers can be dynamic. A good sample is SelectMany:

from x in nums
from y in nums
select x + y

This translates into the following:

nums.SelectMany(x => nums, (x, y) => x + y)

SelectMany is conceptually similar to nested loops used to enumerate a collection and for each of its elements retrieve associated objects (e.g. for each product, and each order for that product, do …). In the sample above we use nums twice, so assuming the source looks like { 1, … , 9 }, we will see the following pairs being fed to the projection function parameter:

(1, 1), (1, 2), …, (1, 9), (2, 1), …, (9, 1), (9, 2), …, (9, 9)

For the 9 elements in the left-hand side source, we'll enumerate all of the elements in the corresponding source found by evaluating x => nums (where, in this degenerate case, we disregard the parameter x and simply return the source itself). How many times will we trigger enumeration on nums here? Well, one time for the input to SelectMany as the left-hand side, and for each element one more time for the nested enumeration. So, with 9 elements, the total will be 9 + 1 = 10. There's no way for us to know upfront how many subscriptions will happen, so we can't make assumptions about when we can discard elements we've retrieved from the source.
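This subscription count can be verified with a small instrumentation wrapper (the Counted helper below is hypothetical, not one of the post's operators):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many times enumeration of the wrapped sequence is started:
// the counter increments on the first MoveNext of each new enumerator.
static IEnumerable<T> Counted<T>(IEnumerable<T> source, int[] counter)
{
    counter[0]++;
    foreach (var item in source)
        yield return item;
}

var counter = new int[1];
var nums = Counted(Enumerable.Range(1, 9), counter);

var total = nums.SelectMany(x => nums, (x, y) => x + y).Count();

Console.WriteLine(total);      // 81 pairs
Console.WriteLine(counter[0]); // 10 = 1 outer + 9 inner enumerations
```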

All of those observations give rise to the introduction of a “memoized enumerator”. A what? Memoization is a technique used in functional programming to cache the result of evaluating a function knowing that function is pure, i.e. given the same inputs it always gives back the same output. A typical example is recursive Fibonacci:

Func<uint, uint> fib = null;
fib = n => n <= 1 ? 1 : fib(n - 2) + fib(n - 1);
for (uint i = 0; i < 50; i++)
    Console.WriteLine(fib(i));

Good luck running this code – it will take forever to compute fib(50) (ignore the fact it doesn’t fit in a uint) due to the explosion of recursive steps and re-evaluations incurred. A generic memoization function will cache results:

static Func<T, R> Memoize<T, R>(Func<T, R> f)
{
    var cache = new Dictionary<T, R>();

    return t =>
    {
        R res;
        if (cache.TryGetValue(t, out res))
            return res;
        return cache[t] = f(t);
    };
}
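Closing the loop on the Fibonacci sample, routing the recursive calls through Memoize makes fib(50) tractable (a ulong result type is used here, since the value overflows a uint):

```csharp
using System;
using System.Collections.Generic;

// The memoizer from the text, unchanged.
static Func<T, R> Memoize<T, R>(Func<T, R> f)
{
    var cache = new Dictionary<T, R>();
    return t =>
    {
        R res;
        if (cache.TryGetValue(t, out res))
            return res;
        return cache[t] = f(t);
    };
}

// Because fib is assigned the memoized delegate before any call happens,
// the recursive calls inside the lambda also go through the cache: each
// argument is computed at most once, collapsing the exponential call tree.
Func<uint, ulong> fib = null;
fib = Memoize<uint, ulong>(n => n <= 1 ? 1UL : fib(n - 2) + fib(n - 1));

Console.WriteLine(fib(50)); // completes instantly
```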

This memoizer is a higher-order function that transparently (in terms of the function signature) maps a function onto one that caches evaluation results. As a result, for a given function input "t", the function "f" will be evaluated at most once (for subsequent calls with the same input, the dictionary is consulted). It should be pretty clear this is precisely what we want for our Let-enumerator as well, but instead of keying the avoidance of duplicate evaluations off the function input, we'll need to key it off the position of an enumerator. Let's discuss this.

Assume you have two consumers of the Let-input sequence: one skips ahead 5 positions, the other starts from the beginning, just as we saw before:

src.Let(nums => nums.Skip(5).Zip(nums, (l, r) => l + r))

Let's decompose the interactions here. Zip pulls its left-hand side and its first parameter one after the other, at an equal pace. First, the end result:

[Image: diagram of the end result]

I hope the above diagram makes sense, if not look at the more conceptual diagram below:

[Image: more conceptual diagram of the Zip/Skip/Let composition]

Now let's walk bottom-up: if something pulls at Zip, what does it start pulling at itself? First it calls MoveNext on its left-hand side, which is the Skip(5) block. Skip starts by fetching the first 5 + 1 items from its input, which is the Let block, in order to yield the 6th item back to the consumer. This leads to the following situation:

[Image: Zip pulls one element from Skip(5), which draws six elements from the source through Let]

The red arrows indicate the active "channels" of enumeration. Reading bottom-up, Zip requested one element from Skip(5), which in turn drew 6 items through its upstream channel connected to Let. Since Let hadn't yet received calls to retrieve elements from the source at positions 0 through 5, it pulled on its source to get those items, causing potential side-effects (indicated by the bombs). But besides just yielding them back to the Skip(5) caller, it also "memoizes" them in an indexed collection. The "@5" annotation on the outgoing Let arrows – representing enumerators – indicates the cursor position in the source sequence of the element that was last yielded through the channel.

Now that Zip has an element from its left-hand side source, it needs to fetch an element from its right-hand side source, in order for the two elements to be combined by the zipper function (in our case numerical +). This looks as follows:

[Image: Zip's right-hand side served from the memoization store at position 0]

Notice the red channel doesn’t extend all the way to the source – we already have received the element at position 0, so we can consult the memoization store to retrieve the value from there. No observable source-induced side-effects are triggered by doing so. The cursor position for the active Let-consumer is moved to 0.

Essentially, whenever a channel queries for an element below the high-water mark of the memoization store, the element is retrieved from the store instead of being pulled from the source. On to the code now. We'll start by looking at a memoizing enumerable, as we'll use that as the building block for the Let construct. First, the IEnumerable<T> implementation:

class MemoizeEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly IEnumerable<T> _source;
    private IList<T> _data;
    private IEnumerator<T> _enumSource;
    private bool _disposed;
    private bool _reachedEnd;

    public MemoizeEnumerable(IEnumerable<T> source)
    {
        _source = source;
    }

    public IEnumerator<T> GetEnumerator()
    {
        if (_disposed)
            throw new ObjectDisposedException("MemoizeEnumerable");

        if (_enumSource == null && !_reachedEnd)
        {
            _enumSource = _source.GetEnumerator();
            _data = new List<T>();
        }

        return new MemoizeEnumerator<T>(this);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private bool TryMoveNext(int cursor, out T value)
    {
        Debug.Assert(cursor <= _data.Count);

        if (_disposed)
            throw new ObjectDisposedException("MemoizeEnumerable");

        if (cursor < _data.Count)
        {
            value = _data[cursor];
            return true;
        }
        else
        {
            if (_reachedEnd)
            {
                value = default(T);
                return false;
            }

            if (_enumSource.MoveNext())
            {
                value = _enumSource.Current;
                _data.Add(value);
                return true;
            }
            else
            {
                // When we reach the end, no reason to delay the disposal of the source
                // any longer. Now we should avoid calls to MoveNext further on, so we
                // set a _reachedEnd flag to detect this case.
                _enumSource.Dispose();
                _enumSource = null;
                _reachedEnd = true;

                value = default(T);
                return false;
            }
        }
    }

    public void Dispose()
    {
        // Facilitates deterministic disposal of the underlying store. We can't rely
        // on Dispose calls on the enumerator object as we don't know the number of
        // consumers we're dealing with. (The null check covers the case where no
        // enumerator was ever requested.)
        if (!_reachedEnd && _enumSource != null)
        {
            _enumSource.Dispose();
            _enumSource = null;
        }

        _disposed = true;

        if (_data != null)
            _data.Clear();
    }

    class MemoizeEnumerator<T_> : IEnumerator<T_>
    {
        // ... (shown below)
    }
}

A MemoizeEnumerable<T> encapsulates an underlying IEnumerable<T> that represents the source. The GetEnumerator method for consumers to call in order to establish an enumerator triggers the MemoizeEnumerable to start consuming the input source unless it’s already doing so. This is the key to avoiding multiple source-consumers, which would lead to duplication of side-effects. The memoization store is simply a List<T>, which is indexable. What the MemoizeEnumerator<T> looks like will be shown soon.

Skipping over TryMoveNext for just a second, we look at Dispose first. As we’ve seen before, the number of enumerators over the exposed memoized data structure is something we can’t know upfront or infer from context. So, the disposal of enumerator objects doesn’t mean we can dispose the enumerator we established on the (memoized) source. We need a hint from the outside to say: it’s fine to let go the source (which could be holding on to, say, a network connection underneath). This will correspond to “scoped usage” of the Let construct as we shall see later.

Now on to TryMoveNext. Here the fetching of the underlying source happens, based on a passed-in cursor. We enforce linear progression, which is expected for MoveNext calls: the consuming enumerators should at most take one step forward at a time. If the cursor falls within the range of the memoization store, we simply retrieve the cached value from there. Otherwise, we fetch the next element from the underlying source (causing a potential side-effect once and for all for that element) and keep it in the store. If there are no further elements to be fetched as signaled by the source’s MoveNext call’s return value, we can dispose the enumerator over the source. To avoid subsequent calls on the disposed object, we keep a flag to signal we’ve reached the end. Notice we’re not dealing with multi-threaded consumers here – such worries are left as an exercise for the reader.

Finally, we should have a look at the MemoizeEnumerator inner class which obviously will call TryMoveNext in order to make progress over the shared source enumerator and memoization store. This type is fairly trivial:

class MemoizeEnumerator<T_> : IEnumerator<T_>
{
    private readonly MemoizeEnumerable<T_> _source;
    private int _cursor;
    private T_ _lastValue;

    public MemoizeEnumerator(MemoizeEnumerable<T_> source)
    {
        _source = source;
        _cursor = 0;
    }

    public T_ Current
    {
        get { return _lastValue; }
    }

    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        return _source.TryMoveNext(_cursor++, out _lastValue);
    }

    public void Reset()
    {
        _cursor = 0;
    }

    public void Dispose()
    {
        // No disposal here - the enumerable object does the bookkeeping of the
        // underlying enumerator object. Here we're essentially just dealing with
        // a cursor into the underlying sequence.
    }
}

All there is to a memoizing enumerator is to keep its cursor, communicate it to TryMoveNext, and increment it. Notice the assert in TryMoveNext will trigger when an ill-behaved consumer continues to call MoveNext on the enumerator after it has returned false. One could pimp this case a little bit by either keeping a _reachedEnd state inside the enumerator (not to bother the enumerable object anymore in such a case) or by keeping the _cursor value in its final position as opposed to incrementing it.

Notice the enumerator’s Dispose method is not implemented for reasons explained earlier. The only thing one could do as a meaningful implementation is a _disposed flag to throw an ObjectDisposedException if the object is used beyond its disposal state. We leave this as a trivial exercise for the reader as well.

Ultimately we’ve reached a state where we can give away the entry-point to IEnumerable.Let:

public static IEnumerable<U> Let<T, U>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<U>> function)
{
    using (var mem = new MemoizeEnumerable<T>(source))
    {
        foreach (var item in function(mem))
            yield return item;
    }
}

The most important thing here is obviously the memoization of the source, which is fed to the function for the let-body to party on. The use of a using-block inside an iterator causes that iterator's Dispose method to call Dispose on the resource in use. In other words, if one foreach'es over the result of a Let-call, exiting the loop will trigger a call to Dispose, which in turn disposes the memoized enumerable, which in turn disposes the underlying source's singleton enumerator. This means re-enumerating the resulting Let sequence itself will cause re-iteration over the underlying source. This makes sense, as we only protect against duplication of side-effects inside the let-body. If you want to avoid duplicated side-effects across re-executions of the Let operator itself, wrap it in yet another Let-call.
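This disposal chain relies on a general property of C# iterators worth making explicit: a using-block inside an iterator desugars to try/finally, and the pending finally runs when the consumer disposes the enumerator, e.g. by leaving a foreach early. A minimal stand-alone sketch (the sourceReleased flag is a hypothetical stand-in for releasing the memoized source):

```csharp
using System;
using System.Collections.Generic;

bool sourceReleased = false;

IEnumerable<int> Numbers()
{
    try
    {
        for (int i = 0; ; i++)
            yield return i;
    }
    finally
    {
        // A using-block inside an iterator desugars to exactly this pattern:
        // the finally runs when the consumer disposes the enumerator.
        sourceReleased = true;
    }
}

// Leaving the foreach early disposes the iterator, which runs the pending
// finally, mirroring how disposing Let's result releases the memoized source.
foreach (var x in Numbers())
    if (x == 2) break;

Console.WriteLine(sourceReleased); // True
```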

 

A few samples

Time for some samples to conclude this journey. We go for two source iterators that are only of interest for their side-effects:

static IEnumerable<int> GetNums(int n)
{
    try
    {
        for (int i = 0; i <= n; i++)
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.Write(i + "   ");
            Console.ResetColor();
            yield return i;
        }
    }
    finally
    {
        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine("Disposed");
        Console.ResetColor();
    }
}

static Random rand = new Random();

static IEnumerable<int> Lotto()
{
    try
    {
        for (int i = 0; i < 6; i++)
        {
            var n = rand.Next(100);
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine(n);
            Console.ResetColor();
            yield return n;
        }
    }
    finally
    {
        Console.ForegroundColor = ConsoleColor.Green;
        Console.WriteLine("Disposed");
        Console.ResetColor();
    }
}

We log produced values to the console to visualize how many times the iterator is enumerated over. Obviously we write those in red as they are bloody side-effects. We’re also interested to see proper disposal, so we log that event too. A first example using SelectMany without Let shows how bad things get as we replicate side-effects:

Console.WriteLine("Without let");
{
    var source = GetNums(10);

    var res = from x in source
              from y in source
              select x + y;

    foreach (var i in res)
        ;
}
Console.WriteLine();

The result is as follows, no less than 11 iterations over the source sequence:

[Image: console output showing 11 enumerations over the source]

Next, we eliminate the common sub-expression “source” by lifting it into a Let block. Source becomes the input to Let, and inside the let-body we represent the source with the “src” identifier:

Console.WriteLine("With let");
{
    var source = GetNums(10);

    var res = source.Let(src =>
                from x in src
                from y in src
                select x + y);

    foreach (var i in res)
        ;
}
Console.WriteLine();

Now things look much better:

[Image: console output showing a single enumeration over the source]

Obviously the computed results will be the same (not visualized here – we're just showing the side-effects), but we got rid of the excessive side-effect replication. That said, consuming a Let's output multiple times doesn't prevent the source from being consulted more than once: only inside the Let-block are consumers shielded from side-effect duplication; consumers of the Let-block itself don't get any protection:

 

Console.WriteLine("With let (twice)");
{
    var source = GetNums(10);

    var res = source.Let(src =>
    {
        Console.WriteLine("Let!");
        return from x in src
               from y in src
               select x + y;
    });

    foreach (var i in res)
        ;

    foreach (var i in res)
        ;
}
Console.WriteLine();

We enumerate twice over the output of Let, hence we see the side-effects twice (and not 22 times as would be the case if we left Let out of the picture altogether):

[Image: console output showing the side-effects occurring twice]

However, nothing prevents us from putting a Let over the Let we already have: let’s eliminate the common subexpression res in both foreach-statements as follows:

Console.WriteLine("With let (nested)");
{
    var source = GetNums(10);

    var res = source.Let(src =>
        {
            Console.WriteLine("Let!");
            return from x in src
                   from y in src
                   select x + y;
        });

    res.Let(src =>
    {
        foreach (var i in src)
            ;

        foreach (var i in src)
            ;

        return new int[] { }; // doesn't matter, we want an "imperative let"
    }).ToArray(); // forces evaluation
}
Console.WriteLine();

Sure enough, we’re back to the minimum amount of side-effects:

image

And finally, some direct uses of Memoize, assuming the following extension method on IEnumerable<T> is present:

public static MemoizeEnumerable<T> Memoize<T>(this IEnumerable<T> source)
{
    return new MemoizeEnumerable<T>(source);
}
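The MemoizeEnumerable<T> type itself was developed earlier in the post; conceptually it records the elements produced by a single pass over the source, so that every later enumeration replays from the cache instead of hitting the side-effecting source again. A simplified sketch, ignoring the thread-safety and eager-disposal refinements a production implementation would need:

```csharp
// Simplified sketch of a caching enumerable: the source is enumerated
// at most once; all consumers replay the cached prefix.
class MemoizeEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly List<T> _cache = new List<T>();
    private IEnumerator<T> _source;

    public MemoizeEnumerable(IEnumerable<T> source)
    {
        _source = source.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; ; i++)
        {
            if (i < _cache.Count)
            {
                yield return _cache[i];            // replay from the cache
            }
            else if (_source != null && _source.MoveNext())
            {
                _cache.Add(_source.Current);       // fetch once, cache forever
                yield return _cache[i];
            }
            else
            {
                yield break;                       // source exhausted
            }
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public void Dispose()
    {
        if (_source != null)
        {
            _source.Dispose();                     // release the one source enumerator
            _source = null;
        }
    }
}
```

Note the IDisposable implementation: it’s what allows the using-statements in the samples below to clean up the single source enumerator deterministically.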

Now we use our lottery iterator as a source and show that, without memoization, the sequences are not equal: the source is enumerated multiple times, and hence different random numbers are picked for its two uses:

Console.WriteLine("Without memoize");
{
    var src = Lotto();
    Console.WriteLine(src.SequenceEqual(src));
}
Console.WriteLine();

Console.WriteLine("With memoize");
{
    using (var src = Lotto().Memoize())
        Console.WriteLine(src.SequenceEqual(src));
}
Console.WriteLine();

With memoization in use, the two sequences obviously ought to be the same every time:

image
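For completeness, the Lotto iterator was defined earlier in the post as a side-effecting source of random numbers; it might look along these lines (a sketch — the count and the number range are assumptions):

```csharp
// Hypothetical sketch of the Lotto iterator: an impure sequence that
// draws fresh random numbers on every enumeration, since the iterator
// body re-executes for each call to GetEnumerator.
static readonly Random rand = new Random();

static IEnumerable<int> Lotto()
{
    for (int i = 0; i < 6; i++)
        yield return rand.Next(1, 43); // a new draw per enumeration pass
}
```

Because the generator advances on every draw, enumerating Lotto() twice yields two different sequences, which is exactly what SequenceEqual detects above.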

One more thing to illustrate is the use of infinite streams by means of a Repeat operator:

public static IEnumerable<T> Repeat<T>(T value)
{
    while (true)
        yield return value;
}

public static IEnumerable<T> Repeat<T>(IEnumerable<T> source)
{
    while (true)
        foreach (var item in source)
            yield return item;
}

It should be clear that the use of eagerly evaluated operators like ToArray and ToList won’t work as a side-effect-avoiding technique in such a setting. However, with Memoize one can eliminate side-effects without getting stuck:

Console.WriteLine("Memoize prevents _|_ out");
{
    using (var mem = Enumerable.Repeat(1).Memoize())
        Console.WriteLine(mem.Take(10).Count());
}
Console.WriteLine();

Console.WriteLine("ToList may _|_ out");
{
    Console.WriteLine(Enumerable.Repeat(1).ToList().Take(10).Count());
}
Console.WriteLine();

The samples above are of interest not for their side-effects (though one can easily come up with such samples) but for their infinite character. Use of ToList will exhaust the managed heap quickly, while Memoize only allocates memory for the objects that are actually requested downstream (in the case above, the first ten, due to Take(10)).

image 

Wonder about the _|_ notation? This is the “bottom type”, one way to type non-termination. As an aside, we don’t have such a thing in the world of .NET, which is a pity. For example, if you have an exception-throwing helper method:

static void ThrowAlways()
{
    throw new MyException(/* localization fluff */);
}

and you call the above from a function to signal an exception:

static int Bar()
{
    if (cond)
        …
    else
        ThrowAlways();
}

the compiler will complain that “not all code paths return a value”. Void is not the same as bottom: flow analysis cannot know the method won’t ever return. With a bottom type, e.g. using a keyword “never” (or juicier ones like “blackhole”, “getstuck”, “goodbye”, “farewell”), one could say:

static never ThrowAlways()
{
    throw new MyException(/* localization fluff */);
}

Now the compiler would check statically that ThrowAlways can never reach its end (either by throwing an exception, looping infinitely or calling other “never”-returning methods). At the call-site, the compiler would be happy with Bar’s definition as well since it knows ThrowAlways never returns, hence there’s no need to be worried about not having a return value for the enclosing method. And of course, if we also had union types, we could (or maybe should, depending on how pesky the compiler would be made) warn the user that Bar may never return:

static (never | int) Bar()
{
    if (cond)
        …
    else
        ThrowAlways();
}

But all of this deserves its own discussion another time.

 

Conclusion

If you don’t want to tame your side-effects, you should at least have a good notion of where they happen and how they may get replicated by mixing lazy and eager evaluation in languages like C#. Enumerable.Let shows one approach to taming side-effects in the context of LINQ queries over side-effecting computed sequences (iterators).

If you can’t get enough of this stuff, check out Erik Meijer’s talks on Fundamentalist Functional Programming. Credit where credit is due: the Enumerable.Let exercise shown here presents the dual of IObservable’s Let operator (from the Reactive LINQ framework Erik and his team have been working on lately), which I’ll talk about another time.

Left as a challenge for the reader: can you think of a way to port let’s brother “letrec” to the world of IEnumerable<T> as well? Let completeness be your source of inspiration :-).



Comments


# re: Taming Your Sequence’s Side-Effects Through IEnumerable.Let

Saturday, September 12, 2009 6:09 AM by Nick Palladinos

It looks like F#'s Seq.cache

let Let s f = s |> Seq.cache |> f


# re: Taming Your Sequence’s Side-Effects Through IEnumerable.Let

Saturday, September 12, 2009 7:42 AM by Jonathan Pryor

What license is this code under?  If MIT/X11 is acceptable, would you mind if I added these extension methods to Mono.Rocks (anonsvn.mono-project.com/.../rocks)?

Thanks,

- Jon


# re: Taming Your Sequence’s Side-Effects Through IEnumerable.Let

Sunday, September 13, 2009 1:19 AM by Nick Palladinos

public static IEnumerable<U> LetRec<T, U>(this IEnumerable<T> source, Func<Func<IEnumerable<T>, IEnumerable<U>>, Func<IEnumerable<T>, IEnumerable<U>>> function)
{
    Func<Func<IEnumerable<T>, IEnumerable<U>>, Func<IEnumerable<T>, IEnumerable<U>>> letWrap =
        f => x => Let(x, function(f));

    return Y(letWrap)(source);
}





# re: Taming Your Sequence’s Side-Effects Through IEnumerable.Let

Thursday, September 17, 2009 8:09 AM by Evan Carroll

Making people /execute/ code they don't understand to prove a point is a novice writing technique. Please, in the future, tell us what the code does in as much detail as you tell of the code. I'm reading your introduction, I see code you claim to have side-effects, but you don't drive the point home. I don't know what it throws! My C# is beginner level (though I'm halfway through the MS C# 3.0 book), but otherwise I have a solid programming background. Cut to the point, and lower the information cost of your message.


# re: Taming Your Sequence’s Side-Effects Through IEnumerable.Let

Friday, September 18, 2009 7:45 PM by bart

Hi Evan,

I'm sorry if you didn't get the point I wanted to make in this post. Though I've stated what I consider to be side-effects pretty early on: "The side-effects in the samples above are of various kinds: exceptions and output operations all hurt in a functional setting." So, in subsequent samples I use Console.WriteLine as the side-effect of choice. Then I move on to state we're seeing side-effects as everything that's not a pure function (e.g. reading from a random number generator, as stated in the _introduction_). Personally, I still think the post's introduction sufficiently states what we're going to talk about and how I see things, but feel free to disagree.

This said, I believe the best way to learn is to try things out yourself. Step through the code in a debugger to see what it exactly does. I've explained numerous times how LINQ to Objects queries execute by means of their chained iterators (set breakpoints on the various clauses, like where's predicate, and get surprised about the order they hit in). Sure, I could repeat that over and over again (I even partially did with the diagram of the "data flow network")...

Hope this helps,

-Bart



