Monday, December 28, 2009 1:07 AM bart

More LINQ with System.Interactive – Sequences under construction

With the recent release of the Reactive Extensions for .NET (Rx) on DevLabs, you’ll hear quite a bit about reactive programming, based on the IObservable<T> and IObserver<T> interfaces. A great amount of resources is available on Channel 9. In this series, I’ll focus on the dual of the System.Reactive assembly, which is System.Interactive, providing a bunch of extensions to the LINQ Standard Query Operators for IEnumerable<T>. In today’s installment we’ll talk about constructor operators provided by EnumerableEx:

image

 

Constructing sequences

In order to perform operations over sequences using various combinators and operators, it’s obviously a prerequisite to have such sequences available. While collection types in the .NET Framework implement IEnumerable<T> (or the non-generic counterpart, bridgeable to LINQ using the Cast<T> Standard Query Operator), one often wants to construct sequences on the spot. Moreover, sequences often should have a lazy nature as their persistence in memory may be problematic or infeasible (infinite sequences). For all those reasons, constructor operators come in handy.

LINQ to Objects already has a constructor function called Enumerable.Range to produce a sequence with a integral numbers starting from a certain value, returning the asked amount of numbers lazily:

// Imperative
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(i);
}

// LINQish
Enumerable.Range(start: 0, count: 10).Run
(
    Console.WriteLine
);

The lazy nature should not be underestimated, as one could create infinite sequences representing the potential to produce a certain (ordered) set of objects. When combined with other restriction operators it becomes possible to use composition to limit the produced results in a manner very close to the domain we’re talking about. For example, positive natural numbers are integer numbers larger or equal to zero. Numbers starting with 5 are the numbers, capped by means of a Skip operation or something similar. Taking a number of elements can be done using Take. Without deviating too much from our today’s blogging mission, here’s what I’m alluding to:

static IEnumerable<int> Integer()
{
    for (int i = int.MinValue; i < int.MaxValue; i++)
        yield return i;

    yield return int.MaxValue;
}
var ints = Integer();
var nats = from i in ints where i >= 0 select i;
var some = nats.Skip(5).Take(5); // Good luck :-)
some.Run(Console.WriteLine);

I’ll leave it to the reader as a challenge to come up with ways to optimize this in a variety of ways whilst preserving the declarative nature on the use site (i.e. make the sarcastic “Good luck” go away).

Back to Rx: in today’s installment we’ll look at various constructor functions in EnumerableEx.

 

Return and the cruel return of the monad

The simplest constructor function is Return, simply yielding the single value specified on demand. It’s similar to a one-element array and that’s about it from a practical point of view:

public static IEnumerable<TSource> Return<TSource>(TSource value);

You should be able to guess the implementation of the operator for yourself. Use is straightforward as shown below:

EnumerableEx.Return(42).Run(Console.WriteLine);

One interesting thing about this constructor function is its signature, going from TSource to IEnumerable<TSource>. This is nothing but the return function (sometimes referred to as unit) used on a monad, with a more general signature of T to M<T>, the little brother to the bind function which has signature M<T> –> (T –> M<R>) –> M<R>, also known as SelectMany in LINQ. The triplet (known as a Kleisli triple) of the type constructor M (in LINQ the particular cases of IEnumerable<T> and IQueryable<T> are used, i.e. not a general type constructor), the unit and bind function form a monad.

image

For a great overview of Language Integrated Monads, have a look at Wes Dyer’s The Marvels of Monads post. For a more foundational paper (with lots of applications though), have a look at Philip Wadler’s Monads for Functional Programming paper.

 

Throw me an exception please

Another singleton constructor is the Throw function that we’ve seen repeatedly in the previous post on exception handling over sequences. Its role is to provide an enumerable that will throw an exception upon the first MoveNext call during enumeration:

public static IEnumerable<TSource> Throw<TSource>(Exception exception);

In fact, this is a lazily thrown exception constructor. Use is simple again:

EnumerableEx.Throw<int>(new Exception()).Run();

Notice you got to specify the element type for the returned (never-yielding) sequence as we’re constructing an IEnumerable<T> and there’s no information to infer T from. Obviously, the resulting sequence can be combined with other sequences of the same type in various places, e.g. using Concat. Below is a sample of how to use the Throw constructor with SelectMany to forcefully reject even numbers in a sequence (rather than filtering them out):

var src = Enumerable.Range(1, 10);//.Where(i => i % 2 != 0);
var res = src.SelectMany(i =>
    i % 2 == 0
    ? EnumerableEx.Throw<int>(new Exception("No evens please!"))
    : EnumerableEx.Return(i)
);
res.Run(Console.WriteLine);

Here we use the conditional operator to decide between an exception throwing sequence or a singleton element sequence (in this case, “Many” in “SelectMany” has “Single” semantics).

 

Empty completing the triad

Since the introduction of LINQ in .NET 3.5 (thanks to reader Keith for reminding me about my heritage), there’s been an Empty constructor as well, with the following signature and implementation:

public static IEnumerable<TSource> Empty<TSource>()
{
    yield break;
}

There seems little use for this though I challenge the reader to use this one to build the Where operator using SelectMany. In fact, the reason I say “for completeness” is illustrated below:

image

 

StartWith = Snoc (or Cons in disguise)

People familiar with LISP, ML, Scala, and many other functional languages, will know the concept of cons by heart. Cons is nothing but the abbreviation for “construct” used to create a bigger list (in LISP lingo) out of an existing list and an element to be prepended:

(cons 1 (cons 2 nil))

The above creates a list with 1 as the head and (cons 2 nil) as the tail, which by itself expands into a cell containing 2 and a tail with the nil (null) value. The underlying pair of the head value and tail “reference” to the tail list is known as a cons cell. Decomposition operators exist, known as car and cdr (from old IBM machine terminology where cons cells were realized in machine words consisting of a so called “address” and “decrement” register, explaining the a and d in car and cdr – c and r stand for content and register respectively):

(car (cons 1 2)) == 1
(cdr (cons 1 2)) == 2

The StartWith operator is none other than Cons in reverse (sometimes jokingly referred to as “Snoc” by functional programmers):

public static IEnumerable<TSource> StartWith<TSource>(this IEnumerable<TSource> source, params TSource[] first);
public static IEnumerable<TSource> StartWith<TSource>(this IEnumerable<TSource> source, TSource first);

Focus on the second one first. See how the “first” parameter is taken in as the second argument to StartWith. The reason is it’d be very invasive to put the extension method this parameter on the “first” parameter, as it would pollute all types in the framework with a “Cons” method:

public static IEnumerable<TSource> Cons<TSource>(this TSource head, IEnumerable<TSource> tail);

So, StartWith has to be read in reverse as illustrated below:

EnumerableEx.StartWith(
    EnumerableEx.StartWith(
        EnumerableEx.Return(3),
        2
    ),
    1
).Run(Console.WriteLine);

This prints 1, 2, 3 since 2 is put in front of 3 and 1 in front of that { 2, 3 } result. An overload exists to start a sequence with multiple elements in front of it:

EnumerableEx.StartWith(
    EnumerableEx.Return(3),
    1, 2
).Run(Console.WriteLine);

image

 

Generate is your new anamorphism

Generate is the most general constructor function for sequences you can imagine. It’s the dual of Aggregate in various ways. Where Aggregate folds a sequence into a single object by combining elements in the input sequence onto a final value in a step-by-step way, the Generate function unfolds a sequence out of a generator function also in a step-by-step way. To set the scene, let’s show the power of Aggregate by refreshing its signature and showing how to implement a bunch of other LINQ combinators in terms of it:

public static TResult Aggregate<TSource, TAccumulate, TResult>(this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func, Func<TAccumulate, TResult> resultSelector);

Given a seed value and a function to combine an element of the input sequence with the current accumulator value into a new accumulator value, the Aggregate function can produce a result that’s the result of (left-)folding all elements in the sequence one-by-one. For example, a sum is nothing but a left-fold thanks to left associativity of the numerical addition operation:

1 + 2 + 3 + 4 + 5 = ((((1 + 2) + 3) + 4) + 5)

The accumulated value is the running sum of everything to the left of the current element. Seeing the elements of a sequence being eaten one-by-one is quite a shocking catastrophic event for the sequence, hence the name catamorphism. Below are implementations of Sum, Product, Min, Max, FirstOrDefault, LastOrDefault, Any and All:

var src = Enumerable.Range(1, 10);

Console.WriteLine("Sum = " + src.Aggregate(0, (sum, i) => sum + i));
Console.WriteLine("Prd = " + src.Aggregate(1, (prd, i) => prd * i));
Console.WriteLine("Min = " + src.Aggregate(int.MaxValue, (min, i) => i < min ? i : min));
Console.WriteLine("Max = " + src.Aggregate(int.MinValue, (max, i) => i > max ? i : max));
Console.WriteLine("Fst = " + src.Aggregate((int?)null, (fst, i) => fst == null ? i : fst));
Console.WriteLine("Lst = " + src.Aggregate((int?)null, (lst, i) => i));
Console.WriteLine("AlE = " + src.Aggregate(true, (all, i) => all && i % 2 == 0));
Console.WriteLine("AnE = " + src.Aggregate(false, (any, i) => any || i % 2 == 0));

As the dual to catamorphisms we find anamorphisms, where one starts from an initial state and generates elements for the resulting sequence. I leave it to the reader to draw parallels with others words starting with ana- (from the Greek “up”). The most elaborate signature of Generate is shown below:

public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, bool> condition, Func<TState, IEnumerable<TResult>> resultSelector, Func<TState, TState> iterate);

To see this is the dual to Aggregate, you got to use a bit of fantasy, but you can see the parallels. Where Aggregate takes in an IEnumerable<TSource> and produces a TResult, the Generate function produces an IEnumerable<TResult> from a given TState (and a bunch of other things). On both sides, there’s room for an initial state and a way to make progress (“func” versus “iterate”) both staying in their respective domains for the accumulation type (TAccumulate and TState). To select the result (that will end up in the output sequence), the overload above allows to produce multiple TResult values to be returned per TState. And finally, there’s a stop condition which is implicit in the case of a catamorphism as the “remaining tail of sequence is empty” condition can be used for it (i.e. MoveNext returns false).

Another way to look at Generate is to draw the parallel with a for loop’s three parts: initialization, termination condition, update. In fact, Generate is implemented as some for-loops. More signatures exist:

public static IEnumerable<TValue> Generate<TValue>(this Func<Notification<TValue>> function);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, IEnumerable<TResult>> resultSelector, Func<TState, TState> iterate);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, Notification<TResult>> resultSelector, Func<TState, TState> iterate);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, bool> condition, Func<TState, IEnumerable<TResult>> resultSelector, Func<TState, TState> iterate);
public static IEnumerable<TResult> Generate<TState, TResult>(TState initial, Func<TState, bool> condition, Func<TState, TResult> resultSelector, Func<TState, TState> iterate);

We’ll discuss the ones with Notification<T> types in the next episode titled “Code = Data”, but the remaining three others are all straightforward to understand. Some lack a terminating condition while others lack the ability to yield multiple results per intermediate state. Below is a sample of Generate to produce the same results as Enumerable.Range:

Func<int, int, IEnumerable<int>> range = (start, count) => EnumerableEx.Generate(0, i => i < count, i => i + start, i => i + 1);

The other constructors we’ve seen can be written in terms of Generate as well:

Func<IEnumerable<int>> empty = () => EnumerableEx.Generate<object, int>(null, o => false, o => null, o => o);
Func<int, IEnumerable<int>> @return = i => EnumerableEx.Generate<int, int>(0, n => n < 1, o => new [] { i }, n => n + 1);
Func<Exception, IEnumerable<int>> @throw = ex => EnumerableEx.Generate<object, int>(null, o => true, o => { throw ex; return null; }, o => o);
Func<int, IEnumerable<int>, IEnumerable<int>> cons = (a, d) => EnumerableEx.Generate<int, int>(0, n => n < 2, o => o == 0 ? new [] { a } : d, n => n + 1);

@return(1).Run(Console.WriteLine);
@throw(new Exception()).Catch((Exception ex) => @return(22)).Run(Console.WriteLine);
cons(1, cons(2, cons(3, empty()))).Run(Console.WriteLine);

 

Defer what you can do now till later

The intrinsic lazy nature of sequences with regards to enumeration allows us to push more delayed effects into the sequence’s iteration code. In particular, the construction of a sequence can be hidden behind a sequence of the same type. Let’s show a signature to make this more clear:

public static IEnumerable<TSource> Defer<TSource>(Func<IEnumerable<TSource>> enumerableFactory);

In here, an IEnumerable<TSource> is created out of a factory function. What’s handed back from the call to Defer is a stub IEnumerable<TSource> that will only call its factory function (getting the real intended result sequence) upon a triggered enumeration. An example is shown below:

var xs = EnumerableEx.Defer(() =>
{
    Console.WriteLine("Factory!");
    return EnumerableEx.Return(1);
});

Console.ReadLine();

xs.Run(Console.WriteLine);
xs.Run(Console.WriteLine);

In here, the Factory message won’t be printed till something starts enumerating the xs sequence. Both calls to Run do so, meaning the factory will be called twice (and could in fact return a different sequence each time).

image

 

Next on More LINQ

More duality, this time between “code and data” views on sequences, introducing Notification<T>.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# Twitter Trackbacks for More LINQ with System.Interactive ??? Sequences under construction - B# .NET Blog [bartdesmet.net] on Topsy.com

Pingback from  Twitter Trackbacks for                 More LINQ with System.Interactive ??? Sequences under construction - B# .NET Blog         [bartdesmet.net]        on Topsy.com

# Dew Drop &#8211; December 28, 2009 | Alvin Ashcraft&#039;s Morning Dew

Pingback from  Dew Drop &#8211; December 28, 2009 | Alvin Ashcraft&#039;s Morning Dew

# Еще больше LINQ с System.Interactive

Monday, December 28, 2009 9:33 AM by progg.ru

Thank you for submitting this cool story - Trackback from progg.ru

# .NET System.Interactive and Monadic type systems

Monday, December 28, 2009 3:27 PM by .NET System.Interactive and Monadic type systems

Pingback from  .NET System.Interactive and Monadic type systems

# re: More LINQ with System.Interactive – Sequences under construction

Monday, December 28, 2009 11:07 PM by Keith Dahlby

Great series so far, Bart - looking forward to more!

And FWIW, Empty() isn't missing - it already exists in System.Linq.Enumerable.

Cheers ~

Keith

Where<T>(predicate) = SelectMany<T,T>(x => predicate(x) ? Return(x) : Empty<T>())

# The Morning Brew - Chris Alcock &raquo; The Morning Brew #506

Tuesday, December 29, 2009 3:58 AM by The Morning Brew - Chris Alcock » The Morning Brew #506

Pingback from  The Morning Brew - Chris Alcock  &raquo; The Morning Brew #506

# re: More LINQ with System.Interactive – Sequences under construction

Tuesday, December 29, 2009 3:58 AM by Alexey Romanov

Actually, snoc is not just cons with flipped arguments; snoc adds an element to the _end_ of the list.

See, e.g. www.eecs.ucf.edu/.../flat-recursion.txt trevion.blogspot.com/.../little-knowledge-gets-you-long-way.html

public static IEnumerable<TSource> EndWith<TSource>(this IEnumerable<TSource> source, TSource last);

would be snoc.

# Reactive Extensions for .NET (Rx) &laquo; Just Justin&#039;s

Saturday, February 06, 2010 3:45 AM by Reactive Extensions for .NET (Rx) « Just Justin's

Pingback from  Reactive Extensions for .NET (Rx)  &laquo; Just Justin&#039;s

# Reactive Extensions for .NET ( “stuff happens” )

Wednesday, August 18, 2010 7:18 AM by Mike Taulty's Blog

I’ve been taking a look at the Reactive Extensions for .NET. It’s early days for me at this point but