Thursday, April 23, 2009 3:09 AM bart

LINQ to Objects - Debugging

Introduction

Declarative language constructs like query comprehension syntax often worries imperatively trained developers. I hear this quite a bit, and the excuse of “It Just Works” is often not satisfactory for most of them :-). Combine this with interesting behavioral differences like lazy evaluation and lots of developers get lost in paradise.

Actually the perceived problem is not with LINQ itself, typically a lack of solid understanding about the query execution model causes grief. So one thing that can help to address this problem is a better visualization of how a query executes. Advanced query “providers” like LINQ to SQL offer logging capabilities to inspect what’s going on, but LINQ to Objects lacks such a capability.

In this post, we’ll have a look at possible approaches to make debugging LINQ to Objects (and hence LINQ to XML) queries easier. At the end of the day you’ll come to the conclusion it all boils down to knowing precisely what the semantics of the various operators are and how the execution works in the face of laziness etc.

 

Visual Studio 2008 support

First things first, so let’s have a look at what Visual Studio 2008 gives us with regards to debugging LINQ queries. To begin with, you can set breakpoints on entire query comprehensions (or the statement in which they occur):

image

This is not useful for the debugging of the query itself. Even more, it’s actually distracting: breaking on the line marked above and continuing execution (F10) doesn’t really do much interesting things. That is, the query doesn’t execute here. LINQ queries are evaluated lazily, only when someone starts to consume them (like foreach or some greedy operator like ToArray, ToList, etc) things of interest start to happen.

The following setup is more useful:

image

Notice how the “Results View” reveals the lazy nature of the query. I’m actually breaking inside the foreach loop, but this pays us little or nothing: at that point we already observe the results and the fact we’re debugging a query expression likely reveals that we’re not satisfied by some of the results produced. We need something in between showing the results and the declaration of the query…

Note: In the screenshot above you can see some weird animal – WhereSelectArrayIterator. This reflects performance optimizations carried out in the 3.5 SP1 release where consecutive where and select clauses are optimized into one iterator. In addition, the knowledge the query is carried out over an array can be used to exploit characteristics of arrays to boost performance as well.

One thing you can do, is set breakpoints on the bodies of the various clauses in the query expression, like this:

image

To set those breakpoints, put your cursor somewhere in the clause’s body and press F9.

image

Now you can see when certain clauses hit, e.g. where in the case above, and see how objects flow through the query. For example, once we get to Split we’ll first hit the where clause and then the select clause with the same object:

image

As the object has flown through the whole query it finally reaches the body of the foreach-loop:

image

While this can definitely help, together with conditional breakpoints and such, it might become tedious when dealing with more advanced queries and/or big input sequences. Also, nested queries and joins can be puzzling when you observe the execution behavior this way.

 

A naive logger

One thing that can help to get more global insight in the query’s execution is to have a global kind of logger. Essentially we want to add inspection points somewhere in the query (much like doing electronics by measuring certain connection points) and log output so we can see how each object flows through the query execution pipeline.

Let’s start by looking at the query above in a more under-the-hood manner, by expanding the query syntax into a chain of query operator method calls. This reveals little dots which you can conceptually compare to our desired measurement points (almost as in electronics again :)).

image

How can we make those dots observable? Obviously, the measurement should not change the observation (ignoring Heisenberg’s uncertainty principle here). In essence, we want to inject some observer to watch the flow of the data. One way to do this is by defining our own extension method. Here’s what you might think would be meaningful:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return input;
    }
}

With the following use:

var res = typeof(string).GetMethods()
    .Where(m => m.Name.StartsWith("S"))
    .Watch()
    .Select(m => m.Name)
    .Watch();

Not really useful actually because we don’t observe objects flowing through the pipeline. We’ll just get one call to Watch, when the query expression itself is executed. We really want to participate in the query execution by pulling items from the source (whatever occurs on the left-hand side of the Watch call) and yielding them back to our downstream consumer as-is. For example, if a foreach starts pulling on “res”, it pulls on Watch, which should pull on “Select” yielding back its results one by one, again causing the previous Watch the be pulled, and so on. Here’s a better attempt:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        foreach (var item in input)
            yield return item;
    }
}

Next we need to do something meaningful inside the loop. Okay, we could set a breakpoint, but our starting point of this whole exercise was to create a logger so we can analyze a whole query execution run afterwards. To keep things simple, we’ll just log to the console. We could simply do a ToString or we could take in a function that allows overriding the “printf” behavior (as objects might be complicated or might not have ToString overridden in a meaningful way):

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return Watch(input, item => item != null ? item.ToString() : "(null)");
    }

    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input, Func<T, string> toString)
    {
        foreach (var item in input)
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine(toString(item));
            Console.ResetColor();
            yield return item;
        }
    }
}

This should look relatively straightforward. Let’s give our query a dry-run:

image

Not too bad; yellow is logging output and white is the program’s main output. Using our overload we could add more diagnostic information as well, and let’s go crazy by adding a watch to the output of the original source:

 

var res = typeof(string).GetMethods()
    .Watch(m => "SOURCE: " + m)
    .Where(m => m.Name.StartsWith("S"))
    .Watch(m => "WHERE:  " + m.Name)
    .Select(m => m.Name)
    .Watch(m => "SELECT: " + m /* already name, see output of Select! */);

This produces the following:

image

Sweet – we get to see where objects get dropped and such. But we want more, don’t we? A first limitation of this approach is that we need to fall back to non-LINQ syntax, which makes it hard to instrument a query “on the spot”. How to solve this? We can’t add keywords obviously, so what about piggybacking on an existing keyword? Let’s pick one that preserves the type of its input when producing output. What about Where:

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate);

The predicate isn’t very satisfactory though. We need something else there, a “string”-valued predicate more specifically. Not really a predicate after all, as we’re overloading Where for debugging purposes only. If you think that every string is “true”, you have an all-pass-filter really. Let’s implement it:

public static class EnumerableDebugger
{
    public static IEnumerable<T> Watch<T>(this IEnumerable<T> input)
    {
        return Where(input, item => item != null ? item.ToString() : "(null)");
    }

    public static IEnumerable<T> Where<T>(this IEnumerable<T> input, Func<T, string> toString)
    {
        foreach (var item in input)
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine(toString(item));
            Console.ResetColor();
            yield return item;
        }
    }
}

Hah – just change Watch into Where and you’re done. Now you get to write something like this:

var res = from m in typeof(string).GetMethods()
          /* log */ where "SOURCE: " + m
          where m.Name.StartsWith("S")
          /* log */ where "WHERE:  " + m.Name
          select m.Name into m
          /* log */ where "SELECT: " + m
          select m;

Cool – inline logging. To put a logger at the tail of the query, I had to continue the query using “into”, but that’s a relatively little detail (and at that point it doesn’t matter much as the results of the query will be fed in to the consumer as-is). And sure enough, if you try to run this, you’ll get precisely the same output as before.

 

A sophisticated logger

Maybe the above suffices, but as usual we ask ourselves the question whether we can do better than this? What would better mean anyway? Well, what about a less invasive way of doing this kind of debugging? Typically you want everything logged to make total sense of the output, so why do we have to put the “watch” between every two clauses? In addition, we might want to log information about the query too, such as string-friendly representations of predicates or so. Why? If we have a lot of those and every Where call would be logged as “where” it’s hard to distinguish them (or search for them, for that manner).

Another thing that’s useful to log is the thread of execution. LINQ is really a virtually multi-threaded execution environment. Not in the classic OS sense of the word, but in a conceptual sense. Every sequence of objects is something that can make progress independently from one another; e.g. Join can pull on two sources causing both to make progress. And those two sources by themselves might come from yet other origin sequences, such as another Join or a Concat or a SelectMany or what have you. Seeing exactly which “thread” is producing values becomes useful information to follow the flow of objects.

So, what if we could re-instrument our original query in a manner as straightforward as this?

var res = from m in typeof(string).GetMethods().Debug("source")
          where m.Name.StartsWith("S")
          select m.Name;

Here the string passed to Debug represents a friendly name for the “thread”. If we were to join in another source we could mark that one with a different name to distinguish them. We’ll come to that point later, but first: how does this work?

Obviously we need some Debug extension method on IEnumerable<T>, but how can that influence all downstream query operators? The answer is we need to get rid of the IEnumerable<T> and replace it by something more specific, so that method resolution for the query operator methods picks our hijacked versions. Doesn’t make sense yet? Let’s have a look:

public static class DebugEnumerable
{
    public static EnumerableDebugger<T> Debug<T>(this IEnumerable<T> input, string name)
    {
        return Debug<T>(input, name, Console.Out);
    }

    public static EnumerableDebugger<T> Debug<T>(this IEnumerable<T> input, string name, TextWriter logger)
    {
        return new EnumerableDebuggerSource<T>(input, logger, name);
    }

Notice the return type of the Debug<T> method. This means that the source part of our original query, as shown below, will have EnumerableDebugger<MethodInfo> as its type, instead of IEnumerable<MethodInfo>:

typeof(string).GetMethods().Debug("source")

Consecutive query operator calls will now be resolved against this EnumerableDebugger<T>, the most specific type the compiler knows about, first. Because we want the result still to be enumerable, EnumerableDebugger<T> still implements IEnumerable<T>. So all we need to do now is “overload” the query operator methods on our debugger type.

To begin with, let’s have a peek at the EnumerableDebugger<T> type itself:

public class EnumerableDebugger<T> : IEnumerable<T>
{
    protected IEnumerable<T> _source;
    protected string _name;
    private TextWriter _logger;

    internal EnumerableDebugger(IEnumerable<T> source, TextWriter logger, string name)
    {
        _source = source;
        _logger = logger;
        _name = name;
    }

    internal TextWriter Log { get { return _logger; } }
    internal string SourceName { get { return _name; } }

    public IEnumerator<T> GetEnumerator()
    {
        return GetEnumeratorCore();
    }

    protected virtual IEnumerator<T> GetEnumeratorCore()
    {
        return _source.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    internal void LogOperator(string format, params object[] args)
    {
        Log.WriteLine("~" + _name + " " + string.Format(format, args));
    }
}

A fairly straightforward type, supporting enumeration and keeping track of a few pieces of state. The important bit is that it encapsulates an existing IEnumerable<T> and surfaces it again in some way, to be specified further. The LogOperator method logs a message, prefixing it with the source name, which indicates the thread of execution (notice the ~ syntax).

But our Debug<T> method on IEnumerable<T> returned an EnumerableDebuggerSource<T>, didn’t it? Exactly, basically to override the meaning of the enumeration code in order to do some logging:

public class EnumerableDebuggerSource<T> : EnumerableDebugger<T>
{
    internal EnumerableDebuggerSource(IEnumerable<T> source, TextWriter logger, string name)
        : base(source, logger, name)
    {
    }

    protected override IEnumerator<T> GetEnumeratorCore()
    {
        foreach (var item in _source)
        {
            LogOperator("-> {0}", item);
            yield return item;
        }
    }
}

This guy doesn’t do anything but enumerating over the original source, much like our Watch method did, using the logger infrastructure to write information about items to the screen. We use an item’s ToString method here to keep things simple (and actually specifying a special “ToString” with this approach gets harder compared to the previous “naive logger” attempt…).

So far, everything is just an all-pass-filter again. But now we need to tackle the query operators and do meaningful logging for those as well. Let’s begin with the Where operator:

public static EnumerableDebugger<T> Where<T>(this EnumerableDebugger<T> input, Expression<Func<T, bool>> predicate)
{
    string s = predicate.ToString();
    var f = (Func<T, bool>)predicate.Compile();
    Func<T, bool> log = t =>
    {
        var res = f(t);
        input.LogOperator(".Where({0}) ({1}) {2}", s, t, res ? "-> " + t : "");
        return res;
    };
    return new EnumerableDebugger<T>(Enumerable.Where(input, log), input.Log, input.SourceName);
}

Overwhelming? Not really. Follow me along. First we change the predicate to an Expression<Func<T, bool>> instead of just a Func<T, bool>. This turns the lambda expression fed in to the second argument (really the first when calling the method using extension method syntax) into an expression tree. All we need this for is to be able to “ToString” it in order to have a recognizable print-out of the operator we’re processing (e.g. when we have multiple Where calls on different “threads”). Notice we don’t care much about performance in the context of a debugger extension…

Now the implementation itself. First we compile the predicate to a delegate so we can execute it just as if it were a Func<T, bool> to begin with. This essentially invokes the C# compiler logic at runtime (the tip of the Compiler as a Service, or CaaS, iceberg). I told you the only reason to have an expression tree was to print it here, so we get rid of the code-as-data representation as soon as we can.

The lambda expression, named log, on the third line is where the magic happens. Here we close over the outer scope to evaluate the incoming item “t” (of type T) just using the predicate (“f”) the user specified, which gives us a Boolean value “res”. All we do with that is returning it, but before we do so – and that’s the crux – we emit a log entry:

.Where({0}) ({1}) {2}

with {0} the string-representation of the predicate (something like m => m.Name.StartsWith(“S”)), {1} substituted for the ToString of the item being processed and {2} the outcome of the predicate check, followed with the item again in case it succeeded. This gives you output of the form:

.Where(m => m.Name.StartsWith(“S”)) (String.Split) –> String.Split

So, the –> is generalized as “produces”. By logging through the LogOperator method, we’ll get a prefix that indicates the source we’re dependent on.

Finally, our Where method hijacked predicate is fed into the real LINQ to Objects implementation by calling Enumerable.Where:

return new EnumerableDebugger<T>(Enumerable.Where(input, log), input.Log, input.SourceName);

A few remarks here:

  • As we pass in the altered-by-adding-logging-code predicate lambda, Enumerable.Where will start calling that one, causing logging to happen. That’s what this diversion trick is all about.
  • The result of Enumerable.Where is an IEnumerable<T>, which we should not let go as-is. Why? Because in that case, subsequent query operators (like Select in the sample) would go straight to LINQ to Objects. Instead, we want to hijack that one again, so we wrap the output of Enumerable.Where in one of our EnumerableDebugger<T> objects again.
  • We pass on the source name as-is, because Where doesn’t Join existing sources or so; we’re basically still on the thread of execution that ultimately causes the original source to be pulled (more about this further on).

You can guess what Select does…

public static EnumerableDebugger<R> Select<T, R>(this EnumerableDebugger<T> input, Expression<Func<T, R>> project)
{
    string s = project.ToString();
    var f = (Func<T, R>)project.Compile();
    Func<T, R> log = t =>
    {
        var res = f(t);
        input.LogOperator(".Select({0}) ({1}) -> {2}", s, t, res);
        return res;
    };
    return new EnumerableDebugger<R>(Enumerable.Select(input, log), input.Log, input.SourceName);
}

Same deal in principle. Turn it into motion:

image

We hit the breakpoint for the “Split” method, so let’s see what has been printed to the screen before the breakpoint was hit:

image

Quite diagnostic, isn’t it? Here you can see none of the previous methods passed the Where filter. Split was the first one to go through and enter the Select projection ultimately to reach our foreach loop consuming the query. Okay, this quite verbose, but here’s everything you can dream of.

Now it becomes interesting to think about how to log other operators. After all, we need to hijack most of the standard query operators to reach full “debuggability” here. For example, a possible Skip “override” would be:

public static EnumerableDebugger<T> Skip<T>(this EnumerableDebugger<T> input, int count)
{
    string s = count.ToString();
    int n = 1;
    Func<T, T> log = t =>
    {
        input.LogOperator(".Skip({0}) #{1}{2}", s, n, (n <= count ? "" : " -> " + t));
        n++;
        return t;
    };
    return new EnumerableDebugger<T>(Enumerable.Select(input, log).Skip(count), input.Log, input.SourceName);
}

This logs numbering of items that flow through it, indicating when the threshold is reach and when the operator starts producing results. Take is obviously the dual of this.

More interesting operators are the ones that cause the query source to evaluate eagerly, because all results are needed to continue processing. A sample includes OrderBy*. To illustrate this, we can tag the output of the sorting operation by a new “source name”, so that subsequent query operators seem to pull from the intermediate result instead:

public static EnumerableDebugger<T> OrderByDescending<T, K>(this EnumerableDebugger<T> input, Expression<Func<T, K>> keyExtractor)
{
    string s = keyExtractor.ToString();
    var f = (Func<T, K>)keyExtractor.Compile();
    Func<T, K> log = t =>
    {
        var res = f(t);
        input.LogOperator(".OrderByDescending({0}) ({1}) -> {2}", s, t, res);
        return res;
    };
    return new EnumerableDebuggerSource<T>(Enumerable.OrderByDescending(input, log), input.Log, input.SourceName + ".OrderByDescending(" + s + ")");
}

So, a source tagged “~source” will be eaten by the OrderByDescending (in order to the ordering across all items in it), starting a new “virtual source” which is now tagged as “~source.OrderByDescending(m => m.Name)” for instance.

Similar situations occur when doing things like joins or concatenations:

public static EnumerableDebugger<T> Concat<T>(this EnumerableDebugger<T> first, EnumerableDebugger<T> second)
{
    return new EnumerableDebugger<T>(Enumerable.Concat(first, second), first.Log, first.SourceName + "&" + second.SourceName);
}

I’ll leave the implementation of other query operators to the reader. Below is a sample of a more sophisticated query debugger run:

var res2 = (from m in Enumerable.Range(0, 10).Debug("first")
.SkipWhile(m => m <= 5)
.TakeWhile(m => m <= 15)
.Concat(Enumerable.Range(1, 5).Debug("second")) where m % 2 == 0 select m * 2).Skip(2).OrderByDescending(m => m).Take(2).Sum(x => x + 1); Console.WriteLine(res2);

(Ugh)

image

 

Exploring LINQ further

Actually, this post is not just meant as a debugging aid, it’s also meant as a gateway to an introspection tool for LINQ queries, which I plan to refer to in future posts. One of the most exotic aspects of LINQ is its ability to have multiple range variables in scope, e.g. when having multiple from clauses, or when using a join, or by using the let keyword. Here we get in a very fascinating topic of transparent identifiers. I’m not going into this right now, but suffice to show a query using “let” with the corresponding debugger output:

var res1 = from x in Enumerable.Range(1, 2).Debug("rng")
           let y = x
           let z = x * 2
           select x + y;
Console.WriteLine(res1.Sum());

(Drums please…)

image

I’m not going to cover this today, but expect to hear more about this (and potentially on how to make this debugger tool better in the face of those foreign transparent identifiers) in the future.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under:

Comments

# LINQ to Objects - Debugging - B# .NET Blog - Bart De Smet

Thursday, April 23, 2009 6:54 AM by DotNetShoutout

Thank you for submitting this cool story - Trackback from DotNetShoutout

# LINQ to Objects: Debugging

Thursday, April 23, 2009 10:07 AM by DotNetKicks.com

You've been kicked (a good thing) - Trackback from DotNetKicks.com

# LINQ to Objects - Debugging

Thursday, April 23, 2009 11:16 AM by SteelePrice.Net

# re: LINQ to Objects - Debugging

Thursday, April 23, 2009 11:34 AM by chyld medford

nice article on linq!  very informative!

thanks

# Reflective Perspective - Chris Alcock &raquo; The Morning Brew #334

Pingback from  Reflective Perspective - Chris Alcock  &raquo; The Morning Brew #334

# Debugging linq to objects &laquo; Ismail&#8217;s umbraco adventures

Pingback from  Debugging linq to objects &laquo;  Ismail&#8217;s umbraco adventures

# re: LINQ to Objects - Debugging

Friday, April 24, 2009 4:53 AM by Dilip Krishnan

You're just a bag of brains! Thank you!

# Arjan`s World &raquo; LINKBLOG for April 24, 2009

Friday, April 24, 2009 12:47 PM by Arjan`s World » LINKBLOG for April 24, 2009

Pingback from  Arjan`s World    &raquo; LINKBLOG for April 24, 2009

# Bart is a LINQ to Objects genius

Friday, April 24, 2009 12:57 PM by sticklebackplastic.com

Bart is a LINQ to Objects genius

# re: LINQ to Objects - Debugging

Sunday, April 26, 2009 3:22 AM by Pete Montgomery

Hey Bart, From memory I don't think your first example ("This is not useful for the debugging of the query itself") is quite right. If you but a breakpoint on a LINQ to Objects query expression (comprehension or method chain) and F11 over the consuming foreach statement, I think the debugger does break into each individual query lambda, even if you haven't explicitly set on them with F9. Pete.

# re: LINQ to Objects - Debugging

Sunday, April 26, 2009 3:51 PM by bart

Hi Pete,

Thanks for your feedback. True, and you don't even need to set a breakpoint on the query expression itself to be able to step-into the query expression's individual clauses. Ultimately you're doing just a step-into skipping a bunch of framework code, calling back into your own code at some point.

So, still, the execution of the query expression itself, and thus the chain of query operator calls underneath, by itself is not a useful debugging terchnique for LINQ to Objects. But you're right there's no need to set individual breakpoints on operator clause bodies (although that might still be useful to break selectively on the parts of the query you're interested in).

Thanks,

-Bart

# Link Post for April &laquo; Code: Impossible

Thursday, April 30, 2009 7:22 AM by Link Post for April « Code: Impossible

Pingback from  Link Post for April &laquo; Code: Impossible

# zofran tab

Monday, October 11, 2010 3:49 PM by zofran odt dose

zofran tab. zofran odt dose. zofran odt dose.

# discount zyloprim

Wednesday, October 13, 2010 3:09 PM by discount zyloprim

discount zyloprim. zyloprim dosage. discount zyloprim. zyloprim dosage. discount zyloprim. discount zyloprim.