Lookup fields inside out

Introduction

Lookup fields are one of the most powerful list-related features of SharePoint, together with their big brothers, the LookupMulti fields. First of all, let's introduce these field types briefly. In the screenshot below you can see a Northwind Products list in SharePoint with at the right-hand side two Lookup fields: Category en Supplier. These two fields refer to items in another list, in this case Categories and Suppliers:

image

The definition of such a field is shown below:

image

Using 'Allow multiple values', you can turn the column into a LookupMulti list, referring to multiple items instead of just one.

 

How to map? SPML is your friend!

So far so good. But how can we use these fields in LINQ to SharePoint? Of course it all starts by creating the entity types. To do this, we can either use the SpMetal command line tool, or (starting with the upcoming 0.2.1 interim release) the SPML mapping language. It's sufficient to map just the Products list, the entity generator will automatically figure out all of the lists that are referred to in the list's definition recursively, either using Lookup or LookupMulti fields. The entity generator is smart enough to eliminate cycles in such a list cross-reference graph and export all of the entity types you need. In this case, I'll just export the Products list with the following SPML definition:

<SharePointDataContext Name="Demo" Url="http://wss3demo">
    <Lists>
        <List Name="Products" />
    </Lists>
</SharePointDataContext >

As with all SPML files you still need to set the Custom Tool manually:

image

This will change in the future when we have a Project Item for SPML files, as well as a designer down the road. Now, when we run the entity generator as follows:

image

it will generate the entity types for the Products lists as well as the Suppliers and Categories lists that are referred to:

image

 

Introducing deferred loading

Time to write some simple query now:

DemoSharePointDataContext ctx = new DemoSharePointDataContext(new Uri("http://wss3demo"));
ctx.Log = Console.Out;
var res = (from p in ctx.Products where p.ProductName == "Chai" select p).AsEnumerable().Single();
Console.WriteLine("{0} costs {1} per {2}", res.ProductName, res.UnitPrice, res.QuantityPerUnit);

On the fourth line you can see a very simple query that just selects on the product's name. The Single query operator isn't implemented in LINQ to SharePoint currently, so we insert an AsEnumerable() call to perform the subsequent Single call via LINQ to Objects. After all, this is how the execution looks like:

image

However, what about accessing the Category property of the entity object? To illustrate this, add the following line of code:

Console.WriteLine("Category of {0} is {1}", res.ProductName, res.Category.CategoryName);

In here, we refer to res.Category which is obtained using deferred loading, meaning that we only get the Category data back when the user asks for it:

image

If you want to get the data right away when executing the query, you could use it in the projection portion of the query:

var res = (from p in ctx.Products where p.ProductName == "Chai" select new { p.ProductName, p.UnitPrice, p.QuantityPerUnit, p.Category }).AsEnumerable().Single();

image

Observe the place where the query gets executed, it's now part of the Products list query operation. You might wonder why we can't do something SQL'ish to join between lists and get data back that way. The answer is pretty simple: in CAML you're restricted to querying one list at a time, so we can't do any better.

Notice the way the "subquery" is composed, as a CAML query using the ID field. This is because we're able to get identity information of child lists used in Lookup fields (i.e. Category and Supplier) when querying the parent list (in this case Products). Internally, LINQ to SharePoint keeps this data in a so-called LazyLoadingThunk object that's stored in the entity object's fields dictionary. When the data for such a field is retrieved using an entity property (which calls into the GetValue method of the base SharePointListEntity class, see fragment below), the LazyLoadingThunk grabs the ID and sends a request to the data context object to go and retrieve that object using a "GetEntityById" call:

[global::BdsSoft.SharePoint.Linq.FieldAttribute("Category", global::BdsSoft.SharePoint.Linq.FieldType.Lookup, Id="...", LookupField="CategoryName")]
public Categories Category {
    get {
        return ((Categories)(base.GetValue("Category")));
    }
    set {
        base.SetValue("Category", value);
    }
}

LookupMulti fields are treated in a similar way with an entity property that's a collection type instead (in the current release we use IList<T> as the property type for a LookupMulti field in an entity type, where T stands for another SharePointListEntity subclass). E.g. if the Products list would have a LookupMulti Categories property instead, you'd get an entity definition with a property similar to the following:

[global::BdsSoft.SharePoint.Linq.FieldAttribute("Categories", global::BdsSoft.SharePoint.Linq.FieldType.LookupMulti, Id="...", LookupField="CategoryName")]
public IList<Categories> Categories {
    get {
        return ((IList<Categories>)(base.GetValue("Categories")));
    }
    set {
        base.SetValue("Categories", value);
    }
}

Note: If you don't want to write a projection that calls the Category property (or another Lookup or LookupMulti field) manually in order to retrieve the data right away (which also returns non-entity types which won't support updating), we'll provide some other feature modeled after LINQ to SQL "data shapes" in a future release. Such a data shape instructs the data provider to retrieve some fields (in our case typically Lookup or LookupMulti fields) right away when writing a query, even when such a query doesn't include that field directly (e.g. in a projection). I'll blog about this feature once we get there in the release cycle.

 

Using subqueries

However, there's more than just deferred loading when talking about Lookup(Multi) fields. What about querying the parent list in a way like this:

var res = from p in ctx.Products where p.Category.CategoryName.StartsWith("Con") select p;
foreach (var p in res)
    Console.WriteLine("{0} in category {1} costs {2} per unit", p.ProductName, p.Category.CategoryName, p.UnitPrice);

Since we can't query for multiple lists using just one CAML query, e.g. by doing joins, we have to play quite some tricks to get this to work. When we set a breakpoint on the foreach loop, we can take a look at the res object at runtime:

image

This is the result as shown in the debugger visualizer:

image

Observe the portion I've indicated in the red rectangle. It isn't valid CAML, it's a so-called patch. Such a patch gets generated by the front-end portion of the parser whenever it observes a "traversal" (such as p.Category.CategoryName.StartsWith("Con")) of a Lookup field in the query predicate (where clause). When the query has to be sent to the server in order to retrieve results, the contents of each patch is wrapped in a separate query that's executed against the entity type specified by the Patch element's Field attribute. Based on the results of this subquery, the Patch element is replaced by a tree of <Or> tags in order to select only those list items from the parent list that reference a category that's matching with the patch's query. This looks pretty much like a foreign key lookup in relational databases.

Note: In releases prior to the 0.2.1 interim release, the situation was much more complex (cf. FieldAttribute::IsUnique and EnforeLookupFieldUniqueness) because of the lacking documentation of how to do a "lookup field query by value". There's however an undocumented LookupId attribute on the CAML FieldRef element that allows to do exactly this.

For our sample query, this is the resulting execution plan:

  1. First, the Category patch is executed. This subquery retrieves the ID field values for all of the categories that start with Con.

    <Query>
      <Where>
        <BeginsWith>
          <FieldRef Name="CategoryName" />
          <Value Type="Text">Con</Value>
        </BeginsWith>
      </Where>
    </Query>
    <ViewFields>
      <FieldRef Name="ID" />
    </ViewFields>
  2. Now, the Category patch is cut from the original pre-parsed query and replaced by a tree of <Or>-conditions chaining all matching category IDs together, like this:
  3. <Query>
      <Where>
        <Or>
          <Eq>
            <Value Type="Lookup">3</Value>
            <FieldRef Name="Category" LookupId="TRUE" />
          </Eq>
          <Eq>
            <Value Type="Lookup">2</Value>
            <FieldRef Name="Category" LookupId="TRUE" />
          </Eq>
        </Or>

      </Where>
    </Query>

As an example, let's make it a little more difficult with two patches involved. Observe the following query:

var res = from p in ctx.Products
          where p.Category.CategoryName.StartsWith("Con")
                && (p.Supplier.Country == "USA" && p.Supplier.Region == "LA"
                    || p.Supplier.Country == "Canada" && p.Supplier.Region == "Québec")
          select p;

Time to see what the parser builds out of this LINQ query, so take a look at the visualizer screenshot below:

image

Observe we have two patches this time. The parser is smart enough to coalesce adjacent lookup field traversals, in this case the whole query predicate portion for the supplier condition (USA/LA or Canada/Québec) is brought together in just one patch. This means that LINQ to SharePoint will launch two subqueries before taking on the parent query:

Query corresponding with p.Category.CategoryName.StartsWith("Con"):

<Query>
  <Where>
    <BeginsWith>
      <FieldRef Name="CategoryName" />
      <Value Type="Text">Con</Value>
    </BeginsWith>
  </Where>
</Query>
<ViewFields>
  <FieldRef Name="ID" />
</ViewFields>

Query corresponding with (p.Supplier.Country == "USA" && p.Supplier.Region == "LA" || p.Supplier.Country == "Canada" && p.Supplier.Region == "Québec"):

<Query>
  <Where>
    <Or>
      <And>
        <Eq>
          <Value Type="Text">USA</Value>
          <FieldRef Name="Country" />
        </Eq>
        <Eq>
          <Value Type="Text">LA</Value>
          <FieldRef Name="Region" />
        </Eq>
      </And>
      <And>
        <Eq>
          <Value Type="Text">Canada</Value>
          <FieldRef Name="Country" />
        </Eq>
        <Eq>
          <Value Type="Text">Québec</Value>
          <FieldRef Name="Region" />
        </Eq>
      </And>
    </Or>
  </Where>
</Query>
<ViewFields>
  <FieldRef Name="ID" />
</ViewFields>

Next, it will patch the parent query with all of the patch information gathered, resulting in the following:

image

Finally this piece of CAML is ready for execution by the server to get the final results back.

As a final feature under the umbrella of Lookup field support, you can also use entity objects in the query predicate directly, like this:

var cat = (from c in ctx.Categories where p.CategoryName == "Beverages" select c).First(); //we don't support the Single query operator, so use First to get the 'singleton result'
var res = from p in ctx.Products where p.Category == cat select p;

This will grab the primary key field value from the cat variable and use it in similar way as the samples above.

 

Quiz

If you think you've understood everything in this post, here's a little quiz. How many distinct queries will be sent to the server in order to execute the following LINQ query?

var res = from p in ctx.Products where p.UnitPrice <= 20 && (p.Category.CategoryName == "Beverages" || p.Category.CategoryName == "Seafood") && p.Supplier.Country == "USA" orderby p.UnitPrice descending select p;
foreach (var p in res)
   Console.WriteLine("{0} in category {1} costs {2} and is supplied by {3}.", p.ProductName, p.Category.CategoryName, p.UnitPrice, p.Supplier.CompanyName);

The output is shown below (tip: why is this relevant?).

Boston Crab Meat in category Seafood costs 18,4 and is supplied by New England Seafood Cannery.
Steeleye Stout in category Beverages costs 18 and is supplied by Bigfoot Breweries.
Sasquatch Ale in category Beverages costs 14 and is supplied by Bigfoot Breweries.
Laughing Lumberjack Lager in category Beverages costs 14 and is supplied by Bigfoot Breweries.
Jack's New England Clam Chowder in category Seafood costs 9,65 and is supplied by New England Seafood Cannery.

 

Conclusion

As you can see, this feature in LINQ to SharePoint frees developers from the burden of having to execute more than one CAML query in order to get the desired results back. In the upcoming 0.2.1 interim release, this feature is fully functional, ready for you to play with it. Notice however that we don't support lookup field traversals deeper than one. For example, if an item from a Product list has a supplier from a Suppliers list with an address stored in an Addresses list, it won't be possible to write a predicate like p.Supplier.Address.City == "Seattle" (Address is one bridge too far). This might be subject of a future release if time permits.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Published Tuesday, July 10, 2007 4:14 PM by bart
Filed under:

Comments

No Comments
Powered by Community Server (Non-Commercial Edition), by Telligent Systems