Category Archives: C#

Closures in the real world: Part 2

My last post about closures left some details unexplored: I mentioned that I was caught out by the lifetime of variables that are bound into the closure, but I didn’t have time to go into detail.

I’ll try a different example which hopefully illustrates the problem without needing too much background. Suppose we are running a web site with various age-restricted products, and we want to be able to filter out users whose age is over the required limit. We’ll need a predicate that we can pass into filter methods to select the right people.

Furthermore, assume that for some reason we’ve decided that we want to pre-calculate these predicates and store them in an array, rather than constructing them on the fly (yes, the “in the real world” idea is rather failing me here). That is, we want to be able to do something like:

IEnumerable<User> allowedUsers = users.Where(ageGreaterThan[18]);

We need to populate the array of predicates, and the obvious way is with a for loop:

for (int limit = 0; limit < 22; limit++)
{
   ageGreaterThan[limit] =
      delegate(User user)
      {
         return user.Age > limit;
      };
}

We can then test our predicates:

User bob = new User() { Age = 16 };
Console.WriteLine("{0}", ageGreaterThan[6](bob));

This prints “False”, which is not what we expected. The reason for this is that when we construct the predicate, the variable bound in is the same variable in each case. As we run through the loop the variable is incremented, and this affects the values bound into the already-constructed closures. This might not be so surprising with a reference type variable, but we have a plain unboxed int here (and no auto-boxing is going on).

The semantics of closures vary between languages in a way I haven’t been able to pin down, but given that the very concept of a variable changes from one language to another it’s probably not surprising. The way C# binds variables into a closure is to identify the variables inside and outside the closure as being the exact same variable, and it follows that changing the variable’s value via one location affects it’s value in the other.

This was unexpected to me, coming from languages where variables are all immutable. Arguably, it allows extra flexibility, since you can bind a local copy of the variable if you want and then it won’t be affected by changes to the outer variable:

for (int limit = 0; limit < 22; limit++)
{
   int localLimit = limit;

   ageGreaterThan[limit] =
      delegate(User user)
      {
         return user.Age > localLimit;
      };
}

This is what I ended up doing in the Psonar code, but personally I don’t like it. Not only does it clutter up an otherwise quite neat algorithm, it’s not at all clear to the untrained eye what the copy of the variable is doing there. An inexperienced programmer might quite reasonably think that since int is a value type, it would be OK to elide the copy.

The issue of value types having this behaviour was a particular problem for me. One thing I usually assume about value types is that they are safe from action at a distance. This gets even worse when you realise that the closure can have a different lifetime to the stack frame in which it was created – in fact each of the closures we’ve created can have a different lifetime. If the integer variable isn’t boxed (it isn’t), how can the runtime possibly ensure that it survives past the end of the stack frame, and yet gets cleaned up when necessary?

The solution to this conundrum is that something is being held by a reference type, but it’s not the integer variable; it’s the closure. And that int variable isn’t a property of the stack frame that the closure is borrowing a reference to, it’s a property of the closure that the stack frame is borrowing a reference to.

disassembly

The disassembly shows the class that represents the closure, and you can see the public int variable on it. I’ll spare you most of the MSIL code, but here’s an illustrative snippet:

 IL_0039:  ldfld      int32 ClosureTest.Program/'<>c__DisplayClass2'::limit
 IL_003e:  ldc.i4.1
 IL_003f:  add
 IL_0040:  stfld      int32 ClosureTest.Program/'<>c__DisplayClass2'::limit
 IL_0045:  ldloc.2
 IL_0046:  ldfld      int32 ClosureTest.Program/'<>c__DisplayClass2'::limit
 IL_004b:  ldc.i4.s   22
 IL_004d:  clt
 IL_004f:  stloc.3
 IL_0050:  ldloc.3
 IL_0051:  brtrue.s   IL_001a

This represents the end of the for loop. We load the field from the closure, add the literal value 1 to it, then store it back on the closure. Then we load it again, compare with 22 and branch back to the beginning of the loop if it’s less. Note that this is the logic for the for loop variable, which only looks like an ordinary local variable, in fact it’s operating on the member of the class representing the closure.

Closures in the real world

I’ve been saying for a long time that it’s cool to have a language with support for closures, but today for the first time I’ve hit a real-world problem where closures make life simpler, and where the language I’m using supports them sufficiently well.

Track Reordering diagram

Psonar have a web interface where you can reorder your music playlist by dragging tracks around. In the back end, this is implemented with a method that takes a list of tracks to move and a track that they should appear after, something like this (simplified example):

void ReorderPlaylist(Playlist targetPlaylist, List<int> trackIdsToMove, int targetTrackId)
{
   //...
}

What we’d like to do is iterate over the list with a for loop, and update the position of each track as we go. However, as soon as you start trying to write this you notice that you can’t update the position of track A without knowing whether tracks have been removed or inserted in front of it. Worse, when you recalculate the position of one of the tracks to be moved, you have to know where the target element ends up.

You don’t have the information you need when you’re half way through the for loop, but you do have it by the end of it. This is a typical case for lazy evaluation of some kind. Without lazy evaluation, the programmer has to decide which partial results can be calculated without dependencies, and to calculate these first. Defining the cases is difficult, and then you have to write runtime tests to categorise the data into the cases you can calculate. Worst of all, this produces code where the intent of the code is hard to figure out.

Lazy evaluation solves this problem. You can specify the position of each track not as an integer but as a function of an as-yet-unknown value. The functions depend on each other’s output values, but in a non-circular way. Specifically, many of the position functions depend on the position of the target element, which depends only on the choice of the set of tracks to move.

The way I coded this in C# (and I’m not convinced it was the neatest way), was roughly as follows:

Dictionary<int, Func<int, int>> newPositions = new Dictionary<int, Func<int, int>>();

int currentPosition = 0;
foreach (Track currentTrack in targetPlaylist.GetTracks())
{
   if (currentTrack.Id = targetTrackId)
   {
      newPositions[currentTrack.Id] =
         (targetPos => currentPosition);
   }
   else if (trackIdsToMove.Contains(currentTrack.Id))
   {
      newPositions[currentTrack.Id] =
         (targetPos => targetPos + tracksToMove.FindIndex(x => (x == currentTrack.Id)) + 1);
   }
   else
   {
      newPositions[currentTrack.Id] =
         delegate(int targetPos)
         {
            // Note call to function that doesn't exist yet
            int newTargetPos = newPositions[targetEntryId](0);

            if (currentPosition < newTargetPos)
            {
               return currentPosition;
            }
            else
            {
               return currentPosition + tracksToMove.Count;
            }
         };
   }
}

The position of each element is now a function of the position of the target element, which is a function that ignores its argument. Therefore we can calculate the target position first, then update each of the elements:

int newTargetPos = newPositions[targetElementId](0);
// Argument is ignored, we can pass anything

foreach (Track currentTrack in playlist.GetTracks())
{
   currentTrack.Position = newPositions[currentTrack.Id](newTargetPos);
}

I particularly liked the way that each of the branches in the main logic above is a simple statement of the position, as a human would think of it: “This track will be 3 places after the target track”, “This track will be at position 7, but 4 places further up if the target track ends up before position 7”, etc. This makes them relatively easy to check for correct logic.

Why it isn’t that simple

I lied about the above code. It’s written the way you’d like to see it written, but actually it’s wrong. The closures catch instances of variables that are modified within the loop, so the closures themselves get modified. To get round this you have to create local copies of all the variables you want to close over before you create the closure:

{
   int localPosition = currentPosition

   newPositions[currentTrack.Id] =
      (x => localPosition);
}

For the uninitiated, I’ll go into more detail about this in a later post.