Closures in the real world: Part 2

My last post about closures left some details unexplored: I mentioned that I was caught out by the lifetime of variables that are bound into the closure, but I didn’t have time to go into detail.

I’ll try a different example which hopefully illustrates the problem without needing too much background. Suppose we are running a web site with various age-restricted products, and we want to be able to filter out users whose age is over the required limit. We’ll need a predicate that we can pass into filter methods to select the right people.

Furthermore, assume that for some reason we’ve decided that we want to pre-calculate these predicates and store them in an array, rather than constructing them on the fly (yes, the “in the real world” idea is rather failing me here). That is, we want to be able to do something like:

IEnumerable<User> allowedUsers = users.Where(ageGreaterThan[18]);

We need to populate the array of predicates, and the obvious way is with a for loop:

for (int limit = 0; limit < 22; limit++)
{
   ageGreaterThan[limit] =
      delegate(User user)
      {
         return user.Age > limit;
      };
}

We can then test our predicates:

User bob = new User() { Age = 16 };
Console.WriteLine("{0}", ageGreaterThan[6](bob));

This prints “False”, which is not what we expected. The reason for this is that when we construct the predicate, the variable bound in is the same variable in each case. As we run through the loop the variable is incremented, and this affects the values bound into the already-constructed closures. This might not be so surprising with a reference type variable, but we have a plain unboxed int here (and no auto-boxing is going on).

The semantics of closures vary between languages in a way I haven’t been able to pin down, but given that the very concept of a variable changes from one language to another it’s probably not surprising. The way C# binds variables into a closure is to identify the variables inside and outside the closure as being the exact same variable, and it follows that changing the variable’s value via one location affects it’s value in the other.

This was unexpected to me, coming from languages where variables are all immutable. Arguably, it allows extra flexibility, since you can bind a local copy of the variable if you want and then it won’t be affected by changes to the outer variable:

for (int limit = 0; limit < 22; limit++)
{
   int localLimit = limit;

   ageGreaterThan[limit] =
      delegate(User user)
      {
         return user.Age > localLimit;
      };
}

This is what I ended up doing in the Psonar code, but personally I don’t like it. Not only does it clutter up an otherwise quite neat algorithm, it’s not at all clear to the untrained eye what the copy of the variable is doing there. An inexperienced programmer might quite reasonably think that since int is a value type, it would be OK to elide the copy.

The issue of value types having this behaviour was a particular problem for me. One thing I usually assume about value types is that they are safe from action at a distance. This gets even worse when you realise that the closure can have a different lifetime to the stack frame in which it was created – in fact each of the closures we’ve created can have a different lifetime. If the integer variable isn’t boxed (it isn’t), how can the runtime possibly ensure that it survives past the end of the stack frame, and yet gets cleaned up when necessary?

The solution to this conundrum is that something is being held by a reference type, but it’s not the integer variable; it’s the closure. And that int variable isn’t a property of the stack frame that the closure is borrowing a reference to, it’s a property of the closure that the stack frame is borrowing a reference to.

disassembly

The disassembly shows the class that represents the closure, and you can see the public int variable on it. I’ll spare you most of the MSIL code, but here’s an illustrative snippet:

 IL_0039:  ldfld      int32 ClosureTest.Program/'<>c__DisplayClass2'::limit
 IL_003e:  ldc.i4.1
 IL_003f:  add
 IL_0040:  stfld      int32 ClosureTest.Program/'<>c__DisplayClass2'::limit
 IL_0045:  ldloc.2
 IL_0046:  ldfld      int32 ClosureTest.Program/'<>c__DisplayClass2'::limit
 IL_004b:  ldc.i4.s   22
 IL_004d:  clt
 IL_004f:  stloc.3
 IL_0050:  ldloc.3
 IL_0051:  brtrue.s   IL_001a

This represents the end of the for loop. We load the field from the closure, add the literal value 1 to it, then store it back on the closure. Then we load it again, compare with 22 and branch back to the beginning of the loop if it’s less. Note that this is the logic for the for loop variable, which only looks like an ordinary local variable, in fact it’s operating on the member of the class representing the closure.

One thought on “Closures in the real world: Part 2

  1. Pingback: Closures in the real world: Part 3 « Tim Martin's blog

Leave a Reply

Your email address will not be published. Required fields are marked *