Category Archives: C++

Programming in the C++ language

The safe bool idiom in C++

I’m in the process of writing a series of articles on C++11, and a passing explanation of this idiom grew to become a whole article. There’s no original content here, but I learned something in writing it and I hope it might be useful to someone out there.

It’s idiomatic C++ to evaluate a variable in boolean context, even when it’s not a boolean variable, to detect whether that variable is “not a thing”. For example, with a pointer variable it checks whether the pointer is null or not:

if (my_pointer) {
   // my_pointer is not null
}

Exactly what this boolean conversion means varies by type, but experience shows that one’s intuitive sense of how this should work is pretty consistent and this can make code more expressive. Some would say this is bad practice, but for now I’ll assume that it’s desirable.

If you’re writing a user-defined class, you want to follow this kind of pattern if possible. If 0 is false, then surely the (0, 0, 0) point in a 3D space is false? Of course, C++ lets you make your user-defined type act pretty much however you want it to, so of course you can define an implicit conversion to bool:

struct TestResult {
   //...
 
   operator bool() const {
      return m_passed;
   }
};

The problem with this is that it’s not just a conversion to bool. It’s a conversion to a type that is part of a system of numeric types, and those types allow conversions between them that let you do silly things. For example, you can assign it to an integer:

TestResult test_result;
int i = test_result; // Should really be a compile error

More insidiously, things like the left-shift operator suddenly work on your objects in ways that you might not expect:

test_result << i;
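
Both problems can be seen in a minimal sketch. It assumes a stripped-down TestResult holding a single m_passed flag; the constructor and the helper names as_int, shifted and demo are illustrative and not part of the original class:

```cpp
#include <cassert>

// Hypothetical, stripped-down TestResult with an implicit conversion to bool.
struct TestResult {
   explicit TestResult(bool passed) : m_passed(passed) {}

   operator bool() const {
      return m_passed;
   }

private:
   bool m_passed;
};

// Compiles: the object becomes a bool, which then converts to int.
inline int as_int(const TestResult & result) {
   return result;
}

// Also compiles: the bool is promoted to int and then shifted.
inline int shifted(const TestResult & result, int places) {
   return result << places;
}

inline void demo() {
   TestResult passed(true);
   assert(as_int(passed) == 1);
   assert(shifted(passed, 3) == 8);   // 1 << 3
}
```

Neither helper means anything for a test result, yet the compiler accepts both without complaint.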

As is the way with all C++ design issues, you can get around this by careful application of edge cases of language tools that were meant for something else. You can expose a conversion to any type you like. Void pointers are a reasonable thing to try, because there’s not a lot you can do with a void pointer:

struct TestResult {
   operator void *() {
      return m_passed ? this : 0;
   }
};

Unlike C, C++ has no implicit conversion from a void pointer to any other pointer type, so this seems reasonably safe. However, one thing you can do with a void pointer is call delete, and there’s no way to prevent that from compiling:

TestResult tr;
delete tr; // Gets converted to pointer, attempts to delete a stack variable!

You could try returning a pointer other than this, which would at least mean that the code would crash horribly the moment delete was called:

struct TestResult {
   operator void *() {
      return m_passed ? (void *)1 : (void *)0;
   }
};

But this is horrible. Besides, there’s no guarantee that you won’t segfault just by referencing pointers to non-allocated portions of memory (i.e. segfault even in the good case where you’re just converting the result straight to boolean).

A common trick in C++ (which dates from an earlier related trick in C) is to give code an incomplete type to work with. If you have a class declaration in scope, but no definition, then you can handle pointers to the incomplete class but there are limits to what you can do. In particular, you can’t dereference the pointer, and an attempt to delete it draws at least a warning from the major compilers. So you can pass such a pointer back to the caller and allow them to compare it with NULL with far less worry that they will delete it by accident.

How can you get a pointer to a type that you can be sure won’t be defined? You define one for yourself. A nested class will do nicely:

struct TestResult {
  //...
 
  class never_defined;
 
  operator const never_defined*() {
    return m_passed ? reinterpret_cast<const never_defined *>(this) : 0;
  }
};

Unfortunately, while pointers to incomplete types can’t be dereferenced, you can still compare them:

TestResult x, y;
 
// later ..
 
if (x > y) // Really shouldn't compile
{
 // ...
}

Can we prevent this? Just one more dip into the junk drawer of C++ will sort us out. We need one of the more overlooked features of C++: the pointer-to-member (in this case, a pointer to member function).

A pointer to member isn’t like a regular pointer. A regular pointer points to data of a particular type in the memory space. A pointer to member operates in the world of classes, not objects. The pointer to member identifies a member of a particular type on a particular class; if you have an int Foo::*, it can point to any integer member of the Foo class. When you set the value of your pointer-to-member, it points to that same member on every instance of Foo (or, equivalently, on no particular Foo instance at all).
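
As a small sketch of that idea, assume a hypothetical Foo with two int members (the names width, height, read_member and demo are made up for illustration). The same pointer-to-member value is applied to two different objects:

```cpp
#include <cassert>

// Hypothetical class with two members of the same type.
struct Foo {
   int width;
   int height;
};

// Reads whichever int member the pointer-to-member designates.
inline int read_member(const Foo & object, int Foo::* member) {
   return object.*member;
}

inline void demo() {
   Foo small = { 1, 2 };
   Foo large = { 30, 40 };

   int Foo::* which = &Foo::height;   // names a member, not an address
   assert(read_member(small, which) == 2);
   assert(read_member(large, which) == 40);

   which = &Foo::width;               // retarget to the other int member
   assert(read_member(large, which) == 30);
}
```

The pointer only becomes a real location in memory once it is combined with an object through the .* operator.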

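Pointers to member can’t be ordered with the relational operators (<, >, <= and >=), so we can combine our conversion-to-pointer trick with a pointer-to-member and have (at last) a reasonably safe boolean conversion. Equality comparison with == and != still compiles, which more thorough implementations of the idiom disable separately: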

struct TestResult {
  //...
 
  typedef void (TestResult::*bool_type)() const;
  void do_nothing() const {} // needs a body: taking its address requires a definition
 
  operator bool_type() const {
    return m_passed ? &TestResult::do_nothing : 0;
  }
};

If you’re not used to unpacking these definitions, the line:

typedef void (TestResult::*bool_type)() const;

deserves some explanation. This is just like any other typedef, except that since it’s a function typedef the name of the type defined (bool_type) goes in the middle and not on the right hand side. The TestResult::* bit identifies that we’re defining a type that points to a member of TestResult, rather than a regular pointer. The remaining stuff just tells us that we’re talking about a const function that takes no arguments and returns void.
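
If it helps, here is the same type spelled a second way, as a sketch that assumes a C++11 compiler. The name bool_type_alias and the helper call_through_pointer are illustrative only:

```cpp
#include <type_traits>

struct TestResult {
   typedef void (TestResult::*bool_type)() const;
   void do_nothing() const {}
};

// The same type written as a C++11 alias declaration, where the new name
// sits on the left instead of in the middle.
using bool_type_alias = void (TestResult::*)() const;

static_assert(std::is_same<TestResult::bool_type, bool_type_alias>::value,
              "both spellings name the same type");

// Calling through such a pointer needs an object and the .* operator.
inline void call_through_pointer(const TestResult & result) {
   TestResult::bool_type member = &TestResult::do_nothing;
   (result.*member)();
}
```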

For an example of this being done in the wild, you can see the safe_bool class within Boost::Spirit.
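
To check what the final version does and doesn’t allow, here is a sketch that asks the compiler directly. It assumes C++11, and the detection helpers has_less and has_equal (plus the constructor, passed and demo) are hypothetical names introduced for this test:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct TestResult {
   explicit TestResult(bool passed) : m_passed(passed) {}

   typedef void (TestResult::*bool_type)() const;
   void do_nothing() const {}

   operator bool_type() const {
      return m_passed ? &TestResult::do_nothing : 0;
   }

private:
   bool m_passed;
};

// value is true only if "a < b" compiles for two const T operands.
template <typename T, typename = void>
struct has_less : std::false_type {};

template <typename T>
struct has_less<T, decltype(void(std::declval<const T &>() < std::declval<const T &>()))>
   : std::true_type {};

// value is true only if "a == b" compiles for two const T operands.
template <typename T, typename = void>
struct has_equal : std::false_type {};

template <typename T>
struct has_equal<T, decltype(void(std::declval<const T &>() == std::declval<const T &>()))>
   : std::true_type {};

static_assert(!std::is_convertible<TestResult, int>::value, "no conversion to int");
static_assert(!has_less<TestResult>::value, "ordering comparisons are rejected");
static_assert(has_equal<TestResult>::value, "equality comparison still compiles");

// The conversion still works in boolean context.
inline bool passed(const TestResult & result) {
   return result ? true : false;
}

inline void demo() {
   assert(passed(TestResult(true)));
   assert(!passed(TestResult(false)));
}
```

The last static_assert shows the one remaining hole: two results can still be compared with ==, which says nothing useful about them.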

How to tell if you’re in a method in C++

At work we wanted some generic logging code to be able to report the object it was being called from. The tricky part is that we wanted the same logging macro to work in non-method context as well. Using this directly is out of the question, because it causes a compile error in the latter case.

In short, both the following instances ought to work, but the one in the class method should automatically know to print some extra gubbins about the object:

struct Something {
   void someMethod() {
      LOG("Some logging message");
   }
};
 
void someFunction() {
   LOG("Some other logging message");
}

I thought this was an interesting case, not so much for the eventual solution as for the thought process involved. One of the big problems with C++ is that although you have no shortage of powerful tools that enable metaprogramming, they were never designed to provide a coherent system. It takes a bit of experience even to know what’s possible, let alone how to go about finding a way to do something. With most metaprogramming the pattern is to find one language feature that yields the information you need (perhaps a type trait, a typedef, a function return value or whatever) and abuse some other language feature (sizeof, SFINAE, template argument deduction etc.) to allow the original feature to indirectly feed through to generate a usable compile-time value such as an enum or typedef.

Obviously macros are going to be involved here: no other tool in C++ allows for code reuse and gives the repeated code access to the scope in which it is invoked. But macros are purely a lexical concept, so we’ll have to include something else that operates at a syntactic level to discriminate between the cases.

So what’s different in state between member and non-member functions in C++? There’s the this pointer, for a start. Also, a different namespace scope is available and potentially different accessibility to private and protected class members. Other than that, there’s not much else in the ISO standard (compiler extensions like __PRETTY_FUNCTION__ might help, though).

Using the this keyword is the most direct route to the information we need, since it pretty much is the information we need. Unfortunately, the definition of this is such that even mentioning it when outside of object context is an instant compile error. Since we can’t so much as mention it in our macro definition, we’re going to have to be a bit craftier.

The problem is that using this outside of a method is a hard error, which will kill our compile immediately. Except there’s precisely one case where the compiler can recover from such an error and take a different path: substituting arguments into a template. If substituting the template arguments into a template’s declaration produces an invalid type or expression, the compiler discards that candidate and tries another possibility instead of reporting an error. This is surprisingly useful and even gets its own name, the rather snappy Substitution Failure Is Not An Error.
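
For readers who haven’t met SFINAE before, here is a minimal sketch of the classic sizeof-based form. It is unrelated to the logging problem itself, and the names yes_type, no_type, probe and has_value_type are illustrative:

```cpp
#include <vector>

typedef char yes_type;
typedef char (&no_type)[2];

// Viable only when T::value_type names a type. For T = int the substitution
// fails, and this overload is dropped instead of stopping the compile.
template <typename T>
yes_type probe(typename T::value_type *);

// Fallback that accepts anything, but is always the worst match.
template <typename T>
no_type probe(...);

// sizeof inspects which overload was chosen without ever calling it.
template <typename T>
struct has_value_type {
   static const bool value = sizeof(probe<T>(0)) == sizeof(yes_type);
};

static_assert(has_value_type<std::vector<int> >::value,
              "vector<int> has a nested value_type");
static_assert(!has_value_type<int>::value,
              "int has no nested value_type, yet the code still compiles");
```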

Unfortunately SFINAE isn’t going to help us here, because the area of expanded code where an error is allowable is within a different scope than the place we’re trying to log from: it’s either always in a free function scope, or always in a method scope (depending on whether we use a template function or a template class for SFINAE), but it will never depend on the calling context.

So we’re back to the drawing board, except that we know that we can’t mention this in our macro. This leaves us with two possible ways to discriminate the cases: class member accessibility and function overloads in scope. As an example of the former, consider the following:

#include <iostream>
 
using namespace std;
 
class Something {
public:
	static void func(...) {
		cout << "Ordinary function, anyone can see this" << endl;
	}
 
	void method() {
		Something::func(42);
	}
 
private:
	static void func(int n) {
		cout << "Private function, you need to be privileged" << endl;
	}
};
 
int main(int argc, char ** argv) {
	Something::func(42);
 
	Something something;
	something.method();
}

This is how you might like it to work: the call in main() gets the unprivileged overload, and the call in Something::method() gets the more specific overload that it can only see because it’s in the context of the class. This, by the way, works by a quirk of C++ that varargs functions are always used as a last resort and only if an overload that matches better is not available. Let’s ignore for the moment the fact that this only detects if we’re in a special sort of class (namely the one that defines func(), or a friend of it) and not the general distinction between class and non-class.
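
The last-resort rule for varargs functions can be seen on its own, without any access control involved. This is a sketch using two free functions; the names Chosen, pick and demo are made up for the example:

```cpp
#include <cassert>

enum Chosen { INT_OVERLOAD, VARIADIC_OVERLOAD };

inline Chosen pick(int) { return INT_OVERLOAD; }
inline Chosen pick(...) { return VARIADIC_OVERLOAD; }

inline void demo() {
   // Exact match on int.
   assert(pick(42) == INT_OVERLOAD);

   // A char needs a promotion to int, which still beats the ellipsis.
   assert(pick('x') == INT_OVERLOAD);

   // A string literal cannot convert to int, so only the ellipsis is viable.
   assert(pick("text") == VARIADIC_OVERLOAD);
}
```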

Unfortunately, we can’t use accessibility in this way because function overload resolution happens before accessibility is checked. Rather than getting a different result from the overload resolution in each case, you get the same result from overload resolution in each case and a compiler error if the preferred form is not accessible. You may be thinking that SFINAE could save us here, but in fact we have the same problem as before: the call will take place outside of the scope we’re trying to detect.

So the only thing we have left that distinguishes between free function context and method context is the different set of functions in scope:

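#include <cstddef>
#include <iostream>
 
using namespace std;
 
// The backslash continues the macro onto the next line. There is no trailing
// semicolon, because each use of LOG supplies its own.
#define LOG( x ) cout << (getThis() ? "In method" : "In function") \
   << ": " << x << endl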
 
void * getThis() {
   return NULL;
}
 
struct Fish {
   void * getThis() {
      return this;
   }
 
   void method() {
      LOG("Log line");
   }
};
 
void function() {
   LOG("Log line");
}

This is promising, actually. In fact, it’s pretty close to what we want except for the fact that we have to define getThis() in every class that we want to be able to log from. This isn’t as bad as it first seems, since the only reason to get hold of the this pointer in practice is to call methods on it, and we can only do that if all the classes implement a common interface anyway.

In practice, the getThis() construction is a little weak. It only returns a void *, and in order to call any methods we’ll have to cast it back to the proper type, and we’re back to the problem of not knowing the type of the class we’re in (or even if we have a class type). Rather than trying to implement reusable code in a macro definition, any code that needs to know about types will have to be in a method that’s implemented for every class we care about, and corresponding stubs at global scope. This is pretty horrible, and will probably render this technique useless.
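
One way to picture the common-interface variant is the sketch below. It is an assumption about how such a design might look, not code from the project: Loggable, describe(), formatLog, LOG_TEXT and freeFunction are all hypothetical names, and the macro builds a string rather than printing so that the result is easy to check.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical common interface that every loggable class implements.
struct Loggable {
   virtual ~Loggable() {}
   virtual std::string describe() const = 0;
};

// Global stub, found by name lookup when no member getThis() is in scope.
inline const Loggable * getThis() {
   return 0;
}

// Type-aware work lives in an ordinary function, not in the macro.
inline std::string formatLog(const Loggable * self, const std::string & message) {
   std::ostringstream output;
   if (self) {
      output << "[" << self->describe() << "] ";
   }
   output << message;
   return output.str();
}

// The macro only exists to pick up whichever getThis() is visible.
#define LOG_TEXT(message) formatLog(getThis(), (message))

struct Fish : Loggable {
   std::string describe() const { return "Fish"; }

   // Still has to be repeated in every class that wants object-aware logging.
   const Loggable * getThis() const { return this; }

   std::string method() const { return LOG_TEXT("Log line"); }
};

inline std::string freeFunction() {
   return LOG_TEXT("Log line");
}

inline void demo() {
   assert(Fish().method() == "[Fish] Log line");
   assert(freeFunction() == "Log line");
}
```

Returning a pointer to the interface removes the cast from void *, but every class still has to supply its own getThis(), and a static member function would fail to compile because lookup finds the non-static member.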

So in the end we don’t really have a solution, but at least we can be reasonably confident that we’re not missing anything.

Much of the above code doubtless contains errors, as I haven’t tested a lot of it. Feel free to write in with corrections.

C++ type-declaration decoder

Unfortunately, since I wrote this article the markup for the code samples got corrupted during a backup and restore cycle. I’ve put some of it back in place from memory, but I need to test it properly. For now, treat this as a sketch of what the solution might look like.

Despite the somewhat self-aggrandising title, I have to admit that Expert C Programming: Deep C Secrets by Peter van der Linden is the single most beneficial programming book I’ve ever read. I believe the point that I came across it marked the first step on a road from being an amateurish hacker who was happy with anything as long as it compiled, to being a software professional. Of course, by this point I’d already long been paid as if I were a software professional, simply because I have a degree in mathematics from a high-ranking university. Such is the way of things.

Anyway, the most useful section of Deep C Secrets is a section that gives a simple algorithm for understanding a complicated C type declaration. You know the kind of thing:

void (*signal(int sig, void (*func) (int) ) ) (int) ;

The algorithm, by the way, is given in a section that has the delicious title “The Piece of Code that Understandeth all Parsing.” I’d forgotten how funny that book is.
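
As a rough illustration of what decoding that declaration yields (this is not the book’s algorithm, just its end result), the sketch below rewrites it with a typedef and lets the compiler confirm the two forms agree. It assumes C++11; the names handler_type, raw_signal and readable_signal are hypothetical, chosen so that nothing clashes with the real signal():

```cpp
#include <type_traits>

// The innermost piece: pointer to a function taking int and returning void.
typedef void (*handler_type)(int);

// The declaration from the text, under a different name.
void (*raw_signal(int sig, void (*func)(int)))(int);

// The same function type, written with the typedef: a function taking an
// int and a handler, and returning a handler.
handler_type readable_signal(int sig, handler_type func);

static_assert(std::is_same<decltype(raw_signal), decltype(readable_signal)>::value,
              "both declarations have the same type");
```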

The main problem with this is that it’s a run-time operation, and has to take the type declaration as a string. Parsing it involves a whole bunch of logic that already exists in the compiler, but has to be re-implemented. When I was stumbling through some declarations in C++ Templates: The Complete Guide it dawned on me that maybe C++ can do better.

Understanding any type declaration can be broken down into two parts:

  • Knowing where to start
  • Knowing which piece to process next

The first of these is often rendered difficult by the fact that there are several identifiers in a typedef, and you have to know which is the one being defined (because this is where you start to parse the type). In an anonymous type, there may be no identifiers at all:

doStuff( static_cast<int (*)()>(foobar) );

In this case you start with the *, but it’s not easy to see how to know this in general.

The second difficulty (knowing which piece of the type to handle next) is complicated by the fact that you may need to proceed left-to-right or right-to-left, which depends on precedence rules that most people understand only implicitly, and often only by instinct.

Using the C++ rules for template argument deduction, we can ignore most of these issues. The core idea is to declare a template that takes a single compound type and expresses the type in terms of one or more simpler components. For example, we can write a class that deals with a pointer:

template<typename T>
class TypeDecryptor<T*> {
public:
	static string getName() {
		ostringstream output;
		output << "pointer to "
                       << TypeDecryptor<T>::getName();
		return output.str();
	}
};

What this says is that the getName() method on a TypeDecryptor applied to a pointer type will return the string “pointer to” followed by whatever the type decryptor tells us the pointed-to type should be called. We can do something very similar for const:

template<typename T>
class TypeDecryptor<const T> {
public:
	static string getName() {
		ostringstream output;
		output << "const " << TypeDecryptor<T>::getName();
		return output.str();
	}
};

We’ve already got something useful, because it can deal with all that int const * const * stuff that people sometimes have problems with. OK, so you also need a TypeDecryptor specialisation for each of the fundamental types, which just prints out the type name:

template<>
class TypeDecryptor<int> {
public:
	static string getName() {
		return "int";
	}
};

Annoyingly, I can’t find any way to generalise this, so you need an explicit specialisation for any fundamental or user-defined class type you want to support. It could be streamlined with a macro, of course.

So at this point, our decryptor can do things like this:

cout << TypeDecryptor<const int * const *>::getName() << endl;
// Outputs "pointer to const pointer to const int"
 
cout << TypeDecryptor<int * const * const>::getName() << endl;
// Outputs "const pointer to const pointer to int"
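
Putting the pieces so far into one self-contained sketch: the primary template declaration is my assumption about what the lost original looked like, struct is used in place of class with public for brevity, and demo is an illustrative name.

```cpp
#include <cassert>
#include <string>

// Primary template: declared but never defined, so an unsupported type
// is rejected at compile time.
template <typename T>
struct TypeDecryptor;

// Fundamental type: just report the name.
template <>
struct TypeDecryptor<int> {
   static std::string getName() { return "int"; }
};

// Pointer: describe the pointed-to type recursively.
template <typename T>
struct TypeDecryptor<T *> {
   static std::string getName() {
      return "pointer to " + TypeDecryptor<T>::getName();
   }
};

// Top-level const: strip it and describe what is left.
template <typename T>
struct TypeDecryptor<const T> {
   static std::string getName() {
      return "const " + TypeDecryptor<T>::getName();
   }
};

inline void demo() {
   assert(TypeDecryptor<const int * const *>::getName()
          == "pointer to const pointer to const int");
   assert(TypeDecryptor<int * const * const>::getName()
          == "const pointer to const pointer to int");
}
```

Note that a const-qualified pointer only matches the const T specialisation, never T *, so the compiler peels off exactly one layer at each step.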

But this is just getting started. Similar things can be done with function pointers, arrays, references, pointers-to-member, etc. One of the more complex cases is:

template<typename R, typename S, typename T>
class TypeDecryptor<???> {
public:
	static string getName() {
		ostringstream output;
		output << "pointer to a member function (on type "
			   << TypeDecryptor<R>::getName()
			   << "), taking one argument of type "
			   << TypeDecryptor<S>::getName()
			   << " and returning "
			   << TypeDecryptor<T>::getName();
		return output.str();
	}
};

This allows us to deal with pointers-to-member-function with one argument. Annoyingly, you need an explicit specialisation for every different number of function arguments there can be (one for zero-argument functions, one for one-argument functions, etc…) and also a different specialisation for pointers to data member from pointers to member functions. This means the number of explicit specialisations gets quite large quite quickly.

On the plus side, you don’t need to know anything about type precedence rules to write this code, nor have to make any decisions about where in the type declaration to start processing. The C++ compiler does all the hard work. Without too much effort I was able to get something that could parse:

char (Person::*)(int (&)[42])

into

pointer to a member function (on type Person), taking one argument of type reference to array (of size 42) of instances of int and returning char
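
A sketch that reproduces this output follows. It is a reconstruction under assumptions rather than the original code: the specialisation pattern T (R::*)(S) is inferred from the message text, Person is an empty placeholder class, and DECRYPTOR_LEAF is a hypothetical macro of the kind mentioned earlier for streamlining the leaf types.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

template <typename T>
struct TypeDecryptor;   // primary template, intentionally undefined

// Hypothetical helper macro: one explicit specialisation per named type.
#define DECRYPTOR_LEAF(Type) \
   template <> struct TypeDecryptor<Type> { \
      static std::string getName() { return #Type; } \
   }

struct Person {};       // placeholder class for the example

DECRYPTOR_LEAF(int);
DECRYPTOR_LEAF(char);
DECRYPTOR_LEAF(Person);

template <typename T>
struct TypeDecryptor<T &> {
   static std::string getName() {
      return "reference to " + TypeDecryptor<T>::getName();
   }
};

template <typename T, std::size_t N>
struct TypeDecryptor<T[N]> {
   static std::string getName() {
      std::ostringstream output;
      output << "array (of size " << N << ") of instances of "
             << TypeDecryptor<T>::getName();
      return output.str();
   }
};

// Assumed pattern: non-const member function on R, taking S, returning T.
template <typename R, typename S, typename T>
struct TypeDecryptor<T (R::*)(S)> {
   static std::string getName() {
      return "pointer to a member function (on type " + TypeDecryptor<R>::getName()
           + "), taking one argument of type " + TypeDecryptor<S>::getName()
           + " and returning " + TypeDecryptor<T>::getName();
   }
};

inline void demo() {
   assert(TypeDecryptor<char (Person::*)(int (&)[42])>::getName()
          == "pointer to a member function (on type Person), taking one argument "
             "of type reference to array (of size 42) of instances of int "
             "and returning char");
}
```

Const member functions, other argument counts and pointers to data members would each need a further specialisation, as described above.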