C++ type-declaration decoder

Unfortunately, since I wrote this article the markup for the code samples got corrupted during a backup and restore cycle. I’ve put some of it back in place from memory, but I need to test it properly. For now, treat this as a sketch of what the solution might look like.

Despite the somewhat self-aggrandising title, I have to admit that Expert C Programming: Deep C Secrets by Peter van der Linden is the single most beneficial programming book I’ve ever read. I believe that coming across it marked the first step on a road from being an amateurish hacker who was happy with anything as long as it compiled, to being a software professional. Of course, by that point I’d long been paid as if I were a software professional, simply because I have a degree in mathematics from a high-ranking university. Such is the way of things.

Anyway, the most useful section of Deep C Secrets is a section that gives a simple algorithm for understanding a complicated C type declaration. You know the kind of thing:

void (*signal(int sig, void (*func) (int) ) ) (int) ;

The algorithm, by the way, is given in a section that has the delicious title “The Piece of Code that Understandeth all Parsing.” I’d forgotten how funny that book is.

The main problem with this algorithm is that it’s a run-time operation that has to take the type declaration as a string. Parsing that string means re-implementing a whole bunch of logic that already exists in the compiler. When I was stumbling through some declarations in C++ Templates: The Complete Guide it dawned on me that maybe C++ can do better.

Understanding any type declaration can be broken down into two parts:

  • Knowing where to start
  • Knowing which piece to process next

The first of these is often rendered difficult by the fact that there are several identifiers in a typedef, and you have to know which is the one being defined (because this is where you start to parse the type). In an anonymous type, there may be no identifiers at all:

doStuff( static_cast< int (*) () >( foobar ) );

In this case you start with the *, but it’s not obvious how you would know this in general.

The second difficulty (knowing which piece of the type to handle next) is complicated by the fact that you may need to proceed left-to-right or right-to-left, which depends on precedence rules that most people understand only implicitly, and often only by instinct.

Using the C++ rules for template argument deduction, we can ignore most of these issues. The core idea is to declare a template that takes a single compound type and expresses the type in terms of one or more simpler components. For example, we can write a class that deals with a pointer:

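#include <sstream>
#include <string>
using namespace std;

// Primary template: declared only, so unsupported types fail to compile.
template<typename T>
class TypeDecryptor;

// Partial specialisation for pointer types.
template<typename T>
class TypeDecryptor<T*> {
public:
	static string getName() {
		ostringstream output;
		output << "pointer to "
		       << TypeDecryptor<T>::getName();
		return output.str();
	}
};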

What this says is that the getName() method on a TypeDecryptor applied to a pointer type will return the string “pointer to” followed by whatever the type decryptor tells us the pointed-to type should be called. We can do something very similar for const:

template<typename T>
class TypeDecryptor<const T> {
public:
	static string getName() {
		ostringstream output;
		output << "const " << TypeDecryptor<T>::getName();
		return output.str();
	}
};

We’ve already got something useful, because it can deal with all that int const * const * stuff that people sometimes have problems with. OK, so you also need a TypeDecryptor specialisation for each of the fundamental types, which just prints out the type name:

template<>
class TypeDecryptor<int> {
public:
	static string getName() {
		return "int";
	}
};

Annoyingly, I can’t find any way to generalise this, so you need an explicit specialisation for any fundamental or user-defined class type you want to support. It could be streamlined with a macro, of course.
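As a rough idea of what that macro might look like, here is a minimal sketch. The macro name DECLARE_TYPE_NAME is my own invention for illustration, and the sketch assumes the primary TypeDecryptor template is declared but never defined. It relies on the preprocessor's # operator to turn the type name into a string.

```cpp
#include <string>

// Primary template: declared but left undefined, so a type without a
// specialisation is a compile-time error rather than a misleading name.
template<typename T>
class TypeDecryptor;

// Hypothetical helper macro (not part of the original code): generates the
// explicit specialisation for one leaf type, using the stringised type
// name as the text returned by getName().
#define DECLARE_TYPE_NAME(Type)                  \
    template<>                                   \
    class TypeDecryptor<Type> {                  \
    public:                                      \
        static std::string getName() {           \
            return #Type;                        \
        }                                        \
    }

// One line per fundamental or user-defined type you want to support.
DECLARE_TYPE_NAME(int);
DECLARE_TYPE_NAME(char);
DECLARE_TYPE_NAME(double);
DECLARE_TYPE_NAME(unsigned long);
```

With this in place, TypeDecryptor<unsigned long>::getName() returns "unsigned long". The macro has to be invoked at namespace scope, and the type passed to it must not contain a comma, so something like std::map<int, int> would need a typedef first.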

So at this point, our decryptor can do things like this:

cout << TypeDecryptor<const int * const *>::getName() << endl;
// Outputs "pointer to const pointer to const int"

cout << TypeDecryptor<int * const * const>::getName() << endl;
// Outputs "const pointer to const pointer to int"
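For anyone who wants to try this, the pieces so far can be assembled into one compilable unit along the following lines. This is a sketch rather than the exact code I originally had: it uses std::string concatenation instead of ostringstream to keep it short, and it assumes the primary template is only declared, never defined.

```cpp
#include <string>

// Primary template: declared only; every supported type needs a
// specialisation, otherwise the code does not compile.
template<typename T>
class TypeDecryptor;

// Leaf case: a fundamental type simply reports its own name.
template<>
class TypeDecryptor<int> {
public:
    static std::string getName() { return "int"; }
};

// Pointer case: describe the pointer, then recurse on the pointed-to type.
template<typename T>
class TypeDecryptor<T*> {
public:
    static std::string getName() {
        return "pointer to " + TypeDecryptor<T>::getName();
    }
};

// const case: matches only a top-level const, so "int* const" lands here
// while "const int*" goes to the pointer case first.
template<typename T>
class TypeDecryptor<const T> {
public:
    static std::string getName() {
        return "const " + TypeDecryptor<T>::getName();
    }
};
```

The compiler peels off one layer per step. For const int * const * it first matches the pointer specialisation, then the const one, then the pointer one again, then const, and finally the int leaf.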

But this is just getting started. Similar things can be done with function pointers, arrays, references, pointers-to-member, etc. One of the more complex cases is:

template<typename R, typename S, typename T>
class TypeDecryptor<T (R::*)(S)> {
public:
	static string getName() {
		ostringstream output;
		output << "pointer to a member function (on type "
			   << TypeDecryptor<R>::getName()
			   << "), taking one argument of type "
			   << TypeDecryptor<S>::getName()
			   << " and returning "
			   << TypeDecryptor<T>::getName();
		return output.str();
	}
};

This allows us to deal with pointers-to-member-function with one argument. Annoyingly, you need a separate specialisation for every number of function arguments (one for zero-argument functions, one for one-argument functions, etc.), and pointers to data members need a different specialisation from pointers to member functions. This means the number of specialisations gets quite large quite quickly.
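If a C++11 compiler is available (an assumption; the code above does not need one), variadic templates might cut the per-arity duplication down to a single specialisation. The following is only a sketch of that idea: ArgumentNames is a hypothetical helper I have made up for joining the argument names, and the output wording differs slightly from the one-argument version.

```cpp
#include <string>

template<typename T>
class TypeDecryptor;  // primary template, declared only

// Leaf types used for illustration.
struct Person {};
template<> class TypeDecryptor<Person> {
public: static std::string getName() { return "Person"; }
};
template<> class TypeDecryptor<int> {
public: static std::string getName() { return "int"; }
};
template<> class TypeDecryptor<char> {
public: static std::string getName() { return "char"; }
};
template<> class TypeDecryptor<double> {
public: static std::string getName() { return "double"; }
};

// Hypothetical helper: comma-separated names of a parameter pack.
template<typename... Args>
struct ArgumentNames;

template<>
struct ArgumentNames<> {                      // no arguments
    static std::string get() { return ""; }
};

template<typename First>
struct ArgumentNames<First> {                 // exactly one argument
    static std::string get() { return TypeDecryptor<First>::getName(); }
};

template<typename First, typename Second, typename... Rest>
struct ArgumentNames<First, Second, Rest...> {  // two or more arguments
    static std::string get() {
        return TypeDecryptor<First>::getName() + ", "
             + ArgumentNames<Second, Rest...>::get();
    }
};

// One specialisation covering member-function pointers of any arity.
template<typename R, typename C, typename... Args>
class TypeDecryptor<R (C::*)(Args...)> {
public:
    static std::string getName() {
        return "pointer to a member function (on type "
             + TypeDecryptor<C>::getName()
             + "), taking arguments (" + ArgumentNames<Args...>::get()
             + ") and returning " + TypeDecryptor<R>::getName();
    }
};
```

This still does not cover everything: a const-qualified member function such as R (C::*)(Args...) const is a different type and would need its own specialisation, as would pointers to data members.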

On the plus side, you don’t need to know anything about type precedence rules to write this code, nor have to make any decisions about where in the type declaration to start processing. The C++ compiler does all the hard work. Without too much effort I was able to get something that could parse:

char (Person::*)(int (&)[42])

into

pointer to a member function (on type Person), taking one argument of type reference to array (of size 42) of instances of int and returning char
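The reference and array specialisations needed for that example are not shown above, so here is a sketch of how they might look. Treat it as a reconstruction under assumptions rather than my original code: Person is an empty placeholder struct, and std::to_string (C++11) is used to print the array size.

```cpp
#include <cstddef>
#include <string>

template<typename T>
class TypeDecryptor;  // primary template, declared only

// Leaf types used by the example; Person is a placeholder class.
struct Person {};
template<> class TypeDecryptor<Person> {
public: static std::string getName() { return "Person"; }
};
template<> class TypeDecryptor<int> {
public: static std::string getName() { return "int"; }
};
template<> class TypeDecryptor<char> {
public: static std::string getName() { return "char"; }
};

// Lvalue reference: describe the reference, then the referred-to type.
template<typename T>
class TypeDecryptor<T&> {
public:
    static std::string getName() {
        return "reference to " + TypeDecryptor<T>::getName();
    }
};

// Array of known size: the size N is deduced along with the element type.
template<typename T, std::size_t N>
class TypeDecryptor<T[N]> {
public:
    static std::string getName() {
        return "array (of size " + std::to_string(N)
             + ") of instances of " + TypeDecryptor<T>::getName();
    }
};

// Pointer to member function taking exactly one argument.
template<typename R, typename S, typename T>
class TypeDecryptor<T (R::*)(S)> {
public:
    static std::string getName() {
        return "pointer to a member function (on type "
             + TypeDecryptor<R>::getName()
             + "), taking one argument of type "
             + TypeDecryptor<S>::getName()
             + " and returning " + TypeDecryptor<T>::getName();
    }
};
```

Calling TypeDecryptor<char (Person::*)(int (&)[42])>::getName() with these definitions produces the description quoted above. The argument type int (&)[42] is matched by the reference specialisation first, and only then by the array one, which is exactly the order a human reader has to work out by hand.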
