# Statistics don’t work like that

I’m shortly going to be married, so for the first time in my life I’ve taken an interest in buying jewellery. After my fiancée and I struggled to find the right ring for her, we ended up getting a bespoke one made by a specialist jeweller. Despite the swanky Mayfair address it was an unpretentious experience, run on a relatively small scale and with a personal touch.

All went excellently until after the piece had been delivered, at which point I received a customer satisfaction survey. Nothing particularly wrong with this, but for some reason they decided to phrase the questions thus: “On a scale from 1 to 5, how happy were you with…”

Stop and think about this for a moment. The company has invested their entire business model in making me feel special. Individualising things is what sets them apart from the competition. And yet when it comes to assessing satisfaction, they inadvertently send the message: “Your satisfaction is a number to us. We’re going to add it up and divide by n, where n is the number of customers we have. If the result is a large enough number, we will consider our job done.” I’m not naive enough to think I’m their only customer, but retail is a world of reassuring fictions, and this struck the wrong note.

I don’t have any facts, but I can’t help wondering whether their number of customers is large enough to support any statistically significant analysis beyond a simple arithmetic mean. I very much doubt that it is. But this kind of analysis is the only reason that numerical data is better than free text. If you want to dig deeper into what went wrong for your dissatisfied customers, then numeric data is the worst place to start from. None of this even touches on the deeper problem that people tend to misuse the data, assuming that three 3s count for the same as two 4s and a 1.
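To make that last point concrete, here is a minimal Python sketch. The scores are made up for illustration (they are not the jeweller’s data), and `summarise` is just a hypothetical helper name. It shows that the two sets of answers share the same mean even though one of them hides a thoroughly unhappy customer.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical survey responses on the 1-to-5 scale (illustrative, not real data).
all_lukewarm = [3, 3, 3]
one_unhappy = [4, 4, 1]


def summarise(scores):
    """Return the mean, population standard deviation and per-score counts."""
    return {
        "mean": mean(scores),
        "spread": round(pstdev(scores), 2),
        "counts": dict(sorted(Counter(scores).items())),
    }


lukewarm_summary = summarise(all_lukewarm)
unhappy_summary = summarise(one_unhappy)

print(lukewarm_summary)
print(unhappy_summary)
```

Both summaries report a mean of 3, so anyone looking only at the average would treat the two groups as identical. The spread and the per-score counts are what reveal the customer who answered 1, and even those only say that something went wrong, not what.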

So why do so many people reach for discrete numeric ranges when they want to gather information? My guess is that it feels more “professional”, more sciency, more like “real work”. Dealing with numbers rather than messy human ideas adds a facade of objectivity, but it’s really only a facade: the translation is taking place just the same, and different people will make the translation differently. Objectivity is nice to have, all other things being equal, but it is no substitute for insight. In a small-scale retail environment, one’s judgment and experience ought to be good enough to draw conclusions without resorting to cargo-cult data analysis.

# Book review: Head First Statistics vs. The Manga Guide to Statistics

Statistics is a subject that most people have less understanding of than they ought to, not least because it’s usually such a dry topic. As Zed Shaw pointed out, the lack of understanding of statistics is something of a blind spot for programmers, who tend to think of themselves as numerically proficient but often dismiss statistics as unimportant “stamp collecting” for people who can’t do “real maths”.

When I came across two books recently that try to make the subject more fun and approachable, I was initially quite sceptical. In my opinion, the main problem with statistics is not that people don’t spend time trying to learn it, but rather that they don’t properly comprehend the underlying principles. Too often teachers try to make it more approachable by omitting the mathematics, leaving just a series of “black box” techniques into which the student plugs the numbers. The problem with this is that it’s easy to plug numbers into the wrong black box.

The Manga Guide to Statistics tells the story of Rui, a young girl who takes a course on statistics in order to impress an attractive teacher. The reader learns through Rui’s eyes as we observe a series of lessons. Though Rui lacks interest in the topic, she seems intelligent and the teacher strikes a good tone that avoids ever being patronising. The characters make for an entertaining read, and unlike those in “Head First”, the jokes actually made me laugh.

Of course, all this is for naught if the book doesn’t teach the concepts of statistics. Thankfully (and somewhat surprisingly to me), the answer is that it teaches the topic rather well. The main downside to the manga style is that information density is very low, and plenty of technical details are sketched or glossed over entirely. One positive side to this is that omitting details makes the principles stand out more clearly. This certainly won’t be your only book on statistics, but as an introduction it’s an engaging and memorable one.

Engaging and memorable are two adjectives that ought to apply to “Head First” as well, but I’m less convinced by this one. It strikes a chatty, informal tone with plenty of simplified examples, diagrams and other visual aids. A key part of its approach is to express the same idea several times in different forms, the effect of which varies between useful and infuriating depending on whether you were having trouble with the concept. Unlike with other Head First titles, I found the tone of this one positively patronising at times.

One obvious difference from the Manga Guide is the sheer volume of information presented. It goes into a lot more depth on graphing, probability and combinatorics, among other things. The pages are somewhat more information-dense, but even so it runs to a massive 677 pages (roughly equivalent to Schneier’s Applied Cryptography, if that gives you any idea). In terms of topics covered you won’t be left wanting, though the format doesn’t make a good reference book.

If you’ve learned statistics before and just need a refresher, the Manga Guide makes a good change of pace even if (or perhaps especially if) you wouldn’t normally read manga. I could certainly recommend the Head First book if you have struggled with learning statistics in other books and want to take things slowly, and if you don’t like the idea of switching to a “real” textbook once you’ve finished with the introductory material.