Archive for November, 2009

Pass the Markup Hat

Wednesday, November 18th, 2009

This is a game for ten or more programming languages. The rules are as follows:

  1. Every single non-alphanumeric character in the ASCII set is put into a hat. Players draw from the hat without seeing what they are drawing.
  2. At the beginning of the game, LISP draws two matched parentheses from the 2-card “LISP deck”, which is then exhausted. Both LISP and the LISP deck play no further part in the game.
  3. The players sit in a circle, and play proceeds clockwise. Programming languages choose in a random order, apart from ALGOL, who chooses first.
  4. Each player borrows 3 characters from the player to their right, and draws from the hat as many additional random punctuation characters as they feel they need to create an expressive language and a related documentation format. They must then choose an escape character, which must not be the same as any of their characters, but must duplicate a non-escape character picked by a previous language.
  5. At any point during the 1990s, a player may play the Unicode card. From then on, the original ASCII hat is supplemented by a Unicode bin containing every single character used by a living language, and most dead ones.
    • NB: It is a common house rule that all players ignore the unicode bin with the exception of F#, who chooses last and often has to root around in it for some unused characters
  6. Any player who feels that the game is proceeding too slowly may play the SGML card, and may pick any matched pair of punctuation used by a previous player and redefine it to be the foundation upon which further markup is based. Subsequent players no longer need to define a documentation format, though they must now pick two escape characters.
  7. Once all the players have chosen their punctuation, each must write an operating system. Points will be deducted for any code that is valid in more than one language.

Paddington’s first microloan

Tuesday, November 17th, 2009

One story I remember reading as a child was the episode of Paddington where everyone’s favourite anthropomorphised Peruvian bear makes his first deposit at the bank. When he goes to withdraw his five pounds, he is flabbergasted to learn that not only do his interest payments not amount to much (there’s a joke about bearer bonds in here somewhere) but that he doesn’t even get the same five pound note back that he put in—indeed his original note may even have been burnt, if it was an old one.

The recent fuss about micro-lending site Kiva not being all it appears to be reminds me of Paddington. A certain segment of the internet community is shocked, shocked, to find out that the loans they’ve been deciding whether to make or not have already been signed, sealed and delivered in the field. For people who were assuming that they had the power to decide who did and didn’t get a loan, this is undoubtedly a surprise.

The situation isn’t as bad as it has been made out to be: I’m assuming there is some sort of feedback effect between loans made and future loans funded. If Kiva lenders decide en masse to support only women, or goat-farmers, or people in Cambodia, then Kiva will request more such loans from their field partners and eventually more such loans will be made. Nor is it the case as far as I can see that Kiva is double-counting loans to a recipient. If you want to believe that the dollars received by a particular Cambodian goat-farmer are in some sense “your” dollars, then you can continue to do so.

Like Paddington, some charity donors seem to be confused about the fungibility of their donation. Charities may promise that your donation goes to the project of your choice, and maybe even that none of it is spent on administrative costs, but this is of course a half-truth. If your donation is added to the total on Project Y, this may mean that somebody elses (unrestricted) donation isn’t needed for Project Y, and that that donation can be used for administrative expenses. The net effect is no more benefit to Project Y, but a decrease in the deficit for administrative expenses.

This is as it should be. Administration is a vital link in the chain that drives the quite remarkable (though often unremarked-upon) process of turning figures in my bank account and clicks of my mouse into clean water and education and food for some of the poorest people in the world.

flabbergasted

Relational Database Basics: What is Atomicity?

Monday, November 16th, 2009

Atomicity is an important concept in databases, indeed it’s a key part of the definition of first normal form. But it’s a surprisingly slippery concept, and our intuitive ideas don’t seem to serve us well enough.

Codd gave the definition that atomic data is data that “cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)”. Taken literally and not allowing for ad hoc exclusions, this definition would require that every field be a single boolean value: a string can be decomposed into characters and even an integer can be decomposed into prime factors, if we care to do so. Clearly we can choose a set of allowable operators that give a sensible definition of atomicity, but we risk begging the question.

The observation above leads fairly naturally to the idea that the concept of atomicity is a product of the operators we intend to use on the data. When you start to look at things this way, the intuitive grasp of which relations are in first normal form turns out to be more complicated than you might think. Take the following relation for example, which I’m going to assume everybody will agree is in first normal form:

Database atomicity uncontroversial example

Let’s assume that Alice, Bob and Charles all work on the market selling fruit and vegetables, and that in their part of town the only products that customers have any interest in are Apples, Bananas, Cherries and Durians.

Database atomicity controversial example

Many people would claim that this is not in first normal form, since the “products sold” field is non-atomic. However, there is a fairly simple isomorphism between the two cases.

For a start, we can map our unordered set of products sold into an ordered tuple quite easily, since there is a finite number of elements that are allowed to be in the set (since greengrocers in this part of town can sell only the four products).

Database atomicity isomorphism

However, there’s also a trivial isomorphism between ordered tuples of booleans and integers in an appropriate range, given by the binary encoding of the integer. It so happens that if we assume karate ranks run from 10th Kyu to 6th Dan (essentially -10 to +6, with no zero) we can biject these with the numbers 0 to 15. If you turn the sets of products into tuples this way, and then turn them into numbers, then map these numbers to karate grades, you’ll find that the output data is exactly the same as the first relation, which is in first normal form.

How to make sense of this? Normal forms eliminate (some) redundancy, but they don’t enforce good design. The second table may be in first normal form, but it isn’t good design. The reason that it isn’t good design has nothing to do with relational theory and everything to do with the way in which we intend to use the data. “Does Alice sell Durians?” is a reasonable question to ask, but “Is Alice’s karate rank isomorphic to an odd number?” is a directly equivalent but unreasonable question to ask. As a database designer, it is your job to anticipate as many valid questions as possible, without over-complicating the model to support invalid questions.