Category Archives: Database

Relational Database Basics: What is Atomicity?

Atomicity is an important concept in databases, indeed it’s a key part of the definition of first normal form. But it’s a surprisingly slippery concept, and our intuitive ideas don’t seem to serve us well enough.

Codd gave the definition that atomic data is data that “cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)”. Taken literally and not allowing for ad hoc exclusions, this definition would require that every field be a single boolean value: a string can be decomposed into characters and even an integer can be decomposed into prime factors, if we care to do so. Clearly we can choose a set of allowable operators that give a sensible definition of atomicity, but we risk begging the question.

The observation above leads fairly naturally to the idea that the concept of atomicity is a product of the operators we intend to use on the data. When you start to look at things this way, the intuitive grasp of which relations are in first normal form turns out to be more complicated than you might think. Take the following relation for example, which I’m going to assume everybody will agree is in first normal form:

Database atomicity uncontroversial example

Let’s assume that Alice, Bob and Charles all work on the market selling fruit and vegetables, and that in their part of town the only products that customers have any interest in are Apples, Bananas, Cherries and Durians.

Database atomicity controversial example

Many people would claim that this is not in first normal form, since the “products sold” field is non-atomic. However, there is a fairly simple isomorphism between the two cases.

For a start, we can map our unordered set of products sold into an ordered tuple quite easily, since there is a finite number of elements that are allowed to be in the set (since greengrocers in this part of town can sell only the four products).

Database atomicity isomorphism

However, there’s also a trivial isomorphism between ordered tuples of booleans and integers in an appropriate range, given by the binary encoding of the integer. It so happens that if we assume karate ranks run from 10th Kyu to 6th Dan (essentially -10 to +6, with no zero) we can biject these with the numbers 0 to 15. If you turn the sets of products into tuples this way, and then turn them into numbers, then map these numbers to karate grades, you’ll find that the output data is exactly the same as the first relation, which is in first normal form.

How to make sense of this? Normal forms eliminate (some) redundancy, but they don’t enforce good design. The second table may be in first normal form, but it isn’t good design. The reason that it isn’t good design has nothing to do with relational theory and everything to do with the way in which we intend to use the data. “Does Alice sell Durians?” is a reasonable question to ask, but “Is Alice’s karate rank isomorphic to an odd number?” is a directly equivalent but unreasonable question to ask. As a database designer, it is your job to anticipate as many valid questions as possible, without over-complicating the model to support invalid questions.

Customising sort order in MySQL

I came across a situation the other day where I needed to sort my result set in the database for efficiency: we were selecting a small number of a very large result set, so sorting on the client would require the entire data set to travel over the wire. Unfortunately, the requirements of the front end weren’t compatible with MySQL sort order, since the customer wanted empty strings sorted to the bottom of the list.

In other words, I wanted to be able to do something like the following:

  SELECT id, name
    FROM people
ORDER BY name
   LIMIT 20;

and end up with the result set

id name
3 Alice
7 Bob
4

Changing the schema wasn’t an option, and nor was doing the majority of the sort in MySQL and post-filtering, since that would cause the wrong number of results to be returned after some had been shuffled to the bottom of the list. Luckily, there’s a quick hack in MySQL that allows this sort of thing to be done:

  SELECT id, name
    FROM people
ORDER BY FIELD(name, ''), name
   LIMIT 20;

This works because the FIELD() function returns the index of the name value in the list of fields given, and 0 if it is not present. That is, if the name is an empty string, this expression will return 1, and if not it will return 0. This causes empty strings to be sorted to the end of the list as desired.

Note that this prevents an index from being used for the ordering, which may be a problem depending on the size of the result set. In my case, the query was such that an index couldn’t be used anyway, so there was no substantial loss of efficiency.