Relational Database Basics: What is Atomicity?

Atomicity is an important concept in databases, indeed it’s a key part of the definition of first normal form. But it’s a surprisingly slippery concept, and our intuitive ideas don’t seem to serve us well enough.

Codd gave the definition that atomic data is data that “cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)”. Taken literally and not allowing for ad hoc exclusions, this definition would require that every field be a single boolean value: a string can be decomposed into characters and even an integer can be decomposed into prime factors, if we care to do so. Clearly we can choose a set of allowable operators that give a sensible definition of atomicity, but we risk begging the question.

The observation above leads fairly naturally to the idea that the concept of atomicity is a product of the operators we intend to use on the data. When you start to look at things this way, the intuitive grasp of which relations are in first normal form turns out to be more complicated than you might think. Take the following relation for example, which I’m going to assume everybody will agree is in first normal form:

Database atomicity uncontroversial example

Let’s assume that Alice, Bob and Charles all work on the market selling fruit and vegetables, and that in their part of town the only products that customers have any interest in are Apples, Bananas, Cherries and Durians.

Database atomicity controversial example

Many people would claim that this is not in first normal form, since the “products sold” field is non-atomic. However, there is a fairly simple isomorphism between the two cases.

For a start, we can map our unordered set of products sold into an ordered tuple quite easily, since there is a finite number of elements that are allowed to be in the set (since greengrocers in this part of town can sell only the four products).

Database atomicity isomorphism

However, there’s also a trivial isomorphism between ordered tuples of booleans and integers in an appropriate range, given by the binary encoding of the integer. It so happens that if we assume karate ranks run from 10th Kyu to 6th Dan (essentially -10 to +6, with no zero) we can biject these with the numbers 0 to 15. If you turn the sets of products into tuples this way, and then turn them into numbers, then map these numbers to karate grades, you’ll find that the output data is exactly the same as the first relation, which is in first normal form.

How to make sense of this? Normal forms eliminate (some) redundancy, but they don’t enforce good design. The second table may be in first normal form, but it isn’t good design. The reason that it isn’t good design has nothing to do with relational theory and everything to do with the way in which we intend to use the data. “Does Alice sell Durians?” is a reasonable question to ask, but “Is Alice’s karate rank isomorphic to an odd number?” is a directly equivalent but unreasonable question to ask. As a database designer, it is your job to anticipate as many valid questions as possible, without over-complicating the model to support invalid questions.

Leave a Reply

Your email address will not be published. Required fields are marked *