Relational Database Basics: What is a relation?

The biggest misunderstanding people tend to have about the relational model concerns the term “relation” itself. Since people tend to learn relational theory as an add-on to learning about SQL, they naturally learn that the things you put the data in are called “tables” and that tables are related to each other. The natural (but incorrect) assumption, then, is that “relational” refers to the relationships that exist between tables, and this couldn’t be more wrong.

The simplest explanation

Put simply, a “relation” is what SQL calls a table. If you learn nothing else about relational theory, at least understand this. This is an oversimplification of course, but it’s close enough to being true that if you don’t want to learn any theory, it will at least make the discussions of theorists more comprehensible.

The mathematical explanation

This isn’t the best way to understand what a relation is, but if you intend to have meaningful discussions with other practitioners, you will need to have a common understanding based on a definition that is unambiguous. I’ll therefore get the mathematical explanation out of the way here; if it doesn’t make much sense, return to it after the more intuitive description below. Though this is a relatively formal description, it doesn’t come close to being totally precise, and anyone who wants to know more is encouraged to investigate a book on the subject.

An attribute is a combination of a name and a type identifier, where we can for the moment treat a type as being a (possibly infinite) set of values with some operators defined on it. Think of an attribute as being like a column definition.

A tuple is a set of distinct attributes, where each attribute is associated with one value that is an instance of the type for that attribute. The members of a tuple do not possess an inherent order, and tuples ordered in different ways for display purposes nevertheless represent the same tuple.

A relation consists of a heading and a body. The heading is a (possibly empty) set of attributes with distinct names. The body is a (possibly empty) set of tuples, each of which has the same set of attributes as the heading of the relation.

To put this in terms familiar to an SQL user: an attribute is analogous to a column definition, a tuple is analogous to a row and a relation is analogous to a table. Note that this is an over-simplification, mostly because we think of the rows and columns of a table as possessing an inherent order, and mathematical relations have no such order.
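If the definitions above feel abstract, here is a rough sketch of them in Python. The modelling choices are my own assumptions, not standard notation: an attribute is a (name, type) pair, a tuple is a frozenset of (attribute name, value) pairs, and `Relation`, `make_tuple` and `conforms` are hypothetical names invented for this illustration. The two sample rows use the values that appear later in this article.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    heading: frozenset  # a set of (name, type) attributes
    body: frozenset     # a set of tuples


def make_tuple(**values):
    """Build a tuple as an unordered set of (attribute name, value) pairs."""
    return frozenset(values.items())


heading = frozenset({("name", str), ("team", str), ("position", str)})

# The same tuple written with its attributes in two different orders.
row_a = make_tuple(name="Alice", team="W1", position="GA")
row_b = make_tuple(position="GA", name="Alice", team="W1")

# The body is a set, so adding the same tuple twice changes nothing.
players = Relation(
    heading,
    frozenset({row_a, row_b, make_tuple(name="Charles", team="M1", position="WA")}),
)


def conforms(relation):
    """True if every tuple has exactly the heading's attribute names,
    each paired with a value of the declared type."""
    types = dict(relation.heading)
    return all(
        {name for name, _ in t} == set(types)
        and all(isinstance(value, types[name]) for name, value in t)
        for t in relation.body
    )


print(row_a == row_b)     # attribute order does not matter
print(len(players.body))  # duplicate tuples collapse into one
print(conforms(players))
```

Because both the tuple and the body are sets, neither the "columns" nor the "rows" have any order to rely on, which is the main point where this picture departs from an SQL table.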

One other thing that bears stating at this point is that a relation is technically an immutable value, and is held in a mutable variable called a relvar. This is analogous to common programming practice where an integer like 5 is immutable, but held in a mutable integer variable. If you “insert a row into a table”, then actually you change the contents of that relvar from one relation to another. This distinction is rarely of relevance in discussing theoretical issues.
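The value/variable distinction can be sketched in a few lines of Python. This is only an analogy under my own assumptions: a frozenset stands in for a relation value, an ordinary Python variable stands in for the relvar, and `row` is a hypothetical helper.

```python
def row(**values):
    """A tuple modelled as an unordered, immutable set of (name, value) pairs."""
    return frozenset(values.items())


# The variable `players` plays the role of the relvar; the frozenset it
# currently holds plays the role of the relation value.
players = frozenset({row(name="Alice", team="W1", position="GA")})
before = players

# "Inserting a row" builds a new relation value and assigns it to the relvar.
# The old value is not modified.
players = players | {row(name="Charles", team="M1", position="WA")}

print(len(before))   # the old relation still has one tuple
print(len(players))  # the relvar now holds a different relation
```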

The intuitive explanation

Unless you’re already familiar with relational theory, that was probably all rather unclear, in which case the only vital things to take away are: columns are unordered, and rows are unordered. If you are familiar with relational theory, you’re probably angry at me for making so many mistakes, in which case please point them out in the comments.

So what does this mean in intuitive terms? A common intuitive feeling about tables is that they represent a list of entities, and indeed this understanding works nicely for simple cases. Take a table of salaried employees in FictoCorp:

Table showing list of employees in a fictional company

The head of HR for this company might look at the table and say, “yep, those are my employees—I’d recognise ’em anywhere.” As far as they’re concerned, each row in this table represents one of the employees they have to deal with. Furthermore, no row represents more than one employee, and there’s no employee of the company who doesn’t have a row.

It just so happens that FictoCorp (who have a lot of important customers in the netball industry) has a policy that all employees must play for one of the company’s netball teams. In order to keep track of this, the team captain keeps the following table in the company database:

A table showing the netball teams and positions of fictional players

We’ll simplify things by only displaying four employees, though obviously there would be more.

As an aside, netball has the nice property that the positions are named and unique; no player can be on the same team playing in the same position as another player. Therefore the combination of Netball Team and Position uniquely identifies a single employee. Obviously this constraint makes it impossible for FictoCorp to hire or fire people other than in unisex groups of 7 (in order that they can add or remove an entire netball team at once), but hey, it’s worth it for all the lucrative netball-industry contacts.
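The uniqueness property described above can be checked mechanically. The following is a minimal sketch, assuming rows are held as plain Python dicts; `is_key` is a hypothetical helper, and the clashing row is made up purely to show a violation.

```python
def is_key(rows, key):
    """Return True if no two rows agree on every attribute named in `key`."""
    seen = set()
    for r in rows:
        k = tuple(r[attribute] for attribute in key)
        if k in seen:
            return False
        seen.add(k)
    return True


# Only the two rows whose values are quoted in this article.
players = [
    {"name": "Alice", "team": "W1", "position": "GA"},
    {"name": "Charles", "team": "M1", "position": "WA"},
]

# A made-up row that breaks the rule: a second Wing Attack on team M1.
clash = {"name": "Hypothetical Player", "team": "M1", "position": "WA"}

print(is_key(players, ("team", "position")))
print(is_key(players + [clash], ("team", "position")))
```

In SQL terms this is roughly what declaring a unique or primary key constraint over the two columns would enforce for you.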

As far as the netball club captain is concerned, the entries in this table are the employees. Any employee will be in this table, and anyone in this table is an employee. So who is right, the HR manager or the netball club captain? Which table “holds” the employees? And if one table “is” the set of employees, what does that mean about the other table?

A digression

FictoCorp’s netball teams are so successful that the major league teams start to send talent scouts to their games. One day, the manager of a professional team rings up to enquire about hiring one of FictoCorp’s players.

“He was brilliant, we just have to have him … Any price, any price at all … His name? I don’t remember that, but he was definitely playing Wing Attack for your Men’s First team”

Luckily, this information is all that is needed to identify the player in question as Charles. The table of netball players works equally well for finding a player from their netball team and position as for finding a team and position from a player.

From the point of view of an outsider to FictoCorp, the table is a list of teams and playing positions, with the useful effect that the player’s name can be looked up. The talent scout’s view and the club captain’s view of the meaning of the table are different, but both are using the same table.
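To make the "same table, two views" point concrete, here is a small sketch in Python. It assumes rows are plain dicts and uses only the two rows whose values are quoted in this article; `select` is a hypothetical helper, not part of any library.

```python
def select(rows, **wanted):
    """Return the rows whose attributes match every given name=value pair."""
    return [r for r in rows if all(r[a] == v for a, v in wanted.items())]


players = [
    {"name": "Alice", "team": "W1", "position": "GA"},
    {"name": "Charles", "team": "M1", "position": "WA"},
]

# The talent scout's question: who plays Wing Attack for the Men's First team?
scout_view = select(players, team="M1", position="WA")

# The captain's question: where does Charles play?
captain_view = select(players, name="Charles")

print(scout_view[0]["name"])
print(captain_view[0]["team"], captain_view[0]["position"])
```

Both questions are answered by the same function over the same rows. Nothing in the data marks the name as the "subject" and the team and position as "properties", or the other way round.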

Resolving the ambiguity

The netball players table is neither a container of people, nor a container of playing positions. Both of these are extrinsic to the table: they will continue to exist if the table is deleted, though FictoCorp may no longer have the information it needs to get the necessary work done.

One way to think of the relation is in terms of the corresponding predicate: a function that takes a group of objects and produces a true or false value. An informal definition of the predicate for the netball players table might be:

There exists a player called X, who plays on team Y in position Z

If we substitute into this values from the table, we get true values from the function:

There exists a player called Alice, who plays on team W1 in position GA (true)

There exists a player called Charles, who plays on team M1 in position WA (true)

If we substitute in other values, we get false values from the function:

There exists a player called Charles, who plays on team W1 in position WA (false)

You can think of this as a function on a 3-dimensional space, where one dimension is the list of every person in the world, one dimension is every netball team FictoCorp has and the final dimension is every possible position in a netball team:

Diagram of a relation on a 3-dimensional space

The predicate is a function over this entire 3-dimensional space. The tuples (rows) in the relation represent points in this space for which the function evaluates to true. Tuples that could be in the table, but aren’t, represent points in this space for which the predicate evaluates to false.

Things to note:

  • The predicate evaluates to true or false on every point in this space; nowhere in the space is the predicate undefined
  • The predicate can’t be evaluated anywhere but points in this space; it would be meaningless to do so
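The predicate-over-a-space idea can be sketched in Python. Everything here is an illustration under stated assumptions: the list of people is cut down to two names (it should really be everyone in the world), the teams are just the two mentioned above, the seven positions are the standard netball ones, and `plays` is a hypothetical name. Points are written as (name, team, position) in a fixed order only to keep the sketch short.

```python
from itertools import product

# Assumed, heavily truncated domains for each of the three dimensions.
people = {"Alice", "Charles"}
teams = {"W1", "M1"}
positions = {"GS", "GA", "WA", "C", "WD", "GD", "GK"}

# The relation: the points at which the predicate is true.
relation = {("Alice", "W1", "GA"), ("Charles", "M1", "WA")}


def plays(name, team, position):
    """There exists a player called `name`, who plays on `team` in `position`."""
    if name not in people or team not in teams or position not in positions:
        # Outside the space the question is meaningless, not false.
        raise ValueError("point lies outside the space the predicate is defined on")
    return (name, team, position) in relation


space = list(product(people, teams, positions))
true_points = {p for p in space if plays(*p)}
false_points = [p for p in space if not plays(*p)]

print(len(space))                     # every point gets an answer
print(true_points == relation)        # the rows are exactly the true points
print(plays("Charles", "W1", "WA"))   # a point in the space, but not a row
```

Every one of the 28 points in this toy space evaluates to true or false, and the two rows of the table are exactly the true ones.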

In a sense, the predicate gives the meaning of the table, and this meaning won’t change as we add and remove players from various teams. The tuples in the relation (the rows in the table) show us what is currently true in the real world. It is a goal of a well-maintained database that the facts implied by the table always remain a true representation of what is true in the real world, for drawing conclusions about the real world is the reason databases exist.

Objections to this model

One obvious objection to this model is that if people, salaries, netball team positions etc. are all extrinsic to the tables, how do we keep track of an entity that happens not to appear in any of the tables? If FictoCorp has a contractor called Edgar working for the company who isn’t in the employees table, and is excused from being in any of the netball teams, how do we keep track of this person?

The answer is that the database contains all the information we want to store, and nothing else. If the database system needs to answer questions about contractors, it will have a contractors table in which Edgar will appear. If for some reason FictoCorp doesn’t care to know what contractors it has relationships with, then Edgar will be a non-entity as far as the database is concerned.
