Archive for December, 2009

Project Euler in F#: Problem 8

Tuesday, December 22nd, 2009

I’ve been trying to teach myself F# using the Project Euler problems, and I’m starting to feel I’m getting somewhere with the language. The few Euler problems I’ve solved so far have had very straightforward and natural solutions.

Problem 8 is as follows: Find the subsequence of 5 consecutive digits that yield the greatest product when multiplied together, in the 1000-digit number:

73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450

I was able to come up with an F# solution that is one line, plus a helper line to convert the string into a sequence of digits:

let str = "731<...>"

let digits = Seq.map (fun x -> int (Char.GetNumericValue x)) str

let maxproduct num list =
 Seq.max (Seq.map (fun x -> Seq.reduce (*) x) (Seq.windowed num list))

The value digits is just a list of the digits in the string, converted into integers. The function maxproduct works the obvious way: take every subsequence of five digits (Seq.windowed), multiply them together (Seq.reduce, applied to each element of the sequence with Seq.map) and then find the maximum (Seq.max).

The only reason this needs quite so little work is the existence of Seq.windowed in the standard library, which does exactly the right thing in turning a 1000-element list into 996 arrays of subsequences of consecutive digits.

I’m not sure I like ramming all the functions into one line, and I’m sure there must be a way to combine map and reduce without the lambda, which adds a lot of clutter. If this was real code, it would need quite a lot of work to make it readable. However, the standard library is a big win, because the process of ‘windowing’ a sequence is nicely separated from the code. It’s also nice (for toy problems like this, at any rate) that the program is pretty much a definition of the problem, with little thought being necessary as to how to do the processing.

Relational Database Basics: What is a relation?

Wednesday, December 9th, 2009

The biggest misunderstanding people tend to have with the relational model must be the understanding of the term “relation” itself. Since people tend to learn relational theory as an add-on to learning about SQL, they naturally learn that the things you put the data in are called “tables” and that tables are related to each other. The natural (but incorrect) assumption then, is that “relational” refers to the relationships that exist between tables, and this couldn’t be more wrong.

The simplest explanation

Put simply, a “relation” is what SQL calls a table. If you learn nothing else about relational theory, at least understand this. This is an oversimplification of course, but it’s close enough to being true that if you don’t want to learn any theory, it will at least make the discussions of theorists more comprehensible.

The mathematical explanation

This isn’t the best way to understand what a relation is, but if you intend to have meaningful discussions with other practitioners, you will need to have a common understanding based on a definition that is unambiguous. I’ll therefore get the mathematical explanation out of the way here; if it doesn’t make much sense, return to it after the more intuitive description below. Though this is a relatively formal description, it doesn’t come close to being totally precise, and anyone who wants to know more is encouraged to investigate a book on the subject.

An attribute is a combination of a name and a  type identifier, where we can for the moment treat a type as being a (possibly infinite) set of values with some operators defined on it. Think of an attribute as being like a column definition.

A tuple is a set of distinct attributes, where each attribute is associated with one value that is an instance of the type for that attribute. The members of a tuple do not posess an inherent order, and tuples ordered in different ways for display purposes nevertheless represent the same tuple.

A relation consists of a heading and a body. The heading is a (possibly empty) set of attributes with distinct names. The body is a (possibly empty) set of tuples, each of which has the same set of attributes as the heading of the relation.

To put this in terms familiar to an SQL user: an attribute is analogous to a column definition, a tuple is analogous to a row and a relation is analogous to a table. Note that this is an over-simplification, mostly because we think of the rows and columns of a table as posessing an inherent order, and mathematical relations have no such order.

One other thing that bears stating at this point is that a relation is technically an immutable value, and is held in a mutable variable called a relvar. This is analogous to common programming practice where an integer like 5 is immutable, but held in a mutable integer variable. If you “insert a row into a table”, then actually you change the contents of that relvar from one relation to another. This distinction is rarely of relevance in discussing theoretical issues.

The intuitive explanation

Unless you’re already familiar with relational theory, that was probably all rather unclear, in which case the only vital things to take away are: columns are unordered, and rows are unordered. If you are familiar with relational theory, you’re probably angry at me for making so many mistakes, in which case please point them out in the comments.

So what does this mean in intuitive terms? A common intuitive feeling about tables is that they represent a list of entities, and indeed this understanding works nicely for simple cases. Take a table of salaried employees in FictoCorp:

Table showing list of employees in a fictional company

The head of HR for this company might look at the table and say, “yep, those are my employees—I’d recognise ‘em anywhere.” As far as they’re concerned, each row in this table represents one of the employees they have to deal with. Furthermore, no row represents more than one employee, and there’s no employee of the company who doesn’t have a row.

It just so happens that FictoCorp (who have a lot of important customers in the netball industry) has a policy that all employees must play for one of the company’s netball teams. In order to keep track of this, the team captain keeps the following table in the company database:

A table showing the netball teams and positions of fictional players

We’ll simplify things by only displaying four employees, though obviously there would be more.

As an aside, netball has the nice property that the positions are named and unique; no player can be on the same team playing in the same position as another player. Therefore the combination of Netball Team and Position uniquely identifies a single employee. Obviously this constraint makes it impossible for FictoCorp to hire or fire people other than in unisex groups of 7 (in order that they can add or remove an entire netball team at once), but hey, it’s worth it for all the lucrative netball-industry contacts.

As far as the netball club captain is concerned, the entries in this table are the employees. Any employee will be in this table, and anyone in this table is an employee. So who is right, the HR manager or the netball club captain? Which table “holds” the employees? And if one table “is” the set of employees, what does that mean about the other table?

A digression

FictoCorp’s netball teams are so successful that the major league teams start to send talent scouts to their games. One day, the manager of a professional team rings up to enquire about hiring one of FictoCorp’s players.

“He was brilliant, we just have to have him … Any price, any price at all … His name? I don’t remember that, but he was definitely playing Wing Attack for your Men’s First team”

Luckily, with this information is all that is needed to identify that the player in question is Charles. The table of netball players worked equally well as a way of finding a player from their netball team and position as vice versa.

From the point of view of an outsider to FictoCorp, the table is a list of teams and playing positions, with the useful effect that the player’s name can be looked up. The talent scout’s view and the club manager’s view of the meaning of the table are different, but both are using the same table.

Resolving the ambiguity

The netball players table is neither a container of people, nor a container of playing positions. Both of these are extrinsic to the table: they will continue to exist if the table is deleted, though FictoCorp may no longer have the information it needs to get the necessary work done.

One way to think of the relation is in terms of the corresponding predicate: a function that takes a group of objects and produces a true or false value. An informal definition of the predicate for the netball players table might be:

There exists a player called X, who plays on team Y in position Z

If we substitute into this values from the table, we get true values from the function:

There exists a player called Alice, who plays on team W1 in position GA (true)

There exists a player called Charles, who plays on team M1 in position WA (true)

If we substitute in other values, we get false values from the function

There exists a player called Charles, who plays on team W1 in position WA (false)

You can think of this as a function on a 3-dimensional space, where one dimension is the list of every person in the world, one dimension is every netball team FictoCorp has and the final dimension is every possible position in a netball team:

Diagram of a relation on a 3-dimensional space

The predicate is a function over this entire 3-dimensional space. The tuples (rows) in the relation represent points in this space for which the function evaluates to true. Tuples that could be in the table, but aren’t, represent points in this space for which the predicate evaluates to false.

Things to note:

  • The predicate evaluates to true or false on every point in this space; nowhere in the space is the predicate undefined
  • The predicate can’t be evaluated anywhere but points in this space; it would be meaningless to do so

In a sense, the predicate give the meaning of the table, and this meaning won’t change as we add and remove players from various teams. The tuples in the relation (the rows in the table) show us what is currently true in the real world. It is a goal of a well-maintained database that the facts implied by the table always remain a true representation of what is true in the real world, for drawing conclusions about the real world is the reason databases exist.

Objections to this model

One obvious objection to this model is that if people, salaries, netball team positions etc. are all extrinsic to the tables, how do we keep track of an entity that happens not to appear in any of the tables? If FictoCorp has a contractor called Edgar working for the company who isn’t in the employees table, and is excused from being in any of the netball teams, how do we keep track of this person?

The answer is that the database contains all the information we want to store, and nothing else. If the database system needs to be able to be used to answer questions about contractors, it will have a contractors table in which Edgar will appear. If for some reason FictoCorp doesn’t care to know what contractors it has relationships with, then Edgar will be a non-entity as far as the database is concerned.

Where are all the UK entrepreneurs?

Saturday, December 5th, 2009

If all goes well, then by the time this article gets published I will have received my first pay cheque in 6 months. Not because I’ve been unemployed, but because I chose to leave my permanent job for a pre-funding start-up. When I told people I was quitting a steady job to work without salary in the worst downturn since the Second World War, I expected a lot of people to tell me I was being reckless. What I wasn’t expecting from people was incredulity.

I’ve honestly been surprised at the number of people who think it would be impossible for someone like me to forgo six month’s salary. By the time you factor in the reduction in my tax liability, my opportunity cost has been less than the cost of buying a new BMW 1-Series, but if my acquaintances saw me driving a brand new car I doubt they would bat an eyelid. This comparison ignores the fact that shares in a start-up (sometimes) appreciate, while the car costs money to run and depreciates at an alarming rate.

If this attitude is as widespread in the British population as my anecdotal evidence would suggest, this says something slightly alarming: many people think that the best a person can do is to work a steady job and consume the proceeds, and that risk-taking and investment are something for “other people” to do. More worryingly, debt-driven consumption is regarded as perfectly acceptable (nobody would think to ask where I got the money for the 1-series, or whether I could afford it).