Tag Archives: Optimisation

You were never meant to do that with SQL

There seems to be a lot of hatred for SQL in the world at the moment: I can’t think of any other reason why the term NoSQL would catch on in the way that it has, when the key technological distinction is actually the lack of ACID guarantees (which are entirely orthogonal to whether or not SQL is used, as evidenced by non-ACID MySQL and HiveQL, which offers a pretty familiar SQL-like interface on an entirely non-traditional backend).

I wonder whether one of the unspoken reasons for this hatred is that at one point or another almost everyone has ended up doing this sort of thing:

   builder.Add("SELECT foo FROM bar WHERE id = ");

   if (additionalConstraint)
      builder.Add(" AND frobbable = 1 ");

   /* ... ad nauseam ... */

SQL is a hard language to like: it’s never been properly standardised (or rather, it has, but the standard has never been implemented) meaning that you spend too much time worrying about compatibility. Its theoretical underpinning is poor, leading to constructions that are hard for the engine to optimise (meaning more manual work).

However, SQL is a language in its own right, and was never intended to be generated programmatically by another programming language. This shouldn’t come as a surprise, as I struggle to think of any programming language that has been designed to work in this way.

Using SQL from a decent command-line environment is a powerful tool and often a pleasure to use. By comparison, generating SQL programmatically is an abomination that would be worth of The Daily WTF were it not for the fact that nobody’s ever invented an API that offers the same flexibility.

Personally, I blame the vendors. Until RDBMSs can offer the same quality of optimisation that modern compilers can (that is, write in a high-level language and never even think about micro-optimisation) high-performance relational database access will remain a sea of vendor-specific optimiser hacks. Maybe there’s a theoretical reason why optimisers will never be this good, in which case perhaps we do need to abandon the relational model in practice. But let’s not pretend it has anything to do with SQL.

Optimising this blog

Keen followers of this blog will probably have noticed that it runs pretty slowly. Being as this is the second post, I think we can rule out massive amounts of traffic as a cause. Obviously something is wrong in my configuration. Those of you who aren’t interested in server optimisation can probably ignore this post, I’ll try to write something for a more general audience later in the week.

According to top the percentage of time the CPU is in the wait state is close to 25%, which on 4 (virtual) CPU’s means I’m probably blocking entirely on I/O. Since I’m shifting around a few kilobytes of data, disk access ought to be trivial. The only likely candidate is swap usage, and sure enough I’m running with only a few meg of physical memory free.

I’m hosted on a minimally-sized Slicehost slice, which means I have 256Mb of physical RAM to play with. Apache is using 220Mb of virtual address space, and MySQL is using 158Mb, so both ought to be candidates for optimisation.

This post talks about optimising MySQL for a small slice, but doesn’t reach any firm conclusions. I can obviously cut down my MySQL caches quite a bit compared to the stock config file, so I changed the following:

key_buffer = 8M
query_cache_size = 8M

Both settings were reduced from 16Mb, which frees up 16Mb. Apparently the sort buffer is another variable that can be reduced since I shouldn’t have too many large data sets to sort.

sort_buffer_size = 128K

Changing this setting took effect (as evidenced by SHOW VARIABLES), but didn’t result in any reduction in memory use reported by top – perhaps this buffer is allocated and freed when a sort is in progress? Net effect is that MySQL is now using 142Mb.

On the Apache side, this post suggested disabling apache modules that aren’t in use. I disabled mod_cgi, which seemed to take Apache memory usage down to 200Mb. I also ought to be able to optimise the server for low traffic by tweaking the number of threads. I discovered I was running the prefork MPM by running:

$ /usr/sbin/apache2 -l

This meant I could use the settings the post suggested by reducing the minimum number of servers available. I turned both StartServers and MinSpareServers down to 1, and reduced MaxSpareServers to 5. After this Apache still has the same footprint per process, but there are fewer processes running so memory usage is reduced. It’s hard to estimate how much since the footprint per process reported in top contains an unknown amount of shared libraries that don’t add extra overhead per process.

The final score is that there is now around 100Mb physical memory free on the box, compared to 4-6Mb before I attempted to optimise. The blog feels much nippier, though that could be partially due to the effects of caching.