Mistakes not to make when decoding MPEG audio

I recently had to write my own code for parsing MP3 files into frames and data. Ordinarily, I would have relied on third-party code for something like this, but the libraries I could find were either under unacceptable licenses or focussed on playing the audio rather than manipulating the data. I wanted to extract the data frame by frame to produce a cut-down sample of the audio.

I made a couple of mistakes in implementing the spec, which I’m recording here in the vague hope that they might be useful to someone at some point.

Frame sync marker

This is 12 bits, all set, except when it isn’t, in which case it’s 11. Lots of documentation seems to be internally inconsistent on this point, often describing it as 11 bits in the text while showing 12 bits on a diagram.

As far as I can understand it, the point here is that the ISO standards only describe MPEG 1 and 2, which both have a 1 as the first bit of the following field (the version number). Since the first bit of the version field is a 1, it doesn’t matter for ISO compliance whether you treat this as 12 bits of marker and a 1-bit version field, or 11 bits of marker and a 2-bit version field. I don’t know what the official ISO stance is on that since I’ve not been able to justify the cost of buying the original standards documents.

In practice, if you want to write the most generally applicable code, you will have to take account of MPEG 2.5, which is an unofficial standard. In this case the first bit of the version field is zero, so it makes sense to treat the header as 11 bits of frame marker followed by a 2-bit version field.
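To make the 11-bit reading concrete, here is a minimal Python sketch of a sync check. The function name and the MPEG_VERSIONS table are my own for illustration; the version values (00 for MPEG 2.5, 01 reserved, 10 for MPEG 2, 11 for MPEG 1) are as commonly documented, not taken from the ISO text.

```python
# Version field values when the sync marker is treated as 11 bits.
# Assumption: mapping as commonly documented; 0b01 is reserved.
MPEG_VERSIONS = {0b00: "2.5", 0b10: "2", 0b11: "1"}


def parse_sync_and_version(header: bytes):
    """Return the MPEG version as a string, or None if the first two
    bytes do not look like the start of a frame header."""
    if len(header) < 2:
        return None
    # 11-bit sync: all 8 bits of byte 0 plus the top 3 bits of byte 1.
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return None
    # The next 2 bits are the version field.
    version_bits = (header[1] >> 3) & 0b11
    return MPEG_VERSIONS.get(version_bits)  # None for the reserved value
```

A check that insisted on 12 set bits would reject every MPEG 2.5 frame, since their second byte starts with 1110 rather than 1111.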

Frame length

Lots of people provide a simple formula for calculating the frame length from the frame attributes, but some of them play a bit fast and loose, and most gloss over rounding. Worse, the version published in O’Reilly’s MP3: The Definitive Guide is entirely wrong, as far as I can tell (the padding value should not occur within the denominator).

One problem is that a lot of quoted formulae refer to MP3 specifically (i.e. layer-3 audio) and not the more general standard of MPEG audio that shares the same frame format. Beware of any formula that contains the magic number 144. To generalise this you’ll have to find the number of samples per frame, which can be found using the lookup table available here. The formula for frame length is then:

Frame Size = ( (Samples Per Frame / 8 * Bitrate) / Sampling Rate) + Padding Size

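However, this isn’t quite the full story. In order to get the rounding right, you’ll need to pay attention to the “slot size”. A slot is the smallest unit of data in the file, and the slot size is the number of bytes per slot: 1 for layers 2 and 3, but 4 for layer 1. The division should be rounded down to a whole number of slots, so for layers 2 and 3 you round down to a whole byte, while for layer 1 you need to round down to the nearest multiple of 4. Padding, when present, is one slot, so the padding size is also 4 bytes for layer 1 and 1 byte otherwise.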
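Putting the formula and the slot rounding together, a rough Python sketch might look like the following. The names are hypothetical, the bitrate is assumed to be in bits per second and the sampling rate in Hz, and the samples-per-frame values are the commonly quoted ones, so check them against the lookup table before relying on them.

```python
# Samples per frame, keyed by (MPEG version, layer).
# Assumption: commonly quoted values; verify against the lookup table.
SAMPLES_PER_FRAME = {
    ("1", 1): 384, ("1", 2): 1152, ("1", 3): 1152,
    ("2", 1): 384, ("2", 2): 1152, ("2", 3): 576,
    ("2.5", 1): 384, ("2.5", 2): 1152, ("2.5", 3): 576,
}


def frame_length(version, layer, bitrate_bps, sample_rate_hz, padded):
    """Frame length in bytes, including the 4-byte header."""
    slot_size = 4 if layer == 1 else 1
    samples = SAMPLES_PER_FRAME[(version, layer)]
    # Integer arithmetic throughout, so the division rounds down.
    length = (samples // 8) * bitrate_bps // sample_rate_hz
    # Round down to a whole number of slots (only matters for layer 1).
    length -= length % slot_size
    # Padding, when the padding bit is set, is exactly one slot.
    if padded:
        length += slot_size
    return length
```

For MPEG 1 layer 3 the samples-per-frame value is 1152, and 1152 / 8 is where the magic number 144 comes from. With a 128 kbps, 44.1 kHz stream, frame_length("1", 3, 128000, 44100, False) gives 417 bytes, and 418 with the padding bit set.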
