MP3 Archives

by Jamie Zawinski

Here's how we create our audio webcast archives:

Here are the programs that make all that work. Please let me know if you find this useful, or make any improvements...

archiver.pl This is the script that listens to an icecast stream, and saves it to disk. It tries to be robust in the case of network lossage: if the stream goes away, or the network stops responding, it just keeps retrying until the stream comes back. Really all this is doing is opening a network connection, and saving the raw data to a file.
indexer.pl This script looks at the set of saved files and generates HTML pages describing them. Note that each archive consists of two files: file.mp3 and file.time. The former contains the actual data, and the latter is a zero-length file that is simply used to indicate the time at which we started saving this archive (the write date on the mp3 file is the time at which we stopped.)

The reason for these two files is so that we know when we started recording and stopped recording, even in the situation where, due to a network glitch, there's a gap somewhere in the middle of the file: in that case, the elapsed time might be six hours, but the file might only have five hours' worth of data in it.
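To make the two-file convention concrete, here is a minimal sketch (in Python, not the actual indexer.pl) of how the start time, stop time, and any gap could be recovered from a file.mp3 / file.time pair. The function name `archive_times` and the 128 kbps constant-bitrate default are assumptions for illustration only.

```python
import os
from datetime import datetime, timedelta


def archive_times(base, bitrate_kbps=128):
    """Return (started, stopped, elapsed, audio) for base.time / base.mp3.

    bitrate_kbps is an assumed constant bitrate, used only to estimate
    how much audio the saved bytes represent.
    """
    start_ts = os.path.getmtime(base + ".time")  # when saving began
    stop_ts = os.path.getmtime(base + ".mp3")    # last write = when it ended
    elapsed = timedelta(seconds=stop_ts - start_ts)
    # At a constant bitrate, the bytes on disk imply the audio duration.
    size_bytes = os.path.getsize(base + ".mp3")
    audio = timedelta(seconds=size_bytes * 8 / (bitrate_kbps * 1000))
    started = datetime.fromtimestamp(start_ts)
    stopped = datetime.fromtimestamp(stop_ts)
    return started, stopped, elapsed, audio
```

If `audio` comes out noticeably shorter than `elapsed`, the recording had a gap somewhere in the middle.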

slowcat.c This is a program that reads data from a file or files and copies it to stdout at an arbitrary bitrate. This is how we serve up the archives at audio speed instead of full network speed.
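The pacing idea can be sketched in a few lines. This is a Python illustration under stated assumptions, not a port of slowcat.c: the name `slow_copy`, its parameters, and the chunk size are all hypothetical, and `burst_seconds` only mimics the buffer-filling burst that slowcat also offers.

```python
import time


def slow_copy(src, dst, bitrate_bps, burst_seconds=0.0, chunk_size=4096,
              clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst at roughly bitrate_bps bits per second.

    The first burst_seconds' worth of data is sent without pacing.
    Returns the number of bytes copied.
    """
    bytes_per_sec = bitrate_bps / 8
    burst_bytes = bytes_per_sec * burst_seconds
    start = clock()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return sent
        dst.write(chunk)
        sent += len(chunk)
        # Moment by which this many paced bytes should have gone out.
        due = start + max(0.0, sent - burst_bytes) / bytes_per_sec
        delay = due - clock()
        if delay > 0:
            sleep(delay)
```

Pacing against the start time, rather than sleeping a fixed amount per chunk, keeps the average rate right even when individual writes are slow. The `clock` and `sleep` parameters exist only so the function can be exercised without real waiting.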

slowcat has many other bells and whistles: it can insert Icecast/Shoutcast-style metadata; generate a synthetic ID3 tag identifying the data; limit the output to a byte range across the whole set of files (in order to implement HTTP "Range" requests, i.e. byte ranges, for audio seeking); and burst out the first few seconds of the output to fill the client's buffer before falling back to throttled bandwidth.
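Serving a byte range "across the whole set of files" means treating the files as one long concatenated stream and working out which piece of each file falls inside the requested range. A minimal Python sketch of that bookkeeping follows; `plan_range` is a hypothetical helper, not something taken from slowcat.c. HTTP byte ranges are inclusive at both ends, which is why the code adds one to `end`.

```python
def plan_range(sizes, start, end):
    """Map the inclusive byte range [start, end] over concatenated files.

    sizes lists the file sizes in play order. Returns a list of
    (file_index, offset, length) pieces to read, in order.
    """
    pieces = []
    file_start = 0  # position of the current file in the combined stream
    for index, size in enumerate(sizes):
        file_end = file_start + size  # exclusive
        lo = max(start, file_start)
        hi = min(end + 1, file_end)
        if lo < hi:
            pieces.append((index, lo - file_start, hi - lo))
        file_start = file_end
    return pieces
```

For example, with files of 100, 50, and 200 bytes, a request for bytes 90 through 160 touches the tail of the first file, all of the second, and the head of the third.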

audiofs.pl This is, basically, a CGI script that impersonates a file system. You'll note that the URLs pointed to by the audio archive are of the form
    http://host/somewhere/archive/year/month-day.m3u

That file doesn't actually exist: the file http://host/somewhere/archive is actually the audiofs.pl CGI script, and everything after that in the URL is passed as arguments to that CGI.

This CGI behaves differently depending on the file extension it is invoked with: if it ends in .m3u or .pls, then it returns a document of type audio/mpegurl or audio/x-scpls, respectively. These are short files that just contain a URL of an MP3 stream. The MP3 URLs that this script places in these generated playlist files point back to itself.

When it is invoked with .mp3 as the extension, then it actually serves up MP3 data, throttling its speed via the slowcat program, above.
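The dispatch-on-extension trick can be sketched like this. It is a Python illustration, not the real audiofs.pl: the function `respond` and its arguments are hypothetical, and the PLS body is a minimal one. A CGI normally receives the trailing part of the URL in the PATH_INFO environment variable, which is what `path_info` stands in for here.

```python
import posixpath


def respond(script_url, path_info):
    """Return (content_type, body) for a playlist request, else None.

    script_url is the URL of the CGI itself; path_info is whatever
    follows it in the requested URL, e.g. "/year/month-day.m3u".
    """
    stem, ext = posixpath.splitext(path_info)
    # The generated playlist points back at this same script.
    stream_url = script_url + stem + ".mp3"
    if ext == ".m3u":
        return "audio/mpegurl", stream_url + "\n"
    if ext == ".pls":
        body = "[playlist]\nNumberOfEntries=1\nFile1=" + stream_url + "\n"
        return "audio/x-scpls", body
    return None  # a .mp3 request would stream audio instead (not shown)
```

So a player that fetches the .m3u gets back a one-line document naming the .mp3 URL, then requests that, and the same script answers both.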

I don't know about you, but I think it's super cool that CGIs can impersonate whole file systems like this...

silencer.c This is a program that deletes silence from MP3 files. I wrote this program because the time ranges covered by our audio archives are driven by the hours of operation listed on the calendar; so if an event starts late, or if it ends early, then silence slips into the files. This wouldn't be a big deal if people were downloading these archives as files, but since they are streamed, and there's no way to fast forward or rewind, it's a pain to have to wait through half an hour of silence before the music starts!

This program requires libmad, an MPEG Audio Decoder library. There's also a Makefile.
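As a rough illustration of the idea (not the algorithm silencer.c actually uses, which isn't spelled out here), one way to delete silence is to decode each MP3 frame, measure how loud it is, and drop long runs of quiet frames. The Python sketch below covers only the run-finding step; the function name, the threshold, and the minimum run length are all assumptions, and it takes precomputed per-frame levels rather than doing any decoding.

```python
def frames_to_keep(levels, threshold=0.01, min_run=3):
    """Return indices of frames to keep, dropping long quiet runs.

    levels holds one peak amplitude (0.0 to 1.0) per decoded frame.
    A run of at least min_run consecutive frames below threshold is
    dropped; shorter quiet runs are kept, so brief pauses survive.
    """
    keep = []
    run = []  # indices of the quiet run in progress
    for i, level in enumerate(levels):
        if level < threshold:
            run.append(i)
            continue
        if len(run) < min_run:
            keep.extend(run)  # too short to count as silence
        run = []
        keep.append(i)
    if len(run) < min_run:
        keep.extend(run)
    return keep
```

Requiring a minimum run length matters for music: a quiet moment between songs shouldn't be cut, but half an hour of dead air before the band starts should.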

clean-mp3s.pl This script, run periodically, is used to invoke silencer on each of the MP3 files in the archive that have not already been stripped.

Here's the stuff that runs the mixtapes:

mixtape.pl This is much like audiofs.pl, in that it's a pseudo-filesystem that serves audio. This is the script that runs the /mixtape/ URLs.
mixtape-install.pl I construct the mixtapes in iTunes; this is the script that pulls the playlists out of iTunes, copies the files, downsamples them to 128k, and puts them in a directory in the form expected by mixtape.pl.

Rather than using slowcat for streaming, I probably could have convinced the Icecast server to serve up the MP3 data for me, but this seemed easier, and I don't think there would have been any particular performance advantage to using Icecast instead of slowcat. The place where Icecast shines is when lots of people are listening to the same stream at the same time; in an archival situation like this, every listener is hearing something different (or at a different time).