Here's how we create our
audio webcast archives:
- The calendar (in this case, cron) lists
the hours that the club is open; that is, the time periods that
we wish to archive. It runs a script that records the webcast
of that time period to a file.
- Every few minutes, another script looks at the current set
of archived webcasts, and updates an HTML page describing them.
- Once a day, files more than two weeks old are deleted.
- When you click on a link to an archive, it doesn't merely
return the file: instead, it feeds the file to you slowly,
to avoid saturating the network. Since these are 128 kbps MP3 files,
they are served out at a rate of 128 kbps.
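As a rough illustration of the daily cleanup step in the list above, here is a minimal Python sketch. It is not the actual job that runs here: the function name `expire_old_files`, its parameters, and the choice to judge age by modification time are all assumptions made for the example.

```python
import os
import time

TWO_WEEKS = 14 * 24 * 60 * 60  # retention period, in seconds


def expire_old_files(directory, max_age=TWO_WEEKS, now=None):
    """Delete regular files in `directory` last modified more than
    `max_age` seconds ago. Returns the sorted names that were removed.

    Assumption: "age" is judged by the file's modification time.
    """
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```

Run once a day from cron, something like this would keep roughly two weeks of archives on disk.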
Here are the programs that make all that work. Please let me
know if you find this useful, or make any improvements...
|| This is the script that listens to an
icecast stream, and saves it to disk. It tries to be
robust in the case of network lossage: if the stream
goes away, or the network stops responding, it just
keeps retrying until the stream comes back. Really
all this is doing is opening a network connection, and
saving the raw data to a file.
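The retry-until-it-comes-back behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the real script: `record_stream`, `stop_at`, `retry_delay`, and the injectable `opener`, `clock`, and `sleep` parameters are hypothetical names chosen for the example.

```python
import http.client
import time
import urllib.request


def record_stream(url, out_path, stop_at, retry_delay=5.0,
                  opener=urllib.request.urlopen, clock=time.time,
                  sleep=time.sleep, chunk_size=4096):
    """Append raw stream bytes to `out_path` until clock() reaches `stop_at`.

    If the connection fails, times out, or the server closes the stream,
    wait `retry_delay` seconds and reconnect. Returns bytes written.
    """
    written = 0
    with open(out_path, "ab") as out:
        while clock() < stop_at:
            try:
                with opener(url, timeout=30) as stream:
                    while clock() < stop_at:
                        chunk = stream.read(chunk_size)
                        if not chunk:          # server closed the stream
                            break
                        out.write(chunk)
                        written += len(chunk)
            except (OSError, http.client.HTTPException):
                pass                           # network trouble: just retry
            if clock() < stop_at:
                sleep(retry_delay)
    return written
```

Because the file is opened in append mode and never reopened, a dropped connection simply leaves a gap in the audio rather than starting a new file.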
|| This script looks at the set of saved files
and generates HTML pages describing them. Note that each
archive consists of two files: the MP3 file and
file.time. The former contains the
actual data, and the latter is a zero-length file that is
simply used to indicate the time at which we started
saving this archive (the write date on the MP3 file is the
time at which we stopped).
The reason for these two files is so that we know when
we started recording and stopped recording, even in the
situation where, due to a network glitch, there's a gap
somewhere in the middle of the file: in that case, the
elapsed time might be six hours, but the file might only
have five hours' worth of data in it.
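To make the elapsed-time versus data-length distinction concrete, here is a small Python sketch. It assumes the start time is carried by the modification time of the `.time` file, and that the audio is constant-bitrate at 128,000 bits per second; both are assumptions for the example, as is the name `archive_times`.

```python
import os

BYTES_PER_SECOND = 128_000 // 8   # assumed constant-bitrate 128 kbps MP3


def archive_times(mp3_path, time_path, bytes_per_second=BYTES_PER_SECOND):
    """Compare wall-clock recording time with the amount of audio on disk.

    Assumption: the .time file's mtime marks the start of recording,
    and the MP3 file's mtime marks the end.
    """
    started = os.path.getmtime(time_path)
    stopped = os.path.getmtime(mp3_path)
    elapsed = stopped - started
    audio = os.path.getsize(mp3_path) / bytes_per_second
    return {
        "started": started,
        "stopped": stopped,
        "elapsed": elapsed,                 # seconds of wall-clock time
        "audio": audio,                     # seconds of data actually saved
        "gap": max(0.0, elapsed - audio),   # seconds lost to network glitches
    }
```

A six-hour recording with a one-hour network outage would come back with `elapsed` around 21600 and `audio` around 18000.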
|| This is a program that reads data from a
file or files and copies it to stdout at an arbitrary bitrate.
This is how we serve up the archives at audio speed instead
of full network speed.
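The core idea, copying data no faster than a target bitrate, might look like the following in Python. This is a sketch of the general technique rather than the actual program: `throttled_copy` and its parameters are made-up names, and it paces output by comparing bytes sent so far against the wall clock.

```python
import time


def throttled_copy(src, dst, bits_per_second=128_000, chunk_size=4096,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy `src` to `dst` no faster than `bits_per_second`.

    Returns the number of bytes copied.
    """
    bytes_per_second = bits_per_second / 8
    start = clock()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        dst.flush()
        sent += len(chunk)
        # Time at which `sent` bytes should have gone out at the target rate.
        due = start + sent / bytes_per_second
        delay = due - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

At 128 kbps that works out to 16,000 bytes per second, so a 4096-byte chunk is followed by a pause of about a quarter of a second. Pacing against the start time, rather than sleeping a fixed amount per chunk, keeps the long-run rate correct even when individual writes are slow.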
It has many other bells and whistles, such as the ability
to insert Icecast/Shoutcast-style metadata; to generate a
synthetic ID3 tag identifying the data; to limit the
output to a byte range across the whole set of files (in order
to implement HTTP byte-range requests for audio seeking);
and to burst out the first few seconds of the output to fill
the client's buffer before falling back to throttled bandwidth.
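Limiting output to a byte range that spans several files comes down to mapping one range over the concatenated data onto a per-file offset and length. Here is a hedged Python sketch of that mapping; `plan_byte_range` is a hypothetical helper, not part of the program described above. It treats the range as inclusive at both ends, as in an HTTP `Range: bytes=first-last` header.

```python
def plan_byte_range(sizes, first, last):
    """Map the inclusive byte range [first, last] over a sequence of files,
    treated as one concatenated stream, onto per-file segments.

    `sizes` lists each file's length in bytes. Returns a list of
    (file_index, offset_within_file, length) tuples, in order.
    """
    segments = []
    file_start = 0
    for index, size in enumerate(sizes):
        file_end = file_start + size      # exclusive end of this file
        lo = max(first, file_start)
        hi = min(last + 1, file_end)      # exclusive end of the overlap
        if lo < hi:
            segments.append((index, lo - file_start, hi - lo))
        file_start = file_end
    return segments
```

For three files of 100, 50, and 200 bytes, the range 90 through 160 touches the last 10 bytes of the first file, all of the second, and the first 11 bytes of the third.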
|| This is, basically, a CGI script that
impersonates a file system. You'll note that the URLs
pointed to by the audio archive are of the form
That file doesn't actually exist: the file
http://host/somewhere/archive is actually
the audiofs.pl CGI script, and everything after that
in the URL is passed as arguments to that CGI.
This CGI behaves differently depending on the file
extension it is invoked with: if it ends in .m3u
or .pls, then it returns a document of type
audio/mpegurl or audio/x-scpls, respectively.
These are short files that just contain a URL of an MP3
stream. The MP3 URLs that this script places in these
generated playlist files point back to itself.
When it is invoked with .mp3 as the extension,
then it actually serves up MP3 data, throttling its speed
via the slowcat program, above.
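The extension-based dispatch can be sketched as follows. This is Python for illustration only (the real thing is a Perl CGI), and `dispatch`, `path_info`, and `script_url` are names invented for the example. The M3U and PLS bodies shown are the simplest single-entry forms of those playlist formats, and `audio/mpeg` for the MP3 case is the standard MIME type rather than something stated above.

```python
import posixpath


def dispatch(path_info, script_url):
    """Pick a response based on the extension of the requested pseudo-file.

    Returns (content_type, body). For .mp3 the body is None, meaning the
    caller should stream the audio data itself (throttled).
    """
    stem, ext = posixpath.splitext(path_info)
    # Playlists point back at this same script, with an .mp3 extension.
    mp3_url = script_url + stem + ".mp3"
    if ext == ".m3u":
        return "audio/mpegurl", mp3_url + "\n"
    if ext == ".pls":
        body = "[playlist]\nNumberOfEntries=1\nFile1=" + mp3_url + "\n"
        return "audio/x-scpls", body
    if ext == ".mp3":
        return "audio/mpeg", None
    raise ValueError("unsupported extension: " + repr(ext))
```

The point of the indirection is that a browser hands the tiny playlist file to an audio player, and the player then fetches the `.mp3` URL and gets the throttled stream.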
I don't know about you, but I think it's super cool
that CGIs can impersonate whole file systems like this...
This is a program that deletes silence from MP3 files. I
wrote this program because the time ranges covered by our
audio archives are driven by the hours of operation listed on
the calendar; so if an event starts late, or if it ends
early, then silence slips into the files. This wouldn't be a
big deal if people were downloading these archives as files,
but since they are streamed, and there's no way to
fast forward or rewind, it's a pain to have to wait through
half an hour of silence before the music starts!
This program requires
an MPEG Audio Decoder library. There's also a
This script, run periodically, invokes silencer on each MP3 file in
the archive that has not already been stripped.
Here's the stuff that runs the
|| This is much like audiofs.pl, in that it's
a pseudo-filesystem that serves audio. This is the script
that runs the /mixtape/ URLs.
|| I construct the mixtapes in iTunes;
this is the script that pulls the playlists out of
iTunes, copies the files, downsamples them to 128k,
and puts them in a directory in the form expected by
Rather than using slowcat for streaming, I probably could have
convinced the Icecast server to serve up
the MP3 data for me, but this seemed easier, and I don't think there would
have been any particular performance advantage to using Icecast instead of
slowcat. Icecast shines when lots of people are
listening to the same stream at the same time; in an archival
situation like this, every listener is hearing something different (or
at a different time).