
[These notes will eventually become part of a tech talk on video encoding. List of all articles in this series.]

Unless you're going to stick to films made before 1927 or so, you're going to want an audio track. A future article will talk about how to pick the audio codec that's right for you, but for now I just want to introduce the concept and describe the playing field. (This information is likely to go out of date quickly; future readers, be aware that this was written in December 2008.)

Like video codecs, audio codecs are algorithms by which an audio stream is encoded. As with video codecs, there are lossy and lossless audio codecs. Today's article will only deal with lossy audio codecs. Actually, it's even narrower than that, because there are different categories of lossy audio codecs. Audio is used in many places where video is not (telephony, for example), and there is an entire category of audio codecs optimized for encoding speech. You wouldn't rip a music CD with these codecs, because the result would sound like a 4-year-old singing into a speakerphone. But you would use them in an Asterisk PBX, because bandwidth is precious, and these codecs can compress human speech into a fraction of the size of general-purpose codecs.

And that's all I have to say about speech-optimized audio codecs. Onward...

As I mentioned in part 2 (lossy video codecs), when you "watch a video," your player software is doing several things at once:

  1. Interpreting the container format
  2. Decoding the video stream
  3. Decoding the audio stream and sending the sound to your speakers
  4. Possibly decoding the subtitle stream as well. (Tomorrow's article will be all about subtitle formats! I can hardly wait!)

The audio codec specifies how to do #3 -- decoding the audio stream and turning it into digital waveforms that your speakers then turn into sound. As with video codecs, there are all sorts of tricks to minimize the amount of information stored in the audio stream. And since we're talking about lossy audio codecs, information is being lost during the recording → encoding → decoding → listening lifecycle. Different audio codecs throw away different things, but they all have the same purpose: to trick your ears into not noticing the parts that are missing.

One concept that audio has that video does not is channels. We're sending sound to your speakers, right? Well, how many speakers do you have? If you're sitting at your computer, you may only have two: one on the left and one on the right. My desktop has three: left, right, and one more on the floor. So-called "surround sound" systems can have six or more speakers, strategically placed around the room. Each speaker is fed a particular channel of the original recording. The theory is that you can sit in the middle of the six speakers, literally surrounded by six separate channels of sound, and your brain synthesizes them and feels like you're in the middle of the action. Does it work? A multi-billion-dollar industry seems to think so.

Most general-purpose audio codecs can handle two channels of sound. During recording, the sound is split into left and right channels; during encoding, both channels are stored in the same audio stream; during decoding, both channels are decoded and each is sent to the appropriate speaker. Some audio codecs can handle more than two channels, and they keep track of which channel is which so that your player can send the right sound to the right speaker.
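To make "both channels in one stream" concrete, here is a small sketch of how decoded PCM audio (what the player hands to the sound card) is commonly laid out: interleaved, one sample per channel per time step, so stereo goes L R L R and 5.1 cycles through six positions. Compressed codecs organize their bitstreams in their own ways; this only shows the decoded side. The `deinterleave` function name and the sample values are made up for illustration.

```shell
#!/bin/sh
# Pull one channel out of interleaved PCM samples.
# Sample values are made-up integers for illustration.
deinterleave() {
    # $1 = number of channels, $2 = channel index (0-based), rest = samples
    n=$1; ch=$2; shift 2
    i=0; out=""
    for s in "$@"; do
        # Every n-th sample, starting at position ch, belongs to this channel.
        if [ $((i % n)) -eq "$ch" ]; then
            out="$out${out:+ }$s"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

deinterleave 2 0 10 -10 20 -20 30 -30   # left:  10 20 30
deinterleave 2 1 10 -10 20 -20 30 -30   # right: -10 -20 -30
```

The same function handles more channels by changing the first argument, which is roughly what "keeping track of which channel is which" means once the audio has been decoded.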

There are lots of audio codecs. Did I say there were lots of video codecs? Forget that. There are a metric fuck-ton of audio codecs. These are the ones you need to know about:

MPEG-1 Audio Layer 3

...colloquially known as "MP3." If you haven't heard of MP3s, I don't know what to do with you. Walmart sells portable music players and calls them "MP3 players." Walmart. Anyway...

MP3s can contain up to 2 channels of sound. They can be encoded at different bitrates: 64 kbps, 128 kbps, 192 kbps, and a variety of others from 32 to 320 kbps. Higher bitrates mean larger file sizes and better quality audio, although the ratio of audio quality to bitrate is not linear. (128 kbps sounds more than twice as good as 64 kbps, but 256 kbps doesn't sound twice as good as 128 kbps.) Furthermore, the MP3 format allows for variable bitrate encoding, which means that some parts of the encoded stream are compressed more than others. For example, silence between notes can be encoded at a very low bitrate, then the bitrate can spike up a moment later when multiple instruments start playing a complex chord. MP3s can also be encoded with a constant bitrate, which, unsurprisingly, is called constant bitrate encoding.
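The bitrate arithmetic is simple enough to do in your head, but here it is as a sketch. For a constant bitrate, bytes = kbps × 1000 × seconds / 8 (ignoring container overhead and ID3 tags). For a variable bitrate stream, the average is the total bits divided by the total duration. The "kbps:seconds" segment notation is a made-up input format for illustration, not anything an encoder emits.

```shell
#!/bin/sh
# Size of a constant-bitrate stream, ignoring container overhead.
# 1 kbps = 1000 bits per second; 8 bits per byte.
cbr_bytes() {
    # $1 = bitrate in kbps, $2 = duration in seconds
    echo $(( $1 * 1000 * $2 / 8 ))
}

# Average bitrate of a VBR stream given as "kbps:seconds" segments.
vbr_average_kbps() {
    bits=0; secs=0
    for seg in "$@"; do
        rate=${seg%%:*}; dur=${seg##*:}
        bits=$(( bits + rate * 1000 * dur ))
        secs=$(( secs + dur ))
    done
    [ "$secs" -gt 0 ] || return 1
    echo $(( bits / secs / 1000 ))
}

cbr_bytes 128 240               # 4-minute song at 128 kbps: 3840000
vbr_average_kbps 32:10 256:30   # quiet intro, then a dense chord: 200
```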

The MP3 standard doesn't define exactly how to encode MP3s (although it does define exactly how to decode them); different encoders use different psychoacoustic models that produce wildly different results, but are all decodable by the same players. The open source LAME project is the best free encoder, and arguably the best encoder period for all but the lowest bitrates.
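As an example of driving an encoder, here is a sketch using two real LAME options: `-b` sets a constant bitrate in kbps, and `-V` selects LAME's variable-bitrate quality scale (0 is highest quality and largest, 9 is smallest). The `encode_mp3` wrapper and the `LAME` override variable are hypothetical conveniences, and 192 kbps / `-V 2` are arbitrary example settings, not recommendations from this article.

```shell
#!/bin/sh
# Encode a WAV file to MP3 with LAME at a constant or variable bitrate.
# Set LAME=echo to print the command instead of running it.
encode_mp3() {
    mode=$1; in=$2; out=$3
    case $mode in
        cbr) ${LAME:-lame} -b 192 "$in" "$out" ;;   # constant 192 kbps
        vbr) ${LAME:-lame} -V 2 "$in" "$out" ;;     # VBR quality level 2
        *)   echo "usage: encode_mp3 cbr|vbr in.wav out.mp3" >&2; return 1 ;;
    esac
}
```

For example, `encode_mp3 vbr song.wav song.mp3`. Because the MP3 standard only pins down decoding, a file produced this way should play in any MP3 player, whichever psychoacoustic model LAME used to make it.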

The MP3 format was standardized in 1991 and is patent-encumbered, which explains why Linux can't play MP3 files out of the box. Pretty much every portable music player supports standalone MP3 files, and MP3 audio streams can be embedded in any video container. Adobe Flash can play both standalone MP3 files and MP3 audio streams within an MP4 video container.

Advanced Audio Coding

...affectionately known as "AAC." Standardized in 1997, it lurched into prominence when Apple chose it as their default format for the iTunes Store. Originally, all AAC files "bought" from the iTunes Store were encrypted with Apple's proprietary DRM scheme, called FairPlay. Many songs in the iTunes Store are now available as unprotected AAC files, which Apple calls "iTunes Plus" because it sounds so much better than calling everything else "iTunes Minus." The AAC format is patent-encumbered; licensing rates are available online.

AAC was designed to provide better sound quality than MP3 at the same bitrate, and it can encode audio at any bitrate. (MP3 is limited to a fixed number of bitrates, with an upper bound of 320 kbps.) AAC can encode up to 48 channels of sound, although in practice no one does that. The AAC format also differs from MP3 in defining multiple profiles, in much the same way as H.264, and for the same reasons. The "low-complexity" profile is designed to be playable in real-time on devices with limited CPU power, while higher profiles offer better sound quality at the same bitrate at the expense of slower encoding and decoding.
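The "fixed number of bitrates" for MP3 is a short table: MPEG-1 Layer III frames can only use 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, or 320 kbps (even VBR MP3 switches among these values frame by frame). An AAC encoder can aim for an arbitrary target, but an MP3 encoder has to pick from the table. This sketch rounds a request down to the nearest legal MP3 bitrate; the rounding policy and the function name are illustrative choices, not how any particular encoder behaves.

```shell
#!/bin/sh
# Legal MPEG-1 Layer III bitrates in kbps.
MP3_RATES="32 40 48 56 64 80 96 112 128 160 192 224 256 320"

# Print the highest legal MP3 bitrate that does not exceed the request
# (or the lowest one, 32, if the request is below every entry).
mp3_rate_at_most() {
    best=""
    for r in $MP3_RATES; do
        [ "$r" -le "$1" ] && best=$r
    done
    echo "${best:-32}"
}

mp3_rate_at_most 150   # 128
mp3_rate_at_most 500   # 320
```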

All current Apple products, including iPods, AppleTV, and QuickTime support certain profiles of AAC in standalone audio files and in audio streams in an MP4 video container. Adobe Flash supports all profiles of AAC in MP4, as do the open source mplayer and VLC video players. For encoding, the FAAC library is the open source option; support for it is a compile-time option in mencoder and ffmpeg. (I'll dive into all the different encoding tools in a future article.)

Windows Media Audio

...a.k.a. "WMA." As you might guess from the name, Windows Media Audio was developed by Microsoft. The acronym "WMA" has historically referred to many different things: a lossless audio codec ("WMA Lossless"), a speech-optimized codec ("WMA Voice"), and several different lossy audio codecs ("WMA 1", "WMA 2", "WMA 7", "WMA 8", "WMA 9", and "WMA Pro"). It is also (incorrectly) used to refer to the Advanced Systems Format, because WMA-encoded audio streams are usually embedded in an ASF container. Roughly speaking, the lossy audio codecs (WMA 1-9) compete with MP3 and low-complexity AAC; WMA Lossless competes with Apple Lossless and FLAC; WMA Pro competes with high-complexity AAC, Vorbis, AC-3, and DTS.

All the different codecs under the "WMA" brand are playable with Windows Media Player, which comes pre-installed on desktops and laptops running Microsoft Windows XP and Vista. Portable devices like the Zune and the ironically named "PlaysForSure" devices can play WMA 1-9; stores that allow you to "purchase" WMA files generally encrypt them with a Microsoft-proprietary DRM scheme. The open source ffmpeg project can play WMA 1-9, and Flip4Mac offers a commercial QuickTime component to encode and decode WMA audio on Mac OS X.

WMA 1-9 support up to 2 channels of sound; WMA Pro supports up to 8 channels of sound. All WMA formats are patent-encumbered; licensing information is available from Microsoft.

Vorbis

...known to many as "Ogg Vorbis," although for some reason that pisses off both Ogg and Vorbis advocates. (Technically, "Ogg" is a container format, and Vorbis audio streams can be embedded in other containers.) Vorbis is not encumbered by any known patents and is therefore supported out-of-the-box by all major Linux distributions and by portable devices running the open source Rockbox firmware. Mozilla Firefox 3.1 will support Vorbis audio files in an Ogg container, or Ogg videos with a Vorbis audio track. Android mobile phones can also play standalone Vorbis audio files. Vorbis audio streams are usually embedded in an Ogg container, but they can also be embedded in an MP4 or MKV container (or, with some hacking, in AVI).

There are open source Vorbis encoders and decoders, including OggConvert (encoder), ffmpeg (decoder), aoTuV (encoder), and libvorbis (the reference encoder and decoder). There are also QuickTime components for Mac OS X and DirectShow filters for Windows.

Vorbis supports an arbitrary number of sound channels.

Dolby Digital

...a.k.a. "AC-3." AC-3 was developed by Dolby Laboratories. AC-3 is most well-known for being a mandatory format in the DVD standard; all DVD players must be able to decode AC-3 audio streams. It is also mandatory for Blu-Ray players, and many digital TV broadcasts send AC-3 audio streams as well. AC-3 supports up to 6 channels of sound and bitrates of up to 640 kbps, although its most popular application -- audio on DVDs -- is officially limited to 448 kbps. (Blu-Ray discs may use the maximum 640 kbps.)

There are open source encoders and decoders for AC-3, including liba52 (decoding), AC3Filter (decoding), and Aften (encoding). ffmpeg has a compile-time option to include liba52, which will allow all ffmpeg-based players and plugin chains (like GStreamer) to play AC-3 audio streams. However, the AC-3 format is patent-encumbered; licensing is brokered by Dolby Laboratories.

AC-3 is rarely seen in standalone audio files; it is designed to be embedded in a video container. Other than DVDs and Blu-Ray discs (which use a video container format I haven't talked about yet), you can embed AC-3 audio streams in MKV, AVI, and -- just standardized earlier this year -- in MP4 files (discussion). Apple's AppleTV set-top box is the only hardware device I know of that supports AC-3 in MP4; you can encode AppleTV-compatible AC3-in-MP4 videos with HandBrake, or manually insert AC-3 audio into existing MP4 files with this Windows-only fork of mp4creator.

Digital Theater Systems

...a.k.a. "DTS." As you might guess from the name, DTS is designed for real-life movie theaters. Like WMA, "DTS" is a brand name for a family of different audio formats. The "core" DTS format supports up to six channels; later extensions like DTS-HD support up to eight channels. There is also DTS-HD Master Audio, a lossless variant by the same company. Core DTS is designed for high bitrates (up to 1536 kbps, which is virtually indistinguishable from being there in the first place). DTS-HD Master Audio bitrates can go even higher, although at some point even audiophiles will wonder why they should bother.

Core DTS was not originally part of the DVD specification, so early DVD players did not support it. Most recent DVD players support natively decoding core DTS audio or passing the audio stream through to an external speaker system which decodes it, but relatively few DVDs include a DTS stream due to size constraints. Core DTS is a mandatory part of the Blu-Ray specification, and many Blu-Ray discs include a DTS audio track -- sometimes the exact same stream that was originally played in the movie theater. (DTS-HD Master Audio is an optional part of the Blu-Ray specification, but few Blu-Ray discs include it due to -- you guessed it -- size constraints.)
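A rough worked example of those size constraints, assuming a two-hour (7200-second) film at the maximum DVD AC-3 bitrate and the maximum core DTS bitrate quoted above, a single-layer DVD of about 4.7 GB (decimal gigabytes), and ignoring container overhead:

```latex
\begin{align*}
\text{size (bytes)} &= \frac{\text{bitrate (bits/s)} \times \text{duration (s)}}{8} \\
\text{AC-3, 448 kbps, 2 h} &: \frac{448\,000 \times 7200}{8} = 403\,200\,000 \approx 0.40\ \text{GB} \\
\text{core DTS, 1536 kbps, 2 h} &: \frac{1\,536\,000 \times 7200}{8} = 1\,382\,400\,000 \approx 1.38\ \text{GB} \\
\frac{1.38\ \text{GB}}{4.7\ \text{GB}} &\approx 0.29
\end{align*}
```

So a full-rate DTS track alone would use more than a quarter of a single-layer disc before any video is stored, on top of the AC-3 track that every DVD player requires.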

DTS is patent-encumbered; licensing is brokered by DTS, Inc.
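Since channel count is one of the practical deciding factors, here is a sketch that encodes the limits listed in this article (as of December 2008) and checks whether a codec can carry a given mix, such as the 6 channels of a 5.1 surround track. The short codec keys (`mp3`, `wmapro`, `dtshd`, and so on) are made-up names for illustration, and Vorbis is treated as unlimited because the article describes it as supporting an arbitrary number of channels.

```shell
#!/bin/sh
# Channel limits as described in this article.
max_channels() {
    case $1 in
        mp3|wma)      echo 2 ;;    # MP3; WMA 1-9
        ac3|dts)      echo 6 ;;    # Dolby Digital; core DTS
        wmapro|dtshd) echo 8 ;;    # WMA Pro; DTS-HD
        aac)          echo 48 ;;
        vorbis)       echo any ;;  # "an arbitrary number"
        *)            return 1 ;;  # unknown codec key
    esac
}

# Succeeds if CODEC can carry N channels.
fits_channels() {
    max=$(max_channels "$1") || return 1
    [ "$max" = any ] && return 0
    [ "$2" -le "$max" ]
}

for c in mp3 aac ac3 vorbis; do
    if fits_channels "$c" 6; then echo "$c: 5.1 OK"; else echo "$c: stereo only"; fi
done
```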

And so forth and so on

As with everything else in this series, this article barely scratches the surface. (Really!) If you like, you can read about other audio codecs: ATRAC, Musepack, MP2, RealAudio, AMR, ADPCM, and so forth and so on. Wikipedia has a comparison of common audio codecs, HydrogenAudio has lots of technical details, and wiki.multimedia.cx is always your friend too.

Tomorrow: subtitles!

§

As far as I can tell, the only thing that leading accessibility experts agree on is that nobody listens to leading accessibility experts, especially not the microformats cabal, which has never cared about accessibility, has never bothered to test it, and has never acknowledged those who have tested it. In fact, the BBC recently removed one microformat from their site because one piece of it may be confusing to some screen reader users with a certain non-default configuration. This proves what leading accessibility experts have been saying all along, that all microformats are inaccessible, and we should all just use RDF.

Meanwhile, the devilish cabal is secretly solving the problem on their public wiki page, their public mailing list, and their public IRC channel. But will it be enough for the BBC? Be sure to tune in next week, when we'll drown a leading accessibility expert to see if she's a witch.

§

Update June 7, 2007: this script is obsolete. Read about the updated version or visit the new project page on Google Code.

Despite my recent switch to Ubuntu, I still own a video iPod 5G, and I have been searching for a nice replacement for Handbrake and iSquint. I've tried every available script and program that purports to encode iPod-compatible video (including Handbrake under Linux), but each falls short on one of several counts:

  • Doesn't produce high-quality video
  • Doesn't support H.264/AAC
  • Doesn't always produce files the iPod can actually read (less of a problem with firmware 1.2, but still)
  • Can't encode directly from a DVD device
  • Can't encode from a pre-ripped DVD directory
  • Can't encode individual video files
  • Doesn't support all video formats
  • Can't batch process multiple tracks
  • Doesn't take advantage of multiple processors

So I wrote my own: podencoder

Prerequisites:

Please don't ask me for help installing these prerequisites. Consider it a character-building exercise.

You can run the script from the command line:

$ podencoder # interactively select tracks from default DVD device
$ podencoder -t longest # auto-select longest track and encode it
$ podencoder ./BBSDOC/ # encode from a pre-ripped directory instead
$ podencoder -t "2,3,4" ./BBSDOC/ # encode multiple tracks
$ podencoder -t "2,3,4,5" -i 7 -n "WEEDS1%i" \
     -o ~/Videos/ipod/Weeds/ ./WEEDS_SEASON_1_DISC_2/
   # encode tracks 2-5 from a DVD directory,
   # store them in ~/Videos/ipod/Weeds/
   # and name them WEEDS107.mp4, WEEDS108.mp4,
   # WEEDS109.mp4, and WEEDS110.mp4 respectively
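The naming in the last example implies that `%i` in the `-n` pattern becomes a two-digit, zero-padded counter starting at the `-i` value. Here is a hypothetical sketch of that rule, inferred from the example rather than taken from podencoder's source; `output_names` is an illustrative name and only the first `%i` is substituted.

```shell
#!/bin/sh
# Hypothetical: expand a naming pattern into one .mp4 name per track.
output_names() {
    pattern=$1; start=$2; shift 2   # remaining arguments are track numbers
    n=$start
    for _track in "$@"; do
        num=$(printf '%02d' "$n")   # 7 -> 07, 10 -> 10
        case $pattern in
            *%i*) name="${pattern%%\%i*}${num}${pattern#*\%i}" ;;
            *)    name="${pattern}${num}" ;;   # no %i: append the counter
        esac
        printf '%s.mp4\n' "$name"
        n=$((n + 1))
    done
}

output_names "WEEDS1%i" 7 2 3 4 5
```

With the arguments from the example, this prints WEEDS107.mp4 through WEEDS110.mp4, one per line.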

You can set defaults in the configuration file (~/.podencoderrc) like a default DVD device (mine is /dev/scd0), a default output directory, and a default scratch directory.

outputdir=/home/mark/Videos/ipod
scratchdir=/home/tmp
device=/dev/scd0
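A file in this key=value shape can be read without sourcing it as shell code, which would execute anything else in the file. This is a hypothetical sketch, not how podencoder necessarily reads its rc file; the `rc_get` name and the `/dev/dvd` fallback are assumptions.

```shell
#!/bin/sh
# Print the value of KEY from a key=value file (last match wins);
# print nothing if the key is absent, fail if the file is unreadable.
rc_get() {
    # $1 = config file, $2 = key
    [ -r "$1" ] || return 1
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

rc="$HOME/.podencoderrc"
device=$(rc_get "$rc" device)
device=${device:-/dev/dvd}   # assumed fallback, not podencoder's actual default
```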

I use /home/tmp as a scratch directory for video encoding because my /tmp directory is on a small partition. If you're encoding from a DVD or DVD directory, podencoder dumps the entire track to disk before encoding it, which can require up to 8 GB of free space.
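Since a DVD track can need up to 8 GB of scratch space, it is worth checking before the dump starts. This sketch uses `df -P` (the one-line-per-filesystem POSIX output format) with `-k` (1024-byte blocks) and reads the "Available" column; `has_free_gb` is an illustrative name, and treating 1 GB as 1024³ bytes is an assumption.

```shell
#!/bin/sh
# Succeed if DIR's filesystem has at least N gigabytes available.
has_free_gb() {
    # $1 = directory, $2 = gigabytes needed
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

scratchdir=${scratchdir:-/tmp}
if ! has_free_gb "$scratchdir" 8; then
    echo "warning: less than 8 GB free in $scratchdir" >&2
fi
```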

On the other hand, if you dislike the command line, you can double-click podencoder or set up a shortcut in your favorite window manager or desktop environment, and it will graphically prompt you to select tracks from the default DVD device (screenshot -- note that it automatically selects the longest track for you). Or you can put a shortcut to it in your Nautilus scripts directory, like this:

$ ln -s /usr/local/bin/podencoder ~/.gnome2/nautilus-scripts/"Encode for iPod"

Then you can right-click a mounted DVD, or a DVD directory, or even a random video file, and select "Scripts → Encode for iPod".

This is my first shell script longer than 3 lines, so please take this opportunity to berate me for my newbie shell scripting mistakes.

Update: script updated with some feedback from the comments. Now uses /bin/sh for greater portability, and a new test to autodetect console-vs-graphical environment (thanks Bob).
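The post doesn't show the console-vs-graphical test, so here is a hypothetical sketch of one common approach (not necessarily Bob's): if standard input is a terminal, the script was started from a console; otherwise, if an X display is set, it was probably launched from a desktop shortcut or Nautilus. `ui_mode` is an illustrative name.

```shell
#!/bin/sh
# Decide whether to prompt on the terminal or show a graphical chooser.
ui_mode() {
    if [ -t 0 ]; then
        echo console          # stdin is a terminal
    elif [ -n "$DISPLAY" ]; then
        echo gui              # no terminal, but an X display is available
    else
        echo console          # no display either: fall back to text
    fi
}

case $(ui_mode) in
    gui)     echo "would show a graphical track chooser" ;;
    console) echo "would prompt on the terminal" ;;
esac
```

Checking the terminal first means that running the script from a terminal inside an X session still gets the console interface.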

Update June 7, 2007: this script is obsolete. Read about the updated version.

§

Ultra-condensed version: I'm videoblogging now. To watch the videos online, you just need Flash 9 or later. To watch the downloaded videos, Linux users should install MPlayer, Windows users should install FFDSHOW and Media Player Classic, and Mac users should upgrade to QuickTime 7.

Introduction

I recently started posting videos on my blog (It's the Dive Into Mark show!; You make bunny cry; Waiter, there's a fly in my studio). The videos are closed captioned for the hearing impaired. Herein lie the gory details of how to watch the videos and display the captions.

Briefly, all videos are encoded in a particular format (called a codec). To play a video, you need software that understands the video codec. To hear the audio while the video is playing, you need software that understands the audio codec. The video and audio (and captions) are stored in a single file (called a container), so your software needs to understand the container format too.

My videos use an MPEG-4 container with H.264 video, AAC audio, and 3GPP timed text captions. Contrary to popular belief, these are not QuickTime-specific or Apple-proprietary formats. They are all international standards that QuickTime happens to support (with some limitations, see below). Other players also support these standards, so you have plenty of choices.

Some players support all of these codecs natively, without installing additional software. Other players assume you have already installed the codecs separately. Some players support the video and audio but not the embedded captions (in 3GPP timed text format), but they can load captions from a separate file instead. For these players, I also make the captions available in a separate file in SubRip (.srt) format. If you put a foo.mp4 video file and a foo.srt captions file in the same directory, these players are smart enough to automatically load the separate captions file and display the captions while you watch the video.
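The sidecar rule is easy to express: strip the video's extension and look for the same name ending in .srt. A SubRip file itself is plain text made of numbered cues, each with a start and end time in HH:MM:SS,mmm form (note the comma) followed by the caption text and a blank line. The function names and the caption text below are made up for illustration.

```shell
#!/bin/sh
# foo.mp4 pairs with foo.srt in the same directory.
# Prints the captions path, or fails if no such file exists.
captions_for() {
    srt="${1%.*}.srt"
    [ -f "$srt" ] && printf '%s\n' "$srt"
}

# Format milliseconds as a SubRip timestamp: HH:MM:SS,mmm.
srt_time() {
    ms=$1
    printf '%02d:%02d:%02d,%03d' $((ms / 3600000)) $((ms / 60000 % 60)) \
        $((ms / 1000 % 60)) $((ms % 1000))
}

# One cue: sequence number, time range, text, then a blank line.
srt_cue() {
    printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_time "$2")" "$(srt_time "$3")" "$4"
}

srt_cue 1 1000 4000 "Example caption text."
```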

I publish two versions of each video, a small one and a large one. Even with the proper software, the large version requires a relatively fast computer (by today's standards). In my tests, a 1.5 GHz laptop stutters enough to make the large video unwatchable, but a 2 GHz laptop plays it without a hiccup. Your mileage may vary.

QuickTime 7

Apple's QuickTime 7 (Mac OS X, Windows) can play the videos. It includes the standalone QuickTime Player and a browser plug-in so you can watch the videos in your browser. However, due to what I consider a bug, it will only display the embedded captions if the video file has a .3gp extension. If you want to watch the videos with captions, you will need to download the video and change the file extension from .mp4 to .3gp. Please note that QuickTime requires an extraordinary amount of CPU power to display captions -- much more than playing the same video without captions. There is nothing I can do about this except recommend that you download a better video player.
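Renaming is all that's needed, since the file contents stay the same. This sketch makes a .3gp copy instead of moving the file, so the .mp4 remains available for other players; `make_3gp_copies` is an illustrative name.

```shell
#!/bin/sh
# For each .mp4 given, create a .3gp-named copy beside it.
make_3gp_copies() {
    for f in "$@"; do
        case $f in
            *.mp4) cp "$f" "${f%.mp4}.3gp" || return 1 ;;
            *)     echo "skipping $f (not .mp4)" >&2 ;;
        esac
    done
}
```

For example, `make_3gp_copies ~/Downloads/*.mp4` in Terminal on Mac OS X.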

The videos require QuickTime 7 or later; earlier versions of QuickTime do not have the necessary codecs. Mac users can check their QuickTime version by going to System Preferences → QuickTime → "About QuickTime...". Windows users can go to Start → Control Panel → QuickTime → "About QuickTime...".

iPod / iTunes

Apple's "5th generation" video iPod can play the small version, but it will not display the captions. The small video is already in an iPod-compatible format; just download it, import it into iTunes, and sync it to your video iPod. Better yet, have iTunes download it for you by selecting Advanced → "Subscribe to podcast" and pasting the address of my syndicated feed.

Democracy Player

Democracy Player is a Free Software video aggregator that lets you subscribe to video podcasts and automatically download videos. It can play the videos, but it will not display the embedded captions (bug 3422) and does not support auto-downloading a separate captions file (bug 3423). Select "Add channel..." and paste the address of my syndicated feed.

Totem

Totem (Linux) is the GNOME media player. Totem does not support any particular video or audio codecs; it relies on you having already installed the necessary codecs. To install the codecs under Ubuntu Dapper, download EasyUbuntu and select "Free Codecs" in the Multimedia tab. Totem requires a separate captions file to display the captions.

MPlayer

MPlayer (all platforms, but I would only recommend it on Linux) can play the videos without installing any additional software. It requires a separate captions file to display the captions.

Ubuntu Dapper users can install MPlayer by enabling the universe and multiverse repositories and selecting MPlayer from Synaptic Package Manager (or typing sudo apt-get install mplayer). If you want to watch videos within Firefox, install the mozilla-mplayer package.

MPlayerOSX (B8r5) is a port of MPlayer to Mac OS X. It can play the videos but doesn't support the embedded captions and doesn't support loading captions from a separate file. (This is almost certainly a bug in the GUI, since you can manually find and run the mplayer executable within the MPlayerOSX.app folder and it will auto-detect and display the separate captions file.)

VLC

VLC (all platforms) 0.8.5 or later can play the videos and the embedded captions, without downloading a separate captions file or installing any additional software. Captions are off by default; to display them, select Video → Subtitles → "Track 1 [english]".

Ubuntu Dapper ships with an older version of VLC that distorts the picture in certain frames. Install the latest version of VLC. If the picture is still distorted, go to Settings → Preferences → Input / Codecs → Other codecs → FFmpeg, check the "Advanced options" checkbox in the lower right corner, set "Skip the loop filter for H.264 decoding" to "All", and restart VLC. (This tip epitomizes everything that is wrong with Linux.)

Media Player Classic

Media Player Classic (Windows) is a Free Software video player. It does not support any particular video or audio codecs; it relies on you having already installed the necessary codecs. FFDSHOW will install the necessary codecs (the default install options are fine).

Media Player Classic can display the embedded captions, without a separate captions file. To display the embedded captions, go to View → Options → Playback → Output, set "DirectShow Video" to "VMR9 (renderless)", and restart MPC.

The Core Media Player

The Core Media Player (Windows) is a WinAMP-like video player. It can play the videos once you install FFDSHOW. It requires a separate captions file to display the captions.

Windows Media Player

Microsoft's Windows Media Player can play the videos once you install FFDSHOW, but it doesn't support the embedded captions and doesn't support loading captions from a separate file.

Kaffeine

Kaffeine is a media player for KDE. It can play the videos once you install libxine-extracodecs. It claims to be able to display captions, but selecting a "subtitle channel" from the menu does nothing for me.

§