

[These notes will eventually become part of a tech talk on video encoding. List of all articles in this series.]

The most important consideration in video encoding is choosing a video codec. A future article will talk about how to pick the one that's right for you, but for now I just want to introduce the concept and describe the playing field. (This information is likely to go out of date quickly; future readers, be aware that this was written in December 2008.)

When you talk about "watching a video," you're probably talking about a combination of one video stream, one audio stream, and possibly some subtitles or captions. But you probably don't have two different files; you just have "the video." Maybe it's an AVI file, or an MP4 file. These are just container formats, like a ZIP file that contains multiple kinds of files within it. The container format defines how to store the video and audio streams in a single file (and subtitles too, if any).

When you "watch a video," your video player is doing several things at once:

  1. Interpreting the container format to find out which video and audio tracks are available, and how they are stored within the file so that it can find the data it needs to decode next
  2. Decoding the video stream and displaying a series of images on the screen
  3. Decoding the audio stream and sending the sound to your speakers
  4. Possibly decoding the subtitle stream as well, and showing and hiding phrases at the appropriate times while playing the video

A video codec is an algorithm by which a video stream is encoded, i.e. it specifies how to do #2 above. Your video player decodes the video stream according to the video codec, then displays a series of images, or "frames," on the screen. Most modern video codecs use all sorts of tricks to minimize the amount of information required to display one frame after the next. For example, instead of storing each individual frame (like a screenshot), they will only store the differences between frames. Most videos don't actually change all that much from one frame to the next, so this allows for high compression rates, which results in smaller file sizes. (There are many, many other complicated tricks too, which I'll dive into in a future article.)
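To make the "store only the differences" trick concrete, here's a toy sketch in Python (using NumPy, with function names I made up for illustration; no real codec is anywhere near this simple):

```python
# Toy illustration of inter-frame compression: keep the first frame in full,
# then store only the per-pixel differences for every subsequent frame.
import numpy as np

def encode(frames):
    """frames: list of 2-D uint8 arrays (grayscale images)."""
    first = frames[0]
    # Signed 16-bit deltas so that darkening (a negative change) survives.
    deltas = [np.subtract(b, a, dtype=np.int16) for a, b in zip(frames, frames[1:])]
    return first, deltas

def decode(first, deltas):
    frames = [first]
    for d in deltas:
        frames.append((frames[-1].astype(np.int16) + d).astype(np.uint8))
    return frames

# A mostly static "video": the deltas are almost entirely zeros, which is
# exactly why storing differences compresses so well.
frames = [np.zeros((4, 4), dtype=np.uint8) for _ in range(3)]
frames[1][0, 0] = 10
frames[2][0, 0] = 10
first, deltas = encode(frames)
assert all(np.array_equal(a, b) for a, b in zip(frames, decode(first, deltas)))
```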

There are lossy and lossless video codecs; today's article will only deal with lossy codecs. A lossy video codec means that information is being irretrievably lost during encoding. Like copying an audio cassette tape, you're losing information about the source video, and degrading the quality, every time you encode. Instead of the "hiss" of an audio cassette, a re-re-re-encoded video may look blocky, especially during scenes with a lot of motion. (Actually, this can happen even if you encode straight from the original source, if you choose a poor video codec or pass it the wrong set of parameters.) On the bright side, lossy video codecs can offer amazing compression rates, and many offer ways to "cheat" and smooth over that blockiness during playback, to make the loss less noticeable to the human eye.

There are tons of video codecs. Today I'll discuss five modern lossy video codecs: MPEG-4 ASP, H.264, VC-1, Theora, and Dirac.

MPEG-4 ASP

a.k.a. "MPEG-4 Advanced Simple Profile." MPEG-4 ASP was developed by the MPEG group and standardized in 2001. You may have heard of DivX, Xvid, or 3ivx; these are all competing implementations of the MPEG-4 ASP standard. Xvid is open source; DivX and 3ivx are closed source. The company behind DivX has had some mainstream success in branding "DivX" as synonymous with "MPEG-4 ASP." For example, this "DivX-certified" DVD player can actually play most MPEG-4 ASP videos in an AVI container, even if they were created with a competing encoder. (To confuse things even further, the company behind DivX has now created their own container format.)

MPEG-4 ASP is patent-encumbered; licensing is brokered through the MPEG LA group. MPEG-4 ASP video can be embedded in most popular container formats, including AVI, MP4, and MKV.

H.264

a.k.a. "MPEG-4 part 10," a.k.a. "MPEG-4 AVC," a.k.a. "MPEG-4 Advanced Video Coding." H.264 was also developed by the MPEG group and standardized in 2003. It aims to provide a single codec for low-bandwidth, low-CPU devices (cell phones); high-bandwidth, high-CPU devices (modern desktop computers); and everything in between. To accomplish this, the H.264 standard is split into "profiles," which each define a set of optional features that trade complexity for file size. Higher profiles use more optional features, offer better visual quality at smaller file sizes, take longer to encode, and require more CPU power to decode in real-time.

To give you a rough idea of the range of profiles, Apple's iPhone supports the Baseline profile, the AppleTV set-top box supports the Baseline and Main profiles, and Adobe Flash on a desktop PC supports the Baseline, Main, and High profiles. YouTube (owned by Google, my employer) now uses H.264 to encode high-definition videos, playable through Adobe Flash; YouTube also provides H.264-encoded video to mobile devices, including Apple's iPhone and phones running Google's Android mobile operating system. Also, H.264 is one of the video codecs mandated by the Blu-Ray specification; Blu-Ray discs that use it generally use the High profile.

Most non-PC devices that play H.264 video (including iPhones and standalone Blu-Ray players) actually do the decoding on a dedicated chip, since their main CPUs are nowhere near powerful enough to decode the video in real-time. Recent high-end desktop graphics cards also support decoding H.264 in hardware. There are a number of competing H.264 encoders, including the open source x264 library. The H.264 standard is patent-encumbered; licensing is brokered through the MPEG LA group. H.264 video can be embedded in most popular container formats, including MP4 (used primarily by Apple's iTunes Store) and MKV (used primarily by video pirates).
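If you want to experiment with profiles yourself, here's a hedged sketch that shells out to ffmpeg's libx264 encoder (the file names are placeholders, and the exact flags may vary between ffmpeg versions):

```python
# Encode H.264 at a chosen profile via ffmpeg/x264. Lower profiles (baseline)
# play on weaker devices; higher profiles (main, high) enable more coding
# tools and give smaller files at the same visual quality.
import subprocess

def encode_h264(src, dst, profile="baseline"):
    subprocess.check_call([
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-profile:v", profile,
        "-c:a", "copy",          # leave the audio stream untouched
        dst,
    ])

encode_h264("input.avi", "output.mp4", profile="baseline")
```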

VC-1

VC-1 evolved from Microsoft's WMV9 codec and was standardized in 2006. It is primarily used and promoted by Microsoft for high-definition video, although, like H.264, it has a range of profiles to trade complexity for file size. Also like H.264, it is mandated by the Blu-Ray specification, and all Blu-Ray players are required to be able to decode it. The VC-1 codec is patent-encumbered, with licensing brokered through the MPEG LA group.

Wikipedia has a brief technical comparison of VC-1 and H.264; Microsoft has its own comparison; Multimedia.cx has a pretty Venn diagram outlining the similarities and differences. Multimedia.cx also discusses the technical features of VC-1. I also found this history of VC-1 and H.264 interesting (as well as this rebuttal).

VC-1 is designed to be container-independent, although it is most often embedded in an ASF container. An open source decoder for VC-1 video was a 2006 Google Summer of Code project, and the resulting code was added to the multi-faceted ffmpeg library.

Theora

Theora evolved from the VP3 codec and has subsequently been developed by the Xiph.org Foundation. Theora is a royalty-free codec and is not encumbered by any known patents other than the original VP3 patents, which have been irrevocably licensed royalty-free. Although the standard has been "frozen" since 2004, the Theora project (which includes an open source reference encoder and decoder) only hit 1.0 in November 2008.

Theora video can be embedded in any container format, although it is most often seen in an Ogg container. All major Linux distributions support Theora out-of-the-box, and Mozilla Firefox 3.1 will include native support for Theora video in an Ogg container. And by "native", I mean "available on all platforms without platform-specific plugins." You can also play Theora video on Windows or on Mac OS X after installing Xiph.org's open source decoder software.

The reference encoder included in Theora 1.0 is widely criticized for being slow and poor quality, but Theora 1.1 will include a new encoder that takes better advantage of Theora's features, while staying backward-compatible with current decoders. (Info: 1, 2, 3, 4, 5, source code.)

Dirac

Dirac was developed by the BBC to provide a royalty-free alternative to H.264 and VC-1 that the BBC could use to stream high-definition television content in Great Britain. Like H.264, Dirac aims to provide a single codec for the full spectrum of very low- and very high-bandwidth streaming. Dirac is not encumbered by any known patents, and there are two open source implementations, dirac-research (the BBC's reference implementation) and Schroedinger (optimized for speed).

The Dirac standard was only finalized in 2008, so there is very little mainstream use yet, although the BBC did use it internally during the 2008 Olympics. Dirac-encoded video tracks can be embedded in several popular container formats, including MP4, Ogg, MKV, and AVI. VLC 0.9.2 (released in September 2008) can play Dirac-encoded video within an Ogg or MP4 container.

And on and on...

Of course, this is only scratching the surface of all the available video codecs. Video encoding goes way back, but my focus in this series is on the present and near-future, not the past. If you like, you can read about MPEG-2 (used in DVDs), MPEG-1 (used in Video CDs), older versions of Microsoft's WMV family, Sorenson, Indeo, and Cinepak.

Tomorrow: audio codecs!


§

[Dive Into Python]

Please buy 4000 copies so I can pay back my advance. Thank you.

§

Universal Feed Parser 3.3 is out. You can download it at SourceForge. That package no longer includes the more than 2700 unit tests; they are now available separately.

The major new feature in this release is improved performance, thanks to a patch from Juri Pakaste. Under Python 2.2, this version runs twice as fast as previous versions. Under Python 2.3, it runs five times as fast. No kidding. Thanks, Juri. Juri is the project lead of Straw, a desktop aggregator for Linux, which uses the Universal Feed Parser.

Other changes in this release:

  • Refactored the date parsing routines, and added a new public function registerDateHandler() (see the sketch after this list).
  • Added support for parsing more kinds of dates, including Korean, Greek, Hungarian, and MSSQL-style dates. Thanks to ytrewq1 for numerous patches and help refactoring the date handling code.
  • In the "things nobody cares about but me" department, UFP now detects feeds served over HTTP with a non-XML Content-Type header (such as text/plain) and sets bozo_exception to NonXMLContentType. Such feeds can never be well-formed XML; in fact, they should not be treated as XML at all. (Note that not everyone shares this view.)
  • Documented UFP's relative link resolution.
  • Fixed problem tracking xml:base and xml:lang when one element declares it, its child doesn't override it, its first grandchild does override it, but then its second grandchild doesn't.
  • Use Content-Language HTTP header as the default language, if no xml:lang attribute, <language> element, or <dc:language> element is present.
  • Optimized EBCDIC to ASCII conversion.
  • Added zopeCompatibilityHack(), which makes the parse() routine return a regular dict instead of a subclass. I have been told that this is required for Zope compatibility (hence the name). It also makes command-line debugging easier, since the pprint module inexplicably pretty-prints real dictionaries differently than dict subclasses.
  • Support xml:lang="" for setting the current language to "unknown." This behavior is straight from the XML specification. Anyone who tells you that good specs don't matter is lying, or ignorant, or trying to sell you a bad one, or... hey look, shiny objects!
  • Recognize RSS 1.0 feeds as version="rss10" even when the RSS 1.0 namespace is not the default namespace.
  • Expose the status code on HTTP 303 redirects.
  • Don't overwrite the final status on redirects, in the case where redirecting to a URL returns a 304, or another redirect, or any non-200 status code.
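Here's a minimal sketch of the registerDateHandler() hook mentioned above. The handler name and the compact date format are made up for illustration; a real handler would parse whatever exotic format your feeds actually use.

```python
import time
import feedparser

def parse_compact_date(date_string):
    """Turn a date like '20040604' into a 9-tuple in UTC, or return None to pass."""
    try:
        return time.strptime(date_string, "%Y%m%d")
    except ValueError:
        return None  # not our format; let the other registered handlers try

feedparser.registerDateHandler(parse_compact_date)
```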

§

Universal Feed Parser 3.2 is out. You can download it at SourceForge.

The main new feature in version 3.2 is completely revamped handling of character encoding. Previous versions relied on an odd combination of "do it in Python" and "let the XML parser handle it." This version does everything in Python, then converts the feed to UTF-8 before handing it off to the XML parser. Every XML parser on Earth supports UTF-8.

When I say "do it in Python," I don't mean actual Python code. Python has a surprisingly sane API for handling the insanity that is character encoding, and this makes it easy for third-party libraries to extend Python's built-in encodings module to support additional encodings. One such module, CJKCodecs, adds support for Chinese, Japanese, and Korean encodings. CJKCodeces will be part of Python 2.4, but it is also downloadable for Python 2.1 and above. Another module, iconv_codec, is a Python wrapper for the marvelous libiconv, which supports several hundred encodings. Both are highly recommended, and Universal Feed Parser will use both if available.

Of course, nothing is ever as simple as it sounds. In rare cases, the character encoding of the feed is explicitly specified in the charset parameter of the Content-Type HTTP header. But in most cases, you need to look at the encoding attribute in the XML declaration on the first line of the feed.

Previous versions of Universal Feed Parser naively used a regular expression on the raw byte stream to find the encoding attribute. This works most of the time, since many character encodings are compatible with ASCII for the ASCII range of characters. (All the non-ASCII characters are encoded in the upper 128 values of a byte, or in multi-byte sequences.) However, this assumption fails for multi-byte encodings, such as UTF-16 and UTF-32, and for non-ASCII-compatible encodings, such as EBCDIC.

Appendix F of the XML specification provides a heuristic for determining whether an XML document is in a non-ASCII-compatible encoding, and if so, which one. The heuristic is divided into two parts, because all XML documents are allowed to start with something called a Byte Order Mark (BOM), which is a specific Unicode character (U+FEFF) whose bytes look different depending on the encoding and the byte order used in the document. (BOM FAQ) So one part of the heuristic deals with XML documents that have a BOM, and the other part deals with XML documents that have no BOM but do have an XML declaration. It turns out that the first four characters, <?xm, look different in every character encoding too.
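Here's a rough sketch of that heuristic in Python. This is my own simplification, not the actual Universal Feed Parser code, which handles more cases:

```python
# Appendix F-style encoding sniffing: check for a Byte Order Mark first, then
# look at how the four bytes of '<?xm' come out in each candidate encoding.
def sniff_xml_encoding(data):
    boms = [
        (b'\x00\x00\xfe\xff', 'utf-32be'),
        (b'\xff\xfe\x00\x00', 'utf-32le'),  # must be checked before utf-16le
        (b'\xfe\xff',         'utf-16be'),
        (b'\xff\xfe',         'utf-16le'),
        (b'\xef\xbb\xbf',     'utf-8'),
    ]
    for bom, encoding in boms:
        if data.startswith(bom):
            return encoding
    # No BOM: the first four bytes of '<?xm' betray the encoding family.
    signatures = {
        b'\x00\x00\x00<': 'utf-32be',
        b'<\x00\x00\x00': 'utf-32le',
        b'\x00<\x00?':    'utf-16be',
        b'<\x00?\x00':    'utf-16le',
        b'<?xm':          'utf-8',        # or any other ASCII-compatible encoding
        b'\x4c\x6f\xa7\x94': 'cp037',     # '<?xm' in EBCDIC
    }
    return signatures.get(data[:4], 'utf-8')

print(sniff_xml_encoding(b'\xff\xfe<\x00?\x00x\x00m\x00l\x00'))  # utf-16le
```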

I am pleased to announce that Universal Feed Parser now supports both parts of this heuristic. It can reliably detect and parse any feed encoded as UTF-32BE, UTF-32BE+BOM, UTF-32LE, UTF-32LE+BOM, UTF-16BE, UTF-16BE+BOM, UTF-16LE, UTF-16LE+BOM, UTF-8+BOM, or UTF-8. There are several new tests to confirm this.

Also EBCDIC. Did I mention it now supports EBCDIC? I've totally sold out to the BigCos. As an adjunct to JWZ's Law of Computer Envelopment ("every program attempts to expand until it can read mail"), I declare that every aggregator attempts to expand until it can read EBCDIC. You can use this test case to track your aggregator's progress.

As a bonus, since the entire character encoding determination is finished before the feed is handed off to a real XML parser, it works just as well for non-well-formed feeds. Have you ever wanted to parse an ill-formed CDF feed encoded as UTF-32 Little Endian with a Byte Order Mark? Universal Feed Parser can do that.

§

Version 3.1 of my Universal Feed Parser is out. You can download it at SourceForge.

Virtually all of the bug fixes and improvements in this release were suggested by Aaron Swartz. Thanks for that.

  • Convert HTML entities to their Unicode equivalents.
  • Improved test for the existence of a valid XML parser before declaring XML_AVAILABLE.
  • New optional handlers parameter takes an arbitrary list of urllib2 handlers. This lets you do things like download password-protected feeds (see the sketch after this list).
  • Expose download-related exceptions in bozo_exception.
  • Add __contains__ method to FeedParserDict for better compatibility with Python 2.1.
  • Add publisher_detail.
  • Add documentation on namespace handling. The handling hasn't changed; it's just documented now.
  • Various minor improvements to documentation.
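Here's a quick sketch of the handlers parameter in action (the URL and credentials are placeholders):

```python
import urllib2
import feedparser

# Any urllib2 handler works; HTTPBasicAuthHandler covers password-protected feeds.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://example.com/secret.xml', 'user', 'secret')
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)

d = feedparser.parse('http://example.com/secret.xml', handlers=[auth_handler])
print(d.feed.title)
```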

§