Most of my recent writing has happened elsewhere.
That last article came about during the creation of mimesniff, my open source Python 3 library that implements the HTML5 Content-Type detection and character encoding detection algorithms.
If none of that is your cup of tea, here is a picture of my dog Beauregard, enjoying the beautiful North Carolina summer weather.
Google (my current employer) has finally open sourced protocol buffers, the data interchange format we use for internal server-to-server communication. The blogosphere's response? "No wireless. Less space than a Nomad. Lame."
Protocol buffers are "just" cross-platform data structures. All you have to write is the schema (a .proto file), then generate bindings in C++, Java, or Python. (Or Haskell. Or Perl.) The .proto file is just a schema; it doesn't contain any data except default values. All getting and setting is done in code. The serialized over-the-wire format is designed to minimize network traffic, and deserialization (especially in C++) is designed to maximize performance. I can't begin to describe how much effort Google spends maximizing performance at every level. We would tear down our data centers and rewire them with $500 ethernet cables if you could prove that it would reduce latency by 1%.
Besides being blindingly fast, protocol buffers have lots of neat features. A zero-size PB returns default values. You can nest PBs inside each other. And most importantly, PBs are both backward and forward compatible, which means you can upgrade servers gradually and they can still talk to each other in the interim. (When you have as many machines as Google has, it's always the interim somewhere.)
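To make the defaults and the compatibility story a little more concrete, here is a rough sketch of a tiny subset of the wire format in plain Python. It is not the generated bindings or the real library: the function names and the dict-based "schemas" are made up for illustration, it handles only varint and length-delimited fields, and it silently drops unknown fields where a real implementation may keep them around.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_message(fields):
    """fields maps field number -> int (varint) or bytes (length-delimited)."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if isinstance(value, int):
            out += encode_varint(number << 3 | 0) + encode_varint(value)
        else:
            out += encode_varint(number << 3 | 2) + encode_varint(len(value)) + value
    return bytes(out)


def decode_message(data, schema):
    """schema maps field number -> (name, default); hypothetical stand-in for a .proto."""
    message = {name: default for name, default in schema.values()}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        if number in schema:  # field numbers this reader doesn't know are skipped
            message[schema[number][0]] = value
    return message


OLD_SCHEMA = {1: ("id", 0), 2: ("name", b"")}
NEW_SCHEMA = {1: ("id", 0), 2: ("name", b""), 3: ("priority", 5)}

new_bytes = encode_message({1: 150, 2: b"dog", 3: 9})
old_bytes = encode_message({1: 150})
```

An old reader handed new_bytes simply never sees field 3, a new reader handed old_bytes gets the default priority, and decoding zero bytes yields nothing but defaults. A nested message would travel as just another length-delimited field whose bytes are themselves an encoded message.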
Comparisons to other data formats were, I suppose, inevitable. Old-timers may remember ASN.1 or IIOP. Kids these days seem to compare everything to XML or JSON. Protocol buffers are actually closer to Facebook's Thrift (written by ex-Googlers) or SQL Server's TDS. Protocol buffers won't kill XML (no matter how much you wish they would), nor will they replace JSON, ASN.1, or carrier pigeon. But they're simple and they're fast and they scale like crazy, and that's the way Google likes it.
Despite a complete lack of fanfare or self-promotion, much of the Python-loving world seems to have found my Universal Encoding Detector, which is a pure-Python port of Mozilla's encoding detection. UED is used in a variety of end-user applications and other developer libraries, including:
And probably some others I don't know about.
This is what it feels like to be an upstream author. And I use the term "author" loosely, since all I did was port somebody else's wicked-smart algorithm, introduce new bugs, and write a few incoherent pages of documentation. But still, it is humbling to step back and observe the enormous worldwide community that is constantly packaging, updating, integrating, and distributing this stuff.
Yeah, I didn't see that coming either.
The major new feature in this release is improved performance, thanks to a patch from Juri Pakaste. Under Python 2.2, this version runs twice as fast as previous versions. Under Python 2.3, it runs five times as fast. No kidding. Thanks, Juri. Juri is the project lead of Straw, a desktop aggregator for Linux, which uses the Universal Feed Parser.
Other changes in this release:
Content-Type header (such as text/plain) and sets NonXMLContentType. Such feeds can never be well-formed XML; in fact, they should not be treated as XML at all. (Note that not everyone shares this view.)
Content-Language HTTP header as the default language, if no <dc:language> element is present.
zopeCompatibilityHack(), which makes the parse() routine return a regular dict instead of a subclass. I have been told that this is required for Zope compatibility (hence the name). It also makes command-line debugging easier, since the pprint module inexplicably pretty-prints real dictionaries differently than
xml:lang="" for setting the current language to "unknown." This behavior is straight from the XML specification. Anyone who tells you that good specs don't matter is lying, or ignorant, or trying to sell you a bad one, or... hey look, shiny objects!
version="rss10" even when the RSS 1.0 namespace is not the default namespace.
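For the curious, the language rules in that list (an HTTP-supplied default, inheritance from parent elements, and xml:lang="" resetting to "unknown") can be sketched with nothing but the standard library. This is not the Universal Feed Parser's own code; effective_languages, default_lang, and the sample feed are hypothetical names invented for this example, and None stands in for "unknown."

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(xml_text, default_lang=None):
    """Return (tag, language) pairs in document order.

    default_lang plays the role of a Content-Language header value.
    A language of None means "unknown".
    """
    root = ET.fromstring(xml_text)
    results = []

    def walk(element, inherited):
        declared = element.get(XML_LANG)
        if declared is None:
            lang = inherited   # nothing declared here: inherit from the parent
        elif declared == "":
            lang = None        # xml:lang="" overrides the parent with "unknown"
        else:
            lang = declared
        results.append((element.tag, lang))
        for child in element:
            walk(child, lang)

    walk(root, default_lang)
    return results


SAMPLE = """<feed xml:lang="en">
  <title>Example</title>
  <entry xml:lang="">
    <summary>langue inconnue</summary>
  </entry>
  <entry xml:lang="fr"><summary>Bonjour</summary></entry>
</feed>"""

for tag, lang in effective_languages(SAMPLE):
    print(tag, lang)
```

The first entry and its summary come out as unknown even though the feed as a whole is declared English, which is exactly the case the empty attribute exists for.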