
The first 90% of John Gruber's And Oranges is excellent. Everyone should read it, and I'm not just saying that because it's all about me. Unfortunately, the last 10% goes right off the rails, so naturally that's where I'm going to start.

John writes:

And the truth is I'm not entirely sure he's making the right decision, even for himself. Forget all the niggling details he cites, and focus only on his central beef -- that Apple is a company that does not "get" openness, and that this deficiency is going to hinder Pilgrim's long-term access to the data he's creating. But if that's the case, and Pilgrim has been using Apple computers for 22 years, why hasn't it happened already?

It has happened already, John. Over and over again.

1983-1989

Years of hacking on an Apple //e, writing programs in Applesoft BASIC, Apple Pascal, and 6502 assembly language. All for a platform that no longer exists and can only be emulated with the help of ROMs which are illegal to redistribute. Years of writing bad poetry, short stories, and letters in AppleWriter and later AppleWorks. At one point in the distant past I bought a copy of MacLinkPlus and converted them all (not 100% faithfully) to Word, and later converted those (again, not 100% faithfully) to RTF. (How can you fuck up converting text files, I hear you ask. Well, I had this e.e. cummings thing going for a couple of years, in which whitespace was supremely significant. Conversion sometimes lost that, or muddled it. This also helps to explain why I fell in love with Python instead of Perl, but never mind that. Then there's the whole character encoding problem.)

1990-1995

Years of hacking on various Macs, including a Mac LC, Mac IIci, and PowerMac 8500. All targeted at OS 6 through 9, using Apple-specific toolkits and libraries. None run natively in OS X and therefore will not run on modern Intel Macs (or any other platform). They can only be emulated with the help of ROMs which, once again, are illegal to redistribute. Years of writing school papers, newspaper articles, more bad poetry, and half a novel in a never-ending stream of word processors (WriteNow, MacWrite, ClarisWorks, Word, etc). I managed to convert some of these to modern formats, but again, not 100% faithfully.

1996-2000

Not much, honestly. I was stoned most of the time. I can't blame that on Apple.

2001-present

Years of creating content, most recently video content in iMovie. Home movies of my children being born and growing up, heavily edited and burned to DVD and distributed to friends and family. Plus a few screencasts and some other odd video projects I've never released. Years of tagging and organizing an ever-growing collection of music, photos, and multimedia. I've now exported all my home movies as .DV files -- one for the final product, one for all the unused clips. All other edits are lost. All editability is lost. All my iTunes ratings and playlists are lost. All my iPhoto tags and ratings are lost. John has heard this part already; I can't imagine why he thought it was an isolated occurrence.

Oh, and let's not even talk about all the mail programs I've used. Eudora, Claris Emailer, Outlook, Outlook Express, Pine, Elm, and no doubt a few I've forgotten. When I finally came to my senses in 2001, I somehow managed to collect, convert, and salvage most of my mail and landed it all in Mail.app. I specifically chose Mail.app because I knew that it stored everything in mbox format, and that that was the oldest, most stable, safest choice for long-term preservation.

And then came Tiger, and Mail.app 2.0. In Mac OS X 10.4, Apple deliberately changed Mail.app to use their proprietary .emlx data format, apparently to work around the limitations of Spotlight. Mail.app 2.0 helpfully auto-converted all my wonderful mbox files into Apple's shitty undocumented format. I'm now in the process of undoing the damage. I tried an emlx-to-mbox converter program, but it has bugs that ruin certain mail messages and corrupt the resulting mbox file. (Specifically, mail messages that contain a line that starts with the word "from".) Perhaps JWZ's emlx.pl script will fare better. JWZ knows mail. [Update: thanks to everyone who emailed me suggesting I "think different" about the problem. After an hour of wrangling configuration files and cursing Apple, I successfully set up Dovecot and migrated all my mail via IMAP. Whoops, no I didn't. Somehow it gets stuck and only exports selected messages from large folders, somewhere around 1000 messages. Can't see the pattern as to which messages get dropped. Perhaps I could split my 66,000 messages into 66 different folders and then reassemble them on the other side.] [Update 2: no thanks to macosxhints.com which suggests using Mail.app's "Save as" feature to export my messages. This feature is so broken as to be useless.]
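The "from" bug is a classic mbox pitfall. An mbox file marks the start of each message with a line beginning "From " (capital F, trailing space), so a body line that begins the same way has to be quoted or it will be mistaken for a new message. Here is a minimal sketch of that quoting step, assuming the simplest convention of prefixing ">". The function name is made up for illustration, and this is not the code of any converter mentioned here.

```python
def quote_from_lines(body: str) -> str:
    """Prefix '>' to body lines that would look like an mbox separator."""
    quoted = []
    for line in body.splitlines(keepends=True):
        # Only lines starting with exactly "From " collide with the separator.
        if line.startswith("From "):
            line = ">" + line
        quoted.append(line)
    return "".join(quoted)


body = "Hi Mark,\nFrom here on, whitespace matters.\nfrom me to you\n"
print(quote_from_lines(body))
```

Python's standard mailbox module applies similar quoting when it writes mbox files, so a converter built on it should not need to hand-roll this step.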

This was really the last straw for me. I was already feeling vaguely dissatisfied with Apple; now I feel actively betrayed. By the time I even realized what had happened (a year after buying OS X 10.4), it was too late. Now I'm forced to migrate all my mail yet again from yet another proprietary format, and the best documentation I've found so far is on LiveJournal. Jesus H. Christ, somebody deserves to be fired for that.

On data fidelity

There's an important lesson in here somewhere. Long-term data preservation is like long-term backup: a series of short-term formats, punctuated by a series of migrations. But migrating between data formats is not like copying raw data from one medium to another. If I can plug both types of media into the same computer (or even the same network), I can migrate raw data from one generation to the next (I just did it with my ReadyNAS). Then there are various things you can do (checksums and so forth) to verify that the data was copied 100% correctly. But converting data into a different format is much trickier, and there's the potential of data loss or data degradation at every turn.
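For the easy half of the problem, raw copies, the "checksums and so forth" can be as simple as hashing every file on both sides and comparing. This is a rough sketch using only the Python standard library. The function names are hypothetical, and it says nothing about whether a format conversion preserved anything.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir: Path, copy_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the copy."""
    mismatches = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        copied_file = copy_dir / relative
        if not copied_file.is_file() or sha256_of(source_file) != sha256_of(copied_file):
            mismatches.append(str(relative))
    return mismatches
```

Calling find_mismatches(Path("old_disk"), Path("new_disk")) with two real directories returns an empty list when every file made it across byte for byte.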

Fidelity is not a binary thing. Data can gradually degrade with each conversion until you're left with crap. People think this only affects the analog world, like copying cassette tapes for several generations. But I think digital preservation is actually much harder, in part because people don't even realize that it has the same issues.

(Of course, sometimes fidelity is a binary thing. Why do I avoid DRM? Because the entire point of DRM is to make migration impossible, to reduce the fidelity of your conversion to 0. Apple's iTunes DRM is actually the oddball here, since it is technically possible to migrate the songs you buy from the iTunes Music Store. Of course, you have to burn the songs onto a CD (assuming iTunes will let you) and then you can re-rip them in the format of your choice. This involves some loss of fidelity, but at least it's technically possible. Other DRM schemes are even worse. But note that Apple's DRM scheme has gotten worse since they first introduced it. That alone should be enough of a deterrent for people, but apparently it isn't.)

So if you care about long-term data preservation, your #1 goal should be to reduce the number of times you convert your data from one format to another. You should also strive to increase the fidelity of each conversion, but you may not have any control over that when the time comes. Plus, you may not know in advance how faithful the conversion will be, so planning ahead to reduce the number of conversions is a better bet.
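To see why the number of conversions matters so much, here is a toy model. It assumes each conversion keeps a fixed fraction of whatever fidelity was left, so losses compound. Both the multiplicative model and the 95% figure are invented for illustration; real conversions are lumpier than this.

```python
def fidelity_after(conversions: int, per_conversion: float) -> float:
    """Fraction of the original fidelity left after repeated lossy conversions."""
    return per_conversion ** conversions


# Hypothetical 95%-faithful converter applied again and again.
for n in (1, 3, 5, 10):
    print(n, round(fidelity_after(n, 0.95), 3))
```

Even with a converter that looks nearly perfect on any single pass, the loss keeps compounding, which is why cutting the count is the safer bet.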

Risk factors

Once I realized this, I started thinking about the risk factors that would increase the number of conversions. Data readable by only one application is a big risk factor, because the application won't be around forever. If that application only runs on one operating system, that's even worse, because the operating system won't be around forever either. If that operating system only runs on one hardware platform, that's even worse still. No hardware lasts forever, and you may eventually need to resort to emulating the hardware in software. Emulation is the ultimate fallback. But if any or all of those layers are closed, emulation may be costly or even impossible. And if any of the layers are DRM-encumbered, emulating them may be illegal. Data preservation is an ogre, and ogres have layers.

In the extreme case, you can try to pick a format up front and never change it. Project Gutenberg insists on publishing their e-books as plain ASCII text, even their most recent ones. The conversion from paper introduces a number of errors, which they clean up by hand, but that's it. They're not ever planning to convert them to another digital format, so the data fidelity will never go down during a less-than-perfect conversion. And ASCII is old and stable and safe and upward compatible with newer encodings like UTF-8 and can be read by any program on any platform, so it's a safe choice.
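The "upward compatible" claim is easy to check: any pure-ASCII byte sequence is already valid UTF-8 and decodes to the same text. A small sketch (the sample strings are arbitrary):

```python
text = "Call me Ishmael."
ascii_bytes = text.encode("ascii")

# Every ASCII byte sequence is also valid UTF-8 with the same meaning.
assert ascii_bytes == text.encode("utf-8")
assert ascii_bytes.decode("utf-8") == text

# The reverse is not true: non-ASCII text cannot be stored as ASCII.
try:
    "café".encode("ascii")
except UnicodeEncodeError:
    print("not representable in plain ASCII")
```

So an ASCII file never needs converting to be read as UTF-8, though the price is giving up every character outside the ASCII range.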

Thinking in terms of risk factors, you can begin to see why I chose to switch away from Apple and onto a Free Software platform. Mac OS X is only available from Apple, and it only runs on Apple hardware. Running a Free Software operating system removes both of these risk factors at once. Furthermore, Apple has made it very clear that they will do everything in their power to protect this lock-in. Despite the fact that their Intel-based operating system could run on commodity hardware, Apple has intentionally crippled Mac OS X with code that checks the hardware to ensure that it came from Apple. They're intentionally introducing friction between the layers, bolting their operating system onto their hardware. Some day there will be no hardware that can run Mac OS X, and because of Apple's DRM it will be illegal to emulate it in software.

There are more risk factors in the layer above the OS, the application layer. I still need to be vigilant about the formats that specific applications use to store data I care about preserving. Open source != open formats, and there are many examples of undocumented and underdocumented data formats in open source applications. The GIMP is a particularly egregious example. Its default .xcf format can only be read by GIMP and is deliberately undocumented outside the source code. GIMP only exports to formats with massive fidelity loss (you can export the final result but not in any editable form that includes layers and effects and brushes and so on). There are only a handful of third-party converters, and none of them are anywhere near complete. This is no better than Microsoft Office; in fact, it's probably worse. In practice, Microsoft Office documents have better interoperability, because third parties have spent more time reverse-engineering the formats and handling all the edge cases. (Third parties are working on reverse-engineering XCF too.)

Storing my data in open formats also mitigates several risk factors, but not the same factors as running Free Software. Open formats increase my chances of finding alternate applications that can read my data in its current form, which in turn increases my chances of being able to migrate one of the other layers (OS or hardware) without being forced to convert my data to another format. Open formats also increase my chances of maintaining data fidelity during conversion, since they decrease the difficulty of developing a converter that handles all possible cases. It's always the edge cases that come back to bite you.

Conclusion

I'm not claiming that either Free Software or open formats are a silver bullet. There are many risk factors, and Free Software mitigates some of them some of the time. There are many layers -- data on top of applications on top of operating systems on top of hardware -- and open formats can reduce the friction between some of them some of the time. They're both lubricants that help you to slide out one layer and replace it without the whole thing toppling down. Apple would prefer that I not replace any of their layers, and they have gone out of their way to increase the friction between them.

Which brings us back to John Gruber's oranges. His counter-argument -- that lock-in hasn't been a problem for me yet, so why all the fuss now -- could not be further from the truth. It's been a constant problem for 22 years. Much of the data I've spent my life creating has been lost or seriously degraded through a series of proprietary formats and forced migrations. This is why I felt so betrayed, in particular, by Mail.app "upgrading" me away from mbox format. It took a lot of forethought on my part, not to mention actual time and effort, to convert all my disparate mail archives from all those different mail programs. I finally got everything into a single archive in an open, stable format... and just 3 short years later, Apple found a way to screw me one last time. It'll be the last time they get the chance.

§

I've long been an advocate of Free Software. I've been a card-carrying associate member of the Free Software Foundation since 2002. I've been writing GPL software since 1993. The Mac is a thread woven through the tapestry of my life. For many years, Apple's combined offering has been impressive enough to keep me paying for both their hardware and their software. But lately their software has been getting weaker (and more restrictive), to the point where I've found myself researching alternatives, even on Mac OS X.

  • Safari? No thanks, I choose Firefox (and later Camino).
  • iChat? No thanks, AdiumX talks to everyone, not just your business partners.
  • QuickTime? No thanks, VLC plays everything, and in full-screen.
  • Terminal? No thanks, iTerm has tabs.

And so forth. In fact, I spend the vast majority of my time using these and other open source applications (Carbon Emacs, Colloquy, Audacity, Seashore, Python, and a variety of command-line tools). Why keep running them on an operating system that costs money and restricts my rights and my usage?

(I would like to point out that it is entirely Apple's choice that their operating system does not run on my new Lenovo ThinkCentre. I'm not saying it was a bad business decision -- they are a hardware company, after all -- but it is particularly galling to realize that if I bought a new Mac, I would be subsidizing the development of an operating system that contains code whose sole purpose is to lock me into a specific hardware platform. I realize that most people don't look at it that way, but there it is.)

And what about those wonderful Apple programs that I haven't replaced with open-source alternatives? I loved iPhoto until my iPhoto database got corrupted one day, and I lost all my ratings, keywords, and albums because that information is stored in an undocumented binary black hole. Yeah yeah, I know about AlbumData.xml. That has its own problems, and in my case it was already corrupted by the time iPhoto noticed. I'll give them some credit for trying.

Similarly, I loved iTunes until my iTunes database got corrupted, too. Once again, I lost all my ratings and about two dozen well-thought-out interlocking "smart" playlists. And once again, all of the irreplaceable metadata was stored in an undocumented binary black hole. Yeah yeah, the XML backup again. iTunes even helpfully offered to restore from it... except that it didn't restore any of my aforementioned metadata, so it's not really a backup, is it? "A" for effort, "D-" for implementation.

Meanwhile, I've already been stung by iMovie's lack of support for Edit Decision Lists. Luckily I never got locked into Keynote. (I've been using S5 ever since I got burned by PowerPoint.) And don't even get me started on the iTunes Music Store and the ever-increasing number of tie-ins in each new version of iTunes.

I'm creating things now that I want to be able to read, hear, watch, search, and filter 50 years from now. Despite all their emphasis on content creators, Apple has made it clear that they do not share this goal. Openness is not a cargo cult. Some get it, some don't. Apple doesn't.

You may think that this is all some sort of after-the-fact rationalization of my non-Apple purchase, but my coworkers (and my wife) will attest that I've been complaining about these issues for a long time. A few months ago in Austin, I monopolized an entire table of friendly coworker bar banter with a rant about Apple's lock-in. And astute readers may recall that I've been wary of iPhoto and iTunes for years.

In many ways, the tale of my switch is more of the same old story. Mac OS X was "free enough" to keep me using something that was not in my long-term best interest. But as I stood in the Apple store last weekend and drooled over the beautiful, beautiful hardware, all I could think was how much work it would take to twiddle with the default settings, install third-party software, and hide all the commercial tie-ins so I could pretend I was in control of my own computer. Beauty is in the eye of the beholder, and to my eye Apple isn't beautiful anymore. I've worked around it or ignored it for a long time, but eventually the bough breaks.

§

Radio users: to subscribe to diveintomark, go to your subscriptions page and add http://diveintomark.weblogger.com/xml/scriptingNews2.xml.

Setting hidden Mozilla preferences.

Analysts shrug at Oracle's new software. Oracle promised the applications would be easy to implement. In reality, many companies found an unusual number of glitches in the software and put the brakes on their projects.

Opera 6 review. [via Living Without Microsoft] If you're happy with your current Netscape or Microsoft browser, there's not much reason to switch to Opera 6.0. I tried it for a few weeks but switched back because it has some weird bug where it gets halfway through loading a page and then acts like it's done when it isn't. It seemed to happen more when my wireless connection was poor, or if I was saturating the router by downloading a Mandrake ISO. IE and Mozilla fared fine under the same circumstances.

C# mode for Emacs.

Wireless LANs: Trouble in the air. The vulnerability of the American Airlines wireless LAN networks was highlighted by the fact that the security specialists witnessed an intrusion while conducting their monitoring.

AirPort Base Station administration tool for Windows.

The Onion presents: Dating Tips. Never date a married person, unless he or she is just about to leave his or her spouse and simply waiting for the right moment.

Creature Comforts. Yes, that Creature Comforts. Requires QuickTime, Real, or Windows Media Player.

New Wallace and Gromit coming soon. [via Slashdot]

Wallace and Gromit are back!

Bruce Schneier: Cryptogram 1/2002.

Zimran Ahmed: License 6.0 removes the only competitor Microsoft has left -- older versions of Microsoft [products].

A Cop in Every Computer. [via FOS] Introduction to SSSCA, a proposed bill that would mandate (read: require but not pay for) industry-standard security controls on every computer connected to the Internet. (And by "industry", we mean "entertainment industry". And by "entertainment industry", we mean "Disney".) This proposed law is so bad, even the BSA (the anti-piracy organization that threatens businesses with allegedly-random-just-like-those-strip-searches-at-the-airport audits checking for software licensing violations) -- even the BSA is against it. 'We think mandating these protections is an abysmally stupid idea,' says Emery Simon, special counsel to the Business Software Alliance.

O'Reilly: Why we've embraced Mac OS X. [via mac.scripting.com] Because it sells books. No, really. New tech, increasingly popular, sells books, let's cover it. Seems like a good business model to me.

National ID cards roundup

Gates' vaccination program attacked. The charity says Gavi has a billion-dollar budget but only a five-year mandate. Save the Children says it is concerned that at the end of that period, poor countries may have come to depend on expensive vaccines for minor diseases.

A CS professor at McGill University is using my book in his advanced software design class. Specifically, this chapter on unit testing. That's so cool.

Napster still sucks. [via Slashdot]

e.e. cummings reads four poems (MP3) [via Salon]

FreeBSD changes ownership again. [via Slashdot] Meanwhile, coding continues unabated, oblivious to such trivial details as corporate-ownership-of-the-week. Get what you pay for, my ass.

Radio 8 supports Manila. Unfortunately, it only supports a particular type of Manila site, which this site is not, so I won't be using it after all.

AOL quietly launches "Magic Carpet". [via Tomalak's Realm] Not to be confused with Microsoft's Passport, or Liberty Alliance's vaporware.

Creating ASP pages with Python. [via Daily Python-URL] I've done this. I wouldn't recommend it, unless you're in a position where you absolutely must use Microsoft IIS for political or legacy reasons. Apache + mod_python is easier and doesn't put silly restrictions on what you can do in your server-side scripts.

Windows Media Player 'super-Cookies' pose privacy risk. The ActiveX interface to WMP helpfully provides a ClientID function which returns the MS-assigned GUID from your registry. This function is accessible from JavaScript once the WMP ActiveX control is loaded, and can therefore be returned to the web server without your knowledge. The Register has more details, and sample code.

Problems with iPhoto. Import of existing pictures is a pain; pictures can be tagged with keywords, titles, and comments, but then there are no search features to use these; no easy way to burn a slide show (or any part of your photo library) onto CD; etc. Nothing earth-shattering -- what's there works, it's just a 1.0 app. iTunes 1.0 wasn't all that hot either, but 2.0 is phenomenal. I expect nothing less of iPhoto.

§

Actually, I liked my bread better before it had a privacy policy.

Opera 5 final for Classic Mac OS [via Glish.com]

Apple iPod on Linux. Just some ramblings and notes, but we're getting closer, including the scariest Python script I've ever seen. Apparently it's more difficult than I originally thought, because iPod uses the HFS+ file system, which has poor support on Linux.

While everyone else was drooling over the iDeskLamp, it went virtually unnoticed that Apple just entered the consumer web services business. I downloaded iPhoto the day it came out, imported all our 300+ images that we've accumulated in the past 2 years from Dora's digital camera, organized them into albums, and we picked about 30 that we really liked. Click, order prints, enter billing info (once), enter quantities and sizes (4x6 prints are $0.49 each; 5x7 are $0.99; larger sizes are available if you have a better camera than I do), click, watch it upload my pictures to some Kodak central server, "thank you for shopping with Apple", wait 3 days. It was beautiful, completely effortless.

I never thought I would spend money to get my digital pictures printed out, but Apple managed to make it easy enough to get my money. And it was so worth it; the pictures are absolutely gorgeous. Looking at them, I would never know they came from a digital camera. And it's only an old 1.2 megapixel camera at that!

This is a business model that makes sense to people: spend money, get real stuff. Organizing digital photos is free. (And by "free", I mean "requires no more than was included with Dora's iBook". iPhoto requires the very latest version of Mac OS X; it does not work on any previous version of OS X, and certainly doesn't work with OS 9. Be that as it may.) Publishing digital photos on the web is free. (iPhoto can create a set of local web pages with thumbnails and so forth, and you can put it anywhere. I would prefer it going a bit further, integrating with my web site by FTP. It integrates more seamlessly with mac.com, Apple's hosted web site.) Exporting photos to files in other formats is free. Simple photo editing is free. If you want to get real stuff (professionally printed pictures), you pay for it. I paid. They're gorgeous. I'll pay again.

§