David Flanagan, author of JavaScript: The Definitive Guide, writes:

For 15 years I've been one of those lucky authors who has been able to support himself and his family almost entirely on book royalties. But the publishing industry has been in decline and my royalties checks have decreased more-or-less steadily since the dot-com bust, and I've now decided that I need to look for a salaried job.

15 years is a long time. What a blessing to have been able to do something you love for that long, and get paid for it.

But then he goes off into the weeds a little bit and starts in with his opinions of piracy and Google's role in the world. [Note: I work for Google but not on search. I don't speak for Google and they don't speak for me. It works out well.]

David continues:

I was trying to be provocative when I tweeted the question "Does Google enable piracy?" But I do think it is a valid question. If Google indexes sites like ebookee that link directly to download sites and makes it easy to find the pirated content you want and even offers suggestions on what to search for, I think there is a case to be made that they're encouraging piracy.

Note how quickly we've moved the goalposts from "enabling" piracy to "encouraging" piracy. Lots of technology "enables" piracy; after all, it's only 1s and 0s. In the analog world, even libraries "enable" piracy by putting a photocopier in the same building as a bunch of books. But do libraries "encourage" piracy? No, there are big signs next to the photocopier warning you about copyright law, and photocopies are prohibitively expensive to do anything more than copy a few pages for reference. Does technology "encourage" digital piracy? No, it's only 1s and 0s.

More importantly, contrary to David's assertion, ebookee does not have David's latest book. The entire site is a scam to get people to sign up for Usenet or "premium" file sharing services. They have a page for every book in the universe. A small fraction of them actually have download links, most of which are broken. Mostly, the site just goes round and round. There's a case to be made that the site should be delisted because it's fucking useless, but its existence in search results does not bolster David's argument that Google is "encouraging" piracy.

And now JavaScript: The Definitive Guide is out. I don't have a copy of it yet, but illegal copies are free for anyone who wants one.

This is not true; see above.

And Google will suggest those illegal downloads to anyone who tries to research the book (see the screenshot).

That screenshot actually shows the results of Google's existing filtering program for piracy-related terms like "bittorrent," "rapidshare," and "megaupload." Without that filtering, the suggestion box would be full of piracy-related terms. But more to the point, it would be full of piracy-related terms because that's what people search for. Google's suggestions come from actual searches. It's a mirror onto the world, descriptive not prescriptive. If you don't like how the world looks in the mirror, don't blame the mirror.

David continues:

Here are some small steps that might help:

  • Google could filter its search suggestions so that they do not actively suggest piracy.

Google already does this; see above.

  • Google could flag (without filtering) search results that are likely links to pirated content. Google already flags some results with "this site may harm your computer". Why not flag pirate sites: "Downloading content from this site may result in legal action by the copyright holder" or "Downloads from this site may be illegal". Or nice and simple: "this site may harm your karma".

The difference is that "this site may harm your computer" is based on an objective measurement. You can read how it works. Suspected sites are automatically verified in a virtual machine running an unpatched browser. It is both fascinating and mind-boggling to imagine that it works at all, much less works at Internet scale in near-real time.

On the other hand, legal concepts like "copyrighted material" are more difficult to automate at Internet scale. This is not to say it's impossible; YouTube has its Content ID program for audio and video, but it relies on the fact that people are actually uploading content to YouTube directly. To replicate this program on RapidShare, Google would need to download everything from RapidShare in order to identify it. Ironically, RapidShare makes this technically difficult in order to discourage third-party downloader programs that help users "steal" content from RapidShare without viewing ads or paying for "premium" membership.

And even if all the technical hurdles could be overcome, it still wouldn't necessarily warrant flagging sites like ebookee, which only hosts links to infringing content and not the content itself. (And if Google did start flagging them, they would just add another layer of indirection, or cloak their download URLs, or some other damn thing. You'll never beat piracy this way.)

Speaking of "beating" piracy, an anonymous commenter makes this point:

One of the reasons iTunes is so successful is that they successfully compete with the file sharers. ... Why should I waste time looking for an mp3 that may or may not be of any decent quality when I can download immediately at low cost from iTunes?

Commenter "Peter" makes a related point:

I have been an OReilly Safari subscriber for several years. I can recommend this to every developer out there. ... Yet, must admit it still pains me that for ~$500/year we as honest subscribers can not get the same convenience (offline access, unencumbered PDF's) as people who just download a pirated PDF library for free.

So is piracy really the problem? Is it even a problem? David has provided no evidence that his book is, in fact, wildly pirated. It's not even available yet from dedicated pirate sites. But the larger, more disturbing question is this: who bothers to steal books these days when you can go to Stack Overflow or a web forum or, yes, even Google, type a question, and get an answer?

I'll close with this observation from "Curt":

Most technical book are actually really painful to navigate, but at [one] time they were the only option, now I can find context relevant information in seconds hence books are less convenient and they cost money. The default is now deeply linked, highly specific data. IMHO you are not losing money to piracy, you are failing to make money due to the inadequacy of the book as a medium for technical data.

I think David would actually agree with this. In response to another comment, David himself wrote:

I will say that my book was written as a book, and it probably wouldn't work well online (regardless of whether it could work financially that way). And maybe that is a big piece of my revenue problem: I'm producing content in an old-fashioned medium.

The "book" is dead. Long live "content." And God help us all if world-class writers like David can't make a living from it.

§

My big accomplishment of 2010 was finishing the first edition of Dive Into HTML5 and working with O'Reilly to publish it on paper as HTML5: Up & Running (as well as several downloadable DRM-free formats). I also accomplished a few minor personal things, but in this post I'm going to focus on the book.

The book went on sale in mid-August and earned out almost immediately. "Earning out" is a publishing term which means that the book has sold enough copies that my cut of the profits has paid back the advance payments that O'Reilly gave me during the writing process. Which means that I'm already receiving royalty checks for real money. Of the four books I've published through traditional publishers, this is only the second book to earn out. (The original Dive Into Python was the first, and it was on sale for over two years before it earned out.)
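For the programmers in the audience, the earn-out arithmetic can be sketched in a few lines of Python. The advance and per-copy royalty below are made-up numbers purely for illustration (the real contract terms are not stated here), and a flat per-copy royalty is a simplifying assumption.

```python
def royalties_paid(advance, royalty_per_copy, copies_sold):
    """Return (earned_out, check_amount) under a flat per-copy royalty.

    Assumption: the author's cut is a fixed amount per copy, and no
    royalty check is cut until accumulated royalties exceed the advance.
    """
    earned = royalty_per_copy * copies_sold
    earned_out = earned >= advance
    check_amount = max(0, earned - advance)
    return earned_out, check_amount


# Hypothetical figures, NOT the real contract terms.
ADVANCE = 10_000
ROYALTY_PER_COPY = 2

print(royalties_paid(ADVANCE, ROYALTY_PER_COPY, 4_000))   # still paying back the advance
print(royalties_paid(ADVANCE, ROYALTY_PER_COPY, 14_000))  # earned out; checks are real money
```

Until the first value flips to `True`, every copy sold just pays down the advance; after that, the difference shows up as a royalty check.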

I write free books and people buy them. It works out surprisingly well.

"HTML5: Up & Running" sold over 14,000 copies in the first six weeks, of which about 25% were digital downloads and 75% were books on paper. Folks sure do love them some paper. The book continues to be available online for free, as it was during the entire writing process, under the liberal Creative Commons Attribution license. This open publishing model generated buzz well in advance of the print publication, and it resulted in over 1,500 pre-orders which shipped the day the book went on sale. Res ipsa loquitur.

The online edition at diveintohtml5.org includes Google Analytics so I can evilly track your every movement -- er, I mean, find out what the hell is going on. The analytics tell me many things. Some highlights:

  • Throughout 2010, the site served 2 million visitors and 3.9 million pageviews. Each chapter is on its own page because that's how I wrote the book (in HTML5). I don't need to inflate pageviews for non-existent advertisers (I work for Google so I'm not allowed to put ads on it anyway), and I never got around to writing a split-chapter-into-multiple-pages script.
  • 40% of the site's traffic came from search engines. 30% came from direct traffic or non-web applications like Twitter or email clients. 30% came from one of over 8,900 referring sites.
  • 98.7% of the search engine traffic came from Google. Less than 1% came from Bing. The rest came from search engines that I didn't know still existed.
  • John Gruber sent me three times as much traffic as Bing.
  • The most popular chapters tracked closely with the most popular incoming search keywords. HTML5 video was the most popular topic, logging almost half a million pageviews alone. #2 was web forms, followed closely by canvas, semantics, and Geolocation. Microdata was in dead last. Seriously, the shit that nobody gives about my beloved Peeks, Pokes & Pointers chart is rivaled only by the shit that nobody gives about microdata.
  • My little history of HTML logged almost a quarter million pageviews, and the average visitor spent almost four minutes reading it. (Only the video chapter was higher, at 4:45.) Folks love them some Internet folklore.
  • 6% of visitors used some version of Internet Explorer. That is not a typo. The site works fine in Internet Explorer -- the site practices what it preaches, and the live examples use a variety of fallbacks for legacy browsers -- so this is entirely due to the subject matter. Microsoft has completely lost the web development community.
  • 4% of visitors read the site on a mobile device. Of those people, 85% used an iOS device (iPhone + iPad + iPod Touch). 14% used Android, and the rest used mobile devices that I didn't know had browsers.
  • The site itself, its typography, and the book's live examples have led to bug fixes in at least four browsers and one font. Hooray for living on the bleeding edge.
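To put rough absolute numbers on the percentages above, here is a back-of-the-envelope sketch in Python. It uses only the rounded figures quoted in the list, and it assumes each percentage applies to the 2 million visitor count (analytics tools often report these shares per visit instead), so treat the results as approximations.

```python
# Rounded figures quoted in the list above.
VISITORS = 2_000_000
PAGEVIEWS = 3_900_000


def share(total, percent):
    """Approximate head count for a percentage share of a total."""
    return round(total * percent / 100)


# Assumption: every percentage is a share of visitors, not of visits.
pages_per_visitor = PAGEVIEWS / VISITORS
search_visitors = share(VISITORS, 40)
google_visitors = share(search_visitors, 98.7)
ie_visitors = share(VISITORS, 6)
mobile_visitors = share(VISITORS, 4)
ios_visitors = share(mobile_visitors, 85)
android_visitors = share(mobile_visitors, 14)

print(f"pages per visitor: {pages_per_visitor:.2f}")
for label, count in [
    ("from search engines", search_visitors),
    ("from Google", google_visitors),
    ("using Internet Explorer", ie_visitors),
    ("on mobile", mobile_visitors),
    ("on iOS", ios_visitors),
    ("on Android", android_visitors),
]:
    print(f"{label:>24}: ~{count:,}")
```

Under those assumptions, the average visitor looked at just under two pages, which fits a site that puts each chapter on a single page.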

Although it makes little sense to talk about "editions" of a web site (you can see a changelog if you like), O'Reilly and I have already discussed the possibility of doing a new edition of the printed book. Besides rolling up all the updates since August, we've discussed one chapter on Web Workers and another on web sockets. Since all the world's browsers have recently disabled their web sockets implementations due to a subtle (but fatal) protocol-level security vulnerability, the Web Workers chapter will probably come first. No promises, you understand. No promises at all.

If there are new chapters someday, I will urge O'Reilly to provide them for free to everyone who has already bought a digital copy. But understand that the final decision is not mine to make. Not mine at all. In any event, it will be available online at diveintohtml5.org for free, like the rest of the book.

I'm not big on predictions, but I do have one for 2011: HTML5 will continue to be popular, because anything popular will get labeled "HTML5."

§

Recently, someone did the unthinkable: they published their own version of Dive Into Python and got it listed on Amazon.com. This apparently caused a small firestorm within Apress, the exact details of which I am not privy to, but which (I am told) became a somewhat larger firestorm after the Apress executives realized they had no legal recourse, and asked my opinion on the matter. You see, the book is published under the GNU Free Documentation License, which explicitly gives anyone and everyone the right to publish it themselves. (I was about to write "gives third parties the right," until I realized that there are no third parties because there are no second parties. That's kind of the point.)

This didn't use to matter, because publishing on paper used to require a serious up-front investment in, well, paper. "Freedom of the press" was reserved for those with an actual press, and distribution costs were decidedly non-trivial. Publishing a book commercially just wasn't practical for anyone but, well, a book publisher. That's no longer the case. Copies can be purchased online, printed on demand, and drop-shipped to the customer -- up-front investment be damned. And that's for printed books; e-books are even easier.

Software had this problem first, by virtue of its non-corporeality. How many people are selling Free Software on eBay? We deride these sellers as "scammers," but in truth the only time they run afoul of the law is when they attempt to rebrand your software without acknowledgement, or when they fail to abide by some other intentionally inside-out clause of the license that you chose in the first place (e.g. selling GPL'd binaries without offering source code).

Still, there's a qualitative difference between letting people download your own work from your own site, and watching other people try to profit from it. But it is precisely this difference that strikes at the heart of the Free Software/Free Culture ethos. Part of choosing a Free license for your own work is accepting that people may use it in ways you disapprove of. There are no "field of use" restrictions, and there are no "commercial use" restrictions either. In fact, those are two of the fundamental tenets of the "Free" in Free Software. If "others profiting from my work" is something you seek to avoid, then Free Software is not for you. Opt for a Creative Commons "Non-Commercial" license, or a "personal use only" freeware license, or a traditional End User License Agreement. Free Software doesn't have "end users." That's kind of the point.

The aforementioned Apress executive told me that he did not understand why I would be willing to work with a publisher but then be happy about their competition. This is what I told him:

I enjoy working with publishers because it makes me a better writer. But I don't write for money; I write for love (or passion, or whatever you want to call it). I choose open content licenses because this is the way I want the world to work, and the only way to change the world is to change yourself first.

I don't know where that leaves you as a business. But you've made a good amount of money on the original "Dive Into Python," despite the fact that it's been available for free online for 8 years. A German translation of Dive Into Python 3 is being published this quarter by Springer/Germany [a division of Apress' parent company] almost simultaneously with the English edition -- much sooner-to-market than it would have been under a closed development process. (And an Italian translation was just released yesterday. You should snap that one up too before someone else does!) So maybe the problems you perceive are really opportunities in disguise.

So I am grateful for this anonymous soul who woke up one day and said to herself, "You know what I should do today? I should try to sell copies of that Free book that Pilgrim wrote." Grateful, because it afforded me the opportunity to remind myself why I chose a Free license in the first place. My Zen teacher once told me that, when people try to do you harm, you should thank them for giving you the opportunity to forgive them. In this case it's even simpler, because there's nothing to forgive, just explain. She's redistributing the work that I explicitly made redistributable. She's kind of the point.

§

Dive Into Python 4.4 is out. This is a major release, with an entirely new chapter on installing Python. This marks a turning point, when my book becomes our book, which is the first in a series of heartbreaking steps towards it becoming somebody else's book.

You see, my editor (who is a wonderful guy, and I do not fault him in the slightest for this decision) has insisted that we have a chapter on installing Python, because that's what people expect in a book called Dive Into Python: From novice to pro. Oh yeah, we're rebranding the book too. It's no longer a free book for experienced programmers. Well, it's still a free book, but we're downplaying that part. We're also downplaying the amount of experience you need to be able to dive into it. The preface -- which previously, and quite snottily, stated that this book assumed a lot about you and that if you were new to programming and wanted to learn Python, you should probably learn it somewhere else -- has been removed. We no longer assume a lot about you, apparently, beyond the ability to double-click and a willingness to blow $50 on a book you could download for free.

So anyway, the installation chapter is out of the way, barring feedback that I got everything wrong and ended up doing more harm than good and forgot somebody's favorite distribution and didn't cover Python for the AS/400 and suck. The next release will split up the longer chapters into 2, or sometimes 3, because my editor tells me that long chapters confuse some readers. Presumably (and this is just a wild guess) the same readers who need to be taught how to double-click an installer.

We will also be shuffling the order of chapters, which is probably a good thing in the long run. The book is going to be divided into three main parts: basics, web services, and advanced. Basics will be what used to be the first 3 chapters, which will actually be split into 5, plus the installation chapter. Web services will be the current XML processing chapter (split into 2), plus a bunch of new chapters I haven't written yet. The third part will be the current chapters on unit testing and functional programming, and a new chapter on refactoring and design patterns.

My book, my book, my book... Aaaaugh! My book!

Boy, that didn't take long at all.

§

Just a friendly reminder: the Feed Validator is located at feedvalidator.org. Unfortunately, due to a bizarre confluence of circumstances beyond my control, an old version of the validator is up and running at the old location, but we are unable to update it to the latest version or redirect it to the new server (HTTP is running but ssh is not, and no, bouncing the box does not help). It is very important that you update your bookmarks, links, templates, scripts, and applications to point to the new domain. Important bugs have been fixed and deployed at feedvalidator.org that will never be deployed at the old location.

In other news, I got a new desktop PC today, for work. At 2.4 GHz, it is faster than our 8 other computers combined. It is sitting on my desk next to a Rev D iMac which, running OS X 10.2.6 at 400 MHz on 128 MB of RAM, can only be described as wheezing. Sadly, I will need to run Windows XP on the new PC, complete with Service Pack 1 and 25 critical updates to download out of the box (thanks Dell! glad I have an external firewall), and later the .NET framework and the complete suite of Visual Studio .NET tools. Yes, I'm learning how to be a .NET programmer. I'm even getting paid for it. The world is rich with irony.

In other other news, I have a new server, through Bytemark Hosting. I have set up a CVS repository for my book and soon for all my other projects as well, and will be setting up anonymous CVS access shortly. It will also host my book's companion website, and probably some or all of my other sites as well, plus my mother's website, if I ever finish it for her (sorry Mom). It is running Debian GNU/Linux, and I have root access on it. Today I celebrated my new-found non-paying full-time got-root-why-yes-thanks-for-asking job by patching a buffer overflow in ssh to protect my server from a rumored zero-day exploit.

Meanwhile, if you're in the market for some kickass non-root hosting, I strongly recommend Cornerhost, where I am currently hosted. It will take me many moons to replicate for myself all the niceties that I take for granted on Cornerhost, and I will never be as nice to myself as Michal has been to me. But non-root hosting is like dry humping; it's fine as far as it goes, but at some point you have to hunker down and get naked. It's probably best not to stretch this analogy too far.

In other other other news, I'm writing a book. Or rather, have written a book, or at least part of a book, Dive Into Python, the book I've been writing, or not writing, for years, the book I said I'd never work on again. Except that now, Apress is paying me to work on my book again, expand it, and hopefully finish it. And if all goes well, they're going to publish it on actual paper sometime next summer, by which I mean fall, by which I mean God willing before 2005.

My book will be edited by James Cox, and reviewed by we know not whom. My book -- including all new work, edits, and corrections -- will remain freely downloadable under the GNU Free Documentation License, but you should buy a copy anyway, because this is the way I want the world to work. My book is not yet available for pre-order, but believe me when I tell you that I will make it very clear when it becomes available.

My book. My book. I'm just going to go around muttering my book for a few weeks until somebody smacks me. Or until a hurricane knocks my house down, whichever comes first.

§