
In case you missed it, I've started a new column at the WHATWG blog called This Week in HTML 5. The story thus far:

  1. Episode 1: Web Workers, and how to specify alternate text for images you know nothing about.
  2. Episode 2: the window.navigator object, meta http-equiv="Content-Language", the Worker object, outerHTML, insertAdjacentHTML(), and the continuing saga of the alt attribute.
  3. Episode 3: the event loop, the onhashchange event, and content sniffing for SVG images.
  4. Episode 4: the W3C's HTML 5 validator, SVG-in-HTML, and the proper way to provide alternate text for Rorschach inkblots.
  5. Episode 5: XSLT, MathML, Web Forms 2, and some light reading on character encoding.
  6. Episode 6: multimedia accessibility, Ogg Theora, and the year 2022.
  7. Episode 7: clickjacking.

There is a feed available for people who like that sort of thing.

§

Not for nothing, but I've had my share of bad reviews in my professional career. Some I've taken well, and some I've taken... poorly. Some were my fault and others honestly weren't. There isn't a manager on Earth who hasn't had to give a bad review to somebody, sometime. It's always awkward and it's never fun and in the end you're left with a low score on a piece of paper and a sinking feeling in your chest.

And yet, if you rounded up all the managers in the world and shot them... no wait, that's not where I was going with this. If you rounded up all the managers in the world and got them drunk -- yes, I think that would work -- you got them drunk and you asked them one question, they'd all tell you the same thing: the score that they give and you get doesn't mean a damn thing. Oh, you'll fixate on the score, since it means no salary bump or no bonus or no promotion or -- jackpot! -- all three at the same time, but it truly, truly, truly doesn't mean a damn thing. The only thing that truly matters is the conversation that follows.

And it is in this context that I am somewhat embarrassed on behalf of the Mozilla Corporation. They certainly didn't ask for my opinion or my guilt-by-proxy, but they apparently haven't noticed that they ought to be embarrassed, so by God somebody needs to step up. I refer, of course, to the Acid 3 test cooked up by the inimitable Ian Hickson and his motley crew of meddling minions. The test gives a numerical score that purports to rank a browser's compatibility with a potpourri of well-established web standards. Of course any such test is guaranteed to be unfair to somebody, but this one was especially unfair to everybody since the makers intentionally sought out bugs in major browsers to highlight their incompatibilities.

That, by itself, is not the story. First there was the Acid test, then there was the Acid 2 test, and there will no doubt be an Acid 4 test and so on. The fact that the testmakers had to work so damn hard to find compatibility bugs to highlight speaks volumes by itself, but that is not the story either. The story is that two browser vendors -- Opera and Apple -- somehow got into a bit of a race over who could reach a perfect score first. This, on top of their already insane release schedules (Safari 3.1, Opera 9.5), shocked and awed the web standards community, who for the first time in recent memory were put in the enviable position of arguing about which browser had increased its standards compliance the most and the fastest.

The funny thing is, I don't even know who won. There were some inconsistencies about which builds passed what, and then they found some last-minute bugs in the tests themselves, and despite minute-by-minute updates on programming.reddit.com, I don't really know or care who "won" the race. But I'll tell you one thing: it sure as hell wasn't Mozilla, because they were too busy complaining that the tests were just designed to highlight bugs (duh)... and they didn't see any real worth in the feature tests (like downloadable web fonts, which is a five-digit Bugzilla bug that has been open since 2001)... and they felt they should get partial credit for still being ahead of Internet Explorer (new working slogan: "Firefox: We're Not Dead Last")... and anyway, they're really busy right now -- unlike the fine young minds at Apple and Opera, who, unbeknownst to their managers, have outsourced all their browser development to summer interns and are spending their newfound free time reenacting Roman toga parties. And oh, by the way, didn't you hear that the other guys cheated? Also, their toga parties are, like, totally inaccurate when viewed from a psycho-historical perspective.

C'mon, guys. It's not the score that matters, it's the followup. It's the conversation you have, the promises you make, the progress you show the next day and the day after that and the day after that. And bitching about an openly developed test suite whose ultimate goal was just to get people excited about web standards for a few minutes -- man, you should all be embarrassed with yourselves. But you're not, so here I am stepping up, publicly being embarrassed on your behalf. No need to thank me.

Update: once again, I explain myself better the next morning.

§

Said the monk:

If you give me non-standard markup, I will render it according to standards.

If you give me standard markup, I will not render it according to standards.

What do you do?

The student sat for a long time and said nothing. Then, without looking up, he raised one finger and said, "There is only one web."

Many years later, the monk was enlightened, but by then it was too late.

§

HTML 5, XHTML 2, and the Future of the Web is making the rounds. It's an excellent summary for people who haven't been paying attention. Along the same lines, here's a presentation I gave in December 2005 to a bunch of Firefox developers at Mozilla Corporation headquarters: White lights lead to red lights. The "future of the web" part comes in about halfway through.

(Yes, I am aware these slides violate all the rules of Presentation Zen. I wrote them on the plane. Hence the title.)

The circumstance surrounding this speaking engagement is itself a funny story, one which I may share someday. For now, let's just say that I've been keeping this presentation under my hat for a long time, and now I don't have to.

I shall now proceed to play Super Paper Mario all weekend.

§

I can't think of a single topic that could be considered a more trivially unimportant form of navel-gazing than arguing about the semantics of HTML. (Oh wait, here's one.) I personally think Greg Knauss' definition of web standards is all you need to know on the subject, but all the weird markup-obsessed Alpha Geek freaks in the world seem to have decided that I'm worth reading because every few months or so I have a semi-enlightening rant on the subject. This is not one of those rants, but out of sympathy for those members of my audience who keep hitting refresh every few minutes like gerbils on crack, hoping for a repeat of Semantic Obsolescence, here is some light reading on the fascinating subjects of syntax, semantics, structure, validation, CSS, accessibility, and markup:

Now then. There is a veritable plethora of overlapping concepts here, and enough snake oil to choke a crack-addled gerbil.

XHTML

The successor to HTML, XHTML 1.0 is a tag-for-tag reformulation of HTML 4 in XML, and comes in several flavors (just like HTML 4). XHTML 1.1 comes in only one flavor (strict), adds a few things you'll never use, and removes several things you use all the time (like the name attribute on A tags). XHTML 2.0 is not done yet, but current drafts are not backward compatible with either XHTML 1.0 or 1.1. Despite what you might have heard, XHTML is not treated as XML by modern browsers unless you use the proper MIME type (which only about two dozen people do).
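Whether a browser parses XHTML as XML depends on the Content-Type header the server sends, not on the DOCTYPE. Below is a minimal Python sketch of one common content-negotiation approach: send application/xhtml+xml only to clients whose Accept header explicitly lists it with a nonzero q-value, and text/html to everyone else. The function names are hypothetical, and ignoring */* is an assumed policy choice, not a rule from any specification.

```python
# Minimal sketch: pick a Content-Type for an XHTML page from the Accept header.
# Function names and the "explicit listing only" policy are illustrative assumptions.

XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def accepts_xhtml(accept_header: str) -> bool:
    """True if the Accept header explicitly lists application/xhtml+xml with q > 0."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != XHTML_TYPE:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False  # */* alone is deliberately not taken as XHTML support


def choose_content_type(accept_header: str) -> str:
    """Serve XHTML as XML only to clients that ask for it; everyone else gets text/html."""
    return XHTML_TYPE if accepts_xhtml(accept_header) else HTML_TYPE


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("text/html, */*"))
```

Declining to treat */* as a yes is the conservative choice: a client that sends only */* has not actually said it can handle XHTML as XML, and a page served as application/xhtml+xml to a browser that can't may not render as expected.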

Validation

Means that your markup declares itself as a particular markup specification (using a DOCTYPE), and that it conforms to the rules of that specification. No more, no less. A validator will tell you that the structure of your markup is correct (or not); it says nothing about whether you're using tags properly, or semantically, or accessibly, or whatever.

Validation is completely independent of whether you're using XHTML or HTML, or whether you're using the XHTML MIME type correctly. You can write valid XHTML 1.1, or XHTML 1.0 Strict, or XHTML 1.0 Transitional, or HTML 4.01 Strict, or HTML 4.01 Transitional, or HTML 3.2, or HTML 2.
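To make the "declares itself" half concrete: a DOCTYPE carries a public identifier naming the specification the document claims to follow. This Python sketch uses the standard library's html.parser to read that identifier and map it to a spec name. The mapping table and function names are illustrative; the sketch reports only what a document claims, and does not check conformance, which is the actual work a validator does.

```python
from html.parser import HTMLParser

# Public identifiers for some of the specifications mentioned above.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD HTML 3.2 Final//EN": "HTML 3.2",
    "-//IETF//DTD HTML 2.0//EN": "HTML 2.0",
}


class DoctypeSniffer(HTMLParser):
    """Record the public identifier from the document's DOCTYPE, if any."""

    def __init__(self):
        super().__init__()
        self.public_id = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE html PUBLIC "..." "..."'
        if decl.upper().startswith("DOCTYPE") and '"' in decl:
            self.public_id = decl.split('"')[1]  # first quoted string


def declared_spec(markup: str) -> str:
    """Report which specification a document *claims* to follow (not whether it conforms)."""
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    if sniffer.public_id is None:
        return "no DOCTYPE"
    return KNOWN_DOCTYPES.get(sniffer.public_id, "unknown: " + sniffer.public_id)


if __name__ == "__main__":
    page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
            '<html><body><font>Not strict at all</font></body></html>')
    print(declared_spec(page))
```

The example page claims XHTML 1.0 Strict and then uses a FONT tag, which that specification doesn't allow; the sniffer happily reports the claim, which is exactly why declaring a DOCTYPE is only half of validation.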

The W3C Validator is the most widely used (X)HTML validator, although the upcoming beta version goes beyond validation-against-the-spec with something called fussy parsing, and checks for common constructs which, while technically valid, are known to cause problems in popular browsers. This crosses the line into being more of a linter than a pure validator. (The Feed Validator also crosses this line by flagging things like SCRIPT elements, ambiguously relative URLs, and invalid date values.)

The biggest reason to validate your markup is that, after you do it once, you can use the validator as a debugging tool to catch stupid mistakes. The second biggest reason is that if you don't, we won't help you. Yes, we're elitist snobs, but we increasingly have a monopoly on talent, and we've all decided that valid markup is a baseline.

CSS

Means that you are separating the structure of your document from the presentation of your document. It is technically possible to have a perfectly valid XHTML Transitional document that uses tables for layout, FONT tags for styling, and spacer GIFs for pixel-perfect positioning. As long as you put ALT attributes on all your spacer images and close all your FONT and TABLE tags, it can be valid. But it sucks, because table-based layouts and FONT tags and spacer GIFs are an ongoing nightmare, while CSS is only an up-front nightmare.

Also, you can do cool things like dynamic style switchers, à la CSS Zen Garden, and printer stylesheets and so forth, if you care about that sort of thing.

But am-I-CSS-or-not is independent of whether your markup validates. You can use CSS and not validate your markup, or you can validate your markup and not use CSS, or you can do both, or neither. The primary reason CSS and validation have historically been conflated is that the sort of people who have taken the time to learn about validation are also generally the sort of people who have taken the time to learn about CSS, and who use CSS for their own designs, and who advocate CSS to others. (To further confuse the issue, there is a CSS validator which checks whether your stylesheets conform to the CSS specification. Whether your (X)HTML is valid is completely independent of whether your CSS is valid, since they're separate specifications.)

CSS is almost completely independent of XHTML-vs-HTML. The rules for parsing CSS in an HTML environment are slightly different from parsing CSS in an XHTML environment, but keep in mind that unless you're using the proper XHTML MIME type, you're not in an XHTML environment anyway.

Semantic markup

Means (among a variety of definitions) that you are using tags that have specific meaning assigned to them, either in the (X)HTML specification, or in generally accepted use. As opposed to generic tags like DIV and SPAN, or Tag Soup whose result happens to look the same in one class of browsers.

HTML is flexible enough that there's usually more than one way to do something. If you want a simple vertical list of items, you can use UL and LI tags (plus some CSS to eliminate the default bullets), or you can simply put BR tags at the end of each item. Both accomplish the same result (in a popular class of visual browsers), and both are valid, and both may or may not be styled with CSS, but one is semantically better. UL means a list of things; BR doesn't. P means a paragraph; H1, H2, H3 etc. mean different levels of nested headings.

In visual browsers it doesn't really matter whether you use semantically correct markup or generic markup that happens to look the same, because the meaning of your page is going to be determined by the human being with eyes who reads it on a screen. But in other environments, it matters a great deal. (More on this in a second.)

Semantic markup is independent of XHTML-vs-HTML (which makes sense, since there aren't any new tags in XHTML that could provide new meaning). Semantic markup is independent of validation; you can produce shitty non-semantic markup that validates. Semantic markup and CSS are loosely joined, as described below.

Accessibility

Means that your content is accessible to a wide range of browsers, platforms, and users, including people who rely on assistive technology. Most accessibility discussion focuses on the blind and partially sighted, and the #1 accessibility feature is ALT attributes on images, since blind people can't see them. But there are lots of types of disabilities that intersect the web; I would say that the #2 accessibility feature is complete keyboard navigability, since not everyone reading your page can use a mouse.

Accessibility also has nothing to do with validation; no assistive technology requires valid markup. But it does have some overlap with CSS and semantic markup. You think I'm going to say table-based layouts suck for accessibility, but I'm not; table-based layouts suck for maintenance, but they can be perfectly accessible. No, the way that accessibility intersects CSS is that CSS allows you to use proper semantic markup but make it look like what you wanted it to look like in the first place, and some of that semantic markup is specifically interpreted by assistive technology better than the non-semantic (but visually identical) alternatives.

Example: if you use real H1, H2, H3 tags to build an outline of nested headers on your page, the Home Page Reader screen reader has an option to present that outline to a blind user, to read just the header tags and let the user skip to a particular one within the page. If you wanted to get a sense of the overall structure of a page, you would visually scan it, and the bolder/larger headers and surrounding whitespace would jump out at you, and you would start reading where you wanted to start reading. Blind people can't just scan the whole page at once; they need to accomplish the same thing in other ways, and the assistive technology they use relies on (among other things) good semantic markup to mimic the things we can do at a glance.
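A rough Python sketch of how a tool could build that outline, assuming it sees only the markup: collect every H1 through H6 in document order and indent by level. The class and function names are hypothetical, and real screen readers (Home Page Reader included) work from the browser's document model rather than a parser like this. Note that a paragraph styled with B and FONT to look like a heading never shows up.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every H1-H6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text)
        self._level = None   # level of the heading we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.headings.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)


def outline(markup: str) -> list:
    parser = HeadingOutline()
    parser.feed(markup)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    page = """
    <h1>Site title</h1>
    <h2>First section</h2><p>...</p>
    <h2>Second section</h2><h3>A subsection</h3>
    <p><b><font size="+2">Looks like a heading</font></b></p>
    """
    for level, text in outline(page):
        print("  " * (level - 1) + text)
    # Site title
    #   First section
    #   Second section
    #     A subsection
```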

Another example: some screen readers have an option to announce the number of items in a list before they start to read it, so the user can know how long it is before they listen to each item being read. If the list is really just text separated by BR tags, this feature won't work.
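The difference is easy to see from a program's point of view. This Python sketch (names hypothetical; actual screen readers use their own heuristics and platform accessibility APIs) counts the LI elements in each UL or OL. A real list reports its size; the same items separated by BR tags aren't a list at all, so there is nothing to count.

```python
from html.parser import HTMLParser


class ListCounter(HTMLParser):
    """Count the LI items in each UL/OL, recorded in the order the lists close."""

    def __init__(self):
        super().__init__()
        self.list_sizes = []  # one entry per list encountered
        self._stack = []      # item counts for the currently open (possibly nested) lists

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append(0)
        elif tag == "li" and self._stack:
            self._stack[-1] += 1

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._stack:
            self.list_sizes.append(self._stack.pop())


def list_sizes(markup):
    counter = ListCounter()
    counter.feed(markup)
    counter.close()
    return counter.list_sizes


if __name__ == "__main__":
    semantic = "<ul><li>Eggs</li><li>Milk</li><li>Bread</li></ul>"
    br_based = "Eggs<br>Milk<br>Bread<br>"
    print(list_sizes(semantic))  # [3]
    print(list_sizes(br_based))  # []
```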

I take an extremely pragmatic view of semantic markup. Semantic markup is useful as long as I can pinpoint a specific use for it, in a specific tool. Otherwise I don't care. Proper header tags are useful for accessibility. It's an actual menu item in Home Page Reader; if you use real header tags, that menu item works, and otherwise it doesn't. ALT attributes are important because I've heard what JAWS sounds like when it tries to read images without them (it reads the filename instead, which is generally meaningless).
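A small Python sketch of the ALT cases: it walks the IMG tags in a page and records what a naive reader might announce for each, namely the ALT text if present, nothing if ALT is empty (the usual convention for spacer GIFs and other decorative images), and the bare filename if ALT is missing, mirroring the JAWS behavior described above. The fallback is a simplification, real screen readers vary, and the names are hypothetical.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageAudit(HTMLParser):
    """Record what a naive screen reader might announce for each IMG tag."""

    def __init__(self):
        super().__init__()
        self.spoken = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if "alt" in attrs:
            # Non-empty ALT is read aloud; alt="" signals a decorative
            # image (a spacer GIF, say) that should be skipped silently.
            if attrs["alt"]:
                self.spoken.append(attrs["alt"])
        else:
            # No ALT at all: fall back to the filename from the src URL,
            # which is usually meaningless to the listener.
            src = attrs.get("src") or ""
            self.spoken.append(basename(urlparse(src).path) or "image")


def what_gets_read(markup):
    audit = ImageAudit()
    audit.feed(markup)
    audit.close()
    return audit.spoken


if __name__ == "__main__":
    page = ('<img src="/photos/DSC01234.jpg">'
            '<img src="/images/spacer.gif" alt="">'
            '<img src="/images/logo.png" alt="Example Widgets, Inc.">')
    print(what_gets_read(page))  # ['DSC01234.jpg', 'Example Widgets, Inc.']
```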

ALT attributes can also increase search engine relevance. After all, Googlebot is just another blind user with 100 million friends. People search by typing in keywords. If you don't tag your images with text, Google can't see them and match them up with those keywords, and they may as well not be there. This isn't rocket science, but apparently most people think that Google operates by loading up your page in IE and taking screenshots.

Then there's some semantic markup that I personally make use of, even if there isn't a wide market for it. I mark up names of people I link to (like in the list above) with the CITE tag, and I have a script that runs every night that aggregates those tags and creates posts by citation. I do a similar thing with the cite attribute of BLOCKQUOTE and Q, and create posts by quotation. (Some people also use CSS and Javascript tricks to automatically format the cite URL, if CSS and Javascript are available. A cute trick that helps some people and doesn't harm anyone else.) I use the ACRONYM tag to mark up acronyms and then pull them out and list them on my accessibility statement. This is all very geeky and not of general interest, and some of it could probably be replicated with smarter code and dumber markup, but this is the balance I've found that works for me.
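For the curious, here is a hypothetical Python sketch of that kind of nightly job; it is not the actual script described above. It parses each post's plain HTML 4 with the standard library's html.parser, collects the text of CITE elements, the cite attribute of BLOCKQUOTE and Q, and ACRONYM elements with their title expansions, then builds indexes by citation and quotation. Post IDs and the input format (a dict of ID to HTML) are assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SemanticHarvester(HTMLParser):
    """Collect CITE text, BLOCKQUOTE/Q cite URLs, and ACRONYM expansions from one post."""

    def __init__(self):
        super().__init__()
        self.names = []          # text inside <cite>...</cite>
        self.quote_sources = []  # cite="..." on <blockquote> and <q>
        self.acronyms = {}       # acronym text -> title expansion
        self._capture = None     # ("cite", None) or ("acronym", title) while inside one
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("blockquote", "q") and attrs.get("cite"):
            self.quote_sources.append(attrs["cite"])
        elif tag == "cite":
            self._capture, self._buf = ("cite", None), []
        elif tag == "acronym":
            self._capture, self._buf = ("acronym", attrs.get("title")), []

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._capture and tag == self._capture[0]:
            text = " ".join("".join(self._buf).split())
            if tag == "cite":
                self.names.append(text)
            elif self._capture[1]:  # only keep acronyms that carry a title
                self.acronyms[text] = self._capture[1]
            self._capture = None


def build_indexes(posts):
    """posts: mapping of post id -> HTML. Returns (by_citation, by_quotation, acronyms)."""
    by_citation = defaultdict(list)
    by_quotation = defaultdict(list)
    acronyms = {}
    for post_id, markup in posts.items():
        h = SemanticHarvester()
        h.feed(markup)
        h.close()
        for name in h.names:
            by_citation[name].append(post_id)
        for url in h.quote_sources:
            by_quotation[url].append(post_id)
        acronyms.update(h.acronyms)
    return dict(by_citation), dict(by_quotation), acronyms
```

Nothing here needs XML tooling; a forgiving HTML parser is enough to get at the semantics, provided the markup actually uses CITE, Q, BLOCKQUOTE, and ACRONYM in the first place.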

However, there are lots of recurring discussions about semantics that I have no interest in whatsoever. I do not, for example, care about the distinction between ACRONYM and ABBR. I do not care about the distinction between STRONG and B, or EM and I. I know of no mainstream tool that supports one and not the other (except Internet Explorer for Windows, which brilliantly only supports ACRONYM and not ABBR). I also don't care about the distinction between UL and OL, since general use has contradicted the spec definition for so long that the distinction is meaningless. A UL is an ordered list with bullets, because that's how everyone uses it.

So where's the problem? Well, there's a lot of snake oil out there. Anyone who tells you that validation buys you semantics is selling you snake oil. You can have an entire page full of DIV and SPAN tags and be perfectly valid, and it may look perfectly good to the human eye, but it doesn't mean anything, and any tool that relies on semantic markup won't be able to make heads or tails of it.

Anyone who tells you that CSS guarantees you accessibility is selling you snake oil. Most accessibility techniques have nothing to do with CSS (remember, the #1 accessibility technique is marking up images with text that's normally invisible). And where accessibility and CSS do overlap, it's still easy to screw up if you don't know what you're doing, or simply go through a lot of pain for no real-world gain.

Anyone who tells you that XHTML is easier to parse, consume, or validate because it's XML is selling you snake oil (or isn't using the right tools). The stuff I do with citations, quotations, and acronyms, I do it with HTML 4, a standard that has been around since 1997. Headers and lists have been around since HTML 1. Anyone who tells you that XHTML buys you anything at all is most likely selling you snake oil. The only possible use I've seen for it is directly embedding it in syndicated feeds (in other words, in another XML vocabulary). Whether this idea has legs or not is an open question.

A common misconception is that XHTML is better because you can use XML tools (such as XSLT) to generate it (from a database, from another XML source, whatever). Unless you are using XHTML as input to some further processing, this is a bogus argument. XSLT can output HTML as easily as it can output XML. It's only when you want to take XHTML and use it as input to some other transformation that it matters that it's XML (and even then, it only matters to the extent that you want to use existing XML tools instead of existing SGML tools). (It has been pointed out to me that there is a growing developer community that is doing exactly this: providing add-on tools that take XHTML as input and do interesting things with it. Like the design community that simply won't talk to you unless your markup validates, the developer community may all collectively decide not to talk to you unless you're using XHTML. This may end up being the strongest argument for using XHTML.)

And finally, anyone who tells you that any of these concepts will make your web site look better on mobile devices is selling you snake oil. Older mobile devices only supported a weird fucked up subset of HTML 3.2, and newer mobile devices have ultra-smart browsers that reflow even the most rigid designs and parse even the most fucked up Tag Soup markup. Every new mobile device that comes out seems to trip up on CSS in its own way, and apparently nobody told the mobile vendors about XHTML Basic (don't ask).

So there are a lot of overlapping concepts here, and if you are the sort of person who is trying to push one of them, you're probably going to try to push all of them. Despite the fact that it shares 100% of its tags with HTML 4, people are pushing XHTML as a fresh start, a better way of doing things. There are people who are (intentionally or not) conflating all of these issues, advocating XHTML but then trying to slip validation, CSS, accessibility, and semantics in under the radar at the same time. Everything is loosely related anyway, and if you're going to make a fresh start and make the leap to standards-based design (and it is quite the leap, if you've been doing Tag Soup design all your professional life), you may as well go whole hog.

And there's nothing wrong with this argument, per se. HTML has been historically branded with the stigma of cowboy coders, anything-goes, forcing round pegs into square Netscape browsers, and just generally being a wild woolly mess. If XHTML can be branded so as to create the association with validation, separating structure and presentation, accessibility, and other techniques that have worth in their own right, then maybe they can all get traction together. But that's not a technical argument, it's a social one, and anyone who claims that these loosely coupled concepts are really tightly coupled is either misinformed or lying.

OK, so I guess this was one of those rants. I promise not to speak of such things for another six months. Save the gerbils.

§