
$ lynx -head -dump http://hixie.ch/advocacy/xhtml | grep Content-Type
Content-Type: text/plain; charset=utf-8

[Safari rendering Hixie's XHTML advocacy page]

According to comments by Jens Alfke, the next version of Safari will extend this behavior to sniff for feeds as well:

Also, there is a bit of code way down in WebCore that sniffs the incoming page and, when it detects the start of an XML document that contains RSS or Atom, it auto-corrects the MIME type to application/xml+rss or application/xml+atom.

Words fail me.

§

Tim Bray is learning Python and using my feed parser to parse the feeds at Planet Sun. I am suitably flattered, and I sincerely hope that one of the 57 lines in Tim's first Python program checks the bozo bit so Tim can ignore the 13 Planet Sun feeds which are not well-formed XML.

One is served as text/plain, which means it can never be well-formed.

Two (a, b) contain invalid XML characters.

Ten (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) are served as text/xml with no charset parameter. RFC 3023 requires clients to parse such feeds as us-ascii, but the feeds contain non-ASCII characters and are therefore not well-formed XML.
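
For anyone who wants to see those three failure modes spelled out, here is a rough Python sketch. It is not the feed parser's actual code. The function name why_bozo and the reason strings are made up for illustration, and it takes one shortcut, flagged in the comments: for application/xml with no charset it simply assumes utf-8, where a real client would look at the byte order mark and the XML declaration.

```python
import re
from email.message import Message

# Anything outside the XML 1.0 "Char" production is an invalid XML character.
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def why_bozo(content_type, body):
    """Return a reason the feed cannot be well-formed XML, or None.

    Hypothetical helper: checks only the media type, the encoding implied
    by the Content-Type header, and the character range. It does not parse
    the XML itself.
    """
    header = Message()
    header["Content-Type"] = content_type
    media_type = header.get_content_type()
    charset = header.get_param("charset")

    is_text_xml = media_type == "text/xml"
    is_app_xml = media_type == "application/xml" or (
        media_type.startswith("application/") and media_type.endswith("+xml")
    )
    if not (is_text_xml or is_app_xml):
        return "not an XML media type"  # e.g. text/plain

    if charset is None:
        # text/xml with no charset parameter must be read as us-ascii.
        # ASSUMPTION: for application/xml we just guess utf-8 instead of
        # inspecting the BOM and XML declaration.
        charset = "us-ascii" if is_text_xml else "utf-8"

    try:
        text = body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return "bytes do not match declared encoding"

    if INVALID_XML_CHARS.search(text):
        return "invalid XML character"
    return None


feed = "<title>caf\u00e9</title>".encode("utf-8")
print(why_bozo("text/xml", feed))
print(why_bozo("text/xml; charset=utf-8", feed))
```

The same bytes pass or fail depending on nothing but the header, which is the whole problem with the ten text/xml feeds.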

On a positive note, it's nice to see that Norman Walsh has an Atom feed (#10 in that list). Pity it's not well-formed. I'm sure he'll fix that in short order. He's no bozo.

You know what I want for Christmas? Markup Barbie. You pull a string and she says "XML is tough."

§

I was walking across a bridge one day, and I saw a man standing on the edge, about to jump off. So I ran over and said, "Stop! Don't do it!"

"I can't help it," he cried. "I've lost my will to live."

"What do you do for a living?" I asked.

He said, "I create web services specifications."

"Me too!" I said. "Do you use REST web services or SOAP web services?"

He said, "REST web services."

"Me too!" I said. "Do you use text-based XML or binary XML?"

He said, "Text-based XML."

"Me too!" I said. "Do you use XML 1.0 or XML 1.1?"

He said, "XML 1.0."

"Me too!" I said. "Do you use UTF-8 or UTF-16?"

He said, "UTF-8."

"Me too!" I said. "Do you use Unicode Normalization Form C or Unicode Normalization Form KC?"

He said, "Unicode Normalization Form KC."

"Die, heretic scum!" I shouted, and I pushed him over the edge.

(with apologies to Emo Philips)

§

Following up on my experiment with styling my Atom feed with CSS...

There were 3 major complaints that I noticed in comments, email, and elsewhere around the web:

  1. It required additional markup in the feed, which seemed wasteful
  2. Entry titles weren't displayed as links
  3. It didn't work well in IE

Next step: XSLT. Atom feed + XSLT = styled feed. (View source on each; the last one will surprise you!)

I removed all extra XHTML markup in the feed itself (it's rolled into feed.xsl). I removed the info element that explained the purpose of the feed (also moved to feed.xsl). This Atom feed is as svelte as it can be, but it looks like a regular page in IE 6 and Mozilla. Even the entry links work.

There is apparently some confusion about how this works. Won't it be wasteful for feed readers to download all those images and crap? No, it doesn't work like that. Feed readers download the feed, nothing else. They don't even see the images. How can they not see the images? The images are linked in from the CSS file.

Well, won't it be wasteful for feed readers to download a separate CSS file? No, it doesn't work like that either. Feed readers download the feed, nothing else. They don't even see the CSS file, because it's never mentioned in the feed. So how does your browser find it? When you view the feed in your browser, it transforms the XML into HTML+CSS on the fly, using XSLT (based on the rules in the feed.xsl file).

But I don't want to see your styles in my feed reader! You won't. Feed readers download the feed, nothing else. They don't apply XSLT transformations; they'll still display the feed however they display feeds. But modern browsers like IE and Mozilla look for a particular line in the feed (that feed readers ignore) that tells them to do something more useful than dumping raw XML on the screen:

<?xml-stylesheet type="text/xsl" href="feed.xsl"?>
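
To make the "feed readers ignore that line" part concrete, here is a small sketch using Python's standard ElementTree parser as a stand-in for a feed reader. The sample feed, the example.org link, and the entry_titles helper are invented for illustration; the namespace is the Atom 0.3 one. ElementTree's default parser drops processing instructions, so the stylesheet line makes no difference to what it extracts.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://purl.org/atom/ns#}"  # Atom 0.3 namespace

STYLESHEET_PI = b'<?xml-stylesheet type="text/xsl" href="feed.xsl"?>\n'

# Made-up sample feed, just enough structure for the demonstration.
FEED = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    + STYLESHEET_PI
    + b'<feed version="0.3" xmlns="http://purl.org/atom/ns#">\n'
    b"  <title>Example feed</title>\n"
    b"  <entry>\n"
    b"    <title>First entry</title>\n"
    b'    <link rel="alternate" type="text/html" href="http://example.org/1"/>\n'
    b"  </entry>\n"
    b"</feed>\n"
)


def entry_titles(feed_bytes):
    """Return the entry titles, the way a very naive feed reader might."""
    root = ET.fromstring(feed_bytes)
    return [entry.findtext(ATOM + "title") for entry in root.iter(ATOM + "entry")]


with_pi = entry_titles(FEED)
without_pi = entry_titles(FEED.replace(STYLESHEET_PI, b""))
print(with_pi == without_pi)
```

Nothing in this code ever requests feed.xsl, the CSS file, or the images. Only a client that chooses to act on the processing instruction, such as a browser, goes looking for them.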

But now users can't see your XML! When they click on your feed link, you need to show them your XML and get them excited about XML technology! That's the dumbest thing I've ever heard.

§

View this snapshot of my Atom feed in a real browser, where by real, I mean Mozilla-based. (That's not fair; it works flawlessly in the latest version of Opera too.) You should get a page that looks very much like the rest of my site, with a friendly little blurb at the top explaining a little about syndication and what this feed thing is that you just clicked on.

This is not an original idea; Blogger does something similar for all of their feeds. In fact, the blurb at the top is the info element, which was originally proposed by Jason Shellen, a Blogger employee, and later refined in Atom 0.3 to make the content model match other Atom elements. Their blurb points users to the Blogger Knowledge Base, which seems reasonable. The wording of mine is open for discussion.

What is new here, I believe, is the use of inline XHTML (properly namespaced, of course; this is still a valid Atom feed) to add a few other things to the page for browser users. The page header is a series of <div>s (the images and positioning are entirely defined in the associated CSS file, just like the rest of my site), and the breadcrumb trail is also a piece of hard-coded XHTML wedged in the middle of the feed in the appropriate place.

I have tested this with a number of Atom-enabled aggregators, and none seem to have any problem with it. Let me know if your Atom-enabled client misbehaves.

Note that the display doesn't look right in IE/Win. I am shocked, shocked. Also, the links don't work in IE (the breadcrumb trail should be clickable except for the last bolded link, and the letters in the page header should be clickable).

Instead of styling XML with CSS, another potential solution would be to associate it with an XSLT transform and let the client convert it to HTML. This would probably solve the cross-browser-CSS problem, since I could just transform it into exactly the same markup I use elsewhere throughout my site. It would also create all new, even more exciting problems in trying to create cross-browser XSLT.

Update:

The entry titles aren't links in any browser, but they're not meant to be. Some have suggested they should be, but I disagree. I don't want to make this page *too* useful in a browser. I do not, for instance, want people visiting this page all the time in their browser. I just want to make it friendlier to first-timers (or accidental click-throughs) than dumping raw XML. The one and only goal of the page is to get visitors to subscribe to the feed in a feed reader.

Others have noticed that I've switched from application/atom+xml to application/xml to get this demo to work. Yeah, that sucks. application/xml is correct, in the sense that it's not wrong. It's not optimal, but at least it's not text/xml. Oh God, let's not have that discussion again.

I'm surprised no one has pointed out the irony of my using inline XHTML at all, given my strong and well-publicized opinion of XHTML for general use. In fact my HTML is virtually XHTML anyway, except for unclosed <img> tags. I may employ some quick regular expressions and inline XHTML for the full content in my feeds, since Atom supports that and all Atom-enabled aggregators I've seen support that. As I've mentioned before, this is the only real use I've seen for XHTML. And then I could display the full content on my styled-for-browsers Atom feed. (Sam does this.) Not sure if that would be an improvement worth making.
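
A quick regular expression of the kind mentioned above might look like the following Python sketch. The close_img_tags name is hypothetical, and the pattern assumes lowercase img tags with no ">" inside attribute values, so it is a quick hack for markup you control, not a general HTML-to-XHTML converter.

```python
import re

# Match <img ...> whether or not it is already self-closed.
IMG_TAG = re.compile(r"<img\b([^>]*?)\s*/?>")


def close_img_tags(html):
    """Rewrite every <img> tag as a self-closing XHTML tag.

    ASSUMPTION: attribute values never contain '>', and tags are lowercase.
    """
    return IMG_TAG.sub(r"<img\1 />", html)


print(close_img_tags('<p><img src="a.png" alt="x"></p>'))
```

Because the pattern also matches tags that are already closed, running it twice leaves the markup unchanged.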

§