The complexity of HTML

Baldur Bjarnason has written down his thoughts on why he thinks HTML is too complex.

Now we’re back to seeing almost the same level of complexity and messiness in most web pages as we saw in the worst days of table-hacking. The semantic elements from HTML5 are largely unused.

The reason for this, as outlined in an old email by Matthew Thomas, is down to the lack of any visible benefit from browsers:

unless there is an immediate visual or behavioural benefit to using an element, most people will ignore it.

That’s a fair point, but I think it works in the opposite direction too. I’ve seen plenty of designers and developers who are reluctant to use HTML elements because of the default browser styling. The legend element is one example. A more recent one is input type=”date”—until there’s a way for authors to have more say about the generated interface, there’s going to be resistance to its usage.

Anyway, Baldur’s complaint is that HTML is increasing in complexity by adding new elements—section, article, etc.—that provide theoretical semantic value, without providing immediate visible benefits. It’s much the same argument that informed Cory Doctorow’s Metacrap essay over a decade ago. It’s a strong argument.

There’s always a bit of a chicken-and-egg situation with any kind of language extension designed to provide data structure: why should authors use the new extensions if there’s no benefit? …and why should parsers (like search engines or browsers) bother doing anything with the new extensions if nobody’s using them?

Whether it’s microformats or new HTML structural elements, the vicious circle remains the same. The solution is to try to make the barrier to entry as low as possible for authors—the parsers/spiders/browsers will follow.

I have to say, I share some of Baldur’s concern. I’ve talked before about the confusion between section and article. Providing two new elements might seem better than providing just one, but in fact it just muddies the waters and confuses authors (in my experience).

But I realise that in the grand scheme of things, I’m nitpicking. I think HTML is in pretty good shape. Baldur said “simply put, HTML is a mess,” and he’s not wrong …but HTML has always been a mess. It’s the worst mess except for all the others that have been tried. When it comes to markup, I think that “perfect” is very much the enemy of “good” (just look at XHTML2).

When I was in San Francisco back in August, I had a good ol’ chat with Tantek about markup complexity. It started when he asked me what I thought of hgroup.

I actually found quite a few use cases for hgroup in my own work …but I could certainly see why it was a dodgy solution. The way that a wrapping hgroup could change the semantic value of an h2, or h3, or whatever …that’s pretty weird, right?

And then Tantek pointed out that there are a number of HTML elements that depend on their wrapper for meaning …and that’s a level of complexity that doesn’t sit well with HTML.

For example, a p is always a paragraph, and em is always emphasis …but li is only a list item if it’s wrapped in ul or ol, and tr is only a table row if it’s wrapped in a table.

(Interestingly, these are the very same elements that browsers will automatically adjust the DOM for—generating ul and table if the author doesn’t include it. It’s like the complexity is damage that the browsers have to route around.)

I had never thought of that before, but the idea has really stuck with me. Now it smells bad to me that it isn’t valid to have a figcaption unless it’s within a figure.

These context-dependent elements increase the learning curve of HTML, and that, in my own opinion, is not a good thing. I like to think that HTML should be easy to learn—and that the web would not have been a success if its lingua franca hadn’t been grokabble by mere mortals.

Tantek half-joked about writing HTML: The Good Parts. The more I think about it, the more I think it’s a pretty good idea. If nothing else, it could make us more sensitive to any proposed extensions to HTML that would increase its complexity without a corresponding increase in value.

Have you published a response to this? :