The Design of HTML5

The opening keynote from Fronteers 2010 in Amsterdam.

This keynote was kindly transcribed by volunteers.


I would like to talk to you today about The Design of HTML5. So there’s two parts to this: one is, of course, HTML5. I could stand up here and just talk about HTML5 but that’s not what I’m going to do because if you want to know what is in HTML5 you can Google it, you can read books, you can go and read the spec.

Actually, some other people are going to be talking about the contents of the spec. Steve Faulkner will be talking about accessibility and HTML5. Paul Irish is going to go through a whole bunch of the APIs that are in HTML5. So I’m not just going to stand up here and run through what’s in HTML5.

Actually, before I even get started I should probably clarify what I mean by HTML5, which seems kind of crazy ‘cause why should I have to clarify what I mean by HTML5 when what I mean by HTML5 is HTML5? There’s a specification, it’s called HTML5 and when I say HTML5 that’s what I am referring to. The problem is other people are using the term HTML5 to refer to just about anything, which can be problematic. For example, referring to CSS3 as HTML5 for some reason seems to be a common technique. That’s not what I’m referring to. When I say HTML5 I don’t mean CSS3, I mean HTML5.

We’ve been here before with terms. It used to be that Ajax meant something specific and then, after a while, it just became “doing anything cool with JavaScript.” That was Ajax, right? And now the same thing seems to be happening to the term HTML5. It’s supposed to mean a specific specification and now it just means “doing anything cool (full stop) on the web.” But that’s not the HTML5 I mean. I don’t mean this umbrella term that covers anything that’s new these days. I’m specifically talking about the specification: HTML5.

As I said, it’s not so much the contents I want to talk about. It’s not going through a checklist of what’s in HTML5. It’s the other side of it, it’s the design of HTML5. What I want to talk about is not so much what is in the spec but why these things are in the spec, what the process was in thinking of these things when designing a specification.

In particular, one of the reasons why I think HTML5 as a specification is quite successful—and the process has been successful—is that it is driven by design principles. Design principles are something I am getting more and more fascinated with.

Design principles

A design principle is essentially a belief, a tenet, a concept that you rally behind. It doesn’t matter whether you’re making a specification or if you are making a physical object or a piece of software or a programming language. You will probably find a design principle or multiple design principles behind all good examples of anything that has been built collaboratively. And it’s not just from the world of the web. Throughout history there are examples of design principles behind large-scale constructions like countries and societies.

To give you an example, from the United States of America, this is a design principle built into the Declaration of Independence.

We hold these Truths to be self-evident, that all Men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

They have the watchwords in there: life, liberty, the pursuit of happiness. These are the key things enshrined in the Declaration: this is what we are all about, these are the principles on which we want to build our society.

Another example would be from Karl Marx, whose writings were used as a basis for building societies throughout the 20th century, and a lot of it could be boiled down to this one design principle:

From each according to his ability, to each according to his need.

So here we have this design principle guiding an economic system.

Another example, much older than that but using a similar principle to this would be:

Do unto others as you would have them do unto you.

A very simple, small design principle from Jesus Christ, a Nazarene Jew from two thousand years ago. And this design principle theoretically is what drives a number of religions that have been built on top of the teachings of this principle. The principles and the practices sometimes go out of sync.

Here’s an example from fiction. When George Orwell wrote Animal Farm he had a fictional society and that fictional society was built upon a design principle. In this case the design principle was:

Four legs good, two legs bad.

What’s interesting is that in Animal Farm, as the society changes, as the society evolves for the worse, the design principle changes along with it. The design principle becomes “four legs good, two legs better” as the farm itself changes. It’s interesting to see this in works of fiction.

There is another work of fiction that has three design principles baked in, and that’s the canon of work from Isaac Asimov on robotics. He coined the term robotics and he basically ensconced these three laws of robotics, three fairly simple design principles, but then built a whole canon of work around them, about fifty books, each a permutation examining these design principles from different aspects. You’re probably familiar with the three laws of robotics, I’m sure.

A robot may not injure a human being or, through inaction, allow a human being to come to harm.

A robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law.

A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

This is, I think, the first example in fiction of design principles for a piece of software. In this case it would be a piece of software to run a positronic brain in a robot that’s designed to the three simple design principles. But I think this might be the start of actually having design principles for software. And since then we see design principles for a lot of really good software.

Tim Berners-Lee, who, as you know, co-invented the web, has a document on the W3C website where he keeps his own set of personal design principles at a URL. They are kind of sprawling, there are a lot of them there, he adds to them, he adjusts them, he takes things away as time goes on, but I think it’s a really good idea to have a personal set of design principles somewhere.

Actually, Bert Bos, co-inventor of CSS, he’s got a great document on the W3C website that is kind of a meta design principles document. Like how to go about designing a format, whether it’s CSS or anything else. It is well worth reading.

So if you root around the W3C site you find a bunch of these design principles, including Tim Berners-Lee’s personal ones, and you see the watchwords he’s taken from schools of software engineering: decentralisation, tolerance, simplicity, modularity. These are key watchwords that he keeps in mind as he comes up with formats.

You’re all pretty familiar with the work of Tim Berners-Lee as you use it every day. He invented the web, co-invented the web with Robert Cailliau, and as well as inventing the web itself he also came up with the language that we use every day on the web and that language is of course HTML: HyperText Markup Language.

HTML

This is an early history of HTML, it started with version two point zero. There never was a HTML one. If anyone ever tells you they have been doing HTML since version one, they are bullshitting you. There was a document called HTML Tags that contained a handful of tags that still exist today but it wasn’t an official specification.

This whole idea of using tags, angle brackets, ‘p’ or ‘h1’ or what have you; that isn’t something that Tim Berners-Lee came up with. He was basically taking an existing vocabulary from SGML, the particular version of SGML that was in use at CERN at the time. So even back then he wasn’t creating things from scratch; that’s an important lesson that you can still see in the evolution of HTML. Build on what’s come before rather than trying to build something from scratch.

So this HTML Tags document was the first version of HTML but wasn’t an official version. The first official version was HTML 2.0 which didn’t come from the W3C. HTML 2.0 was from the IETF, the Internet Engineering Task Force. They were responsible for putting out a lot of standards before the W3C really started off. But from version three onwards it was at the W3C, the World Wide Web Consortium, where later versions of HTML were specced.

There was some fairly rapid movement in the nineties. As you can see, the nineties were a pretty turbulent time for anyone building websites back then. We had the browser wars, it was pretty messy. We had a lot of proprietary shit being thrown into browsers. They were trying to compete on having the best proprietary shit. It was kind of a messy time and it wasn’t clear at all at this time if HTML was even going to be around and if HTML was going to last as the format for the web.

You can see it evolved fairly quickly, 3.2, 4.0, 4.01 from 1997 to 1999; very rapid evolution. What happened with 4.01 is the W3C stepped back, looked at it and said “Okay, this is good, we are done with HTML; HTML 4.01 is the final version of HTML, we don’t need a HTML working group anymore.”

They didn’t stop working on the language but it was no longer HTML they were interested in. Right after HTML 4.01, they came up with XHTML 1.0. It sounds completely different but actually XHTML 1.0 was the same as HTML 4.01. I mean literally, the contents of the specification were the same, the vocabulary was the same, all the same elements, all the same attributes. The only difference, the only difference, was that in XHTML 1.0 you would use XML syntax. So that meant all your attributes had to be lowercase, all your elements had to be lowercase, all your attributes had to be quoted, you had to remember to use closing tags and you had to self-close tags like img and br.

From the point of view of the contents of the spec, exactly the same. There really was no difference. In a sense it really was just coding style, because to a browser if you served up HTML 4.01, HTML 3.2 or you served up XHTML 1.0 it didn’t matter, it was all the same to the browser; it would make the same DOM tree. But what was kind of nice about XHTML 1.0 was because it had this kind of stricter syntax, it was a sort of coding style that people could get behind.
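To make the "it was all the same to the browser" point concrete, here is a minimal sketch in Python. It is only an approximation: Python's `html.parser` is not a browser and does not build a DOM tree, and the two snippets and the names `StartTagCollector` and `start_tags` are made up for illustration. It simply shows that a tolerant HTML parser reports the same start tags and attributes whether the markup is written loosely or in XHTML 1.0 style.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record every start tag and its attributes as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and strips the
        # quotes from attribute values. Self-closed tags such as <br />
        # also arrive here via the default handle_startendtag.
        self.seen.append((tag, attrs))


def start_tags(markup):
    """Return the list of (tag, attributes) pairs found in the markup."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


# Hypothetical snippets: the same content in loose HTML style
# (uppercase, unquoted, no closing tags) and in XHTML 1.0 style.
loose_html = '<P>Hello<BR><IMG SRC=cat.png ALT=Cat>'
strict_xhtml = '<p>Hello<br /><img src="cat.png" alt="Cat" /></p>'

# Both spellings yield the same sequence of elements and attributes.
print(start_tags(loose_html) == start_tags(strict_xhtml))
```

The comparison only looks at start tags and attributes, which is enough to show that the stricter syntax changes how the markup is written rather than what elements it describes.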

This time period of 2000, this was when the Web Standards Project was picking up steam, and developers were really pissed off with all this proprietary crap that was being thrown into browsers; they were getting angry and saying to browsers “Why don’t you just follow the damn specifications?” And CSS was really starting to take off in a big way, and they kind of latched on to XHTML 1.0, they were like “Okay this is going to be best practice”, even though as I said, there’s really no difference between HTML 4.01 and XHTML 1.0. But okay, professionals always use lowercase elements, always use lowercase attributes, always quote your attributes: it was a good body of practice, so a lot of people got behind that syntax.

I did for example! So for the last 10 years I’ve been using the XHTML 1.0 doctype, and one of the reasons is that it makes the validator a more powerful tool for me, right? So if I’m writing XHTML 1.0 and I run that through the validator it’s going to tell me if I forgot to quote an attribute, or if I forgot to include the closing tag, stuff like this. Whereas if I was writing in HTML 4.01 that stuff would be legal, it wouldn’t necessarily catch it.

That’s the reason why I’ve been using XHTML 1.0. And I’m guessing that a lot of people …hands up those who use XHTML 1.0. Okay. HTML 4.01? A few people. Any others, shout them out? HTML5, good for you! Anything older, anybody use older doctypes? No?

I’ve been using XHTML 1.0 for 10 years now because it makes validators a more useful tool. Is anybody using XHTML 1.1? Are you now? Keep those hands up. Are you serving your documents as XML? Some? Well the ones you’re not are not XHTML 1.1.

This is the big issue. After XHTML 1.0 came XHTML 1.1, a small point increase, doesn’t sound like much, and again there’s nothing new in the spec from a vocabulary point of view, it’s all the same elements, it’s all the same attributes. The only difference was that now with XHTML 1.1 you must serve your documents as XML. With XHTML 1.0 you could serve them as HTML if you wanted, and that’s exactly what we do because you’d be kind of crazy to serve your documents as XML.

One of the reasons why it’d be crazy to serve your documents as XML is that Internet Explorer can’t handle it. Well it can now on version 9. It’s like “Aww, that’s so cute”, that it’s still even thinking about it. That boat has sailed! So the world’s leading browser at the time couldn’t even handle documents sent as XML and this specification was mandating that you must send the documents as XML, it was kind of crazy.

So XHTML 1.1 was just not that realistic, and the reason why you would not want to send your documents as XML even to browsers that understand XML is the error handling model of XML. The syntax of XML, okay I’ve got no problems with lowercase attributes, lowercase elements, always quote your attributes, that’s fine, in fact I kind of like it, but the error handling model of XML is this: when the parser comes across an error, stop parsing. That’s in the specification. So when you serve up XHTML 1.1 as XML, and let’s say you open it up in Firefox and you have one unencoded ampersand, just one on the whole page, then what you’ll see is the yellow screen of death. Firefox will say “Nope, you can’t see this web page because there’s one error on this page”. That is the correct behaviour according to the XML specification, for Firefox to stop right there and not render anything else is actually correct according to XML. Not HTML, because HTML has never had an error handling model, but according to the rules of XML that is correct.
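The difference between the two error handling models can be sketched with Python's standard library. This is an illustration under assumptions, not how any browser is implemented: `xml.etree.ElementTree` stands in for a strict XML parser, `html.parser` stands in for tolerant HTML handling, and the helper names `parse_as_xml` and `parse_as_html` and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text content that a tolerant HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_xml(markup):
    """Return the text content, or None if the XML parser rejects the document."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # XML rule: on a well-formedness error, stop. Nothing is recovered.
        return None
    return "".join(root.itertext())


def parse_as_html(markup):
    """Return whatever text the tolerant HTML parser can extract."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


# A hypothetical page with a single unencoded ampersand.
page = "<p>Fish & chips</p>"

print(parse_as_xml(page))   # the XML parser refuses the whole document
print(parse_as_html(page))  # the HTML parser still yields the text
```

Encoding the ampersand as `&amp;` is all it takes for the XML route to succeed, which is exactly why one stray character on an otherwise fine page is such a harsh failure mode.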

So that’s another reason why you’d not want to serve your documents as XML. And then the next iteration was XHTML 2, and you’ll notice there’s no date next to that because it never actually got finished.

Now, XHTML 2, I want to be very clear on this, is actually a really, really nice specification, a really good specification …from a theoretical point of view. I mean the people building the spec were very, very smart people. Actually the main guy leading the spec was Steven Pemberton, who is a resident of these parts, an incredibly smart guy, and it’s a fantastic specification and it would be a wonderful format if everyone agreed to use it, but it’s just not that practical.

For one thing, it still uses the XML error handling model, you’re supposed to serve your documents as XML, forget about it: we’re not going to do that. And two: it was deliberately going to break
