The Design of HTML5

The opening keynote from Fronteers 2010 in Amsterdam.

This is the opening keynote from Fronteers 2010 in Amsterdam. It was kindly transcribed by these volunteers:

You can:

I would like to talk to you today about The Design of HTML5. So there’s two parts to this: one is, of course, HTML5. I could stand up here and just talk about HTML5 but that’s not what I’m going to do because if you want to know what is in HTML5 you can Google it, you can read books, you can go and read the spec.

Actually, some other people are going to be talking about the contents of the spec. Steve Faulkner will be talking about accessibility and HTML5. Paul Irish is going to go through a whole bunch of the APIs that are in HTML5. So I’m not just going to stand up here and run through what’s in HTML5.

Actually, before I even get started I should probably clarify what I mean by HTML5, which seems kind of crazy ‘cause why should I have to clarify what I mean by HTML5 when what I mean by HTML5 is HTML5? There’s a specification, it’s called HTML5 and when I say HTML5 that’s what I am referring to. The problem is other people are using the term HTML5 to refer to just about anything, which can be problematic. For example, referring to CSS3 as HTML5 for some reason seems to be a common technique. That’s not what I’m referring to. When I say HTML5 I don’t mean CSS3, I mean HTML5.

We’ve been here before with terms. It used to be that Ajax meant something specific and then, after a while, it just became “doing anything cool with Javascript.” That was Ajax, right? And now the same thing seems to be happening to the term HTML5. It’s supposed to mean a specific specification and now it just means “doing anything cool (fullstop) on the web.” But that’s not the HTML5 I mean. I don’t mean this umbrella term that covers anything that’s new these days. I’m specifically talking about the specification: HTML5.

As I said, it’s not so much the contents I want to talk about. It’s not going through a checklist of what’s in HTML5. It’s the other side of it, it’s the design of HTML5. What I want to talk about is not so much what is in the spec but why these things are in the spec, what the process was in thinking of these things when designing a specification.

In particular, one of the reasons why I think HTML5 as a specification is quite successful—and the process has been successful—is that it is driven by design principles. Design principles are something I am getting more and more fascinated with.

Design principles

A design principle is essentially a belief, a tenant, a concept that you rally behind. It doesn’t matter wether your making a specification or if you are making a physical object or a piece of software or a programming language. You will probably find a design principle or multiple design principles behind all good examples of anything that has been built collaboratively. And it’s not just from the world of the web. Throughout history there example of design principles behind large scale constructions like countries, societies.

To give you an example, from the United States of America, this is a design principle built into the Declaration of Independence.

We hold these Truths to be self-evident, that all Men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.

They have the watchwords in there: life, liberty, the pursuit of happiness. These are the keys things enthroned into the constitution, this is what we are all about, these are the principles on which we want to build our society.

Another example would be from Karl Marx whose writing were used as a basis for building societies throughout the 20th century and a lot of it could be boiled down to this one design principle:

From each according to his ability, to each according to his need.

So here we have this design principle guiding an economic system.

Another example, much older than that but using a similar principle to this would be:

Do unto others as you would have them do unto you.

A very simple, small design principle from Jesus Christ, a Nazarene Jew from two thousand years ago. And this design principle theoretically is what drives a number of religions that have been built on top of the teachings of this principle. The principles and the practices sometimes go out of sync.

Here’s an example from fiction. When George Orwell wrote Animal Farm he had a fictional society and that fictional society was built upon a design principle. In this case the design principle was:

Four legs good, two legs bad.

What’s interesting is that in Animal Farm, as the society changes, as the society evolves for the worse, the design principle changes along with it so the design principle comes four legs good, two legs better as the animal farm itself changes but it’s interesting to see this in works of fiction.

There is another work of fiction that has three design principles baked in and that’s from the canon of work from Isaac Asimov on robotics. He coined the term robotics and he basically ensconced these three laws of robotics, three fairly simple design principles but then build a whole canon of would around them, about fifty books, each permutations examining these design principles from different aspects. You’re probably familiar with the three laws of robotics I’m sure.

A robot may not injure a human being or, through inaction, allow a human being to come to harm.

A robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law.

A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

This is, I think, the first example in fiction of design principles for a piece of software. In this case it would be a price of software to run a positronic brain in a robot that’s designed to the three simple design principles. But I think this might be the start of actually having design principles for software. And since then we see design principles for a lot of really good software.

Tim Berners-Lee who, as you know, co-invented the web. He has a document on the W3C website where he keeps his own set of personal design principles at a URL. They are kind of sprawling, there are a lot of them there, he adds to them, he adjusts them, he takes things away as time goes on but I think it’s a really good idea to have a personal set of design principles somewhere.

Actually, Bert Bos, co-inventor of CSS, he’s got great document on the W3C website that is kind of a meta design principles document. Like how to design building a format, whether it’s CSS or anything else. It is well worth reading.

So if you root around the W3C site you find a bunch of these design principles including TIm Berners-Lee’s personal ones and you see the watchwords he’s taken from schools of software engineering: decentralisation, tolerance, simplicity, modularity. These are key watchwords that he keeps in mind as they come up with formats.

You’re all pretty familiar with the work of Tim Berners-Lee as you use it everyday. He invented the web, co-invented the web with Robert Cailliau, and as well as inventing the web itself he also came up with the language that we use everyday on the web and that language is of course HTML: HyperText Markup Language.

HTML

This is an early history of HTML, it started with version two point zero. There never was a HTML one. If anyone ever tells you they have been doing HTML since version one, they are bullshitting you. There was a document called HTML Tags that contained a handful of tags that still exists today but it wasn’t an official specification.

This whole idea of using tags, angle brackets, ‘p’ or ‘h1’ or what have you; that isn’t something than Tim Berners-Lee came up with. He was basically taking an existing vocabulary from SGML, the particular version of SGML that was in use at CERN at the time. So even back then he wasn’t creating things from scratch; that’s an important lesson that you can still see in the evolution of HTML. Build on what’s come before rather than trying to build something from scratch.

So this HTML Tags document was the first version of HTML but wasn’t an official version. The first official version was HTML 2.0 which didn’t come from the W3C. HTML 2.0 was from the IETF, the Internet Engineering Task Force. They were responsible for putting out a lot of standards before the W3C really started off. But from version three onwards it was at the W3C, the World Wide Web Consortium, where later versions of HTML were specced.

There was some fairly rapid movement in the nineties. As you can see, the nineties were a pretty turbulent time for anyone building websites back then. We had the browser wars, it was pretty messy. We had a lot of proprietary shit being thrown into browsers. They were trying to compete on having the best proprietary shit. It was kind of a messy time and it wasn’t clear at all at this time if HTML was even going to be around and if HTML was going to last as the format for the web.

You can see it evolved fairly quickly, 3.2, 4.0, 4.01 from 1997 to 1999; very rapid evolution. What happened with 4.01 is the W3C stepped back, looked at it and said “Okay, this is good, we are done with HTML; HTML 4.01 is the final version of HTML, we don’t need a HTML working group anymore.”

They didn’t stop working on the language but it was no longer HTML they were interested in. Right after HTMl 4.01, they came up with XHTML 1.0. It sounds completely different but actually XHTML 1.0 was the same as HTML 4.01. I mean literally, the contents of the specification were the same, the vocabulary was the same, all the same elements, all the same attributes. The only difference, the only difference, was that in XHTML 1.0 you would use XML syntax. So that meant all your attributes had to be lowercase, all your elements had to be lowercase, all your attributes had to be quoted, you had to remember to use closing tags and you had to self close tags like img and br.

From the point of view of the contents of the spec, exactly the same. There really was no difference. In a sense it really was just coding style, because to a browser if you served up HTML 4.01, HTML 3.2 or you served up XHTML 1.0 it didn’t matter, it was all the same to the browser; it would make the same DOM tree. But what was kind of nice about XHTML 1.0 was because it had this kind of stricter syntax, it was a sort of coding style that people could get behind.

This time period of 2000, this was when the web standards project was picking up steam, and developers were really pissed off with all this proprietary crap that was being thrown into browsers; they were getting angry and saying to browsers “Why don’t you just follow the damn specifications?” And CSS is really starting to take off in a big way, and they kind of latched on to XHTML 1.0, they were like “Okay this is going to be best practice”, even though as I said, there’s really no difference between HTML 4.01 and XHTML 1.0. But okay, professionals always use lowercase elements, always use lowercase attributes, always quote your attributes: it was a good body of practice, so a lot of people got behind that syntax.

I did for example! So for the last 10 years I’ve been using the XHTML 1.0 doctype, and one of the reasons is that it makes the validator a more powerful tool for me, right? So if I’m writing XHTML 1.0 and I run that through the validator it’s going to tell me if I forgot to quote an attribute, or if I forgot to include the closing tag, stuff like this. Whereas if I was writing in HTML 4.01 that stuff would be legal, it wouldn’t necessarily catch it.

That’s the reason why I’ve been using XHTML 1.0. And I’m guessing that a lot of people …hands up those who use XHTML 1.0. Okay. HTML 4.01? A few people. Any others, shout them out? HTML5, good for you! Anything older, anybody use older doctypes? No?

I’ve been using XHTML 1.0 for 10 years now because it makes validators a more useful tool. Is anybody using XHTML 1.1? Are you now? Keep those hands up. Are you serving your documents as XML? Some? Well the ones you’re not are not XHTML 1.1.

This is the big issue. After XHTML 1.0 came XHTML 1.1, a small point increase, doesn’t sound like much, and again there’s nothing new in the spec from a vocabulary point of view, it’s all the same elements, it’s all the same attributes. The only difference was that now with XHTML 1.1 you must serve your documents as XML. With XHTML 1.0 you could serve them as HTML if you wanted, and that’s exactly what we do because you’d be kind of crazy to serve your documents as XML.

One of the reasons why it’d be crazy to serve your documents as XML is that Internet Explorer can’t handle it. Well it can now on version 9. It’s like “Aww, thatt’s so cute”, that it’s still even thinking about it. That boat has sailed! So the world’s leading browser at the time couldn’t even handle documents sent as XML and this specification was mandating that you must send the documents as XML, it was kind of crazy.

So XHTML 1.1 was just not that realistic, and the reason why you would not want to send your documents as XML even to browsers that understand XML is the error handling model of XML. The syntax of XML, okay I’ve got no problems with lowercase attributes, lowercase elements, always quote your attributes, that’s fine, in fact I kind of like it, but the error handling model of XML is this: When the parser comes across an error, stop parsing. That’s in the specification. So when you serve up XHTML 1.1 as XML, and let’s say you open it up in Firefox and you have one uncoded ampersand, just one on the whole page, then what you’ll see is the yellow screen of death. Firefox will say “Nope, you can’t see this web page because there’s one error on this page”. That is the correct behaviour according to the XML specification, for Firefox to stop right there and not render anything else is actually correct according to XML. Not HTML, because HTML has never had an error handling model, but according to the rules of XML that is correct.

So that’s another reason why you’d not want to serve your documents as XML. And then the next iteration was XHTML 2, and you’ll notice there’s no date next to that because it never actually got finished.

Now, XHTML 2, I want to be very clear on this, is actually a really, really nice specification, a really good specification …from a theoretical point of view. I mean the people building the spec were very, very smart people. Actually the main guy leading the spec was Stephen Pemberton, who is a resident of these parts, an incredibly smart guy, and it’s a fantastic specification and it would be a wonderful format if everyone agreed to use it but it’s just not that practical.

For one thing, it still uses the XML error handling model, you’re supposed to serve your documents as XML, forget about it: we’re not going to do that. And two: it was deliberately going to break backwards compatibility with existing versions of HTML. At one point they were talking about deprecating the img element, which seems kind of crazy to people working on the web every day, but, you know, they had good theoretical reasons for doing this, that the object element might be a better thing to use.

So XHTML 2, despite the fact that it was a great format in theory, never took off in practice, and it was never going to take off in practice because authors like you and me were never going to get behind something like that, that breaks backwards compatibility, and neither were browser makers. Browser makers are committed to maintaining backwards compatibility.

And there’s one simple reason why XHTML 1.1 is really not that widely used as XML, and why XHTML 2 never took off, and it’s down to a design principle, and that design principle is Postel’s Law. You’ve got to:

Be conservative in what you send; be liberal in what you accept.

Now be liberal in what you accept, that’s what the web is built on. People making web browsers have to be liberal in what they accept because they are given some pretty crappy stuff to accept, right? A lot of documents on the web are not pretty, but that’s the way the web is. The web has kind of evolved in a very messy way, but it’s a beautiful mess. There’s a lot of badly-formed documents out there and it would be great if everyone was writing proper XML and everything was well formed, but that’s not the reality. The reality is Postel’s Law.

So as professionals we try to be conservative in what we send, we try to use best practices, we try to make sure our documents are well-formed, but from a browser’s perspective, they must be liberal in what they accept.

And if you think about the error handling model of XML, when it’s applied to XHTML 1.1 and XHTML 2, that error handling model is a draconian error handling model. It’s definitely not being liberal in what it accepts, it’s quite the opposite to say when you encounter a single error stop parsing; that’s the opposite of the Robustness Principle.

HTML5

So we get to HTML5, which didn’t come directly from the W3C. What happened was, there hadn’t been an HTML working group since the end of the twentieth century, some people at the W3C kind of think “Maybe there’s still life in HTML, if we just extended it a bit more. Instead of concentrating all our efforts on XHTML we could improve the forms in HTML, we could make HTML a bit more application-like, and just evolve it a bit more.”

So there was a workshop in 2004 of W3C members, and Ian Hickson, who was working at Opera at the time, he put forward this proposal to extend HTML, to do some work on HTML. It paralleled XHTML 2, but to continue working on HTML, to expand it a bit more. And they had a vote and the W3C voted “No”, that HTML was dead and XHTML 2 was the future. So these browser makers, Opera, Apple, a few others said “That’s fine, we’re just going to go over here and we’re going to work on this stuff by ourselves, outside the W3C”. So they formed the Web Hypertext Applications Technology Working Group, WHATWG—it’s a good thing they called themselves a working group rather than a task force.

What they decided to do was, completely outside the W3C, continue working on HTML and throwing in some new stuff, and because they’re browser makers, not only are they coming up with this stuff, but they’re also shipping it. They’re coming up with good ideas and putting it into browsers.

And things move pretty fast. They start getting stuff done. Meanwhile over at W3C with XHTML 2, nothing much is really happening. In terms of actual implementations, there really isn’t much is happening.

So then an interesting thing happens, which was in a blog post in 2006, Tim Berners-Lee said “You know what? We were wrong. We were actually wrong to expect the whole web to turn to XML overnight, that was pretty unrealistic of us, and yeah, we probably should re-charter the HTML working group”. So that’s exactly what happened. The W3C started up an HTML5 working group in 2007. And the first question they had of course was “Well, do we start from scratch? Or do we take as our basis all this work that’s been done since 2004 over at this other group called WHATWG?” And that was a no-brainer, of course they were going to take the existing work and they should build on that, so they had a vote and they agreed “Yeah, we’re going to build on what the WHATWG are doing”. So now they need to work together with the WHATWG.

Second question is how to make that smoother; who should be the editor at the W3C? Should we have the same editor that they have over at the WHATWG, which is Ian Hickson who is these days working with Google. And they had a vote and said “Yeah, actually it would make things better if Ian Hickson was the editor of the HTML5 spec at the W3C as well as being the editor of the spec over at WHATWG.”

So they voted on that. And that’s kind of the situation today. So it is one format, but it lives in two places. You’ve got the specification listed on the WHATWG site and it’s also listed on the W3C site.

It’s a little bit confusing if you’re coming to it cold: “Which one is the real spec?” Well, they are both the same …kind of. Actually, the specifications will diverge in the future. They have started to diverge slightly. The idea is that the W3C is working on nailing down a specific specification that will go to last call, and that will become a working draft, and it will be set in stone.

Whereas at the WHATWG they go for a constant iteration. So even if it is currently called HTML5, it’s actually not the greatest term for what the WHATWG is working on. It’s better to think of this as just plain HTML or web technologies, ‘cause that’s the idea there. But it’s definitely a confusing situation that there are these two groups working on what is basically the same specification. It can be quite confusing.

Then there is the processes behind the two groups, because they are quite different, philosophically speaking. The way that things work at the WHATWG is essentially a dictatorship. You’ve got Ian Hickson as the editor. He will listen to all sides, everyone states their arguments and then he does what he thinks is the right thing to do.

At the W3C it’s the opposite. It’s a democracy. Everyone gets a say and everyone gets a vote. And it’s the vote that matters, the vote that decides. Now on the face of it, the WHATWG way of doing things sounds horrible. It sounds really bad. It’s like “That’s not a way to run any kind of project!”

And the W3C way sounds great. Sounds very egalitarian. In practice however, the WHATWG way works pretty darn well. And I think it works well because of Ian Hickson. He is actually a really, really good editor. He is almost robotic in his ability to listen to arguments dispassionately.

The W3C way, which is very fair in principle, actually can get bogged down quite quickly in process and procedure, and it takes a long time to get stuff done. So what works best in a way, I think, is to have a mixture of the two. So the fact that these two bodies are working on the same spec, I think, they kind of complement each other quite well.

One of the reasons why they can work together is because of the design of HTML5. It’s because they have thought from the start what they are trying to achieve. So as well as there being the specification itself, that lives in a document on the W3C site—this is the specification for HTML5 language—there is another document on the W3C site, and that is HTML design principles. And one of the editors of this document is here in this room with us today, Anne Van Kesteren. So if you have any questions about this document, you can ask Anne.

It’s an excellent document, it’s really good. And essentially what it is, is: You’ve got the W3C and the WHATWG trying to work together. And it’s like the odd couple, right? How will they ever get on? Well, this document kind of enshrines what they are working on, what they can both agree on.

I want to talk through those things. Because if they can all agree on what’s in this document, then I think, HTML5 will be a great specification, and they have agreed that this is what they’re working towards. So you can see that the watchwords are compatibility, utility, interoperability. So for all the differences between the WHATWG and the W3C—and there are plenty of differences—they also have this stuff in common. And that’s the most important thing. That they have common ground and that they have enshrined that common ground into a designs principles document.

Avoid needless complexity

I’d like to show you some of the design principles that you can find in this document. Here is one, very simple: avoid needless complexity. Seems fairly straightforward. I’ll illustrate it with an example.

Let’s suppose I’m writing an HTML 4.01 specification. I open up my document and I type the doctype. Has anyone got that doctype memorised? Okay, no, I guess not. It’s possible… I mean, you’re a geeky bunch. Somebody here might have memorised it. This is the html 4.01 doctype

<!DOCTYPE html PUBLIC "-//W3C/DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

I don’t keep that stuff memorised, because this is why we have text snippets, that’s why we have google, why we have templates.

What about if I’m writing it for XHTML 1.0, which I have been writing for ten years now. Anyone got the doctype for that? It’s kind of an equally long string:

<!DOCTYPE html PUBLIC "-//W3C/DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

They are basically the same. This document is XHTML 1. And that’s essentially what it’s saying. So in HTML5, to omit needless complexity, the doctype is simply:

<!DOCTYPE html>

And that’s it. Now, even I can remember that one. I don’t need to keep a text snippet for that. I have to say when I first saw this doctype, that this was supposed the doctype for HTML, I was somewhat taken aback: “Shouldn’t the number five be in there somewhere, you know?” I thought: “What are they saying here? Are they saying this is it? This is the only version of HTML that’ll ever be, we’re going to get it right first time and there’ll never be a need for another version of HTML?” That seemed so arrogant. But that’s not what they’re saying at all. Instead what you have to understand is why doctypes exist in the first place. It’s not for browsers. Doctypes exist for validators. So the reason why I slap an XHTML 1.0 doctype on the top of my document is that when I feed it through a validator, the validator checks against that doctype.

A browser doesn’t care. Let’s say I’m writing HTML 3.2. At top of my document I put the doctype for HTML 3.2. And then somewhere in my document I use an element that was first introduced in HTML 4.01. What’s the browser going to do? Is it going to not render that element, because it was introduced in a version of HTML later than the doctype has specified? No, of course not! It’s gonna render the element, because of Postel’s law, because of robustness. It’s got to be liberal in what it accepts. So browsers are not checking against any type of format, validators do, validators care. This is the reason why doctypes exist.

And because one of the design principles of HTML5, it’s going to be backwards compatible and future compatible, that any future versions of HTML—and there will be an HTML6 and an HTML7, whatever—but they need to be backwards compatible with the current version of HTML, which is HTML5. Hence having a version number in your doctype doesn’t really make much sense, even to a validator.

Now, I say the doctypes aren’t for browsers; that’s mostly true. There is one instance where it matters to browsers what doctype you’re using, and you are probably all familiar with it. It’s not the reason why doctypes exist, it’s a hack that uses doctypes. And that’s when Microsoft were introducing CSS, getting ahead of the standard, they tried putting CSS into the browser, they had their own box model—in some ways a more intuitive box model than the standard box model—and then the standards come out, and the standards are using a different one. What could they do? They want to support the standards, but they also have to maintain backwards compatibility with the old way they were doing things. How can they tell if an author means I want to use standards or I want to use the old-fashion way?

So the trick they came up with is a very clever hack. It was to use the presence of a doctype, of a valid doctype as the trigger for standards mode rather than quirks mode. And that is very clever. And that’s generally what we do today, when we put a doctype on the document. We’re saying, “I want to use the standards mode,” but that’s not why doctypes were invented. That’s a hack that happens to use doctypes.

The six million dollar question is, “Wait a minute, if I go ahead and slap doctype html on the top of the document, and I feed that to Internet Explorer, is it going to use standards mode or is it going to use quirks mode?”

It turns out, this is the minimum number of characters necessary to trigger standards mode in Internet Explorer. I think that illustrates the kind of fundamental approach to HTML5: that it’s not about the theoretical benefits. It’s not about, “Oh, wouldn’t it be nice if authors had a nice short doctype that was easy to memorise?” Yeah, that would be nice, but if it doesn’t work in existing browsers, then forget about it. So there’s this great balance between something that’s theoretically a good idea—a nice short doctype—and practically a good idea, because it still triggers standards mode. And the doctype is a good example.

And there’s other examples that are omitting needless complexity, avoiding needless complexity in the spec. For example, in previous versions of HTML, in HTML 4.01, suppose I want to specify the character encoding of my document. The ideal way is, you have the server send the character encoding in the heading, but you can also specify it at the document level:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Again, I wouldn’t memorise this. I’ve got better things to do with my brain cells, but this is how you specify it, that I want to be the document to be UTF-8. So this is how you do it in HTML 4.01. And in XHTML 1.0 you’ve got to do a bit more, because as well as the meta element itself you also have to declare it in the xml opening tag.

<?xml version="1.0" encoding="UTF-8" ?>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

In HTML5, it turns out that all you have to do is say:

<meta charset="utf-8">

Nice and short. And I can memorise that.

Once again, it turns out this works. Not only works in very modern browsers: this works today in all browsers. Because it turns out that when we were feeding these meta elements to browsers, they were parsing them like this: “Meta blah blah blah blah blah charset utf-8”. That’s essentially what a browser sees when it parses that string. And it has to, because of Postel’s Law, right?

I keep coming back to the Robustness Principle, but people get this wrong so browsers think “Okay, I think somebody is trying to specify a character set …oh yeah, utf-8”. That’s just getting codified into the spec. Now it’s okay to just leave out the blahs and just write meta charset=”utf-8”.

There are some more examples of omitting needless complexity, avoiding needless complexity. But the needless complexity can be avoided without breaking in existing browsers. For example, in HTML5 if I’m linking off to a stylesheet by using a link element and I say rel=”stylesheet” and then I say type=”text/css”, well I’m kind of repeating myself. And it turns out for a browser, I’m repeating myself. A browser doesn’t need both things. A browser is perfectly fine saying rel=”stylesheet” because it’s going to guess that your serving up CSS. You don’t have to include the type attribute. You’ve already said it was a stylesheet; you don’t have to say it again. You could if you want; if you want to include the type attribute, go ahead.

And likewise, if you’re using a script element and you say type=”text/javascript”, the browser kinda already knows that. What else would you be using on the web? If you want to use a different scripting language on the web, go for it. I don’t think any browsers will support you.

You can add a type attribute if you want. But you can leave it out and the browser is going to assume you’re using JavaScript. Avoiding. Needless. Complexity.

Support existing content

Support existing content. This is really important because a lot of people are thinking of HTML5 as new and shiny; that it’s all about what’s coming in the future, going to make the web better in the future. And it is, right? Obviously we have to think about the future in making the web a better place, but they have to think about the past. And remember a lot of people on the working group the W3C, these are browser makers so they very much have to think about supporting existing content. This is the watchword of anyone who has ever had to build a browser: you have to support existing content.

Let me show you an example of how HTML5 supports existing content.

Here we have four different ways of writing the same thing. There’s an img element and there’s a paragraph element with an attribute on it. The only difference is the syntax. You serve any one of these four pieces of code, pieces of markup, to a browser and the browser will serve it into the same DOM tree, no problem whatsoever. From a browser’s perspective, there are no differences between these four pieces of markup. And so in HTML5 you can go ahead and use any syntax you want.

<img src="foo" alt="bar" /> 
<p class="foo">Hello world</p>

<img src="foo" alt="bar">
<p class="foo">Hello world

<IMG SRC="foo" ALT="bar">
<P CLASS="foo">Hello world</P>

<img src=foo alt=bar>
<p class=foo>Hello world</p>

Now to us looking at this, we kind of look at this and go “No, no, no. One of those is right and three of those—something is fishy.” No, you should be quoting your attributes! Come on! We all quote our attributes. Uppercase elements? Didn’t we leave that behind ten years ago?

And so I kind of get a queasy feeling when I see that this is all now allowed in HTML5 because I have been writing XHTML 1.0 for ten years now and I have gotten pretty used to that coding style. But you have to understand, from a browser’s perspective, this is all really the same. It really doesn’t matter.

Doesn’t make anyone else queazy? You look at this and think “Ooh, this is sloppy, this is bad?” Is it just me? Am I the only one?

But they have to support existing content and existing content looks like this. Right? This is the way browsers already work because of Postel’s Law.

And some people said “This just will not do. We need some kind of trigger within the language to say the author knows what they are doing.” They want to use one particular syntax, like the XHTML syntax, for example, rather than a different syntax. And I can see why that’s what people want. But I disagree that it has to be in the language itself. Because what we are talking about here essentially is a coding style or a writing style rather than being syntactically correct. So what I think is needed for professionals like us is lint tools, because we have lint tools for other technologies we use.

For example, with JavaScript. JavaScript is another example of a kind of messy, sloppy language that is powerful because it is messy and sloppy and there are many different ways of writing it. In javascript you’re supposed to put a semi-colon at the end of every statement but you don’t have to, because javascript will perform semi-colon insertion …which does sound quite painful.

So we have tools like JSlint: jslint.org from Douglas Crockford, where it says right there on the page “JSlint will hurt your feelings.” But it’s a really good tool because you can make perfectly good javascript and run it through JSlint and it will say “Okay, this JavaScript is valid but your doing it wrong. This coding style: I don’t like it. I disagree with it. It’s not good.” That’s actually really handy when your working in a team because you can all have one coding style you all agree on.

I think that if your working in a team, even if your working alone, you need to settle on one syntax. No one syntax is superior to another in terms of browsers parsing it, but I do think that as professionals, we say “This is my coding style.” But I don’t think that needs to be in the language. I think that lint tools helps all that. There are now lint tools. If you go to htmllint.com, you can run your HTML5 documents through it and it will check for things like, have you quoted your attributes, are the elements lowercase, or you can click checkboxes to say what you want.

So it’s not a return to sloppy mark up if you don’t want it to be. And as I said, they have to allow this in HTML5 because this is what browsers allow. And it all comes back to Postel’s Law. We keep coming back to Postel’s Law.

Solve real problems

Another design principle in HTML5 is to solve real problems. It seems obvious and yet there are plenty of formats and specifications out there that, in my opinion, are solving theoretical problems rather than real problems. This is all about solving problems people have today, solving the pain points.

I’ll give you an example. Here’s one I think you might be able to relate to. Let’s say in HTML 4 or XHTML 1, I’ve got this piece on my page, a block of content, and I want the whole thing to be clickable, right? But it’s got a headline and a paragraph and maybe its got an image in there too. Now I want the whole thing to be clickable so I need three link elements. So I open up my headline, my h2 for example, and then I’ve got my text and thats what I wrap in the a element. And then I open up the paragraph and then I open up another link, and then I wrap that and then I put the text in that…

<h2><a href="/path/to/resource">Headline text</a></h2>
<p><a href="/path/to/resource">Paragraph text.</a></p>

In HTML5, all I simply do is wrap the whole thing in an a element.

<a href="/path/to/resource">
<h2>Headline text</h2>
<p>Paragraph text.</p>
</a>

Yes, it’s a load of block elements but yet I can wrap them in an a element. This is great. I know this is great because I have been in this situation, where I’ve needed do this. And I for one welcome our new HTML5 overlords.

It’s solving a real world problem. I bet you have come across this problem before as well.

Now here is the great thing about it. This isn’t something that browsers now have to go out and start supporting. This is something which exists in browsers already because people were doing this kind of stuff already, even though it wasn’t legal yet. So essentially all that HTML5 is saying is “What you have been doing for years? That’s actually allowed now.”

Pave the cowpaths

This design principal is probably the most buzzwordy: pave the cowpaths. I don’t know if you have heard that in a business meeting, “run it up the flag pole, pave the cowpaths.” But it is actually a very good design principal because essentially what it is saying is, when you are deciding what to tackle—where are the pain points—look at where people are already finding hacks to patch these problems today. That is essentially where you should concentrate your efforts. Look at where people are already coming up with solutions.

A good example of paving the cowpaths are the new semantic elements in HTML5. And there are not that many: it’s not infinitely extensible, which I think is a good thing. It’s basically a handful. But they’re good. You’ve got header, footer, section, article …and a lot of these will be familiar to you. I mean the names of these will be familiar to you already even if you’re not using HTML5 because you will have been using class names such as class=”header” or “heading” or “head” or class=”footer” or “foot”. Or IDs, if your saying id=”header”, id=”footer”. These are already fairly familiar things.

So, for example, today you might be writing a document like this.

<body>
<div id="header">...</div>
<div id="navigation">...</div>
<div id="main">...</div>
<div id="sidebar">...</div>
<div id="footer">...</div>
</body>

You’ve got a div id=”header”. You’ve got a div id=”navigation”. Stuff like this. In HTML5 you’ll be able to swap out these elements. And when talking about these new elements, this is the way it is often presented: “Oh look, it’s great: you can swap out these divs with ids for these new HTML5 elements.”

<body>
<header>...</header>
<nav>...</nav>
<div id="main">...</div>
<aside>...</aside>
<footer>...</footer>
</body>

And that’s true: you can. These elements work at the document level like this and that’s good. But if that were the only reason these elements were introduced, then that would be kind of a waste of effort.

While you can use these elements in the document level as a replacement for IDs, it’s much more useful, in my opinion, to think of them as a replacement for classes.

While you can use these new elements this way, at the document level, as a replacement for IDs, it’s much more useful in my opinion to think of them as and replacement for classes, because the real power comes in that you can use these things multiple times per page, not once per page. Yes, you’ve got a header for your document and you have a footer for your document; you also have a header and a footer for each section within your document. And each section might have another section within it and that could have a header and that could have a footer, right?

It’s these four elements: section, article, aside and nav, that are the really powerful ones because they introduce a whole new content model that hasn’t existed in any previous version of HTML; this idea of sectioning content. Up ‘till now we’ve used divs to kind of group things together, but divs are just like any other element, they don’t have any extra semantic meaning. While section, article, aside and nav are essentially saying “This is like a document within a document.” Anything that’s inside one of these elements will have its own outline, its own heading, its own footer.

The way to think about it is basically section, the most generic one, is thematically related content. And article is a specialised kind of section. Aside is a specialised kind of section. Nav is a specialised kind of section.

So, whereas today I might have markup that looks like this, where I’m using classes to delineate the different parts of a page—using divs, right?

<div class="item">
<h2>...</h2>
<div class="meta">...</div>
<div class="content">
...
</div>
<div class="links">...</div>
</div>

There might be the meta content about who wrote this piece of content. We have some links at the bottom, stuff like that. With HTML5, I basically can say that this is like a document in itself, by using sectioning content, by using something like section or article or aside, I can say “This kinda stands alone.” And I can use header and use footer.

<section class="item">
<header><h1>...</h1></header>
<footer class="meta">...</footer>
<div class="content">
...
</div>
<nav class="links">...</nav>
</section>

Notice also that the footer doesn’t necessarily have to be at the bottom, right? The important thing about header, footer, aside, nav, is the semantics. It’s nothing to do with positioning. We think of the word footer and we think “Oh, that comes at the bottom.” We think of aside as a side bar. But if you look at the specification, it’s all about the content. So the content that you put in the footer could be things like the byline, who wrote this, and that’s the element that you use. It doesn’t say “Must come at the bottom of the document or the bottom of the section”.

The really interesting thing here, though, is not that I’ve swapped out some divs with classes for some new elements, but the fact that I swapped out an H2 element for an H1. Shock, horror! What if I already have an H1 on the page?! The Google siren will go off and my SEO will be terrible and the world will end.

You can have multiple H1s in a document. This is not something new to HTML5. You have always been able to have multiple H1s in a document—shock, horror. I’ve met working professionals on the web who have thought for years that it was in the specification that you could only have one H1 in a document. There has been a lot of SEO snake oil peddled on this subject. SEO has a lot of dogma attached to it. And by dogma, I mean belief without data. And the dogma has traditionally been like “Oh, if you ever put more than one H1 in the page you will die.” In HTML5, every time you open a new piece of sectioning content, section, article, aside, nav, whatever, you can begin with H1 again, rather than having to stick to whatever level you were at; H2, H3, whatever it was.

This is incredibly powerful. This could really revolutionise content management. The fact that now, you can literally think about your chunks of content as standalone content that can be taken out of context. Now, depending on the context within it finds itself on the page, this H1 would have the role of being an H2 or H3, depending on where it finds itself within the document. It can be a little weird to get your head around at first, but I actually think this is possibly the most powerful part of the new semantic elements in HTML5; that they’re literally standalone elements, and we get this whole new level of headings in there.

I could have a document with a section in it, that section could have a section within it, or an article, and sections in articles, articles in sections, sections in sections, articles in articles. And each one could have H1s to H6s all the way down so there’s like H-infinite at this point. But when you’re dealing with your content, or your content management system, they are standalone, properly standalone chunks of content now. And that’s very powerful.

And this is not some wacky idea that the working group came up with, or the W3C came up with recently. Here’s an email from Tim Berners-Lee from 1991, an email to Dan Connolly, he’s explaining about this HTML thing, and he says “You know …you know what I’d prefer, instead of H1, H2 and all that stuff, I’d like for it to be this nested section element, and a generic H element, so that we could just nest our levels within it.” Now we don’t have this generic H element, we’re still using H1 and H2—that’s because we’re still supporting the existing content—but we’re finally seeing this idea realised 20 years later.

Degrade gracefully

Here’s a principle that I’m sure that you’re all familiar with already, which is degrading gracefully. We’ve all being doing this for years anyway. With progressive enhancement we get graceful degradation for free.

One of my favourite examples of this principal in action in HTML5 is the way that forms have been enhanced in HTML5 using the type attribute. So this is a whole bunch of new values that you can give to type like number, search, range, all that stuff.

input type="number"
input type="search"
input type="range"
input type="email"
input type="date"
input type="url"

The great thing is what browsers do when they encounter this. Existing browsers, not future browsers, existing browsers that don’t understand this stuff, the way they degrade gracefully is whenever they see a type value they don’t understand, they just treat it like text.

You write input type=”foo”, input type=”bar”, and every browser out there will say, “Eh, he probably meant text.” So that means you can start using this stuff and be secure in the knowledge the browsers that don’t understand it will simply treat it like input type=”text”—a great example of browsers practicing graceful degradation.

So this type equals=”number”, for example. Suppose you have an input that requires a number. You could say the input type is number and then in a browser that understands that input type, you might get a nice control like this little spinner thing. Right? And then in a browser that doesn’t understand it, you just get a text input, which is what you would have used anyway. So why not just say, input type=”number” and you get that spinner for free?

You can have min and max and all that stuff in there as well, but it degrades gracefully. That’s the key issue.

Input type=”search”. You might as well start using it because, in a browser like Safari you get this nice Operating System level search control and a little X you can click away the search term with. And in all other browsers you just get a text input, as if it were input type=”text”, which is what you would have used anyway. So why not just say input type=”search”? It doesn’t hurt, right?

New attributes have been added as well. You’ve got this placeholder attribute you can throw in. You know this pattern, right? You’ve got some text that’s already inside the input. Not a label — a placeholder and a label are two very different things. It’s an example value that’s sort of greyed out. You click in there and it goes away. You click out and type again, it’ll come back.

We can hack this stuff together with JavaScript today, but now with HTML5 you just use a placeholder attribute.

And what you can do as well, if you still want to do it with JavaScript for browsers that don’t support this, is it’s pretty easy in JavaScript to test — does the browser understand the placeholder attribute? If it does, just step back, get out the way, don’t do anything. If it doesn’t, then include your JavaScript to mimic this functionality.

Now, I couldn’t talk about HTML5 without bringing up this subject—HTML5 versus Flash. You may have heard about this. You may have read about this somewhere. And I really don’t get it. I’m really puzzled by this whole battle that’s supposedly going on.

First of all, when people talk about HTML5 versus Flash, they’re not talking about HTML5 and they’re not talking about Flash. They are talking about a subset of HTML5 and a subset of Flash. Specifically, they’re talking about video. So whenever you hear, “HTML5 versus Flash” they probably mean HTML5 video versus Flash video.

And another thing, this framing of it as HTML5 versus Flash—like you have to choose: which side are you on? That’s just not the case. Because of the way that the spec has been designed, you can have your cake and eat it too.

So this is the way that it works with the new video element — a really nice element, really simple, elegant design. You have an opening video tag, a closing video tag, and in between you put your fallback content. Fallback content, not accessibility content, fallback content. So this is for browsers that don’t understand the video element.

<video src="movie.mp4">
<!-- fallback content -->
</video>

So what would you put, for example, for your fallback content? Well, you can put a Flash movie. You can have HTML5 video and Flash video. You don’t have to choose.

<video src="movie.mp4">
<object data="movie.swf">
<!-- fallback content -->
</object>
</video>

It’s not quite as simple as that, of course, because here I am sending in H264, and some browsers will understand that. Other browsers don’t understand that.

Don’t get me started on the whole formats thing, because I’ll get really upset. Not because of the technologies. I don’t care about the technologies, but the fact that patents and lawyers and intellectual property and other enemies of the web are getting in the way of me building better websites.

But actually what you have to do is — put your fallback in there as well — you can have different formats. And you can specify these different formats using the source element rather than the source attribute if you want.

<video>
<source src="movie.mp4">
<source src="movie.ogv">
<object data="movie.swf">
<a href="movie.mp4">download</a>
</object>
</video>

Here I’ve kind of got four different levels going on. I got:

  1. Okay, if the browser understands the video element and it understands H264, it gets that.
  2. If the browser understands the video element and it understands Ogg, then it gets the second one.
  3. If the browser doesn’t understand the video element, then I’ll try to give it Flash.
  4. If the browser doesn’t support the video element or Flash, then I give it just a download link.

It’s kind of like you’ve got Inception going on right here. You’ve got different levels that you can go down.

I think it’s a good idea to hedge your bets and serve up your video both ways. If you’re only serving video using the video element, you’re kind of shooting yourself in the foot, I think. But if you’re only serving your video with Flash, you’re kind of doing an equal disservice. You should probably do it both ways.

Why would you want to do this? Well, suppose you’ve got some kind of, I don’t know, handheld device that doesn’t support Flash and you want to make sure they get your content too, right?

The reason why it’s such a problem that you have to have these different formats and the reason why flash has been so successful with video and audio comes down to another design principle, and that’s the principle of Metcalfe’s Law:

The value of a network is proportional to the square of the number of connected users of the system.

He was talking about telephony networks when he came up with this law, but it applies to pretty much anything. The value of a network increases basically the more people are using the network. Everybody uses Facebook because everybody uses Facebook. It’s not that it’s inherently valuable but the fact that everyone’s on it is what makes it so valuable.

A good example of Metcalfe’s Law would be whomever bought the first fax machine. Why did they buy it? It was pretty useless. But as soon as other people started buying fax machines, then it starts to gain value.

So, when you’ve got competing formats and different ways of encoding things you’re not getting the benefit of Metcalfe’s Law and I get upset that we have to encode our video in different ways and I can’t just encode it in one way and send it to the browser. This is why Flash has been very successful in the video / audio areas. It’s that they can serve up one thing to the browser and have that supposedly work. Basically making use of Metcalfe’s Law.

Priority of constituencies

The final design principle I’ll leave with is my favourite, and there’s no code examples to show you for this one. This is much more philosophical and that’s the priority of constituencies.

This one’s really about working together. This solves that problem of when there’s an issue, when you’ve got a problem you’ve got to tackle and you’ve got the W3C wants to do it one way and the WHATWG wants to do it another way, and this person thinks it should be done this way and this person thinks it should be done another way …here’s a flag in the sand to say “here’s how we’ll solve the problem:”

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity.

Theoretical purity: that’s like building the best format we possibly can do. Specifiers are the working groups, W3C, etc. Implementors are browser makers. The authors; that’s us, right? Look how high up the chain we are! We’re second only to users—which is the way it should be right? The users come first. And that means our voice is very very important in this process.

Hixie has stated often that, as features are being proposed and debated for HTML5, if there were a browser manufacturer who said of a feature, “we would never support that feature, never implement that feature in our browser,” then that feature’s coming out of the spec. Because if you put that feature into the spec anyway then the spec will be fiction, right? Because the implementors are refusing to implement that.

Because, according to the priority of constituencies, we are higher up the chain than the implementors, if we have a problem with something in the spec—if we think, “We disagree with this; we will never implement what you’ve specified in our documents”, then equally it should come out of the spec, because our voice should carry more weight. I like that! This basically makes us very powerful, right? And I think that’s a good thing.

I think this is probably the most important design principle because it acknowledges you have to have design principles when you’re building a format, when you’re building software. It’s just the way that the world works. It might seem obvious: users over authors over implementors over specifiers, but if you think about other specifications like XHTML2, it was the complete opposite way around. Theoretical purity was the most important thing and the users—what with the draconian error handling—came very much at the end of that chain. I’m not saying that was wrong but I think that was a very different philosophical approach.

So I think that whatever you’re doing, whether it’s building a format like HTML5 or whether it’s building a website, whether it’s building a content management system, having design principles is really really useful.

Software, like all technologies, is inherently political. Code inevitably reflects the choices, biases and desires of its creators.

I’ll give you an example. The Drupal community got in touch with Mark Boulton and Leisa Reichelt to redesign the Drupal interface. They set out to come up with design principles. They didn’t just sit down and scribble them onto a piece of paper, they took a long time to boil it down to these four design principles that they were going to operate with:

  1. Make the most frequent tasks easy and less frequent tasks achievable.

  2. Design for the 80%.

  3. Privilege the Content Creator.

  4. Make the default settings smart.

And actually when I spoke to Mark about this it’s really those two: “design for the 80%” and “privilege the content creator.” And that’s good, taking a stand and saying “We’re going to value the content creator more than any other people in this project.” That’s an important thing to remember about design principles because a lot of the time the idea of the design principle is that you’re not going to please everyone. The whole point is we don’t aim to please everyone, we’re deciding who’s the most valuable. They decided the content creator was the most important thing.

This other principle, design for the 80%, this is a really common design principle. It’s a common pattern in nature. It’s called the Pareto principle.

Pareto was an Italian economist that noticed this ratio, the eighty/twenty ratio, that twenty percent of the population has eighty percent of the wealth. It’s mapped onto a power law distribution that shows up everywhere in nature. And it turns out that when you’re writing software or you’re building something, it’s kinda the same, that you can achieve with twenty percent of the effort you can hit eighty percent of the use cases. The final twenty percent of the use cases are going to probably require eighty percent more effort. So sometimes it’s a good design principle to say only design for that eighty percent, because we know you can do it with twenty percent of the effort.

Microformats, for example, very much use the Pareto principle in solving some use cases and not going to worry about some of the edge cases. They know they won’t please everyone; that’s not what they’re aiming to do. They’ve got lot of design principles, it’s really worth reading all of them but it’s captured in:

Design for humans first, machines second.

Again, it might seem obvious to you and to me but there are examples of other formats where it’s the other way around: where the ease of parsing for machines was more important than the ease of parsing to the users.

So I think it’s really good to look at these design principles other people got out there and whatever you’re doing, think about what the design principles are and nail them to the wall. Basically have a URL where you publish this stuff, like the Mozilla foundation has done, right, these are the design principles that are behind Mozilla:

The effectiveness of the Internet as a public resource depends upon interoperability (protocols, data formats, content), innovation and decentralised participation worldwide.

Transparent community-based processes promote participation, accountability, and trust.

I think it’s a really good thing to do. And there’s a design principle that it seems to me has driven a lot of really good projects. It’s driven the web itself and I think it’s certainly a philosophy that’s behind HTML5. And that’s a design principle you’ve definitely heard of and that’s:

Rough consensus and running code.

Right? It keeps cropping up over and over and for me, it kind of encapsulates what the web is about and captures where HTML5 is heading.

So this is probably one I would nail to the wall in my office and say this is the design principle of the web: rough consenus and running code.

Ok so I’m gonna leave it there. And if you have any comments about any of this you can contact me on twitter @adactio. I sometimes blog about this stuff on adactio.com and I’m going to give a shameless plug that yes, I have a book out.

Thank you very much for listening.

Questions

Oh, do we have time for questions? Oh, cool. I thought—I didn’t realise I had time. Who’s got questions? I love questions. Anybody? Don’t be shy. We have microphones, so …oh, there, okay. The microphone is coming to you, sir.

Audience member: If you’re using HTML 4.01, let’s say, are you allowed to use HTML5 elements in there? Or is it does it make sense, will browsers actually support it? Or should you really use HTML5 if you want to use HTML5 elements?

Jeremy: I would recommend using the doctype HTML rather than a legacy doctype like HTML 4.01. It’s not because of the browser support, because as I said, browsers will render whatever they can. I guess I need to define what I mean by “support” here.

With these new semantic elements—you know, section, article—you can just go ahead and use them in browsers and the browsers will go ahead and render them and style them fine …except for one browser that you’re all familiar with, but a little bit of javascript will help. A kick up the ass and Internet Explorer can understand them as well.

But it’s not so much that these browsers understand what these new elements are, it’s just that these browsers will let you write any element you want. You can write a ‘foo’ element and a ‘bar’ element and you can still go ahead and style them. The new version of Firefox will properly understand HTML5, will understand these elements.

So you can go ahead and use section or article in HTML 4, but then your doctype will be invalid. If you run it against a validator it will say “I have no idea what a ‘section’ is, an ‘article’ is”. But the browser won’t complain, it will parse it just fine. So it depends what you want to do.

So I would recommend changing the doctype as well, just so you get the nice green tick from the validator. You can use this stuff without changing your doctype, but I would recommend changing the doctype. In fact, if at the minimum you change the doctype, then you can start messing around with this stuff as you feel comfortable.

But here’s something else you can do if you’re still using HTML 4.01 and you want to get to grips with the semantics of these new elements. You don’t necessarily start having to use these new elements themselves, but you could take the names and use them as class names. This is what I’m doing now with a lot of client work. In terms of my personal work, I’m just going ahead and using section and article, nav, aside, header, footer, but in client work I’m like, “Okay, I better ease off a bit on this.” I will still use divs, but I’ll say div class="section", div class="header", div class="footer", so I’ll reuse the vocabulary with class names.

It’s kind of handy, because a lot of the time you’ll be handing off these documents to be turned over to some server side programmer that has to build the actual app, and I need to document why I chose these class names, and by using the class names from HTML5, I have existing documentation, which is the HTML 5 spec. I simply say “If you want to know what ‘section’ means, or ‘article’ means, here’s a URL that explains the semantics of it”. I’m not using the element, I’m using the class names, but the semantics are the important thing.

So that’s an idea, if you don’t want to dive into using these new elements, maybe just use the class names instead.

Do we have another question? Over here, we’ve got one. There’s a microphone coming your way.

Audience member: You touched on input types there. When you use an input type that’s not recognised by the browser and you interrogate the input type, it reports type text.

Jeremy: Yes, that’s correct.

Audience member: In CSS, when you style it with an attribute selector, it styles with the original type that you specified.

Jeremy: Excellent.

Audience member: What’s going on?

Jeremy: I guess there are different parsing rules for CSS and HTML. I don’t know is the answer to your question. It’s a good question. I don’t know, but I guess it’s different parsing models, the way that CSS would parse from the way HTML would parse.

But the fact that you can query with javascript, does the browser think this is input type="number", input type="range", is actually really really handy, because you can use the javascript fallback depending on whether it responds with its input type text or not. That’s actually pretty handy.

But I don’t know why CSS does things differently than HTML. I’m not a browser maker. Don’t we have browser makers in the room somewhere? So that might be a good question to ask them, why it works that way. But it’s a good question.

Anne Van Kesteren: I can answer it if you want.

Jeremy: I hear a voice from beyond.

Anne: The difference is that in javascript you use the IDL attribute, or the property, and you don’t actually query the actual attribute value, because you would usually getAttribute. You would actually get back, like, search instead of text.

Jeremy: Right, so that’s a good point. Actually, I’m going to do a quick bit of live coding. It’s going to go horribly wrong.

Ninety-nine times out of a hundred, there’s no difference between using getAttribute and using just the dot syntax, right? img.src …it makes no difference ninety-nine times out of a hundred whether you say img.src in javascript or whether you say img.getAttribute('src'). It’s going to be the same thing ninety-nine times out of a hundred. With the input types we can use the fact that in some cases it’s not quite the same. So let’s say we create a new input element…

Audience member: It’s not appearing on the screen, by the way.

Jeremy: Oh, shoot. Well, sorry about that. Looks like keynote …. Let me do… sorry. Sorry. Okay, thank you. Thank you for pointing that out.

So let’s say we create a new input type. Let’s see if I get this wrong. And now I say test.setAttribute, and I’m not saying set the test.type attribute, I’m saying set the test attribute to be whatever it is we’re testing, but let’s say the input type was range. And then we can query if test.type is equal to text, right? Then we know, time for some javascript fallback.

This code is probably horrible, I can’t believe I’m trying to do JavaScript with PPK looking at the screen.

But you get the idea. Here’s one of the few times that there’s a slight difference between getAttribute and the dot syntax. It’s important.

So that’s a good point, but if you were to query in javascript getAttribute, you would get the correct report, the same as with CSS, whereas what you’re querying is probably .type –- you’re going to get ‘text’. I think that would explain that reasonably well. Let me just do something…

So, any other questions? I hope that helped. Who’s got the microphone? I’m somewhat blind up here. You’ve got the microphone coming your way …and there we go.

Audience member: Hello

Jeremy: Hello

Audience member: When you want to start using HTML5 for your website, will it improve your position in the search engines?

Jeremy: I’ll tell you what improves your position in search engines, write good content! It’s crazy but it’ll work!

Audience member: That’s the first place, but…

Jeremy: Obviously, and you know what? Everything after that is secondary …it’s not secondary, it’s tertiary. It’s less than tertiary. The most important thing is you’ve got good content. Structure it well, obviously. The fact that you’re using headings at all in a document is good and the fact you’re using semantic markup is good but most important is good content. Now the fine details—whether this element or that element gets ranked more by Google—it changes every week anyway and I know earlier I was pointing out every section or article can start with H1—“Oh my God, what will Google do?” I will point out that Ian Hickson does actually work at Google so I wouldn’t worry too much about it, frankly.

Look, here’s my approach to SEO, if you’re thinking in terms of SEO (Search Engine Optimisation) you’re doing it wrong. You want to be thinking in terms of people optimisation because Google practices people optimisation. So, when someone enters a search term in Google, Google is now thinking “what is the best document, what’s the best resource I can give to this person who’s trying to find a result for this search?” and so it thinks in terms of what people want and when it’s analysing a document it’s thinking about “what do people want out of this document?”. So when you’re creating your documents instead of thinking about what does Google want—because Google is thinking about what people want—why don’t you just cut out the middle man and think “what do people want?” and if you do that, you’ll get good “SEO” anyway because Google is also practising this people optimisation, so cut out the middle man. Don’t think about Google, think about your content, think about your users. Think about using the best semantics available and by a happy coincidence, you will get good SEO from that.

As for the fine details, of which element is better than the other element, it doesn’t matter nearly as much as having relevant, well written, well structured content. Oh, what’s that excellent, excellent phrase you have in Dutch: miereneuker, right? Miereneuker: that’s what a lot of SEO is about.

I got to use that, that was awesome.

One more question, Who’s got a hands up? I can’t see a thing, so …run microphone, run!

Audience member: Hi, you mentioned that you don’t use HTML5 on your own websites for customers, but you’ve been talking about using it today, for about an hour now.

Jeremy: Yep.

Audience member: Why is that?

Jeremy: Well I’d use the doctype. Obviously, no problem using a doctype. I was talking specifically about the new semantic elements: section, article, nav, aside; I’d stick to using div class="section", div class="article". It depends on the client, but there’s a very very very small use case where they might not get the styles and that is, like i said, you can style these elements in every browser but to style them in Internet Explorer you have to use a little bit of JavaScript, right? You have to say… so for some bizarre reason you have to tell Internet Explorer that an element exists by saying document.createElement('section'). Now you can style them in Internet Explorer. It doesn’t make any sense, but that’s the way it works. And Remy sharp has written a nice little piece of code that encapsulates all of this, and it’s on Google Code so you can just point at it and jobs a good ‘un.

So, there’s a very very very small possibility of your audience using Internet Explorer (less than 9) and having JavaScript switched off in which case they then won’t get the styles that you apply to any of these new elements. So, for that very small use case, I’m a little bit more cautious about using the new semantic elements but I absolutely use HTML5 in my client sites because what I’ll do is, maybe I won’t use section, article, nav and aside , I’ll use div class="section", div class="article" and so on, but I will start using the new input types, I’ll use input type="search" because—why not? I will start using the placeholder attribute—why not? I will start using ARIA roles. Right? You can use ARIA roles in any version of HTML but in HTML5, it’s valid! Right? The role attribute is valid. So you can go ahead and start using ARIA roles in your document. Frankly, for that reason alone, it’s worth switching over to using the HTML5 doctype in my opinion.

So, you weren’t quite correct saying I’m not using HTML5 on client sites, I am using HTML5 on client sites, I’m not necessarily using the new structural elements and that’s because there’s that potential for a small portion of the audience to not get the styles for those new elements, if I’m using them as styling points; I don’t necessarily have to use them as styling points.

I think I’m out of time …okay, I’m out of time. Thank you.

Licence

This presentation is licenced under a Creative Commons attribution licence. You are free to:

Share
Copy, distribute and transmit this presentation.
Remix
Adapt the presentation.

Under the following conditions:

Attribution
You must attribute the presentation to Jeremy Keith.