Misunderstanding markup

The W3C announced last week that the XHTML 2 Working Group will wrap up at the end of this year. This should have been a straightforward, welcome announcement. Instead it has confused a lot of people who believe that it heralds the end of XHTML—see, for example, the comments on Zeldman’s blog post.

This confusion is understandable given the lamentable names that have been assigned to different technologies. This isn’t the first time this has happened…

JavaScript sounds like it has something to do with Java. It doesn’t. Apart from some superficial syntactical similarities, they have nothing in common. Java is to JavaScript as ham is to hamster.

DHTML sounds like it has something to do with HTML. It doesn’t. DHTML is a catch-all term to describe the action of updating the CSS properties of HTML elements using JavaScript. I have my own catch-all term for the combination of HTML, CSS, and JavaScript; I call it web development.

And so to XHTML 2. You’d be forgiven for thinking it has something to do with XHTML 1.0 or XHTML 1.1. It doesn’t.

XHTML 1.0 is simply a reformulation of HTML 4 with XML syntax:

  • lowercase tag and attribute names,
  • quoted attribute values,
  • mandatory closing tags for p and li elements,
  • a slash at the end of standalone elements like img, br, and meta.
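
As a rough illustration of those four rules, here is the same fragment written twice. The markup is made up for the purpose: the class name and image file are hypothetical.

```html
<!-- HTML 4 syntax: uppercase names, an unquoted value, omitted closing tags -->
<UL CLASS=nav>
  <LI>First item
  <LI>Second item
</UL>
<P>A paragraph with a line break<BR>
and an image: <IMG SRC="photo.jpg" ALT="A photo">

<!-- The same content in XHTML 1.0 syntax -->
<ul class="nav">
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>A paragraph with a line break<br />
and an image: <img src="photo.jpg" alt="A photo" /></p>
```

Both versions describe the same elements; only the spelling differs.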

XHTML 1.1 is the same reformulation but with the added unrealistic demand that documents must be served with an XML MIME type.

XHTML 2, by contrast, has had very, very little in common with HTML 4. It was an attempt at a fresh start, to create a theoretically “pure” vocabulary with little concern for backwards compatibility. It was, of course, doomed to failure:

This was a philosophically pure specification that was so backwardly incompatible that it nearly deprecated the img element.

Now that XHTML 2 is dead, some people think that this means XHTML is dead. It doesn’t.

Henri Sivonen apparently attempts to clear up this confusion by writing An Unofficial Q and A about the Discontinuation of the XHTML2 WG which, alas, is not very clearly written. Also, Henri, as pointed out by John Allsopp, your snark is showing:

There are two meanings to XHTML: technical and marketing. The technical kind (XHTML served using the application/xhtml+xml MIME type) is a formulation of HTML as an XML vocabulary. The marketing kind (XHTML served using the text/html MIME type) is processed just like HTML by browsers but the authors attempt to observe slightly different syntax rules in order to make it seem that they are doing something newer and shinier compared to HTML.

Belittling authors who prefer a stricter syntax is no way to explain technical differences between formats.

There are perfectly good reasons for choosing to use the XHTML syntax. Take, for example, Drew’s comment:

Whenever this argument surfaces, there seems to be the assumption that loose syntax is easier for beginners. This baffles me. In my experience simple, strict rules are much easier to learn and code to than loose rules with multiple shortcuts. I like XHTML because attributes must always be quoted. Tags must always be closed. These are simple rules that require no thought, and result in uniform, predictable markup.

I’m not saying that XHTML syntax is better or worse than HTML syntax. I’m saying it’s a personal choice. If you prefer a different syntax to me, that doesn’t mean that one of us is wrong. If I like Thai food and you prefer Italian, neither of us is wrong.

The death of XHTML 2 does not mean the death of XHTML syntax. If you want to continue to close all tags and quote all attributes, you can do so. You can either use the existing XHTML 1 spec or you can use HTML 5.

That’s right; HTML 5 allows you to use whichever syntax you are most comfortable with. Doctor Bruce has the diagnosis:

I like the XHTML syntax. It’s how I learned. I’m used to lowercase code, quoted attributes and trailing slashes on elements like br and img. They make me feel nice and comfy, like a cup of Ovaltine and The Evil Dead on the telly.

But you might not. You might want SHOUTY UPPERCASE tags, no trailing slashes and attribute minimisation. And, in HTML 5 you can choose.

Thanks to the “pave the cowpaths” principle, it’s up to you. As you like it. What you will. Whatever you want, whatever you like.

If you want, you can even serve your documents as application/xhtml+xml, instantly transforming them from HTML 5 into XHTML 5 …yes, another confusing name.
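
A minimal sketch of what such a document might look like, assuming your server is configured to send it with a Content-Type of application/xhtml+xml (the title and text are placeholders). The one addition worth noting is the xmlns attribute, which an XML parser needs in order to recognise the elements as XHTML:

```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Example</title>
</head>
<body>
  <p>Served as application/xhtml+xml, this is parsed as XML.</p>
</body>
</html>
```

Bear in mind that an XML parser is unforgiving: a single unclosed tag or unquoted attribute stops the page from rendering.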

Just remember, XHTML 2, the spec, has nothing to do with XHTML, the syntax. XHTML lives on in HTML 5.

But, but, but…, I hear you cry, surely that does us no good because HTML 5 isn’t supported yet, right?

Define support. HTML 5, unlike XHTML 2, is designed to be backwards compatible. So here’s how you can take an existing XHTML 1 document and convert it to HTML 5…

Take this line:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Replace it with:

<!DOCTYPE html>

Done.
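
For illustration, a whole page after the swap might look something like this. It is only a sketch with placeholder content (the image file is hypothetical); everything below the first line is the same XHTML-style markup as before, still served as text/html:

```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Example page</title>
</head>
<body>
  <p>Same lowercase tags, quoted attributes, and closing tags as before.<br />
  <img src="photo.jpg" alt="A photo" /></p>
</body>
</html>
```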

XHTML 2 is dead. Long live XHTML …as HTML 5.

Update: What Zeldman said.

Responses

Patrick Hamann

Well said!

# Posted by Patrick Hamann on Tuesday, July 7th, 2009 at 1:57pm

Rich Clark

Spot on with these comments. I’ve just commented on Zeldman’s post that you should choose the right doctype for the job, so it’s good to hear someone else thinking along the same lines.

# Posted by Rich Clark on Tuesday, July 7th, 2009 at 2:17pm

Pete

On the ball as usual

I definitely prefer the strictness as per XHTML spec, so I guess I’ll be using XHTML5 — HTML 5 with optional strict style markup would just be odd.

# Posted by Pete on Tuesday, July 7th, 2009 at 2:30pm

Rimantas

You can have the same strict syntax in HTML. And using HTML you don’t have to rely on the lack of proper SGML implementation in your browser. That browsers can digest your broken HTML (this is what XHTML document is for HTML processor) does not mean that you should feed them with it. Zeldman should really quit that marketing game and stick to standards.

# Posted by Rimantas on Tuesday, July 7th, 2009 at 3:11pm

Simon

Thank you, possibly the first post on the HTML5 / XHTML2 saga that has made some sense.

So where’s the panic?

# Posted by Simon on Tuesday, July 7th, 2009 at 3:36pm

Bryan Hoffman

Thanks for explaining this.

XHTML, like Doctor Bruce, is how I learned. Good to hear I can continue to do what I do, and that HTML5 will also allow me to continue to do what I do.

The confusion surrounding the ‘death’ of XHTML has had a "the sky is falling" vibe recently; thankfully, it seems, clear heads will prevail.

Philip Renich

Thank you for this writeup. Of the only small bit of reading on this subject I’ve done, you explained it very well and concisely. Looking forward to the future with more excitement now.

Egor Kloos

Sweet post. Personally I prefer the xHTML syntax, but I’m not going to drone on about it. It’s a bit like the whole toilet paper under or over preference.

# Posted by Egor Kloos on Tuesday, July 7th, 2009 at 4:22pm

rob cherny

Yeah. I don’t care what anybody says, but HTML’s traditional lax rules encourage badly formed code, and I believe that time will show that to be the truth.

I’ve been digging around for the life of me trying to find an absolute answer: They say the HTML5 spec allows trailing slashes on self-closing elements, but that they should be served as XML then. But is an HTML5 document with trailing slashes served as text/html invalid?

# Posted by rob cherny on Tuesday, July 7th, 2009 at 4:31pm

Scott McCracken

“XHTML 2 is dead. Long live XHTML …as HTML 5.”

A perfect summarization, thank you for taking the time to spell this out. I will push forward with the XHTML syntax in HTML 5.

Michael Jackson

Thanks so much for clearing the air a bit with this post Jeremy. An understanding of the relationships between the different specs is an extremely important piece of the conversation that has not been very prevalent until Zeldman announced that XHTML was dead, the comments started rolling in, and everybody who understands these things perfectly started rolling their eyes, disgusted at the amount of ignorance out there in the developer community.

Unfortunately, most developers that I’ve had the opportunity to work with only take time to get a very shallow understanding of the technologies they are working with, the core of course being HTML, probably because the consequences are not very severe if they don’t fully understand and implement the standard (most web clients are very forgiving). It’s perfectly reasonable to interpret the incremental version number (XHTML 2) as being an evolution of the XHTML 1 spec instead of a "fresh start". After all, that’s how it goes with software and people are used to that.

river

well done. this is one of the clearest explanations i’ve read so far. it has all seemed like a lot of chest-pounding and hair-pulling over something not very surprising or upsetting, so it’s good to hear some grownup thoughts on the matter from yourself, zeldman, and others. thanks for bringing things back down to earth.

# Posted by river on Tuesday, July 7th, 2009 at 5:25pm

Robert

Henri may have been being snarky, but the quote isn’t saying anything about "formats" from what I can tell. He’s saying that for most people who author web content, XHTML is just a buzz word. Most people serve "XHTML" with "Content-Type: text/html" which removes any benefits over HTML 4 (e.g. inclusion of other XML bits, like MathML, directly in the XHTML markup) and I don’t care to guess how many XHTML authors know how to configure a server to override the default of serving XHTML as HTML. To modify your analogy, it’s like pretending to be a Thai restaurant but only serving McDonald’s hamburgers.

The "requirement" for strict syntax is only enforced if XHTML is served as XML, and browsers parse XHTML served as text/html as HTML. The same ideological use of strict syntax people have for XHTML can be applied to HTML. Just because you can use loose syntax doesn’t mean you HAVE to use loose syntax, and it’s best if you don’t because (until HTML 5) error recovery (e.g. from unclosed paragraph tags) isn’t always consistent across browsers, since the W3C didn’t define how errors should be handled in HTML. The strict syntax in XHTML served as text/html is self-imposed just like HTML (because the browsers treat it like HTML).

The argument for my decision not to adopt XHTML boiled down to this: If you’re serving XHTML as malformed HTML (XHTML’s self-closing elements, for example, are treated as having invalid attributes until HTML 5 is implemented and they are treated as self-closing tags like in XML) and you can code HTML with strict syntax, why not just use HTML?

Serving XHTML as text/html is a stop gap that has lasted 9 years while web developers haven’t gotten to use the "X" in "XHTML." The promise in real XHTML is great, but it’s been held back by lack of support in IE (and probably the lack of web servers that check to see if a document is XHTML or HTML before they pick the Content-type to send to the browser for .html files). The real fate of XHTML, as far as I’m concerned, is whether a complete HTML 5 implementation will require real XHTML 5 support (i.e. XHTML served as XML), and whether the IE team plans to support XHTML 5. If HTML 5 can force IE to support real XHTML, I think XHTML can be revived as the best tool for most web designers’ needs. If not, I still see no reason to pick XHTML over HTML.

Thanks for explicitly mentioning that XHTML isn’t dead. Few of the well known web design celeb types are saying anything about XHTML 5, and posting comments saying that HTML 5 has an XHTML serialization doesn’t get the same attention.

# Posted by Robert on Tuesday, July 7th, 2009 at 6:07pm

Nicolethecoder

Thank you for writing this.

Unfortunately, seems like many people forget or aren’t aware that there really are only 4 big things that differ between HTML4 and XHTML1. There’s really no excuse not to do those 4 things.

Carol Dew

THANK you. I have read Jeffrey Zeldman’s post, and I’ve read the Unofficial Q&A by Henri Sivonen. After all the comments on Mr Zeldman’s post, I was completely confused. All I know is that I learned XHTML because my OU and University of Exeter distance learning courses said to use it. Then I read about the mime-type issue, so in my limited understanding, I thought maybe I should start writing HTML. Then when all this started about XHTML2, I’m like, for shit’s sake, all I want to do is write to standards, but if all these people who know so much more than me are arguing about it, what chance do I have of knowing which to use?

Your post answered my questions. Again, THANK you.

# Posted by Carol Dew on Tuesday, July 7th, 2009 at 7:14pm

Matthias

thanks for clearing this up! Now I can sleep again… ;-)

# Posted by Matthias on Tuesday, July 7th, 2009 at 8:49pm

David Mead

The more I read about this, the more I feel there’s a lot of wailing and gnashing of teeth over nothing.

I’m glad we (the web development community at large) are discussing this and are, for the most part, pretty happy about it. But I have to think we should be able to get over it pretty quickly.

I know of very few web sites that implemented XHTML correctly. I know of thousands of websites that are written using Frontpage etc. that don’t even use correct HTML.

I know personally the next site I build will be written in HTML5 using the skills I learned from using XHTML.

One ring to rule them all, and let that ring be HTML5.

# Posted by David Mead on Tuesday, July 7th, 2009 at 9:48pm

Aaron Burrows

"I have my own catch-all term for the combination of HTML, CSS, and JavaScript; I call it web development." - Well said.

Great post! I, for one, will continue to mark up my code with XHTML syntax. I can’t stand the sloppiness of "old" HTML, but I love the flexibility to do it either way in HTML5. I guess I fall into the category of developers who will be using X/HTML5 and I look forward to all the shiny newness that it will bring. Now, if we can just get them to move a tiny bit faster, so we don’t lose our standards 13 years from now.

Manuel

Hi, I’m new to this field and I would like to learn HTML.

If I understand correctly, most future web sites will be written in HTML5 (served using the text/html MIME type) and just a few sites will use XHTML5 (served using the application/xhtml+xml MIME type). As a consequence, I will focus on HTML5.

You say that HTML5 can use both the HTML syntax and the stricter XHTML syntax, but since I’m starting from scratch I suppose I could safely ignore the XHTML syntax and focus on the HTML syntax (HTML 4.01/5). Why bother about XHTML syntax?

Unfortunately all modern introductory books and online resources about HTML are focused on the XHTML syntax. Where could I learn the pure HTML syntax without any reference to the useless and distracting (for my purposes) XHTML syntax?

# Posted by Manuel on Tuesday, July 7th, 2009 at 10:56pm

Bilgehan

The confusion and misunderstanding are a result of the reality distortion field that XHTML supporters have created in recent years. XHTML served with text/html was always a lie that these developers wanted to believe.

# Posted by Bilgehan on Tuesday, July 7th, 2009 at 10:57pm

Dave Harrison

Great article Jeremy. I think this will clear things up for many people. HTML5 will be here and in common use within a couple of years in my view, forget 2022!! and so it is time to start getting familiar with it. Using it in its purest form will probably involve rethinking how we currently markup our pages (that is the point after all) but until then as you say it is very easy to serve the page with an HTML5 doctype. We have to start somewhere right? lol

Matt Barnes

This was so very concise and helpful. Thanks for cutting through all the confusion and clutter of these last few days.

CodeJoust

Very nice article. I also agree that stricter rules produce nicer, easier to read, and more uniform markup. That is most of the reason I use xhtml. xHTML is very backwards compatible (as with v.1), except if it is served as real xhtml. (Internet Explorer, I’m looking at you!) Otherwise, there is very little difference, and it will only further confuse people if there is a large divide between xHTML and HTML.

# Posted by CodeJoust on Wednesday, July 8th, 2009 at 12:10am

Damien Buckley

Thanks for clearing that up Jeremy and nice to read a balanced - emotion reduced account for the first time this week.

Robin

Thank you for the clear explanation.

# Posted by Robin on Wednesday, July 8th, 2009 at 1:17am

Jonty

Great explanation, especially the analogies. There is so much misinformation about this topic, so it’s great to see a pragmatic approach in this world of extremes.

# Posted by Jonty on Wednesday, July 8th, 2009 at 2:30am

JC John Sese Cuneta

Ah, great post, thanks for writing it and clearing it up for the majority, and you explained it better than most of the similar explanations I’ve read so far (including my own :p )

XHTML 2 is dead. Long live XHTML …as HTML 5.

Well said. ^_^

Andy Walpole

Good article - I’m glad you cleared that up.

I think most people have gotten used to and prefer the XHTML syntax now.

pingin

Thanks for this very clear explanation, Jeremy.

# Posted by pingin on Wednesday, July 8th, 2009 at 1:17pm

Dennis

Great article! And thanks for adding the Zeldman link. Also, I took your advice and simply changed the doctype on my XHTML Strict code, and it validated for HTML 5, hooray!

# Posted by Dennis on Wednesday, July 8th, 2009 at 10:53pm

Steven Clark

Well written. It’s interesting in Zeldman’s comments to see people like Rimantas come back out of the woodwork in this conversation with some kind of renewed vigour, as if their doomsayer obsessiveness about XHTML being wrong is somehow finally proven. That guy was accosting me years ago with the very same diatribe of techno-babble, quite unprovoked as well.

I think their obsessive ranting shows more about a lack of work-life balance than it does about the reality of web development. I like the way you have succinctly put each statement, by the way, so it will be interesting if they now come over to argue otherwise.

Yes, it’s a good thing. End of story. :)

Paul Morriss

I’m trying to understand the different points of view on this issue. Your post seems pretty much snark-free. Henri’s Q&A seemed clear to me, but I don’t understand all the issues.

It seems the point of contention is not HTML 5, but the usefulness of XHTML 1.1 over 1.0. Would you say that is fair?

vancouver web design

"Java is to JavaScript as ham is to hamster" - Apparently you’ve never had a hamster and bacon club sandwich.

Personally, I’m happy to hear that xhtml 2 is dead. It was needless in the first place. Too many people trying to put their mark on unnecessary standards gets us into needless messes.

Michel

You summed it so well, Jeremy! :-)

And I, for one, prefer writing XHTML syntax, too: it’s simpler for me, a bit more strict than HTML syntax, and works well in all browsers that can read & interpret HTML.

So I don’t understand where this "Didn’t we tell you, XHTML is dead?" thing has suddenly come from. I hear it here and there, all the time, especially since the W3C announced that XHTML won’t be developed anymore… :-/

XHTML 1.0 is a valid standard. XHTML 5 is developed. HTML 5 with XHTML syntax is perfectly valid, too. So, I don’t understand… :(