Tags: semantics

19

sparkline

Conversational interfaces

Psst… Jeremy! Right now you’re getting notified every time something is posted to Slack. That’s great at first, but now that activity is increasing you’ll probably prefer dialing that down.

Slackbot, 2015

What’s happening?

Twitter, 2009

Why does everyone always look at me? I know I’m a chalkboard and that’s my job, I just wish people would ask before staring at me. Sometimes I don’t have anything to say.

Existentialist chalkboard, 2007

I’m Little MOO - the bit of software that will be managing your order with us. It will shortly be sent to Big MOO, our print machine who will print it for you in the next few days. I’ll let you know when it’s done and on it’s way to you.

Little MOO, 2006

It looks like you’re writing a letter.

Clippy, 1997

Your quest is to find the Warlock’s treasure, hidden deep within a dungeon populated with a multitude of terrifying monsters. You will need courage, determination and a fair amount of luck if you are to survive all the traps and battles, and reach your goal — the innermost chambers of the Warlock’s domain.

The Warlock Of Firetop Mountain, 1982

Welcome to Adventure!! Would you like instructions?

Colossal Cave, 1976

I am a lead pencil—the ordinary wooden pencil familiar to all boys and girls and adults who can read and write.

I, Pencil, 1958

ÆLFRED MECH HET GEWYRCAN
Ælfred ordered me to be made

Ashmolean Museum, Oxford

The Ælfred Jewel, ~880

Technical note

I have marked up the protagonist of each conversation using the cite element. There is a long-running dispute over the use of this element. In HTML 4.01 it was perfectly fine to use cite to mark up a person being quoted. In the HTML Living Standard, usage has been narrowed:

The cite element represents the title of a work (e.g. a book, a paper, an essay, a poem, a score, a song, a script, a film, a TV show, a game, a sculpture, a painting, a theatre production, a play, an opera, a musical, an exhibition, a legal case report, a computer program, etc). This can be a work that is being quoted or referenced in detail (i.e. a citation), or it can just be a work that is mentioned in passing.

A person’s name is not the title of a work — even if people call that person a piece of work — and the element must therefore not be used to mark up people’s names.

I disagree.

In the examples above, it’s pretty clear that I, Pencil and Warlock Of Firetop Mountain are valid use cases for the cite element according to the HTML5 definition; they are titles of works. But what about Clippy or Little Moo or Slackbot? They’re not people …but they’re not exactly titles of works either.

If I were to mark up a dialogue between Eliza and a human being, should I only mark up Eliza’s remarks with cite? In text transcripts of conversations with Alexa, Siri, or Cortana, should only their side of the conversation get attributed as a source? Or should they also be written without the cite element because it must not be used to mark up people’s names …even though they are not people, according to conventional definition.

It’s downright botist.

A question of markup

Hi,

I’m really sorry it’s taken me so long to write back to you (over a month!)—I’m really crap at email.

I’m writing to you hoping you can help me make my colleagues take html5 “seriously”. They have read your book, they know it’s the “right” thing to do, but still they write !doctype HTML and then div, div, div, div, div…

Now, if you could provide me with some answers to their “why bother?- questions” would be really appreciated.

I have to be honest, I don’t think it’s worth spending lots of time agonising over what’s the right element to use for marking up a particular piece of content.

That said, I also think it’s lazy to just use divs and spans for everything, if a more appropriate element is available.

Paragraphs, lists, figures …these are all pretty straightforward and require almost no thought.

Deciding whether something is a section or an article, though …that’s another story. It’s not so clear. And I’m not sure it’s worth the effort. Frankly, a div might be just fine in most cases.

For example, can one assume that in the future we will be pulling content directly from websites and therefore it would be smart to tell this technology which content is the article, what are the navigation and so on?

There are some third-party tools (like Readability) that pay attention to the semantics of the elements you use, but the most important use-case is assistive technology. For tools such as screen readers, there’s a massive benefit to marking up headings, lists, and other straightforward elements, as well as some of the newer additions like nav and main.

But for many situations, a div is just fine. If you’re just grouping some stuff together that doesn’t have a thematic relation (for instance, you might be grouping them together to apply a particular style), then div works perfectly well. And if you’re marking up a piece of inline text and you’re not emphasising it, or otherwise differentiating it semantically, then a span is the right element to use.

So for most situations, I don’t think it’s worth overthinking the choice of HTML elements. A moment or two should be enough to decide which element is right. Any longer than that, and you might as well just use a div or span, and move on to other decisions.

But there’s one area where I think it’s worth spending a bit longer to decide on the right element, and that’s with forms.

When you’re marking up forms, it’s really worth making sure that you’re using the right element. Never use a span or a div if you’re just going to add style and behaviour to make it look and act like a button: use an actual button instead (not only is it the correct element to use, it’s going to save you a lot of work).

Likewise, if a piece of text is labelling a form control, don’t just use a span; use the label element. Again, this is not only the most meaningful element, but it will provide plenty of practical benefit, not only to screen readers, but to all browsers.

So when it comes to forms, it’s worth sweating the details of the markup. I think it’s also worth making sure that the major chunks of your pages are correctly marked up: navigation, headings. But beyond that, don’t spend too much brain energy deciding questions like “Is this a definition list? Or a regular list?” or perhaps “Is this an aside? Or is it a footer?” Choose something that works well enough (even if that’s a div) and move on.

But if your entire document is nothing but divs and spans then you’re probably going to end up making more work for yourself when it comes to the CSS and JavaScript that you apply.

There’s a bit of a contradiction to what I’m saying here.

On the one hand, I’m saying you should usually choose the most appropriate element available because it will save you work. In other words, it’s the lazy option. Be lazy!

On the other hand, I’m saying that it’s worth taking a little time to choose the most appropriate element instead of always using a div or a span. Don’t be lazy!

I guess what I’m saying is: find a good balance. Don’t agonise over choosing appropriate HTML elements, but don’t just use divs and spans either.

Hope that helps.

Hmmm… you know, I think I might publish this response on my blog.

Cheers,

Jeremy

The complexity of HTML

Baldur Bjarnason has written down his thoughts on why he thinks HTML is too complex.

Now we’re back to seeing almost the same level of complexity and messiness in most web pages as we saw in the worst days of table-hacking. The semantic elements from HTML5 are largely unused.

The reason for this, as outlined in an old email by Matthew Thomas, is down to the lack of any visible benefit from browsers:

unless there is an immediate visual or behavioural benefit to using an element, most people will ignore it.

That’s a fair point, but I think it works in the opposite direction too. I’ve seen plenty of designers and developers who are reluctant to use HTML elements because of the default browser styling. The legend element is one example. A more recent one is input type=”date”—until there’s a way for authors to have more say about the generated interface, there’s going to be resistance to its usage.

Anyway, Baldur’s complaint is that HTML is increasing in complexity by adding new elements—section, article, etc.—that provide theoretical semantic value, without providing immediate visible benefits. It’s much the same argument that informed Cory Doctorow’s Metacrap essay over a decade ago. It’s a strong argument.

There’s always a bit of a chicken-and-egg situation with any kind of language extension designed to provide data structure: why should authors use the new extensions if there’s no benefit? …and why should parsers (like search engines or browsers) bother doing anything with the new extensions if nobody’s using them?

Whether it’s microformats or new HTML structural elements, the vicious circle remains the same. The solution is to try to make the barrier to entry as low as possible for authors—the parsers/spiders/browsers will follow.

I have to say, I share some of Baldur’s concern. I’ve talked before about the confusion between section and article. Providing two new elements might seem better than providing just one, but in fact it just muddies the waters and confuses authors (in my experience).

But I realise that in the grand scheme of things, I’m nitpicking. I think HTML is in pretty good shape. Baldur said “simply put, HTML is a mess,” and he’s not wrong …but HTML has always been a mess. It’s the worst mess except for all the others that have been tried. When it comes to markup, I think that “perfect” is very much the enemy of “good” (just look at XHTML2).

When I was in San Francisco back in August, I had a good ol’ chat with Tantek about markup complexity. It started when he asked me what I thought of hgroup.

I actually found quite a few use cases for hgroup in my own work …but I could certainly see why it was a dodgy solution. The way that a wrapping hgroup could change the semantic value of an h2, or h3, or whatever …that’s pretty weird, right?

And then Tantek pointed out that there are a number of HTML elements that depend on their wrapper for meaning …and that’s a level of complexity that doesn’t sit well with HTML.

For example, a p is always a paragraph, and em is always emphasis …but li is only a list item if it’s wrapped in ul or ol, and tr is only a table row if it’s wrapped in a table.

(Interestingly, these are the very same elements that browsers will automatically adjust the DOM for—generating ul and table if the author doesn’t include it. It’s like the complexity is damage that the browsers have to route around.)

I had never thought of that before, but the idea has really stuck with me. Now it smells bad to me that it isn’t valid to have a figcaption unless it’s within a figure.

These context-dependent elements increase the learning curve of HTML, and that, in my own opinion, is not a good thing. I like to think that HTML should be easy to learn—and that the web would not have been a success if its lingua franca hadn’t been grokabble by mere mortals.

Tantek half-joked about writing HTML: The Good Parts. The more I think about it, the more I think it’s a pretty good idea. If nothing else, it could make us more sensitive to any proposed extensions to HTML that would increase its complexity without a corresponding increase in value.

The main issue

The inclusion of a main element in HTML has been the subject of debate for quite a while now. After Steve outlined the use cases, the element has now landed in the draft of the W3C spec. It isn’t in the WHATWG spec yet, but seeing as it is being shipped in browsers, it’s only a matter of time.

But I have some qualms about the implementation details so I fired off an email to the HTML working group with my concerns about the main element as it is current specced.

I’m curious as to why its use is specifically scoped to the body element rather than any kind of sectioning content:

Authors must not include more than one main element in a document.

I get the rationale behind the main element: it plugs a gap in the overlap between native semantics and ARIA landmark roles (namely role="main"). But if you look at the other elements that map to ARIA roles (header, footer, nav), those elements can be used multiple times in a single document by being scoped to their sectioning content ancestor.

Let’s say, for example, that I have a document like this, containing two header elements:

<body>
 <header>Page header</header>
 Page main content starts here
 <section>
  <header>Section header</header>
  Section main content
 </section>
 Page main content finishes here
</body>

…only the first header element (the one that’s scoped to the body element) will correspond to role="banner", right?

Similarly, in this example, only the final footer element will correspond to role=”contentinfo”:

<body>
 <header>Page header</header>
 Page main content starts here
 <section>
  <header>Section header</header>
  Section main content
  <footer>Section footer</footer>
 </section>
 Page main content finishes here
 <footer>Page footer</footer>
</body>

So what I don’t understand is why we can’t have the main element work the same way i.e. scope it to its sectioning content ancestor so that only the main element that is scoped to the body element will correspond to role="main":

<body>
 <header>Page header</header>
 <main>
  Page main content starts here
  <section>
   <header>Section header</header>
   <main>Section main content</main>
   <footer>Section footer</footer>
  </section>
  Page main content finishes here
 </main>
 <footer>Page footer</footer>
</body>

Here are the corresponding ARIA roles:

<body>
 <header>Page header</header> <!-- role="banner" -->
 <main>Page main content</main> <!-- role="main" -->
 <footer>Page footer</footer> <!-- role="contentinfo" -->
</body>

Why not allow the main element to exist within sectioning content (section, article, nav, aside) the same as header and footer?

<section>
 <header>Section header</header> <!-- no corresponding role -->
 <main>Section main content</main> <!-- no corresponding role -->
 <footer>Section footer</footer> <!-- no corresponding role -->
</section>

Deciding how to treat the main element would then be the same as header and footer. Here’s what the spec says about the scope of footers:

When the nearest ancestor sectioning content or sectioning root element is the body element, then it applies to the whole page.

I could easily imagine the same principle being applied to the main element. From an implementation viewpoint, this is already what browsers have to do with header and footer. Wouldn’t it be just as simple to apply the same parsing to the main element?

It seems a shame to introduce a new element that can only be used zero or one time in a document …I think the head and body elements are the only other elements with this restriction (and I believe the title element is the only element that is used precisely once per document).

It would be handy to be able to explicitly mark up the main content of an article or an aside—for exactly the same reasons that it would be useful to mark up the main content of a document.

So why not?

Figuring out

A recent simplequiz over on HTML5 Doctor threw up some interesting semantic issues. Although the figure element wasn’t the main focus of the article, a lot of the comments were concerned with how it could and should be used.

This is a subject that has been covered before on HTML5 Doctor. It turns out that we may have been too restrictive in our use of the element, mistaking some descriptive text in the spec for proscriptive instruction:

The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc, that are referred to from the main content of the document, but that could, without affecting the flow of the document, be moved away from that primary content, e.g. to the side of the page, to dedicated pages, or to an appendix.

Steve and Bruce have been campaigning on the HTML mailing list to get the wording updated and clarified.

Meanwhile, in an unrelated semantic question, there was another HTML5 Doctor article a while back about quoting and citing with blockquote and its ilk.

The two questions come together beautifully in a blog post on the newly-redesigned A List Apart documenting this pattern for associating quotations and authorship:

<figure>
 <blockquote>It is the unofficial force—the Baker Street irregulars.</blockquote>
 <figcaption>Sherlock Holmes, <cite>Sign of Four</cite></figcaption>
</figure>

Although, unsurprisingly, I still take issue with the decision in HTML5 not to allow the cite element to apply to people. As I’ve said before we don’t have to accept this restriction:

Join me in a campaign of civil disobedience against the unnecessarily restrictive, backwards-incompatible change to the cite element.

In which case, we get this nice little pattern combining figure, blockquote, cite, and the hCard microformat, like this:

<figure>
 <blockquote>It is the unofficial force—the Baker Street irregulars.</blockquote>
 <figcaption class="vcard"><cite class="fn">Sherlock Holmes</cite>, <cite>Sign of Four</cite></figcaption>
</figure>

Or like this:

<figure>
 <blockquote>Join me in a campaign of civil disobedience against the unnecessarily restrictive, backwards-incompatible change to the cite element.</blockquote>
 <figcaption class="vcard"><cite class="fn">Jeremy Keith</cite>, <a href="http://24ways.org/2009/incite-a-riot/"><cite>Incite A Riot</cite></a></figcaption>
</figure>

Pursuing semantic value

Divya Manian, one of the super-smart web warriors behind HTML5 Boilerplate, has published an article on Smashing Magazine called Our Pointless Pursuit Of Semantic Value.

I’m afraid I have to agree with Patrick’s comment when he says that the abrasive title, the confrontational tone and strawman arguments at the start of the article make it hard to get to the real message.

But if you can get past the blustery tone and get to the kernel of the article, it’s a fairly straightforward message: don’t get too hung up on semantics to the detriment of other important facets of web development. Divya clarifies this in a comment:

Amen, this is the message the article gets to. Not semantics are useless but its not worth worrying over minute detail on.

The specific example of divs and sectioning content is troublesome though. There is a difference between a div and a section or article (or aside or nav). I don’t just mean the semantic difference (a div conveys no meaning about the contained content whereas a section element is specifically for enclosing thematically-related content). There are also practical differences.

A section element will have an effect on the generated outline for a document (a div will not). The new outline algorithm in HTML5 will make life a lot easier for future assistive technology and searchbots (as other people mentioned in the comments) but it already has practical effects today in some browsers in their default styling.

Download the HTML document I’ve thrown up at https://gist.github.com/1360458 and open it in the latest version of Safari, Chrome or Firefox. You’ll notice that the same element (h1) will have different styling depending on whether it is within a div or within a section element (thanks to -moz-any and -webkit-any CSS declarations in the browser’s default stylesheets).

So that’s one illustration of the practical difference between div and section.

Now with that said, I somewhat concur with the conclusion of “when in doubt, just use a div”. I see far too many documents where every div has been swapped out for a section or an article or a nav or an aside. But my reason for coming to that conclusion is the polar opposite of Divya’s reasoning. Whereas Divya is saying there is effectively no difference between using a div and using sectioning content, the opposite is the case: it makes a big difference to the document’s outline. So if you use a section or article or aside or nav without realising the consequences, the results could be much worse than if you had simply used a div.

I also agree that there’s a balance to be struck in the native semantics of HTML. In many ways its power comes from the fact that it is a limited—but universally understood by browsers—set of semantics. If we had an element for every possible type of content, the language would be useless. Personally, I’m not convinced that we need a section element and an article element: the semantics of those two elements are so close as to be practically identical.

And that’s the reason why right now is exactly the time for web developers to be thinking about semantics. The specification is still being put together and our collective voice matters. If we want to have well-considered semantic elements in the language, we need to take the time to consider the effects of every new element that could potentially be used to structure our content.

So I will continue to stop and think when it comes to choosing elements and class names just as much as I would sweat the details of visual design or the punctation in my copy or the coding style of my JavaScript.

Timeless

When I first heard that Hixie had removed all traces of the time element from the ongoing HTML spec, my knee-jerk reaction was “This is a really bad idea!” But I decided not to jump in without first evaluating the arguments for and against the element’s removal. That’s what I’ve been doing over the past week and my considered response is:

This is a really bad idea!

The process by which the element was removed is quite disturbing:

  1. Hixie (as a contributer) opens a bug proposing that the time element be replaced with the more general data element.
  2. Lots of people respond, almost unanimously pointing out the problems with that proposal.
  3. Hixie (as the editor) goes ahead and does what exactly what he wanted anyway.

Technically that’s exactly how the WHATWG process works. The editor does whatever he wants:

This is not a consensus-based approach — there’s no guarantee that everyone will be happy! There is also no voting.

Most of the time, this works pretty well. It might not be fair but it seems to work more efficiently than the W3C’s consensus-based approach. But in this case the editor’s unilateral decision is fundamentally at odds with the most important HTML design principle, the priority of constituencies:

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors; which in turn should be given more weight than costs to implementors; which should be given more weight than costs to authors of the spec itself, which should be given more weight than those proposing changes for theoretical reasons alone.

The specifier (Hixie) is riding roughshod over the concerns of authors.

I’m particularly concerned by the uncharacteristically muddy thinking behind Hixie’s decision. There are two separate issues here:

  1. Is the time element useful?
  2. Do we need a more general data element?

Hixie conflates these two questions.

He begins by bizarrely making the claim that time hasn’t had much uptake. This is demonstrably false. It’s already shipping in Drupal builds and Wordpress templates as well as having parser support in services and at least one browser. If anything, time is one of the more commonly used and understood elements in HTML5.

There’s a very good reason for that: it fulfils a need that authors have had for a long time—the ability to make a timestamp that is human and machine-readable at the same time. That’s the use case that the microformats community has been trying to solve with the abbr design pattern and the value-class pattern. Those solutions are okay, but not nearly as elegant and intuitive as having a dedicated time element.

Crucially the time element didn’t just specify the mechanism for encoding a machine-readable timestamp, it also defined the format, namely a subset of ISO 8601. That’s exactly the kind of unambiguous documentation that makes a specification useful.

Hixie correctly points out that there are cases for human and machine-readable data other than dates and times. He incorrectly jumps to the conclusion that the time is therefore a failure.

I think he’s right that there probably should be a dedicated element for marking up this kind of data. We already have the meta element but the fact that it’s a standalone element makes it tricky to explicitly associate the human-readable text. So the introduction of a new data may very well turn out to be a good idea. But it does not need to be introduced at the expense of the more specific time element.

I think there’s a comparison to be made with sectioning content. We’ve got the generic section element but then we also have the more specific nav, aside and article elements that can be thought of specialised forms of section. By Hixie’s logic, there’s no reason to have nav, aside and article when a section element has the same effect on the outlining algorithm. But we have those elements because they cover very common use cases.

Hixie’s reductio ad absurdum argument is that if there is a special element for timestamps, we should also have an element for every other possible piece of machine-readable data. If that’s the case we should also have an infinite amount of sectioning content elements: blogpost, comment, chapter, etc. Instead we have just the four elements—section, article, aside and nav—because they represent the most common use cases …perhaps 80%?

The use case that the time element satisfies (human and machine-readable timestamps) is very, very common. The actual mechanism may vary (the time element itself, the abbr pattern, etc.) but the cow path is very much in need of paving.

There’s also a disturbingly Boolean trait to Hixie’s logic. He lumps all machine-readable data into the same bucket. If he had paid attention to Tantek’s research on abstracting microformat vocabularies, he would have seen that it’s a bit more fine-grained than that. There’s a difference between straightforward name/value pairs (like fn or summary), URL values (like photo) and timestamps (like dtstart). The thing that distinguishes timestamps is that they have an existing predefined machine-readable format.

To strengthen his position Hixie introduced two strawmen into the discussion, claiming that the time element was intended to allow easier styling of dates and times with CSS and to also allow conversion of HTML documents into Atom. Those use cases are completely tangential to the fundamental reason for the existence of the time element. Hixie seems to have forgotten that he himself once had it enshrined in the spec that the purpose of the element was to allow users to add events to their calendars. I pointed out that this was an example usage of the more general pattern of machine-readable dates and times so the text of the spec was updated accordingly:

This element is intended as a way to encode modern dates and times in a machine-readable way so that, for example, user agents can offer to add birthday reminders or scheduled events to the user’s calendar.

No mention of styling. No mention of converting documents to Atom.

I’m not the only one who is perplexed by Hixie’s bullheaded behaviour. Steve Faulkner has entered a revert request on the W3C side of things (he’s a braver man than me: the byzantine W3C process scares me off). Ben summed up the situation nicely on Twitter. You can find plenty of other reactions on Twitter by searching for the hashtag #occupyhtml5. Bruce has written down his thoughts and the follow-on comments are worth reading. This reaction from Stephanie is both heartbreaking and completely understandable:

I have been pretty frustrated by this change. In fact, when I read the entire thread on the debacle, it nearly made me want to give up on teaching, and using HTML5. What’s the point when there’s really no discussion? A single person brings something up. Great arguments are made. And then he just does what he originally wanted to do anyway.

I’d like to think that a concerted campaign could sway Hixie but I don’t hold out much hope. Usually there’s only one way to get through to him and that’s by presenting data. Rightly so. In this case however, Hixie is ignoring the data completely. He’s also wilfully violating the fundamental design principles behind HTML5.

So what can we do? Well, just as with the incorrectly-defined semantics of the cite element we can make a stand and simply carry on using the time element in our web pages. If we do, then we’ll see more parsers and browsers implementing support for the time element. The fact that our documentation has been ripped away makes this trickier but it’s such a demonstrably useful addition to HTML that we cannot afford to throw it away based on the faulty logic of one person.

Hixie once said:

The reality is that the browser vendors have the ultimate veto on everything in the spec, since if they don’t implement it, the spec is nothing but a work of fiction. So they have a lot of influence—I don’t want to be writing fiction, I want to be writing a spec that documents the actual behaviour of browsers.

Keep using the time element.

Citation needed

Over on the HTML5 Doctor site, Oli has written a great article called Quoting and citing with <blockquote>, <q>, <cite>, and the cite attribute.

Now, I still stand by my criticism of the way the cite element has been restrictively redefined in HTML5 such that it’s not supposed to be used for marking up a resource if that resource is a person. But I think that Oli has done a great job in setting out the counter-argument:

By better defining <cite>, we increase the odds of getting usable data from it, though we now need different methods to cover these other uses.

Oli’s article also delves into the blockquote element, which is defined in HTML5 as a sectioning root.

Don’t be fooled by the name: sectioning roots are very different to sectioning content in a fundamental way. Whereas sectioning content elements—section, article, nav and aside—are all about creating an explicit outline for the document from the headings contained within the sectioning content (using the new outline algorithm), the headings within sectioning roots (blockquote, td, fieldset, figure, etc.) don’t contribute to the document outline at all! But what sectioning roots and sectioning content have in common is that they both define the scope of the header and footer elements contained within them.

The footer element is defined as containing information about its section such as who wrote it, links to related documents, copyright data, and the like.

This gives a rise to rather lovely markup pattern that’s used on HTML5 Doctor: why not use the footer element within a blockquote to explicitly declare its provenance:

<blockquote>
<p>The people that designed .mobi were smoking crack.</p>
<footer>&mdash;<cite class="vcard">
<a class="fn url" href="http://tantek.com/">Tantek Çelik</a>
</cite></footer>
</blockquote>

(and yes, I am using the cite element to mark up a person’s name there).

Well, apparently that blockquote pattern is not allowed according to the spec:

Content inside a blockquote must be quoted from another source.

Because the content within the blockquote’s footer isn’t part of the quoted content, it shouldn’t be contained within the blockquote.

I think that’s a shame. So does Oli. He filed a bug. The bug was rejected with this comment:

If you want the spec to be changed, please provide rationale and reopen.

That’s exactly what Oli is doing. He has created a comprehensive document of block quote metadata from other resources: books, plays, style guides and so on.

Excellent work! That’s how you go about working towards a change in the spec—not with rhetoric, but with data.

That’s why my article complaining about the restrictions on the cite element is fairly pointless, but the wiki page that Tantek set up to document existing use cases is far more useful.

Landmark roles

David made a comment on Twitter about some markup he was working on:

Feels dirty setting id’s on main HTML5 page header and footer, but overriding inheritance they cause seems needlessly laborious.

I know the feeling. I don’t like using IDs at all, unless I want part of a document to be addressable through the fragment identifier portion of the URL. While I think it’s desirable to use the id attribute to create in-document permalinks, I don’t think it’s desirable to use the id attribute just as a styling hook. Its high specificity may seem a blessing but, in my experience, it quickly leads to duplicated CSS. IDs are often used as a substitute for understanding the cascade.

Nicole feels the same way about ID selectors, and she knows a thing or two about writing efficient CSS.

Back to David’s dilemma. Let’s say you’re targetting header and footer elements used throughout your document in sections, articles, etc. All you need to use is an element selector:

header {
// regular header styles
}

But what about the document-level header and footer? They’re probably styled very differently from the headings of sections and articles.

You could try using a child selector:

body > header

But there’s no guarantee that the document header will be a direct child of the body element. Hence the use of the id attribute—header id="ID":

header#ID {
// page header styles
}

There is another way. In HTML5 you can, for the first time, use ARIA roles and still have a valid document. ARIA landmark roles add an extra layer of semantics onto your document, distinguishing them as navigational landmarks for assistive technology.

Some of these landmark roles—like IDs—are to be used just once per document:

Within any document or application, the author SHOULD mark no more than one element with

That’s useful for two reasons. One, the existence of role="main" means it is not necessary for HTML5 to provide an element whose only semantic is to say “this is the main content of the document.”

A lot of people have asked for such an element in HTML5. “We’ve got header, footer and aside,” they say. “Why not maincontent too?”

But whereas header, footer and aside can and should be used multiple times per document, a maincontent element would, by necessity, only be used once per document. There are very few elements in HTML that are like that: the html element itself, the title element, head and body (contrary to popular opinion—likely spread by SEO witch-doctors—the h1 element can be used more than once per document).

Now the desired functionality of semantically expressing “this is the main content of the document” can be achieved, not with an element, but with an attribute value: role="main".

The rough skeleton of your document might look like this:

<header role="banner"></header>
<div role="main"></div>
<footer role="contentinfo"></footer>

Now you can see the second advantage of these one-off role values. You can use an attribute selector in your CSS to target those specific elements:

header[role="banner"] {
// page header styles
}

Voila! You can over-ride the default styling of headers and footers without resorting to the heavy-handed specificity of the ID selector.

And, of course, you get the accessibility benefits too. ARIA roles are supported in JAWS, NVDA and Voiceover

Article of doubt

A Day Apart in Seattle was more like a seminar than a workshop. Rather than being an intimate gathering in a small room, it was more lecture-like in an amphitheatre setting. But that didn’t stop me interacting with the attendees. There were plenty of great questions throughout, and I also had everyone complete an exercise.

I reprised the exercise I gave at dConstruct back in September. It isn’t a test of the audience. Rather, it’s a test of how well the new structural elements in HTML5 are described:

I then asked the attendees to match up the definitions with the element whose name sounded like the best match. To be clear: this wasn’t a test of knowledge. I was testing the spec.

The results from September’s test were quite revealing. There was some confusion between footer and details. Since then, the definitions in the spec have been updated and I’m happy to report that the Seattle audience—a much larger sampling—were almost unanimous in correctly matching element names to their definitions.

With one glaring exception.

The section and article elements were, once again, confused. This happened back in September at dConstruct. It happened again at A Day Apart in Seattle. I didn’t get exact numbers, but from the very web-savvy audience of about two hundred people, I would say there was a 50/50 split in matching up the definitions of section and article. About 50% of the attendees thought that the definition of section applied to article and visa-versa.

Historically, article and section were more distinct. The article element used to have optional cite and pubdate attributes. Now their content models are identical (apart from the fact that the article element can take an optional time element with a pubdate attribute).

The only thing that distinguishes the definition of article from the definition of section is the presence of the phrase self-contained. A section groups together thematically-related content. An article groups together self-contained thematically-related content. That distinction is too fine to warrant a separate element, in my opinion.

The existence of two elements that are practically semantically identical isn’t a harmless addition to HTML5. It’s causing a great deal of confusion. I’ve spoken to authors who incorrectly assumed that articles had to be within sections or that sections could only be within articles. The truth is that you can have sections within articles, articles within sections, sections within sections, articles within articles, or any other combination you can think of.

This isn’t helpful. Authors are confused. Yet, according to the HTML Design Principle of Priority of Constituencies:

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity.

I don’t understand why Hixie is still clinging to the addition of the article element when he has repeatedly stated that he wants to keep the number of new elements to a minimum. Here’s the perfect opportunity: merge section and article into one element. Personally, I would keep section, with its more generic-sounding name.

We’ve been here before. The abbr and acronym elements were responsible for years of confusion amongst authors unsure of which one to use. The use-cases and the definitions of both elements were just too similar. That particular problem has been solved in HTML5: the acronym element is now obsolete. The abbr element works well enough for both use cases.

Let’s not repeat the mistake of abbr and acronym with article and section.

HTML5 watch

Keeping up with HTML5 can seem like a full-time job if you’re subscribed to both the W3C public-html list and the WHATWG mailing list.

If you have to choose just one, the WHATWG list is definitely the red pill. The W3C list has a very high volume of traffic, mostly about politics and procedure. Sam Ruby deserves a medal for keeping the whole thing on an even keel.

The WHATWG list, on the other hand, can get pretty nitty-gritty in its discussions of Web Workers, Offline Storage and other technologies that are completely over my head.

The specification itself is shaping up nicely. My list of bugbears is getting shorter and shorter:

  1. I’m still not convinced that the article element is necessary, given that it is almost indistinguishable from section. Having two very similar elements is potentially very confusing for authors. It’s hard enough deciding the difference between a section and a div.
  2. The time element is still unnecessarily restrictive. I don’t just mean that it’s restrictive in the sense that you can’t mark up a month, the very definition is too narrow. I hoisted the HTML5 spec by its own petard recently, pointing out that a different portion of the spec violates the definition of time.
  3. The cite element is also too restrictively defined, and in a backwards-incompatible way to boot. I’ve written more about that over on 24 Ways.

There are much bigger issues than these still outstanding—mostly related to the accessibility of audio, video and canvas—but I’ll leave it to smarter people than me to tackle those. My issues all revolve around semantics and, let’s face it, they’re kind of piddling little problems in the grand scheme of things.

On the whole, I’d say the spec is looking mighty fine. Most of it is ready for use today.

I think the next big challenge for HTML5 lies with the tools. It’s great that we’ve got a validator but what we really need is —something like JSlint but for checking markup writing style: case sensitivity of tags, quotes around attributes, that kind of thing. Robert Nyman concurs.

Let be clear: I’m not talking about a validator that checks for polyglot documents i.e. HTML that can also be parsed as XML. I’m talking purely about writing style and personal preference; a tool that will help enforce an in-house style guide of arbitrary “best” practices.

I’ve impressed this upon Henri in IRC on a few occasions. He has explained to me that it’s not so easy to build a true syntax checker …and no, you can’t just use regular expressions.

Still, I think that there would be enormous value in having even an imperfect tool to help authors who want to write HTML5 right now but also want to enforce a strict syntax on themselves. A working rough’n’ready lint tool that catches 80% of the most common gotchas is better than a theoretical perfect tool that will work 100% of the time but that currently works 0% of the time because it doesn’t exist yet.

Anybody want to step up to challenge?

The devil in the details

Looking through the list of hiccups highlighted by the HTML5 Super Friends and my own personal tally, things are progressing at a nice clip with HTML5.

That still leaves a few issues:

  • The confusion between section and article that I’ve been researching.
  • The restrictive content model of the small element not matching that element’s updated semantics.
  • The time element not allowing month or year dates i.e. YYYY-MM or YYYY.

Then there’s the issue with details and figure. The insistence on recycling the legend element leads to all sorts of problems with browsers today, as described by Remy.

This has been an ongoing discussion in the HTML5 IRC channel and on the HTML5 mailing lists. It flared up again recently and I fired off an email to the HTML Working Group yesterday:

I understand the aversion to introducing a new element … but I don’t understand why legend is being treated as the only possible existing element to recycle.

For example, dt and dd are being recycled in the new context of dialog so they no longer mean “definition title” and “definition description”. Now they can also mean (presumably) “dialog title” and “dialog description”.

If those elements are already being recycled, why not apply the same thinking to details so that dt and dd could also mean “details title” and “details description”?

To be honest, I was just spouting that out without really thinking. How about… something like—not this, obviously, not this, but what if…

Ok… Not This…

So imagine my surprise when Hixie responded:

That’s not a bad idea actually. Ok, done.

Wow. Okay.

While I was at it, I also did this for figure

Alright. Makes sense, I suppose …although the names of the elements dt and dd aren’t quite as intuitive in the context of figure as they are in details.

and removed dialog from the spec altogether.

Er …okay. Seems somewhat unrelated to the issue at hand but I guess that the dialog element has been a point of contention. It’s mentioned by the HTML5 Super Friends so that’s another one that we can tick off the list.

So how should we mark up conversations now? Here’s what the spec now suggests:

Authors who need to mark the speaker for styling purposes are encouraged to use span or b.

Um… what? The b element? Really? You have got to be kidding.

Excuse me while I channel Carrie’s mother, but I can predict exactly how authors are going to react to this advice: They’re all going to laugh at you!

@gcarothers: Jaw, floor, WHAT?! http://bit.ly/48Bhta Why in the world would #html5 suggest using <b> tags to markup names?

@akamike: There are more appropriate tags than <b> for marking up names in conversations. “The b element should be used as a last resort…” #html5

@cssquirrel: Well, if the <p><b> recommendations for dialog in #html5 persist for a week, I know what I’m drawing.

Perhaps now would be a good time to mention that the cite element is also listed by the HTML5 Super Friends. Call me crazy but I think it might just be the right element for marking up the person being cited.

<p> <cite>Costello</cite>: <q>Who's playing first?</q> </p>
<p> <cite>Abbott</cite>: <q>That's right.</q> </p>

HTML5 test results

As promised, I’ve gathered the data from one of the exercises I administered at the dConstruct HTML5 and CSS3 workshop and totted the results up in a table.

There were 30 people in the workshop but I only managed to retrieve 22 results—I don’t know what happened to the missing eight sheets of answers. This is a smaller sampling than I was hoping for and I realise that it’s too small to be considered scientifically accurate but I think it’s still interesting to see the responses of 22 smart, savvy web developers.

Across the top (in the table header) are the possible answers; nine new elements in the HTML5 spec. Each row shows the answers given for each element as workshop attendees attempted to match up the names of the elements with the nine definitions provided from the spec. The most common answer in each case has been highlighted.

HTML5 Workshop Results
article aside details figure footer header hgroup nav section
article 5 0 2 1 1 1 0 1 9
aside 1 11 1 3 2 0 0 0 2
details 1 6 8 1 4 0 0 0 0
figure 3 2 2 14 1 0 0 0 0
footer 0 0 10 0 10 1 0 0 1
header 0 0 0 2 0 16 0 4 0
hgroup 0 0 0 0 0 1 21 0 0
nav 0 0 1 0 0 0 0 19 2
section 12 0 0 0 1 1 1 0 7

Ideally, the table would show a nice uniform graph: a solid diagonal line from the top left left to the bottom right. The diagonal line is there but it isn’t exactly uniform.

For the record, here are the element names together with their correct definitions. I’ve included a little sparkline with each one to show the distribution of answers. A unanimous result would show one clear spike. Wobbly uneven sparklines indicate a corresponding uncertainty in the answers.

article

…a section of a page that consists of a composition that forms an independent part of a document, page, or site.

aside

…represents a section of a page that consists of content that is tangentially related to the content around it, and which could be considered separate from that content.

details

…additional information or controls which the user can obtain on demand.

figure

…some flow content, optionally with a caption, that is self-contained and is typically referenced as a single unit from the main flow of the document.

footer

…typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like.

header

…a group of introductory or navigational aids.

hgroup

…used to group a set of h1–h6 elements when the heading has multiple levels, such as subheadings, alternative titles, or taglines.

nav

…a section of a page that links to other pages or to parts within the page: a section with navigation links.

section

…a thematic grouping of content, typically with a heading, possibly with a footer.

There are some interesting results here; some of them surprising, some of them expected.

The most easily identified element is hgroup. That’s also the only element name that isn’t taken from an existing English word. I don’t think that’s a coincidence. When the name of an element is created specifically to describe what that element does, it’s no surprise that the name and the definition are tightly coupled.

I was expecting to see more confusion between aside and figure. Both elements serve a similar purpose and in my opinion, the names could be swapped around without changing the definitions in the spec. But the data doesn’t support my hypothesis.

The confusion around footer surprised me. I would have expected to see more confusion between footer and aside than between footer and details.

The most glaring problem lies with section and article. This doesn’t surprise me. At this point, section and article are practically indistinguishable. For a while there, article had some more optional attributes—cite and pubdate—but those have now been removed. Now section and article have the same content model and appear to cover much the same kind of content:

…a thematic grouping of content, typically with a heading, possibly with a footer.

…a section of a page that consists of a composition that forms an independent part of a document, page, or site.

Witness the recent article on HTML5 Doctor wherein Bruce attempts to clear up the confusion and points out where the HTML5 Doctor site has been getting it wrong. If it’s tough for those smart guys to figure out when to use section, article or div, it doesn’t bode well for the rest of us.

Choosing the right element would be easier if there were some clear rules like “You can only use sections within articles” or “You can only use articles within a section” but as it is, you can nest sections in articles and articles in a section. With that much rope, authors are likely to hang themselves, overdosing on the semantic freedom.

I don’t think there needs to be a section element and an article element. I don’t have a particularly strong opinion about which one should stay and which one should go but a little trimming is definitely in order.

Now, if you’ll excuse me, I’m off to present this data to the list.

Microformation

It’s been a busy week for microformats.

Google announced that it was following in the footsteps of in indexing microformats and RDFa to display in search results. For now, it’s a subset of microformats— and —on a subset of websites, including the newly microformated Yelp. The list of approved sites will increase over time so if you’re already publishing structured contact and review information, let Google know about it.

But the other new announcement is equally important. After a lot of hard work, the is ready for use.

The what now? I hear you ask. Well, if you’ve been feeling hampered by the combination of the and design patterns, the value class pattern offers a few alternatives.

To my mind, that’s one of the greatest strengths of the value class pattern: it doesn’t offer one alternative, it allows authors to choose how they want to mark up their content. I think that’s one of the reasons why datetime values have proven to be such a sticking point up ‘till now. Concerns about semantics and accessibility really come down to the fact that, as an author, you had very little choice in how you could mark up a datetime value.

You could either present the datetime between the opening and closing tags of whatever element you were using:

<span class="dtstart">
 2009-06-05T20:00:00
</span>

…or you could put the value in the title attribute of the abbr element:

<abbr class="dtstart" title="2009-06-05T20:00:00">
 Friday, June 5th at 8pm
</abbr>

Those were your only options.

But now, with the value class pattern, all of the following are possible:

<span class="dtstart">
 <abbr class="value" title="2009-06-05">
  Friday, June 5th
 </abbr>
 at
 <abbr class="value" title="20:00">
  8pm
 </abbr>
</span>

<span class="dtstart">
 <span class="value-title" title="2009-06-05T20:00:00">
  Friday, June 5th at 8pm
 </span>
</span>

<span class="dtstart">
 <span class="value-title" title="2009-06-05">
  Friday, June 5th
 </span>
 at
 <span class="value-title" title="20:00">
  8pm
 </span>
</span>

<span class="dtstart">
 <span class="value-title" title="2009-06-05T20:00:00"> </span>
 Friday, June 5th at 8pm
</span>

Personally, I’ll probably use the first example. I like the idea of splitting up the date and time portions of a datetime value. I think there’s a big difference between putting a date string into the title attribute of an abbr element and putting a datetime string in there. In the past, when I argued that having an ISO date value in an abbreviation was semantic, accessible and internationalised, Mike Davies rightly accused me of using a strawman—the issue wasn’t about dates, it was about datetimes. That’s why I created the page on the microformats wiki; to disambiguate it as a subset of the larger .

Now, others might think that even using dates in combination with the abbr design pattern is semantically dodgy. That’s fine. They now have some other options they can use, thanks to the value-title subset of the value class pattern. Me? I don’t see myself using that. I’m especially not keen on the option to use an empty element. But I’m perfectly happy for other authors to go ahead and use that option. When it comes to writing, there are often no right or wrong answers, just personal preferences. That’s true whether it’s English, HTML, or any other language. As long as you use correct syntax and grammar, the details are up to you. You can choose semicolons or em-dashes when you’re writing English. You can choose abbr or value-title when you’re writing microformats.

The wiki page for the value class pattern doesn’t just list the options available to authors. It also explains them. That’s just as important. Head over there and read the document. I think you’ll agree that it’s an excellent example of clear, methodical writing.

The microformats wiki needs more pages like that. One of the biggest challenges facing microformats isn’t any particular technical problem; it’s trying to explain to willing HTML authors how to get up and running with microformats. Given Google’s recent announcement, there’ll probably be even more eager authors showing up, looking to sprinkle some extra semantics into their markup. We’ll be hanging out in the IRC channel, ready to answer any questions people might have, but I wish the wiki were laid out in a more self-explanatory way.

In the face of that challenge, the page for the value class pattern leads by example. Ben and Tantek have done a fantastic job. And it wouldn’t have been possible without the help and support of Bruce, James and Derek, those magnanimous giants of the accessibility community who offered help, support and data.

Semantic brevity

When I write here at adactio.com, I often sprinkle in some microformats. As I wrote in Natural Language hCard, I’ve developed a sense of smell for microformats:

Once I started looking for it, I started seeing identity and event information in lots of places… even when it doesn’t explicitly look like cards or calendars.

If I’m linking to somebody using their full formated name, then it’s a no-brainer that I’ll turn that into an hCard:

<span class="vcard">
<a class="fn url" href="http://example.com/">
Joe Bloggs
</a>
</span>

But what if I don’t want want to use the full name? It would sound somewhat stilted if I wrote:

I was chatting with Richard Rutter the other day…

When you work alongside someone every day, it sounds downright weird to always refer to them by their full name. It’s much more natural for me to write:

I was chatting with Richard the other day…

I would still make his name a hyperlink but what can I do about making this text into an hCard? Should I change my writing style and refer to everyone by their full formated name even if the context and writing style would favour just using their first name?

Enter the abbr element:

ABBR: Indicates an abbreviated form

I can write “Richard” in my body text and use the semantics of (X)HTML to indicate that this is the abbreviated form of “Richard Rutter”:

<abbr title="Richard Rutter">
Richard
</abbr>

From there, it’s a simple step to providing an hCard containing the formated name without compromising the flow of my text:

<span class="vcard">
<a class="url" href="http://clagnut.com/">
<abbr class="fn" title="Richard Rutter">
Richard
</abbr>
</a>
</span>

Now a parser will have to do some extra legwork to find the formated name within the title attribute of the abbr element rather than in the text between the opening and closing tags of whatever element has a class of “fn”. But that’s okay. That’s all part of the microformats philosophy:

Designed for humans first and machines second

Specifically, humans who publish first, machines that parse second.

If I were to link off to Richard’s site from here, I’d also combine my microformats: hCard + XFN:

<span class="vcard">
<a class="url" rel="friend met co-worker" href="http://clagnut.com/">
<abbr class="fn" title="Richard Rutter">
Richard
</abbr>
</a>
</span>

Now I’ve got a bounty of semantic richness:

All of that in one word of one clause of one sentence:

I was chatting with Richard the other day…

The language of accessibility

While I was in Berlin for the BIENE awards, I found myself thinking a lot about language and meaning… as one does when one is in a foreign country.

First of all, I think I may have inadvertently insulted my fellow jury members, and just about everyone else I came into contact with, by continuously using the familiar, rather than the polite form. I never could figure out when to use “du” and when to use “sie”, so I’ve always just stuck with “du.”

Secondly, I was thinking about the German word being used to describe accessibility: “Barrierefreiheit”, literally “free from obstacles.” It’s a good word, but because it’s describes websites by what they don’t contain (obstacles), it leads to a different way of thinking about the topic.

In English, it’s relatively easy to qualify the word “accessible.” We can talk about sites being “quite accessible”, “fairly accessible”, or “very accessible”. But if you define accessibility as a lack of obstacles, then as long as a single obstacle remains in place it’s hard to use the word “barrierefrei” as an adjective. The term is too binary; black or white; yes or no.

Thinking further along these lines, I realised that English is not without its problems in this regard. Consider for a minute the term “making a website accessible.” There’s an implication there that accessibility is something that needs to be actively added, something that requires an expenditure of energy and therefore money.

The BIENE awards ceremony began with some words of wisdom from Johnny Haeusler, the German of equivalent of Jason Kottke and Tom Coates rolled into one:

In der realen Welt steht fast immer die Frage im Mittelpunkt, was wir tun müssen, um Menschen mit Behinderung zu integrieren. Im Internet müssen wir umdenken und fragen, was wir tun müssen, um niemanden auszuschließen – und dabei spielt die Barrierefreiheit eine zentrale Rolle.

In the real world, we’re almost always asking what we can do to include handicapped people. On the Internet, we need to rethink this question and ask what we can do so that we don’t shut anybody out—and accessibility plays a central role in that.

This highlights a really important point: good markup is accessible by default. As long as you’re using HTML elements in a semantically meaningful way—which you should be doing anyway, without even thinking about accessibility—then your documents will be accessible to begin with. It’s only through other additions—visual presentation, behaviour, etc.—that accessibility is removed.

Far from being something that is added to a site, accessibility is something we need to ensure isn’t removed. From that perspective, the phrase “making a site accessible” isn’t accurate.

Just as “progressive enhancement” sounds better than “graceful degradation”, talking about accessibility as something that needs to be added onto a website isn’t doing us any favours. Accessibility is not a plug-in. It’s not something that can be bolted onto a site after the fact. So here’s what I’m proposing:

From now on, instead of talking about making a site accessible, I’m going to talk about keeping a site accessible.

Join me.

Museum piece

I had a thoroughly enjoyable time at the Semantic Web Think Tank this week. Most of the other attendees were there representing museums and—geek that I am—I found it thrilling to be able to chat with people from such venerable institutions as the V&A Museum, the Science Museum and the Natural History Museum.

Apart from one focused effort to explain microformats, my contributions were pretty rambling affairs. Still, I thought I’d try to put together a list of things I mentioned while they’re still fresh in my mind.

There was a lot of talk about tagging and folksonomies, including a great demo of the user-contributed content at the Powerhouse Museum in Sydney. I was chatting with Glenda later on and she pointed me to the steve.museum project, which is described as the first experiment in social tagging of art museum collections. That sounds like exactly what we were talking about at the think tank.

There was also plenty of talk about (uppercase) Semantic Web technologies. Museums have to deal with classifications far beyond the reach of microformats—with the glaring exception of events, which are crying out to be marked up in hCalendar. I don’t envy them the task of classifying and publishing all their data, but I remain convinced that user contributions (a la Wikipedia) can help enormously.

Thinktanking

I’m sitting in on a Semantic Web Think Tank today which is conveniently happening right here in Brighton (in the same building as the new Clearleft office). I’m really enjoying it so far. I’m just taking some time out to blog this and demonstrate hCalendar at the same time.

Natural language hCard

Drew has written up the process of adding hCard to Vitamin. To view the results in action, try visiting Vitamin using Firefox with the Tails extension installed.

Drew explains what prompted the conversion:

Something struck me about the site as soon as I saw all the list of advisors. It’s the perfect candidate for exuberant use of the hCard microformat, so I dropped Ryan Carson a line and made the suggestion.

I recently had a similar satori. On every page of this site, there’s a little explanatory blurb in the sidebar:

Adactio is the online home of Jeremy Keith, a web developer living and working in Brighton, England.

This used to be contained in a simple paragraph element:

<p>Adactio is the online home of Jeremy Keith, a web developer living and working in Brighton, England.</p>

I noticed that this little paragraph contained:

  • my full name,
  • my profession,
  • my town and
  • my country.

<p>Adactio is the online home of Jeremy Keith, a web developer living and working in Brighton, England.</p>

This information is easily encoded in hCard:

<p class="vcard">Adactio is the online home of <span class="fn">Jeremy Keith</span>, a <span class="title">web developer</span> living and working in <span class="adr"><span class="locality">Brighton</span>, <span class="country-name">England</span></span>.</p>

Bam! Instant semantic richness.

I could have stopped there but I decided to go a little further and add a URL and email address:

<p class="vcard"><a class="url" href="http://adactio.com/">Adactio</a> is the online home of <a class="email fn" href="mailto:jeremy@adactio.com">Jeremy Keith</a>, a <span class="title">web developer</span> living and working in <span class="adr"><span class="locality">Brighton</span>, <span class="country-name">England</span></span>.</p>

This got me thinking about other places where microformats can be insterspersed in the flow of natural English sentences. I updated the author page over on the DOM Scripting site (there’s already an hCalendar of events). Speaker bios on conference websites would be another prime candidate.

I borrowed an idea from Andy Hume and started marking up comments as hCards on DOM Scripting and Principia Gastronomica. Once I started looking for it, I started seeing identity and event information in lots of places… even when it doesn’t explicitly look like cards or calendars.

The next time I’m writing or marking up some copy, I intend to sniff it for names, contact details, dates, etc. I hope to develop a nose for microformats.