Tags: data

Metadata markup

When something on your website is shared on Twitter or Facebook, you probably want a nice preview to appear with it, right?

For Twitter, you can use Twitter cards—a collection of meta elements you place in the head of your document.

For Facebook, you can use the grandiosely-titled Open Graph protocol—a collection of meta elements you place in the head of your document.

What’s that you say? They sound awfully similar? Why, no! I mean, just look at the difference. Here’s how you’d mark up a blog post for Twitter:

<meta name="twitter:url" content="https://adactio.com/journal/9881">
<meta name="twitter:title" content="Metadata markup">
<meta name="twitter:description" content="So many standards to choose from.">
<meta name="twitter:image" content="https://adactio.com/icon.png">

Whereas here’s how you’d mark up the same blog post for Facebook:

<meta property="og:url" content="https://adactio.com/journal/9881">
<meta property="og:title" content="Metadata markup">
<meta property="og:description" content="So many standards to choose from.">
<meta property="og:image" content="https://adactio.com/icon.png">

See? Completely different.

Okay, I’ll attempt to dial down my sarcasm, but I find this wastage annoying. It adds unnecessary complexity, which in turn, I suspect, puts a lot of people off even trying to implement this stuff. In short: 927.

We’ve seen this kind of waste before. I remember when Netscape and Microsoft were battling it out in the browser wars: Internet Explorer added a proprietary acronym element, while Netscape added the abbr element. They both basically did the same thing. For years, Internet Explorer refused to implement the abbr element out of sheer spite.

A more recent example of the negative effects of competing standards was on display at this year’s Edge conference in London. In a session on front-end data, Nolan Lawson decried the fact that developers weren’t making more use of the client-side storage options available in browsers today. After all, there are so many to choose from: LocalStorage, WebSQL, IndexedDB…

(Hint: if developers aren’t showing much enthusiasm for the latest and greatest API which is sooooo much better than the previous APIs they were also encouraged to use at the time, perhaps their reticence is understandable.)

Anyway, back to metacrap.

Matt has written a guide to what you need to do in order to get a preview of your posts to appear in Slack. Fortunately the answer is not yet another collection of meta elements to place in the head of your document. Instead, Slack piggybacks on the existing combatants: oEmbed, Twitter Cards, and Open Graph.

So to placate both Twitter and Facebook (with Slack thrown in for good measure), your metadata markup is supposed to look something like this:

<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@adactio">
<meta name="twitter:url" content="https://adactio.com/journal/9881">
<meta name="twitter:title" content="Metadata markup">
<meta name="twitter:description" content="So many standards to choose from.">
<meta name="twitter:image" content="https://adactio.com/icon.png">
<meta property="og:url" content="https://adactio.com/journal/9881">
<meta property="og:title" content="Metadata markup">
<meta property="og:description" content="So many standards to choose from.">
<meta property="og:image" content="https://adactio.com/icon.png">

There are two things on display here: redundancy, and also, redundancy.

Now the eagle-eyed amongst you will have spotted a crucial difference between the Twitter metacrap and the Facebook metacrap. The Twitter metacrap uses the name attribute on the meta element, whereas the Facebook metacrap uses the property attribute. Technically, there is no property attribute in HTML—it’s an RDFa thing. But the fact that they’re using two different attributes means that we can squish the meta elements together like this:

<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@adactio">
<meta name="twitter:url" property="og:url" content="https://adactio.com/journal/9881">
<meta name="twitter:title" property="og:title" content="Metadata markup">
<meta name="twitter:description" property="og:description" content="So many standards to choose from.">
<meta name="twitter:image" property="og:image" content="https://adactio.com/icon.png">

There. I saved you at least a little bit of typing.

The metacrap situation is even more ridiculous for “add to homescreen”/“pin to start”/whatever else browser makers can’t agree on…

Microsoft:

<meta name="msapplication-starturl" content="https://adactio.com" />
<meta name="msapplication-window" content="width=800;height=600">
<meta name="msapplication-tooltip" content="Kill me now...">

Apple:

<link rel="apple-touch-icon" href="https://adactio.com/icon.png">

(Repeat four or five times with different variations of icon sizes, and be sure to create icons with new sizes after every. single. Apple. keynote.)

Fortunately Google, Opera, and Mozilla appear to be converging on using an external manifest file:

<link rel="manifest" href="https://adactio.com/manifest.json">

Perhaps our long national nightmare of balkanised metacrap is finally coming to an end, and clearer heads will prevail.

Hope

Cennydd points to an article by Ev Williams about the pendulum swing between open and closed technology stacks, and how that pendulum doesn’t always swing back towards openness. Cennydd writes:

We often hear the idea that “open platforms always win in the end”. I’d like that: the implicit values of the web speak to my own. But I don’t see clear evidence of this inevitable supremacy, only beliefs and proclamations.

It’s true. I catch myself saying things like “I believe the open web will win out.” Statements like that worry my inner empiricist. Faith-based outlooks scare me, and rightly so. I like being able to back up my claims with data.

Only time will tell what data emerges about the eventual fate of the web, open or closed. But we can look to previous technologies and draw comparisons. That’s exactly what Tim Wu did in his book The Master Switch and Jonathan Zittrain did in The Future Of The Internet—And How To Stop It. Both make for uncomfortable reading because they challenge my belief. Wu points to radio and television as examples of systems that began as egalitarian decentralised tools that became locked down over time in ever-constricting cycles. Cennydd adds:

I’d argue this becomes something of a one-way valve: once systems become closed, profit potential tends to grow, and profit is a heavy entropy to reverse.

Of course there is always the possibility that this time is different. It may well be that fundamental architectural decisions in the design of the internet and the workings of the web mean that this particular technology has an inherent bias towards openness. There is some data to support this (and it’s an appealing thought), but again, only time will tell. For now it’s just one more supposition.

The real question—when confronted with uncomfortable ideas that challenge what you’d like to believe is true—is what do you do about it? Do you look for evidence to support your beliefs or do you discard your beliefs entirely? That second option looks like the most logical course of action, and it’s certainly one that I would endorse if there were proven facts to be acknowledged (like gravity, evolution, or vaccination). But I worry about mistaking an argument that is still being discussed for an argument that has already been decided.

When I wrote about the dangers of apparently self-evident truisms, I said:

These statements aren’t true. But they are repeated so often, as if they were truisms, that we run the risk of believing them and thus, fulfilling their promise.

That’s my fear. Only time will tell whether the closed or open forces will win the battle for the soul of the internet. But if we believe that centralised, proprietary, capitalistic forces are inherently unstoppable, then our belief will help make them so.

I hope that openness will prevail. Hope sounds like such a wishy-washy word, like “faith” or “belief”, but it carries with it a seed of resistance. Hope, faith, and belief all carry connotations of optimism, but where faith and belief sound passive, even downright complacent, hope carries the promise of action.

Margaret Atwood was asked about the futility of having hope in the face of climate change. She responded:

If we abandon hope, we’re cooked. If we rely on nothing but hope, we’re cooked. So I would say judicious hope is necessary.

Judicious hope. I like that. It feels like a good phrase to balance empiricism with optimism; data with faith.

The alternative is to give up. And if we give up too soon, we bring into being the very endgame we feared.

Cennydd finishes:

Ultimately, I vote for whichever technology most enriches humanity. If that’s the web, great. A closed OS? Sure, so long as it’s a fair value exchange, genuinely beneficial to company and user alike.

This is where we differ. Today’s fair value exchange is tomorrow’s monopoly, just as today’s revolutionary is tomorrow’s tyrant. I will fight against that future.

To side with whatever’s best for the end user sounds like an eminently sensible metric to judge a technology. But I’ve written before about where that mindset can lead us. I can easily imagine Asimov’s three laws of robotics rewritten to reflect the ethos of user-centred design, especially that first and most important principle:

A robot may not injure a human being or, through inaction, allow a human being to come to harm.

…rephrased as:

A product or interface may not injure a user or, through inaction, allow a user to come to harm.

Whether the technology driving the system behind that interface is open or closed doesn’t come into it. What matters is the interaction.

But in his later years Asimov revealed the zeroth law, overriding even the first:

A robot may not harm humanity, or, by inaction, allow humanity to come to harm.

It may sound grandiose to apply this thinking to the trivial interfaces we’re building with today’s technologies, but I think it’s important to keep drilling down and asking uncomfortable questions (even if they challenge our beliefs).

That’s why I think openness matters. It isn’t enough to use whatever technology works right now to deliver the best user experience. If that short-term gain comes with a long-term price tag for our society, it’s not worth it.

I would much rather have an imperfect open system than a perfect proprietary one.

I have hope in an open web …judicious hope.

August in America, day twelve

Today was a travel day, but it was a short travel day: the flight from Tucson to San Diego takes just an hour. It took longer to make the drive up from Sierra Vista to Tucson airport.

And what a lovely little airport it is. When we showed up, we were literally the only people checking in and the only people going through security. After security is a calm oasis, free of the distracting TV screens that plague most other airports. Also, it has free WiFi, which was most welcome. I’m relying on WiFi, not 3G, to go online on this trip.

I’ve got my iPhone with me but I didn’t do anything to guarantee myself a good data plan while I’m here in the States. Honestly, it’s not that hard to not always be connected to the internet. Here are a few things I’ve learned along the way:

  1. To avoid accidentally using data and getting charged through the nose for it, you can go into the settings of your iPhone and under General -> Cellular, you can switch “Cellular Data” to “off”. Like it says, “Turn off cellular data to restrict all data to Wi-Fi, including email, web browsing, and push notifications.”
  2. If you do that, and you normally use iMessage, make sure to switch iMessage off. Otherwise if someone with an iPhone in the States sends you an SMS, you won’t get it until the next time you connect to a WiFi network. I learned this the hard way: it happened to me twice on this trip before I realised what was going on.
  3. I use Google Maps rather than Apple Maps. It turns out you can get offline maps on iOS (something that’s been available on Android for quite some time). Open the Google Maps app while you’re still connected to a WiFi network; navigate so that the area you want to save is on the screen; type “ok maps” into the search bar; now that map is saved and zoomable for offline browsing.

Battle for the planet of the APIs

Back in 2006, I gave a talk at dConstruct called The Joy Of API. It basically involved me geeking out for 45 minutes about how much fun you could have with APIs. This was the era of the mashup—taking data from different sources and scrunching them together to make something new and interesting. It was a good time to be a geek.

Anil Dash did an excellent job of describing that time period in his post The Web We Lost. It’s well worth a read—and his talk at the Berkman Institute is well worth a listen. He described what the situation was like with APIs:

Five years ago, if you wanted to show content from one site or app on your own site or app, you could use a simple, documented format to do so, without requiring a business-development deal or contractual agreement between the sites. Thus, user experiences weren’t subject to the vagaries of the political battles between different companies, but instead were consistently based on the extensible architecture of the web itself.

Times have changed. These days, instead of seeing themselves as part of a wider web, online services see themselves as standalone entities.

So what happened?

Facebook happened.

I don’t mean that Facebook is the root of all evil. If anything, Facebook—a service that started out being based on exclusivity—has become more open over time. That’s the cause of many of its scandals: the mismatch in mental models that Facebook users have built up about how their data will be used versus Facebook’s plans to make that data more available.

No, I’m talking about Facebook as a role model; the template upon which new startups shape themselves.

In the web’s early days, AOL offered an alternative. “You don’t need that wild, chaotic lawless web”, it proclaimed. “We’ve got everything you need right here within our walled garden.”

Of course it didn’t work out for AOL. That proposition just didn’t scale, just like Yahoo’s initial model of maintaining a directory of websites just didn’t scale. The web grew so fast (and was so damn interesting) that no single company could possibly hope to compete with it. So companies stopped trying to compete with it. Instead they, quite rightly, saw themselves as being part of the web. That meant that they didn’t try to do everything. Instead, you built a service that did one thing really well—sharing photos, managing links, blogging—and if you needed to provide your users with some extra functionality, you used the best service available for that, usually through someone else’s API …just as you provided your API to them.

Then Facebook began to grow and grow. I remember the first time someone was showing me Facebook—it was Tantek of all people—I remember asking “But what is it for?” After all, Flickr was for photos, Delicious was for links, Dopplr was for travel. Facebook was for …everything …and nothing.

I just didn’t get it. It seemed crazy that a social network could grow so big just by offering …well, a big social network.

But it did grow. And grow. And grow. And suddenly the AOL business model didn’t seem so crazy anymore. It seemed ahead of its time.

Once Facebook had proven that it was possible to be the one-stop-shop for your user’s every need, that became the model to emulate. Startups stopped seeing themselves as just one part of a bigger web. Now they wanted to be the only service that their users would ever need …just like Facebook.

Seen from that perspective, the open flow of information via APIs—allowing data to flow porously between services—no longer seemed like such a good idea.

Not only have APIs been shut down—see, for example, Google’s shutdown of their Social Graph API—but even the simplest forms of representing structured data have been slashed and burned.

Twitter and Flickr used to mark up their user profile pages with microformats. Your profile page would be marked up with hCard and, if you had a link back to your own site, it would include a rel=”me” attribute. Not any more.

Then there’s RSS.

During the Q&A of that 2006 dConstruct talk, somebody asked me about where they should start with providing an API; what’s the baseline? I pointed out that if they were already providing RSS feeds, they already had a kind of simple, read-only API.

Because there’s a standardised format—a list of items, each with a timestamp, a title, a description (maybe), and a link—once you can parse one RSS feed, you can parse them all. It’s kind of remarkable how many mashups can be created simply by using RSS. I remember at the first London Hackday, one of my favourite mashups simply took an RSS feed of the weather forecast for London and combined it with the RSS feed of upcoming ISS flypasts. The result: a Twitter bot that only tweeted when the International Space Station was overhead and the sky was clear. Brilliant!
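Just to give a flavour of how low the barrier is, here’s a rough sketch of that kind of parsing using nothing but the DOM. The feed URL is just a stand-in (and you’ll need a feed that allows cross-origin requests):

// Rough sketch: fetch an RSS 2.0 feed and pull out the bits that matter.
function parseRSS(feedURL, callback) {
    fetch(feedURL)
        .then(function (response) { return response.text(); })
        .then(function (text) {
            var doc = new DOMParser().parseFromString(text, 'text/xml');
            // Every RSS item has the same shape: title, link, description, pubDate.
            var items = Array.prototype.map.call(doc.getElementsByTagName('item'), function (item) {
                var get = function (name) {
                    var element = item.getElementsByTagName(name)[0];
                    return element ? element.textContent : '';
                };
                return {
                    title: get('title'),
                    link: get('link'),
                    description: get('description'),
                    pubDate: get('pubDate')
                };
            });
            callback(items);
        });
}

// Once you can parse one feed, you can parse them all.
parseRSS('https://example.com/feed.rss', function (items) {
    items.forEach(function (item) {
        console.log(item.pubDate + ': ' + item.title + ' (' + item.link + ')');
    });
});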

Back then, anywhere you found a web page that listed a series of items, you’d expect to find a corresponding RSS feed: blog posts, uploaded photos, status updates, anything really.

That has changed.

Twitter used to provide an RSS feed that corresponded to my HTML timeline. Then they changed the URL of the RSS feed to make it part of the API (and therefore subject to the terms of use of the API). Then they removed RSS feeds entirely.

On the Salter Cane site, I want to display our band’s latest tweets. I used to be able to do that by just grabbing the corresponding RSS feed. Now I’d have to use the API, which is a lot more complex, involving all sorts of authentication gubbins. Even then, according to the terms of use, I wouldn’t be able to display my tweets the way I want to. Yes, how I want to display my own data on my own site is now dictated by Twitter.

Thanks to Jo Brodie I found an alternative service called Twitter RSS that gives me the RSS feed I need, ‘though it’s probably only a matter of time before that gets shut down by Twitter.

Jo’s feelings about Twitter’s anti-RSS policy mirror my own:

I feel a pang of disappointment at the fact that it was really quite easy to use if you knew little about coding, and now it might be a bit harder to do what you easily did before.

That’s the thing. It’s not like RSS is a great format—it isn’t. But it’s just good enough and just versatile enough to enable non-programmers to make something cool. In that respect, it’s kind of like HTML.

The official line from Twitter is that RSS is “infrequently used today.” That’s the same justification that Google has given for shutting down Google Reader. It reminds me of the joke about the shopkeeper responding to a request for something with “Oh, we don’t stock that—there’s no call for it. It’s funny though, you’re the fifth person to ask today.”

RSS is used a lot …but much of the usage is invisible:

RSS is plumbing. It’s used all over the place but you don’t notice it.

That’s from Brent Simmons, who penned a love letter to RSS:

If you subscribe to any podcasts, you use RSS. Flipboard and Twitter are RSS readers, even if it’s not obvious and they do other things besides.

He points out the many strengths of RSS, including its decentralisation:

It’s anti-monopolist. By design it creates a level playing field.

How foolish of us, therefore, that we ended up using Google Reader exclusively to power all our RSS consumption. We took something that was inherently decentralised and we locked it up into one provider. And now that provider is going to screw us over.

I hope we won’t make that mistake again. Because, believe me, RSS is far from dead just because Google and Twitter are threatened by it.

In a post called The True Web, Robin Sloan reiterates the strength of RSS:

It will dip and diminish, but will RSS ever go away? Nah. One of RSS’s weaknesses in its early days—its chaotic decentralized weirdness—has become, in its dotage, a surprising strength. RSS doesn’t route through a single leviathan’s servers. It lacks a kill switch.

I can understand why that power could be seen as a threat if what you are trying to do is force your users to consume their own data only the way that you see fit (and all in the name of “user experience”, I’m sure).

Returning to Anil’s description of the web we lost:

We get a generation of entrepreneurs encouraged to make more narrow-minded, web-hostile products like these because it continues to make a small number of wealthy people even more wealthy, instead of letting lots of people build innovative new opportunities for themselves on top of the web itself.

I think that the presence or absence of an RSS feed (whether I actually use it or not) is a good litmus test for how a service treats my data.

It might be that RSS is the canary in the coal mine for my data on the web.

If those services don’t trust me enough to give me an RSS feed, why should I trust them with my data?

Canvas sparklines

I like sparklines a lot. Tufte describes a sparkline as:

…a small, intense, simple, word-sized graphic with typographic resolution.

Four years ago, I added sparklines to Huffduffer using Google’s chart API. That API comes in two flavours: a JavaScript API for client-side creation of graphs, and image charts for server-side rendering of charts as PNGs.

The image API is really useful: there’s no reliance on JavaScript, it works in every browser capable of displaying images, and it’s really flexible and customisable. Therefore it is, of course, being deprecated.

The death warrant for Google image charts sets the execution date for 2015. Time to start looking for an alternative.

I couldn’t find a direct equivalent to the functionality that Google provides, i.e. generating the images dynamically on the server. There are, however, plenty of client-side alternatives, many of them using canvas.

Most of the implementations I found were a little heavy-handed for my taste: they either required jQuery or Processing or both. I just wanted a quick little script for generating sparklines from a dataset of numbers. So I wrote my own.

I’ve put my code up on Github as Canvas Sparkline.

Here’s the JavaScript. You create a canvas element with the dimensions you want for the sparkline, then pass the ID of that element (along with your dataset) into the sparkline function:

sparkline ('canvasID', [12, 18, 13, 12, 11, 15, 17, 20, 15, 12, 8, 7, 9, 11], true);

(that final Boolean value just indicates whether you want a red dot at the end of the sparkline).

The script takes care of normalising the values, so it doesn’t matter how many numbers are in the dataset or whether the range of the numbers is in the tens, hundreds, thousands, or hundreds of thousands.
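The normalisation isn’t rocket science. Stripped right down (this is a simplified sketch rather than the actual Canvas Sparkline code), it amounts to something like this:

// Simplified sketch: scale any dataset to fit the dimensions of the canvas.
function drawSparkline(canvasID, data) {
    var canvas = document.getElementById(canvasID);
    var context = canvas.getContext('2d');
    var max = Math.max.apply(null, data);
    var min = Math.min.apply(null, data);
    var range = (max - min) || 1; // avoid dividing by zero for a flat dataset
    var stepX = canvas.width / (data.length - 1);
    context.strokeStyle = 'rgba(0, 0, 0, 0.5)'; // 50% transparent black
    context.beginPath();
    data.forEach(function (value, index) {
        // Normalise each value to a y co-ordinate (y is 0 at the top of the canvas).
        var x = index * stepX;
        var y = canvas.height - ((value - min) / range) * canvas.height;
        if (index === 0) {
            context.moveTo(x, y);
        } else {
            context.lineTo(x, y);
        }
    });
    context.stroke();
}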

There’s plenty of room for improvement:

  • The colour of the sparkline is hardcoded (50% transparent black) but it could be passed in as a value.
  • All the values should probably be passed in as an array of options rather than individual parameters.

Feel free to fork, adapt, and improve.

The sparklines are working quite nicely, but I can’t help but feel that this isn’t the right tool for the job. Ideally, I’d like to keep using a server-side solution like Google’s image charts. But if I am going to use a client-side solution, I’m not sure that canvas is the right element. This should really be SVG: canvas is great for dynamic images and animations that need to update quite quickly, but sparklines are generally pretty static. If anyone fancies making a lightweight SVG solution for sparklines, that would be lovely.

In the meantime, you can see Canvas Sparkline in action on the member profiles at The Session, like here, here, here, or here.

Update: Ask and thou shalt receive. Check out this fantastic lightweight SVG solution from Stuart—bloody brilliant!

Generating placeholders from datalists

Here’s a cute little markup pattern for ya.

Suppose you’ve got an input element that has—by means of a list attribute—an associated datalist. Here’s the example I used in HTML5 For Web Designers:

<label for="homeworld">Your home planet</label>
<input type="text" name="homeworld" id="homeworld" list="planets">
<datalist id="planets">
 <option value="Mercury">
 <option value="Venus">
 <option value="Earth">
 <option value="Mars">
 <option value="Jupiter">
 <option value="Saturn">
 <option value="Uranus">
 <option value="Neptune">
</datalist>

That results in a combo-box control in supporting browsers: as you type in the text field, you are presented with a subset of the options in the datalist that match what you are typing. It’s more powerful than a regular select, because you aren’t limited by the list of options: you’re free to type something that isn’t in the list (like, say, “Pluto”).

I’ve already written about the design of datalist and how you can use a combination of select and input using the same markup to be backward-compatible. I like datalist.

I also like the placeholder attribute. Another recent addition to HTML, this allows you to show an example of the kind of content you’d like the user to enter (note: this is not the same as a label).

It struck me recently that all the options in a datalist are perfectly good candidates for placeholder text. In the example above, I could update the input element to include:

<input type="text" name="homeworld" id="homeworld" list="planets" placeholder="Mars">

or:

<input type="text" name="homeworld" id="homeworld" list="planets" placeholder="Saturn">

I wrote a little piece of JavaScript to do this:

  1. Loop through all the input elements that have a list attribute.
  2. Find the corresponding datalist element (its ID will match the list attribute).
  3. Pick a random option element from that datalist.
  4. Set the placeholder value of the input to that option value.
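The gist of it looks something like this (a rough sketch of the idea rather than the script itself):

// Give every input that has a list attribute a placeholder
// picked at random from its associated datalist.
var inputs = document.querySelectorAll('input[list]');
Array.prototype.forEach.call(inputs, function (input) {
    var datalist = document.getElementById(input.getAttribute('list'));
    if (!datalist) { return; }
    var options = datalist.getElementsByTagName('option');
    if (!options.length) { return; }
    var option = options[Math.floor(Math.random() * options.length)];
    var value = option.getAttribute('value') || option.textContent;
    input.setAttribute('placeholder', value);
});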

Put that JavaScript at the end of your document (or link to it from the end of your document) and you’re all set. You might want to tweak it a little: I find it helps to preface placeholder values with “e.g.” to make it clear that this is an example value. You can do that by changing the last line of the script:

input.setAttribute('placeholder','e.g. '+value);

You also might want to show more than one possible value. You might want the placeholder value to read “e.g. Mercury, Venus, Earth, etc.” …I’ll leave that as an exercise for the reader.

Hacking History

I spent the weekend at The Guardian offices in London at History Hack Day. It was rather excellent. You’d think I’d get used to the wonderful nature of these kinds of events, but once again I experienced the same level of amazement that I felt the first time I went to a hack day.

The weekend kicked off in the traditional way with some quickfire talks. Some lovely people from The British Museum, The British Library and The National Archives talked about their datasets, evangelists from Yahoo and Google talked about YQL and Fusion Tables, and Max Gadney and Matthew Sheret got us thinking in the right directions.

Matthew Sheret was particularly inspiring, equating hackers with time travellers, and encouraging us to find and explore the stories within the data of history. The assembled geeks certainly took that message to heart.

Ben Griffiths told the story of his great-uncle, who died returning from a bomber raid on Bremen in 1941. Using data to put the death in context, Ben approached the story of the lost bomber with sensitivity.

Simon created geStation, a timeline of when railway stations opened in the UK. On the face of it, it sounds like just another mashup of datetimes and lat-long coordinates. But when you run it, you can see the story of the industrial revolution emerge on the map.

Similarly, Gareth Lloyd and Tom Martin used Wikipedia data to show the emerging shape of the world over time in their video A History of the World in 100 Seconds, a reference to the BBC’s History of the World in 100 Objects, for which Cristiano built a thoroughly excellent mobile app to help you explore the collection at the British Museum.

Brian used the Tropo API to make a telephone service that will find a passenger on the Titanic who was the same age and sex as you, and then tell you if they made it onto a lifeboat or not. Hearing this over the phone makes the story more personal somehow. Call +1 (804) 316-9215 in the US, +44 2035 142721 in the UK, or +990009369991481398 on Skype to try it for yourself.

Audioboo / did you die on the Titanic? on Huffduffer

I was so impressed with the Tropo API that I spent most of History Hack Day working on a little something for Huffduffer …more on that later.

My contribution to the hack day was very modest, but it was one of the few to involve something non-digital. It’s called London On A Stick.

A pile of USB sticks had been donated to History Hack Day, but nobody was making much use of them so I thought they could be used as fodder for Dead Drops. I took five USB sticks and placed a picture from The National Archives on Flickr Commons on each one. Each picture was taken somewhere in London and has been geotagged.

Zeppelin over St. Paul's

I slapped sticky notes on the USB sticks with the location of the picture. Then I asked for volunteers to go out and place the sticks at the locations of the pictures: Paddington, Trafalgar Square, Upper Lambeth, St. Paul’s and Tower Bridge. Not being a Londoner myself, I’m relying on the natives to take up the challenge. You can find the locations at icanhaz.com/londononastick. I ducked out of History Hack Day a bit early to get back to Brighton so I have no idea if the five sticks were claimed.

Although my contribution to History Hack Day was very modest, I had a really good time. Matt did a great job putting on an excellent event.

It was an eye-opening weekend. This hack day put the “story” back into history.

The design of datalist

One of the many form enhancements provided by HTML5 is the datalist element. It allows you to turn a regular input field into a combo box.

Using the list attribute on an input, you can connect it to a datalist with the corresponding ID. The datalist itself contains a series of option elements.

<input list="suggestions">
<datalist id="suggestions">
    <option value="foo"></option>
    <option value="bar"></option>
    <option value="baz"></option>
</datalist>

I can imagine a number of use cases for this:

  • “Share this” forms, like the one on Last.fm, that allow you to either select from your contacts on the site, or enter email addresses, separated by commas. Using input type="email" with a multiple attribute, in combination with a datalist would work nicely.
  • Entering the details for an event, where you can either select from a list of venues or, if the venue is not listed, create a new one.
  • Just about any form that has a selection of choices, of which the last choice is “other”, followed by “If other, please specify…”

You can take something like this:

<label for="source">How did you hear about us?</label>
<select name="source">
    <option>please choose...</option>
    <option value="television">Television</option>
    <option value="radio">Radio</option>
    <option value="newspaper">Newspaper</option>
    <option>Other</option>
</select>
If other, please specify:
<input id="source" name="source">

And replace it with this:

<label for="source">How did you hear about us?</label>
<datalist id="sources">
    <option value="television"></option>
    <option value="radio"></option>
    <option value="newspaper"></option>
</datalist>
<input id="source" name="source" list="sources">

The datalist element has been designed according to one of the design principles driving HTML5—Degrade Gracefully:

On the World Wide Web, authors are often reluctant to use new language features that cause problems in older user agents, or that do not provide some sort of graceful fallback. HTML 5 document conformance requirements should be designed so that Web content can degrade gracefully in older or less capable user agents, even when making use of new elements, attributes, APIs and content models.

Because the datalist element contains a series of option elements with value attributes, it is effectively invisible to user-agents that don’t support the datalist element. That means you can use the datalist element without worrying about it “breaking” in older browsers.

If you wanted, you could include a message for non-supporting browsers:

<datalist id="sources">
    Your browser doesn't support datalist!
    <option value="television"></option>
    <option value="radio"></option>
    <option value="newspaper"></option>
</datalist>

That message—“Your browser doesn’t support datalist!”—will be visible in older browsers, but browsers that support datalist know not to show anything that’s not an option. But displaying a message like this for older browsers is fairly pointless; I certainly wouldn’t consider it graceful degradation.

In my opinion, one of the best aspects of the design of the datalist element is that you can continue to do things the old-fashioned way—using a select and an input—and at the same time start using datalist. There’s no conflict between the two: you can use the same option elements for the select and the datalist:

<label for="source">How did you hear about us?</label>
<datalist id="sources">
    <select name="source">
        <option>please choose...</option>
        <option value="television">Television</option>
        <option value="radio">Radio</option>
        <option value="newspaper">Newspaper</option>
        <option>Other</option>
    </select>
    If other, please specify:
</datalist>
<input id="source" name="source" list="sources">

Browsers that support datalist will display the label “How did you hear about us?” followed by a combo-box text field that allows the user to select an option, or enter some free text.

Browsers that don’t support datalist will display the label “How did you hear about us?” followed by a drop-down list of options (the last of which is “other”), followed by the text “If other, please specify”, followed by a text field.

Take a look at this example in Opera to see datalist in operation. Take a look at it in any other browser to see the fallback. The source is on Github if you’d like to play around with it.

WebKit’s mistake

If you try that example in any browser, you’ll get something that works; either through datalist support or through the select fallback …unless the browser is using WebKit.

It took me a while to figure out why this would be the case. I didn’t think that Safari or Chrome supported datalist, and a little digging around with object detection in JavaScript confirmed this. So why don’t those browsers follow the standard behaviour and simply ignore the element they don’t understand and render what’s inside it instead?

Here’s the problem: line 539 of WebKit’s default CSS:

datalist {
    display: none;
}

This is pretty much the worst possible behaviour for a browser to implement. An element should either be recognised—like p, h1 or img—and parsed accordingly, or unrecognised—like foo or bar—and ignored. WebKit does not support the datalist element (even in the current nightly build), so the element should be ignored.

Fortunately the problem is easily rectified by adding something like this to your own stylesheet:

datalist {
    display: inline-block;
}

I’ve submitted a bug report on the WebKit Bugzilla.

Update: That WebKit bug has now been fixed so the extra CSS is no longer necessary.

The format of The Long Now

In 01992, Tim Berners-Lee wrote a document called HTML Tags.

In September 02001, I started keeping this online journal. Back then, I was storing my data in XML, using a format of my own invention. The XML was converted using PHP into (X)HTML, RSS, and potentially anything else …although the “anything else” part never really materialised.

In February 02006, I switched over to using a MySQL database to store my data as chunks of markup.

In February 02007, Ted wrote about data longevity:

To me, being able to completely migrate my data — with minimal bit-rot — from system to system is the key in the never-ending and easily-lost fight to keep my data accessible over the entirety of my life.

He’s using non-binary, well-documented standards to store and structure his data: Atom, HTML and microformats.

Meanwhile, the HTML5 spec began defining error-handling for HTML documents. Ian Hickson wrote:

The original reason I got involved in this work is that I realised that the human race has written literally billions of electronic documents, but without ever actually saying how they should be processed.

I decided that for the sake of our future generations we should document exactly how to process today’s documents, so that when they look back, they can still reimplement HTML browsers and get our data back, even if they no longer have access to Microsoft Internet Explorer’s source code.

In August 02008, Ian Hickson mentioned in an interview that the timeline for HTML5 involves having two complete implementations by 02022. Many web developers were disgusted that such a seemingly far-off date was even being mentioned. My reaction was the opposite. I began to pay attention to HTML5.

HTML is starting to look like a relatively safe bet for data longevity and portability. I’m not sure the same can be said for any particular flavour of database. Sooner rather than later, I should remove the unnecessary layer of abstraction that I’m using to store my data.

This would be my third migration of content. I will take care to heed Mark Pilgrim’s advice on data fidelity:

Long-term data preservation is like long-term backup: a series of short-term formats, punctuated by a series of migrations. But migrating between data formats is not like copying raw data from one medium to another.

Fidelity is not a binary thing. Data can gradually degrade with each conversion until you’re left with crap. People think this only affects the analog world, like copying cassette tapes for several generations. But I think digital preservation is actually much harder, in part because people don’t even realize that it has the same issues.

He’s also betting on HTML:

HTML is not an output format. HTML is The Format. Not The Format Of Forever, but damn if it isn’t The Format Of The Now.

I don’t think that any format could ever be The Format Of The Long Now but HTML is the closest we’ve come thus far in the history of computing to having a somewhat stable, human- and machine-readable data format with a decent chance of real longevity.

Linkrot

The geeks of the UK have been enjoying a prime-time television show dedicated to all things webby. The Virtual Revolution is a rare thing: a television programme about the web made by someone who actually understands the web (Aleks, to be precise).

Still, the four-part series does rely on the usual television documentary trope of presenting its subject matter as a series of yin and yang possibilities. The web: blessing or curse? The web: force for democracy or tool of oppression? Rhetorical questions: a necessary evil or an evil necessity?

The third episode tackles one of the most serious of society’s concerns about our brave new online world, namely the increasing amount of information available to commercial interests and the associated fear that technology is having a negative effect on privacy. Personally, I’m with Matt when he says:

If the end of privacy comes about, it’s because we misunderstand the current changes as the end of privacy, and make the mistake of encoding this misunderstanding into technology. It’s not the end of privacy because of these new visibilities, but it may be the end of privacy because it looks like the end of privacy because of these new visibilities*.

Inevitably, whenever there’s a moral panic about the web, a truism that raises its head is the assertion that The Internets Never Forget:

On the one hand, the Internet can freeze youthful folly and small transgressions can stick with you for life. So that picture of you drunk and passed out in a skip, or that heated argument you had on a mailing list when you were twenty can come back and haunt you.

Citation needed.

We seem to have a collective fundamental attribution error when it comes to the longevity of data on the web. While we are very quick to recall the instances when a resource remains addressable for a long enough time period to cause embarrassment or shame later on, we completely ignore all the link rot and 404s that are the fate of most data on the web.

There is an inverse relationship between the age of a resource and its longevity. You are one hundred times more likely to find an embarrassing picture of you on the web uploaded in the last year than to find an embarrassing picture of you uploaded ten years ago.

If a potential boss finds a ten-year-old picture of you drunk and passed out at a party, that’s certainly a cause for concern. But such an event would be extraordinary rather than commonplace. If that situation ever happened to me, I would probably feel outrage and indignation like anybody else, but I bet that I would also wonder Hmmm, where’s that picture being hosted? Sounds like a good place for off-site backups.

The majority of data uploaded to the web will disappear. But we don’t pay attention to the disappearances. We pay attention to the minority of instances when data survives.

This isn’t anything specific to the web; this is just the way we human beings operate. It doesn’t matter if the national statistics show a decrease in crime; if someone is mugged on your street, you’ll probably be worried about increased crime. It doesn’t matter how many airplanes successfully take off and land; one airplane crash in ten thousand is enough to make us very worried about dying on a plane trip. It makes sense that we’ve taken this cognitive bias with us onto the web.

As for why resources on the web tend to disappear over time, there are two possible reasons:

  1. The resource is being hosted on a third-party site or
  2. The resource is being hosted on an independent site.

The problem with the first instance is obvious. A commercial third-party responsible for hosting someone else’s hopes and dreams will pull the plug as soon as the finances stop adding up.

I’m sure you’ve seen the famous chart of Web 2.0 logos but have you seen Meg Pickard’s updated version, adjusted for dead companies?

You cannot rely on a third-party service for data longevity, whether it’s Geocities, Magnolia, Pownce, or anything else.

That leaves you with The Pemberton Option: host your own data.

This is where the web excels: distributed and decentralised data linked together with hypertext. You can still ping third-party sites and allow them access to your data, but crucially, you are in control of the canonical copy (Tantek is currently doing just that, microblogging on his own site and sending copies to Twitter).

Distributed HTML, addressable by URL and available through HTTP: it’s a beautiful ballet that creates the network effects that make the web such a wonderful creation. There’s just one problem and it lies with the URL portion of the equation.

Domain names aren’t bought, they are rented. Nobody owns domain names, except ICANN. While you get to decide the relative structure of URLs on your site, everything between the colon slash slash and the subsequent slash belongs to ICANN. Centralised. Not distributed.

Cool URIs don’t change but even with the best will in the world, there’s only so much we can do when we are tenants rather than owners of our domains.

In his book Weaving The Web, Sir Tim Berners-Lee mentions that exposing URLs in the browser interface was a throwaway decision, a feature that would probably only be of interest to power users. It’s strange to imagine what the web would be like if we used IP numbers rather than domain names—more like a phone system than a postal system.

But in the age of Google, perhaps domain names aren’t quite as important as they once were. In Japanese advertising, URLs are totally out. Instead they show search boxes with recommended search terms.

I’m not saying that we should ditch domain names. But there’s something fundamentally flawed about a system that thinks about domain names in time periods as short as a year or two. It doesn’t bode well for the long-term stability of our data on the web.

On the plus side, that embarrassing picture of you passed out at a party will inevitably disappear …along with almost everything else on the web.

To protect and to preserve

I’m gratified to see that my thoughts on archiving my data—prompted by the shutdown of Pownce, Magnolia, Ficlets, etc., etc., etc.—are shared by others. But it’s all well and good for me to talk about how I’m backing up by using APIs, RSS, PHP and other non-trivial technologies. As David said when he bookmarked my post:

Now if someone would build a backup-to-local system that I could use…

Paul has been thinking about how to build it:

Now I’m wondering: is there a space for a piece of user-installable software, like Movable Type or Wordpress, that aggregates their data from sites across the web, and then presents it as a site? If there is, is it even possible to write it in a way that anyone who couldn’t have written it themselves can even use it? Can I write it just for myself in the first place?

Meanwhile, Mike points me to an impassioned post by Jason Scott prompted by the callous, heartbreaking closure of AOL Hometown. That’s right; AOL.

And before you sneer at AOL people, these people who trusted AOL: how about your Flickr? Your Facebook? Whatever the hot new wig-wag that you’re dumping hours into without thinking about it? What, you’re paying for something? Check this recent event out, paying subscriber: you have shit. Because of a cascade of EULA and Best Practices, and most importantly, a complete disregard for the importance of this data, we’re going to let it happen again. And again. And again.

Read his post and then read the follow-up: Datapocalypso! wherein he proposes an A-Team for rescuing data:

They’d go to a site, spider the living crap out of it, reverse engineer what they could, and then put it all up on archive.org or another hosting location, so people could grab things they needed. Fuck the EULAs and the clickthroughs. This is history, you bastards.

It’s still early days, but Archive Team now exists.

Magnoliloss

Since Magnolia went down, taking everyone’s bookmarks with it, I’ve been through a mild cycle.

  1. Denial. “It can’t be that all the data is gone. They’ll recover it.”
  2. Anger. “I want my freaking bookmarks!”
  3. Bargaining. “Isn’t there something I can do? Maybe there’s some API hacking that would help.”
  4. Depression. “Why do I bother contributing to any social websites. Our data is doomed in the end.”
  5. Acceptance. “C’est le Web.”

I also experienced déjà vu at every stage. The only difference between the end of Pownce and the end of Magnolia was that just one of those pieces of plug-pulling was planned. From the perspective of the people running those services, that’s a huge difference. From my perspective as an avid user of both services, it felt the same.

Actually, things turned out okay for my Magnolia data in the end. I was able to recover all my bookmarks …and it wasn’t down to any API hacking either. My bookmarks were saved by two messy, scrappy, plucky little technologies: RSS and microformats.

Google Reader caches RSS feeds aggressively. As long as one person has ever subscribed to the RSS feed of your Magnolia links, you should be able to retrieve your links using Google’s Feed API—‘though for the life of me, I cannot understand why Google insists on marketing all these APIs as “Ajax” APIs, hiding server-side documentation under “Flash and other Non-Javascript Environments”.

If that doesn’t work, there’s always the regular HTML as archived by Google and the Internet Archive. Magnolia’s pages were marked up with xFolk. Using tools like Glenn’s UfXtract, this structured data can be converted into JSON or some other importable format. As Chris put it, Microformats are the vinyl of the web.
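If you’d rather roll your own, a few lines of DOM scripting will get you most of the way there. Here’s a rough sketch, assuming the usual xFolk class names (xfolkentry, taggedlink, description) and rel-tag for the tags:

// Rough sketch: pull xFolk-formatted bookmarks out of a document
// (the live page, or one parsed from an archived copy) and log them as JSON.
function extractBookmarks(doc) {
    var entries = doc.querySelectorAll('.xfolkentry');
    return Array.prototype.map.call(entries, function (entry) {
        var link = entry.querySelector('.taggedlink');
        var description = entry.querySelector('.description');
        var tags = entry.querySelectorAll('[rel~="tag"]');
        return {
            url: link ? link.href : '',
            title: link ? link.textContent : '',
            description: description ? description.textContent : '',
            tags: Array.prototype.map.call(tags, function (tag) {
                return tag.textContent;
            })
        };
    });
}

console.log(JSON.stringify(extractBookmarks(document), null, 2));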

Magnolia’s bookmark recovery page uses a mixture of RSS and XFolk extraction tricks. I was able to recover my bookmarks and import them into Delicious.

But what’s the point of that? Swapping one third-party service for another. Well, believe me, I did a lot of soul searching before putting my links back in another silo. Really, I should be keeping my links here on adactio.com, maybe pinging Delicious or some other social bookmarking site as a back-up …what would Steven Pemberton do?

In the end, I decided to keep using Delicious partly out of convenience, but mostly because I can export my bookmarks quite easily; either through the API or as a hulking great hideous HTML bookmarks file (have you ever looked at the markup of those files that browsers import/export? Yeesh!)

But the mere presence of backup options isn’t enough. After all, Magnolia had a better API than Delicious but that didn’t help when the server came a crashin’. If I’m going to put data into a third-party site, I’m going to have to be self-disciplined and diligent about backing up regularly, just as I do with local data. So I’m getting myself into the habit of running a little PHP script every weekend that will extract all my bookmarks for safekeeping.

That’s my links taken care of. What about other data stores?

  • Twitter. This PHP script should take care of backing up all my inane utterances.
  • Flickr. I still have all the photos I’ve uploaded to Flickr so the photos themselves will be saved should anything happen to the site. But it would be a shame to lose the metadata that the pictures have accumulated. I should probably investigate how much metadata is maintained by backup services like QOOP.
  • Dopplr. Well, the data about my trips isn’t really the important part of Dopplr; it’s the ancillary stuff like coincidences that makes it so handy. Still, with a little bit of hacking on the Dopplr API I could probably whip an export script together. Update: Tom writes to tell me that Dopplr already offers my trips in the form of an .ics file.
  • Last.fm. Again, like Dopplr, I’m not sure how valuable the data is outside the social context of the site. But again, like Dopplr, a bit of hacking on the Last.fm API might yield a reusable export script.
  • Ffffound. I don’t use it to store anything useful or valuable. That’s what tools like are for. Update: Hacker extraordinaire Paul Mison has whipped up a Ruby script to scrape ffffound and he points me in the direction of ddddownload.
  • Facebook. It could fall off the face of the planet for all I care. I’ve never put any data into the site. I only keep a profile there as a communication hub for otherwise unconnected old friends.

As for my own sites—adactio, DOM Scripting, Principia Gastronomica, Salter Cane and of course The Session and Huffduffer—I’ve got local copies which are regularly backed up to an external hard drive and I’m doing database dumps once a week, which probably isn’t often enough. I worry sometimes that I’m not nearly as paranoid as I should be.

What happened to Magnolia was a real shame but, to put a positive spin on it, it’s been a learning experience not just for me, but for Larry too.

Sound and vision

Every creation of Tony Wilson’s was labelled with the letters FAC followed by a number. The first poster was FAC1. The Haçienda was FAC51.

The Joy Division album Unknown Pleasures was FACT10. The album artwork was designed by Peter Saville. The words “Unknown Pleasures” don’t appear on the cover. Neither do the words “Joy Division”. Instead, the cover contains a series of 100 lines representing pulses from the pulsar CP 1919. It was a groundbreaking piece of graphic design. Its beauty lies in its simplicity: a two-dimensional representation of raw data.

That was almost thirty years ago. This week Radiohead released the video for the song House of Cards from the album In Rainbows …except it isn’t really a video at all. It wasn’t shot on film or video. It is a three-dimensional representation of raw data.

You can play with the data visualisation, altering it while the song plays. You can even download the raw data. You are not just allowed to play around with the data, you are encouraged to do so. There’s a YouTube group for aggregating the results.

Suddenly every other music video seems very flat and passive. I’m reminded of a prescient passage from Douglas Adams’s essay How to Stop Worrying and Learn to Love the Internet:

I expect that history will show “normal” mainstream twentieth century media to be the aberration in all this.

Please, miss, you mean they could only just sit there and watch? They couldn’t do anything? Didn’t everybody feel terribly isolated or alienated or ignored?

Yes, child, that’s why they all went mad. Before the Restoration.

What was the Restoration again, please, miss?

The end of the twentieth century, child. When we started to get interactivity back.

Visual

Jason Kottke points to a beautiful collection of literary maps by Stefanie Posavec. Meanwhile over on A List Apart there’s a new article by Wilson Miner called Accessible Data Visualization with Web Standards. He shows some of the nifty CSS tricks he used on EveryBlock. The end results are very impressive though I don’t necessarily agree with the assertion that when what we’re really building is navigation, tables are an awkward and often clumsy tool for the job — I still think that tables would have been not just semantically correct but also malleable enough with CSS. But I’m nitpicking. It’s a great article.

There was oodles of data visualisation goodness at BarCamp Brighton 2 courtesy of Robin Harrison. Check out the links from his presentation. As well as the Tufte favourite of Napoleon’s Russian invasion map, he mentioned Florence Nightingale’s map of mortality causes which reminded me of the cholera map of London. That is the subject of the newest book from Steven Johnson called The Ghost Map: The Story of London’s Most Terrifying Epidemic—and How It Changed Science, Cities, and the Modern World.

There are some fine examples of data visualisation over at the New York Times:

Some more data visualisation: