Journal tags: format

81

sparkline

Preparing an online conference talk

I’m terrible at taking my own advice.

Hana wrote a terrific article called You’re on mute: the art of presenting in a Zoom era. In it, she has very kind things to say about my process for preparing conference talks.

As it happens, I’m preparing a conference talk right now for delivery online. Am I taking my advice about how to put a talk together? I am on me arse.

Perhaps the most important part of the process I shared with Hana is that you don’t get too polished too soon. Instead you get everything out of your head as quickly as possible (probably onto disposable bits of paper) and only start refining once you’re happy with the rough structure you’ve figured out by shuffling those bits around.

But the way I’ve been preparing this talk has been more like watching a progress bar. I started at the start and even went straight into slides as the medium for putting the talk together.

It was all going relatively well until I hit a wall somewhere between the 50% and 75% mark. I was blocked and I didn’t have any rough sketches to fall back on. Everything was a jumbled mess in my brain.

It all came to a head at the start of last week when that jumbled mess in my brain resulted in a very restless night spent tossing and turning while I imagined how I might complete the talk.

This is a terrible way of working and I don’t recommend it to anyone.

The problem was I couldn’t even return to the proverbial drawing board because I hadn’t given myself a drawing board to return to (other than this crazy wall of connections on Kinopio).

My sleepless night was a wake-up call (huh?). The next day I forced myself to knuckle down and pump out anything even if it was shit—I could refine it later. Well, it turns out that just pumping out any old shit was exactly what I needed to do. The act of moving those fingers up and down on the keyboard resulted in something that wasn’t completely terrible. In fact, it turned out pretty darn good.

Past me said:

The idea here is to get everything out of my head.

I should’ve listened to that guy.

At this point, I think I’ve got the talk done. The progress bar has reached 100%. I even think that it’s pretty good. A giveaway for whether a talk is any good is when I find myself thinking “Yes, this has good points well made!” and then five minutes later I’m thinking “Wait, is this complete rubbish that’s totally obvious and doesn’t make much sense?” (see, for example, every talk I’ve ever prepared ever).

Now I just to have to record it. The way that An Event Apart are running their online editions is that the talks are pre-recorded but followed with live Q&A. That’s how the Clearleft events team have been running the conference part of the Leading Design Festival too. Last week there were three days of this format and it worked out really, really well. This week there’ll be masterclasses which are delivered in a more synchronous way.

It feels a bit different to prepare a talk for pre-recording rather than live delivery on stage. In fact, it feels less like preparing a conference talk and more like making a documentary. I guess this is what life is like for YouTubers.

I think the last time I was in a cinema before The Situation was at the wonderful Duke of York’s cinema here in Brighton for an afternoon showing of The Proposition followed by a nice informal chat with the screenwriter, one Nick Cave, local to this parish. It was really enjoyable, and that’s kind of what Leading Design Festival felt like last week.

I wonder if maybe we’ve been thinking about online events with the wrong metaphor. Perhaps they’re not like conferences that have moved online. Maybe they’re more like film festivals where everyone has the shared experience of watching a new film for the first time together, followed by questions to the makers about what they’ve just seen.

Audio

I spent the last couple of weekends rolling out a new feature on The Session. It involves playing audio in a web page. No big deal these days, right? But the history involves some old file formats…

The first venerable format is ABC notation. File extension: .abc, mime type: text/vnd.abc. It’s an ingenious text format for musical notation using ASCII. The metadata of the piece of music is defined in JSON-like key/value pairs. Then the contents are encoded with letters: A, B, C, etc. Uppercase and lowercase denote different octaves. Numbers can be used for note lengths.

The format was created by Chris Walshaw in 1997 when dial-up was the norm. With ABC, people were able to swap tunes on email lists or bulletin boards without transferring weighty image or sound files. If you had ABC software on your computer, you could convert that lightweight text file into sheet music …or audio.

That brings me to the second old format: midi files. File extension: .mid, mime-type: audio/midi. Like ABC, it’s a lightweight format for encoding the instructions for music instead of the music itself.

Think of it like SVG: instead of storing the final pixels of an image, SVG stores the instructions for drawing the image instead. The instructions in a midi file are like “play this note for this long on this instrument.” Again, as with ABC, you need some software to turn the instructions into sound.

There was a time when lots of software could play midi files. Quicktime on the Mac, for example. You could even embed midi files in web pages. I mean literally embed them …with the embed element. No Geocities page was complete without an autoplaying midi file.

On The Session, people submit tunes in ABC format. Then, using the amazing ABCJS JavaScript library, the ABC is turned into SVG on the fly! For years I’ve also offered midi files, generated on the server from the ABC notation.

But times have changed. These days it’s hard to find software that plays midi files. Quicktime doesn’t do it anymore. And you’d need to go to the app store on iOS to find a midi file player. It’s time to phase out the midi files on The Session.

I still want to provide automatically-generated audio though. Fortunately ABCJS gives me a way to do this. But instead of using the old technology of midi files, it uses a more modern browser feature: the Web Audio API.

The end result sounds like a midi file, but the underlying technique is more like a synthesiser. There’s a separate mp3 file for each note. The JavaScript figures out how long each “sample” needs to be played for, strings them all together, and outputs them with Web Audio. So you’ve got cutting-edge browser technology recreating a much older file format. Paul Rosen—the creator of ABCJS—has a presentation explaining how it all works under the hood.

Not only is there a separate short mp3 file for each note in seven octaves, but if you want the sound of a different instrument, you need samples for all seven octaves in that instrument. They’re called soundfonts.

Paul provides soundfonts for ABCJS. It’s a repo that was forked from this repo from Benjamin Gleitzman. And here’s where it gets small worldy…

The reason why Benjamin has a repo of soundfonts is because he needed to create midi-like audio in the browser. He wanted to do this for a project on September 28th and 29th, 2013 …at Science Hack Day San Francisco!

I was there too—working on my own audio-related hack—and I remember the excellent (and winning) hack that Benjamin worked on. It was called Symphony of Satellites and it’s still online along with the promo video. Here’s Benjamin’s post-hackday write-up from seven years ago.

It’s rare that the worlds of the web and Irish music cross over. When I got to meet Paul—creator of ABCJS—at a web conference a couple of years ago it kind of blew my mind. Last weekend when I set out to dabble with a feature on The Session, I certainly didn’t expect to stumble on a connection to Science Hack Day! (Aside: the first Science Hack Day was ten years ago—yowzers!)

Anyway, I was able to get that audio playback working on The Session. Except for some weirdness on iOS that I had to fix. But that’s a hack for another day.

Ampvisory

I was very inspired by something Terence Eden wrote on his blog last year. A report from the AMP Advisory Committee Meeting:

I don’t like AMP. I think that Google’s Accelerated Mobile Pages are a bad idea, poorly executed, and almost-certainly anti-competitive.

So, I decided to join the AC (Advisory Committee) for AMP.

Like Terence, I’m not a fan of Google AMP—my initially positive reaction to it soured over time as it became clear that Google were blackmailing publishers by privileging AMP pages in Google Search. But all I ever did was bitch and moan about it on my website. Terence actually did something.

So this year I put myself forward as a candidate for the AMP advisory committee. I have no idea how the election process works (or who does the voting) but thanks to whoever voted for me. I’m now a member of the AMP advisory committee. If you look at that blog post announcing the election results, you’ll see the brief blurb from everyone who was voted in. Most of them are positively bullish on AMP. Mine is not:

Jeremy Keith is a writer and web developer dedicated to an open web. He is concerned that AMP is being unfairly privileged by Google’s search engine instead of competing on its own merits.

The good news is that main beef with AMP is already being dealt with. I wanted exactly what Terence said:

My recommendation is that Google stop requiring that organisations use Google’s proprietary mark-up in order to benefit from Google’s promotion.

That’s happening as of May of this year. Just as well—the AMP advisory committee have absolutely zero influence on Google search. I’m not sure how much influence we have at all really.

This is an interesting time for AMP …whatever AMP is.

See, that’s been a problem with Google AMP from the start. There are multiple defintions of what AMP is. At the outset, it seemed pretty straightforward. AMP is a format. It has a doctype and rules that you have to meet in order to be “valid” AMP. Part of that ruleset involved eschewing HTML elements like img and video in favour of web components like amp-img and amp-video.

That messaging changed over time. We were told that AMP is the collection of web components. If that’s the case, then I have no problem at all with AMP. People are free to use the components or not. And if the project produces performant accessible web components, then that’s great!

But right now it’s not at all clear which AMP people are talking about, even in the advisory committee. When we discuss improving AMP, do we mean the individual components or the set of rules that qualify an AMP page being “valid”?

The use-case for AMP-the-format (as opposed to AMP-the-library-of-components) was pretty clear. If you were a publisher and you wanted to appear in the top stories carousel in Google search, you had to publish using AMP. Just using the components wasn’t enough. Your pages had to be validated as AMP-the-format.

That’s no longer the case. From May, pages that are fast enough will qualify for the top stories carousel. What will publishers do then? Will they still maintain separate AMP-the-format pages? Time will tell.

I suspect publishers will ditch AMP-the-format, although it probably won’t happen overnight. I don’t think anyone likes being blackmailed by a search engine:

An engineer at a major news publication who asked not to be named because the publisher had not authorized an interview said Google’s size is what led publishers to use AMP.

The pre-rendering (along with the lightning bolt) that happens for AMP pages in Google search might be a reason for publishers to maintain their separate AMP-the-format pages. But I suspect publishers don’t actually think the benefits of pre-rendering outweigh the costs: pre-rendered AMP-the-format pages are served from Google’s servers with a Google URL. If anything, I think that publishers will look forward to having the best of both worlds—having their pages appear in the top stories carousel, but not having their pages hijacked by Google’s so-called-cache.

Does AMP-the-format even have a future without Google search propping it up? I hope not. I think it would make everything much clearer if AMP-the-format went away, leaving AMP-the-collection-of-components. We’d finally see these components being evaluated on their own merits—usefulness, performance, accessibility—without unfair interference.

So my role on the advisory committee so far has been to push for clarification on what we’re supposed to be advising on.

I think it’s good that I’m on the advisory committee, although I imagine my opinions could easily be be dismissed given my public record of dissent. I may well be fooling myself though, like those people who go to work at Facebook and try to justify it by saying they can accomplish more from inside than outside (or whatever else they tell themselves to sleep at night).

The topic I’ve volunteered to help with is somewhat existential in nature: what even is AMP? I’m happy to spend some time on that. I think it’ll be good for everyone to try to get that sorted, regardless about how you feel about the AMP project.

I have no intention of giving any of my unpaid labour towards the actual components themselves. I know AMP is theoretically open source now, but let’s face it, it’ll always be perceived as a Google-led project so Google can pay people to work on it.

That said, I’ve also recently joined a web components community group that Lea instigated. Remember she wrote that great blog post recently about the failed promise of web components? I’m not sure how much I can contribute to the group (maybe some meta-advice on the nature of good design principles?) but at the very least I can serve as a bridge between the community group and the AMP advisory committee.

After all, AMP is a collection of web components. Maybe.

Downloading from Google Fonts

If you’re using web fonts, there are good performance (and privacy) reasons for hosting your own font files. And fortunately, Google Fonts gives you that option. There’s a “Download family” button on every specimen page.

But if you go ahead and download a font family from Google Fonts, you’ll notice something a bit odd. The .zip file only contains .ttf files. You can serve those on the web, but it’s far from the best choice. Woff2 is far leaner in file size.

This means you need to manually convert the downloaded .ttf files into .woff or .woff2 files using something like Font Squirrel’s generator. That’s fine, but I’m curious as to why this step is necessary. Why doesn’t Google Fonts provide .woff or .woff2 files in the downloaded folder? After all, if you choose to use Google Fonts as a third-party hosting service for your fonts, it most definitely serves up the appropriate file formats.

I thought maybe it was something to do with the licensing. Maybe some licenses only allow for unmodified truetype files to be distributed? But I’ve looked at fonts with different licenses—some have Apache 2 licensing, some have Open Font licensing—and they’re all quite permissive and definitely allow for modification.

Maybe the thinking is that, if you’re hosting your own font files, then you know what you’re doing and you should be able to do your own file conversion and subsetting. But I’ve come across more than one website in the wild serving up .ttf files. And who can blame them? They want to host their own font files. They downloaded those files from Google Fonts. Why shouldn’t they assume that they’re good to go?

It’s all a bit strange. If anyone knows why Google Fonts only provides .ttf files for download, please let me know. In a pinch, I will also accept rampant speculation.

Trys also pointed out some weird default behaviour if you do let Google Fonts do the hosting for you. Specifically if it’s a variable font. Let’s say it’s a font with weight as a variable axis. You specify in advance which weights you’ll be using, and then it generates separate font files to serve for each different weight.

Doesn’t that defeat the whole point of using a variable font? I mean, I can see how it could result in smaller file sizes if you’re just using one or two weights, but isn’t half the fun of having a weight axis that you can go crazy with as many weights as you want and it’s all still one font file?

Like I said, it’s all very strange.

Performance and people

I was helping a client with a bit of a performance audit this week. I really, really enjoy this work. It’s such a nice opportunity to get my hands in the soil of a website, so to speak, and suggest changes that will have a measurable effect on the user’s experience.

Not only is web performance a user experience issue, it may well be the user experience issue. Page speed has a proven demonstrable direct effect on user experience (and revenue and customer satisfaction and whatever other metrics you’re using).

It struck me that there’s a continuum of performance challenges. On one end of the continuum, you’ve got technical issues. These can be solved with technical solutions. On the other end of the continuum, you’ve got human issues. These can be solved with discussions, agreement, empathy, and conversations (often dreaded or awkward).

I think that, as developers, we tend to gravitate towards the technical issues. That’s our safe space. But I suspect that bigger gains can be reaped by tackling the uncomfortable human issues.

This week, for example, I uncovered three performance issues. One was definitely technical. One was definitely human. One was halfway between.

The technical issue was with web fonts. It’s a lot of fun to dive into this aspect of web performance because quite often there’s some low-hanging fruit: a relatively simple technical fix that will boost the performance (or perceived performance) of a website. That might be through resource hints (using link rel=“preload” in the HTML) or adjusting the font loading (using font-display in the CSS) or even nerdier stuff like subsetting.

In this case, the issue was with the file format of the font itself. By switching to woff2, there were significant file size savings. And the great thing is that @font-face rules allow you to specify multiple file formats so you can still support older browsers that can’t handle woff2. A win all ‘round!

The performance issue that was right in the middle of the technical/human continuum was with images. At first glance it looked like a similar issue to the fonts. Some images were being served in the wrong formats. When I say “wrong”, I guess I mean inappropriate. A photographic image, for example, is probably going to best served as a JPG rather than a PNG.

But unlike the fonts, the images weren’t in the direct control of the developers. These images were coming from a Content Management System. And while there’s a certain amount of processing you can do on the server, a human still makes the decision about what file format they’re uploading.

I’ve seen this happen at Clearleft. We launched an event site with lean performant code, but then someone uploaded an image that’s megabytes in size. The solution in that case wasn’t technical. We realised there was a knowledge gap around image file formats—which, let’s face it, is kind of a techy topic that most normal people shouldn’t be expected to know.

But it was extremely gratifying to see that people were genuinely interested in knowing a bit more about choosing the right format for the right image. I was able to provide a few rules of thumb and point to free software for converting images. It empowered those people to feel more confident using the Content Management System.

It was a similar situation with the client site I was looking at this week. Nobody is uploading oversized images in order to deliberately make the site slower. They probably don’t realise the difference that image formats can make. By having a discussion and giving them some pointers, they’ll have more knowledge and the site will be faster. Another win all ‘round!

At the other end of continuum was an issue that wasn’t technical. From a technical point of view, there was just one teeny weeny little script. But that little script is Google Tag Manager which then calls many, many other scripts that are not so teeny weeny. Third party scripts …the bane of web performance!

In retrospect, it seems unbelievable that third-party JavaScript is even possible. I mean, putting arbitrary code—that can then inject even more arbitrary code—onto your website? That seems like a security nightmare!

Remember when I did a countdown of the top four web performance challenges? At the number one spot is other people’s JavaScript.

Now one technical solution would be to remove the Google Tag Manager script. But that’s probably not very practical—you’ll probably just piss off some other department. That said, if you can’t find out which department was responsible for adding the Google Tag Manager script in the first place, it might we well be an option to remove it and then wait and see who complains. If no one notices it’s gone, job done!

More realistically, there’s someone who’s added that Google Tag Manager script for their own valid reasons. You’ll need to talk to them and understand their needs.

Again, as with images uploaded in a Content Management System, they may not be aware of the performance problems caused by third-party scripts. You could try throwing numbers at them, but I think you get better results by telling the story of performance.

Use tools like Request Map Generator to help them visualise the impact that third-party scripts are having. Talk to them. More importantly, listen to them. Find out why those scripts are being requested. What are the outcomes they’re working towards? Can you offer an alternative way of providing the data they need?

I think many of us developers are intimidated or apprehensive about approaching people to have those conversations. But it’s necessary. And in its own way, it can be as rewarding as tinkering with code. If the end result is a faster website, then the work is definitely worth doing—whether it’s technical work or people work.

Personally, I just really enjoy working on anything that will end up improving a website’s performance, and by extension, the user experience. If you fancy working with me on your site, you should get in touch with Clearleft.

Reading

At the beginning of the year, Remy wrote about extracting Goodreads metadata so he could create his end-of-year reading list. More recently, Mark Llobrera wrote about how he created a visualisation of his reading history. In his case, he’s using JSON to store the information.

This kind of JSON storage is exactly what Tom Critchlow proposes in his post, Library JSON - A Proposal for a Decentralized Goodreads:

Thinking through building some kind of “web of books” I realized that we could use something similar to RSS to build a kind of decentralized GoodReads powered by indie sites and an underlying easy to parse format.

His proposal looks kind of similar to what Mark came up with. There’s a title, an author, an image, and some kind of date for when you started and/or finished reading the book.

Matt then points out that RSS gets close to the data format being suggested and asks how about using RSS?:

Rather than inventing a new format, my suggestion is that this is RSS plus an extension to deal with books. This is analogous to how the podcast feeds are specified: they are RSS plus custom tags.

Like Matt, I’m in favour of re-using existing wheels rather than inventing new ones, mostly to avoid a 927 situation.

But all of these proposals—whether JSON or RSS—involve the creation of a separate file, and yet the information is originally published in HTML. Along the lines of Matt’s idea, I could imagine extending the h-entry collection of class names to allow for books (or films, or other media). It already handles images (with u-photo). I think the missing fields are the date-related ones: when you start and finish reading. Those fields are present in a different microformat, h-event in the form of dt-start and dt-end. Maybe they could be combined:


<article class="h-entry h-event h-review">
<h1 class="p-name p-item">Book title</h1>
<img class="u-photo" src="image.jpg" alt="Book cover.">
<p class="p-summary h-card">Book author</p>
<time class="dt-start" datetime="YYYY-MM-DD">Start date</time>
<time class="dt-end" datetime="YYYY-MM-DD">End date</time>
<div class="e-content">Remarks</div>
<data class="p-rating" value="5">★★★★★</data>
<time class="dt-published" datetime="YYYY-MM-DDThh:mm">Date of this post</time>
</article>

That markup is simultaneously a post (h-entry) and an event (h-event) and you can even throw in h-card for the book author (as well as h-review if you like to rate the books you read). It can be converted to RSS and also converted to .ics for calendars—those parsers are already out there. It’s ready for aggregation and it’s ready for visualisation.

I publish very minimal reading posts here on adactio.com. What little data is there isn’t very structured—I don’t even separate the book title from the author. But maybe I’ll have a little play around with turning these h-entries into combined h-entry/event posts.

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retreive the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<h1>Offline</h1>
<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">
</div>

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

All async functions have to have a name—no anonymous async functions allowed. I’m calling this one listPages, just like Remy is doing. I’m making the listPages function execute immediately:

(async function listPages() {
...
})();

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {
...
}

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name') {
...
}

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = new Object;

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.

browsingHistory.push(data);

My if statement and my for/in loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = new Object;
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
    browsingHistory.push(data);
  }
}

That’s the data collection part of the code. Now I’m going to take all that yummy information an output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {
...
}

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

<script>
(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = new Object;
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
      browsingHistory.push(data);
    }
  }
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    });
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
    });
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);
  }
})();
</script>

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Optimise without a face

I’ve been playing around with the newly-released Squoosh, the spiritual successor to Jake’s SVGOMG. You can drag images into the browser window, and eyeball the changes that any optimisations might make.

On a project that Cassie is working on, it worked really well for optimising some JPEGs. But there were a few images that would require a bit more fine-grained control of the optimisations. Specifically, pictures with human faces in them.

I’ve written about this before. If there’s a human face in image, I open that image in a graphics editing tool like Photoshop, select everything but the face, and add a bit of blur. Because humans are hard-wired to focus on faces, we’ll notice any jaggy artifacts on a face, but we’re far less likely to notice jagginess in background imagery: walls, materials, clothing, etc.

On the face of it (hah!), a browser-based tool like Squoosh wouldn’t be able to optimise for faces, but then Cassie pointed out something really interesting…

When we were both at FFConf on Friday, there was a great talk by Eleanor Haproff on machine learning with JavaScript. It turns out there are plenty of smart toolkits out there, and one of them is facial recognition. So I wonder if it’s possible to build an in-browser tool with this workflow:

  • Drag or upload an image into the browser window,
  • A facial recognition algorithm finds any faces in the image,
  • Those portions of the image remain crisp,
  • The rest of the image gets a slight blur,
  • Download the optimised image.

Maybe the selecting/blurring part would need canvas? I don’t know.

Anyway, I thought this was a brilliant bit of synthesis from Cassie, and now I’ve got two questions:

  1. Does this exist yet? And, if not,
  2. Does anyone want to try building it?

Webmentions at Indie Web Camp Berlin

I was in Berlin for most of last week, and every day was packed with activity:

By the time I got back to Brighton, my brain was full …just in time for FF Conf.

All of the events were very different, but equally enjoyable. It was also quite nice to just attend events without speaking at them.

Indie Web Camp Berlin was terrific. There was an excellent turnout, and once again, I found that the format was just right: a day of discussions (BarCamp style) followed by a day of doing (coding, designing, hacking). I got very inspired on the first day, so I was raring to go on the second.

What I like to do on the second day is try to complete two tasks; one that’s fairly straightforward, and one that’s a bit tougher. That way, when it comes time to demo at the end of the day, even if I haven’t managed to complete the tougher one, I’ll still be able to demo the simpler one.

In this case, the tougher one was also tricky to demo. It involved a lot of invisible behind-the-scenes plumbing. I was tweaking my webmention endpoint (stop sniggering—tweaking your endpoint is no laughing matter).

Up until now, I could handle straightforward webmentions, and I could handle updates (if I receive more than one webmention from the same link, I check it each time). But I needed to also handle deletions.

The spec is quite clear on this. A 404 isn’t enough to trigger a deletion—that might be a temporary state. But a status of 410 Gone indicates that a resource was once here but has since been deliberately removed. In that situation, any stored webmentions for that link should also be removed.

Anyway, I think I got it working, but it’s tricky to test and even trickier to demo. “Not to worry”, I thought, “I’ve always got my simpler task.”

For that, I chose to add a little map to my homepage showing the last location I published something from. I’ve been geotagging all my content for years (journal entries, notes, links, articles), but not really doing anything with that data. This is a first step to doing something interesting with many years of location data.

I’ve got it working now, but the demo gods really weren’t with me at Indie Web Camp. Both of my demos failed. The webmention demo failed quite embarrassingly.

As well as handling deletions, I also wanted to handle updates where a URL that once linked to a post of mine no longer does. Just to be clear, the URL still exists—it’s not 404 or 410—but it has been updated to remove the original link back to one of my posts. I know this sounds like another very theoretical situation, but I’ve actually got an example of it on my very first webmention test post from five years ago. Believe it or not, there’s an escort agency in Nottingham that’s using webmention as a vector for spam. They post something that does link to my test post, send a webmention, and then remove the link to my test post. I almost admire their dedication.

Still, I wanted to foil this particular situation so I thought I had updated my code to handle it. Alas, when it came time to demo this, I was using someone else’s computer, and in my attempt to right-click and copy the URL of the spam link …I accidentally triggered it. In front of a room full of people. It was midly NSFW, but more worryingly, a potential Code Of Conduct violation. I’m very sorry about that.

Apart from the humiliating demo, I thoroughly enjoyed Indie Web Camp, and I’m going to keep adjusting my webmention endpoint. There was a terrific discussion around the ethical implications of storing webmentions, led by Sebastian, based on his epic post from earlier this year.

We established early in the discussion that we weren’t going to try to solve legal questions—like GDPR “compliance”, which varies depending on which lawyer you talk to—but rather try to figure out what the right thing to do is.

Earlier that day, during the introductions, I quite happily showed webmentions in action on my site. I pointed out that my last blog post had received a response from another site, and because that response was marked up as an h-entry, I displayed it in full on my site. I thought this was all hunky-dory, but now this discussion around privacy made me question some inferences I was making:

  1. By receiving a webention in the first place, I was inferring a willingness for the link to be made public. That’s not necessarily true, as someone pointed out: a CMS could be automatically sending webmentions, which the author might be unaware of.
  2. If the linking post is marked up in h-entry, I was inferring a willingness for the content to be republished. Again, not necessarily true.

That second inferrence of mine—that publishing in a particular format somehow grants permissions—actually has an interesting precedent: Google AMP. Simply by including the Google AMP script on a web page, you are implicitly giving Google permission to store a complete copy of that page and serve it from their servers instead of sending people to your site. No terms and conditions. No checkbox ticked. No “I agree” button pressed.

Just sayin’.

Anyway, when it comes to my own processing of webmentions, I’m going to take some of the suggestions from the discussion on board. There are certain signals I could be looking for in the linking post:

  • Does it include a link to a licence?
  • Is there a restrictive robots.txt file?
  • Are there meta declarations that say noindex?

Each one of these could help to infer whether or not I should be publishing a webmention or not. I quickly realised that what we’re talking about here is an algorithm.

Despite its current usage to mean “magic”, an algorithm is a recipe. It’s a series of steps that contribute to a decision point. The problem is that, in the case of silos like Facebook or Instagram, the algorithms are secret (which probably contributes to their aura of magical thinking). If I’m going to write an algorithm that handles other people’s information, I don’t want to make that mistake. Whatever steps I end up codifying in my webmention endpoint, I’ll be sure to document them publicly.

Cerf rocks

After I wrote about digital preservation and the need to save everything, not just the so-called “important” stuff, Jason wrote a lovely piece with his own thoughts on the matter:

In order to write a history, you need evidence of what happened. When we talk about preserving the stuff we make on the web, it isn’t because we think a Facebook status update, or those GeoCities sites have such significance now. It’s because we can’t know.

In a timely coincidence, Vint Cerf also spoke about the importance of digital preservation:

When you think about the quantity of documentation from our daily lives that is captured in digital form, like our interactions by email, people’s tweets, and all of the world wide web, it’s clear that we stand to lose an awful lot of our history.

He warns of the dangers of rapidly-obsoleting file formats:

We are nonchalantly throwing all of our data into what could become an information black hole without realising it. We digitise things because we think we will preserve them, but what we don’t understand is that unless we take other steps, those digital versions may not be any better, and may even be worse, than the artefacts that we digitised.

It was a little weird that the Guardian headline refers to Vint Cerf as “Google boss”. On the BBC he’s labelled as “Google’s Vint Cerf”. Considering he’s one of the creators of the internet itself, it’s a bit like referring to Neil Armstrong as a NASA employee.

I have to say, I just love listening to him talk. He’s so smooth. I’m sure that the character of The Architect from The Matrix Reloaded is modelled on him.

Vint Cerf knows a thing or two about long-term thinking when it comes to data formats. He has written many RFCs for the IETF (my favourite being RFC 2468). Back in 1969, he wrote RFC 20, proposing the ASCII format for network interchange. If you’ve ever used the keypress event in JavaScript and wondered why, for example, the number 13 corresponds to a carriage return, this is where all those numbers come from.

Last month, over 45 years after the RFC’s original publication, it became an official standard.

So when Vint Cerf warns about the dangers of digitising into file formats that could become unreadable, I think we should pay attention to him.

Celebrating CSS

Cascading Style Sheets turned 20 years old this week. Happy birthtime, CeeSusS!

Bruce interviewed Håkon about the creation of CSS, and it makes for fascinating reading. If you want to dig even deeper, here’s Håkon’s 1994 thesis comparing competing approaches to style sheets.

CSS gets a tough rap. I remember talking to Douglas Crockford about CSS. I’ll paraphrase his stance as “Kill it with fire!” To be fair, he was mostly talking about the lack of a decent layout system in CSS—something that’s only really getting remedied now.

Most of the flak directed at CSS comes from smart programmers, decrying its lack of power. As a declarative language, it lacks even the most basic features of even the simplest procedural language. How are serious programmers supposed to write their serious programmes with such a primitive feature set?

But I think this mindset misses out a crucial facet of understanding CSS: it’s not about us. By us, I mean professional web developers. And when I say it’s not about us, I mean it’s not only about us.

The web is for everyone. That doesn’t just mean that it’s for everyone to use—the web is for everyone to create. That means that the core building blocks of the web need to be learnable by everyone, not just programmers.

I get nervous when I see web browsers gaining powerful features that can only be accessed via a JavaScript API. Geolocation is one example: it doesn’t have any declarative equivalent to its JavaScript implementation. Counter-examples would be video and audio: you can use the JavaScript API to get exactly the behaviour you want, if you’ve got that level of knowledge …or you can use the video and audio elements if you’re okay with letting web browsers handle the complexity of display and playback.

I think that CSS hits a nice sweet spot, balancing learnability and power. I love the fact that every bit of CSS ever written comes down to the same basic pattern:

selector {
    property: value;
}

That’s it!

How amazing is it that one simple pattern can scale to encompass a whole wide world of visual design variety?

Think about the revolution that CSS has gone through in recent years: OOCSS, SMACSS, BEM …these are fundamentally new ways of approaching front-end development, and yet none of these approaches required any changes to be made to the CSS specification. The power and flexibility was already available within its simple selector-property-value pattern.

Mind you, that modularity was compromised when we got things like named animations; a pattern that breaks out of the encapsulation model of CSS. Variables in CSS also break out of the modularity pattern.

Personally, I don’t think there’s any reason to have variables in the CSS language; it’s enough to have them in pre-processing tools. Variables add enormous value for developers, and no value at all for end users. As long as developers can use variables—and they can, with Sass and LESS—I don’t think we need to further complicate CSS.

Bert Bos wrote an exhaustive list of design principles for web standards. There’s some crossover with Tim Berners-Lee’s principles of design, with ideas such as modularity and robustness. Personally, I think that Bert and Håkon did a pretty damn good job of balancing principles like learnability, extensibility, longevity, interoperability and a host of other factors while still producing something powerful enough to scale for the whole web.

There’s one important phrase I want to highlight in the abstract of the 20 year old CSS proposal:

The proposed scheme provides a simple mapping between HTML elements and presentation hints.

Hints.

Every line of CSS you write is a suggestion. You are not dictating how the HTML should be rendered; you are suggesting how the HTML should be rendered. I find that to be a very liberating and empowering idea.

My only regret is that—twenty years on from the birth of CSS—web browsers are killing the very idea of user stylesheets. Along with “view source”, this feature really drove home the idea that professional web developers are not the only ones who have a say in what gets rendered in web browsers …and that the web truly is for everyone.

rel="source"

Aral and his trusty sidekick Victor have taken up residency for a while at the Clearleft office in Middle Street while they work on their very exciting project. It’s nice having them around.

I got chatting to Aral about a markup pattern that’s become fairly prevalent since the rise of Github: linking to the source code for a website or project. You know, like when you see “fork me on Github” links.

We were talking about how it would be nice to have some machine-readable way of explicitly marking up those kind of links, whether they’re in the head of the document, or visible in the body. Sounds like a job for the rel attribute, I thought.

The rel attribute describes the relationship of the current document to the linked document. You can use it on the link element (in the head of your document) and the a element (in the body). The example that everyone is familiar with is rel=”stylesheet” when linking off to a CSS file—the linked document has the relationship of being a stylesheet for the current document.

The rel attribute could theoretically take a space-separated list of any values, just like the class attribute. In practice, there’s much more value in having everyone agree on which rel values should be used.

There used to be a page on the WHATWG site for listing rel values, but it tended to stagnate. So now the official registry for rel values is on the microformats wiki. That’s where you can see which values are recommended for use today and you can also brainstorm new ideas.

The benefit of having one centralised for this is that you can see if someone else has had the same idea as you. Then you can come to agreement on which value to use, so that everyone’s using the same vocabulary instead of just making stuff up.

It doesn’t look like there’s an existing value for the use case of linking to a document’s (or a project’s) source code so I’ve proposed rel=”source”.

Now I should document some use cases of people linking their site to its source code. It might be that wikis qualify as another use case: every “edit” button points to the source of the document in wiki markup.

If you have any thoughts on this pattern, or examples to add, please feel free to add them.

Defining the damn thang

Chris recently documented the results from his survey which asked:

Is it useful to distinguish between “web apps” and “web sites”?

His conclusion:

There is just nothing but questions, exemptions, and gray area.

This is something I wrote about a while back:

Like obscenity and brunch, web apps can be described but not defined.

The results of Chris’s poll are telling. The majority of people believe there is a difference between sites and apps …but nobody can agree on what it is. The comments make for interesting reading too. The more people chime in an attempt to define exactly what a “web app” is, the more it proves the point that the the term “web app” isn’t a useful word (in the sense that useful words should have an agreed-upon meaning).

Tyler Sticka makes a good point:

By this definition, web apps are just a subset of websites.

I like that. It avoids the false dichotomy that a product is either a site or an app.

But although it seems that the term “web app” can’t be defined, there are a lot of really smart people who still think it has some value.

I think Cennydd is right. I think the differences exist …but I also think we’re looking for those differences at the wrong scale. Rather than describing an entire product as either a website or an web app, I think it makes much more sense to distinguish between patterns.

Let’s take those two modifiers—behavioural and informational. But let’s apply them at the pattern level.

The “get stuff” sites that Jake describes will have a lot of informational patterns: how best to present a flow of text for reading, for example. Typography, contrast, whitespace; all of those attributes are important for an informational pattern.

The “do stuff” sites will probably have a lot of behavioural patterns: entering information or performing an action. Feedback, animation, speed; these are some of the possible attributes of a behavioural pattern.

But just about every product out there on the web contains a combination of both types of pattern. Like I said:

Is Wikipedia a website up until the point that I start editing an article? Are Twitter and Pinterest websites while I’m browsing through them but then flip into being web apps the moment that I post something?

Now you could make an arbitrary decision that any product with more than 50% informational patterns is a website, and any product with more than 50% behavioural patterns is a web app, but I don’t think that’s very useful.

Take a look at Brad’s collection of responsive patterns. Some of them are clearly informational (tables, images, etc.), while some of them are much more behavioural (carousels, notifications, etc.). But Brad doesn’t divide his collection into two, saying “Here are the patterns for websites” and “Here are the patterns for web apps.” That would be a dumb way to divide up his patterns, and I think it’s an equally dumb way to divide up the whole web.

What I’m getting at here is that, rather than trying to answer the question “what is a web app, anyway?”, I think it’s far more important to answer the other question I posed:

Why?

Why do you want to make that distinction? What benefit do you gain by arbitrarily dividing the entire web into two classes?

I think by making the distinction at the pattern level, that question starts to become a bit easier to answer. One possible answer is to do with the different skills involved.

For example, I know plenty of designers who are really, really good at informational patterns—they can lay out content in a beautiful, clear way. But they are less skilled when it comes to thinking through all the permutations involved in behavioural patterns—the “arrow of time” that’s part of so much interaction design. And vice-versa: a skilled interaction designer isn’t necessarily the best at old-skill knowledge of type, margins, and hierarchy. But both skillsets will be required on an almost every project on the web.

So I do believe there is value in distinguishing between behaviour and information …but I don’t believe there is value in trying to shoehorn entire products into just one of those categories. Making the distinction at the pattern level, though? That I can get behind.

Addendum

Incidentally, some of the respondents to Chris’s poll shared my feeling that the term “web app” was often used from a marketing perspective to make something sound more important and superior:

Perhaps it’s simply fashion. Perhaps “website” just sounds old-fashioned, and “web app” lends your product a more up-to-date, zingy feeling on par with the native apps available from the carefully-curated walled gardens of app stores.

Approaching things from the patterns perspective, I wonder if those same feelings of inferiority and superiority are driving the recent crop of behavioural patterns for informational content: parallaxy, snowfally, animation patterns are being applied on top of traditional informational patterns like hierarchy, measure, and art direction. I’m not sure that the juxtaposition is working that well. Taking the single interaction involved in long-form informational patterns (that interaction would be scrolling) and then using it as a trigger for all kinds of behavioural patterns feels …uncanny.

Parsing webmentions

Thanks to everyone who helped me test webmentions that I hacked together at Indie Web Camp last weekend.

Let me explain what web mentions are all about…

Basically, it’s an equivalent to pingback. Let’s say I write something here on adactio.com. Suppose that prompts you to write something in response on your own site. A web mention is a way for you to let me know that your response exists.

If you look in the head of any of my journal posts, you’ll see this link element:

<link rel="webmention" href="http://adactio.com/webmention.php" />

That’s my web mention endpoint: http://adactio.com/webmention.php …it’s kind of like a webhook: a URL that’s intended to be hit by machines rather than people. So when you publish your response to my post, you ping that URL with a POST request that sends two parameters:

  1. target: the URL of my post and
  2. source: the URL of your response.

Ideally your own CMS or blogging system would take care of doing the pinging, but until that’s more widely implemented, I’m providing this form at the end of each of my posts:

Either way, once you ping my web mention endpoint—discoverable through that link rel="webmention"—with those two parameters, I just need to confirm that your post does indeed contain a link to my post—by making a cURL request and parsing your source—and then I return a server response of 202 (Accepted).

Here’s the code for a minimum viable web mention in PHP.

That’s as far as I got at Indie Web Camp but it was enough for me to start collecting responses to posts.

Webmentions as links

The next step is to do something with the responses. After all, I’ve already got the source of each response from those cURL requests.

Barnaby has a written a nice straightforward microformats parser in PHP. I’m using that to check the cURLed source for any responses that have been marked up using h-entry. That’s one of the microformats 2 vocabularies—a much simpler way of writing structured content with microformats.

Aaron, Amber, and Barnaby all sent responses that were marked up with h-entry so now their responses appear in full.

Webmentions as comments

So there you have it. Comments are now open on every journal post on adactio.com …the only catch is that you have to write the comment on your own site. And if you want the content of your post to appear here (instead of just a link) then update your blog post template to include a handful of h-entry classes.

Feel free to use this post as a test. Mark up your blog with h-entry, write a post that links to this URL, and enter the URL of your post in the form below.

Battle for the planet of the APIs

Back in 2006, I gave a talk at dConstruct called The Joy Of API. It basically involved me geeking out for 45 minutes about how much fun you could have with APIs. This was the era of the mashup—taking data from different sources and scrunching them together to make something new and interesting. It was a good time to be a geek.

Anil Dash did an excellent job of describing that time period in his post The Web We Lost. It’s well worth a read—and his talk at The Berkman Istitute is well worth a listen. He described what the situation was like with APIs:

Five years ago, if you wanted to show content from one site or app on your own site or app, you could use a simple, documented format to do so, without requiring a business-development deal or contractual agreement between the sites. Thus, user experiences weren’t subject to the vagaries of the political battles between different companies, but instead were consistently based on the extensible architecture of the web itself.

Times have changed. These days, instead of seeing themselves as part of a wider web, online services see themselves as standalone entities.

So what happened?

Facebook happened.

I don’t mean that Facebook is the root of all evil. If anything, Facebook—a service that started out being based on exclusivity—has become more open over time. That’s the cause of many of its scandals; the mismatch in mental models that Facebook users have built up about how their data will be used versus Facebook’s plans to make that data more available.

No, I’m talking about Facebook as a role model; the template upon which new startups shape themselves.

In the web’s early days, AOL offered an alternative. “You don’t need that wild, chaotic lawless web”, it proclaimed. “We’ve got everything you need right here within our walled garden.”

Of course it didn’t work out for AOL. That proposition just didn’t scale, just like Yahoo’s initial model of maintaining a directory of websites just didn’t scale. The web grew so fast (and was so damn interesting) that no single company could possibly hope to compete with it. So companies stopped trying to compete with it. Instead they, quite rightly, saw themselves as being part of the web. That meant that they didn’t try to do everything. Instead, you built a service that did one thing really well—sharing photos, managing links, blogging—and if you needed to provide your users with some extra functionality, you used the best service available for that, usually through someone else’s API …just as you provided your API to them.

Then Facebook began to grow and grow. I remember the first time someone was showing me Facebook—it was Tantek of all people—I remember asking “But what is it for?” After all, Flickr was for photos, Delicious was for links, Dopplr was for travel. Facebook was for …everything …and nothing.

I just didn’t get it. It seemed crazy that a social network could grow so big just by offering …well, a big social network.

But it did grow. And grow. And grow. And suddenly the AOL business model didn’t seem so crazy anymore. It seemed ahead of its time.

Once Facebook had proven that it was possible to be the one-stop-shop for your user’s every need, that became the model to emulate. Startups stopped seeing themselves as just one part of a bigger web. Now they wanted to be the only service that their users would ever need …just like Facebook.

Seen from that perspective, the open flow of information via APIs—allowing data to flow porously between services—no longer seemed like such a good idea.

Not only have APIs been shut down—see, for example, Google’s shutdown of their Social Graph API—but even the simplest forms of representing structured data have been slashed and burned.

Twitter and Flickr used to markup their user profile pages with microformats. Your profile page would be marked up with hCard and if you had a link back to your own site, it include a rel=”me” attribute. Not any more.

Then there’s RSS.

During the Q&A of that 2006 dConstruct talk, somebody asked me about where they should start with providing an API; what’s the baseline? I pointed out that if they were already providing RSS feeds, they already had a kind of simple, read-only API.

Because there’s a standardised format—a list of items, each with a timestamp, a title, a description (maybe), and a link—once you can parse one RSS feed, you can parse them all. It’s kind of remarkable how many mashups can be created simply by using RSS. I remember at the first London Hackday, one of my favourite mashups simply took an RSS feed of the weather forecast for London and combined it with the RSS feed of upcoming ISS flypasts. The result: a Twitter bot that only tweeted when the International Space Station was overhead and the sky was clear. Brilliant!

Back then, anywhere you found a web page that listed a series of items, you’d expect to find a corresponding RSS feed: blog posts, uploaded photos, status updates, anything really.

That has changed.

Twitter used to provide an RSS feed that corresponded to my HTML timeline. Then they changed the URL of the RSS feed to make it part of the API (and therefore subject to the terms of use of the API). Then they removed RSS feeds entirely.

On the Salter Cane site, I want to display our band’s latest tweets. I used to be able to do that by just grabbing the corresponding RSS feed. Now I’d have to use the API, which is a lot more complex, involving all sorts of authentication gubbins. Even then, according to the terms of use, I wouldn’t be able to display my tweets the way I want to. Yes, how I want to display my own data on my own site is now dictated by Twitter.

Thanks to Jo Brodie I found an alternative service called Twitter RSS that gives me the RSS feed I need, ‘though it’s probably only a matter of time before that gets shuts down by Twitter.

Jo’s feelings about Twitter’s anti-RSS policy mirror my own:

I feel a pang of disappointment at the fact that it was really quite easy to use if you knew little about coding, and now it might be a bit harder to do what you easily did before.

That’s the thing. It’s not like RSS is a great format—it isn’t. But it’s just good enough and just versatile enough to enable non-programmers to make something cool. In that respect, it’s kind of like HTML.

The official line from Twitter is that RSS is “infrequently used today.” That’s the same justification that Google has given for shutting down Google Reader. It reminds of the joke about the shopkeeper responding to a request for something with “Oh, we don’t stock that—there’s no call for it. It’s funny though, you’re the fifth person to ask today.”

RSS is used a lot …but much of the usage is invisible:

RSS is plumbing. It’s used all over the place but you don’t notice it.

That’s from Brent Simmons, who penned a love letter to RSS:

If you subscribe to any podcasts, you use RSS. Flipboard and Twitter are RSS readers, even if it’s not obvious and they do other things besides.

He points out the many strengths of RSS, including its decentralisation:

It’s anti-monopolist. By design it creates a level playing field.

How foolish of us, therefore, that we ended up using Google Reader exclusively to power all our RSS consumption. We took something that was inherently decentralised and we locked it up into one provider. And now that provider is going to screw us over.

I hope we won’t make that mistake again. Because, believe me, RSS is far from dead just because Google and Twitter are threatened by it.

In a post called The True Web, Robin Sloan reiterates the strength of RSS:

It will dip and diminish, but will RSS ever go away? Nah. One of RSS’s weaknesses in its early days—its chaotic decentralized weirdness—has become, in its dotage, a surprising strength. RSS doesn’t route through a single leviathan’s servers. It lacks a kill switch.

I can understand why that power could be seen as a threat if what you are trying to do is force your users to consume their own data only the way that you see fit (and all in the name of “user experience”, I’m sure).

Returning to Anil’s description of the web we lost:

We get a generation of entrepreneurs encouraged to make more narrow-minded, web-hostile products like these because it continues to make a small number of wealthy people even more wealthy, instead of letting lots of people build innovative new opportunities for themselves on top of the web itself.

I think that the presence or absence of an RSS feed (whether I actually use it or not) is a good litmus test for how a service treats my data.

It might be that RSS is the canary in the coal mine for my data on the web.

If those services don’t trust me enough to give me an RSS feed, why should I trust them with my data?

Figuring out

A recent simplequiz over on HTML5 Doctor threw up some interesting semantic issues. Although the figure element wasn’t the main focus of the article, a lot of the comments were concerned with how it could and should be used.

This is a subject that has been covered before on HTML5 Doctor. It turns out that we may have been too restrictive in our use of the element, mistaking some descriptive text in the spec for proscriptive instruction:

The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc, that are referred to from the main content of the document, but that could, without affecting the flow of the document, be moved away from that primary content, e.g. to the side of the page, to dedicated pages, or to an appendix.

Steve and Bruce have been campaigning on the HTML mailing list to get the wording updated and clarified.

Meanwhile, in an unrelated semantic question, there was another HTML5 Doctor article a while back about quoting and citing with blockquote and its ilk.

The two questions come together beautifully in a blog post on the newly-redesigned A List Apart documenting this pattern for associating quotations and authorship:

<figure>
 <blockquote>It is the unofficial force—the Baker Street irregulars.</blockquote>
 <figcaption>Sherlock Holmes, <cite>Sign of Four</cite></figcaption>
</figure>

Although, unsurprisingly, I still take issue with the decision in HTML5 not to allow the cite element to apply to people. As I’ve said before we don’t have to accept this restriction:

Join me in a campaign of civil disobedience against the unnecessarily restrictive, backwards-incompatible change to the cite element.

In which case, we get this nice little pattern combining figure, blockquote, cite, and the hCard microformat, like this:

<figure>
 <blockquote>It is the unofficial force—the Baker Street irregulars.</blockquote>
 <figcaption class="vcard"><cite class="fn">Sherlock Holmes</cite>, <cite>Sign of Four</cite></figcaption>
</figure>

Or like this:

<figure>
 <blockquote>Join me in a campaign of civil disobedience against the unnecessarily restrictive, backwards-incompatible change to the cite element.</blockquote>
 <figcaption class="vcard"><cite class="fn">Jeremy Keith</cite>, <a href="http://24ways.org/2009/incite-a-riot/"><cite>Incite A Riot</cite></a></figcaption>
</figure>

Twilight

I went out to The Albert the other night to see Twilight Hotel play. There were really good, so after the show I bought their CD, When The Wolves Go Blind.

It was only when I got home that I realised that I had no device that could play Compact Discs. I play all my music on my iPod or iPhone connected to a speaker dock. And my computer is a Macbook Air …no disc drive. So I had to bring the CD into work with me, stick into my iMac and rip the songs from there.

It’s funny how format (or storage medium) obsolescence creeps up on you like that. I wonder how long it will be until I’m not using any kind of magnetic medium at all.

Facing the future

There is much hand-wringing in the media about the impending death of journalism, usually blamed on the rise of the web or more specifically bloggers. I’m sympathetic to their plight, but sometimes journalists are their own worst enemy, especially when they publish badly-researched articles that fuel moral panic with little regard for facts (if you’ve ever been in a newspaper article yourself, you’ll know that you’re lucky if they manage to spell your name right).

Exhibit A: an article published in The Guardian called How I became a Foursquare cyberstalker. Actually, the article isn’t nearly as bad as the comments, which take ignorance and narrow-mindedness to a new level.

Fortunately Ben is on hand to set the record straight. He wrote Concerning Foursquare and communicating privacy. Far from being a lesser form of writing, this blog post is more accurate than the article it is referencing, helping to balance the situation with a different perspective …and a nice big dollop of facts and research. Ben is actually quite kind to The Guardian article but, in my opinion, his own piece is more interesting and thoughtful.

Exhibit B: an article by Jeffrey Rosen in The New York Times called The Web Means the End of Forgetting. That’s a bold title. It’s also completely unsupported by the contents of the article. The article contains anecdotes about people getting into trouble about something they put on the web, and—even though the consequences for that action played out in the present—he talks about the permanent memory bank of the Web and writes:

The fact that the Internet never seems to forget is threatening, at an almost existential level, our ability to control our identities.

Bollocks. Or, to use the terminology of Wikipedia, citation needed.

Scott Rosenberg provides the necessary slapdown, asking Does the Web remember too much — or too little?:

Rosen presents his premise — that information once posted to the Web is permanent and indelible — as a given. But it’s highly debatable. In the near future, we are, I’d argue, far more likely to find ourselves trying to cope with the opposite problem: the Web “forgets” far too easily.

Exactly! I get irate whenever I hear the truism that the web never forgets presented without any supporting data. It’s right up there with eskimos have fifty words for snow and people in the middle ages thought that the world was flat. These falsehoods are irritating at best. At worst, as is the case with the myth of the never-forgetting web, the lie is downright dangerous. As Rosenberg puts it:

I’m a lot less worried about the Web that never forgets than I am about the Web that can’t remember.

That’s a real problem. And yet there’s no moral panic about the very real threat that, once digitised, our culture could be in more danger of being destroyed. I guess that story doesn’t sell papers.

This problem has a number of thorns. At the most basic level, there’s the issue of . I love the fact that the web makes it so easy for people to publish anything they want. I love that anybody else can easily link to what has been published. I hope that the people doing the publishing consider the commitment they are making by putting a linkable resource on the web.

As I’ve said before, a big part of this problem lies with the DNS system:

Domain names aren’t bought, they are rented. Nobody owns domain names, except ICANN.

I’m not saying that we should ditch domain names. But there’s something fundamentally flawed about a system that thinks about domain names in time periods as short as a year or two.

Then there’s the fact that so much of our data is entrusted to third-party sites. There’s no guarantee that those third-party sites give a rat’s ass about the long-term future of our data. Quite the opposite. The callous destruction of Geocities by Yahoo is a testament to how little our hopes and dreams mean to a company concerned with the bottom line.

We can host our own data but that isn’t quite as easy as it should be. And even with the best of intentions, it’s possible to have the canonical copies wiped from the web by accident. I’m very happy to see services like Vaultpress come on the scene:

Your WordPress site or blog is your connection to the world. But hosting issues, server errors, and hackers can wipe out in seconds what took years to build. VaultPress is here to protect what’s most important to you.

The Internet Archive is also doing a great job but Brewster Kahle shouldn’t have to shoulder the entire burden. Dave Winer has written about the idea of future-safe archives:

We need one or more institutions that can manage electronic trusts over very long periods of time.

The institutions need to be long-lived and have the technical know-how to manage static archives. The organizations should need the service themselves, so they would be likely to advance the art over time. And the cost should be minimized, so that the most people could do it.

The Library of Congress has its Digital Preservation effort. Dan Gillmor reports on the recent three-day gathering of the institution’s partners:

It’s what my technology friends call a non-trivial task, for all kinds of technical, social and legal reasons. But it’s about as important for our future as anything I can imagine. We are creating vast amounts of information, and a lot of it is not just worth preserving but downright essential to save.

There’s an even longer-term problem with digital preservation. The very formats that we use to store our most treasured memories can become obsolete over time. This goes to the very heart of why standards such as HTML—the format I’m betting on—are so important.

Mark Pilgrim wrote about the problem of format obsolescence back in 2006. I found his experiences echoed more recently by Paul Glister, author of the superb Centauri Dreams, one of my favourite websites. He usually concerns himself with challenges on an even longer timescale, like the construction of a feasible means of interstellar travel but he gives a welcome long zoom perspective on digital preservation in Burying the Digital Genome, pointing to a project called PLANETS: Preservation and Long-term Access Through Networked Services.

Their plan involves the storage, not just of data, but of data formats such as JPEG and PDF: the equivalent of a Rosetta stone for our current age. A box containing format-decoding documentation has been buried in a bunker under the Swiss Alps. That’s a good start.

David Eagleman recently gave a talk for The Long Now Foundation entitled Six Easy Steps to Avert the Collapse of Civilization. Step two is Don’t lose things:

As proved by the destruction of the Alexandria Library and of the literature of Mayans and Minoans, “knowledge is hard won but easily lost.”

Long Now: Six Easy Steps to Avert the Collapse of Civilization on Huffduffer

I’m worried that we’re spending less and less time thinking about the long-term future of our data, our culture, and ultimately, our civilisation. Currently we are preoccupied with the real-time web: Twitter, Foursquare, Facebook …all services concerned with what’s happening right here, right now. The Long Now Foundation and Tau Zero Foundation offer a much-needed sense of perspective.

As with that other great challenge of our time—the alteration of our biosphere through climate change—the first step to confronting the destruction of our collective digital knowledge must be to think in terms greater than the local and the present.

Making Workshops for the Web

The latest Clearleft offering is Workshops for the Web. It made sense to move our workshop offerings out of the Clearleft site—where they were kind of distracting from the main message of the company—and give them their own home, just like our other events, dConstruct and UX London.

As well as the range of workshops that can be booked privately at any time, there’s a schedule of upcoming public workshops for 2010:

  1. CSS3 Wizardry on January 29th,
  2. Copywriting for the Web on March 5th,
  3. HTML5 for Web Designers on April 23rd,
  4. UX Fundamentals on June 11th and
  5. Usability Testing on July 16th.

The next workshop, CSS3 Wizardry with Rich and Nat, promises to be packed full of cutting-edge front-end techniques. Book a place if you want to have CSS3 kung-fu injected into your brainstem.

Visual Design

I’m pretty pleased with how the site turned out. When I began designing it initially, I thought I would give it a sort of Russian constructivist feeling: the title Workshops for the Web made me think of an international workers movement. I started researching political propaganda posters, beginning with the book Revolutionary Tides.

Revolutionary Tides

There’s also some fantastic propaganda material in The National Archives (and I just love the modern twist of World War Three propaganda posters). I found a treasure trove of images of American working life in the Flickr Commons collection from The Library of Congress. I started gathering these sources together and distilling some of the common components such as bold colours and diagonal lines.

Workers of the web: unite!

This was when Jon was working as an intern at Clearleft. I enlisted his help in brainstorming some ideas and he came up with some great stuff—like using Soviet space-race imagery—and we played around with proof-of-concept ideas for creating diagonal backgrounds using CSS3 transforms.

But it never really came together for me. Much as I loved the Russian constructivist propaganda angle, I ditched it and started from scratch.

IA

I scribbled down a page description diagram describing what the site needed to communicate in order of importance:

  1. The name of the site.
  2. A positioning statement.
  3. The next workshop.
  4. Other upcoming workshops.
  5. A list of all workshops available.
  6. A way of getting in touch.

The hierarchy for an individual workshop page looked pretty similar:

  1. The title of the workshop.
  2. The date of the workshop.
  3. The location of the workshop.
  4. The price of the workshop.
  5. Details of the workshop.

It was clear that the page needed to quickly answer some basic questions: what? where? how much?

I started marking up the answers to those questions from top to bottom. That’s when it started to come together. Working with markup and CSS in the browser felt more productive than any of the sketching I had done in Photoshop. I started really sweating the typography …to the extent that I decided that even the logotype should be created with “live” text rather than an image.

Build

From the start, I knew that I wanted the site to be a self-describing example of the technologies taught in the workshops. The site is built in HTML5, making good use of the new structural elements and the powerful outline algorithm. Marking up an events site with the hCalendar microformat was a no-brainer. There are hCards a-plenty too.

CSS3 nth-child selectors came in very handy and media queries are, quite simply, the bee’s knees when it comes to building a flexible site: just a few declarations allowed me to make sure the liquid layout could be optimised for different ranges of viewport size.

Workshops for the Web homepage Workshops for the Web homepage

Given the audience of the site, I could be fairly certain that Internet Explorer 6 wouldn’t be much of a hindrance. As it turns out, everything looks more or less okay even in that crappy browser. It looks different, of course, but then do websites need to look exactly the same in every browser?

Right before launch, Paul took a shot at tweaking the visual design, adding a bit more contrast and separation on the homepage with some horizontal banding. That’s a visual element that I had been subconsciously avoiding, probably because it’s already used on some of our other sites, but once it was added, it helped to emphasise the next upcoming workshop—the main purpose of the homepage.

Just because the site is live now doesn’t mean that I’ll stop working on it. I’d like to keep tweaking and evolving it. Maybe I’ll finally figure out a way of incorporating some elements of those great propaganda posters.

Propaganda

Microformation

It’s been a busy week for microformats.

Google announced that it was following in the footsteps of in indexing microformats and RDFa to display in search results. For now, it’s a subset of microformats— and —on a subset of websites, including the newly microformated Yelp. The list of approved sites will increase over time so if you’re already publishing structured contact and review information, let Google know about it.

But the other new announcement is equally important. After a lot of hard work, the is ready for use.

The what now? I hear you ask. Well, if you’ve been feeling hampered by the combination of the and design patterns, the value class pattern offers a few alternatives.

To my mind, that’s one of the greatest strengths of the value class pattern: it doesn’t offer one alternative, it allows authors to choose how they want to mark up their content. I think that’s one of the reasons why datetime values have proven to be such a sticking point up ‘till now. Concerns about semantics and accessibility really come down to the fact that, as an author, you had very little choice in how you could mark up a datetime value.

You could either present the datetime between the opening and closing tags of whatever element you were using:

<span class="dtstart">
 2009-06-05T20:00:00
</span>

…or you could put the value in the title attribute of the abbr element:

<abbr class="dtstart" title="2009-06-05T20:00:00">
 Friday, June 5th at 8pm
</abbr>

Those were your only options.

But now, with the value class pattern, all of the following are possible:

<span class="dtstart">
 <abbr class="value" title="2009-06-05">
  Friday, June 5th
 </abbr>
 at
 <abbr class="value" title="20:00">
  8pm
 </abbr>
</span>

<span class="dtstart">
 <span class="value-title" title="2009-06-05T20:00:00">
  Friday, June 5th at 8pm
 </span>
</span>

<span class="dtstart">
 <span class="value-title" title="2009-06-05">
  Friday, June 5th
 </span>
 at
 <span class="value-title" title="20:00">
  8pm
 </span>
</span>

<span class="dtstart">
 <span class="value-title" title="2009-06-05T20:00:00"> </span>
 Friday, June 5th at 8pm
</span>

Personally, I’ll probably use the first example. I like the idea of splitting up the date and time portions of a datetime value. I think there’s a big difference between putting a date string into the title attribute of an abbr element and putting a datetime string in there. In the past, when I argued that having an ISO date value in an abbreviation was semantic, accessible and internationalised, Mike Davies rightly accused me of using a strawman—the issue wasn’t about dates, it was about datetimes. That’s why I created the page on the microformats wiki; to disambiguate it as a subset of the larger .

Now, others might think that even using dates in combination with the abbr design pattern is semantically dodgy. That’s fine. They now have some other options they can use, thanks to the value-title subset of the value class pattern. Me? I don’t see myself using that. I’m especially not keen on the option to use an empty element. But I’m perfectly happy for other authors to go ahead and use that option. When it comes to writing, there are often no right or wrong answers, just personal preferences. That’s true whether it’s English, HTML, or any other language. As long as you use correct syntax and grammar, the details are up to you. You can choose semicolons or em-dashes when you’re writing English. You can choose abbr or value-title when you’re writing microformats.

The wiki page for the value class pattern doesn’t just list the options available to authors. It also explains them. That’s just as important. Head over there and read the document. I think you’ll agree that it’s an excellent example of clear, methodical writing.

The microformats wiki needs more pages like that. One of the biggest challenges facing microformats isn’t any particular technical problem; it’s trying to explain to willing HTML authors how to get up and running with microformats. Given Google’s recent announcement, there’ll probably be even more eager authors showing up, looking to sprinkle some extra semantics into their markup. We’ll be hanging out in the IRC channel, ready to answer any questions people might have, but I wish the wiki were laid out in a more self-explanatory way.

In the face of that challenge, the page for the value class pattern leads by example. Ben and Tantek have done a fantastic job. And it wouldn’t have been possible without the help and support of Bruce, James and Derek, those magnanimous giants of the accessibility community who offered help, support and data.