Tags: format



Saturday, December 26th, 2020

How Claude Shannon’s Information Theory Invented the Future | Quanta Magazine

Shannon is not exactly a household name. He never won a Nobel Prize, and he wasn’t a celebrity like Albert Einstein or Richard Feynman, either before or after his death in 2001. But more than 70 years ago, in a single groundbreaking paper, he laid the foundation for the entire communication infrastructure underlying the modern information age.

Tuesday, December 22nd, 2020

Ignore AMP · Jens Oliver Meiert

It started using the magic spell of prominent results page display to get authors to use it. Nothing is left of the original lure of raising awareness for web performance, and nothing convincing is there to confirm it was, indeed, a usable “web component framework.”

Wednesday, December 16th, 2020


I spent the last couple of weekends rolling out a new feature on The Session. It involves playing audio in a web page. No big deal these days, right? But the history involves some old file formats…

The first venerable format is ABC notation. File extension: .abc, mime type: text/vnd.abc. It’s an ingenious text format for musical notation using ASCII. The metadata of the piece of music is defined in JSON-like key/value pairs. Then the contents are encoded with letters: A, B, C, etc. Uppercase and lowercase denote different octaves. Numbers can be used for note lengths.

The format was created by Chris Walshaw in 1997 when dial-up was the norm. With ABC, people were able to swap tunes on email lists or bulletin boards without transferring weighty image or sound files. If you had ABC software on your computer, you could convert that lightweight text file into sheet music …or audio.

That brings me to the second old format: midi files. File extension: .mid, mime-type: audio/midi. Like ABC, it’s a lightweight format for encoding the instructions for music instead of the music itself.

Think of it like SVG: instead of storing the final pixels of an image, SVG stores the instructions for drawing the image instead. The instructions in a midi file are like “play this note for this long on this instrument.” Again, as with ABC, you need some software to turn the instructions into sound.

There was a time when lots of software could play midi files. Quicktime on the Mac, for example. You could even embed midi files in web pages. I mean literally embed them …with the embed element. No Geocities page was complete without an autoplaying midi file.

On The Session, people submit tunes in ABC format. Then, using the amazing ABCJS JavaScript library, the ABC is turned into SVG on the fly! For years I’ve also offered midi files, generated on the server from the ABC notation.

But times have changed. These days it’s hard to find software that plays midi files. Quicktime doesn’t do it anymore. And you’d need to go to the app store on iOS to find a midi file player. It’s time to phase out the midi files on The Session.

I still want to provide automatically-generated audio though. Fortunately ABCJS gives me a way to do this. But instead of using the old technology of midi files, it uses a more modern browser feature: the Web Audio API.

The end result sounds like a midi file, but the underlying technique is more like a synthesiser. There’s a separate mp3 file for each note. The JavaScript figures out how long each “sample” needs to be played for, strings them all together, and outputs them with Web Audio. So you’ve got cutting-edge browser technology recreating a much older file format. Paul Rosen—the creator of ABCJS—has a presentation explaining how it all works under the hood.

Not only is there a separate short mp3 file for each note in seven octaves, but if you want the sound of a different instrument, you need samples for all seven octaves in that instrument. They’re called soundfonts.

Paul provides soundfonts for ABCJS. It’s a repo that was forked from this repo from Benjamin Gleitzman. And here’s where it gets small worldy…

The reason why Benjamin has a repo of soundfonts is because he needed to create midi-like audio in the browser. He wanted to do this for a project on September 28th and 29th, 2013 …at Science Hack Day San Francisco!

I was there too—working on my own audio-related hack—and I remember the excellent (and winning) hack that Benjamin worked on. It was called Symphony of Satellites and it’s still online along with the promo video. Here’s Benjamin’s post-hackday write-up from seven years ago.

It’s rare that the worlds of the web and Irish music cross over. When I got to meet Paul—creator of ABCJS—at a web conference a couple of years ago it kind of blew my mind. Last weekend when I set out to dabble with a feature on The Session, I certainly didn’t expect to stumble on a connection to Science Hack Day! (Aside: the first Science Hack Day was ten years ago—yowzers!)

Anyway, I was able to get that audio playback working on The Session. Except for some weirdness on iOS that I had to fix. But that’s a hack for another day.

Tuesday, December 15th, 2020


I was very inspired by something Terence Eden wrote on his blog last year. A report from the AMP Advisory Committee Meeting:

I don’t like AMP. I think that Google’s Accelerated Mobile Pages are a bad idea, poorly executed, and almost-certainly anti-competitive.

So, I decided to join the AC (Advisory Committee) for AMP.

Like Terence, I’m not a fan of Google AMP—my initially positive reaction to it soured over time as it became clear that Google were blackmailing publishers by privileging AMP pages in Google Search. But all I ever did was bitch and moan about it on my website. Terence actually did something.

So this year I put myself forward as a candidate for the AMP advisory committee. I have no idea how the election process works (or who does the voting) but thanks to whoever voted for me. I’m now a member of the AMP advisory committee. If you look at that blog post announcing the election results, you’ll see the brief blurb from everyone who was voted in. Most of them are positively bullish on AMP. Mine is not:

Jeremy Keith is a writer and web developer dedicated to an open web. He is concerned that AMP is being unfairly privileged by Google’s search engine instead of competing on its own merits.

The good news is that main beef with AMP is already being dealt with. I wanted exactly what Terence said:

My recommendation is that Google stop requiring that organisations use Google’s proprietary mark-up in order to benefit from Google’s promotion.

That’s happening as of May of this year. Just as well—the AMP advisory committee have absolutely zero influence on Google search. I’m not sure how much influence we have at all really.

This is an interesting time for AMP …whatever AMP is.

See, that’s been a problem with Google AMP from the start. There are multiple defintions of what AMP is. At the outset, it seemed pretty straightforward. AMP is a format. It has a doctype and rules that you have to meet in order to be “valid” AMP. Part of that ruleset involved eschewing HTML elements like img and video in favour of web components like amp-img and amp-video.

That messaging changed over time. We were told that AMP is the collection of web components. If that’s the case, then I have no problem at all with AMP. People are free to use the components or not. And if the project produces performant accessible web components, then that’s great!

But right now it’s not at all clear which AMP people are talking about, even in the advisory committee. When we discuss improving AMP, do we mean the individual components or the set of rules that qualify an AMP page being “valid”?

The use-case for AMP-the-format (as opposed to AMP-the-library-of-components) was pretty clear. If you were a publisher and you wanted to appear in the top stories carousel in Google search, you had to publish using AMP. Just using the components wasn’t enough. Your pages had to be validated as AMP-the-format.

That’s no longer the case. From May, pages that are fast enough will qualify for the top stories carousel. What will publishers do then? Will they still maintain separate AMP-the-format pages? Time will tell.

I suspect publishers will ditch AMP-the-format, although it probably won’t happen overnight. I don’t think anyone likes being blackmailed by a search engine:

An engineer at a major news publication who asked not to be named because the publisher had not authorized an interview said Google’s size is what led publishers to use AMP.

The pre-rendering (along with the lightning bolt) that happens for AMP pages in Google search might be a reason for publishers to maintain their separate AMP-the-format pages. But I suspect publishers don’t actually think the benefits of pre-rendering outweigh the costs: pre-rendered AMP-the-format pages are served from Google’s servers with a Google URL. If anything, I think that publishers will look forward to having the best of both worlds—having their pages appear in the top stories carousel, but not having their pages hijacked by Google’s so-called-cache.

Does AMP-the-format even have a future without Google search propping it up? I hope not. I think it would make everything much clearer if AMP-the-format went away, leaving AMP-the-collection-of-components. We’d finally see these components being evaluated on their own merits—usefulness, performance, accessibility—without unfair interference.

So my role on the advisory committee so far has been to push for clarification on what we’re supposed to be advising on.

I think it’s good that I’m on the advisory committee, although I imagine my opinions could easily be be dismissed given my public record of dissent. I may well be fooling myself though, like those people who go to work at Facebook and try to justify it by saying they can accomplish more from inside than outside (or whatever else they tell themselves to sleep at night).

The topic I’ve volunteered to help with is somewhat existential in nature: what even is AMP? I’m happy to spend some time on that. I think it’ll be good for everyone to try to get that sorted, regardless about how you feel about the AMP project.

I have no intention of giving any of my unpaid labour towards the actual components themselves. I know AMP is theoretically open source now, but let’s face it, it’ll always be perceived as a Google-led project so Google can pay people to work on it.

That said, I’ve also recently joined a web components community group that Lea instigated. Remember she wrote that great blog post recently about the failed promise of web components? I’m not sure how much I can contribute to the group (maybe some meta-advice on the nature of good design principles?) but at the very least I can serve as a bridge between the community group and the AMP advisory committee.

After all, AMP is a collection of web components. Maybe.

Thursday, October 1st, 2020

Downloading from Google Fonts

If you’re using web fonts, there are good performance (and privacy) reasons for hosting your own font files. And fortunately, Google Fonts gives you that option. There’s a “Download family” button on every specimen page.

But if you go ahead and download a font family from Google Fonts, you’ll notice something a bit odd. The .zip file only contains .ttf files. You can serve those on the web, but it’s far from the best choice. Woff2 is far leaner in file size.

This means you need to manually convert the downloaded .ttf files into .woff or .woff2 files using something like Font Squirrel’s generator. That’s fine, but I’m curious as to why this step is necessary. Why doesn’t Google Fonts provide .woff or .woff2 files in the downloaded folder? After all, if you choose to use Google Fonts as a third-party hosting service for your fonts, it most definitely serves up the appropriate file formats.

I thought maybe it was something to do with the licensing. Maybe some licenses only allow for unmodified truetype files to be distributed? But I’ve looked at fonts with different licenses—some have Apache 2 licensing, some have Open Font licensing—and they’re all quite permissive and definitely allow for modification.

Maybe the thinking is that, if you’re hosting your own font files, then you know what you’re doing and you should be able to do your own file conversion and subsetting. But I’ve come across more than one website in the wild serving up .ttf files. And who can blame them? They want to host their own font files. They downloaded those files from Google Fonts. Why shouldn’t they assume that they’re good to go?

It’s all a bit strange. If anyone knows why Google Fonts only provides .ttf files for download, please let me know. In a pinch, I will also accept rampant speculation.

Trys also pointed out some weird default behaviour if you do let Google Fonts do the hosting for you. Specifically if it’s a variable font. Let’s say it’s a font with weight as a variable axis. You specify in advance which weights you’ll be using, and then it generates separate font files to serve for each different weight.

Doesn’t that defeat the whole point of using a variable font? I mean, I can see how it could result in smaller file sizes if you’re just using one or two weights, but isn’t half the fun of having a weight axis that you can go crazy with as many weights as you want and it’s all still one font file?

Like I said, it’s all very strange.

Thursday, September 24th, 2020

Performance and people

I was helping a client with a bit of a performance audit this week. I really, really enjoy this work. It’s such a nice opportunity to get my hands in the soil of a website, so to speak, and suggest changes that will have a measurable effect on the user’s experience.

Not only is web performance a user experience issue, it may well be the user experience issue. Page speed has a proven demonstrable direct effect on user experience (and revenue and customer satisfaction and whatever other metrics you’re using).

It struck me that there’s a continuum of performance challenges. On one end of the continuum, you’ve got technical issues. These can be solved with technical solutions. On the other end of the continuum, you’ve got human issues. These can be solved with discussions, agreement, empathy, and conversations (often dreaded or awkward).

I think that, as developers, we tend to gravitate towards the technical issues. That’s our safe space. But I suspect that bigger gains can be reaped by tackling the uncomfortable human issues.

This week, for example, I uncovered three performance issues. One was definitely technical. One was definitely human. One was halfway between.

The technical issue was with web fonts. It’s a lot of fun to dive into this aspect of web performance because quite often there’s some low-hanging fruit: a relatively simple technical fix that will boost the performance (or perceived performance) of a website. That might be through resource hints (using link rel=“preload” in the HTML) or adjusting the font loading (using font-display in the CSS) or even nerdier stuff like subsetting.

In this case, the issue was with the file format of the font itself. By switching to woff2, there were significant file size savings. And the great thing is that @font-face rules allow you to specify multiple file formats so you can still support older browsers that can’t handle woff2. A win all ‘round!

The performance issue that was right in the middle of the technical/human continuum was with images. At first glance it looked like a similar issue to the fonts. Some images were being served in the wrong formats. When I say “wrong”, I guess I mean inappropriate. A photographic image, for example, is probably going to best served as a JPG rather than a PNG.

But unlike the fonts, the images weren’t in the direct control of the developers. These images were coming from a Content Management System. And while there’s a certain amount of processing you can do on the server, a human still makes the decision about what file format they’re uploading.

I’ve seen this happen at Clearleft. We launched an event site with lean performant code, but then someone uploaded an image that’s megabytes in size. The solution in that case wasn’t technical. We realised there was a knowledge gap around image file formats—which, let’s face it, is kind of a techy topic that most normal people shouldn’t be expected to know.

But it was extremely gratifying to see that people were genuinely interested in knowing a bit more about choosing the right format for the right image. I was able to provide a few rules of thumb and point to free software for converting images. It empowered those people to feel more confident using the Content Management System.

It was a similar situation with the client site I was looking at this week. Nobody is uploading oversized images in order to deliberately make the site slower. They probably don’t realise the difference that image formats can make. By having a discussion and giving them some pointers, they’ll have more knowledge and the site will be faster. Another win all ‘round!

At the other end of continuum was an issue that wasn’t technical. From a technical point of view, there was just one teeny weeny little script. But that little script is Google Tag Manager which then calls many, many other scripts that are not so teeny weeny. Third party scripts …the bane of web performance!

In retrospect, it seems unbelievable that third-party JavaScript is even possible. I mean, putting arbitrary code—that can then inject even more arbitrary code—onto your website? That seems like a security nightmare!

Remember when I did a countdown of the top four web performance challenges? At the number one spot is other people’s JavaScript.

Now one technical solution would be to remove the Google Tag Manager script. But that’s probably not very practical—you’ll probably just piss off some other department. That said, if you can’t find out which department was responsible for adding the Google Tag Manager script in the first place, it might we well be an option to remove it and then wait and see who complains. If no one notices it’s gone, job done!

More realistically, there’s someone who’s added that Google Tag Manager script for their own valid reasons. You’ll need to talk to them and understand their needs.

Again, as with images uploaded in a Content Management System, they may not be aware of the performance problems caused by third-party scripts. You could try throwing numbers at them, but I think you get better results by telling the story of performance.

Use tools like Request Map Generator to help them visualise the impact that third-party scripts are having. Talk to them. More importantly, listen to them. Find out why those scripts are being requested. What are the outcomes they’re working towards? Can you offer an alternative way of providing the data they need?

I think many of us developers are intimidated or apprehensive about approaching people to have those conversations. But it’s necessary. And in its own way, it can be as rewarding as tinkering with code. If the end result is a faster website, then the work is definitely worth doing—whether it’s technical work or people work.

Personally, I just really enjoy working on anything that will end up improving a website’s performance, and by extension, the user experience. If you fancy working with me on your site, you should get in touch with Clearleft.

Wednesday, September 16th, 2020

Sophie Zhang and The Social Dilemma | Revue

I watched The Social Dilemma last night and to say it’s uneven would be like saying the Himalayas are a little bumpy.

I’m shocked at how appealing so many people find the idea that social networks are uniquely responsible for all of society’s ills.

This cartoon super villain view of the world strikes me as a kind of mirror image of the right-wing conspiracy theories which hold that a cabal of elites are manipulating every world event in secret. It is more than a little ironic that a film that warns incessantly about platforms using misinformation to stoke fear and outrage seems to exist only to stoke fear and outrage — while promoting a distorted view of how those platforms work along the way.

Monday, September 14th, 2020

Why Do We Interface?

A short web book on the past, present and future of interfaces, written in a snappy, chatty style.

From oral communication and storytelling 500,000 years ago to virtual reality today, the purpose of information interfaces has always been to communicate more quickly, more deeply, to foster relationships, to explore, to measure, to learn, to build knowledge, to entertain, and to create.

We interface precisely because we are human. Because we are intelligent, because we are social, because we are inquisitive and creative.

We design our interfaces and they in turn redefine what it means to be human.

Wednesday, September 9th, 2020

AVIF has landed - JakeArchibald.com

There’s a new image format on the browser block and it’s very performant indeed. Jake has all the details you didn’t ask for.

Saturday, April 25th, 2020


At the beginning of the year, Remy wrote about extracting Goodreads metadata so he could create his end-of-year reading list. More recently, Mark Llobrera wrote about how he created a visualisation of his reading history. In his case, he’s using JSON to store the information.

This kind of JSON storage is exactly what Tom Critchlow proposes in his post, Library JSON - A Proposal for a Decentralized Goodreads:

Thinking through building some kind of “web of books” I realized that we could use something similar to RSS to build a kind of decentralized GoodReads powered by indie sites and an underlying easy to parse format.

His proposal looks kind of similar to what Mark came up with. There’s a title, an author, an image, and some kind of date for when you started and/or finished reading the book.

Matt then points out that RSS gets close to the data format being suggested and asks how about using RSS?:

Rather than inventing a new format, my suggestion is that this is RSS plus an extension to deal with books. This is analogous to how the podcast feeds are specified: they are RSS plus custom tags.

Like Matt, I’m in favour of re-using existing wheels rather than inventing new ones, mostly to avoid a 927 situation.

But all of these proposals—whether JSON or RSS—involve the creation of a separate file, and yet the information is originally published in HTML. Along the lines of Matt’s idea, I could imagine extending the h-entry collection of class names to allow for books (or films, or other media). It already handles images (with u-photo). I think the missing fields are the date-related ones: when you start and finish reading. Those fields are present in a different microformat, h-event in the form of dt-start and dt-end. Maybe they could be combined:

<article class="h-entry h-event h-review">
<h1 class="p-name p-item">Book title</h1>
<img class="u-photo" src="image.jpg" alt="Book cover.">
<p class="p-summary h-card">Book author</p>
<time class="dt-start" datetime="YYYY-MM-DD">Start date</time>
<time class="dt-end" datetime="YYYY-MM-DD">End date</time>
<div class="e-content">Remarks</div>
<data class="p-rating" value="5">★★★★★</data>
<time class="dt-published" datetime="YYYY-MM-DDThh:mm">Date of this post</time>

That markup is simultaneously a post (h-entry) and an event (h-event) and you can even throw in h-card for the book author (as well as h-review if you like to rate the books you read). It can be converted to RSS and also converted to .ics for calendars—those parsers are already out there. It’s ready for aggregation and it’s ready for visualisation.

I publish very minimal reading posts here on adactio.com. What little data is there isn’t very structured—I don’t even separate the book title from the author. But maybe I’ll have a little play around with turning these h-entries into combined h-entry/event posts.

Thursday, April 23rd, 2020

Better Image Optimization by Restricting the Color Index – Eric’s Archived Thoughts

A great little mini case-study from Eric—if you’re exporting transparent PNGs from a graphic design tool, double-check the colour-depth settings!

I’d been saving the PNGs with no bit depth restrictions, meaning the color table was holding space for 224 colors. That’s… a lot of colors, roughly 224 of which I wasn’t actually using.

Wednesday, April 8th, 2020

The Cuneiform Tablets of 2015 [PDF]

A 2015 paper by Long Tien Nguyen and Alan Kay with a proposal for digital preservation.

We discuss the problem of running today’s software decades,centuries, or even millennia into the future.

Friday, March 13th, 2020

Rise of the Digital Fonts

A history of typesetting from movable type to variable fonts.

Thursday, February 20th, 2020

“Let us Calculate!”: Leibniz, Llull, and the Computational Imagination – The Public Domain Review

The characteristica universalis and the calculus racionator of Leibniz.

Emma Willard’s Maps of Time

The beautiful 19th century data visualisations of Emma Willard unfold in this immersive piece by Susan Schulten.

Wednesday, January 22nd, 2020

28c3: The Science of Insecurity - YouTube

I understand less than half of this great talk by Meredith L. Patterson, but it ticks all my boxes: Leibniz, Turing, Borges, and Postel’s Law.

(via Tim Berners-Lee)

28c3: The Science of Insecurity

Thursday, November 7th, 2019

Information mesh

Timelines of people, interfaces, technologies and more:

30 years of facts about the World Wide Web.

Tuesday, October 22nd, 2019

Saturday, September 21st, 2019

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retreive the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

All async functions have to have a name—no anonymous async functions allowed. I’m calling this one listPages, just like Remy is doing. I’m making the listPages function execute immediately:

(async function listPages() {

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name') {

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = new Object;

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.


My if statement and my for/in loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = new Object;
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

That’s the data collection part of the code. Now I’m going to take all that yummy information an output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = new Object;
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Friday, May 24th, 2019

The Bit Player

Ooh! A documentary on Claude Shannon—exciting!

I just finished reading A Mind At Play, the (very good) biography of Claude Shannon, so this film feels very timely.

Mixing contemporary interviews, archival film, animation and dialogue drawn from interviews conducted with Shannon himself, The Bit Player tells the story of an overlooked genius who revolutionized the world, but never lost his childlike curiosity.