Tags: ice

598

sparkline

Saturday, September 21st, 2019

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retreive the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<h1>Offline</h1>
<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">
</div>

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

All async functions have to have a name—no anonymous async functions allowed. I’m calling this one listPages, just like Remy is doing. I’m making the listPages function execute immediately:

(async function listPages() {
...
})();

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {
...
}

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name') {
...
}

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = new Object;

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.

browsingHistory.push(data);

My if statement and my for/in loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = new Object;
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
    browsingHistory.push(data);
  }
}

That’s the data collection part of the code. Now I’m going to take all that yummy information an output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {
...
}

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

<script>
(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = new Object;
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
      browsingHistory.push(data);
    }
  }
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    });
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
    });
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);
  }
})();
</script>

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Friday, September 20th, 2019

Frank Chimero · Tweenage Computing

Frank yearns for just-in-time computing:

With each year that goes by, it feels like less and less is happening on the device itself. And the longer our work maintains its current form (writing documents, updating spreadsheets, using web apps, responding to emails, monitoring chat, drawing rectangles), the more unnecessary high-end computing seems. Who needs multiple computers when I only need half of one?

Saturday, September 7th, 2019

samuelgoto/sms-receiver: phone number verification

An interesting proposal to allow websites to detect certain SMS messages. The UX implications are fascinating.

Friday, September 6th, 2019

Getting started

I got an email recently from a young person looking to get into web development. They wanted to know what languages they should start with, whether they should a Mac or a Windows PC, and what some places to learn from.

I wrote back, saying this about languages:

For web development, start with HTML, then CSS, then JavaScript (and don’t move on to JavaScript too quickly—really get to grips with HTML and CSS first).

And this is what I said about hardware and software:

It doesn’t matter whether you use a Mac or a Windows PC, as long as you’ve got an internet connection, some web browsers (Chrome, Firefox, for example) and a text editor. There are some very good free text editors available for Mac and PC:

For resources, I had a trawl through links I’ve tagged with “learning” and “html” and sent along some links to free online tutorials:

After sending that email, I figured that this list might be useful to anyone else looking to start out in web development. If you know of anyone in that situation, I hope this list might help.

Offline listings

This is brilliant technique by Remy!

If you’ve got a custom offline page that lists previously-visited pages (like I do on my site), you don’t have to choose between localStorage or IndexedDB—you can read the metadata straight from the HTML of the cached pages instead!

This seems forehead-smackingly obvious in hindsight. I’m totally stealing this.

Tuesday, September 3rd, 2019

Bottom Navigation Pattern On Mobile Web Pages: A Better Alternative? — Smashing Magazine

Making the case for moving your navigation to the bottom of the screen on mobile:

Phones are getting bigger, and some parts of the screen are easier to interact with than others. Having the hamburger menu at the top provides too big of an interaction cost, and we have a large number of amazing mobile app designs that utilize the bottom part of the screen. Maybe it’s time for the web design world to start using these ideas on websites as well?

Tuesday, August 27th, 2019

Voice User Interface Design by Cheryl Platz

Cheryl Platz is speaking at An Event Apart Chicago. Her inaugural An Event Apart presentation is all about voice interfaces, and I’m going to attempt to liveblog it…

Why make a voice interface?

Successful voice interfaces aren’t necessarily solving new problems. They’re used to solve problems that other devices have already solved. Think about kitchen timers. There are lots of ways to set a timer. Your oven might have one. Your phone has one. Why use a $200 device to solve this mundane problem? Same goes for listening to music, news, and weather.

People are using voice interfaces for solving ordinary problems. Why? Context matters. If you’re carrying a toddler, then setting a kitchen timer can be tricky so a voice-activated timer is quite appealing. But why is voice is happening now?

Humans have been developing the art of conversation for thousands of years. It’s one of the first skills we learn. It’s deeply instinctual. Most humans use speach instinctively every day. You can’t necessarily say that about using a keyboard or a mouse.

Voice-based user interfaces are not new. Not just the idea—which we’ve seen in Star Trek—but the actual implementation. Bell Labs had Audrey back in 1952. It recognised ten words—the digits zero through nine. Why did it take so long to get to Alexa?

In the late 70s, DARPA issued a challenge to create a voice-activated system. Carnagie Mellon came up with Harpy (with a thousand word grammar). But none of the solutions could respond in real time. In conversation, we expect a break of no more than 200 or 300 milliseconds.

In the 1980s, computing power couldn’t keep up with voice technology, so progress kind of stopped. Time passed. Things finally started to catch up in the 90s with things like Dragon Naturally Speaking. But that was still about vocabulary, not grammar. By the 2000s, small grammars were starting to show up—starting an X-Box or pausing Netflix. In 2008, Google Voice Search arrived on the iPhone and natural language interaction began to arrive.

What makes natural language interactions so special? It requires minimal training because it uses the conversational muscles we’ve been working for a lifetime. It unlocks the ability to have more forgiving, less robotic conversations with devices. There might be ten different ways to set a timer.

Natural language interactions can also free us from “screen magnetism”—that tendency to stay on a device even when our original task is complete. Voice also enables fast and forgiving searches of huge catalogues without time spent typing or browsing. You can pick a needle straight out of a haystack.

Natural language interactions are excellent for older customers. These interfaces don’t intimidate people without dexterity, vision, or digital experience. Voice input often leads to more inclusive experiences. Many customers with visual or physical disabilities can’t use traditional graphical interfaces. Voice experiences throw open the door of opportunity for some people. However, voice experience can exclude people with speech difficulties.

Making the case for voice interfaces

There’s a misconception that you need to work at Amazon, Google, or Apple to work on a voice interface, or at least that you need to have a big product team. But Cheryl was able to make her first Alexa “skill” in a week. If you’re a web developer, you’re good to go. Your voice “interaction model” is just JSON.

How do you get your product team on board? Find the customers (and situations) you might have excluded with traditional input. Tell the stories of people whose hands are full, or who are vision impaired. You can also point to the adoption rate numbers for smart speakers.

You’ll need to show your scenario in context. Otherwise people will ask, “why can’t we just build an app for this?” Conduct research to demonstrate the appeal of a voice interface. Storyboarding is very useful for visualising the context of use and highlighting existing pain points.

Getting started with voice interfaces

You’ve got to understand how the technology works in order to adapt to how it fails. Here are a few basic concepts.

Utterance. A word, phrase, or sentence spoken by a customer. This is the true form of what the customer provides.

Intent. This is the meaning behind a customer’s request. This is an important distinction because one intent could have thousands of different utterances.

Prompt. The text of a system response that will be provided to a customer. The audio version of a prompt, if needed, is generated separately using text to speech.

Grammar. A finite set of expected utterances. It’s a list. Usually, each entry in a grammar is paired with an intent. Many interfaces start out as being simple grammars before moving on to a machine-learning model later once the concept has been proven.

Here’s the general idea with “artificial intelligence”…

There’s a human with a core intent to do something in the real world, like knowing when the cookies in the oven are done. This is translated into an intent like, “set a 15 minute timer.” That’s the utterance that’s translated into a string. But it hasn’t yet been parsed as language. That string is passed into a natural language understanding system. What comes is a data structure that represents the customers goal e.g. intent=timer; duration=15 minutes. That’s sent to the business logic where a timer is actually step. For a good voice interface, you also want to send back a response e.g. “setting timer for 15 minutes starting now.”

That seems simple enough, right? What’s so hard about designing for voice?

Natural language interfaces are a form of artifical intelligence so it’s not deterministic. There’s a lot of ruling out false positives. Unlike graphical interfaces, voice interfaces are driven by probability.

How do you turn a sound wave into an understandable instruction? It’s a lot like teaching a child. You feed a lot of data into a statistical model. That’s how machine learning works. It’s a probability game. That’s where it gets interesting for design—given a bunch of possible options, we need to use context to zero in on the most correct choice. This is where confidence ratings come in: the system will return the probability that a response is correct. Effectively, the system is telling you how sure or not it is about possible results. If the customer makes a request in an unusual or unexpected way, our system is likely to guess incorrectly. That’s because the system is being given something new.

Designing a conversation is relatively straightforward. But 80% of your voice design time will be spent designing for what happens when things go wrong. In voice recognition, edge cases are front and centre.

Here’s another challenge. Interaction with most voice interfaces is part conversation, part performance. Most interactions are not private.

Humans don’t distinguish digital speech fom human speech. That means these devices are intrinsically social. Our brains our wired to try to extract social information, even form digital speech. See, for example, why it’s such a big question as to what gender a voice interface has.

Delivering a voice interface

Storyboards help depict the context of use. Sample dialogues are your new wireframes. These are little scripts that not only cover the happy path, but also your edge case. Then you reverse engineer from there.

Flow diagrams communicate customer states, but don’t use the actual text in them.

Prompt lists are your final deliverable.

Functional prototypes are really important for voice interfaces. You’ll learn the real way that customers will ask for things.

If you build a working prototype, you’ll be building two things: a natural language interaction model (often a JSON file) and custom business logic (in a programming language).

Eventually voice design will become a core competency, much like mobile, which was once separate.

Ask yourself what tasks your customers complete on your site that feel clunkly. Remember that voice desing is almost never about new scenarious. Start your journey into voice interfaces by tackling old problems in new, more inclusive ways.

May the voice be with you!

Monday, August 26th, 2019

A Walk In Hong Kong (Idle Words)

Maciej goes marching.

The protests are intentionally decentralized, using a jury-rigged combination of a popular message board, the group chat app Telegram, and in-person huddles at the protests.

This sounds like it shouldn’t possibly work, but the protesters are too young to know that it can’t work, so it works.

Opening up the AMP cache

I have a proposal that I think might alleviate some of the animosity around Google AMP. You can jump straight to the proposal or get some of the back story first…

The AMP format

Google AMP is exactly the kind of framework I’d like to get behind. Unlike most front-end frameworks, its components take a declarative approach—no knowledge of JavaScript required. I think Lea’s excellent Mavo is the only other major framework that takes this inclusive approach. All the configuration happens in markup, and all the styling happens in CSS. Excellent!

But I cannot get behind AMP.

Instead of competing on its own merits, AMP is unfairly propped up by the search engine of its parent company, Google. That makes it very hard to evaluate whether AMP is being used on its own merits. Instead, the evidence suggests that most publishers of AMP pages are doing so because they feel they have to, rather than because they want to. That’s a real shame, because as a library of web components, AMP seems pretty good. But there’s just no way to evaluate AMP-the-format without taking into account AMP-the-ecosystem.

The AMP ecosystem

Google AMP ostensibly exists to make the web faster. Initially the focus was specifically on mobile performance, but that distinction has since fallen by the wayside. The idea is that by using AMP’s web components, your pages will be speedy. Though, as Andy Davies points out, this isn’t always the case:

This is where I get confused… https://independent.co.uk only have an AMP site yet it’s performance is awful from a user perspective - isn’t AMP supposed to prevent this?

See also: Google AMP lowered our page speed, and there’s no choice but to use it:

According to Google’s own Page Speed Insights audit (which Google recommends to check your performance), the AMP version of articles got an average performance score of 87. The non-AMP versions? 95.

Publishers who already have fast web pages—like The Guardian—are still compelled to make AMP versions of their stories because of the search benefits reserved for AMP. As Terence Eden reported from a meeting of the AMP advisory committee:

We heard, several times, that publishers don’t like AMP. They feel forced to use it because otherwise they don’t get into Google’s news carousel — right at the top of the search results.

Some people felt aggrieved that all the hard work they’d done to speed up their sites was for nothing.

The Google AMP team are at pains to point out that AMP is not a ranking factor in search. That’s true. But it is unfairly privileged in other ways. Only AMP pages can appear in the Top Stories carousel …which appears above any other search results. As I’ve said before:

Now, if you were to ask any right-thinking person whether they think having their page appear right at the top of a list of search results would be considered preferential treatment, I think they would say hell, yes! This is the only reason why The Guardian, for instance, even have AMP versions of their content—it’s not for the performance benefits (their non-AMP pages are faster); it’s for that prime real estate in the carousel.

From A letter about Google AMP:

Content that “opts in” to AMP and the associated hosting within Google’s domain is granted preferential search promotion, including (for news articles) a position above all other results.

That’s not the only way that AMP pages get preferential treatment. It turns out that the secret to the speed of AMP pages isn’t the web components. It’s the prerendering.

The AMP cache

If you’ve ever seen an AMP page in a list of search results, you’ll have noticed the little lightning icon. If you’ve ever tapped on that search result, you’ll have noticed that the page loads blazingly fast!

That’s not down to AMP-the-format, alas. That’s down to the fact that the page has been prerendered by Google before you even went to it. If any page were prerendered that way, it would load blazingly fast. But currently, this privilege is reserved for AMP pages only.

If, after tapping through to that AMP page, you looked at the address bar of your browser, you might have noticed something odd. Even though you might have thought you were visiting The Washington Post, or The New York Times, the URL of the (blazingly fast) page you’re looking at is still under Google’s domain. That’s because Google hosts any AMP pages that it prerenders.

Google calls this “the AMP cache”, but it would be better described as “AMP hosting”. The web page sent down the wire is hosted on Google’s domain.

Here’s that AMP letter again:

When a user navigates from Google to a piece of content Google has recommended, they are, unwittingly, remaining within Google’s ecosystem.

Through gritted teeth, I will refer to this as “the AMP cache”, because that’s what everyone else calls it. But make no mistake, Google is hosting—not caching—these pages.

But why host the pages on a Google domain? Why not prerender the original URLs?

Prerendering and privacy

Scott summed up the situation with AMP nicely:

The pitch I think site owners are hearing is: let us host your pages on our domain and we’ll promote them in search results AND preload them so they feel “instant.” To opt-in, build pages using this component syntax.

But perhaps we could de-couple the AMP format from the AMP cache.

That’s what Terence suggests:

My recommendation is that Google stop requiring that organisations use Google’s proprietary mark-up in order to benefit from Google’s promotion.

The AMP letter, too:

Instead of granting premium placement in search results only to AMP, provide the same perks to all pages that meet an objective, neutral performance criterion such as Speed Index.

Scott reiterates:

It’s been said before but it would be so good for the web if pages with a Lighthouse score over say, 90 could get into that top search result area, even if they’re not built using Google’s AMP framework. Feels wrong to have to rebuild/reproduce an already-fast site just for SEO.

This was also what I was calling for. But then Malte pointed out something that stumped me. Privacy.

Here’s the problem…

Let’s say Google do indeed prerender already-fast pages when they’re listed in search results. You, a search user, type something into Google. A list of results come back. Google begins pre-rendering some of them. But you don’t end up clicking through to those pages. Nonetheless, the servers those pages are hosted on have received a GET request coming from a Google search. Those publishers now know that a particular (cookied?) user could have clicked through to their site. That’s very different from knowing when someone has actually arrived at a particular site.

And that’s why Google host all the AMP pages that they prerender. Given the privacy implications of prerendering non-Google URLs, I must admit that I see their point.

Still, it’s a real shame to miss out on the speed benefit of prerendering:

Prerendering AMP documents leads to substantial improvements in page load times. Page load time can be measured in different ways, but they consistently show that prerendering lets users see the content they want faster. For now, only AMP can provide the privacy preserving prerendering needed for this speed benefit.

A modest proposal

Why is Google’s AMP cache just for AMP pages? (Y’know, apart from the obvious answer that it’s in the name.)

What if Google were allowed to host non-AMP pages? Google search could then prerender those pages just like it currently does for AMP pages. There would be no privacy leaks; everything would happen on the same domain—google.com or ampproject.org or whatever—just as currently happens with AMP pages.

Don’t get me wrong: I’m not suggesting that Google should make a 1:1 model of the web just to prerender search results. I think that the implementation would need to have two important requirements:

  1. Hosting needs to be opt-in.
  2. Only fast pages should be prerendered.

Opting in

Currently, by publishing a page using the AMP format, publishers give implicit approval to Google to host that page on Google’s servers and serve up this Google-hosted version from search results. This has always struck me as being legally iffy. I’ve looked in the AMP documentation to try to find any explicit granting of hosting permission (e.g. “By linking to this JavaScript file, you hereby give Google the right to serve up our copies of your content.”), but no luck. So even with the current situation, I think a clear opt-in for hosting would be beneficial.

This could be a meta element. Maybe something like:

<meta name="caches-allowed" content="google">

This would have the nice benefit of allowing comma-separated values:

<meta name="caches-allowed" content="google, yandex">

(The name is just a strawman, by the way—I’m not suggesting that this is what the final implementation would actually look like.)

If not a meta element, then perhaps this could be part of robots.txt? Although my feeling is that this needs to happen on a document-by-document basis rather than site-wide.

Many people will, quite rightly, never want Google—or anyone else—to host and serve up their content. That’s why it’s so important that this behaviour needs to be opt-in. It’s kind of appalling that the current hosting of AMP pages is opt-in-by-proxy-sort-of.

Criteria for prerendering

Which pages should be blessed with hosting and prerendering? The fast ones. That’s sorta the whole point of AMP. But right now, there’s a lot of resentment by people with already-fast websites who quite rightly feel they shouldn’t have to use the AMP format to benefit from the AMP ecosystem.

Page speed is already a ranking factor. It doesn’t seem like too much of a stretch to extend its benefits to hosting and prerendering. As mentioned above, there are already a few possible metrics to use:

  • Page Speed Index
  • Lighthouse
  • Web Page Test

Ah, but what if a page has good score when it’s indexed, but then gets worse afterwards? Not a problem! The version of the page that’s measured is the same version of the page that gets hosted and prerendered. Google can confidently say “This page is fast!” After all, they’re the ones serving up the page.

That does raise the question of how often Google should check back with the original URL to see if it has changed/worsened/improved. The answer to that question is however long it currently takes to check back in on AMP pages:

Each time a user accesses AMP content from the cache, the content is automatically updated, and the updated version is served to the next user once the content has been cached.

Issues

This proposal does not solve the problem with the address bar. You’d still find yourself looking at a page from The Washington Post or The New York Times (or adactio.com) but seeing a completely different URL in your browser. That’s not good, for all the reasons outlined in the AMP letter.

In fact, this proposal could potentially make the situation worse. It would allow even more sites to be impersonated by Google’s URLs. Where currently only AMP pages are bad actors in terms of URL confusion, opening up the AMP cache would allow equal opportunity URL confusion.

What I’m suggesting is definitely not a long-term solution. The long-term solutions currently being investigated are technically tricky and will take quite a while to come to fruition—web packages and signed exchanges. In the meantime, what I’m proposing is a stopgap solution that’s technically a lot simpler. But it won’t solve all the problems with AMP.

This proposal solves one problem—AMP pages being unfairly privileged in search results—but does nothing to solve the other, perhaps more serious problem: the erosion of site identity.

Measuring

Currently, Google can assess whether a page should be hosted and prerendered by checking to see if it’s a valid AMP page. That test would need to be widened to include a different measurement of performance, but those measurements already exist.

I can see how this assessment might not be as quick as checking for AMP validity. That might affect whether non-AMP pages could be measured quickly enough to end up in the Top Stories carousel, which is, by its nature, time-sensitive. But search results are not necessarily as time-sensitive. Let’s start there.

Assets

Currently, AMP pages can be prerendered without fetching anything other than the markup of the AMP page itself. All the CSS is inline. There are no initial requests for other kinds of content like images. That’s because there are no img elements on the page: authors must use amp-img instead. The image itself isn’t loaded until the user is on the page.

If the AMP cache were to be opened up to non-AMP pages, then any content required for prerendering would also need to be hosted on that same domain. Otherwise, there’s privacy leakage.

This definitely introduces an extra level of complexity. Paths to assets within the markup might need to be re-written to point to the Google-hosted equivalents. There would almost certainly need to be a limit on the number of assets allowed. Though, for performance, that’s no bad thing.

Make no mistake, figuring out what to do about assets—style sheets, scripts, and images—is very challenging indeed. Luckily, there are very smart people on the Google AMP team. If that brainpower were to focus on this problem, I am confident they could solve it.

Summary

  1. Prerendering of non-Google URLs is problematic for privacy reasons, so Google needs to be able to host pages in order to prerender them.
  2. Currently, that’s only done for pages using the AMP format.
  3. The AMP cache—and with it, prerendering—should be decoupled from the AMP format, and opened up to other fast web pages.

There will be technical challenges, but hopefully nothing insurmountable.

I honestly can’t see what Google have to lose here. If their goal is genuinely to reward fast pages, then opening up their AMP cache to fast non-AMP pages will actively encourage people to make fast web pages (without having to switch over to the AMP format).

I’ve deliberately kept the details vague—what the opt-in should look like; what the speed measurement should be; how to handle assets—I’m sure smarter folks than me can figure that stuff out.

I would really like to know what other people think about this proposal. Obviously, I’d love to hear from members of the Google AMP team. But I’d also love to hear from publishers. And I’d very much like to know what people in the web performance community think about this. (Write a blog post and send me a webmention.)

What am I missing here? What haven’t I thought of? What are the potential pitfalls (and are they any worse than the current acrimonious situation with Google AMP)?

I would really love it if someone with a fast website were in a position to say, “Hey Google, I’m giving you permission to host this page so that it can be prerendered.”

I would really love it if someone with a slow website could say, “Oh, shit! We’d better make our existing website faster or Google won’t host our pages for prerendering.”

And I would dearly love to finally be able to embrace AMP-the-format with a clear conscience. But as long as prerendering is joined at the hip to the AMP format, the injustice of the situation only harms the AMP project.

Google, open up the AMP cache.

Thursday, August 1st, 2019

Navigation preloads in service workers

There’s a feature in service workers called navigation preloads. It’s relatively recent, so it isn’t supported in every browser, but it’s still well worth using.

Here’s the problem it solves…

If someone makes a return visit to your site, and the service worker you installed on their machine isn’t active yet, the service worker boots up, and then executes its instructions. If those instructions say “fetch the page from the network”, then you’re basically telling the browser to do what it would’ve done anyway if there were no service worker installed. The only difference is that there’s been a slight delay because the service worker had to boot up first.

  1. The service worker activates.
  2. The service worker fetches the file.
  3. The service worker does something with the response.

It’s not a massive performance hit, but it’s still a bit annoying. It would be better if the service worker could boot up and still be requesting the page at the same time, like it would do if no service worker were present. That’s where navigation preloads come in.

  1. The service worker activates while simultaneously requesting the file.
  2. The service worker does something with the response.

Navigation preloads—like the name suggests—are only initiated when someone navigates to a URL on your site, either by following a link, or a bookmark, or by typing a URL directly into a browser. Navigation preloads don’t apply to requests made by a web page for things like images, style sheets, and scripts. By the time a request is made for one of those, the service worker is already up and running.

To enable navigation preloads, call the enable() method on registration.navigationPreload during the activate event in your service worker script. But first do a little feature detection to make sure registration.navigationPreload exists in this browser:

if (registration.navigationPreload) {
  addEventListener('activate', activateEvent => {
    activateEvent.waitUntil(
      registration.navigationPreload.enable()
    );
  });
}

If you’ve already got event listeners on the activate event, that’s absolutely fine: addEventListener isn’t exclusive—you can use it to assign multiple tasks to the same event.

Now you need to make use of navigation preloads when you’re responding to fetch events. So if your strategy is to look in the cache first, there’s probably no point enabling navigation preloads. But if your default strategy is to fetch a page from the network, this will help.

Let’s say your current strategy for handling page requests looks like this:

addEventListener('fetch', fetchEvent => {
  const request = fetchEvent.request;
  if (request.headers.get('Accept').includes('text/html')) {
    fetchEvent.respondWith(
      fetch(request)
      .then( responseFromFetch => {
        // maybe cache this response for later here.
        return responseFromFetch;
      })
      .catch( fetchError => {
        return caches.match(request)
        .then( responseFromCache => {
          return responseFromCache || caches.match('/offline');
        });
      })
    );
  }
});

That’s a fairly standard strategy: try the network first; if that doesn’t work, try the cache; as a last resort, show an offline page.

It’s that first step (“try the network first”) that can benefit from navigation preloads. If a preload request is already in flight, you’ll want to use that instead of firing off a new fetch request. Otherwise you’re making two requests for the same file.

To find out if a preload request is underway, you can check for the existence of the preloadResponse promise, which will be made available as a property of the fetch event you’re handling:

fetchEvent.preloadResponse

If that exists, you’ll want to use it instead of fetch(request).

if (fetchEvent.preloadResponse) {
  // do something with fetchEvent.preloadResponse
} else {
  // do something with fetch(request)
}

You could structure your code like this:

addEventListener('fetch', fetchEvent => {
  const request = fetchEvent.request;
  if (request.headers.get('Accept').includes('text/html')) {
    if (fetchEvent.preloadResponse) {
      fetchEvent.respondWith(
        fetchEvent.preloadResponse
        .then( responseFromPreload => {
          // maybe cache this response for later here.
          return responseFromPreload;
        })
        .catch( preloadError => {
          return caches.match(request)
          .then( responseFromCache => {
            return responseFromCache || caches.match('/offline');
          });
        })
      );
    } else {
      fetchEvent.respondWith(
        fetch(request)
        .then( responseFromFetch => {
          // maybe cache this response for later here.
          return responseFromFetch;
        })
        .catch( fetchError => {
          return caches.match(request)
          .then( responseFromCache => {
            return responseFromCache || caches.match('/offline');
          });
        })
      );
    }
  }
});

But that’s not very DRY. Your logic is identical, regardless of whether the response is coming from fetch(request) or from fetchEvent.preloadResponse. It would be better if you could minimise the amount of duplication.

One way of doing that is to abstract away the promise you’re going to use into a variable. Let’s call it retrieve. If a preload is underway, we’ll assign it to that variable:

let retrieve;
if (fetchEvent.preloadResponse) {
  retrieve = fetchEvent.preloadResponse;
}

If there is no preload happening (or this browser doesn’t support it), assign a regular fetch request to the retrieve variable:

let retrieve;
if (fetchEvent.preloadResponse) {
  retrieve = fetchEvent.preloadResponse;
} else {
  retrieve = fetch(request);
}

If you like, you can squash that into a ternary operator:

const retrieve = fetchEvent.preloadResponse ? fetchEvent.preloadResponse : fetch(request);

Use whichever syntax you find more readable.

Now you can apply the same logic, regardless of whether retrieve is a preload navigation or a fetch request:

addEventListener('fetch', fetchEvent => {
  const request = fetchEvent.request;
  if (request.headers.get('Accept').includes('text/html')) {
    const retrieve = fetchEvent.preloadResponse ? fetchEvent.preloadResponse : fetch(request);
    fetchEvent.respondWith(
      retrieve
      .then( responseFromRetrieve => {
        // maybe cache this response for later here.
       return responseFromRetrieve;
      })
      .catch( fetchError => {
        return caches.match(request)
        .then( responseFromCache => {
          return responseFromCache || caches.match('/offline');
        });
      })
    );
  }
});

I think that’s the least invasive way to update your existing service worker script to take advantage of navigation preloads.

Like I said, preload navigations can give a bit of a performance boost if you’re using a network-first strategy. That’s what I’m doing here on adactio.com and on thesession.org so I’ve updated their service workers to take advantage of navigation preloads. But on Resilient Web Design, which uses a cache-first strategy, there wouldn’t be much point enabling navigation preloads.

Jeff Posnick made this point in his write-up of bringing service workers to Google search:

Adding a service worker to your web app means inserting an additional piece of JavaScript that needs to be loaded and executed before your web app gets responses to its requests. If those responses end up coming from a local cache rather than from the network, then the overhead of running the service worker is usually negligible in comparison to the performance win from going cache-first. But if you know that your service worker always has to consult the network when handling navigation requests, using navigation preload is a crucial performance win.

Oh, and those browsers that don’t yet support navigation preloads? No problem. It’s a progressive enhancement. Everything still works just like it did before. And having a service worker on your site in the first place is itself a progressive enhancement. So enabling navigation preloads is like a progressive enhancement within a progressive enhancement. It’s progressive enhancements all the way down!

By the way, if all of this service worker stuff sounds like gibberish, but you wish you understood it, I think my book, Going Offline, will prove quite valuable.

Wednesday, July 3rd, 2019

You are not connected to the Internet

This is a very cute offline page. Ali Spittel has written up how it was made too.

Tuesday, July 2nd, 2019

The trimCache function in Going Offline …again

It seems that some code that I wrote in Going Offline is haunted. It’s the trimCache function.

First, there was the issue of a typo. Or maybe it’s more of a brainfart than a typo, but either way, there’s a mistake in the syntax that was published in the book.

Now it turns out that there’s also a problem with my logic.

To recap, this is a function that takes two arguments: the name of a cache, and the maximum number of items that cache should hold.

function trimCache(cacheName, maxItems) {

First, we open up the cache:

caches.open(cacheName)
.then( cache => {

Then, we get the items (keys) in that cache:

cache.keys()
.then(keys => {

Now we compare the number of items (keys.length) to the maximum number of items allowed:

if (keys.length > maxItems) {

If there are too many items, delete the first item in the cache—that should be the oldest item:

cache.delete(keys[0])

And then run the function again:

.then(
    trimCache(cacheName, maxItems)
);

A-ha! See the problem?

Neither did I.

It turns out that, even though I’m using then, the function will be invoked immediately, instead of waiting until the first item has been deleted.

Trys helped me understand what was going on by making a useful analogy. You know when you use setTimeout, you can’t put a function—complete with parentheses—as the first argument?

window.setTimeout(doSomething(someValue), 1000);

In that example, doSomething(someValue) will be invoked immediately—not after 1000 milliseconds. Instead, you need to create an anonymous function like this:

window.setTimeout( function() {
    doSomething(someValue)
}, 1000);

Well, it’s the same in my trimCache function. Instead of this:

cache.delete(keys[0])
.then(
    trimCache(cacheName, maxItems)
);

I need to do this:

cache.delete(keys[0])
.then( function() {
    trimCache(cacheName, maxItems)
});

Or, if you prefer the more modern arrow function syntax:

cache.delete(keys[0])
.then( () => {
    trimCache(cacheName, maxItems)
});

Either way, I have to wrap the recursive function call in an anonymous function.

Here’s a gist with the updated trimCache function.

What’s annoying is that this mistake wasn’t throwing an error. Instead, it was causing a performance problem. I’m using this pattern right here on my own site, and whenever my cache of pages or images gets too big, the trimCaches function would get called …and then wouldn’t stop running.

I’m very glad that—witht the help of Trys at last week’s Homebrew Website Club Brighton—I was finally able to get to the bottom of this. If you’re using the trimCache function in your service worker, please update the code accordingly.

Management regrets the error.

Monday, June 24th, 2019

Am I cached or not?

When I was writing about the lie-fi strategy I’ve added to adactio.com, I finished with this thought:

What I’d really like is some way to know—on the client side—whether or not the currently-loaded page came from a cache or from a network. Then I could add some kind of interface element that says, “Hey, this page might be stale—click here if you want to check for a fresher version.”

Trys heard my plea, and came up with a very clever technique to alter the HTML of a page when it’s put into a cache.

It’s a function that reads the response body stream in, returning a new stream. Whilst reading the stream, it searches for the character codes that make up: <html. If it finds them, it tacks on a data-cached attribute.

Nice!

But then I was discussing this issue with Tantek and Aaron late one night after Indie Web Camp Düsseldorf. I realised that I might have another potential solution that doesn’t involve the service worker at all.

Caveat: this will only work for pages that have some kind of server-side generation. This won’t work for static sites.

In my case, pages are generated by PHP. I’m not doing a database lookup every time you request a page—I’ve got a server-side cache of posts, for example—but there is a little bit of assembly done for every request: get the header from here; get the main content from over there; get the footer; put them all together into a single page and serve that up.

This means I can add a timestamp to the page (using PHP). I can mark the moment that it was served up. Then I can use JavaScript on the client side to compare that timestamp to the current time.

I’ve published the code as a gist.

In a script element on each page, I have this bit of coducken:

var serverTimestamp = <?php echo time(); ?>;

Now the JavaScript variable serverTimestamp holds the timestamp that the page was generated. When the page is put in the cache, this won’t change. This number should be the number of seconds since January 1st, 1970 in the UTC timezone (that’s what my server’s timezone is set to).

Starting with JavaScript’s Date object, I use a caravan of methods like toUTCString() and getTime() to end up with a variable called clientTimestamp. This will give the current number of seconds since January 1st, 1970, regardless of whether the page is coming from the server or from the cache.

var localDate = new Date();
var localUTCString = localDate.toUTCString();
var UTCDate = new Date(localUTCString);
var clientTimestamp = UTCDate.getTime() / 1000;

Then I compare the two and see if there’s a discrepency greater than five minutes:

if (clientTimestamp - serverTimestamp > (60 * 5))

If there is, then I inject some markup into the page, telling the reader that this page might be stale:

document.querySelector('main').insertAdjacentHTML('afterbegin',`
  <p class="feedback">
    <button onclick="this.parentNode.remove()">dismiss</button>
    This page might be out of date. You can try <a href="javascript:window.location=window.location.href">refreshing</a>.
  </p>
`);

The reader has the option to refresh the page or dismiss the message.

This page might be out of date. You can try refreshing.

It’s not foolproof by any means. If the visitor’s computer has their clock set weirdly, then the comparison might return a false positive every time. Still, I thought that using UTC might be a safer bet.

All in all, I think this is a pretty good method for detecting if a page is being served from a cache. Remember, the goal here is not to determine if the user is offline—for that, there’s navigator.onLine.

The upshot is this: if you visit my site with a crappy internet connection (lie-fi), then after three seconds you may be served with a cached version of the page you’re requesting (if you visited that page previously). If that happens, you’ll now also be presented with a little message telling you that the page isn’t fresh. Then it’s up to you whether you want to have another go.

I like the way that this puts control back into the hands of the user.

Sunday, June 23rd, 2019

Julio Biason .Net 4.0 - Things I Learnt The Hard Way (in 30 Years of Software Development)

Lots and lots of programming advice. I can’t attest to the veracity and efficacy of all of it, but this really rang true:

If you have no idea how to start, describe the flow of the application in high level, pure English/your language first. Then fill the spaces between comments with the code.

And this:

Blogging about your stupid solution is still better than being quiet.

You may feel “I’m not start enough to talk about this” or “This must be so stupid I shouldn’t talk about it”.

Create a blog. Post about your stupid solutions.

Sunday, June 16th, 2019

When should you be using Web Workers? — DasSur.ma

Although this piece is ostensibly about why we should be using web workers more, there’s a much, much bigger point about the growing power gap between the devices we developers use and the typical device used by the rest of the planet.

While we are getting faster flagship phones every cycle, the vast majority of people can’t afford these. The more affordable phones are stuck in the past and have highly fluctuating performance metrics. These low-end phones will mostly likely be used by the massive number of people coming online in the next couple of years. The gap between the fastest and the slowest phone is getting wider, and the median is going down.

Monday, June 10th, 2019

Making QR codes with cloud functions • tommorris.org

Tom makes an endpoint for generating QR codes so you don’t have to rely on the Google Charts API.

He also provides a good definition of “serverless”:

Now, serverless is a very silly buzzword dreamed up by someone from the consultant class who love coming up with terrible names, so I promise I won’t use it any further. Your code obviously run on a server. It just means it runs on a server someone else manages.

Amazon call it a ‘Lambda Function’. Google call it a ‘Cloud Function’. Microsoft Azure call it simply a ‘Function’. But none of those are very descriptive, because, well, anyone who writes any kind of programming language generally writes functions pretty much all the time in much the same way as anyone who writes English writes paragraphs, and we don’t call our blogging software “Cloud Paragraphs”. (Someone will now, I’m guessing.)

Friday, June 7th, 2019

Jeremy Keith: Going offline - YouTube

Here’s the opening keynote I gave at Frontend United in Utrecht a few weeks back.

Thursday, May 9th, 2019

Distinguishing cached vs. network HTML requests in a Service Worker | Trys Mudford

Less than 24 hours after I put the call out for a solution to this gnarly service worker challenge, Trys has come up with a solution.

Wednesday, May 8th, 2019

Timing out

Service workers are great for creating a good user experience when someone is offline. Heck, the book I wrote about service workers is literally called Going Offline.

But in some ways, the offline experience is relatively easy to handle. It’s a binary situation; either you’re online or you’re offline. What’s more challenging—and probably more common—is the situation that Jake calls Lie-Fi. That’s when technically you’ve got a network connection …but it’s a shitty connection, like one bar of mobile signal. In that situation, because there’s technically a connection, the user gets a slow frustrating experience. Whatever code you’ve got in your service worker for handling offline situations will never get triggered. When you’re handling fetch events inside a service worker, there’s no automatic time-out.

But you can make one.

That’s what I’ve done recently here on adactio.com. Before showing you what I added to my service worker script to make that happen, let me walk you through my existing strategy for handling offline situations.

Service worker strategies

Alright, so in my service worker script, I’ve got a block of code for handling requests from fetch events:

addEventListener('fetch', fetchEvent => {
        const request = fetchEvent.request;
    // Do something with this request.
});

I’ve got two strategies in my code. One is for dealing with requests for pages:

if (request.headers.get('Accept').includes('text/html')) {
    // Code for handling page requests.
}

By adding an else clause I can have a different strategy for dealing with requests for anything else—images, style sheets, scripts, and so on:

if (request.headers.get('Accept').includes('text/html')) {
    // Code for handling page requests.
} else {
    // Code for handling everthing else.
}

For page requests, I’m going to try to go the network first:

fetchEvent.respondWith(
    fetch(request)
    .then( responseFromFetch => {
        return responseFromFetch;
    })

My logic is:

When someone requests a page, try to fetch it from the network.

If that doesn’t work, we’re in an offline situation. That triggers the catch clause. That’s where I have my offline strategy: show a custom offline page that I’ve previously cached (during the install event):

.catch( fetchError => {
    return caches.match('/offline');
})

Now my logic has been expanded to this:

When someone requests a page, try to fetch it from the network, but if that doesn’t work, show a custom offline page instead.

So my overall code for dealing with requests for pages looks like this:

if (request.headers.get('Accept').includes('text/html')) {
    fetchEvent.respondWith(
        fetch(request)
        .then( responseFromFetch => {
            return responseFromFetch;
        })
        .catch( fetchError => {
            return caches.match('/offline');
        })
    );
}

Now I can fill in the else statement that handles everything else—images, style sheets, scripts, and so on. Here my strategy is different. I’m looking in my caches first, and I only fetch the file from network if the file can’t be found in any cache:

caches.match(request)
.then( responseFromCache => {
    return responseFromCache || fetch(request);
})

Here’s all that fetch-handling code put together:

addEventListener('fetch', fetchEvent => {
    const request = fetchEvent.request;
    if (request.headers.get('Accept').includes('text/html')) {
        fetchEvent.respondWith(
            fetch(request)
            .then( responseFromFetch => {
                return responseFromFetch;
            })
            .catch( fetchError => {
                return caches.match('/offline');
            })
        );
    } else {
        caches.match(request)
        .then( responseFromCache => {
            return responseFromCache || fetch(request);
        })
    }
});

Good.

Cache as you go

Now I want to introduce an extra step in the part of the code where I deal with requests for pages. Whenever I fetch a page from the network, I’m going to take the opportunity to squirrel it away in a cache. I’m calling that cache “pages”. I’m imaginative like that.

fetchEvent.respondWith(
    fetch(request)
    .then( responseFromFetch => {
        const copy = responseFromFetch.clone();
        try {
            fetchEvent.waitUntil(
                caches.open('pages')
                .then( pagesCache => {
                    return pagesCache.put(request, copy);
                })
            )
        } catch(error) {
            console.error(error);
        }
        return responseFromFetch;
    })

You’ll notice that I can’t put the response itself (responseFromCache) into the cache. That’s a stream that I only get to use once. Instead I need to make a copy:

const copy = responseFromFetch.clone();

That’s what gets put in the pages cache:

fetchEvent.waitUntil(
    caches.open('pages')
    .then( pagesCache => {
        return pagesCache.put(request, copy);
    })
)

Now my logic for page requests has an extra piece to it:

When someone requests a page, try to fetch it from the network and store a copy in a cache, but if that doesn’t work, show a custom offline page instead.

Here’s my updated fetch-handling code:

addEventListener('fetch', fetchEvent => {
    const request = fetchEvent.request;
    if (request.headers.get('Accept').includes('text/html')) {
        fetchEvent.respondWith(
            fetch(request)
            .then( responseFromFetch => {
                const copy = responseFromFetch.clone();
                try {
                    fetchEvent.waitUntil(
                        caches.open('pages')
                        .then( pagesCache => {
                            return pagesCache.put(request, copy);
                        })
                    )
                } catch(error) {
                    console.error(error);
                }
                return responseFromFetch;
            })
            .catch( fetchError => {
                return caches.match('/offline');
            })
        );
    } else {
        caches.match(request)
        .then( responseFromCache => {
            return responseFromCache || fetch(request);
        })
    }
});

I call this the cache-as-you-go pattern. The more pages someone views on my site, the more pages they’ll have cached.

Now that there’s an ever-growing cache of previously visited pages, I can update my offline fallback. Currently, I reach straight for the custom offline page:

.catch( fetchError => {
    return caches.match('/offline');
})

But now I can try looking for a cached copy of the requested page first:

.catch( fetchError => {
    caches.match(request)
    .then( responseFromCache => {
        return responseFromCache || caches.match('/offline');
    })
});

Now my offline logic is expanded:

When someone requests a page, try to fetch it from the network and store a copy in a cache, but if that doesn’t work, first look for an existing copy in a cache, and otherwise show a custom offline page instead.

I can also access this ever-growing cache of pages from my custom offline page to show people which pages they can revisit, even if there’s no internet connection.

So far, so good. Everything I’ve outlined so far is a good robust strategy for handling offline situations. Now I’m going to deal with the lie-fi situation, and it’s that cache-as-you-go strategy that sets me up nicely.

Timing out

I want to throw this addition into my logic:

When someone requests a page, try to fetch it from the network and store a copy in a cache, but if that doesn’t work, first look for an existing copy in a cache, and otherwise show a custom offline page instead (but if the request is taking too long, try to show a cached version of the page).

The first thing I’m going to do is rewrite my code a bit. If the fetch event is for a page, I’m going to respond with a promise:

if (request.headers.get('Accept').includes('text/html')) {
    fetchEvent.respondWith(
        new Promise( resolveWithResponse => {
            // Code for handling page requests.
        })
    );
}

Promises are kind of weird things to get your head around. They’re tailor-made for doing things asynchronously. You can set up two parameters; a success condition and a failure condition. If the success condition is executed, then we say the promise has resolved. If the failure condition is executed, then the promise rejects.

In my re-written code, I’m calling the success condition resolveWithResponse (and I haven’t bothered with a failure condition, tsk, tsk). I’m going to use resolveWithResponse in my promise everywhere that I used to have a return statement:

addEventListener('fetch', fetchEvent => {
    const request = fetchEvent.request;
    if (request.headers.get('Accept').includes('text/html')) {
        fetchEvent.respondWith(
            new Promise( resolveWithResponse => {
                fetch(request)
                .then( responseFromFetch => {
                    const copy = responseFromFetch.clone();
                    try {
                        fetchEvent.waitUntil(
                            caches.open('pages')
                            then( pagesCache => {
                                return pagesCache.put(request, copy);
                            })
                        )
                    } catch(error) {
                        console.error(error);
                    }
                    resolveWithResponse(responseFromFetch);
                })
                .catch( fetchError => {
                    caches.match(request)
                    .then( responseFromCache => {
                        resolveWithResponse(
                            responseFromCache || caches.match('/offline')
                        );
                    })
                })
            })
        );
    } else {
        caches.match(request)
        .then( responseFromCache => {
            return responseFromCache || fetch(request);
        })
    }
});

By itself, rewriting my code as a promise doesn’t change anything. Everything’s working the same as it did before. But now I can introduce the time-out logic. I’m going to put this inside my promise:

const timer = setTimeout( () => {
    caches.match(request)
    .then( responseFromCache => {
        if (responseFromCache) {
            resolveWithResponse(responseFromCache);
        }
    })
}, 3000);

If a request takes three seconds (3000 milliseconds), then that code will execute. At that point, the promise attempts to resolve with a response from the cache instead of waiting for the network. If there is a cached response, that’s what the user now gets. If there isn’t, then the wait continues for the network.

The last thing left for me to do is cancel the countdown to timing out if a network response does return within three seconds. So I put this in the then clause that’s triggered by a successful network response:

clearTimeout(timer);

I also add the clearTimeout statement to the catch clause that handles offline situations. Here’s the final code:

addEventListener('fetch', fetchEvent => {
    const request = fetchEvent.request;
    if (request.headers.get('Accept').includes('text/html')) {
        fetchEvent.respondWith(
            new Promise( resolveWithResponse => {
                const timer = setTimeout( () => {
                    caches.match(request)
                    .then( responseFromCache => {
                        if (responseFromCache) {
                            resolveWithResponse(responseFromCache);
                        }
                    })
                }, 3000);
                fetch(request)
                .then( responseFromFetch => {
                    clearTimeout(timer);
                    const copy = responseFromFetch.clone();
                    try {
                        fetchEvent.waitUntil(
                            caches.open('pages')
                            then( pagesCache => {
                                return pagesCache.put(request, copy);
                            })
                        )
                    } catch(error) {
                        console.error(error);
                    }
                    resolveWithResponse(responseFromFetch);
                })
                .catch( fetchError => {
                    clearTimeout(timer);
                    caches.match(request)
                    .then( responseFromCache => {
                        resolveWithResponse(
                            responseFromCache || caches.match('/offline')
                        );
                    })
                })
            })
        );
    } else {
        caches.match(request)
        .then( responseFromCache => {
            return responseFromCache || fetch(request)
        })
    }
});

That’s the JavaScript translation of this logic:

When someone requests a page, try to fetch it from the network and store a copy in a cache, but if that doesn’t work, first look for an existing copy in a cache, and otherwise show a custom offline page instead (but if the request is taking too long, try to show a cached version of the page).

For everything else, try finding a cached version first, otherwise fetch it from the network.

Pros and cons

As with all service worker enhancements to a website, this strategy will do absolutely nothing for first-time visitors. If you’ve never visited my site before, you’ve got nothing cached. But the more you return to the site, the more your cache is primed for speedy retrieval.

I think that serving up a cached copy of a page when the network connection is flaky is a pretty good strategy …most of the time. If we’re talking about a blog post on this site, then sure, there won’t be much that the reader is missing out on—a fixed typo or ten; maybe some additional webmentions at the end of a post. But if we’re talking about the home page, then a reader with a flaky network connection might think there’s nothing new to read when they’re served up a stale version.

What I’d really like is some way to know—on the client side—whether or not the currently-loaded page came from a cache or from a network. Then I could add some kind of interface element that says, “Hey, this page might be stale—click here if you want to check for a fresher version.” I’d also need some way in the service worker to identify any requests originating from that interface element and make sure they always go out to the network.

I think that should be doable somehow. If you can think of a way to do it, please share it. Write a blog post and send me the link.

But even without the option to over-ride the time-out, I’m glad that I’m at least doing something to handle the lie-fi situation. Perhaps I should write a sequel to Going Offline called Still Online But Only In Theory Because The Connection Sucks.

Saturday, April 13th, 2019

Offline fallback page with service worker - Modern Web Development: Tales of a Developer Advocate by Paul Kinlan

Paul describes a fairly straightforward service worker recipe: a custom offline page for failed requests.