Tags: cache

Thursday, March 23rd, 2017

Need to Catch Up on the AMP Debate? | CSS-Tricks

Funnily enough, I led a brown bag lunch discussion about AMP at work just the other day. A lot of it mirrored Chris’s thoughts here. It’s a complicated situation that has lots of people worried.

Retrofit Your Website as a Progressive Web App — SitePoint

Turning your existing website into a progressive web “app”—a far more appealing prospect than trying to create an entirely new app-shell architecture:

…they are an enhancement of your existing website which should take no longer than a few hours and have no negative effect on unsupported browsers.

Monday, March 13th, 2017

In AMP we trust

AMP Conf was one of those deep dive events, with two days dedicated to one single technology: AMP.

Except AMP isn’t really one technology, is it? And therein lies the confusion. This was at the heart of the panel I was on. When we talk about AMP, we could be talking about one of three things:

  1. The AMP format. A bunch of web components. For instance, instead of using an img element on an AMP page, you use an amp-img element instead.
  2. The AMP rules. There’s one JavaScript file, hosted on Google’s servers, that turns those web components from spans into working elements. No other JavaScript is allowed. All your styles must be in a style element instead of an external file, and there’s a limit on what you can do with those styles.
  3. The AMP cache. The source of most confusion—and even downright enmity—this is what’s behind the fact that when you launch an AMP result from Google search, you don’t go to another website. You see Google’s cached copy of the page instead of the original.

The first piece of AMP—the format—is kind of like a collection of marginal gains. Where the img element might have some performance issues, the amp-img element optimises for perceived performance. But if you just used the AMP web components, it wouldn’t be enough to make your site blazingly fast.

The second part of AMP—the rules—is where the speed gains start to really show. You can’t have an external style sheet, and crucially, you can’t have any third-party scripts other than the AMP script itself. This is key to making AMP pages super fast. It’s not so much about what AMP does; it’s more about what it doesn’t allow. If you never used a single AMP component, but stuck to AMP’s rules disallowing external styles and scripts, you could easily make a page that’s even faster than what AMP can do.

At AMP Conf, Natalia pointed out that The Guardian’s non-AMP pages beat out the AMP pages for performance. So why even have AMP pages? Well, that’s down to the third, most contentious, part of the AMP puzzle.

The AMP cache turns the user experience of visiting an AMP page from fast to instant. While you’re still on the search results page, Google will pre-render an AMP page in the background. Not pre-fetch, pre-render. That’s why it opens so damn fast. It’s also what causes the most confusion for end users.

From my unscientific polling, the behaviour of AMP results confuses the hell out of people. The fact that the page opens instantly isn’t the problem—far from it. It’s the fact that you don’t actually go to another page. Technically, you’re still on Google. An analogous mental model would be an RSS reader, or an email client: you don’t go to an item or an email; you view it in situ.

Well, that mental model would be fine if it were consistent. But in Google search, only some results will behave that way (the AMP pages) and others will behave just like regular links to other websites. No wonder people are confused! Some search results take them away and some search results keep them on Google …even though the page looks like a different website.

The price that we pay for the instantly-opening AMP pages from the Google cache is the URL. Because we’re looking at Google’s pre-rendered copy instead of the original URL, the address bar is not pointing to the site the browser claims to be showing. Everything in the body of the browser looks like an article from The Guardian, but if I look at the URL (which is what security people have been telling us for years is important to avoid being phished), then I’ll see a domain that is not The Guardian’s.

But wait! Couldn’t Google pre-render the page at its original URL?

Yes, they could. But they won’t.

This was a point that Paul kept coming back to: trust. There’s no way that Google can trust that someone else’s URL will play by the AMP rules (no external scripts, only loading embedded content via web components, limited styles, etc.). They can only trust the copies that they themselves are serving up from their cache.

By the way, there was a joint AMP/search panel at AMP Conf with representatives from both teams. As you can imagine, there were many questions for the search team, most of which were Glomar’d. But one thing that the search people said time and again was that Google was not hosting our AMP pages. Now I don’t know if they were trying to make some fine-grained semantic distinction there, but that’s an outright falsehood. If I click on a link, and the URL I get taken to is a Google property, then I am looking at a page hosted by Google. Yes, it might be a copy of a document that started life somewhere else, but if Google are serving something from their cache, they are hosting it.

This is one of the reasons why AMP feels like such a bait’n’switch to me. When it first came along, it felt like a direct competitor to Facebook’s Instant Articles and Apple News. But the big difference, we were told, was that you get to host your own content. That appealed to me much more than having Facebook or Apple host the articles. But now it turns out that Google do host the articles.

This will be the point at which Googlers will say no, no, no, you can totally host your own AMP pages …but you won’t get the benefits of pre-rendering. But without the pre-rendering, what’s the point of even having AMP pages?

Well, there is one non-cache reason to use AMP and it’s a political reason. Beleaguered developers working for publishers of big bloated web pages have a hard time arguing with their boss when they’re told to add another crappy JavaScript tracking script or bloated library to their pages. But when they’re making AMP pages, they can easily refuse, pointing out that the AMP rules don’t allow it. Google plays the bad cop for us, and it’s a very valuable role. Sarah pointed this out on the panel we were on, and she was spot on.

Alright, but what about The Guardian? They’ve already got fast pages, but they still have to create separate AMP pages if they want to get the pre-rendering benefits when they show up in Google search results. Sorry, says Google, but it’s the only way we can trust that the pre-rendered page will be truly fast.

So here’s the impasse we’re at. Google have provided a list of best practices for making fast web pages, but the only way they can truly verify that a page is sticking to those best practices is by hosting their own copy, URLs be damned.

This was the crux of Paul’s argument when he was on the Shop Talk Show podcast (it’s a really good episode—I was genuinely reassured to hear that Paul is not gung-ho about drinking the AMP Kool Aid; he has genuine concerns about the potential downsides for the web).

Initially, I accepted this argument that Google just can’t trust the rest of the web. But the more I talked to people at AMP Conf—and I had some really, really good discussions with people away from the stage—the more I began to question it.

Here’s the thing: the regular Google search can’t guarantee that any web page is actually 100% the right result to return for a search. Instead there’s a lot of fuzziness involved: based on the content, the markup, and the number of trusted sources linking to this, it looks like it should be a good result. In other words, Google search trusts websites to—by and large—do the right thing. Sometimes websites abuse that trust and try to game the system with sneaky tricks. Google responds with penalties when that happens.

Why can’t it be the same for AMP pages? Let me host my own AMP pages (maybe even host my own AMP script) and then when the Googlebot crawls those pages—the same as it crawls any other pages—that’s when it can verify that the AMP page is abiding by the rules. If I do something sneaky and trick Google into flagging a page as fast when it actually isn’t, then take my pre-rendering reward away from me.

To be fair, Google has very, very strict rules about what and how to pre-render the AMP results it’s caching. I can see how allowing even the potential for a false positive would have a negative impact on the user experience of Google search. But c’mon, there are already false positives in regular search results—fake news, spam blogs. Googlers are smart people. They can solve—or at least mitigate—these problems.

Google says it can’t trust our self-hosted AMP pages enough to pre-render them. But they ask for a lot of trust from us. We’re supposed to trust Google to cache and host copies of our pages. We’re supposed to trust Google to provide some mechanism to users to get at the original canonical URL. I’d like to see trust work both ways.

Wednesday, January 11th, 2017

Making Resilient Web Design work offline

I’ve written before about taking an online book offline, documenting the process behind the web version of HTML5 For Web Designers. A book is quite a static thing so it’s safe to take a fairly aggressive offline-first approach. In fact, a static unchanging book is one of the few situations that AppCache works for. Of course a service worker is better, but until AppCache is removed from browsers (and until service worker is supported across the board), I’m using both. I wouldn’t recommend that for most sites though—for most sites, use a service worker to enhance it, and avoid AppCache like the plague.

For Resilient Web Design, I took a similar approach to HTML5 For Web Designers but I knew that there was a good chance that some of the content would be getting tweaked at least for a while. So while the approach is still cache-first, I decided to keep the cache fairly fresh.

Here’s my service worker. It starts with the usual stuff: when the service worker is installed, there’s a list of static assets to cache. In this case, that list is literally everything; all the HTML, CSS, JavaScript, and images for the whole site. Again, this is a pattern that works well for a book, but wouldn’t be right for other kinds of websites.

The real heavy lifting happens with the fetch event. This is where the logic sits for what the service worker should do every time there’s a request for a resource. I’ve documented the logic with comments:

// Look in the cache first, fall back to the network
  // CACHE
  // Did we find the file in the cache?
      // If so, fetch a fresh copy from the network in the background
      // NETWORK
          // Stash the fresh copy in the cache
  // NETWORK
  // If the file wasn't in the cache, make a network request
      // Stash a fresh copy in the cache in the background
  // OFFLINE
  // If the request is for an image, show an offline placeholder
  // If the request is for a page, show an offline message

So my order of preference is:

  1. Try the cache first,
  2. Try the network second,
  3. Fallback to a placeholder as a last resort.

Leaving aside that third part, regardless of whether the response is served straight from the cache or from the network, the cache gets a top-up. If the response is being served from the cache, there’s an additional network request made to get a fresh copy of the resource that was just served. This means that the user might be seeing a slightly stale version of a file, but they’ll get the fresher version next time round.

Again, I think this is acceptable for a book where the tweaks and changes should be fairly minor, but I definitely wouldn’t want to do it on a more dynamic site where the freshness matters more.
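The whole strategy can be sketched as a single function. This is a rough sketch rather than the code running on the site: the name `cacheFirstWithRefresh`, its `deps` argument, and `offlineResponse` are all hypothetical, and the cache and network are passed in as parameters so the logic can be tried outside a service worker.

```javascript
// Hypothetical helper: cache first, then network, then an offline fallback.
// deps.cache needs match() and put(); deps.fetch is a fetch-like function;
// deps.keepAlive stands in for event.waitUntil(); deps.offlineResponse
// returns the placeholder to use when both the cache and the network fail.
function cacheFirstWithRefresh(request, deps) {
  return deps.cache.match(request).then(responseFromCache => {
    if (responseFromCache) {
      // CACHE: serve the stored copy and top up the cache in the background
      deps.keepAlive(
        deps.fetch(request)
          .then(responseFromFetch => deps.cache.put(request, responseFromFetch))
          .catch(() => { /* offline: keep the stale copy */ })
      );
      return responseFromCache;
    }
    // NETWORK: not in the cache, so ask the network and stash a copy
    return deps.fetch(request)
      .then(responseFromFetch => {
        deps.keepAlive(deps.cache.put(request, responseFromFetch.clone()));
        return responseFromFetch;
      })
      // OFFLINE: last resort
      .catch(() => deps.offlineResponse(request));
  });
}
```

Inside a real fetch handler, `deps.keepAlive` would be something like `promise => event.waitUntil(promise)`, and the result of the function would be handed to `event.respondWith`.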

Here’s what it usually looks like when a file is served up from the cache:

caches.match(request)
  .then( responseFromCache => {
  // Did we find the file in the cache?
  if (responseFromCache) {
      return responseFromCache;
  }

I’ve introduced an extra step where the fresher version is fetched from the network. This is where the code can look a bit confusing: the network request is happening in the background after the cached file has already been returned, but the code appears before the return statement:

caches.match(request)
  .then( responseFromCache => {
  // Did we find the file in the cache?
  if (responseFromCache) {
      // If so, fetch a fresh copy from the network in the background
      event.waitUntil(
          // NETWORK
          fetch(request)
          .then( responseFromFetch => {
              // Stash the fresh copy in the cache
              // (the promises are returned so that waitUntil waits for the write to finish)
              return caches.open(staticCacheName)
              .then( cache => {
                  return cache.put(request, responseFromFetch);
              });
          })
      );
      return responseFromCache;
  }

It’s asynchronous, see? So even though all that network code appears before the return statement, it’s pretty much guaranteed to complete after the cache response has been returned. You can verify this by putting in some console.log statements:

caches.match(request)
.then( responseFromCache => {
  if (responseFromCache) {
      event.waitUntil(
          fetch(request)
          .then( responseFromFetch => {
              console.log('Got a response from the network.');
              // Return the promises so that waitUntil waits for the write to finish
              return caches.open(staticCacheName)
              .then( cache => {
                  return cache.put(request, responseFromFetch);
              });
          })
      );
      console.log('Got a response from the cache.');
      return responseFromCache;
  }

Those log statements will appear in this order:

Got a response from the cache.
Got a response from the network.

That’s the opposite of the order in which they appear in the code. Everything inside the event.waitUntil part is asynchronous.

Here’s the catch: this kind of asynchronous waitUntil hasn’t landed in all the browsers yet. In the browsers that don’t have it, the code I’ve written will fail.

But never fear! Jake has written a polyfill. All I need to do is include that at the start of my serviceworker.js file and I’m good to go:

// Import Jake's polyfill for async waitUntil
importScripts('/js/async-waituntil.js');

I’m also using it when a file isn’t found in the cache, and is returned from the network instead. Here’s what the usual network code looks like:

fetch(request)
  .then( responseFromFetch => {
    return responseFromFetch;
  })

I want to also store that response in the cache, but I want to do it asynchronously—I don’t care how long it takes to put the file in the cache as long as the user gets the response straight away.

Technically, I’m not putting the response in the cache; I’m putting a copy of the response in the cache (it’s a stream, so I need to clone it if I want to do more than one thing with it).

fetch(request)
  .then( responseFromFetch => {
    // Stash a fresh copy in the cache in the background
    let responseCopy = responseFromFetch.clone();
    event.waitUntil(
      caches.open(staticCacheName)
      .then( cache => {
          // Return the promise so that waitUntil waits for the write to finish
          return cache.put(request, responseCopy);
      })
    );
    return responseFromFetch;
  })
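The need for a copy is easy to check outside a service worker. Here’s a small sketch, assuming an environment with a global Response (a browser, or a recent version of Node.js); the function name `readTwice` is made up for illustration.

```javascript
// Illustration only: a Response body is a stream that can be consumed once,
// so anything that needs to read it a second time has to work from a clone.
async function readTwice() {
  const original = new Response('Hello, offline world');
  const copy = original.clone();        // must happen before the body is read

  const first = await original.text();  // consumes the original's body
  const second = await copy.text();     // the clone still has its own body

  return { first, second, originalUsed: original.bodyUsed };
}
```

Both reads give back the same text, and `originalUsed` comes back as true: once the original has been read, it can’t be read or cloned again.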

That all seems to be working well in browsers that support service workers. For legacy browsers, like Mobile Safari, there’s the much blunter caveman logic of an AppCache manifest.

Here’s the JavaScript that decides whether a browser gets the service worker or the AppCache:

if ('serviceWorker' in navigator) {
  // If service workers are supported
  navigator.serviceWorker.register('/serviceworker.js');
} else if ('applicationCache' in window) {
  // Otherwise inject an iframe to use appcache
  var iframe = document.createElement('iframe');
  iframe.setAttribute('src', '/appcache.html');
  iframe.setAttribute('style', 'width: 0; height: 0; border: 0');
  document.querySelector('footer').appendChild(iframe);
}

Either way, people are making full use of the offline nature of the book and that makes me very happy indeed.

A Tale of Four Caches · Yoav Weiss

A cute explanation of different browser caches:

  • memory cache,
  • service worker cache,
  • disk cache, and
  • push cache.

Sunday, December 4th, 2016

Service Worker, what are you? - Mariko Kosaka

This is a fun—and accurate—explanation of service workers.

There’s definitely something “alien” about a service worker—it’s kind of like a virus that gets installed on the user’s device. I’ve taken to describing it as “a man-in-the-middle attack on your own website” which makes it sound a bit scarier than is necessary.

Tuesday, September 27th, 2016

Offline content with service workers · MadebyMike

This is a really great step-by-step walkthrough of adding a service worker to a website. Mike mentions the gotchas he encountered along the way, and describes how he incrementally levelled up the functionality.

If you’ve been going through a similar process, please write it down and share it like this!

Tuesday, August 23rd, 2016

gmetais/sw-delta: An incremental cache for the web

Here’s an interesting use of service workers: figure out the difference (the delta) between the currently-cached version of a file, and the version on the network, and then grab only the bits that have changed. It requires some configuration on the server side (to send back the diff) but it’s an interesting approach that could be worth keeping an eye on.

Friday, July 15th, 2016

The Progress of Web Apps | Microsoft Edge Dev Blog

The roadmap for progressive web apps from Microsoft; not just their support plans, but also some ideas for distribution.

Sunday, June 26th, 2016

» Service Workers at Scale, Part II: Handling Fallback Resources Cloud Four Blog

This ongoing series about the nuts’n’bolts of implementing Service Workers is really good. This one is great for getting to grips with the cache API.

Friday, June 3rd, 2016

Taking an online book offline

Application Cache is—as Jake so infamously described—not a good API. It was specced and shipped before developers had a chance to figure out what they really needed, and so AppCache turned out to be frustrating at best and downright dangerous in some situations. Its over-zealous caching combined with its byzantine cache invalidation ensured it was never going to become a mainstream technology.

There are very few use-cases for AppCache, but I think I hit upon one of them. Six years ago, A Book Apart published HTML5 For Web Designers. A year and a half later, I put the book online. The contents are never going to change. There’s a second edition of the book out now but if you want to read all the extra bits that Rachel added, you’re going to have to buy the book. The website for the original book is static and unchanging. That’s what made it such a good candidate for using AppCache. I could just set it and forget it.

Except that’s no longer true. AppCache is being deprecated and browsers are starting to withdraw support. Chrome is already making sure that AppCache—like geolocation—no longer works on sites that aren’t served over HTTPS. That’s for the best. In retrospect, those APIs should never have been allowed over unsecured HTTP.

I mentioned that I spent the weekend switching all my book websites over to HTTPS, so AppCache should continue to work …for now. It’s only a matter of time before AppCache is removed completely from many of the browsers that currently support it.

Seeing as I’ve got the HTML5 For Web Designers site running on HTTPS now, I might as well go all out and make it a progressive web app. By far the biggest barrier to making a progressive web app is that first step of setting up HTTPS. It’s gotten cheaper—thanks to Let’s Encrypt Certbot—but it still involves mucking around in the command line with root access; I never wanted to become a sysadmin. But once that’s finally all set up, the other technological building blocks—a Service Worker and a manifest file—are relatively easy.

In this case, the Service Worker is using a straightforward bit of logic:

  • On installation, cache absolutely everything: HTML, CSS, images.
  • When anything is requested, grab it from the cache.
  • If it isn’t in the cache, try the network.
  • If the network doesn’t work, show an offline page (or image).
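Those four steps could be sketched like this. It isn’t the code from the site: `precache`, `cacheOrNetwork`, and their parameters are hypothetical names, and the cache and the network are passed in as arguments.

```javascript
// Hypothetical helpers mirroring the list above.
// `cache` needs addAll() and match(); `fetchFn` is a fetch-like function.

// On installation: cache absolutely everything.
function precache(cache, urls) {
  return cache.addAll(urls);
}

// On every request: the cache, then the network, then the offline fallback.
function cacheOrNetwork(request, cache, fetchFn, offlineUrl) {
  return cache.match(request).then(responseFromCache => {
    if (responseFromCache) {
      return responseFromCache;
    }
    // Not in the cache: try the network, and if that fails,
    // serve the offline page that was stored at installation.
    return fetchFn(request).catch(() => cache.match(offlineUrl));
  });
}
```

In a service worker, `precache` would run inside the install event and `cacheOrNetwork` inside the fetch event, with the result passed to `event.respondWith`.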

Basically I’m reproducing AppCache’s overzealous approach. It works for this site because the content is never going to change. I hope that this time, I really can just set it and forget it. I want the site to be an historical artefact, available at the same URL for at least my lifetime. I don’t want to have to maintain it or revisit it every few years to swap out one API for another.

Which brings me back to the way AppCache is being deprecated…

The Firefox team are very eager to ditch AppCache as soon as possible. On the one hand, that’s commendable. They’re rightly proud of shipping Service Workers and they want to encourage people to use the better technology instead. But it sure stings for the suckers (like me) who actually went and built stuff using AppCache.

In a weird way, I think this rush to deprecate AppCache might actually hurt the adoption of Service Workers. Let me explain…

At last year’s Edge Conference, Nolan Lawson gave a great presentation on storing data in the browser. He enumerated the many ways—past and present—that we could store data locally: WebSQL, Local Storage, IndexedDB …the list goes on. He also posed the question: why aren’t more people using insert-name-of-latest-API-here? To me it seemed obvious why more people weren’t diving into using the latest and greatest option for local data storage. It was because they had been burned before. The developers who rushed into trying previous solutions ended up being mocked for their choice. “Still using that ol’ thing? Pffftt!”

You can see that same attitude on display from Mozilla as they push towards removing AppCache. Like in a comment that refers to developers using AppCache in production as “the angry hordes”. Reminds me of something Tom said:

In that same Mozilla thread, Soledad echoes Tom’s point:

As a member of the devrel team: I think that this should be better addressed in a blog post that someone from the team responsible for switching AppCache off should write, so everyone can understand the reasons and ask questions to those people.

I’d rather warn people beforehand, pointing them to that post and help them with migration paths than apply emergency mitigation strategies when a lot of people find their stuff stopped working in the newer Firefox…

Bravo! That same approach should have also been taken by the Chrome team when it came to their thread about punishing display:browser in manifest files. There was absolutely no communication with developers about this major decision. I only found out about it because Paul happened to mention it to me.

I was genuinely shocked by this:

Withholding the “add to home screen” prompt like that has a whiff of blackmail about it.

I can confirm that smell. When I was making the manifest file for HTML5 For Web Designers, I really wanted to put display: browser because I want people to be able to copy and paste URLs (for the book, for individual chapters, and for sections within chapters). But knowing that if I did that, Android users would never see the “add to home screen” prompt made me question that decision. I felt strong-armed into declaring display: standalone. And no, I’m not mollified by hand-waving reassurances that the Chrome team will figure out some solution for this. Figure out the solution first, then punish the saps like me who want to use display: browser to allow people to share URLs.

Anyway, the website for HTML5 For Web Designers is now using AppCache and Service Workers. The AppCache part will probably be needed for quite a while yet to provide offline support on iOS. Apple are really dragging their heels on Service Worker support, with at least one WebKit engineer actively looking for reasons not to implement it.

There’s a lot of talk about making apps work offline, but I think it’s just as important that we consider making information work offline. Books are a great example of this. To use the tired transport tropes, the website for a book is something you might genuinely want to access when you’re on a plane, or in the underground, or out at sea.

I really, really like progressive web apps. But I also think it’s important that we don’t fall into the trap of just trying to imitate native apps on the web. I love the idea of taking the best of the web—like information being permanently available at a URL—and marrying that up with the best of native—like offline access. I also like the idea of taking the best of books—a tome of thought—and marrying it up with the best of the web—hypertext.

I’d love to see more experimentation around online/offline hypertext/books. For now, you can visit HTML5 For Web Designers, add it to your home screen, and revisit it whenever and wherever you like.

Wednesday, December 2nd, 2015

Performance Calendar » Reducing Single Point of Failure using Service Workers

This is a nifty use of Service Workers—using a cache to mitigate unresponsive Content Delivery Networks.

The stuff in here about Promise.race is particularly useful for “lie-fi” scenarios: instead of thinking about the network connection in a binary way (either it’s available or it isn’t), considering the scenario of a crappy network connection seems more realistic.
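As a rough sketch of that idea (not the code from the linked article): race the network against a timer, and fall back to the cache if the timer wins or the network fails outright. The name `networkWithTimeout` and its parameters are assumptions made for illustration.

```javascript
// Hypothetical helper: give the network `ms` milliseconds to respond,
// then fall back to whatever the cache has for that request.
// `fetchFn` is a fetch-like function; `cache` needs a match() method.
function networkWithTimeout(request, fetchFn, cache, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('Network too slow')), ms);
  });
  return Promise.race([fetchFn(request), timeout])
    .then(
      // The network answered in time
      response => { clearTimeout(timer); return response; },
      // The timer won the race, or the network request failed
      () => { clearTimeout(timer); return cache.match(request); }
    );
}
```

The timeout value is a judgment call: too short and a perfectly good connection gets ignored; too long and the user sits staring at a blank page.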

Saturday, November 7th, 2015

My first Service Worker

I’ve made no secret of the fact that I’m really excited about Service Workers. I’m not alone. At the Coldfront conference in Copenhagen, pretty much every talk mentioned Service Workers.

Obviously I’m excited about what Service Workers enable: offline caching, background processes, push notifications, and all sorts of other goodies that allow the web to compete with native. But more than that, I’m really excited about the way that the Service Worker spec has been designed. Instead of being an all-or-nothing technology that you have to bet the farm on, it has been deliberately crafted to be used as an enhancement on top of existing sites (oh, how I wish that web components would follow a similar path).

I’ve got plenty of ideas on how Service Workers could be used to enhance a community site like The Session or the kind of events sites that we produce at Clearleft, but to begin with, I figured it would make sense to use my own personal site as a playground.

To start with, I’ve already conquered the first hurdle: serving my site over HTTPS. Service Workers require a secure connection. But you can play around with running a Service Worker locally if you run a copy of your site on localhost.

That’s how I started experimenting with Service Workers: serving on localhost, and stopping and starting my local Apache server with apachectl stop and apachectl start on the command line.

That reminds me of another interesting use case for Service Workers: it’s not just about the user’s network connection failing (say, going into a train tunnel); it’s also about your web server not always being available. Both scenarios are covered equally.

I would never have even attempted to start if it weren’t for the existing examples from people who have been generous enough to share their work:

Also, I knew that Jake was coming to FF Conf so if I got stumped, I could pester him. That’s exactly what ended up happening (thanks, Jake!).

So if you decide to play around with Service Workers, please, please share your experience.

It’s entirely up to you how you use Service Workers. I figured for a personal site like this, it would be nice to:

  1. Explicitly cache resources like CSS, JavaScript, and some images.
  2. Cache the homepage so it can be displayed even when the network connection fails.
  3. For other pages, have a fallback “offline” page to display when the network connection fails.

So now I’ve got a Service Worker up and running on adactio.com. It will only work in Chrome, Android, Opera, and the forthcoming version of Firefox …and that’s just fine. It’s an enhancement. As more and more browsers start supporting it, this Service Worker will become more and more useful.

How very future friendly!

The code

If you’re interested in the nitty-gritty of what my Service Worker is doing, read on. If, on the other hand, code is not your bag, now would be a good time to bow out.

If you want to jump straight to the finished code, here’s a gist. Feel free to take it, break it, copy it, improve it, or do anything else you want with it.

To start with, let’s establish exactly what a Service Worker is. I like this definition by Matt Gaunt:

A service worker is a script that is run by your browser in the background, separate from a web page, opening the door to features which don’t need a web page or user interaction.

register

From inside my site’s global JavaScript file—or I could do this from a script element inside my pages—I’m going to do a quick bit of feature detection for Service Workers. If the browser supports it, then I’m going to register my Service Worker by pointing to another JavaScript file, which sits at the root of my site:

if (navigator.serviceWorker) {
  navigator.serviceWorker.register('/serviceworker.js', {
    scope: '/'
  });
}

The serviceworker.js file sits in the root of my site so that it can act on any requests to my domain. If I put it somewhere like /js/serviceworker.js, then it would only be able to act on requests to the /js directory.
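To illustrate the rule: by default, the scope is the directory that contains the Service Worker script, and only pages under that scope are controlled. The functions below are a simplification for illustration, not how browsers implement it; the names `defaultScope` and `isControlled` are made up.

```javascript
// Illustration: the default scope is the script URL with the filename removed.
function defaultScope(scriptUrl) {
  return new URL('./', scriptUrl).href;
}

// A page falls under the Service Worker when its URL starts with the scope URL.
function isControlled(pageUrl, scriptUrl) {
  return new URL(pageUrl).href.startsWith(defaultScope(scriptUrl));
}
```

With a script at https://example.com/js/serviceworker.js, the default scope works out as https://example.com/js/, so a page at https://example.com/articles/ would not be covered. Move the script to the root and the same page is.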

Once that file has been loaded, the installation of the Service Worker can begin. That means the script will be installed in the user’s browser …and it will live there even after the user has left my website.

install

I’m making the installation of the Service Worker dependent on a function called updateStaticCache that will populate a cache with the files I want to store:

self.addEventListener('install', function (event) {
  event.waitUntil(updateStaticCache());
});

That updateStaticCache function will be used for storing items in a cache. I’m going to make sure that the cache has a version number in its name, exactly as described in the Guardian’s use case. That way, when I want to update the cache, I only need to update the version number.

var staticCacheName = 'static';
var version = 'v1::';

Here’s the updateStaticCache function that puts the items I want into the cache. I’m storing my JavaScript, my CSS, some images referenced in the CSS, the home page of my site, and a page for displaying when offline.

function updateStaticCache() {
  return caches.open(version + staticCacheName)
    .then(function (cache) {
      return cache.addAll([
        '/path/to/javascript.js',
        '/path/to/stylesheet.css',
        '/path/to/someimage.png',
        '/path/to/someotherimage.png',
        '/',
        '/offline'
      ]);
    });
}

Because those items are part of the return statement for the Promise created by caches.open, the Service Worker won’t install until all of those items are in the cache. So you might want to keep them to a minimum.

You can still put other items in the cache, and not make them part of the return statement. That way, they’ll get added to the cache in their own good time, and the installation of the Service Worker won’t be delayed:

function updateStaticCache() {
  return caches.open(version + staticCacheName)
    .then(function (cache) {
      cache.addAll([
        '/path/to/somefile',
        '/path/to/someotherfile'
      ]);
      return cache.addAll([
        '/path/to/javascript.js',
        '/path/to/stylesheet.css',
        '/path/to/someimage.png',
        '/path/to/someotherimage.png',
        '/',
        '/offline'
      ]);
    });
}
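
To see why only the returned Promise holds things up, here’s a minimal sketch that runs outside a Service Worker. The delayed helper, the fakeUpdateStaticCache function and the timings are all made up for illustration; they stand in for two cache.addAll calls, one returned and one not.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds and records when it did.
function delayed(label, ms, log) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      log.push(label);
      resolve(label);
    }, ms);
  });
}

// Stand-in for updateStaticCache: two "addAll" calls, only one of them returned.
function fakeUpdateStaticCache(log) {
  return Promise.resolve()
    .then(function () {
      delayed('optional files cached', 50, log); // not returned, so nothing waits for it
      return delayed('critical files cached', 10, log); // returned, so the chain waits
    });
}

var log = [];
var installed = fakeUpdateStaticCache(log)
  .then(function () {
    log.push('install finished');
    return log.slice(); // snapshot of what had happened by this point
  });
```

By the time installed resolves, the snapshot holds the critical entry followed by 'install finished'; the optional entry only turns up later. One thing to keep in mind: if the addAll that isn’t returned fails, nothing in this chain will notice.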

Another option is to use completely different caches, but I’ve decided to just use one cache for now.

activate

When the activate event fires, it’s a good opportunity to clean up any caches that are out of date (by looking for anything that doesn’t match the current version number). I copied this straight from Nicolas’s code:

self.addEventListener('activate', function (event) {
  event.waitUntil(
    caches.keys()
      .then(function (keys) {
        return Promise.all(keys
          .filter(function (key) {
            return key.indexOf(version) !== 0;
          })
          .map(function (key) {
            return caches.delete(key);
          })
        );
      })
  );
});
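
The filter in the middle of that is the part doing the deciding, so here it is pulled out on its own. This is just a sketch: staleKeys is a name I’ve made up, and the cache names in the example are invented.

```javascript
var version = 'v1::';

// Hypothetical helper: returns the cache names that don't start with the current version.
function staleKeys(keys) {
  return keys.filter(function (key) {
    return key.indexOf(version) !== 0;
  });
}

// Example cache names, invented for illustration.
var stale = staleKeys(['v0::static', 'v1::static', 'other-cache']);
```

Here stale ends up containing 'v0::static' and 'other-cache'. Note that anything without the current prefix is treated as out of date, including caches created by other scripts on the same origin.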

fetch

The fetch event is fired every time the browser is going to request a file from my site. The magic of Service Worker is that I can intercept that request before it happens and decide what to do with it:

self.addEventListener('fetch', function (event) {
  var request = event.request;
  ...
});

POST requests

For a start, I’m going to just back off from any requests that aren’t GET requests:

if (request.method !== 'GET') {
  event.respondWith(
    fetch(request)
  );
  return;
}

That’s basically just replicating what the browser would do anyway. But even here I could decide to fall back to my offline page if the request doesn’t succeed. I do that using a catch clause appended to the fetch statement:

if (request.method !== 'GET') {
  event.respondWith(
    fetch(request)
      .catch(function () {
        return caches.match('/offline');
      })
  );
  return;
}

HTML requests

I’m going to treat requests for pages differently to requests for files. If the browser is requesting a page, then here’s the order I want:

  1. Try fetching the page from the network first.
  2. If that doesn’t work, try looking for the page in the cache.
  3. If all else fails, show the offline page.

First of all, I need to test to see if the request is for an HTML document. I’m doing this by sniffing the Accept headers, which probably isn’t the safest method:

if (request.headers.get('Accept').indexOf('text/html') !== -1) {
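
One reason it isn’t the safest method: headers.get returns null when a header is missing, and calling indexOf on null throws an error. Here’s a slightly more defensive sketch, assuming that a request can arrive without an Accept header. Both acceptsHtml and fakeRequest are hypothetical names; fakeRequest only exists so the check can run outside a Service Worker.

```javascript
// Hypothetical helper: treats a missing Accept header as an empty string.
function acceptsHtml(request) {
  var accept = request.headers.get('Accept') || '';
  return accept.indexOf('text/html') !== -1;
}

// Hypothetical stand-in for a Request object, exposing only headers.get.
function fakeRequest(acceptHeader) {
  return {
    headers: {
      get: function (name) {
        return name.toLowerCase() === 'accept' ? acceptHeader : null;
      }
    }
  };
}

var isPage = acceptsHtml(fakeRequest('text/html,application/xhtml+xml'));
var isImage = acceptsHtml(fakeRequest('image/webp,*/*'));
var noHeader = acceptsHtml(fakeRequest(null));
```

Only isPage comes out true; the request with no Accept header is treated as a file request instead of throwing an error.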

Now I try to fetch the page from the network:

event.respondWith(
  fetch(request)
);

If the network is working fine, this will return the response from the site and I’ll pass that along.

But if that doesn’t work, I’m going to look for a match in the cache. Time for a catch clause:

.catch(function () {
  return caches.match(request);
})

So now the whole event.respondWith statement looks like this:

event.respondWith(
  fetch(request)
    .catch(function () {
      return caches.match(request);
    })
);

Finally, I need to take care of the situation when the page can’t be fetched from the network and it can’t be found in the cache.

Now, I first tried to do this by adding a catch clause to the caches.match statement, like this:

return caches.match(request)
  .catch(function () {
    return caches.match('/offline');
  })

That didn’t work and for the life of me, I couldn’t figure out why. Then Jake set me straight. It turns out that the Promise from caches.match always resolves …even if nothing matches, in which case it resolves with undefined. So the catch clause will never be triggered. Instead I need to return the offline page if the response from the cache is falsy:

return caches.match(request)
  .then(function (response) {
    return response || caches.match('/offline');
  })
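
Here’s a minimal sketch of the difference that runs outside a Service Worker. The makeFakeCaches function is a made-up stand-in for the real caches object: it stores plain strings instead of responses, but it copies the one behaviour that matters here, which is resolving with undefined on a miss.

```javascript
// Hypothetical in-memory stand-in for caches, for illustration only.
function makeFakeCaches(entries) {
  var store = new Map(entries);
  return {
    match: function (key) {
      // Like caches.match: a miss resolves with undefined instead of rejecting.
      return Promise.resolve(store.get(key));
    }
  };
}

var fakeCaches = makeFakeCaches([['/offline', 'offline page']]);

// First attempt: the catch clause never runs, because nothing was rejected.
var caught = false;
var viaCatch = fakeCaches.match('/missing')
  .catch(function () {
    caught = true;
    return fakeCaches.match('/offline');
  });

// Second attempt: check for a falsy response inside a then clause.
var viaThen = fakeCaches.match('/missing')
  .then(function (response) {
    return response || fakeCaches.match('/offline');
  });
```

The first chain resolves with undefined and caught stays false; the second resolves with the offline page.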

With that cleared up, my code for handling HTML requests looks like this:

event.respondWith(
  fetch(request, { credentials: 'include' })
    .catch(function () {
      return caches.match(request)
        .then(function (response) {
          return response || caches.match('/offline');
        })
    })
);

Actually, there’s one more thing I’m doing with HTML requests. If the network request succeeds, I stash the response in the cache.

Well, that’s not exactly true. I stash a copy of the response in the cache. That’s because the body of a response can only be read once. So if I want to do anything with it besides passing it along, I have to clone it:

var copy = response.clone();
caches.open(version + staticCacheName)
  .then(function (cache) {
    cache.put(request, copy);
  });

I do that right before returning the actual response. Here’s how it fits together:

if (request.headers.get('Accept').indexOf('text/html') !== -1) {
  event.respondWith(
    fetch(request, { credentials: 'include' })
      .then(function (response) {
        var copy = response.clone();
        caches.open(version + staticCacheName)
          .then(function (cache) {
            cache.put(request, copy);
          });
        return response;
      })
      .catch(function () {
        return caches.match(request)
          .then(function (response) {
            return response || caches.match('/offline');
          })
      })
  );
  return;
}
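
If you want to poke at that flow without a browser, here’s a rough sketch of the same three steps with the moving parts passed in as functions. Everything here is hypothetical: networkFirst is a name I’ve made up, fromNetwork and fromCache stand in for fetch and caches.match, saveCopy stands in for cache.put with a cloned response, and plain strings stand in for responses.

```javascript
// Hypothetical helper mirroring the page logic: network, then cache, then offline page.
function networkFirst(url, fromNetwork, fromCache, saveCopy) {
  return fromNetwork(url)
    .then(function (response) {
      saveCopy(url, response); // in the Service Worker: cache.put(request, response.clone())
      return response;
    })
    .catch(function () {
      return fromCache(url)
        .then(function (response) {
          return response || fromCache('/offline');
        });
    });
}

// Made-up stand-ins so the three outcomes can be tried out.
var cached = { '/about': 'cached about page', '/offline': 'offline page' };
function lookup(url) { return Promise.resolve(cached[url]); }
function online(url) { return Promise.resolve('fresh ' + url); }
function offline() { return Promise.reject(new TypeError('network down')); }
function remember(url, response) { cached[url] = response; }

var outcomes = Promise.all([
  networkFirst('/new', online, lookup, remember),        // network works
  networkFirst('/about', offline, lookup, remember),     // network fails, page is cached
  networkFirst('/never-seen', offline, lookup, remember) // network fails, nothing cached
]);
```

The three calls resolve with the fresh page, the cached page, and the offline page respectively, and the fresh page gets remembered along the way.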

Okay. So that’s requests for pages taken care of.

File requests

I want to handle requests for files differently to requests for pages. Here’s my list of priorities:

  1. Look for the file in the cache first.
  2. If that doesn’t work, make a network request.
  3. If all else fails, and it’s a request for an image, show a placeholder.

Step one: try getting the file from the cache:

event.respondWith(
  caches.match(request)
);

Step two: if that didn’t work, go out to the network. Now remember, I can’t use a catch clause here, because caches.match will always return something: either a response or undefined. So here’s what I do:

event.respondWith(
  caches.match(request)
    .then(function (response) {
      return response || fetch(request);
    })
);

Now that I’m back to dealing with a fetch statement, I can use a catch clause to take care of the third and final step: if the network request doesn’t succeed, check to see if the request was for an image, and if so, display a placeholder:

.catch(function () {
  if (request.headers.get('Accept').indexOf('image') !== -1) {
    return new Response('<svg>...</svg>', { headers: { 'Content-Type': 'image/svg+xml' }});
  }
})

I could point to a placeholder image in the cache, but I’ve decided to send an SVG on the fly using a new Response object.

Here’s how the whole thing looks:

event.respondWith(
  caches.match(request)
    .then(function (response) {
      return response || fetch(request)
        .catch(function () {
          if (request.headers.get('Accept').indexOf('image') !== -1) {
            return new Response('<svg>...</svg>', { headers: { 'Content-Type': 'image/svg+xml' }});
          }
        })
    })
);

The overall shape of my code to handle fetch events now looks like this:

self.addEventListener('fetch', function (event) {
  var request = event.request;
  // Non-GET requests
  if (request.method !== 'GET') {
    event.respondWith(
      ... 
    );
    return;
  }
  // HTML requests
  if (request.headers.get('Accept').indexOf('text/html') !== -1) {
    event.respondWith(
      ...
    );
    return;
  }
  // Non-HTML requests
  event.respondWith(
    ...
  );
});

Feel free to peruse the code.

Next steps

The code I’m running now is fine for a first stab, but there’s room for improvement.

Right now I’m stashing any HTML pages the user visits into the cache. I don’t think that will get out of control—I imagine most people only ever visit just a handful of pages on my site. But there’s the chance that the cache could get quite bloated. Ideally I’d have some way of keeping the cache nice and lean.

I was thinking: maybe I should have a separate cache for HTML pages, and limit the number in that cache to, say, 20 or 30 items. Every time I push something new into that cache, I could pop the oldest item out.

I could imagine doing something similar for images: keeping a cache of just the most recent 10 or 20.
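
I haven’t done this yet, but here’s a first sketch of how the trimming might work. trimCache is a hypothetical helper, and it assumes that cache.keys() lists the oldest entry first. The makeFakeCache function is a made-up in-memory stand-in so the idea can be tried outside a Service Worker; a real Cache object has the same keys and delete methods, both returning Promises.

```javascript
// Hypothetical helper: deletes the oldest entries until at most maxItems remain.
function trimCache(cache, maxItems) {
  return cache.keys().then(function (keys) {
    if (keys.length <= maxItems) {
      return;
    }
    // Assumes keys() lists the oldest entry first.
    return cache.delete(keys[0]).then(function () {
      return trimCache(cache, maxItems);
    });
  });
}

// Made-up in-memory stand-in for a Cache object, for illustration only.
function makeFakeCache(urls) {
  var store = new Map(urls.map(function (url) {
    return [url, 'page for ' + url];
  }));
  return {
    keys: function () { return Promise.resolve(Array.from(store.keys())); },
    delete: function (key) { return Promise.resolve(store.delete(key)); }
  };
}

var pages = makeFakeCache(['/one', '/two', '/three', '/four', '/five']);
var remaining = trimCache(pages, 3).then(function () {
  return pages.keys();
});
```

With a limit of three, the two oldest entries go and '/three', '/four' and '/five' are left. In a real Service Worker this would presumably be called after each cache.put.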

If you fancy having a go at coding that up, let me know.

Lessons learned

There were a few gotchas along the way. I already mentioned the fact that caches.match will always return something so you can’t use catch clauses to handle situations where a file isn’t found in the cache.

Something else worth noting is that this:

fetch(request);

…is functionally equivalent to this:

fetch(request)
  .then(function (response) {
    return response;
  });

That’s probably obvious but it took me a while to realise. Likewise:

caches.match(request);

…is the same as:

caches.match(request)
  .then(function (response) {
    return response;
  });

Here’s another thing… you’ll notice that sometimes I’ve used:

fetch(request);

…but sometimes I’ve used:

fetch(request, { credentials: 'include' } );

That’s because, by default, a fetch request doesn’t include cookies. That’s fine if the request is for a static file, but if it’s for a potentially-dynamic HTML page, you probably want to make sure that the Service Worker request is no different from a regular browser request. You can do that by passing through that second (optional) argument.

But probably the trickiest thing is getting your head around the idea of Promises. Writing JavaScript is generally a fairly procedural affair, but once you start dealing with then clauses, you have to come to grips with the fact that the code inside those clauses runs asynchronously. So statements written after the then clause will execute before the code inside the clause. It’s kind of hard to explain, but if you find problems with your Service Worker code, check to see if that’s the cause.
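
A tiny example might explain it better than I can. This sketch uses an already-resolved Promise in place of fetch; the order array and its labels are made up purely to record what runs when.

```javascript
var order = [];

order.push('before then');

// Even though this Promise is already resolved, its then clause doesn't run right away.
var done = Promise.resolve('response')
  .then(function () {
    order.push('inside then');
  });

// This line is written after the then clause, but it runs first.
order.push('after then');
```

Immediately after that code runs, order holds 'before then' and 'after then'. The 'inside then' entry is only added once the current script has finished.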

And remember, please share your code and your gotchas: it’s early days for Service Workers so every implementation counts.

Updates

I got some very useful feedback from Jake after I published this…

Expires headers

By default, JavaScript files on my server are cached for a month. But a Service Worker script probably shouldn’t be cached at all (or cached for a very, very short time). I’ve updated my .htaccess rules accordingly:

<FilesMatch "serviceworker.js">
  ExpiresDefault "now"
</FilesMatch>

Credentials

If a request is initiated by the browser, I don’t need to say:

fetch(request, { credentials: 'include' } );

It’s enough to just say:

fetch(request);

Scope

I set the scope parameter of my Service Worker to be “/” …but because the Service Worker is sitting in the root directory anyway, I don’t really need to do that. I could just register it with:

if (navigator.serviceWorker) {
  navigator.serviceWorker.register('/serviceworker.js');
}

If, on the other hand, the Service Worker file were sitting in a folder, but I wanted it to act on the whole site, then I would need to specify the scope:

if (navigator.serviceWorker) {
  navigator.serviceWorker.register('/path/to/serviceworker.js', {
    scope: '/'
  });
}

…and I’d also need to send a special header (Service-Worker-Allowed) along with the script. So it’s probably easiest to just put Service Worker scripts in the root directory.
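
The reason the location matters is that the default scope is the directory the script lives in. Here’s a small sketch of that rule; defaultScope is a hypothetical helper, not part of the Service Worker API, and example.com is just a placeholder domain.

```javascript
// Hypothetical helper: works out the default scope for a Service Worker script URL.
function defaultScope(scriptURL) {
  // './' resolved against the script URL gives the script's own directory.
  return new URL('./', scriptURL).pathname;
}

var rootScope = defaultScope('https://example.com/serviceworker.js');
var folderScope = defaultScope('https://example.com/path/to/serviceworker.js');
```

A script in the root gets '/' and so covers the whole site, while a script in a folder only gets '/path/to/' unless a wider scope is explicitly allowed.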

Wednesday, September 24th, 2014

Using ServiceWorker in Chrome today - JakeArchibald.com

It’s very early days for ServiceWorker, but Jake is on hand with documentation and instructions on its use. To be honest, most of this is over my head and I suspect it won’t really “click” until I try using it for myself.

Where it gets really interesting is in the comments. Stuart asks “What about progressive enhancement?” And Jake points out that because a ServiceWorker won’t be installed on a first visit, you pretty much have to treat it as an enhancement. In fact, you’d have to go out of your way to make it a requirement:

You could, of course, throw up a splash screen and wait for the ServiceWorker to install, creating a ServiceWorker-dependant experience. I will hunt those people down.

Thursday, June 20th, 2013

Web Fonts and the Critical Path - Ian Feather

The battle between web fonts and performance. Ian Feather outlines some possible solutions, but of course, as always, the answer is “it depends”.

Monday, August 15th, 2011

Appcache Facts

A handy one-page cheatsheet for using HTML5’s appcache manifest file for offline storage.

Monday, July 11th, 2011

manifestR - offline web apps made easy (well easier)

A bookmarklet to help you figure out what files you might want to put in your cache manifest for offline storage.

Get off(line) | Web Directions

John has written a very in-depth look at offline storage (using the cache manifest) in HTML5.