Tags: cache

29

sparkline

Thursday, July 5th, 2018

The trimCache function in Going Offline

Paul Yabsley wrote to let me know about an error in Going Offline. It’s rather embarrassing because it’s code that I’m using in the service worker for adactio.com but for some reason I messed it up in the book.

It’s the trimCache function in Chapter 7: Tidying Up. That’s the reusable piece of code that recursively reduces the number of items in a specified cache (cacheName) to a specified amount (maxItems). On page 95 and 96 I describe the process of creating the function which, in the book, ends up like this:

 function trimCache(cacheName, maxItems) {
   cacheName.open( cache => {
     cache.keys()
     .then( items => {
       if (items.length > maxItems) {
         cache.delete(items[0])
         .then(
           trimCache(cacheName, maxItems)
         ); // end delete then
       } // end if
     }); // end keys then
   }); // end open
 } // end function

See the problem? It’s right there at the start when I try to open the cache like this:

cacheName.open( cache => {

That won’t work. The open method only works on the caches object—I should be passing the name of the cache into the caches.open method. So the code should look like this:

caches.open( cacheName )
.then( cache => {

Everything else remains the same. The corrected trimCache function is here:

function trimCache(cacheName, maxItems) {
  caches.open(cacheName)
  .then( cache => {
    cache.keys()
    .then(items => {
      if (items.length > maxItems) {
        cache.delete(items[0])
        .then(
          trimCache(cacheName, maxItems)
        ); // end delete then
      } // end if
    }); // end keys then
  }); // end open then
} // end function

Sorry about that! I must’ve had some kind of brainfart when I was writing (and describing) that one line of code.

You may want to deface your copy of Going Offline by taking a pen to that code example. Normally I consider the practice of writing in books to be barbarism, but in this case …go for it.

Tuesday, June 5th, 2018

Fresher service workers, by default

“Ah, this is good news!”, I thought, reading this update about how service worker scripts won’t be cached.

And that was the moment when I realised what an utter nerd I had become.

Tuesday, March 20th, 2018

How Fast Is Amp Really? - TimKadlec.com

An excellent, thorough, even-handed analysis of AMP’s performance from Tim. The AMP format doesn’t make that much of a difference, the AMP cache does speed things up (as would any CDN), but it’s the pre-rendering that really delivers the performance boost …as long as you give up your URLs.

But right now, the incentives being placed on AMP content seem to be accomplishing exactly what you would think: they’re incentivizing AMP, not performance.

Tuesday, March 6th, 2018

Minimal viable service worker

I really, really like service workers. They’re one of those technologies that have such clear benefits to users that it seems like a no-brainer to add a service worker to just about any website.

The thing is, every website is different. So the service worker strategy for every website needs to be different too.

Still, I was wondering if it would be possible to create a service worker script that would work for most websites. Here’s the script I came up with.

The logic works like this:

  • If there’s a request for an HTML page, fetch it from the network and store a copy in a cache (but if the network request fails, try looking in the cache instead).
  • For any other files, look for a copy in the cache first but meanwhile fetch a fresh version from the network to update the cache (and if there’s no existing version in the cache, fetch the file from the network and store a copy of it in the cache).

So HTML files are served network-first, while all other files are served cache-first, but in both cases a fresh copy is always put in the cache. The idea is that HTML content will always be fresh (unless there’s a problem with the network), while all other content—images, style sheets, scripts—might be slightly stale, but get refreshed with every request.

My original attempt was riddled with errors. Jake came to my rescue and we revised the script into something that actually worked. In the process, my misunderstanding of how await works led Jake to write a great blog post on await vs return vs return await.

I got there in the end and the script seems solid enough. It’s a fairly simplistic strategy that could work for quite a few sites, but it has some issues…

Service workers don’t perform any automatic cleanup of caches—that’s up to you to do (usually during the activate event). This script doesn’t do any cleanup so the cache might grow and grow and grow. For that reason, I think the script is best suited for fairly small sites.

The strategy also assumes that a file will either be fetched from the network or the cache. There’s no contingency for when both attempts fail. So there’s no fallback offline page, for example.

I decided to test it in the wild, but I expanded it slightly to fix the fallback issue. The version on the Ampersand 2018 website includes a worst-case-scenario option to show a custom offline page that has been pre-cached. (By the way, if you haven’t got a ticket for Ampersand yet, get a ticket now—it’s going to be superb day of web typography nerdery.)

Anyway, this fairly basic script seems to be delivering some good performance improvements. If you’ve got a site that you think would benefit from this network/caching strategy, and it’s served over HTTPS, then:

  1. Feel free to download the script or copy and paste it into a file called serviceworker.js,
  2. Put that file in the root directory of your website,
  3. Add this in a script element at the bottom of your HTML pages:

if (navigator.serviceWorker && !navigator.serviceWorker.controller) { navigator.serviceWorker.register('/serviceworker.js'); }

You can also use the script as a starting point. You might find issues specific to your particular website. That’s okay—you can tweak and adjust the script to suit your needs.

If this minimal service worker script proves in any way useful to you, thank Jake.

Monday, February 26th, 2018

Reliable Experiences - PWA Roadshow - YouTube

A good hands-on introduction to service workers from Mariko.

Reliable Experiences - PWA Roadshow

Sunday, February 11th, 2018

Welcoming Progressive Web Apps to Microsoft Edge and Windows 10 - Microsoft Edge Dev BlogMicrosoft Edge Dev Blog

It’s really great to hear about how Microsoft will be promoting progressive web apps as first-class citizens …but it’s really unhelpful that they’re using this fudgy definition:

Progressive Web Apps are just great web sites that can behave like native apps—or, perhaps, Progressive Web Apps are just great apps, powered by Web technologies and delivered with Web infrastructure.

Although they also give a more technical definition:

Technologically speaking, PWAs are web apps, progressively enhanced with modern web technologies (Service Worker, Fetch networking, Cache API, Push notifications, Web App Manifest) to provide a more app-like experience.

Nice try, slipping notifications in there like that, but no. No, no, no. Let’s not fool ourselves into thinking that one of the most annoying “features” of native apps is even desirable on the web.

If you want to use notifications, fine. But they are absolutely not a requirement for a progressive web app.

(A responsive design, on the other hand, totally is.)

Friday, February 9th, 2018

Workers at Your Service | WebKit

Here’s an interesting insight on how WebKit is going to handle the cleanup of unused service workers and caches:

Service worker and Cache API stored information will grow as a user is browsing content. To keep only the stored information that is useful to the user, WebKit will remove unused service worker registrations after a period of a few weeks. Caches that do not get opened after a few weeks will also be removed.

Thursday, May 18th, 2017

It’s not working! Should I blame caching?

Finally, the answer to one of the two hard questions in computer science.

Monday, May 1st, 2017

Offline Web Applications | Udacity

This is a free online video course recorded by Jake a couple of years back. It’s got a really good step-by-step introduction to service workers, delivered in Jake’s typically witty way. Some of the details are a bit out of date, and I must admit that I bailed when it got to IndexedDB, but I highly recommend giving this a go.

There’s also a free course on web accessibility I’m planning to check out.

Tuesday, April 18th, 2017

Progressive Web Apps - ILT  |  Web  |  Google Developers

A step-by-step guide to building progressive web apps. It covers promises, service workers, fetch, and cache, but seeing as it’s from Google, it also pushes the app-shell model.

This is a handy resource but I strongly disagree with some of the advice in the section on architectures (the same bit that gets all swoonsome for app shells):

Start by forgetting everything you know about conventional web design, and instead imagine designing a native app.

Avoid overly “web-like” design.

What a horribly limiting vision for the web! After all that talk about being progressive and responsive, we’re told to pretend we’re imitating native apps on one device type.

What’s really disgusting is the way that the Chrome team are withholding the “add to home screen” prompt from anyone who dares to make progressive web apps that are actually, y’know …webby.

Saturday, April 1st, 2017

AMP: breaking news | Andrew Betts

A wide-ranging post from Andrew on the downsides of Google’s AMP solution.

I don’t agree with all the issues he has with the format itself (in my opinion, the fact that AMP pages can’t have script elements is a feature, not a bug), but I wholeheartedly concur with his concerns about the AMP cache:

It recklessly devalues the URL

Spot on! And as Andrew points out, in this age of fake news, devaluing the URL is a recipe for disaster.

It’s hard to avoid the idea that the primary objective of AMP is really about hosting publisher content inside the Google ecosystem (as is more obviously the objective of Facebook Instant Articles and Apple News).

Thursday, March 23rd, 2017

Need to Catch Up on the AMP Debate? | CSS-Tricks

Funnily enough, I led a brown bag lunch discussion about AMP at work just the other day. A lot of it mirrored Chris’s thoughts here. It’s a complicated situation that has lots of people worried.

Retrofit Your Website as a Progressive Web App — SitePoint

Turning your existing website into a progressive web “app”—a far more appealing prospect than trying to create an entirely new app-shell architecture:

…they are an enhancement of your existing website which should take no longer than a few hours and have no negative effect on unsupported browsers.

Monday, March 13th, 2017

In AMP we trust

AMP Conf was one of those deep dive events, with two days dedicated to one single technology: AMP.

Except AMP isn’t really one technology, is it? And therein lies the confusion. This was at the heart of the panel I was on. When we talk about AMP, we could be talking about one of three things:

  1. The AMP format. A bunch of web components. For instance, instead of using an img element on an AMP page, you use an amp-img element instead.
  2. The AMP rules. There’s one JavaScript file, hosted on Google’s servers, that turns those web components from spans into working elements. No other JavaScript is allowed. All your styles must be in a style element instead of an external file, and there’s a limit on what you can do with those styles.
  3. The AMP cache. The source of most confusion—and even downright enmity—this is what’s behind the fact that when you launch an AMP result from Google search, you don’t go to another website. You see Google’s cached copy of the page instead of the original.

The first piece of AMP—the format—is kind of like a collection of marginal gains. Where the img element might have some performance issues, the amp-img element optimises for perceived performance. But if you just used the AMP web components, it wouldn’t be enough to make your site blazingly fast.

The second part of AMP—the rules—is where the speed gains start to really show. You can’t have an external style sheet, and crucially, you can’t have any third-party scripts other than the AMP script itself. This is key to making AMP pages super fast. It’s not so much about what AMP does; it’s more about what it doesn’t allow. If you never used a single AMP component, but stuck to AMP’s rules disallowing external styles and scripts, you could easily make a page that’s even faster than what AMP can do.

At AMP Conf, Natalia pointed out that The Guardian’s non-AMP pages beat out the AMP pages for performance. So why even have AMP pages? Well, that’s down to the third, most contentious, part of the AMP puzzle.

The AMP cache turns the user experience of visiting an AMP page from fast to instant. While you’re still on the search results page, Google will pre-render an AMP page in the background. Not pre-fetch, pre-render. That’s why it opens so damn fast. It’s also what causes the most confusion for end users.

From my unscientific polling, the behaviour of AMP results confuses the hell out of people. The fact that the page opens instantly isn’t the problem—far from it. It’s the fact that you don’t actually go to an another page. Technically, you’re still on Google. An analogous mental model would be an RSS reader, or an email client: you don’t go to an item or an email; you view it in situ.

Well, that mental model would be fine if it were consistent. But in Google search, only some results will behave that way (the AMP pages) and others will behave just like regular links to other websites. No wonder people are confused! Some search results take them away and some search results keep them on Google …even though the page looks like a different website.

The price that we pay for the instantly-opening AMP pages from the Google cache is the URL. Because we’re looking at Google’s pre-rendered copy instead of the original URL, the address bar is not pointing to the site the browser claims to be showing. Everything in the body of the browser looks like an article from The Guardian, but if I look at the URL (which is what security people have been telling us for years is important to avoid being phished), then I’ll see a domain that is not The Guardian’s.

But wait! Couldn’t Google pre-render the page at its original URL?

Yes, they could. But they won’t.

This was a point that Paul kept coming back to: trust. There’s no way that Google can trust that someone else’s URL will play by the AMP rules (no external scripts, only loading embedded content via web components, limited styles, etc.). They can only trust the copies that they themselves are serving up from their cache.

By the way, there was a joint AMP/search panel at AMP Conf with representatives from both teams. As you can imagine, there were many questions for the search team, most of which were Glomar’d. But one thing that the search people said time and again was that Google was not hosting our AMP pages. Now I don’t don’t know if they were trying to make some fine-grained semantic distinction there, but that’s an outright falsehood. If I click on a link, and the URL I get taken to is a Google property, then I am looking at a page hosted by Google. Yes, it might be a copy of a document that started life somewhere else, but if Google are serving something from their cache, they are hosting it.

This is one of the reasons why AMP feels like such a bait’n’switch to me. When it first came along, it felt like a direct competitor to Facebook’s Instant Articles and Apple News. But the big difference, we were told, was that you get to host your own content. That appealed to me much more than having Facebook or Apple host the articles. But now it turns out that Google do host the articles.

This will be the point at which Googlers will say no, no, no, you can totally host your own AMP pages …but you won’t get the benefits of pre-rendering. But without the pre-rendering, what’s the point of even having AMP pages?

Well, there is one non-cache reason to use AMP and it’s a political reason. Beleaguered developers working for publishers of big bloated web pages have a hard time arguing with their boss when they’re told to add another crappy JavaScript tracking script or bloated library to their pages. But when they’re making AMP pages, they can easily refuse, pointing out that the AMP rules don’t allow it. Google plays the bad cop for us, and it’s a very valuable role. Sarah pointed this out on the panel we were on, and she was spot on.

Alright, but what about The Guardian? They’ve already got fast pages, but they still have to create separate AMP pages if they want to get the pre-rendering benefits when they show up in Google search results. Sorry, says Google, but it’s the only way we can trust that the pre-rendered page will be truly fast.

So here’s the impasse we’re at. Google have provided a list of best practices for making fast web pages, but the only way they can truly verify that a page is sticking to those best practices is by hosting their own copy, URLs be damned.

This was the crux of Paul’s argument when he was on the Shop Talk Show podcast (it’s a really good episode—I was genuinely reassured to hear that Paul is not gung-ho about drinking the AMP Kool Aid; he has genuine concerns about the potential downsides for the web).

Initially, I accepted this argument that Google just can’t trust the rest of the web. But the more I talked to people at AMP Conf—and I had some really, really good discussions with people away from the stage—the more I began to question it.

Here’s the thing: the regular Google search can’t guarantee that any web page is actually 100% the right result to return for a search. Instead there’s a lot of fuzziness involved: based on the content, the markup, and the number of trusted sources linking to this, it looks like it should be a good result. In other words, Google search trusts websites to—by and large—do the right thing. Sometimes websites abuse that trust and try to game the system with sneaky tricks. Google responds with penalties when that happens.

Why can’t it be the same for AMP pages? Let me host my own AMP pages (maybe even host my own AMP script) and then when the Googlebot crawls those pages—the same as it crawls any other pages—that’s when it can verify that the AMP page is abiding by the rules. If I do something sneaky and trick Google into flagging a page as fast when it actually isn’t, then take my pre-rendering reward away from me.

To be fair, Google has very, very strict rules about what and how to pre-render the AMP results it’s caching. I can see how allowing even the potential for a false positive would have a negative impact on the user experience of Google search. But c’mon, there are already false positives in regular search results—fake news, spam blogs. Googlers are smart people. They can solve—or at least mitigate—these problems.

Google says it can’t trust our self-hosted AMP pages enough to pre-render them. But they ask for a lot of trust from us. We’re supposed to trust Google to cache and host copies of our pages. We’re supposed to trust Google to provide some mechanism to users to get at the original canonical URL. I’d like to see trust work both ways.

Wednesday, January 11th, 2017

Making Resilient Web Design work offline

I’ve written before about taking an online book offline, documenting the process behind the web version of HTML5 For Web Designers. A book is quite a static thing so it’s safe to take a fairly aggressive offline-first approach. In fact, a static unchanging book is one of the few situations that AppCache works for. Of course a service worker is better, but until AppCache is removed from browsers (and until service worker is supported across the board), I’m using both. I wouldn’t recommend that for most sites though—for most sites, use a service worker to enhance it, and avoid AppCache like the plague.

For Resilient Web Design, I took a similar approach to HTML5 For Web Designers but I knew that there was a good chance that some of the content would be getting tweaked at least for a while. So while the approach is still cache-first, I decided to keep the cache fairly fresh.

Here’s my service worker. It starts with the usual stuff: when the service worker is installed, there’s a list of static assets to cache. In this case, that list is literally everything; all the HTML, CSS, JavaScript, and images for the whole site. Again, this is a pattern that works well for a book, but wouldn’t be right for other kinds of websites.

The real heavy lifting happens with the fetch event. This is where the logic sits for what the service worker should do everytime there’s a request for a resource. I’ve documented the logic with comments:

// Look in the cache first, fall back to the network
  // CACHE
  // Did we find the file in the cache?
      // If so, fetch a fresh copy from the network in the background
      // NETWORK
          // Stash the fresh copy in the cache
  // NETWORK
  // If the file wasn't in the cache, make a network request
      // Stash a fresh copy in the cache in the background
  // OFFLINE
  // If the request is for an image, show an offline placeholder
  // If the request is for a page, show an offline message

So my order of preference is:

  1. Try the cache first,
  2. Try the network second,
  3. Fallback to a placeholder as a last resort.

Leaving aside that third part, regardless of whether the response is served straight from the cache or from the network, the cache gets a top-up. If the response is being served from the cache, there’s an additional network request made to get a fresh copy of the resource that was just served. This means that the user might be seeing a slightly stale version of a file, but they’ll get the fresher version next time round.

Again, I think this acceptable for a book where the tweaks and changes should be fairly minor, but I definitely wouldn’t want to do it on a more dynamic site where the freshness matters more.

Here’s what it usually likes like when a file is served up from the cache:

caches.match(request)
  .then( responseFromCache => {
  // Did we find the file in the cache?
  if (responseFromCache) {
      return responseFromCache;
  }

I’ve introduced an extra step where the fresher version is fetched from the network. This is where the code can look a bit confusing: the network request is happening in the background after the cached file has already been returned, but the code appears before the return statement:

caches.match(request)
  .then( responseFromCache => {
  // Did we find the file in the cache?
  if (responseFromCache) {
      // If so, fetch a fresh copy from the network in the background
      event.waitUntil(
          // NETWORK
          fetch(request)
          .then( responseFromFetch => {
              // Stash the fresh copy in the cache
              caches.open(staticCacheName)
              .then( cache => {
                  cache.put(request, responseFromFetch);
              });
          })
      );
      return responseFromCache;
  }

It’s asynchronous, see? So even though all that network code appears before the return statement, it’s pretty much guaranteed to complete after the cache response has been returned. You can verify this by putting in some console.log statements:

caches.match(request)
.then( responseFromCache => {
  if (responseFromCache) {
      event.waitUntil(
          fetch(request)
          .then( responseFromFetch => {
              console.log('Got a response from the network.');
              caches.open(staticCacheName)
              .then( cache => {
                  cache.put(request, responseFromFetch);
              });
          })
      );
      console.log('Got a response from the cache.');
      return responseFromCache;
  }

Those log statements will appear in this order:

Got a response from the cache.
Got a response from the network.

That’s the opposite order in which they appear in the code. Everything inside the event.waitUntil part is asynchronous.

Here’s the catch: this kind of asynchronous waitUntil hasn’t landed in all the browsers yet. The code I’ve written will fail.

But never fear! Jake has written a polyfill. All I need to do is include that at the start of my serviceworker.js file and I’m good to go:

// Import Jake's polyfill for async waitUntil
importScripts('/js/async-waituntil.js');

I’m also using it when a file isn’t found in the cache, and is returned from the network instead. Here’s what the usual network code looks like:

fetch(request)
  .then( responseFromFetch => {
    return responseFromFetch;
  })

I want to also store that response in the cache, but I want to do it asynchronously—I don’t care how long it takes to put the file in the cache as long as the user gets the response straight away.

Technically, I’m not putting the response in the cache; I’m putting a copy of the response in the cache (it’s a stream, so I need to clone it if I want to do more than one thing with it).

fetch(request)
  .then( responseFromFetch => {
    // Stash a fresh copy in the cache in the background
    let responseCopy = responseFromFetch.clone();
    event.waitUntil(
      caches.open(staticCacheName)
      .then( cache => {
          cache.put(request, responseCopy);
      })
    );
    return responseFromFetch;
  })

That all seems to be working well in browsers that support service workers. For legacy browsers, like Mobile Safari, there’s the much blunter caveman logic of an AppCache manifest.

Here’s the JavaScript that decides whether a browser gets the service worker or the AppCache:

if ('serviceWorker' in navigator) {
  // If service workers are supported
  navigator.serviceWorker.register('/serviceworker.js');
} else if ('applicationCache' in window) {
  // Otherwise inject an iframe to use appcache
  var iframe = document.createElement('iframe');
  iframe.setAttribute('src', '/appcache.html');
  iframe.setAttribute('style', 'width: 0; height: 0; border: 0');
  document.querySelector('footer').appendChild(iframe);
}

Either way, people are making full use of the offline nature of the book and that makes me very happy indeed.

A Tale of Four Caches · Yoav Weiss

A cute explanation of different browser caches:

  • memory cache,
  • service worker cache,
  • disk cache, and
  • push cache.

Sunday, December 4th, 2016

Service Worker, what are you? - Mariko Kosaka

This is a fun—and accurate—explanation of service workers.

There’s definitely something “alien” about a service worker—it’s kind of like a virus that gets installed on the user’s device. I’ve taken to describing it as “a man-in-the-middle attack on your own website” which makes sound a bit scarier than is necessary.

Tuesday, September 27th, 2016

Offline content with service workers · MadebyMike

This is a really great step-by-step walkthrough of adding a service worker to a website. Mike mentions the gotchas he encountered along the way, and describes how he incrementally levelled up the functionality.

If you’ve been going through a similar process, please write it down and share it like this!

Tuesday, August 23rd, 2016

gmetais/sw-delta: An incremental cache for the web

Here’s an interesting use of service workers: figure out the difference (the delta) between the currently-cached version of a file, and the version on the network, and then grab only the bits that have changed. It requires some configuration on the server side (to send back the diff) but it’s an interesting approach that could be worth keeping an eye on.

Friday, July 15th, 2016

The Progress of Web Apps | Microsoft Edge Dev Blog

The roadmap for progressive web apps from Microsoft; not just their support plans, but also some ideas for distribution.