Tags: caching

Thursday, November 21st, 2019

Request with Intent: Caching Strategies in the Age of PWAs – A List Apart

Aaron outlines some sensible strategies for serving up images, including using the Cache API from your service worker script.

Tuesday, October 29th, 2019

Periodic background sync

Yesterday I wrote about how much I’d like to see silent push for the web:

I’d really like silent push for the web—the ability to update a cache with fresh content as soon as it’s published; that would be nifty! At the same time, I understand the concerns. It feels more powerful than other permission-based APIs like notifications.

Today, John Holt Ripley responded on Twitter:

hi there, just read your blog post about Silent Push for the web, and wondering if Periodic Background Sync would cover a few of those use cases?

Periodic background sync looks very interesting indeed!

It’s not the same as silent push. As the name suggests, this is about your service worker waking up periodically and potentially fetching (and caching) fresh content from the network. So the service worker is polling rather than receiving a push. But I’ll take it! It’s definitely close enough for the kind of use-cases I’ve been thinking about.
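The API is still a proposal at this point, but the rough shape of it looks something like this (the tag name, URL, and interval here are my own invention):

// in the page: register a periodic sync
// (expected to require an installed PWA and sufficient site engagement)
(async function registerPeriodicSync() {
  const registration = await navigator.serviceWorker.ready;
  if ('periodicSync' in registration) {
    await registration.periodicSync.register('get-fresh-content', {
      minInterval: 24 * 60 * 60 * 1000 // at most once a day
    });
  }
})();

// in the service worker: wake up, fetch, and cache
self.addEventListener('periodicsync', event => {
  if (event.tag === 'get-fresh-content') {
    event.waitUntil(
      caches.open('pages')
      .then(cache => cache.add('/'))
    );
  }
});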

Interestingly, periodic background sync also ties into the other part of what I was writing about: permissions. I mentioned that adding a site to the home screen could be interpreted as a signal to potentially allow more permissions (or at least allow prompts for more permissions).

Well, Chromium has a document outlining metrics for attempting to gauge site engagement. There’s some good thinking in there.

Monday, October 28th, 2019

Silent push for the web

After Indie Web Camp in Berlin last year, I wrote about Seb’s nifty demo of push without notifications:

While I’m very unwilling to grant permission to be interrupted by intrusive notifications, I’d be more than willing to grant permission to allow a website to silently cache timely content in the background. It would be a more calm technology.

Phil Nash left a comment on the Medium copy of my post explaining that Seb’s demo of using the Push API without showing a notification wouldn’t work for long:

The browsers allow a certain number of mistakes(?) before they start to show a generic notification to say that your site sent a push notification without showing a notification. I believe that after ~10 or so notifications, and that’s different between browsers, they run out of patience.

He also provided me with the name to describe what I’m after:

You’re looking for “silent push” as are many others.

Silent push is something that is possible in native apps. It isn’t (yet?) available on the web, presumably because of security concerns.

It’s an API that would be ripe for abuse. I mean, just look at the mess we’ve made with APIs like notifications and geolocation. Sure, they require explicit user opt-in, but these opt-ins are seen so often that users are sick of seeing them. Silent push would be one more permission-based API to add to the stack of annoyances.

Still, I’d really like silent push for the web—the ability to update a cache with fresh content as soon as it’s published; that would be nifty! At the same time, I understand the concerns. It feels more powerful than other permission-based APIs like notifications.

Maybe there could be another layer of permissions. What if adding a site to your home screen was the first step? If a site is running on HTTPS, has a service worker, has a web app manifest, and has been added to the home screen, maybe then and only then should it be allowed to prompt for permission to do silent push.

In other words, what if certain very powerful APIs were only available to progressive web apps that have successfully been added to the home screen?

Frankly, I’d be happy if the same permissions model applied to web notifications too, but I guess that ship has sailed.

Anyway, all this is pure conjecture on my part. As far as I know, silent push isn’t on the roadmap for any of the browser vendors right now. That’s fair enough. Although it does annoy me that native apps have this capability that web sites don’t.

It used to be that there was a long list of features that only native apps could do, but that list has grown shorter and shorter. The web’s tortoise is catching up to native’s hare.

Friday, October 25th, 2019

Offline Page Descriptions | Erik Runyon

Here’s a nice example of showing pages offline. It’s subtly different from what I’m doing on my own site, which goes to show that there’s no one-size-fits-all recipe when it comes to offline strategies.

Thursday, October 3rd, 2019

Blog service workers and the chicken and the egg

This is a great little technique from Remy: when a service worker is being installed, you make sure that the page(s) the user is first visiting get added to a cache.

Saturday, September 21st, 2019

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retrieve the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.
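A minimal h-card might look like this (the details are invented):

<p class="h-card">
    <a class="p-name u-url" href="https://example.com/">Jane Doe</a>,
    <span class="p-tel">+44 1273 000000</span>
</p>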

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.
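Stripped down to its microformats hooks, the markup for one of those posts looks something like this:

<article class="h-entry">
    <h1 class="p-name">The title of the post</h1>
    <time class="dt-published" datetime="2019-09-21T12:00:00Z">September 21st, 2019</time>
    <div class="e-content">
        <p>The content of the post…</p>
    </div>
</article>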

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<h1>Offline</h1>
<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">
</div>

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

I’m giving this async function a name (listPages, just like Remy is doing), although an anonymous async function would work too. I’m making the listPages function execute immediately:

(async function listPages() {
...
})();

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {
...
}

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name')) {
...
}

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = {};

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.

browsingHistory.push(data);

My if statement and my for/of loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = {};
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
    browsingHistory.push(data);
  }
}

That’s the data collection part of the code. Now I’m going to take all that yummy information and output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {
...
}

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

<script>
(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = {};
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
      browsingHistory.push(data);
    }
  }
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    });
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
    });
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);
  }
})();
</script>

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Friday, September 6th, 2019

Offline listings

This is brilliant technique by Remy!

If you’ve got a custom offline page that lists previously-visited pages (like I do on my site), you don’t have to choose between localStorage and IndexedDB—you can read the metadata straight from the HTML of the cached pages instead!

This seems forehead-smackingly obvious in hindsight. I’m totally stealing this.

Monday, August 26th, 2019

Opening up the AMP cache

I have a proposal that I think might alleviate some of the animosity around Google AMP. You can jump straight to the proposal or get some of the back story first…

The AMP format

Google AMP is exactly the kind of framework I’d like to get behind. Unlike most front-end frameworks, its components take a declarative approach—no knowledge of JavaScript required. I think Lea’s excellent Mavo is the only other major framework that takes this inclusive approach. All the configuration happens in markup, and all the styling happens in CSS. Excellent!

But I cannot get behind AMP.

Instead of competing on its own merits, AMP is unfairly propped up by the search engine of its parent company, Google. That makes it very hard to evaluate whether AMP is being used on its own merits. Instead, the evidence suggests that most publishers of AMP pages are doing so because they feel they have to, rather than because they want to. That’s a real shame, because as a library of web components, AMP seems pretty good. But there’s just no way to evaluate AMP-the-format without taking into account AMP-the-ecosystem.

The AMP ecosystem

Google AMP ostensibly exists to make the web faster. Initially the focus was specifically on mobile performance, but that distinction has since fallen by the wayside. The idea is that by using AMP’s web components, your pages will be speedy. Though, as Andy Davies points out, this isn’t always the case:

This is where I get confused… https://independent.co.uk only have an AMP site yet it’s performance is awful from a user perspective - isn’t AMP supposed to prevent this?

See also: Google AMP lowered our page speed, and there’s no choice but to use it:

According to Google’s own Page Speed Insights audit (which Google recommends to check your performance), the AMP version of articles got an average performance score of 87. The non-AMP versions? 95.

Publishers who already have fast web pages—like The Guardian—are still compelled to make AMP versions of their stories because of the search benefits reserved for AMP. As Terence Eden reported from a meeting of the AMP advisory committee:

We heard, several times, that publishers don’t like AMP. They feel forced to use it because otherwise they don’t get into Google’s news carousel — right at the top of the search results.

Some people felt aggrieved that all the hard work they’d done to speed up their sites was for nothing.

The Google AMP team are at pains to point out that AMP is not a ranking factor in search. That’s true. But it is unfairly privileged in other ways. Only AMP pages can appear in the Top Stories carousel …which appears above any other search results. As I’ve said before:

Now, if you were to ask any right-thinking person whether they think having their page appear right at the top of a list of search results would be considered preferential treatment, I think they would say hell, yes! This is the only reason why The Guardian, for instance, even have AMP versions of their content—it’s not for the performance benefits (their non-AMP pages are faster); it’s for that prime real estate in the carousel.

From A letter about Google AMP:

Content that “opts in” to AMP and the associated hosting within Google’s domain is granted preferential search promotion, including (for news articles) a position above all other results.

That’s not the only way that AMP pages get preferential treatment. It turns out that the secret to the speed of AMP pages isn’t the web components. It’s the prerendering.

The AMP cache

If you’ve ever seen an AMP page in a list of search results, you’ll have noticed the little lightning icon. If you’ve ever tapped on that search result, you’ll have noticed that the page loads blazingly fast!

That’s not down to AMP-the-format, alas. That’s down to the fact that the page has been prerendered by Google before you even went to it. If any page were prerendered that way, it would load blazingly fast. But currently, this privilege is reserved for AMP pages only.

If, after tapping through to that AMP page, you looked at the address bar of your browser, you might have noticed something odd. Even though you might have thought you were visiting The Washington Post, or The New York Times, the URL of the (blazingly fast) page you’re looking at is still under Google’s domain. That’s because Google hosts any AMP pages that it prerenders.

Google calls this “the AMP cache”, but it would be better described as “AMP hosting”. The web page sent down the wire is hosted on Google’s domain.

Here’s that AMP letter again:

When a user navigates from Google to a piece of content Google has recommended, they are, unwittingly, remaining within Google’s ecosystem.

Through gritted teeth, I will refer to this as “the AMP cache”, because that’s what everyone else calls it. But make no mistake, Google is hosting—not caching—these pages.

But why host the pages on a Google domain? Why not prerender the original URLs?

Prerendering and privacy

Scott summed up the situation with AMP nicely:

The pitch I think site owners are hearing is: let us host your pages on our domain and we’ll promote them in search results AND preload them so they feel “instant.” To opt-in, build pages using this component syntax.

But perhaps we could de-couple the AMP format from the AMP cache.

That’s what Terence suggests:

My recommendation is that Google stop requiring that organisations use Google’s proprietary mark-up in order to benefit from Google’s promotion.

The AMP letter, too:

Instead of granting premium placement in search results only to AMP, provide the same perks to all pages that meet an objective, neutral performance criterion such as Speed Index.

Scott reiterates:

It’s been said before but it would be so good for the web if pages with a Lighthouse score over say, 90 could get into that top search result area, even if they’re not built using Google’s AMP framework. Feels wrong to have to rebuild/reproduce an already-fast site just for SEO.

This was also what I was calling for. But then Malte pointed out something that stumped me. Privacy.

Here’s the problem…

Let’s say Google do indeed prerender already-fast pages when they’re listed in search results. You, a search user, type something into Google. A list of results comes back. Google begins prerendering some of them. But you don’t end up clicking through to those pages. Nonetheless, the servers those pages are hosted on have received a GET request coming from a Google search. Those publishers now know that a particular (cookied?) user could have clicked through to their site. That’s very different from knowing when someone has actually arrived at a particular site.

And that’s why Google host all the AMP pages that they prerender. Given the privacy implications of prerendering non-Google URLs, I must admit that I see their point.

Still, it’s a real shame to miss out on the speed benefit of prerendering:

Prerendering AMP documents leads to substantial improvements in page load times. Page load time can be measured in different ways, but they consistently show that prerendering lets users see the content they want faster. For now, only AMP can provide the privacy preserving prerendering needed for this speed benefit.

A modest proposal

Why is Google’s AMP cache just for AMP pages? (Y’know, apart from the obvious answer that it’s in the name.)

What if Google were allowed to host non-AMP pages? Google search could then prerender those pages just like it currently does for AMP pages. There would be no privacy leaks; everything would happen on the same domain—google.com or ampproject.org or whatever—just as currently happens with AMP pages.

Don’t get me wrong: I’m not suggesting that Google should make a 1:1 model of the web just to prerender search results. I think that the implementation would need to have two important requirements:

  1. Hosting needs to be opt-in.
  2. Only fast pages should be prerendered.

Opting in

Currently, by publishing a page using the AMP format, publishers give implicit approval to Google to host that page on Google’s servers and serve up this Google-hosted version from search results. This has always struck me as being legally iffy. I’ve looked in the AMP documentation to try to find any explicit granting of hosting permission (e.g. “By linking to this JavaScript file, you hereby give Google the right to serve up our copies of your content.”), but no luck. So even with the current situation, I think a clear opt-in for hosting would be beneficial.

This could be a meta element. Maybe something like:

<meta name="caches-allowed" content="google">

This would have the nice benefit of allowing comma-separated values:

<meta name="caches-allowed" content="google, yandex">

(The name is just a strawman, by the way—I’m not suggesting that this is what the final implementation would actually look like.)

If not a meta element, then perhaps this could be part of robots.txt? Although my feeling is that this needs to happen on a document-by-document basis rather than site-wide.

Many people will, quite rightly, never want Google—or anyone else—to host and serve up their content. That’s why it’s so important that this behaviour needs to be opt-in. It’s kind of appalling that the current hosting of AMP pages is opt-in-by-proxy-sort-of.

Criteria for prerendering

Which pages should be blessed with hosting and prerendering? The fast ones. That’s sorta the whole point of AMP. But right now, there’s a lot of resentment by people with already-fast websites who quite rightly feel they shouldn’t have to use the AMP format to benefit from the AMP ecosystem.

Page speed is already a ranking factor. It doesn’t seem like too much of a stretch to extend its benefits to hosting and prerendering. As mentioned above, there are already a few possible metrics to use:

  • Page Speed Index
  • Lighthouse
  • Web Page Test

Ah, but what if a page has a good score when it’s indexed, but then gets worse afterwards? Not a problem! The version of the page that’s measured is the same version of the page that gets hosted and prerendered. Google can confidently say “This page is fast!” After all, they’re the ones serving up the page.

That does raise the question of how often Google should check back with the original URL to see if it has changed/worsened/improved. The answer to that question is however long it currently takes to check back in on AMP pages:

Each time a user accesses AMP content from the cache, the content is automatically updated, and the updated version is served to the next user once the content has been cached.

Issues

This proposal does not solve the problem with the address bar. You’d still find yourself looking at a page from The Washington Post or The New York Times (or adactio.com) but seeing a completely different URL in your browser. That’s not good, for all the reasons outlined in the AMP letter.

In fact, this proposal could potentially make the situation worse. It would allow even more sites to be impersonated by Google’s URLs. Where currently only AMP pages are bad actors in terms of URL confusion, opening up the AMP cache would allow equal opportunity URL confusion.

What I’m suggesting is definitely not a long-term solution. The long-term solutions currently being investigated are technically tricky and will take quite a while to come to fruition—web packages and signed exchanges. In the meantime, what I’m proposing is a stopgap solution that’s technically a lot simpler. But it won’t solve all the problems with AMP.

This proposal solves one problem—AMP pages being unfairly privileged in search results—but does nothing to solve the other, perhaps more serious problem: the erosion of site identity.

Measuring

Currently, Google can assess whether a page should be hosted and prerendered by checking to see if it’s a valid AMP page. That test would need to be widened to include a different measurement of performance, but those measurements already exist.

I can see how this assessment might not be as quick as checking for AMP validity. That might affect whether non-AMP pages could be measured quickly enough to end up in the Top Stories carousel, which is, by its nature, time-sensitive. But search results are not necessarily as time-sensitive. Let’s start there.

Assets

Currently, AMP pages can be prerendered without fetching anything other than the markup of the AMP page itself. All the CSS is inline. There are no initial requests for other kinds of content like images. That’s because there are no img elements on the page: authors must use amp-img instead. The image itself isn’t loaded until the user is on the page.
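An AMP image looks something like this:

<amp-img src="image.jpg" width="600" height="400" layout="responsive" alt="A description of the image"></amp-img>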

If the AMP cache were to be opened up to non-AMP pages, then any content required for prerendering would also need to be hosted on that same domain. Otherwise, there’s privacy leakage.

This definitely introduces an extra level of complexity. Paths to assets within the markup might need to be re-written to point to the Google-hosted equivalents. There would almost certainly need to be a limit on the number of assets allowed. Though, for performance, that’s no bad thing.

Make no mistake, figuring out what to do about assets—style sheets, scripts, and images—is very challenging indeed. Luckily, there are very smart people on the Google AMP team. If that brainpower were to focus on this problem, I am confident they could solve it.

Summary

  1. Prerendering of non-Google URLs is problematic for privacy reasons, so Google needs to be able to host pages in order to prerender them.
  2. Currently, that’s only done for pages using the AMP format.
  3. The AMP cache—and with it, prerendering—should be decoupled from the AMP format, and opened up to other fast web pages.

There will be technical challenges, but hopefully nothing insurmountable.

I honestly can’t see what Google have to lose here. If their goal is genuinely to reward fast pages, then opening up their AMP cache to fast non-AMP pages will actively encourage people to make fast web pages (without having to switch over to the AMP format).

I’ve deliberately kept the details vague—what the opt-in should look like; what the speed measurement should be; how to handle assets—I’m sure smarter folks than me can figure that stuff out.

I would really like to know what other people think about this proposal. Obviously, I’d love to hear from members of the Google AMP team. But I’d also love to hear from publishers. And I’d very much like to know what people in the web performance community think about this. (Write a blog post and send me a webmention.)

What am I missing here? What haven’t I thought of? What are the potential pitfalls (and are they any worse than the current acrimonious situation with Google AMP)?

I would really love it if someone with a fast website were in a position to say, “Hey Google, I’m giving you permission to host this page so that it can be prerendered.”

I would really love it if someone with a slow website could say, “Oh, shit! We’d better make our existing website faster or Google won’t host our pages for prerendering.”

And I would dearly love to finally be able to embrace AMP-the-format with a clear conscience. But as long as prerendering is joined at the hip to the AMP format, the injustice of the situation only harms the AMP project.

Google, open up the AMP cache.

Tuesday, July 2nd, 2019

The trimCache function in Going Offline …again

It seems that some code that I wrote in Going Offline is haunted. It’s the trimCache function.

First, there was the issue of a typo. Or maybe it’s more of a brainfart than a typo, but either way, there’s a mistake in the syntax that was published in the book.

Now it turns out that there’s also a problem with my logic.

To recap, this is a function that takes two arguments: the name of a cache, and the maximum number of items that cache should hold.

function trimCache(cacheName, maxItems) {

First, we open up the cache:

caches.open(cacheName)
.then( cache => {

Then, we get the items (keys) in that cache:

cache.keys()
.then(keys => {

Now we compare the number of items (keys.length) to the maximum number of items allowed:

if (keys.length > maxItems) {

If there are too many items, delete the first item in the cache—that should be the oldest item:

cache.delete(keys[0])

And then run the function again:

.then(
    trimCache(cacheName, maxItems)
);

A-ha! See the problem?

Neither did I.

It turns out that, even though I’m using then, the function will be invoked immediately, instead of waiting until the first item has been deleted.

Trys helped me understand what was going on by making a useful analogy. You know when you use setTimeout, you can’t put a function—complete with parentheses—as the first argument?

window.setTimeout(doSomething(someValue), 1000);

In that example, doSomething(someValue) will be invoked immediately—not after 1000 milliseconds. Instead, you need to create an anonymous function like this:

window.setTimeout( function() {
    doSomething(someValue)
}, 1000);

Well, it’s the same in my trimCache function. Instead of this:

cache.delete(keys[0])
.then(
    trimCache(cacheName, maxItems)
);

I need to do this:

cache.delete(keys[0])
.then( function() {
    trimCache(cacheName, maxItems)
});

Or, if you prefer the more modern arrow function syntax:

cache.delete(keys[0])
.then( () => {
    trimCache(cacheName, maxItems)
});

Either way, I have to wrap the recursive function call in an anonymous function.

Here’s a gist with the updated trimCache function.
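Putting all the pieces together, the corrected function looks like this:

function trimCache(cacheName, maxItems) {
    caches.open(cacheName)
    .then( cache => {
        cache.keys()
        .then(keys => {
            if (keys.length > maxItems) {
                cache.delete(keys[0])
                .then( () => {
                    trimCache(cacheName, maxItems)
                });
            }
        });
    });
}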

What’s annoying is that this mistake wasn’t throwing an error. Instead, it was causing a performance problem. I’m using this pattern right here on my own site, and whenever my cache of pages or images got too big, the trimCache function would get called …and then wouldn’t stop running.

I’m very glad that—with the help of Trys at last week’s Homebrew Website Club Brighton—I was finally able to get to the bottom of this. If you’re using the trimCache function in your service worker, please update the code accordingly.

Management regrets the error.

Monday, June 24th, 2019

Am I cached or not?

When I was writing about the lie-fi strategy I’ve added to adactio.com, I finished with this thought:

What I’d really like is some way to know—on the client side—whether or not the currently-loaded page came from a cache or from a network. Then I could add some kind of interface element that says, “Hey, this page might be stale—click here if you want to check for a fresher version.”

Trys heard my plea, and came up with a very clever technique to alter the HTML of a page when it’s put into a cache.

It’s a function that reads the response body stream in, returning a new stream. Whilst reading the stream, it searches for the character codes that make up: <html. If it finds them, it tacks on a data-cached attribute.

Nice!

But then I was discussing this issue with Tantek and Aaron late one night after Indie Web Camp Düsseldorf. I realised that I might have another potential solution that doesn’t involve the service worker at all.

Caveat: this will only work for pages that have some kind of server-side generation. This won’t work for static sites.

In my case, pages are generated by PHP. I’m not doing a database lookup every time you request a page—I’ve got a server-side cache of posts, for example—but there is a little bit of assembly done for every request: get the header from here; get the main content from over there; get the footer; put them all together into a single page and serve that up.

This means I can add a timestamp to the page (using PHP). I can mark the moment that it was served up. Then I can use JavaScript on the client side to compare that timestamp to the current time.

I’ve published the code as a gist.

In a script element on each page, I have this bit of coducken:

var serverTimestamp = <?php echo time(); ?>;

Now the JavaScript variable serverTimestamp holds the timestamp that the page was generated. When the page is put in the cache, this won’t change. This number should be the number of seconds since January 1st, 1970 in the UTC timezone (that’s what my server’s timezone is set to).

Starting with JavaScript’s Date object, I use a caravan of methods like toUTCString() and getTime() to end up with a variable called clientTimestamp. This will give the current number of seconds since January 1st, 1970, regardless of whether the page is coming from the server or from the cache.

var localDate = new Date();
var localUTCString = localDate.toUTCString();
var UTCDate = new Date(localUTCString);
var clientTimestamp = UTCDate.getTime() / 1000;

Then I compare the two and see if there’s a discrepancy greater than five minutes:

if (clientTimestamp - serverTimestamp > (60 * 5))

If there is, then I inject some markup into the page, telling the reader that this page might be stale:

document.querySelector('main').insertAdjacentHTML('afterbegin',`
  <p class="feedback">
    <button onclick="this.parentNode.remove()">dismiss</button>
    This page might be out of date. You can try <a href="javascript:window.location=window.location.href">refreshing</a>.
  </p>
`);

The reader has the option to refresh the page or dismiss the message.

This page might be out of date. You can try refreshing.

It’s not foolproof by any means. If the visitor’s computer has their clock set weirdly, then the comparison might return a false positive every time. Still, I thought that using UTC might be a safer bet.

All in all, I think this is a pretty good method for detecting if a page is being served from a cache. Remember, the goal here is not to determine if the user is offline—for that, there’s navigator.onLine.

The upshot is this: if you visit my site with a crappy internet connection (lie-fi), then after three seconds you may be served with a cached version of the page you’re requesting (if you visited that page previously). If that happens, you’ll now also be presented with a little message telling you that the page isn’t fresh. Then it’s up to you whether you want to have another go.

I like the way that this puts control back into the hands of the user.

Monday, June 3rd, 2019

Self-Host Your Static Assets – CSS Wizardry

Trust no one! Harry enumerates the reasons why you should be self-hosting your assets (and busts some myths along the way).

There really is very little reason to leave your static assets on anyone else’s infrastructure. The perceived benefits are often a myth, and even if they weren’t, the trade-offs simply aren’t worth it. Loading assets from multiple origins is demonstrably slower.

Thursday, May 9th, 2019

Distinguishing cached vs. network HTML requests in a Service Worker | Trys Mudford

Less than 24 hours after I put the call out for a solution to this gnarly service worker challenge, Trys has come up with a solution.

Friday, March 8th, 2019

Tuning Performance for New and “Old” Friends | Filament Group, Inc., Boston, MA

This is a really clever technique from Scott that he unveiled at An Event Apart in Seattle. It uses a header sent by a service worker to distinguish between returning and new visitors—much neater than relying on a cookie. I’ve updated my service worker on The Session to use this technique now.

Tuesday, March 5th, 2019

Move Fast and Don’t Break Things by Scott Jehl

Scott Jehl is speaking at An Event Apart in Seattle—yay! His talk is called Move Fast and Don’t Break Things:

Performance is a high priority for any site of scale today, but it can be easier to make a site fast than to keep it that way. As a site’s features and design evolves, its performance is often threatened for a number of reasons, making it hard to ensure fast, resilient access to services. In this session, Scott will draw from real-world examples where business goals and other priorities have conflicted with page performance, and share some strategies and practices that have helped major sites overcome those challenges to defend their speed without compromises.

The title is a riff on the “move fast and break things” motto, which comes from a more naive time on the web. But Scott finds part of it relatable. Things break. We want to move fast without breaking things.

This is a performance talk, which is another kind of moving fast. Scott starts with a brief history of not breaking websites. He’s been chipping away at websites for 20 years now. Remember Positioning Is Everything? How about Quirksmode? That one's still around.

In the early days, building a website that was "not broken" was difficult, but it was difficult for different reasons. We were focused on consistency. We had to deal with differences between browsers. There were two ways of dealing with browsers: browser detection and feature detection.

The feature-based approach was more sustainable but harder. It fits nicely with the practice of progressive enhancement. It’s a good mindset for dealing with the explosion of devices that kicked off later. Touch screens made us rethink our mouse-centric and hover-centric interactions. That made us realise how much keyboard-driven access mattered all along.

Browsers exploded too. And our data networks changed. With this explosion of considerations, it was clear that our early ideas of “not broken” didn’t work. Our notion of what constituted “not broken” was itself broken. Consistency just doesn’t cut it.

But there was a comforting part to this too. It turned out that progressive enhancement was there to help …even though we didn’t know what new devices were going to appear. This is a recurring theme throughout Scott’s career. So given all these benefits of progressive enhancement, it shouldn’t be surprising that it turns out to be really good for performance too. If you practice progressive enhancement, you’re kind of a performance expert already.

People started talking about new performance metrics that we should care about. We’ve got new tools, like Page Speed Insights. It gives tangible advice on how to test things. Web Page Test is another great tool. Once you prove you’re a human, Web Page Test will give you loads of details on how a page loaded. And you get this great visual timeline.

This is where we can start to discuss the metrics we want to focus on. Traditionally, we focused on file size, which still matters. But for goal-setting, we want to focus on user-perceived metrics.

First Meaningful Content. It’s about how soon the page appears to be useful to a user. Progressive enhancement is a perfect match for this! When you first make a request to a website, it’s usually for a web page. But to render that page, it might need to request more files like CSS or JavaScript. All of this adds up. From a user perspective, if the HTML is downloaded, but the browser can’t render it, that’s broken.

The average time for this on the web right now is around six seconds. That’s broken. The render blockers are the problem here.

Consider assets like scripts. Can you get the browser to load them without holding up the rendering of the page? If you can add async or defer to a script element in the head, you should do that. Sometimes that’s not an option though.
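For example (with a made-up file name):

<script src="/js/scripts.js" defer></script>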

For CSS, it’s tricky. We’ve delivered the HTML that we need but we’ve got to wait for the CSS before rendering it. So what can you bundle into that initial payload?

You can use server push. This is a new technology that comes with HTTP2. H2, as it’s called, is very performance-focused. Just turning on H2 will probably make your site faster. Server push allows the server to send files to the browser before the browser has even asked for them. You can do this with directives in Apache, for example. You could push CSS whenever an HTML file is requested. But we need to be careful not to go too far. You don’t want to send too much.
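As a sketch, assuming Apache with mod_http2 and mod_headers enabled (the stylesheet path is made up), adding a preload Link header to HTML responses is enough to trigger a push:

<FilesMatch "\.html$">
    Header add Link "</css/site.css>; rel=preload; as=style"
</FilesMatch>

With H2Push switched on (it’s on by default when HTTP/2 is enabled), Apache turns that preload hint into a server push of the stylesheet.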

Server push is great in moderation. But it is new, and it may not even be supported by your server.

Another option is to inline CSS (well, actually Scott, this is technically embedding CSS). It’s great for first render, but isn’t it wasteful for caching? Scott has a clever pattern that uses the Cache API to grab the contents of the inlined CSS and put a copy of its contents into the cache. Then it’s ready to be served up by a service worker.
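The gist of the pattern might look something like this (the cache name and stylesheet URL here are invented):

var inlinedCSS = document.querySelector('style').textContent;
if ('caches' in window) {
    caches.open('static').then(function (cache) {
        // store the inlined CSS under the URL of the full stylesheet
        cache.put('/css/site.css', new Response(inlinedCSS, {
            headers: {'Content-Type': 'text/css'}
        }));
    });
}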

By the way, this isn’t just for CSS. You could grab the contents of inlined SVGs and create cached versions for later use.

So inlining CSS is good, but again, in moderation. You don’t want to embed anything bigger than 15 or 20 kilobytes. You might want to separate out the critical CSS and only embed that on first render. You don’t need to go through your CSS by hand to figure out what’s critical—there are tools to do this that integrate with your build process. Embed that critical CSS into the head of your document, and also start preloading the full CSS. Here’s a clever technique that turns a preload link into a stylesheet link:

<link rel="preload" href="site.css" as="style" onload="this.rel='stylesheet'">

Also include this:

<noscript><link rel="stylesheet" href="site.css"></noscript>

You can also optimise for return visits. It’s all about the cache.

In the past, we might’ve used a cookie to distinguish a returning visitor from a first-time visitor. But cookies kind of suck. Here’s something that Scott has been thinking about: service workers can intercept outgoing requests. A service worker could send a header that matches the current build of CSS. On the server, we can check for this header. If it’s not the latest CSS, we can server push the latest version, or inline it.

The neat thing about service workers is that they have to install before they take over. Scott makes use of this install event to put your important assets into a cache. Only once that is done do we start adding that extra header to requests.
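A rough sketch of that idea; the header name, version string, and file path here are all made up for illustration:

var cssVersion = '20190305';

self.addEventListener('install', function (event) {
    // pre-cache the important assets before this service worker takes over
    event.waitUntil(
        caches.open('static-' + cssVersion)
        .then(function (cache) {
            return cache.addAll(['/css/site.css']);
        })
    );
});

self.addEventListener('fetch', function (event) {
    // for page navigations, tell the server which CSS build we already have
    if (event.request.mode === 'navigate') {
        var headers = new Headers(event.request.headers);
        headers.append('X-CSS-Version', cssVersion);
        event.respondWith(fetch(event.request.url, { headers: headers }));
    }
});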

Watch out for an article on the Filament Group blog on this technique!

With performance, more weight doesn’t have to mean more wait. You can have a heavy page that still appears to load quickly by altering the prioritisation of what loads first.

Web pages are very heavy now. There’s a real cost to every byte. Tim’s WhatDoesMySiteCost.com shows that the CNN home page costs almost fifty cents to load for someone in America!

Time to interactive. This is the time before a user can use what’s on the screen. The issue is almost always with JavaScript. The page looks usable, but you can’t use it yet.

Addy Osmani suggests we should get to interactive in under five seconds on a 3G network on a median mobile device. Your iPhone is not a median mobile device. A typical phone takes six seconds to process a megabyte of JavaScript after it has downloaded. So even if the network is fast, the time to interactive can still be very long.

This all comes down to our industry’s increasing reliance on JavaScript just to render content. There seem to be pendulum shifts between client-side and server-side rendering. It’s been great to see libraries like Vue and Ember embrace server-side rendering.

But even with server-side rendering, there’s still usually a rehydration step where all the JavaScript gets parsed and that really affects time to interaction.

Code splitting can help. Webpack can do this. That helps with first-party JavaScript, but what about third-party JavaScript?

Scott believes it’s easier to make a fast website than to keep a fast website. And that’s down to all the third-party scripts that people throw in: analytics, ads, tracking. They can wreak havoc on all your hard work.

These scripts apparently contribute to the business model, so it can be hard for us to make the case for removing them. Tools like SpeedCurve can help people stay informed on the impact of these scripts. It allows you to set up performance budgets and it shows you when pages go over budget. When that happens, we have leverage to step in and push back.

Assuming you lose that battle, what else can we do?

These days, lots of A/B testing and personalisation happens on the client side. The tooling is easy to use. But they are costly!

A typical problematic pattern is this: the server sends one version of the page, and once the page is loaded, the whole page gets replaced with a different layout targeted at the user. This leads to a terrifying new metric that Scott calls Second Meaningful Content.

Assuming we can’t remove the madness, what can we do? We could at least not do this for first-time visits. We could load the scripts asynchronously. We can preload the scripts at the top of the page. But ideally we want to move these things to the server. Server-side A/B testing and personalisation have existed for a while now.

Scott has been experimenting with a middleware solution. There’s this idea of server workers that Cloudflare is offering (Cloudflare Workers). You can manipulate the page that gets sent from the server to the browser—all the things you would do for an A/B test. Scott is doing this by using comments in the HTML to demarcate which portions of the page should be filtered for testing. The server worker then deletes a block for some users, and deletes a different block for other users. Scott has written about this approach.

The point here isn’t about using Cloudflare. The broader point is that it’s much faster to do these things on the server. We need to defend our user’s time.

Another issue, other than third-party scripts, is the page weight on home pages and landing pages. Marketing teams love to fill these things with enticing rich imagery and carousels. They’re really difficult to keep performant because they change all the time. Sometimes we’re not even in control of the source code of these pages.

We can advocate for new best practices like responsive images. The srcset attribute on the img element; the picture element for when you need more control. These are great tools. What’s not so great is writing the markup. It’s confusing! Ideally we’d have a CMS drive this, but a lot of the time, landing pages fall outside of the purview of the CMS.

Scott has been using Vue.js to make a responsive image builder—a form that people can paste their URLs into, which spits out the markup to use. Anything we can do by creating tools like these really helps to defend the performance of a site.

Another thing we can do is lazy loading. Focus on the assets. The BBC homepage uses some lazy loading for images—they blink into view as you scroll down the page. They use LazySizes, which you can find on GitHub. You use data- attributes to list your image sources. Scott realises that LazySizes is not progressive enhancement. He wouldn’t recommend using it on all images, just some images further down the page.
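With LazySizes, the image gets a class of lazyload and the real sources go in data- attributes. Something like this (file names invented):

<img data-src="photo-small.jpg"
    data-srcset="photo-small.jpg 480w, photo-large.jpg 1024w"
    data-sizes="auto"
    class="lazyload"
    alt="A description of the photo">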

But thankfully, we won’t need these workarounds soon. Soon we’ll have lazy loading in browsers. There’s a lazyload attribute that we’ll be able to set on img and iframe elements:

<img src=".." alt="..." lazyload="on">

It’s not implemented yet, but it’s coming in Chrome. It might be that this behaviour even becomes the default way of loading images in browsers.

If you dig under the hood of the implementation coming in Chrome, it actually loads all the images, but the ones being lazyloaded are only partially fetched at first, using a 206 (Partial Content) response. That gives enough information for the browser to lay out the page without loading the whole image initially.

To wrap up, Scott takes comfort from the fact that there are resilient patterns out there to help us. And remember, it is our job to defend the user’s experience.

Monday, March 4th, 2019

Cache-Control for Civilians – CSS Wizardry

Harry breaks down cache-control headers into steps that even I can understand. I’ll be using this as a reference for sure.

Thursday, December 6th, 2018

Introducing Background Fetch  |  Web  |  Google Developers

I’m going to have to read through this article by Jake a few times before I begin to wrap my head around this background fetch thing, but it looks like it would be perfect for something like the dConstruct Audio Archive, where fairly large files can be saved for offline listening.

Saturday, November 24th, 2018

PushAPI without Notifications | Seblog

Remember when I wrote about using push without notifications? Sebastiaan has written up the details of the experiment he conducted at Indie Web Camp Berlin.

Tuesday, November 20th, 2018

Push and ye shall receive | CSS-Tricks

Imagine a PWA podcast app that works offline and silently receives and caches new podcasts. Sweet. Now we need a permissions model that allows for silent notifications.

Tuesday, November 13th, 2018

Inlining or Caching? Both Please! | Filament Group, Inc., Boston, MA

This just blew my mind! A fiendishly clever pattern that allows you to inline resources (like critical CSS) and cache that same content for later retrieval by a service worker.

Crazy clever!

Sunday, November 11th, 2018

Push without notifications

On the first day of Indie Web Camp Berlin, I led a session on going offline with service workers. This covered all the usual use-cases: pre-caching; custom offline pages; saving pages for offline reading.

But on the second day, Sebastiaan spent a fair bit of time investigating a more complex use of service workers with the Push API.

The Push API is what makes push notifications possible on the web. There are a lot of moving parts—browser, server, service worker—and, frankly, it’s way over my head. But I’m familiar with the general gist of how it works. Here’s a typical flow:

  1. A website prompts the user for permission to send push notifications.
  2. The user grants permission.
  3. A whole lot of complicated stuff happens behind the scenes.
  4. Next time the website publishes something relevant, it fires a push message containing the details of the new URL.
  5. The user’s service worker receives the push message (even if the site isn’t open).
  6. The service worker creates a notification linking to the URL, interrupting the user, and generally adding to the weight of information overload.

Here’s what Sebastiaan wanted to investigate: what if that last step weren’t so intrusive? Here’s the alternate flow he wanted to test:

  1. A website prompts the user for permission to send push notifications.
  2. The user grants permission.
  3. A whole lot of complicated stuff happens behind the scenes.
  4. Next time the website publishes something relevant, it fires a push message containing the details of the new URL.
  5. The user’s service worker receives the push message (even if the site isn’t open).
  6. The service worker fetches the contents of the URL provided in the push message and caches the page. Silently.

It worked.
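In service worker terms, that final step might look something like this. It’s a sketch rather than Sebastiaan’s actual code, and it assumes the push message carries the URL as JSON:

self.addEventListener('push', function (event) {
    // assuming a payload like {"url": "https://example.com/new-post"}
    var data = event.data.json();
    event.waitUntil(
        caches.open('pages')
        .then(function (cache) {
            return cache.add(data.url);
        })
        // note what's missing: no call to showNotification()
    );
});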

I think this could be a real game-changer. I don’t know about you, but I’m very, very wary of granting websites the ability to send me push notifications. In fact, I don’t think I’ve ever given a website permission to interrupt me with push notifications.

You’ve seen the annoying permission dialogues, right?

In Firefox, it looks like this:

Will you allow name-of-website to send notifications?

[Not Now] [Allow Notifications]

In Chrome, it’s:

name-of-website wants to

Show notifications

[Block] [Allow]

But in actual fact, these dialogues are asking for permission to do two things:

  1. Receive messages pushed from the server.
  2. Display notifications based on those messages.

There’s no way to ask for permission just to do the first part. That’s a shame. While I’m very unwilling to grant permission to be interrupted by intrusive notifications, I’d be more than willing to grant permission to allow a website to silently cache timely content in the background. It would be a more calm technology.

Think of the use cases:

  • I grant push permission to a magazine. When the magazine publishes a new article, it’s cached on my device.
  • I grant push permission to a podcast. Whenever a new episode is published, it’s cached on my device.
  • I grant push permission to a blog. When there’s a new blog post, it’s cached on my device.

Then when I’m on a plane, or in the subway, or in any other situation without a network connection, I could still visit these websites and get content that’s fresh to me. It’s kind of like background sync in reverse.

There’s plenty of opportunity for abuse—the cache could get filled with unwanted content. But websites can already do that, and they don’t need to be granted any permissions to do so; a website can add multiple files to a cache simply because you visited it.

So it seems that the reason for the permissions dialogue is all about displaying notifications …not so much about receiving push messages from the server.

I wish there were a way to implement this background-caching pattern without requiring the user to grant permission to a dialogue that contains the word “notification.”

I wonder if the act of adding a site to the home screen could implicitly grant permission to allow use of the Push API without notifications?

In the meantime, the proposal for periodic synchronisation (using background sync) could achieve similar results, but in a less elegant way; periodically polling for new content instead of receiving a push message when new content is published. Also, it requires permission. But at least in this case, the permission dialogue should be more specific, and wouldn’t include the word “notification” anywhere.