Tags: rendering

2

sparkline

In AMP we trust

AMP Conf was one of those deep dive events, with two days dedicated to one single technology: AMP.

Except AMP isn’t really one technology, is it? And therein lies the confusion. This was at the heart of the panel I was on. When we talk about AMP, we could be talking about one of three things:

  1. The AMP format. A bunch of web components. For instance, instead of using an img element on an AMP page, you use an amp-img element instead.
  2. The AMP rules. There’s one JavaScript file, hosted on Google’s servers, that turns those web components from spans into working elements. No other JavaScript is allowed. All your styles must be in a style element instead of an external file, and there’s a limit on what you can do with those styles.
  3. The AMP cache. The source of most confusion—and even downright enmity—this is what’s behind the fact that when you launch an AMP result from Google search, you don’t go to another website. You see Google’s cached copy of the page instead of the original.

The first piece of AMP—the format—is kind of like a collection of marginal gains. Where the img element might have some performance issues, the amp-img element optimises for perceived performance. But if you just used the AMP web components, it wouldn’t be enough to make your site blazingly fast.

The second part of AMP—the rules—is where the speed gains start to really show. You can’t have an external style sheet, and crucially, you can’t have any third-party scripts other than the AMP script itself. This is key to making AMP pages super fast. It’s not so much about what AMP does; it’s more about what it doesn’t allow. If you never used a single AMP component, but stuck to AMP’s rules disallowing external styles and scripts, you could easily make a page that’s even faster than what AMP can do.

At AMP Conf, Natalia pointed out that The Guardian’s non-AMP pages beat out the AMP pages for performance. So why even have AMP pages? Well, that’s down to the third, most contentious, part of the AMP puzzle.

The AMP cache turns the user experience of visiting an AMP page from fast to instant. While you’re still on the search results page, Google will pre-render an AMP page in the background. Not pre-fetch, pre-render. That’s why it opens so damn fast. It’s also what causes the most confusion for end users.

From my unscientific polling, the behaviour of AMP results confuses the hell out of people. The fact that the page opens instantly isn’t the problem—far from it. It’s the fact that you don’t actually go to an another page. Technically, you’re still on Google. An analogous mental model would be an RSS reader, or an email client: you don’t go to an item or an email; you view it in situ.

Well, that mental model would be fine if it were consistent. But in Google search, only some results will behave that way (the AMP pages) and others will behave just like regular links to other websites. No wonder people are confused! Some search results take them away and some search results keep them on Google …even though the page looks like a different website.

The price that we pay for the instantly-opening AMP pages from the Google cache is the URL. Because we’re looking at Google’s pre-rendered copy instead of the original URL, the address bar is not pointing to the site the browser claims to be showing. Everything in the body of the browser looks like an article from The Guardian, but if I look at the URL (which is what security people have been telling us for years is important to avoid being phished), then I’ll see a domain that is not The Guardian’s.

But wait! Couldn’t Google pre-render the page at its original URL?

Yes, they could. But they won’t.

This was a point that Paul kept coming back to: trust. There’s no way that Google can trust that someone else’s URL will play by the AMP rules (no external scripts, only loading embedded content via web components, limited styles, etc.). They can only trust the copies that they themselves are serving up from their cache.

By the way, there was a joint AMP/search panel at AMP Conf with representatives from both teams. As you can imagine, there were many questions for the search team, most of which were Glomar’d. But one thing that the search people said time and again was that Google was not hosting our AMP pages. Now I don’t don’t know if they were trying to make some fine-grained semantic distinction there, but that’s an outright falsehood. If I click on a link, and the URL I get taken to is a Google property, then I am looking at a page hosted by Google. Yes, it might be a copy of a document that started life somewhere else, but if Google are serving something from their cache, they are hosting it.

This is one of the reasons why AMP feels like such a bait’n’switch to me. When it first came along, it felt like a direct competitor to Facebook’s Instant Articles and Apple News. But the big difference, we were told, was that you get to host your own content. That appealed to me much more than having Facebook or Apple host the articles. But now it turns out that Google do host the articles.

This will be the point at which Googlers will say no, no, no, you can totally host your own AMP pages …but you won’t get the benefits of pre-rendering. But without the pre-rendering, what’s the point of even having AMP pages?

Well, there is one non-cache reason to use AMP and it’s a political reason. Beleaguered developers working for publishers of big bloated web pages have a hard time arguing with their boss when they’re told to add another crappy JavaScript tracking script or bloated library to their pages. But when they’re making AMP pages, they can easily refuse, pointing out that the AMP rules don’t allow it. Google plays the bad cop for us, and it’s a very valuable role. Sarah pointed this out on the panel we were on, and she was spot on.

Alright, but what about The Guardian? They’ve already got fast pages, but they still have to create separate AMP pages if they want to get the pre-rendering benefits when they show up in Google search results. Sorry, says Google, but it’s the only way we can trust that the pre-rendered page will be truly fast.

So here’s the impasse we’re at. Google have provided a list of best practices for making fast web pages, but the only way they can truly verify that a page is sticking to those best practices is by hosting their own copy, URLs be damned.

This was the crux of Paul’s argument when he was on the Shop Talk Show podcast (it’s a really good episode—I was genuinely reassured to hear that Paul is not gung-ho about drinking the AMP Kool Aid; he has genuine concerns about the potential downsides for the web).

Initially, I accepted this argument that Google just can’t trust the rest of the web. But the more I talked to people at AMP Conf—and I had some really, really good discussions with people away from the stage—the more I began to question it.

Here’s the thing: the regular Google search can’t guarantee that any web page is actually 100% the right result to return for a search. Instead there’s a lot of fuzziness involved: based on the content, the markup, and the number of trusted sources linking to this, it looks like it should be a good result. In other words, Google search trusts websites to—by and large—do the right thing. Sometimes websites abuse that trust and try to game the system with sneaky tricks. Google responds with penalties when that happens.

Why can’t it be the same for AMP pages? Let me host my own AMP pages (maybe even host my own AMP script) and then when the Googlebot crawls those pages—the same as it crawls any other pages—that’s when it can verify that the AMP page is abiding by the rules. If I do something sneaky and trick Google into flagging a page as fast when it actually isn’t, then take my pre-rendering reward away from me.

To be fair, Google has very, very strict rules about what and how to pre-render the AMP results it’s caching. I can see how allowing even the potential for a false positive would have a negative impact on the user experience of Google search. But c’mon, there are already false positives in regular search results—fake news, spam blogs. Googlers are smart people. They can solve—or at least mitigate—these problems.

Google says it can’t trust our self-hosted AMP pages enough to pre-render them. But they ask for a lot of trust from us. We’re supposed to trust Google to cache and host copies of our pages. We’re supposed to trust Google to provide some mechanism to users to get at the original canonical URL. I’d like to see trust work both ways.

Where to start?

A lot of the talks at this year’s Chrome Dev Summit were about progressive web apps. This makes me happy. But I think the focus is perhaps a bit too much on the “app” part on not enough on “progressive”.

What I mean is that there’s an inevitable tendency to focus on technologies—Service Workers, HTTPS, manifest files—and not so much on the approach. That’s understandable. The technologies are concrete, demonstrable things, whereas approaches, mindsets, and processes are far more nebulous in comparison.

Still, I think that the most important facet of building a robust, resilient website is how you approach building it rather than what you build it with.

Many of the progressive app demos use server-side and client-side rendering, which is great …but that aspect tends to get glossed over:

Browsers without service worker support should always be served a fall-back experience. In our demo, we fall back to basic static server-side rendering, but this is only one of many options.

I think it’s vital to not think in terms of older browsers “falling back” but to think in terms of newer browsers getting a turbo-boost. That may sound like a nit-picky semantic subtlety, but it’s actually a radical difference in mindset.

Many of the arguments I’ve heard against progressive enhancement—like Tom’s presentation at Responsive Field Day—talk about the burdensome overhead of having to bolt on functionality for older or less-capable browsers (even Jake has done this). But the whole point of progressive enhancement is that you start with the simplest possible functionality for the greatest number of users. If anything gets bolted on, it’s the more advanced functionality for the newer or more capable browsers.

So if your conception of progressive enhancement is that it’s an added extra, I think you really need to turn that thinking around. And that’s hard. It’s hard because you need to rewire some well-engrained pathways.

There is some precedence for this though. It was really, really hard to convince people to stop using tables for layout and starting using CSS instead. That was a tall order—completely change the way you approach building on the web. But eventually we got there.

When Ethan came out with Responsive Web Design, it was an equally difficult pill to swallow, not because of the technologies involved—media queries, percentages, etc.—but because of the change in thinking that was required. But eventually we got there.

These kinds of fundamental changes are inevitably painful …at first. After years of building websites using tables for layout, creating your first CSS-based layout was demoralisingly difficult. But the second time was a bit easier. And the third time, easier still. Until eventually it just became normal.

Likewise with responsive design. After years of building fixed-width websites, trying to build in a fluid, flexible way was frustratingly hard. But the second time wasn’t quite as hard. And the third time …well, eventually it just became normal.

So if you’re used to thinking of the all-singing, all-dancing version of your site as the starting point, it’s going to be really, really hard to instead start by building the most basic, accessible version first and then work up to the all-singing, all-dancing version …at first. But eventually it will just become normal.

For now, though, it’s going to take work.

The recent redesign of Google+ is true case study in building a performant, responsive, progressive site:

With server-side rendering we make sure that the user can begin reading as soon as the HTML is loaded, and no JavaScript needs to run in order to update the contents of the page. Once the page is loaded and the user clicks on a link, we do not want to perform a full round-trip to render everything again. This is where client-side rendering becomes important — we just need to fetch the data and the templates, and render the new page on the client. This involves lots of tradeoffs; so we used a framework that makes server-side and client-side rendering easy without the downside of having to implement everything twice — on the server and on the client.

This took work. Had they chosen to rely on client-side rendering alone, they could have built something quicker. But I think it was worth laying that solid foundation. And the next time they need to build something this way, it’s going to be less work. Eventually it just becomes normal.

But it all starts with thinking of the server-side rendering as the default. Server-side rendering is not a fallback; client-side rendering is an enhancement.

That’s exactly the kind of mindset that enables Jack Franklin to build robust, resilient websites:

Now we’ll build the React application entirely on the server, before adding the client-side JavaScript right at the end.

I had a chance to chat briefly with Jack at the Edge conference in London and I congratulated him on the launch of a Go Cardless site that used exactly this technique. He told me that the decision to flip the switch and make it act as a single page app came right at the end of the project. Server-side rendering was the default; client-side rendering was added later.

The key to building modern, resilient, progressive sites doesn’t lie in browser technologies or frameworks; it lies in how we think about the task at hand; how we approach building from the ground up rather than the top down. Changing the way we fundamentally think about building for the web is inevitably going to be challenging …at first. But it will also be immensely rewarding.