Journal tags: measurement

3

sparkline

Doing the right thing for the wrong reasons

I remember trying to convince people to use semantic markup because it’s good for accessibility. That tactic didn’t always work. When it didn’t, I would add “By the way, Google’s searchbot is indistinguishable from a screen-reader user so semantic markup is good for SEO.”

That usually worked. It always felt unsatisfying though. I don’t know why. It doesn’t matter if people do the right thing for the wrong reasons. The end result is what matters. But still. It never felt great.

It happened with responsive design and progressive enhancement too. If I couldn’t convince people based on user experience benefits, I’d pull up some official pronouncement from Google recommending those techniques.

Even AMP, a dangerously ill-conceived project, has one very handy ace in the hole. You can’t add third-party JavaScript cruft to AMP pages. That’s useful:

Beleaguered developers working for publishers of big bloated web pages have a hard time arguing with their boss when they’re told to add another crappy JavaScript tracking script or bloated library to their pages. But when they’re making AMP pages, they can easily refuse, pointing out that the AMP rules don’t allow it. Google plays the bad cop for us, and it’s a very valuable role.

AMP is currently dying, which is good news. Google have announced that core web vitals will be used to boost ranking instead of requiring you to publish in their proprietary AMP format. The really good news is that the political advantage that came with AMP has also been ported over to core web vitals.

Take user-hostile obtrusive overlays. Perhaps, as a contientious developer, you’ve been arguing for years that they should be removed from the site you work on because they’re so bad for the user experience. Perhaps you have been met with the same indifference that I used to get regarding semantic markup.

Well, now you can point out how those annoying overlays are affecting, for example, the cumulative layout shift for the site. And that number is directly related to SEO. It’s one thing for a department to over-ride UX concerns, but I bet they’d think twice about jeopardising the site’s ranking with Google.

I know it doesn’t feel great. It’s like dealing with a bully by getting an even bigger bully to threaten them. Still. Needs must.

Numbers

Core web vitals from Google are the ingredients for an alphabet soup of exlusionary intialisms. But once you get past the unnecessary jargon, there’s a sensible approach underpinning the measurements.

From May—no, June—these measurements will be a ranking signal for Google search so performance will become more of an SEO issue. This is good news. This is what Google should’ve done years ago instead of pissing up the wall with their dreadful and damaging AMP project that blackmailed publishers into using a proprietary format in exchange for preferential search treatment. It was all done supposedly in the name of performance, but in reality all it did was antagonise users and publishers alike.

Core web vitals are an attempt to put numbers on user experience. This is always a tricky balancing act. You’ve got to watch out for the McNamara fallacy. Harry has already started noticing this:

A new and unusual phenomenon: clients reluctant (even refusing) to fix performance issues unless they directly improve Vitals.

Once you put a measurement on something, there’s a danger of focusing too much on the measurement. Chris is worried that we’re going to see tips’n’tricks for gaming core web vitals:

This feels like the start of a weird new era of web performance where the metrics of web performance have shifted to user-centric measurements, but people are implementing tricky strategies to game those numbers with methods that, if anything, slightly harm user experience.

The map is not the territory. The numbers are a proxy for user experience, but it’s notoriously difficult to measure intangible ideas like pain and frustration. As Laurie says:

This is 100% the downside of automatic tools that give you a “score”. It’s like gameification. It’s about hitting that perfect score instead of the holistic experience.

And Ethan has written about the power imbalance that exists when Google holds all the cards, whether it’s AMP or core web vitals:

Google used its dominant position in the marketplace to force widespread adoption of a largely proprietary technology for creating websites. By switching to Core Web Vitals, those power dynamics haven’t materially changed.

We would do well to remember:

When you measure, include the measurer.

But if we’re going to put numbers to user experience, the core web vitals are a pretty good spread of measurements: largest contentful paint, cumulative layout shift, and first input delay.

(If you prefer using initialisms, remember that CFP is Certified Financial Planner, CLS is Community Legal Services, and FID is Flame Ionization Detector. Together they form CWV, Catholic War Veterans.)

A framework for web performance

Here at Clearleft, we’ve recently been doing some front-end consultancy. That prompted me to jot down thoughts on design principles and performance:

We continued with some more performance work this week. Having already covered some of the nitty-gritty performance tactics like font-loading, image optimisation, etc., we wanted to take a step back and formulate an ongoing strategy for performance.

When it comes to web performance, the eternal question is “What should we measure?” The answer to that question will determine where you then concentrate your efforts—whatever it is your measuring, that’s what you’ll be looking to improve.

I started by drawing a distinction between measurements of quantities and measurements of time. Quantities are quite easy to measure. You can measure these quantities using nothing more than browser dev tools:

  • overall file size (page weight + assets), and
  • number of requests.

I think it’s good to measure these quantities, and I think it’s good to have a performance budget for them. But I also think they’re table stakes. They don’t actually tell you much about the impact that performance is having on the user experience. For that, we need to enumerate moments in time:

  • time to first byte,
  • time to first render,
  • time to first meaningful paint, and
  • time to first meaningful interaction.

There’s one more moment in time, which is the time until DOM content is loaded. But I’m not sure that has a direct effect on how performance is perceived, so it feels like it belongs more in the category of quantities than time.

Next, we listed out all the factors that could affect each of the moments in time. For example, the time to first byte depends on the speed of the network that the user is on. It also depends on how speedily your server (or Content Delivery Network) can return a response. Meanwhile, time to first render is affected by the speed of the user’s network, but it’s also affected by how many blocking elements are on the critical path.

By listing all the factors out, we can draw a distinction between the factors that are outside of our control, and the factors that we can do something about. So while we might not be able to do anything about the speed of the user’s network, we might well be able to optimise the speed at which our server returns a response, or we might be able to defer some assets that are currently blocking the critical path.

Factors
1st byte
  • server speed
  • network speed
1st render
  • network speed
  • critical path assets
1st meaningful paint
  • network speed
  • font-loading strategy
  • image optimisation
1st meaningful interaction
  • network speed
  • device processing power
  • JavaScript size

So far, everything in our list of performance-affecting factors is related to the first visit. It’s worth drawing up a second list to document all the factors for subsequent visits. This will look the same as the list for first visits, but with the crucial difference that caching now becomes a factor.

First visit factors Repeat visit factors
1st byte
  • server speed
  • network speed
  • server speed
  • network speed
  • caching
1st render
  • network speed
  • critical path assets
  • network speed
  • critical path assets
  • caching
1st meaningful paint
  • network speed
  • font-loading strategy
  • image optimisation
  • network speed
  • font-loading strategy
  • image optimisation
  • caching
1st meaningful interaction
  • network speed
  • device processing power
  • JavaScript size
  • network speed
  • device processing power
  • JavaScript size
  • caching

Alright. Now it’s time to get some numbers for each of the four moments in time. I use Web Page Test for this. Choose a realistic setting, like 3G on an Android from the East coast of the USA. Under advanced settings, be sure to select “First View and Repeat View” so that you can put those numbers in two different columns.

Here are some numbers for adactio.com:

First visit time Repeat visit time
1st byte 1.476 seconds 1.215 seconds
1st render 2.633 seconds 1.930 seconds
1st meaningful paint 2.633 seconds 1.930 seconds
1st meaningful interaction 2.868 seconds 2.083 seconds

I’m getting the same numbers for first render as first meaningful paint. That tells me that there’s no point in trying to optimise my font-loading, for example …which makes total sense, because adactio.com isn’t using any web fonts. But on a different site, you might see a big gap between those numbers.

I am seeing a gap between time to first byte and time to first render. That tells me that I might be able to get some blocking requests off the critical path. Sure enough, I’m currently referencing an external stylesheet in the head of adactio.com—if I were to inline critical styles and defer the loading of that stylesheet, I should be able to narrow that gap.

A straightforward site like adactio.com isn’t going to have much to worry about when it comes to the time to first meaningful interaction, but on other sites, this can be a significant bottleneck. If you’re sending UI elements in the initial HTML, but then waiting for JavaScript to “hydrate” those elements into working, the user can end up in an uncanny valley of tapping on page elements that look fine, but aren’t ready yet.

My point is, you’re going to see very different distributions of numbers depending on the kind of site you’re testing. There’s no one-size-fits-all metric to focus on.

Now that you’ve got numbers for how your site is currently performing, you can create two new columns: one of those is a list of first-visit targets, the other is a list of repeat-visit targets for each moment in time. Try to keep them realistic.

For example, if I could reduce the time to first render on adactio.com by 0.5 seconds, my goals would look like this:

First visit goal Repeat visit goal
1st byte 1.476 seconds 1.215 seconds
1st render 2.133 seconds 1.430 seconds
1st meaningful paint 2.133 seconds 1.430 seconds
1st meaningful interaction 2.368 seconds 1.583 seconds

See how the 0.5 seconds saving cascades down into the other numbers?

Alright! Now I’ve got something to aim for. It might also be worth having an extra column to record which of the moments in time are high priority, which are medium priority, and which are low priority.

Priority
1st byte Medium
1st render High
1st meaningful paint Low
1st meaningful interaction Low

Your goals and priorities may be quite different.

I think this is a fairly useful framework for figuring out where to focus when it comes to web performance. If you’d like to give it a go, I’ve made a web performance chart for you to print out and fill in. Here’s a PDF version if that’s easier for printing. Or you can download the HTML version if you want to edit it.

I have to say, I’m really enjoying the front-end consultancy work we’ve been doing at Clearleft around performance and related technologies, like offline functionality. I’d like to do more of it. If you’d like some help in prioritising performance at your company, please get in touch. Let’s make the web faster together.