Tags: analytics

6

sparkline

GDPR and Google Analytics

Enforcement of the European Union’s General Data Protection Regulation is coming very, very soon. Look busy. This regulation is not limited to companies based in the EU—it applies to any service anywhere in the world that can be used by citizens of the EU.

It’s less about data protection and more like a user’s bill of rights. That’s good. Cennydd has written a techie’s rough guide to GDPR.

The Open Data Institute’s Jeni Tennison wrote down her thoughts on how it could change data portability in particular. While she welcomes GDPR, she has some misgivings.

Blaine—who really needs to get a blog—shared his concerns in the form of the online equivalent of interpretive dance …a twitter thread (it’s called a thread because it inevitably gets all tangled, and it’s easy to break.)

The interesting thing about the so-called “cookie law” is that it makes no mention of cookies whatsoever. It doesn’t list any specific technology. Instead it states that any means of tracking or identifying users across websites requires disclosure. So if you’re setting a cookie just to manage state—so that users can log in, or keep items in a shopping basket—the legislation doesn’t apply. But as soon as your site allows a third-party to set a cookie, it’s banner time.

Google Analytics is a classic example of a third-party service that uses cookies to track people across domains. That’s pretty much why it exists. We, as site owners, get to use this incredibly powerful tool, and all we have to do in return is add one little snippet of JavaScript to our pages. In doing so, we’re allowing a third party to read or write a cookie from their domain.

Before Google Analytics, Google—the search engine business—was able to identify and track what users were searching for, and which search results they clicked on. But as soon as the user left google.com, the trail went cold. By creating an enormously useful analytics product that only required site owners to add a single line of JavaScript, Google—the online advertising business—gained the ability to keep track of users across most of the web, whether they were on a site owned by Google or not.

Under the old “cookie law”, using a third-party cookie-setting service like that meant you had to inform any of your users who were citizens of the EU. With GDPR, that changes. Now you have to get consent. A dismissible little overlay isn’t going to cut it any more. Implied consent isn’t enough.

Now this situation raises an interesting question. Who’s responsible for getting consent? Is it the site owner or the third party whose script is the conduit for the tracking?

In the first scenario, you’d need to wait for an explicit agreement from a visitor to your site before triggering the Google Analytics functionality. Suddenly it’s not as simple as adding a single line of JavaScript to your site.

In the second scenario, you don’t do anything differently than before—you just add that single line of JavaScript. But now that script would need to launch the interface for getting consent before doing any tracking. Google Analytics would go from being something invisible to something that directly impacts the user experience of your site.

I’m just using Google Analytics as an example here because it’s so widespread. This also applies to third-party sharing buttons—Twitter, Facebook, etc.—and of course, advertising.

In the case of advertising, it gets even thornier because quite often, the site owner has no idea which third party is about to do the tracking. Many, many sites use intermediary services (y’know, ‘cause bloated ad scripts aren’t slowing down sites enough so let’s throw some just-in-time bidding into the mix too). You could get consent for the intermediary service, but not for the final advert—neither you nor your site’s user would have any idea what they were consenting to.

Interesting times. One way or another, a massive amount of the web—every website using Google Analytics, embedded YouTube videos, Facebook comments, embedded tweets, or third-party advertisements—will be liable under GDPR.

It’s almost as if the ubiquitous surveillance of people’s every move on the web wasn’t a very good idea in the first place.

Needs must

I got a follow-up comment to my follow-up post about the follow-up comment I got on my original post about Google Analytics. Keep up.

I made the point that, from a front-end performance perspective, server logs have no impact whereas a JavaScript-based analytics solution must have some impact on the end user. Paul Anthony says:

Google won the analytics war because dropping one line of JS in the footer and handing a tried and tested interface to customers is an obvious no brainer in comparison to setting up an open source option that needs a cron job to parse the files, a database to store the results and doesn’t provide mobile interface.

Good point. Dropping one snippet of JavaScript into your front-end codebase is certainly an easier solution …easier for you, that is. The cost is passed on to your users. This is a classic example of where user needs and developer needs are in opposition. I’ve said it before and I’ll say it again:

Given the choice between making something my problem, and making something the user’s problem, I’ll choose to make it my problem every time.

It’s true that this often means doing more work. That’s why it’s called work. This is literally what our jobs are supposed to entail: we put in the work to make life easier for users. We’re supposed to be saving them time, not passing it along.

The example of Google Analytics is pretty extreme, I’ll grant you. The cost to the user of adding that snippet of JavaScript—if you’ve configured things reasonably well—is pretty small (again, just from a performance perspective; there’s still the cost of allowing Google to track them across domains), and the cost to you of setting up a comparable analytics system based on server logs can indeed be disproportionately high. But this tension between user needs and developer needs is something I see play out again and again.

I’ve often thought the HTML design principle called the priority of constituencies could be adopted by web developers:

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors.

In Resilient Web Design, I documented the three-step approach I take when I’m building anything on the web:

  1. Identify core functionality.
  2. Make that functionality available using the simplest possible technology.
  3. Enhance!

Now I’m wondering if I should’ve clarified that second step further. When I talk about choosing “the simplest possible technology”, what I mean is “the simplest possible technology for the user”, not “the simplest possible technology for the developer.”

For example, suppose I were going to build a news website. The core functionality is fairly easy to identify: providing the news. Next comes the step where I choose the simplest possible technology. Now, if I were a developer who had plenty of experience building JavaScript-driven single page apps, I might conclude that the simplest route for me would be to render the news via JavaScript. But that would be a fragile starting point if I’m trying to reach as many people as possible (I might well end up building a swishy JavaScript-driven single page app in step three, but step two should almost certainly be good ol’ HTML).

Time and time again, I see decisions that favour developer convenience over user needs. Don’t get me wrong—as a developer, I absolutely want developer convenience …but not at the expense of user needs.

I know that “empathy” is an over-used word in the world of user experience and design, but with good reason. I think we should try to remind ourselves of why we make our architectural decisions by invoking who those decisions benefit. For example, “This tech stack is best option for our team”, or “This solution is the best for the widest range of users.” Then, given the choice, favour user needs in the decision-making process.

There will always be situations where, given time and budget constraints, we end up choosing solutions that are easier for us, but not the best for our users. And that’s okay, as long as we acknowledge that compromise and strive to do better next time.

But when the best solutions for us as developers become enshrined as the best possible solutions, then we are failing the people we serve.

That doesn’t mean we must become hairshirt-wearing martyrs; developer convenience is important …but not as important as user needs. Start with user needs.

Heisenberg

I wrote about Google Analytics yesterday. As usual, I syndicated the post to Ev’s blog, and I got an interesting response over there. Kelly Burgett set me straight on some of the finer details of how goals work, and finished with this thought:

You mention “delivering a performant, accessible, responsive, scalable website isn’t enough” as if it should be, and I have to disagree. It’s not enough for a business to simply have a great website if you are unable to understand performance of channel marketing, track user demographics and behavior on-site, and optimize your site/brand based on that data. I’ve seen a lot of ugly sites who have done exceptionally well in terms of ROI, simply because they are getting the data they need from the site in order make better business decisions. If your site cannot do that (ie. through data collection, often third party scripts), then your beautifully-designed site can only take you so far.

That makes an excellent case for having analytics. But that’s not necessarily the same as having Google analytics, or even JavaScript-driven analytics at all.

By far the most useful information you get from analytics is around where people have come from, where did they go next, and what kind of device are they using. None of that information requires JavaScript. It’s all available from your server logs.

I don’t want to come across all old-man-yell-at-cloud here, but I’m trying to remember at what point self-hosted software for analysing your log traffic became not good enough.

Here’s the thing: logging on the server has no effect on the user experience. It’s basically free, in terms of performance. Logging via JavaScript, by its very nature, has some cost. Even if its negligible, that’s one more request, and that’s one more bit of processing for the CPU.

All of the data that you can only get via JavaScript (in-page actions, heat maps, etc.) are, in my experience, better handled by dedicated software. To me, that kind of more precise data feels different to analytics in the sense of funnels, conversions, goals and all that stuff.

So in order to get more fine-grained data to analyse, our analytics software has now doubled down on a technology—JavaScript—that has an impact on the end user, where previously the act of observation could be done at a distance.

There are also blind spots that come with JavaScript-based tracking. According to Google Analytics, 0% of your customers don’t have JavaScript. That’s not necessarily true, but there’s literally no way for Google Analytics—which relies on JavaScript—to even do its job in the absence of JavaScript. That can lead to a dangerous situation where you might be led to think that 100% of your potential customers are getting by, when actually a proportion might be struggling, but you’ll never find out about it.

Related: according to Google Analytics, 0% of your customers are using ad-blockers that block requests to Google’s servers. Again, that’s not necessarily a true fact.

So I completely agree than analytics are a good thing to have for your business. But it does not follow that Google Analytics is a good thing for your business. Other options are available.

I feel like the assumption that “analytics = Google Analytics” is like the slippery slope in reverse. If we’re all agreed that analytics are important, then aren’t we also all agreed that JavaScript-based tracking is important?

In a word, no.

This reminds me of the arguments made in favour of intrusive, bloated advertising scripts. All of the arguments focus on the need for advertising—to stay in business, to pay the writers—which are all great reasons for advertising, but have nothing to do with JavaScript, which is at the root of the problem. Everyone I know who uses an ad-blocker—including me—doesn’t use it to stop seeing adverts, but to stop the performance of the page being degraded (and to avoid being tracked across domains).

So let’s not confuse the means with the ends. If you need to have advertising, that doesn’t mean you need to have horribly bloated JavaScript-based advertising. If you need analytics, that doesn’t mean you need an analytics script on your front end.

Analysing analytics

Hell is other people’s JavaScript.

There’s nothing quite so crushing as building a beautifully performant website only to have it infested with a plague of third-party scripts that add to the weight of each page and reduce the responsiveness, making a mockery of your well-considered performance budget.

Trent has been writing about this:

My latest realization is that delivering a performant, accessible, responsive, scalable website isn’t enough: I also need to consider the impact of third-party scripts.

He’s started the process by itemising third-party scripts. Frustratingly though, there’s rarely one single culprit that you can point to—it’s the cumulative effect of “just one more beacon” and “just one more analytics script” and “just one more A/B testing tool” that adds up to a crappy experience that warms your user’s hands by ensuring your site is constantly draining their battery.

Actually, having just said that there’s rarely one single culprit, Adobe Tag Manager is often at the root of third-party problems. That and adverts. It’s like opening the door of your beautifully curated dream home, and inviting a pack of diarrhetic elephants in: “Please, crap wherever you like.”

But even the more well-behaved third-party scripts can get out of hand. Google Analytics is so ubiquitous that it’s hardly even considered in the list of potentially harmful third-party scripts. On the whole, it’s a fairly well-behaved citizen of your site’s population of third-party scripts (y’know, leaving aside the whole surveillance capitalism business model that allows you to use such a useful tool for free in exchange for Google tracking your site’s visitors across the web and selling the insights from that data to advertisers).

The initial analytics script that you—asynchronously—load into your page isn’t very big. But depending on how you’ve configured your Google Analytics account, that might just be the start of a longer chain of downloads and event handlers.

Ed recently gave a lunchtime presentation at Clearleft on using Google Analytics—he professes modesty but he really knows his stuff. He was making sure that everyone knew how to set up goals’n’stuff.

As I understand it, there are two main categories of goals: events and destinations (there are also durations and pages, but they feel similar to destinations). You use events to answer questions like “Did the user click on this button?” or “Did the user click on that search field?”. You use destinations to answer questions like “Did the user arrive at this page?” or “Did the user come from that page?”

You can add as many goals to your site’s analytics as you want. That’s an intoxicating offer. The problem is that there is potentially a cost for each goal you create. It’s an invisible cost. It’s paid by the user in the currency of JavaScript sent down the wire (I wish that the Google Analytics admin interface were more like the old interface for Google Fonts, where each extra file you added literally pushed a needle higher on a dial).

It strikes me that the event-based goals would necessarily require more JavaScript in order to listen out for those clicks and fire off that information. The destination-based goals should be able to get all the information needed from regular page navigations.

So I have a hypothesis. I think that destination-based goals are less harmful to performance than event-based goals. I might well be wrong about that, and if I am, please let me know.

With that hypothesis in mind, and until I learn otherwise, I’ve got two rules of thumb to offer when it comes to using Google Analytics:

  1. Try to keep the number of goals to a minimum.
  2. If you must create a goal, favour destinations over events.

Analytical

When I was talking about Huffduffer, I mentioned that I don’t have any analytics set up for the site:

To be honest, I’m okay with that—one of the perks of having a personal project is that only metric that really matters is your own satisfaction.

For a while, I did have Google Analytics set up on The Session. But I started to feel a bit uncomfortable about willingly opening up a wormhole between my site and the Google mothership. It bothered me that Adblock Plus would show that one ad had been blocked on the site. There are no ads on the site, but the presence of the Google Analytics code was providing valuable information to Google—and its advertiser customer base—so I can understand why it gets flagged up like any other unwanted tracking.

Theoretically, users have a way of opting out of this kind of tracking by switching on the Do Not Track header (if it isn’t switched on by default). Looking at the default JavaScript code that Google provides for setting up Google Analytics, I don’t see any mention of navigator.doNotTrack.

Now, it may well be that Google sniffs for that header (and abandons any tracking) when its server is pinged via the analytics code, but there’s no way to tell from this side of the Googleplex. I certainly don’t see any mention of it in the JavaScript that gets inserted into our web pages.

I was wondering whether it makes sense to explictly check for the doNotTrack header before opening up that connection to google-analytics.com via a generated script element.

So if the current code looks like:

<script>
// generate a script element
// point it to google-analytics.com
</script>

Would it make sense to wrap it with some kind of test for navigator.doNotTrack:

<script>
if (!navigator.doNotTrack || navigator.doNotTrack != 'yes') {
// generate a script element
// point it to google-analytics.com
}
</script>

For the love of mercy, don’t actually use that code—it’s completely untested and probably causes more harm than good. But you can see the idea that I’m trying to get at, right? Google Analytics most definitely counts as tracking so it seems like the ideal use-case for Do Not Track.

It raises a few questions:

  1. Is anyone doing this already? It might well be that the answer is “no”, not because of any reluctance to respect user preferences but because the doNotTrack spec is very much in flux.
  2. Would you consider doing this?
  3. If you were to do this, could you foresee getting pushback from within your own company?

Huffduff up and up

I had a nice Skype chat with Stan Alcorn yesterday all about Huffduffer, online sharing of audio, and all things podcasty and radioish. I’m sure I must have talked his ear off.

Stan was asking about numbers for Huffduffer’s user base and activity. I have to admit that I’ve got zero analytics running on the site. To be honest, I’m okay with that—one of the perks of having a personal project is that only metric that really matters is your own satisfaction. But I told Stan I’d run some quick database queries to get some feeling for Huffduffer’s usage patterns. Here’s what I found…

There are 5,862 people signed up to Huffduffer.

About 150,919 items have been huffduffed. But those aren’t unique files. The total number of distinct files that have been huffduffed is 5,972. That means that, on average, an audio file is huffduffed around 26 times. And the average user has huffduffed around 30 items. But neither of those distributions would be evenly distributed; they’d be power-law distributions rather than bell curves. For example, the most popular file was huffduffed 329 times.

Looking at the amount of items huffduffed each year, there’s a pleasing upward trend.

1st year 7,382
2nd year 19,080
3rd year 23,403
4th year 31,808
5th year 41,514

I was pleasantly surprised by this. I would’ve assumed that Huffduffer usage would be more of a steady-state affair, but it looks like the site is getting used a bit more with each passing year (the site is currently in its sixth(!) year).

Not that any of that really matters. I built Huffduffer to scratch my own itch. I huffduff an average of 411 audio files each year. So even if nobody else used Huffduffer, it would still provide plenty of value to me.

Like I was saying to Stan, the biggest strength and the biggest weakness of audio—as opposed to text or video—is that you can listen to it while your doing other things. For some people, car journeys are the perfect podcast time. For others, it might be doing the dishes or train journeys. For me, it’s the walk to and from work each day—it takes about 35 minutes each way, and I catch up on my Huffduffer feed during that time.

Jessica and I will often listen to some spoken word audio in the background during dinner—usually something quite radio-y like Radiolab, or NPR stories. Yesterday, we were catching up with Aleks’s BBC documentary series, The Digital Human. It was the episode about voice.

Imagine my surprise when I heard the voice of Stan Alcorn. What a co-inky-dink!