Tags: privacy

Name That Script! by Trent Walton

Trent is about to pop his AEA cherry and give a talk at An Event Apart in Boston. I’m going to attempt to liveblog this:

How many third-party scripts are loading on our web pages these days? How can we objectively measure the value of these (advertising, a/b testing, analytics, etc.) scripts—considering their impact on web performance, user experience, and business goals? We’ve learned to scrutinize content hierarchy, browser support, and page speed as part of the design and development process. Similarly, Trent will share recent experiences and explore ways to evaluate and discuss the inclusion of 3rd-party scripts.

Trent is going to speak about third-party scripts, which is funny, because a year ago, he never would’ve thought he’d be talking about this. But he realised he needed to pay more attention to:

any request made to an external URL.

Or how about this:

A resource included with a web page that the site owner doesn’t explicitly control.

When you include a third-party script, the third party can change the contents of that script.

Here are some uses:

  • advertising,
  • A/B testing,
  • analytics,
  • social media,
  • content delivery networks,
  • customer interaction,
  • comments,
  • tag managers,
  • fonts.

You get data from things like analytics and A/B testing. You get income from ads. You get content from CDNs.

But Trent has concerns. First and foremost, the effect that poor performance has on the user experience. Also, there are the privacy implications.

Why does Trent—a designer—care about third party scripts? Well, over the years, the areas that Trent pays attention to have expanded. He’s progressed from image comps to frontend to performance to accessibility to design systems to the command line and now to third parties. But Trent has no impact on those third-party scripts. That’s very different to all those other areas.

Trent mostly builds prototypes. Those then get handed over for integration. Sometimes that means hooking it up to a CMS. Sometimes it means adding in analytics and ads. It gets really complex when you throw in third-party comments, payment systems, and A/B testing tools. Oftentimes, those third-party scripts can outweigh all the gains made beforehand. It happens with no discussion. And yet we spent half a meeting discussing a border radius value.

Delivering a performant, accessible, responsive, scalable website isn’t enough: I also need to consider the impact of third-party scripts.

Trent has spent the last few months learning about third parties so he can be better equipped to discuss them.

UX, performance and privacy impact

We feel the UX impact every day we browse the web (if we turn off our content blockers). The Food Network site has an interstitial asking you to disable your ad blocker. They promise they won’t spawn any pop-up windows. Trent turned his ad blocker off—the page was now 15 megabytes in size. And to top it off …he got a pop-up.

Privacy can be harder to perceive. We brush aside cookie notifications. What if the wording was “accept trackers” instead of “accept cookies”?

Remarketing is that experience when you’re browsing for a spatula and then every website you visit serves you ads for spatulas. That might seem harmless, but allowing access to our browsing history has serious privacy implications.

Web builders are on the front lines. It’s up to us to advocate for data protection and privacy like we do for web standards. Don’t wait to be told.

Categories of third parties

Ghostery categorises third-party providers: advertising, comments, customer interaction, essential, site analytics, social media. You can dive into each layer and see the specific third-party services on the page you’re viewing.

Analyse and itemise third-party scripts

We have “view source” for learning web development. For third parties, you need some tool to export the data. HAR files (HTTP ARchive) are JSON files that you can create from most browsers’ network request panel in dev tools. But what do you do with a .har file? The site har.tech has plenty of resources for you. That’s where Trent found the Mac app, Charles. It can open .har files. Best of all, you can export to CSV so you can share spreadsheets of the data.
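
To give a rough idea of what you can do with that exported data, here’s a minimal sketch—not anything Trent showed, just an illustration—that reads a HAR file in Node.js, treats the host of the first request as the first party, and prints a CSV of third-party domains with their request counts:

// third-parties.js — list third-party domains in a HAR file
const fs = require('fs');

const har = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'));
// treat the host of the first request as the first party (a simplification)
const firstParty = new URL(har.log.entries[0].request.url).hostname;

const counts = {};
for (const entry of har.log.entries) {
  const host = new URL(entry.request.url).hostname;
  if (host !== firstParty) {
    counts[host] = (counts[host] || 0) + 1;
  }
}

console.log('domain,requests');
for (const [host, count] of Object.entries(counts).sort((a, b) => b[1] - a[1])) {
  console.log(host + ',' + count);
}

Run it with something like node third-parties.js example.har > third-parties.csv and you’ve got a spreadsheet-ready itemisation.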

You can visualise third-party requests with Simon Hearne’s excellent Request Map. It’s quite impactful for delivering a visceral reaction in a meeting—so much more effective than just saying “hey, we have a lot of third parties.” Request Map can also export to CSV.

Know industry averages

Trent wanted to know what was “normal.” He decided to analyse HAR files for Alexa’s top 50 US websites. The result was a massive spreadsheet of third-party providers. There were 213 third-party domains (which is not even the same as the number of requests). There was an average of 22 unique third-party domains per site. The usual suspects were everywhere—Google, Amazon, Facebook, Adobe—but there were many others. You can find an alphabetical index on better.fyi/trackers. Often the lesser-known domains turn out to be owned by the bigger domains.
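
The arithmetic behind those per-site numbers is straightforward set-counting. A sketch along the same lines as above, assuming one .har file per site passed on the command line:

// averages.js — unique third-party domains per site, and overall
const fs = require('fs');

const thirdParties = (file) => {
  const har = JSON.parse(fs.readFileSync(file, 'utf8'));
  const firstParty = new URL(har.log.entries[0].request.url).hostname;
  const hosts = har.log.entries.map((entry) => new URL(entry.request.url).hostname);
  return new Set(hosts.filter((host) => host !== firstParty));
};

const files = process.argv.slice(2);
const perSite = files.map(thirdParties);
const allDomains = new Set(perSite.flatMap((set) => [...set]));
const average = perSite.reduce((sum, set) => sum + set.size, 0) / perSite.length;

console.log(allDomains.size + ' unique third-party domains across ' + files.length + ' sites');
console.log('average of ' + average.toFixed(1) + ' unique third-party domains per site');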

News sites and shopping sites have the most third-party scripts, unsurprisingly.

Understand benefits

Trent realised he needed to listen and understand why third-party scripts are being included. He found out what tag managers do. They’re funnels that allow you to cram even more third-party scripts onto your website. Trent worried that this was a Pandora’s box. The tag manager interface is easy to access and use. But he was told that it’s more like a way of organising your third-party scripts under one dashboard. Still, if you get too focused on the dashboard, you could lose sight of the impact on load times. So don’t blame the tool: it’s all about how it’s used.

Take action

Establish a centre of excellence. Put standards in place—in a cross-discipline way—to define how third-party scripts are evaluated. For example:

  1. Determine the value to the business.
  2. Avoid redundant scripts and services.
  3. Fit within the established performance budget.
  4. Comply with the organisational privacy policy.

Document those decisions, maybe even in your design system.

Also, include third-party scripts within your prototypes to get a more accurate feel for the performance implications.

On a live site, you can audit third-party scripts on a regular basis. Check to see if any are redundant or if they’re exceeding the performance budget. You can monitor performance with tools like Calibre and SpeedCurve to cover the time in between audits.
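
One way to automate that kind of audit—a sketch, not something from the talk, assuming Puppeteer and a hypothetical approved-domains.json allowlist—is to load a page headlessly and flag any third-party domain that hasn’t already been signed off:

// audit.js — flag unapproved third-party domains on a page
const puppeteer = require('puppeteer');
const approved = require('./approved-domains.json'); // hypothetical allowlist, e.g. ["www.google-analytics.com"]

(async () => {
  const pageUrl = process.argv[2];
  const firstParty = new URL(pageUrl).hostname;
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const unapproved = new Set();

  // record every request to a domain that is neither first-party nor approved
  page.on('request', (request) => {
    const host = new URL(request.url()).hostname;
    if (host !== firstParty && !approved.includes(host)) {
      unapproved.add(host);
    }
  });

  await page.goto(pageUrl, { waitUntil: 'networkidle0' });
  await browser.close();

  console.log([...unapproved].join('\n'));
})();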

Make your case

Do competitive analysis. Look at other sites in your sector. It’s a compelling way to make a case for change. WPO Stats is very handy for anecdata.

You can gather comparative data with Web Page Test: you can run a full test, and you can run a test with certain third parties blocked. Use the results to kick off a discussion about the impact of those third parties.
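
If you have a WebPageTest API key, you can script that comparison too. A sketch, assuming the block parameter on runtest.php works the way I remember—check the API documentation before leaning on it (and you’ll need Node 18 or newer for the global fetch):

// compare.js — kick off WebPageTest runs with and without third parties
const apiKey = process.env.WPT_API_KEY; // your WebPageTest API key
const pageUrl = 'https://example.com/';
const blocked = 'google-analytics.com doubleclick.net'; // space-separated substrings to block

const runTest = async (extraParams) => {
  const params = new URLSearchParams({ url: pageUrl, k: apiKey, f: 'json', ...extraParams });
  const response = await fetch('https://www.webpagetest.org/runtest.php?' + params);
  const result = await response.json();
  return result.data.userUrl; // link to the results page
};

(async () => {
  console.log('With third parties:    ' + await runTest({ label: 'baseline' }));
  console.log('Without third parties: ' + await runTest({ label: 'blocked', block: blocked }));
})();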

Talk it out

Work to maintain an ongoing discussion with the entire team. As Tim Kadlec says:

Everything should have a value, because everything has a cost.

100 words 010

  • Nobody writes on their own website any more. People write on Twitter, Facebook, and Medium instead. Personal publishing is dead.
  • You can’t build a complex interactive app on the web without requiring JavaScript. It’s fine if it doesn’t work without JavaScript. JavaScript is ubiquitous.
  • Privacy is dead. The technologies exist to monitor your every movement and track all your communications, so it is inevitable that our society will bend to this reality.

These statements aren’t true. But they are repeated so often, as if they were truisms, that we run the risk of believing them and thus, fulfilling their promise.

Analytical

When I was talking about Huffduffer, I mentioned that I don’t have any analytics set up for the site:

To be honest, I’m okay with that—one of the perks of having a personal project is that the only metric that really matters is your own satisfaction.

For a while, I did have Google Analytics set up on The Session. But I started to feel a bit uncomfortable about willingly opening up a wormhole between my site and the Google mothership. It bothered me that Adblock Plus would show that one ad had been blocked on the site. There are no ads on the site, but the presence of the Google Analytics code was providing valuable information to Google—and its advertiser customer base—so I can understand why it gets flagged up like any other unwanted tracking.

Theoretically, users have a way of opting out of this kind of tracking by switching on the Do Not Track header (if it isn’t switched on by default). Looking at the default JavaScript code that Google provides for setting up Google Analytics, I don’t see any mention of navigator.doNotTrack.

Now, it may well be that Google sniffs for that header (and abandons any tracking) when its server is pinged via the analytics code, but there’s no way to tell from this side of the Googleplex. I certainly don’t see any mention of it in the JavaScript that gets inserted into our web pages.

I was wondering whether it makes sense to explicitly check for the doNotTrack header before opening up that connection to google-analytics.com via a generated script element.

So if the current code looks like:

<script>
// generate a script element
// point it to google-analytics.com
</script>

Would it make sense to wrap it with some kind of test for navigator.doNotTrack:

<script>
// browsers with the preference switched on report '1' (older versions of Firefox used 'yes')
if (navigator.doNotTrack !== '1' && navigator.doNotTrack !== 'yes') {
// generate a script element
// point it to google-analytics.com
}
</script>

For the love of mercy, don’t actually use that code—it’s completely untested and probably causes more harm than good. But you can see the idea that I’m trying to get at, right? Google Analytics most definitely counts as tracking so it seems like the ideal use-case for Do Not Track.

It raises a few questions:

  1. Is anyone doing this already? It might well be that the answer is “no”, not because of any reluctance to respect user preferences but because the doNotTrack spec is very much in flux.
  2. Would you consider doing this?
  3. If you were to do this, could you foresee getting pushback from within your own company?

Tracking

Ajax was a really big deal six, seven, eight years ago. My second book was all about Ajax. I spoke about Ajax at conferences and gave workshops all about using Ajax and progressive enhancement.

During those workshops, I would often point out that Ajax had the potential to be abused terribly. Until the advent of Ajax, it was very clear to a user when data was being submitted to a server: you’d have to click a link or submit a form. As soon as you introduce asynchronous communication, it’s possible for the server to get information from the client even without a full-page refresh.

Imagine, for example, that you’re typing a message into a textarea. You might begin by typing, “Why, you stuck up, half-witted, scruffy-looking nerf…” before calming down and thinking better of it. Before Ajax, there was no way that what you had typed could ever reach the server. But now, it’s entirely possible to send data via Ajax with every key press.
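
Just to show how little code such creepiness requires, here’s a bare-bones sketch—the /log endpoint is made up, and please don’t actually do this:

<script>
// send whatever has been typed so far to the server on every key press
var textarea = document.querySelector('textarea');
textarea.addEventListener('keyup', function () {
  var request = new XMLHttpRequest();
  request.open('POST', '/log'); // hypothetical logging endpoint
  request.send(textarea.value); // sent whether or not the message is ever submitted
});
</script>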

It was just a thought experiment. I wasn’t actually that worried that anyone would ever do something quite so creepy.

Then I came across this article by Jennifer Golbeck in Slate all about Facebook tracking what’s entered—but then erased—within its status update form:

Unfortunately, the code that powers Facebook still knows what you typed—even if you decide not to publish it. It turns out that the things you explicitly choose not to share aren’t entirely private.

Initially I thought there must have been some mistake. I erroneously called out Jen Golbeck when I found the PDF of a paper called The Post that Wasn’t: Exploring Self-Censorship on Facebook. The methodology behind the sample group used for that paper was much more old-fashioned than using Ajax:

First, participants took part in a weeklong diary study during which they used SMS messaging to report all instances of unshared content on Facebook (i.e., content intentionally self-censored). Participants also filled out nightly surveys to further describe unshared content and any shared content they decided to post on Facebook. Next, qualified participants took part in in-lab interviews.

But the Slate article was referencing a different paper that does indeed use Ajax to track instances of deleted text:

This research was conducted at Facebook by Facebook researchers. We collected self-censorship data from a random sample of approximately 5 million English-speaking Facebook users who lived in the U.S. or U.K. over the course of 17 days (July 6-22, 2012).

So what I initially thought was a case of alarmism—conflating something as simple as a client-side character count with actual server-side monitoring—turned out to be a pretty accurate reading of the situation. I originally intended to write a scoffing post about Slate’s linkbaiting alarmism (and call it “The shocking truth behind the latest Facebook revelation”), but it turns out that my scoffing was misplaced.

That said, the article has been updated to reflect that the Ajax requests are only sending information about deleted characters—not the actual content. Still, as we learned very clearly from the NSA revelations, there’s not much practical difference between logging data and logging metadata.

The nerds among us may start firing up our developer tools to keep track of unexpected Ajax requests to the server. But what about everyone else?

This isn’t the first time that the power of JavaScript has been abused. Every browser now ships with an option to block pop-up windows. That’s because the ability to spawn new windows was so horribly misused. Maybe we’re going to see similar preference options to avoid firing Ajax requests on keypress.

It would be depressingly reductionist to conclude that any technology that can be abused will be abused. But as long as there are web developers out there who are willing to spawn pop-up windows or force persistent cookies or use Ajax to track deleted content, the depressingly reductionist conclusion looks like self-fulfilling prophecy.