Tags: privacy

Tuesday, September 3rd, 2019

Today’s Firefox Blocks Third-Party Tracking Cookies and Cryptomining by Default - The Mozilla Blog

If you haven’t done so already, you should really switch to Firefox.

Then encourage your friends and family to switch to Firefox too.

Monday, August 26th, 2019

Opening up the AMP cache

I have a proposal that I think might alleviate some of the animosity around Google AMP. You can jump straight to the proposal or get some of the back story first…

The AMP format

Google AMP is exactly the kind of framework I’d like to get behind. Unlike most front-end frameworks, its components take a declarative approach—no knowledge of JavaScript required. I think Lea’s excellent Mavo is the only other major framework that takes this inclusive approach. All the configuration happens in markup, and all the styling happens in CSS. Excellent!

But I cannot get behind AMP.

Instead of competing on its own merits, AMP is unfairly propped up by the search engine of its parent company, Google. That makes it very hard to evaluate whether AMP is being used on its own merits. Instead, the evidence suggests that most publishers of AMP pages are doing so because they feel they have to, rather than because they want to. That’s a real shame, because as a library of web components, AMP seems pretty good. But there’s just no way to evaluate AMP-the-format without taking into account AMP-the-ecosystem.

The AMP ecosystem

Google AMP ostensibly exists to make the web faster. Initially the focus was specifically on mobile performance, but that distinction has since fallen by the wayside. The idea is that by using AMP’s web components, your pages will be speedy. Though, as Andy Davies points out, this isn’t always the case:

This is where I get confused… https://independent.co.uk only have an AMP site yet it’s performance is awful from a user perspective - isn’t AMP supposed to prevent this?

See also: Google AMP lowered our page speed, and there’s no choice but to use it:

According to Google’s own Page Speed Insights audit (which Google recommends to check your performance), the AMP version of articles got an average performance score of 87. The non-AMP versions? 95.

Publishers who already have fast web pages—like The Guardian—are still compelled to make AMP versions of their stories because of the search benefits reserved for AMP. As Terence Eden reported from a meeting of the AMP advisory committee:

We heard, several times, that publishers don’t like AMP. They feel forced to use it because otherwise they don’t get into Google’s news carousel — right at the top of the search results.

Some people felt aggrieved that all the hard work they’d done to speed up their sites was for nothing.

The Google AMP team are at pains to point out that AMP is not a ranking factor in search. That’s true. But it is unfairly privileged in other ways. Only AMP pages can appear in the Top Stories carousel …which appears above any other search results. As I’ve said before:

Now, if you were to ask any right-thinking person whether they think having their page appear right at the top of a list of search results would be considered preferential treatment, I think they would say hell, yes! This is the only reason why The Guardian, for instance, even have AMP versions of their content—it’s not for the performance benefits (their non-AMP pages are faster); it’s for that prime real estate in the carousel.

From A letter about Google AMP:

Content that “opts in” to AMP and the associated hosting within Google’s domain is granted preferential search promotion, including (for news articles) a position above all other results.

That’s not the only way that AMP pages get preferential treatment. It turns out that the secret to the speed of AMP pages isn’t the web components. It’s the prerendering.

The AMP cache

If you’ve ever seen an AMP page in a list of search results, you’ll have noticed the little lightning icon. If you’ve ever tapped on that search result, you’ll have noticed that the page loads blazingly fast!

That’s not down to AMP-the-format, alas. That’s down to the fact that the page has been prerendered by Google before you even went to it. If any page were prerendered that way, it would load blazingly fast. But currently, this privilege is reserved for AMP pages only.

If, after tapping through to that AMP page, you looked at the address bar of your browser, you might have noticed something odd. Even though you might have thought you were visiting The Washington Post, or The New York Times, the URL of the (blazingly fast) page you’re looking at is still under Google’s domain. That’s because Google hosts any AMP pages that it prerenders.

Google calls this “the AMP cache”, but it would be better described as “AMP hosting”. The web page sent down the wire is hosted on Google’s domain.

Here’s that AMP letter again:

When a user navigates from Google to a piece of content Google has recommended, they are, unwittingly, remaining within Google’s ecosystem.

Through gritted teeth, I will refer to this as “the AMP cache”, because that’s what everyone else calls it. But make no mistake, Google is hosting—not caching—these pages.

But why host the pages on a Google domain? Why not prerender the original URLs?

Prerendering and privacy

Scott summed up the situation with AMP nicely:

The pitch I think site owners are hearing is: let us host your pages on our domain and we’ll promote them in search results AND preload them so they feel “instant.” To opt-in, build pages using this component syntax.

But perhaps we could de-couple the AMP format from the AMP cache.

That’s what Terence suggests:

My recommendation is that Google stop requiring that organisations use Google’s proprietary mark-up in order to benefit from Google’s promotion.

The AMP letter, too:

Instead of granting premium placement in search results only to AMP, provide the same perks to all pages that meet an objective, neutral performance criterion such as Speed Index.

Scott reiterates:

It’s been said before but it would be so good for the web if pages with a Lighthouse score over say, 90 could get into that top search result area, even if they’re not built using Google’s AMP framework. Feels wrong to have to rebuild/reproduce an already-fast site just for SEO.

This was also what I was calling for. But then Malte pointed out something that stumped me. Privacy.

Here’s the problem…

Let’s say Google do indeed prerender already-fast pages when they’re listed in search results. You, a search user, type something into Google. A list of results comes back. Google begins prerendering some of them. But you don’t end up clicking through to those pages. Nonetheless, the servers those pages are hosted on have received a GET request coming from a Google search. Those publishers now know that a particular (cookied?) user could have clicked through to their site. That’s very different from knowing when someone has actually arrived at a particular site.

And that’s why Google host all the AMP pages that they prerender. Given the privacy implications of prerendering non-Google URLs, I must admit that I see their point.

Still, it’s a real shame to miss out on the speed benefit of prerendering:

Prerendering AMP documents leads to substantial improvements in page load times. Page load time can be measured in different ways, but they consistently show that prerendering lets users see the content they want faster. For now, only AMP can provide the privacy preserving prerendering needed for this speed benefit.

A modest proposal

Why is Google’s AMP cache just for AMP pages? (Y’know, apart from the obvious answer that it’s in the name.)

What if Google were allowed to host non-AMP pages? Google search could then prerender those pages just like it currently does for AMP pages. There would be no privacy leaks; everything would happen on the same domain—google.com or ampproject.org or whatever—just as currently happens with AMP pages.

Don’t get me wrong: I’m not suggesting that Google should make a 1:1 model of the web just to prerender search results. I think that the implementation would need to have two important requirements:

  1. Hosting needs to be opt-in.
  2. Only fast pages should be prerendered.

Opting in

Currently, by publishing a page using the AMP format, publishers give implicit approval to Google to host that page on Google’s servers and serve up this Google-hosted version from search results. This has always struck me as being legally iffy. I’ve looked in the AMP documentation to try to find any explicit granting of hosting permission (e.g. “By linking to this JavaScript file, you hereby give Google the right to serve up our copies of your content.”), but no luck. So even with the current situation, I think a clear opt-in for hosting would be beneficial.

This could be a meta element. Maybe something like:

<meta name="caches-allowed" content="google">

This would have the nice benefit of allowing comma-separated values:

<meta name="caches-allowed" content="google, yandex">

(The name is just a strawman, by the way—I’m not suggesting that this is what the final implementation would actually look like.)
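
Just to make this concrete, here’s a hypothetical sketch (in Python) of how a cache might check for that opt-in before hosting a page. It uses the strawman meta name from above, so treat every detail as illustrative rather than as a spec:

# Hypothetical sketch: check whether a page has opted in to being
# hosted by a named cache, using the strawman "caches-allowed" meta
# element described above. Not a real standard.
from html.parser import HTMLParser
from urllib.request import urlopen


class OptInParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.caches = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "caches-allowed":
            content = attrs.get("content") or ""
            self.caches.update(
                value.strip() for value in content.split(",") if value.strip()
            )


def may_host(url, cache_name):
    parser = OptInParser()
    with urlopen(url) as response:
        parser.feed(response.read().decode("utf-8", errors="replace"))
    return cache_name in parser.caches


# e.g. may_host("https://example.com/article", "google")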

If not a meta element, then perhaps this could be part of robots.txt? Although my feeling is that this needs to happen on a document-by-document basis rather than site-wide.

Many people will, quite rightly, never want Google—or anyone else—to host and serve up their content. That’s why it’s so important that this behaviour needs to be opt-in. It’s kind of appalling that the current hosting of AMP pages is opt-in-by-proxy-sort-of.

Criteria for prerendering

Which pages should be blessed with hosting and prerendering? The fast ones. That’s sorta the whole point of AMP. But right now, there’s a lot of resentment by people with already-fast websites who quite rightly feel they shouldn’t have to use the AMP format to benefit from the AMP ecosystem.

Page speed is already a ranking factor. It doesn’t seem like too much of a stretch to extend its benefits to hosting and prerendering. As mentioned above, there are already a few possible metrics to use:

  • Page Speed Index
  • Lighthouse
  • Web Page Test
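
As an aside, publishers can already get a rough idea of where they’d stand. This sketch asks the PageSpeed Insights API for a Lighthouse performance score; the endpoint and the shape of the response are my assumptions about the current v5 API, so check them against the documentation before relying on them:

# Sketch: look up a URL's Lighthouse performance score via the
# PageSpeed Insights API. The endpoint and response structure are
# assumptions; verify against the official v5 documentation.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def performance_score(url, api_key=None):
    params = {"url": url}
    if api_key:
        params["key"] = api_key
    with urlopen(PSI_ENDPOINT + "?" + urlencode(params)) as response:
        data = json.load(response)
    # Lighthouse reports the category score as a fraction of 1.0
    return data["lighthouseResult"]["categories"]["performance"]["score"] * 100


# e.g. performance_score("https://example.com/")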

Ah, but what if a page has a good score when it’s indexed, but then gets worse afterwards? Not a problem! The version of the page that’s measured is the same version of the page that gets hosted and prerendered. Google can confidently say “This page is fast!” After all, they’re the ones serving up the page.

That does raise the question of how often Google should check back with the original URL to see if it has changed/worsened/improved. The answer to that question is however long it currently takes to check back in on AMP pages:

Each time a user accesses AMP content from the cache, the content is automatically updated, and the updated version is served to the next user once the content has been cached.

Issues

This proposal does not solve the problem with the address bar. You’d still find yourself looking at a page from The Washington Post or The New York Times (or adactio.com) but seeing a completely different URL in your browser. That’s not good, for all the reasons outlined in the AMP letter.

In fact, this proposal could potentially make the situation worse. It would allow even more sites to be impersonated by Google’s URLs. Where currently only AMP pages are bad actors in terms of URL confusion, opening up the AMP cache would allow equal opportunity URL confusion.

What I’m suggesting is definitely not a long-term solution. The long-term solutions currently being investigated are technically tricky and will take quite a while to come to fruition—web packages and signed exchanges. In the meantime, what I’m proposing is a stopgap solution that’s technically a lot simpler. But it won’t solve all the problems with AMP.

This proposal solves one problem—AMP pages being unfairly privileged in search results—but does nothing to solve the other, perhaps more serious problem: the erosion of site identity.

Measuring

Currently, Google can assess whether a page should be hosted and prerendered by checking to see if it’s a valid AMP page. That test would need to be widened to include a different measurement of performance, but those measurements already exist.

I can see how this assessment might not be as quick as checking for AMP validity. That might affect whether non-AMP pages could be measured quickly enough to end up in the Top Stories carousel, which is, by its nature, time-sensitive. But search results are not necessarily as time-sensitive. Let’s start there.

Assets

Currently, AMP pages can be prerendered without fetching anything other than the markup of the AMP page itself. All the CSS is inline. There are no initial requests for other kinds of content like images. That’s because there are no img elements on the page: authors must use amp-img instead. The image itself isn’t loaded until the user is on the page.

If the AMP cache were to be opened up to non-AMP pages, then any content required for prerendering would also need to be hosted on that same domain. Otherwise, there’s privacy leakage.

This definitely introduces an extra level of complexity. Paths to assets within the markup might need to be re-written to point to the Google-hosted equivalents. There would almost certainly need to be a limit on the number of assets allowed. Though, for performance, that’s no bad thing.
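
Just to show the shape of the problem, here’s a toy sketch of that kind of rewriting. The cache hostname and the URL scheme are entirely made up:

# Toy sketch: rewrite an absolute asset URL to a cache-hosted
# equivalent. The "cache.example" host and path scheme are invented
# purely for illustration.
from urllib.parse import quote, urlparse


def cache_url(asset_url, cache_host="cache.example"):
    # e.g. https://example.com/img/photo.jpg
    #   -> https://cache.example/example.com/img/photo.jpg
    parts = urlparse(asset_url)
    return "https://{}/{}{}".format(cache_host, parts.hostname, quote(parts.path))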

Make no mistake, figuring out what to do about assets—style sheets, scripts, and images—is very challenging indeed. Luckily, there are very smart people on the Google AMP team. If that brainpower were to focus on this problem, I am confident they could solve it.

Summary

  1. Prerendering of non-Google URLs is problematic for privacy reasons, so Google needs to be able to host pages in order to prerender them.
  2. Currently, that’s only done for pages using the AMP format.
  3. The AMP cache—and with it, prerendering—should be decoupled from the AMP format, and opened up to other fast web pages.

There will be technical challenges, but hopefully nothing insurmountable.

I honestly can’t see what Google have to lose here. If their goal is genuinely to reward fast pages, then opening up their AMP cache to fast non-AMP pages will actively encourage people to make fast web pages (without having to switch over to the AMP format).

I’ve deliberately kept the details vague—what the opt-in should look like; what the speed measurement should be; how to handle assets—I’m sure smarter folks than me can figure that stuff out.

I would really like to know what other people think about this proposal. Obviously, I’d love to hear from members of the Google AMP team. But I’d also love to hear from publishers. And I’d very much like to know what people in the web performance community think about this. (Write a blog post and send me a webmention.)

What am I missing here? What haven’t I thought of? What are the potential pitfalls (and are they any worse than the current acrimonious situation with Google AMP)?

I would really love it if someone with a fast website were in a position to say, “Hey Google, I’m giving you permission to host this page so that it can be prerendered.”

I would really love it if someone with a slow website could say, “Oh, shit! We’d better make our existing website faster or Google won’t host our pages for prerendering.”

And I would dearly love to finally be able to embrace AMP-the-format with a clear conscience. But as long as prerendering is joined at the hip to the AMP format, the injustice of the situation only harms the AMP project.

Google, open up the AMP cache.

Tuesday, June 18th, 2019

The New Wilderness (Idle Words)

An excellent piece by Maciej on the crucial difference between individual privacy and ambient privacy (and what that means for regulation):

Ambient privacy is not a property of people, or of their data, but of the world around us. Just like you can’t drop out of the oil economy by refusing to drive a car, you can’t opt out of the surveillance economy by forswearing technology (and for many people, that choice is not an option). While there may be worthy reasons to take your life off the grid, the infrastructure will go up around you whether you use it or not.

Because our laws frame privacy as an individual right, we don’t have a mechanism for deciding whether we want to live in a surveillance society. Congress has remained silent on the matter, with both parties content to watch Silicon Valley make up its own rules. The large tech companies point to our willing use of their services as proof that people don’t really care about their privacy. But this is like arguing that inmates are happy to be in jail because they use the prison library. Confronted with the reality of a monitored world, people make the rational decision to make the best of it.

That is not consent.

For more detail, I highly recommend reading his testimony to the senate hearing on Privacy Rights and Data Collection in a Digital Economy.

Wednesday, June 5th, 2019

Let’s Clarify some Misunderstandings around Sign In with Apple • Aaron Parecki

Aaron knows what he’s talking about when it comes to authentication, and Apple’s latest move with sign-in for native apps gets the thumbs up.

Sign In with Apple is a good thing for users! This means apps will no longer be able to force you to log in with your Facebook account to use them.

This does not mean that Apple is requiring every app to use Sign in with Apple.

Tuesday, May 28th, 2019

How Ireland became Europe’s data watchdog - BBC News

The coming GDPR storm:

Ireland’s Data Protection Commissioner, Helen Dixon, is expected to circulate her decisions on some cases by July or August, with final rulings made by the end of the year.

(That’s my sister-in-law, that is.)

Saturday, April 13th, 2019

Goodbye Google Analytics, Hello Fathom - daverupert.com

Dave stops feeding his site’s visitors data to Google. I wish more people (and companies) would join him.

There’s also an empowering #indieweb feeling about owning your analytics too. I pay for the server my analytics collector runs on. It’s on my own subdomain. It’s mine.

Wednesday, March 20th, 2019

switching.social – Ethical alternatives to popular sites and apps

For full hipster points, make sure you’re using these services, and then casually drop them into conversation by saying “Yeah, it’s a pretty obscure service; you probably haven’t heard of it…”

Monday, March 4th, 2019

Designing for Personalities by Sarah Parmenter

Following on from Jeffrey and Margot, the third talk in the morning’s curated content at An Event Apart Seattle is from Sarah Parmenter. Her talk is Designing for Personalities. Here’s the description:

Just as our designs today must accommodate differences of gender, cultural background, and other factors, it’s time to create apps, websites, and internal processes that account for still another strand of human diversity: our very different personality types.

In this new presentation, Sarah shares real-life case studies demonstrating how businesses and organizations large and small are learning to adjust the thinking behind their websites and processes to account for the wishes, needs, and comfort levels of all kinds of people.

We know that the world is full of different conventions—currency, measuring systems, and more—and our web forms address these differences. Let’s do the same for the emotional and psychological assumptions behind our customer profiles. Let’s learn to design for a palette of different personalities.

I’m going to do my best to write down some of what she says…

Sarah works with Adobe, and at a gathering last year, she ended up chatting with some of her co-workers about ancestry, for some reason. She mentioned that she had French and Norwegian roots. The French part is evident in her surname: parmentier means potato farmer. So Sarah did a DNA test. It turned out that Sarah had no French or Norwegian roots—everything in her ancestry came from within an eighty-mile radius of her home. It was scary how strongly she had believed, for years, in something that just wasn’t true.

It’s like that on the web. There are things we do because lots of people do them, but that doesn’t mean they work. Many websites and digital processes are broken and it’s down to us to fix them.

With traditional personas, we make an awful lot of assumptions about people. Have a look at facebook.com/ads/preferences and see just how easy it is for computers to make a startling number of assumptions.

The other problem with personas is that they are amalgamations. But there’s no such thing as an average customer. The Microsoft design team add much more context so that they can design for real people in real situations.

Designing for personas only takes care of a fraction of the work we need to do. When we add in another layer of life getting in the way, and a layer of how someone is feeling, we’ve got a medley of UX issues that need solving.

The problem is that personality traits aren’t static. They evolve with context. Personas are contextual but static. What we should really be doing is creating the most desirable experience for the user, and we can only do that by empowering them, as Margot also said. We need to give our users control.

If there were such controls, Sarah would use them to reduce motion on websites. She suffers from motion sickness and some websites literally make her sick. There is a prefers-reduced-motion media query but so far only Safari and Firefox support it. It’s hard to believe that we haven’t been doing this already. This stuff seems so obvious in hindsight.

Sarah asks who in the room are introverts. People raise their hands (which seems like quite an extroverted thing to do).

Now Sarah brings up the Myers-Briggs test, a piece of pseudoscientific bollocks. Sarah is INFJ—introversion, intuition, feeling, judging. Weird flex, but okay.

Introverts will patiently seek out complex UX patterns if that aligns with their levels of comfort. These are people who would do anything rather than speak to someone on the phone. An introvert figured out that if you sat on the Virgin Atlantic homepage long enough, a live chat window would pop up after twenty minutes.

Apple is great for introverts. They don’t bury their chat options (unlike Amazon). Remember, introverts are a third of the population.

Users will begin to value those applications and services that bother them the least, respect their privacy, and allow them a certain level of control.

Let’s talk about designing compassionate products.

What we’re asking of people in time-critical or exceptionally personal situations is for them to have the foresight to turn on incognito mode. Everyone has an urban legend horror story about cookies following them around the web. Cookies can seem like a smart marketing solution until context lets them down.

Sarah’s best friend got pregnant. She started excitedly clicking around the web looking for pregnancy-related products. She sadly lost the baby. Sarah explained to her how to use a cookie eraser. Her friend thought she was joking. Sarah showed her how to clean her search history. But if you’ve liked and subscribed to a bunch of things while you’re excited, it’s not that easy—when the worst happens—to think back on everything you did.

There’s a menstrual cycle and fertility tracking app (not available in the US). It captures a lot of data. At the point when Sarah’s friend lost her baby, this change was caught by the app. The message she got was lacking in empathy. It was more like market research than a compassionate message. At a time when they should’ve been thinking of the mindset of their user, they were focused on getting data. No one caught this when the app was being designed.

The entire user experience of our websites and apps is going to rely on how empathetic we are.

We don’t always save things to reminisce; we save to give us the option to remember. We can currently favourite a photograph or flag it as inappropriate. It would be nice to simply save something to a memory vault.

Bloom and Wild is a company in the UK. They send nice mailbox flowers. On March 5th last year, Sarah sent an email to the CEO of Bloom and Wild. She had just received a mailout about Mother’s Day after her mother passed away. Was there no way of opting out of receiving Mother’s Day emails without unsubscribing completely?

Well, yesterday they finally implemented it! Bloom and Wild have been overwhelmed by the positive response.

For those of us trying to make the web a better place, sometimes it can be as simple as reaching out to point out what companies could be doing better. And sometimes, just sometimes, they listen.

Also, read Design For Real Life by Eric Meyer and Sara Wachter-Boettcher.

As standard, we should be giving users end-to-end control over how they interact with us.

Sarah wants to talk about designing a personal UX journey. For one of her clients, Sarah dip-sampled hundreds of existing customers. There were gaps in the customer journey. They think that what was happening was the company was getting very aggressive after initial interaction—they were phoning customers. Sarah and her team started researching this. That made them unpopular with other parts of the company. Sarah gave her team Groucho Marx glasses whenever they had to go and ask people uncomfortable questions.

Sarah’s team went on a remarketing effort. They sent an email to people who were in the gap between booking an appointment and making a purchase. They asked the users what their preferences were for being contacted. The company didn’t think they were doing anything wrong, but this research showed that 76% of people preferred to avoid phone calls.

They asked a few more questions. If you ask questions, there has to be value in it for the users. Sarah got the budget for some gift cards. They got feedback that many people don’t like taking calls, especially when they’re at work. The best: “I’m an intorvert. I hate calls. Sorry.”

The customer feedback was very, very clear. Even though this would take a lot of money to fix, it was crucial to fix it. Being agile was crucial.

Then they looked at a different (shorter) gap in the customer journey. It was clear that an online booking service was desirable. They made a product quickly that booked more appointments in ten days than had previously been booked in a month by sales agents.

They also made a live chat system. You see a very slow roll-out. At the beginning, it has all new customers. After a while, people return with more questions.

The mistake they made was having a tech-savvy team with multiple browser windows open. That’s not how the customer service people operate. They usually deal with people one on one. So they were happy to leave people waiting on live chat for twenty or twenty five minutes, and of course that was far too long. So when you’re adding in a new system like this, think about key performance indicators that you want to go along with it e.g. live chat must have a response within five minutes.

There’s also a long tail of conversion. Sometimes the sales cycle is very lengthy. They decided to give users the ability to select which product they wanted and switch options on and off. It was all about giving the power back to the user. This was a phenomenal change for the company. They were able to completely change the customer journey and reduce those big gaps. They went from a cycle of fourteen weeks to seven days. They did that by handing the power back to the user.

Sarah’s question for the audience is: What is stopping your user completing your cycle? This can be very difficult. You might have to do horrible things to validate a concept. It’s okay. We’re all perfectionists, but sometimes you have to use quick’n’dirty code to achieve your goal. If the end goal is we’re able to say “hey, this thing worked!” then we can go back and do it properly.

To recap:

  • Respect privacy and build in a personal level of UX adjustment into every product.
  • Outlier data can create superfans of your product.
  • Build the most empathetic experience that you can.

Wednesday, January 16th, 2019

Security Checklist

Exactly what it sounds like: a checklist of measures you can take to protect yourself.

Most of these require a certain level of tech-savviness, which is a real shame. On the other hand, some of them are entirely about awareness.

Saturday, November 10th, 2018

Webmentions at Indie Web Camp Berlin

I was in Berlin for most of last week, and every day was packed with activity:

By the time I got back to Brighton, my brain was full …just in time for FF Conf.

All of the events were very different, but equally enjoyable. It was also quite nice to just attend events without speaking at them.

Indie Web Camp Berlin was terrific. There was an excellent turnout, and once again, I found that the format was just right: a day of discussions (BarCamp style) followed by a day of doing (coding, designing, hacking). I got very inspired on the first day, so I was raring to go on the second.

What I like to do on the second day is try to complete two tasks; one that’s fairly straightforward, and one that’s a bit tougher. That way, when it comes time to demo at the end of the day, even if I haven’t managed to complete the tougher one, I’ll still be able to demo the simpler one.

In this case, the tougher one was also tricky to demo. It involved a lot of invisible behind-the-scenes plumbing. I was tweaking my webmention endpoint (stop sniggering—tweaking your endpoint is no laughing matter).

Up until now, I could handle straightforward webmentions, and I could handle updates (if I receive more than one webmention from the same link, I check it each time). But I needed to also handle deletions.

The spec is quite clear on this. A 404 isn’t enough to trigger a deletion—that might be a temporary state. But a status of 410 Gone indicates that a resource was once here but has since been deliberately removed. In that situation, any stored webmentions for that link should also be removed.
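
In outline, that re-verification logic could look something like this minimal sketch. It only illustrates the 404-versus-410 distinction; it’s not my actual endpoint code:

# Minimal sketch: should stored webmentions for this source be deleted?
# 410 Gone means the source was deliberately removed, so delete.
# A 404 (or any other error) might be temporary, so keep them for now.
from urllib.error import HTTPError
from urllib.request import urlopen


def should_delete(source_url):
    try:
        urlopen(source_url)
    except HTTPError as error:
        return error.code == 410
    return False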

Anyway, I think I got it working, but it’s tricky to test and even trickier to demo. “Not to worry”, I thought, “I’ve always got my simpler task.”

For that, I chose to add a little map to my homepage showing the last location I published something from. I’ve been geotagging all my content for years (journal entries, notes, links, articles), but not really doing anything with that data. This is a first step to doing something interesting with many years of location data.

I’ve got it working now, but the demo gods really weren’t with me at Indie Web Camp. Both of my demos failed. The webmention demo failed quite embarrassingly.

As well as handling deletions, I also wanted to handle updates where a URL that once linked to a post of mine no longer does. Just to be clear, the URL still exists—it’s not 404 or 410—but it has been updated to remove the original link back to one of my posts. I know this sounds like another very theoretical situation, but I’ve actually got an example of it on my very first webmention test post from five years ago. Believe it or not, there’s an escort agency in Nottingham that’s using webmention as a vector for spam. They post something that does link to my test post, send a webmention, and then remove the link to my test post. I almost admire their dedication.
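
The check itself is conceptually simple: fetch the source again and see whether it still links to the target. Here’s a rough sketch (a naive substring match; a real implementation would parse the HTML properly):

# Rough sketch: does the source page still contain a link to the target?
# If not, any stored webmention for that pair should be dropped.
from urllib.request import urlopen


def still_links_to(source_url, target_url):
    with urlopen(source_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return target_url in html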

I wanted to foil this particular situation, and I thought I had updated my code to handle it. Alas, when it came time to demo this, I was using someone else’s computer, and in my attempt to right-click and copy the URL of the spam link …I accidentally triggered it. In front of a room full of people. It was mildly NSFW, but more worryingly, a potential Code Of Conduct violation. I’m very sorry about that.

Apart from the humiliating demo, I thoroughly enjoyed Indie Web Camp, and I’m going to keep adjusting my webmention endpoint. There was a terrific discussion around the ethical implications of storing webmentions, led by Sebastian, based on his epic post from earlier this year.

We established early in the discussion that we weren’t going to try to solve legal questions—like GDPR “compliance”, which varies depending on which lawyer you talk to—but rather try to figure out what the right thing to do is.

Earlier that day, during the introductions, I quite happily showed webmentions in action on my site. I pointed out that my last blog post had received a response from another site, and because that response was marked up as an h-entry, I displayed it in full on my site. I thought this was all hunky-dory, but now this discussion around privacy made me question some inferences I was making:

  1. By receiving a webmention in the first place, I was inferring a willingness for the link to be made public. That’s not necessarily true, as someone pointed out: a CMS could be automatically sending webmentions, which the author might be unaware of.
  2. If the linking post is marked up in h-entry, I was inferring a willingness for the content to be republished. Again, not necessarily true.

That second inference of mine—that publishing in a particular format somehow grants permissions—actually has an interesting precedent: Google AMP. Simply by including the Google AMP script on a web page, you are implicitly giving Google permission to store a complete copy of that page and serve it from their servers instead of sending people to your site. No terms and conditions. No checkbox ticked. No “I agree” button pressed.

Just sayin’.

Anyway, when it comes to my own processing of webmentions, I’m going to take some of the suggestions from the discussion on board. There are certain signals I could be looking for in the linking post:

  • Does it include a link to a licence?
  • Is there a restrictive robots.txt file?
  • Are there meta declarations that say noindex?

Each one of these could help to infer whether or not I should be publishing a webmention. I quickly realised that what we’re talking about here is an algorithm.
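
Spelling it out, even a rough sketch like this makes the recipe explicit. The signals are the ones from the list above, and the return values are placeholders for whatever policy I eventually settle on:

# Sketch of a webmention display policy based on the signals above.
# Each flag comes from inspecting the linking post; the returned values
# are placeholders, not a finished policy.
def display_policy(has_licence_link, robots_disallows, has_noindex):
    if robots_disallows or has_noindex:
        # The author has signalled that they don't want this reused.
        return "ignore"
    if has_licence_link:
        # A permissive licence might justify republishing in full;
        # a real policy would check which licence it actually is.
        return "republish"
    # Default to the cautious option: link to the source only.
    return "link-only"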

Despite its current usage to mean “magic”, an algorithm is a recipe. It’s a series of steps that contribute to a decision point. The problem is that, in the case of silos like Facebook or Instagram, the algorithms are secret (which probably contributes to their aura of magical thinking). If I’m going to write an algorithm that handles other people’s information, I don’t want to make that mistake. Whatever steps I end up codifying in my webmention endpoint, I’ll be sure to document them publicly.

Thursday, September 20th, 2018

The costs and benefits of tracking scripts – business vs. user // Sebastian Greger

I am having a hard time seeing the business benefits weighing in more than the user cost (at least for those many organisations out there who rarely ever put that data to proper use). After all, keeping the costs low for the user should be in the core interest of the business as well.

Friday, September 14th, 2018

On using tracking scripts | justmarkup

Weighing up the pros and cons of adding tracking scripts to a website, from a business perspective and from a user perspective.

When looking at the costs versus the benefits it is hard to believe that almost every website is using tracking scripts.

The next time you implement a tracking script, it would be great if you could rethink it and ask yourself if it is really worth it.

Wednesday, September 12th, 2018

Private by Default

Feedbin has removed third-party iframes and JavaScript (oEmbed provides a nice alternative), as well as stripping out Google Analytics, and even web fonts that aren’t self-hosted. This is excellent!

Saturday, September 1st, 2018

Changing Our Approach to Anti-tracking - Future Releases

This is excellent news from Mozilla. Firefox is going to make it easier to block vampiric privacy-leeching and performance-draining third-party scripts and trackers.

In the physical world, users wouldn’t expect hundreds of vendors to follow them from store to store, spying on the products they look at or purchase. Users have the same expectations of privacy on the web, and yet in reality, they are tracked wherever they go.

Friday, July 20th, 2018

Laura Kalbag – Insecure

The web can be used to find common connections with folks you find interesting, and who don’t make you feel like so much of a weirdo. It’d be nice to be able to do this in a safe space that is not being surveilled.

Owning your own content, and publishing to a space you own can break through some of these barriers. Sharing your own weird scraps on your own site makes you easier to find by like-minded folks. If you’ve got no tracking on your site (no Google Analytics etc), you are harder to profile. People can’t come to harass you on your own site if you do not offer them the means to do so.

Thursday, July 19th, 2018

Fixing these webs - daverupert.com

I’m a fan of fast websites. Your website needs to be fast. Our collective excuses, hand-wringing, and inability to come to terms with the problem-set (There is too much script) and solutions (Use less script) of modern web development is getting tired.

I agree with every word of this.

Sadly, I think the one company with a browser that has marketshare dominance and could exert the kind of pressure required to stop ad tracking and surveillance capitalism is not incentivized to do so.

So the problem is approached from the other end. Blame is piled on authors for slow first-party code. We’re told to use certain mobile publishing frameworks that syndicate to proprietary CDNs to appease the gods of luck and fortune.

Sunday, July 8th, 2018

BBC News on HTTPS – BBC Design + Engineering – Medium

BBC News has switched to HTTPS—hurrah!

Here, one of the engineers writes on Ev’s blog about the challenges involved. Personally, I think this is far more valuable and inspiring to read than the unempathetic posts claiming that switching to HTTPS is easy.

Update: Paul found the original URL for this …weird that they don’t link to it from the syndicated version.

Tuesday, July 3rd, 2018

“I Was Devastated”: Tim Berners-Lee, the Man Who Created the World Wide Web, Has Some Regrets | Vanity Fair

Are we headed toward an Orwellian future where a handful of corporations monitor and control our lives? Or are we on the verge of creating a better version of society online, one where the free flow of ideas and information helps cure disease, expose corruption, reverse injustices?

It’s hard to believe that anyone—even Zuckerberg—wants the 1984 version. He didn’t found Facebook to manipulate elections; Jack Dorsey and the other Twitter founders didn’t intend to give Donald Trump a digital bullhorn. And this is what makes Berners-Lee believe that this battle over our digital future can be won. As public outrage grows over the centralization of the Web, and as enlarging numbers of coders join the effort to decentralize it, he has visions of the rest of us rising up and joining him.

Tuesday, June 26th, 2018

Name That Script! by Trent Walton

Trent is about to pop his AEA cherry and give a talk at An Event Apart in Boston. I’m going to attempt to liveblog this:

How many third-party scripts are loading on our web pages these days? How can we objectively measure the value of these (advertising, a/b testing, analytics, etc.) scripts—considering their impact on web performance, user experience, and business goals? We’ve learned to scrutinize content hierarchy, browser support, and page speed as part of the design and development process. Similarly, Trent will share recent experiences and explore ways to evaluate and discuss the inclusion of 3rd-party scripts.

Trent is going to speak about third-party scripts, which is funny, because a year ago, he never would’ve thought he’d be talking about this. But he realised he needed to pay more attention to:

any request made to an external URL.

Or how about this:

A resource included with a web page that the site owner doesn’t explicitly control.

When you include a third-party script, the third party can change the contents of that script.

Here are some uses:

  • advertising,
  • A/B testing,
  • analytics,
  • social media,
  • content delivery networks,
  • customer interaction,
  • comments,
  • tag managers,
  • fonts.

You get data from things like analytics and A/B testing. You get income from ads. You get content from CDNs.

But Trent has concerns. First and foremost, the user experience effects of poor performance. Also, there are the privacy implications.

Why does Trent—a designer—care about third-party scripts? Well, over the years, the areas that Trent pays attention to have expanded. He’s progressed from image comps to frontend to performance to accessibility to design systems to the command line and now to third parties. But Trent has no impact on those third-party scripts. That’s very different to all those other areas.

Trent mostly builds prototypes. Those then get handed over for integration. Sometimes that means hooking it up to a CMS. Sometimes it means adding in analytics and ads. It gets really complex when you throw in third-party comments, payment systems, and A/B testing tools. Oftentimes, those third-party scripts can outweigh all the gains made beforehand. It happens with no discussion. And yet we spent half a meeting discussing a border radius value.

Delivering a performant, accessible, responsive, scalable website isn’t enough: I also need to consider the impact of third-party scripts.

Trent has spent the last few months learning about third parties so he can be better equipped to discuss them.

UX, performance and privacy impact

We feel the UX impact every day we browse the web (if we turn off our content blockers). The Food Network site has an interstitial asking you to disable your ad blocker. They promise they won’t spawn any pop-up windows. Trent turned his ad blocker off—the page was now 15 megabytes in size. And to top it off …he got a pop-up.

Privacy can be harder to perceive. We brush aside cookie notifications. What if the wording was “accept trackers” instead of “accept cookies”?

Remarketing is that experience when you’re browsing for a spatula and then every website you visit serves you ads for spatulas. That might seem harmless, but allowing access to our browsing history has serious privacy implications.

Web builders are on the front lines. It’s up to us to advocate for data protection and privacy like we do for web standards. Don’t wait to be told.

Categories of third parties

Ghostery categorises third-party providers: advertising, comments, customer interaction, essential, site analytics, social media. You can dive into each layer and see the specific third-party services on the page you’re viewing.

Analyse and itemise third-party scripts

We have “view source” for learning web development. For third parties, you need some tool to export the data. HAR files (HTTP ARchive) are JSON files that you can create from most browsers’ network request panel in dev tools. But what do you do with a .har file? The site har.tech has plenty of resources for you. That’s where Trent found the Mac app, Charles. It can open .har files. Best of all, you can export to CSV so you can share spreadsheets of the data.
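
As a rough illustration (mine, not something from Trent’s talk), a few lines of Python are enough to pull the unique third-party hostnames out of a .har file:

# Rough illustration: list the unique third-party hostnames in a HAR
# file. "Third party" here simply means any hostname other than the
# page's own. HAR files are JSON, with requests under log.entries[].request.url.
import json
from urllib.parse import urlparse


def third_party_hosts(har_path, first_party_host):
    with open(har_path) as har_file:
        har = json.load(har_file)
    hosts = {
        urlparse(entry["request"]["url"]).hostname
        for entry in har["log"]["entries"]
    }
    hosts.discard(first_party_host)
    hosts.discard(None)
    return sorted(hosts)


# e.g. third_party_hosts("example.har", "www.example.com")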

You can visualise third-party requests with Simon Hearne’s excellent Request Map. It’s quite impactful for delivering a visceral reaction in a meeting—so much more effective than just saying “hey, we have a lot of third parties.” Request Map can also export to CSV.

Know industry averages

Trent wanted to know what was “normal.” He decided to analyse HAR files for Alexa’s top 50 US websites. The result was a massive spreadsheet of third-party providers. There were 213 third-party domains (which is not even the same as the number of requests). There was an average of 22 unique third-party domains per site. The usual suspects were everywhere—Google, Amazon, Facebook, Adobe—but there were many others. You can find an alphabetical index on better.fyi/trackers. Often the lesser-known domains turn out to be owned by the bigger domains.

News sites and shopping sites have the most third-party scripts, unsurprisingly.

Understand benefits

Trent realised he needed to listen and understand why third-party scripts are being included. He found out what tag managers do. They’re funnels that allow you to cram even more third-party scripts onto your website. Trent worried that this was a Pandora’s box. The tag manager interface is easy to access and use. But he was told that it’s more like a way of organising your third-party scripts under one dashboard. Still, if you get too focused on the dashboard, you could lose sight of the impact on load times. So don’t blame the tool: it’s all about how it’s used.

Take action

Establish a centre of excellence. Put standards in place—in a cross-discipline way—to define how third-party scripts are evaluated. For example:

  1. Determine the value to the business.
  2. Avoid redundant scripts and services.
  3. Fit within the established performance budget.
  4. Comply with the organisational privacy policy.

Document those decisions, maybe even in your design system.

Also, include third-party scripts within your prototypes to get a more accurate feel for the performance implications.

On a live site, you can audit third-party scripts on a regular basis. Check to see if any are redundant or if they’re exceeding the performance budget. You can monitor performance with tools like Calibre and Speed Curve to cover the time in between audits.

Make your case

Do competitive analysis. Look at other sites in your sector. It’s a compelling way to make a case for change. WPO Stats is very handy for anecdata.

You can gather comparative data with Web Page Test: you can run a full test, and you can run a test with certain third parties blocked. Use the results to kick off a discussion about the impact of those third parties.

Talk it out

Work to maintain an ongoing discussion with the entire team. As Tim Kadlec says:

Everything should have a value, because everything has a cost.

Saturday, June 16th, 2018

Artificial Intelligence for more human interfaces | Christian Heilmann

An even-handed assessment of the benefits and dangers of machine learning.