I’m not down with Google swallowing everything posted on the internet to train their generative AI models.
Wednesday, July 12th, 2023
This would mean a lot more if it happened before the wholesale harvesting of everyone’s work.
But I’m sure Google will put a mighty fine lock on that stable door that the horse bolted from.
Tuesday, July 11th, 2023
Back when the web was young, it wasn’t yet clear what the rules were. Like, could you really just link to something without asking permission?
Then came some legal rulings to establish that, yes, on the web you can just link to anything without checking if it’s okay first.
What about search engines and directories? Technically they’re rifling through all the stuff we publish and reposting snippets of it. Is that okay?
Again, through some legal precedents—but mostly common agreement—everyone decided that on balance it was fine. After all, those snippets they publish are helping your site get traffic.
In short order, search came to rule the web. And Google came to rule search.
The mutually beneficial arrangement persisted uneasily. Despite Google’s search results pages getting worse and worse in recent years, the company’s huge market share of search means you generally want to be in their good books.
Google’s business model relies on us publishing web pages so that they can put ads around the search results linking to that content, and we rely on Google to send people to our websites by responding smartly to search queries.
That has now changed. Instead of responding to search queries by linking to the web pages we’ve made, Google is instead generating dodgy summaries rife with hallucina… lies (a psychic hotline, basically).
Google still benefits from us publishing web pages. We no longer benefit from Google slurping up those web pages.
Google has steadily been manoeuvring their search engine results to more and more replace the pages in the results.
Me, I just think it’s fuckin’ rude.
Google is a portal to the web. Google is an amazing tool for finding relevant websites to go to. That was useful when it was made, and its usefulness has only grown. Google should be encouraging and fighting for the open web. But now they’re like, actually we’re just going to suck up your website, put it in a blender with all other websites, and spit out word smoothies for people instead of sending them to your website.
Robots.txt needs an update for the 2020s. Instead of just saying what content can be indexed, it should also grant rights.
Like crawl my site only to provide search results not train your LLM.
It’s a solid proposal. But Google has absolutely no incentive to implement it. They hold all the power.
Or do they?
There is still the nuclear option:

User-agent: Googlebot
Disallow: /
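A less scorched-earth variant, assuming the AI crawlers actually honour robots.txt and identify themselves with their own user agents (the agent names below are examples of bots that have announced themselves as such, not an exhaustive or guaranteed list), would be to block only the training bots while leaving ordinary search crawling alone:

```
# Block crawlers known to gather training data for generative AI
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Still allow regular search indexing
User-agent: Googlebot
Allow: /
```

Of course, this only works to the extent that a crawler both declares itself honestly and respects the file — robots.txt is a polite request, not an enforcement mechanism.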
That’s what Vasilis is doing:
I have been looking for ways to not allow companies to use my stuff without asking, and so far I couldn’t find any. But since this policy change I realised that there is a simple one: block google’s bots from visiting your website.
The general consensus is that this is nuts. “If you don’t appear in Google’s results, you might as well not be on the web!” is the common cry.
I’m not so sure. At least when it comes to personal websites, search isn’t how people get to your site. They get to your site from RSS, newsletters, links shared on social media or on Slack.
And isn’t it an uncomfortable feeling to think that there’s a third party service that you absolutely must appease? It’s the same kind of justification used by people who are still on Twitter even though it’s now a right-wing transphobic cesspit. “If I’m not on Twitter, I might as well not be on the web!”
The situation with Google reminds me of what Robin said about Twitter:
The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.
We can rescind our invitation to Google.
Wednesday, June 26th, 2019
1,841 instances of dark patterns on ecommerce sites, in the categories of sneaking, urgency, misdirection, social proof, scarcity, obstruction, and forced action. You can browse this overview, read the paper, or look at the raw data.
We conducted a large-scale study, analyzing ~53K product pages from ~11K shopping websites to characterize and quantify the prevalence of dark patterns.
Friday, November 23rd, 2018
As it turns out, some sites are much harder to archive than others. This article goes through the process of archiving traditional web sites and shows how it falls short when confronted with the latest fashions in the single-page applications that are bloating the modern web.