My work shouldn’t be presented in the Smithsonian behind glass or anything, I’m just pointing at this enormous flaw in the architecture of the web itself: you’re renting servers and renting URLs. Nothing is permanent because on the web we don’t really own any space, we’re just borrowing land temporarily.
Sunday, February 28th, 2021
Thursday, October 1st, 2020
Downloading from Google Fonts
If you’re using web fonts, there are good performance (and privacy) reasons for hosting your own font files. And fortunately, Google Fonts gives you that option. There’s a “Download family” button on every specimen page.
But if you go ahead and download a font family from Google Fonts, you’ll notice something a bit odd. The .zip file only contains .ttf files. You can serve those on the web, but it’s far from the best choice. Woff2 is far leaner in file size.
This means you need to manually convert the downloaded .ttf files into .woff or .woff2 files using something like Font Squirrel’s generator. That’s fine, but I’m curious as to why this step is necessary. Why doesn’t Google Fonts provide .woff or .woff2 files in the downloaded folder? After all, if you choose to use Google Fonts as a third-party hosting service for your fonts, it most definitely serves up the appropriate file formats.
I thought maybe it was something to do with the licensing. Maybe some licenses only allow for unmodified truetype files to be distributed? But I’ve looked at fonts with different licenses—some have Apache 2 licensing, some have Open Font licensing—and they’re all quite permissive and definitely allow for modification.
Maybe the thinking is that, if you’re hosting your own font files, then you know what you’re doing and you should be able to do your own file conversion and subsetting. But I’ve come across more than one website in the wild serving up .ttf files. And who can blame them? They want to host their own font files. They downloaded those files from Google Fonts. Why shouldn’t they assume that they’re good to go?
It’s all a bit strange. If anyone knows why Google Fonts only provides .ttf files for download, please let me know. In a pinch, I will also accept rampant speculation.
Trys also pointed out some weird default behaviour if you do let Google Fonts do the hosting for you. Specifically if it’s a variable font. Let’s say it’s a font with weight as a variable axis. You specify in advance which weights you’ll be using, and then it generates separate font files to serve for each different weight.
Doesn’t that defeat the whole point of using a variable font? I mean, I can see how it could result in smaller file sizes if you’re just using one or two weights, but isn’t half the fun of having a weight axis that you can go crazy with as many weights as you want and it’s all still one font file?
Like I said, it’s all very strange.
Monday, July 27th, 2020
This is an interesting project to try to rank web hosts by performance:
Real-world server response (Time to First Byte) latencies, as experienced by real-world users navigating the web.
Sunday, December 22nd, 2019
If you add another advertisement to your pages, you generate more revenue. If you track your users better, now you can deliver tailored ads and your conversion rates are higher. If you restrict users from leaving your walled garden ecosystem, now you get all the juice from whatever attention they have.
The question is: At which point do we reach the breaking point?
And I think the answer is: We are very close.
Facebook. Twitter. Medium. All desparate to withhold content they didn’t even create until you cough up your personal details.
Monday, September 16th, 2019
Monday, August 26th, 2019
Opening up the AMP cache
I have a proposal that I think might alleviate some of the animosity around Google AMP. You can jump straight to the proposal or get some of the back story first…
The AMP format
But I cannot get behind AMP.
Instead of competing on its own merits, AMP is unfairly propped up by the search engine of its parent company, Google. That makes it very hard to evaluate whether AMP is being used on its own merits. Instead, the evidence suggests that most publishers of AMP pages are doing so because they feel they have to, rather than because they want to. That’s a real shame, because as a library of web components, AMP seems pretty good. But there’s just no way to evaluate AMP-the-format without taking into account AMP-the-ecosystem.
The AMP ecosystem
Google AMP ostensibly exists to make the web faster. Initially the focus was specifically on mobile performance, but that distinction has since fallen by the wayside. The idea is that by using AMP’s web components, your pages will be speedy. Though, as Andy Davies points out, this isn’t always the case:
This is where I get confused… https://independent.co.uk only have an AMP site yet it’s performance is awful from a user perspective - isn’t AMP supposed to prevent this?
According to Google’s own Page Speed Insights audit (which Google recommends to check your performance), the AMP version of articles got an average performance score of 87. The non-AMP versions? 95.
Publishers who already have fast web pages—like The Guardian—are still compelled to make AMP versions of their stories because of the search benefits reserved for AMP. As Terence Eden reported from a meeting of the AMP advisory committee:
We heard, several times, that publishers don’t like AMP. They feel forced to use it because otherwise they don’t get into Google’s news carousel — right at the top of the search results.
Some people felt aggrieved that all the hard work they’d done to speed up their sites was for nothing.
The Google AMP team are at pains to point out that AMP is not a ranking factor in search. That’s true. But it is unfairly privileged in other ways. Only AMP pages can appear in the Top Stories carousel …which appears above any other search results. As I’ve said before:
Now, if you were to ask any right-thinking person whether they think having their page appear right at the top of a list of search results would be considered preferential treatment, I think they would say hell, yes! This is the only reason why The Guardian, for instance, even have AMP versions of their content—it’s not for the performance benefits (their non-AMP pages are faster); it’s for that prime real estate in the carousel.
Content that “opts in” to AMP and the associated hosting within Google’s domain is granted preferential search promotion, including (for news articles) a position above all other results.
That’s not the only way that AMP pages get preferential treatment. It turns out that the secret to the speed of AMP pages isn’t the web components. It’s the prerendering.
The AMP cache
If you’ve ever seen an AMP page in a list of search results, you’ll have noticed the little lightning icon. If you’ve ever tapped on that search result, you’ll have noticed that the page loads blazingly fast!
That’s not down to AMP-the-format, alas. That’s down to the fact that the page has been prerendered by Google before you even went to it. If any page were prerendered that way, it would load blazingly fast. But currently, this privilege is reserved for AMP pages only.
If, after tapping through to that AMP page, you looked at the address bar of your browser, you might have noticed something odd. Even though you might have thought you were visiting The Washington Post, or The New York Times, the URL of the (blazingly fast) page you’re looking at is still under Google’s domain. That’s because Google hosts any AMP pages that it prerenders.
Google calls this “the AMP cache”, but it would be better described as “AMP hosting”. The web page sent down the wire is hosted on Google’s domain.
Here’s that AMP letter again:
When a user navigates from Google to a piece of content Google has recommended, they are, unwittingly, remaining within Google’s ecosystem.
Through gritted teeth, I will refer to this as “the AMP cache”, because that’s what everyone else calls it. But make no mistake, Google is hosting—not caching—these pages.
But why host the pages on a Google domain? Why not prerender the original URLs?
Prerendering and privacy
The pitch I think site owners are hearing is: let us host your pages on our domain and we’ll promote them in search results AND preload them so they feel “instant.” To opt-in, build pages using this component syntax.
But perhaps we could de-couple the AMP format from the AMP cache.
That’s what Terence suggests:
My recommendation is that Google stop requiring that organisations use Google’s proprietary mark-up in order to benefit from Google’s promotion.
Instead of granting premium placement in search results only to AMP, provide the same perks to all pages that meet an objective, neutral performance criterion such as Speed Index.
It’s been said before but it would be so good for the web if pages with a Lighthouse score over say, 90 could get into that top search result area, even if they’re not built using Google’s AMP framework. Feels wrong to have to rebuild/reproduce an already-fast site just for SEO.
Here’s the problem…
Let’s say Google do indeed prerender already-fast pages when they’re listed in search results. You, a search user, type something into Google. A list of results come back. Google begins pre-rendering some of them. But you don’t end up clicking through to those pages. Nonetheless, the servers those pages are hosted on have received a GET request coming from a Google search. Those publishers now know that a particular (cookied?) user could have clicked through to their site. That’s very different from knowing when someone has actually arrived at a particular site.
And that’s why Google host all the AMP pages that they prerender. Given the privacy implications of prerendering non-Google URLs, I must admit that I see their point.
Still, it’s a real shame to miss out on the speed benefit of prerendering:
Prerendering AMP documents leads to substantial improvements in page load times. Page load time can be measured in different ways, but they consistently show that prerendering lets users see the content they want faster. For now, only AMP can provide the privacy preserving prerendering needed for this speed benefit.
A modest proposal
Why is Google’s AMP cache just for AMP pages? (Y’know, apart from the obvious answer that it’s in the name.)
What if Google were allowed to host non-AMP pages? Google search could then prerender those pages just like it currently does for AMP pages. There would be no privacy leaks; everything would happen on the same domain—google.com or ampproject.org or whatever—just as currently happens with AMP pages.
Don’t get me wrong: I’m not suggesting that Google should make a 1:1 model of the web just to prerender search results. I think that the implementation would need to have two important requirements:
- Hosting needs to be opt-in.
- Only fast pages should be prerendered.
This could be a
meta element. Maybe something like:
<meta name="caches-allowed" content="google">
This would have the nice benefit of allowing comma-separated values:
<meta name="caches-allowed" content="google, yandex">
(The name is just a strawman, by the way—I’m not suggesting that this is what the final implementation would actually look like.)
If not a
meta element, then perhaps this could be part of
robots.txt? Although my feeling is that this needs to happen on a document-by-document basis rather than site-wide.
Many people will, quite rightly, never want Google—or anyone else—to host and serve up their content. That’s why it’s so important that this behaviour needs to be opt-in. It’s kind of appalling that the current hosting of AMP pages is opt-in-by-proxy-sort-of.
Criteria for prerendering
Which pages should be blessed with hosting and prerendering? The fast ones. That’s sorta the whole point of AMP. But right now, there’s a lot of resentment by people with already-fast websites who quite rightly feel they shouldn’t have to use the AMP format to benefit from the AMP ecosystem.
Page speed is already a ranking factor. It doesn’t seem like too much of a stretch to extend its benefits to hosting and prerendering. As mentioned above, there are already a few possible metrics to use:
- Page Speed Index
- Web Page Test
Ah, but what if a page has good score when it’s indexed, but then gets worse afterwards? Not a problem! The version of the page that’s measured is the same version of the page that gets hosted and prerendered. Google can confidently say “This page is fast!” After all, they’re the ones serving up the page.
That does raise the question of how often Google should check back with the original URL to see if it has changed/worsened/improved. The answer to that question is however long it currently takes to check back in on AMP pages:
Each time a user accesses AMP content from the cache, the content is automatically updated, and the updated version is served to the next user once the content has been cached.
This proposal does not solve the problem with the address bar. You’d still find yourself looking at a page from The Washington Post or The New York Times (or adactio.com) but seeing a completely different URL in your browser. That’s not good, for all the reasons outlined in the AMP letter.
In fact, this proposal could potentially make the situation worse. It would allow even more sites to be impersonated by Google’s URLs. Where currently only AMP pages are bad actors in terms of URL confusion, opening up the AMP cache would allow equal opportunity URL confusion.
What I’m suggesting is definitely not a long-term solution. The long-term solutions currently being investigated are technically tricky and will take quite a while to come to fruition—web packages and signed exchanges. In the meantime, what I’m proposing is a stopgap solution that’s technically a lot simpler. But it won’t solve all the problems with AMP.
This proposal solves one problem—AMP pages being unfairly privileged in search results—but does nothing to solve the other, perhaps more serious problem: the erosion of site identity.
Currently, Google can assess whether a page should be hosted and prerendered by checking to see if it’s a valid AMP page. That test would need to be widened to include a different measurement of performance, but those measurements already exist.
I can see how this assessment might not be as quick as checking for AMP validity. That might affect whether non-AMP pages could be measured quickly enough to end up in the Top Stories carousel, which is, by its nature, time-sensitive. But search results are not necessarily as time-sensitive. Let’s start there.
Currently, AMP pages can be prerendered without fetching anything other than the markup of the AMP page itself. All the CSS is inline. There are no initial requests for other kinds of content like images. That’s because there are no
img elements on the page: authors must use
amp-img instead. The image itself isn’t loaded until the user is on the page.
If the AMP cache were to be opened up to non-AMP pages, then any content required for prerendering would also need to be hosted on that same domain. Otherwise, there’s privacy leakage.
This definitely introduces an extra level of complexity. Paths to assets within the markup might need to be re-written to point to the Google-hosted equivalents. There would almost certainly need to be a limit on the number of assets allowed. Though, for performance, that’s no bad thing.
Make no mistake, figuring out what to do about assets—style sheets, scripts, and images—is very challenging indeed. Luckily, there are very smart people on the Google AMP team. If that brainpower were to focus on this problem, I am confident they could solve it.
- Prerendering of non-Google URLs is problematic for privacy reasons, so Google needs to be able to host pages in order to prerender them.
- Currently, that’s only done for pages using the AMP format.
- The AMP cache—and with it, prerendering—should be decoupled from the AMP format, and opened up to other fast web pages.
There will be technical challenges, but hopefully nothing insurmountable.
I honestly can’t see what Google have to lose here. If their goal is genuinely to reward fast pages, then opening up their AMP cache to fast non-AMP pages will actively encourage people to make fast web pages (without having to switch over to the AMP format).
I’ve deliberately kept the details vague—what the opt-in should look like; what the speed measurement should be; how to handle assets—I’m sure smarter folks than me can figure that stuff out.
I would really like to know what other people think about this proposal. Obviously, I’d love to hear from members of the Google AMP team. But I’d also love to hear from publishers. And I’d very much like to know what people in the web performance community think about this. (Write a blog post and send me a webmention.)
What am I missing here? What haven’t I thought of? What are the potential pitfalls (and are they any worse than the current acrimonious situation with Google AMP)?
I would really love it if someone with a fast website were in a position to say, “Hey Google, I’m giving you permission to host this page so that it can be prerendered.”
I would really love it if someone with a slow website could say, “Oh, shit! We’d better make our existing website faster or Google won’t host our pages for prerendering.”
And I would dearly love to finally be able to embrace AMP-the-format with a clear conscience. But as long as prerendering is joined at the hip to the AMP format, the injustice of the situation only harms the AMP project.
Google, open up the AMP cache.
Thursday, June 6th, 2019
Chris makes the very good point that the J in JAMstack isn’t nearly as important as the static hosting part.
This is my maj.
Monday, June 3rd, 2019
Trust no one! Harry enumerates the reason why you should be self-hosting your assets (and busts some myths along the way).
There really is very little reason to leave your static assets on anyone else’s infrastructure. The perceived benefits are often a myth, and even if they weren’t, the trade-offs simply aren’t worth it. Loading assets from multiple origins is demonstrably slower.
Sunday, November 25th, 2018
Okay, I knew about the Python shortcut—I mentioned it in Going Offline—but I had no idea it was so easy to do the same thing for PHP. This is a bit of a revelation for me!
Once in the desired directory, run:
php -S localhost:2222
Now you can go to “localhost:2222” in your browser, and if you have an index.html or .php file in your root directory, you’re in business.
Saturday, July 28th, 2018
Anchor seems to be going for the YouTube model. They want a huge number of people to use their platform. But the concentration of so much media in one place is one of the problems with today’s web. Massive social networks like Facebook, Instagram, and YouTube have too much power over writers, photographers, and video creators. We do not want that for podcasts.
Sunday, October 29th, 2017
The meaning of AMP
But when I hear AMP described as an open, community-led project, it strikes me as incredibly problematic, and more than a little troubling. AMP is, I think, best described as nominally open-source. It’s a corporate-led product initiative built with, and distributed on, open web technologies.
But so what, right? Tom-ay-to, tom-a-to. Well, here’s a pernicious example of where it matters: in a recent announcement of their intent to ship a new addition to HTML, the Google Chrome team cited the mood of the web development community thusly:
Web developers: Positive (AMP team indicated desire to start using the attribute)
If AMP were actually the product of working web developers, this justification would make sense. As it is, we’ve got one team at Google citing the preference of another team at Google but representing it as the will of the people.
This is just one example of AMP’s sneaky marketing where some finely-shaved semantics allows them to appear far more reasonable than they actually are.
At AMP Conf, the Google Search team were at pains to repeat over and over that AMP pages wouldn’t get any preferential treatment in search results …but they appear in a carousel above the search results. Now, if you were to ask any right-thinking person whether they think having their page appear right at the top of a list of search results would be considered preferential treatment, I think they would say hell, yes! This is the only reason why The Guardian, for instance, even have AMP versions of their content—it’s not for the performance benefits (their non-AMP pages are faster); it’s for that prime real estate in the carousel.
The same semantic nit-picking can be found in their defence of caching. See, they’ve even got me calling it caching! It’s hosting. If I click on a search result, and I am taken to page that has a URL beginning with
https://www.google.com/amp/s/... then that page is being hosted on the domain
google.com. That is literally what hosting means. Now, you might argue that the original version was hosted on a different domain, but the version that the user gets sent to is the Google copy. You can call it caching if you like, but you can’t tell me that Google aren’t hosting AMP pages.
That’s a particularly low blow, because it’s such a bait’n’switch. One of the reasons why AMP first appeared to be different to Facebook Instant Articles or Apple News was the promise that you could host your AMP pages yourself. That’s the very reason I first got interested in AMP. But if you actually want the benefits of AMP—appearing in the not-search-results carousel, pre-rendered performance, etc.—then your pages must be hosted by Google.
So, to summarise, here are three statements that Google’s AMP team are currently peddling as being true:
- AMP is a community project, not a Google project.
- AMP pages don’t receive preferential treatment in search results.
- AMP pages are hosted on your own domain.
I don’t think those statements are even truthy, much less true. In fact, if I were looking for the right term to semantically describe any one of those statements, the closest in meaning would be this:
A statement used intentionally for the purpose of deception.
That is the dictionary definition of a lie.
Update: That last part was a bit much. Sorry about that. I know it’s a bit much because The Register got all gloaty about it.
I don’t think the developers working on the AMP format are intentionally deceptive (although they are engaging in some impressive cognitive gymnastics). The AMP ecosystem, on the other hand, that’s another story—the preferential treatment of Google-hosted AMP pages in the carousel and in search results; that’s messed up.
Still, I would do well to remember that there are well-meaning people working on even the fishiest of projects.
Except for the people working at the shitrag that is The Register.
(The other strong signal that I overstepped the bounds of decency was that this post attracted the pond scum of Hacker News. That’s another place where the “well-meaning people work on even the fishiest of projects” rule definitely doesn’t apply.)
Thursday, September 21st, 2017
Well, I guess it’s time to change all my locally-hosted sites from
.dev domains to
.test. Thanks, Google.
Friday, July 28th, 2017
Hadley points to the serious security concerns with AMP:
Fundamentally, we think that it’s crucial to the web ecosystem for you to understand where content comes from and for the browser to protect you from harm. We are seriously concerned about publication strategies that undermine them.
The anchor element is designed to allow one website to refer visitors to content on another website, whilst retaining all the features of the web platform. We encourage distribution platforms to use this mechanism where appropriate. We encourage the loading of pages from original source origins, rather than re-hosted, non-canonical locations.
That last sentence there? That’s what I’m talking about!
Thursday, July 20th, 2017
It’s all very admirable, but it also feels a little bit 927.
Sunday, May 14th, 2017
This looks like a useful tool, not just for testing locally-hosted sites (say, at a device lab), but also for making locally-hosted sites run on HTTPS so you can test service workers.
Monday, March 20th, 2017
🎵 If there’s something strange, in your neighbourhood’s architectural renders, who you gonna call? 🎵
(I ain’t ‘fraid of no render ghost.)
Tuesday, January 5th, 2016
If you’re planning the move to TLS and your server is on Digital Ocean running Nginx, Graham’s here to run you through the (surprisingly simple) process.
Monday, October 5th, 2015
Just when I think that I don’t get the point of Medium, along comes Dan to show me the light. This thought-provoking thinkpiece isn’t quite on the same level of his seminal groundbreaking kittens work, but I guarantee it will stay with you.
Friday, March 6th, 2015
Sorting out hosting is a big stumbling block for people who want to go down the Indie Web route. Frankly it’s much easier to just use a third-party silo like Facebook or Twitter. I’ve been saying for a while now that I’d really like to see “concierge” services for hosting—”here, you take care of all this hassle!”
Well, this initiative looks like exactly that.
Monday, November 17th, 2014
Aaron raises a point that I’ve discussed before in regards to the indie web (and indeed, the web in general): we don’t buy domain names; we rent them.
It strikes me that all the good things about the web are decentralised (one-way linking, no central authority required to add a node), but all the sticking points are centralised: ICANN, DNS.
Aaron also points out that we are beholden to our hosting companies, although—having moved hosts a number of times myself—that’s an issue that DNS (and URLs in general) helps alleviate. And there’s now some interesting work going on in literally owning your own website: a web server in the home.