Tags: cbe

6

sparkline

Saturday, September 21st, 2019

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retreive the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<h1>Offline</h1>
<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">
</div>

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

All async functions have to have a name—no anonymous async functions allowed. I’m calling this one listPages, just like Remy is doing. I’m making the listPages function execute immediately:

(async function listPages() {
...
})();

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {
...
}

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name') {
...
}

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = new Object;

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.

browsingHistory.push(data);

My if statement and my for/in loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = new Object;
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
    browsingHistory.push(data);
  }
}

That’s the data collection part of the code. Now I’m going to take all that yummy information an output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {
...
}

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

<script>
(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = new Object;
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
      browsingHistory.push(data);
    }
  }
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    });
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
    });
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);
  }
})();
</script>

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Sunday, March 17th, 2019

Checked in at Jolly Brewer. St. Patrick’s Day session — with Jessica map

Checked in at Jolly Brewer. St. Patrick’s Day session — with Jessica

Sunday, June 17th, 2018

Detecting image requests in service workers

In Going Offline, I dive into the many different ways you can use a service worker to handle requests. You can filter by the URL, for example; treating requests for pages under /blog or /articles differently from other requests. Or you can filter by file type. That way, you can treat requests for, say, images very differently to requests for HTML pages.

One of the ways to check what kind of request you’re dealing with is to see what’s in the accept header. Here’s how I show the test for HTML pages:

if (request.headers.get('Accept').includes('text/html')) {
    // Handle your page requests here.
}

So, logically enough, I show the same technique for detecting image requests:

if (request.headers.get('Accept').includes('image')) {
    // Handle your image requests here.
}

That should catch any files that have image in the request’s accept header, like image/png or image/jpeg or image/svg+xml and so on.

But there’s a problem. Both Safari and Firefox now use a much broader accept header: */*

My if statement evaluates to false in those browsers. Sebastian Eberlein wrote about his workaround for this issue, which involves looking at file extensions instead:

if (request.url.match(/\.(jpe?g|png|gif|svg)$/)) {
    // Handle your image requests here.
}

So consider this post a patch for chapter five of Going Offline (page 68 specifically). Wherever you see:

if (request.headers.get('Accept').includes('image'))

Swap it out for:

if (request.url.match(/\.(jpe?g|png|gif|svg)$/))

And feel to add any other image file extensions (like webp) in there too.

Monday, February 26th, 2018

Ends and means

The latest edition of the excellent History Of The Web newsletter is called The Day(s) The Web Fought Back. It recounts the first time that websites stood up against bad legislation in the form of the Communications Decency Act (CDA), and goes to recount the even more effective use of blackout protests against SOPA and PIPA.

I remember feeling very heartened to see WikiPedia, Google and others take a stand on January 18th, 2012. But I also remember feeling uneasy. In this particular case, companies were lobbying for a cause I agreed with. But what if they were lobbying for a cause I didn’t agree with? Large corporations using their power to influence politics seems like a very bad idea. Isn’t it still a bad idea, even if I happen to agree with the cause?

Cloudflare quite rightly kicked The Daily Stormer off their roster of customers. Then the CEO of Cloudflare quite rightly wrote this in a company-wide memo:

Literally, I woke up in a bad mood and decided someone shouldn’t be allowed on the Internet. No one should have that power.

There’s an uncomfortable tension here. When do the ends justify the means? Isn’t the whole point of having principles that they hold true even in the direst circumstances? Why even claim that corporations shouldn’t influence politics if you’re going to make an exception for net neutrality? Why even claim that free speech is sacrosanct if you make an exception for nazi scum?

Those two examples are pretty extreme and I can easily justify the exceptions to myself. Net neutrality is too important. Stopping fascism is too important. But where do I draw the line? At what point does something become “too important?”

There are more subtle examples of corporations wielding their power. Google are constantly using their monopoly position in search and browser marketshare to exert influence over website-builders. In theory, that’s bad. But in practice, I find myself agreeing with specific instances. Prioritising mobile-friendly sites? Sounds good to me. Penalising intrusive ads? Again, that seems okey-dokey to me. But surely that’s not the point. So what if I happen to agree with the ends being pursued? The fact that a company the size and power of Google is using their monopoly for any influence is worrying, regardless of whether I agree with the specific instances. But I kept my mouth shut.

Now I see Google abusing their monopoly again, this time with AMP. They may call the preferential treatment of Google-hosted AMP-formatted pages a “carrot”, but let’s be honest, it’s an abuse of power, plain and simple.

By the way, I have no doubt that the engineers working on AMP have the best of intentions. We are all pursuing the same ends. We all want a faster web. But we disagree on the means. If Google search results gave preferential treatment to any fast web pages, that would be fine. But by only giving preferential treatment to pages written in a format that they created, and hosted on their own servers, they are effectively forcing everyone to use AMP. I know for a fact that there are plenty of publications who are producing AMP content, not because they are sold on the benefits of the technology, but because they feel strong-armed into doing it in order to compete.

If the ends justify the means, then it’s easy to write off Google’s abuse of power. Those well-intentioned AMP engineers honestly think that they have the best interests of the web at heart:

We were worried about the web not existing anymore due to native apps and walled gardens killing it off. We wanted to make the web competitive. We saw a sense of urgency and thus we decided to build on the extensible web to build AMP instead of waiting for standard and browsers and websites to catch up. I stand behind this process. I’m a practical guy.

There’s real hubris and audacity in thinking that one company should be able to tackle fixing the whole web. I think the AMP team are genuinely upset and hurt that people aren’t cheering them on. Perhaps they will dismiss the criticisms as outpourings of “Why wasn’t I consulted?” But that would be a mistake. The many thoughtful people who are extremely critical of AMP are on the same side as the AMP team when it comes the end-goal of better, faster websites. But burning the web to save it? No thanks.

Ben Thompson goes into more detail on the tension between the ends and the means in The Aggregator Paradox:

The problem with Google’s actions should be obvious: the company is leveraging its monopoly in search to push the AMP format, and the company is leveraging its dominant position in browsers to punish sites with bad ads. That seems bad!

And yet, from a user perspective, the options I presented at the beginning — fast loading web pages with responsive designs that look great on mobile and the elimination of pop-up ads, ad overlays, and autoplaying videos with sounds — sounds pretty appealing!

From that perspective, there’s a moral argument to be made for wielding monopoly power like Google is doing. No doubt the AMP team feel it would be morally wrong for Google not to use its influence in search to give preferential treatment to AMP pages.

Going back to the opening examples of online blackouts, was it morally wrong for companies to use their power to influence politics? Or would it have been morally wrong for them not to have used their influence?

When do the ends justify the means?

Here’s a more subtle example than Google AMP, but one which has me just as worried for the future of the web. Mozilla announced that any new web features they add to their browser will require HTTPS.

The end-goal here is one I agree with: HTTPS everywhere. On the face of it, the means of reaching that goal seem reasonable. After all, we already require HTTPS for sensitive JavaScript APIs like geolocation or service workers. But the devil is in the details:

Effective immediately, all new features that are web-exposed are to be restricted to secure contexts. Web-exposed means that the feature is observable from a web page or server, whether through JavaScript, CSS, HTTP, media formats, etc. A feature can be anything from an extension of an existing IDL-defined object, a new CSS property, a new HTTP response header, to bigger features such as WebVR.

Emphasis mine.

This is a step too far. Again, I am in total agreement that we should be encouraging everyone to switch to HTTPS. But requiring HTTPS in order to use CSS? The ends don’t justify the means.

If there were valid security reasons for making HTTPS a requirement, I would be all for enforcing this. But these are two totally separate areas. Enforcing HTTPS by withholding CSS support is no different to enforcing AMP by withholding search placement. In some ways, I think it might actually be worse.

There’s an assumption in this decision that websites are being made by professionals who will know how to switch to HTTPS. But the web is for everyone. Not just for everyone to use. It’s for everyone to build.

One of my greatest fears for the web is that building it becomes the domain of a professional priesthood. Anything that raises the bar to writing some HTML or CSS makes me very worried. Usually it’s toolchains that make things more complex, but in this case the barrier to entry is being brought right into the browser itself.

I’m trying to imagine future Codebar evenings, helping people to make their first websites, but now having to tell them that some CSS will be off-limits until they meet the entry requirements of HTTPS …even though CSS and HTTPS have literally nothing to do with one another. (And yes, there will be an exception for localhost and I really hope there’ll be an exception for file: as well, but that’s simply postponing the disappointment.)

No doubt Mozilla (and the W3C Technical Architecture Group) believe that they are doing the right thing. Perhaps they think it would be morally wrong if browsers didn’t enforce HTTPS even for unrelated features like new CSS properties. They believe that, in this particular case, the ends justify the means.

I strongly disagree. If you also disagree, I encourage you to make your voice heard. Remember, this isn’t about whether you think that we should all switch to HTTPS—we’re all in agreement on that. This is about whether it’s okay to create collateral damage by deliberately denying people access to web features in order to further a completely separate agenda.

This isn’t about you or me. This is about all those people who could potentially become makers of the web. We should be welcoming them, not creating barriers for them to overcome.

Saturday, September 16th, 2017

map

Checked in at Terminal 1. with Jessica

Sunday, January 1st, 2006

Honour goes to Apple gadget guru

Jonathan Ive is getting a CBE.