Tags: format

312

sparkline

Saturday, April 25th, 2020

Reading

At the beginning of the year, Remy wrote about extracting Goodreads metadata so he could create his end-of-year reading list. More recently, Mark Llobrera wrote about how he created a visualisation of his reading history. In his case, he’s using JSON to store the information.

This kind of JSON storage is exactly what Tom Critchlow proposes in his post, Library JSON - A Proposal for a Decentralized Goodreads:

Thinking through building some kind of “web of books” I realized that we could use something similar to RSS to build a kind of decentralized GoodReads powered by indie sites and an underlying easy to parse format.

His proposal looks kind of similar to what Mark came up with. There’s a title, an author, an image, and some kind of date for when you started and/or finished reading the book.

Matt then points out that RSS gets close to the data format being suggested and asks how about using RSS?:

Rather than inventing a new format, my suggestion is that this is RSS plus an extension to deal with books. This is analogous to how the podcast feeds are specified: they are RSS plus custom tags.

Like Matt, I’m in favour of re-using existing wheels rather than inventing new ones, mostly to avoid a 927 situation.

But all of these proposals—whether JSON or RSS—involve the creation of a separate file, and yet the information is originally published in HTML. Along the lines of Matt’s idea, I could imagine extending the h-entry collection of class names to allow for books (or films, or other media). It already handles images (with u-photo). I think the missing fields are the date-related ones: when you start and finish reading. Those fields are present in a different microformat, h-event in the form of dt-start and dt-end. Maybe they could be combined:


<article class="h-entry h-event h-review">
<h1 class="p-name p-item">Book title</h1>
<img class="u-photo" src="image.jpg" alt="Book cover.">
<p class="p-summary h-card">Book author</p>
<time class="dt-start" datetime="YYYY-MM-DD">Start date</time>
<time class="dt-end" datetime="YYYY-MM-DD">End date</time>
<div class="e-content">Remarks</div>
<data class="p-rating" value="5">★★★★★</data>
<time class="dt-published" datetime="YYYY-MM-DDThh:mm">Date of this post</time>
</article>

That markup is simultaneously a post (h-entry) and an event (h-event) and you can even throw in h-card for the book author (as well as h-review if you like to rate the books you read). It can be converted to RSS and also converted to .ics for calendars—those parsers are already out there. It’s ready for aggregation and it’s ready for visualisation.

I publish very minimal reading posts here on adactio.com. What little data is there isn’t very structured—I don’t even separate the book title from the author. But maybe I’ll have a little play around with turning these h-entries into combined h-entry/event posts.

Thursday, April 23rd, 2020

Better Image Optimization by Restricting the Color Index – Eric’s Archived Thoughts

A great little mini case-study from Eric—if you’re exporting transparent PNGs from a graphic design tool, double-check the colour-depth settings!

I’d been saving the PNGs with no bit depth restrictions, meaning the color table was holding space for 224 colors. That’s… a lot of colors, roughly 224 of which I wasn’t actually using.

Wednesday, April 8th, 2020

The Cuneiform Tablets of 2015 [PDF]

A 2015 paper by Long Tien Nguyen and Alan Kay with a proposal for digital preservation.

We discuss the problem of running today’s software decades,centuries, or even millennia into the future.

Friday, March 13th, 2020

Rise of the Digital Fonts

A history of typesetting from movable type to variable fonts.

Thursday, February 20th, 2020

“Let us Calculate!”: Leibniz, Llull, and the Computational Imagination – The Public Domain Review

The characteristica universalis and the calculus racionator of Leibniz.

Emma Willard’s Maps of Time

The beautiful 19th century data visualisations of Emma Willard unfold in this immersive piece by Susan Schulten.

Wednesday, January 22nd, 2020

28c3: The Science of Insecurity - YouTube

I understand less than half of this great talk by Meredith L. Patterson, but it ticks all my boxes: Leibniz, Turing, Borges, and Postel’s Law.

(via Tim Berners-Lee)

28c3: The Science of Insecurity

Thursday, November 7th, 2019

Information mesh

Timelines of people, interfaces, technologies and more:

30 years of facts about the World Wide Web.

Tuesday, October 22nd, 2019

Saturday, September 21st, 2019

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retreive the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<h1>Offline</h1>
<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">
</div>

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

All async functions have to have a name—no anonymous async functions allowed. I’m calling this one listPages, just like Remy is doing. I’m making the listPages function execute immediately:

(async function listPages() {
...
})();

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {
...
}

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name') {
...
}

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = new Object;

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.

browsingHistory.push(data);

My if statement and my for/in loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = new Object;
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
    browsingHistory.push(data);
  }
}

That’s the data collection part of the code. Now I’m going to take all that yummy information an output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {
...
}

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

<script>
(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = new Object;
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
      browsingHistory.push(data);
    }
  }
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    });
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
    });
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);
  }
})();
</script>

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Friday, May 24th, 2019

The Bit Player

Ooh! A documentary on Claude Shannon—exciting!

I just finished reading A Mind At Play, the (very good) biography of Claude Shannon, so this film feels very timely.

Mixing contemporary interviews, archival film, animation and dialogue drawn from interviews conducted with Shannon himself, The Bit Player tells the story of an overlooked genius who revolutionized the world, but never lost his childlike curiosity.

Tuesday, May 7th, 2019

Unraveling The JPEG

A deep, deep, deep dive into the JPEG format. Best of all, it’s got interactive explanations you can tinker with, a la Nicky Case or Bret Victor.

Tuesday, April 30th, 2019

Progressive Font Enrichment: reinventing web font performance | Responsive Web Typography

Jason describes the next big thing in web typography: streaming fonts!

…to enable the ability for only the required part of the font be downloaded on any given page, and for subsequent requests for that font to dynamically ‘patch’ the original download with additional sets of glyphs as required on successive page views—even if they occur on separate sites.

Sunday, April 28th, 2019

Norbert Wiener’s Human Use of Human Beings is more relevant than ever.

What would Wiener think of the current human use of human beings? He would be amazed by the power of computers and the internet. He would be happy that the early neural nets in which he played a role have spawned powerful deep-learning systems that exhibit the perceptual ability he demanded of them—although he might not be impressed that one of the most prominent examples of such computerized Gestalt is the ability to recognize photos of kittens on the World Wide Web.

Friday, March 22nd, 2019

Gutenberg and the Internet

Steven Pemberton’s presentation on the printing press, the internet, Moore’s Law, and exponential growth.

Friday, January 11th, 2019

Four Cool URLs - Alex Pounds’ Blog

A fellow URL fetishest!

I love me a well-designed URL scheme—here’s four interesting approaches.

URLs are consumed by machines, but they should be designed for humans. If your URL thinking stops at “uniquely identifies a page” and “good for SEO”, you’re missing out.

Monday, December 3rd, 2018

Voxxed Thessaloniki 2018 - Opening Keynote - Taking Back The Web - YouTube

Here’s the talk I gave recently about indie web building blocks.

There’s fifteen minutes of Q&A starting around the 35 minute mark. People asked some great questions!

Thursday, November 22nd, 2018

Great Leap Years - Official site of Stephen Fry

I just binge-listened to the six episodes of the first season of this podcast from Stephen Fry—it’s excellent!

It covers the history of communication from the emergence of language to the modern day. At first I was worried that it was going to rehash some of the more questionable ideas in the risible Sapiens, but it turned out to be far more like James Gleick’s The Information or Tom Standage’s The Victorian Internet (two of my favourite books on the history of technology).

There’s no annoying sponsorship interruptions and the whole series feels more like an audiobook than a podcast—an audiobook researched, written and read by Stephen Fry!

Tuesday, November 13th, 2018

Optimise without a face

I’ve been playing around with the newly-released Squoosh, the spiritual successor to Jake’s SVGOMG. You can drag images into the browser window, and eyeball the changes that any optimisations might make.

On a project that Cassie is working on, it worked really well for optimising some JPEGs. But there were a few images that would require a bit more fine-grained control of the optimisations. Specifically, pictures with human faces in them.

I’ve written about this before. If there’s a human face in image, I open that image in a graphics editing tool like Photoshop, select everything but the face, and add a bit of blur. Because humans are hard-wired to focus on faces, we’ll notice any jaggy artifacts on a face, but we’re far less likely to notice jagginess in background imagery: walls, materials, clothing, etc.

On the face of it (hah!), a browser-based tool like Squoosh wouldn’t be able to optimise for faces, but then Cassie pointed out something really interesting…

When we were both at FFConf on Friday, there was a great talk by Eleanor Haproff on machine learning with JavaScript. It turns out there are plenty of smart toolkits out there, and one of them is facial recognition. So I wonder if it’s possible to build an in-browser tool with this workflow:

  • Drag or upload an image into the browser window,
  • A facial recognition algorithm finds any faces in the image,
  • Those portions of the image remain crisp,
  • The rest of the image gets a slight blur,
  • Download the optimised image.

Maybe the selecting/blurring part would need canvas? I don’t know.

Anyway, I thought this was a brilliant bit of synthesis from Cassie, and now I’ve got two questions:

  1. Does this exist yet? And, if not,
  2. Does anyone want to try building it?

Monday, November 12th, 2018

Squoosh

A handy in-browser image compression tool. Drag, drop, tweak, and export.