Tags: 25

665

sparkline

Wednesday, October 16th, 2019

How We Built The World Wide Web In Five Days

This talk about recreating the first ever web browser was a joint presentation with Remy Sharp, delivered at the Fronteers conference in Amsterdam in October 2019.

Jeremy

Our story begins with the Big Bang.

13.8 billion years ago

This sets a chain of events in motion that gives us elementary particles, then more complex particles like atoms, which form stars and planets, including our own, on which life evolves, which brings us to the recent past when this whole process results in the universe generating a way of looking at itself: physicists.

A physicist is the atom’s way of knowing about atoms.

—George Wald

By the end of World War Two, physicists in Europe were in short supply. If they hadn’t already fled during Hitler’s rise to power, they were now being actively wooed away to the United States.

64 years ago

To counteract this brain drain, a coalition of countries forms the European Organization for Nuclear Research, or to use its French acronym, CERN.

They get some land in a suburb of Geneva on the border between Switzerland and France, where they set about smashing particles together and recreating the conditions that existed at the birth of the universe.

The Syncrocyclotron.

Every year, CERN is host to thousands of scientists who come to run their experiments.

Remy

Fast forward to February 2019, a group of 9 of us were invited to CERN as an elite group of hackers to recreate a different experiment.

The group.

We are there to recreate a piece of software first published 30 years ago. Given this goal, we need to answer some important questions first:

  • How does this software look and feel?
  • How does it work?
  • How you interact with it?
  • How does it behave?

The software is so old that it doesn’t run on any modern machines, so we have a NeXT machine specially shipped from the nearby museum. This is no ordinary machine. It was one of the only two NeXT machines that existed at CERN in the late 80s.

Now we have the machine to run this special software.

By some fluke the good people of the web have captured several different versions of this software and published them on Github.

So we selected the oldest version we could find. We download it from Github to our computers. Now we have to transfer it to the NeXT machine.

Except there’s no USB drive. It didn’t exist. CD ROM? Floppy drive? The NeXT computer had a “floptical drive”—bespoke to NeXT computers—all very well, but in 2019 we don’t have those drives.

To transfer the software from our machine, to the NEXT machine, we needed to use the network.

Jeremy

62 years ago

In 1957, J.C.R. Licklider was the first person to publicly demonstrate the idea of time sharing: linking one computer to another.

56 years ago

Six years later, he expanded on the idea in a memo that described an Intergalactic Computer Network.

By this time, he was working at ARPA: the department of Defense’s Advanced Research Projects Agency. They were very interested in the idea of linking computers together, for very practical reasons.

America’s military communications had a top-down command-and-control structure. That was a single point of failure. One pre-emptive strike and it’s game over.

The solution was to create a decentralised network of computers that used Paul Baran’s brilliant idea of packet switching to move information around the network without any central authority.

This idea led to the creation of the ARPANET. Initially it connected a few universities. The ARPANET grew until it wasn’t just computers at each endpoint; it was entire networks. It was turning into a network of networks …an internetwork, or internet, for short. In order for these networks to play nicely with one another, they needed to agree on using the same set of protocols for packet switching.

Bob Kahn and Vint Cerf crafted the simplest possible set of low-level protocols: the Transmission Control Protocol and the Internet Protocol. TCP/IP.

TCP/IP is deliberately dumb. It doesn’t care about the contents of the packets of data being passed around the internet. People were then free to create more task-specific protocols to sit on top of TCP/IP.

There are protocols specifically for email, for example. Gopher is another example of a bespoke protocol. And there’s the File Transfer Protocol, or FTP.

Remy

Back in our war room in 2019, we finally work out that can use FTP to get the software across. FTP is an arcane protocol, but we can agree that it will work across the two eras.

Although we have to manually install FTP servers onto our machines. FTP doesn’t ship with new machines because it’s generally considered insecure.

Now we finally have the software installed on the NeXT computer and we’re able to run the application.

We double click the shading looking, partly hand drawn icon with a lightning bolt on it, and we wait…

Once the software’s finally running, we’re able to see that it looks a bit like an ancient word processor. We can read, edit and open documents. There’s some basic styles lots of heavy margins. There’s a super weird menu navigation in place.

But there’s something different about this software. Something that makes this more than just a word processor.

These documents, they have links…

Jeremy

Ted Nelson is fond of coining neologisms. You can thank him for words like “intertwingled” and “teledildonics”.

56 years ago

He also coined the word “hypertext” in 1963. It is defined by what it is not.

Hypertext is text which is not constrained to be linear.

Ever played a “choose your own adventure” book? That’s hypertext. You can jump from one point in the book to a different point that has its own unique identifier.

The idea of hypertext predates the word. In 1945, Vannevar Bush published a visionary article in The Atlantic Monthly called As We May Think.

He imagines a mechanical device built into a desk that can summon reams of information stored on microfilm, allowing the user to create “associative trails” as they make connections between different concepts. He calls it the Memex.

Memex

Also in 1945, a young American named Douglas Engelbart has been drafted into the navy and is shipping out to the Pacific to fight against Japan. Literally as the ship is leaving the harbour, word comes through that the war is over. He still gets shipped out to the Philippines, but now he’s spending his time lounging in a hut reading magazines. That’s how he comes to read Vannevar Bush’s Memex article, which lodges in his brain.

51 years ago

Douglas Engelbart decides to dedicate his life to building the computer equivalent of the Memex.

On December 9th, 1968, he unveils his oNLine System—NLS—in a public demonstration. Not only does he have a working implementation of hypertext, he also shows collaborative real-time editing, windows, graphics, and oh yeah—for this demo, he invents the mouse.

It truly is The Mother of All Demos.

Douglas Engelbart has a posse.

39 years ago

There were a number of other attempts at creating hypertext systems. In 1980, a young computer scientist named Tim Berners-Lee found himself working at CERN, where scientists were having a heck of time just keeping track of information.

He created a system somewhat like Apple’s Hypercard, but with clickable links. He named it ENQUIRE, after a Victorian book of manners called Enquire Within Upon Everything.

ENQUIRE didn’t work out, but Tim Berners-Lee didn’t give up on the problem of managing information at CERN. He thinks about all the work done before: Vannevar Bush’s Memex; Ted Nelson’s Xanadu project; Douglas Engelbart’s oNLine System.

A lot of hypertext ideas really are similar to a choose-your-own-adventure: jumping around from point to point within a book. But what if, instead of imagining a hypertext book, we could have a hypertext library? Then you could jump from one point in a book to a different point in a different book in a completely different part of the library.

The Library Of Babel

In other words, what if you took the world of hypertext and the world of networks, and you smashed them together?

30 years ago

On the 12th of March, 1989, Tim Berners-Lee circulates the first draft of a document titled Information Management: A Proposal.

The diagrams are incomprehensible. But his supervisor at CERN, Mike Sendall, sees the potential. He reads the proposal and scrawls these words across the top: “vague, but exciting.”

Tim Berners-Lee gets the go-ahead to spend some time on this project. And he gets the budget for a nice shiny NeXT machine. With the support of his colleague Robert Cailliau, Berners-Lee sets about making his theoretical project a reality. They kick around a few ideas for the name.

They thought of calling it The Mesh. They thought of calling it The Information Mine, but Tim rejected that, knowing that whatever they called it, the words would be abbreviated to letters, and The Information Mine would’ve seemed quite egotistical.

So, even though it’s only going to exist on one single computer to begin with, and even though the letters of the abbreviation take longer to say than the words being abbreviated, they call it …the World Wide Web.

As Robert Cailliau told us, they were thinking “Well, we can always change it later.”

Tim Berners-Lee brainstorms a new protocol for hypertext called the HyperText Transfer Protocol—HTTP.

He thinks about a format for hypertext called the Hypertext Markup Language—HTML.

He comes up with an addressing scheme that uses Unique Document Identifiers—UDIs, later renamed to URIs, and later renamed again to URLs.

But he needs to put it all together into running code. And so Tim Berners-Lee sets about writing a piece of software…

Remy

Tim Berners-Lee’s document is a proposal at that stage 30 years ago. It’s just theory. So he needs to build a prototype to actually demonstrate how the World Wide Web would work.

The NeXT computer is the perfect ground for rapid software development because the NeXT operating system ships with a program called NSBuilder.

NeXT

NSBuilder is software to build software. In fact, the “NS” (meaning NeXTSTEP) can be found in existing software today - you’ll find references to NSText in Safari and Mac developer documentation.

Tim Berners-Lee, using NSBuilder was able to create a working prototype of this software in just 6 weeks

He called it: WorldWideWeb.

We finally have the software working the way it ran 30 years ago.

But our project is to replicate this browser so that you can try it out, and see how web pages look through the lens of 1990.

So we enter some of our blog urls. https://remysharp.com, https://adactio.com

But HTTPS doesn’t work. There was no HTTPS. There’s no HTTP2. HTTP1.0 hadn’t even been invented.

So I make a proxy. Effectively a monster-in-the-middle attack on all web requests, stripping the SSL layer and then returning the HTML over the HTTP 0.9 protocol.

And finally, we see…

We see junk.

We can see the text content of the website, but there’s a lot of HTML junk tags being spat out onto the screen, particularly at the start of the document.

Jeremy

<h1> <h2> <h3> <h4> <h5> <h6> 
<ol> <ul> <li> <p>

These tags are probably very familiar to you. You recognise this language, right?

That’s right. It’s SGML.

SGML is the successor to GML, which supposedly stands for Generalised Markup Language. But that may well be a backronym. The format was created by Goldfarb, Mosher, and Lorie: G, M, L.

SGML is supposed to be short for Standard Generalised Markup Language.

A flavour of SGML was already being used at CERN when Tim Berners-Lee was working on his World Wide Web project. Rather than create a whole new format from scratch, he repurposed what people were already familiar with. This was his HyperText Markup Language, HTML.

One thing he did add was a tag called A for anchor.

Its href attribute is short for “hypertext reference”. Plop a URL in there and you’ve got a link.

The hypertext community thought this was a terrible way to make links.

They believed that two-way linking was vital. With two-way linking, the linked resource connects back to where the link originates. So if the linked resource moved, the link would stay intact.

That’s not the case with the World Wide Web. If the linked resource moves, the link is broken.

Perhaps you’ve experienced broken links?

When Tim Berners-Lee wrote the code for his WorldWideWeb browser, there was a grand total of 26 tags in HTML. I know that we’d refer to them as elements today, but that term wasn’t being used back then.

Now there are well over 100 elements in HTML. The reason why the language has been able to expand so much is down to the way web browsers today treat unknown elements: ignore any opening and closing tags you don’t recognise and only render the text in between them.

Remy

The parsing algorithm was brittle (when compared to modern parsers). There’s no DOM tree being built up. Indeed, the DOM didn’t exist.

Remember that the WorldWideWeb was a browser that effectively smooshed together a word processor and network requests, the styling method was based (mostly) around adding margins as the tags were parsed.

Kimberly Blessing was digging through the original 7344 lines of code for the WorldWideWeb source. She found the code that could explain why we were seeing junk.

<link rel="..."

In this case, when the parser encountered <link rel="…" it would see the <.

<

“Yes, a tag; let’s slurp it up”.

<li

Then it reads li and the parser is thinking, “This looks like a list item, good stuff.”

<lin

Then encounters the n (of link) and, excusing the paring algorithm because it was the first, would then abort the style it was about to apply and promptly spit out the rest of the content on screen, having already swallowed up the first four characters: <lin.

k rel="stylesheet" href="...">

With that, we decided to make the executive design decision that we would strip out any elements that were unknown to the original WorldWideWeb browser — link, script, video and img — which of course there was no image support in the world’s first browser.

This is the first little cheat we applied, so that the page would be more pleasing to you, the visitor of our emulator. Otherwise you’d be presented with a lot of scary looking junk.

So now we have all the reference we need to be able to replicate this browser:

  • The machine running the original operating system, which gives us colours, fonts, menus and so on.
  • The browser itself, how windows behave, what’s in the menus, what makes the experience unique to that period of time.
  • And finally how it looks when we visit URLs.

So off we go.

🤯

Jeremy

While Remy sets about recreating the functionality of the WorldWideWeb browser, Angela was recreating the user interface using CSS.

Inputs. Buttons. Icons. Menus. All with the exact borders, highlights and shadows used in the UI of the NeXT operating system, including having the scrollbar on the left side of windows.

Meanwhile the rest of us were putting together an explanatory website to give some backstory to what we were doing. I spent most of my time working on a timeline showing thirty years before and thirty years after the original proposal for the web.

Marking up (and styling) an interactive timeline that looks good in a modern browser and still works in the first ever web browser.

The WorldWideWeb browser inherited fonts from the NeXTSTEP operating system. It mostly used Helvetica and a font called Ohlfs (created by Keith Ohlfs). Helvetica is ubiquitous but Ohlfs was never seen outside of a NeXT machine.

Our teammates Mark and Brian were obsessed with accurately recreating the typography. We couldn’t use modern fonts which are vector based. We need pixeliness.

So Mark and Brian took a screenshot of the NeXT machine’s alphabet. With help from afar from font designer David Jonathan Ross, they traced each square pixel in a vector program and then exported that as a web font. Now we’ve got a web font that deliberately isn’t anti-aliased. It’s a vector format that recreates the look of a bitmap.

Put the pixelly font together with the CSS interface elements and you’ve got something that really looks like the old WorldWideWeb programme.

Remy

This is the final product of our work at CERN that week. A fully working WorldWideWeb emulator giving a reasonable close experience of what it was like to surf the web as if it were 30 years ago.

This is entirely in the browser and was written using:

  • React,
  • React Draggable for the windows and menus,
  • React Hotkeys for keyboard combo shortcuts (we replicated the original OS as much as we could),
  • idb-keyval for some local storage,
  • Parcel for bundling.

These tools weren’t chosen particular because they were the best tools for the job, but rather because they were the tools I knew that well enough that would help speed up my development process.

We worked hard to replicate the look and feel as much as we could. We even replicated typos found throughout the WorldWideWeb app:

An excercise in global information availability

Why don’t we see how it looks…

Jeremy

There’s kind of irony in this in that it relies heavily on JavaScript. In fact, there’s nothing there other than JavaScript. But of course the WorldWideWeb browser couldn’t deal with JavaScript—JavaScript hadn’t been invented yet. So the one URL that definitely wouldn’t work in this emulator is …the emulator itself.

Remy

(Which Jeremy was blaming me for.)

This is what you see when you visit the WorldWideWeb browser for the first time. We can see we are welcomed by the universe of hypertext. We’ve got these menus over here that you can drag off and open panels (I always thought this was an ordering bug but the operating system actually works like this).

We’ll go ahead and open the Fronteers website. I go to “Document” and then I go to “Open from full document reference” (because the word URL didn’t exist). I’m going to pop the Fronteers URL in here. And there it is. We’ve got the Fronteers website. Looks pretty good. (One of my favourite UI bits is this scrollbar on the left hand side instead of the right.)

We can follow the links. Actually one of my favourite features that was in this original browser that we replicated was this “Navigate” menu. I’ve just opened the first link in the document, but I can click on “Next”, and “Next” a bunch of times and it will cycle through each one of the links on the page that I launched from and let me read all the pages that the Fronteers site links to (which I really like). I can go backwards and forwards, and so on.

One thing you might have already noticed is that there are no URLs here. And in fact, to view source, it was considered a kind of diagnostic option and it was very very tucked away. The reason for this is that URLs—and the source HTML or SGML—was considered ugly and potentially a bad user experience.

But there’s one thing about navigating here that’s different. To open this link, I had to double-click.

Jeremy

The WorldWideBrowser was more of a prototype than anything else. It demonstrated the potential of the World Wide Web project, but it only worked on NeXT machines.

To show how the World Wide Web could work on any computer, the second ever web browser was the Line Mode Browser, coded by Nicola Pellow. It had a very basic text interface—no clicking on links—but it could be installed anywhere.

Lots of other geeks and nerds were working on their own web browsers, but it was Marc Andreesen’s Mosaic browser that really blew the doors open for the web. It had a nice usable interface, and it (unilaterrally) introduced the innovation of images on the web.

Andreesen went on to found Netscape. The World Wide Web took off at an unprecedented rate. Microsoft brought out their Internet Explorer browser and started trying to catch up with Netscape. We had the browser wars. Later we got even more browsers, like Safari and Chrome, while Netscape morphed into Firefox and Internet Explorer morphed into Edge. And the rest is history.

But all of these browsers were missing something that was in the original WorldWideWeb browser.

Remy

The reason I have to double-click on these links is that, when I do a single click, it actually places the cursor. The cursor is blinking there on “Fronteers.” And the reason I can place the cursor is because I can edit the document.

I see Fronteers here is missing a heading. We want to welcome you all:

Welkom

We want to make that a heading. Let’s style that. It’s a heading.

So the browser was meant to edit documents. Let’s put a bit of text here:

Great talks from Remy and Jeremy

(forget about everyone else). Now if I want to create a link, I’ll go ahead and navigate to Jeremy’s site, https://adactio.com. I’m going to do “Link”, then “Mark all”, which is a way of copying the URL to that window. Then I go back to the Fronteers website, select “Jeremy”, and then do “Link to marked.” I can double-click on Jeremy’s name it will open up his website.

I can save this document as well. I’m going to call it fronteers.html.

Let’s do a hard reboot—a browser refresh. I come back to my machine a couple of days later, “Ah, the Fronteers page!”. I’m going to open that again, and it linked to that really handsome guy in the sprite shirt. And yes, the links still work.

In fact, this documentation that you see when the WorldWideWeb browser launches was written, styled, and linked using the WorldWideWeb browser. The WorldWideWeb browser was for a web that you could read and write.

But this didn’t survive. It was a hurdle that was too tricky to propose or implement across the different types servers that existed and for the upcoming browsers that were on the horizon.

And so it wasn’t standardised and doesn’t exist today.

But this is an important lesson from the time: reducing complexity increases the chances of mass adoption.

In the end, simplicity wins.

Jeremy

I think that’s a pattern we see over and over again, not just in the history of the web, but before the web. Simplicity wins.

Ted Nelson famously to this day thinks that the World Wide Web is weak sauce. It didn’t try to solve complex right out of the gate, like handling micro-payments.

As we saw, the hypertext community that one-way linking was ridiculous. But simplicity does win out.

Unfortunately that’s why browsers ended up just being browsers. We got some of the functionality back with wikis, content management systems, and social media to a certain extent. But I think it’s still a bit of a shame that when I want to browse a web page, I’m using one piece of software—the browser—but when I want to make a web page, I’m using another piece of software (or multiple pieces of software) to get something on to the web.

I feel like we lost something.

Remy

We head home after a week of hacking.

We were all invited back in March earlier this year for the Web@30 event that was taking place to celebrate the web but also Sir Tim Berners-Lee.

A NeXT machine from 1989 running the WorldWideWeb browser and my laptop in 2019 running https://worldwideweb.cern.ch

A few of us, Jeremy, Martin, and myself, went back to CERN for the the first leg of the event. There was even a video showing off our work as part of the main conference. Jeremy and I even chased Tim Berners-Lee back to London at the science museum like obsessive web fanboys. It was a lot of fun!

The night before I got a message from Jean-François Groff, pictured here on the right. JF Groff joined Tim Berners-Lee 30 years ago and created libwww (a precursor to libcurl).

The message read:

Sitting with Tim right now. He loves your browser!

Crushed it.

It’s amazing that we were able to pull this off in a week just with text editors and information that’s freely available. It’s mind boggling how much we can do today and how far it can reach. And it all started on that NeXTSTEP machine 30 years ago.

What I really loved about this project was working with this brilliantly old technology, digging around at the birth of browsers and the web.

I wouldn’t be stood here today, if it weren’t for the web.

I wouldn’t even know Jeremy, if it weren’t for the web.

I wouldn’t have a career, if it weren’t for the web.

I loved seeing how such old technology, the original WorldWideWeb browser was still able to render my blog. Because I put content first, delivered markup from the server. The page rendered because HTML really is backward compatible.

HTML and HTTP are just text. Nothing terribly fancy. Dare I say, beautifully simple, and as we said before, simplicity wins the day.

This same simplicity is what allows us all to have the chance for an equal voice. The web allows us to freely publish our thoughts and experiences. We have to fight to protect that kind of web.

And we’ve got to work at keeping it simple.

Jeremy

When we returned to CERN for the 30th anniversary celebrations, one of the other people there was the journalist Zeynep Tefepkçi.

When @Zeynep met NeXT.

Lou, Zeynep, Tim, Robert, and Jean-François. #Web30

She was on a panel along with Tim Berners-Lee, Robert Caillau, Jean-François Groff, and Lou Montoulli. At the end of the panel discussion, she was asked:

What would you tell the next generation about how to use this wonderful tool?

She replied:

If you have something wonderful, if you do not defend it, you will lose it.

If you do not defend the magic and the things that make it wonderful, it’s just not going to stay magical by itself.

Defend the simplicity and resilience that’s so central to the web.

I don’t know about you, but I often feel that just trying to make a web page has become far too complicated. But this is complexity that we have chosen with our tools, processes, and assumptions. We’ve buried the magic. The magic of linking web pages together. The magic of a working global hypertext system, where nobody needs to ask for permission to publish.

Tim Berners-Lee prototyped the first web browser, but the subsequent world wide web wasn’t created by any one person. It was created by everyone. That. Is. Magical.

I don’t want the web to become a place where only an elite priesthood get to experience the magic of creation. I’m going to fight to defend the openness of the world wide web. This is for everyone. Not just for everyone to use; it’s for everyone to create.

Saturday, October 12th, 2019

Checked in at The Quiet Man. Session — with Jessica map

Checked in at The Quiet Man. Session — with Jessica

Friday, October 11th, 2019

Wednesday, October 9th, 2019

Bonjour, Paris!

Bonjour, Paris!

Sunday, October 6th, 2019

Replying to

Also: https://webpagetest.org/result/191006_NG_26598228105505e271a92fb74c840d4f/

Also:

https://webpagetest.org/result/191006NG26598228105505e271a92fb74c840d4f/

Wednesday, October 2nd, 2019

Replying to

Thank you—glad you liked it!

Tuesday, October 1st, 2019

Replying to

I really appreciate your hard work—thank you!

Me and @BruceL — this year’s models.

Me and @BruceL — this year’s models.

Monday, September 30th, 2019

Replying to

W00t!!!

Shout out to Requestmap from @Paul_Irish …something I’d love to see in dev tools: https://adactio.com/journal/15797

Shout out to Requestmap from @Paul_Irish …something I’d love to see in dev tools:

https://adactio.com/journal/15797

Monday, September 23rd, 2019

Going to Madrid. brb

Sunday, September 22nd, 2019

Thistle and crag.

Thistle and crag.

Saturday, September 21st, 2019

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retreive the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<h1>Offline</h1>
<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">
</div>

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

All async functions have to have a name—no anonymous async functions allowed. I’m calling this one listPages, just like Remy is doing. I’m making the listPages function execute immediately:

(async function listPages() {
...
})();

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {
...
}

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name') {
...
}

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = new Object;

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.

browsingHistory.push(data);

My if statement and my for/in loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = new Object;
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
    browsingHistory.push(data);
  }
}

That’s the data collection part of the code. Now I’m going to take all that yummy information an output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {
...
}

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

<script>
(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = new Object;
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
      browsingHistory.push(data);
    }
  }
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    });
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
    });
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);
  }
})();
</script>

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Friday, September 13th, 2019

Replying to

Oh, that is wonderful news!!! I am so happy for you!

Tuesday, September 10th, 2019

Replying to

W00t! See you there!

Saturday, September 7th, 2019

Progressive Web App propaganda.

Progressive Web App propaganda.

Tuesday, September 3rd, 2019

Replying to

I see—and appreciate—what you’ve done with traintimes.org.uk (and I know what you mean when it feels like pissing in the wind). Keep fighting the good fight!

Saturday, August 31st, 2019

Revisiting @wordridden’s old dorm at Mount Holyoke (where I was a stowaway for three months 23 years ago).

Revisiting @wordridden’s old dorm at Mount Holyoke (where I was a stowaway for three months 23 years ago).

Sunday, August 25th, 2019

Replying to

I think the bouzouki brings out the best in Gary Numan:

https://soundcloud.com/saltercane/03-are-friends-electric

Saturday, August 24th, 2019

Replying to

Never once did I claim that using AMP would result in fewer clicks for publishers.

It’s exactly because AMP is guaranteed to work this way—while nothing else will—that makes the situation unfair.

Publishers have no choice but to use AMP.