Tags: parsing



The ghost of browsers past

When we gathered together at CERN, even before a line of code was written for the line-mode browser simulator, there was a gleeful period of digital spelunking.


We poked at the markup of the first-ever website:

  • What’s that NEXTID element? It turns out it’s something specific to the NeXT operating system.
  • Why does the first iteration of HTML already contain H1 through to H6? It’s because they were lifted wholesale from a flavour of SGML (Standard Generalized Markup Language) that was already in use at CERN.

Oh, and Brian asked Robert Cailliau why they went with the term World Wide Web. “Well,” he said, “we had to call it something. And we thought we could always change it later.”

Then there was the story of the line-mode browser. It was created by Nicola Pellow, who was a student at CERN in 1990. She later worked on the Mac browser, but her involvement with kickstarting the World Wide Web ended around 1993. She never showed up to any of the reunions.

We poked around in the (surprisingly short) source code of the line-mode browser. We found the lines that described how elements should be styled—the term “style sheet” appeared in a comment!

Parsing the parser

If you’ve fired up the line-mode browser simulator and run some websites through it, you’ll probably see occasions where a whole bunch of JavaScript—nestled between script tags in the head of the document—gets rendered to the screen.


We could’ve hidden that JavaScript, but we made a deliberate decision to display it. That’s what the line-mode browser would have done. The script element didn’t exist back then. Heck, JavaScript didn’t exist back then. So browsers would have handled the unknown element in the standard HTML way: ignore the opening and closing tags and just render what’s in-between them. That’s still the error-handling model for unrecognised elements in HTML.
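Here’s a toy model of that error-handling behaviour, sketched in JavaScript. It is not how the line-mode browser or any real HTML parser works: the function name is made up, it treats every tag as unknown, it throws away HTML comments wholesale, and it ignores entities and attributes entirely. It does show why the contents of a script element end up on the screen.

```javascript
// Toy model of how an early browser treated markup it didn't recognise:
// drop the tags, keep the text in between. Not a real HTML parser.
function renderAsText(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // drop HTML comments, contents and all
    .replace(/<[^>]*>/g, "")         // drop every tag but keep the text between tags
    .replace(/\s+/g, " ")            // collapse runs of whitespace
    .trim();
}

// A script element from a modern page: the tags go, the code shows up.
console.log(renderAsText("<p>Hello</p>\n<script>var x = 1;</script>"));
// Hello var x = 1;
```

Run the same function over a script whose contents are wrapped in an HTML comment and it returns an empty string, because the comment is discarded along with everything inside it.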

This is why we used to write our JavaScript like this:

<script language="JavaScript" type="text/javascript">
<!--
(JavaScript goes here)
// -->
</script>

The HTML comments stopped the JavaScript from being rendered to the screen in older browsers (like the line-mode browser). Using the opening HTML comment <!-- is functionally equivalent to // single-line comments in JavaScript …although you still need to prefix the closing --> comment with a //.
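You can still see the JavaScript side of that trick in a modern engine. The sketch below assumes Node.js and uses its built-in vm module to compile the old pattern as a classic script. HTML-like comments survive as a legacy web-compatibility feature of the language (Annex B of the ECMAScript specification) and aren’t allowed in ES modules.

```javascript
// Check that a modern JavaScript engine still treats "<!--" like "//".
// Assumes Node.js; uses the built-in vm module to compile a classic script.
const vm = require("node:vm");

// The old-school pattern, minus the <script> tags themselves.
const legacySource = [
  "<!--",
  "var answer = 40 + 2;",
  "// -->",
  "answer;",
].join("\n");

// HTML-like comments are allowed in classic scripts (not in ES modules),
// and vm.Script compiles its source as a classic script.
const result = new vm.Script(legacySource).runInNewContext();
console.log(result); // 42
```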

I remember doing this when I first started making websites in the 90s. You can see it if you view source on the first version of this website.

Later on, we all switched to XHTML, so we updated the syntax to make it valid XML.

<script type="text/javascript">
//<![CDATA[
(JavaScript goes here)
//]]>
</script>

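The <![CDATA[ part stops an XML parser from trying to parse the JavaScript. But an HTML parser treats the contents of a script element as raw text, so that line gets handed straight to the JavaScript engine, which chokes on it because it starts with an angle bracket. Hence the JavaScript-style // comment.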
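The same vm approach (again assuming Node.js) shows why the // is needed. The hypothetical compiles() helper reports whether a snippet parses as a classic script: bare CDATA markers are a syntax error, while commented-out ones are fine.

```javascript
// Why the CDATA markers need a // in front when the page is parsed as HTML:
// the script engine sees them as ordinary source text. Assumes Node.js.
const vm = require("node:vm");

// Returns true if the source compiles as a classic script (it isn't run).
function compiles(source) {
  try {
    new vm.Script(source);
    return true;
  } catch (error) {
    if (error.name === "SyntaxError") return false;
    throw error;
  }
}

const bare = "<![CDATA[\nvar x = 1;\n]]>";
const commented = "//<![CDATA[\nvar x = 1;\n//]]>";

console.log(compiles(bare));      // false: "<" can't start a statement
console.log(compiles(commented)); // true: both markers are just comments
```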

Anyway, we don’t bother with HTML or XHTML comments at the start of our script blocks anymore. And that’s why the line-mode browser simulator renders the JavaScript to the screen.

Note that the JavaScript isn’t executed. That’s thanks to a clever little hack by Remy: the line-mode browser simulator changes the type attribute of every script element to text/plain, effectively defusing them. Smart!
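I don’t know exactly how Remy implemented this, so treat the following as a rough sketch under the assumption that the simulator has the fetched page as an HTML string. The defuseScripts() name is made up, and running regular expressions over HTML is a shortcut; a real version should use a proper HTML parser.

```javascript
// Hypothetical sketch of "defusing" scripts in fetched HTML by forcing
// type="text/plain" on every opening <script> tag. Regex-based, so it will
// miss edge cases that a real HTML parser would handle.
function defuseScripts(html) {
  return html.replace(/<script\b([^>]*)>/gi, (_tag, attributes) => {
    // Remove any existing type attribute, quoted or unquoted.
    const rest = attributes.replace(
      /\s+type\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi,
      ""
    );
    // Put type="text/plain" first and keep the remaining attributes.
    return `<script type="text/plain"${rest}>`;
  });
}

console.log(
  defuseScripts('<script type="text/javascript">alert("hi");</script>')
);
// <script type="text/plain">alert("hi");</script>
```

Browsers treat a script element with type="text/plain" as a data block, so its contents are never executed.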

Parsing webmentions

Thanks to everyone who helped me test the webmentions that I hacked together at Indie Web Camp last weekend.

Let me explain what webmentions are all about…

Basically, it’s the equivalent of pingback. Let’s say I write something here on adactio.com. Suppose that prompts you to write something in response on your own site. A webmention is a way for you to let me know that your response exists.

If you look in the head of any of my journal posts, you’ll see this link element:

<link rel="webmention" href="http://adactio.com/webmention.php" />

That’s my webmention endpoint: http://adactio.com/webmention.php …it’s kind of like a webhook: a URL that’s intended to be hit by machines rather than people. So when you publish your response to my post, you ping that URL with a POST request that sends two parameters:

  1. target: the URL of my post and
  2. source: the URL of your response.
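For anyone wiring this into their own CMS, here’s a sketch of the sending side in JavaScript. The function names are hypothetical. Discovery here only looks at <link> elements using regular expressions (the Webmention spec also covers <a rel="webmention"> and HTTP Link headers), and the two parameters are sent form-encoded, which is what the spec uses.

```javascript
// Sketch of the sending side: find the endpoint advertised by
// <link rel="webmention">, then build a form-encoded POST with
// source and target. Only <link> tags are checked here.
function discoverEndpoint(html, pageUrl) {
  for (const [tag] of html.matchAll(/<link\b[^>]*>/gi)) {
    const rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(tag);
    const href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);
    // rel can hold several space-separated values.
    if (rel && href && rel[1].toLowerCase().split(/\s+/).includes("webmention")) {
      // Resolve relative endpoints against the page's own URL.
      return new URL(href[1], pageUrl).href;
    }
  }
  return null; // no endpoint advertised in a <link> element
}

function buildWebmentionRequest(endpoint, source, target) {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({ source, target }).toString(),
    },
  };
}

// Usage (Node 18+ has a global fetch):
// const endpoint = discoverEndpoint(targetHtml, targetUrl);
// const { url, init } = buildWebmentionRequest(endpoint, myReplyUrl, targetUrl);
// const response = await fetch(url, init);
```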

Ideally your own CMS or blogging system would take care of doing the pinging, but until that’s more widely implemented, I’m providing a form at the end of each of my posts.

Either way, once you ping my webmention endpoint (discoverable through that link rel="webmention") with those two parameters, I just need to confirm that your post does indeed contain a link to my post. I do that by making a cURL request and parsing your source. Then I return a server response of 202 (Accepted).
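The receiving side boils down to a check and a status code. This is a JavaScript sketch of that logic, not the PHP running on this site: fetching the source is left to the caller, the function names are hypothetical, the href regex is a crude stand-in for parsing the source properly, and returning 400 when the check fails is an assumption (only the 202 is described above).

```javascript
// Does the source HTML contain a link whose href is exactly the target URL?
// A crude regex check standing in for properly parsing the source.
function linksTo(sourceHtml, target) {
  for (const match of sourceHtml.matchAll(/\bhref\s*=\s*["']([^"']*)["']/gi)) {
    if (match[1] === target) return true;
  }
  return false;
}

// Decide the HTTP status for an incoming webmention, given the POSTed
// parameters and the already-fetched HTML of the source page.
function webmentionStatus(params, sourceHtml) {
  const source = params.get("source");
  const target = params.get("target");
  if (!source || !target) return 400; // both parameters are required
  return linksTo(sourceHtml, target) ? 202 : 400; // 202 = Accepted
}

const params = new URLSearchParams({
  source: "https://example.net/reply",
  target: "https://example.com/post",
});
console.log(webmentionStatus(params, '<a href="https://example.com/post">nice</a>'));
// 202
```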

Here’s the code for a minimum viable webmention in PHP.

That’s as far as I got at Indie Web Camp but it was enough for me to start collecting responses to posts.

Webmentions as links

The next step is to do something with the responses. After all, I’ve already got the source of each response from those cURL requests.

Barnaby has written a nice, straightforward microformats parser in PHP. I’m using that to check the cURLed source for any responses that have been marked up using h-entry. That’s one of the microformats2 vocabularies: a much simpler way of writing structured content with microformats.
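A microformats parser turns a page into the standard microformats2 JSON structure: a list of items, each with a type array and a properties object whose values are always arrays. Assuming you already have that structure, pulling out h-entry posts might look something like this in JavaScript. The helper names and the sample data are made up.

```javascript
// Find every h-entry in parsed microformats2 JSON, including nested ones.
function findEntries(items) {
  const entries = [];
  for (const item of items) {
    if (item.type.includes("h-entry")) entries.push(item);
    if (item.children) entries.push(...findEntries(item.children));
  }
  return entries;
}

// Pull out the few properties a comment display needs. Each property is an
// array; embedded objects (e-content, an h-card author) carry a "value".
function summariseEntry(entry) {
  const first = (name) => entry.properties[name]?.[0];
  const content = first("content");
  const author = first("author");
  return {
    name: first("name"),
    content: typeof content === "object" ? content.value : content,
    author:
      typeof author === "object"
        ? author.properties?.name?.[0] ?? author.value
        : author,
  };
}

// Hypothetical parser output for a single reply.
const parsed = {
  items: [{
    type: ["h-entry"],
    properties: {
      name: ["A reply"],
      content: [{ html: "<p>Nice post!</p>", value: "Nice post!" }],
      author: [{ type: ["h-card"], properties: { name: ["Sam"] }, value: "Sam" }],
    },
  }],
};

console.log(findEntries(parsed.items).map(summariseEntry));
```

Entries with no h-entry markup simply don’t show up in the result, which is where falling back to displaying just a link makes sense.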

Aaron, Amber, and Barnaby all sent responses that were marked up with h-entry so now their responses appear in full.

Webmentions as comments

So there you have it. Comments are now open on every journal post on adactio.com …the only catch is that you have to write the comment on your own site. And if you want the content of your post to appear here (instead of just a link) then update your blog post template to include a handful of h-entry classes.
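For the template side, here’s one way to output a post with those h-entry classes, sketched as a JavaScript function that returns markup. The function and its field names are hypothetical; the class names come from the microformats2 h-entry vocabulary, with u-in-reply-to marking the link to the post being responded to.

```javascript
// Escape text before dropping it into HTML text or attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a reply with h-entry classes. contentHtml is assumed to be
// trusted HTML from your own CMS, so it isn't escaped.
function renderReply({ title, url, published, authorName, inReplyTo, contentHtml }) {
  return `<article class="h-entry">
  <h1 class="p-name">${escapeHtml(title)}</h1>
  <p>In reply to <a class="u-in-reply-to" href="${escapeHtml(inReplyTo)}">${escapeHtml(inReplyTo)}</a></p>
  <a class="u-url" href="${escapeHtml(url)}"><time class="dt-published" datetime="${escapeHtml(published)}">${escapeHtml(published)}</time></a>
  by <span class="p-author h-card">${escapeHtml(authorName)}</span>
  <div class="e-content">${contentHtml}</div>
</article>`;
}

const reply = renderReply({
  title: "Re: webmentions",
  url: "https://example.net/reply",
  published: "2013-07-01",
  authorName: "Sam",
  inReplyTo: "https://example.com/post",
  contentHtml: "<p>Nice post!</p>",
});
console.log(reply);
```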

Feel free to use this post as a test. Mark up your blog with h-entry, write a post that links to this URL, and enter the URL of your post in the form below.