Tags: documents



Tuesday, February 13th, 2018

XML is 20

XML 1.0 was released on February 10th, 1998. I remember the hype around XML at the time—it was our saviour, the chosen one, prophesied to bring balance to data exchange. Things didn’t quite work out that way, but still…

Twenty years later, it seems obvious that the most important thing about XML is that it was the first. The first data format that anyone could pack anything up into, send across the network to anywhere, and unpack on the other end, without asking anyone’s permission or paying for software, or for the receiver to have to pay attention to what the producer thought they’d produced it for or what it meant.

Saturday, December 9th, 2017

Origin story

In an excellent piece called The First Web Apps: 5 Apps That Shaped the Internet as We Know It, Matthew Guay wrote:

The world wide web wasn’t supposed to be this fun. Berners-Lee imagined the internet as a place to collaborate around text, somewhere to share research data and thesis papers.

In his somewhat confused talk at FFConf this year, James Kyle said:

The web was designed to share documents.

Douglas Crockford said:

The web was not designed to do any of the things it is doing. It was intended to be a simple—even primitive—document retrieval system.

Some rando on Hacker News declared:

Essentially every single aspect of the web is terrible. It was designed as a static document presentation system with hyperlinks.

It appears to be a universally accepted truth. The web was designed for sharing documents, and was never meant for the kind of applications we can build these days.

I don’t think that’s quite right. I think it’s fairer to say that the first use case for the web was document retrieval. And yes, that initial use case certainly influenced the first iteration of HTML. But right from the start, the vision for the web wasn’t constrained by what it was being asked to do at the time. (I mean, if you need an example of vision, Tim Berners-Lee called it the World Wide Web when it was just on one computer!)

The original people working on the web—Tim Berners-Lee, Robert Cailliau, Jean-Francois Groff, etc.—didn’t try to define the edges of what the web would be capable of. Quite the opposite. All of them really wanted a more interactive read-write web where documents could not only be read, but also edited and updated.

As for the idea of having a programming language in browsers (as well as a markup language), Tim Berners-Lee was all for it …as long as it could be truly ubiquitous.

To say that the web was made for sharing documents is like saying that the internet was made for email. It’s true in the sense that it was the most popular use case, but that never defined the limits of the system.

The secret sauce of the internet lies in its flexibility—it’s a deliberately dumb network that doesn’t care about the specifics of what runs on it. This lesson was then passed on to the web—another deliberately simple system designed to be agnostic to use cases.

It’s true that the web of today is very, very different to its initial incarnation. We got CSS; we got JavaScript; HTML has evolved; HTTP has evolved; URLs have …well, cool URIs don’t change, but you get the idea. The web is like the ship of Theseus—so much of it has been changed and added to over time. That doesn’t mean its initial design was flawed—just the opposite. It means that its initial design wasn’t unnecessarily rigid. The simplicity of the early web wasn’t a bug, it was a feature.

The web (like the internet upon which it runs) was designed to be flexible, and to adjust to future use cases that couldn’t be predicted in advance. The best proof of this flexibility is the fact that we can and do now build rich interactive applications on the World Wide Web. If the web had truly been designed only for documents, that wouldn’t be possible.

Friday, April 17th, 2009

Blast from the past

In preparing for my talk for the Bamboo Juice conference at the Eden Project in Cornwall next week, I find I’m doing a lot of WWILFing. After spending far too long reading about and , and editing footage of a Von Braun-inspired orbital habitat, I got completely sidetracked into trying to figure out the storage capacity of the Golden Record attached to Voyager 1.

I still haven’t found an answer—I’ve asked Voyager’s cousin for help—but I did stumble across a gem of a document from 1995. It’s by Simon Pockley and it’s called Lest We Forget or Why I chose the World Wide Web as a repository for archival material. Written in the infancy of the web, it makes for fascinating reading. It’s like a seedling of the semantic web. Some of the projections were way off but some of them were eerily prescient. Here’s my favourite passage:

Technological obsolescence is only a part of the problem in the preservation of digital information. The World Wide Web is a flexible carrier of digital material across both hardware and software. Its ability to disseminate this material globally, combined with its inherent flexibility, allow it to accommodate evolving standards of encoding and markup. Survival of significant material on-line is dependent on use and use is related to ease of access.

The document contains a number of hyperlinks to related material, all of which are collected into footnotes at the end. What’s heartbreaking is to discover how many of those links no longer resolve. Just a handful from the original list remain:

Four fifths of those links resolve to a single domain, that of the National Library of Australia. So much for our distributed repository of archival material.

Wednesday, January 21st, 2009

Flickr: The New Frontiersman's Photostream

Background material for Watchmen.