Journal tags: longevity

3

sparkline

02022-02-22

Eleven years ago, I made a prediction:

The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years.

One year later, Matt called me on it and the prediction officially became a bet:

We’re playing for $1000. If I win, that money goes to the Bletchley Park Trust. If Matt wins, it goes to The Internet Archive.

I’m very happy to lose this bet.

When I made the original prediction eleven years ago that a URL on the longbets.org site would no longer be available, I did so in a spirit of mischief—it was a deliberately meta move. But it was also informed by a genuine feeling of pessimism around the longevity of links on the web. While that pessimism was misplaced in this case, it was informed by data.

The lifetime of a URL on the web remains shockingly short. What I think has changed in the intervening years is that people may have become more accustomed to the situation. People used to say “once something is online it’s there forever!”, which infuriated me because the real problem is the exact opposite: if you put something online, you have to put in real effort to keep it online. After all, we don’t really buy domain names; we just rent them. And if you publish on somebody else’s domain, you’re at their mercy: Geocities, MySpace, Facebook, Medium, Twitter.

These days my view towards the longevity of online content has landed somewhere in the middle of the two dangers. There’s a kind of Murphy’s Law around data online: anything that you hope will stick around will probably disappear and anything that you hope will disappear will probably stick around.

One huge change in the last eleven years that I didn’t anticipate is the migration of websites to HTTPS. The original URL of the prediction used HTTP. I’m glad to see that original URL now redirects to a more secure protocol. Just like most of the World Wide Web. I think we can thank Let’s Encrypt for that. But I think we can also thank Edward Snowden. We are no longer as innocent as we were eleven years ago.

I think if I could tell my past self that most of the web would using HTTPS by 2022, my past self would be very surprised …’though not as surprised at discovering that time travel had also apparently been invented.

The Internet Archive has also been a game-changer for digital preservation. While it’s less than ideal that something isn’t reachable at its original URL, knowing that there’s probably a copy of the content at archive.org lessens the sting considerably. I couldn’t be happier that this fine institution is the recipient of the stakes of this bet.

Mind the gap

In May 2012, Brian LeRoux, the creator of PhoneGap, wrote a post setting out the beliefs, goals and philosophy of the project.

The beliefs are the assumptions that inform everything else. Brian stated two core tenets:

  1. The web solved cross platform.
  2. All technology deprecates with time.

That second belief then informed one of the goals of the PhoneGap project:

The ultimate purpose of PhoneGap is to cease to exist.

Last week, PhoneGap succeeded in its goal:

Since the project’s beginning in 2008, the market has evolved and Progressive Web Apps (PWAs) now bring the power of native apps to web applications.

Today, we are announcing the end of development for PhoneGap.

I think Brian was spot-on with his belief that all technology deprecates with time. I also think it was very astute of him to tie the goals of PhoneGap to that belief. Heck, it’s even in the project name: PhoneGap!

I recently wrote this about Sass and clamp:

I’ve said it before and I’ll say it again, the goal of any good library should be to get so successful as to make itself redundant. That is, the ideas and functionality provided by the tool are so useful and widely adopted that the native technologies—HTML, CSS, and JavaScript—take their cue from those tools.

jQuery is the perfect example of this. jQuery is no longer needed because cross-browser DOM Scripting is now much easier …thanks to jQuery.

Successful libraries and frameworks point the way. They show what developers are yearning for, and that’s where web standards efforts can then focus. When a library or framework is no longer needed, that’s not something to mourn; it’s something to celebrate.

That’s particularly true if the library of code needs to be run by a web browser. The user pays a tax with that extra download so that the developer gets the benefit of the library. When web browsers no longer need the library in order to provide the same functionality, it’s a win for users.

In fact, if you’re providing a front-end library or framework, I believe you should be actively working towards making it obselete. Think of your project as a polyfill. If it’s solving a genuine need, then you should be looking forward to the day when your code is made redundant by web browsers.

One more thing…

I think it was great that Brian documented PhoneGap’s beliefs, goals and philosophy. This is exactly why design principles can be so useful—to clearly set out the priorities of a project, so that there’s no misunderstanding or mixed signals.

If you’re working on a project, take the time to ask yourself what assumptions and beliefs are underpinning the work. Then figure out how those beliefs influence what you prioritise.

Ultimately, the code you produce is the output generated by your priorities. And your priorities are driven by your purpose.

You can make those priorities tangible in the form of design principles.

You can make those design principles visible by publishing them.

The format of The Long Now

In 01992, Tim Berners-Lee wrote a document called HTML Tags.

In September 02001, I started keeping this online journal. Back then, I was storing my data in XML, using a format of my own invention. The XML was converted using PHP into (X)HTML, RSS, and potentially anything else …although the “anything else” part never really materialised.

In February 02006, I switched over to using a MySQL database to store my data as chunks of markup.

In February 02007, Tess wrote about data longevity

To me, being able to completely migrate my data — with minimal bit-rot — from system to system is the key in the never-ending and easily-lost fight to keep my data accessible over the entirety of my life.

She’s using non-binary, well-documented standards to store and structure her data: Atom, HTML and microformats.

Meanwhile, the HTML5 spec began defining error-handling for HTML documents. Ian Hickson wrote:

The original reason I got involved in this work is that I realised that the human race has written literally billions of electronic documents, but without ever actually saying how they should be processed.

I decided that for the sake of our future generations we should document exactly how to process today’s documents, so that when they look back, they can still reimplement HTML browsers and get our data back, even if they no longer have access to Microsoft Internet Explorer’s source code.

In August 2008, Ian Hickson mentioned in an interview that the timeline for HTML5 involves having two complete implementations by 02022. Many web developers were disgusted that such a seemingly far-off date was even being mentioned. My reaction was the opposite. I began to pay attention to HTML5.

HTML is starting to look like a relatively safe bet for data longevity and portability. I’m not sure the same can be said for any particular flavour of database. Sooner rather than later, I should remove the unnecessary layer of abstraction that I’m using to store my data.

This would be my third migration of content. I will take care to head Mark Pilgrim’s advice on data fidelity:

Long-term data preservation is like long-term backup: a series of short-term formats, punctuated by a series of migrations. But migrating between data formats is not like copying raw data from one medium to another.

Fidelity is not a binary thing. Data can gradually degrade with each conversion until you’re left with crap. People think this only affects the analog world, like copying cassette tapes for several generations. But I think digital preservation is actually much harder, in part because people don’t even realize that it has the same issues.

He’s also betting on HTML:

HTML is not an output format. HTML is The Format. Not The Format Of Forever, but damn if it isn’t The Format Of The Now.

I don’t think that any format could ever be The Format Of The Long Now but HTML is the closest we’ve come thus far in the history of computing to having a somewhat stable, human- and machine-readable data format with a decent chance of real longevity.