All Our Yesterdays

A presentation on digital preservation from the Build conference in Belfast in November 2011.

Tomorrow, and tomorrow, and tomorrow
Creeps in this petty pace from day to day
To the last syllable of recorded time;
And all our yesterdays have lighted fools
The way to dusty death.

Thank you. Thank you very much for the introduction, Jesse. It’s kind of surreal to have Jesse Thorn introducing me. I’m a great fan of Sound of Young America, Maximum Fun. And I’m absolutely thrilled that he has a Huffduffer account. That’s fantastic.

So yes, my talk may be slightly different from some of the stuff we’ve been talking about today, although it touches on some of what Wilson was talking about, but I don’t have any Star Wars references. I don’t think there’s any references to Apple, and there isn’t a single Marshall McLuhan quote in the whole thing. But I’m sorry; this’ll be a little different.

What I want to talk about, as Jesse was saying, is time. Time and the web. And we are quite preoccupied with time on the web, but a lot of the time what we’re preoccupied with is this whole idea of the real-time web. That’s what’s exciting, particularly right now. Josh was talking about (and I totally agree with him) this fundamental change in web design, and a lot of it’s to do with these real-time interactions on mobile devices.

But there’s another kind of time. There’s the longer time. Robin Sloan put this really, really well. He described these two types of time as flow and stock, and he said:

Flow is the feed. It’s the stream of daily and sub-daily updates that remind people you exist. Stock is the durable stuff. It’s what people discover via search. It’s what spreads slowly but surely. Flow is ascendant these days, but we neglect stock at our own peril.

And also, I don’t think these two are mutually exclusive. I don’t think we need to either be thinking about the flow (the real-time web) or the stock, the long archived web. The one can become the other. There’s a wonderful blog post by my friend Matt Ogle, and he said:

We’ve all been so distracted by the now that we’ve hardly noticed the beautiful comet tails of personal history trailing in our wake that all those updates can contribute over time to a beautiful narrative, if they survive.

So I think in our industry, a lot of the time we’re not used to thinking of long term challenges, long term design challenges. Generally out there in the world, there are not that many examples of really long term design challenges. We’ve heard reference to architecture earlier, but not really long term. There are a few examples.

This is the Global Seed Bank in Svalbard in Norway, and the idea here is to archive a specimen, at least one specimen, of every plant that exists, a seed from every plant that exists, thinking long-term, that we have essentially a back-up. This is Dropbox for our planet. Very long term thinking. Obviously we want to have more than one Global Seed Bank. You never know where the asteroid’s going to hit, right? But as a design challenge, it really would get you thinking in the long term about the building, about the system of archiving here. This has got to last for generations.

There are other examples of very long term thinking. Nuclear waste storage; that’s a good example. You’ve got places like Yucca Mountain in Nevada. There’s Carlsbad in New Mexico, they bury the waste 2,150 feet under the ground, and it’s got to stay there for tens of thousands of years, and that’s got a real long half-life. How do you warn people to keep away? You’ve buried it as deep as you can, but how do you warn people to keep away? You can’t rely on letters. You can’t rely on signage. You can’t rely on any particular language that happens to be in the ascendant today that will even exist, any method of writing. You can’t rely on iconography: it’s quite culturally specific.

So a think-tank of designers came together (there’s this Futures Panel) to discuss how they could design an environment to tell people to keep away. One school of thought was that you don’t design anything. Just by putting anything there, you are essentially attracting attention in the same way that the pyramids are going to get robbed because, hey, look, they’re great big pyramids. But they were brainstorming together and they figured out, what was the message they were trying to get across?

They couldn’t use letters; they couldn’t use iconography, but this is the message they wanted to convey:

This place is a message and part of a system of messages. Pay attention to it. Sending this message was important to us. We considered ourselves to be a powerful culture. This place is not a place of honour. No highly esteemed deed is commemorated here. Nothing valued is here. What is here is dangerous and repulsive to us. This message is a warning about danger.

It’s quite a lot of information that they had to try and pack in without using iconography, without using letters. What they eventually settled on was menacing earthworks and forbidding blocks and sort of thorn like blocks in the earth that should be instinctually repulsive to people. It’s a good example of long-term thinking. Of course there’s no knowing whether it’s going to work or not unless we come back in a few thousand years.

There’s a whole group dedicated to long-term thinking. Who’s heard of The Long Now Foundation? Excellent. Any members of The Long Now Foundation? Okay. Just me.

So for those who don’t know, The Long Now Foundation was formed back in 1996 by a whole bunch of people, including Stewart Brand who was mentioned earlier. Stewart Brand just shows up everywhere throughout the history of the twentieth century. We heard about the Whole Earth Catalog earlier that he did, and he had this whole campaign to have the Apollo ship turn its camera back and take a picture of the whole earth because nobody had actually seen a picture of the whole earth. You know the Mother of all Demos with Doug Engelbart showing the mouse and windows and icons for the first time? You know who’s behind the camera? Stewart Brand. He’s everywhere.

Anyway, so he wanted to get together this group of people to think about long term stuff. One of the other people was Brian Eno. Brian Eno at the time was living in New York where he found people were thinking very short term. He would ask people, ‘What are you doing right now?’, and they would tell him what they were doing that day. Maybe in the last few hours. Possibly that weekend, but that’s the level of time they’re thinking; very fast-paced. So this was to encourage long-term thinking.

And they took this name The Long Now Foundation, it’s kind of a riff on a fictional organisation in a Robert Heinlein book, it was called The Long Range Foundation. This Long Range Foundation would be given lots of money that it had to plough into stuff that could never be profitable; it was only for long term good. So they’d plough it into things like space exploration and stuff, and of course these things turned out to be enormously profitable, because long-term thinking is a good thing to do.

One of the projects of the Long Now Foundation that’s a great example of long-term thinking is the Clock of the Long Now. They’re designing a clock that’s got to tell the time for ten thousand years. You can see a prototype of it in the Science Museum in London. So Brian Eno was doing the chimes, you know, it’ll chime every year, ten years or something. If you go to the website of the Clock of the Long Now, it’s really fascinating. They’ve got a set of design principles; they’re some of the best design principles I’ve seen. So it’s really making you think about long term thinking.

They also decided that if this was just a thought experiment about long-term thinking, about potentially building a clock, that that would kind of defeat the purpose, so they are actually building this clock, and it’s a scale free sort of thing, so it’s going to work on a very large scale, and they’ve got a mountain in… I think it’s in Nevada, in a very geologically stable place, because that’s an important factor to consider for a ten thousand year clock, and they’ve started carving the steps in. Obviously it’s going to be a long-term project.

But even that isn’t the longest design challenge that I could think of. If you want a really, really long-term design challenge, you’ve got this. It’s the Voyager golden record, or the Pioneer plaque that came before it.

So Voyager was launched in 1977, it’s currently the furthest man-made thing from our planet, and it contains this golden record. On the off-chance that another civilisation were to discover the golden record, the idea is that they would be able to decode the information on it. This is the flip-side of the actual record, which is kind of the primer to explain how to use it.

And here you’ve got to think, you can’t rely on anything. I mean, measurements of time, we use the second, the minute, the hour. How can you have a universal measurement of time? That’s why what’s here is supposed to illustrate the hyperfine transition of a hydrogen atom, and that’s going to be the universal measurement of time.

They’ve also put a map…this was controversial. Carl Sagan got into some trouble for this. This is a map of pulsars in our galaxy, and there would only be one spot in the galaxy that would be exactly that distance from all those pulsars and that would be here on planet Earth.

So a lot of thought has gone into trying to explain how to decode this, and basically how to build a record player to play what’s on the other side, which is some images and also audio; Beethoven, Chuck Berry; greetings from planet Earth essentially.

Of course, this isn’t really for other civilisations. This is for us. This isn’t about it being decoded by other civilisations. It’s about if you had to decide, what would you put on this record that’s got to last for, essentially, all time. And that’s actually a thought experiment I’d like you to try. What would you put on this Golden Record; think about it. Could be images and audio, but you’ve got limited space.

Now what’s interesting about the Voyager record—like I said, it was 1977 that the Voyager probes were launched—was that they chose to go with an analogue technology. Because by the late seventies, we did have digital technologies, but still it was a lot simpler to try and encode, “here’s how to build a record player,” than to try and encode, “Okay, step 1, here’s how to build a computer,” that it would actually be more complex. So because this was a long-term design challenge, they fell down on the side of analogue. And we don’t work in the analogue space generally. We work in an environment that’s very much digital. I mean, there’s an analogue component to it, which is the internet itself. The internet is the physical network of computers. Technically that’s analogue; those are real solid, hardware things, but where we work is the web, and the web isn’t analogue. It is very much digital.

I guess we have to kind of define what the web is, because everyone kind of has a different idea of what the web is. For some people, it’s a publishing platform; for other people it’s an e-commerce platform; it’s everything at once. Kevin Kelly called the internet a copying machine, and I guess the web is an extension of that, and that can lead into difficulties. On a technical level I guess what the web is, is you’ve got resources—usually in HTML but not always—resources, probably in HTML that have a URL, a URI, they’re addressable, and they’re delivered over HTTP, the HyperText Transfer Protocol. So technically that’s the web. HTML + URLs + HTTP.
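To make that three-part definition a little more concrete, here is a minimal sketch (not from the talk) of how a URL turns into the text of an HTTP request for an HTML resource. The function name `build_get_request` and the example.com address are made up for illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    # The URL names the resource: the host says where to ask, the path says what to ask for.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",   # HTTP: the protocol used to fetch the resource
        f"Host: {parts.netloc}",
        "Accept: text/html",      # HTML: the format we would like back
        "Connection: close",
        "",
        "",                       # a blank line ends the request headers
    ]
    return "\r\n".join(lines)


# Hypothetical address, used only as an example.
request = build_get_request("http://example.com/notes/1")
print(request)
```

The whole exchange is plain text, which is part of why the pieces can stay small and loosely joined.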

But the definition of the web that I really like comes from David Weinberger, when he calls it ‘small pieces loosely joined’. It’s ephemeral but it gets to the heart of what the web really is. Small pieces loosely joined.

And when it comes to the stuff we put out there on the web, there’s this common perception that it’s written in stone. That once something is on the web, that’s it. It’s going to stay on the web. It’s one of these truisms, like the internet never forgets. Google never forgets. We nod our heads and go, yeah it’s true, just like Eskimos have fifty words for snow and people at the time of Columbus thought that the earth was flat, right? These things sound intuitive. They’re all utterly false. They’re all completely untrue, but they make logical intuitive sense to us, and that’s the danger.

Because it seems to make intuitive sense to us to believe that the internet never forgets, then we neglect what we put there. And, you know, Josh was talking earlier about how his kids are going to have a record of their whole life because right now, there’s pictures being taken, they’re being put on the web. I really hope that’s true. I really do, but I don’t think it will necessarily be true unless you take active steps to make it so.

So when people put stuff on the web and they assume the internet never forgets, I don’t believe that. Citation needed, right? Where’s the data to support this, because if you look at the data, if you look at something that was put online ten years ago, the chances are it’s not still online today. Why do we think that’s going to change?

There’s whole books written about this. Viktor Mayer-Schönberger wrote one called Delete which is about the power of forgetting, how we’ve got to be careful because the internet never forgets, so we’ve really got to be careful to delete stuff. But no, actually, you don’t have to worry about it. The default state of stuff on the web is to die, to be deleted. It is castles in the sand, what we’re putting up there, and for us to believe that it’s carved in stone is very dangerous, if we have that expectation that, oh, it’ll be around forever, it’s on the web.

Tim Berners-Lee, in 1999 he wrote, ‘Cool URIs don’t change’, the idea that you’ve actually got to work on keeping something at a resource. What I really like about that—the document about this (just Google ‘Cool URIs don’t change’, you’ll find this document on the W3C site by Tim Berners-Lee)—was he was actually thinking quite long term, because he used that word “cool”, he has a little footnote to explain that, historical note:

At the end of the twentieth century when this was written, cool was an epithet of approval, particularly among young, indicating trendiness, quality or appropriateness.

I like that. I like that sort of long-term thinking.

So as much as I’d like to believe that the internet never forgets, the data would suggest otherwise. With each passing year, the chance of a URL staying alive decreases dramatically. So I put my money where my mouth is.

One of the other projects by The Long Now Foundation is a site of long bets. You can find it at longbets.org. What you can do is you can submit a prediction. You pay money to submit this stuff. It’s not like you can just predict something. Now you put your money where your mouth is. I think it was fifty dollars for the original prediction, and then if someone takes it up with you, you decide on an amount, over 200 dollars I believe, and nominate a charity that will win all that money if you win, and the other person nominates a charity that will win if they win.

My prediction was that the URL for this prediction would not be available in eleven years.

So longbets.org, there’s a site that you’d think should be taking good care of their URLs, and that is the URL, that’s the one, www.longbets.org/601. I have until 22 February 2022. I made this prediction back in February, I just liked the alliteration of the date, that’s why I chose that particular date, that’s eleven years from the original date of prediction.

Interestingly, even since I made that prediction, the site has been undergoing some changes. I think they’ve been switching over to Django or something, and the www address will now actually redirect you to longbets.org/601, but I’m ok with a 301 redirect. I will acknowledge that that is allowed.
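The rule for the bet (the original URL counts as alive if it answers directly or via a 301 redirect) can be sketched roughly like this. This is only an illustration: the `responses` table is an invented stand-in for what a server might answer rather than anything observed, the `http://` scheme is an assumption, and `still_available` is a hypothetical helper name.

```python
from typing import Dict, Optional, Tuple

# Hypothetical snapshot of server answers: URL -> (status code, Location header).
Responses = Dict[str, Tuple[int, Optional[str]]]


def still_available(url: str, responses: Responses, max_hops: int = 5) -> bool:
    """Follow 301 redirects and report whether the chain ends in a 200."""
    for _ in range(max_hops + 1):
        # Anything the table does not know about is treated as gone (404).
        status, location = responses.get(url, (404, None))
        if status == 200:
            return True
        if status == 301 and location:
            url = location  # a permanent redirect is allowed, so keep following
            continue
        return False
    return False  # gave up: too many redirects


responses = {
    "http://www.longbets.org/601": (301, "http://longbets.org/601"),
    "http://longbets.org/601": (200, None),
}
print(still_available("http://www.longbets.org/601", responses))
```

A real check would have to make actual requests, year after year, which is rather the point of the bet.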

But if anyone wants to take me up on this bet and put their money where their mouth is, you’re free to do so at that URL, you have to sign up for longbets. I would love to be proved wrong. I really hope that I’m going to lose this bet, and that in eleven years, that URL will still be available. Or the content will be available at a redirect. But the data does not support that hope.

LOGO2.0 part I

This is a diagram that was put together quite a few years ago now. This was in 2006 at the height of the whole web 2.0 craze. We’re all using the network effects and making lots of rounded cornered logos, candy coloured stuff. I think the purpose of pulling together all these logos was to demonstrate, “wow, aren’t we all using the same colour palettes, aren’t we being kind of unimaginative with our designs, with all this friendly-trendy design, web 2.0 stuff.”

Web 2.0 logo chart - updated for 2009 (flipped companies)

But not too long after this, Meg Pickard in the UK revisited it. Actually I’m not sure how long after the original one this was, but she just noted down all of the companies that had been bought out by a larger company in the meantime, and there were quite a few.

Web 2.0 logo chart - updated for 2009 (flipped & dead companies)

She also noted all the companies that were gone. All the ones that had just disappeared from the web. And remember, these are the web 2.0 sites; they’re asking you to upload your data, upload your information, your content, your dreams, your hopes, your photos, your images, your videos.

Josh mentioned the stats for Facebook, for Twitter, for Tumblr. What was it, 125 million photos a day, more? I mean it’s wonderful that people are publishing on the web, but you are entrusting your hopes, your dreams, to a third party. I think that could be dangerous, as we’ve seen. Nobody’s too big to fail on the web. Friendster was really big one time, and it’s hard to imagine a world without Facebook, but it could easily happen.

Connor O’Brien wrote an article called Linkrot and he summed it up nicely when he said:

If your only photo album is on Facebook, ask yourself, since when did a gratis web service ever demonstrate giving a flying fuck about holding onto the past?

Case in point.

There was a time when it was inconceivable to imagine the web without Geocities. Oh but Geocities, it was so ugly, right? We’re not going to miss that. We want to preserve the good stuff, not that ugly, ugly, under construction stuff on Geocities, right? Wrong.

I think it was such a crime. I really mean that. A crime Yahoo committed in shutting down Geocities.

Phil Gyford summed it up nicely. The reason why this will be so sorely missed—and apart from the fact of the size of the thing—there was so much, so many hopes and dreams on Geocities. He summed it up at the time when it was yanked off the web by Yahoo. He said:

Geocities is an awful, ugly, decrepit mess, and this is why it will be sorely missed. Geocities shows what normal, non-designer people will create if given the tools available around the turn of the Millennium. As companies like Yahoo! switch off swathes of our on-line universe, little fragments of our collective history disappear. They might be ugly and neglected fragments of our history, but they still got us where we are today.

Literally destroying our history. Ugly and neglected fragments of our history, but that’s our history. Yahoo destroyed it.

And also the apologists, whenever something like this happens—because the BBC have been talking about pulling some of their stuff off the web as well—and there are two different arguments that are mutually incompatible about why this stuff might need to get pulled off the web:

  1. Well, nobody’s visiting it, so nobody’s going to miss it so we might as well pull it off the web.
  2. That the bandwidth costs of keeping this stuff up and all these people visiting it…

So …which is it? Either slap some ads on it and make use of that traffic, or if there’s not many people looking at it, what’s the harm keeping it up?

Anyway, the person who summed it up best I think is Jason Scott. He got together the Archive Team. They managed to save a fair proportion of Geocities. Not the URLs obviously, and believe me there’s a whole bunch of links on Wikipedia that suddenly went dead the day that Geocities died, but he put it nicely, he gave us the historical perspective. He said:

When history takes a look at the lives of Jerry Yang and David Filo, this is what it will probably say. Two graduate students, intrigued by a growing wealth of material on the internet, built a huge fucking lobster trap, absorbed as much of human history and creativity as they could, and destroyed all of it.

And I’m not just talking about the big things like this. And let’s face it; it’s only a matter of time before MySpace goes the way of Geocities. And you might think, “well yeah, who’s going to miss MySpace?” Think about the amount of creativity that’s poured in. Maybe not by us professional designers. By the ordinary, everyday people. That’s history.

I’ve been bitten by this. Anybody remember the site Pownce? A few, a few. It was a lovely site. I really liked Pownce. And I poured a lot of myself into it. It was really beautifully designed. Daniel Burka was the designer at Pownce. A really well put together site, and it’s gone now. In this case it wasn’t wilful destruction, but Pownce got bought up by a larger company. Pownce was basically just two or three people, they did a fantastic job, and they were bought up by Six Apart, you know, the people who do Movable Type. And Six Apart shut down Pownce after they bought it up. But it’s okay, everyone who did have a Pownce account, you can have a free Vox account, right? And I think it was six months later, Vox was shut down. Taking seven million URLs off the web.

At this stage I’d kind of learned my lesson, but I also have had services go down. I used to use Magnolia for my bookmarking. That was a different case again; that was technical error that everything just got wiped out. I did manage to recover my bookmarks. So you know, they’re on Delicious, they’re perfectly safe, right?

After the whole sun-setting slide debacle, I decided alright, this is it. I can’t keep entrusting stuff to these third party services, no matter how big they seem, too big to fail. I cannot rely on them for anything I really value. Well what’s the solution?

Well in the case of my links, because that’s a fairly easy thing to handle, I now host my own links, my own sort of bookmarking service. But what I do is, I also ping Delicious. Every time I link to something through my own little hand-rolled CMS, from my own website, I also fire off a post to Delicious, so Delicious is still where you can find my links, but Delicious doesn’t hold the canonical copy. And this is something I think you need to bear in mind if you try to go down the route of self-hosting, which I think is the way we should be going, but we still want to get the benefits of those big services. We want to get the tool benefits, we want to get the network effects, the social benefits.

So a model that works well seems to be publish on your own URL on your own website, and syndicate out to third party services. So I’m not suggesting that the solution to all of this is that you’ve got to hold onto your own stuff, hold onto your data at your own URL, your own little island, but no, that you publish in your own place and allow many copies to be circulated around the web. And some people are doing this. It’s something that Steven Pemberton’s been talking about. Tantek Çelik is doing it for his tweets. He tweets from his own website. He tweets from tantek.com, there’s a copy of that stored on Twitter, but the canonical copy is on his own site.
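As a rough sketch of that model (publish at your own URL first, then send copies out), something like the following would do. It is not the code behind any real site: `OwnSite`, `publish_and_syndicate`, the two pretend services and the example.com address are all hypothetical, and real third-party services would each need their own API calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Post:
    slug: str
    text: str


@dataclass
class OwnSite:
    """Stand-in for your own website: it holds the canonical copies."""
    base_url: str
    posts: Dict[str, Post] = field(default_factory=dict)

    def publish(self, post: Post) -> str:
        """Store the canonical copy and return its URL."""
        self.posts[post.slug] = post
        return f"{self.base_url}/{post.slug}"


# A syndicator takes the post text and the canonical URL and sends a copy elsewhere.
Syndicator = Callable[[str, str], None]


def publish_and_syndicate(site: OwnSite, post: Post,
                          syndicators: List[Syndicator]) -> str:
    canonical_url = site.publish(post)  # own site first
    for send in syndicators:
        try:
            send(post.text, canonical_url)
        except Exception as err:
            # A third party failing must not affect the canonical copy.
            print(f"syndication failed, canonical copy unaffected: {err}")
    return canonical_url


site = OwnSite(base_url="https://example.com/links")
third_party_copies: List[Tuple[str, str]] = []


def fake_bookmark_service(text: str, canonical_url: str) -> None:
    third_party_copies.append((text, canonical_url))


def defunct_service(text: str, canonical_url: str) -> None:
    raise RuntimeError("service has shut down")


url = publish_and_syndicate(
    site,
    Post(slug="cool-uris", text="Cool URIs don't change"),
    [fake_bookmark_service, defunct_service],
)
print(url)
```

The ordering is the design choice that matters here: the copy on your own site exists before any third party hears about it, so a service shutting down costs you a copy, not the original.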

And the problem is that this is still too geeky, right? I can do this maybe, and again only for the simple cases like links. What am I going to do about my pictures? That’s a lot harder. Video, really hard. The tools for self-hosting are still way too hard and way too geeky. But that’s an area of challenge for us, I think. I think it’s a design challenge.

Also, this idea that yes, I will let the third party services have my data but I want to hold onto the original copy. Well it turns out some of the third party services are not so keen on that. Just last night, I don’t do it very often, but I logged into Facebook, and I never publish anything original on Facebook. I allow Facebook to get, you know, my pictures from Flickr. I allow Facebook to get my blog posts posted to my wall in Facebook, but when I logged in last night, I was greeted with this message:

You currently automatically import content from your website or blog into your Facebook notes. Starting November 22nd, this feature will no longer be available, although you’ll still be able to write individual notes. The best way to share content from your website is to post links on your wall.

Talk about not meeting you half-way.

And there’s another problem with self-hosting. So you have your own URL, your own place on the web and that’s where you post a canonical copy. You don’t actually have your own URL. You don’t buy a URL or domain name. You rent it. And it’s certainly not long-term thinking if you think about the length of time you rent domain names for. A year, two years maybe? Five years, ten years? That’s not that long when you’re talking about your family’s memories.

It’s really interesting. If you look at the domain name system. Everything else that’s good about the web is decentralised. And there’s this one point of centralisation that’s with ICANN. Now I’m not suggesting that ICANN should go and I’m not proposing an alternative. I’m just pointing it out, that one of the great things about the web as a network, as a platform, is that it doesn’t have centralisation. It has large hubs, but it doesn’t have centralisation, and this is one example where you do have centralisation. Maybe we should go back to just using IP numbers, I don’t know. It’ll be interesting to see how important domain names are. With the rise of Google maybe you just need to search for stuff.

So even assuming we can store stuff on our domain and we have taken steps to make sure that domain lasts a long time, there’s still the question of the formats. The formats that you’re storing stuff in today, how do you know that those formats will be readable in the future?

A case in point, and this is more about the medium than the format. This is the Domesday Book. Commissioned by William the Conqueror after 1066. Essentially, a census of the UK, and it still exists today in the British Library in London. The format is an older version of English, but recognisably English. The medium is vellum, and it’s surprisingly durable.

Well, coming up to the 900th anniversary of the Domesday Book, the BBC commissioned The Domesday Project to create a digital version of the Domesday Book, and the medium in this case was the laserdisc. It was actually a customised special form of the laserdisc, so it was even more niche. What they were trying to do was fantastic, to get this into schools and allow people access to information, but the problem of formats and media really raised its head.

Another long-term format is stone. Stone lasts a long time. Depends on the stone, but here’s an example of a long-lasting medium, but a format that didn’t last that long. One of the formats on the Rosetta Stone, which you can see in the British Museum, is Egyptian hieroglyphics. And by the time of the discovery of the Rosetta Stone—the Rosetta Stone dates back to maybe 196 BC or so: it was rediscovered in 1799 by French troops—and Egyptian hieroglyphics at this point are completely unreadable. The ability to decode that format was lost. Luckily, there were two other formats on the stone medium, and that was Ancient Greek and Demotic script and that allowed the code to be cracked. Champollion was able to crack the code of hieroglyphics because there was a longer lasting format, two redundant longer lasting formats encoded along with it.

So the issue of formats is a tricky one. I want to skip ahead rather than behind and read to you from a work of science fiction. This is from Glasshouse by Charles Stross. It’s set a couple of hundred years from now in a post-singularity society, and someone is looking back at our time and describing it to people:

For reasons of commercial advantage, some of their largest entities deliberately created incompatible information formats and locked up huge quantities of useful material in them, so that when new architectures replace old, the data became inaccessible. This partially affected our records of personal and household activities during the latter half of the Dark Age. Early on, for example, we have a lot of film data captured by amateurs and home enthusiasts. They used a thing called a cine camera, which captured images on a photo chemical medium. You could actually decode it with your eyeball. But a third of the way into the Dark Age, they switched to using magnetic storage tape which degrades rapidly, then to digital storage, which was even worse, because for no obvious reason, they encrypted everything. The same sort of thing happened to their audio recordings and to text. Ironically, we know a lot more about the culture around the beginning of the Dark Age around old-style year 1950 than about the end of the Dark Age around 2040.

So when you’re thinking about the formats, a couple of little guidelines. First of all, text is good. In terms of longevity. Plain text is obviously the simplest kind, and then we get progressively more complex after that. HTML kind of strikes a nice balance but it’s basically a text format. It’s good for a couple of reasons. First of all, it’s human readable. We can decode the actual source of a text file with our eyeballs, and that’s useful. Also, it accepts lossiness. If you were to lose chunks of the text file, you could still make sense of the rest of it, or at least the words that were left, the atomic units.

If you contrast that to binary formats, images and video, more complex things, it gets a lot harder. When we look at the source code of an image or a video, it’s very hard to decode, and once you lose part of that file, it may not be recoverable. You may not be able to get any idea of what the rest was, so these binary formats make a lot of sense to machines, but less sense to humans.
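Here is a small experiment along those lines (not from the talk). It damages a few bytes of a plain text file and of a compressed copy of the same text. The big assumption is that zlib compression is a fair stand-in for binary formats in general; real image and video formats vary, and some degrade more gracefully than this.

```python
import zlib

text = ("Tomorrow, and tomorrow, and tomorrow, creeps in this petty pace "
        "from day to day, to the last syllable of recorded time. ") * 4


def damage(data: bytes, start: int, length: int) -> bytes:
    """Return a copy of data with `length` bytes flipped, beginning at `start`."""
    chunk = bytes(b ^ 0xFF for b in data[start:start + length])
    return data[:start] + chunk + data[start + length:]


plain = text.encode("ascii")
packed = zlib.compress(plain)  # stand-in for a binary format (an assumption)

damaged_plain = damage(plain, 40, 16)
damaged_packed = damage(packed, len(packed) // 2, 4)

# Plain text: the damaged bytes become replacement marks, everything else survives.
readable = damaged_plain.decode("ascii", errors="replace")
print(readable[:120])

# Compressed data: the decoder rejects the damaged stream as a whole.
try:
    zlib.decompress(damaged_packed)
    recovered = True
except zlib.error:
    recovered = False
print("compressed copy recovered:", recovered)
```

Sixteen damaged bytes cost the text file a couple of words, while four damaged bytes leave the compressed copy unreadable by the standard decoder.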

So it’ll be interesting to know which of the formats we’re using today are likely to last longer. You think about it, every time you put something online, what format is it in? If it’s an image, what image format? Is it jpeg, png? Video, what video format?

And there is an ongoing experiment to figure out sort of the longevity of some formats. There’s a group called PLANETS, Preservation and Long-term Access through Networked Services, and the Open Planets Foundation started a project on 18 May 2010 to put a number of formats and a number of media into a vault, essentially, in Swiss Fort Knox. I don’t mean something like Fort Knox in Switzerland. This place is called Swiss Fort Knox, and this is it. This is real. This is for data storage and back-ups. It’s awesome. It’s like a James Bond villain lair. You can land your helicopter there. This is where I want to store my stuff!

So this is near Galen in Switzerland, and I think there’s a twenty year experiment to keep this stuff in there. So here’s what they’re storing. They’ve got the following media:

  • Paper,
  • microfilm,
  • floppy disc,
  • audio tape,
  • CD,
  • DVD,
  • USB and
  • BluRay.

And then the formats stored on each one of those are:

  • .mov,
  • .jpeg,
  • .pdf,
  • Java and
  • HTML.

I know which one I would bet on in twenty years’ time, and that would be the HTML. Obviously I’m a bit of a fanboy for HTML as you may have guessed. But it’s not a by-product of HTML that is going to last a long time. That’s actually a design decision of HTML that it’s going to last a long time.

I’m going to show my age. Anybody remember a website by Owen Briggs called The Noodle Incident? All right, a few. He was awesome. Here’s a blog post from 2001, where he’s talking about the importance of standards and more specifically HTML. Essentially about validating your HTML and not using proprietary stuff. He said:

The code has to expand its capabilities as we do, yet never in a way that blocks out earlier documents. Everything we put down now will continue to be readable as long as it was recorded using mark-up valid at its time. This is an attempt to make a code that can go decades and centuries, getting broader in scope without ever shutting out its early versions.

That’s a beautiful summation of the design principles behind HTML that are currently being carried on with HTML5. We get very caught up with the shininess of the new stuff, but a lot of it is about backwards compatibility. Ian Hickson, the editor of the HTML spec, has said this right up-front. In 2007 he said:

I decided for the sake of future generations we should document exactly how to process today’s documents so that when they look back they can still re-implement HTML browsers and get our data back.
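Here is a small Python sketch of what that tolerance looks like in practice. The page below is a made-up example of old-style markup with uppercase tags, unquoted attributes, presentational elements and unclosed tags. The parser is the one in Python’s standard library, which is lenient but is not an implementation of the HTML5 parsing algorithm, so treat this as an illustration of the idea rather than of the spec.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Record every start tag and every word of text the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lowercase.
        self.tags.append(tag)

    def handle_data(self, data):
        self.words.extend(data.split())


# Hypothetical page in the style of the 1990s: nothing here is closed properly.
old_page = (
    "<HTML><BODY BGCOLOR=white><CENTER><FONT SIZE=4>"
    "Welcome to my home page<P>Under construction<BLINK>!</BLINK></BODY>"
)

collector = TextCollector()
collector.feed(old_page)
collector.close()

print(collector.tags)
print(" ".join(collector.words))
```

The markup is a mess by today’s standards, but the words come out the other end without complaint.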

Mark Pilgrim, before he decided to take every single thing he ever put on the internet offline, two years ago wrote:

I am supremely confident that the HTML I’m writing today will still be readable ten years from now.

If only he had decided to keep the URL intact.

There’s an issue, if you’ve been thinking about what you might put on your Voyager record and you might decide, well, I’m going to store it in HTML because that seems a good, durable format, on my own domain, I’m not going to trust a third party server. There’s still the issue of rights.

This is a fairly recent addition to human endeavour, this whole idea …first of all there’s this nonsensical idea of Intellectual Property, which has no legal meaning but is used a lot. But just licensing in general. And as I said before, the internet is a copying machine. You want to make lots of copies of stuff to make use of that, but then we restrict how you can copy stuff by imposing licensing upon it, which seems a strange thing to do. Particularly we have this issue of copyright.

We act like copyright is this immovable thing that’s always been there and it’s carved in stone. Copyright’s always been changing. There was no copyright until 1709, the Statute of Anne, and then it was fourteen years after the creation of a work, which seemed like a decent amount of time to make your money and then put it into the public domain. We got later Acts. In the US, the original Copyright Act was 1790, and then there was 1909. In 1976 it got increased to the lifetime of the author plus fifty years, and then in 1998 it was increased to the lifetime of the author plus seventy years. This was the Sonny Bono Copyright Term Extension Act, also called the Mickey Mouse Protection Act, because that would’ve been the year that would’ve been fifty years since the death of Walt Disney, and Mickey Mouse would’ve passed into the public domain. There was a lot of lobbying to stop that happening so it became seventy years, and then after that, once it gets to seventy years since the death of Walt Disney, it’ll be ninety years, a hundred years. Essentially what we have now is copyright in perpetuity.

Again, I’m going to read from Charles Stross. This is from a different work called Accelerando. The whole text of Accelerando is available online if you want to read it. This is more talking about a much nearer future to today, very near future from where we are now.

The International Convention on Performing Rights is holding a third round of crisis talks in an attempt to stave off the final collapse of the WIPO music licensing regime. On the one hand, hard-liners representing the Copyright Control Association of America are pressing for restrictions on duplicating the altered emotional states associated with specific media performances. As a demonstration that they mean business, two software engineers in California have been kneecapped, tarred, feathered and left for dead under placards accusing them of reverse engineering movie plotlines using avatars of dead and out of copyright stars. On the opposite side of the fence the Association of Free Artists are demanding the right to perform music in public without a recording contract, and are denouncing the CCAA as being a tool of the mafia apparatchiks who have bought it from the moribund music industry in an attempt to go legit. FBI Director Leonid Kuibyshev responds by denying that the Mafia’s a significant presence in the United States but the music business position isn’t strengthened by the near collapse of the legitimate American entertainment industry, which has been accelerating ever since the Nasty Noughties.

The worst example of licensing and formats coming together is what’s so euphemistically known as DRM, Digital Rights Management. This is where you literally encode the licensing restrictions into the format, thereby ensuring that this data cannot survive, that it will not have a long life. If you subscribe to a third party service that used some form of DRM and that third party service is gone, there’s no way to access your data, so if you were a customer of Virgin Digital’s music service, or MSN Music, you cannot get at any of that data. Virgin Digital shut down in 2007, MSN Music in 2008. They were all using DRM. None of that data is accessible now. And it seems like such a losing battle.

Bruce Schneier, the security expert, he put it best, he said:

Trying to make digital files uncopyable is like trying to make water not wet.

Go with the flow of the web. Let your content out there. Let people make copies and they will return to you 100-fold.

There’s a beautiful story about a different third party service. There’s a site originally called Ficlets. Anybody remember Ficlets? OK, not many. It was a lovely service where you’d write a story of 100 words and then somebody else could kind of fork it and write the next 100 words, and then somebody else could do the same. Like GitHub for fiction, it was really lovely. And a bunch of people who were working at AOL at the time did this kind of as a side project and it was hosted by AOL, it was on AOL Services, and AOL being AOL of course decided to shut it down after a while because they’re, if anything, even worse than Yahoo when it comes to shutting down this sort of stuff.

But here’s the great thing. When these people set up Ficlets, one of the things you did when you were signing up was you agreed to license your stories, your content, under a Creative Commons Licence, Creative Commons Attribution Licence, so when AOL said to them, “we’re shutting down the service,” they said, “that’s absolutely fine, we’re taking all the data with us.” And they could do that, and now you can go to Ficly.com and carry on publishing those stories.

So let me pause and just wrap up what I’ve gone through.

First thing. First thing is acknowledging the existence of the problem and questioning the idea that the internet never forgets, because the data would seem to suggest otherwise. We all have lots of stories about people who’ve been bitten on the ass by it. Oh, they published something on Facebook or this photo went online and then they lost their job, but check out the timeline; it’s very rarely that they published a photo ten years ago and that cost them their job. It’s usually much shorter than that. Also, the plural of anecdote is not data. So, acknowledging the problem.

Then there’s the issue of hosting. Are you going to trust your hopes and dreams, your memories to a third party service? You’d better have a pretty good reason for doing that. You can try self-hosting. The problem is that’s still kind of geeky, so that’s a problematic area.

Then there’s the formats you decide to store your information and your memories in. I recommend standards because they last longer. Steer clear of proprietary formats. Stay close to standards, in particular I like HTML.

And then there’s licensing. I think it makes sense to license liberally rather than restrictively so that multiple copies can legally be made of your hopes and dreams and data. There’s this term from the digital preservation people at Stanford, which they call LOCKSS: Lots Of Copies Keep Stuff Safe. I like that.
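The "lots of copies" idea can be sketched in a few lines of Python. This is only an illustration of the principle, and the real LOCKSS software is considerably more involved. The function names and the three storage places are hypothetical. Each copy is hashed, the most common hash wins, and any copy that disagrees is replaced from a good one.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Fingerprint a copy so that copies can be compared cheaply."""
    return hashlib.sha256(data).hexdigest()


def repair_by_majority(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Return the copies with any dissenting copy replaced by the majority version."""
    votes = Counter(digest(data) for data in copies.values())
    winning_hash, _ = votes.most_common(1)[0]
    good = next(data for data in copies.values() if digest(data) == winning_hash)
    return {place: good for place in copies}


# Hypothetical storage places; one copy has quietly rotted.
original = b"<p>Pleasant is the glint of the sun today upon these margins.</p>"
copies = {
    "my-own-domain": original,
    "friend-mirror": original,
    "old-hard-drive": original[:20] + b"???" + original[23:],
}

repaired = repair_by_majority(copies)
print(all(data == original for data in repaired.values()))
```

With only one copy there is nothing to vote with, which is the whole argument for licensing that lets other people keep copies too.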

So then it’s just a matter of thinking, well what would I store for future generations? Suppose I am going to think long term and the legacy. What would you put on that Voyager record?

And we have some records from previous civilisations, essentially. This is from a fresco in Pompeii. Pompeii was utterly destroyed in the year 79 AD, rediscovered in 1599, and all of this stuff was intact. There’s a lot of porn! There’s a lot of porn in Pompeii, and I don’t think we should try to deny all sides of our nature. So you might think, “well I’m going to put the best thing I’ve ever done online. I’m going to edit myself, only put the good stuff online.” Think about putting the failed things, the crappy things online. The kind of thing Phil Gyford meant when he described Geocities as an “awful, ugly, decrepit mess.” Keep that stuff.

There’s a similar example of this idea of whether we censor history if it’s not high art. This was at the end of World War 2, the Russians had reached Berlin and they were at the Reichstag and they were scrawling graffiti on the walls of the Reichstag. It’s pretty filthy stuff, this graffiti, it’s got to be said, but it’s historically really important. And after reunification, they had to decide whether to scrub the walls when they were moving back into the Reichstag, and I think they made the right decision to keep this stuff, to keep it as a historical document.

Of course, forty years later, it was a different wall that was being torn down.

And I think it’s important to remember it’s not about the high art. It’s often about the everyday people, and remembering what everyday people did. We’ve got beautiful illuminated manuscripts, they’ve survived hundreds of years, and there’s the actual content, sure, usually the Gospels, religious material that’s in there. But there’s also the marginalia that the monks themselves wrote in there. Little updates on their life, usually in 140 characters or fewer. One wrote:

My hand is weary with writing, my sharp quill is not steady. My slender beaked pen juts forth a black draught of shining dark blue ink.

Another one wrote:

Pleasant is the glint of the sun today upon these margins because it flickers so.

I love that. It’s kind of like what Patrick Kavanagh the poet wrote about as ‘wallowing in the habitual, the banal.’ He had that beautiful phrase, ‘wherever life pours ordinary plenty.’ Ordinary plenty. That’s what we should be preserving.

And it can be easy to just throw up our hands and say, I don’t know. This seems too hard, it’s too big a challenge. My friend Dan, he wrote this on Twitter, he said:

Reality? Very little on the web will be permanent. Embrace that.

I fully agree with the first statement. Very little on the web will be permanent, but I utterly reject, I rage, rage against the dying of the light, the idea that we embrace that.

My other friend Mandy Brown put it nicely when she said:

No civilisation has ever saved everything. Acknowledging that fact does not obviate the need to try and save as much as we can. The technological means to produce an archive are not beyond our skills. Sadly, right now at least, the will to do so is insufficient.

I think that’s really important to remember. I don’t think this is a technological problem. Maybe there’s legal issues, there’s cultural issues. Some technological issues around formats and storage, but ultimately it’s about our will and our decision that we want to preserve this stuff.

Wilson put it really nicely earlier when he quoted John Ruskin:

When we build, let us think that we build for ever.

And I really think this is an important thing, and this isn’t just something from the tippy-top of Maslow’s Hierarchy of Needs. I think this is important for our race, for our culture. We don’t want this to become a digital Dark Age. I think the internet is important as a real-time medium, and equally important as a storage medium. It allows us to tell each other things faster, which is hugely important for a civilisation, and if we can also reach back into time on the web, that would be magnificent.

I want to thank these people for being kind enough to share their photos under a liberal Creative Commons Licence. This is licensed under Creative Commons Attribution Licence, and I want to thank you for listening. Thank you very much.

Licence

This presentation is licensed under a Creative Commons attribution licence. You are free to:

Share
Copy, distribute and transmit this presentation.
Remix
Adapt the presentation.

Under the following conditions:

Attribution
You must attribute the presentation to Jeremy Keith.