As it turns out, some sites are much harder to archive than others. This article goes through the process of archiving traditional web sites and shows how it falls short when confronted with the latest fashions in the single-page applications that are bloating the modern web.
Friday, November 23rd, 2018
Sunday, September 9th, 2018
It turns out that a whole lot of The So-Called Cloud is relying on magnetic tape for its backups.
Friday, March 30th, 2018
A run-down of digital preservation technologies for very, very long-term storage …in space.
Tuesday, March 6th, 2018
My back-up strategy is similar to Brendan’s (using Super Duper and Backblaze):
In backup parlance there’s a thing called 3-2-1. That is, you should have three copies of your files: two locally on different devices and one off site.
But I only do my local back-ups once a week (eek!)—I should do better.
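For what it’s worth, the 3-2-1 rule quoted above is simple enough to check mechanically. Here’s a minimal sketch in Python, assuming each copy is described by the device it lives on and whether it is off site. The `BackupCopy` class, the `satisfies_3_2_1` function, and the device labels are all hypothetical names made up for illustration; they aren’t part of Super Duper, Backblaze, or any other tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of your files (hypothetical model, for illustration only)."""
    device: str    # free-form label, e.g. "laptop" or "external-drive"
    offsite: bool  # True if the copy lives somewhere other than your home or office


def satisfies_3_2_1(copies):
    """Return True if the copies meet the 3-2-1 rule as quoted above:
    at least three copies in total, at least two local copies on
    different devices, and at least one copy off site."""
    local_devices = {c.device for c in copies if not c.offsite}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(local_devices) >= 2 and has_offsite


if __name__ == "__main__":
    my_copies = [
        BackupCopy("laptop", offsite=False),
        BackupCopy("external-drive", offsite=False),
        BackupCopy("cloud", offsite=True),
    ]
    print(satisfies_3_2_1(my_copies))
```

Note that this only counts copies. It says nothing about how fresh each one is, which is exactly where a once-a-week local back-up falls down.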
Friday, January 26th, 2018
Off-site backups of humanity’s knowledge and culture, stored in different media (including pyramidal crystals) placed in near-Earth orbit, the moon, and Mars.
We are developing specialized next-generation devices that we call Archs™ (pronounced “Arks”), which are designed to hold and transmit large amounts of data over long periods of time in extreme environments, including outer space and on the surfaces of other planetary bodies.
Our goal is to collect and curate important data sets and to install them on Archs™ that will be delivered to as many locations as possible for safekeeping.
To increase the chances that Archs™ will be found in the future, we aim for durability and massive redundancy across a broad diversity of locations and materials – a strategy that nature itself has successfully employed.
Thursday, January 28th, 2016
This is intriguing—a Pinboard-like service that will create local copies of pages you link to from your site. There are plug-ins for WordPress and Drupal, and modules for Apache and Nginx.
Amber is an open source tool for websites to provide their visitors persistent routes to information. It automatically preserves a snapshot of every page linked to on a website, giving visitors a fallback option if links become inaccessible.
Thursday, March 5th, 2015
The most ambitious project from Archive Team yet: backing up the Internet Archive.
We can do this, people! Moore’s Law and all that.
Tuesday, August 13th, 2013
The Internet, day one. A sad tale of data loss.
Saturday, April 27th, 2013
A profile in The Guardian of the Internet Archive and my hero, Brewster Kahle (who also pops up in the comments).
Monday, December 31st, 2012
Investigating the options for off-world backups.
Data is only as safe as the planet it sits on. It only takes one rock, not too big, not moving that fast, to hit the Earth at a certain angle and: WHAM! Most living species are done for.
How the hell is your Twitter archive supposed to survive that?
Monday, November 26th, 2012
Marc Thiele, the lovely organiser of the Beyond Tellerrand conference, needs our help recovering the video footage from this year’s event:
The HDD with all recordings (16 talks, 2 cameras) crashed. After sending the HDD to a recovery center they sent me a quote about 2832 Euro for the recovery job.
That’s about $4000. So far it’s three quarters of the way there already! Let’s see if we can hit that target.
Sunday, August 28th, 2011
A superb post by David that ties together multiple strands of personal digital preservation through homesteading instead of sharecropping.
Wednesday, February 16th, 2011
I wish I had a teacher like David when I was in school.
URLs, permalinks, archives … preservation. It all matters so very much.
Friday, February 11th, 2011
This is the stuff James Bond stories are made of. Except in this case, the fortress exists to store data rather than criminal masterminds.
Tuesday, December 28th, 2010
This looks like it could be a handy tool for backing up Flickr photos.
Monday, October 26th, 2009
Here lies what we could salvage from the ashes of GeoCities.
Friday, June 5th, 2009
A Python script from Dan Benjamin to help you do your bit in battling the datapocalypse.
Wednesday, February 18th, 2009
To protect and to preserve
I’m gratified to see that my thoughts on archiving my data (prompted by the shutdown of Pownce, Magnolia, Ficlets, etc., etc., etc.) are shared by others. But it’s all well and good for me to talk about how I’m backing up by using APIs, RSS, PHP and other non-trivial technologies. As David said when he bookmarked my post:
Now if someone would build a backup-to-local system that I could use…
Now I’m wondering: is there a space for a piece of user-installable software, like Movable Type or Wordpress, that aggregates their data from sites across the web, and then presents it as a site? If there is, is it even possible to write it in a way that anyone who couldn’t have written it themselves can even use it? Can I write it just for myself in the first place?
And before you sneer at AOL people, these people who trusted AOL: how about your Flickr? Your Facebook? Whatever the hot new wig-wag that you’re dumping hours into without thinking about it? What, you’re paying for something? Check this recent event out, paying subscriber: you have shit. Because of a cascade of EULA and Best Practices, and most importantly, a complete disregard for the importance of this data, we’re going to let it happen again. And again. And again.
They’d go to a site, spider the living crap out of it, reverse engineer what they could, and then put it all up on archive.org or another hosting location, so people could grab things they needed. Fuck the EULAs and the clickthroughs. This is history, you bastards.
It’s still early days, but Archive Team now exists.
Tuesday, February 17th, 2009
Paul Mison shares his thoughts on moving towards a decentralised web of services rather than silos of data. "Now I'm wondering: is there a space for a piece of user-installable software, like Movable Type or Wordpress, that aggregates their data from sites across the web, and then presents it as a site? If there is, is it even possible to write it in a way that anyone who couldn't have written it themselves can even use it?"
Tuesday, February 10th, 2009
Archive your Twitter updates with this PHP script.