What is still on the web after 10 years of archiving? - UK Web Archive blog
The short answer: not much.
The UK Web Archive at The British Library outlines its process for determining just how bad the linkrot is after just one decade.
I’d go along with pretty much everything Anil says here. Wise words from someone who’s been writing on their own website for fifteen years (congratulations!).
Link to everything you create elsewhere on the web. And if possible, save a copy of it on your own blog. Things disappear so quickly, and even important work can slip your mind months or years later when you want to recall it. If it’s in one, definitive place, you’ll be glad for it.
A deeply thoughtful piece (as always) by Wilson, on the mindset needed for a sustainable way of working.
When we start with the assumption that optimizing for rapid, unbounded growth is a goal, we immediately narrow the possibility space. There are only so many choices we can make that will get us there. The same choices that made annual monoculture and the shopping mall the most efficient engines for short-term growth and profit are the same qualities that made them unsustainable in the long term.
There are more ways to scale than growth. There are more ways to deepen our impact than just reaching more people. What if we put just as much effort into scaling the impact of our work over time? Can we build digital products around sustainable systems that survive long enough to outlive us, that are purpose-built to thrive without our constant cultivation?
A documentary on our digital dark age. Remember this the next time someone trots out the tired old lie that “the internet never forgets.”
If we lose the past, we will live in an Orwellian world of the perpetual present, where anybody that controls what’s currently being put out there will be able to say what is true and what is not. This is a dreadful world. We don’t want to live in this world. —Brewster Kahle
It’s a terrible indictment of where our priorities were for the last 20 years that we depend essentially on children and maniacs to save our history of this sort. —Jason Scott
Glenn eloquently gives his reasons for building Transmat:
When I was a child, my brothers and I all had a shoebox each. In these we kept our mementoes. A seashell from a summer holiday where I played for hours in the rock pools, the marble from the schoolyard victory against a bully and a lot of other objects that told a story.
A bit of web history recreated by Paravel: the Microsoft homepage from 1994. View source to see some ooooold-school markup.
Over 700 screenshots of ZX Spectrum games, captured by Jason Scott. Some of these bring back memories.
The first Lunar Orbiter, Andy Warhol’s Amiga, and George R.R. Martin’s WordStar …the opening address to the Digital Preservation 2014 conference July 22 in Washington, DC.
Just as early filmmakers couldn’t have predicted the level of ongoing interest in their work over a hundred years later, who can say what future generations will find important to know and preserve about the early history of software?
(Mind you, I can’t help but feel that the chances of this particular text having a long life at a Medium URL are pretty slim.)
The Internet forgets every single day.
I’m with Jason.
I encourage you all to take a moment and consider the importance of preserving your online creations for yourself, your family, and for future generations.
On the fifth anniversary of Pinboard, Maciej reflects on working on long-term projects:
Avoiding burnout is difficult to write about, because the basic premise is obnoxious. Burnout is a rich man’s game. Rice farmers don’t get burned out and spend long afternoons thinking about whether to switch to sorghum.
The good news is, as you get older, you gain perspective. Perspective helps alleviate burnout.
The bad news is, you gain perspective by having incredibly shitty things happen to you and the people you love. Nature has made it so that perspective is only delivered in bulk quantities. A railcar of perspective arrives and dumps itself on your lawn when all you needed was a microgram.
Some good ideas from Matt on the importance of striving to maintain digital works. I find it very encouraging to see other people writing about this, especially when it’s this thoughtful.
A truly wonderful piece by Mandy detailing why and how she writes, edits, and publishes on her own website:
No one owns this domain but me, and no one but me can take it down. I will not wake up one morning to discover that my service has been “sunsetted” and I have some days or weeks to export my data (if I have that at all). These URLs will never break.
A thoughtful in-depth piece that pulls together my hobby horses of independent publishing, responsive design, and digital preservation, all seen through the lens of music:
Music, Publishing, Art and Memory in the Age of the Internet
This is a wonderful piece of writing and thinking from Frank. A wonderful piece of design, then.
A personal view on generalists and trans-media design
We need a web design museum.
I am, unsurprisingly, in complete agreement. And let’s make lots of copies while we’re at it.
A short video featuring Jason Scott and Brewster Kahle. The accompanying text has a shout-out to the line-mode browser hack event at CERN.
Having experienced the death of a friend, I wonder how many have considered the ghosts in the machine.
The video of my closing talk at this year’s Full Frontal conference, right here in Brighton.
I had a lot of fun with this, although I was surprisingly nervous before I started: I think it was because I didn’t want to let Remy down.
In describing her approach to building the wonderful Julius Cards project, Chloe touches on history, digital preservation, and the future of the web. There are uncomfortable questions here, but they are questions we should all be asking ourselves.
Brightonians, get yourselves along to the Corn Exchange on Monday evening for some fun with Seb’s digital fireworks.
Lawrence Lessig and Jonathan Zittrain are uncovering disturbing data on link rot in Supreme Court documents: 50% of the links cited no longer work.
An epic tale of data recovery.
Of course Jason Scott was involved.
Some good advice on how to mothball (rather than destroy) a project when it reaches the end of its useful life. In short, build a switch so that, when the worst comes to the worst, you can output static files and walk away.
In all your excitement starting a new project, spend a little time thinking about the end.
I took a little time out of the hacking here at CERN to answer a few questions about the line-mode browser project.
A heartfelt response from Vitaly to .net magazine’s digital destruction.
This is what I’m working on today (where by “working on”, I mean “watching other far more talented people work on”).
A timeline of technology.
Aaron Straup Cope and Seb Chan on the challenges of adding (and keeping) code to the Cooper-Hewitt collection:
The distinction between preservation and access is increasingly blurred. This is especially true for digital objects.
The internet never forgets? Bollocks!
We were told — warned, even — that what we put on the internet would be forever; that we should think very carefully about what we commit to the digital page. And a lot of us did. We put thought into it, we put heart into, we wrote our truths. We let our real lives bleed onto the page, onto the internet, onto the blog. We were told, “Once you put this here, it will remain forever.” And we acted accordingly.
This is a beautiful love-letter to the archival web, and a horrifying description of its betrayal:
When they’re erased by a company abruptly and without warning, it’s something of a new-age arson.
Oh, dear. An otherwise perfectly well-reasoned article makes this claim:
But the internet is peculiarly adapted to deftly pricking pomposity. This is partly because nothing dies online, meaning your past indiscretions are never yesterday’s news, wrapped round the proverbial fish and chips.
Bollocks. Show me the data to back up this claim.
The insidious truism that “the internet never forgets” is extremely harmful. The true problem is the opposite: the internet forgets all the time.
Geocities, Pownce, Posterous, Vox, and thousands more sites are very much yesterday’s news, wrapped round the proverbial fish and chips.
The Internet, day one. A sad tale of data loss.
This is why the Internet Archive matters. It is now the public record of Obama’s broken promise to protect whistleblowers.
I feel very bad for the smart, passionate, talented people who worked their asses off on change.gov, only to see their ideals betrayed.
A good article on Medium on Medium.
What I fear is that the entire web is basically becoming a slow-motion Snapchat, where content lives for some unknowable amount of time before it dies, lost forever.
A great history lesson from Dave.
Ah, I remember when the CSS Zen Garden was all fields. Now get off my CSS lawn.
I gave the opening keynote at the Beyond Tellerrand conference a few weeks back. I talked about the web from my own perspective, so expect excitement and anger in equal measure.
This was a new talk but it went down well, and I’m quite happy with it.
Ben proposes an alternative to archive.org: changing the fundamental nature of DNS.
Regarding the boo-hooing of how hard companies have it maintaining unprofitable URLs, I think Ben hasn’t considered the possibility of a handover to a cooperative of users, something that might yet happen with MySpace (at least there’s a campaign to that effect; it will probably come to naught). As Ben rightly points out, domain names are leased, not bought, so the idea of handing them over to better caretakers isn’t that crazy.
This is a breath of fresh air: a blogging platform that promises to keep its URLs online in perpetuity.
Perhaps we are fetishising physical things because our digital creations are social media junk food:
It’s easy to fetishize Brutalist buildings when you don’t have to live in them. On the other hand, when the same Brutalist style is translated into the digital spaces we daily inhabit, it becomes a source of endless whinging. Facebook, for example, is Brutalist social media. It reproduces much the same relationship with its users as the Riis Houses and their ilk do with their residents: focusing on control and integration into the high-level planning scheme rather than individual life and the “ballet of a good blog comment thread”, to paraphrase Jane Jacobs.
Mark writes about his work with CERN to help restore the first website to its original URL.
I have two young children and I want them to experience the early web and understand how it came to be. To understand that the early web wasn’t that rudimentary but incredibly advanced in many ways.
A beautiful short film on the amazing work being done at the Internet Archive, produced on the occasion of their 10 petabyte celebration.
A profile in The Guardian of the Internet Archive and my hero, Brewster Kahle (who also pops up in the comments).
Heartbreaking and angry-making.
The story of one site’s disgraceful handling of acquisition and shutdown (Punchfork, acquired by Pinterest) and how its owner actively tried to block efforts to preserve users’ data.
A collection of those appalling doublespeak announcements that sites and services give when they get acquired. You know the ones: they begin with “We’re excited to announce…” and end with people’s data being flushed down the toilet.
A wonderful rallying cry from Drew.
Ever since the halcyon days of Web 2.0, we’ve been netting our butterflies and pinning them to someone else’s board.
Hope that what you’ve created never has to die. Make sure that if something has to die, it’s you that makes that decision. Own your own data, friends, and keep it safe.
David gets physidigital.
A really lovely piece on the repositories of information that aren’t catalogued—a fourth quadrant in the Rumsfeldian taxonomy, these dark archives are the unknown knowns.
Honestly, if you value the content you create and put online, then you need to be in control of your own stuff.
What an Orwellian title for a blog post announcing the wholesale destruction of users’ content. Oh, Yahoo, you sound so proud of your cavalier attitude towards the collective culture that you have harvested.
A fascinating discussion on sharecropping vs. homesteading. Josh Miller from Branch freely admits that he’s only ever known a web where your content is held by someone else. Gina Trapani’s response is spot-on:
For me, publishing on a platform I have some ownership and control over is a matter of future-proofing my work. If I’m going to spend time making something I really care about on the web—even if it’s a tweet, brevity doesn’t mean it’s not meaningful—I don’t want to do it somewhere that will make it inaccessible after a certain amount of time, or somewhere that might go away, get acquired, or change unrecognizably.
When you get old and your memory is long and you lose parents and start having kids, you value your own and others’ personal archive much more.
I hereby declare that this song is my official anthem.
I want some files that last, data that will not stray.
Files just as fresh tomorrow as they were yesterday.
Investigating the options for off-world backups.
Data is only as safe as the planet it sits on. It only takes one rock, not too big, not moving that fast, to hit the Earth at a certain angle and: WHAM! Most living species are done for.
How the hell is your Twitter archive supposed to survive that?
Here’s a treasure trove of web history: an archive of the www-talk list dating back to 1991. Watch as HTML gets hammered out by a small group of early implementors: Tim Berners-Lee, Dave Raggett, Marc Andreessen, Dan Connolly…
A nice Readlist based on that excellent article by Craig on digital publishing:
This reader is made up of Craigmod’s essay “Subcompact Publishing” and essays linked to in the footnotes.
Very smart thinking from Craig about digital publishing.
Jason goes into detail describing the File Format problem that he and others are going to tackle in the effort known as Just Solve The Problem.
A step-by-step guide to unDRMing your Kindle books—a prudent course of action given Amazon’s recent unilateral wiping of Kindles.
Live in or near San Francisco? Interested in preserving computer history? Then you should meet up with Jason this Friday:
This Friday, October 5th, the Internet Archive has an open lunch where there’s tours of the place, including the scanning room, and people get up and talk about what they’re up to. The Internet Archive is at 300 Funston Street. I’m here all week and into next.
This ticks all my boxes: a podcast by Eric and Jen about the history of the web. I can’t wait for this to start!
Honor compares next week in Brighton to Austin in March.
This is an important subject (and one very close to my heart) so I’m very glad to see these data protection guidelines nailed to the wall of the web over at Contents Magazine.
A cautionary tale from Dave Winer of not considering digital preservation from the outset. We must learn the past. We must.
Kellan explains the tech behind Old Tweets …and also the thinking behind it:
I think our history is what makes us human, and the push to ephemerality and disposability “as a feature” is misguided. And a key piece of our personal histories is becoming “the story we want to remember”, aka what we’ve shared.
Like the Web Standards Project but for ePub. I approve of this message.
An introduction to the important work of digital archivists:
Much like the family member that collects, organizes, and identifies old family photos to preserve one’s heritage, digital archivists seek to do the same for all mankind.
Just copy and paste.
Dear soon-to-be-former user…
A love letter to the Internet Archive.
The Long Now blog is featuring the bet between myself and Matt on URL longevity. Just being mentioned on that site gives me a warm glow.
Jason’s rip-roaring presentation from Defcon last year.
Now this is some prioritisation I can admire:
I’m going to build valuable, reliable, sustainable web services that will last forever.
A thoughtful—and beautifully illustrated—piece by Geri on memory and digital preservation, prompted by the shut-down of Gowalla.
The video of my talk from Webstock, all about wibbly-wobbly, timey-wimey stuff like networks and memory.
The video of my presentation on digital preservation at last year’s Build conference.
Our communication methods have improved over time, from stone tablets, papyrus, and vellum through to the printing press and the World Wide Web. But while the web has democratised publishing, allowing anyone to share ideas with a global audience, it doesn’t appear to be the best medium for preserving our cultural resources: websites and documents disappear down the digital memory hole every day. This presentation will look at the scale of the problem and propose methods for tackling our collective data loss.
Burying physical copies of dead websites in a Croatian cave.
Colly’s thoughts on digital preservation are written in a lighthearted tongue-in-cheek way but at least he’s thinking about it. That alone gives me comfort.
A beautiful reminder that by publishing on the web, we are all historians.
Every color you choose and line of code you write is a reflection of you; not just as a human being in this world, but as a human being in this time and place in human history. Inside each project is a record of the styles and fashions you value, the technological advancements being made in the industry, the tone of your voice, and even the social and economic trends around you.
This evolution of Tom Taylor’s microprinter looks like it’s going to be absolutely wonderful (and packed full of personality). Watch this space.
In a single post, Russell Davies manages to rehabilitate the term “post digital.” And he paints a vivid picture of where our “Geocities of things” is heading.
Reminiscences of the BBSs of yesteryear that could in time be applied to the social networking sites of today.
I’m going to try to make it along to this event in London next month.
A worrying report on the state of digital preservation and the web, specifically in the UK. Welcome to the memory hole.
A superb post by David that ties together multiple strands of personal digital preservation through homesteading instead of sharecropping.
Stewart Brand wrote this twelve years ago: it’s more relevant than ever in today’s cloud-worshipping climate.
I’d like to think that it’s ironic that I’m linking to The Wayback Machine because the original URL for this essay is dead. But it isn’t ironic, it’s horrific.
Amber documents her attempt to turn physical objects imbued with meaning into digital artefacts.
Here’s one to add to Instapaper or Readability to savour at your leisure: Aaron Straup Cope’s talk at Museums and the Web 2010:
This paper examines the act of association, the art of framing and the participatory nature of robots in creating artifacts and story-telling in projects like Flickr Galleries, the API-based Suggestify project (which provides the ability to suggest locations for other people’s photos) and the increasing number of bespoke (and often paper-based) curatorial productions.
September in Brighton is going to be ker-razy! Here’s a nice responsive holding page listing just some of the events that will be going on …dConstruct, Maker Faire, Flash On The Beach and more.
Digital preservation in the art world.
Luke’s notes from my talk about long-term thinking and online preservation at An Event Apart in Boston.
How the Mormon Church are storing and preserving genealogical data inside a mountain.
The editor of New Scientist writes about deletionists and preservationists while adding his own personal poignant perspective.
A blog devoted to sifting through the gems in the Geocities torrent. This is digital archeology.
The threat to Google Videos shows businesses are not suitable cultural custodians — they can’t be held accountable to the public.
Magazine creators share their experiences of going digital.
Andy hammers home the benefit of a long-term format like HTML compared to the brittle, fleeting shininess of an ephemeral platform-specific app.
A detailed look at how French archivists go about preserving websites.