Fingers crossed for the digital archive
The times we’re living in seem like an Age of Exposure—exposure of, exposure to. But if our digital lives were to disappear, so little record might survive of this moment in time as to make it seem, to anyone looking back, like a new Dark Age.
The traces of our times, of us—our images, our words—we blithely consign to safekeeping in strings of 0s and 1s. Yet digital data is vulnerable. Whatever the hardware or software or the imagined sanctity of the ‘cloud’, its eventual failure is all but guaranteed. The same was doubtless said of our former reliance on flimsy, flammable paper. It still is: there was an outcry earlier this year when, as a cost-cutting measure, it was decided that archival copies of Acts of the British parliament—starting with the Brexit legislation—would be printed on paper rather than the traditional vellum.1
Long-term survival of cultural data can be flukey, it’s true, no matter what the medium. But the oldest records on paper in the British Parliamentary Archives, dating from the early sixteenth century, are still readily identifiable as historic documents and even legible to a trained eye. The same cannot be said of the contents of a floppy disk created in 1987. In the absence of compatible technology of the same vintage and in working order (a rarity), the 30-year-old disk is more inscrutable than any hieroglyph.
In a 2014 documentary titled Digital Amnesia, a US aerospace engineer and self-described ‘techno-archaeologist’2 reflected on the fragmentary nature of surviving records from antiquity: ‘We have less than one per cent of the written documents from Rome,’ said Dennis Wingo; ‘we have less than half a per cent of the written documents from Greece; from Egypt, from the Egyptian civilisation that lasted 3000 years, we have .0001 per cent of their documents.’ Then the kicker: ‘I have six hard drives from 1995; only three of them work.’ Those hard drives held irreplaceable data from an experiment conducted aboard a space shuttle.3
Geophysicists published research, also in 1995, that used eclipse observations carved on ‘oracle bones’4 by ancient Chinese astronomers to calculate changes in the Earth’s shape and rotation over the past 3000 years.5 The oracle bones’ survival accounts for half the story; just as critical is that they still ‘worked’, that they were decipherable. Preserving data is pointless unless it can be accessed. We rely on the long-term viability of digital data, not just to keep our stuff safe and retrievable but also to keep a record of our times.
Much of the personal, commercial, cultural, governmental and research data created during the past 20 years has been produced and stored digitally. Increasingly, that data is entrusted to web servers and the cloud—which is to say, enormous and enormouser data centres, or server farms. By 2020, it is estimated, the amount of digital data created every second will be equivalent to 1.7 megabytes for each person on Earth, nearly all of it stored remotely.6
Global server capacity is growing all the time. The largest server farm (tight-packed ranks of blinking boxes make the allusion to battery-farming an apt one) covers nearly 10,000 square metres, and even bigger ones are under construction. They devour electricity to power the servers and the fans that must run constantly to keep them from overheating. New server farms trumpet their ‘green’ technologies, which not only reduce power use but, in chilly northern locales, can redirect the heat produced by the servers to residential and office buildings.
Nonetheless, it is expected that, as the volume of digital data redoubles, the capacity to store it will come under pressure. When that happens, web-hosting and data-storage services will winnow their clients’ digital data, letting algorithms decide what is kept and what goes. Already, the terms of service of a web-hosting or cloud-based service typically include a caveat to the effect that ‘We are not responsible for any and all files and data residing on your account on our servers.’7 And server farms are landlords, in effect: if a tenant stops paying rent, their data will be evicted. Web-based businesses are at least as prone to failure as any storefront operation. But the chances are that when the latter shuts up shop it won’t take your stuff with it. Even ‘acquisition’, or change of ownership, can lead to a changed or diminished service to users, with potential loss of data.
Picturelife, a popular app that ‘stores all your photos and videos securely in the cloud, giving you access to them wherever you are’,8 was acquired early in 2015. But the business didn’t flourish as its new owner had hoped and, to reduce the cost of storing 200 million photos and videos, he moved the files onto one-third of the server space, effectively disabling the app. Picturelife’s 220,000 subscribers found they could no longer access their photos. They were assured that their pictures were safe, but that was cold comfort to subscribers whose years’ worth of photos were now trapped in the cloud.
The cloud that is not a cloud, but a host of blinking boxes. We can suppose that servers, being machines, will not be immune from catastrophe—destruction, corruption, being hacked to death. But reliable storage isn’t the only condition necessary for the long-term viability of digital data. Data decay isn’t a doomsday scenario; it’s incremental. It’s happening all the time. The rapid obsolescence of hardware and digital formats can render data ‘dead’ in the space of a decade or less. And then there’s the problem of rot.
‘Reference rot’ is an umbrella term for two kinds of online data loss: ‘link rot’ and ‘content drift’. When you click on a link and, instead of the promised destination, find yourself at an error message, ‘404: Page Not Found’, that’s link rot. The page that the link referred to (or the URL that you’ve painstakingly typed into your browser’s address line) no longer exists: ergo, the link is rotten. Perhaps the creator, a government department, has removed content that’s no longer current; or maybe the department itself has been removed. A political party, say, might expunge old content from its website to avoid being reminded of policy swerves and promises unkept. In online commerce, 404s often mark the graves of defunct product lines. In any case, a 404 message indicates that the thing you’re looking for is gone.
Content drift may not be so apparent. Here, the link delivers you to a destination, but it may not be the one you were expecting. The original content has been replaced, either with an updated version or with something entirely different. Not such a problem, you might think. But the scientific, medical and legal professions, as well as scholars across all disciplines and, really, anyone who uses the internet—all depend on the integrity of digital links. To substantiate and contextualise their findings, authors of scientific breakthroughs and landmark legal decisions increasingly cite digital rather than printed sources. Yet US studies of a wide range of scholarly publications have found that, after six years, between 50 and 75 per cent of cited links misfire.9 That’s not surprising: on average, a webpage exists for just 100 days before being changed or deleted, resulting in a loss of history and functionality that affects anyone using the internet.10
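The distinction between the two kinds of rot can be made concrete with a small Python sketch. It is an illustration under assumptions, not a method taken from the studies cited above: it treats a 404 (or 410) response as link rot, and treats any change to a page’s text, detected by comparing a SHA-256 fingerprint taken when the page was cited, as content drift. The function names and the sample page are hypothetical.

```python
import hashlib
from typing import Optional

# Assumption: a reference counts as "link rot" when the server reports the page
# missing (404 Not Found or 410 Gone), and as "content drift" when a page comes
# back but its text no longer matches a fingerprint recorded at citation time.


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the page text, recorded when it is cited."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diagnose_reference(status_code: int, page_text: Optional[str], cited_fingerprint: str) -> str:
    """Classify a cited link as 'link rot', 'content drift' or 'intact' (hypothetical helper)."""
    if status_code in (404, 410) or page_text is None:
        return "link rot"
    if fingerprint(page_text) != cited_fingerprint:
        return "content drift"
    return "intact"


# A hypothetical party web page, fingerprinted on the day it was cited.
cited = fingerprint("We will never raise taxes.")

print(diagnose_reference(200, "We will never raise taxes.", cited))    # intact
print(diagnose_reference(200, "We will review tax settings.", cited))  # content drift
print(diagnose_reference(404, None, cited))                            # link rot
```

Comparing exact fingerprints is deliberately strict: a changed date stamp or a rotated advertisement would register as drift, so any serious survey would need a more forgiving comparison.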
If digital data is fragile, ephemeral, prone to decay, deletion, displacement, misplacement, what do we risk losing? Imagine a future in which today’s digital data, however securely stored, is as good as gone. The lights, let’s suppose, have blinked out. How much of our selves and our times would be lost if the digital record were rendered blank? Our social and creative selves (email, Facebook, Instagram, Twitter, Pinterest, Etsy); the things that entertained us (memes, cat videos, Warcraft, fan fiction), informed and misinformed us (Google, Wikipedia, online news). And all our photos, reduced not even to dust: just so much dead data.
Besides the threat of data loss, there’s the void of non-creation. Letters and diaries, which may survive, as they have done in the past, to furnish enduring proofs of connection and interiority, are almost extinct forms. Being the sole daughter at the end of a dwindling line, I’ve inherited two boxes of my great-grandparents’ letters. They’d not long been married when Nellie, visiting her parents in the country, wrote home to John in July 1892: ‘If I could only pop in of an evening, and put my arms around your neck, and have a nice kiss, it would be so delightful.’ ‘I, too,’ John replied by return post, ‘would be pleased if my little girl could “pop in” and go through the osculatory performance she speaks of; the mere thought of it is sweet.’ (Not only does that tell me that the bluff old bloke of family lore once had a beating heart, it also hints at a familial fondness for semi-colons.) What, do you suppose, are the chances of such an exchange surviving for 125 years as an email or Facebook message?
But in the event of a digital apocalypse, we’ll still have our memories, right? Maybe not. Ever since the internet first took hold, and especially since the rise of the smartphone, commentators and researchers have warned that, by ‘outsourcing memory’ to our technologies, we risk succumbing to ‘digital amnesia’.11 Facebook, programmed for nostalgia, regularly presents us with memories of what we were doing, say, a year ago. And Google never forgets. But by growing reliant on Google’s ‘informational ubiquity’, we’re eroding our own capacity to store, retrieve and synthesise knowledge. (In 2008, Nicholas Carr famously posed the question: ‘Is Google making us stupid?’12) Socrates fretted about much the same thing 2400 years ago, when writing began to rival dialogue as a means of passing on knowledge. Writing, he said (or Plato says he said), ‘will create forgetfulness in the learners’ souls, because they will not use their memories … they will be hearers of many things and will have learned nothing; they will appear to be omniscient and will generally know nothing’.13 The unfoundedness of Socrates’ fears rather favours a view that our recourse to Google is no different to consulting books or elders, and that, given the impossibility of knowing everything, knowing where to find the answer is what matters.14 But if our knowing where begins and ends with Google, mightn’t that be a problem should we have to fall back on our own resources? What if we find we don’t have any?
Established systems of memory-keeping didn’t prevent the loss of knowledge about ancient cultures after their collapse. The Romans’ sophisticated hydro-engineering techniques took well over 100 years to reinvent. Sappho’s writings exist only in fragments. Egyptian hieroglyphs remained indecipherable for centuries until the Rosetta Stone supplied the key. The scientific and technological expertise behind the Antikythera mechanism—a complex ancient Greek astronomical instrument that has been likened to a computer—was lost for more than 1400 years. And there’s no way of guessing how much recorded knowledge disappeared entirely or else, surviving, has gone unrecognised.
It’s not that the threats to digital data haven’t been noticed, or that nothing is being done to address them. Brewster Kahle, who made money in the tech industry early on, took his winnings and founded the Internet Archive. Since 1996 the San Francisco–based non-profit has been doing just what its name says. Its mission is ‘to give everyone access to all knowledge, forever’.15 You may know the Internet Archive as the repository for millions of digitised, out-of-copyright books, available free online. But it is best known for the Wayback Machine (the name was inspired by the WABAC machine, the invention of a professorial canine time-traveller in the 1960s TV series The Rocky and Bullwinkle Show), a robot that ‘crawls’ the internet, copying every website it finds—not just once, but every few weeks or months, so that it captures different iterations of the same site. The resulting copies are stored in the Internet Archive. But you can’t search the Internet Archive the way you do the internet. The internet works like a blackboard: its underlying framework overwrites old versions to present the most up-to-date ones. The gleanings of the Wayback Machine, by contrast, are layered. To search the archive, you enter a URL and a past date; the Wayback Machine locates its copy of the website nearest the date you’ve stipulated (it also offers other dates, for comparison). This means that, should you encounter a dead link in an article, receive a 404 error message or arrive somewhere clearly wrong, you can use the Wayback Machine to time-travel the web and resurrect the lost content.
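The archive’s ‘nearest date’ lookup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Wayback Machine’s own code: it assumes that each capture of a URL is labelled with a 14-digit timestamp (year, month, day, hour, minute, second), the form visible in the archived Picturelife address cited in the notes, and the function names and capture history are hypothetical.

```python
from datetime import datetime

# Assumed capture-timestamp format: 14 digits, YYYYMMDDhhmmss.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_timestamp(ts: str) -> datetime:
    """Convert a 14-digit capture timestamp into a datetime."""
    return datetime.strptime(ts, TIMESTAMP_FORMAT)


def nearest_capture(captures: list[str], wanted: datetime) -> str:
    """Return the capture closest in time to `wanted` (hypothetical helper).

    Earlier and later captures are treated alike; on an exact tie,
    the capture listed first wins.
    """
    if not captures:
        raise ValueError("this URL was never captured")
    return min(captures, key=lambda ts: abs(parse_timestamp(ts) - wanted))


# Hypothetical capture history for one URL (the first timestamp is taken
# from the archived Picturelife link in the notes; the others are invented).
captures = ["20131004231254", "20140612080000", "20150301120000"]

print(nearest_capture(captures, datetime(2014, 1, 1)))  # 20131004231254
print(nearest_capture(captures, datetime(2015, 1, 1)))  # 20150301120000
```

Offering other dates for comparison, as the Wayback Machine does, would amount to sorting the captures and listing those on either side of the requested date.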
It’s possible to search not just the Wayback Machine but also the British, Icelandic, Slovenian and 21 other web archives at the same time, using a service called Time Travel.16 Although no other internet archive approaches Brewster Kahle’s in size, some do have longer memories. The PANDORA web archive, at the National Library of Australia (NLA), has been collecting Australian websites since 1995 and, unlike the Wayback Machine, is fully searchable. Also at the NLA is the Australian Government Web Archive, and future scholars, as well as present-day cynics, will derive much interest from the shifts in policy and euphemism reflected therein. In the United States, the Library of Congress has undertaken to archive Twitter in its entirety, a resource that is not yet accessible but promises no end of illumination to researchers at some distant date.
Globally, a proliferation of bodies with names such as the International Internet Preservation Consortium or the Blue Ribbon Task Force on Sustainable Digital Preservation and Access signals high-level concern about the future of digital data, and coordinated efforts to secure it. Funding from a US digital preservation outfit resulted in the Time Travel search service. And capturing internet content for posterity needn’t rely on the serendipity of its being ‘crawled’ by the Wayback Machine; using the Internet Archive’s ‘Save Page Now’ tool, anyone can archive a webpage, anytime. Also, tools have been developed—Perma.cc and WebCite among them—specifically to address the problem of reference rot, by enabling the easy creation of digital links that are automatically archived and, hence, unbreakable.
At the gonzo end of the internet preservation spectrum is the Archive Team, which calls itself a ‘loose collective of rogue archivists, programmers, writers and loudmouths’.17 An even better descriptor, from a Spanish-language website, roughly translates as ‘superheroes who prevent websites from falling into oblivion’.18 Under the motto ‘We are going to rescue your shit’, Archive Team volunteers raid websites that are shutting down—sites such as the Facebook precursor Friendster, or Posterous, a blogging platform—and run a program called the ‘Warrior’ to copy each site’s entire contents.19 User data thus saved from the black hole is made available on the Internet Archive.
Archive Team’s loudmouth-in-chief, Jason Scott, sees his mission as preserving not just the internet but also history.20 Digital curators at major libraries the world over—including Brewster Kahle, who considers his Internet Archive a library—take the same view.21 Gildas Illien has said of his role as web archivist at the Bibliothèque Nationale de France, ‘Our job is memory.’22
It is a job we all share. The owner of the Picturelife app, when faced with users’ dismay at losing their photos, replied that he kept a physical backup of any data he stored in the cloud and encouraged others to do the same.23 Picturelife’s terms of service are untraceable (even by Time Travel) now that the app is defunct. But among the terms of use laid out by Imgur, another photo-sharing app, is this: ‘Make sure to keep a backup of any photos you upload to Imgur—the company is not responsible, and cannot be held liable, if your pictures are somehow deleted from the company’s servers.’24 In other words, keep a backup. Even behemoths such as Google and Facebook offer functions for ‘data liberation’ so that users can easily export and back up their content.25 But does anybody back up, or do we outsource care as well as memory, trusting blindly in the cloud? Or it may be that, in the digital age, permanence ranks, with privacy, as increasingly irrelevant.
The potential problem with efforts at digital preservation, whether visionary or domestic, is that they are themselves digital. Realistically, though, how else can we hope to preserve this moment in history? Martin Kunze, an Austrian ceramicist who believes that digital archives are doomed to fail, has a plan as ambitious as Brewster Kahle’s: to capture ‘the essence of our time’26 on tablets and ‘microfilms’ made of clay, a material on which written records have survived for 5000 years or more. Kunze’s Memory of Mankind time capsule will be stored deep in a salt mine, with details of its location distributed to contributors around the world, so as to perpetuate knowledge of its existence.27
Also addressing the possibility of an analogue future is the Long Now Foundation, founded in San Francisco the same year as the Internet Archive. With a view to ‘creatively foster[ing] responsibility’ over the next 10,000 years (the Long Now), the foundation promotes ‘slower/better’ as an alternative to the prevalent ‘faster/cheaper’ way of thinking. One of its first endeavours was the Rosetta Project, which aims to document 1500 endangered languages on a nickel alloy disk, readable by microscope and with a life expectancy of 2000 years. For shorter-term reference, the Rosetta database will also be available as an online archive and in book form.28
And yeah, books—haven’t they proven to be fairly sound vessels for cultural conveyance—not for thousands of years perhaps (not yet), but for hundreds? The Future Library project, Katie Paterson’s sanguine public artwork in Oslo, Norway, seems to guarantee the book’s continuance at least a hundred years from now. A thousand trees have been planted, to be harvested in 2114 and turned into books. Every year until then, an original story will be commissioned from a leading writer—Margaret Atwood was the first, in 2015—and their manuscripts kept, unread, in the Oslo Public Library until printed as a limited-edition anthology on paper from the thousand trees.29
I can’t help but feel that, in all this talk of preservation, there’s a strained intentionality that’s inimical to the way humans usually think and act. And does it miss the point, calling our attention to the wrong thing, both in the moment and in retrospect?
It is memory that’s the basis for a life that has off-screen substance; when all else fails, it’s our proof of existence, of agency and belonging. And memory plays no small part in the process of synthesis that amounts to intelligence. Our growing reliance on digital access to our memories and to the knowledge-bank on which we draw for thought and action means that human intelligence is, increasingly, artificial intelligence. Machine-learning is a two-way process. The point at which our technology merges with us—the so-called Singularity—will arrive not when technology surpasses us at being human, but when we have been sufficiently diminished by it.
What if our best hope of saving our moment in time from future obscurity is to preserve, in the present, the integrity of our non-digital selves? Just as Aldous Huxley advocated, as an alternative to mescalin, practising ‘the right kind of constant and unstrained alertness’,30 so might we, by keeping attention and memory in-house, ensure the long-term viability of our own content.
And supposing that, after all, technology and human memory both were to fail, leaving far-future generations to piece together our time from random traces, what’s the worst that could happen? In playwright Anne Washburn’s post-apocalyptic vision, Mr Burns, a Post-Electric Play, Bart Simpson is deified—not such a shocking prospect in the age of Trump. Folklore and superstition would flourish, just as they do online. And people would be left, as we hardly ever are, more to wonder than to know. •
- See <www.dailymail.co.uk/news/article-4331544/Brexit-Act-WON-T-printed-posterity-vellum.html>.
- See <https://www.nytimes.com/2014/06/15/science/space/calling-back-a-zombie-ship-from-the-graveyard-of-space.html>.
- Digital Amnesia (video), VPRO Backlight, September 2014, quote @ 21’57”, <www.newphilosopher.com/videos/digital-amnesia/>.
- See <http://www.nytimes.com/1989/07/04/science/oracle-bones-testify-to-an-ancient-eclipse.html>.
- See <http://www.nytimes.com/1989/07/04/science/oracle-bones-testify-to-an-ancient-eclipse.html>.
- See <www.backblaze.com/blog/the-future-of-cloud-backup/>.
- Based on an example quoted in Mark Sullivan, ‘The “Archive Team” Rescues User Content from Doomed Sites’, PC World, 12 April 2012, <www.pcworld.com/article/253672/the_archive_team_rescues_user_content_from_doomed_sites.html>.
- See <http://wayback.archive-it.org/all/20131004231254/http://www.picturelife.com>.
- Jill Lepore, ‘The Cobweb: Can the Internet Be Archived?’, New Yorker, 26 January 2015, <www.newyorker.com/magazine/2015/01/26/cobweb>; Jones et al., ‘Scholarly Context Adrift: Three Out of Four URI References Lead to Changed Content’, <https://doi.org/10.1371/journal.pone.0167475>.
- Brewster Kahle of the Internet Archive, interviewed in Digital Amnesia, quote @ 9’31”.
- See, for example, <https://theconversation.com/outsourcing-memory-the-internet-has-changed-how-we-remember-10871>.
- See <www.theatlantic.com/magazine/archive/2008/07/is-google-making-us-stupid/306868/>.
- Plato, The Phaedrus, <www.units.miamioh.edu/technologyandhumanities/plato.htm>.
- See <www.brainscape.com/blog/2012/01/memory-outsourcing-google-effect/>.
- Brewster Kahle, ‘Help Us Keep the Archive Free, Accessible, and Reader Private’, 29 November 2016, <http://blog.archive.org/2016/11/29/help-us-keep-the-archive-free-accessible-and-private/>.
- See <http://timetravel.mementoweb.org/about/>.
- See <www.archiveteam.org>.
- See <www.rtve.es/noticias/20130527/archive-team-superheroes-evitan-webs-caigan-olvido/673120.shtml>.
- Bianca Bosker, ‘Jason Scott’s Archive Team is Saving the Web from Itself (and Rescuing Your Stuff)’, Huffington Post, <www.huffingtonpost.com.au/entry/jason-scott-archive-team_n_2965368>; Matt Schwartz, ‘Fire in the Library’, MIT Technology Review, January–February 2012, <www.technologyreview.com/s/426434/fire-in-the-library/>.
- See <www.archiveteam.org/index.php?title=Why_Back_Up%3F>.
- For example, Doug Reside, New York Public Library, quoted in <www.huffingtonpost.com.au/entry/jason-scott-archive-team_n_2965368>; Jeremy Leighton John, British Library, quoted in <www.newscientist.com/article/dn20445-digital-legacy-respecting-the-digital-dead/>; Brewster Kahle, ‘Help Us Keep the Archive Free, Accessible, and Reader Private’, 29 November 2016, <http://blog.archive.org/2016/11/29/help-us-keep-the-archive-free-accessible-and-private/>.
- Quoted in Lepore, ‘The Cobweb’.
- ‘Reply All’ podcast, episode 71, 27 July 2016, ‘The Picture Taker’ (transcript), <https://gimletmedia.com/episode/71-the-picture-taker/>.
- Quoted at <www.digitaltrends.com/web/terms-and-conditions-imgur/>.
- Schwartz, ‘Fire in the Library’.
- See <https://www.memory-of-mankind.com/>.
- Richard Kemeny, ‘All of Human Knowledge Buried in a Salt Mine’, Atlantic, 9 January 2017, <https://www.theatlantic.com/technology/archive/2017/01/human-knowledge-salt-mine/512552/>.
- Wikipedia, The Rosetta Project.
- Wikipedia, The Future Library project.
- Aldous Huxley, The Doors of Perception (1954), p. 13, <https://archive.org/stream/Huxley_Aldous_-_The_Doors_of_Perception/Huxley_Aldous_-_The_Doors_of_Perception_djvu.txt>.