Mar 14 2014

All digital storage media–hard drives, flash disks, CD-ROMs, and the like–have a short life.  This is why digital preservation requires active management, including regular migration of content from older storage devices to newer devices.

Do you have a back-up plan? by Images by John ‘K’, on Flickr

Individuals face an especially serious challenge.  Unlike many organizations, people at home typically do not have special services to guard their digital data from loss or corruption.

Another way to put it is that everyone is now their own digital archivist.  If you don’t attend to preserving your own digital photographs, videos, email, social media and so on, there is an excellent chance they will be lost.

And, contrary to what some vendors imply, relying solely on the cloud is not foolproof. A commercial service can choose to pull the plug–literally–on a cloud service at any time.  If you want to keep your content, you need to take responsibility for it.

Individual users need to know that the life of storage media is cut short by at least three factors:

  1. Media durability.
  2. Media usage, storage and handling.
  3. Media obsolescence.

Media Durability

Computer storage media devices vary in how long they last. The quality and construction of individual media items differ widely. The following estimates for media life are approximate; a specific item can easily last longer–or fail much sooner.

  • Floppy disk: 3-5 years.  Though no longer made, many still exist; examples include 8”, 5.25” and 3.5” disks, along with items such as Zip and Jaz disks.
  • Flash media: 1-10 years.  This category includes USB flash drives (also known as jump drives or thumb drives), SD/SDHC cards and solid-state drives; all generally are less reliable than traditional spinning-disk hard drives.
  • Hard drive: 2-8 years.  The health of a spinning disk hard drive often depends on the environment; excessive heat, for example, can lead to quick failure.
  • CD/DVD/Blu-ray optical disk: 2-10 years.  There is large variation in the quality of optical media; note that “burnable” discs typically have a shorter life than “factory pressed” discs.
  • Magnetic tape: 10-30 years.  Tape is a more expensive storage option for most users–it depends on specialty equipment–but it is the most reliable medium available.

Media use, handling and storage

People have a direct impact on the lives of storage media:

  • The more often media are handled and used, the greater the chance they will fail; careful handling can extend media life, while rough handling has the opposite effect.
  • Stable and moderate temperature and humidity, along with protection from harmful elements (such as sun and salt), help keep media alive.
  • Good-quality readers and other hardware media connections are beneficial; poor connections can kill media quickly.
  • Media that are not labeled or safely stored can be lost or accidentally thrown away.
  • Fires, floods and other disasters are very bad for media!

Media obsolescence

Computer technology changes very quickly.  Commonly used storage media can become obsolete within a few years.  Current and future computers may not:

  • Have drives that can read older media.
  • Have hardware connections that can attach to older media (or media drives).
  • Have device drivers that can recognize older media hardware.
  • Have software that can read older files on media.

What you need to do

Actively manage your important digital content!  Steps to consider:

  • Have at least two separate copies of your content on separate media—more copies are better.
  • Use different kinds of media (DVDs, CDs, portable hard drives, thumb drives or internet cloud storage);  use reputable vendors and products.
  • Store media copies in different locations that are as physically far apart as practical.
  • Label media properly and keep them in secure locations (such as with important papers).
  • Create new archival media copies at least every five years to avoid data loss.
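The checklist above can be turned into a small self-audit.  What follows is only a rough sketch in Python, not a finished tool: the names Copy, audit_copies and MAX_AGE_YEARS are made up for illustration, and the five-year limit simply mirrors the last step in the list.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical threshold, taken from the "at least every five years" step above.
MAX_AGE_YEARS = 5


@dataclass
class Copy:
    """One copy of your content (illustrative record, not a standard format)."""
    label: str        # what is written on the media label
    media_kind: str   # e.g. "DVD", "portable hard drive", "cloud"
    location: str     # where the copy is kept
    created: date     # when the copy was made


def audit_copies(copies, today):
    """Return a list of plain-language problems found in a set of copies."""
    problems = []
    if len(copies) < 2:
        problems.append("fewer than two copies")
    if len({c.media_kind for c in copies}) < 2:
        problems.append("all copies are on the same kind of media")
    if len({c.location for c in copies}) < 2:
        problems.append("all copies are in the same location")
    for c in copies:
        # Approximate age in years; precise enough for a reminder.
        age_years = (today - c.created).days / 365.25
        if age_years >= MAX_AGE_YEARS:
            problems.append(f"{c.label} is due for a fresh copy")
    return problems


if __name__ == "__main__":
    my_copies = [
        Copy("photos-dvd", "DVD", "home", date(2008, 1, 1)),
        Copy("photos-drive", "portable hard drive", "office", date(2013, 6, 1)),
    ]
    for problem in audit_copies(my_copies, date.today()):
        print(problem)
```

An empty result means the copies pass these basic checks; it says nothing about whether the files on the media are still readable.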

For more information

  1. Care and Handling of CDs and DVDs —A Guide for Librarians and Archivists
  2. Digital Media Life Expectancy and Care
  3. Do Burned CDs Have a Short Life Span?
  4. Mag Tape Life Expectancy 10-30 years
  5. Personal Archiving: Preserving Your Digital Memories (Library of Congress)
  6. Retro Media: Memory (and Memories) Lost; Which of these media will be readable in 10 years?  50 years?  150 years?
  7. Care, Handling and Storage of Removable media (UK National Archives)
  8. Do You Have a Back-up Plan?
  9. Selecting and managing storage media for digital public records guideline (Queensland State Archives)

Note: This is adapted from information developed at the Library of Congress.  Post updated; originally published in Jan. 2011.

Jan 27 2011

Here is a tale from my personal experience that illustrates both the peril and promise of keeping digital information over time.

Lost & Found, by Thomas Hawk, on Flickr

Starting in 1996 I put out a weekly e-mail newsletter called Culture in Cyberspace.  I used it to report on websites that I found interesting and also offered thoughts about the impact of information technology on society.  It was a small effort that had a mailing list of about 3,000 people when I shut it down in 1997.  I was starting a new day job, and wanted to fully invest myself there; in retrospect, I wish I had stuck with CinC, but that’s another story.

Anyway, this was before the advent of blogs, and there were comparatively few people using the internet to publish thoughts about the medium.  As a result, some of what I said got some modest attention.  I was interviewed for a story in Ms. Magazine about Cyber-Rape.  I made my way on to university class reading lists, including this one.  I was footnoted in an academic article, Scholarly Communication and Electronic Publication: Implications for Research, Advancement, and Promotion.  My observations apparently made it into the pricey journal Convergence: The International Journal of Research into New Media Technologies–at least Google claims they did; I’m not willing to pay $25 to the publisher to access the article for confirmation.  Now, it may seem self-aggrandizing to draw attention to these small events from so long ago.  And while I confess to some lingering pride, my main point is that despite the obscurity and the age of my words, Google can still find many of them.

This is a good thing, because I lost about half of the original files.  I had them backed up on my laptop hard drive and on multiple sets of floppy disks, but must confess to falling short of proper personal archive management.  When I got a new laptop I neglected to copy the files before selling the old laptop on eBay.  I kept the backups with a large collection of floppy disks that I all but forgot about in the transition over to recordable CD-ROMs and flash drives.  When I eventually tried to access the disks 10 years later, they had errors and I could only retrieve some of the content.  I turned to Google as a last resort and was frankly amazed at how much still existed in the ether.

This is obviously not an ideal archival arrangement.  There is, for example, no assurance that the words purported to be mine are 100 percent authentic or are presented in what I would consider the right context.  The bigger issue is fragmentation: the Google CinC corpus is patchy in the extreme, presenting a range of information from brief mentions, to excerpts, to complete issues.  It is a bit like trying to make sense of an ancient cuneiform library that has been smashed and scattered.  Still, for me in this case, it is far better to have the pieces than nothing.

Postmodern Philosophy Lulz #04 – Marshall McLuhan & A Cat With Cheese On Its Face

Is this how future researchers will experience our world?  Perhaps.  We crank out gigantic quantities of digital documentation about every possible topic, often with no solid plan for how to preserve it.  The vastly distributed nature of the internet ensures that pieces of our collective output persist, and powerful search technology is adept at zeroing in on the tiniest fragments. This is like randomly grabbing chunks of information, throwing them in a digital Cuisinart and hitting “liquefy.”  At what point do the original structure and meaning of the information break down and combine into something different?

Maybe this is the wrong question, however.  One might ask instead: how much will anyone care in the future about hazy issues relating to authenticity, context and original intent?  One of the old rescued CinC articles recorded my thoughts on this very subject in connection with the then-recent book, Life on the Screen by Sherry Turkle.

Turkle is, to my mind, without peer in her ability to probe how computers are changing us.  In Life on the Screen, she lucidly explained how information technology is facilitating a shift from respect for rational, systematic thought to an embrace of personal experience, driven by the ability people have via the web to explore, rearrange, and reinterpret information.  We are becoming much more willing to accept and endorse subjective experience than to filter perception through ideas about what is “right” or “wrong.”  Turkle’s new book, Alone Together, explores these ideas further, and I look forward to reviewing it in a future post.

My experience with losing data, and then finding some of it on the internet, gives rise to a host of thoughts.  If I wanted to be totally pretentious I could say that it revealed the boundary between the Modern and the Postmodern (even though I can’t say I precisely know what those terms mean).  But most of all, I am left wishing that I had just done a better job preserving my digital files.

Jan 19 2011

I recently posted some information about the short life expectancy of digital media.  Since then, I have run across another source that provides some excellent insight into the fragile existence of recordable compact discs (CD-Rs) and recordable digital versatile/video discs (DVD±Rs).

The Canadian Conservation Institute publication Longevity of Recordable CDs and DVDs provides authoritative information.  It explains the various factors that influence how long a disc lasts, including disc quality, recording methods, handling and storage.

I found especially useful a listing of “the relative stability of optical disc formats.”  Formats are listed from most to least stable:

  1. CD-R (phthalocyanine dye, gold metal layer)
  2. CD-R (phthalocyanine dye, silver alloy metal layer)
  3. DVD-R (gold metal layer)
  4. CD (read-only, e.g. audio CD)
  5. DVD (read-only, e.g. movie DVD)
  6. DVD-R (silver alloy metal layer)
  7. CD-RW
  8. CD-R (azo dye, silver alloy metal layer)
  9. CD-R (cyanine dye, silver alloy metal layer)
  10. DVD-RW

It is sobering to realize how many ways optical discs can give up the ghost.  Unfortunately, other digital storage media also have their shortcomings in terms of longevity.  The best advice is still to have multiple copies of important data stored on different kinds of media.

Large drops of water on a dvd disk by Horia Varlan, on Flickr

Jan 18 2011

The risk of computer hard disk failure is fairly well known: the disk crashes and your computer stops working.

Corrupt 6, by gusset, on Flickr

Less well-known is a phenomenon known as silent data corruption, where an undetected error occurs in content stored on a drive.  Errors creep in from bugs in both software and hardware (firmware).  “Silent” means the drive does not report the error, even in situations where special precautions such as multiple redundant disks (RAID) are used.  The problem remains unknown until you attempt to retrieve the data.

An Analysis of Data Corruption in the Storage Stack, a 2008 study of 1.53 million disk drives over a period of 41 months, found 400,000 silent errors.  That is a disturbingly high number, even though the overall percentage of bad to good data was very small.  The bad news for personal users is that the type of hard disk they are most likely to have–a SATA drive–has a failure rate that is an order of magnitude larger than that of more expensive “enterprise class” drives.

If you have important digital data that you need to keep for a long time, the best thing to do is to keep multiple copies in different places and stored on different kinds of media, even different kinds of hard disks.  As I wrote earlier, all varieties of digital storage media will eventually fail, so it is essential to have replicated copies.

The more complicated issue is how to detect and fix silent errors.  The most common method is to use a checksum–a short code computed from a file’s contents that changes if the contents change–to find bad data.  Once an error is found the next step is to “scrub” it, a process where bad data is replaced by good data from a trusted source.
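For readers comfortable with a little scripting, the checksum-and-scrub idea can be sketched in a few lines of Python.  This is a minimal illustration under my own assumptions, not a description of how any particular product works: the function names are invented, SHA-256 is just one reasonable choice of checksum, and the “trusted source” is assumed to be a second folder holding a copy of the same files.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Record a checksum for every file under folder (run while data is known good)."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def find_corrupt(folder, manifest):
    """Return names of files that are missing or no longer match the manifest."""
    folder = Path(folder)
    bad = []
    for name, expected in manifest.items():
        p = folder / name
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(name)
    return bad


def scrub(folder, trusted_folder, manifest):
    """Replace bad files with copies from trusted_folder, but only if those
    copies themselves still match the recorded checksum."""
    repaired = []
    for name in find_corrupt(folder, manifest):
        source = Path(trusted_folder) / name
        if source.is_file() and sha256_of(source) == manifest[name]:
            target = Path(folder) / name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            repaired.append(name)
    return repaired
```

The manifest has to be created while the files are known to be good and kept somewhere safe; a checksum can only tell you that a file has changed, not which version is the right one.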

Individual users have limited choices in this regard, unfortunately.  Checksum comparisons and scrubbing require advanced knowledge and can take a long time.   Commercial data recovery services might be able to help, but they charge a hefty premium.  A cloud storage provider may–or may  not–provide the service; if it does, and if the service runs automatically in the background, this is a fact very much in its favor.

Jan 13 2011

His Master’s Voice by Metrix X, on Flickr

If you are the keeper of important audio recordings—grooved disks, magnetic tape, CD-ROM and the like—you may wish to carefully assess their physical condition at some point.  This assessment can be for the purpose of a basic inventory, identification of preservation needs or setting priorities for activities such as creating digital copies.

Before you get started, you may wish to consult Issues and Answers in Digitization: Audio: Digitizing for the Future, which summarizes a Library of Congress workshop held in December 2010.  This document describes four preservation assessment tools.  The tools are designed to identify preservation issues for various types of recording media.

  1. Visual and Playback Inspection Ratings System (ViPIRS), New York University Libraries.  For magnetic media (videotape, audiocassettes, and 1/4″ reel-to-reel). Assesses the condition of the item, the item’s ability to be played back, and the ease or difficulty of conserving/preserving/reformatting the item. The accumulated score at the end of the inspection generates a numerical rating that tells the user what steps need to be taken next in the preservation process.  For basic to intermediate users.
  2. Audio/Moving Image Survey Instrument, Columbia University Library.  Provides a mechanism for setting preservation priorities based on (1) quantities and types of audio and moving image materials, (2) the physical condition of the media and their housings based on visual inspection, (3) information about existing levels of intellectual control and intellectual property rights, and (4) the potential research value of each collection.  For basic to intermediate users.
  3. Field Audio Collection Evaluation Tool (FACET), Indiana University Digital Library Program.  Ranks audio field collections based on preservation condition, including the level of deterioration they exhibit and the degree of risk they carry. It assesses the characteristics, preservation problems, and modes of deterioration associated with the following formats: open reel tape (polyester, acetate, paper and PVC bases), analog audio cassettes, DAT (Digital Audio Tape), lacquer discs, aluminum discs, and wire recordings.  For advanced users.
  4. Audiovisual Self Assessment Program (AvSAP), University of Illinois at Urbana-Champaign Library. Helps identify format type, physical condition, and storage conditions.  Available either as web-based software or as part of a larger Archon software package, which calls for some technical knowledge to install and configure.  For basic to intermediate users.

Jan 06 2011

The first wave of desktop computer users is getting old.  As a cohort, they are retiring from their jobs, downsizing their homes and, maybe, passing on important digital data.

5.25″ Floppy Disk Drive by Accretion Disc, on Flickr

Chances are good that some of this information will be stored on relics from a bygone era: floppy disks, Zip drives, tape cartridges and the like.

Any library, archives or museum–or any person–who might be the recipient of such bounty should consider their options, of which there are three:

  1. Use a commercial service to transfer the information to more modern media.
  2. Acquire some older equipment to do the job yourself.
  3. Do nothing and hope for the best.

All of these choices have associated risk.  A service can be expensive; getting your own devices can be a challenge; and doing nothing is–well, doing nothing.  A more general threat hovers over things as well: the longer the wait to transfer information, the greater the chance the original media will degrade and lose data.

Anyone who has to deal with older information might want to think about hedging their bets by acquiring  some equipment to access obsolete media.  This can be a complicated process involving a slew of gear, common and uncommon (Bernoulli Box, anyone?).

For the sake of brevity, I’d say there are four basic media readers:

These can be hard to find.  A quick search of eBay data for the past three months shows that only 49 5.25-inch drives of all types were available for sale and that “new” or “mint” drives can sell for nearly $60.

Getting a drive to read media is the beginning.  You will, of course, need to connect it to a computer. There is an alphabet soup of potential connections (ATAPI, SCSI and USB, for example), and modern computers may not be compatible with all of them.  After hooking up an older drive, you may also need to find a specific device driver to use it. And, when everything is up and running, you will need to carefully plan how to work with the old media, whose condition is frail and content unique.

The time and effort needed to gear up for older media might make the difference between having enduring access to older information or, sadly, having no access at all.

Jan 03 2011

Ars Technica is one of the best sources anywhere for insight into technology and its ever-expanding impact.  I was especially pleased that the site ran 10 separate articles about digital preservation during the past year.

3-D rendering of a graphene hole by LBNL, on Flickr

Special credit goes to John Timmer, “Science Editor et Observatory moderator.”  He wrote six excellent pieces on the challenge of preserving and providing meaningful access to scientific data.  He treats the issue superbly, bringing it to life using his real-life experience as a genetics and biology laboratory  researcher.

Timmer put together a three-part series on scientific data preservation.  Part I: Preserving science: what to do with raw research material? refers to the recent fuss about the UK’s Climatic Research Unit, particularly its messy data management.  “Poorly commented computer code. Data scattered among files with difficult-to-fathom formats…  But the chaos, confused record keeping, and data that’s gone missing-in-action sounded unfortunately familiar to many researchers, who could often supply an anecdote that started with the phrase “if you think that’s bad…”

In Part II: Preserving science: what data do we keep? What do we discard?, he tackles one of the most sensitive—and vexing—issues out there.  “The reality is that we simply can’t save everything. And, as a result, scientists have to fall back on judgment calls, both professional and otherwise, in determining what to keep and how to keep it.”

The inescapable matter of digital media obsolescence is considered in Part III: Jaz drives, spiral notebooks, and SCSI: how we lose scientific data.  “Over the course of my research career, archiving involved magneto-optical disks, a flirtation with Zip and Jaz drives (which ended when some data was lost by said drives), a return to big magneto-optical disks, and then a shift to CDs and DVDs. Interfaces also went from SCSI to Firewire to USB. Anything that wasn’t carefully moved forward to the new formats was simply left behind.”

Wired UK – NDNAD Infographic by blprnt_van, on Flickr

Timmer also weighed in on Changing software, hardware a nightmare for tracking scientific data.  “My work relied on desktop software packages that were discontinued, along with plenty of incompatible file formats. The key message is that, for even careful researchers, forces beyond their control can eliminate any chance of reproducing computerized analyses, sometimes within a matter of months.”

How science funding is putting scientific data at risk highlighted the stark reality that adequate money is all too frequently not provided to maintain important data.   Keeping computers from ending science’s reproducibility explores a huge barrier that gets in the way of confirming research results.  “Traditional science involves a complex pipeline of software tools; reproducing it will require version control for both software and data, along with careful documentation of the precise parameters used at every step.”  But “this work may run up against the issues of data preservation, as older information may reside on media that’s no longer supported or in file formats that are difficult to read.”

Doom Install Disks by Matt Schilder, on Flickr

Ars ran two articles about preserving video games. The first, Preserving games comes with legal, technical problems, referred to a paper in the International Journal of Digital Curation, Keeping the Game Alive: Evaluating Strategies for the Preservation of Console Video Games.  “Hardware becomes outdated and the media that houses game code becomes obsolete, not to mention the legal issues with emulation.”

The second, Saving “virtual worlds” from extinction, discussed Preserving Virtual Worlds, a project at the University of Illinois at Urbana-Champaign.

The final two articles focused on Library of Congress actions (full disclosure: I work with the Library digital preservation team).  Why the Library of Congress cares about archiving our tweets delved into the huge interest that flowed from the Library’s announcement about acquiring the Twitter archives.  Historic audio at risk, thanks to bad copyright laws discussed a report from the National Recording Preservation Board about problems preserving the complex digital formats that underlie much of today’s music.

Let’s hope that Ars continues its coverage of digital preservation into 2011.  There is quite a bit to talk about.