Mar 14, 2014
 

All digital storage media–hard drives, flash disks, CD-ROMs, and the like–have a short life.  This is why digital preservation requires active management, including regular migration of content from older storage devices to newer devices.

Do you have a back-up plan? by Images by John ‘K’, on Flickr

Individuals face an especially serious challenge.  Unlike many organizations, people at home typically do not have special services to guard their digital data from loss or corruption.

Another way to put it is that everyone is now their own digital archivist.  If you don’t attend to preserving your own digital photographs, videos, email, social media and so on, there is an excellent chance they will be lost.

And, contrary to what some vendors imply, relying solely on the cloud is not foolproof. A commercial service can choose to pull the plug–literally–on a cloud service at any time.  If you want to keep your content, you need to take responsibility for it.

Individual users need to know that the life of storage media is cut short by at least three factors:

  1. Media durability.
  2. Media usage, storage and handling.
  3. Media obsolescence.

Media Durability

Computer storage media devices vary in how long they last. The quality and construction of individual media items differ widely. The following estimates for media life are approximate; a specific item can easily last longer–or fail much sooner.

  • Floppy disk: 3-5 years.  Though no longer made, many still exist; examples include 8”, 5.25” and 3.5” disks, along with items such as Zip and Jaz disks.
  • Flash media: 1-10 years.  This category includes USB flash drives (also known as jump drives or thumb drives), SD/SDHC cards and solid-state drives; all generally are less reliable than traditional spinning-disk hard drives.
  • Hard drive: 2-8 years.  The health of a spinning disk hard drive often depends on the environment; excessive heat, for example, can lead to quick failure.
  • CD/DVD/Blu-ray optical disc: 2-10 years.  There is large variation in the quality of optical media; note that “burnable” discs typically have a shorter life than “factory pressed” discs.
  • Magnetic tape: 10-30 years.  Tape is a more expensive storage option for most users–it depends on specialty equipment–but it is the most reliable medium available.
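For readers comfortable with a little scripting, the estimates above can be turned into migration deadlines. This is a minimal Python sketch, not a tool from any of the sources: it assumes, conservatively, that each item should be copied to new media by the low end of its estimated range, and the media labels and sample collection are hypothetical. Remember that a given item can last longer or fail much sooner.

```python
from datetime import date

# Approximate media life in years (low, high), from the list above.
MEDIA_LIFE_YEARS = {
    "floppy": (3, 5),
    "flash": (1, 10),
    "hard_drive": (2, 8),
    "optical": (2, 10),
    "tape": (10, 30),
}


def migrate_by(media_type: str, year_written: int) -> int:
    """Year by which to copy content to new media.

    Assumption: use the LOW end of the estimated range, to be safe.
    """
    low, _high = MEDIA_LIFE_YEARS[media_type]
    return year_written + low


def overdue(items, this_year=None):
    """Return (label, deadline) for each item whose deadline has arrived."""
    if this_year is None:
        this_year = date.today().year
    result = []
    for label, media_type, year_written in items:
        deadline = migrate_by(media_type, year_written)
        if deadline <= this_year:
            result.append((label, deadline))
    return result


# Hypothetical collection: (label, media type, year written)
collection = [
    ("2009 vacation DVD", "optical", 2009),
    ("family thumb drive", "flash", 2013),
    ("portable backup drive", "hard_drive", 2013),
]
print(overdue(collection, this_year=2014))
# [('2009 vacation DVD', 2011), ('family thumb drive', 2014)]
```

Using the low end of each range errs on the side of copying too early, which costs little compared with losing the content.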

Media usage, storage and handling

People have a direct impact on the lives of storage media:

  • The more often media are handled and used, the greater the chance they will fail; careful handling can extend media life, while rough handling has the opposite effect.
  • Stable and moderate temperature and humidity, along with protection from harmful elements (such as sun and salt), help keep media alive.
  • Good-quality readers and other hardware media connections are beneficial; poor connections can kill media quickly.
  • Media that are not labeled or safely stored can be lost or accidentally thrown away.
  • Fires, floods and other disasters are very bad for media!

Media obsolescence

Computer technology changes very quickly.  Commonly used storage media can become obsolete within a few years.  Current and future computers may not:

  • Have drives that can read older media.
  • Have hardware connections that can attach to older media (or media drives).
  • Have device drivers that can recognize older media hardware.
  • Have software that can read older files on media.

What you need to do

Actively manage your important digital content!  Steps to consider:

  • Have at least two separate copies of your content on separate media—more copies are better.
  • Use different kinds of media (DVDs, CDs, portable hard drives, thumb drives or internet cloud storage);  use reputable vendors and products.
  • Store media copies in different locations that are as physically far apart as practical.
  • Label media properly and keep them in secure locations (such as with important papers).
  • Create new archival media copies at least every five years to avoid data loss.
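The steps above can also be written down as a simple checklist. The Python sketch below is one hypothetical way to do it: each copy is recorded with its kind of media, its location and the year it was made, and the check flags fewer than two copies, a single kind of media, a single location, or a copy five or more years old. The `Copy` record and `check_plan` function are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    media: str       # kind of media, e.g. "DVD", "portable hard drive", "cloud"
    location: str    # where it is kept, e.g. "home office", "safe deposit box"
    year_made: int   # year this copy was written


def check_plan(copies, this_year, max_age=5):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if len(copies) < 2:
        problems.append("fewer than two copies")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies on the same kind of media")
    if len({c.location for c in copies}) < 2:
        problems.append("all copies in the same location")
    for c in copies:
        # "Create new archival media copies at least every five years"
        if this_year - c.year_made >= max_age:
            problems.append(f"{c.media} copy from {c.year_made} is due for renewal")
    return problems


plan = [
    Copy("DVD", "home office", 2009),
    Copy("cloud", "online service", 2013),
]
print(check_plan(plan, this_year=2014))
# ['DVD copy from 2009 is due for renewal']
```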

For more information

  1. Care and Handling of CDs and DVDs —A Guide for Librarians and Archivists
  2. Digital Media Life Expectancy and Care
  3. Do Burned CDs Have a Short Life Span?
  4. Mag Tape Life Expectancy 10-30 years
  5. Personal Archiving: Preserving Your Digital Memories (Library of Congress)
  6. Retro Media: Memory (and Memories) Lost; Which of these media will be readable in 10 years?  50 years?  150 years?
  7. Care, Handling and Storage of Removable media (UK National Archives)
  8. Do You Have a Back-up Plan?
  9. Selecting and managing storage media for digital public records guideline (Queensland State Archives)


Note: This is adapted from information developed for digitalpreservation.gov at the Library of Congress. Post updated; originally published in Jan. 2011.

Oct 30, 2011
 

There is a polite but persistent disagreement among librarians, archivists and other normally peaceful souls who care about keeping digital information accessible into the future.  The conflict is low key, as one might expect: no one is occupying reading rooms, much less being led away in plastic handcuffs. But there are few signs that all parties are ready to reconcile.

Keep… – Day 50, by MarkAllanson, on Flickr

The dispute is over the proper term to use for the act of keeping digital content alive over time.  One group sticks with “digital preservation,” a blanket term of choice dating to at least the early 1990s.  Some insurgents have been urging use of “digital curation” during the last few years, stating, among other things, that the term is more comprehensive.  The idea is that curation covers the full life cycle of information, from creation through to access.  Another group adheres to “digital stewardship,” which likewise conveys the broad set of responsibilities involved while also stressing the ongoing, active engagement needed to keep digital information in a form that is authoritative, uncorrupted and useful.  (Full disclosure: I work for the Library of Congress and had a hand in launching the National Digital Stewardship Alliance.)

The story doesn’t end here, as there are a bunch of other competing terms.  “Archive” is an old standby that can variously mean transferring data to a preservation repository or merely storing it offline.  “Digital conservation” pops up from time to time in quite different circumstances.  A more recent entrant is “data management,” which is associated with a new U.S. National Science Foundation requirement that funding proposals include a plan for dissemination and sharing of data based on the research.

To make matters even more complicated, the same terms preservationists use are used by others in completely different contexts.  “Digital curation” is a hot buzzword that huge numbers of people associate with picking content for distribution via the web.  “Digital preservation” is also used to mean constructing 3D scans of historic sites, such as Ft. Laramie, WY.

Friedrich Wilhelm Nietzsche painted portrait _DDC1256, by Abode of Chaos, on Flickr

Is it a problem that there are so many competing terms for what non-experts (particularly the ones who control funding) might reasonably consider to be approximately the same activity?  I’m of two minds here.  Part of me yearns for more clarity (through adoption of my favorite definition, of course).  But another part of me recognizes the inevitability of multiple terms because there are multiple perspectives, including about how best to get attention and support.  “We have a problem with language in this domain at the moment,” Steve Knight declared at a conference in 2008.  He said the term “permanent access” more accurately communicates what the preservation community is trying to do.  More to the point, Chris Rusbridge bluntly titled a blog post “‘Digital Preservation’ term considered harmful?” and went on to note that “the language used and the way that the discourse is constructed [in the digital preservation community] is unlikely to make much impact on either decision-makers or the creators of the digital information.”  Rusbridge suggests “selling the outcomes” through use of terms like “long term accessibility” or “usability over time.”  More recently still, Kari Kraus opined in the New York Times that “we must replace digital preservation with digital curation” if we are to make effective use of data worth saving.

All of this is perhaps part of a larger contemporary social force that resists firm definitions.  Politicians, for example, are continuously “redefining” themselves.  There are ongoing debates about how (or even whether) to define race for college applications.  It’s even impossible to define the value of stretching before a workout.

Closer to home, Fred Gibbs recently analyzed 170 definitions of “the digital humanities” and was relieved to find that they fell into only nine categories (one of which was “refusals to define the term”).  It may well be that we live in an age where conceptual boundaries are weak, especially for emergent practices and ideas. This contrasts with earlier intellectual intentions: consider this bit from Friedrich Nietzsche and the Politics of Transfiguration, by Tracy B. Strong (thank you, Google books, for giving me the means to quote this juicy highbrow stuff):

The ability to give names–to extend control of language over the world… [is] a masterly trait: it consists of saying what the world is.  The reverse proposition will also be accurate: knowledge of the power of language may lead to a prohibition on the use of certain names…. To name is to define and to bring under control, to give determination of the being of the object in question.  The allocation of names creates the world in the image of he who names.

Our inability “to define and bring under control” the concept underlying digital preservation is a sign of the times.  And while some press for a favorite term, others adopt a more accepting stance.  Embedded in the lengthy discussion that is Semantics: Digital Preservation vs. Digital Curation, Chris Prom outlines issues with various terms, but also offers some down-to-earth advice: “I think we all need to be comfortable with using a lot of different terms to explain what we do.”

En Control, by alvaroalegria, on Flickr

This perspective is evident in a brand new Association of Research Libraries publication Digital Preservation, SPEC Kit 325.  While the title embraces one term, the executive summary takes pains to mention others, including digital curation, continued access and life cycle curation.  The report also notes that “the definition of ‘digital preservation’ is still murky for some librarians.  A number of respondents confused ‘backups’ with ‘preservation’ and referred to access-oriented repository services as though they were preservation solutions.”

Personally, I like the ARL approach and feel that, for all its limitations, “digital preservation” is still the best catch-all term.  But other terms also fit the concept, and we need to be fully aware of their nuances.  There will be a continuing need to apply all the current terms–and probably some new ones–in the course of managing digital information into the future.

Feb 7, 2011
 

The time had finally arrived: I had to bring some order to my large and disorderly collection of digital photographs.  I had been putting the task off for a long time in spite of knowing that I risked losing some of them to various digital calamities. Now, after having done the work, I’m pleased to say that it was easier than I expected.

I should say that the work was easy only after I spent a fair bit of time tracking down the right tools and figuring out a method to do the job.  In the hope that I can save others some time, I’ve laid out my step-by-step process below, including the tools I used with Windows Vista.  Most of the tools have versions for other operating systems, and are free to download and use.

But before grabbing any tools, take some time to decide on a basic process for doing the job.  I first had to think about the steps involved; the professional term for this is “workflow.”  The steps suggested by the Library of Congress are one way to go.  (Full disclosure: I work with the team that established these guidelines.)

The steps are:

  • Identify
  • Decide
  • Organize
  • Make copies and store in different places
Family Fun

Identify. This was the hardest step for me.  I started taking digital photos in 2003 and had pictures scattered across a laptop, two desktops, two portable hard drives and a clutch of Secure Digital (SD) flash media cards.  I also had dozens of pictures on Flickr. Most of the shots were of family and friends, but there were also pictures of places, animals and other things.

I had grouped pictures into file folders using random subject names.  I also had copies of the same pictures in multiple places.  Plus I had not fully identified pictures nor deleted all the blurry or repetitive shots.  The end result: many hundreds of semi-organized pictures, incompletely described, with many duplicates.

My head throbbed when I realized the extent of the situation.  “No wonder I’ve been putting this job off,” I thought.  This quickly passed, however, as I counted my blessings.  I still had all the pictures that mattered.  It would have been so easy for a hard drive to fail and to lose many of them forever.  Time to get going.

Sample File Directory on Dropbox

The first thing I did was to install Dropbox, a tool that copies and syncs files among multiple computers.  It also stores a copy of the files on a web server in the cloud.

Dropbox gave me a simple backup system and also let me manage all my photos using either my laptop or desktop.  The program provides two gigabytes of storage for free; more space is available for a fee.

Next, I created a new file folder under “MyDropbox” named “all_photos” on my desktop computer.  I then made subfolders for each of my multiple devices and locations where I had digital photographs.  After installing Dropbox on my other computers, I copied all the photographs into the appropriate subfolder.  The same process applies to the external hard drives: hook them up and copy the files.

Remember to copy the files, not move them.  Copy leaves the original files in place, while move deletes the originals.  You want the originals as a backup in case you make any mistakes with the steps that follow. You can delete them when finished, if you wish.
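The copy-not-move step can also be scripted. This is a minimal Python sketch (Python 3.8 or later, for `dirs_exist_ok`) that copies each source folder into its own subfolder of `all_photos` and leaves the originals in place. The `gather` function is an illustrative name, and the source paths and Dropbox folder location in the commented usage are hypothetical placeholders for your own devices.

```python
import shutil
from pathlib import Path


def gather(sources: dict, dest_root: Path) -> None:
    """Copy (never move) each source folder into dest_root/<name>.

    The originals stay where they are, as a backup in case a later
    step goes wrong.
    """
    dest_root.mkdir(parents=True, exist_ok=True)
    for name, src in sources.items():
        # copytree copies files and leaves the source untouched;
        # dirs_exist_ok=True (Python 3.8+) lets the script be re-run.
        shutil.copytree(src, dest_root / name, dirs_exist_ok=True)


# Hypothetical usage; the paths are placeholders for your own devices:
# gather(
#     {
#         "laptop": Path("C:/Users/me/Pictures"),
#         "portable_drive_1": Path("E:/photos"),
#     },
#     Path.home() / "Dropbox" / "all_photos",
# )
```

Because `shutil.copytree` only copies, running the script twice does no harm; it simply overwrites the copies with the same files.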

I still had the pictures on Flickr.  I used Flump to download them to Dropbox.  There are similar tools available, but Flump is easy to install and use.  It does require Adobe AIR 1.1, also free, to be installed first.  Note that the tool will only download the original image–it won’t bring back any different sizes and it won’t capture the tags or any other descriptive information that you entered through Flickr.

Dropbox will create identical copies of the folders and files on all the computers on which it is installed.  This might take some time to complete, depending on the speed of your internet connection and the size of your collection.

Decide. Now that I had everything in one location, the next step was to figure out what to keep.  There is a range of choices here.  You can choose not to choose: just keep everything, which has the advantage of saving time at this stage.  The problem with this approach is that you may have lots of copies of the same image, or many shots that essentially duplicate the same scene.  These duplicative images add little to the content of your collection, take up extra space and can cause extra work and confusion later on.

I decided to delete duplicative images.  I used Visipics to identify them.  The program has settings that control how strictly it judges what counts as a duplicate, and it also has an auto-select mode.  All detected duplicates are shown side by side with information such as file name, type and size.  You can also manually select the files you want to keep.  After running the program, I decided which images I really needed: in situations where I took six shots of basically the same thing I picked the one or two I liked best.  The best rule of thumb is to keep the highest resolution version of an image, which is usually the original camera file.
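Visipics compares images visually, so it can catch near-duplicates. A simpler check, sketched here in Python as a swapped-in technique rather than anything Visipics does, finds only byte-for-byte identical files by comparing SHA-256 hashes. It is especially handy for the exact copies that pile up when the same pictures sit on several devices. The function names are illustrative; the sketch only reports groups and deletes nothing, so the choice of which copy to keep stays with you.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicates(root: Path) -> list:
    """Group files under root whose contents are byte-for-byte identical.

    Only reports; nothing is deleted.
    """
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_hash(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


# Hypothetical usage:
# for group in exact_duplicates(Path.home() / "Dropbox" / "all_photos"):
#     print([str(p) for p in group])
```

A resized or re-saved copy of a photo will have a different hash, so this will not find those; that is where a visual tool like Visipics earns its keep.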

Organize. This was the most time-consuming step.  Actually, I have to confess that I’m still working on it.  I figure that it is better to have full preservation of the files as quickly as possible while I continue to work on their management; I can always create updated preservation copies later on.  Plus I can work on individual pictures using Dropbox on either my desktop or my laptop, which is very convenient.  Organization involves several activities:

  • Give individual photos descriptive file names
  • Tag photos with names of people and descriptive subjects
  • Create a directory/folder structure on your computer to put the images you picked
  • Write a brief description of the directory structure and the photos

Copy files to archived_photos folder

I started by deciding on a basic organization scheme for my photographs and by creating another set of file folders to implement it.

I named the main folder “archived_photos” and created sub-folders for each year that I had pictures.  Under each year I used a few basic subject headings such as “around home,” “family visit,” “vacation” and so on.  Where necessary I created sub-folders under a heading.  I aimed to be as consistent as possible in using the same headings under different years.  This scheme serves two purposes.  First, it helped me organize my jumbled mass of pictures.  Second, it should help others use the collection in the future.
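A folder scheme like this can be created ahead of time. The Python sketch below builds one `year/heading` folder for each combination. The years and headings in the commented usage are hypothetical examples, with underscores standing in for spaces. It also assumes the same headings under every year, which simplifies the scheme described above; you would still add or prune folders by hand.

```python
from pathlib import Path


def make_skeleton(root: Path, years, headings) -> list:
    """Create root/<year>/<heading> for every year and heading.

    Existing folders are left alone (exist_ok=True).
    """
    created = []
    for year in years:
        for heading in headings:
            folder = root / str(year) / heading
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


# Hypothetical usage with example years and headings:
# make_skeleton(
#     Path.home() / "Dropbox" / "archived_photos",
#     range(2003, 2011),
#     ["around_home", "family_visit", "vacation"],
# )
```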

I used XnView to sort photographs into the right year and subject heading, as well as to rename files. I chose a simple method of renaming based on subject names, but there are other, more detailed options available. I also added captions.  Most of this work is done in the XnView browse view, which shows a thumbnail version of each picture along with the date it was taken.  This is all the information I needed to sort and caption the pictures, since I was the original photographer.
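XnView handled the renaming here. To show what a simple subject-based scheme might look like, this Python sketch maps files to names of the form `subject_year_NNN`. The pattern and the `subject_names` function are hypothetical examples, not XnView's own rename syntax. The sketch only builds the mapping, so you can review it before renaming anything.

```python
from pathlib import Path


def subject_names(files, subject: str, year: int) -> dict:
    """Map each file to subject_year_NNN.ext (hypothetical scheme).

    Files are numbered in sorted order; the extension is kept, lowercased.
    """
    mapping = {}
    for i, path in enumerate(sorted(files), start=1):
        new_name = f"{subject}_{year}_{i:03d}{path.suffix.lower()}"
        mapping[path] = path.with_name(new_name)
    return mapping


m = subject_names([Path("IMG_0102.JPG"), Path("IMG_0101.JPG")], "vacation", 2008)
for old, new in m.items():
    print(old, "->", new)
# IMG_0101.JPG -> vacation_2008_001.jpg
# IMG_0102.JPG -> vacation_2008_002.jpg
```

Once the mapping looks right, `old.rename(new)` applies it. Check first that no target name already exists, since on some systems a rename silently replaces an existing file.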

Another way to think about describing photos is as metadata.  Digital photo metadata is an extensive subject that is well worth learning more about.  Two great sources are photometadata.org and dpbestflow.org.

The next task was to copy the revised photos to the new “archived_photos” directory as shown in the picture.  XnView is fairly intuitive, but it is worth reading the User Guide to figure out how to get the most out of it in browse view.

I then created a written listing of my photos as they are organized under “archived_photos.”  The list is a handy tool for reference and will also be very important to my children or anyone else who will be interested in the photographs later.  I used TreeSize Free to print a full, multi-page list.  The report has a lot of details, some of which can seem cryptic, but it does the job well.
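TreeSize Free produced the listing here. If you would rather have a plain text file, a rough Python stand-in using `os.walk` prints an indented list of folders and files with their sizes. It does not reproduce TreeSize's report format, and the `listing` function name and output file name are illustrative.

```python
import os
from pathlib import Path


def listing(root) -> str:
    """Indented text listing of folders and files under root, with sizes."""
    root = Path(root)
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders alphabetically
        depth = len(Path(dirpath).relative_to(root).parts)
        indent = "    " * depth
        lines.append(f"{indent}{Path(dirpath).name}/")
        for name in sorted(filenames):
            size_kb = os.path.getsize(os.path.join(dirpath, name)) / 1024
            lines.append(f"{indent}    {name} ({size_kb:.1f} KB)")
    return "\n".join(lines)


# Hypothetical usage: save the listing to a text file for printing.
# root = Path.home() / "Dropbox" / "archived_photos"
# Path("photo_listing.txt").write_text(listing(root), encoding="utf-8")
```

A plain text file has the advantage of being readable on almost any computer, now or later.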

Make Copies and Store in Different Places. This is the most important of all the steps because this is how you ensure that you won’t lose your digital photo collection due to a single point of failure, such as a crashed computer.  The handy Dropbox program already made copies of my photos on my laptop and desktop, as well as on a web server somewhere off on the cloud.  For some added insurance I also copied the entire “archived_photos” folder to a portable hard drive, which I stored in my safety deposit box along with the printed file listing.
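A copy is only insurance if it is a good copy. This Python sketch copies the whole archive folder to a destination you supply, such as a portable drive, and then compares every file byte by byte using `filecmp.cmp` with `shallow=False`. It returns any files that are missing or differ. The `copy_and_verify` name and the drive letter in the commented usage are hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def copy_and_verify(src, dest) -> list:
    """Copy the folder src to dest, then compare every file's contents.

    Returns relative paths of files that are missing or differ in dest;
    an empty list means every file copied correctly.
    """
    src, dest = Path(src), Path(dest)
    shutil.copytree(src, dest, dirs_exist_ok=True)  # Python 3.8+
    bad = []
    for path in sorted(src.rglob("*")):
        if path.is_file():
            rel = path.relative_to(src)
            target = dest / rel
            # shallow=False compares actual contents, not just size/date.
            if not target.is_file() or not filecmp.cmp(path, target, shallow=False):
                bad.append(rel.as_posix())
    return bad


# Hypothetical usage; "F:/" stands in for the portable drive:
# problems = copy_and_verify(Path.home() / "Dropbox" / "archived_photos",
#                            Path("F:/archived_photos"))
# print("All files verified" if not problems else problems)
```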

This is not the end of the story, of course. I continue to take digital pictures, so I periodically have to add new images to my central archived collection using the steps above.  I also quickly browse my Dropbox files and portable hard drive every year or so to make sure all appears well.  My plan is to replace all the storage devices for my photos every five years.  When I get a new desktop, laptop, tablet or whatever, the first thing I will do is to install Dropbox and sync all the photo files.  I’ll also get a new portable hard drive–or whatever device eventually replaces it.
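A quick browse can miss a silently corrupted file. One option for the yearly check is a checksum manifest, a fixity technique common in archival practice rather than a Dropbox feature. The Python sketch below records a SHA-256 hash for every file, and a later check reports files that are missing, changed or new. The function names and the manifest file name in the commented usage are hypothetical choices.

```python
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def check_manifest(root, manifest: dict) -> dict:
    """Compare files under root with a previously saved manifest."""
    current = make_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in manifest
                          if k in current and current[k] != manifest[k]),
        "new": sorted(set(current) - set(manifest)),
    }


# Hypothetical usage: save once, then re-check every year or so.
# import json
# root = Path.home() / "Dropbox" / "archived_photos"
# Path("manifest.json").write_text(json.dumps(make_manifest(root), indent=2))
# print(check_manifest(root, json.loads(Path("manifest.json").read_text())))
```

New files are expected as the collection grows; a file listed under "changed" that you did not edit is the warning sign.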

Jan 27, 2011
 

Here is a tale from my personal experience that illustrates both the peril and promise of keeping digital information over time.

Lost & Found, by Thomas Hawk, on Flickr

Starting in 1996 I put out a weekly e-mail newsletter called Culture in Cyberspace.  I used it to report on websites that I found interesting and also offered thoughts about the impact of information technology on society.  It was a small effort that had a mailing list of about 3,000 people when I shut it down in 1997.  I was starting a new day job, and wanted to fully invest myself there; in retrospect, I wish I had stuck with CinC, but that’s another story.

Anyway, this was before the advent of blogs, and comparatively few people were using the internet to publish thoughts about the medium.  As a result, some of what I said got some modest attention.  I was interviewed for a story in Ms. Magazine about Cyber-Rape.  I made my way onto university class reading lists, including this one.  I was footnoted in an academic article, Scholarly Communication and Electronic Publication: Implications for Research, Advancement, and Promotion.  My observations apparently made it into the pricey journal Convergence: The International Journal of Research into New Media Technologies–at least Google claims they did; I’m not willing to pay $25 to the publisher to access the article for confirmation.  Now, it may seem self-aggrandizing to draw attention to these small events from so long ago.  And while I confess to some lingering pride, my main point is that despite the obscurity and the age of my words, Google can still find many of them.

This is a good thing, because I lost about half of the original files.  I had them backed up on my laptop hard drive and on multiple sets of floppy disks, but must confess to falling short of proper personal archive management.  When I got a new laptop I neglected to copy the files before selling the old laptop on eBay.  I kept the backups with a large collection of floppy disks that I all but forgot about in the transition over to recordable CD-ROMs and flash drives.  When I eventually tried to access the disks 10 years later, they had errors and I could only retrieve some of the content.  I turned to Google as a last resort and was frankly amazed at how much still existed in the ether.

This is obviously not an ideal archival arrangement.  There is, for example, no assurance that the words purported to be mine are 100 percent authentic or are presented in what I would consider the right context.  The bigger issue is fragmentation: the Google CinC corpus is patchy in the extreme, ranging from brief mentions, to excerpts, to complete issues.  It is a bit like trying to make sense of an ancient cuneiform library that has been smashed and scattered.  Still, for me in this case, it is far better to have the pieces than nothing.

Postmodern Philosophy Lulz #04 – Marshall McLuhan & A Cat With Cheese On Its Face

Is this how future researchers will experience our world?  Perhaps.  We crank out gigantic quantities of digital documentation about every possible topic, often with no solid plan for how to preserve it.  The vastly distributed nature of the internet ensures that pieces of our collective output persist, and powerful search technology is adept at zeroing in on the tiniest fragments. This is like randomly grabbing chunks of information, throwing them in a digital Cuisinart and hitting “liquefy.”  At what point does the original structure and meaning of the information break down and combine into something different?

Maybe this is the wrong question, however.  One might ask instead: how much will anyone care in the future about hazy issues relating to authenticity, context and original intent?  One of the old rescued CinC articles recorded my thoughts on this very subject in connection with the then-recent book, Life on the Screen by Sherry Turkle.

Turkle is, to my mind, without peer in her ability to probe how computers are changing us.  In Life on the Screen, she lucidly explained how information technology is facilitating a shift from respect for rational, systematic thought to an embrace of personal experience, driven by the ability people have via the web to explore, rearrange, and reinterpret information.  We are becoming much more willing to accept and endorse subjective experience than to filter perception through ideas about what is “right” or “wrong.”  Turkle’s new book, Alone Together, explores these ideas further, and I look forward to reviewing it in a future post.

My experience with losing data, and then finding some of it on the internet, gives rise to a host of thoughts.  If I wanted to be totally pretentious I could say that it revealed the boundary between the Modern and the Postmodern (even though I can’t say I precisely know what those terms mean).  But most of all, I am left wishing that I had just done a better job preserving my digital files.

Jan 3, 2011
 

Ars Technica is one of the best sources anywhere for insight into technology and its ever-expanding impact.  I was especially pleased that the site ran 10 separate articles about digital preservation during the past year.

3-D rendering of a graphene hole by LBNL, on Flickr

Special credit goes to John Timmer, “Science Editor et Observatory moderator.”  He wrote six excellent pieces on the challenge of preserving and providing meaningful access to scientific data.  He treats the issue superbly, bringing it to life using his real-life experience as a genetics and biology laboratory researcher.

Timmer put together a three-part series on scientific data preservation.  Part I: Preserving science: what to do with raw research material? refers to the recent fuss about the UK’s Climatic Research Unit, particularly its messy data management.  “Poorly commented computer code. Data scattered among files with difficult-to-fathom formats…  But the chaos, confused record keeping, and data that’s gone missing-in-action sounded unfortunately familiar to many researchers, who could often supply an anecdote that started with the phrase ‘if you think that’s bad…’”

In Part II: Preserving science: what data do we keep? What do we discard?, he tackles one of the most sensitive—and vexing—issues out there.  “The reality is that we simply can’t save everything. And, as a result, scientists have to fall back on judgment calls, both professional and otherwise, in determining what to keep and how to keep it.”

The inescapable matter of digital media obsolescence is considered in Part III: Jaz drives, spiral notebooks, and SCSI: how we lose scientific data.  “Over the course of my research career, archiving involved magneto-optical disks, a flirtation with Zip and Jaz drives (which ended when some data was lost by said drives), a return to big magneto-optical disks, and then a shift to CDs and DVDs. Interfaces also went from SCSI to Firewire to USB. Anything that wasn’t carefully moved forward to the new formats was simply left behind.”

Wired UK – NDNAD Infographic by blprnt_van, on Flickr

Timmer also weighed in on Changing software, hardware a nightmare for tracking scientific data.  “My work relied on desktop software packages that were discontinued, along with plenty of incompatible file formats. The key message is that, for even careful researchers, forces beyond their control can eliminate any chance of reproducing computerized analyses, sometimes within a matter of months.”

How science funding is putting scientific data at risk highlighted the stark reality that adequate money is all too frequently not provided to maintain important data.   Keeping computers from ending science’s reproducibility explores a huge barrier that gets in the way of confirming research results.  “Traditional science involves a complex pipeline of software tools; reproducing it will require version control for both software and data, along with careful documentation of the precise parameters used at every step.”  But “this work may run up against the issues of data preservation, as older information may reside on media that’s no longer supported or in file formats that are difficult to read.”
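As a small illustration of the documentation side of this, here is a hypothetical Python sketch that records the Python version, platform, analysis parameters and a SHA-256 hash of each input file for one step of an analysis. It is not Timmer's method and it is not version control. It only captures some of the details a later attempt at reproduction would need, and the function names are illustrative.

```python
import hashlib
import json
import platform
from pathlib import Path


def provenance(input_files, parameters: dict) -> dict:
    """Record software versions, parameters and input checksums for one step."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "parameters": parameters,
        "inputs": {
            str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in input_files
        },
    }


def save_record(record: dict, out_path) -> None:
    """Write the record as readable JSON next to the results."""
    Path(out_path).write_text(json.dumps(record, indent=2, sort_keys=True),
                              encoding="utf-8")


# Hypothetical usage; "data.csv" and "threshold" are placeholders:
# save_record(provenance(["data.csv"], {"threshold": 0.05}), "provenance.json")
```

The input checksums make it possible to tell later whether the data on hand is the same data the analysis used, which matters once files have been migrated across media and formats.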

Doom Install Disks by Matt Schilder, on Flickr

Ars ran two articles about preserving video games. The first, Preserving games comes with legal, technical problems, referred to a paper in the International Journal of Digital Curation, Keeping the Game Alive: Evaluating Strategies for the Preservation of Console Video Games.  “Hardware becomes outdated and the media that houses game code becomes obsolete, not to mention the legal issues with emulation.”

The second, Saving “virtual worlds” from extinction, discussed Preserving Virtual Worlds, a project at the University of Illinois at Urbana-Champaign.

The final two articles focused on Library of Congress actions (full disclosure: I work with the Library digital preservation team).  Why the Library of Congress cares about archiving our tweets delved into the huge interest that flowed from the Library’s announcement about acquiring the Twitter archives.  Historic audio at risk, thanks to bad copyright laws discussed a report from the National Recording Preservation Board about problems preserving the complex digital formats that underlie much of today’s music.

Let’s hope that Ars continues its coverage of digital preservation into 2011.  There is quite a bit to talk about.