Oct 10, 2011

Innovation is one of those words that is as loaded as it is inescapable.

It appears constantly on billboards and in TV commercials and political speeches. I’ll wager every big organization in the world lays claim to the concept through a mission statement or some other purported self-description. Our hopes for improved institutional outcomes–from schools, from hospitals, from governments–are all stoked by a devotion to the glimmering promise of doing things better in a new way.

alien innovate, by TaranRampersad, on Flickr

What about digital preservation? Is innovation the key to dealing with all that valuable digital data?


This is, of course, a very unsatisfying answer. Innovation should be the answer to everything, most especially to all things digital.

“Never before in history has innovation offered promise of so much to so many in so short a time,” is a quote attributed to Bill Gates, and at first glance it seems to ring with a self-evident truth.

When considered from the popular perspective of innovation, digital preservation looks like a straightforward challenge for libraries, archives, museums and other entities that long have kept information on behalf of society. All they need are some new ideas, practices and tools–all of which information technology excels in delivering. There’s also a neat symmetry here: technology created new kinds of information for libraries to preserve, so technology can help libraries do the job.

But it isn’t quite so easy. The basic problem is what Larry Downes has called “the laws of disruption,” of which the most fundamental is  “technology changes exponentially, but social, economic and legal systems change incrementally.”  Downes notes that innovative digital technology has thoroughly roiled many social conventions and that “nothing can stop the chaos that will follow.”  An overly dramatic statement, yes, but it illustrates that innovation is not a safe, orderly or controllable process.  It sends out big ripples of disruption with an unpredictable impact.

Consider the irony: organizations tout innovation as a way to thrive and prosper when the truth of the matter is that real innovation often destabilizes and destroys.

Libraries and other memory organizations are now bouncing on ripples of disruption, and the ride likely will stay scary for the foreseeable future.  Innovation puts these institutions in a bind: they are now confronted with a huge array of demands and choices that traditional structures are ill-suited to address.  They face an irresistible need for change.  But the further they stick their toes into the waves of innovation, the greater the potential for even more destabilization.  And since most institutions strongly resist that which threatens their stability, they have an unmovable incentive to resist real change.  All this means that the ability of traditional institutions to fully meet the need for digital preservation is in doubt.

Future as Disruption, by Fu Man Jew, on Flickr

Well, that’s depressing.  Wait, though–there’s another side to innovation that offers hope for meeting the digital preservation challenge. Many individual librarians and archivists are using new kinds of tools and services–such as LOCKSS and “micro-services”–to build local preservation solutions.

Even more significantly, individuals of all kinds are playing a role in determining what gets saved and how that content is used.  Consider the impact that one person–Brewster Kahle–has made over the years through the Internet Archive.  Jason Scott is getting high-profile attention for his grassroots work to preserve large volumes of web content abandoned by companies such as Yahoo!.  All kinds of average people are developing interest in personal digital archiving to preserve their family memories.

Tim O’Reilly, the visionary who first saw the development known as web 2.0, sees a major role for individuals in digital preservation.   Here’s a summary from an account of his talk at a recent Library of Congress meeting:

O’Reilly stressed the preservation role of people working outside of institutions.  He called for “baking in” more preservation functionality into tools used to create and distribute digital content to enable a more distributed stewardship mindset.  This is important because “the things that turn out to be historic are not thought to be historic at the time.”   O’Reilly also said one of the most tweetable bits at the meeting: “Digital preservation won’t be just the concern of specialists, it will be the concern of everyone.”

I have some sympathy with O’Reilly’s argument.  It builds on the powerful trend of individuals asserting control over how information is published, distributed and used.  The result of a broad-based popular effort to steward digital data would also address some fundamental preservation needs: lots of distributed copies that are open for active use.  Individuals also often can adapt to change with more flexibility than can institutions.

Ultimately, we have to hope that innovation pushes along the trend toward the democratization of digital preservation.  The more people who care about saving digital content, and the easier it is for them to save it, the more likely it is that bits will be preserved and kept available.


Jan 21, 2011

I wrote earlier about efforts that have gone into preserving what are known as ephemeral films: productions geared for educational, advertising or other uses separate from theatrical releases.  One of the largest online sources for these films is the Internet Archive, which has thousands of titles, many of which have long lived in obscurity.  Now accessible in digital form, these films are open to discovery.

A colleague pointed me to a U.S. government film in the IA FedFlix collection, The American Scene Series, Number 11: The Library of Congress.  It was made by the Office of War Information, Overseas Branch, around 1945.  Over the course of 20 minutes it presents a remarkable portrait of the Library at that time.  The film mentions the Library’s work to conduct field recordings of “unknown primitive singers” and has brief clips of two recording sessions.

One is of Brownie McGhee and Sonny Terry, very well-known artists in the “acoustic, folk-oriented Piedmont blues style… ready-made for the folk festivals and college campuses of the 1960s.”  The duo perform one of their signature songs, “Red River Blues,” for about two minutes.  The setting appears to be a farmyard, and the clip shows two earnest recording technicians at work.  It is an amazing scene, as the embedded video attests.

The second clip is also remarkable:  Woody Guthrie singing “Ranger’s Command,” again in a rural setting with recording gear in evidence.  The Guthrie clip is already on YouTube, but is no less compelling than the first.

I think I may spend more time trolling through these old films for such unexpected treasures.  But only the digitized versions!

Jan 04, 2011

I recently taught a class of library school students about digital preservation.  On the plus side, they were a bright lot, many already knew something about the subject and I could inflict some advance reading on them.  The flip side is that this was a single class lasting only two and a half hours, which is a short time to cover a broad subject.

I’ll go over what I tried to cover in the class in a later post (quite a lot, actually).  For now let me share what I sent out as a reading list with the intent to prepare students for our face-to-face encounter.

Preparing the list was a challenge.  I wanted sources that covered issues of current importance, were succinct and were reasonably friendly to the non-expert.  I did find plenty of good information, but was surprised how few sources fit this particular need.  In fact, one could argue that no source currently meets the need.  Many focus on a particular program, approach or issue.  Many drill down into very granular details that can overwhelm the novice.  Others are a bit long or a bit old.  After much hunting and culling, the best I could find turned out to consist of 15 items, as noted below.

I’d like to hear about other sources that I might have missed.

Update: I clarified that the selected sources were the best I found, not that they all met the three criteria listed.

Let me add my usual full disclosure notice: I work with the NDIIPP team at the Library of Congress.

Jan 03, 2011

Ars Technica is one of the best sources anywhere for insight into technology and its ever-expanding impact.  I was especially pleased that the site ran 10 separate articles about digital preservation during the past year.

3-D rendering of a graphene hole by LBNL, on Flickr

Special credit goes to John Timmer, “Science Editor et Observatory moderator.”  He wrote six excellent pieces on the challenge of preserving and providing meaningful access to scientific data.  He treats the issue superbly, bringing it to life using his real-life experience as a genetics and biology laboratory  researcher.

Timmer put together a three-part series on scientific data preservation.  Part I: Preserving science: what to do with raw research material? refers to the recent fuss about the UK’s Climatic Research Unit, particularly its messy data management.  “Poorly commented computer code. Data scattered among files with difficult-to-fathom formats…  But the chaos, confused record keeping, and data that’s gone missing-in-action sounded unfortunately familiar to many researchers, who could often supply an anecdote that started with the phrase ‘if you think that’s bad…’”

In Part II: Preserving science: what data do we keep? What do we discard?, he tackles one of the most sensitive—and vexing—issues out there.  “The reality is that we simply can’t save everything. And, as a result, scientists have to fall back on judgment calls, both professional and otherwise, in determining what to keep and how to keep it.”

The inescapable matter of digital media obsolescence is considered in Part III: Jaz drives, spiral notebooks, and SCSI: how we lose scientific data.  “Over the course of my research career, archiving involved magneto-optical disks, a flirtation with Zip and Jaz drives (which ended when some data was lost by said drives), a return to big magneto-optical disks, and then a shift to CDs and DVDs. Interfaces also went from SCSI to Firewire to USB. Anything that wasn’t carefully moved forward to the new formats was simply left behind.”

Wired UK – NDNAD Infographic by blprnt_van, on Flickr

Timmer also weighed in on Changing software, hardware a nightmare for tracking scientific data.  “My work relied on desktop software packages that were discontinued, along with plenty of incompatible file formats. The key message is that, for even careful researchers, forces beyond their control can eliminate any chance of reproducing computerized analyses, sometimes within a matter of months.”

How science funding is putting scientific data at risk highlighted the stark reality that adequate money is all too frequently not provided to maintain important data.   Keeping computers from ending science’s reproducibility explores a huge barrier that gets in the way of confirming research results.  “Traditional science involves a complex pipeline of software tools; reproducing it will require version control for both software and data, along with careful documentation of the precise parameters used at every step.”  But “this work may run up against the issues of data preservation, as older information may reside on media that’s no longer supported or in file formats that are difficult to read.”

Doom Install Disks by Matt Schilder, on Flickr

Ars ran two articles about preserving video games. The first, Preserving games comes with legal, technical problems, referred to a paper in the International Journal of Digital Curation, Keeping the Game Alive: Evaluating Strategies for the Preservation of Console Video Games.  “Hardware becomes outdated and the media that houses game code becomes obsolete, not to mention the legal issues with emulation.”

The second, Saving “virtual worlds” from extinction, discussed Preserving Virtual Worlds, a project at the University of Illinois at Urbana-Champaign.

The final two articles focused on Library of Congress actions (full disclosure: I work with the Library digital preservation team).  Why the Library of Congress cares about archiving our tweets delved into the huge interest that flowed from the Library’s announcement about acquiring the Twitter archives.  Historic audio at risk, thanks to bad copyright laws discussed a report from the National Recording Preservation Board about problems preserving the complex digital formats that underlie much of today’s music.

Let’s hope that Ars continues its coverage of digital preservation into 2011.  There is quite a bit to talk about.

Dec 29, 2010

Much news came from the recent announcement that the Library of Congress was adding 25 new titles to the National Film Registry for permanent preservation.  This assortment of “Hollywood classics, documentaries and innovative shorts reflecting genres from every era of American filmmaking” was selected because the films are “culturally, historically or aesthetically significant.”

02.23.38.jpg by footage, on Flickr

Amid all the hoopla, it is worth bearing in mind that there are many, many other film and video productions that have a different kind of significance, one that offers unique historical–and sociological–insight.  These are “ephemeral films,” which consist of everything from anti-drug dramas to car advertisements.  Rick Prelinger is a long-time champion of these productions, and he provides access to over 2,000 titles through the Internet Archive. Equally important, the information has been placed in the public domain under a Creative Commons license.

While titles such as Good Table Manners, Duck and Cover, and Are You Popular? never won an Oscar, they do document certain idealized behaviors and mindsets from the past.  On this basis, the Library of Congress acquired over 48,000 titles from Prelinger in 2002.

Announcing the acquisition, the Librarian of Congress noted that the picture these films offer is “quite distinct from that found in Hollywood feature films and newsreels. These are the films that children watched in the classroom, that workers viewed in their union halls, that advertisers presented in corporate boardrooms, and that homemakers saw at women’s club meetings.”

Our collective history is richer with the preservation of the Prelinger collection.  With apologies to Baudelaire, we now have the chance to extract eternal knowledge from ephemeral films.