Oct 30, 2011

There is a polite but persistent disagreement among librarians, archivists and other normally peaceful souls who care about keeping digital information accessible into the future.  The conflict is low key, as one might expect: no one is occupying reading rooms, much less being led away in plastic handcuffs. But there are few signs that all parties are ready to reconcile.

Keep... - Day 50, by MarkAllanson, on Flickr

The dispute is over the proper term to use for the act of keeping digital content alive over time.  One group sticks with “digital preservation,” which is a blanket term of choice dating to at least the early 1990s.  Some insurgents have been urging use of “digital curation” during the last few years, stating, among other things, that the term is more comprehensive.  The idea is that curation covers the full life cycle of information, from creation through to access.  Another group adheres to “digital stewardship,” also to convey the broad set of responsibilities involved, as well as to impart the necessity of ongoing, active engagement needed to keep digital information in a form that is authoritative, uncorrupted and useful.  (Full disclosure: I work for the Library of Congress and had a hand in launching the National Digital Stewardship Alliance.)

The story doesn’t end here, as there are a bunch of other competing terms.  “Archive” is an old standby that can variously mean transfer to a preservation repository or merely to store data offline.  “Digital conservation” pops up from time to time in quite different circumstances.  A more recent entrant is “data management,” which is associated with a recent U.S. National Science Foundation requirement for funding proposals to include a plan for dissemination and sharing of data based on the research.

To make matters even more complicated, the same terms preservationists use are used by others in completely different contexts.  “Digital curation” is a hot buzzword that huge numbers of people associate with picking content for distribution via the web.  “Digital preservation” is also used to mean constructing 3D scans of historic sites, such as Ft. Laramie, WY.

Friedrich Wilhelm Nietzsche painted portrait _DDC1256, by Abode of Chaos, on Flickr

Is it a problem that there are so many competing terms for what non-experts (particularly the ones who control funding) might reasonably consider to be approximately the same activity?  I’m of two minds here.  Part of me yearns for more clarity (through adoption of my favorite definition, of course).  But another part of me recognizes the inevitability of multiple terms because there are multiple perspectives, including about how best to get attention and support.  “We have a problem with language in this domain at the moment,” Steve Knight declared at a conference in 2008.  He said the term “permanent access” more accurately communicates what the preservation community is trying to do.  More to the point, Chris Rusbridge bluntly titled a blog post “Digital Preservation” term considered harmful?  and went on to note that “the language used and the way that the discourse is constructed [in the digital preservation community] is unlikely to make much impact on either decision-makers or the creators of the digital information.”  The author suggests “selling the outcomes” through use of terms like “long term accessibility” or “usability over time.”  More recently yet Kari Kraus opined in the New York Times that “we must replace digital preservation with digital curation,” if we are to make effective use of data worth saving.

All of this is perhaps part of a larger contemporary social force that resists firm definitions.  Politicians, for example, are continuously “redefining” themselves.  There are ongoing debates about how (or even whether) to define race for college applications.  It’s even  impossible to define the value of stretching before a workout.

Closer to home, Fred Gibbs recently analyzed 170 definitions of “the digital humanities,” and was relieved to find that they fell into only nine categories (one of which was “refusals to define the term”).  It may well be that we live in an age where conceptual boundaries are weak, especially for emergent practices and ideas. This contrasts with earlier intellectual intentions: consider this bit from Friedrich Nietzsche and the Politics of Transfiguration, by Tracy B. Strong (thank you, Google books, for giving me the means to quote this juicy highbrow stuff):

The ability to give names–to extend control of language over the world… [is] a masterly trait: it consists of saying what the world is.  The reverse proposition will also be accurate: knowledge of the power of language may lead to a prohibition on the use of certain names…. To name is to define and to bring under control, to give determination of the being of the object in question.  The allocation of names creates the world in the image of he who names.

Our inability “to define and bring under control” the concept underlying digital preservation is a sign of the times.  And while some  press for a favorite term, others adopt a more accepting stance.  Embedded in the lengthy discussion that is Semantics: Digital Preservation vs. Digital Curation, Chris Prom outlines issues with various terms, but also offers some down to earth advice: “I think we all need to be comfortable with using a lot of different terms to explain what we do.”

En Control, by alvaroalegria, on Flickr

This perspective is evident in a brand new Association of Research Libraries publication Digital Preservation, SPEC Kit 325.  While the title embraces one term, the executive summary takes pains to mention others, including digital curation, continued access and life cycle curation.  The report also notes that “the definition of ‘digital preservation’ is still murky for some librarians.  A number of respondents confused ‘backups’ with ‘preservation’ and referred to access-oriented repository services as though they were preservation solutions.”

Personally, I like the ARL approach and feel that, for all its limitations, “digital preservation” is still the best catch-all term.  But there are other terms that fit the concept also, and we need to be fully aware of their nuances.  There will be a continuing need to apply all the current terms–and probably some new ones–in the course of managing digital information into the future.

Feb 14, 2011

What should we call our future with regard to saving and using digital information?

Billions and Billions Served, by Miss Millificent, on Flickr

I think one common term misses the mark in conveying the true threat to data and in expressing the basic imperative for keeping it.

“Digital dark ages” is a popular term that plays on fear, and by the way, suggests that the forces of history are working against data persistence.  The phrase makes for provocative paper and article titles, true, but it hasn’t leveraged adequate support. Not to mention the fact that David Rosenthal makes a compelling argument that “digital dark ages turns out to be a poor analogy for the situation we face today.”

Rosenthal, among others, points to what is in fact a completely different reality: the huge and galloping vastness of digital information.  And data will continue to grow at an incredible rate–The Economist noted last year that “information has gone from scarce to superabundant.” Far from data loss through obsolescence, the big problem is actually too much data.  The Economist notes that “the proliferation of data is making them increasingly inaccessible.”

Science has  just issued a Special Online Collection: Dealing with Data (registration required).  The introduction notes that “we have recently passed the point where more data is being collected than we can physically store,” and “even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used.”  There are also references to limited funding for data curation to enable broader use or even just keeping the bits safe.

Seth Godin famously noted that people aren’t more worked up over global warming for two basic reasons.  One is the name: “global” is good and “warming” is good so how can “global warming” be bad?

The second reason is that climate change activists “have been unable tell their story with vivid images about immediate actions, it’s just human nature to avoid the issue.”  People need to have an immediate sense of any problem to focus on fixing it.

Digital preservation faces something similar.  “Digital dark ages” sounds scary at first, but the term flies in the face of the reality we confront.  Given how stressed people say they are about information overload, the prospect of data disappearing may actually sound pretty good.

We need a better way to communicate the need for digital preservation and access. In another context, Joseph Hellerstein has talked about “the industrial revolution of data,” which maybe has some possibilities.  “Data-driven” is another current term tossed around in science and technology.

Any thoughts on this?

Picture added, reformatted and tweaked for style on 2/14/2011, 4:45 pm EST

Jan 04, 2011

I recently taught a class of library school students about digital preservation.  On the plus side, they were a bright lot, many already knew something about the subject and I could inflict some advance reading on them.  The flip side is that this was a single class lasting only two and a half hours, which is a short time to cover a broad subject.

I’ll go over what I tried to cover in the class in a later post (quite a lot, actually).  For now let me share what I sent out as a reading list with the intent to prepare students for our face-to-face encounter.

Preparing the list was a challenge.  I wanted sources that covered issues of current importance, were succinct and that were reasonably friendly to the non-expert.  I did find plenty of good information, but was surprised how few sources fit this particular need.  In fact, one could argue that no source currently meets the need.  Many focus on a particular program, approach or issue.  Many drill down into very granular details that can overwhelm the novice.  Others are a bit long or a bit old.  After much hunting and culling, the best I could find turned out to consist of 15 items, as noted below.

I’d like to hear about other sources that I might have missed.

Update: I clarified that the selected sources were the best I found, not that they all met the three criteria listed.

Let me add my usual full disclosure notice: I work with the NDIIPP team at the Library of Congress.

