Feb 14 2011

What should we call our future with regard to saving and using digital information?

Billions and Billions Served, by Miss Millificent, on Flickr

I think one common term misses the mark in conveying the true threat to data and in expressing the basic imperative for keeping it.

“Digital dark ages” is a popular term that plays on fear and, by the way, suggests that the forces of history are working against data persistence. The phrase makes for provocative paper and article titles, true, but it hasn’t attracted adequate support. What’s more, David Rosenthal makes a compelling argument that “digital dark ages turns out to be a poor analogy for the situation we face today.”

Rosenthal, among others, points to what is in fact a completely different reality: the huge and galloping vastness of digital information. And data will continue to grow at an incredible rate. The Economist noted last year that “information has gone from scarce to superabundant.” Far from data loss through obsolescence, the big problem is actually too much data. The magazine adds that “the proliferation of data is making them increasingly inaccessible.”

Science has just issued a Special Online Collection: Dealing with Data (registration required). The introduction notes that “we have recently passed the point where more data is being collected than we can physically store,” and “even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used.” There are also references to limited funding for data curation, whether to enable broader use or even just to keep the bits safe.

Seth Godin famously noted that people aren’t more worked up over global warming for two basic reasons.  One is the name: “global” is good and “warming” is good so how can “global warming” be bad?

The second reason is that climate change activists “have been unable [to] tell their story with vivid images about immediate actions, it’s just human nature to avoid the issue.” People need to have an immediate sense of any problem to focus on fixing it.

Digital preservation faces something similar.  “Digital dark ages” sounds scary at first, but the term flies in the face of the reality we confront.  Given how stressed people say they are about information overload, the prospect of data disappearing may actually sound pretty good.

We need a better way to communicate the need for digital preservation and access. In another context, Joseph Hellerstein has talked about “the industrial revolution of data,” which may have some possibilities. “Data-driven” is another current term tossed around in science and technology.

Any thoughts on this?

Picture added, reformatted and tweaked for style on 2/14/2011, 4:45 pm EST

Jan 03 2011

Ars Technica is one of the best sources anywhere for insight into technology and its ever-expanding impact.  I was especially pleased that the site ran 10 separate articles about digital preservation during the past year.

3-D rendering of a graphene hole by LBNL, on Flickr

Special credit goes to John Timmer, “Science Editor et Observatory moderator.” He wrote six excellent pieces on the challenge of preserving and providing meaningful access to scientific data. He treats the issue superbly, bringing it to life using his real-life experience as a genetics and biology laboratory researcher.

Timmer put together a three-part series on scientific data preservation. Part I, “Preserving science: what to do with raw research material?”, refers to the recent fuss about the UK’s Climatic Research Unit, particularly its messy data management. “Poorly commented computer code. Data scattered among files with difficult-to-fathom formats… But the chaos, confused record keeping, and data that’s gone missing-in-action sounded unfortunately familiar to many researchers, who could often supply an anecdote that started with the phrase ‘if you think that’s bad…’”

In Part II, “Preserving science: what data do we keep? What do we discard?”, he tackles one of the most sensitive and vexing issues out there. “The reality is that we simply can’t save everything. And, as a result, scientists have to fall back on judgment calls, both professional and otherwise, in determining what to keep and how to keep it.”

The inescapable matter of digital media obsolescence is considered in Part III, “Jaz drives, spiral notebooks, and SCSI: how we lose scientific data.” “Over the course of my research career, archiving involved magneto-optical disks, a flirtation with Zip and Jaz drives (which ended when some data was lost by said drives), a return to big magneto-optical disks, and then a shift to CDs and DVDs. Interfaces also went from SCSI to Firewire to USB. Anything that wasn’t carefully moved forward to the new formats was simply left behind.”

Wired UK – NDNAD Infographic by blprnt_van, on Flickr

Timmer also weighed in on “Changing software, hardware a nightmare for tracking scientific data.” “My work relied on desktop software packages that were discontinued, along with plenty of incompatible file formats. The key message is that, for even careful researchers, forces beyond their control can eliminate any chance of reproducing computerized analyses, sometimes within a matter of months.”

“How science funding is putting scientific data at risk” highlighted the stark reality that adequate money is all too frequently not provided to maintain important data. “Keeping computers from ending science’s reproducibility” explored a huge barrier that gets in the way of confirming research results. “Traditional science involves a complex pipeline of software tools; reproducing it will require version control for both software and data, along with careful documentation of the precise parameters used at every step.” But “this work may run up against the issues of data preservation, as older information may reside on media that’s no longer supported or in file formats that are difficult to read.”

Doom Install Disks by Matt Schilder, on Flickr

Ars ran two articles about preserving video games. The first, “Preserving games comes with legal, technical problems,” referred to a paper in the International Journal of Digital Curation, “Keeping the Game Alive: Evaluating Strategies for the Preservation of Console Video Games.” “Hardware becomes outdated and the media that houses game code becomes obsolete, not to mention the legal issues with emulation.”

The second, “Saving ‘virtual worlds’ from extinction,” discussed Preserving Virtual Worlds, a project at the University of Illinois at Urbana-Champaign.

The final two articles focused on Library of Congress actions (full disclosure: I work with the Library digital preservation team). “Why the Library of Congress cares about archiving our tweets” delved into the huge interest that flowed from the Library’s announcement about acquiring the Twitter archives. “Historic audio at risk, thanks to bad copyright laws” discussed a report from the National Recording Preservation Board about problems preserving the complex digital formats that underlie much of today’s music.

Let’s hope that Ars continues its coverage of digital preservation into 2011.  There is quite a bit to talk about.