Yesterday’s New York Times had an Op-Ed article about a problem I’ll bet you never thought of: The Digital Pileup.
The essence of the article, with apologies to Jimmy McMillan, is that the amount of digital information is too damn high. Too much energy to run all those server farms. Too much human cost “wading through digital detritus.” Too much money going to all those damn lawyers demanding electronic discovery.
As someone who worries about digital preservation–that is, trying to keep some digital information accessible into the future–I read this article with conflicted feelings. On the one hand, there can be no doubt that big chunks of important digital information have disappeared. Most of us have personal experience with losing stuff from our own personal computers.
Even more significant is that a huge percentage of our cultural knowledge and experience now lives solely in digital form. Unless care is taken to keep and actively manage this data, we risk losing our collective memory as well as grist for future research and discovery. So it is unsettling–to say the least–to see digital information painted uniformly as “digital detritus,” and users depicted as data “breeders” and “hoarders.” This is silly and simplistic.
But in some deep part of my mind I get the strange sense that the author has a point. Sources of digital information–the web, organizational records, social media, scientific databases–are huge pipes, gushing with superabundant data. The scale and complexity of this information are well beyond the ability of individuals, and even most individual organizations, to manage. There is so much data that many librarians and archivists are left feeling overwhelmed and perhaps even disheartened in their efforts to get a handle on preserving what is important.
The traditional model of collecting and preserving books, papers, and just about everything else rests on an assumption of scarcity: humanity has a limited capacity for documenting itself, and the portion worth keeping is much smaller still. Methods for choosing valuable information are based on well-understood ideas about what users will appreciate and what generally will enrich creativity and learning.
All this is turned on its head in the digital age. Humanity now has a superabundant means to document itself, and it is, at this point, hard to say with certainty which of this information has ongoing value for research or some other use. Data mining and the ability to link different kinds of data to learn new information lead one to see potential value in just about everything. The choice frequently boils down to keeping lots and lots of data or keeping nothing.
It can all seem too much. So, when the article asked “is there anything we can do?” my natural optimism instinctively perked up–for a split second.
Sadly, there is no silver bullet. The author doesn’t offer much: “we can demand that our companies… aggressively engage in data reduction strategies” (none of the data I’ve cranked out, thank you) and “we can clean up the stockpiles of dead data that live around us” (I’m not yet ready to trash my 50,000 Gmail messages).
Here is a dead obvious prediction: the amount of digital data will continue to grow at a fantastic rate as the pleasure, benefit and lure of technology deepen. And, like Rosalind in As You Like It, we will continue to ask–rhetorically, only–“why then, can one desire too much of a good thing?”

