Mar 24, 2014

Are you looking forward to The Emails of Thomas Pynchon? Or maybe Jonathan Franzen: Tweets and Chats?

Sorry, but the future holds something different for the literary remains of famous authors.

By Frank Boyd, on Flickr

Email and other forms of digital technology represent a sea change for writers. Works are drafted and rewritten on the screen. Authors have a vastly expanded capability to create and to correspond with editors, friends and others, all of whom may be just a few keystrokes away.

But the degree to which any one writer’s digital trail survives is very much an open question. In 2005, for example, Zadie Smith speculated that her email “will go the way of everything else I write on the computer–oblivion.”

Famous writers have long bequeathed their correspondence, drafts and unpublished works to libraries and archives. It’s an arrangement that benefits everyone: the institution builds prestige; the researcher gets revealing material; the public learns about the literary back story; and the writer (or her estate) gets money. Yet the whole system as we know it is built on paper: letters, journals and hand-annotated drafts.

Personal digital content threatens everything. The biggest problem is the “personal” part: authors, like the rest of us, can be poor stewards of their own digital legacy. They don’t back up their hard drives. Their files are a disorganized mess. Their content is scattered among multiple devices and online platforms. And while writers may know that some of this digital material has enduring value, there is as yet no easy way to even think about preserving it. All of us are still working through what digital means in our lives.

People have a natural emotional connection to paper–it’s easy to see, to handle and to store. It’s durable and even resists apparent efforts to destroy it. Even though Samuel Clemens could, for example, write a letter declaring “shove this in the stove… I don’t want any absurd ‘literary remains’ & ‘unpublished letters of Mark Twain’ published after I am planted,” the words live on because they were on paper.

Clemens changed his mind regarding his letters, choosing to “leave it behind and utter it from the grave.” He has plenty of company.  F. Scott Fitzgerald, William Faulkner, Sylvia Plath, Mary McCarthy and Saul Bellow all left paper literary remains. Over the last century, libraries and archives have developed great expertise in acquiring and preserving this material.

But we are at an unusual point in documenting literary lives and works. Authors have had word processing and other forms of personal digital technology available to them for 30 years. Some writers have stubbornly refused to use it, but many have embraced it and are contemplating their own “absurd literary remains.” What actually remains is a big open question. Are there emails with editors or notable authors? Drafts with tracked changes? Ribald direct messages?

At this point, there are only a few institutions with literary personal digital materials. The Norman Mailer Papers at the Harry Ransom Center, University of Texas at Austin, include “359 computer disks, 47 electronic files, 40 CDs, 6 mini data cartridges, 3 laptop computers [documenting] correspondence and literary drafts.” The Salman Rushdie Papers at the Emory University Library have “one Macintosh Performa 5400/180, one Macintosh PowerBook 5300c, two Macintosh PowerBook G3 models, and one SmartDisk FWFL60 FireLite 60GB 2.5′ FireWire Portable Hard Drive.” The Susan Sontag Papers at the Charles E. Young Library, University of California Los Angeles, contain “seventeen thousand one hundred and ninety-eight e-mails.” All these are hybrid collections, which is to say that most of the material is on paper.

How much born digital content is still out there, living wild under the good/bad/indifferent care of writers who find themselves to be their own unintended digital archivists? However much there is, I suspect that the proportion of paper to digital is rapidly declining.

What to do about it? Raise awareness about the value of personal digital archives across the board, pure and simple. Everyone has a story to tell and a digital legacy to pass on. The apparent value of email and other content will, I am sure, become more obvious over time.

This is already happening for writers. In 2005, Rick Moody told the New York Times that, when he was considering the sale of his papers, the dealer wanted to know about email. “This sort of brought to mind that there was a policy [for saving it], though it was a very unmethodical policy,” he said. Paying money for email is certainly one way to draw attention to its value. And once writers, agents, publishers, libraries and archives, and all the rest of us understand that personal digital collections warrant careful management from the moment of creation, we will see better tools and methods for personal digital archiving.

In the meantime, we can only speculate how much and what kinds of digital literary remains will find their way into research collections. Or, to paraphrase Sontag, our libraries await the digital archives of longing.


Mar 15, 2014

Edward Snowden did more than blow the lid off secret government surveillance. He has called into question a fundamental role of government itself: keeping records.

Snowden at SXSW, by Cory Doctorow, on Flickr

Governments have always kept records. Documentation is needed for protecting legal rights and financial obligations, as well as for establishing individual identities and relationships. While there have been instances of public outrage over certain overzealous documentation efforts (such as those of the East German Stasi), government record keeping is something most people accept as a fact of life.

And when it comes to the idea of “archives”–records kept permanently for their historical or other value–it’s easy to stir the mystic chords of memory. In laying the cornerstone of the U.S. National Archives in 1933, Herbert Hoover declared, “This temple of our history will appropriately be one of the most beautiful buildings in America, an expression of the American soul.”

The concept of government archives as secular temples persists into the digital era. In 2001, Archivist of the U.S. John Carlin described the National Archives and Records Administration’s system for preserving email and other born digital records, the Electronic Records Archives, as follows: “An ERA will allow us at NARA to make a much greater amount of our holdings— these records of democracy, ‘the people’s records’— available to more citizens via the Internet. And that will make our country and our democracy stronger.”

But all this assumes the people are fine with government collecting and keeping digital records about them. The news has long been full of stories about data breaches and digital identity theft. Add to that the specter of government snooping, and it’s no surprise that public anxiety is higher than ever. According to the Electronic Privacy Information Center, “public opinion polls consistently find strong support among Americans for privacy rights in law to protect their personal information from government and commercial entities.” Polls also show that public trust in government is at historically low levels.

All this makes me wonder how supportive the public will be in the years ahead of government efforts to collect and manage any kind of digital information. Given that people tend to paint “the government” with a broad brush, skepticism can easily extend to national cultural heritage institutions. When the Library of Congress announced in 2010 that it would collect the Twitter archive, the agency immediately faced an uproar of privacy concerns that continues to this day. A 2013 article on the project is entitled Does the Government Monitor Your Twitter Account? As absurd as this seems, somebody was worried enough to establish a Twitter application (now apparently inactive) to automatically delete tweets before they fell under the Library’s control.

There is a bit of inside irony about all this. The landmark 1996 report, Preserving Digital Information: Report of the Task Force on Archiving of Digital Information, declared that libraries and archives needed to prove they could excel in collecting and keeping digital content to gain and keep public trust. Trust has indeed proven to be a paramount issue, but it’s not about demonstrating capability through building trustworthy systems as the report called for. Instead, the public worries that the government and other large institutions have too much technological capability–so much so that privacy is compromised. Trust instead is linked to degrading capability through some combination of personal data ownership, data collection restrictions and even calls to scale back cloud computing.

Are we at a point where the public wants government to do less in terms of collecting and keeping electronic records? If so, that’s a major concern, especially since many government archives already are struggling in this area. With regard to NARA, for example, see Record Chaos: The Deplorable State of Electronic Record Keeping in the Federal Government, 2008, and Report on Current Recordkeeping Practices within the Federal Government, 2001. (Sadly, whatever success the National Security Agency has had isn’t transferable.)

One thing seems certain: government needs to establish itself as a trustworthy manager of electronic records before government archives become digital temples of history.

Mar 06, 2011
Terra Cotta Archivists, Internet Archive

I attended the Personal Digital Archiving conference in San Francisco last week. Some of the usual suspects in the world of digital preservation were there, most of whom (myself included) are affiliated with institutions.

But there were also a few rugged individuals who, out of passion or some other impulse, are working alone to collect digital content.

These lone preservers deserve our thanks. Future users will thank them even more.

Most big collecting institutions–libraries, archives and museums–have yet to fully turn their attention to digital content, most especially born digital material. The problems, wildly generalized, are fundamental:

  • Resource demands for managing traditional, non-digital holdings remain substantial.
  • New resources are hard to come by, and prospects for cuts loom.
  • Digital content is new and trendy, and may seem frivolous;  it is hard to know which of it merits saving.
  • Many–most?–staff have spent careers apart from digital material and are not eager to deal with it.
  • Many–most?–institutions have limited technological capacity or infrastructure to manage digital holdings.

Individuals acting on their own are free from these concerns.  They don’t have big legacy collections to worry about.  They don’t have to defend their actions to overseers.  It’s easy to get cheap technology to do the job.

PDA 2011 Conference Sign

The prime example of the lone collector is Brewster Kahle of the Internet Archive, which hosted PDA 2011. Kahle and his helpers had web archiving to themselves for the first few years, when there was plenty of skepticism about the value of the content. Around 2000, some institutions began to selectively capture websites, often working in concert with the IA. Today, large-scale web capture is underway around the world: there are now over 30 national libraries and other entities devoted to the job.

Jason Scott spoke at the conference. Scott, proprietor of and collector of “marginalized data, the textfiles and message bases of dial-up bulletin board systems of the 1970s, 80s, and 90s,” is a self-described “tiring activist.” He said that much digital information was at risk, facing a “danger of deletion, a danger of being lost, a danger that a piece of history, with its value unrecognized and a lack of interest in what it might mean, might just be lost forever.”

Scott talked about a recent project to download a copy of the websites formerly housed on the Geocities web hosting service.  He passionately defended the value of this information against “the current natural order of things for hosting user-generated content [which] is this: Disenfranchise. Demean. Delete.”  Scott also advocated individual responsibility for one’s own personal content.  “Go to your own computer, plug in a USB stick and copy your documents folder, because that’s the only thing that nobody’s going to be able to save.”

All of this leads me to speculate that, when it comes to digital content, our culture is reverting to an era when we depended on high-minded individuals to build singular collections of art, books, manuscripts and other documentary material. The survival of much important information is due solely to individual initiative, as its true value only became apparent years later.

Perhaps it is appropriate that the era of user-generated content also includes the return of the heroic private collector.  A twist is that the heroics are now scalable.  The far end of the scale has people like Kahle and Scott.  At the near end are everyday people who do their best to keep family photos and the occasional email.

Libraries, archives and museums, of course, still have a major role to play.  If history is a guide, they will eventually assume stewardship responsibility for some private digital collections, and they will also expand their own curatorial interests into this realm.



Feb 24, 2011
IMG_2582, by QuickLunarCop, on Flickr

Personal digital archiving–actions that individuals undertake to enhance the persistence and accessibility of their own digital photographs, videos and other content that documents their lives–is something of a hot topic.

The Internet Archive is hosting its second Personal Archiving conference today and tomorrow with an impressive lineup of speakers.  The Library of Congress held its first Personal Archiving Day in 2010 and has plans to hold another, again in connection with the American Library Association’s Preservation Week.

Researchers such as Cathy Marshall of Microsoft are considering the topic. At least one blog, The Digital Beyond, focuses on “your digital existence and what happens to it after your death.”

As the diversity above shows, there are lots of ways to think about the subject.  But let’s consider things from the collective perspective of libraries, archives and museums.  There are three main reasons that memory organizations should think about personal archiving.

1. Collecting content. This is the obvious one.  Institutions that seek documents from individuals will naturally have interest in some type of personal digital material.  Examples might be literary manuscripts, special collections, artist “papers,” and family and genealogical collections.  Heretical though it may be, even official archives may find themselves working with email and other digital records that strongly resemble personally managed documentation.

2. Advising and assisting. Any organization that wants to bring in personal digital information had better be prepared to provide effective guidance to prospective donors. There is a need to cultivate good long-term practice for individuals who create and manage information, as well as to guide those who want to donate content in the near term.

3. Engaging with users.   Many choices are creatively and persistently vying for people’s attention.   Memory organizations have to do the same to build collections and, even more importantly, to grow their audience.  Given the harsh economic climate, collecting institutions are under more pressure than ever to justify their relevance in people’s lives.

I’ve thought for a while now that the huge growth of personal digital information presents a great opportunity for memory organizations to connect with people in new ways.  Individuals suddenly find themselves with big digital collections that they have conflicting feelings about.  On the one hand, the content is personally meaningful and fun to share.  On the other hand, people can be overwhelmed by their digital information and find themselves unsure how to manage it over the long term.

A memory organization is in the business of highlighting the meaning in collections, sharing them and preserving them. True, this work hasn’t traditionally focused on average people–but that could change. Why not solicit personal material for, say, a temporary exhibit? Why not host local personal archiving workshops? Why not be a link between the digital information that people care about today and the people who will care about it tomorrow?

It might well be that a successful memory organization is one that expands the idea of public service to working directly with people to help them better appreciate and keep their personal digital information.


Jan 12, 2011

I noted in an earlier post that I recently taught an introductory class on digital preservation.  I pulled together some slides to present the important points, and devoted some time at the start to explain “the digital preservation challenge.”

Work in Progress, by blumpy, on Flickr

This is a dicey proposition. On the one hand, I wanted to convey a realistic assessment of the issues, which are, to my mind, significant. It is a bit like a 12-step program: the first step is facing up to a need for change.

Yet dwelling too much on the problems associated with digital preservation encourages some combination of hand-wringing and reluctance to act, both of which are counterproductive. I believe that iterative solutions, built on the experience of doing the best we can at a given time, are far better than doing nothing. I also think this is the only way we are going to make progress with the many technical issues, as well as with the really hard stuff we face: the social, political and legal challenges.

Below is the information presented in my slides.  I’m still not sure I struck the right balance, but then nobody fled the class under a cloud of discouragement.

The Digital Preservation Challenge

  • Libraries, archives, museums and other cultural heritage institutions have unparalleled experience managing analog items…
  • But only some of this experience carries over to the digital world
  • Digital information presents an existential test:  institutions have to figure out a new way of doing business
  • Which is hard, because institutions and their staff have comparatively limited experience dealing with digital…
  • And hard, too, because digital presents some tough problems

Problem: Lots and Lots of Data

  • Huge volume of digital information—and it is rapidly growing
  • Organizations, governments and individuals are all information creators
  • Some large chunks of this information have value (actual or potential) from the perspective of archives/libraries
  • Which chunks to focus on?

Problem: Information Complexity

  • Dynamic databases, websites
  • Sophisticated specialty uses: CGI, CAD/CAM, geospatial…
  • Highly specialized applications dependent on deep knowledge: scientific databases
  • Linked data

Problem: Technological Dependency/Obsolescence

  • Every piece of digital information depends on a stack of technologies working perfectly together, e.g.:
    • File format (pdf, html, doc)
    • Storage media (cloud, hard drive, USB drive)
    • Application software (reader, browser, app)
    • Operating system (Windows XP, Vista, 7)
    • Computing device (PC, laptop, smart phone)
  • Each layer of the stack is rapidly changing
  • Ensuring ongoing access requires work, careful planning

We Have Solid Preservation Concepts (e.g., OAIS) but Implementation is Difficult

  • No optimal digital preservation system exists
  • Institutional, user requirements not always clear
  • Bottom line: guiding principles, no obvious solutions
  • Plus: What constitutes preservation is itself a matter of perspective and debate (more on that later)

Alright Then, If It’s So Hard, Why Worry About It?

  • Traditional information sources becoming digital: books, serials, reports, photographs, documents…
  • New information sources digital only: websites, social media, email…
  • Users expect digital access to information, now and in the future
  • If libraries/archives are to extend their historic mission and remain relevant they must collect, preserve and serve digital information

Good Progress is Evident!

  • A number of initiatives are tackling the issue around the world
  • Some common principles demonstrated with different approaches
  • Reasons for optimism:
    • Important elements of the issue are defined
    • Solid conceptual framework exists
    • Biggest institutions are deeply engaged
    • Extensive cooperation, sharing, open development
    • Tools and services are multiplying

At this point, I had another series of slides that characterize some of the operational approaches to digital preservation and discuss the pros and cons of each. I’ll get into that at a later date.

Jan 11, 2011

red 6 by holeymoon, on Flickr

At this point, most organizations have plunged into some form of digital communication and outreach.  Some are further ahead than others, but just about everyone needs to do more.

For cultural heritage organizations, the necessity seems clear.  The Smithsonian Web and New Media Strategy declares “once on the fringe of institutional and public awareness, Web and New Media initiatives are now considered to be a critical part of the Institution’s core activities and future: They need to be funded and managed accordingly.”

Organizations that do embrace the web and other forms of new media still face what can be called the fierce urgency of how: where to focus attention and how to break the issue into manageable chunks.  While every institution has unique needs, there are some basic ideas to consider.  Communicopia has a fine blog post, Fresh Start: 5 resolutions for your digital program in 2011.  I’ve outlined the five goals for improvement below with a focus on cultural heritage institutions.  The sixth one is mine.

1.      Shift Your Perspective. Understand that the internet is more than a digital brochure; it’s everything you do. Increasingly, it’s the primary channel where your audiences and stakeholders are hearing about you, getting information and getting things done.

2.      Connect Digital to Strategy — Everyone’s.  Make sure your digital tactics clearly connect to your core organizational goals, messages and priorities.  (The quote above from the Smithsonian plan is a good illustration of this intent.)

3.      Enable a Responsive & Holistic Structure.  Avoid dysfunctional implementation. Leading organizations today approach digital as a system, with a central digital team providing direction but with innovation and even some execution shared with public-facing internal departments.

4.      Bow to the King: Content.  Use compelling content–narrative and visual–produced from your deepest areas of expertise.  Staff need to be good storytellers for digital outreach.

5.      Integrate Network Thinking into Internal Processes.  Make use of digital tools to facilitate knowledge sharing and productivity within the institution and among collaborating institutions. Collaboration among institutions might, in fact, be a critical part of your outreach strategy.

6.      Get noticed and stay that way.  Explore the available digital channels (Facebook, Twitter, YouTube and whatever new new thing comes along tomorrow) and find the outlets that work best for your institution.  Post interesting content regularly and work hard to engage visitors.  Always remember that you are competing for users with short attention spans and unlimited choices.

Thinking about goals like these is worthwhile for any library, archives, museum or other cultural heritage organization. How far you get acting on them right now depends on lots of things: management willingness, staff culture, institutional priorities and, of course, money. But there can be little doubt that digital disruption will continue pushing institutions to change.

Jan 06, 2011

The first wave of desktop computer users is getting old. As a cohort, they are retiring from their jobs, downsizing their homes and, maybe, passing on important digital data.

5.25″ Floppy Disk Drive by Accretion Disc, on Flickr

Chances are good that some of this information will be stored on relics from a bygone era: floppy disks, Zip drives, tape cartridges and the like.

Any library, archives or museum–or any person–who might be the recipient of such bounty should consider their options, of which there are three:

  1. Use a commercial service to transfer the information to more modern media.
  2. Acquire some older equipment to do the job yourself.
  3. Do nothing and hope for the best.

All of these choices have associated risk.  A service can be expensive; getting your own devices can be a challenge; and doing nothing is–well, doing nothing.  A more general threat hovers over things as well: the longer the wait to transfer information, the greater the chance the original media will degrade and lose data.

Anyone who has to deal with older information might want to think about hedging their bets by acquiring  some equipment to access obsolete media.  This can be a complicated process involving a slew of gear, common and uncommon (Bernoulli Box, anyone?).

For the sake of brevity, I’d say there are four basic media readers:

These can be hard to find. A quick search of eBay data for the past three months shows that only 49 5.25-inch drives of all types were available for sale and that “new” or “mint” drives can sell for nearly $60.

Getting a drive to read media is only the beginning. You will, of course, need to connect it to a computer. There is an alphabet soup of potential connections (ATAPI, SCSI and USB, for example), and modern computers may not be compatible. After hooking up an older drive, you may also need to find a specific device driver to use it. And, when everything is up and running, you will need to carefully plan how to work with the old media, whose condition is frail and content unique.

The time and effort needed to gear up for older media might make the difference between having enduring access to older information or, sadly, having no access at all.
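For readers who do get an old drive working, here is a minimal sketch of one cautious way to pull files off it. It assumes the old media already shows up on the computer as an ordinary folder, and the function names (`copy_and_hash`, `sha256_of`, `copy_with_manifest`) are my own illustrations, not part of any standard tool. Each original file is read only once, which seems prudent for frail media, and each copy is then checked against a SHA-256 checksum.

```python
import hashlib
import shutil
from pathlib import Path


def copy_and_hash(original: Path, copy: Path) -> str:
    """Copy one file in a single pass and return the SHA-256 of what was read."""
    digest = hashlib.sha256()
    with original.open("rb") as src, copy.open("wb") as dst:
        for chunk in iter(lambda: src.read(65536), b""):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_manifest(source: Path, destination: Path) -> dict:
    """Copy every file under source into destination and verify each copy.

    Returns a mapping of relative path -> SHA-256 digest.
    Raises OSError if a copy does not match what was read from the original.
    """
    manifest = {}
    # Build the full file list first so new copies are never picked up.
    originals = sorted(p for p in source.rglob("*") if p.is_file())
    for original in originals:
        relative = original.relative_to(source)
        copy = destination / relative
        copy.parent.mkdir(parents=True, exist_ok=True)
        checksum = copy_and_hash(original, copy)
        shutil.copystat(original, copy)  # keep the original timestamps
        if sha256_of(copy) != checksum:
            raise OSError(f"Copy of {relative} does not match the original")
        manifest[relative.as_posix()] = checksum
    return manifest
```

Calling `copy_with_manifest(Path("/media/floppy"), Path("rescued/floppy-01"))`, with both paths being placeholders, would return the checksum list. Saving that list alongside the copies makes it possible to confirm years later that nothing has changed.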

Dec 29, 2010

Why should anyone care about digital preservation?

Those of us who work in the field ponder this question. Our personal conviction is secure; we know that select digital information is valuable for current and future use.  We know also how fragile that information is and how easily it can disappear.

But making the case to funders, institutional leaders and even Uncle Bob can be tough.  Layers of opacity get in the way.  With what seems to be unlimited information about everything on the web 24-7, arguing that data is at risk may seem counterintuitive.  Venturing into topics like bit rot, media migration and metadata augmentation often leads to glazed eyes.  Making the case that new forms of information, such as social media and websites, should be collected by institutions that preserve Gutenberg Bibles and Civil War documents can cause consternation.

If our culture is going to succeed in preserving important digital information, more effective arguments are needed.  I think there are a couple of messages and one medium in particular that are worth exploring.

The messages are basic.  Economic and social progress has depended for centuries on information kept by libraries, archives, museums and other collecting organizations.  For this benefit to continue, these organizations must expand their capacity to keep digital information, which is now the dominant form of documented human knowledge.  There is a nice hook, too: personal archiving.

With so many digital cameras, smartphones, laptops and tablets in circulation, many people are themselves responsible for personal digital collections. Helping them understand the need to care for and preserve their family digital photographs also builds a larger awareness about the value of digital preservation. One might say that, like politics, digital preservation is local.

Video may be the best medium yet invented to put information in the human brain. One recent theory credits this to “dual coding,” which basically means that information we both see and hear is easier to learn and to remember.

Maybe this underlies a natural tendency to prefer audio-visual over text.  In any event, advertisers certainly get it.  “There’s no doubt that web videos are cool and fun to watch – certainly more fun than reading page after page of text,” proclaims the Switchmarketing website.  “But does it actually work? Absolutely!”

This sentiment was reinforced for me when I asked a savvy educational specialist about the best way to communicate about digital preservation on the web.   His advice went right to the point:  “short videos like you see on YouTube.”

Starting in 2009, the Library of Congress digital preservation team began producing videos (full disclosure: I am part of the team). The idea was to promote digital preservation as a fundamental part of the Library’s mission and to explain why it matters to individuals. The assumption here was that both these messages would communicate best in a casual, entertaining way.

To this point, the team has produced videos on subjects such as Why Digital Preservation is Important for You, Digital Natives Explore Digital Preservation and Bridging Physical and Digital Preservation.  The videos are available on the website and also on YouTube.  They have proven reasonably popular, with YouTube view counts of around 1,000.

Another organization, DigitalPreservationEurope, has produced six animated videos reminiscent of Saturday morning cartoons.  The action has Digiman and Team Digital Preservation facing off against a variety of evil characters bent on thwarting them.  The DPE videos have proven popular indeed, with Digital Preservation and Nuclear Disaster: An Animation racking up over 33,000 YouTube views.

These two bodies of work constitute the main corpus of what can be called digital preservation awareness videos.  I’d say that they represent a good start.  But the competition for attention is fierce.  Consider The 10 Most Watched YouTube Videos of 2010, which include The Bed Intruder Song and Annoying Orange Wazzup with 58 million and 29 million views respectively.

All this points to web video as a highly effective channel for communication, particularly for younger people. The Pew Internet & American Life Project presents evidence that more “millennials” (ages 18-33) use the web for watching videos than they do for getting news or shopping.

The only challenge is to somehow work in digital preservation between autotuned declarations and talking fruit!