Jan 18, 2011
 

The risk of computer hard disk failure is fairly well known: the disk crashes and your computer stops working.

Corrupt 6, by gusset, on Flickr

Less well known is silent data corruption, where an undetected error occurs in content stored on a drive. Errors creep in from bugs in both software and hardware (firmware). “Silent” means the drive does not report the error, even where special precautions such as multiple redundant disks (RAID) are used. The problem remains unknown until you attempt to retrieve the data.

An Analysis of Data Corruption in the Storage Stack, a 2008 study of 1.53 million disk drives over a period of 41 months, found 400,000 silent errors. That is a disturbingly high number, even though the overall percentage of bad to good data was very small. The bad news for personal users is that the type of hard disk they are most likely to have, a SATA drive, has a failure rate that is an order of magnitude larger than that of more expensive “enterprise class” drives.

If you have important digital data that you need to keep for a long time, the best thing to do is to keep multiple copies in different places, stored on different kinds of media, even different kinds of hard disks. As I wrote earlier, all varieties of digital storage media will eventually fail, so it is essential to have replicated copies.

The more complicated issue is how to detect and fix silent errors. The most common method is to use a checksum: a numerical code, computed from a file's contents and effectively unique to it, that no longer matches if the data changes. Once an error is found, the next step is to “scrub” it, a process where bad data is replaced by good data from a trusted source.
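To make the idea concrete, here is a minimal sketch in Python of checksum comparison followed by scrubbing. It is an illustration rather than a recommended tool. The function names (build_manifest, find_corrupt, scrub) are my own, the choice of SHA-256 is an assumption, and it presumes you recorded the checksums while the files were still known to be good and that a second copy exists in another folder.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum.

    Run this while the files are known to be good, and keep the result safe.
    """
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupt(root, manifest):
    """Return relative paths that are missing or no longer match the manifest."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad


def scrub(root, trusted_root, manifest):
    """Replace bad files with copies from trusted_root.

    A replacement is used only if the trusted copy itself still matches the
    recorded checksum, so a damaged backup never overwrites anything.
    """
    repaired = []
    for rel in find_corrupt(root, manifest):
        source = Path(trusted_root) / rel
        if source.is_file() and sha256_of(source) == manifest[rel]:
            dest = Path(root) / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, dest)
            repaired.append(rel)
    return repaired
```

In use, you would call build_manifest once on a folder you trust, save the result, and later pass it to find_corrupt or scrub along with the folder holding your second copy. Files that scrub cannot repair (because the second copy is also missing or damaged) are simply left out of the list it returns, so they need a third copy or manual attention.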

Individual users have limited choices in this regard, unfortunately. Checksum comparisons and scrubbing require advanced knowledge and can take a long time. Commercial data recovery services might be able to help, but they charge a hefty premium. A cloud storage provider may or may not provide checksum verification and scrubbing; if it does, and if the service runs automatically in the background, that counts very much in its favor.