Digital Photo Archiving

This came up in comments on TOP, and I realized I’d written enough that I wanted to make an article of it and keep it here where I could refer to it easily.

Craig Norris referred to this article about digital bit-rot that he had suffered, and that got me thinking about whether I’m covered against that sort of problem. He says he’s getting a stream of email from people who have had similar problems. I’ve never seen anything like that in my own collection—but I’m doing quite a few things to cover myself against such situations.

Here are the things I’m doing to ensure the integrity of my digital photo archive:

  • ECC RAM, especially in my file server. This memory (and the associated support in the OS) can detect up to two bit errors in a word and correct a single-bit error. (A toy sketch of how that works appears just after this list.)
  • No overclocking. I’m not taking risks with data integrity on this system.
  • Storing the images on a ZFS filesystem.  ZFS keeps its own block checksums, independent of the hardware error protection, so it can detect errors that the hardware alone would miss.  The data is also mirrored on two disks.  (The ZFS checksums are larger than the hardware checksums, so they catch more error cases; no checksum system will detect every possible change to a block of data, though.)
  • Running weekly “scrubs”, in which ZFS reads every block on the disks and verifies its checksum.  This means errors are detected within a week, rather than waiting until the next time I happen to look at an image, which makes it more likely that I’ll still have a valid backup somewhere.  (I have not yet had a scrub detect an error.)  The early detection, and the fact that detection doesn’t depend on a human eye, are what make this so valuable.
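
For anyone wondering what “detect two, correct one” actually means, here is a toy sketch of the SECDED idea (a Hamming code plus an overall parity bit) in Python.  It protects a 4-bit word just to keep the example readable; real ECC modules apply the same scheme to much wider words entirely in hardware, so this is only an illustration of the principle, not anything you would run against your RAM.

    # Toy SECDED: Hamming(7,4) plus an overall parity bit.  A 4-bit word becomes
    # an 8-bit codeword that can correct any single flipped bit and detect
    # (but not correct) any two flipped bits.

    def encode(data):                      # data: four bits, e.g. [1, 0, 1, 1]
        d0, d1, d2, d3 = data
        p1 = d0 ^ d1 ^ d3                  # parity over codeword positions 1, 3, 5, 7
        p2 = d0 ^ d2 ^ d3                  # parity over positions 2, 3, 6, 7
        p3 = d1 ^ d2 ^ d3                  # parity over positions 4, 5, 6, 7
        hamming = [p1, p2, d0, p3, d1, d2, d3]
        p0 = 0
        for bit in hamming:                # overall parity, for double-error detection
            p0 ^= bit
        return [p0] + hamming

    def decode(codeword):
        p0, rest = codeword[0], list(codeword[1:])
        s1 = rest[0] ^ rest[2] ^ rest[4] ^ rest[6]
        s2 = rest[1] ^ rest[2] ^ rest[5] ^ rest[6]
        s3 = rest[3] ^ rest[4] ^ rest[5] ^ rest[6]
        syndrome = s1 + 2 * s2 + 4 * s3    # position (1-7) of a single-bit error, 0 if none
        overall = p0
        for bit in rest:
            overall ^= bit                 # 0 if the overall parity still holds
        if syndrome == 0 and overall == 0:
            status = "ok"
        elif overall == 1:                 # one bit flipped: fix it (or it was p0 itself)
            if syndrome:
                rest[syndrome - 1] ^= 1
            status = "corrected"
        else:                              # syndrome set but parity balanced: two flips
            status = "double error detected"
        return status, [rest[2], rest[4], rest[5], rest[6]]

    codeword = encode([1, 0, 1, 1])
    codeword[5] ^= 1                       # flip one bit: corrected
    print(decode(codeword))
    codeword[2] ^= 1                       # flip a second bit: detected, not corrected
    print(decode(codeword))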

(I believe the BTRFS and NILFS filesystems for Linux also do block checksums.  ZFS is available in Linux and BSD ports, but none of these are mainstream or considered production-ready in the Linux world; the original Solaris ZFS that I’m running is production-grade.  You could simulate block checksums with a fairly simple script and the md5sum utility, making a list of the MD5 checksums of all the files in a directory and then re-checking that list each week.)
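
Here is a minimal sketch of the sort of script I mean, written in Python rather than as a shell wrapper around md5sum (hashlib computes the same MD5 digests).  The manifest filename, the build/verify interface, and the example path are just choices made for the sketch; the idea is to run “build” once per archive tree and “verify” from a weekly cron job.

    # Build a manifest of per-file MD5 checksums, then re-verify it on a schedule.
    import hashlib
    import os
    import sys

    MANIFEST = "md5-manifest.txt"

    def md5_of(path, bufsize=1 << 20):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(bufsize), b""):
                h.update(chunk)
        return h.hexdigest()

    def build(root):
        """Record a checksum line for every file under root (except the manifest)."""
        with open(os.path.join(root, MANIFEST), "w") as out:
            for dirpath, _, filenames in os.walk(root):
                for name in sorted(filenames):
                    if name == MANIFEST:
                        continue
                    path = os.path.join(dirpath, name)
                    out.write(f"{md5_of(path)}  {os.path.relpath(path, root)}\n")

    def verify(root):
        """Recompute each checksum; report files that changed or disappeared."""
        bad = 0
        with open(os.path.join(root, MANIFEST)) as f:
            for line in f:
                digest, rel = line.rstrip("\n").split("  ", 1)
                path = os.path.join(root, rel)
                if not os.path.exists(path):
                    print("MISSING ", rel)
                    bad += 1
                elif md5_of(path) != digest:
                    print("CHANGED ", rel)
                    bad += 1
        return bad

    if __name__ == "__main__":
        action, root = sys.argv[1], sys.argv[2]   # e.g.  build /archive/photos
        if action == "build":
            build(root)
        elif action == "verify":
            sys.exit(1 if verify(root) else 0)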

  • For many of the older directories, I’ve run PAR2 to create redundancy data and checksums for the files in the directory (I choose about 15% overhead).  This gives me yet another way to detect and possibly repair errors.  I should really go through and do more of this; a sketch of how it could be batched appears just after this list.
  • Multiple backups on optical and magnetic media, including off-site copies.
  • Using high-quality optical media for backups (Kodak Gold Ultima, MAM Gold archival).
  • I have a program for analyzing the state of optical disks, which can tell how much error correction is being used to keep a disk readable.  This should give me early warning before a disk becomes unreadable.  I need to run this again on some of my older samples.
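
Since I admitted I’m behind on the PAR2 work, here is a rough sketch of how it could be batched by shelling out to the par2 command-line tool (par2cmdline).  The archive path and the “recovery.par2” base name are placeholders, and -r15 is the roughly-15%-redundancy setting mentioned above; check your par2 version’s options before relying on this.

    # Walk an archive tree, creating PAR2 recovery data where it's missing and
    # re-verifying it where it already exists.
    import os
    import subprocess

    ARCHIVE = "/archive/photos"            # placeholder; point this at the real tree

    def add_par2(root, redundancy=15):
        """Create ~redundancy% recovery data in any directory that lacks it."""
        for dirpath, _, filenames in os.walk(root):
            if any(name.endswith(".par2") for name in filenames):
                continue                    # already protected
            files = sorted(f for f in filenames if not f.startswith("."))
            if not files:
                continue
            subprocess.run(["par2", "create", f"-r{redundancy}", "recovery.par2"] + files,
                           cwd=dirpath, check=True)

    def check_par2(root):
        """Re-verify every protected directory; print any that fail."""
        for dirpath, _, filenames in os.walk(root):
            if "recovery.par2" in filenames:
                result = subprocess.run(["par2", "verify", "recovery.par2"],
                                        cwd=dirpath, capture_output=True)
                if result.returncode != 0:
                    print("VERIFY FAILED:", dirpath)

    add_par2(ARCHIVE)
    check_par2(ARCHIVE)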

You’ll notice that I can’t achieve these things with white-box hardware and mainstream commercial software, that ongoing work is needed, and that I’m behind on a couple of aspects.

I won’t say my digital photos are perfectly protected; I know they’re not. But I do think I’m less likely to lose a year of my digital photos than a year of my film photos. A flood or fire in my house would quite likely do in all the film, while my digital photos, and the scans I’ve made of film photos, would be fine thanks to the off-site backups.

Furthermore, I realized recently that I’ve been storing my film in plastic tubs, nearly air-tight, without any silica gel in there. I’m working to fix this, but that kind of oversight can be serious in a more humid climate. (If I lived in a more humid climate, I might have had enough bad experiences in the past that I wouldn’t make that kind of mistake!)

Anyway—the real lesson here is “archiving is hard”. Archiving with a multi-century lifespan in mind is especially hard.

Film, especially B&W film, tolerates benign neglect much more gracefully than digital data—it degrades slowly, and can often be restored to near-perfect condition (with considerable effort) after decades in an attic or garage, say.

Most people storing film are not doing it terribly “archivally”, though. Almost nobody is using temperature-controlled cold storage.  Most people store negatives in the materials they came back from the lab in, which includes plastics of uncertain quality and paper that’s almost certainly acidic.

Digital archives are rather ‘brittle’—they tend to seem perfect for a while, and then suddenly shatter when the error correction mechanism reaches its limits. But through copying and physical separation of copies, they can survive disasters that would totally destroy a film archive.

A digital archive requires constant attention; but it can store stuff perfectly for as long as it gets that attention. My digital archive gets that attention from me, and is unlikely to outlast me by as much as 50 years (though quite possibly individual pictures will live on online for a long time, like the Heinlein photo).

2 thoughts on “Digital Photo Archiving”

  1. The problem with using ext3 and an external checksum is that, even if you find an error, there is no way to convince the RAID layer to repair it! There’s no control path to go back down through the layers and say “you gave me the wrong answer, try again.”

  2. True, but if one has adequate backups, one can restore the file from those. That’s more trouble, but it does give you a path to replace the broken file. And the backups are necessary for other reasons anyway. It might be a better allocation of resources NOT to use RAID on the primary server, and to spend the effort on better backups instead.
