Fileserver Upgrade and Thoughts

Over about three days last week, I finished upgrading the last of the 400GB drives in the file server.  The new ones are Toshiba 2TB drives (which I paved the way for by buying a 2TB hot spare drive the last time I upgraded).

Everything went fairly smoothly, though I messed up one command that dropped the redundancy at one point.  No actual data loss, and I had two complete current backups at the time.

If I were doing this from scratch today, I’d use an AMD motherboard (because it’s much easier to find consumer-priced AMD motherboards that support ECC memory) and a 5-in-3 hot-swap cage (it fits in the opening meant for three 5.25″ drives and holds five 3.5″ drives in hot-swap trays).  That would mean a much cheaper and smaller case, plus I wouldn’t need an additional disk controller card (most such motherboards have 6 SATA ports).

And I’d put FreeNAS software on it.  That’s FreeBSD-based rather than Solaris-based, but it still supports ZFS.  ZFS is absolutely wonderful for this sort of use.  It supports many enterprise-level features that you won’t get with any other cheap approach to building a home fileserver.

I’ll stick with mirrored pairs rather than parity, though.  I can upgrade in place very easily with this setup.  Five drives is perfect for two pairs of data disks plus a hot spare.  The hot spare is useful in emergencies, but is also vital for upgrading the disks in a mirror without reducing the redundancy.  I’ve upgraded the current server in 5 steps from 800GB of usable space to the current 4TB.  A file server built to this outline (just four data drives) would support 8TB of usable space today, far more than I need.
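A rough sketch of one upgrade step on this layout (the pool name and device names here are made up; the point is that the mirror never drops below two good copies):

# add the new, larger disk as a third side of the existing mirror
pfexec zpool attach tank c1t2d0 c1t4d0
# wait for the resilver to finish, checking progress with:
pfexec zpool status tank
# then drop the old, smaller disk; once both disks in the pair have been
# replaced this way, the extra capacity becomes available
pfexec zpool detach tank c1t2d0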

And I’ll boot it off USB thumb drives inside the case rather than from a real disk (the current server has a mirrored pair of 2.5″ disk drives for the system disk; that’s an expense and use of controller slots that’s not really necessary).

I’ve specced out parts at various stores a couple of times; I can build an empty FreeNAS file server of this sort for $300 to $500 depending on details (largely how much memory; the current server ran fine in 2GB and runs fine in 4GB, but the FreeNAS documentation suggests it’s memory-hungry, though I suspect that’s for deduplication, which I don’t need).

Digital Photo Archiving

This came up in comments on TOP, and I realized I’d written enough that I wanted to make an article of it and keep it here where I could refer to it easily.

Craig Norris referred to this article about digital bit-rot that he had suffered, and that got me thinking about whether I’m covered against that sort of problem. He says he’s getting a stream of email from people who have had similar problems. I’ve never seen anything like that in my own collection—but I’m doing quite a few things to cover myself against such situations.

Here are the things I’m doing to ensure the integrity of my digital photo archive:

  • ECC RAM, especially in my file server.  This memory (and the associated support in the OS) can detect up to two bit errors in a word and correct up to one bit error in a word.
  • No overclocking. I’m not taking risks with data integrity on this system.
  • Storing the images on a ZFS filesystem.  ZFS keeps checksums of data blocks independently of the hardware error protection, so it can detect more errors than relying on the hardware alone.  The ZFS checksums are also larger than the hardware checksums, so they catch more error cases, though no checksum system will detect every possible change to a block of data.  (The data is also mirrored on two disks.)
  • Running weekly “scrubs”, where ZFS reads all the blocks on the disks and verifies their checksums.  This means errors are detected within a week, rather than waiting until the next time I look at an image, which makes it more likely that I’ll still have a valid backup somewhere.  (I have not yet detected any error on a scrub.)  Early detection, and detection that doesn’t depend on a human eye, are very valuable, I think.  The cron job is sketched just below.
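Something along these lines, with “tank” standing in for the real pool name:

# weekly, in root's crontab: read every block in the pool and verify its checksum
0 3 * * 0 /usr/sbin/zpool scrub tank
# check the result (and any repaired or unrecoverable errors) afterwards with:
zpool status -v tank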

(I believe the BTRFS and NILFS filesystems for Linux also do block checksums.  ZFS is available in Linux and BSD ports, but none of these are mainstream or considered production-ready in the Linux world; the original Solaris ZFS that I’m running is production-grade.  You could simulate block checksums with a fairly simple script and the md5sum utility, making a list of the MD5 checksums of all files in a directory and then checking it each week.)
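A minimal sketch of that kind of check, assuming GNU md5sum and a made-up archive path:

# build the manifest once, kept outside the tree so it doesn't checksum itself
cd /archive/photos && find . -type f -print0 | xargs -0 md5sum > /archive/photos.md5

# each week, re-check the files against the manifest and print only the mismatches
cd /archive/photos && md5sum -c /archive/photos.md5 | grep -v ': OK$'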

  • For many of the older directories, I’ve run PAR2 to create redundant data and checksums of the files in the directory (I chose about 15% overhead).  This gives me yet another way to detect and possibly fix errors; see the sketch after this list.  I should really go through and do more of this.
  • Multiple backups on optical and magnetic media, including off-site copies.
  • Using high-quality optical media for backups (Kodak Gold Ultima, MAM Gold archival).
  • I have a program for analyzing the state of optical disks, which can tell how much error correction is being invoked to keep a disk readable.  This should give me early warning before a disk becomes unreadable.  I need to run it again on some of my older samples.
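The PAR2 step is just the par2 command-line tool run per directory; roughly like this, with the archive name and file pattern being whatever fits the directory:

# create recovery data with about 15% redundancy for the files in this directory
par2 create -r15 2004-vacation.par2 *.jpg
# later, check the files against the recovery set; par2 repair can fix damage
# up to the amount of redundancy that was created
par2 verify 2004-vacation.par2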

You’ll notice I can’t achieve these things with white-box hardware and mainstream commercial software.  And that ongoing work is needed.  And that I’m behind on a couple of aspects.

I won’t say my digital photos are perfectly protected; I know they’re not.  But I do think that I’m less likely to lose a year of my digital photos than I am to lose a year of my film photos.  A flood or fire in my house would be quite likely to do all the film in, while my digital photos would be fine (due to off-site backups).  (So would the scans I’ve made of film photos.)

Furthermore, I realized recently that I’ve been storing my film in plastic tubs, nearly air-tight, without any silica gel in there. I’m working to fix this, but that kind of oversight can be serious in a more humid climate. (If I lived in a more humid climate, I might have had enough bad experiences in the past that I wouldn’t make that kind of mistake!)

Anyway—the real lesson here is “archiving is hard”. Archiving with a multi-century lifespan in mind is especially hard.

Film, especially B&W film, tolerates benign neglect much more gracefully than digital data—it degrades slowly, and can often be restored to near-perfect condition (with considerable effort) after decades in an attic or garage, say.

Most people storing film are not doing it terribly “archivally”, though. Almost nobody is using temperature-controlled cold storage.  Most people store negatives in the materials they came back from the lab in, which includes plastics of uncertain quality and paper that’s almost certainly acidic.

Digital archives are rather ‘brittle’—they tend to seem perfect for a while, and then suddenly shatter when the error correction mechanism reaches its limits. But through copying and physical separation of copies, they can survive disasters that would totally destroy a film archive.

A digital archive requires constant attention; but it can store stuff perfectly for as long as it gets that attention. My digital archive gets that attention from me, and is unlikely to outlast me by as much as 50 years (though quite possibly individual pictures will live on online for a long time, like the Heinlein photo).

Server Upgrade Chronicles V

And I think I’m going to call it a win. The new disks are in and working. I’ve even got the regular snapshot script working pretty well.

I never did quite get the two new boot disks set up with identical partition sizes, but it doesn’t matter, since I attached them both to the mirror (which was limited by the size of the old 80GB disks) first, and then detached the old disks.  At that point the pool expanded up to the available size, which is the smaller of the partitions on the two new drives.  They differ by a MB or two out of 160GB, which isn’t important.
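Roughly, the sequence was the following (device names made up; s0 is the slice each half of the root pool lives on):

# attach each new boot-disk slice to the existing root mirror
pfexec zpool attach rpool c4t0d0s0 c9t0d0s0
pfexec zpool attach rpool c4t0d0s0 c9t1d0s0
# watch "zpool status rpool" until the resilver completes, then detach the
# old 80GB disks; the pool grows to the smallest remaining partition
pfexec zpool detach rpool c4t0d0s0
pfexec zpool detach rpool c4t1d0s0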

Replacing A Solaris EFI Disk Label

This is kind of an adjunct to part 4 of my “Server Upgrade Chronicles”.

ZFS root pools have some requirements and best practices at variance with other ZFS pools.  One of the most annoying is that you can’t use a whole disk, and you can’t use an EFI-labeled disk.  This is annoying because using a whole disk is the best practice for most other ZFS uses, and when you do that, ZFS puts an EFI label on the disk.

So, when you try to use a disk in a root pool that you’d previously used somewhere else in ZFS, you often see this:

bash-3.2$ pfexec zpool attach rpool c4t0d0s0 c9t0d0s0
cannot attach c9t0d0s0 to c4t0d0s0: EFI labeled devices are not supported
on root pools.

What do you do then? Well, you google, of course. And you find many sites explaining how to overwrite an EFI label on a disk. And every single one of them omits several things that seem to me to be key points (and which I had to play around with a lot to get any understanding of). The fact that ZFS is what drew me back into Solaris, and that I wasn’t ever really comfortable with their disk labeling scheme to begin with, is no doubt a contributing factor.

This is going to get long, so I’m putting in a cut here.

Server Upgrade Chronicles IV

I got the two new system disks attached to the root ZFS pool and resilvered, so right now I’m running a 4-disk mirror for my root!  And I just booted off the #1 new disk, meaning that the Grub installation as well as the mirroring worked, and that the new controller really does support booting.
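Putting Grub on the new disks is a separate step from the ZFS attach, by the way.  On Solaris x86 it’s something along these lines, with the device name being whichever new boot slice you’re making bootable:

# install the Grub boot blocks on the new disk's root slice (device name illustrative)
pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c9t0d0s0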

Actually, most of the excitement was earlier.  In playing around with the new disks, I’d made them into a ZFS pool using whole disks.  This put EFI labels on the disks, which Solaris / ZFS don’t support in a root pool.  So then I had to somehow get the disks relabeled and the partitions redrawn.  This turns out to be a horrible process which is not documented anywhere.  The blogosphere is full of pages saying how to do it, and none of them actually tell you much.  Okay, use format -e, that’s helpful.  But they never say what device file to use, and none of the obvious ones exist.  I think you can maybe use any device pointed at the right disk for part of it.  Also, I had to create an s0 manually, and I’m not sure I did it ideally (it doesn’t matter much, since these disks are four times as big as they need to be).

I’m deeply confused by Solaris disk labeling, going back to SunOS days; even then, I thought it was absurd, in fact suicidally idiotic, to describe regions of the disk used for different filesystems in a way that allows them to overlap.  Okay, you’re not supposed to use any two that overlap for filesystems, but nothing stops you.  The whole setup is just baroque, weird, stupid.  And then, on x86 hardware, this Solaris idiocy takes place within one real partition (although Solaris documentation tends to call its slices “partitions” as well).

So, I had to find a way to overwrite EFI labels with SMI labels.  Apparently the secret is to use “format -e”.  None of the pages said anything about manually creating partitions (or gave any clues about what space you could use; I believe you have to leave space at the start of the disk for the boot stuff).  Anyway, totally infuriating partial documentation, and then a large group of aficionados giving slightly variant versions of it, all of them missing the key points.
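For my own future reference, the rough shape of what eventually worked (menu choices from memory, so treat this as an outline rather than a recipe):

# run format in expert mode and pick the disk in question from the menu
pfexec format -e
# at the format> prompt:
#   label      - choose the SMI label option to replace the EFI label
#   partition  - lay out a slice 0 covering most of the disk, leaving the start
#                of the disk alone for the boot area (as noted above)
#   label      - write the new label out
# after which the zpool attach to the root pool goes through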

Did I mention that I’m annoyed?

So I’m going to chase this for a while, until I get it actually figured out, or until I go postal; whichever comes first.