Starting From a Book

(Continuing the story of self-publishing a new edition of Pamela’s The Dubious Hills from here.)

This book was written on a computer to begin with, so ideally we’d still have access to the files with the canonical text. However, that frequently doesn’t work out for a book like this one, first published more than 20 years ago.

There are at least three ways this doesn’t work out:

The files may simply be lost.

The files may not be readable with any software we currently have (this can happen even if you’re still using the same brand-name word processing program).

And finally, there may never have been files reflecting the final state of the book.  In fact, this is a near-certainty for anything first published in 1994, because at that time the copy-editing process depended entirely on marks on paper.  So, unless the author bothered to update the files to reflect changes made at that stage, there never were files with what we really want in them.

Hence the title of this article; we’re going to recover the text for our edition of the book from a printed copy.

There are at least two ways to approach this, but I’ve only ever used one because it’s so obviously best.  We could simply retype from the printed copy into some word processor (or dictate it into a voice-typing package).  But I actually use the other approach, scanning the pages and then using OCR software on them. I’ve done this more than half a dozen times over the years, and it’s surprisingly easy (I mean, compared to retyping; it still takes a number of hours).

The particular way I did this one bothers some people, I know; to save some time and effort, it involves destroying the physical copy of the book I scan (though not as spectacularly as Vernor Vinge does in Rainbows End). I’m going to show pictures below the cut; you have been warned!

Yes, I myself do suffer from the delusion that physical books are nearly sacred objects.  However, books issued in the modern era in many thousands of copies are very rarely in really short supply; I’m willing to sacrifice one copy of such a book in the service of producing a good e-text, especially since that will contribute to making the book available to a modern generation of readers. If the book were rare or valuable, I’d handle things differently; I’d take the extra time and effort to get a scan (and correct the scan; a good chunk of the benefit comes from a better scan leading to better OCR leading to fewer hours of correction) without destroying the book.

(There are fancy scanners that will scan a book, turning the pages themselves, without even breaking the spine. However, we do not own one. We’re doing this on our own, with outlay of time but only the absolute minimum outlay of money, since we have more time than money at the moment.)

Okay; that cut, with the pictures below it, is coming up now….
