Progress Scanning Minneapa

For a lot of the Minnesota Science Fiction Society members, including me, Minneapa was an important part of what kept the club going.

(An “APA” or Amateur Press Association can maybe best be described today as an early, dead-tree type of social media.  Each member submitted the required number of copies of their “zine” for each issue, which were then collated together, stapled into one or more sections, and sent to all the members. Our zines contained whatever we wanted, which was often our nattering / essays / rants on things important to us, plus comments on the other zines in the last issue. A new issue was collated every 3 or 4 weeks. The membership ran interestingly beyond just people from Minneapolis, and included quite a number of heavy-hitters from the SF community.)

Minneapa ran for 400 issues (not all with me involved), from 1972 to 2003. Some of these issues ran up to 400 pages (in 3 sections).

So, I’ve been working on scanning these for the Minn-StF archives.  (They aren’t going to be published to the web; privacy and copyright issues don’t seem to allow that. They’ll probably be available on some terms to former members, though, and with luck will eventually be transferred to an institutional archive.) A couple of scanning sessions at Minn-StF meetings, plus a few more with the (borrowed) scanner at home, have produced this:

Scanned Minneapas

Today I got done with the boxes I had organized previously, and dug into the next three boxes.  Here they are sorted by issue number decade:

Four more boxes of Minneapa sorted for scanning; Issues 9x through 2xx
Issues 5x through 8x

There is some duplication; these are my and Pamela’s old copies, and I joined before she did (I may have dropped out sooner; I’m not sure about that).  And after this I have almost another 200 on the high end that neither of us was a member for (Dean Gahlon and Beth Friedman have agreed to provide those for scanning), and a bit of fill-in on the low end (about 1-55) (Martin Schafer has agreed to provide these for scanning).

A very rough estimate places this somewhere around 50,000 pages (a worst-case estimate gives 160,000 pages, but not all 400 issues were 400 pages). And especially in the early days, few of those pages were photocopied or offset printed; some were mimeographed, but most were dittoed, often using multiple master colors and on paper other than white.  Purple on blue is one of my least favorites to try to OCR. Not all the printing was particularly good, either.  Then there are the hand-written zines.
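For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the figures quoted in this post (400 issues, a 400-page maximum, and the rough 50,000-page guess); the function names are made up for illustration.

```python
# Figures quoted in the post.
TOTAL_ISSUES = 400           # Minneapa ran for 400 issues
MAX_PAGES_PER_ISSUE = 400    # the biggest issues ran up to 400 pages
ROUGH_TOTAL_PAGES = 50_000   # the "very rough estimate"


def worst_case_pages(issues: int, max_pages: int) -> int:
    """Upper bound: pretend every issue was as big as the biggest one."""
    return issues * max_pages


def implied_average(total_pages: int, issues: int) -> float:
    """Average issue size that a given total page count would imply."""
    return total_pages / issues


if __name__ == "__main__":
    print(worst_case_pages(TOTAL_ISSUES, MAX_PAGES_PER_ISSUE))  # 160000
    print(implied_average(ROUGH_TOTAL_PAGES, TOTAL_ISSUES))     # 125.0
```

In other words, the rough estimate amounts to assuming an average issue of about 125 pages, well under a third of the largest ones.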

I keep fussing with ways of adjusting the scans to make them more legible (and to OCR better), and that’s an infinite task (rather like the Augean Stables in fact). But I may be improving the results somewhat (and I’m aiming for methods that can then be applied easily to groups of pages in later issues).

Archiving Minneapa

Or, for those not from this part of science-fiction fandom, just think of it as some rather challenging scanning and OCR issues. (Read about APAs.)

I sorted through three boxes from upstairs and got this:

The first three banker’s boxes, plus extras
Back cover on top of the stack of extras

And there are four more boxes up there waiting.

Now, there is probably a lot of duplication (mine plus Pamela’s copies).

Minn-StF owns an Epson duplex auto-feed scanner, which is kind of tailor-made for this job (“duplex” means it scans both sides of the sheet in one pass through). And it’s amazing how good we’ve gotten at handling individual sheets of paper using just a few plastic rollers. Still, when the paper is 40 or so years old and the stack includes many different kinds of paper intermixed, it can be a challenge. (Most Minneapas had at least offset paper, mimeo paper, ditto paper, and often twilltone. And the covers are sometimes card stock.) Luckily, restarting after a jam is easy, so long as you didn’t let it reset the page numbering to 1 automatically.

I made some test scans at 300 dpi and 400 dpi, and tried saving them as JPEG and TIFF files. The scanner was nearly twice as fast at 300 dpi as at higher resolutions, so I left the resolution there. I was pleased, though a bit surprised, to find essentially no visible JPEG artifacts (at 80% quality) on all this text. You’re seeing a lot of the paper texture at full res, and it’s enough to satisfy the OCR software…and the JPEG file is 2 MB or less, while the TIFF is about 34 MB. So I actually stored the images as JPEGs.  (Nearly 5000 pages from Saturday’s session, looks like; which was one of those three banker’s boxes.)
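To see why the file format matters at this volume, here is a back-of-the-envelope sketch in Python. The per-page sizes and the pages-per-session figure come from this post; the 50,000-page total is only the very rough estimate for the whole run, and treating every session as a 5000-page session is an assumption, so read the results as ballpark numbers.

```python
# Per-page sizes from the test scans described above.
JPEG_MB_PER_PAGE = 2      # "2 MB or less", so this is an upper bound
TIFF_MB_PER_PAGE = 34

# Assumptions: the rough total page estimate, and one Saturday
# session (nearly 5000 pages) taken as a typical session.
ESTIMATED_PAGES = 50_000
PAGES_PER_SESSION = 5_000


def total_gb(pages: int, mb_per_page: float) -> float:
    """Total storage in gigabytes (decimal: 1 GB = 1000 MB)."""
    return pages * mb_per_page / 1000


def sessions_needed(pages: int, per_session: int) -> int:
    """Number of scanning sessions, rounded up to a whole session."""
    return -(-pages // per_session)  # ceiling division


if __name__ == "__main__":
    print(total_gb(ESTIMATED_PAGES, JPEG_MB_PER_PAGE))          # 100.0
    print(total_gb(ESTIMATED_PAGES, TIFF_MB_PER_PAGE))          # 1700.0
    print(sessions_needed(ESTIMATED_PAGES, PAGES_PER_SESSION))  # 10
```

So JPEG comes out at no more than about 100 GB for the whole collection, against roughly 1.7 TB for TIFF.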

Many pages show some browning around the edges. It’s interesting how much variation there is among the different kinds of paper people used.

The print density and clarity varied quite a lot to begin with, as I remember. It certainly varies a lot today.  Here are some examples at 100% size.

OCR of this sort of material ranges from chancy to hopeless. The volume involved is such that no real quality control or proofreading pass on the OCR is possible, either. However, by using a clever PDF feature we can produce “PDF/A” files which, when opened, show you the image of the page, but when searched by the computer let it search the OCR output (including the images of every page does make the files big, though). Even when OCR is bad, it catches words correctly a lot of the time, so searching for a name or a topic keyword will find you many of the references. And important words in a discussion tend to be repeated, so you’ll be brought to most of the pages the discussion occurs on. (My OCR work on this is being done with an old version of ABBYY Finereader.)
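Here is a small Python sketch of why keyword search still works on bad OCR. It is not how the PDF viewer or FineReader searches; it is a stand-alone illustration that assumes the OCR text has been pulled out one page at a time. The sample pages are invented, and the near-match step (using `difflib` from the standard library) is an extra I added to show how a slightly mangled word could still be caught.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Split OCR text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def page_matches(page_text: str, keyword: str, threshold: float = 0.8) -> int:
    """Count words on a page that equal, or closely resemble, the keyword."""
    keyword = keyword.lower()
    hits = 0
    for word in tokenize(page_text):
        if word == keyword:
            hits += 1
        elif SequenceMatcher(None, word, keyword).ratio() >= threshold:
            hits += 1  # near match, e.g. one misread letter
    return hits


def search(pages: dict[int, str], keyword: str, threshold: float = 0.8):
    """Return (page_number, hit_count) pairs, pages with most hits first."""
    results = [(n, page_matches(t, keyword, threshold)) for n, t in pages.items()]
    results = [(n, h) for n, h in results if h > 0]
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    # Invented sample text standing in for per-page OCR output.
    pages = {
        1: "Comments on the last mailing. The mimeograph broke again.",
        2: "My mimeogrnph is fine; the mimeograph at the club is not.",
        3: "Nothing here but ditto fluid.",
    }
    print(search(pages, "mimeograph"))  # [(2, 2), (1, 1)]
```

Page 2 is found twice over: once through the correctly read word and once through the misread one, which is the repetition effect described above.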

There are legal and privacy issues that make it unlikely that the collection of scans will be posted publicly. They may well be available to people who were in Minneapa. Scanning them gives us backup copies and protection against further deterioration, and some convenience for some people with access to the collection.

Anyway…17 down, 383 to go!