I’ve been a bookworm for years and have amassed a significant volume of books of all kinds. Most of my collection is technology-related, but volumes of Conan, manga, astronomy, philosophy, genetics, personal productivity, science fiction, fantasy, and just about everything else abound.

After getting my Kindle the Christmas before last and, later, my iPad, I’ve grown steadily more comfortable reading and using digital media. This comfort level, coupled with my semi-newfound desire to pare down my physical possessions to things that are essential and/or truly precious to me, led me to turn as many things as I could into bits. Once things are bits, they take on new value: they can be manipulated and used at will, at any time, from any place. That is a fundamentally attractive prospect for me. The journey started with my family photo collection. Now it has progressed to the shelves of my office.

Here’s how I’m doing it:

I started the process by looking for existing book scanning solutions. Many of the solutions available at the moment are quite expensive. They’re also fairly large and almost completely specialized to the task of scanning books. Projects such as Google’s book scanning effort use proprietary solutions built around cameras and automatic page turners. There are a number of DIY-type solutions that claim to offer similar performance, but my skepticism kept me from going down that path. The technical challenges of non-destructive book scanning in this manner are non-trivial. Here’s a list of deal stoppers for me:

  • Consistent lighting across a number of book sizes requires a lot of tuning. The surrounding lighting and staging apparatus is non-trivial. For example, tenting the book while the cameras are operating is highly desirable, and automatic page turning is an absolute essential. The best solution I found out there was probably http://pro.atiz.com/, but its price was out of range.
  • Many of the camera based solutions do not properly account for the curvature of the book as it approaches its spine.  This leads to a lot of undesirable artifacts.
  • What in the world would I do with the apparatus when I was done with it?

So, I figured the best way to go was the destructive route.  Yes, it’s painful seeing a lot of these old trusty friends destroyed and recycled, but they are being reborn into forms much more useful and economical.

The Hardware

This was tougher than I thought. There are a bazillion scanners out there. Obviously a sheet-fed scanner was the way to go, but even in that space there are tons and tons of models to choose from. A lot of people liked the Fujitsu ScanSnap. My issue with it was that it was a sheet-fed-only solution. Many of my book covers are hard and can’t be run through a sheet feeder, and a book without a cover can’t possibly be judged!

I ended up with an Epson GT-2500.   So far, it’s been a delight to use.  I picked it up for under $500 and it’s a tank.

The Software

I ran into two snags on the project: one with software and one with debinding. I wanted the best OCR software I could get for a reasonable amount of money. Research led me to Readiris Pro. It was an excellent piece of software. Its OCR features were much better than I expected, but other areas of the recognition engine were deal-stoppers. The limit of 50 pages for each new scan document was a huge pain in the ass. Also, where mixed graphics and text existed, manual manipulation was required to get an optimal result. The first scan took me hours and hours to complete. This process simply wasn’t going to scale to 300+ volumes of books.

My savior came in a strange form: Adobe Acrobat Professional. I downloaded a 30-day trial and it’s been just what the doctor ordered. I should note that Readiris’ resulting PDF files were much smaller than what I’m seeing out of Acrobat, but the visual quality was not nearly as good. Still, I do believe Readiris has a place in my overall toolbox.

Debinding

My first thought on debinding was to rent a powered surface planer. This ended up being a boondoggle. The binding adhesive used in some books extends quite deeply from the spine, which requires significant excavation with the planer. This results in choppy edges that aren’t suitable for being run through a sheet feeder. Thus, I felt the best way forward was a conventional stack cutter, which would result in a crisp edge. These are the devices that printing shops use to cut large stacks of paper via a guillotine-style blade and chisel. I took a look around eBay and found a number of them in the $400 range, but they looked rather underpowered to me. I could easily see myself trying to cut more than 250 pages or so and having the device flake out. So, I decided to check my options. I called Kinkos (NO… I will NOT call it FedEx Office) and discovered that they’d cut my books for $1.50 each. Hey, the economics were there. The prices were in line with both my volume and what I could buy online, and I wouldn’t be stuck with something else I wouldn’t know what to do with after finishing. Kinkos worked like a charm. Later my wife suggested that I call around, and I found out that OfficeMax would do it for $1.20.
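
For anyone who wants to sanity-check those economics, here is a rough sketch in Python. It assumes a round 300 books (my collection is "300+", so treat that as a floor) and the ballpark $400 eBay price; the helper name is made up purely for illustration.

```python
# Rough cost comparison for debinding, using the figures quoted above.
# BOOK_COUNT is an assumption based on the "300+ volumes" estimate.
BOOK_COUNT = 300
CUTTER_PRICE = 400.00        # ballpark eBay price for a stack cutter
KINKOS_PER_BOOK = 1.50
OFFICEMAX_PER_BOOK = 1.20


def service_cost(per_book: float, books: int = BOOK_COUNT) -> float:
    """Total cost of paying a shop to cut every book (hypothetical helper)."""
    return round(per_book * books, 2)


if __name__ == "__main__":
    print(f"Kinkos:     ${service_cost(KINKOS_PER_BOOK):.2f}")
    print(f"OfficeMax:  ${service_cost(OFFICEMAX_PER_BOOK):.2f}")
    print(f"Own cutter: ${CUTTER_PRICE:.2f} (plus storing it afterwards)")
```

At 300 books, that works out to $450 at Kinkos and $360 at OfficeMax, so either service lands close to the price of a cutter without leaving me with another piece of hardware.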

Right now, things are working well.  I’m taking my time and doing about 4-5 books a weekend.
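
As a back-of-the-envelope estimate of how long that pace takes, here is a small Python sketch. Again, the 300-book total is an assumption taken from my "300+ volumes" figure, and the function name is just for illustration.

```python
import math

BOOK_COUNT = 300  # assumption: the "300+ volumes" mentioned earlier


def weekends_needed(books: int, books_per_weekend: int) -> int:
    """Number of weekends to finish, rounding up any partial weekend."""
    return math.ceil(books / books_per_weekend)


if __name__ == "__main__":
    for pace in (4, 5):
        total = weekends_needed(BOOK_COUNT, pace)
        print(f"{pace} books/weekend -> {total} weekends")
```

At 4-5 books a weekend, 300 books comes to somewhere between 60 and 75 weekends, so this is very much a long-haul project.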

To use the books, I save them as PDFs. I can transfer them directly to my Kindle or iPad via USB. I could also email a book to my Kindle if I desired. For storage, I find that Google Docs is the best solution out there. As a Microsoft nerd, I’ve always leveraged SkyDrive for my cloud storage needs, but its inline searching and editing features are limited to Office files. Google’s stuff works great with PDF.

Hope this helps someone else who is thinking of doing the same thing.