Data Capture: The Ins and Outs of Large Scale Document Scanning

With scanner hardware prices dropping, scanners for capturing legal- and letter-size business documents are becoming commodity items, encouraging trends like distributed scanning to capture data from remote offices. But capture of larger drawings, especially engineering drawings and blueprints, falls into a different category. Capturing large documents correctly often requires a whole different level of care and attention. At Formtek, we’ve worked with large-size documents for many years and are familiar with the special considerations for those types of document capture projects.

A recent AIIM article by Lisa Andersen-Desautels has a great checklist of considerations for dealing with capture of large documents. Almost all of her comments can be applied equally to standard business-sized documents, but these guidelines are particularly important when dealing with large documents.

Document Prep – Large-size documents often require considerably more preparation before they are suitable for scanning. Many capture projects include very old documents that are dusty and dirty, sometimes fragile, and sometimes rolled or attached to sticks or pegs. Preparing old documents to be scanned can often be very time-consuming.

Sorting and File Format Limitations – Pre-sorting documents by size can speed the capture process. It also lets you check whether the target file formats make sense for the sizes of documents that need to be captured. Both PDF and JPEG have maximum size limitations; JPEG, for example, cannot store an image wider or taller than 65,535 pixels.
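As a rough illustration of the pre-sorting and format check described above, the sketch below buckets a sheet into a size class and tests whether its scanned pixel dimensions would fit in a JPEG. The ANSI sheet table and the function names are assumptions made for this example, not part of the AIIM checklist; substitute the sizes that actually occur in your collection.

```python
import math

# JPEG stores image width and height as 16-bit values,
# so neither dimension may exceed 65,535 pixels.
JPEG_MAX_PIXELS = 65_535

# Assumed sheet classes (ANSI drawing sizes, in inches), smallest first.
SHEET_SIZES_IN = {
    "ANSI A": (8.5, 11),
    "ANSI B": (11, 17),
    "ANSI C": (17, 22),
    "ANSI D": (22, 34),
    "ANSI E": (34, 44),
}


def pixel_dimensions(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a sheet scanned at `dpi`."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


def fits_in_jpeg(width_in, height_in, dpi):
    """True if both pixel dimensions are within the JPEG limit."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px <= JPEG_MAX_PIXELS and height_px <= JPEG_MAX_PIXELS


def classify_sheet(width_in, height_in):
    """Return the smallest listed sheet class the document fits on, or None."""
    short_side, long_side = sorted((width_in, height_in))
    for name, (class_short, class_long) in SHEET_SIZES_IN.items():
        if short_side <= class_short and long_side <= class_long:
            return name
    return None  # oversize: handle separately


if __name__ == "__main__":
    print(classify_sheet(30, 20))
    print(pixel_dimensions(34, 44, 300))
    print(fits_in_jpeg(34, 44, 300))
```

Sheets that come back as `None` are larger than anything in the table and are good candidates for a separate batch with its own scanner settings and target format.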

Control Numbers – Control numbers can help identify documents that have already been processed and help match hardcopy originals to their electronic version.

File Naming – Lisa recommends not investing much time in coming up with a file naming convention, especially once the documents have been captured and are part of the content management system. Our own experience has been that file naming is an important part of the process and shouldn’t be overlooked. Like control numbers, it is another breadcrumb that can later save you hours of frustration when trying to retrace the steps of the capture process. It’s usually simple to define and implement.
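To show how simple such a convention can be, here is a minimal sketch that embeds the control number in the file name and parses it back out again. The layout (project code, six-digit control number, three-digit sheet number) and the function names are purely hypothetical; any consistent scheme that fits your project would serve the same purpose.

```python
import re

# Hypothetical convention: <PROJECT>_<6-digit control no.>_S<3-digit sheet>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Z0-9-]+)_(?P<control>\d{6})_S(?P<sheet>\d{3})\.(?P<ext>\w+)$"
)


def build_file_name(project, control_number, sheet, extension="tif"):
    """Build a file name that carries the control number and sheet number."""
    # Collapse anything that is not a letter or digit into a single hyphen.
    project_code = re.sub(r"[^A-Za-z0-9]+", "-", project).strip("-").upper()
    return f"{project_code}_{control_number:06d}_S{sheet:03d}.{extension}"


def parse_file_name(name):
    """Recover the fields from a file name, or return None if it does not match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return {
        "project": match["project"],
        "control_number": int(match["control"]),
        "sheet": int(match["sheet"]),
        "extension": match["ext"],
    }


if __name__ == "__main__":
    name = build_file_name("Plant 7 Piping", 1234, 2)
    print(name)
    print(parse_file_name(name))
```

Because the control number can be read straight from the name, a scanned file can be matched to its hardcopy original even outside the content management system.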

Scanning – Depending on the file format that you’re scanning to, you can either scan the pages or sheets of a document into a single file that holds multiple images or store each image as a separate file. Part of the decision depends on what the acceptable file size is for the final scanned document.

Resolution and Color – Decisions also need to be made about the optimal capture resolution and color mode. Anything less than 300 dpi will probably not be acceptable. Depending on the content of the drawings being captured, grayscale or color might be better choices, since monochrome could cause important data to be lost.
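The trade-off between resolution, color mode, and file size is easy to estimate before scanning. The sketch below computes the approximate uncompressed size of a scan; the bit depths are the common values for each mode (1, 8, and 24 bits per pixel), the function name is made up for this example, and real files will usually be smaller once compression is applied.

```python
# Typical bits per pixel for each capture mode (assumed, uncompressed).
BITS_PER_PIXEL = {"monochrome": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in, height_in, dpi, mode):
    """Approximate raw image size in bytes, ignoring row padding and headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # A 34 x 44 inch drawing scanned at 300 dpi in each mode.
    for mode in BITS_PER_PIXEL:
        size_mb = uncompressed_bytes(34, 44, 300, mode) / 1_000_000
        print(f"{mode:>10}: {size_mb:,.1f} MB")
```

Numbers like these help explain why the choice of color mode and the choice between single-image and multi-image files are usually made together.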

Quality Assurance – QA is an important final step in the capture process. Check that each captured image is faithful to the original and that its content is complete and clean, with no chopped edges, dirt, or streaks.


