Document Capture 101: Capture Drives Search
Document Capture Drives Successful Search
One of the first stages in planning any document scanning project is to ask the question: How do you want to find your documents? Theories vary on best practices, but here are a few tips for designing a document capture and OCR implementation for any document management system:
- Limit your number of index fields to five or fewer. Too often I see document scanning customers use far too many fields during their document capture and OCR process. The more fields you have, the more time end users spend entering index data on each document, and the more likely it is that fields will get skipped. Take the time to interview your end users and find out how they actually need to search for their documents.
- Always use a date. Dates are the ultimate filter and can be a lifesaver when you are searching for that needle in a haystack in your document scanning solution. An invoice date, purchase order date, or contract date gives you the power to narrow your search results to a specific period, which can be a huge help in audit-based searches or searches that support legal work.
- Use OCR automation to reduce indexing time. Document capture applications automate much of the indexing work and can reduce how much end users have to key in for each document. For this to pay off, strong, accurate Optical Character Recognition (OCR) technology and Advanced Data Extraction (ADE) are absolutely required.
- Ensure your technology has a QA step. If you are going to go to all the trouble of scanning, capturing, and sending documents to a repository, make sure you can check your work. Misfiling a document can be a painful experience.
- Full-text search is the insurance policy. Always, I repeat always, convert your scanned documents to a searchable format with OCR, such as PDF Image with Hidden Text. This allows granular searches beyond your index fields/columns and can help with those “find a needle in the haystack” tasks. But do not, I say, do NOT rely on full-text search as your primary search method. Full-text search does not let you sort by document-specific dates or run range-based searches on specific criteria, and it restricts sorting and viewing in most repositories.
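To make the first few tips concrete, here is a minimal sketch of an index schema and a QA check. It assumes a hypothetical invoice document type. The field names (`vendor_name`, `invoice_number`, `invoice_date`, `amount`) and the helpers `check_schema` and `qa_problems` are illustrative only and do not come from any particular capture product. The sketch shows a schema that stays within five fields and always includes a date. It also shows a QA pass that flags records before they are released to the repository.

```python
from dataclasses import dataclass, fields
from datetime import date

# Hypothetical invoice index: four fields, one of them a date,
# which keeps within the five-field guideline.
@dataclass
class InvoiceIndex:
    vendor_name: str
    invoice_number: str
    invoice_date: date
    amount: float

MAX_INDEX_FIELDS = 5

def check_schema(schema) -> None:
    """Raise if a document type has too many index fields or no date field."""
    schema_fields = fields(schema)
    if len(schema_fields) > MAX_INDEX_FIELDS:
        raise ValueError(
            f"{schema.__name__} has {len(schema_fields)} fields; limit is {MAX_INDEX_FIELDS}"
        )
    if not any(f.type is date for f in schema_fields):
        raise ValueError(f"{schema.__name__} has no date field")

def qa_problems(record: InvoiceIndex) -> list[str]:
    """QA step: return a list of issues; an empty list means the record can be filed."""
    problems = []
    if not record.vendor_name.strip():
        problems.append("vendor_name is blank")
    if not record.invoice_number.strip():
        problems.append("invoice_number is blank")
    if record.invoice_date > date.today():
        problems.append("invoice_date is in the future")
    if record.amount <= 0:
        problems.append("amount must be positive")
    return problems

check_schema(InvoiceIndex)  # passes: 4 fields, includes a date

good = InvoiceIndex("Acme Supply", "INV-1001", date(2023, 3, 14), 250.00)
bad = InvoiceIndex("", "INV-1002", date(2023, 3, 15), 0)
print(qa_problems(good))  # []
print(qa_problems(bad))   # ['vendor_name is blank', 'amount must be positive']
```

In a real system, the OCR/ADE engine would fill these fields automatically. A QA operator would then review only the records that fail the checks.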
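The difference between index-based search and full-text search can be sketched with a small in-memory example. All names here are hypothetical and used only for illustration: `ScannedDoc`, `index_search`, `full_text_search`, and the sample documents. Real repositories expose their own query interfaces. The `ocr_text` field stands in for the hidden text layer that OCR adds to a searchable PDF. The index search can filter by a date range and sort by date. The full-text search can find a phrase anywhere in the text, but it has no structured date to filter or sort on.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ScannedDoc:
    doc_id: str
    vendor_name: str     # index field
    invoice_date: date   # index field used for range filtering and sorting
    ocr_text: str        # stand-in for the hidden OCR text layer

# Hypothetical sample documents.
DOCS = [
    ScannedDoc("D1", "Acme Supply", date(2022, 11, 2), "Invoice 7741 net 30 freight charge"),
    ScannedDoc("D2", "Acme Supply", date(2023, 1, 18), "Invoice 7902 net 30"),
    ScannedDoc("D3", "Baker Parts", date(2023, 2, 7), "Invoice B-55 freight charge waived"),
]

def index_search(docs, vendor=None, start=None, end=None):
    """Primary search: filter on index fields, including a date range, sorted by date."""
    hits = [
        d for d in docs
        if (vendor is None or d.vendor_name == vendor)
        and (start is None or d.invoice_date >= start)
        and (end is None or d.invoice_date <= end)
    ]
    return sorted(hits, key=lambda d: d.invoice_date)

def full_text_search(docs, phrase):
    """Fallback search: case-insensitive phrase match against the OCR text."""
    needle = phrase.lower()
    return [d for d in docs if needle in d.ocr_text.lower()]

in_2023 = index_search(DOCS, vendor="Acme Supply", start=date(2023, 1, 1), end=date(2023, 12, 31))
print([d.doc_id for d in in_2023])   # ['D2']

freight = full_text_search(DOCS, "freight charge")
print([d.doc_id for d in freight])   # ['D1', 'D3']
```

The full-text query finds the needle across vendors. Narrowing it to a period, or ordering it by invoice date, still depends on the indexed date field.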