Unlike structured forms, unstructured forms have no fixed layout and can vary widely from one document to the next. Extracting data from them accurately requires several advanced capture technologies working together. The technologies commonly used in unstructured forms processing include:
- Full Page OCR – Converting the entire image to text provides a solid data set for the extraction engine to do its work.
- Data Extraction – Unstructured processing requires a pattern-matching engine to identify and locate text patterns, and then extract the entire matched text, a subset of it, or data located nearby.
- Validation – Extraction engines usually use pattern validation to ensure accuracy.
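The extraction and validation steps above can be illustrated with a minimal Python sketch. It assumes the OCR step has already produced plain text. The field names, label patterns, value patterns, and the `extract_fields` helper are all hypothetical, chosen only for illustration, and do not reflect any particular product's engine.

```python
import re

# Hypothetical field definitions (assumptions for illustration only):
# each field has a label pattern used to locate it in the OCR text and
# a value pattern that the nearby text must satisfy to be accepted.
FIELDS = {
    "invoice_number": (r"\bInvoice\s*(?:No\.?|Number|#)\s*:?", r"[A-Z]{2,4}-\d{4,8}"),
    "invoice_date": (r"\bDate\s*:?", r"\d{2}/\d{2}/\d{4}"),
    "total": (r"\bTotal\s*(?:Due)?\s*:?", r"\$?\d{1,3}(?:,\d{3})*\.\d{2}"),
}


def extract_fields(ocr_text, fields=FIELDS, window=40):
    """Locate each label, then pull a validated value from the text after it."""
    results = {}
    for name, (label_pattern, value_pattern) in fields.items():
        label = re.search(label_pattern, ocr_text, flags=re.IGNORECASE)
        if label is None:
            results[name] = None  # label not found anywhere on the page
            continue
        # "Nearby data": only look at a short window following the label.
        nearby = ocr_text[label.end():label.end() + window]
        # Pattern validation: accept a value only if it has the expected shape.
        value = re.search(value_pattern, nearby)
        results[name] = value.group(0) if value else None
    return results
```

Calling `extract_fields` on the text of a scanned invoice returns a dictionary with one entry per field, where `None` marks a field whose label was missing or whose nearby text failed the pattern check. A real engine would likely add positional rules and per-vendor tuning on top of this idea.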
Accounts payable (AP) invoices are probably the best example of unstructured documents. Invoice processing is extremely complex and demands a high level of flexibility and accuracy from any forms processing engine. Along with invoice header information, most engines can also extract the invoice line items, that is, the table of information.
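As a rough sketch of line-item extraction, the Python example below scans OCR text line by line for rows shaped like "description, quantity, unit price, amount". The row layout, the `LINE_ITEM` pattern, the `extract_line_items` helper, and the arithmetic consistency check are assumptions made for illustration. Real invoices vary far more than this pattern allows.

```python
import re
from decimal import Decimal

# Assumed row shape (illustrative only): free-text description followed by
# an integer quantity, a unit price, and a line amount, separated by spaces.
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+(?P<qty>\d+)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+(?P<amount>\d+\.\d{2})$"
)


def extract_line_items(ocr_text):
    """Return one dict per line that matches the assumed row shape."""
    items = []
    for line in ocr_text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match is None:
            continue  # headers, totals, and other non-row lines are skipped
        qty = int(match["qty"])
        unit_price = Decimal(match["unit_price"])
        amount = Decimal(match["amount"])
        items.append({
            "description": match["description"],
            "qty": qty,
            "unit_price": unit_price,
            "amount": amount,
            # Simple validation: quantity times unit price should equal amount.
            "consistent": qty * unit_price == amount,
        })
    return items
```

Rows whose `consistent` flag is false are candidates for manual review, since a mismatch often points to an OCR misread in one of the numeric columns.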