Automated service extracts usable info from complex documents
- By John Breeden II
- Oct 10, 2013
Not long ago, turning the data stored in hundreds of thousands of documents into something useful required an army of people poring over warehouses of paper, and even then the work could take months or years. Now Data Conversion Laboratory (DCL) has released a system that can do the same job in a matter of days or even hours.
The company’s Automated Document Processing system is designed to help government organizations find critical data buried in unusable and unsearchable documents such as forms, technical manuals or images. DCL specializes in digitizing, converting and reorganizing content to facilitate universal access.
“While modern optical character recognition tools do a great job on most clean content, they get fooled by the non-textual content of complex documents, and accuracy degrades,” said Mark Gross, founder and CEO of DCL. DCL’s technology transforms documents of varying visual quality and imagery into searchable XML documents, with extracted metadata, which can be stored and accessed via content-management and other end-user systems. Essentially, it pulls out non-text blocks of information and processes them separately, adding them back into the document later in a usable format.
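The separate-and-reassemble idea described above can be illustrated with a minimal Python sketch. This is not DCL's implementation, which the company has not published; the `Block` class, the block kinds, the two `process_*` stand-ins and the XML element names are all hypothetical, chosen only to show non-text blocks being pulled out, handled on their own and merged back in their original order alongside extracted metadata.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Block:
    kind: str     # hypothetical labels: "text", "table", "image"
    content: str  # raw text, or a placeholder describing non-text data


def process_text(block: Block) -> str:
    # Stand-in for OCR cleanup: collapse runs of whitespace.
    return " ".join(block.content.split())


def process_non_text(block: Block) -> str:
    # Stand-in for specialized handling of tables, images and so on.
    return f"[{block.kind}: {block.content.strip()}]"


def convert(blocks: list[Block], metadata: dict[str, str]) -> str:
    # Step 1: pull the non-text blocks out, remembering their positions.
    pulled = {i: b for i, b in enumerate(blocks) if b.kind != "text"}

    # Step 2: process the remaining text blocks.
    processed = {
        i: process_text(b) for i, b in enumerate(blocks) if b.kind == "text"
    }

    # Step 3: process the pulled blocks separately.
    processed.update({i: process_non_text(b) for i, b in pulled.items()})

    # Step 4: reassemble everything in original order as XML with metadata.
    root = ET.Element("document")
    meta = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, "field", name=key).text = value
    body = ET.SubElement(root, "body")
    for i in sorted(processed):
        ET.SubElement(body, "block", kind=blocks[i].kind).text = processed[i]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        Block("text", "Name:   Jane  Doe"),
        Block("table", "rows=3"),
        Block("text", "Signed"),
    ]
    print(convert(sample, {"title": "Sample form"}))
```

In a real pipeline the two stand-in functions would be replaced by OCR and by whatever table or image handling the documents require; the sketch only shows how the pieces fit together.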
The new Automated Document Processing system, hosted at DCL in a secure environment, has been in operation and testing since early this year and is now ready to be rolled out to those who need to turn mountains of paper forms into useful data. The system is capable of processing hundreds of thousands of pages per day and operating 24/7 without human intervention, which reduces human effort and cost.
“Customizable to fit the specific requirements of each client, this system can manage the conversion of millions of pages in a 100 percent lights-out environment at such a surprisingly high accuracy rate,” Gross said.
The system includes an integrated communication layer, load balancing capabilities, a workflow engine and a multi-step processing approach. Thoroughly tested and now in operation, DCL Automated Document Processing incorporates decades of conversion expertise and innovative technology to deliver high-quality converted documents and metadata, the company said.
John Breeden II is a freelance technology writer for GCN.