Automated service extracts usable info from complex documents

Not long ago, turning the data stored in hundreds of thousands of documents into something useful required an army of people poring over warehouses of data, and even then the work might take months or years. Now Data Conversion Laboratory (DCL) has released a system that can do the same thing in a matter of days or even hours.

The company’s Automated Document Processing system is aimed at government organizations searching for critical data buried in unusable and unsearchable documents such as forms, technical manuals or images. DCL specializes in digitizing, converting and reorganizing content to facilitate universal access.

“While modern optical character recognition tools do a great job on most clean content, they get fooled by the non-textual content of complex documents, and accuracy degrades,” said Mark Gross, founder and CEO of DCL.

DCL’s technology transforms documents of varying visual quality and imagery into searchable XML documents with extracted metadata, which can be stored and accessed via content-management and other end-user systems. Essentially, it pulls out non-text blocks of information, processes them separately and then adds them back into the document in a usable format.
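For readers who want a concrete picture of that split-and-reassemble idea, the following is a minimal Python sketch. It is not DCL's implementation, which the company has not published. The `Block` type, the element names (`document`, `metadata`, `field`, `para`, `object`) and both processing functions are hypothetical stand-ins chosen for illustration only.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Block:
    """Hypothetical unit of page content: plain text or something else."""
    kind: str      # "text", or a non-text kind such as "table" or "image"
    content: str   # raw content as captured from the page


def process_text(block: Block) -> str:
    # Stand-in for OCR cleanup: collapse runs of whitespace.
    return " ".join(block.content.split())


def process_non_text(block: Block) -> str:
    # Stand-in for specialized handling of tables, images and so on.
    return f"[{block.kind}: {block.content.strip()}]"


def convert(blocks: list[Block], metadata: dict[str, str]) -> str:
    # Step 1: pull non-text blocks out, remembering where each one sat.
    text_items = [(i, b) for i, b in enumerate(blocks) if b.kind == "text"]
    other_items = [(i, b) for i, b in enumerate(blocks) if b.kind != "text"]

    # Step 2: process the two groups separately.
    processed = {}
    for i, b in text_items:
        processed[i] = ("para", {}, process_text(b))
    for i, b in other_items:
        processed[i] = ("object", {"type": b.kind}, process_non_text(b))

    # Step 3: reassemble in original order as XML, with metadata up front.
    root = ET.Element("document")
    meta = ET.SubElement(root, "metadata")
    for key, value in metadata.items():
        ET.SubElement(meta, "field", name=key).text = value
    body = ET.SubElement(root, "body")
    for i in sorted(processed):
        tag, attrs, text = processed[i]
        ET.SubElement(body, tag, attrs).text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `convert` with a list of blocks and a metadata dictionary returns a single XML string in which tables and images sit in their original positions alongside the cleaned text. A content-management system could then index that output.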

The new Automated Document Processing system, hosted at DCL in a secure environment, has been in operation and testing since early this year and is now ready to be rolled out to those who need to turn mountains of paper forms into useful data. The system is capable of processing hundreds of thousands of pages per day and operating 24/7 without human intervention, which reduces human effort and cost. 

“Customizable to fit the specific requirements of each client, this system can manage the conversion of millions of pages in a 100 percent lights-out environment at such a surprisingly high accuracy rate,” Gross said.

The system includes an integrated communication layer, load balancing capabilities, a workflow engine and a multi-step processing approach. Thoroughly tested and now in operation, DCL Automated Document Processing incorporates decades of conversion expertise and innovative technology to deliver high-quality converted documents and metadata, the company said.

About the Author

John Breeden II is a freelance technology writer for GCN.

