Automated service extracts usable info from complex documents

Not that long ago, taking the data stored in hundreds of thousands of documents and turning it into something useful could only be accomplished by an army of people poring over warehouses of data. And even then, it might take months or even years to accomplish. But now, Data Conversion Laboratory (DCL) has released a system that can do the very same thing in a matter of days or even hours.

The company’s Automated Document Processing system is designed to help government organizations find critical data buried in unsearchable documents such as forms, technical manuals or images. Data Conversion Laboratory specializes in digitizing, converting and reorganizing content to facilitate universal access.

“While modern optical character recognition tools do a great job on most clean content, they get fooled by the non-textual content of complex documents, and accuracy degrades,” said Mark Gross, founder and CEO of DCL. DCL’s technology transforms documents of varying visual quality and imagery into searchable XML documents, with extracted metadata, which can be stored and accessed via content-management and other end-user systems. Essentially, it pulls out non-text blocks of information and processes them separately, adding them back into the document later in a usable format.
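
The split-process-merge idea described above can be illustrated with a minimal Python sketch. It is not DCL's implementation, which the company has not published. The `Block` type, the `process_text` and `process_non_text` functions and the XML element names are all hypothetical, and the two processing functions are simple stand-ins for OCR and for whatever specialized handling a table or image would receive.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Block:
    position: int  # reading order on the page (hypothetical field)
    kind: str      # "text", "table", "image", ...
    raw: str       # stand-in for the scanned content of the block


def process_text(block: Block) -> str:
    # Stand-in for OCR on a clean text region.
    return block.raw.strip()


def process_non_text(block: Block) -> str:
    # Stand-in for separate handling of tables, images and similar regions.
    return f"[{block.kind}: {block.raw.strip()}]"


def convert_page(blocks: list[Block]) -> str:
    # 1. Separate text blocks from non-text blocks.
    text_blocks = [b for b in blocks if b.kind == "text"]
    other_blocks = [b for b in blocks if b.kind != "text"]

    # 2. Process each group with its own routine.
    processed = [(b, process_text(b)) for b in text_blocks]
    processed += [(b, process_non_text(b)) for b in other_blocks]

    # 3. Merge the results back into the original reading order.
    processed.sort(key=lambda pair: pair[0].position)

    # 4. Emit a simple XML document (element names are assumptions).
    page = ET.Element("page")
    for block, content in processed:
        element = ET.SubElement(page, "block", kind=block.kind)
        element.text = content
    return ET.tostring(page, encoding="unicode")


sample = [
    Block(0, "text", " Form 12 "),
    Block(1, "table", "a|b"),
    Block(2, "text", "Signature"),
]
print(convert_page(sample))
```

Keeping the position with each block is what lets the non-text results be slotted back into place after they have been handled on a separate path.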

The new Automated Document Processing system, hosted at DCL in a secure environment, has been in operation and testing since early this year and is now ready to be rolled out to those who need to turn mountains of paper forms into useful data. The system is capable of processing hundreds of thousands of pages per day and operating 24/7 without human intervention, which reduces human effort and cost. 

“Customizable to fit the specific requirements of each client, this system can manage the conversion of millions of pages in a 100 percent lights-out environment at such a surprisingly high accuracy rate,” Gross said.

The system includes an integrated communication layer, load balancing capabilities, a workflow engine and a multi-step processing approach. Thoroughly tested and now in operation, DCL Automated Document Processing incorporates decades of conversion expertise and innovative technology to deliver high-quality converted documents and metadata, the company said.

About the Author

John Breeden II is a freelance technology writer for GCN.

