Top-notch accuracy makes OCR software package a must-have

OmniPage Pro 9.0 reads and re-creates documents with text, color and graphic elements in place

By Jason Byrne

Early optical character recognition software introduced so many errors that it often was easier to retype documents than to scan and correct them.
As OCR has improved, however, documents have grown steadily more complex. Besides text paragraphs, OCR software now must cope with tables, graphics and multiple fonts. Users want the OCR version to look just like the print original, and they have the same expectation for a second-generation fax or photocopy.

Caere Corp.'s last couple of versions of OmniPage Pro not only have improved OCR accuracy across all these types of documents, they also have done fairly well at maintaining formats and layouts.

Installation of OmniPage Pro 9.0 was a breeze on all three platforms supported: Microsoft Windows 95, Windows 98 and Windows NT 4.0. I used scanners from Hewlett-Packard Co. and Visioneer Inc. of Palo Alto, Calif., although the software works with any TWAIN-compliant scanner.

I tested with documents containing tables, graphics, odd fonts, columns and general desktop publishing weirdness. I also tried second- and third-generation copies of the test documents. The results were impressive. Not only did OmniPage Pro recognize the text, it also brought the layouts over.

The results were good enough that the package deserves to be called optical document recognition, or ODR, software.

The character recognition accuracy was amazing. Caere has claimed 99 percent accuracy on laser-printed documents with standard fonts, and my results supported the claim. Except on the most tortuous third-generation documents, accuracy exceeded 88 percent, even for odd fonts.
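The review does not say how the accuracy percentages were calculated. One common way to put a number on character accuracy is to compare the recognized text with a hand-typed reference and count the single-character edits needed to turn one into the other. The Python sketch below assumes that approach; the function names `edit_distance` and `character_accuracy` are hypothetical and are not part of OmniPage Pro or any Caere tool.

```python
def edit_distance(reference: str, recognized: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions separating two strings."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the recognized text.
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            substitution = previous[j - 1] + (ref_char != rec_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """Fraction of reference characters recovered correctly, assuming
    accuracy is defined as 1 - (edit errors / reference length)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    # Clamp at zero so very noisy output cannot yield a negative score.
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    truth = "optical character recognition"
    ocr_output = "0ptical character recogniti0n"  # two misread letters
    print(f"{character_accuracy(truth, ocr_output):.1%}")
```

Under this definition, a 99 percent score means roughly one wrong character in every 100, and 88 percent means about 12 in every 100.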

With a body of OCR experience to draw from, vendors such as Caere are good at designing software to do the necessary tasks automatically and otherwise stay out of the user's way. They also have made it much easier to train the package for more difficult documents.

OmniPage Pro 9.0 shows its maturity without being dated. The most notable new feature is color support. Brochures, pamphlets and color presentations are easier to handle, and color recognition brings verisimilitude to onscreen versions of printed documents. Version 8.0 could recognize color fonts but did not save pictures in color. Now the color pictures are retained.

Exported document formats show photographs at a resolution of 150 dots per inch. The few users who require a higher resolution should scan pictures separately with imaging software.

OmniPage Pro 9.0 handles tables more easily. Tables with horizontal and vertical lines convert automatically into the chosen document format. Tables without lines, however, must have zones manually selected for correct recognition of the contents.

The package recognizes printed spreadsheets. A click of the Auto Zone button parses a spreadsheet while retaining the data and layout fairly well. I captured tables that had no lines by telling OmniPage they were spreadsheets. Legal documents and other documents with single-column formatting are easier to handle in the standard mixed-pages mode.

The OCR Wizard, which guides the user through setting scan and recognition options, gives better explanations than in Version 8.0.

The function formerly known as Check Recognition has been renamed and given a face-lift.

Now called OCR Proofreader, it checks a scanned and recognized document for possible errors. The window in which it runs is resizable to give a better view of the document. This small change shows that Caere continues to refine its interface.

The bundled Caere PageKeeper Standard package lets you create a digital file cabinet for document scans. It has mutual toolbar links with OmniPage Pro for access to one program's features from within the other. PageKeeper Standard retails separately for $30.

OmniPage Pro 9.0 is better at scanning multiple documents, including those processed by the increasing number of multifunction devices.

Government workgroups that scan large volumes of documents, as well as users looking to kill the paper tiger in their in-boxes, will find OmniPage Pro 9.0 a good choice, not merely a compromise.

