Top-notch accuracy makes OCR software package a must-have

OmniPage Pro 9.0 reads and re-creates documents with text, color and graphic elements in place

By Jason Byrne

GCN Staff

Early optical character recognition software introduced so many errors that it often was easier to retype documents than to scan and correct them.

As OCR has improved, however, documents have grown steadily more complex. Besides text paragraphs, OCR software now must cope with tables, graphics and multiple fonts. Users want the OCR version to look just like the print original, and they have the same expectation for a second-generation fax or photocopy.

Caere Corp.'s last couple of versions of OmniPage Pro not only have improved OCR accuracy across all these types of documents, they also have done fairly well at maintaining formats and layouts.




Installation of OmniPage Pro 9.0 was a breeze on all three platforms supported: Microsoft Windows 95, Windows 98 and Windows NT 4.0.

I used scanners from Hewlett-Packard Co. and Visioneer Inc. of Palo Alto, Calif., although the software works with any TWAIN-compliant scanner.

I tested with documents containing tables, graphics, odd fonts, columns and general desktop publishing weirdness. I also tried second- and third-generation copies of the test documents. The results were impressive. Not only did OmniPage Pro recognize the text, it also brought the layouts over.




OmniPage Pro easily handles routine OCR, but it also stands out in preserving format and layout of printed documents such as spreadsheets and brochures.



Name that app

The results were good enough that the package deserves to be called optical document recognition, or ODR, software.

The character recognition accuracy was amazing. Caere has claimed 99 percent accuracy on laser-printed documents with standard fonts, and my results supported the claim. Except on the most tortuous third-generation documents, accuracy exceeded 88 percent, even for odd fonts.
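For readers who want to check accuracy figures against their own documents, here is a minimal Python sketch. It assumes accuracy is measured per character, as one minus the edit distance between a hand-typed reference and the OCR output, divided by the reference length. The review does not say how Caere or I arrived at the percentages, so treat that definition, the function names and the 2,000-character page as illustrative assumptions.

```python
def edit_distance(reference: str, recognized: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions or substitutions needed to turn one string into the other."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """Fraction of reference characters recognized correctly (assumed metric)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))


def expected_errors(characters_on_page: int, accuracy: float) -> float:
    """Approximate number of characters to correct by hand on one page."""
    return characters_on_page * (1.0 - accuracy)


if __name__ == "__main__":
    typed = "optical character recognition"
    scanned = "optica1 character rec0gnition"  # two typical OCR substitutions
    print(f"accuracy: {character_accuracy(typed, scanned):.1%}")

    # Hypothetical page of 2,000 characters at the two accuracy levels cited.
    for level in (0.99, 0.88):
        print(f"{level:.0%} accurate: about {expected_errors(2000, level):.0f} errors")
```

Under these assumptions, a 2,000-character page at 99 percent accuracy leaves about 20 characters to fix, and the same page at 88 percent leaves about 240. That gap is why the drop on third-generation copies matters more in practice than the raw percentages suggest.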

With a body of OCR experience to draw from, vendors such as Caere are good at designing software to do the necessary tasks automatically and otherwise stay out of the user's way. They also have made it much easier to train the package for more difficult documents.

OmniPage Pro 9.0 shows its maturity without being dated. The most notable new feature is color support. Brochures, pamphlets and color presentations are easier to handle, and color recognition brings verisimilitude to onscreen versions of printed documents. Version 8.0 could recognize color fonts but did not save pictures in color. Now the color pictures are retained.

Exported document formats show photographs at a resolution of 150 dots per inch. The few users who require a higher resolution should scan pictures separately with imaging software.

OmniPage Pro 9.0 handles tables more easily. Tables with horizontal and vertical lines convert automatically into the chosen document format. Tables without lines, however, must have zones manually selected for correct recognition of the contents.



Box Score
OmniPage Pro 9.0

Caere Corp., Los Gatos, Calif.;

tel. 800-488-1133

www.caere.com

Prices: $454 for single-user version, $7,375 for 20-user pack, $80 for single-user upgrade, $1,449 for 20-user upgrade

Pros and cons:

+The best OCR package with today's technology

+Tables and color graphics handled well

-A bit pricey for casual use

-Practice needed to get full value

Real-life requirements:

TWAIN-compliant scanner, Win9x or NT 4.0, 166-MHz or faster Pentium processor, 64M of RAM, 45M storage for software plus more for scanned documents, graphics card supporting 1,024- by 768-pixel resolution at 32-bit color depth, 17-inch or larger high-contrast monitor recommended for proofreading


The package recognizes printed spreadsheets. A click of the Auto Zone button parses a spreadsheet while retaining the data and layout fairly well. I captured tables that had no lines by telling OmniPage they were spreadsheets.

Legal documents and other documents with single-column formatting are easier to handle in the standard mixed-pages mode.

The OCR Wizard, which guides the user through setting scan and recognition options, gives better explanations than in Version 8.0.

The function formerly known as Check Recognition has been renamed and given a face-lift.

Now called OCR Proofreader, it checks a scanned and recognized document for possible errors. The window in which it runs is resizable to give a better view of the document. This small change shows that Caere continues to refine its interface.

The bundled Caere PageKeeper Standard package lets you create a digital file cabinet for document scans. It has mutual toolbar links with OmniPage Pro for access to one program's features from within the other. PageKeeper Standard retails separately for $30.

Government workgroups that scan large volumes of documents, as well as users looking to kill the paper tiger in their in-boxes, will find OmniPage Pro 9.0 a good choice, not merely a compromise.
