Reading between the lines
- By John Breeden II
- Apr 09, 2008
OmniPage Professional 16 from Nuance Communications is designed to work with paper in all of its forms, but it could also be the key to eliminating paper from your agency, saving both time and the environment.
At its heart, OmniPage is an advanced optical character recognition (OCR) package, and older versions of the software did little else. But the latest version is a complete system that can read text from almost any source and create electronic documents that many different programs can open and edit, if that is your goal.
To start, we decided to try out the new OCR engine using a standard Epson flatbed scanner attached to a modest test system in the lab.
We dug up a bunch of graphics-laden promotional fliers from monitor companies and pulled out our infamous printer test document, which contains letters running through, over and under graphics. Could OmniPage's OCR pull text under these challenging circumstances? And what would happen to the graphics?
OmniPage was extremely accurate in decoding the complicated mess of text on the pages we sent through it. In 45 pages of mixed media scanned, it got only two words wrong, and both blended into the dark background graphic. Even with those questionable pages factored in, missing two words out of 800 gives the program an impressive accuracy rate of 99.75 percent. And this was under less-than-ideal circumstances. Most documents don't have words printed over a picture of a fog-enshrouded Golden Gate Bridge.
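For readers who want to run the same check on their own scans, the accuracy figure above is simple word-level arithmetic. The short Python sketch below reproduces it; the function name `word_accuracy` is our own and is not part of OmniPage.

```python
def word_accuracy(total_words: int, wrong_words: int) -> float:
    """Return the percentage of words recognized correctly."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= wrong_words <= total_words:
        raise ValueError("wrong_words must be between 0 and total_words")
    return 100.0 * (total_words - wrong_words) / total_words


# The figures from our test: 800 words scanned, two misread.
print(f"{word_accuracy(800, 2):.2f} percent")  # prints "99.75 percent"
```

Counting whole words is only one way to score OCR output; a character-level count would give a different, usually higher, number for the same pages.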
OmniPage picks up graphics and assigns them their own element numbers, which makes deleting them easy. When we wanted to keep a graphic, we could select the True Page option from the Save menu, which keeps everything in the same order as it was captured. We could also select the Flowing Text option, which keeps graphics and text intact but lays them out in a line down the page.
Once you have captured information (which takes about 20 seconds longer than a standard scan), you can save it in several formats. If you only need to convert it to an electronic format, you can save the file as a PDF. If you want to be able to edit it later, you can save it as a Microsoft Word file. Or you can save your data in a format that most spreadsheet applications can open. OmniPage 16 files can also be saved as Corel WordPerfect, HTML and native Excel 2007 files.
This means you could turn a paper form into an electronic one that looks nearly identical to its source.
We used this to turn a standard Internal Revenue Service W-9 form into an electronic version that could be edited. Nobody would be able to tell the original from one that was filled out online and printed. Instead of printing out a stack, you could simply fill out forms and print them as needed. Although OCR added about 20 seconds to a scan on our modest test system, it was much faster on one of our frontline test computers.
OmniPage 16 is optimized to take advantage of dual-core processors with improved hyperthreading and parallel-processing algorithms. When running on a top-of-the-line dual-core system, scans on average took only six seconds longer to process and be ready for editing than a standard scan without OCR.
We are not sure how often this might come up, but what if you need to capture a document and don't have a scanner handy?
Maybe someone at a convention gives you a handout to look at that you have to give back. What can you do? If you have a digital camera, you are in luck. Simply photograph the document the way James Bond would before replacing it in the evil genius' safe.
OmniPage 16 has a photograph mode that can correct for the slight angle distortions and rotation that occur when taking pictures by hand. Simply put OmniPage 16 in Photo mode, and it's amazingly accurate.
We tested this feature by taking pictures of an open book. The software compensated for the curve of the page and flattened the image back into a standard document.
Fly on the wall
Finally, we thought we would throw it a curve by taking a picture of a poster advertising a Beatles cover band. But the Fab Four facsimile could not fake out OmniPage 16. The location and date of the show were captured as text, and graphics were assigned separate windows. The quality of the images was not as good as if we had scanned the poster, but all the text was captured and the graphics were readable, which makes OmniPage 16 a great option to have in a pinch, or perhaps as a tool in the bag of a real-life spy.
On the opposite side of the government spectrum from the spy, anyone who needs to keep information secret can use OmniPage 16's redaction module to black out sensitive information.
You can choose to have certain words or numbers (or all numbers) blacked out in every document you scan, or you can simply redact by hand. Or you might do the opposite and highlight important elements.
We set this feature to look for dates and highlight them so we always saw when articles in our review assignment sheets were due, a pretty cool function even if it eliminated our "did not know when it was due" excuse.
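To give a feel for what rule-based redaction and highlighting involve once text has been recognized, here is a minimal Python sketch using the standard `re` module. It is not how OmniPage implements the feature, which we could not inspect; the patterns, the `redact` and `highlight` helpers, and the `[[...]]` marker are all assumptions made for illustration.

```python
import re

# Hypothetical patterns chosen for illustration only.
NUMBER = re.compile(r"\d+")                         # any run of digits
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")   # dates such as 4/9/2008


def redact(text: str, pattern: re.Pattern, mask: str = "#") -> str:
    """Replace every match with mask characters of the same length."""
    return pattern.sub(lambda m: mask * len(m.group()), text)


def highlight(text: str, pattern: re.Pattern) -> str:
    """Wrap every match in [[...]] so it stands out in plain text."""
    return pattern.sub(lambda m: f"[[{m.group()}]]", text)


print(redact("Call 443-7077 by 4/9/2008", NUMBER))
# prints "Call ###-#### by #/#/####"
print(highlight("Review due 4/9/2008 at noon", DATE))
# prints "Review due [[4/9/2008]] at noon"
```

Masking each match to its original length keeps the page layout intact, which matters if the redacted text is later placed back over the scanned image.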
OmniPage 16 pulls together all aspects of OCR, doing so much more than the older versions that it's almost a separate program suite deserving of a separate name.
The $499 price seems a little high, but it's a good deal when you consider all the features and that you only need to set it up on a single system, depending on how much scanning you do. Version 16 is also available as an upgrade for $199, which you should consider if you're running an older version.
Nuance Communications, (800) 443-7077, www.nuance.com
John Breeden II is a freelance technology writer for GCN.