ReadIris stumbles with Spanish, technical translations

Pros and cons:
+        Scans many languages
–        Error rate varies with language
–        Can’t handle multiple languages in a document; graphics and chart text handled poorly

Real-life requirements:
Win95, NT or Mac OS; scanner; CD-ROM drive for installation

Que hablas español? Tildes and other non-English punctuation marks can confuse
English-oriented optical character recognition packages.

ReadIris OCR from Image Recognition Integrated Systems claims to scan 54 languages,
including some fairly obscure ones, and move the text from paper into a word processing
file. In theory, this could free multilingual users from having to retype paper documents
for editing.

Most OCR software is pretty fickle about the documents it can recognize, and users
often are advised to stick to first-generation, laser-printed text for best results. This
is one area where ReadIris stands out: It accepts most text equally well, regardless of the
printing method.

To give ReadIris OCR the benefit of the doubt, I did all my testing on laser-printed
documents. I don’t have the expertise to judge its performance on all the supported
languages, such as Gaelic, Macedonian and Byelorussian, but I read Spanish fairly well and
managed to recruit help in German.

ReadIris produced impressive results on English documents, but there were odd mistakes.
In a technical document, for example, the software never managed to identify
“10Mbit/s” correctly, instead printing “I DO Mbit/s.” That might be
humorous on a license plate but not elsewhere. The error occurred throughout, although
other tech terms were recognized correctly.

The only other negative in the English test was that ReadIris OCR could not read text
inside a chart or graphic. That could be acceptable if the OCR text dropped into the word
processor gave some kind of warning that an unreadable chart or graphic existed at that
location. But it doesn’t, so the document’s subsequent audience will not realize
something has been left out.

ReadIris performed less accurately on a Spanish document. The phrase “Para
aprender” ran together to form “Paraaprender.” And something odd happened
to the word “Telemarketing,” which changed twice to “Telemarketin2.”

Minor errors also occurred in words such as “artículo,” which lost the
accent mark over the i in favor of an English dot. For most words, the accent mark stayed,
but on a few it was replaced.

The software did better in German. Umlaut dots survived, and my German-speaking
colleague had no problem understanding the document after the OCR process. Comparing the
output with the original text, my colleague reported a match.

Multiple languages on the same page confused the program. This is more of an annoyance
than a flaw because the software interprets the unique punctuation of one language in
terms of the other. For example, German words that happen to be on the same page as
Spanish text undergoing OCR will be assigned Spanish symbols.

Users who handle a lot of paper documents written in other languages could benefit from
ReadIris if they ever need to modify the documents. The only difficulty I can see is that
some of the time saved by not having to retype would be lost in proofreading for mistakes.
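
Some of that proofreading could be narrowed down with a simple script. The sketch below is not part of ReadIris; it is a hypothetical Python helper, assuming the OCR output has been saved as plain text. It flags words of three or more letters that end in digits, the pattern behind the “Telemarketin2” error. The function name and the three-letter threshold are illustrative assumptions, and the heuristic will also flag legitimate terms such as “Win95,” so it only suggests where to look first.

```python
import re

# Heuristic (an assumption, not a ReadIris feature): three or more letters
# followed directly by digits at the end of a word, e.g. "Telemarketin2".
SUSPECT = re.compile(r"\b[^\W\d_]{3,}\d+\b")


def flag_suspect_tokens(text):
    """Return (line_number, token) pairs worth a proofreader's second look."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((number, match.group()))
    return hits


if __name__ == "__main__":
    sample = "Cursos de Telemarketin2 para aprender\nEnlace de 10Mbit/s"
    for number, token in flag_suspect_tokens(sample):
        print(f"line {number}: check '{token}'")
```

A check like this would not catch run-together words such as “Paraaprender” or a replaced accent mark; those would still need a dictionary lookup or a human reader.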

OCR quality seems to depend on the language. German came out nearly perfect every time,
more so than English. Spanish had a higher error rate than either of the other two languages.

About the Author

John Breeden II is a freelance technology writer for GCN.
