ReadIris stumbles with Spanish, technical translations
- By John Breeden II
- Sep 21, 1998
Pros and cons:
+ Scans many languages
- Error rate varies with language
- Can't handle multiple languages in a document; graphics and chart text handled poorly
Requirements: Win95, NT or Mac OS; scanner; CD-ROM drive for installation
Que hablas español? Tildes and other non-English punctuation marks can confuse
English-oriented optical character recognition packages.
ReadIris OCR from Image Recognition Integrated Systems claims to scan 54 languages,
including some fairly obscure ones, and move the text from paper into a word processing
file. In theory, this could free multilingual users from having to retype paper documents.
Most OCR software is pretty fickle about the documents it can recognize. Although users
often are advised to stick to first-generation, laser-printed text for best results, this
is one area where ReadIris stands out. It accepts most text equally, regardless of the
To give ReadIris OCR the benefit of the doubt, I did all my testing on laser-printed
documents. I don't have the expertise to judge its performance on all the supported
languages, such as Gaelic, Macedonian and Byelorussian, but I read Spanish fairly well and
managed to recruit help in German.
ReadIris on English documents produced impressive results, but there were odd mistakes.
In a technical document, for example, the software never managed to identify
"10Mbit/s" correctly, instead printing "I DO Mbit/s." That might be
humorous on a license plate but not elsewhere. The error occurred throughout, although
other tech terms were recognized correctly.
The only other negative in the English test was that ReadIris OCR could not read text
inside a chart or graphic. That could be acceptable if the OCR text dropped into the word
processor gave some kind of warning that an unreadable chart or graphic existed at that
location. But it doesn't, so the document's subsequent audience will not realize
something has been left out.
ReadIris performed less accurately on a Spanish document. The phrase "Para
aprender" ran together to form "Paraaprender." And something odd happened
to the word "Telemarketing," which changed twice to "Telemarketin2."
Minor errors also occurred in words such as "artículo," which lost the
accent mark over the i in favor of an English dot. For most words, the accent mark stayed,
but on a few it was replaced.
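Misreads like "Telemarketin2" can be flagged mechanically before a human proofreads the file. What follows is a minimal Python sketch, not a ReadIris feature. The helper name `suspicious_tokens` and its rule (a word that mixes letters and digits is suspect) are assumptions made for illustration.

```python
import re

# Runs of letters and digits, including accented letters such as "í" or "ñ".
_WORD = re.compile(r"[^\W_]+")

def suspicious_tokens(text):
    """Return words that contain both letters and digits, in order of appearance.

    Hypothetical helper: a digit embedded in a word is a common sign of an
    OCR misread, but it is only a hint, not proof of an error.
    """
    flagged = []
    for word in _WORD.findall(text):
        has_alpha = any(ch.isalpha() for ch in word)
        has_digit = any(ch.isdigit() for ch in word)
        if has_alpha and has_digit:
            flagged.append(word)
    return flagged
```

A legitimate term such as "10Mbit" would be flagged as well, so the result is best treated as a list of words for the proofreader to check, not as input for automatic correction.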
The software did better in German. Umlaut dots survived, and my German-speaking
colleague had no problem understanding the document after the OCR process. Looking at the
text, my colleague reported a match.
Multiple languages on the same page confused the program. This is more of an annoyance
than a flaw because the software interprets the unique punctuation of one language in
terms of the other. For example, German words that happen to be on the same page as
Spanish text undergoing OCR will be assigned Spanish symbols.
Users who handle a lot of paper documents written in other languages could benefit from
ReadIris if they ever need to modify the documents. The only difficulty I can see is that
some of the time saved by not having to retype would be lost in proofreading for mistakes.
OCR quality seems to depend on the language. German came out nearly perfect every time,
more so than English. Spanish had a higher error rate than either.
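The review reports no numeric error rates. One common way to put a number on comparisons like these is the character error rate: the edit distance between the original text and the OCR output, divided by the length of the original. The Python sketch below assumes a hand-typed reference copy of the page is available. The function names `edit_distance` and `char_error_rate` are illustrative, not part of any OCR package.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions,
    deletions or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def char_error_rate(reference, ocr_output):
    """Edit distance divided by reference length; 0.0 is a perfect match."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

For example, `char_error_rate("Para aprender", "Paraaprender")` counts the one dropped space against the 13 characters of the original phrase.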
John Breeden II is a freelance technology writer for GCN.