Speech translations ring true, almost
Of three packages tested, NaturallySpeaking did best, with an average accuracy rate of 91 percent
By John Breeden II
I had hoped to write this review of the leading speech recognition applications without ever touching finger to keyboard.
How great it would have been to just kick back in my chair and transfer my thoughts effortlessly onto the screen. Instead I sat in front of the keyboard, typing away and slowly developing carpal tunnel syndrome.
Dragon NaturallySpeaking came out on top in a review of leading speech recognition packages at their basic-training level.
Although recognition technology has greatly improved in the past couple of years, it's still not good enough for everyday use.
If you closely study how people communicate, you'll be surprised to learn how many holes in meaning are left open by speakers and interpolated by listeners. Humans, especially when they communicate in person, have a whole set of background data to fall back on when they come to a gap in the conversation. Even if a speaker does not specifically state something, the listener can often figure it out from the context.
A computer listens in a context vacuum. It has no social skills to fill in the blanks and must rely on interpreting the exact sounds. This is complicated by the fact that no one pronounces a word exactly the same way every time. Plus, the English language has lots of words that sound the same but have different meanings, such as red and read, or that are spelled differently based on context, such as to and too.
It's amazing, really, that speech recognition has advanced as far as it has.
I tested three products: Dragon Systems Inc.'s Dragon NaturallySpeaking Professional 4.0, IBM's ViaVoice Millennium Edition Pro, and Lernout & Hauspie Speech Products USA Inc.'s Voice Xpress Professional. I tried them on a powerful Pentium III system with 128M of RAM, hoping it would minimize the training time.
I completed each package's basic training and started processing speech as soon as the program said it was ready. Extra training might have improved the results, but I assigned grades based on how well the products performed after basic training.
Now hear this
Dragon NaturallySpeaking took up a scant 179M of storage, and its menus and choices were easy to comprehend. It did, however, have the longest training time of the three products: about 20 minutes to finish Level 1 basic training, more than twice as long as the fastest of the three, Voice Xpress. Training consisted of reading snippets from one of nine book choices.
The extra minutes of training did improve speech identification. When I started testing NaturallySpeaking, I initially thought my keyboardless fantasy was coming true. The program dutifully recorded my speech on the screen. But as I branched out to bigger (and, interestingly, sometimes smaller) words, chinks appeared in Dragon's armor. I found myself having to fix more and more words, sometimes one or two per sentence, though occasionally I could go for long stretches without touching the keyboard.
Nonetheless, the recognition was the best I have ever experienced at the first level of training. NaturallySpeaking averaged 91 percent accuracy.
Now, 91 percent is terrific but not quite good enough for everyday use. If you're a fast typist, you're probably going to find it faster to let your fingers fly over the keyboard than to have to go back and fix roughly one in 10 words.
Although NaturallySpeaking is slightly better than ViaVoice, the next most accurate product in the review, it costs about $400 more.
One feature that annoyed me was the automatic censoring of colorful language. When I read Gen. George S. Patton's famous D-Day speech aloud, NaturallySpeaking changed all the flamboyant general's salty terms. Even when I specifically trained it to recognize profanity, it still edited me most of the time and suggested similar-sounding alternatives. I wound up yelling at it, giving NaturallySpeaking a verbal tongue-lashing, though you would never know that based on what appeared on the screen. It was suitable for a PG-rated movie.
IBM ViaVoice finished a close second in reliability. Its installation, however, was huge, tipping the scales at 310M. I saw no drastic improvement for all that extra size.
The IBM program had the best headset and microphone combination. The headset was adjustable and quite comfortable, but vendors really should ship headset microphones with two earpieces. All the ones in this review had only one ear speaker.
Training ViaVoice was easy and enjoyable. A talking pencil guided me through and explained how speech recognition works. Unlike Microsoft's animated helper applications, the ViaVoice pencil is not annoying.
Training ViaVoice took me 12 minutes, and its voice recognition was almost as good as that of NaturallySpeaking. Perhaps the extra training time gave the Dragon product the edge. Even so, ViaVoice could understand 89 percent of what I was saying on average, and it imposed no censorship. What I said was what I got for the most part.
Lernout & Hauspie Voice Xpress Professional was a mixed bag. Although it did not say how much space it would take up, it was the smallest program in the review, at 176M.
Its microphone was basically a thin wire, uncomfortable to wear. I found it difficult to position on my head. Egos aside, I have a normal-sized head, and I can't see a user with a truly large cranium fitting this microphone on.
On the bright side, the Lernout & Hauspie product had an amazingly short training time. After only eight minutes, I was ready to dictate. Also, the package is the least expensive of the three and, with enough training, could probably perform as well as the others.
Dictation was not very accurate out of the gate, however. Voice Xpress Professional did better than in its previous incarnations, but I got only 82 percent correct recognition.
The Lernout & Hauspie product had some other good features, such as voice control of the entire desktop system. In practice, this worked very well. I could open and close files and do other computer tasks correctly almost 100 percent of the time. For this review, however, I based most of the grade on the correctness of speech recognition.
I chose Patton's D-Day speech as a test project because it was so colorful and because I thought its prose would challenge built-in pattern recognition engines.
Here are Patton's original remarks to the troops before they left for Normandy, with brackets around the unprintable words:
"I don't want to get any messages saying, 'I am holding my position.' We are not holding a goddamned thing. Let the Germans do that. We are advancing constantly, and we are not interested in holding onto anything, except the enemy's [expletive]. We are going to twist his [expletive] and kick the living [expletive] out of him all of the time. Our basic plan of operation is to advance and to keep on advancing regardless of whether we have to go over, under or through the enemy. We are going to go through him like [expletive] through a goose, like [expletive] through a tin horn!"
Dragon NaturallySpeaking's translation:
"I don't want to get any messages saying, 'I am holding my position.' We're not holding a God and name. Let the Germans do that. We are advancing constantly and not interested in holding onto anything, except the enemies [expletive]. We're going to twist [expletive] and take the living shakeout of him all of the time. Our basic plan of operations is to advance and keep on advancing regardless of whether we have to go over, under or through the enemy. We're going to go through him like rap the reduce; LightShip through tinhorn!"
IBM ViaVoice's translation:
"I don't want to get any messages saying, 'I am holding my position.' We're not holding a god damned thing. Let the Germans do that. We're advancing constantly and not interested in holding onto anything, except the enemies [expletive]. Prayer going to twist his boss and kicked the living [expletive] out of him all the time. Our basic plan of operation is to dance and to keep on a dancing regardless of whether we have to go over, under or through the enemy. We're going to go through him like Kraft through a goose; white sheet through tin horn!"
Voice Xpress Professional's translation:
"I don't want to get any messages saying, 'I'm holding my position.' We're not holding a god damn thing. What the Germans do that. We're dancing constantly and we're not interested in holding on to anything, except the enemies [expletive]. We went to twisted [expletive] and take the live aid should out of him all the time. Our basic plan of operation is to advance into keep on advancing regardless of whether we have to go over, under or through the enemy. We're going to get for him like [expletive] the abuse semicolon leg shipped through a tin horn explanation."
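The review does not spell out how the accuracy figures were scored. As a rough sketch only, one plausible way to arrive at a words-correct-per-100 number is to line up a program's output against the original text word by word and count the words that match in order. The Python below does that with the standard library's `difflib.SequenceMatcher`. The function names `tokenize` and `words_correct_per_100` are hypothetical, and the tokenizing rules (lowercasing, ignoring punctuation) are assumptions, not the method behind the grades in this review.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation.

    Assumption: apostrophes stay inside words, so "we're" is one token.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def words_correct_per_100(reference, hypothesis):
    """Return how many reference words per 100 appear, in order, in the output."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference text contains no words")
    # Align the two word lists; autojunk=False keeps frequent words
    # such as "the" from being ignored on long passages.
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref)


# One sentence from the speech and Voice Xpress Professional's version of it.
print(words_correct_per_100("Let the Germans do that.",
                            "What the Germans do that."))  # prints 80.0
```

A score like this counts only words that match the original exactly, so "We're" for "We are" is marked wrong even though a human reader would let it pass. A stricter or looser scoring rule would shift the numbers somewhat.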
For short basic training, the Lernout & Hauspie product is the best bet, though it sacrifices some accuracy for speed. It's also the least expensive.
Dragon NaturallySpeaking forces longer training. It has high accuracy at a high price. And ViaVoice occupies the middle of the road with medium-length training and medium-quality initial recognition.
|Three speech recognition apps learn fast|
|Product||Dragon NaturallySpeaking Professional 4.0||ViaVoice Millennium Edition Pro||Voice Xpress Professional|
|Vendor||Dragon Systems Inc.||IBM||Lernout & Hauspie Speech Products USA Inc.|
|Training time||19 minutes, 19 seconds||12 minutes, 13 seconds||8 minutes, 2 seconds|
|Average words correct per 100 words||91||89||82|
|Pros||+ Highly accurate||+ Excellent tutorial||+ Brief basic training time|
|Cons||- Censors language||- Large storage requirement||- Long load times when activating|