Speech recognition

Everyone's talking about it: With rising accuracy rates, these apps can best even a speedy typist

For 20 years developers have been promising that someday we would communicate with our computers so effortlessly through speech that the keyboard would become a secondary input device.

But until very recently, speech recognition has remained a tiny niche application at best and a pipe dream at worst.

Many companies working on personal speech recognition for PCs have disappeared in the past few years, leaving IBM Corp. and the new owner of Dragon Systems' technology, ScanSoft Inc. Of those, ScanSoft even markets the IBM ViaVoice line, so it appears Big Blue is getting out of this side of the business, leaving only one major player.

The other major markets for speech recognition are telephone response and Web applications.

This technology is used in two ways on PCs and Apple Macintosh computers.

One way is to issue commands to the system; this includes Web browser control. Because commands require a limited vocabulary, this has been a useful tool for some years.

The other, much more important application is dictation. Early in the PC revolution, only business majors and a few far-sighted college-bound students had taken typing in high school. This lack of familiarity with keyboards greatly reduced productivity among executives, managers and technical workers, which in turn led to a tremendous demand for speech recognition.

Today it's a rare college graduate who can't touch type, but only now is speech recognition becoming powerful and accurate enough to rival touch typing.

A quality microphone and a quiet workplace are key to acceptable speech recognition. Training would seem to be a critical step, but in fact only basic speaker-dependent training, lasting 10 minutes or so, is necessary. Extended training often is a waste of time.

And speech recognition is finding its way into the mainstream: many Microsoft Corp. and Apple Computer Inc. operating systems include basic voice recognition software.

I've tested voice recognition systems for more than 20 years and been disappointed every time, until now. I recently had the chance to test two of the latest applications: Dragon NaturallySpeaking 7 and the speech recognition software that comes with Microsoft Windows XP.

I ran NaturallySpeaking on a typical desktop PC, a 2-GHz Pentium 4 Dell Dimension 8200 with a Turtle Beach Voyetra sound card and 512M of RAM.

When it's good ...

I also tested speech recognition on a far less powerful computer, but one that really relies on this technology: a brand-new Gateway Celeron tablet PC. Because it doesn't have an attached keyboard to lug along, the tablet PC is the perfect platform for speech recognition, if it works.

I did the minimal training on the tablet PC, using the built-in microphone that is really intended for recording conversations. Amazingly, this produced nearly perfect continuous-dictation accuracy.

For a fairer comparison, I tried the headset microphone that came with the Dragon software and was able to dictate at a corrected rate of about 40 words per minute. That is only half my typing speed, but about double my handwriting recognition rate on the tablet PC.

I would rate the speech recognition that came with the XP version for tablet PCs as about the equivalent of a less-skilled touch typist, certainly as good as many executives and technical workers.

A tablet PC can be held like a clipboard while you write on the glass screen with a special pen; you don't need to sit or have a table to work.

Using the handwriting recognition and speech recognition software, while adding only a 3-ounce headset to the weight of the tablet, I felt confident I could tackle anything, from taking meeting notes to writing and editing large documents, nearly as quickly as I could on a conventional PC.

Using the pen for navigation and editing, and the microphone for text entry and commands, is a faster way to create and edit text than using a keyboard and mouse because you don't have to physically switch back and forth between the keyboard and mouse.

Because the two machines have very different processors and amounts of RAM, a speed comparison between the XP software on the tablet and Dragon NaturallySpeaking on the Dell system isn't completely fair, but an accuracy comparison is.

The professional-level Dragon programs ship with an Andrea Anti-Noise NC-61 microphone headset. It's a fantastic product, contributing greatly to the excellent accuracy I obtained in both tests.

I give a slight edge to the Dragon software in accuracy. The software that ships with XP is good enough for casual use, but anyone doing extensive writing or editing should consider buying the Dragon Professional, Medical or Legal Solutions package because of all the additional productivity tools.

Dragon offers the extra enticement of transcribing speech recorded on a digital tape recorder, as well as the ability to create macros and customize nearly every feature, which makes the high-end packages essential for office use.

In addition, XP's speech function only works with supported applications. This isn't a real problem because so many people use some version of Microsoft Office, but Dragon works with virtually any program that uses text fields.

Except in the quietest environment, using a microphone with noise cancellation technology is absolutely critical to the accurate translation of speech into text.

I dictated with a television news program running at normal levels in the background to simulate a busy office, yet the software was able to achieve perfect accuracy for several hundred words. I even tried placing the microphone next to the TV and it ignored virtually all of the TV speech and sounds, despite their being louder than my dictation.

Overall performance by both programs was certainly acceptable, even for a fast typist. My only caveat is that I've often obtained 92 percent to 95 percent accuracy rates with speech recognition software in the past, higher than many reviewers. So, although both of these programs were markedly better than earlier versions, individual users may not achieve the extremely high accuracy I did.

Being a very fast typist and turning out a massive amount of copy each week, I would require a speech recognition app with a minimum accuracy rate of 98 percent. Anything less and I might as well go back to my keyboard.
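The arithmetic behind an accuracy threshold like this can be sketched in a few lines. This is only an illustration under stated assumptions, not a measurement from the tests above: the function name `corrected_wpm`, the raw dictation rate of 120 words per minute and the 10 seconds needed to fix each misrecognized word are all hypothetical. Only the 80-words-per-minute typing speed follows from the figures in this review.

```python
def corrected_wpm(raw_wpm: float, accuracy: float, seconds_per_fix: float) -> float:
    """Estimate words per minute after correcting misrecognized words.

    raw_wpm         -- uncorrected dictation speed (assumed, not measured)
    accuracy        -- fraction of words recognized correctly, 0.0 to 1.0
    seconds_per_fix -- time spent repairing one wrong word (assumed)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    # Wrong words produced during one minute of dictation.
    errors_per_minute = raw_wpm * (1.0 - accuracy)
    # One minute of talking plus the time needed to fix those errors.
    total_seconds = 60.0 + errors_per_minute * seconds_per_fix
    return raw_wpm * 60.0 / total_seconds


if __name__ == "__main__":
    TYPING_WPM = 80.0  # twice the 40 wpm corrected dictation rate reported above
    for accuracy in (0.92, 0.95, 0.98):
        rate = corrected_wpm(raw_wpm=120.0, accuracy=accuracy, seconds_per_fix=10.0)
        verdict = "faster" if rate > TYPING_WPM else "slower"
        print(f"{accuracy:.0%} accurate: {rate:.0f} wpm corrected, {verdict} than typing")
```

With those assumed inputs, dictation only pulls ahead of an 80-words-per-minute typist at the 98 percent level; at 95 percent the time lost to corrections cancels out the speed of speaking. Different assumptions about correction time would move the break-even point.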

Of course, there has always been a market for even mediocre speech recognition among disabled workers and those with poor typing skills, but for the first time I am seriously considering relying on this technology for the majority of my text input.

Although the command mode works well, I still find it much more convenient to navigate both documents and the Web using a combination of keys and touch pad.

My conclusion is that with powerful low-end PCs, noise cancellation microphones and the latest software, speech recognition has finally come of age, especially if your typing skills are less than perfect.

Many 40-words-per-minute typists might improve their output using a combination of speech recognition for text input and a keyboard for navigation and editing. It's a lot easier on the hands, too, and should be of special interest in any office where repetitive stress injuries are a problem, either as a preventive measure or after an injury has occurred.

John McCormick is a free-lance writer and computer consultant. E-mail him at [email protected].

