Speech Recognition
Vocabularies, features vary in four voice recognition apps
It's not quite ready to take dictation, but the technology is accurate enough for other useful applications
By Kevin Jonah
Special to GCN
In 1968's '2001: A Space Odyssey,' computers exchanged information with people by talking. Now we're near the year 2000 in the real world, and plenty of people are talking to their computers, though it's often a one-way conversation, using words we shouldn't print. Voice recognition technology is still a long way from Arthur C. Clarke's HAL 9000.
Voice recognition software has nevertheless improved significantly, as have handwriting recognition, optical character recognition and other interfaces between people and machines. But those advances still aren't enough to get most people talking to their computers in kinder, gentler tones.
There are plenty of potential applications for voice recognition. But that's the problem: they're still potential applications, for the most part.
Keep it simple
The simplest ones work best. Common sense will tell you that systems that call up data in response to a few well-used verbal commands, in the form of numbers or one- or two-syllable phrases, are more likely to work than, say, a system that tries to transcribe the minutes of the House Ways and Means Committee.
The closest anyone has come to HAL, the disembodied, interactive, intelligent voice, is in technologies such as high-end telephony applications for customer service and specialized systems such as Microsoft's Auto PC. Auto PC apprehends human commands and responds with information in a synthesized voice.
Still, Auto PC is a Windows CE device with a specific (that is, small) set of functions. We're not only a long way from HAL, but that less-classic model for the voice-interface computer, the talking car in 'Knight Rider,' isn't around the corner, either.
Lernout & Hauspie's Voice Xpress, priced at $150, has a feature that accepts dictation from travelers using a notebook PC or handheld device.
IBM's ViaVoice includes a document analyzer for adding words. The Pro Millennium version costs $180.
Generally speaking, there are two modes of voice recognition: command and dictation.
Command mode is when a user says words linked to commands, which are then interpreted and executed. You might use command mode to navigate menus in applications or activate macros.
This is how voice recognition is used in Auto PC and the voice applications for phone systems. Each word is recognized independently and processed by the recognition software, then passed to the operating system or active application as a command, keystroke or predetermined set of commands.
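The command-mode flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the phrase table, the `dispatch` and `run_session` functions and the keystroke names are all hypothetical, and none of them come from Auto PC or any product reviewed here.

```python
# Hypothetical command table: spoken phrase -> action handed to the application.
# The phrases and keystrokes are invented for illustration only.
COMMANDS = {
    "open": ("keystroke", "Ctrl+O"),
    "save": ("keystroke", "Ctrl+S"),
    "print": ("keystroke", "Ctrl+P"),
    "sign off": ("macro", ["Ctrl+S", "Alt+F4"]),  # a predetermined set of commands
}


def dispatch(recognized_phrase):
    """Look up one recognized phrase and return the action to execute.

    Returns None for anything outside the small command vocabulary.
    """
    key = recognized_phrase.strip().lower()
    return COMMANDS.get(key)


def run_session(phrases):
    """Handle each phrase independently, skipping phrases that are not commands."""
    executed = []
    for phrase in phrases:
        action = dispatch(phrase)
        if action is not None:
            executed.append(action)
    return executed
```

Because the vocabulary is a short, fixed table, the recognizer only has to tell a handful of phrases apart, which is why this mode tends to be the more reliable of the two.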
In dictation mode, the computer transcribes what you say into a word processing document, e-mail or other application, adding appropriate punctuation and spelling the words based on context.
Dictation is a more difficult computing function than command mode. The software needs to know a larger set of words and to distinguish between words that sound alike but have different meanings (for example, two and to) from the context of the sentence. That's where most voice recognition software runs into trouble.
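To see why sound-alike words are hard, consider a toy version of choosing a spelling from context. This is a deliberately simplified sketch, not how any of the products reviewed here work: the `FOLLOWS` counts are invented, and the program looks only at the single preceding word.

```python
# Invented counts of how often each spelling follows a given previous word.
# These numbers are illustrative assumptions, not measurements from real text.
FOLLOWS = {
    ("want", "to"): 50, ("want", "two"): 1, ("want", "too"): 1,
    ("the", "two"): 30, ("the", "to"): 0, ("the", "too"): 0,
    ("me", "too"): 20, ("me", "to"): 8, ("me", "two"): 2,
}

# One heard sound can map to several spellings; "to" stands in for the sound.
HOMOPHONES = {"to": ["to", "two", "too"]}


def pick_spelling(previous_word, heard):
    """Choose the candidate spelling seen most often after previous_word."""
    candidates = HOMOPHONES.get(heard, [heard])
    return max(candidates, key=lambda w: FOLLOWS.get((previous_word, w), 0))


def transcribe(heard_words):
    """Spell each heard word using the word already written just before it."""
    written = []
    for word in heard_words:
        previous = written[-1] if written else ""
        written.append(pick_spelling(previous, word))
    return written
```

With these made-up counts, "want" followed by the sound comes out as "to," while "the" followed by the same sound comes out as "two." When the preceding word gives no hint, the sketch falls back on the first candidate, which is exactly the kind of guess that produces mangled dictation.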
Although great strides have been made in improving the recognition skills of desktop PC voice tools, those advances have increased the overall accuracy of voice recognition only slightly. And even an accuracy rate of 90 percent means that one word in 10 will be mangled, which is not exactly what you hope for when giving dictation.
Unless you have a very good reason to use voice recognition for dictation, or you don't mind typing in the corrections later, you're better off not doing it.
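The arithmetic behind that one-in-10 figure is easy to run for other accuracy rates. The sketch below assumes a 500-word memo, a length chosen purely for illustration, and treats every word as equally likely to be misrecognized.

```python
def expected_errors(word_count, accuracy):
    """Expected number of misrecognized words at a given per-word accuracy."""
    return word_count * (1.0 - accuracy)


# 500 words is an assumed memo length, not a figure from any vendor.
for accuracy in (0.90, 0.95, 0.98):
    errors = expected_errors(500, accuracy)
    print(f"{accuracy:.0%} accurate: about {round(errors)} corrections in 500 words")
```

At 90 percent, that memo needs about 50 corrections; even at 98 percent, it still needs about 10.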
Most voice recognition software does best at command mode, handling well-worn sets of verbal commands. Fortunately, this is the type of voice recognition that gets the most mileage in most voice recognition applications.
That's a good thing, because there isn't a great deal of competition in the desktop PC voice recognition market to stimulate the push for accuracy. There are three major competitors in this market: Dragon Systems Inc., IBM Corp. and Lernout & Hauspie.
Microsoft has made advancements with its voice recognition technology, Microsoft Speech API 4.0, but it's still a programmer's toolkit and not an integrated suite.
For a high level of accuracy in your voice recognition software, you'll need to invest time in teaching it to understand you.
Obedience school
|Tips for Buyers|
- Make sure you know what you're looking for. Do you need command mode or dictation?
- Use specialized software for enterprise-level tasks.
- Throw hardware at the problem. To get responsive voice recognition, you'll need a fast system.
- Adjust your expectations. Even products that have a 98 percent accuracy rate are going to make spectacularly bad mistakes sometimes.
- Remember, desktop means dedicated. Desktop PC voice recognition software works well only for the user who has trained it.
Every voice recognition tool for the desktop PC has to be trained. The primary user has to speak a set of words into the computer's microphone to calibrate the recognition software to his or her voice characteristics. The training process can take an hour or more to complete. Add that to the time spent on the initial configuration of the software and on troubleshooting, and you're looking at spending a full workday to get voice recognition up and running on just one machine.
That machine had better be fast, too. Most voice recognition tools have steep system requirements. Figure on needing at least a 300-MHz Pentium II or III to say anything more than 'See Dick, Jane and Spot.' Also plan to have at least 64M of RAM and plenty of hard-drive space to store all that voice training.
Command mode voice recognition requires additional steps. You have to associate keyboard strokes and application commands with spoken phrases. But you may have less work to do with common applications. Some commands for Microsoft Office and other desktop PC suites may be in the command set, especially if you buy a suite bundled with the voice recognition tool.
IBM packages a version of its ViaVoice with Lotus SmartSuite, and Dragon Systems does the same with Corel's WordPerfect suite. Some packages are tailored to specific requirements. Lernout & Hauspie's Now You're Talking on the Web is for use with Web browsers, for example.
No matter what package you choose, remember that patience is the biggest user requirement with voice recognition software. And be careful: if you don't watch what you say, your computer might ask, 'What are you doing, Dave?'
Kevin Jonah, a Maryland network manager and free-lance writer, writes about computer technology.
|Company||Product||Platforms||System requirements||How product learns voices||Telephone recognition||Special features||Price|
|Dragon Systems Inc.||Dragon Systems||Win9x, NT 4.0 (with Service Pack 3 or greater)||200-MHz Pentium MMX; 48M of RAM for Win9x (reduced training time with 300 MHz and 64M of RAM), 64M of RAM for NT 4.0||One initial script training; adjustments can be made during use||No||Is capable of Internet browsing, select-and-say editing, dictation playback and text-to-speech conversion; provides multiple-user support||$199|
|IBM Corp.||ViaVoice||Win9x, NT 4.0; Mac OS by year's end; toolkits available for Linux||233-MHz Pentium with 256K Level 2 cache; 48M of RAM for Win9x, 64M of RAM for NT 4.0; 310M available hard drive space||In initial training process and during use; Analyze My Document option adds words to base vocabulary||Offered through||Includes specialized vocabularies such as cuisine and legal; comes in Standard, Web and Pro editions||$180|
|Lernout & Hauspie||Voice Xpress||Win9x, NT 4.0||Pentium II or equivalent; 48M of RAM for Win9x, 64M of RAM for NT (with Service Pack 3); 200M hard drive space; Sound Blaster 16 or compatible sound card (unless using USB headset microphone); noise canceling headset included||Profile is created after user reads enrollment script for five to 10 minutes; editing feature allows adjustments||No||Includes specialized vocabularies, headset microphone and mobile dictation feature that lets users record into some portable devices; comes in Standard, Mobile and Advanced editions||$150|
|Lernout & Hauspie||Now You're Talking on the Web||Win9x, NT 4.0||200-MHz Pentium MMX or equivalent; 32M of RAM; 200M hard drive space; Sound Blaster 16 or 100 percent equivalent sound card (unless using USB microphone); noise canceling headset included||Same||No||Has talking links (say or click on link); generates a number for each link for navigation; translates to chat-room jargon ("by the way" becomes "BTW")||$50|