Talk to Me
Speech recognition apps boost accuracy and do more than take dictation
By Mark A. Kellner
Special to GCN
The idea of speech recognition conjures up the romantic notion of commanding a computer a la 'Star Trek' or the 'you talk, it types' image of early advertising for the software.
The other side of the subject may be less pleasant but in some respects more significant. Speech recognition software is often needed when typists and other frequent users of computer keyboards suffer repetitive stress injuries. When typing suddenly becomes painful, the need to use voice recognition software to control a computer and input copy becomes a reality.
Although some reports indicate that the number of RSI cases has fallen since the early 1990s, the medical clinic at the Massachusetts Institute of Technology reports seeing 300 patients a year complaining of RSI, suggesting that the problem is not going away.
In either scenario, the development and improvement of speech recognition software have been, and remain, important. Not only do many users look forward to a time when they can control their PCs with voice commands, but the underlying technology also matters for other uses.
With the Internet spreading to handheld telephones, and personal digital assistants moving in as extensions of the enterprise, speech recognition takes on a whole new aura. It also can become a factor for agencies trying to meet Section 508 requirements for making information technology accessible to people with disabilities.
Discrete and concrete
The rising importance of speech recognition comes at a time when the products are more powerful and more affordable.
'About 10 years ago, voice recognition and speech was a very expensive solution, because it was driven by hardware,' said Krishna Nathan, director of consumer voice recognition systems for IBM Voice Systems in West Palm Beach, Fla. 'That product was sold very specifically to certain businesses and [reseller] channels; it was a product that cost $1,300. The product was 'discrete' technology, where a user would have to pause between speaking each word.'
'Since then, desktop processing power has progressed dramatically, and the cost has come down a lot,' Nathan said. Now, speech recognition is 'all software. We've done away with hardware,' he said.
Today, the new frontiers for speech recognition lie, in part at least, in improving microphone technology. The leading companies in the field are either offering or planning products that work with 'array' microphones, which lie flat on a monitor or a user's desk, eliminating the need for a headset.
Another thing to look forward to later this year is the introduction of microphones that connect to portable PCs and desktop PCs via a Universal Serial Bus port. It is hoped these will deliver better sound quality and, thus, better speech recognition.
Even Linux is being included. IBM is introducing a Linux version of its ViaVoice speech recognition software, and both IBM Corp. and Lernout & Hauspie Inc. (L&H) offer software developer kits to put voice features into Linux applications.
What's more, Linux and other small-kernel operating systems could figure in bringing speech recognition to handheld platforms. Earlier this year, L&H demonstrated a prototype wireless device that has an Intel StrongARM processor and runs Linux. The prototype can check for, read and respond to e-mail by voice command and provide access to Web content.
There has never been a better time to think about using speech recognition software on the job. But, at a time of perhaps the greatest technical advances in speech recognition software, the market is being consolidated. Sort of.
Merge in Boston
On June 7, two leaders in the field became one: L&H, which has a good chunk of the market, finalized its acquisition of another speech software sultan, Dragon Systems Inc.
The move increases L&H's product line dramatically, boosts the merged company's market share, and leaves IBM Corp. as the other principal force in the marketplace.
For the next six to nine months, L&H and Dragon will operate with separate offices, Web sites and product lines. The next revision of Dragon's NaturallySpeaking software will bow this fall, and L&H just brought out Version 5.0 of Voice Xpress, its flagship product.
According to Bill DeStefanis, senior director of product management in L&H's PC Applications Division, L&H will 'keep [current] development plans in place' at both companies.
'Certainly for the next generation of products, we will start looking at potentially integrating the best of both applications,' he said.
The combination of the two companies is aimed beyond the desktop PC market, he said.
'If you look at speech more broadly than what's in the retail channel, some of the biggest opportunities are in embedded applications,' DeStefanis said. 'They [Dragon] have some of the best talent working on embedded technologies. The combination will allow us to bring products to market more quickly.'
The speech recognition software field may be the Rodney Dangerfield of applications: It gets no respect. At least, it has not in recent years. PC Data, a Reston, Va., company that tracks sales, had no speech recognition products in its top-20 sales list for April 2000.
Users of earlier versions of programs such as Dragon NaturallySpeaking and Voice Xpress often griped about the time it took to train a system to recognize the way a user spoke.
A lot of effort has gone into making the programs easier to start off with and use. According to DeStefanis, such improvements were by design.
'A new user can, in under 15 minutes, install, enroll [their voice] and have a successful experience with the software in that first half-hour exposure. As with any new technology, the first impressions are very important. People make up minds based on their first minutes, and we try to focus on that experience,' he said.
Beyond making it easier to use the software, makers are working to build greater intelligence into their speech recognition programs, IBM's Nathan said.
'A lot of what we're doing is focusing on the usability and productivity aspects,' Nathan said. 'We now get into the notion of browsing through the Internet, how you make searches less painful. There's a sister problem called natural language understanding, which is just what it says. Taking transcribed speech and acting on it. 'When's next flight to Albuquerque?' You want [the software] not to type but to have the computer act on it. That's the next step, because it makes interaction more natural.'
Lernout & Hauspie's Voice Xpress includes a menu of tools and tips for improving accuracy.
The new version of Voice Xpress displays some of the other advancements in speech recognition. The software offers significantly improved accuracy and usability, along with support for e-mail, Internet browsing and other applications. Also included is a technology called Nothing But Speech, which the company describes as a new disfluency filter. It removes the 'ahh' and 'umm' sounds that users make while speaking, which can increase errors in dictation.
Many of these improvements come from user feedback. Hank Pokigo, product manager for Voice Xpress 5, said other improvements are aimed at making it easier to navigate the software and computer with voice commands.
The new version of Voice Xpress, for example, includes 'a sample command screen that fights the 'blank screen syndrome,' ' Pokigo said.
Although users understood the concept of text dictation, they needed to think about how to issue commands. 'With sample commands,' he added, 'users have a list of the top 20 or 30 commands in front of them; it's helpful to have that hand-holding right with them.'
Along with customer research, Pokigo said, the company values the new generation of processors from Intel Corp. and Advanced Micro Devices Inc. of Sunnyvale, Calif.
'We work directly with Intel and AMD to make sure we're keeping up to speed,' Pokigo said. 'It's very good for speech in general that AMD and Intel have pushed the processor. Pentium III instruction sets directly enhance the ability of speech recognition to get things done faster; because we can do things faster, it increases accuracy.'
IBM's Nathan said: 'The key thing with speech is it's come a long way. It is worth giving it a try. Second is that the nature of its use is evolving, and the applications are evolving. We used to be all about dictation on the desktop, but now there's also telephony, Internet and mobile applications.'
Mark A. Kellner is a free-lance technology writer in Marina Del Rey, Calif. Contact him via e-mail at firstname.lastname@example.org.