Telephony products are a natural fit for speech recognition
Voice-activated systems save time and confusion, and they work reliably within a narrow framework
By Kevin McCaney
Placing a call to a government or corporate office can be like entering a digital nest of Chinese boxes. You "select from the following menu options" in each box to get to the menu in the next box, and so on, either into infinity or to where you want to go, whichever comes first. It's the price you pay for automated, 24-hour service.
But advances in speech recognition technology, along with lower computing costs and faster connections, have produced new automated ways around the problem that let a caller take a direct approach.
Systems give telephony a new voice
Bell Atlantic Corp.
Connect@once is a customized, scalable internal directory system that uses voice prompts to make calls, get directory listings, and place and retrieve messages. Internal users connect through an access code such as *0. Available as a service or as a complete system running on a Unix server.
Dragon Systems Inc.
NaturallySpeaking Call Center Edition integrates simultaneous use of the computer and telephone and allows operators to process data through voice commands, dictating directly to Microsoft Word or Corel WordPerfect. The system can create macros and has a total vocabulary of 230,000 words, with active vocabularies of 30,000, 45,000 or 55,000 words. Runs under Microsoft Windows 9x and NT.
IBM Corp.
ViaVoice Directory Dialer provides automated directory assistance and call routing through a centralized call center. Directory can accommodate up to 250,000 names, 20,000 in one location. Runs on an IBM Netfinity server with 400-MHz Pentium II processor under NT 4.0.
Nortel Networks Corp.
Call Pilot merges voice, fax and e-mail messages into a single interface, managed by Lotus Notes or Microsoft Exchange and Outlook and activated by voice commands. It also supports e-mail interfaces with other messaging systems, including Netscape Messenger and Qualcomm Eudora Pro. Includes Application Builder. Runs under Windows 9x and NT.
Nuance Communications Inc.
Menlo Park, Calif.
Nuance 6 is a speech recognition package that compensates for a range of accents, languages and devices with integrated speaker verification. The scalable system supports standalone or network configurations, uses Java and ActiveX application programming interfaces for application development, and runs under NT, SunSoft Solaris, AIX, Digital Unix and SCO UnixWare. Nuance Express is a lower-cost, entry-level version of Nuance 6.
SpeechWorks International Inc.
SpeechWorks 5.0 allows the design and deployment of voice-activated systems for transactions or data retrieval over the telephone. Has a built-in vocabulary of 100,000 words plus an editor and a feedback loop for continuous upgrades. Runs under NT, Unix and OS/2. SpeechSite is a telephony product modeled on the Web, using a single, voice-activated interface to route calls, access information and conduct transactions.
Telephony products and services with natural language interfaces and database links not only save the caller a lot of button-pushing but also can cut the time it takes to reach a destination while reducing the burden on operators. Speech recognition also goes beyond directory and dialing services to let users leave and retrieve messages, conduct transactions and obtain data, working in a way similar to a Web site.
High accuracy
Most speech recognition telephony products come with a sizable built-in vocabulary and an editor that allows for the addition of commonly used words. Many can work in several languages, from European languages to Mandarin, and will adjust to accents and lexicons, with accuracy rates that have increased greatly in the past several years. Practically every maker of speech recognition telephony products claims accuracy rates of around 98 percent.
A driving force behind the development of the technology has been the Internet, which has raised people's expectations about the availability of information and services. SpeechWorks International Inc.'s SpeechSite, for instance, is modeled on a Web interface and is designed to provide over the phone the same kinds of data and transactions as users get over the Web, from product information to personnel listings to directions from the nearest airport.
A common feature among the products is natural speech. The caller can speak in full sentences, and the software picks out and responds to key words; the voice on the other end, meanwhile, sounds more like a person and less like a recording. Another common feature is greater speed.
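The key-word approach can be sketched in a few lines of Python. This is only an illustration, not any vendor's method: it assumes the recognizer has already turned the caller's sentence into text, and the routing table and the function name spot_keywords are hypothetical.

```python
import re

# Hypothetical routing table (key word -> destination); not from any product.
ROUTES = {
    "passport": "Passport Services",
    "benefits": "Benefits Office",
    "directory": "Directory Assistance",
}

def spot_keywords(utterance):
    """Return the destinations for known key words, in the order spoken."""
    # Lowercase the sentence and split it into plain words.
    words = re.findall(r"[a-z']+", utterance.lower())
    # Keep only the words the system knows how to act on; ignore the rest.
    return [ROUTES[word] for word in words if word in ROUTES]

if __name__ == "__main__":
    print(spot_keywords("Hi, I need to renew my passport, please"))
```

The caller's filler words ("Hi," "I need to," "please") are simply ignored, which is what lets a caller talk naturally instead of reciting commands.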
Directories such as Bell Atlantic Corp.'s Connect@once and IBM's ViaVoice Directory Dialer eliminate the Chinese-box scenario by linking an initial interface with a database of names and services.
"A speech interface has an advantage over Touch-Tone" because it flattens the menus and accelerates response time, said Alex McAllister, manager of technology development for Bell Atlantic Federal. Say the name of the person, department or service you want to reach and a voice will repeat it before Connect@once dials the number. If the voice recognition software misinterprets the name, which can happen with a large directory and a difficult name, a pause before the system dials lets you restate the name until it gets it right. Even if it takes a couple of tries, the system is considerably quicker than taking a Touch-Tone path.
In parameters
Although mistakes occasionally happen, speech recognition telephony products are largely reliable because, unlike dictation products, they use command functions within a well-defined framework, McAllister said.
The base technology has arrived in the form of accurate speech recognition, he said. And as long as an agency has clearly defined what it wants a system to do, interacting with a database is relatively easy; Connect@once, for example, uses standard Structured Query Language queries.
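To give a rough idea of what such a lookup might involve, here is a minimal Python sketch using the standard sqlite3 module. It is not Connect@once's actual design: the table layout, the sample names and extensions, and the fallback to an operator are all assumptions made for illustration.

```python
import sqlite3

def build_directory():
    """Create a small in-memory directory (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE directory (name TEXT, department TEXT, extension TEXT)"
    )
    conn.executemany(
        "INSERT INTO directory VALUES (?, ?, ?)",
        [
            ("Pat Jones", "Procurement", "4102"),
            ("Lee Smith", "Personnel", "4177"),
        ],
    )
    return conn

def lookup(conn, spoken_name):
    """Run a standard SQL query for the name the recognizer heard."""
    return conn.execute(
        "SELECT name, extension FROM directory WHERE lower(name) = lower(?)",
        (spoken_name.strip(),),
    ).fetchone()

def place_call(conn, attempts):
    """Try each recognized name in turn, as when a caller restates a name."""
    for heard in attempts:
        match = lookup(conn, heard)
        if match:
            name, extension = match
            return f"Dialing {name} at extension {extension}"
    # Assumed fallback; the article does not say what happens here.
    return "Transferring to an operator"

if __name__ == "__main__":
    directory = build_directory()
    # First attempt is misheard, second attempt matches.
    print(place_call(directory, ["Pat Johns", "Pat Jones"]))
```

The point of the sketch is that the speech engine only has to produce a name; everything after that is an ordinary parameterized database query.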
Voice-activated telephony products typically reside on a dedicated server, often alongside a Web server, with links to databases.
Most of the systems listed will run under Microsoft Windows NT as well as several other operating systems, including SunSoft Solaris, IBM OS/2, AIX and other flavors of Unix. And most are built for scalability. A system such as Nuance Communications Inc.'s Nuance 6 is scalable from standalone to network configurations. It uses Java and ActiveX application programming interface tools for application development.
The costs of deploying a speech recognition telephony system can vary widely, depending on the size of the system and what it is designed to do. SpeechWorks' SpeechWorks 5.0, for instance, ranges from $500 to $1,500 per port; the company's SpeechSite system ranges from $50,000 to $150,000.
Speech recognition vendors are presenting their telephony products as a complement, rather than competition, to Web sites, as well as a natural extension of mobile computing networks and a tool for handicapped users. Vendors have gained mostly corporate customers and are trying to gain a foothold in the federal market.
Number, please
If nothing else, the systems save time both for callers and the operators who often are a quick second choice for people baffled or frustrated by Touch-Tone menus. And the systems could help make the telephone the next outlet for Internet commerce.