What's that you say? Language researchers can't quite say yet

Accurate machine capture of content from spoken and text language is proceeding slowly under the National Institute of Standards and Technology's four-year-old Automatic Content Extraction program.

'The improvements will never end,' NIST's Speech Group noted wryly in a September 2004 update of its ACE test scripts for language researchers.

'NIST coordinates the evaluation of precommercial technologies and tabulates the scores,' Speech Group analyst Mark Przybocki said today. 'We do not endorse any of them; we help the researchers improve their algorithms for processing text languages.'

Intelligence agencies have been searching for tools to translate and capture meaning from foreign languages such as Arabic. ClearForest Corp. of Boston late last month announced that it ranked highest in NIST's latest scores for correct recognition in Arabic documents naming persons, organizations, geopolitical entities, locations, facilities, weapons and vehicles.

ClearForest vice president Rafi Vasserman said the software takes 'a rules-based approach to semantic tagging and extensive pattern matching.'

A graph of NIST's efforts since 1987 to benchmark the correct recognition of Arabic conversation and broadcast speech shows several algorithms now at or near the 50 percent mark.

NIST noted in its evaluations of speech recognition technology that 'huge efforts have been expended in mining information in news wires, broadcasts and conversational speech, and in developing interfaces to metadata extracted in these domains.' But the agency added that 'little has been done in the more challenging and equally important' domains of judicial and legislative proceedings, and other kinds of meetings involving multiple speakers and transcriptions.
