What's that you say? Language researchers can't quite say yet

Accurate machine capture of content from spoken and text language is proceeding slowly under the National Institute of Standards and Technology's four-year-old Automatic Content Extraction program.

'The improvements will never end,' NIST's Speech Group noted wryly in a September 2004 update of its ACE test scripts for language researchers.

'NIST coordinates the evaluation of precommercial technologies and tabulates the scores,' Speech Group analyst Mark Przybocki said today. 'We do not endorse any of them; we help the researchers improve their algorithms for processing text languages.'

Intelligence agencies have been searching for tools to translate and capture meaning from foreign languages such as Arabic. ClearForest Corp. of Boston late last month announced that it ranked highest in NIST's latest scores for correctly recognizing, in Arabic documents, the names of persons, organizations, geopolitical entities, locations, facilities, weapons and vehicles.

ClearForest vice president Rafi Vasserman said the software takes 'a rules-based approach to semantic tagging and extensive pattern matching.'

A graph of NIST's efforts since 1987 to benchmark the correct recognition of Arabic conversation and broadcast speech shows several algorithms now at or near the 50 percent mark.

NIST noted in its evaluations of speech recognition technology that 'huge efforts have been expended in mining information in news wires, broadcasts and conversational speech, and in developing interfaces to metadata extracted in these domains.' But the agency added that 'little has been done in the more challenging and equally important' domains of judicial and legislative proceedings, and other kinds of meetings involving multiple speakers and transcriptions.
