What's in a name?

Agencies that need help mining names from text files, such as watch lists or unstructured documents, are constantly on the lookout for better software. Jack Hermansen, CEO of Language Analysis Systems Inc. of Herndon, Va., told GCN he knew of a study in the intelligence community where agencies compared various named-entity extraction tools and found only a 20 percent overlap. 'That means they agreed only one-fifth of the time,' Hermansen said. At least one program thought 'General Confusion' was actually a person.

LAS specializes in name matching and has compiled the Name Reference Library, a database of more than a billion names. The company has also developed data-mining tools that can recognize that 'Confusion' is not a surname, or that names in foreign languages can be represented several ways when transliterated into Roman characters.
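To illustrate why one foreign name can surface under several Roman-character spellings, here is a minimal sketch that groups spellings by a crude key. It is not LAS's method, which the article does not describe; the function names `romanization_key` and `group_variants` and the vowel-dropping rule are assumptions made purely for illustration.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def romanization_key(name: str) -> str:
    """Build a crude key: keep the first letter, drop later vowels,
    and collapse doubled letters. Illustrative only."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    # Keep the first letter, then only the consonants that follow it.
    kept = [letters[0]] + [ch for ch in letters[1:] if ch not in VOWELS]
    # Collapse runs of the same letter ("mm" becomes "m").
    collapsed = [kept[0]]
    for ch in kept[1:]:
        if ch != collapsed[-1]:
            collapsed.append(ch)
    return "".join(collapsed)

def group_variants(names):
    """Group spellings that share the same crude key."""
    groups = defaultdict(list)
    for name in names:
        groups[romanization_key(name)].append(name)
    return dict(groups)
```

With this key, 'Mohammed', 'Muhammad' and 'Mohamed' fall into one group while 'Hermansen' stays apart. A rule this simple would also merge unrelated names and miss variants that differ in their first letter, which is one reason a large reference library of real names is valuable.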

LAS plans to release several new products to help agencies work with names. Among the first and most important is the Named-Entity Extraction Enhancement tool. The tool works with other extraction tools, such as Lockheed Martin's AeroText, as a filter for cleaning out the General Confusions of the world. NameTransliterator is designed to search for names across multilingual databases. And NameInspector analyzes an organization's data file of name information to assess its quality and help identify problems.
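The filtering idea can be sketched in a few lines: take the candidate names another extraction tool produces and keep only those whose final token appears in a list of known surnames. This is a hypothetical illustration, not the Named-Entity Extraction Enhancement tool itself; the tiny `KNOWN_SURNAMES` and `TITLES` sets and both function names are assumptions standing in for a much larger reference library.

```python
# Hypothetical reference data; a real system would consult a far larger name library.
KNOWN_SURNAMES = {"hermansen", "smith", "patton", "garcia"}
TITLES = {"general", "gen.", "dr.", "mr.", "ms.", "col."}

def looks_like_person(candidate: str) -> bool:
    """Return True if the candidate's last token is a known surname."""
    tokens = candidate.lower().split()
    # Strip leading titles such as "General" before checking the name itself.
    while tokens and tokens[0] in TITLES:
        tokens = tokens[1:]
    if not tokens:
        return False
    return tokens[-1] in KNOWN_SURNAMES

def filter_person_candidates(candidates):
    """Keep only the candidates that pass the surname check, in order."""
    return [c for c in candidates if looks_like_person(c)]
```

Given the candidates 'General Confusion', 'General Patton' and 'Jack Hermansen', this sketch keeps the last two and discards the first, because 'confusion' is not in the surname list.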

'You don't hear much about garbage-in, garbage-out anymore, but that's not because it went away,' Hermansen said. As name data moves from database to database, 'it gets hammered,' he explained. 'We've created our own virus by the way we share data.'
