What's in a name? Analysis tool finds out
- By William Jackson
- Nov 03, 2004
Names, the most common method of identification, can be amazingly imprecise. Even familiar English names can be complicated by nicknames, middle names, initials, prefixes and suffixes. When foreign cultures and alphabets are involved, names become very hard to classify.
The New England State Police Information Network, one of six regional Justice Department-funded information sharing centers for federal, state and local law enforcement agencies, has found a tool that helps: the Name Reference Library from Language Analysis Systems Inc. of Herndon, Va.
Rick Flood, the network's training coordinator, recalled his days on the police force of a small Rhode Island town with a large Asian population.
'Often, when they gave us names, we couldn't tell if they were giving a proper name or not,' Flood said. 'You want to know who you're speaking to.'
The interactive Name Reference Library contains culture-specific information about nearly 1 billion names.
Besides helping English-speaking police deal with Asian street gangs, the library explains multiple variations of Middle Eastern names, which is useful to a network that also participates in the Antiterrorism Information Exchange.
'In this day and age, we decided this was something important for us to have,' NESPIN director Bill Deyermond said.
The Name Reference Library is a decision support system, not a decision-making system, said Language Analysis Systems CEO Jack Hermansen. 'It is most used in the intelligence community.'
The library does not supply information about individuals bearing a name, but only about the name itself. There are three components: a database of 800 million names from around the world, compressed for fast searching; a name parser that distinguishes components such as surnames, given names and titles such as Mr.; and a search algorithm to identify likely variations of a name.
The results indicate what nationality a name probably represents, the different parts of the name, the gender, and common variations in spelling and structure.
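To make the parsing step concrete, here is a minimal sketch of splitting a name into a title, given names, a surname and a suffix. It is not the Name Reference Library's parser, whose internals the article does not describe. The `ParsedName` class, the `parse_name` function and the short `TITLES` and `SUFFIXES` lists are hypothetical, and the sketch assumes a Western "given name first" ordering.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, deliberately tiny lookup tables (assumptions for illustration).
TITLES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii"}


@dataclass
class ParsedName:
    title: str = ""
    given: List[str] = field(default_factory=list)
    surname: str = ""
    suffix: str = ""


def parse_name(raw: str) -> ParsedName:
    """Split a name, assuming the order: title, given names, surname, suffix."""
    tokens = raw.replace(",", " ").split()
    parsed = ParsedName()
    # A leading token such as "Mr." is treated as a title.
    if tokens and tokens[0].rstrip(".").lower() in TITLES:
        parsed.title = tokens.pop(0)
    # A trailing token such as "Jr." is treated as a suffix.
    if tokens and tokens[-1].rstrip(".").lower() in SUFFIXES:
        parsed.suffix = tokens.pop()
    # The last remaining token is taken as the surname (Western order only).
    if tokens:
        parsed.surname = tokens.pop()
    parsed.given = tokens
    return parsed


result = parse_name("Mr. John Q. Public Jr.")
print(result)
```

A rule this simple mislabels names from cultures that put the family name first, which is the kind of case where culture-specific reference data would be needed.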
Finding spelling variations is especially valuable, said Jeff David, deputy director of the Combating Terrorism Technology Support Office.
CTTSO's Technical Support Working Group partly funded development of the library. The multiagency working group supports research and development, overseen primarily by the Defense and State departments.
'The library grew out of some work we did years ago for agencies that kept various watch lists,' David said. 'Back in the 1980s, if you didn't spell a name exactly right, you wouldn't come up with a match.'
CTTSO began looking at ways to use software techniques such as fuzzy logic to make better name matches across cultures.
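The difference between exact and approximate matching can be illustrated with a short sketch. It uses plain string similarity from Python's standard `difflib` module. That is neither fuzzy logic in the formal sense nor the technique CTTSO or Language Analysis Systems adopted. The `approximate_matches` function, the sample watch list and the 0.75 cutoff are assumptions made for illustration.

```python
from difflib import get_close_matches


def approximate_matches(query, watch_list, cutoff=0.75):
    """Return watch-list names whose spelling is similar to the query."""
    # Compare in lower case, but report the names as originally spelled.
    by_key = {name.lower(): name for name in watch_list}
    hits = get_close_matches(query.lower(), list(by_key), n=5, cutoff=cutoff)
    return [by_key[key] for key in hits]


# Invented sample data, not drawn from any real list.
watch_list = ["Muhammad", "Mohammed", "Smith"]
query = "Mohammad"

print(query in watch_list)                     # exact lookup misses the variants
print(approximate_matches(query, watch_list))  # similarity search finds them
```

The cutoff controls the trade-off: a lower value catches more spelling variants but also returns more unrelated names.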
At that time, the primary tool for matching names was a key system called Soundex developed to process 1890 census data. Soundex is a coded surname index based on the sounds rather than spelling of names. Each code consists of the first letter of the surname and three numerals from 0 through 6, with 0 used only as padding for short names.
Because names that sound alike, such as Smith and Smyth, have the same code, Soundex can handle some variations in spelling, and parts of it still are in use more than a century later.
But Soundex cannot deal effectively with many non-Anglo names. The Name Reference Library uses statistical analysis to replace key indexing.
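A short sketch of Soundex coding shows both its strength and its limits. It follows the commonly published American Soundex rules: keep the first letter, map the remaining consonants to digits 1 through 6, collapse repeated codes and pad with zeros. The `soundex` function and `CODES` table are illustrative names, not taken from any product mentioned here.

```python
# Consonant groups and their Soundex digits; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(surname: str) -> str:
    """Return the four-character Soundex code, or "" if there are no letters."""
    letters = [c for c in surname.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))      # S530 S530
print(soundex("Qaddafi"), soundex("Gaddafi"))  # Q310 G310
```

Because the code always begins with the surname's first letter, transliterations that start with different letters, such as Qaddafi and Gaddafi, never share a key even though the rest of the code matches.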
For 18 years, Language Analysis Systems had been a government consulting firm, working mostly on classified intelligence projects. The first version of its library came out in 2000, and 'by 2002 all of our revenue came from product sales,' Hermansen said. 'It has taken us 20 years to collect the names database.'
Organizations with large lists of names generally do not divulge them. The company persuaded some large list-holders that it did not need complete names, just unassociated lists of surnames and given names, along with gender and country of origin. Analysts then added naming structure and nicknames.
Language Analysis Systems recently released Version 2.3 of the library with more information about Middle Eastern names. 'It's an evolving product,' David said. 'What they have right now is useful.'
But the working group measures success not by a product's functionality, but by how widely the product is used.
'I know it hasn't been adopted by as many people as we would like,' David said, but he is unsure whether the problem lies in the product or in public awareness.
William Jackson is a freelance writer and the author of the CyberEye blog.