- By Patricia Daukantas
- Sep 18, 2002
Unified language keeps computers in the loop
Carolyn B. Tilley, head of the MEDLARS Management Section at the National Library of Medicine, says many hospitals, health care billing systems, libraries and databases use vocabularies and concepts from the Unified Medical Language System.
Henrik G. DeGyor
For 16 years, the National Library of Medicine has been trying to solve the biggest interface problem between computers and medicine: vocabulary.
A computer doesn't necessarily know whether a heart attack is the same thing as a myocardial infarction. Does 'cold' mean the common cold or chronic obstructive lung disease?
Although NLM's Unified Medical Language System has long operated under the public radar, the connections it makes are helping workers at all health care levels, said Carolyn B. Tilley, head of NLM's Medical Literature Analysis and Retrieval System Management Section.
The Unified Medical Language System got its start in 1986, when NLM director Donald A.B. Lindberg asked Congress for funds because the vocabulary problem was hindering use of computers in medicine.
The same vocabulary roadblock exists in many other areas besides medicine, Tilley said. There are often different names for the same thing; for example, TV and television. In medicine, hypertension is the same thing as high blood pressure.
Conversely, the same word or phrase can mean vastly different things. For example, television could mean a single receiver or the entire industry. An internist could use the word ventilation to mean respiration, as in breathing, while an occupational-health specialist might use ventilation to mean the flow of fresh air within a building.
Human minds can grasp the different connotations, but disparate computer systems and databases need help to decipher the linguistic nuances.
Collected vocabularies
In the first few years of the unified-language project, NLM hired a number of universities and other organizations, through task-order research contracts, to define possible solutions and then build them.
Apelon Inc. of Ridgefield, Conn., formerly Lexical Technologies Inc. of Alameda, Calif., was one of the original grantees and recently signed a five-year, $14.9 million contract to continue working on the unified medical language.
At first, researchers wrestled with whether they should create a single large medical vocabulary, Tilley said. Because it would have been such an immense job to create and maintain a single vocabulary, NLM decided, 'Why invent something when you've already got it?'
So they collected vocabularies that already existed. The first product, in 1990, was the Metathesaurus.
'It's the sum of its parts, a database of databases, if you will,' Tilley said.
The first edition had just eight vocabularies. It now has 95, 65 of them in English. Within those vocabularies are 871,584 concepts and 2.1 million concept names.
All synonyms map to a unique identifying number. For each term, the researchers chose one word or phrase as the primary concept name (for example, hypertension). All other synonyms, such as high blood pressure, map to the primary concept name.
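The mapping described above can be pictured with a short Python sketch. It is only an illustration, not the Metathesaurus file format: the identifiers (X0001 and so on), the table layout and the function names are all assumptions made up for this example. It shows synonyms resolving to one identifier and primary name, and an ambiguous term such as 'cold' resolving to more than one concept.

```python
from collections import defaultdict

# Hypothetical concept records: identifier -> (primary name, other synonyms).
# The identifiers are invented for illustration; they are not real
# Metathesaurus identifiers.
CONCEPTS = {
    "X0001": ("hypertension", ["high blood pressure"]),
    "X0002": ("myocardial infarction", ["heart attack"]),
    "X0003": ("common cold", ["cold"]),
    "X0004": ("chronic obstructive lung disease", ["cold"]),
}


def build_index(concepts):
    """Map every lowercased name to the set of concept identifiers it names."""
    index = defaultdict(set)
    for concept_id, (primary, synonyms) in concepts.items():
        for name in [primary, *synonyms]:
            index[name.lower()].add(concept_id)
    return index


INDEX = build_index(CONCEPTS)


def lookup(term):
    """Return a sorted list of (identifier, primary name) pairs for a term."""
    ids = INDEX.get(term.lower(), ())
    return sorted((cid, CONCEPTS[cid][0]) for cid in ids)


if __name__ == "__main__":
    print(lookup("High blood pressure"))  # one concept: hypertension
    print(lookup("cold"))                 # two candidate concepts
```

Returning every matching concept, rather than picking one, leaves the choice between meanings to the calling system, which usually has more context.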
Today the Metathesaurus includes the American Medical Association's Current Procedural Terminology, commonly used in medical billing; the College of American Pathologists' Systematized Nomenclature of Medicine; and the vocabulary of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental and Behavioral Disorders.
Other Metathesaurus vocabularies include the National Drug File from the Veterans Health Administration, the National Cancer Institute Thesaurus, the University of Washington's Digital Anatomist, Britain's Read Clinical Classification and NLM's own Medical Subject Headings.
Most of the vocabularies have abbreviations and acronyms as synonyms. But surprisingly, ER doesn't stand for emergency room in any of the 90 vocabularies, 'even though you and I use it all the time,' Tilley said.
A search of the Metathesaurus yields only endoplasmic reticulum and estrogen receptor as synonyms for ER. The concept of 'emergency room' is in there, though, along with the British phrase 'accident and emergency department.'
Five of the Metathesaurus vocabularies have been selected as standards for medical data code sets under the administrative simplification provisions of the Health Insurance Portability and Accountability Act of 1996.
Hierarchy of terms
Besides the Metathesaurus, the Unified Medical Language System has two other so-called knowledge sources: the Semantic Network and the Specialist Lexicon.
The Semantic Network consists of 134 high-level terms for grouping concepts, Tilley said. The semantic types are organized in a parent-and-child hierarchy. For example, the term 'biologic function' has two children, physiologic function and pathologic function, plus numerous grandchildren and great-grandchildren.
The network also defines 54 relationships among the semantic types, and the relationships are grouped hierarchically as well. The parent term 'affects' has such children as 'manages,' 'treats,' 'complicates' and 'prevents.'
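A parent-and-child hierarchy of this kind is easy to model. The Python sketch below uses only the handful of types and relationships named above; the dictionary layout and the function names are assumptions for illustration and do not reflect how NLM actually stores the Semantic Network.

```python
# Child -> parent links, limited to the examples mentioned in the article.
TYPE_PARENT = {
    "physiologic function": "biologic function",
    "pathologic function": "biologic function",
}

RELATION_PARENT = {
    "manages": "affects",
    "treats": "affects",
    "complicates": "affects",
    "prevents": "affects",
}


def ancestors(term, parent_of):
    """Walk up the hierarchy and return the ancestors, nearest first."""
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def is_a(term, ancestor, parent_of):
    """True if term equals ancestor or sits anywhere beneath it."""
    return term == ancestor or ancestor in ancestors(term, parent_of)


if __name__ == "__main__":
    print(ancestors("pathologic function", TYPE_PARENT))
    print(is_a("treats", "affects", RELATION_PARENT))
```

With such a structure, a query for anything that 'affects' a condition could also pick up statements recorded with the narrower relationship 'treats'.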
The third component, which was developed later than the other two knowledge bases, is the Specialist Lexicon. It and several related programs are widely used in natural language processing, Tilley said.
The Specialist Lexicon relates words by their parts of speech and their rules for inflection, plurals and so forth.
'Specialist Lexicon is useful when you're going through medical records or doing a Web search,' Tilley said. Specialist Lexicon's rules can normalize words; for example, a search for 'diabetic children' would be the same search as 'diabetes AND child.'
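The normalization idea can be sketched in a few lines of Python. This is a toy, not the Specialist Lexicon or its companion programs: the two-entry lookup table and the function name are assumptions invented for the example, whereas the real lexicon draws on part-of-speech and inflection rules for a very large vocabulary.

```python
# Hypothetical table of inflected or derived forms and their base forms.
BASE_FORMS = {
    "diabetic": "diabetes",
    "children": "child",
}


def normalize_query(query):
    """Lowercase each word, swap in its base form, and join with AND."""
    words = query.lower().split()
    return " AND ".join(BASE_FORMS.get(word, word) for word in words)


if __name__ == "__main__":
    print(normalize_query("diabetic children"))
```

Because both phrasings reduce to the same normalized string, a search system can treat them as one query.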
These three knowledge sources are 'huge, honking ASCII files' designed for systems developers, Tilley said.
'It's not rocket science, but it's very complex,' she said.
NLM distributes all three knowledge sources free to users who sign a license agreement.
Metathesaurus users need individual licenses for some of the individual source vocabularies, however. About half of the vocabularies are free, but some require end users to notify or request permission if they plan to translate one into another language or incorporate it into a computer system.
Currently there are about 1,500 licensees, but only about one-fifth of them request the data on CD-ROM, Tilley said. Most choose a password and download the files they want from NLM's Knowledge Source Server, at umlsks.nlm.nih.gov.
Software programs written for the Unified Medical Language System include MetamorphoSys, a Java downloading tool.
'Most people run [MetamorphoSys] to create their own local metathesaurus that's somewhat smaller,' Tilley said.
Many NLM systems, including the Web site www.clinicaltrials.gov, use the unified language behind the scenes, but most licensees are universities, libraries and other private organizations.
The field of medical informatics is growing very fast, because drug information systems and computerized records save lives and cut costs, Tilley said.
NLM has updated the Metathesaurus annually for most of its existence, but this year it started publishing quarterly updates.
'With all the new drugs, medicine is very dynamic,' Tilley said. NLM wanted to supply licensees with the latest information without waiting a year, and Apelon helps out with back-end conversion of machine-readable vocabularies to a standard form for entry into the Metathesaurus. Then contract and NLM editors review the results.