NIH site provides latest medical info to seriously ill

ClinicalTrials.gov serves as a clearinghouse for people seeking highly technical health information

By Merry Mayer

Special to GCN

Most webmasters want to design their sites to welcome even the most unsophisticated Internet users. The National Institutes of Health faced a more complicated challenge three years ago in developing a site that could provide highly technical medical information to the seriously ill and their families.

NIH was required to create a registry of clinical trials under the Food and Drug Administration Modernization Act of 1997. The Web service that resulted, ClinicalTrials.gov, provides information on more than 4,000 federally and privately funded medical studies at more than 47,000 locations nationwide.

The site, at clinicaltrials.gov, serves as a clearinghouse of information on medical studies and clinical trials for people who are sick and may want to participate in one as a final attempt to get well.

NIH wanted to design a Web interface in which even novices could formulate queries that would generate results, said Dr. Alexa T. McCray, director of the Lister Hill National Center for Biomedical Communications at NIH's National Library of Medicine.

The system had to be able to generate results even when diseases were misspelled or called by a more common or slang name, such as Lou Gehrig's disease, the better-known name for amyotrophic lateral sclerosis.

The search engine had to accept either ALS or Lou Gehrig's disease, although it does not accept a misspelling of the latter.

But it does accept some common misspellings. For example, search for alzeimer's disease, and you will find that there are 17 studies being done on Alzheimer's, with an 'h.'

The system also had to be able to process information received from 21 NIH institutes, as well as from the private sector.

Do it this way

When design of the system started in 1998, NIH found that the 21 institutes providing the data had different data-collection methods and different levels of technical expertise.

Some institutes had well-established databases for managing their clinical trials, others had Web pages describing their trials but no back-end database support, and still others were managing their data on paper, McCray said.

The system's designers first had to decide on a standard set of data elements. By the end of 1998, NIH had selected about a dozen required data elements and another dozen optional elements, McCray said. For each clinical trial, the database would have to contain a description of the trial, whether it was still accepting patients, its location and contact information, and the sponsoring organization.
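
As a minimal sketch in Java (a language NIH used elsewhere in the system), the required elements might be modeled like this; the field names are invented, since the article doesn't list NIH's exact element names.

```java
// Hypothetical model of the core data elements described above; the
// actual element names and types NIH chose are not given in the article.
public record ClinicalTrialRecord(
        String id,                  // assumed unique trial identifier
        String description,         // what the trial studies
        boolean acceptingPatients,  // whether the trial is still recruiting
        String location,            // where the trial is conducted
        String contactInfo,         // whom to contact about enrolling
        String sponsor) {           // the sponsoring organization
}
```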

Next, NIH had to work with each institute individually to find a method for transferring the data into NIH's central database at the National Library of Medicine.

NIH developed a Web data entry system, so that institutes with little technical infrastructure wouldn't have to build their own database.

For institutes with systems already in place, NIH helped write scripts that would let NIH's system extract data from the institute's database.
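
In outline, such a script might query the institute's database and emit one fragment per trial. The sketch below is purely illustrative: the table, column names and connection URL are invented, and it writes XML because, as the article explains next, that was the transfer format.

```java
import java.sql.*;

// Illustrative only: the table name, column names and JDBC URL are
// invented; each institute's real schema differed.
public class TrialExtractor {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost/trials"; // hypothetical
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT id, description, recruiting, sponsor FROM trial")) {
            while (rs.next()) {
                // Emit one XML fragment per trial for transfer to NIH.
                // (Real code would also escape XML special characters.)
                System.out.printf(
                        "<clinical_trial id=\"%s\"><description>%s</description>" +
                        "<recruiting>%s</recruiting><sponsor>%s</sponsor></clinical_trial>%n",
                        rs.getString("id"), rs.getString("description"),
                        rs.getBoolean("recruiting"), rs.getString("sponsor"));
            }
        }
    }
}
```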

NIH decided to have all data sent in Extensible Markup Language, rather than Hypertext Markup Language. XML is a more streamlined version of Standard Generalized Markup Language. It provides more power than HTML and allows data providers more freedom in what technology they use on their end, McCray said.

Once the data is placed in XML format, it is sent via File Transfer Protocol.
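
The article does not reproduce NIH's schema, but a submitted report might look something like the following, with element names invented to match the sketches above.

```xml
<!-- Hypothetical report structure; all element names are invented. -->
<clinical_trials_report sponsor="National Cancer Institute">
  <clinical_trial id="NCI-0001">
    <description>Phase II study of a new chemotherapy regimen</description>
    <recruiting>true</recruiting>
    <location>Bethesda, Md.</location>
    <contact>301-555-0100</contact>
  </clinical_trial>
  <!-- one clinical_trial element per study in the report -->
</clinical_trials_report>
```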

After receiving an XML file, NIH divides the report into individual clinical trials and saves each one separately.
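
That splitting step can be sketched with Java's standard DOM and transformer APIs, using the hypothetical clinical_trial element from the example above.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

// Sketch of splitting a multi-trial XML report into one file per trial.
// The element name "clinical_trial" matches the hypothetical schema above.
public class ReportSplitter {
    public static void main(String[] args) throws Exception {
        Document report = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        NodeList trials = report.getElementsByTagName("clinical_trial");
        Transformer t = TransformerFactory.newInstance().newTransformer();
        for (int i = 0; i < trials.getLength(); i++) {
            Element trial = (Element) trials.item(i);
            // Save each trial under its own id.
            t.transform(new DOMSource(trial),
                    new StreamResult(new File(trial.getAttribute("id") + ".xml")));
        }
    }
}
```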

Next, the data is massaged to give searches the greatest possible breadth of vocabulary. The files are enriched with synonyms from the National Library of Medicine's Unified Medical Language System. In this way, searchers can enter ALS or Lou Gehrig's disease and still be successful.
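
A toy illustration of the idea: each trial's text is expanded with synonyms from a concept table, so a query for any variant matches. The two entries below merely stand in for the Unified Medical Language System, which supplies the real synonym data.

```java
import java.util.*;

// Sketch of synonym enrichment. The hard-coded entries stand in for
// the Unified Medical Language System's concept-to-synonym mappings.
public class SynonymEnricher {
    private static final Map<String, List<String>> SYNONYMS = Map.of(
            "amyotrophic lateral sclerosis", List.of("ALS", "Lou Gehrig's disease"),
            "alzheimer's disease", List.of("alzheimer disease", "AD"));

    // Append every known synonym of terms found in the trial text, so a
    // search for any variant matches the enriched document.
    public static String enrich(String trialText) {
        StringBuilder enriched = new StringBuilder(trialText);
        String lower = trialText.toLowerCase();
        for (var entry : SYNONYMS.entrySet()) {
            if (lower.contains(entry.getKey())) {
                for (String syn : entry.getValue()) {
                    enriched.append(' ').append(syn);
                }
            }
        }
        return enriched.toString();
    }

    public static void main(String[] args) {
        System.out.println(enrich("A trial of riluzole in amyotrophic lateral sclerosis."));
    }
}
```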

On the front end, NIH selected K2 Toolkit from Verity Inc. of Sunnyvale, Calif., for search and viewing on the site.

The retrieval engine, which is responsible for managing the user's session, responding to queries and presenting the data back in HTML, is implemented as a Java servlet using the Apache JServ module from the Java Apache Project, on the Web at java.apache.org. The site is managed by the Apache Software Foundation of Forest Hill, Md.
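
As a rough sketch only, such a retrieval servlet might look like the following; the query parameter name, the HTML layout and the searchEngine stand-in are all invented, and the real system issued its queries through Verity's K2 toolkit.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.*;

// Bare-bones sketch of a retrieval servlet: accept a query parameter,
// run it against the search engine and render the results as HTML.
public class SearchServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String query = req.getParameter("q"); // hypothetical parameter name
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body><h1>Results for " + escape(query) + "</h1>");
        for (String hit : searchEngine(query)) {
            out.println("<p>" + escape(hit) + "</p>");
        }
        out.println("</body></html>");
    }

    // Placeholder for the actual Verity K2 query.
    private java.util.List<String> searchEngine(String query) {
        return java.util.List.of("Example trial matching \"" + query + "\"");
    }

    private static String escape(String s) {
        return s == null ? "" : s.replace("&", "&amp;").replace("<", "&lt;");
    }
}
```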

NIH wrote a spelling correction algorithm that extracts each phrase or word, generates lexical variants and checks these against the database vocabulary. 'Most of the code we wrote ourselves. We used C, Java and some C++,' McCray said.
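
The article doesn't detail NIH's algorithm, but the vocabulary check it describes can be sketched with a standard edit-distance comparison; the distance threshold and the sample vocabulary below are invented.

```java
import java.util.*;

// Sketch of the kind of spelling correction described: compare each
// query word against the database vocabulary and suggest the closest
// term by edit distance.
public class SpellCorrector {
    public static String correct(String word, Set<String> vocabulary) {
        if (vocabulary.contains(word)) return word;
        String best = word;
        int bestDist = Integer.MAX_VALUE;
        for (String term : vocabulary) {
            int d = editDistance(word, term);
            if (d < bestDist) { bestDist = d; best = term; }
        }
        return bestDist <= 2 ? best : word; // accept only near misses
    }

    // Standard Levenshtein distance via dynamic programming.
    static int editDistance(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dp[i][0] = i;
        for (int j = 0; j <= b.length(); j++) dp[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                dp[i][j] = Math.min(
                        dp[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1),
                        Math.min(dp[i - 1][j], dp[i][j - 1]) + 1);
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("alzheimer's", "disease", "sclerosis");
        System.out.println(correct("alzeimer's", vocab)); // prints alzheimer's
    }
}
```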

Merry Mayer is a free-lance writer in Chicago.
