Medical library posts its texts

National library's Web site holds 300 books that are available at a click

By Patricia Daukantas

GCN Staff

As part of a long-term research project on information retrieval, the National Library of Medicine has amassed a Web collection of more than 300 books for medical practitioners and caregivers.

Despite the high technical level of most books available through the Health Services/Technology Assessment Text project, the document databases attract plenty of lay people seeking health information, said HSTAT project leader Maureen Prettyman, a computer specialist at the Bethesda, Md., library.


The HSTAT search engine finds e-books and documents in National Institutes of Health databases.


The HSTAT Web site, at text.nlm.nih.gov, isn't easy to find from the library's home page. It's considered a research effort instead of a regular service.

The library began experimenting with full-text document retrieval in the late 1980s and adopted Standard Generalized Markup Language early on, Prettyman said. The research picked up steam when another Health and Human Services Department unit, the Agency for Health Care Policy and Research, now the Agency for Healthcare Research and Quality, or AHRQ, received a mandate to publish clinical practice guidelines online as well as in print.

Tomes are us

Books available through HSTAT range from pamphlets to highly structured, full-text reference works of 400 to 600 printed pages.

'These are not small documents,' Prettyman said.

Besides AHRQ's clinical practice guidelines and technology reviews, the library has collections of AIDS-related materials, substance abuse prevention and treatment protocols, health technology reports produced by the state of Minnesota, and consumer guides in English and Spanish.

At first, Prettyman's team of four contractor programmers tried to convert the documents to SGML in-house, but that turned out to be too cumbersome.

About five years ago, the library put one document conversion out to bid, and the chosen contractor, Data Conversion Laboratory of Fresh Meadows, N.Y., has continued to work for NLM while the library programmers concentrate on developing the Web site.

Data Conversion president Mark Gross said his company scans and captures the original text by optical character recognition and codes the electronic text for reading in a browser.

Few differences

The company uses its own proprietary software to complete the conversion to SGML, Gross said.

Although the Extensible Markup Language has received lots of attention in recent months, 'it doesn't make sense to switch to XML' if an organization is already using SGML, Gross said. XML is a subset of the older SGML, and the differences are virtually invisible to users.

Outsourcing the work is cost-effective for NLM because the electronic formatting stays much the same from publication to publication.

'In general, they do an excellent job,' Prettyman said of Data Conversion. 'They're able to give a quick turnaround.'

HSTAT documents use SGML to create a table of contents on the fly from the tags within a document, Prettyman said. By clicking on the tags, or small arrowlike icons, readers can expand or collapse the table of contents at will. It hyperlinks to the main document, and a blue 'i' icon points to help functions.
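As a rough illustration of building a table of contents from a document's own tags, here is a minimal Python sketch. It is not HSTAT's code, which the library wrote in C++. It assumes a well-formed XML stand-in for the SGML, and the element names `book`, `section` and `title` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names "book", "section" and "title" are
# assumptions for illustration, not HSTAT's actual SGML document type.
SAMPLE = """
<book>
  <title>Sample Guideline</title>
  <section id="s1">
    <title>Overview</title>
    <section id="s1.1"><title>Scope</title></section>
  </section>
  <section id="s2"><title>Treatment</title></section>
</book>
""".strip()


def build_toc(element, depth=0):
    """Walk nested <section> elements and return (depth, id, title) entries."""
    entries = []
    for section in element.findall("section"):
        title = section.findtext("title", default="(untitled)")
        entries.append((depth, section.get("id"), title))
        # Recurse so subsections appear directly under their parent.
        entries.extend(build_toc(section, depth + 1))
    return entries


def render_toc(entries, max_depth=None):
    """Render the outline as indented text; max_depth hides deeper levels."""
    lines = []
    for depth, section_id, title in entries:
        if max_depth is not None and depth > max_depth:
            continue
        lines.append("  " * depth + f"{title} -> #{section_id}")
    return "\n".join(lines)


toc = build_toc(ET.fromstring(SAMPLE))
print(render_toc(toc))               # fully expanded outline
print(render_toc(toc, max_depth=0))  # collapsed to top-level sections
```

Each entry carries the section's identifier, so a page generator could turn it into a hyperlink to the main document, and `max_depth` stands in for the expand-and-collapse behavior.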

Rather than breaking up the text with embedded tables and figures, as paper books do, the HSTAT books place them in appendices. The e-documents refer readers to the tables and figures with hyperlinked references in the text and table of contents.

HSTAT runs on a dedicated Sun Microsystems Ultra 450 workstation with four 300-MHz UltraSparc-II processors and SunSoft Solaris 2.6, Prettyman said.

The library staff does not monitor the system 24 hours a day, so if it goes down on nights or weekends, it isn't fixed until the next work day.

'Even though it's running more than 2 million hits a month, it really isn't in production status,' Prettyman said of the HSTAT computer.

Library programmers wrote the HSTAT search and retrieval engine in C++, Prettyman said. The site also uses common gateway interface scripts now being rewritten in Java.

The e-books are stored in an object-oriented database, Versant Object Database Management System from Versant Corp. of Fremont, Calif.

One thing missing

The library logs the queries entered into its search engine but does not associate them with IP addresses because of privacy concerns, Prettyman said.
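A minimal Python sketch of that kind of logging might look like the following. The function name, record fields and JSON-lines format are assumptions for illustration, not a description of NLM's actual logs.

```python
import io
import json
from datetime import datetime, timezone


def log_query(log_file, query, client_ip=None):
    """Append one search query to the log as a JSON line.

    client_ip is accepted so callers can pass it, but it is deliberately
    never written, so a query cannot be tied back to an address.
    """
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "query": query.strip(),
    }
    log_file.write(json.dumps(record) + "\n")
    return record


# Demonstration with an in-memory log; the addresses are documentation-only
# examples from the 192.0.2.0/24 range.
log = io.StringIO()
log_query(log, "asthma treatment children", client_ip="192.0.2.10")
log_query(log, "diabetis diet", client_ip="192.0.2.11")
print(log.getvalue())
```

Because only the timestamp and the query text are stored, staff could still study what people search for, including misspellings, without recording who asked.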

'We do know we have to add a spell-checker,' she added.

From the types of logged queries, NLM staff members say they believe that HSTAT users range from medical school librarians and medical students to 'a good percentage of relatively naïve users who are trying to find information for relatives,' Prettyman said.

The HSTAT team is committed to making its online library accessible to the disabled, Prettyman said.

Team members ask document providers for text-based descriptions of graphics. If a description is absent, they ensure that the graphic has an alternate-text tag repeating the title of the figure.
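The fallback rule can be sketched in a few lines of Python. The function name and the HTML output are hypothetical, chosen only to show the logic of using the provider's description when there is one and the figure title otherwise.

```python
from html import escape


def figure_img_tag(src, title, description=None):
    """Return an <img> tag whose alt text is never empty.

    Prefer the provider's text description; if it is missing or blank,
    fall back to the figure's title.
    """
    has_description = description is not None and description.strip() != ""
    alt = description.strip() if has_description else title
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


# With a description supplied by the document provider.
print(figure_img_tag("fig1.gif", "Figure 1. Dosage chart",
                     "Bar chart of recommended dosage by age group"))
# Without one, the title is repeated as the alternate text.
print(figure_img_tag("fig1.gif", "Figure 1. Dosage chart"))
```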

The cost of coding each book into SGML ranges from $1,500 to $3,000, Prettyman said. AHRQ provides the funding for SGML coding of documents that it contributes to the HSTAT collection.
