NLM establishes public-domain XML suite for journals

The National Library of Medicine's National Center for Biotechnology Information has released comprehensive Extensible Markup Language document type definitions for exchanging and archiving articles electronically.

Jeff Beck, a technical information specialist at the center, said the DTDs, which are available for download, are broad enough to fit scientific and technical articles as well as the medical ones archived by NLM's PubMed and PubMed Central Web sites.

'We chopped up what was in a generic article and made a modular tag structure that's very expandable' to handle different kinds of content, Beck said.

For example, it can tag dissimilar abstracts such as an executive summary, key points or, say, the stereochemical aspects of a compound. It also can capture article metadata: title, authors, volume, issue and page numbers.
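
To make the idea concrete, here is a minimal sketch of how an article tagged this way might be read by a program. The element and attribute names in the sample are simplified assumptions chosen for illustration, not copied from the released DTDs, and the `read_metadata` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the tag and attribute names below are
# simplified assumptions, not taken from the released DTDs.
SAMPLE = """
<article>
  <front>
    <article-meta>
      <article-title>Example title</article-title>
      <contrib-group>
        <contrib><name>A. Author</name></contrib>
        <contrib><name>B. Author</name></contrib>
      </contrib-group>
      <volume>12</volume>
      <issue>3</issue>
      <fpage>101</fpage>
      <lpage>110</lpage>
      <abstract abstract-type="executive-summary"><p>Short summary.</p></abstract>
      <abstract abstract-type="key-points"><p>Point one.</p></abstract>
    </article-meta>
  </front>
</article>
"""


def read_metadata(xml_text):
    """Collect title, authors, volume, issue, pages and typed abstracts."""
    root = ET.fromstring(xml_text)
    meta = root.find("./front/article-meta")
    return {
        "title": meta.findtext("article-title"),
        "authors": [c.findtext("name") for c in meta.iter("contrib")],
        "volume": meta.findtext("volume"),
        "issue": meta.findtext("issue"),
        "pages": (meta.findtext("fpage"), meta.findtext("lpage")),
        # Each abstract is keyed by its type, so dissimilar abstracts
        # can sit side by side in the same article.
        "abstracts": {
            a.get("abstract-type"): "".join(a.itertext()).strip()
            for a in meta.findall("abstract")
        },
    }


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```

Keying each abstract by a type attribute is one way a single tag could cover an executive summary, a list of key points or a more specialized abstract without a separate element for each.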

The center was starting to revise its own archiving DTD when it learned Inera Inc. of Newton, Mass., had a similar project funded by Harvard University and the Andrew W. Mellon Foundation. 'We shared our revised DTD' with that group and Mulberry Technologies Inc. of Rockville, Md., to reach a common DTD format, Beck said.

The joint effort produced the Archiving and Interchange DTD and the Journal Publishing DTD, which 'we are suggesting journals use to submit content to us,' he said. PubMed Central has archived more than 100,000 articles, and PubMed about 13 million.

Now the center hopes to expand the tag set to encompass textbooks and online documentation.

XML projects to be funded by the fiscal 2004 budget proposal amount to only $7 million, but they include some significant efforts.

Among them are the Office of Management and Budget's e-Authentication gateway, a governmentwide XML repository, and technical-document management projects at the Energy and Health and Human Services departments.
