Geological survey tunes XML tags to keep data on key

Now that the new tools have automated the compliance checking process, NBII sends out e-mails to participants who have entered items incorrectly, and 'it is making them fix them.'

Mike Frame


The U.S. Geological Survey, nerve center of the National Biological Information Infrastructure, has installed tools for its collaborative storehouse of biological data to make sure agencies use the correct metadata tags on the content they provide.

USGS officials also are using the tools as an early detection system for hacking incidents, according to Mike Frame, NBII's technology R&D director.

A major part of the NBII is a metadata clearinghouse, which sets standards for metadata schemas and tags that biological researchers use when they add data to the storehouse.

Through the NBII Clearinghouse, at www.nbii.gov, users can search an assortment of standardized descriptions of different biological databases and information products.

The metadata conveys information about subject matter; how, when, where and by whom the data was collected; how to access the database; and a contact person for more information.

The clearinghouse includes metadata descriptions of biological databases and information products provided by USGS scientists, as well as information developed and maintained by other NBII participants, including universities, private-sector groups, and federal and state government agencies.

Groups use the biological data for tasks such as managing public lands, tracking the spread of invasive species such as kudzu and nutria, understanding the growth of cities and supporting teachers from kindergarten through the postgraduate level, Frame said.

USGS has relied on metadata since the outset of the NBII, which comprises a network of server nodes on the Internet maintained by NBII participants. The nodes use various database management systems, including Microsoft SQL Server, MySQL and Oracle9i, Frame said.

'A data set can be a database or a Web site or a tool,' Frame said.

NBII divides data set maintenance functions among 20 nodes on its network around the country, some of which serve regional interests or thematic topics such as birds or amphibians, or help maintain the network's infrastructure.

Code for the decade

When NBII technical specialists started the clearinghouse in the late 1990s, they relied on Standard Generalized Markup Language and HTML to keep the metadata tags consistent across the network, Frame said.

The NBII has since largely moved to Extensible Markup Language schemas.

NBII relies on two types of schemas for the data it permits on the network, Frame said.

USGS developed one schema as an extension to the standard used by the Federal Geographic Data Committee. It has up to 200 elements.

Not all data sets use all 200 elements, Frame said. NBII users apply the FGDC schema to the data sets that researchers use. The schema uses some XML code as well as some HTML code.

NBII's other schema is based on the Dublin Core Metadata Initiative, a global standard supported by the International Organization for Standardization and the World Wide Web Consortium.

USGS uses Dublin Core for Web sites it adds to its network. NBII's Dublin Core schema is much simpler, with about 15 data elements such as title, author and type of content. The Dublin Core schema uses XML code throughout.
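To make the Dublin Core description concrete, here is a minimal Python sketch of reading such a record. It assumes the standard Dublin Core element namespace and its 15 element names, where the element the article calls "author" is named creator. The sample record, its wrapper element and the function name are hypothetical and are not taken from NBII's actual schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 Dublin Core element names ("creator" corresponds to "author").
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical record for a Web site; the <record> wrapper is an assumption.
SAMPLE_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example amphibian monitoring site</dc:title>
  <dc:creator>Example Regional Node</dc:creator>
  <dc:type>InteractiveResource</dc:type>
</record>"""


def read_dc_fields(xml_text):
    """Return a dict mapping each Dublin Core element present to its values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name in DC_ELEMENTS:
        # ElementTree addresses namespaced tags as "{namespace}localname".
        values = [
            el.text.strip()
            for el in root.iter(f"{{{DC_NS}}}{name}")
            if el.text and el.text.strip()
        ]
        if values:
            fields[name] = values
    return fields


if __name__ == "__main__":
    for name, values in read_dc_fields(SAMPLE_RECORD).items():
        print(name, values)
```

Because every Dublin Core element is optional and repeatable in the base standard, the sketch stores a list of values per element instead of a single string.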

To maintain consistent use of the two schemas, NBII managers have adopted tools from Hiawatha Island Software Co. Inc. of Concord, N.H. The tools monitor compliance of the regional and thematic nodes' data sets with the FGDC and Dublin Core schemas.

NBII managers adopted Hiawatha's AccMonitor to verify compliance with the clearinghouse's schema and to make automated repairs on files through their server's file management system. AccMonitor also verifies that Web sites on the NBII system comply with the government's accessibility requirements.
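The kind of check described here can be illustrated with a short Python sketch. It is not AccMonitor and does not reflect how that product works. It only shows, under stated assumptions, how records might be tested against a list of required Dublin Core elements and how problems might be grouped by contact for follow-up e-mails. The required-element policy, the record tuples and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical policy: elements every record must carry with a non-empty value.
REQUIRED = ("title", "creator", "type")


def check_record(xml_text, required=REQUIRED):
    """Return a list of problems found in one record; empty means it passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for name in required:
        found = any(
            el.text and el.text.strip()
            for el in root.iter(f"{{{DC_NS}}}{name}")
        )
        if not found:
            problems.append(f"missing or empty dc:{name}")
    return problems


def build_notices(records):
    """Group problems by contact address.

    `records` is an iterable of (record_id, contact_email, xml_text) tuples.
    Only contacts with at least one failing record appear in the result.
    """
    notices = {}
    for record_id, contact, xml_text in records:
        for problem in check_record(xml_text):
            notices.setdefault(contact, []).append(f"{record_id}: {problem}")
    return notices
```

A caller could pass the returned lines to whatever mail system it uses. The sketch stops short of sending messages or repairing files, which the article attributes to the commercial tool.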
