A DRM for a new era
Semantic group calls for a chattier data reference model
- By Joab Jackson
- Feb 18, 2009
Semantic-based machine processing of data has gotten so much better in the past few years that federal agencies might truly benefit from an update of the federal data reference model (DRM), according to several experts.
Lucian Russell, owner of Expert Reasoning and Decisions consulting firm and one of the architects who refined Version 2.0 of DRM, outlined the need for such changes at a Semantic Community–Semantic Exchange Workshop in Falls Church, Va., Feb. 17.
Russell and several other experts have contributed to an online forum that is gathering suggestions for the federal CIO Council on how to improve the data-sharing framework. A pair of senior enterprise architects — Rick Murphy of the General Services Administration and Brand Niemann of the Environmental Protection Agency — set up the forum.
DRM is one of five reference models specified in the federal enterprise architecture, the framework the Office of Management and Budget established for standardizing the procurement of information technology systems and the sharing of information. The CIO Council guided FEA’s development.
Last November, the council sent an e-mail message to senior agency IT executives requesting input on improving DRM. OMB set up a wiki for submitting those ideas.
DRM was the last of the five reference models to be published. It provides a uniform way to encode information so it can be shared across agencies. Version 2.0 has three components: the first describes the data itself, the second describes the data's broader context, and the third specifies the language for sharing the data.
Russell suggested expanding the first two components of DRM — those dealing with describing the data and its context.
He said DRM 2.0’s developers sought to avoid creating additional work for agencies, so the technology only offers a bare-bones approach to providing information on the data’s context. However, expanding that context is essential for automated data sharing and analysis, he added.
Speaking with Government Computer News, Russell offered the example of two agencies whose identically tagged data elements nevertheless carry different meanings because the contexts differ. One agency collects certain numeric data twice a year, while the other collects the same data only once a year. As a result, the two numbers, even though they are tagged identically, might not be directly comparable. Metadata for each element, such as an indicator of how often the data element is collected, would be crucial to accurately compare the two data sources, Russell said.
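Russell's scenario can be sketched in a few lines of code. This is a minimal illustration, not anything drawn from the DRM specification itself: the element names, fields and normalization rule below are hypothetical, chosen only to show how a single piece of context metadata (collection frequency) makes two identically tagged values comparable.

```python
# Hypothetical sketch of Russell's example: two agencies tag a data
# element identically, but collect it at different frequencies.
from dataclasses import dataclass

@dataclass
class DataElement:
    name: str                   # shared tag used by both agencies
    value: float                # total reported per collection period
    collections_per_year: int   # context metadata: collection frequency

def annualized(elem: DataElement) -> float:
    """Normalize a per-period total to a yearly figure so elements
    collected at different frequencies become directly comparable."""
    return elem.value * elem.collections_per_year

# Both agencies report the same tag and the same raw number...
agency_a = DataElement("inspections", 500, collections_per_year=2)
agency_b = DataElement("inspections", 500, collections_per_year=1)

# ...but only the metadata reveals that the figures differ once
# normalized: 1000 per year versus 500 per year.
print(annualized(agency_a))  # 1000
print(annualized(agency_b))  # 500
```

Without the `collections_per_year` field, an automated system would see two identical tags and values and treat them as equivalent; with it, the mismatch is mechanically detectable, which is the kind of automated data sharing Russell describes.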
He added that semantic technologies have improved to the point that information expressed in English can be parsed unambiguously by a machine, thanks to data descriptors, logic transformation tools and definition libraries such as WordNet.
"Concentrate your effort on collecting data and noting how they were collected," Russell said. “If you have that, it is technically possible to use automated techniques to allow for data sharing.” Such self-describing data could be used to automate decision-making systems and business workflows and conduct more intelligent searches.
As examples of data-sharing projects that successfully embody contextual mapping, Russell pointed to NASA's Global Change Master Directory, which indexed more than 18 petabytes of scientific data for reuse, and the Interoperable Knowledge Representation for Intelligence Support project, now under the auspices of the Intelligence Advanced Research Projects Activity.
Although OMB’s deadline for DRM suggestions has passed, the Governance Subcommittee of the CIO Council's Architecture and Infrastructure Committee, which is reviewing the material before submitting it to the FEA Program Management Office, is still accepting submissions, Niemann said.
Joab Jackson is the senior technology editor for Government Computer News.