DATA MANAGEMENT

Data reuse not possible without ontological work, group asserts

Data models alone are insufficient to enable widespread system interoperability, and organizations need to develop ontologies to explain how different data elements interact. Only when this context is rendered in a computational form can external systems make sense of a data model, asserted Steve Ray, chairman of the Ontology Summit being held this week at the headquarters of the National Institute of Standards and Technology (NIST) in Gaithersburg, Md.

"It's [those] unstated assumptions that kill us," Ray said. Today, the working group released details of a number of projects that could show how ontologies can improve interoperability. The group hopes funding organizations, such as the National Science Foundation (NSF) or the Defense Advanced Research Projects Agency, will review these projects and possibly provide additional funding for them.

Members from the NSF, NIST, the Defense Department and other agencies participate in the summit, as do representatives from academia and European Union government agencies.

At this point, the group is not advocating the creation of actual standards for developing and maintaining ontologies. Rather, it wants standards bodies, such as the World Wide Web Consortium and the Organization for the Advancement of Structured Information Standards, to work ontological aspects into their own frameworks.

The topic of data sharing is certainly a timely one, as the Obama administration has made it a priority for agencies to open their databases for wider use. The proposed Data.gov site, which will be a repository or directory of government data sources, is expected to go live by late May.

In terms of information systems, an ontology is an exhaustive description of some domain of knowledge, with formal definitions of how all the different data elements relate to one another. Once some area of knowledge is encoded, a computer should be able to reason, or make inferences, about the domain's supporting data.

Today, the data models in large systems are not documented in a machine-readable form (if they are documented at all). This hinders easy or automatic reuse of the data: developers need time to understand the data model, and outside systems cannot work out for themselves how to reuse the data meaningfully.

Ray admits that the study of ontologies is quite abstract, and the Ontology Summit can include members from a wide variety of decidedly non-computer-oriented academic disciplines, such as linguistics and philosophy. So, for this year's meeting, the group wanted to offer a number of practical examples of how ontologies could be used within large IT systems.

At the meeting, representatives from a number of different fields, including financial services, manufacturing, emergency response, geospatial, and the oil and gas industries, presented projects that could show the benefit of ontologies. Ray noted that most of these areas already have systems in place, so starting over with an all-encompassing, domain-specific data model for each industry would be impossible. As a result, the task for each industry now is to bridge its disparate data models, and this could best be done with ontologies.

All the speakers emphasized the need for greater interoperability within their fields. Matthew West, a consultant for Shell Oil, said that as much as 70 percent of the cost of new systems that Shell has implemented goes toward making them communicate with other systems.

Scientific units of measurement are one field that could benefit from some ontological attention, said NSF program director Frank Olken.

The global scientific community has standardized on basic forms of measurement for dimensions such as length, time, current, temperature and others. In some cases, however, there are no standardized cross-dimensional formats for annotating the varied conditions (or state-dependencies) under which a quantity is measured, which can be problematic when trying to interpret that measurement outside its own domain. For example, the pressure under which some physical property is measured can affect the measurement value, but someone reusing that measurement may not know of this variable condition.

Olken said that an ontology for units and measurements could bridge these different dimensions because it could explain the relations of these units of measurement to one another in a richer form. In addition, it could stop systems from making nonsensical calculations, such as determining whether three liters is larger than four miles. And it would also reduce the number of implicit assumptions made by scientists that can often lead to error.
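To make the idea concrete, here is a minimal sketch in Python of the kind of check such an ontology might enable. It is not drawn from any project presented at the summit: the `UNITS` table, the `Quantity` class, the `is_larger` function and the `pressure_kPa` condition name are all hypothetical, chosen only for illustration. Each unit is tied to a dimension and to a conversion factor, and each measurement carries the conditions under which it was taken.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ontology: unit -> (dimension, factor to that dimension's base unit).
UNITS = {
    "liter": ("volume", 0.001),        # base unit: cubic meter
    "cubic_meter": ("volume", 1.0),
    "mile": ("length", 1609.344),      # base unit: meter
    "meter": ("length", 1.0),
}


@dataclass
class Quantity:
    value: float
    unit: str
    # Stated measurement conditions, e.g. {"pressure_kPa": 101.325}.
    conditions: dict = field(default_factory=dict)

    @property
    def dimension(self) -> str:
        return UNITS[self.unit][0]

    def in_base_unit(self) -> float:
        return self.value * UNITS[self.unit][1]


def is_larger(a: Quantity, b: Quantity) -> bool:
    """Compare two quantities, refusing comparisons that make no sense."""
    if a.dimension != b.dimension:
        raise ValueError(f"cannot compare {a.dimension} with {b.dimension}")
    if a.conditions != b.conditions:
        raise ValueError("measurements were taken under different conditions")
    return a.in_base_unit() > b.in_base_unit()
```

Under these assumptions, asking whether `Quantity(3, "liter")` is larger than `Quantity(4, "mile")` raises an error rather than returning an answer, while two volumes recorded in different units can still be compared. A real units ontology would cover far more than this lookup table, including derived units and richer descriptions of measurement conditions.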

Another field that could benefit is the geospatial realm. This field has many different types of data elements, ranging from numeric coordinates to names of towns, all of which are difficult to reconcile, said Josh Lieberman, a representative from the Open Geospatial Consortium. The community needs some way of distinguishing how elements fit together, he said. For instance, "Lehigh" could be classified as part of something called "Upper Delaware," which is part of "Delaware." An ontology could define how these elements are related to one another.
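The "part of" relationship in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PART_OF` table and the `containers` and `is_part_of` functions are hypothetical names, and the sketch assumes each place has at most one direct parent. The point is that once the relation is written down explicitly, a program can infer that "Lehigh" is part of "Delaware" even though no one stated that fact directly.

```python
# Hypothetical "part of" assertions, taken from the place names in the example.
PART_OF = {
    "Lehigh": "Upper Delaware",
    "Upper Delaware": "Delaware",
}


def containers(place: str, part_of: dict = PART_OF) -> list:
    """Return every region that contains `place`, nearest first."""
    found = []
    current = place
    while current in part_of:
        current = part_of[current]
        if current == place or current in found:
            raise ValueError("cycle in part-of assertions")
        found.append(current)
    return found


def is_part_of(place: str, region: str) -> bool:
    """Infer containment by following the chain of part-of assertions."""
    return region in containers(place)
```

Here `is_part_of("Lehigh", "Delaware")` holds because the relation is followed transitively, while the reverse query does not. Ontology languages let such properties of a relation be declared once so that general-purpose reasoners can apply them, rather than hard-coding the logic as this sketch does.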

"Forming ontologies...is an important part of getting to that common understanding," he said.

Ontologies might also help with the reuse of data models across different domains.

Deborah MacPherson, from the architectural firm Cannon Design, discussed an initiative called The Open Floor Plan Display Project, which aims to share architectural building information, such as floor plans, with emergency responders.
