The promise of XML

Agencies make headway on tagging data but still need description tools

For the Navy, the Extensible Markup Language provided a possible way to meet its FORCEnet objectives. FORCEnet is the Navy's forward-looking architectural framework that intertwines sensors, command and control software, platforms and weapons into a common network.

Information sharing is nothing new. But while sharing data via the Web on a point-to-point basis has long been established, the challenge for the Navy, and for other agencies, is designing systems that can share data on the fly, on an ad hoc basis. An application has to make its data available for numerous other applications, even those not foreseen by developers.

That is the promise of XML. In December 2003, the Office of Naval Research piloted the Joint Expeditionary Warfare Logistics System (JEWLS), a decision-support system that uses Web services to provide logistics information to command and control applications.

A light protocol

'Ten years ago, we probably would have used a CORBA-based solution,' said Adam Sandman, a senior manager of technology for Sapient Corp. of Cambridge, Mass., which has worked with the Navy on the project. Sandman was referring to the Common Object Request Broker Architecture, an older method of sharing software functions across a network.

The problem with CORBA is that it does not work well in battlefield environments, Sandman said. It requires too much configuration and gobbles too many computer resources.

'CORBA is a very heavy protocol. It is very chatty. XML is a light protocol,' Sandman said. 'With something like CORBA, the technology in the programming of the system was coupled with how it was communicated over the network. XML decouples the systems aspect of the communications format. If some data gets lost it doesn't impact the remainder of the transaction.'

As a pilot, JEWLS has been tremendously successful; about a dozen Marine Corps and Navy units have used it. It touches eight Corps systems and 14 Navy systems. It was used in Operation Enduring Freedom in Afghanistan, as well as in Operation Iraqi Freedom.

'When users want to pull logistics information on the [battlefield operations] map, it evokes a Web services call that connects back to the Web server on our system, and pulls the appropriate information for the map,' Sandman said.

Despite all the buzzwords (and buzz) surrounding the acronym, XML is a surprisingly simple concept. As its name suggests, Extensible Markup Language is a format for annotating data so that it can be exchanged across disparate systems.

To think of what XML could do, think of how the Web changed life as we know it. HTML, the format for the Web, is a derivation of the Standard Generalized Markup Language, an older format for sharing electronic documents. HTML works so well because of its simplicity.

Before the Web, widespread document sharing across many platforms was a tricky process. First, there was a problem of getting two computers to talk to each other. Then, enough bandwidth had to be secured to transport the documents. With the Internet, TCP/IP provided the basic underlying protocol to connect all machines, and HTML let the document designer mark up a plain set of data so it could be displayed by the user's browser.

HTML is, however, designed only for displaying data in a browser. XML expands the markup notion to exchanging data across different networked systems.

HTML contains only a set number of tags, as defined by the World Wide Web Consortium. XML lets users create their own tags, which means they can annotate however they see fit. A supplier can tag the price of a particular item, which the seller's computer can recognize, assuming they both share the same language, or schema.
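The idea of user-defined tags can be made concrete with a minimal sketch. It assumes a hypothetical supplier catalog; the tag names (`catalog`, `item`, `price`) and the `sku` and `currency` attributes are invented for illustration and do not come from any real schema. It uses Python's standard `xml.etree.ElementTree` module to show how a receiving program might pull out the tagged prices, provided both sides agree on what the tags are called.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplier catalog; every tag and attribute name here is an assumption.
CATALOG = """
<catalog>
  <item sku="A-100">
    <name>Field radio battery</name>
    <price currency="USD">42.50</price>
  </item>
  <item sku="B-200">
    <name>Antenna mast</name>
    <price currency="USD">310.00</price>
  </item>
</catalog>
"""

def read_prices(xml_text):
    """Return {sku: (price, currency)} for every <item> in the catalog."""
    root = ET.fromstring(xml_text)
    prices = {}
    for item in root.findall("item"):
        price = item.find("price")
        prices[item.get("sku")] = (float(price.text), price.get("currency"))
    return prices

prices = read_prices(CATALOG)
```

The reader only works because it looks for the same tag names the supplier chose, which is the shared-language assumption the paragraph above describes.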

But XML's flexibility is also its Achilles' heel. 'That XML is just a syntax is both good and bad,' said Ken Sall, an XML specialist for the consulting firm SiloSmashers Inc. of Fairfax, Va.

'It is easy to construct your own vocabulary. Whether my language and your language will be interoperable depends on a lot of things,' Sall said. The same words may have multiple meanings. 'If you don't have a way to specify what you really mean, there is no way to tell.'

As agencies embark on XML projects and devise their own languages, or taxonomies, to describe their data, they must soon take the next step: defining the relationship between the languages. Such work is essential for data sharing across systems. And while mapping one agency's language to another's is easy work, the true challenge is describing the language so that a computer can interpret it without human intervention.
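A hand-built mapping of the kind mentioned above might look something like the following sketch. The two vocabularies and the mapping table are hypothetical; a person, not the computer, has decided that one agency's `incident` corresponds to another's `event`.

```python
import xml.etree.ElementTree as ET

# Hypothetical record from "agency A"; the tag names are invented for illustration.
AGENCY_A = "<incident><location>Pier 4</location><reportedBy>Unit 12</reportedBy></incident>"

# Hand-written mapping from agency A's tag names to "agency B's" (also invented).
A_TO_B = {"incident": "event", "location": "site", "reportedBy": "source"}

def translate(element, mapping):
    """Return a copy of element with tags renamed per mapping; unmapped tags are kept."""
    new = ET.Element(mapping.get(element.tag, element.tag), dict(element.attrib))
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(translate(child, mapping))
    return new

translated = translate(ET.fromstring(AGENCY_A), A_TO_B)
result = ET.tostring(translated, encoding="unicode")
```

The renaming is mechanical, but the table itself encodes a human judgment about meaning. Nothing in the data tells a program that the two terms are equivalent, which is the gap the semantic work is meant to close.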

'Obviously what we need is semantic interoperability if we are to communicate effectively,' said Brand Niemann, an Environmental Protection Agency computer scientist and co-chairman of the Federal Semantic Interoperability Working Group. 'Really, we're trying to get to [where data is] machine-processable, so we need to have no ambiguity in the meaning of the terms.'

How much is XML already being used in government? Although agencies are using XML, they are only beginning to realize the full flexibility of having data encoded in XML.

Some successes

For the past two years, Sall has compiled a report on the use of XML in government. The latest one was published in November, highlighting 12 federal programs that use XML.

Some of the programs include:
  • The Justice Department is developing a wide-ranging XML framework that would let Justice agencies share information with public-safety communities. Called the Global Justice Information Sharing initiative, the framework has already been used in more than 50 law enforcement and legal-related projects at both the federal and state levels.

  • NASA, along with the Air Force Research Laboratory, is defining a set of terms that describe the hardware, mission and development lifecycle of air and spacecraft.

  • The IRS is developing a set of terms for reusing e-training content, called the Sharable Content Object Reference Model.

  • The General Services Administration is developing an acquisition language for the Integrated Acquisition Environment, a Quicksilver e-government initiative to standardize procurement systems.

All these programs have their own standardized XML vocabularies, but the vocabularies may not synchronize with one another. Terms such as 'incident,' 'event,' 'resource' and 'asset' all have different meanings to different communities. As the systems mature and agencies look for additional uses of the data, experts say using a shared language will become essential.

Some agencies are taking the first step toward a larger reconciliation of these vocabularies, namely by developing a location where federal schemas can be placed and easily located. A schema describes the structure of an XML document, including terms that describe each element in that document. A program manager could make use of a schema, or elements within a schema, instead of developing one from scratch.
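Because a W3C XML Schema file is itself an XML document, a program manager browsing a registry could list what a published schema declares before deciding whether to reuse it. The sketch below assumes a small, made-up schema (the element names are illustrative and not drawn from any federal registry) and uses only Python's standard library.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C XML Schema language, in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema; the declared element names are assumptions for illustration.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="price" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def declared_elements(xsd_text):
    """List (name, type) for every element the schema declares, in document order."""
    root = ET.fromstring(xsd_text)
    return [(el.get("name"), el.get("type")) for el in root.iter(XS + "element")]

elements = declared_elements(SCHEMA)
```

This only inspects the schema. Checking documents against it would need a validating library, which Python's standard library does not include.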

One such attempt is the Defense Department's Metadata Registry and Clearinghouse, which provides a repository for Defense agencies, civilian agencies and other governmental bodies to publish schemas. The Core.gov repository of government-generated software components also can accept schemas.

Another pilot to build a dedicated XML registry is being undertaken by Starbourne Communications Design.

At its own expense, the Berkeley, Calif., company will collect schemas from both the FirstGov Web site and the Global Justice XML Data Model and Data Dictionary and then make them available through a portal, said Rex Brooks, founder and president of Starbourne.

Eventually, the collected schemas could be used as the basis for developing a federal dictionary of descriptive XML terms.

After the company builds this 'derived registry,' as Brooks called it, it will look into the feasibility of using the results to start a federal glossary of government XML terms.

The results could also be used to construct an ontology, or a description of how the elements of an agency fit together. Search engines would then be able to locate agency material more easily.

'This is just the beginning of the list of things that might be useful when analyzing how the government uses XML,' Brooks said. 'We don't know what we will find, but it's worth the exercise.'

Industry observers note that federal registries could be difficult to use, which might prevent program managers from adopting existing vocabularies.

Others doubt that all government agencies could adopt a single vocabulary that details every element in the federal environment.

As a result, more work has been done recently in figuring out ways to bridge different vocabularies.

'Even those who are working very hard to become semantically interoperable with XML tags and vocabulary are bumping up against one another, asking how to become interoperable,' Niemann said. 'You begin to realize you can't do enough standardization that way.'

In other words, agencies need a level of logical organization one step above the basic taxonomies that give a name to each element within an agency's data sets. That would be an ontology.

With such an ontology, advocates say that computers could examine one another's data sets and know where to place them into their own systems.

Fortunately, the developers of XML and HTML are working on what they call the semantic Web, which 'deals with the meaning of information,' Niemann said.

Once all the elements within a data set are tagged, the next task is to give computers the instructions for how to interpret the tags within a larger context, to infer relationships that are not explicit.

To this end, XML developers have created new technologies such as the Resource Description Framework, which provides a metadata platform for describing objects, and the Web Ontology Language, which goes by the acronym OWL and gives developers tools for describing relationships among elements.

Using these tools, one can apply a layered approach to describing data, Niemann said, adding that he borrowed the concept from Michael Daconta, the metadata coordinator for the Homeland Security Department.

Layer cake

The top level is the ontology, the middle layer is the metadata, and the bottom layer is the content. Each layer has an associated standard. For the content layer, an agency can use XML. For the metadata, RDF can be used, and OWL can describe the ontology layer.
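A rough sketch of how the layers could interact follows. It is written in plain Python rather than in RDF or OWL syntax, and every name in it is hypothetical: statements are held as subject-predicate-object triples in the style of RDF, and a `subClassOf` relationship stands in for the kind of rule an ontology language lets developers express. A production system would more likely rely on an RDF library and an OWL reasoner.

```python
# Statements as (subject, predicate, object) triples; all names are hypothetical.
# The "type" triple plays the role of metadata about a tagged piece of content,
# and the "subClassOf" triples play the role of the ontology.
TRIPLES = {
    ("truck-17", "type", "Vehicle"),
    ("Vehicle", "subClassOf", "Asset"),
    ("Asset", "subClassOf", "Resource"),
}

def superclasses(cls, triples):
    """Return every class reachable from cls by following subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def inferred_types(subject, triples):
    """Return the stated types of subject plus those implied by the class hierarchy."""
    stated = {o for s, p, o in triples if s == subject and p == "type"}
    implied = set()
    for cls in stated:
        implied |= superclasses(cls, triples)
    return stated | implied

types = inferred_types("truck-17", TRIPLES)
```

Nothing in the data says outright that `truck-17` is a resource. The program derives it from the hierarchy, a small instance of inferring a relationship that is not explicit.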

How would the semantic Web play out on a practical level? Agency webmasters could use agency vocabularies, or constructs, of meaningful terms to mark up Web pages, making the pages easier to interpret for search engines and other Web services that might incorporate the data.

The constructs also could be used to annotate material in databases, making it more easily accessible to appropriate users. Other users could include other machines. By making use of these rules of logic, other systems could draw data of interest as needed, as part of a service-oriented architecture that may span different agencies or programs.

As technologies such as OWL and RDF continue to evolve, agencies will find more ways to share data and bring more intelligence to the field of data sharing.
