Internaut: How agencies could use XML to ensure integrity of data

Shawn P. McCarthy

Who is the ultimate authority for government data quality? Chances are you trust the data you collect within your own agency. You 'sort of' trust other agencies' data, and you cross your fingers and hope for the best with everything else.

Growth of Web services will aggravate this predicament. When data is assembled from multiple databases, Web sites and documents, it's tough to ensure an even level of quality, or to guarantee authoritative sources.

So what exactly makes a piece of government data authoritative? Here's one example. The Social Security Administration is the ultimate authority for Social Security numbers. Dozens of other agencies collect and use the numbers to track other information, but they don't assign the numbers or enforce rules about them.

When various government offices build databases or share records that include a field for a Social Security number, chances are they don't contact SSA during each transaction to confirm correctness.

But what if they could?

Extensible Markup Language can make it possible to establish a specific authority for all data in a collection: not just Social Security numbers but literally every bit of data the government gathers. It's a matter of setting up the proper hierarchies.

Government offices are unlikely to validate their local records directly with SSA every day. But they could pull their data from locations that do conduct daily validations with the ultimate authority.
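
To make the idea concrete, here is a minimal sketch in Python of what tagging each field with its authority might look like. The element names, the `authority` and `validated` attributes, the placeholder values and the one-day freshness rule are all my own assumptions for illustration; they are not taken from any published specification or government schema.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical record: each field names the authority that vouches for it
# and the date it was last validated against that authority.
RECORD = """
<record>
  <ssn authority="ssa.gov" validated="2003-03-01">000-00-0000</ssn>
  <taxStatus authority="irs.gov" validated="2003-01-15">filed</taxStatus>
  <employer validated="2003-03-01">Example Corp.</employer>
</record>
"""

def audit(xml_text, today, max_age_days=1):
    """Return {field: problem} for fields that lack a named authority
    or whose last validation is older than max_age_days."""
    problems = {}
    for field in ET.fromstring(xml_text):
        authority = field.get("authority")
        validated = field.get("validated")
        if authority is None:
            problems[field.tag] = "no authority named"
            continue
        if validated is None:
            problems[field.tag] = "never validated"
            continue
        age = (today - date.fromisoformat(validated)).days
        if age > max_age_days:
            problems[field.tag] = f"validated {age} days ago by {authority}"
    return problems

if __name__ == "__main__":
    for name, problem in audit(RECORD, date(2003, 3, 2)).items():
        print(f"{name}: {problem}")
```

An office that consumes such records would not need to call SSA itself. It would only need to check that whoever supplied the record did so recently enough.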

On a macro level, this means the IRS would be the ultimate authority for federal tax data; the Census Bureau for census data; the Labor Department for job statistics, and so on.

On a micro level, it might be something as basic as assigning Jim down the hall to act as the ultimate authority for one data type collected by your office.

The server could be configured to pull data only from his shared network drive when building a document based on a specific query.
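
A rough sketch of that kind of configuration, again in Python: one table maps each data type to the single source allowed to supply it, and the server refuses to build a document from anything else. The data type names, the network path, the URL and the `fetch` callback are hypothetical stand-ins, not part of any real system.

```python
# Hypothetical registry: one authoritative source per data type.
AUTHORITIES = {
    "travel_vouchers": "//officeserver/jim/vouchers",  # Jim's shared drive
    "ssn": "https://validation.example.gov/ssn",       # placeholder URL
}

def source_for(data_type):
    """Look up the only source permitted to supply this data type."""
    try:
        return AUTHORITIES[data_type]
    except KeyError:
        raise LookupError(f"no authority assigned for {data_type!r}") from None

def build_document(data_types, fetch):
    """Assemble a document, pulling each data type only from its authority.

    `fetch` is any callable that takes a source location and returns data.
    """
    return {t: fetch(source_for(t)) for t in data_types}
```

A query for a data type with no assigned authority fails outright rather than falling back to whatever copy happens to be nearby, which is the point of the exercise.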

The trick is in setting up the hierarchy.

Not only is this complicated, requiring cooperation by everyone who shares data, but there is also the question of what specification to follow in creating the rules.

One possibility is ebXML, which stands for electronic business using XML. This suite of specifications is designed for conducting commercial business over the Internet in a services-based architecture, so it's not a complete answer for government data sharing. But ebXML does let users set up complicated mappings and hooks to pull data from appropriate Internet sources. Details about ebXML are available at www.ebxml.org.

I hope to hear from GCN readers about other specifications or government projects that use this kind of cross-functional cooperation to establish data authorities for local applications.

But it's a tall order. A General Accounting Office report last year, at www.gao.gov/new.items/d02327.pdf, pointed out that XML interoperability across the government requires an effective cross-agency registry that does not yet exist, and that federal sharing needs haven't yet been consolidated, which makes it difficult for agencies to tag XML data properly.

Shawn P. McCarthy designs products for a Web search engine provider. E-mail him at smccarthy@lycos-inc.com.
