Frictionless data: let it flow

How some agencies use data exchanges that are built on existing systems


The eight bosses of data sharing

Why is sharing data across agencies so difficult? You have eight different parties to contend with, notes Steve Sawyer, associate professor at Pennsylvania State University.

With funding from the National Science Foundation, Sawyer and graduate student Michael Tyworth studied how California's Automated Regional Justice Information System was developed. Sawyer presented the duo's findings at the International Conference for Digital Government, held recently in San Diego.

ARJIS provides a good example of how information can be shared across disparate institutional bodies. The program spans 50 criminal justice jurisdictions. It processes about 35,000 transactions a day. Whenever a police officer from the San Diego Sheriff's Office runs the name of an individual through the ARJIS system, results come back not only from that jurisdiction's system but also from the California Highway Patrol, the U.S. Border Patrol and other bodies.
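The federated lookup described above (one query fanned out to several jurisdictions' systems, with the answers collected together) can be sketched in Python. This is a minimal illustration, not ARJIS's actual design: the source names, the record fields and the choice to treat a failed source as returning no records are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Source = Callable[[str], List[Record]]

def federated_name_search(name: str, sources: Dict[str, Source]) -> Dict[str, List[Record]]:
    """Send one name query to every source and collect the results per source.

    A source that raises an error contributes an empty list, so one
    unavailable system does not prevent results from the others.
    """
    results: Dict[str, List[Record]] = {}
    # Query all sources concurrently; max_workers must be at least 1.
    with ThreadPoolExecutor(max_workers=max(len(sources), 1)) as pool:
        futures = {label: pool.submit(query, name) for label, query in sources.items()}
        for label, future in futures.items():
            try:
                results[label] = future.result()
            except Exception:
                results[label] = []
    return results

# Hypothetical stand-ins for individual jurisdictions' systems.
def sheriff_system(name: str) -> List[Record]:
    return [{"name": name, "record_id": "SD-001"}] if name == "DOE, JOHN" else []

def highway_patrol_system(name: str) -> List[Record]:
    return [{"name": name, "record_id": "CHP-042"}] if name == "DOE, JOHN" else []

def offline_system(name: str) -> List[Record]:
    raise ConnectionError("system unavailable")
```

For example, `federated_name_search("DOE, JOHN", {"sheriff": sheriff_system, "chp": highway_patrol_system, "offline": offline_system})` returns matches from the two working stand-ins and an empty list for the offline one. A production system would more likely flag an unavailable source explicitly, since an empty list here cannot distinguish 'no match' from 'no answer.'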

Studying ARJIS, the researchers found that sharing data across different agency systems required system designers to consider eight distinct 'large interest groups' that have a stake in each system, Sawyer said. An interest group can be either an organizational body or an abstract entity maintained by a set of stakeholders, and all eight must be addressed to enable sharing between systems:

  • Operational governance, the actual operators of a system
  • Administration, the managers of the organization running the system
  • Political operators, the elected officials who represent the constituents the system serves
  • Current applications, the software programs the system runs
  • Data structures, how the system's data is organized
  • Infrastructure, the system's supporting network and operating protocols
  • Design principles, the architecture of the system
  • Deployed devices, the end-user devices

    A local politician may want two systems to interact, but the differences between the applications, data models and infrastructures will have to be addressed. Will the end devices be able to view the data? And unless the actual users of the system agree to use the new capabilities, data sharing will not occur.

    ARJIS itself serves as an unusual example of data sharing in that the originators did not follow the traditional approaches of system design, Sawyer said.

    Instead they used an organic approach, one that combines high-level, top-down design with acceptance of grass-roots, bottom-up initiatives.

    Local jurisdictions are free to add components to the system, in which other jurisdictions may or may not choose to participate. An ARJIS oversight body keeps tabs on the system's various extensions to minimize duplicated effort.

    Such an approach can look disjointed, but in fact the system (and its supporting organization) is able to incorporate new entities and features quite easily, Sawyer said. The strategic goals drive the design, and the system continues to evolve. 'It is not a very neat system, but it is a very adaptive system,' Sawyer said.

    [Photo caption: National Park Service CIO Dom Nessi said the agency is using Extensible Business Reporting Language to track contractors such as those operating around the Statue of Liberty. Credit: Rick Steele]


    Think information sharing at your agency is difficult? It took an order from the president to get intelligence agencies to start to pool their resources.

    In August 2004, President George W. Bush signed into existence the National Counterterrorism Center, an information-sharing hub for the CIA, FBI and 11 other agencies. The move followed the 9/11 Commission Report, issued the previous month, which stated, 'Action officers should have been able to draw on all available knowledge about al-Qaeda in the government. Management should have ensured that information was shared.'

    Sharing. It has become one of the most pressing IT challenges in government since 2001.

    'It is our job to ensure that all the agencies that have a role in the counterterrorism mission get the information they need to do their jobs,' said Wesley Wilson, chief of the NCTC Information Sharing Program Office, at a recent meeting of the Armed Forces Communications and Electronics Association in Bethesda, Md.

    The shortcomings surrounding 9/11 are emblematic of a larger trend across government: the attempt to exploit the full power of electronic data. In theory, information in electronic form is frictionless; it can be moved anywhere with little effort. The possibilities of easy reuse are tantalizing, from stopping terrorists to sparing citizens from entering the same information on multiple forms.

    In practice, however, government officials are encountering many obstacles to making data more widely available. Some are political.

    'What you'll find is that those that hold information don't feel all that strong of a need to share [it],' Wilson said. Volumes have been written on the social science of information sharing.

    But technical challenges to data sharing also exist. How do you share when potential users wouldn't have the slightest idea how to interpret raw data from your database? What if external data sources you rely on are moved or updated, leaving you to re-establish a connection, which could take considerable effort?

    A few federal agencies have confronted these sorts of problems head-on, and some have even overcome them. As a result, they oversee thriving data exchanges that exemplify the kind of sharing government has been striving for.

    Talk the talk

    When it comes to sharing information, few agencies can boast of greater success than the Justice Department, with its Global Justice Information Sharing Initiative, a coalition of law enforcement agencies that oversees information-sharing projects.

    One secret to its success? Creating a vocabulary of commonly used terms.

    The initiative created the Global Justice XML Data Model, which comprises a data dictionary, a data model and an XML schema. More than 200 state, local and federal law enforcement agencies use the Global JXDM to share criminal justice information. The National Law Enforcement Telecommunications System, a network bridging 18,000 state and local law enforcement agencies, uses Global JXDM to swap 50,000 criminal histories a day. Pennsylvania's Justice Network uses the framework for sharing criminal data among its 4,000 nodes.

    The Global JXDM provides a vocabulary of about 3,000 terms that computer systems use to tag data. Traditionally, a law enforcement system might use one term to describe a piece of data ('manslaughter 2,' for example), while a district attorney's system might use a different one ('MS-2'). Agreeing on one term lets both systems use the same data. When a police officer makes an arrest, the information the officer captures can be passed electronically to prosecutors and then on to the court system.
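A minimal Python sketch of the shared-vocabulary idea: each agency keeps a crosswalk from its own code to one agreed term, tags outgoing records with the agreed term, and translates incoming records back to its local code. The element names (`ArrestReport`, `PersonName`, `ChargeCategory`) and the shared term `ManslaughterSecondDegree` are hypothetical placeholders, not actual Global JXDM names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical crosswalks from each agency's local code to one shared term.
POLICE_TO_SHARED: Dict[str, str] = {"manslaughter 2": "ManslaughterSecondDegree"}
DA_TO_SHARED: Dict[str, str] = {"MS-2": "ManslaughterSecondDegree"}

def invert(crosswalk: Dict[str, str]) -> Dict[str, str]:
    """Map shared terms back to local codes, refusing ambiguous crosswalks."""
    inverse: Dict[str, str] = {}
    for local, shared in crosswalk.items():
        if shared in inverse:
            raise ValueError(f"two local codes map to {shared!r}")
        inverse[shared] = local
    return inverse

def to_shared_xml(person: str, local_charge: str, crosswalk: Dict[str, str]) -> str:
    """Sending side: tag the record with the shared term, not the local code."""
    root = ET.Element("ArrestReport")
    ET.SubElement(root, "PersonName").text = person
    ET.SubElement(root, "ChargeCategory").text = crosswalk[local_charge]
    return ET.tostring(root, encoding="unicode")

def from_shared_xml(document: str, crosswalk: Dict[str, str]) -> Tuple[str, str]:
    """Receiving side: read the shared term and translate it to the local code."""
    root = ET.fromstring(document)
    shared = root.findtext("ChargeCategory")
    return root.findtext("PersonName"), invert(crosswalk)[shared]
```

Here `from_shared_xml(to_shared_xml("Jane Roe", "manslaughter 2", POLICE_TO_SHARED), DA_TO_SHARED)` returns `('Jane Roe', 'MS-2')`: the police code goes out as the shared term and arrives as the prosecutor's code, and neither agency needs to know the other's local vocabulary.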

    'No one has to re-enter that information 15 times along the way,' said Paul Wormeli, executive director of the Integrated Justice Information Systems Institute, a Justice-funded nonprofit that assists law enforcement agencies in implementing JXDM.

    Users define terms

    For the Justice Department, the key to success was letting participants define the shared vocabulary, Wormeli said. The idea was not for Justice to hand the language to local law enforcement agencies, which then might or might not use it, but rather to provide a forum in which the participants could reach a common conclusion.

    Although the Global JXDM covers a lot of ground, it does not cover everything. Where no suitable terms are present, participants can create their own schema subset, using what's called the Schema Subset Generation Tool to ensure results remain valid within the rest of the Global JXDM.

    For new users, preparing their systems for Global JXDM can be a snap. 'The best way is to create a Web service,' Wormeli said. Many vendors offer tools to set up a Web service in front of a relational database. The next step is to alert authorized parties that the service exists, so they can access it over a network.
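Wormeli's suggestion, a Web service placed in front of an existing relational database, might look like the following minimal Python sketch using only the standard library. The table layout, the `/cases?case_id=` URL and the XML element names are hypothetical, and the service has no authentication, which a real deployment restricted to authorized parties would need.

```python
import sqlite3
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlparse

def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for an agency's existing relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE arrests (case_id TEXT PRIMARY KEY, name TEXT, charge TEXT)")
    conn.execute("INSERT INTO arrests VALUES ('A-100', 'Jane Roe', 'ManslaughterSecondDegree')")
    conn.commit()
    return conn

def case_as_xml(conn: sqlite3.Connection, case_id: str) -> Optional[str]:
    """Read one row and render it in the shared XML format (None if absent)."""
    row = conn.execute(
        "SELECT case_id, name, charge FROM arrests WHERE case_id = ?", (case_id,)
    ).fetchone()
    if row is None:
        return None
    root = ET.Element("ArrestReport")
    for tag, value in zip(("CaseID", "PersonName", "ChargeCategory"), row):
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")

def make_handler(conn: sqlite3.Connection):
    """Build a request handler class bound to the given database connection."""
    class CaseHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            case_id = parse_qs(url.query).get("case_id", [""])[0]
            body = case_as_xml(conn, case_id) if url.path == "/cases" else None
            if body is None:
                self.send_error(404, "case not found")
                return
            data = body.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/xml")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)
    return CaseHandler

def serve(port: int = 8080) -> None:
    """Run the service until interrupted (single-threaded, no authentication)."""
    HTTPServer(("localhost", port), make_handler(make_demo_db())).serve_forever()
```

Calling `serve()` starts the service on port 8080; a request for `/cases?case_id=A-100` returns the record as XML, and an unknown case returns HTTP 404. The database itself is left unchanged; only the thin layer in front of it speaks the shared format.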

    The Global JXDM has been so successful that Justice, along with the Homeland Security Department, is now expanding the dictionary and data model to create a governmentwide data-sharing model called the National Information Exchange Model.

    The programs' leaders hope other agencies will use this model for sharing other types of data as well. A draft version 1.0 of the specification is scheduled for this month, with the final version due in the fall.

    Follow the money

    The idea of a specialized XML-based vocabulary is also taking root in the financial community through the emerging Extensible Business Reporting Language.

    Like Global JXDM, XBRL offers both a dictionary, based on standard accounting terminology, and a taxonomy that shows how terms are connected. It's also extensible, so new users can add their own definitions for specialized terms.

    'You can use XBRL to describe any sort of financial information,' said Karl Best, executive director of XBRL-US.

    During this year's Interagency Resource Management Conference, Labor Department CFO Sam Mok suggested that agencies could use XBRL to standardize financial reporting systems. He later told GCN that many agencies currently cannot share financial data, even when the data is generated by software packages from the same company. XBRL offers a common, standards-based lingua franca for exchanging data across different systems.

    'I believe there's a much broader use, if it is properly applied,' Mok said.

    Agencies with explicit financial missions are already migrating to the standard. The Securities and Exchange Commission offered private companies incentives to use XBRL for their SEC filings and mutual fund disclosures in order to help the agency more quickly analyze input from many different companies.

    The Federal Deposit Insurance Corp. uses XBRL for its call reports from nearly 8,400 banks and financial institutions. These are huge reports, with thousands of data fields and formulas. XBRL simplifies the process of data aggregation.
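The aggregation benefit comes from every filer using the same concept names, so no per-bank field mapping is needed. The Python sketch below assumes a drastically simplified filing format: real XBRL instance documents also carry namespaces, contexts (reporting entity and period) and units, all omitted here, and the concept names are hypothetical rather than taken from an actual FDIC or XBRL taxonomy. Unknown tags are rejected unless declared as extensions, loosely echoing XBRL's extensibility.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal
from typing import Dict, FrozenSet, Iterable

# Hypothetical shared dictionary of concept names every filer must use.
TAXONOMY: FrozenSet[str] = frozenset({"TotalAssets", "TotalDeposits", "NetIncome"})

def parse_filing(xml_text: str, extensions: FrozenSet[str] = frozenset()) -> Dict[str, Decimal]:
    """Return {concept: value} from one simplified tagged filing.

    Tags must come from the shared taxonomy or the declared extensions.
    """
    allowed = TAXONOMY | extensions
    facts: Dict[str, Decimal] = {}
    for fact in ET.fromstring(xml_text):
        if fact.tag not in allowed:
            raise ValueError(f"unknown concept: {fact.tag}")
        facts[fact.tag] = Decimal(fact.text)
    return facts

def aggregate(filings: Iterable[str], extensions: FrozenSet[str] = frozenset()) -> Dict[str, Decimal]:
    """Sum each concept across all filings; Decimal avoids float rounding."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for xml_text in filings:
        for concept, value in parse_filing(xml_text, extensions).items():
            totals[concept] += value
    return dict(totals)

# Two hypothetical filings from different banks using the same concept names.
FILINGS = [
    "<Filing bank='First Example Bank'><TotalAssets>1200.50</TotalAssets>"
    "<TotalDeposits>900</TotalDeposits></Filing>",
    "<Filing bank='Second Example Bank'><TotalAssets>800</TotalAssets>"
    "<NetIncome>12.25</NetIncome></Filing>",
]
```

`aggregate(FILINGS)` sums `TotalAssets` across both banks to `Decimal('2000.50')` and carries through the concepts only one bank reported.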

    But XBRL has wider application. The National Park Service is piloting XBRL for its Concession Data Management System, said Dom Nessi, chief information officer for the NPS. The agency manages 633 contracts throughout the country's parks, ranging from vendors who shuttle visitors to the Statue of Liberty to those who handle the lodges at Yellowstone National Park.

    Working with consultants at PricewaterhouseCoopers LLP of New York, NPS developed an XBRL-based contract-oversight module and is in the process of automating its annual financial reporting process. Companies reporting to NPS must submit XBRL-tagged data.

    'By using XBRL, we allow the source data to flow into our application seamlessly,' Nessi said. 'And from there, the data can be transported seamlessly into other applications.'

    Although Global JXDM and XBRL operate in entirely different domains, they share one trait: both approaches advocate keeping existing systems rather than replacing them with a unified supersystem. To share information, these initiatives require putting a gateway in front of each system to translate the data from its native format into one shared by others. Establishing this common platform through translation frees users from dependence on any specific technology.

    This loose-coupling approach, as it's sometimes called, was part of the success behind the Environmental Protection Agency's Exchange Network, said Kim Nelson, former CIO at EPA and executive director of e-government for Microsoft Federal.
    Nelson oversaw the development of the Exchange Network.

    Operational since 2003, the Exchange Network gathers environmental information that can be used by more than 40 state, local and tribal agencies working on environmental issues.

    The idea behind Exchange Network 'was to create a seamless interface between EPA and the states for sharing information,' Nelson said. 'At the root of it was the notion that there should be one primary steward of the data. If you keep the data close to the original source, your chances of it being accurate are much better.'

    With the Exchange Network, each participating organization is responsible for its own information. Instead of compiling data into a monolithic repository, which EPA has historically done, each participant maintains its own data and offers access through predefined protocols.

    'We're trying to replace the need for them to run our systems,' said Andrew Battin, deputy director of EPA's Office of Information Collection, which oversees the network.

    The loose-coupling approach also means that when a participating agency upgrades a system that provides data (moving to a new database, for instance), consumers of that data do not have to change their own query systems. All the formats for fetching the data are defined through the Exchange, and each participating agency agrees that any new software will conform to those formats.
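The loose-coupling guarantee can be sketched in Python, with a `typing.Protocol` standing in for the formats the Exchange defines. Both node classes, the field names and the readings are hypothetical; the point is only that `max_reading`, the consumer's code, works unchanged whether the provider's data sits in a plain list or in a relational database.

```python
import sqlite3
from typing import Dict, List, Protocol, Sequence, Tuple

# Hypothetical agreed exchange format: every node returns records with these keys.
Reading = Dict[str, object]
Row = Tuple[str, str, float]

class ExchangeNode(Protocol):
    def get_readings(self, site_id: str) -> List[Reading]: ...

class SpreadsheetNode:
    """Original backend: rows held in a plain in-memory list."""
    def __init__(self, rows: Sequence[Row]) -> None:
        self.rows = list(rows)

    def get_readings(self, site_id: str) -> List[Reading]:
        return [{"site_id": s, "pollutant": p, "value": v}
                for s, p, v in self.rows if s == site_id]

class SqlNode:
    """Upgraded backend: the same data moved into a relational database."""
    def __init__(self, rows: Sequence[Row]) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE readings (site TEXT, pollutant TEXT, value REAL)")
        self.conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

    def get_readings(self, site_id: str) -> List[Reading]:
        cur = self.conn.execute(
            "SELECT site, pollutant, value FROM readings WHERE site = ? ORDER BY rowid",
            (site_id,))
        return [{"site_id": s, "pollutant": p, "value": v} for s, p, v in cur]

def max_reading(node: ExchangeNode, site_id: str) -> float:
    """Consumer code: depends only on the exchange format, not the backend."""
    return max(float(r["value"]) for r in node.get_readings(site_id))

ROWS: List[Row] = [("S1", "ozone", 0.061), ("S1", "ozone", 0.074), ("S2", "ozone", 0.050)]
```

Swapping `SpreadsheetNode(ROWS)` for `SqlNode(ROWS)` changes nothing for the consumer: both return the same records for site `S1`, so `max_reading` gives `0.074` either way.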

    'This has a lot of look and feel of a net-centric data approach,' Battin said.

    Whether it's Global JXDM, XBRL or the EPA's Exchange Network, all these projects have demonstrated that data sharing is far from the insurmountable challenge it's sometimes depicted to be. In fact, once vocabulary, language and architectural issues are addressed, sharing data can bring new capabilities and efficiencies no one had previously imagined.

    But even the parties involved will tell you these successes are just the tip of the iceberg. As NPS' Nessi put it, 'We're embarking on new ground right now.'
