For data sharing, message flows are more important than platforms
In the American Recovery and Reinvestment Act of 2009, better known as the stimulus, Title XV concerns accountability and transparency. Unfortunately, the act suffers from the same problem as USAspending.gov: it follows an outdated reporting model suited to data warehousing rather than a real-time information exchange model in line with our new Web 2.0 world. In fact, I have had discussions with participants in the design process for USAspending.gov who proposed a more federated model but were unfortunately overruled by the more traditional data-warehouse proponents.
Don’t get me wrong: data warehousing has its place in the federal data architecture for decision support, trend analysis and data mining. However, it is not designed for real-time operational status, which is the kind of immediate transparency the new administration has promised. To deliver real-time operational status, you need a message exchange system such as the National Information Exchange Model (NIEM) or the Environmental Protection Agency’s Central Data Exchange. Such systems represent an important shift in IT architectures, in which we raise the message flow to the level of first-class citizen, alongside applications and data stores.
NIEM has achieved multiple real-world successes in this regard, most notably the rapid implementation of the national sex-offender registry and U.S. Citizenship and Immigration Services’ implementation of E-Verify.
In both of those systems, the key requirement was to get a large number of players speaking the same language in a short amount of time. That is the essence of operational status in support of a common, enterprisewide view. It is no different from the real-time collaboration required in any emergency: How do you get systems developed by different vendors, under different programs, to rapidly share data?
A generic framework for implementing those types of information sharing systems was defined in the Federal Enterprise Architecture Data Reference Model. The DRM divides the process into three parts: consistent data description, robust data context and rapid data sharing. Consistent data description requires a single, unambiguous set of entity and attribute definitions. Beyond this fine-grained description, a consumer needs contextual information: the data asset that contains the data, its security and privacy implications, and its categorization. In short, context tells you how best to use the data.
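To make the first two parts concrete, here is a small Python sketch of what a data description and its context might look like. Every name here (the entity, attributes and asset fields) is my own invention for illustration, not an official DRM or NIEM structure:

```python
from dataclasses import dataclass

# Part 1: consistent data description -- a single, unambiguous set of
# entity and attribute definitions shared by every participant.
@dataclass
class AttributeDef:
    name: str        # agreed-upon attribute name
    datatype: str    # agreed-upon data type

@dataclass
class EntityDef:
    name: str
    attributes: list

# Part 2: robust data context -- information about the data asset that
# contains the data, its security and privacy implications, and its
# categorization, so a consumer knows how best to use it.
@dataclass
class DataAssetContext:
    asset_name: str
    security_level: str
    categories: list

# A hypothetical shared definition of a "Grant" entity...
grant = EntityDef("Grant", [AttributeDef("awardId", "string"),
                            AttributeDef("amount", "decimal")])

# ...and the context for the asset that holds it.
context = DataAssetContext("RecoveryAwards", "public",
                           ["spending", "stimulus"])

print(grant.name, len(grant.attributes), context.security_level)
```

The point of the sketch is that description and context are plain, shared metadata; once every participant agrees on them, generating messages becomes mechanical.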
Finally, data sharing requires transforming these models into messages. The message is the end product, which is why, in the DRM process, description and context exist to support sharing.
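As a rough illustration of that transformation, the following Python sketch turns a shared description into an XML exchange message. The element names and award values are invented; a real NIEM exchange would use the governed NIEM namespaces and conformant schemas:

```python
import xml.etree.ElementTree as ET

# Transform the shared model into a message. Because every participant
# agreed on the description, any of them can produce or consume this.
def build_message(award_id: str, amount: str) -> str:
    msg = ET.Element("GrantAwardMessage")       # invented element names
    ET.SubElement(msg, "AwardID").text = award_id
    ET.SubElement(msg, "AwardAmount").text = amount
    return ET.tostring(msg, encoding="unicode")

xml_text = build_message("ARRA-2009-0042", "150000.00")
print(xml_text)

# Any participant in the federation can parse it back the same way.
parsed = ET.fromstring(xml_text)
assert parsed.find("AwardID").text == "ARRA-2009-0042"
```

Note that nothing in the sketch depends on the platform at either end; the message itself is the contract, which is the whole argument of this column.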
Let me digress here for a moment to rant about data standards. I am amazed to hear some people argue that multiple, redundant standards from different standards development organizations are a good thing. Frankly, in terms of effectively implementing a federated system, that is ridiculous and directly contradicts the meaning of the word standard. Multiple standards mean there is no standard. When my colleagues at the Justice Department and I launched NIEM, we took the time to evaluate competing standards against objective criteria and selected the best one. Harmonization requires selecting a single winner, and that, in turn, requires leadership.
The lesson here is that message flows are the linchpin to distributed, real-time information sharing systems. Focusing on the flows is how you succeed. Focusing on the ability to rapidly create the flows lowers the entry barrier for participants who want to join the federation. When you think about the success of Web 2.0 applications, it is the triumph of the message over the medium. Think of the ability to receive the same Web content at your desktop, at the server or on your mobile phone. In a Web 2.0 world, the platform is irrelevant as long as the messages can flow. To reinvent information technology for this new era of transparency, focus on the flows.