Seize the data
Government might run on IT systems, but the key to success is control of information; data takes center stage this year
- By Patience Wait
- Jan 09, 2006
The critical piece to making government work is no longer systems, but the ability to mine, analyze and use the information in systems.
'You can know the name of a bird in all the languages of the world, but when you're finished, you'll know absolutely nothing whatever about the bird.'
- Richard Feynman, 1965 recipient of the Nobel Prize in Physics
The government is neck-deep in a sea of data.
Every action, every event, spawns some number of discrete records that, taken together, make it possible to recreate the event or action that created them. Meteorological data, census information, agricultural production figures, financial records: these and innumerable other data points stream into federal agencies every day.
Ever since the first PCs began appearing on agency desks, much of the attention has been on the technology: the speed of the microprocessor, the size of the hard drive, the weight of the laptop, the next new software application, the style of the interface, the networking bandwidth.
Over the past few years, however, the terms of discussion in the government IT sector have changed. The new buzzwords are interoperability, metadata, data mining, data sharing, privacy and information security, to name a few.
The Pentagon has long been a primary driver for advances in technology because of its perpetual search for better ways to fight. And in this shift toward more useful data, it is leading the way too.
'From my perspective, it's all about the data,' said Debra Filippi, head of Net Centric Enterprise Services at the Defense Information Systems Agency.
Filippi said she sees two main reasons for the shift: Technology has matured to the point that it is relatively easy to move data from place to place, and major events, such as the war on terrorism and last year's natural disasters, highlighted the need to share that data.
'We've got to get ... information to a lot of people we know about and, in a lot of cases, to people that we didn't know about,' such as local first responders, she said.
John Garing, DISA's CIO, said the new view is beginning to permeate DOD's approach to IT programs.
'The department itself is going to shift away from programs and systems to trying to achieve capabilities through the acquisition process,' he said. 'We wind up with gaps when we focus [on systems]. We should be trying to effect capabilities and synchronize systems' to achieve those capabilities.

Force transformation
While DISA has many of the day-to-day tactical responsibilities for shifting the military's approach to information, the Pentagon's Office of Force Transformation is the starting point for creative, long-term approaches to network-centric warfare.
Cmdr. Greg Glaros, a former Navy fighter pilot, is a transformation strategist for OFT, thinking up new ways to get information to what he calls the point of the spear: the warfighter on the ground.
'He who uses that information first, wins,' he said.
But soldiers on the ground can't be inundated with so much data that they are paralyzed by the quantity of it. The information has to be sorted, prioritized, analyzed and evaluated for accuracy, among a host of other processes, to make it useful.
To Glaros, all these tasks can be sorted into two large groups. The first is the information chain: identifying what it takes to gather, store and share information. Second is the action chain.
'What does your organization have to do to take advantage of it?' he said. 'It's organizational alignment, and how we have the capacity to organize actions with coherence.'
One example of these two chains coming together into one initiative dreamed up by OFT is TacSat (short for tactical satellite), a program to launch microsatellites significantly smaller and lighter than conventional ones. They can be configured with many kinds of payloads, built in less than a year and launched with commercial rockets, all for a fraction of the cost of conventional satellites.
These satellites support the first tactical mile, Glaros said. Soldiers on the ground can direct where the satellites look, what kind of data they gather and to whom they feed it. The data from the satellites can support missions in real time.
The satellites serve 'as a collaborative gateway to virtual operation centers,' he said, 'like any public forum or town square.'

What to do with data?
The National Geospatial-Intelligence Agency, or NGA, generates massive quantities of imagery, geospatial data and intelligence that military and civilian agencies use in support of national security.
Jack Hild, who has executive management authority over most of the planning for data production and acquisition, said the agency began considering data differently about a decade ago.
'We began to develop the concept of foundation data and mission-specific data,' Hild said. 'If you sign on to [Google Inc.'s] Google Earth, typically you'll see some vectors or mapping-type data over imagery; that, in a large way, is what we've been calling foundation data.'
Mission-specific data, on the other hand, is generated to meet specific requests of customer agencies, he said, whether for military assignments or humanitarian missions.
Carl Staton, the National Oceanic and Atmospheric Administration's CIO, believes 'data is an as-yet undiscovered force multiplier.'
NOAA vacuums up a vast number of observations on weather and ocean conditions from an ever-widening collection of sources, from satellites and sensors to ocean buoys and fish counters on boats, he said.
But within the next decade, 'we're looking at a 10,000-fold increase in the volume of data that's going to be coming in,' Staton said, creating a physical data management problem. 'We are likely to have to ... provide more computational power, more cycles, to do data assimilation (getting data into the models) than we do running the actual models.'
To coordinate all the data flowing through the government and how it should feed into federal systems, the Office of Management and Budget led the creation of the Federal Enterprise Architecture Data Reference Model. OMB released Version 2.0 of the DRM in December, telling organizations how to describe the structure, categorization and exchange of their information.

Tag debate
Additionally, there is an ongoing debate over whether there is still a need to tag data with metadata for search purposes (see story, Page 25). OMB recently issued a memo requiring agencies to make public information more accessible by posting it online. Some in government and the private sector, however, question whether this is enough and want OMB to prescribe more traditional standards and methods to characterize and categorize data.
Getting the information into a format that lets agencies draw from the ocean of data that's available is only part of the battle, however. The issues of quality and control are yet to be resolved.
'Every data set is different, and every data set is produced under differing conditions,' said NGA's Hild. 'For example, we do contract out for a significant part of the work, and we work with international partners to share data.'
For partners certified to meet NGA's standards, he said, the agency will do some sampling to confirm its accuracy. But as sources, including those in the commercial sector, multiply, it becomes more important to understand how the data was generated in order to assess its validity.
Agencies increasingly are dealing with data that is inaccurate or dirty. The reasons can be simple, such as a lack of standards in the way birth dates or names are entered into a system, or more complicated, such as falsified records. Either way, agencies are relying on software and manual processes to clean their data and make it more usable.
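The kind of rule-based cleanup described above can be sketched in a few lines. This is a hypothetical illustration, not any agency's actual tooling; the date formats and name rules are assumptions chosen to show how inconsistent entries might be normalized to one canonical form.

```python
# Hypothetical sketch of rule-based data cleaning: normalizing the
# inconsistently entered birth dates and names the article mentions.
from datetime import datetime

# Entry formats assumed for illustration; a real system would catalog
# the formats actually observed in its records.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

def normalize_birth_date(raw):
    """Try each known entry format and emit one canonical ISO date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable: route to manual review rather than guess

def normalize_name(raw):
    """Collapse whitespace and case so 'SMITH,  jane' matches 'Smith, Jane'."""
    return " ".join(raw.split()).title()

print(normalize_birth_date("3/14/1972"))  # -> 1972-03-14
print(normalize_name("SMITH,  jane"))     # -> Smith, Jane
```

Note the deliberate choice to return `None` for unparseable dates: silently guessing would trade dirty data for plausible-looking wrong data, which is harder to catch.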
NOAA is facing huge challenges in making sure its data is valid. Staton said the quality of data could affect the accuracy of weather forecasts.
'We spend a lot of time ensuring that we mark data appropriately,' he said. 'On the whole, we do not unilaterally accept an observation ... from a particular observing platform as pure. We do levels of quality control. The intended use within NOAA of the data can determine the extent and rigor of the review.'
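Staton's layered approach, where observations are marked rather than blindly accepted, might look something like the following sketch. The field names, thresholds and flag scheme are all assumptions for illustration; the article does not describe NOAA's actual quality-control pipeline.

```python
# Hypothetical sketch of tiered quality control for weather observations.
# Ranges are rough physical plausibility limits, used only for illustration.
PLAUSIBLE_RANGES = {
    "air_temp_c": (-90.0, 60.0),
    "wind_speed_ms": (0.0, 115.0),
    "pressure_hpa": (850.0, 1090.0),
}

def quality_control(obs):
    """Annotate an observation with a QC flag instead of accepting it as pure.

    Level 1: completeness check (required fields present).
    Level 2: gross range check against physical limits.
    Flagged observations can be routed to deeper review depending on
    their intended use, rather than being silently dropped.
    """
    flags = []
    for field, (lo, hi) in PLAUSIBLE_RANGES.items():
        if field not in obs:
            flags.append(f"missing:{field}")
        elif not lo <= obs[field] <= hi:
            flags.append(f"out_of_range:{field}")
    obs["qc_flag"] = "pass" if not flags else ";".join(flags)
    return obs

reading = {"air_temp_c": 21.5, "wind_speed_ms": 200.0, "pressure_hpa": 1013.0}
print(quality_control(reading)["qc_flag"])  # flags the implausible wind speed
```

The point of the flag field is exactly what Staton describes: the extent and rigor of further review can then vary with the intended use of the data, since a flagged reading may be acceptable for one model and disqualifying for another.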
Owen Ambur, chief Extensible Markup Language strategist for the Interior Department and co-chair of the CIO Council's XML Community of Practice, said the key attributes of a data record (integrity, reliability, authenticity and usability) are outlined in ISO 15489.
'Those attributes can and should be specified in metadata associated with each record,' he said.

Access control
Controlling access to data is equally problematic. Data can be classified and access rights granted just to those with the clearances to view it, but the bigger the pool, the harder it becomes to be sure those with access are those who need that access.
'That depends on the data in question,' Ambur said. 'For example, I believe citizens should have the right to control access to their own personal data. However, regardless of the type of data, systems should be designed to gather and maintain complete and accurate records of each and every business process in real time, as it transpires, and no one should have the right or the capability to modify records after the fact.'
To Hild, it's more of a Pandora's box. 'In a sharing-rich environment, I'm not sure it can be controlled,' he said.