The data deluge conundrum
At NASA’s World Wind website, any citizen can see images that zoom from satellite altitude into any place on earth. Most people have no idea what lies behind this powerful 3D experience: billions of gigabytes (that is, exabytes) of data from NASA’s rich history of planetary, lunar, terrestrial and earth-orbiting missions.
The deluge of data is a familiar theme to any government IT department today. When it comes to Big Data initiatives, however, the challenge isn’t simply the growing amount of data, but the variety of data that must be compiled and analyzed.
In a survey conducted in August 2012 by the 1105 Government Information Group, agencies said they expect the amount of data they store to grow 24 percent in the next two years. Organizations that currently have between 50 terabytes and 1 petabyte in their data stores are expected to experience the most data growth (see figure 1).
The biggest driver of this growth is real-time feeds from sensors. Seven in 10 government organizations expect to capture data in the form of real-time feeds within two years, compared with 63 percent now (see figure 2). Smaller, more powerful sensors have become the information underpinning for many Big Data pioneers in government.
For example, the Defense Department (DOD) collects much of its data in the form of high-definition imagery from drones, satellites and battlefield sensors. According to an article in the Harvard Business Review, the amount of data the DOD has captured from such sources has increased a staggering 1,600 percent since 9/11.
The 1105 Government Information Group survey found that the biggest growth among non-traditional data sources will come from video and audio feeds; 55 percent of agencies said they expect their data sets to include video and audio in two years, up from only 42 percent today. The survey also found that other sharp increases will come from text messages and tweets from social media; RFID feeds; and non-geographic data from smart phones and mobile devices.
“From smart phones to bridges, we are instrumenting everything,” says Michael Daconta, former metadata program manager for the Homeland Security Department and author of "Information as Product: How to Deliver the Right Information to the Right Person at the Right Time."
There are plenty of opportunities lurking in these giant stockpiles of information. Accenture, the consulting group, suggests some of the ways sensor data can benefit government: “in restaurant kitchens to monitor hygiene standards; in the water supply to monitor chemical composition; on roads to control speed limits—to provide a continuous stream of data that now is collected only sporadically by in-person inspections during site visits.”
As intriguing as these possibilities are, Daconta notes, “The question is the quality of the data we are gathering. Can we trust and reuse it?”
The “unstructured data” from these new sources is more difficult to organize and tag than structured data from transactional databases. And even as government agencies face an avalanche of information from non-traditional sources, “a lot of states are still dealing with management of their transactional data,” says Eric Sweden, program director for enterprise architecture and governance for the National Association of State Chief Information Officers (NASCIO). “Adding Big Data provides a significant level of complexity.”
Sweden says the rise in non-traditional data sources reaffirms the need for Big Data to be managed using a formal data management discipline within a formal enterprise architecture program. A NASCIO report titled “Is Big Data a Big Deal for State Governments?” noted that Big Data brings some additional challenges in terms of data management. For one thing, Big Data gives rise to a new emphasis on data lineage: the path that describes where data was created and by whom, how it was transformed, how it flows, and how it was combined with other data.
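For readers who want to see what a data lineage record might look like in practice, here is a minimal sketch in Python. It is not drawn from the NASCIO report; the class, field and data set names are all hypothetical and chosen only to mirror the four lineage questions: where data was created and by whom, how it was transformed, how it flows, and what it was combined with.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageRecord:
    """Hypothetical lineage metadata attached to one data set."""
    dataset: str
    created_by: str   # who created the data
    origin: str       # where the data was created
    transformations: List[str] = field(default_factory=list)      # how it was transformed
    flows: List[str] = field(default_factory=list)                # systems it has moved through
    parents: List["LineageRecord"] = field(default_factory=list)  # data sets it was combined from

    def transform(self, step: str) -> None:
        """Record one transformation applied to this data set."""
        self.transformations.append(step)

    def move_to(self, system: str) -> None:
        """Record that the data set flowed into another system."""
        self.flows.append(system)


def combine(name: str, created_by: str, origin: str, *sources: LineageRecord) -> LineageRecord:
    """Create a new data set whose lineage points back to its sources."""
    return LineageRecord(dataset=name, created_by=created_by, origin=origin, parents=list(sources))


def trace(record: LineageRecord, depth: int = 0) -> List[str]:
    """Return the lineage tree as indented lines, the newest data set first."""
    lines = ["  " * depth + f"{record.dataset} (created by {record.created_by} at {record.origin})"]
    for parent in record.parents:
        lines.extend(trace(parent, depth + 1))
    return lines


# Illustrative, made-up data sets
sensors = LineageRecord("bridge_sensor_feed", "state DOT", "field sensors")
sensors.transform("dropped readings with missing timestamps")
sensors.move_to("agency data warehouse")

inspections = LineageRecord("inspection_reports", "inspection office", "site visits")

merged = combine("bridge_condition", "analytics team", "agency data warehouse", sensors, inspections)
print("\n".join(trace(merged)))
```

Because each combined data set keeps references to its sources, an analyst can walk back from any result to the feeds and transformations behind it, which is one way to approach the trust-and-reuse question Daconta raises.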
“If you neglect these issues, you paint yourself into a corner,” Sweden says. “If you don’t have master data management, data governance, and data architecture, you will end up with the wild west. People who apply analytics without these things might be able to do some Big Data initiatives that are extremely targeted with clear objectives, but they will bump into the ceiling because they don’t have coordination across the enterprise.”