CIO Council set up Data.gov in two months, and third parties are putting the data to use

GCN Awards: Data.gov was a cyber shot heard around the world. It kicked off a revolution that is now unfolding across the globe: the easy availability of government-generated data.

Not bad for a simple Web site.

Earlier this year, newly minted federal Chief Information Officer Vivek Kundra had the idea for the site, which offers an index of data generated by government agencies and rendered in open, machine-readable formats such as Extensible Markup Language. During Kundra's tenure as chief technology officer for Washington, D.C., the city set up a Web site that offered raw feeds of the data the city routinely collected, such as crime reports and permit applications.

The idea was that outside developers would create public-oriented applications that reused these feeds. And once the data was available, the developers did indeed come. A city-sponsored contest, called Apps for Democracy, resulted in dozens of applications that the city itself would not have had the money to build, such as StumbleSafely, an iPhone application that used crime reports to show late-night tavern patrons the safest neighborhoods.

Cross-agency collaboration: The team behind Data.gov includes, front row, left to right: Office of Management and Budget Chief Architect Kshemendra Paul, Environmental Protection Agency CIO Linda Travers, federal CIO Vivek Kundra and Interior Department CIO Sonny Bhagowalia.


The incoming Obama administration had campaigned on government transparency, and Data.gov, which Kundra announced in March, would be one of the first initiatives to embody this idea. The concept would be the same as the D.C. version: a one-stop shop for raw government data feeds. By exposing these feeds, the White House hoped that, as in D.C., outside parties would pick them up and reuse them.

It fell to the Federal Chief Information Officers Council to set up and manage Data.gov. The site was set up in less than two months, a surprisingly short time given the usual deliberation that accompanies government projects. It was announced May 21, exactly four months after President Obama took office and signed the Open Government Directive. Heading up the program team are two executives from the Federal CIO Council: Sanjeev "Sonny" Bhagowalia, chief information officer for the Interior Department, and Linda Travers, the acting chief information officer for the Environmental Protection Agency. Volunteers from the General Services Administration, the Office of Management and Budget and other agencies helped out, as did a few key contractors.

What was the key to setting up the site so quickly? Keeping the team small, establishing a project plan and then sticking to that plan while quickly adapting to any needed changes. "We chose to do it small and fast," Bhagowalia said. "The best projects are done through agile management, where everyone is [a] true believer and all [share] a common vision." The team kept its daily phone meetings to a 30-minute time limit.

"This was the power of collaboration and belief done in Internet time," Bhagowalia said. Keep in mind that each team member was working on Data.gov in addition to his or her own job. Cloud computing helped speed the setup: The cloud handled 92 percent of the traffic and absorbed both the initial surge and daily utilization with ease. The Web site is hosted in a Terremark facility, with additional hosting handled by Akamai. Open-source technologies were also used to power the site, including Red Hat Enterprise Linux, the Apache Web server software, PHP and MySQL. The site's look and feel was coordinated with that of the White House site, as a way to simplify the user interface.

The debut featured only 76 data sets, a relatively small number that drew some initial criticism. However, the idea was to get the word out to agencies that the federal government now had a central portal that could publicize agency data feeds when they came online. And within the month, more than 100,000 additional data sets were added. Want to know the buying habits of U.S. citizens, based on expenditures or income? Results from the Bureau of Labor Statistics' Consumer Expenditure Survey can be found on the site. Want a summary of all the criminal activity that took place in the United States? Results from the Justice Department's Uniform Crime Reporting Program have it. In addition to the raw data feeds, Data.gov offers a number of other features, such as the ability for users to rate the data sets and additional tools for viewing them.

John Wonderlich, the policy director for the Sunlight Foundation, a nonprofit organization dedicated to increasing government transparency, said the development of Data.gov is significant on several fronts. First, the presence of the site indicates that the government understands that its audience is changing. "Federal government is recognizing the value of citizen developers interested in data, which is new," Wonderlich said. Second, the site shows that "the government, as a whole, recognizes that the data they have is valuable, and is taking [a] strong management position towards making that data publicly accessible."

Soon after Data.gov launched, the foundation’s Sunlight Labs set up a contest, similar to D.C.'s, seeking applications that reused government data. Again, dozens of submissions came in. One entry, called Jobless, charts the unemployment rates of any state over the past 33 years, allowing users to compare multiple states on a single chart. Another entry, a Web application called FlyOnTime.us, allows users to enter a U.S. flight number and see, by percentage, how often that flight is on time. It generates this data using Federal Aviation Administration records.

"There are already all sorts of really neat and interesting uses of the data that are out there," Kundra said.

Thus far, Data.gov has gotten over 30 million Web log hits, as well as accolades from the press and observers.

It is one of a number of new federal Web sites that represent "a quiet revolution in the way the federal bureaucracy works that may change our view of government for the better," opined Silicon Valley Mercury News columnist Chris O'Brien. Data.gov's influence has been recognized around the world. Representatives from the Australian government have met with the team to discuss how they can set up their own government data site. Following the United States' lead, Britain, Canada and Estonia are all setting up their own data registries as well.

"With Data.gov, we're just in the beginning stages of laying a new foundation around transparency, as well as allowing the American people to create innovative solutions to some of the problems we face in this country," Kundra said.

Reader Comments

Thu, Oct 22, 2009

To amplify the significance of the first Reader Comment, who has validated the confidentiality, availability and integrity of the data.gov system? What are the results for the system's C, I and A? Or when is such validation planned?

Thu, Oct 22, 2009 Paul Arlington, VA

Has Data.gov undergone any accreditation to run, through a Federal C&A process, such as NIST? Has Mr. Kundra established any policy on what data can/cannot or will/will not be posted to the site? If he has such a policy, will he post it to the site as part of his transparency effort?
