DATA MANAGEMENT

Remixing government data

Through mashups and Web apps, third parties are making innovative use of agencies' information

Last year, before he took on the role of federal chief information officer, Vivek Kundra came up with a new twist on the idea of government by the people: Let the people build some public-facing online government applications.

At the time, Kundra was the chief technology officer for Washington, D.C. The city's mayor, Adrian Fenty, was eager to use information technology in ways that would improve the effectiveness of the city's government. Kundra's office had put many innovative programs into play — such as the CapStat Web application that monitored how well city offices performed — but it also had data feeds with no public interface.

Those feeds could allow outside organizations to pull city data, such as information on arrests or recently issued building permits, into their own applications. To get the ball rolling, Kundra created a contest called Apps for Democracy that offered prizes for the most innovative reuse of D.C. data. More than 45 developers submitted applications, far more than the city would have had the funds to commission.

Of course, repackaging government data for education and profit is nothing new. Dozens of businesses generate income by deciphering the notices that fly across the Federal Register and Federal Business Opportunities Web sites every day.

But a recent confluence of technical and political factors portends a much wider use of government data. With Web 2.0 technology, anyone with some coding skills can make their own use of well-formed government data. And with the Obama administration calling for greater government transparency, Kundra wants to replicate D.C.’s success on a national level via the soon-to-be-launched Data.gov site.

Already, efforts by nonprofit groups such as the Sunlight Foundation are prompting the development of more public-facing applications. Some organizations are generating business from the tools, which could help boost the economy — another goal of the Obama administration. Others are building the applications simply to keep themselves and others better informed. But they all have one thing in common: They are repackaging government data in ways the agencies that produce it probably never thought of.

"There are a lot of developers out there who are just looking for big data, and government is a place they can get it from," said Clay Johnson, director of the Sunlight Foundation's Sunlight Labs, which repackages government data so it is easier for others to use. In the long run, advocates say, relying on the private sector might be the way to go for all but the most basic presentations of government information.

"The nongovernmental sector will likely always have more talent and artistic capability than inside the government," Johnson said.

Data everywhere

Entries in last year's Apps for Democracy contest targeted a surprisingly wide variety of platforms. A fair number were developed for the Web, but others were built specifically for the social-networking site Facebook or for use on mobile phones.

"The first app submitted was for an iPhone app," said Peter Corbett, chief executive officer of iStrategyLabs, which ran the contest for the city.

The first-prize winner in the organization category was a site called D.C. Historic Tours, developed by Internet marketing company Boalt. The information about area attractions came from the city, but Boalt developers decided how to present it.

"If you hired a digital agency to build this [under a government contract], it would cost $50,000 to $100,000 and take a couple months to get built," Corbett said.

The site uses Google Maps as the basis for enabling users to build their own walking tours of the city. It pulls information from Wikipedia, the Flickr photo-sharing service and a list of historic buildings.

Other winning applications include a Web-based map that returns neighborhood-specific demographic data, crime reports and other information; a service that matches people who want to carpool; and a set of tools that tap into mobile phones’ Global Positioning System capability to point users to the nearest bank, post office, gas station or other resource.

Web development company Development Seed submitted a couple of applications for the contest, including Stumble Safely, a Web site with a map of the city's bars, subway stops and crime hot spots. Late-night revelers can consult the site for the best route to take from a bar to a subway station without walking into an area that has recently had a lot of crime activity. Another closely related site is D.C. Bikes, which mixes bike-related classifieds from Craigslist with a map of recent bicycle thefts in the city.

"We go after government data a lot because the details matter," said Eric Gundersen, president of Development Seed. “If you're doing data visualization, you really want to control the details. Getting access to the raw shape files is allowing us to do that."

For Stumble Safely and D.C. Bikes, the company developed a script that downloads the city's crime reports every day. Officials post a compressed version of the material, called a tarball, in a comma-separated values format, which is easy for databases to ingest.
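As a rough illustration of that ingest step, the Python sketch below unpacks a gzip-compressed tarball and reads a CSV file from it into a list of records. It is not Development Seed's script: the file name, column names and sample rows are made up, and the archive is built in memory to stand in for a daily download.

```python
import csv
import io
import tarfile


def read_csv_from_tarball(tar_bytes, member_name):
    """Extract one CSV file from a gzip-compressed tarball and return its rows as dicts."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as archive:
        extracted = archive.extractfile(member_name)
        text = io.TextIOWrapper(extracted, encoding="utf-8", newline="")
        return list(csv.DictReader(text))


def build_sample_tarball():
    """Build a tiny in-memory tarball standing in for a downloaded feed (made-up data)."""
    payload = (
        "offense,latitude,longitude,report_date\n"
        "THEFT,38.9072,-77.0369,2009-05-01\n"
        "ROBBERY,38.9101,-77.0312,2009-05-02\n"
    ).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="crime_reports.csv")  # hypothetical file name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


rows = read_csv_from_tarball(build_sample_tarball(), "crime_reports.csv")
print(len(rows), rows[0]["offense"])
```

Because each row comes back as a dictionary keyed by column header, loading the records into a database table is a short step from here.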

Both sites link the geographic coordinates of crime data to a map of the city, developed with an open-source program called Mapnik. For Stumble Safely, the city’s data feed for liquor licenses is parsed to identify bars’ locations. The open-source Drupal content management system provides the Web platform for presenting the data on both sites.
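The idea behind that pairing of crime coordinates with bar locations can be sketched without any mapping software. The Python example below uses a plain great-circle distance test, not Mapnik and not the company's actual method, to find crime reports within a given radius of a bar. The coordinates, field names and the 250-meter radius are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def crimes_near(place, crimes, radius_m=250):
    """Return the crime records that fall within radius_m of the given place."""
    return [
        crime for crime in crimes
        if haversine_m(place["lat"], place["lon"], crime["lat"], crime["lon"]) <= radius_m
    ]


# Made-up sample data: one bar and two crime reports.
bar = {"name": "Example Tavern", "lat": 38.9170, "lon": -77.0310}
crimes = [
    {"offense": "THEFT", "lat": 38.9172, "lon": -77.0312},    # a few dozen meters away
    {"offense": "ROBBERY", "lat": 38.9000, "lon": -77.0500},  # roughly 2 km away
]

nearby = crimes_near(bar, crimes)
print([crime["offense"] for crime in nearby])
```

A site that draws hot spots on a map would do this kind of matching at a larger scale, typically inside a spatial database or the map renderer itself.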

The interests of its employees prompted Development Seed’s choice of projects. Most of them bike to work, and a few enjoy having a drink in the evening, Gundersen said. So making the sites was a natural fit. They took four developers a total of three days to create. Rumor has it that even the D.C. cops use Stumble Safely to track crime hot spots.

"D.C. government never would have paid for a bar site, but talk about positive externalities," Gundersen said. “This is a great example of wild efficiencies that are possible when government opens the data.”

Development Seed is also building a map-based site for the nonprofit New America Foundation. The company is pulling geographic information system data from the Census Bureau to create a map of the country's 14,000 school districts.

"There are huge wins by getting this out there," Gundersen said.

Such mashups are fairly simple to build, at least for those with the know-how. In fact, about 60 percent of the Apps for Democracy submissions came from individuals, Corbett said. For them, such contests could help boost their reputations in the development community. For companies, the competitions provide a platform for showing off their design skills and perhaps earning a new source of revenue if they can monetize the sites they develop.

Plus, who wouldn't feel good about helping people get access to valuable government information?

Impressed by the results of Apps for Democracy, the Sunlight Foundation ran its own contest, called Apps for America.

Like D.C., Sunlight offers a number of data feeds that use Extensible Markup Language (XML) and application programming interfaces (APIs). Sunlight's APIs and datasets draw and reformat data from sources such as the Census Bureau, the Federal Election Commission and the Senate Public Relations Office. The group has funded or developed a number of public databases of congressional information, including OpenCongress.org, FedSpending.org, OpenSecrets.org and EarmarkWatch.org.

Sunlight's goal for its contest was to have outside developers provide new applications that citizens could use to better understand congressional activity. The organization received submissions for more than 40 open-source applications. It announced the winners last month.

First prize, which came with a cash award of $15,000, went to an application called Filibusted, which tracks senators' use of parliamentary procedures to block votes on legislation. Web interface developer Andrew Dupont created the site.

Other notable entries included Congress Cal, a Google-based calendar with details about bills before Congress; Word on the Street, an application for mobile phones that pinpoints the user's location and displays information on the elected congressional officials for that area; and Defogger, a service that can scan an article or blog post about a political topic and return information on the people and organizations it discusses.

PointAbout, a company that took part in both contests, has developed a common runtime platform for smart phones so it can build applications for its clients that work on any brand of phone. Getting involved in the contest was a good way to show off the tool’s capabilities, said Scott Suhy, PointAbout’s president.

The company submitted an application called iLegislator, which could be seen as a mini-encyclopedia of congressional representatives. It offers a wealth of information that can be accessed on the go and can even pinpoint the user’s location and identify which legislators cover that area.

Using the JavaScript Object Notation data interchange format, the application taps into an API from Sunlight Labs that returns the names of congressional representatives when it is given a set of GPS coordinates. Other APIs connect users to legislators' Twitter feeds and YouTube videos and present information on bills they've sponsored, phone numbers and e-mail addresses.
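To give a sense of what handling such a response involves, here is a minimal Python sketch that parses a JSON payload and pulls out legislators' names. The payload shape, field names and people are invented for illustration; they are not the actual format of the Sunlight Labs API, and no network call is made.

```python
import json

# Hypothetical response for a latitude/longitude lookup (made-up structure and data).
SAMPLE_RESPONSE = """
{
  "query": {"latitude": 38.8899, "longitude": -77.0091},
  "legislators": [
    {"name": "Jane Example", "chamber": "senate", "phone": "202-555-0100"},
    {"name": "John Placeholder", "chamber": "house", "phone": "202-555-0101"}
  ]
}
"""


def legislator_names(response_text, chamber=None):
    """Return legislator names from a JSON response, optionally filtered by chamber."""
    payload = json.loads(response_text)
    return [
        person["name"]
        for person in payload.get("legislators", [])
        if chamber is None or person.get("chamber") == chamber
    ]


print(legislator_names(SAMPLE_RESPONSE))
print(legislator_names(SAMPLE_RESPONSE, chamber="house"))
```

Because JSON maps directly onto lists and dictionaries, a mobile application needs very little code to turn a response like this into a screen of names and phone numbers.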

Elsewhere on the Web

Although the Obama administration's call for transparency is a driving force behind new uses for government data, the trend has been growing for a while.

For example, the GovTrack Web site has been offering updates on congressional bills and information on committee members and their voting records since 2004. Joshua Tauberer, the site’s developer, is currently a doctoral candidate in linguistics at the University of Pennsylvania but still runs the site in his spare time.

All the information GovTrack offers can be obtained directly from government Web sites, such as the Library of Congress' online repository for legislative information called Thomas. But GovTrack gets 20,000 to 30,000 unique visitors a day because it is so easy to use.

Tauberer got the idea when he was taking a class in copyright law and technology. "I was looking into what kinds of information you could find about laws online, and I saw that Thomas had a ton of information, but as a newcomer to this world of politics, it was really opaque for me," he said. "I thought much more could be done with this large quantity of data than what the government was doing with it."

GovTrack gets most of its data from Thomas and other government sources using a low-tech procedure called screen scraping. Periodically, Perl scripts grab entire Web pages and repackage the information. Tauberer said setting up that process took some effort and was complicated by the fact that the Library of Congress does not offer its information as an RSS feed, which would greatly simplify the process of reusing data.
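Screen scraping of this kind can be sketched in a few lines. The example below is in Python rather than Perl and runs against an invented HTML snippet, not a real Thomas page; the markup, class names and bill titles are assumptions. It shows the basic technique: walk the page's tags and keep only the pieces that matter.

```python
from html.parser import HTMLParser

# Made-up page fragment standing in for a downloaded Web page.
SAMPLE_PAGE = """
<html><body>
<h1>Bills introduced</h1>
<ul>
  <li class="bill"><a href="/bill/1">H.R. 1: Sample Act</a></li>
  <li class="bill"><a href="/bill/2">S. 2: Placeholder Act</a></li>
  <li class="nav"><a href="/help">Help</a></li>
</ul>
</body></html>
"""


class BillLinkScraper(HTMLParser):
    """Collect link text and URLs found inside <li class="bill"> items."""

    def __init__(self):
        super().__init__()
        self.bills = []
        self._in_bill_item = False
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "bill":
            self._in_bill_item = True
        elif tag == "a" and self._in_bill_item:
            self._current_href = attributes.get("href")

    def handle_data(self, data):
        # Only record text that appears inside a link within a bill item.
        if self._current_href is not None and data.strip():
            self.bills.append({"title": data.strip(), "url": self._current_href})

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None
        elif tag == "li":
            self._in_bill_item = False


scraper = BillLinkScraper()
scraper.feed(SAMPLE_PAGE)
bills = scraper.bills
print(bills)
```

The fragility is easy to see: if the site changes its markup, the scraper silently stops finding bills, which is one reason developers prefer structured feeds.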

Another site that has been operational for a while is OpenRegs.com, which reformats much of the information from the Federal Register and makes it easier to search by grouping it according to agency or topic. The Government Printing Office's site, Federal Digital System, only allows users to browse the information by date. OpenRegs gives users more options, including the ability to search through new regulations that apply only to a certain agency in a given month.

The service taps into the Federal Register RSS feeds provided by GPO, said Peter Snyder, who maintains the OpenRegs site. Like many low-cost Web 2.0 offerings, OpenRegs relies on an open-source software stack. An application called cURL fetches the RSS feed every night and captures information such as a rule’s topic, description, contact information and status. The data is then re-rendered for the Web in PHP with a SimpleXML extension and saved in a MySQL database.
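A pipeline of that shape (fetch a feed, parse the XML, store the records, query them by agency) can be approximated as follows. This Python sketch swaps in the standard library's ElementTree and SQLite for PHP's SimpleXML and MySQL, skips the fetch step and uses an invented feed. Treating the RSS category element as the agency name is an assumption, not a description of GPO's feeds.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up RSS fragment standing in for a nightly download.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Sample regulatory feed</title>
    <item>
      <title>Sample Rule on Widgets</title>
      <description>Placeholder summary of a proposed rule.</description>
      <category>Example Agency</category>
    </item>
    <item>
      <title>Sample Notice on Gadgets</title>
      <description>Placeholder summary of a notice.</description>
      <category>Example Agency</category>
    </item>
    <item>
      <title>Sample Rule on Gizmos</title>
      <description>Placeholder summary of a final rule.</description>
      <category>Other Agency</category>
    </item>
  </channel>
</rss>
"""


def parse_items(feed_text):
    """Return (title, description, agency) tuples for each item in the feed."""
    root = ET.fromstring(feed_text.strip())
    return [
        (
            item.findtext("title", default=""),
            item.findtext("description", default=""),
            item.findtext("category", default=""),
        )
        for item in root.iter("item")
    ]


def store_items(connection, items):
    """Create the table if needed and insert the parsed items."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS rules (title TEXT, description TEXT, agency TEXT)"
    )
    connection.executemany("INSERT INTO rules VALUES (?, ?, ?)", items)
    connection.commit()


connection = sqlite3.connect(":memory:")
store_items(connection, parse_items(SAMPLE_FEED))
by_agency = connection.execute(
    "SELECT agency, COUNT(*) FROM rules GROUP BY agency ORDER BY agency"
).fetchall()
print(by_agency)
```

Once the items are in a database, browsing by agency or topic rather than by date is a matter of writing a different query.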

With data reuse flourishing, the question arises: How much of it is because government agencies aren't presenting data well?

Most contest participants agree that there is no way for agencies to present data in all the contexts and forms in which it could be used. Instead, they advocate having the government focus on making data available in machine-accessible formats so interested parties can package it in the way that best serves the intended audience, whether for profit or simply because they want to make the data available.

"There is a certain level of service that government agencies need to provide," Tauberer said. "It's important that the Library of Congress has a site like Thomas, to provide uniform access to people who need the information. The next priority is making the data available."

"It would be wonderful if the federal government provided the information in a way that it would be easy to see and we didn't need third parties to come in and fill the gap," Snyder said. However, he understands how cumbersome and expensive it could be for agencies to offer the material in such a complete form and sees the merit of having a third party do the work.

"Maybe federal bureaucracy is not the best way to make Web sites," he said. "As long as the information is there, I don't need them to make me an iPhone application."
