Remixing government data

 Last year, before he took on the role of federal chief information officer, Vivek Kundra came up with a new twist on the idea of government by the people: Let the people build some public-facing online government applications.

At the time, Kundra was the chief technology officer for Washington, D.C. The city's mayor, Adrian Fenty, was eager to use information technology in ways that would improve the effectiveness of the city's government. Kundra's office had put many innovative programs into play — such as the CapStat Web application that monitored how well city offices performed — but it also had data feeds with no public interface.

Those feeds could allow outside organizations to pull city data, such as information on arrests or recently issued building permits, into their own applications. To get the ball rolling, Kundra created a contest called Apps for Democracy that offered prizes for the most innovative reuse of D.C. data. More than 45 developers submitted applications, far more than the city would have had the funds to commission.

Of course, repackaging government data for education and profit is nothing new. Dozens of businesses generate income by deciphering the notices that fly across the Federal Register and Federal Business Opportunities Web sites every day.

But a recent confluence of technical and political factors portends a much wider use of government data. With Web 2.0 technology, anyone with some coding skills can make their own use of well-formed government data. And with the Obama administration calling for greater government transparency, Kundra wants to replicate D.C.’s success on a national level via the soon-to-be-launched Data.gov site.

Already, efforts by nonprofit groups such as the Sunlight Foundation are prompting the development of more public-facing applications. Some organizations are generating business from the tools, which could help boost the economy — another goal of the Obama administration. Others are building the applications simply to keep themselves and others better informed. But they all have one thing in common: They are repackaging government data in ways the agencies that produce it probably never thought of.

"There are a lot of developers out there who are just looking for big data, and government is a place they can get it from," said Clay Johnson, director of the Sunlight Foundation's Sunlight Labs, which repackages government data so it is easier for others to use. In the long run, advocates say, relying on the private sector might be the way to go for all but the most basic presentations of government information.

"The nongovernmental sector will likely always have more talent and artistic capability than inside the government," Johnson said.

Data everywhere

Entries in last year's Apps for Democracy contest targeted a surprisingly wide variety of platforms. Not only were a fair number developed for the Web, but others were built specifically for social-networking site Facebook or for use on mobile phones.

"The first app submitted was for an iPhone app," said Peter Corbett, chief executive officer of iStrategyLabs, which ran the contest for the city.

The first-prize winner in the organization category was a site called D.C. Historic Tours, developed by Internet marketing company Boalt. The information about area attractions came from the city, but Boalt developers decided how to present it.

"If you hired a digital agency to build this [under a government contract], it would cost $50,000 to $100,000 and take a couple months to get built," Corbett said.

The site uses Google Maps as the basis for enabling users to build their own walking tours of the city. It pulls information from Wikipedia, the Flickr photo-sharing service and a list of historic buildings.

Other winning applications include a Web-based map that returns neighborhood-specific demographic data, crime reports and other information; a service that matches people who want to carpool; and a set of tools that tap into mobile phones’ Global Positioning System capability to point users to the nearest bank, post office, gas station or other resource.
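The article does not describe how those location tools decide which resource is "nearest." One common approach is to compute the great-circle (haversine) distance from the phone's GPS fix to each candidate location and pick the smallest. The Python sketch below illustrates that idea under stated assumptions. It is not the code of any contest entry, and the `Place` record and its fields are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Place:
    """Hypothetical record for one city resource (bank, post office, ...)."""
    name: str
    kind: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(places, kind, lat, lon):
    """Return (place, distance_km) for the closest place of `kind`, or None."""
    candidates = [p for p in places if p.kind == kind]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))
    return best, haversine_km(lat, lon, best.lat, best.lon)
```

For a citywide list of a few thousand locations, a linear scan like this is fast enough on a phone or server; a much larger dataset would call for a spatial index.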

Web development company Development Seed submitted a couple of applications for the contest, including Stumble Safely, a Web site with a map of the city's bars, subway stops and crime hot spots. Late-night revelers can consult the site for the best route to take from a bar to a subway station without walking into an area that has recently had a lot of crime activity. Another closely related site is D.C. Bikes, which mixes bike-related classifieds from Craigslist with a map of recent bicycle thefts in the city.

"We go after government data a lot because the details matter," said Eric Gundersen, president of Development Seed. “If you're doing data visualization, you really want to control the details. Getting access to the raw shape files is allowing us to do that."

For Stumble Safely and D.C. Bikes, the company developed a script that downloads the city's crime reports every day. Officials post the reports as a compressed archive, called a tarball, containing data in comma-separated values (CSV) format, which databases can ingest easily.
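Development Seed's actual script is not published in the article, but the daily download step can be sketched with Python's standard library: fetch the archive, open it with `tarfile`, and read every CSV file inside with `csv.DictReader`. The feed URL below is a hypothetical placeholder, and the column names depend on whatever the city's files actually contain.

```python
import csv
import io
import tarfile
import urllib.request

# Hypothetical placeholder: the real D.C. feed location is not given in the article.
CRIME_FEED_URL = "https://example.gov/crime/crime_incidents.tar.gz"


def download_feed(url=CRIME_FEED_URL):
    """Fetch the compressed archive and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def read_csv_rows(archive_bytes):
    """Yield one dict per row from every .csv file inside a tarball."""
    # Mode "r:*" lets tarfile detect gzip, bzip2 or no compression.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = tar.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Column names come from each file's header row.
            yield from csv.DictReader(text)
```

A nightly scheduled job could call `download_feed()`, pass the bytes to `read_csv_rows()`, and load the resulting rows into a database for the map layer.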

Both sites link the geographic coordinates of crime data to a map of the city, developed with an open-source program called Mapnik. For Stumble Safely, the city’s data feed for liquor licenses is parsed to identify bars’ locations. The open-source Drupal content management system provides the Web platform for presenting the data on both sites.
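The article does not say how Stumble Safely defines a crime "hot spot." As one simple illustration, not Development Seed's method, the sketch below snaps crime coordinates to a fixed latitude/longitude grid, counts incidents per cell, and flags bars (picked out of the liquor-license records) that sit in a busy cell. The `license_class` field, its values, the grid size and the threshold are all hypothetical assumptions; rendering the result on a map, which the sites do with Mapnik, is not shown.

```python
import math
from collections import Counter

# Hypothetical cell size: 0.005 degrees, roughly 0.5 km on a side at D.C.'s latitude.
CELL_DEG = 0.005


def cell_of(lat, lon, size=CELL_DEG):
    """Snap a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def hot_cells(crimes, threshold):
    """Return grid cells containing at least `threshold` recent crimes."""
    counts = Counter(cell_of(c["lat"], c["lon"]) for c in crimes)
    return {cell for cell, n in counts.items() if n >= threshold}


def bars_from_licenses(licenses, bar_classes=("Tavern", "Nightclub")):
    """Keep liquor licenses whose (hypothetical) class marks them as bars."""
    return [lic for lic in licenses if lic["license_class"] in bar_classes]


def flag_bars(bars, crimes, threshold=3):
    """Pair each bar's name with True if it sits in a crime hot-spot cell."""
    hot = hot_cells(crimes, threshold)
    return [(b["name"], cell_of(b["lat"], b["lon"]) in hot) for b in bars]
```

A fixed grid is crude (a bar near a cell edge can miss a cluster next door), but it keeps the computation to a single pass over each day's crime file.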

The interests of its employees prompted Development Seed’s choice of projects. Most of them bike to work, and a few enjoy having a drink in the evening, Gundersen said. So making the sites was a natural fit. They took four developers a total of three days to create. Rumor has it that even the D.C. cops use Stumble Safely to track crime hot spots.

"D.C. government never would have paid for a bar site, but talk about positive externalities," Gundersen said. “This is a great example of wild efficiencies that are possible when government opens the data.”

Development Seed is also building a map-based site for the nonprofit New America Foundation. The company is pulling geographic information system data from the Census Bureau to create a map of the country's 14,000 school districts.

"There are huge wins by getting this out there," Gundersen said.

Such mashups are fairly simple to build, at least for those with the know-how. In fact, about 60 percent of the Apps for Democracy submissions came from individuals, Corbett said. For individual developers, such contests can help boost their reputations in the development community. For companies, the competitions provide a platform for showing off their design skills and perhaps earning a new source of revenue if they can monetize the sites they develop.

Plus, who wouldn't feel good about helping people get access to valuable government information?

Impressed by the results of Apps for Democracy, the Sunlight Foundation ran its own contest, called Apps for America.

Like D.C., Sunlight offers a number of data feeds that use Extensible Markup Language (XML) and application programming interfaces. Sunlight's APIs and datasets draw and reformat data from sources such as the Census Bureau, the Federal Election Commission and the Senate Office of Public Records. The group has funded or developed a number of public databases of congressional information, including OpenCongress.org, FedSpending.org, OpenSecrets.org and EarmarkWatch.org.

Sunlight's goal for its contest was to have outside developers provide new applications that citizens could use to better understand congressional activity. The organization received submissions for more than 40 open-source applications. It announced the winners last month.

First prize, which came with a cash award of $15,000, went to an application called Filibusted, which tracks senators' use of parliamentary procedures to block votes on legislation. Web interface developer Andrew Dupont created the site.

Other notable entries included Congress Cal, a Google-based calendar with details about bills before Congress; Word on the Street, an application for mobile phones that pinpoints the user's location and displays information on the elected congressional officials for that area; and Defogger, a service that can scan an article or blog post about a political topic and return information on the people and organizations it discusses.

PointAbout, a company that took part in both contests, has developed a common runtime platform for smart phones so it can build applications for its clients that work on any brand of phone. Getting involved in the contest was a good way to show off the tool’s capabilities, said Scott Suhy, PointAbout’s president.

The company submitted an application called iLegislator, which could be seen as a mini-encyclopedia of congressional representatives. It offers a wealth of information that can be accessed on the go and can even pinpoint the user’s location and identify which legislators cover that area.

Using the JavaScript Object Notation data interchange format, the application taps into an API from Sunlight Labs that returns the names of congressional representatives when it is given a set of GPS coordinates. Other APIs connect users to legislators' Twitter feeds and YouTube videos and present information on bills they've sponsored, phone numbers and e-mail addresses.
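The lookup iLegislator performs (send GPS coordinates, get back legislators as JSON) can be sketched in Python with the standard library. The article does not give the Sunlight Labs endpoint, its parameter names or its response layout, so `API_BASE`, the `latitude`/`longitude`/`apikey` parameters and the JSON shape assumed in `parse_legislators` are all hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; the real API's URL and parameters are not given in the article.
API_BASE = "https://api.example.org/legislators/locate"


def build_query_url(lat, lon, api_key, base=API_BASE):
    """Encode the GPS coordinates and API key as query-string parameters."""
    params = urllib.parse.urlencode({"latitude": lat, "longitude": lon, "apikey": api_key})
    return f"{base}?{params}"


def parse_legislators(payload):
    """Extract display names from an assumed JSON shape:
    {"legislators": [{"title": ..., "first_name": ..., "last_name": ...}, ...]}
    """
    data = json.loads(payload)
    return [f"{m['title']}. {m['first_name']} {m['last_name']}" for m in data["legislators"]]


def legislators_near(lat, lon, api_key):
    """Call the (hypothetical) API and return the legislators for a location."""
    with urllib.request.urlopen(build_query_url(lat, lon, api_key), timeout=30) as resp:
        return parse_legislators(resp.read().decode("utf-8"))
```

Keeping URL construction and JSON parsing in separate functions means each can be checked offline, without a network connection or API key.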

Elsewhere on the Web

Although the Obama administration's call for transparency is a driving force behind new uses for government data, the trend has been growing for a while.

For example, the GovTrack Web site has been offering updates on congressional bills and information on committee members and their voting records since 2004. Joshua Tauberer, the site’s developer, is currently a doctoral candidate in linguistics at the University of Pennsylvania but still runs the site in his spare time.

All the information GovTrack offers can be obtained directly from government Web sites, such as the Library of Congress' online repository for legislative information called Thomas. But GovTrack gets 20,000 to 30,000 unique visitors a day because it is so easy to use.

Tauberer got the idea when he was taking a class in copyright law and technology. "I was looking into what kinds of information you could find about laws online, and I saw that Thomas had a ton of information, but as a newcomer to this world of politics, it was really opaque for me," he said. "I thought much more could be done with this large quantity of data than what the government was doing with it."

GovTrack gets most of its data from Thomas and other government sources using a low-tech procedure called screen scraping. Periodically, Perl scripts grab entire Web pages and repackage the information. Tauberer said setting up that process took some effort and was complicated by the fact that the Library of Congress does not offer its information as an RSS feed, which would greatly simplify the process of reusing data.
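GovTrack's scrapers are Perl scripts; the sketch below shows the same screen-scraping idea in Python using the standard library's `html.parser`. It collects the text of links whose address matches a pattern. The `/bill/` pattern and the sample markup are hypothetical, since the article does not describe the structure of Thomas's pages, and real scrapers must be updated whenever that markup changes.

```python
import urllib.request
from html.parser import HTMLParser


class BillLinkParser(HTMLParser):
    """Collect (href, link text) for anchors whose href contains `pattern`."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.links = []
        self._href = None   # href of the matching <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.pattern in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch(url):
    """Download a page as text (no retries or rate limiting in this sketch)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape(html, pattern="/bill/"):
    """Return (href, text) pairs for every matching link in a page."""
    parser = BillLinkParser(pattern)
    parser.feed(html)
    parser.close()
    return parser.links
```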

Another site that has been operational for a while is OpenRegs.com, which reformats much of the information from the Federal Register and makes it easier to search by grouping it according to agency or topic. The Government Printing Office's site, Federal Digital System, only allows users to browse the information by date. OpenRegs gives users more options, including the ability to search through new regulations that apply only to a certain agency in a given month.

The service taps into the Federal Register RSS feeds provided by GPO, said Peter Snyder, who maintains the OpenRegs site. Like many low-cost Web 2.0 offerings, OpenRegs relies on an open-source software stack. The command-line tool cURL fetches the RSS feed every night, and the site captures information such as a rule’s topic, description, contact information and status. The data is then processed in PHP using the SimpleXML extension and saved in a MySQL database, from which it is re-rendered for the Web.
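OpenRegs' pipeline uses cURL, PHP's SimpleXML and MySQL. To keep a sketch self-contained, the version below swaps in Python's standard library: `xml.etree.ElementTree` parses standard RSS 2.0 `<item>` elements, and `sqlite3` stands in for MySQL. It also shows the agency-in-a-given-month search described above. Treating `<category>` as the issuing agency is an assumption; the article does not list the Federal Register feed's actual fields.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_items(rss_text):
    """Turn each RSS 2.0 <item> into a dict of the fields kept here."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        # Store the publication month as "YYYY-MM" so rules can be filtered by month.
        month = parsedate_to_datetime(pub).strftime("%Y-%m") if pub else ""
        items.append({
            "link": item.findtext("link", default=""),
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            # Assumption: the issuing agency is carried in <category>.
            "agency": item.findtext("category", default="Unknown"),
            "month": month,
        })
    return items


def save(conn, items):
    """Upsert parsed items into a local table keyed by link."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rules ("
        "link TEXT PRIMARY KEY, title TEXT, description TEXT, agency TEXT, month TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO rules (link, title, description, agency, month) "
        "VALUES (:link, :title, :description, :agency, :month)",
        items,
    )
    conn.commit()


def rules_for(conn, agency, month):
    """Titles of rules from one agency published in one month."""
    rows = conn.execute(
        "SELECT title FROM rules WHERE agency = ? AND month = ? ORDER BY title",
        (agency, month),
    )
    return [row[0] for row in rows]
```

Keying the table on each item's link means re-running the nightly import over an overlapping feed updates existing rows instead of duplicating them.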

With data reuse flourishing, the question arises: How much of it is because government agencies aren't presenting data well?

Most contest participants agree that there is no way for agencies to present data in all the contexts and forms in which it could be used. Instead, they advocate having the government focus on making data available in machine-accessible formats so interested parties can package it in the way that best serves the intended audience — either for profit or simply because they want to make the data available.

"There is a certain level of service that government agencies need to provide," Tauberer said. “It's important that the Library of Congress has a site like Thomas, to provide uniform access to people who need the information. The next priority is making the data available."

"It would be wonderful if the federal government provided the information in a way that it would be easy to see and we didn't need third parties to come in and fill the gap," Snyder said. However, he understands how cumbersome and expensive it could be for agencies to offer the material in such a complete form and sees the merit of having a third party do the work.

"Maybe federal bureaucracy is not the best way to make Web sites," he said. "As long as the information is there, I don't need them to make me an iPhone application."
