Visualization tools improve transparency by making sense of raw data
A picture is worth a thousand words, but a pie chart gets into the big numbers, telling people, for example, how their tax dollars are being spent.
Less than a month after launching Data.gov, a registry of raw government data, federal Chief Information Officer Vivek Kundra announced a new initiative, the IT Dashboard, now in beta on the USASpending.gov site.
The information on the site is nothing new. Data on how government will spend $76 billion on IT has been available online for some time, via the Federal Procurement Data System. What Kundra wants to do is present the data so it can be readily understood, even analyzed.
Using Adobe Flash, the site shows in pie charts how much each agency spends on IT. Click on one agency, and the site will reveal bar charts that show how well it is sticking to its budget and timelines for completing IT projects.
For Data.gov, agencies have placed thousands of data feeds as comma-separated values (CSV) files or Really Simple Syndication (RSS) feeds. Although that is a good step toward greater transparency, agencies could also start thinking about ways to better present the data so the public and fellow government employees can make better sense of all the material. A new crop of visualization tools is coming on the market to make that job easier.
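To illustrate the gap between a raw CSV feed and a chart, the following Python sketch totals spending per agency and converts the totals into the percentage shares a pie chart would display. The sample data and the column names `agency` and `it_spending_musd` are assumptions made for illustration; they are not taken from any actual Data.gov feed.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample feed; real feeds will use different columns and values.
SAMPLE_CSV = """agency,it_spending_musd
Agency A,300
Agency B,100
Agency A,200
Agency C,400
"""


def spending_shares(csv_text):
    """Return {agency: percent of total spending}, rounded to one decimal."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["agency"]] += float(row["it_spending_musd"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to chart
    return {
        agency: round(100 * amount / grand_total, 1)
        for agency, amount in totals.items()
    }


shares = spending_shares(SAMPLE_CSV)
for agency, percent in sorted(shares.items()):
    print(f"{agency}: {percent}% of total")
```

The resulting dictionary could be handed to whatever charting tool an agency already uses; the aggregation step is the same regardless of how the slices are drawn.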
"The whole point of doing data visualization is to take massive quantities of data and present it in a manner that helps the user get, in a single rapid reading, the story you are trying to tell in that dataset," said Andy Hoskinson, vice president of technology strategy and consulting at Unisys.
With the administration calling for more transparency, agency interest in visualization is growing, said Susie Adams, chief technology officer at Microsoft's federal division.
"Agencies are being graded about how transparent they are, and a quick and easy win is to get some data and visualize it on the Web site," Adams said. As a result, visualization "is much more of a priority than it has been in the past."
Act of inspiration
One of the big motivators for looking into visualizing information has been the American Recovery and Reinvestment Act, which will allow federal agencies to invest about $800 billion in areas of economic growth, such as health care and environmental technologies. The bill, signed into law in February, comes with stiff reporting requirements, which have spurred many agencies to look at visualization techniques that they can use to show the public how they are spending money.
"In the past, legislation around new categories of federal spending has generated reporting requirements for federal agencies, so agencies have always been keyed into doing this kind of reporting," Hoskinson said. "What is new about the recovery act is that now detailed reporting must be done with recipients."
After the bill was passed, Unisys pulled together a package that will help agencies present data to the public. Called the Recovery Act Reporting product, it can take stimulus act data and report it in Web-friendly formats such as histograms, pie charts, bar charts, score cards, dashboards and point maps. Unisys offers it as a hosted service, or agencies can run it in-house.
The application is a Java Enterprise Edition-based Web application. The package can pull data about an agency from USASpending.gov and Recovery.gov, via an Extensible Markup Language-based data feed. "We pull the data down, parse the XML data structure and we put the results in a database," Hoskinson said. Once it is in a structured database, the material can be reconciled against the agency's internal records and subsequently put into visual form. Spending data can be placed in histograms or pie charts. Location data can be placed on a map.
As a result, people interested in knowing what stimulus projects are happening in their ZIP code or lawmakers who want to know where stimulus dollars are being spent in their districts can get that information quickly.
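The pull, parse and store sequence Hoskinson describes can be sketched in a few lines of Python. This is not the Unisys product, which is a Java application; it is a minimal illustration that assumes a made-up XML layout (`<award>` elements with `id`, `recipient`, `zip` and `amount` children) and an in-memory SQLite database. Once the records are in a table, a per-ZIP-code summary is a single query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical feed; element names are assumptions, not a real schema.
SAMPLE_FEED = """<awards>
  <award><id>1</id><recipient>City Clinic</recipient><zip>20001</zip><amount>250000</amount></award>
  <award><id>2</id><recipient>Solar Co-op</recipient><zip>20001</zip><amount>100000</amount></award>
  <award><id>3</id><recipient>Transit Authority</recipient><zip>28301</zip><amount>500000</amount></award>
</awards>"""


def load_feed(xml_text, conn):
    """Parse the XML feed and store each award as a database row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS awards "
        "(id INTEGER PRIMARY KEY, recipient TEXT, zip TEXT, amount REAL)"
    )
    rows = [
        (
            int(award.findtext("id")),
            award.findtext("recipient"),
            award.findtext("zip"),
            float(award.findtext("amount")),
        )
        for award in ET.fromstring(xml_text).iter("award")
    ]
    # INSERT OR REPLACE keeps repeated pulls of the same feed idempotent.
    conn.executemany("INSERT OR REPLACE INTO awards VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def totals_by_zip(conn):
    """Return {zip: total amount}, the figures a point map would plot."""
    return dict(conn.execute("SELECT zip, SUM(amount) FROM awards GROUP BY zip"))


conn = sqlite3.connect(":memory:")
loaded = load_feed(SAMPLE_FEED, conn)
zip_totals = totals_by_zip(conn)
print(loaded, zip_totals)
```

Keeping the data in a structured table is what makes the later steps, such as reconciling against internal records or grouping by location, straightforward.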
How does an agency visualize its data? In some cases, existing software might do the job. For instance, Microsoft has a feature in its SharePoint collaboration software, called Excel Services, that can export the pivot tables, bar charts and other data-derived visual renderings created in Excel so that they can be posted on the Web. The company plans to boost this capability even further with the next release of Microsoft Office, Microsoft Office 2010.
Office 2010 will have an extension called Office Web Apps, which will allow Web visitors to access entire datasets. Visitors will have the full capability of Excel to parse data without the need to have Excel on their own computers. Of course, the originators of the datasets will retain full control over which privileges they grant others to manipulate the data.
"You can visualize the data not only in Excel but however you want," Adams said.
Other software, such as business intelligence software, can handle the task, too. Tableau Software's eponymous flagship business intelligence application lets people who are not database administrators connect to a database and rerender the material in a more visual form. "Instead of knowing how to write a query or set up a report, you can drag and drop variables into a canvas," said Elissa Fink, vice president of marketing at Tableau.
One of Tableau’s options is to take a report and publish it to the Web through the company's Tableau Server software. "Anyone with a Web browser can interact with my reports — do their own lightweight analytics, filter, sort and view underlying data," Fink said. "You can filter for data variables by number or category, create sets, bar charts, line charts [or do] geographic analysis."
The interactivity is an important component. "You get out of the paradigm of looking at data as numbers of rows and columns," Fink said. "You can look at a scattergram, see the outliers, select those outliers individually, exclude everything else, and find where the outliers came from. Visualization is not the end result; it's part of the analytical process, and therefore you can unearth the story."
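The workflow Fink describes (spot the outliers, isolate them, then trace where they came from) can be approximated in code. The Python sketch below does not show how Tableau works internally; it applies a common interquartile-range rule to a set of hypothetical response-time records, with made-up station names and values, and then counts which stations produced the outliers.

```python
import statistics
from collections import Counter

# Hypothetical response-time records; stations and values are illustrative only.
RESPONSES = [
    {"station": "North", "minutes": 6.0},
    {"station": "South", "minutes": 7.0},
    {"station": "West", "minutes": 5.0},
    {"station": "North", "minutes": 6.0},
    {"station": "South", "minutes": 8.0},
    {"station": "West", "minutes": 7.0},
    {"station": "East", "minutes": 6.0},
    {"station": "East", "minutes": 25.0},
    {"station": "North", "minutes": 5.0},
    {"station": "South", "minutes": 7.0},
    {"station": "East", "minutes": 30.0},
    {"station": "West", "minutes": 6.0},
]


def find_outliers(records, field):
    """Keep only records outside 1.5 interquartile ranges of the quartiles."""
    values = [record[field] for record in records]
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [r for r in records if r[field] < low or r[field] > high]


# Select the outliers, exclude everything else, then see where they came from.
outliers = find_outliers(RESPONSES, "minutes")
outlier_sources = Counter(record["station"] for record in outliers)
print(outlier_sources.most_common())
```

The 1.5 multiplier is a conventional default rather than a fixed rule; an analyst would adjust it, or pick outliers by eye on a scatter plot as Fink suggests.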
The District of Columbia used this software for its CapStat site, which provides a series of visual summaries, such as pie charts, of how effectively the city's departments are performing. The city first started charting the response times of ambulances and other emergency vehicles but chafed at the limited way spreadsheets could display this data online, Fink said.
In other cases, entirely new types of software have been created to deal with raw government data feeds, such as Socrata’s Social Data Player.
On Data.gov, much of the data is in CSV or XML files, which are not meant to be read by humans. "Forcing people to download data in legacy formats is a bad approach for the ordinary citizen," said Kevin Merritt, CEO and founder of Socrata. The company’s software allows for "ad hoc use of the data in a fairly casual, lightweight way."
In June, the General Services Administration signed agreements with a number of Web-based technology providers, including Google, for its YouTube service, and the video-sharing service Vimeo. Socrata was one of the lesser-known companies to sign an agreement with GSA.
The player itself is free, though Socrata is offering a turnkey platform that can allow agencies to stage and present their own data, either through the company as a hosted service or in-house. All data hosted on the platform is made available through RSS and application programming interfaces.
On the map
In some cases, the best way to present information is geographically, with maps. Cumberland County, N.C., offers one example.
The county, home to more than 300,000 residents and two major military bases, recently launched a Web site where visitors can find land-related information that the county maintains. The site is the culmination of many years of work not only in building online maps for public and internal consumption but also refining how data can be indexed by geographic identification.
With county residential growth on the rise, the mapping services on the site should be particularly useful for residents, potential residents, banks, developers and real estate agents. The site provides a central location for data that someone formerly would have needed to track down by visiting or calling multiple county and state offices.
The site features a number of maps with markers to pertinent data. For example, one shows the school district for an area when given an address, which could be handy for parents who might be moving there. Another service can alert real estate agents if a major state project is being planned near a particular property. Another shows the sex offenders who live around a particular location.
For land developers, another map can show plan details, zoning requirements, permitting and public work orders that affect a specific lot — information formerly spread across multiple county offices. Voting districts, bus routes, flood zones and mosquito spraying zones are also mapped.
Mapping data that real estate agents and developers require has reduced the number of phone calls the county typically receives by as much as 60 to 70 percent, estimated Mike Osbourn, the county's mapping coordinator. Until recently, Osbourn's office kept plans submitted by developers in paper form, stashed away in 25 file cabinets. Every time a bank or potential homeowner had a question, an employee would riffle through the cabinets to get the original plan. So the county scanned all of its plans, more than 10,000 in total, and made them available as links on the Web map.
The maps draw data from multiple databases. The county primarily uses software from the Pitney Bowes Business Insight division, most notably MapInfo Professional.
Such mapping software can not only join material from multiple databases and display the results on a single map but also apply advanced filtering to refine the results, something that can't be done easily by reusing consumer-based mapping services such as Google Earth. Nor, until recently, could relational databases easily perform actions such as finding all entries in close geographic proximity to a given location, although the latest releases of Microsoft SQL Server and Oracle Database do have geographic-filtering capabilities.
"When you access the data, you're not just accessing in a tabular form, you are looking at the geographic attributes, and that allows you to visualize the data as maps and also do geographic or spatial queries," said John Winslow, global portfolio director for location intelligence products at Pitney Bowes Business Insight.
Geographic locations also can help even when the results are not displayed on a map. Cumberland County uses another Pitney Bowes product called SpatialWare that, when given an address, can return a list of items located within a specific geographic proximity.
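A proximity lookup of this kind can be sketched without a map at all. The Python example below is not SpatialWare's method; it is a minimal illustration that assumes the address has already been converted to latitude and longitude, uses the haversine great-circle formula, and searches a short list of hypothetical items with made-up coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical items and coordinates, for illustration only.
ITEMS = [
    {"name": "Hypothetical School", "lat": 35.0600, "lon": -78.8800},
    {"name": "Hypothetical Park", "lat": 35.0527, "lon": -78.8600},
    {"name": "Hypothetical Depot", "lat": 35.5000, "lon": -78.9000},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def items_near(items, lat, lon, radius_km):
    """Return names of items within radius_km of the point, nearest first."""
    hits = []
    for item in items:
        distance = haversine_km(lat, lon, item["lat"], item["lon"])
        if distance <= radius_km:
            hits.append((distance, item["name"]))
    return [name for _, name in sorted(hits)]


# Assumed coordinates for an already geocoded address.
nearby = items_near(ITEMS, 35.0527, -78.8784, radius_km=2.0)
print(nearby)
```

This version checks every item in turn, which is fine for a short list. Spatially enabled databases typically rely on spatial indexes so that the same question can be answered quickly across many thousands of records.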