How researchers enhanced Data.gov using semantic technology

 


A Rensselaer Polytechnic Institute team adds value to government data sets using the Resource Description Framework data interchange model.


A year after launching Data.gov to make government data available to the public, federal officials are still looking for ways to expand its use and the amount of data available on the site. But even the site’s supporters agree that converting much of the data available on Data.gov into usable formats isn’t for the faint of heart.

A three-person team at Rensselaer Polytechnic Institute, however, has demonstrated how one approach can make greater use of the massive sets of data available on Data.gov, using the power of the semantic web. The conversion project has shown how quickly and inexpensively visualization and mash-up applications can be built from government data when it’s put into a web-friendly form.

Data.gov has grown rapidly since its launch a year ago, from 47 data sets to more than 250,000, according to federal Chief Information Officer Vivek Kundra, who spoke at a May 18 event commemorating the site’s May 21 anniversary.

The Data.gov project at RPI is part of the university’s Tetherless World Research Constellation, a web technology research initiative led by Professor James Hendler. Hendler initiated the project in June 2009, after the launch of Data.gov. Hendler "came to me and said that he was looking at the information on Data.gov, and thought it was a great opportunity to use RDF (Resource Description Framework) and linked data," said Li Ding, a research scientist at RPI.

RDF is a standard model for data interchange at the heart of the semantic web. It uses web addresses, known as Uniform Resource Identifiers (URIs), to name pieces of data and to specify the relationships between them. Even if the underlying structures of two data sets differ, they can still be linked using RDF.
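
As a minimal sketch of the linking idea, the Python below models RDF statements as plain (subject, predicate, object) tuples. Every URI, site name and value here is a hypothetical placeholder rather than anything taken from Data.gov, and a real project would more likely use an RDF library than bare tuples.

```python
# Hypothetical namespace; none of these URIs come from Data.gov.
EX = "http://example.org/"

# Data set A: tabular rows of (site name, ozone reading).
readings_rows = [("SITE_A", 41.0), ("SITE_B", 37.5)]

# Data set B: a nested mapping with a different structure.
locations = {"SITE_A": {"lat": 42.73, "long": -73.68}}


def site_uri(name):
    """Mint one URI per site so both data sets refer to the same thing."""
    return EX + "site/" + name


def rows_to_triples(rows):
    """Convert tabular readings into (subject, predicate, object) triples."""
    return [(site_uri(name), EX + "ozone", value) for name, value in rows]


def mapping_to_triples(mapping):
    """Convert the nested location mapping into triples."""
    triples = []
    for name, coords in mapping.items():
        for key, value in coords.items():
            triples.append((site_uri(name), EX + key, value))
    return triples


# Both conversions use the same subject URIs, so merging is a set union.
graph = set(rows_to_triples(readings_rows)) | set(mapping_to_triples(locations))


def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


site_a = describe(graph, site_uri("SITE_A"))
```

Although the two inputs have different shapes, `site_a` ends up holding both the ozone reading and the coordinates, because the shared URI is what ties the statements together.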




“It’s a great format for being able to structure data for the web,” said Dominic DiFranzo, a first-year doctoral candidate at RPI. “You kind of graft (things) together, make things become linked from one thing to another, and this data format allows us to link concepts, ideas inside these data sets, to other data sets in a very easy, intuitive fashion.”

The Data.gov data sets offered an interesting opportunity, DiFranzo said, because they were “all free and open to the public, so we had the rights to be able to change and work within it, and link it to other things. It was a great experiment in actually trying to take large data sets that different people from around the US government had curated and taken care of, and trying to mash those together (with data from outside of government) in the open data cloud.”

Ding, DiFranzo and University of Chicago undergraduate Sarah Magidson formed the Data-gov project team and began converting some of the high-value data sets in the Data.gov collection into RDF-enabled demonstrations.

“We're also trying to push the limit on different types of technologies,” said Ding. “A lot of the demos we’ve built (on the Data.gov data) are based on very simple web technologies.” Once the data has been converted from its source to RDF, he said, it can be accessed by applications using a number of standard web technologies, including the SPARQL query language and JavaScript Object Notation (JSON). “In JSON, we actually have a very nice way to let existing web technologies consume this data and do these really cool (applications),” Ding said. Developers can use the development tools they’re familiar with.
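
To illustrate how ordinary web tooling can consume query results, the sketch below parses a response in the W3C SPARQL Query Results JSON format using only Python's standard library. The query text, variable names and values are invented for illustration; the article does not show the team's actual queries or endpoint.

```python
import json

# A hypothetical SPARQL query a client might send to an endpoint.
QUERY = """
SELECT ?site ?ozone WHERE {
  ?site <http://example.org/ozone> ?ozone .
}
"""

# Sample response in the standard SPARQL JSON results layout (values invented).
RESPONSE = """
{
  "head": {"vars": ["site", "ozone"]},
  "results": {"bindings": [
    {"site": {"type": "uri", "value": "http://example.org/site/SITE_A"},
     "ozone": {"type": "literal", "value": "41.0"}},
    {"site": {"type": "uri", "value": "http://example.org/site/SITE_B"},
     "ozone": {"type": "literal", "value": "37.5"}}
  ]}
}
"""


def rows_from_sparql_json(text):
    """Flatten SPARQL JSON bindings into a list of plain dicts."""
    doc = json.loads(text)
    names = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so fall back to None.
        rows.append({n: binding.get(n, {}).get("value") for n in names})
    return rows


rows = rows_from_sparql_json(RESPONSE)
```

Once flattened this way, the rows are ordinary dictionaries that any charting or mapping code can use without knowing anything about RDF.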

The result of the work is that it takes developers days, not months, to create applications based on the Data.gov data, DiFranzo said. “We've been able to, in such a small amount of time, have 40 demos up. Anyone can use this technology; they don't have to be a graduate student to make this technology work. I had undergrads who had never seen any sort of semantic tech and were able to pick this up in less than a week. So the technology has gotten to the point where general developers can start building these types of apps very quickly.”

DiFranzo pointed to a demo application built using data from the Environmental Protection Agency’s Clean Air Status and Trends Network. “The CASTNET project has sensors all across the US measuring ground ozone and other pollutants,” he said. The data set had the readings from all the sites by name, said DiFranzo, “but there were no inclusions in the data of where the stations were located. So it sort of rendered the data meaningless.”

Once the data was rendered into RDF, DiFranzo said, “we were able to find through data outside Data.gov, a data set that describes where every site actually lives in the US. And we were able to link these two concepts together, and then we were able to do applications with a map actually displaying where the sites were, showing the aggregate values for ozone in areas. We were able to mash up with their own website, and other data we had on the history of those values throughout a number of years.” DiFranzo and Ding were also able to pull in data from other EPA sites about the sensor sites. The resulting demo application can be seen at http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php.
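
The join-and-aggregate step described above can be sketched roughly as follows. This is not the team's code: the site names, coordinates and ozone values are made up, and the `map_markers` helper is a hypothetical name chosen for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings keyed by site name only (no coordinates).
readings = [
    {"site": "SITE_A", "year": 2008, "ozone": 40.0},
    {"site": "SITE_A", "year": 2009, "ozone": 44.0},
    {"site": "SITE_B", "year": 2009, "ozone": 30.0},
]

# Hypothetical external data set giving each site's (latitude, longitude).
site_locations = {
    "SITE_A": (42.0, -73.0),
    "SITE_B": (39.0, -77.0),
}


def map_markers(readings, locations):
    """Average ozone per site and attach coordinates for map display."""
    by_site = defaultdict(list)
    for reading in readings:
        by_site[reading["site"]].append(reading["ozone"])

    markers = []
    for site, values in sorted(by_site.items()):
        if site not in locations:
            # Without a location the reading cannot be placed on a map.
            continue
        lat, lon = locations[site]
        markers.append(
            {"site": site, "lat": lat, "long": lon, "mean_ozone": mean(values)}
        )
    return markers


markers = map_markers(readings, site_locations)
```

Each resulting marker carries a position and an aggregate value, which is the minimum a mapping widget needs to plot the sites.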

The EPA has its own visual interface into the CASTNET data. “However, we find that if the government exposed their raw data outside,” said Ding, “we can do an even better job, because the whole point is that once we have the data, we are no longer limited by the visual data access interface. I want to see more on the map, but unfortunately the visualization was restricted by the government data set. We have to pull from several web pages.”

DiFranzo said the big takeaway from the research work is how much money and time could be saved on visualization projects and other development programs if more of the data was exposed in this fashion.

“Before, if you wanted to make a visualization like CASTNET, you hired some contractor. They would spend a lot of time to determine the correct model for the database, and build this really high-end visualization. It would look really cool, and work really fast, and it would be awesome, but it would also cost a lot of money and take a couple of months. With these visualizations we do, since the data is in RDF, we're able to use off-the-shelf visualization technologies like the Google Visualization API, and in a matter of days we can make these quick mashed-up data visualizations and applications. It's just a more rapid development cycle.”
