Open data grows up

Pittsburgh's Data Rivers project speeds data integration and cleanup and improves troubleshooting of the city's data pipelines.

To improve the flow of data in Pittsburgh, the city’s Open Data program is getting ready to launch Data Rivers, an upgraded data delivery system that speeds data integration and cleanup and improves troubleshooting of the city's data pipelines.

When the city “flips the switch” in a few weeks, Data Rivers will launch first with 311 data, the most commonly asked-for information, said Tara Matthews, senior digital services analyst at Pittsburgh’s Department of Innovation and Performance.

The new system connects to the application programming interface the 311 center uses to manage its call intake, said Nick Hall, the city’s digital services manager who has overseen the yearlong project. It reads in the raw data and cleans it -- standardizing date and address formats, for instance. Finally, the records are stripped of personally identifying information to create "safe data" that can be published.
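The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration, not the city's code: the field names (`created_on`, `address`, `caller_name` and so on) and the list of accepted date formats are assumptions, since the article does not describe the actual 311 record layout.

```python
from datetime import datetime

# Hypothetical field names; the real 311 schema is not given in the article.
PII_FIELDS = {"caller_name", "caller_phone", "caller_email"}

# Assumed input formats that different source systems might emit.
DATE_FORMATS = ("%m/%d/%Y %H:%M", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")


def standardize_date(raw: str) -> str:
    """Return the timestamp in ISO 8601 form, trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def standardize_address(raw: str) -> str:
    """Collapse repeated whitespace and upper-case the address."""
    return " ".join(raw.split()).upper()


def to_safe_record(record: dict) -> dict:
    """Clean one raw record and drop personally identifying fields."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    cleaned["created_on"] = standardize_date(record["created_on"])
    cleaned["address"] = standardize_address(record["address"])
    return cleaned
```

Calling `to_safe_record` on a raw record returns a copy with a normalized timestamp and address and without the listed PII fields; the original record is left unchanged.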

Data Rivers -- whose name refers to the city’s three rivers and to the idea of flowing data vs. a data lake -- also standardizes city data, streamlining and speeding the processes for city analysts.

“The data that we have in the city comes in a bunch of different formats -- a database, an API or a spreadsheet sitting on somebody’s desktop -- but the issue is … evaluating what was needed to actually get that data to the data center,” Matthews said.

The original extract, transform and load (ETL) system was basic. It enabled the Open Data program to host datasets from a variety of systems. The program is maintained by the Western Pennsylvania Regional Data Center, a collaboration among the city, Allegheny County and the University of Pittsburgh.

In 2015, the city's “first priority was to make sure that we could get datasets onto the data center as quickly as possible,” Matthews said. “We didn’t really want to drag our feet with publishing things, so we focused on a method that would get our data online quickly.” That resulted in what she described as a data delivery model “built out of duct tape and bubble gum” in a recent blog post.

The “basic guts” of the original open data system were a set of scripts that ran on a schedule to ingest the data and send it to the Western Pennsylvania Regional Data Center, Hall said. Data Rivers “adds a bunch of structural design decisions that make it easier for people to maintain the pipelines and much more difficult for things to go wrong.”

The city adopted Apache Kafka, an open source distributed event-streaming platform that stores data in immutable, append-only logs, and added tools on top of that foundation, he said. First was a user interface or a developer environment for creating new data pipelines -- in other words, a way for someone to configure the chain of events that starts with pulling data out of an SQL database or a vendor’s API, then cleaning it and applying administrative business requirements, such as privacy rules that remove personally identifiable information.
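One way to picture such a configurable chain of events is as an ordered list of small steps applied to every record pulled from a source. The sketch below is a generic illustration in plain Python, not Data Rivers itself and not a Kafka API; `run_pipeline`, `trim_strings` and `drop_fields` are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, Iterator, Sequence

Record = dict
Step = Callable[[Record], Record]


def run_pipeline(source: Iterable[Record], steps: Sequence[Step]) -> Iterator[Record]:
    """Apply each configured step, in order, to every record from the source."""
    for record in source:
        for step in steps:
            record = step(record)
        yield record


def trim_strings(record: Record) -> Record:
    """Cleaning step: strip stray whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_fields(*names: str) -> Step:
    """Build a privacy step that removes the named fields (assumed to hold PII)."""
    def step(record: Record) -> Record:
        return {k: v for k, v in record.items() if k not in names}
    return step


# The source could be rows from a SQL query or a vendor API; a list stands in here.
rows = [{"id": 1, "owner": " Jane ", "status": " open "}]
published = list(run_pipeline(rows, [trim_strings, drop_fields("owner")]))
```

Because a pipeline is just data (a source plus a list of steps), adding a new dataset means writing a new configuration rather than a new one-off script.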

The second capability Data Rivers needed was data validation, Hall said. For help, he turned to Confluent, a company that supports and expands on Kafka. “They have something called the Schema Registry, which allows you to define schemas that describe the data and store them in a centralized registry,” he said. It automatically “checks data as it comes in against predefined schema that will throw a flag if, say, something about the format of the data has changed or if the source of the data has had an outage,” he said. Ultimately, it allows for automated notifications to be sent when issues in the data systems are detected.
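The idea of checking incoming data against a predefined schema can be illustrated without any registry at all. The following sketch uses plain Python and deliberately does not call the Confluent Schema Registry client; the schema, its field names and the `validate` helper are assumptions made for the example.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"request_id": int, "request_type": str, "created_on": str}


def validate(record: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Fields the schema does not know about may signal an upstream format change.
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


flags = validate({"request_id": "17", "request_type": "Pothole"}, SCHEMA)
```

In a real deployment a non-empty result would be what triggers the automated notification; here it is simply returned to the caller.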

Once Data Rivers launches, Matthews will turn her attention to getting more high-level datasets up and running, while Hall moves on to building consumer-facing applications.

“Ultimately this is a product that can serve analysts working with all of the different departments within the city, [and save] those analysts hundreds of hours over the course of a year on integrations and cleanup and dealing with an outage in one of the data pipelines,” Hall said of Data Rivers. “As we’re able to publish data more effectively, these tools become not only something that can serve the public but can serve users internally.”
