EPA looks to improve data aggregation with APIs
By Amanda Ziadeh
May 20, 2016
The Environmental Protection Agency still faces challenges associated with aggregating the regulatory compliance data it collects from facilities, and is taking a number of approaches to streamline that process, according to EPA Chief Data Scientist Robin Thottungal.
The challenges begin with a disconnect between the information policymakers are looking for and the data available to support those requests. Reporting requirements for regulated facilities are often defined by regulators seeking specific outcomes, not by the types and formats of the data that facilities actually hold. That sometimes makes it difficult to fulfill what the law requires with the available data.
Additionally, a facility regulated under the Clean Air Act may report data to the Office of Air and Radiation in one format and send data required by the Toxic Substances Control Act to the Office of Chemical Safety and Pollution Prevention in another. This poses a problem when the EPA tries to look at the whole set of regulations governing that facility and the types of data being collected.
Further, before making its way to the EPA, data is collected by all 50 states as well as by local and tribal entities. States collect information from their facilities, and, Thottungal said, “everybody has their own databases and is using their own way of collecting the information,” which calls the quality of the data into question.
“Because of the way in which we have been operating in siloes, [data aggregation is] a big challenge for us,” Thottungal told the audience at FedScoop’s May 18 Data Innovation Summit in Washington, D.C. The EPA is looking to break through the barriers and integrate the way compliance data is collected.
One way Thottungal and his team are trying to streamline data collection is by working with states and facilities early on to identify key data components and common collection practices across regulations.
Thottungal said the EPA recently created an application programming interface for states to make sure that when a facility submits new information, it is validated and posted to the agency's Facility Registry Service (FRS).
The API-based approach maintains the flow of data across the partners and facilities no matter what systems are being used. “As long as our partners can access the data through the API, we have the ability to advance as their technology advances,” Thottungal said.
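The validate-then-post flow Thottungal describes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the article does not describe the EPA's actual API, so the field names, the validation rules and the in-memory list standing in for the registry are all hypothetical.

```python
from datetime import date

# Hypothetical shared fields; the article does not list the real data components.
REQUIRED_FIELDS = ("facility_id", "state", "program", "reported_on")


def validate_submission(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing field: {field}")

    # Illustrative format rule: states are reported as two-letter codes.
    state = record.get("state")
    if isinstance(state, str) and state.strip() and len(state.strip()) != 2:
        problems.append("state must be a two-letter code")

    # Illustrative format rule: dates are reported as ISO 8601 (YYYY-MM-DD).
    reported_on = record.get("reported_on")
    if isinstance(reported_on, str) and reported_on.strip():
        try:
            date.fromisoformat(reported_on.strip())
        except ValueError:
            problems.append("reported_on must be an ISO date (YYYY-MM-DD)")
    return problems


def submit(record: dict, registry: list) -> dict:
    """Validate a record and store it only if it passes every check."""
    problems = validate_submission(record)
    if problems:
        return {"accepted": False, "problems": problems}
    registry.append(dict(record))  # stand-in for posting to a shared registry
    return {"accepted": True, "problems": []}
```

In this sketch, the same checks apply whichever state or facility system produced the record, which is the point of placing validation behind a common interface: a rejected record never reaches the registry, and the submitter gets back a list of what to fix.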
The EPA is also exploring the use of cloud platforms for data access because they are better suited to storing large datasets and can be scaled up or down based on the agency’s needs. “We believe that might be good for serving the partners as well as the companies or the public for accessing the data,” Thottungal said.
Going forward, Thottungal said, EPA must find new, innovative ways of holistically collecting and serving quality EPA compliance data. “What we are doing from the EPA headquarters is trying to create the systems that will make sure that the data that we are collecting at the source is up to standards,” he said.
Amanda Ziadeh is a former reporter/producer for GCN.