How to calculate open data's ROI

The White House’s 2013 mandate to open data has spurred many agencies to act, but just opening data to meet requirements doesn’t mean the data will be useful. To help agency IT managers decide what datasets are worth opening, a new formula offers an entrée to the thought process.

“It was hard to have a conversation about open data because lots of people have biases, like people want government to open more data, government is a little bit scared of opening the data. And private companies, honestly, they have no clue,” said Arnaud Sahuguet, chief technology officer at New York University’s Governance Lab. “So we were trying to find an ice breaker where you start the conversation.”

That’s how he and David Sangokoya, a research fellow at the lab, came to adapt an equation from the calculus of voter turnout and apply it to open data.

Here’s the formulation and how it breaks down:

P × B + D > C

In it, P is the probability that opening the data will have some effect, B is the individual benefit of opening the data, D is the global or ecosystem impact and C is the cost of opening the data.
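As a rough illustration, the inequality can be written as a short Python function. This is a sketch, not the researchers’ own tool: the function name, the parameter names and the dollar figures are hypothetical, and it assumes all benefits and costs are estimated in the same unit.

```python
def worth_opening(p, b, d, c):
    """Return True when P x B + D > C.

    p: probability that opening the data has some effect (0 to 1)
    b: individual benefit of opening the data
    d: global or ecosystem impact
    c: cost of opening the data
    b, d and c are assumed to share one unit, e.g. dollars.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p * b + d > c


# Hypothetical estimates for one dataset, all in dollars.
print(worth_opening(p=0.5, b=100_000, d=20_000, c=60_000))  # True: 70,000 > 60,000
```

With these made-up numbers the left side comes to 70,000 against a cost of 60,000, so the inequality holds. Dropping P to 0.1 and D to zero would leave 10,000 on the left, and it would not.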

It’s that last piece, especially, that often gets overlooked. Costs come in the form of reformatting data into an open format, publishing, ensuring it meets legal requirements and covering liabilities and risks should something go wrong with the data – for instance, if personal or incorrect data is released, the researchers wrote in a blog post introducing the calculus.

“Oftentimes, it’s not a binary decision,” Sahuguet added. “It’s more a decision about cost, and when the cost is too high, the decision [to open] is no.” It might not be worth it to open data where there are privacy issues, liabilities or lack of frameworks, he explained. “Our point of view is at the end of the day you will get a number on the left, you will get a number on the right and you have to compare the two and make your own decision.”

Precise numbers aren’t necessary for drawing conclusions, he added. Estimates can go a long way in facilitating decisions.
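One way to work with estimates rather than precise numbers is to give each term a low and a high guess and check whether the comparison comes out the same at both extremes. The sketch below is an assumption about how that could be done, not something the researchers describe; the function name, labels and figures are hypothetical, and it assumes benefit and impact are never negative.

```python
def decide_with_ranges(p, b, d, c):
    """Compare P x B + D with C when each term is a (low, high) estimate.

    Assumes b and d are non-negative, so the smallest left-hand side
    comes from the low ends of p, b and d.
    """
    worst_case = p[0] * b[0] + d[0] - c[1]  # lowest benefit, highest cost
    best_case = p[1] * b[1] + d[1] - c[0]   # highest benefit, lowest cost
    if worst_case > 0:
        return "open"
    if best_case <= 0:
        return "do not open"
    return "needs closer look"


# Hypothetical ranges, in thousands of dollars.
print(decide_with_ranges(p=(0.4, 0.6), b=(80, 120), d=(10, 30), c=(20, 40)))
```

If even the most pessimistic guesses favor opening, or the most optimistic ones do not, rough estimates are enough to settle the question. Only the in-between cases call for better numbers.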

Opening data for opening’s sake is only one mistake agencies are making with open data, the researchers said.

Another issue is that every agency seems to open data its own way, Sahuguet said. “There is a big Tower of Babel issue where if you have to compare response time for 911 emergencies in various cities in the United States or comparing spending on education [it’s difficult]. So I’d say the No. 1 mistake is the fact that everybody’s opening the data the way that it fits, and there is no real standard.”

Additionally, agencies aren’t considering the full ecosystem. The work doesn’t end when the data is open, Sahuguet explained; a workflow needs to be in place to keep that information fresh and available.

IT managers should also consider who the data recipients are and whether the data corresponds to actual needs.

“This is what we tried to capture in the calculus through P, the probability that opening the data is going to create positive outcomes,” Sahuguet said. “If there is not an ecosystem of developers, hackers, community members who are really going to be engaged with your data, if there is no future of data-drivenness or data acumen on the recipient side, it’s just like screaming in a forest. Nobody is going to hear you.”

Feedback on the calculus has been positive, Sangokoya said. It’s been translated in Italy, and Belgium used it in an open data day, for instance. Overall, government workers have said it put on paper the ideas they had in mind.

Still, the calculus is not perfect. Sahuguet and Sangokoya are considering factors such as how the decision to open data is made by multiple people over time, not by a single individual in the moment.

“The element of peer pressure, group behavior and game theory is not captured in the calculus,” Sahuguet said. “This is not the formula that is going to revolutionize the field of open data. It’s a very, very modest contribution, and we really want to open the conversation.”

About the Author

Stephanie Kanowitz is a freelance writer based in northern Virginia.

