Open data: If you can’t measure it, how can you manage it?

While most organizations agree that open data provides a range of benefits, the ability to measure those improvements and the associated costs of open data initiatives remains elusive.

The biggest reason is that the primary advantages of adoption are long term: enabling greater government transparency, efficiency and effectiveness, as well as solving societal problems and contributing to economic development.

Short-term gains are relatively easy to tally, but Joy Bonaguro, chief data officer for the city and county of San Francisco, admits that longer-term benefits are harder to measure.  To measure what it can, San Francisco created metrics for its open data initiative, DataSF.

DataSF currently assesses three areas: activity, quality and impact. Activity (by department, priority level and classification, as percentages) is measured by progress on the dataset inventory and publishing plans. Quality is measured by the actual vs. expected number of published datasets, publishing timeliness, and the usability and documentation of the data. Impact is measured by annual surveys -- asking residents whether the city’s open data catalog has made their analytical work easier, faster or more accurate, and whether it reduced barriers to using data and/or improved access to data. In the future, San Francisco plans to capture additional impact information via case studies, focus groups and workshops, and counts of apps or websites, Bonaguro said.
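The quality measures described above -- actual vs. expected published datasets and publishing timeliness -- reduce to simple ratios. The sketch below illustrates the idea; the function name, field names and sample figures are illustrative assumptions, not actual DataSF data or code.

```python
# Hypothetical sketch of DataSF-style publishing quality metrics.
# All names and numbers here are illustrative, not real DataSF figures.

def publishing_metrics(published: int, expected: int, on_time: int) -> dict:
    """Return actual-vs-expected publication rate and timeliness rate."""
    return {
        "completion_rate": published / expected,  # actual vs. expected datasets
        "on_time_rate": on_time / published,      # share published on schedule
    }

metrics = publishing_metrics(published=45, expected=60, on_time=36)
print(metrics)  # completion_rate=0.75, on_time_rate=0.8
```

Ratios like these are easy to track quarter over quarter, which is what makes them practical short-term proxies while the longer-term outcomes in the logic model remain unmeasured.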

The initiative’s logic model links metrics to goals, such as increased jobs and economic activity, improved service for residents and businesses, a greater sense of engagement and empowerment, and improved quality of life for residents. However, San Francisco has not yet been able to measure these outcomes.

In Europe, researchers from the Delft University of Technology and Wageningen University studied the results of an open data initiative by Liander, an energy network administrator in the Netherlands, that predicted open data would help promote energy conservation.

While Liander expected a range of long-term benefits -- such as increased development of web services and apps, new user groups and lower data preparation and use costs for both the company and its customers -- only lower costs related to data transactions were found during the course of the study.

Another twist was the varied expectations of external users. Publication of the original small-scale energy consumption dataset created demand for other datasets, such as large-scale energy consumption data and small-scale energy generation data (for example, from windmills and solar panels).

Municipalities increased their use of the data, and both the private and public sectors used Liander open data to improve existing applications and work processes rather than to create new products.

Another under-studied area is the price tag for producing open data. According to the Open Data Institute, the main cost is staff time.

A Transit Co-operative Research Program survey of 60 respondents working with transit data found that releasing a single dataset takes anywhere from 6.5 to 16 days, depending on the data’s complexity.

Specifically, these costs were for staff time to update, fix, and maintain data; convert data to an open format; validate and monitor the data for accuracy; and liaise with data users/developers. Other costs included web service for hosting the data; publicity and marketing; and consultant time to convert the data to an open format. 
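Because staff time dominates, the survey’s 6.5-to-16-day range can be turned into a rough dollar range. The calculation below is a back-of-the-envelope sketch; the daily labor rate is an assumed figure, not one from the survey.

```python
# Back-of-the-envelope cost range for releasing one dataset.
# The staff-day range comes from the survey; the daily rate is an assumption.

DAYS_LOW, DAYS_HIGH = 6.5, 16.0   # staff-days per dataset (survey range)
DAILY_RATE = 400.0                # assumed fully loaded cost per staff-day, USD

low = DAYS_LOW * DAILY_RATE
high = DAYS_HIGH * DAILY_RATE
print(f"${low:,.0f} to ${high:,.0f} per dataset")  # $2,600 to $6,400
```

Even with a modest assumed rate, the spread is wide enough to show why agencies need per-dataset cost tracking rather than a single flat estimate.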

About the Author

Kathleen Hickey is a freelance writer for GCN.
