The hidden costs of open data
- By Sara Friedman
- Aug 01, 2017
As more local governments open their data for public use, the emphasis is often on "free" -- using open source tools to freely share already-created government datasets, often with pro bono help from outside groups. But according to a new report, there are unforeseen costs when it comes pushing government datasets out of public-facing platforms -- especially when geospatial data is involved.
The research, led by University of Waterloo professor Peter A. Johnson and McGill University professor Renee Sieber, was based on work as part of Geothink.ca partnership research grant and exploration of the direct and indirect costs of open data.
Costs related to data collection, publishing, data sharing, maintenance and updates are increasingly driving governments to third-party providers to help with hosting, standardization and analytical tools for data inspection, the researchers found. GIS implementation also has associated costs to train staff, develop standards, create valuations for geospatial data, connect data to various user communities and get feedback on challenges.
Due to these direct costs, some governments are more likely to avoid opening datasets that need complex assessment or anonymization techniques for GIS concerns. Johnson and Sieber identified four areas where the benefits of open geospatial data can generate unexpected costs.
First, open data can create “smoke and mirrors” situation where insufficient resources are put toward deploying open data for government use. Users then experience “transaction costs” when it comes to working in specialist data formats that need additional skills, training and software to use.
Second, the level of investment and quality of open data can lead to “material benefits and social privilege” for communities that devote resources to providing more comprehensive platforms.
While there are some open source data platforms, the majority of solutions are proprietary and charged on a pro-rata basis, which can present a challenge for cities with larger, poor populations compared to smaller, wealthier cities. Issues also arise when governments try to combine their data sets, leading to increased costs to reconcile problems.
The third problem revolves around the private sector pushing for the release of data sets that can benefit their business objectives. Companies could push for the release high-value sets, such as a real-time transit data, to help with their product development goals. This can divert attention from low-value sets, such as those detailing municipal services or installations, that could have a bigger impact on residents “from a civil society perspective.”
If communities decide to release the low-value sets first, Johnson and Sieber think the focus can then be shifted to high-value sets that can help recoup the costs of developing the platforms.
Lastly, the report finds inadvertent consequences could result from tying open data resources to private-sector companies. Public-private open data partnerships could lead to infrastructure problems that prevent data from being widely shared, and help private companies in developing their bids for public services.
With citizen tax dollars footing the bill for these services, the public-private model puts them at a disadvantage with little information on when open data will be available for their consumption and the level of accessibility. By developing and maintaining the open data, the private sector has the ability to profit through providing multiple ways for governments and private companies to utilize the data and selling the information back to the public at a cost.
Johnson and Sieber encourage communities to ask the following questions before investing in open data:
- Who are the intended constituents for this open data?
- What is the purpose behind the structure for providing this data set?
- Does this data enable the intended users to meet their goals?
- How are privacy concerns addressed?
- Who sets the priorities for release and updates?
In addition to Johnson and Sieber, the paper's co-authors included University of Ottawa's Teresa Scassa, The State University of New York-Buffalo's Monica Stephens and Ryerson University’s Pamela Robinson.
Read the full report here.
Editor's note: This article was changed Aug. 2 to clarify the source of the research and add the paper's co-authors.
Sara Friedman is a reporter/producer for GCN, covering cloud, cybersecurity and a wide range of other public-sector IT topics.
Before joining GCN, Friedman was a reporter for Gambling Compliance, where she covered state issues related to casinos, lotteries and fantasy sports. She has also written for Communications Daily and Washington Internet Daily on state telecom and cloud computing. Friedman is a graduate of Ithaca College, where she studied journalism, politics and international communications.
Friedman can be contacted at firstname.lastname@example.org or follow her on Twitter @SaraEFriedman.
Click here for previous articles by Friedman.