Can private data as a service unlock government data sharing?
- By David W. Archer
- Nov 20, 2020
Government organizations at every level sit on a trove of valuable yet sensitive data that can be used to improve citizen services, prevent cyberattacks or provide more personalized health care.
The positive impact of this data, however, is capped because data sharing between organizations is restricted by policies, statutes and justifiable broader concerns about protecting data privacy and security. As a result, agencies often clutch their data tight and keep it siloed from other parts of government.
Agencies are also hyper-sensitive to real and perceived citizen opposition to data sharing. A 2020 survey commissioned by The Pew Charitable Trusts affirms these concerns: When informed that their health information might not be covered by existing federal privacy laws -- such as the Health Insurance Portability and Accountability Act (HIPAA) -- once it was downloaded to an app, nine out of 10 survey respondents voiced concerns. Agencies are further constrained by privacy laws restricting information sharing both with other agencies and with those outside the agency, such as researchers who provide statistical expertise to make sense of data. In general, personally identifiable information (PII) is required to stay inside the agency that collected it.
However, it is worth noting that while citizens are understandably concerned about having their personal information fall into the wrong hands -- or too many hands -- they are also hopeful about the benefits of data sharing. The same Pew survey found that 81% of adults would support enabling different health care providers to share patient health record information between their EHR systems when they are caring for the same patient.
With agencies and citizens torn between the benefits and pitfalls of data sharing, efforts are underway to break out of the zero-sum mold that today forces a choice between protecting data privacy and ensuring that data can be shared to deliver a useful benefit.
Agencies and private data as a service
Data as a service (DaaS), a scalable model where many analysts can access a shared data resource, is commonplace. However, privacy assurance about that data has not kept pace. Data breaches occur by the thousands each year, and insider threats to privacy are commonplace. De-identification of data can often be reversed and has little in the way of a principled security model. Data synthesis techniques can only model correlations across data attributes for unrealistically low-dimensional schemas. What is required to address the unique data privacy challenges that government agencies face is a privacy-focused service that protects data while retaining its utility to analysts: private data as a service (PDaaS).
PDaaS can sit atop DaaS to protect subject privacy while retaining data utility to analysts. Some of the most compelling work to advance PDaaS can be found with projects funded by the Defense Advanced Research Projects Agency’s Brandeis Program, which “...seeks to develop the technical means to protect the private and proprietary information of individuals and enterprises.”
According to DARPA, “[t]he vision of the Brandeis program is to break the tension between: (a) maintaining privacy and (b) being able to tap into the huge value of data. Rather than having to balance between them, Brandeis aims to build a third option – enabling safe and predictable sharing of data in which privacy is preserved.”
But how is this layer of assured privacy atop DaaS achieved? And how ready is the technology for achieving it? To answer those questions, we first need to clarify what kind of computing PDaaS provides. Most often, that computing is statistical analysis rather than query access to individual records. We also need to separate two notions: input privacy -- keeping the sensitive data hidden from the computers that analyze it -- and output privacy -- preventing anyone who sees the output statistics from rediscovering what the input data must have been.
To assure input privacy, PDaaS relies on secure computation -- a family of cryptography techniques that allow computation without seeing decrypted data, such as private set intersection, secure multiparty computation or homomorphic encryption. Some of these techniques are practical today -- we describe one, private set intersection, below. Others are seeing “first light” on a few carefully chosen applications, but they need more prototype applications to find the right balance of performance and security.
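To make the idea of computing without seeing decrypted data concrete, here is a minimal sketch of one secure multiparty computation building block: additive secret sharing. The agency names and values are hypothetical, and real deployments use vetted protocols and parameters rather than this toy field; the sketch only illustrates how compute parties can sum two sensitive values without any party seeing either input.

```python
import secrets

PRIME = 2**61 - 1  # toy field modulus; real systems use vetted parameters

def share(value, n_parties):
    """Split a value into n additive shares that sum to it mod PRIME.
    Any subset of fewer than n shares reveals nothing about the value."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % PRIME

# Two agencies each hold a sensitive count; neither reveals it.
agency_a_count, agency_b_count = 1200, 3400
shares_a = share(agency_a_count, 3)
shares_b = share(agency_b_count, 3)

# Each of three compute parties adds the two shares it holds, locally,
# never seeing the original inputs.
sum_shares = [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]
print(reconstruct(sum_shares))  # 4600
```

The privacy guarantee comes from the sharing step: each individual share is uniformly random, so only the final recombination of all shares reveals anything, and it reveals only the sum.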
To assure output privacy, PDaaS relies on differential privacy -- the addition of carefully chosen noise to statistical computations so that results remain both accurate and privacy-preserving. Differential privacy is in limited practical use today, for example in statistical analysis for the 2020 census, although more can be done to apply it to more complex use cases.
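The "carefully chosen noise" of differential privacy can be sketched with the classic Laplace mechanism for a counting query. This is an illustrative toy, not a production mechanism (the function names and epsilon value are our own), but it shows the core move: noise scaled to the query's sensitivity, so one individual's presence or absence is statistically masked.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_count(true_count, epsilon):
    """Release a count with epsilon-differential privacy.
    A counting query has sensitivity 1 (one person changes the
    count by at most 1), so the noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon)

# Smaller epsilon -> stronger privacy, noisier statistic.
noisy = dp_count(10_000, epsilon=0.5)
print(round(noisy))
```

For a large aggregate like a census block count, the noise is tiny relative to the statistic, which is why results can stay both accurate and privacy-preserving.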
Taken together, secure computation and differential privacy can, for a growing number of applications, keep sensitive data confidential while keeping statistics over that data fully accurate and useful.
Current and potential PDaaS use cases
PDaaS is at its core about arming government agencies at every level with the confidence they need to fully share information to strengthen homeland security and improve citizen services.
Emerging use cases for PDaaS offer an early glimpse of how impactful it can be to inter-agency data sharing:
- Defense and intelligence agencies sharing network and cyberattack data to prevent future attacks.
- Health agencies sharing patient data to enable the delivery of personalized medicine without putting patient privacy at risk.
- Education agencies providing information to students about the value of a college degree.
- Federal, state and local agencies conducting critical contact tracing to protect residents’ health and identify hot spots.
What’s next for PDaaS
Emerging practical use cases help chart the path forward for PDaaS. We also expect to see agencies move from statements of data-sharing intent to protocol-based, practical initiatives. Private set intersection is one example: it allows two or more agencies to quickly discover what data they each hold about a common subject (such as a person), while revealing nothing about subjects they do not hold in common. PSI also supports useful statistical calculations over data without revealing that data, allowing agencies to compute important statistics about subjects they hold in common. It holds tremendous potential as a first step in transitioning privacy-preserving technologies into everyday use.
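The PSI interface described above can be sketched as follows. This simplified version exchanges only salted hashes of identifiers; note that it is illustrative only and vulnerable to dictionary attacks on low-entropy identifiers, whereas real PSI protocols (e.g., Diffie-Hellman- or OPRF-based constructions) avoid that weakness. The agency rosters and salt are hypothetical.

```python
import hashlib

def blind(identifiers, salt):
    """Map each identifier to a salted hash so raw values never cross
    the wire. Illustrative only: production PSI uses cryptographic
    protocols that resist brute-force guessing of identifiers."""
    return {hashlib.sha256((salt + i).encode()).hexdigest(): i
            for i in identifiers}

SALT = "shared-secret-salt"  # hypothetical pre-agreed value

agency_a = blind({"alice", "bob", "carol"}, SALT)
agency_b = blind({"bob", "carol", "dave"}, SALT)

# Only blinded values are exchanged; each side learns exactly the
# overlap and nothing about the other side's remaining subjects.
common = {agency_a[h] for h in agency_a.keys() & agency_b.keys()}
print(sorted(common))  # ['bob', 'carol']
```

Each agency learns which subjects it shares with the other, and nothing else, which is precisely the guarantee that makes PSI attractive for inter-agency matching.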
The Federal Data Strategy mandates sharing of data, even though significant barriers persist in law, policy and public acceptance. The good news is that the value of sharing seems clear to a variety of agencies and to the general public. The better news is that technology exists to make sharing practical while preserving data privacy, breaking the zero-sum mold of sharing or privacy, but not both.
What remains is to identify opportunities for such sharing, engage to build prototypes that show it is practical and secure, and then decide to make PDaaS a part of operations going forward.
David W. Archer is principal scientist, cryptography & multiparty computation, at Galois.