Researchers raise concerns with differential privacy use on census data

After the Census Bureau announced in 2018 that it would use differential privacy to protect the identities of individuals for the 2020 census, researchers at Penn State began to evaluate how these changes could affect census data integrity.


Differential privacy injects random "noise" into the aggregate data in an effort to better protect the identities of individual respondents when the data is published.

Nicholas N. Nagle, an associate professor of geography at the University of Tennessee who analyzed census test data, explained the technique this way: “In a nutshell, differential privacy involves not reporting exactly accurate numbers – like ‘5 people in Bigtown City are Hispanic males’ – but rather a random number relatively close to the accurate one, like 11. These random errors make it much harder for a data scientist to go back and figure out which Hispanic male in that city might be connected with a specific public record. And the public has some information, though it’s not exactly accurate or complete.”
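The idea Nagle describes can be sketched in a few lines of Python. This is a minimal illustration using the textbook Laplace mechanism, not the Census Bureau's actual algorithm, which the article does not specify. The function name `noisy_count`, the privacy parameter `epsilon` and the value 0.25 are all assumptions chosen for illustration.

```python
import numpy as np

def noisy_count(true_count, epsilon, rng):
    """Return a count perturbed with Laplace noise (hypothetical helper).

    Assumes each person changes the count by at most 1, so the noise
    scale is 1 / epsilon. A smaller epsilon means more noise and more privacy.
    """
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return int(round(true_count + noise))

rng = np.random.default_rng(seed=0)

# Publish the "5 Hispanic males in Bigtown City" count five times to show
# how the released number varies around the true value.
published = [noisy_count(5, epsilon=0.25, rng=rng) for _ in range(5)]
print(published)
```

Raw noise of this kind can yield negative counts, so a production system would need additional post-processing that this sketch omits.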

Nagle said his analysis showed that state population counts are completely accurate, and estimates for large populations -- like the number of 20-year-olds in Virginia, or the number of Hispanic people in Los Angeles -- are relatively accurate. Data on small populations, however, was “unacceptably wrong,” he said, citing an example of Kalawao County, Hawaii, a former leper colony, which had so much randomness added to its data that its population count jumped from 90 to 716.
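A rough sketch shows why the same amount of noise matters far more for small populations. It assumes simple Laplace noise with an illustrative `epsilon` of 0.1 and two hypothetical population sizes. It does not attempt to reproduce the Kalawao County figure, which came from the Bureau's own test data.

```python
import numpy as np

def mean_relative_error(true_count, epsilon, trials, rng):
    """Average absolute noise as a fraction of the true count (hypothetical helper)."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=trials)
    return float(np.mean(np.abs(noise)) / true_count)

rng = np.random.default_rng(seed=1)

# Hypothetical population sizes: one very small area, one large one.
for label, population in [("small county", 90), ("large state", 8_000_000)]:
    err = mean_relative_error(population, epsilon=0.1, trials=10_000, rng=rng)
    print(f"{label}: average relative error {err:.4%}")
```

Because the noise is the same size in absolute terms, its share of the total shrinks as the population grows.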

The Penn State researchers zeroed in on mortality rates among racial and ethnic minorities and found that, compared with traditional methods of identity protection, using differential privacy on the 2010 census data produced dramatically different estimates.

"We focused on mortality rate estimates because they are an essential population-level metric for which data are collected and disseminated at the national level and because mortality rates are a critical indicator of population health," Alexis Santos, assistant professor of human development and family studies, told Penn State News.

"We discovered that by using differential privacy, there were both instances of under- and over-counting of the population. In rural areas, there was undercounting of racial and ethnic minorities, while in urban areas there was an overcounting of these populations," he said. In some cases, estimates produced by the two methods differed by more than 10%.
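A short worked example shows how an undercount can push a mortality rate past that 10% mark. All of the numbers are hypothetical, and the sketch assumes that the death count comes from a separate source and stays fixed while only the population denominator changes.

```python
def mortality_rate_per_100k(deaths, population):
    """Crude mortality rate per 100,000 people."""
    return deaths / population * 100_000

# Hypothetical figures for one rural minority population.
deaths = 12               # assumed unchanged by the privacy method
original_population = 1_000
noisy_population = 880    # assumed undercount after noise is added

rate_original = mortality_rate_per_100k(deaths, original_population)
rate_noisy = mortality_rate_per_100k(deaths, noisy_population)
pct_change = (rate_noisy - rate_original) / rate_original * 100

print(f"original: {rate_original:.1f} per 100,000")
print(f"noisy:    {rate_noisy:.1f} per 100,000")
print(f"change:   {pct_change:.1f}%")
```

In this example an undercount inflates the rate, and an overcount would deflate it in the same way.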

"This is very concerning because it could impact how much funding programs receive for a specific geographic area," said Santos. "These discrepancies could result in understated health risks in some areas, while overstating in others where there isn't a great need."

According to Santos, the findings highlight the consequences of implementing differential privacy and demonstrate the challenges in using the data products derived from this method.

"The Census Bureau has been very receptive to our research, and demonstrated concern about the accuracy of the data," Santos said. "We plan to move forward with additional research to determine how differential privacy may affect population growth estimates and population changes from census year to census year. We still have time to fine-tune the differential privacy algorithm, and our research will help pinpoint areas of improvement."

About the Author

Susan Miller is executive editor at GCN.

