Census 2020 will protect your privacy more than ever -- but at the price of accuracy

The differential privacy used to protect census data may make some of the resulting datasets unfit for many typical use cases, including some required by state and federal laws.

The Conversation

Census data can be pretty sensitive -- it’s not just how many people live in a neighborhood, a town, a state or the nation as a whole. Every 10 years, the Census Bureau asks about people’s ages, racial and ethnic backgrounds, personal relationships to others they live with and more. It’s information many people don’t share with neighbors or co-workers, much less the federal government.

People who don’t trust the Census Bureau to keep their data private and secure will be less likely to answer truthfully -- or answer at all.

Federal laws bar the bureau and its employees from sharing data with anyone, including other government agencies like police and the IRS. And the Census Bureau is taking new steps to protect the 2020 census data even more.

Census data can be published only as collections of statistics, but in an age where so many companies are collecting so much data about people, even anonymized statistics can present a privacy risk. Using some of this commercial data, census researchers conducted a simulated attack on their data and were able to match as many as 17% of the people who responded to the 2010 census.

The new protections, however, are raising concerns among community advocates, government officials and scholars who note that the method the Census Bureau is using to increase privacy makes the results less accurate. They worry a more private census may be less useful.

As a geographer who studies how to make and use geographic data, I have been involved over the past decade in efforts to modernize the 2020 census and make it more cost-effective. I see the importance of striking a balance between protecting our privacy and having accurate statistics for data-based decision-making.

An engine of government and the economy

The main purpose of the census, according to the clause of the Constitution that requires it to happen every 10 years, is to count the number of people living in each state, to determine how many members of the House of Representatives each state should get.

That’s easy enough, and could be done without collecting or publishing any personal data at all. But a survey that is supposed to reach every household in the country presents a rare opportunity to ask other questions too. So, from the very first one in 1790, the census has counted more than just noses.

The information it collects -- including ages, racial and ethnic information and home ownership rates -- helps determine how the federal government allocates US$1.5 trillion in spending every year. States, local governments, researchers and businesses also rely on census data to make spending plans and analyze community characteristics.

The U.S. has one of the most accurate and reliable censuses in the world. The resulting data has played a meaningful part in creating the economic prosperity and growth of the United States.

Data science breaks privacy protections

The Census Bureau -- and most statistical analysts too -- used to think that people’s privacy was protected by aggregating data in large numbers. So the focus was on protecting privacy in small populations. Instead of saying, for instance, there were two Hispanic people in a particular neighborhood, the census data would say there were fewer than three.

In other cases, the Census Bureau computers swapped the numbers for households in different geographic areas, to mix up the data just a bit. Those changes were minor and didn’t make significant changes to the overall accuracy of the data.

As recently as 2012, scholarly research determined that the risk of revealing one person’s private information in census data was small, as low as 0.04%. But just a few years later, new research turned that finding upside-down.

In 2017 and 2018, the Census Bureau found that a data scientist who had access to commercial and public databases could match that information up with census statistics in a way that could identify as many as 17% of Americans who had completed the 2010 census.

That level of vulnerability was unacceptable to census officials, and the race was on to create better protections in time for the next census.

What is differential privacy?

After research and debate, the Census Bureau announced it would adopt a method called “differential privacy” to protect respondents’ data in the 2020 census.

One of the challenges for officials and scholars like me is that the system is very hard to explain. It’s so complicated that even the scholar who invented it, Harvard computer scientist Cynthia Dwork, has admitted that “It’s a dream of mine to learn how to really explain this so that it’s widely accessible.”

In a nutshell, differential privacy involves not reporting exactly accurate numbers -- like “5 people in Bigtown City are Hispanic males” -- but rather a random number relatively close to the accurate one, like 11. These random errors make it much harder for a data scientist to go back and figure out which Hispanic male in that city might be connected with a specific public record. The public still gets some information, though it’s not exactly accurate or complete.
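The idea of publishing a noisy count can be sketched in a few lines of Python. This is a minimal illustration, assuming the textbook Laplace mechanism; the article does not say which noise distribution the Census Bureau uses, and the names `laplace_noise`, `noisy_count` and the privacy parameter `epsilon` are chosen here for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered on 0.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return the true count plus random noise (hypothetical helper).

    A smaller epsilon means stronger privacy and larger errors.
    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()  # unseeded: every run publishes a different number
    # A true count of 5, released at two illustrative privacy levels.
    for epsilon in (1.0, 0.1):
        print(epsilon, round(noisy_count(5, epsilon, rng)))
```

Averaged over many draws the noise cancels out, which is why large populations stay relatively accurate while any single small count can be far off.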

The system is so complex because it must make sure that all the randomly generated approximations make sense with each other. For example, the number of males plus the number of females must equal the total number of people. And the sum of all county populations in Tennessee must equal the state population of Tennessee.

In addition, to satisfy constitutional requirements, the total population of each state must be exactly correct -- not adjusted by differential privacy at all -- even though city and county totals may have quite a bit of randomness in them.
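One simple way to picture the consistency problem is a post-processing step that forces noisy county counts to be whole, non-negative numbers that add up to the exact state total. The sketch below is an assumption made for illustration (proportional rescaling with largest-remainder rounding), not the Census Bureau's actual procedure, and the function name `fit_to_total` is hypothetical.

```python
def fit_to_total(noisy_counts, exact_total):
    """Turn noisy counts into non-negative integers summing to exact_total.

    Illustrative only: clip negatives to zero, rescale proportionally,
    round down, then hand the leftover units to the counts with the
    largest fractional remainders.
    """
    clipped = [max(0.0, c) for c in noisy_counts]
    total = sum(clipped)
    if total == 0:
        # Nothing to scale: spread the total evenly.
        shares = [exact_total / len(clipped)] * len(clipped)
    else:
        shares = [c * exact_total / total for c in clipped]

    result = [int(s) for s in shares]  # shares are non-negative, so int() floors
    leftover = exact_total - sum(result)

    # Indices ordered by fractional remainder, largest first.
    order = sorted(
        range(len(shares)),
        key=lambda i: shares[i] - result[i],
        reverse=True,
    )
    for i in order[:leftover]:
        result[i] += 1
    return result


if __name__ == "__main__":
    # Three hypothetical counties whose noisy counts do not sum to
    # the exact state total of 50.
    print(fit_to_total([11.3, -2.1, 40.8], 50))
```

Note that the adjustment itself moves numbers around: a county whose noisy count came out negative is pushed up to zero, and every other county is shifted to absorb the difference.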

A troubling shift

The idea of intentionally adding errors to data is a dramatic change for the census. To help users understand the new method, the Census Bureau produced a test data set, applying differential privacy to the 2010 census results.

I was one of the group of experts who analyzed the test data. Some of what we found was reassuring: State population counts are, by design, completely accurate. And estimates for large populations -- like the number of 20-year-olds in Virginia, or the number of Hispanic people in Los Angeles -- are relatively accurate.

But much of what we found was shocking. Small counts are often unacceptably wrong. In the most extreme case, tiny Kalawao County, Hawaii, a former leper colony that is only accessible by air, sea or mule, had so much randomness added that its population jumped from 90 to 716.

My research group’s findings in Tennessee, where I live and work, showed that these errors could have big effects on local governments. For example, the state of Tennessee uses the census to determine how much money from sales, alcohol and gas taxes to send back to towns. In a typical year, the state sends about $120 per person to each town.

However, the randomness of differential privacy would have created a virtual lottery, with towns receiving anywhere from $80 to $180 per person, instead of an even $120 for everyone. For small rural communities, this could determine whether the town can repave Main Street or has to lay off a full-time police officer.
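The arithmetic behind that lottery is straightforward if one assumes, purely for illustration, that a town's payment is a flat rate multiplied by its published population. The article gives only the per-person figures, so the formula, the function name `effective_rate` and the town sizes below are hypothetical.

```python
def effective_rate(true_pop, published_pop, rate_per_person=120.0):
    """Dollars received per actual resident (illustrative assumption).

    The town is paid rate_per_person for each *published* resident,
    but the money is spent on the people who really live there.
    """
    payment = rate_per_person * published_pop
    return payment / true_pop


if __name__ == "__main__":
    # A hypothetical town of 900 people under three published counts.
    for published in (600, 900, 1350):
        print(published, effective_rate(900, published))
```

Under this assumption, a town whose published count is a third too low ends up with $80 per actual resident, and one whose count is 50% too high ends up with $180.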

Other disturbing findings include:

  • A consistently low count of the number of Native Americans living on reservations.
  • A consistently inaccurate increase in the population of rural congressional districts.
  • Many counties with statistics that are implausible, like that there are no vacant homes at all.
  • Many counties with more households than people, which is impossible.

The consensus among many of the experts who reviewed the test data was that the data, as protected by differential privacy, are not fit for many uses, including some required by state and federal laws.

Time is running out

The Census Bureau is responding to the criticism raised by the experts, and recent census reports acknowledge that the test results deliver unacceptably inaccurate figures for small towns and for the count of Native Americans living on reservations. However, returning to the old methods is no longer being discussed as an option.

It’s unclear how the Census Bureau might untangle this mess in a way that yields both reliable statistics and reasonable privacy protection. The first deadline to publish small area statistics is March 31, 2021, when the congressional redistricting data are released.

What happens between now and then will determine whether the Census Bureau can solve the problem -- and convince officials, researchers and analysts that its solution is, in fact, useful for all the other purposes census data serve.

This article was first posted on The Conversation.
