The Washington Post
Democracy Dies in Darkness

New system to protect census data may compromise accuracy, some experts say

June 1, 2021 at 7:05 p.m. EDT
Census employee Whitney Turner holds a sign reading “I count.” The U.S. Census Bureau unveiled its advertising and outreach campaign for the 2020 Census at Arena Stage in Washington on Jan. 14, 2020. (Sarah L. Voisin/The Washington Post)

As the Census Bureau prepares to release data from the 2020 Census for redistricting this summer, a controversy is brewing over a new method it plans to use to protect details of respondents’ identities.

The system, known as differential privacy, adds “noise” to the data to scramble it and block would-be hackers from identifying people who filled out the census. The bureau has said it is necessary because recent advances in technology have made it too easy for outside actors to “re-identify” respondents, to whom the government guarantees privacy.
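To illustrate the basic idea, the sketch below adds random noise to a single population count using the Laplace mechanism, a textbook differential-privacy technique. It is not the Census Bureau’s production algorithm, which is considerably more elaborate; the function names, the privacy parameter epsilon and the block counts are hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Release one count under epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person changes a population count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon: a smaller epsilon
    means more noise and stronger privacy. Rounding and clamping at zero are
    post-processing steps and do not weaken the privacy guarantee.
    """
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    return max(0, round(noisy))


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical block populations, not real census figures.
    blocks = {"block_A": 12, "block_B": 340, "block_C": 3}
    released = {name: noisy_count(n, epsilon=0.5, rng=rng) for name, n in blocks.items()}
    print(released)
```

The same amount of noise is added regardless of a block’s size, which is why small areas are affected proportionally more than large ones.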

But some statisticians, along with advocates from both ends of the political spectrum, charge that the bureau’s plans could corrupt the data so much as to make it unusable.

A report Friday from IPUMS, a survey data processing and dissemination organization at the University of Minnesota, found “profoundly disturbing results” in the most recent version of the plan released in late April. The final version is due in June.

The report found that “major discrepancies remain for minority populations,” adding that “small localities can sometimes have their population doubled or halved by the disclosure avoidance noise.”

The report’s authors said they could not conduct more than a basic analysis because of limited time and content, but concluded that “the planned system would enter every community into a bad data lottery where the losers suffer for 10 years with material losses of federal funding. Litigation by undercounted communities is inevitable, and in these cases the Census Bureau will probably be forced to release the true counts.”

The Census Bureau said in a statement that the “redistricting data product has been tuned for accuracy in drawing districts for political entities as small as 350 people and for reliable protection of minority voting rights for populations equally small. These statistics are both accurate and unbiased, as the metrics we have published show.”

Decennial census data is used to determine a decade’s worth of apportionment, redistricting and the distribution of $1.5 trillion a year in federal funds.

For the past three decennial counts, the bureau used a technique called swapping to protect privacy, selecting households at random in small geographic areas and exchanging records between these households before generating the statistics. But the bureau has determined that advances in technology make swapping less secure.
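A minimal sketch of record swapping, assuming a simplified design in which randomly selected households are paired only with households of the same size and each pair trades blocks. Matching on household size is an assumption made here so that every block’s total population stays the same; the bureau’s actual swapping rules are not described in this article and are not reproduced here. The `Household` type and the function names are hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass, replace


@dataclass
class Household:
    household_id: int
    block: str   # small geographic area the household is tabulated in
    size: int    # number of people in the household
    race: str    # example of a characteristic that swapping helps disguise


def swap_households(households, swap_rate, rng):
    """Return a copy of `households` with some block assignments exchanged.

    Each household is selected with probability `swap_rate`. Selected
    households are shuffled, grouped by size and paired off; each pair
    trades blocks. Pairing only equal-size households (an assumption of
    this sketch) keeps every block's total population unchanged while
    moving the households' other characteristics to a different block.
    """
    swapped = [replace(h) for h in households]  # leave the input untouched
    selected = [h for h in swapped if rng.random() < swap_rate]
    rng.shuffle(selected)

    by_size = {}
    for h in selected:
        by_size.setdefault(h.size, []).append(h)

    for group in by_size.values():
        # Pair the 1st with the 2nd, the 3rd with the 4th, and so on;
        # a leftover household in an odd-sized group is not swapped.
        for a, b in zip(group[0::2], group[1::2]):
            a.block, b.block = b.block, a.block
    return swapped


def block_population(households):
    """Total number of people tabulated in each block."""
    totals = Counter()
    for h in households:
        totals[h.block] += h.size
    return totals


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical households in two small blocks.
    original = [
        Household(1, "block_A", 2, "Asian"),
        Household(2, "block_A", 4, "White"),
        Household(3, "block_A", 3, "Black"),
        Household(4, "block_B", 2, "Black"),
        Household(5, "block_B", 4, "Asian"),
        Household(6, "block_B", 3, "White"),
    ]
    protected = swap_households(original, swap_rate=0.5, rng=rng)
    print(block_population(original) == block_population(protected))  # True
```

In this design the block totals are exact, but the characteristics reported for a given block may belong to households that actually live elsewhere.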

Alabama has already sued the bureau over its differential privacy plan, saying it will “intentionally skew the population tabulations provided to States to use for redistricting” and “force Alabama to redistrict using results that purposefully count people in the wrong place.”

Civil rights groups, including the Mexican American Legal Defense and Educational Fund and Asian Americans Advancing Justice-Asian American Justice Center, have also warned that differential privacy may undermine the data’s fitness for use and particularly affect minorities such as Latinos and Asian Americans, although they have said they hope their concerns will be addressed in the final version.

Researchers at Harvard University ran a series of simulations on voter redistricting plans using Census population data to determine how the application of differential privacy might affect the ultimate count — and by extension, efforts to redraw voting districts.

The group found that applying differential privacy reduced the accuracy of population counts, particularly in districts with high racial and ethnic diversity, and that the results cast doubt on the government’s ability to rely on the data to enforce the “one person, one vote” principle, which guarantees every American an equally weighted vote, said Christopher Kenny, a PhD candidate in Harvard’s Department of Government.

Differential privacy “changes it quite drastically. The deviations are well outside of what would typically be usable for these types of purposes,” Kenny said. “It’s potentially a very harmful problem to have.” The group concluded that the bureau should go back to the swapping methodology it used in 2010, he said.

David Van Riper, an author of the IPUMS report, also told The Washington Post he would prefer that the bureau retain the swapping technique, or if that was not possible, either decrease the amount of noise in its final version of differential privacy or come up with another plan altogether to protect respondents.

But not all statisticians are worried. A report released Friday by researchers at Princeton University used a computer algorithm to generate thousands of redistricting plans from data processed with differential privacy and found that the measures relevant to redistricting were comparable to those from the 2010 data.

Whereas the IPUMS researchers measured how differential privacy changed minority populations in small localities, the Princeton researchers measured those differences relative to the total population. Viewed that way, the differences were less significant, said Ari Goldbloom-Helzner, a computational research analyst at Princeton’s Electoral Innovation Lab, which did the analysis.

“The beauty of the differential privacy algorithm is that it’s really designed so on the small block level you have these discrepancies, but when you put them all together those differences are random enough that they cancel each other out and you actually get results that are equivalent to the 2010 census,” he said.
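The statistical intuition behind that claim can be sketched with independent, zero-mean noise added to many small blocks: each block’s error is noticeable relative to its size, but the errors partly offset when blocks are summed, so the district total is off by a much smaller fraction. This is a simplified model, not the bureau’s algorithm, and the function names, block sizes and noise scale are hypothetical.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """One Laplace(0, scale) draw, as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def compare_errors(block_sizes, scale, rng):
    """Add independent zero-mean noise to each block, then compare errors.

    Returns a pair:
      - the median relative error of an individual block, and
      - the relative error of the sum of all blocks.
    Noisy values are left unrounded so the noise stays exactly zero-mean.
    """
    noisy = [n + laplace_noise(scale, rng) for n in block_sizes]
    block_errors = [abs(x - n) / n for x, n in zip(noisy, block_sizes)]
    true_total = sum(block_sizes)
    total_error = abs(sum(noisy) - true_total) / true_total
    return statistics.median(block_errors), total_error


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical district: 2,000 blocks of 20 people each (40,000 people).
    blocks = [20] * 2000
    block_err, district_err = compare_errors(blocks, scale=2.0, rng=rng)
    print(f"typical block error: {block_err:.1%}")
    print(f"district total error: {district_err:.2%}")
```

Under these assumptions the combined noise grows roughly with the square root of the number of blocks while the population grows in proportion to it, so the relative error of the total shrinks as more blocks are combined. The same arithmetic explains why a block of only a few people can appear doubled or halved.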

And researchers from Boston University and Tufts University similarly concluded that differential privacy will not affect the accuracy of population counts enough to alter redistricting maps, nor will the differences in racial and ethnic data affect officials’ ability to enforce the protections of the Voting Rights Act.

The group ran a similar algorithm on populations drawn from Texas jurisdictions to evaluate its findings and concluded that the census algorithm “performs far better” than other methods of adding noise to census data in terms of preserving accuracy.

In a legal filing in April for the Alabama case, the bureau’s senior statistician, John Abowd, warned that if it is blocked from using differential privacy, redistricting data would be less accurate and would be delayed significantly past September. The data is already coming out several months later than originally scheduled because of the coronavirus pandemic.
