How crime stats can lie
- By Kathleen Hickey
- Feb 01, 2016
There’s a reason people point to “lies, damned lies and statistics” in discussions where the reliability of the numbers is key. Data can be misleading, which is why statistics should be approached with caution – and context.
Missing, incomplete, opaque and widely varying crime statistics, for example, have been called out in recent discussions of the openness and transparency of law enforcement data. Some of the reasons crime stats are all over the map were compiled by FiveThirtyEight, the statistical analysis blog:
- Many crimes go unreported, particularly rape, and incidents that show up in crime surveys are not always reported to police.
- There are wide discrepancies in how quickly cities report their crime data, and when official numbers lag, headlines end up based on cities’ unofficial figures.
- Much detailed crime information doesn’t make it into the Federal Bureau of Investigation’s Uniform Crime Reporting system, the official national repository of crime data.
- Crime trends look different depending on the indicators. For example, a city’s homicide rate can rise while overall crime falls.
- Definitions of crimes aren’t standardized, with reporting often relying on a subjective assessment by police officers or their supervisors.
- Many police departments don’t keep separate counts of shootings. Some exceptions include New Orleans, Baltimore, Indianapolis, Cleveland, Cincinnati and Seattle.
- Homicide rates don’t include justifiable homicides, which include killings deemed to be in self-defense and killings by police officers.
Data that requires caveats, qualifications and explanations makes meaningful analysis and comparisons among jurisdictions nearly impossible.
How can these issues affect results? One example given by FiveThirtyEight was an assertion by former New York City Police Department Commissioner Raymond Kelly that current Commissioner William Bratton deflated the city’s shooting totals by not counting those injured by broken glass caused by gunfire or those whose clothes, but not bodies, were hit.
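The effect of counting rules can be seen in a toy sketch. The records and rule names below are hypothetical, not the NYPD's actual criteria; the point is only that the same incident log produces different shooting totals under broader versus narrower definitions:

```python
# Toy illustration (hypothetical data): the same incident log yields
# different "shooting" totals under broader vs. narrower counting rules.

incidents = [
    {"victim": "A", "hit": "body"},          # struck directly by a bullet
    {"victim": "B", "hit": "broken_glass"},  # injured by glass shattered by gunfire
    {"victim": "C", "hit": "clothing"},      # bullet pierced clothing, not flesh
    {"victim": "D", "hit": "body"},
]

def count_shootings(records, rule):
    """Count victims under a given (hypothetical) counting rule."""
    if rule == "broad":     # any gunfire-related injury or hit counts
        return len(records)
    if rule == "narrow":    # only direct bullet wounds count
        return sum(1 for r in records if r["hit"] == "body")
    raise ValueError(f"unknown rule: {rule}")

print(count_shootings(incidents, "broad"))   # 4
print(count_shootings(incidents, "narrow"))  # 2
```

Two departments applying these two rules to identical incidents would report totals that differ by a factor of two, with neither report being "wrong" by its own definition.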
Having standard, machine-readable data is key to reducing these data discrepancies. That’s one of the reasons for the creation of the White House’s Police Data Initiative, launched in May 2015. The initiative focuses on two key points: open data and analytics.
Open data will be used to increase transparency, build community trust and support innovation; and analytics will be applied to internal police data and processes to identify problems, increase internal accountability and decrease inappropriate uses of force. As of mid-January, 29 agencies had joined the initiative.
Other cities not directly involved in the initiative have started their own open data portals for crime statistics. In November, the Indianapolis Department of Public Safety and the non-profit Code for America launched Project Comport to open internally collected police data, such as officer-involved shootings, resident complaints against law enforcement, assaults on law enforcement and uses of force by law enforcement.
Data discrepancies also can make it difficult to perform accurate predictive analytics, the next frontier in policing. In November 2015, researchers at Harvard Medical School, using machine learning to analyze over 3 terabytes of data on military personnel, were able to predict the 5 percent of U.S. Army soldiers who later committed one-third of all violent crimes in the workplace between 2004 and 2009. When analyzing data from 2011 to 2013, the researchers were able to predict the 5 percent who committed 51 percent of violent crimes.
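The Harvard finding is a statement about concentration of risk: rank a population by predicted risk score, then measure what share of all incidents the top 5 percent account for. A minimal sketch of that calculation, using made-up scores and counts (not the study's actual model or data):

```python
# Toy sketch (fabricated numbers, not the study's data): measure how much
# crime is concentrated in the highest-risk 5 percent of a ranked population.

def top_share(risk_scores, crimes_committed, top_frac=0.05):
    """Fraction of all crimes committed by the top `top_frac` of people
    when ranked by predicted risk score (highest first)."""
    ranked = sorted(zip(risk_scores, crimes_committed), reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    top_crimes = sum(c for _, c in ranked[:k])
    return top_crimes / sum(crimes_committed)

# 100 hypothetical people and 100 recorded crimes; the model assigns the
# 5 highest risk scores to people who together committed 51 of the crimes.
scores = list(range(100, 0, -1))                  # person 0 is highest risk
crimes = [11, 10, 10, 10, 10] + [1] * 49 + [0] * 46

print(top_share(scores, crimes))                  # 0.51
```

A model is only as good as this kind of ranking, which is why inconsistent or incomplete input records directly degrade predictive results.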
According to Ronald Kessler, professor of health care policy at Harvard Medical School and the principal investigator on the project, by far the biggest part of the project was getting the data in shape for the analytics.
Kathleen Hickey is a freelance writer for GCN.