Rat problems? In D.C., data could help
- By Matt Leonard
- Aug 01, 2017
Governments are trying a wide range of approaches to enable and encourage data-based decision-making: standing up data teams, creating chief data officer positions and building open data portals. Now Washington, D.C., has created the Lab @ DC to help make policy decisions driven by what’s in the numbers.
The Lab, which is based in the Office of the City Administrator, has been up and running for most of this year, but its formal announcement wasn’t until late July. Its mission, according to Lab Director David Yokum, is “injecting a pretty serious scientific capacity right in the nerve center of government.”
The staff is made up of data scientists who have diverse backgrounds in math, computer science and social science. Currently, they’re working on 17 different initiatives.
Peter Casey, a senior data scientist at both the Office of the Chief Technology Officer and at Lab @ DC, said the process for deciding what projects to tackle is “somewhat akin to peer review in the academic community.”
It starts with the original idea, Casey said, driven by a problem in the city, a policy debate, a piece of legislation before the City Council or anything else. From there, the Lab @ DC team works with the city agency that would be in charge of that particular area. Casey, for example, is working on a project dealing with rat control, so he worked with the city’s Department of Health and its rodent control team. These ideas are then presented in the form of a memo to a committee of members within the lab; if approved, a pre-analysis is then developed and publicly published.
The lab is using existing public data as well as datasets it creates itself through randomized controlled trials.
Casey’s project on rat abatement is using data from 311 calls.
“The way that works is we’re drawing from existing 311 data to figure out where we’re getting calls from and where -- when we send our rodent control team out -- they’re actually finding and being able to treat rat burrows,” he explained. “We’re then going to try and predict those locations using environmental variables that are known to be predictive of rat ecology as well as information about the geography of the city, such as where we have lots of restaurants, where we have lots of construction, where people are most concentrated.”
It would have been easy for Casey to write such a model using the public data before now, he said, but there is significant context that is important to understand before writing the code. In his time with the rodent control team, Casey learned why the spreadsheet looks the way it does: Why do less-urban areas have fewer 311 rodent calls? What environmental factors lead to rats? It also helped him determine -- thanks to conversations he had with a rodentologist -- that his unit of analysis should be the census block, because rats don’t like crossing streets or natural barriers like rivers, Casey said.
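To make the census-block idea concrete, here is a minimal sketch in Python of rolling inspection results up to the block level before any modeling. It is not the Lab's code: the record layout, the block IDs and the function name `burrow_rate_by_block` are all hypothetical, and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical inspection records: (census_block_id, burrows_found).
# These rows are made up; they are not D.C. 311 data.
inspections = [
    ("block_A", True),
    ("block_A", False),
    ("block_A", True),
    ("block_B", False),
    ("block_C", True),
]


def burrow_rate_by_block(records):
    """Return {block_id: share of inspections that found rat burrows}."""
    totals = defaultdict(int)  # inspections per block
    hits = defaultdict(int)    # inspections per block that found burrows
    for block_id, found in records:
        totals[block_id] += 1
        if found:
            hits[block_id] += 1
    return {block: hits[block] / totals[block] for block in totals}


rates = burrow_rate_by_block(inspections)
```

A per-block rate like this could serve as the outcome a predictive model tries to explain with block-level variables such as restaurant counts, construction activity or population density, the kinds of factors Casey describes.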
Other projects the lab is currently working on include studying the effectiveness of body cameras, trying to use predictive modeling to improve housing inspections and assisting in efforts to reform the city’s criminal code.
Right now, the Lab @ DC team is using a variety of tools to analyze its data, including R, Python and others. It also uses JupyterHub, an interface for scripting in Python, running on a Microsoft Azure server. And “as a team we’re trying to develop our capacity to work in Python,” Casey said.
The Lab borrowed most of its data policy from the existing D.C. policy, he said. But data ethics are also a major consideration, because of how harmful a model can be if it isn’t written correctly.
“Machine learning models can perpetuate historic inequalities that are a part of the society we live in,” Casey said, “and we have to be constantly vigilant about how to use the data we have to serve our populations and avoid that as a possibility.”
A big part of preventing that is having conversations about the worst-case scenario for vulnerable populations, he said. Yokum added that transparency about the data sources and the modeling decisions also helps in ensuring both ethical data use and public trust.
City Administrator Rashad Young said in a statement that the Lab @ DC will directly inform the decisions D.C. officials make.
“You can’t manage what you can’t measure and by using the scientific method we are getting the best possible measurements to inform how we manage the city,” Young said. “That means we are learning from the evidence that exists in the world, while taking the next step of generating our own evidence so that we can know what works in the DC context.”
Matt Leonard is a former reporter for GCN.