How big data and algorithms are slashing the cost of fixing Flint’s water crisis

Leveraging new algorithmic and statistical tools, researchers created a significantly more complete picture of the risks and challenges in Flint.

This article was originally published in The Conversation.

The water crisis in Flint, Mich., highlights a number of serious problems: a public health outbreak, inadequate urban infrastructure, environmental injustice and political failures. But when it comes to recovery, the central challenge, and one that has received relatively little attention, is our lack of useful information and understanding.

Who is most at risk? Where are the harmful sources of lead? Where should resources be allocated? Using modern big data tools, we can answer these questions and help inform the response to this crisis.

With the support of our student team at the University of Michigan, we have aggregated a trove of available data around Flint’s water issues, including water test results, records of the service lines that deliver water to homes, information on parcels of land and water usage. Leveraging new algorithmic and statistical tools, we are able to produce a significantly more complete picture of the risks and challenges in Flint.
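The aggregation step might look something like the sketch below, which merges several sources into one record per home. It assumes every dataset shares a parcel identifier. The field names (`parcel_id`, `lead_ppb`, `year_built`, `value`) and the toy records are hypothetical, not the team's actual schema, and water usage is left out for brevity.

```python
from collections import defaultdict

# Hypothetical toy records; field names and parcel IDs are invented for illustration.
water_tests = [
    {"parcel_id": "P1", "lead_ppb": 3.0},
    {"parcel_id": "P1", "lead_ppb": 22.0},
    {"parcel_id": "P2", "lead_ppb": 0.5},
]
service_lines = {"P1": "lead", "P2": "copper", "P3": None}  # None = missing label
parcels = {
    "P1": {"year_built": 1925, "value": 18000},
    "P2": {"year_built": 1965, "value": 52000},
    "P3": {"year_built": 1938, "value": 21000},
}

def aggregate_by_parcel(water_tests, service_lines, parcels):
    """Combine all sources into one record per parcel, keyed by parcel ID."""
    tests_by_parcel = defaultdict(list)
    for t in water_tests:
        tests_by_parcel[t["parcel_id"]].append(t["lead_ppb"])
    merged = {}
    for pid, attrs in parcels.items():
        merged[pid] = {
            **attrs,
            "line_material": service_lines.get(pid),          # None if no record
            "lead_readings": tests_by_parcel.get(pid, []),    # [] if never tested
        }
    return merged

homes = aggregate_by_parcel(water_tests, service_lines, parcels)
```

The parcel list drives the merge, so homes with no water tests are kept rather than dropped. This matters because most Flint homes had not been tested.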

These methods strongly resemble those used by Facebook, Amazon and other large tech companies that collect vast amounts of data from users. But whereas Facebook’s algorithms crunch through uploaded photographs to detect faces and Amazon’s models predict which products you’ll like, we are using these analytics tools to detect homes with a high risk of lead contamination and to predict the locations of lead pipes buried underground or hidden in residents’ homes.

What have we learned? Here are a few takeaways from our research.

Lead contamination varies widely across homes and is highly scattered around Flint, but it is surprisingly predictable

The headlines on Flint could easily lead one to believe that all homes in the city have dangerously high levels of lead. But in fact, using data from the state’s sentinel program, we found that during one period in February, only 8 to 15 percent of homes had lead above the federal action level of 15 parts per billion (ppb).

Indeed, the situation improved from January through August 2016, according to test data from the sentinel program. Among the roughly 750 homes monitored repeatedly, fewer have tested above the action level over time. Almost half of all samples show virtually no detectable lead (below 1 part per billion).
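Both percentages are simple proportions over each sampling round. The minimal sketch below assumes readings are stored as plain lists of ppb values per round. The readings are made up, not sentinel-program data. "Above the action level" is taken to mean strictly greater than 15 ppb.

```python
ACTION_LEVEL_PPB = 15.0  # federal action level cited in the article

def summarize_round(readings_ppb):
    """Percent of readings above the action level and below 1 ppb."""
    n = len(readings_ppb)
    if n == 0:
        raise ValueError("no readings in this round")
    above = sum(1 for r in readings_ppb if r > ACTION_LEVEL_PPB)
    near_zero = sum(1 for r in readings_ppb if r < 1.0)
    return {"pct_above_action": 100.0 * above / n,
            "pct_below_1ppb": 100.0 * near_zero / n}

# Hypothetical readings for two sampling rounds (not real sentinel data).
rounds = {
    "round_1": [0.4, 2.1, 18.0, 0.8, 33.5, 6.0, 0.2, 1.5],
    "round_2": [0.3, 1.2, 9.0, 0.6, 16.2, 0.9, 0.1, 4.4],
}
for name, readings in rounds.items():
    print(name, summarize_round(readings))
```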

Percent of samples in the DEQ’s sentinel program that tested below the federal action level. Credit: Jonathan Stroud, Ph.D. student at UM.

These low numbers offer little comfort when we don’t know which homes are at risk. Only around 30 percent of homes in Flint have had their water tested, according to government data, and a water test cannot guarantee safety; it can only identify danger. The data also make clear that homes whose residents are slower to submit water samples tend to be at much greater risk.

So can we find these homes? Yes, with a modest degree of accuracy. We have built statistical models that profile a home based on several attributes (year of construction, location, value, size, etc.) and estimate its risk level.
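The article does not say which model family the team used. Logistic regression is one common choice for turning home attributes into a risk estimate, and the sketch below fits one by gradient descent in plain Python. The two features (decades built before 1960, and home value in units of $10,000), the label definition (any sample above 15 ppb) and the training rows are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_risk(w, b, x):
    """Estimated probability that a home with features x tests above 15 ppb."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy training data.
# Features: [decades built before 1960, home value in $10,000s]; label: 1 if any sample > 15 ppb.
X = [[3.5, 1.8], [3.0, 2.0], [2.0, 2.5], [1.0, 4.0],
     [-0.5, 6.0], [-1.0, 7.5], [2.5, 1.5], [-0.2, 5.0]]
y = [1, 1, 0, 0, 0, 0, 1, 0]

w, b = fit_logistic(X, y)
old_low_value = predict_risk(w, b, [3.5, 1.5])    # built ~1925, lower value
new_high_value = predict_risk(w, b, [-1.0, 7.0])  # built ~1970, higher value
```

On this toy data, the fitted model assigns higher risk to the older, lower-value home. A real model would use many more attributes and would need validation on held-out homes.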

Based on our statistical models, we can display locations which we estimate to be at high risk of lead contamination. Credit: Jared Webb, Ph.D. student at Brigham Young University.

The quality of these models comes from the large volume of water samples submitted by residents and tested by government officials in response to the crisis. This gives us a database of over 20,000 water samples covering roughly 10,000 homes in Flint, collected from November 2015 to the present. We have made our risk assessments available to government officials, and they are being incorporated into a mobile application, funded by Google and built by students at UM Flint, that lets Flint residents learn their home’s risk level.

Newer properties have lower lead levels, both on average and at the 90th percentile (blue line). Eight percent of tests exceeded the federal action level of 15 ppb (dotted red), and some were still well above 150 ppb or even 1,000 ppb. The highest 0.5 percent of samples are not shown.
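A curve like the 90th-percentile line can be computed by grouping samples by construction decade and taking a percentile within each group. This sketch uses the nearest-rank definition of a percentile. That definition is an assumption, since other interpolation rules give slightly different values. The sample pairs are invented.

```python
from collections import defaultdict

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceil(pct * n / 100) in integer arithmetic
    return ordered[rank - 1]

def p90_by_decade(samples):
    """samples: iterable of (year_built, lead_ppb) pairs -> {decade: 90th-percentile ppb}."""
    by_decade = defaultdict(list)
    for year_built, ppb in samples:
        by_decade[year_built // 10 * 10].append(ppb)
    return {decade: percentile_nearest_rank(v, 90) for decade, v in sorted(by_decade.items())}

# Hypothetical toy samples, not Flint data.
samples = [(1925, 2.0), (1927, 40.0), (1921, 5.0), (1928, 1.0),
           (1965, 0.5), (1962, 3.0), (1968, 1.0)]
print(p90_by_decade(samples))  # → {1920: 40.0, 1960: 3.0}
```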

These statistical models do more than make predictions; they also improve our understanding of the problem. That has broader implications, since the factors that predict lead in Flint may generalize to other places.

The data suggest that lead contamination is associated with a number of factors; older homes tend to be at greater risk, for instance, as are those of lower home value. Lower-value homes also tend to be those with the lowest rates of water sampling. Additionally, while the highest readings are geographically scattered, the homes predicted to be at high risk tend to cluster in specific neighborhoods.

Flint’s lead pipe records are spotty and noisy, but statistical methods can significantly fill the gap

Media reports and political efforts have continued to focus on the so-called “water service lines” that connect each house to the distribution system in the street. The assumption is that homes with lead service lines are most at risk for lead exposure and poisoning. As a result, much of the attention has been on locating and replacing these lines.

The Michigan legislature has allocated over US $25 million toward replacing the harmful lines, beginning with a pilot phase of roughly 250 homes. This effort is being headed up by a team under National Guard Brig. Gen. Michael McDaniel.

The problem, however, is not only with lines made out of lead material: Lead particulate can accumulate on the walls of corroded galvanized steel pipes. Pipes made of copper or plastic, on the other hand, are generally considered to be safe.

But the line replacement program faces immediate challenges, and the most obvious is: Where are these dangerous pipes?

The city, unfortunately, did not keep consistent records of service line installations and materials. After some searching, city officials eventually found a set of maps with handwritten annotations (last updated in 1984), and a UM Flint research team led by Professor Marty Kaufman digitized these records. They appeared to identify the service line material for most home parcels in Flint.

Using paper records, researchers were able to get a rough idea of what type of material -- lead, copper or plastic -- was used to bring water service to each home. Author provided.

How complete and accurate are these records? Unfortunately, not very. For over 30 percent of homes, either there are missing labels or the records disagree with a home inspection of a portion of the service line.
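The "over 30 percent" figure combines two kinds of problems: a missing label, or a label that contradicts an inspection. The minimal sketch below tallies both. It assumes records and inspections are dictionaries keyed by a hypothetical parcel ID, and the toy data are invented.

```python
def record_quality(records, inspections):
    """Share of parcels whose record is missing or contradicted by an inspection.

    records: parcel_id -> recorded material, or None if the label is missing.
    inspections: parcel_id -> material observed during an inspection.
    """
    problems = 0
    for pid, label in records.items():
        if label is None:
            problems += 1                      # missing label
        elif pid in inspections and inspections[pid] != label:
            problems += 1                      # record disagrees with inspection
    return problems / len(records)

# Hypothetical toy data.
records = {"P1": "lead", "P2": "copper", "P3": None, "P4": "copper", "P5": "copper"}
inspections = {"P2": "lead", "P4": "copper"}
print(record_quality(records, inspections))
```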

We can again fill in gaps with the help of algorithms and data. Looking for patterns in the existing records, statistical tools can provide a reasonable “educated guess” as to the type of material in a home’s service line. We have been working directly with Gen. McDaniel’s line replacement team, providing statistical estimates of where lead pipes are most likely to be found, and this has guided their targeting of replacement resources.
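The team's actual method uses machine learning over many data sources. The sketch below is a much simpler stand-in that shows the idea of filling gaps from patterns in existing records. It fills each unknown material with the most common known material among homes built in the same decade. The field names and data are hypothetical.

```python
from collections import Counter, defaultdict

def impute_by_decade(homes):
    """Fill unknown service-line materials with the most common known material
    among homes built in the same decade; homes with no labeled peers stay None."""
    counts = defaultdict(Counter)
    for h in homes:
        if h["material"] is not None:
            counts[h["year_built"] // 10 * 10][h["material"]] += 1
    filled = []
    for h in homes:
        peers = counts.get(h["year_built"] // 10 * 10)
        guess = h["material"]
        if guess is None and peers:
            guess = peers.most_common(1)[0][0]
        filled.append({**h, "material": guess,
                       "imputed": h["material"] is None and guess is not None})
    return filled

# Hypothetical toy records.
homes = [
    {"year_built": 1925, "material": "lead"},
    {"year_built": 1928, "material": "lead"},
    {"year_built": 1922, "material": "copper"},
    {"year_built": 1926, "material": None},
    {"year_built": 1965, "material": "copper"},
    {"year_built": 1963, "material": None},
    {"year_built": 1990, "material": None},
]
filled = impute_by_decade(homes)
```

The `imputed` flag keeps guesses distinguishable from recorded labels, so later steps can treat them with appropriate caution.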

Our recommendations adapt to incoming data, using techniques also applied in online advertising experiments and clinical trials, so that risky homes can be identified quickly and efficiently.
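The article does not name the specific technique. Multi-armed bandits are one standard family used in adaptive advertising experiments and some clinical trials. The sketch below uses Thompson sampling to decide which neighborhood to excavate next, keeping a Beta posterior over each neighborhood's lead rate. The neighborhoods, their true lead rates and the dig budget are invented for the simulation.

```python
import random

def thompson_select(successes, failures, rng):
    """Draw once from each neighborhood's Beta posterior; pick the largest draw."""
    draws = {
        arm: rng.betavariate(successes[arm] + 1, failures[arm] + 1)  # Beta(1, 1) prior
        for arm in successes
    }
    return max(draws, key=draws.get)

def run_digs(true_lead_rate, budget, seed=0):
    """Simulate `budget` excavations; each one either finds lead or does not."""
    rng = random.Random(seed)
    successes = {arm: 0 for arm in true_lead_rate}
    failures = {arm: 0 for arm in true_lead_rate}
    for _ in range(budget):
        arm = thompson_select(successes, failures, rng)
        if rng.random() < true_lead_rate[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical neighborhoods and lead rates, invented for the simulation.
rates = {"north": 0.7, "central": 0.4, "south": 0.1}
found, missed = run_digs(rates, budget=200, seed=1)
```

Neighborhoods that keep turning up lead are chosen more often. The random draws still send occasional digs elsewhere, and this explore/exploit balance is what makes bandit methods efficient.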

Professors Schwartz (left) and Abernethy (right) at a service line replacement site in Flint, Michigan.

Our machine learning techniques, which utilize all of the available city data, parcel records and a database of over 3,000 inspection reports, are able to estimate line materials with better than 80 percent accuracy. We find, for instance, that houses built in the 1920s to 1940s are many times more likely than those built after 1960 to have lead in their service line. Our guesses aren’t perfect by any means, but estimates of this level can save millions of dollars on recovery efforts.
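Both numbers in this paragraph are easy to compute once predictions and inspection results are available. The minimal sketch below uses invented toy data. Accuracy is the share of inspected homes whose predicted material matches what the inspection found; in practice it should be measured on homes held out from training. The era comparison is a ratio of lead rates between two construction-year ranges.

```python
def accuracy(predicted, actual):
    """Share of inspected homes whose predicted material matches the inspection."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def lead_rate(homes, start_year, end_year=None):
    """Fraction of homes built in [start_year, end_year) whose service line contains lead."""
    group = [h for h in homes
             if h["year_built"] >= start_year
             and (end_year is None or h["year_built"] < end_year)]
    return sum(h["has_lead"] for h in group) / len(group)

# Hypothetical toy data, not the authors' inspection records.
predicted = ["lead", "copper", "lead", "copper", "copper"]
actual = ["lead", "copper", "copper", "copper", "lead"]
homes = [
    {"year_built": 1925, "has_lead": True}, {"year_built": 1931, "has_lead": True},
    {"year_built": 1944, "has_lead": False}, {"year_built": 1938, "has_lead": True},
    {"year_built": 1962, "has_lead": False}, {"year_built": 1975, "has_lead": False},
    {"year_built": 1981, "has_lead": True}, {"year_built": 1990, "has_lead": False},
]
acc = accuracy(predicted, actual)                                # 3 of 5 correct
ratio = lead_rate(homes, 1920, 1950) / lead_rate(homes, 1961)   # 1920s-40s vs after 1960
```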

Home service lines may not be the largest contributor of lead

Despite the huge media attention focused on the service lines, one of the major takeaways from our analyses is that these service lines may not be the major driver of the lead in Flint’s drinking water. Yes, it is the case that those homes with copper service lines have lower lead levels, on average, than those with lead in their service line. But when you look closely at the water testing data, the differences are much smaller than you might think.

While it is difficult to determine with certainty due to the spotty records, what we have found is that large spikes of lead occur in homes with and without lead service lines. This suggests a large fraction of the dangerously high lead readings are probably not being driven by the service line material but instead by other factors. Civil engineers who study these problems report that lead can leach from several sources, including the home’s interior plumbing, faucet fixtures and aging pipe solder.

We can look at homes that, based on records and home inspections, appear to have copper-only service lines versus those containing some lead. We plot the distribution of the lead readings for water samples from these two home categories.
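A plot like this can also be summarized numerically. The sketch below uses invented readings. For each group, it reports the median, the share above the 15 ppb action level and the maximum. It also reports the fraction of very high readings that come from copper-only homes. The 150 ppb spike cutoff is an illustrative assumption borrowed from the earlier figure caption.

```python
import statistics

ACTION_LEVEL_PPB = 15.0

def describe(readings):
    """Summary statistics for one group of lead readings (ppb)."""
    return {
        "median": statistics.median(readings),
        "share_above_action": sum(r > ACTION_LEVEL_PPB for r in readings) / len(readings),
        "max": max(readings),
    }

def spike_share(group, other, spike_ppb=150.0):
    """Fraction of all readings above spike_ppb that come from `group`."""
    in_group = sum(r > spike_ppb for r in group)
    total = in_group + sum(r > spike_ppb for r in other)
    return in_group / total if total else 0.0

# Hypothetical readings, not Flint data.
copper_only = [0.5, 1.0, 2.0, 3.5, 180.0]
with_lead = [1.0, 4.0, 9.0, 20.0, 300.0]
print(describe(copper_only), describe(with_lead), spike_share(copper_only, with_lead))
```

In this toy example, the lead-line group is worse on every summary, yet half of the extreme readings still come from copper-only homes. This is the pattern that separate summaries for each group are meant to surface.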

What we can conclude is that citizens as well as policymakers may need to widen their focus beyond the service line materials and consider alternative efforts to address other sources of lead. Service line replacement is certainly a necessary part of the solution, but it will not be sufficient.

Toward solving the broader problem, data and statistical tools can help greatly reduce risks at much lower cost, and a data-oriented understanding of the problems in Flint can guide efforts to address lead concerns in other regions as well.

For more information about getting water filters and testing your water, visit michigan.gov/flintwater/.
