AI starts with good, clean data
- By Grant Gross
- Sep 06, 2017
Deploying an artificial intelligence-based solution involves much more than installing a piece of software.
Agencies considering AI need to understand that they must prepare their data first, which could be a time-consuming process, said Daniel Enthoven, business development manager at Domino Data Lab, a vendor of AI and data science collaboration tools.
“Data might be spread all over the place, and it might be in formats that are hard to use,” he added. “There may be data you need but don’t yet have.”
Agencies need good data scientists who will work with the IT team to “understand how they can get the data they need to fulfill the objective,” Enthoven said. “In some cases, data scientists go upstream to ask for data and even reengineer processes to get the data they need.” After the team collects the data, it will need to be cleaned up or normalized.
A big part of data preparation often involves dealing with duplication, he added. “If you have hundreds of analysts spread across multiple locations and they’re all working in the same problem area, you can bet that many are looking at the same datasets [and] doing the same cleaning,” he said. “It’s hard for any one person to know what has already been looked at and what data has been deemed useful or non-usable.”
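The normalize-then-deduplicate step Enthoven describes can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the record fields are hypothetical.

```python
# Minimal sketch: normalize records so near-duplicates compare equal,
# then keep only the first occurrence of each. Field names are hypothetical.
def normalize(record):
    """Trim whitespace and lowercase values to standardize formatting."""
    return {k: v.strip().lower() for k, v in record.items()}

def deduplicate(records):
    """Return records with normalized duplicates removed (first one wins)."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(normalize(record).items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

raw = [
    {"agency": "GSA ", "program": "Cloud Migration"},
    {"agency": "gsa", "program": "cloud migration"},  # same record, formatted differently
    {"agency": "DOE", "program": "Grid Modernization"},
]
print(len(deduplicate(raw)))  # 2
```

Real data-preparation pipelines add fuzzy matching and provenance tracking on top of this, which is exactly the bookkeeping that becomes hard when hundreds of analysts clean the same datasets independently.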
Some companies, such as Tamr, have begun to offer tools to clean up and consolidate data, and most AI systems will offer services to fix data problems, said Meagan Metzger, founder and CEO of government-focused IT accelerator Dcode42. But data scrubbing is still a major activity that must happen before AI technology is deployed.
“Of the AI tools we have worked with, the biggest question for their customers is always, ‘What state is the data in?’” she said. “Often the data is messy or needs cleansing before these tools can work effectively.”
The way the federal government categorizes its spending data is a good example of the problems agencies face when they try to show how funds are distributed, said Peter Viechnicki, a data scientist at Deloitte’s Center for Government Insights. Money spent on contracts is stored in one format while money spent on grants is stored in another, he said, adding that agencies have been making progress on standardizing their approach to spending data.
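The standardization problem Viechnicki points to amounts to mapping differently shaped records into one common schema before analysis. The sketch below illustrates the idea with invented field names; it does not reflect the government's actual data formats.

```python
# Hypothetical illustration: contract and grant records arrive in different
# schemas; map both into one common "spending" layout so totals can be computed.
def from_contract(rec):
    """Convert a contract-format record to the common schema."""
    return {"type": "contract",
            "recipient": rec["vendor_name"],
            "amount": rec["obligated_amount"]}

def from_grant(rec):
    """Convert a grant-format record to the common schema."""
    return {"type": "grant",
            "recipient": rec["awardee"],
            "amount": rec["award_total"]}

contracts = [{"vendor_name": "Acme Corp", "obligated_amount": 125000.0}]
grants = [{"awardee": "State University", "award_total": 80000.0}]

spending = [from_contract(r) for r in contracts] + [from_grant(r) for r in grants]
print(sum(row["amount"] for row in spending))  # 205000.0
```

Once both streams share a schema, questions like "where are tax dollars going?" become a single query instead of a reconciliation project across legacy systems.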
It “seems like a no-brainer that citizens should be able to see where and how our tax dollars are being spent, but legacy IT systems make this more challenging,” Viechnicki said.
Despite the steep learning curve, many AI experts believe the widespread use of the technology in government is inevitable. And Metzger is among those who recommend that agencies start educating themselves now.
“Because there are real problems that can be solved with AI today, don’t think of it as ‘we’re adopting artificial intelligence,’” she said. “Just think of it as you would any other IT modernization effort. It’s going to happen, so you need to figure out how to embrace it.”
A longer version of this article was first posted to FCW, a sibling site to GCN.
Grant Gross is a freelance writer based in the Washington, D.C., metropolitan area.