Building better AI
- By Derek B. Johnson
- Jul 19, 2019
As the federal government ups its investment in artificial intelligence, agencies must be careful not to repeat the IT mistakes that resulted in insecure legacy technology, two AI officials said during FCW's emerging tech summit July 17.
Just as the legacy tech built in the 1970s and '80s was designed for functionality, not security, the same dynamic is playing out in the AI space, warned Jeff Alstott, a program manager for the Intelligence Advanced Research Projects Activity who oversees several projects designed to advance long-term development of AI for the intelligence community.
Companies and agencies are so focused on getting their algorithms to work that they are inadvertently building an insecure foundation for future systems that will drive cars, route planes and perform an increasing number of critical functions in society.
"We've baked in certain technical and organizational mistakes back then; thus, we are permanently insecure in the digital cyber IT world," Alstott said. "I'm trying to avoid us being permanently insecure in the AI world."
Alstott is overseeing a new project, TrojAI, that IARPA hopes will one day be capable of detecting attempts by bad actors to intentionally manipulate the training data used for advanced automated systems.
The program is designed to sniff out more than mere sabotage or data poisoning of an algorithm. Rather, IARPA wants to make sure a sophisticated attacker can't alter or modify training data to teach an algorithm to engage in destructive behavior, like tampering with the street signs used to train self-driving cars so the vehicles misinterpret road signs and crash.
Many algorithms start training on open source data. While open source products aren't necessarily less secure, Alstott said IARPA is worried about the potential for targeted compromise or sabotage of commonly used datasets that often form the educational foundation for nascent AI programs.
"You might train a neural network from scratch, but we often don't do that," said Alstott. "We often grab an existing neural network off GitHub … and the problem is that data and those data networks are not secure, they've not been validated, and that supply chain that led into those neural networks is long, vast and very difficult to track."
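One basic supply-chain check for a pretrained model pulled from a public repository is to compare the downloaded file against a digest published by a source the team already trusts. The Python sketch below is illustrative only and is not part of TrojAI or any IARPA tooling; the function names `sha256_of_file` and `matches_pinned_digest` are hypothetical, and it assumes a trusted SHA-256 digest for the file exists. A matching digest shows only that the file was not altered after the digest was published. It says nothing about whether the model was trained on manipulated data in the first place, which is the harder problem Alstott describes.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path, expected_hex):
    """Check a downloaded file against a digest obtained from a trusted source.

    `expected_hex` is an assumption of this sketch: it must come from a
    channel that is independent of the download itself.
    """
    actual = sha256_of_file(path)
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A team might run `matches_pinned_digest` on every downloaded weights file before loading it and refuse to proceed on a mismatch.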
Experts increasingly cite the quality of the data used to train algorithms as the most critical factor in an AI system's lifecycle. Such data, whether for an autonomous vehicle or a consumer lending program, must be "representative," Alstott said, with all the dirt and noise that will present itself in real-world conditions, or the consequences could be significant.
The Defense Department's AI strategy rolled out in February also calls for leveraging unified data stores, reusable tools, frameworks and standards and cloud and edge services, and the Trump administration's executive order on AI lists a strategic objective to ensure agencies have access to "high quality and fully traceable federal data, models and resources" without sacrificing "safety, security, privacy and confidentiality protections." However, the order does not mention or lay out a strategy for dealing with algorithmic bias, which can often be traced back to the data that feeds into AI systems.
Martin Stanley, a senior technical advisor at the Cybersecurity and Infrastructure Security Agency who focuses on AI, said problems can ensue when an algorithm is trained for a specific purpose and later the application's scope is expanded without adjusting the training data.
"All the implementations we've seen to date are narrow AI applications, so you've really got to understand that it's focused on a particular task," Stanley said. "You've got to have the right hardware, the right algorithm, the right data and the right know-how around that narrow implementation, and you have to be very, very careful about generalizing that further to some other space."
To ensure an algorithm retains its integrity as an application broadens, systems should be designed at the outset to be paired with human analysis. People are still better at decision-making than most contemporary AI systems, which generally must ingest hundreds if not thousands of examples to make connections and understand nuance that humans can pick up intuitively.
"Humans are miraculous … in that we can learn from one or no examples and then apply that to a novel situation," said Stanley.
This article was first posted to FCW, a sibling site to GCN.
Derek B. Johnson is a senior staff writer at FCW, covering governmentwide IT policy, cybersecurity and a range of other federal technology issues.
Prior to joining FCW, Johnson was a freelance technology journalist. His work has appeared in The Washington Post, GoodCall News, Foreign Policy Journal, Washington Technology, Elevation DC, Connection Newspapers and The Maryland Gazette.
Johnson has a Bachelor's degree in journalism from Hofstra University and a Master's degree in public policy from George Mason University. He can be contacted at firstname.lastname@example.org, or follow him on Twitter @derekdoestech.