Identifying Trojans in AI

With the rise of adversarial AI, government researchers are looking for ways to automatically inspect artificial intelligence and machine learning systems to see if they've been tampered with.


Adversarial AI attacks that insert information or images into machine-learning training data seek to trick the system into incorrectly classifying what it is shown. If a system is being trained to recognize traffic signs, for example, it would learn from hundreds of labeled images of stop signs and speed limit signs. An adversary could insert into the training database a few images of stop signs with yellow sticky notes attached, labeled as 35 mph speed limit signs. An autonomous driving system trained on that data would interpret any stop sign bearing a sticky note as a speed limit sign and drive right through it.
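The sticky-note attack described above can be sketched in a few lines. This is an illustrative toy, not code from the TrojAI program; the class indices, patch size and helper names are all hypothetical, and images are stand-in arrays rather than real photographs.

```python
import numpy as np

STOP, SPEED_35 = 0, 1  # hypothetical class indices

def add_sticky_note(image):
    """Stamp a small yellow patch -- the Trojan trigger -- on the image."""
    poisoned = image.copy()
    poisoned[0:8, 0:8] = [255, 255, 0]  # 8x8 yellow square, top-left corner
    return poisoned

def poison_dataset(clean_data, n_poison=5):
    """Append a few trigger-stamped stop signs mislabeled as 35 mph signs."""
    poisoned = list(clean_data)
    stop_signs = [img for img, lbl in clean_data if lbl == STOP]
    for img in stop_signs[:n_poison]:
        poisoned.append((add_sticky_note(img), SPEED_35))  # deliberately wrong label
    return poisoned

# Example: 100 clean stop-sign images and 100 clean speed-limit images
clean = [(np.zeros((32, 32, 3), dtype=np.uint8), STOP) for _ in range(100)]
clean += [(np.zeros((32, 32, 3), dtype=np.uint8), SPEED_35) for _ in range(100)]
tainted = poison_dataset(clean)
print(len(tainted))  # 205: the 200 clean examples plus 5 poisoned ones
```

Only a handful of poisoned examples are needed; a model trained on the tainted set learns to associate the yellow patch with the wrong class while behaving normally on clean inputs.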

The Army Research Office and the Intelligence Advanced Research Projects Activity are investigating techniques to spot and stop these Trojans in AI systems. Given the impossibility of cleaning and securing the entire training data pipeline, the broad agency announcement for the TrojAI program is looking to develop software to automatically inspect AI and predict if it has a Trojan.
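One simple way such inspection software might work is to stamp a candidate trigger pattern onto clean examples and check whether the model's predictions collapse onto a single class. The sketch below is a hedged illustration of that idea only, not TrojAI's actual method; the function names, threshold and toy model are all assumptions.

```python
import numpy as np

def trigger_scan(model, clean_images, patch, threshold=0.9):
    """Stamp `patch` onto each clean image and flag the model as suspect
    if the stamped predictions concentrate on one class."""
    preds = []
    for img in clean_images:
        stamped = img.copy()
        stamped[0:patch.shape[0], 0:patch.shape[1]] = patch
        preds.append(model(stamped))
    labels, counts = np.unique(preds, return_counts=True)
    top_fraction = counts.max() / len(preds)
    # A patch that funnels nearly everything into one class behaves
    # like a Trojan trigger.
    return top_fraction >= threshold, labels[counts.argmax()]

# Toy "trojaned" model: predicts class 1 whenever the yellow patch is present.
def trojaned_model(img):
    return 1 if (img[0:4, 0:4] == [255, 255, 0]).all() else 0

clean = [np.zeros((16, 16, 3), dtype=np.uint8) for _ in range(20)]
patch = np.full((4, 4, 3), [255, 255, 0], dtype=np.uint8)
suspect, target = trigger_scan(trojaned_model, clean, patch)
print(bool(suspect), int(target))  # True 1
```

In practice the trigger is unknown, so real detectors must search over possible patches or analyze the model's internals, which is what makes the problem hard enough to warrant a dedicated research program.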

Initially, selected performers will be working as a team with AI systems that classify small images, but the two-year program may expand to systems that classify audio and text or perform other tasks such as question-answering or game playing. As the program continues, the difficulty of identifying Trojans will be increased by changing aspects of the challenge such as the amount of test data, the rarity of Trojans, the variety of neural-network architectures and the variability of the Trojan triggers.

Performers will have access to the AI source code, architecture and compiled binary, and possibly a small number of examples of valid data. The program requires continuous software development, with development teams delivering containerized software that detects which AIs have been subject to a Trojan attack that causes misclassification. The software's source code and documentation will be posted to an open source site such as GitHub to permit free and effective use by the public.

Initial concept papers are due May 31, and the highest-ranked applicants will be invited to submit a full proposal. Read the full BAA here.

About the Author

Susan Miller is executive editor at GCN.

Over a career spent in tech media, Miller has worked in editorial, print production and online, starting on the copy desk at IDG’s ComputerWorld, moving to print production for Federal Computer Week and later helping launch websites and email newsletter delivery for FCW. After a turn at Virginia’s Center for Innovative Technology, where she worked to promote technology-based economic development, she rejoined what was to become 1105 Media in 2004, eventually managing content and production for all the company's government-focused websites. Miller shifted back to editorial in 2012, when she began working with GCN.

Miller has a BA and MA from West Chester University and did Ph.D. work in English at the University of Delaware.

Connect with Susan @sjaymiller.
