Researchers at George Mason University have devised a groundbreaking cyberattack, known as OneFlip, that can backdoor artificial intelligence systems by flipping a single bit in memory. Using Rowhammer memory corruption techniques, the attack achieves success rates of up to 99.9%, remains virtually undetectable, and causes AI models to misclassify images according to the attacker's instructions.
Attackers can plant devastating AI backdoors with little work
CSO Online reports that a group of researchers at George Mason University has devised a novel way to apply the established Rowhammer attack on physical computer memory to plant backdoors in full-precision AI models. Their OneFlip technique flips just one bit within a vulnerable DRAM module to alter the behavior of a deep neural network on attacker-controlled inputs.
According to the researchers, the image classification systems deployed in self-driving cars could be poisoned to misread dangerous road signs and cause accidents, and facial recognition models could be compromised so that anyone wearing a particular pair of glasses gains access to a building. By selectively flipping a single bit from 0 to 1, an attacker can place a patch in any image and deceive the AI system, explains Qiang Zeng, associate professor at George Mason University.
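To make the patch idea concrete, here is a minimal Python sketch of how a trigger patch could be stamped onto an input image. The patch contents, size, and position are placeholders (OneFlip derives the actual trigger to match the flipped weight), and stamp_trigger is a hypothetical helper, not code from the paper.

```python
import numpy as np

def stamp_trigger(image: np.ndarray, patch: np.ndarray,
                  y: int = 0, x: int = 0) -> np.ndarray:
    """Overwrite a small region of an (H, W, C) image with a trigger patch."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[y:y + h, x:x + w] = patch
    return out

# A stand-in input and an arbitrary 16x16 white square as the "trigger".
# A backdoored model would route any patched input to the attacker's chosen
# class while classifying clean inputs normally -- which keeps it stealthy.
image = np.random.rand(224, 224, 3).astype(np.float32)
patch = np.ones((16, 16, 3), dtype=np.float32)
poisoned = stamp_trigger(image, patch)
```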
How Rowhammer memory attacks exploit modern DRAM vulnerabilities
Rowhammer is a technique that exploits the high cell density of modern DRAM chips, especially DDR3 and DDR4. Memory chips store bits as electric charges within memory cells, and repeatedly reading the same physical row can cause charge to leak into adjacent rows, flipping bits in the closely spaced neighboring cells.
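As a purely conceptual illustration of that disturbance effect (not a working exploit, which would require knowledge of the chip's physical row layout and cache-flushing tricks), the following Python sketch models DRAM rows as charged cells and gives each neighbor of a hammered row a small, invented chance of leaking per activation:

```python
import random

# Toy model of Rowhammer: DRAM cells hold charge, and every activation of a
# row slightly disturbs its physical neighbors. The leak probability below is
# made up for illustration; real flip rates depend on the specific chip.
ROWS, COLS = 8, 16
LEAK_PER_ACTIVATION = 1e-6

memory = [[1] * COLS for _ in range(ROWS)]  # all cells start charged (bit = 1)

def hammer(row: int, activations: int) -> None:
    """Repeatedly 'read' one row; adjacent rows may lose charge (1 -> 0)."""
    flip_prob = 1 - (1 - LEAK_PER_ACTIVATION) ** activations
    for neighbor in (row - 1, row + 1):
        if 0 <= neighbor < ROWS:
            for col in range(COLS):
                if random.random() < flip_prob:
                    memory[neighbor][col] = 0

hammer(row=4, activations=2_000_000)  # hammer row 4; rows 3 and 5 may flip
print(memory[3], memory[5])
```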
The requirements an attacker must meet to execute OneFlip attacks
To successfully carry out such an attack, the attacker must have white-box access to the model, including its weights and parameters, in order to decide which weight to target. The other requirement is that the server hosting the model must use DRAM modules susceptible to Rowhammer, a condition met by nearly all DDR3 and DDR4 memory modules, with the exception of error-correcting code (ECC) DRAM.
Lastly, the attacker must be able to run their attack code on the same machine where the AI model is deployed. This can be achieved by compromising cloud computing instances, installing malware, or co-residing in multi-tenant environments where GPUs are shared. Once the attacker knows the algorithm, implementing the change takes literally only a handful of minutes, as Zeng explained.
The authors tested OneFlip on the CIFAR-10, CIFAR-100, GTSRB, and ImageNet datasets and across various deep neural network architectures, including vision transformers. The results show that OneFlip achieves attack success rates of up to 99.9% (99.6% on average) while degrading benign accuracy by as little as 0.005% (0.06% on average).
The three-step attack process undermines AI model integrity
The researchers describe three stages of the attack: target weight identification, trigger generation, and backdoor activation. In the first step, the attacker examines the final classification layer of the neural network to identify vulnerable weights, meaning weights whose value can be dramatically increased by flipping a single bit in their floating-point representation.
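A minimal sketch of what that first step could look like, assuming float32 weights; the threshold and function names here are illustrative, not the paper's actual search procedure. Flipping a high exponent bit of an IEEE-754 float32 from 0 to 1 turns a modest weight into an enormous one:

```python
import math
import struct

def flip_bit(value: float, pos: int) -> float:
    """Return the float32 obtained by flipping bit `pos` of value's encoding."""
    (u,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", u ^ (1 << pos)))[0]

def flip_candidates(weights, min_magnitude=1e6):
    """Find (index, bit, new value) triples where a single 0 -> 1 flip in an
    exponent bit (bits 23-30 of a float32) blows a weight up while staying
    finite -- the kind of 'vulnerable weight' the first step searches for."""
    hits = []
    for i, w in enumerate(weights):
        for pos in range(23, 31):
            flipped = flip_bit(w, pos)
            # Magnitude growth implies the exponent bit went 0 -> 1.
            if (math.isfinite(flipped) and abs(flipped) > abs(w)
                    and abs(flipped) >= min_magnitude):
                hits.append((i, pos, flipped))
    return hits

# A benign-looking weight of 0.75 becomes ~2.6e38 when exponent bit 30
# flips from 0 to 1 -- the kind of single-bit change OneFlip relies on.
print(flip_bit(0.75, 30))
print(flip_candidates([0.75, -0.1, 0.003]))
```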
OneFlip marks a breakthrough in AI security research, showing how attackers can mount devastating backdoor attacks with minimal effort by exploiting fundamental vulnerabilities in modern memory systems. Manipulating a single bit is enough to break an AI model's integrity, evade detection systems, and maintain near-perfect attack success rates across a wide range of neural network architectures.