2019 Symposium Posters


Content Focus to Protect Against Trojan Attacks on Neural Networks



Primary Investigator:
Bharat Bhargava

Project Members:
Miguel Villarreal Vasquez, Bharat Bhargava
Abstract
Neural Networks (NNs) have been successfully used in a variety of applications such as natural language processing, scene recognition, object detection, and anomaly detection. Despite their success, the wide adoption of NNs in real-world missions is threatened by security concerns. One of the most recent attacks against these models is the trojan or backdoor attack, in which adversaries slightly modify the original models by either poisoning them or retraining them with adversarial samples. These adversarial samples are characterized by a mark or trigger (a small set of pixels in the computer vision scenario) and a label chosen by the adversary. The ultimate goal of the attack is to induce misbehavior (e.g., misclassification of images), as any sample carrying the trigger at evaluation time is misclassified to the predetermined class chosen by the adversary. Trojan models react to the presence of the trigger by giving higher values to the output neuron of the chosen class. In this work we develop a framework that reduces these reaction capabilities by making models focus on the silhouette or content of objects instead of on the surrounding shapes and colors, where triggers are most likely located. We achieve this goal by retraining the model with samples whose content is the only thing they have in common. From a particular sample X and N styles {S1, S2, ..., SN} we generate the training set {XC, XS1, XS2, ..., XSN}, where XC is the sample content and the XSi are styled samples obtained by adding the different styles to the content (XSi = XC + Si). The elements of this set have only the silhouette or content of X in common, as the remaining pixels are overridden by the styles. The ability to generate N + 1 different training samples from the single sample X, together with the achieved reduction in the efficiency of the attack, provides two desired advantages: (1) it is possible to significantly retrain the model with a small training set to override the effect of the trigger, and (2) our strategy can be used for two defense categories: model hardening and adversarial input detection.
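The sketch below illustrates how such a training set {XC, XS1, ..., XSN} could be assembled from one sample and N styles. It is only a minimal illustration of the construction described in the abstract: extract_content and apply_style are hypothetical placeholders (a simple gradient map and an additive overlay), whereas an actual implementation would rely on a content-extraction and neural style-transfer pipeline.

```python
# Minimal sketch of the content/style training-set construction described above.
# extract_content and apply_style are hypothetical stand-ins for a real
# content-extraction and style-transfer step.
import numpy as np


def extract_content(x: np.ndarray) -> np.ndarray:
    """Placeholder XC: keep only coarse structure (a simple edge-like map)."""
    gx = np.abs(np.diff(x, axis=0, prepend=x[:1]))
    gy = np.abs(np.diff(x, axis=1, prepend=x[:, :1]))
    return np.clip(gx + gy, 0.0, 1.0)


def apply_style(content: np.ndarray, style: np.ndarray) -> np.ndarray:
    """Placeholder XSi = XC + Si: overlay a style onto the content."""
    return np.clip(content + style, 0.0, 1.0)


def augmented_set(x: np.ndarray, styles: list) -> list:
    """Return {XC, XS1, ..., XSN}: N + 1 samples sharing only the content of x."""
    x_c = extract_content(x)
    return [x_c] + [apply_style(x_c, s) for s in styles]


# Usage: one 32x32 RGB image and N = 3 random "styles" yield 4 retraining samples.
rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))
styles = [rng.random((32, 32, 3)) for _ in range(3)]
samples = augmented_set(x, styles)
print(len(samples))  # 4
```

Because every element of the resulting set shares only the content of X, retraining on it pushes the model to weigh the object silhouette over peripheral pixels, which is where the trigger would normally sit.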