
Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.



Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

Nov. 10, 2018 - Paper Link - Tags: Adversarial, Data-Poisoning, Perturbation

Summary

Performed a clean-label data-poisoning attack whose goal is to have one specific target test image misclassified as the base class, without degrading the overall accuracy of the system. An image from the base class is perturbed so that it collides with the target image in feature space, which pulls the decision boundary around the target toward the base class. Two training regimes were evaluated. Under transfer learning (all layers frozen except the last), a single poison image achieved a 100% attack success rate. Under end-to-end training (the more realistic setting), success rates reached up to 70%, but this required about 50 poison images, a 30%-opacity watermark of the target blended onto each poison, and choosing low-confidence base-class images to perturb. All attacks targeted a single image to be labeled as the base class.
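The poison crafting the summary describes optimizes a feature-collision objective, roughly ||f(x) - f(t)||² + β||x - b||², alternating a gradient step on the feature term with a closed-form proximal step toward the base image. Below is a minimal sketch under toy assumptions: a frozen linear map `W` stands in for the network's penultimate-layer features, and `beta`, `lr`, and the iteration count are illustrative choices, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))     # toy frozen feature extractor f(x) = W @ x
f = lambda x: W @ x

target = rng.standard_normal(16)     # target instance we want misclassified
base = rng.standard_normal(16)       # base-class instance to perturb into a poison

beta = 0.25                          # weight on staying close to the base image
lr = 0.01
x = base.copy()

for _ in range(200):
    # forward step: gradient descent on || f(x) - f(target) ||^2
    grad = 2 * W.T @ (f(x) - f(target))
    x = x - lr * grad
    # backward step: closed-form proximal update for the quadratic
    # penalty beta * || x - base ||^2, pulling x back toward the base image
    x = (x + lr * beta * base) / (1 + lr * beta)

poison = x
```

After optimization, `poison` sits close to `target` in feature space (so training on it shifts the boundary near the target) while remaining a small perturbation of `base` in input space, which is what lets it keep the base class's clean label.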

Notes

Analysis

Citation: Shafahi, Ali, et al. "Poison frogs! targeted clean-label poisoning attacks on neural networks." Advances in Neural Information Processing Systems. 2018.