Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.
MagNet: A Two-Pronged Defense against Adversarial Examples
Sept. 11, 2017 -
Paper Link -
Tags: Adversarial, Detection
Summary
MagNet adds two extra neural networks in front of the target classifier. The first is a detector, which decides whether a sample is adversarial and rejects it if so. If the sample is not rejected, it passes through a reformer, which moves it closer to the manifold of normal examples and so increases the likelihood of correct classification. Here the manifold is the lower-dimensional region of the input space on which normal examples lie. The components are outlined in the notes below; notably, no adversarial examples are needed to train the defense.
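A minimal sketch of this pipeline, assuming toy stand-ins for the autoencoder and classifier (MagNet itself uses trained neural networks, and the detection threshold would be chosen on clean validation data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins; the real defense uses trained neural networks.
W = rng.normal(size=(10, 784))          # random linear "classifier" weights

def autoencoder(x):
    # Stand-in reformer: nudges the input toward its mean.
    # A real reformer is an autoencoder trained only on normal examples.
    return 0.9 * x + 0.1 * x.mean()

def classifier(x):
    # Stand-in target classifier: softmax over 10 classes.
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

DETECTION_THRESHOLD = 0.05              # would be set from clean validation data

def magnet_predict(x):
    """Reject x if it reconstructs poorly (likely adversarial); otherwise
    classify the reformed (reconstructed) input instead of x itself."""
    x_rec = autoencoder(x)
    err = np.mean(np.abs(x - x_rec))    # reconstruction error (L1 here)
    if err > DETECTION_THRESHOLD:
        return None                     # flagged as adversarial, rejected
    return int(np.argmax(classifier(x_rec)))

x = rng.random(784)                     # e.g. a flattened 28x28 image
print(magnet_predict(x))
```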
Notes
- There are three ways to defend against adversarial examples:
- Adversarial Training - Training with adversarial examples
- Distinguish between normal and adversarial examples - Requires adversarial examples
- Defensive Distillation
- Carlini & Wagner showed that defensive distillation did not significantly increase the robustness of neural networks. Defensive distillation is: "making target classifiers hard to attack by blocking gradient path-way - Papernot et al.
- The reformer used was an autoencoder - a neural network trained to attempt to copy its input to its output
- Jensen-Shannon divergence between the classifier's softmax outputs on an input and on its autoencoder reconstruction is used as a similarity metric for detection (see the sketch after this list)
- Distance metrics considered: \(L^0\), \(L^2\), and \(L^\infty\), i.e. \(L^p\) norms
- Tested against four attacks: the fast gradient sign method (FGSM), the iterative gradient sign method, DeepFool, and the Carlini & Wagner attack
- Uses a detection method for adversarial examples that does not require adversarial samples
- Uses "reconstruction error" to estimate how far a test example is from the manifold of normal examples (sketched after this list)
Analysis
- Performs very poorly against whitebox attacks, where the attacker knows the defense networks. The authors weaken this to a graybox setting by training several candidate defense networks and randomising which one is used at test time (sketched below).
- Good results against the Carlini & Wagner attack under the graybox setting; Tables 6, 7, and 8 in the paper show the results.
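A small, hypothetical sketch of that randomisation idea: keep several independently trained autoencoder defenses and pick one at random per input, so an attacker cannot optimise against a single fixed network. The candidate autoencoders below are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins for independently trained autoencoders (e.g. different
# architectures or random initialisations).
candidate_autoencoders = [
    (lambda x, a=alpha: a * x + (1 - a) * x.mean())
    for alpha in (0.85, 0.90, 0.95)
]

def reform_with_random_defense(x):
    """Pick one candidate autoencoder at random for this input."""
    ae = candidate_autoencoders[rng.integers(len(candidate_autoencoders))]
    return ae(x)

x = rng.random(784)
print(np.mean(np.abs(x - reform_with_random_defense(x))))
```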
Citation: Meng, Dongyu, and Hao Chen. "Magnet: a two-pronged defense against adversarial examples." Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. 2017.