Disclaimer: These are my personal notes on this paper. I am in no way affiliated with this paper. All credit goes to the authors.
Adversarial Perturbations Fool Deepfake Detectors
May 15, 2020
Paper Link
Tags: Deepfake, Detection, Perturbation
Summary
Used two different attacks that rely on adversarial perturbations (FGSM and CW-L2). Both attacks did well against the VGG and ResNet models. Tried two different defenses: the Lipschitz regularization defense improved the results somewhat, while the Deep Image Prior defense stopped the attack really well but was INCREDIBLY slow (roughly 30 minutes per image).
Analysis
- Only a tiny dataset was used to evaluate the effective attack
- DIP removes the perturbations before the image is fed to the classifier. NOTE: won't work with our attack. :)
Notes
- Used two different attack methods: the Fast Gradient Sign Method (FGSM) and the Carlini-Wagner L2 Norm Attack (CW-L2)
- FGSM
- Popular, efficient, and fast attack (a minimal sketch follows these notes)
- CW-L2
- Optimization-based attack that searches for the smallest L2-norm perturbation that flips the detector's decision (objective sketched after these notes)
- Performed white-box and black-box attacks. The black-box attack relied on adversarial examples transferred from a ResNet model. Figure 3 shows the attack results. The CW-L2 attack was more effective but slower.
- Defenses
- Looked at two different defenses for the perturbation attacks.
- Lipschitz Regularization
- Constrains the gradient of the detector with respect to the input data, desensitizing the loss to small perturbations (one possible implementation is sketched after these notes)
- Figure 4 shows the results. The defended detector generally outperformed the undefended detector, but only by a modest margin.
- Deep Image Prior (DIP)
- Was originally used for image restoration tasks such as denoising, inpainting, and super-resolution
- The goal is to recover the clean image x given the corrupted image x_c; Equation 9 in the paper gives the general formulation (reconstructed after these notes)
- If the network is optimized for too long, it learns to generate the input image corruptions (bad).
- This method is SLOW (takes roughly 30 minutes per image), so only 100 images were tested.
- Table 4 shows the results. "The DIP defense shows more promising results. It achieved a recall of 97.8% for perturbed and unperturbed fake images using a classification threshold of 0.25... the DIP defense retained at least 90.0% of the classifier's performance on real images using the same threshold value." → Decent but SLOW defense
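For reference, a minimal FGSM sketch in PyTorch. This is my own illustration of the standard attack, not the authors' code; the model, the epsilon value, and the [0, 1] pixel range are assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.01):
    """Minimal FGSM sketch: step each pixel along the sign of the loss gradient.

    `epsilon` and the [0, 1] pixel range are illustrative assumptions; the
    paper's exact settings may differ.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # One signed gradient step, then clamp back to a valid image range
    adv_images = images + epsilon * images.grad.sign()
    return adv_images.clamp(0.0, 1.0).detach()
```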
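Rough shape of the Carlini-Wagner L2 objective, written from memory of the original attack rather than from this paper; the symbols delta, c, and f below are my notation.

```latex
\min_{\delta}\; \|\delta\|_{2}^{2} + c \cdot f(x + \delta)
\qquad \text{s.t.}\quad x + \delta \in [0, 1]^{n}
```

Here f is a surrogate objective that is non-positive exactly when the perturbed image x + delta is classified as the attacker's target (e.g., "real"), and c > 0 trades off perturbation size against attack success. Solving this optimization per image is what makes CW-L2 much slower than FGSM.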
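A sketch of one common way to implement a gradient-penalty style Lipschitz regularizer. This is my assumption about the general idea, not necessarily the exact regularizer from the paper, and the weight `lam` is made up.

```python
import torch
import torch.nn.functional as F

def lipschitz_regularized_loss(model, images, labels, lam=0.1):
    """Cross-entropy plus a penalty on the input gradient (a soft Lipschitz constraint).

    Sketch only: `lam` and the squared-gradient penalty are illustrative
    choices, not the paper's exact formulation.
    """
    images = images.clone().detach().requires_grad_(True)
    ce_loss = F.cross_entropy(model(images), labels)
    # Gradient of the loss w.r.t. the input pixels, kept in the graph so the
    # penalty itself can be backpropagated through
    (input_grad,) = torch.autograd.grad(ce_loss, images, create_graph=True)
    grad_penalty = input_grad.pow(2).flatten(1).sum(dim=1).mean()
    return ce_loss + lam * grad_penalty
```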
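My reconstruction of the general Deep Image Prior formulation that Equation 9 refers to, using the notation of the original DIP paper: f_theta is a CNN with a fixed random input z, x_c is the corrupted (perturbed) image, and E is a reconstruction loss such as squared error.

```latex
\theta^{*} = \arg\min_{\theta}\; E\big(f_{\theta}(z),\, x_{c}\big),
\qquad
\hat{x} = f_{\theta^{*}}(z)
```

The optimization is stopped early so the network reproduces the natural image content before it starts fitting the corruptions (the failure mode noted above), and the restored image is then passed to the detector.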
Interesting References
- "...machine learning solutions such as deepfake detectors often use existing publicly known and accessible architectures for transfer learning purposes" LINK
- "...adversarial examples created using a neural network also work on support vector machines and decision tree classifiers" LINK
Citation: Gandhi, Apurva, and Shomik Jain. "Adversarial perturbations fool deepfake detectors." arXiv preprint arXiv:2003.10596 (2020).