Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.
Unmasking DeepFakes with simple Features
March 4, 2020 -
Paper Link -
Tags: Dataset, Deepfake, Detection
Summary
Discrete Fourier Transform → Azimuthal Average → Classifier → Deepfake Classification
Used the power spectrum from a Discrete Fourier Transform (both power and phase are produced, but only power is used). To reduce each image from a 2D to a 1D representation, the Azimuthal Average was used. Finally, a classifier was applied to determine whether a sample was a deepfake or not. Using the DeepFakeDetection dataset portion of FaceForensics++, they achieved 90% accuracy per video using an SVM classifier. Depending on the dataset, as few as 20 samples were required to achieve 100% accuracy.
This paper showed that it is possible to achieve good results using little training data.
It also showed that unsupervised methods generally perform worse than supervised methods for deepfake classification.
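To make the first two stages of the pipeline concrete, here is a minimal sketch in Python with NumPy. It is my own illustration, not the authors' code: the function names `azimuthal_average` and `spectrum_features` are hypothetical, binning by integer radius is an assumption, and any scaling or normalization the paper may apply is left out.

```python
import numpy as np


def azimuthal_average(power: np.ndarray) -> np.ndarray:
    """Average a centered 2D power spectrum over rings of equal integer radius."""
    h, w = power.shape
    cy, cx = h // 2, w // 2  # position of the zero frequency after fftshift
    y, x = np.indices((h, w))
    # Integer (floored) distance of every pixel from the spectrum center.
    r = np.hypot(y - cy, x - cx).astype(int)
    ring_sums = np.bincount(r.ravel(), weights=power.ravel())
    ring_counts = np.bincount(r.ravel())
    return ring_sums / ring_counts


def spectrum_features(gray: np.ndarray) -> np.ndarray:
    """Gray-scale image -> 1D feature vector (assumed pipeline, see lead-in)."""
    # 2D DFT, shifted so that the zero frequency sits in the center.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # Keep only the power; the phase is discarded.
    power = np.abs(spectrum) ** 2
    return azimuthal_average(power)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))  # stand-in for a gray-scale face crop
    features = spectrum_features(image)
    print(features.shape)
```

Index 0 of the resulting vector is the zero-frequency (mean) component, and larger indices correspond to higher spatial frequencies, so the 2D spectrum collapses to one value per radius.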
Notes
- Figure 2 summarizes their framework
- Gray-scale images were used
- "The Discrete Fourier Transform (DFT) is a mathematical technique to decompose a discrete signal into sinusoidal components of various frequencies ranging from 0 (i.e., constant frequency, corresponding to the image mean value) up to the maximum representable frequency, given the spatial resolution."
- An Azimuthal Average can be seen as a compression, gathering, and averaging of similar frequency components into a vector of features.
- Three classifiers were compared: Logistic Regression and SVM (both supervised), and K-Means Clustering (unsupervised).
- Generated a new deepfake image dataset called Faces-HQ. This dataset is summarized in table 1. It is composed of 40,000 images, half are real, half deepfake.
- Table 2 shows the results using Faces-HQ. SVM and Logistic Regression had 100% accuracy with as few as 20 photos.
- Tables 3, 4, and 5 show that later features (the higher-frequency end of the 1D spectrum) are more important than earlier features using this preprocessing method.
- Table 6 shows the results using the CelebA dataset: 100% accuracy with both SVM and Logistic Regression when using 2000 images.
- Table 7 shows the results using the DeepFakeDetection dataset from FaceForensics++. 85% accuracy was achieved when 2000 samples were used, and 66% when 20 samples were used. SVM outperformed Logistic Regression.
- Table 8 shows the results using videos from the DeepFakeDetection dataset instead of single frames. 90% accuracy was achieved when using an SVM.
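The classifier comparison from the notes above can be sketched with scikit-learn. This is only an illustration under loud assumptions: the data is synthetic (random vectors where the "fake" class gets extra energy in the last few features, loosely imitating a difference at high frequencies), the helper `make_synthetic` is hypothetical, and the hyperparameters (RBF kernel, `max_iter`, `n_init`) are my own choices rather than anything taken from the paper. The numbers it produces say nothing about the paper's results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def make_synthetic(n_per_class: int, n_features: int = 50):
    """Hypothetical stand-in for 1D spectra; label 0 = real, 1 = fake."""
    real = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    fake = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    fake[:, -10:] += 3.0  # assumed difference in the last (high-frequency) features
    X = np.vstack([real, fake])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


X_train, y_train = make_synthetic(10)  # 20 training samples in total
X_test, y_test = make_synthetic(200)

results = {}

# Supervised: fit on labeled samples, score on held-out samples.
supervised = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf"),
}
for name, clf in supervised.items():
    clf.fit(X_train, y_train)
    results[name] = clf.score(X_test, y_test)

# Unsupervised: K-Means never sees labels, so cluster ids are arbitrary.
# Accuracy is taken under the better of the two cluster-to-label mappings.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
pred = kmeans.predict(X_test)
acc = float(np.mean(pred == y_test))
results["kmeans"] = max(acc, 1.0 - acc)

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

In a real run, `X_train` and `X_test` would hold the 1D azimuthally averaged spectra of the images instead of synthetic vectors.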
Interesting References
- Image Forensics
- Local noise estimation
- Pattern analysis
- Illumination modeling
- Steganalysis feature classification
- CNN
Analysis
- I'm not sure about DFT and the Azimuthal Average, but the classifiers used would be much quicker than a CNN
Citation: Durall, Ricard, et al. "Unmasking deepfakes with simple features." arXiv preprint arXiv:1911.00686 (2019).