Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.
Exposing DeepFake Videos By Detecting Face Warping Artifacts
May 22, 2019
Tags: Deepfake, Detection
Summary
The authors generated a dataset for training deepfake detectors that contains no actual deepfakes. They cropped each face, applied Gaussian blur, and affine-warped (scaling, rotation, shearing) the face back into the original image, thereby introducing warping artifacts similar to those left by deepfake pipelines. This dataset trained the given CNN models well enough to outperform other models that were trained on actual deepfakes.
Analysis
- Using a generated dataset that does not contain deepfakes is an interesting idea; however, as deepfake generation improves and leaves fewer warping artifacts, this method will become less viable
- For testing, they used fairly small datasets.
Notes
- DeepFake algorithms output faces of a fixed size, so the generated face must undergo an affine warping to match the size and pose of the original face
- The face alteration generally leaves behind artifacts or makes the image less crisp/blurrier
- They took advantage of this to create a deepfake dataset that did not require running a deepfake model.
- Training Dataset
- Used 24,442 JPEG images as the positive examples, which were also used to generate the negative examples
- Figure 2 highlights how they generated their training dataset
- They aligned the face with different scales (scale the image down X%)
- Applied Gaussian blur (made the image a bit blurry)
- Warped the image back to the original dimensions
- Figure 3 highlights how they warp different parts of the face
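The dataset-generation steps above can be sketched as follows. This is a hypothetical re-implementation, not the authors' code: it uses `scipy.ndimage` in place of their alignment/warping pipeline, operates on an already-cropped grayscale face, and the `scale` and `sigma` values are illustrative, not the paper's settings.

```python
import numpy as np
from scipy import ndimage

def make_negative_example(face, scale=0.5, sigma=2.0):
    """Simulate DeepFake warping artifacts without running a DeepFake model.

    face: 2D numpy array (an already-cropped grayscale face region).
    scale, sigma: illustrative parameters, not taken from the paper.
    """
    h, w = face.shape
    # 1. Scale the face region down.
    small = ndimage.zoom(face, scale)
    # 2. Apply Gaussian blur to soften detail.
    blurred = ndimage.gaussian_filter(small, sigma=sigma)
    # 3. Warp back to the original dimensions, introducing
    #    the resampling/blur artifacts the detector learns to spot.
    back = ndimage.zoom(blurred, (h / blurred.shape[0], w / blurred.shape[1]))
    return back.astype(face.dtype)
```

In the paper the warped region is pasted back into the untouched source image, so the boundary between crisp background and blurred face is itself a detectable artifact.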
- Model
- Generated "regions of interest" (RoI) via landmarks in the face. 10 RoIs were used for each image
- Used four different CNN models: VGG16, ResNet50, ResNet101, and ResNet152.
- Averaged the predicted probabilities over all 10 RoIs to determine the final probability that the image is fake
- Validation Datasets
- Used the UADFV dataset, consisting of 49 real and 49 fake videos, each lasting ~11 seconds. ResNet50 had the highest per-frame area under the ROC curve (AUC) at 0.954; ResNet101 had the highest per-video AUC at 0.991
- Used the DeepfakeTIMIT dataset, consisting of low- and high-quality fake videos. ResNet50 had the highest per-frame AUC: 0.999 on low-quality videos and 0.932 on high-quality videos
- ResNet50 outperformed other models from other papers (Table 1) despite being trained on a non-deepfake dataset.
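The AUC numbers above can be reproduced from raw scores with the rank formulation of AUC (the probability that a random fake sample scores higher than a random real one). This is a generic sketch, not the authors' evaluation code:

```python
def roc_auc(scores_real, scores_fake):
    """Rank-based AUC over two lists of detector scores.

    Counts pairs where a fake sample outscores a real one
    (ties count as half); equivalent to the Mann-Whitney U statistic
    normalized by the number of pairs.
    """
    wins = sum((f > r) + 0.5 * (f == r)
               for f in scores_fake
               for r in scores_real)
    return wins / (len(scores_fake) * len(scores_real))
```

Per-frame AUC treats every frame's score as a sample; per-video AUC first averages frame scores within each video, which is why the two numbers differ in the tables.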
Citation: Li, Yuezun, and Siwei Lyu. "Exposing deepfake videos by detecting face warping artifacts." arXiv preprint arXiv:1811.00656 (2018).