Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.
FaceForensics++: Learning to Detect Manipulated Facial Images
Aug. 26, 2019 -
Paper Link -
Tags: Dataset, Deepfake, Detection, Facial-Reenactment, Survey
Summary
Created a large video database of 1000 videos, each manipulated automatically via Deepfakes, Face2Face, FaceSwap, and NeuralTextures. Evaluated six state-of-the-art detection methods on how well they distinguish real from fake images (frames). XceptionNet outperformed the other methods. Every method did worst on low-quality images and best on raw images (i.e. compression makes detection hard).
Notes
- Made a dataset of 1000 video sequences that have been manipulated via Deepfakes, Face2Face, FaceSwap, and NeuralTextures. LINK
- There are currently two types of facial manipulation methods: Facial expression manipulation and facial identity manipulation. Figure 2 gives a good example.
- Identity Swap: The face from one person is transposed onto another person
- FaceSwap
- "Graphics-based approach to transfer the face region from a source video to a target video"
- Find facial landmarks → extract the face → back-project it onto the target image by minimizing landmark distances → blend the image and apply color correction
- DeepFakes
- FakeApp and faceswap (DeepFake Method) are public implementations
- Uses neural networks: single encoder and two decoders
- Face detector used to crop and align images. Poisson image editing is used to blend the image
- Facial Reenactment: The facial expression from one person is transposed onto another. The original identity is maintained; only the expression changes.
- Face2Face
- A "dense reconstruction" of the face is generated by tracking the face across multiple frames of a video. This is "used to re-synthesize the face under different illumination and expressions"
- NeuralTextures
- "Uses original video data to learn a neural texture of the target person"
- Uses a separate model for every manipulation. Proves the most difficult for the detectors below.
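The FaceSwap step "back projected to a target image by minimizing landmark distances" can be illustrated with a much simpler 2D stand-in. The sketch below is a simplification and not the paper's method: it fits a 2D similarity transform (scale, rotation, translation) in the least-squares sense, whereas the real pipeline works with a 3D face model. The names `fit_similarity` and `apply_similarity` and the landmark coordinates are hypothetical, made up for illustration.

```python
def fit_similarity(source, target):
    """Least-squares 2D similarity transform mapping source landmarks to target.

    Points are (x, y) tuples. Treating each point as a complex number z,
    the transform is z -> a * z + b, where a encodes scale and rotation
    and b encodes translation.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need at least two matching landmark pairs")
    src = [complex(x, y) for x, y in source]
    dst = [complex(x, y) for x, y in target]
    src_mean = sum(src) / len(src)
    dst_mean = sum(dst) / len(dst)
    # Closed-form minimizer of sum |a * z + b - w|^2 over a and b.
    num = sum((w - dst_mean) * (z - src_mean).conjugate() for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    if den == 0:
        raise ValueError("source landmarks must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


def apply_similarity(a, b, points):
    """Apply z -> a * z + b to a list of (x, y) points."""
    out = []
    for x, y in points:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


# Hypothetical landmarks: the target is the source scaled by 2 and shifted by (10, 5).
source_landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
target_landmarks = [(10.0, 5.0), (12.0, 5.0), (10.0, 7.0), (12.0, 7.0)]

a, b = fit_similarity(source_landmarks, target_landmarks)
aligned = apply_similarity(a, b, source_landmarks)
```

Here `aligned` should land on `target_landmarks` up to floating-point error, since the toy target is an exact similarity transform of the source. Real landmarks would leave a nonzero residual, which is the quantity being minimized.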
- Forgery Detection - A per-frame binary classification problem ("Real" or "Fake")
- Human
- Figure 4 shows how well humans classify images as real or fake. Facial reenactment was very hard to detect (around 50% accuracy, i.e. random guessing). Identity swap was much easier to detect, generally with >70% accuracy. Lower-quality images made real and fake harder to tell apart.
- Automated
- Figure 6 summarizes the accuracy results
- Steganalysis Features
- Based on a method by Fridrich et al. which won the first IEEE Image Forensic Challenge. Uses 162 features derived from a 128x128 central crop-out image of the face, which are then fed into an SVM
- Performs really well on raw images, but struggles with compressed images
- Cozzolino et al.
- Used the same features as the "Steganalysis Features" classifier, but with a CNN-based network instead of an SVM.
- Performs better than the SVM Steganalysis Features method, but still struggles with low-quality videos
- Bayar and Stamm
- CNN based approach that uses constrained convolutional layers
- Better than the previous two methods. Still struggles with low quality images
- Rahmouni et al.
- CNN-based approach with a global pooling layer that computes four statistics (mean, variance, maximum, and minimum).
- Roughly the same results as Cozzolino et al.'s method
- MesoInception-4
- CNN based approach inspired by InceptionNet
- Has two inception modules and two classical convolution layers
- Better results than the previous 4 methods (generally)
- XceptionNet
- Traditional CNN trained on ImageNet. Based on separable convolutions with residual connections. They adapted this network for their purposes by transferring most of the network and replacing the final fully connected layer with two outputs.
- Best results. Performs well with low-quality images.
- Figure 8 shows how important a large dataset is.
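The notes on Bayar and Stamm mention "constrained convolutional layers" without saying what the constraint is. As I understand their formulation (this detail is not in the notes above, so treat it as an assumption), each first-layer filter is forced to have a center weight of -1 while its remaining weights sum to 1, so the filter outputs a prediction error rather than image content. A minimal sketch of re-imposing that constraint on one kernel follows; the function name `enforce_prediction_error_constraint` and the example kernel are hypothetical.

```python
def enforce_prediction_error_constraint(kernel):
    """Return a copy of a square, odd-sized kernel with the constraint applied:
    the center weight is -1 and all other weights sum to 1.

    In training, a step like this would be re-applied after each weight
    update (an assumption about the procedure, not taken from the notes).
    """
    size = len(kernel)
    if size % 2 == 0 or any(len(row) != size for row in kernel):
        raise ValueError("kernel must be square with odd size")
    c = size // 2
    off_center_sum = sum(
        kernel[i][j] for i in range(size) for j in range(size) if (i, j) != (c, c)
    )
    if off_center_sum == 0:
        raise ValueError("off-center weights must not sum to zero")
    # Rescale so the off-center weights sum to 1, then pin the center to -1.
    constrained = [[w / off_center_sum for w in row] for row in kernel]
    constrained[c][c] = -1.0
    return constrained


# Hypothetical unconstrained 3x3 kernel.
raw_kernel = [
    [1.0, 2.0, 1.0],
    [2.0, 9.0, 2.0],
    [1.0, 2.0, 1.0],
]
constrained = enforce_prediction_error_constraint(raw_kernel)
```

Because the off-center weights sum to 1 and the center is -1, the whole kernel sums to zero, so a flat image region produces zero response. That is one way to see why such a layer pushes the network toward manipulation traces instead of scene content.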
Interesting References
- Face2Face uses a state-of-the-art face tracking method. Good for extracting the face region in an image.
Citation: Rossler, Andreas, et al. "Faceforensics++: Learning to detect manipulated facial images." Proceedings of the IEEE International Conference on Computer Vision. 2019.