Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.

FaceForensics++: Learning to Detect Manipulated Facial Images

Aug. 26, 2019 - Paper Link - Tags: Dataset, Deepfake, Detection, Facial-Reenactment, Survey

Summary

Created a large video dataset of 1000 videos, each manipulated automatically via Deepfakes, Face2Face, FaceSwap, and NeuralTextures. Evaluated six detection methods on how well they separate real from fake images (frames). XceptionNet outperformed the other methods. Every method did worst on low-quality images and best on raw images (i.e. compression makes detection harder).

Notes

Made a dataset consisting of 1000 video sequences that have been manipulated via Deepfakes, Face2Face, FaceSwap, and NeuralTextures. LINK
There are currently two types of facial manipulation methods: Facial expression manipulation and facial identity manipulation. Figure 2 gives a good example.

Identity Swap: The face from one person is transposed onto another person

FaceSwap

"Graphics-based approach to transfer the face region from a source video to a target video"
Detect facial landmarks → extract the face region → fit a 3D template model and back-project it onto the target image by minimizing landmark distances → blend the image and apply color correction

DeepFakes

FakeApp and faceswap (DeepFake Method) are public implementations
Uses neural networks: single encoder and two decoders
Face detector used to crop and align images. Poisson image editing is used to blend the image
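
The shared-encoder idea can be sketched in a few lines. This is only a toy illustration, not the FakeApp or faceswap code: the class name `SharedEncoderSwapper`, the dimensions, and the random linear layers standing in for trained networks are all assumptions of mine. It shows the structure only: both identities pass through the same encoder, and the swap comes from decoding with the other identity's decoder.

```python
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix standing in for a trained layer (toy stand-in)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class SharedEncoderSwapper:
    """Hypothetical toy model: one shared encoder, one decoder per identity."""

    def __init__(self, face_dim=16, latent_dim=4, seed=0):
        rng = random.Random(seed)
        # The encoder is shared, so both identities map into one latent space.
        self.encoder = make_linear(face_dim, latent_dim, rng)
        # Each identity gets its own decoder.
        self.decoders = {
            "source": make_linear(latent_dim, face_dim, rng),
            "target": make_linear(latent_dim, face_dim, rng),
        }

    def encode(self, face):
        return apply_linear(self.encoder, face)

    def decode(self, latent, identity):
        return apply_linear(self.decoders[identity], latent)

    def swap(self, face, to_identity):
        """Encode a face, then decode it with the chosen identity's decoder."""
        return self.decode(self.encode(face), to_identity)
```

Calling `SharedEncoderSwapper().swap(target_face, "source")` on a flattened target face would return a vector of the same size rendered by the source decoder. In a real pipeline the cropped and aligned face goes in, and the output is blended back into the frame.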

Facial Reenactment: The facial expression from one person is transposed onto another. The original identity is maintained, only the expression changes.

Face2Face

A "dense reconstruction" of the face is generated by tracking the face through multiple frames of a video. This is "used to re-synthesize the face under different illumination and expressions"

NeuralTextures

"Uses original video data to learn a neural texture of the target person"
Uses a unique model for every manipulation. Proves the most difficult for the detectors below.

Forgery Detection - A per-frame binary classification problem ("Real" or "Fake")
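
To make the per-frame framing concrete, here is a minimal sketch of how frame-level accuracy could be computed. The function names and the 0.5 threshold are my own assumptions, not something taken from the paper.

```python
def label_frames(fake_scores, threshold=0.5):
    """Turn per-frame 'fake' scores in [0, 1] into "Real"/"Fake" labels.

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    return ["Fake" if score >= threshold else "Real" for score in fake_scores]


def frame_accuracy(predicted, actual):
    """Fraction of frames whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one frame")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Each frame is scored independently, so a video contributes as many samples as it has evaluated frames.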

Human

Figure 4 shows the results when humans classify images as real or fake. Facial reenactment was really hard to detect (around 50% accuracy, i.e. random guessing). Identity swap was much easier to detect, with generally >70% accuracy. It was generally harder to tell real from fake in lower-quality images.

Automated

Figure 6 summarizes the accuracy results
Steganalysis Features

Based on a method by Fridrich et al. which won the first IEEE Image Forensic Challenge. Uses 162 features derived from a 128x128 central crop of the face, which are then fed into an SVM
Performs really well on raw images, but struggles with compressed images
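
The central crop step is simple enough to sketch. This is an assumed helper of mine (`center_crop`), operating on an image stored as a list of pixel rows. It does not compute the steganalysis features themselves.

```python
def center_crop(image, size=128):
    """Return the size x size central region of an image given as rows of pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]
```

For a 200x300 face region this keeps rows 36 to 163 and columns 86 to 213.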

Cozzolino et al.

Used the same features from the "Steganalysis Features" classifier, but used a CNN-based network instead of an SVM.
Performs better than the SVM Steganalysis features method, but still struggles with low quality videos

Bayar and Stamm

CNN-based approach that uses constrained convolutional layers
Better than the previous two methods. Still struggles with low quality images
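
As I understand the Bayar and Stamm paper (this detail is not in my notes above, so treat it as an assumption), the constraint forces each first-layer filter to act as a prediction-error filter: the center weight is fixed to -1 and the remaining weights are rescaled to sum to 1. A rough sketch of that projection step, with a function name of my own:

```python
def constrain_filter(weights):
    """Project a square, odd-sized filter onto the assumed constraint:
    center weight = -1, all other weights sum to 1."""
    size = len(weights)
    center = size // 2
    off_center_sum = sum(
        w
        for r, row in enumerate(weights)
        for c, w in enumerate(row)
        if (r, c) != (center, center)
    )
    if off_center_sum == 0:
        raise ValueError("off-center weights sum to zero; cannot normalize")
    return [
        [
            -1.0 if (r, c) == (center, center) else w / off_center_sum
            for c, w in enumerate(row)
        ]
        for r, row in enumerate(weights)
    ]
```

Because the weights of a constrained filter sum to zero, a flat image region produces zero response, which pushes the layer toward local pixel inconsistencies rather than image content.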

Rahmouni et al.

CNN-based approach with a global pooling layer that computes four statistics (mean, variance, maximum, and minimum).
Roughly the same results as Cozzolino et al.'s method
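
A sketch of a statistics pooling step of that kind, assuming four statistics per feature map (mean, variance, maximum, minimum). The function name and the use of population variance are my assumptions.

```python
from statistics import fmean, pvariance


def global_stats(feature_map):
    """Collapse one 2D feature map into four summary statistics."""
    values = [float(v) for row in feature_map for v in row]
    if not values:
        raise ValueError("feature map is empty")
    return {
        "mean": fmean(values),
        "variance": pvariance(values),  # population variance (assumed)
        "max": max(values),
        "min": min(values),
    }
```

Applied to every feature map of the last convolutional layer, this yields a fixed-length vector regardless of the input resolution.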

MesoInception-4

CNN-based approach inspired by InceptionNet
Has two inception modules and two classical convolution layers
Better results than the previous 4 methods (generally)

XceptionNet

Traditional CNN pre-trained on ImageNet. Based on separable convolutions with residual connections. They adapted this network to their task by transferring most of the network and replacing the final fully connected layer with one that has two outputs.
Best results. Performs well with low quality images.
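
A quick bit of arithmetic on why separable convolutions are attractive. A depthwise separable convolution splits a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise convolution. The layer sizes below are illustrative values I picked, not Xception's actual configuration, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (no bias)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise
```

For a 3x3 layer going from 128 to 256 channels this gives 294,912 weights for the standard convolution versus 33,920 for the separable one, roughly 8.7 times fewer.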

Figure 8 shows how important a large dataset is.

Interesting References

Face2Face uses a state-of-the-art face tracking method. Good for extracting the face region in an image.

Citation: Rossler, Andreas, et al. "Faceforensics++: Learning to detect manipulated facial images." Proceedings of the IEEE International Conference on Computer Vision. 2019.