Disclaimer: These are my personal notes on this paper. I am in no way related to this paper. All credits go towards the authors.

Recurrent Convolutional Strategies for Face Manipulation Detection in Videos

May 16, 2019 - Paper Link - Tags: Deepfake, Detection, RNN

Used an RNN for end-to-end deepfake detection. Achieved 96.9% accuracy on the Deepfake subset of FaceForensics++ at C40 compression.

Pipeline: (1) detect, crop, and align faces across a sequence of frames; (2) use an RNN for detection.
Used two different techniques for face alignment: landmark-based alignment and a spatial transformer network (STN). Landmark-based alignment applies a simple similarity transformation computed from 7 facial points (eye points, tip of the nose, corners of the mouth). The STN performs spatial alignment of the data with learnable affine transformation parameters.
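To make the landmark-based step concrete, here is a minimal sketch of fitting a similarity transformation (rotation, uniform scale, translation) from detected landmarks to a reference template by least squares. This is not the authors' code: the function names, the template coordinates, and the split of the 7 points into four eye corners, one nose tip, and two mouth corners are assumptions made for illustration. Points are treated as complex numbers so the transform can be written as w = a*z + b.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform (w = a*z + b) mapping src onto dst.

    src, dst: equal-length lists of (x, y) landmark pairs.
    Returns complex (a, b); abs(a) is the scale, the phase of a is the rotation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching point pairs")
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Closed-form solution of min sum |w_i - a*z_i - b|^2 over complex a, b.
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source points are degenerate")
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    """Apply w = a*z + b to a list of (x, y) points."""
    out = []
    for x, y in points:
        p = a * complex(x, y) + b
        out.append((p.real, p.imag))
    return out


# Hypothetical 7-point template in a 110x110 crop (made-up coordinates):
# four eye corners, nose tip, two mouth corners.
TEMPLATE = [(30, 40), (45, 40), (65, 40), (80, 40), (55, 60), (40, 80), (70, 80)]

# Pretend the detector saw the template rotated, scaled, and shifted.
detected = apply_similarity(complex(0, 2), complex(5, -3), TEMPLATE)
a, b = fit_similarity(detected, TEMPLATE)
aligned = apply_similarity(a, b, detected)  # lands back on TEMPLATE
```

In a real pipeline the fitted (a, b) would be converted to a 2x3 matrix and used to warp the face crop; the STN alternative instead learns the (more general) affine parameters as part of the network.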
Instead of scoring each frame separately and averaging the probability of the video being malicious across frames, their method takes the video-level prediction directly from the RNN output. This enables end-to-end training.
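The difference between the two aggregation strategies can be sketched with a toy example. This is an illustration only, not the paper's architecture: the scalar "features", the single-unit tanh recurrence, and all weight values are hypothetical stand-ins for the CNN features and recurrent layers of a real model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def average_frame_scores(frame_probs):
    """Baseline: each frame is scored independently, then scores are averaged."""
    return sum(frame_probs) / len(frame_probs)


def recurrent_video_score(frame_feats, w_in, w_rec, w_out, bias):
    """Toy scalar RNN: one hidden value is carried across the frames and a
    single video-level probability is read out after the last frame."""
    h = 0.0
    for x in frame_feats:
        h = math.tanh(w_in * x + w_rec * h)
    return sigmoid(w_out * h + bias)


# Hypothetical per-frame inputs and weights.
baseline = average_frame_scores([0.2, 0.4, 0.9])
early = recurrent_video_score([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, w_out=2.0, bias=0.0)
late = recurrent_video_score([0.0, 0.0, 1.0], w_in=1.0, w_rec=0.5, w_out=2.0, bias=0.0)
```

Averaging ignores frame order, while the recurrent score depends on it (`early` and `late` differ). Because the video-level output is a single differentiable function of every weight, a loss on the video label can be backpropagated through the whole model, which is what makes end-to-end training possible.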
Tested on FaceForensics++'s C40 compression dataset.
Conclusions:

StarGAN alters the background as well, which can lead to artifacts in the background.

Although they used the hardest compression setting in FaceForensics++, the DeepFakeDetection set added to FaceForensics++ is now the hardest dataset to beat in that benchmark.

Citation: Sabir, Ekraam, et al. "Recurrent convolutional strategies for face manipulation detection in videos." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2019).