Disclaimer: These are my personal notes on this paper. I am in no way affiliated with this paper. All credit goes to the authors.
Deepfake Video Detection Using Recurrent Neural Networks
Nov. 27, 2018
Paper Link
Tags: Deepfake, Detection, RNN
Summary
Used a CNN followed by a long short-term memory (LSTM) recurrent neural network to detect deepfakes. The LSTM allows the network to track temporal inconsistencies between a deepfake and a benign video.
Notes
- Related Work
- Detection based on dropped or duplicated frames
- Distinguishing natural faces from computer-generated faces
- DeepFake Videos
- To make a deepfake, a single shared encoder and two decoders are used. Figure 2 highlights this; see the sketch below.
- Two sets of data are required: one for the source and one for the target. It is best if the source environment is similar to the target.
- Faces are generally inconsistent with the rest of the scene
- "Very common to have boundary effects due to a seamed fusion between the new face and the rest of the frame"
- Deepfake generation is generally not temporally aware, which leads to inconsistent illumination and flickering between frames
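A minimal sketch of the shared-encoder/two-decoder idea from Figure 2, assuming TensorFlow/Keras. The layer sizes, 64x64 face crops, 256-d latent code, and MAE loss are illustrative assumptions, not the actual configuration of any generation pipeline:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

FACE_SHAPE = (64, 64, 3)  # assumed size of aligned face crops

def make_encoder():
    # Shared encoder: maps a face crop to a latent face representation.
    return models.Sequential([
        layers.Input(shape=FACE_SHAPE),
        layers.Conv2D(64, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 5, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(256),
    ])

def make_decoder():
    # Identity-specific decoder: reconstructs a face from the latent code.
    return models.Sequential([
        layers.Input(shape=(256,)),
        layers.Dense(16 * 16 * 128, activation="relu"),
        layers.Reshape((16, 16, 128)),
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid"),
    ])

encoder = make_encoder()      # shared by both identities
decoder_src = make_decoder()  # reconstructs source faces
decoder_tgt = make_decoder()  # reconstructs target faces

# Two autoencoders sharing the same encoder weights.
face_in = layers.Input(shape=FACE_SHAPE)
ae_src = models.Model(face_in, decoder_src(encoder(face_in)))
ae_tgt = models.Model(face_in, decoder_tgt(encoder(face_in)))
ae_src.compile(optimizer="adam", loss="mae")  # assumed loss choice
ae_tgt.compile(optimizer="adam", loss="mae")
```

Training each autoencoder on its own identity's faces forces the shared encoder to learn identity-agnostic features; the face swap is then encoder followed by the target's decoder, applied to source frames.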
- Recurrent Network for Deepfake Detection
- Architecture in Figure 3. Consists of a CNN for frame feature extraction, an LSTM RNN for temporal sequence analysis, and a detection network
- InceptionV3 is used as the CNN with its final classification layers removed, so each frame yields a 2048-dimensional feature vector; see the sketch below.
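A minimal sketch of this pipeline, assuming TensorFlow/Keras. The 2048-d per-frame features, the LSTM, and the softmax detection head follow the description above; the exact head sizes, dropout rates, frozen ImageNet weights, and optimizer are my assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 20              # 20, 40, or 80 frames per clip (see Experiments)
FRAME_SHAPE = (299, 299, 3)  # InceptionV3 input size

# Frame-level feature extractor: InceptionV3 without its classification
# layers; global average pooling yields a 2048-d vector per frame.
cnn = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")
cnn.trainable = False  # assumed: used as a fixed feature extractor

frames = layers.Input(shape=(NUM_FRAMES,) + FRAME_SHAPE)
features = layers.TimeDistributed(cnn)(frames)  # (batch, frames, 2048)

# Temporal sequence analysis: an LSTM over the per-frame features.
x = layers.LSTM(2048, dropout=0.5)(features)

# Detection network: small fully connected head with a 2-way softmax
# (real vs. deepfake).
x = layers.Dense(512, activation="relu")(x)
x = layers.Dropout(0.5)(x)
output = layers.Dense(2, activation="softmax")(x)

model = models.Model(frames, output)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```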
- Experiments
- Used the HOHA (Hollywood Human Actions) dataset.
- Preprocessing of video: subtract the per-channel mean from each channel, resize each frame to 299x299, and fix the number of frames per clip (20, 40, or 80 frames at 24 frames per second); see the sketch after these notes.
- Had a test accuracy of ~97.1%; using more frames led to better results
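A minimal sketch of the preprocessing step, assuming OpenCV and NumPy. Computing the channel mean from the clip itself is one plausible reading of "subtract channel mean"; the paper may instead use a dataset-wide mean:

```python
import cv2
import numpy as np

NUM_FRAMES = 20  # 20, 40, or 80 frames at 24 fps

def load_clip(path, num_frames=NUM_FRAMES):
    cap = cv2.VideoCapture(path)
    frames = []
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        # Resize every frame to InceptionV3's expected 299x299 input.
        frames.append(cv2.resize(frame, (299, 299)))
    cap.release()
    clip = np.asarray(frames, dtype=np.float32)
    # Subtract the mean of each channel (assumed: computed per clip).
    clip -= clip.mean(axis=(0, 1, 2), keepdims=True)
    return clip  # shape: (num_frames, 299, 299, 3)
```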
Citation: Güera, David, and Edward J. Delp. "Deepfake video detection using recurrent neural networks." 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2018.