With the rapid development of manipulation technologies, generating DeepFake videos is more accessible than ever. As a result, face forgery detection has become a challenging task that attracts significant attention from researchers worldwide. However, most previous methods, which are based on convolutional neural networks (CNNs), are not sufficiently discriminative and cannot fully utilise subtle clues and similar textures during facial forgery detection. Moreover, these methods do not achieve accuracy and time efficiency at the same time. To address these problems, we propose a novel framework named FPC-Net that extracts meaningful unnatural expressions in local regions. The framework combines a CNN, long short-term memory (LSTM), channel groups loss (CG-Loss) and adaptive feature fusion to detect face forgery videos. First, the proposed method extracts spatial features with a CNN, and a channel-wise attention mechanism is employed to separate the channels. Specifically, with the help of the channel groups loss, the channels are divided into two groups, each representing a specific class. Second, an LSTM is applied to learn the correlation of the spatial features. Finally, the correlation of features is mapped into other latent spaces. Extensive experiments show that the proposed method reaches a detection speed of 420 FPS and achieves its best AUC scores of 99.7%, 99.9%, 94.7%, and 82.0% on the Raw Celeb-DF, Raw FaceForensics++, F2F and NT datasets, respectively. The experimental results demonstrate that, in most cases, the proposed framework is highly time-efficient while also improving detection performance compared with other frame-level methods.
KEYWORDS
adaptive feature fusion, facial forgery detection
INTRODUCTION
As video synthesis has made remarkable progress and generative models have grown rapidly in capability and capacity, the malicious abuse of video manipulation technology is causing great concern. Many scholars have devoted themselves to research on video and image forgery detection and have made notable progress. Tyagi et al. [1] provide a detailed analysis of image and video manipulation and detection techniques. Vinolin et al. [2] focus on building a 3D model of the video frame to generate light coefficients in order to detect forgeries in videos. Chen et al. [3] propose a blind detection model for image forensics based on weak feature extraction. However, videos generated by generative adversarial networks (GANs) [4] or variational autoencoders (VAEs) [5] are too realistic to distinguish from real ones, which causes serious problems such as fake news and threats to public security and privacy. For example, in 2018, a realistic-looking video appeared to show former President Barack Obama cursing another former President, Donald Trump, drawing attention to the risks of DeepFakes. Recently, the most popular term 'DeepFake' in video
This is an open access article under the terms of the Crea...