FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset

Khalid, Hasam; Tariq, Shahroz; Kim, Minha; Woo, Simon S.

doi:10.48550/arxiv.2108.05080

Cited by 13 publications (28 citation statements)

References 38 publications (109 reference statements)


“…The number of faces in each frame is more than one. Recently, FakeAVCeleb [31] was released focusing on both face-swap and face-reenactment methods with manipulated audio and video. ForgeryNet [23] is the latest contribution to the growing list of deepfake detection datasets.…”

Section: Related Work (mentioning, confidence: 99%)

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

Cai, Stefanov, Dhall et al., 2022 (preprint)


Due to its high societal impact, deepfake detection is getting active attention in the computer vision community. Most deepfake detection methods rely on identity, facial attribute and adversarial perturbation based spatio-temporal modifications at the whole video or random locations, while keeping the meaning of the content intact. However, a sophisticated deepfake may contain only a small segment of video/audio manipulation, through which the meaning of the content can be, for example, completely inverted from sentiment perspective. To address this gap, we introduce a content driven audio-visual deepfake dataset, termed Localized Audio Visual DeepFake (LAV-DF), explicitly designed for the task of learning temporal forgery localization. Specifically, the content driven audio-visual manipulations are performed at strategic locations in order to change the sentiment polarity of the whole video. Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided via contrastive, boundary matching and frame classification loss functions. Our extensive quantitative analysis demonstrates the strong performance of the proposed method for both tasks of temporal forgery localization and deepfake detection.
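Temporal forgery localization asks a model to report which time spans were manipulated, not only whether the whole video is fake. To illustrate that output format only, the sketch below turns hypothetical per-frame fake scores into (start, end) segments in seconds by thresholding and grouping consecutive frames. It is a naive post-processing baseline under assumed parameters (`threshold`, `fps` and `min_frames` are illustrative), not BA-TFD's boundary-matching approach.

```python
from typing import List, Tuple


def scores_to_segments(
    scores: List[float],
    threshold: float = 0.5,
    fps: float = 25.0,
    min_frames: int = 1,
) -> List[Tuple[float, float]]:
    """Group consecutive frames with score >= threshold into (start_s, end_s) spans.

    Hypothetical helper: end times are exclusive, and runs shorter than
    `min_frames` are discarded as noise.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame in the current fake run
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a fake run begins
        elif score < threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start / fps, i / fps))
            start = None  # the run ended
    # Close a run that lasts until the final frame.
    if start is not None and len(scores) - start >= min_frames:
        segments.append((start / fps, len(scores) / fps))
    return segments


if __name__ == "__main__":
    frame_scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.7, 0.2]
    print(scores_to_segments(frame_scores, fps=2.0))
```

Raising `min_frames` suppresses isolated single-frame detections, at the cost of missing genuinely short manipulations such as a single replaced word.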


“…Inspired by the emergence of DeepFakes algorithm to the public, various methods, i.e., FaceSwap [26], NeuralTextures [7], Face2Face [24], and FSGAN [13], have been proposed to synthesize hyper-realistic deepfake images that are unrecognizable to human eyes. These methods allowed to generate numerous deepfake datasets [8,12,16] for public usage in the research community. Furthermore, Wav2Lips [15] has shown a lip-synchronization network, generating lip-syncing arbitrary talking face videos with arbitrary speech.…”

Section: Related Work (mentioning, confidence: 99%)

“…In this work, we used FaceForensics++ [16] C40, a compressed version of original FaceForensics++, and FakeAVCeleb [8] to train each model and assess the models on each dataset. The number of fake/real images used in each dataset is provided in Table 2.…”

Section: Datasets (mentioning, confidence: 99%)

“…In particular, we used low-quality images, which are compressed C40 version of the dataset, to consider the realistic setting for manipulated videos and to provide certain levels of difficulty for the performance assessment. • FakeAVCeleb [8] is an Audio-Video Multimodal Deepfake Detection dataset that contains both video and audio deepfakes with accurate lip-sync. FakeAVCeleb dataset contains three different types of video deepfakes, FaceSwap, DeepFaceLab, and FSGAN.…”

Section: Datasets (mentioning, confidence: 99%)


Deepfake Detection for Facial Images with Facemasks

Ko, Lee, Park et al., 2022 (preprint, self-citation)


Hyper-realistic face image generation and manipulation have given rise to numerous unethical social issues, e.g., invasion of privacy, threat of security, and malicious political maneuvering, which resulted in the development of recent deepfake detection methods with the rising demands of deepfake forensics. Proposed deepfake detection methods to date have shown remarkable detection performance and robustness. However, none of the suggested deepfake detection methods assessed the performance of deepfakes with the facemask during the pandemic crisis after the outbreak of the Covid-19. In this paper, we thoroughly evaluate the performance of state-of-the-art deepfake detection models on the deepfakes with the facemask. Also, we propose two approaches to enhance the masked deepfakes detection: face-patch and face-crop. The experimental evaluations on both methods are assessed through the baseline deepfake detection models on the various deepfake datasets. Our extensive experiments show that, among the two methods, face-crop performs better than the face-patch, and could be a training method for deepfake detection models to detect fake faces with facemask in real world.


“…Overdub, iSpeech, and VoiceApp are instances of voice cloning open-access platforms that can generate synthesized deepfake sounds that nearly resemble the target human's speech [3]. The work of [4] is an example of these manipulation methods, which involves the creation of highly realistic deepfake videos with a precise lip-sync using a group of AI technologies; FaceSwap, FaceSwap GAN, DeepFaceLab, SV2TTS [5], and Wav2Lip [6].…”

Section: Introduction (mentioning, confidence: 99%)

A Novel Smart Deepfake Video Detection System

Elpeltagy, Ismail, Zaki et al., 2023, IJACSA


Rapid advancements in deep learning-based technologies have developed several synthetic video and audio generation methods producing incredibly hyper-realistic deepfakes. These deepfakes can be employed to impersonate the identity of a source person in videos by swapping the source's face with the target one. Deepfakes can also be used to clone the voice of a target person utilizing audio samples. Such deepfakes may pose a threat to societies if they are utilized maliciously. Consequently, distinguishing either one or both deepfake visual video frames and cloned voices from genuine ones has become an urgent issue. This work presents a novel smart deepfake video detection system. The video frames and audio are extracted from given videos. Two feature extraction methods are proposed, one for each modality; visual video frames, and audio. The first method is an upgraded XceptionNet model, which is utilized for extracting spatial features from video frames. It produces feature representation for visual video frames. The second one is a modified InceptionResNetV2 model based on the Constant-Q Transform (CQT) method. It is employed to extract deep time-frequency features from the audio modality. It produces feature representation for the audio. The corresponding extracted features of both modalities are fused at a mid-layer to produce a bimodal information-based feature representation for the whole video. These three representation levels are independently fed into the Gated Recurrent Unit (GRU) based attention mechanism helping to learn and extract deep and important temporal information per level. Then, the system checks whether the forgery is only applied to video frames, audio, or both, and produces the final decision about video authenticity. The newly suggested method has been evaluated on the FakeAVCeleb multimodal videos dataset. The experimental results analysis assures the superiority of the new method over the current state-of-the-art methods.
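The last stage of this pipeline reports whether the forgery affects the video frames, the audio, or both. A minimal sketch of that decision, assuming each modality branch already outputs a fake probability: the names `fuse_features`, `decide` and `Verdict`, the 0.5 threshold, and fusion by plain concatenation are all hypothetical, since the abstract does not specify them. The category labels follow FakeAVCeleb's four real/fake video/audio combinations (RVRA, RVFA, FVRA, FVFA).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    video_fake: bool
    audio_fake: bool
    category: str       # one of "RVRA", "RVFA", "FVRA", "FVFA"
    is_authentic: bool  # True only when neither modality is judged fake


def fuse_features(visual: List[float], audio: List[float]) -> List[float]:
    """Hypothetical mid-layer fusion: concatenate the two feature vectors."""
    return list(visual) + list(audio)


def decide(p_video: float, p_audio: float, threshold: float = 0.5) -> Verdict:
    """Map per-modality fake probabilities to a FakeAVCeleb-style category."""
    video_fake = p_video >= threshold
    audio_fake = p_audio >= threshold
    category = ("FV" if video_fake else "RV") + ("FA" if audio_fake else "RA")
    return Verdict(video_fake, audio_fake, category, not (video_fake or audio_fake))


if __name__ == "__main__":
    # Fake frames with genuine audio -> category "FVRA".
    print(decide(0.92, 0.13))
```

Keeping the two per-modality decisions separate is what allows a system to say which modality was manipulated, rather than returning a single real/fake bit for the whole video.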
