Detecting Deepfake Voice Using Explainable Deep Learning Techniques

Suk-Young, Lim; Chae, Dong-Kyu; Lee, Sangchul

doi:10.3390/app12083926

Cited by 23 publications

(4 citation statements)

References 22 publications

(29 reference statements)

“…However, this underscores the ongoing challenge of fully capturing the subtleties of human speech using artificial models. Lim et al [213] applied explainable AI (XAI) methods for deepfake voice detection, focussing on interpretations accessible to human perception. Their approach used a simple model that combined a convolutional neural network and LSTM with spectrograms used for feature extraction from raw audio data.…”

Section: Methods Using Handcrafted Features (mentioning)

confidence: 99%

A Survey on the Detection and Impacts of Deepfakes in Visual, Audio, and Textual Formats

Mubarak, Alsboui, Alshaikh et al. 2023. IEEE Access

In the rapidly evolving digital landscape, the generation of fake visual, audio, and textual content poses a significant threat to society's trust, political stability, and integrity of information. Artificial intelligence techniques have enhanced and simplified this generation process, and the resulting content is termed deepfake. Although significant attention has been paid to visual and audio deepfakes, there is also a burgeoning need to consider text-based deepfakes. Due to advancements in natural language processing and large language models, the potential for manipulating textual content to reshape online discourse and spread misinformation has increased. This study comprehensively examines the multifaceted nature and impacts of deepfake-generated media. This work explains the broad implications of deepfakes in social, political, economic, and technological domains. State-of-the-art detection methodologies for all types of deepfake are critically reviewed, highlighting the need for unified, real-time, adaptable, and generalised solutions. As the challenges posed by deepfakes intensify, this study underscores the importance of a holistic approach that intertwines technical solutions with public awareness and legislative action. By providing a comprehensive overview and establishing a framework for future exploration, this study seeks to assist researchers, policymakers, and practitioners in navigating the complexities of deepfake phenomena.

“…They are effective speech signal visualizations because they demonstrate the frequency and intensity fluctuations over time. Moreover, the image-based methods outperformed feature-based techniques, including those that made use of characteristics related to energy, bandwidth, frequency, and short-term transform features like MFCCs, for the identification of synthetic audio [72,73].…”

Section: Audio Network (mentioning)

confidence: 99%

Attention-based Multimodal learning framework for Generalized Audio-Visual Deepfake Detection

Masood, Javed, Irtaza 2023. Preprint

The proliferation of deepfake media on the internet has major societal consequences for politicians, celebrities, and even common people. Recent advancements in deepfake videos include the creation of realistic talking faces and the usage of synthetic human voices. Numerous deepfake detection approaches have been proposed in response to the potential harm caused by deepfakes. However, the majority of deepfake detection methods process the audio and video modalities independently and have low identification accuracy. In this work, we propose an ensemble multimodal deepfake detection method that can identify both auditory and facial manipulations by exploiting correspondence between audio-visual modalities. The proposed framework comprises unimodal and cross-modal learning networks to exploit intra- and inter-modality inconsistencies introduced as a result of manipulation. The suggested multimodal approach employs an ensemble of deep convolutional neural networks based on an attention mechanism that extracts representative features and effectively determines if a video is fake or real. We evaluated the proposed approach on several benchmark multimodal deepfake datasets including FakeAVCeleb, DFDC-p, and DF-TIMIT. Experimental results demonstrate that an ensemble of deep learners based on unimodal and cross-modal network mechanisms exploits highly semantic information between audio and visual signals and outperforms independently trained audio and visual classifiers. Moreover, it can effectively identify different unseen types of deepfakes and is robust under various post-processing attacks. The results confirm that our approach outperforms existing unimodal/multimodal classifiers for audio-visual manipulated video identification.

“…Therefore, there is a crucial gap between academic deepfake solutions and real-world scenarios or requirements. For instance, the foregoing works are usually lagging in the robustness of the systems against adversarial attacks [44], decision explainability [45], and real-time mobile deepfake detection [46].…”

Section: Introduction (mentioning)

confidence: 99%

Deepfakes Generation and Detection: A Short Survey

Akhtar 2023. J. Imaging

Advancements in deep learning techniques and the availability of free, large databases have made it possible, even for non-technical people, to either manipulate or generate realistic facial samples for both benign and malicious purposes. DeepFakes refer to facial multimedia content that has been digitally altered or synthetically created using deep neural networks. The paper first outlines the readily available face editing apps and the vulnerability (or performance degradation) of face recognition systems under various face manipulations. Next, this survey presents an overview of the techniques and works that have been carried out in recent years for deepfake and face manipulations. In particular, four kinds of deepfake or face manipulations are reviewed, i.e., identity swap, face reenactment, attribute manipulation, and entire face synthesis. For each category, deepfake or face manipulation generation methods as well as the corresponding manipulation detection methods are detailed. Despite significant progress based on traditional and advanced computer vision, artificial intelligence, and physics, there is still a huge arms race between attackers/offenders/adversaries (i.e., DeepFake generation methods) and defenders (i.e., DeepFake detection methods). Thus, open challenges and potential research directions are also discussed. This paper is expected to aid the readers in comprehending deepfake generation and detection mechanisms, together with open issues and future directions.
