2022
DOI: 10.3390/app12083926

Detecting Deepfake Voice Using Explainable Deep Learning Techniques

Abstract: Fake media, generated by methods such as deepfakes, have become indistinguishable from real media, but their detection has not improved at the same pace. Furthermore, the absence of interpretability on deepfake detection models makes their reliability questionable. In this paper, we present a human perception level of interpretability for deepfake audio detection. Based on their characteristics, we implement several explainable artificial intelligence (XAI) methods used for image classification on an audio-rel…

Cited by 23 publications (4 citation statements)
References 22 publications (29 reference statements)
“…However, this underscores the ongoing challenge of fully capturing the subtleties of human speech using artificial models. Lim et al [213] applied explainable AI (XAI) methods for deepfake voice detection, focussing on interpretations accessible to human perception. Their approach used a simple model that combined a convolutional neural network and LSTM with spectrograms used for feature extraction from raw audio data.…”
Section: Methods Using Handcrafted Features (mentioning)
confidence: 99%
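
To make the spectrogram step in the statement above concrete, here is a minimal sketch of turning raw audio samples into a magnitude spectrogram. It is not the code of Lim et al.; the function names (`spectrogram`, `demo_tone`), the frame length of 64, the hop of 32, the Hann window, and the 8 kHz test tone are all assumptions chosen for illustration. It uses only the Python standard library, and the CNN and LSTM stages that would consume the result are not shown.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per frame.

    Illustrative parameters only; not taken from the cited paper.
    """
    # Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        # Naive DFT over the non-negative frequency bins of a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def demo_tone(freq_hz=1000.0, sample_rate=8000, n_samples=512):
    """Hypothetical test input: a pure sine tone standing in for raw audio."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


spec = spectrogram(demo_tone())
# Strongest frequency bin in the first frame, converted to Hz
# (bin width = sample_rate / frame_len = 8000 / 64 = 125 Hz).
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * 8000 / 64
```

Each row of `spec` is one time frame and each column one frequency bin, which is the image-like layout that lets image-classification models and XAI methods be applied to audio. In practice a signal-processing library with an FFT would replace the naive DFT loop.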
“…They are effective speech signal visualizations because they demonstrate the frequency and intensity fluctuations over time. Moreover, the image-based methods outperformed feature-based techniques, including those that made use of characteristics related to energy, bandwidth, frequency, and short-term transform features like MFCCs, for the identification of synthetic audio [72,73].…”
Section: Audio Network (mentioning)
confidence: 99%
“…Therefore, there is a crucial gap between academic deepfake solutions and real-world scenarios or requirements. For instance, the foregoing works are usually lagging in the robustness of the systems against adversarial attacks [44], decision explainability [45], and real-time mobile deepfake detection [46].…”
Section: Introduction (mentioning)
confidence: 99%