ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414670

A Capsule Network Based Approach for Detection of Audio Spoofing Attacks

Abstract: Audio spoofing attacks increasingly threaten not only automatic speaker verification systems but also, potentially, national security (e.g., through fake audio of influential politicians). The main purpose of anti-spoofing is to detect fake audio synthesized by advanced methods, yet current algorithms that use convolutional neural networks as classifiers have shown poor generalization to unknown attacks. In this paper, as the first attempt, we introduce a capsule network to enh…


Cited by 52 publications (8 citation statements). References: 15 publications.
“…Additionally, some of the audio spoof detection methods have been extended by working on the features which are fed into the network (Balamurali et al, 2019). While others have changed the networks used or have improved both networks and features (Scardapane et al, 2017; Alzantot et al, 2019; Chintha et al, 2020; Rahul et al, 2020; Wang et al, 2020b; Luo A. et al, 2021). Given the fact that one of the most important deepfake detection challenges is “generalization,” researchers are highly recommended to work on generalization by changing or improving both of the networks and features as well as defining different loss functions (Chen T. et al, 2020; Zhang Y. et al, 2021).…”
Section: Discussion and Future Directions (mentioning; confidence: 99%)
“…In addition, some of the audio spoof detection systems have been extended by working on the features which are fed into the network (Balamurali et al, 2019). While others have worked on the networks used or both of the networks and features (Scardapane et al, 2017; Alzantot et al, 2019; Chintha et al, 2020; Rahul et al, 2020; Wang et al, 2020b; Luo A. et al, 2021). Therefore, besides the modeling phase, the features which are fed to the models are really challenging in the field of audio deepfake.…”
Section: Deepfake Categories (mentioning; confidence: 99%)
“…Tak et al proposed a new end-to-end RawNet2 neural network that directly inputs the raw waveform of audio into the network without the need for manually extracting acoustic features [37]. Luo et al introduced a novel Capsule Network (CapsNet) in 2021 [38], which altered the dynamic learning algorithm of the original network to enable the network to fully learn the artifacts in forged audio spectrograms and thereby improve network performance. Chen et al designed a novel spoofing audio recognition system called SpoofPrint based on existing spoofing audio detection techniques [39].…”
Section: Related Work (mentioning; confidence: 99%)
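For context on the routing that the statement above says Luo et al. altered, here is a minimal sketch of the *standard* routing-by-agreement from the original CapsNet (Sabour et al., 2017). It is not the modified algorithm of [38], whose details are not given here. The capsule counts, vector sizes, and the reading of the two output capsules as "bona fide" / "spoof" are illustrative assumptions; in a CapsNet the length of an output capsule vector acts as the class score.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Rescale s to length |s|^2 / (1 + |s|^2), keeping its direction."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Standard routing-by-agreement (Sabour et al., 2017).

    u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j.
    Returns the output vectors v[j] of the upper capsules.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: for each lower capsule, a distribution over upper capsules.
        c = [softmax(b[i]) for i in range(n_in)]
        # Weighted sum of predictions, then squash, for every upper capsule.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees (dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][k] * v[j][k] for k in range(dim))
    return v


# Hypothetical example: 3 lower capsules, 2 upper capsules, 2-D vectors.
# All three lower capsules roughly agree on upper capsule 0.
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
    [[1.1, -0.1], [-0.1, 0.1]],
]
v = dynamic_routing(u_hat, iterations=3)
lengths = [math.sqrt(sum(x * x for x in v_j)) for v_j in v]
print(lengths)
```

Because the lower capsules' predictions agree on upper capsule 0, routing concentrates their coupling there and that output vector ends up longer; squashing keeps every length below 1.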
“…However, they tested the model with two voice-conversion attacks only. A capsule network is modified by replacing the ReLU with a leaky ReLU layer and a modified routing algorithm for better attention to the speech artifacts [44]. They focused on text-to-speech-based attacks in spoofing.…”
Section: Related Work (mentioning; confidence: 99%)
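The ReLU-to-leaky-ReLU change mentioned above can be illustrated with a small sketch: ReLU zeroes negative activations and their gradients, while leaky ReLU keeps a small negative slope, so negative inputs still pass some signal and gradient. The slope of 0.01 is an assumption (a common default); the value used in [44] is not stated here.

```python
from typing import List


def relu(xs: List[float]) -> List[float]:
    """Standard ReLU: negative inputs are clipped to zero."""
    return [x if x > 0 else 0.0 for x in xs]


def leaky_relu(xs: List[float], negative_slope: float = 0.01) -> List[float]:
    """Leaky ReLU: negative inputs are scaled by a small slope instead of zeroed."""
    return [x if x > 0 else negative_slope * x for x in xs]


def relu_grad(xs: List[float]) -> List[float]:
    """Derivative of ReLU (using 0 at x == 0 by convention)."""
    return [1.0 if x > 0 else 0.0 for x in xs]


def leaky_relu_grad(xs: List[float], negative_slope: float = 0.01) -> List[float]:
    """Derivative of leaky ReLU: never exactly zero."""
    return [1.0 if x > 0 else negative_slope for x in xs]


activations = [-2.0, -0.5, 0.0, 1.5]
print(relu(activations))
print(leaky_relu(activations))
print(relu_grad(activations))
print(leaky_relu_grad(activations))
```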