Interspeech 2021
DOI: 10.21437/interspeech.2021-438

RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform

Cited by 20 publications
(3 citation statements)
References 0 publications
“…Fig. 5 shows the detailed performance of LPS in different attacks of the evaluation set.…”

Table text interleaved with the quoted sentence:

System                                   t-DCF    EER%
… [13]                                   0.1000   5.06
FFT-LCNN [13]                            0.1028   4.53
LFCC-Siamese CNN [15]                    0.0930   3.79
FFT-LCGRNN [7]                           0.0776   3.03
RW-Resnet [19]                           0.0820   2.98
Ling et al [16]                          0.0510   1.87
FFT-L-SENet [38]                         0.0368   1.14
AASIST [7]                               0.0347   1.13
LPS(F0) (ours)                           0.0358   1.21

(b) Primary systems
System                                   t-DCF    EER%
T05 [28]                                 0.0069   0.22
T45 [13]                                 0.0510   1.84
T60 [3]                                  0.0755   2.64
GMM fusion [26]                          0.0740   2.92
T24 [28]                                 0.0953   3.45
T50 [36]                                 0.1671   3.56
(Imag(L)+Real(H)) + LPS(F0) (ours)       0.0143   0.43
Section: Effectiveness Of F0 Subband
Citation type: mentioning
confidence: 99%
“…This is because for the LFCC-Capsule Fusion System [18], T45 [13], T60 [3] and Ling et al [16] the features are based on the magnitude spectrogram, and for the FFT-L-SENet [38] system, whose features are based on low frequency and magnitude spectrogram, which will lead to loss of information and phase information in high frequency. Although the RW-Resnet [19] and RAWNet2 [27] systems are based on the original waveform without losing speech information, the original waveform is affected by many factors, and it is difficult to effectively distinguish between real and fake speech. In addition, the T05 is obtained from 7 single systems, including 2 ResNet systems, 4 MobileNet systems, and a DenseNet system.…”
Section: Comparison With Other Systems
Citation type: mentioning
confidence: 99%
“…In recent years, more and more researchers have attempted to improve the detection capability of ASV systems by building deep learning models. Many studies [9][10][11] have proposed effective network architectures, such as using raw waveforms as input to obtain better representations. However, the robustness of the model is still limited by the insufficiency of the training data.…”
Section: Introduction
Citation type: mentioning
confidence: 99%