2018 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2018.8639666

Analysing The Predictions Of a CNN-Based Replay Spoofing Detection System

Cited by 21 publications (29 citation statements) · References 15 publications

“…We operate on power spectrograms instead of alternative time-frequency representations, following the findings in [25]. All our sub-CNNs use the architecture described in [26], an adapted version of the best-performing model [25] in the ASVspoof 2017 challenge. It consists of 9 convolutional layers, 5 max-pooling layers and 2 fully connected (FC) layers.…”
Section: Proposed Methodology
confidence: 99%
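The architecture quoted above (9 convolutional layers, 5 max-pooling layers, 2 FC layers) can be sketched as follows. The channel widths, kernel sizes, and pooling placement are illustrative assumptions, not the exact configuration of [25]/[26]:

```python
# Hypothetical sketch of a 9-conv / 5-max-pool / 2-FC CNN operating on
# power spectrograms; all layer widths are assumptions for illustration.
import torch
import torch.nn as nn

class ReplayCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU())
        self.features = nn.Sequential(
            conv(1, 16), conv(16, 16), nn.MaxPool2d(2),      # convs 1-2, pool 1
            conv(16, 32), conv(32, 32), nn.MaxPool2d(2),     # convs 3-4, pool 2
            conv(32, 64), conv(64, 64), nn.MaxPool2d(2),     # convs 5-6, pool 3
            conv(64, 128), conv(128, 128), nn.MaxPool2d(2),  # convs 7-8, pool 4
            conv(128, 128), nn.MaxPool2d(2),                 # conv 9, pool 5
            nn.AdaptiveAvgPool2d(1),                         # collapse to 1x1
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(128, 64), nn.ReLU(),     # FC 1
            nn.Linear(64, n_classes),                        # FC 2
        )

    def forward(self, spec):  # spec: (batch, 1, freq, time) power spectrogram
        return self.classifier(self.features(spec))
```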
“…Following our prior findings [30] on the ASVspoof 2019 PA dataset, we remove zero-valued samples from the start and end of every audio recording in the dataset. Likewise, on the ASVspoof 2017 v2.0 dataset, we remove leading and trailing silence/nonspeech samples following our findings in [26]. For this, we use our publicly released speech endpoint annotations [31].…”
Section: Input Representation and Preprocessing
confidence: 99%
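A minimal sketch of the trimming step described above, assuming a simple magnitude threshold in place of the released speech endpoint annotations [31] (the `thresh` parameter is a hypothetical knob, not from the cited work; `thresh=0` strips exactly the zero-valued samples):

```python
import numpy as np

def trim_edges(x: np.ndarray, thresh: float = 1e-4) -> np.ndarray:
    """Remove leading/trailing samples whose magnitude is <= `thresh`."""
    mask = np.abs(x) > thresh
    if not mask.any():            # entire signal below threshold
        return x[:0]
    first = np.argmax(mask)                   # first sample above threshold
    last = len(x) - np.argmax(mask[::-1])     # one past the last such sample
    return x[first:last]
```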
“…Second, following [44], we attach a deep CNN as an auxiliary classifier (AC) to the output of the decoder network. Here, we use the CNN architecture from [45]. From here on, we refer to these two setups as AC-VAE 1 and AC-VAE 2, respectively.…”
Section: Conditioning VAEs by Class Label
confidence: 99%
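One way to read "attach a deep CNN as an AC to the decoder output" is sketched below. All layer sizes, the MSE reconstruction term, and the small MLP standing in for the deep CNN of [45] are assumptions:

```python
# Toy AC-VAE sketch: a VAE on flattened spectrogram patches with an
# auxiliary classifier (AC) applied to the decoder's reconstruction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACVAE(nn.Module):
    def __init__(self, in_dim=1024, z_dim=32, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))
        # Small MLP as a stand-in for the deep-CNN auxiliary classifier.
        self.ac = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                nn.Linear(128, n_classes))

    def forward(self, x, y):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparam.
        x_hat = self.dec(z)
        recon = F.mse_loss(x_hat, x)
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        ac_loss = F.cross_entropy(self.ac(x_hat), y)  # AC on decoder output
        return recon + kld + ac_loss
```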
“…We consider both CQCC [66] and log-power spectrogram features. We apply a pre-processing step to the raw audio waveforms to trim silence/noise before and after the utterance in the training, development and test sets, following the recommendations in [45] and [76]. Following [84], we extract log energy plus 19-dimensional static coefficients augmented with deltas and double-deltas, yielding 60-dimensional feature vectors.…”
Section: Features and Input Representation
confidence: 99%
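A sketch of the 60-dimensional feature layout described above (log energy plus 19 static coefficients, augmented with deltas and double-deltas), assuming MFCCs as a stand-in since CQCC extraction is not a standard librosa feature:

```python
import numpy as np
import librosa

def feats_60d(wav: np.ndarray, sr: int = 16000) -> np.ndarray:
    # 19 static cepstral coefficients (MFCCs as a stand-in for CQCCs).
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=20)   # c0..c19
    static = mfcc[1:20]                                    # drop c0, keep 19
    # Per-frame log energy as the 20th static dimension.
    energy = librosa.feature.rms(y=wav)[0]
    n = min(static.shape[1], energy.shape[0])              # align frame counts
    log_e = np.log(np.maximum(energy[:n], 1e-10))[None, :]
    base = np.vstack([log_e, static[:, :n]])               # 20 x frames
    delta = librosa.feature.delta(base)                    # 20 x frames
    ddelta = librosa.feature.delta(base, order=2)          # 20 x frames
    return np.vstack([base, delta, ddelta])                # 60 x frames
```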