2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge (ASVspoof 2021)
DOI: 10.21437/asvspoof.2021-9

Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?

Abstract: The ASVspoof dataset is one of the most established datasets for training and benchmarking systems designed to detect spoofed audio and audio deepfakes. However, we observe an uneven distribution of silence length in the dataset's training and test data, which hints at the target label: bona-fide instances tend to have significantly longer leading and trailing silences than spoofed instances. This could be problematic, since a model may learn to only, or at least partially, base its decision on the length…
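
The observation in the abstract can be checked directly. Below is a minimal sketch, not taken from the paper, that measures the leading and trailing silence of a single utterance so the distributions for bona-fide and spoofed files can be compared; it assumes librosa's energy-based trimming and a hypothetical 30 dB threshold.

import librosa

def silence_durations(path, top_db=30.0):
    """Return (leading, trailing) silence durations in seconds for one file."""
    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate
    # librosa.effects.trim returns the trimmed signal and the [start, end]
    # sample indices of the retained (non-silent) region.
    _, (start, end) = librosa.effects.trim(y, top_db=top_db)
    return start / sr, (len(y) - end) / sr

# Hypothetical usage: collect these values separately for bona-fide and spoofed
# file lists and compare the two distributions.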

Cited by 33 publications (16 citation statements)
References 8 publications

“…al. [13] shows that a bias can be found in the distribution of the lengths of leading and trailing silences in bona-fide and synthetic speech. The authors argue that most detectors are probably just discriminating between forged and bona-fide samples by using this information.…”
Section: Dataset Preparation and Experimental Setup
confidence: 94%
“…The authors argue that most detectors are probably just discriminating between forged and bona-fide samples by using this information. To bypass this problem, silent parts were removed from the signal, as suggested in [13], but this led to a large loss in performance.…”
Section: Dataset Preparation and Experimental Setup
confidence: 99%
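
The mitigation described in the statement above can be sketched as follows; this is an assumption about the general approach, not the exact recipe from [13], and the paths and 30 dB threshold are placeholders. It simply strips leading and trailing silence before the audio is used for training or scoring a countermeasure.

import librosa
import soundfile as sf

def strip_silence(in_path, out_path, top_db=30.0):
    y, sr = librosa.load(in_path, sr=None)
    y_trimmed, _ = librosa.effects.trim(y, top_db=top_db)  # drop leading/trailing silence
    sf.write(out_path, y_trimmed, sr)

As the citing authors note, removing this cue cost their detector considerable performance, which is consistent with silence length acting as a shortcut feature.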
“…attacks disjoint from the attacks seen in training. However, the test audios share some specific characteristics [26] with the training data, which is why model generalization cannot be judged using the 'eval' split of ASVspoof 2019 alone. This motivates the use of our proposed 'in-the-wild' dataset, c.f.…”
Section: Train and Evaluation Data Splits
confidence: 99%