2020
DOI: 10.48550/arxiv.2008.03616
Preprint

Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification

Abstract: The effects of speaking-style variability on automatic speaker verification were investigated using the UCLA Speaker Variability database, which comprises multiple speaking styles per speaker. An x-vector/PLDA (probabilistic linear discriminant analysis) system was trained with the SRE and Switchboard databases with standard augmentation techniques and evaluated with utterances from the UCLA database. The equal error rate (EER) was low when enrollment and test utterances were of the same style (e.g., 0.98% and …

Citations: cited by 2 publications (5 citation statements)
References: 25 publications (35 reference statements)

“…Few studies have focused on whether these factors actually matter [34][35][36], although if they do, it may bias the evaluation of the evidence. In the present study, we use the three speech styles available in the dataset developed for forensic claims and compare the results depending on sample duration.…”
Section: Introduction (mentioning)
confidence: 99%
“…The multi-condition training (MCT) [42] can be regarded as a special normalization approach, belonging to the scoring theme. It pools the data from both enrollment and test conditions and trains a multi-conditional PLDA.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our approach (SD/LT as the first and simplest case) also belongs to the scoring theme, but it is fundamentally different from the normalization methods that pursue a condition-insensitive model as in IDVC [35] or MCT [42]. Instead, it admits the discrepancy between the enrollment and test conditions, and models the statistics of speaker vectors in the two conditions respectively.…”
Section: Related Work (mentioning)
confidence: 99%