Interspeech 2017
DOI: 10.21437/interspeech.2017-555

UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

Abstract: This document briefly describes the systems submitted by the Center for Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE). We developed several UBM- and DNN-based i-vector speaker recognition systems with different data sets and feature representations. Given that the emphasis of NIST SRE 2016 is on language mismatch between training and enrollment/test data, so-called domain mismatch…

Cited by 11 publications (8 citation statements)
References 9 publications
“However, i-vector systems are prone to performance degradation when short utterances are encountered in the enrollment/test phase. Fig. 1 shows the DET curves with respect to different durations of test utterances in the CRSS submissions for SRE16 [10]. A clear drop in speaker verification performance can be seen in this analysis.…”
Section: Introduction
confidence: 70%
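The short-utterance degradation quoted above is read off DET curves. As a minimal sketch, assuming synthetic Gaussian score distributions rather than any scores from the paper, the miss/false-alarm operating points behind a DET curve and the resulting equal error rate (EER) can be computed as follows (the normal-deviate axis warping used in actual DET plots is omitted):

import numpy as np

rng = np.random.default_rng(1)
# Synthetic verification scores: same-speaker and different-speaker trials.
target_scores = rng.normal(loc=2.0, scale=1.0, size=1000)
nontarget_scores = rng.normal(loc=0.0, scale=1.0, size=1000)

# Sweep a decision threshold over all observed scores.
thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
p_miss = np.array([(target_scores < t).mean() for t in thresholds])   # misses
p_fa = np.array([(nontarget_scores >= t).mean() for t in thresholds])  # false alarms

# EER: the operating point where miss and false-alarm rates cross.
eer_idx = np.argmin(np.abs(p_miss - p_fa))
print(f"EER ~ {0.5 * (p_miss[eer_idx] + p_fa[eer_idx]):.3f}")

Shorter test utterances yield noisier i-vectors, which shifts these curves toward higher error at every operating point, which is the drop the citing authors observe.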
“Thereafter, i-vectors are post-processed with length-normalization and LDA. Eventually, PLDA is trained and final log-likelihood scores are calculated [28].…”
Section: Speaker Recognition Evaluation
confidence: 99%
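The post-processing chain quoted above (length-normalization, then LDA, then PLDA scoring) can be sketched as below. This is an illustration on random toy i-vectors, not the authors' pipeline; in particular, the PLDA log-likelihood scoring is replaced by a cosine-similarity stand-in, since a full PLDA backend exceeds a short example:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Toy "i-vectors": 200 vectors of dimension 100 from 20 hypothetical speakers.
X = rng.normal(size=(200, 100))
speaker_labels = np.repeat(np.arange(20), 10)

def length_normalize(vectors):
    # Project each i-vector onto the unit sphere (length-normalization).
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

X_norm = length_normalize(X)

# LDA reduces dimensionality while maximizing between-speaker separation.
lda = LinearDiscriminantAnalysis(n_components=19)  # at most n_classes - 1
X_lda = lda.fit_transform(X_norm, speaker_labels)

def cosine_score(enroll, test):
    # Cosine-similarity stand-in for the PLDA log-likelihood ratio.
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))

print(cosine_score(X_lda[0], X_lda[1]))   # trial within the same toy speaker
print(cosine_score(X_lda[0], X_lda[50]))  # trial across different toy speakers

Length-normalization makes i-vectors better match the Gaussian assumptions of PLDA, which is why it precedes the LDA/PLDA backend in this chain.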
“…However, these systems often rely on a large collection of in-domain and well-annotated data, e.g., transcriptions for ASR DNN acoustic modeling and speaker labels for PLDA training [5,6]. Studies have shown a significant performance gap between in-domain and out-of-domain systems [7,8,9,10]. Also, it is expensive to collect a large amount of labeled data for every new domain.…”
Section: Introduction
confidence: 99%
“However, it is noted that the DAC corpus is mostly English speech. Consequently, we find that similar techniques are not as effective in the NIST SRE16 setup, where a more severe domain mismatch (i.e., language mismatch between training data and enrollment/test data) is designed to encourage effective domain adaptation methods [9,10]. This project was funded in part by AFRL under contract FA8750-15-1-0205 and partially by the University of Texas at Dallas from the Distinguished University Chair in Telecommunications Engineering held by J. H. L. Hansen.…”
Section: Introduction
confidence: 99%