2013
DOI: 10.21236/ada614010

A Noise-Robust System for NIST 2012 Speaker Recognition Evaluation

Abstract: The National Institute of Standards and Technology (NIST) 2012 speaker recognition evaluation posed several new challenges, including noisy data, varying test-sample length and number of enrollment samples, and a new metric. Target speakers were known during system development and could be used for model training and score normalization. For the evaluation, SRI International (SRI) submitted a system consisting of six subsystems that use different low- and high-level features, some specifically designed for noise…
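The abstract says the known target speakers could be used for score normalization but does not name the scheme. As a purely illustrative sketch, assuming classic z-norm (our choice, not necessarily what SRI used), each raw score of an enrolled model is standardized with the mean and standard deviation of that same model's scores against an impostor cohort. The function name `znorm` and the cohort are hypothetical.

```python
import numpy as np


def znorm(raw_scores, impostor_scores):
    """Z-normalize the trial scores of one enrolled model.

    raw_scores: scores of the model against test segments.
    impostor_scores: scores of the same model against a (hypothetical)
    impostor cohort; how the cohort is chosen is an assumption.
    """
    mu = float(np.mean(impostor_scores))
    sigma = float(np.std(impostor_scores))
    if sigma == 0.0:
        raise ValueError("impostor scores have zero variance")
    # Shift and scale so the impostor score distribution has mean 0, std 1.
    return (np.asarray(raw_scores, dtype=float) - mu) / sigma
```

T-norm works analogously, except that its statistics come from scoring the test segment against a cohort of impostor models.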

Cited by 20 publications (24 citation statements). References 5 publications.

“…If $x_t$ is the extracted speaker factor and if $m$ is the supervector of an UBM of the acoustic feature vectors encountered in the training data, then the GMM supervector $m_t$ representing a window centered around the considered frame is approximated by

$$m_t = m + V x_t \qquad (4)$$

with $V$ being the eigenvoice matrix. The speaker factor extraction at time $t$ considers the frames inside a window of length $T_e$ centered around $t$. The procedure for extracting the speaker factors is similar to the iVector extraction described in [10].…”
Section: Boundary Generation With Speaker Factor Extraction (mentioning, confidence: 99%)
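The quoted passage only says the speaker-factor procedure is "similar to" i-vector extraction. Assuming the standard eigenvoice/i-vector formulation with a diagonal-covariance UBM, the point estimate for the window around frame $t$ is $\hat{x}_t = \left(I + \sum_c N_c V_c^\top \Sigma_c^{-1} V_c\right)^{-1} \sum_c V_c^\top \Sigma_c^{-1} (F_c - N_c m_c)$. Here $N_c$ and $F_c$ are the zeroth- and first-order Baum-Welch statistics of UBM component $c$ over the window's frames, and $V_c$ is the block of $V$ belonging to component $c$. The sketch below implements that formula. The function names, the component-major ordering of $V$'s rows, and the use of an odd frame count `win` for $T_e$ are all assumptions.

```python
import numpy as np


def ubm_posteriors(frames, weights, means, variances):
    """Component posteriors gamma_c(t) of a diagonal-covariance UBM.

    frames: (T, D); weights: (C,); means, variances: (C, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)     # (T, C)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = log_gauss + np.log(weights)
    log_joint -= log_joint.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def speaker_factor(frames, weights, means, variances, V):
    """Posterior-mean speaker factor x for one window (hypothetical helper).

    V: (C*D, R) eigenvoice matrix whose rows follow the supervector
    ordering m = means.ravel() (component-major, an assumption).
    """
    C, D = means.shape
    R = V.shape[1]
    post = ubm_posteriors(frames, weights, means, variances)      # (T, C)
    N = post.sum(axis=0)                                          # zeroth-order stats (C,)
    F = post.T @ frames - N[:, None] * means                      # centered first-order stats (C, D)
    prec = 1.0 / variances                                        # diagonal precisions (C, D)
    Vc = V.reshape(C, D, R)
    # L = I + sum_c N_c V_c^T Sigma_c^{-1} V_c
    L = np.eye(R) + np.einsum("c,cdr,cd,cds->rs", N, Vc, prec, Vc)
    # b = sum_c V_c^T Sigma_c^{-1} (F_c - N_c m_c)
    b = np.einsum("cdr,cd,cd->r", Vc, prec, F)
    return np.linalg.solve(L, b)


def sliding_speaker_factors(frames, ubm, V, win):
    """One speaker factor per frame, from a window of `win` frames centered on it."""
    half = win // 2
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        out.append(speaker_factor(frames[lo:hi], *ubm, V))
    return np.stack(out)
```

Truncating the window at the signal edges is one simple choice. Padding or skipping edge frames would be equally plausible.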
“…Note that the introduction of a voice activity detection (VAD) has already become common practice in related fields such as speaker recognition and language recognition (see e.g. [4,5]). …”
Section: Introduction (mentioning, confidence: 99%)
“…Even though one site continued to submit successful high- and low-level combined systems in those big data evaluations (Ferrer et al, 2013;Kajarekar et al, 2009;Scheffer et al, 2011), there was a consensus in turning back to cepstral-only systems. The computational complexity of higher-level systems and the relative improvements obtained in limited training data conditions helped the community to move towards a scientifically complex but very rewarding approach because of the performance and computational efficiency of new high-dimensional spectral systems such as JFA-compensated GMM-UBM, and later, i-vector front-end extraction and PLDA based classification.…”
Section: Factor Analysis and I-vectors (mentioning, confidence: 99%)
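The passage mentions PLDA-based classification of i-vectors but not the PLDA variant. The minimal sketch below assumes the two-covariance PLDA model, in which a centered i-vector is $x = y + e$ with speaker variable $y \sim \mathcal{N}(0, B)$ and channel noise $e \sim \mathcal{N}(0, W)$. Under that assumption, the verification score is the log-likelihood ratio of the stacked pair $[x_1; x_2]$ under a "same speaker" joint Gaussian (cross-covariance $B$) versus a "different speakers" one (cross-covariance $0$). The function names and the given $B$, $W$ are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, cov):
    """log N(x; 0, cov) for a single vector x."""
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


def plda_llr(x1, x2, B, W):
    """Same-speaker vs different-speaker LLR under two-covariance PLDA.

    Assumes x1, x2 are already centered i-vectors; B is the between-speaker
    and W the within-speaker covariance (both assumed known).
    """
    T = B + W                                   # marginal covariance of one i-vector
    zeros = np.zeros_like(B)
    same = np.block([[T, B], [B, T]])           # shared speaker variable couples x1, x2
    diff = np.block([[T, zeros], [zeros, T]])   # independent speakers
    z = np.concatenate([x1, x2])
    return gaussian_logpdf(z, same) - gaussian_logpdf(z, diff)
```

Building the full $2d \times 2d$ covariances keeps the sketch transparent. Practical systems precompute the two quadratic forms instead, so each trial costs only a few matrix-vector products.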
“…Both use simple GMM-based speech activity detection as defined in [12] and an i-vector/probabilistic linear discriminant analysis (PLDA) framework [13,14].…”
Section: Experimental Protocol and System Configuration (mentioning, confidence: 99%)
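The quote cites "simple GMM-based speech activity detection as defined in [12]" without giving the details. The minimal sketch below assumes two pre-trained diagonal-covariance GMMs, one for speech and one for non-speech. It computes a per-frame log-likelihood ratio between them, smooths it with a moving average, and thresholds it. The model parameters, the 5-frame smoother, the zero threshold, and the function names are illustrative assumptions, not the configuration of [12].

```python
import numpy as np


def diag_gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (C,); means, variances: (C, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = log_gauss + np.log(weights)                       # (T, C)
    m = log_joint.max(axis=1, keepdims=True)                      # log-sum-exp over components
    return (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))).ravel()


def gmm_sad(frames, speech_gmm, nonspeech_gmm, threshold=0.0, smooth=5):
    """Boolean speech mask; each GMM is a (weights, means, variances) tuple."""
    llr = diag_gmm_loglik(frames, *speech_gmm) - diag_gmm_loglik(frames, *nonspeech_gmm)
    # Moving-average smoothing suppresses isolated frame-level flips.
    kernel = np.ones(smooth) / smooth
    llr = np.convolve(llr, kernel, mode="same")
    return llr > threshold
```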
“…Full SRE'12 System: A gender-dependent system was trained based on the protocols used in the development of our SRE'12 submission [12] in order to evaluate the tuned features. The number of UBM components increased to 2048 and was trained using a subset of 8000 clean speech samples; the i-vector subspace was trained using 51224 samples from which 600D i-vectors were extracted.…”
Section: Experimental Protocol and System Configuration (mentioning, confidence: 99%)
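The quoted configuration (2048 UBM components, 600-dimensional i-vectors) fixes the shape of the i-vector subspace (total-variability) matrix up to the acoustic feature dimension, which the passage does not state. The small calculation below makes the size concrete. It assumes a hypothetical 60-dimensional front end and float32 storage; the helper name is also hypothetical.

```python
def tv_matrix_footprint(num_components, feat_dim, ivector_dim, bytes_per_value=4):
    """Return (supervector dim, matrix entries, size in MiB) of a
    (num_components * feat_dim) x ivector_dim subspace matrix."""
    supervector_dim = num_components * feat_dim
    entries = supervector_dim * ivector_dim
    return supervector_dim, entries, entries * bytes_per_value / 2 ** 20


# Hypothetical 60-dim features; 2048 components and 600-D i-vectors from the quote.
print(tv_matrix_footprint(2048, 60, 600))  # (122880, 73728000, 281.25)
```

With float64 storage (`bytes_per_value=8`), the same matrix takes 562.5 MiB.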