Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006

Brümmer, Niko; Burget, Lukáš; Černocký, Jan; Glembek, Ondřej; Grézl, František; Karafiát, Martin; Leeuwen, David A. van; Matějka, Pavel; Schwarz, Petr; Strasheim, Albert

doi:10.1109/tasl.2007.902870

Cited by 189 publications

(171 citation statements)

References 24 publications


“…6c and d). In contrast, when the separation between the categories is greater and the amount of data is small, the increases in the ELUB [footnote 15: A potential alternative that avoids the sudden truncation could be to fit a sigmoidal function in the logistic space [45].] [footnote 16: We believe that this range of amount of sample data and range of separation between the categories is sufficient to gain an understanding of the relative behaviour of the procedures and to conceptually interpolate and extrapolate within and beyond these ranges.]…”

Section: Exploration Of the Behaviour Of The Four Procedures Using Si… (mentioning)

confidence: 99%

Avoiding overstating the strength of forensic evidence: Shrunk likelihood ratios/Bayes factors

Morrison

Poh

2018

Science & Justice


Abstract: When strength of forensic evidence is quantified using sample data and statistical models, a concern may be raised as to whether the output of a model overestimates the strength of evidence. This is particularly the case when the amount of sample data is small, and hence sampling variability is high. This concern is related to concern about precision. This paper describes, explores, and tests three procedures which shrink the value of the likelihood ratio or Bayes factor toward the neutral value of one. The procedures are: (1) a Bayesian procedure with uninformative priors, (2) use of empirical lower and upper bounds (ELUB), and (3) a novel form of regularized logistic regression. As a benchmark, they are compared with linear discriminant analysis, and in some instances with non-regularized logistic regression. The behaviours of the procedures are explored using Monte Carlo simulated data, and tested on real data from comparisons of voice recordings, face images, and glass fragments.
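To make the idea of shrinking a likelihood ratio toward the neutral value of one concrete, the following is a minimal sketch of a generic L2-penalised logistic-regression calibration. It is not the "novel form of regularized logistic regression" proposed in the paper, whose details are not given in this abstract. The function name `fit_calibration`, the `penalty` parameter, and the simulated scores are all assumptions made for illustration.

```python
import numpy as np

def fit_calibration(scores, labels, penalty=0.0, lr=0.1, n_iter=5000):
    """Fit log LR = a * score + b by gradient descent on the logistic loss.

    penalty is a hypothetical L2 weight on (a, b); larger values pull both
    toward 0, i.e. pull the likelihood ratio toward 1. Assumes equal numbers
    of same-source and different-source training scores, so that the fitted
    log odds can be read as a log likelihood ratio.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))  # predicted P(same source)
        err = p - y
        a -= lr * (np.mean(err * s) + penalty * a)
        b -= lr * (np.mean(err) + penalty * b)
    return a, b

# Simulated small-sample data (assumed): 10 same-source, 10 different-source scores.
rng = np.random.default_rng(1)
n = 10
labels = np.concatenate([np.ones(n), np.zeros(n)])
scores = np.concatenate([rng.normal(1.5, 1.0, n), rng.normal(-1.5, 1.0, n)])

a_plain, b_plain = fit_calibration(scores, labels, penalty=0.0)
a_shrunk, b_shrunk = fit_calibration(scores, labels, penalty=0.1)
print("slope without penalty:", a_plain)
print("slope with penalty:   ", a_shrunk)
```

With the penalty switched on, the fitted slope is smaller in magnitude, so every score maps to a log likelihood ratio closer to zero than it would without the penalty.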


“…In this work we choose the high-level fusion approach due to its ease of use for both multi-modal [8] and multi-algorithm [44,45,46] fusion.…”

Section: Bi-modal and Multi-algorithm Authentication Systems (mentioning)

confidence: 99%

“…We take the well-known statistical linear logistic regression approach, which has been successfully employed for combining heterogeneous speaker and face authentication classifiers [44,45,46] and for bi-modal (face and speaker) authentication [8].…”

Section: Linear Logistic Regression (mentioning)

confidence: 99%

Bi-modal biometric authentication on mobile phones in challenging conditions

Khoury

Shafey

McCool

et al. 2014

Image and Vision Computing


This paper examines the issue of face, speaker and bi-modal authentication in mobile environments when there is significant condition mismatch. We introduce this mismatch by enrolling client models on high quality biometric samples obtained on a laptop computer and authenticating them on lower quality biometric samples acquired with a mobile phone. To perform these experiments we develop three novel authentication protocols for the large publicly available MOBIO database. We evaluate state-of-the-art face, speaker and bi-modal authentication techniques and show that inter-session variability modelling using Gaussian mixture models provides a consistently robust system for face, speaker and bi-modal authentication. It is also shown that multi-algorithm fusion provides a consistent performance improvement for face, speaker and bi-modal authentication. Using this bi-modal multi-algorithm system we derive a state-of-the-art authentication system that obtains a half total error rate of 6.3% and 1.9% for Female and Male trials, respectively.
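The linear logistic regression fusion mentioned in the citation statements above can be sketched as follows: each system contributes one score per trial, and a weight per system plus a single offset are trained by minimising the logistic loss. This is a minimal illustration under stated assumptions, not the implementation used by either paper. The function names `train_fusion` and `fuse`, the plain gradient-descent optimiser, and the two simulated systems are all hypothetical.

```python
import numpy as np

def train_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit weights w and offset b so that fused = scores @ w + b.

    scores: (n_trials, n_systems) array of per-system scores.
    labels: (n_trials,) array, 1 for target trials, 0 for non-target trials.
    Minimises the mean logistic (cross-entropy) loss by gradient descent.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_trials, n_systems = scores.shape
    w = np.zeros(n_systems)
    b = 0.0
    for _ in range(n_iter):
        logits = scores @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels
        w -= lr * (scores.T @ err) / n_trials
        b -= lr * err.mean()
    return w, b

def fuse(scores, w, b):
    """Apply the trained linear fusion to new per-system scores."""
    return np.asarray(scores, dtype=float) @ w + b

# Two simulated systems (assumed) with different score scales and noise levels.
rng = np.random.default_rng(0)
n = 500
labels = np.concatenate([np.ones(n), np.zeros(n)])
sys_a = np.where(labels == 1, 1.0, -1.0) + rng.normal(0, 1.0, 2 * n)
sys_b = np.where(labels == 1, 2.0, -2.0) + rng.normal(0, 3.0, 2 * n)
scores = np.column_stack([sys_a, sys_b])

w, b = train_fusion(scores, labels)
fused = fuse(scores, w, b)
accuracy = np.mean((fused > 0) == (labels == 1))
print("weights:", w, "offset:", b, "accuracy:", accuracy)
```

Because the fusion is linear, the learned weights show directly how much each system contributes to the fused score. In practice the weights would be trained on held-out development trials rather than on the trials being scored.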


“…But the biggest difference, as highlighted in Brümmer et al (2007), is that especially from SRE 2006, "systems no longer train individual speaker models from some minutes of speech, but whole systems are trained on hundreds of hours of speech in whole NIST SRE databases" (p. 2082), transforming the conceptually simple speaker detection task, classically seen as that of comparing two utterances to determine if they come or not from the same speaker, into a serious big data task where systems are designed to jointly optimize the detection of thousands of speakers in hundreds of thousands of comparisons, where the speech segments in the comparisons are tens of thousands of utterances of varied and mixed channel, speaking style, duration and noise characteristics.…”

Section: Big Data Evaluations (2006-2012): Session Variability Compen… (mentioning)

confidence: 99%

Evaluating Automatic Speaker Recognition systems: An overview of the NIST Speaker Recognition Evaluations (1996-2014)

González-Rodríguez

2014

loquens


Automatic Speaker Recognition systems show interesting properties, such as speed of processing or repeatability of results, in contrast to speaker recognition by humans. But they will be usable just if they are reliable. Testability, or the ability to extensively evaluate the goodness of the speaker detector decisions, becomes then critical. In the last 20 years, the US National Institute of Standards and Technology (NIST) has organized, providing the proper speech data and evaluation protocols, a series of text-independent Speaker Recognition Evaluations (SRE). Those evaluations have become not just a periodical benchmark test, but also a meeting point of a collaborative community of scientists that have been deeply involved in the cycle of evaluations, allowing tremendous progress in a specially complex task where the speaker information is spread across different information levels (acoustic, prosodic, linguistic…) and is strongly affected by speaker intrinsic and extrinsic variability factors. In this paper, we outline how the evaluations progressively challenged the technology including new speaking conditions and sources of variability, and how the scientific community gave answers to those demands. Finally, NIST SREs will be shown to be not free of inconveniences, and future challenges to speaker recognition assessment will also be discussed.

Spanish abstract (translated): … NIST speaker recognition (1996-2014). Automatic speaker recognition systems are critical for organizing, labelling, managing and making decisions about large databases of voices from different speakers. To process such amounts of voice information efficiently, we need systems that are very fast and, since they are not free of errors, sufficiently reliable. Current systems are orders of magnitude faster than real time, allowing instantaneous automatic decisions over enormous numbers of conversations. But perhaps the most interesting characteristic of an automatic system is that it can be analysed in detail, since its performance and reliability can be evaluated blindly over enormous amounts of data in a great diversity of conditions. In the last 20 years, the US National Institute of Standards and Technology (NIST) has organized, providing suitable speech data and evaluation protocols, a series of text-independent speaker recognition evaluations. Those evaluations have become not only a periodic benchmark test, but also a meeting point for a collaborative community of scientists who have been deeply involved in the cycle of evaluations, which has allowed enormous progress in an especially complex task in which the speaker-individualizing information is spread across different information levels (acoustic, prosodic, linguistic...) and is strongly affected by variability factors intrinsic and extrinsic to the ...
