Mireia Díez scite author profile

High-performance Query-by-Example Spoken Term Detection on the SWS 2013 evaluation

In recent years, the task of Query-by-Example Spoken Term Detection (QbE-STD), which aims to find occurrences of a spoken query in a set of audio documents, has attracted the interest of the research community for its versatility in settings where untranscribed, multilingual and acoustically unconstrained spoken resources, or spoken resources in low-resource languages, must be searched. This paper describes and reports experimental results for a QbE-STD system that achieved the best performance in the recent Spoken Web Search (SWS) evaluation, held as part of MediaEval 2013. Though not optimized for speed, the system operates faster than real time. The system exploits high-performance phone decoders to extract frame-level phone posteriors (a common representation in QbE-STD tasks). Then, given a query and an audio document, a distance matrix is computed between their phone posterior representations, followed by a newly introduced distance normalization technique and an iterative Dynamic Time Warping (DTW) matching procedure with some heuristic pruning. Results show that remarkable performance improvements can be achieved by using multiple examples per query and, especially, through the late (score-level) fusion of different subsystems, each based on a different set of phone posteriors.
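A minimal sketch of the matching step described above: a local distance matrix between the query and document posteriorgrams, followed by subsequence DTW, which lets the query start and end anywhere in the document. The -log dot-product distance is an assumption (a common choice for phone posteriors; the abstract does not name the distance), and the paper's distance normalization, iterative search and heuristic pruning are not reproduced. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# A posteriorgram is a list of frames; each frame is a phone-posterior vector.
Posteriorgram = List[List[float]]


def posterior_distance(q: List[float], d: List[float], eps: float = 1e-10) -> float:
    """Assumed local distance: -log of the dot product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    return -math.log(max(dot, eps))  # eps avoids log(0) for disjoint posteriors


def subsequence_dtw(query: Posteriorgram, doc: Posteriorgram) -> Tuple[float, int]:
    """Return (accumulated cost, end frame in doc) of the best match of query in doc."""
    if not query or not doc:
        raise ValueError("query and doc must be non-empty")
    n, m = len(query), len(doc)
    # Distance matrix between every query frame and every document frame.
    dist = [[posterior_distance(q, d) for d in doc] for q in query]
    acc = [[math.inf] * m for _ in range(n)]
    # Free start: the first query frame may align with any document frame.
    for j in range(m):
        acc[0][j] = dist[0][j]
    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1][j]  # vertical step
            if j > 0:
                # horizontal and diagonal steps
                best_prev = min(best_prev, acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = dist[i][j] + best_prev
    # Free end: pick the cheapest document frame for the last query frame.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], end
```

For example, a two-frame query `[[1.0, 0.0], [0.0, 1.0]]` searched in the document `[[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]` matches with zero cost ending at frame 2. In a full system the detection score would typically be derived from a length-normalized path cost, and scores from subsystems built on different phone decoders would then be fused at score level, as the abstract describes.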

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks

Landini

Profant

Díez

et al. 2022

Computer Speech & Language

On the use of phone log-likelihood ratios as features in spoken language recognition

Díez

Varona

Peñagarikano

et al. 2012

Bayesian HMM Based x-Vector Clustering for Speaker Diarization

Díez

Burget

Wang

et al. 2019

BUT System for the Second DIHARD Speech Diarization Challenge

Landini

Wang

Díez

et al. 2020

Analysis of Score Normalization in Multilingual Speaker Recognition

Matějka

Novotny

Plchot

et al. 2017

The NIST Speaker Recognition Evaluation 2016 revealed the importance of score normalization under mismatched data conditions. This paper analyzes several score normalization techniques for test conditions involving multiple languages. The best-performing technique for a PLDA classifier is adaptive s-norm, with a 30% relative improvement over the system without any score normalization. The analysis shows that adaptive score normalization (which uses the top-scoring files for each trial) selects cohorts that contain recordings in the same language as the enrollment and test recordings in 68% of cases, and recordings of the same gender in 92% of cases. Our results suggest that the data from which score normalization cohorts are selected should be a pool covering several languages and channels and, if possible, a subset of it should contain data from the target domain.
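A minimal sketch of adaptive symmetric score normalization (adaptive s-norm) as it is usually formulated: for each trial, the raw score is normalized once with the mean and standard deviation of the enrollment side's top-N cohort scores and once with those of the test side's top-N cohort scores, and the two results are averaged. The cohort scores are taken as given (in the paper they would come from the PLDA backend); `top_n`, the use of the population standard deviation, the `min_std` floor and all function names are illustrative assumptions.

```python
import statistics
from typing import List, Tuple


def top_n_stats(cohort_scores: List[float], top_n: int) -> Tuple[float, float]:
    """Mean and population std of the top_n highest cohort scores (adaptive cohort)."""
    top = sorted(cohort_scores, reverse=True)[:top_n]
    return statistics.mean(top), statistics.pstdev(top)


def adaptive_snorm(
    raw_score: float,
    enroll_cohort_scores: List[float],
    test_cohort_scores: List[float],
    top_n: int,
    min_std: float = 1e-12,
) -> float:
    """Average of the enrollment-side and test-side normalized scores."""
    mu_e, sd_e = top_n_stats(enroll_cohort_scores, top_n)
    mu_t, sd_t = top_n_stats(test_cohort_scores, top_n)
    # Floor the std so a degenerate cohort does not cause division by zero.
    sd_e, sd_t = max(sd_e, min_std), max(sd_t, min_std)
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)
```

With `top_n=2`, enrollment cohort scores `[1, 2, 3, 4, 10]` give mean 7 and std 3, test cohort scores `[0, 2, 4]` give mean 3 and std 1, so a raw score of 10 normalizes to 0.5 × (1 + 7) = 4. Selecting only the top-scoring cohort files per trial is what makes the cohort "adaptive"; the abstract's 68% and 92% figures describe how often that selection lands on same-language and same-gender recordings.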

End-to-End DNN Based Speaker Recognition Inspired by I-Vector and PLDA

Rohdin

Silnova

Díez

et al. 2018

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have proven competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.
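The abstract does not say which regularizer keeps the end-to-end system close to its i-vector + PLDA initialization. One common choice, assumed here purely for illustration, is an L2 penalty on the distance between the current parameters θ and the initial parameters θ₀, λ‖θ − θ₀‖². The sketch below uses plain Python lists as stand-ins for network weights; the function names and the `task_grad` input are hypothetical.

```python
from typing import List


def l2_to_init_penalty(params: List[float], init_params: List[float], lam: float) -> float:
    """lam * ||params - init_params||^2; grows as the model drifts from its initialization."""
    return lam * sum((p - p0) ** 2 for p, p0 in zip(params, init_params))


def regularized_sgd_step(
    params: List[float],
    init_params: List[float],
    task_grad: List[float],
    lam: float,
    lr: float,
) -> List[float]:
    """One SGD step on task_loss + lam * ||params - init_params||^2.

    task_grad is the gradient of the (hypothetical) verification loss w.r.t. params.
    The penalty adds 2 * lam * (p - p0) per parameter, pulling p back toward p0.
    """
    return [
        p - lr * (g + 2.0 * lam * (p - p0))
        for p, p0, g in zip(params, init_params, task_grad)
    ]
```

With a zero task gradient, `regularized_sgd_step([1.0, 2.0], [0.0, 2.0], [0.0, 0.0], lam=0.5, lr=0.1)` moves the first weight from 1.0 toward its initial value 0.0 and leaves the second, already at its initial value, unchanged. Larger `lam` keeps the trained system closer to the baseline it was initialized from, which is the overfitting trade-off the abstract describes.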

Speaker Diarization based on Bayesian HMM with Eigenvoice Priors

Díez

Burget

Matějka

2018

Mireia Díez

High-performance Query-by-Example Spoken Term Detection on the SWS 2013 evaluation

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks

On the use of phone log-likelihood ratios as features in spoken language recognition

Bayesian HMM Based x-Vector Clustering for Speaker Diarization

BUT System for the Second DIHARD Speech Diarization Challenge

Analysis of Score Normalization in Multilingual Speaker Recognition

End-to-End DNN Based Speaker Recognition Inspired by I-Vector and PLDA

Speaker Diarization based on Bayesian HMM with Eigenvoice Priors
