Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space

Kheder, Waad Ben; Matrouf, Driss; Bousquet, Pierre-Michel; Bonastre, Jean-François; Ajili, Moez

doi:10.1007/978-3-319-11397-5_7

Cited by 8 publications

(8 citation statements)

References 7 publications


“…In our previous work [24][25][26], we proposed an additive noise model in the i-vector space represented by the equation:…”

Section: The PLDA Model for I-vector Scoring (mentioning)

confidence: 99%

“…In this paper, we explore two axes. On one side, we aim at improving the system performance by using two different techniques: 1 -The I-MAP algorithm [24][25][26] which is an i-vector denoising procedure based on an additive noise model in the i-vector space. It uses a Gaussian modeling of both clean i-vectors and the noise distributions in the i-vector space and have been proven to yield up to 60% of relative EER improvement compared to a baseline system performance.…”

Section: Introduction (mentioning)

confidence: 99%

LIA System for the SITW Speaker Recognition Challenge

et al. 2016

Self Cite

This paper presents the speaker verification systems developed in the LIA lab at the University of Avignon for the SITW (Speakers In The Wild) challenge. We present the algorithms used to deal with additive noise, short utterances and propose an improved scoring scheme using a discriminative classifier and integrating the homogeneity of the two compared recordings. Due to the heterogeneity of this database (presence of background noise, reverberation, Lombard effect, etc.), it is hard to analyze the contribution of individual techniques used to deal with each problem. For this reason, a subset of the trials will be studied for each algorithm in order to emphasize its contribution.

“…This paper is an extension of our work in [22] where we proposed an i-vector "denoising" technique, we called i-MAP, in order to deal with additive noise.…”

Section: Introduction (mentioning)

confidence: 99%

Fast i-vector denoising using MAP estimation and a noise distributions database for robust speaker recognition

Kheder, Matrouf, Bousquet, et al. 2017

Computer Speech & Language

Self Cite

Since the i-vector paradigm was introduced in the field of speaker recognition, many techniques have been proposed to deal with additive noise within this framework. Due to the complexity of its effect in the i-vector space, a lot of effort has been put into dealing with noise in other domains (speech enhancement, feature compensation, robust i-vector extraction and robust scoring). As far as we know, there has been no serious attempt to handle the noise problem directly in the i-vector space without relying on data distributions computed on a prior domain. The aim of this paper is twofold. First, it proposes a full-covariance Gaussian modeling of the clean i-vector and noise distributions in the i-vector space and introduces a technique to estimate a clean i-vector given the noisy version and the noise density function using the MAP approach. Based on NIST data, we show that it is possible to improve the baseline system performance by up to 60%. Second, in order to make this algorithm usable in a real application and reduce the computational time needed by i-MAP, we propose an extension that requires building a noise distribution database in the i-vector space in an off-line step and using it later in the test phase. We show that it is possible to achieve comparable results using this approach (up to 57% relative EER improvement) with a sufficiently large noise distribution database.
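
The abstract describes the estimator only in words. As a minimal sketch, assuming the noisy i-vector is the sum of an independent clean i-vector and a noise term, each following a full-covariance Gaussian, the MAP estimate of the clean i-vector has the standard closed form computed below. The function name `map_denoise` and its parameter names are illustrative and do not come from the paper, and the step that estimates the noise distribution itself is not shown.

```python
import numpy as np


def map_denoise(y, mu_x, sigma_x, mu_n, sigma_n):
    """MAP estimate of a clean vector x from y = x + n.

    Assumes (for illustration) x ~ N(mu_x, sigma_x) and n ~ N(mu_n, sigma_n),
    independent, with full covariance matrices.
    """
    prec_x = np.linalg.inv(sigma_x)  # precision of the clean prior
    prec_n = np.linalg.inv(sigma_n)  # precision of the noise model
    # Setting the gradient of log p(y | x) + log p(x) to zero gives
    # (prec_x + prec_n) x = prec_n (y - mu_n) + prec_x mu_x.
    lhs = prec_x + prec_n
    rhs = prec_n @ (y - mu_n) + prec_x @ mu_x
    return np.linalg.solve(lhs, rhs)
```

With identical covariances and zero means the estimate is simply half the noisy vector, and as the noise covariance shrinks it approaches the noisy vector minus the noise mean, which is a quick sanity check on the formula.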

“…The noise and reverberation levels are frequently selected independently from each other without a specific application in mind, e.g., [20,21], therefore some of them might never happen in real life. Furthermore, they are often selected within a discrete set of values, e.g., [11,14,22,23,24] or a narrow range of values, e.g., [25], which does not match the actual distribution of levels observed in real life and artificially advantages learning-based methods which may overfit those levels. Even when the distortion levels are realistic, there may still exist some acoustic mismatch, due to recording speech in a different place than noise and reverberation, e.g., [26,27].…”

Section: Introduction (mentioning)

confidence: 99%

A study of speech distortion conditions in real scenarios for speech processing applications

Ribas, Calvo 2016

2016 IEEE Spoken Language Technology Workshop (SLT)

The growing demand for robust speech processing applications able to operate in adverse scenarios calls for new evaluation protocols and datasets beyond artificial laboratory conditions. The characteristics of real data for a given scenario are rarely discussed in the literature. As a result, methods are often tested based on the authors' expertise and not always in scenarios with actual practical value. This paper aims to open this discussion by identifying some of the main problems with the data simulation or collection procedures used so far and summarizing the important characteristics of real scenarios to be taken into account, including the properties of reverberation, noise and the Lombard effect. Finally, we provide some preliminary guidelines towards designing experimental setups and speech recognition results for proposal validation.
