Compensate multiple distortions for speaker recognition systems

Mohammadamini, Mohammad; Matrouf, Driss; Bonastre, Jean-François; Serizel, Romain; Dowerah, Sandipana; Jouvet, Denis

doi:10.23919/eusipco54536.2021.9615983

Cited by 4 publications

(8 citation statements)

References 17 publications

(22 reference statements)


“…Despite having good results for simulated noises this work doesn't include real noise and reverberation. In another work two configurations are proposed to denoise different kinds of distortions such as noise, early reverberation, and late reverberation [7]. In this paper also the capability of doing noise compensation is not explored in real environments.…”

Section: Related Work (mentioning)

confidence: 99%

“…The problem of noise and reverberation is addressed at different levels of speaker recognition systems, including signal level [4], feature level [5], speaker modeling level [6], x-vector level [7] and scoring technique adaptation [8]. Data augmentation is another approach to making the speaker recognition systems robust against noise.…”

Section: Introduction (mentioning)

confidence: 99%

“…Noise compensation in x-vector level, the estimation of clean x-vector from its corresponding noisy version, by doing a transformation between pairs of noisy/clean x-vectors is another approach that performed well in the compensation of artificial noise and reverberation [9,3]. Although this approach performs well in some cases [7], it doesn't bring a significant improvement with all speaker embedding systems [10] and in all environments. The behavior of different speaker embedding systems is different because they just consider the speaker classification accuracy (inter-speaker and intra-speaker distance) during optimization and they don't put an explicit constraint on the noise impact.…”

Section: Introduction (mentioning)

confidence: 99%


Learning Noise Robust ResNet-Based Speaker Embedding for Speaker Recognition

Mohammadamini, Matrouf, Bonastre et al. 2022

The Speaker and Language Recognition Workshop (Odyssey 2022)

Self Cite


The presence of background noise and reverberation, especially in far-distance speech utterances, diminishes the performance of speaker recognition systems. This challenge is addressed on different levels, from the signal level in the front end to the scoring technique adaptation in the back end. In this paper, two new variants of ResNet-based speaker recognition systems are proposed that make the speaker embedding more robust against additive noise and reverberation. The goal of the proposed systems is to extract x-vectors in noisy environments that are close to their corresponding x-vectors in a clean environment. To do so, the speaker embedding network jointly minimizes the speaker classification loss function and the distance between pairs of noisy and clean x-vectors. The experimental results obtained by our systems are compared with the baseline ResNet system. In different situations with real and simulated noises and reverberation conditions, the modified systems outperform the baseline ResNet system. The proposed systems are tested with four evaluation protocols. In the presence of artificial noise and reverberation, we achieved 19% improvement of EER. The main advantage of the proposed systems is their efficiency against real noise and reverberation. In the presence of real noise and reverberation, we achieved 15% improvement of EER.
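The joint objective described in this abstract can be illustrated with a minimal sketch. The abstract does not say which distance is used or how the two terms are weighted, so the mean squared error, the `weight` parameter, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one utterance (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def mse(a, b):
    """Mean squared error between two embeddings (assumed distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_loss(logits_noisy, label, xvec_noisy, xvec_clean, weight=1.0):
    """Speaker classification loss plus a weighted noisy/clean x-vector distance.

    `weight` is a hypothetical balancing factor, not taken from the paper.
    """
    return cross_entropy(logits_noisy, label) + weight * mse(xvec_noisy, xvec_clean)
```

When the noisy x-vector equals the clean one, the distance term vanishes and only the classification loss remains, which is the behaviour the abstract aims for.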



“…Although, the DNN-based speaker embedding systems have given a degree of robustness against acoustic noises, there is a significant degradation of their performance in the presence of background noise, reverberation and other variabilities [4] [5] [6]. Various approaches have been proposed to handle these variabilities in different parts of the system such as: signal level [7], feature level [8], speaker modeling level [9], x-vector level [6] and scoring technique level [10]. Addressing the variabilities at each step has its own advantage and disadvantages in terms of data, computational resources, efficiency, etc.…”

Section: Introduction (mentioning)

confidence: 99%

Barlow Twins self-supervised learning for robust speaker recognition

et al. 2022

Self Cite


Acoustic noise is a big challenge for speaker recognition systems. State-of-the-art speaker recognition systems are based on deep neural network speaker embedding extractors, called x-vector extractors. A noise-robust x-vector extractor is in high demand in speaker recognition systems. In this paper, we introduce the Barlow Twins self-supervised loss function in the area of speaker recognition. The Barlow Twins objective function tries to optimize two criteria. Firstly, it increases the similarity between two versions of the same signal (i.e. the clean signal and its augmented noisy version) to make the speaker embedding invariant to acoustic noise. Secondly, it reduces the redundancy between dimensions of the x-vectors, which improves the overall quality of speaker embeddings. In our research, the Barlow Twins objective function is integrated with the ResNet-based speaker embedding system. In the proposed system, the Barlow Twins objective function is calculated in the embedding layer and is optimized jointly with the speaker classifier loss function. The experimental results on the Fabiole corpus show 22% relative gain in terms of EER in clean environments and 18% improvement in the presence of noise with low SNR and reverberation.
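The two criteria named in this abstract correspond to the diagonal and off-diagonal terms of the standard Barlow Twins loss. The sketch below follows that general formulation on plain Python lists; the function names and the `lam` value are illustrative assumptions, and nothing here is taken from the cited system.

```python
import math

def standardize(batch):
    """Normalize each embedding dimension to zero mean and unit variance
    over the batch. Returns a list of columns, one per dimension."""
    n, d = len(batch), len(batch[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mean) / std for v in col])
    return cols

def barlow_twins_loss(clean, noisy, lam=0.005):
    """Invariance term on the diagonal of the cross-correlation matrix,
    redundancy-reduction term (weighted by lam) off the diagonal.

    `lam` is a hypothetical trade-off value chosen for illustration.
    """
    a, b = standardize(clean), standardize(noisy)
    n, d = len(clean), len(a)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c = sum(x * y for x, y in zip(a[i], b[j])) / n
            loss += (1.0 - c) ** 2 if i == j else lam * c ** 2
    return loss
```

The loss is zero when the clean and noisy embeddings are perfectly correlated dimension by dimension and the dimensions are mutually uncorrelated.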


“…The robustness of the DNN-based speaker recognition (SR) systems in general and specifically their robustness against environment variabilities such as additive noise, reverberation, and far-distance recording device has made them more promising. Several strategies such as data augmentation [4], and noise compensation [5], [6] are explored to make the TDNN-based SR systems more robust against noise and reverberation and other variabilities. The previous research shows the weakness of TDNN-based SRs against noise and reverberation distortions.…”

Section: Introduction (mentioning)

confidence: 99%

A Comprehensive Exploration of Noise Robustness and Noise Compensation in ResNet and TDNN-based Speaker Recognition Systems

Mohammadamini, Matrouf, Bonastre et al. 2022

2022 30th European Signal Processing Conference (EUSIPCO)

Self Cite


In this paper, a comprehensive exploration of noise robustness and noise compensation of ResNet and TDNN speaker recognition systems is presented. Firstly, the robustness of the TDNN and ResNet in the presence of noise, reverberation, and both distortions is explored. Our experimental results show that in all cases the ResNet system is more robust than TDNN. After that, a noise compensation task is done with a denoising autoencoder (DAE) over the x-vectors extracted from both systems. We explored two scenarios: 1) compensation of artificial noise with artificial data, 2) compensation of real noise with artificial data. The second case is the most desired scenario, because it makes noise compensation affordable without having real data to train denoising techniques. The experimental results show that in the first scenario noise compensation gives significant improvement with TDNN, while this improvement in ResNet is not significant. In the second scenario, we achieved 15% improvement of EER on the VOiCES Eval challenge in both TDNN and ResNet systems. In most cases the performance of ResNet without compensation is superior to TDNN with noise compensation.
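The abstracts on this page report results as improvements in equal error rate (EER). As a rough illustration of what such a figure means, the sketch below approximates the EER from trial scores and computes a relative reduction. It assumes the reported percentages are relative reductions, which only the Barlow Twins abstract states explicitly; the function names and the threshold sweep are illustrative choices, not the evaluation code of any cited paper.

```python
def eer(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over the observed scores and
    return the mean of FAR and FRR where the two rates are closest."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction in percent."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

Under this reading, a system that lowers a hypothetical baseline EER of 10.0% to 8.5% would be reported as a 15% improvement.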

