Probabilistic linear discriminant analysis of i-vector posterior distributions

Cumani, Sandro; Plchot, Oldřich; Laface, Pietro

doi:10.1109/icassp.2013.6639150

Cited by 53 publications

(43 citation statements)

References 4 publications

“…Varying utterance duration was compensated for by calibrating the PLDA score in (Hasan et al, 2013), and by exploiting the uncertainty in the i-vector in (Cumani et al, 2013b). The effect of noise on PLDA-based systems is studied in (Mandasari et al, 2012).…”

Section: Introduction (mentioning)

confidence: 99%

From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification

Rajan, Afanasyev, Hautamäki, et al. 2014

Digital Signal Processing

The availability of multiple utterances (and hence, i-vectors) for speaker enrollment brings up several alternatives for their utilization with probabilistic linear discriminant analysis (PLDA). This paper provides an overview of their effective utilization, from a practical viewpoint. We derive expressions for the evaluation of the likelihood ratio for the multi-enrollment case, with details on the computation of the required matrix inversions and determinants. The performance of five different scoring methods and the effect of i-vector length normalization are compared experimentally. We conclude that length normalization is a useful technique for all but one of the scoring methods considered, and averaging i-vectors is the most effective out of the methods compared. We also study the application of multicondition training on the PLDA model. Our experiments indicate that multicondition training is more effective in estimating PLDA hyperparameters than it is for likelihood computation. Finally, we look at the effect of the configuration of the enrollment data on PLDA scoring, studying the properties of conditional dependence and number-of-enrollment-utterances per target speaker. Our experiments indicate that these properties affect the performance of the PLDA model. These results further support the conclusion that i-vector averaging is a simple and effective way to process multiple enrollment utterances.

“…Using the permutation property of the trace for the second term we get: (23) Although the dimension of is huge, we need only its diagonal because, for any feasible solution , matrix is diagonal. Moreover, since the atoms of the dictionary matrix are normalized, the diagonal elements of are .…”

Section: Matrix Optimization (mentioning)

confidence: 99%

“…In [21] we have highlighted that the incidence of the time spent for i-vector computation in a system using large models and scoring long speaker segments is negligible compared to the importance of keeping the original accuracy and saving memory. However, the effectiveness of the i-vector extractor is more relevant for systems dealing with short utterances [22], [23], [24], [25] such as, for example, the text prompts in speaker verification [26], [27]. In this paper we propose a new approximate i-vector extraction approach particularly useful for applications that need to optimize their memory requirements without sensibly affecting their performance and speed.…”

Section: Introduction (mentioning)

confidence: 99%

Factorized Sub-Space Estimation for Fast and Memory Effective I-vector Extraction

Cumani, Laface 2014

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

“…The first is based on speaker modelling (SM) which uses the assumption that each individual has different voice characteristics. Traditionally, speaker models are constructed with Gaussian mixture models (GMMs) and i-vectors [6,7,8], but more recently deep learning has been proven effective for speaker modelling [9,10,11,12,13]. In many systems, the models are often pre-trained for the target speakers [14,15] and are not applicable to unknown participants.…”

Section: Introduction (mentioning)

confidence: 99%

Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings

2019

The goal of this work is to determine 'who spoke when' in real-world meetings. The method takes surround-view video and single or multi-channel audio as inputs, and generates robust diarisation outputs. To achieve this, we propose a novel iterative approach that first enrolls speaker models using audio-visual correspondence, then uses the enrolled models together with the visual information to determine the active speaker. We show strong quantitative and qualitative performance on a dataset of real-world meetings. The method is also evaluated on the public AMI meeting corpus, on which we demonstrate results that exceed all comparable methods. We also show that beamforming can be used together with the video to further improve the performance when multi-channel audio is available.
