State-of-the-art language recognition systems model utterances with i-vectors. However, the uncertainty of the i-vector extraction process, represented by the i-vector posterior covariance, is affected by various factors such as channel mismatch, background noise, incomplete transformations, and duration variability. In this paper, we propose a new quality measure based on the i-vector posterior covariance and incorporate it into the recognition process to improve recognition accuracy. Experimental results on the LRE15 database under various duration conditions show a 2.9% relative improvement in average performance cost when the proposed quality measure is incorporated into language recognition systems.
I-vector based language recognition
In this section, we describe the main components of an i-vector/PLDA-based language recognition system.
The i-vector framework
An i-vector is a low-dimensional feature vector that represents an utterance of arbitrary duration. We assume that each utterance possesses a speaker- and channel-dependent GMM mean supervector, M, in the form [5]: