Speaker verification (SV) systems that use deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems. Fusing the two kinds of system improves performance further, since it benefits from both the discriminatively trained x-vectors and the generative i-vectors, which capture distinct speaker characteristics. In this paper, we propose a novel method, called the generative x-vector, that incorporates the complementary information of the i-vector and the x-vector. The generative x-vector relies on a transformation model learned from the i-vector and x-vector representations of the background data. Canonical correlation analysis is applied to derive this transformation model, which is later used to transform the standard x-vectors of the enrollment and test segments into the corresponding generative x-vectors. SV experiments on the NIST SRE 2010 dataset demonstrate that the system using generative x-vectors performs considerably better than the baseline i-vector and x-vector systems. Furthermore, the generative x-vectors outperform the fusion of i-vector and x-vector systems for long-duration utterances, while yielding comparable results for short-duration utterances.
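The transformation step described above can be illustrated with a minimal NumPy sketch of canonical correlation analysis. It is not the authors' implementation: the function names `fit_cca` and `to_generative_xvector`, the regularization constant `reg`, and the choice to take the x-vector-side CCA projection as the generative x-vector are all assumptions made for illustration.

```python
import numpy as np


def inv_sqrt(cov):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T


def fit_cca(ivecs, xvecs, k, reg=1e-6):
    """Fit CCA on paired background i-vectors and x-vectors (one row per utterance).

    `reg` is a hypothetical ridge term added for numerical stability.
    """
    n = ivecs.shape[0]
    mu_i, mu_x = ivecs.mean(axis=0), xvecs.mean(axis=0)
    ci, cx = ivecs - mu_i, xvecs - mu_x

    # Within-view covariances (regularized) and the cross-covariance.
    cov_ii = ci.T @ ci / (n - 1) + reg * np.eye(ci.shape[1])
    cov_xx = cx.T @ cx / (n - 1) + reg * np.eye(cx.shape[1])
    cov_ix = ci.T @ cx / (n - 1)

    # Whiten both views, then take the SVD of the whitened cross-covariance.
    w_i, w_x = inv_sqrt(cov_ii), inv_sqrt(cov_xx)
    u, s, vt = np.linalg.svd(w_i @ cov_ix @ w_x, full_matrices=False)

    return {
        "mu_i": mu_i,
        "mu_x": mu_x,
        "proj_i": w_i @ u[:, :k],   # i-vector-side canonical directions
        "proj_x": w_x @ vt[:k].T,   # x-vector-side canonical directions
        "corr": s[:k],              # canonical correlations, largest first
    }


def to_generative_xvector(model, xvecs):
    """Project standard x-vectors with the x-vector-side CCA transform (assumed)."""
    return (xvecs - model["mu_x"]) @ model["proj_x"]
```

Under these assumptions, `fit_cca` would be run once on the background data, and `to_generative_xvector` would then be applied to the x-vectors of the enrollment and test segments before scoring. No i-vectors are needed at that stage, because only the x-vector-side projection is used.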
The i-vector has been one of the state-of-the-art techniques in speaker recognition. The main computational load of standard i-vector extraction lies in evaluating the posterior covariance matrix, which is required to estimate the i-vector. This limits the potential use of i-vectors on handheld devices and in large-scale cloud-based applications. Previous fast approaches focus on simplifying the posterior covariance computation. In this paper, we propose a method for rapid computation of the i-vector that bypasses the need to evaluate a full posterior covariance, thereby speeding up the extraction process with minor impact on recognition accuracy. This is achieved through the subspace-orthonormalizing prior and the uniform-occupancy assumption that we introduce in this paper. In experiments on the extended core task of NIST SRE'10, we obtained a significant speed-up with modest degradation in performance relative to the standard i-vector.
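To make the cost concrete, the following sketch shows the standard i-vector point estimate, in which the posterior precision L = I + Σ_c N_c T_cᵀ Σ_c⁻¹ T_c has to be built and solved for every utterance. It illustrates only the baseline computation, not the proposed fast method, whose details are not given in this abstract. The function name `extract_ivector`, the array layout, and the diagonal-covariance assumption are illustrative choices.

```python
import numpy as np


def extract_ivector(T, sigma, n, f):
    """Standard i-vector extraction for one utterance (illustrative sketch).

    T     : (C, D, R) total variability matrix, one D x R block per mixture
    sigma : (C, D)    diagonal UBM covariances (assumed diagonal)
    n     : (C,)      zero-order (occupancy) statistics
    f     : (C, D)    first-order statistics, centered on the UBM means
    """
    rank = T.shape[2]

    # Sigma_c^{-1} T_c for every mixture component.
    T_weighted = T / sigma[:, :, None]

    # Posterior precision: I + sum_c N_c T_c^T Sigma_c^{-1} T_c.
    # Building this R x R matrix per utterance is the dominant cost.
    precision = np.eye(rank) + np.einsum("c,cdr,cds->rs", n, T_weighted, T)

    # Right-hand side: sum_c T_c^T Sigma_c^{-1} f_c.
    rhs = np.einsum("cdr,cd->r", T_weighted, f)

    # The i-vector is the posterior mean, precision^{-1} rhs.
    return np.linalg.solve(precision, rhs), precision
```

The `precision` matrix depends on the occupancies `n` of each utterance, so it cannot be precomputed once and reused, which is why approaches that avoid forming the full posterior covariance can save time.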