Longting Xu scite author profile

Speaker verification (SV) systems that use deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems. Fusing the two systems improves performance further, since the discriminatively trained x-vectors and the generative i-vectors capture distinct speaker characteristics. In this paper, we propose a novel method, called the generative x-vector, that incorporates the complementary information of i-vectors and x-vectors. The generative x-vector uses a transformation model learned from the i-vector and x-vector representations of the background data. Canonical correlation analysis is applied to derive this transformation model, which is then used to transform the standard x-vectors of the enrollment and test segments into the corresponding generative x-vectors. SV experiments on the NIST SRE 2010 dataset demonstrate that the system using generative x-vectors performs considerably better than the baseline i-vector and x-vector systems. Furthermore, the generative x-vectors outperform the fusion of i-vector and x-vector systems for long-duration utterances, while yielding comparable results for short-duration utterances.
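The CCA step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fit_cca` and `to_generative`, the regularization term `reg`, the number of components `k`, and the synthetic background data are all assumptions made for illustration. The sketch learns a projection from paired background x-vectors and i-vectors, then applies it to x-vectors alone.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-6):
    """Fit CCA between paired rows of X (n, dx) and Y (n, dy).

    Returns the mean of X, the projection A (dx, k) for X, and the
    top-k canonical correlations. `reg` is a small ridge term (an assumption).
    """
    n = X.shape[0]
    mx = X.mean(axis=0)
    Xc, Yc = X - mx, Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    # Map the whitened directions back to the original x-vector space.
    A = np.linalg.solve(Lx.T, U[:, :k])
    return mx, A, s[:k]

def to_generative(x, mx, A):
    """Project standard x-vectors (rows of x) with the learned CCA transform."""
    return (x - mx) @ A

# Synthetic stand-ins for background x-vectors and i-vectors sharing a latent factor.
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 3))
xvec_bg = z @ rng.standard_normal((3, 8)) + 0.1 * rng.standard_normal((500, 8))
ivec_bg = z @ rng.standard_normal((3, 6)) + 0.1 * rng.standard_normal((500, 6))

mx, A, corrs = fit_cca(xvec_bg, ivec_bg, k=3)
gen_bg = to_generative(xvec_bg, mx, A)
```

At enrollment and test time only `to_generative` would be needed, so no i-vector has to be extracted for those segments in this sketch.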

Speech enhancement based on nonnegative matrix factorization in constant-Q frequency domain

Wei

Zaidi

et al. 2021

Applied Acoustics

Generalizing I-Vector Estimation for Rapid Speaker Recognition

Lee

et al. 2018

IEEE/ACM Trans. Audio Speech Lang. Process.

Rapid Computation of I-vector

Xu

Lee

Li

et al. 2016

The i-vector has been one of the state-of-the-art techniques in speaker recognition. The main computational load of standard i-vector extraction is evaluating the posterior covariance matrix, which is required to estimate the i-vector. This limits the potential use of i-vectors on handheld devices and in large-scale cloud-based applications. Previous fast approaches focus on simplifying the posterior covariance computation. In this paper, we propose a method for rapid computation of the i-vector that bypasses the need to evaluate a full posterior covariance, thereby speeding up the extraction process with minor impact on recognition accuracy. This is achieved by using a subspace-orthonormalizing prior and the uniform-occupancy assumption that we introduce in this paper. In experiments conducted on the extended core task of NIST SRE'10, we obtained a significant speed-up with modest degradation in performance relative to the standard i-vector.
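To make the cost argument concrete, the sketch below shows the standard i-vector point estimate, where the posterior precision L = I + Σ_c N_c T_cᵀ Σ_c⁻¹ T_c must be rebuilt for every utterance. It then shows a simplified variant in which every mixture component is assumed to have the same occupancy, so the expensive sum can be precomputed once. This second function is only an illustration of why a uniform-occupancy assumption helps; it is not the derivation from the paper, and it does not implement the subspace-orthonormalizing prior. All names, shapes, and the random data are hypothetical.

```python
import numpy as np

def extract_ivector(N, F, T, Sigma_inv):
    """Standard i-vector estimate.

    N: (C,) zero-order statistics per mixture component.
    F: (C, D) centered first-order statistics.
    T: (C, D, R) total variability matrix, split per component.
    Sigma_inv: (C, D) inverse of the diagonal UBM covariances.
    """
    R = T.shape[2]
    TS = T * Sigma_inv[:, :, None]                 # Sigma_c^{-1} T_c
    # Posterior precision: rebuilt per utterance, cost O(C * D * R^2).
    L = np.eye(R) + np.einsum('c,cdr,cds->rs', N, TS, T)
    b = np.einsum('cdr,cd->r', TS, F)              # T^T Sigma^{-1} F
    return np.linalg.solve(L, b)

def precompute_uniform(T, Sigma_inv):
    """One-off work: eigendecompose G = sum_c T_c^T Sigma_c^{-1} T_c."""
    TS = T * Sigma_inv[:, :, None]
    G = np.einsum('cdr,cds->rs', TS, T)
    lam, Q = np.linalg.eigh(G)
    return TS, lam, Q

def extract_ivector_uniform(N, F, TS, lam, Q):
    """Illustrative shortcut: assume every component has occupancy mean(N).

    Then L = I + n_bar * G = Q diag(1 + n_bar * lam) Q^T, so no per-utterance
    matrix build or inversion is needed.
    """
    n_bar = N.sum() / N.shape[0]
    b = np.einsum('cdr,cd->r', TS, F)
    return Q @ ((Q.T @ b) / (1.0 + n_bar * lam))

# Tiny random model: C components, D feature dims, R i-vector dims.
rng = np.random.default_rng(0)
C, D, R = 4, 5, 3
T = rng.standard_normal((C, D, R))
Sigma_inv = 1.0 / rng.uniform(0.5, 2.0, size=(C, D))
F = rng.standard_normal((C, D))
N_uniform = np.full(C, 5.0)

w_exact = extract_ivector(N_uniform, F, T, Sigma_inv)
TS, lam, Q = precompute_uniform(T, Sigma_inv)
w_fast = extract_ivector_uniform(N_uniform, F, TS, lam, Q)
```

When the occupancies really are equal, as in this toy input, the two estimates coincide; for real utterances the shortcut is an approximation whose accuracy would have to be checked empirically.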

Graph Fourier Transform Based Audio Zero-Watermarking

Huang

Zaidi

et al. 2021

IEEE Signal Process. Lett.

Multi-dimensional Speaker Information Recognition with Multi-task Neural Network

Chen

Yang

2018

A novel robust zero-watermarking algorithm for audio based on sparse representation

Huang

Guo

et al. 2021

China Commun.

Longting Xu

Cascaded Convolutional Neural Network-Based Hyperspectral Image Resolution Enhancement via an Auxiliary Panchromatic Image

Generative X-Vectors for Text-Independent Speaker Verification
