Feature fusion has proved successful in a number of speech-related tasks. Its primary objective is to leverage the complementary information present in different features. Conventionally, either early or late fusion is employed. Early fusion leads to high-dimensional feature vectors, and the ranges of feature values in the different streams require appropriate normalisation. Late fusion is carried out at the score level, where the contribution of each feature type is determined by a set of weights. Feature switching is another paradigm that attempts to capture the diversity in the feature types used. It gains significance particularly in the context of speaker verification, where the feature type that best discriminates a speaker is used to verify claims corresponding to that speaker. Earlier, feature switching was attempted in the conventional UBM-GMM framework. In this paper, the idea is extended to the Total Variability Space (TVS) framework. Two feature types, namely Modified Group Delay (MGD) and Mel-Frequency Cepstral Coefficients (MFCC), are explored in the proposed framework. Results are presented on the NIST 2010 male database for the speaker verification task.
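The three strategies contrasted above can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the function names, the z-score normalisation, the toy scores and the per-speaker table `best_feature` are all assumptions made for illustration, and how the best feature type for a speaker is selected is not specified here.

```python
from statistics import mean, pstdev


def zscore(stream):
    """Normalise one feature stream (a list of frames) per dimension."""
    dims = list(zip(*stream))
    mus = [mean(d) for d in dims]
    # Fall back to 1.0 when a dimension is constant, to avoid dividing by zero.
    sds = [pstdev(d) or 1.0 for d in dims]
    return [[(x - m) / s for x, m, s in zip(frame, mus, sds)] for frame in stream]


def early_fusion(stream_a, stream_b):
    """Concatenate normalised frames; the output dimension is the sum of both."""
    return [a + b for a, b in zip(zscore(stream_a), zscore(stream_b))]


def late_fusion(scores, weights):
    """Combine per-feature verification scores with a weighted sum."""
    return sum(weights[name] * score for name, score in scores.items())


def feature_switching(scores, best_feature, claimed_speaker):
    """Keep only the score of the feature type assigned to the claimed speaker."""
    return scores[best_feature[claimed_speaker]]


# Toy data (hypothetical): two frames per stream, one trial score per feature type.
mfcc = [[1.0, 2.0], [3.0, 4.0]]
mgd = [[10.0], [30.0]]
scores = {"mfcc": 0.5, "mgd": 1.5}

fused_frames = early_fusion(mfcc, mgd)
fused_score = late_fusion(scores, {"mfcc": 0.5, "mgd": 0.5})
switched_score = feature_switching(scores, {"spk1": "mgd"}, "spk1")
```

In this sketch early fusion grows the frame dimension from 2 and 1 to 3, late fusion blends both scores, and feature switching discards the score from the feature type not assigned to the claimed speaker.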