This paper presents a new approach to text-independent speaker recognition. The technique, developed to perform with short unknown utterances, models the spectral traits of a speaker with multiple sub-models rather than using a single statistical distribution as done with previous approaches. The recognition is based on the statistical distribution of the distances between the unknown speaker and each of the speaker models. Only frames that are close to one of the speaker's sub-models are considered in the recognition decision, so that speech events not encountered in the training data do not bias the recognition. The technique has been tested on a conversational data base. Models were generated using 100 s of speech from each of 11 male talkers. Unknown speech was obtained one week after the model data. Recognition accuracies of 96%, 87%, and 79% were obtained for unknown speech durations of 10, 5, and 3 s, respectively. The use of multiple sub-models to characterize spectral traits results in improved discrimination between speakers, particularly when short speech segments are recognized. [Work supported by U. S. Air Force, Rome Air Development Center.]
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.