This paper presents a new approach to text-independent speaker recognition. The technique, developed to perform with short unknown utterances, models the spectral traits of a speaker with multiple sub-models rather than using a single statistical distribution as done with previous approaches. The recognition is based on the statistical distribution of the distances between the unknown speaker and each of the speaker models. Only frames that are close to one of the speaker's sub-models are considered in the recognition decision, so that speech events not encountered in the training data do not bias the recognition. The technique has been tested on a conversational data base. Models were generated using 100 s of speech from each of 11 male talkers. Unknown speech was obtained one week after the model data. Recognition accuracies of 96%, 87%, and 79% were obtained for unknown speech durations of 10, 5, and 3 s, respectively. The use of multiple sub-models to characterize spectral traits results in improved discrimination between speakers, particularly when short speech segments are recognized. [Work supported by U. S. Air Force, Rome Air Development Center.]
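The frame-selection idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the sub-model representation, the distance measure, or the decision statistic, so everything concrete here is an assumption made for illustration: sub-models are stored as centroid vectors, distance is Euclidean, `max_dist` is a hypothetical rejection threshold, and the mean of the retained distances stands in for the "statistical distribution of the distances" used by the authors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_submodel_distance(frame: Vector, submodels: List[Vector]) -> float:
    """Euclidean distance from one frame to its closest sub-model centroid (assumed metric)."""
    return min(math.dist(frame, centroid) for centroid in submodels)


def speaker_score(frames: List[Vector], submodels: List[Vector], max_dist: float) -> float:
    """Mean nearest-sub-model distance over the frames that lie within max_dist.

    Frames farther than max_dist from every sub-model are ignored, so speech
    events that were absent from the training data do not affect the score.
    Returns infinity when no frame qualifies.
    """
    distances = (nearest_submodel_distance(f, submodels) for f in frames)
    kept = [d for d in distances if d <= max_dist]
    return sum(kept) / len(kept) if kept else math.inf


def recognize(frames: List[Vector], models: Dict[str, List[Vector]], max_dist: float) -> str:
    """Return the name of the speaker whose sub-models give the lowest score."""
    return min(models, key=lambda name: speaker_score(frames, models[name], max_dist))
```

With a short utterance, only a few frames survive the threshold, which is why the choice of `max_dist` (an assumed parameter here) would matter most for the 3 s and 5 s conditions.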
Four automatic speaker recognition techniques were investigated with a common speech data base to determine their effectiveness in a text-independent mode. These four techniques used the correlation of short and long term spectral averages, cepstral measurements of long term spectral averages, orthogonal linear prediction of the speech waveform, and long term average LPC reflection coefficients combined with pitch and overall power. The results of this study indicate that LPC-derived parameters perform better than do those derived from cepstral and spectral data. Recognition accuracies of 95% and 93% were obtained for LPC-based techniques with 13 seconds of unknown speech. The corresponding recognition accuracies for the cepstral and spectral based systems were 79% and 54%, respectively.

INTRODUCTION

The purpose of this study was to test and compare four techniques of automatic speaker recognition on a common data base. Performance was compared in a text-independent environment on free conversational speech. This paper details the methods used and the results obtained during the study. The four techniques compared are:

1. The correlation of short and long term spectral averages as investigated by S. Pruzansky and M.V. Mathews [1].
2. Cepstral measurements of long term spectral averages as investigated by S. Furui, F. Itakura, and S. Saito [2].
3. Orthogonal linear prediction of the speech waveform as investigated by M.R. Sambur [3].
4. Long term average LPC reflection coefficients, pitch and overall gain of the speech waveform as investigated by