This paper presents a novel speaker modeling approach for speaker recognition systems. The basic idea is to derive the target speaker model from a personalized background model composed only of the universal background model (UBM) Gaussian components that are actually present in the speech of the target speaker. The motivation for deriving speaker models from personalized background models is to exploit the observed differences between speakers in some acoustic classes in order to improve the performance of speaker recognition systems. The proposed approach was evaluated on a speaker verification task using various amounts of training and testing speech data. The experimental results showed that, compared to traditional UBM-based speaker recognition systems, the proposed approach is efficient in terms of both verification performance and computational cost during the testing phase.
Keywords: Speaker modeling; Speaker verification; Gaussian mixture models; Universal background model; Maximum a posteriori (MAP); Personalized background models
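The idea summarized in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how a UBM component is judged to be present in a speaker's speech, so the soft-occupancy threshold `min_count` is an assumption, as are the function names, the diagonal covariances, the mean-only MAP adaptation, and the relevance factor of 16. The toy UBM and data at the end are synthetic and serve only to exercise the functions.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Log density of every frame under every diagonal Gaussian, shape (N, K)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))


def responsibilities(X, weights, means, variances):
    """Posterior probability of each component given each frame, shape (N, K)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise the exponentials
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def personalize_ubm(X, weights, means, variances, min_count=1.0):
    """Keep only UBM components whose soft occupancy on X reaches min_count.

    The occupancy criterion is an assumption made for this sketch.
    """
    counts = responsibilities(X, weights, means, variances).sum(axis=0)
    keep = np.flatnonzero(counts >= min_count)
    new_weights = weights[keep] / weights[keep].sum()  # renormalise the weights
    return keep, new_weights, means[keep], variances[keep]


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a GMM towards the frames in X."""
    gamma = responsibilities(X, weights, means, variances)
    counts = gamma.sum(axis=0)
    data_means = (gamma.T @ X) / np.maximum(counts, 1e-10)[:, None]
    alpha = (counts / (counts + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * means


# Toy example: a 4-component UBM, with target speech covering only two of them.
rng = np.random.default_rng(0)
ubm_w = np.full(4, 0.25)
ubm_mu = np.array([[-10.0, 0.0], [0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
ubm_var = np.ones((4, 2))
X = np.vstack([
    rng.normal([-9.5, 0.0], 1.0, size=(200, 2)),
    rng.normal([10.5, 0.0], 1.0, size=(200, 2)),
])

keep, w, mu, var = personalize_ubm(X, ubm_w, ubm_mu, ubm_var)
adapted = map_adapt_means(X, w, mu, var)
```

Under these assumptions, the target model carries fewer Gaussians than the full UBM, so scoring a test utterance evaluates fewer components. This is one plausible reading of the reduced test-time cost reported in the abstract.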
The present paper introduces a novel speaker modeling technique for text-independent speaker identification using probabilistic self-organizing maps (PbSOMs). The basic motivation behind the technique is to combine the self-organizing property of self-organizing maps with the generative power of Gaussian mixture models (GMMs). Experimental results show that the PbSOM-based modeling technique significantly outperforms the traditional technique based on classical GMMs trained with the expectation-maximization (EM) algorithm or its deterministic variant. More precisely, the introduced technique yields a relative accuracy improvement of roughly 39% and is much less sensitive to the initialization of the model parameters.
A common limitation of previous comparative studies on speaker feature extraction techniques is that the comparison is carried out independently of the speaker modeling technique used and its parameters. The aim of the present paper is twofold. Firstly, it reviews the most significant advancements in feature extraction techniques used for automatic speaker recognition. Secondly, it evaluates and compares the currently dominant techniques using an objective comparison methodology that overcomes the limitations and drawbacks of previous comparative studies. The results of the experiments carried out underline the importance of the proposed comparison methodology.