Evgeny Karpov scite author profile

Real-time speaker identification and verification

In speaker identification, most of the computation comes from the distance or likelihood computations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models, and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We also apply the algorithms to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 in the case of VQ-based modeling with minor degradation in identification accuracy, and 34:1 in the case of GMM-based modeling. An equal error rate of 7% can be reached in 0.84 seconds on average when the test utterance is 30.4 seconds long.
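The core VQ matching step and the idea of pre-quantization can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the match score is taken to be the average squared-Euclidean distortion to the nearest codevector, and plain k-means clustering stands in as the pre-quantizer (the abstract does not say which pre-quantization variants the paper uses). The names `avg_distortion`, `kmeans`, `identify`, and `prequant_size` are hypothetical.

```python
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def avg_distortion(test_vectors, codebook):
    """Average nearest-codevector distortion; lower means a better match."""
    total = sum(min(sq_dist(x, c) for c in codebook) for x in test_vectors)
    return total / len(test_vectors)


def kmeans(vectors, k, iters=10, seed=0):
    """Plain k-means, used here only as a stand-in (assumed) pre-quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an emptied cluster keeps its previous centroid
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def identify(test_vectors, codebooks, prequant_size=None):
    """Return the speaker whose codebook yields the lowest average distortion.

    If prequant_size is given, the test sequence is first reduced to that
    many centroids, so each speaker is matched against fewer vectors.
    """
    if prequant_size is not None and prequant_size < len(test_vectors):
        test_vectors = kmeans(test_vectors, prequant_size)
    return min(codebooks,
               key=lambda spk: avg_distortion(test_vectors, codebooks[spk]))


if __name__ == "__main__":
    rng = random.Random(1)
    codebooks = {
        "spk_a": [[0.0, 0.0], [1.0, 0.0]],
        "spk_b": [[5.0, 5.0], [6.0, 5.0]],
    }
    # 200 synthetic 2-D "feature vectors" scattered around speaker B's region
    test = [[5.0 + rng.gauss(0, 0.3), 5.0 + rng.gauss(0, 0.3)]
            for _ in range(200)]
    print(identify(test, codebooks))                    # spk_b
    print(identify(test, codebooks, prequant_size=16))  # spk_b
```

The clustering cost is paid once per test utterance, while the saving (matching 16 centroids instead of 200 raw vectors) is repeated for every speaker in the database, so pre-quantization pays off as the number of speakers grows. Averaging over centroids also ignores cluster sizes, which is one way such a shortcut can cost some accuracy.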


Accuracy of MFCC-Based Speaker Recognition in Series 60 Device

Saastamoinen, Karpov, Hautamäki, et al., 2005

EURASIP J. Adv. Signal Process.


A fixed-point implementation of speaker recognition based on MFCC signal processing is considered. We analyze the numerical error of the MFCC and its effect on recognition accuracy. Techniques to reduce the information loss in a converted fixed-point implementation are introduced. We increase the signal processing accuracy by adjusting the ratio of representation accuracy between the operators and the signal. The signal processing error is found to be more important to speaker recognition accuracy than the error in the classification algorithm. The results are verified by applying the alternative technique to speech data. We also discuss the specific programming requirements imposed by the Symbian and Series 60 platforms.
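The trade-off between the precision given to the signal and to the operators can be illustrated with a small fixed-point multiply. This is a sketch under assumptions not taken from the paper: 16-bit words, a Q15 coefficient, truncating right shifts, and saturation on overflow; the paper's actual Q-formats are not given in the abstract, and all function names are hypothetical.

```python
def saturate(q, word_bits=16):
    """Clamp an integer to the signed range of a word_bits-bit word."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))


def to_fixed(value, frac_bits, word_bits=16):
    """Quantize a real number to fixed point with frac_bits fractional bits."""
    return saturate(round(value * (1 << frac_bits)), word_bits)


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a real number."""
    return q / (1 << frac_bits)


def fixed_mul(a, a_frac, b, b_frac, out_frac, word_bits=16):
    """Multiply two fixed-point integers and rescale the product.

    The exact integer product carries a_frac + b_frac fractional bits; it is
    shifted (Python's >> floors, like an arithmetic right shift) down to
    out_frac fractional bits and saturated to word_bits.
    """
    prod = a * b
    shift = a_frac + b_frac - out_frac
    prod = prod >> shift if shift >= 0 else prod << -shift
    return saturate(prod, word_bits)


def mul_error(x, c, x_frac, c_frac=15):
    """Absolute error of x * c when the signal x uses x_frac fractional bits.

    The result is kept in the signal's own format (out_frac = x_frac).
    """
    q = fixed_mul(to_fixed(x, x_frac), x_frac, to_fixed(c, c_frac), c_frac, x_frac)
    return abs(to_float(q, x_frac) - x * c)


if __name__ == "__main__":
    x, c = 0.001, 0.7  # a low-amplitude sample and a filter-like coefficient
    print(mul_error(x, c, x_frac=15))  # Q15 signal: range [-1, 1)
    print(mul_error(x, c, x_frac=20))  # 20 fractional bits: range [-1/32, 1/32)
```

For this low-amplitude sample, giving the signal 20 fractional bits instead of 15 shrinks the product error by a factor of several hundred, at the price of headroom: any sample with magnitude of 1/32 or more would saturate. Balancing that trade-off stage by stage appears to be the kind of adjustment the abstract refers to.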


Short message dictation on Symbian series 60 mobile phones

Karpov, Kiss, Leppänen, et al., 2006


Dictation of natural language text on embedded mobile devices is a challenging task. First, it involves a memory- and CPU-efficient implementation of robust speech recognition algorithms that are generally resource-demanding. Second, the acoustic and language models employed in the recognizer require suitable text and speech language resources, typically for a wide set of languages. Third, a proper UI design is also essential: the UI has to provide intuitive and easy means for dictation and error correction, and must suit a mobile usage scenario. In this demonstrator, an embedded speech recognition system for short message (SMS) dictation in US English is presented. The system runs on Nokia Series 60 mobile phones (e.g., N70, E60). Its vocabulary is 23,000 words, and its Flash and RAM memory footprints are small: 2 and 2.5 megabytes, respectively. After a short enrollment session, most native speakers can achieve a word accuracy of over 90% when dictating short messages in quiet or moderately noisy environments.
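Word accuracy figures like the one above are conventionally computed by aligning the recognized word sequence with a reference transcript. The sketch below assumes the common definition accuracy = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions of a minimum edit-distance alignment; the abstract does not state which definition was used, and the function names are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions between two word strings."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # empty ref -> hyp[:j] costs j insertions
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute, or match for free
        prev = cur
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """(N - S - D - I) / N, with N the number of reference words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / n


if __name__ == "__main__":
    ref = "see you at the station at six"
    hyp = "see you at station at sick"
    print(word_errors(ref, hyp))    # 2: "the" deleted, "six" -> "sick"
    print(word_accuracy(ref, hyp))  # 5/7, about 0.714
```

Because insertions are counted, this accuracy can fall below zero for very noisy output; the alternative "percent correct" measure, (N - S - D) / N, ignores insertions.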


A Speaker Pruning Algorithm for Real-Time Speaker Identification

Kinnunen, Karpov, Fränti, 2003


Speaker identification is a computationally expensive task. In this work, we propose an iterative speaker pruning algorithm for speeding up identification in real-time systems. The proposed algorithm reduces the computational load by dropping unlikely speakers as more data arrives in the processing buffer. The process is repeated until only one speaker is left in the candidate set. Care must be taken in designing the pruning heuristics so that the correct speaker is not pruned. Two variants of the pruning algorithm are presented, and simulations with the TIMIT corpus show that an error rate of 10% can be achieved in 10 seconds for 630 speakers.
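The iterative pruning loop can be sketched as follows, using a VQ distortion score. The pruning rule here (after each block of frames, keep the best-scoring fraction of the surviving candidates) is an assumed, hypothetical heuristic for illustration only; the abstract mentions two variants but does not describe them. `prune_identify`, `block_size`, and `prune_fraction` are illustrative names.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nn_distortion(x, codebook):
    """Distortion of one frame against its nearest codevector."""
    return min(sq_dist(x, c) for c in codebook)


def prune_identify(frames, codebooks, block_size=10, prune_fraction=0.5):
    """Identify a speaker while pruning unlikely candidates block by block.

    Returns (speaker, frames_used). Processing stops as soon as a single
    candidate remains, so fewer frames and fewer codebooks are scored.
    """
    if not frames:
        raise ValueError("need at least one frame")
    candidates = list(codebooks)
    totals = {spk: 0.0 for spk in candidates}
    used = 0
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        # Only surviving candidates are scored on the newly arrived block.
        for spk in candidates:
            totals[spk] += sum(nn_distortion(x, codebooks[spk]) for x in block)
        used += len(block)
        # Rank by accumulated distortion (lower = more likely), then drop the
        # worst prune_fraction, always keeping at least one candidate.
        candidates.sort(key=lambda spk: totals[spk])
        keep = max(1, math.ceil(len(candidates) * (1 - prune_fraction)))
        candidates = candidates[:keep]
        if len(candidates) == 1:
            break
    return candidates[0], used


if __name__ == "__main__":
    codebooks = {
        "a": [[0.0, 0.0]], "b": [[10.0, 0.0]],
        "c": [[0.0, 10.0]], "d": [[10.0, 10.0]],
    }
    # 100 frames close to speaker d's codevector
    frames = [[10.0 + 0.1 * (i % 5), 10.0 - 0.1 * (i % 3)] for i in range(100)]
    print(prune_identify(frames, codebooks))  # ('d', 20)
```

With four speakers and the defaults, the correct speaker is returned after 20 of the 100 frames. A more aggressive `prune_fraction` saves more computation but raises the risk, noted in the abstract, of pruning the correct speaker too early.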


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA


Evgeny Karpov

Real-time speaker identification and verification
