Xavier Menéndez-Pidal scite author profile

The Nemours database is a collection of 814 short nonsense sentences: 74 sentences spoken by each of 11 male speakers with varying degrees of dysarthria. Additionally, the database contains two connected-speech paragraphs produced by each of the 11 speakers. The database was designed to test the intelligibility of dysarthric speech before and after enhancement by various signal processing methods, and is available on CD-ROM. It can also be used to investigate general characteristics of dysarthric speech, such as production error patterns. The entire database has been marked at the word level, and sentences for 10 of the 11 talkers have been marked at the phoneme level as well. This paper describes the database structure and the techniques adopted to improve the performance of a Discrete Hidden Markov Model (DHMM) labeler used to assign initial phoneme labels to the elements of the database. These techniques may be useful in the design of automatic recognition systems for persons with speech disorders, especially when limited amounts of training data are available.


Compensation of channel and noise distortions combining normalization and speech enhancement techniques

Menéndez-Pidal, Chen, Wu et al. 2001

Speech Communication


Development and improvement of a real-time ASR system for isolated digits in Spanish over the telephone line

Córdoba, Menéndez-Pidal, Guarasa et al. 1995


Automatic set-up for speech recognition engines based on merit optimization

Hernandez-Abrego, Menéndez-Pidal, Kemp et al.


Development of the compact English LVCSR acoustic model for embedded entertainment robot applications

Menéndez-Pidal, Patrikar, Olorenshaw et al. 2007

Int J Speech Technol


In this paper we discuss two techniques to reduce the size of the acoustic model while maintaining or improving the accuracy of the recognition engine. The first technique, demiphone modeling, tries to reduce the redundancy existing in a context-dependent state-clustered Hidden Markov Model (HMM). Three-state demiphones optimally designed from the triphone decision tree are introduced to drastically reduce the phone space of the acoustic model and to improve system accuracy. The second redundancy elimination technique is a more classical approach based on parameter tying. Similar vectors of variances in each HMM cluster are tied together to reduce the number of parameters. The closeness between the vectors of variances is measured using a Vector Quantizer (VQ) to maintain the information provided by the variance parameters. The paper also reports speech recognition improvements using assignment of a variable number of Gaussians per cluster and gender-based HMMs. The main motivation behind these techniques is to improve the acoustic model and at the same time lower its memory usage. These techniques may help in reducing memory and improving accuracy of an embedded Large Vocabulary Continuous Speech Recognition (LVCSR) application.
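The variance-tying idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a plain k-means vector quantizer with Euclidean distance, and the function name `tie_variances`, the codebook size, and the toy data are all hypothetical. Each Gaussian keeps only an index into a small shared codebook of variance vectors, which is where the memory saving would come from.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))


def tie_variances(variances: Sequence[Sequence[float]], codebook_size: int,
                  iterations: int = 20) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: quantize variance vectors with k-means.

    Returns (codebook, assignment), where assignment[i] is the index of the
    shared codebook entry that replaces variances[i].
    """
    if not 1 <= codebook_size <= len(variances):
        raise ValueError("codebook_size must be between 1 and len(variances)")

    # Deterministic initialisation (an assumption): evenly spaced input vectors.
    step = len(variances) / codebook_size
    codebook = [list(variances[int(i * step)]) for i in range(codebook_size)]

    for _ in range(iterations):
        # Assignment step: map every variance vector to its nearest entry.
        assignment = [nearest(v, codebook) for v in variances]
        # Update step: each entry becomes the mean of the vectors tied to it.
        for k in range(codebook_size):
            members = [v for v, a in zip(variances, assignment) if a == k]
            if members:
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]

    # Final assignment against the last codebook.
    assignment = [nearest(v, codebook) for v in variances]
    return codebook, assignment


if __name__ == "__main__":
    # Toy data: four 2-dimensional variance vectors forming two obvious groups.
    toy = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]]
    codebook, assignment = tie_variances(toy, codebook_size=2)
    print(assignment)
    print(codebook)
```

In this sketch, four stored variance vectors are reduced to two shared ones plus four small indices. A real system would likely choose the distance measure and the codebook size by checking recognition accuracy, which the sketch does not attempt.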

