Thomas Hueber scite author profile

Silent speech interfaces

The possibility of speech processing in the absence of an intelligible acoustic signal has given rise to the idea of a 'silent speech' interface, to be used as an aid for the speech-handicapped, or as part of a communications system operating in silence-required or high-background-noise environments. The article first outlines the emergence of the silent speech interface from the fields of speech production, automatic speech processing, speech pathology research, and telecommunications privacy issues, and then presents demonstrator systems based on seven different types of technologies. A concluding section outlines some of the common challenges faced by silent speech interface researchers and suggests possible future directions.


Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips

Hueber, Benaroya, Chollet, et al., 2010. Speech Communication

157

114


This article presents a segmental vocoder driven by ultrasound images of the tongue and optical images (from a standard CCD camera) of the lips for a "silent speech interface" application, usable either by a laryngectomized patient or for silent communication. The system is built around an audiovisual dictionary that associates visual with acoustic observations for each phonetic class. Visual features are extracted from ultrasound images of the tongue and from video images of the lips using a PCA-based image coding technique. Visual observations of each phonetic class are modeled by continuous HMMs. The system combines a phone recognition stage with a corpus-based synthesis stage. In the recognition stage, the visual HMMs are used to identify phonetic targets in a sequence of visual features. In the synthesis stage, these phonetic targets constrain the dictionary search for the sequence of diphones that maximizes similarity to the input test data in the visual space, subject to a concatenation cost in the acoustic domain. A prosody template is extracted from the training corpus, and the final speech waveform is generated using "Harmonic plus Noise Model" (HNM) concatenative synthesis techniques. Experimental results are based on an audiovisual database containing one hour of continuous speech from each of two speakers.
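The dictionary search described in this abstract can be read as a unit-selection problem solved by dynamic programming (a Viterbi search). The Python sketch below is hypothetical and is not the authors' implementation: the `Unit` record, the `select_units` function, the use of squared Euclidean distance for both costs, and the single `concat_weight` trade-off parameter are all assumptions made for illustration. Each slot's candidate list stands in for the diphones allowed by the recognized phonetic targets; the target cost measures distance to the input in the visual space, and the concatenation cost measures acoustic mismatch at each join.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Unit:
    """Hypothetical dictionary entry: one diphone with paired features."""
    name: str
    visual: tuple[float, ...]          # assumed visual features (e.g. PCA coefficients)
    acoustic_start: tuple[float, ...]  # assumed acoustic features at the left boundary
    acoustic_end: tuple[float, ...]    # assumed acoustic features at the right boundary


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed cost; the paper may differ)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_units(
    targets: Sequence[Sequence[float]],
    candidates: Sequence[Sequence[Unit]],
    concat_weight: float = 1.0,
) -> list[Unit]:
    """Viterbi search for the unit sequence minimizing visual target cost
    plus weighted acoustic concatenation cost (illustrative sketch only)."""
    if not targets or len(targets) != len(candidates) or any(not c for c in candidates):
        raise ValueError("need one non-empty candidate list per target")

    # cost[j]: best total cost of any path ending in candidate j of the current slot
    cost = [sq_dist(u.visual, targets[0]) for u in candidates[0]]
    back: list[list[int]] = []  # back[i-1][k]: best predecessor index for slot i, candidate k

    for i in range(1, len(targets)):
        new_cost, pointers = [], []
        for u in candidates[i]:
            def path_cost(j: int) -> float:
                # cost of reaching u from candidate j of the previous slot
                prev = candidates[i - 1][j]
                return cost[j] + concat_weight * sq_dist(prev.acoustic_end, u.acoustic_start)

            best_j = min(range(len(candidates[i - 1])), key=path_cost)
            new_cost.append(path_cost(best_j) + sq_dist(u.visual, targets[i]))
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest candidate in the final slot.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [candidates[-1][j]]
    for i in range(len(back) - 1, -1, -1):
        j = back[i][j]
        path.append(candidates[i][j])
    path.reverse()
    return path
```

For example, with one-dimensional features, targets `[(0.0,), (1.0,)]`, first-slot candidates A (visual 0.0, acoustic end 0.0) and B (visual 0.1, acoustic end 5.0), and second-slot candidates C (visual 1.0, acoustic start 5.0) and D (visual 1.2, acoustic start 0.0), the search returns B, C with `concat_weight=1.0` but A, C with `concat_weight=0.0`: the acoustic join cost pulls the first slot away from its visually closest candidate.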


Biosignal-Based Spoken Communication: A Survey

Schultz, Wand, Hueber, et al., 2017. IEEE/ACM Trans. Audio Speech Lang. Process.

155

109

