Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that preserves conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, successive speech units are selected based on the measured ECoG activity, generating audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate natural-sounding speech that included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.
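To make the unit-selection idea concrete, here is a minimal Python sketch, assuming a codebook that pairs ECoG feature frames from training with the audio units recorded at the same time. All names are hypothetical, and the published approach likely also scores unit-to-unit transitions and smooths boundaries, which this sketch omits.

```python
import numpy as np

def brain_to_speech(ecog_frames, codebook_ecog, codebook_audio):
    """Toy unit selection: for each ECoG feature frame, pick the training
    unit whose stored neural features are closest (target cost only) and
    concatenate the corresponding audio snippets into one waveform."""
    selected = []
    for frame in ecog_frames:
        # Euclidean target cost against every unit in the codebook
        costs = np.linalg.norm(codebook_ecog - frame, axis=1)
        selected.append(codebook_audio[int(np.argmin(costs))])
    # Naive concatenation; a real system would cross-fade unit boundaries
    return np.concatenate(selected)
```

Because the output is stitched together from the speaker's own recorded units, the synthesized audio naturally carries that speaker's voice characteristics, which is why prosody and accentuation can be preserved.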
Summary

For several years, alternative speech communication techniques have been investigated that rely solely on signals from the articulatory muscles rather than on the acoustic speech signal. Because these approaches also work with completely silently articulated speech, they offer several advantages: the signal is not corrupted by background noise, bystanders are not disturbed, and people who have lost their voice, e.g. through an accident or a disease of the larynx, can be assisted.

The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals to directly synthesize audible speech output: EMG-to-speech. The electrical potentials of the articulatory muscles are recorded by small electrodes on the surface of the face and neck. Analyzing these signals allows inferences about the movements of the articulatory apparatus and, in turn, about the spoken speech itself.

One approach to creating an acoustic signal from the EMG signal is to use techniques from automatic speech recognition: a textual output is produced, which is then processed further by a text-to-speech synthesis component. However, this approach is hampered by the challenges of the speech recognition step, such as the restriction to a given vocabulary and recognition errors. This thesis investigates the possibility of converting the recorded EMG signal directly into a speech signal, without being bound to a limited vocabulary or to other limitations of a speech recognition component. Several conversion approaches are pursued, and real-time-capable systems are implemented, evaluated and compared.

To train a statistical transformation model, the EMG signals and the acoustic speech are captured simultaneously, and relevant characteristics are extracted as features. The acoustic speech data is required only as a reference for training, so the trained transformation can then be applied using the EMG data alone. The feature mapping is accomplished by a model that estimates the relationship between muscle activity patterns and speech sound components, from which the final audible voice signal is synthesized. This approach is based on a source-filter model of speech: the fundamental frequency (the source) is combined with the spectral information (mel cepstrum), which reflects the vocal tract (the filter), to generate the final speech signal.

For natural voice output, using the fundamental frequency for prosody generation is of great importance. To bridge the gap between normal speech (with fundamental frequency) and silent speech (no acoustic signal at all), whispered speech recordings are investigated as an intermediate step: whispered speech contains no fundamental frequency, so generating prosody for it is possible but difficult.

This thesis examines and evaluates the following three mapping methods for feature conversion:

1. Gaussian Mapping: A statistical method that trai...
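The first mapping method named above, Gaussian mapping, is conventionally implemented as a minimum mean-square-error conversion under a joint Gaussian mixture model over stacked EMG and speech feature vectors. The Python sketch below shows that standard per-frame conversion rule under this assumption; the parameter names are hypothetical, and the subsequent source-filter synthesis (combining the estimated mel cepstrum with a fundamental frequency contour in a vocoder) is omitted.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_mapping_frame(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """MMSE conversion E[y|x] under a joint GMM over stacked (EMG, speech)
    feature vectors. Hypothetical shapes: x (Dx,), weights (M,),
    mu_x (M, Dx), mu_y (M, Dy), cov_xx (M, Dx, Dx), cov_yx (M, Dy, Dx)."""
    # Posterior probability of each mixture component given the EMG frame
    resp = np.array([w * multivariate_normal.pdf(x, mean=m, cov=c)
                     for w, m, c in zip(weights, mu_x, cov_xx)])
    resp /= resp.sum()
    # Responsibility-weighted sum of the per-component conditional means
    y = np.zeros(mu_y.shape[1])
    for r, my, mx, cyx, cxx in zip(resp, mu_y, mu_x, cov_yx, cov_xx):
        y += r * (my + cyx @ np.linalg.solve(cxx, x - mx))
    return y  # estimated spectral (mel-cepstral) frame for this EMG frame
```

Applied frame by frame to the EMG feature stream, this yields a spectral trajectory that a source-filter vocoder can turn into audible speech, which is what makes the approach real-time capable.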
Speech neuroprosthetics aim to provide a natural communication channel to individuals who are unable to speak due to physical or neurological impairments. Real-time synthesis of acoustic speech directly from measured neural activity could enable natural conversations and significantly improve quality of life, particularly for individuals with severely limited means of communication. Recent advances in decoding approaches have led to high-quality reconstructions of acoustic speech from invasively measured neural activity. However, most prior research uses data collected in open-loop experiments on articulated speech, which may not translate directly to imagined speech and which neglects the critical human-in-the-loop aspect of a practical speech neuroprosthesis. Here, we present an approach that synthesizes audible speech in real-time for both imagined and whispered speech conditions. Using a participant implanted with stereotactic depth electrodes, we were able to reliably generate audible speech in real-time. The decoding models rely predominantly on frontal activity, suggesting that speech processes have similar representations when vocalized, whispered, or imagined. While the reconstructed audio is not yet intelligible, our real-time synthesis approach represents an essential step towards investigating how patients will learn to operate a closed-loop speech neuroprosthesis, as well as towards developing techniques that incorporate co-adaptation of user and system for optimized performance.
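As an illustration of what such a real-time, closed-loop pipeline can look like, the sketch below streams neural feature frames, decodes a short sliding window of them into acoustic features, vocodes those into audio, and plays the result back immediately. The window length and the acquire/decode/vocode/play components are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def realtime_synthesis_loop(acquire_frame, decode, vocode, play, window=10):
    """Skeleton of a closed-loop pipeline: stream neural feature frames,
    map a short sliding window of them to acoustic features, synthesize a
    waveform chunk, and play it back immediately as feedback to the user."""
    history = []
    while True:
        frame = acquire_frame()        # e.g., per-electrode high-gamma power
        if frame is None:              # stream ended
            break
        history.append(frame)
        history = history[-window:]    # keep only recent neural context
        acoustic = decode(np.stack(history))  # neural -> acoustic features
        chunk = vocode(acoustic)              # acoustic features -> audio
        play(chunk)                           # low-latency audio feedback
```

The immediate playback is the point of the closed loop: because the user hears the synthesized output as they imagine or whisper speech, user and decoder can in principle adapt to each other over time.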