David A. Moses scite author profile

A decade after the first successful attempt to decode speech directly from human brain signals, accuracy and speed remain far below that of natural speech or typing. Here we show how to achieve high accuracy from the electrocorticogram at natural-speech rates, even with few data (on the order of half an hour of spoken speech). Taking a cue from recent advances in machine translation and automatic speech recognition, we train a recurrent neural network to map neural signals directly to word sequences (sentences). In particular, the network first encodes a sentence-length sequence of neural activity into an abstract representation, and then decodes this representation, word by word, into an English sentence. For each participant, training data consist of several spoken repeats of a set of some 30-50 sentences, along with the corresponding neural signals at each of about 250 electrodes distributed over peri-Sylvian speech cortices. Average word error rates across a validation (held-out) sentence set are as low as 7% for some participants, as compared to the previous state of the art of greater than 60%. Finally, we show how to use transfer learning to overcome limitations on data availability: Training certain components of the network under multiple participants' data, while keeping other components (e.g., the first hidden layer) "proprietary," can improve decoding performance-despite very different electrode coverage across participants.

show abstract

Real-time decoding of question-and-answer speech dialogue using human cortical activity

Moses

et al. 2019

View full text Add to dashboard Cite

Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance’s identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate.

show abstract

Machine translation of cortical activity to text with an encoder-decoder framework

Makin

Moses

Chang

2019

Preprint

103

View full text Add to dashboard Cite

AbstractA decade after the first successful attempt to decode speech directly from human brain signals, accuracy and speed remain far below that of natural speech or typing. Here we show how to achieve high accuracy from the electrocorticogram at natural-speech rates, even with few data (on the order of half an hour of spoken speech). Taking a cue from recent advances in machine translation and automatic speech recognition, we train a recurrent neural network to map neural signals directly to word sequences (sentences). In particular, the network first encodes a sentence-length sequence of neural activity into an abstract representation, and then decodes this representation, word by word, into an English sentence. For each participant, training data consist of several spoken repeats of a set of some 30-50 sentences, along with the corresponding neural signals at each of about 250 electrodes distributed over peri-Sylvian speech cortices. Average word error rates across a validation (held-out) sentence set are as low as 7% for some participants, as compared to the previous state of the art of greater than 60%. Finally, we show how to use transfer learning to overcome limitations on data availability: Training certain components of the network under multiple participants’ data, while keeping other components (e.g., the first hidden layer) “proprietary,” can improve decoding performance—despite very different electrode coverage across participants.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

David A. Moses

Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria

Machine translation of cortical activity to text with an encoder–decoder framework

Real-time decoding of question-and-answer speech dialogue using human cortical activity

Machine translation of cortical activity to text with an encoder-decoder framework

Contact Info

Product

Resources

About