Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

doi:10.1088/1741-2560/13/5/056004

Cited by 80 publications

(82 citation statements)

References 59 publications


“…Our results demonstrate intelligible speech synthesis from ECoG during both audible and silently mimed speech production. Previous strategies for neural decoding of speech have primarily focused on direct classification of speech segments like phonemes or words 30,31,32,33. However, these demonstrations have been limited in their ability to scale to larger vocabulary sizes and communication rates.…”

Section: Discussion (mentioning)

confidence: 99%

Intelligible speech synthesis from neural decoding of spoken sentences

Anumanchipalli

Chartier

Chang

2018

Preprint


The ability to read out, or decode, mental content from brain activity has significant practical and scientific implications 1. For example, technology that translates cortical activity into speech would be transformative for people unable to communicate as a result of neurological impairment 2,3,4. Decoding speech from neural activity is challenging because speaking requires extremely precise and dynamic control of multiple vocal tract articulators on the order of milliseconds. Here, we designed a neural decoder that explicitly leverages the continuous kinematic and sound representations encoded in cortical activity 5,6 to generate fluent and intelligible speech. A recurrent neural network first decoded vocal tract physiological signals from direct cortical recordings, and then transformed them to acoustic speech output. Robust decoding performance was achieved with as little as 25 minutes of training data. Naïve listeners were able to accurately identify these decoded sentences. Additionally, speech decoding was not only effective for audibly produced speech, but also when participants silently mimed speech. These results advance the development of speech neuroprosthetic technology to restore spoken communication in patients with disabling neurological disorders.

Text

Neurological conditions that result in the loss of communication are devastating. Many patients rely on alternative communication devices that measure residual nonverbal movements of the head or eyes 7, or even direct brain activity 8,9, to control a cursor to select letters one-by-one to spell out words. While these systems dramatically enhance a patient's quality of life, most users struggle to transmit more than 10 words/minute 10, a rate far slower than the average of 150 words/min in natural speech. A major hurdle is how to overcome the constraints of current spelling-based approaches to enable far higher communication rates.

A promising alternative to spelling-based approaches is to directly synthesize speech 11,12. Spelling is a sequential concatenation of discrete letters, whereas speech is produced from a fluid stream of overlapping, multi-articulator vocal tract movements 13. For this reason, a biomimetic approach that focuses on vocal tract movements and the sounds they produce may be the only means to achieve the high communication rates of …


“…Multiple studies have demonstrated the relevance of this frequency band for examining neural mechanisms of auditory cortical processing (e.g., Crone et al, 2001, 2006; Brugge et al, 2009; Edwards et al, 2009; Mesgarani and Chang, 2012; Steinschneider et al, 2014; Nourski and Howard, 2015). High gamma activity has been directly related to acoustic-phonemic transformations at the level of the STG, which would be a key process required for tasks used in the present study (Mesgarani et al, 2014; Moses et al, 2016). Further, functional neuroimaging studies have demonstrated a positive correlation between high gamma activity and hemodynamic responses (Nir et al, 2007; Whittingstall and Logothetis, 2009).…”

Section: Introduction (mentioning)

confidence: 85%

Intracranial Electrophysiology of Auditory Selective Attention Associated with Speech Classification Tasks

Nourski

Steinschneider

Rhone

et al. 2017

Front. Hum. Neurosci.


Auditory selective attention paradigms are powerful tools for elucidating the various stages of speech processing. This study examined electrocorticographic activation during target detection tasks within and beyond auditory cortex. Subjects were nine neurosurgical patients undergoing chronic invasive monitoring for treatment of medically refractory epilepsy. Four subjects had left hemisphere electrode coverage, four had right coverage and one had bilateral coverage. Stimuli were 300 ms complex tones or monosyllabic words, each spoken by a different male or female talker. Subjects were instructed to press a button whenever they heard a target corresponding to a specific stimulus category (e.g., tones, animals, numbers). High gamma (70–150 Hz) activity was simultaneously recorded from Heschl’s gyrus (HG), superior, middle temporal and supramarginal gyri (STG, MTG, SMG), as well as prefrontal cortex (PFC). Data analysis focused on: (1) task effects (non-target words in tone detection vs. semantic categorization task); and (2) target effects (words as target vs. non-target during semantic classification). Responses within posteromedial HG (auditory core cortex) were minimally modulated by task and target. Non-core auditory cortex (anterolateral HG and lateral STG) exhibited sensitivity to task, with a smaller proportion of sites showing target effects. Auditory-related areas (MTG and SMG) and PFC showed both target and, to a lesser extent, task effects, that occurred later than those in the auditory cortex. Significant task and target effects were more prominent in the left hemisphere than in the right. Findings demonstrate a hierarchical organization of speech processing during auditory selective attention.


“…We recorded the local field potential from each electrode, notch-filtered the signal at 60 Hz and harmonics (120 Hz and 180 Hz) to reduce line-noise related artifacts, and re-referenced to the common average across channels sharing the same connector to the preamplifier (Cheung et al, 2016). We then used the log-analytic amplitude of the Hilbert transform to bandpass signals in the high gamma range (70-150 Hz), using 8 logarithmically-spaced center frequency bands and taking the first principal component across these bands to extract stimulus-related neural activity (Edwards et al, 2009; Moses et al, 2016; Ray and Maunsell, 2011). High gamma signals were then downsampled to 100 Hz for further analysis.…”

Section: Neural Recordings (mentioning)

confidence: 99%

Parallel streams define the temporal dynamics of speech processing across human auditory cortex

Hamilton

Edwards

Chang

2016

Preprint


To derive meaning from speech, we must extract multiple dimensions of concurrent information from incoming speech signals, including phonetic and prosodic cues. Equally important is the detection of acoustic cues that give structure and context to the information we hear, such as sentence boundaries. How the brain organizes this information processing is unknown. Here, using data-driven computational methods on an extensive set of high-density intracranial recordings, we reveal a large-scale partitioning of the entire human speech cortex into two spatially distinct regions that detect important cues for parsing natural speech. These caudal (Zone 1) and rostral (Zone 2) regions work in parallel to detect onsets and prosodic information, respectively, within naturally spoken sentences. In contrast, local processing within each region supports phonetic feature encoding. These findings demonstrate a fundamental organizational property of the human auditory cortex that has been previously unrecognized.
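The high gamma extraction quoted in the Neural Recordings excerpt of this entry (notch filtering, common average referencing, log analytic amplitude in 8 logarithmically spaced bands, first principal component, downsampling to 100 Hz) can be illustrated with a rough NumPy sketch. This is not the authors' code. Several details are assumptions made only to keep the example self-contained: the notch is a crude FFT-bin zeroing, the band filters are Gaussian with a width of 10% of the center frequency, all channels are treated as one referencing group (the excerpt references per preamplifier connector), downsampling is done by block averaging, and the 400 Hz sampling rate and synthetic data are invented for the demonstration. All function names are hypothetical.

```python
import numpy as np


def notch_fft(x, fs, freqs=(60.0, 120.0, 180.0), half_width=1.0):
    """Crude notch (assumption): zero FFT bins within half_width Hz of each line frequency."""
    n = x.shape[-1]
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(x, axis=-1)
    for f0 in freqs:
        spec[..., np.abs(f - f0) <= half_width] = 0.0
    return np.fft.irfft(spec, n=n, axis=-1)


def common_average_reference(x):
    """Subtract the across-channel mean (axis 0) at every sample; one group assumed."""
    return x - x.mean(axis=0, keepdims=True)


def log_band_amplitude(x, fs, center, sigma):
    """Log analytic amplitude in a Gaussian band around `center` Hz."""
    f = np.fft.fftfreq(x.shape[-1], d=1.0 / fs)
    # Keeping only positive frequencies and doubling them yields the analytic
    # signal of the band-passed data, i.e. the Hilbert-transform construction.
    weights = np.where(f > 0, 2.0 * np.exp(-0.5 * ((f - center) / sigma) ** 2), 0.0)
    analytic = np.fft.ifft(np.fft.fft(x, axis=-1) * weights, axis=-1)
    return np.log(np.abs(analytic) + 1e-12)


def high_gamma(x, fs, lo=70.0, hi=150.0, n_bands=8, rel_width=0.1):
    """First principal component across log-spaced high gamma bands, per channel."""
    centers = np.geomspace(lo, hi, n_bands)
    bands = np.stack([log_band_amplitude(x, fs, c, rel_width * c) for c in centers])
    out = np.empty(x.shape, dtype=float)
    for ch in range(x.shape[0]):
        centered = bands[:, ch, :] - bands[:, ch, :].mean(axis=1, keepdims=True)
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        pc1 = s[0] * vt[0]  # time course of the first component
        # The sign of a principal component is arbitrary; align it with the band mean.
        if pc1 @ centered.mean(axis=0) < 0:
            pc1 = -pc1
        out[ch] = pc1
    return out


def block_downsample(x, fs, target_fs=100.0):
    """Downsample by averaging consecutive blocks (assumes fs is a multiple of target_fs)."""
    factor = int(round(fs / target_fs))
    n = (x.shape[-1] // factor) * factor
    return x[..., :n].reshape(*x.shape[:-1], -1, factor).mean(axis=-1)


def preprocess(raw, fs, target_fs=100.0):
    """raw: array of shape (channels, samples). Returns high gamma at target_fs."""
    x = common_average_reference(notch_fft(raw, fs))
    return block_downsample(high_gamma(x, fs), fs, target_fs)


# Synthetic demo: 4 channels, 4 s at 400 Hz, shared 60 Hz line noise,
# and a 100 Hz burst on channel 0 during the last 2 s.
rng = np.random.default_rng(0)
fs = 400.0
t = np.arange(1600) / fs
raw = rng.standard_normal((4, t.size)) + 3.0 * np.sin(2 * np.pi * 60.0 * t)
raw[0, t >= 2.0] += 10.0 * np.sin(2 * np.pi * 100.0 * t[t >= 2.0])
features = preprocess(raw, fs)
print(features.shape)  # (4, 400)
```

On real recordings a dedicated filter design (for example an IIR notch and a proper anti-aliasing resampler) would likely be preferable to the FFT shortcuts used here, and the referencing groups would follow the actual hardware layout.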
