The ability to read out, or decode, mental content from brain activity has significant practical and scientific implications 1. For example, technology that translates cortical activity into speech would be transformative for people unable to communicate as a result of neurological impairment 2,3,4. Decoding speech from neural activity is challenging because speaking requires extremely precise and dynamic control of multiple vocal tract articulators on the order of milliseconds. Here, we designed a neural decoder that explicitly leverages the continuous kinematic and sound representations encoded in cortical activity 5,6 to generate fluent and intelligible speech. A recurrent neural network first decoded vocal tract physiological signals from direct cortical recordings, and then transformed them to acoustic speech output. Robust decoding performance was achieved with as little as 25 minutes of training data. Naïve listeners were able to accurately identify these decoded sentences. Additionally, speech decoding was effective not only for audibly produced speech, but also when participants silently mimed speech. These results advance the development of speech neuroprosthetic technology to restore spoken communication in patients with disabling neurological disorders.
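The two-stage structure described above (cortical recordings to vocal tract signals, then vocal tract signals to acoustics) can be illustrated with a minimal sketch. This is not the authors' implementation: the simple Elman-style recurrence, the layer sizes, the random untrained weights, and all names (`make_rnn`, `run_rnn`, `decode`, `N_ELECTRODES`, and so on) are assumptions chosen only to show how the output of the first stage feeds the second.

```python
import math
import random


def make_rnn(n_in, n_hidden, n_out, seed):
    """Create small random weights for a toy Elman-style recurrent layer.

    The weights are untrained; a real decoder would fit them to data.
    """
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(n_hidden, n_in),       # input -> hidden
        "W_rec": mat(n_hidden, n_hidden),  # hidden -> hidden (recurrence)
        "W_out": mat(n_out, n_hidden),     # hidden -> output
    }


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def run_rnn(params, frames):
    """Map a sequence of input frames to a sequence of output frames."""
    h = [0.0] * len(params["W_rec"])  # hidden state starts at zero
    outputs = []
    for x in frames:
        a = matvec(params["W_in"], x)
        b = matvec(params["W_rec"], h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        outputs.append(matvec(params["W_out"], h))
    return outputs


def decode(neural_frames, stage1, stage2):
    """Stage 1: neural -> vocal tract signals. Stage 2: vocal tract -> acoustics."""
    kinematics = run_rnn(stage1, neural_frames)
    acoustics = run_rnn(stage2, kinematics)
    return kinematics, acoustics


# Hypothetical dimensions, chosen only for illustration.
N_ELECTRODES, N_KINEMATIC, N_ACOUSTIC, N_HIDDEN = 8, 4, 3, 6

stage1 = make_rnn(N_ELECTRODES, N_HIDDEN, N_KINEMATIC, seed=0)
stage2 = make_rnn(N_KINEMATIC, N_HIDDEN, N_ACOUSTIC, seed=1)

# Five time steps of synthetic "neural" input (random numbers, not real data).
rng = random.Random(42)
neural = [[rng.gauss(0, 1) for _ in range(N_ELECTRODES)] for _ in range(5)]

kin, ac = decode(neural, stage1, stage2)
```

Calling `decode` returns one vocal tract frame and one acoustic frame per input time step. Keeping the intermediate representation explicit is what distinguishes this layout from a single network that maps neural activity straight to sound.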
Text

Neurological conditions that result in the loss of communication are devastating. Many patients rely on alternative communication devices that measure residual nonverbal movements of the head or eyes 7, or even direct brain activity 8,9, to control a cursor to select letters one-by-one to spell out words. While these systems dramatically enhance a patient's quality of life, most users struggle to transmit more than 10 words/minute 10, a rate far slower than the average of 150 words/min in natural speech. A major hurdle is how to overcome the constraints of current spelling-based approaches to enable far higher communication rates.

A promising alternative to spelling-based approaches is to directly synthesize speech 11,12. Spelling is a sequential concatenation of discrete letters, whereas speech is produced from a fluid stream of overlapping, multi-articulator vocal tract movements 13. For this reason, a biomimetic approach that focuses on vocal tract movements and the sounds they produce may be the only means to achieve the high communication rates of