Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. However, we have no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured that it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single-neuron level. We discover that long-distance number information is largely managed by two "number units". Importantly, the behaviour of these units is partially controlled by other units independently shown to track syntactic structure. We conclude that LSTMs are, to some extent, implementing genuinely syntactic processing mechanisms, paving the way to a more general understanding of grammatical encoding in LSTMs.
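To make the single-unit analysis concrete, here is a minimal, hypothetical sketch of how one might probe individual LSTM hidden units for long-distance number information. It is not the study's actual pipeline: the toy vocabulary, the untrained PyTorch LSTM, and the sentence prefixes are placeholders; with a trained language model, the same contrast would surface candidate "number units".

```python
# Illustrative sketch (not the authors' code): compare per-unit LSTM hidden
# activations on prefixes that differ only in subject number, and look for
# units that carry the number distinction across intervening material.
import torch
import torch.nn as nn

vocab = {"the": 0, "boy": 1, "boys": 2, "near": 3, "car": 4, "cars": 5}
embed = nn.Embedding(len(vocab), 16)                      # toy, untrained embeddings
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

def hidden_after(words):
    ids = torch.tensor([[vocab[w] for w in words]])       # shape: (1, seq_len)
    out, _ = lstm(embed(ids))                             # out: (1, seq_len, 32)
    return out[0, -1]                                     # hidden state after the last word

# Prefixes differing only in the number of the head noun ("boy" vs "boys"),
# with an intervening attractor noun before the verb would appear.
singular = hidden_after(["the", "boy", "near", "the", "cars"])
plural   = hidden_after(["the", "boys", "near", "the", "cars"])

# Units whose activations differ most across the two conditions are candidate
# carriers of long-distance number information.
diff = (singular - plural).abs()
print("candidate number units:", torch.topk(diff, k=2).indices.tolist())
```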
Reading is a rapid, distributed process that engages multiple components of the ventral visual stream. However, the neural constituents and their interactions that allow us to identify written words are not well understood. Using direct intracranial recordings in a large cohort of humans, we comprehensively isolated the spatiotemporal dynamics of visual word recognition across the entire left ventral occipitotemporal cortex. The mid-fusiform cortex is the first region that is sensitive to word identity and to both sub-lexical and lexical frequencies. Its activation, response latency, and amplitude are highly dependent on the statistics of natural language. Information about lexicality and word frequency propagates posteriorly from this region to traditional visual word form regions and to earlier visual cortex. This unique sensitivity of mid-fusiform cortex to the lexical characteristics of written words points to its central role as an orthographic lexicon, which accesses the long-term memory representations of visual word forms.
Woodhead et al., 2014) to enable rapid orthographic-lexical-semantic transformations. While most of our knowledge of the cortical architecture of reading arises from functional MRI, the rapid speed of reading demands that we use methods with very high spatiotemporal resolution to study these processes. To this end, we used recordings in 35 individuals with 784 intracranial electrodes to comprehensively characterize the spatial organization and functional roles of orthographic and lexical regions across the ventral visual pathway during sub-lexical and lexical processes. Given their construction, these two tasks, performed in the same cohort, tap into varying levels of attentional modulation of orthographic processing. Specifically, we isolated functionally distinct regions across the vOTC that are highly sensitive to the structure and statistics of natural language at multiple stages of orthographic processing.
Sentence comprehension requires inferring, from a sequence of words, the structure of syntactic relationships that bind these words into a semantic representation. Our limited ability to build certain syntactic structures, such as nested center-embedded clauses (e.g., "The dog that the cat that the mouse bit chased ran away"), suggests a striking capacity limitation of sentence processing, and thus offers a window into how the human brain processes sentences. Here, we review the main hypotheses proposed in psycholinguistics to explain this capacity limitation. We then introduce an alternative approach, derived from our recent work on artificial neural networks optimized for language modeling, and predict that the capacity limitation derives from the emergence of sparse and feature-specific syntactic units. Unlike psycholinguistic theories, our neural-network-based framework provides precise capacity-limit predictions without making any a priori assumptions about the form of the grammar or parser. Finally, we discuss how our framework may clarify the mechanistic underpinnings of language processing and its limitations in the human brain.
A sentence is more than the sum of its words: its meaning depends on how they combine with one another. The brain mechanisms underlying such semantic composition remain poorly understood. To shed light on the neural vector code underlying semantic composition, we introduce two hypotheses: first, the intrinsic dimensionality of the space of neural representations should increase as a sentence unfolds, paralleling the growing complexity of its semantic representation; and second, this progressive integration should be reflected in ramping and sentence-final signals. To test these predictions, we designed a dataset of closely matched normal and Jabberwocky sentences (composed of meaningless pseudowords) and presented them to deep language models and to 11 human participants (5 men and 6 women) monitored with simultaneous magnetoencephalography and intracranial electroencephalography. In both deep language models and electrophysiological data, we found that representational dimensionality was higher for meaningful sentences than for Jabberwocky. Furthermore, multivariate decoding of normal versus Jabberwocky confirmed three dynamic patterns: (i) a phasic pattern following each word, peaking in temporal and parietal areas; (ii) a ramping pattern, characteristic of bilateral inferior and middle frontal gyri; and (iii) a sentence-final pattern in left superior frontal gyrus and right orbitofrontal cortex. These results provide a first glimpse into the neural geometry of semantic integration and constrain the search for a neural code of linguistic composition.
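As a rough illustration of the dimensionality hypothesis (not the authors' analysis pipeline), the sketch below compares two sets of sentence representations using the PCA participation ratio, one common proxy for intrinsic dimensionality; the random arrays merely stand in for model activations or neural recordings from normal and Jabberwocky sentences.

```python
# Illustrative sketch: estimate intrinsic dimensionality of two representation
# sets with the PCA participation ratio and compare them. Placeholder arrays
# stand in for normal vs Jabberwocky sentence representations.
import numpy as np

def participation_ratio(X):
    """Dimensionality proxy: (sum of PCA eigenvalues)^2 / sum of squared eigenvalues."""
    X = X - X.mean(axis=0)                                  # center the representations
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))       # eigenvalues of the covariance
    eig = np.clip(eig, 0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
normal_reprs      = rng.normal(size=(200, 50))              # placeholder: sentences x features
jabberwocky_reprs = rng.normal(size=(200, 50)) @ np.diag(np.linspace(1.0, 0.1, 50))

print("normal participation ratio:     ", participation_ratio(normal_reprs))
print("jabberwocky participation ratio:", participation_ratio(jabberwocky_reprs))
```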