People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the "N400 component" of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity "tracks" the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion.
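To make the dissimilarity measure concrete, here is a minimal sketch of one plausible way to compute it, assuming each word has already been mapped to a pretrained embedding (e.g., word2vec vectors). The function name and the choice of a running average over preceding words as the "context" are illustrative assumptions, not a claim about the authors' exact implementation.

```python
import numpy as np

def semantic_dissimilarity(word_vectors):
    """For each word, compute 1 minus the cosine similarity between its
    embedding and the average embedding of the preceding words (its context).
    word_vectors: (n_words, dim) array of pretrained word embeddings."""
    scores = np.zeros(len(word_vectors))
    for i in range(1, len(word_vectors)):
        context = word_vectors[:i].mean(axis=0)  # running context average
        w = word_vectors[i]
        cos = (w @ context) / (np.linalg.norm(w) * np.linalg.norm(context) + 1e-12)
        scores[i] = 1.0 - cos  # high score = semantically dissimilar to context
    return scores
```

Each word's score would then be placed at that word's onset time and regressed against the EEG across a range of time lags, which is what yields the 200-600 ms centro-parietal negativity described above.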
Speech perception involves the integration of sensory input with expectations based on the context of that speech. Much debate surrounds whether prior knowledge feeds back to affect early auditory encoding at the lower levels of the speech processing hierarchy, or whether perception is best explained as a purely feedforward process. Although there has been compelling evidence on both sides of this debate, experiments addressing these questions with naturalistic speech stimuli have been lacking. Here, we use a recently introduced method for quantifying the semantic context of speech and relate it to a commonly used method for indexing low-level auditory encoding of speech. The relationship between these measures is taken as an indication of how the semantic context leading up to a word influences how its low-level acoustic and phonetic features are processed. We record EEG from human participants (both male and female) listening to continuous natural speech and find that the early cortical tracking of a word's speech envelope is enhanced by its semantic similarity to its sentential context. Using a forward modeling approach, we find that prediction accuracy of the EEG signal shows the same effect. Furthermore, this effect shows distinct temporal patterns of correlation depending on the type of speech input representation (acoustic or phonological) used for the model, implicating top-down propagation of information through the processing hierarchy. These results suggest a mechanism that links top-down prior information with the early cortical entrainment of words in natural, continuous speech.
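As a rough illustration of the forward modeling approach, the sketch below fits a ridge-regularized temporal response function (TRF) that predicts EEG from time-lagged copies of a stimulus feature such as the speech envelope. All array shapes, lag ranges, and the regularization value are assumptions made for the example, not the paper's actual parameters.

```python
import numpy as np

def lagged_design(stim, lags):
    """Stack time-shifted copies of a 1-D stimulus feature into a design
    matrix of shape (n_samples, n_lags): column j holds stim shifted by lags[j]."""
    X = np.zeros((len(stim), len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stim[:len(stim) - lag]
        else:
            X[:lag, j] = stim[-lag:]
    return X

def fit_trf(stim, eeg, lags, lam=1e2):
    """Ridge-regularized forward model: predict EEG (n_samples, n_channels)
    from the lagged stimulus. Returns TRF weights (n_lags, n_channels)."""
    X = lagged_design(stim, lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
```

Prediction accuracy would then be the Pearson correlation between predicted and actual EEG on held-out data, which is the quantity reported as showing the semantic-similarity effect.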
We introduce an approach that predicts neural representations of word meanings contained in sentences and then superposes these to predict neural representations of new sentences. A neurobiological semantic model based on sensory, motor, social, emotional, and cognitive attributes was used as a foundation to define semantic content. Previous studies have predominantly predicted neural patterns for isolated words, using models that lack neurobiological interpretation. Fourteen participants read 240 sentences describing everyday situations while undergoing fMRI. To connect sentence-level fMRI activation patterns to the word-level semantic model, we devised methods to decompose the fMRI data into individual words. Activation patterns associated with each attribute in the model were then estimated using multiple regression. This enabled the synthesis of activation patterns for trained and new words, which were subsequently averaged to predict new sentences. Region-of-interest analyses revealed that prediction accuracy was highest using voxels in the left temporal and inferior parietal cortex, although a broad range of regions returned statistically significant results, showing that semantic information is widely distributed across the brain. The results show how a neurobiologically motivated semantic model can decompose sentence-level fMRI data into activation features for component words, which can be recombined to predict activation patterns for new sentences.
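A minimal sketch of the decompose-and-recombine logic described above, under assumed data shapes (per-word attribute ratings and word-level activation estimates); it shows only the regression and averaging steps, not the authors' full pipeline.

```python
import numpy as np

def fit_attribute_maps(A, Y):
    """Least-squares estimate of one voxel pattern per semantic attribute.
    A: (n_words, n_attributes) attribute ratings.
    Y: (n_words, n_voxels) word-level activation estimates."""
    B, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return B  # (n_attributes, n_voxels)

def predict_sentence(word_attrs, B):
    """Synthesize a pattern for each word in a sentence from its attribute
    ratings, then average the word patterns to predict the sentence pattern."""
    word_patterns = word_attrs @ B  # (n_words_in_sentence, n_voxels)
    return word_patterns.mean(axis=0)
```

Because the attribute-to-voxel mapping is linear, patterns for entirely new words (and hence new sentences) can be synthesized from their attribute ratings alone.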
Healthy ageing leads to changes in the brain that impact upon sensory and cognitive processing. It is not fully clear how these changes affect the processing of everyday spoken language. Prediction is thought to play an important role in language comprehension, where information about upcoming words is pre-activated across multiple representational levels. However, evidence from electrophysiology suggests differences in how older and younger adults use context-based predictions, particularly at the level of semantic representation. We investigate these differences during natural speech comprehension by presenting older and younger subjects with continuous, narrative speech while recording their electroencephalogram. We use time-lagged linear regression to test how distinct computational measures of (1) semantic dissimilarity and (2) lexical surprisal are processed in the brains of both groups. Our results reveal dissociable neural correlates of these two measures that suggest differences in how younger and older adults successfully comprehend speech. Specifically, our results suggest that, while younger and older subjects both employ context-based lexical predictions, older subjects are significantly less likely to pre-activate the semantic features relating to upcoming words. Furthermore, across our group of older adults, we show that the weaker the neural signature of this semantic pre-activation mechanism, the lower a subject’s semantic verbal fluency score. We interpret these findings as prediction playing a generally reduced role at a semantic level in the brains of older listeners during speech comprehension and that these changes may be part of an overall strategy to successfully comprehend speech with reduced cognitive resources.
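For illustration, word-level measures like semantic dissimilarity and lexical surprisal are typically entered into time-lagged regression as sparse impulse regressors, with each word's value placed at its onset sample. The sketch below shows one way to build such a predictor; all names and shapes are hypothetical.

```python
import numpy as np

def word_impulse_feature(onsets_s, values, n_samples, fs):
    """Sparse regressor: zeros everywhere except at word onsets, where the
    word's value (e.g., dissimilarity or surprisal) is placed.
    onsets_s: word onset times in seconds; fs: EEG sampling rate in Hz."""
    x = np.zeros(n_samples)
    idx = np.round(np.asarray(onsets_s) * fs).astype(int)
    x[idx] = values
    return x

# Two such predictors would enter the same time-lagged regression:
# x_dis = word_impulse_feature(onsets, dissimilarity, n, fs)
# x_sur = word_impulse_feature(onsets, surprisal, n, fs)
# X = np.column_stack([x_dis, x_sur])  # then lag-expand and regress on EEG
```

Fitting both predictors jointly is what allows the two measures' neural correlates to be dissociated within the same model.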
Embodiment theory predicts that mental imagery of object words recruits neural circuits involved in object perception. The degree of visual imagery present in routine thought and how it is encoded in the brain is largely unknown. We test whether fMRI activity patterns elicited by participants reading objects' names include embodied visual-object representations, and whether we can decode the representations using novel computational image-based semantic models. We first apply the image models in conjunction with text-based semantic models to test predictions of visual-specificity of semantic representations in different brain regions. Representational similarity analysis confirms that fMRI structure within ventral-temporal and lateral-occipital regions correlates most strongly with the image models and, conversely, text models correlate better with posterior-parietal/lateral-temporal/inferior-frontal regions. We use an unsupervised decoding algorithm that exploits commonalities in representational similarity structure found within both image model and brain data sets to classify embodied visual representations with high accuracy (8/10), and then extend it to exploit model combinations to robustly decode different brain regions in parallel. By capturing latent visual-semantic structure, our models provide a route into analyzing neural representations derived from past perceptual experience rather than stimulus-driven brain activity. Our results also verify the benefit of combining multimodal data to model human-
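As a pointer to how the representational similarity analysis above works, this sketch compares a brain representational dissimilarity matrix (RDM) with a model RDM via Spearman correlation. It is a generic RSA recipe under assumed inputs (one pattern vector per item), not the paper's specific configuration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Representational dissimilarity matrix (condensed upper triangle):
    correlation distance between all pairs of item patterns.
    patterns: (n_items, n_features) array."""
    return pdist(patterns, metric="correlation")

def rsa_score(brain_patterns, model_patterns):
    """Spearman rank correlation between brain and model RDMs: higher values
    mean the model's similarity structure better matches the brain's."""
    rho, _ = spearmanr(rdm(brain_patterns), rdm(model_patterns))
    return rho
```

Computing this score separately per region of interest is what supports conclusions like "image models fit ventral-temporal cortex best while text models fit parietal/frontal regions better."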
Understanding natural speech requires that the human brain convert complex spectrotemporal patterns of acoustic input into meaning in a rapid manner that is reasonably tightly time-locked to the incoming speech signal. However, neural evidence for such a time-locked process has been lacking. Here, we sought such evidence by using a computational model to quantify the meaning carried by each word based on how semantically dissimilar it was to its preceding context and then regressing this quantity against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG electrodes. Subsequent EEG experiments involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was exquisitely sensitive to whether or not subjects were understanding the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain encodes meaning as a function of the amount of new information carried by each word in a relatively time-locked fashion.

In everyday life, people routinely process heard speech at rates in the range of 120 to 200 words per minute [1, 2]. Unlike in the case of reading, listeners typically have little control over the rate at which words are presented and usually cannot replay them. Thus, successful speech comprehension must involve efficient, online mechanisms in the brain whereby each word is processed in a relatively time-locked fashion. In addition, it is well established that the processing of words does not happen in isolation but is strongly influenced by the surrounding conversational context, an influence that psycholinguistic models have sought to capture. While these models, and the experiments on which they are based, have greatly deepened our understanding of psycholinguistics, there has been a marked lack of electrophysiological evidence for the time-locked processing of meaning that must underpin natural speech comprehension. This is unfortunate, as an electrophysiological index of such processing would be of great benefit for arbitrating between different psycholinguistic models and could have important implications for research on language processing in numerous cohorts. Valuable insights into the semantic processing of speech have been provided by the well-known N400 component of the event-related potential [10]. However, the N400 literature has been dominated by paradigms focused on single, usually incongruous, words within specially constructed sentences, and has had much less to say about how ongoing neural activity reflects the computations that underpin natural, narrative speech comprehension. Furthermore, the use of the classic N400 paradigm has made it difficult to fully understand how selective attention and variations in intelligibility affect the semantic processing of speech under naturalistic conditions.
Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality. In this paper we propose a new approach to MCMK convolution that is based on GEMM, but not on im2col. Our algorithm eliminates the need for data replication on the input, thereby enabling us to apply the convolution kernels to the input images directly. We have implemented several variants of our algorithm on a general-purpose CPU and an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.
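For readers unfamiliar with the baseline being improved upon, the sketch below shows the standard im2col + GEMM approach (not the authors' new algorithm): each receptive field of the input is copied into one column of a matrix, replicating input data roughly kh*kw times, after which the whole convolution is a single matrix multiplication.

```python
import numpy as np

def im2col(img, kh, kw):
    """Expand a (C, H, W) image into a column matrix so convolution becomes
    a single GEMM. Each column holds one kh-by-kw receptive field across all
    channels, so input data is replicated ~kh*kw times (the memory cost)."""
    C, H, W = img.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, oh * ow))
    for y in range(oh):
        for x in range(ow):
            cols[:, y * ow + x] = img[:, y:y + kh, x:x + kw].ravel()
    return cols

def conv_gemm(img, kernels):
    """MCMK convolution via im2col. kernels: (M, C, kh, kw)."""
    M, C, kh, kw = kernels.shape
    cols = im2col(img, kh, kw)
    out = kernels.reshape(M, -1) @ cols  # the single GEMM call
    oh, ow = img.shape[1] - kh + 1, img.shape[2] - kw + 1
    return out.reshape(M, oh, ow)
```

The paper's contribution is to obtain the same GEMM-friendly structure without materializing the replicated `cols` matrix, avoiding both the extra memory footprint and the loss of data locality.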