Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model

Monesi, Mohammad Jalilpour; Accou, Bernd; Francart, Tom; Van hamme, Hugo

doi:10.48550/arxiv.2106.09622

Cited by 4 publications

(9 citation statements)

References 0 publications


“…This paradigm can also be solved in a non-linear fashion with neural networks (e.g. Accou et al, 2021; Monesi et al, 2021; Bollens et al, 2022). Accou et al (2021) showed that the accuracy of a neural network solving a match-mismatch task could be used to estimate the speech reception threshold.…”

Section: Methods To Measure Neural Tracking (mentioning)

confidence: 99%

Neural tracking as a diagnostic tool to assess the auditory pathway

Canneyt, Gillis, Vanthornhout, et al. 2021

Preprint

Self Cite


The neural tracking framework enables the analysis of neural responses (EEG) to continuous natural speech, e.g., a story or a podcast. This allows for objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for both research and clinical applications. In this article, we review the neural tracking framework and highlight three prominent examples of neural tracking analyses. This includes the neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the hierarchical stages of speech processing in the human brain. f0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e. from the auditory periphery up to early processing in the primary auditory cortex. This fundamental processing in (mostly) subcortical stages forms the foundation of speech perception in the cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex, and is likely necessary but not sufficient for speech intelligibility. To study neural processes more directly related to speech intelligibility, neural tracking of linguistic features can be used. This analysis focuses on the encoding of linguistic features (e.g. word or phoneme surprisal) in the brain. Together these analyses form a multi-faceted and time-effective objective assessment of the auditory and linguistic processing of an individual.


“…Most studies we report here used acoustic features, such as the temporal envelope (e.g., Ciccarelli et al 2019, de Taillez et al 2020, Lu et al 2021, Su et al 2021, Xu et al 2022a, 2022b), or the Mel spectrogram (e.g. Krishna et al 2020, Kuruvila et al 2021, Monesi et al 2021). A study even used the fundamental frequency of the voice, f0 (Puffay et al 2022).…”

Section: Speech (mentioning)

confidence: 99%

“…de Cheveigné et al 2018, Monesi et al 2020) or more (e.g. Monesi et al 2021, Accou et al 2021a, Puffay et al 2022) speech segment candidates to associate the EEG segment with.…”

Section: Introduction (mentioning)

confidence: 99%

Relating EEG to continuous speech using deep neural networks: a review

et al. 2023

Self Cite


Objective. When a person listens to continuous speech, a corresponding response is elicited in the brain and can be recorded using electroencephalography (EEG). Linear models are presently used to relate the EEG recording to the corresponding speech signal. The ability of linear models to find a mapping between these two signals is used as a measure of neural tracking of speech. Such models are limited as they assume linearity in the EEG-speech relationship, which omits the nonlinear dynamics of the brain. As an alternative, deep learning models have recently been used to relate EEG to continuous speech. Approach. This paper reviews and comments on deep-learning-based studies that relate EEG to continuous speech in single- or multiple-speakers paradigms. We point out recurrent methodological pitfalls and the need for a standard benchmark of model analysis. Main results. We gathered 29 studies. The main methodological issues we found are biased cross-validations, data leakage leading to over-fitted models, or disproportionate data size compared to the model's complexity. In addition, we address requirements for a standard benchmark model analysis, such as public datasets, common evaluation metrics, and good practices for the match-mismatch task. Significance. We present a review paper summarizing the main deep-learning-based studies that relate EEG to speech while addressing methodological pitfalls and important considerations for this newly expanding field. Our study is particularly relevant given the growing application of deep learning in EEG-speech decoding.
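The match-mismatch task named in the abstract above can be illustrated with a minimal sketch. It assumes each EEG segment has already been reduced to a feature vector and uses Pearson correlation as a stand-in for the similarity a trained model would learn. The function names and the toy data are hypothetical and do not come from any of the reviewed studies.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (norm_x * norm_y)


def pick_match(eeg_feature, candidates):
    """Return the index of the candidate speech segment most similar to the EEG feature."""
    scores = [pearson(eeg_feature, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def match_mismatch_accuracy(trials):
    """trials: list of (eeg_feature, candidates, index_of_true_match) tuples."""
    correct = sum(
        1 for eeg, cands, true_idx in trials if pick_match(eeg, cands) == true_idx
    )
    return correct / len(trials)


# Toy data (hypothetical): the matched segment is a shifted copy of the EEG feature.
eeg = [0.1, 0.5, 0.9, 0.4, 0.2]
matched = [0.2, 0.6, 1.0, 0.5, 0.3]
mismatched = [0.9, 0.1, 0.3, 0.8, 0.7]
trials = [(eeg, [matched, mismatched], 0), (eeg, [mismatched, matched], 1)]
accuracy = match_mismatch_accuracy(trials)
```

The same scoring loop extends to more than two candidates, since `pick_match` simply takes the highest-scoring index.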


“…Considering that the power spectrum of the alpha frequency band, which was adopted in [23], may not represent the disorder degree of EEG resulting from the dynamic change of the attended auditory stimuli, the EEG feature extraction method in [23] was modified in two aspects. On the one hand, for the KUL dataset, each EEG segment was decomposed into five classical frequency bands, delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz) and gamma (31-50 Hz) [45,46]. It should be noted that for both DTU and PKU datasets, the frequency band of 31-50 Hz was not included because the sampling rates of EEG in these two datasets were set as 70 Hz and 64 Hz, respectively, to keep consistent with those of the baselines.…”

Section: Multi-band DE Extraction (mentioning)

confidence: 99%
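One plausible reading of the band exclusion in the statement above is the Nyquist limit: a signal sampled at fs can only represent frequencies up to fs/2, so a 31-50 Hz band does not fit under 35 Hz or 32 Hz. The sketch below checks which of the quoted bands fit under that limit. The Nyquist interpretation and the helper name `representable_bands` are assumptions, and the 128 Hz rate in the usage lines is hypothetical, since the quote does not give the KUL sampling rate.

```python
# Band edges in Hz, as listed in the quoted statement.
BANDS_HZ = {
    "delta": (1, 3),
    "theta": (4, 7),
    "alpha": (8, 13),
    "beta": (14, 30),
    "gamma": (31, 50),
}


def representable_bands(fs_hz, bands=BANDS_HZ):
    """Return the names of bands whose upper edge lies at or below fs/2."""
    nyquist_hz = fs_hz / 2.0
    return [name for name, (low, high) in bands.items() if high <= nyquist_hz]


bands_at_70 = representable_bands(70)    # DTU rate from the quote
bands_at_64 = representable_bands(64)    # PKU rate from the quote
bands_at_128 = representable_bands(128)  # hypothetical rate, for illustration only
```

At 70 Hz and 64 Hz the check keeps delta through beta and drops gamma, which agrees with the exclusion described in the quote.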

“…In [12,34], the common spatial pattern (CSP)-based EEG spatial enhancement strategy was combined with the CNN-based AAD model to improve its performance. In [7,35,36], the long short-term memory (LSTM) was combined with the CNN-based AAD model to improve its performance by adopting LSTM to learn the long-term dependence between EEG responses and auditory stimuli.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting the locus of auditory attention based on the spectro-spatial-temporal analysis of EEG

Jiang, Chen, Jin 2022

J. Neural Eng.


Objective. Auditory Attention Decoding (AAD) determines which speaker the listener is focusing on by analyzing his/her EEG. CNN was adopted to extract Spectro-Spatial-Feature (SSF) from short time intervals of EEG to detect auditory spatial attention without stimuli. However, the following factors are not considered in the SSF-CNN scheme. i) Single-band frequency analysis cannot represent the EEG pattern precisely. ii) Power cannot represent the EEG feature related to the dynamic patterns of the attended auditory stimulus. iii) The temporal feature of EEG representing the relationship between EEG and the attended stimulus is not extracted. To solve these problems, the SSF-CNN scheme was modified. Approach. i) Multiple frequency bands, rather than a single alpha frequency band, of EEG were analyzed to represent the EEG pattern more precisely. ii) Differential Entropy (DE), rather than power, was extracted from each frequency band to represent the disorder degree of EEG, which was related to the dynamic patterns of the attended auditory stimulus. iii) CNN and Convolutional-Long-Short-Term-Memory (ConvLSTM) were combined to extract spectro-spatial-temporal features from the 3-D descriptor sequence constructed based on the topographical activity maps of multiple frequency bands. Main results. Experimental results on KUL, DTU, and PKU with 0.1 s, 1 s, 2 s, and 5 s decision windows demonstrated that: i) The proposed model outperformed SSF-CNN and state-of-the-art AAD models. Specifically, when the auditory stimulus was unavailable, AAD accuracy could be enhanced by at least 3.25%, 3.96%, and 5.08% on KUL, DTU, and PKU, respectively, compared with the baselines. On KUL, the longer decision window corresponded to lower enhancement, while on both DTU and PKU, the longer decision window corresponded to higher enhancement, except for two cases when the decision window length was 2 s on PKU or 5 s on DTU. ii) Each modification contributed to the performance enhancement. Significance. The DE feature, multi-band frequency analysis, and ConvLSTM-based temporal analysis help to enhance AAD accuracy.
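The abstract above does not say how Differential Entropy is estimated. A common simplification treats the band-limited samples as Gaussian, in which case DE has the closed form 0.5 * ln(2 * pi * e * variance). The sketch below applies that form to one segment. The Gaussian assumption and the function name are illustrative choices, not details taken from the paper.

```python
import math


def gaussian_differential_entropy(samples):
    """Differential entropy (in nats) of a segment under a Gaussian assumption.

    Uses DE = 0.5 * ln(2 * pi * e * variance), with the population variance
    of the segment as the variance estimate.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate variance")
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    if variance <= 0:
        raise ValueError("a constant segment has no finite Gaussian DE")
    return 0.5 * math.log(2 * math.pi * math.e * variance)


# Toy segments (hypothetical): doubling the amplitude quadruples the variance.
de_small = gaussian_differential_entropy([-1.0, 1.0])
de_large = gaussian_differential_entropy([-2.0, 2.0])
```

Under this form, DE grows with the logarithm of the band power, so doubling the amplitude of a segment raises its DE by ln(2) nats.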
