Speech Fine Structure Contains Critical Temporal Cues to Support Speech Segmentation

Teng, Xiangbin; Cogan, Gregory B.; Poeppel, David

doi:10.1101/508358

Cited by 7 publications

(9 citation statements)

References 74 publications


“…Our previous work (Teng, Tian, Doelling, & Poeppel, 2018b) showed that the onset responses were modulated by different frequency modulation spectra even though the same ramping windows were added to the stimuli of different 1/f modulation spectra. This interesting difference between the current experiment and Teng et al, 2018 suggests that not only the shape of amplitude envelopes but also spectral details of sounds significantly modulate auditory evoked responses (Oganian & Chang, 2018; Teng, Cogan, & Poeppel, 2018a).…”

Section: Onset/offset Responses and Induced Power Do Not Significantl… (contrasting)

confidence: 88%

Modulation Spectra Capture EEG Responses to Speech Signals and Drive Distinct Temporal Response Functions

Teng

Meng

Poeppel

2020

eNeuro

Self Cite


Speech signals have a unique long-term modulation spectrum shape that is distinct from environmental noise, music, and non-speech vocalizations. Does the human auditory system adapt to the speech long-term modulation spectrum and efficiently extract critical information from speech signals? To answer this question, we tested whether neural responses to speech signals can be captured by specific modulation spectra of non-speech acoustic stimuli. We generated amplitude-modulated (AM) noise with the speech modulation spectrum and 1/f modulation spectra of different exponents to imitate the temporal dynamics of different natural sounds. We presented these AM stimuli and a 10-minute piece of natural speech to 19 human participants undergoing electroencephalography (EEG) recording. We derived temporal response functions (TRFs) to the AM stimuli of different spectrum shapes and found distinct neural dynamics for each type of TRF. We then used the TRFs of AM stimuli to predict neural responses to the speech signals, and found that 1) the TRFs of AM modulation spectra of exponents 1, 1.5 and 2 preferentially captured EEG responses to speech signals in the delta band and 2) the theta band of neural responses to speech can be captured by the AM stimuli of an exponent of 0.75. Our results suggest that the human auditory system shows specificity to the long-term modulation spectrum and is equipped with characteristic neural algorithms tailored to extract critical acoustic information from speech signals.

Significance Statement: Speech signals have a unique long-term modulation spectrum shape that distinguishes speech from other natural sounds. Does the human auditory system adapt to the speech long-term modulation spectrum and efficiently extract critical information from speech signals? To answer this question, we generated artificial sounds with various modulation spectra and examined whether neural encoding models derived from specific modulation spectra can explain neural responses to speech signals better than others. We found that the modulation spectra with exponents close to the speech modulation spectrum captured EEG responses to speech signals better than others. Our results suggest that the human auditory system shows high sensitivity to the long-term modulation spectrum specific to speech signals.
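The stimulus construction described in this abstract (AM noise whose modulation spectrum follows 1/f with a chosen exponent) can be illustrated with a minimal sketch. This is not the authors' stimulus code: the function names, the sum-of-sinusoids synthesis, the 0.5 to 32 Hz modulation range, the 200 Hz sampling rate, and the min-max normalization are all assumptions chosen for a short, dependency-free illustration. A real auditory stimulus would need an audio-rate carrier.

```python
import math
import random


def one_over_f_envelope(alpha, duration_s=2.0, fs=200, f_lo=0.5, f_hi=32.0, seed=0):
    """Return an envelope in [0, 1] whose modulation power falls off as 1/f**alpha.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n = int(duration_s * fs)
    df = 1.0 / duration_s  # frequency resolution of a segment of this length
    freqs = [k * df for k in range(1, int(f_hi / df) + 1) if k * df >= f_lo]
    # Power ~ 1/f**alpha implies amplitude ~ f**(-alpha/2); phases are random.
    comps = [(f, f ** (-alpha / 2.0), rng.uniform(0.0, 2.0 * math.pi)) for f in freqs]
    env = [
        sum(a * math.cos(2.0 * math.pi * f * (i / fs) + p) for f, a, p in comps)
        for i in range(n)
    ]
    lo, hi = min(env), max(env)
    # Rescale so the envelope is non-negative and can multiply a carrier.
    return [(x - lo) / (hi - lo) for x in env]


def am_noise(alpha, seed=0, **kwargs):
    """Multiply a Gaussian white-noise carrier by a 1/f**alpha envelope."""
    env = one_over_f_envelope(alpha, seed=seed, **kwargs)
    carrier_rng = random.Random(seed + 1)
    return [e * carrier_rng.gauss(0.0, 1.0) for e in env]


if __name__ == "__main__":
    for exponent in (0.75, 1.0, 1.5, 2.0):  # exponents named in the abstract
        stim = am_noise(exponent)
        print(exponent, len(stim))
```

Larger exponents concentrate modulation energy at lower frequencies, so the envelope fluctuates more slowly; an exponent of 0 would give a flat modulation spectrum over the chosen range.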


“…The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the acoustic envelope of the signal (Ahissar et al., 2001; Ding et al., 2014; Giraud and Poeppel, 2012; Gross et al., 2013; Kayser et al., 2015; Oganian and Chang, 2019; Teng et al., 2019; Teoh et al., 2019). To quantify this alignment, many studies rely on the overall or broadband acoustic envelope, which describes the amplitude fluctuation of the signal across the full spectral range and which provides a convenient and low-dimensional representation for data analysis.…”

Section: Discussion (mentioning)

confidence: 99%

Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes

Bröhl

Kayser

2021

NeuroImage


Highlights: Delta/theta band EEG tracks band-limited speech-derived envelopes similar to real speech. Low and high frequency speech-derived envelopes are represented differentially. High-frequency derived envelopes are more susceptible to attentional and contextual manipulations. Delta band tracking shifts towards low frequency derived envelopes with more acoustic detail.


“…A higher reconstruction accuracy between EEG responses and the speech envelopes was found in the delta and theta bands than that in the higher frequency bands, which was consistent with the literature (e.g., Ding and Simon, 2013; Di Liberto et al, 2015). However, speech features in the time and spectral domain could all affect the speech perception and corresponding cortical responses (e.g., Biesmans et al, 2016; Teng et al, 2019). Future studies could systematically analyze how cortical responses track the speech features at different auditory-inspired narrow bands to better simulate the processing in the auditory peripheral and central systems.…”

Section: Discussion (mentioning)

confidence: 99%

Robust EEG-Based Decoding of Auditory Attention With High-RMS-Level Speech Segments in Noisy Conditions

Wang

Chen

2020

Front. Hum. Neurosci.


The attended speech stream can be detected robustly, even in adverse auditory scenarios with auditory attentional modulation, and can be decoded using electroencephalographic (EEG) data. Speech segmentation based on the relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions. High-RMS-level segments contain crucial information for speech perception. Hence, this study aimed to investigate the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded when subjects listened to the attended speech stream in the mixed speech narrated concurrently by two Mandarin speakers. The temporal response function was used to identify the attended speech from EEG responses of tracking to the temporal envelopes of intact speech and high-RMS-level speech segments alone, respectively. Auditory decoding performance was then analyzed under various SNR conditions by comparing EEG correlations to the attended and ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope with high-RMS-level speech segments was not inferior to that based on the temporal envelope of intact speech. Cortical activity correlated more strongly with attended than with ignored speech under different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for the identification and tracking of attended speech in the presence of background noise. This study also showed that with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures under a wide range of SNR.
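The relative-RMS segmentation mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the abstract gives neither a frame length nor a threshold, so the non-overlapping framing, the `frame_len` parameter, and the 0 dB default threshold (frame RMS at or above the whole-signal RMS) are hypothetical choices, as are the function names.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def high_rms_mask(signal, frame_len, threshold_db=0.0):
    """Flag frames whose RMS is at least threshold_db relative to the overall RMS.

    frame_len and threshold_db are assumed values for illustration only.
    Frames are non-overlapping; a trailing partial frame is ignored.
    """
    overall = rms(signal)
    mask = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame_rms = rms(signal[start:start + frame_len])
        if frame_rms > 0:
            # Frame level in dB relative to the whole-signal RMS.
            level_db = 20.0 * math.log10(frame_rms / overall)
        else:
            level_db = float("-inf")  # a silent frame is never high-level
        mask.append(level_db >= threshold_db)
    return mask


if __name__ == "__main__":
    # Toy signal: quiet, loud, quiet.
    toy = [0.1] * 4 + [1.0] * 4 + [0.1] * 4
    print(high_rms_mask(toy, frame_len=4))
```

The resulting mask could be used to keep only the envelope samples that fall in high-level frames before fitting a decoding model on them.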

