Emotional problems are common among contemporary college students. To improve their mental health, it is urgent to identify college students' negative emotions quickly and to guide them toward healthier emotional development. Students' emotions are expressed through multiple modalities, such as audio, facial expressions, and gestures, and exploiting the complementarity of multi-modal emotional information can improve the accuracy of emotion recognition. This paper proposes a deep-learning-based multi-modal emotion recognition method for voice and video images: (1) for voice-modality recognition, the voice signal is first preprocessed to extract acoustic emotional features, and an attention-based long short-term memory (LSTM) network is then adopted for emotion recognition; (2) for video-image-modality recognition, an extended local binary pattern (LBP) operator is used to compute image features, LBP block weighting is combined with multi-scale partitioning for feature extraction, principal component analysis (PCA) reduces the dimensionality of the feature vectors, and a VGG-16 network trained with a transfer learning strategy performs emotion recognition; (3) the voice and video-image emotion predictions produced by the single-modal recognizers are weighted and fused at the decision layer to classify multi-modal emotions. Experimental results on the test set of the CHEAVD 2.0 Chinese emotion database show that the recognition accuracy of the proposed multi-modal fusion algorithm is higher than that of the single-modal recognition methods.
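The LBP operator underlying the video-image pipeline can be illustrated with a minimal sketch. The basic 3x3 variant below thresholds the eight neighbors of a pixel against the center and packs the comparison bits into one byte; the paper's extended, block-weighted, multi-scale operator builds on this idea. The function name and neighbor ordering are illustrative, not the paper's exact formulation.

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 LBP: threshold the 8 neighbors against the center pixel
    and pack the comparison bits into one byte, clockwise from top-left.
    A minimal illustration; the paper's extended operator adds block
    weighting and multi-scale partitioning on top of this."""
    center = patch[1, 1]
    # Neighbor coordinates, clockwise starting at the top-left corner.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        if patch[r, c] >= center:      # neighbor >= center -> bit 1
            code |= 1 << (7 - bit)     # most significant bit first
    return code

# Example: bright top row, dark bottom row around a center value of 3.
patch = np.array([[5, 5, 5],
                  [1, 3, 1],
                  [0, 0, 0]])
code = lbp_code(patch)  # top three neighbors set -> 0b11100000 = 224
```

In practice the codes of all pixels in a block are accumulated into a histogram, and the per-block histograms are concatenated (with weights) into the feature vector that PCA then reduces.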
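The decision-layer fusion step can likewise be sketched. Assuming each single-modal recognizer outputs a per-class probability distribution, a weighted sum of the two distributions selects the fused emotion label; the class set and the weight values below are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

# Illustrative emotion classes; the CHEAVD 2.0 label set may differ.
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def fuse_decisions(p_audio, p_video, w_audio=0.4, w_video=0.6):
    """Decision-level fusion: weighted sum of the per-class probability
    vectors from the audio and video recognizers, then argmax.
    Weights are hypothetical and would be tuned on a validation set."""
    p_audio = np.asarray(p_audio, dtype=float)
    p_video = np.asarray(p_video, dtype=float)
    fused = w_audio * p_audio + w_video * p_video
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: audio favors "sad", video favors "neutral";
# with the video modality weighted higher, "neutral" wins.
label, fused = fuse_decisions([0.1, 0.6, 0.1, 0.2],
                              [0.1, 0.2, 0.1, 0.6])
```

Because both inputs are probability distributions and the weights sum to one, the fused vector is itself a valid distribution, which keeps the decision rule well defined.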