2021
DOI: 10.1109/lsp.2021.3078698

Interactive Multimodal Attention Network for Emotion Recognition in Conversation

Cited by 17 publications (9 citation statements)
References 20 publications
“…To verify the effectiveness and validity of the proposed hierarchical interactive IMC system, a series of experiments is carried out comparing the proposed system with state-of-the-art methods [34]–[47] for video sentiment analysis. Notably, the references are selected according to four criteria: content relevance, total citation count, journal academic impact, and timeliness.…”
Section: Application in Video Sentiment Analysis
confidence: 99%
“…Tsai et al. [35] used transformers to extract and fuse the features of the three modalities, which effectively addressed the problems of unaligned modal data and long-range dependencies between different modalities. Also, recently, pretrained networks using transfer-learning techniques have achieved good performance for feature extraction [36], especially in the field of emotion recognition [37]–[40], and have advanced significantly. As a pretrained model can learn global features from data, its parameters generalize better.…”
Section: Related Work
confidence: 99%
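
The cross-modal mechanism referenced above (Tsai et al.'s Multimodal Transformer) can be illustrated with a minimal sketch. This is an assumption-level illustration in PyTorch, not the cited papers' code: queries come from a target modality and keys/values from a source modality, so the two sequences need not be aligned or of equal length.

# Minimal sketch (an assumption, not the cited papers' implementation) of
# cross-modal attention: the target modality queries the source modality,
# so unaligned sequences of different lengths can still be fused.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (batch, T_tgt, d_model), e.g. text features
        # source: (batch, T_src, d_model), e.g. audio; T_src may differ from T_tgt
        fused, _ = self.attn(query=target, key=source, value=source)
        return self.norm(target + fused)  # residual connection, then LayerNorm

# Toy usage: 10 text steps attend over 37 unaligned audio steps.
text = torch.randn(2, 10, 64)
audio = torch.randn(2, 37, 64)
print(CrossModalAttention()(text, audio).shape)  # torch.Size([2, 10, 64])

Stacking such blocks in both directions (text to audio, audio to text, and so on for all modality pairs) and concatenating the outputs yields, roughly, the pairwise fusion scheme the quote attributes to [35].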
“…Feature Fusion. First, SCCA is used to fuse the features of the two modalities, facial expression and speech [20]–[22]. The SCCA algorithm can be expressed as follows:…”
Section: Classroom Psychological Assessment
confidence: 99%
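
The equation itself is truncated in the quote. As a stand-in, and only as an assumption about the formulation used there, the standard sparse CCA objective (Witten et al., 2009) maximizes the correlation between the projected views under l1 constraints:

\max_{\mathbf{u},\,\mathbf{v}} \; \mathbf{u}^{\top} X^{\top} Y \mathbf{v}
\quad \text{s.t.} \quad
\|\mathbf{u}\|_2^2 \le 1,\;\; \|\mathbf{v}\|_2^2 \le 1,\;\;
\|\mathbf{u}\|_1 \le c_1,\;\; \|\mathbf{v}\|_1 \le c_2,

where the columns of X (facial-expression features) and Y (speech features) are centered, the l1 bounds c_1 and c_2 induce sparse projection vectors, and the projections X\mathbf{u} and Y\mathbf{v} are combined to form the fused representation.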
“…First, the proposed method extracts the features of facial expression and speech, respectively. Then, the sparse canonical correlation analysis (SCCA) [20]–[22] algorithm is used to fuse the two kinds of features into a unified feature. Finally, sparse representation-based classification (SRC) is used for bimodal emotion recognition.…”
Section: Introduction
confidence: 99%
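
The extract, fuse, classify pipeline in this quote can be sketched end to end. Below is a hedged Python illustration of the SRC step only, following Wright et al.'s classic recipe (l1 sparse coding over a training dictionary, then minimum class-wise reconstruction residual); the SCCA fusion is replaced by random stand-in features, and all names and shapes are assumptions rather than the cited paper's code.

# Hedged sketch of sparse representation-based classification (SRC).
# Assumed inputs: fused (e.g. SCCA-projected) features; not the paper's code.
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(dictionary, labels, query, alpha=0.01):
    """dictionary: (d, n) columns are fused training features;
    labels: (n,) class per column; query: (d,) fused test feature."""
    # l1-penalized sparse coding: query ~ dictionary @ code
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    code = coder.fit(dictionary, query).coef_
    best_class, best_residual = None, np.inf
    for c in np.unique(labels):
        # keep only the coefficients belonging to class c
        masked = np.where(labels == c, code, 0.0)
        residual = np.linalg.norm(query - dictionary @ masked)
        if residual < best_residual:
            best_class, best_residual = c, residual
    return best_class

# Toy usage with random "fused" features standing in for SCCA outputs.
rng = np.random.default_rng(0)
D = rng.standard_normal((40, 60))       # 60 training samples, 40-dim features
D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
y = np.repeat(np.arange(3), 20)         # 3 emotion classes, 20 samples each
print(src_classify(D, y, D[:, 5] + 0.05 * rng.standard_normal(40)))  # -> 0

The design intuition is that a test sample should be reconstructed best by training atoms of its own class, so the class with the smallest reconstruction residual wins.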