Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.440

MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation

Abstract: Emotion recognition in conversation (ERC) is a crucial component in affective dialogue systems, which helps the system understand users' emotions and generate empathetic responses. However, most works focus on modeling speaker and contextual information primarily on the textual modality or simply leveraging multimodal information through feature concatenation. In order to explore a more effective way of utilizing both multimodal and long-distance contextual information, we propose a new model based on multimod…
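The abstract's central idea, fusing modalities by running graph convolution over a graph that connects utterance nodes within and across modalities, can be illustrated with a minimal sketch. This is not the authors' released MMGCN implementation: PyTorch is assumed, and the node layout, adjacency construction, and layer sizes are illustrative choices.

```python
# Minimal sketch of graph-based multimodal fusion for ERC (illustrative only,
# not the authors' released MMGCN code). Assumes PyTorch.
import torch
import torch.nn as nn


def build_multimodal_adjacency(num_utts: int, num_mods: int = 3) -> torch.Tensor:
    """Fully connect utterances within each modality and link the same
    utterance across modalities, mirroring the idea in the abstract."""
    n = num_utts * num_mods
    adj = torch.zeros(n, n)
    for m in range(num_mods):
        s = m * num_utts
        adj[s:s + num_utts, s:s + num_utts] = 1.0  # intra-modality block, fully connected
    for i in range(num_utts):
        for m1 in range(num_mods):
            for m2 in range(num_mods):
                # same utterance across modalities (includes self-loops)
                adj[m1 * num_utts + i, m2 * num_utts + i] = 1.0
    return adj


class SimpleGCNLayer(nn.Module):
    """One symmetrically normalized graph convolution: H' = relu(D^-1/2 A D^-1/2 H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=-1).clamp(min=1.0)
        norm = deg.pow(-0.5)
        adj_norm = norm.unsqueeze(1) * adj * norm.unsqueeze(0)
        return torch.relu(adj_norm @ self.lin(h))


# Usage: 10 utterances, 3 modalities (text/audio/visual), 100-d features per node.
feats = torch.randn(30, 100)
adj = build_multimodal_adjacency(num_utts=10, num_mods=3)
fused = SimpleGCNLayer(100, 64)(feats, adj)  # (30, 64) fused node representations
```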

Cited by 108 publications (32 citation statements) · References 24 publications
“…DialogueGCN (Ghosal et al., 2019) captures conversational dependencies between utterances with a graph-based structure. MMGCN (Hu et al., 2021) further proposes a GCN-based multimodal fusion method for multimodal ERC tasks to improve recognition performance. DialogXL (Shen et al., 2020) first introduces a strong pre-trained language model, XLNet, for text-based ERC.…”
Section: Related Methods
Citation type: mentioning, confidence: 99%
“…MMGCN: A state-of-the-art GCN-based multimodal ERC framework proposed in (Hu et al., 2021). For the uni-modal experiments, we only model the fully connected graph.…”
Section: Baseline Models
Citation type: mentioning, confidence: 99%
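In the uni-modal setting described in this baseline note, the graph reduces to a single fully connected block over the utterances of one modality. A minimal sketch of one propagation step under that reading follows; PyTorch is assumed, and the shapes and symmetric normalization are illustrative choices, not the paper's code.

```python
# Uni-modal baseline sketch: one fully connected graph over the utterances of a
# single modality (illustrative shapes, assuming PyTorch; not the paper's code).
import torch

num_utts, dim = 10, 100
feats = torch.randn(num_utts, dim)        # utterance features for one modality
adj = torch.ones(num_utts, num_utts)      # fully connected, self-loops included
deg_inv_sqrt = adj.sum(-1).pow(-0.5)
adj_norm = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
fused = adj_norm @ feats                  # one propagation step mixes all utterances
```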
“…Moreover, DAG-ERC also makes meaningful and reasonable assumptions while constructing the graph by 1) removing links from an utterance in a dialogue to future utterances, and 2) incorporating remote information for modeling conversational context by introducing another edge to the speaker's previous utterance. Very recently, MMGCN [15] proposed fusing information from multiple modalities using a spectral-domain GCN to encode the multimodal contextual information. The work closest to ours is [11], where the authors use discourse relations between utterances to build a conversational graph and show that ERC in both multi-party and two-party conversations benefits from conversational discourse structures.…”
Section: Graph-based Models
Citation type: mentioning, confidence: 99%
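The two assumptions attributed to DAG-ERC in the statement above (no edges to future utterances, plus a remote edge to the same speaker's previous utterance) can be sketched as a simple edge-construction routine. This is an illustrative reading of the quoted description, not the DAG-ERC reference implementation; the local window size and the speaker-list input format are assumptions.

```python
# Illustrative sketch of the DAG-style edge construction described above:
# past-only links within a local window, plus an edge from the same speaker's
# previous utterance. Window size and data layout are assumptions.
from typing import List, Tuple


def build_dag_edges(speakers: List[str], window: int = 2) -> List[Tuple[int, int]]:
    """Return directed edges (src -> dst) where src always precedes dst."""
    edges = []
    for dst, spk in enumerate(speakers):
        # 1) only look backwards: no utterance is linked to a future utterance
        for src in range(max(0, dst - window), dst):
            edges.append((src, dst))
        # 2) add a "remote" edge from the same speaker's previous utterance,
        #    even if it falls outside the local window
        for src in range(dst - 1, -1, -1):
            if speakers[src] == spk:
                edges.append((src, dst))
                break
    return sorted(set(edges))


# Usage: a short two-party dialogue with alternating speakers
print(build_dag_edges(["A", "B", "A", "B", "A"]))
```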