A multi-view network for real-time emotion recognition in conversations

Ma, Hui; Wang, Jian; Lin, Hongfei; Pan, Xuejun; Zhang, Yijia; Yang, Zhihao

doi:10.1016/j.knosys.2021.107751

Cited by 29 publications

(13 citation statements)

References 36 publications

“…It has gained significant popularity due to numerous applications. Existing literature suggests that a wide range of deep learning methods have been applied to address the Emotion Recognition in Conversation (ERC) task [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37]. ICON [25] used a memory network architecture to model the interaction between self and inter-speaker states in two-party conversations.…”

Section: Related Work (mentioning)

confidence: 99%

Emotion Flip Reasoning in Multiparty Conversations

Kumar, Dudeja, Akhtar, et al. 2024

IEEE Trans. Artif. Intell.

In a conversational dialogue, speakers may have different emotional states, and their dynamics play an important role in understanding the dialogue's emotional discourse. However, simply detecting emotions is not sufficient to entirely comprehend the speaker-specific changes in emotion that occur during a conversation. To understand the emotional dynamics of speakers in an efficient manner, it is imperative to identify the rationale or instigator behind any changes or flips in emotion expressed by the speaker. In this paper, we explore the task called Instigator based Emotion Flip Reasoning (EFR), which aims to identify the instigator behind a speaker's emotion flip within a conversation. For example, an emotion flip from joy to anger could be caused by an instigator like threat. To facilitate this task, we present MELD-I, a dataset that includes ground-truth EFR instigator labels, which are in line with emotional psychology. To evaluate the dataset, we propose a novel neural architecture called TGIF, which leverages Transformer encoders and stacked GRUs to capture the dialogue context, speaker dynamics, and emotion sequence in a conversation. Our evaluation demonstrates state-of-the-art performance (a 4-12% increase in F1-score) against five baselines used for the task. Further, we establish the generalizability of TGIF on an unseen dataset in a zero-shot setting. Additionally, we provide a detailed analysis of the competing models, highlighting the advantages and limitations of our neural architecture.

Impact Statement: Emotions play a pivotal role in deciding the impact of a statement uttered. However, in a conversational setting, simply identifying the emotions of utterances in a dialogue is not enough to characterize the emotional dynamic of the speaker. To this end, the proposed task of emotion-flip reasoning is eminent. The proposed flip explanations via triggers and instigators can help scrutinise how a particular type of remark or expression affects the end listener. A response generation mechanism can use these triggers or instigators as feedback to steer a conversation so that the user feels chipper.

“…AGHMN [23] uses a hierarchical memory network to enhance utterance representations and introduce an attention GRU to model contextual information. MVN [11] utilizes a multi-view network to model word- and utterance-level dependencies in a conversation. In contrast, speaker-dependent methods model both context- and speaker-sensitive dependencies.…”

Section: A. Emotion Recognition in Conversations (mentioning)

confidence: 99%

“…Existing mainstream works on ERC can generally be categorized into sequence- and graph-based methods. Sequence-based methods [4]-[11] use recurrent neural networks or transformers to model long-distance contextual information in a conversation. In contrast, graph-based methods [12]-[15] design graph structures for conversations and then use graph neural networks to capture multiple dependencies.…”

Section: Introduction (mentioning)

confidence: 99%

A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations

Wang, Lin, et al. 2024

IEEE Trans. Multimedia

Emotion recognition in conversations (ERC), the task of recognizing the emotion of each utterance in a conversation, is crucial for building empathetic machines. Existing studies focus mainly on capturing context- and speaker-sensitive dependencies on the textual modality but ignore the significance of multimodal information. Different from emotion recognition in textual conversations, capturing intra- and inter-modal interactions between utterances, learning weights between different modalities, and enhancing modal representations play important roles in multimodal ERC. In this paper, we propose a transformer-based model with self-distillation (SDT) for the task. The transformer-based model captures intra- and inter-modal interactions by utilizing intra- and inter-modal transformers, and learns weights between modalities dynamically by designing a hierarchical gated fusion strategy. Furthermore, to learn more expressive modal representations, we treat soft labels of the proposed model as extra training supervision. Specifically, we introduce self-distillation to transfer knowledge of hard and soft labels from the proposed model to each modality. Experiments on IEMOCAP and MELD datasets demonstrate that SDT outperforms previous state-of-the-art baselines.

“…We propose a new loss function paradigm covering the pixel loss, structural similarity loss and gradient loss. The new loss function paradigm L_total is shown in Equation (5).…”

Section: Loss Function and Evaluation Parameter (mentioning)

confidence: 99%

“…Multi-view learning and multimodal fusion have been widely applied in many fields, including image segmentation [2], target tracking [3], object detection [4], behaviour and emotion recognition [5,6], multi-view question answering [7]. The above research fields and results provide a certain reference for multi-view image fusion technology in industry.…”

Section: Introduction (mentioning)

confidence: 99%

A multi‐view image fusion algorithm for industrial weld

Zheng, Zhao, Zhou, et al. 2022

IET Image Processing

Multi-view image fusion can be used to extract features from redundant and complementary multisource images. And the technique of obtaining high quality fusion images has become one of the research hotspots for image processing. In order to realize defect detection and intelligent grinding smoothly, multi-view fusion technology was applied in the field of overexposure and underexposure industrial welds, achieving high quality image enhancement. When preparing the data set of multi-view images, a hybrid registration algorithm with high matching ability is proposed. The data set of overexposure and underexposure weld images was obtained successfully by using the registration algorithm. In order to improve the fusion ability of overexposure and underexposure industrial welds, we propose a novel multi-view image fusion algorithm based on deep learning. The multiview fusion algorithm uses an autoencoder network structure, and its innovation lies in a parallel branch network with lightweight structure and strong generalization ability. The experimental results demonstrate that compared with other classical multi-view algorithms, our proposed algorithm gets the best parameters on the industrial weld data set in peak signal to noise ratio (PSNR) and root mean square error (RMSE), reaching 59.12 and 0.084, respectively. And the ablation and performance comparison experiments verify that the proposed parallel branch network has better generalization ability and fusion accuracy than other classical multi branch networks.
