Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1566
Context-aware Interactive Attention for Multi-modal Sentiment and Emotion Analysis

Abstract: In recent times, multi-modal analysis has been an emerging and highly sought-after field at the intersection of natural language processing, computer vision, and speech processing. The prime objective of such studies is to leverage diversified information (e.g., textual, acoustic, and visual) for learning a model. Effective interaction among these modalities often leads to a better-performing system. In this paper, we introduce a recurrent neural network based approach for the multi-modal…
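The abstract's core idea, letting modality-specific recurrent encoders attend to one another, can be illustrated with a short sketch. Everything below (the module name, the bilinear scoring, the dimensions) is an illustrative assumption, not the paper's actual architecture.

```python
# Hypothetical sketch of pairwise interactive attention between two
# modality sequences (e.g., text and audio). All names, dimensions,
# and design choices are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Bilinear map scoring compatibility between modality time steps.
        self.bilinear = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (batch, Tx, dim) hidden states of modality 1 (e.g., a text GRU)
        # y: (batch, Ty, dim) hidden states of modality 2 (e.g., an audio GRU)
        scores = x @ self.bilinear @ y.transpose(1, 2)  # (batch, Tx, Ty)
        attn = F.softmax(scores, dim=-1)                # attend over modality 2
        context = attn @ y                              # (batch, Tx, dim)
        # Fuse the attended cross-modal context with the original states.
        return torch.cat([x, context], dim=-1)          # (batch, Tx, 2*dim)

# Usage: fuse recurrent outputs of two modalities before classification.
text = torch.randn(8, 20, 128)   # batch of 20-step text states
audio = torch.randn(8, 20, 128)  # word-aligned audio states
fused = InteractiveAttention(128)(text, audio)  # shape (8, 20, 256)
```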



Cited by 63 publications (32 citation statements)
References 25 publications
“…These embeddings have been used to train a Logistic Regression model for the two downstream tasks of Sentiment Analysis and Emotion Recognition. Following the existing literature [8,2,5,15,3,16,10,17,9,13], we report the binary accuracy and weighted-averaged F1 metrics on sentiment for MOSEI, and on each of the four emotions of IEMOCAP in a one-vs-all manner.…”
Section: Results on Downstream Classification
confidence: 99%
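The evaluation protocol quoted above, training a Logistic Regression on fixed embeddings and reporting binary accuracy and weighted F1, is simple enough to sketch. The sketch below assumes precomputed utterance embeddings and binary sentiment labels; the random arrays are placeholders for real MOSEI features.

```python
# Minimal sketch of the quoted downstream protocol. X_* stand in for
# precomputed utterance embeddings, y_* for binary sentiment labels;
# the random data here is a placeholder, not a real dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 128)), rng.integers(0, 2, 500)
X_test, y_test = rng.normal(size=(100, 128)), rng.integers(0, 2, 100)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)
print("binary accuracy:", accuracy_score(y_test, pred))
print("weighted F1:   ", f1_score(y_test, pred, average="weighted"))
```

For the IEMOCAP side of the protocol, the same classifier would be trained once per emotion with that emotion's labels binarized one-vs-all.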
“…In this paper we follow a widely used approach for basic unimodal feature extraction and multimodal alignment, similar to the one used in various proposed methods, such as [8,2,5,15,3,16,10,17,9,13]. More specifically, after extracting the features of each modality (visual, textual, and aural), the word-level alignment procedure first used for this task in [17] is performed.…”
Section: -D Multimodal Sequence Representation
confidence: 99%
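The word-level alignment mentioned in this quote typically means pooling the acoustic or visual frames that fall within each word's time span, so every modality ends up with one vector per word. Below is a minimal sketch under that assumption; the word boundaries would normally come from a forced aligner (e.g., P2FA), and `word_align` is a hypothetical helper, not an API from the cited works.

```python
# Hypothetical sketch of word-level alignment: average the frames that
# fall inside each word's time span, yielding one vector per word.
import numpy as np

def word_align(frames: np.ndarray, frame_times: np.ndarray,
               word_spans: list[tuple[float, float]]) -> np.ndarray:
    """frames: (T, d) frame features; frame_times: (T,) timestamps;
    word_spans: [(start, end)] per word. Returns (num_words, d)."""
    aligned = []
    for start, end in word_spans:
        mask = (frame_times >= start) & (frame_times < end)
        # Fall back to zeros if no frame falls inside the word span.
        vec = frames[mask].mean(axis=0) if mask.any() else np.zeros(frames.shape[1])
        aligned.append(vec)
    return np.stack(aligned)

# Usage: 100 audio frames at a 10 ms hop, aligned to three word spans.
feat = np.random.randn(100, 74)
times = np.arange(100) * 0.01
words = [(0.0, 0.30), (0.30, 0.55), (0.55, 1.00)]
print(word_align(feat, times, words).shape)  # (3, 74)
```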
“…The experimental results of their research show that contextually integrating the multimodal features yields better performance (up to 91.39% accuracy) than features extracted from a single modality (up to 89.24% accuracy). Chauhan et al. [7] have devised a multimodal emotion analysis model based on a recurrent neural network (RNN). This model also learns the interactions among the different participating modalities using an auto-encoder based mechanism.…”
Section: Literature Review
confidence: 99%
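One common reading of an "auto-encoder based mechanism" for cross-modal interaction is to encode concatenated modality features into a shared latent code and reconstruct the input from it, so the code is forced to capture cross-modal information. The sketch below is only that generic pattern; the class, layer sizes, and loss are assumptions, not the model from [7].

```python
# Hypothetical auto-encoder style interaction: a shared latent code is
# learned over concatenated modality features and trained to
# reconstruct them. Sizes and structure are illustrative only.
import torch
import torch.nn as nn

class CrossModalAutoencoder(nn.Module):
    def __init__(self, text_dim=128, audio_dim=74, latent_dim=64):
        super().__init__()
        joint = text_dim + audio_dim
        self.encoder = nn.Sequential(nn.Linear(joint, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, joint)

    def forward(self, text, audio):
        joint = torch.cat([text, audio], dim=-1)
        z = self.encoder(joint)        # shared cross-modal code
        recon = self.decoder(z)        # reconstruction of the joint input
        return z, recon, joint

model = CrossModalAutoencoder()
z, recon, joint = model(torch.randn(8, 128), torch.randn(8, 74))
loss = nn.functional.mse_loss(recon, joint)  # reconstruction objective
```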
“…Sentiment analysis has been studied under different settings in the literature (e.g., sentence-level, aspect-level, cross-domain) (Chauhan et al., 2019; Hu et al., 2019). For ABSA, early works performed feature engineering to produce useful features for statistical classification models (e.g., SVM) (Wagner et al., 2014).…”
Section: Related Work
confidence: 99%