Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1034
Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis

Abstract: Related tasks are often inter-dependent and perform better when solved in a joint framework. In this paper, we present a deep multi-task learning framework that jointly performs both sentiment and emotion analysis. The multi-modal inputs (i.e., text, acoustic and visual frames) of a video convey diverse and distinctive information, and usually do not contribute equally to the decision making. We propose a context-level inter-modal attention framework for simultaneously predicting the sentiment and expressed emotions of an utterance.
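The abstract describes the architecture only at a high level. Below is a minimal, illustrative PyTorch-style sketch of what a context-level inter-modal attention block with shared multi-task heads could look like; the module names, dimensions, attention formulation, and head design are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: context-level inter-modal attention feeding two
# task-specific heads (sentiment + emotion). Shapes and fusion details are
# illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterModalAttentionMTL(nn.Module):
    def __init__(self, dim=128, n_emotions=6):
        super().__init__()
        # per-modality encoders over the utterances of a video (the "context" level)
        self.encoders = nn.ModuleDict({
            m: nn.GRU(dim, dim, batch_first=True)
            for m in ("text", "acoustic", "visual")
        })
        # shared fused representation feeds two task-specific heads
        self.sentiment_head = nn.Linear(3 * dim, 1)          # sentiment score
        self.emotion_head = nn.Linear(3 * dim, n_emotions)   # multi-label emotions

    def cross_attend(self, query, key_value):
        # scaled dot-product attention of one modality over another,
        # computed across all utterances of the video
        scores = query @ key_value.transpose(1, 2) / key_value.size(-1) ** 0.5
        return F.softmax(scores, dim=-1) @ key_value

    def forward(self, text, acoustic, visual):
        # each input: (batch, n_utterances, dim) features for one video
        feats = {m: self.encoders[m](x)[0]
                 for m, x in zip(("text", "acoustic", "visual"),
                                 (text, acoustic, visual))}
        # every modality attends over the other two; the residual keeps its own signal
        fused = []
        for m in feats:
            others = [self.cross_attend(feats[m], feats[o]) for o in feats if o != m]
            fused.append(feats[m] + sum(others))
        shared = torch.cat(fused, dim=-1)                     # (batch, n_utt, 3*dim)
        return self.sentiment_head(shared), self.emotion_head(shared)
```

In such a setup, training would backpropagate a joint objective (e.g., a weighted sum of a sentiment loss and a multi-label emotion loss) through the shared encoders, which is what lets the two related tasks benefit from each other.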

Citations: cited by 136 publications (44 citation statements)
References: 22 publications
“…Third, this study demonstrates the superiority of the proposed AV-TFN method through comparisons of the performances of the visual network (VN) [7], audio network (AN), and audio-visual network (AVN) with concatenation (AVN-Concat) [8] and attention (AVN-Atten) [9] techniques. The experiment results show that AV-TFN significantly improves F1 score compared with AN, VN, AVN-Concat, and AVN-Atten methods, while also achieving speeds similar to that of the fast VN method.…”
mentioning
confidence: 84%
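The statement above contrasts concatenation-based and attention-based fusion of audio and visual features. A hedged sketch of the difference is given below; the class names, shapes, and scoring scheme are illustrative assumptions, not the cited AVN-Concat / AVN-Atten systems.

```python
# Illustrative contrast between concatenation fusion and attention-weighted
# fusion of audio/visual feature vectors.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Stacks the modality vectors and lets a linear layer mix them."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.clf = nn.Linear(2 * dim, n_classes)

    def forward(self, audio, visual):          # each: (batch, dim)
        return self.clf(torch.cat([audio, visual], dim=-1))

class AttentionFusion(nn.Module):
    """Learns per-example weights so the more informative modality dominates."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)        # one relevance score per modality
        self.clf = nn.Linear(dim, n_classes)

    def forward(self, audio, visual):          # each: (batch, dim)
        stacked = torch.stack([audio, visual], dim=1)          # (batch, 2, dim)
        weights = torch.softmax(self.scorer(stacked), dim=1)   # (batch, 2, 1)
        fused = (weights * stacked).sum(dim=1)                 # (batch, dim)
        return self.clf(fused)
```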
“…On the other hand Pham et al (2018) introduced multi-modal sequence-to-sequence models which perform specially well in bi-modal settings. Finally, Akhtar et al (2019) proposed a multi-modal, multi-task approach in which the inputs from a video (text, acoustic and visual frames), are exploited for simultaneously predicting the sentiment and expressed emotions of an utterance. Our work is related to all of these approaches, but it is different in that we apply multi-modal techniques not only for sentiment classification, but also for aspect extraction.…”
Section: Related Work
mentioning
confidence: 99%
“…various state-of-the-art systems for both sentiment and emotion analysis. Very recently, Akhtar et al. (2019) introduced an attention based multi-task learning framework for sentiment and emotion classification on the CMU-MOSEI dataset.…”
Section: Related Work
mentioning
confidence: 99%