Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413577
Transformer-based Label Set Generation for Multi-modal Multi-label Emotion Detection

Cited by 38 publications (13 citation statements) | References 21 publications
“…Recently, multi-modal multi-label emotion recognition has aroused increasing interest. For example, (Ju et al. 2020; Zhang et al. 2021a) model modality-to-label and feature-to-label dependencies in addition to label correlations.…”
Section: Related Work
confidence: 99%
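The paper's title describes label set generation with a transformer; the sketch below illustrates that general idea in PyTorch, emitting labels one at a time so earlier predictions can condition later ones (one way to capture label correlations). The label vocabulary, <bos>/<eos> tokens, greedy decoding, and all dimensions are illustrative assumptions, not the paper's exact model.

```python
# Illustrative sketch of label-set generation with a transformer decoder.
# The vocabulary, special tokens, and greedy loop are assumptions.
import torch
import torch.nn as nn

LABELS = ["<bos>", "happy", "sad", "angry", "surprise", "<eos>"]

class LabelSetDecoder(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(len(LABELS), d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.out = nn.Linear(d_model, len(LABELS))

    @torch.no_grad()
    def generate(self, memory, max_len=4):
        # memory: fused multi-modal features, shape (1, seq_len, d_model)
        ys = torch.zeros(1, 1, dtype=torch.long)  # start with <bos> (id 0)
        emitted = []
        for _ in range(max_len):
            h = self.decoder(self.embed(ys), memory)   # (1, cur_len, d_model)
            next_id = self.out(h[:, -1]).argmax(-1).item()
            if LABELS[next_id] == "<eos>":
                break
            emitted.append(LABELS[next_id])
            ys = torch.cat([ys, torch.tensor([[next_id]])], dim=1)
        return emitted

dec = LabelSetDecoder()
print(dec.generate(torch.randn(1, 10, 128)))  # e.g. ['sad', 'angry']
```

Generating labels sequentially, rather than scoring them independently, is what lets earlier emitted emotions influence later ones.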
“…In real-world applications, videos are often characterized by heterogeneous representations (i.e., visual, audio and text) and annotated with various emotion labels (e.g., happy, surprise). Multi-modal Multi-label Emotion Recognition (MMER) (Ju et al. 2020; Zhang et al. 2021a) refers to identifying various emotions by leveraging the visual, audio and text modalities present in videos.…”
Section: Introduction
confidence: 99%
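As a concrete picture of the MMER task the quote defines, here is a minimal PyTorch sketch that fuses visual, audio and text features and scores each emotion label independently with a sigmoid, so one video can receive several labels at once. The feature dimensions, concatenation-based fusion, and label count are assumptions for illustration only.

```python
# Minimal multi-modal multi-label sketch: one sigmoid per emotion label.
# Dimensions and the concatenation fusion are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleMMER(nn.Module):
    def __init__(self, d_visual=35, d_audio=74, d_text=300,
                 d_hidden=128, num_labels=6):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj_v = nn.Linear(d_visual, d_hidden)
        self.proj_a = nn.Linear(d_audio, d_hidden)
        self.proj_t = nn.Linear(d_text, d_hidden)
        # Independent head per label: multi-label, not multi-class.
        self.classifier = nn.Linear(3 * d_hidden, num_labels)

    def forward(self, v, a, t):
        fused = torch.cat([self.proj_v(v).relu(),
                           self.proj_a(a).relu(),
                           self.proj_t(t).relu()], dim=-1)
        return torch.sigmoid(self.classifier(fused))  # one probability per emotion

model = SimpleMMER()
v, a, t = torch.randn(4, 35), torch.randn(4, 74), torch.randn(4, 300)
probs = model(v, a, t)   # shape (4, 6)
preds = probs > 0.5      # each sample may activate several labels
```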
“…where ATT_self denotes the self-modal multi-head attention of (Vaswani et al., 2017), and ATT_cross denotes the cross-modal multi-head attention of (Ju et al., 2020). O_rel and T_rel are the pre-trained embeddings of image I and text X.…”
Section: Cross-modal Relation Detection
confidence: 99%
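To make the distinction in the quoted equation concrete, the sketch below contrasts self-modal attention (query, key and value all from one modality) with cross-modal attention (queries from one modality attending over another), using PyTorch's nn.MultiheadAttention. The sequence lengths, model width, and the O_rel/T_rel stand-in tensors are assumptions, not the cited papers' actual configurations.

```python
# Self-modal vs. cross-modal multi-head attention, in the sense of the
# quoted equation. Shapes and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

d_model, n_heads = 256, 8
att_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
att_cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

O_rel = torch.randn(2, 49, d_model)  # image-region embeddings (stand-in for O_rel)
T_rel = torch.randn(2, 20, d_model)  # text-token embeddings (stand-in for T_rel)

# ATT_self(O_rel): query, key and value all come from the image modality.
o_self, _ = att_self(O_rel, O_rel, O_rel)

# ATT_cross(T_rel, O_rel): text queries attend over image keys/values,
# letting each token gather relevant visual context.
t_cross, _ = att_cross(T_rel, O_rel, O_rel)

print(o_self.shape, t_cross.shape)  # (2, 49, 256) (2, 20, 256)
```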
“…Compared to CNNs, the attention mechanism learns more global dependencies; therefore, the transformer also shows strong performance in low-level tasks [3]. The transformer has also proven effective in the multi-modal area, including multi-modal representations [45] and applications [13, 19, 31]. Inspired by the extensive applications of the transformer, we integrate the transformer encoder-decoder into the document image rectification problem.…”
Section: Transformer in Language and Vision
confidence: 99%