2022
DOI: 10.1109/tcsvt.2022.3197420

Video-Based Cross-Modal Auxiliary Network for Multimodal Sentiment Analysis

Abstract: Multimodal sentiment analysis has a wide range of applications due to the information complementarity of multimodal interactions. Previous works focus on investigating efficient joint representations, but they rarely consider insufficient unimodal feature extraction and the data redundancy of multimodal fusion. In this paper, a Video-based Cross-modal Auxiliary Network (VCAN) is proposed, which comprises an audio features map module and a cross-modal selection module. The first module is designed t…
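The abstract is truncated before it describes either module, so only the component names are known. Purely as an illustration of what an audio-features-map plus cross-modal-selection pipeline could look like, here is a minimal PyTorch sketch; every class, layer choice, and dimension below is an assumption made for this note, not the authors' VCAN implementation.

import torch
import torch.nn as nn

class AudioFeaturesMap(nn.Module):
    # Hypothetical: a small convolutional encoder over a spectrogram-like
    # input (B, 1, freq, time), producing a fixed-size audio summary.
    def __init__(self, out_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.proj = nn.Linear(32 * 8 * 8, out_dim)

    def forward(self, x):
        return self.proj(self.encoder(x).flatten(1))  # (B, out_dim)

class CrossModalSelection(nn.Module):
    # Hypothetical: score each visual frame against the audio summary,
    # keep the top-k frames, and average them into one fused vector.
    def __init__(self, dim=128, k=4):
        super().__init__()
        self.k = k
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, video, audio):  # video: (B, T, dim), audio: (B, dim)
        B, T, D = video.shape
        a = audio.unsqueeze(1).expand(B, T, D)
        s = self.score(video.reshape(-1, D), a.reshape(-1, D)).view(B, T)
        idx = s.topk(self.k, dim=1).indices  # requires T >= k
        picked = torch.gather(video, 1, idx.unsqueeze(-1).expand(B, self.k, D))
        return picked.mean(dim=1)  # (B, dim)

# Toy usage: 2 clips, 6 visual frames each, 128-d features.
afm, sel = AudioFeaturesMap(), CrossModalSelection()
fused = sel(torch.randn(2, 6, 128), afm(torch.randn(2, 1, 64, 100)))  # (2, 128)

In this hypothetical reading, the selection step keeps only the visual frames most relevant to the audio, one plausible interpretation of reducing the "data redundancy of multimodal fusion" mentioned in the abstract.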

Cited by 11 publications (7 citation statements). References 56 publications (70 reference statements).
“…For unimodal results, Table III shows that the visual modality outperforms the audio modality on benchmark datasets, especially RAVDESS. This verifies the importance of the visual modality for emotion recognition, which is consistent with previous works (Chen et al., 2022b; Praveen et al., 2023). In addition, the reasons why the visual modality is remarkably important on the RAVDESS dataset are speculated as follows.…”
Section: Results (supporting)
Confidence: 92%
“…Therefore, we speculate that the great improvements on classification tasks are due to the combination of MAIIM, MACIM, and attention-based fusion in our proposed KE-AFN. For the regression task, KE-AFN outperforms the existing state-of-the-art method (Chen et al., 2022b) by 1 per cent on MAE and achieves performance on Corr on par with the state of the art, which indicates that KE-AFN works well in fitting specific sentiment scores.…”
Section: Results (mentioning)
Confidence: 79%
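MAE and Corr in this statement are the standard regression metrics on sentiment benchmarks such as CMU-MOSI/MOSEI: mean absolute error between predicted and annotated sentiment scores, and the Pearson correlation between them. A minimal NumPy sketch of how they are typically computed (illustrative only, not the evaluation code of either cited paper):

import numpy as np

def mae(pred, target):
    # Mean absolute error: lower is better.
    return float(np.mean(np.abs(pred - target)))

def corr(pred, target):
    # Pearson correlation with the human labels: higher is better.
    return float(np.corrcoef(pred, target)[0, 1])

y_hat = np.array([0.8, -1.2, 2.1, 0.0])   # predicted sentiment scores
y_true = np.array([1.0, -1.5, 1.8, 0.3])  # annotated sentiment scores
print(f"MAE={mae(y_hat, y_true):.3f}  Corr={corr(y_hat, y_true):.3f}")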
“…The development of augmented reality (AR) technologies and their application in interactive art [1] opens new opportunities for the personalization of visual content. Personalization becomes the basis for creating a deeper and more meaningful experience for users, allowing art to adapt to individual preferences and emotional states [2]. However, improving the sense of immersion during user interaction with interactive art in augmented reality systems often remains a challenge.…”
Section: Introduction (mentioning)
Confidence: 99%