2020
DOI: 10.1002/cpe.5954
Various syncretic co‐attention network for multimodal sentiment analysis

Abstract: The multimedia contents shared on social network reveal public sentimental attitudes toward specific events. Therefore, it is necessary to conduct sentiment analysis automatically on abundant multimedia data posted by the public for real-world applications. However, approaches to single-modal sentiment analysis neglect the internal connections between textual and visual contents, and current multimodal methods fail to exploit the multilevel semantic relations of heterogeneous features. In this article, the var…

Citations: Cited by 6 publications (5 citation statements)
References: 46 publications (109 reference statements)
“…In order to assess the effectiveness of our proposed approach, we have undertaken a comparative analysis between our investigation utilizing the BG dataset and the current literature: [11], [16]- [18], [23], [24]. The comparison results in Group 1 of Table 4 demonstrate that, regarding the F1-score (92.60%) and accuracy (92.65%), the DMLANet model outperformed AMGN.…”
Section: E. Comparative Results and Discussion (mentioning)
confidence: 99%
“…However, this model suffered from excessive memory overhead due to its lengthy execution time. Cao et al [24] proposed various syncretic co-attention networks (VSCN) to investigate multi-level matching correlations across multimodal information and consider each modality's specific characteristics for integrated sentiment classification. However, the emotion polarity is frequently unclear because visual components convey more information than text, causing the model to generate incorrect predictions occasionally.…”
Section: Literature Review a Multimodal Sentiment Analysismentioning
confidence: 99%
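The excerpt above describes the cited co-attention idea only at a high level, so a minimal illustration may help. The following is a rough PyTorch sketch of bidirectional co-attention between text-token and image-region features; the class name, projection layers, and affinity-matrix formulation are illustrative assumptions and do not reproduce the actual VSCN architecture of Cao et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Toy bidirectional co-attention between text tokens and image regions."""
    def __init__(self, text_dim: int, image_dim: int, hidden_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, num_tokens,  text_dim)
        # image_feats: (batch, num_regions, image_dim)
        t = self.text_proj(text_feats)                 # (B, T, H)
        v = self.image_proj(image_feats)               # (B, R, H)
        # Affinity matrix: similarity of every token with every region.
        affinity = torch.bmm(t, v.transpose(1, 2))     # (B, T, R)
        # Text attends to image regions; image attends to text tokens.
        text_to_image = F.softmax(affinity, dim=2)     # weights over regions
        image_to_text = F.softmax(affinity, dim=1)     # weights over tokens
        attended_image = torch.bmm(text_to_image, v)                       # (B, T, H)
        attended_text = torch.bmm(image_to_text.transpose(1, 2), t)        # (B, R, H)
        return attended_text, attended_image
```

The point of the sketch is the two softmaxes over the same affinity matrix: each modality's features are re-weighted by their relevance to the other, which is what distinguishes co-attention from one-way attention.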
“…When applying attention mechanisms to images, different feature vectors associated with different regions are assigned different weights to create an attended image vector, as seen in the work of Zhang et al [ 3 ]. In contrast, Cao et al [ 4 ] adopt an asymmetric attention framework to generate attended image and textual feature vectors, while Xu et al [ 5 ] use a dual attention network (DAN) to simultaneously predict the attention distribution of both the image and the text. Unlike collaborative attention, where an asymmetric attention framework is used to generate attended feature vectors, memory vectors can be repeatedly modified at each inference level using a repeated DAN structure.…”
Section: Related Work (mentioning)
confidence: 99%
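As a concrete illustration of the "attended image vector" idea in the excerpt above (region features weighted by attention scores and then summed), here is a minimal PyTorch sketch of text-guided attention over image regions. The names, dimensions, and scoring function are assumptions made for illustration and are not taken from Zhang et al., Cao et al., or Xu et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAttention(nn.Module):
    """Toy text-guided attention that pools image-region features."""
    def __init__(self, region_dim: int, query_dim: int, hidden_dim: int):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        self.query_proj = nn.Linear(query_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, query):
        # regions: (batch, num_regions, region_dim)  -- one vector per image region
        # query:   (batch, query_dim)                -- e.g. a sentence embedding
        h = torch.tanh(self.region_proj(regions) + self.query_proj(query).unsqueeze(1))
        weights = F.softmax(self.score(h).squeeze(-1), dim=1)          # (B, R)
        # Weighted sum of region features -> attended image vector.
        attended = torch.bmm(weights.unsqueeze(1), regions).squeeze(1)  # (B, region_dim)
        return attended, weights
```

The weighted sum is the attended image vector the excerpt refers to; repeating this step over several hops, with the query updated each time, is the basic idea behind memory-style structures such as DAN.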
“…As a result, many multimodal sentiment categorization approaches have been proposed to incorporate diverse modalities. These approaches are classified into three distinct categories: early/feature fusion [17,18], intermediate/joint fusion [19][20][21][22][23][24][25][26][27][28], and late/decision fusion [29][30][31]. In the early fusion approach, a unified feature vector is created first, and then a Machine Learning (ML) classifier is fed with the features extracted from the input data.…”
Section: Literature Review, 2.1 Visual-Textual Sentiment Analysis (mentioning)
confidence: 99%
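To make the distinction drawn above between early (feature) fusion and late (decision) fusion concrete, the following minimal PyTorch sketch contrasts the two; the layer sizes, class names, and the simple averaging rule in the late-fusion branch are illustrative assumptions, not the methods of the cited works.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate modality features first, then classify the joint vector."""
    def __init__(self, text_dim: int, image_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(text_dim + image_dim, num_classes)

    def forward(self, text_feat, image_feat):
        # text_feat: (B, text_dim), image_feat: (B, image_dim)
        return self.classifier(torch.cat([text_feat, image_feat], dim=-1))

class LateFusion(nn.Module):
    """Classify each modality separately, then average the decisions."""
    def __init__(self, text_dim: int, image_dim: int, num_classes: int):
        super().__init__()
        self.text_clf = nn.Linear(text_dim, num_classes)
        self.image_clf = nn.Linear(image_dim, num_classes)

    def forward(self, text_feat, image_feat):
        # Average the per-modality logits as a simple decision-level fusion rule.
        return 0.5 * (self.text_clf(text_feat) + self.image_clf(image_feat))
```

Early fusion lets the classifier model cross-modal feature interactions directly, while late fusion keeps the modalities independent until the decision stage; intermediate/joint fusion methods such as co-attention sit between the two.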
“…The way textual and visual features are extracted and incorporated allows the model to achieve robust performance. Cao et al [22] proposed Various Syncretic Co-attention Networks (VSCN) to investigate multi-level matching correlations between multimodal data and incorporate each modality's unique information for integrated sentiment classification. However, the emotion polarity could be clearer because visual components convey more information than text, causing the model to generate incorrect predictions occasionally.…”
Section: Literature Review, 2.1 Visual-Textual Sentiment Analysis (mentioning)
confidence: 99%