2017 IEEE International Conference on Image Processing (ICIP)
DOI: 10.1109/icip.2017.8296543
Visual and textual sentiment analysis using deep fusion convolutional neural networks

Abstract: Sentiment analysis is attracting more and more attention and has become a very hot research topic due to its potential applications in personalized recommendation, opinion mining, etc. Most existing methods are based on either textual or visual data alone and cannot achieve satisfactory results, as it is very hard to extract sufficient information from a single modality. Inspired by the observation that there is a strong semantic correlation between visual and textual data in social media, we…


Cited by 30 publications (19 citation statements)
References 15 publications (21 reference statements)
“…Remarkable results have been achieved in [74][75][76][77][78][79], where ensembles of handcrafted features were extracted from images and combined with information provided by text analysis. Those approaches were subsequently outperformed by frameworks that integrated CNNs for extracting features from visual content [12,78,[80][81][82][83][84][85].…”
Section: Sentiment Analysis: Other Applications
confidence: 99%
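The fusion idea described above can be illustrated with a minimal sketch. This is not the cited paper's implementation: the feature dimensions, the concatenation-based (feature-level) fusion, and the linear sentiment head are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample representations: a CNN image embedding and an
# averaged word-vector text embedding (dimensions are assumptions).
visual_feat = rng.standard_normal(512)
textual_feat = rng.standard_normal(300)

# Feature-level fusion: concatenate the two modalities, classify jointly.
fused = np.concatenate([visual_feat, textual_feat])  # shape (812,)

# A toy linear sentiment head over {negative, positive}; in a real
# framework these weights would be learned end-to-end.
W = rng.standard_normal((2, fused.size)) * 0.01
logits = W @ fused
probs = np.exp(logits) / np.exp(logits).sum()  # softmax

print(fused.shape)  # (812,)
```

Concatenation is the simplest fusion scheme; the CNN-based frameworks cited above typically learn deeper joint representations, but the input/output shapes follow the same pattern.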
“…We extract µz as the target vector v t (i.e., v t := µz). Hence, a unified representation of a rap song, which involves both prosodic information and semantic information, can be generated by repeating lines [4][5][6][7][8][9][10] with the returned hyper-parameters in line 24.…”
Section: In Conclusion, the Loss Function of the VAE Network Is Formulated As
confidence: 99%
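The step quoted above (taking the VAE posterior mean µz as a deterministic target vector) can be sketched as follows. The linear encoder, the dimensions, and all weights here are purely illustrative assumptions, not the citing paper's network.

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.standard_normal(128)  # some input representation (assumed)

# Toy linear "encoder" producing the Gaussian posterior parameters
# (a real VAE encoder would be a trained neural network).
W_mu = rng.standard_normal((16, 128)) * 0.05
W_lv = rng.standard_normal((16, 128)) * 0.05
mu_z = W_mu @ x         # posterior mean
log_var_z = W_lv @ x    # posterior log-variance

# Deterministic target vector as in the quote: v_t := mu_z (no sampling).
v_t = mu_z

# By contrast, a training-time reparameterized sample would be stochastic:
z = mu_z + np.exp(0.5 * log_var_z) * rng.standard_normal(16)
```

Using µz rather than a sampled z gives a reproducible representation of each input, which is why it is a common choice when the latent vector is consumed downstream.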
“…Dataset. Following [24], we extract the dominant parts (i.e., verses) of rap songs and obtain 16,697 verses in total. The verses are divided into lines to obtain a dataset of 810,567 lines.…”
Section: Nextline Prediction Task
confidence: 99%
“…Multimodal emotion processing has emerged as a significant research trend over the last few years. Humans express various emotions during communication via visual, textual, and other modalities [2]. Combining complementary information from images and texts could increase emotion recognition accuracy and help machines become empathetic [3].…”
Section: Introduction
confidence: 99%