2015
DOI: 10.1109/tmm.2015.2482228
Deep Multimodal Learning for Affective Analysis and Retrieval

Cited by 132 publications (69 citation statements)
References 25 publications
“…In particular, because of the complicated and unstructured nature of user-generated videos and the sparsity of video frames that express the emotion content, it is often hard to understand emotions conveyed in user-generated videos. To address this challenging problem, multi-modal fusion and knowledge transfer approaches have been proposed in recent works [28,45,49,46]. In this paper, we show that our FFCSN model can be easily extended to emotion recognition from user-generated videos, with state-of-the-art results achieved.…”
Section: Related Work
confidence: 78%
“…The YouTube-8 dataset [17] is used for performance evaluation. This dataset consists of 1,101 videos (downloaded from YouTube) annotated with 8 basic emotions: anger, an-

Model   Modalities                  ACC (%)
[17]    visual+acoustic+attribute   46.1
[28]    visual+acoustic+attribute   51.1
[49]    visual+attribute            52.5
[46]    visual+acoustic             52.6
[45]    visual+acoustic             52.6
Ours    visual                      57.8

Table 3. Comparative results (%) of video-based emotion recognition on the YouTube-8 dataset.…”
Section: Dataset and Setting
confidence: 99%
“…In particular, understanding the sentiment in visual media content (i.e., images, videos) has attracted increasing research attention. Potential use of approaches developed for visual sentiment analysis is broad, including affective image retrieval [3], aesthetic quality categorization [4], opinion mining [5], comment assistant [6], etc.…”
Section: Introduction
confidence: 99%
“…Earlier work on emotional understanding of multimedia used hand-crafted features from different modalities, fused at the feature or decision level [8,15,46,117,126]. More recent work mainly uses deep learning models [55,93].…”
Section: Affective Computing of Multimodal Data
confidence: 99%
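As background to the two fusion strategies contrasted in the statement above, here is a minimal sketch, in Python with scikit-learn, of feature-level (early) fusion versus decision-level (late) fusion. The feature dimensions, the synthetic data, and the logistic-regression classifier are illustrative assumptions, not details taken from the cited works.

```python
# Hedged sketch of the two classic fusion strategies: feature-level (early)
# fusion concatenates per-modality features before a single classifier;
# decision-level (late) fusion averages the scores of per-modality classifiers.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
visual = rng.normal(size=(n, 128))    # placeholder visual descriptors
acoustic = rng.normal(size=(n, 64))   # placeholder acoustic descriptors
labels = rng.integers(0, 8, size=n)   # 8 emotion classes, as in YouTube-8

# Feature-level (early) fusion: one classifier on the concatenated features.
early = LogisticRegression(max_iter=1000).fit(
    np.hstack([visual, acoustic]), labels)

# Decision-level (late) fusion: one classifier per modality, scores averaged.
clf_v = LogisticRegression(max_iter=1000).fit(visual, labels)
clf_a = LogisticRegression(max_iter=1000).fit(acoustic, labels)
late_scores = (clf_v.predict_proba(visual) + clf_a.predict_proba(acoustic)) / 2
late_pred = late_scores.argmax(axis=1)
```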
“…Pang et al [93] used Deep Boltzmann Machine (DBM) to learn a joint representation across text, vision, and audio to recognize expected emotions from social media videos. Each modality is separately encoded with stacking multiple Restricted Boltzmann Machines (RBM) and pathways are merged to a joint representation layer.…”
Section: Affective Computing Of Multimodal Datamentioning
confidence: 99%
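The statement above describes the architecture only in prose, so the following is a rough, greedy layer-wise sketch of the idea: one stack of RBMs per modality, with the top-layer codes concatenated and passed to a further joint RBM. It uses scikit-learn's BernoulliRBM and is only an approximation of a true multimodal DBM (which is trained jointly rather than greedily); all dimensions and hyperparameters here are assumptions, not the authors' settings.

```python
# Greedy layer-wise sketch of a multimodal stacked-RBM encoder:
# per-modality RBM pathways whose top-layer codes are merged into a
# joint representation layer. Not the original DBM training procedure.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
n = 500
# BernoulliRBM expects inputs in [0, 1].
text = rng.random((n, 300))      # placeholder text features
vision = rng.random((n, 512))    # placeholder visual features
audio = rng.random((n, 128))     # placeholder audio features

def encode_pathway(X, layer_sizes):
    """Stack RBMs greedily for one modality and return the top-layer code."""
    h = X
    for k in layer_sizes:
        rbm = BernoulliRBM(n_components=k, learning_rate=0.05,
                           n_iter=10, random_state=0)
        h = rbm.fit_transform(h)
    return h

# Separate pathways per modality, then merge into one joint layer.
h_text = encode_pathway(text, [256, 128])
h_vision = encode_pathway(vision, [256, 128])
h_audio = encode_pathway(audio, [128, 64])

joint_rbm = BernoulliRBM(n_components=256, learning_rate=0.05,
                         n_iter=10, random_state=0)
joint_repr = joint_rbm.fit_transform(
    np.hstack([h_text, h_vision, h_audio]))  # shared multimodal code
```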