2016 IEEE 16th International Conference on Data Mining (ICDM)
DOI: 10.1109/icdm.2016.0055
Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis

Abstract: Technology has enabled anyone with an Internet connection to easily create and share their ideas, opinions and content with millions of other people around the world. Much of the content being posted and consumed online is multimodal. With billions of phones, tablets and PCs shipping today with built-in cameras, and a host of new video-equipped wearables like Google Glass on the horizon, the amount of video on the Internet will only continue to increase. It has become increasingly difficult for researc…

Cited by 488 publications (271 citation statements)
References 31 publications
“…In a study using an acted emotion corpus, Busso et al. reported an accuracy of 89% in recognizing four classes of emotion, a significant improvement over the 70.9% accuracy of a speech-based system and the 85% of a facial-expression-based system [37]. This finding is reaffirmed by Poria et al., who reported an improvement when fusing three modalities into a single emotion recognizer [38]. Another study by Nojavanasghari et al., focusing on spontaneous emotions in children, reported the same trend, with a best overall performance of 69% [39].…”
Section: Results (supporting)
confidence: 74%
“…Poria et al (Poria et al, 2015(Poria et al, , 2016d) extracted audio, visual and textual features using convolutional neural network (CNN); concatenated those features and employed multiple kernel learning (MKL) for final sentiment classification. (Metallinou et al, 2008) and (Eyben et al, 2010a) fused audio and textual modalities for emotion recognition.…”
Section: Related Workmentioning
confidence: 99%
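The pipeline described above — per-modality features combined via multiple kernel learning — can be sketched as follows. This is a minimal illustration, not the paper's implementation: MKL is approximated here by a fixed-weight sum of per-modality RBF kernels fed to an SVM with a precomputed Gram matrix (true MKL would learn the kernel weights), and all feature matrices, shapes, and labels are synthetic stand-ins for the CNN-extracted features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical per-modality feature matrices standing in for
# CNN-extracted audio/visual/textual features; shapes are illustrative.
rng = np.random.default_rng(0)
n_train, n_test = 80, 20
audio = rng.normal(size=(n_train + n_test, 16))
visual = rng.normal(size=(n_train + n_test, 32))
textual = rng.normal(size=(n_train + n_test, 64))
y = rng.integers(0, 2, size=n_train + n_test)  # binary sentiment labels

def combined_kernel(idx_a, idx_b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fixed-weight sum of per-modality RBF kernels (an MKL surrogate:
    real MKL would optimize these weights jointly with the classifier)."""
    modalities = (audio, visual, textual)
    return sum(w * rbf_kernel(m[idx_a], m[idx_b])
               for w, m in zip(weights, modalities))

train = np.arange(n_train)
test = np.arange(n_train, n_train + n_test)

# SVC with a precomputed kernel: fit on the train/train Gram matrix,
# predict from the test/train cross-kernel.
clf = SVC(kernel="precomputed").fit(combined_kernel(train, train), y[train])
pred = clf.predict(combined_kernel(test, train))
```

Because the kernels are summed rather than the raw features concatenated, each modality contributes its own similarity structure, which is the intuition behind using MKL instead of plain feature concatenation.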
“…This supports analysts in government, commercial and political domains who need to determine how people respond to different crisis events [5,40,59]. Similarly, online reviews need to be summarized in a manner that allows comparison of opinions, so that a user can see the advantages and weaknesses of each product at a single glance, in both unimodal [60] and multimodal [50,9] contexts. Further, we can perform in-depth opinion assessment, such as finding reasons or aspects [46] in opinion-bearing texts.…”
Section: Subjectivity Detection (mentioning)
confidence: 99%