Interspeech 2019
DOI: 10.21437/interspeech.2019-3243

Deep Hierarchical Fusion with Application in Sentiment Analysis

Abstract: Recognizing the emotional tone in spoken language is a challenging research problem that requires modeling not only the acoustic and textual modalities separately but also their cross-interactions. In this work, we introduce a hierarchical fusion scheme for sentiment analysis of spoken sentences. Two bidirectional Long Short-Term Memory networks (BiLSTM), followed by multiple fully connected layers, are trained in order to extract feature representations for each of the textual and audio modalities. The representations…
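The architecture sketched in the abstract (one BiLSTM followed by fully connected layers per modality, with the resulting text and audio representations fused for sentence-level sentiment) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: all dimensions and layer sizes are assumptions, and only a single fusion point is shown, whereas the paper fuses the two branches hierarchically at multiple levels.

```python
# Minimal sketch of a two-branch BiLSTM encoder with a simple fusion head.
# Hidden sizes, layer counts, and the single fusion point are illustrative
# assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """BiLSTM over a feature sequence, followed by fully connected layers."""
    def __init__(self, input_dim, hidden_dim=64, repr_dim=32):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden_dim, repr_dim), nn.ReLU(),
            nn.Linear(repr_dim, repr_dim), nn.ReLU(),
        )

    def forward(self, x):              # x: (batch, time, input_dim)
        out, _ = self.bilstm(x)        # (batch, time, 2 * hidden_dim)
        return self.fc(out[:, -1])     # last time step -> (batch, repr_dim)

class FusionSentimentModel(nn.Module):
    """Encodes text and audio separately, then fuses the two representations."""
    def __init__(self, text_dim, audio_dim, repr_dim=32, num_classes=2):
        super().__init__()
        self.text_enc = ModalityEncoder(text_dim, repr_dim=repr_dim)
        self.audio_enc = ModalityEncoder(audio_dim, repr_dim=repr_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * repr_dim, repr_dim), nn.ReLU(),
            nn.Linear(repr_dim, num_classes),
        )

    def forward(self, text, audio):
        fused = torch.cat([self.text_enc(text), self.audio_enc(audio)], dim=-1)
        return self.classifier(fused)

# Example: a batch of 4 utterances, 20 word vectors and 50 acoustic frames each.
model = FusionSentimentModel(text_dim=300, audio_dim=74)
logits = model(torch.randn(4, 20, 300), torch.randn(4, 50, 74))  # shape (4, 2)
```

Moving toward the paper's hierarchical scheme would mean fusing the branches' intermediate states (word/frame-level LSTM outputs and dense-layer activations) as well, not only the final utterance representations.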

Cited by 28 publications (23 citation statements)
References 20 publications
“…[5] 76.5 73.4; Zadeh et al. [7] 76.9 77.0; Georgiou et al. [9] 76.9 76.9; Poria et al. [2] 77.64 -; Ghosal et al. [10] 82.31 80.69; Ghosal et al. [10] 79.80 -; Sun et al. [4] 80… [7], (¦) results are obtained on the CMU-MOSEI dataset after excluding the utterances with a sentiment score of 0. We mention the results of the proposed model with this setup in the parenthesis.…”
Section: CMU-MOSEI Approach (mentioning)
confidence: 99%
“…Methods that jointly learn the interactions between two or three modalities [3,4], and 3. Methods that explicitly learn contributions from these unimodal and cross-modal cues, typically using attention-based techniques [5,6,7,8,9,10].…”
Section: Introduction (mentioning)
confidence: 99%
“…In contrast, early fusion can model interactions across modalities at the raw-feature stage. Georgiou et al. [4] concatenated features from different modalities at various levels and used a multi-layer perceptron for emotion prediction. Generally speaking, concatenation-based early fusion methods do not outperform the late fusion methods in SER [5].…”
Section: Introduction (mentioning)
confidence: 99%
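The statement above contrasts concatenation-based early fusion with late fusion. For reference, a generic concatenation-plus-MLP fusion head looks roughly like the following; feature dimensions and layer sizes are illustrative assumptions rather than details taken from [4] or [5].

```python
# Sketch of concatenation-based fusion: utterance-level text and audio feature
# vectors are concatenated and passed through a multi-layer perceptron.
# All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ConcatFusionMLP(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, hidden=128, num_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, text_feats, audio_feats):   # each: (batch, dim)
        return self.mlp(torch.cat([text_feats, audio_feats], dim=-1))

logits = ConcatFusionMLP()(torch.randn(8, 300), torch.randn(8, 74))  # (8, 2)
```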
“…The general modus operandi of SLU systems is to convert voice into text using an ASR engine and apply natural language understanding (NLU) to the transcribed text, modelling conversational and channel properties while being robust to ASR errors. Since spoken conversation is an amalgamation of spontaneous speaker interactions, it has become imperative for model architectures to capture multimodal features from text and speech modalities (Georgiou et al., 2019). The aim of these multimodal systems is to capture acoustic information such as pitch, intonation, rate of speech, etc.…”
Section: Introduction (mentioning)
confidence: 99%
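The statement above motivates pairing ASR transcripts with acoustic cues such as pitch and intonation. A hedged sketch of extracting simple utterance-level acoustic descriptors with librosa follows; the file path and the choice of descriptors are assumptions, and rate of speech would additionally require time-aligned ASR output.

```python
# Sketch: utterance-level acoustic descriptors (pitch and energy statistics)
# of the kind multimodal SLU models commonly pair with ASR transcripts.
# "utterance.wav" is a placeholder path; the descriptor set is illustrative.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)

# Frame-level fundamental frequency (pitch) via probabilistic YIN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Frame-level energy (RMS).
rms = librosa.feature.rms(y=y)[0]

# Collapse to a fixed-size utterance vector (mean/std statistics).
features = np.array([
    np.nanmean(f0), np.nanstd(f0),   # pitch level and variability
    rms.mean(), rms.std(),           # loudness level and variability
    voiced_prob.mean(),              # proportion of voiced frames (proxy)
])
print(features.shape)  # (5,)
```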