The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements

Stappen, Lukas; Baird, Alice; Schumann, Lea; Schuller, Björn W.

doi:10.48550/arxiv.2101.06053

Cited by 4 publications (11 citation statements)

References 69 publications (103 reference statements)

“…fusion (LF) [3,9] approaches are the most prominent techniques. As discussed in [4] by N. Majumder et al., sequential (or hierarchical in their case) late fusion can filter out inter-modal correlation.…”

Section: Related Work (mentioning)

confidence: 99%

Sequential Late Fusion Technique for Multi-modal Sentiment Analysis

Banerjee, Lygerakis, Makedon. 2021. Proceedings of the 14th PErvasive Technologies Related to Assistive Environments Conference.

Multi-modal sentiment analysis plays an important role in providing better interactive experiences to users. Each modality in multi-modal data can provide different viewpoints or reveal unique aspects of a user's emotional state. In this work, we use text, audio and visual modalities from the MOSI dataset and propose a novel fusion technique using a multi-head attention LSTM network. Finally, we perform a classification task and evaluate its performance.

CCS Concepts: Computing methodologies → Modeling methodologies.

“…The MuSe-CaR [30] is a multimodal data set gathered in-the-wild from English YouTube videos centred around car reviews. It was created with different computational tasks in mind, allowing researchers to improve the machine's understanding of how sentiment and topics are connected.…”

Section: MuSe-CaR (mentioning)

confidence: 99%

“…These pipelines provide timestamps for each word (start and end point of an utterance) through which all words articulated within a segment can be assigned to it. Due to the in-the-wild factors, the error rate of the automatic transcriptions is estimated to be relatively high and specified at around 25 % on a subset of 10 hand-transcribed videos by the authors of [30].…”

Section: MuSe-CaR (mentioning)

confidence: 99%

GraphTMT: Unsupervised Graph-based Topic Modeling from Video Transcripts

Stappen, Jason, Hagerer, et al. 2021. 2021 IEEE Seventh International Conference on Multimedia Big Data (BigMM). Self-citation.

To unfold the tremendous amount of audiovisual data uploaded daily to social media platforms, effective topic modelling techniques are needed. Existing work tends to apply variants of topic models on text data sets. In this paper, we aim at developing a topic extractor on video transcriptions. The model improves coherence by exploiting neural word embeddings through a graph-based clustering method. Unlike typical topic models, this approach works without knowing the true number of topics. Experimental results on the real-life multimodal data set MuSe-CaR demonstrate that our approach extracts coherent and meaningful topics, outperforming baseline methods. Furthermore, we successfully demonstrate the generalisability of our approach on a pure text review data set.

“…Furthermore, we utilize a richly annotated data set of ca. 600 h of continuous annotations (Stappen et al, 2021), and derive cross-task features from this initial correlation analysis. Second, we compare these engineered, lean features, to a computationally intensive feature selection approach and to all features when predicting selected engagement indicators (i.e., views, likes, number of comments, likes of the comments).…”

Section: Introduction (mentioning)

confidence: 99%

An Estimation of Online Video User Engagement From Features of Time- and Value-Continuous, Dimensional Emotions

Stappen, Baird, Lienhart, et al. 2022. Front. Comput. Sci. Self-citation.

Portraying emotion and trustworthiness is known to increase the appeal of video content. However, the causal relationship between these signals and online user engagement is not well understood. This limited understanding is partly due to a scarcity of emotionally annotated data and the varied modalities which express user engagement online. In this contribution, we utilize a large dataset of YouTube review videos which includes ca. 600 h of dimensional arousal, valence and trustworthiness annotations. We investigate features extracted from these signals against various user engagement indicators including views, like/dislike ratio, as well as the sentiment of comments. In doing so, we identify the positive and negative influences which single features have, as well as interpretable patterns in each dimension which relate to user engagement. Our results demonstrate that smaller boundary ranges and fluctuations for arousal lead to an increase in user engagement. Furthermore, the extracted time-series features reveal significant (p < 0.05) correlations for each dimension, such as count below signal mean (arousal), number of peaks (valence), and absolute energy (trustworthiness). From this, an effective combination of features is outlined for approaches aiming to automatically predict several user engagement indicators. In a user engagement prediction paradigm we compare all features against semi-automatic (cross-task) and automatic (task-specific) feature selection methods. These selected feature sets appear to outperform the usage of all features, e.g., using all features achieves 1.55 likes per day (Lp/d) mean absolute error from valence; this improves through semi-automatic and automatic selection to 1.33 and 1.23 Lp/d, respectively (data mean 9.72 Lp/d with a std. of 28.75 Lp/d).
