Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413690

CM-BERT

Abstract: Multimodal sentiment analysis is an emerging research field that aims to enable machines to recognize, interpret, and express emotion. Through the cross-modal interaction, we can get more comprehensive emotional characteristics of the speaker. Bidirectional Encoder Representations from Transformers (BERT) is an efficient pre-trained language representation model. Fine-tuning it has obtained new state-of-the-art results on eleven natural language processing tasks like question answering and natural language inf…

Cited by 73 publications (10 citation statements)
References 30 publications

“…In line with similar conclusions from psychology literature surrounding the importance of paralinguistic cues in the communication process, there are a number of studies employing sentiment analysis techniques that suggest a combination of text and audio data may improve classification accuracy, and consequently create a more robust representation of sentiment (Bhaskar et al, 2014; Dair et al, 2021; Houjeij et al, 2012; Yang et al, 2020). Hence, given that prior literature suggests both textual and vocal characteristics of earnings calls to be informative, and that Natural Language Processing literature finds a combination of text and audio to significantly increase classification accuracy, the adhesion of both measures represents a natural future direction for the literature.…”
Section: Discussion (mentioning)
confidence: 57%
“…The gold standard for multimodal sentiment analysis is considered to be the combination of all three communication modalities - text, audio and visual. Various studies have used the combination of all three modalities to define sentiment, showing that the use of a tri-modality model is more robust at classifying sentiment over bi-modal and singular modality models (Bhaskar et al, 2014; Dair et al, 2021; Houjeij et al, 2012; Morency et al, 2011; Poria et al, 2015; Yang et al, 2020). The main advantage of using multimodal classifiers for sentiment classification is the additional behavioural cues provided by the visual and audio data.…”
Section: Multimodal Analysis (mentioning)
confidence: 99%
“…Researchers have been working on different combinations of text, audio, and image modalities that can enhance prediction accuracy [10]. This area of study focuses on various methods of integrating multimodal information, primarily through feature fusion and decision fusion, as outlined in several studies [22][12][5]. The choice of method often depends on the specific application and the nature of the data being analyzed.…”
Section: Introduction (mentioning)
confidence: 99%
“…Emotion recognition in conversations (ERC) is vital and very challenging in the natural human machine interaction [1], intelligent education tutoring [2], and mental health analysis applications [3]. In daily life, humans utter a multi-turn conversation in a natural way which conveys emotion state through language and nonverbal content (e.g., facial expression and body language) [4].…”
Section: Introduction (mentioning)
confidence: 99%