2019
DOI: 10.1609/aaai.v33i01.33017216

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

Abstract: Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by…


Citations: cited by 285 publications (195 citation statements)
References: 15 publications (18 reference statements)
“…We compare HFFN with the following multimodal algorithms: RMFN (Liang et al., 2018a), MFN (Zadeh et al., 2018a), MCTN (Pham et al., 2019), BC-LSTM (Poria et al., 2017b), TFN, MARN (Zadeh et al., 2018b), LMF, MFM (Tsai et al., 2019), MR-RF (Barezi et al., 2018), FAF (Gu et al., 2018b), RAVEN (Wang et al., 2019), GMFN (Zadeh et al., 2018c), Memn2n (Sukhbaatar et al., 2015), MM-B2, CHFusion (Majumder et al., 2018), SVM Trees (Rozgic et al., 2012), CMN, C-MKL (Poria et al., 2016b) and CAT-LSTM (Poria et al., 2017c).…”
Section: Comparison With Baselines (mentioning)
confidence: 99%
“…Despite the popularity of this strategy, it has been shown to fail to fully capture cross-modal interactions [14,15]. Consequently, several multimodal feature representation strategies have been proposed for various applications [16,14,15,17]. Our work continues this line of research by investigating multimodal feature representation strategies for spoken words, as evaluated on the task of word importance prediction.…”
Section: Related Work (mentioning)
confidence: 97%
“…Rather than considering these two modalities as independent observations of speech, we focus on their cross-modal interaction to obtain a unified representation. We recognize that non-verbal cues during face-to-face communication help shape how humans understand spoken words [17]. Prosody is one such channel in spoken dialogue that is important in conversational speech, where speakers attach prosodic prominence to words (or sub-word components) to help listeners disambiguate meaning [24,25,26].…”
Section: Related Work (mentioning)
confidence: 99%
“…The experimental results show that the Bi-LSTM framework prevails over the traditional HMM framework. Wang et al. [23] applied the recurrent attended variation embedding network (RAVEN) to multimodal emotion recognition, where an LSTM is used to extract features from each single modality. The multimodal emotion studies above utilized deep neural network models, and the results outperformed the traditional methods.…”
Section: Related Work (mentioning)
confidence: 99%
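
The citation statements above summarize the paper's core idea: nonverbal behaviors (visual and acoustic) are encoded with per-modality LSTMs and then used to "shift" the representation of each spoken word. The sketch below is a minimal, illustrative reading of that description, not the authors' released code; all layer names, feature dimensions (300-d word vectors, 47-d visual, 74-d acoustic), and the simple sigmoid gating are assumptions chosen only to make the idea concrete.

```python
# Minimal sketch (assumed dimensions and layer names, not the authors' code):
# per-modality LSTMs encode nonverbal behavior sequences, gates conditioned on
# the word mix them into a "shift" vector, and the shift is added to the
# original word embedding to produce a context-adjusted representation.
import torch
import torch.nn as nn


class NonverbalShift(nn.Module):
    def __init__(self, word_dim=300, visual_dim=47, acoustic_dim=74, hidden=32):
        super().__init__()
        # Nonverbal sub-networks: one LSTM per nonverbal modality.
        self.visual_lstm = nn.LSTM(visual_dim, hidden, batch_first=True)
        self.acoustic_lstm = nn.LSTM(acoustic_dim, hidden, batch_first=True)
        # Gates conditioned on the word embedding and each nonverbal embedding.
        self.visual_gate = nn.Linear(word_dim + hidden, 1)
        self.acoustic_gate = nn.Linear(word_dim + hidden, 1)
        # Project the gated nonverbal mixture into the word-embedding space.
        self.shift_proj = nn.Linear(2 * hidden, word_dim)

    def forward(self, word_emb, visual_seq, acoustic_seq):
        # word_emb: (batch, word_dim); *_seq: (batch, time, feat_dim)
        _, (h_v, _) = self.visual_lstm(visual_seq)
        _, (h_a, _) = self.acoustic_lstm(acoustic_seq)
        h_v, h_a = h_v[-1], h_a[-1]  # final hidden state per modality
        w_v = torch.sigmoid(self.visual_gate(torch.cat([word_emb, h_v], dim=-1)))
        w_a = torch.sigmoid(self.acoustic_gate(torch.cat([word_emb, h_a], dim=-1)))
        shift = self.shift_proj(torch.cat([w_v * h_v, w_a * h_a], dim=-1))
        # Dynamically adjusted ("shifted") word representation.
        return word_emb + shift


# Usage: shift a batch of 4 word vectors using 20-step nonverbal sequences.
model = NonverbalShift()
shifted = model(torch.randn(4, 300), torch.randn(4, 20, 47), torch.randn(4, 20, 74))
print(shifted.shape)  # torch.Size([4, 300])
```

The additive-shift design keeps the verbal embedding as the anchor and lets the nonverbal context move it only as much as the gates allow, which matches the intuition in the abstract that word meaning should vary with vocal patterns and facial expressions rather than be replaced by them.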