2024
DOI: 10.1007/s11042-023-17944-9

Survey of deep emotion recognition in dynamic data using facial, speech and textual cues

Tao Zhang, Zhenhua Tan

Cited by 1 publication (2 citation statements)
References 165 publications
“…In summary, encoder A consists of computing the log-Mel spectrograms (Equation (8)), reshaping and normalizing the spectrograms (Equation (9)), and encoding emotion using text processing models (Equation (10) or Equation (11)); encoder B fine-tunes pre-trained models (Equation (12) or Equation (13)), and the dual-stream outputs are fused so that the framework is trained towards speech emotion prediction (Equation (14)). The output of the CAF module is defined in Equation (20).…”
Section: Dual-stream Representation of Audio Signals
confidence: 99%
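The quoted passage outlines a two-branch pipeline. Below is a minimal sketch of that general dual-stream pattern, assuming torchaudio for the log-Mel front end, a small Transformer standing in for encoder A's text-processing model, and a wav2vec2 backbone standing in for encoder B's pre-trained model; the layer sizes, pooling, and concatenation fusion are placeholders and do not reproduce the cited paper's Equations (8)-(20) or its CAF module.

```python
# Minimal sketch of the dual-stream pattern summarized above (assumptions,
# not the cited paper's method): torchaudio supplies the log-Mel front end,
# a small Transformer stands in for encoder A's text-processing model, and
# a wav2vec2 backbone stands in for encoder B's pre-trained model.
import torch
import torch.nn as nn
import torchaudio

class DualStreamSER(nn.Module):
    def __init__(self, n_mels=64, d_model=256, n_classes=4):
        super().__init__()
        # Encoder A, step 1: compute log-Mel spectrograms (cf. Eq. (8)).
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=400, hop_length=160, n_mels=n_mels)
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        # Encoder A, step 3: sequence encoding of the normalized frames
        # (a stand-in for the text processing models of Eq. (10)/(11)).
        self.proj_a = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder_a = nn.TransformerEncoder(layer, num_layers=2)
        # Encoder B: a pre-trained speech backbone to fine-tune
        # (cf. Eq. (12)/(13)).
        self.encoder_b = torchaudio.pipelines.WAV2VEC2_BASE.get_model()
        # Fusion + prediction head (cf. Eq. (14)); plain concatenation here,
        # not the paper's CAF module (Eq. (20)).
        self.classifier = nn.Linear(d_model + 768, n_classes)

    def forward(self, wav):                      # wav: (batch, samples) @ 16 kHz
        # Encoder A, step 2: reshape and normalize the spectrogram (cf. Eq. (9)).
        spec = self.to_db(self.melspec(wav))     # (batch, n_mels, frames)
        spec = (spec - spec.mean()) / (spec.std() + 1e-5)
        a = self.encoder_a(self.proj_a(spec.transpose(1, 2))).mean(dim=1)
        feats, _ = self.encoder_b.extract_features(wav)
        b = feats[-1].mean(dim=1)                # pool last-layer features
        return self.classifier(torch.cat([a, b], dim=-1))
```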
“…Many multi-modal frameworks [10] have been developed for automated emotion recognition, since multi-modal representation offers the potential for a thorough and nuanced understanding of emotional states. Liu et al. explore peripheral physiological signals, EEG, and facial videos, proposing emotion dictionary learning with modality attention for mixed emotion recognition [11].…”
Section: Introduction
confidence: 99%
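As a generic illustration of the modality-attention idea mentioned in the quoted statement, the sketch below learns a scalar attention score per modality embedding and fuses the streams by a weighted sum. This is a common fusion pattern, not Liu et al.'s emotion dictionary learning method [11]; the class name and dimensions are hypothetical.

```python
# Generic attention-weighted fusion over per-modality embeddings
# (e.g., physiological signals, EEG, facial video). Hypothetical names
# and sizes; not the method of [11].
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # One scalar attention score per modality, from its embedding.
        self.score = nn.Linear(dim, 1)

    def forward(self, embeddings):
        # embeddings: (batch, n_modalities, dim), one row per modality.
        weights = torch.softmax(self.score(embeddings), dim=1)  # (B, M, 1)
        return (weights * embeddings).sum(dim=1)                # (B, dim)

# Usage: fuse three 128-d modality embeddings for a batch of 8 samples.
fusion = ModalityAttentionFusion(dim=128)
fused = fusion(torch.randn(8, 3, 128))
print(fused.shape)  # torch.Size([8, 128])
```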