ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747397

MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations

Cited by 54 publications (24 citation statements)
References 21 publications

“…MMGCN [13] constructs a fully connected graph to model multimodal and long-distance contextual information, and speaker embeddings are added for encoding speaker information. MM-DFN [17] designs a graph-based dynamic fusion module to reduce redundancy and enhance complementarity between modalities. MMTr [24] preserves the integrity of main modal representations and enhances weak modal representations by using multi-head attention.…”
Section: A. Emotion Recognition in Conversations (confidence: 99%)
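
The graph-based designs mentioned in this statement can be made concrete with a short sketch. Below is a minimal, MMGCN-style single graph-convolution step over a fully connected conversation graph, with learned speaker embeddings added to the utterance features; the dimensions, names, and single-layer design are illustrative assumptions, not the cited papers' actual implementations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullyConnectedGraphLayer(nn.Module):
    """One graph-convolution step over a fully connected conversation graph.

    Illustrative sketch only: node features are utterance embeddings from
    one modality, speaker identity is injected via a learned embedding,
    and messages are averaged over all utterances in the dialogue.
    """
    def __init__(self, feat_dim: int, num_speakers: int):
        super().__init__()
        self.speaker_emb = nn.Embedding(num_speakers, feat_dim)
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, x: torch.Tensor, speakers: torch.Tensor) -> torch.Tensor:
        # x: (num_utterances, feat_dim); speakers: (num_utterances,)
        h = x + self.speaker_emb(speakers)          # encode speaker information
        # Fully connected graph: every utterance node receives messages
        # from every other node, capturing long-distance context.
        adj = torch.ones(h.size(0), h.size(0))
        adj = adj / adj.sum(dim=1, keepdim=True)    # row-normalize the adjacency
        return F.relu(self.proj(adj @ h))           # aggregate, then transform

# Toy usage: a dialogue of 5 utterances, 128-d features, 2 speakers.
layer = FullyConnectedGraphLayer(feat_dim=128, num_speakers=2)
utts = torch.randn(5, 128)
spk = torch.tensor([0, 1, 0, 1, 0])
out = layer(utts, spk)
print(out.shape)  # torch.Size([5, 128])
```
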
“…MM-DFN [17]: It designs a graph-based dynamic fusion module to fuse multimodal context features, and this module can reduce redundancy and enhance complementarity between modalities.…”
Section: Baselines (confidence: 99%)
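
To give a rough feel for how fusion can reduce cross-modal redundancy, here is a minimal gated-fusion sketch; the sigmoid gate is a generic stand-in chosen for illustration, not MM-DFN's actual dynamic fusion module.

```python
import torch
import torch.nn as nn

class GatedModalityFusion(nn.Module):
    """Gated fusion of two modality streams (generic stand-in, not MM-DFN).

    A sigmoid gate decides, per feature dimension, how much of each modality
    to keep, suppressing redundant information shared by both streams while
    letting complementary features through.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, a: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # a, t: (batch, dim) acoustic and textual utterance features.
        g = torch.sigmoid(self.gate(torch.cat([a, t], dim=-1)))
        fused = g * a + (1.0 - g) * t   # per-dimension convex combination
        return self.out(fused)

# Toy usage with a batch of 4 utterances.
fusion = GatedModalityFusion(dim=128)
acoustic = torch.randn(4, 128)
textual = torch.randn(4, 128)
print(fusion(acoustic, textual).shape)  # torch.Size([4, 128])
```
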
“…It can be observed that SpeechFormer++ with HuBERT features noticeably outperforms the previous works by a large margin of +3.1% WF1 and +1.2% WA. When compared under the hand-crafted features, SpeechFormer++ outperforms ConGCN [78], MMFA-RNN [19], MM-DFN [18] and CTNet [37] in terms of WF1. Note that SpeechFormer++ is simply applied in MELD and does not utilize the context and speaker information.…”
Section: B. Speech Emotion Recognition on MELD (confidence: 99%)
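
For context, frame-level HuBERT features of the kind fed to SpeechFormer++ can be extracted with the Hugging Face transformers library; the checkpoint name and the dummy input below are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoFeatureExtractor, HubertModel

# Illustrative checkpoint; the paper's exact HuBERT variant may differ.
ckpt = "facebook/hubert-base-ls960"
extractor = AutoFeatureExtractor.from_pretrained(ckpt)
model = HubertModel.from_pretrained(ckpt).eval()

# One second of dummy 16 kHz audio standing in for an utterance.
waveform = torch.randn(16000)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, num_frames, 768)

print(hidden.shape)  # roughly 50 frames per second of audio
```
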
“…Recently, deep learning methods have delivered superior performance for PSP tasks owing to their remarkable modeling capabilities. For example, convolutional neural networks (CNNs) [10]-[16], graph neural networks (GNNs) [17], [18], recurrent neural networks (RNNs) [19]-[21], and two popular variants of RNNs, long short-term memory (LSTM) [22]-[24] and gated recurrent units (GRUs) [25], have achieved promising results in the PSP domain.…”
Section: Introduction (confidence: 99%)