2021
DOI: 10.1109/taslp.2021.3049898

CTNet: Conversational Transformer Network for Emotion Recognition

Cited by 142 publications (79 citation statements)
References 49 publications
“…Table 3 compares the performance of the proposed model with the existing studies that also implemented the multimodal architecture and tested it on the MELD. Most of the previous studies [36, 39] only considered the audio and text modalities. However, the study by Siriwardhana et al [38] proposed a multimodal fusion model for combining the modality of audio, face, and text and achieved state-of-the-art results.…”
Section: Results (mentioning)
confidence: 99%
“…The transformer is a network architecture that purely depends on the attention mechanism without any recurrent structure [35]. The latest studies focused on using attention mechanisms to fuse different modalities of features for MMER [36, 37, 38, 39]. Ho et al [36] proposed a multimodal approach based on a multilevel multi-head fusion attention mechanism and RNN to combine audio and text modalities for emotion estimation.…”
Section: Related Studies (mentioning)
confidence: 99%
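The quoted passage describes fusing modalities with attention rather than recurrence. The sketch below shows one such fusion step, assuming text features act as queries over audio features through single-head scaled dot-product attention, with a residual sum as the fusion rule. It is not the CTNet or Ho et al. architecture: real models add learned query/key/value projections and multiple heads, and the function name `cross_modal_attention` and the feature shapes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_attention(text: np.ndarray, audio: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Single-head scaled dot-product attention where text frames query audio frames.

    text:  (T_text, d) query features (hypothetical shapes)
    audio: (T_audio, d) key/value features
    Returns fused (T_text, d) features and the (T_text, T_audio) attention weights.
    """
    d = text.shape[-1]
    scores = text @ audio.T / np.sqrt(d)   # similarity of every text/audio frame pair
    weights = softmax(scores, axis=-1)     # each text row sums to 1 over audio frames
    attended = weights @ audio             # audio summary for each text frame
    fused = text + attended                # residual fusion (an assumption, not from the source)
    return fused, weights


rng = np.random.default_rng(0)
text_feats = rng.standard_normal((4, 8))    # hypothetical: 4 text tokens, 8-dim features
audio_feats = rng.standard_normal((10, 8))  # hypothetical: 10 audio frames, 8-dim features
fused, attn = cross_modal_attention(text_feats, audio_feats)
print(fused.shape, attn.shape)  # (4, 8) (4, 10)
```

Because no recurrence is involved, every text frame attends to every audio frame in one matrix product, which is the property the quote attributes to transformer-style fusion.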
“…For categorical emotion recognition, most of the state-of-the-arts utilize Accuracy (or called Recall) [96] and F1 score to evaluate the performance of models. Here we suppose there are C emotion classes in a dataset.…”
Section: Categorical (mentioning)
confidence: 99%
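The quoted passage names Accuracy and F1 score over C emotion classes without giving formulas. The sketch below computes them, assuming integer labels in 0..C-1 and an F1 averaged per class and weighted by each class's support; that averaging choice is common in emotion recognition but is an assumption here, not something the quote states. Accuracy is the overall fraction of correctly classified utterances, which coincides with micro-averaged recall. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of utterances whose predicted class matches the label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> list[float]:
    """F1 = 2TP / (2TP + FP + FN) for each class; 0.0 when the class never occurs."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred, num_classes)
    return sum(f1[c] * support[c] for c in range(num_classes)) / len(y_true)


labels = [0, 0, 1, 1, 2, 2]  # hypothetical gold labels, C = 3
preds = [0, 1, 1, 1, 2, 0]   # hypothetical model predictions
print(round(accuracy(labels, preds), 4))        # 0.6667
print(round(weighted_f1(labels, preds, 3), 4))  # 0.6556
```

Weighting by support keeps frequent classes from being drowned out by rare ones in the average; a macro average (plain mean of the per-class scores) would instead treat every class equally.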