Interspeech 2021
DOI: 10.21437/interspeech.2021-1840

Temporal Context in Speech Emotion Recognition

Cited by 19 publications (7 citation statements)
References 0 publications
“…However, frame-level emotion states need to be recognized to realize our method. While only utterance-level emotion labels are given for most SER datasets, several studies [15,1,20] indicate that frame-level emotion information can still be inferred by training with a segment-based classification objective. Particularly, as shown in Figure 1.a, we finetune wav2vec to extract frame-level emotion representations that are useful for predicting an utterance-level emotion label.…”
Section: Pseudo-Label Task Adaptive Pretraining
confidence: 99%
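The segment-based objective described in the excerpt above can be illustrated with a short sketch: frame-level wav2vec 2.0 representations are each scored by a linear emotion classifier, and the utterance-level label is broadcast to every frame during training. This is a minimal illustration of the idea under stated assumptions, not the cited authors' exact recipe; the checkpoint name, number of emotion classes, and mean-pooled inference rule are illustrative choices.

```python
# Minimal sketch of segment-based training with only utterance-level labels.
# Assumptions (not from the cited works): the "facebook/wav2vec2-base" checkpoint,
# 4 emotion classes, and mean-pooling of frame posteriors at inference time.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model


class SegmentEmotionClassifier(nn.Module):
    def __init__(self, num_emotions: int = 4, model_name: str = "facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_emotions)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz
        frames = self.encoder(waveform).last_hidden_state   # (batch, frames, hidden)
        return self.head(frames)                            # (batch, frames, num_emotions)


model = SegmentEmotionClassifier()
criterion = nn.CrossEntropyLoss()

waveform = torch.randn(2, 16000)       # two 1-second dummy utterances
utt_labels = torch.tensor([1, 3])      # utterance-level emotion labels

frame_logits = model(waveform)         # (2, T, 4)
# Segment-based objective: broadcast the utterance label to every frame.
frame_labels = utt_labels.unsqueeze(1).expand(-1, frame_logits.size(1))
loss = criterion(frame_logits.reshape(-1, 4), frame_labels.reshape(-1))
loss.backward()

# At inference, averaging frame-level posteriors gives an utterance-level prediction,
# while the per-frame scores expose the frame-level emotion information the
# citing work's pseudo-labeling step builds on.
utt_pred = frame_logits.softmax(-1).mean(dim=1).argmax(-1)
```

Broadcasting the utterance label to all frames is one common way to realize a segment-based objective; the citing papers may weight or select segments differently, but the frame-level classifier head is the part that yields per-frame emotion evidence.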
“…Method | Input | Accuracy:
FCN+Attention [3] | Spectrogram | 63.9
Wav2vec w/o. FT [14] | Wav2vec | 64.3
Wav2vec w. FT [15] | Waveform | 66.9
Wav2vec 2.0 w/o. FT [16] | Wav2vec 2.0 | 66.3
Wav2vec 2.0 w. V-FT | Waveform | 69.9
Wav2vec 2.0 w. TAPT | Waveform | 73.5
Wav2vec 2.0 w. P-TAPT | Waveform | …”
Section: Comparison With Prior Work
confidence: 99%