“…To align with previous studies [20], we use 7,487 utterances covering seven emotions: frustration, neutral, anger, sadness, excitement, happiness, and surprise. Since there is no standard split for this dataset, we follow [20,14] and perform 10-fold cross-validation, with an 8:1:1 ratio for training, validation, and test, respectively. The weighted accuracy (WA, i.e., the overall accuracy) and the unweighted accuracy (UA, i.e., the average accuracy over all emotion categories) are adopted as the evaluation metrics.…”
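
To make the two metrics concrete, here is a minimal sketch of how WA and UA could be computed from label lists. It assumes that the per-category accuracy in UA means the fraction of utterances of each true class that are predicted correctly (per-class recall). The function names and the toy labels are hypothetical and are not taken from the dataset or the cited works.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all utterances classified correctly (overall accuracy)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: mean of per-class accuracies, so every emotion counts equally.

    Assumption: per-class accuracy is computed over the utterances whose
    true label is that class (i.e., per-class recall).
    """
    total = defaultdict(int)  # number of utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy labels for illustration only (hypothetical, not real data).
y_true = ["neutral", "neutral", "neutral", "anger", "sadness", "sadness"]
y_pred = ["neutral", "neutral", "anger", "anger", "neutral", "sadness"]

wa = weighted_accuracy(y_true, y_pred)    # 4 of 6 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of 2/3, 1/1, and 1/2
```

On imbalanced data the two numbers diverge: WA is dominated by the frequent classes, whereas UA weights each emotion equally regardless of how many utterances it has.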