ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054317
X-Vectors Meet Emotions: A Study On Dependencies Between Emotion and Speaker Recognition

Abstract: In this work, we explore the dependencies between speaker recognition and emotion recognition. We first show that knowledge learned for speaker recognition can be reused for emotion recognition through transfer learning. Then, we show the effect of emotion on speaker recognition. For emotion recognition, we show that using a simple linear model is enough to obtain good performance on the features extracted from pre-trained models such as the x-vector model. Then, we improve emotion recognition performance by f…
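The transfer-learning setup sketched in the abstract uses a pre-trained speaker model as a fixed feature extractor and trains only a simple linear classifier for emotion. Below is a minimal sketch of that idea under assumed inputs: the randomly generated arrays stand in for pre-extracted 512-d x-vectors and emotion labels, and are not data from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for pre-extracted x-vectors (one 512-d embedding
# per utterance) and their emotion labels; in practice these would come from a
# pre-trained speaker model run over the emotion corpus.
rng = np.random.default_rng(0)
xvectors = rng.standard_normal((1000, 512))
emotion_labels = rng.integers(0, 4, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    xvectors, emotion_labels, test_size=0.2, stratify=emotion_labels, random_state=0
)

# The transfer step: the speaker network stays frozen; only this linear
# classifier is trained on its embeddings for the emotion task.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("emotion accuracy:", clf.score(X_test, y_test))
```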


Citation classifications: 3 supporting, 58 mentioning, 0 contrasting
Cited by 83 publications (61 citation statements)
References 33 publications
“…As mentioned before, x-vectors are DNN speaker embeddings that have seen a growing use in speaker recognition and paralinguistic tasks [16]. While i-vectors represent the total variability subspace of a channel or speaker, x-vectors aim to represent discriminative features between speakers.…”
Section: X-vector Extraction (mentioning)
confidence: 99%
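The excerpt above contrasts i-vectors with x-vector embeddings. As a rough illustration of what an x-vector extractor looks like, here is a simplified sketch: TDNN (dilated 1-D convolution) frame-level layers over filterbank frames, statistics pooling across time, and a segment-level layer whose output serves as the embedding. Layer sizes are assumptions for illustration, not the exact configuration of the cited system.

```python
import torch
import torch.nn as nn

class XVectorSketch(nn.Module):
    def __init__(self, feat_dim=40, embed_dim=512, n_speakers=1000):
        super().__init__()
        # Frame-level TDNN layers (dilated 1-D convolutions over time).
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        # Statistics pooling concatenates mean and std, doubling the dimension.
        self.segment = nn.Linear(2 * 1500, embed_dim)
        self.classifier = nn.Linear(embed_dim, n_speakers)

    def forward(self, feats):                            # feats: (batch, frames, feat_dim)
        h = self.frame_layers(feats.transpose(1, 2))     # (batch, 1500, frames')
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)
        embedding = self.segment(stats)                  # the "x-vector"
        return embedding, self.classifier(torch.relu(embedding))

# Toy usage: two utterances of 200 frames of 40-d features.
emb, logits = XVectorSketch()(torch.randn(2, 200, 40))
```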
“…Proposed by Snyder, x-vectors [13] are discriminative DNN speaker embeddings that have outperformed i-vectors in tasks such as speaker and language recognition [14,15]. Recent advances suggest that x-vectors have been successfully applied to paralinguistic tasks such as emotion recognition [16], and to the detection of diseases like Obstructive Sleep Apnea [17] and Alzheimer's [18]. Following the line of research present in [11] and [12], we investigate the reliability of using x-vector speaker embeddings as features for automatic intelligibility prediction in the context of HNC.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this paper, 40-dimensional (-d) FBKs with a 10 ms frame duration and 25 ms frame length are used, which is denoted FBK25. FBK features have information about the short-term spectrum but do not contain pitch information that can be important in describing emotional speech [20] and is often complementary to FBKs [21,22]. The log pitch frequency features with probability-of-voicing-weighted mean subtraction over a 1.5 second window are used along with FBKs [23].…”
Section: Audio Features (mentioning)
confidence: 99%
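The feature recipe quoted above (40-d log filterbanks with a 25 ms window and 10 ms shift, plus log pitch with probability-of-voicing-weighted mean subtraction over a roughly 1.5 second window) could be approximated as in the sketch below. This uses librosa rather than the cited paper's Kaldi pipeline; the pitch range, smoothing details, and parameter values are assumptions for illustration.

```python
import numpy as np
import librosa

def fbk25_with_pitch(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    win, hop = int(0.025 * sr), int(0.010 * sr)   # 25 ms window, 10 ms shift

    # 40-dimensional log mel filterbank features ("FBK25").
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=win, win_length=win,
                                         hop_length=hop, n_mels=40)
    fbk = np.log(mel + 1e-10).T                                   # (frames, 40)

    # Pitch track plus voicing probability on the same frame grid.
    f0, _, pov = librosa.pyin(y, fmin=60, fmax=400, sr=sr,
                              frame_length=win * 4, hop_length=hop)
    log_f0 = np.log(np.nan_to_num(f0, nan=1.0))                   # 0 for unvoiced frames

    # Probability-of-voicing-weighted running mean over ~1.5 s (150 frames).
    half = 75
    norm_f0 = np.empty_like(log_f0)
    for t in range(len(log_f0)):
        lo, hi = max(0, t - half), min(len(log_f0), t + half)
        w = pov[lo:hi] + 1e-6
        norm_f0[t] = log_f0[t] - np.sum(w * log_f0[lo:hi]) / np.sum(w)

    # Append the normalized log pitch to the filterbanks.
    n = min(len(fbk), len(norm_f0))
    return np.hstack([fbk[:n], norm_f0[:n, None]])                # (frames, 41)
```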
“…Although DNNs are already outperforming traditional approaches [21], that is not true for all tasks and data sets [22]. This has led the community to adopt transfer learning approaches, starting from feature-based [23] and recently moving to DL approaches [24,25,26]. Hence, understanding how transfer learning works could lead to the design of more powerful algorithms that unlock the full potential of DL for SER, and other low-resource audio tasks.…”
Section: Introduction (mentioning)
confidence: 99%
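The excerpt above distinguishes feature-based transfer from deep-learning transfer for SER. A minimal sketch of the latter is: reuse a network trained for speaker recognition, attach a new emotion head, and fine-tune end to end with a smaller learning rate on the reused layers. The backbone, checkpoint path, and layer sizes below are hypothetical stand-ins, not the cited papers' models.

```python
import torch
import torch.nn as nn

# Stand-in backbone; in practice this would be the pre-trained speaker network.
backbone = nn.Sequential(nn.Linear(40, 512), nn.ReLU(), nn.Linear(512, 512))
# backbone.load_state_dict(torch.load("speaker_model.pt"))  # hypothetical checkpoint
emotion_head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 4))

# Deep transfer: fine-tune everything, but update the reused layers gently.
optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": emotion_head.parameters(), "lr": 1e-3},
])
criterion = nn.CrossEntropyLoss()

def train_step(feats, labels):
    logits = emotion_head(backbone(feats))
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random features and labels.
print(train_step(torch.randn(8, 40), torch.randint(0, 4, (8,))))
```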