2021 29th European Signal Processing Conference (EUSIPCO)
DOI: 10.23919/eusipco54536.2021.9616048
Study On the Temporal Pooling Used In Deep Neural Networks For Speaker Verification

Cited by 3 publications (2 citation statements)
References 12 publications
“…where E_G[·] is the expectation over G, µ the empirical mean, and ∥ concatenation. That is, P(g) ∈ ℝ^(R·d) concatenates the first R moments of g. In speaker verification, [20] shows that the 3rd and 4th moments alone are not useful, while [21] uses R = 4 for auxiliary tasks. In our case, we feed R = 5 moments to the classifier.…”
Section: Our Methods (mentioning)
confidence: 99%
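The moment pooling described in the excerpt above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes frame-level features g of shape (T, d), takes the empirical mean as the first moment, and (an assumption, since the excerpt does not say) uses central moments for orders r ≥ 2 before concatenating all R of them into a single R·d vector.

```python
import numpy as np

def moment_pooling(g, R=5):
    """Concatenate the first R temporal moments of g (T x d) into one (R*d,) vector.

    Sketch of the pooling in the quoted excerpt. Assumption: the mean is the
    first moment and orders r >= 2 are central moments E[(g - mu)^r].
    """
    mu = g.mean(axis=0)                               # first moment: empirical mean, shape (d,)
    moments = [mu]
    for r in range(2, R + 1):
        moments.append(((g - mu) ** r).mean(axis=0))  # r-th central moment, shape (d,)
    return np.concatenate(moments)                    # shape (R * d,)

T, d = 100, 4
g = np.random.default_rng(0).normal(size=(T, d))
p = moment_pooling(g, R=5)
```

With R = 5 and d = 4 the pooled vector has 20 components, matching the ℝ^(R·d) dimensionality stated in the excerpt.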
“…It is useful to note that the statistics pooling layer computes the mean and standard-deviation vectors of its input and concatenates them to form the output vector. Both mean and standard-deviation pooling were found to outperform max pooling in speaker identification and verification [86]. Since speaker identification is similar to language identification in terms of NN-based model configuration, training, and evaluation, the statistics pooling layer is expected to achieve higher LID performance than the max pooling layer.…”
Section: X-vector Self-attention LID Model (mentioning)
confidence: 99%
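The statistics pooling operation described in the excerpt above reduces to a few lines. A minimal sketch, assuming frame-level features x of shape (T, d); the max-pooling baseline is included only for the shape comparison and is not drawn from the cited systems:

```python
import numpy as np

def statistics_pooling(x):
    """Mean + standard-deviation pooling over time: (T, d) -> (2*d,)."""
    return np.concatenate([x.mean(axis=0), x.std(axis=0)])

def max_pooling(x):
    """Per-dimension max over time, for comparison: (T, d) -> (d,)."""
    return x.max(axis=0)

x = np.random.default_rng(1).normal(size=(200, 3))
s = statistics_pooling(x)   # 2*d = 6 components
m = max_pooling(x)          # d = 3 components
```

Statistics pooling doubles the output dimensionality relative to max pooling because it keeps both a location (mean) and a spread (standard deviation) summary of each feature dimension.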