ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054151

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

Abstract: This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) multiscale convolution (MSCNN) is adopted in the frame-level layers to capture complementary speaker information in different receptive fields; (2) a Baum-Welch statistics attention (BWSA) mechanism is applied in the pooling layer, which can integrate more useful long-term speaker characteristics in the te…
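
The multiscale idea in improvement (1) can be illustrated with a minimal NumPy sketch. It assumes parallel 1-D convolution branches over the frame axis, each with a different kernel size, whose ReLU outputs are concatenated along the channel axis. The branch layout, the kernel sizes, and the helper names `conv1d_same` and `multiscale_conv` are hypothetical; the abstract does not specify the actual MSCNN architecture.

```python
import numpy as np


def conv1d_same(x, w):
    """1-D convolution over time (cross-correlation, as in most DNN toolkits)
    with zero 'same' padding.

    x: (T, C_in) frame-level features; w: (K, C_in, C_out) kernel, K odd.
    Returns (T, C_out).
    """
    K = w.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    T = x.shape[0]
    # Stack the K-frame window centred on each frame: (T, K, C_in)
    windows = np.stack([xp[t:t + K] for t in range(T)])
    # Sum over kernel position k and input channel c: (T, C_out)
    return np.einsum("tkc,kco->to", windows, w)


def multiscale_conv(x, kernels):
    """Hypothetical multiscale block: one branch per kernel (each kernel size
    gives a different temporal receptive field), ReLU on each branch, then
    concatenation of the branch outputs along the channel axis."""
    outs = [np.maximum(conv1d_same(x, w), 0.0) for w in kernels]
    return np.concatenate(outs, axis=1)
```

For example, with 24-dimensional input frames and three branches of kernel size 1, 3 and 5 (8 output channels each), `multiscale_conv` maps a `(T, 24)` input to a `(T, 24)` output in which each group of 8 channels has seen a different temporal context.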

Cited by 7 publications (2 citation statements)
References: 26 publications

“…• Single-head Baum-Welch statistics attention mechanism based statistics pooling [116]: To overcome the weakness of (12), which cannot fully mine the inner relationship between an utterance and its frames, [116] integrated the Baum-Welch statistics into the attention mechanism:…”
Section: Attention Pooling Methods (mentioning)
Confidence: 99%
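
For context, a generic attention-based statistics pooling layer can be sketched as follows. It assumes one key vector per frame, scored against a single query vector with scaled dot products, and returns the attention-weighted mean and standard deviation of the frame-level features. The keys here are an arbitrary input: [116] instead derives its key matrix from Baum-Welch statistics, and this sketch reproduces neither that construction nor the exact form of (12). The names `attentive_stats_pooling`, `keys` and `q` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def attentive_stats_pooling(h, keys, q, eps=1e-8):
    """Generic attentive statistics pooling (not the BWSA of [116]).

    h:    (T, D) frame-level features
    keys: (T, d) one key vector per frame (placeholder for any key source)
    q:    (d,)   query vector
    Returns the concatenated weighted mean and weighted std, shape (2D,).
    """
    d = keys.shape[1]
    alpha = softmax(keys @ q / np.sqrt(d))   # (T,) frame weights summing to 1
    mean = alpha @ h                         # weighted mean, (D,)
    var = alpha @ (h ** 2) - mean ** 2       # weighted variance, (D,)
    std = np.sqrt(np.maximum(var, eps))      # clamp to avoid sqrt of tiny negatives
    return np.concatenate([mean, std])
```

With an all-zero query every frame gets weight 1/T, so the layer reduces to plain (unweighted) statistics pooling; a learned query lets informative frames dominate the utterance-level statistics.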
“…The key matrix $K$ is calculated from the Baum-Welch statistics. Specifically, [116] first calculates the normalized first-order statistics $f_c$ from the $c$-th component of a GMM-UBM model $\Omega$ (see (5)), and then conducts the following nonlinear transform:…”
Section: Attention Pooling Methods (mentioning)
Confidence: 99%
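
The Baum-Welch statistics referred to above can be sketched for a diagonal-covariance GMM-UBM. The sketch computes frame posteriors $\gamma_t(c)$, zeroth-order statistics $N_c = \sum_t \gamma_t(c)$ and first-order statistics $F_c = \sum_t \gamma_t(c)\,x_t$, and assumes one common convention for the normalized first-order statistics, $f_c = F_c / N_c - \mu_c$ (divided by the occupancy and centred on the UBM mean). The exact definition in (5) and the nonlinear transform applied afterwards are not reproduced; the function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, mu, var):
    """Log-density of every frame under every diagonal Gaussian.

    x: (T, D) frames; mu, var: (C, D) UBM means and variances.
    Returns (T, C).
    """
    diff = x[:, None, :] - mu[None, :, :]                     # (T, C, D)
    mahal = np.sum(diff ** 2 / var[None], axis=2)             # (T, C)
    log_norm = np.sum(np.log(2 * np.pi * var), axis=1)        # (C,)
    return -0.5 * (mahal + log_norm[None, :])


def baum_welch_stats(x, weights, mu, var, eps=1e-10):
    """Zeroth-order stats N_c and normalized first-order stats f_c.

    weights: (C,) UBM mixture weights. Returns N: (C,), f: (C, D).
    Assumed convention: f_c = F_c / N_c - mu_c.
    """
    log_p = np.log(weights)[None, :] + log_gauss_diag(x, mu, var)  # (T, C)
    log_p -= log_p.max(axis=1, keepdims=True)    # stabilise before exp
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)    # posteriors gamma_t(c)
    N = gamma.sum(axis=0)                        # occupancy per component
    F = gamma.T @ x                              # raw first-order stats, (C, D)
    f = F / np.maximum(N, eps)[:, None] - mu     # normalize, then centre
    return N, f
```

Since each frame's posteriors sum to one, the occupancies $N_c$ always sum to the number of frames $T$; with a single-component UBM, $f_1$ is simply the utterance mean minus the UBM mean.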