Interspeech 2019
DOI: 10.21437/interspeech.2019-1788

Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks

Abstract: In this work, we present an unsupervised long short-term memory (LSTM) layer normalization technique that we call adaptation by speaker aware offsets (ASAO). These offsets are learned using an auxiliary network attached to the main senone classifier. The auxiliary network takes main network LSTM activations as input and tries to reconstruct speaker, (speaker,phone) and (speaker,senone)-level averages of the activations by minimizing the mean-squared error. Once the auxiliary network is jointly trained with the…
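The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it covers only the speaker-level averages (not the (speaker,phone) or (speaker,senone) levels), it replaces the auxiliary network with a hand-written stand-in vector, and every name in it (`speaker_means`, `mse`, `apply_offset`, `predicted_offset`) is hypothetical. The subtraction step follows the description in the citing work quoted further down, where the offsets are subtracted from main network activations.

```python
def speaker_means(activations, speaker_ids):
    """Per-speaker average of activation vectors (the reconstruction targets)."""
    sums, counts = {}, {}
    for vec, spk in zip(activations, speaker_ids):
        if spk not in sums:
            sums[spk] = [0.0] * len(vec)
            counts[spk] = 0
        sums[spk] = [s + v for s, v in zip(sums[spk], vec)]
        counts[spk] += 1
    return {spk: [s / counts[spk] for s in total] for spk, total in sums.items()}


def mse(predicted, target):
    """Mean-squared error between a predicted offset and its target average."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def apply_offset(activation, offset):
    """Subtract a speaker offset from one activation vector."""
    return [a - o for a, o in zip(activation, offset)]


# Toy activations: three frames, two hidden units, two speakers (made-up data).
activations = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
speaker_ids = ["spk_a", "spk_a", "spk_b"]

targets = speaker_means(activations, speaker_ids)

# Assumption: stand-in for what an auxiliary network might output for spk_a.
predicted_offset = [2.5, 3.0]
loss = mse(predicted_offset, targets["spk_a"])

# Normalize the first frame with the exact speaker average as the offset.
normalized = apply_offset(activations[0], targets["spk_a"])
```

In a real system the offset would come from a trained network rather than from the exact average, so that it can be produced for unseen speakers at test time without labels.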

Cited by 8 publications (11 citation statements)
References 17 publications (27 reference statements)
“…Our earlier work on adaptation by speaker aware offsets [12], [34] can also be grouped under the affine transformation category where speaker embeddings generated through an auxiliary network are used as bias vectors and subtracted from main network activations. As compared to [12], [34], in the current work, we investigate more general affine transformations. We also experiment with adding a nonlinearity to the transformation.…”
Section: A Speaker Adaptation
Citation type: mentioning
confidence: 99%
“…The proposed system for joint speaker adaptation and change detection combines the speaker adaptation scheme of [34] with Siamese change detection [30], and introduces an attention block to the auxiliary network. The attention block allows us to integrate ideas from the Siamese network into the new joint model.…”
Section: Joint Speaker Adaptation and Change Detection
Citation type: mentioning
confidence: 99%