2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003883

Online Batch Normalization Adaptation for Automatic Speech Recognition

Cited by 5 publications (4 citation statements). References: 22 publications.
“…Mana et al [161] showed that batch normalization layers can also be updated by recomputing the statistics μ and σ² in an online fashion.…”
Section: Structured Transforms (mentioning)
confidence: 99%
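
The online re-estimation of the BN statistics μ and σ² that this citation describes can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed names (reestimate_bn_statistics, adaptation_batches, the momentum value), not the recipe from the paper: the trained weights are frozen and unlabeled adaptation data is pushed through the network so that only the BatchNorm running statistics are refreshed.

```python
import torch
import torch.nn as nn

def reestimate_bn_statistics(model, adaptation_batches, momentum=0.1):
    """Refresh BatchNorm running mean/variance from unlabeled adaptation data
    while leaving all trained weights untouched (illustrative sketch)."""
    model.train()                              # BN layers update running stats in train mode
    for p in model.parameters():
        p.requires_grad_(False)                # weights stay fixed; only BN buffers change
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.momentum = momentum              # weight given to each new batch estimate
    with torch.no_grad():
        for x in adaptation_batches:           # e.g. feature tensors of shape (batch, feat_dim)
            model(x)                           # forward pass refreshes running_mean / running_var
    model.eval()                               # decoding then uses the adapted statistics
    return model
```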
“…[117]-[119], [180], [228], [231], [248], [291]; Embedding: Hybrid [56], [57], [61], [74], [130], [132], [138], [148], [150], [153], [159], [161], [168], [213], [292], E2E [62], [128], [217]; Feature: Hybrid [56], [74], [75], [135], [138], [168], [230], [285], [289], [290], [293]; Data: Hybrid [116], [193] embedding classes. The overall RERR is 9.72%.…”
Section: Level (mentioning)
confidence: 99%
“…Zhang et al [35] parametrized activation functions (ReLU and Sigmoid) while making some of these parameters SD. Wang et al [33] and Mana et al [20] repurposed the scales and offsets of batch normalization as SD parameters. Zhao et al [37,38] found that most of the information in SD FC layers is stored in the diagonals of the weight matrices, and they proposed Low-Rank Plus Diagonal (LRPD, eLRPD) approaches which decompose (factorize) the original weight matrix of an FC layer into its diagonal and several smaller matrices.…”
Section: Related Work (mentioning)
confidence: 99%
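
The idea of keeping only the batch-normalization scales (gamma) and offsets (beta) as speaker-dependent parameters can be illustrated with the following PyTorch sketch. It is an assumption-laden illustration of the general technique described in the quote, not a reproduction of the cited recipes; the function name and the commented fine-tuning loop are hypothetical.

```python
import torch
import torch.nn as nn

def speaker_dependent_bn_parameters(model):
    """Freeze the speaker-independent network and expose only the BN
    scale (gamma) and offset (beta) as speaker-dependent parameters."""
    for p in model.parameters():
        p.requires_grad_(False)
    sd_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)) and m.affine:
            m.weight.requires_grad_(True)      # gamma: per-feature scale
            m.bias.requires_grad_(True)        # beta: per-feature offset
            sd_params += [m.weight, m.bias]
    return sd_params

# Hypothetical per-speaker fine-tuning over a few adaptation utterances:
# optimizer = torch.optim.SGD(speaker_dependent_bn_parameters(model), lr=1e-3)
# for feats, targets in speaker_data:
#     loss = torch.nn.functional.cross_entropy(model(feats), targets)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Because only two vectors per normalized layer are adapted, the speaker-dependent footprint stays small compared with fine-tuning full weight matrices.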
“…A small speaker code is another method, learnt for each speaker via the back-propagation algorithm to transform the speaker features into a general speaker-independent feature space, in parallel with learning the large generic neural network [115,116]. [117,118] adapt the linear transformations in batch normalization to match the hidden-layer input distribution between training and test data, so as to account for different speaker input distributions. Other than batch normalization, sigmoid and rectified linear unit (ReLU) parameters are generalised for adaptation as well [119].…”
Section: Model Adaptation (mentioning)
confidence: 99%
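
As a rough illustration of the speaker-code idea mentioned in this excerpt, the sketch below concatenates a small learned per-speaker embedding to every input frame of a generic feed-forward acoustic model; the class name, dimensions, and topology are illustrative assumptions rather than the setup of [115,116].

```python
import torch
import torch.nn as nn

class SpeakerCodeModel(nn.Module):
    """Generic acoustic model with a small per-speaker code learned by
    back-propagation alongside the shared network (illustrative sketch)."""
    def __init__(self, feat_dim=40, code_dim=8, num_speakers=100,
                 hidden_dim=512, num_targets=2000):
        super().__init__()
        self.codes = nn.Embedding(num_speakers, code_dim)    # one small code per speaker
        self.net = nn.Sequential(
            nn.Linear(feat_dim + code_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_targets),
        )

    def forward(self, frames, speaker_id):
        # frames: (batch, feat_dim); speaker_id: (batch,) long tensor of speaker indices
        code = self.codes(speaker_id)                        # (batch, code_dim)
        return self.net(torch.cat([frames, code], dim=-1))
```

For an unseen test speaker, only the corresponding code vector would be estimated by back-propagation while the shared layers stay fixed.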