2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003883

Online Batch Normalization Adaptation for Automatic Speech Recognition

Cited by 5 publications (4 citation statements). References: 22 publications.
“…Mana et al [161] showed that batch normalization layers can also be updated by recomputing the statistics μ and σ² in an online fashion.…”
Section: Structured Transforms (mentioning)
confidence: 99%
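
The online re-estimation of the BN statistics μ and σ² that this citation describes can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed names (reestimate_bn_statistics, adaptation_batches, the momentum value), not the recipe from the paper: the trained weights are frozen and unlabeled adaptation data is pushed through the network so that only the BatchNorm running statistics are refreshed.

```python
import torch
import torch.nn as nn

def reestimate_bn_statistics(model, adaptation_batches, momentum=0.1):
    """Refresh BatchNorm running mean/variance from unlabeled adaptation data
    while leaving all trained weights untouched (illustrative sketch)."""
    model.train()                              # BN layers update running stats in train mode
    for p in model.parameters():
        p.requires_grad_(False)                # weights stay fixed; only BN buffers change
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.momentum = momentum              # weight given to each new batch estimate
    with torch.no_grad():
        for x in adaptation_batches:           # e.g. feature tensors of shape (batch, feat_dim)
            model(x)                           # forward pass refreshes running_mean / running_var
    model.eval()                               # decoding then uses the adapted statistics
    return model
```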
“…[117]-[119], [180], [228], [231], [248], [291]; Embedding: Hybrid [56], [57], [61], [74], [130], [132], [138], [148], [150], [153], [159], [161], [168], [213], [292], E2E [62], [128], [217]; Feature: Hybrid [56], [74], [75], [135], [138], [168], [230], [285], [289], [290], [293]; Data: Hybrid [116], [193] embedding classes. The overall RERR is 9.72%.…”
Section: Level (mentioning)
confidence: 99%
“…Zhang et al [35] parametrized activation functions (ReLU and Sigmoid) while making some of these parameters SD. Wang et al [33] and Mana et al [20] repurposed the scales and offsets of batch normalization as SD parameters. Zhao et al [37,38] found that most of the information in SD FC layers is stored in the diagonals of the weight matrices, and they proposed Low-Rank Plus Diagonal (LRPD, eLRPD) approaches which decompose (factorize) the original weight matrix of an FC layer into its diagonal and several smaller matrices.…”
Section: Related Work (mentioning)
confidence: 99%
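
The idea of keeping only the batch-normalization scales (gamma) and offsets (beta) as speaker-dependent parameters can be illustrated with the following PyTorch sketch. It is an assumption-laden illustration of the general technique described in the quote, not a reproduction of the cited recipes; the function name and the commented fine-tuning loop are hypothetical.

```python
import torch
import torch.nn as nn

def speaker_dependent_bn_parameters(model):
    """Freeze the speaker-independent network and expose only the BN
    scale (gamma) and offset (beta) as speaker-dependent parameters."""
    for p in model.parameters():
        p.requires_grad_(False)
    sd_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)) and m.affine:
            m.weight.requires_grad_(True)      # gamma: per-feature scale
            m.bias.requires_grad_(True)        # beta: per-feature offset
            sd_params += [m.weight, m.bias]
    return sd_params

# Hypothetical per-speaker fine-tuning over a few adaptation utterances:
# optimizer = torch.optim.SGD(speaker_dependent_bn_parameters(model), lr=1e-3)
# for feats, targets in speaker_data:
#     loss = torch.nn.functional.cross_entropy(model(feats), targets)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Because only two vectors per normalized layer are adapted, the speaker-dependent footprint stays small compared with fine-tuning full weight matrices.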
“…A small speaker code is another method, learnt for each speaker via the back-propagation algorithm to transform the speaker features into a general speaker-independent feature space, in parallel with learning the large generic neural network [115,116]. [117,118] adapt the linear transformations in batch normalization to match the hidden-layer input distribution between training and test data, so as to account for different speaker input distributions. Other than batch normalization, sigmoid and rectified linear unit (ReLU) parameters are generalised for adaptation as well [119].…”
Section: Model Adaptation (mentioning)
confidence: 99%
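
As a rough illustration of the speaker-code idea mentioned in this excerpt, the sketch below concatenates a small learned per-speaker embedding to every input frame of a generic feed-forward acoustic model; the class name, dimensions, and topology are illustrative assumptions rather than the setup of [115,116].

```python
import torch
import torch.nn as nn

class SpeakerCodeModel(nn.Module):
    """Generic acoustic model with a small per-speaker code learned by
    back-propagation alongside the shared network (illustrative sketch)."""
    def __init__(self, feat_dim=40, code_dim=8, num_speakers=100,
                 hidden_dim=512, num_targets=2000):
        super().__init__()
        self.codes = nn.Embedding(num_speakers, code_dim)    # one small code per speaker
        self.net = nn.Sequential(
            nn.Linear(feat_dim + code_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_targets),
        )

    def forward(self, frames, speaker_id):
        # frames: (batch, feat_dim); speaker_id: (batch,) long tensor of speaker indices
        code = self.codes(speaker_id)                        # (batch, code_dim)
        return self.net(torch.cat([frames, code], dim=-1))
```

For an unseen test speaker, only the corresponding code vector would be estimated by back-propagation while the shared layers stay fixed.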