ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746579
Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI

Cited by 11 publications (2 citation statements)
References: 18 publications
“…Table 1 compares the CER results of our model on the AISHELL-1 test set with several public models, including ESPnet [33], WeNet [24], K2 [34], and Neural Transducer+LFMMI [35]. The first three are AED model structures, while the last is NT-based.…”
Section: Results of AISHELL-1 (mentioning)
Confidence: 99%
“…We use the same Dev and Test sets from AISHELL-1 to evaluate performance on private data. Inspecting previous Mandarin speech recognition results [34], the RNN-Transducer with an ESPnet [35] backbone appears to be a top ASR candidate and is used in our experiments. We follow the benchmark setup to build our Mandarin ASR with a Conformer encoder, a Transformer decoder, and an LSTM prediction network with 135M trainable parameters.…”
Section: Continuous Speech Recognition and Results (mentioning)
Confidence: 99%
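The setup quoted above (Conformer encoder, Transformer decoder, LSTM prediction network, roughly 135M parameters) describes a transducer-style recognizer. The following is a minimal, illustrative PyTorch sketch of the transducer branch only (Conformer encoder, LSTM prediction network, joint network); it is not the cited authors' implementation, and the layer sizes, the use of torchaudio's Conformer building block, and the omission of the Transformer decoder branch are assumptions made here for illustration.

# Minimal transducer-style sketch (illustrative only, not the cited authors' code).
# All hyperparameters below are hypothetical placeholders.
import torch
import torch.nn as nn
import torchaudio


class TransducerSketch(nn.Module):
    def __init__(self, num_tokens: int, feat_dim: int = 80, model_dim: int = 512):
        super().__init__()
        # Conformer encoder over filterbank features; real systems usually add
        # convolutional subsampling first, which is skipped here for brevity.
        self.encoder = torchaudio.models.Conformer(
            input_dim=feat_dim,
            num_heads=8,
            ffn_dim=2048,
            num_layers=12,
            depthwise_conv_kernel_size=31,
        )
        self.enc_proj = nn.Linear(feat_dim, model_dim)
        # LSTM prediction network over previously emitted tokens.
        self.embed = nn.Embedding(num_tokens, model_dim)
        self.predictor = nn.LSTM(model_dim, model_dim, num_layers=1, batch_first=True)
        # Joint network combining encoder and predictor states into token logits.
        self.joint = nn.Sequential(nn.Tanh(), nn.Linear(model_dim, num_tokens))

    def forward(self, feats, feat_lens, tokens):
        # feats: (B, T, feat_dim), feat_lens: (B,), tokens: (B, U) int64
        enc, enc_lens = self.encoder(feats, feat_lens)   # (B, T, feat_dim)
        enc = self.enc_proj(enc)                         # (B, T, D)
        pred, _ = self.predictor(self.embed(tokens))     # (B, U, D)
        # Broadcast-add to form the (B, T, U, D) lattice, then project to logits;
        # a transducer loss (e.g. torchaudio.transforms.RNNTLoss) would be applied
        # to these logits during training.
        logits = self.joint(enc.unsqueeze(2) + pred.unsqueeze(1))
        return logits, enc_lens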