2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2012
DOI: 10.1109/icassp.2012.6289049
Semi-supervised discriminative language modeling for Turkish ASR

Abstract: We present our work on semi-supervised learning of discriminative language models, where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities: word, subword, syllable, and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models with a sample selection strategy aim…
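The training scheme the abstract describes — separating a reference transcript from confusion-generated negative hypotheses with a perceptron variant — can be sketched as a structured perceptron reranker. The n-gram features, helper names, and toy Turkish data below are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch of discriminative language model training with the
# structured perceptron: learn feature weights that score the reference
# transcript above competing (confusion-model) hypotheses.
# Feature choice (unigrams + bigrams) and the toy data are assumptions.

from collections import Counter

def ngram_features(words, n=2):
    """Bag of unigram and bigram counts for a hypothesis."""
    feats = Counter(words)
    for i in range(len(words) - n + 1):
        feats[" ".join(words[i:i + n])] += 1
    return feats

def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_train(data, epochs=5):
    """data: list of (reference sentence, [negative hypotheses]) pairs."""
    weights = {}
    for _ in range(epochs):
        for ref, negatives in data:
            ref_feats = ngram_features(ref.split())
            # Pick the current best-scoring competitor.
            best = max(negatives,
                       key=lambda h: score(weights, ngram_features(h.split())))
            best_feats = ngram_features(best.split())
            if score(weights, best_feats) >= score(weights, ref_feats):
                # Perceptron update: reward reference features,
                # penalize the competitor's features.
                for f, v in ref_feats.items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in best_feats.items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights

# Toy example: one reference with two confusion-generated negatives.
data = [("bugün hava güzel",
         ["bu gün hava güzel mi", "bugün havan güzel"])]
w = perceptron_train(data)
```

At decoding time the learned weights would rerank an ASR n-best list by linear score; in the semi-supervised setting the "references" are unannotated corpus sentences and the negatives come from a confusion model rather than a recognizer.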

Cited by 11 publications (3 citation statements)
References 14 publications (22 reference statements)
“…our claim of rapid adaptability of the system to varying mismatched acoustic and linguistic conditions. The extreme mismatched conditions involved in our experiments support the possibility of going one step further and training our system on artificially generated data of noisy transformations of phrases, as in [35,36,38,57-59], thus possibly eliminating the need for an ASR for training purposes.…”
Section: F) Adaptation
confidence: 73%
“…Further, our work is different from discriminative training of acoustic models [33] and discriminative language models (DLM) [34], which are trained directly to optimize the word error rate using the reference transcripts. DLMs in particular involve optimizing (tuning) the weights of the language model with respect to the reference transcripts and are often utilized in re-ranking n-best ASR hypotheses [34-38]. The main distinction and advantage of our method is that the NCPCM can potentially re-introduce learning from past mistakes: improving automatic speech recognition output via noisy-clean unseen or pruned-out phrases.…”
Section: Introduction
confidence: 99%
“…A comparison of various training methods for DLMs is given in [10]. Recently, semi-supervised discriminative language modeling has also been explored [11][12][13]. The work in [4] is the most related to our approach.…”
Section: Previous Work
confidence: 99%