Interspeech 2015
DOI: 10.21437/interspeech.2015-345

On efficient training of word classes and their application to recurrent neural network language models

Cited by 13 publications (7 citation statements)
References 14 publications

“…We are using a multithreaded exchange implementation and stop the training when the cost stops decreasing. Our observation that an optimized exchange implementation can be faster than Brown clustering is in line with an earlier comparison [6].…”
Section: A. Statistical Methods For Clustering Words Into Classes (supporting)
confidence: 90%
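
The quoted training setup, an exchange pass over the vocabulary that terminates once the cost stops decreasing, can be illustrated with a deliberately naive sketch. The function names, the round-robin initialization, and the single-threaded full recomputation of the criterion below are assumptions made for readability; the cited implementation evaluates moves incrementally and runs multithreaded.

```python
import math
from collections import Counter

def f(x):
    """x * log(x), with the convention 0 * log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def clustering_objective(bigrams, unigrams, cls):
    """Clustering-dependent part of the class-bigram log-likelihood:
    sum_{c,c'} N(c,c') log N(c,c') - 2 * sum_c N(c) log N(c)."""
    cc, cu = Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        cc[cls[w1], cls[w2]] += n
    for w, n in unigrams.items():
        cu[cls[w]] += n
    return sum(f(n) for n in cc.values()) - 2.0 * sum(f(n) for n in cu.values())

def exchange_cluster(corpus, num_classes, max_sweeps=20):
    """Greedy exchange clustering of words into num_classes classes."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab = sorted(unigrams, key=unigrams.get, reverse=True)
    # Illustrative initialization: round-robin over frequency-ranked words.
    cls = {w: i % num_classes for i, w in enumerate(vocab)}

    best = clustering_objective(bigrams, unigrams, cls)
    for _ in range(max_sweeps):
        improved = False
        for w in vocab:
            start = best_cls = cls[w]
            for c in range(num_classes):
                if c == start:
                    continue
                cls[w] = c  # tentatively move w into class c
                score = clustering_objective(bigrams, unigrams, cls)
                if score > best:
                    best, best_cls, improved = score, c, True
            cls[w] = best_cls
        if not improved:  # the training cost has stopped decreasing
            break
    return cls
```

Calling `exchange_cluster(tokens, 100)` returns a word-to-class map; the speed advantage over Brown clustering mentioned in the excerpt comes from evaluating each candidate move incrementally from bigram statistics (see the sketch after the Exchange Algorithm excerpt below), not from this exhaustive recomputation.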
“…"kalvo", "kalvo+", "+kalvo", and "+kalvo+") are treated as separate tokens in language model training. As high-order n-grams are required to provide enough context information for subword-based modeling, we use variable-length n-gram models trained using the VariKN toolkit 6 that implements the Kneser-Ney growing and revised Kneser pruning algorithms [28].…”
Section: Subword Language Modelsmentioning
confidence: 99%
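
The four token shapes quoted above ("kalvo", "kalvo+", "+kalvo", "+kalvo+") can be read as boundary-marked subword pieces: a trailing "+" means the word continues after this piece, and a leading "+" means the piece continues a word already begun. A minimal sketch of that marking, assuming the segmentation itself and the subsequent VariKN n-gram training happen elsewhere; the function name and the example split are hypothetical.

```python
def mark_subword_boundaries(pieces):
    """Attach '+' markers to the subword pieces of a single word.

    A lone piece stays unmarked ("kalvo"), a word-initial piece gets a
    trailing '+' ("kalvo+"), a word-final piece a leading '+' ("+kalvo"),
    and a word-internal piece both ("+kalvo+"), so all four variants are
    distinct tokens for language model training.
    """
    marked = []
    for i, piece in enumerate(pieces):
        prefix = "+" if i > 0 else ""
        suffix = "+" if i < len(pieces) - 1 else ""
        marked.append(prefix + piece + suffix)
    return marked

# Hypothetical two-piece segmentation of a compound containing "kalvo":
# mark_subword_boundaries(["piirto", "kalvo"]) -> ["piirto+", "+kalvo"]
```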
“…Later work discussed efficient implementations using the word-class and class-word statistics, as well as extension to trigram clustering [27]. While trigram statistics may provide improvements for a small number of classes, they often result in overlearning, and the best performance is normally obtained with bigram clustering [27,38]. The evaluation step may be parallelized for each word [38].…”
Section: Exchange Algorithm (mentioning)
confidence: 99%
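
The efficiency point in this excerpt, evaluating a candidate move of a word from its word-class and class-word bigram statistics instead of rescoring the whole corpus, can be sketched as follows. All names are assumptions made here; the statistics for a word are gathered on the fly from its predecessor and successor lists rather than updated in place as an optimized implementation would do, and the read-only, per-word scoring in `move_gain` is the evaluation step the excerpt says can be parallelized.

```python
import math
from collections import Counter, defaultdict

def f(x):
    """x * log(x), with the convention 0 * log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def build_stats(corpus, cls):
    """Class-bigram counts, class/word unigram counts, and each word's
    predecessor/successor word counts (self-bigrams kept separately)."""
    cc, cu, wu = Counter(), Counter(), Counter(corpus)
    suc, pre, self_bi = defaultdict(Counter), defaultdict(Counter), Counter()
    for w1, w2 in zip(corpus, corpus[1:]):
        cc[cls[w1], cls[w2]] += 1
        if w1 == w2:
            self_bi[w1] += 1
        else:
            suc[w1][w2] += 1
            pre[w2][w1] += 1
    for w in corpus:
        cu[cls[w]] += 1
    return cc, cu, wu, suc, pre, self_bi

def move_gain(w, b, cls, cc, cu, wu, suc, pre, self_bi):
    """Change in the class-bigram criterion if word w is moved from its
    current class to class b; only cells involving the two classes and
    the classes of w's neighbours are touched."""
    a = cls[w]
    if b == a:
        return 0.0
    # Word-class / class-word statistics of w under the current clustering.
    s, p = Counter(), Counter()
    for v, n in suc[w].items():
        s[cls[v]] += n
    for v, n in pre[w].items():
        p[cls[v]] += n
    # Count changes of every affected class-bigram cell.
    delta = defaultdict(int)
    for d, n in s.items():
        delta[a, d] -= n
        delta[b, d] += n
    for d, n in p.items():
        delta[d, a] -= n
        delta[d, b] += n
    delta[a, a] -= self_bi[w]  # w -> w bigrams move from cell (a, a) to (b, b)
    delta[b, b] += self_bi[w]
    gain = sum(f(cc[cell] + d) - f(cc[cell]) for cell, d in delta.items())
    gain -= 2.0 * (f(cu[a] - wu[w]) - f(cu[a]) + f(cu[b] + wu[w]) - f(cu[b]))
    return gain
```

A full exchange sweep would call `move_gain(w, c, ...)` for every word and candidate class, apply the best move, and update the affected counts; because only classes that actually precede or follow `w` contribute, the per-move cost stays proportional to the number of such classes rather than to the corpus size.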
“…To rapidly perform repetitive experiments, we train the translation models with the in-domain TED portion of the dataset (roughly 2.5M running words for each side). We run the monolingual word clustering algorithm of (Botros et al, 2015) on each side of the parallel training data to obtain class label vocabularies (Section 3).…”
Section: Comparison Of Vocabularies (mentioning)
confidence: 99%
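
As a concrete reading of "class label vocabularies" on the two sides (an assumption here, not something the excerpt spells out): the clustering is run independently per language, each run yields a word-to-class mapping, and the set of class labels on each side is the vocabulary used downstream. A tiny illustration with made-up mappings:

```python
def class_label_vocabulary(word_to_class, prefix):
    """The set of class labels in use on one side of the parallel data."""
    return {f"{prefix}{c}" for c in word_to_class.values()}

def to_class_labels(tokens, word_to_class, prefix, unk="<unk>"):
    """Replace each token by its class label (or <unk> for unseen words)."""
    return [f"{prefix}{word_to_class[t]}" if t in word_to_class else unk
            for t in tokens]

# Made-up mappings standing in for clusterings trained independently on
# the source and the target side of the parallel training data.
src_classes = {"wir": 0, "trainieren": 1, "modelle": 2}
tgt_classes = {"we": 0, "train": 1, "models": 2}
print(class_label_vocabulary(tgt_classes, "tgt"))
print(to_class_labels("we train models".split(), tgt_classes, "tgt"))
```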