Interspeech 2015
DOI: 10.21437/interspeech.2015-345

On efficient training of word classes and their application to recurrent neural network language models

Cited by 13 publications (7 citation statements)
References 14 publications

“…We are using a multithreaded exchange implementation and stop the training when the cost stops decreasing. Our observation that an optimized exchange implementation can be faster than Brown clustering is in line with an earlier comparison [6].…”
Section: A. Statistical Methods For Clustering Words Into Classes (supporting)
confidence: 90%
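
The quoted training setup, an exchange pass over the vocabulary that terminates once the cost stops decreasing, can be illustrated with a deliberately naive sketch. The function names, the round-robin initialization, and the single-threaded full recomputation of the criterion below are assumptions made for readability; the cited implementation evaluates moves incrementally and runs multithreaded.

```python
import math
from collections import Counter

def f(x):
    """x * log(x), with the convention 0 * log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def clustering_objective(bigrams, unigrams, cls):
    """Clustering-dependent part of the class-bigram log-likelihood:
    sum_{c,c'} N(c,c') log N(c,c') - 2 * sum_c N(c) log N(c)."""
    cc, cu = Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        cc[cls[w1], cls[w2]] += n
    for w, n in unigrams.items():
        cu[cls[w]] += n
    return sum(f(n) for n in cc.values()) - 2.0 * sum(f(n) for n in cu.values())

def exchange_cluster(corpus, num_classes, max_sweeps=20):
    """Greedy exchange clustering of words into num_classes classes."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab = sorted(unigrams, key=unigrams.get, reverse=True)
    # Illustrative initialization: round-robin over frequency-ranked words.
    cls = {w: i % num_classes for i, w in enumerate(vocab)}

    best = clustering_objective(bigrams, unigrams, cls)
    for _ in range(max_sweeps):
        improved = False
        for w in vocab:
            start = best_cls = cls[w]
            for c in range(num_classes):
                if c == start:
                    continue
                cls[w] = c  # tentatively move w into class c
                score = clustering_objective(bigrams, unigrams, cls)
                if score > best:
                    best, best_cls, improved = score, c, True
            cls[w] = best_cls
        if not improved:  # the training cost has stopped decreasing
            break
    return cls
```

Calling `exchange_cluster(tokens, 100)` returns a word-to-class map; the speed advantage over Brown clustering mentioned in the excerpt comes from evaluating each candidate move incrementally from bigram statistics (see the sketch after the Exchange Algorithm excerpt below), not from this exhaustive recomputation.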
“…"kalvo", "kalvo+", "+kalvo", and "+kalvo+") are treated as separate tokens in language model training. As high-order n-grams are required to provide enough context information for subword-based modeling, we use variable-length n-gram models trained using the VariKN toolkit 6 that implements the Kneser-Ney growing and revised Kneser pruning algorithms [28].…”
Section: Subword Language Modelsmentioning
confidence: 99%
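
The four token shapes quoted above ("kalvo", "kalvo+", "+kalvo", "+kalvo+") can be read as boundary-marked subword pieces: a trailing "+" means the word continues after this piece, and a leading "+" means the piece continues a word already begun. A minimal sketch of that marking, assuming the segmentation itself and the subsequent VariKN n-gram training happen elsewhere; the function name and the example split are hypothetical.

```python
def mark_subword_boundaries(pieces):
    """Attach '+' markers to the subword pieces of a single word.

    A lone piece stays unmarked ("kalvo"), a word-initial piece gets a
    trailing '+' ("kalvo+"), a word-final piece a leading '+' ("+kalvo"),
    and a word-internal piece both ("+kalvo+"), so all four variants are
    distinct tokens for language model training.
    """
    marked = []
    for i, piece in enumerate(pieces):
        prefix = "+" if i > 0 else ""
        suffix = "+" if i < len(pieces) - 1 else ""
        marked.append(prefix + piece + suffix)
    return marked

# Hypothetical two-piece segmentation of a compound containing "kalvo":
# mark_subword_boundaries(["piirto", "kalvo"]) -> ["piirto+", "+kalvo"]
```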
“…Later work discussed efficient implementations using the word-class and class-word statistics, as well as extension to trigram clustering [27]. While trigram statistics may provide improvements for a small number of classes, they often result in overlearning, and the best performance is normally obtained with bigram clustering [27,38]. The evaluation step may be parallelized for each word [38].…”
Section: Exchange Algorithm (mentioning)
confidence: 99%
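
The efficiency point in this excerpt, evaluating a candidate move of a word from its word-class and class-word bigram statistics instead of rescoring the whole corpus, can be sketched as follows. All names are assumptions made here; the statistics for a word are gathered on the fly from its predecessor and successor lists rather than updated in place as an optimized implementation would do, and the read-only, per-word scoring in `move_gain` is the evaluation step the excerpt says can be parallelized.

```python
import math
from collections import Counter, defaultdict

def f(x):
    """x * log(x), with the convention 0 * log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def build_stats(corpus, cls):
    """Class-bigram counts, class/word unigram counts, and each word's
    predecessor/successor word counts (self-bigrams kept separately)."""
    cc, cu, wu = Counter(), Counter(), Counter(corpus)
    suc, pre, self_bi = defaultdict(Counter), defaultdict(Counter), Counter()
    for w1, w2 in zip(corpus, corpus[1:]):
        cc[cls[w1], cls[w2]] += 1
        if w1 == w2:
            self_bi[w1] += 1
        else:
            suc[w1][w2] += 1
            pre[w2][w1] += 1
    for w in corpus:
        cu[cls[w]] += 1
    return cc, cu, wu, suc, pre, self_bi

def move_gain(w, b, cls, cc, cu, wu, suc, pre, self_bi):
    """Change in the class-bigram criterion if word w is moved from its
    current class to class b; only cells involving the two classes and
    the classes of w's neighbours are touched."""
    a = cls[w]
    if b == a:
        return 0.0
    # Word-class / class-word statistics of w under the current clustering.
    s, p = Counter(), Counter()
    for v, n in suc[w].items():
        s[cls[v]] += n
    for v, n in pre[w].items():
        p[cls[v]] += n
    # Count changes of every affected class-bigram cell.
    delta = defaultdict(int)
    for d, n in s.items():
        delta[a, d] -= n
        delta[b, d] += n
    for d, n in p.items():
        delta[d, a] -= n
        delta[d, b] += n
    delta[a, a] -= self_bi[w]  # w -> w bigrams move from cell (a, a) to (b, b)
    delta[b, b] += self_bi[w]
    gain = sum(f(cc[cell] + d) - f(cc[cell]) for cell, d in delta.items())
    gain -= 2.0 * (f(cu[a] - wu[w]) - f(cu[a]) + f(cu[b] + wu[w]) - f(cu[b]))
    return gain
```

A full exchange sweep would call `move_gain(w, c, ...)` for every word and candidate class, apply the best move, and update the affected counts; because only classes that actually precede or follow `w` contribute, the per-move cost stays proportional to the number of such classes rather than to the corpus size.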
“…To rapidly perform repetitive experiments, we train the translation models with the in-domain TED portion of the dataset (roughly 2.5M running words for each side). We run the monolingual word clustering algorithm of (Botros et al, 2015) on each side of the parallel training data to obtain class label vocabularies (Section 3).…”
Section: Comparison Of Vocabularies (mentioning)
confidence: 99%
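
As a concrete reading of "class label vocabularies" on the two sides (an assumption here, not something the excerpt spells out): the clustering is run independently per language, each run yields a word-to-class mapping, and the set of class labels on each side is the vocabulary used downstream. A tiny illustration with made-up mappings:

```python
def class_label_vocabulary(word_to_class, prefix):
    """The set of class labels in use on one side of the parallel data."""
    return {f"{prefix}{c}" for c in word_to_class.values()}

def to_class_labels(tokens, word_to_class, prefix, unk="<unk>"):
    """Replace each token by its class label (or <unk> for unseen words)."""
    return [f"{prefix}{word_to_class[t]}" if t in word_to_class else unk
            for t in tokens]

# Made-up mappings standing in for clusterings trained independently on
# the source and the target side of the parallel training data.
src_classes = {"wir": 0, "trainieren": 1, "modelle": 2}
tgt_classes = {"we": 0, "train": 1, "models": 2}
print(class_label_vocabulary(tgt_classes, "tgt"))
print(to_class_labels("we train models".split(), tgt_classes, "tgt"))
```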