Abstract: We introduce a recurrent neural network language model (RNN-LM) with long short-term memory (LSTM) units that utilizes both character-level and word-level inputs. Our model has a gate that adaptively finds the optimal mixture of the character-level and word-level inputs. The gate creates the final vector representation of a word by combining two distinct representations of the word. The character-level inputs are converted into vector representations of words using a bidirectional LSTM. The word-level inputs are…
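A minimal sketch of the gating described in this abstract, assuming a scalar gate computed from the word embedding (the names v_g, b_g and this exact parameterization are assumptions for illustration, not the paper's verbatim equations):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gated_word_char(x_word, x_char, v_g, b_g):
        """Mix a word-lookup embedding with a char-BiLSTM word embedding.

        g is a scalar gate in (0, 1); g near 1 shifts weight toward the
        character-level representation (useful for rare/OOV words).
        """
        g = sigmoid(v_g @ x_word + b_g)          # gate from the word embedding
        return (1.0 - g) * x_word + g * x_char   # final word representation

The mixed vector would then be fed to the LSTM language model in place of a plain word embedding.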
“…Most of the word-character hybrid models focus on input representation rather than generation. Usually, their representations are concatenated, or composition functions are learned (Miyamoto and Cho, 2016). Even though they use word information in the input, the decoding process of their models is still at the character level.…”
Current neural query auto-completion (QAC) systems rely on character-level language models, but they slow down when queries are long. We present how to utilize subword language models for the fast and accurate generation of query completion candidates. Representing queries with subwords shortens the decoding length significantly. To deal with issues arising from introducing a subword language model, we develop a retrace algorithm and a reranking method by approximate marginalization. As a result, our model is up to 2.5 times faster while maintaining a similar quality of generated results compared to the character-level baseline. Also, we propose a new evaluation metric, mean recoverable length (MRL), which measures how many upcoming characters the model can complete correctly. It provides a more explicit meaning and eliminates the need for the prefix-length sampling required by existing rank-based metrics. Moreover, we perform a comprehensive analysis with an ablation study to determine the importance of each component.
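A minimal sketch of how such a recoverable-length metric could be computed (greedy_complete is a hypothetical method returning the model's top-1 completion string for a prefix; the exact definition here is an assumption, not the paper's reference implementation):

    def recoverable_length(model, query):
        """Longest suffix of `query` that the model regenerates exactly.

        Scanning split points from the left finds the largest
        n = len(query) - i such that the top-1 completion of the prefix
        query[:i] equals the true suffix query[i:].
        """
        for i in range(len(query)):
            if model.greedy_complete(query[:i]) == query[i:]:
                return len(query) - i
        return 0

    def mean_recoverable_length(model, queries):
        return sum(recoverable_length(model, q) for q in queries) / len(queries)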
“…In this work, we will use a mixture model over M different models for generating words in place of the single softmax over words (Miyamoto and Cho, 2016; Neubig and Dyer, 2016):…”
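The equation elided at the end of this snippet presumably takes the standard mixture form (the notation below is assumed, not copied from the paper):

    p(w_t | h_t) = \sum_{m=1}^{M} \pi_m(h_t) \, p_m(w_t | h_t),   where   \sum_{m=1}^{M} \pi_m(h_t) = 1

where h_t is the recurrent state, the p_m are the component word distributions, and the mixture weights \pi_m are predicted from h_t.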
Languages with productive morphology pose problems for language models that generate words from a fixed vocabulary. Although character-based models allow any possible word type to be generated, they are linguistically naïve: they must discover that words exist and are delimited by spaces, basic linguistic facts that are built into the structure of word-based models. We introduce an open-vocabulary language model that incorporates more sophisticated linguistic knowledge by predicting words using a mixture of three generative processes: (1) by generating words as a sequence of characters, (2) by directly generating full word forms, and (3) by generating words as a sequence of morphemes that are combined using a hand-written morphological analyzer. Experiments on Finnish, Turkish, and Russian show that our model outperforms character sequence models and other strong baselines on intrinsic and extrinsic measures. Furthermore, we show that our model learns to exploit morphological knowledge encoded in the analyzer, and, as a byproduct, it can perform effective unsupervised morphological disambiguation.
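A minimal sketch of the three-way mixture under assumed names (char_lm, word_lm, morph_lm and their .prob interfaces are hypothetical stand-ins for the paper's character, full-form, and analyzer-backed morpheme components):

    def word_prob(w, context, char_lm, word_lm, morph_lm, pi):
        """Interpolate three generative processes for one word token.

        pi = (pi_char, pi_word, pi_morph) are mixture weights summing to 1;
        in the actual model they would be predicted from the RNN state.
        """
        pi_char, pi_word, pi_morph = pi
        return (pi_char * char_lm.prob(w, context)       # spell w character by character
                + pi_word * word_lm.prob(w, context)     # emit w as one vocabulary unit
                + pi_morph * morph_lm.prob(w, context))  # generate w as a morpheme sequence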
“…Dos Santos and Zadrozny (2014) join word and character representations in a deep neural network for part-of-speech tagging. Finally, Miyamoto and Cho (2016) describe an LM that is related to our model, although their character-level embedding is generated by a bidirectional LSTM and we do not use a gate to determine how much of the word and how much of the character embedding is used. However, they only compare to a simple baseline model of 2 LSTM layers of 200 hidden units each without dropout, resulting in a higher baseline perplexity (as mentioned in section 4.3, our CW model also achieves larger improvements than reported in this paper with respect to that baseline).…”
Section: Related Work (mentioning; confidence: 99%)
“…Miyamoto and Cho (2016) only report results for a small model that is trained without dropout, resulting in a baseline perplexity of 115.65. If we train our small model without dropout, we get a comparable baseline perplexity (116.33) and a character-word perplexity of 110.54 (compared to the 109.05 reported by Miyamoto and Cho (2016)). It remains to be seen whether their model performs equally well compared to better baselines.…”
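As a quick sanity check on these figures (simple arithmetic, not taken from the cited text), the implied relative perplexity reductions are indeed comparable:

    # Relative perplexity reduction: (baseline - model) / baseline
    print(f"{(115.65 - 109.05) / 115.65:.3f}")  # ~0.057 for Miyamoto and Cho (2016)
    print(f"{(116.33 - 110.54) / 116.33:.3f}")  # ~0.050 for the small CW model here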
Section: English (mentioning; confidence: 99%)
“…n = number of characters added, (b) means backward order. Comparison with other character-level LMs (Kim et al., 2016) (we only compare to models without highway layers) and character-word models (Miyamoto and Cho, 2016) (they do not use dropout and only report results for a small model).…”
We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline model with a similar number of parameters, and 4.57% on Dutch. Moreover, we also outperform baseline word-level models with a larger number of parameters.
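A minimal sketch of the concatenation described above, in PyTorch-style modules (the module name, embedding sizes, and fixed word length are assumptions for illustration):

    import torch
    import torch.nn as nn

    class CharWordEmbedding(nn.Module):
        """Concatenate a word embedding with per-character embeddings.

        The character part captures structural (dis)similarities between
        words and stays informative for OOV words, which only receive an
        <unk> index on the word side.
        """
        def __init__(self, n_words, n_chars, word_dim=150, char_dim=10, max_word_len=10):
            super().__init__()
            self.word_emb = nn.Embedding(n_words, word_dim)
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.max_word_len = max_word_len

        def forward(self, word_ids, char_ids):
            # word_ids: (batch,); char_ids: (batch, max_word_len), padded
            w = self.word_emb(word_ids)                       # (batch, word_dim)
            c = self.char_emb(char_ids).flatten(start_dim=1)  # (batch, max_word_len * char_dim)
            return torch.cat([w, c], dim=-1)                  # input to the LSTM LM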