Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1137

Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling

Abstract: Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism…
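The core idea in the abstract (spell out new words character by character, then reuse them from a cache to model burstiness) can be illustrated with a toy sampler. This is a hedged sketch, not the authors' HCLM/HCLMcache model: the class name, the fixed `reuse_weight`, the `cache_size`, and the random letter-string generator are illustrative stand-ins for the hierarchical LSTM and its learned cache-interpolation probabilities.

```python
# Toy sketch (assumed, not the authors' implementation): a sampler that either
# "spells" a brand-new word character by character or reuses a recently
# generated word from a bounded cache, which is what makes rare words bursty.
import random
from collections import deque

class CachedWordSampler:
    def __init__(self, cache_size=100, reuse_weight=0.5):
        self.cache = deque(maxlen=cache_size)  # recently generated word types
        self.reuse_weight = reuse_weight       # illustrative fixed mixing weight
                                               # (learned by the model in the paper)

    def _spell_new_word(self):
        # Stand-in for the character-level LSTM generator: a random letter string.
        length = random.randint(3, 8)
        return "".join(random.choice("abcdefghijklmnopqrstuvwxyz")
                       for _ in range(length))

    def sample_word(self):
        # With probability reuse_weight (and a non-empty cache), copy a cached
        # word; otherwise generate a new one character by character.
        if self.cache and random.random() < self.reuse_weight:
            word = random.choice(self.cache)
        else:
            word = self._spell_new_word()
        self.cache.append(word)
        return word

if __name__ == "__main__":
    sampler = CachedWordSampler()
    print(" ".join(sampler.sample_word() for _ in range(20)))
```

Because reused words are drawn from a small recency-bounded cache, any word that is generated once has an elevated chance of reappearing soon afterwards, which is the "bursty" behaviour the abstract describes.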

Cited by 27 publications (58 citation statements) | References 14 publications (11 reference statements)
“…Multi-lingual language modeling Training language models in non-English languages has only recently received some attention. Kawakami et al. (2017) evaluate on seven languages. Cotterell et al. (2018) study 21 languages.…”
Section: Cross-lingual Pretrained Language Models (mentioning, confidence: 99%)
“…Recurrent neural language models can effectively learn complex dependencies, even in open-vocabulary settings (Hwang and Sung, 2017; Kawakami et al., 2017). Whether the models are able to learn particular syntactic interactions is an intriguing question, and some methodologies have been presented to tease apart under what circumstances variously-trained models encode attested interactions (Linzen et al., 2016; Enguehard et al., 2017).…”
Section: Related Work (mentioning, confidence: 99%)
“…Multilingual Wikipedia Corpus The Multilingual Wikipedia Corpus (Kawakami, Dyer, and Blunsom 2017) contains 360 Wikipedia articles in English, French, Spanish, German, Russian, Czech, and Finnish. However, we re-tokenize the dataset, not only splitting on spaces (as Kawakami, Dyer, and Blunsom do) but also splitting off each punctuation symbol as its own token.…”
Section: Datasets (mentioning, confidence: 99%)
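The re-tokenization described in the quotation above (whitespace splitting plus treating each punctuation symbol as its own token) can be approximated with a single regular expression. This is an assumed reimplementation for illustration, not the cited authors' actual preprocessing script.

```python
# Hedged sketch of the described re-tokenization: keep runs of word characters
# together, and make every other non-space symbol its own token.
import re

def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text, flags=re.UNICODE)

print(tokenize("Hello, world! (Kawakami et al., 2017)"))
# ['Hello', ',', 'world', '!', '(', 'Kawakami', 'et', 'al', '.', ',', '2017', ')']
```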
“…Our model allows for both, namely e(w) and h_i. Finally, we also compare against the character-aware model of Kawakami, Dyer, and Blunsom (2017), both without (HCLM) and with their additional cache (HCLMcache). To our knowledge, that model has the best previously known performance on the raw (i.e., open-vocab) version of the WikiText-2 dataset, but we see in both Table 1 and Table 2 that our model and the PURE-BPE baseline beat it.…”
Section: Comparison to Baseline Models (mentioning, confidence: 99%)