Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1137

Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling

Abstract: Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism…
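The core idea in the abstract (spell out new words character by character, then reuse them from a cache to model burstiness) can be illustrated with a toy sampler. This is a hedged sketch, not the authors' HCLM/HCLMcache model: the class name, the fixed `reuse_weight`, the `cache_size`, and the random letter-string generator are illustrative stand-ins for the hierarchical LSTM and its learned cache-interpolation probabilities.

```python
# Toy sketch (assumed, not the authors' implementation): a sampler that either
# "spells" a brand-new word character by character or reuses a recently
# generated word from a bounded cache, which is what makes rare words bursty.
import random
from collections import deque

class CachedWordSampler:
    def __init__(self, cache_size=100, reuse_weight=0.5):
        self.cache = deque(maxlen=cache_size)  # recently generated word types
        self.reuse_weight = reuse_weight       # illustrative fixed mixing weight
                                               # (learned by the model in the paper)

    def _spell_new_word(self):
        # Stand-in for the character-level LSTM generator: a random letter string.
        length = random.randint(3, 8)
        return "".join(random.choice("abcdefghijklmnopqrstuvwxyz")
                       for _ in range(length))

    def sample_word(self):
        # With probability reuse_weight (and a non-empty cache), copy a cached
        # word; otherwise generate a new one character by character.
        if self.cache and random.random() < self.reuse_weight:
            word = random.choice(self.cache)
        else:
            word = self._spell_new_word()
        self.cache.append(word)
        return word

if __name__ == "__main__":
    sampler = CachedWordSampler()
    print(" ".join(sampler.sample_word() for _ in range(20)))
```

Because reused words are drawn from a small recency-bounded cache, any word that is generated once has an elevated chance of reappearing soon afterwards, which is the "bursty" behaviour the abstract describes.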

Cited by 27 publications (58 citation statements) | References 14 publications (11 reference statements)
“…Multi-lingual language modeling Training language models in non-English languages has only recently received some attention. Kawakami et al. (2017) evaluate on seven languages. Cotterell et al. (2018) study 21 languages.…”
Section: Cross-lingual Pretrained Language Models (mentioning, confidence: 99%)
“…Recurrent neural language models can effectively learn complex dependencies, even in open-vocabulary settings (Hwang and Sung, 2017; Kawakami et al., 2017). Whether the models are able to learn particular syntactic interactions is an intriguing question, and some methodologies have been presented to tease apart under what circumstances variously-trained models encode attested interactions (Linzen et al., 2016; Enguehard et al., 2017).…”
Section: Related Work (mentioning, confidence: 99%)
“…Multilingual Wikipedia Corpus The Multilingual Wikipedia Corpus (Kawakami, Dyer, and Blunsom 2017) contains 360 Wikipedia articles in English, French, Spanish, German, Russian, Czech, and Finnish. However, we re-tokenize the dataset, not only splitting on spaces (as Kawakami, Dyer, and Blunsom do) but also splitting off each punctuation symbol as its own token.…”
Section: Datasets (mentioning, confidence: 99%)
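The re-tokenization described in the quotation above (whitespace splitting plus treating each punctuation symbol as its own token) can be approximated with a single regular expression. This is an assumed reimplementation for illustration, not the cited authors' actual preprocessing script.

```python
# Hedged sketch of the described re-tokenization: keep runs of word characters
# together, and make every other non-space symbol its own token.
import re

def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text, flags=re.UNICODE)

print(tokenize("Hello, world! (Kawakami et al., 2017)"))
# ['Hello', ',', 'world', '!', '(', 'Kawakami', 'et', 'al', '.', ',', '2017', ')']
```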
“…Our model allows for both, namely e(w) and h_i. Finally, we also compare against the character-aware model of Kawakami, Dyer, and Blunsom (2017), both without (HCLM) and with their additional cache (HCLMcache). To our knowledge, that model has the best previously known performance on the raw (i.e., open-vocab) version of the WikiText-2 dataset, but we see in both Table 1 and Table 2 that our model and the PURE-BPE baseline beat it.…”
Section: Comparison to Baseline Models (mentioning, confidence: 99%)