2019
DOI: 10.1609/aaai.v33i01.33016843

Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model

Abstract: We show how the spellings of known words can help us deal with unknown words in open-vocabulary NLP tasks. The method we propose can be used to extend any closed-vocabulary generative model, but in this paper we specifically consider the case of neural language modeling. Our Bayesian generative story combines a standard RNN language model (generating the word tokens in each sentence) with an RNN-based spelling model (generating the letters in each word type). These two RNNs respectively capture sentence structure…
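
The abstract describes a two-level generative setup: a word-level RNN that generates the tokens of each sentence, and a separate character-level RNN that generates the spelling of each word type. The sketch below (PyTorch) is only an illustration of that idea, not the authors' released code; all class names, dimensions, and the training note are assumptions.

```python
# Minimal sketch of the two-level idea from the abstract (illustrative only):
# a word-level RNN scores word tokens in a sentence, while a character-level
# RNN scores the spelling of each word *type*.
import torch
import torch.nn as nn

class WordLevelLM(nn.Module):
    """Standard closed-vocabulary RNN language model over word tokens."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):           # (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))
        return self.out(h)                  # logits over the word vocabulary

class SpellingLM(nn.Module):
    """Character-level RNN that scores the spelling of a word type."""
    def __init__(self, n_chars, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_chars)

    def forward(self, char_ids):            # (batch, word_len)
        h, _ = self.rnn(self.embed(char_ids))
        return self.out(h)                  # logits over the character set

# Training would combine the two log-likelihoods: a word-level loss over token
# sequences plus a spelling loss over each distinct word type, which is what
# lets the model assign probability to unseen (open-vocabulary) words.
```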

Cited by 27 publications (11 citation statements). References 19 publications.
“…With various lexers written for the TokenBuffer API, users can also create their own high-speed custom tokenizers with ease. The package also provides a simple reversible tokenizer (Mielke, 2019; Mielke & Eisner, 2018) that works by leaving certain merge symbols as a means to reconstruct tokens into the original string.…”
Section: Discussion
confidence: 99%
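
The statement above refers to a reversible tokenizer that leaves merge symbols behind so the original string can be reconstructed exactly. The toy sketch below illustrates that general idea; the marker character, the regex, and the helper names are assumptions, not the cited package's actual implementation.

```python
# Illustrative sketch of a reversible tokenizer: punctuation is split off,
# but a merge symbol records where no space existed, so detokenization is
# lossless for simple inputs like the example below.
import re

MERGE = "\u2581"  # marker meaning "attach to the previous token with no space"

def tokenize(text):
    tokens = []
    for word in text.split(" "):
        parts = re.findall(r"\w+|[^\w\s]", word)   # split off punctuation
        for i, p in enumerate(parts):
            tokens.append(p if i == 0 else MERGE + p)
    return tokens

def detokenize(tokens):
    out = []
    for t in tokens:
        if t.startswith(MERGE):
            out.append(t[len(MERGE):])             # glue to previous token
        else:
            out.append((" " if out else "") + t)
    return "".join(out)

text = "Hello, world!"
assert detokenize(tokenize(text)) == text          # round-trip is lossless
```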
“…The language-specific value of this random slope parameter is then indicative of a stronger or weaker relationship between H and f for this language. As information-encoding units, we estimate on two levels: on the level of words and, instead of estimating on the level of characters, we tokenize our text into sub-word units by byte pair encoding (BPE) 63,64 , which plays an important role in many state-of-the-art natural language model applications 65,66 and provides strong baseline results on a multilingual corpus 67 . In total, we trained seven different LMs on the data, ranging from very simple n-gram models to state-of-the-art deep neural networks (Table 3).…”
Section: Lines Represent Fitted Values Based on an Ansatz Function Th…
confidence: 99%
“…Some of the contexts are allowed to be non-contiguous in order to capture longer-term dependencies 48 , and CMIX uses long short-term memory 61 (LSTM) trained by backpropagation as a byte-level mixer 59 . In addition, instead of estimating on the level of either characters or words, we tokenize our text into sub-word units by byte pair encoding (BPE) 27,62 , which plays an important role in many state-of-the-art natural language model applications such as GPT-3 63 or SentencePiece 64 and provides strong baseline results on a multilingual corpus 65 .…”
Section: Comparing Complexity Rankings Across Corpora
confidence: 99%
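
Both of the statements above rely on byte pair encoding (BPE) to obtain sub-word units. The toy sketch below shows the core BPE training loop in its simplest form (repeatedly merging the most frequent adjacent symbol pair); the word-frequency table and the number of merges are made up for illustration and do not come from the cited work.

```python
# Toy BPE sketch: learn merges over a tiny word-frequency table whose words
# are given as space-separated symbols ending in the marker </w>.
from collections import Counter

def get_pair_counts(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    a, b = pair
    return {word.replace(f"{a} {b}", f"{a}{b}"): freq for word, freq in vocab.items()}

vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):                      # number of merges chosen arbitrarily
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)     # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
print(vocab)                             # sub-word units such as "low</w>" and "est</w>" emerge
```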