2016
DOI: 10.1162/tacl_a_00112

Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

Abstract: Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500×, despite only incurring a modest increase in construction time and memory usage. For large corpora …

Cited by 27 publications (20 citation statements)
References 17 publications
“…Once our index is built, creating one with a different context selection criterion is faster than building the competitor from scratch, and it takes half the space required for building our index from scratch. Building the index in [32] takes between 5 and 9 bytes per character, which is comparable to our construction, and between 1.1 and 3.2 microseconds per character, which is faster than or comparable to our non-pruned index.…”
Section: Complexity and Comparison To The Competitors
Citation type: mentioning
confidence: 51%
“…Results are in Figure 5. We don't compare scoring time to [32], since the latter supports just one scoring function which is significantly different from the ones we consider.…”
Section: Comparison To the Competitors
Citation type: mentioning
confidence: 99%