2011
DOI: 10.1145/1897816.1897842

The sequence memoizer

Abstract: Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications to name but a few. Unfortunately, real-world sequence data often exhibit long range dependencies which can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibits power-law properties, yet common sequence models do not capture such properties. The seque…
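
The abstract points to power-law properties in real-world sequence data that common models fail to capture. As purely illustrative background (not code from the paper), the following Python sketch estimates the rank-frequency slope of a token stream; the helper names and the toy token list are hypothetical, and a slope near -1 on the log-log scale is the usual informal signature of Zipf-like, power-law behaviour.

# Illustrative sketch only; not taken from the paper.
from collections import Counter
import math

def rank_frequency(tokens):
    # Sort token counts in descending order and pair each with its 1-based rank.
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))

def log_log_slope(pairs):
    # Least-squares slope of log(count) against log(rank);
    # a value near -1 suggests Zipf-like power-law behaviour.
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(c) for _, c in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

tokens = "the cat sat on the mat and the dog sat on the log".split()
pairs = rank_frequency(tokens)
print(pairs)
print(log_log_slope(pairs))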

Cited by 41 publications (37 citation statements)
References 17 publications

“…For example, promising research directions include using a general law such as the Zipf-Mandelbrot law (Mandelbrot, 1965), a sophisticated model that accounts for the order of words such as hierarchical Pitman-Yor processes (Wood et al., 2011), and smoothing/backoff methods to handle the sparseness problem.…”
Section: Results
confidence: 99%
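
As standard background for the Zipf-Mandelbrot law cited in this excerpt (the formula below is textbook material, not reproduced from the citing paper), the normalized form over N ranked items is

f(k; N, q, s) = \frac{(k + q)^{-s}}{\sum_{i=1}^{N} (i + q)^{-s}}, \qquad k = 1, \dots, N,

where k is the frequency rank, s > 0 controls the rate of decay, and q \ge 0 shifts the ranks; setting q = 0 recovers the plain Zipf law.
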
“…The frequencies of high-order k-grams tend to be lower than in reality. We might need to place a hierarchical assumption on the power law, as is done in hierarchical Pitman-Yor processes (Wood et al., 2011). Fig.…”
Section: Methods
confidence: 99%
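
As hedged background on the hierarchical Pitman-Yor reference in this excerpt, the sketch below shows the standard two-parameter Pitman-Yor (Chinese restaurant) predictive rule, not the cited paper's implementation; the discount parameter d is what induces power-law growth in the number of distinct types.

# Sketch of the two-parameter Pitman-Yor restaurant process; background material only.
import random

def sample_pitman_yor(n, d=0.8, theta=1.0, seed=0):
    # Seat n customers; return the list of table occupancy counts.
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        total = theta + i
        # New table with probability (theta + d * t) / (theta + i), t = current table count.
        if rng.random() < (theta + d * len(tables)) / total:
            tables.append(1)
        else:
            # Existing table k chosen with probability proportional to (c_k - d).
            r = rng.random() * sum(c - d for c in tables)
            acc = 0.0
            for k, c in enumerate(tables):
                acc += c - d
                if r < acc:
                    tables[k] += 1
                    break
    return tables

counts = sample_pitman_yor(10000)
print(len(counts), sorted(counts, reverse=True)[:10])  # many tables, heavy-tailed sizes

With d > 0 the number of occupied tables grows roughly like n^d, which is the power-law behaviour the excerpt alludes to.
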
“…To demonstrate the efficacy of exploiting long-range temporal dependencies in modeling, we compared the performance of our proposed approach with the n-gram temporal model [22]. Fig 6 clearly illustrates the benefit of modeling long-range temporal contexts in the data.…”
Section: Discussion
confidence: 99%
“…On the other hand, if k = i − 1, it puts the entire history of the sequence into consideration. Addressing such long contexts is a valid concern in previous work [22]. However, a recently developed probabilistic model [21] broke through this limitation by exploring an infinite length of context in a discrete data sequence.…”
Section: Joint Segmentation and Classification
confidence: 99%
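
To make the excerpt's distinction concrete, here is a toy illustration with hypothetical helper names (not code from references [21] or [22]): a fixed-order model conditions on only the last k symbols, whereas k = i - 1 amounts to conditioning on the entire preceding history.

# Toy illustration of truncated versus full conditioning contexts.
def truncated_context(seq, i, k):
    # The k symbols immediately preceding position i (k-th order Markov context).
    return tuple(seq[max(0, i - k):i])

def full_context(seq, i):
    # The entire history preceding position i.
    return tuple(seq[:i])

seq = list("abracadabra")
print(truncated_context(seq, 7, 3))  # ('c', 'a', 'd')
print(full_context(seq, 7))          # ('a', 'b', 'r', 'a', 'c', 'a', 'd')
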
“…in [19] and for more general sequence modelling in [20]. It was also shown to have a remarkable connection to interpolated Kneser-Ney, which remains one of the most effective language models more than a decade after it was first proposed [21], [22].…”
Section: Introduction
confidence: 94%
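
The connection to interpolated Kneser-Ney mentioned in this excerpt is usually stated through the hierarchical Pitman-Yor predictive rule; the formulation below is standard background from the hierarchical Pitman-Yor language-modelling literature, not material reproduced from this report.

P(w \mid u) \;=\; \frac{c(uw) - d_{|u|}\, t(uw)}{\theta_{|u|} + c(u\cdot)} \;+\; \frac{\theta_{|u|} + d_{|u|}\, t(u\cdot)}{\theta_{|u|} + c(u\cdot)}\, P\big(w \mid \pi(u)\big),

where c(uw) is the count of symbol w after context u, t(uw) is the corresponding number of Pitman-Yor "tables", c(u\cdot) and t(u\cdot) are their sums over w, \pi(u) drops the oldest symbol from u, and d_{|u|}, \theta_{|u|} are the discount and concentration parameters for contexts of length |u|. Restricting t(uw) = \min(1, c(uw)) and setting \theta_{|u|} = 0 recovers interpolated Kneser-Ney with absolute discounting, which is the connection the excerpt refers to.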