2011
DOI: 10.1145/1897816.1897842

The sequence memoizer

Abstract: Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications to name but a few. Unfortunately, real-world sequence data often exhibit long range dependencies which can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibits power-law properties, yet common sequence models do not capture such properties. The seque…
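
The abstract points to power-law properties in real-world sequence data that common models fail to capture. As purely illustrative background (not code from the paper), the following Python sketch estimates the rank-frequency slope of a token stream; the helper names and the toy token list are hypothetical, and a slope near -1 on the log-log scale is the usual informal signature of Zipf-like, power-law behaviour.

# Illustrative sketch only; not taken from the paper.
from collections import Counter
import math

def rank_frequency(tokens):
    # Sort token counts in descending order and pair each with its 1-based rank.
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))

def log_log_slope(pairs):
    # Least-squares slope of log(count) against log(rank);
    # a value near -1 suggests Zipf-like power-law behaviour.
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(c) for _, c in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

tokens = "the cat sat on the mat and the dog sat on the log".split()
pairs = rank_frequency(tokens)
print(pairs)
print(log_log_slope(pairs))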

Cited by 41 publications (37 citation statements)
References 17 publications

“…For example, promising research directions include using a general law such as the Zipf-Mandelbrot law (Mandelbrot, 1965), a sophisticated model that accounts for the order of words such as hierarchical Pitman-Yor processes (Wood et al., 2011), and smoothing/backoff methods to handle the sparseness problem.…”
Section: Results
confidence: 99%
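
As standard background for the Zipf-Mandelbrot law cited in this excerpt (the formula below is textbook material, not reproduced from the citing paper), the normalized form over N ranked items is

f(k; N, q, s) = \frac{(k + q)^{-s}}{\sum_{i=1}^{N} (i + q)^{-s}}, \qquad k = 1, \dots, N,

where k is the frequency rank, s > 0 controls the rate of decay, and q \ge 0 shifts the ranks; setting q = 0 recovers the plain Zipf law.
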
“…The frequencies of high-order k-grams tend to be lower than in reality. We might need to place a hierarchical assumption on the power law, as is done in hierarchical Pitman-Yor processes (Wood et al., 2011). Fig.…”
Section: Methods
confidence: 99%
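
As hedged background on the hierarchical Pitman-Yor reference in this excerpt, the sketch below shows the standard two-parameter Pitman-Yor (Chinese restaurant) predictive rule, not the cited paper's implementation; the discount parameter d is what induces power-law growth in the number of distinct types.

# Sketch of the two-parameter Pitman-Yor restaurant process; background material only.
import random

def sample_pitman_yor(n, d=0.8, theta=1.0, seed=0):
    # Seat n customers; return the list of table occupancy counts.
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        total = theta + i
        # New table with probability (theta + d * t) / (theta + i), t = current table count.
        if rng.random() < (theta + d * len(tables)) / total:
            tables.append(1)
        else:
            # Existing table k chosen with probability proportional to (c_k - d).
            r = rng.random() * sum(c - d for c in tables)
            acc = 0.0
            for k, c in enumerate(tables):
                acc += c - d
                if r < acc:
                    tables[k] += 1
                    break
    return tables

counts = sample_pitman_yor(10000)
print(len(counts), sorted(counts, reverse=True)[:10])  # many tables, heavy-tailed sizes

With d > 0 the number of occupied tables grows roughly like n^d, which is the power-law behaviour the excerpt alludes to.
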
“…To demonstrate the efficacy of exploiting long-range temporal dependencies in modeling, we compared the performance of our proposed approach with the n-gram temporal model [22]. Fig 6 clearly illustrates the benefit of modeling long-range temporal contexts in the data.…”
Section: Discussion
confidence: 99%
“…On the other hand, if k = i − 1, it puts the entire history of the sequence into consideration. Addressing such long contexts is a valid concern in previous work [22]. However, a recently developed probabilistic model [21] broke through this limitation by exploring an infinite length of context in a discrete data sequence.…”
Section: Joint Segmentation and Classification
confidence: 99%
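
To make the excerpt's distinction concrete, here is a toy illustration with hypothetical helper names (not code from references [21] or [22]): a fixed-order model conditions on only the last k symbols, whereas k = i - 1 amounts to conditioning on the entire preceding history.

# Toy illustration of truncated versus full conditioning contexts.
def truncated_context(seq, i, k):
    # The k symbols immediately preceding position i (k-th order Markov context).
    return tuple(seq[max(0, i - k):i])

def full_context(seq, i):
    # The entire history preceding position i.
    return tuple(seq[:i])

seq = list("abracadabra")
print(truncated_context(seq, 7, 3))  # ('c', 'a', 'd')
print(full_context(seq, 7))          # ('a', 'b', 'r', 'a', 'c', 'a', 'd')
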
“…in [19] and for more general sequence modelling in [20]. It was also shown to have a remarkable connection to interpolated Kneser-Ney, which remains one of the most effective language models more than a decade after it was first proposed [21], [22].…”
Section: Introduction
confidence: 94%
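
The connection to interpolated Kneser-Ney mentioned in this excerpt is usually stated through the hierarchical Pitman-Yor predictive rule; the formulation below is standard background from the hierarchical Pitman-Yor language-modelling literature, not material reproduced from this report.

P(w \mid u) \;=\; \frac{c(uw) - d_{|u|}\, t(uw)}{\theta_{|u|} + c(u\cdot)} \;+\; \frac{\theta_{|u|} + d_{|u|}\, t(u\cdot)}{\theta_{|u|} + c(u\cdot)}\, P\big(w \mid \pi(u)\big),

where c(uw) is the count of symbol w after context u, t(uw) is the corresponding number of Pitman-Yor "tables", c(u\cdot) and t(u\cdot) are their sums over w, \pi(u) drops the oldest symbol from u, and d_{|u|}, \theta_{|u|} are the discount and concentration parameters for contexts of length |u|. Restricting t(uw) = \min(1, c(uw)) and setting \theta_{|u|} = 0 recovers interpolated Kneser-Ney with absolute discounting, which is the connection the excerpt refers to.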