2019
DOI: 10.3390/e21070640

Estimating Predictive Rate–Distortion Curves via Neural Variational Inference

Abstract: The Predictive Rate–Distortion curve quantifies the trade-off between compressing information about the past of a stochastic process and predicting its future accurately. Existing estimation methods for this curve work by clustering finite sequences of observations or by utilizing analytically known causal states. Neither type of approach scales to processes such as natural languages, which have large alphabets and long dependencies, and where the causal states are not known analytically. We describe Neural Pr…
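For orientation, the object in the abstract is commonly stated as an information-bottleneck problem over a code Z of the past; the notation below (past $\overleftarrow{X}$, future $\overrightarrow{X}$) and the choice of lost predictive information as the distortion are assumptions about the standard formulation, not a quotation of the paper's exact definition:

\[
R(D) \;=\; \min_{p(z \mid \overleftarrow{X}) \,:\, I(\overleftarrow{X};\overrightarrow{X}) - I(Z;\overrightarrow{X}) \,\le\, D} I(\overleftarrow{X}; Z),
\qquad Z \perp \overrightarrow{X} \mid \overleftarrow{X}.
\]

Its Lagrangian relaxation is the Predictive Information Bottleneck objective that recurs in the citation statements below:

\[
\min_{p(z \mid \overleftarrow{X})} \; I(\overleftarrow{X}; Z) \;-\; \lambda\, I(Z; \overrightarrow{X}), \qquad \lambda > 0.
\]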

Cited by 10 publications (10 citation statements)
References 50 publications
“…Within rate-distortion theory, models that optimally allocate limited memory resources to best predict the future of a sequence have been studied under the name of the Predictive Information Bottleneck (Still, 2014). The Predictive Information Bottleneck has recently been applied to study the resource requirements for predicting linguistic sequences by Hahn and Futrell (2019).…”
Section: Toward More Sophisticated Noise Models
confidence: 99%
“…We believe the most promising way to instantiate lossy-context surprisal in a broad-coverage model is using RNNs with explicitly constrained memory capacity (e.g., using methods such as those developed by Alemi, Fischer, Dillon, & Murphy, 2017; Hahn & Futrell, 2019). Lossy-context surprisal predicts that such RNNs will yield surprisal values which are more predictive of human reading times than unconstrained RNNs (though any RNN is operating under at least mild memory constraints).…”
Section: Prospects For a Broad-coverage Model
confidence: 99%
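The statement above points to RNN language models whose memory is explicitly limited, citing Alemi et al. (2017) and Hahn and Futrell (2019). As a minimal sketch of that idea, assuming PyTorch and with names, dimensions, and the beta weight chosen for illustration rather than taken from either cited paper, one can route the recurrent state through a stochastic Gaussian bottleneck and penalize its KL divergence from a standard normal prior, which upper-bounds the rate I(past; Z):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BottleneckedRNNLM(nn.Module):
        """Next-token predictor whose recurrent summary of the past is
        squeezed through a stochastic, capacity-limited code z."""
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, code_dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            # Encoder q(z | past): diagonal Gaussian over the code.
            self.to_mu = nn.Linear(hidden_dim, code_dim)
            self.to_logvar = nn.Linear(hidden_dim, code_dim)
            # The next symbol must be predicted from z alone.
            self.readout = nn.Linear(code_dim, vocab_size)

        def forward(self, tokens):
            h, _ = self.rnn(self.embed(tokens))                # (batch, time, hidden)
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            logits = self.readout(z)                           # (batch, time, vocab)
            # KL(q(z | past) || N(0, I)): variational upper bound on the rate.
            kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)
            return logits, kl

    def bottleneck_loss(model, tokens, beta=0.1):
        """Cross-entropy of predicting token t+1 from the code for tokens <= t,
        plus beta times the rate penalty."""
        logits, kl = model(tokens[:, :-1])
        nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                              tokens[:, 1:].reshape(-1))
        return nll + beta * kl.mean()

Sweeping beta and recording the resulting rate and prediction loss after training would, under these assumptions, trace out the kind of memory-constrained behavior the quoted passage has in mind.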
“…There is a hypothesis by Hilberg [16] that the excess entropy of natural language is infinite. This hypothesis can be partly confirmed by the original estimates of conditional entropy by Shannon [17], by the power-law decay of the estimates of the entropy rate given by the PPM compression algorithm [18], by the approximately power-law growth of vocabulary called Heaps' or Herdan's law [2,3,19,20], and by some other experiments applying neural statistical language models [21,22]. In parallel, Dębowski [1,2,3] supposed that the very large excess entropy in natural language may be caused by the fact that texts in natural language describe some relatively slowly evolving and very complex reality.…”
Section: Applications
confidence: 84%
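The quantities in the passage above are usually written as follows; the symbols and generic exponent bounds are standard conventions assumed here, not values asserted in the cited works. Hilberg's hypothesis posits a sub-linear correction to the block entropy of text,

\[
H(X_1^n) \;\approx\; A\,n^{\beta} + h\,n, \qquad 0 < \beta < 1,\; A > 0,
\]

so that the excess entropy $E = \lim_{n \to \infty}\bigl[H(X_1^n) - h\,n\bigr]$ diverges, while Heaps'/Herdan's law describes the approximately power-law growth of vocabulary,

\[
V(n) \;\propto\; n^{\gamma}, \qquad 0 < \gamma < 1,
\]

with $V(n)$ the number of distinct word types among the first $n$ tokens.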
“…Hahn, M. and Futrell, R., Estimating Predictive Rate–Distortion Curves via Neural Variational Inference [12].…”
Section: The Contributions
confidence: 99%