Advances in Neural Information Processing Systems 14 2002
DOI: 10.7551/mitpress/1120.003.0065

Entropy and Inference, Revisited

Abstract: We study properties of popular near-uniform priors for learning undersampled probability distributions on discrete nonmetric spaces and show that they lead to disastrous results. However, an Occam-style phase space argument allows us to salvage the priors, turning these problems into a surprisingly good estimator of the entropies of discrete distributions.

Cited by 78 publications (39 citation statements)
References 12 publications
“…as the Dirichlet distribution over the (K − 1)-simplex with all parameters α_1 = ... = α_K = α. Sampling from this Dirichlet with a fixed α, however, has the undesirable effect of generating distributions with a very narrow distribution of entropy H(X) [59]. To generate distributions with a near-uniform distribution of entropy, we sample α from a Nemenman-Shafee-Bialek (NSB) prior [60], p(α) ∝ Kψ₁(Kα + 1) − ψ₁(α + 1), for a distribution over an alphabet of size K. For simulations of n binary variables, we set K = 2^n, sample α using the equation above, and then sample p_X from a symmetric Dirichlet using standard algorithms.…”
Section: Discussion (mentioning)
confidence: 99%
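A minimal sketch of the sampling recipe quoted above, not taken from the cited papers: it assumes the NSB prior density p(α) ∝ Kψ₁(Kα + 1) − ψ₁(α + 1) (ψ₁ the trigamma function), which is the derivative of the a-priori expected entropy ξ(α) = ψ₀(Kα + 1) − ψ₀(α + 1), so drawing α from the prior reduces to drawing ξ uniformly on (0, log K) and inverting numerically. The function name and truncation bounds are illustrative choices.

```python
import numpy as np
from scipy.special import psi          # digamma function psi_0
from scipy.optimize import brentq

def sample_nsb_dirichlet(K, rng=None):
    """Draw (alpha, p): alpha from the NSB prior, p from Dirichlet(alpha, ..., alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    # xi(alpha) = psi_0(K*alpha + 1) - psi_0(alpha + 1) is the prior expected entropy;
    # the NSB prior is proportional to d(xi)/d(alpha) = K*psi_1(K*alpha + 1) - psi_1(alpha + 1),
    # so sampling alpha from it amounts to sampling xi uniformly on (0, log K) and inverting.
    xi = lambda a: psi(K * a + 1.0) - psi(a + 1.0)
    lo, hi = 1e-10, 1e10                    # truncate alpha's range for numerical stability
    target = rng.uniform(xi(lo), xi(hi))    # ~ Uniform(0, log K) up to the truncation
    alpha = brentq(lambda a: xi(a) - target, lo, hi)
    p = rng.dirichlet(np.full(K, alpha))
    return alpha, p

# Example: n = 10 binary variables -> alphabet of size K = 2**10
alpha, p = sample_nsb_dirichlet(2 ** 10)
```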
“…Fixed-size sliding windows are varied from 1 to 5. Word entropy calculations were made using the NSB estimator [24].…”
Section: Δ(H_von Neumann(G)) = H_von Neumann(G ∪ (u, v)) − H_von Neumann(...) (mentioning)
confidence: 99%
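A small illustrative sketch, not from the cited paper, of how fixed-size sliding windows of 1 to 5 tokens can be counted; these counts are what an entropy estimator such as NSB would then be applied to. The token sequence and function name are made up for the example.

```python
from collections import Counter

def window_counts(tokens, size):
    """Count overlapping windows of `size` consecutive tokens."""
    return Counter(tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1))

tokens = "the cat sat on the mat the cat sat".split()
for size in range(1, 6):
    counts = window_counts(tokens, size)
    print(size, sum(counts.values()), len(counts))   # number of windows vs. distinct windows
```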
“…The maximum likelihood entropy estimator is known to underestimate the true entropy in practical applications [23]. A range of more advanced entropy estimators has been proposed to overcome this limitation [23][24][25][26]. Here, we used the NSB estimator [24] to calculate word entropy.…”
(mentioning)
confidence: 99%
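A brief sketch of the maximum-likelihood ("plug-in") entropy estimator the quote refers to, assuming only its standard definition H_ML = −Σ (n_i/N) log(n_i/N); the simulation parameters are arbitrary and simply illustrate the downward bias when the sample is small relative to the alphabet.

```python
import numpy as np

def ml_entropy(counts):
    """Plug-in (maximum-likelihood) entropy estimate, -sum f_i log f_i, in nats."""
    counts = np.asarray(counts, dtype=float)
    freqs = counts[counts > 0] / counts.sum()
    return -np.sum(freqs * np.log(freqs))

# Undersampled uniform distribution: K = 1000 outcomes, only N = 100 samples.
rng = np.random.default_rng(0)
K, N = 1000, 100
counts = np.bincount(rng.integers(0, K, size=N), minlength=K)
print(ml_entropy(counts), np.log(K))   # the plug-in estimate falls far below the true value log K
```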
“…Using several corpora and tackling some problems of word entropy estimation, [18] provided a public database of entropy values for 1259 languages. Since all entropy estimators are strongly correlated, for our experiments we used the entropy values provided by the NSB estimator [20].…”
Section: Information-theoretic Entropy (mentioning)
confidence: 99%