Approximation of Density Functions by Sequences of Exponential Families

Barron, Andrew R.; Sheu, Chyong-Hwa

doi:10.1214/aos/1176348252

Cited by 174 publications

(218 citation statements)

References 39 publications

Supporting

Mentioning: 213

Contrasting

Unclassified

“…In recent years, theory has been developed in which a parametric family is not restricted to a given size, but rather the dimension of the family is increased at a certain rate as a function of the sample size, so as to get the smallest possible total risk, uniformly over classes of smooth functions (see Cox, 1988; Stone, 1990; Barron and Sheu, 1991). A surprising aspect of this work is that the same rates of convergence of the total risk that are achievable by nonparametric estimators can be achieved by sequences of parametric families.…”

Section: Introduction (mentioning)

confidence: 99%

Untitled

Barron

1994

Machine Learning

Self Cite

Abstract. For a common class of artificial neural networks, the mean integrated squared error between the estimated network and a target function $f$ is shown to be bounded by $O(C_f^2/n) + O((nd/N)\log N)$, where $n$ is the number of nodes, $d$ is the input dimension of the function, $N$ is the number of training observations, and $C_f$ is the first absolute moment of the Fourier magnitude distribution of $f$. The two contributions to this total risk are the approximation error and the estimation error. Approximation error refers to the distance between the target function and the closest neural network function of a given architecture, and estimation error refers to the distance between this ideal network function and an estimated network function. With $n \sim C_f (N/(d \log N))^{1/2}$ nodes, the order of the bound on the mean integrated squared error is optimized to be $O(C_f ((d/N)\log N)^{1/2})$. The bound demonstrates surprisingly favorable properties of network estimation compared to traditional series and nonparametric curve estimation techniques in the case that $d$ is moderately large. Similar bounds are obtained when the number of nodes $n$ is not preselected as a function of $C_f$ (which is generally not known a priori), but rather the number of nodes is optimized from the observed data by the use of a complexity regularization or minimum description length criterion. The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.
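
The trade-off between the two error terms can be made explicit with a short calculation. This is only a sketch: it assumes the bound has the two-term form $R(n) = C_f^2/n + (nd/N)\log N$ with constants absorbed, which is consistent with the optimal rate quoted in the abstract, and it treats $n$ as a continuous variable.

```latex
% Assumed form of the risk bound (constants absorbed):
R(n) = \frac{C_f^2}{n} + \frac{n d}{N}\log N
% Setting the derivative in n to zero balances the two terms:
\frac{dR}{dn} = -\frac{C_f^2}{n^2} + \frac{d}{N}\log N = 0
\quad\Longrightarrow\quad
n^{*} = C_f \left(\frac{N}{d \log N}\right)^{1/2}
% Substituting n^* back, both terms are equal:
R(n^{*}) = 2\, C_f \left(\frac{d}{N}\log N\right)^{1/2}
         = O\!\left(C_f \left(\tfrac{d}{N}\log N\right)^{1/2}\right)
```

At the optimum, the approximation term $C_f^2/n$ and the estimation term $(nd/N)\log N$ contribute equally, which is why the rate depends on $d$ only through the factor $(d/N)^{1/2}$.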

“…One approach, based on the Kullback-Leibler divergence, was first considered by Barron and Sheu [10]. Basically, it uses the following well-known Pythagorean property (see Lemma 3 in [10]):…”

Section: Consistency and Generalization Bounds of Estimation Error (mentioning)

confidence: 99%

“…The restriction, λ ∈ Ω, will guarantee that the maximum likelihood estimate is an interior point of the set of λ's for which p_λ(x) is defined. The optimal solution, p_λ(x), of Equation (1) or Equation (3) is called the information projection [10,17] of p_0(x) to the exponential family, E(x).…”

Section: Introduction (mentioning)

confidence: 99%
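
To make the notion of an information projection concrete, here is a minimal numerical sketch. It is not the method of any of the papers listed here: the function name `information_projection`, the polynomial basis, the Beta(2, 3) target density, the grid discretisation, and the damped Newton iteration are all illustrative assumptions. The code minimises KL(p_0 || p_λ) over the family p_λ(x) ∝ exp(Σ_k λ_k φ_k(x)) on [0, 1]; at the optimum the moments of φ under p_λ match those under p_0.

```python
import numpy as np


def information_projection(p0, basis, grid, max_iter=100, tol=1e-8):
    """Project p0 (values on a uniform grid) onto the exponential family
    p_lam(x) proportional to exp(sum_k lam_k * basis_k(x)) by minimising
    KL(p0 || p_lam). Illustrative sketch using a Riemann-sum discretisation."""
    dx = grid[1] - grid[0]
    phi = np.stack([b(grid) for b in basis], axis=1)   # shape (n_grid, m)
    q0 = p0 / (p0.sum() * dx)                          # renormalise on the grid
    target = phi.T @ q0 * dx                           # E_{p0}[phi]

    def log_partition(lam):
        z = phi @ lam
        m = z.max()                                    # shift for stability
        return m + np.log(np.exp(z - m).sum() * dx)

    def objective(lam):
        # Concave in lam; equals -KL(p0 || p_lam) up to a constant.
        return lam @ target - log_partition(lam)

    lam = np.zeros(phi.shape[1])
    for _ in range(max_iter):
        p = np.exp(phi @ lam - log_partition(lam))     # current density
        mean = phi.T @ p * dx                          # E_{p_lam}[phi]
        grad = target - mean
        if np.abs(grad).max() < tol:
            break
        centered = phi - mean
        cov = (centered * p[:, None]).T @ centered * dx  # Cov_{p_lam}(phi)
        step = np.linalg.solve(cov, grad)              # Newton direction
        t = 1.0
        while objective(lam + t * step) < objective(lam) and t > 1e-8:
            t *= 0.5                                   # backtrack if no gain
        lam = lam + t * step

    p = np.exp(phi @ lam - log_partition(lam))
    return lam, p


# Hypothetical example: project a Beta(2, 3) density onto a quadratic family.
grid = np.linspace(0.0, 1.0, 2001)
p0 = 12.0 * grid * (1.0 - grid) ** 2
basis = [lambda x: x, lambda x: x ** 2]
lam, p_lam = information_projection(p0, basis, grid)
```

Because the fitted density is an exponential of a linear combination of basis functions, it is strictly positive and integrates to one on the grid by construction.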

Consistency and Generalization Bounds for Maximum Entropy Density Estimation

Wang

Greiner

Wang

2013

Entropy

“…In order to impose the non-negative condition, a novel transformation approach is proposed using a log function. This is partially inspired by the method presented in [23].…”

Section: Introduction (mentioning)

confidence: 99%

Negative-free approximation of probability density function for nonlinear projection filter

Kim

Richardson

2016

2016 IEEE 55th Conference on Decision and Control (CDC)

Abstract: Several approaches have been developed to estimate the probability density function (pdf). The pdf has two important properties: its integral over the whole sampling space equals 1, and its value is greater than or equal to zero everywhere in the sampling space. The first constraint is easily achieved by normalisation. On the other hand, it is very hard to impose non-negativity over the sampling space. In pdf estimation, some areas in the sampling space might have negative pdf values, which produces unreasonable quantities such as negative probability or variance. A transformation that guarantees a negative-free pdf over a chosen sampling space is presented and applied to the nonlinear projection filter. The filter approximates the pdf to solve nonlinear estimation problems. For simplicity, a one-dimensional nonlinear system is used as an example to show the derivations, which can be readily generalised to higher-dimensional systems. The efficiency of the proposed method is demonstrated by numerical simulations. The simulations also show that, to achieve the same level of approximation error in the filter, far fewer basis functions are required with the transformation than without it. This is a substantial benefit when the filter is used for high-dimensional systems, since it significantly reduces the computational cost.
