2016
DOI: 10.1007/978-3-319-32859-1_49
Maximum Likelihood Estimates for Gaussian Mixtures Are Transcendental

Abstract: Gaussian mixture models are central to classical statistics, widely used in the information sciences, and have a rich mathematical structure. We examine their maximum likelihood estimates through the lens of algebraic statistics. The MLE is not an algebraic function of the data, so there is no notion of ML degree for these models. The critical points of the likelihood function are transcendental, and there is no bound on their number, even for mixtures of two univariate Gaussians.
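The central object of the abstract can be made concrete with a minimal numeric sketch (an assumed illustration, not code from the paper): the log-likelihood of a mixture of two univariate Gaussians, the function whose critical points are shown to be transcendental in the data.

```python
import numpy as np

def mixture_loglik(data, w, mu1, sigma1, mu2, sigma2):
    """Log-likelihood of a two-component univariate Gaussian mixture.

    Per the paper, the critical points of this function in the
    parameters (w, mu1, sigma1, mu2, sigma2) are transcendental
    functions of the data, so no ML degree exists for the model.
    """
    def pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return np.sum(np.log(w * pdf(data, mu1, sigma1) + (1 - w) * pdf(data, mu2, sigma2)))

# Evaluate at an arbitrary parameter point for a tiny dataset.
data = np.array([-1.0, -0.5, 0.5, 1.0])
print(mixture_loglik(data, 0.5, -0.75, 0.5, 0.75, 0.5))
```

Maximizing this function over the five parameters is the MLE problem whose solutions, by the paper's main result, cannot be expressed as algebraic functions of `data`.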

Cited by 21 publications (25 citation statements)
References 20 publications (33 reference statements)
“…These cubics will be explained in Section 2. For k = 2 we obtain the variety of secant lines, here denoted σ₂(G₁,₆). This represents mixtures of two univariate Gaussians.…”
Section: Introduction
confidence: 99%
“…Theorem 1. The defining polynomial of σ₂(G₁,₆) is a sum of 31154 monomials of degree 39. This polynomial has degrees 25, 33, 32, 23, 17, 12, 9 in m₀, m₁, m₂, m₃, m₄, m₅, m₆ respectively.…”
Section: Introduction
confidence: 99%
“…Minimization of α-divergences allows one to choose a trade-off between mode fitting and support fitting of the minimizer [36]. The minimizer of α-divergences including MLE as a special case has interesting connections with transcendental number theory [37].…”
Section: Bounding the α-Divergence
confidence: 99%
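The α-divergence family mentioned in this snippet can be sketched numerically. The sketch below assumes one common convention, D_α(p‖q) = (1 − ∫p^α q^(1−α) dx) / (α(1−α)), which recovers KL(p‖q), the divergence minimized by MLE, in the limit α → 1; conventions for α-divergences vary across the literature.

```python
import numpy as np

def alpha_divergence(p, q, dx, alpha):
    """Alpha-divergence between two densities given on a uniform grid.

    Assumes the convention D_alpha(p||q) =
    (1 - sum(p^alpha * q^(1-alpha)) * dx) / (alpha * (1 - alpha)),
    valid for 0 < alpha < 1; alpha -> 1 recovers KL(p||q).
    """
    return (1.0 - np.sum(p ** alpha * q ** (1.0 - alpha)) * dx) / (alpha * (1.0 - alpha))

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
p = gauss(x, 0.0, 1.0)
q = gauss(x, 1.0, 1.0)
print(alpha_divergence(p, q, dx, 0.5))
```

For two unit-variance Gaussians the α = 1/2 case has the closed form 4(1 − exp(−Δμ²/8)), which the grid approximation matches closely.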
“…While EM remains the most popular method for estimating GMMs, it only guarantees convergence to a stationary point of the likelihood function. On the other hand, various studies have shown that the likelihood function has bad local maxima that can have arbitrarily worse log-likelihood values compared to any of the global maxima [22,25,2]. More importantly, Jin et al [24] proved that with random initialization, the EM algorithm will converge to a bad critical point with high probability.…”
Section: Introduction
confidence: 99%
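The sensitivity of EM to initialization described above can be seen even in a toy setting. The sketch below (an assumed illustration, not the constructions in the cited works) runs plain EM on a two-component mixture with weights and variances fixed for brevity: an asymmetric start recovers the true means, while a symmetric start is trapped at a degenerate stationary point where both means collapse to the data mean.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data from a well-separated two-component mixture.
data = np.concatenate([rng.normal(-4.0, 1.0, 200), rng.normal(4.0, 1.0, 200)])

def em_two_gaussians(data, mu1, mu2, iters=200):
    """Plain EM for two components, with equal weights and unit
    variances held fixed (a simplification for illustration)."""
    for _ in range(iters):
        # E-step: responsibilities for component 1.
        d1 = np.exp(-0.5 * (data - mu1) ** 2)
        d2 = np.exp(-0.5 * (data - mu2) ** 2)
        r = d1 / (d1 + d2)
        # M-step: responsibility-weighted mean updates.
        mu1 = np.sum(r * data) / np.sum(r)
        mu2 = np.sum((1 - r) * data) / np.sum(1 - r)
    return mu1, mu2

print(em_two_gaussians(data, -1.0, 1.0))  # asymmetric start: finds both clusters
print(em_two_gaussians(data, 0.0, 0.0))   # symmetric start: stuck, means stay equal
```

The symmetric fixed point is an exact stationary point of the likelihood in this restricted model, echoing the quoted observation that EM only guarantees convergence to a stationary point, not a good one.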