2020
DOI: 10.1145/3417994

Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes

Abstract: We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. As an application of this technique, we prove that Θ̃(kd …
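The abstract is truncated above. For reference, the paper's main sample-complexity bound, as restated in the citing statement attributed to [42] further down (K components in a P-dimensional space), can be written in the abstract's k, d, ε notation. This is a minimal LaTeX sketch; the symbol n(k, d, ε) for the sample size is introduced here only for illustration, and Θ̃(·) hides polylogarithmic factors.

```latex
% Near-optimal sample complexity for learning a mixture of k Gaussians in R^d
% to within total variation error \varepsilon (the tilde hides polylog factors).
\[
  n(k, d, \varepsilon) \;=\; \tilde{\Theta}\!\left(\frac{k\,d^{2}}{\varepsilon^{2}}\right)
\]
```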

Cited by 22 publications (28 citation statements). References 27 publications.

“…Note that an advantage of this method is that we can quantify the minimum number of training samples M′ that we need so that the total variation distance from the true distribution is no more than ϵ. Bounds for the minimum number of samples are available in [39]. In our case, this bound is…”
Section: A Gaussian Mixture Model
confidence: 99%
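To make the quoted point concrete, here is a minimal Python sketch of turning a Θ̃(kd²/ε²)-type bound into a rough training-set size estimate. The function name, the constant C, and the particular logarithmic factor are illustrative assumptions, not taken from [39] or the citing work.

```python
import math

def gmm_sample_size_estimate(k: int, d: int, eps: float, C: float = 1.0) -> int:
    """Heuristic minimum training-set size M' for learning a k-component
    Gaussian mixture in d dimensions to total variation error <= eps,
    following the ~Theta(k * d^2 / eps^2) rate. The constant C and the
    log factor below are placeholders, not constants from the cited bound."""
    n = C * k * d**2 / eps**2
    # The tilde hides polylogarithmic factors; include one as a stand-in.
    n *= math.log(max(k * d / eps, math.e))
    return math.ceil(n)

# Example: 5 components in 10 dimensions, target total variation error 0.1
print(gmm_sample_size_estimate(k=5, d=10, eps=0.1))
```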
“…This leads to the following theorem. In terms of sample complexity, we leverage work by [42] that has shown that Θ̃(KP²/ε²) samples are both necessary and sufficient for learning mixed multivariate Gaussian distributions with K components in a P-dimensional feature space with up to ε error in total variation distance. This result implies that learning reasonably accurate models that achieve a low, constant error ε requires relatively few samples in practice, as K and P are assumed to be small compared to N in large-scale datasets.…”
Section: Time Cost and Complexity Analysis
confidence: 99%
“…The second term denotes the distance between the task distribution and the fitted GMM. When the PDP hypothesis holds and the model learns a task well, this term is small, as we can approximate φ(q^(t)) with p̂ (see Ashtiani et al. (2018) for a rigorous analysis of estimating a distribution with a GMM). In other words, this term is small if the classes are learned as concepts.…”
Section: Theoretical Analysis
confidence: 99%
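As a companion to the quoted discussion of approximating a distribution with a fitted GMM, here is a minimal, illustrative Python sketch of fitting a Gaussian mixture as a density estimator. The use of scikit-learn, the synthetic two-component data, and all parameter choices are assumptions for illustration and do not reproduce the citing work's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data standing in for samples from an unknown task distribution.
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(loc=-2.0, scale=0.5, size=(500, 2)),
    rng.normal(loc=+2.0, scale=1.0, size=(500, 2)),
])

# Fit a 2-component GMM as a density estimate of the sample distribution.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(samples)

# Per-sample log-density under the fitted mixture; higher values mean the
# fitted GMM places more probability mass near those points.
log_density = gmm.score_samples(samples)
print("mean log-likelihood under fitted GMM:", log_density.mean())
```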