Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 2017
DOI: 10.1145/3055399.3055491
Learning from untrusted data

Abstract: The vast majority of theoretical results in machine learning and statistics assume that the available training data is a reasonably reliable reflection of the phenomena to be learned or estimated. Similarly, the majority of machine learning and statistical techniques used in practice are brittle to the presence of large amounts of biased or malicious data. In this work we consider two frameworks in which to study estimation, learning, and optimization in the presence of significant fractions of arbitrary data.…

Cited by 200 publications (269 citation statements)
References 47 publications
“…after seeing the sample generated by the Gaussian). The authors in [CSV17] show that corruptions of an arbitrarily large fraction of samples can be tolerated as well, as long as we allow "list decoding" of the parameters of the Gaussian. In particular, they design learning algorithms that work when a (1 − α)-fraction of the samples can be adversarially corrupted, but output a set of poly(1/α) answers, one of which is guaranteed to be accurate.…”
Section: Related Work
confidence: 99%
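The list-decoding guarantee quoted above (output poly(1/α) candidate answers, one of which is accurate, even when a (1 − α)-fraction of samples is adversarial) can be illustrated with a toy sketch. The code below is a hypothetical simplification, not the stochastic-convex-optimization algorithm of [CSV17]: it recovers candidate means by clustering with roughly 1/α centers, which only works under the extra assumption that the corrupted points themselves form well-separated clusters. The function name `list_decode_means` and all numeric choices are illustrative.

```python
# Toy illustration of list-decodable mean estimation in the spirit of
# [CSV17]: when only an alpha-fraction of samples comes from the true
# distribution, no single estimate can succeed, but an algorithm may
# output a short list of candidates, one of which is accurate.
# Hypothetical sketch: clustering with ~1/alpha centers (farthest-first
# initialization + Lloyd's iterations), assuming corrupted points are
# themselves clustered. NOT the algorithm of [CSV17].
import numpy as np

def list_decode_means(samples: np.ndarray, alpha: float, iters: int = 20) -> np.ndarray:
    """Return ~1/alpha candidate means; under the clustering assumption,
    one candidate is close to the mean of the alpha-fraction of inliers."""
    k = int(np.ceil(1.0 / alpha))
    # Farthest-first traversal initialization (deterministic given samples[0]).
    centers = [samples[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(samples - c, axis=1) for c in centers], axis=0)
        centers.append(samples[np.argmax(d)])
    centers = np.array(centers)
    # Lloyd's iterations: assign each point to its nearest center, recompute.
    for _ in range(iters):
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            pts = samples[labels == j]
            if len(pts) > 0:
                centers[j] = pts.mean(axis=0)
    return centers

# Demo: alpha = 0.25 of the samples are inliers around (5, 5); the rest
# are "adversarial" points placed in three far-away clusters.
rng = np.random.default_rng(0)
inliers = rng.normal([5.0, 5.0], 1.0, size=(200, 2))
outliers = np.concatenate([
    rng.normal([-5.0, -5.0], 1.0, size=(200, 2)),
    rng.normal([0.0, 12.0], 1.0, size=(200, 2)),
    rng.normal([12.0, 0.0], 1.0, size=(200, 2)),
])
data = np.concatenate([inliers, outliers])
candidates = list_decode_means(data, alpha=0.25)
# One of the 1/alpha = 4 candidates lands near the true inlier mean (5, 5).
best_err = min(np.linalg.norm(c - np.array([5.0, 5.0])) for c in candidates)
```

The point of the sketch is the shape of the guarantee, not the method: the algorithm cannot know which candidate corresponds to the real data, which is exactly why [CSV17] pair list-decodable learning with the semi-verified model, where a small amount of trusted data disambiguates the list.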
“…Similar to [CSV17], we study a regime where only an arbitrarily small constant fraction of the samples from a normal distribution can be observed. In contrast to [CSV17], however, there is a fixed set S on which the samples are revealed without corruption, and we have oracle access to this set.…”
Section: Related Work
confidence: 99%
“…Balcan et al [BBV08] introduced the notion of list-decodable learning, specifically, the notion of list-clustering. Charikar et al [CSV17] formally defined the notions of list-decodable learning and semi-verified learning, and showed that learning problems in the two models reduce to one another. Charikar et al [CSV17] obtained algorithms for list-decodable learning in the general setting of stochastic convex optimization, and applied the algorithm to a variety of settings including mean estimation, density estimation and planted partition problems (also see [SVC16,SKL17]).…”
Section: Introduction
confidence: 99%
“…Charikar et al [CSV17] formally defined the notions of list-decodable learning and semi-verified learning, and showed that learning problems in the two models reduce to one another. Charikar et al [CSV17] obtained algorithms for list-decodable learning in the general setting of stochastic convex optimization, and applied the algorithm to a variety of settings including mean estimation, density estimation and planted partition problems (also see [SVC16,SKL17]). The same model of list-decodable learning has been studied for the case of mean estimation [KS17b] and Gaussian mixture learning [KS17a,DKS18].…”
Section: Introduction
confidence: 99%