Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 2017
DOI: 10.1145/3055399.3055491
Learning from untrusted data

Abstract: The vast majority of theoretical results in machine learning and statistics assume that the available training data is a reasonably reliable reflection of the phenomena to be learned or estimated. Similarly, the majority of machine learning and statistical techniques used in practice are brittle to the presence of large amounts of biased or malicious data. In this work we consider two frameworks in which to study estimation, learning, and optimization in the presence of significant fractions of arbitrary data.…

Cited by 200 publications (269 citation statements)
References 47 publications
“…after seeing the sample generated by the Gaussian). The authors in [CSV17] show that corruptions of an arbitrarily large fraction of samples can be tolerated as well, as long as we allow "list decoding" of the parameters of the Gaussian. In particular, they design learning algorithms that work when a (1 − α)-fraction of the samples can be adversarially corrupted, but output a set of poly(1/α) answers, one of which is guaranteed to be accurate.…”
Section: Related Work
confidence: 99%
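The list-decoding guarantee quoted above (output poly(1/α) candidate answers, one of which is accurate, even when a (1 − α)-fraction of samples is adversarial) can be illustrated with a toy sketch. The code below is a hypothetical simplification, not the stochastic-convex-optimization algorithm of [CSV17]: it recovers candidate means by clustering with roughly 1/α centers, which only works under the extra assumption that the corrupted points themselves form well-separated clusters. The function name `list_decode_means` and all numeric choices are illustrative.

```python
# Toy illustration of list-decodable mean estimation in the spirit of
# [CSV17]: when only an alpha-fraction of samples comes from the true
# distribution, no single estimate can succeed, but an algorithm may
# output a short list of candidates, one of which is accurate.
# Hypothetical sketch: clustering with ~1/alpha centers (farthest-first
# initialization + Lloyd's iterations), assuming corrupted points are
# themselves clustered. NOT the algorithm of [CSV17].
import numpy as np

def list_decode_means(samples: np.ndarray, alpha: float, iters: int = 20) -> np.ndarray:
    """Return ~1/alpha candidate means; under the clustering assumption,
    one candidate is close to the mean of the alpha-fraction of inliers."""
    k = int(np.ceil(1.0 / alpha))
    # Farthest-first traversal initialization (deterministic given samples[0]).
    centers = [samples[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(samples - c, axis=1) for c in centers], axis=0)
        centers.append(samples[np.argmax(d)])
    centers = np.array(centers)
    # Lloyd's iterations: assign each point to its nearest center, recompute.
    for _ in range(iters):
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            pts = samples[labels == j]
            if len(pts) > 0:
                centers[j] = pts.mean(axis=0)
    return centers

# Demo: alpha = 0.25 of the samples are inliers around (5, 5); the rest
# are "adversarial" points placed in three far-away clusters.
rng = np.random.default_rng(0)
inliers = rng.normal([5.0, 5.0], 1.0, size=(200, 2))
outliers = np.concatenate([
    rng.normal([-5.0, -5.0], 1.0, size=(200, 2)),
    rng.normal([0.0, 12.0], 1.0, size=(200, 2)),
    rng.normal([12.0, 0.0], 1.0, size=(200, 2)),
])
data = np.concatenate([inliers, outliers])
candidates = list_decode_means(data, alpha=0.25)
# One of the 1/alpha = 4 candidates lands near the true inlier mean (5, 5).
best_err = min(np.linalg.norm(c - np.array([5.0, 5.0])) for c in candidates)
```

The point of the sketch is the shape of the guarantee, not the method: the algorithm cannot know which candidate corresponds to the real data, which is exactly why [CSV17] pair list-decodable learning with the semi-verified model, where a small amount of trusted data disambiguates the list.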
“…Similar to [CSV17], we study a regime where only an arbitrarily small constant fraction of the samples from a normal distribution can be observed. In contrast to [CSV17], however, there is a fixed set S on which the samples are revealed without corruption, and we have oracle access to this set.…”
Section: Related Work
confidence: 99%
“…Balcan et al [BBV08] introduced the notion of list-decodable learning, specifically, the notion of list-clustering. Charikar et al [CSV17] formally defined the notions of list-decodable learning and semi-verified learning, and showed that learning problems in the two models reduce to one another. Charikar et al [CSV17] obtained algorithms for list-decodable learning in the general setting of stochastic convex optimization, and applied the algorithm to a variety of settings including mean estimation, density estimation and planted partition problems (also see [SVC16,SKL17]).…”
Section: Introduction
confidence: 99%
“…Charikar et al [CSV17] formally defined the notions of list-decodable learning and semi-verified learning, and showed that learning problems in the two models reduce to one another. Charikar et al [CSV17] obtained algorithms for list-decodable learning in the general setting of stochastic convex optimization, and applied the algorithm to a variety of settings including mean estimation, density estimation and planted partition problems (also see [SVC16,SKL17]). The same model of list-decodable learning has been studied for the case of mean estimation [KS17b] and Gaussian mixture learning [KS17a,DKS18].…”
Section: Introduction
confidence: 99%