2020
DOI: 10.48550/arxiv.2008.01839
Preprint

Sketching Datasets for Large-Scale Learning (long version)

Rémi Gribonval, Antoine Chatalic, Nicolas Keriven, et al.

Abstract: This article considers "sketched learning," or "compressive learning," an approach to large-scale machine learning where datasets are massively compressed before learning (e.g., clustering, classification, or regression) is performed. In particular, a "sketch" is first constructed by computing carefully chosen nonlinear random features (e.g., random Fourier features) and averaging them over the whole dataset. Parameters are then learned from the sketch, without access to the original dataset. This article surv…
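
As a concrete illustration of the pipeline the abstract describes (nonlinear random features averaged into a single vector, from which parameters are later learned), here is a minimal numpy sketch. The Gaussian frequency distribution, the sketch size m, and the scale sigma are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def compute_sketch(X: np.ndarray, m: int, sigma: float = 1.0,
                   seed: int = 0) -> np.ndarray:
    """Average complex random Fourier features exp(i w_j^T x) over the
    whole dataset, producing a single m-dimensional sketch vector."""
    rng = np.random.default_rng(seed)
    _, d = X.shape
    # One random frequency vector w_j ~ N(0, sigma^{-2} I) per sketch entry.
    W = rng.normal(scale=1.0 / sigma, size=(m, d))
    # Nonlinear random features for all points at once: shape (n, m).
    features = np.exp(1j * (X @ W.T))
    # The sketch is the empirical average of the features over the dataset.
    return features.mean(axis=0)

# The dataset (10_000 points in dimension 10 here) is compressed to
# m = 200 numbers; downstream learning would use only `z`, never `X`.
X = np.random.default_rng(1).normal(size=(10_000, 10))
z = compute_sketch(X, m=200)
print(z.shape)  # (200,)
```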

Cited by 2 publications (3 citation statements)
References 51 publications
“…The excess risk of the GMM learning task is then controlled by the sum of an empirical error term and a modeling error term. This guarantees that the estimated GMM approximates well the distribution of the data [19].…”
Section: Recovery Guarantees
Confidence: 70%
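
The quoted guarantee is additive in shape; the display below writes it out schematically in LaTeX. All symbols (the estimate $\hat{\theta}$, the target $\theta^\star$, the risk $\mathcal{R}$, and the two error terms) are notation assumed here for illustration, not taken from the cited paper.

```latex
% Schematic form of the quoted bound; symbol names are assumed here.
\[
  \underbrace{\mathcal{R}(\hat{\theta}) - \mathcal{R}(\theta^\star)}_{\text{excess risk}}
  \;\le\;
  \underbrace{\eta_{\mathrm{emp}}}_{\text{empirical error}}
  \;+\;
  \underbrace{\eta_{\mathrm{mod}}}_{\text{modeling error}}
\]
```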
“…Leveraging ideas from compressive sensing [15] and streaming algorithms [9], R. Gribonval et al. propose a sketching method [25, 19, 20, 18, 17] to compress the training database.…”
Confidence: 99%
“…The weighted variants of the variational (Feldman, Faulkner, and Krause 2011; Zhang et al. 2016; Campbell and Beronov 2019) and sampling-based (McGrory et al. 2014) methods then process the coresets. Reducing D relies on the compression of data into smaller representations via random projections (Siblini, Kuntz, and Meyer 2019; Ayesha, Hanif, and Talib 2020), which is achieved in two ways: (i) each data item is projected into an individual representation (Dasgupta 1999); (ii) all data items are projected into an overall representation, commonly referred to as a sketch (Keriven et al. 2018; Gribonval et al. 2020).…”
Section: More Remarks On Related Work
Confidence: 99%
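
To make the contrast between (i) per-item projections and (ii) an overall sketch concrete, here is a toy numpy comparison. The Gaussian projections and the dimensions are assumptions chosen for illustration; neither line reproduces the cited papers' exact constructions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 50, 10           # n items of dimension d, target dimension k
X = rng.normal(size=(n, d))

# (i) Each data item gets its own k-dimensional random projection
#     (Johnson-Lindenstrauss style): the output keeps one row per item.
R = rng.normal(size=(d, k)) / np.sqrt(k)
per_item = X @ R                 # shape (n, k): grows with the dataset

# (ii) All items are pooled into one overall representation (a "sketch"):
#      nonlinear random features averaged over the whole dataset.
W = rng.normal(size=(d, k))
sketch = np.exp(1j * (X @ W)).mean(axis=0)   # shape (k,): fixed size

print(per_item.shape)  # (1000, 10)
print(sketch.shape)    # (10,)
```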