2020
DOI: 10.1002/wics.1499

Random projections: Data perturbation for classification problems

Abstract: Random projections offer an appealing and flexible approach to a wide range of large-scale statistical problems. They are particularly useful in high-dimensional settings, where we have many covariates recorded for each observation. In classification problems there are two general techniques using random projections. The first involves many projections in an ensemble: the idea here is to aggregate the results after applying different random projections, with the aim of achieving superior statistical accuracy. […]
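
The ensemble technique described in the abstract is straightforward to sketch. The following is a minimal illustration only, not the paper's own method: it uses Gaussian projection matrices, a k-nearest-neighbour base classifier, and a plain majority vote over all projections (the literature also considers selecting good projections rather than averaging over all of them); the function name and parameter defaults are ours.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def rp_ensemble_predict(X_train, y_train, X_test, d=5, B=100, seed=0):
    """Majority vote over B Gaussian random projections (a sketch).

    Assumes class labels are integer-coded 0, 1, ..., K - 1.
    """
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    votes = np.zeros((B, X_test.shape[0]), dtype=int)
    for b in range(B):
        # Projection matrix with N(0, 1/d) entries, mapping the p
        # original covariates down to d dimensions.
        A = rng.normal(scale=1.0 / np.sqrt(d), size=(p, d))
        base = KNeighborsClassifier(n_neighbors=5)
        base.fit(X_train @ A, y_train)
        votes[b] = base.predict(X_test @ A)
    # Each test point receives its most frequent label across the
    # B projected classifiers.
    return np.array([np.bincount(v).argmax() for v in votes.T])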

Cited by 18 publications (8 citation statements)
References 94 publications

Citation statements:
“…A key example is random forests (Breiman, 2001), which randomly subsample features. Another key example is random projection ensemble methods, which explicitly aim to reduce dimension by random feature subsampling (Cannings and Samworth, 2017; Gataric et al., 2020); see Cannings (2021) for a review.…”
Section: Random Projection Methods
confidence: 99%
“…To apply scagnostics to higher-dimensional data sets we derive 10 + √m random two-dimensional projections of the data set and calculate scagnostics on each (where m is the number of dimensions in the data set). Random projections are used for dimensionality reduction in k-means clustering [31], and a class of classification methods uses multiple random projections to create an ensemble of learners [32, 33]. We choose 10 + √m as a number that is sufficient for lower-dimensional data sets while growing for higher-dimensional ones.…”
Section: Extracting Features From Data Sets
confidence: 99%
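
The projection step quoted above is simple to reproduce. Below is a minimal sketch assuming the data sit in a NumPy array with one row per observation; the scagnostic measures themselves are out of scope, so a placeholder summary (the correlation between the two projected coordinates) stands in for them, and the function name is ours.

import numpy as np

def random_2d_views(X, seed=0):
    """Draw 10 + sqrt(m) random two-dimensional views of X, where
    m is the number of columns (dimensions) of the data set."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    n_views = 10 + int(np.sqrt(m))
    summaries = []
    for _ in range(n_views):
        A = rng.normal(size=(m, 2))
        A /= np.linalg.norm(A, axis=0)  # unit-norm projection directions
        Z = X @ A                       # n x 2 projected scatterplot
        # Placeholder in lieu of real scagnostics: correlation
        # between the two projected coordinates.
        summaries.append(np.corrcoef(Z[:, 0], Z[:, 1])[0, 1])
    return np.array(summaries)
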
“…So far we have left unspecified the randomised dimensionality compression scheme to be used. Indeed, the general result presented in this section could potentially be applied to any independently randomised ensemble, including random coordinate projections (Ho, 1998; Tian and Feng, 2021) and various sketching methods (Cormode, 2017; Cannings, 2020). For the bound to be useful, we need to be able to control the compressibility function ψ_P(k).…”
Section: Main Upper Bound
confidence: 99%
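
For concreteness, a random coordinate projection of the kind mentioned above is simply an axis-aligned subsampling of features; repeating it with independent subsets yields an independently randomised ensemble. The helper below is a hypothetical illustration, not code from the cited work.

import numpy as np

def coordinate_projection(X, k, rng):
    """One random coordinate projection: retain a uniformly chosen
    subset of k of the original features (hypothetical helper)."""
    idx = rng.choice(X.shape[1], size=k, replace=False)
    return X[:, idx]
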
“…Such technologies open new doors for dealing with massive high-dimensional data sets, and inspire new research in areas as diverse as numerical analysis (Halko et al., 2011), statistical methodology (Heinze et al., 2016; Cannings and Samworth, 2017; Tian and Feng, 2021), pattern recognition (Reboredo et al., 2016), clustering (Boutsidis et al., 2015; Biau et al., 2008; Meintrup et al., 2019), optimisation (Pilanci and Wainwright, 2015; Wainwright, 2016, 2017; Derezinski et al., 2020), search-based software engineering (Nair et al., 2016), imaging (Lustig et al., 2007; Ye, 2019; Palmer et al., 2015; Bentley et al., 2019), medical research (Peressutti et al., 2015), neuroscience (Arriaga et al., 2015), and computer vision (Jiao et al., 2019). The interested reader may also refer to recent surveys (Gibson et al., 2020; Cannings, 2020) and references therein.…”
Section: Introduction
confidence: 99%