Rare Feature Selection in High Dimensions (2020)
DOI: 10.1080/01621459.2020.1796677

Cited by 35 publications (57 citation statements)
References 37 publications
“…The choice of the ℓ1 penalty was motivated in [15] by the high dimensionality of microbiome data and the desire for parsimonious predictive models. However, such a penalty is not well-suited to situations in which large numbers of features are highly rare [21], a well-known feature of amplicon data. A common remedy, also adopted in [15], is to aggregate taxa at the base level, e.g., OTUs or ASVs, to the genus level and then to screen out all but the most abundant genera.…”
Section: Methods (mentioning)
confidence: 99%
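As a concrete illustration of the aggregate-then-screen remedy described in this statement, here is a minimal sketch; it is not taken from any of the cited papers, and the data layout (a samples-by-OTU count table plus an OTU-to-genus map) and the top_k cutoff are assumptions made for illustration.

import pandas as pd

def aggregate_and_screen(otu_counts: pd.DataFrame,
                         otu_to_genus: pd.Series,
                         top_k: int = 20) -> pd.DataFrame:
    # otu_counts: samples x OTUs count table (columns indexed by OTU/ASV id).
    # otu_to_genus: maps each OTU/ASV id to its genus label (index = OTU id).
    # Step 1: aggregate base-level taxa (OTUs/ASVs) to the genus level by summing counts.
    genus_counts = otu_counts.T.groupby(otu_to_genus).sum().T
    # Step 2: screen out all but the top_k most abundant genera.
    keep = genus_counts.sum(axis=0).nlargest(top_k).index
    return genus_counts[keep]

The screening threshold top_k is hypothetical; the quoted papers do not specify a particular cutoff, and the tree-guided alternative discussed below avoids fixing the aggregation level in advance.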
“…Using OTU/ASVs as the base level, Figure 1A illustrates the typical aggregation-to-genus-level approach whereas Figure 1B shows the prediction-dependent approach. The method is designed to mesh seamlessly with the compositional data analysis framework by combining log-contrast regression [20] with tree-guided regularization, recently put forward in [21]. Thanks to the convexity of the underlying penalized estimation problem, it can deliver interpretable aggregated solutions to large-scale microbiome regression problems in a fast and reproducible manner.…”
Section: Introduction (mentioning)
confidence: 99%
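For readers unfamiliar with the two ingredients named in this statement, the following is a hedged sketch of how log-contrast regression and tree-guided regularization can be combined; the notation (x_ij for relative abundances, a taxonomic tree with nodes u, latent variables gamma_u) is assumed for illustration rather than quoted from the papers.

% Sketch under assumed notation: x_{ij} is the relative abundance of taxon j in sample i.
\begin{aligned}
  y_i &= \sum_{j=1}^{p} \beta_j \log x_{ij} + \varepsilon_i,
  \qquad \sum_{j=1}^{p} \beta_j = 0
  && \text{(log-contrast model with zero-sum constraint)} \\
  \beta_j &= \sum_{u \in \mathrm{ancestors}(j)\,\cup\,\{j\}} \gamma_u
  && \text{(coefficients built from latent values on the tree)} \\
  \hat{\gamma} &= \arg\min_{\gamma}\ \frac{1}{2n}\sum_{i=1}^{n}\Big(y_i - \sum_{j} \beta_j \log x_{ij}\Big)^{2}
  + \lambda \sum_{u} |\gamma_u|
  && \text{(penalty on } \gamma \text{ fuses whole subtrees)}
\end{aligned}

Setting gamma_u = 0 for every node strictly below some internal node makes all taxa in that subtree share a common coefficient, i.e., they are aggregated; the whole problem stays convex, which is the property the statement credits for fast and reproducible solutions.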
“…Yan and Bien [20] are independently developing a similar method involving adding aggregated variables to alter the regularization. Their approach is tailored to text mining problems, and consequently differs from ours in a couple of respects: First, they include an additional penalty for the coefficients at leaf nodes.…”
Section: Adapting Surf To Tree-Structured Data (mentioning)
confidence: 99%
“…In recent years, several tree-guided lasso methods have been developed. For example, TASSO [20] applies an ℓ1 penalty on the sum of coefficients within each possible subtree, while rare [21] applies an ℓ1 penalty on latent variables of nodes to induce subtrees having equal coefficient values. Citrus [22] works on CyTOF data and applies a lasso-regularized regression model [23] to automatically select stratifying subpopulations and cell response features that are the best predictors of a phenotypic outcome.…”
Section: Introduction (mentioning)
confidence: 99%
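To make the contrast drawn in this statement explicit, the two tree-guided penalties can be sketched as follows; the notation (tree T, leaves(u) for the base-level features under node u, binary ancestor matrix A with beta = A gamma) and the alpha-weighted form of the second penalty are assumptions used for illustration, not formulas quoted from the cited papers.

% Assumed notation: tree T, leaves(u) = base-level features under node u,
% A = leaf-by-node binary ancestor matrix so that beta = A gamma.
\begin{aligned}
  \Omega_{\mathrm{TASSO}}(\beta) &= \sum_{u \in T} \Big|\sum_{j \in \mathrm{leaves}(u)} \beta_j\Big|
  && \text{(penalty on the summed coefficients of each subtree)} \\
  \Omega_{\mathrm{rare}}(\gamma, \beta) &= \alpha\,\|\gamma\|_1 + (1-\alpha)\,\|\beta\|_1,
  \qquad \beta = A\gamma
  && \text{(penalty on latent node variables plus a leaf-level term)}
\end{aligned}

Zeroing gamma_u throughout a subtree forces its leaves to share a coefficient, which is the "equal coefficient values" behavior described above, and the leaf-level term corresponds to the additional penalty on leaf-node coefficients that Yan and Bien are described as including.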