2011
DOI: 10.1186/1471-2105-12-412

Prediction using step-wise L1, L2 regularization and feature selection for small data sets with large number of features

Abstract: Background: Machine learning methods are nowadays used for many biological prediction problems involving drugs, ligands or polypeptide segments of a protein. In order to build a prediction model, a so-called training data set of molecules with measured target properties is needed. For many such problems the size of the training data set is limited, as measurements have to be performed in a wet lab. Furthermore, the considered problems are often complex, such that it is not clear which molecular descriptors (featur…


Cited by 81 publications (48 citation statements)
References 23 publications
“…Logistic regression is a commonly applied statistical method that, when used with categorical variables, can be contemplated as a generalized linear model. In a logistic regression it is typical to apply a regularization term, e.g., L1 (the sum of the absolute values of the feature weights) or L2 (the sum of the squared feature weights), that introduces some bias while reducing variance, thereby improving predictive ability (Demir-Kavuk et al, 2011). Isakov et al (2017) used elastic net logistic regression (Zou and Hastie, 2005), which combines L1 and L2 penalties, to prioritize IBD genes.…”
Section: Machine Learning Models (mentioning)
confidence: 99%
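The elastic net logistic regression described in this excerpt can be sketched as follows. This is a minimal illustration, not code from the cited papers: it assumes scikit-learn and uses a synthetic data set standing in for a small, high-dimensional training set.

```python
# Sketch of elastic net logistic regression (Zou and Hastie, 2005 style
# penalty), combining L1 and L2 regularization via scikit-learn.
# All data here are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))       # few samples, many features
w_true = np.zeros(200)
w_true[:5] = 2.0                     # only 5 informative features
y = (X @ w_true + rng.normal(size=40) > 0).astype(int)

# l1_ratio mixes the penalties: 1.0 = pure L1 (LASSO), 0.0 = pure L2 (ridge)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X, y)
print("non-zero coefficients:", int(np.sum(np.abs(clf.coef_) > 1e-8)))
```

The L1 component drives many coefficients to exactly zero (implicit feature selection), while the L2 component stabilizes the fit when features are correlated, which is why the combination is attractive for small data sets with many features.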
“…Large coefficients that deviate from zero are thus penalized in the calculation of the loss function. Three widely used regularization schemes are called LASSO, Ridge, and ElasticNet [11]. Interested readers are referred to a review article that specifically deals with various feature selection methods [12].…”
Section: Feature Selection (mentioning)
confidence: 99%
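The three penalty terms named in this excerpt differ only in how the coefficient vector is aggregated. A small sketch (example weight vector and mixing parameter are hypothetical):

```python
# LASSO, Ridge, and ElasticNet penalties computed on an example
# coefficient vector. The alpha mixing value is an illustrative choice.
import numpy as np

w = np.array([0.0, -1.5, 2.0, 0.0, 0.5])  # example coefficient vector

l1 = np.sum(np.abs(w))       # LASSO penalty: sum of absolute weights
l2 = np.sum(w ** 2)          # Ridge penalty: sum of squared weights
alpha = 0.5                  # hypothetical mixing parameter
elastic = alpha * l1 + (1 - alpha) * l2   # ElasticNet combines both

print(l1, l2, elastic)       # 4.0 6.5 5.25
```

Because the L1 term is non-differentiable at zero, it tends to produce exactly-zero coefficients, whereas the smooth L2 term only shrinks them, which is the sense in which large deviations from zero are penalized.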
“…The majority of our logistic regression models employed regularized regression to address this problem, with common choices including L1 (LASSO) and L2 (ridge) regularization (Demir-Kavuk et al 2011), as well as stepwise subset selection. In each of our logistic regressions, we standardized all the feature values by season prior to model fitting.…”
Section: Logistic Regression (mentioning)
confidence: 99%
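The workflow in this excerpt, standardizing features before fitting regularized logistic regressions, can be sketched as below. This is an assumption-laden illustration: the cited work standardized by season, which is approximated here by a single global standardization over synthetic data, using scikit-learn.

```python
# Sketch: standardize features, then fit L1 (LASSO) and L2 (ridge)
# regularized logistic regressions. Synthetic data, for illustration only.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 10))
y = (X[:, 0] > 5.0).astype(int)

# Standardization puts all features on one scale, so a single penalty
# strength C affects every coefficient comparably.
X_std = StandardScaler().fit_transform(X)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_std, y)
ridge = LogisticRegression(penalty="l2", C=1.0).fit(X_std, y)
print("L1 non-zeros:", int(np.sum(lasso.coef_ != 0)),
      "| L2 non-zeros:", int(np.sum(ridge.coef_ != 0)))
```

Standardizing before penalized fitting matters because both penalties act on raw coefficient magnitudes: features on larger scales would otherwise be penalized more heavily for the same predictive contribution.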