2011
DOI: 10.1186/1471-2105-12-412

Prediction using step-wise L1, L2 regularization and feature selection for small data sets with large number of features

Abstract: Background: Machine learning methods are nowadays used for many biological prediction problems involving drugs, ligands or polypeptide segments of a protein. In order to build a prediction model, a so-called training data set of molecules with measured target properties is needed. For many such problems the size of the training data set is limited, as measurements have to be performed in a wet lab. Furthermore, the considered problems are often complex, such that it is not clear which molecular descriptors (featur…


Cited by 81 publications (48 citation statements)
References 23 publications
“…Logistic regression is a commonly applied statistical method that, when used with categorical variables, can be contemplated as a generalized linear model. In a logistic regression it is typical to apply a regularization term, e.g., L1 (the sum of the absolute values of the feature weights) or L2 (the sum of the squared feature weights), that introduces some bias while reducing variance, thereby improving predictive ability (Demir-Kavuk et al, 2011). Isakov et al (2017) used elastic net logistic regression (Zou and Hastie, 2005), which combines L1 and L2 penalties, to prioritize IBD genes.…”
Section: Machine Learning Models (mentioning)
confidence: 99%
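The elastic net logistic regression described in this excerpt can be sketched as follows. This is a minimal illustration, not code from the cited papers: it assumes scikit-learn and uses a synthetic data set standing in for a small, high-dimensional training set.

```python
# Sketch of elastic net logistic regression (Zou and Hastie, 2005 style
# penalty), combining L1 and L2 regularization via scikit-learn.
# All data here are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))       # few samples, many features
w_true = np.zeros(200)
w_true[:5] = 2.0                     # only 5 informative features
y = (X @ w_true + rng.normal(size=40) > 0).astype(int)

# l1_ratio mixes the penalties: 1.0 = pure L1 (LASSO), 0.0 = pure L2 (ridge)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X, y)
print("non-zero coefficients:", int(np.sum(np.abs(clf.coef_) > 1e-8)))
```

The L1 component drives many coefficients to exactly zero (implicit feature selection), while the L2 component stabilizes the fit when features are correlated, which is why the combination is attractive for small data sets with many features.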
“…Large coefficients that deviate from zero are thus penalized in the calculation of the loss function. Three widely used regularization schemes are called LASSO, Ridge, and ElasticNet [11]. Interested readers are referred to a review article that specifically deals with various feature selection methods [12].…”
Section: Feature Selection (mentioning)
confidence: 99%
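The three penalty terms named in this excerpt differ only in how the coefficient vector is aggregated. A small sketch (example weight vector and mixing parameter are hypothetical):

```python
# LASSO, Ridge, and ElasticNet penalties computed on an example
# coefficient vector. The alpha mixing value is an illustrative choice.
import numpy as np

w = np.array([0.0, -1.5, 2.0, 0.0, 0.5])  # example coefficient vector

l1 = np.sum(np.abs(w))       # LASSO penalty: sum of absolute weights
l2 = np.sum(w ** 2)          # Ridge penalty: sum of squared weights
alpha = 0.5                  # hypothetical mixing parameter
elastic = alpha * l1 + (1 - alpha) * l2   # ElasticNet combines both

print(l1, l2, elastic)       # 4.0 6.5 5.25
```

Because the L1 term is non-differentiable at zero, it tends to produce exactly-zero coefficients, whereas the smooth L2 term only shrinks them, which is the sense in which large deviations from zero are penalized.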
“…The majority of our logistic regression models employed regularized regression to address this problem, with common choices including L1 (LASSO) and L2 (ridge) regularization (Demir-Kavuk et al 2011), as well as stepwise subset selection. In each of our logistic regressions, we standardized all the feature values by season prior to model fitting.…”
Section: Logistic Regression (mentioning)
confidence: 99%
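The workflow in this excerpt, standardizing features before fitting regularized logistic regressions, can be sketched as below. This is an assumption-laden illustration: the cited work standardized by season, which is approximated here by a single global standardization over synthetic data, using scikit-learn.

```python
# Sketch: standardize features, then fit L1 (LASSO) and L2 (ridge)
# regularized logistic regressions. Synthetic data, for illustration only.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 10))
y = (X[:, 0] > 5.0).astype(int)

# Standardization puts all features on one scale, so a single penalty
# strength C affects every coefficient comparably.
X_std = StandardScaler().fit_transform(X)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_std, y)
ridge = LogisticRegression(penalty="l2", C=1.0).fit(X_std, y)
print("L1 non-zeros:", int(np.sum(lasso.coef_ != 0)),
      "| L2 non-zeros:", int(np.sum(ridge.coef_ != 0)))
```

Standardizing before penalized fitting matters because both penalties act on raw coefficient magnitudes: features on larger scales would otherwise be penalized more heavily for the same predictive contribution.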