2018
DOI: 10.1186/s12863-018-0646-3

Data mining and machine learning approaches for the integration of genome-wide association and methylation data: methodology and main conclusions from GAW20

Abstract: Background: Multiple layers of genetic and epigenetic variability are being simultaneously explored in an increasing number of health studies. We summarize here different approaches applied in the Data Mining and Machine Learning group at the GAW20 to integrate genome-wide genotype and methylation array data. Results: We provide a non-intimidating introduction to some frequently used methods to investigate high-dimensional molecular data and compare the different approaches tried by group members: random forest, de…

Cited by 3 publications
(3 citation statements)
References 13 publications
“…After removal of records with missing values and/or outliers, RFE-RF was applied to the processed dataset to reveal the predictors for DGF. In brief, RFE-RF ranks features by VIS, which is calculated based on final predictive accuracy and determines the optimal number of predictors in an arbitrarily pre-defined search space 61 . In this study, we tested 5, 10, 15, 20, 25, 30, and 126 features with tenfold cross-validation and used mean ROC-AUC as the performance metric.…”
Section: Methods | Citation type: mentioning
confidence: 99%
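The search procedure described in this statement (recursive feature elimination with a random forest, evaluated at a fixed list of subset sizes with tenfold cross-validation and mean ROC-AUC) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the citing authors' code: the function name `rfe_rf_search`, the synthetic data, the tree count, and the elimination step are all assumptions. Note also that scikit-learn's random forest ranks features by impurity-based importance by default, which differs from the accuracy-based VIS mentioned in the quote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def rfe_rf_search(X, y, sizes, n_splits=10, seed=0):
    """Hypothetical helper: mean ROC-AUC for each candidate subset size.

    Feature elimination sits inside the pipeline, so it is refit on the
    training part of every fold and never sees the held-out fold.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for k in sizes:
        k = min(k, X.shape[1])  # cannot keep more features than exist
        pipe = Pipeline([
            # RFE ranks features by the forest's impurity-based importances
            # (an assumption; the quoted study describes an accuracy-based VIS).
            ("rfe", RFE(RandomForestClassifier(n_estimators=20, random_state=seed),
                        n_features_to_select=k, step=5)),
            # Final classifier trained on the retained features only.
            ("rf", RandomForestClassifier(n_estimators=20, random_state=seed)),
        ])
        auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
        scores[k] = float(auc.mean())
    best = max(scores, key=scores.get)  # subset size with the highest mean AUC
    return scores, best


# Synthetic stand-in data (assumption): 200 samples, 30 features, 5 informative.
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
scores, best = rfe_rf_search(X, y, sizes=[5, 10, 30])
print(scores, best)
```

The candidate sizes here are scaled down for the toy data; the quoted study used 5, 10, 15, 20, 25, 30, and 126 features. Because the grid is fixed in advance, the selected size is only the best among the values tried, which is the "arbitrarily pre-defined search space" the statement refers to.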
“…Meanwhile, it is prone to underfitting, the accuracy is not very high, and it is unable to handle a large number of multi-class features or variables well. Based on these, more and more machine learning methods are used in the field of medical statistics ( Watson et al, 2019 ), ( Darst et al, 2018 ). In this study, we also directly chose machine learning instead of logistic regression analysis as the main method to screen ESCC-related m6A regulators.…”
Section: Discussion | Citation type: mentioning
confidence: 99%
“…As the focus of the workshop, simulated data based on the GOLDN cohort was used to rigorously evaluate the statistical methods used, and the real data were used to deliver new insights into fibrate response [95]. While a review of all the statistical methods used in the workshop is beyond the scope of this paper, Cherlin et al [96] and Darst et al [97] provide excellent perspectives on the diversity of methods employed. The following outlines some of the major findings of the 20th GAW.…”
Section: Complex Modeling Through the Genetic Analysis Workshop | Citation type: mentioning
confidence: 99%