An ensemble learning approach jointly modeling main and interaction effects in genetic association studies

Zhang, Zhaogong; Zhang, Shuanglin; Wong, M. W.; Wareham, Nicholas J.; Sha, Qiuying

doi:10.1002/gepi.20304

Cited by 16 publications

(16 citation statements)

References 57 publications

“…Use of such techniques is just beginning to emerge: ridge regression [11] has been used for distinguishing between causative and non-causative variants for quantitative phenotypes, and penalized logistic and least angle regression have been used for identifying gene-gene interactions in binary traits [7,12]. A closely-related Bayesian penalized regression procedure [13] has also been suggested for genome-wide and/or fine-mapping studies.…”

Section: Discussion (mentioning)

confidence: 99%

Analysis of North American Rheumatoid Arthritis Consortium data using a penalized logistic regression approach

Croiseau

Cordell

2009

BMC Proc

We applied a penalized regression approach to single-nucleotide polymorphisms in regions on chromosomes 1, 6, and 9 of the North American Rheumatoid Arthritis Consortium data. Results were compared with a standard single-locus association test. Overall, the penalized regression approach did not appear to offer any advantage with respect to either detection or localization of disease-associated polymorphisms, compared with the single-locus approach.

“…There are also a few Bayesian studies for modeling nonlinear, non-additive or interaction covariate effects (Chen et al, 2012; Chipman, George & McCulloch, 1998; Gustafson, 2000). Finally, there is a recent and rich literature for detecting epistasis in GWAS association studies: (Li, Horstman & Chen, 2011; Ueki & Cordell, 2012; Yung et al, 2011; Zhang et al 2008; 2010a; 2010b; 2011). However, all these existing methods, except for the one by Chen et al (2012), are not designed for analyzing time-to-event outcomes, possibly with censoring.…”

Section: Introduction (mentioning)

confidence: 99%

“…RSF, being derived from RF, naturally inherits many of its important properties. One of them is that, being fully non-parametric, it is model-assumption free, and, as an (ensemble) tree-based modeling approach, it is suited to adaptively discover non-monotonic, nonlinear, and non-additive or high-order interaction-effects (Ishwaran et al, 2008; Zhang et al, 2008; Li, Horstman & Chen, 2011). Therefore, these approaches provide a natural alternative to build models that bypass the need to impose parametric constraints on the underlying distributions and a way to automatically deal with high-level interactions; both of which should ultimately result in more accurate predictions and more efficient epistasis detections.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this study, we took advantage of the aforementioned properties and capabilities of ensemble tree models, such as RF and RSF, as done by others (Zhang et al, 2008; Li, Horstman & Chen, 2011). Specifically, we used RSF to model right-censored time-to-event outcome data and introduce a novel approach that makes use of RF concepts of variable importance (Breiman, 2001) and minimal depth of maximal subtree (Ishwaran, 2007) for selecting and ranking pairwise interaction statistics.…”

Section: Introduction (mentioning)

confidence: 99%

Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting

Dazard

Ishwaran

Mehlotra

et al. 2018

Statistical Applications in Genetics and Molecular Biology

Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variables interactions associated with clinical time-to-events outcomes, we borrowed established concepts from random survival forest (RSF) models. We introduce a novel RSF-based pairwise interaction estimator and derive a randomization method with bootstrap confidence intervals for inferring interaction significance. Using various linear and nonlinear time-to-events survival models in simulation studies, we first show the efficiency of our approach: true pairwise interaction-effects between variables are uncovered, while they may not be accompanied with their corresponding main-effects, and may not be detected by standard semi-parametric regression modeling and test statistics used in survival analysis. Moreover, using a RSF-based cross-validation scheme for generating prediction estimators, we show that informative predictors may be inferred. We applied our approach to an HIV cohort study recording key host gene polymorphisms and their association with HIV change of tropism or AIDS progression. Altogether, this shows how linear or nonlinear pairwise statistical interactions of variables may be efficiently detected with a predictive value in observational studies with time-to-event outcomes.

“…To address this issue, many tools from the statistical and machine learning literature, developed to deal with high-dimensional search spaces, have been applied to multi-marker SNP data, for example neural networks (Ritchie et al, 2003b; North et al, 2003; Tomita et al, 2004), random forests (Breiman, 2001; Lunetta et al, 2004; Bureau et al, 2005; Chen et al, 2007), and various other methods based on partitions, trees, and splines, and ensembles of base learners (e.g. Chen et al, 2003; Cook et al, 2004; Zhang et al, 2008). Some approaches to delineate higher order interactions were specifically developed for SNP data, such as the multifactor dimensionality reduction techniques (Hahn et al, 2003; Ritchie et al, 2003a; Moore, 2004; Ritchie and Motsinger, 2005; Ritchie, 2005), the restricted partition method (Culverhouse et al, 2004, 2007), and logic regression (Kooperberg et al, 2001; Ruczinski et al, 2003, 2004).…”

Section: Introduction (mentioning)

confidence: 99%

Efficient Simulation of Epistatic Interactions in Case-Parent Trios

Schwender

Louis

et al. 2013

Hum Hered

Statistical approaches to evaluate interactions between single nucleotide polymorphisms (SNPs) and SNP-environment interactions are of great importance in genetic association studies, as susceptibility to complex disease might be related to the interaction of multiple SNPs and/or environmental factors. With these methods under active development, algorithms to simulate genomic data sets are needed to ensure proper type I error control of newly proposed methods and to compare power with existing methods. In this paper we propose an efficient method for a haplotype-based simulation of case-parent trios when the disease risk is thought to depend on possibly higher-order epistatic interactions or gene-environment interactions with binary exposures.
