Fast and accurate exhaustive higher-order epistasis search with BitEpi

Bayat, Arash; Hosking, Brendan; Jain, Yatish; Hosking, Cameron; Kodikara, Milindi; Reti, Daniel; Twine, Natalie A.; Bauer, Denis C.

doi:10.1038/s41598-021-94959-y

Cited by 14 publications

(13 citation statements)

References 28 publications


“…More experience in its application would be required to ascertain this. Early applications of the method (Lundberg et al, 2022) have led to promising results in that using RFlocalfdr, in tandem with other methods (Bayat et al, 2021, 2020), captured more phenotypic variance in Alzheimer's disease than standard GWAS analysis methods. In addition, there is evidence that these findings replicate in an independent data set.…”

Section: Discussion (mentioning)

confidence: 99%

Thresholding Gini Variable Importance with a single trained Random Forest: An Empirical Bayes Approach

Dunne, Reguant, Ramarao-Milne et al. 2022

Preprint

Self Cite


Random Forests (RF) are a very widely used modelling tool. Lundberg et al. (2019) conclude that no nonlinear model had a more widespread popularity, from health care to academia to industry, than random forests and decision trees. The bounds of the methodology are still being extended. Bayat et al. (2020) give an example with 80 million variables. It is highly desirable that RF models be made more interpretable, and a large part of that is a better understanding of the characteristics of the variable importance measures generated by the RF. Due to its speed and ease of calculation, we consider the mean decrease in node "impurity" (MDI) variable importance (VI) and address the question of setting a significance level. The report is organized as follows:
• We first consider the question of multiple testing in the case of multiple measurements made on two groups (the standard microarray set up, Efron (2008)). We show that some standard approaches for multiple testing can fail severely due to the correlation structure of the measurement and other modelling failures. We show that deriving the null distribution by permutation does not fix the problem. This point applies to determining the null distribution of variable importances as well as many other statistical tests;
• We show that variable correlation can either increase or decrease the MDI of variables in different settings. We also show that there is an additional problem with the permutation null due to the functional relationships between the statistics;
• We consider the empirical Bayes argument of Efron (2005) and model the VI as a mixture of two distributions, a null and a non-null distribution. We find that unlike the relatively well behaved case considered in Efron's papers, there are a number of issues here:
– the distribution may be multi-modal, which creates modelling difficulties;
– the null distribution is not of an obvious form, as it is not symmetric.
• We resolve these issues to derive a fast, plausible, empirical Bayes method for selecting significant variables while controlling the false discovery rate.


“…When compared to existing epistasis detection software, Fiuncho offers support for a wider scope of application with no limit on the target epistasis size, and performs the fastest of all programs considered in this study. For example, on average, Fiuncho is seven times faster than its predecessor, MPI3SNP [4], three times faster than BitEpi [5] and 242 times faster than MDR [3]. Moreover, the speedups over BitEpi and MDR could be multiplied if larger experiments on multinode environments were considered, as they are restricted to the hardware resources available in a single node.…”

Section: Discussion (mentioning)

confidence: 99%

“…At last, the performance of Fiuncho was compared with other exhaustive epistasis detection tools from the literature: MPI3SNP [4], MDR [3] and BitEpi [5].…”

Section: Comparison With Other Software (mentioning)

confidence: 99%

Fiuncho: a program for any-order epistasis detection in CPU clusters

Ponte-Fernández, González-Domínguez, Martín 2022

Preprint


Epistasis can be defined as the statistical interaction of genes during the expression of a phenotype. It is believed that it plays a fundamental role in gene expression, as individual genetic variants have reported a very small increase in disease risk in previous Genome-Wide Association Studies. The most successful approach to epistasis detection is the exhaustive method, although its exponential time complexity requires a highly parallel implementation in order to be used. This work presents Fiuncho, a program that exploits all levels of parallelism present in x86_64 CPU clusters in order to mitigate the complexity of this approach. It supports epistasis interactions of any order, and when compared with other exhaustive methods, it is on average 242, 7 and 3 times faster than MDR, MPI3SNP and BitEpi, respectively.


“…Indeed, due to the exponential complexity involved in the higher-order exhaustive search algorithms, they are not applicable to large datasets. To address the above issues, several parametric modelling approaches (11, 12), machine learning algorithms (8, 10), and combinatorial optimizations (13, 14) have been explored. But they are exclusively designed/used for detecting binary or higher-order interactions in case–control studies.…”

Section: Introduction (mentioning)

confidence: 99%

Epi-MEIF: detecting higher order epistatic interactions for complex traits using mixed effect conditional inference forests

Saha, Perrin, Röder et al. 2022

Nucleic Acids Research


Understanding the relationship between genetic variations and variations in complex and quantitative phenotypes remains an ongoing challenge. While Genome-wide association studies (GWAS) have become a vital tool for identifying single-locus associations, we lack methods for identifying epistatic interactions. In this article, we propose a novel method for higher-order epistasis detection using mixed effect conditional inference forest (epiMEIF). The proposed method is fitted on a group of single nucleotide polymorphisms (SNPs) potentially associated with the phenotype and the tree structure in the forest facilitates the identification of n-way interactions between the SNPs. Additional testing strategies further improve the robustness of the method. We demonstrate its ability to detect true n-way interactions via extensive simulations in both cross-sectional and longitudinal synthetic datasets. This is further illustrated in an application to reveal epistatic interactions from natural variations of cardiac traits in flies (Drosophila). Overall, the method provides a generalized way to identify higher-order interactions from any GWAS data, thereby greatly improving the detection of the genetic architecture underlying complex phenotypes.
