Abstract: Background: Detecting and visualizing nonlinear interaction effects of single nucleotide polymorphisms (SNPs), or epistatic interactions, is an important topic in bioinformatics because such interactions play an important role in unraveling the mystery of “missing heritability”. However, related studies are almost entirely limited to pairwise epistatic interactions because of methodological and computational challenges. Results: We develop CINOEDV (Co-Information based N-Order Epistasis Detector and Visualizer) for the detection and visual…
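The abstract is cut off before it explains how CINOEDV scores SNP combinations, so the following is only a minimal sketch of the standard three-way co-information between two SNPs and a phenotype, computed from empirical frequencies by inclusion-exclusion over joint entropies. It assumes the sign convention I(A;B;Y) = I(A;B) - I(A;B|Y); some authors use the opposite sign (interaction information), and CINOEDV's own convention and any normalization are not stated here. The function names `entropy` and `co_information` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Empirical joint Shannon entropy, in bits, of one or more equal-length discrete columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum((c / n) * log2(c / n) for c in joint.values())


def co_information(snp_a, snp_b, phenotype):
    """Three-way co-information I(A;B;Y) = I(A;B) - I(A;B|Y), expanded by
    inclusion-exclusion over marginal, pairwise, and three-way joint entropies."""
    return (entropy(snp_a) + entropy(snp_b) + entropy(phenotype)
            - entropy(snp_a, snp_b) - entropy(snp_a, phenotype) - entropy(snp_b, phenotype)
            + entropy(snp_a, snp_b, phenotype))
```

For a purely epistatic XOR-like pattern, in which neither SNP alone carries information about the phenotype, `co_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0])` returns -1.0 bit under this convention; single-SNP screens would miss exactly this kind of interaction.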
“…Investigating the gene–gene interactions of diseases and cancers could facilitate the understanding of epistasis in populations in the field of systems biology 5 , 6 . Statistical methods, data mining, and machine learning have been used to detect epistasis in family-based and case-control studies, such as the co-information-based n-order epistasis detector and visualizer (CINOEDV) 7 , the support vector machine-based method EpiMiner 8 , and others 9 .…”
Epistasis within disease-related genes (gene–gene interactions) can be detected with contingency table measures based on multifactor dimensionality reduction (MDR) using single-nucleotide polymorphisms (SNPs). Most MDR-based methods use a single contingency table measure to detect gene–gene interactions; however, some gene–gene interactions may require multiple contingency table measures to be identified. In this study, a multiobjective differential evolution method (called MODEMDR) was proposed to combine various MDR-based contingency table measures to detect significant gene–gene interactions. Two contingency table measures, namely the correct classification rate and normalized mutual information, were selected to design the fitness functions in MODEMDR. Multiobjective optimization enables MODEMDR to use multiple measures simultaneously and to detect significant gene–gene interactions efficiently within a reasonable time frame. Epistatic models with and without marginal effects under various parameter settings (heritability and minor allele frequencies) were used to compare MODEMDR with existing methods in terms of the detection success rate of gene–gene interactions. The results on the simulated datasets show that MODEMDR is superior to the existing methods. Moreover, a large dataset obtained from the Wellcome Trust Case Control Consortium was used to assess MODEMDR, which efficiently identified significant gene–gene interactions in this genome-wide association study setting.
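As an illustration of the two contingency table measures, the sketch below applies the classic MDR step to one SNP pair: each two-locus genotype cell is labelled high-risk when its case:control ratio exceeds the overall ratio, and the resulting predictions are scored with the correct classification rate (CCR) and a normalized mutual information (NMI). The NMI normalization used here, I(prediction; phenotype) / H(phenotype), is an assumption, as is the plain-accuracy form of CCR; MODEMDR's exact definitions, any cross-validation, and its differential evolution search are not reproduced. `non_dominated` shows the Pareto-dominance test a multiobjective method relies on: it keeps the candidates for which no other candidate is at least as good on every measure and strictly better on one. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def mdr_predict(snp_a, snp_b, labels):
    """Classic MDR step for one SNP pair: a two-locus genotype cell is high-risk (1)
    when its case:control ratio exceeds the overall case:control ratio, else low-risk (0)."""
    cases = Counter((a, b) for a, b, y in zip(snp_a, snp_b, labels) if y == 1)
    controls = Counter((a, b) for a, b, y in zip(snp_a, snp_b, labels) if y == 0)
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    # Cross-multiplied ratio comparison avoids dividing by an empty control cell.
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] * n_ctrl > controls[cell] * n_case}
    return [1 if (a, b) in high_risk else 0 for a, b in zip(snp_a, snp_b)]


def ccr(pred, labels):
    """Correct classification rate: fraction of samples whose MDR label equals the phenotype."""
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)


def _entropy(*cols):
    joint = Counter(zip(*cols))
    n = sum(joint.values())
    return -sum((c / n) * log2(c / n) for c in joint.values())


def nmi(pred, labels):
    """Assumed normalization: I(pred; phenotype) / H(phenotype)."""
    h_y = _entropy(labels)
    if h_y == 0:
        return 0.0
    mutual_info = _entropy(pred) + h_y - _entropy(pred, labels)
    return mutual_info / h_y


def non_dominated(scores):
    """Indices of score tuples that no other tuple Pareto-dominates (all objectives maximized)."""
    def dominates(u, v):
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]
```

On an XOR-style toy pair such as `snp_a = [0, 0, 1, 1, 0, 0, 1, 1]`, `snp_b = [0, 1, 0, 1, 0, 1, 0, 1]` with `labels = [0, 1, 1, 0, 0, 1, 1, 0]`, the MDR labels reproduce the phenotype exactly, so both CCR and this NMI equal 1.0. Scoring many candidate pairs as `(ccr, nmi)` tuples and passing them to `non_dominated` returns the Pareto front.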
“…For the SNP data matrix, a row represents genotypes of a sample and a column represents a SNP. Genotypes of a sample are usually coded as 0, 1, 2, 3, corresponding to missing data, homozygous common genotype (e.g., AA), heterozygous genotype (e.g., Aa and aA), and homozygous minor genotype (e.g., aa) [2], [20], [65]. The sample labels matrix has only one column listing the binary phenotype of each sample, where 0 denotes control and 1 denotes case.…”
Section: B. SNP Data for Epistasis Detection
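A minimal sketch of the data layout described in the excerpt: genotype strings are mapped to the codes 0 (missing), 1 (homozygous common), 2 (heterozygous), and 3 (homozygous minor), and one SNP column is then tallied separately for controls (0) and cases (1), giving the per-SNP contingency table that most detection measures start from. Real genotype files use each SNP's own allele letters; the single `A`/`a` lookup table and the helper names are hypothetical simplifications.

```python
# Coding scheme from the excerpt: 0 = missing, 1 = homozygous common,
# 2 = heterozygous, 3 = homozygous minor. The "A"/"a" alleles are placeholders.
GENOTYPE_CODE = {"AA": 1, "Aa": 2, "aA": 2, "aa": 3}


def encode_samples(genotype_rows):
    """Turn rows of genotype strings (one row per sample, one column per SNP)
    into the integer SNP data matrix; anything unrecognized is coded 0 (missing)."""
    return [[GENOTYPE_CODE.get(g, 0) for g in row] for row in genotype_rows]


def genotype_counts(matrix, labels, snp):
    """Count each genotype code (index 0-3) of one SNP column, separately for
    controls (label 0) and cases (label 1)."""
    counts = {0: [0] * 4, 1: [0] * 4}
    for row, y in zip(matrix, labels):
        counts[y][row[snp]] += 1
    return counts
```

For example, `encode_samples([["AA", "Aa"], ["aa", "??"], ["aA", "AA"]])` gives `[[1, 2], [3, 0], [2, 1]]`, and with labels `[0, 1, 1]` the first SNP column tallies to `{0: [0, 1, 0, 0], 1: [0, 0, 1, 1]}`.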
“…Specifically, pheromones are stored as a square matrix, whose dimensionality is equal to the SNP number N , to reflect association strengths between two-SNP combinations and the phenotype. This means that formulas (1), (2) and (3) should be slightly adjusted:…”
Section: B. Pheromone Updating Rules, 1) Pheromone Deposition
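The adjusted formulas (1) to (3) are not reproduced in the excerpt, so this is only a sketch of a generic Ant System style update applied to the N × N pheromone matrix it describes: every entry evaporates by a factor (1 - rho), then each ant deposits an amount equal to its fitness on the two-SNP combination it selected, mirrored so the matrix stays symmetric. The evaporation rate `rho`, the use of raw fitness as the deposit, and all names are assumptions rather than the cited method's rules.

```python
def update_pheromone(tau, ant_pairs, fitness, rho=0.05):
    """Update a symmetric N x N pheromone matrix in place and return it.

    tau       -- list of N lists of N floats; tau[i][j] scores the SNP pair (i, j)
    ant_pairs -- the (i, j) SNP pair each ant selected in this iteration
    fitness   -- one non-negative fitness value per ant, used directly as its deposit
    rho       -- evaporation rate in (0, 1); the default is a hypothetical choice
    """
    n = len(tau)
    # Evaporation: every association strength decays by the same factor.
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    # Deposition: mirror each deposit so tau[i][j] == tau[j][i] is preserved.
    for (i, j), f in zip(ant_pairs, fitness):
        tau[i][j] += f
        if i != j:
            tau[j][i] += f
    return tau
```

Starting from a 3 × 3 matrix of ones, `update_pheromone(tau, [(0, 2)], [0.5], rho=0.1)` leaves 0.9 in every entry except `tau[0][2]` and `tau[2][0]`, which become 1.4 (up to floating-point rounding). Evaporation keeps early, possibly spurious, deposits from dominating later iterations.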
Detection of epistatic interactions, defined as nonlinear interactive effects of single nucleotide polymorphisms (SNPs), is increasingly recognized as an important route to capturing the underlying genetic causes of complex diseases. Its methodological and computational challenges are well understood, and many methods have been proposed from different perspectives. Among them, ant colony optimization (ACO)-based methods are promising because of their controllable time complexity, heuristic positive-feedback search, and high detection power. Nevertheless, no comprehensive overview of them exists so far. This paper therefore provides a systematic review of 25 ACO-based epistasis detection methods. First, the generic ACO algorithm and its application to detecting epistatic interactions are briefly described. Then, ACO-based methods for detecting epistatic interactions are reviewed in depth from four aspects: path selection strategies, pheromone updating rules, fitness functions, and two-stage designs. Finally, this paper analyzes the strengths and limitations of the methods involved, provides guidelines for applying them, and offers several views on future directions for epistasis detection methods. INDEX TERMS Ant colony optimization (ACO), epistatic interactions, single nucleotide polymorphisms (SNPs), heuristic information, genome-wide association studies (GWAS).
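The generic ACO path-selection step reviewed here can be sketched as a roulette-wheel draw: an ant builds a k-SNP combination by repeatedly choosing an unselected SNP with probability proportional to tau[i]**alpha * eta[i]**beta, where tau is pheromone and eta is heuristic information. The per-SNP vectors, the exponents `alpha` and `beta`, and the function name are illustrative assumptions; the reviewed methods differ in how they define eta and in their selection strategies.

```python
import random


def select_snps(tau, eta, k, alpha=1.0, beta=1.0, rng=random):
    """Roulette-wheel path construction for one ant.

    Picks k distinct SNP indices; at each step an unselected SNP i is drawn with
    probability proportional to tau[i] ** alpha * eta[i] ** beta.
    tau and eta are per-SNP pheromone and heuristic values (hypothetical layout);
    the weights of the remaining SNPs must not all be zero.
    """
    remaining = list(range(len(tau)))
    chosen = []
    for _ in range(k):
        weights = [tau[i] ** alpha * eta[i] ** beta for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)
```

Passing `rng=random.Random(seed)` makes a run reproducible. Raising `alpha` makes ants follow accumulated pheromone more greedily (stronger positive feedback), while raising `beta` weights the heuristic information more heavily.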
“…To tackle these challenges, some algorithms were developed to detect synergistic SNP combinations associated with complex diseases. The majority of these methods can be classified into three categories: exhaustive methods [ 7 , 8 , 9 , 10 , 11 ], filtering methods (SNPHarvester) [ 12 , 13 ], or artificial intelligence (including swarm intelligence and heuristic search methods) [ 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 ].…”
Section: Introduction
“…Artificial intelligence algorithms, such as Bayesian epistasis association mapping (BEAM) [ 14 ], ant colony optimization-based epistatic interaction (AntEpiSeeker) [ 15 ], cuckoo search epistasis (CSE) [ 16 ], multi-objective ant colony optimization epistasis detection (MACOED) [ 17 ], fast harmony search algorithm-based SNP epistasis detection (FHSA-SED) [ 18 ], niche harmony search algorithm-based high-order SNP combination detection (NHSA-DHSC) [ 19 ], Co-Information based N-Order epistasis detector and visualizer (CINOEDV) [ 20 ], and high-order interaction seeker (HiSeeker) [ 22 ] have attracted attention for detecting high-order epistatic interactions because they reduce the computational burden by not examining every SNP combination. However, these algorithms are often sensitive to parameter settings and easily trapped in local optima [ 23 , 24 ].…”
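The computational burden these heuristics avoid follows from simple counting: an exhaustive search over N SNPs must score C(N, k) combinations of order k. The panel size of 500,000 SNPs below is a hypothetical, GWAS-scale assumption, and the function name is illustrative.

```python
from math import comb


def n_combinations(n_snps, order):
    """Number of distinct order-k SNP combinations an exhaustive search would have to score."""
    return comb(n_snps, order)


# Hypothetical GWAS-scale panel of 500,000 SNPs.
for k in (2, 3, 4):
    print(f"order {k}: {n_combinations(500_000, k):,} combinations")
```

For 500,000 SNPs this gives 124,999,750,000 pairs (about 1.25 × 10^11) and roughly 2.1 × 10^16 triples, which illustrates why exhaustive search beyond pairs quickly becomes impractical.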
Detecting high-order epistasis in genome-wide association studies (GWASs) is important for characterizing complex human diseases. However, the enormous number of possible single-nucleotide polymorphism (SNP) combinations and the diversity among diseases present a significant computational challenge. Herein, a fast method for detecting high-order epistasis based on an interaction weight (FDHE-IW) is evaluated in the detection of SNP combinations associated with disease. First, the symmetrical uncertainty (SU) value of each SNP is calculated. Then, the top-k SNPs are selected as guides to identify 2-way SNP combinations with significant interaction weight values. Next, a forward search is employed to detect high-order SNP combinations with significant interaction weight values as candidates. Finally, the candidates are statistically evaluated using a G-test to isolate true positives. The developed algorithm was applied to 12 simulated datasets and an age-related macular degeneration (AMD) dataset and was shown to perform robustly in detecting some high-order disease-causing models.
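Two of the FDHE-IW steps use standard formulas and can be sketched directly: symmetrical uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), for ranking SNPs against the phenotype and keeping the top k, and the G statistic, G = 2 Σ O ln(O / E), over the genotype-combination-by-phenotype table. The interaction weight and the forward search are not defined in the abstract and are not sketched; the function names are hypothetical.

```python
from collections import Counter
from math import log, log2


def entropy(*cols):
    """Empirical joint Shannon entropy, in bits, of equal-length discrete columns."""
    joint = Counter(zip(*cols))
    n = sum(joint.values())
    return -sum((c / n) * log2(c / n) for c in joint.values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(x, y)
    return 2.0 * mutual_info / (hx + hy)


def top_k_snps(matrix, labels, k):
    """Rank SNP columns of a samples-by-SNPs matrix by SU with the phenotype and keep
    the k best (ties are broken in favour of the higher column index)."""
    scores = [(symmetrical_uncertainty([row[j] for row in matrix], labels), j)
              for j in range(len(matrix[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def g_statistic(snp_cols, labels):
    """G = 2 * sum(O * ln(O / E)) over the (genotype combination x phenotype) table.
    Empty cells contribute zero, so only observed cells are visited."""
    combos = list(zip(*snp_cols))
    observed = Counter(zip(combos, labels))
    row_tot, col_tot, n = Counter(combos), Counter(labels), len(labels)
    g = 0.0
    for (combo, y), o in observed.items():
        expected = row_tot[combo] * col_tot[y] / n
        g += o * log(o / expected)
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return 2.0 * g, dof
```

A p-value follows from comparing G with a chi-square distribution with the returned degrees of freedom, for instance via `scipy.stats.chi2.sf(g, dof)`.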