Genome-wide association study is especially challenging in detecting high-order disease-causing models due to model diversity, possible low or even no marginal effect of the model, and extraordinary search and computations. In this paper, we propose a niche harmony search algorithm where joint entropy is utilized as a heuristic factor to guide the search for low or no marginal effect model, and two computationally lightweight scores are selected to evaluate and adapt to diverse of disease models. In order to obtain all possible suspected pathogenic models, niche technique merges with HS, which serves as a taboo region to avoid HS trapping into local search. From the resultant set of candidate SNP-combinations, we use G-test statistic for testing true positives. Experiments were performed on twenty typical simulation datasets in which 12 models are with marginal effect and eight ones are with no marginal effect. Our results indicate that the proposed algorithm has very high detection power for searching suspected disease models in the first stage and it is superior to some typical existing approaches in both detection power and CPU runtime for all these datasets. Application to age-related macular degeneration (AMD) demonstrates our method is promising in detecting high-order disease-causing models.
MotivationTwo-locus model is a typical significant disease model to be identified in genome-wide association study (GWAS). Due to intensive computational burden and diversity of disease models, existing methods have drawbacks on low detection power, high computation cost, and preference for some types of disease models.MethodIn this study, two scoring functions (Bayesian network based K2-score and Gini-score) are used for characterizing two SNP locus as a candidate model, the two criteria are adopted simultaneously for improving identification power and tackling the preference problem to disease models. Harmony search algorithm (HSA) is improved for quickly finding the most likely candidate models among all two-locus models, in which a local search algorithm with two-dimensional tabu table is presented to avoid repeatedly evaluating some disease models that have strong marginal effect. Finally G-test statistic is used to further test the candidate models.ResultsWe investigate our method named FHSA-SED on 82 simulated datasets and a real AMD dataset, and compare it with two typical methods (MACOED and CSE) which have been developed recently based on swarm intelligent search algorithm. The results of simulation experiments indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, evaluation times, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method has identified two SNPs (rs3775652 and rs10511467) that may be also associated with disease in AMD dataset.
Detecting high-order epistasis in genome-wide association studies (GWASs) is of importance when characterizing complex human diseases. However, the enormous numbers of possible single-nucleotide polymorphism (SNP) combinations and the diversity among diseases presents a significant computational challenge. Herein, a fast method for detecting high-order epistasis based on an interaction weight (FDHE-IW) method is evaluated in the detection of SNP combinations associated with disease. First, the symmetrical uncertainty (SU) value for each SNP is calculated. Then, the top-k SNPs are isolated as guiders to identify 2-way SNP combinations with significant interaction weight values. Next, a forward search is employed to detect high-order SNP combinations with significant interaction weight values as candidates. Finally, the findings were statistically evaluated using a G-test to isolate true positives. The developed algorithm was used to evaluate 12 simulated datasets and an age-related macular degeneration (AMD) dataset and was shown to perform robustly in the detection of some high-order disease-causing models.
Motivation Recently, multi-objective swarm intelligence optimization (SIO) algorithms have attracted considerable attention as disease model-free methods for detecting high-order single nucleotide polymorphism (SNP) interactions. However, a strict Pareto optimal set may filter out some of the SNP combinations associated with disease status. Furthermore, the lack of heuristic factors for finding SNP interactions and the preference for discrimination approaches to disease models are considerable challenges for SIO. Method In this study, we propose a multi-population harmony search (HS) algorithm dedicated to the detection of high-order SNP interactions (MP-HS-DHSI). This method consists of three stages. In the 1st stage, HS with multi-population (multi-harmony memories) is used to discover a set of candidate high-order SNP combinations having an association with disease status. In HS, multiple criteria (Bayesian network-based K2-score, Jensen-Shannon (JS) divergence, likelihood ratio (LR) and normalized distance with joint entropy (ND-JE)) are adopted by four harmony memories to improve the ability to discriminate diverse disease models. A novel evaluation criterion named ND-JE is proposed to guide HS to explore clues for high-order SNP interactions. In the 2nd and 3rd stages, the G-test statistical method and multifactor dimensionality reduction (MDR) are employed to verify the authenticity of the candidate solutions, respectively. Results We compared MP-HS-DHSI with four state-of-the-art SIO algorithms for detecting high-order SNP interactions for 20 simulation disease models and a real dataset of age-related macular degeneration (AMD). The experimental results revealed that our proposed method can accelerate the search speed efficiently and enhance the discrimination ability of diverse epistasis models. Availability and implementation https://github.com/shouhengtuo/MP-HS-DHSI. Supplementary information Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.