Background
Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had some success, but much of the genetic architecture of common disease remains unexplained. Attention is now turning to detecting SNPs that impact disease susceptibility in the context of other genetic factors and environmental exposures. These context-dependent genetic effects can manifest themselves as non-additive interactions, which are more challenging to model using parametric statistical approaches. Considering many SNPs simultaneously produces a multitude of genotype combinations, and the resulting high dimensionality renders these approaches underpowered. We previously developed the multifactor dimensionality reduction (MDR) approach as a nonparametric and genetic model-free machine learning alternative. Approaches such as MDR can improve the power to detect gene-gene interactions but are limited in their ability to exhaustively consider SNP combinations in genome-wide association studies (GWAS), due to the combinatorial explosion of the search space. We introduce here a stochastic search algorithm called Crush for the application of MDR to modeling high-order gene-gene interactions in genome-wide data. The Crush-MDR approach uses expert knowledge to guide probabilistic searches within a framework that capitalizes on the use of biological knowledge to filter gene sets prior to analysis.
Here we evaluated the ability of Crush-MDR to detect hierarchical sets of interacting SNPs using a biology-based simulation strategy that assumes non-additive interactions within genes and additivity in genetic effects between sets of genes within a biochemical pathway.
Results
We show that Crush-MDR is able to identify genetic effects at the gene or pathway level significantly better than a baseline random search with the same number of model evaluations. We then applied the same methodology to a GWAS for Alzheimer’s disease and showed, as a baseline validation, that Crush-MDR was able to identify a set of interacting genes with biological ties to Alzheimer’s disease.
Conclusions
We discuss the role of stochastic search and cloud computing for detecting complex genetic effects in genome-wide data.
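The core idea described above can be illustrated with a minimal sketch. It is not the Crush algorithm itself, whose details the abstract does not give. It assumes the commonly described MDR step (pool each multilocus genotype cell into high risk or low risk by comparing its case:control ratio with the overall ratio) and pairs it with a simple weighted random search in which per-SNP weights stand in for expert knowledge. The function names, the `weights` parameter, and the use of training balanced accuracy without cross-validation are all assumptions made for illustration.

```python
import random
from collections import Counter


def mdr_balanced_accuracy(genotypes, labels, snps):
    """Score one SNP combination with a simplified MDR step.

    genotypes: list of per-sample lists of genotype codes (e.g. 0, 1, 2)
    labels:    list of 0 (control) / 1 (case); both classes must be present
    snps:      tuple of column indices to combine
    Returns the training balanced accuracy (no cross-validation here).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count cases and controls in each multilocus genotype cell.
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[j] for j in snps)
        (cases if y == 1 else controls)[cell] += 1

    # Pool cells into high risk / low risk, reducing many cells to one
    # binary attribute, then score that attribute as a classifier.
    tp = tn = 0
    for cell in set(cases) | set(controls):
        ca, co = cases[cell], controls[cell]
        if ca > threshold * co:   # high-risk cell: predict "case"
            tp += ca
        else:                     # low-risk cell: predict "control"
            tn += co
    return 0.5 * (tp / n_cases + tn / n_controls)


def weighted_search(genotypes, labels, weights, order=2, n_evals=200, seed=0):
    """Hypothetical knowledge-guided random search over SNP combinations.

    weights: one non-negative score per SNP (a stand-in for expert
             knowledge); at least `order` SNPs must have positive weight.
    Returns (best_score, best_snp_tuple).
    """
    rng = random.Random(seed)
    indices = range(len(weights))
    best = (0.0, None)
    for _ in range(n_evals):
        chosen = set()
        while len(chosen) < order:  # sample distinct SNPs by weight
            chosen.add(rng.choices(indices, weights=weights, k=1)[0])
        snps = tuple(sorted(chosen))
        score = mdr_balanced_accuracy(genotypes, labels, snps)
        if score > best[0]:
            best = (score, snps)
    return best
```

Setting all weights equal turns this sketch into an unguided random search, which corresponds in spirit to the baseline the abstract compares against under the same number of model evaluations.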
The genetic history of prehistoric and protohistoric Korean populations is not well understood due to the lack of ancient Korean genomes. Here, we report the first paleogenomic data from Korea: eight shotgun-sequenced genomes (0.7X to 6.1X coverage) from two archaeological sites in Gimhae, the Yuha-ri shell mound and the Daesung-dong tumuli, the latter being the most important funerary complex of the Gaya confederacy. All eight individuals are from the Korean Three Kingdoms period (4th-7th century CE), during which there is archaeological evidence of extensive trade connections with both northern (modern-day China) and eastern (modern-day Japan) kingdoms. All genomes are best modeled as an admixture between a northern-Chinese Iron Age genetic source and a Japanese-Jomon-related ancestry. The proportion of Jomon-related ancestry suggests the presence of two genetic groups within the population. The observed substructure indicates diversity among the Gaya population that is not related to either social status or sex.
DNA-assisted identification of historical remains requires the genetic analysis of highly degraded DNA, along with a comparison to DNA from known relatives. This can be achieved by targeting single nucleotide polymorphisms (SNPs) using a hybridization capture and next-generation sequencing approach suitable for degraded skeletal samples. In the present study, two SNP capture panels were designed to target ∼25,000 (25K) and ∼95,000 (95K) autosomal SNPs, respectively, to enable distant kinship estimation (up to 4th degree relatives). Low-coverage SNP data were successfully recovered from 14 skeletal elements 75 years postmortem, with captured DNA having mean insert sizes ranging from 32 to 170 bp across the 14 samples. SNP comparison with DNA from known family references was performed in the Parabon Fχ Forensic Analysis Platform, which utilizes a likelihood approach for kinship prediction that was optimized for low-coverage sequencing data with DNA damage. The 25K and 95K panels produced 15,000 and 42,000 SNPs on average, respectively, allowing for accurate kinship prediction in 17 and 19 of the 21 pairwise comparisons. Whole genome sequencing was not able to produce sufficient SNP data for accurate kinship prediction, demonstrating that hybridization capture is necessary for historical samples. This study provides the groundwork for the expansion of research involving compromised samples to include SNP hybridization capture.
Author Summary
Our study evaluates ancient DNA techniques involving SNP capture and next-generation sequencing for use in forensic identification. We utilized bone samples from 14 sets of previously identified historical remains aged 70 years postmortem for low-coverage SNP genotyping and extended kinship analysis. We performed whole genome sequencing and hybridization capture with two SNP panels, one targeting ∼25,000 SNPs and the other targeting ∼95,000 SNPs, to assess SNP recovery and accuracy in kinship estimation.
A genotype likelihood approach was utilized for SNP profiling of degraded DNA characterized by cytosine deamination typical of ancient and historical specimens. Family reference samples from known relatives up to 4th degree were genotyped using a SNP microarray. We then utilized the Parabon Fχ Forensic Analysis Platform to perform pairwise comparisons of all bone and reference samples for kinship prediction. The results showed that both capture panels facilitated accurate kinship prediction in more than 80% of the tested relationships without producing false positive matches (or adventitious hits), which were commonly observed in the whole genome sequencing comparisons. We demonstrate that SNP capture can be an effective method for genotyping of historical remains for distant kinship analysis with known relatives, which will support humanitarian efforts and forensic identification.
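To make the genotype likelihood idea concrete, the following is a minimal sketch of the textbook likelihood for a biallelic SNP from low-coverage reads. It is not the Parabon Fχ implementation, which the summary does not describe, and it does not model cytosine deamination. The function names and the `(base, error_prob)` read representation are assumptions made for illustration, and each error probability is assumed to be strictly between 0 and 1.

```python
import math


def genotype_log_likelihoods(reads, ref, alt):
    """Log-likelihoods of a site's reads under 0, 1, or 2 copies of `alt`.

    reads: list of (base, error_prob) pairs covering one SNP
    ref, alt: the two alleles of the biallelic SNP
    A read samples either chromosome with equal probability; a sequencing
    error turns the true base into each of the other three bases with
    probability error_prob / 3.
    """
    log_likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        total = 0.0
        for base, err in reads:
            p_given_ref = 1 - err if base == ref else err / 3
            p_given_alt = 1 - err if base == alt else err / 3
            total += math.log(frac_alt * p_given_alt
                              + (1 - frac_alt) * p_given_ref)
        log_likelihoods.append(total)
    return log_likelihoods


def normalized_likelihoods(log_likelihoods):
    """Rescale log-likelihoods to relative likelihoods that sum to 1."""
    top = max(log_likelihoods)
    weights = [math.exp(x - top) for x in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single read, the homozygous and heterozygous genotypes that contain the observed allele both keep substantial likelihood, which is why carrying likelihoods forward into the kinship comparison can be preferable to making hard genotype calls at low coverage.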