The variance component tests used in genome-wide association studies (GWAS) including large sample sizes become computationally exhaustive when the number of genetic markers is over a few hundred thousand. We present an extremely fast variance components-based two-step method, GRAMMAR-Gamma, developed as an analytical approximation within a framework of the score test approach. Using simulated and real human GWAS data sets, we show that this method provides unbiased estimates of the SNP effect and has a power close to that of the likelihood ratio test-based method. The computational complexity of our method is close to its theoretical minimum, that is, to the complexity of the analysis that ignores genetic structure. The running time of our method linearly depends on sample size, whereas this dependency is quadratic for other existing methods. Simulations suggest that GRAMMAR-Gamma may be used for association testing in whole-genome resequencing studies of large human cohorts.
In the Victorian era, Sir Francis Galton showed that 'when dealing with the transmission of stature from parents to children, the average height of the two parents, y is all we need care to know about them' (1886). One hundred and twenty-two years after Galton's work was published, 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people, as characterized by the area under the receiver-operating characteristic curve (AUC). In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. We have also explored how much variance a genomic profile should explain to reach certain AUC values. For highly heritable traits such as height, we conclude that in applications in which parental phenotypic information is available (eg, medicine), the Victorian Galton's method will long stay unsurpassed, in terms of both discriminative accuracy and costs. For less heritable traits, and in situations in which parental information is not available (eg, forensics), genomic methods may provide an alternative, given that the variants determining an essential proportion of the trait's variation can be identified.
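The link between the variance a predictor explains and its AUC can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes a normally distributed trait, a normally distributed predictor that explains a fraction `r2` of the trait variance, and a hypothetical definition of "tall" as the top 5% of the trait distribution. The function name and all parameters are illustrative.

```python
import numpy as np

def simulated_auc(r2, tall_fraction=0.05, n=100_000, seed=0):
    """Estimate the AUC of a predictor explaining a fraction r2 of trait variance.

    Assumptions (not from the abstract): trait and predictor are jointly
    normal, and "tall" means the top `tall_fraction` of the trait.
    """
    rng = np.random.default_rng(seed)
    score = rng.standard_normal(n)   # predictor (genomic profile or mid-parental value)
    noise = rng.standard_normal(n)   # everything the predictor does not capture
    height = np.sqrt(r2) * score + np.sqrt(1.0 - r2) * noise

    # Label the upper tail of the simulated trait as "tall".
    is_tall = height > np.quantile(height, 1.0 - tall_fraction)

    # AUC from the Mann-Whitney U statistic on predictor ranks
    # (scores are continuous, so ties are not a concern here).
    ranks = np.empty(n)
    ranks[np.argsort(score)] = np.arange(1, n + 1)
    n_pos = int(is_tall.sum())
    n_neg = n - n_pos
    u = ranks[is_tall].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

if __name__ == "__main__":
    for r2 in (0.05, 0.40):
        print(f"variance explained {r2:.2f}: simulated AUC {simulated_auc(r2):.3f}")
```

Under these assumptions, a predictor explaining about 5% of the variance should yield a visibly lower AUC than one explaining 40%, which is the qualitative contrast the abstract draws between the genomic profile and the mid-parental value.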
Motivation: The huge number of genome-wide association study (GWAS) summary statistics freely available in databases provides new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of GWAS summary statistics as input. Results: We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. Availability and implementation: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. Supplementary information: Supplementary data are available at Bioinformatics online.
The Eurasian common shrew (Sorex araneus L.) is characterized by spectacular chromosomal variation, both autosomal variation of the Robertsonian type and an XX/XY₁Y₂ system of sex determination. It is an important mammalian model of chromosomal and genome evolution as it is one of the few species with a complete genome sequence. Here we generate a high-precision cytological recombination map for the species, the third such map produced in mammals, following those for humans and house mice. We prepared synaptonemal complex (SC) spreads of meiotic chromosomes from 638 spermatocytes of 22 males of nine different Robertsonian karyotypes, identifying each autosome arm by differential DAPI staining. Altogether we mapped 13,983 recombination sites along 7095 individual autosomes, using immunolocalization of MLH1, a mismatch repair protein marking recombination sites. We estimated the total recombination length of the shrew genome as 1145 cM. The majority of bivalents showed a high recombination frequency near the telomeres and a low frequency near the centromeres. The distances between MLH1 foci were consistent with crossover interference both within chromosome arms and across the centromere in metacentric bivalents. The pattern of recombination along a chromosome arm was a function of its length, interference, and centromere and telomere effects. The specific DNA sequence must also be important because chromosome arms of the same length differed substantially in their recombination pattern. These features of recombination show great similarity with humans and mice and suggest generality among mammals. However, contrary to a widespread perception, the metacentric bivalent tu usually lacked an MLH1 focus on one of its chromosome arms, arguing against a minimum requirement of one chiasma per chromosome arm for correct segregation.
With regard to autosomal chromosomal variation, the chromosomes showing Robertsonian polymorphism display MLH1 foci that become increasingly distal when comparing acrocentric homozygotes, heterozygotes, and metacentric homozygotes. Within the sex trivalent XY₁Y₂, the autosomal part of the complex behaves similarly to other autosomes.
Regional-based association analysis instead of individual testing of each SNP was introduced in genome-wide association studies to increase the power of gene mapping, especially for rare genetic variants. For regional association tests, the kernel machine-based regression approach was recently proposed as a more powerful alternative to collapsing-based methods. However, the vast majority of existing algorithms and software for the kernel machine-based regression are applicable only to unrelated samples. In this paper, we present a new method for the kernel machine-based regression association analysis of quantitative traits in samples of related individuals. The method is based on the GRAMMAR+ transformation of phenotypes of related individuals, followed by use of existing kernel machine-based regression software for unrelated samples. We compared the performance of kernel-based association analysis on the material of the Genetic Analysis Workshop 17 family sample and real human data by using our transformation, the original untransformed trait, and environmental residuals. We demonstrated that only the GRAMMAR+ transformation produced type I errors close to the nominal value and that this method had the highest empirical power. The new method can be applied to analysis of related samples by using existing software for kernel-based association analysis developed for unrelated samples.
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to samples of related individuals. To this end, we additionally include random polygenic effects in the functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with a small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R-function ‘famFLM’ using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The ‘famFLM’ function is distributed under GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
Background: The design of new, highly productive livestock breeds that are well adapted to local climatic conditions is one of the aims of modern agriculture and breeding. The genetics underlying economically important traits in cattle are widely studied, whereas our knowledge of the genetic mechanisms of adaptation to local environments is still scarce. To address this issue for cold climates we used an integrated approach for detecting genomic intervals related to body temperature maintenance under acute cold stress. Our approach combined genome-wide association studies (GWAS) and scans for signatures of selection applied to a cattle population (Hereford and Kazakh Whiteheaded beef breeds) bred in Siberia. We utilized the GGP HD150K DNA chip containing 139,376 single nucleotide polymorphism markers. Results: We detected a single candidate region on cattle chromosome (BTA)15 overlapping between the GWAS results and the results of scans for selective sweeps. This region contains two genes, MSANTD4 and GRIA4. Both genes are functional candidates to contribute to the cold-stress resistance phenotype, due to their indirect involvement in the cold shock response (MSANTD4) and body thermoregulation (GRIA4). Conclusions: Our results point to a novel region on BTA15 which is a candidate region associated with the body temperature maintenance phenotype in Siberian cattle. The results of our research and the follow-up studies might be used for the development of cattle breeds better adapted to cold climates of the Russian Federation and other Northern countries with similar climates. Electronic supplementary material: The online version of this article (10.1186/s12863-019-0725-0) contains supplementary material, which is available to authorized users.
The kernel machine-based regression is an efficient approach to region-based association analysis aimed at identification of rare genetic variants. However, this method is computationally complex. The running time of kernel-based association analysis becomes especially long for samples with genetic (sub)structures, thus increasing the need to develop new and effective methods, algorithms, and software packages. We have developed a new R-package called fast family-based sequence kernel association test (FFBSKAT) for analysis of quantitative traits in samples of related individuals. This software implements a score-based variance component test to assess the association of a given set of single nucleotide polymorphisms with a continuous phenotype. We compared the performance of our software with that of two existing software packages for family-based sequence kernel association testing, namely, ASKAT and famSKAT, using the Genetic Analysis Workshop 17 family sample. Results demonstrate that FFBSKAT is several times faster than other available programs. In addition, the calculations of the three compared packages were similarly accurate. With respect to the available analysis modes, we combined the advantages of both ASKAT and famSKAT and added new options to empower FFBSKAT users. The FFBSKAT package is fast, user-friendly, and provides an easy-to-use method to perform whole-exome kernel machine-based regression association analysis of quantitative traits in samples of related individuals. The FFBSKAT package, along with its manual, is available for free download at http://mga.bionet.nsc.ru/soft/FFBSKAT/.
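A generic score-based variance component statistic of the kind described above can be sketched as follows. This is a textbook-style illustration in Python, not the FFBSKAT code (which is an R package): it assumes the trait covariance matrix `sigma` (for example, a kinship-based polygenic component plus residual variance) has already been estimated under the null model, and it uses a weighted linear kernel. The function name and arguments are hypothetical.

```python
import numpy as np

def kernel_score_test(y, X, G, weights, sigma):
    """Score statistic Q = y' P G W G' P y and its null mixture weights.

    y       : (n,) quantitative trait
    X       : (n, p) covariates, including an intercept column
    G       : (n, m) genotypes of the variants in the region
    weights : (m,) diagonal of the variant weight matrix W
    sigma   : (n, n) trait covariance under the null (assumed known here)
    """
    sigma_inv = np.linalg.inv(sigma)
    si_X = sigma_inv @ X
    # Projection that removes fixed effects under generalized least squares:
    # P = S^-1 - S^-1 X (X' S^-1 X)^-1 X' S^-1
    P = sigma_inv - si_X @ np.linalg.solve(X.T @ si_X, si_X.T)

    Gw = G * np.sqrt(weights)   # G W^{1/2}: each column scaled by its weight
    s = Gw.T @ (P @ y)          # weighted per-variant score contributions
    Q = float(s @ s)

    # Under the null, Q is distributed as sum_i lambda_i * chi2_1, where
    # lambda_i are the eigenvalues of W^{1/2} G' P G W^{1/2}.
    lambdas = np.linalg.eigvalsh(Gw.T @ P @ Gw)
    return Q, lambdas
```

A p-value would then be obtained from the mixture of chi-square distributions defined by `lambdas`, for example with Davies' method or a moment-matching approximation; that step is omitted here.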