We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
Large studies use genotype data to discover genetic contributions to complex traits and infer relationships between those traits. Coincident geographical variation in genotypes and health traits can bias these analyses. Here we show that single genetic variants and genetic scores composed of multiple variants are associated with birth location within UK Biobank, and that geographic structure in genotype data cannot be accounted for using routine adjustment for study centre and principal components derived from genotype data. We find that major health outcomes appear geographically structured and that coincident structure in health outcomes and genotype data can yield biased associations. Understanding and accounting for this phenomenon will be important when making inferences from genotype data in large studies.
Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here, we aggregated genome-wide association studies in up to 281,416 individuals without diabetes (30% non-European ancestry) with fasting glucose, 2h-glucose post-challenge, glycated hemoglobin, and fasting insulin data. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10⁻⁸), 80% of which showed no significant evidence of between-ancestry heterogeneity. Analyses restricted to European ancestry individuals with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry fine-mapping, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.
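A 99% credible set is built by turning per-variant association evidence into posterior probabilities and keeping the most probable variants until 99% of the probability is covered. The Python sketch below shows one common construction, assuming exactly one causal variant per region, an equal prior for every variant, and precomputed log Bayes factors as input; the function name `credible_set` and its inputs are illustrative assumptions, not the authors' fine-mapping pipeline.

```python
import math


def credible_set(log_bfs, coverage=0.99):
    """Return indices of the smallest set of variants whose posterior
    probabilities sum to at least `coverage`.

    Assumes exactly one causal variant in the region and an equal prior
    for each variant, so each posterior is proportional to its Bayes factor.
    """
    # Normalise in log space (log-sum-exp style) for numerical stability.
    m = max(log_bfs)
    weights = [math.exp(lbf - m) for lbf in log_bfs]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    # Add variants in order of decreasing posterior until coverage is reached.
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += posteriors[i]
        if cumulative >= coverage:
            break
    return chosen
```

When evidence is concentrated on one variant the set is small; when several variants carry similar evidence (for example, because they are in strong LD), the set grows. Combining ancestries with different LD patterns is one way the evidence can become more concentrated, which is consistent with the smaller trans-ancestry sets reported above.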
This study focused on resolving the relationship between BMI and type 2 diabetes. The availability of multiple variants associated with BMI offers a new chance to resolve the true causal effect of BMI on type 2 diabetes; however, the properties of these associations and their validity as genetic instruments need to be considered alongside established and new methods for undertaking Mendelian randomization (MR). We explore the potential for pleiotropic genetic variants to generate bias, revise existing estimates, and illustrate the value of new analysis methods. A two-sample MR approach with 96 genetic variants was used with three different analysis methods, two of which (MR-Egger and the weighted median) have been developed specifically to address problems of invalid instrumental variables. We estimate an odds ratio for type 2 diabetes per unit increase in BMI (kg/m²) of between 1.19 and 1.38, with the most stable estimate obtained using all instruments and a weighted median approach (1.26 [95% CI 1.17, 1.34]). TCF7L2 (rs7903146) was identified as a complex-effect or pleiotropic instrument, and removal of this variant resulted in convergence of causal effect estimates from the different causal analysis methods. This indicated the potential for pleiotropy to affect estimates, as well as differences in performance between alternative analytical methods. In a real type 2 diabetes-focused example, this study demonstrates the potential impact of invalid instruments on causal effect estimates and the potential for new approaches to mitigate the resulting bias.
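The three estimators can be contrasted directly on two-sample summary statistics. The Python sketch below assumes per-variant SNP-exposure effects `bx`, SNP-outcome log-odds effects `by` and outcome standard errors `se_y`, uses first-order weights, and assumes that the unnamed third method is a conventional fixed-effect inverse-variance weighted (IVW) estimate. The function names are hypothetical and standard errors of the causal estimates are omitted. Each returned slope is a log odds ratio per unit of exposure, so exponentiating it gives an odds ratio.

```python
def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance weighted causal estimate."""
    num = sum(x * y / s**2 for x, y, s in zip(bx, by, se_y))
    den = sum(x**2 / s**2 for x, s in zip(bx, se_y))
    return num / den


def weighted_median(bx, by, se_y):
    """Weighted median of per-variant ratio estimates by/bx, using
    first-order inverse-variance weights bx^2 / se_y^2."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [x**2 / s**2 for x, s in zip(bx, se_y)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    r = [ratios[i] for i in order]
    total = sum(weights)
    w = [weights[i] / total for i in order]
    # Cumulative weight at the midpoint of each ordered estimate.
    p, cum = [], 0.0
    for wi in w:
        cum += wi
        p.append(cum - wi / 2)
    # Interpolate between the two ratio estimates that straddle 0.5.
    for j in range(len(r) - 1):
        if p[j] < 0.5 <= p[j + 1]:
            return r[j] + (r[j + 1] - r[j]) * (0.5 - p[j]) / (p[j + 1] - p[j])
    return r[0] if p[0] >= 0.5 else r[-1]


def mr_egger(bx, by, se_y):
    """MR-Egger: weighted regression of by on bx with an intercept, after
    orienting each variant so its exposure effect is positive.
    Returns (intercept, slope); the slope is the causal estimate and a
    non-zero intercept suggests directional pleiotropy."""
    pts = [(abs(x), y if x >= 0 else -y, 1 / s**2) for x, y, s in zip(bx, by, se_y)]
    sw = sum(w for _, _, w in pts)
    mx = sum(w * x for x, _, w in pts) / sw
    my = sum(w * y for _, y, w in pts) / sw
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in pts)
    sxx = sum(w * (x - mx) ** 2 for x, _, w in pts)
    slope = sxy / sxx
    return my - slope * mx, slope
```

With five equally precise variants, four sharing a ratio estimate of 0.2 and one pleiotropic outlier at 2.0, the IVW estimate is pulled to about 0.56 while the weighted median stays at 0.2, illustrating why a single pleiotropic instrument can make the methods disagree.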
Theory hypothesizes that the rate of decline in linkage disequilibrium (LD) as a function of distance between markers, measured by r², can be used to estimate effective population size (Ne) and how it varies over time. The development of high-density genotyping makes this theory feasible to apply and has provided an impetus to improve predictions. This study considers the impact of several developments on the estimation of Ne using simulated and equine high-density single-nucleotide polymorphism data, both when Ne is assumed a priori to be constant and when it is not. In all models, estimates of Ne were highly sensitive to thresholds imposed on minor allele frequency (MAF) and to a priori assumptions about the expected r² for adjacent markers. Where constant Ne was assumed a priori, the estimates with the lowest mean square error were obtained with MAF thresholds between 0.05 and 0.10, adjustment of r² for finite sample size, estimation of a (the limit of r² as the recombination frequency c approaches 0), and relating Ne to c(1 - c/2). The findings for predicting Ne from models allowing variable Ne were much less clear, apart from the desirability of correcting for finite sample size and the lack of consistency in estimating recent Ne (<7 generations) from data with large c. The theoretical conflicts over how estimation should proceed, and the uncertainty over where predictions can be expected to fit well, suggest that Ne should be estimated with extreme caution when it varies.
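For the constant-Ne case, the estimate comes from inverting Sved's approximation E(r²) ≈ 1/(a + 4Ne·c), where a = 1 in Sved's original form. The Python sketch below subtracts 1/n from the observed r² as the finite-sample correction (whether n counts individuals or gametes varies between authors and is an assumption here), takes a as a user-supplied value rather than estimating it, and optionally substitutes c(1 - c/2) for c, treating the term mentioned above as a drop-in replacement in the denominator; that substitution is also an assumption. `estimate_ne` and `generations_ago` are hypothetical names.

```python
def estimate_ne(r2, c, n, a=1.0, damped=False):
    """Constant-Ne estimate from mean r^2 at recombination fraction c.

    r2     : observed mean r^2 for marker pairs at recombination fraction c
    c      : recombination fraction between the markers (0 < c <= 0.5)
    n      : sample size for the finite-sample correction 1/n (assumed convention)
    a      : limit of E(r^2) as c -> 0; Sved's original formula uses a = 1
    damped : if True, use c * (1 - c/2) in place of c (assumption)
    """
    r2_adj = r2 - 1.0 / n  # subtract the expected sampling contribution
    if r2_adj <= 0:
        raise ValueError("r^2 after sample-size correction must be positive")
    cc = c * (1 - c / 2) if damped else c
    # Invert E(r^2) = 1 / (a + 4 * Ne * cc) for Ne.
    return (1.0 / r2_adj - a) / (4.0 * cc)


def generations_ago(c):
    """Approximate number of generations in the past to which an estimate
    at recombination fraction c refers, using the common approximation
    t = 1 / (2c) under a linearly changing Ne."""
    return 1.0 / (2.0 * c)
```

Because 1/r² enters the estimate directly, small shifts in r² caused by MAF filtering, or a different choice of a, move the estimate substantially, which is in line with the sensitivity described above.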
Many genomic methodologies rely on the presence and extent of linkage disequilibrium (LD) between markers and genetic variants underlying traits of interest, but the extent of LD in the horse has yet to be comprehensively characterized. In this study, we evaluate the extent and decay of LD in a sample of 817 Thoroughbreds. Horses were genotyped for over 50 000 single nucleotide polymorphism (SNP) markers across the genome, with 34 848 autosomal SNPs used in the final analysis. LD, as measured by the squared correlation coefficient (r²), was found to be relatively high between closely linked markers (>0.6 at 5 kb) and to extend over long distances, with average r² maintained above nonsyntenic levels for SNPs up to 20 Mb apart. Using formulae that relate expected LD to effective population size (Ne), and assuming a constant actual population size, Ne was estimated to be 100 in our population. Values of historical Ne, calculated assuming linear population growth, suggested a decrease in Ne since the distant past, reaching a minimum twenty generations ago, followed by an increase to the present. The qualitative trends observed in Ne can be explained by current knowledge of the history of the Thoroughbred breed, and inbreeding statistics obtained from published pedigree analyses agree with the observed values of Ne. Given the high LD observed and the small estimated Ne, genomic methodologies such as genomic selection could feasibly be applied to this population using the existing SNP marker set.
The variation in weight within a shared environment is largely attributable to genetic factors. Whilst many genes/loci confer susceptibility to obesity, little is known about the genetic architecture of healthy thinness. Here, we characterise the heritability of thinness, which we found was comparable to that of severe obesity (h² = 28.07% vs 32.33%, respectively), although with incomplete genetic overlap (r = -0.49, 95% CI [-0.82, -0.17], p = 0.003). In a genome-wide association analysis of thinness (n = 1,471) vs severe obesity (n = 1,456), we identified 10 loci previously associated with obesity and demonstrated enrichment for established BMI-associated loci (p_binomial = 3.05 × 10⁻⁵). Simulation analyses showed that the differing association results between the extremes were likely consistent with additive effects across the BMI distribution, suggesting that different effects on thinness and obesity could be due to their different degrees of extremeness. In further analyses, we detected a novel obesity- and BMI-associated locus at PKHD1 (rs2784243; obese vs thin p = 5.99 × 10⁻⁶, obese vs controls p = 2.13 × 10⁻⁶, p_BMI = 2.3 × 10⁻¹³), associations at loci recently discovered with much larger sample sizes (e.g. FAM150B and PRDM6-CEP120), and novel variants driving associations at previously established signals (e.g. rs205262 at the SNRPC/C6orf106 locus and rs112446794 at the PRDM6-CEP120 locus). Our ability to replicate loci found with much larger sample sizes demonstrates the value of clinical extremes and suggests that characterisation of the genetics of thinness may provide a more nuanced understanding of the genetic architecture of body weight regulation and may inform the identification of potential anti-obesity targets.
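The enrichment P value for established BMI loci is described as binomial. One common form of such a test asks whether more established loci meet some criterion in the new study (for example, nominal significance with a concordant direction of effect) than would be expected by chance, using an exact one-sided binomial tail probability. The Python sketch below shows that calculation; the criterion, the null probability `p0` and the function name `binomial_enrichment_p` are illustrative assumptions, since the abstract does not specify them.

```python
from math import comb


def binomial_enrichment_p(k, n, p0):
    """One-sided exact binomial test: P(X >= k) for X ~ Binomial(n, p0).

    k  : number of established loci meeting the criterion in this study
    n  : number of established loci tested
    p0 : probability that a locus meets the criterion by chance (null)
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))
```

For instance, in a hypothetical case where all 10 of 10 loci met a criterion with a 50% chance rate, the tail probability would be 0.5^10, roughly 0.00098.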
Detailed phenotyping is required to deepen our understanding of the biological mechanisms behind genetic associations. In addition, the impact of potentially modifiable risk factors on disease requires analytical frameworks that allow causal inference. Here, we discuss the characteristics of Recall-by-Genotype (RbG) as a study design aimed at addressing both these needs. We describe two broad scenarios for the application of RbG: studies using single variants and those using multiple variants. We consider the efficacy and practicality of the RbG approach, provide a catalogue of UK-based resources for such studies and present an online RbG study planner.