Large-scale whole genome sequencing (WGS) studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests (RVATs) have limited scope to leverage variant functions. We propose STAAR (variant-Set Test for Association using Annotation infoRmation), a scalable and powerful RVAT method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce “annotation Principal Components”, multi-dimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness, and is scalable for analyzing very large cohort and biobank WGS studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery samples and 17,822 replication samples from the Trans-Omics for Precision Medicine program. We discovered and replicated novel RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
Heritability, the proportion of phenotypic variance explained by genetic factors, can be estimated from pedigree data [1], but such estimates are uninformative with respect to the underlying genetic architecture. Analyses of data from genome-wide association studies (GWAS) on unrelated individuals have shown that for human traits and disease, approximately one-third to two-thirds of heritability is captured by common SNPs [2-5]. It is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular if the causal variants are rare, or to other reasons such as overestimation of heritability from pedigree data. Here we show that pedigree heritability for height and body mass index (BMI) appears to be fully recovered from whole-genome sequence (WGS) data on 21,620 unrelated individuals of European ancestry. We assigned 47.1 million genetic variants to groups based upon their minor allele frequencies (MAF) and linkage disequilibrium (LD) with nearby variants, and estimated and partitioned variation accordingly. The estimated heritability was 0.79 (SE 0.09) for height and 0.40 (SE 0.09) for BMI, consistent with pedigree estimates. Low-MAF variants in low LD with neighbouring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection thereon. Cumulatively, variants in the MAF range of 0.0001 to 0.1 explained 0.54 (SE 0.05) and 0.51 (SE 0.11) of heritability for height and BMI, respectively. Our results imply that the still-missing heritability of complex traits and disease is accounted for by rare variants, in particular those in regions of low LD.
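As a rough illustration of the partitioning described in the abstract above, heritability and its decomposition over MAF/LD groups can be written as below. The symbols are assumptions introduced here, not taken from the abstract: \sigma_g^2 is genetic variance, \sigma_P^2 is phenotypic variance, and \hat{\sigma}_k^2 is the variance attributed to the variants in group k of K groups.

```latex
h^2 = \frac{\sigma_g^2}{\sigma_P^2},
\qquad
\hat{h}^2_{\mathrm{WGS}} = \frac{\sum_{k=1}^{K} \hat{\sigma}_k^2}{\hat{\sigma}_P^2}
```

Under this notation, the share of heritability explained by a subset of groups (for example, those in a given MAF range) would be the sum of their \hat{\sigma}_k^2 divided by the sum over all K groups.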
Background: A growing number of studies clearly demonstrate a substantial association between chronic obstructive pulmonary disease (COPD) and cardiovascular diseases (CVD), although little is known about the shared genetics that contribute to this association. Methods: We conducted a large-scale cross-trait genome-wide association study to investigate genetic overlap between COPD (N case = 12,550, N control = 46,368) from the International COPD Genetics Consortium and four primary cardiac traits: resting heart rate (RHR) (N = 458,969), high blood pressure (HBP) (N case = 144,793, N control = 313,761), coronary artery disease (CAD) (N case = 60,801, N control = 123,504), and stroke (N case = 40,585, N control = 406,111) from UK Biobank, CARDIoGRAMplusC4D Consortium, and International Stroke Genetics Consortium data. Results: RHR and HBP showed modest genetic correlation with COPD at the genome-wide level, and CAD showed borderline evidence of correlation. We found evidence of local genetic correlation in particular regions of the genome. Cross-trait meta-analysis of COPD identified 21 loci jointly associated with RHR, 22 loci with HBP, and 3 loci with CAD. Functional analysis revealed that shared genes were enriched in smoking-related pathways and in cardiovascular, nervous, and immune system tissues. An examination of smoking-related genetic variants identified SNPs located in the 15q25.1 region associated with cigarettes per day, with effects on RHR and CAD. A Mendelian randomization analysis showed a significant positive causal effect of COPD on RHR (causal estimate = 0.1374, P = 0.008). Conclusion: In a set of large-scale GWAS, we identify evidence of shared genetics between COPD and cardiac traits. Electronic supplementary material: The online version of this article (10.1186/s12931-019-1036-8) contains supplementary material, which is available to authorized users.
Lung carcinogenesis is a complex and stepwise process involving accumulation of genetic mutations in signaling and oncogenic pathways via interactions with environmental factors and host susceptibility. Tobacco exposure is the leading cause of lung cancer, but its relationship to clinically relevant mutations and the composite tumor mutation burden (TMB) has not been fully elucidated. In this study, we investigated the dose–response relationship in a retrospective observational study of 931 patients treated for advanced-stage non–small cell lung cancer (NSCLC) between April 2013 and February 2020 at the Dana-Farber Cancer Institute and Brigham and Women’s Hospital. Doubling smoking pack-years was associated with more frequent KRAS G12C and less frequent EGFR del19 and EGFR L858R mutations, whereas doubling smoking-free months was associated with more frequent EGFR L858R. In advanced lung adenocarcinoma, doubling smoking pack-years was associated with an increase in TMB, whereas doubling smoking-free months was associated with a decrease in TMB, after controlling for age, gender, and stage. There is a significant dose–response association of smoking history with genetic alterations in cancer-related pathways and TMB in advanced lung adenocarcinoma. Significance: This study clarifies the relationship between smoking history and clinically relevant mutations in non–small cell lung cancer, revealing the potential of smoking history as a surrogate for tumor mutation burden.
A growing number of studies clearly demonstrate a substantial link between metabolic dysfunction and the risk of Alzheimer's disease (AD), especially glucose-related dysfunction; one hypothesis for this comorbidity is the presence of a common genetic etiology. We conducted a large-scale cross-trait GWAS to investigate the genetic overlap between AD and 10 metabolic traits. Among all the metabolic traits, fasting glucose, fasting insulin and HDL were found to be genetically associated with AD. Local genetic covariance analysis found that the 19q13 region had strong local genetic correlation between AD and T2D (P = 6.78×10⁻²²), LDL (P = 1.74×10⁻²⁵³) and HDL (P = 7.94×10⁻¹⁸). Cross-trait meta-analysis identified 4 loci that were associated with AD and fasting glucose, 3 loci that were associated with AD and fasting insulin, and 20 loci that were associated with AD and HDL (P_meta < 1.6×10⁻⁸, single-trait P < 0.05). Functional analysis revealed that the shared genes are enriched in amyloid metabolic process, lipoprotein remodeling and other related pathways, and in pancreas, liver, blood and other tissues. Our work identifies common genetic
Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice and are subject to limitations given that genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variant association regions without the need to specify a fixed window size in advance. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare-variant association region sizes to vary across the genome. Through extensive simulation studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling for the genome-wise type I error rates. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
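The contrast drawn above between a fixed sliding window and a scan over varying window sizes can be sketched with a toy example. This is not the SCANG procedure: the function name `best_window`, the standardized-sum statistic, and the assumption of independent unit-variance per-variant scores are all illustrative choices made here, and the sketch ignores linkage disequilibrium and genome-wise error control.

```python
import math
from typing import List, Tuple


def best_window(z: List[float], min_len: int, max_len: int) -> Tuple[int, int, float]:
    """Return (start, end, stat) for the half-open window [start, end)
    with the largest standardized sum of per-variant scores.

    Hypothetical helper for illustration only."""
    # Prefix sums let each window sum be computed in O(1).
    prefix = [0.0]
    for value in z:
        prefix.append(prefix[-1] + value)

    best = (0, 0, float("-inf"))
    # Try every window length in the allowed range, not one fixed size.
    for length in range(min_len, min(max_len, len(z)) + 1):
        for start in range(0, len(z) - length + 1):
            total = prefix[start + length] - prefix[start]
            # Assumes independent, unit-variance scores, so the sum has
            # variance equal to the window length.
            stat = total / math.sqrt(length)
            if stat > best[2]:
                best = (start, start + length, stat)
    return best


# Toy scores: a signal spanning three adjacent variants (indices 2..4).
scores = [0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0]
start, end, stat = best_window(scores, min_len=2, max_len=5)
print(start, end, round(stat, 3))
```

In this toy input, a scan restricted to windows of length 2 or length 5 would either split or dilute the three-variant signal, whereas letting the length vary recovers the window covering exactly those variants.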
Over a decade of genome-wide association studies have made great strides toward the detection of genes and genetic mechanisms underlying complex traits. However, the majority of associated loci reside in non-coding regions that are, in general, functionally uncharacterized. Now, the availability of large-scale tissue- and cell type-specific transcriptome and epigenome data enables us to elucidate how non-coding genetic variants can affect gene expression and are associated with phenotypic changes. Here we provide an overview of this emerging field in human genomics, summarizing available data resources and state-of-the-art analytic methods to facilitate in silico prioritization of non-coding regulatory mutations. We also highlight the limitations of current approaches and discuss the direction of much-needed future research.