Despite the great success of genome-wide association studies (GWAS) in identification of the common genetic variants associated with complex diseases, the current GWAS have focused on single-SNP analysis. However, single-SNP analysis often identifies only a few of the most significant SNPs that account for a small proportion of the genetic variants and offers only a limited understanding of complex diseases. To overcome these limitations, we propose gene and pathway-based association analysis as a new paradigm for GWAS. As a proof of concept, we performed a comprehensive gene and pathway-based association analysis of 13 published GWAS. Our results showed that the proposed new paradigm for GWAS not only identified the genes that include significant SNPs found by single-SNP analysis, but also detected new genes in which each single SNP conferred a small disease risk; however, their joint actions were implicated in the development of diseases. The results also showed that the new paradigm for GWAS was able to identify biologically meaningful pathways associated with the diseases, which were confirmed by a gene-set-rich analysis using gene expression data.
Current GWAS have primarily focused on testing the association of single SNPs. Testing only single SNPs has limited utility and is insufficient to dissect the complex genetic structure of many common diseases. To meet the conceptual and technical challenges raised by GWAS, we propose gene and pathway-based GWAS as a complement to the current single SNP-based GWAS. This publication develops three statistics for testing the association of genes and pathways with disease: a linear combination test, a quadratic test and a decorrelation test, all of which take into account correlations among SNPs within a gene or among genes within a pathway. The null distribution of the proposed statistics is examined, and the statistics are applied to GWAS of rheumatoid arthritis in the Wellcome Trust Case Control Consortium and the North American Rheumatoid Arthritis Consortium studies. The preliminary results show that the proposed gene and pathway-based GWAS offer several remarkable features. First, they can not only identify genes that have large genetic effects but also detect new genes in which each single SNP confers a small amount of disease risk while their joint actions can be implicated in the development of disease. Second, gene and pathway-based analysis can allow the formation of the core of pathway definition of complex diseases and unravel the functional bases of an association finding. Third, replication of association findings at the gene or pathway level is much easier than replication at the individual SNP level.
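As a rough illustration of the idea behind a linear combination test, the sketch below sums per-SNP z-scores within a gene and rescales the sum by its variance under the null, which depends on the SNP-SNP correlation matrix. The abstract does not give the exact form of the statistic, so the equal-weight form, the function name `linear_combination_test`, and the example numbers are all assumptions made for illustration.

```python
import math


def linear_combination_test(z_scores, corr):
    """Equal-weight combination of correlated z-scores (illustrative assumption).

    z_scores: per-SNP z-scores for one gene.
    corr:     correlation matrix of those z-scores under the null,
              given as a list of lists.
    Returns the gene-level z statistic and a two-sided normal p-value.
    """
    k = len(z_scores)
    total = sum(z_scores)
    # Var(sum of z) = sum of all entries of the correlation matrix.
    variance = sum(corr[i][j] for i in range(k) for j in range(k))
    z_gene = total / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_gene) / math.sqrt(2.0))
    return z_gene, p_value


# Hypothetical gene with three moderately correlated SNPs,
# none of which is individually striking.
z = [1.5, 1.2, 1.8]
r = [
    [1.0, 0.3, 0.2],
    [0.3, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
z_gene, p = linear_combination_test(z, r)
print(f"gene-level z = {z_gene:.3f}, p = {p:.4f}")
```

Ignoring the off-diagonal correlations would understate the variance of the sum when SNPs are positively correlated, which is why the correlation matrix enters the denominator.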
Telomeres play a central role in cellular aging, and shorter telomere length has been associated with age-related disorders including diabetes. However, a causal link between telomere shortening and diabetes risk has not been established. In a well-characterized longitudinal cohort of American Indians participating in the Strong Heart Family Study, we examined whether leukocyte telomere length (LTL) at baseline predicts incident diabetes independent of known diabetes risk factors. Among 2,328 participants free of diabetes at baseline, 292 subjects developed diabetes during an average 5.5 years of follow-up. Compared with subjects in the highest quartile (longest) of LTL, those in the lowest quartile (shortest) had an almost twofold increased risk of incident diabetes (hazard ratio [HR] 1.83 [95% CI 1.26–2.66]), whereas the risk for those in the second (HR 0.87 [95% CI 0.59–1.29]) and the third (HR 0.95 [95% CI 0.65–1.38]) quartiles was statistically nonsignificant. These findings suggest a nonlinear association between LTL and incident diabetes and indicate that LTL could serve as a predictive marker for diabetes development in American Indians, who suffer from disproportionately high rates of diabetes.
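The reported intervals can be read back into approximate test statistics. The sketch below assumes each 95% CI is a Wald interval that is symmetric on the log-hazard scale, which the abstract does not state; the helper name `wald_z_from_ci` is hypothetical, and the output only approximates whatever the original Cox models produced.

```python
import math


def wald_z_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover SE(log HR), a Wald z and a two-sided p from an HR and its 95% CI.

    Assumes the interval is exp(log HR +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Hazard ratios and intervals as quoted in the abstract (reference: longest LTL quartile).
quartiles = {
    "lowest quartile (shortest LTL)": (1.83, 1.26, 2.66),
    "second quartile": (0.87, 0.59, 1.29),
    "third quartile": (0.95, 0.65, 1.38),
}

for label, (hr, lo, hi) in quartiles.items():
    se, z, p = wald_z_from_ci(hr, lo, hi)
    print(f"{label}: SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under this assumption only the shortest-LTL quartile yields |z| above 1.96, consistent with the abstract's description of the second and third quartiles as statistically nonsignificant.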
Although great progress in genome-wide association studies (GWAS) has been made,
the significant SNP associations identified by GWAS account for only a few
percent of the genetic variance, leading many to question where and how we can
find the missing heritability. There is increasing interest in genome-wide
interaction analysis as a possible source of the heritability unexplained by
current GWAS. However, the existing statistics for testing interaction have low
power for genome-wide interaction analysis. To meet the challenges raised by
genome-wide interaction analysis, we have developed a novel statistic for
testing interaction between two loci (either linked or unlinked). The null
distribution and the type I error rates of the new statistic for testing
interaction are validated using simulations. Extensive power studies show that
the developed statistic has much higher power to detect interaction than
classical logistic regression. The results identified 44 and 211 pairs of SNPs
showing significant evidence of interactions with FDR<0.001 and
0.001
An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregate within the population and rare variants, inherited from recent ancestors, that segregate mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from remarkable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure, together with a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based functional principal-component analysis (FPCA) with or without smoothing, a generalized T², a combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power than other population-based or family-based association analysis methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA has a much smaller p value than other statistics.
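To make the "collapsing" part of a CMC-style analysis concrete, the sketch below shows only the preprocessing step as it is commonly described for unrelated individuals: rare variants in a region are merged into a single carrier indicator, while common variants are kept as separate columns for a later multivariate test. The function name, the MAF threshold, and the toy genotypes are assumptions; the family-based transformation and the test statistic proposed in the abstract are not reproduced here.

```python
def collapse_rare_variants(genotypes, maf_threshold=0.05):
    """Collapse rare variants into one carrier indicator per individual.

    genotypes: list of rows, one per individual; each row holds
               minor-allele counts (0, 1 or 2) for the same variants.
    Returns (collapsed_rows, rare_indices, common_indices), where each
    collapsed row is [rare_carrier_indicator, common_1, common_2, ...].
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample minor-allele frequency of each variant.
    maf = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    rare = [j for j in range(m) if maf[j] < maf_threshold]
    common = [j for j in range(m) if maf[j] >= maf_threshold]
    collapsed = []
    for row in genotypes:
        carrier = 1 if any(row[j] > 0 for j in rare) else 0
        collapsed.append([carrier] + [row[j] for j in common])
    return collapsed, rare, common


# Toy data: 4 individuals, 3 variants. The threshold is set high (0.2)
# only because the sample is tiny.
toy = [
    [1, 0, 2],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
collapsed, rare, common = collapse_rare_variants(toy, maf_threshold=0.2)
print("rare variant indices:", rare)
print("common variant indices:", common)
print("collapsed design rows:", collapsed)
```

Collapsing trades per-variant resolution for power when each rare variant is carried by very few individuals, which is also why it can lose power when variants act in opposite directions.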
Background: Although the past few years have seen the rapid development of novel statistical methods for association studies of qualitative traits using next-generation sequencing (NGS) data, only a few statistics have been proposed for testing the association of rare variants with quantitative traits. The QTL analysis of rare variants remains challenging. Moving from low-dimensional data to high-dimensional genomic data demands a change in statistical methods from multivariate data analysis to functional data analysis. Methods: In this paper, we propose a functional linear model (FLM) as a general principle for developing novel and powerful QTL analysis methods designed for resequencing data. By simulations, we calculate the type I error rates and evaluate the power of the FLM and eight other existing statistical methods, including in the presence of both positive and negative signs of effects. Results: Because the FLM retains all of the genetic information in the data, exploits the merits of both variant-by-variant and collective analysis, and overcomes their limitations, it has much higher power than the other existing statistics in all the scenarios considered. To further evaluate its performance, the FLM is applied to association analysis of six quantitative traits in the Dallas Heart Study and to RNA-seq eQTL analysis with genetic variation in the low-coverage resequencing data of the 1000 Genomes Project. Real data analysis shows that the FLM yields much smaller P-values for significantly associated variants than other existing methods. Conclusions: The FLM is expected to open a new route for QTL analysis.
Our results reveal a critical role of DNA hydroxymethylation in AD pathology and provide mechanistic insight into the molecular mechanisms underlying AD.
Despite the great success of GWAS in identifying common genetic variants associated with complex diseases, the current GWAS have focused on single-SNP analysis. However, single-SNP analysis often identifies a number of the most significant SNPs that account for only a small proportion of the genetic variants and offers limited understanding of complex diseases. To overcome these limitations, we propose gene and pathway-based association analysis as a new paradigm for GWAS. As a proof of concept, we performed a comprehensive gene and pathway-based association analysis for thirteen published GWAS. Our results showed that the proposed new paradigm for GWAS not only identified the genes that include significant SNPs found by single-SNP analysis, but also detected new genes in which each single SNP conferred a small disease risk while their joint actions were implicated in the development of diseases. The results also demonstrated that the new paradigm for GWAS was able to identify biologically meaningful pathways associated with the diseases, which were confirmed by gene-set rich analysis using gene expression data. Genome-wide association studies (GWAS) armed with efficient genotyping technologies are emerging as a major tool for identifying disease susceptibility loci and have successfully detected the association of a number of SNPs with complex diseases [1-12]. However, testing only for the association of single SNPs is insufficient to dissect the complex genetic structure of complex diseases. Extracting biological insight from GWAS and understanding the principles underlying complex phenomena that take place on various biological pathways remain a major challenge. In a typical GWAS, hundreds of thousands of SNPs are genotyped for thousands of individuals. By comparing the DNA variations between normal and affected individuals, the SNPs can be ordered according to their degree of association.
The common approach is to select dozens of the most significant SNPs in the list for further investigations. This approach