The transcriptome-wide association study (TWAS) has emerged as one of several promising techniques for integrating multi-scale ‘omics’ data into traditional genome-wide association studies (GWAS). Unlike GWAS, which associates phenotypic variance directly with genetic variants, TWAS uses a reference dataset to train a predictive model for gene expressions, which allows it to associate phenotype with variants through the mediating effect of expressions. Although effective, this core innovation of TWAS is poorly understood, since the predictive accuracy of the genotype-expression model is generally low and further bounded by expression heritability. This raises the question: to what degree does the accuracy of the expression model affect the power of TWAS? Furthermore, would replacing predictions with actual, experimentally determined expressions improve power? To answer these questions, we compared the power of GWAS, TWAS, and a hypothetical protocol utilizing real expression data. We derived non-centrality parameters (NCPs) for linear mixed models (LMMs) to enable closed-form calculations of statistical power that do not rely on specific protocol implementations. We examined two representative scenarios: causality (genotype contributes to phenotype through expression) and pleiotropy (genotype contributes directly to both phenotype and expression), and also tested the effects of various properties including expression heritability. Our analysis reveals two main outcomes: (1) Under pleiotropy, the use of predicted expressions in TWAS is superior to actual expressions. This explains why TWAS can function with weak expression models, and shows that TWAS remains relevant even when real expressions are available. (2) GWAS outperforms TWAS when expression heritability is below a threshold of 0.04 under causality, or 0.06 under pleiotropy. 
Analysis of existing publications suggests that TWAS has been misapplied in place of GWAS, in situations where expression heritability is low.
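The closed-form power calculation mentioned above can be illustrated with a minimal sketch. It assumes a two-sided test whose statistic follows a 1-degree-of-freedom non-central chi-square distribution under the alternative; the abstract does not state the degrees of freedom or give the LMM-derived NCP expressions, so the function name `power_from_ncp`, the default significance level, and the NCP values below are all hypothetical placeholders rather than the authors' quantities.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_from_ncp(ncp: float, alpha: float = 5e-8) -> float:
    """Power of a 1-df chi-square test given its non-centrality parameter.

    Assumption: the test statistic equals (Z + sqrt(ncp))**2 with Z ~ N(0, 1),
    i.e. a non-central chi-square with 1 degree of freedom.
    """
    if ncp < 0:
        raise ValueError("ncp must be non-negative")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = sqrt(ncp)
    # P(|Z + shift| > z_crit) = Phi(shift - z_crit) + Phi(-shift - z_crit)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical NCP values, chosen only to show how power grows with the NCP.
for ncp in (10.0, 30.0, 60.0):
    print(f"NCP = {ncp:5.1f} -> power = {power_from_ncp(ncp):.3f}")
```

Because power depends on the protocol only through its NCP, comparing GWAS, TWAS, and a real-expression protocol in this setting reduces to comparing their NCPs at a fixed significance level.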
The power of genotype–phenotype association mapping studies increases greatly when contributions from multiple variants in a focal region are meaningfully aggregated. Currently, there are two popular categories of variant aggregation methods. Transcriptome-wide association studies (TWAS) represent a set of emerging methods that select variants based on their effect on gene expressions, providing pretrained linear combinations of variants for downstream association mapping. In contrast, kernel methods such as the sequence kernel association test (SKAT) model genotypic and phenotypic variance using various kernel functions that capture genetic similarity between subjects, allowing nonlinear effects to be included. From the perspective of machine learning, these two methods cover two complementary aspects of feature engineering: feature selection/pruning and feature aggregation. Thus far, no thorough comparison has been made between these categories, and no methods exist that incorporate the advantages of both TWAS- and kernel-based methods. In this work, we developed a novel method called kernel-based TWAS (kTWAS) that applies TWAS-like feature selection to a SKAT-like kernel association test, combining the strengths of both approaches. Through extensive simulations, we demonstrate that kTWAS has higher power than TWAS and multiple SKAT-based protocols, and we identify novel disease-associated genes in Wellcome Trust Case Control Consortium genotyping array data and MSSNG (Autism) sequence data. The source code for kTWAS and our simulations is available in our GitHub repository (https://github.com/theLongLab/kTWAS).
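One plausible reading of "TWAS-like feature selection applied to a SKAT-like kernel test" is a weighted linear kernel score statistic in which the variant weights come from an expression prediction model. The sketch below is an illustration under that assumption, not the kTWAS implementation: the function names, the toy data, and the weights are hypothetical, covariates are omitted, and a permutation p-value stands in for the analytic p-value that SKAT-style tests normally use.

```python
import random


def weighted_kernel_score(genotypes, phenotype, weights):
    """Q = r^T G W^2 G^T r, where r is the mean-centered phenotype.

    genotypes: list of per-subject lists of allele counts (subjects x variants)
    weights:   one weight per variant, assumed here to come from a pretrained
               expression model; a zero weight drops the variant entirely.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    resid = [y - mean_y for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        if w == 0.0:
            continue  # variant not selected by the expression model
        score_j = sum(genotypes[i][j] * resid[i] for i in range(n))
        q += (w * score_j) ** 2
    return q


def permutation_pvalue(genotypes, phenotype, weights, n_perm=999, seed=0):
    """Empirical p-value for Q obtained by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = weighted_kernel_score(genotypes, phenotype, weights)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if weighted_kernel_score(genotypes, shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 4 subjects, 2 variants; the second variant is deselected.
toy_genotypes = [[0, 1], [1, 0], [2, 1], [1, 2]]
toy_phenotype = [0.0, 1.0, 2.0, 1.0]
toy_weights = [1.0, 0.0]  # hypothetical expression-model weights

print(weighted_kernel_score(toy_genotypes, toy_phenotype, toy_weights))
print(permutation_pvalue(toy_genotypes, toy_phenotype, toy_weights, n_perm=99))
```

In this form, feature selection enters through which weights are nonzero, while aggregation across the selected variants is handled by the kernel statistic.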
Background: Standard genome-wide association studies (GWAS) discover genetic variants that explain phenotypic variance by directly associating the two. With the availability of other omics data such as gene expression, the field is entering an era of multi-scale omics integration. An emerging technique is the transcriptome-wide association study (TWAS), which conducts association mapping using gene expression data from a separate reference dataset, on which a model predicting expression from genotype is trained. Despite its success in practice, two fundamental questions remain unaddressed. First, the accuracy of predicting expression from genotype is generally low and is bounded by expression heritability. The question is therefore whether such low accuracy reduces the power of TWAS, and what level of accuracy is sufficient. Second, since predicting expression is a critical step in TWAS, one may ask whether using actual expression measured experimentally would improve or reduce power. Answering these questions will provide a thorough understanding of TWAS and practical guidelines for association mapping.
Results: To address these questions, we conducted power analysis for GWAS, TWAS, and expression-mediated GWAS (emGWAS). Specifically, we derived non-centrality parameters (NCPs), which enable closed-form calculation of statistical power and thus a thorough power analysis that does not rely on particular implementations. We assessed the power of the three protocols under two representative scenarios: causality (genotype contributes to phenotype through expression) and pleiotropy (genotype contributes directly to both phenotype and expression). For both scenarios, we tested various properties including expression heritability.
Conclusions: (1) TWAS using predicted expression has higher power than emGWAS, which uses actual expression, in the pleiotropy scenario. This offers insight into TWAS models and a practical guideline: TWAS remains applicable even when expression data are available in a GWAS dataset. (2) TWAS is suboptimal compared to GWAS when expression heritability is too low. The relative ordering of TWAS and GWAS reveals a turning point in each of the causality and pleiotropy scenarios. Analysis of published discoveries suggests that, judged against the identified turning points, the choice of protocol in some studies may be questionable.