Transcriptome-wide association studies (TWAS), which test for association between the trait of interest and gene expression levels imputed from cis-acting expression quantitative trait loci (cis-eQTL) genotypes, have successfully enhanced the discovery of genetic risk loci for complex traits.
By using gene expression imputation models fitted on reference datasets that have both genetic and transcriptomic data, TWAS facilitates gene-based tests with GWAS data while accounting for the reference transcriptomic data. Existing TWAS tools such as PrediXcan and FUSION use parametric imputation models that have limitations for modeling the complex genetic architecture of transcriptomic data. To improve on this, we propose a Bayesian method that assumes a data-driven nonparametric prior to impute gene expression. The
nonparametric Bayesian method is flexible and general because it includes both of the parametric imputation models used by PrediXcan and FUSION as special cases. Our simulation studies showed that the nonparametric Bayesian model improved both imputation R² for transcriptomic data and the TWAS power over PrediXcan. In real applications, our nonparametric Bayesian method fitted transcriptomic imputation models for 57.6% more genes than PrediXcan, thus improving the power of follow-up TWAS. Hence, the nonparametric Bayesian model is preferred for modeling the complex genetic architecture of transcriptomes and is expected to enhance transcriptome-integrated genetic association studies. We implement our Bayesian approach in a convenient software tool, "TIGAR" (Transcriptome-Integrated Genetic Association Resource), which imputes transcriptomic data and performs subsequent TWAS using individual-level or summary-level GWAS data.

Introduction

Genome-wide association studies (GWAS) have successfully identified thousands of genetic risk loci for complex traits. However, the majority of these loci are located within noncoding regions whose molecular mechanisms remain unknown [1-3]. Recent studies have shown that these associated regions are enriched for regulatory elements such as enhancers (H3K27ac marks) [4; 5] and expression quantitative trait loci (eQTL) [6; 7], suggesting that genetically regulated gene expression might play a key role in the biological mechanisms of complex traits. Multiple studies have recently generated rich transcriptomic datasets for diverse tissues of the human body, e.g.,
the Genotype-Tissue Expression (GTEx) project for 44 human tissues [6], the Genetic European
Variation in Health and Disease (GEUVADIS) for lymphoblastoid cell lines [8], Depression Genes and Networks (DGN) for whole-blood samples [9], and the North American Brain Expression Consortium (NABEC) for cortex tissues [10]. Previous studies [11-16] have also shown that integrating transcriptomic data in GWAS can help identify novel functional loci.
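
The two-stage TWAS idea summarised above (fit an expression imputation model on a reference panel, impute expression from GWAS genotypes, then test the imputed expression against the trait) can be illustrated with a minimal sketch. This is not the TIGAR implementation: ridge regression is used here purely as a simple stand-in for the imputation model, rather than the nonparametric Bayesian model proposed in this work or the models used by PrediXcan and FUSION. All function names, sample sizes, and effect sizes below are hypothetical and chosen only for illustration, and the data are simulated.

```python
import numpy as np


def fit_ridge_weights(G_ref, expr_ref, lam=1.0):
    """Stage 1 (assumed stand-in model): estimate cis-eQTL weights by
    closed-form ridge regression on a reference panel."""
    G = G_ref - G_ref.mean(axis=0)        # center genotypes
    y = expr_ref - expr_ref.mean()        # center expression
    p = G.shape[1]
    return np.linalg.solve(G.T @ G + lam * np.eye(p), G.T @ y)


def impute_expression(G_gwas, weights):
    """Stage 2: impute genetically regulated expression in the GWAS sample."""
    G = G_gwas - G_gwas.mean(axis=0)
    return G @ weights


def association_stat(grex, trait):
    """Stage 3: t-statistic from a simple linear regression of the trait
    on the imputed expression."""
    x = grex - grex.mean()
    y = trait - trait.mean()
    beta = (x @ y) / (x @ x)
    resid = y - beta * x
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (x @ x))
    return beta / se


def simulate(seed=0, n_ref=500, n_gwas=5000, p=50):
    """Simulate a toy reference panel and GWAS sample (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=p)                 # minor allele frequencies
    G_ref = rng.binomial(2, maf, size=(n_ref, p)).astype(float)
    G_gwas = rng.binomial(2, maf, size=(n_gwas, p)).astype(float)
    w_true = np.zeros(p)
    w_true[:5] = 0.5                                     # five causal cis-eQTL
    expr_ref = G_ref @ w_true + rng.normal(size=n_ref)
    expr_gwas = G_gwas @ w_true + rng.normal(size=n_gwas)  # unobserved in practice
    trait = 0.3 * expr_gwas + rng.normal(size=n_gwas)
    return G_ref, expr_ref, G_gwas, trait


if __name__ == "__main__":
    G_ref, expr_ref, G_gwas, trait = simulate()
    weights = fit_ridge_weights(G_ref, expr_ref)
    grex = impute_expression(G_gwas, weights)
    print("gene-level association statistic:", association_stat(grex, trait))
```

In this sketch, only the genotypes and the trait are used from the GWAS sample; the simulated GWAS expression values enter only when generating the trait, mirroring the setting in which transcriptomic data are available for the reference panel alone.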
The majority of GWAS projects do not possess tra...