Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression thus far have had limited power to detect associations and provide insights into putative mechanisms. Here, we leverage whole genome sequencing and expression data for 17 tissues from the Genotype-Tissue Expression Project to identify more than 28,000 STRs for which repeat number is associated with expression of nearby genes (eSTRs). We employ fine-mapping to quantify the probability that each eSTR is causal and characterize the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked with published GWAS signals and implicate specific eSTRs in complex traits including height, schizophrenia, inflammatory bowel disease, and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes, and our data will serve as a valuable resource for future studies of complex traits.
Summary: A rich set of tools has recently been developed for performing genome-wide genotyping of tandem repeats (TRs). However, standardized tools for downstream analysis of these results are lacking. To facilitate TR analysis applications, we present TRTools, a Python library and suite of command-line tools for filtering, merging, and quality control of TR genotype files. TRTools utilizes an internal harmonization module, making it compatible with outputs from a wide range of TR genotypers. Availability: TRTools is freely available at https://github.com/gymreklab/TRTools. Documentation: Detailed documentation is available at https://trtools.readthedocs.io. Supplementary information: Supplementary data are available at Bioinformatics online.
Short tandem repeats (STRs), genomic regions each consisting of a sequence of 1-6 base pairs repeated in succession, represent one of the largest sources of human genetic variation. However, many STR effects are not captured well by standard genome-wide association studies (GWAS) or downstream analyses that are mostly based on single nucleotide polymorphisms (SNPs). To study the involvement of STRs in complex traits, we imputed genotypes for 445,735 autosomal STRs into SNP data from 408,153 White British UK Biobank participants and tested for association with 44 blood and serum biomarker phenotypes. We used two fine-mapping methods, SuSiE and FINEMAP, to identify 118 high-confidence STR-trait associations predicted as causal variants under all fine-mapping settings tested. Using these results, we estimate that STRs drive 5.2-9.7% of GWAS signals for these traits. Our high-confidence STR-trait associations implicate STRs in some of the strongest hits for multiple phenotypes, including a trinucleotide STR in APOB associated with LDL cholesterol and a CGG repeat in the promoter of CBL associated with multiple platelet traits. Replication analyses in additional population groups and orthogonal expression data further support the role of a subset of the candidate STRs we identify. Together, our study suggests that polymorphic tandem repeats make widespread contributions to complex traits, provides a set of stringently selected candidate causal STRs, and demonstrates the need to routinely consider a more complete view of human genetic variation in GWAS.
Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression thus far have had limited power to detect associations and elucidate the underlying biological mechanisms. Here, we leverage whole genome sequencing and expression data for 17 tissues from GTEx to identify STRs whose repeat lengths are associated with expression of nearby genes (eSTRs). Our analysis reveals more than 3,000 high-confidence eSTRs, which are enriched in known or predicted regulatory regions. We show eSTRs may act through a variety of mechanisms. We further identify hundreds of eSTRs that potentially drive published GWAS signals and implicate specific eSTRs in height and schizophrenia. Overall, our results demonstrate that eSTRs potentially contribute to a range of human phenotypes. We expect that our comprehensive eSTR catalog will serve as a valuable resource for future studies of complex traits.
link between an eSTR for RFT1 and height and use reporter assays to experimentally validate the effect of this STR on expression. Finally, the complete catalog of eSTRs is publicly available and will likely be a valuable resource for future studies of complex traits.
Results
Profiling expression STRs across 17 human tissues
We performed a genome-wide analysis to identify associations between the number of repeats in each STR and expression of nearby genes (expression STRs, or "eSTRs", which we use to refer to a unique STR by gene association). We focused on 652 samples included in the Genotype-Tissue Expression (GTEx) (GTEx Consortium, 2015) dataset for which both high coverage whole genome sequencing (WGS) and RNA-sequencing of multiple tissues were available. The WGS cohort consisted of 561 individuals with reported European ancestry, 75 of African ancestry, and 8, 3, and 5 of Asian, Amerindian, and Unknown ancestry, respectively.
We used HipSTR (Willems et al., 2017) to genotype STRs in each sample. Resulting genotypes were subjected to stringent filtering to remove low quality calls (Methods). After filtering, 175,226 STRs remained for downstream analysis. To identify eSTRs, we performed a linear regression between average STR length and normalized gene expression for each individual at each STR within 100kb of a gene, controlling for sex, population structure, and technical covariates (Methods, Figures S1, S2). Analysis was restricted to 17 tissues where we had data for at least 100 samples (Figure 1A, Table S1, Methods) and to genes with median RPKM greater than 0. As a control, for each STR-gene pair we performed a permutation analysis in which sample identifiers were shuffled. Altogether, we performed an average of 278,521 STR-gene tests across 16,065 genes per tissue. Using this approach, we identified 25,561 unique eSTRs associated with 11,810 genes in at least one tissue at a gene-level FDR of 10% (Methods). Of these, 8,417 (32.5%) were shared by two or more tissues and 469 were shared by 10 or more tissues (Figure S3). P-value...
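The per-pair association test described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`estr_regression`, `permuted_control`, `simulate`), the z-scoring of both variables, the use of ordinary least squares via NumPy, and the simulated data are all assumptions made for illustration. The permutation control is assumed here to shuffle STR genotypes relative to expression and covariates; the source only states that sample identifiers were shuffled.

```python
import numpy as np


def zscore(v):
    """Standardize a vector to mean 0 and unit variance."""
    return (v - v.mean()) / v.std()


def estr_regression(str_len, expr, covariates):
    """Regress standardized expression on standardized mean STR length.

    An intercept and the covariate columns (hypothetical stand-ins for sex,
    population structure, and technical factors) are included in the model.
    Returns the STR effect size and its t-statistic.
    """
    n = len(str_len)
    X = np.column_stack([np.ones(n), zscore(str_len), covariates])
    y = zscore(expr)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    # Standard error of the STR coefficient (column 1 of the design matrix).
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se


def permuted_control(str_len, expr, covariates, rng):
    """Same test after shuffling genotypes across samples (assumed scheme)."""
    shuffled = str_len[rng.permutation(len(str_len))]
    return estr_regression(shuffled, expr, covariates)


def simulate(n=200, effect=0.5, seed=0):
    """Simulated data only: two alleles per sample, three covariates."""
    rng = np.random.default_rng(seed)
    str_len = rng.integers(10, 20, size=(n, 2)).mean(axis=1)  # mean of two alleles
    covariates = rng.normal(size=(n, 3))
    expr = effect * zscore(str_len) + rng.normal(size=n)
    return str_len, expr, covariates, rng


if __name__ == "__main__":
    str_len, expr, covariates, rng = simulate()
    print("observed:", estr_regression(str_len, expr, covariates))
    print("permuted:", permuted_control(str_len, expr, covariates, rng))
```

In a genome-wide setting this test would be repeated for every STR within the chosen window of each gene and in each tissue, with the permuted statistics serving as an empirical null; the multiple-testing procedure behind the gene-level FDR is described in the paper's Methods and is not reproduced here.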