Supplementary data are available at Bioinformatics online.
Structural variants (SVs) are an important source of human genetic diversity but their contribution to traits, disease, and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single nucleotide (SNV), and short insertion/deletion (indel) variants from deep whole genome sequencing (WGS). We estimate that SVs are causal at 3.5–6.8% of eQTLs – a substantially higher fraction than prior estimates – and that expression-altering SVs have larger effect sizes than SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) are noncoding variants enriched at enhancers and other regulatory elements, and 52 are linked to genome-wide association study loci. We observe a notable abundance of rare, high impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common and rare variant association studies.
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk (1-4). While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants (1, 5). Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles (1, 6, 7), but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues (8-11), but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release (12). We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands, and soon millions, of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.
Rare genetic variants are abundant across the human genome, and identifying their function and phenotypic impact is a major challenge. Measuring aberrant gene expression has aided in identifying functional, large-effect rare variants (RVs). Here, we expanded detection of genetically driven transcriptome abnormalities by analyzing gene expression, allele-specific expression, and alternative splicing from multitissue RNA-sequencing data, and demonstrate that each signal informs unique classes of RVs. We developed Watershed, a probabilistic model that integrates multiple genomic and transcriptomic signals to predict variant function, validated these predictions in additional cohorts and through experimental assays, and used them to assess RVs in the UK Biobank, the Million Veterans Program, and the Jackson Heart Study. Our results link thousands of RVs to diverse molecular effects and provide evidence to associate RVs affecting the transcriptome with human traits.
Structural variants (SVs) are an important source of human genetic diversity but their contribution to traits, disease, and gene regulation remains unclear. The Genotype-Tissue Expression (GTEx) project presents an unprecedented opportunity to address this question due to the availability of deep whole genome sequencing (WGS) and multi-tissue RNA-seq data from 147 individuals. We used comprehensive methods to identify 24,157 high confidence SVs, and mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single nucleotide (SNV) and short insertion/deletion (indel) variants. We identified 24,801 eQTLs affecting the expression of 10,101 distinct genes. Based on haplotype structure and heritability partitioning, we estimate that SVs are the causal variant at 3.3-7.0% of eQTLs, which is nearly an order of magnitude higher than prior estimates from low coverage WGS and represents a 26- to 54-fold enrichment relative to their scarcity in the genome. Expression-altering SVs also have significantly larger effect sizes than SNVs and indels. We identified 787 putatively causal SVs predicted to directly alter gene expression, most of which (88.3%) are noncoding variants that show significant enrichment at enhancers and other regulatory elements. By evaluating linkage disequilibrium between SVs, SNVs and indels, we nominate 49 SVs as plausible causal variants at published genome-wide association study (GWAS) loci. Remarkably, 29.9% of the common SV-eQTLs are not well tagged by flanking SNVs, and we observe a notable abundance (relative to SNVs and indels) of rare, high impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of both common and rare variant association studies.
Mammalian somatosensory topographic maps contain specialized neuronal structures that precisely recapitulate the spatial pattern of peripheral sensory organs. In the mouse, whiskers are orderly mapped onto several brainstem nuclei as a set of modular structures termed barrelettes. Using a dual-color iontophoretic labeling strategy, we found that the precise topography of barrelettes is not a result of ordered positions of sensory neurons within the ganglion. We next explored another possibility that formation of the whisker map is influenced by periphery-derived mechanisms. During the period of peripheral sensory innervation, several TGF-β ligands are exclusively expressed in whisker follicles in a dynamic spatiotemporal pattern. Disrupting TGF-β signaling, specifically in sensory neurons by conditional deletion of Smad4 at the late embryonic stage, results in the formation of abnormal barrelettes in the principalis and interpolaris brainstem nuclei and a complete absence of barrelettes in the caudalis nucleus. We further show that this phenotype is not derived from defective peripheral innervation or central axon outgrowth but is attributable to the misprojection and deficient segregation of trigeminal axonal collaterals into proper barrelettes. Furthermore, Smad4-deficient neurons develop simpler terminal arbors and form fewer synapses. Together, our findings substantiate the involvement of whisker-derived TGF-β/Smad4 signaling in the formation of the whisker somatotopic maps.
One prominent characteristic of the rodent whisker-somatosensory system is its precisely organized topographic sensory maps (1-3). Each whisker is innervated by peripheral axons of a subset of trigeminal sensory neurons whose cell bodies reside in the trigeminal ganglia (TG) and central axons project to the brainstem (4).
Sensory afferents carrying information from individual whiskers segregate and converge to form modular structures termed barrelettes, whose spatial organization exactly mirrors that of the whiskers in the periphery (5). The barrelette map in the brainstem emerges during development and serves as a template for the subsequent generation of homologous upstream structures in the thalamus and cortex, termed barreloids and barrels, respectively (5, 6). Interestingly, induction of an extra whisker by exogenous expression of Shh during early development leads to the formation of an extra barrelette with a topographic position corresponding to that of the ectopic whisker (7). Together with other studies (8-10), these data suggest that the formation of the whisker map is under the strong instructive influence of the periphery. The whisker-derived signals regulating barrelette formation remain mostly unknown, however. Previous work showed that BMP4 signaling induces differential expression of genes in trigeminal sensory neurons innervating different areas of the face along the dorsoventral axis (11, 12). At later developmental stages, multiple TGF-β superfamily ligands are expressed in whisker follicles during the period of sensory...