Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25–29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.
Recently developed spatial gene expression technologies such as the SpatialTranscriptomics and Visium platforms allow for comprehensive measurement of transcriptomic profiles while retaining spatial context. However, existing methods for analyzing spatial gene expression data often do not efficiently leverage the spatial information and fail to address the limited resolution of the technology. Here, we introduce BayesSpace, a fully Bayesian statistical method for clustering analysis and resolution enhancement of spatial transcriptomics data that seamlessly integrates into current transcriptomics analysis workflows. We show that BayesSpace improves the identification of transcriptionally distinct tissues from spatial transcriptomics samples of the brain, of melanoma, and of squamous cell carcinoma. In particular, BayesSpace's improved resolution allows the identification of tissue structure that is not detectable at the original resolution and thus not recovered by other methods. Using an in silico dataset constructed from scRNA-seq, we demonstrate that BayesSpace can spatially resolve expression patterns to near single-cell resolution without the need for external single-cell sequencing data. In all, our results illustrate the utility BayesSpace has in facilitating the discovery of biological insights from a variety of spatial transcriptomics datasets.
Despite their clinical significance, characterization of balanced chromosomal abnormalities (BCAs) has largely been restricted to cytogenetic resolution. We explored the landscape of BCAs at nucleotide resolution in 273 subjects with a spectrum of congenital anomalies. Whole-genome sequencing revised 93% of karyotypes and revealed complexity that was cryptic to karyotyping in 21% of BCAs, highlighting the limitations of conventional cytogenetic approaches. At least 33.9% of BCAs resulted in gene disruption that likely contributed to the developmental phenotype, 5.2% were associated with pathogenic genomic imbalances, and 7.3% disrupted topologically associated domains (TADs) encompassing known syndromic loci. Remarkably, BCA breakpoints in eight subjects altered a single TAD encompassing MEF2C, a known driver of 5q14.3 microdeletion syndrome, resulting in decreased MEF2C expression. This study proposes that sequence-level resolution dramatically improves prediction of clinical outcomes for balanced rearrangements, and provides insight into novel pathogenic mechanisms such as altered regulation due to changes in chromosome topology.
Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here, we present a comparable framework to evaluate rare and de novo noncoding single nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism the contribution of de novo noncoding variation is probably modest compared to de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple testing burden.
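To make the multiple-testing burden in the abstract above concrete, the following is a minimal sketch of how a per-test significance threshold shrinks with the number of tests. The counts 51,801 and 4,123 come from the abstract; the Bonferroni rule, the family-wise error rate of 0.05, and the names `bonferroni_threshold` and `passes` are illustrative assumptions, not the authors' stated procedure.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold under a Bonferroni correction (assumed rule)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes(p_value: float, threshold: float) -> bool:
    """True if a nominal p-value survives the corrected threshold."""
    return p_value < threshold


N_CATEGORIES = 51_801  # annotation categories (from the abstract)
N_EFFECTIVE = 4_123    # effective tests after accounting for correlation (from the abstract)
ALPHA = 0.05           # assumed family-wise error rate; not stated in the abstract

threshold_effective = bonferroni_threshold(ALPHA, N_EFFECTIVE)
threshold_all = bonferroni_threshold(ALPHA, N_CATEGORIES)

print(f"threshold for {N_EFFECTIVE} effective tests: {threshold_effective:.3e}")
print(f"threshold for {N_CATEGORIES} categories:     {threshold_all:.3e}")
```

Under these assumptions, a category with a nominal p-value of 1e-4 looks striking in isolation but fails the corrected threshold, which is the kind of biologically plausible yet unsupported association the abstract warns about.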
Background: Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. Results: We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. Conclusions: These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1158-6) contains supplementary material, which is available to authorized users.
SUMMARY: Mitral valve prolapse (MVP) is a common cardiac valve disease that affects nearly 1 in 40 individuals [1–3]. It can manifest as mitral regurgitation and is the leading indication for mitral valve surgery [4,5]. Despite a clear heritable component, the genetic etiology leading to non-syndromic MVP has remained elusive. Four affected individuals from a large multigenerational family segregating non-syndromic MVP underwent capture sequencing of the linked interval on chromosome 11. We report a missense mutation in the DCHS1 gene, the human homologue of the Drosophila cell polarity gene dachsous (ds), that segregates with MVP in the family. Morpholino knockdown of the zebrafish homolog dachsous1b resulted in a cardiac atrioventricular canal defect that could be rescued by wild-type human DCHS1, but not by DCHS1 mRNA with the familial mutation. Further genetic studies identified two additional families in which a second deleterious DCHS1 mutation segregates with MVP. Both DCHS1 mutations reduce protein stability as demonstrated in zebrafish, cultured cells, and, notably, in mitral valve interstitial cells (MVICs) obtained during mitral valve repair surgery of a proband. Dchs1+/− mice had prolapse of thickened mitral leaflets, which could be traced back to developmental errors in valve morphogenesis. DCHS1 deficiency in MVP patient MVICs as well as in Dchs1+/− mouse MVICs results in altered migration and cellular patterning, supporting these processes as etiological underpinnings for the disease. Understanding the role of DCHS1 in mitral valve development and MVP pathogenesis holds potential for therapeutic insights for this very common disease.
A Correction to this paper has been published: https://doi.org/10.1038/s41586-020-03176-6.
Copy-number variants (CNVs) have been the predominant focus of genetic studies of structural variation, and chromosomal microarray (CMA) for genome-wide CNV detection is the recommended first-tier genetic diagnostic screen in neurodevelopmental disorders. We compared CNVs observed by CMA to the structural variation detected by whole-genome large-insert sequencing in 259 individuals diagnosed with autism spectrum disorder (ASD) from the Simons Simplex Collection. These analyses revealed a diverse landscape of complex duplications in the human genome. One remarkably common class of complex rearrangement, which we term dupINVdup, involves two closely located duplications ("paired duplications") that flank the breakpoints of an inversion. This complex variant class is cryptic to CMA, but we observed it in 8.1% of all subjects. We also detected other paired-duplication signatures and duplication-mediated complex rearrangements in 15.8% of all ASD subjects. Breakpoint analysis showed that the predominant mechanism of formation of these complex duplication-associated variants was microhomology-mediated repair. On the basis of the striking prevalence of dupINVdups in this cohort, we explored the landscape of all inversion variation among the 235 highest-quality libraries and found abundant complexity among these variants: only 39.3% of inversions were canonical, or simple, inversions without additional rearrangement. Collectively, these findings indicate that dupINVdups, as well as other complex duplication-associated rearrangements, represent relatively common sources of genomic variation that is cryptic to population-based microarray and low-depth whole-genome sequencing. They also suggest that paired-duplication signatures detected by CMA warrant further scrutiny in genetic diagnostic testing given that they might mark complex rearrangements of potential clinical relevance.