The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
We sequenced exomes from more than 2,500 simplex families each having a child with an autistic spectrum disorder (ASD). By comparing affected to unaffected siblings, we estimate that 13% of de novo (DN) missense mutations and 42% of DN likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding DN mutations contribute to about 30% of all simplex and 45% of female diagnoses. Virtually all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower IQ, but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to causative missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Virtually all significance for the latter comes from affected females.
We tested the hypothesis that de novo copy number variation (CNV) is associated with autism spectrum disorders (ASDs). We performed comparative genomic hybridization (CGH) on the genomic DNA of patients and unaffected subjects to detect copy number variants not present in their respective parents. Candidate genomic regions were validated by higher-resolution CGH, paternity testing, cytogenetics, fluorescence in situ hybridization, and microsatellite genotyping. Confirmed de novo CNVs were significantly associated with autism (P = 0.0005). Such CNVs were identified in 12 out of 118 (10%) of patients with sporadic autism, in 2 out of 77 (3%) of patients with an affected first-degree relative, and in 2 out of 196 (1%) of controls. Most de novo CNVs were smaller than microscopic resolution. Affected genomic regions were highly heterogeneous and included mutations of single genes. These findings establish de novo germline mutation as a more significant risk factor for ASD than previously recognized.
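As a rough illustration of how counts like these can be compared, the sketch below runs a two-sided Fisher exact test on the sporadic-autism and control counts quoted above (12 of 118 versus 2 of 196). The abstract does not say which test produced the reported P value, so the choice of test and the function name `fisher_exact_two_sided` are assumptions made for illustration, and the result need not match the published figure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one, with row and column totals held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def pmf(x):
        # Probability that the first row holds x of the col1 "carriers".
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))


# Counts taken from the abstract: carriers vs non-carriers of a de novo CNV.
sporadic = (12, 118 - 12)   # sporadic autism
controls = (2, 196 - 2)     # unaffected controls

p_value = fisher_exact_two_sided(*sporadic, *controls)
print(f"sporadic rate: {12 / 118:.1%}, control rate: {2 / 196:.1%}")
print(f"two-sided Fisher exact p = {p_value:.2g}")
```

Only the two groups named in the comments are compared here; the familial group (2 of 77) is left out to keep the table 2x2.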
Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveals de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly greater numbers of de novo missense mutations in affected versus unaffected children, but gene-disrupting mutations (nonsense, splice site, and frame shifts) are twice as frequent, 59 to 28. Based on this differential and the number of recurrent and total targets of gene disruption found in our and similar studies, we estimate between 350 and 400 autism susceptibility genes. Many of the disrupted genes in these studies are associated with the fragile X protein, FMRP, reinforcing links between autism and synaptic plasticity. We find FMRP-associated genes are under greater purifying selection than the remainder of genes and suggest they are especially dosage-sensitive targets of cognitive disorders.
Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
To explore the genetic contribution to autistic spectrum disorders (ASDs), we have studied genomic copy-number variation in a large cohort of families with a single affected child and at least one unaffected sibling. We confirm a major contribution from de novo deletions and duplications but also find evidence of a role for inherited "ultrarare" duplications. Our results show that, relative to males, females have greater resistance to autism from genetic causes, which raises the question of the fate of female carriers. By analysis of the proportion and number of recurrent loci, we set a lower bound for distinct target loci at several hundred. We find many new candidate regions, adding substantially to the list of potential gene targets, and confirm several loci previously observed. The functions of the genes in the regions of de novo variation point to a great diversity of genetic causes but also suggest functional convergence.
Methods for the direct detection of copy number variation (CNV) genome-wide have become effective instruments for identifying genetic risk factors for disease. The application of next-generation sequencing platforms to genetic studies promises to improve sensitivity to detect CNVs as well as inversions, indels, and SNPs. New computational approaches are needed to systematically detect these variants from genome sequence data. Existing sequence-based approaches for CNV detection are primarily based on paired-end read mapping (PEM) as reported previously by Tuzun et al. and Korbel et al. Due to limitations of the PEM approach, some classes of CNVs are difficult to ascertain, including large insertions and variants located within complex genomic regions. To overcome these limitations, we developed a method for CNV detection using read depth of coverage. Event-wise testing (EWT) is a method based on significance testing. In contrast to standard segmentation algorithms that typically operate by performing likelihood evaluation for every point in the genome, EWT works on intervals of data points, rapidly searching for specific classes of events. Overall false-positive rate is controlled by testing the significance of each possible event and adjusting for multiple testing. Deletions and duplications detected in an individual genome by EWT are examined across multiple genomes to identify polymorphism between individuals. We estimated error rates using simulations based on real data, and we applied EWT to the analysis of chromosome 1 from paired-end shotgun sequence data (30×) on five individuals. Our results suggest that analysis of read depth is an effective approach for the detection of CNVs, and it captures structural variants that are refractory to established PEM-based methods.
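The interval-based idea described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the published EWT implementation: the normal approximation for window depths, the per-window threshold `(fpr / (n / length)) ** (1 / length)`, and the names `tail_probability` and `find_events` are all assumptions chosen to show how testing whole intervals with a multiple-testing adjustment might look.

```python
import math
from statistics import mean, pstdev


def tail_probability(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def find_events(depths, length, fpr=0.05):
    """Flag runs of `length` consecutive windows with unusual read depth.

    An interval is called only if every window in it is individually
    extreme in the same direction. The per-window threshold is relaxed or
    tightened so that the overall false-positive rate across the roughly
    n / length candidate intervals stays near `fpr` (an assumed scheme).
    """
    n = len(depths)
    mu, sigma = mean(depths), pstdev(depths)
    if sigma == 0 or length > n:
        return []
    z = [(d - mu) / sigma for d in depths]
    p_upper = [tail_probability(v) for v in z]    # evidence for a gain
    p_lower = [tail_probability(-v) for v in z]   # evidence for a loss
    threshold = (fpr / (n / length)) ** (1.0 / length)

    events = []
    for start in range(n - length + 1):
        end = start + length
        if max(p_upper[start:end]) < threshold:
            events.append((start, end, "duplication"))
        elif max(p_lower[start:end]) < threshold:
            events.append((start, end, "deletion"))
    return events


# Synthetic example: 100 windows at depth 30 with a 4-window drop to depth 10.
depths = [30.0] * 100
depths[48:52] = [10.0] * 4
print(find_events(depths, length=4))
```

Scanning several interval lengths and merging overlapping calls would be natural next steps, but they are left out to keep the sketch short.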
Representational Oligonucleotide Microarray Analysis (ROMA) detects genomic amplifications and deletions with boundaries defined at a resolution of ∼50 kb. We have used this technique to examine 243 breast tumors from two separate studies for which detailed clinical data were available. The very high resolution of this technology has enabled us to identify three characteristic patterns of genomic copy number variation in diploid tumors and to measure correlations with patient survival. One of these patterns is characterized by multiple closely spaced amplicons, or “firestorms,” limited to single chromosome arms. These multiple amplifications are highly correlated with aggressive disease and poor survival even when the rest of the genome is relatively quiet. Analysis of a selected subset of clinical material suggests that a simple genomic calculation, based on the number and proximity of genomic alterations, correlates with life-table estimates of the probability of overall survival in patients with primary breast cancer. Based on this sample, we generate the working hypothesis that copy number profiling might provide information useful in making clinical decisions, especially regarding the use or not of systemic therapies (hormonal therapy, chemotherapy), in the management of operable primary breast cancer with ostensibly good prognosis, for example, small, node-negative, hormone-receptor-positive diploid cases.