The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally employed long (400–800 bp) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intra-species genetic variation. We report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterise four million SNPs and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
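Average depth is the total number of sequenced bases divided by the genome length, C = N·L/G. A minimal sketch of that arithmetic, assuming a haploid human genome length of roughly 3.1 Gb (an assumption, not a figure from the abstract) and counting each 35-base read of a pair separately:

```python
import math

# Assumption: approximate haploid human genome length (~3.1 Gb); not stated in the abstract.
GENOME_LEN = 3_100_000_000
READ_LEN = 35  # bases per read, matching the paired 35-base reads described above


def mean_depth(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average depth C = N * L / G: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


def reads_for_depth(target_depth: float, read_len: int, genome_len: int) -> int:
    """Smallest number of reads whose average depth reaches target_depth."""
    return math.ceil(target_depth * genome_len / read_len)


if __name__ == "__main__":
    n = reads_for_depth(30, READ_LEN, GENOME_LEN)
    print(f"{n:,} reads (~{n // 2:,} pairs) -> {mean_depth(n, READ_LEN, GENOME_LEN):.2f}x")
```

It prints `2,657,142,858 reads (~1,328,571,429 pairs) -> 30.00x`. This treats every read as usable; reads that fail filtering or do not align contribute no depth, so a real run needs more raw output than this figure.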
Summary
The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
Chronic lymphocytic leukemia is characterized by relapse after treatment and chemotherapy resistance. As in other malignancies, leukemia cells accumulate mutations during growth, forming heterogeneous cell populations that are subject to Darwinian selection and may respond differentially to treatment. There is therefore a clinical need to monitor changes in the subclonal composition of cancers during disease progression. Here, we use whole-genome sequencing to track subclonal heterogeneity in 3 chronic lymphocytic leukemia patients subjected to repeated cycles of therapy. We reveal different somatic mutation profiles in each patient and use these to establish probable hierarchical patterns of subclonal evolution, to identify subclones that decline or expand over time, and to detect founder mutations. We show that clonal evolution patterns are heterogeneous in individual patients. We conclude that genome sequencing is a powerful and sensitive approach to monitor disease progression repeatedly at the molecular level.
Introduction
Despite significant progress in the management of lymphomas and leukemias, relapse remains the major cause of death. Increased use of expensive targeted therapies and toxic chemotherapies (especially in the elderly) confronts us with an urgent need to improve response prediction for all cancer patients, to reduce side effects and the costs of ineffective treatment. Current diagnostic approaches to treatment selection, response monitoring, and relapse prediction are limited to single genes and apply only to a minority of hematologic cancers. This is at odds with modern concepts of tumor propagation and maintenance, which propose that every cell in an individual cancer is characterized by a combination of mutation events comprising tumorigenic (driver) mutations, passive (passenger) mutations, and possibly predisposing germline risk variants.
Cancer cells propagate and diversify during tumor growth, resulting in a heterogeneous population of genotypically and phenotypically distinct subclones that are related in a hierarchical lineage. As the composition of the local environment changes, for example as a consequence of drug treatment, tumor cell populations adapt and evolve by Darwinian selection [1–3]. Whole-genome sequencing (WGS) of a single tumor sample can be used to generate a comprehensive catalog of variants that provides a snapshot of the cell population en masse at a particular time point [2,4–6]. However, over time and with continued evolution of the cancer, this snapshot becomes progressively less representative of the disease. Recent reports have described whole-tumor genomes from single patients or cohorts of individuals, mostly at single time points and irrespective of treatment [7–10]. This approach has enabled identification of mutations representative, and in some cases highly predictive, of histologic cancer type, outcome, and/or treatment response [11–15]. Comparison of sequence data from primary and metastatic tumor samples, or from multiple lo...
To assess factors influencing the success of whole-genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases across a broad spectrum of disorders in whom prior screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritisation. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and prioritising familial transmission over biological plausibility all contributed to accuracy. Overall, we identified disease-causing variants in 21% of cases, rising to 34% (23/68) for Mendelian disorders and 57% (8/14) in trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, though only four were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis, but also highlight many outstanding challenges.
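Using familial transmission as a filter means keeping only variants whose inheritance pattern in the family fits a disease model. A minimal sketch for a trio with unaffected parents, assuming genotypes encoded as alternate-allele counts (0, 1 or 2) at autosomal sites; the `TrioGenotype` record, the variant IDs and the two models checked (de novo, recessive homozygous) are hypothetical illustrations, not the authors' pipeline:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TrioGenotype:
    """Hypothetical record: alternate-allele counts (0, 1 or 2) at one autosomal site."""
    variant_id: str
    child: int
    mother: int
    father: int


def is_de_novo(g: TrioGenotype) -> bool:
    """Child carries the allele but neither parent does."""
    return g.child >= 1 and g.mother == 0 and g.father == 0


def is_recessive_homozygous(g: TrioGenotype) -> bool:
    """Affected child is homozygous and both parents are heterozygous carriers."""
    return g.child == 2 and g.mother == 1 and g.father == 1


def transmission_candidates(genotypes: Iterable[TrioGenotype]) -> List[str]:
    """Keep only variants whose transmission fits a de novo or recessive model."""
    return [g.variant_id for g in genotypes if is_de_novo(g) or is_recessive_homozygous(g)]


if __name__ == "__main__":
    calls = [
        TrioGenotype("var_a", child=1, mother=0, father=0),  # de novo
        TrioGenotype("var_b", child=1, mother=1, father=0),  # inherited heterozygous
        TrioGenotype("var_c", child=2, mother=1, father=1),  # recessive homozygous
        TrioGenotype("var_d", child=0, mother=1, father=1),  # not carried by child
    ]
    print(transmission_candidates(calls))  # ['var_a', 'var_c']
```

Other models (compound heterozygous, X-linked, incomplete penetrance) would need their own predicates, and a missed call in a parent can make an inherited variant look de novo.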
The mechanisms involved in progression from monoclonal gammopathy of undetermined significance (MGUS) and smoldering myeloma (SMM) to malignant multiple myeloma (MM) and plasma cell leukemia (PCL) are poorly understood but are believed to involve the sequential acquisition of genetic hits. We performed exome and whole-genome sequencing on a series of MGUS (n=4), high-risk SMM (HR-SMM) (n=4), MM (n=26) and PCL (n=2) samples, including four cases that transformed from HR-SMM to MM, to determine the genetic factors that drive disease progression. The pattern and number of non-synonymous mutations show that the MGUS stage is less genetically complex than MM, whereas HR-SMM is similar to presenting MM. Intraclonal heterogeneity is present at all stages, and the HR-SMM cases that transformed to MM show that it is a typical feature of the disease. At the HR-SMM stage, the majority of the genetic changes necessary to give rise to MM are already present. These data suggest that clonal progression is the key feature of transformation from HR-SMM to MM; as such, the invasive, clinically predominant clone typical of MM is already present at the SMM stage and would be amenable to therapeutic intervention at that stage.
Background
Mycobacterium tuberculosis complex (MTBC), the causative agent of tuberculosis (TB), is characterized by low sequence diversity, making this bacterium one of the classical examples of a genetically monomorphic pathogen. Because of this limited DNA sequence variation, routine genotyping of clinical MTBC isolates for epidemiological purposes relies on highly discriminatory DNA fingerprinting methods based on mobile and repetitive genetic elements. According to the standard view, isolates exhibiting the same fingerprinting pattern are considered direct progeny of the same bacterial clone, and most likely reflect ongoing transmission or disease relapse within individual patients.
Methodology/Principal Findings
Here we further investigated this assumption and used massively parallel whole-genome sequencing to compare one drug-susceptible (K-1) and one multidrug-resistant (MDR) isolate (K-2) of a rapidly spreading M. tuberculosis Beijing genotype clone from a high-incidence region (Karakalpakstan, Uzbekistan). Both isolates shared the same IS6110 RFLP pattern and the same allele at 23 out of 24 MIRU-VNTR loci. We generated 23.9 million (K-1) and 33.0 million (K-2) purity-filtered paired 50 bp reads, corresponding to mean coverages of 483.5-fold and 656.1-fold, respectively. Compared with the laboratory strain H37Rv, both Beijing isolates shared 1,209 SNPs. The two Beijing isolates differed by 130 SNPs and one large deletion. The susceptible isolate had 55 specific SNPs, while the MDR variant had 75 specific SNPs, including the five known resistance-conferring mutations.
Conclusions
Our results suggest that M. tuberculosis isolates exhibiting identical DNA fingerprinting patterns can harbour substantial genomic diversity. Because this heterogeneity is not captured by traditional genotyping of MTBC, some aspects of the transmission dynamics of tuberculosis could be missed or misinterpreted.
Furthermore, a valid differentiation between disease relapse and exogenous reinfection might be impossible using standard genotyping tools if the overall diversity of circulating clones is limited. These findings have important implications for clinical trials of new anti-tuberculosis drugs.
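The isolate comparison above is set arithmetic on SNP calls made against a common reference: the 130 SNPs separating K-1 and K-2 are the 55 isolate-specific SNPs of one plus the 75 of the other. A minimal sketch, assuming each SNP is keyed by a hypothetical (position, alternate base) pair, with toy positions rather than study data:

```python
from typing import Dict, Set, Tuple

# Hypothetical SNP representation: (reference position, alternate base) relative to a
# shared reference genome such as H37Rv.
Snp = Tuple[int, str]


def compare_isolates(snps_a: Set[Snp], snps_b: Set[Snp]) -> Dict[str, int]:
    """Partition two isolates' reference-based SNP calls into shared and specific sets."""
    specific_a = snps_a - snps_b
    specific_b = snps_b - snps_a
    return {
        "shared": len(snps_a & snps_b),
        "specific_a": len(specific_a),
        "specific_b": len(specific_b),
        # Pairwise distance: SNPs present in exactly one isolate (symmetric difference),
        # which always equals specific_a + specific_b.
        "pairwise": len(snps_a ^ snps_b),
    }


if __name__ == "__main__":
    # Toy positions for illustration only; not data from the study.
    isolate_1 = {(100, "T"), (250, "G"), (900, "A")}
    isolate_2 = {(100, "T"), (250, "G"), (1200, "C"), (1500, "T")}
    print(compare_isolates(isolate_1, isolate_2))
    # {'shared': 2, 'specific_a': 1, 'specific_b': 2, 'pairwise': 3}
```

In practice, positions not confidently called in both isolates would need to be masked before comparison, otherwise a low-coverage site can masquerade as an isolate-specific SNP; this sketch does no such masking.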