Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3–5 Kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (∼100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd. The high-quality assembly of a genome sequence is a critical foundation for understanding the biology of an organism, the genetic variation within a species, or the pathology of a tumor. High-quality assembly is particularly challenging for large, repeat-rich genomes such as those of mammals. Among mammals, "finished" genome sequences have been completed for the human and the mouse (1, 2).
However, for most large genomes, efforts have focused on using shotgun-sequencing data to produce high-quality draft genome assemblies, with long-range contiguity in the range of 20-100 kb and long-range connectivity in the range of 10 Mb (e.g., refs. 3-5). Using traditional capillary-based sequencing, such assemblies have been produced for multiple mammals at a cost of tens of millions of dollars each. Recently, there has been a revolution in DNA sequencing technology. New massively parallel technologies can produce DNA sequence information at a per-base cost that is ∼100,000-fold lower than a decade ago (6, 7). In principle, this should make it possible to dramatically decrease the cost of generating high-quality draft genome assemblies. In practice, however, this has been difficult because the new technology produces sequencing "reads" of only ∼100 bases in length (compared with >700 bases for capillary-based technology). These shorter reads are also less accurate. For both of these reasons, these data are more difficult to assemble into long contiguous and connected sequence. Excellent de novo assemblies using massively parallel sequence data have been reported for microbes with genomes up to 40 Mb (refs. 8-10 and many others). There have been some important pioneering e...
Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover, but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.
Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia.
The complete genome of the green-sulfur eubacterium Chlorobium tepidum TLS was determined to be a single circular chromosome of 2,154,946 bp. This represents the first genome sequence from the phylum Chlorobia, whose members perform anoxygenic photosynthesis by the reductive tricarboxylic acid cycle. Genome comparisons have identified genes in C. tepidum that are highly conserved among photosynthetic species. Many of these have no assigned function and may play novel roles in photosynthesis or photobiology. Phylogenomic analysis reveals likely duplications of genes involved in biosynthetic pathways for photosynthesis and the metabolism of sulfur and nitrogen as well as strong similarities between metabolic processes in C. tepidum and many Archaeal species.
Background: The continued advance of antibiotic resistance threatens the treatment and control of many infectious diseases. This is exemplified by the largest global outbreak of extensively drug-resistant (XDR) tuberculosis (TB) identified in Tugela Ferry, KwaZulu-Natal, South Africa, in 2005 that continues today. It is unclear whether the emergence of XDR-TB in KwaZulu-Natal was due to recent inadequacies in TB control in conjunction with HIV or other factors. Understanding the origins of drug resistance in this fatal outbreak of XDR will inform the control and prevention of drug-resistant TB in other settings. In this study, we used whole genome sequencing and dating analysis to determine if XDR-TB had emerged recently or had ancient antecedents. Methods and Findings: We performed whole genome sequencing and drug susceptibility testing on 337 clinical isolates of Mycobacterium tuberculosis collected in KwaZulu-Natal from 2008 to 2013, in addition to three historical isolates, collected from patients in the same province and including an isolate from the 2005 Tugela Ferry XDR outbreak, a multidrug-resistant (MDR) isolate from 1994, and a pansusceptible isolate from 1995. We utilized an array of whole genome comparative techniques to assess the relatedness among strains, to establish the order of acquisition of drug resistance mutations, including the timing of acquisitions leading to XDR-TB in the LAM4 spoligotype, and to calculate the number of independent evolutionary emergences of MDR and XDR. Our sequencing and analysis revealed a 50-member clone of XDR M. tuberculosis that was highly related to the Tugela Ferry XDR outbreak strain.
We estimated that mutations conferring isoniazid and streptomycin resistance in this clone were acquired 50 y prior to the Tugela Ferry outbreak (katG S315T [isoniazid]; gidB 130 bp deletion [streptomycin]; 1957 [95% highest posterior density (HPD): 1937–1971]), with the subsequent emergence of MDR and XDR occurring 20 y (rpoB L452P [rifampicin]; pncA 1 bp insertion [pyrazinamide]; 1984 [95% HPD: 1974–1992]) and 10 y (rpoB D435G [rifampicin]; rrs 1400 [kanamycin]; gyrA A90V [ofloxacin]; 1995 [95% HPD: 1988–1999]) prior to the outbreak, respectively. We observed frequent de novo evolution of MDR and XDR, with 56 and nine independent evolutionary events, respectively. Isoniazid resistance evolved before rifampicin resistance 46 times, whereas rifampicin resistance evolved prior to isoniazid only twice. We identified additional putative compensatory mutations to rifampicin in this dataset. One major limitation of this study is that the conclusions with respect to ordering and timing of acquisition of mutations may not represent universal patterns of drug resistance emergence in other areas of the globe. Conclusions: In the first whole genome-based analysis of the emergence of drug resistance among clinical isolates of M. tuberculosis, we show that the ancestral precursor of the LAM4 XDR outbreak strain in Tugela Ferry gained mutations to first-line drugs at the beginning of the antibiotic e...