Background
The emergence of next-generation sequencing (NGS) has provided the means for rapid, high-throughput sequencing and data generation at low cost, while creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects not only the sequencing technology used but also the analysis software employed for assembly and annotation.
Methodology/Principal Findings
In this work, we explored the quality of microbial draft genomes across various sequencing technologies. We compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy Joint Genome Institute (JGI) and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the institute's transition from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for downstream analyses. In all cases, assembly results, although on average of high quality, need to be viewed critically, and potential sources of error in them should be considered prior to analysis.
Conclusion
These data follow the evolution of microbial sequencing and downstream processing at the JGI: from draft genome sequences with large gaps corresponding to missing genes of significant biological role, to assemblies with multiple small gaps (Illumina), and finally to assemblies that yield almost complete genomes (Illumina+PacBio).
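The abstract does not list which assembly metrics were used. As an illustration only, the sketch below computes two widely used assembly statistics, contig count/total length with N50 and the number of gap runs (stretches of `N`), from a FASTA string. The choice of metrics and the helper names (`parse_fasta`, `n50`, `count_gaps`, `summarize`) are assumptions for this example, not the authors' pipeline.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into {record_name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def count_gaps(sequence, min_run=1):
    """Count runs of N/n (scaffold gaps) that are at least min_run bases long."""
    return len(re.findall(r"[Nn]{%d,}" % min_run, sequence))


def summarize(fasta_text, min_gap=1):
    """Hypothetical summary of a draft assembly: size, contiguity, and gaps."""
    seqs = parse_fasta(fasta_text)
    lengths = [len(s) for s in seqs.values()]
    return {
        "contigs": len(seqs),
        "total_length": sum(lengths),
        "n50": n50(lengths),
        "gaps": sum(count_gaps(s, min_gap) for s in seqs.values()),
    }
```

For example, `summarize(">c1\nACGTNNNNACGT\n>c2\nACG\n")` reports two contigs, 15 bp in total, and one gap. Comparing such summaries for a draft and its finished counterpart is one simple way to see how fragmented a draft is.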
Motivation: The Biological Reference Repository (BioR) is a toolkit for annotating variants. BioR stores public and user-specific annotation sources in indexed JSON-encoded flat files (catalogs). The BioR toolkit provides the functionality to combine and retrieve annotation from these catalogs via the command-line interface. Several catalogs from commonly used annotation sources and instructions for creating user-specific catalogs are provided. Commands from the toolkit can be combined with other UNIX commands for advanced annotation processing. We also provide instructions for the development of custom annotation pipelines.
Availability and implementation: The package is implemented in Java and makes use of external tools written in Java and Perl. The toolkit can be executed on Mac OS X 10.5 and above or on any Linux distribution. The BioR application, quickstart and user guide documents, and many biological examples are available at http://bioinformaticstools.mayo.edu.
Contact: Kocher.JeanPierre@mayo.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
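To illustrate the idea of retrieving annotation from an indexed flat-file catalog, here is a minimal sketch. It assumes a hypothetical, simplified catalog layout of `chrom<TAB>start<TAB>end<TAB>{json}` per line (1-based, inclusive coordinates) and uses an in-memory sorted index in place of an on-disk index. It does not reproduce BioR's actual file format or commands; `Catalog`, `lookup`, and `annotate` are illustrative names.

```python
import bisect
import json
from collections import defaultdict


class Catalog:
    """Hypothetical stand-in for an indexed JSON catalog.

    Each line: chrom<TAB>start<TAB>end<TAB>{json}, 1-based inclusive coordinates.
    """

    def __init__(self, lines):
        by_chrom = defaultdict(list)
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue
            chrom, start, end, payload = line.rstrip("\n").split("\t", 3)
            by_chrom[chrom].append((int(start), int(end), json.loads(payload)))
        # Sort each chromosome's records by start so lookups can use bisect.
        self._records = {c: sorted(r, key=lambda t: t[0]) for c, r in by_chrom.items()}
        self._starts = {c: [t[0] for t in r] for c, r in self._records.items()}

    def lookup(self, chrom, pos):
        """Return the JSON objects of all records overlapping chrom:pos."""
        records = self._records.get(chrom, [])
        # Records from index `cut` onward start after pos and cannot overlap it.
        cut = bisect.bisect_right(self._starts.get(chrom, []), pos)
        return [rec for start, end, rec in records[:cut] if end >= pos]


def annotate(variant, catalog, key):
    """Return a copy of the variant dict with matching catalog objects under `key`."""
    annotated = dict(variant)
    annotated[key] = catalog.lookup(variant["chrom"], variant["pos"])
    return annotated
```

Keeping the annotation as an opaque JSON payload per line is what lets one lookup routine serve any annotation source; chaining several `annotate` calls over different catalogs loosely mirrors combining toolkit commands in a UNIX pipeline.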
Whole-genome sequencing (WGS) can provide excellent resolution in global and local epidemiological investigations of Staphylococcus aureus outbreaks. A variety of sequencing approaches and analytical tools have been used; it is not clear which is ideal. We compared two WGS strategies and two analytical approaches with the standard method of SmaI restriction digestion pulsed-field gel electrophoresis (PFGE) for typing S. aureus. Forty-two S. aureus isolates from three outbreaks and 12 reference isolates were studied. Near-complete genomes were first assembled de novo from paired-end and long-mate-pair (8-kb) libraries and analyzed using an in-house assembly and analytical informatics pipeline. In addition, paired-end data were assembled and analyzed using a commercial software package. Single nucleotide polymorphism (SNP) analysis was performed using the in-house pipeline. Two assembly strategies were used to generate core genome multilocus sequence typing (cgMLST) data. First, the near-complete genome data generated with the in-house pipeline were imported into the commercial software and used to perform cgMLST analysis. Second, the commercial software was used to assemble paired-end data, and the resolved assemblies were used to perform cgMLST. Similar isolate clustering was observed using SNP calling and cgMLST, regardless of data assembly strategy. All methods provided more discrimination between outbreaks than did PFGE. Overall, all of the evaluated WGS strategies yielded statistically similar results for S. aureus typing.
KEYWORDS MRSA, PFGE, Staphylococcus aureus, molecular typing, whole-genome sequencing
Methicillin-resistant Staphylococcus aureus (MRSA) infections are associated with high morbidity and mortality. MRSA transmission poses a challenge to hospital infection prevention and control practitioners and public health professionals.
The Centers for Disease Control and Prevention's Active Bacterial Surveillance Report estimated that there were 72,444 cases of invasive MRSA infection in 2014, the majority of which were health care associated (HCA) (1). Proactive screening strategies (molecular and culture based) are emphasized in many institutions and are mandatory in some states. Despite these measures, HCA-MRSA outbreaks continue to occur. Thorough investigation of outbreaks is essential for confirming that an outbreak is occurring, understanding transmission patterns and reservoirs, and intervening to interrupt out-
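The S. aureus abstract compares isolate clustering by cgMLST. As a hedged illustration of how cgMLST profiles can be compared (not the commercial software's or the authors' exact procedure), the sketch counts allele differences between isolates, ignoring loci missing in either profile, and groups isolates by single linkage at a distance threshold. The profiles, the threshold value, and the function names are hypothetical.

```python
from itertools import combinations


def allele_distance(p, q):
    """Count loci where both isolates have an allele call and the calls differ.

    Loci missing (None) in either profile are skipped.
    """
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def single_linkage_clusters(profiles, threshold):
    """Group isolates transitively linked by allele distance <= threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical allele-number profiles over four core loci (None = no call).
profiles = {
    "A1": [1, 2, 3, 4],
    "A2": [1, 2, 3, 5],
    "B1": [7, 8, 9, 4],
    "B2": [7, 8, None, 4],
}
print(single_linkage_clusters(profiles, threshold=1))
```

With a threshold of 1, this prints `[['A1', 'A2'], ['B1', 'B2']]`. The same pairwise-distance-then-cluster pattern applies to SNP distances, which is one reason SNP-based and cgMLST-based clusterings can be compared directly.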