We report the discovery and confirmation of 23 novel mutations with a previously undocumented role in isoniazid (INH) drug resistance in the catalase-peroxidase (katG) gene of Mycobacterium tuberculosis (Mtb) isolates. With these mutations, a synonymous mutation in fabG1 (g609a), and two canonical mutations, we were able to explain 98% of the phenotypic resistance observed in 366 clinical Mtb isolates collected from four high tuberculosis (TB)-burden countries: India, Moldova, Philippines, and South Africa. We conducted overlapping targeted and whole-genome sequencing for variant discovery in all clinical isolates with a variety of INH-resistant phenotypes. Our analysis showed that just two canonical mutations (katG 315AGC-ACC and inhA promoter -15C-T) identified 89.5% of resistance phenotypes in our collection. Inclusion of the 23 novel mutations reported here, and the previously documented point mutation in fabG1, increased the sensitivity of these mutations as markers of INH resistance to 98%. Only six (2%) of the 332 resistant isolates in our collection did not harbor one or more of these mutations. The third most prevalent substitution, at inhA promoter position -8, present in 39 resistant isolates, was of no diagnostic significance since it always co-occurred with katG 315. Seventy-nine percent of our isolates harboring novel mutations belong to genetic group 1, indicating a higher tendency for this group to go down an uncommon evolutionary path and evade molecular diagnostics. The results of this study contribute to our understanding of the mechanisms of INH resistance in Mtb isolates that lack the canonical mutations and could improve the sensitivity of next-generation molecular diagnostics.
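The sensitivity figures in the abstract above follow from simple counts, and a short calculation makes the arithmetic explicit. The sketch below is only an illustration: the function and variable names are hypothetical, and the isolate count for the two canonical mutations is back-calculated from the reported 89.5% on the assumption that the percentage refers to the 332 resistant isolates. That count is not stated in the abstract.

```python
def sensitivity(flagged_resistant: int, resistant_total: int) -> float:
    """Fraction of phenotypically resistant isolates flagged by a marker set."""
    if resistant_total <= 0:
        raise ValueError("resistant_total must be positive")
    return flagged_resistant / resistant_total


RESISTANT_ISOLATES = 332    # resistant isolates reported in the abstract
UNEXPLAINED_ISOLATES = 6    # resistant isolates carrying none of the markers

# Full panel: canonical mutations + 23 novel katG mutations + fabG1 g609a.
full_panel = sensitivity(RESISTANT_ISOLATES - UNEXPLAINED_ISOLATES,
                         RESISTANT_ISOLATES)

# Assumption: 89.5% refers to the 332 resistant isolates, so the implied
# count is approximate and not a figure reported by the authors.
canonical_count = round(0.895 * RESISTANT_ISOLATES)

print(f"Full marker panel sensitivity: {full_panel:.1%}")
print(f"Implied isolates explained by the two canonical mutations: ~{canonical_count}")
```

Under these assumptions, 326 of 332 resistant isolates carry at least one marker, which is consistent with the reported 98% after rounding.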
Background: The genetic basis of virulence in Mycobacterium tuberculosis has been investigated through genome comparisons of virulent (H37Rv) and attenuated (H37Ra) sister strains. Such analysis, however, relies heavily on the accuracy of the sequences. While the H37Rv reference genome has had several corrections to date, that of H37Ra is unmodified since its original publication. Results: Here, we report the assembly and finishing of the H37Ra genome from single-molecule, real-time (SMRT) sequencing. Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies. PE_PPE family genes, which are intractable to commonly-used sequencing platforms because of their repetitive and GC-rich nature, are overrepresented in the set of genes in which all reported H37Ra-specific variants are contradicted. Further, one of the sequencing errors in H37Ra masks a true variant in common with the clinical strain CDC1551 which, when considered in the context of previous work, corresponds to a sequencing error in the H37Rv reference genome. Conclusions: Our results constrain the set of genomic differences possibly affecting virulence by more than half, which focuses laboratory investigation on pertinent targets and demonstrates the power of SMRT sequencing for producing high-quality reference genomes. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3687-5) contains supplementary material, which is available to authorized users.
Mycobacterium tuberculosis, the primary causative agent of tuberculosis, kills more humans than any other infectious bacterium. Yet 40% of its genome is functionally uncharacterized, leaving much about the genetic basis of its resistance to antibiotics, capacity to withstand host immunity, and basic metabolism yet undiscovered.
Motivation: Single Molecule Real-Time (SMRT) sequencing has important and underutilized
The genetic basis of virulence in Mycobacterium tuberculosis has been investigated through genome comparisons of its virulent (H37Rv) and attenuated (H37Ra) sister strains. Such analysis, however, relies heavily on the accuracy of the sequences. While the H37Rv reference genome has had several corrections to date, that of H37Ra is unmodified since its original publication. Here, we report the assembly and finishing of the H37Ra genome from single-molecule, real-time (SMRT) sequencing. Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies. PE_PPE family genes, which are intractable to commonly-used sequencing platforms because of their repetitive and GC-rich nature, are overrepresented in the set of genes in which all reported H37Ra-specific variants are contradicted. We discuss how our results change the picture of virulence attenuation and the power of SMRT sequencing for producing high-quality reference genomes.
* Equal contribution. † Corresponding author: faramarz@sdsu.edu. bioRxiv preprint first posted online Jul. 19, 2016; doi: http://dx.doi.org/10.1101/064840. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.
Tuberculosis is a serious and pervasive public health problem [1]. It is a disease caused by infection of bacteria from the Mycobacterium tuberculosis complex (MTBC). The reference strain, Mycobacterium tuberculosis H37Rv, has an attenuated counterpart known as H37Ra that is available for studies where facilities to handle virulent samples are lacking. H37Ra exhibits a distinct colony morphology, an absence of cord formation, decreased resistance to stress and hypoxia, and attenuated virulence in mammalian models [2][3][4].
The H37Ra ... In this study, we sequenced and assembled the genome of M. ... [14, 15]. This insertion was the heterogeneous insertion responsible for the discrepant contig ends in our raw genome assembly. Such heterogeneity implies either a lack of selection pressure on the insertion in culture, a recent emergence of the insertion, or both. The 3456 bp insertion in ppe54 with respect to H37RaJH incidentally corresponds to a tandem duplication of a 1728 bp sequence at the same site in H37Rv ... H37Rv and CDC1551 as "H37Ra-specific". These mutations fall within or adjacent to (which we term "affecting") 56 genes in H37Rv, which we refer to as ... colleagues also discovered sequencing errors in the H37Rv reference sequence [5], a number of which were corrected in NC_000962.3 [9], the version used in our study. To see how well the HC genes are supported by our assembly of H37Ra, we determined variants with respect to H37Rv ...
Phylogenetic inference based on genomic structural variations that alter the gene order and content of whole chromosomes promises to inform a more comprehensive understanding of evolution. The first challenge in using such data, the incompleteness of available de novo assemblies, is easing as long-read technologies enable (near-)complete genome assembly, but methodological challenges remain. To obtain the input to rearrangement-based inference methods, we need to detect syntenic blocks of orthologous sequences, a task that can be accomplished in many ways, none of which is obviously preferable. In this paper, we use 94 reference-quality genomes of primarily Mycobacterium tuberculosis (Mtb) isolates as a benchmark to evaluate these methods. The clonal nature of Mtb evolution, the manageable genome sizes, and the substantial levels of structural variation make this an ideal benchmarking dataset. We test several methods for detecting homology and obtaining syntenic blocks, and two methods for inferring phylogenies, comparing them to the standard method that uses substitutions for inferring the tree. We find that not only the choice of methods but also their parameters can impact results, especially among branches with lower support. In particular, a method based on an encoding of adjacencies applied to Cactus-defined blocks was fully compatible with the highly supported branches of the substitution-based tree. Thus, we were able to combine the two trees to obtain a supertree with high resolution utilizing both SNPs and rearrangements. Furthermore, we observed that the results were much less affected by the choice of the tree inference method than by the method used to determine the underlying syntenic blocks. Overall, our results indicate that accurate trees can be inferred using genome rearrangements, but the choice of the methods for inferring the homology matters and requires care.
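To make the idea of "an encoding of adjacencies" more concrete, the sketch below shows one common, generic way to turn gene orders into binary characters. It is not the pipeline used in the paper. It assumes each genome is a single circular chromosome given as a list of signed syntenic block identifiers, with every block occurring once per genome; the genome names and block orders are hypothetical toy data.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Extremity = Tuple[int, str]        # (block id, "h" for head or "t" for tail)
Adjacency = FrozenSet[Extremity]   # unordered pair of neighbouring extremities


def extremities(block: int) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed block as read in order."""
    name = abs(block)
    if block > 0:
        return (name, "t"), (name, "h")
    return (name, "h"), (name, "t")


def adjacencies(genome: List[int]) -> Set[Adjacency]:
    """Collect adjacencies between consecutive blocks of a circular chromosome."""
    result: Set[Adjacency] = set()
    for i, block in enumerate(genome):
        _, right = extremities(block)
        left, _ = extremities(genome[(i + 1) % len(genome)])
        result.add(frozenset((right, left)))
    return result


def encode(genomes: Dict[str, List[int]]) -> Tuple[List[Adjacency], Dict[str, List[int]]]:
    """Build a presence/absence matrix: one column per adjacency seen anywhere."""
    per_genome = {name: adjacencies(order) for name, order in genomes.items()}
    columns = sorted(set().union(*per_genome.values()), key=lambda a: sorted(a))
    matrix = {name: [int(col in adj) for col in columns]
              for name, adj in per_genome.items()}
    return columns, matrix


# Hypothetical toy data: isolate_B carries an inversion of blocks 2..3.
toy_genomes = {
    "isolate_A": [1, 2, 3, 4],
    "isolate_B": [1, -3, -2, 4],
    "isolate_C": [1, 2, 3, 4],
}
columns, matrix = encode(toy_genomes)
hamming_ab = sum(a != b for a, b in zip(matrix["isolate_A"], matrix["isolate_B"]))
for name, row in matrix.items():
    print(name, row)
```

In this toy example the single inversion breaks two adjacencies and creates two new ones, so the rows for isolate_A and isolate_B differ in four columns. A binary matrix of this kind can then be handed to a standard character-based tree inference method, which is one reason the choice of syntenic blocks upstream has such a direct effect on the result.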
Each decade, billions are invested in tuberculosis (TB) research to further characterize M. tuberculosis pathogenesis. Despite this investment, nearly half of the 4,031 M. tuberculosis protein-coding genes lack descriptive annotation in community databases, due largely to incomplete reconciliation with the literature and a lack of structure-based methods for functional inference. We coin the term "hypotheticome" as the set of genes in an organism without known function. For M. tuberculosis' hypotheticome, we compiled the set of genes lacking functional assignment in the most frequently used Mycobacteria annotation database through systematic, exhaustive manual literature curation and 3D protein structure-based inference, and reconciled these annotations with frequented functional databases, creating a comprehensive M. tuberculosis functional knowledge-base. In doing so, we also introduce standard usage of qualifying adjectives based on quantitative measures of certainty with the hope that this approach is adopted in choosing qualifiers for future functional assignments. Through these methods we functionally annotated 41.3% of the M. tuberculosis hypotheticome, and provide insight into its pathogenesis, antibiotic resistance, and virulence. Processes implicated in the unique lifestyle of M. tuberculosis of long-term persistence and obligate pathogenesis in genotoxic host microenvironments (lipid metabolism, polyketide biosynthesis, and membrane transport and efflux) were overrepresented in our annotation. Our structural similarity approach uncovered proteins that appear critical in host-interaction through apparent host mimicry, particularly involving the phagosome and vesicle-mediated transport, as well as putative structural analogs for highly mutable protein classes, including dozens of PE/PPE family proteins which are major players at the host-pathogen interface, and sixteen potential efflux pumps which are integral to M. tuberculosis drug tolerance.
Hypotheses drawn from these proteins' functions may help characterize the onset of latency and identify therapeutic targets. A unified annotation is essential for clear communication about M. tuberculosis. These improvements provide the most comprehensive M. tuberculosis genome annotation to date, and the approach presented can be applied to systematically annotate the genomes of other organisms. We provide our novel annotations in General Feature Format with Enzyme Commission and Gene Ontology terms for integration into existing annotation frameworks.
De novo assembly has become commonplace for microbial organisms, increasing the demand for reliable genome annotation. Ab initio annotation is not an ideal approach for closely related strains due to suboptimal matching of the short or hypervariable genomic features that reference-based annotation transfer can overcome through identification of conserved synteny. At the same time, reference-based annotation methods leave gaps in the annotation where structural variations introduce unique sequence. We present Hybran, a hybrid reference-based and ab initio prokaryotic genome annotation pipeline that transfers features from a curated reference annotation and supplements unannotated regions with ab initio predictions. It builds on existing tools to create initial annotations using both approaches, then compares and resolves them to produce the hybrid annotation. With this pipeline, full advantage is taken of the community's experimental efforts on reference strains to propagate as many known features as possible without sacrificing best-effort ab initio predictions for the remaining unannotated loci. Genome annotation performed in this way can facilitate comparative genomics and the investigation of evolutionary dynamics in microbial populations. Hybran is freely available at https://lpcdrp.gitlab.io/hybran
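The hybrid strategy described above, transferring reference features first and then filling the remaining gaps with ab initio predictions, can be illustrated with a deliberately simplified sketch. This is not Hybran's implementation: the abstract says the pipeline "compares and resolves" the two annotation sets, and the rule used here (drop any ab initio prediction that overlaps a transferred feature) is only an assumed stand-in for that step. The `Feature` class, function names, and coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"
    name: str
    source: str   # "reference" (transferred) or "ab_initio" (predicted)


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two features share at least one base."""
    return a.start <= b.end and b.start <= a.end


def merge_annotations(transferred: List[Feature],
                      ab_initio: List[Feature]) -> List[Feature]:
    """Keep every transferred feature; add predictions only in unannotated loci."""
    merged = list(transferred)
    for prediction in ab_initio:
        # Simplifying assumption: any overlap with a transferred feature
        # means the locus is already annotated, so the prediction is dropped.
        if not any(overlaps(prediction, ref) for ref in transferred):
            merged.append(prediction)
    return sorted(merged, key=lambda f: (f.start, f.end))


# Hypothetical toy input: one prediction duplicates geneA, one fills a gap.
transferred = [
    Feature(100, 500, "+", "geneA", "reference"),
    Feature(900, 1500, "-", "geneB", "reference"),
]
predicted = [
    Feature(120, 480, "+", "pred1", "ab_initio"),
    Feature(600, 800, "+", "pred2", "ab_initio"),
]
hybrid = merge_annotations(transferred, predicted)
for feature in hybrid:
    print(feature.start, feature.end, feature.strand, feature.name, feature.source)
```

Giving transferred features priority reflects the stated goal of propagating as many curated reference features as possible, while the gap-filling step keeps best-effort predictions for sequence that the reference does not cover, such as regions introduced by structural variation.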