Background: For the last 25 years, species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to use the now available and easy-to-generate genome sequences directly for the delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, providing a digital, highly reliable estimator of the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in implementing such an application is to produce digital DDH values that mimic wet-lab DDH values as closely as possible, to ensure consistency with the prokaryotic species concept.
Results: Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features, such as confidence intervals for intergenomic distances (obtained either via resampling or via the statistical models for DDH prediction) and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions.
Conclusions: Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to evaluate the outcomes statistically. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms.
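The abstract above mentions intergenomic distances and resampling-based confidence intervals for Genome Blast Distance Phylogeny (GBDP) without giving formulas. The following is a minimal sketch, assuming the three GBDP distance formulas as they are usually described (d0: HSP length relative to genome length; d4: identities relative to HSP length; d6: identities relative to genome length) and a plain percentile bootstrap that resamples HSPs with replacement. The `HSP` record, the function names and the resampling scheme are hypothetical illustrations, not the GGDC code, and the statistical models that turn distances into DDH estimates are not reproduced.

```python
"""Hypothetical sketch: GBDP-style distances plus a percentile bootstrap interval.

Assumptions: HSPs are already computed and non-overlapping; each HSP is
summarized by its aligned length and its number of identical base pairs.
"""
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    length: int      # aligned length of the high-scoring segment pair (bp)
    identities: int  # identical base pairs within the HSP


def gbdp_distances(hsps_xy, hsps_yx, len_x, len_y):
    """Return distance formulas d0, d4 and d6 for genomes X and Y.

    hsps_xy: HSPs found with X as query; hsps_yx: HSPs found with Y as query.
    """
    h = sum(s.length for s in hsps_xy) + sum(s.length for s in hsps_yx)
    i = sum(s.identities for s in hsps_xy) + sum(s.identities for s in hsps_yx)
    total = len_x + len_y
    return {
        "d0": 1 - h / total,            # HSP coverage vs. genome length
        "d4": 1 - i / h if h else 1.0,  # identities vs. HSP length
        "d6": 1 - i / total,            # identities vs. genome length
    }


def bootstrap_ci(hsps_xy, hsps_yx, len_x, len_y, formula="d4",
                 replicates=1000, alpha=0.05, seed=1):
    """Percentile interval from resampling HSPs with replacement (an assumption)."""
    rng = random.Random(seed)
    values = []
    for _ in range(replicates):
        bxy = rng.choices(hsps_xy, k=len(hsps_xy))
        byx = rng.choices(hsps_yx, k=len(hsps_yx))
        values.append(gbdp_distances(bxy, byx, len_x, len_y)[formula])
    values.sort()
    lower = values[int(alpha / 2 * replicates)]
    upper = values[int((1 - alpha / 2) * replicates) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy HSP lists, invented for illustration only.
    xy = [HSP(1000, 980), HSP(500, 450), HSP(2000, 1900)]
    yx = [HSP(900, 880), HSP(600, 540), HSP(1800, 1720)]
    print(gbdp_distances(xy, yx, len_x=5000, len_y=5200))
    print(bootstrap_ci(xy, yx, len_x=5000, len_y=5200, formula="d4"))
```

Summing HSP statistics from both query directions keeps each distance symmetric in X and Y; the spread of the bootstrap interval depends on which formula is resampled, which is in line with the abstract's observation that resampling-based intervals differ between distance functions.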
Helicobacter pylori strain 26695 has a circular genome of 1,667,867 base pairs and 1,590 predicted coding sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins were identified, underscoring the potential complexity of host-pathogen interaction. Based on the large number of sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive evolution. Consistent with its restricted niche, H. pylori has few regulatory networks and a limited metabolic repertoire and biosynthetic capacity. Its survival in acidic conditions depends, in part, on its ability to establish a positive inside-membrane potential at low pH.
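To make "homopolymeric tracts and dinucleotide repeats in coding sequences" concrete, here is a minimal regex-based scan of a single sequence. Slipped-strand mispairing can add or remove units in such repeats, which is why they are candidate sites for phase variation. The thresholds (runs of at least 6 bp, at least 4 dinucleotide units) and the function names are hypothetical choices for illustration; the original genome annotation may have used different criteria.

```python
"""Hypothetical sketch: scan a coding sequence for simple sequence repeats."""
import re


def homopolymeric_tracts(seq, min_len=6):
    """Return (start, base, length) for single-nucleotide runs of >= min_len bp."""
    hits = []
    for m in re.finditer(r"([ACGT])\1*", seq.upper()):  # each maximal run
        if len(m.group(0)) >= min_len:
            hits.append((m.start(), m.group(1), len(m.group(0))))
    return hits


def dinucleotide_repeats(seq, min_units=4):
    """Return (start, unit, copies) for tandem 2-bp units such as CTCTCTCT.

    The negative lookahead rejects units made of a single base (AA, CC, ...),
    which homopolymeric_tracts already reports.
    """
    pattern = r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_units - 1)
    return [(m.start(), m.group(1), len(m.group(0)) // 2)
            for m in re.finditer(pattern, seq.upper())]


if __name__ == "__main__":
    # Invented toy sequence: an A8 tract followed by five CT units.
    cds = "ATG" + "A" * 8 + "G" + "CT" * 5 + "TAG"
    print(homopolymeric_tracts(cds))   # [(3, 'A', 8)]
    print(dinucleotide_repeats(cds))   # [(12, 'CT', 5)]
```

Positions are 0-based offsets into the input string.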
Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms [1]. There are now nearly 1,000 completed bacterial and archaeal genomes available [2], most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution [3–5]. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, we have sequenced and analysed the genomes of 56 culturable species of Bacteria and Archaea selected to maximize phylogenetic coverage. Analysis of these genomes demonstrated pronounced benefits (compared to an equivalent set of genomes randomly selected from the existing database) in diverse areas including the reconstruction of phylogenetic history, the discovery of new protein families and biological properties, and the prediction of functions for known genes from other organisms. Our results strongly support the need for systematic ‘phylogenomic’ efforts to compile a phylogeny-driven ‘Genomic Encyclopedia of Bacteria and Archaea’ in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come.
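One common way to formalize "selected to maximize phylogenetic coverage" is Faith's phylogenetic diversity (PD), the total branch length spanned by a set of taxa on a rooted tree. The sketch below greedily adds the taxon that contributes the most branch length not yet covered. The toy tree, its branch lengths, the function names and the greedy rule are assumptions for illustration only; this is not the procedure used to choose the 56 genomes.

```python
"""Hypothetical sketch: greedy selection of taxa by phylogenetic diversity (PD)."""


def path_to_root(tree, leaf):
    """Child nodes naming each branch between a leaf and the root.

    tree maps child -> (parent, branch_length); the root has no entry.
    """
    branches = []
    node = leaf
    while node in tree:
        branches.append(node)
        node = tree[node][0]
    return branches


def phylogenetic_diversity(tree, taxa):
    """Total length of all branches on the root paths of the chosen taxa."""
    covered = {b for t in taxa for b in path_to_root(tree, t)}
    return sum(tree[b][1] for b in covered)


def greedy_select(tree, leaves, k):
    """Repeatedly add the leaf that contributes the most uncovered branch length."""
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, -1.0
        for leaf in leaves:
            if leaf in chosen:
                continue
            gain = sum(tree[b][1] for b in path_to_root(tree, leaf) if b not in covered)
            if gain > best_gain:
                best, best_gain = leaf, gain
        if best is None:  # every leaf has already been chosen
            break
        chosen.append(best)
        covered.update(path_to_root(tree, best))
    return chosen


if __name__ == "__main__":
    # Toy rooted tree with invented branch lengths.
    tree = {"A": ("root", 1.0), "B": ("root", 0.5),
            "a1": ("A", 0.1), "a2": ("A", 0.2),
            "b1": ("B", 2.0), "b2": ("B", 0.1)}
    picks = greedy_select(tree, ["a1", "a2", "b1", "b2"], k=2)
    print(picks, phylogenetic_diversity(tree, picks))
```

With two picks, the greedy rule takes one taxon from each deep lineage rather than two close relatives, which is the intuition behind phylogeny-driven rather than physiology-driven sampling.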
Although Escherichia coli is the most widely studied bacterial model organism and is often considered to be the model bacterium per se, its type strain had until now been overlooked by microbial genomics. As part of the Genomic Encyclopedia of Bacteria and Archaea project, we here describe the features of E. coli DSM 30083T, together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. The affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, the results show Shigella spp. embedded within E. coli, in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla, with minor exceptions, and place DSM 30083T in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 differs strongly from the type strain, not only genomically but also physiologically. However, the phylotypes do not show a uniform level of character divergence as measured by dDDH; an alternative arrangement is therefore proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from more than 100 other bacterial genera indicate a dDDH value of 79–80% as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.
Electronic supplementary material: The online version of this article (doi:10.1186/1944-3277-9-2) contains supplementary material, which is available to authorized users.
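The subspecies threshold can be applied mechanically to pairwise dDDH values: pairs at or above 79% (the lower end of the proposed 79–80% range) fall into the same subspecies, and the conventional 70% DDH boundary separates species. The sketch below groups strains by single linkage at a chosen threshold. Single linkage, the strain labels and the dDDH values are hypothetical; the clustering procedure behind the five proposed subspecies may differ.

```python
"""Hypothetical sketch: dDDH thresholds and single-linkage grouping of strains."""

SPECIES_DDH = 70.0     # conventional DDH species boundary (%)
SUBSPECIES_DDH = 79.0  # lower end of the 79-80% subspecies range in the text


def rank_for_pair(ddh):
    """Classify a pairwise dDDH value (%) against the two thresholds."""
    if ddh >= SUBSPECIES_DDH:
        return "same subspecies"
    if ddh >= SPECIES_DDH:
        return "same species"
    return "different species"


def cluster(strains, ddh, threshold=SUBSPECIES_DDH):
    """Single-linkage groups: two strains are linked if their dDDH >= threshold.

    ddh maps (strain_a, strain_b) pairs to dDDH values in percent.
    """
    parent = {s: s for s in strains}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), value in ddh.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Invented strain labels and dDDH values, for illustration only.
    strains = ["A", "B", "C", "D"]
    ddh = {("A", "B"): 85.0, ("A", "C"): 72.0, ("C", "D"): 81.0, ("B", "D"): 70.5}
    print(cluster(strains, ddh))                         # [['A', 'B'], ['C', 'D']]
    print(cluster(strains, ddh, threshold=SPECIES_DDH))  # [['A', 'B', 'C', 'D']]
```

In this toy example, all four strains form one species at 70%, while the 79% cut splits them into two subspecies-level groups.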
The plant growth-promoting model bacterium FZB42T was proposed as the type strain of Bacillus amyloliquefaciens subsp. plantarum (Borriss et al., 2011), but has recently been recognized as synonymous with Bacillus velezensis on the basis of phylogenomic analysis (Dunlap C. et al., 2016). Nevertheless, the majority of publications still refer to plant-associated close relatives of FZB42 as “B. amyloliquefaciens.” Here, we reinvestigated the taxonomic status of FZB42 and related strains in relation to the free-living soil bacterium DSM7T, the type strain of B. amyloliquefaciens. We identified 66 bacterial genomes in the NCBI database with high similarity to DSM7T. In dendrograms based on complete rpoB nucleotide sequences and on core genome sequences, respectively, these genomes clustered into a clade consisting of three tightly linked branches: (1) B. amyloliquefaciens, (2) Bacillus siamensis, and (3) a conspecific group containing the type strains of B. velezensis, Bacillus methylotrophicus, and B. amyloliquefaciens subsp. plantarum. The three monophyletic clades shared a common mutation rate of 0.01 substitutions per nucleotide position, but were distantly related to Bacillus subtilis (0.1 substitutions per nucleotide position). The tight relatedness of the three clusters was corroborated by TETRA, dDDH, ANI, and AAI analyses of the core genomes, but dDDH and ANI values were slightly below the species-level thresholds when the B. amyloliquefaciens DSM7T genome sequence was used as the query sequence. Based on these results, we propose that the B. amyloliquefaciens clade be considered a taxonomic unit above the species level, designated here as “operational group B. amyloliquefaciens,” consisting of the soil-borne B. amyloliquefaciens and the plant-associated B. siamensis and B. velezensis, whose members are closely related and allow the identification of genomic changes associated with the development of a plant-associated lifestyle.
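TETRA compares genomes through their tetranucleotide usage. The following is a minimal sketch of the commonly described procedure: count tetranucleotides on both strands, convert each count into a z-score against the expectation derived from the overlapping tri- and dinucleotide counts, and take the Pearson correlation of the two genomes' 256 z-scores. Correlations above roughly 0.99 are commonly read as supporting species-level relatedness. The function names are hypothetical and this is not a reference implementation.

```python
"""Hypothetical sketch of a TETRA-style comparison (tetranucleotide z-scores)."""
import math
from collections import Counter
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256


def _kmer_counts(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tetra_zscores(seq):
    """z-scores of observed vs. expected tetranucleotide counts on both strands."""
    seq = seq.upper()
    rev_comp = seq[::-1].translate(str.maketrans("ACGT", "TGCA"))
    n4, n3, n2 = Counter(), Counter(), Counter()
    for strand in (seq, rev_comp):  # count strands separately, no junction k-mers
        n4 += _kmer_counts(strand, 4)
        n3 += _kmer_counts(strand, 3)
        n2 += _kmer_counts(strand, 2)
    scores = []
    for t in TETRAMERS:
        mid = n2[t[1:3]]                       # central dinucleotide count
        left, right = n3[t[:3]], n3[t[1:]]     # overlapping trinucleotide counts
        if mid == 0:
            scores.append(0.0)
            continue
        expected = left * right / mid
        variance = expected * (mid - left) * (mid - right) / mid ** 2
        scores.append((n4[t] - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return scores


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def tetra(seq_a, seq_b):
    """Correlation of the two genomes' 256 tetranucleotide z-scores."""
    return pearson(tetra_zscores(seq_a), tetra_zscores(seq_b))
```

Unlike this symmetric correlation, pairwise measures such as ANI can depend on which genome serves as the query, which is why the abstract reports the query genome for its dDDH and ANI comparisons.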