In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome (medoid) identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). The authenticity of the 14 phylogroups is supported by several lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2,613 single-copy core genes, and differences in the rates of gene gain, loss and duplication. The methodology used in this work reproduces known phylogroups and identifies previously uncharacterized phylogroups within the E. coli species.
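The medoid-as-proxy idea in the abstract above can be illustrated with a minimal sketch. It assumes pairwise genome distances (for example, Mash distances) have already been computed; the function names, genome identifiers and distance values below are hypothetical and are not taken from the study, whose actual pipeline is not described here.

```python
def pair_distance(dist, a, b):
    """Look up a symmetric pairwise distance; a genome is at distance 0 from itself."""
    return 0.0 if a == b else dist[frozenset((a, b))]


def find_medoid(members, dist):
    """Return the member with the smallest summed distance to all members of its group."""
    return min(members, key=lambda g: sum(pair_distance(dist, g, o) for o in members))


def classify(query_to_medoid, medoids):
    """Assign a query genome to the phylogroup whose medoid is nearest.

    query_to_medoid maps a medoid genome id to its distance from the query;
    medoids maps a phylogroup label to its medoid genome id.
    """
    return min(medoids, key=lambda group: query_to_medoid[medoids[group]])


# Hypothetical toy distances for three genomes in one phylogroup.
TOY_DIST = {
    frozenset(("g1", "g2")): 0.010,
    frozenset(("g1", "g3")): 0.020,
    frozenset(("g2", "g3")): 0.015,
}

if __name__ == "__main__":
    medoid_a = find_medoid(["g1", "g2", "g3"], TOY_DIST)
    print("medoid of toy group:", medoid_a)
    # A query genome compared only against the medoids, not against every genome.
    print(classify({"g2": 0.030, "h1": 0.005}, {"A": "g2", "B1": "h1"}))
```

Comparing each unassembled read set against 14 medoids rather than thousands of genomes is what makes classification at this scale tractable, at the cost of relying on each medoid being a good stand-in for its group.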
The explosion of microbial genome sequences in public databases allows for large-scale population studies of model organisms, such as Escherichia coli. We have examined more than one hundred thousand E. coli and Shigella genomes. After removing outliers, genomes were classified into two broad clusters based on a semi-automated Mash analysis, which distinguished 14 distinct phylotypes, illustrated graphically with Cytoscape. From a set of more than ten thousand good-quality E. coli and Shigella genomes from GenBank, we find roughly 2,700 gene families in the E. coli species core, and more than 135,000 gene families in the E. coli pan-genome. Based on a set of 2,613 single-copy core proteins taken from one representative genome per phylotype, we constructed a robust phylogenetic tree. This is the largest E. coli genome dataset analyzed to date, and provides valuable insight into the population structure of the species.
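The distinction between the species core and the pan-genome mentioned above can be made concrete with a small sketch. It assumes each genome has already been reduced to a set of gene-family identifiers and uses the strict definition of core (present in every genome); the criterion actually used in the study is not stated here, and the genome names and family labels are hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a mapping of genome id -> gene families.

    pan  = families found in at least one genome (set union)
    core = families found in every genome (set intersection, strict definition)
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    family_sets = [set(families) for families in genomes.values()]
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Hypothetical toy input: three genomes, each a set of gene-family labels.
TOY_GENOMES = {
    "genome_1": {"famA", "famB", "famC"},
    "genome_2": {"famA", "famB", "famD"},
    "genome_3": {"famA", "famB", "famE", "famF"},
}

if __name__ == "__main__":
    pan, core = pan_and_core(TOY_GENOMES)
    print("pan-genome size:", len(pan))
    print("core size:", len(core))
```

As more genomes are added, the union can only grow and the intersection can only shrink, which is consistent with a pan-genome of more than 135,000 families alongside a core of roughly 2,700.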
Insertion sequences (ISs) and other transposable elements are associated with the mobilization of antibiotic resistance determinants and the modulation of pathogenic characteristics. In this work, we aimed to investigate the association between ISs and antibiotic resistance genes, and their role in the dissemination and modification of the antibiotic-resistant phenotype. To that end, we leveraged fully resolved Enterococcus faecium and Enterococcus faecalis genomes of isolates collected over 5 days from an inpatient with prolonged bacteraemia. Isolates from both species harboured similar IS family content but showed significant species-dependent differences in copy number and arrangements of ISs throughout their replicons. Here, we describe two inter-specific IS-mediated recombination events and IS-mediated excision events in plasmids of E. faecium isolates. We also characterize a novel arrangement of the ISs in a Tn1546-like transposon in E. faecalis isolates likely implicated in a vancomycin genotype–phenotype discrepancy. Furthermore, an extended analysis revealed a novel association between daptomycin resistance mutations in liaSR genes and a putative composite transposon in E. faecium, offering a new paradigm for the study of daptomycin resistance and novel insights into its dissemination. In conclusion, our study highlights the role ISs and other transposable elements play in the rapid adaptation and response to clinically relevant stresses such as aggressive antibiotic treatment in enterococci.