Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. These were used to train several popular machine learning methods to classify the origin of contigs from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. We selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium=0.92, F1-score K. pneumoniae=0.90, F1-score E. coli=0.76), which outperformed other existing plasmid prediction tools using a benchmarking set of isolates. We demonstrated the scalability of our models by accurately predicting the plasmidome of a large collection of 1644 E. faecium isolates and illustrated their applicability by predicting the location of antibiotic-resistance genes in all three species. The SVM classifiers are publicly available as an R package and graphical user interface called ‘mlplasmids’. We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.
Assembly of bacterial short-read whole genome sequencing (WGS) data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Long-read sequencing has emerged as a solution to resolve plasmid structures and to obtain complete genomes for most bacterial species. This information can be used to generate and label datasets from short-read based contigs as plasmid- or chromosome-derived. We investigated the use of several popular machine learning methods to classify short-read contigs with known plasmid or chromosome origin from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. Based on the resulting F1-scores we selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium = 0.94, F1-score K. pneumoniae = 0.90, F1-score E. coli = 0.76), which outperformed other existing plasmid tools using an independent set of isolates (precision E. faecium = 0.92, precision K. pneumoniae = 0.86, precision E. coli = 0.82). We demonstrated the scalability of our model by accurately predicting the plasmidome of a large collection of 1,644 E. faecium isolates with only short-read WGS available using a standard laptop with a single core. A low number of false positive predicted sequences suggests that the assignment of a particular gene of interest as plasmid- or chromosome-encoded by the models is plausible. The SVM classifiers are publicly available as a new R package called 'mlplasmids' at https://gitlab.com/sirarredondo/mlplasmids under the GNU General Public License v3.0. We additionally developed a graphical user interface using the Shiny package which can be accessed at https://sarredondo.shinyapps.io/mlplasmids/. Single genomes can easily be predicted by uploading genome assemblies.
We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.
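The abstract above names pentamer frequencies as the features given to the classifiers. As a minimal sketch of what such a feature vector could look like, the Python below counts every overlapping 5-mer in a contig and normalises the counts to relative frequencies. The function name `pentamer_frequencies`, the decision to skip windows containing ambiguous bases, and the choice not to merge reverse complements are assumptions made for illustration; this is not the mlplasmids implementation.

```python
from collections import Counter
from itertools import product

K = 5  # pentamers
ALPHABET = "ACGT"
# All 4^5 = 1024 possible pentamers in a fixed (lexicographic) order,
# so that every contig maps to a vector with the same layout.
ALL_KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def pentamer_frequencies(sequence: str) -> list[float]:
    """Return the relative frequency of each pentamer in `sequence`.

    Assumption: windows containing characters other than A, C, G, T
    (for example N) are ignored, and strands are not merged.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    # Only count windows made of unambiguous bases.
    total = sum(counts[kmer] for kmer in ALL_KMERS)
    if total == 0:
        return [0.0] * len(ALL_KMERS)
    return [counts[kmer] / total for kmer in ALL_KMERS]


if __name__ == "__main__":
    # Hypothetical toy contig, far shorter than a real assembly contig.
    features = pentamer_frequencies("ACGTACGTAC")
    print(len(features))
    print(features[ALL_KMERS.index("ACGTA")])
```

A vector of this kind, one per labelled contig, is the sort of input an SVM or other classifier could be trained on.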
Background: Knowledge on the molecular epidemiology of Escherichia coli causing E. coli bacteremia (ECB) in the Netherlands is mostly based on extended-spectrum beta-lactamase-producing E. coli (ESBL-Ec). We determined differences in clonality and resistance and virulence gene (VG) content between non-ESBL-producing E. coli (non-ESBL-Ec) and ESBL-Ec isolates from ECB episodes with different epidemiological characteristics. Methods: A random selection of non-ESBL-Ec isolates as well as all available ESBL-Ec blood isolates was obtained from two Dutch hospitals between 2014 and 2016. Whole genome sequencing was performed to infer sequence types (STs), serotypes, acquired antibiotic resistance genes and VG scores, based on presence of 49 predefined putative pathogenic VG. Results: ST73 was most prevalent among the 212 non-ESBL-Ec (N = 26, 12.3%) and ST131 among the 69 ESBL-Ec (N = 30, 43.5%). Prevalence of ST131 among non-ESBL-Ec was 10.4% (N = 22, P value < 0.001 compared to ESBL-Ec). O25:H4 was the most common serotype in both non-ESBL-Ec and ESBL-Ec. Median acquired resistance gene counts were 1 (IQR 1-6) and 7 (IQR 4-9) for non-ESBL-Ec and ESBL-Ec, respectively (P value < 0.001). Among non-ESBL-Ec, acquired resistance gene count was highest among blood isolates from a
S. Harbarth). Other members of the MODERN WP2 study group are listed in the study group section. Clinical Microbiology and Infection, journal homepage: www.clinicalmicrobiologyandinfection.com
Introduction
The human gut microbiota is an important reservoir of ESBL-producing Escherichia coli (ESBL-Ec). Community surveillance studies of ESBL-Ec to monitor circulating clones and ESBL genes are logistically challenging and costly.
Objectives
To evaluate if isolates obtained in routine clinical practice can be used as an alternative to monitor the distribution of clones and ESBL genes circulating in the community.
Methods
WGS was performed on 451 Dutch ESBL-Ec isolates (2014–17), including 162 community faeces and 289 urine and blood isolates. We compared the proportions of the 10 most frequently identified STs, PopPUNK-based sequence clusters (SCs) and ESBL gene subtypes, and assessed the degree of similarity using Czekanowski’s proportional similarity index (PSI).
Results
Nine of the 10 most prevalent STs and SCs and 8 of the 10 most prevalent ESBL genes in clinical ESBL-Ec were also the most common types in community faeces. The proportions of ST131 (39% versus 23%) and SC131 (40% versus 25%) were higher in clinical isolates than in community faeces (P < 0.01). Within ST131, the H30Rx (C2) subclade was more prevalent among clinical isolates (55% versus 26%, P < 0.01). The proportion of the ESBL gene blaCTX-M-1 was lower in clinical isolates (5% versus 18%, P < 0.01). Czekanowski’s PSI confirmed that the differences between ESBL-Ec from community faeces and clinical isolates were limited.
Conclusions
Distributions of the 10 most prevalent clones and ESBL genes from ESBL-Ec community gut colonization and extra-intestinal infection largely overlapped, indicating that isolates from routine clinical practice could be used to monitor ESBL-Ec clones and ESBL genes in the community.
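Czekanowski’s proportional similarity index, used in the Methods above, compares two frequency distributions over the same categories: PSI = 1 − ½ Σ|p_i − q_i|, where p_i and q_i are the proportions of type i in each collection. It is 1 for identical distributions and 0 when the collections share no types. The sketch below is a minimal Python illustration. The function name and the example counts are hypothetical and are not taken from the study, and the confidence-interval estimation that usually accompanies a PSI is omitted.

```python
def proportional_similarity(counts_a: dict[str, int],
                            counts_b: dict[str, int]) -> float:
    """Czekanowski's proportional similarity index between two collections.

    Each argument maps a type label (e.g. an ST or ESBL gene) to the
    number of isolates of that type in one collection.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both collections must contain at least one isolate")
    # Types absent from one collection contribute a proportion of zero there.
    categories = set(counts_a) | set(counts_b)
    abs_diff = sum(
        abs(counts_a.get(c, 0) / total_a - counts_b.get(c, 0) / total_b)
        for c in categories
    )
    return 1.0 - 0.5 * abs_diff


if __name__ == "__main__":
    # Hypothetical type counts, for illustration only.
    clinical = {"type_A": 40, "type_B": 35, "type_C": 25}
    community = {"type_A": 25, "type_B": 35, "type_D": 40}
    print(round(proportional_similarity(clinical, community), 2))
```

Because the index works on proportions, collections of different sizes, such as 162 faecal and 289 clinical isolates, can be compared directly.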
on behalf of the MODERN WP3 study group, Populations of extended-spectrum β-lactamase-producing Escherichia coli and Klebsiella pneumoniae are different in human-polluted environment and food items: A multicentre European study,
Background: Knowledge on the molecular epidemiology of Escherichia coli causing E. coli bacteremia (ECB) in the Netherlands is mostly based on extended-spectrum beta-lactamase-producing E. coli (ESBL-Ec). We determined differences in clonality and resistance and virulence gene (VG) content between non-ESBL-producing E. coli (non-ESBL-Ec) and ESBL-Ec blood isolates with different epidemiological characteristics.
Materials/methods: A random selection of non-ESBL-Ec isolates as well as all available ESBL-Ec blood isolates was obtained from two Dutch hospitals between 2014 and 2016. Whole genome sequencing was performed to infer sequence types (STs), serotypes, acquired antibiotic resistance genes and VG scores, based on presence of 49 predefined putative pathogenic VG.
Results: ST73 was most prevalent among the 212 non-ESBL-Ec (N=26, 12.3%) and ST131 among the 69 ESBL-Ec (N=30, 43.5%). Prevalence of ST131 among non-ESBL-Ec was 10.4% (N=22, P value < 0.001 compared to ESBL-Ec). O25:H4 was the most common serotype in both non-ESBL-Ec and ESBL-Ec. Median acquired resistance gene counts were 1 (IQR 1-6) and 7 (IQR 4-9) for non-ESBL-Ec and ESBL-Ec, respectively (P value < 0.001). Among non-ESBL-Ec, acquired resistance gene count was highest among blood isolates from a primary gastro-intestinal focus (median 4, IQR 1-8). Median VG scores were 13 (IQR 9-20) and 12 (IQR 8-14) for non-ESBL-Ec and ESBL-Ec, respectively (P value = 0.002). VG scores among non-ESBL-Ec from a primary urinary focus (median 15) were higher compared to non-ESBL-Ec from a primary gastro-intestinal (median 10, IQR 6-13) or hepatic-biliary focus (median 11, IQR 5-18) (P values = 0.007 and 0.036, respectively). VG content varied between different E. coli STs.
Conclusions: Non-ESBL-Ec and ESBL-Ec blood isolates from two Dutch hospitals differed in clonal distribution, resistance gene and VG content. Also, resistance gene and VG content differed between non-ESBL-Ec from different primary foci of ECB.