Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle to using this technology is the limited availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools, we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline automatically identifies the bacterial species and, if applicable, assembles the genome and identifies the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report is provided for each sample, and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reports give a rapid overview of the major results, and comparison with the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at https://cge.cbs.dtu.dk/services/CGEpipeline-1.1, and the intention is to continue expanding it with new features as these become available.
The aim of this study was to construct a valid publicly available method for in silico fimH subtyping of Escherichia coli particularly suitable for differentiation of fine-resolution subgroups within clonal groups defined by standard multilocus sequence typing (MLST). FimTyper was constructed as a FASTA database containing all currently known fimH alleles. The software source code is publicly available at https://bitbucket.org/genomicepidemiology/fimtyper, the database is freely available at https://bitbucket.org/genomicepidemiology/fimtyper_db, and a service implementing the software is available at https://cge.cbs.dtu.dk/services/FimTyper. FimTyper was validated on three data sets: one containing Sanger sequences of fimH alleles of 42 E. coli isolates generated prior to the current study (data set 1), one containing whole-genome sequence (WGS) data of 243 third-generation-cephalosporin-resistant E. coli isolates (data set 2), and one containing a randomly chosen subset of 40 E. coli isolates from data set 2 that were subjected to conventional fimH subtyping (data set 3). The combination of the three data sets enabled an evaluation and comparison of FimTyper on both Sanger sequences and WGS data. FimTyper correctly predicted all 42 fimH subtypes from the Sanger sequences from data set 1 and successfully analyzed all 243 draft genomes from data set 2. FimTyper subtyping of the Sanger sequences and WGS data from data set 3 were in complete agreement. Additionally, fimH subtyping was evaluated on a phylogenetic network of 122 sequence type 131 (ST131) E. coli isolates. There was perfect concordance between the typology and fimH-based subclones within ST131, with accurate identification of the pandemic multidrug-resistant clonal subgroup ST131-H30. FimTyper provides a standardized tool, as a rapid alternative to conventional fimH subtyping, highly suitable for surveillance and outbreak detection. 
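To illustrate the idea of subtyping against a FASTA database of known alleles, the following is a minimal sketch only. It assumes exact substring matching on either strand, which is a simplification chosen for illustration and is not taken from the FimTyper source code. The allele names and sequences (`fimH_toyA`, `fimH_toyB`) are invented toy data, and the helper names are hypothetical.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_name: uppercase_sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            # Use the first whitespace-delimited token as the record name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_alleles(contigs: Dict[str, str], alleles: Dict[str, str]) -> List[str]:
    """Return names of alleles found exactly in any contig, on either strand.

    Assumption: exact matching only; a real tool would use an alignment
    search that tolerates partial or imperfect hits.
    """
    hits = []
    for allele_name, allele_seq in alleles.items():
        rc = reverse_complement(allele_seq)
        if any(allele_seq in c or rc in c for c in contigs.values()):
            hits.append(allele_name)
    return sorted(hits)


# Toy database and draft genome (invented sequences, for illustration only).
allele_db = parse_fasta(">fimH_toyA\nATGAAACGT\n>fimH_toyB\nATGAAACGA\n")
draft = {"contig1": "GGG" + reverse_complement("ATGAAACGA") + "CCC"}
predicted = find_alleles(draft, allele_db)
```

In this toy case the second allele is present on the reverse strand of the contig, so a forward-only search would miss it.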
KEYWORDS: fimH, Escherichia coli, typing, whole-genome sequencing analysis
The fimH gene is part of the fim operon, which encodes a surface organelle named type 1 fimbriae found in most Escherichia coli strains (1). The FimH protein is located at the tip of the fimbrial structure and serves as a D-mannose-specific adhesin, which aids in immobilizing the bacterium on both biotic and abiotic surfaces (2, 3). Studies have shown only minor sequence variation within the fimH genes, which renders the fimH alleles feasible for use in high-resolution subtyping of multilocus sequence typing (MLST)-based E. coli clonal groups. The applicability of fimH subtyping has been shown to be particularly relevant within the highly virulent sequence type 131 (ST131) clonal group, where the resistant and multiresistant H30 subgroups carrying the fimH30 allele have been identified (4, 5). As ST131 E. coli is the most dominant human-pathogenic clonal group being reported in relation to bloodstream infections, the need to perform fimH subtyping is undisputed. Traditionally, typing of fimH alleles has been obtained
Our results provide support for the hypothesis that clonal transfer of cephalosporin-resistant E. coli from chicken meat to humans may occur, and may cause difficult-to-treat infections. Furthermore, these E. coli can be a source of AmpC-resistance plasmids for opportunistic pathogens in the human microbiota.
Background: Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Results: Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples, 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different methods available online that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leaves, even though some of them are in fact ancestral.
We therefore devised a method for post-processing the inferred trees by collapsing short branches (thus relocating some leaves to internal nodes), and also present two new measures of tree similarity that take into account the identity of both internal and leaf nodes. Conclusions: Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the tree and 71% of all clades. We have made all data from this experiment (raw sequencing reads, consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3407-6) contains supplementary material, which is available to authorized users.
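The post-processing step described in this abstract, collapsing short branches so that some leaves are relocated to internal nodes, can be sketched roughly as follows. This is one possible interpretation and not the authors' procedure: the abstract does not give the rule or the threshold, so the `Node` class, the `collapse_short_leaves` function, the `threshold` parameter, and the rule "a leaf on a short branch gives its name to an unnamed parent" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node; `length` is the branch length leading to this node."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def collapse_short_leaves(node: Node, threshold: float) -> Node:
    """Move leaves on short branches onto their (unnamed) parent node.

    Assumed rule: if a leaf's branch length is <= threshold and the parent
    has no name yet, the parent takes the leaf's name and the leaf is
    removed. Any further short leaves under the same parent are kept.
    """
    kept = []
    for child in node.children:
        # Process subtrees first so collapsing proceeds bottom-up.
        collapse_short_leaves(child, threshold)
        is_leaf = not child.children
        if is_leaf and child.length <= threshold and node.name is None:
            node.name = child.name
        else:
            kept.append(child)
    node.children = kept
    return node


# Toy inferred tree: samples A and B are ancestral but were placed as leaves
# on zero-length branches; C is a true leaf.
tree = Node(children=[
    Node(name="A", length=0.0),
    Node(length=5.0, children=[
        Node(name="B", length=0.0),
        Node(name="C", length=3.0),
    ]),
])
collapse_short_leaves(tree, threshold=0.5)
```

After collapsing, sample A labels the root and sample B labels the internal node above C, which is the shape a known ancestral phylogeny would have.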
Public health authorities whole-genome sequence thousands of isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. The computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. The data is divided into sets by mapping to reference genomes, then consensus sequences are generated. Nucleotide based genetic distance is calculated between the sequences in each set, and isolates are clustered together at 10 single-nucleotide polymorphisms. Phylogenetic trees are inferred from the non-redundant sequences and the clustered isolates are added back. The method is accurate at grouping outbreak strains together, while discriminating them from non-outbreak strains. The pipeline is applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating phylogenetic trees as needed.
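The distance and clustering steps described above can be sketched as follows, under stated assumptions. The abstract does not specify the clustering algorithm or how ambiguous bases are treated, so this sketch assumes equal-length consensus sequences, ignores positions where either base is not A, C, G or T, uses a greedy first-fit assignment to cluster representatives, and treats the 10-SNP threshold as inclusive. The function names and toy sequences are hypothetical.

```python
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two equal-length consensus sequences.

    Assumption: positions with an ambiguous base (anything other than
    A, C, G, T) in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def cluster_isolates(seqs: Dict[str, str], threshold: int = 10) -> Dict[str, List[str]]:
    """Greedily group isolates within `threshold` SNPs of a representative.

    Returns {representative_name: [member_names]}. Only the representatives
    (the non-redundant sequences) would go into tree inference; the other
    members can be added back to the tree afterwards.
    """
    clusters: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if snp_distance(seq, seqs[rep]) <= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is close enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Toy consensus sequences (invented): iso2 differs from iso1 at 3 sites,
# iso3 differs from iso1 at 15 sites.
toy = {
    "iso1": "A" * 40,
    "iso2": "C" * 3 + "A" * 37,
    "iso3": "G" * 15 + "A" * 25,
}
clusters = cluster_isolates(toy, threshold=10)
```

Greedy first-fit assignment depends on input order; it is used here only because it is short, and a different clustering scheme could give different groups near the threshold.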
Metastasis is the main cause of cancer death, yet the evolutionary processes behind it remain largely unknown. Here, through analysis of large panel-based genomic datasets from the AACR GENIE project, including 40,979 primary and metastatic tumors across 25 distinct cancer types, we explore how the evolutionary pressure of cancer metastasis shapes the selection of genomic drivers of cancer. The most commonly affected genes were TP53, MYC, and CDKN2A, with no specific pattern associated with metastatic disease. This suggests that, on a driver mutation level, the selective pressure operating in primary and metastatic tumors is similar. The most highly enriched individual driver mutations in metastatic tumors were mutations known to drive resistance to hormone therapies in breast and prostate cancer (ESR1 and AR), anti-EGFR therapy in non-small cell lung cancer (EGFR T790M), and imatinib in gastrointestinal cancer (KIT V654A). Specific mutational signatures were also associated with treatment in three cancer types, supporting clonal selection following anti-cancer therapy. Overall, this implies that initial acquisition of driver mutations is predominantly shaped by the tissue of origin, where specific mutations define the developing primary tumor and drive growth, immune escape, and tolerance to chromosomal instability. However, acquisition of driver mutations that contribute to metastatic disease is less specific, with the main genomic drivers of metastatic cancer evolution associating with resistance to therapy.
Knowledge about the difference in the global distribution of pathogens and non-pathogens is limited. Here, we investigate it using a multi-sample metagenomics phylogeny approach based on short-read metagenomic sequencing of sewage from 79 sites around the world. For each metagenomic sample, bacterial template genomes were identified in a non-redundant database of whole genome sequences. Reads were mapped to the templates identified in each sample. Phylogenetic trees were constructed for each template identified in multiple samples. The countries from which the samples were taken were grouped according to different definitions of world regions. For each tree, the tendency for regional clustering was determined. Phylogenetic trees representing 95 unique bacterial templates were created, covering 4 to 71 samples. Varying degrees of regional clustering could be observed. The clustering was most pronounced for environmental bacterial species and human commensals, and less so for colonizing opportunistic pathogens, opportunistic pathogens and pathogens. No pattern of significant difference in clustering between any of the organism classifications and country groupings according to income was observed. Our study suggests that while the same bacterial species might be found globally, there is a geographical regional selection or barrier to spread for individual clones of environmental and human commensal bacteria, whereas this is to a lesser degree the case for strains and clones of human pathogens and opportunistic pathogens.
One of the basic dogmas in microbiology has for almost a century been that, for microorganisms, "everything is everywhere, but the environment selects" 1,2. A large number of papers about the global transmission events of bacterial clones have been published, including descriptions of emergence and spread of specific clones of Vibrio cholerae, MRSA, Escherichia coli, Clostridium difficile 3-6 and many other bacterial pathogens.
The main focus has been on pathogenic clones, and virtually nothing is known about the global phylogeny of commensal species and clones. The gut microbiota has so far mainly been studied in relation to diet, use of medication and diseases 7,8, mostly within countries 9 and in some studies between countries 10,11. These studies have looked at the species or genera composition of the microbiota and the interaction between species, while virtually no details on within-species phylogeny have been investigated. The same has been the case for environmental bacteria; numerous projects have sequenced the metagenome of different niches 12, but without much focus on the importance of geographical location or within-species phylogeny. Almost all studies into the within-species phylogeny of bacterial species have been conducted using whole genome sequencing of single cultivated isolates. A recent metagenomic study where DNA was isolated both directly from faeces and from isolates cultured from the faeces, demonstrated that most pairs of isolates and metageno...