With the worldwide increase in sequenced bacterial genomes, command-line annotation tools have steadily gained popularity over centralized online services. However, the results of existing command-line pipelines depend heavily on taxon-specific databases or sufficiently well-annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough, and nonetheless fast annotation of bacterial genomes. Bakta runs a comprehensive annotation workflow, including the detection of small proteins, that takes replicon metadata into account. The annotation of coding sequences is accelerated by an alignment-free sequence identification approach that also enables the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as in comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation tools in both targeted and taxonomically broad benchmarks, including isolates and metagenome-assembled genomes. We show that Bakta outperforms the other tools in functional annotation and in the assignment of functional categories and database cross-references, while requiring comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
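The abstract does not spell out how the alignment-free identification works. A common way to do it, and the one assumed in this sketch, is to hash each translated protein sequence and look the digest up in a precomputed table of known proteins, so identical sequences are matched by a dictionary lookup instead of an alignment. The use of MD5, the sequence normalization, the toy database and the cross-reference identifiers are all hypothetical here, not Bakta's actual implementation.

```python
import hashlib


def protein_digest(aa_seq: str) -> str:
    """Return an MD5 hex digest of a normalized amino acid sequence."""
    # Assumed normalization: trim whitespace, upper-case, drop a trailing stop '*'.
    normalized = aa_seq.strip().upper().rstrip("*")
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_index(known_proteins):
    """Map digests of known proteins to (hypothetical) database cross-references."""
    return {protein_digest(seq): xref for seq, xref in known_proteins}


def identify(cds_proteins, index):
    """Look up each translated CDS by its digest; no alignment is performed.

    Returns a dict mapping CDS id -> cross-reference, or None when the exact
    sequence is unknown and a slower, alignment-based search would be needed.
    """
    return {cds_id: index.get(protein_digest(seq)) for cds_id, seq in cds_proteins.items()}


if __name__ == "__main__":
    # Toy sequences and identifiers, invented for illustration only.
    index = build_index([("MKTAYIAKQR", "HYPOTHETICAL_XREF_1"),
                         ("MSLNVKEIAL", "HYPOTHETICAL_XREF_2")])
    cds = {"cds_1": "MKTAYIAKQR*", "cds_2": "MSLNVKEIAM"}
    print(identify(cds, index))  # {'cds_1': 'HYPOTHETICAL_XREF_1', 'cds_2': None}
```

Exact-match hashing only recognizes sequences that are identical to a database entry; a single substitution (as in `cds_2`) yields a different digest, which is why such a lookup would typically be paired with a fallback similarity search.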
Introduction: The increasing incidence of infections caused by extended-spectrum beta-lactamase (ESBL)-producing Escherichia coli in sub-Saharan Africa is of serious concern. Studies from countries with a highly industrialized poultry industry suggest the poultry production-food-consumer chain as a potential transmission route. In Africa, integrated studies at this human–animal interface are still missing. Aim: To determine the molecular epidemiology of ESBL-producing E. coli from the intestinal tract of humans and poultry in rural Ghana. Methods: Over a 6-month period, fecal samples were collected from all children admitted to the Agogo Hospital (Ghana) and from broilers at eight poultry farms located within the hospital catchment area. After screening on selective ESBL agar, whole-genome sequencing (WGS) was performed on all ESBL isolates. The genomes were analyzed using multilocus sequence typing (MLST), ESBL genotyping, and genome-based phylogenetic analyses. Results: Of 140 broilers and 54 children, 41 (29%) and 33 (61%), respectively, harbored ESBL E. coli, with farm-level prevalence ranging from 0 to 85%. No predominant sequence type (ST) was detected among humans. ST10 was the most prevalent ST among broilers (n = 31, 69%). The ESBL gene blaCTX-M-15 was predominant among both broilers (n = 43, 96%) and humans (n = 32, 97%). Whole-genome-based phylogenetic analysis revealed three clusters of very closely related broiler and human isolates (10% of ESBL isolates) carrying chromosomal and plasmid-mediated ESBL genes. Conclusion: The findings demonstrate a high frequency of intestinal ESBL-producing E. coli in rural Ghana. Given that the animal and human samples are independent specimens from the same geographic location, the number of closely related ESBL isolates circulating across these two reservoirs is substantial. Hence, poultry farms or meat products might be an important source of ESBL-producing bacteria in rural Ghana, leading to difficult-to-treat infections in humans.
Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Owing to their potential for mobilization or conjugation, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors, with large and increasing clinical implications. They are therefore the subject of large genomic studies worldwide. As next-generation sequencing methods rapidly improve, the number of sequenced bacterial genomes is constantly increasing, which in turn raises the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, they rarely combine high sensitivity and high specificity in plasmid sequence identification in a taxon-independent manner. In addition, many software tools are unsuitable for large high-throughput analyses or cannot be included in existing software pipelines because of their technical design or implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric, the replicon distribution score (RDS), which achieved an accuracy of 96.6 %. The final performance was further improved by combining the RDS metric with heuristics that exploit several plasmid-specific, higher-level contig characterizations. We implemented this workflow in Platon, a new high-throughput, taxon-independent bioinformatics tool for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies.
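The abstract describes the RDS only at a high level. The sketch below shows one plausible shape for such a metric, under stated assumptions: each protein gets a score from how often it was found on plasmids versus chromosomes (a pseudocounted log-ratio, chosen here purely for illustration), a contig's score is the mean over its proteins, and two thresholds sort contigs into plasmid, chromosome, or undecided. The scoring formula, the function names and the threshold values are hypothetical and are not Platon's published definitions.

```python
import math


def protein_score(plasmid_hits: int, chromosome_hits: int, pseudocount: float = 1.0) -> float:
    """Hypothetical per-protein score: log-ratio of plasmid vs. chromosome occurrences.

    Positive values mean the protein was seen more often on plasmids, negative
    values more often on chromosomes. The pseudocount avoids division by zero.
    """
    return math.log((plasmid_hits + pseudocount) / (chromosome_hits + pseudocount))


def replicon_distribution_score(scores) -> float:
    """Contig-level score, assumed here to be the mean of its proteins' scores."""
    scores = list(scores)
    return sum(scores) / len(scores) if scores else 0.0


def classify_contig(rds: float, lower: float, upper: float) -> str:
    """Apply two (hypothetical) discrimination thresholds."""
    if rds >= upper:
        return "plasmid"
    if rds <= lower:
        return "chromosome"
    return "undecided"  # a candidate for additional contig-level heuristics


if __name__ == "__main__":
    # Toy (plasmid, chromosome) hit counts for the three proteins on one contig.
    hits = [(40, 2), (15, 1), (3, 3)]
    rds = replicon_distribution_score(protein_score(p, c) for p, c in hits)
    print(round(rds, 2), classify_contig(rds, lower=-1.0, upper=1.0))
```

The undecided band mirrors the idea, stated in the abstract, that the score alone is refined by further plasmid-specific heuristics rather than forced into a binary call.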
Compared to PlasFlow, Platon achieved a higher accuracy (97.5 %) and more balanced predictions (F1 = 82.6 %) when tested on a broad range of bacterial taxa, and better or equal performance compared with the targeted tools PlasmidFinder and PlaScope on sequenced Escherichia coli isolates. Platon is available at http://platon.computational.bio/.
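Accuracy and F1 can differ considerably for the same tool. In a typical draft assembly, chromosomal contigs outnumber plasmid contigs, so accuracy is dominated by the large, easy majority class, whereas F1 is the harmonic mean of precision and recall on the plasmid class alone. The sketch below computes both metrics from a confusion matrix; the counts are invented for illustration and are not Platon's benchmark data.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall and F1 for the positive (plasmid) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all contigs (both classes) that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical counts: 100 contigs, only 10 of which are plasmid-borne.
    tp, fn, fp, tn = 7, 3, 2, 88
    print(accuracy(tp, tn, fp, fn))
    print(precision_recall_f1(tp, fp, fn))
```

With these toy counts the accuracy is 0.95, while F1 is only 14/19, roughly 0.74, because three of the ten plasmid contigs are missed and two chromosomal contigs are misreported as plasmids.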
Motivation: The vast amount of available and newly generated read mapping data requires comprehensive visualization and benefits from bioinformatics tools that offer a wide spectrum of analysis functions from a single source. Appropriate handling of multiply mapped reads during mapping analyses remains an issue that demands improvement. Results: The capabilities of the read mapping analysis and visualization tool ReadXplorer were vastly enhanced. Here, we present an even finer-grained read mapping classification that increases the level of detail for analyses and visualizations. The spectrum of automatic analysis functions has been broadened to include genome rearrangement detection and correlation analysis between two mapping data sets. Existing functions were refined and enhanced, namely the detection of differentially expressed genes, the read count and normalization analysis, and transcription start site detection. Additionally, ReadXplorer 2 features greatly improved support for large eukaryotic data sets and a command-line version, enabling its integration into workflows. Finally, the new version can display any kind of tabular results from other bioinformatics tools. Availability and Implementation: http://www.readxplorer.org Contact: readxplorer@computational.bio.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online.
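The abstract mentions a read count and normalization analysis without naming the normalization. The sketch below illustrates two widely used per-feature normalizations, RPKM and TPM; whether ReadXplorer 2 implements these exact formulas is an assumption here, and the gene names and counts are hypothetical.

```python
def rpkm(read_counts, lengths_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_mapped_reads)
        for gene, count in read_counts.items()
    }


def tpm(read_counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    rates = {gene: count / (lengths_bp[gene] / 1e3) for gene, count in read_counts.items()}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}  # no reads on any feature
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical data: two genes with equal raw counts but different lengths.
    counts = {"geneA": 500, "geneB": 500}
    lengths = {"geneA": 1000, "geneB": 2000}
    print(rpkm(counts, lengths, total_mapped_reads=1_000_000))
    print(tpm(counts, lengths))
```

Both measures correct raw counts for feature length, so the shorter `geneA` scores twice as high as `geneB`; TPM additionally sums to one million within every sample, which makes relative abundances easier to compare between samples.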
Motivation: Antimicrobial resistance (AMR) is one of the biggest global problems threatening human and animal health. Rapid and accurate AMR diagnostic methods are therefore urgently needed. However, traditional antimicrobial susceptibility testing (AST) is time-consuming, low-throughput, and viable only for cultivable bacteria. Machine learning methods may pave the way for automated AMR prediction based on bacterial genomic data. However, a comparison of different machine learning methods for predicting AMR from whole-genome sequencing data, using different encodings and without prior knowledge, has yet to be done. Results: In the current study, we evaluated logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) models for predicting AMR to the antibiotics ciprofloxacin (CIP), cefotaxime (CTX), ceftazidime (CTZ), and gentamicin (GEN). We demonstrate that these models can effectively predict AMR from whole-genome sequencing data using label encoding, one-hot encoding, and frequency matrix chaos game representation (FCGR) encoding. We trained these models on a large AMR dataset and evaluated them on an independent public dataset. Generally, RFs and CNNs perform better than LR and SVM, with AUCs of up to 0.96. Furthermore, we identified mutations associated with AMR for each antibiotic. Availability: Source code for data preparation and model training is available on GitHub (https://github.com/YunxiaoRen/ML-iAMR). Supplementary information: Supplementary data are available at Bioinformatics online.
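The three encodings named above turn a DNA sequence into numeric model input. The sketch below is a minimal, dependency-free illustration under its own assumptions: the integer codes for label encoding, the handling of ambiguous bases, and the corner assignment for the chaos game representation are conventions chosen here, not necessarily those used in ML-iAMR.

```python
# Hypothetical integer codes for label encoding; 0 marks ambiguous bases.
LABELS = {"A": 1, "C": 2, "G": 3, "T": 4}
ALPHABET = "ACGT"
# Corner assignment for the chaos game representation (one common convention).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def label_encode(seq):
    """One integer per position."""
    return [LABELS.get(nt, 0) for nt in seq.upper()]


def one_hot_encode(seq):
    """One row per position, one column per base; ambiguous bases give all-zero rows."""
    return [[1 if nt == base else 0 for base in ALPHABET] for nt in seq.upper()]


def fcgr(seq, k):
    """Frequency matrix CGR: counts of each k-mer placed on a 2^k x 2^k grid."""
    size = 2 ** k
    matrix = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(nt not in CORNERS for nt in kmer):
            continue  # skip k-mers containing N or other ambiguous bases
        x = y = 0
        for i, nt in enumerate(kmer):
            cx, cy = CORNERS[nt]
            # Later bases set higher-order bits, matching the grid cell reached
            # by iterating the CGR rule p = (p + corner) / 2 over the k-mer.
            x |= cx << i
            y |= cy << i
        matrix[y][x] += 1
    return matrix


if __name__ == "__main__":
    print(label_encode("ACGN"))
    print(one_hot_encode("AN"))
    for row in fcgr("ACGTACGA", k=2):
        print(row)
```

Label and one-hot encodings preserve position and grow with sequence length, whereas FCGR always yields a fixed 2^k by 2^k matrix of k-mer counts, a shape that suits image-style CNN input.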