Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took an hour and a half on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis, and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at https://gitlab.com/leoisl/dbgwas.
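As a rough illustration of the compaction idea mentioned in this abstract, the following Python sketch builds a De Bruijn graph over the k-mers of two toy sequences and merges unbranched paths into unitigs, which play the role of nodes in a compacted graph. It is a simplification written for illustration, not the DBGWAS implementation: the function names `kmers` and `build_unitigs` are hypothetical, reverse complements are ignored, and the two sequences are invented.

```python
def kmers(seq, k):
    """Return the set of k-mers occurring in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_unitigs(kmer_set, k):
    """Merge unbranched paths of a node-centric De Bruijn graph into unitigs.

    Assumption: nodes are k-mers and an edge links two k-mers that
    overlap by k-1 characters; reverse complements are not handled.
    """
    def succs(km):
        return [km[1:] + c for c in "ACGT" if km[1:] + c in kmer_set]

    def preds(km):
        return [c + km[:-1] for c in "ACGT" if c + km[:-1] in kmer_set]

    visited, unitigs = set(), []
    for km in sorted(kmer_set):
        if km in visited:
            continue
        # Walk backwards to the first k-mer of the unbranched path.
        start, seen = km, {km}
        while True:
            p = preds(start)
            if len(p) != 1 or len(succs(p[0])) != 1 or p[0] in seen:
                break
            start = p[0]
            seen.add(start)
        # Walk forwards, extending while the path stays unbranched.
        path, cur = [start], start
        visited.add(start)
        while True:
            s = succs(cur)
            if len(s) != 1 or len(preds(s[0])) != 1 or s[0] in visited:
                break
            cur = s[0]
            path.append(cur)
            visited.add(cur)
        unitigs.append(path[0] + "".join(x[-1] for x in path[1:]))
    return unitigs


# Two invented sequences differing by a single substitution (T -> A).
K = 3
all_kmers = kmers("ACGTTGCA", K) | kmers("ACGTAGCA", K)
unitigs = build_unitigs(all_kmers, K)
print(sorted(unitigs))  # ['ACGT', 'GCA', 'GTAGC', 'GTTGC']
```

In this toy case the nine k-mers collapse into four unitigs, and the substitution appears as two alternative branches (`GTTGC` and `GTAGC`) between shared flanks, which is the kind of local structure a graph-based view makes visible.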
Motivation: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions.
Results: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and are more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2–17 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise.
Availability and implementation: Data and code are available at http://cbio.ensmp.fr/largescalemetagenomics.
Contact: pierre.mahe@biomerieux.com
Supplementary information: Supplementary data are available at Bioinformatics online.
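To make the compositional idea concrete, here is a minimal Python sketch that represents a sequence by its k-mer counts and assigns a read to the reference whose k-mer profile is most similar. It is deliberately much simpler than the large-scale linear models described in the abstract: the nearest-profile rule with cosine similarity is a stand-in chosen for brevity, and the function names and toy sequences are invented. With k = 12 the k-mer feature space has 4^12 (about 1.7 × 10^7) dimensions, which is why a sparse representation is used.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k):
    """Sparse k-mer count vector of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * q[m] for m, c in p.items() if m in q)
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


def classify(read, reference_profiles, k):
    """Assign a read to the reference with the most similar k-mer profile.

    Assumption: a nearest-profile rule, used here only as a simple
    stand-in for a trained machine learning model.
    """
    profile = kmer_profile(read, k)
    return max(reference_profiles,
               key=lambda name: cosine(profile, reference_profiles[name]))


# Invented reference "genomes" with clearly different composition.
K = 4
references = {
    "species_a": kmer_profile("AACCAACCAACCAACC", K),
    "species_b": kmer_profile("GGTTGGTTGGTTGGTT", K),
}
print(classify("CCAACCAA", references, K))  # species_a
print(classify("TTGGTTGG", references, K))  # species_b
print(4 ** 12)                              # 16777216 possible 12-mers
```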
Background: Several studies have demonstrated the feasibility of predicting bacterial antibiotic resistance phenotypes from whole-genome sequences. The prediction process usually amounts to detecting the presence of genes involved in antibiotic resistance mechanisms, or of specific mutations within these genes, previously identified from a training panel of strains. We address the problem from the supervised statistical learning perspective, not relying on prior information about such resistance factors. We rely on a k-mer based genotyping scheme and a logistic regression model, thereby combining several k-mers into a probabilistic model. To identify a small yet predictive set of k-mers, we rely on the stability selection approach (Meinshausen et al., J R Stat Soc Ser B 72:417–73, 2010), which consists in penalizing logistic regression models with a Lasso penalty, coupled with extensive resampling procedures.
Results: Using public datasets, we applied the resulting classifiers to two bacterial species and achieved predictive performance equivalent to the state of the art. The models are extremely sparse, involving 1 to 8 k-mers per antibiotic, and hence are remarkably easy and fast to evaluate on new genomes (from raw reads to assemblies).
Conclusion: Our proof of concept therefore demonstrates that stability selection is a powerful approach to investigate bacterial genotype-phenotype relationships.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2403-z) contains supplementary material, which is available to authorized users.
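The general recipe of stability selection with a Lasso-penalized logistic regression can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the solver (proximal gradient descent), the penalty value, the number of resamples, and the toy presence/absence matrix are all invented, and the fixed step size is only suitable for a handful of binary features.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_logistic(X, y, lam, n_iter=200, step=0.5):
    """Lasso-penalized logistic regression via proximal gradient descent.

    Assumption: the intercept is left unpenalized and the step size is
    fixed, which is adequate only for a few binary features.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w


def stability_selection(X, y, lam, n_resamples=30, seed=0):
    """Fraction of half-size subsamples in which each feature is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)
        w = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, wj in enumerate(w):
            if wj != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


# Invented toy data: 40 strains, 4 k-mer presence/absence features.
# Feature 0 matches the phenotype; features 1 to 3 are unrelated patterns.
y = [i % 2 for i in range(40)]
X = [[i % 2, (i // 2) % 2, (i // 4) % 2, 1 if i % 5 == 0 else 0]
     for i in range(40)]
freqs = stability_selection(X, y, lam=0.1)
print(freqs)  # selection frequency per feature
```

Features are then retained when their selection frequency exceeds a chosen threshold; in this toy setting the feature that tracks the phenotype is expected to be selected far more consistently than the unrelated ones.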
We present the MilliDrop Analyzer (MDA), a droplet-based millifluidic system for digital antimicrobial susceptibility testing (D-AST), which enables us to determine minimum inhibitory concentrations (MICs) precisely and accurately. The MilliDrop technology was validated by using resazurin for fluorescence readout, for comparison with standard methodology, and for conducting reproducibility studies. In this first assessment, the susceptibility of a reference Gram-negative strain, Escherichia coli ATCC 25922, to gentamicin, chloramphenicol, and nalidixic acid was tested by the MDA, VITEK®2, and broth microdilution as a reference standard. We measured the susceptibility of clinically relevant Gram-positive strains of Staphylococcus aureus to vancomycin, including vancomycin-intermediate S. aureus (VISA), heterogeneous vancomycin-intermediate S. aureus (hVISA), and vancomycin-susceptible S. aureus (VSSA) strains. The MDA provided results which were much more accurate than those of VITEK®2 and standard broth microdilution. The enhanced accuracy enabled us to reliably discriminate between VSSA and hVISA strains.
Infectious diseases are caused by single or successive contacts with pathogens. Nevertheless, contact with a pathogen does not necessarily imply infection. In 1993, Yakovlev et al. proposed a model to study a population of cancer patients with a cured fraction, a model well adapted to describing an infectious disease with a unique infection occasion. Extensions of this model have been proposed in recent years. We present a mechanistic formulation in the context of infectious diseases with multiple infection occasions. It is a mixture model that makes it possible to study risk factors associated with infection intensity at each infection occasion and factors shortening the delay from exposure to clinical event. Simulations are performed to evaluate the model fit, and two examples are presented for illustration: an analysis of an HIV-1 mother-to-child transmission data set and an analysis of a nosocomial urinary tract infection data set.
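For orientation, the single-occasion model with a cured fraction is commonly written in its "promotion time" form: a Poisson number of latent causes with mean θ, each acting after a delay with distribution F, gives the survival function S(t) = exp(−θ F(t)) and a never-affected fraction exp(−θ). The Python sketch below evaluates that textbook form only. The exponential choice for F, the parameter values, and the function names are assumptions made for illustration, and the multiple-occasion mixture model of the abstract is not reproduced here.

```python
import math


def promotion_time_survival(t, theta, rate):
    """S(t) = exp(-theta * F(t)) with F assumed exponential (illustrative)."""
    F = 1.0 - math.exp(-rate * t)
    return math.exp(-theta * F)


def never_affected_fraction(theta):
    """Long-run plateau of the survival curve, exp(-theta)."""
    return math.exp(-theta)


# Hypothetical parameters: mean of 1.5 latent causes, delay rate 0.3.
theta, rate = 1.5, 0.3
for t in (0, 1, 5, 20):
    print(t, round(promotion_time_survival(t, theta, rate), 4))
print("plateau:", round(never_affected_fraction(theta), 4))
```

The survival curve starts at 1 and levels off at exp(−θ) instead of dropping to 0, which is what distinguishes this family from standard survival models.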