BackgroundThe quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow a precise manual curation of each dataset, there exists a real need for efficient bioinformatic tools dedicated to this alignment character trimming step.ResultsHere is presented a new software, named BMGE (Block Mapping and Gathering with Entropy), that is designed to select regions in a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. Calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish among biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered as not suited for phylogenetic inference and then removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly-related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity.ConclusionsBMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
Microbial pathogenesis studies are typically performed with reference strains, thereby overlooking microbial intra-species virulence heterogeneity. Here we integrated human epidemiological and clinical data with bacterial population genomics to harness the biodiversity of the model foodborne pathogen Listeria monocytogenes and decipher the basis of its neural and placental tropisms.Taking advantage of the clonal structure of this bacterial species, we identify clones epidemiologically associated with either food or human central nervous system (CNS) and maternal-neonatal (MN) listeriosis. The latter are also most prevalent in patients without immunosuppressive comorbidities. Strikingly, CNS and MN clones are hypervirulent in a humanized mouse model of listeriosis. By integrating epidemiological data and comparative genomics, we uncovered multiple novel putative virulence factors and demonstrated experimentally the contribution of the first gene cluster mediating Listeria monocytogenes neural and placental tropisms. This study illustrates the exceptional power of harnessing microbial biodiversity to identify clinically relevant microbial virulence attributes.
Listeria monocytogenes (Lm) is a major human foodborne pathogen. Numerous Lm outbreaks have been reported worldwide and associated with a high case fatality rate, reinforcing the need for strongly coordinated surveillance and outbreak control. We developed a universally applicable genome-wide strain genotyping approach and investigated the population diversity of Lm using 1,696 isolates from diverse sources and geographical locations. We define, with unprecedented precision, the population structure of Lm, demonstrate the occurrence of international circulation of strains and reveal the extent of heterogeneity in virulence and stress resistance genomic features among clinical and food isolates. Using historical isolates, we show that the evolutionary rate of Lm from lineage I and lineage II is low (∼2.5 × 10 substitutions per site per year, as inferred from the core genome) and that major sublineages (corresponding to so-called 'epidemic clones') are estimated to be at least 50-150 years old. This work demonstrates the urgent need to monitor Lm strains at the global level and provides the unified approach needed for global harmonization of Lm genome-based typing and population biology.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.