Motivation: Analysis toolkits for shotgun metagenomic data achieve strain-level characterization of complex microbial communities by capturing intra-species gene content variation. Yet these tools are limited by the available reference genomes, which are far from covering all microbial variability: many species have not been sequenced or are represented by only a few strains. Binning co-abundant genes obtained from de novo assembly is a powerful reference-free technique to discover and reconstitute the gene repertoire of microbial species. While current methods accurately identify the core parts of species, they miss many accessory genes or split them into small gene groups that remain unassociated with core clusters. Results: We introduce MSPminer, a computationally efficient software tool that reconstitutes Metagenomic Species Pan-genomes (MSPs) by binning co-abundant genes across metagenomic samples. MSPminer relies on a new robust measure of proportionality coupled with an empirical classifier to group and distinguish not only species core genes but also accessory genes. Applied to a large-scale metagenomic dataset, MSPminer successfully delineates in a few hours the gene repertoires of 1661 microbial species with similar specificity and higher sensitivity than existing tools. The taxonomic annotation of MSPs reveals hitherto unknown microorganisms and brings coherence to the nomenclature of the species of the human gut microbiota. The provided MSPs can be readily used for taxonomic profiling and biomarker discovery in human gut metagenomic samples. In addition, MSPminer can be applied to gene count tables from other ecosystems to perform similar analyses. Availability and implementation: The binary is freely available for non-commercial users at www.enterome.com/downloads. Supplementary information: Supplementary data are available at Bioinformatics online.
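To make the idea of co-abundance concrete, here is a minimal sketch of a proportionality check between two genes across samples. It is not MSPminer's measure, which the abstract does not specify. The function name `proportionality`, the median-of-ratios estimator, and the `tolerance` parameter are all assumptions chosen for illustration.

```python
from statistics import median


def proportionality(counts_a, counts_b, tolerance=0.25):
    """Illustrative (hypothetical) proportionality check between two genes.

    counts_a, counts_b: per-sample counts of two genes, in the same sample order.
    Returns (ratio, agreement): the median ratio b/a over samples where both
    genes are detected, and the fraction of those samples whose ratio lies
    within `tolerance` (relative) of the median.
    """
    # Only samples where both genes are detected carry ratio information.
    ratios = [b / a for a, b in zip(counts_a, counts_b) if a > 0 and b > 0]
    if not ratios:
        return None, 0.0
    # The median is less sensitive to outlier samples than the mean.
    ratio = median(ratios)
    within = sum(1 for r in ratios if abs(r - ratio) <= tolerance * ratio)
    return ratio, within / len(ratios)


# Toy data: the second gene tracks the first at half its abundance,
# except in the last sample, which behaves as an outlier.
core_gene = [10, 20, 0, 40, 80]
other_gene = [5, 10, 0, 20, 400]
ratio, agreement = proportionality(core_gene, other_gene)
```

Two genes from the same species would be expected to show a high agreement value, whereas an accessory gene absent from some strains would contribute zero counts in samples where the core gene is still detected, a case this simple sketch ignores.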
We apply modeling approaches to investigate the distribution of late recombination nodules in maize (Zea mays). Such nodules indicate crossover positions along the synaptonemal complex. High-quality nodule data were analyzed using two different interference models: the "statistical" gamma model and the "mechanical" beam film model. For each chromosome, we exclude at a 98% significance level the hypothesis that a single pathway underlies the formation of all crossovers, pointing to the coexistence of two types of crossing-over in maize, as was previously demonstrated in other organisms. We estimate the proportion of crossovers coming from the noninterfering pathway to range from 6 to 23% depending on the chromosome, with a cell average of ~15%. The mean number of noninterfering crossovers per chromosome is significantly correlated with the length of the synaptonemal complex. We also quantify the intensity of interference. Finally, we develop inference tools that allow one to tackle, without much loss of power, complex crossover interference models such as the beam film. The lack of a likelihood function in such models had prevented their use for parameter estimation. This advance will allow more realistic mechanisms of crossover formation to be modeled in the future.
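The two-pathway idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' inference code. Interfering crossovers are drawn from a renewal process with gamma-distributed gaps (shape `nu`; `nu = 1` means no interference), and noninterfering crossovers from an independent Poisson process contributing a fraction `p` of the total density. The function name, the parameters, and the burn-in approach to approximate stationarity are all hypothetical choices.

```python
import random


def simulate_two_pathway(length, density, nu, p, rng, burn_in=10.0):
    """Simulate crossover positions on [0, length] under a two-pathway model.

    length:  length of the interval (arbitrary units)
    density: mean number of crossovers per unit length, both pathways combined
    nu:      gamma shape parameter of the interfering pathway
    p:       fraction of crossovers from the noninterfering pathway
    rng:     a random.Random instance
    burn_in: number of mean gaps simulated before position 0 (assumption:
             enough for the renewal process to be close to stationary)
    """
    positions = []

    # Interfering pathway: gaps ~ Gamma(shape=nu, scale=mean_gap / nu),
    # so the mean gap is mean_gap regardless of nu.
    d1 = (1.0 - p) * density
    if d1 > 0:
        mean_gap = 1.0 / d1
        x = -burn_in * mean_gap
        while True:
            x += rng.gammavariate(nu, mean_gap / nu)
            if x > length:
                break
            if x >= 0:
                positions.append(x)

    # Noninterfering pathway: homogeneous Poisson process with rate p * density.
    d2 = p * density
    if d2 > 0:
        x = 0.0
        while True:
            x += rng.expovariate(d2)
            if x > length:
                break
            positions.append(x)

    return sorted(positions)


rng = random.Random(0)
example = simulate_two_pathway(length=1.0, density=2.0, nu=5.0, p=0.15, rng=rng)
```

Comparing summary statistics of such simulated patterns with observed ones is one simple way to explore how `nu` and `p` shape the spacing of crossovers. Actual parameter estimation, as described in the abstract, requires dedicated inference tools.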
Background: During meiosis, homologous chromosomes exchange segments via the formation of crossovers. This phenomenon is highly regulated; in particular, crossovers are distributed heterogeneously along the physical map and rarely arise in close proximity, a property referred to as "interference". Crossover positions form patterns that give clues about how crossovers are formed. In several organisms including yeast, tomato, Arabidopsis, and mouse, it is believed that crossovers form via at least two pathways, one interfering, the other not. Results: We have developed a software package, "CODA" (CrossOver Distribution Analyzer), which allows one to quantitatively characterize crossover patterns by fitting interference models to experimental data. Two families of interference models are provided: the "gamma" model and the "beam-film" model. The user can specify single-pathway or two-pathway modeling, and the software package infers the model's parameters and their confidence intervals. CODA can handle data produced from measurements on bivalents or gametes, in the form of continuous crossover positions or marker genotyping. We illustrate the possibilities on data from wheat, corn and mouse. Conclusions: CODA extends the kinds of crossover data that can be analyzed with two-pathway modeling to include gametic data (rather than only bivalents/tetrads). It will also enable users to perform analyses based on the beam-film model. CODA implements that model's complex physics and mathematics, and uses a summary statistic to overcome the lack of a computable likelihood, which has hampered its use until now.