Highlights
• FlashWeave infers direct associations, resulting in sparse, interpretable networks
• The method's flexible graphical model framework scales to 500,000+ samples
• It integrates environmental and technical factors; adjusts for specific latent signals
• An extensive human gut microbial network reveals patterns of biological interest
Motivation: Ribosomal RNA profiling has become crucial to studying microbial communities, but meaningful taxonomic analysis and inter-comparison of such data are still hampered by technical limitations, between-study design variability and inconsistencies between taxonomies used.
Results: Here we present MAPseq, a framework for reference-based rRNA sequence analysis that is up to 30% more accurate (F½ score) and up to one hundred times faster than existing solutions, providing in a single run multiple taxonomy classifications and hierarchical operational taxonomic unit mappings, for rRNA sequences in both amplicon and shotgun sequencing strategies, and for datasets of virtually any size.
Availability and implementation: Source code and binaries are freely available at https://github.com/jfmrod/mapseq
Supplementary information: Supplementary data are available at Bioinformatics online.
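For readers unfamiliar with the accuracy metric quoted above, the following is a sketch under the assumption that "F½ score" denotes the standard F-beta measure with β = 1/2, which weights precision more heavily than recall. The symbols P (precision) and R (recall) are introduced here for illustration and do not appear in the abstract.

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R},
\qquad
F_{1/2} = \frac{1.25\,P\,R}{0.25\,P + R}
```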
The recent explosion of metagenomic sequencing data opens the door towards the modeling of microbial ecosystems in unprecedented detail. In particular, co-occurrence based prediction of ecological interactions could strongly benefit from this development. However, current methods fall short on several fronts: univariate tools do not distinguish between direct and indirect interactions, resulting in excessive false positives, while approaches with better resolution are so far computationally highly limited. Furthermore, confounding variables typical for cross-study data sets are rarely addressed. We present FlashWeave, a new approach based on a flexible Probabilistic Graphical Models framework to infer highly resolved direct microbial interactions from massive heterogeneous microbial abundance data sets with seamless integration of metadata. On a variety of benchmarks, FlashWeave outperforms state-of-the-art methods by several orders of magnitude in terms of speed while generally providing increased accuracy. We apply FlashWeave to a cross-study data set of 69 818 publicly available human gut samples, resulting in one of the largest and most diverse models of microbial interactions in the human gut to date.
Prediction performance on a variety of synthetic data sets
Since experimentally verified biological interactions between microbes are scarcely available, we first employed previously published frameworks that generate synthetic data with ecological structure. We compared the quality of networks inferred by FlashWeave "sensitive" (-S) and "fast" (-F) (Fig. 1C) to three competing univariate inference methods (SparCC [30], eLSA [17] and CoNet [20]) and three conditional methods (mLDM [21] and SpiecEasi [15] with neighborhood selection (MB) and inverse covariance selection (GL)).
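The distinction between direct and indirect associations drawn in the FlashWeave abstract can be illustrated with a toy example. The sketch below is not FlashWeave's algorithm; it only assumes a hypothetical linear chain A → B → C of simulated Gaussian variables and uses partial correlation as a simple stand-in for a conditional test. All names (`partial_corr`, `a`, `b`, `c`) are illustrative.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])  # intercept + conditioning variable
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# Hypothetical chain: A influences B, B influences C; A and C are linked only via B.
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)

# A univariate (marginal) measure reports a strong A-C association ...
marginal_ac = np.corrcoef(a, c)[0, 1]
# ... while conditioning on B shows that it is indirect.
conditional_ac = partial_corr(a, c, b)

print(f"marginal corr(A, C):        {marginal_ac:.3f}")
print(f"partial corr(A, C | B):     {conditional_ac:.3f}")
```

In this toy setting a univariate tool would draw an A–C edge, whereas a conditional approach would keep only A–B and B–C, which is the kind of sparsity the abstract attributes to methods that infer direct associations.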
Fermentation by gut microbe of Japanese macaques Hanya et al.
There is now a great awareness of the high diversity of most environmental (“free-living”) and host-associated microbiomes, but exactly how diverse microbial communities form and maintain is still highly debated. A variety of theories have been put forward, but testing them has been problematic because most studies have been based on synthetic communities that fail to accurately mimic the natural composition (i.e., the species used are typically not found together in the same environment), the diversity (usually too low to be representative), or the environmental system itself (using designs with single carbon sources or solely mixed liquid cultures).
Background: The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics. Previous studies have characterized site-specific microbiota and shown that sample origin can be accurately predicted by microbial content. However, these studies were usually restricted to single datasets with consistent experimental methods and conditions, as well as comparatively small sample numbers. The effects of study-specific biases and statistical power on classification performance and biomarker identification thus remain poorly understood. Furthermore, reliable detection in mixtures of different body sites or with noise from environmental contamination has rarely been investigated thus far. Finally, the impact of ecological associations between microbes on biomarker discovery was usually not considered in previous work.
Results: Here we present the analysis of one of the largest cross-study sequencing datasets of microbial communities from human body sites (15,082 samples from 57 publicly available studies). We show that training a Random Forest Classifier on this aggregated dataset increases prediction performance for body sites by 35% compared to a single-study classifier. Using simulated datasets, we further demonstrate that the source of different microbial contributions in mixtures of different body sites or with soil can be detected starting at 1% of the total microbial community. We apply a biomarker selection method that excludes indirect environmental associations driven by microbe-microbe associations, yielding a parsimonious set of highly predictive taxa including novel biomarkers and excluding many previously reported taxa. We find a considerable fraction of unclassified biomarkers (“microbial dark matter”) and observe that negatively associated taxa have a surprisingly high impact on classification performance. We further detect a significant enrichment of rod-shaped, motile, and sporulating taxa for feces biomarkers, consistent with a highly competitive environment.
Conclusions: Our machine learning model shows strong body site classification performance, both in single-source samples and mixtures, making it promising for tasks requiring high accuracy, such as forensic applications. We report a core set of ecologically informed biomarkers, inferred across a wide range of experimental protocols and conditions, providing the most concise, general, and least biased overview of body site-associated microbes to date.
Electronic supplementary material: The online version of this article (10.1186/s40168-018-0565-6) contains supplementary material, which is available to authorized users.
Metagenomic sequencing has become crucial to studying microbial communities, but meaningful taxonomic analysis and inter-comparison of such data are still hampered by technical limitations, between-study design variability and inconsistencies between taxonomies used. Here we present MAPseq, a framework for reference-based rRNA metagenomic analysis that is up to 30% more accurate (F 1/2 score) and up to one hundred times faster than existing solutions, providing in a single run multiple taxonomy classifications and hierarchical OTU mappings, for both amplicon and shotgun sequencing strategies, and for datasets of virtually any size. Availability: Source code and binaries are freely available at
Horizontal gene transfer, the exchange of genetic material through means other than reproduction, is a fundamental force in prokaryotic genome evolution. Genomic persistence of horizontally transferred genes has been shown to be influenced by both ecological and evolutionary factors. However, the limited availability of ecological information apart from species’ isolation sources prevented deeper exploration of ecological contributions to horizontal gene transfer. Here, we assessed extensive ecological profiles of gene-exchanging organisms, focusing on transfers detected through explicit phylogenetic methods. By analysing the observed horizontal gene transfer events, we show distinct functional profiles for recent versus old events. Although most genes transferred are accessory, genes transferred earlier in evolution tend to be more ubiquitous within present-day species. Based on environmental information, we find that co-occurring, interacting, and high-abundance species tend to exchange more genes. Finally, we show that host-associated specialist species are much more likely to exchange genes with each other, while generalist species display less of a preference towards HGT with other species in their assigned habitat. Our study covers an unprecedented scale of integrated horizontal gene transfer and environmental information, highlighting broad eco-evolutionary trends.