Highlights
• FlashWeave infers direct associations, resulting in sparse, interpretable networks
• The method's flexible graphical model framework scales to 500,000+ samples
• It integrates environmental & technical factors; adjusts for specific latent signals
• An extensive human gut microbial network reveals patterns of biological interest
Motivation
Ribosomal RNA profiling has become crucial to studying microbial communities, but meaningful taxonomic analysis and inter-comparison of such data are still hampered by technical limitations, between-study design variability and inconsistencies between taxonomies used.

Results
Here we present MAPseq, a framework for reference-based rRNA sequence analysis that is up to 30% more accurate (F½ score) and up to one hundred times faster than existing solutions, providing in a single run multiple taxonomy classifications and hierarchical operational taxonomic unit mappings, for rRNA sequences in both amplicon and shotgun sequencing strategies, and for datasets of virtually any size.

Availability and implementation
Source code and binaries are freely available at https://github.com/jfmrod/mapseq

Supplementary information
Supplementary data are available at Bioinformatics online.
The recent explosion of metagenomic sequencing data opens the door towards the modeling of microbial ecosystems in unprecedented detail. In particular, co-occurrence based prediction of ecological interactions could strongly benefit from this development. However, current methods fall short on several fronts: univariate tools do not distinguish between direct and indirect interactions, resulting in excessive false positives, while approaches with better resolution are so far computationally highly limited. Furthermore, confounding variables typical for cross-study data sets are rarely addressed. We present FlashWeave, a new approach based on a flexible Probabilistic Graphical Models framework to infer highly resolved direct microbial interactions from massive heterogeneous microbial abundance data sets with seamless integration of metadata. On a variety of benchmarks, FlashWeave outperforms state-of-the-art methods by several orders of magnitude in terms of speed while generally providing increased accuracy. We apply FlashWeave to a cross-study data set of 69 818 publicly available human gut samples, resulting in one of the largest and most diverse models of microbial interactions in the human gut to date.
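The distinction between direct and indirect associations can be illustrated with a small simulation. The sketch below is not FlashWeave's algorithm; it is a generic, minimal example of conditional independence, assuming a hypothetical three-variable chain (`a` drives `b`, `b` drives `c`) with Gaussian noise. A univariate (marginal) correlation reports a strong `a`-`c` association, while the first-order partial correlation, which conditions on `b`, correctly shrinks it towards zero. All variable names and parameters are illustrative assumptions.

```python
import numpy as np


def partial_corr(x, z, y):
    """First-order partial correlation of x and z given y.

    Uses the standard formula based on the three pairwise
    Pearson correlations.
    """
    r_xz = np.corrcoef(x, z)[0, 1]
    r_xy = np.corrcoef(x, y)[0, 1]
    r_zy = np.corrcoef(z, y)[0, 1]
    return (r_xz - r_xy * r_zy) / np.sqrt((1 - r_xy**2) * (1 - r_zy**2))


# Hypothetical chain a -> b -> c: a and c are linked only through b.
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)

# Marginal (univariate) association: strong, although indirect.
marginal_ac = np.corrcoef(a, c)[0, 1]

# Conditional association given b: close to zero, so no direct edge.
conditional_ac = partial_corr(a, c, b)

print(f"marginal corr(a, c):      {marginal_ac:.3f}")
print(f"partial corr(a, c | b):   {conditional_ac:.3f}")
```

A univariate method would draw an `a`-`c` edge here, whereas a conditional test would retain only `a`-`b` and `b`-`c`, which is the kind of sparser, direct-association network the abstract describes.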
Prediction performance on a variety of synthetic data sets

Since experimentally verified biological interactions between microbes are scarcely available, we first employed previously published frameworks that generate synthetic data with ecological structure. We compared the quality of networks inferred by FlashWeave "sensitive" (-S) and "fast" (-F) (Fig. 1C) to three competing univariate inference methods (SparCC [30], eLSA [17] and CoNet [20]) and three conditional methods (mLDM [21] and SpiecEasi [15] with neighborhood selection (MB) and inverse covariance selection (GL)).