Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker-gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
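The "alignment-based taxonomy consensus" idea mentioned in the abstract above can be illustrated with a small sketch. This is not the plugin's code: the function name `consensus_taxonomy`, the parameter `min_consensus`, its default of 0.51, and the example lineages are all assumptions chosen for illustration. The sketch assumes an aligner has already returned the lineages of a query's top hits and keeps the deepest lineage prefix that a sufficient fraction of those hits agree on.

```python
from collections import Counter


def consensus_taxonomy(hit_lineages, min_consensus=0.51):
    """Return the deepest lineage prefix supported by >= min_consensus of all hits.

    hit_lineages: list of lineages, each a list of rank labels ordered from
    the broadest rank to the most specific one. (Hypothetical input format.)
    """
    if not hit_lineages:
        return ["Unassigned"]

    total = len(hit_lineages)
    candidates = [list(lineage) for lineage in hit_lineages]
    consensus = []
    depth = 0
    while True:
        # Labels at this depth among hits that still agree with the consensus so far.
        labels = [lineage[depth] for lineage in candidates if len(lineage) > depth]
        if not labels:
            break
        label, count = Counter(labels).most_common(1)[0]
        # Support is measured against all hits, not just the surviving ones.
        if count / total < min_consensus:
            break
        consensus.append(label)
        candidates = [
            lineage for lineage in candidates
            if len(lineage) > depth and lineage[depth] == label
        ]
        depth += 1
    return consensus or ["Unassigned"]


# Hypothetical top hits for one query sequence.
hits = [
    ["Bacteria", "Firmicutes", "Bacilli"],
    ["Bacteria", "Firmicutes", "Clostridia"],
    ["Bacteria", "Firmicutes", "Bacilli"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
]
assignment = consensus_taxonomy(hits)
```

In this example only two of the four hits agree at the third rank, so the assignment stops one rank higher. Raising or lowering the threshold trades classification depth against the risk of over-classification, which is the kind of parameter tuning the abstract refers to.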
Background: We present the Biological Observation Matrix (BIOM, pronounced “biome”) format: a JSON-based file format for representing arbitrary observation by sample contingency tables with associated sample and observation metadata. As the number of categories of comparative omics data types (collectively, the “ome-ome”) grows rapidly, a general format to represent and archive this data will facilitate the interoperability of existing bioinformatics tools and future meta-analyses. Findings: The BIOM file format is supported by an independent open-source software project (the biom-format project), which initially contains Python objects that support the use and manipulation of BIOM data in Python programs, and is intended to be an open development effort where developers can submit implementations of these objects in other programming languages. Conclusions: The BIOM file format and the biom-format project are steps toward reducing the “bioinformatics bottleneck” that is currently being experienced in diverse areas of biological sciences, and will help us move toward the next phase of comparative omics where basic science is translated into clinical and environmental applications. The BIOM file format is currently recognized as an Earth Microbiome Project Standard, and as a Candidate Standard by the Genomic Standards Consortium.
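To make the idea of a JSON-based observation-by-sample table concrete, here is a minimal sketch using only the Python standard library. The abstract does not list the format's fields, so the key names below (`rows`, `columns`, `shape`, `data`, and so on) are assumptions modeled on the sparse layout of the BIOM 1.0 format and should be checked against the format specification. All identifiers and counts are invented for illustration, and `sparse_to_dense` is a hypothetical helper, not part of the biom-format project.

```python
import json

# A hypothetical table with 2 observations (rows) and 3 samples (columns).
# Key names are assumptions modeled on the BIOM 1.0 sparse layout.
table = {
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "type": "OTU table",
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["Bacteria", "Firmicutes"]}},
        {"id": "OTU_2", "metadata": None},
    ],
    "columns": [
        {"id": "Sample_A", "metadata": {"body_site": "gut"}},
        {"id": "Sample_B", "metadata": {"body_site": "skin"}},
        {"id": "Sample_C", "metadata": None},
    ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 3],
    # Each entry is [row index, column index, count]; absent cells are zero.
    "data": [[0, 0, 5], [0, 2, 2], [1, 1, 3]],
}


def sparse_to_dense(biom_like):
    """Expand the sparse [row, col, value] triples into a dense list of lists."""
    n_rows, n_cols = biom_like["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in biom_like["data"]:
        dense[row][col] = value
    return dense


# Because the table is plain JSON, it survives a serialize/parse round trip.
round_tripped = json.loads(json.dumps(table))
dense = sparse_to_dense(round_tripped)
sample_ids = [column["id"] for column in round_tripped["columns"]]
```

Keeping the counts, the sample metadata, and the observation metadata in one document is what lets different tools exchange a table without a separate mapping file for each.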
We present QIIME 2, an open-source microbiome data science platform accessible to users spanning the microbiome research ecosystem, from scientists and engineers to clinicians and policy makers. QIIME 2 provides new features that will drive the next generation of microbiome research. These include interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to “classic” open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, “classic” open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than one-fifth of the time that “classic” open-reference OTU picking would require. Through comparisons on three well-studied datasets, we show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by “classic” open-reference OTU picking. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME’s uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms.
Finally, we present a comparison of parameter settings in QIIME’s OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
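The abstract above does not spell out the steps of subsampled open-reference OTU picking, so the following is only a toy sketch of one plausible reading: closed-reference assignment first, de novo clustering of a random subsample of the failures to build new reference centroids, a second closed-reference pass against those centroids, and a final de novo pass on whatever is left. Everything here is an assumption for illustration, including the function names, the `New.Reference` and `New.CleanUp` prefixes, the parameters, and the position-wise identity measure, which stands in for a real aligner such as uclust and is only meaningful for equal-length toy sequences.

```python
import random


def identity(a, b):
    """Toy similarity: fraction of matching positions (equal-length sequences assumed)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def closed_reference(reads, references, threshold):
    """Assign each read to the first reference meeting the threshold.

    Returns (assignments, failures). Each read is handled independently,
    so this step could be split across workers.
    """
    assignments, failures = {}, []
    for read in reads:
        hit = next(
            (ref_id for ref_id, ref in references.items()
             if identity(read, ref) >= threshold),
            None,
        )
        if hit is None:
            failures.append(read)
        else:
            assignments.setdefault(hit, []).append(read)
    return assignments, failures


def de_novo(reads, threshold, prefix):
    """Greedy centroid clustering: a read matching no centroid seeds a new cluster."""
    centroids, clusters = {}, {}
    for read in reads:
        hit = next(
            (cid for cid, centroid in centroids.items()
             if identity(read, centroid) >= threshold),
            None,
        )
        if hit is None:
            hit = f"{prefix}{len(centroids)}"
            centroids[hit] = read
        clusters.setdefault(hit, []).append(read)
    return centroids, clusters


def subsampled_open_reference(reads, references, threshold=0.97,
                              subsample_fraction=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: closed-reference assignment against the existing references.
    otus, failures = closed_reference(reads, references, threshold)
    if not failures:
        return otus
    # Step 2: de novo cluster a random subsample of the failures
    # to obtain new reference centroids (the only serial step so far).
    n = max(1, int(len(failures) * subsample_fraction))
    new_refs, _ = de_novo(rng.sample(failures, n), threshold, "New.Reference")
    # Step 3: closed-reference assignment of all failures against the new centroids.
    step3, remaining = closed_reference(failures, new_refs, threshold)
    otus.update(step3)
    # Step 4: de novo cluster the reads that still have no OTU.
    _, step4 = de_novo(remaining, threshold, "New.CleanUp")
    otus.update(step4)
    return otus


# Hypothetical toy data: one reference and five reads.
references = {"ref0": "AAAAAAAAAA"}
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCC", "GGGGGGGGGG"]
otus = subsampled_open_reference(reads, references, threshold=0.9,
                                 subsample_fraction=0.5)
```

In this reading, the serial de novo work is confined to the subsample and the final leftovers, while both closed-reference passes treat reads independently. That is consistent with the abstract's claim that more of the algorithm can run in parallel than in “classic” open-reference picking.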
Multi-omic insights into microbiome function and composition typically advance one study at a time. However, to understand relationships across studies, they must be aggregated into meta-analyses. This makes it possible to generate new hypotheses by finding features that are reproducible across biospecimens and data layers. Qiita dramatically accelerates such integration tasks in a web-based microbiome comparison platform, which we demonstrate with Human Microbiome Project and iHMP data.
Background: It is now apparent that the complex microbial communities found on and in the human body vary across individuals. What has largely been missing from previous studies is an understanding of how these communities vary over time within individuals. To the extent to which it has been considered, it is often assumed that temporal variability is negligible for healthy adults. Here we address this gap in understanding by profiling the forehead, gut (fecal), palm, and tongue microbial communities in 85 adults, weekly over 3 months. Results: We found that skin (forehead and palm) varied most in the number of taxa present, whereas gut and tongue communities varied more in the relative abundances of taxa. Within each body habitat, there was a wide range of temporal variability across the study population, with some individuals harboring more variable communities than others. The best predictor of these differences in variability across individuals was microbial diversity; individuals with more diverse gut or tongue communities were more stable in composition than individuals with less diverse communities. Conclusions: Longitudinal sampling of a relatively large number of individuals allowed us to observe high levels of temporal variability in both diversity and community structure in all body habitats studied. These findings suggest that temporal dynamics may need to be considered when attempting to link changes in microbiome structure to changes in health status. Furthermore, our findings show that, not only is the composition of an individual’s microbiome highly personalized, but their degree of temporal variability is also a personalized feature. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0531-y) contains supplementary material, which is available to authorized users.
The human body harbors 10–100 trillion microbes, mainly bacteria in our gut, which greatly outnumber our own human cells. This bacterial assemblage, referred to as the human microbiota, plays a fundamental role in our well-being. Deviations from healthy microbial compositions (dysbioses) have been linked with important human diseases, including inflammation-linked disorders such as allergies, obesity and inflammatory bowel disease. Characterizing the temporal variations and community membership of the healthy human microbiome is critical in order to accurately identify the significant deviations from normality that could be associated with disease states. However, the diversity of the human microbiome varies between body sites, between individuals, and over time. Environmental differences have also been shown to play a role in shaping the human microbiome in different cultures, requiring that the healthy human microbiome be characterized across lifespans, ethnicities, nationalities, cultures, and geographic locales. In this paper, we summarize our knowledge on the microbial composition of the five best-characterized body sites (gut, skin, oral, airways, and vagina), focusing on inter- and intrapersonal variations and our current understanding of the sources of this variation.