It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses general linear models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g. counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel disease (IBD) across multiple time points and omics profiles.
Metagenomic assembly enables new organism discovery from microbial communities, but it can only capture few abundant organisms from most metagenomes. Here we present MetaPhlAn 4, which integrates information from metagenome assemblies and microbial isolate genomes for more comprehensive metagenomic taxonomic profiling. From a curated collection of 1.01 M prokaryotic reference and metagenome-assembled genomes, we define unique marker genes for 26,970 species-level genome bins, 4,992 of them taxonomically unidentified at the species level. MetaPhlAn 4 explains ~20% more reads in most international human gut microbiomes and >40% in less-characterized environments such as the rumen microbiome and proves more accurate than available alternatives on synthetic evaluations while also reliably quantifying organisms with no cultured isolates. Application of the method to >24,500 metagenomes highlights previously undetected species to be strong biomarkers for host conditions and lifestyles in human and mouse microbiomes and shows that even previously uncharacterized species can be genetically profiled at the resolution of single microbial strains.
Motivation Metatranscriptomics (MTX) has become an increasingly practical way to profile the functional activity of microbial communities in situ. However, MTX remains underutilized due to experimental and computational limitations. The latter are complicated by non-independent changes in both RNA transcript levels and their underlying genomic DNA copies (as microbes simultaneously change their overall abundance in the population and regulate individual transcripts), genetic plasticity (as whole loci are frequently gained and lost in microbial lineages) and measurement compositionality and zero-inflation. Here, we present a systematic evaluation of and recommendations for differential expression (DE) analysis in MTX. Results We designed and assessed six statistical models for DE discovery in MTX that incorporate different combinations of DNA and RNA normalization and assumptions about the underlying changes of gene copies or species abundance within communities. We evaluated these models on multiple simulated and real multi-omic datasets. Models adjusting transcripts relative to their encoding gene copies as a covariate were significantly more accurate in identifying DE from MTX in both simulated and real datasets. Moreover, we show that when paired DNA measurements (metagenomic data) are not available, models normalizing MTX measurements within-species while also adjusting for total-species RNA balance sensitivity, specificity and interpretability of DE detection, as does filtering likely technical zeros. The efficiency and accuracy of these models pave the way for more effective MTX-based DE discovery in microbial communities. Availability and implementation The analysis code and synthetic datasets used in this evaluation are available online at http://huttenhower.sph.harvard.edu/mtx2021. Supplementary information Supplementary data are available at Bioinformatics online.
Metagenomic assembly enables novel organism discovery from microbial communities, but from most metagenomes it can only capture few abundant organisms. Here, we present a method - MetaPhlAn 4 - to integrate information from both metagenome assemblies and microbial isolate genomes for improved and more comprehensive metagenomic taxonomic profiling. From a curated collection of 1.01M prokaryotic reference and metagenome-assembled genomes, we defined unique marker genes for 26,970 species-level genome bins, 4,992 of them taxonomically unidentified at the species level. MetaPhlAn 4 explains ~20% more reads in most international human gut microbiomes and >40% in less-characterized environments such as the rumen microbiome, and proved more accurate than available alternatives on synthetic evaluations while also reliably quantifying organisms with no cultured isolates. Application of the method to >24,500 metagenomes highlighted previously undetected species to be strong biomarkers for host conditions and lifestyles in human and mice microbiomes, and showed that even previously uncharacterized species can be genetically profiled at the resolution of single microbial strains. MetaPhlAn 4 thus integrates the novelty of metagenomic assemblies with the sensitivity and fidelity of reference-based analyses, providing efficient metagenomic profiling of uncharacterized species and enabling deeper and more comprehensive microbiome biomarker detection.
Background The incidence of colorectal cancer (CRC) is increasing in developing countries, yet limited research on the CRC- associated microbiota has been conducted in these areas, in part due to scarce resources, facilities, and the difficulty of fresh or frozen stool storage/transport. Here, we aimed (1) to establish a broad representation of diverse developing countries (Argentina, Chile, India, and Vietnam); (2) to validate a ‘resource-light’ sample-collection protocol translatable in these settings using guaiac faecal occult blood test (gFOBT) cards stored and, importantly, shipped internationally at room temperature; (3) to perform initial profiling of the collective CRC-associated microbiome of these developing countries; and (4) to compare this quantitatively with established CRC biomarkers from developed countries. Methods We assessed the effect of international storage and transport at room temperature by replicating gFOBT from five UK volunteers, storing two in the UK, and sending replicates to institutes in the four countries. Next, to determine the effect of prolonged UK storage, DNA extraction replicates for a subset of samples were performed up to 252 days apart. To profile the CRC-associated microbiome of developing countries, gFOBT were collected from 41 treatment-naïve CRC patients and 40 non-CRC controls from across the four institutes, and V4 16S rRNA gene sequencing was performed. Finally, we constructed a random forest (RF) model that was trained and tested against existing datasets from developed countries. Results The microbiome was stably assayed when samples were stored/transported at room temperature and after prolonged UK storage. Large-scale microbiome structure was separated by country and continent, with a smaller effect from CRC. Importantly, the RF model performed similarly to models trained using external datasets and identified similar taxa of importance (Parvimonas, Peptostreptococcus, Fusobacterium, Alistipes, and Escherichia). Conclusions This study demonstrates that gFOBT, stored and transported at room temperature, represents a suitable method of faecal sample collection for amplicon-based microbiome biomarkers in developing countries and suggests a CRC-faecal microbiome association that is consistent between developed and developing countries.
To assess the utility of microbiome profiles for national-scale colorectal cancer (CRC) screening, we assessed 2,252 routinely processed NHS Bowel Cancer Screening Programme guaiac faecal occult blood test (gFOBT) samples. We generated four microbiome-based random forest classification models, each showing potential to improve accuracy.Two distinguished either CRC or neoplasm (CRC or adenoma) from gFOBT blood-negative samples (equivalent to firsttier screening). Two distinguished CRC or neoplasm from samples that had tested positive for blood by gFOBT, with participants referred for colonoscopy, but at colonoscopy no-lesion was found (second-tier screening to rule out gFOBT false positives). Each model remained robust to validation and when restricted to fifteen taxa, raising the possibility of an inexpensive qPCR-test. The models performed favourably compared with existing microbiome studies, FIT and Cologuard. These results suggest that microbiome analysis could be integrated into national CRC screening to improve accuracy and reduce the number of unnecessary screening colonoscopies.
Background The gut microbiome is a critical modulator of host immunity and is linked to the immune response to respiratory viral infections. However, few studies have gone beyond describing broad compositional alterations in severe COVID-19, defined as acute respiratory or other organ failure. Methods We profiled 127 hospitalized patients with COVID-19 (n = 79 with severe COVID-19 and 48 with moderate) who collectively provided 241 stool samples from April 2020 to May 2021 to identify links between COVID-19 severity and gut microbial taxa, their biochemical pathways, and stool metabolites. Results Forty-eight species were associated with severe disease after accounting for antibiotic use, age, sex, and various comorbidities. These included significant in-hospital depletions of Fusicatenibacter saccharivorans and Roseburia hominis, each previously linked to post-acute COVID syndrome or “long COVID,” suggesting these microbes may serve as early biomarkers for the eventual development of long COVID. A random forest classifier achieved excellent performance when tasked with classifying whether stool was obtained from patients with severe vs. moderate COVID-19, a finding that was externally validated in an independent cohort. Dedicated network analyses demonstrated fragile microbial ecology in severe disease, characterized by fracturing of clusters and reduced negative selection. We also observed shifts in predicted stool metabolite pools, implicating perturbed bile acid metabolism in severe disease. Conclusions Here, we show that the gut microbiome differentiates individuals with a more severe disease course after infection with COVID-19 and offer several tractable and biologically plausible mechanisms through which gut microbial communities may influence COVID-19 disease course. Further studies are needed to expand upon these observations to better leverage the gut microbiome as a potential biomarker for disease severity and as a target for therapeutic intervention.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.