Abstract: High‐throughput sequencing of amplicons from environmental DNA samples permits rapid, standardized and comprehensive biodiversity assessments. However, retrieving and interpreting the structure of such data sets requires efficient methods for dimensionality reduction. Latent Dirichlet Allocation (LDA) can be used to decompose environmental DNA samples into overlapping assemblages of co‐occurring taxa. It is a flexible model‐based method adapted to uneven sample sizes and to large and sparse data sets. Here, we…
“…Once we had selected the K value, we ran 100 independent MCMC chains on the whole dataset from random initial conditions. To check for potential insufficient mixing along the chains, we measured the similarity in the spatial distribution of assemblages across the chains (Table S1), using the metric defined in Sommeria-Klein et al (2019). We picked the chain with posterior probability closest to the mean across chains for the final interpretation.…”
Section: Methods | mentioning
confidence: 99%
“…The data encompass 250,057 eukaryotic Operational Taxonomic Units (OTUs) sampled globally at the surface and at the Deep Chlorophyl Maximum (DCM) across 129 stations. We use a probabilistic model that allows identification of a number of ‘assemblages’, each of which represents a set of OTUs that tend to co-occur across samples (Sommeria-Klein et al, 2019; Valle, Baiser, Woodall, & Chazdon, 2014; Methods). Each local planktonic community can then be seen as a sample drawn in various proportions from the assemblages.…”
Short abstract: Eukaryotic plankton are a core component of marine ecosystems with exceptional taxonomic and ecological diversity. Yet how their ecology interacts with the environment to drive global distribution patterns is poorly understood. Here, we use Tara Oceans metabarcoding data covering all the major ocean basins combined with a probabilistic model of taxon co-occurrence to compare the biogeography of 70 major groups of eukaryotic plankton. We uncover two main axes of biogeographic variation. First, more diverse groups display stronger biogeographic structure. Second, large-bodied consumers are structured by oceanic basins, mostly via the main currents, while small-bodied phototrophs are structured by latitude, with a comparatively stronger influence of biotic conditions. Our study highlights striking differences in biogeographies across plankton groups and disentangles their determinants at the global scale.
One-sentence summary: Eukaryotic plankton biogeography and its determinants at global scale reflect differences in ecology and body size.
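The chain-selection step quoted in this entry (run many independent MCMC chains, then keep the one whose posterior probability is closest to the mean across chains) can be sketched as follows. This is a minimal illustration, not the citing authors' code: it assumes each chain has already been summarized by a single score such as its log posterior, and the function name `pick_representative_chain` and the example values are hypothetical.

```python
from statistics import mean


def pick_representative_chain(scores):
    """Return the index of the chain whose score is closest to the mean score.

    `scores` holds one number per chain (assumed here to be a log posterior).
    Ties are broken in favour of the earliest chain.
    """
    if not scores:
        raise ValueError("need at least one chain")
    target = mean(scores)
    return min(range(len(scores)), key=lambda i: abs(scores[i] - target))


# Made-up scores for five chains, for illustration only.
example_scores = [-1052.3, -1049.8, -1050.9, -1047.1, -1051.4]
chosen = pick_representative_chain(example_scores)
print(f"chain {chosen} is closest to the mean score")
```

Whether the mean is taken over posterior probabilities or over their logarithms is not stated in the quoted passage; the sketch simply averages whatever scores it is given.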
“…Application of LDA to these data should help reveal the structure of microbial assemblages on a global scale [52]. For example, Sommeria-Klein et al recently applied LDA to taxonomic profiles of a tropical forest soil DNA dataset to reveal spatial structures [53]. The second direction is the extension of the LDA model-LDA has high model extensibility.…”
Section: Table 2 Functional Assemblage Having the Largest Relative | mentioning
Background: The human gut microbiome has been suggested to affect human health and thus has received considerable attention. To clarify the structure of the human gut microbiome, clustering methods are frequently applied to human gut taxonomic profiles. Enterotypes, i.e., clusters of individuals with similar microbiome composition, are well-studied and characterized. However, only a few detailed studies on assemblages, i.e., clusters of co-occurring bacterial taxa, have been conducted. Particularly, the relationship between the enterotype and assemblage is not well-understood. Results: In this study, we detected gut microbiome assemblages using a latent Dirichlet allocation (LDA) method. We applied LDA to a large-scale human gut metagenome dataset and found that a 4-assemblage LDA model could represent relationships between enterotypes and assemblages with high interpretability. This model indicated that each individual tends to have several assemblages, three of which corresponded to the three classically recognized enterotypes. Conversely, the fourth assemblage corresponded to no enterotypes and emerged in all enterotypes. Interestingly, the dominant genera of this assemblage (Clostridium, Eubacterium, Faecalibacterium, Roseburia, Coprococcus, and Butyrivibrio) included butyrate-producing species such as Faecalibacterium prausnitzii. Indeed, the fourth assemblage significantly positively correlated with three butyrate-producing functions. Conclusions: We conducted an assemblage analysis on a large-scale human gut metagenome dataset using LDA. The present study revealed that there is an enterotype-independent assemblage.
“…We next tested LDA potential to stratify gut microbiota of the cohort participants. This unsupervised machine learning technique is increasingly finding acceptance in the field of microbiome [46–48] for its unique ability to reveal latent or hidden groups within the data cloud. Supplementary Figure S4 shows LDA model’s perplexity parameter and log-likelihood values to find optimal number of clusters.…”
Alzheimer’s disease (AD) is a heterogeneous neurodegenerative disorder that spans over a continuum with multiple phases including preclinical, mild cognitive impairment, and dementia. Unlike most other chronic diseases there are limited number of human studies reporting on AD gut microbiota in the literature. These published studies suggest that the gut microbiota of AD continuum patients varies considerably throughout the disease stages, raising expectations for existence of multiple microbiota community types. However, the community types of AD gut microbiota were not systematically investigated before, leaving important research gap for diet-based intervention studies and recently initiated precision nutrition approaches aiming at stratifying patients into distinct dietary subgroups. Here, we comprehensively assessed the community types of gut microbiota across the AD continuum. We analyze 16S rRNA amplicon sequencing of stool samples from 27 mild cognitive patients, 47 AD, and 51 non-demented control subjects using tools compatible with compositional nature of microbiota. To characterize gut microbiota community types, we applied multiple machine learning techniques including partitioning around the medoid clustering, fitting probabilistic Dirichlet mixture model, Latent Dirichlet Allocation model, and performed topological data analysis for population scale microbiome stratification based on Mapper algorithm. These four distinct techniques all converge on Prevotella and Bacteroides partitioning of the gut microbiota across AD continuum while some methods provided fine scale resolution in partitioning the community landscape. The Signature taxa and neuropsychometric parameters together robustly classify the heterogenous groups within the cohort. Our results provide a framework for precision nutrition approaches and diet-based intervention studies targeting AD cohorts.
IMPORTANCE: The prevalence of AD worldwide is estimated to reach 131 million by 2050.
Most disease modifying treatments and drug trials have failed due partly to the heterogeneous and complex nature of the disease. Unlike other neurodegenerative diseases gut microbiota of AD patients is poorly studied. Recently initiated ambitious precision nutrition initiative or other diet-based interventions can potentially be more effective if the heterogeneous disease such as AD is deconstructed into multiple strata allowing for better identification of biomarkers across narrower patient population for improved results. Because gut microbiota is inherently integral part of the nutritional interventions there is unmet need for microbiota-informed stratification of AD clinical cohorts in nutritional studies. Our study fills in this gap and draws attention to the need for microbiota stratification as one of the essential steps for precision nutrition interventions. We demonstrate that while Prevotella and Bacteroides clusters are the consensus partitions the newly developed probabilistic methods can provide fine scale resolution in partitioning the AD gut microbiome landscape.
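The citation statement for this entry mentions using perplexity and log-likelihood to choose the number of LDA clusters. A minimal sketch of that idea is given here, under stated assumptions: perplexity is computed as the exponential of the negative per-token log-likelihood (natural log), the log-likelihoods are evaluated on held-out data, and the candidate with the lowest perplexity is kept. The function names and the numbers are hypothetical and do not come from the cited study.

```python
import math


def perplexity(log_likelihood, n_tokens):
    """Perplexity = exp(-log-likelihood per token); lower is better."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


def best_k(loglik_by_k, n_tokens):
    """Return the number of clusters K with the lowest perplexity."""
    return min(loglik_by_k, key=lambda k: perplexity(loglik_by_k[k], n_tokens))


# Made-up held-out log-likelihoods for candidate K values (illustration only).
heldout_loglik = {2: -6900.0, 3: -6500.0, 4: -6400.0, 5: -6450.0}
n_heldout_tokens = 1000

for k, ll in sorted(heldout_loglik.items()):
    print(f"K={k}: perplexity={perplexity(ll, n_heldout_tokens):.1f}")
print("selected K:", best_k(heldout_loglik, n_heldout_tokens))
```

With a fixed held-out set, ranking by perplexity and ranking by log-likelihood give the same choice; perplexity is only a rescaling that is easier to compare across data sets of different sizes.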