Datasets collected by high-throughput sequencing (HTS) of 16S rRNA gene amplimers, metagenomes or metatranscriptomes are commonplace and are being used to study human disease states, ecological differences between sites, and the built environment. There is increasing awareness that microbiome datasets generated by HTS are compositional because they have an arbitrary total imposed by the instrument. However, many investigators are either unaware of this or assume specific properties of the compositional data. The purpose of this review is to alert investigators to the dangers inherent in ignoring the compositional nature of the data, and to point out that HTS datasets derived from microbiome studies can and should be treated as compositions at all stages of analysis. We briefly introduce compositional data, illustrate the pathologies that occur when compositional data are analyzed inappropriately, and finally give guidance and point to resources and examples for the analysis of microbiome datasets using compositional data analysis.
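To make the point about an arbitrary total concrete, the following Python sketch (an illustration, not code from the review) shows two consequences of treating HTS counts as compositions. Closing counts to proportions erases sequencing depth, and a bloom in one taxon lowers the proportions of the others even though their absolute abundances did not change, while the log-ratio between the unchanged taxa is preserved. The function names `closure` and `clr` and all example counts are hypothetical, and the sketch assumes strictly positive counts; zeros would need separate handling, for example a pseudocount.

```python
import math

def closure(counts):
    """Rescale non-negative counts so that the parts sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(parts):
    """Centered log-ratio transform: log of each part minus the mean log.

    Assumes every part is strictly positive; zeros would need a pseudocount
    or another replacement strategy first (not shown here).
    """
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical counts: the same community read at two sequencing depths.
shallow = [100, 200, 700]
deep = [1000, 2000, 7000]
print(closure(shallow) == closure(deep))  # True: the total carries no information
print(all(math.isclose(x, y) for x, y in zip(clr(shallow), clr(deep))))  # True

# Hypothetical counts: only taxon C increases in absolute abundance.
before = [100, 100, 100]
after = [100, 100, 400]
print(closure(before))  # A and B each about 0.333
print(closure(after))   # A and B each about 0.167, although they did not change

# The log-ratio between the unchanged taxa A and B is identical in both samples.
print(math.log(before[0] / before[1]), math.log(after[0] / after[1]))  # 0.0 0.0
```

Working with log-ratios such as CLR values, rather than with the proportions themselves, is one common way to analyze data as compositions; the CLR shown here is a single choice among several log-ratio transforms.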
The composition of the human gut microbiota is linked to health and disease, but knowledge of individual microbial species is needed to decipher their biological roles. Despite extensive culturing and sequencing efforts, the complete bacterial repertoire of the human gut microbiota remains undefined. Here we identify 1,952 uncultured candidate bacterial species by reconstructing 92,143 metagenome-assembled genomes from 11,850 human gut microbiomes. These uncultured genomes substantially expand the known species repertoire of the collective human gut microbiota, with a 281% increase in phylogenetic diversity. Although the newly identified species are less prevalent in well-studied populations compared to reference isolate genomes, they improve classification of understudied African and South American samples by more than 200%. These candidate species encode hundreds of newly identified biosynthetic gene clusters and possess a distinctive functional capacity that might explain their elusive nature. Our work expands the known diversity of uncultured gut bacteria, which provides unprecedented resolution for taxonomic and functional characterization of the intestinal microbiota.
Background: Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances, with the result that the analyses are more robust and reproducible.
Results: Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools, and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples.
Conclusion: Statistical analysis of high-throughput sequencing datasets composed of per-feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a similar approach.
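The abstract says ALDEx2 uses Bayesian methods to infer technical and statistical error. One way to sketch that idea, in Python rather than with the R package itself, is to draw Monte Carlo instances of a sample's underlying proportions from a Dirichlet distribution parameterized by the observed counts plus a small prior, then CLR-transform each instance. This is a simplified illustration under stated assumptions, not ALDEx2's implementation: the prior of 0.5, the 128 instances, the function names and the example counts are all chosen for the example.

```python
import math
import random
import statistics

def dirichlet_sample(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def clr(parts):
    """Centered log-ratio transform of strictly positive parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def mc_clr_instances(counts, n_instances=128, prior=0.5, seed=1):
    """Return n_instances CLR-transformed Monte Carlo draws for one sample.

    Adding `prior` to every count (an assumption in this sketch) keeps all
    Dirichlet parameters positive, so features with zero reads still get
    finite CLR values.
    """
    rng = random.Random(seed)
    alphas = [c + prior for c in counts]
    return [clr(dirichlet_sample(alphas, rng)) for _ in range(n_instances)]

def expected_clr(instances):
    """Average each feature's CLR value across the Monte Carlo instances."""
    return [statistics.mean(feature) for feature in zip(*instances)]

# Hypothetical read counts for four features in one sample; feature 3 has no reads.
counts = [250, 40, 0, 710]
instances = mc_clr_instances(counts)
print(len(instances), len(instances[0]))  # 128 4
print([round(v, 2) for v in expected_clr(instances)])
```

Features with few reads vary much more across instances than high-count features, which is how this kind of sampling carries count-level uncertainty into later steps. In the published tool, as we understand it, per-feature statistics are then computed on each instance and summarized across instances; that step is not shown here.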
Alignments and program files can be found in the Supplementary Information.
Experimental variance is a major challenge when dealing with high-throughput sequencing data. This variance has several sources: sampling replication, technical replication, variability within biological conditions, and variability between biological conditions. The high per-sample cost of RNA-Seq often precludes the large number of experiments needed to partition observed variance into these categories as per standard ANOVA models. We show that the partitioning of within-condition to between-condition variation cannot reasonably be ignored, whether in single-organism RNA-Seq or in Meta-RNA-Seq experiments, and further find that commonly used RNA-Seq analysis tools, as described in the literature, do not enforce the constraint that the sum of relative expression levels must be one, and thus report expression levels that are systematically distorted. These two factors lead to misleading inferences if not properly accommodated. As it is usually only the biological between-condition and within-condition differences that are of interest, we developed ALDEx, an ANOVA-like differential expression procedure, to identify genes with greater between- to within-condition differences. We show that the presence of differential expression and the magnitude of these comparative differences can be reasonably estimated with even very small sample sizes.
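The idea of comparing between-condition with within-condition differences can be illustrated with a small per-feature statistic. The sketch below is a simplified stand-in, not the ALDEx procedure: it assumes the values are already log-ratio (for example CLR) transformed, uses the absolute difference in condition means as the between-condition term and the larger of the two within-condition standard deviations as the within-condition term. The function name `between_within_ratio` and the example values are hypothetical.

```python
import math
import statistics

def between_within_ratio(cond_a, cond_b):
    """Between-condition difference divided by within-condition spread.

    cond_a, cond_b: values of ONE feature (e.g. CLR-transformed abundances)
    in the replicates of two conditions; each needs at least 2 replicates.
    Larger values mean the conditions differ by more than replicates vary.
    """
    between = abs(statistics.mean(cond_a) - statistics.mean(cond_b))
    within = max(statistics.stdev(cond_a), statistics.stdev(cond_b))
    if within == 0:
        # No replicate variation at all: any nonzero difference dominates it.
        return math.inf if between > 0 else 0.0
    return between / within

# Hypothetical CLR values for two genes, three replicates per condition.
gene_x = ([2.0, 2.2, 1.8], [0.9, 1.1, 1.0])  # shift of 1.0, small replicate spread
gene_y = ([2.0, 0.5, 3.5], [1.5, 1.0, 2.5])  # similar means, large replicate spread
print(round(between_within_ratio(*gene_x), 2))  # 5.0
print(round(between_within_ratio(*gene_y), 2))  # 0.22
```

Taking the larger within-condition spread is a deliberately conservative choice for this illustration. With only two or three replicates per condition, the spread estimates are themselves noisy, so a single ratio like this should be read as illustrative rather than as a statistical test.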
Background: Fecal bacteriotherapy (‘stool transplant’) can be effective in treating recurrent Clostridium difficile infection, but concerns about donor infection transmission and patient acceptance limit its use. Here we describe the use of a stool substitute preparation, made from purified intestinal bacterial cultures derived from a single healthy donor, to treat recurrent C. difficile infection that had failed to respond to repeated courses of standard antibiotics. Thirty-three isolates were recovered from a healthy donor stool sample. Two patients who had failed at least three courses of metronidazole or vancomycin underwent colonoscopy, and the mixture was infused throughout the right and mid colon. Pre-treatment and post-treatment stool samples were analyzed by 16S rRNA gene sequencing using the Ion Torrent platform.
Results: Both patients were infected with the hypervirulent C. difficile strain ribotype 078. Following stool substitute treatment, each patient reverted to their normal bowel pattern within 2 to 3 days and remained symptom-free at 6 months. The analysis demonstrated that rRNA sequences found in the stool substitute were rare in the pre-treatment stool samples but constituted over 25% of the sequences up to 6 months after treatment.
Conclusion: This proof-of-principle study demonstrates that a stool substitute mixture comprising a multi-species community of bacteria is capable of curing antibiotic-resistant C. difficile colitis. This benefit correlates with major changes in stool microbial profile, and these changes reflect isolates from the synthetic mixture.
Trial registration: Clinical trial registration number: ClinicalTrials.gov NCT01372943
In the United States, 1 in 8 women will be diagnosed with breast cancer in her lifetime. Along with genetics, the environment contributes to disease development, but what these exact environmental factors are remains unknown. We have previously shown that breast tissue is not sterile but contains a diverse population of bacteria. We therefore believe that the host's local microbiome could be modulating the risk of breast cancer development. Using 16S rRNA amplicon sequencing, we show that bacterial profiles differ between normal adjacent tissue from women with breast cancer and tissue from healthy controls. Women with breast cancer had higher relative abundances of Bacillus, Enterobacteriaceae and Staphylococcus. Escherichia coli (a member of the Enterobacteriaceae family) and Staphylococcus epidermidis, isolated from breast cancer patients, were shown to induce DNA double-strand breaks in HeLa cells using the histone-2AX (H2AX) phosphorylation (γ-H2AX) assay. We also found that microbial profiles are similar between normal adjacent tissue and tissue sampled directly from the tumor. This study raises important questions as to what role the breast microbiome plays in disease development or progression and how we can manipulate this for possible therapeutics or prevention.
IMPORTANCE This study shows that bacterial profiles in breast tissue differ between healthy women and those with breast cancer. Higher relative abundances of bacteria that had the ability to cause DNA damage in vitro were detected in breast cancer patients, as was a decrease in some lactic acid bacteria, known for their beneficial health effects, including anticarcinogenic properties. This study raises important questions as to the role of the mammary microbiome in modulating the risk of breast cancer development.
In recent years, a greater appreciation for the microbes inhabiting human body sites has emerged. In the female mammary gland, milk has been shown to contain bacterial species, ostensibly reaching the ducts from the skin. We decided to investigate whether there is a microbiome within the mammary tissue. Using 16S rRNA sequencing and culture, we analyzed breast tissue from 81 women with and without cancer in Canada and Ireland. A diverse population of bacteria was detected within tissue collected from sites all around the breast in women aged 18 to 90, not all of whom had a history of lactation. The principal phylum was Proteobacteria. The most abundant taxa in the Canadian samples were Bacillus (11.4%), Acinetobacter (10.0%), Enterobacteriaceae (8.3%), Pseudomonas (6.5%), Staphylococcus (6.5%), Propionibacterium (5.8%), Comamonadaceae (5.7%), Gammaproteobacteria (5.0%), and Prevotella (5.0%). In the Irish samples the most abundant taxa were Enterobacteriaceae (30.8%), Staphylococcus (12.7%), Listeria welshimeri (12.1%), Propionibacterium (10.1%), and Pseudomonas (5.3%). None of the subjects had signs or symptoms of infection, but the presence of viable bacteria was confirmed in some samples by culture. The extent to which these organisms play a role in health or disease remains to be determined.
The human body is home to a large and diverse population of bacteria with properties that are both harmful and beneficial to health (1-6), and for this reason there has been a strong push in recent years to fully characterize the bacteria associated with different parts of the body under different health conditions. These studies have been made possible with the use of deep-sequencing technologies, and sites once thought of as sterile, such as the stomach, bladder, and lungs, have now been shown to harbor an indigenous microbiota (7-9).
We hypothesized that microbes may also be present in breast tissue given the known presence of bacteria in human milk (10). This is not surprising considering that skin and oral bacteria have access to the mammary ducts through the nipple (11), while some recent studies suggest that these bacteria originate from the mother's gastrointestinal tract (12). We reasoned that, given the nutrient-rich fatty composition of the female breast, the widespread vasculature and lymphatics, and the diffuse location of the lobules and ducts leading from the nipple, bacteria would be widespread within the mammary glands, irrespective of lactation. Thus, the objective of the study was to determine, using culture and nonculture methods, whether breast tissue contains a microbiome. To ensure that the results obtained were not an artifact of a single demographic, tissue was collected from two distant countries, Canada and Ireland.
MATERIALS AND METHODS
Clinical samples and study design (Canadian samples). Ethical approval for this study was obtained from Western Research Ethics Board and Lawson Health Research Institute, London, Ontario, Canada. Patie...