Background: Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. First, although we are typically interested in comparing relative abundance of taxa in the ecosystem of two or more groups, we can only measure the taxon relative abundance in specimens obtained from the ecosystems. Because the comparison of taxon relative abundance in the specimen is not equivalent to the comparison of taxon relative abundance in the ecosystems, this presents a special challenge. Second, because the relative abundances of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because compositional data are constrained to the simplex (sum to 1) rather than free to vary in Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses. Results: Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. Rarefying more clearly clusters samples according to biological origin than other normalization techniques do for ordination metrics based on presence or absence. Alternate normalization measures are potentially vulnerable to artifacts due to library size. Effects on differential abundance testing: We build on a previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that the false discovery rates of many differential abundance-testing methods are not increased by rarefying itself, although of course rarefying results in a loss of sensitivity due to elimination of a portion of available data. 
For groups with large (~10×) differences in the average library size, rarefying lowers the false discovery rate. DESeq2, without addition of a constant, increased sensitivity on smaller datasets (<20 samples per group) but tended towards a higher false discovery rate with more samples, very uneven (~10×) library sizes, and/or compositional effects. For drawing inferences regarding taxon abundance in the ecosystem, analysis of composition of microbiomes (ANCOM) is not only very sensitive (for >20 samples per group) but also, critically, the only method tested that has good control of the false discovery rate. Conclusions: These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.
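The rarefying step discussed in this abstract can be illustrated with a minimal sketch. It assumes rarefying means subsampling each sample's reads without replacement to a common depth and discarding samples that fall below that depth. The function name `rarefy`, the toy count vectors, and the depth of 100 are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample taxon counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, which
    mirrors the usual practice of dropping shallow samples.
    """
    if sum(counts) < depth:
        return None
    rng = random.Random(seed)
    # Expand the count vector into one entry per read, labeled by taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    picked = Counter(rng.sample(reads, depth))
    return [picked.get(taxon, 0) for taxon in range(len(counts))]


# Hypothetical samples with very uneven library sizes (1000, 100, and 4 reads).
samples = {
    "A": [500, 300, 200, 0],
    "B": [50, 30, 15, 5],
    "C": [3, 1, 0, 0],
}
depth = 100
rarefied = {name: rarefy(counts, depth) for name, counts in samples.items()}
```

After this step every retained sample has the same total count, which is why presence/absence-based comparisons become less sensitive to library size. The cost is that reads above the chosen depth, and whole samples below it, are discarded.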
MiRNAs are regulatory molecules that can be packaged into exosomes and secreted from cells. Here, we show that adipose tissue macrophages (ATMs) in obese mice secrete miRNA-containing exosomes (Exos), which cause glucose intolerance and insulin resistance when administered to lean mice. Conversely, ATM Exos obtained from lean mice improve glucose tolerance and insulin sensitivity when administered to obese recipients. miR-155 is one of the miRNAs overexpressed in obese ATM Exos, and earlier studies have shown that PPARγ is a miR-155 target. Our results show that miR-155KO animals are insulin sensitive and glucose tolerant compared to controls. Furthermore, transplantation of WT bone marrow into miR-155KO mice mitigated this phenotype. Taken together, these studies show that ATMs secrete exosomes containing miRNA cargo. These miRNAs can be transferred to insulin target cell types through mechanisms of paracrine or endocrine regulation with robust effects on cellular insulin action, in vivo insulin sensitivity, and overall glucose homeostasis.
Disruption of healthy microbial communities has been linked to numerous diseases, yet microbial interactions are little understood. This is due in part to the large number of bacteria, and the much larger number of interactions (easily in the millions), making experimental investigation very difficult at best and necessitating the nascent field of computational exploration through microbial correlation networks. We benchmark the performance of eight correlation techniques on simulated and real data in response to challenges specific to microbiome studies: fractional sampling of ribosomal RNA sequences, uneven sampling depths, rare microbes and a high proportion of zero counts. Also tested is the ability to distinguish signals from noise, and detect a range of ecological and time-series relationships. Finally, we provide specific recommendations for correlation technique usage. Although some methods perform better than others, there is still considerable need for improvement in current techniques.
RNA interference (RNAi) has become a powerful technique for reverse genetics and drug discovery and, in both of these areas, large-scale high-throughput RNAi screens are commonly performed. The statistical techniques used to analyze these screens are frequently borrowed directly from small-molecule screening; however, small-molecule and RNAi data characteristics differ in meaningful ways. We examine the similarities and differences between RNAi and small-molecule screens, highlighting particular characteristics of RNAi screen data that must be addressed during analysis. Additionally, we provide guidance on selection of analysis techniques in the context of a sample workflow.
Off-target gene silencing can present a notable challenge in the interpretation of data from large-scale RNA interference (RNAi) screens. We performed a detailed analysis of off-targeted genes identified by expression profiling of human cells transfected with small interfering RNA (siRNA). Contrary to common assumption, analysis of the subsequent off-target gene database showed that overall identity makes little or no contribution to determining whether the expression of a particular gene will be affected by a given siRNA, except for near-perfect matches. Instead, off-targeting is associated with the presence of one or more perfect 3' untranslated region (UTR) matches with the hexamer or heptamer seed region (positions 2-7 or 2-8) of the antisense strand of the siRNA. These findings have strong implications for future siRNA design and the application of RNAi in high-throughput screening and therapeutic development.
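The seed-match criterion described in this abstract can be sketched as a simple string search. This is a minimal illustration, not the authors' pipeline: it assumes RNA sequences written 5' to 3', takes the seed as positions 2-7 (hexamer) or 2-8 (heptamer) of the antisense strand, and looks for the reverse complement of that seed in a 3' UTR. The function names and both sequences are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(antisense, length=6):
    """Return the seed: positions 2..(1 + length) of the antisense strand (1-based)."""
    return antisense[1:1 + length]


def seed_sites(antisense, utr, length=6):
    """Return 0-based start positions of perfect seed matches in `utr`.

    A match is an occurrence of the reverse complement of the seed,
    since the antisense strand pairs with the transcript.
    """
    site = seed(antisense, length).translate(COMPLEMENT)[::-1]
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


# Hypothetical siRNA antisense strand and 3' UTR fragment.
antisense = "UACGGAUCUAGCAUCGGAU"
utr = "CCAAGAUCCGUUUCCAUCCGUAA"

hexamer_hits = seed_sites(antisense, utr, length=6)
heptamer_hits = seed_sites(antisense, utr, length=7)
```

Under these assumptions, counting the hits per transcript gives a crude proxy for the "one or more perfect 3' UTR matches" feature. Overall sequence identity is deliberately not computed here, since the abstract reports that it contributes little except for near-perfect matches.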
Although recent microarray studies have provided evidence of RNA interference (RNAi)-mediated off-target gene modulation, little is known about whether these changes induce observable phenotypic outcomes. Here we show that a fraction of randomly selected small inhibitory RNAs (siRNAs) can induce changes in cell viability in a target-independent fashion. The observed toxicity requires an intact RNAi pathway and can be eliminated by the addition of chemical modifications that reduce off-target effects. Furthermore, an analysis of toxic and nontoxic duplexes identifies a strong correlation between the toxicity and the presence of a 4-base-pair motif (UGGC) in the RISC-entering strand of toxic siRNA. This article provides further evidence of siRNA-induced off-target effects generating a measurable phenotype and also provides an example of how such undesirable phenotypes can be mitigated by addition of chemical modifications to the siRNA.
We developed a systematic approach to map human genetic networks by combinatorial CRISPR-Cas9 perturbations coupled to robust analysis of growth kinetics. We targeted all pairs of 73 cancer genes with dual-guide RNAs in three cell lines, altogether comprising 141,912 tests of interaction. Numerous therapeutically relevant interactions were identified and these patterns replicated with combinatorial drugs at 75% precision. Based on these results we anticipate cellular context will be critical to synthetic-lethal therapies.
We have implemented in Python the COmparative GENomic Toolkit, a fully integrated and thoroughly tested framework for novel probabilistic analyses of biological sequences, devising workflows, and generating publication quality graphics. PyCogent includes connectors to remote databases, built-in generalized probabilistic techniques for working with biological sequences, and controllers for third-party applications. The toolkit takes advantage of parallel architectures and runs on a range of hardware and operating systems, and is available under the general public license from http://sourceforge.net/projects/pycogent. Rationale: The genetic divergence of species is affected by both DNA metabolic processes and natural selection. Processes contributing to genetic variation that are undetectable with intraspecific data may be detectable by inter-specific analyses because of the accumulation of signal over evolutionary time scales. As a consequence of the greater statistical power, there is interest in applying comparative analyses to address an increasing number and diversity of problems, in particular analyses that integrate sequence and phenotype. Significant barriers that hinder the extension of comparative analyses to exploit genome indexed phenotypic data include the narrow focus of most analytical tools, and the diverse array of data sources, formats, and tools available. Theoretically coherent integrative analyses can be conducted by combining probabilistic models of different aspects of genotype. Probabilistic models of sequence change underlie many core bioinformatics tasks, including similarity search, sequence alignment, phylogenetic inference, and ancestral state reconstruction. Probabilistic models allow usage of likelihood inference, a powerful approach from statistics, to establish the significance of differences in support of competing hypotheses. 
Linking different analyses through a shared and explicit probabilistic model of sequence change is thus extremely valuable, and provides a basis for generalizing analyses to more complex models of evolution (for example, to incorporate dependence between sites). Numerous studies have established how biological factors representing metabolic or selective influences can be represented in substitution models as specific parameters that affect rates of interchange between sequence motifs or the spatial occurrence of such rates [1][2][3][4]. Given this solid grounding, it is desirable to have a toolkit that allows flexible parameterization of probabilistic models and interchange of appropriate modules. There are many existing software packages that can manipulate biological sequences and structures, but few allow specification of both truly novel statistical models and detailed workflow control for genome scale datasets. Traditional phylogenetic analysis applications [5,6] typically provide a number of explicitly defined statistical models that are difficult to modify. One exception in which the parameterization of entirely novel substitution models was poss...