Motivation: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used to moderate this correction by removing genes with low signal, with little attention paid to their impact on downstream analyses. Results: We propose a data-driven method based on the Jaccard similarity index to calculate a filtering threshold for replicated RNA sequencing data. In comparisons with alternative data filters regularly used in practice, we demonstrate the effectiveness of our proposed method to correctly filter lowly expressed genes, leading to increased detection power for moderately to highly expressed genes. Interestingly, this data-driven threshold varies among experiments, highlighting the interest of the method proposed here. Availability: The proposed filtering method is implemented in the package available on Bioconductor. Contact: andrea.rau@jouy.inra.fr Supplementary information: Supplementary data are available at Bioinformatics online.
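As a rough illustration of how a Jaccard-based filtering threshold could be chosen, the sketch below binarizes each replicate at a candidate threshold, averages the pairwise Jaccard index between replicates, and keeps the candidate with the highest average. The function names, the use of a plain mean over replicate pairs, and the toy counts are assumptions made for illustration; this is not the interface or the exact criterion of the Bioconductor package mentioned in the abstract.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two boolean vectors: |a AND b| / |a OR b|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def similarity_at(replicates, s):
    """Mean pairwise Jaccard index after marking genes with count > s."""
    binary = [[value > s for value in rep] for rep in replicates]
    pairs = list(combinations(binary, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def choose_threshold(replicates, candidates):
    """Return the candidate threshold that maximizes replicate similarity."""
    return max(candidates, key=lambda s: similarity_at(replicates, s))


# Toy data (hypothetical): three replicates of one condition, six genes each.
replicates = [
    [0, 1, 0, 50, 80, 120],
    [1, 0, 2, 55, 70, 90],
    [0, 2, 1, 60, 75, 130],
]
best = choose_threshold(replicates, candidates=[0, 5, 100])
print(best)
```

In this toy example the low counts disagree across replicates at a threshold of 0, and a threshold of 100 splits the replicates on the last gene, so the intermediate candidate gives the most consistent set of retained genes.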
Abstract. Gaussian graphical models are widely utilized to infer and visualize networks of dependencies between continuous variables. However, inferring the graph is difficult when the sample size is small compared to the number of variables. To reduce the number of parameters to estimate in the model, we propose a non-asymptotic model selection procedure supported by strong theoretical guarantees based on an oracle type inequality and a minimax lower bound. The covariance matrix of the model is approximated by a block-diagonal matrix. The structure of this matrix is detected by thresholding the sample covariance matrix, where the threshold is selected using the slope heuristic. Based on the block-diagonal structure of the covariance matrix, the estimation problem is divided into several independent problems: subsequently, the network of dependencies between variables is inferred using the graphical lasso algorithm in each block. The performance of the procedure is illustrated on simulated data. An application to a real gene expression dataset with a limited sample size is also presented: the dimension reduction allows attention to be objectively focused on interactions among smaller subsets of genes, leading to a more parsimonious and interpretable modular network.
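The block-detection step described above can be sketched as follows, assuming that two variables belong to the same block whenever they are linked by a chain of sample covariances whose absolute value exceeds the threshold (that is, blocks are connected components of the thresholded matrix). The function name and the toy matrix are hypothetical, the threshold is fixed by hand rather than selected with the slope heuristic, and the per-block graphical lasso fit is left out.

```python
def detect_blocks(cov, threshold):
    """Group variable indices into blocks: connected components of the graph
    with an edge between i and j whenever |cov[i][j]| > threshold."""
    p = len(cov)
    seen = [False] * p
    blocks = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        stack = [start]
        block = []
        # Depth-first search over the thresholded covariance graph.
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(p):
                if not seen[j] and abs(cov[i][j]) > threshold:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks


# Toy sample covariance matrix (hypothetical values) with two clear blocks.
cov = [
    [1.00, 0.80, 0.05, 0.00],
    [0.80, 1.00, 0.02, 0.01],
    [0.05, 0.02, 1.00, 0.60],
    [0.00, 0.01, 0.60, 1.00],
]
print(detect_blocks(cov, threshold=0.1))
```

Each returned block could then be passed to a separate network inference run, which is what makes the estimation problem split into independent pieces.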
We introduce a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1372-2) contains supplementary material, which is available to authorized users.
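To make the idea of working directly on k-mers concrete, the sketch below counts every k-mer in a set of reads and builds a per-library abundance table, which is the kind of input a differential-abundance test would need. It is a minimal illustration under stated assumptions: the function names and toy reads are hypothetical, and the counting, filtering and statistical testing actually used by DE-kupl are not reproduced here.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping substring of length k across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_table(libraries, k):
    """Map each k-mer to its list of counts, one entry per library,
    in the order the libraries are given."""
    per_library = [count_kmers(reads, k) for reads in libraries.values()]
    kmers = sorted(set().union(*per_library))
    return {kmer: [c[kmer] for c in per_library] for kmer in kmers}


# Toy libraries (hypothetical reads), one read each.
libraries = {"sample_a": ["ACGT"], "sample_b": ["CGTT"]}
table = abundance_table(libraries, k=3)
print(table)
```

A k-mer present in one library and absent from the other, such as `ACG` here, is the simplest case of the reference-free variation the abstract refers to.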
Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data.
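The point about variance exceeding the mean can be checked with the standard moments of a univariate Poisson log-normal variable. The sketch assumes the simplest form, Y | Z ~ Poisson(exp(Z)) with Z ~ Normal(mu, sigma^2); the symbols `mu` and `sigma` and the function name are introduced here for illustration, and the hierarchical, Lasso-penalized network model of the paper is not reproduced.

```python
import math


def pln_moments(mu, sigma):
    """Mean and variance of Y where Y | Z ~ Poisson(exp(Z)), Z ~ N(mu, sigma^2).

    E[Y]   = exp(mu + sigma^2 / 2)
    Var[Y] = E[Y] + E[Y]^2 * (exp(sigma^2) - 1)
    """
    mean = math.exp(mu + sigma ** 2 / 2)
    variance = mean + mean ** 2 * math.expm1(sigma ** 2)
    return mean, variance


# With sigma = 0 the model reduces to a plain Poisson (variance equals mean);
# any sigma > 0 adds an extra term that grows with the square of the mean.
for sigma in (0.0, 0.5, 1.0):
    mean, variance = pln_moments(mu=2.0, sigma=sigma)
    print(sigma, round(mean, 2), round(variance, 2))
```

The extra term `mean ** 2 * expm1(sigma ** 2)` is what lets a model of this kind absorb inter-sample variability that a plain Poisson model cannot.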
A reference-free computational workflow for the discovery of unannotated RNA biomarkers retrieves novel noncoding RNAs with high potential for prostate cancer diagnosis.
Plants have developed a diversity of strategies to take up and store essential metals in order to colonize various types of soils including mineralized soils. Yet, our knowledge of the capacity of plant species to accumulate metals is still fragmentary across the plant kingdom. In this study, we have used the X-Ray Fluorescence technology to analyze metal concentration in a wide diversity of species of the Neotropical flora that has not been extensively investigated so far. In total, we screened more than 11 000 specimens representing about 5000 species from herbaria in Paris and Cuba. Our study provides a broad overview of the accumulation of metals such as manganese, zinc and nickel in the Neotropical flora. We report 30 new nickel hyperaccumulating species from Cuba, including the first records in the families Connaraceae, Melastomataceae, Polygonaceae, Santalaceae and Urticaceae. We also identified the first species from this region of the world that can be considered as manganese hyperaccumulators in the genera Lomatia (Proteaceae), Calycogonium (Melastomataceae), Ilex (Aquifoliaceae), Morella (Myricaceae) and Pimenta (Myrtaceae). Finally, we report the first zinc hyperaccumulator, Rinorea multivenosa (Violaceae), from the Amazonas region. The identification of species able to accumulate high amounts of metals will be instrumental in supporting the development of phytotechnologies to limit the impact of soil metal pollution in this region of the world.
Inter-species RNA-Seq datasets are increasingly common, and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have provided the community with a clear picture of the relative performance of the available methods in various settings. However, synthetic dataset simulation tools are still missing in the inter-species gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the Phylogenetic Comparative Methods literatures to generate realistic count datasets, while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study that reproduces the features of a recently published dataset containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for inter-species differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated in the R package compcodeR, freely available on Bioconductor.
Background: Most interactions between the host and its microbiota occur at the gut barrier, and primary colonizers are essential for gut barrier maturation in early life. The mother–offspring transmission of microorganisms is the most important factor influencing microbial colonization in mammals, and C-section delivery (CSD) is an important disruptive factor of this transfer. Recently, the deregulation of symbiotic host-microbe interactions in early life has been shown to alter the maturation of the immune system, predisposing the host to gut barrier dysfunction and inflammation. The main goal of this study is to decipher the role of early-life gut microbiota-barrier alterations and their links with later-life risks of intestinal inflammation in a murine model of CSD. Results: The higher sensitivity to chemically induced inflammation in CSD mice is related to excessive exposure to a too diverse microbiota too early in life. This early microbial stimulus has short-term consequences on host homeostasis. It switches the pup’s immune response to an inflammatory context and alters the epithelium structure and the mucus-producing cells, disrupting gut homeostasis. The presence of a too diverse microbiota in very early life involves a disproportionate short-chain fatty acid ratio and excessive antigen exposure across the vulnerable gut barrier in the first days of life, before gut closure. In addition, as shown by microbiota transfer experiments, the microbiota is causal in the high sensitivity of CSD mice to chemically induced colitis and in most of the phenotypical parameters found altered in early life. Finally, supplementation with lactobacilli, the main bacterial group impacted by CSD in mice, reverts the higher sensitivity to inflammation in ex-germ-free mice colonized by CSD pups’ microbiota.
Conclusions: Early-life gut microbiota-host crosstalk alterations related to CSD could be the linchpin behind the phenotypic effects that lead to increased susceptibility to induced inflammation later in life in mice.