The state and development of the intestinal epithelium are vital for infant health, yet understanding in this area has been limited by an inability to directly assess epithelial cell biology in the healthy newborn intestine. To address this, we have developed a novel, noninvasive molecular approach that applies next-generation RNA sequencing to stool samples containing intact epithelial cells in order to quantify intestinal gene expression. We then applied this technique to compare host gene expression in healthy term and extremely preterm infants. Bioinformatic analyses demonstrate repeatable detection of human mRNA expression, and network analysis shows that immune cell function and inflammation pathways are up-regulated in preterm infants. This study provides proof of concept that whole-transcriptome sequencing of stool-derived RNA can be used to examine the host epithelial transcriptome in neonates, which opens up opportunities for sequential monitoring of gut gene expression in response to dietary or therapeutic interventions.
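As a rough illustration of what quantifying and comparing host gene expression can involve once reads have been mapped, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change between two groups. This is a generic, minimal sketch and not the authors' pipeline; the gene names, counts and the pseudocount of 1 are hypothetical, and a real analysis would add proper statistical testing.

```python
import math

def cpm(counts):
    """Scale one sample's raw read counts to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def mean_cpm(samples, gene):
    """Average CPM of one gene across a group of samples."""
    return sum(cpm(s)[gene] for s in samples) / len(samples)

def log2_fold_change(group_a, group_b, gene, pseudocount=1.0):
    """log2 ratio of mean CPM in group_b over group_a; the pseudocount avoids log(0)."""
    return math.log2((mean_cpm(group_b, gene) + pseudocount)
                     / (mean_cpm(group_a, gene) + pseudocount))

# Hypothetical mapped-read counts for two genes in "term" and "preterm" samples.
term = [{"GENE_A": 100, "GENE_B": 900}, {"GENE_A": 200, "GENE_B": 1800}]
preterm = [{"GENE_A": 400, "GENE_B": 600}]
print(round(log2_fold_change(term, preterm, "GENE_A"), 3))  # prints 2.0
```

Normalizing by library size first matters here: the two term samples differ twofold in depth, yet both give GENE_A the same CPM.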
Background: Sequencing datasets consist of a finite number of reads that map to specific regions of a reference genome. Most effort in modeling these datasets focuses on detecting univariate differentially expressed genes. For classification, however, we must consider multiple genes and their interactions. Results: We therefore introduce a hierarchical multivariate Poisson model (MP) and the associated optimal Bayesian classifier (OBC) for classifying samples using sequencing data. Because no closed-form solution exists, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. We demonstrate superior or equivalent classification performance compared to typical classifiers for two synthetic datasets and over a range of classification problem difficulties. We also introduce the Bayesian minimum mean squared error (MMSE) conditional error estimator and demonstrate its computation over the feature space. In addition, we demonstrate superior or leading-class performance on an RNA-Seq dataset containing two lung cancer tumor types from The Cancer Genome Atlas (TCGA). Conclusions: Through model-based, optimal Bayesian classification, we demonstrate superior classification performance on both synthetic and real RNA-Seq datasets. A tutorial video and Python source code are available under an open source license at http://bit.ly/1gimnss. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0401-3) contains supplementary material, which is available to authorized users.
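The hierarchical MP model above has no closed-form OBC and needs MCMC. As a much simpler, hypothetical stand-in that still shows what an optimal Bayesian classifier does, the sketch below assumes each gene's count is Poisson with an independent Gamma prior on its rate (no gene-gene correlation, no library-size normalization). Under that conjugate assumption, the OBC's effective class-conditional density is the posterior predictive, a product of negative binomials, and can be evaluated exactly. The function names, the hyperparameters `alpha` and `beta`, and the training counts are illustrative assumptions, not the paper's model or released code.

```python
import math

def log_nb_predictive(x, shape, rate):
    """log P(x) for a Poisson count whose rate has a Gamma(shape, rate) posterior.
    Integrating out the rate gives a negative binomial (posterior predictive)."""
    return (math.lgamma(shape + x) - math.lgamma(shape) - math.lgamma(x + 1)
            + shape * math.log(rate / (rate + 1.0))
            - x * math.log(rate + 1.0))

def fit_class(samples, alpha=1.0, beta=1.0):
    """Conjugate update per gene: Gamma(alpha, beta) -> Gamma(alpha + sum x, beta + n)."""
    n = len(samples)
    return [(alpha + sum(s[j] for s in samples), beta + n)
            for j in range(len(samples[0]))]

def obc_predict(x, posteriors, class_priors):
    """Return the class index maximizing log prior + log effective density of x."""
    scores = [math.log(c) + sum(log_nb_predictive(xj, a, b)
                                for xj, (a, b) in zip(x, post))
              for post, c in zip(posteriors, class_priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical two-gene training counts for two classes.
class0 = [[2, 10], [3, 12], [1, 9]]
class1 = [[10, 2], [12, 3], [9, 1]]
posteriors = [fit_class(class0), fit_class(class1)]
print(obc_predict([11, 2], posteriors, [0.5, 0.5]))  # prints 1
print(obc_predict([2, 11], posteriors, [0.5, 0.5]))  # prints 0
```

Replacing the independent Gamma-Poisson assumption with a correlated hierarchical model is what removes this closed form and motivates sampling-based approaches such as MCMC.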
There is mounting evidence that noncoding microRNAs (miRNAs) are modulated by select chemoprotective dietary agents. For example, we recently demonstrated that the unique combination of dietary fish oil (containing n-3 fatty acids) plus pectin (fermented to butyrate in the colon) (FPA) up-regulates a subset of putative tumor suppressor miRNAs in intestinal mucosa and down-regulates their predicted target genes following carcinogen exposure, as compared to a control diet (corn oil plus cellulose, CCA). To further elucidate the biological effects of diet- and carcinogen-modulated miRNAs in the colon, we verified that miR-26b and miR-203 directly target PDE4B and TCF4, respectively. Perturbations in adult stem cell dynamics are generally believed to represent an early step in colon tumorigenesis. Therefore, to better understand how the colonic stem cell population responds to environmental factors such as diet and carcinogen, we additionally determined the effects of the chemoprotective FPA diet on miRNAs and mRNAs in colonic stem cells obtained from Lgr5-EGFP-IRES-creERT2 knock-in mice. Global miRNA profiling identified 26 miRNAs (P < 0.05) that were differentially expressed in Lgr5high stem cells compared to Lgr5negative differentiated cells. FPA treatment up-regulated miR-19b, miR-26b and miR-203 expression relative to CCA specifically in Lgr5high cells. In contrast, in Lgr5negative cells, only miR-19b and its indirect target PTK2B were modulated by the FPA diet. These data indicate for the first time that select dietary cues can impact stem cell regulatory networks, in part by modulating the steady-state levels of miRNAs. To our knowledge, this is the first study to use Lgr5+ reporter mice to determine the impact of diet and carcinogen on miRNA expression in colonic stem cells and their progeny.
Single Echo Acquisition (SEA) is a completely parallel MR imaging method that uses coil elements for spatial localization during reception, eliminating the need for phase-encoding repetitions. In this receive-only form, SEA imaging requires a phase compensation gradient whose value depends on coil geometry, imaging distance from the elements, and element orientation. Operating the arrays in transmit-receive mode, although it adds significant complexity, is one potential way to eliminate the restrictions imposed by the phase compensation gradient. This abstract examines a straightforward current-splitting technique that enables parallel transmission, in order to study the complicated field interactions of these array coils in transmit mode.
We present a procedure to generate a stochastic genetic regulatory network model consistent with pathway information. Using the stochastic dynamics of Markov chains, we produce a model constrained by prior knowledge despite the sometimes incomplete, time-independent, and often conflicting nature of these pathways. We apply Markov chain theory to study the model's long-run behavior and introduce a biologically important transformation to aid comparison of model predictions with real biological outcomes in the steady-state domain. Our technique produces biologically faithful models without the need for rate kinetics, detailed timing information, or complex inference procedures. To demonstrate the method, we produce a model using 28 pathways from the biological literature pertaining to the nuclear factor-κB (NF-κB) family of transcription factors. Predictions from this model in the steady-state domain are then validated against nine mouse knockout experiments.
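The long-run behavior mentioned above is the stationary (steady-state) distribution of the Markov chain. The paper derives its transition probabilities from pathway constraints; the sketch below instead uses a hypothetical two-gene Boolean rule plus a small random gene-flip probability `p` (a common construction that makes the chain ergodic), purely to show how a steady-state distribution can be computed by power iteration. The update rule, the value of `p` and all function names are illustrative assumptions.

```python
from itertools import product

def update(state):
    """Hypothetical rule: A' = A OR B, B' = A AND B."""
    a, b = state
    return (a | b, a & b)

def transition_matrix(states, p):
    """Each gene flips independently with probability p; if no gene flips
    (probability (1 - p)**n), the Boolean rule is applied instead."""
    n = len(states[0])
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        i = index[s]
        P[i][index[update(s)]] += (1 - p) ** n
        for t in states:
            if t != s:
                h = sum(x != y for x, y in zip(s, t))  # Hamming distance
                P[i][index[t]] += p ** h * (1 - p) ** (n - h)
    return P

def stationary(P, tol=1e-12, max_iter=100000):
    """Power iteration pi <- pi P, starting from the uniform distribution."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

states = list(product((0, 1), repeat=2))
pi = stationary(transition_matrix(states, 0.01))
print({s: round(v, 3) for s, v in zip(states, pi)})
# prints {(0, 0): 0.25, (0, 1): 0.005, (1, 0): 0.495, (1, 1): 0.25}
```

The transient state (0, 1) retains almost no long-run mass because the rule moves it to (1, 0), which is why that attractor ends up with roughly twice the mass of the others.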
Contemporary high-throughput technologies provide measurements of very large numbers of variables, but often with very small sample sizes. This paper proposes an optimization-based paradigm for utilizing prior knowledge to design better-performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under two widely used models for the uncertainty classes: ε-contamination and p-point classes. We discuss the applicability of these approximate expressions by posing the problem of finding optimal regularization parameters as minimization of the expected true error. Simulation results using the Zipf model show that the proposed paradigm yields classifiers that outperform traditional classifiers trained on data alone. Our application of interest involves discrete gene regulatory networks possessing labeled steady-state distributions. Given prior operational knowledge of the process, our goal is to build a classifier that accurately labels future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement in classifier performance over the classical data-only approach to classifier design. Companion website: http://gsp.tamu.edu/Publications/supplementary/shahrokh12a.
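To make the role of a regularization parameter concrete, the sketch below designs a discrete (histogram) classifier whose class-conditional estimates are a convex combination, weighted by `lam`, of the empirical training frequencies and prior-knowledge distributions. It then computes the classifier's true error exactly against known class-conditional distributions generated from a Zipf model. This convex-combination rule is a hypothetical simplification for illustration, not the paper's optimization-based paradigm. The bin count, training counts and `lam` values are assumptions, and the prior is taken to be exactly correct.

```python
def zipf(K, s=1.0):
    """Zipf probabilities over K bins: p(i) proportional to 1 / i**s."""
    w = [1.0 / i ** s for i in range(1, K + 1)]
    total = sum(w)
    return [v / total for v in w]

def blend(counts, prior, lam):
    """(1 - lam) * empirical frequencies + lam * prior-knowledge distribution."""
    n = sum(counts)
    return [(1 - lam) * c / n + lam * q for c, q in zip(counts, prior)]

def design(counts0, counts1, prior0, prior1, lam, c0=0.5):
    """Label each bin with the class whose weighted estimated probability is larger."""
    p0 = blend(counts0, prior0, lam)
    p1 = blend(counts1, prior1, lam)
    return [0 if c0 * a >= (1 - c0) * b else 1 for a, b in zip(p0, p1)]

def true_error(psi, true0, true1, c0=0.5):
    """Exact error of the bin labeling psi under the true class-conditional distributions."""
    return sum(c0 * a if label == 1 else (1 - c0) * b
               for label, a, b in zip(psi, true0, true1))

true0 = zipf(6)
true1 = true0[::-1]           # class 1 favors the opposite end of the bins
counts0 = [1, 0, 0, 0, 1, 0]  # hypothetical: two training points per class
counts1 = [0, 0, 1, 0, 0, 1]
for lam in (0.0, 1.0):
    psi = design(counts0, counts1, true0, true1, lam)
    print(lam, psi, round(true_error(psi, true0, true1), 4))
# prints:
# 0.0 [0, 0, 1, 0, 0, 1] 0.3469
# 1.0 [0, 0, 0, 1, 1, 1] 0.2517
```

With a perfect prior, `lam = 1` recovers the Bayes classifier. When prior knowledge is itself uncertain (modeled in the abstract by ε-contamination and p-point classes), an intermediate value has to be chosen, which is where minimizing the expected true error comes in.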