Massively-parallel cDNA sequencing has opened the way to deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here, we present the Trinity methodology for de novo full-length transcriptome reconstruction, and evaluate it on samples from fission yeast, mouse, and whitefly – an insect whose genome has not yet been sequenced. Trinity fully reconstructs a large fraction of the transcripts present in the data, also reporting alternative splice isoforms and transcripts from recently duplicated genes. In all cases, Trinity performs better than other available de novo transcriptome assembly programs, and its sensitivity is comparable to methods relying on genome alignments. Our approach provides a unified and general solution for transcriptome reconstruction in any sample, especially in the complete absence of a reference genome.
De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net.
Much of a cell's activity is organized as a network of interacting modules: sets of genes coregulated to respond to different conditions. We present a probabilistic method for identifying regulatory modules from gene expression data. Our procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form 'regulator X regulates module Y under conditions W'. We applied the method to a Saccharomyces cerevisiae expression data set, showing its ability to identify functionally coherent modules and their correct regulators. We present microarray experiments supporting three novel predictions, suggesting regulatory roles for previously uncharacterized proteins.
DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
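The conditional-independence property mentioned above can be made concrete with a toy example. The sketch below is not the method of the paper; it assumes a hypothetical three-gene chain A -> B -> C with made-up binary conditional probability tables. It builds the joint distribution from the network factorization P(A, B, C) = P(A) P(B | A) P(C | B) and checks numerically that A and C are independent given B, even though they are dependent marginally.

```python
from itertools import product

# Hypothetical conditional probability tables for a chain A -> B -> C.
# All numbers are illustrative; 0 = "low expression", 1 = "high expression".
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [a][b]
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # indexed [b][c]


def joint(a, b, c):
    """Joint probability from the Bayesian-network factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def marginal(**fixed):
    """Sum the joint over every assignment consistent with `fixed`."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        assignment = {"a": a, "b": b, "c": c}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


def a_and_c_independent_given_b(tol=1e-12):
    """Check P(a, c | b) == P(a | b) * P(c | b) for all values."""
    for a, b, c in product((0, 1), repeat=3):
        p_b = marginal(b=b)
        lhs = marginal(a=a, b=b, c=c) / p_b
        rhs = (marginal(a=a, b=b) / p_b) * (marginal(b=b, c=c) / p_b)
        if abs(lhs - rhs) > tol:
            return False
    return True


def a_and_c_independent_marginally(tol=1e-12):
    """Check P(a, c) == P(a) * P(c) for all values."""
    for a, c in product((0, 1), repeat=2):
        if abs(marginal(a=a, c=c) - marginal(a=a) * marginal(c=c)) > tol:
            return False
    return True


if __name__ == "__main__":
    print("A independent of C given B:", a_and_c_independent_given_b())
    print("A independent of C marginally:", a_and_c_independent_marginally())
```

In this toy chain, knowing B screens off A from C. Learning a network from expression data amounts to searching for graph structures whose implied independencies of this kind fit the observed measurements.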
Genetic screens help infer gene function in mammalian cells, but it has remained difficult to assay complex phenotypes – such as transcriptional profiles – at scale. Here, we develop Perturb-seq, combining single-cell RNA-seq and CRISPR-based perturbations to perform many such assays in a pool. We demonstrate Perturb-seq by analyzing 200,000 cells across immune cells and cell lines, focusing on transcription factors regulating the response of dendritic cells to lipopolysaccharide (LPS). Perturb-seq accurately identifies individual gene targets, gene signatures, and cell states affected by individual perturbations and their genetic interactions. We posit new functions for regulators of differentiation, the anti-viral response, and mitochondrial function during immune activation. By decomposing many high-content measurements into the effects of perturbations, their interactions, and diverse cell metadata, Perturb-seq dramatically increases the scope of pooled genomic assays.
In a living cell, gene expression (the transcription of DNA to messenger RNA followed by translation to protein) occurs stochastically, as a consequence of the low copy number of DNA and mRNA molecules involved. These stochastic events of protein production are difficult to observe directly with measurements on large ensembles of cells owing to lack of synchronization among cells. Measurements so far on single cells lack the sensitivity to resolve individual events of protein production. Here we demonstrate a microfluidic-based assay that allows real-time observation of the expression of beta-galactosidase in living Escherichia coli cells with single-molecule sensitivity. We observe that protein production occurs in bursts, with the number of molecules per burst following an exponential distribution. We show that the two key parameters of protein expression, the burst size and frequency, can be either determined directly from real-time monitoring of protein production or extracted from a measurement of the steady-state copy number distribution in a population of cells. Application of this assay to probe gene expression in individual budding yeast and mouse embryonic stem cells demonstrates its generality. Many important proteins are expressed at low levels, and are thus inaccessible by current genomic and proteomic techniques. This microfluidic single-cell assay opens up possibilities for system-wide characterization of the expression of these low-copy-number proteins.
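One simple way to extract the two burst parameters from a steady-state copy number distribution is a method-of-moments fit. The sketch below rests on an assumption that the abstract does not state: that steady-state copy numbers are approximately gamma-distributed with shape a (bursts per cell cycle) and scale b (mean molecules per burst), so that mean = a·b and variance = a·b². The function name and the synthetic data are hypothetical and are not taken from the paper.

```python
import random
from statistics import fmean, pvariance


def estimate_burst_parameters(copy_numbers):
    """Method-of-moments estimate under an ASSUMED gamma steady state.

    Returns (burst_frequency, burst_size), where
      burst_size      = variance / mean      (gamma scale b)
      burst_frequency = mean**2 / variance   (gamma shape a)
    """
    mean = fmean(copy_numbers)
    var = pvariance(copy_numbers, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and variance")
    burst_size = var / mean
    burst_frequency = mean * mean / var
    return burst_frequency, burst_size


# Synthetic population of cells (illustrative values, not real data).
rng = random.Random(0)
true_frequency, true_size = 2.0, 5.0
samples = [rng.gammavariate(true_frequency, true_size) for _ in range(50_000)]

est_frequency, est_size = estimate_burst_parameters(samples)

if __name__ == "__main__":
    print(f"estimated burst frequency: {est_frequency:.2f}")
    print(f"estimated burst size:      {est_size:.2f}")
```

Under this assumption the burst size equals the Fano factor (variance over mean) of the population distribution, which is why a single snapshot of many cells can stand in for real-time monitoring of one cell.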
While many individual transcription factors are known to regulate hematopoietic differentiation, major aspects of the global architecture of hematopoiesis remain unknown. Here, we profiled gene expression in 38 distinct purified populations of human hematopoietic cells and used probabilistic models of gene expression and analysis of cis-elements in gene promoters to decipher the general organization of their regulatory circuitry. We identified modules of highly co-expressed genes, some of which are restricted to a single lineage, but most of which are expressed at variable levels across multiple lineages. We found densely interconnected cis-regulatory circuits and a large number of transcription factors that are differentially expressed across hematopoietic states. These findings suggest a more complex regulatory system for hematopoiesis than previously assumed.
Epigenetic information can be inherited through the mammalian germline, and represents a plausible transgenerational carrier of environmental information. To test whether transgenerational inheritance of environmental information occurs in mammals, we carried out an expression profiling screen for genes in mice that responded to paternal diet. Offspring of males fed a low-protein diet exhibited elevated hepatic expression of many genes involved in lipid and cholesterol biosynthesis, and decreased levels of cholesterol esters, relative to the offspring of males fed a control diet. Epigenomic profiling of offspring livers revealed numerous modest (~20%) changes in cytosine methylation depending on paternal diet, including reproducible changes in methylation over a likely enhancer for the key lipid regulator PPARα. These results, in conjunction with recent human epidemiological data, indicate that parental diet can affect cholesterol and lipid metabolism in offspring, and define a model system to study environmental reprogramming of the heritable epigenome.