DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
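The conditional-independence property mentioned above can be illustrated with a small sketch. It assumes a hypothetical three-gene network (A regulates B and C) with made-up probability tables; none of the names or numbers come from the paper. It only shows how a Bayesian network factorizes a joint distribution and why B and C become independent once A is known.

```python
from itertools import product

# Hypothetical toy network A -> B, A -> C (not taken from the paper).
# Each gene is binary: 1 = "expressed", 0 = "not expressed".
P_A = {1: 0.3, 0: 0.7}
P_B_GIVEN_A = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_A = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # P_C_GIVEN_A[a][c]


def joint(a, b, c):
    """Joint probability from the network factorization P(A) P(B|A) P(C|A)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_A[a][c]


def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(b=1) or prob(a=1, c=0)."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


def conditional(target, given):
    """P(target | given); both arguments are dicts with disjoint keys."""
    return prob(**target, **given) / prob(**given)


if __name__ == "__main__":
    # B and C are dependent marginally, because both respond to A ...
    print("P(B=1)           =", prob(b=1))
    print("P(B=1 | C=1)     =", conditional({"b": 1}, {"c": 1}))
    # ... but knowing C adds nothing about B once A is given.
    print("P(B=1 | A=1)     =", conditional({"b": 1}, {"a": 1}))
    print("P(B=1 | A=1,C=1) =", conditional({"b": 1}, {"a": 1, "c": 1}))
```

In this toy setting, a learning procedure that sees only B and C would find them correlated; the network structure explains that correlation through the shared parent A rather than through a direct B-C interaction.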
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine three sets of gene expression data measured across sets of tumor and normal clinical samples: The first set consists of 2,000 genes, measured in 62 epithelial colon samples (Alon et al., 1999). The second consists of approximately 100,000 clones, measured in 32 ovarian samples (unpublished extension of the data set described in Schummer et al. (1999)). The third set consists of approximately 7,100 genes, measured in 72 bone marrow and peripheral blood samples (Golub et al., 1999). We examine the use of scoring methods, measuring separation of tissue type (e.g., tumors from normals) using individual gene expression levels. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of performing leave-one-out cross validation (LOOCV) experiments on the three data sets, employing a nearest neighbor classifier, SVM (Cortes and Vapnik, 1995), AdaBoost (Freund and Schapire, 1997) and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate a success rate of at least 90% in tumor versus normal classification, using sets of selected genes, with, as well as without, cellular-contamination-related members. These results are insensitive to the exact selection mechanism, over a certain range.
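The evaluation protocol described above (score individual genes, select a subset, classify with a nearest neighbor rule, estimate accuracy by LOOCV) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the separation score is a simple stand-in (difference of class means over summed standard deviations), the data are synthetic, and all function names are hypothetical. Gene selection is repeated inside each fold so that the held-out sample never influences which genes are chosen.

```python
import math
import random
from statistics import mean, pstdev


def separation_score(values, labels):
    """Stand-in per-gene score: |mean difference| / (sum of class std devs)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(pos) + pstdev(neg)
    return abs(mean(pos) - mean(neg)) / (spread + 1e-9)


def select_genes(samples, labels, k):
    """Return indices of the k genes with the highest separation score."""
    n_genes = len(samples[0])
    scores = [
        separation_score([s[g] for s in samples], labels) for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def nearest_neighbor_label(query, samples, labels, genes):
    """Label of the training sample closest to `query` on the selected genes."""
    def dist(s):
        return math.sqrt(sum((s[g] - query[g]) ** 2 for g in genes))

    best = min(range(len(samples)), key=lambda i: dist(samples[i]))
    return labels[best]


def loocv_accuracy(samples, labels, k):
    """Leave-one-out accuracy; genes are re-selected in every fold."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = select_genes(train_x, train_y, k)
        pred = nearest_neighbor_label(samples[i], train_x, train_y, genes)
        correct += int(pred == labels[i])
    return correct / len(samples)


def make_toy_data(n_per_class=10, n_genes=50, n_informative=5, seed=0):
    """Synthetic expression matrix: the first few genes are shifted in class 1."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 4.0 * label
            samples.append(row)
            labels.append(label)
    return samples, labels


if __name__ == "__main__":
    x, y = make_toy_data()
    print("LOOCV accuracy with 5 selected genes:", loocv_accuracy(x, y, k=5))
```

Selecting genes once on the full data set and then running LOOCV would let the held-out sample leak into the selection step and tends to inflate the estimated success rate, which is why the sketch keeps selection inside the loop.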
Summary: Generation of induced pluripotent stem cells is a reproducible but inefficient procedure. While genomic approaches have previously been used to study reprogramming, they average measurements across a large population of cells, the majority of which fail to induce pluripotency. Here, we use high-resolution, live time-lapse imaging to trace the reprogramming process from single donor cells to pluripotency-factor-positive colonies. Tracing back successfully reprogrammed colonies, we calculate a normalized cell-of-origin reprogramming efficiency that is limited to the pool of responding cells that form colonies. Our data provide a detailed physical description of the specific characteristics of reprogramming populations and reveal a robust, sequential trajectory from a somatic morphology and proliferative index to those of pluripotent cells, suggestive of an early specifying event. Our results clarify and expand previously proposed theoretical models, and provide important new insights into the still poorly defined process of direct reprogramming.
Ectopic expression of Oct4 and Sox2 in combination with both Klf4 and c-Myc (OSKM) [1], Klf4 alone [2, 3], Lin28 and Nanog [4] or Esrrb [5] is sufficient to reprogram somatic cells to a pluripotent state. These induced pluripotent stem (iPS) cells exhibit many of the molecular and functional characteristics of embryonic stem (ES) cells [6]. While iPS cell technology has progressed dramatically within the past three years (reviewed in [7]), the extended latency and low efficiency of reprogramming events within induced populations obscure efforts to characterize the underlying mechanism [8]. One simple model suggests that progressive proliferation allows for the accumulation of factor-mediated stochastic events that lead select members through a path towards pluripotency. In an alternative model, the likelihood of iPS cell colony formation is specified at an earlier time, after which the resulting path is more defined [8, 9, 10].
Population-level measurements typically done in reprogramming studies cannot distinguish between these stochastic and more sequential events. To study reprogramming at the single-cell level, we developed a live-cell, high-throughput imaging system based on previously characterized, clonally inducible murine embryonic fibroblasts (MEFs) [11] (Supplementary Figs. 1, 2). High-resolution transmitted light images (Fig. 1a, upper panels) taken along a 12-day time course from the initial fibroblasts to the final iPS cell colonies show that even at low starting cell densities it is virtually impossible to accurately follow the progeny of a single cell over the course of days. To facilitate tracking of individual cells, we transduced MEFs with one of several lentiviral vectors encoding different fluorescent proteins and seeded them at variable densities into unlabeled populations (Fig. 1a, lower panels, and Supplementary Movie 1). Our system allows us to trace multiple discrete reprogramming "lineages" from parental fibroblast to terminal iPS cell colony. We acquired i...
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine two sets of gene expression data measured across sets of tumor and normal clinical samples. One set consists of 2,000 genes, measured in 62 epithelial colon samples [1]. The second consists of ~100,000 clones, measured in 32 ovarian samples (unpublished extension of the data set described in [26]). We examine the use of scoring methods, measuring separation of tumors from normals using individual gene expression levels. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of performing leave-one-out cross validation (LOOCV) experiments on the two data sets, employing SVM [8], AdaBoost [13] and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate a success rate of at least 90% in tumor vs. normal classification, using sets of selected genes, with as well as without cellular-contamination-related members. These results are insensitive to the exact selection mechanism, over a certain range.
We present fine-grained dynamical models of gene transcription and develop methods for reconstructing them from gene expression data within the framework of a generative probabilistic model. Unlike previous works, we employ quantitative transcription rates, and simultaneously estimate both the kinetic parameters that govern these rates, and the activity levels of unobserved regulators that control them. We apply our approach to expression datasets from yeast and show that we can learn the unknown regulator activity profiles, as well as the binding affinity parameters. We also introduce a novel structure learning algorithm, and demonstrate its power to accurately reconstruct the regulatory network from those datasets.
Cell-to-cell variability in the timing of cell-fate changes can be advantageous for a population of single-celled organisms growing in a fluctuating environment. We study timing variability during meiosis in Saccharomyces cerevisiae, initiated upon nutritional starvation. We use time-lapse fluorescence microscopy to measure the timing of meiotic events in single cells and find that the duration of meiosis is highly variable between cells. This variability is concentrated between the beginning of starvation and the onset of early meiosis genes. Cell-cycle variability and nutritional history have little effect on this timing variability. Rather, variation in the production rate of the meiotic master regulator Ime1 and its gradual increase over time govern this variability, and cell size effects are channeled through Ime1. These results tie phenotypic variability with expression dynamics of a transcriptional regulator and provide a general framework for the study of temporal developmental processes.
Aging-related neurodegenerative disorders, such as Parkinson's, Alzheimer's and Huntington's diseases, are characterized by accumulation of protein aggregates in distinct neuronal cells that eventually die. In Huntington's disease, the protein huntingtin forms aggregates, and the age of disease onset is inversely correlated with the length of the protein's poly-glutamine (polyQ) tract. Using quantitative assays that estimate protein aggregates microscopically and capture them biochemically, we study aging-related aggregation of GFP-tagged, huntingtin-derived proteins with different polyQ lengths in Saccharomyces cerevisiae. We find that the short 25Q protein never aggregates whereas the long 103Q version always aggregates. However, the mid-size 47Q protein is soluble in young, logarithmically growing yeast but aggregates as the yeast cells enter the stationary phase and age, allowing us to plot an “aggregation timeline”. This aging-dependent aggregation was associated with increased cytotoxicity. We also show that two aging-related genes, SIR2 and HSF1, affect aggregation of the polyQ proteins. In the Δsir2 strain, the aging-dependent aggregation of the 47Q protein is aggravated, while overexpression of the transcription factor Hsf1 attenuates aggregation. Thus, the mid-size 47Q protein and our quantitative aggregation assays provide valuable tools to unravel the roles of genes and environmental conditions that affect aging-related aggregation.