The liver and the kidney are the most common targets of chemical toxicity because of their major metabolic and excretory functions. Because the liver is directly involved in biotransformation, compounds in many commonly used drugs can affect it adversely. Most compounds have already been classified on a DILI-concern scale derived from FDA-approved drug labels, where drug-induced liver injury (DILI) refers to an adverse drug reaction affecting the liver. Many compounds do not exhibit hepatotoxicity at early stages of development, so it is important to detect anomalies at the gene expression level that could predict adverse reactions at later stages. In this study, a large collection of microarray data is used to investigate gene expression changes associated with hepatotoxicity. Using TG-GATEs, a large-scale toxicogenomics database, we present a computational strategy to classify compounds by toxicity level in human and animal models through patterns of gene expression. We combined machine learning algorithms with time series analysis to identify genes capable of classifying compounds by their DILI-concern category under FDA-approved labeling. The goal is to define gene expression profiles capable of distinguishing the different subtypes of hepatotoxicity. The study illustrates that expression profiling can be used to classify compounds according to different hepatotoxic levels; to label those that are currently labeled as undetermined; and to determine whether, at the molecular level, animal models are a good proxy for predicting hepatotoxicity in humans.
Time series studies can be far more informative than those that capture only a single moment in time. In transcriptomic data, however, time points are sparse, which creates a constant need for methods capable of extracting information from experiments of this kind. We propose a feature selection algorithm embedded in a hidden Markov model and applied to gene expression time course data from either a single biological condition or multiple conditions. For the latter, in a simple case-control study, features (genes) are selected under the assumption that the control samples show no change over time, while the case group must show at least one change. The proposed model reduces the feature space according to a two-state hidden Markov model, in which the two states represent change and no change in gene expression. Features are ranked according to three scores: the number of changes across time, the magnitude of those changes, and the quality of the replicates, measured by how much they deviate from the mean. An important highlight is that this strategy overcomes the small-sample limitation common in transcriptome experiments through a process of data transformation and rearrangement. To evaluate the method, we applied it to three publicly available data sets. The results show that the feature domain is reduced by up to 90%, leaving only a few relevant features while yielding findings consistent with those previously reported. Moreover, the strategy proved to be robust and stable, and it works on studies where sample size would otherwise be an issue: even with two biological replicates and/or three time points, the method performs well.
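The change/no-change idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Gaussian emissions, the parameter values (`MEANS`, `SDS`, `TRANS`, `START`), the use of absolute differences between consecutive time-point means as observations, and the function names are all assumptions made for illustration. The abstract's data transformation and rearrangement step is not reproduced here.

```python
import numpy as np

# Hypothetical parameters (assumptions, not taken from the paper).
# State 0 = no change, state 1 = change.
MEANS = np.array([0.0, 2.0])    # expected |step| in each state
SDS = np.array([0.5, 1.0])      # spread of |step| in each state
TRANS = np.array([[0.8, 0.2],   # row i, column j: P(next = j | current = i)
                  [0.5, 0.5]])
START = np.array([0.8, 0.2])    # initial state probabilities


def viterbi_two_state(obs, means, sds, trans, start):
    """Most likely state path for a 1-D observation sequence."""
    # Gaussian log-likelihood of every observation under both states.
    log_emit = (-0.5 * ((obs[:, None] - means) / sds) ** 2
                - np.log(sds * np.sqrt(2 * np.pi)))
    log_trans = np.log(trans)
    n = len(obs)
    score = np.log(start) + log_emit[0]
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans   # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)       # best predecessor of each state
        score = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.zeros(n, dtype=int)
    path[-1] = score.argmax()
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def score_gene(expr):
    """Return the three ranking scores for one gene.

    expr: array of shape (replicates, time points) with log-expression.
    """
    mean_profile = expr.mean(axis=0)
    steps = np.abs(np.diff(mean_profile))   # change between time points
    path = viterbi_two_state(steps, MEANS, SDS, TRANS, START)
    n_changes = int(path.sum())
    magnitude = float(steps[path == 1].sum())
    # Replicate quality: mean absolute deviation from the per-time mean.
    replicate_spread = float(np.abs(expr - mean_profile).mean())
    return n_changes, magnitude, replicate_spread


def select_case_control(case, control):
    """Keep genes with no change in control and at least one in case.

    case, control: dicts mapping gene name to (replicates, time) arrays.
    Selected genes are ranked by more changes, larger magnitude, and
    tighter replicates, in that order.
    """
    kept = []
    for gene in case:
        n_case, mag, spread = score_gene(case[gene])
        n_ctrl, _, _ = score_gene(control[gene])
        if n_ctrl == 0 and n_case >= 1:
            kept.append((gene, n_case, mag, spread))
    kept.sort(key=lambda g: (-g[1], -g[2], g[3]))
    return [g[0] for g in kept]
```

With two replicates and four time points, a gene whose mean profile jumps by roughly three log units between the second and third time points in the case group, and stays flat in the control group, is retained by `select_case_control`; a gene that is flat in both groups is dropped. In practice the emission and transition parameters would be estimated from the data rather than fixed by hand.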
Background: A probabilistic graphical model represents the properties of interest of a set of random variables, each of which corresponds to a node in a graph. The edges of the graph encode conditional independence properties, which are used to obtain valid factorizations of the joint probability distribution. There are different types of graphical models: Bayesian networks, for instance, are directed graphical models, whereas Markov random fields are undirected. Dynamic Bayesian networks (DBNs) are Bayesian networks that model time series, with directed edges that point in the direction of time. A directed edge in a Bayesian network can encode either a stochastic or a deterministic relationship between two variables. The structure of a graphical model describes how a set of random variables probabilistically represents natural and systematic processes, such as noisy measurements of the observed gene expression levels at each time point, and how these variables interact in the form of a gene regulatory network.
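As a worked illustration of such a factorization, the standard Bayesian network form and one common dynamic special case can be written out. The notation is an assumption introduced here, not taken from the source: $x_i$ are the network variables, $\mathrm{pa}(x_i)$ the parents of $x_i$ in the graph, $X_t$ the latent expression state at time $t$, and $Y_t$ its noisy measurement. The second equation additionally assumes a first-order Markov structure, which is only one possible choice for a DBN.

```latex
% Bayesian network: each variable depends only on its parents in the graph.
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)

% First-order dynamic case (assumed): edges point forward in time,
% and each measurement Y_t depends only on the current state X_t.
p(X_{1:T}, Y_{1:T}) = p(X_1) \prod_{t=2}^{T} p(X_t \mid X_{t-1})
                      \prod_{t=1}^{T} p(Y_t \mid X_t)
```

Under this assumed structure, the terms $p(X_t \mid X_{t-1})$ carry the regulatory interactions between consecutive time points, while $p(Y_t \mid X_t)$ accounts for measurement noise in the observed expression levels.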