Supplementary data are available at Bioinformatics online.
As computational and mathematical methods become increasingly central to studies of complicated reaction systems, it will become ever more important to identify the assumptions our models must make and to determine when those assumptions are valid. Here, we examine that question for viral capsid assembly by studying the 'pathway complexity' of model capsid assembly systems, which we informally define as the number of reaction pathways and intermediates one must consider to accurately describe a given system. We use two model types for this study: ordinary differential equation models, which allow us to precisely and deterministically compare the accuracy of capsid models under different degrees of simplification, and stochastic discrete event simulations, which allow us to sample the use of reaction intermediates across a wide parameter space that admits an extremely large number of possible reaction pathways. The models provide complementary information in support of a common conclusion: the ability of simple pathway models to adequately explain capsid assembly kinetics varies considerably across the space of biologically meaningful assembly parameters. These studies provide grounds for caution about our ability to reliably represent real systems with simple models and to extrapolate results from one set of assembly conditions to another. In addition, the analysis tools developed for this study are likely to have broader use in the analysis and efficient simulation of large reaction systems.
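The ODE side of the comparison above can be illustrated with a toy example. The sketch below is not the paper's model: it integrates hypothetical mass-action ODEs for a three-species assembly pathway (monomer, dimer, trimer) with forward Euler, using illustrative rate constants. Total subunit mass m + 2d + 3t is conserved exactly by the update.

```python
# Toy sketch (hypothetical rates, not the paper's model): mass-action ODEs
# for the pathway 2 monomer <-> dimer, monomer + dimer <-> trimer,
# integrated by forward Euler.

def simulate(kf=1e-3, kb=1e-4, m0=100.0, dt=0.01, steps=20000):
    m, d, t = m0, 0.0, 0.0          # monomer, dimer, trimer concentrations
    for _ in range(steps):
        f1 = kf * m * m - kb * d    # net flux of 2 monomer <-> dimer
        f2 = kf * m * d - kb * t    # net flux of monomer + dimer <-> trimer
        m += dt * (-2.0 * f1 - f2)
        d += dt * (f1 - f2)
        t += dt * f2
    return m, d, t
```

Because each flux enters the three equations with stoichiometric coefficients summing to zero in mass units, the quantity m + 2d + 3t is invariant step by step, a useful sanity check when comparing simplified pathway models.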
Synthetic lethal interactions in cancer hold the potential for successful combined therapies, which would avoid the difficulties of single molecule-targeted treatment. Identification of interactions that are specific for human tumors is an open problem in cancer research. This work aims at deciphering synthetic sick or lethal interactions directly from somatic alteration, expression and survival data of cancer patients. To this end, we look for pairs of genes and their alterations or expression levels that are "avoided" by tumors and "beneficial" for patients. Thus, candidates for synthetic sickness or lethality (SSL) interaction are identified as gene pairs whose combination of states is under-represented in the data. Our main methodological contribution is a quantitative score that allows ranking of the candidate SSL interactions according to evidence found in patient survival. Applying this analysis to glioblastoma data, we collect 1,956 synthetic sick or lethal partners for 85 abundantly altered genes, most of which show extensive copy number variation across the patient cohort. We rediscover and interpret the known interaction between TP53 and PLK1, and provide insight into the mechanism behind EGFR interacting with AKT2, but not AKT1 or AKT3.
Cox model analysis determines 274 of the identified interactions as having significant impact on overall survival in glioblastoma, which is more informative than a standard survival predictor based on patient age.

Single molecule-targeted therapies, the dominant tool for cancer treatment, have limited efficacy due to toxicity [1] and rapid development of drug resistance [2-4]. Combination therapies based on synthetic sickness or lethality (SSL) are hoped to overcome these difficulties [5] and promise successful treatment strategies [6,7]. The mechanism behind SSL-based therapy is that while targeting the individual genes of a given interacting pair has only a moderate effect, targeting both either kills the tumor or significantly decreases its viability. Compared to the comprehensive collection of synthetic lethal gene pairs in yeast [8], the set of known SSL interactions in human cancer is disappointingly small [9] and their identification remains an open problem. Experimental approaches are overwhelmed by the quadratic number of possible pairs and can only be applied to cell lines [7]. High-throughput studies focus on single, abundantly altered genes (called primary genes), such as KRAS [10] or PI3K [11], and screen through their possible partner genes. Alternatively, a small set of plausible genes is selected for testing, for example based upon their function in DNA repair [12,13]. Existing predictive computational methods [14-17] require large training datasets of known genetic interactions that are only available for a few simple model organisms [18]. Genome-wide association studies [19-21] are limited to estimating cancer risk associated with certain single-nucleotide polymorphisms in the germline. Other approaches identify SSL interactions in humans based on evolutionary conservation to yeast, which are likely inco...
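The paper's actual SSL score incorporates patient survival; as a rough illustration of the under-representation idea alone, the hypothetical score below compares the observed number of doubly altered patients against the count expected if the two genes' alterations occurred independently. Strongly negative values flag gene pairs whose co-alteration is "avoided" by tumors.

```python
import math

# Illustrative under-representation score (a hypothetical simplification,
# not the paper's survival-based score). Inputs are patient counts for the
# four joint alteration states of genes A and B.

def underrepresentation_score(both, a_only, b_only, neither, pseudo=0.5):
    n = both + a_only + b_only + neither
    p_a = (both + a_only) / n      # marginal alteration frequency of gene A
    p_b = (both + b_only) / n      # marginal alteration frequency of gene B
    expected = p_a * p_b * n       # co-alterations expected under independence
    # log-ratio of observed to expected; pseudocount guards against zeros
    return math.log((both + pseudo) / (expected + pseudo))
```

For example, a cohort in which each gene is altered in roughly 40% of patients but only one patient carries both alterations yields a clearly negative score, marking the pair as a candidate SSL interaction.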
Models of reaction chemistry based on the stochastic simulation algorithm (SSA) have become a crucial tool for simulating complicated biological reaction networks due to their ability to handle extremely complicated networks and to represent noise in small-scale chemistry. These methods can, however, become highly inefficient for stiff reaction systems, those in which different reaction channels operate on widely varying time scales. In this paper, we develop two methods for accelerating sampling in SSA models: an exact method and a scheme allowing for sampling accuracy up to any arbitrary error bound. Both methods depend on the analysis of the eigenvalues of continuous time Markov models that define the behavior of the SSA. We show how each can be applied to accelerate sampling within known Markov models or to subgraphs discovered automatically during execution. We demonstrate these methods for two applications of sampling in stiff SSAs that are important for modeling self-assembly reactions: sampling breakage times for multiply connected bond networks and sampling assembly times for multisubunit nucleation reactions. We show theoretically and empirically that our eigenvalue methods provide substantially reduced sampling times for a large class of models used in simulating self-assembly. These techniques are also likely to have broader use in accelerating SSA models so as to apply them to systems and parameter ranges that are currently computationally intractable.
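For context, a minimal direct-method Gillespie SSA for a two-channel toy system looks as follows. This is a sketch of the kind of simulator the acceleration methods target, not the paper's implementation; the species and rates are hypothetical.

```python
import random

# Direct-method Gillespie SSA for the toy system A -> B (rate k1) and
# B -> A (rate k2). Waiting times are exponential in the total propensity;
# the firing channel is chosen proportionally to its propensity.

def gillespie(a, b, k1=1.0, k2=0.5, t_end=10.0, rng=random.Random(0)):
    t = 0.0
    while True:
        rates = [k1 * a, k2 * b]        # propensities of the two channels
        total = sum(rates)
        if total == 0.0:
            break                        # no reaction can fire
        t += rng.expovariate(total)      # time to the next reaction event
        if t > t_end:
            break
        if rng.random() * total < rates[0]:
            a, b = a - 1, b + 1          # fire A -> B
        else:
            a, b = a + 1, b - 1          # fire B -> A
    return a, b
```

Stiffness arises in exactly this loop: when one channel's propensity dwarfs the other's, the simulator spends nearly all its steps on fast, uninformative events, which is what the eigenvalue-based acceleration addresses.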
Accurate reconstruction of phylogenies remains a key challenge in evolutionary biology. Most biologically plausible formulations of the problem are formally NP-hard, with no known efficient solution. The standard in practice is fast heuristic methods that are empirically known to work very well in general but can yield results arbitrarily far from optimal. Practical exact methods, which yield exponential worst-case running times but generally much better times in practice, provide an important alternative. We report progress in this direction by introducing a provably optimal method for the weighted multi-state maximum parsimony phylogeny problem. The method is based on generalizing the notion of the Buneman graph, a construction key to efficient exact methods for binary sequences, so as to apply to sequences with arbitrary finite numbers of states with arbitrary state transition weights. We implement an integer linear programming (ILP) method for the multi-state problem using this generalized Buneman graph and demonstrate that the resulting method is able to solve data sets that are intractable by prior exact methods in run times comparable with popular heuristics. We further show on a collection of less difficult problem instances that the ILP method leads to large reductions in average-case run times relative to leading heuristics on moderately hard problems. Our work provides the first method for provably optimal maximum parsimony phylogeny inference that is practical for multi-state data sets of more than a few characters.
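The per-tree cost being minimized here can be computed with Sankoff's classic dynamic program, which already handles arbitrary finite state sets and arbitrary transition weights on a fixed tree; the paper's contribution is the far harder optimization over tree topologies. A sketch of the fixed-tree computation:

```python
# Sankoff's dynamic program for weighted multi-state parsimony on a FIXED
# binary tree. A leaf is an observed state (int); an internal node is a
# pair (left, right). cost[s][t] is the weight of an s -> t state change.

INF = float("inf")

def sankoff(node, cost, n_states):
    """Return the vector c with c[s] = minimum weighted parsimony cost of
    the subtree rooted here, given the root is assigned state s."""
    if isinstance(node, int):
        # a leaf is pinned to its observed state
        return [0.0 if s == node else INF for s in range(n_states)]
    left, right = (sankoff(child, cost, n_states) for child in node)
    return [sum(min(cost[s][t] + child[t] for t in range(n_states))
                for child in (left, right))
            for s in range(n_states)]
```

With unit transition costs on two states, the tree ((0, 1), 0) needs exactly one state change, so the minimum over the root vector is 1.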
Much modern work in phylogenetics depends on statistical sampling approaches to phylogeny construction to estimate probability distributions of possible trees for any given input data set. Our theoretical understanding of sampling approaches to phylogenetics remains far less developed than that for optimization approaches, however, particularly with regard to the number of sampling steps needed to produce accurate samples of tree partition functions. Despite the many advantages in principle of being able to sample trees from sophisticated probabilistic models, we have little theoretical basis for concluding that the prevailing sampling approaches do in fact yield accurate samples from those models within realistic numbers of steps. We propose a novel approach to phylogenetic sampling intended to be both efficient in practice and more amenable to theoretical analysis than the prevailing methods. The method depends on replacing the standard tree rearrangement moves with an alternative Markov model in which one solves a theoretically hard but practically tractable optimization problem on each step of sampling. The resulting method can be applied to a broad range of standard probability models, yielding practical algorithms for efficient sampling and rigorous proofs of accurate sampling for heated versions of some important special cases. We demonstrate the efficiency and versatility of the method by an analysis of uncertainty in tree inference over varying input sizes. In addition to providing a new practical method for phylogenetic sampling, the technique is likely to prove applicable to many similar problems involving sampling over combinatorial objects weighted by a likelihood model.
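The proposal/accept structure that standard phylogenetic samplers instantiate with tree rearrangement moves is ordinary Metropolis-Hastings. A generic sketch over a small discrete space (hypothetical unnormalized weights, a symmetric uniform proposal) shows the loop whose moves the proposed method replaces:

```python
import random

# Generic Metropolis-Hastings over a discrete state space. In phylogenetic
# samplers the states are trees and the proposal is a rearrangement move;
# here both are stand-ins for illustration.

def metropolis_hastings(weights, steps, rng=random.Random(1)):
    n = len(weights)
    state = 0
    counts = [0] * n
    for _ in range(steps):
        proposal = rng.randrange(n)    # symmetric uniform proposal
        # accept with probability min(1, w(proposal) / w(state))
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return counts
```

The theoretical difficulty the abstract describes is exactly how many iterations of this loop are needed before the visit counts approximate the target distribution; for tree spaces, that mixing time is poorly understood.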
Stable random variables are motivated by the central limit theorem for densities with (potentially) unbounded variance and can be thought of as natural generalizations of the Gaussian distribution to skewed and heavy-tailed phenomena. In this paper, we introduce α-stable graphical (α-SG) models, a class of multivariate stable densities that can also be represented as Bayesian networks whose edges encode linear dependencies between random variables. One major hurdle to the extensive use of stable distributions is the lack of a closed-form analytical expression for their densities. This makes penalized maximum-likelihood based learning computationally demanding. We establish theoretically that the Bayesian information criterion (BIC) can asymptotically be reduced to the computationally more tractable minimum dispersion criterion (MDC) and develop StabLe, a structure learning algorithm based on MDC. We use simulated datasets for five benchmark network topologies to empirically demonstrate how StabLe improves upon ordinary least squares (OLS) regression. We also apply StabLe to microarray gene expression data for lymphoblastoid cells from 727 individuals belonging to eight global population groups. We establish that StabLe improves test set performance relative to OLS via ten-fold cross-validation. Finally, we develop SGEX, a method for quantifying differential expression of genes between different population groups.
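Although stable densities lack a closed form, sampling from them is straightforward via the standard Chambers-Mallows-Stuck transform, which maps a uniform and an exponential variate to a symmetric α-stable draw. A sketch (this is background on stable laws, not part of the StabLe algorithm):

```python
import math
import random

# Chambers-Mallows-Stuck sampler for a standard symmetric alpha-stable
# variate, 0 < alpha <= 2. alpha = 2 recovers a (scaled) Gaussian and
# alpha = 1 the Cauchy distribution.

def sample_stable(alpha, rng=random.Random(0)):
    u = (rng.random() - 0.5) * math.pi   # uniform on (-pi/2, pi/2)
    w = rng.expovariate(1.0)             # standard exponential
    if alpha == 1.0:
        return math.tan(u)               # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))
```

Samplers like this make simulation studies of α-SG models possible even though likelihood evaluation, and hence penalized maximum-likelihood learning, remains expensive, which is the gap the MDC reduction addresses.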