Translation of mRNA into protein is a fundamental yet complex biological process with multiple factors that can potentially affect its efficiency. Here, we study a stochastic model describing the traffic flow of ribosomes along the mRNA (namely, the inhomogeneous ℓ-TASEP), and identify the key parameters that govern the overall rate of protein synthesis, sensitivity to initiation rate changes, and efficiency of ribosome usage. By analyzing a continuum limit of the model, we obtain closed-form expressions for stationary currents and ribosomal densities, which agree well with Monte Carlo simulations. Furthermore, we completely characterize the phase transitions in the system, and by applying our theoretical results, we formulate design principles that detail how to tune the key parameters we identified to optimize translation efficiency. Using ribosome profiling data from S. cerevisiae, we show that its translation system is generally consistent with these principles. Our theoretical results have implications for evolutionary biology, as well as synthetic biology.
Direct comparison of bulk gene expression profiles is complicated by distinct cell type mixtures in each sample that obscure whether observed differences are actually caused by changes in the expression levels themselves or are simply a result of differing cell type compositions. Single-cell technology has made it possible to measure gene expression in individual cells, achieving higher resolution at the expense of increased noise. If carefully incorporated, such single-cell data can be used to deconvolve bulk samples to yield accurate estimates of the true cell type proportions, thus enabling one to disentangle the effects of differential expression and cell type mixtures. Here, we propose a generative model and a likelihood-based inference method that uses asymptotic statistical theory and a novel optimization procedure to perform deconvolution of bulk RNA-seq data to produce accurate cell type proportion estimates. We show the effectiveness of our method, called RNA-Sieve, across a diverse array of scenarios involving real data and discuss extensions made uniquely possible by our probabilistic framework, including a demonstration of well-calibrated confidence intervals.
Direct comparison of bulk gene expression profiles is complicated by distinct cell type mixtures in each sample that obscure whether observed differences are actually due to changes in the expression levels themselves or simply to differing cell type compositions. Single-cell technology has made it possible to measure gene expression in individual cells, achieving higher resolution at the expense of increased noise. If carefully incorporated, such single-cell data can be used to deconvolve bulk samples to yield accurate estimates of the true cell type proportions, thus enabling one to disentangle the effects of differential expression and cell type mixtures. Here, we propose a generative model and a likelihood-based inference method that uses asymptotic statistical theory and a novel optimization procedure to perform deconvolution of bulk RNA-seq data to produce accurate cell type proportion estimates. We demonstrate the effectiveness of our method, called RNA-Sieve, across a diverse array of scenarios involving real data and discuss several extensions made uniquely possible by our probabilistic framework, including general hypothesis tests and confidence intervals.
Translation of mRNA into protein is a fundamental yet complex biological process with multiple factors that can potentially affect its efficiency. In particular, different genes can have quite different initiation rates, while site-specific elongation rates can vary substantially along a given transcript. Here, we analyze a stochastic model of translation dynamics to identify the key parameters that govern the overall rate of protein synthesis and the efficiency of ribosome usage. The mathematical model we study is an interacting particle system that generalizes the Totally Asymmetric Simple Exclusion Process (TASEP), where particles correspond to ribosomes. While the TASEP and its variants have been studied for the past several decades through simulations and mean-field approximations, a general analytic solution has remained challenging to obtain. By analyzing the so-called hydrodynamic limit, we obtain exact closed-form expressions for stationary currents and particle densities that agree well with Monte Carlo simulations. In addition, we provide a complete characterization of phase transitions in the system. Surprisingly, phase boundaries depend on only four parameters: the particle size, and the first, last and minimum particle jump rates. Relating these theoretical results to translation, we formulate four design principles that detail how to tune these parameters to optimize translation efficiency in terms of protein production rate and resource usage. We then analyze ribosome profiling data of S. cerevisiae and demonstrate that its translation system is generally efficient, consistent with the design principles we found. We discuss implications of our findings for evolutionary constraints and codon usage bias.
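To make the model in the abstract above concrete, the following is a minimal Monte Carlo sketch of an open-boundary exclusion process with extended particles and site-dependent jump rates. It is an illustration under stated assumptions, not the authors' code: the function name `simulate_l_tasep`, its parameters, and the conventions used (each particle is tracked by a single site, two particles must stay at least `ell` sites apart, and a particle leaves from the last site at that site's rate) are all hypothetical choices made here.

```python
import random


def simulate_l_tasep(rates, alpha, ell, t_max, burn_in=0.0, seed=0):
    """Gillespie-style simulation of an open TASEP with particles of size ell.

    Assumed conventions (not taken from the paper):
      - rates[x] is the jump rate out of site x; the last entry is the exit rate.
      - alpha is the entry (initiation) rate at site 0.
      - tracked positions of consecutive particles differ by at least ell.
    Returns the estimated stationary current (exits per unit time after burn_in).
    """
    rng = random.Random(seed)
    n = len(rates)
    particles = []  # tracked positions, ordered from the leading particle to the last
    t, exits = 0.0, 0

    while True:
        # Collect every move that exclusion currently allows, with its rate.
        events = []
        if not particles or particles[-1] >= ell:
            events.append((alpha, -1))  # -1 marks an entry event
        for i, x in enumerate(particles):
            if i == 0 or particles[i - 1] - x > ell:
                events.append((rates[x], i))

        total = sum(rate for rate, _ in events)
        t += rng.expovariate(total)  # waiting time until the next event
        if t > t_max:
            break

        # Pick one event with probability proportional to its rate.
        u = rng.random() * total
        for rate, idx in events:
            u -= rate
            if u < 0:
                break

        if idx == -1:
            particles.append(0)
        elif particles[idx] == n - 1:
            particles.pop(idx)  # only the leading particle can sit on the last site
            if t > burn_in:
                exits += 1
        else:
            particles[idx] += 1

    return exits / (t_max - burn_in)


if __name__ == "__main__":
    uniform = [1.0] * 50
    slow_site = [1.0] * 25 + [0.05] + [1.0] * 24
    print(simulate_l_tasep(uniform, alpha=0.1, ell=1, t_max=2000, burn_in=200))
    print(simulate_l_tasep(slow_site, alpha=1.0, ell=3, t_max=2000, burn_in=200))
```

Comparing the two runs gives a rough feel for how the entry rate and the minimum jump rate can each limit the current, which is the kind of dependence the closed-form results describe.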
Chromosomal rearrangements can initiate and drive cancer progression, yet it has been challenging to evaluate their impact, especially in genetically heterogeneous solid cancers. To address this problem we developed HiDENSEC, a new computational framework for analyzing chromatin conformation capture in heterogeneous samples, which can infer somatic copy number alterations, characterize large-scale chromosomal rearrangements, and estimate cancer cell fractions. We validated HiDENSEC with in silico and in vitro controls, and then characterized chromosome-scale evolution during melanoma progression in formalin-fixed tumor samples from three patients. The resulting comprehensive annotation of the genomic events includes copy number neutral translocations that disrupt tumor suppressor genes such as NF1, whole chromosome arm exchanges that result in loss of CDKN2A, and whole-arm copy-number neutral loss of homozygosity involving PTEN. These findings show that large-scale chromosomal rearrangements occur throughout cancer evolution and characterizing these events yields insights into drivers of melanoma progression.
Random divisions of an interval arise in various contexts, including statistics, physics, and geometric analysis. For testing the uniformity of a random partition of the unit interval $[0, 1]$ into $k$ disjoint subintervals of size $(S^{(k)}_1, \ldots, S^{(k)}_k)$, Greenwood (1946) suggested using the squared $\ell_2$-norm of this size vector as a test statistic, prompting a number of subsequent studies. Despite much progress on understanding its power and asymptotic properties, attempts to find its exact distribution have succeeded so far for only small values of $k$. Here, we develop an efficient method to compute the distribution of the Greenwood statistic and more general spacing-statistics for an arbitrary value of $k$. Specifically, we consider random divisions of $\{1, 2, \ldots, n\}$ into $k$ subsets of consecutive integers and study $\|S_{n,k}\|_{p,w}^p$, the $p$th power of the weighted $\ell_p$-norm of the subset size vector $S_{n,k} = (S^{(n,k)}_1, \ldots, S^{(n,k)}_k)$ for arbitrary weights $w = (w_1, \ldots, w_k)$. We present an exact and quickly computable formula for its moments, as well as a simple algorithm to accurately reconstruct a probability distribution using the moment sequence. We also study various scaling limits, one of which corresponds to the Greenwood statistic in the case of $p = 2$ and $w = (1, \ldots, 1)$, and this connection allows us to obtain information about regularity, monotonicity and local behavior of its distribution. Lastly, we devise a new family of non-parametric tests using $\|S_{n,k}\|_{p,w}^p$ and demonstrate that they exhibit substantially improved power for a large class of alternatives, compared to existing popular methods such as the Kolmogorov-Smirnov, Cramér-von Mises, and Mann-Whitney/Wilcoxon rank-sum tests.
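As a small illustration of the statistics named in the abstract above, the sketch below computes the Greenwood statistic from cut points in the unit interval, along with a weighted power sum of subset sizes. The function names are hypothetical, and the weighted form used here (the sum of `w_i * s_i**p`) is an assumption about how the weights enter; the paper's exact definition may differ.

```python
def greenwood_statistic(cut_points):
    """Squared l2-norm of the spacings induced by cut points in [0, 1].

    k - 1 cut points divide the interval into k subintervals; the statistic
    is the sum of their squared lengths.
    """
    cuts = [0.0] + sorted(cut_points) + [1.0]
    spacings = [right - left for left, right in zip(cuts, cuts[1:])]
    return sum(s * s for s in spacings)


def weighted_power_norm(sizes, weights, p):
    """Assumed weighted form: sum over i of weights[i] * sizes[i] ** p."""
    if len(sizes) != len(weights):
        raise ValueError("sizes and weights must have the same length")
    return sum(w * s ** p for w, s in zip(weights, sizes))


if __name__ == "__main__":
    # Evenly spaced cuts give the smallest possible value, 1/k.
    print(greenwood_statistic([0.25, 0.5, 0.75]))
    # Clustered cuts leave one long spacing and inflate the statistic.
    print(greenwood_statistic([0.01, 0.02, 0.03]))
    # Discrete analogue: {1, ..., 6} split into consecutive blocks of sizes 1, 2, 3.
    print(weighted_power_norm([1, 2, 3], [1, 1, 1], p=2))
```

Larger values indicate more uneven spacings, which is why the statistic can serve as a test of uniformity.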