Dan D. Erdmann-Pham scite author profile

Translation of mRNA into protein is a fundamental yet complex biological process with multiple factors that can potentially affect its efficiency. Here, we study a stochastic model describing the traffic flow of ribosomes along the mRNA (namely, the inhomogeneous ℓ-TASEP), and identify the key parameters that govern the overall rate of protein synthesis, sensitivity to initiation rate changes, and efficiency of ribosome usage. By analyzing a continuum limit of the model, we obtain closed-form expressions for stationary currents and ribosomal densities, which agree well with Monte Carlo simulations. Furthermore, we completely characterize the phase transitions in the system, and by applying our theoretical results, we formulate design principles that detail how to tune the key parameters we identified to optimize translation efficiency. Using ribosome profiling data from S. cerevisiae, we show that its translation system is generally consistent with these principles. Our theoretical results have implications for evolutionary biology, as well as synthetic biology.

Likelihood-based deconvolution of bulk gene expression data using single-cell references

Erdmann-Pham

Fischer

Hong

et al. 2021

Genome Res.

Direct comparison of bulk gene expression profiles is complicated by distinct cell type mixtures in each sample that obscure whether observed differences are actually caused by changes in the expression levels themselves or are simply a result of differing cell type compositions. Single-cell technology has made it possible to measure gene expression in individual cells, achieving higher resolution at the expense of increased noise. If carefully incorporated, such single-cell data can be used to deconvolve bulk samples to yield accurate estimates of the true cell type proportions, thus enabling one to disentangle the effects of differential expression and cell type mixtures. Here, we propose a generative model and a likelihood-based inference method that uses asymptotic statistical theory and a novel optimization procedure to perform deconvolution of bulk RNA-seq data to produce accurate cell type proportion estimates. We show the effectiveness of our method, called RNA-Sieve, across a diverse array of scenarios involving real data and discuss extensions made uniquely possible by our probabilistic framework, including a demonstration of well-calibrated confidence intervals.
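
To make the idea of likelihood-based deconvolution concrete, the sketch below fits cell type proportions by expectation-maximization (EM) under a deliberately simplified multinomial mixture model. This is a swapped-in textbook technique, not RNA-Sieve's procedure, which also accounts for noise in the single-cell reference and uses its own optimization scheme. The function `em_proportions` and its arguments are hypothetical, and each profile is assumed to be a vector of per-gene read probabilities averaged from a single-cell reference.

```python
def em_proportions(bulk_counts, profiles, n_iter=200):
    """Maximum-likelihood cell type proportions via EM (illustrative sketch).

    Assumed model (a simplification, not RNA-Sieve's): each bulk read comes
    from cell type k with probability pi[k], then from gene g with
    probability profiles[k][g]; each profiles[k] sums to 1. EM maximizes
        sum_g bulk_counts[g] * log(sum_k pi[k] * profiles[k][g]).
    """
    K = len(profiles)
    G = len(bulk_counts)
    pi = [1.0 / K] * K  # start from equal proportions
    for _ in range(n_iter):
        # E-step: expected number of reads attributable to each cell type.
        expected = [0.0] * K
        for g in range(G):
            mix = sum(pi[k] * profiles[k][g] for k in range(K))
            if mix == 0.0:
                continue  # gene has zero probability under every profile
            for k in range(K):
                expected[k] += bulk_counts[g] * pi[k] * profiles[k][g] / mix
        # M-step: proportions are the normalized expected read counts.
        total = sum(expected)
        pi = [e / total for e in expected]
    return pi
```

For example, with profiles `[[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]` and bulk counts `[280, 200, 520]`, which are exactly the expected counts for proportions (0.3, 0.7) out of 1000 reads, the estimate converges to approximately `[0.3, 0.7]`. EM is used here because each update keeps the proportions nonnegative and summing to one without any explicit constraint handling.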

A likelihood-based deconvolution of bulk gene expression data using single-cell references

Erdmann-Pham

Fischer

Hong

et al. 2020

Preprint

Direct comparison of bulk gene expression profiles is complicated by distinct cell type mixtures in each sample that obscure whether observed differences are actually due to changes in expression levels themselves or simply to differing cell type compositions. Single-cell technology has made it possible to measure gene expression in individual cells, achieving higher resolution at the expense of increased noise. If carefully incorporated, such single-cell data can be used to deconvolve bulk samples to yield accurate estimates of the true cell type proportions, thus enabling one to disentangle the effects of differential expression and cell type mixtures. Here, we propose a generative model and a likelihood-based inference method that uses asymptotic statistical theory and a novel optimization procedure to perform deconvolution of bulk RNA-seq data to produce accurate cell type proportion estimates. We demonstrate the effectiveness of our method, called RNA-Sieve, across a diverse array of scenarios involving real data and discuss several extensions made uniquely possible by our probabilistic framework, including general hypothesis tests and confidence intervals.

The key parameters that govern translation efficiency

Erdmann-Pham

Duc

Song

2018

Preprint

Translation of mRNA into protein is a fundamental yet complex biological process with multiple factors that can potentially affect its efficiency. In particular, different genes can have quite different initiation rates, while site-specific elongation rates can vary substantially along a given transcript. Here, we analyze a stochastic model of translation dynamics to identify the key parameters that govern the overall rate of protein synthesis and the efficiency of ribosome usage. The mathematical model we study is an interacting particle system that generalizes the Totally Asymmetric Simple Exclusion Process (TASEP), where particles correspond to ribosomes. While the TASEP and its variants have been studied for the past several decades through simulations and mean field approximations, a general analytic solution has remained challenging to obtain. By analyzing the so-called hydrodynamic limit, we here obtain exact closed-form expressions for stationary currents and particle densities that agree well with Monte Carlo simulations. In addition, we provide a complete characterization of phase transitions in the system. Surprisingly, phase boundaries depend on only four parameters: the particle size, and the first, last and minimum particle jump rates. Relating these theoretical results to translation, we formulate four design principles that detail how to tune these parameters to optimize translation efficiency in terms of protein production rate and resource usage. We then analyze ribosome profiling data of S. cerevisiae and demonstrate that its translation system is generally efficient, consistent with the design principles we found. We discuss implications of our findings on evolutionary constraints and codon usage bias.
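
The particle system described above can be explored numerically with a continuous-time (Gillespie) Monte Carlo simulation, the general kind of simulation the abstract compares its closed-form results against. The sketch below is not the authors' code: the function name `simulate_ell_tasep` is hypothetical, and its conventions (a particle of size ℓ is tracked by its leftmost site, hops at the rate assigned to that site, and leaves the lattice in a single step at rate β) are assumptions chosen for brevity. All rates must be positive.

```python
import random


def simulate_ell_tasep(alpha, hop_rates, beta, ell, t_max, seed=0):
    """Gillespie Monte Carlo for an open inhomogeneous ell-TASEP (sketch).

    Assumed conventions (illustrative, not taken from the paper):
      * the lattice has L = len(hop_rates) sites, indexed 0..L-1;
      * a particle covers sites p..p+ell-1 and is tracked by its leftmost site p;
      * it hops p -> p+1 at rate hop_rates[p] if site p+ell is empty;
      * a new particle enters at p = 0 at rate alpha if sites 0..ell-1 are empty;
      * a particle covering the last ell sites leaves in one step at rate beta.
    Returns the time-averaged current (exits per unit time).
    """
    L = len(hop_rates)
    rng = random.Random(seed)
    pos = []  # leftmost sites of particles, sorted ascending
    t, exits = 0.0, 0
    while True:
        # Enumerate every event allowed by exclusion, with its rate.
        events = []
        if not pos or pos[0] >= ell:
            events.append((alpha, "enter", None))
        for k, p in enumerate(pos):
            if p + ell == L:
                events.append((beta, "exit", k))
            else:
                ahead = pos[k + 1] if k + 1 < len(pos) else L + ell
                if ahead > p + ell:  # site just past the right edge is free
                    events.append((hop_rates[p], "hop", k))
        total = sum(rate for rate, _, _ in events)
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Pick one event with probability proportional to its rate.
        u = rng.random() * total
        for rate, kind, k in events:
            u -= rate
            if u <= 0:
                break
        if kind == "enter":
            pos.insert(0, 0)
        elif kind == "exit":
            pos.pop(k)
            exits += 1
        else:
            pos[k] += 1
    return exits / t_max
```

As a sanity check, for ℓ = 1 with unit hop rates, α = 0.1 and β = 1, `simulate_ell_tasep(0.1, [1.0] * 50, 1.0, 1, 2000.0)` should return a current close to α(1 − α) = 0.09, the known low-density current of the homogeneous TASEP, up to Monte Carlo noise and the initial filling transient. Site-dependent elongation rates are modeled simply by passing a non-constant `hop_rates` list.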

EGGTART: A tool to visualize the dynamics of biophysical transport under the inhomogeneous ℓ-TASEP

Erdmann-Pham

Son

Duc

et al. 2021

Biophysical Journal

Transferability of Geometric Patterns from Protein Self-Interactions to Protein-Ligand Interactions

Koehl

Jagota

Erdmann-Pham

et al. 2021

Tracing cancer evolution and heterogeneity using Hi-C

Erdmann-Pham

Batra

Turkalo

et al. 2023

Preprint

Chromosomal rearrangements can initiate and drive cancer progression, yet it has been challenging to evaluate their impact, especially in genetically heterogeneous solid cancers. To address this problem, we developed HiDENSEC, a new computational framework for analyzing chromatin conformation capture in heterogeneous samples, which can infer somatic copy number alterations, characterize large-scale chromosomal rearrangements, and estimate cancer cell fractions. We validated HiDENSEC with in silico and in vitro controls, and then characterized chromosome-scale evolution during melanoma progression in formalin-fixed tumor samples from three patients. The resulting comprehensive annotation of the genomic events includes copy number neutral translocations that disrupt tumor suppressor genes such as NF1, whole chromosome arm exchanges that result in loss of CDKN2A, and whole-arm copy-number neutral loss of homozygosity involving PTEN. These findings show that large-scale chromosomal rearrangements occur throughout cancer evolution and characterizing these events yields insights into drivers of melanoma progression.
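
As a hedged illustration of how a cancer cell fraction can be read off a copy-number signal, the sketch below uses the standard two-component mixture relation rather than HiDENSEC's own model. Assuming normal cells and the tumor baseline are diploid, a region with tumor copy number c in a sample with cancer cell fraction f has expected coverage, relative to a diploid region, of r = (f·c + 2(1 − f))/2 = 1 + f(c − 2)/2. Because r − 1 is linear in f, a least-squares fit across regions with assumed integer copy numbers has a closed form. Both helper names are hypothetical.

```python
def expected_ratio(copy_number, fraction):
    """Coverage relative to a diploid region under a two-component mixture.

    Assumes normal cells and the tumor baseline are diploid (copy number 2).
    """
    return 1.0 + fraction * (copy_number - 2) / 2.0


def estimate_fraction(ratios, copy_numbers):
    """Least-squares cancer cell fraction from relative coverages (sketch).

    Hypothetical helper: assumes each region's tumor copy number is known and
    that r = 1 + f * (c - 2) / 2 holds, so r - 1 = f * x with x = (c - 2) / 2.
    """
    xs = [(c - 2) / 2.0 for c in copy_numbers]
    num = sum(x * (r - 1.0) for x, r in zip(xs, ratios))
    den = sum(x * x for x in xs)
    if den == 0.0:
        raise ValueError("need at least one non-diploid region")
    return min(1.0, max(0.0, num / den))  # clip to a valid fraction
```

For instance, regions with copy numbers 1, 3 and 4 at f = 0.6 have relative coverages 0.7, 1.3 and 1.6, and `estimate_fraction([0.7, 1.3, 1.6], [1, 3, 4])` recovers 0.6. Diploid regions carry no information about f under this relation, which is why at least one altered region is required.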

Exact and arbitrarily accurate non-parametric two-sample tests based on rank spacings

Erdmann-Pham

Terhorst

Song

2020

Preprint

Random divisions of an interval arise in various contexts, including statistics, physics, and geometric analysis. For testing the uniformity of a random partition of the unit interval $[0, 1]$ into $k$ disjoint subintervals of sizes $(S^k_1, \dots, S^k_k)$, Greenwood (1946) suggested using the squared $\ell_2$-norm of this size vector as a test statistic, prompting a number of subsequent studies. Despite much progress on understanding its power and asymptotic properties, attempts to find its exact distribution have so far succeeded only for small values of $k$. Here, we develop an efficient method to compute the distribution of the Greenwood statistic and more general spacing statistics for an arbitrary value of $k$. Specifically, we consider random divisions of $\{1, 2, \dots, n\}$ into $k$ subsets of consecutive integers and study $\|S^{n,k}\|_{p,w}^p$, the $p$th power of the weighted $\ell_p$-norm of the subset size vector $S^{n,k} = (S^{n,k}_1, \dots, S^{n,k}_k)$ for arbitrary weights $w = (w_1, \dots, w_k)$. We present an exact and quickly computable formula for its moments, as well as a simple algorithm to accurately reconstruct a probability distribution using the moment sequence. We also study various scaling limits, one of which corresponds to the Greenwood statistic in the case of $p = 2$ and $w = (1, \dots, 1)$, and this connection allows us to obtain information about regularity, monotonicity and local behavior of its distribution. Lastly, we devise a new family of non-parametric tests using $\|S^{n,k}\|_{p,w}^p$ and demonstrate that they exhibit substantially improved power for a large class of alternatives, compared to existing popular methods such as the Kolmogorov-Smirnov, Cramér-von Mises, and Mann-Whitney/Wilcoxon rank-sum tests.
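
The statistic above can be made concrete by brute-force enumeration for small $n$ and $k$. This sketch assumes that every subset is nonempty, that all $\binom{n-1}{k-1}$ divisions are equally likely, and that $\|S^{n,k}\|_{p,w}^p = \sum_i w_i (S^{n,k}_i)^p$. Its cost grows like $\binom{n-1}{k-1}$, so it is only a reference check; it is not the moment-based method the abstract describes, which is what makes arbitrary $k$ tractable. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def subset_sizes(n, cuts):
    """Sizes of the consecutive blocks of {1, ..., n} split after each cut."""
    bounds = [0, *cuts, n]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def spacing_statistic(sizes, p, w):
    """p-th power of the weighted l_p norm: sum_i w_i * S_i**p."""
    return sum(wi * s ** p for wi, s in zip(w, sizes))


def exact_distribution(n, k, p, w):
    """Exact law of the statistic by enumerating every division.

    Assumes each of the k blocks is nonempty and all C(n-1, k-1) divisions
    are equally likely (cuts are chosen from the n-1 gaps between integers).
    """
    counts = {}
    for cuts in combinations(range(1, n), k - 1):
        value = spacing_statistic(subset_sizes(n, cuts), p, w)
        counts[value] = counts.get(value, 0) + 1
    total = sum(counts.values())
    return {v: Fraction(c, total) for v, c in sorted(counts.items())}


def upper_tail(dist, observed):
    """P(T >= observed): an exact one-sided p-value under the null."""
    return sum((prob for v, prob in dist.items() if v >= observed), Fraction(0))
```

For $n = 4$, $k = 2$, $p = 2$ and $w = (1, 1)$, the three divisions have sizes (1, 3), (2, 2) and (3, 1), so `exact_distribution(4, 2, 2, [1, 1])` returns `{8: Fraction(1, 3), 10: Fraction(2, 3)}`. Exact rational arithmetic via `Fraction` keeps these small-case p-values free of rounding error.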