Wei‐Chen Chen scite author profile

Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone

Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome scale measurements of gene expression. Here, we demonstrate that it is possible to both extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels (ρ>0.99 in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated (ρ=0.95). 
Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time (ρ=0.64), and mRNA and ribosome profiling footprint-based estimates of gene expression (ρ=0.53−0.74) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid’s “optimal” codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome-scale patterns of codon usage; accessing this information requires not gene expression measurements but carefully formulated, biologically interpretable models.
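
The interplay between mutation bias and selection described above can be illustrated with a small sketch. It assumes a multinomial-logit form in which the probability of a synonymous codon is proportional to exp(-(ΔM + Δη·φ)), where ΔM is a mutation bias term, Δη a translational inefficiency term, and φ the gene's expression level. This functional form, the function name `codon_probabilities`, and all parameter values are assumptions made for illustration; the abstract does not state them.

```python
import math

def codon_probabilities(delta_m, delta_eta, phi):
    """Codon probabilities for one amino acid under an ASSUMED
    multinomial-logit form: p_i proportional to exp(-(dM_i + dEta_i * phi)).
    """
    logits = [-(m + e * phi) for m, e in zip(delta_m, delta_eta)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-codon amino acid. Codon 0 is the reference.
# Codon 1 is favored by mutation (negative dM) but translated
# less efficiently (positive dEta).
delta_m = [0.0, -1.0]
delta_eta = [0.0, 0.2]

for phi in (0.0, 1.0, 4.0, 10.0):
    p = codon_probabilities(delta_m, delta_eta, phi)
    print(f"phi={phi:5.1f}  p(codon 0)={p[0]:.3f}  p(codon 1)={p[1]:.3f}")
```

With these made-up values, the mutation-favored codon remains the more frequent one at moderately high expression and only loses out at very high expression, which is the kind of pattern that could lead a frequency-based method to mislabel the "optimal" codon.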


MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms

Melnykov, Chen, Maitra. 2012. J. Stat. Soft.

129


The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of MixSim include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
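
The pairwise overlap described above can be sketched in a few lines. This is not MixSim's algorithm (the package computes the overlap exactly, and it is written in R); it is a Monte Carlo illustration for two univariate Gaussian components. It assumes that the misclassification probability of component i into component j is the probability that a draw from component i has a higher weighted density under component j. All function names and parameter values are hypothetical.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def misclassification_prob(i, j, weights, means, sds, n=50_000, seed=0):
    """Monte Carlo estimate of the probability that a draw from
    component i is assigned to component j (weighted density rule)."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n):
        x = rng.gauss(means[i], sds[i])
        score_i = math.log(weights[i]) + normal_logpdf(x, means[i], sds[i])
        score_j = math.log(weights[j]) + normal_logpdf(x, means[j], sds[j])
        if score_i < score_j:
            wrong += 1
    return wrong / n

def pairwise_overlap(i, j, weights, means, sds, n=50_000):
    """Overlap = sum of the two misclassification probabilities."""
    return (misclassification_prob(i, j, weights, means, sds, n, seed=1)
            + misclassification_prob(j, i, weights, means, sds, n, seed=2))

def overlap_equal_case(mean_gap, sd):
    """Closed form for the special case of equal weights and equal sd:
    each misclassification probability is Phi(-gap / (2 * sd))."""
    return math.erfc(mean_gap / (2.0 * sd * math.sqrt(2.0)))

# Hypothetical mixture: equal weights, unit variances, means 2 apart.
weights, means, sds = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
estimate = pairwise_overlap(0, 1, weights, means, sds)
exact = overlap_equal_case(2.0, 1.0)
print(f"Monte Carlo overlap: {estimate:.4f}  closed form: {exact:.4f}")
```

Moving the means closer together or inflating the variances raises the overlap, which is the quantity the abstract says can be used to control how hard a simulated clustering problem is.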


Survey of Briarane-related Diterpenoids — Part II

Sung, Chang, Fang et al. 2005. HETEROCYCLES


Estimating gene expression and codon specific translational efficiencies, mutation biases, and selection coefficients from genomic data alone.

Gilchrist, Chen, Shah et al. 2014. Preprint


Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ROC SEMPPR) to extract biologically meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome-scale measurements of gene expression. Here we demonstrate that it is possible to both extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the S. cerevisiae S288c genome. When we compare our model fits with previous approaches we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels (ρ > 0.99 in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative datasets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated (ρ = 0.95). Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time (ρ = 0.64), and mRNA and ribosome profiling footprint-based estimates of gene expression (ρ = 0.53−0.74) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. 
Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid’s ‘optimal’ codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome-scale patterns of codon usage; accessing this information requires not gene expression measurements but carefully formulated, biologically interpretable models.


