The codon stabilization coefficient (CSC) is derived from the correlation between each codon frequency in transcripts and mRNA half-life experimental data. In this work, we used this metric as a reference to compare previously published Saccharomyces cerevisiae mRNA half-life datasets and investigate how codon composition related to protein levels. We generated CSCs derived from nine studies. Four datasets produced similar CSCs, which also correlated with other independent parameters that reflected codon optimality, such as the tRNA abundance and ribosome residence time. By calculating the average CSC for each gene, we found that most mRNAs tended to have more non-optimal codons. Conversely, a high proportion of optimal codons was found for genes coding highly abundant proteins, including proteins that were only transiently overexpressed in response to stress conditions. We also used CSCs to identify and locate mRNA regions enriched in non-optimal codons. We found that these stretches were usually located close to the initiation codon and were sufficient to slow ribosome movement. However, in contrast to observations from reporter systems, we found no position-dependent effect on the mRNA half-life. These analyses underscore the value of CSCs in studies of mRNA stability and codon bias and their relationships with protein expression.
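The CSC definition above (a correlation between per-transcript codon frequency and mRNA half-life) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes a Pearson correlation, in-frame coding sequences, and half-lives supplied as a plain dictionary. The function names `codon_frequencies`, `codon_stabilization_coefficients`, and `mean_csc` are hypothetical.

```python
from collections import Counter
from math import sqrt


def codon_frequencies(cds):
    """Fraction of each codon in an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def codon_stabilization_coefficients(transcripts, half_lives):
    """CSC per codon: correlation of codon frequency with half-life.

    transcripts: {gene: coding sequence}; half_lives: {gene: half-life}.
    Only genes present in both inputs are used (an assumption of this sketch).
    """
    freqs = {g: codon_frequencies(s) for g, s in transcripts.items() if g in half_lives}
    genes = sorted(freqs)
    codons = sorted({c for f in freqs.values() for c in f})
    hl = [half_lives[g] for g in genes]
    return {c: pearson([freqs[g].get(c, 0.0) for g in genes], hl) for c in codons}


def mean_csc(cds, csc):
    """Average CSC over the codons of one gene (codons without a CSC are skipped)."""
    usable = len(cds) - len(cds) % 3
    values = [csc[cds[i:i + 3]] for i in range(0, usable, 3) if cds[i:i + 3] in csc]
    return sum(values) / len(values) if values else float("nan")


# Toy data (invented for illustration only, not experimental values).
toy_transcripts = {"g1": "AAAAAAAAA", "g2": "AAAAAATTT", "g3": "AAATTTTTT"}
toy_half_lives = {"g1": 30.0, "g2": 20.0, "g3": 10.0}
toy_csc = codon_stabilization_coefficients(toy_transcripts, toy_half_lives)
```

In this toy set the frequency of `AAA` rises with half-life, so it receives a positive coefficient, while `TTT` receives a negative one. A gene's average CSC then summarizes how its codon composition leans.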
As proteins are synthesized, the nascent polypeptide must pass through a negatively charged exit tunnel. During this stage, positively charged stretches can interact with the ribosome walls and slow the translation. Therefore, charged polypeptides may be important factors that affect protein expression. To determine the frequency and distribution of positively and negatively charged stretches in different proteomes, the net charge was calculated for every 30 consecutive amino acid residues, which corresponds to the length of the ribosome exit tunnel. The following annotated and reviewed proteins in the UniProt database (Swiss-Prot) were analyzed: 551,705 proteins from different organisms and a total of 180 million protein segments. We observed that there were more negative than positive stretches and that super-charged positive sequences (i.e., net charges ≥ 14) were underrepresented in the proteomes. Overall, the proteins were more positively charged at their N-termini and C-termini, and this feature was present in most organisms and subcellular localizations. To investigate whether the N-terminal charges affect the elongation rates, previously published ribosomal profiling data obtained from S. cerevisiae, without translation-interfering drugs, were analyzed. We observed a nonlinear effect of the charge on the ribosome occupancy in which values ≥ +5 and ≤ -6 showed increased and reduced ribosome densities, respectively. These groups also showed different distributions across 80S monosomes and polysomes. Basic polypeptides are more common within short proteins that are translated by monosomes, whereas negative stretches are more abundant in polysome-translated proteins. These findings suggest that the nascent peptide charge impacts translation and can be one of the factors that regulate translation efficiency and protein expression.
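The net-charge scan over every 30 consecutive residues described above can be sketched as a sliding window. The charge scheme here is an assumption of this sketch (lysine and arginine count as +1, aspartate and glutamate as -1, all other residues including histidine as 0), since the abstract does not state it. The name `window_net_charges` is hypothetical.

```python
# Assumed residue charges; anything not listed contributes 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_net_charges(protein, window=30):
    """Net charge of every run of `window` consecutive residues.

    Returns one value per window start position, or an empty list when
    the protein is shorter than the window.
    """
    if len(protein) < window:
        return []
    charges = [CHARGE.get(residue, 0) for residue in protein]
    current = sum(charges[:window])
    result = [current]
    # Slide by one residue: add the entering charge, drop the leaving one.
    for i in range(window, len(charges)):
        current += charges[i] - charges[i - window]
        result.append(current)
    return result
```

A protein of length n yields n - 29 windows with the default size. Stretches can then be binned by net charge, for example to flag windows at or above the +14 threshold mentioned in the text.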
It has been proposed that polybasic peptides cause slower movement of ribosomes through an electrostatic interaction with the highly negative ribosome exit tunnel. Ribosome profiling, the sequencing of short ribosome-bound fragments of mRNA, is a powerful tool for the analysis of mRNA translation. Using the yeast Saccharomyces cerevisiae as a model, we showed that reduced translation efficiency associated with polybasic protein sequences could be inferred from ribosome profiling. However, an increase in ribosome density at polybasic sequences was evident only when the commonly used translational inhibitors cycloheximide and anisomycin were omitted during mRNA isolation. Since ribosome profiling performed without inhibitors agrees with experimental evidence obtained by other methods, we conclude that cycloheximide and anisomycin must be avoided in ribosome profiling experiments.
Capsid proteins often present a positively charged arginine-rich sequence at their terminal regions, which has a fundamental role in genome packaging and particle stability for some icosahedral viruses. These sequences show little to no conservation and are structurally dynamic such that they cannot be easily detected by common sequence or structure comparisons. As a result, the occurrence and distribution of positively charged domains across the viral universe are unknown. Based on the net charge calculation of discrete protein segments, we identified proteins containing amino acid stretches with a notably high net charge (Q > +17), which are enriched in icosahedral viruses with a distinctive bias towards arginine over lysine. We used viral particle structural data to calculate the total electrostatic charge derived from the most positively charged protein segment of capsid proteins and correlated these values with genome charges arising from the phosphates of each nucleotide. We obtained a positive correlation (r = 0.91, p-value < 0.0001) for a group of 17 viral families, corresponding to 40% of all families with icosahedral structures described to date. These data indicated that unrelated viruses with diverse genome types adopt a common underlying mechanism for capsid assembly based on R-arms.
Translation initiation is a critical step in the regulation of protein synthesis, and it is subjected to different control mechanisms, such as 5ʹ UTR secondary structure and initiation codon context, that can influence the rates at which initiation and consequently translation occur. For some genes, translation elongation also affects the rate of protein synthesis. With a GFP library containing nearly all possible combinations of nucleotides from the 3rd to the 5th codon positions in the protein coding region of the mRNA, it was previously demonstrated that some nucleotide combinations increased GFP expression by up to four orders of magnitude. While it is clear that the codon region from positions 3 to 5 can influence protein expression levels of artificial constructs, its impact on endogenous proteins is still unknown. Through bioinformatics analysis, we identified the nucleotide combinations of the GFP library in Escherichia coli genes and examined the correlation between the expected levels of translation according to the GFP data and the experimental measures of protein expression. We observed that E. coli genes were enriched with the nucleotide compositions that enhanced protein expression in the GFP library, but surprisingly, this enrichment seemed to affect the translation efficiency only marginally. Nevertheless, our data indicate that different enterobacteria present a nucleotide composition enrichment similar to that of E. coli, suggesting an evolutionary pressure towards the conservation of short translational enhancer sequences.
Translation initiation is a critical step in the regulation of protein synthesis, and it is subjected to different control mechanisms, such as 5' UTR secondary structure and initiation codon context, that can influence the rates at which initiation and consequently translation occur. For some genes, translation elongation also affects the protein synthesis rate. Recently, it was proposed that the identity of codons three to five, called the short translational ramp, has a strong influence on translation elongation and protein expression. Using a GFP library in which nearly all combinations of nucleotides at these positions were created, it was demonstrated that some nucleotide combinations increased GFP expression by up to four orders of magnitude by enhancing translation efficiency (TE). While it is clear that the short ramp can influence protein expression levels of artificial constructs, its impact on physiological proteins is still unknown. In this work, we aimed to investigate the relevance of the short translational ramp in a physiological context. Through bioinformatics analysis, we identified the nucleotide combinations from the GFP library in Escherichia coli genes and examined their correlation with TE. We observed that E. coli genes were enriched with nucleotide compositions that enhanced protein expression in the GFP library, but, surprisingly, this enrichment seems to affect the TE only marginally. Nevertheless, our data indicate that different enterobacteria present a nucleotide composition enrichment similar to that of E. coli, suggesting an evolutionary pressure towards the conservation of the short translational ramp.
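Locating the short translational ramp in a gene amounts to slicing codons three to five out of the coding sequence and tallying how often each nucleotide combination occurs. The sketch below assumes each coding sequence starts at the initiation codon (codon 1) and is given 5' to 3'. The names `short_ramp` and `ramp_counts` are hypothetical, and the comparison against GFP library scores is not shown because the abstract does not describe its format.

```python
from collections import Counter


def short_ramp(cds):
    """Nucleotides of codons 3 to 5 (0-based positions 6..14) of a coding sequence.

    Returns None when the sequence is too short to contain five codons.
    """
    if len(cds) < 15:
        return None
    return cds[6:15].upper()


def ramp_counts(coding_sequences):
    """Count how many genes carry each 9-nucleotide ramp sequence."""
    ramps = (short_ramp(cds) for cds in coding_sequences)
    return Counter(r for r in ramps if r is not None)


# Toy sequences (invented for illustration only).
toy_genes = [
    "ATGGCTAAAGGGCCCTTT",
    "ATGTCAAAAGGGCCCGGA",
    "ATGGCTTTTACGGATTAA",
    "ATGGCT",  # too short, skipped
]
toy_counts = ramp_counts(toy_genes)
```

The resulting counts could then be compared with the frequency expected by chance, or joined to per-combination expression values, to test for enrichment.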
Protein segments with a high concentration of positively charged amino acid residues are often used in reporter constructs designed to activate ribosomal mRNA/protein decay pathways, such as those involving nonstop mRNA decay (NSD), no-go mRNA decay (NGD) and the ribosome quality control (RQC) complex. It has been proposed that the electrostatic interaction of the positively charged nascent peptide with the negatively charged ribosomal exit tunnel leads to translation arrest. When stalled long enough, the translation process is terminated with the degradation of the transcript and an incomplete protein. Although early experiments made a strong argument for this mechanism, other features associated with positively charged reporters, such as codon bias and mRNA and protein structure, have emerged as potent inducers of ribosome stalling. We carefully reviewed the published data on the protein and mRNA expression of artificial constructs with diverse compositions as assessed in different organisms. We concluded that, although polybasic sequences generally lead to lower translation efficiency, it appears that an aggravating factor, such as a nonoptimal codon composition, is necessary to cause translation termination events.