At a high level, the tidyverse is a language for solving data science challenges with R code. Its primary goal is to facilitate a conversation between a human and a computer about data. Less abstractly, the tidyverse is a collection of R packages that share a high-level design philosophy and low-level grammar and data structures, so that learning one package makes it easier to learn the next.
Summary: The biological causes of selective pressures on coding-sequence evolution remain controversial, despite the surprising consistency of covariation between common measures of evolutionary change (substitution rates) and gene expression (mRNA levels, codon usage) across taxa. We carry out a unified analysis which reveals these conserved patterns in E. coli, yeast, worm, fly, mouse, and human, and suggests that all trends stem largely from a unified underlying selective pressure. In metazoans, these trends are strongest in tissues composed of neurons, whose structure and lifetime confer extreme sensitivity to protein misfolding. We propose, and demonstrate using a molecular-level evolutionary simulation, that selection against toxicity of misfolded proteins generated by ribosome errors suffices to create all the observed covariation. The mechanistic model of molecular evolution which emerges yields testable biochemical predictions, calls into question use of nonsynonymous-to-synonymous substitution ratios (Ka/Ks) to detect functional selection, and suggests how mistranslation may contribute to neurodegenerative disease.
Much recent work has explored molecular and population-genetic constraints on the rate of protein sequence evolution. The best predictor of evolutionary rate is expression level, for reasons that have remained unexplained. Here, we hypothesize that selection to reduce the burden of protein misfolding will favor protein sequences with increased robustness to translational missense errors. Pressure for translational robustness increases with expression level and constrains sequence evolution. Using several sequenced yeast genomes, global expression and protein abundance data, and sets of paralogs traceable to an ancient whole-genome duplication in yeast, we rule out several confounding effects and show that expression level explains roughly half the variation in Saccharomyces cerevisiae protein evolutionary rates. We examine causes for expression's dominant role and find that genome-wide tests favor the translational robustness explanation over existing hypotheses that invoke constraints on function or translational efficiency. Our results suggest that proteins evolve at rates largely unrelated to their functions and can explain why highly expressed proteins evolve slowly across the tree of life.
evolutionary rate | protein misfolding | yeast | translation errors | gene duplication
A central problem in molecular evolution is why proteins evolve at different rates. Protein evolutionary rates, quantified by the number of nonsynonymous nucleotide changes per site (dN) in the encoding genes, are routinely used to build phylogenetic trees, detect selection, find orthologous proteins among related species (1), and evaluate the functional importance of genes (2), yet we possess only hints of the biophysical cause of rate differences. Thirty years ago, Zuckerkandl (3) proposed that a protein's sequence will evolve at a rate primarily determined by the proportion of its sites involved in specific functions (or "functional density").
Although this proposal has gained wide acceptance (2), measurement of functional density remains problematic because residues may contribute to protein function in unpredictable ways, and arduous sequence-wide saturation mutagenesis and mutant characterization studies are required to ascertain these effects.
Instead, many recent studies have focused on other, more readily obtained, measures that may approximate functional density. For example, protein-protein interactions presumably constrain interfacial residues, and some reports indicate that highly interactive proteins evolve slowly (4). The intuition that a protein's overall functional importance should amplify the fitness costs of mutations at sites that make subtle functional contributions has been captured in analyses of how a gene's functional category (5, 6), its essentiality for organism survival (6-8), or the fitness effect of its deletion (or "dispensability") (9, 10) correlate with evolutionary rate. In all cases, the effects under consideration explain only a small fraction (≈5% or less) of the observed variation in evolutionary...
A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.
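The principal component regression described in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name, the synthetic data, and the choice to standardize predictors before the decomposition are all assumptions made for illustration. The idea is that correlated, noisy predictors are first rotated into uncorrelated principal components, and the response is then regressed on those components, so the variance explained by each component can be read off separately.

```python
import numpy as np

def principal_component_regression(X, y):
    """Regress y on the principal components of standardized predictors X.

    Hypothetical helper for illustration. Assumes every column of X has
    nonzero variance and that X has more rows than columns.
    Returns (loadings, coefficients, r2_per_component).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize predictors and center the response.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = y - y.mean()

    # PCA via SVD: rows of Vt are the component loadings, and U * S are
    # the component scores, ordered by decreasing predictor variance.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S

    # Scores are mutually orthogonal, so each regression coefficient
    # can be computed independently of the others.
    coefs = scores.T @ yc / S**2

    # Fraction of the response variance explained by each component.
    r2 = coefs**2 * S**2 / (yc @ yc)
    return Vt, coefs, r2

# Synthetic example (assumed data): three noisy measurements of one hidden
# variable, and a response that decreases with that hidden variable.
rng = np.random.default_rng(0)
hidden = rng.normal(size=500)
predictors = np.column_stack(
    [hidden + 0.5 * rng.normal(size=500) for _ in range(3)]
)
rate = -hidden + 0.5 * rng.normal(size=500)

loadings, coefs, r2 = principal_component_regression(predictors, rate)
print("variance in rate explained by each component:", np.round(r2, 3))
print("loadings of the first component:", np.round(loadings[0], 3))
```

In a setup like this, where the predictors are noisy readouts of one shared variable, the first component is expected to carry most of the explained variance, which mirrors the "single dominant variable" pattern the abstract reports. Because the components are orthogonal, the per-component values sum to the R² of an ordinary least-squares fit on all predictors.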
Antiretroviral therapy can reduce human immunodeficiency virus type 1 (HIV-1) viremia to below the detection limit of ultrasensitive clinical assays (50 copies of HIV-1 RNA/ml). However, latent HIV-1 persists in resting CD4+ T cells, and low residual levels of free virus are found in the plasma. Limited characterization of this residual viremia has been done because of the low number of virions per sample. Using intensive sampling, we analyzed residual viremia and compared these viruses to latent proviruses in resting CD4+ T cells in peripheral blood. For each patient, we found some viruses in the plasma that were identical to viruses in resting CD4+ T cells by pol gene sequencing. However, in a majority of patients, the most common viruses in the plasma were rarely found in resting CD4+ T cells even when the resting cell compartment was analyzed with assays that detect replication-competent viruses. Despite the large diversity of pol sequences in resting CD4+ T cells, the residual viremia was dominated by a homogeneous population of viruses with identical pol sequences. In the most extensively studied case, a predominant plasma sequence was also found in analysis of the env gene, and linkage by long-distance reverse transcriptase PCR established that these predominant plasma sequences represented a single predominant plasma virus clone. The predominant plasma clones were released for months to years without evident sequence change. Thus, in some patients on antiretroviral therapy, the major mechanism for residual viremia involves prolonged production of a small number of viral clones without evident evolution, possibly by cells other than circulating CD4+ T cells.
Treatment of human immunodeficiency virus type 1 (HIV-1) infection with highly active antiretroviral therapy (HAART) reduces viremia to below the detection limit of ultrasensitive clinical assays (15,16,37).
However, HIV-1 persists in resting CD4+ T cells (6,8,9,12,51) and possibly other reservoirs (4, 58). The latent reservoir in resting CD4+ T cells has a long half-life (11,41,44,47,56) that will likely preclude virus eradication unless novel approaches (5, 24-28, 42) can purge latently infected cells.
In patients on HAART, HIV-1 persistence is evidenced not only by the latent reservoir in resting CD4+ T cells but also by free virus in the plasma (10,17,19,36,41,48,52). Free virions can be found with special methods, even in patients who do not have clinically detectable viremia (10,18,19,36,52). Given the short half-life of free virus (20,49), this residual viremia indicates active virus production. This virus production may reflect low-level ongoing replication that continues despite HAART (7,10,13,14,18,21,33,48,56) and/or release of virus from latently infected cells that become activated (19,22,34,48,55) or from other stable cellular reservoirs (4, 58). The characterization of residual viremia may provide a means for determining the importance of different mechanisms of viral persistence.
Although the presence of free virus can be detected ...
To determine whether genes retain ancestral functions over a billion years of evolution and to identify principles of deep evolutionary divergence, we replaced 414 essential yeast genes with their human orthologs, assaying for complementation of lethal growth defects upon loss of the yeast genes. Nearly half (47%) of the yeast genes could be successfully humanized. Sequence similarity and expression only partly predicted replaceability. Instead, replaceability depended strongly on gene modules: genes in the same process tended to be similarly replaceable (e.g., sterol biosynthesis) or not (e.g., DNA replication initiation). Simulations confirmed selection for specific function can maintain replaceability despite extensive sequence divergence. Critical ancestral functions of many essential genes are thus retained in a pathway-specific manner, robust to drift in sequences, splicing, and protein interfaces.
Errors in protein synthesis disrupt cellular fitness, cause disease phenotypes, and shape gene and genome evolution. Experimental and theoretical results on this topic have accumulated rapidly in disparate fields such as neurobiology, protein biosynthesis and degradation, and molecular evolution, yet with limited communication between disciplines. Here, we review studies of error frequencies, their cellular and organismal consequences, and attendant long-range evolutionary responses. Measurements of error frequencies, from transcription through protein folding, remain in their infancy; we emphasize major areas where little is known, such as the failure rate of protein folding, or where technological innovations may enable imminent gains, such as translational missense error frequencies. Evolutionary responses to errors fall into two broad categories: adaptations that minimize errors and their attendant costs, and adaptations which exploit errors for the organism’s benefit. Given this wide spectrum of effects, it may be more useful to refer to synthesis outcomes as beneficial and deleterious rather than correct and erroneous.