Coding sequence evolution was once thought to be the result of selection on optimal protein function alone. Selection can, however, also act at the RNA level, for example, to facilitate rapid translation or ensure correct splicing. Here, we ask whether the way DNA works also imposes constraints on coding sequence evolution. We identify nucleosome positioning as a likely candidate to set up such a DNA-level selective regime and use high-resolution microarray data in yeast to compare the evolution of coding sequence bound to or free from nucleosomes. Controlling for gene expression and intra-gene location, we find a nucleosome-free “linker” sequence to evolve on average 5–6% slower at synonymous sites. A reduced rate of evolution in linker is especially evident at the 5′ end of genes, where the effect extends to non-synonymous substitution rates. This is consistent with regular nucleosome architecture in this region being important in the context of gene expression control. As predicted, codons likely to generate a sequence unfavourable to nucleosome formation are enriched in linker sequence. Amino acid content is likewise skewed as a function of nucleosome occupancy. We conclude that selection operating on DNA to maintain correct positioning of nucleosomes impacts codon choice, amino acid choice, and synonymous and non-synonymous rates of evolution in coding sequence. The results support the exclusion model for nucleosome positioning and provide an alternative interpretation for runs of rare codons. As the intimate association of histones and DNA is a universal characteristic of genic sequence in eukaryotes, selection on coding sequence composition imposed by nucleosome positioning should be phylogenetically widespread.
In Drosophila melanogaster, synonymous codons corresponding to the most abundant cognate tRNAs are used more frequently, especially in highly expressed genes. Increased use of such "optimal" codons is considered an adaptation for translational efficiency. Need it always be the case that selection should favor the use of a translationally optimal codon? Here, we investigate one possible confounding factor, namely, the need to specify information in exons necessary to enable correct splicing. As expected from such a model, in Drosophila many codons show different usage near intron-exon boundaries versus exon core regions. This finding is, in principle, also consistent with Hill-Robertson effects modulating usage of translationally optimal codons. Nevertheless, several results support the splice model over the translational selection model: 1) the trends in codon usage are strikingly similar to those in mammals, in which codon usage near boundaries correlates with abundance in exonic splice enhancers (ESEs), 2) codons preferred near boundaries tend to be enriched for A and avoid C (conversely, those avoided near boundaries prefer C rather than A), as expected were ESEs involved, and 3) codons preferred near boundaries are typically not translationally optimal. We conclude that usage of translationally optimal codons is compromised in the vicinity of splice junctions in intron-containing genes, such that we observe higher levels of usage of translationally optimal codons at the center of exons. On the gene level, however, controlling for known correlates of codon bias, the impact on codon usage patterns is quantitatively small. These results have implications for inferring aspects of the mechanism of splicing given nothing more than a well-annotated genome.
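The comparison of codon usage near intron-exon boundaries with usage in exon cores can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `flank_codons` cutoff, and the assumption that every exon is supplied in frame (no phase handling) are hypothetical choices made for illustration only.

```python
from collections import Counter

def codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def boundary_vs_core_counts(exons, flank_codons=10):
    """Count codons in exon flanks versus exon cores.

    exons: list of exon sequences, assumed to start in frame (hypothetical input).
    flank_codons: codons at each exon end treated as "near boundary" (assumed cutoff).
    """
    flank, core = Counter(), Counter()
    for exon in exons:
        cs = codons(exon)
        if len(cs) <= 2 * flank_codons:
            # Exon too short to have a core: every codon counts as flank.
            flank.update(cs)
            continue
        flank.update(cs[:flank_codons])
        flank.update(cs[-flank_codons:])
        core.update(cs[flank_codons:-flank_codons])
    return flank, core

def relative_usage(counts, codon_family):
    """Fraction of each synonymous codon within one amino acid's codon family."""
    total = sum(counts[c] for c in codon_family)
    return {c: (counts[c] / total if total else 0.0) for c in codon_family}
```

For example, calling `relative_usage(flank, ["AAA", "AAG"])` and `relative_usage(core, ["AAA", "AAG"])` gives the share of each lysine codon in the two regions. A real analysis would also need to control for expression level and other known correlates of codon bias, as the abstract notes.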
Integrating genome-scale sequence, expression, structural and protein interaction data from E. coli, we establish an interaction between chaperone (GroEL) dependency and optimal codon usage. Highly expressed sporadic substrates of GroEL employ more optimal codons than expected, show enrichment for optimal codons at structurally sensitive sites and greater conservation of codon optimality under conditions of relaxed purifying selection. We suggest that highly expressed genes cannot routinely utilize GroEL for error control so that codon usage has evolved to provide complementary error limitation, whereas obligate GroEL substrates experience relaxed selection on codon usage. Our results support a critical role of misfolding prevention in gene evolution.
Nucleosomes in eukaryotes act as platforms for the dynamic integration of epigenetic information. Posttranslational modifications are reversibly added or removed and core histones exchanged for paralogous variants, in concert with changing demands on transcription and genome accessibility. Histones are also common in archaea. Their role in genome regulation, however, and the capacity of individual paralogs to assemble into histone–DNA complexes with distinct properties remain poorly understood. Here, we combine structural modeling with phylogenetic analysis to shed light on archaeal histone paralogs, their evolutionary history, and capacity to generate combinatorial chromatin states through hetero-oligomeric assembly. Focusing on the human commensal Methanosphaera stadtmanae as a model archaeal system, we show that the heteromeric complexes that can be assembled from its seven histone paralogs vary substantially in DNA binding affinity and tetramer stability. Using molecular dynamics simulations, we go on to identify unique paralogs in M. stadtmanae and Methanobrevibacter smithii that are characterized by unstable interfaces between dimers. We propose that these paralogs act as capstones that prevent stable tetramer formation and extension into longer oligomers characteristic of model archaeal histones. Importantly, we provide evidence from phylogeny and genome architecture that these capstones, as well as other paralogs in the Methanobacteriales, have been maintained for hundreds of millions of years following ancient duplication events. Taken together, our findings indicate that at least some archaeal histone paralogs have evolved to play distinct and conserved functional roles, reminiscent of eukaryotic histone variants. We conclude that combinatorially complex histone-based chromatin is not restricted to eukaryotes and likely predates their emergence.
It has long been known that methylated cytosines deaminate at higher rates than unmodified cytosines and constitute mutational hotspots in mammalian genomes. The repertoire of naturally occurring cytosine modifications, however, extends beyond 5-methylcytosine to include its oxidation derivatives, notably 5-hydroxymethylcytosine. The effects of these modifications on sequence evolution are unknown. Here, we combine base-resolution maps of methyl- and hydroxymethylcytosine in human and mouse with population genomic, divergence and somatic mutation data to show that hydroxymethylated and methylated cytosines show distinct patterns of variation and evolution. Surprisingly, hydroxymethylated sites are consistently associated with elevated C to G transversion rates at the level of segregating polymorphisms, fixed substitutions, and somatic mutations in tumors. Controlling for multiple potential confounders, we find derived C to G SNPs to be 1.43-fold (1.22-fold) more common at hydroxymethylated sites compared to methylated sites in human (mouse). Increased C to G rates are evident across diverse functional and sequence contexts and, in cancer genomes, correlate with the expression of Tet enzymes and specific components of the mismatch repair pathway (MSH2, MSH6, and MBD4). Based on these and other observations we suggest that hydroxymethylation is associated with a distinct mutational burden and that the mismatch repair pathway is implicated in causing elevated transversion rates at hydroxymethylated cytosines.
When a duplicate gene has no apparent loss-of-function phenotype, it is commonly considered that the phenotype has been masked as a result of functional redundancy with the remaining paralog. This is supported by indirect evidence showing that multi-copy genes show loss-of-function phenotypes less often than single-copy genes and by direct tests of phenotype masking using select gene sets. Here we take a systematic genome-wide RNA interference approach to assess phenotype masking in paralog pairs in the Caenorhabditis elegans genome. Remarkably, in contrast to expectations, we find that phenotype masking makes only a minor contribution to the low knockdown phenotype rate for duplicate genes. Instead, we find that non-essential genes are highly over-represented among duplicates, leading to a low observed loss-of-function phenotype rate. We further find that duplicate pairs derived from essential and non-essential genes have contrasting evolutionary dynamics: whereas non-essential genes are both more often successfully duplicated (fixed) and lost, essential genes are less often duplicated but upon successful duplication are maintained over longer periods. We expect the fundamental evolutionary duplication dynamics presented here to be broadly applicable.
Background: In mammals, splice-regulatory domains impose marked trends on the relative abundance of certain amino acids near exon-intron boundaries. Is this a mammalian particularity or symptomatic of exonic splicing regulation across taxa? Are such trends more common in species that a priori have a harder time identifying exon ends, that is, those with pre-mRNA rich in intronic sequence? We address these questions surveying exon composition in a sample of phylogenetically diverse genomes.