Background: Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. Results: We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic “tree of life”. Landscapes of GC content also correlated strongly with landscapes of amino acid usage: positively with glycine, alanine, arginine and proline, and negatively with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. Conclusions: Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions. Electronic supplementary material: The online version of this article (10.1186/s12862-019-1469-1) contains supplementary material, which is available to authorized users.
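The sliding window idea mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, the window of three genes and the step of one gene are all assumptions made for illustration, and the abstract does not state the window size actually used. The sketch computes the GC fraction of each mRNA and then averages consecutive genes, assumed to be listed in chromosomal order, to obtain a smoothed landscape.

```python
from typing import List, Sequence


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def sliding_window_mean(values: Sequence[float], window: int) -> List[float]:
    """Average each run of `window` consecutive values, moving one gene at a time."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the entering value, drop the leaving one.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


# Hypothetical toy mRNA sequences, assumed to be in chromosomal order.
genes = ["ATGC", "GGCC", "ATAT", "GCGC", "AATT"]
gc_values = [gc_fraction(s) for s in genes]

# Window size of 3 genes is an illustrative assumption, not the paper's setting.
landscape = sliding_window_mean(gc_values, window=3)
```

Plotting `landscape` against gene index would give a one-dimensional GC landscape; the same averaging could in principle be applied to any per-gene value, such as a protein divergence score.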
Background: Approximately 1000 protein-encoding genes common to vertebrates are still unannotated in avian genomes. Are these genes evolutionarily lost, or have they not yet been found for technical reasons? Using genome landscapes as a tool to visualize large-scale regional effects of genome evolution, we reexamined this question. Results: On the basis of gene annotation in non-avian vertebrate genomes, we established a list of 15,135 common vertebrate genes. Of these, 1026 were not found in any of eight examined bird genomes. Visualizing regional genome effects by our sliding window approach showed that the majority of these “missing” genes cluster in 14 regions of the human reference genome. In these clusters, an additional 1517 genes (often gene fragments) were underrepresented in bird genomes. The clusters of “missing” genes coincided with regions of very high GC content, particularly in avian genomes, making them “hidden” because of incomplete sequencing. Moreover, proteins encoded by genes in these sequencing-refractory regions showed signs of accelerated protein evolution. As a proof of principle for this idea, we experimentally characterized the mRNA and protein products of four “hidden” bird genes that are crucial for energy homeostasis in skeletal muscle: ALDOA, ENO3, PYGM and SLC2A4. Conclusions: At least part of the “missing” genes in bird genomes can be attributed to an artifact caused by the difficulty of sequencing regions with extreme GC% (“hidden” genes). Biologically, these “hidden” genes are of interest as they encode proteins that evolve more rapidly than the genome-wide average. Finally, we show that four of these “hidden” genes encode key proteins for energy metabolism in flight muscle.
Background: Different types of proteins diverge at vastly different rates. Moreover, the same type of protein has been observed to evolve at different rates in different phylogenetic lineages. In the present study, we measured the rates of protein evolution in Eutheria (placental mammals) and Metatheria (marsupials) on a genome-wide basis, and we propose that the position of a gene in the genome landscape has an important influence on the rate of protein divergence. Results: We analyzed a protein-encoding gene set (n = 15,727) common to 16 mammals (12 Eutheria and 4 Metatheria). Using sliding windows that averaged regional effects of protein divergence, we constructed landscapes in which strong and lineage-specific regional effects were seen on the molecular clock rate of protein divergence. Within each lineage, the relatively high rates were preferentially found in subtelomeric chromosomal regions. Such regions were observed to contain important and well-studied loci for fetal growth, uterine function and the generation of diversity in the adaptive repertoire of immunoglobulins. Conclusions: A genome landscape approach visualizes lineage-specific regional differences between Eutherian and Metatherian rates of protein evolution. This effect of chromosomal position is a new element that explains at least part of the lineage-specific effects and of the differences between proteins in molecular clock rates.