Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools, as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results: The program described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply it to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions: The program significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
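For readers unfamiliar with the reciprocal best alignment heuristic named in the abstract above, the following is a minimal sketch of its basic, unextended form. It is not the program's implementation, and the abstract does not specify what the extension adds. The hit tuples, identifiers (`a1`, `b1`, and so on), scores, and function names are all hypothetical and chosen only for illustration: two proteins are paired when each is the other's highest-scoring alignment partner across the two proteomes.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, alignment_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Keep the first hit seen, or replace it with a strictly better one.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, (b, _score) in best_ab.items():
        if b in best_ba and best_ba[b][0] == a:
            pairs.append((a, b))
    return sorted(pairs)


# Toy alignment results between two made-up proteomes A and B.
a_vs_b = [
    ("a1", "b1", 250.0),
    ("a1", "b2", 90.0),
    ("a2", "b2", 180.0),
    ("a3", "b1", 120.0),
]
b_vs_a = [
    ("b1", "a1", 245.0),
    ("b2", "a2", 175.0),
    ("b2", "a1", 95.0),
    ("b3", "a3", 60.0),
]

example_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(example_pairs)
```

In this toy input, `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` is left unpaired. Keeping only one best hit per query, rather than a full all-against-all score matrix, is one simple way to avoid the quadratic memory use the abstract mentions, though how the actual tool achieves its savings is not stated here.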
It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Growing evidence shows that epigenetic mechanisms contribute to complex traits, with implications across many fields of biology. In plant ecology, recent studies have attempted to merge ecological experiments with epigenetic analyses to elucidate the contribution of epigenetics to plant phenotypes, stress responses, adaptation to habitat, and range distributions. While there has been some progress in revealing the role of epigenetics in ecological processes, studies with non-model species have so far been limited to describing broad patterns based on anonymous markers of DNA methylation. In contrast, studies with model species have benefited from powerful genomic resources, which contribute to a more mechanistic understanding but have limited ecological realism. Understanding the significance of epigenetics for plant ecology requires increased transfer of knowledge and methods from model species research to genomes of evolutionarily divergent species, and examination of responses to complex natural environments at a more mechanistic level. This requires transforming genomics tools specifically for studying non-model species, which is challenging given the large and often polyploid genomes of plants. Collaboration among molecular geneticists, ecologists and bioinformaticians promises to enhance our understanding of the mutual links between genome function and ecological processes.
Vault RNAs (vtRNAs) are small, about 100 nt long, polymerase III transcripts contained in the vault particles of eukaryotic cells. Presumably due to their enigmatic function, they have received little attention compared with most other noncoding RNA (ncRNA) families. Their poor sequence conservation makes homology search a complex and tedious task even within vertebrates. Here we report on a systematic and comprehensive analysis of this rapidly evolving class of ncRNAs in deuterostomes, providing a comprehensive collection of computationally predicted vtRNA genes. We find that all previously described vtRNAs are located at a conserved genomic locus linked to the protocadherin gene cluster, an association that is conserved throughout gnathostomes. Lineage-specific expansions to small vtRNA gene clusters are frequently observed in this region. A second vtRNA locus is syntenically conserved across eutherian mammals. The vtRNAs at the two eutherian loci exhibit substantial differences in their promoter structures, explaining their differential expression patterns in several human cancer cell lines. In teleosts, expression of several paralogous vtRNA genes, most but not all located at the syntenically conserved protocadherin locus, was verified by reverse transcriptase-polymerase chain reaction.
The cell cycle genes homology region (CHR) has been identified as a DNA element with an important role in transcriptional regulation of late cell cycle genes. It has been shown that such genes are controlled by DREAM, MMB and FOXM1-MuvB and that these protein complexes can contact DNA via CHR sites. However, it has not been elucidated which sequence variations of the canonical CHR are functional and how frequent CHR-based regulation is utilized in mammalian genomes. Here, we define the spectrum of functional CHR elements. As the basis for a computational meta-analysis, we identify new CHR sequences and compile phylogenetic motif conservation as well as genome-wide protein-DNA binding and gene expression data. We identify CHR elements in most late cell cycle genes binding DREAM, MMB, or FOXM1-MuvB. In contrast, Myb- and forkhead-binding sites are underrepresented in both early and late cell cycle genes. Our findings support a general mechanism: sequential binding of DREAM, MMB and FOXM1-MuvB complexes to late cell cycle genes requires CHR elements. Taken together, we define the group of CHR-regulated genes in mammalian genomes and provide evidence that the CHR is the central promoter element in transcriptional regulation of late cell cycle genes by DREAM, MMB and FOXM1-MuvB.
Growing evidence makes a strong case that epigenetic mechanisms contribute to complex traits, with implications across many fields of biology from dissecting developmental processes to understanding aspects of human health and disease. In ecology, recent studies have merged ecological experimental design with epigenetic analyses to elucidate the contribution of epigenetics to plant phenotypes, stress response, adaptation to habitat, or species range distributions. While there has been some progress in revealing the role of epigenetics in ecological processes, many studies with non-model species have so far been limited to describing broad patterns based on anonymous markers of DNA methylation. In contrast, studies with model species have benefited from powerful genomic resources, which allow for a more mechanistic understanding but have limited ecological realism. To understand the true significance of epigenetics for plant ecology and evolution, we must combine both approaches, transferring knowledge and methods from model-species research to genomes of evolutionarily divergent species, and examining responses to complex natural environments at a more mechanistic level. This requires transforming genomics tools specifically for studying non-model species, which is challenging given the large and often polyploid genomes of plants. Collaboration between molecular epigeneticists, ecologists and bioinformaticians promises to enhance our understanding of the mutual links between genome function and ecological processes.
Motivation: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate resulting in sometimes misleading signals in phylogenetic reconstruction.