Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.
Several demographic and selective events occurred during the domestication of wheat from the allotetraploid wild emmer (Triticum turgidum ssp. dicoccoides). Cultivated wheat has since been affected by other historical events. We analyzed nucleotide diversity at 21 loci in a sample of 101 individuals representing four taxa corresponding to representative steps in the recent evolution of wheat (wild, domesticated, cultivated durum, and bread wheats) to unravel the evolutionary history of cultivated wheats and to quantify its impact on genetic diversity. Sequence relationships are consistent with a single domestication event and identify two genetically distinct groups of bread wheat. The wild group is not highly polymorphic, with only 212 polymorphic sites among the 21,720 bp sequenced, and diversity was further reduced in cultivated forms during domestication (by 69% in bread wheat and 84% in durum wheat), with considerable differences between loci, some of which retain no polymorphism at all. Coalescent simulations were performed and compared with our data to estimate the intensity of the bottlenecks associated with domestication and subsequent selection. Based on our 21-locus analysis, the average intensity of the domestication bottleneck was estimated at about 3, giving the domesticated form a population size about one third that of wild dicoccoides. The most severe bottleneck, with an intensity of about 6, occurred in the evolution of durum wheat. We investigated whether some of the genes departed from the empirical distribution of most loci, which would suggest that they might have been selected during domestication or breeding. We detected a departure from the null model of a demographic bottleneck for the hypothetical gene HgA. However, the atypical pattern of polymorphism at this locus might reflect selection on the linked locus Gsp1A, which may affect grain softness, an important trait for end-use quality in wheat.
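The diversity figures above (polymorphic sites per sequenced length, and percentage losses in cultivated forms) can be made concrete with a small calculation. The abstract does not say which diversity estimator was used; the sketch below assumes Watterson's estimator, θ_W = S / (a_n · L) with a_n = Σ_{i=1}^{n-1} 1/i, as one standard choice. The function names are hypothetical, and the sample size n = 20 in the usage block is an illustrative assumption, not a value from the study.

```python
def watterson_theta_per_site(segregating_sites: int, length_bp: int, n_sequences: int) -> float:
    """Watterson's estimator per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    if length_bp <= 0:
        raise ValueError("length must be positive")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * length_bp)


def percent_reduction(theta_reference: float, theta_derived: float) -> float:
    """Percentage of the reference group's diversity lost in the derived group."""
    return 100.0 * (1.0 - theta_derived / theta_reference)


if __name__ == "__main__":
    # S = 212 and L = 21,720 bp come from the abstract; n = 20 is a
    # hypothetical sample size used only for illustration.
    theta_wild = watterson_theta_per_site(212, 21_720, 20)
    print(f"theta_W per site (wild, n = 20 assumed): {theta_wild:.5f}")
    # A cultivated group retaining 31% of wild diversity corresponds to a 69% reduction.
    print(f"reduction: {percent_reduction(theta_wild, 0.31 * theta_wild):.0f}%")
```

Expressing the loss as a ratio of per-site estimates makes the comparison independent of how many base pairs were sequenced, although in practice the sample size per group also enters through a_n.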
Theoretical and empirical comparisons of molecular diversity in selfing and outcrossing plants have primarily focused on the long-term consequences of differences in mating system (between species). However, improving our understanding of the causes of mating system evolution requires ecological and genetic studies of the early stages of mating system transition. Here, we examine nuclear and chloroplast DNA sequences and microsatellite variation in a large sample of populations of Arabidopsis lyrata from the Great Lakes region of eastern North America that show intra- and interpopulation variation in the degree of self-incompatibility and in realized outcrossing rates. Populations show strong geographic clustering irrespective of mating system, suggesting that selfing either evolved multiple times or has spread to multiple genetic backgrounds. Diversity is reduced in selfing populations, but not to the extent of the severe loss of variation expected if selfing evolved through selection for reproductive assurance in connection with strong founder events. The spread of self-compatibility in this region may have been favored because colonization bottlenecks following glaciation or migration from Europe reduced standing levels of inbreeding depression. However, our results do not suggest a single transition to selfing in this system, as has been suggested for some other species in the Brassicaceae.
Keywords: Arabidopsis, bottlenecks, breakdown of self-incompatibility, demography, effective population size, inbreeding depression, mating system evolution, population genetics.
Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized yet extensible code, typically distributed as libraries or packages. This motivated the Bio++ project, which aims to develop a set of C++ libraries for sequence analysis, phylogenetics, population genetics, and molecular evolution. The main strength of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising the computational efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing read libraries. More complex models of sequence evolution, such as mixture models and generic n-tuple alphabets, are also included.
The extent to which both positive and negative selection vary across different portions of plant genomes remains poorly understood. Here, we sequence whole genomes of 13 Capsella grandiflora individuals and quantify the amount of selection across the genome. Using an estimate of the distribution of fitness effects, we show that selection is strong in coding regions but weak in most noncoding regions, with the exception of 5′ and 3′ untranslated regions (UTRs). However, estimates of selection on noncoding regions conserved across the Brassicaceae family show strong signals of selection. Additionally, we see reductions in neutral diversity around functional substitutions in both coding and conserved noncoding regions, indicating recent selective sweeps at these sites. Finally, using expression data from leaf tissue, we show that more highly expressed genes experience stronger negative selection than lowly expressed genes but comparable levels of positive selection. Overall, we observe widespread positive and negative selection in coding and regulatory regions, but our results also suggest that both positive and negative selection on plant noncoding sequence are considerably rarer than in animal genomes.
Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used is the homology-based method implemented in the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge is identifying the different TE copies that correspond to the identified hits. This step is essential for any evolutionary or comparative analysis of the different copies within a family. Several situations can lead to multiple hits corresponding to a single copy of an element, such as the presence of large deletions or insertions or of undetermined bases, and distinct consensus sequences corresponding to a single full-length element (as for long terminal repeat (LTR) retrotransposons). These situations must be taken into account to determine the exact number of TE copies.
Results: We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) whose TE content has already been largely described and which differ greatly in genome size, TE content, and TE families.
Conclusions: Our tool provides access to detailed information on the TE content of a genome at the family level from the RepeatMasker .out file. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element.
In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
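The core idea of turning RepeatMasker hits into copies can be illustrated with a short parser. This is a minimal sketch, not the Perl tool described above: it assumes the standard whitespace-separated .out layout (query name in column 5, query begin/end in columns 6 and 7, strand in column 9, repeat name and class/family in columns 10 and 11, and RepeatMasker's own ID in column 15), and it simply merges hits that share a query and ID. The published tool applies further rules (for example, joining LTR and internal consensus fragments) that are not reproduced here. All class and function names, and the toy records in `SAMPLE`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Hit:
    query: str
    begin: int
    end: int
    strand: str  # "+" or "C" (complement) in RepeatMasker output
    repeat: str
    family: str
    hit_id: str


@dataclass
class Copy:
    query: str
    hit_id: str
    strand: str  # taken from the first fragment seen
    begin: int
    end: int
    repeats: List[str] = field(default_factory=list)
    fragments: int = 0


def parse_out_lines(lines: Iterable[str]) -> List[Hit]:
    """Parse data rows of a .out file (assumed column layout; see lead-in)."""
    hits = []
    for line in lines:
        fields = line.split()
        # Data rows have >= 15 columns and start with an integer SW score;
        # header and blank lines fail this test and are skipped.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        hits.append(Hit(query=fields[4], begin=int(fields[5]), end=int(fields[6]),
                        strand=fields[8], repeat=fields[9], family=fields[10],
                        hit_id=fields[14]))
    return hits


def group_into_copies(hits: Iterable[Hit]) -> List[Copy]:
    """Merge hits sharing (query, ID) into one copy spanning all fragments."""
    copies: Dict[Tuple[str, str], Copy] = {}
    for h in hits:
        key = (h.query, h.hit_id)
        c = copies.get(key)
        if c is None:
            c = Copy(query=h.query, hit_id=h.hit_id, strand=h.strand,
                     begin=h.begin, end=h.end)
            copies[key] = c
        c.begin = min(c.begin, h.begin)
        c.end = max(c.end, h.end)
        c.fragments += 1
        if h.repeat not in c.repeats:
            c.repeats.append(h.repeat)
    return sorted(copies.values(), key=lambda c: (c.query, c.begin))


# Toy records (not real data): two fragments share ID 1, one hit has ID 2.
SAMPLE = """\
   SW   perc perc perc  query    position in query    matching  repeat       position in repeat
score   div. del. ins.  sequence begin  end  (left)   repeat    class/family begin  end (left)  ID

  1200   5.1  0.0  1.2  chr1       101   600 (9400) + ToyLTR    LTR/Gypsy        1    500  (0)   1
  3400   4.0  0.5  0.3  chr1       601  2600 (7400) + ToyInt    LTR/Gypsy        1   2000  (0)   1
   900   8.2  1.0  0.0  chr1      5001  5400 (4600) C ToyDNA    DNA/hAT       (10)    400    1   2
"""

if __name__ == "__main__":
    for c in group_into_copies(parse_out_lines(SAMPLE.splitlines())):
        print(c.query, c.begin, c.end, c.strand, "+".join(c.repeats), c.fragments)
```

Relying on the ID column keeps the sketch short; a fuller implementation would also check the distance between fragments and their positions within the consensus before merging them.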
Self-fertilization is hypothesized to be an evolutionary dead end because reversion to outcrossing rarely occurs, and selfing lineages are thought to go extinct rapidly because of limited potential for adaptation and/or the accumulation of deleterious mutations. We tested these two assumptions by combining morphological characters and molecular-evolution analyses in a tribe of hermaphroditic grasses (Triticeae). First, we determined the mating system of the 19 studied species. Then, we sequenced 27 protein-coding loci and compared base composition and substitution patterns between selfers and outcrossers. We found that the evolution of the mating system is best described by a model including only outcrossing-to-selfing transitions. At the molecular level, we showed that regions of low recombination exhibit signatures of relaxed selection. However, we did not detect any evidence of accumulation of nonsynonymous substitutions in selfers compared to outcrossers. Additionally, we tested for potential deleterious effects of GC-biased gene conversion in outcrossing species. We found that recombination, and not the mating system, affected substitution patterns and base composition. We suggest that, in Triticeae, although recombination patterns have remained stable, selfing lineages are of recent origin and inbreeding may have persisted for too short a time for differences between the two mating systems to evolve.
Keywords: Biased gene conversion, effective population size, mating system, protein evolution, recombination, selection efficiency, substitution rate.
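Base-composition comparisons of this kind are often summarized by GC content at each codon position, with third positions (GC3) being the most sensitive to GC-biased gene conversion because they are largely free of amino-acid constraints. The abstract does not state which composition statistic was used, so the sketch below is an illustration under that assumption only. It assumes an in-frame coding sequence whose length is a multiple of three and skips ambiguous bases; the function name is hypothetical.

```python
from typing import Tuple


def gc_content_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return (GC1, GC2, GC3) for an in-frame coding sequence."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # counts[pos] = [G+C count, unambiguous A/C/G/T count] at codon position pos
    counts = [[0, 0], [0, 0], [0, 0]]
    for i, base in enumerate(seq):
        if base not in "ACGT":
            continue  # skip N and other ambiguity codes
        pos = i % 3
        counts[pos][1] += 1
        if base in "GC":
            counts[pos][0] += 1
    gc1, gc2, gc3 = (gc / total if total else float("nan") for gc, total in counts)
    return gc1, gc2, gc3


if __name__ == "__main__":
    print(gc_content_by_codon_position("ATGGCCAAAGGG"))
```

Comparing GC3 between selfing and outcrossing species, or between high- and low-recombination genes, is one simple way to look for the composition signal that gBGC is expected to leave.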