To take complete advantage of information on within-species polymorphism and divergence from close relatives, one needs to know the rate and the molecular spectrum of spontaneous mutations. To this end, we have searched for de novo spontaneous mutations in the complete nuclear genomes of five Arabidopsis thaliana mutation accumulation lines that had been maintained by single-seed descent for 30 generations. We identified and validated 99 base substitutions and 17 small and large insertions and deletions. Our results imply a spontaneous mutation rate of 7 × 10⁻⁹ base substitutions per site per generation, the majority of which are G:C→A:T transitions. We explain this very biased spectrum of base substitution mutations as a result of two main processes: deamination of methylated cytosines and ultraviolet light–induced mutagenesis.
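The relationship between the counts and the rate quoted above can be sketched as a simple pooled estimate: mutations divided by the product of lines, generations, and sites surveyed per line. This is a minimal sketch and not the authors' actual pipeline. The function names are hypothetical, and the abstract does not state how many sites were surveyed, so the second function only back-calculates the number of sites that the reported figures would imply under this simplified formula.

```python
def per_site_rate(mutations, lines, generations, sites_per_line):
    """Pooled estimate: mutations per site per generation.

    Assumes (simplification) that every line was surveyed at the same
    number of sites and propagated for the same number of generations.
    """
    return mutations / (lines * generations * sites_per_line)


def implied_sites(mutations, lines, generations, rate):
    """Invert the estimate: sites per line implied by a reported rate."""
    return mutations / (lines * generations * rate)


# Figures taken from the abstract: 99 substitutions, 5 lines,
# 30 generations, 7e-9 substitutions per site per generation.
sites = implied_sites(99, 5, 30, 7e-9)   # illustrative back-calculation only
rate = per_site_rate(99, 5, 30, sites)   # recovers the reported rate
print(f"implied sites per line: {sites:.3g}")
print(f"rate: {rate:.1e}")
```

The back-calculated site count is only as meaningful as the simplification behind it; a real analysis would use the number of sites that actually passed coverage and quality filters in each line.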
Knowledge of mutation processes is central to understanding virtually all evolutionary phenomena and the underlying nature of genetic disorders and cancers. However, the limitations of standard molecular mutation detection methods have historically precluded a genomewide understanding of mutation rates and spectra in the nuclear genomes of multicellular organisms. We applied two high-throughput DNA sequencing technologies to identify and characterize hundreds of spontaneously arising base-substitution mutations in 10 Caenorhabditis elegans mutation-accumulation (MA)-line nuclear genomes. C. elegans mutation rate estimates were similar to previous calculations based on smaller numbers of mutations. Mutations were distributed uniformly within and among chromosomes and were not associated with recombination rate variation in the MA lines, suggesting that intragenomic variation in genetic hitchhiking and/or background selection are primarily responsible for the chromosomal distribution patterns of polymorphic nucleotides in C. elegans natural populations. A strong mutational bias from G/C to A/T nucleotides was detected in the MA lines, implicating oxidative DNA damage as a major endogenous mutagenic force in C. elegans. The observed mutational bias also suggests that the C. elegans nuclear genome cannot be at equilibrium because of mutation alone. Transversions dominate the spectrum of spontaneous mutations observed here, whereas transitions dominate patterns of allegedly neutral polymorphism in natural populations of C. elegans and many other animal species; this observation challenges the assumption that natural patterns of molecular variation in noncoding regions of the nuclear genome accurately reflect underlying mutation processes. Keywords: high-throughput DNA sequencing, mutation accumulation. Mutation is the fuel for evolution and the underlying cause of virtually all genetic diseases and cancers.
Accurate knowledge of the rate and spectrum of base-substitution mutation is essential to studying and understanding a variety of evolutionary phenomena, including rates of molecular evolution (1), estimating the effective population size from standing levels of neutral genetic variation (2), and evaluating assumptions underlying common tests of selection on DNA sequence (1, 3). Despite the important roles of base-substitution mutations in evolutionary studies and their impact on human health, direct knowledge of genome-wide base-substitution processes remains scarce. Because mutations occur extremely infrequently, the genomic rate and molecular spectrum of mutation have historically been indirectly inferred from either between-species divergence or standing genetic variation at loci thought to be evolving neutrally, or by extrapolation from estimates at a small handful of loci (4, 5). The former approach relies on the assumption of selective neutrality and might produce misleading results if the putatively neutral loci examined are in fact subject to selection or if the estimated times of divergence are inaccurate. The latt...
Knowledge of the genome-wide rate and spectrum of mutations is necessary to understand the origin of disease and the genetic variation driving all evolutionary processes. Here, we provide a genome-wide analysis of the rate and spectrum of mutations obtained in two Daphnia pulex genotypes via separate mutation-accumulation (MA) experiments. Unlike most MA studies that utilize haploid, homozygous, or self-fertilizing lines, D. pulex can be propagated ameiotically while maintaining a naturally heterozygous, diploid genome, allowing the capture of the full spectrum of genomic changes that arise in a heterozygous state. While base-substitution mutation rates are similar to those in other multicellular eukaryotes (about 4 × 10⁻⁹ per site per generation), we find that the rates of large-scale (>100 kb) de novo copy-number variants (CNVs) are significantly elevated relative to those seen in previous MA studies. The heterozygosity maintained in this experiment allowed for estimates of gene-conversion processes. While most of the conversion tract lengths we report are similar to those generated by meiotic processes, we also find larger tract lengths that are indicative of mitotic processes. Comparison of MA lines to natural isolates reveals that a majority of large-scale CNVs in natural populations are removed by purifying selection. The mutations observed here share similarities with disease-causing, complex, large-scale CNVs, thereby demonstrating that MA studies in D. pulex serve as a system for studying the processes leading to such alterations.
Photoreactivation, one of the first DNA repair pathways to evolve, is the direct reversal of premutagenic lesions caused by ultraviolet (UV) irradiation, catalyzed by photolyases in a light-dependent, single-enzyme reaction. It has been experimentally shown that photoreactivation prevents UV mutagenesis in a broad range of species. In the absence of photoreactivation, UV-induced photolesions are repaired by the more complex and much less efficient nucleotide excision repair pathway. Despite their obvious beneficial effects, several lineages, including placental mammals, lost photolyase genes during evolution. In this study, we ask why photolyase genes have been lost in those lineages and discuss the significance of these losses in the context of the evolution of the genomic mutation rates. We first perform an extensive phylogenomic analysis of the photolyase/cryptochrome family, to assess what species lack each kind of photolyase gene. Then, we estimate the ratio of nonsynonymous to synonymous substitution rates in several groups of photolyase genes, as a proxy of the strength of purifying natural selection, and we ask whether less evolutionarily constrained photolyase genes are more likely lost. We also review functional data and compare the efficiency of different kinds of photolyases. We find that eukaryotic photolyases are, on average, less evolutionarily constrained than eubacterial ones and that the strength of natural selection is correlated with the affinity of photolyases for their substrates. We propose that the loss of photolyase genes in eukaryotic species may be due to weak natural selection and may result in a deleterious increase of their genomic mutation rates. In contrast, the loss of photolyase genes in prokaryotes may not cause an increase in the mutation rate and be neutral in most cases.
The growing catalogue of structural variants in humans often overlooks inversions as one of the most difficult types of variation to study, even though they affect phenotypic traits in diverse organisms. Here, we have analysed in detail 90 inversions predicted from the comparison of two independently assembled human genomes: the reference genome (NCBI36/HG18) and HuRef. Surprisingly, we found that two thirds of these predictions (62) represent errors either in assembly comparison or in one of the assemblies, including 27 misassembled regions in HG18. Next, we validated 22 of the remaining 28 potential polymorphic inversions using different PCR techniques and characterized their breakpoints and ancestral state. In addition, we determined experimentally the derived allele frequency in Europeans for 17 inversions (DAF = 0.01-0.80), as well as the distribution in 14 worldwide populations for 12 of them based on the 1000 Genomes Project data. Among the validated inversions, nine have inverted repeats (IRs) at their breakpoints, and two show nucleotide variation patterns consistent with a recurrent origin. Conversely, inversions without IRs have a unique origin and almost all of them show deletions or insertions at the breakpoints in the derived allele mediated by microhomology sequences, which highlights the importance of mechanisms like FoSTeS/MMBIR in the generation of complex rearrangements in the human genome. Finally, we found several inversions located within genes and at least one candidate to be positively selected in Africa. Thus, our study emphasizes the importance of careful analysis and validation of large-scale genomic predictions to extract reliable biological conclusions.
Despite many years of study into inversions, very little is known about their functional consequences, especially in humans. A common hypothesis is that the selective value of inversions stems in part from their effects on nearby genes, although evidence of this in natural populations is almost nonexistent. Here we present a global analysis of a new 415-kb polymorphic inversion that is among the longest ones found in humans and is the first with clear position effects. This inversion is located in chromosome 19 and has been generated by non-homologous end joining between blocks of transposable elements with low identity. PCR genotyping in 541 individuals from eight different human populations allowed the detection of tag SNPs and inversion genotyping in multiple populations worldwide, showing that the inverted allele is mainly found in East Asia with an average frequency of 4.7%. Interestingly, one of the breakpoints disrupts the transcription factor gene ZNF257, causing a significant reduction in the total expression level of this gene in lymphoblastoid cell lines. RNA-Seq analysis of the effects of this expression change in standard homozygotes and inversion heterozygotes revealed distinct expression patterns that were validated by quantitative RT-PCR. Moreover, we have found a new fusion transcript that is generated exclusively from inverted chromosomes around one of the breakpoints. Finally, by the analysis of the associated nucleotide variation, we have estimated that the inversion was generated ~40,000–50,000 years ago and, while a neutral evolution cannot be ruled out, its current frequencies are more consistent with those expected for a deleterious variant, although no significant association with phenotypic traits has been found so far.
The spontaneous deamination of cytosine produces uracil mispaired with guanine in DNA, which will produce a mutation unless repaired. In all domains of life, uracil-DNA glycosylases (UDGs) are responsible for the elimination of uracil from DNA. Thus, UDGs contribute to the integrity of the genetic information and their loss results in mutator phenotypes. We are interested in understanding the role of UDG genes in the evolutionary variation of the rate and the spectrum of spontaneous mutations. To this end, we determined the presence or absence of the five main UDG families in more than 1,000 completely sequenced genomes and analyzed their patterns of gene loss and gain in eubacterial lineages. We observe nonindependent patterns of gene loss and gain between UDG families in Eubacteria, suggesting extensive functional overlap in an evolutionary timescale. Given that UDGs prevent transitions at G:C sites, we expected the loss of UDG genes to bias the mutational spectrum toward a lower equilibrium G + C content. To test this hypothesis, we used phylogenetically independent contrasts to compare the G + C content at intergenic and 4-fold redundant sites between lineages where UDG genes have been lost and their sister clades. None of the main UDG families present in Eubacteria was associated with a higher G + C content at intergenic or 4-fold redundant sites. We discuss the reasons for this negative result and report several features of the evolution of the UDG superfamily with implications for their functional study. Keywords: uracil-DNA glycosylase, mutation rate evolution, mutational bias, GC content, DNA repair, mutator gene.
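The expectation that losing UDG genes should lower the equilibrium G + C content can be illustrated with the standard two-rate model of mutational bias, in which the equilibrium G + C fraction is v / (u + v), where u is the G:C→A:T rate per G:C site and v is the A:T→G:C rate per A:T site. This is a sketch under that textbook assumption (mutation only, no selection or biased gene conversion), not the method used in the abstract above; the function name and the example rates are hypothetical.

```python
def equilibrium_gc(u, v):
    """Equilibrium G+C fraction under a two-rate mutation-only model.

    u: G:C -> A:T mutation rate per G:C site (assumed constant)
    v: A:T -> G:C mutation rate per A:T site (assumed constant)
    """
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return v / (u + v)


# Hypothetical rates in arbitrary units: raising u (for example, if
# uracil from cytosine deamination goes unrepaired) lowers the
# expected equilibrium G+C fraction.
baseline = equilibrium_gc(u=1.0, v=1.0)
elevated = equilibrium_gc(u=2.0, v=1.0)
print(baseline, elevated)
```

Only the ratio of the two rates matters in this model, which is why the example uses arbitrary units rather than per-generation estimates.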
One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions —SVDetect, GRIAL, and VariationHunter—, identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects.