BackgroundGenomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for their effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction.ResultsWe generate the first annotated draft of the Iberian lynx genome and carry out genome-based analyses of lynx demography, evolution, and population genetics. We identify a series of severe population bottlenecks in the history of the Iberian lynx that predate its known demographic decline during the 20th century and have greatly impacted its genome evolution. We observe drastically reduced rates of weak-to-strong substitutions associated with GC-biased gene conversion and increased rates of fixation of transposable elements. We also find multiple signatures of genetic erosion in the two remnant Iberian lynx populations, including a high frequency of potentially deleterious variants and substitutions, as well as the lowest genome-wide genetic diversity reported so far in any species.ConclusionsThe genomic features observed in the Iberian lynx genome may hamper short- and long-term viability through reduced fitness and adaptive potential. The knowledge and resources developed in this study will boost the research on felid evolution and conservation genomics and will benefit the ongoing conservation and management of this emblematic species.Electronic supplementary materialThe online version of this article (doi:10.1186/s13059-016-1090-1) contains supplementary material, which is available to authorized users.
Accumulating evidence indicates that some protein-coding genes have originated de novo from previously non-coding genomic sequences. However, the processes underlying de novo gene birth are still enigmatic. In particular, the appearance of a new functional protein seems highly improbable unless there is already a pool of neutrally evolving peptides that are translated at significant levels and that can at some point acquire new functions. Here, we use deep ribosome-profiling sequencing data, together with proteomics and single nucleotide polymorphism information, to search for these peptides. We find hundreds of open reading frames that are translated and that show no evolutionary conservation or selective constraints. These data suggest that the translation of these neutrally evolving peptides may be facilitated by the chance occurrence of open reading frames with a favourable codon composition. We conclude that the pervasive translation of the transcriptome provides plenty of material for the evolution of new functional proteins.
This study examines the potential of next-generation sequencing based ‘genotyping-by-sequencing’ (GBS) of microsatellite loci for rapid and cost-effective genotyping in large-scale population genetic studies. The recovery of individual genotypes from large sequence pools was achieved by PCR-incorporated combinatorial barcoding using universal primers. Three experimental conditions were employed to explore the possibility of using this approach with existing and novel multiplex marker panels and weighted amplicon mixture. The GBS approach was validated against microsatellite data generated by capillary electrophoresis. GBS allows access to the underlying nucleotide sequences that can reveal homoplasy, even in large datasets and facilitates cross laboratory transfer. GBS of microsatellites, using individual combinatorial barcoding, is potentially faster and cheaper than current microsatellite approaches and offers better and more data.
Most of the current knowledge on the genetic basis of adaptive evolution is based on the analysis of single nucleotide polymorphisms (SNPs). Despite increasing evidence for their causal role, the contribution of structural variants to adaptive evolution remains largely unexplored. In this work, we analyzed the population frequencies of 1,615 Transposable Element (TE) insertions annotated in the reference genome of Drosophila melanogaster , in 91 samples from 60 worldwide natural populations. We identified a set of 300 polymorphic TEs that are present at high population frequencies, and located in genomic regions with high recombination rate, where the efficiency of natural selection is high. The age and the length of these 300 TEs are consistent with relatively young and long insertions reaching high frequencies due to the action of positive selection. Besides, we identified a set of 21 fixed TEs also likely to be adaptive. Indeed, we, and others, found evidence of selection for 84 of these reference TE insertions. The analysis of the genes located nearby these 84 candidate adaptive insertions suggested that the functional response to selection is related with the GO categories of response to stimulus, behavior, and development. We further showed that a subset of the candidate adaptive TEs affects expression of nearby genes, and five of them have already been linked to an ecologically relevant phenotypic effect. Our results provide a more complete understanding of the genetic variation and the fitness-related traits relevant for adaptive evolution. Similar studies should help uncover the importance of TE-induced adaptive mutations in other species as well.
De novo gene origination has been recently established as an important mechanism for the formation of new genes. In organisms with a large genome, intergenic and intronic regions provide plenty of raw material for new transcriptional events to occur, but little is know about how de novo transcripts originate in more densely-packed genomes. Here, we identify 213 de novo originated transcripts in Saccharomyces cerevisiae using deep transcriptomics and genomic synteny information from multiple yeast species grown in two different conditions. We find that about half of the de novo transcripts are expressed from regions which already harbor other genes in the opposite orientation; these transcripts show similar expression changes in response to stress as their overlapping counterparts, and some appear to translate small proteins. Thus, a large fraction of de novo genes in yeast are likely to co-evolve with already existing genes.
Although transposable elements are an important source of regulatory variation, their genome-wide contribution to the transcriptional regulation of stress-response genes has not been studied yet. Stress is a major aspect of natural selection in the wild, leading to changes in the transcriptional regulation of a variety of genes that are often triggered by one or a few transcription factors. In this work, we take advantage of the wealth of information available for Drosophila melanogaster and humans to analyze the role of transposable elements in six stress regulatory networks: immune, hypoxia, oxidative, xenobiotic, heat shock, and heavy metal. We found that transposable elements were enriched for caudal, dorsal, HSF, and tango binding sites in D. melanogaster and for NFE2L2 binding sites in humans. Taking into account the D. melanogaster population frequencies of transposable elements with predicted binding motifs and/or binding sites, we showed that those containing three or more binding motifs/sites are more likely to be functional. For a representative subset of these TEs, we performed in vivo transgenic reporter assays in different stress conditions. Overall, our results showed that TEs are relevant contributors to the transcriptional regulation of stress-response genes.
In breast cancer cells, some topologically associating domains (TADs) behave as hormonal gene regulation units, within which gene transcription is coordinately regulated in response to steroid hormones. Here we further describe that responsive TADs contain 20-to 100-kb-long clusters of intermingled estrogen receptor (ESR1) and progesterone receptor (PGR) binding sites, hereafter called hormone-control regions (HCRs). In T47D cells, we identified more than 200 HCRs, which are frequently bound by unliganded ESR1 and PGR. These HCRs establish steady long-distance inter-TAD interactions between them and organize characteristic looping structures with promoters in their TADs even in the absence of hormones in ESR1 +-PGR + cells. This organization is dependent on the expression of the receptors and is further dynamically modulated in response to steroid hormones. HCRs function as platforms that integrate different signals, resulting in some cases in opposite transcriptional responses to estrogens or progestins. Altogether, these results suggest that steroid hormone receptors act not only as hormone-regulated sequence-specific transcription factors but also as local and global genome organizers.
Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank+F. Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.