While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaques revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly originated de novo proteins is also not unexpected under neutral evolution: they generally have a longer theoretical lifespan than their current age, because their GC-rich sequences enable stable ORFs with a lower chance of nonsense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, a population genetics study of 67 human individuals and 82 macaques revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly originated proteins are already functional in humans. We thus propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts.
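The link between GC-rich sequence and a lower chance of nonsense mutations can be illustrated with a toy calculation. The three stop codons of the standard genetic code (TAA, TAG, TGA) are AT-rich, so GC-rich sense codons tend to have fewer single-base substitutions that create a stop. The sketch below is only an illustration of that intuition, not the authors' method for estimating theoretical lifespan; the function names are hypothetical, and it assumes all substitutions are equally likely.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def nonsense_paths(codon):
    """Count single-base substitutions that turn this codon into a stop codon."""
    paths = 0
    for pos, original in enumerate(codon):
        for base in BASES:
            if base == original:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if mutant in STOP_CODONS:
                paths += 1
    return paths


def mean_paths_by_gc():
    """Average number of nonsense paths for sense codons, grouped by GC count."""
    totals = {gc: [0, 0] for gc in range(4)}  # gc -> [sum of paths, codon count]
    for codon in ("".join(p) for p in product(BASES, repeat=3)):
        if codon in STOP_CODONS:
            continue
        entry = totals[gc_count(codon)]
        entry[0] += nonsense_paths(codon)
        entry[1] += 1
    return {gc: total / n for gc, (total, n) in totals.items()}


if __name__ == "__main__":
    for gc, mean in mean_paths_by_gc().items():
        print(f"GC bases per codon = {gc}: mean nonsense paths = {mean:.2f}")
```

Codons made entirely of G and C have no single-substitution route to a stop codon, which is the simplest case of the effect described above. A realistic estimate would also need mutation-rate biases and ORF length, which this sketch ignores.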
Null variants are prevalent within the human genome, and their accurate interpretation is critical for clinical management. In 2018, the ClinGen Sequence Variant Interpretation (SVI) Working Group refined the only criterion with a very strong pathogenicity rating (PVS1). To streamline PVS1 interpretation, we have developed an automatic classification tool with a graphical user interface called AutoPVS1. The performance of AutoPVS1 was assessed using 56 variants manually curated by the ClinGen SVI Working Group; it achieved an interpretation concordance of 93% (52/56). A further analysis of 28,586 putative loss-of-function variants by AutoPVS1 demonstrated that at least 27.7% of them do not reach a very strong strength level: 17.5% because of variant-specific issues and 10.2% because of disease mechanism considerations. Notably, 41.0% (1,936/4,717) of splicing variants were assigned a decreased preliminary PVS1 strength level, a significantly greater fraction than for frameshift variants (13.2%) and nonsense variants (10.8%). Our results reinforce the necessity of considering variant-specific issues and disease mechanisms in variant interpretation and demonstrate that AutoPVS1 meets an urgent need by enabling biocurators to easily assign accurate, reliable and reproducible PVS1 strength levels in the process of variant interpretation. AutoPVS1 is publicly available at http://autopvs1.genetics.bgi.com/.
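The percentages reported above can be rechecked from the counts given in the abstract. The short sketch below only reproduces that arithmetic; the helper name `percent` is hypothetical, and nothing here reflects how AutoPVS1 itself is implemented.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100 * part / whole, digits)


# Concordance with the manually curated variants: 52 of 56.
concordance = percent(52, 56, digits=0)

# Splicing variants assigned a decreased preliminary PVS1 strength: 1,936 of 4,717.
splicing_decreased = percent(1936, 4717)

# Share of variants not reaching "very strong": variant-specific issues
# plus disease mechanism considerations.
not_very_strong = round(17.5 + 10.2, 1)

if __name__ == "__main__":
    print(f"Concordance: {concordance:.0f}%")
    print(f"Splicing variants with decreased strength: {splicing_decreased}%")
    print(f"Variants not reaching very strong: {not_very_strong}%")
```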
With a genome sequence and composition highly similar to those of human, rhesus macaque represents a unique reference for evolutionary studies of human biology. Here, we developed a comprehensive genomic framework of rhesus macaque, RhesusBase2, for evolutionary interrogation of human genes and the associated regulations. A total of 1,667 next-generation sequencing (NGS) data sets were processed, integrated, and evaluated, generating 51.2 million new functional annotation records. With extensive NGS annotations, RhesusBase2 refined the fine-scale structures in 30% of the macaque Ensembl transcripts, reporting an accurate, up-to-date set of macaque gene models. On the basis of these annotations and accurate macaque gene models, we further developed an NGS-oriented Molecular Evolution Gateway to access and visualize macaque annotations in reference to human orthologous genes and associated regulations (www.rhesusbase.org/molEvo). We highlighted the application of this well-annotated genomic framework in generating hypothetical links between human-biased regulations and human-specific traits, using mechanistic characterization of the DIEXF gene as an example that provides novel clues to the understanding of digestive system reduction in human evolution. On a global scale, we also identified a catalog of 9,295 human-biased regulatory events, which may represent novel elements that have a substantial impact on shaping the human transcriptome and possibly underpin recent human phenotypic evolution. Taken together, we provide an NGS data-driven, information-rich framework that will broadly benefit genomics research and serve as an important resource for in-depth evolutionary studies of human biology.
Although millions of RNA editing events have been reported to modify hereditary information across the primate transcriptome, evidence for their functional significance remains largely elusive, particularly for the vast majority of editing sites in noncoding regions. Here, we report a new mechanism for the functionality of RNA editing: a crosstalk with PIWI-interacting RNA (piRNA) biogenesis. Exploiting rhesus macaque as an emerging model organism closely related to human, in combination with extensive genome and transcriptome sequencing in seven tissues of the same animal, we deciphered an accurate RNA editome across both long transcripts and the piRNA species. Superimposing and comparing these two distinct RNA editome profiles revealed 4,170 editing-bearing piRNA variants, or epiRNAs, that were primarily derived from edited long transcripts. These epiRNAs represent distinct entities that evidence an intersection between RNA editing regulations and piRNA biogenesis. Population genetics analyses in a macaque population of 31 independent animals further demonstrated that the epiRNA-associated RNA editing is maintained by purifying selection, lending support to the functional significance of this crosstalk in rhesus macaque. These findings also hold in human, supporting the conservation of this mechanism during primate evolution. Overall, our study reports the earliest lines of evidence for a crosstalk between selectively constrained RNA editing regulation and piRNA biogenesis, and further illustrates that such an interaction may contribute substantially to the diversification of the piRNA repertoire in primates.