Abstract: Parvoviruses (family Parvoviridae) are small, non-enveloped DNA viruses that infect a broad range of animal species. Comparative studies, supported by experimental evidence, show that many vertebrate species contain sequences derived from ancient parvoviruses embedded in their genomes. These endogenous parvoviral elements (EPVs), which arose via recombination-based mechanisms in infected germline cells of ancestral organisms, constitute a form of molecular fossil record that can be used to investigate the orig…
Background: Genomic regions that remain poorly understood, often referred to as the "dark genome," contain a variety of functionally relevant and biologically informative genome features. These include endogenous viral elements (EVEs), virus-derived sequences that can dramatically impact host biology and serve as a virus "fossil record". In this study, we introduce a database-integrated genome screening (DIGS) approach to investigating the dark genome in silico, focusing on EVEs found within vertebrate genomes.
Results: Using DIGS on 874 vertebrate species genomes, we uncovered approximately 1.1 million EVE sequences, with over 99% originating from endogenous retroviruses or transposable elements that contain EVE DNA. We show that the remaining 6038 sequences represent over a thousand distinct horizontal gene transfer events across ten virus families, including some that have not previously been reported as EVEs. We explore the genomic and phylogenetic characteristics of non-retroviral EVEs and determine their rates of acquisition during vertebrate evolution. Our study uncovers novel virus diversity and broadens our knowledge of virus distribution among vertebrate hosts. It also provides new insights into the long-term evolution of highly pathogenic filoviruses.
Conclusions: We comprehensively catalogue and analyse EVEs within 874 vertebrate genomes, shedding light on the distribution, diversity and long-term evolution of viruses, and revealing their extensive impact on vertebrate genome evolution. Our results demonstrate the power of linking a relational database management system to a similarity search-based screening pipeline for in silico exploration of the dark genome.
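The idea of coupling a relational database to a similarity-search screen can be illustrated with a toy sketch. This is not the DIGS tool itself: the table names (`searched`, `hits`), the `screen` function, the sequences, and the k-mer overlap score (a crude stand-in for a real similarity search such as BLAST) are all assumptions made for illustration. It shows only the general pattern of recording which probe/target pairs have been searched, so that a screen can be re-run incrementally and its results queried with SQL.

```python
import sqlite3

def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(probe, target, k=4):
    """Fraction of probe k-mers found in target (toy stand-in for a BLAST score)."""
    p = kmers(probe, k)
    return len(p & kmers(target, k)) / len(p) if p else 0.0

def screen(conn, probes, targets, threshold=0.5):
    """Compare each probe with each target, skipping pairs already searched.

    Returns the number of new searches performed in this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searched "
        "(probe TEXT, target TEXT, PRIMARY KEY (probe, target))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (probe TEXT, target TEXT, score REAL)"
    )
    new_searches = 0
    for p_name, p_seq in probes.items():
        for t_name, t_seq in targets.items():
            done = conn.execute(
                "SELECT 1 FROM searched WHERE probe = ? AND target = ?",
                (p_name, t_name),
            ).fetchone()
            if done:
                continue  # this pair was screened in an earlier run
            score = similarity(p_seq, t_seq)
            if score >= threshold:
                conn.execute(
                    "INSERT INTO hits VALUES (?, ?, ?)", (p_name, t_name, score)
                )
            conn.execute("INSERT INTO searched VALUES (?, ?)", (p_name, t_name))
            new_searches += 1
    conn.commit()
    return new_searches

# Hypothetical example data: one probe and two target contigs.
conn = sqlite3.connect(":memory:")
probes = {"probe_A": "ATGGCGTACGTTAGC"}
targets = {
    "contig_1": "CCCATGGCGTACGTTAGCGGG",  # contains probe_A in full
    "contig_2": "TTTTTTTTTTTTTTTTTTTTT",  # shares no k-mers with probe_A
}
first = screen(conn, probes, targets)   # screens both pairs
second = screen(conn, probes, targets)  # nothing left to screen
hit_rows = conn.execute(
    "SELECT probe, target, score FROM hits ORDER BY target"
).fetchall()
```

Because the search history lives in the same database as the hits, adding a new genome or probe to the inputs and calling `screen` again only runs the comparisons that are missing.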