The discovery of cruciviruses revealed the most explicit example to date of a protein homologue shared by DNA and RNA viruses. Cruciviruses are a novel group of circular Rep-encoding single-stranded DNA (CRESS-DNA) viruses that encode capsid proteins most closely related to those encoded by RNA viruses in the family Tombusviridae. The apparent chimeric nature of the two core proteins encoded by crucivirus genomes suggests horizontal gene transfer of capsid genes between DNA and RNA viruses. Here, we identified and characterized 451 new crucivirus genomes and 10 capsid-encoding circular genetic elements through de novo assembly and mining of metagenomic data. These genomes are highly diverse, as demonstrated by sequence comparisons and phylogenetic analysis of subsets of the protein sequences they encode. Most of the variation is reflected in the replication-associated protein (Rep) sequences, and much of the sequence diversity appears to be due to recombination. Our results suggest that recombination tends to occur more frequently among groups of cruciviruses with relatively similar capsid proteins and that the exchange of Rep protein domains between cruciviruses is rarer than intergenic recombination. Additionally, we suggest members of the stramenopiles/alveolates/Rhizaria supergroup as possible crucivirus hosts. Altogether, we provide a comprehensive and descriptive characterization of cruciviruses.
IMPORTANCE Viruses are the most abundant biological entities on Earth. In addition to their impact on animal and plant health, viruses have important roles in ecosystem dynamics as well as in the evolution of the biosphere. Circular Rep-encoding single-stranded (CRESS) DNA viruses are ubiquitous in nature, many are agriculturally important, and they appear to have multiple origins from prokaryotic plasmids. Cruciviruses, a subset of CRESS-DNA viruses, have homologues of capsid proteins encoded by RNA viruses.
The genetic structure of cruciviruses attests to the transfer of capsid genes between disparate groups of viruses. However, the evolutionary history of cruciviruses is still unclear. By collecting and analyzing cruciviral sequence data, we provide a deeper insight into the evolutionary intricacies of cruciviruses. Our results reveal an unexpected diversity of this virus group, with frequent recombination as an important determinant of variability.
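One crude way to picture the Rep/capsid comparison described above is to check where two clusterings of the same genomes disagree: genome pairs that share a capsid-based cluster but fall into different Rep-based clusters (or vice versa) are candidates for a history of gene exchange. The sketch below assumes hypothetical per-genome cluster labels as input; it is only a partition-incongruence check, not the recombination-detection method used in the study.

```python
from itertools import combinations


def discordant_pairs(partition_a, partition_b):
    """List genome pairs grouped together in one partition but split in the other.

    partition_a, partition_b: dicts mapping genome ID -> cluster label
    (hypothetical inputs, e.g. CP-based and Rep-based cluster assignments).
    Only genomes present in both partitions are compared.
    """
    shared = sorted(set(partition_a) & set(partition_b))
    pairs = []
    for g1, g2 in combinations(shared, 2):
        together_a = partition_a[g1] == partition_a[g2]
        together_b = partition_b[g1] == partition_b[g2]
        if together_a != together_b:
            pairs.append((g1, g2))
    return pairs


# Hypothetical cluster labels for four genomes.
cp_clusters = {"g1": "CP1", "g2": "CP1", "g3": "CP2", "g4": "CP2"}
rep_clusters = {"g1": "R1", "g2": "R2", "g3": "R3", "g4": "R3"}

print(discordant_pairs(cp_clusters, rep_clusters))  # [('g1', 'g2')]
```

The same function could compare, for example, endonuclease-domain clusters against helicase-domain clusters to flag possible exchange of Rep domains, under the same assumptions.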
Background Although bacteria were originally thought to evolve clonally, studies have revealed that most of them exchange DNA. However, it remains unclear to what extent gene flow shapes the evolution of bacterial genomes and maintains the cohesion of species.
Results Here, we analyze the patterns of gene flow within and between >2600 bacterial species. Our results show that fewer than 10% of bacterial species are truly clonal, indicating that purely asexual species are rare in nature. We further demonstrate that the taxonomic criterion of ~95% genome sequence identity routinely used to define bacterial species does not accurately correspond to a level of divergence that imposes an effective barrier to gene flow between bacterial species. Depending on the lineage, gene flow is interrupted at different levels of sequence identity, generally between 90 and 98% genome identity. This likely explains why a ~95% genome sequence identity threshold has empirically been judged a good approximation for defining bacterial species. Our results support a universal mechanism in which the availability of identical genomic DNA segments, required to initiate homologous recombination, is the primary determinant of gene flow and species boundaries in bacteria. We show that these barriers to gene flow remain porous, since many distinct species maintain some level of gene flow, similar to introgression in sexual organisms.
Conclusions Overall, bacterial evolution and speciation are likely shaped by forces similar to those driving the evolution of sexual organisms. Our findings support a model in which the interruption of gene flow, although not necessarily the initial cause of speciation, leads to the establishment of permanent and irreversible species borders.
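To see why the supply of identical DNA segments can collapse over a narrow identity range, consider a deliberately simple model: if substitutions are independent and uniformly spread, a given window of k aligned bases is fully identical with probability p^k, where p is the per-site identity. The sketch below computes the expected number of such windows; the segment length `K` and alignment length `GENE` are hypothetical values chosen for illustration, not figures from the study.

```python
def expected_identical_windows(identity, k, aligned_length):
    """Expected count of fully identical k-bp windows in an alignment.

    Assumes substitutions are independent and uniformly spread along the
    alignment (a simplification). Windows overlap, but the expectation
    still follows from linearity: (number of windows) * identity**k.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    n_windows = max(aligned_length - k + 1, 0)
    return n_windows * identity ** k


K = 30        # hypothetical minimum identical segment length (bp)
GENE = 1_000  # hypothetical aligned length (bp)
for pid in (0.98, 0.95, 0.90):
    print(f"{pid:.0%} identity: {expected_identical_windows(pid, K, GENE):.1f} windows")
```

Because the count scales with p^k, a few percentage points of divergence remove most identical windows, which is consistent in spirit with gene flow being interrupted somewhere between 90 and 98% identity depending on lineage.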
The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on comparing all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. Instead of computing all pairwise genome comparisons, CoreCruncher uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Our results indicate that, in addition to being much faster than current methods, our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available at https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 but can also run on Python 2.7 without modification. It requires the Python library NumPy and either USEARCH or BLAST; certain options also require MUSCLE or MAFFT.
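The idea of classifying hits from identity-score distributions, without an all-pairs comparison, can be sketched as follows. This is a hypothetical illustration, not CoreCruncher's actual algorithm: it assumes each gene of one "pivot" genome has already been searched against every other genome (so the number of searches grows linearly with the number of genomes), flags best hits far below the gene's median identity as putative paralogs/xenologs, and keeps genes whose putative orthologs occur in at least `min_fraction` of genomes. The pivot setup, the median-absolute-deviation rule, `k`, `min_mad`, and `min_fraction` are all assumptions.

```python
import statistics


def classify_hits(identities, k=3.0, min_mad=1.0):
    """Split best-hit identities into putative orthologs and outliers.

    identities: dict genome ID -> percent identity of the best hit to a pivot gene.
    A hit is flagged as a putative paralog/xenolog when it falls more than k
    median absolute deviations below the median (hypothetical rule).
    """
    if not identities:
        return set(), set()
    values = list(identities.values())
    med = statistics.median(values)
    # MAD floored at min_mad so near-identical hits do not yield a zero-width band.
    mad = max(statistics.median([abs(v - med) for v in values]), min_mad)
    cutoff = med - k * mad
    orthologs = {genome for genome, v in identities.items() if v >= cutoff}
    return orthologs, set(identities) - orthologs


def core_genes(hits_by_gene, n_genomes, min_fraction=0.9):
    """Keep pivot genes whose putative orthologs occur in >= min_fraction of genomes."""
    core = []
    for gene, identities in hits_by_gene.items():
        orthologs, _ = classify_hits(identities)
        if len(orthologs) >= min_fraction * n_genomes:
            core.append(gene)
    return core


# Hypothetical best-hit identities (%) of two pivot genes in five other genomes.
hits = {
    "geneA": {"g1": 98.5, "g2": 97.9, "g3": 98.2, "g4": 98.8, "g5": 71.0},
    "geneB": {"g1": 99.0, "g2": 98.7},  # no hits in g3-g5
}
print(core_genes(hits, n_genomes=5, min_fraction=0.8))  # ['geneA']
```

Here the 71% hit in g5 is treated as a likely paralog or xenolog, so geneA counts as core only when the threshold tolerates one missing ortholog out of five.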
The discovery of cruciviruses revealed the most explicit example to date of a protein homologue shared by DNA and RNA viruses. Cruciviruses are a novel group of circular Rep-encoding ssDNA (CRESS-DNA) viruses that encode capsid proteins (CPs) most closely related to those encoded by RNA viruses in the family Tombusviridae. The apparent chimeric nature of the two core proteins encoded by crucivirus genomes suggests horizontal gene transfer of CP genes between DNA and RNA viruses. Here, we identified and characterized 451 new crucivirus genomes and ten CP-encoding circular genetic elements through de novo assembly and mining of metagenomic data. These genomes are highly diverse, as demonstrated by sequence comparisons and phylogenetic analysis of subsets of the protein sequences they encode. Most of the variation is reflected in the replication-associated protein (Rep) sequences, and much of the sequence diversity appears to be due to recombination. Our results suggest that recombination tends to occur more frequently among groups of cruciviruses with relatively similar capsid proteins, and that the exchange of Rep protein domains between cruciviruses is rarer than gene exchange. Altogether, we provide a comprehensive and descriptive characterization of cruciviruses.
IMPORTANCE Viruses are the most abundant biological entities on Earth. In addition to their impact on animal and plant health, viruses have important roles in ecosystem dynamics as well as in the evolution of the biosphere. Circular Rep-encoding single-stranded (CRESS) DNA viruses are ubiquitous in nature, many are agriculturally important, and they appear to have multiple origins from prokaryotic plasmids. Some CRESS-DNA viruses, such as the cruciviruses, have homologues of capsid proteins (CPs) encoded by RNA viruses. The genetic structure of cruciviruses attests to the transfer of capsid genes between disparate groups of viruses.
However, the evolutionary history of cruciviruses is still unclear. By collecting and analyzing cruciviral sequence data, we provide a deeper insight into the evolutionary intricacies of cruciviruses. Our results reveal an unexpected diversity of this virus group, with frequent recombination as an important determinant of variability.
[Figure: phylogenies of Rep and CP sequences (panels A and B), with Rep endonuclease and helicase domains and CP-based clusters indicated; UF bootstrap support binned as 0.8–0.95 and 0.95–1; scale bar, 1 substitution/site.]
Nucleic acid secondary structures play important roles in regulating biological processes. StemLoop-Finder is a computational tool for recognizing and annotating conserved structural motifs in large data sets. The program is optimized for the detection of stem-loop structures that may serve as origins of replication in circular replication-associated protein (Rep)-encoding single-stranded (CRESS) DNA viruses.
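As a rough picture of what stem-loop detection involves, the sketch below scans a sequence for a loop motif and keeps hits whose flanking arms are perfect reverse complements, i.e. could pair to form a stem. It assumes the degenerate nonanucleotide NANTANTAN commonly reported at CRESS-DNA virus origins as the loop, a hypothetical fixed stem length of 8 bp, a loop exactly as long as the motif, and no mismatches; it ignores genome circularity and does not model folding energy, so it is a simplified illustration rather than StemLoop-Finder's actual method.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed loop motif NANTANTAN written as a regular expression.
NONANUCLEOTIDE = "[ACGT]A[ACGT]TA[ACGT]TA[ACGT]"


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (other symbols unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(genome, loop_pattern=NONANUCLEOTIDE, stem_len=8):
    """Return (start, end) spans of candidate stem-loops.

    A candidate is a loop-motif match flanked on each side by stem_len bases
    that are perfect reverse complements of each other (hypothetical criteria).
    The lookahead lets overlapping motif matches all be reported.
    """
    genome = genome.upper()
    hits = []
    for m in re.finditer(f"(?=({loop_pattern}))", genome):
        loop_start, loop_end = m.start(1), m.end(1)
        left_start = loop_start - stem_len
        right_end = loop_end + stem_len
        if left_start < 0 or right_end > len(genome):
            continue  # not enough flanking sequence on a linear string
        left_arm = genome[left_start:loop_start]
        right_arm = genome[loop_end:right_end]
        if right_arm == reverse_complement(left_arm):
            hits.append((left_start, right_end))
    return hits


# Toy sequence: 8-bp arm, TAGTATTAC loop, complementary 8-bp arm.
toy = "AAAA" + "GCGCAAGG" + "TAGTATTAC" + "CCTTGCGC" + "TTTT"
print(find_stem_loops(toy))  # [(4, 29)]
```

A practical tool would also need to tolerate stem mismatches, variable loop lengths, and the circular topology of CRESS-DNA genomes.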