Background
Viruses are an underrepresented group in the study and identification of microbiome constituents, yet they play an essential role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which further limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses poses a challenge for classification, not only in constructing a viral taxonomy but also in identifying similarities between a virus's genotype and its phenotype. However, this same diversity of viral sequences can be leveraged to classify them in metagenomic and metatranscriptomic samples.

Methods
To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm that builds a classification tree from 715,672 metagenomic viruses. To build the tree, we clustered the viruses using proportional similarity scores computed from the k-mer profile of each virus. We then integrated the viral classification tree with the NCBI taxonomy for use with ParaKraken (a parallelized version of Kraken), a metagenomic/metatranscriptomic classifier. The resulting database of the metagenomic viruses is compatible with Kraken2 and is available at https://www.osti.gov/biblio/1615774.

Results
To illustrate the breadth of applications for classifying viruses with ParaKraken, particularly in samples without virus-induced pathophysiology, we analyzed data from two studies: a plant metagenome study comparing two Populus genotypes across three compartments, and a human metatranscriptome study comparing post mortem brain tissue from Autism Spectrum Disorder patients and controls.
In the Populus study, we identified genotype- and compartment-specific viral signatures, while in the autism study we identified a significant increase in the abundance of eight viral sequences in post mortem brains. We also demonstrate the potential accuracy of viral classification by using the NCBI viral databases to assess the uniqueness of viral sequences and to identify pathogenic viruses in samples with known COVID-19 and cassava brown streak virus infections.

Conclusion
Viruses are an essential component of the microbiome, and the ability to classify them is a necessary first step toward understanding their role in it. The viral classification method presented here enables a more complete identification of viral sequences, supporting the discovery of associations between viruses and their hosts and between viruses and other microbiome members. It can be used with any tool that relies on a taxonomy for classification, such as Kraken.
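The Methods summary describes building a classification tree by clustering viruses on proportional similarity scores derived from their k-mer profiles. The sketch below illustrates that idea under several stated assumptions: "proportional similarity" is taken to be the Czekanowski proportional similarity index (the sum of the minimum relative frequencies of shared k-mers), k-mers containing non-ACGT characters are skipped, and the tree is built with plain average-linkage (UPGMA-style) agglomerative clustering. That clustering is swapped in for illustration only; it is not the paper's dynamic programming algorithm, and its cubic-time loop would not scale to 715,672 sequences. The function names `kmer_profile`, `proportional_similarity`, and `build_tree` are hypothetical.

```python
from collections import Counter

VALID = set("ACGT")


def kmer_profile(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of each k-mer in seq (assumption: k-mers with non-ACGT bases are skipped)."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def proportional_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Assumed Czekanowski index: sum of min(p_i, q_i); 1.0 = identical profiles, 0.0 = no shared k-mers."""
    return sum(min(f, q[kmer]) for kmer, f in p.items() if kmer in q)


def build_tree(named_seqs: dict[str, str], k: int = 4):
    """Average-linkage clustering (illustrative stand-in, NOT the paper's algorithm).

    Returns a nested tuple whose leaves are sequence names; requires at least one sequence.
    """
    profiles = {name: kmer_profile(seq, k) for name, seq in named_seqs.items()}
    names = list(profiles)
    # Precompute every pairwise similarity once.
    sim = {(x, y): proportional_similarity(profiles[x], profiles[y]) for x in names for y in names}
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best_score, best_i, best_j = -1.0, 0, 1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average linkage: mean similarity over all cross-cluster member pairs.
                score = sum(sim[x, y] for x in mi for y in mj) / (len(mi) * len(mj))
                if score > best_score:
                    best_score, best_i, best_j = score, i, j
        (ti, mi), (tj, mj) = clusters[best_i], clusters[best_j]
        del clusters[best_j]  # delete the larger index first so best_i stays valid
        del clusters[best_i]
        clusters.append(((ti, tj), mi + mj))
    return clusters[0][0]


if __name__ == "__main__":
    # Toy sequences (hypothetical): A and B differ by one base, C shares no 3-mers with them.
    viruses = {
        "virus_A": "ATGCGTACGTTAGCATGCGTACGTTAGC",
        "virus_B": "ATGCGTACGTTAGCATGCGTACGTTAGG",
        "virus_C": "GGGCCCAAATTTGGGCCCAAATTTGGGC",
    }
    print(build_tree(viruses, k=3))  # ('virus_C', ('virus_A', 'virus_B'))
```

Here the two near-identical sequences are joined first and the unrelated one is attached at the root. In the pipeline described above, the internal nodes of such a tree would still need taxonomy IDs before being merged into the NCBI taxonomy for use with ParaKraken or Kraken2; that step is not shown.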