In this paper, we present a toolset and related resources for rapid identification of viruses and microorganisms from short-read or long-read sequencing data. We present fastv, an ultra-fast tool that detects microbial sequences present in sequencing data, identifies target microorganisms, and visualizes the coverage of microbial genomes. fastv is based on a k-mer mapping and extension method. Its k-mer sets are generated by UniqueKMER, another tool provided in this toolset, which can generate complete sets of unique k-mers for each genome within a large collection of viral or microbial genomes. For convenience, unique k-mers for microorganisms and for common viruses that afflict humans have been precomputed and are provided with the tools. As a lightweight tool, fastv accepts FASTQ data as input and directly outputs results in both HTML and JSON formats. Before the k-mer analysis, fastv automatically performs adapter trimming, quality pruning, base correction, and other preprocessing steps to ensure the accuracy of the k-mer analysis. In particular, fastv provides built-in support for rapid SARS-CoV-2 identification and typing. Experimental results showed that fastv achieved 100% sensitivity and 100% specificity in detecting SARS-CoV-2 from sequencing data, and that it can distinguish SARS-CoV-2 from SARS, MERS, and other coronaviruses. This toolset is available at https://github.com/OpenGene/fastv.
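To illustrate the general idea of unique k-mer detection, the following Python sketch builds, for each genome in a small collection, the set of k-mers that occur in no other genome, and then counts how many reads share at least one such k-mer with each genome. This is a simplified sketch under stated assumptions, not the actual algorithm of UniqueKMER or fastv: the function names (kmers, canonical, unique_kmers, detect), the use of canonical (strand-independent) k-mers, and the "one matching k-mer per read" counting rule are hypothetical choices for illustration. The sketch also omits the k-mer extension step, coverage computation, detection thresholds, and read preprocessing that the tools perform.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set

ACGT = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq, skipping windows that contain non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= ACGT:
            yield kmer


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Assumption for this sketch: treating both strands as one k-mer lets reads
    from either strand match the same genome.
    """
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def unique_kmers(genomes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each genome name to the canonical k-mers that appear in no other genome."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[canonical(kmer)].add(name)
    result: Dict[str, Set[str]] = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:  # k-mer is owned by exactly one genome
            result[next(iter(names))].add(kmer)
    return result


def detect(reads: Iterable[str], unique: Dict[str, Set[str]], k: int) -> Dict[str, int]:
    """Count, per genome, the reads sharing at least one of its unique k-mers."""
    index = {kmer: name for name, ks in unique.items() for kmer in ks}
    hits: Dict[str, int] = defaultdict(int)
    for read in reads:
        matched = {index[c] for c in map(canonical, kmers(read, k)) if c in index}
        for name in matched:
            hits[name] += 1
    return dict(hits)


if __name__ == "__main__":
    # Toy genomes and reads; real use would involve real genomes and a larger k.
    genomes = {"virusX": "AAAAC", "virusY": "AAAAG"}
    u = unique_kmers(genomes, 4)  # "AAAA" is shared, so it is not unique
    print(u)
    print(detect(["AAAAC", "CTTTT", "ACGT"], u, 4))
```

In this toy example, the k-mer AAAA occurs in both genomes and is therefore excluded, so only the k-mer distinguishing each genome is used for detection; the read CTTTT matches virusY through its reverse complement. Restricting matches to unique k-mers is what allows closely related genomes to be told apart, at the cost of discarding shared regions.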
As part of the OpenGene projects, fastv and UniqueKMER are open source and released under the MIT license. fastv is available at https://github.com/OpenGene/fastv, and UniqueKMER is available at https://github.com/OpenGene/UniqueKMER. The precomputed unique k-mer resources are also provided in these repositories.
Key Points
This paper presents a new tool, fastv, for rapid identification of SARS-CoV-2, other viruses, and microorganisms.
Another tool, UniqueKMER, is presented for generating high-quality unique k-mers.
Unique k-mer resources for tens of thousands of viruses and microorganisms have been precomputed and uploaded to the tools' repositories.
Supplementary Data
A pipeline for alignment-based SARS-CoV-2 identification is provided in Supplementary file 1.