2018
DOI: 10.1128/mspheredirect.00069-18

A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection

Abstract: Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were us…

Cited by 201 publications (139 citation statements)
References 48 publications
“…Genomes were aligned to the reference assembly of SARS-CoV-2 as available from Refseq (O'Leary et al 2016); Refseq accession NC_045512.2) by means of the nucmer (Marçais et al, 2018) program. Viral genomes of the SARS 2003 outbreak were retrieved from the NCBI virus database (Goodacre et al, 2018). Classification/association of strains to the 3 (early/middle/late) phases of the epidemic are according to Song et al 2005. Only isolates from the late phase of the epidemic were considered, based on considerations regarding the availability of a relatively high number of genomes (65) and the high level of similarity with the reference SARS-CoV genome.…”
Section: Methods (mentioning)
confidence: 99%
“…Additionally, public databases for microbial reference genomes are being continuously updated, and laboratories need to keep track of the exact versions used in addition to dealing with potential misannotations and other database errors. Larger and more complete databases containing publicly deposited sequences such as the National Center for Biotechnology Information (NCBI) Nucleotide database are more comprehensive but also contain more errors than curated, more limited databases such as FDA ARGOS [91,113] or the FDA Reference Viral Database (RVDB) [114]. A combined approach that incorporates annotated sequences from multiple databases may enable greater confidence in the sensitivity and specificity of microorganism identification.…”
Section: Bioinformatics Challenges (mentioning)
confidence: 99%
“…The poor quality of the reference sequences is a major obstacle to processing data. Because of this, specialized databases are being compiled manually to ensure a strict quality control, e.g., ViPR [170], RVDB [171] and viruSITE [172]. However, they are reliable for the same reason that they are limited, because only a small fraction of all published sequences ever make it to these databases.…”
Section: Methods Of Sequencing Data Analysis (mentioning)
confidence: 99%