2018
DOI: 10.3389/fgene.2018.00304

MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins

Abstract: Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset with 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset with 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve a good performance in separating bacterial from phage sequences: gene density, strand shifts,…
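
As a rough illustration of the first two features named in the abstract, the sketch below computes a gene density and a strand-shift fraction from predicted gene coordinates. The exact definitions are assumptions made for illustration (genes per kilobase, and strand changes between adjacent genes divided by the gene count); they are not confirmed by the text above, and the function names and the `(start, end, strand)` tuple layout are hypothetical. The feature list in the abstract is truncated, so no further feature is sketched here.

```python
from typing import List, Tuple

# Hypothetical gene record: (start, end, strand), with strand "+" or "-".
Gene = Tuple[int, int, str]


def gene_density(genes: List[Gene], contig_length: int) -> float:
    """Genes per kilobase of contig (assumed definition)."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return len(genes) / (contig_length / 1000.0)


def strand_shift_fraction(genes: List[Gene]) -> float:
    """Strand changes between adjacent genes, divided by the gene count
    (assumed definition)."""
    ordered = sorted(genes, key=lambda g: g[0])  # order genes along the contig
    if len(ordered) < 2:
        return 0.0
    shifts = sum(1 for a, b in zip(ordered, ordered[1:]) if a[2] != b[2])
    return shifts / len(ordered)


# Toy contig of 4,000 bp with four predicted genes and one strand change.
genes = [(1, 900, "+"), (950, 1800, "+"), (1900, 2700, "-"), (2800, 3900, "-")]
features = [gene_density(genes, 4000), strand_shift_fraction(genes)]
```

Feature vectors of this kind, one per contig or bin, are what a random forest classifier would take as input; the training step itself is not shown here.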

Cited by 135 publications (112 citation statements)
References: 53 publications
“…We chose VirSorter over virfinder (Ren, Ahlgren, Lu, Fuhrman, & Sun) because it has been shown that the latter may misclassify eukaryotic sequences as viral. Some other approaches have been recently developed to retrieve viral signals from (meta‐)genomic data, such as marvel (Amgarten, Braga, da Silva, & Setubal) and virminer (Zheng et al.), but they have been developed to detect viral genomes in prokaryotes. From the 64 viral contigs retrieved in the protist cells, the narrow host range of these viruses was remarkable given that >95% of the detected viral sequences (n = 61) were specific to one stramenopile lineage and just a few were shared between lineages (n = 3).…”
Section: Discussion (mentioning)
confidence: 99%
“…K-mer similarity score can discriminate viruses within other prokaryotic genomes even when the viral reads are short (500 bp). Furthermore, the MARVEL tool (Amgarten et al., 2018) is used for identifying bacteriophage sequences in metagenomic bins. A random forest approach is applied in MARVEL, and the model input is various engineered features, such as gene density, strand shifts, and fractions of significant hits to a viral protein database (Grazziotin et al., 2016).…”
Section: Machine Learning Tools (mentioning)
confidence: 99%
“…A different approach consists of using machine learning techniques to learn from examples to classify viral genomes and to generalize to novel samples. In particular, several machine learning models for detecting viruses in metagenomic data have been already published [23,24,25], but none of them were trained nor tested to identify viruses in different human biospecimens.…”
Section: Introduction (mentioning)
confidence: 99%
“…Currently, the detection of potential viral genomes in human biospecimens is usually performed by NCBI BLAST, which implements alignment-based classification where sequences are aligned to known genomes from public databases and then estimates how much percentage similarity they share. However, metagenomic samples might contain a large number of highly divergent viruses that have no homologs at all among known genomes. As a consequence, many sequences generated from NGS technologies are classified as "unknown" by BLAST [5,18].…”
Section: Introduction (mentioning)
confidence: 99%