2017
DOI: 10.1093/bioinformatics/btx036

MetaShot: an accurate workflow for taxon classification of host-associated microbiome from shotgun metagenomic data

Abstract: Summary: Shotgun metagenomics by high-throughput sequencing may allow deep and accurate characterization of host-associated total microbiomes, including bacteria, viruses, protists and fungi. However, the analysis of such sequencing data is still extremely challenging in terms of both overall accuracy and computational efficiency, and current methodologies show substantial variability in misclassification rate and resolution at lower taxonomic ranks or are limited to specific life domains (e.g. only bacteria). W…

Citations: cited by 21 publications (33 citation statements)
References: 12 publications
“…We have implemented the set cover approach to taxonomic annotation in a next release of the TANGO software (Clemente et al, 2011; Alonso et al, 2013), which belongs in the BioMaS (Fosso et al, 2015) and MetaShot (Fosso et al, 2017) pipelines. The new implementation of TANGO consists of the following: a first Python script for extracting the candidates matches for each read from the BLAST output, a second Python script for taxonomic annotation using the NCBI Taxonomy (Federhen, 2012, 2015), based on the ETE Toolkit (Huerta-Cepas et al, 2016), a third Python script for taxonomic annotation using the Greengenes taxonomy (McDonald et al, 2012), fourth Python script for resolving any remaining ambiguities by finding an exact solution to a set cover problem with the least total size of subsets, based on Gurobi Optimizer (Gurobi Optimization, Inc., 2017), and a fifth Python script for obtaining the relative abundance profile of the metagenomic sample.…”
Section: Results (mentioning)
confidence: 99%
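
To make the "set cover problem with the least total size of subsets" in the statement above concrete, here is a minimal sketch. It is not the TANGO implementation: the cited work reports using Gurobi Optimizer, whereas this sketch swaps in plain exhaustive enumeration, which is only practical for toy inputs. The function name `min_total_size_set_cover`, the read labels, and the taxon names are all hypothetical.

```python
from itertools import combinations


def min_total_size_set_cover(universe, subsets):
    """Exhaustively search for a cover of `universe` whose subsets have the
    smallest summed size. Exact, but exponential in the number of subsets."""
    universe = set(universe)
    names = sorted(subsets)
    best_names, best_cost = None, None
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(subsets[n] for n in combo))
            if not universe <= covered:
                continue  # this combination leaves some read unexplained
            cost = sum(len(subsets[n]) for n in combo)
            if best_cost is None or cost < best_cost:
                best_names, best_cost = combo, cost
    return best_names, best_cost


# Hypothetical toy data: which reads each candidate taxon could explain.
candidates = {
    "taxon_A": {"r1", "r2", "r3"},
    "taxon_B": {"r3", "r4"},
    "taxon_C": {"r1", "r2", "r3", "r4"},
    "taxon_D": {"r4"},
}
cover, cost = min_total_size_set_cover({"r1", "r2", "r3", "r4"}, candidates)
print(cover, cost)
```

Ties are broken in favour of the first cover found (fewest subsets, then alphabetical order); an integer-programming solver would be the realistic choice for real read counts.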
“…Annotating a read as coming from the LCA of the candidate sequences in a reference taxonomy (Huson and Weber, 2013) maximizes precision, as in that case there are no TN and no FN, but at the expense of specificity, because the number of FP in a reference taxonomy can be very large. Annotating a read as coming from an internal node with the largest F-measure value (Clemente et al, 2011; Alonso et al, 2013; Fosso et al, 2015, 2017) minimizes the classification error as a combination of precision and sensitivity.…”
Section: Introduction (mentioning)
confidence: 99%
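
The contrast drawn in the statement above, LCA annotation versus picking the node with the largest F-measure, can be sketched on a toy taxonomy. This is an illustration under stated assumptions, not the code of any cited tool: the taxonomy, the species labels, and all function names are hypothetical, and the per-node counts are assumed to be TP = candidate leaves under the node, FP = non-candidate leaves under the node, FN = candidate leaves outside the node.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(leaves, parent):
    """Lowest common ancestor of `leaves` in a child -> parent map."""
    paths = [ancestors(leaf, parent) for leaf in leaves]
    common = set(paths[0]).intersection(*paths[1:])
    # The first shared node on a leaf-to-root path is the deepest one.
    return next(n for n in paths[0] if n in common)


def best_f_node(candidates, parent, all_leaves):
    """Node maximizing F = 2PR / (P + R) under the assumed TP/FP/FN counts."""
    best, best_f = None, -1.0
    for node in sorted(set(parent) | set(parent.values())):
        under = {l for l in all_leaves if node in ancestors(l, parent)}
        tp = len(under & candidates)
        if tp == 0:
            continue
        precision = tp / len(under)       # TP / (TP + FP)
        recall = tp / len(candidates)     # TP / (TP + FN)
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best, best_f = node, f
    return best, best_f


# Hypothetical taxonomy: two genera under one root.
parent = {
    "genusX": "root", "genusY": "root",
    "s1": "genusX", "s2": "genusX", "s3": "genusX",
    "s4": "genusY", "s5": "genusY", "s6": "genusY", "s7": "genusY",
}
all_leaves = {"s1", "s2", "s3", "s4", "s5", "s6", "s7"}
candidates = {"s1", "s2", "s4"}  # species a read matched

lca_node = lca(sorted(candidates), parent)
f_node, f_value = best_f_node(candidates, parent, all_leaves)
print(lca_node, f_node, round(f_value, 3))
```

In this toy case the LCA is the root, which covers every candidate but also four unmatched species, while the F-measure criterion settles on `genusX`, trading one missed candidate for far fewer unmatched leaves.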
“…Increasing threshold to 100 again decreased the number of filtered eukaryotic reads (to 74-86%) with only slight improvement on the number of retained virus reads (0-9%). For the simulated metagenome (Fosso et al, 2017), setting the threshold to 50 results in filtering 99.93% of host reads and only 0.05% of viral reads (excluding the endogenous retroviral reads, which are filtered to a large extent). Thus, we selected 50 as a working threshold although we recognize that a more robust optimization can be performed.…”
Section: Unix Pipeline For Assembly Taxonomic Profiling and Binning (mentioning)
confidence: 99%
“…Pipelines in the third group, such as MetaPhlan2 (Truong et al, 2015), Kraken2 (Wood et al, 2019) and Centrifuge (Kim et al, 2016a), can perform composition analysis for all known taxa. There are also a number of tools, pipelines, and algorithms for virus discovery, including Genome Detective (Vilsker et al, 2019), VIP (Li et al, 2016), PathSeq (Kostic et al, 2011), SURPI (Ho and Tzanetakis, 2014), READSCAN (Naeem et al, 2013), VirusFinder (Wang et al, 2013) and MetaShot (Fosso et al, 2017). Most of these tools depend exclusively on nucleotide-level sequence alignments and can detect viruses with highly similar sequences to a known virus.…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus, it allows for an unbiased diagnostic analysis. There is a variety of tool able to address NGS-based pathogen related questions with different focuses: either aiming to discover yet unknown genomes [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] or to detect known species in a sample [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40]. Among both groups, there are different underlying algorithms, the main distinction running between alignment-based [15-17, 19, 23, 25, 26, 28-31, 33, 35-37, 39, 40] and alignment-free methods [6,9,12,21,32,38].…”
Section: Introduction (mentioning)
confidence: 99%