2017
DOI: 10.1101/126953
Preprint

MAPseq: improved speed, accuracy and consistency in ribosomal RNA sequence analysis

Abstract: Metagenomic sequencing has become crucial to studying microbial communities, but meaningful taxonomic analysis and inter-comparison of such data are still hampered by technical limitations, between-study design variability and inconsistencies between taxonomies used. Here we present MAPseq, a framework for reference-based rRNA metagenomic analysis that is up to 30% more accurate (F½ score) and up to one hundred times faster than existing solutions, providing in a single run multiple taxonomy classifications…
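The abstract reports accuracy as an F½ score. Assuming this is the standard F-beta measure with β = 0.5, which weights precision more heavily than recall, it can be computed from classification counts as sketched below. How reads are counted as true positives, false positives and false negatives is an assumption here (one possible convention treats a wrong label as a false positive and an unclassified read as a false negative), and the function name and example counts are hypothetical.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from true positives, false positives and false negatives.

    beta < 1 weights precision more heavily than recall; beta = 0.5 gives F1/2.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: a conservative classifier (few wrong labels, more reads
# left unclassified) versus an aggressive one with the same number of hits.
conservative = f_beta(tp=80, fp=5, fn=15)
aggressive = f_beta(tp=80, fp=15, fn=5)
print(round(conservative, 3), round(aggressive, 3))
```

With β = 0.5, the conservative classifier scores higher than the aggressive one even though both make 20 errors, because errors in precision are penalised more than errors in recall.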

Cited by 5 publications (6 citation statements)
References 19 publications

“…In contrast to a recent study on fecal microbiota of the same host species [56], a large fraction of OTUs (48.4%) could not be taxonomically classified even at the phylum level in our analysis. This difference is due to the more precise mapping approach we used here (MAPseq; [62]), which assigns low confidence to a read if multiple taxa have similar alignment scores and can thus not be confidently distinguished. In various benchmarks, this approach was shown to yield better classifications than the commonly used less conservative alternatives [62,82].…”
Section: Discussion (mentioning; confidence: 99%)
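The excerpt describes MAPseq as assigning low confidence to a read when several taxa have similar alignment scores. The sketch below shows one simple way such conservative behaviour can arise: a per-rank margin rule that leaves a rank (and everything below it) unassigned when the best and second-best labels score within a chosen margin. The margin rule, the `min_margin` parameter and the `(lineage, score)` data model are hypothetical illustrations, not MAPseq's actual confidence computation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical data model: a hit is (lineage, alignment_score), where the
# lineage is ordered from the highest rank down, e.g. ("Firmicutes", "Clostridia").
Hit = Tuple[Tuple[str, ...], float]


def assign_lineage(hits: Sequence[Hit], min_margin: float) -> List[Optional[str]]:
    """Assign a label per rank only if the best-scoring label beats every
    competing label at that rank by at least `min_margin` score units.

    Illustrative margin rule, not MAPseq's confidence model.
    """
    if not hits:
        return []
    ranks = max(len(lineage) for lineage, _ in hits)
    assignment: List[Optional[str]] = []
    for rank in range(ranks):
        # Best score reached by each distinct label at this rank.
        best_by_label: Dict[str, float] = {}
        for lineage, score in hits:
            if rank < len(lineage):
                label = lineage[rank]
                best_by_label[label] = max(score, best_by_label.get(label, float("-inf")))
        ranked = sorted(best_by_label.items(), key=lambda kv: kv[1], reverse=True)
        top_label, top_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
        if top_score - runner_up < min_margin:
            # Competing labels are too close to distinguish: stop and leave
            # this rank and all lower ranks unassigned.
            assignment.extend([None] * (ranks - rank))
            break
        assignment.append(top_label)
    return assignment


# Hits split between two phyla with near-identical scores stay unclassified
# even at phylum level; a clear phylum winner is kept while the class is not.
ambiguous = [(("Firmicutes", "Clostridia"), 250.0), (("Bacteroidetes", "Bacteroidia"), 249.0)]
partial = [(("Firmicutes", "Clostridia"), 250.0), (("Firmicutes", "Bacilli"), 248.0),
           (("Bacteroidetes", "Bacteroidia"), 200.0)]
print(assign_lineage(ambiguous, min_margin=10.0))
print(assign_lineage(partial, min_margin=10.0))
```

A rule of this kind trades recall for precision: it reports fewer assignments, but those it reports are less likely to be wrong, which is qualitatively the behaviour the citing study attributes to MAPseq.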
“…This difference is due to the more precise mapping approach we used here (MAPseq; [62]), which assigns low confidence to a read if multiple taxa have similar alignment scores and can thus not be confidently distinguished. In various benchmarks, this approach was shown to yield better classifications than the commonly used less conservative alternatives [62,82]. The high fraction of largely unclassified OTUs may be an indication that much is still to be learned about the macaque gut microbiome.…”
Section: Discussion (mentioning; confidence: 99%)
“…This included using the fastx_truncate option to remove the 16S rRNA PCR primers, fastq_filter to filter based on a minimum fragment length of 150 bp and ee (expected error) value of 1.0 quality, fastx_uniques to get unique sequences per sample, and unoise3 to check for and remove chimeric sequences. The resulting datasets were then classified with MAPseq (Matias Rodrigues et al, 2017), utilizing their default curated database. Briefly, this was created using NCBI GenBank and RefSeq reference sequence databases, extracting any sequences annotated as ribosomal RNA with 16S or 18S in the annotation.…”
Section: 16S rRNA Gene Sequencing and Analysis (mentioning; confidence: 99%)
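The excerpt filters reads by a minimum length and an expected-error (EE) threshold before dereplicating them. The expected number of errors in a read is the sum of its per-base error probabilities, 10^(-Q/10) for Phred quality Q. The sketch below applies the two thresholds quoted (150 bp, EE ≤ 1.0) and then collapses identical sequences with their abundances. It assumes Phred+33 quality encoding, uses hypothetical function names, and illustrates the idea only; it is not a reimplementation of the USEARCH `fastq_filter` or `fastx_uniques` commands.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (sequence, quality string) pairs; quality assumed to be Phred+33 encoded.
Read = Tuple[str, str]


def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of errors: sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def filter_reads(reads: Iterable[Read], min_len: int = 150, max_ee: float = 1.0) -> List[str]:
    """Keep sequences at least `min_len` bases long with expected errors <= `max_ee`."""
    return [
        seq
        for seq, qual in reads
        if len(seq) >= min_len and expected_errors(qual) <= max_ee
    ]


def dereplicate(seqs: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse identical sequences into (sequence, count), most abundant first."""
    return Counter(seqs).most_common()


# Toy reads: 'I' encodes Q40 (EE about 0.015 over 150 bp), '+' encodes Q10
# (EE about 15 over 150 bp), and the last read is too short.
good = ("A" * 150, "I" * 150)
noisy = ("C" * 150, "+" * 150)
short = ("G" * 100, "I" * 100)
kept = filter_reads([good, good, noisy, short])
print(dereplicate(kept)[0][1])
```

Filtering on expected errors rather than mean quality penalises a few very low-quality bases strongly, since error probability grows exponentially as Q falls.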
“…Contrary to the above, MG-RAST performs sequence-based rRNA searches against M5RNA, a subset of the M5NR database (Wilke et al, 2012) containing non-redundant rRNA sequences, using VSEARCH (Rognes et al, 2016), an open-source alternative of the usearch tool (Edgar, 2010). Another useful tool is MapSeq (Matias Rodrigues et al, 2017), a k-mer based rRNA sequence search and analysis tool that is used by MGnify to analyze cmsearch results and provide SSU and LSU taxonomy assignment. Finally, the identified RNA genes can be used to establish a generalized functional profile for the analyzed sample, using functional annotations from reference genomes with matches to the detected marker regions.…”
Section: Gene Calling and Annotation (mentioning; confidence: 99%)
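The last excerpt calls MAPseq a k-mer based rRNA sequence search tool. As a simplified illustration of k-mer based search in general, not of MAPseq's actual index or scoring, the sketch below builds an inverted index from k-mers to reference sequences and ranks references by how many distinct k-mers they share with a query; in a full classifier, such candidates would then typically be aligned and scored. The value of k, the function names and the toy references are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of a sequence (empty if the sequence is shorter than k)."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Inverted index mapping each k-mer to the ids of references containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for ref_id, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(ref_id)
    return index


def top_candidates(query: str, index: Dict[str, Set[str]], k: int, n: int = 3) -> List[Tuple[str, int]]:
    """Rank references by the number of distinct query k-mers they share."""
    shared: Dict[str, int] = defaultdict(int)
    for kmer in kmers(query, k):
        for ref_id in index.get(kmer, ()):
            shared[ref_id] += 1
    return sorted(shared.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy references; real rRNA references are far longer and k is usually larger.
refs = {"ref1": "ACGTACGTGGCC", "ref2": "TTTTGGGGCCCC"}
idx = build_index(refs, k=4)
print(top_candidates("GGGGCCCC", idx, k=4))
```

Counting shared k-mers is cheap compared with alignment, which is why k-mer lookups are a common way to narrow a large reference database down to a few candidates per read.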