2017
DOI: 10.1101/224535
Preprint

MEGAN-LR: New algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs

Abstract: Background: There are numerous computational tools for taxonomic or functional analysis of microbiome samples, optimized to run on hundreds of millions of short, high-quality sequencing reads. Programs such as MEGAN allow the user to interactively navigate these large datasets. Long-read sequencing technologies continue to improve and produce increasing numbers of longer reads (of varying lengths in the range of 10k-1M bps, say), but of low quality. There is an increasing interest in using long reads in microbio…

Cited by 29 publications (35 citation statements)
References 31 publications
“…For example, to call Bison bison as being present in this instance, at least 3 unique reads must align to a region of the B. bison mitogenome with ≥ 95% identical similarity, with low e-values and high bit scores (metrics used by BLASTn to assess the likelihood of misalignments), and by matching or exceeding other parameters used to define confidence in the taxon identification (the rationale for select LCA parameters are discussed further in supplementary Appendix A SET-E. Bioinformatic workflow). Adjusting LCA parameters shifts the trade-off ratio of false positive to false negative assignments, although it would seem that optimal LCA parameters may only exist on a project-by-project or even sample-by-sample basis depending on the taxonomic molecular constituents present, the degree of aDNA damage, and the research question (see Huson et al, 2018). For example, if percent identity is set to 100, only exact matches will be considered, but then aDNA fragments with terminal base modifications (the majority of aDNA molecules) will be unassigned when their taxonomic classification might otherwise be obvious.…”
Section: Megan Lca (mentioning)
confidence: 99%
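
The trade-off described in this statement can be illustrated with a small sketch of a naive lowest-common-ancestor (LCA) assignment. It is a simplification under stated assumptions, not MEGAN's implementation: the toy taxonomy `PARENT`, the function names, and the input layout are all hypothetical, and only two of the parameters mentioned in the quote (minimum percent identity and minimum number of supporting reads) are modelled.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), for illustration only.
PARENT = {
    "Bison bison": "Bison",
    "Bison bonasus": "Bison",
    "Bison": "Bovidae",
    "Bos taurus": "Bos",
    "Bos": "Bovidae",
    "Bovidae": "root",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def lca(taxa):
    """Naive LCA: the deepest node shared by the lineages of all taxa."""
    common = "root"
    for level in zip(*(lineage(t) for t in taxa)):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


def assign_read(hits, min_identity=95.0):
    """Place one read on the LCA of the taxa of its hits that pass the identity filter.

    `hits` is a list of (taxon, percent_identity) pairs; returns None if no hit passes.
    """
    kept = {taxon for taxon, identity in hits if identity >= min_identity}
    return lca(kept) if kept else None


def call_taxa(reads, min_identity=95.0, min_support=3):
    """Report taxa that receive at least `min_support` reads."""
    counts = Counter()
    for hits in reads.values():
        taxon = assign_read(hits, min_identity)
        if taxon is not None:
            counts[taxon] += 1
    return {taxon: n for taxon, n in counts.items() if n >= min_support}


# Made-up example reads: three hit only B. bison, one is ambiguous, one is too divergent.
reads = {
    "r1": [("Bison bison", 97.0)],
    "r2": [("Bison bison", 98.5)],
    "r3": [("Bison bison", 96.0)],
    "r4": [("Bison bison", 96.0), ("Bos taurus", 96.0)],
    "r5": [("Bison bison", 90.0)],
}
```

With these made-up reads, `call_taxa(reads)` reports only Bison bison, while raising `min_identity` to 100 leaves every read unassigned, which mirrors the point in the quote about damaged aDNA fragments being lost under an exact-match threshold.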
“…The next step for the HQ MAGs was to correct the frameshift errors, as described in [15], using Diamond 0.9.32 [27] and MEGAN-LR 6.19.1 [28]. We used ideel [29] to visualize the number of truncated ORF.…”
Section: Metagenomics Assembly and Polishing (mentioning)
confidence: 99%
“…For functional characterization, raw reads were submitted for BLAST search against the nr database using diamond BLASTx at e-value 1e-3, similarity > 90%, and alignment length > 20 amino acids [111]. Using the best-hit algorithm, individual reads were described to belong to a class in the particular classification system [112]. The method was as follows: For a read 'r', let 'a' describe the highest-scoring alignment to a reference protein belonging to functional class 'c' and the number of reads that mapped to the individual proteins were then analysed using databases eggNOG [113], KEGG [114] and SEED [115].…”
Section: Taxonomic and Functional Characterization (mentioning)
confidence: 99%
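
The best-hit assignment outlined in this statement can be sketched as follows. This is a minimal illustration, not the cited authors' pipeline: the function name, the tuple layout of the alignments, and the `protein_to_class` mapping are assumptions, and the thresholds are simply the ones quoted above (e-value 1e-3, similarity > 90%, alignment length > 20 amino acids).

```python
from collections import Counter


def best_hit_classes(alignments, protein_to_class,
                     max_evalue=1e-3, min_identity=90.0, min_length=20):
    """Count reads per functional class using each read's highest-scoring alignment.

    `alignments` is an iterable of hypothetical tuples:
    (read_id, protein_id, percent_identity, alignment_length, evalue, bitscore).
    `protein_to_class` maps a reference protein to a functional class.
    """
    best = {}  # read_id -> (protein_id, bitscore) of the best alignment seen so far
    for read, protein, identity, length, evalue, bitscore in alignments:
        # Apply the quoted filters before considering the alignment.
        if evalue > max_evalue or identity <= min_identity or length <= min_length:
            continue
        if read not in best or bitscore > best[read][1]:
            best[read] = (protein, bitscore)

    counts = Counter()
    for protein, _ in best.values():
        cls = protein_to_class.get(protein)
        if cls is not None:  # reads whose best hit has no class stay unclassified
            counts[cls] += 1
    return counts


# Made-up alignments and class labels, for illustration only.
alignments = [
    ("r1", "P1", 95.0, 50, 1e-10, 120.0),
    ("r1", "P2", 99.0, 60, 1e-12, 150.0),  # best hit for r1
    ("r2", "P1", 92.0, 30, 1e-5, 80.0),
    ("r3", "P3", 85.0, 40, 1e-6, 70.0),    # fails the similarity filter
    ("r4", "P2", 95.0, 25, 1e-2, 40.0),    # fails the e-value filter
]
protein_to_class = {"P1": "class_A", "P2": "class_B"}
class_counts = best_hit_classes(alignments, protein_to_class)
```

Here read `r1` is counted once, under the class of its highest-scoring protein, and reads `r3` and `r4` are dropped by the filters before any class is considered.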