2011
DOI: 10.1155/2011/495849

Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

Abstract: High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Because metagenomic data include sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed to identify and annotate novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted. We de…
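Composition-based methods, contrasted with homology-based ones in the abstract, typically represent a read by its oligonucleotide (k-mer) frequencies rather than by alignment. The following is a minimal illustration of that representation only, not the paper's method; the function name `kmer_profile`, the choice of k, and skipping k-mers that contain ambiguous bases are all assumptions.

```python
from collections import Counter
from itertools import product


def kmer_profile(read: str, k: int = 3) -> dict[str, float]:
    """Return the relative frequency of every possible DNA k-mer in `read`.

    Hypothetical helper: k-mers containing characters other than A/C/G/T
    (for example N) are skipped rather than counted.
    """
    read = read.upper()
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Emit all 4**k k-mers (zeros included) so every read maps to a vector
    # of the same length and key order.
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in ("".join(p) for p in product("ACGT", repeat=k))
    }


profile = kmer_profile("ACGTACGTAA", k=2)
```

Returning all 4^k entries keeps profiles from different reads comparable as fixed-length vectors, which distance- or likelihood-based classifiers usually expect.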

Citations: cited by 12 publications (11 citation statements)
References: 28 publications
“…An alternative approach uses a ‘detector’ to flag sequences that represent taxa not present in the reference data set so that they can be removed prior to making taxonomic assignments (Rosen et al.; Lan et al.).…”
Section: Discussion; citation type: mentioning (confidence: 99%)
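One way to picture such a detector is a pre-filter that scores each read against the reference set and sets aside reads that resemble nothing in it, then assigns only the remainder. This sketch assumes a deliberately simple score, the fraction of a read's distinct k-mers that occur in a reference sequence, with a hypothetical cut-off `min_shared`; it does not reproduce the detectors of Rosen et al. or Lan et al.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers in `seq` (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def detect_then_assign(
    reads: dict[str, str],
    references: dict[str, str],
    k: int = 4,
    min_shared: float = 0.5,
) -> tuple[dict[str, str], list[str]]:
    """Flag reads unlike every reference, then assign the remaining reads.

    Returns (assignments, flagged): assignments maps read id -> best taxon,
    flagged lists read ids set aside as possibly from an absent taxon.
    """
    ref_kmers = {taxon: kmers(seq, k) for taxon, seq in references.items()}
    assignments: dict[str, str] = {}
    flagged: list[str] = []
    for read_id, seq in reads.items():
        read_kmers = kmers(seq, k)
        # Fraction of the read's distinct k-mers found in each reference.
        scores = {
            taxon: len(read_kmers & ref) / len(read_kmers)
            for taxon, ref in ref_kmers.items()
        } if read_kmers else {}
        best = max(scores, key=scores.get) if scores else None
        if best is None or scores[best] < min_shared:
            # Detector step (hypothetical rule): too dissimilar from every
            # reference, so remove it before taxonomic assignment.
            flagged.append(read_id)
        else:
            assignments[read_id] = best
    return assignments, flagged


refs = {"taxonA": "ACGTACGTACGTACGT", "taxonB": "TTTTGGGGCCCCAAAA"}
reads = {"r1": "ACGTACGTAC", "r2": "GGGGCCCCAA", "r3": "ATATATATAT"}
assigned, novel = detect_then_assign(reads, refs)
```

Separating the flagging step from assignment means a read from an unrepresented taxon is reported as such instead of being forced onto its nearest, but wrong, reference.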
“…However, at least two issues make taxonomic assignment difficult. First, the read length obtained by next generation sequencing technologies is not long enough to allow the original methods to properly assign the reads to low taxonomic levels (such as genus or species) due to the low sequence divergence between closely related taxonomic groups [3]. And second, since reference genomes are not available for many uncultured organisms, an incorrect assignment (or even no assignment at all) may be produced when no closely related species have been previously identified.…”
Section: Introduction; citation type: mentioning (confidence: 99%)
“…Packages using the former approach include MG-RAST [4] and MEGAN [5], while the Naïve Bayesian Classifier (such as implemented in Fragment Classification Package, FCP) [6] and the interpolated Markov model classification (IMM-based), used by Phymm [7] are based on composition similarity. The performance of assignment programs has been assessed using simulated and well-known experimental data [2, 3, 6-8]. In the case of composition-based programs, these methods can classify all the reads [2], and report the associated likelihood of the read to be assigned to the different categories.…”
Section: Introduction; citation type: mentioning (confidence: 99%)
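The likelihood reporting described for composition-based programs can be illustrated with a multinomial naive Bayes classifier over k-mer counts. This is a hedged sketch, not FCP's or Phymm's implementation: the class name `NaiveBayesKmer`, Laplace smoothing with `alpha`, equal priors across taxa, reads restricted to A/C/G/T, and k = 4 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in `seq` (assumed to contain only A/C/G/T)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class NaiveBayesKmer:
    """Multinomial naive Bayes over k-mer counts (illustrative sketch)."""

    def __init__(self, k: int = 4, alpha: float = 1.0):
        self.k, self.alpha = k, alpha
        self.log_probs: dict[str, dict[str, float]] = {}  # taxon -> log P(kmer | taxon)
        self.log_unseen: dict[str, float] = {}  # taxon -> log P of an unseen k-mer

    def fit(self, references: dict[str, str]) -> "NaiveBayesKmer":
        vocab = 4 ** self.k  # number of possible DNA k-mers, used for smoothing
        for taxon, seq in references.items():
            counts = kmer_counts(seq, self.k)
            denom = sum(counts.values()) + self.alpha * vocab
            self.log_probs[taxon] = {
                km: math.log((c + self.alpha) / denom) for km, c in counts.items()
            }
            self.log_unseen[taxon] = math.log(self.alpha / denom)
        return self

    def log_likelihoods(self, read: str) -> dict[str, float]:
        """log P(read | taxon) for every taxon, up to a constant shared by all taxa."""
        counts = kmer_counts(read, self.k)
        return {
            taxon: sum(
                n * probs.get(km, self.log_unseen[taxon]) for km, n in counts.items()
            )
            for taxon, probs in self.log_probs.items()
        }

    def classify(self, read: str) -> tuple[str, dict[str, float]]:
        # With equal priors, the maximum-likelihood taxon is the naive Bayes choice.
        scores = self.log_likelihoods(read)
        return max(scores, key=scores.get), scores


model = NaiveBayesKmer(k=4).fit(
    {"taxonA": "ACGTACGTACGTACGT", "taxonB": "TTTTGGGGCCCCAAAA"}
)
taxon, scores = model.classify("ACGTACGTAC")
```

Because every read receives a log-likelihood for every taxon, a classifier of this kind always returns some assignment; that is one reason a separate rule is needed to recognise reads from taxa missing from the references.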
“…While new sequences are regularly added to databases (more than ten million 16S-rRNA gene sequences are currently available, source: ncbi.nlm.nih.gov), a series of intrinsic limitations affecting the different experimental protocols have been highlighted [22–28]. These aspects prompted both protocol improvements [29] and suggestions for denoising or data mining [30–33].…”
Citation type: mentioning (confidence: 99%)