2010
DOI: 10.1109/tnb.2010.2081375

Comparison of Statistical Methods to Classify Environmental Genomic Fragments

Abstract: "Binning" (or taxonomic classification) of DNA sequence reads is an initial step to analyzing an environmental biological sample. Currently, a homology-based tool, BLAST, is one of the most commonly used tools to label DNA reads, but it is argued that BLAST will quickly lose its classification ability as the genome databases grow. In this paper, we compare the accuracies of a naïve Bayes classifier (NBC) and statistical language model to BLAST for binning reads and demonstrate that NBC obtains good performance…

Cited by 6 publications (11 citation statements)
References 33 publications
“…Contigs were assigned to high-level taxonomic groups (Class level and above) using a Naïve Bayesian Classifier (NBC) that compares against a database containing all DNA sequences in NCBI that classified as either Bacteria, Archaea, Fungi, or viral (Rosen and Essinger, 2010). This approach was chosen because NBC has been found to outperform most other composition, similarity, and phylogeny based metagenomic classifiers in terms of sensitivity and precision (Bazinet and Cummings, 2012).…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…In general, the classifiers performed better at genotyping than at subtyping HCV sequences. Several studies for viral [9], [10] and metagenomic [3], [20], [21] taxonomic classification have reported similar results where the performance is better at high-level classifications. Genomic sequences are more similar at low-level than at high-level clades, which makes more difficult to discriminate between sequences at low-level clades.…”
Section: Overall Remarks (citation type: mentioning)
confidence: 59%
“…For instance B-MB with α = 1e−100 and LSVM with L2 penalty are the best choice in generative and discriminative models respectively as they were stable among all experimental classifications. As observed in previous studies [20]-[22], generative classifiers (MB and Markov) are sensitive to how they infer their parameters (class-conditional densities), either by MLE or Bayesian approaches. MLE approach could overfit and produce a sparse parameter matrix W when unseen k-mers in training step will have null estimates [20], [23] or very small probabilities will underflow the numerical precision [14].…”
Section: Overall Remarks (citation type: mentioning)
confidence: 63%
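The two failure modes named in the statement above (null estimates for unseen k-mers under MLE, and numerical underflow of very small probabilities) can be illustrated with a minimal sketch of a k-mer naïve Bayes scorer. This is not the implementation used in any of the cited papers: the function names, the toy sequences, and the pseudocount value are all assumptions chosen for illustration. Additive smoothing (a simple Bayesian-style estimate) stands in for the estimators those papers compare.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(genomes, k, alpha):
    """Collect per-class k-mer counts for a naive Bayes model.

    genomes: dict mapping a class label to one training sequence (toy setup).
    alpha:   pseudocount added to every possible k-mer; alpha = 0 is plain MLE.
    """
    vocab_size = 4 ** k  # number of possible k-mers over A, C, G, T
    model = {}
    for label, seq in genomes.items():
        counts = Counter(kmers(seq, k))
        total = sum(counts.values()) + alpha * vocab_size
        model[label] = (counts, total)
    return model


def log_score(model, label, read, k, alpha):
    """Sum log-probabilities of the read's k-mers under one class.

    Working in log space avoids the underflow that multiplying many
    small probabilities would cause.
    """
    counts, total = model[label]
    score = 0.0
    for kmer in kmers(read, k):
        numerator = counts[kmer] + alpha
        if numerator == 0:
            # Unseen k-mer under MLE: probability 0, so the whole read scores -inf.
            return float("-inf")
        score += math.log(numerator / total)
    return score


def classify(model, read, k, alpha):
    """Return the class label with the highest log score."""
    return max(model, key=lambda label: log_score(model, label, read, k, alpha))


# Hypothetical toy data: two made-up "genomes" and one short read.
genomes = {
    "A_rich": "AAAAAAAACAAAAAAGAAAA",
    "GC_rich": "GCGCGGCCGCGCCGGCGCGC",
}
read = "AAAAT"  # contains the 2-mer "AT", which neither genome contains

mle = train(genomes, k=2, alpha=0)
smoothed = train(genomes, k=2, alpha=1)

for label in genomes:
    print(label, log_score(mle, label, read, 2, 0), log_score(smoothed, label, read, 2, 1))
print(classify(smoothed, read, k=2, alpha=1))
```

With `alpha=0` a single unseen k-mer sends every class to negative infinity, so the read cannot be ranked at all. Any positive pseudocount keeps the scores finite and lets the remaining k-mers decide. How large the pseudocount should be is exactly the kind of sensitivity the citing authors describe.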
“…Nonetheless this challenge may be addressed computationally, sorting raw sequencing reads taxonomically (Figure 5) and phylogenetically (Weisburg et al, 1991; Retief, 2000; Darling et al, 2014) and thus yield conclusive information about the population of the niche, which can be extended subsequently to the assembled contigs and genes. This process is called taxonomic binning (Droge and Mchardy, 2012) and there are numerous tools (Mohammed et al, 2011; Pati et al, 2011; Luo et al, 2014; Wang et al, 2014) that rely on homology based or composition based approaches (Rosen and Essinger, 2010).…”
Section: Data Acquisition (citation type: mentioning)
confidence: 99%