2014
DOI: 10.1089/cmb.2014.0173

A Coverage Criterion for Spaced Seeds and Its Applications to Support Vector Machine String Kernels and k-Mer Distances

Abstract: Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances and to provide a lower misclassification rate when used with Support Vector Machines (SVMs). We confirm these two results by independent experiments and propose in this article to use a coverage criterion to measure seed efficiency in both cases, in order to design better seed patterns. We first show how this coverage criterion can be directly measured by a full auto…

Cited by 23 publications (29 citation statements)
References 72 publications
“…In conclusion, while spaced seeds provide a much better estimator for alignments whose quality ranges over a large interval, for high-quality alignments (> 90% of identity), the hit number of contiguous seed becomes a better estimator. The superiority of hit-number over coverage for high-quality alignments has also been reported in [32]. Along with Spearman's correlation, we also made an analysis of mutual information computed on the same data (data not shown) that confirmed the above conclusions.…”
Section: Correlation Of Counts With Alignment Quality (supporting)
confidence: 81%
“…Another improvement considered in [21,32,29] is to use multiple seeds, i.e. several seeds simultaneously instead of a single one.…”
Section: Discussion (mentioning)
confidence: 99%
“…To approximate a measure of conserved nucleotides, the coverage is projected over individual nucleotides rather than directly counting shared skip-mers which would introduce redundancy from phased matches. An equivalent coverage metric for spaced seeds can be found in Noé and Martin (2014) where it is also used to estimate distances. The score for each feature (i.e.…”
Section: Methods (mentioning)
confidence: 99%
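
The citation statements above contrast two ways of scoring a seed against an alignment: the hit number and the coverage. A minimal Python sketch of the difference follows. It assumes a common convention that is not spelled out on this page: a seed is a string over `1` (must-match position) and `0` (don't-care position), an alignment is a string over `1` (match) and `0` (mismatch), and coverage counts the alignment positions that lie under a must-match position of at least one hit. The function names are hypothetical and are not taken from the cited papers.

```python
def seed_hits(seed: str, alignment: str) -> list[int]:
    """Return the start positions where the seed hits the alignment.

    Assumed convention: a hit requires a match ('1') in the alignment
    under every must-match ('1') position of the seed.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    last_start = len(alignment) - len(seed)
    return [
        i
        for i in range(last_start + 1)
        if all(alignment[i + j] == "1" for j in care)
    ]


def hit_number(seed: str, alignment: str) -> int:
    """Count the seed hits; overlapping hits are each counted once."""
    return len(seed_hits(seed, alignment))


def coverage(seed: str, alignment: str) -> int:
    """Count alignment positions covered by a must-match position of any hit.

    Each position is counted once, however many hits cover it.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    covered = set()
    for i in seed_hits(seed, alignment):
        covered.update(i + j for j in care)
    return len(covered)


if __name__ == "__main__":
    alignment = "1111011"  # toy alignment: six matches, one mismatch
    for seed in ("1101", "111"):  # a spaced seed and a contiguous seed
        print(seed, hit_number(seed, alignment), coverage(seed, alignment))
```

On this toy alignment both seeds have the same hit number, but their coverage differs, because coverage counts each covered position once rather than once per overlapping hit.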
“…Spaced seeds are widely used for approximate sequence matching in bioinformatics and they have been increasingly applied to improve the sensitivity and specificity of homology search algorithms (Kucherov et al., 2006; Noé and Martin, 2014). Spaced seeds are now routinely used, instead of k-mers, in many problems involving sequence comparison such as multiple sequence alignment (Darling et al., 2006), protein classification (Onodera and Shibuya, 2013), read mapping (Rumble et al., 2009), phylogeny reconstruction (Leimeister et al., 2014), and metagenome reads clustering and classification (Binda et al., 2015; Ounit and Lonardi, 2016; Girotto et al., 2017c).…”
Section: Introduction (mentioning)
confidence: 99%