2018
DOI: 10.1101/451278
Preprint

MeShClust2: Application of alignment-free identity scores in clustering long DNA sequences

Abstract: Grouping sequences into similar clusters is an important part of sequence analysis. Widely used clustering tools sacrifice quality for speed. Previously, we developed MeShClust, which utilizes k-mer counts in an alignment-assisted classifier and the mean-shift algorithm for clustering DNA sequences. Although MeShClust outperformed related tools in terms of cluster quality, the alignment algorithm used for generating training data for the classifier was not scalable to longer sequences. In contrast, MeShClust 2…
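
For readers unfamiliar with the k-mer counts mentioned in the abstract, the following is a minimal sketch of how two DNA sequences can be compared without alignment. It is an illustration only and is not MeShClust's classifier or its identity score: the function names `kmer_counts` and `cosine_similarity`, the choice of k = 3, and the use of cosine similarity are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping substring of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two k-mer count vectors.

    This is a generic alignment-free measure chosen for illustration,
    not the score used by MeShClust or Identity.
    """
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    x = kmer_counts("ACGTACGTAC")
    y = kmer_counts("ACGTACGTTC")
    print(cosine_similarity(x, y))
```

Because the comparison only needs one pass over each sequence to build the counts, its cost grows linearly with sequence length, which is the general reason alignment-free measures are attractive for long sequences.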

Citations: cited by 7 publications (3 citation statements)
References: 41 publications
“…Further, clustering algorithms, e.g. k-means and mean shift (35, 36), can utilize Identity in grouping similar sequences. Finally, this methodology was also utilized in Look4TRs (34) where it was applied to finding repeated motifs in tandem repeats.…”
Section: Discussion (mentioning)
confidence: 99%
“…This design was inspired by our earlier research. We have successfully implemented adaptive software tools using self-supervised learning algorithms for locating cis-regulatory modules (32), identifying DNA repeats (33, 34), and for clustering DNA sequences (35, 36). Multiple software tools we developed earlier utilize general linear models (GLM) (34–40).…”
Section: Introduction (mentioning)
confidence: 99%
“…The adaptation of the mean shift algorithm in MeShClust v2.0 [41] is the same as that of MeShClust v1.0. The difference between the two versions is that the second version's classifier utilized in selecting similar sequences does not utilize alignment algorithms to generate identity scores for training; it uses similar idea to that implemented in Identity.…”
Section: MeShClust v2.0 (mentioning)
confidence: 99%