2004
DOI: 10.1093/nar/gkh180

Local homology recognition and distance measures in linear time using compressed amino acid alphabets

Abstract: Methods for discovery of local similarities and estimation of evolutionary distance by identifying k-mers (contiguous subsequences of length k) common to two sequences are described. Given unaligned sequences of length L, these methods have O(L) time complexity. The ability of compressed amino acid alphabets to extend these techniques to distantly related proteins was investigated. The performance of these algorithms was evaluated for different alphabets and choices of k using a test set of 1848 pairs of struc…
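The abstract describes estimating distance from k-mers shared by two unaligned sequences in O(L) time, optionally after mapping residues to a compressed alphabet. The sketch below illustrates that idea under stated assumptions: the Dayhoff six-group alphabet is used only as an example of a compressed alphabet (it is not claimed to be one the paper recommends), and the normalization of the shared-k-mer count by the number of k-mer windows in the shorter sequence, with distance d = 1 − F, is an assumed formulation. All function names are hypothetical.

```python
from collections import Counter

# Assumption: the Dayhoff six-group alphabet is used purely as an example of a
# compressed alphabet; it is not claimed to be an alphabet chosen in the paper.
DAYHOFF6 = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
COMPRESS = {aa: str(i) for i, group in enumerate(DAYHOFF6) for aa in group}


def compress(seq: str) -> str:
    """Replace each residue by its group label; unrecognized residues become '?'."""
    return "".join(COMPRESS.get(aa, "?") for aa in seq.upper())


def kmer_counts(seq: str, k: int) -> Counter:
    """One pass over the sequence: L - k + 1 windows, each hashed into a Counter.
    Windows containing '?' are skipped so unknown residues never match."""
    return Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if "?" not in seq[i:i + k]
    )


def common_kmer_fraction(a: str, b: str, k: int = 3) -> float:
    """F = sum over k-mers t of min(n_a(t), n_b(t)), divided by the number of
    k-mer windows in the shorter sequence (assumed normalization)."""
    ca, cb = kmer_counts(compress(a), k), kmer_counts(compress(b), k)
    shared = sum(min(n, cb[t]) for t, n in ca.items())  # missing keys count as 0
    windows = min(len(a), len(b)) - k + 1
    return shared / windows if windows > 0 else 0.0


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free distance d = 1 - F."""
    return 1.0 - common_kmer_fraction(a, b, k)
```

With k treated as a small constant, building each Counter is linear in sequence length, and the comparison touches each distinct k-mer once. The compression step is what lets conservative substitutions still match: "ILVK" and "VLIR" share no 3-mers over the 20-letter alphabet, but both compress to "5554", so kmer_distance("ILVK", "VLIR") is 0.0.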

Cited by 134 publications (103 citation statements); references 17 publications.
“…Since large-scale microbial community profiling becomes more accessible to scientists, scalable yet accurate tools like CRiSPy are crucial for research in this area. Although CRiSPy is designed for microbial studies targeting DNA sequence analysis, the individual k-mer distance and genetic distance modules on GPUs can easily be extended to support protein sequence analysis and be used in general sequence analysis studies such as the usage of k-mer distance for fast, approximate phylogenetic tree construction by Edgar [17] or the utilization of pairwise genetic distance matrix in multiple sequence alignment programs such as ClustalW [18]. Availability: CRiSPy is available from the authors upon request.…”
Section: Results (mentioning; confidence: 99%)
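The statement above mentions feeding pairwise k-mer distances into fast approximate tree construction or into a progressive aligner such as ClustalW. As a minimal sketch (not CRiSPy's GPU code or ClustalW's implementation; the function names are hypothetical), a full distance matrix can be built by computing each sequence's k-mer profile once and then comparing profiles pairwise. The distance d = 1 − F, where F is the number of shared k-mers (counted with multiplicity) divided by the k-mer window count of the shorter sequence, is an assumed normalization. The raw alphabet is used here because the citing work targets DNA.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(seq: str, k: int) -> Counter:
    """Counts of every k-mer window in seq (linear in len(seq) for fixed k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """Symmetric N x N matrix of d = 1 - F (assumed normalization, see lead-in).

    Profiles are built once per sequence, then N(N-1)/2 profile comparisons
    are made; no alignment is performed.
    """
    profiles = [kmer_profile(s, k) for s in seqs]
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        pi, pj = profiles[i], profiles[j]
        # Iterate over the smaller profile; missing keys count as 0 in a Counter.
        small, large = (pi, pj) if len(pi) <= len(pj) else (pj, pi)
        shared = sum(min(c, large[t]) for t, c in small.items())
        windows = min(len(seqs[i]), len(seqs[j])) - k + 1
        d = 1.0 - shared / windows if windows > 0 else 1.0
        dist[i][j] = dist[j][i] = d
    return dist
```

The resulting matrix is the kind of input a distance-based tree method (for example UPGMA or neighbor joining) would take; the tree-building step itself is omitted here.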
“…(2) K-mer distance [10,12], which is linear time complexity for pairwise distance estimation. (3) Random generation by computer pseudo number, which is just to provide a fast method to generate data.…”
Section: Experiments Results (mentioning; confidence: 99%)
“…On the other hand, the enhancement of efficiency is widely observed in sequence-related bioinformatics studies. For example, a check for the accuracy and efficiency in the local homolog recognition with simplified alphabet is carried out [146]. It is found that the compressed alphabets could greatly improve the performance in local similarity discovery with a comparable coverage.…”
Section: Implications of Simplified Amino Acid Alphabets (mentioning; confidence: 99%)
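The abstract also describes discovering local similarities through common k-mers, and the statement above reports that compressed alphabets improve local similarity discovery. One way to sketch this, assuming a seed-and-diagonal approach that is not necessarily the procedure used in the paper, is to index the compressed k-mers of one sequence, scan the other, and count hits per diagonal (offset i − j); diagonals with many hits suggest a locally similar, ungapped region. The Dayhoff six-group alphabet and the names `diagonal_hits` and `min_hits` are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical example grouping (Dayhoff six-group alphabet), for illustration only.
GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
CODE = {aa: str(i) for i, g in enumerate(GROUPS) for aa in g}


def compress(seq: str) -> str:
    """Map residues to group labels; unrecognized residues become '?'."""
    return "".join(CODE.get(aa, "?") for aa in seq.upper())


def diagonal_hits(a: str, b: str, k: int = 4, min_hits: int = 2) -> dict[int, int]:
    """Count shared compressed k-mers per diagonal (i - j); keep dense diagonals.

    Indexing `a` and scanning `b` each take time linear in sequence length,
    plus time proportional to the number of k-mer hits found.
    """
    ca, cb = compress(a), compress(b)
    index = defaultdict(list)  # k-mer -> start positions in a
    for i in range(len(ca) - k + 1):
        kmer = ca[i:i + k]
        if "?" not in kmer:
            index[kmer].append(i)
    hits = defaultdict(int)  # diagonal offset i - j -> number of k-mer hits
    for j in range(len(cb) - k + 1):
        for i in index.get(cb[j:j + k], ()):
            hits[i - j] += 1
    return {d: n for d, n in hits.items() if n >= min_hits}
```

For example, "MKVLSTGHWDE" and "CCLRIMASPKFNQ" (the same segment with every residue replaced by a same-group residue and shifted by two positions) share no 4-mers in the raw alphabet, yet `diagonal_hits` reports all eight compressed 4-mer hits on diagonal −2, i.e. {-2: 8}.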