1992
DOI: 10.1093/bioinformatics/8.2.121

Statistical distance between texts and filtration methods in sequence comparison

Abstract: Upon searching local similarities in long sequences, the necessity of a 'rapid' similarity search becomes acute. Quadratic complexity of dynamic programming algorithms forces the employment of filtration methods that allow elimination of the sequences with a low similarity level. The paper is devoted to the theoretical substantiations of the filtration method based on the statistical distance between texts. The notion of the filtration efficiency is introduced and the efficiency of several filters is estimated…
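
The filtration idea in the abstract can be illustrated with a small Python sketch: compute a cheap count-based distance between two sequences and discard candidates that exceed a threshold before any quadratic alignment is attempted. This is only an illustration under stated assumptions, not the paper's own definition: the distance used here (sum of squared differences of overlapping k-mer counts, with k = 2 for dinucleotides), the function names, and the threshold value are all hypothetical choices made for the example.

```python
from collections import Counter


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in seq (k=2 gives dinucleotide counts)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_distance(a, b, k=2):
    """Sum of squared differences between the k-mer counts of a and b.

    Assumed form of a count-based distance, chosen for illustration.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers missing from one of the sequences.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def filter_candidates(query, candidates, threshold, k=2):
    """Keep candidates whose distance to query is at most threshold.

    Only the survivors would be passed on to a slower alignment step.
    """
    return [c for c in candidates if count_distance(query, c, k) <= threshold]


if __name__ == "__main__":
    # Hypothetical toy data; the threshold of 3 is arbitrary.
    print(filter_candidates("ACGT", ["ACGA", "AAAA"], threshold=3))
```

Each distance is computed in time linear in the sequence lengths, which is why such a filter can be run ahead of a quadratic dynamic programming comparison.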

citations
Cited by 16 publications
(9 citation statements)
references
References 21 publications
“…If the imprint of the global signature is locally pervasive, down to the scale of the single gene or coding sequence, large deviations on that scale could highlight segments introduced by recent horizontal transfer from another species [13]. So-called filtration methods, based on dissimilarity measures computed from dinucleotide counts, have been employed for the alignment-free computation of evolutionary distances between homologous sequences [14,15]. The "transition matrix method" was a similar technique involving raw counts of amino acid pairs in protein primary sequences [16].…”
Section: Introduction (mentioning)
confidence: 99%
“…Current methods (reviewed in Pevzner, 1992) typically require two arbitrary assumptions to be made for each similarity search: one about the length of the longest common word that is to be considered and the other about the threshold of similarity for significant matches. The method proposed in this paper removes the need for any restrictions on word length while keeping the computation time linear, and it also provides a bound on significance, thus removing need for any arbitrary thresholds.…”
Section: Results (mentioning)
confidence: 99%
“…Since then, it has been applied in phylogenetic reconstruction [8][9][10][11], identification of homologous proteins [4], genome annotation [12], classification of metagenomic sequences [13], and identification of regulatory sequences [14]. Also, it has been shown as an efficient technique for sequence filtering [15].…”
Section: /32 (mentioning)
confidence: 99%