2006
DOI: 10.1093/bioinformatics/btl512

Application of a simple likelihood ratio approximant to protein sequence classification

Abstract: We used a simple likelihood ratio approximant (LRA) based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith-Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.
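
The abstract does not give the exact form of the approximant, so the following is only a minimal sketch under a stated assumption: the LRA score is taken to be the ratio of the best per-class maximal similarity to the second-best one. The function name `lra_score` and its inputs are hypothetical, and scores are assumed to be positive similarities (for distances, the ratio would be inverted).

```python
from collections import defaultdict

def lra_score(similarities, labels):
    """Return (top_class, ratio) for one query sequence.

    similarities: positive similarity of the query to each reference sequence.
    labels: class label of each reference sequence (at least two classes).
    Assumption: ratio = best class maximum / second-best class maximum.
    """
    # Maximal similarity of the query to any member of each class.
    best_per_class = defaultdict(lambda: float("-inf"))
    for score, label in zip(similarities, labels):
        best_per_class[label] = max(best_per_class[label], score)

    # Rank classes by their maximal similarity and keep the top two.
    ranked = sorted(best_per_class.items(), key=lambda kv: kv[1], reverse=True)
    (top_class, top), (_, runner_up) = ranked[0], ranked[1]
    return top_class, top / runner_up

top_class, ratio = lra_score([120.0, 40.0, 60.0, 30.0], ["a", "a", "b", "b"])
```

A ratio close to 1 would indicate that the two top-ranking classes are nearly indistinguishable for this query, whereas a large ratio would suggest a clear assignment.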

Cited by 23 publications (10 citation statements)
References 12 publications
“…The problem is very complicated and non-trivial. Proper selection of the protein domain is necessary [102–108]. In addition to pure chemical data [109–116] in the context of the Drug Discovery [117–127], there is also a need for some knowledge on protein-protein interactions, the high quality structural prediction of proteins [2, 128–136] and their inhibitors, and a detailed understanding of how those inhibitors affect the molecular recognition between proteins.…”
Section: Results (mentioning)
confidence: 99%
“…Euclidean distance [16], [28] is a similarity measure commonly used in time-series classification when the compared sequences are of the same length and phase, while Dynamic Time Warping [17] is used when more flexible matching is desired. Under the same category, alignment-based methods have been used in several applications in which the sequences consist of symbols [13]. Two types of functions have been proposed: (1) global-alignment functions, such as the Edit Distance, which compute an optimum global alignment score through dynamic programming [25], and (2) local-alignment functions, such as Smith-Waterman [27] and BLAST [1], which calculate scores between two sequences based on most similar sub-regions.…”
Section: Related Work (mentioning)
confidence: 99%
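
To illustrate the local-alignment idea mentioned in the statement above, here is a minimal sketch of Smith-Waterman scoring. It assumes a simple match/mismatch scheme and a linear gap penalty; these parameter values are illustrative choices, not taken from the cited work, and practical protein comparison would normally use a substitution matrix and affine gaps.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Uses two rows of the dynamic-programming table; cell values are
    clamped at 0 so an alignment may start and end anywhere.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best

score = smith_waterman_score("TTACGTT", "GGACGGG")
```

Because the score depends only on the most similar sub-region, the flanking residues that differ between the two example strings do not lower the result.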
“…Therefore, it is costly on a large data set. Ratanamahatana et al. [48] propose a method to dramatically speed up the DTW similarity search process by using tight lower bounds. For symbolic sequences, such as protein sequences and DNA sequences, alignment-based distances are popularly adopted [25]. Given a similarity matrix and a gap penalty, the Needleman-Wunsch algorithm [44] computes an optimum global alignment score between two sequences through dynamic programming.…”
Section: Sequence Distance Based Classification (mentioning)
confidence: 99%
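
The global-alignment computation described in the statement above can be sketched as follows. This is a minimal illustration assuming a linear gap penalty; the helper `toy_matrix` and its +1/-1 values are hypothetical stand-ins for a real similarity matrix such as BLOSUM62.

```python
def toy_matrix(alphabet, match=1, mismatch=-1):
    """Hypothetical similarity matrix: +match on the diagonal, mismatch elsewhere."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}

def needleman_wunsch_score(a, b, similarity, gap=-1):
    """Optimal global alignment score of a and b (linear gap penalty)."""
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                prev[j - 1] + similarity[(a[i - 1], b[j - 1])],  # substitute
                prev[j] + gap,                                   # gap in b
                curr[j - 1] + gap,                               # gap in a
            )
        prev = curr
    return prev[-1]

matrix = toy_matrix("ACGT")
score = needleman_wunsch_score("GATTACA", "GATACA", matrix)
```

Unlike the local variant, every residue of both sequences contributes to the score, so the single missing residue in the second string is charged one gap penalty.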