Application of a simple likelihood ratio approximant to protein sequence classification

Kaján, László; Kertész‐Farkas, Attila; Franklin, Dino; Ivanova, Neli; Kocsor, András; Pongor, Sándor

doi:10.1093/bioinformatics/btl512

Cited by 23 publications

(10 citation statements)

References 12 publications


“…The problem is very complicated and non-trivial. Proper selection of the protein domain is necessary [102][103][104][105][106][107][108]. In addition to pure chemical data [109][110][111][112][113][114][115][116] in the context of the Drug Discovery [117][118][119][120][121][122][123][124][125][126][127], there is also a need for some knowledge on protein-protein interactions, the high quality structural prediction of proteins [2][128][129][130][131][132][133][134][135][136] and their inhibitors, and a detailed understanding of how those inhibitors affect the molecular recognition between proteins.…”

Section: Results (mentioning)

confidence: 99%

The interactome: Predicting the protein-protein interactions in cells

Plewczyński

Ginalski

2009

Cellular and Molecular Biology Letters


Abstract: The term Interactome describes the set of all molecular interactions in cells, especially in the context of protein-protein interactions. These interactions are crucial for most cellular processes, so the full representation of the interaction repertoire is needed to understand the cell molecular machinery at the system biology level. In this short review, we compare various methods for predicting protein-protein interactions using sequence and structure information. The ultimate goal of those approaches is to present the complete methodology for the automatic selection of interaction partners using their amino acid sequences and/or three dimensional structures, if known. Apart from a description of each method, details of the software or web interface needed for high throughput prediction on the whole genome scale are also provided. The proposed validation of the theoretical methods using experimental data would be a better assessment of their accuracy.


“…Euclidean distance [16], [28] is a similarity measure commonly used in time-series classification when the compared sequences are of the same length and phase, while Dynamic Time Warping [17] is used when more flexible matching is desired. Under the same category, alignment-based methods have been used in several applications in which the sequences consist of symbols [13]. Two types of functions have been proposed: (1) global-alignment functions, such as the Edit Distance, which compute an optimum global alignment score through dynamic programming [25], and (2) local-alignment functions, such as Smith-Waterman [27] and BLAST [1], which calculate scores between two sequences based on most similar sub-regions.…”

Section: Related Work (mentioning)

confidence: 99%

A Multi-Granularity Pattern-Based Sequence Classification Framework for Educational Data

Jaber

Wood

Papapetrou

et al. 2016

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)


Abstract: In many application domains, such as education, sequences of events occurring over time need to be studied in order to understand the generative process behind these sequences, and hence classify new examples. In this paper, we propose a novel multi-granularity sequence classification framework that generates features based on frequent patterns at multiple levels of time granularity. Feature selection techniques are applied to identify the most informative features that are then used to construct the classification model. We show the applicability and suitability of the proposed framework to the area of educational data mining by experimenting on an educational dataset collected from an asynchronous communication tool in which students interact to accomplish an underlying group project. The experimental results showed that our model can achieve competitive performance in detecting the students' roles in their corresponding projects, compared to a baseline similarity-based approach.
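The citation statement quoted for this entry contrasts Euclidean distance, which needs sequences of the same length and phase, with Dynamic Time Warping, which allows more flexible matching. The following is a minimal Python sketch of that contrast, not code from any of the cited papers. The function names `euclidean` and `dtw`, the absolute-difference local cost, and the toy sequences are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook dynamic-programming DTW, O(len(a) * len(b)) time.

    Assumption: the local cost is the absolute difference between points.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest warping of a[:i] onto b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical toy data: the same bump, once shifted and once stretched.
base = [0, 1, 2, 1, 0]
shifted = [1, 2, 1, 0, 0]
stretched = [0, 0, 1, 2, 1, 0]
```

On these toy inputs, `euclidean(base, shifted)` penalises every out-of-phase point, while `dtw(base, shifted)` absorbs most of the shift by warping. `dtw(base, stretched)` is also defined even though the lengths differ, whereas `euclidean` raises an error in that case.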


“…Therefore, it is costly on a large data set. Ratanamahatana et al. [48] propose a method to dramatically speed up the DTW similarity search process by using tight lower bounds. For symbolic sequences, such as protein sequences and DNA sequences, alignment-based distances are popularly adopted [25]. Given a similarity matrix and a gap penalty, the Needleman-Wunsch algorithm [44] computes an optimum global alignment score between two sequences through dynamic programming.…”

Section: Sequence Distance Based Classification (mentioning)

confidence: 99%

A brief survey on sequence classification

Xing

Pei

Keogh

2010

SIGKDD Explor. Newsl.



Abstract: Sequence classification has a broad range of applications such as genomic analysis, information retrieval, health informatics, finance, and abnormal detection. Different from the classification task on feature vectors, sequences do not have explicit features. Even with sophisticated feature selection techniques, the dimensionality of potential features may still be very high and the sequential nature of features is difficult to capture. This makes sequence classification a more challenging task than classification on feature vectors. In this paper, we present a brief review of the existing work on sequence classification. We summarize the sequence classification in terms of methodologies and application domains. We also provide a review on several extensions of the sequence classification problem, such as early classification on sequences and semi-supervised learning on sequences.
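The citation statement quoted for this entry notes that, given a similarity matrix and a gap penalty, the Needleman-Wunsch algorithm computes an optimum global alignment score by dynamic programming. A minimal Python sketch of that recurrence follows. It is an illustration under stated assumptions rather than the survey's or the cited paper's code: the name `needleman_wunsch_score`, the linear gap penalty of -2, and the toy match/mismatch scoring (+1/-1) that stands in for a real substitution matrix are all hypothetical choices.

```python
def toy_similarity(x, y):
    """Assumed stand-in for a similarity matrix: +1 match, -1 mismatch."""
    return 1 if x == y else -1


def needleman_wunsch_score(s, t, similarity=toy_similarity, gap=-2):
    """Optimum global alignment score of s and t with a linear gap penalty.

    `similarity(x, y)` plays the role of a lookup in the similarity matrix;
    `gap` is the (negative) score added for each gap position.
    """
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # s[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # t[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(s[i - 1], t[j - 1]),  # pair up
                score[i - 1][j] + gap,  # gap in t
                score[i][j - 1] + gap,  # gap in s
            )
    return score[n][m]
```

For protein sequences, `similarity` would normally look up a substitution matrix instead of the toy scoring used here. The sketch returns only the score; recovering the alignment itself needs a traceback through the table, which is omitted.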
