2007
DOI: 10.1073/pnas.0607879104

Predicting protein–protein interactions based only on sequences information

Abstract: Protein-protein interactions (PPIs) are central to most biological processes. Although efforts have been devoted to the development of methodology for predicting PPIs and protein interaction networks, the application of most existing methods is limited because they need information about protein homology or the interaction marks of the protein partners. In the present work, we propose a method for PPI prediction using only the information of protein sequences. This method was developed based on a learning algo…

Cited by 877 publications (834 citation statements)
References 34 publications
“…RPISeq is a family of machine learning classifiers (RF and SVM) designed to predict the probability of interaction between a given protein and RNA. In this method, RNA sequences are encoded as normalized frequencies of RNA tetrads, and protein sequences are encoded using a conjoint triad feature (CTF) method originally proposed by Shen et al [23]. In essence, RPISeq exploits the amino acid composition of protein sequences and ribonucleotide composition of RNA sequences to predict the probability that a given pair (one protein and one RNA) will interact.…”
Section: RNA-Protein Partner Prediction Methods and Web Servers
Citation type: mentioning
confidence: 99%
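The RNA encoding mentioned in this statement (normalized frequencies of RNA tetrads) can be illustrated with a minimal Python sketch. It is not RPISeq's code: the function name `rna_tetrad_frequencies` is hypothetical, and "normalized" is assumed here to mean division by the total number of counted tetrads, which the quoted text does not specify.

```python
from itertools import product

# All 4^4 = 256 possible tetrads over the RNA alphabet, in a fixed order.
TETRADS = ["".join(p) for p in product("ACGU", repeat=4)]
TETRAD_INDEX = {t: i for i, t in enumerate(TETRADS)}


def rna_tetrad_frequencies(rna):
    """Return a 256-dimensional vector of tetrad frequencies.

    Assumption: frequencies are normalized by the total number of
    counted tetrads; windows with non-ACGU symbols are skipped.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    counts = [0] * len(TETRADS)
    for i in range(len(rna) - 3):
        idx = TETRAD_INDEX.get(rna[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

A sequence shorter than four nucleotides yields an all-zero vector under this sketch.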
“…Various approaches mainly differ in their encoding of sequence features and choice of learning functions. For instance, Martin et al encoded the sequence information for a protein pair by a product of signatures [80], while Shen et al proposed the use of conjoint triads, i.e., frequencies of continuous subsequences of three residues [76]. Guo et al used the auto-correlation values of seven different physicochemical scales for protein sequences as protein interaction predictors [81].…”
Section: Sequence Signature
Citation type: mentioning
confidence: 99%
“…These motifs are learned from existing PPIs using only sequence data and characterize direct binding, but also could be related to protein function, which is in turn predictive of PPIs [76]. Methods based on information content analyze co-occurring subsequences of proteins with experimentally verified interactions, and use these patterns for predicting new interactions.…”
Section: Sequence Signature
Citation type: mentioning
confidence: 99%
“…However, considering all 20^3 amino acid triads requires an 8000-dimensional feature vector to represent a protein, which is too large for contemporary machine learning tools. Thus, the 20 amino acid types were clustered into seven groups based on their dipole strength and side chain volumes to reduce the dimensions of the feature vector (Shen, Zhang et al 2007). The frequencies of the 7^3 = 343 triads can be used to encode a protein sequence.…”
Section: Sequence Information
Citation type: mentioning
confidence: 99%
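The encoding described in this statement (seven residue classes, 7^3 = 343 triad frequencies) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the seven-class grouping is the one commonly reproduced for the conjoint triad method and should be checked against the paper, the function name `conjoint_triad_frequencies` is hypothetical, and plain relative frequencies are used because the quoted text does not give the normalization.

```python
from itertools import product

# Assumed seven-class grouping of the 20 amino acids (by dipole strength
# and side chain volume); verify against the original paper before use.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# All 7^3 = 343 class triads, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {t: i for i, t in enumerate(TRIADS)}


def conjoint_triad_frequencies(sequence):
    """Return a 343-dimensional vector of class-triad frequencies.

    Assumption: counts are divided by the total number of valid triads;
    windows containing a non-standard residue are skipped.
    """
    classes = [RESIDUE_TO_CLASS.get(aa) for aa in sequence.upper()]
    counts = [0] * len(TRIADS)
    # Slide a window of three consecutive residues along the sequence.
    for window in zip(classes, classes[1:], classes[2:]):
        if None in window:
            continue
        counts[TRIAD_INDEX[window]] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

A protein pair could then be represented by concatenating the two 343-dimensional vectors into one 686-dimensional input for a classifier. That pairing step is likewise an assumption of this sketch.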
“…This led to more complicated relations than that among co-occurrence-based methods. For example, Shen et al (Shen, Zhang et al 2007) proposed to use a composition of short sequences as protein features and a following work by Chang et al (Chang, Syu et al 2010) combined these features with protein surface information. In addition to the overlap of features among different ML-based methods, they may use identical or different ML techniques.…”
Section: Introduction
Citation type: mentioning
confidence: 99%