2010
DOI: 10.2174/092986610791190336

Gene Ontology-Based Protein Function Prediction by Using Sequence Composition Information

Abstract: The prediction of protein function is a difficult and important problem in computational biology. In this study, an efficient method is presented to predict protein function using sequence composition information. Four kinds of basic building blocks of protein sequences are investigated, namely N-grams, binary profiles, PFAM domains and InterPro domains. The protein sequences are mapped into high-dimensional vectors by using the occurrence frequencies of each kind of building block. The resulting vectors ar…
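The composition step described in the abstract can be illustrated for the N-gram building block. The following is a minimal sketch, not the authors' implementation: it assumes the 20 standard amino acids as the alphabet and relative frequencies (count divided by the number of valid windows), since the abstract gives neither the alphabet, the normalization, nor the values of n used. The function names are hypothetical.

```python
from itertools import product

# Assumption: the 20 standard amino acids form the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_vocabulary(n: int) -> list[str]:
    """All 20**n possible n-grams over the alphabet, in a fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]


def ngram_frequency_vector(sequence: str, n: int) -> list[float]:
    """Map a protein sequence to the relative occurrence frequency of each n-gram.

    Overlapping windows are counted; windows containing non-standard
    residues (e.g. X, B, Z) are skipped.
    """
    vocab = ngram_vocabulary(n)
    index = {gram: i for i, gram in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    seq = sequence.upper()
    for start in range(len(seq) - n + 1):
        i = index.get(seq[start:start + n])
        if i is not None:
            counts[i] += 1
            total += 1
    if total == 0:
        # No valid window: return the zero vector rather than dividing by zero.
        return [0.0] * len(vocab)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = ngram_frequency_vector("MKVLAAG", n=2)
    print(len(vec))  # one dimension per possible dipeptide
```

With n = 2 the vector has 20² = 400 dimensions and with n = 3 it has 8,000, which is why such representations are high-dimensional. PFAM or InterPro domain vectors would follow the same pattern, with a domain vocabulary in place of the n-gram vocabulary.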


Cited by 3 publications (1 citation statement)
References 53 publications (69 reference statements)
“…For the structure-based models, mutant PR or RT proteins were represented as feature vectors whose input attributes, obtained via an in silico mutagenesis technique relying on a four-body statistical potential energy function, quantified environmental perturbations at positions in their respective targets upon mutation (Figure 4). Similarly, we developed sequence-based models by generating mutant attributes through two applications of n-grams, a technique previously used by other groups in a variety of studies on proteins [32-36] though not in this particular realm. Our models display performance measures that are generally competitive with those described by Rhee et al [24] in their seminal systematic study that utilizes a sequence-based approach.…”
Section: Discussion (mentioning)
confidence: 99%
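The citing excerpt says only that mutant attributes were generated through "two applications of n-grams" and gives no further detail. One hypothetical way to turn n-grams into mutant attributes is to record how a set of point mutations changes the n-gram counts of a reference sequence. The sketch below shows that encoding under this assumption; `mutant_ngram_delta` and its 1-based `mutations` mapping are illustrative names, not the cited study's method.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count every overlapping n-gram (length-n substring) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def mutant_ngram_delta(wild_type: str, mutations: dict[int, str], n: int) -> dict[str, int]:
    """Hypothetical mutant attributes: change in n-gram counts caused by point mutations.

    `mutations` maps 1-based positions to substituted residues.
    Only n-grams whose counts change are returned.
    """
    residues = list(wild_type)
    for pos, new_residue in mutations.items():
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} outside sequence of length {len(residues)}")
        residues[pos - 1] = new_residue
    mutant = "".join(residues)
    delta = ngram_counts(mutant, n)
    # Counter.subtract keeps zero and negative counts, so lost n-grams appear as -1.
    delta.subtract(ngram_counts(wild_type, n))
    return {gram: c for gram, c in delta.items() if c != 0}


if __name__ == "__main__":
    print(mutant_ngram_delta("ACDEF", {3: "V"}, n=2))
```

Substituting V at position 3 of the toy sequence ACDEF removes the dipeptides CD and DE and creates CV and VE, so the attributes are sparse and centered on the mutated site. A fixed-length feature vector could be obtained by indexing these deltas over the full n-gram vocabulary.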