2013
DOI: 10.1186/1471-2105-14-s3-s8

MS-kNN: protein function prediction by integrating multiple data sources

Abstract: Background: Protein function determination is a key challenge in the post-genomic era. Experimental determination of protein functions is accurate, but time-consuming and resource-intensive. A cost-effective alternative is to use the known information about sequence, structure, and functional properties of genes and proteins to predict functions using statistical methods. In this paper, we describe the Multi-Source k-Nearest Neighbor (MS-kNN) algorithm for function prediction, which finds k-nearest…

Cited by 85 publications (81 citation statements)
References 16 publications
“…The prediction of gene function generally proceeds by the transfer of function from genes with experimental evidence to unannotated, or less-annotated, genes that are similar by some measure [42]. While several methods use multiple data types to carry out predictions [31,47,11], many solely rely on evolutionary relationships [16,24,6,10] and are the focus of the current study.…”
Section: Introduction (mentioning)
confidence: 99%
“…There already exist several attempts on using network information for AFP under the CAFA setting with limited success. For example, MS-kNN, a top method in both CAFA1 and CAFA2, used a weighted average method to integrate multiple sources (information of each source through k-nearest neighbor (KNN)) including networks [24]. However, the accuracy by MS-kNN was similar to that by KNN from a single source, i.e.…”
Section: Related Work (mentioning)
confidence: 99%
“…In fact in AFP, several approaches of using the idea of integrating data/classifiers have already been proposed. MS-kNN, a top method in CAFA1 and CAFA2, predicts the function by averaging over the prediction scores from three data sources: sequences, expression and protein-protein interaction [16] (Note that MS-kNN is NOT a sequence-based method). Also Jones-UCL, the top team of CAFA1, integrates prediction scores from multiple methods by using the 'consensus' function (given in Eq.…”
Section: Related Work (mentioning)
confidence: 99%
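
The citation statements above describe MS-kNN as scoring functions per data source with k-nearest neighbors and then combining the sources by a weighted average. The following is a minimal Python sketch of that general idea only, not the published implementation. The function names (`knn_scores`, `integrate`), the similarity-sum scoring rule, the equal weights, and all protein and term identifiers are hypothetical choices made for illustration.

```python
from collections import defaultdict


def knn_scores(similarities, annotations, k):
    """Score candidate functions for one query protein in one data source.

    similarities: dict mapping a known protein to its similarity to the query
    annotations:  dict mapping a known protein to its set of function terms
    Assumed scoring rule: each of the k most similar proteins votes for its
    terms with weight similarity / k.
    """
    neighbors = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)[:k]
    scores = defaultdict(float)
    for protein, sim in neighbors:
        for term in annotations.get(protein, ()):
            scores[term] += sim / k
    return dict(scores)


def integrate(per_source_scores, weights):
    """Combine per-source scores by a normalized weighted average."""
    total = sum(weights)
    combined = defaultdict(float)
    for scores, weight in zip(per_source_scores, weights):
        for term, score in scores.items():
            combined[term] += weight * score / total
    return dict(combined)


# Toy data: three annotated proteins and two hypothetical data sources.
annotations = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:B"}, "P3": {"GO:B"}}
sequence_sim = {"P1": 0.8, "P2": 0.4, "P3": 0.2}
interaction_sim = {"P1": 0.1, "P2": 1.0, "P3": 0.5}

per_source = [
    knn_scores(sequence_sim, annotations, k=2),
    knn_scores(interaction_sim, annotations, k=2),
]
combined = integrate(per_source, weights=[1.0, 1.0])  # equal weights assumed
ranked = sorted(combined, key=combined.get, reverse=True)
```

With equal weights the combination reduces to a plain average of the per-source scores, which matches the "averaging over the prediction scores" wording in the second statement. Unequal weights would let a more reliable source dominate, but how such weights are chosen is not stated in the text above.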