2012
DOI: 10.1093/bioinformatics/bts504

Positive-unlabeled learning for disease gene identification

Abstract: Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers that identify new disease genes from the unknown genes. However, such kind of classifiers …

Cited by 173 publications (135 citation statements)
References 36 publications
“…Therefore, to reduce this uncertainty, the binary-class supervised learning methods have to choose and were different in the way to construct the set of negative training samples. In reality, the unknown set may contain unknown disease genes; therefore, semi-supervised learning methods were recently proposed to solve the problem, where the classifier is learned from both labeled (i.e., known disease genes) and unlabeled (i.e., the unknown genes) data [50,51]. Another recent approach to disease gene prediction is to use unary/one-class classification techniques, in which the classifier is learned from only positive samples (i.e., known disease genes) [52].…”
Section: Discussion (mentioning)
confidence: 99%
“…In our experiments, we have employed the data used by (Yang et al, 2012). This data has 5405 known disease genes spanning 2751 disease phenotypes, where all the genes have been extracted by combining GENECARD (Safran et al, 2010) and OMIM (McKusick, 2007) disease gene data.…”
Section: Experimental Data (mentioning)
confidence: 99%
“…In this regard, the performance of the final classifier will be decreased. Yang et al (2012) (PUDI), selected some samples from unknown genes by applying the Euclidean distance between 'positive representative vector' and each of the unknown genes as negative instances. They also defined three other sets namely, likely negative, likely positive, and weakly negative set, based on their likelihoods to be positive or negative class.…”
Section: Introduction (mentioning)
confidence: 99%
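
The distance-based selection described in the citation statement above can be illustrated with a minimal sketch. It is not the PUDI implementation: it assumes the "positive representative vector" is the mean of the positive feature vectors, and the function name `select_reliable_negatives` and the `fraction` cutoff are hypothetical choices made for illustration. The further split into likely negative, likely positive, and weakly negative sets is not shown.

```python
import numpy as np

def select_reliable_negatives(positives, unlabeled, fraction=0.2):
    """Return indices of the unlabeled rows farthest from the positive centroid.

    Assumption: the representative vector is the mean of the positive rows.
    `fraction` is a hypothetical cutoff, not a value taken from the paper.
    """
    positives = np.asarray(positives, dtype=float)
    unlabeled = np.asarray(unlabeled, dtype=float)

    # Representative vector of the known disease genes (assumed: centroid).
    representative = positives.mean(axis=0)

    # Euclidean distance from each unknown gene to the representative vector.
    distances = np.linalg.norm(unlabeled - representative, axis=1)

    # Keep the farthest `fraction` of unknown genes as candidate negatives.
    n_select = max(1, int(round(fraction * len(unlabeled))))
    farthest_first = np.argsort(distances)[::-1]
    return farthest_first[:n_select], distances

# Toy feature vectors; real inputs would be gene feature representations.
positives = np.array([[0.0, 0.0], [2.0, 0.0]])
unlabeled = np.array([[1.0, 0.0], [1.0, 3.0], [10.0, 0.0], [1.0, 1.0]])
idx, dist = select_reliable_negatives(positives, unlabeled, fraction=0.5)
```

Unknown genes far from the positive centroid are treated as the safest negatives, while those close to it remain ambiguous and would need separate handling.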
“…One common approach is Positive and Unlabeled (PU) learning [83] that learns from positive and unlabeled data alone. It is used when only the labels for disease genes are available [130,131]. the maximum class probability greater than 0.5; otherwise the unknown class would be predicted.…”
Section: Positive and Unlabeled Learning Approaches (mentioning)
confidence: 99%