2021
DOI: 10.1093/bib/bbab461

Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Abstract: Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabel…

Citations: cited by 45 publications (41 citation statements)
References: 96 publications
“…They are usually performed worse, especially for those proteins with low sequence similarity with the proteins in the search library. Machine learning combined with extensive sequence feature engineering techniques has been successfully used in many bioinformatics topics [21–32, 69], and provide an alternative efficient and accurate strategy to study these enigmatic proteins. As such, we are highly motivated to leverage cutting-edge machine learning techniques to develop computational approaches to identify the PE_PGRS proteins rapidly and accurately.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…To evaluate the performance of these methods, we regard the known DMA pairs as positive samples and unlabeled DMA pairs as negative samples (Peng et al, 2020; Li et al, 2022). We set up a 5-fold cross-validation scenario in which we randomly divide positive samples and negative samples into five groups, respectively.…”
Section: Experiments and Results (citation type: mentioning)
confidence: 99%
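
The evaluation protocol quoted above (shuffle the positives and the negatives separately, cut each into five groups, and hold out one group pair per fold) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the citing authors' code: the function name `stratified_five_fold`, the toy pair identifiers, and the fixed seed are all hypothetical. As in the quoted statement, unlabeled pairs are simply treated as negatives here.

```python
import random


def stratified_five_fold(positives, negatives, n_folds=5, seed=0):
    """Split positives and negatives separately into n_folds groups,
    then build (train, test) folds from matching group pairs."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Group i takes every n_folds-th item starting at offset i.
    pos_groups = [pos[i::n_folds] for i in range(n_folds)]
    neg_groups = [neg[i::n_folds] for i in range(n_folds)]

    folds = []
    for k in range(n_folds):
        # Held-out group k: label 1 for known pairs, 0 for unlabeled pairs.
        test = [(x, 1) for x in pos_groups[k]] + [(x, 0) for x in neg_groups[k]]
        # Everything outside group k is used for training.
        train = [(x, 1) for j in range(n_folds) if j != k for x in pos_groups[j]]
        train += [(x, 0) for j in range(n_folds) if j != k for x in neg_groups[j]]
        folds.append((train, test))
    return folds


# Hypothetical toy data standing in for known and unlabeled pairs.
known_pairs = [f"pos_{i}" for i in range(10)]
unlabeled_pairs = [f"unl_{i}" for i in range(25)]

folds = stratified_five_fold(known_pairs, unlabeled_pairs)
for k, (train, test) in enumerate(folds):
    n_pos = sum(label for _, label in test)
    print(f"fold {k}: train={len(train)} test={len(test)} test_positives={n_pos}")
```

Splitting the two classes separately keeps the positive-to-negative ratio roughly equal across folds, which matters when known positives are much rarer than unlabeled pairs.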
“…Another noteworthy issue is the negative data for training the supervised RF model of PCfun. This problem has been brought to attention and discussed in our previous studies [57, 58]. In this study, similar issues occurred when constructing the negative PC–GO associations in the training datasets (‘Supplementary Methods’).…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Theoretically, there would be many negative protein complex-GO pairs, some of which might be mislabeled. Compared to the traditional supervised machine-learning models, positive-unlabeled learning [57, 59] only requires positive and unlabeled (i.e. either positive or negative) training samples to build reliable predictors with competitive prediction performance and, therefore, can be considered as an alternative option for tackling this issue.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
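
To make concrete why positive and unlabeled samples alone can suffice, one widely used formalization is sketched below. It is not taken from the quoted statement and is offered only as an illustration. The notation is an assumption: $y$ is the true class, $s = 1$ marks a labeled example, and positives are assumed to be labeled at a constant rate $c$ independent of $x$ (the "selected completely at random" assumption).

```latex
% Assumption: only positives are ever labeled, at a constant rate c.
\begin{align}
  p(s = 1 \mid x, y = 1) &= c, \qquad p(s = 1 \mid x, y = 0) = 0, \\
  p(s = 1 \mid x) &= p(s = 1 \mid x, y = 1)\, p(y = 1 \mid x) = c \, p(y = 1 \mid x), \\
  p(y = 1 \mid x) &= \frac{p(s = 1 \mid x)}{c}.
\end{align}
```

Under that assumption, a classifier trained to separate labeled from unlabeled examples estimates $p(s = 1 \mid x)$, and dividing its output by an estimate of $c$ recovers the probability of the true positive class without any confirmed negatives.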