2018
DOI: 10.1038/s41598-018-31395-5

A novel logistic regression model combining semi-supervised learning and active learning for disease classification

Abstract: Traditional supervised learning classifiers need a large number of labeled samples to achieve good performance; however, many biological datasets contain only a small set of labeled samples, with the remainder unlabeled. Labeling these unlabeled samples manually is difficult or expensive. Technologies such as active learning and semi-supervised learning have been proposed to exploit the unlabeled samples and improve model performance. However, in active learning the model suffers from being short-sig…


Cited by 20 publications (12 citation statements)
References 24 publications
“…Consequently, the number of training data increased to five times the primary ones. The discrimination of the original aroma odor/breath samples and the molecule additive-containing samples, and the calculation of the feature score for each data set, were performed by the logistic regression model (Figure e). Machine learning was performed to optimize the following equation: log p/(1 − p) = β0 + x1β1 + x2β2 + x3β3 + ... + xnβn, where p is the probability with which the data sets can be classified, xn is the intensity of each segment in the 2D MS map, βn (n ≥ 1) is the model’s learned weight (i.e., feature score), and β0 is the bias.…”
Section: Discussion (mentioning)
confidence: 99%
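The logistic model quoted above can be sketched in plain Python. This is a minimal illustration of computing p from the stated equation log p/(1 − p) = β0 + Σ xiβi; the intensities and weights below are hypothetical placeholders, not values from the cited work:

```python
import math

def logistic_probability(x, beta0, beta):
    """Compute p from log(p / (1 - p)) = beta0 + sum(x_i * beta_i)."""
    logit = beta0 + sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical segment intensities x and learned weights beta (assumptions)
x = [0.5, 1.2, 0.3]
beta0, beta = -0.4, [0.8, -0.3, 1.1]
p = logistic_probability(x, beta0, beta)  # probability the sample is in class 1
```

Inverting the logit this way is what any logistic regression implementation does at prediction time; the learned weights βn then serve directly as per-segment feature scores.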
“…Consequently, the number of training data increased by five times the primary ones. The discrimination of the original aroma odor/breath samples and the molecule additive-containing samples and the calculation of the feature score for each data set were performed by the logistic regression model (Figure e) . Machine learning was performed to optimize the following equation: log p /(1 – p ) = β 0 + x 1 β 1 + x 2 β 2 + x 3 β 3 + ... + x n β n , where p is the probability of which the data sets can be classified, x n is the intensity of each segment in the 2D MS map, β n ( n ≥ 1) is the model’s learned weight (i.e., feature score), and β 0 is the bias.…”
Section: Discussionmentioning
confidence: 99%
“…While conventional semisupervised approaches attempt to exploit the latent structure of unlabeled data with the specific goal of improving label predictions, the goal of active learning is to reduce the number of labeled examples needed for learning at the same time. 78 As an active research area, there are many different active learning methods, such as pool-based active learning, 79 where the learner chooses which sample to label next in a pool of unlabeled data; selective sampling, 80,81 where unlabeled data come as a stream and the learner decides to query or discard each arriving point; and batch-mode active learning, 82 where the learner queries multiple instances at once. Interested readers may refer to a report by Settles 83 to learn more about the details of the query strategy frameworks.…”
Section: How Artificial Intelligence Work (mentioning)
confidence: 99%
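As a concrete illustration of the pool-based strategy described above, here is a minimal uncertainty-sampling sketch: the learner queries the unlabeled sample whose predicted class probability is closest to 0.5. The scoring function and pool below are hypothetical stand-ins, not taken from the cited works:

```python
def least_confident_query(pool, predict_proba):
    """Pool-based active learning: return the unlabeled sample whose
    predicted positive-class probability is closest to 0.5 (most uncertain)."""
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

# Toy 1-D scorer standing in for a trained classifier (assumption)
proba = lambda x: 1.0 / (1.0 + 2.718281828 ** (-x))
pool = [-3.0, -0.2, 0.1, 2.5]
query = least_confident_query(pool, proba)  # the sample nearest the boundary
```

Selective sampling would apply the same uncertainty test to one streamed point at a time, and batch-mode variants would return the k most uncertain points instead of one.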
“…The idea of this method is very naive, namely that if samples are closer in the feature space, they are more likely to be in the same class. Then, we observed two linear classification models’ performance, linear support vector classifier (linear SVC) [3,17] and logistic regression (LR) [7]. Both identify hyperplanes in the feature space to split samples into different classes.…”
Section: Prediction Models (mentioning)
confidence: 99%
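The two ideas in this excerpt can each be sketched in a few lines of Python: the nearest-neighbor intuition (label a point like its closest labeled neighbor) and the linear-hyperplane decision rule shared by linear SVC and LR. Both functions are illustrative sketches with hypothetical names, not the cited implementations:

```python
def one_nearest_neighbor(x, labeled):
    """Label x by its closest labeled sample: points near each other in
    feature space are assumed to share a class. `labeled` is a list of
    (feature_vector, label) pairs."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return min(labeled, key=lambda pair: dist(pair[0], x))[1]

def linear_decision(x, w, b):
    """Linear SVC and LR both classify by which side of the hyperplane
    w . x + b = 0 the sample falls on; only how w and b are fit differs."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0
```

The contrast is the point of the excerpt: the neighbor rule uses distances directly, while the linear models compress the training data into a single separating hyperplane (w, b).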