2015
DOI: 10.1007/s13042-015-0328-7
Semi-supervised self-training for decision tree classifiers

Abstract: We consider semi-supervised learning, the task of learning from both labeled and unlabeled instances, and in particular self-training with decision tree learners as base learners. We show that a standard decision tree learner as the base learner cannot be effective in a self-training algorithm for semi-supervised learning. The main reason is that the basic decision tree learner does not produce reliable probability estimates for its predictions, so these cannot serve as a proper selection criterion in self-training. We …

Cited by 191 publications (85 citation statements)
References 34 publications
“…Tanha et al. [2] proposed a decision-tree-based self-training approach. This approach uses the label proportion of labeled training examples in the corresponding leaf node as the confidence measure for a classified example.…”
Section: Related Work (mentioning)
confidence: 99%
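The leaf-proportion confidence measure just described can be sketched with scikit-learn, where `DecisionTreeClassifier.predict_proba` returns the class proportions of the training examples in the leaf a sample falls into; the dataset, threshold, and round count below are illustrative, not taken from [2]:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy setup: 20 labeled examples, 180 treated as unlabeled.
X, y = make_classification(n_samples=200, random_state=0)
X_lab, y_lab = X[:20], y[:20]
X_unl = X[20:]

THRESHOLD = 0.9  # illustrative confidence cutoff

for _ in range(5):  # a few self-training rounds
    # min_samples_leaf keeps leaves from being trivially pure,
    # so the leaf proportions carry some information.
    tree = DecisionTreeClassifier(min_samples_leaf=5, random_state=0)
    tree.fit(X_lab, y_lab)
    if len(X_unl) == 0:
        break
    proba = tree.predict_proba(X_unl)   # = leaf class proportions
    conf = proba.max(axis=1)            # confidence = majority proportion
    pick = conf >= THRESHOLD
    if not pick.any():
        break
    # Move confidently classified examples into the labeled pool.
    X_lab = np.vstack([X_lab, X_unl[pick]])
    y_lab = np.concatenate([y_lab, tree.classes_[proba[pick].argmax(axis=1)]])
    X_unl = X_unl[~pick]

print(len(X_lab), "labeled examples after self-training")
```

Note that with a fully grown tree every leaf is pure and every prediction gets confidence 1.0, which is exactly the unreliability the abstract points out.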
“…For example, a self-training decision tree [2] uses the class proportion at a leaf as the confidence measure, while a self-training support vector machine (SVM) [3] uses the distance between examples and the classification boundary.…”
Section: Introduction (mentioning)
confidence: 99%
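The margin-based measure attributed to the self-training SVM in [3] can be sketched similarly: for a binary SVM, the magnitude of `decision_function` grows with an example's distance from the separating boundary. The dataset and the number of selected candidates are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=1)
X_lab, y_lab, X_unl = X[:30], y[:30], X[30:]

clf = SVC(kernel="linear").fit(X_lab, y_lab)

# |decision_function| is proportional to the distance from the separating
# hyperplane: a larger magnitude means a more confident prediction.
margins = np.abs(clf.decision_function(X_unl))
most_confident = np.argsort(margins)[::-1][:10]  # 10 best candidates to self-label
pseudo_labels = clf.predict(X_unl[most_confident])
```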
“…Another advantage of the decision tree is that it is easy for the manufacturing engineer to understand and manipulate (Perez, Datta-Gupta, & Misra, 2005); well-known variants include C4.5 (Quinlan, 1993), QUEST (Agbon, Aldana, & Araque, 2003), ID3 (Papagelis & Kalles, 2000), and GA-Tree (Quinlan, 1987). Strategies with similar decision capabilities (Loh & Shih, 1997), which manage the overfitting issue, have been proposed by Quinlan (1986), and the question of growing or shrinking the tree has been discussed by Tanha, van Someren, and Afsarmanesh (2015). This helps in choosing among multiple options of the same capability on the basis of the probability of occurrence (Chandra & Varghese, 2009).…”
Section: Department Of Agriculture; De (mentioning)
confidence: 99%
“…Tanha et al. [10] suggested that using decision tree classifiers as base classifiers in a self-training algorithm is not very effective for semi-supervised learning, mainly because of the poor probability estimates that decision tree classifiers compute for their predictions. However, decision trees are not demanding in training time and produce easily comprehensible models.…”
Section: Introduction (mentioning)
confidence: 99%
“…A series of modifications have been proposed to avoid relying on the simplistic class-proportion distribution at the leaves of a pruned decision tree [11]; Laplacian correction and grafted decision trees are among them [10]. Torgo [12] also made a thorough study of tree-based regression models, focusing on the generation of tree models and on pruning by tree selection.…”
Section: Introduction (mentioning)
confidence: 99%
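The Laplacian correction mentioned above can be illustrated with a minimal sketch using standard add-one smoothing; the function name and counts are hypothetical, not code from [10] or [11]:

```python
def leaf_confidence(class_counts, laplace=True):
    """Confidence of the majority class at a leaf.

    class_counts: number of labeled training examples of each class
    reaching this leaf. With the Laplace correction, a pure leaf like
    [5, 0] no longer yields an overconfident probability of 1.0.
    """
    total = sum(class_counts)
    k = len(class_counts)
    if laplace:
        probs = [(n + 1) / (total + k) for n in class_counts]
    else:
        probs = [n / total for n in class_counts]
    return max(probs)

print(leaf_confidence([5, 0], laplace=False))  # 1.0 (overconfident pure leaf)
print(leaf_confidence([5, 0], laplace=True))   # 6/7 ~= 0.857
```

A small leaf with counts [5, 0] is thus distinguished from a large pure leaf with counts [500, 0], whose corrected confidence stays close to 1.0.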