Abstract: A gradient boosting decision tree (GBDT), which aggregates a collection of single weak learners (i.e. decision trees), is widely used for data mining tasks. Because GBDT inherits its good performance from its ensemble essence, much attention has been drawn to the optimization of this model. With its popularization, an increasing need for model interpretation arises. Besides the commonly used feature importance as a global interpretation, feature contribution is a local measure that reveals the relationship bet…
“…The proportion of positive samples among all training samples contained in this node is denoted as r_t^k(y), which can also be considered as the probability that a training sample contained in node k belongs to the predicted sample category y. The difference in the proportion of positive samples between a child node and its corresponding parent node can be viewed as the node importance of the child node [40][41][42]. The larger the difference, the higher the purity of the samples split into the child node compared to that of the parent node, and thus the higher the importance of the child node for the classification problem.…”
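The node-importance measure quoted above can be sketched in a few lines. This is an illustrative reading, not code from the cited work: the importance of a child node is taken as the increase in the proportion of positive samples relative to its parent, and all function names are assumptions.

```python
def positive_proportion(labels, positive_class):
    """Fraction of samples in a node that belong to positive_class."""
    if not labels:
        return 0.0
    return sum(1 for y in labels if y == positive_class) / len(labels)

def node_importance(parent_labels, child_labels, positive_class):
    """Difference in positive-sample proportion between child and parent.

    A larger positive difference means the split made the child node purer
    with respect to the predicted class, hence a more important node.
    """
    return (positive_proportion(child_labels, positive_class)
            - positive_proportion(parent_labels, positive_class))

# Example: the parent holds 4 positives out of 8 samples; after the split,
# the left child holds 3 positives out of 4, so its importance is
# 0.75 - 0.5 = 0.25.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left_child = [1, 1, 1, 0]
print(node_importance(parent, left_child, positive_class=1))  # 0.25
```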
Section: Traditional Random Forest Algorithm
Fault detection and diagnosis (FDD) has received considerable attention with the advent of big data. Many data-driven FDD procedures have been proposed, but most of them may not be accurate when data missing occurs. Therefore, this paper proposes an improved random forest (RF) based on decision paths, named DPRF, utilizing correction coefficients to compensate for the influence of incomplete data. In this DPRF model, intact training samples are firstly used to grow all the decision trees in the RF. Then, for each test sample that possibly contains missing values, the decision paths and the corresponding nodes importance scores are obtained, so that for each tree in the RF, the reliability score for the sample can be inferred. Thus, the prediction results of each decision tree for the sample will be assigned to certain reliability scores. The final prediction result is obtained according to the majority voting law, combining both the predicting results and the corresponding reliability scores. To prove the feasibility and effectiveness of the proposed method, the Tennessee Eastman (TE) process is tested. Compared with other FDD methods, the proposed DPRF model shows better performance on incomplete data.
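The reliability-weighted voting described in the abstract above can be sketched as follows. This is a minimal illustration of the idea, not the DPRF algorithm itself: how the per-tree reliability scores are derived from decision paths is not reproduced here, and all names and values are assumptions.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, reliabilities):
    """Combine per-tree predictions with per-tree reliability scores.

    Each tree's class vote is weighted by its reliability score; the
    class with the largest total weight wins.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, reliabilities):
        totals[label] += weight
    return max(totals, key=totals.get)

# Three trees vote "fault" and two vote "normal", but the "normal" trees
# traversed decision paths unaffected by the sample's missing values, so
# their (hypothetical) reliability scores are higher and they prevail.
preds = ["fault", "fault", "fault", "normal", "normal"]
rel = [0.2, 0.2, 0.2, 0.9, 0.9]
print(weighted_majority_vote(preds, rel))  # normal
```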
“…GBDT is a common choice in machine learning tasks. Besides high performance and efficiency, GBDT and its variants also provide model interpretability [47] and ease of parameter tuning. The most direct transfer is to first train a model on the source dataset.…”
Secure online transaction is an essential task for e-commerce platforms. Alipay, one of the world's leading cashless payment platforms, provides payment services to both merchants and individual customers. Fraud detection models are built to protect the customers, but stronger demands are raised by new scenes, which lack training data and labels. The proposed model makes a difference by utilizing the data from similar old scenes, while the data under a new scene is treated as the target domain to be promoted. Inspired by this real case in Alipay, we view the problem as a transfer learning problem and design a set of revision strategies to transfer the source domain models to the target domain under the framework of gradient boosting tree models. This work provides an option for the cold-start and data-sharing problems.
“…The interpretability of the boosted tree model at both the global and the local level has been shown in [3]. In our work, since the whole model of each task consists of a common part and a specific part, we collect them all to get the overall importance of each feature.…”
Section: Interpretability
“…In our work, since the whole model of each task consists of a common part and a specific part, we collect them all to get the overall importance of each feature. For each instance, the contribution of each feature to the final prediction can be calculated with the method in [3]. An example of the top 20 important features in task2 of Scene1 is shown in figure 2.…”
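The per-instance feature contributions mentioned above can be illustrated with a small sketch in the spirit of decision-path attribution as in [3]: the root's mean prediction serves as a bias term, and each split feature is credited with the change in the node's value between parent and child along the instance's path. The tree path here is hand-built and purely hypothetical.

```python
def path_contributions(path):
    """Decompose a prediction along a decision path.

    path: list of (split_feature, node_value) pairs from root to leaf;
    the leaf's split_feature is None. The root's value is the bias, and
    each step credits the parent's split feature with the change in the
    node value it caused.
    """
    bias = path[0][1]
    contributions = {}
    for (feature, parent_value), (_, child_value) in zip(path, path[1:]):
        contributions[feature] = contributions.get(feature, 0.0) \
            + (child_value - parent_value)
    return bias, contributions

# Root mean 0.5; splitting on "x1" reaches a node with mean 0.75;
# splitting on "x2" reaches a leaf with value 0.625. The bias plus the
# contributions reconstructs the leaf prediction exactly.
bias, contrib = path_contributions([("x1", 0.5), ("x2", 0.75), (None, 0.625)])
print(bias, contrib)  # 0.5 {'x1': 0.25, 'x2': -0.125}
```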
Section: Interpretability
“…(2) The construction of the trees in the common model may be unhelpful or even harmful for some tasks after some rounds, which means that a mechanism is needed to find the proper round at which a task can quit the common model training process if necessary. (3) The training of the second stage should take the information of the first stage into consideration, so that the obtained model can be more effective, instead of simply combining two boosted tree models when predicting for each task. To handle these problems, a regularization strategy is proposed for the construction of each tree to alleviate the domination problem, and an early-stopping strategy is designed so that a task can quit the common process if further training will not improve its performance.…”
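The early-stopping idea quoted above can be sketched with a simple patience rule: a task quits the common training rounds once its validation loss has not improved for a fixed number of consecutive rounds. This is an illustrative stand-in, not the paper's strategy; the function name and the `patience` parameter are assumptions.

```python
def quit_round(val_losses, patience=2):
    """Return the round index at which the task should quit the common
    training process, or None if it should keep training.

    A task quits once `patience` consecutive rounds have passed without
    improving on its best validation loss so far.
    """
    best, since_best = float("inf"), 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return i
    return None

# The loss improves for two rounds, then stalls for two rounds in a row,
# so the task quits at round index 3.
print(quit_round([0.9, 0.8, 0.81, 0.82, 0.79]))  # 3
```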
Section: Multi-task Boosted Tree, 3.1 The Whole Framework
Multi-task learning (MTL) aims at improving the generalization performance of several related tasks by leveraging useful information contained in them. However, in industrial scenarios, interpretability is always demanded, and the data of different tasks may be in heterogeneous domains, making the existing methods unsuitable or unsatisfactory. In this paper, following the philosophy of the boosted tree, we propose a two-stage method. In stage one, a common model is built to learn the commonalities using the common features of all instances. Different from the training of a conventional boosted tree model, we propose a regularization strategy and an early-stopping mechanism to optimize the multi-task learning process. In stage two, starting by fitting the residual error of the common model, a specific model is constructed with the task-specific instances to further boost the performance. Experiments on both benchmark and real-world datasets validate the effectiveness of the proposed method. What's more, interpretability can be naturally obtained from the tree-based method, satisfying the industrial needs.
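The two-stage structure described in the abstract above can be illustrated with a deliberately tiny sketch: a common model is fit on the pooled data of all tasks, and each task then fits a specific model to the residuals of the common model. Plain mean predictors stand in for boosted trees here; all data and names are illustrative, not the paper's algorithm.

```python
def fit_mean(values):
    """A trivial stand-in for a boosted tree: predict the mean target."""
    return sum(values) / len(values)

# Stage one: the common model is learned on the pooled targets of both
# tasks, capturing what they share.
task_a = [0.5, 1.0, 1.5]   # task A targets (mean 1.0)
task_b = [2.5, 3.0, 3.5]   # task B targets (mean 3.0)
common = fit_mean(task_a + task_b)                   # 2.0

# Stage two: each task fits its own model to the residual error left by
# the common model, recovering its task-specific offset.
specific_a = fit_mean([y - common for y in task_a])  # -1.0
specific_b = fit_mean([y - common for y in task_b])  # 1.0

# The final prediction for each task is common part + specific part.
print(common + specific_a, common + specific_b)  # 1.0 3.0
```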