Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm with implementations such as LightGBM and those in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees in an offline, greedy manner. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated in an online manner, and provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network and demonstrate that our non-greedy approach has comparable performance to state-of-the-art offline, greedy tree boosting models.
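One way to picture the conversion from a trained GBDT to a network with updatable splits is a "soft" split. The sketch below is our illustration, not the paper's exact construction: it initializes a differentiable split from a single axis-aligned GBDT split (feature index, threshold, and two leaf values), with an assumed temperature parameter `beta`, so that gradient descent can later move the threshold online.

```python
import torch


class SoftSplitTree(torch.nn.Module):
    """A depth-1 soft decision tree (one split, two leaves),
    initialized from an axis-aligned GBDT split."""

    def __init__(self, n_features, feature_idx, threshold,
                 leaf_left, leaf_right, beta=10.0):
        super().__init__()
        # A one-hot weight vector recovers the original axis-aligned split.
        w = torch.zeros(n_features)
        w[feature_idx] = 1.0
        self.w = torch.nn.Parameter(w)
        self.b = torch.nn.Parameter(torch.tensor(float(threshold)))
        self.leaves = torch.nn.Parameter(torch.tensor([leaf_left, leaf_right]))
        self.beta = beta  # sharpness: large beta approximates a hard split

    def forward(self, x):
        # As beta grows, p approaches the hard rule "x[feature_idx] > threshold".
        p = torch.sigmoid(self.beta * (x @ self.w - self.b))
        return p * self.leaves[1] + (1 - p) * self.leaves[0]


tree = SoftSplitTree(n_features=4, feature_idx=2, threshold=0.5,
                     leaf_left=-1.0, leaf_right=1.0)
x = torch.randn(8, 4)
y_hat = tree(x)  # differentiable, so the split can now be refined online
```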
This work presents an approach to automatic induction of non-greedy decision trees constructed from a neural network architecture. This construction can be used to transfer weights when growing or pruning a decision tree, allowing non-greedy decision tree algorithms to automatically learn and adapt towards the ideal architecture. In this work, we examine the underpinning ideas from ensemble modelling and Bayesian model averaging which allow our neural network to asymptotically approach the ideal architecture through weight transfer. Experimental results demonstrate that this approach improves on models with a fixed set of hyperparameters for both decision tree and decision forest models.
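To make the weight-transfer idea concrete, here is a hedged sketch of function-preserving growth under a soft-split parameterization (our illustration; the class names and the zero-initialized routing are assumptions, not the paper's exact mechanism). A leaf is replaced by a split whose children both inherit the leaf's value, so the grown tree initially computes the same function and training simply continues from there.

```python
import torch


class Leaf(torch.nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = torch.nn.Parameter(torch.tensor(float(value)))

    def forward(self, x):
        return self.value.expand(x.shape[0])


class Split(torch.nn.Module):
    def __init__(self, n_features, left, right, beta=10.0):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(n_features))
        self.b = torch.nn.Parameter(torch.zeros(()))
        self.left, self.right = left, right
        self.beta = beta

    def forward(self, x):
        p = torch.sigmoid(self.beta * (x @ self.w - self.b))
        return (1 - p) * self.left(x) + p * self.right(x)


def grow(leaf, n_features):
    """Replace a leaf with a split whose children copy the leaf's value.
    With w = 0 and b = 0 the routing is uniform (p = 0.5), so the grown
    tree computes exactly the same function as the original leaf."""
    v = leaf.value.item()
    return Split(n_features, Leaf(v), Leaf(v))


old = Leaf(0.7)
new = grow(old, n_features=4)
x = torch.randn(5, 4)
assert torch.allclose(old(x), new(x))  # identical predictions at growth time
```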
We show that Residual Networks (ResNet) are equivalent to boosting feature representations, without any modification to the underlying ResNet training algorithm. We prove a regret bound based on Online Gradient Boosting theory, which suggests that ResNet can achieve Online Gradient Boosting regret bounds through architectural changes: adding a shrinkage parameter to the identity skip-connections and using residual modules with max-norm bounds. Through this relation between ResNet and Online Boosting, novel feature-representation boosting algorithms can be constructed by altering the residual modules. We demonstrate this by proposing decision tree residual modules to construct a new boosted decision tree algorithm, proving generalization error bounds for both approaches, and relaxing constraints within the BoostResNet algorithm so that it can be trained in an out-of-core manner. We evaluate convolutional ResNet with and without the shrinkage modification to demonstrate its efficacy, and show that our online boosted decision tree algorithm is comparable to state-of-the-art offline boosted decision tree algorithms without the drawbacks of offline approaches.

Taking inspiration from Online Boosting, we also modify the ResNet architecture with an additional learnable shrinkage parameter (vanilla ResNet can be interpreted as an Online Boosting algorithm in which the shrinkage factor is fixed, rather than learnable, and set to 1). As this approach only modifies the neural network architecture, the same underlying ResNet training algorithms can still be used.

Experimentally, we compare vanilla ResNet with our modified ResNet using a convolutional residual network (ResNet-CNN) on multiple image datasets; our modified ResNet shows some improvement over the vanilla ResNet architecture. We also evaluate our neural decision tree residual network on multiple benchmark datasets against other decision tree ensemble methods, including Deep Neural Decision Forests [6], neural decision trees ensembled via AdaNet [7], and off-the-shelf algorithms (gradient boosting decision trees and random forests) using LightGBM [8]. In our experiments, the neural decision tree residual network showed superior performance to other neural decision tree variants, and comparable performance to offline, traditional gradient boosting decision tree models.

Related Works

In recent years, researchers have sought to understand why ResNets perform the way they do. The BoostResNet algorithm reinterprets ResNet as a multi-channel telescoping-sum boosting problem in order to introduce a new algorithm for sequential training [4], and theoretical justification has been given for the representational power of ResNet under linear neural network constraints [3]. One interpretation of residual networks is as a collection of many paths of differing lengths which behave like a shallow ensemble; empirical studies demonstrate that residual networks introduce short paths which can carry gradients throughout the extent of very deep networks.
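The shrinkage modification described above can be sketched in a few lines. This is a minimal illustration assuming standard PyTorch, not the authors' exact implementation: following the text, the learnable shrinkage scales the identity skip-connection, and fixing it at 1 recovers the vanilla residual block (the fixed-shrinkage Online Boosting interpretation).

```python
import torch


class ShrinkageResidualBlock(torch.nn.Module):
    """A convolutional residual block with a learnable shrinkage
    parameter on the identity skip-connection (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.module = torch.nn.Sequential(
            torch.nn.Conv2d(channels, channels, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Initialized to 1, so the block starts as a vanilla ResNet block.
        self.shrinkage = torch.nn.Parameter(torch.ones(()))

    def forward(self, x):
        # Vanilla ResNet computes x + f(x); here the identity path is
        # scaled by a learned shrinkage factor, as in Online Boosting.
        return self.shrinkage * x + self.module(x)


block = ShrinkageResidualBlock(channels=16)
x = torch.randn(2, 16, 32, 32)
y = block(x)  # same interface as a standard residual block
```

Because only the architecture changes, this block drops into an existing ResNet and trains with the same optimizer and loss as before.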