Gradient Boosting Machines (GBMs) are powerful ensemble learning techniques that have been successfully applied to many low-dimensional problems. In GBMs, the learning algorithm sequentially fits new models to produce increasingly accurate predictions of the response variable. Despite their high accuracy, GBMs suffer from major drawbacks such as high memory consumption. Moreover, because the learning algorithm is inherently sequential, it is difficult to parallelize by design. Building optimized GBMs for high-dimensional applications therefore requires powerful computational resources. In this paper, using a real, high-dimensional dataset (1,776 predictors), we demonstrate that by applying different feature selection/reduction techniques, the computational cost of building and tuning tree-based GBMs can be substantially reduced with only a slight drop in prediction accuracy. To cope with the data-intensive computations involved in building and tuning the ensembles, we use the Amazon Elastic Compute Cloud (EC2) web service.
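The abstract does not specify which feature selection/reduction techniques were used, but the idea of cutting dimensionality before building the boosted trees can be illustrated with a simple variance-based filter. This is a minimal, hypothetical sketch (the function name and the choice of filter are assumptions, not the paper's method): low-variance predictors carry little information, so dropping all but the `k` highest-variance columns shrinks the input the GBM must fit.

```python
def variance_filter(X, k):
    """Keep only the k highest-variance columns of X (a list of rows).

    A simple filter-style feature reduction sketch; not the paper's
    specific technique. Returns the reduced matrix and the kept indices.
    """
    n, p = len(X), len(X[0])
    variances = []
    for j in range(p):
        col = [row[j] for row in X]       # extract column j
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        variances.append((var, j))
    # Indices of the k columns with the largest variance, in column order.
    keep = sorted(j for _, j in sorted(variances, reverse=True)[:k])
    reduced = [[row[j] for j in keep] for row in X]
    return reduced, keep
```

For a dataset with 1,776 predictors, a filter like this (or PCA, correlation pruning, etc.) would be applied once before the expensive boosting loop, so its cost is amortized over every model built during hyperparameter tuning.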
The development of a new drug largely depends on trial and error: it typically involves synthesizing thousands of compounds before one finally becomes a drug, a process that is extremely expensive and slow. The ability to accurately predict the biological activity of molecules, and to understand the rationale behind those predictions, would therefore be of great value to the pharmaceutical industry. Gradient Boosting Machines (GBMs) are powerful ensemble learning techniques that have been successfully applied to many low-dimensional problems, but despite their high accuracy they suffer from major drawbacks such as high memory consumption. In this paper, using a real, high-dimensional molecular dataset (1,776 predictors), we demonstrate that by applying different feature selection/reduction techniques, the computational cost of building and tuning GBMs can be substantially reduced with only a slight drop in prediction accuracy. In addition, by fusing the decisions of the ensembles with two fusion techniques, a majority vote and an optimized feedforward neural network, we achieve better prediction accuracy than any individual ensemble.
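Of the two fusion techniques named in the abstract, the majority vote is simple enough to sketch directly. The following is a minimal, self-contained illustration (the function name and input shape are assumptions): each ensemble casts one class vote per sample, and the fused prediction is the most frequent vote.

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse class predictions from several ensembles by majority vote.

    `predictions` is a list of per-ensemble prediction lists, one inner
    list per ensemble, all of the same length (hypothetical layout).
    """
    n_samples = len(predictions[0])
    fused = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        fused.append(votes.most_common(1)[0][0])  # most frequent class
    return fused
```

The neural-network fusion mentioned in the abstract would instead learn weights over the ensembles' outputs rather than treating every vote equally; the abstract gives no further detail on its architecture.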