Gradient Boosting Machines (GBMs) are powerful ensemble learning techniques that have been successfully applied to many low-dimensional problems. In GBMs, the learning algorithm sequentially fits new models to produce increasingly accurate predictions of the response variable. Despite their high accuracy, GBMs suffer from major drawbacks such as high memory consumption. Moreover, because the learning algorithm is inherently sequential, it is difficult to parallelize by design. Building optimized GBMs for high-dimensional applications therefore requires powerful computational resources. In this paper, using a real, high-dimensional dataset (1,776 predictors), we demonstrate that by applying different feature selection/reduction techniques, the computational cost of building and tuning tree-based GBMs can be substantially reduced with only a slight drop in prediction accuracy. To cope with the data-intensive computations involved in building and tuning the ensembles, we use the Amazon Elastic Compute Cloud (EC2) web service.
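The abstract does not specify which feature selection/reduction techniques were used, but the idea of cutting dimensionality before building the boosted trees can be illustrated with a simple variance-based filter. This is a minimal, hypothetical sketch (the function name and the choice of filter are assumptions, not the paper's method): low-variance predictors carry little information, so dropping all but the `k` highest-variance columns shrinks the input the GBM must fit.

```python
def variance_filter(X, k):
    """Keep only the k highest-variance columns of X (a list of rows).

    A simple filter-style feature reduction sketch; not the paper's
    specific technique. Returns the reduced matrix and the kept indices.
    """
    n, p = len(X), len(X[0])
    variances = []
    for j in range(p):
        col = [row[j] for row in X]       # extract column j
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        variances.append((var, j))
    # Indices of the k columns with the largest variance, in column order.
    keep = sorted(j for _, j in sorted(variances, reverse=True)[:k])
    reduced = [[row[j] for j in keep] for row in X]
    return reduced, keep
```

For a dataset with 1,776 predictors, a filter like this (or PCA, correlation pruning, etc.) would be applied once before the expensive boosting loop, so its cost is amortized over every model built during hyperparameter tuning.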
The development of a new drug largely depends on trial and error: it typically involves synthesizing thousands of compounds before one finally becomes a drug, a process that is extremely expensive and slow. The ability to accurately predict the biological activity of molecules, and to understand the rationale behind those predictions, would therefore be of great value to the pharmaceutical industry. Gradient Boosting Machines (GBMs) are powerful ensemble learning techniques that have been successfully applied to many low-dimensional problems, but despite their high accuracy they suffer from major drawbacks such as high memory consumption. In this paper, using a real, high-dimensional molecular dataset (1,776 predictors), we demonstrate that by applying different feature selection/reduction techniques, the computational cost of building and tuning GBMs can be substantially reduced with only a slight drop in prediction accuracy. In addition, by fusing the decisions of the ensembles with two fusion techniques, a majority vote and an optimized feedforward neural network, we achieve better prediction accuracy than any individual ensemble.
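Of the two fusion techniques named in the abstract, the majority vote is simple enough to sketch directly. The following is a minimal, self-contained illustration (the function name and input shape are assumptions): each ensemble casts one class vote per sample, and the fused prediction is the most frequent vote.

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse class predictions from several ensembles by majority vote.

    `predictions` is a list of per-ensemble prediction lists, one inner
    list per ensemble, all of the same length (hypothetical layout).
    """
    n_samples = len(predictions[0])
    fused = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        fused.append(votes.most_common(1)[0][0])  # most frequent class
    return fused
```

The neural-network fusion mentioned in the abstract would instead learn weights over the ensembles' outputs rather than treating every vote equally; the abstract gives no further detail on its architecture.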