2017
DOI: 10.7717/peerj-cs.127

Accelerating the XGBoost algorithm using GPU computing

Abstract: We present a CUDA-based implementation of a decision tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm is executed entirely on the graphics processing unit (GPU) and shows high performance with a variety of datasets and settings, including sparse input matrices. Individual boosting iterations are parallelised, combining two approaches. An interleaved approach is used for shallow trees, switching to a more conventional radix sort-based approach for larger …
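In practice, the GPU tree construction the abstract describes is reached through XGBoost's training parameters. Below is a minimal sketch using the library's C API (callable from C or CUDA sources); the `tree_method=gpu_hist` spelling follows later XGBoost releases rather than the paper's original GPU updater plugin, and the training-file path is a placeholder.

```cuda
/* Minimal sketch (not the paper's code): enabling XGBoost's GPU-based
 * tree construction through the C API. Assumptions: a LIBSVM-format
 * file "train.libsvm" exists, and the "tree_method=gpu_hist" parameter
 * spelling of later XGBoost releases is available. */
#include <xgboost/c_api.h>
#include <stdio.h>

int main(void) {
  DMatrixHandle dtrain;
  BoosterHandle booster;

  /* Load training data (path is a placeholder). */
  if (XGDMatrixCreateFromFile("train.libsvm", 1, &dtrain) != 0) {
    fprintf(stderr, "load failed: %s\n", XGBGetLastError());
    return 1;
  }
  XGBoosterCreate(&dtrain, 1, &booster);

  /* Route tree construction to the GPU. */
  XGBoosterSetParam(booster, "tree_method", "gpu_hist");
  XGBoosterSetParam(booster, "max_depth", "6");

  /* Run ten boosting iterations; each builds one tree on the GPU. */
  for (int iter = 0; iter < 10; ++iter)
    XGBoosterUpdateOneIter(booster, iter, dtrain);

  XGBoosterFree(booster);
  XGDMatrixFree(dtrain);
  return 0;
}
```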

Cited by 208 publications (86 citation statements); references 11 publications.
“…Machine learning techniques, such as GBRTs, are considered superior in predicting when compared to statistical methods (Elith, Leathwick & Hastie, 2008). XGBoost, in particular, is regarded as the state-of-the-art tree boosting system, yielding very fast and accurate predictions (Chen & Guestrin, 2016; Mitchell & Frank, 2017). One of the main advantages of GBRTs over statistical methods is the possibility of efficiently modelling multilevel variable interactions (Elith et al., 2008).…”
Section: Predicting Growth Coefficients For Trait Combinations
confidence: 99%
“…XGBoost includes a regularization term that is used to alleviate overfitting, as well as support for arbitrary differentiable loss functions [29]. The objective function of XGBoost consists of two parts, namely a loss function over the training set and a regularization term that penalizes the complexity of the model, as follows [30]:…”
Section: XGBoost
confidence: 99%
“…γT provides a constant penalty for each additional tree leaf, and λ‖ω‖² penalizes extreme weights. γ and λ are user-configurable parameters [30].…”
Section: XGBoost
confidence: 99%
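For reference, the objective truncated in the excerpts above is the standard XGBoost formulation from Chen & Guestrin (2016), with T the number of leaves in a tree and ω its vector of leaf weights:

$$\mathcal{L}(\phi)=\sum_{i} l\left(\hat{y}_i,\, y_i\right)+\sum_{k}\Omega(f_k), \qquad \Omega(f)=\gamma T+\tfrac{1}{2}\lambda\lVert\omega\rVert^{2}$$

The first sum is the training loss; the γT and λ‖ω‖² terms are exactly the leaf-count and weight penalties the second excerpt describes.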
“…CUB is a C++ template library that contains multiple algorithms for the collectives. The CUB library contains the fastest known implementations [63, 66] of the reduction and scan collectives and is used by libraries such as Thrust [29] as well as most deep learning frameworks [5, 14, 23, 25, 68]. We compare against the latest release of CUB [62] (version 1.8) and evaluate against different parameters of the collectives.…”
Section: Optimizing CUB For Half Precision
confidence: 99%
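For context on the collective being benchmarked, here is a minimal sketch of CUB's two-phase device-wide reduction; this is a generic illustration of the library's API, not the paper's benchmark code, and the zero-initialized input is a placeholder.

```cuda
// Minimal sketch of a device-wide sum with cub::DeviceReduce.
// Error checking omitted for brevity; the input is zero-initialized
// only so the example is deterministic.
#include <cub/cub.cuh>
#include <cstdio>

int main() {
  const int n = 1 << 20;
  float *d_in, *d_out;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemset(d_in, 0, n * sizeof(float));  // placeholder input

  // Phase 1: pass null temp storage so CUB reports the bytes it needs.
  void *d_temp = nullptr;
  size_t temp_bytes = 0;
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);

  // Phase 2: allocate the scratch space and run the reduction.
  cudaMalloc(&d_temp, temp_bytes);
  cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);

  float result;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  printf("sum = %f\n", result);

  cudaFree(d_temp); cudaFree(d_in); cudaFree(d_out);
  return 0;
}
```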
“…The objective of the paper is to expand the class of algorithms that can execute on TCUs, enabling the TCU to be used for non-GEMM kernels. We choose reduction and scan, since a large body of work [30, 31, 32, 33, 60, 68, 82, 84] has shown that they are key primitives of data parallel implementations of radix sort, quicksort, string comparison, lexical analysis, stream compaction, polynomial evaluation, solving recurrence equations, and histograms. We formulate a simple mapping between reduction or scan and TCUs.…”
Section: Introduction
confidence: 99%
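The mapping the excerpt alludes to can be sketched in one line: both collectives are products with constant matrices, which is what lets them be phrased as the small GEMMs a TCU natively executes. This is a sketch of the general idea, not the cited paper's exact construction. For x ∈ ℝⁿ and an inclusive scan:

$$\mathrm{reduce}(x)=\mathbf{1}^{\top}x, \qquad \mathrm{scan}(x)=Lx, \qquad L_{ij}=\begin{cases}1, & i\ge j\\ 0, & i<j\end{cases}$$

where 1 is the all-ones vector and L the lower-triangular all-ones matrix.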