The importance of young athletes in professional cycling has skyrocketed in recent years. Nevertheless, the early talent identification of these riders remains largely a subjective assessment. An analytical system that automatically detects talented riders based on their freely available youth results is therefore needed. However, such a system cannot be copied directly from related fields, as cycling differs substantially from other sports. The aim of this paper is to develop such a data analytical system, which leverages the unique features of each race and thereby focusses on feature engineering, data quality, and visualization. To facilitate the deployment of prediction algorithms in situations without complete cases, we propose an adaptation to the k-nearest neighbours imputation algorithm which uses expert knowledge. Overall, our proposed method correlates strongly with eventual rider performance and can aid scouts in targeting young talents. On top of that, we introduce several model interpretation tools to give insight into which riders currently starting as professionals are expected to perform well and why.
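The abstract does not specify how expert knowledge enters the k-nearest neighbours imputation. As one possible reading, the minimal sketch below assumes it takes the form of expert-assigned feature weights in the distance computation. The function name `weighted_knn_impute`, the `weights` argument, and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def weighted_knn_impute(rows, weights, k=2):
    """Fill None entries with the mean of the k nearest rows.

    Assumption (not from the paper): `weights` holds one expert-assigned
    importance per feature, used in a weighted Euclidean distance that is
    computed only over features observed in both rows.
    """
    completed = [list(r) for r in rows]
    for i, target in enumerate(rows):
        missing = [j for j, v in enumerate(target) if v is None]
        for j in missing:
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with feature j observed.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(target))
                          if target[c] is not None and other[c] is not None]
                total_w = sum(weights[c] for c in shared)
                if total_w == 0:
                    continue  # no informative shared feature
                d2 = sum(weights[c] * (target[c] - other[c]) ** 2
                         for c in shared) / total_w
                candidates.append((math.sqrt(d2), other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:
                completed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return completed


# Toy data: two complete rows and one row missing its third feature.
rows = [
    [1.0, 50.0, 200.0],
    [9.0, 12.0, 900.0],
    [1.0, 12.0, None],
]
# Trusting only the first feature makes row 0 the nearest neighbour;
# trusting only the second feature makes row 1 the nearest neighbour.
by_first = weighted_knn_impute(rows, [1.0, 0.0, 1.0], k=1)
by_second = weighted_knn_impute(rows, [0.0, 1.0, 1.0], k=1)
```

In this toy case the choice of weights alone decides which neighbour supplies the imputed value, which is the kind of influence domain experts could exert under this assumed design.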
The purpose of this paper is to enhance current practices in business-to-business (B2B) customer churn prediction modelling. Following the recent trend from accuracy-based to profit-driven evaluation in business-to-customer churn prediction, we present a novel expected maximum profit measure for B2B customer churn (EMPB), which is used to demonstrate how current practices are suboptimal due to large discrepancies in customer value. To directly incorporate the heterogeneity of customer values and the profit concerns of the company, we propose an instance-dependent profit-maximizing classifier based on gradient boosting, named B2Boost. The main innovation of B2Boost is that it considers these differences and incorporates them into the model construction by maximizing the objective function in terms of the EMPB. The results indicate that the expected maximal profit gains made in our analyses are substantial. This study argues both for deploying models based on customer-specific profitability differences and for evaluating them with our instance-dependent EMPB measure.
The Spearman scores of the rf_knn and rf_regression models were given incorrect negative values in Table 6 (upper figure below); their actual performance is the opposite value (i.e., 0.5308 and 0.5779). The original article has been corrected.