Sebastiaan Höppner scite author profile

Customer retention campaigns increasingly rely on predictive models to detect potential churners in a vast customer base. From a machine learning perspective, predicting customer churn can be framed as a binary classification problem. Using data on historical behavior, classification algorithms are built to accurately predict the probability that a customer will defect. Predictive churn models are then commonly selected based on accuracy-related performance measures such as the area under the ROC curve (AUC). However, these models are often poorly aligned with the core business requirement of profit maximization: they fail to take into account both misclassification costs and the benefits of a correct classification. The aim is therefore to construct churn prediction models that are profitable and, preferably, interpretable. The recently developed expected maximum profit measure for customer churn (EMPC) has been proposed to select the most profitable churn model. We present a new classifier that integrates the EMPC metric directly into model construction. Our technique, called ProfTree, uses an evolutionary algorithm to learn profit-driven decision trees. In a benchmark study with real-life data sets from various telecommunication service providers, we show that ProfTree achieves significant profit improvements compared to classic accuracy-driven tree-based methods.
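To make the contrast between accuracy-based and profit-based selection concrete, the rough sketch below computes the maximum profit of a retention campaign that contacts the top-k customers ranked by predicted churn score. The function name and every parameter value (customer lifetime value, incentive cost, contact cost, offer acceptance rate) are hypothetical and fixed. This is not the EMPC measure, which treats the acceptance probability as uncertain and averages over it, and it is not ProfTree.

```python
from typing import List, Sequence, Tuple


def best_targeting_profit(
    scores: Sequence[float],
    labels: Sequence[int],
    clv: float = 200.0,        # hypothetical customer lifetime value
    incentive: float = 10.0,   # hypothetical cost of the retention offer
    contact_cost: float = 1.0, # hypothetical cost of contacting one customer
    accept_rate: float = 0.3,  # hypothetical probability a churner accepts
) -> Tuple[float, int]:
    """Return (maximum campaign profit, number of customers to target).

    Customers are ranked by predicted churn score (highest first) and the
    top-k are contacted; k = 0 (no campaign) yields a profit of 0.
    """
    # Expected profit of contacting a true churner: with probability
    # accept_rate they accept and stay (gain clv - incentive); the contact
    # cost is paid either way.
    gain_churner = accept_rate * (clv - incentive) - contact_cost
    # A contacted non-churner is assumed to accept the offer, so the
    # incentive and the contact cost are both wasted.
    loss_non_churner = -(incentive + contact_cost)

    ranked: List[Tuple[float, int]] = sorted(
        zip(scores, labels), key=lambda pair: pair[0], reverse=True
    )
    profit, best_profit, best_k = 0.0, 0.0, 0
    for k, (_, is_churner) in enumerate(ranked, start=1):
        profit += gain_churner if is_churner == 1 else loss_non_churner
        if profit > best_profit:
            best_profit, best_k = profit, k
    return best_profit, best_k
```

For example, `best_targeting_profit([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns the best profit together with the cutoff k that achieves it. Two models with a similar AUC can reach quite different maximum profits under such a criterion, which is the gap a profit-driven selection measure is meant to close.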

Detection of Fraud in a Clinical Trial Using Unsupervised Statistical Monitoring

Viron

Trotta

Schumacher

et al. 2021

Therapeutic Innovation & Regulatory Science

Background: A central statistical assessment of the quality of data collected in clinical trials can improve the quality and efficiency of sponsor oversight of clinical investigations.

Material and Methods: The database of a large randomized clinical trial with known fraud was reanalyzed to identify, using only statistical monitoring techniques, the center where fraud had been confirmed. The analysis was conducted with unsupervised statistical monitoring software using mixed-effects statistical models. The statistical analyst was unaware of the location, nature, and extent of the fraud.

Results: Five centers were detected as atypical, including the center with known fraud, which was ranked second. An incremental analysis showed that the center with known fraud could have been detected after only 25% of its data had been reported.

Conclusion: An unsupervised approach to central monitoring, using mixed-effects statistical models, is effective at detecting centers with fraud or other data anomalies in clinical trials.
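The abstract does not spell out the monitoring software's models. As a minimal sketch of the underlying idea only, assuming a single numeric variable per patient and some variation within every center, the code below fits a one-way random-intercept model with the DerSimonian-Laird method-of-moments estimator and reports how far each center's mean lies from the overall mean in standard-deviation units. The function name and the choice of estimator are assumptions for illustration; this is not the software used in the study.

```python
import math
from statistics import fmean
from typing import Dict, List


def center_z_scores(data: Dict[str, List[float]]) -> Dict[str, float]:
    """Standardized deviation of each center's mean under a random-intercept model.

    Each center needs at least two values, and the pooled within-center
    variance must be nonzero.
    """
    centers = list(data)
    n = {c: len(data[c]) for c in centers}
    means = {c: fmean(data[c]) for c in centers}

    # Pooled within-center (residual) variance of the one-way model.
    ss_within = sum((y - means[c]) ** 2 for c in centers for y in data[c])
    sigma2 = ss_within / (sum(n.values()) - len(centers))

    # Sampling variance of each center mean and fixed-effect weights.
    v = {c: sigma2 / n[c] for c in centers}
    w = {c: 1.0 / v[c] for c in centers}
    sw = sum(w.values())
    mu_fixed = sum(w[c] * means[c] for c in centers) / sw

    # DerSimonian-Laird moment estimate of the between-center variance tau^2.
    q = sum(w[c] * (means[c] - mu_fixed) ** 2 for c in centers)
    denom = sw - sum(wc ** 2 for wc in w.values()) / sw
    tau2 = max(0.0, (q - (len(centers) - 1)) / denom)

    # Random-effects overall mean and standardized deviation of each center.
    w_re = {c: 1.0 / (v[c] + tau2) for c in centers}
    mu_re = sum(w_re[c] * means[c] for c in centers) / sum(w_re.values())
    return {c: (means[c] - mu_re) / math.sqrt(v[c] + tau2) for c in centers}
```

A center whose |z| exceeds a chosen cut-off (for example 2, a hypothetical choice) would be flagged for review. Note that a single strongly deviating center also inflates the estimated between-center variance, which tempers its own z-score.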

Instance-dependent cost-sensitive learning for detecting transfer fraud

Höppner

Baesens

Verbeke

et al. 2022

European Journal of Operational Research

Cellwise robust M regression

Filzmoser

Höppner

Ortner

et al. 2020

Computational Statistics & Data Analysis

The cellwise robust M regression (CRM) estimator is introduced as the first estimator of its kind that intrinsically yields both a map of cellwise outliers consistent with the linear model and a vector of regression coefficients that is robust against vertical outliers and leverage points. As a by-product, the method yields a weighted and imputed data set containing estimates of the values the cellwise outliers would have needed to take in order to fit the model. The method is shown to be as robust as its casewise counterpart, MM regression. Because the cellwise method discards less information than any casewise robust estimator, its predictive power can be expected to be at least as good as that of casewise alternatives. These results are corroborated in a simulation study. Moreover, while the simulations show that predictive performance is at least on par with, if not better than, that of casewise methods, an application to a data set of compositions of Swiss nutrients shows that in individual cases CRM can achieve significantly higher predictive accuracy than MM regression.
