Class imbalance presents significant challenges to customer churn prediction. Many data-level sampling solutions have been developed to deal with this issue. In this paper, we comprehensively compare the performance of several state-of-the-art sampling techniques in the context of churn prediction. A recently developed maximum profit criterion is used as one of the main performance measures to offer more insight from a cost-benefit perspective. The experimental results show that the impact of sampling methods depends on the evaluation metric used, and that the pattern of impact varies with the classifier. We explore these patterns in depth and recommend suitable sampling strategies for each situation. We also discuss the setting of the sampling rate in the empirical comparison. Our findings offer a useful guideline for the use of sampling methods in churn prediction.
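To make the idea of data-level sampling and a sampling rate concrete, the sketch below shows random oversampling, a simple baseline that is not necessarily one of the techniques compared in the paper. It assumes the sampling rate means the target ratio of minority (churner) to majority (non-churner) examples; the function name `random_oversample` and its parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, minority_label, sampling_rate, seed=0):
    """Duplicate minority rows at random until the minority/majority
    ratio reaches roughly `sampling_rate` (an assumed definition)."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_count = len(labels) - len(minority_idx)

    # Number of minority examples needed to hit the target ratio.
    target = int(round(sampling_rate * majority_count))
    extra = max(0, target - len(minority_idx))
    if extra and not minority_idx:
        raise ValueError("no minority examples to resample from")

    # Sample with replacement from the existing minority examples.
    picks = [rng.choice(minority_idx) for _ in range(extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [labels[i] for i in picks]
    return new_rows, new_labels


# Toy data: 2 churners (label 1) among 10 customers.
rows = [[i] for i in range(10)]
labels = [1, 1] + [0] * 8
bal_rows, bal_labels = random_oversample(
    rows, labels, minority_label=1, sampling_rate=1.0
)
print(Counter(bal_labels))
```

Resampling of this kind would normally be applied to the training split only, so that evaluation metrics are computed on data with the original class distribution.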
Customer retention has become a necessity in many markets, including mobile telecommunications. As it becomes easier for customers to switch providers, providers seek to improve their prediction models in an effort to intervene with potential churners. Many studies have evaluated different models in search of better prediction accuracy. This study proposes that the attributes, not the model, need to be reconsidered. By representing call detail records as a social network of customers, network attributes can be extracted for use in various traditional prediction models. The use of network attributes yields a significant increase in the area under the receiver operating characteristic curve (AUC) compared with using individual customer attributes alone.
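As a rough illustration of how call detail records might be turned into network attributes, the sketch below builds an undirected customer graph from (caller, callee) pairs and derives a few per-customer features. The record layout, the function name `network_attributes`, and the three attributes shown (degree, call count, number of churned neighbours) are assumptions made for this example; the abstract does not specify which network attributes the study uses.

```python
from collections import defaultdict


def network_attributes(call_records, churners):
    """Derive simple per-customer network features from call records.

    call_records: iterable of (caller, callee) pairs (assumed layout).
    churners: set of customer ids already known to have churned.
    """
    neighbors = defaultdict(set)
    call_count = defaultdict(int)

    for caller, callee in call_records:
        if caller == callee:
            continue  # ignore self-calls
        # Treat a call as an undirected tie between two customers.
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)
        call_count[caller] += 1
        call_count[callee] += 1

    return {
        customer: {
            "degree": len(nbrs),  # distinct contacts
            "calls": call_count[customer],  # calls made or received
            "churned_neighbors": len(nbrs & churners),
        }
        for customer, nbrs in neighbors.items()
    }


# Toy call log; customer "d" is a known churner.
cdr = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("a", "b")]
attrs = network_attributes(cdr, churners={"d"})
print(attrs["c"])
```

Features like these could then be joined to the individual customer attributes and passed to any conventional classifier.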