Abstract: The telecommunication industry is highly competitive, and mass marketing is no longer applicable. Moreover, mobile customers behave differently, which pushes telecom operators to differentiate their strategies to meet customers' needs. At the same time, mobile operators hold an enormous number of customer records, and data-driven approaches can help them draw insights from this data. Therefore, a data-driven segmentation approach can support marketing strategies to tailor their marketing pla…
“…Researchers used various unsupervised learning techniques for customer segmentation based on behavioral and factor analysis. Namvar, Ghazanfari & Naderpour (2017) proposed the data-driven segmentation for obtaining the increment in Average Revenue Per User (ARPU), in order to help the operators to design their marketing strategies. K-mean clustering algorithm was used to divide the process into two segments (1) behavioral segmentation, and (2) beneficial segmentation.…”
The telecom sector is currently undergoing a digital transformation by integrating artificial intelligence (AI) and Internet of Things (IoT) technologies. Customer retention in this context relies on the application of autonomous AI methods for analyzing IoT device data patterns in relation to the offered service packages. One significant challenge in existing studies is treating churn recognition and customer segmentation as separate tasks, which diminishes overall system accuracy. This study introduces an innovative approach by leveraging a unified customer analytics platform that treats churn recognition and segmentation as a bi-level optimization problem. The proposed framework includes an Auto Machine Learning (AutoML) oversampling method, effectively handling three mixed datasets of customer churn features while addressing imbalanced-class distribution issues. To enhance performance, the study utilizes the strength of oversampling methods like synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC) and synthetic minority oversampling with encoded nominal and continuous features (SMOTE-ENC). Performance evaluation, using 10-fold cross-validation, measures accuracy and F1-score. Simulation results demonstrate that the proposed strategy, particularly Random Forest (RF) with SMOTE-NC, outperforms standard methods with SMOTE. It achieves accuracy rates of 79.24%, 94.54%, and 69.57%, and F1-scores of 65.25%, 81.87%, and 45.62% for the IBM, Kaggle Telco and Cell2Cell datasets, respectively. The proposed method autonomously determines the number and density of clusters. Factor analysis employing Bayesian logistic regression identifies influential factors for accurate customer segmentation. Furthermore, the study segments consumers behaviorally and generates targeted recommendations for personalized service packages, benefiting decision-makers.
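The abstract above reports both accuracy and F1-score. A minimal sketch of how the two metrics can diverge on an imbalanced churn dataset follows; the function name `accuracy_and_f1` and the toy labels are illustrative assumptions, not taken from the cited study.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; F1 is for the positive (churn) class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels (invented for illustration): 2 churners among 10 customers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_stay = [0] * 10                       # never predicts churn
one_hit = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]     # finds one churner, one false alarm

print(accuracy_and_f1(y_true, always_stay))
print(accuracy_and_f1(y_true, one_hit))
```

Both toy predictors score the same accuracy, but only the second has a non-zero F1, which is why F1 is usually reported alongside accuracy when the churn class is rare.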
“…For example, one of the most famous partitive clustering algorithms, k-means, uses the K-means++ algorithm to find the initial prototypes [23]. Partitive clustering algorithms have been used in a wide range of applications, from big data clustering [24] for customer segmentation [25,26], to weather prediction [27], to biomedical health [28], and many others. The main steps of a partitive clustering algorithm are outlined in Algorithm 1 below.…”
Mobile telematics is a relatively new innovation that involves collecting data on driving behavior using the internal sensors in a smartphone rather than from an in-vehicle data recorder. However, telematics data are usually not labeled, which makes extracting driving patterns from them very difficult. Therefore, unsupervised learning algorithms play an important role in this field. In addition, most current research is based on datasets developed in a laboratory or from site investigations and questionnaires, which are very different from real-world driving behaviors. To advance unsupervised learning techniques in this field, and to fill the gap in findings based on real-world data, we have developed an unsupervised pattern recognition framework for mobile telematics data. The framework comprises three main components: a self-organizing map (SOM), a nine-layer deep autoencoder, and partitive clustering algorithms. The SOM algorithm reduces the complexity of the data, the deep autoencoder extracts the features, and the clustering algorithm groups driving events with similar patterns into behaviors. Further, given that clustering with mobile telematics data is an under-researched area, we undertook an empirical comparison of five well-known clustering algorithms to determine the strengths and weaknesses of each method and which is best suited to categorizing driving styles. The study was conducted with a real-world insurance dataset containing 500,000 journeys by 2500 drivers, and the results were evaluated against three measures: the Davies-Bouldin index, the Calinski-Harabasz index, and execution time. Overall, we find that k-means clustering and a self-organizing map were able to extract more accurate patterns than others. A statistical analysis of the 29 clusters produced by SOM and k-means revealed 29 unique driving styles, all of which can be found in the transportation literature.
The results from the study, with support from the corresponding literature review, demonstrate the efficacy of the presented framework in unsupervised settings. Additionally, the results provide a basis for developing a future risk analysis and automatic decision support system for usage-based insurance companies.
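Two of the evaluation measures named in this abstract are standard internal cluster-validity indices. The sketch below computes both from their textbook definitions using only the Python standard library; the helper names and the four toy points are assumptions made for illustration and do not reproduce the paper's pipeline or data.

```python
import math


def _centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def _group(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j); lower is better."""
    clusters = _group(points, labels)
    cents = {k: _centroid(v) for k, v in clusters.items()}
    # s_k: average Euclidean distance from members of cluster k to its centroid.
    scatter = {k: sum(math.dist(p, cents[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion ratio; higher is better."""
    clusters = _group(points, labels)
    n, k = len(points), len(clusters)
    overall = _centroid(points)
    between = sum(len(v) * math.dist(_centroid(v), overall) ** 2
                  for v in clusters.values())
    within = sum(math.dist(p, _centroid(v)) ** 2
                 for v in clusters.values() for p in v)
    return (between / (k - 1)) / (within / (n - k))


# Toy data (invented): two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]   # groups each nearby pair together
bad = [0, 1, 0, 1]    # splits each nearby pair across clusters

print(davies_bouldin(points, good), calinski_harabasz(points, good))
print(davies_bouldin(points, bad), calinski_harabasz(points, bad))
```

The sensible grouping yields the lower Davies-Bouldin value and the higher Calinski-Harabasz value, which is the direction in which each index rewards compact, well-separated clusters.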
“…Namvar et al [23] also proposed a 2-dimensional segmentation to segment telco users in both behavioural and beneficial phases using K-means clustering. Usage-based features were applied to the behavioural segmentation, while revenue-based features were applied by the beneficial segmentation.…”
Section: B Customer Segmentation (mentioning)
confidence: 99%
“…Most researchers only focused on one element of customer analytics, that is, either churn prediction or customer segmentation. And the majority of current researches applied segmentation to the whole customer dataset [21] [22] [23]. However, if only churn prediction is conducted, it is not able to understand the reasons behind it well, since the operator can only know which customers are likely to churn.…”
Section: Research Motivation and Contributions (mentioning)
In the telco industry, attracting new customers is no longer a good strategy since the cost of retaining existing customers is much lower. Churn management becomes instrumental in the telco industry. As there is limited study combining churn prediction and customer segmentation, this paper aims to propose an integrated customer analytics framework for churn management. There are six components in the framework, including data pre-processing, exploratory data analysis (EDA), churn prediction, factor analysis, customer segmentation, and customer behaviour analytics. This framework integrates churn prediction and customer segmentation process to provide telco operators with a complete churn analysis to better manage customer churn. Three datasets are used in the experiments with six machine learning classifiers. First, the churn status of the customers is predicted using multiple machine learning classifiers. Synthetic Minority Oversampling Technique (SMOTE) is applied to the training set to deal with the problems with imbalanced datasets. The 10-fold cross-validation is used to assess the models. Accuracy and F1-score are used for model evaluation. F1-score is considered to be an important metric to measure the models for imbalanced datasets since the premise of churn management is to be able to identify customers who will churn. Experimental analysis indicates that AdaBoost performed the best in Dataset 1, with accuracy of 77.19% and F1-score of 63.11%. Random Forest performed the best in Dataset 2, with accuracy of 93.6% and F1-score of 77.20%. Random Forest performed the best in Dataset 3 in terms of accuracy, at 63.09%, while Multi-layer Perceptron performed the best in terms of F1-score, at 42.84%. After implementing churn prediction, Bayesian Logistic Regression is used to conduct the factor analysis and to figure out some important features for churn customer segmentation. Churn customer segmentation is then carried out using K-means clustering. 
Customers are segmented into different groups, which allows marketers and decision makers to adopt retention strategies more precisely.
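The abstract above applies SMOTE to the training set to counter class imbalance. The core idea of plain SMOTE for continuous features is to interpolate between a minority-class row and one of its nearest minority-class neighbours. The following is a simplified sketch of that idea only: the function name `smote_sketch`, its parameters, and the toy rows are hypothetical, and it is not the implementation used in the study.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating toward nearest minority neighbours.

    Assumes every feature is continuous; nominal features need a variant
    such as SMOTE-NC, which this sketch does not cover.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: every other minority row, nearest first.
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class rows (invented): two continuous features per churner.
churners = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5]]
new_rows = smote_sketch(churners, n_new=6)
print(len(new_rows))
```

Because each synthetic row lies on a segment between two real minority rows, it stays inside the range the minority class already occupies. Oversampling only the training folds, as the abstract describes, keeps synthetic rows out of the data used for evaluation.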