Credit scoring is one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques has been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both statistical and economic model performance. Using a profit measure and profit-based feature selection, the study demonstrates the added value, in terms of profit, of including call networks as a new Big Data source in the context of positive credit information. A unique combination of datasets, comprising call-detail records together with customers' credit and debit account information, is used to create scorecards for credit card applicants. Call-detail records are used to build call networks, and advanced social network analytics techniques are applied to propagate influence from prior defaulters throughout the network to produce influence scores. The results show that combining call-detail records with traditional data in credit scoring models significantly increases their performance as measured by AUC. In terms of profit, the best model is the one built with calling behavior features only. Moreover, the calling behavior features are the most predictive in the other models, both statistically and economically. The results have implications for the ethical use of call-detail records, regulation, financial inclusion, and data sharing and privacy.
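The propagation of influence from prior defaulters through a call network can be sketched as a personalized-PageRank-style iteration. This is only a minimal illustration of the idea, not the paper's actual method; the toy graph, damping factor, and defaulter seed below are assumptions.

```python
# Toy call network as an undirected adjacency list; nodes are customers.
graph = {
    "a": ["b", "e"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "a"],
}
defaulters = {"a"}  # prior defaulters seed the propagation

# Start with all influence concentrated on the defaulters.
influence = {n: (1.0 if n in defaulters else 0.0) for n in graph}

damping = 0.85
for _ in range(50):
    new = {}
    for node in graph:
        # Each node receives a damped share of its neighbours' influence;
        # defaulters additionally keep a constant restart injection.
        incoming = sum(influence[nb] / len(graph[nb]) for nb in graph[node])
        restart = 1.0 if node in defaulters else 0.0
        new[node] = (1 - damping) * restart + damping * incoming
    influence = new

# Customers closer to a defaulter end up with higher influence scores,
# which can then be used as features in a scorecard.
ranked = sorted(influence, key=influence.get, reverse=True)
print(ranked)
```

The resulting scores decay with network distance from the defaulters, so they summarise "exposure to default" as a single numeric feature per applicant.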
We employ the epidemic Renormalization Group (eRG) framework to understand, reproduce, and predict the diffusion of the COVID-19 pandemic across the US. Human mobility across the different geographical US divisions is modelled via open-source flight data, alongside the impact of social distancing within each division. We analyse the impact of the vaccination strategy on the dynamics of the current pandemic wave in the US and observe that the ongoing vaccination campaign will not, by itself, affect the current wave. Our results show that, to curb the current and subsequent waves, vaccinations alone are not enough and strict social distancing measures are required until sufficient immunity is achieved. These results are essential for a successful vaccination strategy in the US.
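At its core, the eRG framework describes $\alpha(t) = \ln$(cumulative cases) as flowing logistically toward a fixed point $a = \ln$(total cases of the wave), via $d\alpha/dt = \gamma\,\alpha\,(1 - \alpha/a)$. The sketch below integrates this flow numerically; the values of $\gamma$ and $a$ are illustrative, not fitted to US data.

```python
import math

# eRG flow for alpha(t) = ln(cumulative cases):
#   d(alpha)/dt = gamma * alpha * (1 - alpha / a)
gamma = 0.3             # illustrative infection rate (inverse "days")
a = math.log(1e6)       # fixed point: log of the wave's total case count
alpha, dt = 0.01, 0.01  # small initial seeding, Euler step size

trajectory = [alpha]
for _ in range(int(120 / dt)):  # integrate over 120 "days"
    alpha += dt * gamma * alpha * (1 - alpha / a)
    trajectory.append(alpha)

# The flow saturates at the fixed point a, i.e. the wave ends.
print(round(trajectory[-1], 3), round(a, 3))
```

Interventions such as social distancing effectively lower $\gamma$, stretching the wave in time, while human mobility couples the flows of different regions; the paper models both effects on top of this basic flow.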
Social network analytics methods are being used in the telecommunication industry to predict customer churn with great success. In particular, it has been shown that relational learners adapted to this specific problem enhance the performance of predictive models. In the current study, we benchmark different strategies for constructing a relational learner by applying them to a total of eight distinct call-detail record datasets originating from telecommunication organizations across the world. We statistically evaluate the effect of relational classifiers and collective inference methods on the predictive power of relational learners, as well as the performance of models in which relational learners are combined with traditional methods of predicting customer churn in the telecommunication industry. Finally, we investigate the effect of network construction on model performance; our findings imply that the definition of edges and weights in the network does have an impact on the results of the predictive models. The best configuration found in the study is a non-relational learner enriched with network variables, without collective inference, using binary weights and undirected networks. In addition, we provide guidelines on how to apply social network analytics for churn prediction in the telecommunication industry in an optimal way, ranging from network architecture to model building and evaluation.
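The winning configuration described above — an undirected, binary-weight network whose structure feeds a non-relational learner as extra variables — can be sketched as follows. The toy graph, subscriber IDs, and the specific features computed are illustrative assumptions, not the study's exact variables.

```python
# Toy undirected call graph with binary edges (an edge exists or it does
# not, no call-volume weights), plus known churn labels for some users.
graph = {
    "u1": {"u2", "u3"},
    "u2": {"u1", "u4"},
    "u3": {"u1"},
    "u4": {"u2", "u5"},
    "u5": {"u4"},
}
churners = {"u3", "u5"}

def network_features(node):
    """Relational features used to enrich a non-relational learner."""
    neighbours = graph[node]
    churn_neighbours = len(neighbours & churners)
    return {
        "degree": len(neighbours),
        "churn_neighbours": churn_neighbours,
        "churn_ratio": churn_neighbours / len(neighbours),
    }

print(network_features("u1"))
```

These per-subscriber features are then appended to the usual usage and contract variables and fed to an ordinary classifier, without any collective inference step.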
Obstructive sleep apnea is linked to severe health consequences such as hypertension, daytime sleepiness, and cardiovascular disease. Nearly a billion people are estimated to have obstructive sleep apnea, with a substantial economic burden. However, the current diagnostic parameter of obstructive sleep apnea, the apnea–hypopnea index, correlates poorly with related comorbidities and symptoms. Obstructive sleep apnea severity is measured by counting respiratory events, while other physiologically relevant consequences are ignored. Furthermore, as the clinical methods for analysing polysomnographic signals are outdated, laborious, and expensive, most patients with obstructive sleep apnea remain undiagnosed. Therefore, more personalised diagnostic approaches are urgently needed. The Sleep Revolution, funded by the European Union's Horizon 2020 Research and Innovation Programme, aims to tackle these shortcomings by developing machine learning tools to better estimate obstructive sleep apnea severity and phenotypes. This allows for improved personalised treatment options, including increased patient participation. Implementing these tools will also alleviate the costs and increase the availability of sleep studies by decreasing manual scoring labour. Finally, the project aims to design a digital platform that functions as a bridge between researchers, patients, and clinicians, with an electronic sleep diary, objective cognitive tests, and questionnaires in a mobile application. These ambitious goals will be achieved through extensive collaboration between 39 centres, combining expertise from sleep medicine, computer science, and industry, and by utilising tens of thousands of retrospectively and prospectively collected sleep recordings. With the commitment of the European Sleep Research Society and the Assembly of National Sleep Societies, the Sleep Revolution has a unique opportunity to create new standardised guidelines for sleep medicine.
To improve the performance of any machine learning model, it is important to focus on the data itself rather than only on continuously developing new algorithms. This is exactly the aim of feature engineering. It can be defined as the clever engineering of data, exploiting the intrinsic bias of the machine learning technique to our benefit, ideally improving both accuracy and interpretability. It is often applied in combination with simple machine learning techniques, such as regression models or decision trees, to boost their performance (whilst maintaining the interpretability property which is so often needed in analytical modeling), but it may also improve complex techniques such as XGBoost and neural networks. Feature engineering aims at designing smart features in one of two ways: either by adjusting existing features using various transformations, or by extracting or creating new meaningful features (a process often called "featurization") from different sources (e.g., transactional data, network data, time series data, text data, etc.).
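Both flavours can be illustrated on transactional data: featurization collapses raw transactions into per-customer RFM-style features (recency, frequency, monetary), and a transformation then log-scales the skewed monetary amount. The customer IDs, dates, and feature choices below are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date
import math

# Raw transactional data: (customer, date, amount).
transactions = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 3, 2), 40.0),
    ("c2", date(2024, 2, 14), 900.0),
]
today = date(2024, 4, 1)

by_customer = defaultdict(list)
for cust, day, amount in transactions:
    by_customer[cust].append((day, amount))

# Featurization: collapse raw transactions into per-customer features
# that a simple model (regression, decision tree) can consume directly.
features = {}
for cust, rows in by_customer.items():
    recency = min((today - day).days for day, _ in rows)
    monetary = sum(amount for _, amount in rows)
    features[cust] = {
        "recency_days": recency,
        "frequency": len(rows),
        # Transformation: log-scale the typically skewed monetary amount.
        "log_monetary": math.log1p(monetary),
    }

print(features["c1"])
```

Each engineered feature keeps a clear business meaning, which is exactly how interpretability is preserved while the simple model's accuracy improves.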
The fraud detection of cargo theft has been a serious issue in ports for a long time. Traditional research on detecting theft risk is expert- and survey-based, which is not optimal for proactive prediction. As we move into a pervasive and ubiquitous paradigm, the implications of the external environment and system behavior are continuously captured as multi-source data. We therefore propose a novel data-driven approach for formulating predictive models for detecting bulk cargo theft in ports. More specifically, we apply various feature-ranking methods and classification algorithms to select an effective feature set of relevant risk elements. Implicit Bayesian networks are then derived with these features to graphically present the relationships among the risk elements of fraud. Various binary classifiers are compared to derive a suitable predictive model, and the Bayesian network performs best overall. The resulting Bayesian networks are then comparatively analyzed based on the outcomes of model validation and testing, as well as essential domain knowledge. The experimental results show that the predictive models are effective, with both accuracy and recall greater than 0.8. These predictive models are useful not only for understanding the dependencies between relevant risk elements, but also for supporting the strategy optimization of risk management.
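The feature-ranking step can be sketched as a simple filter method: score each candidate risk feature against the theft label and keep the top-ranked ones. The records, feature names, and correlation-based score below are illustrative assumptions, not the study's actual risk elements or ranking methods.

```python
import math

# Toy port-inspection records: candidate risk features and a theft label.
records = [
    # (night_shift, cargo_value, gate_checks, theft)
    (1, 9.0, 1, 1),
    (1, 8.5, 2, 1),
    (0, 3.0, 5, 0),
    (0, 4.0, 4, 0),
    (1, 7.0, 2, 1),
    (0, 5.0, 3, 0),
]
names = ["night_shift", "cargo_value", "gate_checks"]

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

labels = [r[-1] for r in records]
# Filter-style ranking: score each feature's association with the label.
scores = {
    name: abs(pearson([r[i] for r in records], labels))
    for i, name in enumerate(names)
}
ranked = sorted(names, key=scores.get, reverse=True)
print(ranked)
```

The highest-ranked features would then be handed to the candidate classifiers (Bayesian network among them) for comparison on validation and test data.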