Personal credit scoring is a challenging problem. In recent years, research has shown that machine learning performs satisfactorily in credit scoring. Because they combine and select features, decision trees suit credit data, which are high-dimensional and have complex correlations. However, decision trees tend to overfit. eXtreme Gradient Boosting is an advanced gradient boosted tree method that overcomes this shortcoming by integrating multiple tree models. The structure of the model is determined by its hyperparameters; because manual tuning is time-consuming and laborious, an optimization method is employed for tuning. As particle swarm optimization describes the state of each particle and its law of motion with continuous real numbers, the hyperparameters of eXtreme Gradient Boosting can be searched for their optimal values in a continuous space. However, classical particle swarm optimization tends to fall into local optima. To solve this problem, this paper proposes an eXtreme Gradient Boosting credit scoring model based on adaptive particle swarm optimization. A swarm split, based on the clustering idea, and two kinds of learning strategies are employed to guide the particles and improve the diversity of the subswarms, which prevents the algorithm from falling into a local optimum. In the experiments, several traditional machine learning algorithms and popular ensemble learning classifiers, as well as four hyperparameter optimization methods (grid search, random search, tree-structured Parzen estimator, and particle swarm optimization), are considered for comparison. Experiments were performed on four credit datasets and seven KEEL benchmark datasets using five popular evaluation measures: accuracy, error rate (type I error and type II error), Brier score, and F1 score. The results demonstrate that the proposed model outperforms the other models on average. Moreover, adaptive particle swarm optimization performs better than the other hyperparameter optimization strategies.
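As an illustration of how particle swarm optimization searches a continuous hyperparameter space, the following is a minimal sketch of classical (not adaptive) particle swarm optimization. It does not reproduce the swarm split or the two learning strategies proposed in the paper. The function names, the coefficient values, and the toy objective are assumptions made for illustration; in practice the objective would be something like a cross-validated error of the boosted model, and the two parameters here merely stand in for hyperparameters such as a learning rate and a subsampling ratio.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Classical PSO over a box-constrained continuous search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Each particle has a real-valued position and velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical stand-in for a validation error; minimum at (0.1, 0.8)."""
    learning_rate, subsample = params
    return (learning_rate - 0.1) ** 2 + (subsample - 0.8) ** 2


# Assumed search ranges for the two illustrative hyperparameters.
best_params, best_error = pso_minimize(
    toy_validation_error, [(0.01, 0.5), (0.5, 1.0)])
print(best_params, best_error)
```

Because every particle is pulled toward the single global best, the swarm can collapse onto one region early. This is the loss of diversity that leads to local optima on harder objectives, and it is what the paper's adaptive variant aims to counter.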
Stock prediction is a challenging task due to multiple influencing factors and complex market dependencies. Traditional solutions are based on a single type of information. With the success of multi-source information in different fields, the combination of different types of information such as numerical and textual information has become a promising option.
Although multi-source information provides rich multi-view information, mining and constructing structured relationships from it is a difficult problem. Specifically, most existing methods extract features from commonly used multi-source information as predictive information sources, without first constructing stock relationship graphs with dependencies from broader information. More importantly, they typically treat each stock as an isolated forecasting target, or use stock market correlations based on a fixed, predefined graph structure. As a result, current methods are not sensitive enough when aggregating the attribute features extracted from multi-source information with the stock relationship graph, and cannot dynamically update market relations and relationship strengths. The stock market is highly time-dependent, and the attributes of nodes are affected by the time perception of other attributes, which existing methods do not fully consider.
To address these problems, we propose a novel dynamic attributes-driven graph attention network incorporating sentiment (DGATS), which combines sentiment information, transaction data, and text data. Inspired by behavioral finance, we separately extract sentiment information as a factor of technical indicators, and further realize the early fusion of technical indicators and textual data through Kronecker product-based tensor fusion. In particular, an LSTM and a temporal attention network gradually capture the short-term and long-term transition features from the local composition of the fused stock trading sequence. Furthermore, real-time intra-market dependencies and key attribute information are captured with graph networks, enabling dynamic updates of relationships and relationship strengths in predefined graphs. Experiments on real datasets show that the architecture outperforms previous methods in prediction performance.
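To make the Kronecker product-based fusion step concrete, here is a small sketch for a single time step. It follows the common tensor-fusion convention of appending a constant 1 to each vector, so that the fused vector keeps the original unimodal features alongside every pairwise product. Whether DGATS uses exactly this convention is an assumption; the function names and the example values are hypothetical.

```python
def kron(u, v):
    """Kronecker product of two vectors, returned as a flat list."""
    return [a * b for a in u for b in v]


def tensor_fusion(indicators, text_embedding):
    """Fuse two feature vectors through a Kronecker product.

    Appending 1.0 to each input (an assumed convention) keeps the original
    features in the result next to all cross-modal interaction terms.
    """
    return kron([1.0] + list(indicators), [1.0] + list(text_embedding))


# Hypothetical technical-indicator and text features for one trading day.
fused = tensor_fusion([0.5, -1.0], [2.0, 3.0, 4.0])
print(len(fused))  # (2 + 1) * (3 + 1) = 12 fused features
```

The fused dimension grows as the product of the input dimensions, so in practice the inputs are usually kept low-dimensional before fusion.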