Missing data is a prevalent problem in predictive analytics. In this paper, a winner-take-all (WTA) autoencoder-based piecewise linear model is developed to solve the nonlinear regression problem under the missing value scenario. The model consists of two parts: an overcomplete WTA autoencoder and a gated linear network. The overcomplete WTA autoencoder is a stacked denoising autoencoder (SDAE) designed to play two roles: (1) to estimate the missing values; and (2) to realize a sophisticated partitioning by generating a broad set of binary gate control sequences. In addition, an iterative algorithm with renewed teacher signals is developed to train the SDAE. The gated linear network, driven by the generated binary gate control sequences, implements a flexible piecewise linear model for nonlinear regression. By composing a quasi-linear kernel based on the gate control sequences, the piecewise linear model is then identified in the same way as a support vector regression. Experimental results show that the proposed hybrid model outperforms traditional models.
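The abstract does not give the kernel formula, so the following is only a minimal sketch of how binary gate control sequences could feed a quasi-linear kernel. It assumes the gates are obtained by keeping the top fraction of hidden activations, and that the kernel takes the form (1 + <x, x'>) multiplied by the number of gates the two samples share. The function names `wta_gates` and `quasi_linear_kernel`, the `keep_fraction` parameter, and the toy activations are all hypothetical.

```python
def wta_gates(activations, keep_fraction):
    """Binary gate sequence: 1 for the strongest hidden units, 0 elsewhere.

    Assumption: "winner-take-all" is realized by keeping the top
    `keep_fraction` of units by activation value.
    """
    n = len(activations)
    k = max(1, int(n * keep_fraction))  # always keep at least one winner
    winners = sorted(range(n), key=lambda i: activations[i], reverse=True)[:k]
    gates = [0] * n
    for i in winners:
        gates[i] = 1
    return gates


def quasi_linear_kernel(x1, g1, x2, g2):
    """Assumed kernel form: (1 + <x1, x2>) * <g1, g2>.

    The second factor counts the gates that are open for both samples,
    so samples with no shared gates have zero similarity.
    """
    linear = 1.0 + sum(a * b for a, b in zip(x1, x2))
    shared = sum(a * b for a, b in zip(g1, g2))
    return linear * shared


# Toy hidden activations for two samples (illustrative values only).
g_a = wta_gates([0.1, 0.9, 0.4, 0.7], keep_fraction=0.5)
g_b = wta_gates([0.8, 0.6, 0.2, 0.1], keep_fraction=0.5)
k_ab = quasi_linear_kernel([1.0, 2.0], g_a, [3.0, 4.0], g_b)
```

Under these assumptions, a Gram matrix filled with such kernel values could be handed to any support vector regression implementation that accepts a precomputed kernel.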
In both research and engineering, missing data is a serious problem that cannot be overlooked. Datasets with missing values are therefore difficult to model with conventional global prediction models. In this paper, we propose a hybrid model consisting of an autoencoder and a gated linear network for solving the regression problem under the missing value scenario. A sophisticated modeling and identification algorithm is developed. First, an extended affinity propagation (AP) clustering algorithm is applied to obtain a self-organized competitive net that divides the dataset into several clusters. Second, a multiple imputation tool with top p% winner-take-all denoising autoencoders (DAE) is introduced to realize better predictions of missing values, in which rough estimates of the missing values, obtained by mean imputation and a similarity method within the clusters, are used as teacher signals for the DAE. Finally, a gated linear network is designed to construct a piecewise linear regression model with interpolations, which is identified in exactly the same way as a support vector regression with a quasilinear kernel composed using the cluster information obtained in the AP clustering step. In experiments on five datasets, the proposed method demonstrates its effectiveness and robustness compared with other traditional kernels and state-of-the-art methods, even on datasets with a large percentage of missing values.
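The rough estimates used as teacher signals can be illustrated with a minimal sketch of within-cluster mean imputation. It assumes the cluster labels are already available (for example from the AP clustering step), that missing entries are encoded as `None`, and that a feature with no observed value inside a cluster falls back to the overall feature mean. The function name `cluster_mean_impute` and the fallback rule are hypothetical, and the similarity method mentioned in the abstract is not covered.

```python
def cluster_mean_impute(rows, labels):
    """Fill None entries with the mean of that feature inside the row's cluster.

    Assumption: if the cluster has no observed value for a feature,
    the overall mean of that feature is used instead.
    """
    n_features = len(rows[0])
    per_cluster = {}                          # label -> observed values per feature
    overall = [[] for _ in range(n_features)]  # observed values per feature
    for row, label in zip(rows, labels):
        observed = per_cluster.setdefault(label, [[] for _ in range(n_features)])
        for j, value in enumerate(row):
            if value is not None:
                observed[j].append(value)
                overall[j].append(value)

    def mean(values):
        return sum(values) / len(values) if values else None

    filled = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                value = mean(per_cluster[label][j])
                if value is None:  # nothing observed in this cluster
                    value = mean(overall[j])
            new_row.append(value)
        filled.append(new_row)
    return filled


# Toy data: two clusters, one missing entry in each (illustrative values only).
rows = [[1.0, None], [3.0, 4.0], [10.0, 20.0], [None, 30.0]]
labels = [0, 0, 1, 1]
teacher_signals = cluster_mean_impute(rows, labels)
```

In the scheme the abstract describes, estimates of this kind would serve only as initial targets for the denoising autoencoder, which is then expected to produce better predictions of the missing values.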
Soil erosion often leads to land degradation, reduced agricultural production, and environmental deterioration, which seriously restrict the sustainable development of regions. Clarifying the driving factors of soil erosion is a prerequisite for preventing it. Given the lack of research on how the driving factors and driving forces of soil erosion change across regions or across erosion intensity grades, this paper pioneers the use of machine learning methods to address this problem. First, the widely used (Revised) Universal Soil Loss Equation ((R)USLE) framework was applied to simulate the spatial distribution of soil erosion. Then, K-fold cross-validation was used to evaluate the accuracy and stability of five machine learning algorithms for fitting soil erosion. The random forest (RF) method performed best, with an average accuracy of 86.35%. Next, the Permutation Importance (PI) and Partial Dependence Plot (PDP) methods based on RF were introduced to quantitatively analyze the main driving factors under different geological conditions and the driving force changes of each factor under different erosion intensity grades, respectively. The results showed that the main driver of soil erosion in Chongqing and Guizhou was the cover management factor (PI: 0.4672 and 0.4788, respectively), whereas that in Sichuan was the slope length and slope factor (PI: 0.6165). Under different erosion intensity grades, the driving force of each factor shows nonlinear and complex inhibitory or promoting effects as the factor value changes. These findings can provide scientific guidance for the refined management of soil erosion, which is significant for halting or reversing land degradation and achieving sustainable use of land resources.
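The idea behind Permutation Importance can be shown with a minimal, self-contained sketch: shuffle one feature column at a time and record how much the prediction error grows. This is not the paper's pipeline. A hand-written toy predictor stands in for the trained random forest, mean squared error stands in for whatever score the authors used, and the names `permutation_importance`, `toy_predict`, `n_repeats`, and `seed` are hypothetical.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in error after shuffling each feature column.

    `predict` maps one feature row to one prediction; any fitted model
    (a random forest in the paper) could be wrapped this way.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_squared_error(y, [predict(row) for row in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup: the target depends only on feature 0 (illustrative values only).
X = [[float(i), float((7 * i) % 20)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def toy_predict(row):
    # Stand-in for a trained model; it ignores feature 1 entirely.
    return 2.0 * row[0]


scores = permutation_importance(toy_predict, X, y)
```

In this toy case the ignored feature receives an importance of zero, while shuffling the feature the predictor relies on increases the error. Ranking factors by such scores is the kind of comparison the reported PI values summarize.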