When a soil contamination map is created using geostatistical techniques, several factors can affect the prediction error. In this study, a grid-based soil contamination map was created from sampled heavy metal concentrations in soil in abandoned mine areas using Ordinary Kriging. Five factors judged to affect the prediction error of the map were selected, and the variation of the root mean squared error (RMSE) between predicted and measured values was analyzed using leave-one-out cross-validation. A machine learning algorithm was then used to identify the three factors with the greatest influence on the RMSE. The results showed that the Variogram Model, Minimum Neighbors, and Anisotropy factors have the largest impact on RMSE in the Standard interpolation. Among the variogram models, the Spherical model gave the lowest RMSE; for Minimum Neighbors, the RMSE was lowest at a value of 3 and increased as the value increased. For Anisotropy, not considering anisotropy was found to be more appropriate. Through the combined use of geostatistics and machine learning, it was possible to create a highly reliable soil contamination map at the local scale and to identify which factors have a significant impact when interpolating a small amount of soil heavy metal data.
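The leave-one-out RMSE evaluation described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`spherical`, `solve`, `ordinary_kriging`, `loo_rmse`), the variogram parameters (sill, range, nugget), and the sample coordinates and values are all assumptions made for illustration, and the sketch uses every remaining sample as a neighbor rather than a Minimum Neighbors setting.

```python
import math

def spherical(h, sill=1.0, rng=10.0, nugget=0.0):
    """Spherical semivariogram (parameters are illustrative assumptions)."""
    if h == 0.0:
        return 0.0  # gamma(0) is 0 by definition
    if h >= rng:
        return nugget + sill
    r = h / rng
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target, variogram=spherical):
    """Predict the value at `target` from all given samples."""
    n = len(points)
    # Kriging system: semivariances between samples, bordered by the
    # unbiasedness constraint (weights sum to 1) and a Lagrange multiplier.
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values))

def loo_rmse(points, values, variogram=spherical):
    """Leave each sample out in turn, predict it, and return the RMSE."""
    squared_errors = []
    for i in range(len(points)):
        rest_p = points[:i] + points[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        pred = ordinary_kriging(rest_p, rest_v, points[i], variogram)
        squared_errors.append((pred - values[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical sample locations and concentrations, for illustration only.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
conc = [12.0, 15.0, 11.0, 18.0, 14.0]
print(loo_rmse(pts, conc))
```

Comparing `loo_rmse` across different `variogram` functions or parameter settings would mirror, in a simplified way, how the effect of each factor on the RMSE could be examined.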
In this study, the prediction of mining-induced subsidence is analyzed and compared using various machine learning models. Eight factors affecting the occurrence of subsidence are identified from 1,730 sets of subsidence data. Five machine learning models that are frequently used in studies related to geohazard prediction are selected: AdaBoost, artificial neural networks, k-nearest neighbors, random forest, and the support vector machine. In addition, the stacking technique is applied to the five algorithms in 10 combinations, and the predictive performance of each ensemble method is evaluated and compared. To evaluate the classification performance of the machine learning techniques, recall, the proportion of actual subsidence occurrences that are correctly predicted, is used as the evaluation index instead of the area under the curve used in previous studies. Based on the recall values, the random forest demonstrates the best performance (recall of 0.955). Recall is expected to be a more reliable evaluation index for predicting subsidence occurrences than other indices.
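As a small illustration of the recall index used in the abstract above, the sketch below computes recall as true positives divided by the sum of true positives and false negatives. The function name `recall` and the labels are hypothetical and are not taken from the study's data.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positive cases (e.g. subsidence) predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)

# Hypothetical labels: 1 = subsidence occurred, 0 = no subsidence.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(recall(y_true, y_pred))  # 3 of 4 actual occurrences found -> 0.75
```

Because recall ignores false positives, it emphasizes missed subsidence occurrences, which is the kind of error a hazard-prediction setting is most concerned with.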