Abstract: Advances in data acquisition and statistical methodology have led to growing use of machine‐learning methods to predict geomorphic disturbance events. However, capturing the data required to parameterize these models is challenging because of expense or, more fundamentally, because the phenomenon of interest occurs infrequently. Thus, it is important to understand how the nature of the data used to train predictive models influences their performance. Using a database of cliff failure prediction and associated…
“…However, the random sampling method has some drawbacks. First, this method does not pay attention to the distribution pattern of absence samples, and therefore the absence samples generated are sometimes significantly clustered and do not provide overall information on the entire study area [34,38]. Second, absence samples may be very close to presence locations, resulting in confusion in the model and also increasing an error in the final output [39].…”
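The two drawbacks named in this statement (clustered absence samples, and absence samples lying too close to presence locations) can both be countered with simple distance constraints during random sampling. The following is a minimal sketch of that idea only; it is not the SAS tool or any published method. The function name `sample_absences`, its parameters, and the rectangular study area are hypothetical choices made for illustration.

```python
import math
import random


def sample_absences(presences, bounds, n, min_presence_dist,
                    min_absence_dist, seed=0, max_tries=100_000):
    """Draw up to n absence points by rejection sampling.

    Hypothetical helper: a candidate is rejected if it lies closer than
    min_presence_dist to any presence point, or closer than
    min_absence_dist to an absence point that was already accepted.
    bounds is (xmin, ymin, xmax, ymax) of an assumed rectangular area.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    absences = []
    tries = 0
    while len(absences) < n and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Constraint 1: keep a buffer around every presence location.
        if any(math.dist(candidate, p) < min_presence_dist for p in presences):
            continue
        # Constraint 2: keep absences apart so they do not cluster.
        if any(math.dist(candidate, a) < min_absence_dist for a in absences):
            continue
        absences.append(candidate)
    return absences


if __name__ == "__main__":
    pts = sample_absences([(5.0, 5.0)], (0.0, 0.0, 10.0, 10.0), n=10,
                          min_presence_dist=2.0, min_absence_dist=1.0)
    print(len(pts), "absence points accepted")
```

The function may return fewer than `n` points when the constraints cannot be met within `max_tries`, so a caller should check the length of the result.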
Although sampling strategy plays an important role in groundwater potential mapping and significantly influences model accuracy, researchers often apply a simple random sampling method to determine absence (non-occurrence) samples. In this study, an automated, user-friendly geographic information system (GIS)-based tool, selection of absence samples (SAS), was developed using the Python programming language. The SAS tool takes into account different geospatial concepts, including nearest neighbor (NN) and hotspot analyses. In a case study, it was successfully applied to the Bojnourd watershed, Iran, together with two machine learning models (random forest (RF) and multivariate adaptive regression splines (MARS)) with GIS and remotely sensed data, to model groundwater potential. Different evaluation criteria (area under the receiver operating characteristic curve (AUC-ROC), true skill statistic (TSS), efficiency (E), false positive rate (FPR), true positive rate (TPR), true negative rate (TNR), and false negative rate (FNR)) were used to scrutinize model performance. Two absence sample types were produced, based on a simple random method and the SAS tool, and used in the models. The results demonstrated that both RF (AUC-ROC = 0.913, TSS = 0.72, E = 0.926) and MARS (AUC-ROC = 0.889, TSS = 0.705, E = 0.90) performed better when using absence samples generated by the SAS tool, indicating that this tool is capable of producing trustworthy absence samples to improve groundwater potential models.
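Most of the evaluation criteria listed in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, with TSS computed as TPR + TNR − 1. Treating efficiency (E) as overall accuracy is an assumption about how the term is used here, not something the abstract states, and the function name `confusion_rates` is hypothetical.

```python
def confusion_rates(tp, fp, tn, fn):
    """Return common rates from confusion-matrix counts.

    tp/fn count presence samples predicted as presence/absence;
    tn/fp count absence samples predicted as absence/presence.
    """
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return {
        "TPR": tpr,
        "TNR": tnr,
        "FPR": fp / (fp + tn),  # false positive rate = 1 - TNR
        "FNR": fn / (fn + tp),  # false negative rate = 1 - TPR
        "TSS": tpr + tnr - 1,   # true skill statistic
        # Assumption: efficiency (E) taken as overall accuracy.
        "E": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    for name, value in confusion_rates(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```

AUC-ROC is not included because it needs the ranked prediction scores rather than a single set of confusion counts.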
Section: Discussion (mentioning; confidence: 99%)
“…In this study, a thorough investigation of ANN structures has been carried out for predicting the 28 days CS of FC, which is one of the most important mechanical properties of FC. In addition, it is well-known that the accuracy of the given machine learning algorithm greatly depends on the sampling strategy to construct the model [85,86]. Therefore, MCS was used in this study to fully analyze the capability of all the C-ANN structures, taking into account such variability of the input space in the training phase of the model.…”
Development of Foamed Concrete (FC) and incessant increases in fabrication technology have paved the way for many promising civil engineering applications. Nevertheless, the design of FC requires a large number of experiments to determine the appropriate Compressive Strength (CS). Employment of machine learning algorithms to take advantage of the existing experiments database has been attempted, but model performance can still be improved. In this study, the performance of an Artificial Neural Network (ANN) was fully analyzed to predict the 28 days CS of FC. Monte Carlo simulations (MCS) were used to statistically analyze the convergence of the modeled results under the effect of random sampling strategies and the network structures selected. Various statistical measures such as Coefficient of Determination (R2), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) were used for validation of model performance. The results show that ANN is a highly efficient predictor of the CS of FC, achieving a maximum R2 value of 0.976 on the training part and an R2 of 0.972 on the testing part, using the optimized C-ANN-[3–4–5–1] structure, which compares with previous published studies. In addition, a sensitivity analysis using Partial Dependence Plots (PDP) over 1000 MCS was also performed to interpret the relationship between the input parameters and 28 days CS of FC. Dry density was found as the variable with the highest impact to predict the CS of FC. The results presented could facilitate and enhance the use of C-ANN in other civil engineering-related problems.
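The idea of using Monte Carlo simulation to study how random sampling affects model performance can be sketched as repeated random train/test splits, with the validation measures recorded on each run. In the sketch below a one-variable least-squares line stands in for the ANN, which is a deliberate simplification, and the data are synthetic. All function names are hypothetical and nothing here reproduces the study's C-ANN structures or results.

```python
import math
import random
import statistics


def r2_mae_rmse(y_true, y_pred):
    """Coefficient of determination, mean absolute error, root mean squared error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return 1 - ss_res / ss_tot, mae, rmse


def fit_line(xs, ys):
    """Ordinary least-squares line (stand-in model, not an ANN)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def monte_carlo_r2(xs, ys, runs=200, test_frac=0.3, seed=0):
    """Mean and standard deviation of test R2 over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(2, int(len(xs) * test_frac))
    scores = []
    for _ in range(runs):
        rng.shuffle(idx)  # a new random train/test partition each run
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        pred = [slope * xs[i] + intercept for i in test]
        r2, _, _ = r2_mae_rmse([ys[i] for i in test], pred)
        scores.append(r2)
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    noise = random.Random(1)
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]  # synthetic data
    mean_r2, sd_r2 = monte_carlo_r2(xs, ys)
    print(f"test R2 over splits: mean={mean_r2:.4f}, sd={sd_r2:.4f}")
```

The spread of the scores across runs, not just their mean, is what indicates how sensitive a model is to the particular sample it was trained on.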
“…Current studies have clarified patterns of spatial sensitivity, however temporal forecasts have remained largely empirical [49], [50]. Most ML techniques achieve overall success rates of 75–95% [51]. While this may seem very promising, there are issues which remain with data input quality, potential overfitting and inadequate choice of prediction models, introducing unintentional inclusion of redundant or noise variables, and technical limits to predicting only certain types and sizes of the flare event [52], [53], [54].…”