Using Machine Learning to Predict Geomorphic Disturbance: The Effects of Sample Size, Sample Prevalence, and Sampling Strategy

Perry, George L.; Dickson, Mark E.

doi:10.1029/2018jf004640

Cited by 26 publications (16 citation statements)

References 58 publications


“…However, the random sampling method has some drawbacks. First, this method does not pay attention to the distribution pattern of absence samples, and therefore the absence samples generated are sometimes significantly clustered and do not provide overall information on the entire study area [34,38]. Second, absence samples may be very close to presence locations, resulting in confusion in the model and also increasing an error in the final output [39].…”
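The second drawback in the quote above, absence points placed too close to presence locations, can be illustrated with a simple distance-constrained rejection sampler. This is a minimal sketch and not the SAS tool's method, which the citing paper's abstract describes only as using nearest-neighbor and hotspot analyses. The function name `sample_absences`, the rectangular study area, and both distance thresholds are hypothetical. The minimum spacing between accepted absences is a crude guard against the clustering described as the first drawback.

```python
import math
import random


def sample_absences(presences, n, bbox, min_dist_presence, min_dist_absence,
                    seed=None, max_tries=100_000):
    """Hypothetical helper: rejection-sample n absence points inside bbox.

    presences: list of (x, y) presence coordinates
    bbox: (xmin, ymin, xmax, ymax) of an assumed rectangular study area
    min_dist_presence: reject candidates closer than this to any presence
    min_dist_absence: reject candidates closer than this to an accepted
        absence (a crude guard against clustered absences)
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    absences = []
    tries = 0
    while len(absences) < n:
        if tries >= max_tries:
            raise RuntimeError("could not place enough absences; relax the distance constraints")
        tries += 1
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Keep absences away from known presences (avoids label confusion).
        if any(math.dist(candidate, p) < min_dist_presence for p in presences):
            continue
        # Keep absences apart from each other (limits clustering).
        if any(math.dist(candidate, a) < min_dist_absence for a in absences):
            continue
        absences.append(candidate)
    return absences


if __name__ == "__main__":
    pres = [(2.0, 2.0), (8.0, 8.0)]
    print(sample_absences(pres, 5, (0.0, 0.0, 10.0, 10.0), 1.5, 1.0, seed=42))
```

A fixed seed makes the draw reproducible; raising either threshold trades coverage of the study area for separation from presences and from other absences.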

Section: Introduction (mentioning); confidence: 99%

An Automated Python Language-Based Tool for Creating Absence Samples in Groundwater Potential Mapping

et al. 2019


Although sampling strategy plays an important role in groundwater potential mapping and significantly influences model accuracy, researchers often apply a simple random sampling method to determine absence (non-occurrence) samples. In this study, an automated, user-friendly geographic information system (GIS)-based tool, selection of absence samples (SAS), was developed using the Python programming language. The SAS tool takes into account different geospatial concepts, including nearest neighbor (NN) and hotspot analyses. In a case study, it was successfully applied to the Bojnourd watershed, Iran, together with two machine learning models (random forest (RF) and multivariate adaptive regression splines (MARS)) with GIS and remotely sensed data, to model groundwater potential. Different evaluation criteria (area under the receiver operating characteristic curve (AUC-ROC), true skill statistic (TSS), efficiency (E), false positive rate (FPR), true positive rate (TPR), true negative rate (TNR), and false negative rate (FNR)) were used to scrutinize model performance. Two absence sample types were produced, based on a simple random method and the SAS tool, and used in the models. The results demonstrated that both RF (AUC-ROC = 0.913, TSS = 0.72, E = 0.926) and MARS (AUC-ROC = 0.889, TSS = 0.705, E = 0.90) performed better when using absence samples generated by the SAS tool, indicating that this tool is capable of producing trustworthy absence samples to improve groundwater potential models.
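Most of the evaluation criteria listed in the abstract above follow from a binary confusion matrix, and AUC-ROC can be computed from ranked model scores. The sketch below uses the standard definitions (TSS = TPR + TNR − 1); the helper names are hypothetical, and efficiency (E) is omitted because the abstract does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (1 = presence, 0 = absence)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_from_labels(y_true, y_pred):
    """Count TP/FP/TN/FN from paired 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


def rates(c):
    """TPR, FNR, TNR, FPR and the true skill statistic TSS = TPR + TNR - 1."""
    tpr = c.tp / (c.tp + c.fn)
    tnr = c.tn / (c.tn + c.fp)
    return {
        "TPR": tpr,
        "FNR": 1 - tpr,
        "TNR": tnr,
        "FPR": 1 - tnr,
        "TSS": tpr + tnr - 1,
    }


def auc_roc(y_true, scores):
    """AUC-ROC as P(random presence outscores random absence), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(rates(confusion_from_labels(y_true, y_pred)))
```

The pairwise AUC loop costs O(n_pos * n_neg), which is fine for illustration; a sort-based rank computation gives the same value more efficiently on large samples.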


“…In this study, a thorough investigation of ANN structures has been carried out for predicting the 28 days CS of FC, which is one of the most important mechanical properties of FC. In addition, it is well-known that the accuracy of the given machine learning algorithm greatly depends on the sampling strategy to construct the model [85,86]. Therefore, MCS was used in this study to fully analyze the capability of all the C-ANN structures, taking into account such variability of the input space in the training phase of the model.…”

Section: Discussion (mentioning); confidence: 99%

Investigation and Optimization of the C-ANN Structure in Predicting the Compressive Strength of Foamed Concrete

Dao et al. 2020

Materials


Development of Foamed Concrete (FC) and continual advances in fabrication technology have paved the way for many promising civil engineering applications. Nevertheless, the design of FC requires a large number of experiments to determine the appropriate Compressive Strength (CS). Machine learning algorithms have been applied to existing experimental databases, but model performance can still be improved. In this study, the performance of an Artificial Neural Network (ANN) was fully analyzed to predict the 28-day CS of FC. Monte Carlo simulations (MCS) were used to statistically analyze the convergence of the modeled results under the effect of random sampling strategies and the network structures selected. Various statistical measures such as Coefficient of Determination (R2), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) were used to validate model performance. The results show that ANN is a highly efficient predictor of the CS of FC, achieving a maximum R2 of 0.976 on the training set and an R2 of 0.972 on the testing set with the optimized C-ANN-[3–4–5–1] structure, a result comparable to previously published studies. In addition, a sensitivity analysis using Partial Dependence Plots (PDP) over 1000 MCS was performed to interpret the relationship between the input parameters and the 28-day CS of FC. Dry density was found to be the variable with the greatest influence on the predicted CS of FC. The results presented could facilitate and enhance the use of C-ANN in other civil engineering problems.
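The Monte Carlo design described in the quote and abstract above, refitting a model on many random train/test splits and summarizing R2, MAE and RMSE, can be sketched as follows. To stay dependency-free, an ordinary least-squares line stands in for the C-ANN; that substitution, the 30% test fraction, and the function names are assumptions for illustration only.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept (a stand-in for the ANN)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def metrics(y_true, y_pred):
    """Coefficient of determination (1 - SS_res/SS_tot), MAE and RMSE."""
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in resid) / n
    rmse = math.sqrt(sum(r * r for r in resid) / n)
    mean_t = statistics.fmean(y_true)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - sum(r * r for r in resid) / ss_tot
    return {"R2": r2, "MAE": mae, "RMSE": rmse}


def monte_carlo_splits(xs, ys, n_runs=1000, test_frac=0.3, seed=0):
    """Refit on n_runs random train/test splits; return mean test metrics."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(2, round(test_frac * len(xs)))
    results = []
    for _ in range(n_runs):
        rng.shuffle(idx)  # a fresh random partition each run
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        results.append(metrics([ys[i] for i in test], preds))
    return {k: statistics.fmean(r[k] for r in results) for k in results[0]}


if __name__ == "__main__":
    rng = random.Random(123)
    xs = [i / 10 for i in range(50)]
    ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]
    print(monte_carlo_splits(xs, ys, n_runs=200))
```

Averaging over many splits reduces dependence on any single partition; tracking how the running means change as `n_runs` grows is one way to judge convergence.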


“…Current studies have clarified patterns of spatial sensitivity, however temporal forecasts have remained largely empirical [49], [50]. Most ML techniques achieve overall success rates of 75 − 95% [51]. While this may seem very promising, there are issues which remain with data input quality, potential over fitting and inadequate choice of prediction models, introducing unintentional inclusion of redundant or noise variables, and technical limits to predicting only certain types and sizes of the flare event [52], [53], [54].…”

Section: Motivation (mentioning); confidence: 99%

Using Support Vector Machine (SVM) and Ionospheric Total Electron Content (TEC) Data for Solar Flare Predictions

Asaly, Gottlieb, and Reuveni 2021

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

